Development of Statistical Methods for Multipollutant Research

Research Report 183, Parts 1 and 2, June 2015

This report contains two studies, by Drs. Brent A. Coull and Eun Sug Park and their colleagues, and a Commentary that discusses each study individually, as well as an Integrative Discussion of the two.

Part 1. Statistical Learning Methods for the Effects of Multiple Air Pollution Constituents. Brent A. Coull, Jennifer F. Bobb, Gregory A. Wellenius, Marianthi-Anna Kioumourtzoglou, Murray A. Mittleman, Petros Koutrakis, and John J. Godleski.

Part 2. Development of Enhanced Statistical Methods for Assessing Health Effects Associated with an Unknown Number of Major Sources of Multiple Air Pollutants. Eun Sug Park, Elaine Symanski, Daikwon Han, and Clifford Spiegelman.

This report is the first result of HEI's effort to spur innovation in how statistics are applied to analyze air pollution effects in a multipollutant world. Air pollution is a complex mixture of gases and particles of varying sizes and composition. The concentrations of individual pollutants are often highly correlated with one another, reflecting shared sources, and pollutants may also differ in how well exposures to them are measured in space and over time. Such factors create substantial challenges for conventional statistical methods, which focus on just one or, at most, a handful of pollutants. In particular, scientists have been concerned that conventional methods might over- or underestimate the health effects associated with individual pollutants or sources, a concern that is raised regularly in decision making about ambient air quality and emissions standards.

In this Research Report, Coull, Park, and their colleagues offer alternatives to the conventional "two-stage" approaches to the analysis of ambient air pollution in which exposures and health outcomes are first estimated separately and then combined in one model. Coull and associates developed methods to identify which key pollutants within a simple mixture are most closely associated with adverse health outcomes, to accommodate linear and non-linear exposure–response relationships, and to characterize uncertainty in the estimated health effects more fully. Park and colleagues extended existing methods for characterizing relationships between emission sources and health by allowing the contributions from sources to be correlated and by ensuring that the health effect estimates account for uncertainties in estimating the source contributions.