EPA particulate matter data – Analyses using Local Control Strategy


Analyses of large observational datasets tend to be complicated and prone to error, depending on the choice of variables, data-cleaning steps and analytic methods. Here, we discuss the analysis of 2016 US environmental epidemiology data and outline a new “benchmark” analysis of these data that is non-parametric and unsupervised. Readers are invited to download our CSV file from the archive and apply whatever analytic approach they consider appropriate. We hope to encourage the development of a widely held “consensus view” on the effects of Secondary Organic Aerosols (formed from Volatile Organic Compounds of predominantly Biogenic or Anthropogenic origin) within PM2.5 particulate matter on Circulatory and/or Respiratory mortality. For example, the reanalyses described here address the question: “Can life in a region with an abundance of trees be relatively dangerous?”


The data set was constructed following the directions given in Pye et al. (2021):

Pye, H.O.T., Ward-Caviness, C.K., Murphy, B.N., Appel, K.W. and Seltzer, K.M. (2021), “Secondary organic aerosol association with cardiorespiratory disease mortality in the United States”. Nature Communications 12, 7215.

Variable names were shortened and simplified.

Usage Notes

The data file is in CSV (.csv) format and can be opened in Excel or any other spreadsheet program.
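For readers who prefer a scripted workflow to a spreadsheet, the Python sketch below reads the CSV with the standard-library `csv` module and pulls one column out as numbers. It is a minimal sketch under stated assumptions: the file is assumed to be UTF-8 encoded, the path passed in is a placeholder (the archive's file name is not given here), and the helper names `load_epa_csv` and `numeric_column` are hypothetical. Because the variable names were shortened, they are taken from the file's own header row rather than hard-coded.

```python
import csv
from pathlib import Path


def load_epa_csv(path):
    """Read the CSV into (header, rows), where rows is a list of dicts.

    `path` is a placeholder for the downloaded file (hypothetical name).
    Assumes UTF-8 text; "utf-8-sig" also strips the byte-order mark that
    Excel may add when saving as "CSV UTF-8".
    """
    with Path(path).open(newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        # fieldnames is None for an empty file; return an empty list instead.
        header = list(reader.fieldnames or [])
    return header, rows


def numeric_column(rows, name):
    """Return one column's values as floats, skipping blank or non-numeric cells."""
    values = []
    for row in rows:
        try:
            values.append(float(row[name]))
        except (TypeError, ValueError):
            # Blank cells (""), text entries, or missing fields (None) are skipped.
            continue
    return values
```

For example, `header, rows = load_epa_csv("downloaded_file.csv")` followed by `print(header)` lists the shortened variable names; `numeric_column(rows, name)` then returns that column's numeric values. Skipping unparseable cells keeps the sketch simple, but it also hides missing data, so count the skipped cells before running any analysis.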

A file named Supp_to_LCSonEPA.PDF contains R code that recreates many of the analyses and graphics displayed here. This PDF file can be downloaded from