Data and code from: A comparative analysis of wildland fire smoke PM2.5 exposure estimates across California from 2008-2018
Data files
May 25, 2026 version files 2.03 GB
- 00_setup.R: 4.51 KB
- 01_summary_and_validation.R: 13.97 KB
- 02_metrics.R: 17.70 KB
- 03_statistical_tests.R: 6.88 KB
- 04_case_studies.R: 19.34 KB
- 05_in_text_numbers.R: 2.42 KB
- README.md: 4.07 KB
- smoke_comp_final_data.csv: 2.03 GB
Abstract
This Dryad repository contains four daily wildfire smoke PM2.5 datasets for California (2008–2018), aggregated to the census tract level using a population-weighted approach. The datasets include: (1) output from a chemical transport model (CTM), the U.S. Environmental Protection Agency's CMAQ model (Wilkins and Connolly, 2024), originally at 12-km resolution; (2) a machine learning (ML) model for the contiguous U.S. (Childs et al., 2022); (3) a California-specific ML ensemble model (Aguilera et al., 2023); and (4) a hybrid CTM–ML fusion product, originally at 1-km resolution (Zhang et al., 2023).
All datasets provide isolated daily smoke PM2.5 (µg/m³). The Aguilera and Childs datasets were publicly available at the tract level and were not modified prior to analysis. The Wilkins (CMAQ) and Zhang (fusion) datasets were aggregated from their native gridded resolutions to population-weighted census tract centroids. The repository also contains R scripts used for the publication.
Dataset DOI: 10.5061/dryad.0zpc867bw
Description of the data and file structure
This is a compilation of four daily wildfire smoke PM2.5 datasets for California (2008–2018), aggregated to the census tract level using a population-weighted approach. The datasets include: (1) output from a chemical transport model (CTM), the U.S. Environmental Protection Agency's CMAQ model (Wilkins and Connolly, 2024), originally at 12-km resolution; (2) a machine learning (ML) model for the contiguous U.S. (Childs et al., 2022); (3) a California-specific ML ensemble model (Aguilera et al., 2023); and (4) a hybrid CTM–ML fusion product, originally at 1-km resolution (Zhang et al., 2023).
All datasets provide isolated daily smoke PM2.5 (µg/m³). The Aguilera and Childs datasets were publicly available at the tract level and were not modified prior to analysis. The Wilkins (CMAQ) and Zhang (fusion) datasets were aggregated from their native gridded resolutions to population-weighted census tract centroids.
In the data file, GEOID is the 2010 census tract identifier. The date column is accompanied by day, year, and month columns for convenience in analysis. The smoke PM2.5 estimates from each source are stored in columns named wilkins, childs, aguilera, and zhang.
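Because the four estimates sit side by side in one table, simple comparisons need no joins. The following is a minimal sketch, not the method used in the publication: it builds a toy data frame with made-up values (the real file is about 2 GB), then computes annual means per dataset and pairwise Pearson correlations. The object names toy, smoke_cols, annual_means, and pairwise_r are illustrative assumptions.

```r
# Column names as documented for smoke_comp_final_data.csv
smoke_cols <- c("wilkins", "childs", "aguilera", "zhang")

# Toy stand-in for the real data; all values are invented for illustration
toy <- data.frame(
  GEOID    = rep(c("06001400100", "06001400200"), each = 2),
  year     = rep(c(2008L, 2009L), times = 2),
  wilkins  = c(1, 3, 2, 4),
  childs   = c(0, 2, 1, 3),
  aguilera = c(1, 1, 2, 2),
  zhang    = c(2, 4, 2, 4)
)

# Mean smoke PM2.5 per year for each dataset (missing values ignored)
annual_means <- aggregate(
  toy[smoke_cols],
  by = list(year = toy$year),
  FUN = mean,
  na.rm = TRUE
)

# Pairwise Pearson correlation between the four estimates
pairwise_r <- cor(toy[smoke_cols], use = "pairwise.complete.obs")

print(annual_means)
print(round(pairwise_r, 2))
```

On the full file the same calls apply to the loaded data frame, although memory use may make a chunked or data.table-based approach preferable.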
Aguilera, R., Luo, N., Basu, R., Wu, J., Clemesha, R., Gershunov, A., & Benmarhnia, T. (2023). A novel ensemble-based statistical approach to estimate daily wildfire-specific PM2.5 in California (2006–2020). Environment International, 171, 107719. https://doi.org/10.1016/j.envint.2022.107719
Childs, M. L., Li, J., Wen, J., Heft-Neal, S., Driscoll, A., Wang, S., et al. (2022). Daily Local-Level Estimates of Ambient Wildfire Smoke PM2.5 for the Contiguous US. Environmental Science & Technology, 56(19), 13607–13621. https://doi.org/10.1021/acs.est.2c02934
Wilkins, J., & Connolly, R. (2024). Wildland fire PM2.5 modeled estimates for the US from 2008-2018 (Version 3) [Data set]. Dryad. https://doi.org/10.5061/DRYAD.SXKSN03B3
Zhang, D., Wang, W., Xi, Y., Bi, J., Hang, Y., Zhu, Q., et al. (2023). Wildland Fires Worsened Population Exposure to PM2.5 Pollution in the Contiguous United States. Environmental Science & Technology, 57(48), 19990–19998. https://doi.org/10.1021/acs.est.3c05143
Files and variables
Code
Description: R scripts with the analysis code
Scripts should be run in the following order. All scripts require 00_setup.R to be run first.
- 00_setup.R: Loads all required libraries and imports all input datasets. Must be run before any other script.
- 01_summary_and_validation.R: Produces Figure 1, Figure S1, Figure S2, Figure S4, and Table S2.
- 02_metrics.R: Produces Figure 2, Figure S5, and Figure S6.
- 03_statistical_tests.R: Produces Figure S7 and Figure S8.
- 04_case_studies.R: Produces Figure 3, Figure S3, Figure S9, Figure S10, and Table 2.
- 05_in_text_numbers.R: Calculates summary statistics cited in the manuscript text. Requires 00_setup.R and 02_metrics.R to be run first.
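The run order above can be sketched as a small driver. This is a hedged example: run_scripts is a hypothetical helper that is not part of the repository, and it assumes all six scripts sit in one directory and pass objects to each other through the global environment, neither of which this README states.

```r
# Script names in the documented run order
scripts <- c(
  "00_setup.R",
  "01_summary_and_validation.R",
  "02_metrics.R",
  "03_statistical_tests.R",
  "04_case_studies.R",
  "05_in_text_numbers.R"
)

# Hypothetical helper: source each script in order, stopping early if any is missing
run_scripts <- function(dir = ".", files = scripts) {
  paths <- file.path(dir, files)
  missing <- files[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing scripts: ", paste(missing, collapse = ", "))
  }
  for (p in paths) {
    message("Running ", basename(p))
    # source() evaluates in the global environment by default, so objects
    # created by 00_setup.R stay visible to the later scripts (assumption)
    source(p)
  }
  invisible(paths)
}

# Usage, from the directory holding the scripts:
# run_scripts(".")
```

Running the whole sequence satisfies the stated dependency of 05_in_text_numbers.R on 00_setup.R and 02_metrics.R. To regenerate a single figure, passing a subset such as files = c("00_setup.R", "03_statistical_tests.R") should be enough if the scripts have no other interdependencies.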
File: smoke_comp_final_data.csv
Description: Smoke PM2.5 dataset
Variables
- GEOID: 2010 census tract designation
- date: calendar date (Jan. 1, 2008 to Dec. 31, 2018)
- day: day of the year (1-365, or 1-366 in leap years)
- year: year (2008-2018)
- month: month of the year (numeric)
- wilkins: Wilkins smoke PM2.5 (µg/m³)
- childs: Childs smoke PM2.5 (µg/m³)
- aguilera: Aguilera smoke PM2.5 (µg/m³)
- zhang: Zhang smoke PM2.5 (µg/m³)
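California tract GEOIDs begin with the state code 06, so reading GEOID as a number would drop the leading zero. The sketch below shows one way to guard against that in base R. It is illustrative only: read_smoke is a hypothetical helper, the stand-in file holds made-up values, and the column order and ISO date format (YYYY-MM-DD) are assumptions that should be checked against the real file.

```r
# Hypothetical helper: read the smoke file while keeping GEOID as text
read_smoke <- function(path) {
  smoke <- read.csv(
    path,
    colClasses = c(GEOID = "character"),  # preserve the leading "06"
    stringsAsFactors = FALSE
  )
  # Assumes dates are stored as YYYY-MM-DD; adjust the format if not
  smoke$date <- as.Date(smoke$date)
  smoke
}

# Tiny stand-in file with invented values, using the documented column names
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "GEOID,date,day,year,month,wilkins,childs,aguilera,zhang",
  "06001400100,2008-01-01,1,2008,1,0.0,0.0,0.1,0.0",
  "06001400100,2008-01-02,2,2008,1,1.2,0.8,1.0,1.5"
), tmp)

demo <- read_smoke(tmp)
str(demo)
```

For the full 2.03 GB file, a faster reader such as data.table::fread with colClasses = list(character = "GEOID") may be more practical than read.csv.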
Access information
Other publicly accessible locations of the data:
