Dataset parent CARB 19RD004: Daily pollutant concentrations of NO2, PM2.5 and O3 of 100 m resolution for California 2012-2019
Data files
Aug 01, 2024 version files 503.61 MB
- no2_2012_01_01.7z (164.17 MB)
- o3_2012-01-01.7z (108.46 MB)
- pm25_2012_01-01.7z (230.98 MB)
- README.md (5.12 KB)
Abstract
This study bridges gaps in air pollution research by examining exposure dynamics in disadvantaged communities. Using machine learning and large-scale data processing, we produced high-resolution (100 m) daily air pollution maps for nitrogen dioxide (NO2), fine particulate matter (PM2.5), and ozone (O3) across California from 2012 to 2019. NO2 and PM2.5 showed spatial patterns opposite to that of O3. We also identified consistently higher pollutant exposure for disadvantaged communities from 2012 to 2019, although the most disadvantaged communities saw the largest NO2 and PM2.5 reductions and the advantaged neighborhoods experienced the greatest increases in O3 concentrations. Further, day-to-day exposure variations decreased for NO2 and O3. The disparity in NO2 exposure decreased, while the disparity in O3 exposure persisted. Additionally, PM2.5 showed increased day-to-day variations across all communities due to the increase in wildfire frequency and intensity, which particularly affected advantaged suburban and rural communities.
https://doi.org/10.5061/dryad.6djh9w18p
This study presents a comprehensive dataset of high-resolution (100 m) daily air pollution surfaces for nitrogen dioxide (NO2), fine particulate matter (PM2.5), and ozone (O3) across California from 2012 to 2019. The dataset was produced with machine learning techniques and extensive data processing that integrate air quality monitoring data, traffic data, land use and land cover data, meteorological data, aerosol optical depth data, and other remote sensing data. These inputs were acquired from the US Environmental Protection Agency (EPA), the California Air Resources Board (CARB), the Google Earth Engine, the National Aeronautics and Space Administration (NASA), and the US Geological Survey (USGS). The project is funded by CARB (19RD004), and the resulting dataset can be used for environmental health research and for identifying air pollution exposure disparities at the neighborhood level. The methodology for modeling concentrations of the three air pollutants is described in the following publication: Jason G. Su, Vy Vuong, Eahsan Shahriary, Emma Yakutis, Emma Sage, Rebecca Haile, John Balmes, Michael Jerrett, Meredith Barrett, “Examining Air Pollution Exposure Dynamics in Disadvantaged Communities through High-Resolution Mapping,” Science Advances, https://doi.org/10.1126/sciadv.adm9986.
Description of the data and file structure
The concentrations of NO2, PM2.5, and O3 are in ppb, µg/m^3, and ppb, respectively. The dataset is organized by pollutant (NO2, PM2.5, O3), and each zip file contains two years of daily data. All the daily data for one pollutant in one year are contained in a single folder, so users can access daily data for specific years. The pollutant concentration surfaces are provided in GeoTIFF (.tif) format, using the NAD 1983 Albers projection. There are a total of eight datasets, each with its own DOI:
- Dataset 1 - CARB 19RD004 NO2 (2012-2015): https://doi.org/10.5061/dryad.dncjsxm6z
- Dataset 2 - CARB 19RD004 NO2 (2016-2019): https://doi.org/10.5061/dryad.j3tx95xp4
- Dataset 3 - CARB 19RD004 PM2.5 (2012-2013): https://doi.org/10.5061/dryad.5qfttdzf6
- Dataset 4 - CARB 19RD004 PM2.5 (2014-2015): https://doi.org/10.5061/dryad.wm37pvmw6
- Dataset 5 - CARB 19RD004 PM2.5 (2016-2017): https://doi.org/10.5061/dryad.9w0vt4bpx
- Dataset 6 - CARB 19RD004 PM2.5 (2018-2019): https://doi.org/10.5061/dryad.h70rxwds5
- Dataset 7 - CARB 19RD004 O3 (2012-2015): https://doi.org/10.5061/dryad.jsxksn0jh
- Dataset 8 - CARB 19RD004 O3 (2016-2019): https://doi.org/10.5061/dryad.0vt4b8h6h
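Because the daily surfaces are split across several DOIs by pollutant and year range, a small lookup can help locate the right one. The following is a minimal sketch using only the Python standard library. The year ranges and DOIs are copied from the list above; the table layout, the `ALIASES` map, and the function name `find_dataset_doi` are illustrative assumptions and not part of the published repository.

```python
# Year ranges and DOIs are copied from the dataset list above.
# The table layout, alias map, and function name are illustrative assumptions.
DATASETS = [
    ("NO2", 2012, 2015, "https://doi.org/10.5061/dryad.dncjsxm6z"),
    ("NO2", 2016, 2019, "https://doi.org/10.5061/dryad.j3tx95xp4"),
    ("PM2.5", 2012, 2013, "https://doi.org/10.5061/dryad.5qfttdzf6"),
    ("PM2.5", 2014, 2015, "https://doi.org/10.5061/dryad.wm37pvmw6"),
    ("PM2.5", 2016, 2017, "https://doi.org/10.5061/dryad.9w0vt4bpx"),
    ("PM2.5", 2018, 2019, "https://doi.org/10.5061/dryad.h70rxwds5"),
    ("O3", 2012, 2015, "https://doi.org/10.5061/dryad.jsxksn0jh"),
    ("O3", 2016, 2019, "https://doi.org/10.5061/dryad.0vt4b8h6h"),
]

# Accept a few common spellings of the pollutant names.
ALIASES = {"NO2": "NO2", "PM2.5": "PM2.5", "PM25": "PM2.5", "O3": "O3"}


def find_dataset_doi(pollutant, year):
    """Return the DOI of the dataset that covers `pollutant` in `year`."""
    name = ALIASES.get(pollutant.strip().upper())
    if name is None:
        raise KeyError(f"unknown pollutant: {pollutant!r}")
    for candidate, first_year, last_year, doi in DATASETS:
        if candidate == name and first_year <= year <= last_year:
            return doi
    raise KeyError(f"no dataset covers {name} in {year}")


if __name__ == "__main__":
    print(find_dataset_doi("PM2.5", 2017))
```

Years outside 2012-2019 raise a `KeyError` rather than returning a nearby dataset, so a mistyped year is caught early.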
Sharing/Access information
A sample dataset for January 1, 2012 is included in the general section for each pollutant:
- NO2: no2_2012_01_01.7z
- PM2.5: pm25_2012_01_01.7z
- O3: o3_2012_01_01.7z
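The sample archive names combine a pollutant prefix with a date. A minimal sketch of parsing them with the Python standard library is shown here. It assumes the pattern `<prefix>_<YYYY>_<MM>_<DD>.7z` seen in the names above and, as a precaution, also accepts `-` as a separator, since the file listing on this page shows both. The function name `parse_sample_name` and the `LABELS` map are hypothetical, and the naming of the daily GeoTIFF files inside each archive is not covered.

```python
import re
from datetime import date

# Pollutant prefix, then year, month, and day separated by "_" or "-".
# The pattern is inferred from the sample archive names; it is an assumption.
_NAME = re.compile(
    r"^(no2|pm25|o3)[_-](\d{4})[_-](\d{2})[_-](\d{2})\.7z$", re.IGNORECASE
)

# Map file-name prefixes to the pollutant labels used in this description.
LABELS = {"no2": "NO2", "pm25": "PM2.5", "o3": "O3"}


def parse_sample_name(filename):
    """Split a sample archive name into (pollutant label, date)."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized archive name: {filename!r}")
    prefix, year, month, day = match.groups()
    return LABELS[prefix.lower()], date(int(year), int(month), int(day))


if __name__ == "__main__":
    for name in ("no2_2012_01_01.7z", "pm25_2012_01_01.7z", "o3_2012_01_01.7z"):
        print(name, parse_sample_name(name))
```

Names that do not match the pattern raise a `ValueError`, which keeps unexpected files from being silently assigned to a pollutant.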
The dataset integrates data from various sources:
- Air quality monitoring data: USEPA and CARB regulatory monitoring, supplemented by Google Street View car measurements.
- Daily traffic data: Caltrans Performance Measurement System (PeMS).
- Land use data: Statewide parcel data for the year 2019.
- Land cover data: Google Earth Engine National Land Cover Database (NLCD) for the year 2016.
- Daily meteorological data: Google Earth Engine gridMET data.
- Aerosol Optical Depth (AOD) data: Google Earth Engine MCD19A2.006: Terra & Aqua MAIAC Land Aerosol Optical Depth.
- Other remote sensing data: Ozone Monitoring Instrument (OMI) for NO2 and O3 concentrations from NASA.
- Street network data: Business Analyst from the Environmental Systems Research Institute (Esri), used to calculate proximity to highways and major roadways.
- Digital Elevation Model (DEM) data: US Geological Survey (USGS).
Code/Software
This repository contains the scripts used for data processing and modeling, written in three languages:
- JavaScript scripts (available in Google_Earth_Engine_scripts.zip) were employed for processing Google Earth Engine data.
- R scripts (available in R_Analysis_DSA_models.7z) were used for data processing and machine learning in Land Use Regression (LUR) modeling. The repository also contains the source code of the R-based Deletion/Substitution/Addition (DSA) modeling packages (DSA_3.1.4.tar.gz and modelUtils_3.1.4.tar.gz), which must be compiled before they can run on a current R installation.
- Python scripts (found in NO2_surface_scripts_python.7z, PM2.5_surface_scripts_python.7z, and O3_surface_scripts_python.7z) were used to generate daily air pollution surfaces.
This dataset was collected and processed as detailed in our Science Advances paper:
Jason G. Su, Vy Vuong, Eahsan Shahriary, Emma Yakutis, Emma Sage, Rebecca Haile, John Balmes, Michael Jerrett, Meredith Barrett. "Examining Air Pollution Exposure Dynamics in Disadvantaged Communities through High-Resolution Mapping." Science Advances, https://doi.org/10.1126/sciadv.adm9986.