The impact of sea-level and groundwater rise on indoor exposure to volatile organic compounds near contaminated sites in socially vulnerable communities in the San Francisco Bay Area
Data files
Version of Apr 24, 2026 (156.85 MB total)
- GA_Hill_VOC_Sites.zip (76.66 KB)
- Lasky_FlowlineResults.Rmd (4.59 KB)
- modpath2shp_loop_240315.py (13.56 KB)
- NetworkAnalystSlopeMethod.txt (6.28 KB)
- OPC_Data_Dictionary_Infrastructure.pdf (67.89 KB)
- README.md (20.08 KB)
- SF_BayArea_pathlines.zip (21.03 MB)
- sfbay_voc_sites_flowlines.zip (135.62 MB)
- SupplementalTable1SiteDescriptions.txt (7.74 KB)
- SupplementaryTableOSMRoads2.txt (457 B)
Abstract
To identify contaminants of concern at each location, we examined three publicly available datasets: EnviroStor (DTSC), GeoTracker (RWQCB), and the US EPA's Superfund Data and Reports (EPA). These datasets provide detailed records of site contamination, monitoring activities, and remediation efforts. In addition, guidance, input, and insight from our community partners (Greenaction for Health and Environmental Justice), academic researchers, and state government agency experts helped us refine the study area and more accurately model the potential movement of VOCs from contaminated sites to nearby buildings. Most data included in this repository is derived from publicly available sources. Parcel-specific data was not included in this repository because it is publicly available elsewhere. Publicly available data is linked to its original source website and can be downloaded there.
Dataset DOI: 10.5061/dryad.9p8cz8wvv
Description of the data and file structure
Access information
Susceptible Population. Susceptible population information was derived from two data sources: the California Department of Social Services and the California Department of Education School Directory. The school directory includes all pre-K to 12th-grade schools registered with the California Department of Education (2024). Georeferenced eldercare and childcare facilities that were present in 2024 were downloaded from the California Department of Social Services (2024).
Sewer System. Sewer system spatial data is not publicly available; therefore, OpenStreetMap roads, downloaded from Geofabrik (2024), were used as a proxy for sewers. OpenStreetMap has 26 different road classifications, not all of which are relevant to this study (e.g., steps, cycleway, footway). Living street, motorway (and motorway link), primary (and primary link), residential, secondary (and secondary link), service, and unclassified roads were judged the most likely to have sewers beneath them and were therefore the only road classifications included in this analysis.
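The road-class selection can be sketched in Python as a simple attribute filter. This is a minimal sketch, not the study's workflow: it assumes the road-type attribute in the Geofabrik shapefile is named `fclass`, interprets the "(link)" entries as the `motorway_link`, `primary_link`, and `secondary_link` classes, and uses a hypothetical function name and record layout.

```python
# Road classes assumed to overlie sewer mains. The "(link)" entries in the text
# are interpreted here as the *_link classes (an assumption).
SEWER_PROXY_CLASSES = frozenset({
    "living_street",
    "motorway", "motorway_link",
    "primary", "primary_link",
    "residential",
    "secondary", "secondary_link",
    "service",
    "unclassified",
})


def filter_sewer_proxy_roads(roads, class_field="fclass"):
    """Keep only road records whose class is in SEWER_PROXY_CLASSES.

    `roads` is any iterable of dict-like attribute records; `class_field` is the
    assumed name of the road-class attribute (hypothetical default).
    """
    return [r for r in roads if r.get(class_field) in SEWER_PROXY_CLASSES]


if __name__ == "__main__":
    sample = [
        {"osm_id": 1, "fclass": "residential"},
        {"osm_id": 2, "fclass": "footway"},
        {"osm_id": 3, "fclass": "primary_link"},
        {"osm_id": 4, "fclass": "steps"},
    ]
    print([r["osm_id"] for r in filter_sewer_proxy_roads(sample)])
```

If the roads are loaded with geopandas instead, the equivalent selection should be `roads[roads["fclass"].isin(SEWER_PROXY_CLASSES)]`, again assuming the `fclass` column name.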
Files and variables
File: sfbay_voc_sites_flowlines.zip
Description: Shapefiles of MODPATH-modeled flowlines for each of the 21 sites included in this study.
File: GA_Hill_VOC_Sites.zip
Description: Shapefiles of parcel boundaries for the 21 sites included in this study, as defined by Greenaction and Hill et al. (2023).
File: SupplementalTable1SiteDescriptions.txt
Description: Site descriptions for each of the 21 sites, including the site number, site name, county, site status, CalEnviroScreen score, oversight agency, and identified contaminants, in .txt format.
File: SupplementaryTableOSMRoads2.txt
Description: Inclusion/exclusion criteria for the OpenStreetMap road classifications used in this study, in .txt format.
File: NetworkAnalystSlopeMethod.txt
Description: ArcGIS Pro geoprocessing protocol, using SQL, to identify slope along roads, with step-by-step instructions for manually conducting the Network Analyst slope method, in .txt format.
File: OPC_Data_Dictionary_Infrastructure.pdf
Description: Data dictionary of the publicly available data downloaded for this project, provided as a viewable and downloadable .pdf. A similar data dictionary appears on this page in the "Access Information" section.
File: SF_BayArea_pathlines.zip
Description: Shapefiles of MODPATH-modeled pathlines for counties in the San Francisco Bay Area.
File: modpath2shp_loop_240315.py
Description: Python code to create shapefiles from MODPATH model output. This code can also be downloaded from the data accompanying Befus et al. (2020), which includes detailed descriptions of how to run the code effectively.
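The central step of turning particle-tracking output into line features, grouping tracked positions into one polyline per particle, can be sketched as follows. This is not the contents of modpath2shp_loop_240315.py; it is a minimal illustration that assumes the pathline points have already been parsed into hypothetical (particle_id, tracking_time, x, y) tuples, and it writes GeoJSON-style features with the standard library rather than a shapefile.

```python
import json
from collections import defaultdict


def pathlines_to_features(points):
    """Group (particle_id, time, x, y) tuples into LineString features.

    Vertices are ordered by tracking time. Particles with fewer than two
    vertices cannot form a line and are skipped.
    """
    by_particle = defaultdict(list)
    for pid, t, x, y in points:
        by_particle[pid].append((t, x, y))

    features = []
    for pid in sorted(by_particle):
        verts = sorted(by_particle[pid])  # tuples sort by tracking time first
        if len(verts) < 2:
            continue
        features.append({
            "type": "Feature",
            "properties": {"particle_id": pid},
            "geometry": {
                "type": "LineString",
                "coordinates": [[x, y] for _, x, y in verts],
            },
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    pts = [(1, 0.0, 0.0, 0.0), (1, 10.0, 5.0, 2.0),
           (2, 0.0, 3.0, 3.0), (1, 5.0, 2.0, 1.0)]
    print(json.dumps(pathlines_to_features(pts)))
```

Sorting by tracking time keeps each line's vertex order consistent with the direction of flow even if the parsed points arrive out of order.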
File: Lasky_FlowlineResults.Rmd
Description: .Rmd file showing how results were coded and processed to obtain data on building use by vulnerable populations. Some files referenced in the .Rmd file are not hosted on Dryad; they must instead be created with NetworkAnalystSlopeMethod.txt in ArcGIS (or similar software) and/or downloaded from the sources listed in the Access Information section. Specifically, "Combined_Parcel_Points_Sewers_Sites.csv" requires the user to complete the NetworkAnalystSlopeMethod.txt steps and then convert the results into a .csv file. The .csv file must include the following column titles: "sitenum, fips_code, x_coord, y_coord, acerage, asmt_year, county, use_code_std_ctgr_lps_desc, year_built".
The "Combined_Sus_Pops_Sewers_Sites.csv" file was created by using the results from NetworkAnalystSlopeMethod.txt to select buildings with susceptible populations from the datasets named "CDSS Susceptible Groups", "CSCD School Footprints", "OSHPD Healthcare Facilities", and "CA Public Schools and Districts Map", and then saving the selection as a .csv file. The .csv file must include the following column titles: "sitenum, name, address, type".
The .Rmd file can be opened and viewed in RStudio or similar software. RStudio can be downloaded from Posit for Mac or Windows computers.
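Because the .Rmd expects these two user-generated tables to carry exact column titles, a quick header check before running it may save debugging time. A minimal Python sketch, assuming comma-delimited files with a single header row in the working directory; the function name is hypothetical, and the column names are copied verbatim from the descriptions above (including the spelling `acerage`).

```python
import csv

REQUIRED_COLUMNS = {
    "Combined_Parcel_Points_Sewers_Sites.csv": [
        "sitenum", "fips_code", "x_coord", "y_coord", "acerage",
        "asmt_year", "county", "use_code_std_ctgr_lps_desc", "year_built",
    ],
    "Combined_Sus_Pops_Sewers_Sites.csv": ["sitenum", "name", "address", "type"],
}


def missing_columns(csv_lines, required):
    """Return the required column titles absent from the header row.

    `csv_lines` is an open file or any iterable of CSV text lines.
    """
    reader = csv.reader(csv_lines)
    header = [h.strip() for h in next(reader, [])]
    return [c for c in required if c not in header]


if __name__ == "__main__":
    for name, cols in REQUIRED_COLUMNS.items():
        try:
            # utf-8-sig drops a byte-order mark that spreadsheet exports may add.
            with open(name, newline="", encoding="utf-8-sig") as f:
                gaps = missing_columns(f, cols)
        except FileNotFoundError:
            print(f"{name}: not found")
            continue
        print(f"{name}: " + ("OK" if not gaps else f"missing {gaps}"))
```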
Code/software
RStudio, Google Earth, and Python. R code is provided as an .Rmd file. Shapefiles can be imported into Google Earth, ESRI products, or QGIS. Python code is included as a .py file and can be opened in any Python-supporting software; Anaconda was used for this project.
Access information
Data was derived from the following sources and can be downloaded via the included links.
Particle Flow Model
The San Francisco Bay Region's groundwater basins are composed of aquifer materials ranging from unconsolidated fill to fractured metamorphic rock complexes, and they contain surficial geologic features (e.g., paleochannels, alluvial fans) that influence how groundwater moves along hydraulic gradients. To model the hydrologic fate of dissolved VOCs in groundwater originating within site parcel boundaries, we used MODPATH 7 (Pollock, 2016). Particle tracking was based on high-resolution (10 m × 10 m), one-layer, steady-state groundwater flow models developed previously to quantify unconfined groundwater responses to sea-level rise (Befus et al., 2020). These models used a homogeneous and isotropic hydraulic conductivity of 1 m/day and a constant head along the Bay set to the mean higher high water (MHHW) tidal datum. No groundwater pumping or other remediation activities (e.g., enhanced drainage or impermeable barriers) were included in these models. For particle tracking, each site parcel identified within the San Francisco Bay Area was seeded with one particle per model grid cell, entering the flow model via recharge at the top of the model. Each particle was tracked until it reached a strong sink or a discharge location and left the model, with the San Francisco Bay set as a secondary stop condition. MODPATH 7 calculates sub-grid-scale particle trajectories, so a computed trajectory can include multiple vertices within a single groundwater cell (Pollock, 2016).
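The "one particle per model grid cell" seeding rule can be illustrated geometrically in Python. This is a hedged sketch, not MODPATH 7's own particle-placement input (MODPATH defines starting locations through its own files): it assumes a uniform 10 m grid with a known lower-left origin, places each particle at a cell center, and uses a ray-casting point-in-polygon test to decide whether a cell belongs to the parcel. Function names are hypothetical.

```python
import math


def point_in_polygon(x, y, poly):
    """Ray-casting test; poly is a list of (x, y) vertices, implicitly closed."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Toggle on each edge crossed by a ray cast from (x, y) toward +x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def seed_particles(parcel, x0=0.0, y0=0.0, cell=10.0):
    """One starting point per grid cell whose center falls inside the parcel.

    (x0, y0) is the assumed lower-left grid origin; cell is the cell size in m.
    """
    xs = [x for x, _ in parcel]
    ys = [y for _, y in parcel]
    # Column/row index range covering the parcel's bounding box.
    c_min = math.floor((min(xs) - x0) / cell)
    c_max = math.ceil((max(xs) - x0) / cell)
    r_min = math.floor((min(ys) - y0) / cell)
    r_max = math.ceil((max(ys) - y0) / cell)
    seeds = []
    for r in range(r_min, r_max):
        for c in range(c_min, c_max):
            cx = x0 + (c + 0.5) * cell
            cy = y0 + (r + 0.5) * cell
            if point_in_polygon(cx, cy, parcel):
                seeds.append((cx, cy))
    return seeds


if __name__ == "__main__":
    # A 30 m x 20 m rectangular parcel covers six 10 m cells.
    print(seed_particles([(0, 0), (30, 0), (30, 20), (0, 20)]))
```

Testing cell centers is one possible rule for deciding whether a partially covered cell belongs to a parcel; an area-fraction threshold would be another.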
Sewer Connectivity Model
Using the particle flow models developed in MODPATH for the 21 sites, we created a spatial model that identifies the potential transport of VOCs through sewer systems and into buildings. This spatial model was built in GIS software (ESRI ArcGIS Pro 3.2.2). Because sewer line spatial data is not publicly available, we used OpenStreetMap roads as proxies. Potential VOC pathways through the sewer lines were drawn outward from each point where a road intersected a modeled flow path originating in a site parcel.
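Locating the points where a modeled flowline crosses a road, the starting points for the potential sewer pathways, reduces to a two-dimensional segment-intersection test applied to each pair of flowline and road segments. The sketch below is illustrative only (in the study this step was done in ArcGIS Pro); it assumes both lines share a projected coordinate system in meters, uses hypothetical function names, and does not report collinear overlaps.

```python
def segment_intersection(p1, p2, q1, q2, eps=1e-12):
    """Return the intersection point of segments p1-p2 and q1-q2, or None.

    Solves p1 + t*(p2 - p1) = q1 + u*(q2 - q1) with 0 <= t, u <= 1.
    Parallel (including collinear) segments return None.
    """
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    sx, sy = q2[0] - q1[0], q2[1] - q1[1]
    denom = rx * sy - ry * sx  # 2D cross product r x s
    if abs(denom) < eps:
        return None
    qpx, qpy = q1[0] - p1[0], q1[1] - p1[1]
    t = (qpx * sy - qpy * sx) / denom  # (q - p) x s / (r x s)
    u = (qpx * ry - qpy * rx) / denom  # (q - p) x r / (r x s)
    if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0:
        return (p1[0] + t * rx, p1[1] + t * ry)
    return None


def flowline_road_crossings(flowline, road):
    """All crossings between two polylines given as lists of (x, y) vertices."""
    hits = []
    for a, b in zip(flowline, flowline[1:]):
        for c, d in zip(road, road[1:]):
            pt = segment_intersection(a, b, c, d)
            if pt is not None:
                hits.append(pt)
    return hits


if __name__ == "__main__":
    print(flowline_road_crossings([(0, 0), (10, 10), (20, 0)], [(0, 5), (20, 5)]))
```

The double loop is quadratic in the number of segments; a spatial index would be the usual choice for region-scale data.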
Beckley and McHugh (2020) estimated how far contaminants may travel through sewer systems based on elevation gradients, giving a maximum uphill distance of 228.6 meters (750 feet) and a maximum downhill distance of 685.8 meters (2,250 feet) as the farthest distances that VOCs might travel through a sewer line. Elevation (Z) values were sampled at 76.2-meter (250-foot) intervals, from 0 up to 685.8 meters from each intersection of a flowline with a street, using USGS 10-meter resolution DEM topographic layers. If the change in Z from 0 to 228.6 meters (Z at 228.6 m minus Z at 0 m) was negative, indicating a downhill slope, the model extended the potential VOC travel distance to 685.8 meters. If this change was positive, indicating an uphill slope, the model limited the potential VOC travel distance to 228.6 meters.
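The slope rule can be written as a small Python function for one direction of travel along a road away from an intersection. This is a minimal sketch of the rule as described, not the ArcGIS procedure in NetworkAnalystSlopeMethod.txt; the station indexing and the handling of a zero elevation change (which the method does not specify) are assumptions, and the names are hypothetical.

```python
FT_TO_M = 0.3048
STATION_SPACING_FT = 250   # Z sampled every 250 ft (76.2 m)
MAX_UPHILL_FT = 750        # 228.6 m (Beckley & McHugh, 2020)
MAX_DOWNHILL_FT = 2250     # 685.8 m (Beckley & McHugh, 2020)


def station_distances_m():
    """Distances (m) from the flowline/street intersection where Z is sampled."""
    n_intervals = MAX_DOWNHILL_FT // STATION_SPACING_FT  # 9 intervals -> 10 stations
    return [i * STATION_SPACING_FT * FT_TO_M for i in range(n_intervals + 1)]


def travel_distance_m(z_values):
    """Potential VOC travel distance (m) along one road direction.

    z_values[i] is the elevation (m) at station i; z_values[0] is the intersection.
    """
    i_check = MAX_UPHILL_FT // STATION_SPACING_FT  # station 3 = 228.6 m
    if len(z_values) <= i_check:
        raise ValueError("elevations are needed out to at least 228.6 m")
    dz = z_values[i_check] - z_values[0]
    if dz < 0:
        # Road descends away from the intersection: downhill limit applies.
        return MAX_DOWNHILL_FT * FT_TO_M
    # dz > 0 is uphill. The method does not say how dz == 0 is handled;
    # this sketch assumes a flat segment gets the shorter, uphill limit.
    return MAX_UPHILL_FT * FT_TO_M


if __name__ == "__main__":
    print(travel_distance_m([12.0, 11.5, 11.0, 10.2]))  # downhill case
    print(travel_distance_m([12.0, 12.5, 13.0, 14.1]))  # uphill case
```

Keeping the thresholds in feet and converting once avoids comparing floating-point distances when selecting the 228.6 m station.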
Identifying Potentially Exposed Buildings and Their Uses
To identify buildings potentially impacted by vapor intrusion from VOC-containing sewer systems, we used 2020 parcel data to map buildings in proximity to sewer lines. Because sewer laterals typically connect buildings to main sewer lines, we created a 30-meter buffer around the potentially exposed sewer lines to account for the typical distance between sewers and buildings.
Using the 30-meter buffer, we identified the structures located within the zone of potential VOC exposure from the sewer line. The buildings were classified according to their primary use as defined in 2020 parcel data acquired from LandVision. The building uses of primary interest were residential buildings, schools, and daycares. In addition, the total number of potentially impacted structures was determined for each of the 17 individual and 4 combined sites. Residential buildings, schools, and daycares were of particular interest as they may house populations especially vulnerable to the health effects of VOC inhalation (Kuang et al., 2021; Madaniyazi et al., 2022).
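The buffer-and-classify step can be approximated without GIS software by treating each building as a point and keeping those within 30 m of any potentially exposed sewer (road) segment, which is equivalent to testing whether the point falls inside a 30 m buffer around the line. This is a minimal Python sketch under those assumptions; the record fields (`xy`, `use`) are hypothetical, and in the study the use categories came from LandVision parcel data.

```python
import math
from collections import Counter

BUFFER_M = 30.0


def point_segment_distance(p, a, b):
    """Shortest distance from point p to segment a-b (projected coordinates)."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(p[0] - ax, p[1] - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((p[0] - ax) * dx + (p[1] - ay) * dy) / seg_len2))
    return math.hypot(p[0] - (ax + t * dx), p[1] - (ay + t * dy))


def buildings_in_buffer(buildings, sewer_lines, buffer_m=BUFFER_M):
    """Return buildings whose point lies within buffer_m of any sewer polyline."""
    selected = []
    for bldg in buildings:
        for line in sewer_lines:
            if any(point_segment_distance(bldg["xy"], a, b) <= buffer_m
                   for a, b in zip(line, line[1:])):
                selected.append(bldg)
                break  # count each building once
    return selected


def count_by_use(buildings):
    """Tally selected buildings by primary-use category."""
    return Counter(b["use"] for b in buildings)


if __name__ == "__main__":
    sewers = [[(0, 0), (100, 0)]]
    bldgs = [{"xy": (50, 20), "use": "Residential"},
             {"xy": (50, 40), "use": "Residential"},
             {"xy": (120, 10), "use": "School"}]
    print(count_by_use(buildings_in_buffer(bldgs, sewers)))
```

Using building points is a simplification; intersecting the buffer with building footprints would also capture structures whose edges, but not centroids, fall within 30 m.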
Given the ongoing redevelopment of residential areas in the San Francisco Bay Area, we included priority development areas and previously permitted developments as defined by the Association of Bay Area Governments (ABAG). Lastly, we included the CalEnviroScreen percentile score for each site.
Data Management
Data was managed using three primary software platforms: RStudio, Python, and ArcGIS Pro. RStudio (R 4.3.1) was used to quantify the number of potentially exposed buildings. Python was used to automate the process of modeling contaminant movement through groundwater. ArcGIS Pro was used to model sewer intersections, identify buildings occupied by susceptible populations, and identify the social vulnerability of parcels intersecting those locations.
