Dataset: Surface waters in socially vulnerable areas are disproportionately under-monitored for nutrients in the U.S. South Atlantic-Gulf Region
Data files
Mar 27, 2025 version files: 61.96 MB
- AL_counts.zip (1.59 MB)
- FL_counts.zip (3.80 MB)
- GA_counts.zip (2.83 MB)
- merged.zip (12.91 MB)
- monitored.csv (1.99 MB)
- MS_counts.zip (716.21 KB)
- NC_counts.zip (2.66 MB)
- README.md (18.99 KB)
- SAGR_stations_18_22.csv (2.17 MB)
- SC_counts.zip (1.56 MB)
- state_boundaries.zip (437.69 KB)
- svi_SAG_full.zip (19.33 MB)
- unmonitored.csv (9.91 MB)
- VA_counts.zip (491.56 KB)
- WBDHU2.zip (1.53 MB)
Abstract
In this study, we investigated: Are water quality monitoring stations proportionally distributed across communities of varying social vulnerability? We specifically focused on nutrient monitoring of surface waters in the South Atlantic-Gulf region, a water-rich area with diverse land uses and communities spanning the social vulnerability spectrum. We used 2018-2022 data from the U.S. Geological Survey (USGS) National Water Information System and the U.S. Environmental Protection Agency Storage and Retrieval database to compare station locations to census tract-scale metrics from the U.S. Centers for Disease Control and Prevention Social Vulnerability Index (SVI) and hydrography from the USGS. Statistical analyses revealed a significant disparity in the placement of active monitoring stations, with more monitoring stations in lower vulnerability areas and fewer in highly vulnerable areas. Stations were also clustered in areas of similar SVI values; areas were less likely to be monitored if they were near areas of differing SVI.
https://doi.org/10.5061/dryad.7d7wm3858
Files and variables:
The zipped files contain all of the individual files that are required to open, project, and manipulate shapefiles. All of the shape (.shp) files used in this study contain the geometry and attributes of geospatial features (e.g., points, lines, polylines, polygons). Each zipped bundle contains the main .shp file and its companion files: .cpg, .dbf, .prj, .qmd, and .shx.
The main .shp file can be opened and analyzed in Python, R, and many other programming languages, as well as in open-source geospatial software such as QGIS, SAGA GIS, GRASS GIS, and GeoDa. These .shp files were the basis for much of this study’s analysis.
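Before loading a bundle, it can help to confirm that the archive holds the companion files listed above. The following is a minimal sketch using only the Python standard library; `missing_companions` is a hypothetical helper that is not part of the study's notebook, and it assumes each archive holds a single shapefile.

```python
import zipfile
from pathlib import PurePosixPath

# Extensions this README lists for each shapefile bundle.
EXPECTED_EXTENSIONS = {".shp", ".cpg", ".dbf", ".prj", ".qmd", ".shx"}


def missing_companions(zip_source):
    """Return the expected extensions that are absent from a zipped bundle.

    `zip_source` may be a file path or a binary file-like object.
    Hypothetical helper written for illustration only.
    """
    with zipfile.ZipFile(zip_source) as archive:
        found = {PurePosixPath(name).suffix.lower() for name in archive.namelist()}
    return sorted(EXPECTED_EXTENSIONS - found)
```

Calling `missing_companions("FL_counts.zip")` on a downloaded archive returns an empty list when every listed companion file is present.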
merged.zip: contains data from both the Centers for Disease Control and Prevention Social Vulnerability Index (SVI) and a series of geospatial processes conducted in both Python and QGIS. This analysis uses the nationwide, census tract-scale 2022 version of the SVI; see https://svi.cdc.gov/map25/data/docs/SVI2022Documentation_ZCTA.pdf for data documentation and information on all column headers. The first set of columns (OBJECTID to FID) are all from the CDC SVI dataset. The remaining columns were created in the analysis (see oates_et_al_nature_water.ipynb).
svi_SAG_full.zip is the SVI data for all 9 states (Alabama, Florida, Georgia, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee, and Virginia) in the South Atlantic-Gulf region.
[state abbreviation]_counts.zip (e.g., FL_counts.zip, NC_counts.zip) are shapefiles of study area census tracts for each state in the South Atlantic-Gulf region (Louisiana and Tennessee were excluded because both states have so few monitoring stations inside the South Atlantic-Gulf region). The shapefiles contain a column named “sts” that represents the number of active monitoring stations in each of that state’s census tracts.
The aforementioned files all share the following column and variable information.
The following column headings are all original to the 2022 CDC/ATSDR Social Vulnerability Index. Detailed explanation of what each column denotes can be found at: https://svi.cdc.gov/map25/data/docs/SVI2022Documentation_ZCTA.pdf
- OBJECTID: Object ID (not in documentation, database-specific identifier)
- ST: State-level FIPS code
- STATE: State name
- ST_ABBR: State abbreviation
- STCNTY: County-level FIPS code
- COUNTY: County name
- FIPS: Tract-level geographic identification code
- LOCATION: Text description of tract, county, state
- AREA_SQMI: Tract area in square miles
- E_TOTPOP: Total population estimate
- M_TOTPOP: Margin of error for total population estimate
- E_HU: Estimate of total housing units
- M_HU: Margin of error for total housing units
- E_HH: Estimate of total households
- M_HH: Margin of error for total households
- E_POV150: Estimate of persons below 150% of the poverty line
- M_POV150: Margin of error for persons below 150% of the poverty line
- EP_POV150: Percentage of persons below 150% of the poverty line
- MP_POV150: Margin of error for percentage of persons below 150% of the poverty line
- E_UNEMP: Estimate of unemployed civilians age 16+
- M_UNEMP: Margin of error for unemployed civilians age 16+
- EP_UNEMP: Percentage of unemployed civilians age 16+
- MP_UNEMP: Margin of error for percentage of unemployed civilians age 16+
- E_HBURD: Estimate of cost-burdened housing units (income < $75k spending 30%+ on housing)
- M_HBURD: Margin of error for cost-burdened housing units (income < $75k)
- EP_HBURD: Percentage of housing cost-burdened households
- MP_HBURD: Margin of error for housing cost-burdened households
- E_NOHSDP: Estimate of persons age 25+ with no high school diploma
- M_NOHSDP: Margin of error for persons with no high school diploma
- EP_NOHSDP: Percentage of persons 25+ with no high school diploma
- MP_NOHSDP: Margin of error for percentage of persons with no high school diploma
- E_UNINSUR: Estimate of uninsured persons in civilian noninstitutionalized population
- M_UNINSUR: Margin of error for uninsured estimate
- EP_UNINSUR: Percentage of uninsured persons
- MP_UNINSUR: Margin of error for percentage of uninsured persons
- E_AGE65: Estimate of persons age 65 and older
- M_AGE65: Margin of error for persons age 65 and older
- EP_AGE65: Percentage of persons aged 65 and older
- MP_AGE65: Margin of error for persons aged 65 and older
- E_AGE17: Estimate of persons age 17 and younger
- M_AGE17: Margin of error for persons age 17 and younger
- EP_AGE17: Percentage of persons aged 17 and younger
- MP_AGE17: Margin of error for persons aged 17 and younger
- E_DISABL: Estimate of persons with a disability in civilian noninstitutionalized population
- M_DISABL: Margin of error for persons with a disability
- EP_DISABL: Percentage of persons with a disability
- MP_DISABL: Margin of error for percentage of persons with a disability
- E_SNGPNT: Estimate of single-parent households with children under 18
- M_SNGPNT: Margin of error for single-parent households
- EP_SNGPNT: Percentage of single-parent households with children
- MP_SNGPNT: Margin of error for single-parent households
- E_LIMENG: Estimate of persons age 5+ who speak English ‘less than well’
- M_LIMENG: Margin of error for limited English estimate
- EP_LIMENG: Percentage of persons (5+) who speak English ‘less than well’
- MP_LIMENG: Margin of error for limited English proficiency
- E_MINRTY: Estimate of racial/ethnic minority persons (all persons except White, non-Hispanic)
- M_MINRTY: Margin of error for racial/ethnic minority estimate
- EP_MINRTY: Percentage of minority population
- MP_MINRTY: Margin of error for minority population
- E_MUNIT: Estimate of housing units in structures with 10+ units
- M_MUNIT: Margin of error for multi-unit housing estimate
- EP_MUNIT: Percentage of housing in multi-unit structures
- MP_MUNIT: Margin of error for multi-unit housing
- E_MOBILE: Estimate of mobile homes
- M_MOBILE: Margin of error for mobile homes
- EP_MOBILE: Percentage of mobile homes
- MP_MOBILE: Margin of error for mobile homes
- E_CROWD: Estimate of crowded households (more people than rooms)
- M_CROWD: Margin of error for crowded households
- EP_CROWD: Percentage of crowded households
- MP_CROWD: Margin of error for crowded households
- E_NOVEH: Estimate of households with no vehicle available
- M_NOVEH: Margin of error for households with no vehicle
- EP_NOVEH: Percentage of households with no vehicle
- MP_NOVEH: Margin of error for households with no vehicle
- E_GROUPQ: Estimate of persons in group quarters
- M_GROUPQ: Margin of error for persons in group quarters
- EP_GROUPQ: Percentage of persons in group quarters
- MP_GROUPQ: Margin of error for persons in group quarters
- EPL_POV150: Percentile rank: percent below 150% poverty
- EPL_UNEMP: Percentile rank: unemployment rate
- EPL_HBURD: Percentile rank: housing cost burden
- EPL_NOHSDP: Percentile rank: no high school diploma
- EPL_UNINSU: Percentile rank: uninsured population
- SPL_THEME1: Summed percentile of socioeconomic indicators
- RPL_THEME1: Rank of socioeconomic vulnerability theme
- EPL_AGE65: Percentile rank: age 65+ population
- EPL_AGE17: Percentile rank: age 17 and younger population
- EPL_DISABL: Percentile rank: population with disability
- EPL_SNGPNT: Percentile rank: single-parent households
- EPL_LIMENG: Percentile rank: limited English proficiency
- SPL_THEME2: Summed percentile of household characteristics
- RPL_THEME2: Rank of household characteristics theme
- EPL_MINRTY: Percentile rank: minority population
- SPL_THEME3: Summed percentile of minority status indicators
- RPL_THEME3: Rank of racial and ethnic minority status theme
- EPL_MUNIT: Percentile rank: housing in multi-unit structures
- EPL_MOBILE: Percentile rank: mobile homes
- EPL_CROWD: Percentile rank: crowded households
- EPL_NOVEH: Percentile rank: households with no vehicle
- EPL_GROUPQ: Percentile rank: persons in group quarters
- SPL_THEME4: Summed percentile of housing type and transport
- RPL_THEME4: Rank of housing and transportation theme
- SPL_THEMES: Summed percentile of all themes
- RPL_THEMES: Overall vulnerability rank
- F_POV150: Flag for high vulnerability (90th percentile) in persons below 150% poverty
- F_UNEMP: Flag for high vulnerability in unemployment rate
- F_HBURD: Flag for high vulnerability in housing cost burden
- F_NOHSDP: Flag for high vulnerability in education (no high school diploma)
- F_UNINSUR: Flag for high vulnerability in uninsured population
- F_THEME1: Sum of flags for socioeconomic status theme
- F_AGE65: Flag for high vulnerability in population aged 65 and older
- F_AGE17: Flag for high vulnerability in population aged 17 and younger
- F_DISABL: Flag for high vulnerability in population with disability
- F_SNGPNT: Flag for high vulnerability in single-parent households
- F_LIMENG: Flag for high vulnerability in limited English proficiency
- F_THEME2: Sum of flags for household characteristics theme
- F_MINRTY: Flag for high vulnerability in minority population
- F_THEME3: Sum of flags for racial and ethnic minority status theme
- F_MUNIT: Flag for high vulnerability in multi-unit housing
- F_MOBILE: Flag for high vulnerability in mobile homes
- F_CROWD: Flag for high vulnerability in crowded households
- F_NOVEH: Flag for high vulnerability in households with no vehicle
- F_GROUPQ: Flag for high vulnerability in group quarters population
- F_THEME4: Sum of flags for housing type and transportation theme
- F_TOTAL: Total number of flags across all themes
- E_DAYPOP: Estimated daytime population from LandScan 2021
- E_NOINT: Estimate of households without internet subscription
- M_NOINT: Margin of error for households without internet subscription
- E_AFAM: Estimate of Black/African American, not Hispanic or Latino population
- M_AFAM: Margin of error for Black/African American estimate
- E_HISP: Estimate of Hispanic or Latino population
- M_HISP: Margin of error for Hispanic or Latino estimate
- E_ASIAN: Estimate of Asian, not Hispanic or Latino population
- M_ASIAN: Margin of error for Asian population
- E_AIAN: Estimate of American Indian/Alaska Native, not Hispanic or Latino
- M_AIAN: Margin of error for American Indian/Alaska Native estimate
- E_NHPI: Estimate of Native Hawaiian/Other Pacific Islander, not Hispanic or Latino
- M_NHPI: Margin of error for NHPI estimate
- E_TWOMORE: Estimate of two or more races, not Hispanic or Latino
- M_TWOMORE: Margin of error for two or more races estimate
- E_OTHERRAC: Estimate of some other race, not Hispanic or Latino
- M_OTHERRAC: Margin of error for some other race estimate
- EP_NOINT: Percentage of households without internet subscription
- MP_NOINT: Margin of error for percentage without internet subscription
- EP_AFAM: Percentage of Black/African American, not Hispanic or Latino
- MP_AFAM: Margin of error for percentage of Black/African American
- EP_HISP: Percentage of Hispanic or Latino population
- MP_HISP: Margin of error for percentage of Hispanic or Latino
- EP_ASIAN: Percentage of Asian, not Hispanic or Latino
- MP_ASIAN: Margin of error for percentage of Asian
- EP_AIAN: Percentage of American Indian/Alaska Native, not Hispanic or Latino
- MP_AIAN: Margin of error for percentage of American Indian/Alaska Native
- EP_NHPI: Percentage of Native Hawaiian/Other Pacific Islander, not Hispanic or Latino
- MP_NHPI: Margin of error for percentage of NHPI
- EP_TWOMORE: Percentage of two or more races, not Hispanic or Latino
- MP_TWOMORE: Margin of error for percentage of two or more races
- EP_OTHERRA: Percentage of some other race, not Hispanic or Latino
- MP_OTHERRA: Margin of error for percentage of some other race
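The EPL_, SPL_, and F_ columns above are related arithmetically. Per the CDC documentation, each SPL_ value sums the EPL_ percentile ranks within its theme, each F_ flag equals 1 when the matching EPL_ value is at or above 0.90, and each F_THEME value sums the flags in that theme. The sketch below illustrates this for theme 1; the function name and the percentile ranks are made up for illustration.

```python
# Theme 1 percentile-rank columns, spelled as they appear in this dataset.
THEME1_EPL = ["EPL_POV150", "EPL_UNEMP", "EPL_HBURD", "EPL_NOHSDP", "EPL_UNINSU"]
FLAG_THRESHOLD = 0.90  # 90th percentile, as in the F_ column descriptions


def theme1_summary(tract):
    """Return (summed percentile, flag count) for one tract's theme 1 ranks."""
    ranks = [tract[name] for name in THEME1_EPL]
    spl = round(sum(ranks), 4)  # rounding hides floating-point noise
    flags = sum(1 for rank in ranks if rank >= FLAG_THRESHOLD)
    return spl, flags


# Made-up percentile ranks for a single illustrative tract.
example_tract = {
    "EPL_POV150": 0.95,
    "EPL_UNEMP": 0.40,
    "EPL_HBURD": 0.91,
    "EPL_NOHSDP": 0.72,
    "EPL_UNINSU": 0.55,
}
print(theme1_summary(example_tract))  # (3.53, 2)
```

The two values correspond to SPL_THEME1 and F_THEME1 for that tract.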
The following columns were created at varying stages of our analysis:
- Shape_Leng: length of waterways and water body perimeters in degrees
- Shape_Area: area of census tracts in square degrees
- FID: unique tract identifier
- SVI_characterization: SVI decile characterization
- flowline_length_km: length of waterways and water body perimeters in kilometers
- area_sqkm: area of census tracts in square kilometers
- pop_den_sqkm: population density of census tracts in persons per square kilometer
- FLLR_tract: flowline length ratio of tract in kilometers per square kilometer
- sts: the number of water quality monitoring stations in a tract
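The units given for the derived columns suggest how they relate to one another. The sketch below assumes that pop_den_sqkm is E_TOTPOP divided by area_sqkm and that FLLR_tract is the flowline length in kilometers divided by area_sqkm. These formulas are inferred from the stated units rather than taken from the notebook, and the input values are invented.

```python
def derived_tract_metrics(total_pop, flowline_length_km, area_sqkm):
    """Compute density-style metrics for one tract (assumed formulas)."""
    if area_sqkm <= 0:
        raise ValueError("area_sqkm must be positive")
    return {
        "pop_den_sqkm": total_pop / area_sqkm,         # persons per km^2
        "FLLR_tract": flowline_length_km / area_sqkm,  # km per km^2
    }


# Invented values: 4,000 residents, 12.5 km of flowlines, 50 km^2 of area.
metrics = derived_tract_metrics(4000, 12.5, 50.0)
print(metrics)  # {'pop_den_sqkm': 80.0, 'FLLR_tract': 0.25}
```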
monitored.csv and unmonitored.csv are subsets of the merged.zip dataset and list all monitored and unmonitored study area census tracts, respectively. Monitored tracts have at least one active nutrient monitoring station, and unmonitored tracts have 0 stations. Since they are both subsets of merged.zip, they share all of the column and variable information of merged.zip.
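The monitored/unmonitored split described above can be sketched from the sts column alone. This is a minimal standard-library illustration, not the notebook's code: `split_by_monitoring` is a hypothetical helper, and the three-row sample uses placeholder tract identifiers and invented counts.

```python
import csv
import io


def split_by_monitoring(rows):
    """Split tract records into (monitored, unmonitored) using the sts count."""
    monitored, unmonitored = [], []
    for row in rows:
        # sts may be stored as text such as "2" or "2.0" in a CSV export.
        target = monitored if float(row["sts"]) >= 1 else unmonitored
        target.append(row)
    return monitored, unmonitored


# Placeholder identifiers and invented station counts for three tracts.
sample = io.StringIO("FIPS,sts\n00000000001,0\n00000000002,3\n00000000003,1\n")
monitored, unmonitored = split_by_monitoring(csv.DictReader(sample))
print(len(monitored), len(unmonitored))  # 2 1
```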
SAGR_stations_18_22.csv contains all of the water quality monitoring stations in the contiguous United States and Washington D.C. that recorded at least twenty concentration observations from at least twenty sampling activities between January 1st, 2018, and December 31st, 2022; this approximates seasonal data collection (i.e., four observations per year). The column headings are original to the EPA Water Quality Portal (https://www.waterqualitydata.us/) download. See https://www.waterqualitydata.us/portal_userguide/ for additional metadata and column descriptions.
The column and variable definitions and descriptions are as follows:
- OrganizationIdentifier: A code used to uniquely identify a specific organization or business.
- OrganizationFormalName: The official legal name of the organization.
- MonitoringLocationIdentifier: A unique code or name used to identify a sampling location.
- MonitoringLocationName: The name given by the organization for the place where they collect data.
- MonitoringLocationTypeName: A description of the kind of place being monitored (like a stream, well, etc.).
- MonitoringLocationDescriptionText: A written description of the sampling location.
- HUCEightDigitCode: An 8-digit code that identifies the watershed or hydrologic unit the site is in.
- DrainageAreaMeasure.MeasureValue: The size of the land area that drains to the location, in specific units.
- DrainageAreaMeasure.MeasureUnitCode: The unit used to measure the drainage area (like square kilometers or miles).
- ContributingDrainageAreaMeasure.MeasureValue: The part of the drainage area that actually contributes flow to the location.
- ContributingDrainageAreaMeasure.MeasureUnitCode: The unit used for that contributing drainage area.
- LatitudeMeasure: How far north or south the site is from the equator.
- LongitudeMeasure: How far east or west the site is from the prime meridian.
- SourceMapScaleNumeric: The scale of the map used to determine the coordinates (like 1:24,000).
- HorizontalAccuracyMeasure.MeasureValue: How accurate the horizontal (latitude/longitude) location is, in a specific unit.
- HorizontalAccuracyMeasure.MeasureUnitCode: The unit used to measure the horizontal accuracy.
- HorizontalCollectionMethodName: The method used to collect latitude and longitude (like GPS).
- HorizontalCoordinateReferenceSystemDatumName: The coordinate system used to define the location (like NAD83 or WGS84).
- VerticalMeasure.MeasureValue: How high or low the site is above sea level, in specific units.
- VerticalMeasure.MeasureUnitCode: The unit used to measure the vertical height (like meters or feet).
- VerticalAccuracyMeasure.MeasureValue: How accurate the vertical elevation is.
- VerticalAccuracyMeasure.MeasureUnitCode: The unit for measuring that vertical accuracy.
- VerticalCollectionMethodName: The method used to collect the elevation.
- VerticalCoordinateReferenceSystemDatumName: The reference system used for elevation (like NAVD88).
- CountryCode: A code representing the country (like “US”).
- StateCode: A code representing the U.S. state or territory.
- CountyCode: A code representing the county.
- AquiferName: The name of the underground layer of water (if it’s a well).
- ***LocalAqfrName: The local name of aquifers
- FormationTypeText: The name of the main type of rock or soil where the well is completed.
- AquiferTypeName: What kind of aquifer it is — like confined or unconfined.
- ConstructionDateText: When the well was built (could just be the year).
- WellDepthMeasure.MeasureValue: Total depth of the well from the surface.
- WellDepthMeasure.MeasureUnitCode: The unit used to measure that well depth.
- WellHoleDepthMeasure.MeasureValue: Depth of the drilled hole at the time of well completion.
- WellHoleDepthMeasure.MeasureUnitCode: The unit used to measure that hole depth.
- ProviderName: The name of the database that gave the data (like WQX or NWIS).
***LocalAqfrName was not listed on https://www.waterqualitydata.us/portal_userguide/, but it likely refers to the local/colloquial names for aquifers.
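Because the station table includes HUCEightDigitCode, one way to limit stations to the South Atlantic-Gulf region is to match the first two digits of that code against hydrologic region 03. The sketch below is an illustration only: the study may instead have clipped stations spatially with the watershed boundary, `stations_in_region` is a hypothetical helper, and the three sample stations are invented.

```python
import csv
import io

SAG_HUC2 = "03"  # two-digit hydrologic region code for the South Atlantic-Gulf


def stations_in_region(rows, huc2=SAG_HUC2):
    """Return (identifier, latitude, longitude) for stations in a HUC2 region."""
    selected = []
    for row in rows:
        # Pad to eight digits in case a leading zero was dropped on export.
        huc8 = row["HUCEightDigitCode"].strip().zfill(8)
        if huc8.startswith(huc2):
            selected.append((
                row["MonitoringLocationIdentifier"],
                float(row["LatitudeMeasure"]),
                float(row["LongitudeMeasure"]),
            ))
    return selected


# Invented stations; the second one has lost its leading zero.
sample = io.StringIO(
    "MonitoringLocationIdentifier,HUCEightDigitCode,LatitudeMeasure,LongitudeMeasure\n"
    "DEMO-001,03050201,34.0,-81.0\n"
    "DEMO-002,3160205,30.5,-88.0\n"
    "DEMO-003,10180001,41.0,-106.0\n"
)
in_region = stations_in_region(csv.DictReader(sample))
print([station[0] for station in in_region])  # ['DEMO-001', 'DEMO-002']
```

Reading HUC codes as text rather than numbers avoids losing the leading zero in the first place.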
state_boundaries.zip contains the state boundaries for Alabama, Florida, Georgia, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee, and Virginia. The data are from the U.S. Census Bureau (https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html).
WBDHU2.zip is the USGS-defined South Atlantic-Gulf region’s watershed boundary (https://apps.nationalmap.gov/downloader/).
Code/software:
All of the .zip and .csv files must be in the same file directory in order to fully run oates_et_al_nature_water.ipynb (Python Version 3.10.12 and QGIS version 3.32.2 Lima) and/or oates_et_al_nature_water.R (R Version 4.3.1).
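Since the scripts expect every data file in one directory, a quick pre-flight check can catch a missing download. The sketch below is a hypothetical helper, not part of the published scripts; the file names are those in this README's file list.

```python
from pathlib import Path

# Data files named in this README's file list.
REQUIRED_FILES = [
    "AL_counts.zip", "FL_counts.zip", "GA_counts.zip", "MS_counts.zip",
    "NC_counts.zip", "SC_counts.zip", "VA_counts.zip", "merged.zip",
    "svi_SAG_full.zip", "state_boundaries.zip", "WBDHU2.zip",
    "monitored.csv", "unmonitored.csv", "SAGR_stations_18_22.csv",
]


def missing_data_files(directory):
    """List required data files that are not present in `directory`."""
    folder = Path(directory)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

Running `missing_data_files(".")` from the working directory returns an empty list when all of the data files are in place.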
In the Python script (.ipynb), the packages used are geopandas, pandas, numpy, matplotlib.pyplot, matplotlib.patches, matplotlib.cm, matplotlib.colors, matplotlib.lines, pysal, esda, folium, glob, sys, matplotlib, rasterio, libpysal, splot, adjustText, contextily, json, rtree, math, scipy.stats, seaborn, seaborn.objects, statsmodels.api, plotly.graph_objs, plotly.express, pointpats.quadrat_statistics, requests, and io.
In the R script (.R), the packages used are tidyverse, janitor, forcats, lubridate, dplyr, caret, readr, sf, ggplot2, gridExtra, gghalves, ggdist, patchwork, extrafont, and showtext.
Access and sharing information:
All geospatial data used in our analyses are freely available online from U.S. government agencies (U.S. Geological Survey, U.S. Centers for Disease Control & Prevention, and U.S. Environmental Protection Agency).