Agriculture-urban interfaces, social vulnerability, and climate change shape West Nile virus risk across the United States
Data files
Version: Dec 03, 2025 (total size 316.30 MB)
- 1_DataCleaning.Rmd (26.06 KB)
- 2_CMIP6Cleaning.Rmd (9.28 KB)
- 3_R0(T)Calculations.Rmd (42.21 KB)
- 4_GAMAnalysis.Rmd (3.58 KB)
- 5_EnvironmentalRiskMaps_v2.Rmd (14.75 KB)
- 6_Figures.Rmd (41.84 KB)
- acs_over65_20251201.csv (41.31 KB)
- arbonet_wnvnd_20251201.csv (55.85 KB)
- bls_agemployed_20251201.csv (70.37 KB)
- cdc_svi_20251201.csv (70.94 KB)
- cmip6_rR0_allspecies.csv (11.50 MB)
- cmip6_ssp245_historic_county_long.csv (15.56 MB)
- cmip6_ssp245_late_county_long.csv (63.28 MB)
- cmip6_ssp245_mid_county_long.csv (71.89 MB)
- cmip6_ssp585_historic_county_long.csv (15.56 MB)
- cmip6_ssp585_late_county_long.csv (63.20 MB)
- cmip6_ssp585_mid_county_long.csv (71.85 MB)
- gam_covariates_full.csv (1.16 MB)
- gam_covariates_noR0_20251201.csv (1.18 MB)
- Manuscript_DriversofWNV_DataAvailability.pdf (149.15 KB)
- nlcd_urbanag_edgesummary_20251201.csv (232.29 KB)
- nlcd_urbanag_pct_20251201.csv (149.31 KB)
- README.md (23.46 KB)
- rR0_peakmonth_county_ssp245_hist.csv (162.02 KB)
- usgs_birds_20251201.csv (35.48 KB)
Abstract
Climate and land use change are reshaping the dynamics of vector-borne diseases. West Nile virus (WNV), the most widespread zoonotic arbovirus in the United States, illustrates the need to integrate climate, land use, and social vulnerability across heterogeneous landscapes when assessing spatial risk. We present a nationwide, county-level assessment of WNV risk, using complementary statistical and mechanistic models to (1) identify socio-ecological predictors of current WNV incidence, and (2) project species-specific, temperature-dependent transmission suitability under mid- and late-century climate change scenarios. We find that land use gradients, temperature-driven transmission, and both occupational and residential exposure jointly shape WNV incidence, particularly in mixed urban-agricultural landscapes. Future temperature and land use projections suggest spatially variable shifts in environmental suitability, driven by divergent physiological responses among Culex species vectors. Our results highlight temperature and land use as robust, mechanistically grounded predictors of WNV risk at the national scale, while underscoring the need for refined, species-specific analyses at local levels. These insights can inform more targeted surveillance, vector control, and climate adaptation strategies. We also identify key knowledge gaps, particularly around host and vector ecology, that must be addressed to improve public health response in the face of ongoing environmental change.
Dataset DOI: 10.5061/dryad.dz08kps9j
Description of the data and file structure
All data used in this study came from publicly available sources. The table below lists the abbreviation used in the main text for each data source, with a short description and a link to the source. See ‘Manuscript_DriversofWNV_DataAvailability.pdf’ for more details.
| Abbreviation | Description | Hyperlink |
|---|---|---|
| ArboNET | Centers for Disease Control and Prevention passive surveillance system for arboviruses | https://www.cdc.gov/west-nile-virus/data-maps/historic-data.html |
| ACS | US Census Bureau American Community Survey | https://data.census.gov/table/ACSST5Y2020.S0101?q=S0101&g=010XX00US$0500000 |
| NLCD | United States Geological Survey National Land Cover Database | https://www.mrlc.gov/data/land-cover-conus-6 |
| GAP | United States Geological Survey Gap Analysis Project | https://www.usgs.gov/programs/gap-analysis-project/science/species-data-download |
| QCEW | US Bureau of Labor Statistics Quarterly Census of Employment and Wages | https://data.bls.gov/cew/apps/data_views/data_views.htm#tab=Tables |
| SVI | US Centers for Disease Control and Prevention and Agency for Toxic Substances and Disease Registry Social Vulnerability Index | https://svi.cdc.gov/dataDownloads/data-download.html |
| LOCAv2 | Localized Constructed Analogs version 2 produced by Pierce et al. 2023 (J. Hydrometeorology) | https://cirrus.ucsd.edu/~pierce/LOCA2/NAmer/GFDL-CM4/0p0625deg/r1i1p1f1/ |
| FORE-SCE | United States Geological Survey Forecasting Scenarios of Land-use Change | https://www.usgs.gov/special-topics/land-use-land-cover-modeling/land-cover-modeling-methodology-fore-sce-model |
Files and variables
The .Rmd files contain the scripts for data processing, analysis, and figure making. Because some source data are not compatible with the CC0 license, the code that reads those data was commented out but retained to show how the flat files for analysis were produced. All data in the study are publicly available, and details on how to obtain each data source are in ‘Manuscript_DriversofWNV_DataAvailability.pdf’.
File: 1_DataCleaning.Rmd
Description: Code to standardize covariate data from multiple sources and combine them into a single .csv file for the GAM analysis. For data sources with restricted publishing, the code was commented out, but links to the original data sources are provided. The commented code shows how the flat files for analysis were created.
File: 2_CMIP6Cleaning.Rmd
Description: Code to reformat CMIP6 projection data into the structure required by the R0(T) calculations. For data sources with restricted publishing, the code was commented out, but links to the original data sources are provided. The commented code shows how the flat files for analysis were created.
File: 3_R0(T)Calculations.Rmd
Description: Code to run R0(T) calculations with CMIP6 data for three time periods (current/historic, mid-century, late-century) under two emission scenarios (SSP2-4.5 and SSP5-8.5).
File: 4_GAMAnalysis.Rmd
Description: Code to fit the GAMs and assess their performance and results.
File: 5_EnvironmentalRiskMaps_v2.Rmd
Description: Code to create scaled environmental risk values for the current time period and to calculate the change in risk from the current to the mid-century time period. Maps are produced for three Culex species. For data sources with restricted publishing, the code was commented out, but links to the original data sources are provided. To make the current environmental risk maps, users will need to download the .nc spatial files from Gorris et al. 2021 (https://github.com/lanl/culexmaxentmodels). To make the projected environmental risk maps, users will need to download spatial files from USGS FORE-SCE (https://www.sciencebase.gov/catalog/item/5b96c2f9e4b0702d0e826f6d).
File: 6_Figures.Rmd
Description: Code to make the figures and tables in the main text and supplement. For data sources with restricted publishing, the code was commented out, but links to the original data sources are provided. The commented code shows how the figures were made.
File: arbonet_wnvnd_20251201.csv
Description: Human West Nile virus (WNV) incidence data from ArboNET, the national passive surveillance system maintained by the US CDC.
Variables
- GEOID: geographic identifiers for US counties
- incidence: Average annual incidence of reported neuroinvasive WNV disease per 100,000 population from 1999-2023, based on county of residence.
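A minimal Python sketch of the incidence definition above, assuming "average annual incidence" is the mean of yearly rates (cases per 100,000 residents) over the study window and that a year with no reported cases contributes a rate of zero. The helper name and the dictionary inputs are hypothetical; the study's own processing is in the R scripts.

```python
from statistics import mean


def average_annual_incidence(cases_by_year, population_by_year):
    """Mean yearly incidence per 100,000 population (hypothetical helper).

    cases_by_year: {year: reported neuroinvasive cases}
    population_by_year: {year: county population}; its keys define the window.
    """
    rates = [
        # Assumption: years missing from cases_by_year had zero reported cases.
        cases_by_year.get(year, 0) * 100_000 / population
        for year, population in population_by_year.items()
    ]
    return mean(rates)


# Two-year example: 1 case among 50,000 residents, then a year with no cases.
print(average_annual_incidence({2020: 1}, {2020: 50_000, 2021: 50_000}))
```

Pooling instead (total cases divided by total person-years) gives the same value only when the county population is constant across years.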
File: acs_over65_20251201.csv
Description: The percentage of the population aged 65 and older in each county. County-level estimates were obtained from the US Census Bureau’s American Community Survey (ACS, table S0101).
Variables
- GEOID: geographic identifiers for US counties
- perc_over65: percent of population aged 65 and older
File: nlcd_urbanag_pct_20251201.csv
Description: Percentage of each county's land cover classified as urban (NLCD codes 21, 22, 23, 24) or agricultural (NLCD codes 81, 82), calculated from the National Land Cover Database (30 m resolution).
Variables
- GEOID: geographic identifiers for US counties
- urban_pct: percentage of the county's land cover that is urban
- ag_pct: percentage of the county's land cover that is agricultural
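The percentages above can be sketched in Python as follows, assuming one county's pixels are available as a flat list of NLCD class codes and that the denominator is every pixel in the county (whether open water or other classes were excluded from the denominator is not stated here). The helper name is hypothetical.

```python
URBAN_CODES = {21, 22, 23, 24}  # NLCD developed classes
AG_CODES = {81, 82}  # NLCD pasture/hay and cultivated crops


def urban_ag_percentages(pixel_codes):
    """Return (urban_pct, ag_pct) for one county's list of NLCD pixel codes."""
    total = len(pixel_codes)
    urban = sum(code in URBAN_CODES for code in pixel_codes)
    ag = sum(code in AG_CODES for code in pixel_codes)
    return 100 * urban / total, 100 * ag / total


# Four pixels: two developed, one cultivated, one deciduous forest (41).
print(urban_ag_percentages([21, 22, 81, 41]))
```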
File: nlcd_urbanag_edgesummary_20251201.csv
Description: Landscape metrics for the urban-agricultural interface in each county, calculated from National Land Cover Database (NLCD) data (30 m resolution) for the US.
Variables
- GEOID: geographic identifiers for US counties
- edge_pixel_count: total number of pixels where an urban pixel touches an agricultural pixel
- edge_length_m: total urban-agricultural edge length (m), calculated as edge_pixel_count multiplied by the NLCD pixel length (30 m)
- area_km2: total geographic area of the county (km2)
- edge_density_mk2: urban-agricultural edge density (m/km2)
- log_edge_density_mk2: log10-transformed urban-agricultural edge density
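A Python sketch of these edge metrics, assuming "touching" means rook (four-neighbour) adjacency and that each urban pixel is counted once if any neighbour is agricultural; the study's exact adjacency rule is not stated here. Edge density follows from the units above (metres of edge per square kilometre). Function names are hypothetical.

```python
import math

URBAN = {21, 22, 23, 24}
AG = {81, 82}
PIXEL_M = 30  # NLCD pixel edge length in metres


def count_edge_pixels(grid):
    """Count urban pixels with at least one rook-adjacent agricultural pixel (assumed rule)."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] not in URBAN:
                continue
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if any(
                0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] in AG
                for nr, nc in neighbours
            ):
                count += 1
    return count


def edge_metrics(edge_pixel_count, area_km2):
    """Return (edge_length_m, edge density in m/km2, log10 edge density)."""
    edge_length_m = edge_pixel_count * PIXEL_M
    density = edge_length_m / area_km2
    return edge_length_m, density, math.log10(density)  # raises if density == 0


grid = [[21, 81], [41, 22]]
print(count_edge_pixels(grid))
print(edge_metrics(1000, 300))
```

Because `math.log10` raises an error for zero, counties with no urban-agricultural edge would need separate handling before the log transform.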
File: bls_agemployed_20251201.csv
Description: The proportion of the private-sector workforce employed in agriculture. Employment data were obtained from the Bureau of Labor Statistics (BLS) Quarterly Census of Employment and Wages (QCEW), which reports monthly employment by industry according to the North American Industry Classification System (NAICS). We focused on two agricultural industries, identified by NAICS code: 111 (crop production) and 1151 (support activities for crop production). We used data for 2020-2023, the most recent years available online, restricted to the private sector (ownership code 5). For each county, we summed annual employment in NAICS 111 and NAICS 1151 to obtain total agricultural employment per year. As the denominator, we used total private-sector employment across all industries (industry code 10) per year. We then calculated the proportion of the workforce employed in agriculture in each county by dividing total agricultural employment (NAICS 111 + NAICS 1151) by total private-sector employment (industry code 10). Missing values were set to 0 for counties with no reported agricultural employment.
Variables
- GEOID: geographic identifiers for US counties
- bls_agemploy: the proportion of the private-sector workforce employed in agriculture
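The ratio described above can be sketched in Python for a single year. The tuple layout of the input rows and the helper name are hypothetical, and how the 2020-2023 yearly values were combined is not specified here, so this sketch stops at one year.

```python
from collections import defaultdict

AG_INDUSTRIES = {"111", "1151"}  # crop production; support activities for crop production
TOTAL_INDUSTRY = "10"  # QCEW total, all industries
PRIVATE_OWNERSHIP = "5"


def ag_proportion_by_county(rows):
    """rows: (geoid, industry_code, own_code, employment) tuples for one year.

    The tuple layout is hypothetical; employment may be None when not reported.
    """
    ag = defaultdict(int)
    total = {}
    for geoid, industry, own, employment in rows:
        if own != PRIVATE_OWNERSHIP:
            continue  # keep the private sector only
        if industry in AG_INDUSTRIES:
            ag[geoid] += employment or 0  # missing agricultural employment -> 0
        elif industry == TOTAL_INDUSTRY:
            total[geoid] = employment
    # Counties with no agricultural rows fall back to 0 via the defaultdict.
    return {geoid: ag[geoid] / tot for geoid, tot in total.items() if tot}


rows = [
    ("01001", "111", "5", 30),
    ("01001", "1151", "5", None),
    ("01001", "10", "5", 1000),
    ("01001", "111", "1", 99),  # non-private ownership row, ignored
    ("01003", "10", "5", 500),
]
print(ag_proportion_by_county(rows))
```

Codes are kept as strings so that they match the way industry and ownership codes appear in text-based downloads.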
File: cdc_svi_20251201.csv
Description: Social vulnerability data were obtained from the Centers for Disease Control and Prevention (CDC) (https://www.atsdr.cdc.gov/place-health/php/svi/index.html). Specifically, we used the Social Vulnerability Index (SVI), which captures demographic and socioeconomic characteristics at the county level. Higher SVI values reflect greater vulnerability, including poverty and related social determinants that may elevate mosquito exposure risk. The current CDC SVI uses 16 US Census variables from the 5-year American Community Survey (ACS) to identify communities that may need additional support.
Variables
- GEOID: geographic identifiers for US counties
- svi_mean: social vulnerability index ranging from 0 to 1, with values closer to 1 indicating more vulnerable communities at the county level
File: usgs_birds_20251201.csv
Description: The total number of competent bird species in each county. The geographic ranges of individual bird species were obtained from the United States Geological Survey Gap Analysis Project (GAP).
Variables
- GEOID: geographic identifiers for US counties
- birds_tot: total number of competent bird species (i.e., competent bird species richness)
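A minimal Python sketch of the richness count, assuming each county's species list comes from overlaying GAP range maps on county boundaries and that a separate list of competent species is available. The species codes and the helper name are hypothetical placeholders, not the study's species list.

```python
def competent_richness(species_by_county, competent_species):
    """Count the distinct competent species whose range overlaps each county."""
    return {
        geoid: len(set(species) & competent_species)
        for geoid, species in species_by_county.items()
    }


# Hypothetical species codes, for illustration only.
ranges = {"01001": ["sp_a", "sp_b", "sp_c"], "01003": ["sp_c"]}
print(competent_richness(ranges, {"sp_a", "sp_b"}))
```

Converting each list to a set keeps a species from being counted twice when its range overlaps a county in more than one polygon.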
File: gam_covariates_noR0_20251201.csv
Description: Socio-ecological covariates used in the generalized additive model (GAM). Each covariate is provided in raw form (raw_ prefix, used for figure making) and in scaled form (scale_ prefix, used for model fitting).
Variables
- STATEFP: geographic identifiers for US states
- GEOID: geographic identifiers for US counties
- lat: latitude of county centroid from the tigris R package
- lon: longitude of county centroid from the tigris R package
- log_incidence: log10-transformed human WNV neuroinvasive disease (WNVND) incidence (the data contained no zeros, so no constant was added before the log transformation)
- raw_incidence: average annual human WNVND incidence reported from 1999-2023
- raw_perc_over65: percent of population aged 65 and older
- raw_edge_pixel_count: total number of urban-agricultural edge pixels per county
- raw_edge_length_m: length of urban-agricultural edges per county (m; NLCD pixels are 30 m wide, so edge length = pixel count * 30)
- raw_area_km2: total area of the county (km2)
- raw_edgedensity_mk2: urban-agricultural edge density (m/km2) for each county
- raw_log_edgedensity_mk2: the log10 transformed urban-agricultural edge density (m/km2) for each county
- raw_bls_agemploy: the proportion of the private-sector workforce employed in agriculture per county
- raw_svi_mean: the social vulnerability index (0-1) per county
- raw_birds_tot: the total competent bird species present per county
- scale_perc_over65: scaled percent of population aged 65 and older
- scale_edge_pixel_count: scaled total number of urban-agricultural edge pixels per county
- scale_edge_length_m: scaled length of urban-agricultural edges per county (m; edge length = pixel count * 30)
- scale_area_km2: scaled total area of the county (km2)
- scale_edgedensity_mk2: scaled urban-agricultural edge density (m/km2) for each county
- scale_log_edgedensity_mk2: scaled log10 transformed urban-agricultural edge density (m/km2) for each county
- scale_bls_agemploy: scaled proportion of the private-sector workforce employed in agriculture per county
- scale_svi_mean: scaled social vulnerability index (0-1) per county
- scale_birds_tot: scaled total competent bird species present per county
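"Scaled" is assumed here to mean centring on the mean and dividing by the sample standard deviation, which is what R's `scale()` does by default; the Python sketch below shows that transform alongside the log10 transform used for log_incidence. Helper names are hypothetical.

```python
import math
from statistics import mean, stdev


def log10_transform(values):
    """log10 transform; valid here only because the incidence data contain no zeros."""
    return [math.log10(v) for v in values]


def z_scale(values):
    """Centre on the mean and divide by the sample SD (n - 1), like R's scale()."""
    centre = mean(values)
    spread = stdev(values)
    return [(v - centre) / spread for v in values]


print(log10_transform([1.0, 10.0, 100.0]))
print(z_scale([1.0, 2.0, 3.0]))
```

Under this assumption, a scale_ value maps back to original units as raw = scale * SD + mean, using the SD and mean of the corresponding raw_ column.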
File: cmip6_ssp245_historic_county_long.csv
Description: Formatted CMIP6 data ready for input to the R0(T) calculations. These data are for emission scenario SSP2-4.5 (low-emissions) during the current time period (2015-2020), per county.
Variables
- GEOID: geographic identifiers for US counties
- date: date of each monthly value (YYYY-MM-DD)
- rcp: emission scenario (low- or high-emissions)
- century: time period (current, mid-, or late-century)
- min_temp: minimum temperature (C)
- max_temp: maximum temperature (C)
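A Python sketch for summarizing one of these long-format files by county and calendar month, assuming monthly mean temperature is approximated by the midpoint of min_temp and max_temp (the R0(T) script may use a different temperature summary). GEOID is kept as text so that leading zeros in county FIPS codes survive. The sample rows, including the rcp value, are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def monthly_mean_temp(csv_text):
    """Average (min_temp + max_temp) / 2 per (GEOID, calendar month) across years."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = int(row["date"][5:7])  # date is YYYY-MM-DD
        midpoint = (float(row["min_temp"]) + float(row["max_temp"])) / 2
        groups[(row["GEOID"], month)].append(midpoint)  # GEOID stays a string
    return {key: mean(values) for key, values in groups.items()}


sample = (
    "GEOID,date,rcp,century,min_temp,max_temp\n"
    "01001,2015-07-01,ssp245,current,20,30\n"
    "01001,2016-07-01,ssp245,current,22,32\n"
)
print(monthly_mean_temp(sample))
```

For a real file, pass the contents of, for example, cmip6_ssp245_historic_county_long.csv read as text.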
File: cmip6_ssp585_historic_county_long.csv
Description: Formatted CMIP6 data ready for input to the R0(T) calculations. These data are for emission scenario SSP5-8.5 (high-emissions) during the current time period (2015-2020), per county.
Variables
- GEOID: geographic identifiers for US counties
- date: date of each monthly value (YYYY-MM-DD)
- rcp: emission scenario (low- or high-emissions)
- century: time period (current, mid-, or late-century)
- min_temp: minimum temperature (C)
- max_temp: maximum temperature (C)
File: cmip6_rR0_allspecies.csv
Description: Temperature-dependent relative R0 (transmission suitability) calculated from CMIP6 data for three Culex species, three time periods, and two emission scenarios.
Variables
- GEOID: geographic identifiers for US counties
- species: Culex mosquito species (Cx. pipiens, Cx. tarsalis, Cx. quinquefasciatus)
- century: time periods (current 2015-2020, mid-century 2045-2074, late-century 2075-2100)
- scenario: CMIP6 emission scenario (SSP2-4.5 low-emission, SSP5-8.5 high-emission)
- annual_avgrR0: relative R0 averaged across all 12 calendar months for each species-century-scenario combination
- peakmonths_avgrR0: relative R0 averaged across the peak activity months (April-October) for each species-century-scenario combination
- January: calculated relative R0 per species-century-scenario in January
- February: calculated relative R0 per species-century-scenario in February
- March: calculated relative R0 per species-century-scenario in March
- April: calculated relative R0 per species-century-scenario in April
- May: calculated relative R0 per species-century-scenario in May
- June: calculated relative R0 per species-century-scenario in June
- July: calculated relative R0 per species-century-scenario in July
- August: calculated relative R0 per species-century-scenario in August
- September: calculated relative R0 per species-century-scenario in September
- October: calculated relative R0 per species-century-scenario in October
- November: calculated relative R0 per species-century-scenario in November
- December: calculated relative R0 per species-century-scenario in December
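The two summary columns can be recomputed from the twelve monthly columns, assuming each is an unweighted mean of the monthly relative R0 values. A Python sketch with a hypothetical helper name, taking one row as read by `csv.DictReader` (values as strings):

```python
from statistics import mean

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
PEAK_MONTHS = MONTHS[3:10]  # April through October


def rr0_summaries(row):
    """Return (annual mean, peak-month mean) of relative R0 for one CSV row."""
    annual = mean(float(row[m]) for m in MONTHS)
    peak = mean(float(row[m]) for m in PEAK_MONTHS)
    return annual, peak


# Hypothetical row: suitability only in the peak months.
row = {m: ("1.0" if m in PEAK_MONTHS else "0.0") for m in MONTHS}
print(rr0_summaries(row))
```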
File: rR0_peakmonth_county_ssp245_hist.csv
Description: Relative R0 averaged over the peak mosquito activity months (April-October), calculated under emission scenario SSP2-4.5 (low-emissions) for the current time period (2015-2020), per county.
Variables
- GEOID: geographic identifiers for US counties
- raw_peakmonths_rR0_mean: the average calculated relative R0 during peak months
- scale_peakmonths_rR0_mean: scaled average calculated relative R0 during peak months
File: cmip6_ssp245_late_county_long.csv
Description: Formatted CMIP6 data ready for input to the R0(T) calculations. These data are for emission scenario SSP2-4.5 (low-emissions) during the late-century period (2075-2100), per county.
Variables
- GEOID: geographic identifiers for US counties
- date: date of each monthly value (YYYY-MM-DD)
- rcp: emission scenario (low- or high-emissions)
- century: time period (current, mid-, or late-century)
- min_temp: minimum temperature (C)
- max_temp: maximum temperature (C)
File: cmip6_ssp245_mid_county_long.csv
Description: Formatted CMIP6 data ready for input to the R0(T) calculations. These data are for emission scenario SSP2-4.5 (low-emissions) during the mid-century period (2045-2074), per county.
Variables
- GEOID: geographic identifiers for US counties
- date: date of each monthly value (YYYY-MM-DD)
- rcp: emission scenario (low- or high-emissions)
- century: time period (current, mid-, or late-century)
- min_temp: minimum temperature (C)
- max_temp: maximum temperature (C)
File: cmip6_ssp585_mid_county_long.csv
Description: Formatted CMIP6 data ready for input to the R0(T) calculations. These data are for emission scenario SSP5-8.5 (high-emissions) during the mid-century period (2045-2074), per county.
Variables
- GEOID: geographic identifiers for US counties
- date: date of each monthly value (YYYY-MM-DD)
- rcp: emission scenario (low- or high-emissions)
- century: time period (current, mid-, or late-century)
- min_temp: minimum temperature (C)
- max_temp: maximum temperature (C)
File: cmip6_ssp585_late_county_long.csv
Description: Formatted CMIP6 data ready for input to the R0(T) calculations. These data are for emission scenario SSP5-8.5 (high-emissions) during the late-century period (2075-2100), per county.
Variables
- GEOID: geographic identifiers for US counties
- date: date of each monthly value (YYYY-MM-DD)
- rcp: emission scenario (low- or high-emissions)
- century: time period (current, mid-, or late-century)
- min_temp: minimum temperature (C)
- max_temp: maximum temperature (C)
File: gam_covariates_full.csv
Description: Socio-ecological and temperature-dependent R0 covariates used in the generalized additive model (GAM). Each covariate is provided in raw form (raw_ prefix, used for figure making) and in scaled form (scale_ prefix, used for model fitting).
Variables
- STATEFP: geographic identifiers for US states
- GEOID: geographic identifiers for US counties
- lat: latitude of county centroid from the tigris R package
- lon: longitude of county centroid from the tigris R package
- log_incidence: log10-transformed human WNV neuroinvasive disease (WNVND) incidence (the data contained no zeros, so no constant was added before the log transformation)
- raw_incidence: average annual human WNVND incidence reported from 1999-2023
- raw_perc_over65: percent of population aged 65 and older
- raw_edge_pixel_count: total number of urban-agricultural edge pixels per county
- raw_edge_length_m: length of urban-agricultural edges per county (m; NLCD pixels are 30 m wide, so edge length = pixel count * 30)
- raw_area_km2: total area of the county (km2)
- raw_edgedensity_mk2: urban-agricultural edge density (m/km2) for each county
- raw_log_edgedensity_mk2: the log10 transformed urban-agricultural edge density (m/km2) for each county
- raw_bls_agemploy: the proportion of the private-sector workforce employed in agriculture per county
- raw_svi_mean: the social vulnerability index (0-1) per county
- raw_birds_tot: the total competent bird species present per county
- scale_perc_over65: scaled percent of population aged 65 and older
- scale_edge_pixel_count: scaled total number of urban-agricultural edge pixels per county
- scale_edge_length_m: scaled length of urban-agricultural edges per county (m; edge length = pixel count * 30)
- scale_area_km2: scaled total area of the county (km2)
- scale_edgedensity_mk2: scaled urban-agricultural edge density (m/km2) for each county
- scale_log_edgedensity_mk2: scaled log10 transformed urban-agricultural edge density (m/km2) for each county
- scale_bls_agemploy: scaled proportion of the private-sector workforce employed in agriculture per county
- scale_svi_mean: scaled social vulnerability index (0-1) per county
- scale_birds_tot: scaled total competent bird species present per county
- raw_peakmonths_rR0_mean: calculated average R0(T) per county for all three Culex species during peak activity months (April-October)
- scale_peakmonths_rR0_mean: scaled calculated average R0(T) per county for all three Culex species during peak activity months (April-October)
File: Manuscript_DriversofWNV_DataAvailability.pdf
Description: Additional information on where to find the raw data sources. The table summarizes metadata for all variables used in the analysis, with full source attributions below the table. This information is in Supplementary Table 3 of the manuscript.
Code/software
All analyses and figure components were produced in RStudio. Some figures were imported into BioRender to add illustrations (e.g., the transmission cycles in Figure 1). All data used in this study are publicly available. The raw data used in 1_DataCleaning.Rmd and 2_CMIP6Cleaning.Rmd must be unzipped (raw.zip) before running those scripts. The formatted data are provided as .csv files and are ready to use with the following scripts: 4_GAMAnalysis.Rmd, 5_EnvironmentalRiskMaps_v2.Rmd, and 6_Figures.Rmd.
Software: R version 4.4.1
R packages
- ggpubr_0.6.0, cowplot_1.1.3, ncdf4_1.22, gam.hp_0.0-3, ggcorrplot_0.1.4.1, gratia_0.9.2.9011, mgcv_1.9-1, terra_1.8-42, sf_1.0-20, exactextractr_0.10.0, raster_3.6-32, tidycensus_1.7.1, janitor_2.2.0, lubridate_1.9.3, stringr_1.5.1, dplyr_1.1.4, readr_2.1.5, tidyr_1.3.1, ggplot2_3.5.2, tidyverse_2.0.0, tigris_2.1
Access information
All data were derived from publicly available sources; details on where to find them are in ‘Manuscript_DriversofWNV_DataAvailability.pdf’.
