Data from: Haemosporidian infection risk across an urban gradient in a songbird
Data files
Dec 23, 2025 version, 276.55 KB total
- BU_Precip_Cleaned.csv (267.45 KB)
- README.md (9.10 KB)
Abstract
Urbanization is a significant source of inter- and intra-city environmental variation and is associated with declining population sizes that are increasingly homogeneous. However, whether this shift extends to urban disease ecology and related parasite communities requires further examination. By comparing the prevalence and diversity of two related parasite genera (host-generalist Plasmodium and host-specialist Haemoproteus) in dark-eyed junco (Junco hyemalis) populations across an urbanization gradient in California, we can determine how broad urban-associated land use changes and localized habitat composition correlate with pathogen communities. Additionally, by examining vector abundance responses, we can begin to assess broader impacts on urban disease transmission and ecology. We report that Haemoproteus prevalence decreased in urban habitats, with a larger presence of host-generalist lineages, suggesting urbanization increases homogenization of host-specialist pathogens. Unsurprisingly, the host-generalist pathogen, Plasmodium, showed no correlation with urbanization, but prevalence increased with rainfall. Local habitat characteristics had limited effects on Plasmodium infection prevalence, but moderate shrub coverage and low human presence were associated with Plasmodium infections. Lastly, Culex tarsalis, an important vector for Plasmodium and zoonotic diseases, was the only vector to also increase in abundance in response to rainfall. Our results show that broad land use changes associated with urbanization decrease avian parasite biodiversity and highlight localized abiotic and biotic habitat characteristics that may reduce infection prevalence.
https://doi.org/10.5061/dryad.rr4xgxdj0
Description of the data and file structure
Data were collected from dark-eyed juncos in San Francisco, Santa Barbara, Los Angeles, and San Diego. Sites are listed by broad category (urban vs non-urban), and GPS coordinates are also given for each capture site.
Per-individual data is listed, including morphometric data, infection status, and habitat conditions (including vegetation data within 50 m of a capture site). Local vegetation is based on a modified survey by combining the California Native Plant Society and California Department of Fish and Wildlife surveys. Both precipitation and urbanization values (calculated as NDBI-NDVI) are based on remote sensing data.
Lineage data is based on the closest similarity to published sequence data available on MalAvi and verified in NCBI GenBank. The MalAvi dataset was also used to determine known susceptible hosts to calculate host-specificity.
NOTE: "NULL" in the data tables indicates that a measurement or other relevant information could not be obtained at the time of sampling. "NA" denotes "not applicable".
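Because "NULL" and "NA" carry different meanings, it can help to keep them apart when loading the file rather than collapsing both into a single missing value. The following is a minimal Python sketch of one way to do that (the authors' own scripts are in R and are not reproduced here). The helper names `parse_cell` and `read_rows` and the sample values are illustrative assumptions, not part of the dataset.

```python
import csv
import io

# The two missing-value codes described in the NOTE above.
MISSING = {"NULL": "not obtained", "NA": "not applicable"}


def parse_cell(text):
    """Return (value, missing_reason); value is None when the cell is a missing code."""
    text = (text or "").strip()
    if text in MISSING:
        return None, MISSING[text]
    return text, None


def read_rows(handle):
    """Yield one dict per CSV row, with every cell parsed by parse_cell."""
    for row in csv.DictReader(handle):
        yield {name: parse_cell(cell) for name, cell in row.items()}


# Made-up rows for illustration only; real values come from BU_Precip_Cleaned.csv.
sample = io.StringIO(
    "Sample.Number,Mass,Proportion.of.White.on.Tail\n"
    "S1,18.2,NA\n"
    "S2,NULL,0.35\n"
)
rows = list(read_rows(sample))
for row in rows:
    print(row)
```

To read the real file, pass `open("BU_Precip_Cleaned.csv", newline="")` in place of `sample`.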
Files and variables
File: BU_Precip_Cleaned.csv
Description: Complete, per-individual dataset
Variables
- Sample.Number: Corresponds to a specific blood sample
- USGS.band: Federal band number
- Specificity: Parasite–host specificity (STD value based on host relatedness)
- Species: Species and lineage of parasites
- Haemoproteus: Infection status (presence/absence)
- Plasmodium: Infection status (presence/absence)
- HP: Infection status for either Haemoproteus or Plasmodium, including non-specific lineages
- Year: Year of capture
- Month: Month of capture
- Day: Day of month of capture
- Date: Full date of capture (YYYY-MM-DD)
- Julian_Date: Julian date (days)
- Time: Time of capture (local time)
- Status: New or recapture
- BandCombo: Color combination of plastic markers (see USGS for full details)
- Min_Age: Lowest possible age based on aging criteria (years, categorical)
- Age: Age of the individual (L = Local, HY = Hatch Year [flying], AHY = After Hatch Year, SY = Second Year [based on molt limits], ASY = After Second Year)
- Sex: Numeric sex categorization (coded)
- Sex..apprt.: Apparent sex (M = male, F = female, U = unknown)
- Reproductive_stage: Ranking for cloacal protuberance (M) or brood patch (F/U)
- Tarsus: Tarsus length (mm)
- Wing: Wing chord at rest (mm)
- Mass: Total body mass (g)
- Fat: Visible fat score along the breast (ordinal score)
- Bill_length: Bill length from nares to tip (mm)
- Bill_depth: Bill depth at widest point at the nares (mm)
- Bill_width: Bill width at widest point at the nares (mm)
- Tail: Tail length (mm)
- General.location: Description of capture site for banders
- Notes: Comments on significant observations
- Lat: Latitude of capture site (decimal degrees)
- Long: Longitude of capture site (decimal degrees)
- GPS.coordinates: Combined latitude/longitude (decimal degrees)
- LA_Sites: Indicates sites located within Los Angeles (binary/categorical)
- URB: Broad site categorization (urban or non-urban)
- Site: Broader region and/or county
- Name: Name of bird
- bled.: Blood collected or not (yes/no)
- Who measured: Bander initials
- Cloacal.sample: Indicates whether a cloacal swab was collected in RNAlater (Y = yes, N = no)
- Fecal.Sample: Indicates whether fecal samples were collected (none collected in this study)
- Tail.Pictures: Indicates whether tail photographs were collected (yes/no)
- Feathers.: Indicates whether feathers were collected (contour feathers from dorsal or ventral region)
- NestWatchID: NestWatch identifier (no nest data available for these individuals)
- Proportion.of.White.on.Tail: Proportion of tail feathers that are white (unitless proportion; NA if unavailable)
- Bill_SA: Bill surface area (mm²)
- Bill_Volume: Bill volume (mm³)
- BillSAV: Bill surface area-to-volume ratio (mm²/mm³, i.e., mm⁻¹)
- Veg.Survey: Indicates whether a vegetation survey was conducted within the year (yes/no)
- Who.Collected..Veg.Survey..Comments: Collector initials and/or comments
- Cloacal.Swab.DNA.Yield..ng.uL.: DNA yield from cloacal swab (ng/µL; NA if unavailable)
- Total.CS.DNA.Yield: Total cloacal swab DNA yield (ng; NA if unavailable)
- lat_v: Latitude for vegetation survey location (decimal degrees)
- long_v: Longitude for vegetation survey location (decimal degrees)
- GPS_device: Device used to collect GPS coordinates for vegetation surveys
- elev_ft: Elevation (feet)
- Elevation: Elevation (m)
- soil.texture: Soil texture code
- habitat_type: Habitat type code
- macro..relative.topography.: Broad-scale topographic classification
- micro: Local habitat pattern classification
- Impervious_Cover: Percent of total impervious cover (%, non-vegetated surfaces)
- lg_rock....: Percent cover of large rocks (%)
- sm_rock: Percent cover of small rocks (%)
- bare_fine: Percent cover that is bare ground or fine substrate (%)
- litter....: Percent leaf litter cover (%)
- X..BaseArea_stems: Percent cover of stem base area (%)
- Water_Cover: Percent water cover (%)
- impervious: Percent impervious surface cover (%)
- Vegetation_Cover: Percent total vegetation cover (%)
- slope_exposure: Slope aspect or direction (categorical)
- slope_steep: Slope steepness (degrees, categorical bins)
- stand_size: Broad stand size category
- X01_development: Degree of human development (ordinal/categorical)
- X05_exotics: Presence of exotic plants (yes/no)
- X15_road: Presence of road cover (yes/no)
- X20_trampling: Presence of human or animal trampling (yes/no)
- X29_recreation: Presence of human recreational activity (yes/no)
- DBH: Diameter at breast height of trees (cm)
- overstory_spp1: Most prominent overstory species
- overstory_spp2: Second most prominent overstory species
- overstory_spp3: Third most prominent overstory species
- shurb_cat: Shrub categorization
- herbs_cat: Herbaceous plant categorization
- Desert_Palms: Percent cover of desert palms (%)
- Nonvascular: Percent cover of nonvascular plants (%)
- veg_cover: Percent vegetation cover (%)
- OS_Conifers: Percent cover of overstory conifers (%)
- os_conifer_height_meters: Overstory conifer height categories (meters)
- OS_Hardwoods: Percent cover of overstory hardwoods (%)
- os_hardwood_height: Overstory hardwood height categories (meters)
- Understory: Percent cover of understory plants (%)
- understory_height: Understory height categories (meters)
- Shrubs_Cover: Percent shrub cover (%)
- shrub_height: Shrub height categories (meters)
- Herbaceous: Percent cover of herbaceous plants (%)
- herb_height: Herbaceous height categories (meters)
- Buildings: Percent cover of buildings (%)
- building_height: Building height categories (meters)
- Built_features: Percent cover of other built features (%)
- Trash_counts: Number of trash cans within the region (count)
- Tables_Count: Number of tables within the region (count)
- built_feat_height: Height categories of other built features (meters)
- multiple_juncos: Presence of multiple juncos within a territory (yes/no)
- X.4: Indicates sites located within Los Angeles (binary)
- BU_50m: Built-up Index value within 50 m of capture sites (unitless index)
- ppt: Total precipitation within the capture month/year (mm)
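Several variables appear to be derived from others in the list (Elevation from elev_ft, BillSAV from Bill_SA and Bill_Volume). The Python sketch below shows how a user might recompute and cross-check them. It assumes, without confirmation from the authors, that Elevation is elev_ft converted to meters and that BillSAV is Bill_SA divided by Bill_Volume. The function names and the 1% tolerance are illustrative choices.

```python
import math

FEET_TO_METERS = 0.3048  # exact by definition of the international foot


def elevation_m(elev_ft):
    """Convert an elevation in feet to meters."""
    return elev_ft * FEET_TO_METERS


def bill_sa_to_volume(bill_sa_mm2, bill_volume_mm3):
    """Surface area (mm^2) divided by volume (mm^3); assumed definition of BillSAV."""
    return bill_sa_mm2 / bill_volume_mm3


def consistent(reported, recomputed, rel_tol=0.01):
    """True when a reported value matches its recomputed value within rel_tol."""
    return math.isclose(reported, recomputed, rel_tol=rel_tol)


# Made-up numbers for illustration only.
print(elevation_m(1000.0))
print(bill_sa_to_volume(60.0, 40.0))
print(consistent(304.8, elevation_m(1000.0)))
```

Rows that fail such a check are not necessarily wrong; they may simply reflect rounding or a different definition than the one assumed here.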
Code/software
All scripts were run in R version 4.2.1.
Vector_Data_Analysis.R: Prepares mosquito vector data by merging trapping records with precipitation and built-up land cover metrics. Runs species-level abundance models and generates plots linking vectors to rainfall and urbanization.
SiteComparisson_Models.R: Summarizes and compares haemosporidian infection prevalence across sites, years, and habitat types. Uses Fisher’s tests and visualizations to test urban vs montane differences among parasites.
Random_Forest_for_Haemosporidians.R: Builds random forest and classification tree models to predict parasite infection from environmental and host variables. Evaluates variable importance and model accuracy, focusing mainly on Plasmodium infections.
Linear_Models_for_Infection_Across_Habitats.R: Fits generalized linear mixed models to test effects of precipitation and urbanization on infection probability. Produces effect plots for Plasmodium and Haemoproteus across habitats and environmental gradients.
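For readers unfamiliar with the Fisher's test mentioned for SiteComparisson_Models.R, the sketch below shows the two-sided exact calculation for a 2×2 table (infected vs uninfected, by habitat category). It is a standalone Python illustration, not the authors' R code, and the counts in the example call are invented placeholders rather than results from this dataset.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Placeholder counts: (infected, uninfected) at urban sites, then non-urban sites.
p_value = fisher_exact_two_sided(12, 28, 3, 37)
print(p_value)
```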
Access information
Other publicly accessible locations of the data:
- None
Data was derived from the following sources:
- Vector data was requested from California Vector Borne Disease Surveillance System (CalSurv). https://vectorsurv.org/
- NDVI and NDBI data were identified and requested from the US Geological Survey Earth Resources Observation and Science Center using references obtained from https://earthexplorer.usgs.gov/
- Climate data was requested from the Oregon PRISM Climate Group https://prism.oregonstate.edu/recent/
Field Sampling
Bird capture was conducted during three consecutive breeding seasons (January–August) from 2021–2023. The study sites spanned four major metropolitan areas (Los Angeles, San Diego, Santa Barbara, and San Francisco, CA) and included two non-urban sites (Angeles National Forest, ANF; Santa Monica Mountains, SMM) and six urban sites (University of California, Los Angeles campus, UCLA; University of California, San Diego campus, UCSD; University of California, Santa Barbara, UCSB; Occidental College campus, Occ; San Francisco State University campus, SFSU; and parks throughout Los Angeles, LA).
At each site, wild juncos were captured by targeted mist netting. Trapping efforts included audio lures recorded from juncos in Los Angeles. All captured individuals, including males, females, and juveniles, were used in this analysis. Upon capture, individuals were banded with federal aluminum bands and three plastic color bands to create unique band combinations for behavioral studies performed in tandem with this study. Body morphometrics, including mass, wing chord length, tail length, tarsus length, and bill width, depth, and length, were recorded. Age was determined from plumage characteristics, and sex was determined from a combination of plumage and brood patches or cloacal protuberances (Pyle et al. 1997). Blood samples, <10% of the total mass with an average volume of 50 µL, were collected from individuals via brachial venipuncture with a 30G needle following sterilization of the puncture site with alcohol pads. Blood was collected in heparinized capillary tubes. Parasitemia was not assessed for this study, but two blood smears were prepared per individual, and whole blood was then stored in lysis buffer (Valkiūnas 2005). Recaptured individuals were sampled again only if more than two weeks had passed since their previous capture.
Abiotic Environmental Assessment
Habitats were characterized on the basis of local vegetation surveys and remote sensing data. For both the local and remote datasets, we focused on the 50 m radius surrounding a capture site. This space presents a reasonable approximation of the space used by small territorial passerines, including juncos (Chandler et al. 1994, Blair 2001).
Local vegetation was assessed between 2022 and 2023 via a modified version of the combined Rapid Assessment Protocol (CNPS-RAP) developed by the California Native Plant Society and California Department of Fish and Wildlife (California Native Plant Society (CNPS), 2022). The main modification was that surveys were conducted at each junco capture location, regardless of whether the location fell within the CNPS-RAP definition of a vegetation stand. Rapid assessments were conducted in circular plots with a 50-meter radius around each individual’s capture location. Finally, we counted the number of trash cans and tables within the assessment plot as a proxy for anthropogenic waste availability (Mazué et al., 2023).
Broader habitat conditions included the Built-Up Index (BU) as a metric for urbanization and monthly precipitation as a metric for water availability. BU was calculated as the difference between the normalized difference built-up index (NDBI) and the normalized difference vegetation index (NDVI), i.e., BU = NDBI − NDVI (He et al. 2010). The raster files for both the NDBI and the NDVI were from the U.S. Geological Survey Earth Resources Observation and Science (EROS) Science Processing Architecture (ESPA) Collection 2 Level 2 Landsat Surface Reflectance-Derived Spectral Indices and had a resolution of 30 m. Each BU value is the average within a 50 m radius of a banding site. Monthly precipitation values, taken as the total rainfall for the month of the capture date, were obtained from the PRISM Climate Group via the prism 0.2.0 R package.
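The BU calculation described above can be sketched in a few lines. The Python example below computes BU = NDBI − NDVI per pixel and averages the pixels whose centers fall within 50 m of a site. The pixel-center inclusion rule, the function names, and the toy pixel values are assumptions made for illustration; the source does not state how partially covered 30 m pixels were weighted.

```python
import math


def built_up_index(ndbi, ndvi):
    """BU = NDBI - NDVI for one pixel (both inputs range from -1 to 1)."""
    return ndbi - ndvi


def mean_bu_within_radius(pixels, site_xy, radius_m=50.0):
    """Average BU over pixels whose centers lie within radius_m of the site.

    pixels: iterable of (x_m, y_m, ndbi, ndvi), with coordinates in meters
    in a projected coordinate system (an assumption of this sketch).
    """
    site_x, site_y = site_xy
    values = [
        built_up_index(ndbi, ndvi)
        for x, y, ndbi, ndvi in pixels
        if math.hypot(x - site_x, y - site_y) <= radius_m
    ]
    if not values:
        raise ValueError("no pixel centers within the requested radius")
    return sum(values) / len(values)


# Toy 30 m pixels; the index values are invented for illustration.
toy_pixels = [
    (0.0, 0.0, 0.1, 0.5),   # vegetated pixel at the site
    (30.0, 0.0, 0.2, 0.4),  # neighboring pixel, 30 m away
    (60.0, 0.0, 0.9, 0.0),  # built-up pixel, outside the 50 m radius
]
site_bu = mean_bu_within_radius(toy_pixels, site_xy=(0.0, 0.0))
print(site_bu)
```

Higher BU values indicate more built-up and less vegetated surroundings, so negative values are expected at well-vegetated sites.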
Laboratory analysis
DNA was extracted from whole blood stored in lysis buffer via a Qiagen DNeasy Blood and Tissue Extraction Kit (San Diego, CA, USA) or a Wizard® SV Blood and Tissue Extraction Kit (Madison, WI, USA) following the manufacturer’s instructions. A total of 10 µL of DNA was then used to screen for the presence of Haemoproteus/Plasmodium via the previously described nested PCR protocol (Waldenström et al. 2004). In brief, an initial 25-µL reaction was performed with the primers HaemNF and HaemNR2, followed by a nested PCR with HaemF and HaemR2 (Waldenström et al. 2004). All reactions were conducted with ThermoFisher DreamTaq MasterMix (Hanover Park, IL, USA). Positive samples were submitted for Sanger sequencing (Azenta US Inc., La Jolla, CA). No coinfections were detected in the sequencing chromatograms; however, this is likely due to PCR bias, which preferentially amplifies Haemoproteus over Plasmodium (Ciloglu et al. 2019).
Sanger sequences were used to identify parasite genera and lineages. All sequences were able to be identified to the genus level; however, only sequences with >80% high-quality reads were used to assign lineages. All sequences were initially aligned via the MUSCLE alignment feature available in Geneious Prime 2023.2.1, with all sequences that exceeded 1% (4 bp) dissimilarity classified as a unique lineage (Bensch et al. 2009). Each unique lineage was compared via BLAST with sequences available in the GenBank and MalAvi databases to determine if it was previously observed.
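The 1% dissimilarity rule can be illustrated with a short Python sketch. It computes the proportion of mismatched sites between two aligned sequences and groups sequences greedily against lineage representatives. The greedy grouping, the skipping of gap and N sites, and the function names are assumptions made for illustration; the authors performed this step in Geneious Prime, and their exact procedure may differ.

```python
def dissimilarity(seq_a, seq_b):
    """Proportion of mismatched sites between two aligned, equal-length sequences.

    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        mismatches += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def assign_lineages(aligned, threshold=0.01):
    """Greedy grouping: a sequence joins the first lineage whose representative
    it differs from by no more than threshold; otherwise it founds a new one."""
    representatives = []
    labels = {}
    for name, seq in aligned.items():
        for index, rep in enumerate(representatives):
            if dissimilarity(seq, rep) <= threshold:
                labels[name] = index
                break
        else:
            representatives.append(seq)
            labels[name] = len(representatives) - 1
    return labels


# Synthetic 400 bp sequences, not real parasite data.
base = "ACGT" * 100
near = "TT" + base[2:]        # 2 mismatches against base (0.5%)
far = "T" * 12 + base[12:]    # 9 mismatches against base (2.25%)
labels = assign_lineages({"s1": base, "s2": near, "s3": far})
print(labels)
```

A greedy pass depends on input order when sequences sit near the threshold, so borderline cases are worth inspecting by hand.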
Host-parasite specificity was reported as STD and calculated via TAXOBIODIV2 (http://www.otago.ac.nz/parasitegroup/downloads.html). While other metrics of host-parasite specificity exist (Svensson-Coelho et al. 2013), STD is a comparable method and is broadly used (Poulin and Mouillot 2005). The data on hosts were based on reported observations available from the MalAvi database for each parasite lineage. STD values ranged from 1 to 4, with higher values suggesting more generalist parasites. Lineages that differed from those previously published were given a default STD value of 1.
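As a rough illustration of what STD measures, the Python sketch below computes the mean taxonomic distinctness over all pairs of host species, following the general form of the Poulin and Mouillot (2005) index: the number of taxonomic steps needed to reach a shared taxon, averaged across host pairs. This is not the TAXOBIODIV2 implementation. The four-rank hierarchy (genus, family, order, class), the default of 1 for lineages with fewer than two known hosts, and the example hosts are assumptions chosen to match the 1 to 4 range described above.

```python
from itertools import combinations

# Ranks compared in order; two congeneric hosts are 1 step apart,
# hosts sharing only their class are 4 steps apart.
RANKS = ("genus", "family", "order", "class")


def taxonomic_steps(host_a, host_b):
    """Steps up the hierarchy until the two hosts share a taxon."""
    for steps, (taxon_a, taxon_b) in enumerate(zip(host_a, host_b), start=1):
        if taxon_a == taxon_b:
            return steps
    raise ValueError("hosts share no taxon at the ranks provided")


def s_td(hosts, single_host_default=1.0):
    """Mean pairwise taxonomic distinctness across distinct host species.

    hosts: list of (genus, family, order, class) tuples, one per host species.
    """
    hosts = list(hosts)
    if len(hosts) < 2:
        return single_host_default
    pairs = list(combinations(hosts, 2))
    return sum(taxonomic_steps(a, b) for a, b in pairs) / len(pairs)


# Illustrative hosts only; not a host list for any lineage in this dataset.
example_hosts = [
    ("Junco", "Passerellidae", "Passeriformes", "Aves"),
    ("Zonotrichia", "Passerellidae", "Passeriformes", "Aves"),
    ("Passer", "Passeridae", "Passeriformes", "Aves"),
]
example_std = s_td(example_hosts)
print(example_std)
```

In this toy example one pair shares a family (2 steps) and two pairs share only an order (3 steps each), so the index falls between 2 and 3.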
