Spatiotemporal organization of cryptic North American Culex species along an urbanization gradient
Data files
Version: Sep 05, 2023 (files: 464.80 KB)
Abstract
Landscape heterogeneity creates diverse habitat and resources for mosquito vectors of disease. A consequence may be varied distribution and abundance of vector species over space and time, depending on niche requirements. We tested the hypothesis that landscape heterogeneity driven by urbanization influences the distribution and relative abundance of Culex pipiens, Cx. restuans, and Cx. quinquefasciatus, three vectors of West Nile virus (WNv) in the eastern North American landscape. We collected 9,803 cryptic Culex from urban, suburban, and rural sites in metropolitan Washington, District of Columbia, during the months of June-October, 2019-2021. In 2021, we also collected mosquitoes in April and May to measure early-season abundance and distribution. Molecular techniques were used to identify a subset of collected Culex to species (n = 2,461). Ecological correlates of the spatiotemporal distribution of these cryptic Culex were examined using constrained and unconstrained ordination.
Seasonality was not associated with Culex community composition in June-October over three years, but introducing April and May data revealed seasonal shifts in community composition in the final year of our study. Culex pipiens were dominant across site types, while Cx. quinquefasciatus were associated with urban environments, and Cx. restuans were associated with rural and suburban sites. All three species rarely coexisted.
Synthesis and applications: Our work demonstrates that human-mediated land-use changes influence the distribution and relative abundance of Culex vectors of WNv, even on fine geospatial scales. Site classification, percent impervious surface, distance to city center, and longitude predicted Culex community composition. We documented active Culex months before vector surveillance typically commences in this region, with Culex restuans being most abundant during April and May. Active suppression of Cx. restuans in April and May could reduce early enzootic transmission, delay the seasonal spread of WNv, and thereby reduce overall WNv burden. By June, the highest risk of epizootic spillover of WNv to human hosts may be in suburban areas with high human population density and mixed Culex assemblages that can transmit WNv between birds and humans. Focusing management efforts there may further reduce human disease burden.
README: Spatiotemporal organization of cryptic North American Culex species along an urbanization gradient; Arsenault-Benoit and Fritz 2023
From 2019-2021, the authors collected mosquitoes via gravid traps from 15 sites in Washington D.C. and Maryland. Cryptic Culex specimens were molecularly identified to assess patterns of community composition, habitat use, and seasonality among these species.
Description of the data and file structure
Data files in this dataset are in .xlsx or .csv format.
The contents of each dataset are as follows:
File Name: MD_3yr_totalcatch.xlsx
Description of file:
MD_3yr_totalcatch.xlsx: The number of unfed, cryptic Culex collected in each trap. This is the total number of Culex collected across sites and seasons, from which a subset was molecularly identified. Counts are integers. This dataset includes individual trap identification codes, site (nominal variable), site classifications (nominal variable), and collection dates.
Variables:
Trap_ID: Identification code for each trap, including year, a two-letter abbreviation for site, and collection event. Nominal variable.
CollNum: A unique identifier for each trap set; corresponds to the site and date of collection. Each individual collected in the same trap has the same trap ID. Nominal variable.
Date: Calendar date on which mosquitoes were collected.
Site: Site name where collection occurred (Factor)
CollEvent: Number of the trapping instance at each site throughout the year, from 1-11 (Apr-Oct) in 2021 and 3-11 (Jun-Oct) in 2019 and 2020. Ordered numerical variable.
Prim_Class: Nominal factor variable determined by visual classification in combination with the quantifiable urbanization measures used. Sites are classified as either urban, suburban, or rural. Site classifications were assigned at the beginning of the study and remained consistent.
Sec_Class: Nominal factor variable determined by visual classification for land use in combination with the quantifiable urbanization measures used. Sites are classified as either commercial, residential, agricultural, or natural. Site classifications were assigned at the beginning of the study and remained consistent.
Season: An ordinal factor variable separating collections into seasonal bins: very early (April and May), early (Jun 1-Jul 8), mid (Jul 9-Aug 20), late (Aug 20-Oct 1).
Unfed_PerTrap: Integer variable indicating the number of unfed cryptic Culex (target taxa for this study) collected in each trap at the indicated site and date.
Year: Ordered numerical variable indicating whether a collection was done in 2019, 2020, or 2021.
WOY: Week of the year; ordinal, with week 1 corresponding to the week containing Jan. 1.
*This file contains NAs. These are due to either trap failures (traps malfunctioned or were tampered with) or instances where we could not access sites in 2020 due to COVID-19 restrictions.
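The WOY definition (week 1 is the week containing Jan. 1) does not say which weekday starts a week. A minimal Python sketch that reproduces the definition, assuming weeks start on Sunday (an assumption; pass week_start=0 for Monday). Check its output against the WOY column before relying on it.

```python
from datetime import date


def week_of_year(d: date, week_start: int = 6) -> int:
    """Week number in which week 1 is the week containing Jan 1.

    week_start follows Python's weekday() convention (Monday=0 ... Sunday=6).
    Sunday (6) is an ASSUMED default; the README does not state it.
    """
    jan1 = date(d.year, 1, 1)
    # Days between the first day of Jan 1's week and Jan 1 itself (0-6).
    offset = (jan1.weekday() - week_start) % 7
    day_index = (d - jan1).days  # 0 for Jan 1
    return (day_index + offset) // 7 + 1


print(week_of_year(date(2021, 6, 15)))  # 25 under the Sunday-start assumption
```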
File Name: summary.matrix.xlsx
Description of file:
A summary of total unfed Culex collected across site classification and season, generated from the "aggregate" function in R.
Variables:
Class: Primary site classification assigned to the site. Classes are either urban, suburban, or rural; this is a factor variable.
Season: An ordinal factor variable describing whether collection was in the early, mid, or late season.
Unfed_PerTrap: Mean number of unfed cryptic Culex collected in each class x season combination (continuous variable).
SE: Standard error of the mean of unfed cryptic Culex collected in traps of the same site class x season combination. Continuous variable.
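The summary above was produced with R's aggregate function. As an illustration of the same kind of class-by-season summary, here is a Python sketch that assumes SE is the sample standard deviation divided by the square root of n and that NA trap counts are dropped; neither choice is stated in this README, so this is not necessarily how the published values were computed.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def summarize(rows):
    """Mean and standard error of Unfed_PerTrap for each (Prim_Class, Season) cell.

    rows: iterable of dicts with keys Prim_Class, Season and Unfed_PerTrap.
    Rows whose count is None (NA, e.g. failed traps) are skipped (assumption).
    A cell with a single trap gets SE = nan, because SD needs n >= 2.
    """
    groups = defaultdict(list)
    for r in rows:
        if r["Unfed_PerTrap"] is None:
            continue
        groups[(r["Prim_Class"], r["Season"])].append(r["Unfed_PerTrap"])
    out = {}
    for key, vals in groups.items():
        se = stdev(vals) / sqrt(len(vals)) if len(vals) > 1 else float("nan")
        out[key] = (mean(vals), se)
    return out
```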
File Name: summatrix21.xlsx
Description of file:
A summary of unfed Culex per trap similar to summary.matrix.xlsx, but including the "very early season", for which collections were only done in 2021.
Variables:
Class: Primary site classification assigned to the site. Classes are either urban, suburban, or rural; this is a factor variable.
Season: An ordinal factor variable describing whether collection was in the very early, early, mid, or late season.
Unfed_PerTrap: Mean number of unfed cryptic Culex collected in each class x season combination (continuous variable).
SE: Standard error of the mean of unfed cryptic Culex collected in traps of the same site class x season combination. Continuous variable.
N: Total number of individuals collected in traps of the same site class x season combination (integer).
File Name: MD_3yr_finals.xlsx
Description of file:
Each individual included in the analysis, with a unique identifier, molecular identification and collection information.
Variables:
TotID_Yr: Identification code for each trap, including year, a two-letter abbreviation for site, and collection event. Nominal variable.
TotID: Identification code for each trap, including a two-letter abbreviation for site, and collection event. Nominal variable.
CollectionNum: The collection number is a unique identifier for a trap, so individuals collected in a single trap share a collection number (nominal).
UniqueID: An identifier given to each individual specimen processed. ID begins with year, then collection number corresponding to site and date, and an alphanumeric code for the individual ID.
Location: Collection site name (factor)
Final_ID: Final species identification (pipiens, restuans, or quinquefasciatus) for each individual, determined by PCR (factor).
Prim_Class: Nominal factor variable determined by visual classification in combination with the quantifiable urbanization measures used. Sites are classified as either urban, suburban, or rural. Site classifications were assigned at the beginning of the study and remained consistent.
Sec_Class: Nominal factor variable determined by visual classification for land use in combination with the quantifiable urbanization measures used. Sites are classified as either commercial, residential, agricultural, or natural. Site classifications were assigned at the beginning of the study and remained consistent.
Season: An ordinal factor variable separating collections into seasonal bins: very early (April and May), early (Jun 1-Jul 8), mid (Jul 9-Aug 20), late (Aug 20-Oct 1).
CollDate:Calendar date when collection occurred (month/day/year)
Coll_Event: Number of the trapping instance at each site throughout the year, from 1-11 (Apr-Oct) in 2021 and 3-11 (Jun-Oct) in 2019 and 2020. Ordered numerical variable.
WOY:week of the year, and is ordinal with week 1 corresponding to the week containing Jan. 1.
Year:Year specimens were collected, from 2019-2021
File Name: pq_HWE.xlsx
Description of file:
Used for Hardy-Weinberg equilibrium analysis of specimens molecularly identified as Culex pipiens or Culex restuans at a single locus. These data give genotype counts for pipiens homozygotes (pp), quinquefasciatus homozygotes (qq), and heterozygotes (pq) by year, site, and season.
Variables:
Year: Year of collection from 2019-2021 (ordered factor)
Site: Name of collection site (factor)
Season: Season (very early, early, mid, late) when collection occurred
pp: number of individuals in each year, site, and season combination homozygous for the pipiens allele (integer)
pq: number of heterozygous individuals in each year, site, and season combination (integer)
qq: number of individuals in each year, site, and season combination homozygous for the quinquefasciatus allele (integer)
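A Hardy-Weinberg check on one row of genotype counts can be sketched as a classical chi-square goodness-of-fit test with 1 degree of freedom. This is a generic illustration assuming the three genotype-count columns are pp, pq and qq; it is not necessarily the test the authors used, and with the small per-site counts likely here an exact test may be more appropriate.

```python
import math


def hwe_chisq(pp: int, pq: int, qq: int):
    """Chi-square test of Hardy-Weinberg proportions at one biallelic locus.

    pp, pq, qq: counts of pipiens homozygotes, heterozygotes and
    quinquefasciatus homozygotes for one year/site/season row.
    Returns (p, expected_counts, chi_square, p_value), using 1 degree of freedom.
    """
    n = pp + pq + qq
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * pp + pq) / (2 * n)  # frequency of the pipiens allele
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (pp, pq, qq)
    # Cells with zero expected count (a monomorphic sample) contribute nothing.
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return p, expected, chi_square, p_value
```

For example, hwe_chisq(25, 50, 25) matches the expected proportions exactly and gives a chi-square of 0.0 and a p-value of 1.0.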
File Name: env_phen.xlsx
Description of file:
Environmental data, primarily generated using ArcGIS tools, associated with each collection.
Variables:
Tot_ID_Yr: is a combination of the two-letter abbreviation for site and collection event for each trap, with year as well (factor)
TotID: is a combination of the two-letter abbreviation for site and collection event for each trap (factor)
CollectionNum: The collection number associated with each trap that was set (values are numeric, but the variable is meant to be nominal).
Location: Collection site name (factor)
Prim_Class: Nominal factor variable determined by visual classification in combination with the quantifiable urbanization measures used. Sites are classified as either urban, suburban, or rural. Site classifications were assigned at the beginning of the study and remained consistent.
Sec_Class: Nominal factor variable determined by visual classification for land use in combination with the quantifiable urbanization measures used. Sites are classified as either commercial, residential, agricultural, or natural. Site classifications were assigned at the beginning of the study and remained consistent.
Season: An ordinal factor variable separating collections into seasonal bins: very early (April and May), early (Jun 1-Jul 8), mid (Jul 9-Aug 20), late (Aug 20-Oct 1).
CollDate:Calendar date when collection occurred (month/day/year)
Coll_Event: is the number in the annual sequence of traps (1-11) that corresponds with the trap date, and is ordinal.
WOY: is week of the year, and is ordinal with week 1 corresponding to the week containing Jan. 1.
Year: calendar year from 2019-2021 (ordered factor).
X: site latitude
Y: site longitude
LandCover_Maj: the land cover classification code number, based on the NLCD 2016 datalayer and legend, that makes up the majority or greatest number of pixels within a 0.5km buffer of each trap site (nominal).
LandCov_Name: the land cover classification that corresponds to the codes in the LandCov_Maj variable.
PerentImp_Mean: mean percent impervious surface, within a 0.5km buffer of the trap site, based on NLCD percent impervious surface data layer (continuous from 0-100%)
PerentImp_Med: median percent impervious surface, within a 0.5km buffer of the trap site, based on NLCD percent impervious surface data layer (integer from 0-100%)
Percent_tree: the percent tree cover, based on NLCD 2016 data, in a 0.5km buffer radius around each trap site (continuous from 0-100%).
Pop: human population counts in a 0.5km radius buffer around trap sites from the 2010 US Census (integer)
House: Number of housing units within a 0.5km radius buffer around trap sites based on the 2010 US Census
DCC: Distance to city center: linear distance to Washington, DC from each trap site in km (continuous)
WaterTbl: the depth required to reach surface water at each trap site based on USGS data (cm, continuous)
Elev: Site elevation above sea level (m, continuous)
NDVI: Normalized difference vegetation index: a measure of vegetation density at each site based on the reflectance of red and near-infrared light. These are mean values within a 0.5km radius buffer of each site (continuous on a 0-200 scale).
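For reference, the standard NDVI definition from red and near-infrared reflectance is given below. It ranges from -1 to 1, so the 0-200 values in this dataset are presumably a rescaled version; the exact transformation is not documented in this README.

```latex
\mathrm{NDVI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{Red}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{Red}}},
\qquad -1 \le \mathrm{NDVI} \le 1
```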
File Name: env_21.xlsx
Description of file:
Environmental data for each collection, similar to env_phen.xlsx, but only including collections from 2021, which were analyzed separately.
Variables:
TotID: is a combination of the two-letter abbreviation for site and collection event for each trap (factor)
CollectionNum: The collection number is associated with each trap that was set (values are numeric but the variable is meant to be nominal).
Location: Collection site name (factor)
Prim_Class: Nominal factor variable determined by visual classification in combination with the quantifiable urbanization measures used. Sites are classified as either urban, suburban, or rural. Site classifications were assigned at the beginning of the study and remained consistent.
Sec_Class: Nominal factor variable determined by visual classification for land use in combination with the quantifiable urbanization measures used. Sites are classified as either commercial, residential, agricultural, or natural. Site classifications were assigned at the beginning of the study and remained consistent.
Season: An ordinal factor variable separating collections into seasonal bins: very early (April and May), early (Jun 1-Jul 8), mid (Jul 9-Aug 20), late (Aug 20-Oct 1).
CollDate:Calendar date when collection occurred (month/day/year)
Coll_Event: is the number in the annual sequence of traps (1-11) that corresponds with the trap date, and is ordinal.
WOY: is week of the year, and is ordinal with week 1 corresponding to the week containing Jan. 1.
Year: calendar year from 2019-2021 (ordered factor).
X: site latitude
Y: site longitude
LandCover_Maj: the land cover classification code number, based on the NLCD 2016 datalayer and legend, that makes up the majority or greatest number of pixels within a 0.5km buffer of each trap site (nominal).
LandCov_Name: the land cover classification that corresponds to the codes in the LandCov_Maj variable.
PerentImp_Mean: mean percent impervious surface, within a 0.5km buffer of the trap site, based on NLCD percent impervious surface data layer (continuous from 0-100%)
PerentImp_Med: median percent impervious surface, within a 0.5km buffer of the trap site, based on NLCD percent impervious surface data layer (integer from 0-100%)
Percent_tree: the percent tree cover, based on NLCD 2016 data, in a 0.5km buffer radius around each trap site (continuous from 0-100%).
Pop: human population counts in a 0.5km radius buffer around trap sites from the 2010 US Census (integer)
House: Number of housing units within a 0.5km radius buffer around trap sites based on the 2010 US Census
DCC: Distance to city center: linear distance to Washington, DC from each trap site in km (continuous)
WaterTbl: the depth required to reach surface water at each trap site based on USGS data (cm, continuous)
Elev: Site elevation above sea level (m, continuous)
NDVI: Normalized difference vegetation index: a measure of vegetation density at each site based on the reflectance of red and near-infrared light. These are mean values within a 0.5km radius buffer of each site (continuous on a 0-200 scale).
File Name: dat_phen.csv
Description of file:
Molecular identification of each individual, as well as collection information for each individual.
Variables:
Tot_ID_Yr: is a combination of the two-letter abbreviation for site and collection event for each trap, with year as well (factor)
TotID: is a combination of the two-letter abbreviation for site and collection event for each trap (factor)
CollectionNum: The collection number is associated with each trap that was set (values are numeric but the variable is meant to be nominal).
UniqueID: An identifier given to each individual specimen processed. ID begins with year, then collection number corresponding to site and date, and an alphanumeric code for the individual ID.
Location: Collection site name (factor)
Final_ID: Final species identification (pipiens, restuans, or quinquefasciatus) for each individual, determined by PCR (factor).
Prim_Class: Nominal factor variable determined by visual classification in combination with the quantifiable urbanization measures used. Sites are classified as either urban, suburban, or rural. Site classifications were assigned at the beginning of the study and remained consistent.
Sec_Class: Nominal factor variable determined by visual classification for land use in combination with the quantifiable urbanization measures used. Sites are classified as either commercial, residential, agricultural, or natural. Site classifications were assigned at the beginning of the study and remained consistent.
Season: An ordinal factor variable separating collections into seasonal bins: very early (April and May), early (Jun 1-Jul 8), mid (Jul 9-Aug 20), late (Aug 20-Oct 1).
CollDate:Calendar date when collection occurred (month/day/year)
Coll_Event: is the number in the annual sequence of traps (1-11) that corresponds with the trap date, and is ordinal.
WOY: is week of the year, and is ordinal with week 1 corresponding to the week containing Jan. 1.
Year: calendar year from 2019-2021 (ordered factor).
File Name: dat_reg.csv
Description of file:
Counts of each species at each site and collection event, but summed across years.
Variables:
TotID: is a combination of the two-letter abbreviation for site and collection event for each trap (factor)
pip: Count of Cx. pipiens for each TotID (integer)
quinq: Count of individuals with Cx. quinquefasciatus ancestry for each TotID (integer)
rest: Count of Cx. restuans for each TotID (integer)
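Because TotID omits the year, grouping by TotID pools counts across years. A hedged Python sketch of how such a table could be built from per-individual records (for example, the TotID and Final_ID columns of dat_phen.csv), assuming Final_ID takes exactly the three values listed in this README; labels outside those three are skipped. This illustrates the idea and is not the authors' script; the TotID codes in the usage example are hypothetical.

```python
from collections import Counter, defaultdict

# Final_ID values listed in this README, mapped to the count-column names.
SPECIES_COLUMNS = {"pipiens": "pip", "quinquefasciatus": "quinq", "restuans": "rest"}


def counts_by_totid(individuals):
    """Sum identified individuals into pip/quinq/rest counts per TotID, pooling years.

    individuals: iterable of dicts with keys TotID and Final_ID.
    """
    table = defaultdict(Counter)
    for ind in individuals:
        col = SPECIES_COLUMNS.get(ind["Final_ID"])
        if col is None:
            continue  # unexpected or missing label: skipped (assumption)
        table[ind["TotID"]][col] += 1
    # Counter returns 0 for absent species, so every row gets all three columns.
    return {tot: {c: cnt[c] for c in ("pip", "quinq", "rest")} for tot, cnt in table.items()}


records = [
    {"TotID": "AB4", "Final_ID": "pipiens"},  # hypothetical site code
    {"TotID": "AB4", "Final_ID": "restuans"},
    {"TotID": "CD5", "Final_ID": "quinquefasciatus"},
]
print(counts_by_totid(records))
```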
File Name: dat_21.csv
Description of file:
Same as dat_reg.csv, but including collection events 1-3, corresponding to the "very early season", which were only collected in 2021. These data were analyzed separately.
Variables:
TotID: is a combination of the two-letter abbreviation for site and collection event for each trap (factor)
pip: Count of Cx. pipiens for each TotID (integer)
quinq: Count of individuals with Cx. quinquefasciatus ancestry for each TotID (integer)
rest: Count of Cx. restuans for each TotID (integer)
File Name: TotID_m.csv
Description of file:
A matrix generated in long form to represent the full species counts per site per collection event in the dataset.
Variables:
TotID: is a combination of the two-letter abbreviation for site and collection event for each trap (factor)
pip: Count of Cx. pipiens for each TotID (integer)
quinq: Count of individuals with Cx. quinquefasciatus ancestry for each TotID (integer)
rest: Count of Cx. restuans for each TotID (integer)
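The abstract reports that the three species rarely coexisted. A minimal Python sketch of one way to tally co-occurrence from a pip/quinq/rest count matrix such as TotID_m.csv, assuming a header row with exactly these column names and blank or NA cells for missing data (such rows are skipped rather than counted as zero). The site codes in the example are hypothetical.

```python
import csv
import io
from collections import Counter

SPECIES = ("pip", "quinq", "rest")


def cooccurrence_summary(csv_file):
    """Tally TotID rows by how many of the three species were detected (count > 0).

    Returns {number_of_species_present: number_of_rows}.
    """
    tally = Counter()
    for row in csv.DictReader(csv_file):
        values = [(row.get(col) or "").strip() for col in SPECIES]
        if any(v in ("", "NA") for v in values):
            continue  # missing count: skip the row instead of treating it as absence
        present = sum(1 for v in values if float(v) > 0)
        tally[present] += 1
    return dict(sorted(tally.items()))


# Hypothetical rows; for the real file use open("TotID_m.csv", newline="").
example = io.StringIO("TotID,pip,quinq,rest\nAB4,3,0,0\nAB5,2,1,0\nEF4,NA,0,2\n")
print(cooccurrence_summary(example))  # {1: 1, 2: 1}
```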
Codes and abbreviations:
Fifteen sites were used in this study, and each has a two-letter abbreviation commonly used throughout the dataset. Sites and site codes are described in env_phen.xlsx. See the description of env_phen.xlsx for variable abbreviations, format, and scale for environmental data.
We completed eight collection events per field season in 2019 and 2020, and eleven in 2021. Collection event (CE) numbers correspond to the season of collection: CE 1-3 are "very early season" (April and May), CE 4-6 are "early season" (June-mid-July), CE 7-9 are "mid season" (mid-July-late August), and CE 10 and 11 are "late season" (late August-early October). WOY = week of the year, with the week containing Jan 1 as week one.
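The collection-event-to-season mapping above can be written as a small lookup. A Python sketch; the returned labels are assumed spellings, so check them against the Season values actually stored in the files before joining on them.

```python
def season_for_event(ce: int) -> str:
    """Map a collection-event number (1-11) to its seasonal bin."""
    if 1 <= ce <= 3:
        return "very early"  # April and May (2021 only)
    if 4 <= ce <= 6:
        return "early"  # June to mid-July
    if 7 <= ce <= 9:
        return "mid"  # mid-July to late August
    if 10 <= ce <= 11:
        return "late"  # late August to early October
    raise ValueError(f"collection event out of range: {ce}")
```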
Abbreviations for species identifications are as follows:
pip: Culex pipiens
quinq: Culex quinquefasciatus (or having Cx. quinquefasciatus ancestry at the CQ11 locus)
rest: Culex restuans
Some cells throughout the dataset are blank or have NA values. These are due to either trap malfunctions or instances where traps could not be set due to COVID-19 restrictions. Each instance is described in Arsenault-Benoit and Fritz 2023.
Sharing/access Information
Data were generated by the authors. A description of the methodology and analysis, including a link to the script used, can be found in Arsenault-Benoit and Fritz, 2023.
Methods
Mosquito collection was completed by the authors over three years. Specimens were morphologically identified to the lowest taxon possible, and cryptic Culex were identified with molecular methods (PCR). Once individuals were identified, species distribution patterns were analyzed by site, by site class, and by season using R and associated R packages.