Data from: Leveraging satellite observations to reveal ecological drivers of pest densities across landscapes
Data files
Mar 19, 2024 version files 3.07 MB
- 10km_PrecipDaymet_NLCD__MODIS_cottonfields.csv
- 20km_PrecipDaymet_NLCD__MODIS_cottonfields.csv
- 30km_PrecipDaymet_NLCD__MODIS_cottonfields.csv
- CottonPestData.csv
- README.md
Abstract
Landscape ecologists have long suggested that pest abundances increase in simplified, monoculture landscapes. However, tests of this theory often fail to predict pest population sizes in real-world agricultural fields. These failures may arise not only from variations in pest ecology but also from the widespread use of categorical land-use maps that do not adequately characterize habitat availability for pests. We used 1163 field-year observations of Lygus hesperus (Western Tarnished Plant Bug) densities in California cotton fields to determine whether integrating remotely sensed metrics of vegetation productivity and phenology into pest models could improve pest abundance analysis and prediction. Because L. hesperus often overwinters in non-crop vegetation, we predicted that pest abundances would peak on farms surrounded by more non-crop vegetation, especially when the non-crop vegetation is initially productive but then dries down early in the year, causing the pest to disperse into cotton fields. We found that the effect of non-crop habitat on pest densities varied across latitudes, with a positive relationship in the north and a negative one in the south. Aligning with our hypotheses, models predicted that L. hesperus densities were 35 times higher on farms surrounded by high versus low productivity non-crop vegetation (EVI area 350 vs. 50) and 2.8 times higher when dormancy occurred earlier versus later in the year (May 15 vs. June 30). Despite these strong and significant effects, we found that integrating these remote-sensing variables into land-use models only marginally improved pest density predictions in cotton compared to models with categorical land cover metrics alone. Together, our work suggests that the remote sensing variables analyzed here can advance our understanding of pest ecology, but not yet substantively increase the accuracy of pest abundance predictions.
README: Data from: Leveraging satellite observations to reveal ecological drivers of pest densities across landscapes
https://doi.org/10.5061/dryad.r4xgxd2mz
These data include densities of the pest Lygus hesperus in early-season cotton from commercial fields in California between 1997 and 2008. They also include data at three radii (30 km, 20 km, 10 km) around each pest density replicate observation: NLCD crop and non-crop habitat, given as proportions of the total area; DAYMET precipitation, reported for the entire radius and split between crop and non-crop habitat; and MODIS variables, extracted from Google Earth Engine for the entire radius and split between crop and non-crop habitat.
A common replicate ID (replicate_num) links the pest observation data to the NLCD/DAYMET/MODIS data. Generalized additive models were used to test hypotheses about the proportion, phenology, and productivity of non-crop habitat as predictors of pest density in early-season cotton.
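The join described above can be sketched in Python with only the standard library. This is a minimal illustration, not the authors' R workflow: it assumes replicate_num is unique within each landscape file and spelled identically in both CSVs, and the function name join_on_replicate is hypothetical.

```python
def join_on_replicate(pest_rows, landscape_rows):
    """Inner-join pest observations to landscape covariates on replicate_num.

    Both arguments are iterables of dicts (e.g. csv.DictReader objects).
    Assumes replicate_num is unique within the landscape file.
    """
    # Index the landscape rows by replicate ID for constant-time lookup.
    by_replicate = {row["replicate_num"]: row for row in landscape_rows}
    joined = []
    for row in pest_rows:
        match = by_replicate.get(row["replicate_num"])
        if match is None:
            continue  # no landscape record for this field-year
        # Shared columns (e.g. crop_year, field_num_ucd) take the landscape value.
        joined.append({**row, **match})
    return joined
```

For example, open CottonPestData.csv and one of the radius files with csv.DictReader and pass both readers in. IDs are compared as strings, so normalize them first if the two files format them differently.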
Description of the data and file structure
Variables in CottonPestData.csv include all available information on Lygus hesperus densities sampled in May and June, as well as the planting date and yield of the cotton in each field-year replicate. The models in the associated code files cannot be run as provided, because latitude cannot be shared publicly for privacy reasons.
All variable names in the NLCD/DAYMET/MODIS files are consistent across radii and are only described for the 30km radius file.
Variables in CottonPestData.csv
RanchID: numeric values from 1-17 indicating the 17 different ranches from which pest densities were observed
crop_year: numeric value between 1997 and 2008 indicating the year in which the cotton was planted and the pest densities were observed
cotton_type: categorical variable, either Pima or Acala
replicate_num: integers identifying individual unique field-year observations
field_num_ucd: identifier for individual fields; the same field can appear in multiple years
actual_yield: cotton lint yield in bales/acre
may_june_total_insects: Pest densities were calculated by aggregating 50 swings of a sweep net across the top of the plant canopy. Usually 6-12 sweep samples were taken in a given field on a given date. Pest densities reflect all motile stages combined.
may_june_insect_sample_dates: number of times L. hesperus was sampled during May and June (range 1-13)
planting_date: date on which cotton was planted given in YYYY-MM-DD format
Variables in 30km_PrecipDaymet_NLCD__MODIS_cottonfields.csv
Reg: categorical variable with five levels based on the latitudinal gradient and grouping of field-year replicate locations (latitude ranges in decimal degrees): South (35.1-35.29), MidSouth (35.3-35.59), Mid (36.0-36.39), MidNorth (36.4-36.7), and North (37.0-37.3)
crop_year: numeric value between 1997 and 2008 indicating the year in which the cotton was planted and the pest densities were observed
replicate_num: integers identifying individual unique field-year observations
field_num_ucd: identifier for individual fields; the same field can appear in multiple years
precip_daymet_-121_152: total precipitation (mm) in a 30km radius around the field-year replicate location, from September 1 of the year before cotton planting through the end of May. Data from DAYMET.
precip_daymet_-121_152-nlcd-crop: precipitation (mm) in the NLCD-identified crop habitat within a 30km radius around the field-year replicate location, from September 1 of the year before cotton planting through the end of May. Data from DAYMET.
precip_daymet_-121_152-nlcd-non-crop: precipitation (mm) in the NLCD-identified non-crop habitat within a 30km radius around the field-year replicate location, from September 1 of the year before cotton planting through the end of May. Data from DAYMET.
precip_daymet-nearest_year: year from which the DAYMET data were extracted; it matches the cotton planting (crop) year 1:1 for 1997-2008
nlcd-crop-prop: the proportion of crop habitat in a 30km radius from the National Land Cover Database. Crop area was defined as either pasture/hay or cultivated crops (NLCD classes 81 and 82).
nlcd-noncrop-prop: the proportion of non-crop habitat in a 30km radius from the National Land Cover Database. Non-crop vegetation was defined as grasslands (71), shrub/scrub (52), forests (41, 42, 43), or wetlands (90, 95).
nlcd-nearest_year: the NLCD release year matched as closely as possible to the cotton planting year. Because NLCD data are not available for every year, each crop year was matched with the closest year for which data were available (crop years 1997-2002: NLCD 2001; 2003-2005: NLCD 2004; 2006-2007: NLCD 2006; 2008: NLCD 2008).
valid_modis_year: binary variable indicating whether MODIS data are available for the year in which pest densities were observed at a given field-year: 1 (valid; 2001-2008) or 0 (invalid; 1997-2000). When 0, all subsequent MODIS variables read "invalid".
Greenup30km: Green-up is a continuous variable defined as the day of the year (DOY) on which EVI amplitude first crosses 15% of the maximum EVI amplitude. Greenup30km is calculated for vegetation in the entire 30km radius.
MidGreenup30km: Mid green-up is a continuous variable defined as the day of the year (DOY) on which EVI amplitude first crosses 50% of the maximum EVI amplitude. MidGreenup30km is calculated for vegetation in the entire 30km radius.
Peak30km: Date when EVI amplitude reached the segment maximum. Peak30km is calculated for vegetation in the entire 30km radius.
Maturity30km: Date when EVI amplitude first crossed 90% of the maximum EVI amplitude. Maturity30km is calculated for vegetation in the entire 30km radius.
Senescence30km: Date when EVI amplitude last crossed 90% of the maximum EVI amplitude. Senescence30km is calculated for vegetation in the entire 30km radius.
MidGreendown30km: Mid green-down is a continuous variable defined as the day of the year (DOY) on which EVI amplitude last crosses 50% of the maximum EVI amplitude. MidGreendown30km is calculated for vegetation in the entire 30km radius.
Dormancy30km: Dormancy is a continuous variable defined as the day of the year (DOY) on which EVI amplitude last crosses 15% of the maximum EVI amplitude. Dormancy30km is calculated for vegetation in the entire 30km radius.
EVI_Minimum30km: A continuous spectral measure of the minimum Enhanced Vegetation Index (EVI) value (the minimum biological productivity of the vegetation observed on the ground) from all 16-day composite segments during the given season, across all 500m pixels in the entire 30km radius.
EVI_Amplitude30km: A continuous spectral measure of the biological productivity of the vegetation observed on the ground, reported as the maximum minus the minimum EVI value for a given 16-day composite segment, across all 500m pixels in the entire 30km radius.
EVI_Area30km: EVI area is the sum of the daily interpolated EVI amplitude values from green-up to dormancy across all 500m pixels in the entire 30km radius.
valid-modis-crop-prop: The proportion of the area in the 30km radius that MODIS identified as crop habitat
Greenup-crop: Green-up is a continuous variable defined as the day of the year (DOY) on which EVI amplitude first crosses 15% of the maximum EVI amplitude. Greenup-crop is calculated for vegetation only in the crop habitat within a 30km radius.
MidGreenup-crop: Mid green-up is a continuous variable defined as the day of the year (DOY) on which EVI amplitude first crosses 50% of the maximum EVI amplitude. MidGreenup-crop is calculated for vegetation only in the crop habitat within a 30km radius.
Peak-crop: Date when EVI amplitude reached the segment maximum. Peak-crop is calculated for vegetation only in the crop habitat within a 30km radius.
Maturity-crop: Date when EVI amplitude first crossed 90% of the maximum EVI amplitude. Maturity-crop is calculated for vegetation only in the crop habitat within a 30km radius.
Senescence-crop: Date when EVI amplitude last crossed 90% of the maximum EVI amplitude. Senescence-crop is calculated for vegetation only in the crop habitat within a 30km radius.
MidGreendown-crop: Mid green-down is a continuous variable defined as the day of the year (DOY) on which EVI amplitude last crosses 50% of the maximum EVI amplitude. MidGreendown-crop is calculated for vegetation only in the crop habitat within a 30km radius.
Dormancy-crop: Dormancy is a continuous variable defined as the day of the year (DOY) on which EVI amplitude last crosses 15% of the maximum EVI amplitude. Dormancy-crop is calculated for vegetation only in the crop habitat within a 30km radius.
EVI_Minimum-crop: A continuous spectral measure of the minimum Enhanced Vegetation Index (EVI) value (the minimum biological productivity of the vegetation observed on the ground) from all 16-day composite segments during the given season, across all 500m pixels identified as crop habitat within a 30km radius.
EVI_Amplitude-crop: A continuous spectral measure of the biological productivity of the vegetation observed on the ground, reported as the maximum minus the minimum EVI value for a given 16-day composite segment, across all 500m pixels identified as crop habitat within a 30km radius.
EVI_Area-crop: EVI area is the sum of the daily interpolated EVI amplitude values from green-up to dormancy across all 500m pixels identified as crop habitat within a 30km radius.
valid-modis-noncrop-prop: The proportion of the area in the 30km radius that MODIS identified as noncrop habitat
Greenup-noncrop: Green-up is a continuous variable defined as the day of the year (DOY) on which EVI amplitude first crosses 15% of the maximum EVI amplitude. Greenup-noncrop is calculated for vegetation only in the noncrop habitat within a 30km radius.
MidGreenup-noncrop: Mid green-up is a continuous variable defined as the day of the year (DOY) on which EVI amplitude first crosses 50% of the maximum EVI amplitude. MidGreenup-noncrop is calculated for vegetation only in the noncrop habitat within a 30km radius.
Peak-noncrop: Date when EVI amplitude reached the segment maximum. Peak-noncrop is calculated for vegetation only in the noncrop habitat within a 30km radius.
Maturity-noncrop: Date when EVI amplitude first crossed 90% of the maximum EVI amplitude. Maturity-noncrop is calculated for vegetation only in the noncrop habitat within a 30km radius.
Senescence-noncrop: Date when EVI amplitude last crossed 90% of the maximum EVI amplitude. Senescence-noncrop is calculated for vegetation only in the noncrop habitat within a 30km radius.
MidGreendown-noncrop: Mid green-down is a continuous variable defined as the day of the year (DOY) on which EVI amplitude last crosses 50% of the maximum EVI amplitude. MidGreendown-noncrop is calculated for vegetation only in the noncrop habitat within a 30km radius.
Dormancy-noncrop: Dormancy is a continuous variable defined as the day of the year (DOY) on which EVI amplitude last crosses 15% of the maximum EVI amplitude. Dormancy-noncrop is calculated for vegetation only in the noncrop habitat within a 30km radius.
EVI_Minimum-noncrop: A continuous spectral measure of the minimum Enhanced Vegetation Index (EVI) value (the minimum biological productivity of the vegetation observed on the ground) from all 16-day composite segments during the given season, across all 500m pixels identified as noncrop habitat within a 30km radius.
EVI_Amplitude-noncrop: A continuous spectral measure of the biological productivity of the vegetation observed on the ground, reported as the maximum minus the minimum EVI value for a given 16-day composite segment, across all 500m pixels identified as noncrop habitat within a 30km radius.
EVI_Area-noncrop: EVI area is the sum of the daily interpolated EVI amplitude values from green-up to dormancy across all 500m pixels identified as noncrop habitat within a 30km radius.
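Because every MODIS column holds the text "invalid" when valid_modis_year is 0, those columns will not parse as numbers directly. A minimal Python sketch for dropping the 1997-2000 field-years and converting the remaining MODIS cells to numbers, assuming rows come from csv.DictReader (so all values are strings). The helper names are hypothetical; any "invalid" cell left in a valid year becomes NaN, while other non-numeric text would raise a ValueError.

```python
import math

INVALID = "invalid"  # placeholder text used in MODIS columns for 1997-2000


def parse_modis_value(raw):
    """Return a MODIS cell as a float, or NaN for the 'invalid' placeholder."""
    raw = raw.strip()
    return math.nan if raw.lower() == INVALID else float(raw)


def valid_modis_rows(rows, modis_columns):
    """Keep field-years with valid_modis_year == 1 and parse their MODIS columns."""
    kept = []
    for row in rows:
        if float(row["valid_modis_year"]) != 1:
            continue  # 1997-2000: MODIS data not available
        parsed = dict(row)  # copy so the caller's row is left unchanged
        for col in modis_columns:
            parsed[col] = parse_modis_value(row[col])
        kept.append(parsed)
    return kept
```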
Code/Software
DRYADcommonCottonPackages.R loads the packages most commonly used across the analyses.
DRYAD_30km.R includes the R code used to test hypotheses 1-3 at the 30km radius (the structure is the same at the 20km and 10km radii, using the reference data at the appropriate scale).
DRYAD_CrossValidation30km.R includes the code used to test hypothesis 4 at 30 km, the only scale at which it was run.
Methods
Our cotton dataset encompassed 1487 field-year replicates of L. hesperus observations across 565 conventionally managed, irrigated cotton fields located within 18 ranches (i.e., fields managed by the same organization or grower, which may or may not be spatially contiguous). The study site network spanned ~280 km of California’s Central Valley, with fields in different ranches separated by an average of 100 km (interquartile range 31 km). Cotton was usually planted in April (N = 630 of the 872 field-years for which planting date was known). Pesticides were regularly applied to target L. hesperus, most often at peak trap capture (July) rather than in the early season studied here (see below). Latitude, longitude, year, and ranch name were available for all fields. Lygus hesperus densities were sampled in Gossypium hirsutum (“upland cotton”) and Gossypium barbadense (“Pima cotton”).
Pest densities were calculated by aggregating 50 swings of a sweep net across the top of the plant canopy. Usually, 6-12 sweep samples were taken in a given field on a given date. Fields were typically surveyed 3-8 times during this early-season period (range 1-13), and densities reflect all motile stages combined. Linear interpolation was used to convert successive samples into a mean density estimate: the area under the curve of L. hesperus density over time was calculated and divided by the number of days between the first and last sampling dates. Cotton lint yield was measured and reported once per field-year in bales/acre, which was converted to kilograms/hectare for this analysis.
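The two calculations in this paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: sample dates are given as day numbers (e.g., day of year), the bale is taken to be the standard U.S. 480 lb of lint (the source does not state the bale weight used), and field-years with a single sampling date simply return that date's mean, since the source does not say how they were handled. Function names are hypothetical.

```python
LB_PER_BALE = 480.0            # assumption: standard U.S. 480 lb bale of lint
KG_PER_LB = 0.45359237         # exact definition of the avoirdupois pound
HA_PER_ACRE = 0.40468564224    # international acre in hectares


def interpolated_mean_density(days, densities):
    """Time-weighted mean density over the sampling period.

    Linearly interpolates between successive samples, integrates the curve
    with the trapezoidal rule, and divides by the days between the first and
    last sampling dates. `days` are day numbers (e.g. day of year).
    """
    if len(days) != len(densities) or not days:
        raise ValueError("need matching, non-empty day and density lists")
    pairs = sorted(zip(days, densities))
    span = pairs[-1][0] - pairs[0][0]
    if span == 0:
        # Assumption: one sampling date has no interval to integrate,
        # so the mean of that date's values is returned.
        return sum(d for _, d in pairs) / len(pairs)
    area = 0.0
    for (d0, y0), (d1, y1) in zip(pairs, pairs[1:]):
        area += (d1 - d0) * (y0 + y1) / 2.0  # trapezoid between two samples
    return area / span


def bales_per_acre_to_kg_per_ha(bales_per_acre):
    """Convert lint yield from bales/acre to kg/ha (480 lb bale assumed)."""
    return bales_per_acre * LB_PER_BALE * KG_PER_LB / HA_PER_ACRE
```

With these constants, 1 bale/acre corresponds to roughly 538 kg/ha.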
The fractions of crop and non-crop habitat around each focal field were extracted from the National Land Cover Database (NLCD) by quantifying the fraction of 30 m pixels in each of the two cover categories within three buffer radii around each pest sampling site. Because NLCD data were not available for every year, each crop year was matched with the closest year for which data were available (crop years 1997-2002: NLCD 2001; 2003-2005: NLCD 2004; 2006-2007: NLCD 2006; 2008: NLCD 2008). Crop area was defined as either pasture/hay or cultivated crops (NLCD classes 81 and 82). Non-crop vegetation was defined as grasslands (71), shrub/scrub (52), forests (41, 42, 43), or wetlands (90, 95).
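The buffer extraction and year matching can be sketched in Python as below. This assumes the NLCD raster has already been clipped around the field and loaded as a grid of class codes on a projected 30 m grid, and that a pixel counts as inside the buffer when its centre lies within the radius. The tie-breaking rule (the earlier release wins) is inferred from the stated mapping for 2005 and 2007; all names are hypothetical.

```python
import math

# NLCD class groupings as defined in the text
CROP_CLASSES = {81, 82}                          # pasture/hay, cultivated crops
NONCROP_CLASSES = {71, 52, 41, 42, 43, 90, 95}   # grassland, shrub, forest, wetland
NLCD_RELEASES = (2001, 2004, 2006, 2008)


def nearest_nlcd_year(crop_year):
    """Closest NLCD release; ties (2005, 2007) go to the earlier release."""
    return min(NLCD_RELEASES, key=lambda y: (abs(y - crop_year), y))


def habitat_proportions(grid, center_row, center_col, radius_m, pixel_m=30.0):
    """Fractions of crop and non-crop pixels within radius_m of a focal pixel.

    grid is a list of rows of NLCD class codes on a projected grid; a pixel
    counts as inside the buffer when its centre lies within the radius.
    """
    total = crop = noncrop = 0
    for r, row in enumerate(grid):
        for c, code in enumerate(row):
            dist = math.hypot((r - center_row) * pixel_m, (c - center_col) * pixel_m)
            if dist > radius_m:
                continue
            total += 1
            if code in CROP_CLASSES:
                crop += 1
            elif code in NONCROP_CLASSES:
                noncrop += 1
    if total == 0:
        raise ValueError("no pixels fall inside the buffer")
    return crop / total, noncrop / total
```

A 30 km buffer contains roughly 3 million 30 m pixels, so a vectorized NumPy mask would be far faster in practice; the loop is kept here for readability.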
We extracted satellite-based climate and vegetation variables within the non-crop habitat. For precipitation in the non-crop habitat, we averaged the total annual precipitation reported by Daymet across all 1 km pixels falling within both the non-crop habitat and the relevant buffer radius. Daymet provides gridded estimates of daily near-surface meteorological conditions, statistically interpolated from weather station observations, so values are available even where no instruments exist.
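A minimal Python sketch of the non-crop precipitation average, assuming daily Daymet values for each 1 km pixel are already available as date-to-millimetre mappings, together with boolean masks for non-crop habitat and buffer membership. The September 1 start follows the methods; ending the window on May 31 (inclusive) follows the variable definitions and is otherwise an assumption. All names are hypothetical.

```python
from datetime import date


def season_window(crop_year):
    """September 1 of the previous year through May 31 of the crop year."""
    return date(crop_year - 1, 9, 1), date(crop_year, 5, 31)


def seasonal_total(daily_precip, crop_year):
    """Sum daily precipitation (mm) for one pixel over the season window.

    daily_precip maps datetime.date -> mm.
    """
    start, end = season_window(crop_year)
    return sum(mm for day, mm in daily_precip.items() if start <= day <= end)


def noncrop_mean_precip(pixel_series, in_noncrop, in_buffer, crop_year):
    """Average the seasonal totals of pixels that are non-crop AND in the buffer."""
    totals = [
        seasonal_total(series, crop_year)
        for series, noncrop, inside in zip(pixel_series, in_noncrop, in_buffer)
        if noncrop and inside
    ]
    return sum(totals) / len(totals) if totals else float("nan")
```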
For information on vegetation productivity throughout the growing season (Enhanced Vegetation Index [EVI] area) and vegetation phenology (dormancy day of the year), we acquired MODIS satellite products (MCD12Q2, Version 6) using the Land Cover Type 2 band. MODIS data are available at 500 m resolution from 2001 to 2019; therefore, earlier pest density data (1997-2000) were not analyzed. EVI area is the sum of daily estimates of EVI amplitude between green-up and dormancy. Green-up and dormancy were estimated as the days of the year on which the EVI amplitude first (green-up) and last (dormancy) crossed 15% of the maximum EVI amplitude. For both EVI area and dormancy, values were averaged across all 500 m pixels within both the non-crop habitat and the relevant buffer radius. To account for the seasonal precipitation of California's Mediterranean climate, the productivity metrics (precipitation in non-crop habitat and EVI area) and the phenology metric (day of year on which dormancy was reached in the non-crop habitat) were estimated using a start date of September 1 of the previous year (i.e., the beginning of the rainy season).
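The threshold definitions above can be illustrated with a simplified Python sketch on a single daily EVI curve. This is not the MCD12Q2 algorithm, which fits smoothed curves to the satellite time series before extracting dates; it only shows how the 15% crossings and the green-up-to-dormancy sum relate. It assumes the series starts on September 1 of the previous year and reads "EVI amplitude values" as EVI above the seasonal minimum; the function names are hypothetical.

```python
from datetime import date, timedelta


def phenology_metrics(evi, threshold=0.15):
    """Green-up index, dormancy index, and EVI area for one season.

    evi: daily EVI values, index 0 = first day of the season.
    Returns None if the curve is flat (no amplitude).
    """
    lo, hi = min(evi), max(evi)
    amplitude = hi - lo
    if amplitude <= 0:
        return None
    cutoff = lo + threshold * amplitude  # 15% of the seasonal amplitude
    above = [i for i, v in enumerate(evi) if v >= cutoff]
    greenup, dormancy = above[0], above[-1]  # first and last days at/above cutoff
    # Assumed reading: sum of daily EVI above the minimum, green-up to dormancy
    area = sum(v - lo for v in evi[greenup:dormancy + 1])
    return greenup, dormancy, area


def index_to_date(season_start, idx):
    """Calendar date of a day index counted from season_start."""
    return season_start + timedelta(days=idx)
```

For example, a dormancy index returned for a season starting on date(2004, 9, 1) can be converted with index_to_date to compare dates such as May 15 and June 30.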
All landscape, precipitation, and satellite observation data were extracted at multiple spatial scales (10 km, 20 km, and 30 km).