Data for: Remote sensing reveals the importance of adjacent seminatural habitat and irrigation method on aphid biocontrol in arid agroecosystems
Data files
Aug 16, 2024 version files (528.43 MB total)
- 2020_Exclosures.zip (523.89 MB)
- aphid_models_vegetation.csv (35.53 KB)
- aphid_models.csv (2.78 MB)
- Fall_counts.csv (8.34 KB)
- predator_models_vegetation.csv (22.66 KB)
- predator_models.csv (1.51 MB)
- README.md (17.33 KB)
- Spring_counts.csv (12.52 KB)
- superDoveSupervisedClassification_areaScore_fixedClass.csv (37.06 KB)
- superDoveSupervisedClassification_areaScore.csv (37.05 KB)
- updated_plant_ids.csv (7.22 KB)
- vegSurvey_fieldJoin.csv (18.37 KB)
- VegSurvey.csv (46.91 KB)
- vegSurveyClasses.csv (5.60 KB)
Abstract
Remote sensing and nuanced spatial analyses are increasingly used to understand the role of seminatural habitats in biocontrol, but knowing how to best leverage these tools is a persistent challenge. Furthermore, these tools are seldom applied to arid and semiarid agroecosystems, where irrigation often exaggerates the differences between crop fields and their interspaces. In arid systems, small weedy areas adjacent to watered fields are often one of the only sources of persistent vegetation; remote sensing and careful spatial analysis may be essential to capture the effect of this fine-scale variation on pest biocontrol.
Using irrigated alfalfa farms in the Great Basin Desert (Nevada, USA), we examined the role of land-cover in determining the degree of aphid pest pressure and biocontrol within alfalfa fields. We used a combination of field surveys, remote sensing, and spatial analysis to permit comparisons between different methods of assessing land-cover and the importance of spatial scale. Additionally, we experimentally manipulated predator densities to assess the combined direct and indirect effects of seminatural habitat on aphid biocontrol.
Although the influence of land-cover types varied between seasons and among arthropod taxa, our results indicate that our predictions were supported in a few cases—notably for coccinellid beetles, a key aphid predator in this system.
Our remote-sensing approach was more effective than conventional vegetation surveys in revealing the importance of spatial scale, the effect of flood irrigation, and the role of weedy patches within alfalfa fields.
Synthesis and applications. Weedy seminatural habitat near alfalfa fields, such as naturally occurring weedy areas along field margins and the banks of irrigation ditches, was positively associated with the density of a key aphid predator. Preserving these weedy areas can enhance aphid biocontrol, but farmers must consider potential tradeoffs between pest control and weed control.
README: Data for: "Remote sensing reveals the importance of adjacent seminatural habitat and irrigation method on aphid biocontrol in arid agroecosystems"
https://doi.org/10.5061/dryad.np5hqc00r
Raw data, code, and complete model selection tables for the article "Remote sensing reveals the importance of adjacent seminatural habitat and irrigation method on aphid biocontrol in arid agroecosystems" submitted to Journal of Applied Ecology in October 2023.
Description of the data and file structure
Raw data from arthropod collections are located in the following CSV files:
- Spring_counts.csv (for data collected in the spring season): 22 columns, with rows representing an individual subplot.
Column | Description |
---|---|
Number (int) | A unique identifier for each subplot. |
Vial (char) | The label of the vial where collected insects are stored. Vials are stored at the Pringle Laboratory, University of Nevada, Reno. |
Site (char) | The name of the farm. Occasional NA values represent missing or misrecorded data. These missing values are recovered through disambiguation and deduction in the "data_processing.R" script. |
Field (int) | Identifies a single field within the farm. Occasional NA values represent missing or misrecorded data. These missing values are recovered through disambiguation and deduction in the "data_processing.R" script. |
Plot (int) | Identifies a single plot within a field. Occasional NA values represent missing or misrecorded data. These missing values are recovered through disambiguation and deduction in the "data_processing.R" script. |
Treatment (char, factor) | One of three treatment conditions, or "Pre-" for pre-treatment arthropod collections. Full == full exclosure, Sham == canopy exclosure, Control == no exclosure. |
Sorter:Counter_2 (three columns, char) | Lab technician who sorted, counted, or checked counts. May be NA if only one counter checked their own work. |
Arachnida:Other (12 columns, double) | Counts of arthropod abundance within the subplot. |
Notes (char) | Miscellaneous notes. |
- Fall_counts.csv (for data collected in the fall season): 22 columns, with the same column specification as above.
- VegSurvey.csv: 889 rows, containing data from field vegetation surveys. Each row is a single plant species observed at a single survey plot.
Column | Description |
---|---|
survey_date (Date) | Date of the survey when the plant was observed. |
observers (char) | Initials of the field technicians who were present. |
site (factor) | The name of the farm where the survey was conducted. |
waypoint (char) | Corresponds to a GPS waypoint in the veg_survey_locs.csv file. |
species (char) | USDA NRCS code or other taxonomic name. |
count | Number of individual plants observed within the survey plot. Sometimes NA if the number of individuals present was too high to count manually, but also NA if the plot was bare (see "cover" below). |
cover | Visually-estimated percent cover within the survey plot. NA values for both count and cover indicate a survey plot with no vegetation. |
notes | Miscellaneous notes. |
key1:key2 | metadata |
- superDoveSupervisedClassification_areaScore.csv: 1 column. Each row represents a site.
- dataSeries (char): this column contains weighted area scores from processed satellite imagery and exported from Google Earth Engine. It must be parsed by the data_processing.R script to create analysis-ready land cover factors.
- superDoveSupervisedClassification_areaScore_fixedClass.csv: same as above, but with weighted area scores calculated when the land cover classification of alfalfa fields was completed manually, rather than by the random forest classifier. See the manuscript for further details.
- updated_plant_ids.csv: links to the veg_data.csv file and provides updated taxonomic names and NRCS codes for plants that were identified in the lab after field pressing. See "data_processing.R" for more details.
Column | Description |
---|---|
orig_code | Original taxonomic name listed under "species" column in VegSurvey.csv. |
new_code | Updated NRCS taxonomic code. |
code | Code written in plant press. |
loc | Location where plant was collected. NA if unknown or if collected in multiple locations. |
ssn | Season when plant was collected. NA if unknown or if collected in both seasons. |
id | Scientific name of identified plant. |
notes | Notes on identification. |
key | metadata notes: column names |
value | metadata notes: column descriptions |
- vegSurvey_fieldJoin.csv: links to the VegSurvey.csv and vegSurveyClasses.csv tables to provide location data for vegetation survey points.
Column | Description |
---|---|
OID | Identifier column from ArcMap. |
Name | The name of the waypoint, as stored in the .gpx. |
site | The farm where the point is located. |
DateTimeS | Date/timestamp for waypoint creation. |
CID | The land cover class of the point. 0 == field margin; 1-6 == land cover classes 0-5 in the early EE analysis; 7 == not a survey point (a wayfinding or study site point marked on the GPS). Not useful except for distinguishing survey locations from directional waypoints. |
POINT_X | Longitude (WGS84) |
POINT_Y | Latitude (WGS84) |
field_id | This column is used to add field identifiers to the vegSurvey dataset. It was created in ArcMap with the spatial join tool, using the parameter "CLOSEST". It is most relevant for relating margin points to the field they are associated with. |
key1 | metadata notes: column names |
key2 | metadata notes: column descriptions |
- vegSurveyClasses.csv: links to the VegSurvey.csv and vegSurvey_fieldJoin.csv tables to indicate whether vegetation survey points were randomly located or located in field margins. Only plots from field margins were used in the final analysis; other rows were discarded.
Column | Description |
---|---|
name | Corresponds to the "Name" column in the vegSurvey_fieldJoin.csv table. |
plotnum (char) | Corresponds to the "waypoint" column in the VegSurvey.csv table. This is a character (text) column that identifies GPS waypoints; its values are not necessarily numeric. |
site | Corresponds to "site" column in the VegSurvey.csv table. |
type | "Margin" or "Random". "Random" points were not used in our analysis. |
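
The margin-only filtering described above can be sketched as a join between VegSurvey.csv and vegSurveyClasses.csv. This is a minimal illustration in Python, not the authors' R workflow (see "data_processing.R" for that). It assumes that a survey plot is identified by the pair (site, waypoint), matched to (site, plotnum) in the classes table. The inline data are made-up stand-ins; only the column names come from the tables above.

```python
import csv
import io

# Tiny made-up stand-ins for VegSurvey.csv and vegSurveyClasses.csv.
# Only the column names come from the README; every value is hypothetical.
VEG_SURVEY = """site,waypoint,species,count,cover
FarmA,001,ATCA2,3,10
FarmA,002,BRTE,NA,40
FarmB,001,NA,NA,NA
"""
VEG_CLASSES = """name,plotnum,site,type
WP001,001,FarmA,Margin
WP002,002,FarmA,Random
WP001b,001,FarmB,Margin
"""


def margin_rows(survey_file, classes_file):
    """Return survey rows whose (site, waypoint) is classed as "Margin".

    Assumption: (site, plotnum) in the classes table matches
    (site, waypoint) in the survey table.
    """
    margin_keys = {
        (row["site"], row["plotnum"])
        for row in csv.DictReader(classes_file)
        if row["type"] == "Margin"
    }
    return [
        row
        for row in csv.DictReader(survey_file)
        if (row["site"], row["waypoint"]) in margin_keys
    ]


rows = margin_rows(io.StringIO(VEG_SURVEY), io.StringIO(VEG_CLASSES))
for row in rows:
    print(row["site"], row["waypoint"], row["species"])
```

Waypoints are kept as text rather than converted to numbers, since the plotnum and waypoint columns are character columns and are not necessarily numeric. To run this on the real files, replace the `io.StringIO` objects with opened file handles.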
Remote sensing imagery and the associated Google Earth Engine code are available in the following GEE repository:
- https://earthengine.googlesource.com/users/ansoncall/Exclosures_2020
- Note: A Google Earth Engine account is required; such accounts are free for non-commercial users.
During the analysis, a series of model selection tables was created. Only a subset of the candidate models was presented in the manuscript, but the full model selection tables are available here:
- aphid_models.csv:
- A table describing all of the initial candidate models of aphid abundance. Candidate models contain all combinations of land cover and predator variables, with a maximum of 4 variables in each model. All models are shown.
Column | Description |
---|---|
Season | Spring or Fall arthropod collection |
Intercept | Model intercept term. |
OtherAgriculture_Proximate:Weedy_None | Coefficients of predictor terms in the candidate models. The names of land cover predictors follow the convention LandCoverType_DistanceWeighting, where "LandCoverType" describes the category of land cover and "DistanceWeighting" describes the distance weighting function that was used to calculate the weighted area of that cover type. SimpsonDiversity_[DistanceWeighting] denotes Simpson's diversity of land cover types, after weighting land cover with the noted distance decay function. |
df | Degrees of freedom of the candidate model |
logLik | Log likelihood of the candidate model |
AICc | (corrected) Akaike's Information Criterion of the candidate model |
delta | ΔAICc for the candidate model. |
weight | Akaike weights for the candidate model. |
- aphid_models_vegetation.csv
- A table with the same structure as aphid_models.csv (above). Candidate models are based on the best model from "aphid_models.csv," with the addition of data from vegetation surveys. Predictors from vegetation surveys are listed with the suffix _FM to denote that these variables describe traits of the field margins.
- predator_models.csv
- A table with the same structure as aphid_models.csv (above), describing the initial candidate models of predator abundance. Candidate models contain all combinations of land cover and aphid variables, with a maximum of 2 variables in each model.
- predator_models_vegetation.csv
- A table with the same structure as aphid_models.csv (above). Candidate models are based on the best model from "predator_models.csv," with the addition of data from vegetation surveys. Predictors from vegetation surveys are listed with the suffix _FM to denote that these variables describe traits of the field margins.
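
For readers unfamiliar with the "delta" and "weight" columns, the sketch below shows how ΔAICc and Akaike weights are conventionally derived from a set of AICc values. It assumes the tables follow these standard definitions, and that weights are computed over whichever set of candidate models is being compared; the README does not state how the candidate set was grouped. The AICc values are made up, and `akaike_weights` is a hypothetical helper name, not a function from the authors' code.

```python
import math


def akaike_weights(aicc_values):
    """Return (deltas, weights) for a list of AICc values.

    delta_i  = AICc_i - min(AICc)
    weight_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2)
    """
    best = min(aicc_values)
    deltas = [value - best for value in aicc_values]
    relative_likelihoods = [math.exp(-0.5 * d) for d in deltas]
    total = sum(relative_likelihoods)
    weights = [r / total for r in relative_likelihoods]
    return deltas, weights


# Hypothetical AICc values for three candidate models.
aicc = [100.0, 102.0, 110.0]
deltas, weights = akaike_weights(aicc)
for value, delta, weight in zip(aicc, deltas, weights):
    print(f"AICc={value:.1f}  delta={delta:.1f}  weight={weight:.3f}")
```

The weights sum to 1 across the candidate set, so a weight can be read as the relative support for a model among the models compared.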
Code/Software
All of the above data and the associated R code (excepting satellite imagery and Google Earth Engine code) are also available as a zipped folder containing an RStudio project:
- 2020_Exclosures.zip
The purpose and functions of specific R scripts are described in the header of each individual script, but a general overview of key scripts is given here:
- data_processing.R
- Run this script first. Loads data from the /raw_data/ directory, corrects errors and joins tables, and produces tidy data for downstream analysis.
- data_analysis_landcover_models.R
- Fits candidate models and stores the details of model fits. Takes a long time to run.
- data_analysis.R
- The main analysis script. Uses tidy data and candidate model fits that are generated by data_processing.R and data_analysis_landcover_models.R.