Remote sensing and nuanced spatial analyses are increasingly used to understand the role of seminatural habitats in biocontrol, but knowing how to best leverage these tools is a persistent challenge. Furthermore, these tools are seldom applied to arid and semiarid agroecosystems, where irrigation often exaggerates the differences between crop fields and their interspaces. In arid systems, small weedy areas adjacent to watered fields are often one of the only sources of persistent vegetation; remote sensing and careful spatial analysis may be essential to capture the effect of this fine-scale variation on pest biocontrol.

Using irrigated alfalfa farms in the Great Basin Desert (Nevada, USA), we examined the role of land-cover in determining the degree of aphid pest pressure and biocontrol within alfalfa fields. We used a combination of field surveys, remote sensing, and spatial analysis to permit comparisons between different methods of assessing land-cover and the importance of spatial scale. Additionally, we experimentally manipulated predator densities to assess the combined direct and indirect effects of seminatural habitat on aphid biocontrol.

Although the influence of land-cover types varied between seasons and among arthropod taxa, our results indicate that our predictions were supported in a few cases—notably for coccinellid beetles, a key aphid predator in this system.

Our remote-sensing approach was more effective than conventional vegetation surveys in revealing the importance of spatial scale, the effect of flood irrigation, and the role of weedy patches within alfalfa fields.

Synthesis and applications. Weedy seminatural habitat near alfalfa fields, such as naturally occurring weedy areas along field margins and the banks of irrigation ditches, was positively associated with the density of a key aphid predator. Preserving these weedy areas can enhance aphid biocontrol, but farmers must consider potential tradeoffs between pest control and weed control.

https://doi.org/10.5061/dryad.np5hqc00r

Raw data, code, and complete model selection tables for the article “Remote sensing reveals the importance of adjacent seminatural habitat and irrigation method on aphid biocontrol in arid agroecosystems” submitted to Journal of Applied Ecology in October 2023.

Description of the data and file structure

Raw data from arthropod collections are located in the following CSV files:

Spring_counts.csv (for data collected in the spring season): 22 columns, with each row representing an individual subplot.

Column	Description
Number (int)	A unique identifier for each subplot
Vial (char)	The label of the vial where collected insects are stored. Vials are stored at the Pringle Laboratory, University of Nevada, Reno.
Site (char)	The name of the farm. Occasional NA values represent missing or misrecorded data. These missing values are recovered through disambiguation and deduction in the “data_processing.R” script.
Field (int)	Identifies a single field within the farm. Occasional NA values represent missing or misrecorded data. These missing values are recovered through disambiguation and deduction in the “data_processing.R” script.
Plot (int)	Identifies a single plot within a field. Occasional NA values represent missing or misrecorded data. These missing values are recovered through disambiguation and deduction in the “data_processing.R” script.
Treatment (char, factor)	One of three treatment conditions, or “Pre-” for pre-treatment arthropod collections. Full == full exclosure, Sham == canopy exclosure, Control == no exclosure.
Sorter:Counter_2 (three columns, char)	Lab technician who sorted, counted, or checked counts. May be NA if only one counter checked their own work.
Arachnida:Other (12 columns, double)	Counts of arthropods in each taxon within the subplot
Notes (char)	Miscellaneous notes.
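
As a rough illustration of how the count tables can be read outside of R, the following Python sketch selects the contiguous block of taxon columns (Arachnida through Other) by name and sums them per subplot. The sample rows, the reduced column set, and the placeholder taxon names TaxonB and TaxonC are hypothetical; treating NA in a count cell as missing is also an assumption. The authors' own cleaning is done in “data_processing.R”.

```python
import csv
import io

# Hypothetical miniature of Spring_counts.csv. The real file has 22 columns;
# "TaxonB" and "TaxonC" are placeholders for the taxon columns that sit
# between "Arachnida" and "Other".
SAMPLE = """Number,Vial,Site,Field,Plot,Treatment,Arachnida,TaxonB,TaxonC,Other,Notes
1,V001,FarmA,1,2,Pre-,3,0,12,1,
2,V002,NA,1,2,Full,0,NA,40,2,site not recorded
3,V003,FarmA,2,1,Control,1,5,7,0,
"""


def parse_cell(value):
    """Return None for NA or empty cells, otherwise the count as a float."""
    if value in ("NA", ""):
        return None
    return float(value)


def taxon_columns(header, first="Arachnida", last="Other"):
    """Select the contiguous block of taxon columns by name, inclusive."""
    start, stop = header.index(first), header.index(last)
    return header[start:stop + 1]


def total_arthropods(row, taxa):
    """Sum the taxon counts for one subplot, skipping missing cells."""
    counts = [parse_cell(row[t]) for t in taxa]
    return sum(c for c in counts if c is not None)


reader = csv.DictReader(io.StringIO(SAMPLE))
taxa = taxon_columns(reader.fieldnames)
rows = list(reader)

# Total arthropods per subplot, keyed by the unique subplot identifier.
totals = {int(r["Number"]): total_arthropods(r, taxa) for r in rows}

# Subplots whose Site is NA and would need to be recovered downstream.
missing_site = [int(r["Number"]) for r in rows if r["Site"] == "NA"]
```

Selecting the taxon columns by their first and last names mirrors the Arachnida:Other range notation used in the table above, so the sketch does not depend on knowing the intermediate column names.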

Fall_counts.csv (for data collected in the fall season): 22 columns, with the same column specification as above.
VegSurvey.csv: 889 rows, containing data from field vegetation surveys. Each row is a single plant species observed at a single survey plot.

Column	Description
survey_date (Date)	Date of the survey when the plant was observed.
observers (char)	Initials of the field technicians who were present.
site (factor)	The name of the farm where the survey was conducted.
waypoint (char)	Corresponds to a GPS waypoint in the veg_survey_locs.csv file.
species (char)	USDA NRCS code or other taxonomic name.
count	Number of individual plants observed within the survey plot. Sometimes NA if the number of individuals present was too high to count manually, but also NA if the plot was bare (see “cover” below).
cover	Visually-estimated percent cover within the survey plot. NA values for both count and cover indicate a survey plot with no vegetation.
notes	Miscellaneous notes.
key1:key2	metadata
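
Because NA carries two different meanings in the count column, it may help to make the rule explicit. The sketch below is one possible reading of the descriptions above, with Python's None standing in for NA; the function name, labels, and sample records are hypothetical and are not taken from the authors' scripts.

```python
def classify_record(count, cover):
    """Interpret the NA pattern of one VegSurvey.csv row (None stands for NA).

    - count and cover both missing: the survey plot had no vegetation
    - count missing, cover present: too many individuals to count manually
    - otherwise: a counted record (the README does not describe the case of
      a count without a cover estimate, so it is not treated specially here)
    """
    if count is None and cover is None:
        return "bare plot"
    if count is None:
        return "too many to count"
    return "counted"


# Hypothetical records; species codes are placeholders.
records = [
    {"species": "SPP1", "count": 4, "cover": 10.0},
    {"species": "SPP2", "count": None, "cover": 60.0},
    {"species": None, "count": None, "cover": None},
]
labels = [classify_record(r["count"], r["cover"]) for r in records]
```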

superDoveSupervisedClassification_areaScore.csv: 1 column. Each row represents a site.
- dataSeries (char): weighted area scores derived from processed satellite imagery and exported from Google Earth Engine. This column must be parsed by the data_processing.R script to create analysis-ready land cover factors.
superDoveSupervisedClassification_areaScore_fixedClass.csv: same as above, but with weighted area scores calculated when the land cover classification of alfalfa fields was completed manually, rather than by the random forest classifier. See the manuscript for further details.
updated_plant_ids.csv: links to the veg_data.csv file and provides updated taxonomic names and NRCS codes for plants that were identified in the lab after field pressing. See “data_processing.R” for more details.

Column	Description
orig_code	Original taxonomic name listed under “species” column in VegSurvey.csv.
new_code	Updated NRCS taxonomic code.
code	Code written in plant press.
loc	Location where plant was collected. NA if unknown or if collected in multiple locations.
ssn	Season when plant was collected. NA if unknown or if collected in both seasons.
id	Scientific name of identified plant.
notes	Notes on identification.
key	metadata notes: column names
value	metadata notes: column descriptions

VegSurvey_fieldJoin.csv: links to the VegSurvey.csv and vegSurveyClasses.csv to provide location data for vegetation survey points.

Column	Description
OID	Identifier column from ArcMap.
Name	The name of the waypoint, as stored in the .gpx.
site	The farm where the point is located.
DateTimeS	Date/timestamp for waypoint creation.
CID	The land cover class of the point. 0 == field margin; 1-6 == land cover classes 0-5 in early EE analysis; 7 == not a survey point (a wayfinding or study site point marked on the GPS). Not useful except for distinguishing survey locations from directional waypoints.
POINT_X	Longitude (WGS84)
POINT_Y	Latitude (WGS84)
field_id	This column is used to add field identifiers to the vegSurvey dataset. It was created in ArcMap using the spatial join tool, using parameter “CLOSEST”. It is most relevant for relating margin points to the field they are associated with.
key1	metadata notes: column names
key2	metadata notes: column descriptions
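
The CID coding can be unpacked with a small lookup. This is a sketch only: it assumes that CID values 1 through 6 map onto land cover classes 0 through 5 in order (that is, class = CID - 1), which the description above implies but does not state outright. The function name and return labels are hypothetical.

```python
def decode_cid(cid):
    """Translate a CID value into (point kind, land cover class or None).

    Assumes CID 1-6 correspond in order to land cover classes 0-5.
    """
    if cid == 0:
        return ("field margin", None)
    if 1 <= cid <= 6:
        return ("land cover class", cid - 1)
    if cid == 7:
        return ("not a survey point", None)
    raise ValueError(f"unexpected CID value: {cid!r}")


def is_survey_point(cid):
    """True for survey locations, False for wayfinding or study site marks."""
    return decode_cid(cid)[0] != "not a survey point"
```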

vegSurveyClasses.csv: links to the VegSurvey.csv and VegSurvey_fieldJoin.csv tables to provide information on whether vegetation survey points were randomly located or located in field margins. Only plots from field margins were used in the final analysis; other rows were discarded.

Column	Description
name	Corresponds to “Name” column in VegSurvey_fieldJoin.csv table.
plotnum (char)	Corresponds to “waypoint” column in the VegSurvey.csv table. This is a character (text) column that identifies GPS waypoints; its values are not necessarily numeric.
site	Corresponds to “site” column in the VegSurvey.csv table.
type	“Margin” or “Random”. “Random” points were not used in our analysis.
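
The link between VegSurvey.csv and vegSurveyClasses.csv can be sketched as a keyed lookup followed by a filter on “Margin”. The example below is illustrative rather than the authors' method: the rows are hypothetical, and joining on the pair (site, waypoint) assumes that waypoint names may repeat across farms. Waypoints are compared as text, consistent with the plotnum description above.

```python
# Hypothetical rows standing in for VegSurvey.csv (one row per species per plot).
veg_survey = [
    {"site": "FarmA", "waypoint": "012", "species": "SPP1", "cover": 20.0},
    {"site": "FarmA", "waypoint": "R3", "species": "SPP2", "cover": 5.0},
    {"site": "FarmB", "waypoint": "012", "species": "SPP1", "cover": 35.0},
]

# Hypothetical rows standing in for vegSurveyClasses.csv.
classes = [
    {"name": "012", "plotnum": "012", "site": "FarmA", "type": "Margin"},
    {"name": "R3", "plotnum": "R3", "site": "FarmA", "type": "Random"},
    {"name": "012", "plotnum": "012", "site": "FarmB", "type": "Margin"},
]

# Build a lookup keyed on (site, plotnum); keys stay as strings because
# waypoint identifiers are text, not numbers.
plot_type = {(c["site"], c["plotnum"]): c["type"] for c in classes}

# Keep only records from field-margin plots; unmatched rows are dropped.
margin_rows = [
    r for r in veg_survey
    if plot_type.get((r["site"], r["waypoint"])) == "Margin"
]
```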

Remote sensing imagery and the associated Google Earth Engine code are available in the following GEE repository:

https://earthengine.googlesource.com/users/ansoncall/Exclosures_2020
Note: A Google Earth Engine account is required; such accounts are free for non-commercial users.

During the analysis, a series of model selection tables were created. Only a subset of the candidate models were presented in the manuscript, but the full model selection tables are available here:

aphid_models.csv:
- A table describing all of the initial candidate models of aphid abundance. Candidate models contain all combinations of land cover and predator variables, with a maximum of 4 variables in each model. All models are shown.

Column	Description
Season	Spring or Fall arthropod collection
Intercept	Model intercept term.
OtherAgriculture_Proximate:Weedy_None	Coefficients of predictor terms in the candidate models. The names of land cover predictors follow the convention LandCoverType_DistanceWeighting, where “LandCoverType” describes the category of land cover and “DistanceWeighting” describes the distance weighting function that was used to calculate the weighted area of that cover type. SimpsonDiversity_[DistanceWeighting] denotes Simpson’s diversity of land cover types, calculated after weighting land cover with the noted distance decay function.
df	Degrees of freedom of the candidate model
logLik	Log likelihood of the candidate model
AICc	(corrected) Akaike’s Information Criterion of the candidate model
delta	ΔAICc for the candidate model.
weight	Akaike weights for the candidate model.
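
For readers who want to check the delta and weight columns, the standard definitions can be sketched as follows. This uses the textbook formulas for AICc, ΔAICc, and Akaike weights; it is not extracted from the authors' R code. The sample size n is not stored in the table and would have to be supplied, and the function names are hypothetical.

```python
import math


def aicc(log_lik, k, n):
    """Small-sample corrected AIC for a model with k parameters and n observations."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(aicc_values):
    """Return (deltas, weights) for a set of candidate models.

    delta_i  = AICc_i - min(AICc)
    weight_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2)
    """
    best = min(aicc_values)
    deltas = [a - best for a in aicc_values]
    relative = [math.exp(-0.5 * d) for d in deltas]
    total = sum(relative)
    return deltas, [r / total for r in relative]


# Illustrative AICc values only; these are not taken from aphid_models.csv.
deltas, weights = akaike_weights([100.0, 102.0, 110.0])
```

The weights sum to 1 across the models being compared, so they should be recomputed whenever the candidate set is subset.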

aphid_models_vegetation.csv
- A table with the same structure as aphid_models.csv (above). Candidate models are based on the best model from “aphid_models.csv,” with the addition of data from vegetation surveys. Predictors from vegetation surveys are listed with the suffix _FM to denote that these variables describe traits of the field margins.
predator_models.csv
- A table with the same structure as aphid_models.csv (above), describing the initial candidate models of predator abundance. Candidate models contain all combinations of land cover and aphid variables, with a maximum of 2 variables in each model.
predator_models_vegetation.csv
- A table with the same structure as aphid_models.csv (above). Candidate models are based on the best model from “predator_models.csv,” with the addition of data from vegetation surveys. Predictors from vegetation surveys are listed with the suffix _FM to denote that these variables describe traits of the field margins.
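
The phrase “all combinations ... with a maximum of N variables” can be illustrated with a short enumeration. This sketch only shows how such a candidate set grows; whether the authors' candidate sets include an intercept-only model, or exclude particular pairings of predictors, is not stated here, so the include_null switch is an assumption. Apart from OtherAgriculture_Proximate and Weedy_None, the predictor names are hypothetical.

```python
from itertools import combinations


def candidate_models(predictors, max_terms, include_null=True):
    """List every subset of predictors containing at most max_terms terms."""
    start = 0 if include_null else 1
    models = []
    for size in range(start, max_terms + 1):
        models.extend(combinations(predictors, size))
    return models


# Names follow the LandCoverType_DistanceWeighting convention; the last
# three are placeholders invented for this example.
predictors = [
    "OtherAgriculture_Proximate",
    "Weedy_None",
    "CoverTypeC_Proximate",
    "CoverTypeD_None",
    "PredatorDensity",
]

models_max2 = candidate_models(predictors, max_terms=2)
models_max4 = candidate_models(predictors, max_terms=4)
```

The candidate set grows quickly with the number of predictors and the term limit, which is consistent with the note that fitting all candidate models takes a long time.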

Sharing/Access information

[empty]

Code/Software

All of the above data and the associated R code (excepting satellite imagery and Google Earth Engine code) are also available as a zipped folder containing an RStudio project:

2020_exclosures.zip

The purpose and functions of specific R scripts are described in the header of each individual script, but a general overview of key scripts is given here:

data_processing.R
- Run this script first. Loads data from the /raw_data/ directory, corrects errors and joins tables, and produces tidy data for downstream analysis.
data_analysis_landcover_models.R
- Fits candidate models and stores the details of model fits. Takes a long time to run.
data_analysis.R
- The main analysis script. Uses tidy data and candidate model fits that are generated by data_processing.R and data_analysis_landcover_models.R.
