A beneficial arthropod dataset for agricultural landscapes in Western Canada and adjacent mountain ecosystems
Data files
Dec 13, 2024 version files (84.53 MB)
- beneficials_unified.csv (84.52 MB)
- README.md (15.88 KB)
Abstract
Among the largest drivers of global biodiversity trends are land use change and habitat loss. Through several studies of beneficial arthropods, we have compiled a spatially extensive passive-sampling arthropod dataset for Western Canada focused on landscape diversity. This dataset, collected from 2015 to 2019, consists of more than 200,000 specimens, five arthropod orders, and 26 families of either pollinators (Hymenoptera, Diptera) or natural enemies of pests (Coleoptera, Araneae, Opiliones). In the research that collectively makes up this dataset, there are 409 sampling sites in two focal areas: the Canadian Rockies (n=70) and the agriculturally intense Canadian prairies (n=339). Sampling in the montane region focused on Bombus species, while both pollinators and natural enemies were sampled in the prairies. Within the prairie region, there was also a focus on non-crop habitat that occurs within or adjacent to the annual crop fields and rangelands that dominate the region. These data can be used to investigate beneficial insect abundance and richness over a gradient of elevation, land cover, landscape diversity and climate.
README: A beneficial arthropod dataset for agricultural landscapes in Western Canada and adjacent mountain ecosystems
https://doi.org/10.5061/dryad.tmpg4f55s
This dataset contains 200,000+ records of either pollinators (Hymenoptera, Diptera) or natural enemies of pests (Coleoptera, Araneae, Opiliones), collected between 2015 and 2019 in Western Canada. All pollinator specimens were collected using either blue vane traps or coloured cup traps filled with propylene glycol, and natural enemies were collected using pitfall traps, also filled with propylene glycol. The trapping duration varies by trap but is about 14 days on average. Specimens were stored in ethyl alcohol, then washed, pinned, and identified to the lowest possible taxonomic level. Some specimens are identified to sex, and social bees are identified to caste when possible.
Description of the data and file structure
The original database was managed using three separate tables: a site table, a trap table, and an arthropod specimen table. The site data table contains location data on each site, and each site has a unique Beneficial Location ID (BLID) number. The trap data table contains data about each trap within each site, linked by BLID, and each trap has a unique Beneficial Trap ID (BTID). This table includes temporal information about trap deployment and collection, pass (the ordinal number of the visit), and information about the trap itself. The arthropod data table contains specimen information and uses BLID and BTID to link each specimen to the site and trap it was caught in. This table contains the taxonomic identification and sex/caste information, as well as the year of identification and the name of the person who determined the identification. Each specimen also has a Beneficial Protocol ID (BPID), which is used to link specimens with the same collection and trapping protocol. While this information overlaps with BTID, certain replicate names were used across studies in the same year, and the BPID allows for selecting data based on how they were collected.
Each table was separately checked for errors, cleaned, and filtered as needed before being combined into a unified database. Data fields from each data table that are redundant, irrelevant, collected for only a small subset of data, or not collected systematically were excluded from this dataset, but can be found on Zenodo with the R script used for data cleaning.
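The linkage between the three original tables can be illustrated with a minimal Python sketch. This is not the R script used to build the dataset; the rows, ID values, and the function name `unify` are hypothetical and only show how BLID and BTID connect a specimen to its trap and site.

```python
# Hypothetical miniature versions of the three original tables.
# All values below are invented for illustration only.
sites = [{"BLID": "10007", "lat": 50.1, "lon": -112.3}]
traps = [{"BTID": "10007-1-A-2017", "BLID": "10007", "trapType": "Blue vane"}]
specimens = [
    {"BBID": "1000001", "BTID": "10007-1-A-2017", "BLID": "10007", "genus": "Bombus"},
    {"BBID": "1000002", "BTID": "10007-1-A-2017", "BLID": "10007", "genus": "Apis"},
]


def unify(sites, traps, specimens):
    """Attach site and trap fields to every specimen row (one row per specimen)."""
    site_by_id = {s["BLID"]: s for s in sites}
    trap_by_id = {t["BTID"]: t for t in traps}
    unified = []
    for spec in specimens:
        site = site_by_id.get(spec["BLID"], {})
        trap = trap_by_id.get(spec["BTID"], {})
        # Specimen fields are written last so they win on any shared key.
        unified.append({**site, **trap, **spec})
    return unified


rows = unify(sites, traps, specimens)
```

Each resulting row carries the specimen fields together with the fields of the trap and site it links to, which mirrors the one-row-per-specimen layout implied by a unified table.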
Sharing/Access information
This is the only location where these data are currently accessible.
Code/Software
We wrote an R script to clean and prepare each original data table and then unite them into a single dataset. Each data table was checked for entry errors, which were then corrected individually. We removed data columns representing data that 1) are redundant after uniting the original data tables, 2) were only collected for a small subset of studies, or 3) are not relevant outside their original use (e.g. directions to sites, field notes, location of specimens within the physical collection).
Below are descriptions for the fields in each original data table. Fields that are bolded are included in the final unified dataset.
Site
BLID: Beneficial Location ID, unique code assigned to each sample site. Site IDs are always 5 digits; the first two digits were chosen in blocks starting from 100, and the last three digits are generally (but not always) multiples of 7.
lat: Latitude, using decimal degree coordinates.
lon: Longitude, using decimal degree coordinates.
aliasBLID1: initial site ID, later discarded for BLID.
aliasBLID2: initial site ID, later discarded for BLID.
locality: description of site using distances and geographic landmarks.
region: general area of the sampling site, named after nearby landmark towns.
elevation: elevation of the sampling site, in meters. Elevation is determined using a DEM GIS layer.
country: the country the sampling site is located in; all sampling sites are within Canada.
province: the Canadian province the sampling site is located in, either Alberta or BC (British Columbia).
siteType: a general classification of sampling site. Site types include:
- alpine: Typically have higher elevation (>1000 m) and are within the Rockies or Selkirks region. All alpine sites have BLIDs that start with “5”.
- ditch: sites along road margins. Within the Claresholm, Duchess, Grande Prairie, Kinsella, Red Deer, South Calgary, and Vauxhall-Taber regions.
- infield_control: sites in the middle of a crop or pasture field that has neither a pivot corner nor a wetland (non-irrigated). A small subset of sites, likely study-specific. Two each in the Kinsella, Duchess, and South Calgary regions, and one in Claresholm.
- infield_pivot: sites at which sampling stations were placed at multiple distances from a pivot corner (irrigated). A small subset of sites, likely study-specific. All but one in the Duchess region, with one in Claresholm.
- infield_wetland: sites at which sampling stations are placed at multiple distances from in-field wetlands, in crop or pasture fields. Within the Claresholm, Duchess, Kinsella, Red Deer, South Calgary, and Vermillion regions.
WLID: Weather station location ID, unique code assigned to the weather station nearest to the site. All sites with WLIDs also have a distToWLID_km value and are either ditch sites or infield_wetland sites. Sites that have WLIDs are within the South Calgary, Grande Prairie, and Taber-Vauxhall regions.
distToWLID_km: the distance in kilometers between a site and an IDed weather station, not entered for all sites.
irrigated: a binary variable that indicates irrigation status; 0 means no irrigation, 1 means irrigation.
expt_yield_2017: designates whether or not the site was part of the 2017 yield experiment.
expt_restore_2018: designates whether or not the site was chosen as part of the grassland-wetland complex restoration study.
provincial region: a classification of larger regions within the province of Alberta, including Central Alberta, Northern Alberta, Rocky Mountains, and Southern Alberta. Classification is determined by vernacular. This field was marked to be discarded.
expt_yield_Lakeland: sites that are part of the canola yield experiment done at Lakeland College in Vermilion, AB.
notes: includes notes on alternative field names.
Trap
BTID: Beneficial Trap ID, unique code assigned to each trap. The protocol for assigning the code is BLID-pass-rep-year.
BLID: unique site code, described above.
Collector: the person leading the field team in the year the trap was collected, not necessarily the actual trap collector (collectors can be on multiple field teams in certain years).
trapType: The type of trap deployed. Trap types include:
- Blue vane: A passive collection trap (SpringStar LLC, Woodinville, WA, USA) targeted towards pollinators, filled with propylene glycol.
- Colored Cups: A passive collection trap targeted towards pollinators, consisting of 12 oz. white plastic cups with interiors painted fluorescent blue or fluorescent yellow or left unpainted. The number and color of cups vary across studies, but a set typically includes three cups in total, one of each color.
- Pit Fall: a passive trapping method, targeted towards ground-based natural enemies. These traps were 528 ml Solo® cups buried into the ground up to the rim, then filled halfway with propylene glycol and covered with 2 cm wire mesh to exclude small vertebrates.
pass: A number that designates how many times the site has been consecutively visited. The number of visits to a site is determined by the study and protocol, but the range is 0 (likely indicates the trap set up, no collection) to 8.
replicate: a designation of trap replicate and/or treatment, depending on the protocol. See the table of BPIDs and corresponding protocol for more information on replicate meanings.
startYear: the calendar year in which the trap was deployed.
startMonth: the calendar month in which the trap was deployed.
startDay: the calendar day on which the trap was deployed.
startHour: the hour (using the 24 hour system) in which the trap was deployed.
startMinute: the minute in which the trap was deployed.
endYear: the calendar year in which the trap was collected.
endMonth: the calendar month in which the trap was collected.
endDay: the calendar day on which the trap was collected.
endHour: the hour (using the 24 hour system) in which the trap was collected.
endMinute: the minute in which the trap was collected.
canolaAdjacent: whether the trap was adjacent to canola. Either Yes, No, or NA.
percentCanolaBloom: a measure of canola bloom, often recorded as a range. Interpreting these values may require knowledge of the original collection protocol.
floralAdjacentNotes: notes on flower species adjacent to the trap. Not all traps have these.
deployedNotes: notes recorded at trap deployment, in some cases describing the trap surroundings.
retrivalNotes: notes on collecting the traps, such as whether they contained non-target organisms (like vertebrates) or whether traps were disturbed or diluted, and other notes.
lonTrap: the longitude of the trap. In some cases, traps have two sets of coordinates, one for site and one for trap, in other cases trap coordinates are the same as the site coordinates. All traps have site coordinates but not all have trap coordinates.
elevTrap: the elevation of the trap, in meters. Not all traps have an elevation.
latTrap: the latitude of the trap.
trapLoc: a general classification of trap location. Classifications include:
- alpine: traps used by D. Clake for their dissertation research. Corresponds to the “alpine” siteType in the site database.
- canola: traps are in canola fields under conventional management.
- ditch: traps are alongside roads. Corresponds to the “ditch” siteType in the site database.
- native: traps in native vegetation, like wetland and wetland-grassland complexes.
- pivot: traps located in the center of canola fields, near the pivot irrigation.
- wetland: traps located in wetlands
- wheat: traps located in wheat fields under conventional management.
adjCrop: the land cover on the side of the trap away from the road, i.e. the land cover closest or adjacent to the trap. Can be a crop type (alfalfa, barley, canola, flax), a land cover type (tree, urban, wetland), or other (gravel, stream, lawn, yard). Not a classification variable, generally descriptive.
oppCrop: the land cover on the opposite side of the road the trap lies along. Has the same general classifications as adjCrop, with some additional categories. When the land cover type was ambiguous, technicians chose the most dominant land cover.
adjCropBloom: an estimation of what percentage of adjCrop has bloomed. Only estimated when adjCrop is an actual crop category.
oppCropBloom: same estimation as above, but for oppCrop.
adjMowed: an estimation of what percentage of adjCrop has been mowed. Seems to be estimated across adjCrop classifications.
oppMowed: same estimation as above, but for oppCrop.
startjulian: the Julian day when the trap was deployed. Not calculated for all traps.
endjulian: the Julian day when the trap was collected. Not calculated for all traps.
midjulian: the Julian day of the midpoint between trap deployment and collection. Not calculated for all traps.
midyear: the calendar year at the midpoint between trap deployment and collection. Not calculated for all traps.
midmonth: the calendar month at the midpoint between trap deployment and collection. Not calculated for all traps.
midday: the calendar day at the midpoint between trap deployment and collection. Not calculated for all traps.
Deployedhours: the number of hours the trap was deployed. Not calculated for all traps.
dist: the distance in meters of the trap from the focal feature (see distFrom), along a transect. Pivot features have distances of 0, 25, 75, and 200 m. Wetland features have distances of 0, 2, 25, 50, 75, 100, and 200 m.
distFrom: the feature the dist is being measured from. Some features all have dist of 0: alpine, control, ditch.
lonSite: the longitude of the site, in decimal degrees. Should correspond to the lon column in the site datasheet.
latSite: the latitude of the site, in decimal degrees. Should correspond to the lat column in the site datasheet.
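The derived timing fields above (startjulian, endjulian, midjulian, midyear, midmonth, midday, Deployedhours) can be sketched from the start and end components as follows. This is an illustrative Python sketch, not the authors' code. It assumes that "Julian day" means day of year (1 to 366) and that the midpoint is taken halfway between the deployment and collection timestamps; the function name `trap_timing` and the example timestamps are hypothetical.

```python
from datetime import datetime


def trap_timing(start, end):
    """Derive day-of-year, midpoint, and deployed-hours values for one trap.

    Assumes 'Julian day' in the field descriptions means day of year.
    """
    deployed = end - start
    mid = start + deployed / 2  # timestamp halfway through the deployment
    return {
        "startjulian": start.timetuple().tm_yday,
        "endjulian": end.timetuple().tm_yday,
        "midjulian": mid.timetuple().tm_yday,
        "midyear": mid.year,
        "midmonth": mid.month,
        "midday": mid.day,
        "Deployedhours": deployed.total_seconds() / 3600,
    }


# Hypothetical trap: deployed 1 June 2017 at 10:00, collected 15 June 2017 at 10:00.
timing = trap_timing(datetime(2017, 6, 1, 10, 0), datetime(2017, 6, 15, 10, 0))
```

For this hypothetical trap the deployment spans days 152 to 166 of the year, with a midpoint on 8 June (day 159) and 336 deployed hours.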
Arthropod
BBID: A unique ID for each specimen, 7 digits (there are two specimens with 6-digit BBIDs). ID numbers are assigned using automated software (developed by Galpern) that reduced similarity between numbers; otherwise they were chosen sequentially.
BTID: unique trap ID number, described in trap database.
BLID: unique site ID number, described in site database.
pass: described in trap database.
rep: identical to the “replicate” column in trap database.
Year: the year the specimen was collected.
arthOrder: The arthropod order the specimen belongs to. The orders represented are Araneae, Coleoptera, Diptera, Hymenoptera, and Opiliones. All specimens are identified to order.
family: the taxonomic family the specimen belongs to. All specimens are IDed to family.
genus: the taxonomic genus the specimen belongs to. Not all specimens are IDed to genus; when the genus is unknown, they are labelled as [family]_unknown (example: anthomyiidae_unknown) or as NA when they are in EtOH and not pinned.
subgenus: the taxonomic subgenus the specimen belongs to. Not all specimens have a subgenus: there are no Araneae, Diptera, or Opiliones that have an identified subgenus. Among the Coleoptera and Hymenoptera, not all species identified to genus also have a subgenus, as not all genera can be divided into subgenera.
species: the taxonomic species the specimen belongs to. Not all specimens are identified to species: if a specimen is not identified to genus, the species is labelled with a morphospecies concept sp# or sp.# (example: sp9) or as [family]_unknown. If a specimen is identified to genus and to morphospecies, it is labelled [genus]_sp# or [genus]_sp.# (example: dialictus_sp7). When a specimen can be identified to genus but not to morphospecies, it is labelled as [genus]_unknown, or [genus]_ss, or spp.
length_mm: Bombus specimens collected in 2015 were measured lengthwise in millimeters. The exact measurement protocol is not documented here.
caste: Only applied to social bees of the Apis and Bombus genera. The three caste designations are worker, queen and male. Caste is determined through differences in size.
drawer: the number of the drawer in the University of Calgary Invertebrate Specimen Collection. No longer up to date.
tray: the number of the tray within the drawer.
detYear: the year the specimen was identified.
detBy: the lab tech who identified the specimen.
sex: The biological sex of the specimen, determined through taxonomic difference when applicable. M is for male, F is for female, and I stands for immature. No Coleoptera are identified to sex.
Notes: includes some notes on updating BBIDs and some floral visiting records.
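The labelling conventions described for the species field can be separated with a short pattern match. The Python sketch below is illustrative only: the category names ("missing", "morphospecies", "unresolved", "named") and the function `classify_species_label` are hypothetical, and the patterns assume labels follow exactly the forms listed above (sp#, sp.#, [genus]_sp#, [genus]_sp.#, [name]_unknown, [genus]_ss, spp).

```python
import re

# Matches sp9, sp.9, dialictus_sp7, dialictus_sp.7 (assumed label forms).
MORPHOSPECIES = re.compile(r"(?:[a-z]+_)?sp\.?\d+")


def classify_species_label(label):
    """Return a coarse, hypothetical category for a species-field value."""
    if label is None or label.strip() in ("", "NA"):
        return "missing"
    value = label.strip().lower()
    if MORPHOSPECIES.fullmatch(value):
        return "morphospecies"
    if value == "spp" or value.endswith(("_unknown", "_ss", "_spp")):
        return "unresolved"
    return "named"


labels = ["sp9", "dialictus_sp7", "anthomyiidae_unknown", "spp", "NA"]
categories = [classify_species_label(v) for v in labels]
```

A filter like this may be useful before computing species richness, since morphospecies and unresolved labels usually need different handling than named species. Any value that does not match one of the documented patterns falls through to "named", so unexpected labels should be checked by hand.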
New Fields
Origin: the organization responsible for data collection, in this case, the Agriculture, Biodiversity, and Conservation (ABC) lab at the University of Calgary, led by Dr. Paul Galpern.
Site_year: a combination of site and year that produces a code identifying each site within a given sampling year, since some sites were sampled across multiple years.
Duration_d: the trap duration, in days. Determined by the interval between the start and end dates listed for the trap, therefore traps that don’t have this information also lack durations.
BPID: Beneficial Protocol ID, used to describe the site section and trapping protocol associated with each trap. The BPID is generated as “trapLoc-year-replicate-trapType”.
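The derived fields in this section can be reproduced in a few lines. The Python sketch below is an illustration under stated assumptions, not the authors' R code: Duration_d is taken as the whole-day difference between the start and end dates, the BPID follows the “trapLoc-year-replicate-trapType” pattern given above, and the underscore separator in Site_year is a guess, since the exact format is not stated here. All function names and example values are hypothetical.

```python
from datetime import date


def duration_d(start, end):
    """Trap duration in days; None when either date is missing."""
    if start is None or end is None:
        return None
    return (end - start).days


def make_bpid(trap_loc, year, replicate, trap_type):
    """Build a protocol ID as trapLoc-year-replicate-trapType."""
    return f"{trap_loc}-{year}-{replicate}-{trap_type}"


def make_site_year(blid, year, sep="_"):
    """Combine site ID and year; the separator is an assumption."""
    return f"{blid}{sep}{year}"


# Hypothetical example values.
days = duration_d(date(2017, 6, 1), date(2017, 6, 15))
bpid = make_bpid("ditch", 2017, "A", "Blue vane")
site_year = make_site_year("10007", 2017)
```

Returning None for incomplete dates matches the note above that traps without start or end information also lack durations.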
Methods
The specimens in the dataset came from either montane or prairie sampling sites. The montane data were collected in the Canadian Rockies along hiking trails and used only blue vane traps (SpringStar LLC, Woodinville, WA, USA) meant for sampling arthropod pollinators. The prairie data were primarily collected in intensely agricultural areas and the surrounding non-crop land cover like native prairie grasslands, tree stands, and wetlands. In this area, there were two types of sampling stations: “ditch” stations along road margins that border agricultural areas, and in-field stations that are inside agricultural or non-crop areas. Prairie sites had either a blue vane trap, a pitfall trap, or both. Pitfall traps were made by burying a 528 ml Solo® cup into the ground up to the rim, filling it halfway with propylene glycol, and covering it with 2 cm wire mesh to exclude small vertebrates. Sites with both blue vane traps and pitfall traps also had a trio of colored cup traps (3 x 12 oz. white plastic cups with interiors painted fluorescent blue, fluorescent yellow, or left unpainted), which also primarily target pollinators, mounted on posts to be level with the crop canopy.
Across all this research, the deployment and retrieval of traps, as well as the handling of specimens, was uniform. The average trap duration is about two weeks, or 13.3 ± 2.8 d (mean ± SD), and traps were deployed from mid-May to the end of September. All traps were filled with propylene glycol to catch and preserve arthropods and, once collected, trap catch was emptied into WhirlPak® bags with 95% (for the montane sites) or 70% (for prairie sites) ethyl alcohol (EtOH) for transportation back to the lab. A subset of the montane bee collection is preserved in EtOH, but all other arthropods were washed, pinned, and labelled. All specimens are identified at minimum to the family level, 99.2% are identified to the genus level, and 92.4% are identified to the species level. Some genera are also assigned a subgenus when applicable.
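A summary such as the mean ± SD trap duration can be recomputed from per-trap durations. The Python sketch below uses invented duration values purely for illustration; it assumes a sample standard deviation and skips traps with missing durations, and the function name `summarize_durations` is hypothetical.

```python
from statistics import mean, stdev


def summarize_durations(durations):
    """Return (mean, sample SD) of trap durations, ignoring missing values."""
    known = [d for d in durations if d is not None]
    return mean(known), stdev(known)


# Hypothetical durations in days; None marks a trap without start/end dates.
avg, sd = summarize_durations([14, 13, None, 10, 16])
```

With these made-up values the result is 13.25 ± 2.5 d. Running the same calculation on one duration per unique trap, rather than one per specimen row, avoids weighting traps by how many specimens they caught.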
Specimens in the orders Hymenoptera (95.6%), Diptera (0.15%), Opiliones (19.9%), and Araneae (93%) are identified to sex, while the Coleoptera are not. Some (4.5%) Araneae and Opiliones are designated immature (I), rather than male (M) or female (F). The social bees of the Apis (23.9%) and Bombus (80%) genera are identified to caste (male, worker, queen) when possible. Each pinned specimen has a species label with taxonomic information and a location label with trap and site information. Each site has a unique 5-digit “beneficials location” ID number (BLID), and each trap has a unique “beneficials trap” ID number (BTID) that is generated as “BLID-pass-replicate-year”. The species label also has a unique 7-digit specimen ID (BBID). All specimens are kept in the Invertebrate Section of the Museum of Zoology at the Department of Biology, University of Calgary (DBUC).
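The “BLID-pass-replicate-year” pattern for BTIDs can be built and split apart as sketched below in Python. This is a hedged illustration: it assumes the BLID, pass, and year components contain no hyphens (the replicate is allowed to), and the example values and the names `BTIDParts`, `make_btid`, and `parse_btid` are hypothetical.

```python
from typing import NamedTuple


class BTIDParts(NamedTuple):
    blid: str
    pass_number: str
    replicate: str
    year: str


def make_btid(blid, pass_number, replicate, year):
    """Compose a trap ID as BLID-pass-replicate-year."""
    return f"{blid}-{pass_number}-{replicate}-{year}"


def parse_btid(btid):
    """Split a trap ID into its parts.

    BLID and pass are taken from the left and year from the right, so a
    replicate name containing a hyphen is kept intact (an assumption).
    """
    blid, rest = btid.split("-", 1)
    pass_number, rest = rest.split("-", 1)
    replicate, year = rest.rsplit("-", 1)
    return BTIDParts(blid, pass_number, replicate, year)


# Hypothetical trap ID.
btid = make_btid("10007", 2, "A", 2017)
parts = parse_btid(btid)
```

All parts are returned as strings so that leading zeros or non-numeric replicate names are preserved; convert pass and year to integers only after checking the values in the actual data.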