Data from: Associations between soil characteristics and ground-nesting bees on farms
Data files
May 29, 2025 version files 1.43 MB
● DATA_BEE.csv (1.29 MB)
● DATA_BOWLS.csv (76.65 KB)
● DATA_FARM.txt (13.40 KB)
● DATA_SOIL_FINAL.txt (20.20 KB)
● DATA_TEXTURE18.txt (7.62 KB)
● DATA_TEXTURE19.txt (10.49 KB)
● GENUS_NESTING.csv (576 B)
● README.md (15.52 KB)
Abstract
Much of the world's agricultural production depends on pollination services provided by wild bees. At the same time, agriculture changes landscapes in ways that can alter bee habitat. However, little is known about the nesting habitat requirements of the many bee species that nest underground. Here, we asked which soil factors influence the abundance, diversity, and community composition of ground-nesting bees in agroecosystems around Ottawa, Canada. We measured soil characteristics (texture, hardness, slope, ground cover) and sampled bee communities at 131 plots on 35 farms over two years. We identified the ground-nesting bees to species. We collected 8,661 ground-nesting bees representing 100 species. Ground-nesting bee abundance and species richness were higher with increased percentages of bare ground and sand, while Simpson’s diversity was negatively associated with slope. The abundance of non-ground-nesting (cavity-nesting) bees was not related to any measured soil properties, suggesting that the associations between soil variables and ground-nesting bees reflect direct effects of soils on these bees, rather than indirect effects mediated by unmeasured variables. Only a small proportion of the variance in ground-nesting bee community composition was explained by soil factors; however, sand percentage, slope, soil compaction, and bare ground were all significant predictors, reflecting the fact that relationships between soil predictors and ground-nesting bee taxa were species-specific. Compared to floral resources, soils have been neglected as components of bee habitat quality, but understanding the soil characteristics preferred by ground-nesting bees can assist in efforts to protect this important group of pollinators.
https://doi.org/10.5061/dryad.zgmsbccnx
Description of the data and file structure
Files and variables
File: DATA_BOWLS.csv
Description: This file includes information on the number of pan traps (= bee bowls) deployed at each sampling location, the dates on which they were deployed, and the geographic coordinates of each sampling location.
Variables
● year: Year of data collection (2018 or 2019).
● farm: Full name of farm on which data were collected.
● site: Three- or four-letter code for the farm name. Three-letter codes are used for 2019 sampling locations and four-letter codes for 2018 sampling locations.
● plot: Plot number, from one to three. See Figure 1 of manuscript text for explanation. With one exception, each site included two or three individual sampling plots. The one exception is site LAVO, with a single plot. Farms Lavoie Sylvain (LAVO, with one plot) and Pine Hill Orchard (PINE, with two plots) are adjacent (within 1 km), and the two sites were therefore treated as one site with three plots for data collection and analysis. Specifically, plot 1 of LAVO was treated as a third plot of site “PINE” and is thus recoded as site code = “PINE3” in the analysis scripts.
● round: The sampling occasion within each year. Each site was sampled approximately monthly for five months in each year, so round 1 = May, round 2 = June, etc. At four sites in 2018 (CANN, GLEN, ORLE, ZAND), a storm during round 4 overturned numerous bowls, so a supplemental sampling round (“4e”) was added.
● site code: Concatenation of site and plot number. Note that plots JUN1-JUN3 are interpreted as dates if the spreadsheet is opened in Excel.
● survey code: Concatenation of site code and round number.
● date1: Date on which pan traps (= bee bowls) were deployed.
● date2: Date on which pan traps were collected.
● nb_days: Number of days over which pan traps were deployed (date2 – date1).
● location: Name of nearest municipality.
● lat: Latitude of the specific plot.
● long: Longitude of the specific plot.
● bees: Number of bees collected.
● bowls: Number of bowls (pan traps) collected. Initially (in the first sampling round of 2018), 9 bowls were deployed per site, meaning 3 or 6 bowls per plot, depending on whether there were two or three plots per site. Subsequently, nine bowls were always deployed per site (three white, three yellow, three blue), but some were overturned or lost, so the number here can be less than 9.
● Whi NC: Number of white bowls not collected. Possible values: 1, 2, 3 or blank.
● Yel NC: Number of yellow bowls not collected. Possible values: 1, 2, 3 or blank.
● Blu NC: Number of blue bowls not collected. Possible values: 1, 2, 3 or blank.
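The site code construction and the LAVO1-to-PINE3 recoding described above can be sketched as follows. This is only an illustration in Python, not the authors' R code: the two sample rows are hypothetical, the helper name `analysis_site_code` is invented here, and the real file has many more columns. Python's `csv` module keeps every field as text, so codes such as JUN2 are not converted to dates the way they are in Excel.

```python
import csv
import io

# Hypothetical two-row excerpt; the real DATA_BOWLS.csv has more columns and rows.
SAMPLE = """year,site,plot,round
2018,LAVO,1,4
2019,JUN,2,1
"""


def analysis_site_code(site: str, plot: str) -> str:
    """Concatenate site and plot, folding LAVO plot 1 into PINE as PINE3."""
    code = f"{site}{plot}"
    return "PINE3" if code == "LAVO1" else code


# csv.DictReader returns every field as a string, so "JUN2" stays text.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
codes = [analysis_site_code(row["site"], row["plot"]) for row in rows]
print(codes)  # ['PINE3', 'JUN2']
```

When opening the real file in a spreadsheet program, importing the code columns as text has the same protective effect.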
File: DATA_FARM.txt
Description: This file includes geographic coordinates of farms and individual sampling locations (plots) as well as information on crop type and farm management.
Variables
● year: Year of data collection (2018 or 2019).
● farm: Full name of farm on which data were collected. In this file, farm Lavoie Sylvain has been recoded as Pine Hill Orchard (see site_code below).
● plot: Plot number, from one to three. Each farm included two or three individual sampling plots.
● site_code: Concatenation of three- or four-letter site code and plot number. In this file, farm Lavoie Sylvain (LAVO) has been recoded as Pine Hill Orchard (PINE), and the LAVO1 plot recoded as PINE3. Note that plots JUN1-JUN3 are interpreted as dates if the spreadsheet is opened in Excel.
● city: Name of nearest municipality.
● crop: Type of crop grown at plot location in the year of data collection. Possible values are: berries, corn/soy/wheat, orchard, pasture/forages, squash, strawberries, veggies.
● farm_management: Type of farm management. Possible values are: conventional, organic.
● lat: Latitude of the specific plot, in decimal degrees.
● long: Longitude of the specific plot, in decimal degrees.
● lat_farm: Latitude of farm centroid (calculated as the mean of the two or three plot latitudes for that farm), in decimal degrees.
● long_farm: Longitude of farm centroid (calculated as the mean of the two or three plot longitudes for that farm), in decimal degrees.
File: DATA_SOIL_FINAL.txt
Description: This file contains all soil-related nesting habitat variables except soil texture. Each soil variable was measured at three locations within each plot; each of those locations corresponds to a single row in this dataset.
Variables
● year: Year of data collection (2018 or 2019).
● farm: Full name of farm on which data were collected.
● plot: Plot number, from one to three. See Figure 1 of manuscript text for explanation. With one exception, each site included two or three individual sampling plots. The one exception is site LAVO, with a single plot. Farms Lavoie Sylvain (LAVO) and Pine Hill Orchard (PINE) are adjacent (within 1 km), and the two sites were therefore treated as one for data collection and analysis. Specifically, plot “LAVO1” was treated as a third plot of site “PINE” and is thus recoded as “PINE3” in the analysis scripts.
● code: Concatenation of three- or four-letter site code and plot number. Note that plots JUN1-JUN3 are interpreted as dates if the spreadsheet is opened in Excel.
● date: Date on which soil data were collected.
● slope: Measured as angle from the horizontal using an Abney level, in degrees.
● compaction: Measured using a hand-held pocket penetrometer (G118H4200 Hoskin Scientific, Canada), in kg/cm².
● bareground: Percent of a 1 x 1 m quadrat occupied by bare ground, estimated visually to the nearest 5%.
● vege: Percent of a 1 x 1 m quadrat occupied by living vegetation, estimated visually to the nearest 5%.
● deadlitter: Percent of a 1 x 1 m quadrat occupied by dead litter, estimated visually to the nearest 5%.
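Because each plot contributes three rows to this file, plot-level values have to be aggregated before they can be matched to bee counts. A minimal sketch of one such aggregation (a simple mean per plot code) is shown below. The rows are hypothetical, the function name `plot_means` is invented for illustration, and the authors' own aggregation is the one in DATA_PREP_BEES.r.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: three measurement locations in each of two plots.
rows = [
    {"code": "PINE1", "slope": 2.0, "compaction": 1.5, "bareground": 10.0},
    {"code": "PINE1", "slope": 4.0, "compaction": 2.5, "bareground": 20.0},
    {"code": "PINE1", "slope": 3.0, "compaction": 2.0, "bareground": 30.0},
    {"code": "PINE2", "slope": 1.0, "compaction": 3.0, "bareground": 0.0},
    {"code": "PINE2", "slope": 1.0, "compaction": 3.5, "bareground": 5.0},
    {"code": "PINE2", "slope": 4.0, "compaction": 4.0, "bareground": 10.0},
]

VARIABLES = ("slope", "compaction", "bareground")


def plot_means(records):
    """Average each soil variable over the rows that share a plot code."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["code"]].append(record)
    return {
        code: {var: mean(r[var] for r in members) for var in VARIABLES}
        for code, members in grouped.items()
    }


summary = plot_means(rows)
print(summary["PINE1"])
```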
File: DATA_TEXTURE18.txt
Description: This file contains the hydrometer readings obtained from soil samples taken from the 2018 study plots. In 2018, separate samples were collected from three soil-sampling locations per plot; each of these samples was processed separately and is represented by a single row in this dataset.
Variables
● obs_no: Unique identifier for the measurement.
● sheet_no: Number of the datasheet on which the measurement was recorded (values: 1 to 17).
● year: Year of data collection (2018).
● farm: Full name of farm on which data were collected.
● plot: Plot number, from one to three. See Figure 1 of manuscript text for explanation. With one exception, each site included two or three individual sampling plots. The one exception is site LAVO, with a single plot. Farms Lavoie Sylvain (LAVO) and Pine Hill Orchard (PINE) are adjacent (within 1 km), and the two sites were therefore treated as one for data collection and analysis. Specifically, the LAVO plot 1 was treated as a third plot of site “PINE” and is thus recoded as code = “PINE3” in the analysis scripts.
● code: Concatenation of the four-letter site code and plot number.
● sample: Number (from 1 to 3) of the sample taken from each plot.
● Temperature_0: Temperature of the soil solution at the start of the hydrometer analysis, in degrees Fahrenheit.
● Hydro_30s_1: First hydrometer reading at 30 s after shaking the soil solution.
● Hydro_30s_2: Second hydrometer reading at 30 s after shaking the soil solution.
● Hydro_30s_3: Third hydrometer reading at 30 s after shaking the soil solution.
● Hydro_60s: Hydrometer reading at 60 s after shaking the soil solution.
● Hydro_1.5h: Hydrometer reading at 1.5 h after shaking the soil solution.
● Temperature_24h: Temperature of the soil solution in the cylinder at 24 h after shaking, in degrees Fahrenheit.
● Hydro_24h: Hydrometer reading at 24 h after shaking the soil solution.
● Observer: Initials of the person taking the hydrometer readings. CA = Cécile Antoine.
File: DATA_TEXTURE19.txt
Description: This file contains the hydrometer readings obtained from soil samples taken from the 2019 study plots. In 2019, each soil sample was a composite from three sampling locations per plot; this composite sample was divided into two or three subsamples for analysis. Each row in this dataset corresponds to an individual subsample.
Variables
● obs_no: Unique identifier for the measurement.
● sheet_no: Number of the datasheet on which the measurement was recorded (values: 1 to 25).
● year: Year of data collection (2019).
● farm: Full name of farm on which data were collected.
● plot: Plot number, from one to three.
● code: Concatenation of three-letter site code and plot number. Note that plots JUN1-JUN3 are interpreted as dates if the spreadsheet is opened in Excel.
● Temperature_0h: Temperature of the soil solution at the start of the hydrometer analysis, in degrees Fahrenheit.
● Hydro_30s_1: First hydrometer measurement at 30 s after shaking the soil solution.
● Hydro_30s_2: Second 30 s hydrometer measurement.
● Hydro_30s_3: Third 30 s hydrometer measurement.
● Hydro_40s: Hydrometer measurement taken at 40 s after shaking the soil solution. This was not typically recorded, so most cells are blank.
● Hydro_60s: Hydrometer measurement at 60 s after shaking the soil solution. (For Observer SW, it’s unclear whether these values were recorded at 40 s or 60 s, except in cases where a 40 s value was recorded separately.)
● Hydro_7h: Hydrometer reading at 7 h after shaking the soil solution.
● Temperature_7h: Temperature of the soil solution in the cylinder at 7 h after shaking, in degrees Fahrenheit.
● Observer: Initials of the person taking the hydrometer readings. CA = Cécile Antoine, LF = Luca Fiorindi, SW = Xiaoyue (Sherry) Wu.
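The caveat on Hydro_60s for Observer SW can be turned into an explicit flag before analysis. The sketch below encodes only the rule stated above (the timing is unclear for SW unless a separate 40 s value exists). The function name `hydro_60s_timing` and the sample rows are hypothetical, and how the authors actually treated these rows is not shown here.

```python
def hydro_60s_timing(observer: str, hydro_40s: str) -> str:
    """Describe when the value stored in Hydro_60s was probably read.

    For Observer SW the reading time is unclear unless a separate
    40 s value was recorded in the same row.
    """
    if observer.strip() == "SW" and hydro_40s.strip() == "":
        return "unclear (40 s or 60 s)"
    return "60 s"


# Hypothetical rows; blank Hydro_40s cells are represented as empty strings.
rows = [
    {"Observer": "CA", "Hydro_40s": ""},
    {"Observer": "SW", "Hydro_40s": ""},
    {"Observer": "SW", "Hydro_40s": "21"},
]
flags = [hydro_60s_timing(r["Observer"], r["Hydro_40s"]) for r in rows]
print(flags)  # ['60 s', 'unclear (40 s or 60 s)', '60 s']
```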
File: GENUS_NESTING.csv
Description: This file associates each bee genus with the most typical type of nesting habitat for that genus.
Variables
● Genus: Bee genus name.
● Nesting: Type of nesting habitat. Possible values are: Cavity, Ground, Parasitic.
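This lookup table is meant to be joined to the specimen list by genus. A minimal sketch of that join is given below, assuming the Genus column matches the Genus column of DATA_BEE.csv exactly. The sample rows are hypothetical and are not copied from either file, and the "Unknown" label for blank or unmatched genera is a choice made here for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpts; the real files contain many more rows and columns.
NESTING_CSV = """Genus,Nesting
Andrena,Ground
Osmia,Cavity
Nomada,Parasitic
"""

BEE_CSV = """Number,Genus,Species
1,Andrena,sp.
2,Osmia,sp.
3,Andrena,sp.
4,,
"""

# Build a genus -> nesting type dictionary from the lookup table.
nesting = {
    row["Genus"]: row["Nesting"]
    for row in csv.DictReader(io.StringIO(NESTING_CSV))
}

# Tally specimens by nesting type; blank or unmatched genera become "Unknown".
counts = Counter(
    nesting.get(row["Genus"], "Unknown")
    for row in csv.DictReader(io.StringIO(BEE_CSV))
)
print(counts["Ground"], counts["Cavity"], counts["Unknown"])  # 2 1 1
```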
File: DATA_BEE.csv
Description: The full list of bee specimens collected during the study, with information on sex, taxonomic identity, date collected, and location of collection.
Variables
● Number: Unique specimen number, from 1 to 11722.
● Locality: Locality information as entered on specimen label, in the format COUNTRY (“CAN”): PROVINCE (QC or ON): Municipality.
● Lat, Long: Coordinates of specimen collection, as entered on specimen label, in the format DD°MM’SS.S’’N; DD°MM’SS.S’’W.
● Date: Dates on which the pan traps were deployed, as entered on specimen label, in the format day1-day2.month.year, where month is reported in lower-case Roman numerals, and day1 and day2 are the days of the month on which the pan trap was deployed and collected, respectively.
● Site_code: Concatenation of site code and round number, separated by a hyphen.
● Family: The taxonomic family to which the specimen belongs. Blank if unknown.
● Genus: The genus to which the specimen belongs. Blank if unknown.
● Species: The species or species complex to which the specimen belongs. Specimens that were identified to genus but not to species are entered here as “sp.” Blank for specimens not identified to genus.
● Sex: Sex of the specimen. Possible values: F (female), M (male), or blank (for specimens not identified to species).
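The label-style Date and coordinate formats described above are not machine-readable as they stand. The sketch below shows one way to parse them. It is an illustration only: the example strings are hypothetical, the function names are invented here, and the regular expressions assume the formats exactly as documented in this README (allowing a few quote-character variants and optional spaces). The date parser returns the components without building calendar dates, because the README does not say which month is reported when a deployment spans a month boundary.

```python
import re

ROMAN_MONTHS = {
    "i": 1, "ii": 2, "iii": 3, "iv": 4, "v": 5, "vi": 6,
    "vii": 7, "viii": 8, "ix": 9, "x": 10, "xi": 11, "xii": 12,
}

DATE_RE = re.compile(r"^(\d{1,2})-(\d{1,2})\.([ivx]+)\.(\d{4})$")

# Accept typographic, ASCII, or prime marks for minutes and seconds.
_PRIME = "[’'′]"
DMS_RE = re.compile(
    r"(\d+)\s*°\s*(\d+)\s*" + _PRIME
    + r"\s*(\d+(?:\.\d+)?)\s*(?:" + _PRIME + r'{2}|["”″])\s*([NSEW])'
)


def parse_label_date(text: str):
    """Split 'day1-day2.month.year' into (day1, day2, month, year) integers."""
    match = DATE_RE.match(text.strip())
    if match is None or match.group(3) not in ROMAN_MONTHS:
        raise ValueError(f"unrecognised label date: {text!r}")
    day1, day2, month, year = match.groups()
    return int(day1), int(day2), ROMAN_MONTHS[month], int(year)


def dms_to_decimal(text: str) -> float:
    """Convert one degrees-minutes-seconds coordinate to signed decimal degrees."""
    match = DMS_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    return -value if hemisphere in "SW" else value


# Hypothetical label values, not taken from DATA_BEE.csv.
print(parse_label_date("12-14.vi.2018"))  # (12, 14, 6, 2018)
lat = dms_to_decimal("45°25’12.0’’N")
lon = dms_to_decimal("75°41’24.0’’ W")
print(round(lat, 4), round(lon, 4))
```

A combined "latitude; longitude" string can be split on the semicolon and each half passed to `dms_to_decimal`.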
Code/software
All scripts were written for R v. 4.4.0 (https://www.r-project.org/) within RStudio 2024.04.1, and all will run within RStudio except where indicated in the commentary at the start of the “ORDINATION.r” script. Each script requires specific R packages, listed at the start of each script, which must be installed before running the script.
DATA_PREP_SOIL_TEXTURE.r must be run first. It calls the raw data files from the same working directory in which it is stored and writes the output files to a new subdirectory (“Processed data”) that it creates. This script calculates the percentage sand, silt, and clay in each soil sample based on hydrometer readings and temperatures at different time points.
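For readers unfamiliar with hydrometer-based texture analysis, the general shape of such a calculation can be sketched as below. This is a generic Bouyoucos-style illustration in Python, not a transcription of DATA_PREP_SOIL_TEXTURE.r, which remains the authoritative implementation. Everything not stated in this README is an assumption: the dry sample mass, the blank reading, the correction of about 0.2 g/L per degree Fahrenheit away from a 68 °F calibration temperature, and the choice of which recorded readings serve as the "early" (silt plus clay) and "late" (clay) readings.

```python
def corrected_reading(reading_g_per_l, temp_f, blank_g_per_l=0.0,
                      calibration_temp_f=68.0, per_degree_f=0.2):
    """Apply blank and temperature corrections to a hydrometer reading.

    All default constants are assumptions for illustration.
    """
    return (reading_g_per_l - blank_g_per_l
            + per_degree_f * (temp_f - calibration_temp_f))


def texture_percentages(early_reading, late_reading, sample_mass_g):
    """Return (sand, silt, clay) percentages from two corrected readings.

    early_reading: corrected reading once sand has settled (silt + clay, g/L).
    late_reading:  corrected reading once silt has settled (clay only, g/L).
    sample_mass_g: oven-dry soil mass in a 1 L suspension (assumed value).
    """
    silt_plus_clay = 100.0 * early_reading / sample_mass_g
    clay = 100.0 * late_reading / sample_mass_g
    sand = 100.0 - silt_plus_clay
    silt = silt_plus_clay - clay
    return sand, silt, clay


# Hypothetical readings and temperatures, with an assumed 40 g sample.
early = corrected_reading(22.0, temp_f=70.0)  # 22.0 + 0.2 * 2
late = corrected_reading(8.0, temp_f=66.0)    # 8.0 - 0.2 * 2
sand, silt, clay = texture_percentages(early, late, sample_mass_g=40.0)
print(round(sand, 1), round(silt, 1), round(clay, 1))
```

The three percentages sum to 100 by construction, which is a useful sanity check on any implementation.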
DATA_PREP_BEES.r must be run second. This script calculates summary statistics for each study plot (total number of bees collected, mean % sand, etc.), does some cleaning to harmonize variable names, and creates summary files, which are written to the “Processed data” subdirectory. It also generates a sites (plots) x species matrix (“GNSpecies_matrix.csv”) that is used both within this script to calculate additional summary statistics (Simpson’s diversity, species richness) and as input for ORDINATION.r. The final summary file created by this script (“GNBee_plot_summary_FINAL.csv”) is used as the input for all other scripts (DATA_VISUALIZATION.r, MODELS.r, FIGURES_BOXPLOTS.r, FIGURES_SCATTERPLOTS.r).
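The two diversity summaries mentioned above can be illustrated for a single row of a plots x species matrix. The sketch below is in Python rather than R and uses hypothetical counts. It assumes the Gini-Simpson form of Simpson's diversity, 1 − Σ pᵢ²; the README does not state which form the script uses, so check DATA_PREP_BEES.r before comparing values.

```python
import math


def species_richness(counts):
    """Number of species with at least one individual."""
    return sum(1 for count in counts if count > 0)


def simpson_diversity(counts):
    """Gini-Simpson index, 1 - sum(p_i ** 2); NaN for an empty sample."""
    total = sum(counts)
    if total == 0:
        return math.nan
    return 1.0 - sum((count / total) ** 2 for count in counts)


# Hypothetical abundances of four species at one plot.
plot_counts = [5, 3, 2, 0]
print(species_richness(plot_counts))             # 3
print(round(simpson_diversity(plot_counts), 2))  # 0.62
```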
The remaining scripts can be run in any order.
DATA_VISUALIZATION.r produces exploratory visualizations of the data to check for outliers and collinearity. It is used to create supplementary Figures S2 (scatterplot matrix) and S5 (soil texture triangle).
MODELS.r produces all statistical models and tests reported in the manuscript, except those related to the redundancy analysis, which can be found in ORDINATION.r. Tables of statistical results are written to a new subdirectory (“Results tables”) that the script creates.
FIGURES_SCATTERPLOTS.r produces main text Figures 2, 3, and 5, showing associations between soil variables and ground-nesting bee abundance and diversity.
FIGURES_BOXPLOTS.r produces Supplementary Figures S3, S4, S6, and S7, showing associations between crop type and soil variables (Fig. S3), farm management and soil variables (Fig. S4), ground-nesting bee abundance and diversity vs. crop type and farm management (Fig. S6), and cavity-nesting bee abundance vs. crop type (Fig. S7).
ORDINATION.r produces the redundancy analysis and accompanying biplots (Figures 4 and S8) and statistical tests (Table S8).
Bees were collected using pan traps at 35 farms over a two-year period. Soil data were collected from the same study plots where bees were sampled. Soil samples were analysed in the laboratory to determine particle size composition (texture), and bees that we identified as belonging to ground-nesting genera were further identified to species.