Separating sampling bias from abundance shows that different methods catch different wild bees
Data files
Feb 13, 2026 version files (2.75 MB)
- IT_species.csv (6.35 KB)
- methods--final_EE_010926.R (30.60 KB)
- pan_net_all.csv (1.44 MB)
- pan_vane_all.csv (1.25 MB)
- raupcrick_Chase.R (8.54 KB)
- README.md (5.12 KB)
Abstract
Ecological community sampling methods have taxonomic biases, producing samples where relative abundances of taxa may differ from the underlying sampled community. Evaluating sampling methods’ relative biases is therefore necessary for accurately interpreting community data. Wild bees (Hymenoptera: Apoidea) have been the focus of intensive community sampling and many studies have compared the properties of samples collected by different methods. However, comparative studies have often conflated differences in sampling bias with differences in effort and absolute abundance between methods, potentially obscuring methods’ true biases.
Here, we compare wild bee communities in the northeastern United States as sampled by pan traps, vane traps, and hand netting. Using a dataset of simultaneous sampling by different methods, we compare sample richness and composition between pairs of methods while accounting for differences in the overall number of bees sampled by each.
For a given number of individuals sampled, hand netting captured more bee species than pan traps, which captured more species than vane traps. Pan traps sampled a different pool of species than either of the other two methods. Of 21 bee genera analyzed, eight were overrepresented in pan trap samples relative to hand netting, while seven were relatively underrepresented in pan traps. When compared against vane traps, four genera of 20 were relatively overrepresented in pan traps while six were relatively underrepresented. Pan traps poorly represented very large-bodied genera as compared with the other methods.
We find pervasive biases in bee community sampling methods, with most genera showing significant differences in relative abundance in at least one methodological comparison. At times, genera were relatively underrepresented even by methods that collected them in higher absolute abundance. Since bias is unavoidable in community sampling, studies must measure taxon-specific biases in the context of their system and evaluate the robustness of analytical results.
Dataset DOI: 10.5061/dryad.r2280gbr8
Description of the data and file structure
Files and variables
File: methods--final_EE_010926.R
Description: See Code/software below.
File: raupcrick_Chase.R
Description: See Code/software below.
File: IT_species.csv
Description: Species-level average intertegular distances (ITD) from a selection of specimens in the Winfree Lab collection. Intertegular (IT) distance is measured as the distance across the thorax of a bee specimen, between the bases of the wings (tegulae).
Variables
- genus: genus-level identification of bees that measurements pertain to
- species: species-level identification of bees that measurements pertain to
- femaleIT: average intertegular distance (measured in millimeters) of female specimens measured for a given species
- maleIT: average intertegular distance (measured in millimeters) of male specimens measured for a given species
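As an illustration of how these columns might be used, the sketch below averages species-level ITD values within each genus. It is written in Python for convenience and is not part of the deposited R scripts. The sample rows and their values are invented, and treating blank or "NA" cells as missing is an assumption about how the file encodes species without measurements.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Toy rows in the IT_species.csv column layout. Species labels and
# measurements are invented for illustration, not taken from the dataset.
SAMPLE = """genus,species,femaleIT,maleIT
Bombus,sp_a,4.10,3.60
Bombus,sp_b,4.90,NA
Lasioglossum,sp_c,0.90,0.80
"""


def parse_measure(text):
    """Return a float, or None for blank/NA cells (assumed missing-value coding)."""
    text = text.strip()
    return None if text in ("", "NA") else float(text)


def genus_mean_itd(handle, column="femaleIT"):
    """Average the species-level ITD values in `column` within each genus."""
    values = defaultdict(list)
    for row in csv.DictReader(handle):
        itd = parse_measure(row[column])
        if itd is not None:
            values[row["genus"]].append(itd)
    return {genus: mean(v) for genus, v in values.items()}


female = genus_mean_itd(io.StringIO(SAMPLE))
male = genus_mean_itd(io.StringIO(SAMPLE), column="maleIT")
print(female)
print(male)
```

To run this against the real file, pass `open("IT_species.csv", newline="")` in place of the `io.StringIO` object.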
File: pan_net_all.csv
Description: Collection information and taxonomic identifications for bee specimens collected during sampling events with simultaneous use of hand-netting and pan traps.
Variables
- uniqueID: unique identifying code for each individual specimen
- genus: genus-level taxonomic specimen identification
- species: species- or species-group-level taxonomic specimen identification
- collector: individual or individuals who collected a given specimen
- method: sampling method by which a given specimen was collected - either hand net ("net") or pan trap ("pan")
- date: date (month/day/year) on which a given specimen was collected in the field
- round: order of an individual sampling event relative to other sampling events conducted at the same site and year (e.g., first sampling event of the year at a site is round 1, second is round 2, etc.)
- site: unique identifier for field site/sampling location
- study: unique identifier for each individual research study
- latitude: decimal latitude of geographic location of field site/sampling location
- longitude: decimal longitude of geographic location of field site/sampling location
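The `method` and `genus` columns are enough to separate absolute from relative abundance. The Python sketch below (not part of the deposited R scripts) tallies specimens per method and genus and converts the counts to within-method shares. The toy specimens are invented and only a subset of the documented columns is included. They are chosen so that one genus has a higher count but a lower share in pan traps than in netting.

```python
import csv
import io
from collections import Counter

# Invented toy specimens as (method, genus) pairs; not real records.
TOY = (
    [("net", "Bombus")] * 2
    + [("net", "Lasioglossum")] * 1
    + [("pan", "Bombus")] * 3
    + [("pan", "Lasioglossum")] * 6
)
SAMPLE = "uniqueID,genus,method\n" + "".join(
    f"toy{i},{genus},{method}\n" for i, (method, genus) in enumerate(TOY)
)


def abundance_by_method(handle):
    """Return specimen counts and within-method shares keyed by (method, genus)."""
    counts = Counter()
    totals = Counter()
    for row in csv.DictReader(handle):
        counts[(row["method"], row["genus"])] += 1
        totals[row["method"]] += 1
    shares = {key: n / totals[key[0]] for key, n in counts.items()}
    return counts, shares


counts, shares = abundance_by_method(io.StringIO(SAMPLE))
for (method, genus), n in sorted(counts.items()):
    print(method, genus, n, round(shares[(method, genus)], 3))
```

In this toy example the pan traps hold more Bombus specimens than the net samples, yet Bombus makes up a smaller fraction of the pan trap catch.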
File: pan_vane_all.csv
Description: Collection information and taxonomic identifications for bee specimens collected during sampling events with simultaneous use of pan traps and blue vane traps.
Variables
- uniqueID: unique identifying code for each individual specimen
- genus: genus-level taxonomic specimen identification
- species: species- or species-group-level taxonomic specimen identification
- method: sampling method by which a given specimen was collected - either vane trap ("vane") or pan trap ("pan")
- date: date (month/day/year) on which a given specimen was collected in the field
- site: unique identifier for field site/sampling location
- study: unique identifier for each individual research study
- latitude: decimal latitude of geographic location of field site/sampling location
- longitude: decimal longitude of geographic location of field site/sampling location
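This file has no `round` column, so one possible way to recover individual sampling events is to group specimens by study, site, and date. The Python sketch below does that and keeps only events in which both methods caught at least one bee. It is not part of the deposited R scripts. The grouping key is an assumption, as is the handling of both four-digit and two-digit years, and the sample rows are invented.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime

# Invented toy rows in the pan_vane_all.csv layout (coordinates omitted).
SAMPLE = """uniqueID,genus,species,method,date,site,study
toy1,Bombus,sp_a,vane,6/1/2015,siteA,study1
toy2,Bombus,sp_a,pan,6/1/2015,siteA,study1
toy3,Agapostemon,sp_b,pan,6/1/2015,siteA,study1
toy4,Bombus,sp_a,vane,7/15/2015,siteA,study1
"""


def parse_date(text):
    """Parse a month/day/year string, trying a 4-digit year and then a 2-digit year."""
    for fmt in ("%m/%d/%Y", "%m/%d/%y"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def specimens_per_event(handle):
    """Count specimens per method within each (study, site, date) group."""
    events = defaultdict(Counter)
    for row in csv.DictReader(handle):
        key = (row["study"], row["site"], parse_date(row["date"]))
        events[key][row["method"]] += 1
    return events


events = specimens_per_event(io.StringIO(SAMPLE))
# Keep only events in which both methods collected at least one specimen.
paired = {k: v for k, v in events.items() if {"pan", "vane"} <= set(v)}
print(paired)
```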
Code/software
methods--final_EE_010926.R: R script needed to run all main analyses in our paper entitled "Separating sampling bias from abundance shows that different methods catch different wild bees". This script compares bee community samples as collected by pan traps against simultaneously collected samples from 1) hand netting and 2) blue vane traps. Specifically, we compare richness of samples via rarefaction and species composition/identities via multivariate methods (Raup-Crick dissimilarity, as calculated using code made available by Chase et al. (2011)). Finally, we measure methods' relative taxonomic biases with respect to individual bee genera using generalized linear mixed models (GLMMs) and test whether genus-level differences in bias may be attributable to differences in body size.
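To make the rarefaction step concrete, the Python sketch below computes the expected number of species in a random subsample of n individuals using the standard individual-based (hypergeometric) formula. It is an independent illustration, not the R script, and the estimator or package the script actually uses may differ. The function name and the abundance vectors are invented.

```python
from math import comb


def rarefied_richness(counts, n):
    """Expected species richness in a random subsample of n individuals.

    counts: per-species abundances in the full sample.
    Uses E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)), where N = sum(counts).
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total sample size")
    denom = comb(total, n)
    # comb(a, b) is 0 when b > a, so a species too abundant to be missed
    # contributes exactly 1 to the expected richness.
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


# Invented abundance vectors: an even sample and a strongly uneven one.
even_sample = [5, 5, 5, 5]
uneven_sample = [37, 1, 1, 1]

# Compare both samples at the same number of individuals.
n = 10
print(rarefied_richness(even_sample, n), rarefied_richness(uneven_sample, n))
```

Comparing samples at a common n is what allows richness to be contrasted between methods that collected different total numbers of bees.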
raupcrick_Chase.R: This file is not our original code. All of the code within was provided by Chase et al. (2011) for calculating the Raup-Crick dissimilarity metric. Our main analysis R script (above) sources the function "raup_crick" from this script to calculate Raup-Crick dissimilarity between bee community samples. See citation for this file below.
Citation: Chase, Jonathan M.; Kraft, Nathan J. B.; Smith, Kevin G.; Vellend, Mark; Inouye, Brian D. (2016). Using null models to disentangle variation in community dissimilarity from variation in α-diversity. Wiley. Collection. https://doi.org/10.6084/m9.figshare.c.3308220.v1
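For readers unfamiliar with the metric, the Python sketch below illustrates the general idea of a Raup-Crick-style null model. It does not reproduce raupcrick_Chase.R, and every name, default, and the toy data are invented. The sketch assumes presence/absence data, a species pool made up of all species seen at any site, and null draws weighted by the number of sites at which each species occurs.

```python
import random
from collections import Counter


def weighted_draw(rng, species, weights, k):
    """Draw k distinct species, each pick proportional to weight among those left."""
    pool, w, chosen = list(species), list(weights), set()
    for _ in range(k):
        i = rng.choices(range(len(pool)), weights=w)[0]
        chosen.add(pool.pop(i))
        w.pop(i)
    return chosen


def raup_crick_sketch(site_a, site_b, all_sites, reps=999, seed=0):
    """Null-model dissimilarity between two species sets, scaled to [-1, 1].

    In this sketch, negative values mean the pair shares more species than
    the null expectation and positive values mean it shares fewer.
    """
    rng = random.Random(seed)
    occurrence = Counter(sp for site in all_sites for sp in site)
    species = sorted(occurrence)
    weights = [occurrence[sp] for sp in species]
    observed = len(site_a & site_b)
    greater = ties = 0
    for _ in range(reps):
        # Redraw each site at its observed richness from the weighted pool.
        null_a = weighted_draw(rng, species, weights, len(site_a))
        null_b = weighted_draw(rng, species, weights, len(site_b))
        shared = len(null_a & null_b)
        greater += shared > observed
        ties += shared == observed
    proportion = (greater + ties / 2) / reps  # ties are split evenly
    return (proportion - 0.5) * 2


# Invented toy communities; species are labeled by integers.
sites = [set(range(0, 5)), set(range(0, 5)), set(range(5, 10)),
         set(range(10, 15)), set(range(15, 20))]
identical_pair = raup_crick_sketch(sites[0], sites[1], sites)
disjoint_pair = raup_crick_sketch(sites[2], sites[3], sites)
print(identical_pair, disjoint_pair)
```

Because each null draw keeps the observed richness of both samples, the resulting value reflects differences in composition rather than differences in the number of species per sample.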
Access information
Other publicly accessible locations of the data:
- https://doi.org/10.1086/675716
- https://doi.org/10.1111/gcb.13921
- https://doi.org/10.1016/j.biocon.2021.109202
Data was derived from the following sources:
