Separating sampling bias from abundance shows that different methods catch different wild bees

McCarthy, Max; Simpson, Dylan; Aldercotte, Andrew; Smith, Colleen; Harrison, Tina; Winfree, Rachael

Published Feb 13, 2026 on Dryad. https://doi.org/10.5061/dryad.r2280gbr8

Data files

Feb 13, 2026 version files (2.75 MB)

IT_species.csv (6.35 KB)
methods--final_EE_010926.R (30.60 KB)
pan_net_all.csv (1.44 MB)
pan_vane_all.csv (1.25 MB)
raupcrick_Chase.R (8.54 KB)
README.md (5.12 KB)

Abstract

Ecological community sampling methods have taxonomic biases, producing samples where relative abundances of taxa may differ from the underlying sampled community. Evaluating sampling methods’ relative biases is therefore necessary for accurately interpreting community data. Wild bees (Hymenoptera: Apoidea) have been the focus of intensive community sampling and many studies have compared the properties of samples collected by different methods. However, comparative studies have often conflated differences in sampling bias with differences in effort and absolute abundance between methods, potentially obscuring methods’ true biases.

Here, we compare wild bee communities in the northeastern United States as sampled by pan traps, vane traps, and hand netting. Using a dataset of simultaneous sampling by different methods, we compare sample richness and composition between pairs of methods while accounting for differences in the overall number of bees sampled by each.

For a given number of individuals sampled, hand netting captured more bee species than pan traps, which captured more species than vane traps. Pan traps sampled a different pool of species than either of the other two methods. Of 21 bee genera analyzed, eight were overrepresented in pan trap samples relative to hand netting, while seven were relatively underrepresented in pan traps. When compared against vane traps, four genera of 20 were relatively overrepresented in pan traps while six were relatively underrepresented. Pan traps poorly represented very large-bodied genera as compared with the other methods.

We find pervasive biases in bee community sampling methods, with most genera showing significant differences in relative abundance in at least one methodological comparison. At times, genera were relatively underrepresented even by methods that collected them in higher absolute abundance. Since bias is unavoidable in community sampling, studies must measure taxon-specific biases in the context of their system and evaluate the robustness of analytical results.

Dataset DOI: 10.5061/dryad.r2280gbr8

Description of the data and file structure

Files and variables

File: methods--final_EE_010926.R

Description: See Code/software below.

File: raupcrick_Chase.R

Description: See Code/software below.

File: IT_species.csv

Description: Species-level average intertegular distances (ITD) from a selection of specimens in the Winfree Lab collection. Intertegular (IT) distance is measured as the distance across the thorax of a bee specimen, between the bases of the wings (tegulae).

Variables

genus: genus-level identification of bees that measurements pertain to
species: species-level identification of bees that measurements pertain to
femaleIT: average intertegular distance (measured in millimeters) of female specimens measured for a given species
maleIT: average intertegular distance (measured in millimeters) of male specimens measured for a given species
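
As an illustration of how the species-level measurements might be summarized to the genus level, the following Python sketch averages femaleIT within each genus. It is not part of the deposited R code. The sample rows and their values are hypothetical, and coding missing measurements as an empty field or "NA" is an assumption about the file.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical rows in the column layout of IT_species.csv (values are made up).
SAMPLE = """genus,species,femaleIT,maleIT
Bombus,impatiens,4.0,3.5
Bombus,griseocollis,5.0,NA
Ceratina,calcarata,1.5,1.3
"""

def genus_mean_female_it(handle):
    """Return {genus: mean femaleIT in mm}, skipping missing measurements."""
    values = defaultdict(list)
    for row in csv.DictReader(handle):
        raw = row["femaleIT"].strip()
        if raw in ("", "NA"):  # assumed missing-value codes
            continue
        values[row["genus"]].append(float(raw))
    return {genus: mean(v) for genus, v in values.items()}

genus_it = genus_mean_female_it(io.StringIO(SAMPLE))
```

To run this against the real file, pass `open("IT_species.csv", newline="")` instead of the in-memory sample.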

File: pan_net_all.csv

Description: Collection information and taxonomic identifications for bee specimens collected during sampling events with simultaneous use of hand-netting and pan traps.

Variables

uniqueID: unique identifying code for each individual specimen
genus: genus-level taxonomic specimen identification
species: species- or species-group-level taxonomic specimen identification
collector: individual or individuals who collected a given specimen
method: sampling method by which a given specimen was collected - either hand net ("net") or pan trap ("pan")
date: date (month/day/year) on which a given specimen was collected in the field
round: order of an individual sampling event relative to other sampling events conducted at the same site and year (e.g., first sampling event of the year at a site is round 1, second is round 2, etc.)
site: unique identifier for field site/sampling location
study: unique identifier for each individual research study
latitude: decimal latitude of geographic location of field site/sampling location
longitude: decimal longitude of geographic location of field site/sampling location
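
Because each row is one specimen, relative abundances by method can be obtained by counting rows. The Python sketch below is a minimal example of that tabulation, not the authors' analysis (which is in R). The sample rows are hypothetical and include only the columns the function reads.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical specimen rows using a subset of the pan_net_all.csv columns.
SAMPLE = """uniqueID,genus,species,method,site
a1,Bombus,impatiens,net,S1
a2,Bombus,impatiens,net,S1
a3,Lasioglossum,pilosum,pan,S1
a4,Lasioglossum,pilosum,pan,S1
a5,Lasioglossum,pilosum,net,S1
a6,Bombus,impatiens,pan,S1
"""

def relative_abundance(handle):
    """Return {method: {genus: share of that method's specimens}}."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(handle):
        counts[row["method"]][row["genus"]] += 1  # one row = one specimen
    result = {}
    for method, genus_counts in counts.items():
        total = sum(genus_counts.values())
        result[method] = {g: n / total for g, n in genus_counts.items()}
    return result

props = relative_abundance(io.StringIO(SAMPLE))
```

Comparing proportions rather than raw counts is what separates a method's taxonomic bias from the overall number of bees it catches.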

File: pan_vane_all.csv

Description: Collection information and taxonomic identifications for bee specimens collected during sampling events with simultaneous use of pan traps and blue vane traps.

Variables

uniqueID: unique identifying code for each individual specimen
genus: genus-level taxonomic specimen identification
species: species- or species-group-level taxonomic specimen identification
method: sampling method by which a given specimen was collected - either vane trap ("vane") or pan trap ("pan")
date: date (month/day/year) on which a given specimen was collected in the field
site: unique identifier for field site/sampling location
study: unique identifier for each individual research study
latitude: decimal latitude of geographic location of field site/sampling location
longitude: decimal longitude of geographic location of field site/sampling location
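
This file has no round column, so one way to pair simultaneous samples is to group specimens by study, site, and date. The Python sketch below does so under that assumption; whether this grouping matches the sampling events used in the paper is not stated here, and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical specimen rows using a subset of the pan_vane_all.csv columns.
SAMPLE = """uniqueID,genus,method,date,site,study
v1,Bombus,vane,6/1/2015,S1,A
v2,Agapostemon,pan,6/1/2015,S1,A
v3,Bombus,vane,6/1/2015,S1,A
v4,Halictus,pan,6/15/2015,S1,A
"""

def paired_event_counts(handle, methods=("pan", "vane")):
    """Count specimens per method within each (study, site, date) group and
    keep only groups in which every listed method caught at least one bee."""
    events = defaultdict(lambda: dict.fromkeys(methods, 0))
    for row in csv.DictReader(handle):
        if row["method"] not in methods:
            continue  # ignore any unexpected method label
        key = (row["study"], row["site"], row["date"])  # assumed event key
        events[key][row["method"]] += 1
    return {k: v for k, v in events.items() if all(v[m] > 0 for m in methods)}

paired = paired_event_counts(io.StringIO(SAMPLE))
```

Note that an event in which one method caught no bees leaves no rows for that method in specimen-level data, so this filter cannot distinguish an empty catch from a method that was not deployed.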

Code/software

methods--final_EE_010926.R: R script needed to run all main analyses in our paper entitled "Separating sampling bias from abundance shows that different methods catch different wild bees". This script compares bee community samples as collected by pan traps against simultaneously collected samples from 1) hand netting and 2) blue vane traps. Specifically, we compare richness of samples via rarefaction and species composition/identities via multivariate methods (Raup-Crick dissimilarity, as calculated using code made available by Chase et al. (2011)). Finally, we measure methods' relative taxonomic biases with respect to individual bee genera using generalized linear mixed models (GLMMs) and test whether genus-level differences in bias may be attributable to differences in body size.
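
For readers unfamiliar with rarefaction, the Python sketch below implements the classic individual-based formula: the expected number of species in a random subsample of n individuals drawn without replacement. It illustrates the general technique only. The deposited R script may compute rarefied richness differently, and the function name and abundance vectors here are hypothetical.

```python
from math import comb

def rarefied_richness(abundances, n):
    """Expected species count in a subsample of n individuals drawn
    without replacement from a sample with the given species abundances."""
    total = sum(abundances)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of individuals")
    denom = comb(total, n)
    # For each species: 1 - P(the species is absent from the subsample).
    return sum(1 - comb(total - a, n) / denom for a in abundances)

# Hypothetical samples of unequal size from two methods.
net_sample = [5, 3, 1, 1]       # 10 individuals, 4 species
pan_sample = [12, 6, 1, 1]      # 20 individuals, 4 species

# Compare richness at a common number of individuals (the smaller total).
n_common = min(sum(net_sample), sum(pan_sample))
net_richness = rarefied_richness(net_sample, n_common)
pan_richness = rarefied_richness(pan_sample, n_common)
```

Rarefying both samples to the same number of individuals is what allows richness to be compared "for a given number of individuals sampled" rather than per unit of sampling effort.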

raupcrick_Chase.R: This file is not our original code. All of the code within was provided by Chase et al. (2011) for calculating the Raup-Crick dissimilarity metric. Our main analysis R script (methods--final_EE_010926.R) sources the function "raup_crick" from this script to calculate Raup-Crick dissimilarity between bee community samples. See the citation for this file below.

Citation: Chase, Jonathan M.; Kraft, Nathan J. B.; Smith, Kevin G.; Vellend, Mark; Inouye, Brian D. (2016). Using null models to disentangle variation in community dissimilarity from variation in α-diversity. Wiley. Collection. https://doi.org/10.6084/m9.figshare.c.3308220.v1

Access information

Other publicly accessible locations of the data:

Data was derived from the following sources:
