Code for: A century of wild bee sampling: historical data and neural network analysis reveal ecological traits associated with species loss

Data files

Aug 05, 2024 version files 2.03 MB

AllVettedVeganESGR.csv

35.46 KB
ML_nwk_version.txt

11.56 KB
na_clepto_OldenImp_1000rep.csv

341.70 KB
na_OldenEvansIsaacs_allbees_1000iter.csv

598.19 KB
na_oliG_OldenImp_1000rep.csv

341.65 KB
na_polyC_OldenImp_1000rep.csv

344.71 KB
na_polyG_OldenImp_1000rep.csv

344.07 KB
README.md

8.04 KB

Abstract

We analyzed the wild bee community sampled from 1921 to 2018 at a nature preserve in southern Michigan, USA to study long-term community shifts in a protected area. During an intensive survey in 1972 and 1973, F.C. Evans detected 135 bee species. In the most recent intensive surveys conducted in 2017 and 2018, we recorded 90 species. Only 58 species were recorded in both sampling periods, indicating a significant shift in the bee community. We found that the bee community diversity, species richness, and evenness were all lower in recent samples. Additionally, 64% of the more common species exhibited a more than 30% decline in relative abundance. Neural network analysis of species traits revealed that extirpation from the reserve was most likely for oligolectic ground-nesting bees and kleptoparasitic bees, whereas polylectic cavity-nesting bees were more likely to persist. Having longer phenological ranges also increased the chance of persistence in polylectic species. Further analysis suggests a climate response as bees in the contemporary sampling period had a more southerly overall distribution compared to the historic community. Results exhibit the utility of both long-term data and machine learning in disentangling complex indicators of bee population trajectories.

Analysis of historical bee community sampling in the E.S. George Reserve: an ecological preserve. Published as Graham et al. 2024. The contents here are split into two sections:

1 - R scripts used in the analysis of Graham et al. 2024

2 - Data used in the analysis of Graham et al. 2024

Further info available at https://github.com/prglaum/historicalBeeRecords_ESGR & all data is available from https://agdatacommons.nal.usda.gov/articles/dataset/Data_from_A_century_of_wild_bee_sampling_historical_data_and_neural_network_analysis_reveal_ecological_traits_associated_with_species_loss_/25233991

R scripts used in the analysis of Graham et al. 2024 (linked to Zenodo):

All analysis code is presented as R scripts from the R statistical language. All R scripts are commented with detailed instructions. All R scripts can be opened in base R (https://cran.r-project.org/), RStudio (https://posit.co/download/rstudio-desktop/), or any other IDE.

We recommend reviewing NNscript_analysis_plots_github.R before investigating code/analysis presented in the phylogeneticAutocorrelation.R script. This is because this later script utilizes analysis techniques introduced in the NNscript_analysis_plots script.

AnalyticalScript_plots:

R script for all the figures from the main text.

NNscripts_analysis_plots:

R script detailing the neural network analysis used in the main text and supplementary material.

SupplFigures:

R script with all the necessary code for the (non-neural network) figures in the Supplementary Results.

geoPlotting:

R script detailing the mapping and GIS work done in Graham et al 2024. Also includes code to recreate Figure S26.

phyloAutCorr:

R scripts detailing the use of Phylogenetic Eigenvector Mapping and other techniques used in analyzing phylogenetic autocorrelation.

Data used in the analysis of Graham et al. 2024:

Most data is presented in .csv or .xlsx formats. These are common file types which can be opened in Excel or via languages like R and Python. The bee phylogeny is a .txt file written in Newick format and should be opened via the phyloAuto script for our purposes. Each data file is followed by the figures it was used in creating and the R script(s) which call load that data.

AllVettedVeganESGR.csv (Fig S11; SupplFigures script):

Sampled specimen data from the Evans historical (1972/1973) and Isaacs contemporary (2017/2018) sampling periods in the necessary format to rarify species richness using the vegan package in R. Each column heading represents a unique species and each row indicates a sample year with values indicating the number of that species sampled per year. This file is the input data required to run the rarefaction analysis using the vegan package in R (SupplFigures script).

ML_nwk_version.txt (Fig S7,S8, S13, S20; phyloAutoCorr script):

Bee phylogeny used in analysis of phylogenetic autocorrelation in Newick formatting. Load into a session of R using the phyloAuroCorr script. Phylogeny can be sourced: Henríquez-Piskulich P, Hugall AF, Stuart-Fox D. 2024 A supermatrix phylogeny of the world’s bees (Hymenoptera: Anthophila). Mol. Phylogenet. Evol. 190, 107963. (doi:10.1016/J.YMPEV.2023.107963) OR at: http://beetreeoflife.org/

Neural Network Olden values (Fig 2 & Figs S15,S17,S18,S19; NNscript_analysis_plots script):

Necessary Re-scaled Olden (RSO) importance values from the unique training iterations of neural networks trained on North American trait variables and extirpation/persistence data from the historical (1972/1973) and contemporary (2017/2018) sampling periods.
All Olden value data files have the following variables:

x_names: The names of the input feature (trait variable in our case) evaluated for RSO Importance.
importance: The raw Olden importance value from the weights of one of the individual neural network out of the 1000 trained.
RSimportance: The Olden importance rescaled to be between -1 and 1 to be able to better compare across individually trained networks. This is the variable used in the RSO Importance analysis (e.g., Fig 2).
order: The relative order of the trait variable's RSO Importance variable from each unique training instance out of the 1000 uniquely neural networks. There are 7 input trait features, so each trained network ordered traits w/in a range from 1 (most associated with extirpation) to 7 (most associated with persistence).
rel_imp: Value for the Garson importance, an importance metric without direction only weight of predictive power.

Neural Network Olden Files:

na_OldenEvansIsaacs_allbees_1000iter.csv - Olden data from neural networks trained on all bee data in extirpation analysis between 1972/1973 & 2017/2018. Used in NNscripts_analysis_plots script to make Figure 2.
na_polyC_OldenImp_1000rep.csv - Olden data from neural networks trained on polylectic cavity nesting bee data in extirpation analysis between 1972/1973 & 2017/2018. Used in NNscript_analysis_plots script to make Fig S15.
na_polyG_OldenImp_1000rep.csv - Olden data from neural networks trained on polylectic ground nesting bee data in extirpation analysis between 1972/1973 & 2017/2018. Used in NNscript_analysis_plots script to make Fig S16.
na_oliG_OldenImp_1000rep.csv - Olden data from neural networks trained on oligolectic ground nesting bee data in extirpation analysis between 1972/1973 & 2017/2018. Used in NNscript_analysis_plots script to make Fig S18.
na_clepto_OldenImp_1000rep.csv - Olden data from neural networks trained on kleptoparasitic bee data in extirpation analysis between 1972/1973 & 2017/2018. Used in NNscript_analysis_plots script to make Fig S19.

Related files NOT available on Dryad, but available on Ag Data Commons:

https://agdatacommons.nal.usda.gov/articles/dataset/Data_from_A_century_of_wild_bee_sampling_historical_data_and_neural_network_analysis_reveal_ecological_traits_associated_with_species_loss_/25233991

ESGR_samples_EvansIsaacs_withTraits.csv (Fig S22-S24; SupplFigures script):

Sampled specimen data from the Evans historical (1972/1973) and Isaacs contemporary (2017/2018) sampling periods with species trait data attached to each individual specimen. Each row represents a unique sampling of an individual bee. The species traits of each bee are described in every row it appears. See Grahametal2023_BeeTraits.xlsx's ReadMe sheet for a full description of the trait variables.

Grahametal2023_BeeTraits.xlsx (all figures describing species traits; called in every script):

Species taxonomic information, natural/life history traits, physiological traits, geographic distribution, etc. The .xlsx file contains two sheets. TraitData: all trait data per species. ReadMe: a mini "ReadMe" with all trait data columns explained.

RawGBIFdata_final.csv (File containing the raw dated georeferenced data from GBIF):

GBIF Occurrence Download https://doi.org/10.15468/dl.ta4zxp Accessed from R via rgbif (https://github.com/ropensci/rgbif) on 2023-04-26 Data was used to create phenology and geospatial profiles per bees species. See GBIF for more on their data formatting.

Contemporary sampling

An open area on the north side of the ESGR (GPS coordinates: 42.461808, -84.011128) was the primary site for this study as it corresponds to the location of “Evans’ Old Field”, one of the areas historically sampled for bees. The site was visited every other week during the summers of 2017 and 2018 to sample bees. In 2017, the first sampling day was June 1 and the final sampling day was September 25. In 2018, the first sampling day was May 8 and the final day was October 3. We expanded sampling in 2018 to include a wider diversity of bees with narrower phenological periods.

During each visit we sampled bees using three methods. First, we walked to the center of the open field and randomly selected a direction to start the first 25 meter transect. Three other 25 m transects were then established based on the first one, each at a 90-degree angle from the neighboring transect for a total of 100m sampled, with each transect segment moving away from a central location. Each transect was walked for 10 minutes each, a total of 40 minutes of sampling. We used aerial insect nets to collect bees found within 1.5m of the transect, and time was stopped for specimen processing. The host plant was recorded for all specimens captured from flowers. Flowering plants were identified to the lowest taxonomic level in the field using Newcomb’s guide (Newcomb, 1977) and the PlantNet app (Joly et al., 2016), usually to species. Second, we spent 20 minutes collecting bees from plants of any species in the general vicinity of the open field. Third, to most closely match the methods used by Evans (see below), we spent 30 minutes sampling bees at each of the primary blooming plant species located in the field. Total time spent conducting this final sampling method varied based on the number of primary blooming plants at each visit, with a minimum of 30-minutes if there was only one primary plant. This sampling method was always done last, and included any plants that we collected more than one bee from that day. All bees were identified to species (or lowest possible taxonomic level) using relevant keys (Mitchell, 1960; Mitchell, 1962; Ribble, 1967; LaBerge and Bouseman, 1970; LaBerge, 1971; LaBerge, 1973; LaBerge, 1977; Bouseman and LaBerge, 1978; LaBerge, 1980; LaBerge, 1985; McGinley, 1986; LaBerge, 1989; Coelho, 2004; Gibbs, 2011; Rehan and Sheffield, 2011; Gibbs et al., 2013). All specimens collected in 2017 and 2018 are currently held in the Isaacs Lab at Michigan State University (as of 2024), and will eventually be deposited at the A.J. Cook Arthropod Collection at Michigan State University for long-term inclusion in that collection.

Historical data

The University of Michigan Museum of Zoology Insect Collection (UMMZI), Ann Arbor, MI, holds over 4,000 bee specimens from the historical collections at the ESGR, and specimens were databased as part of this study (see supplemental methods). Historical data were checked for entry errors and outdated taxonomies. Specimens with questionable species determinations were re-examined and re-identified using relevant keys (see above) where possible. Bees that could not be confidently identified to the species level were excluded from the dataset, and entries that were missing the date of collection were also removed. Excluded entries accounted for less than 1% of the specimens. There were notable gaps in records at the ESGR, as there were no focused survey efforts since Evans’ last efforts in 1989, and only occasional specimen records from 1990-1999. There were no surveys and no records for the ESGR after 1999 and prior to this study in 2017/2018. All specimens from the ESGR were included in this dataset, not only those specifically collected at the Evans’ Old Field.

In addition to analyses of the 4,000 plus records from the ESGR since 1921, we also conducted more focused analyses comparing Evans’ dataset from his 1972 and 1973 collection effort to our contemporary collections in 2017 and 2018. Evans’ original dataset from 1972/1973 was available through UM records, and we used this for our focused analyses comparing historic (1972/1973) years to the contemporary sampling years (2017/2018), including comparison of floral host plants. The dataset is unique compared to the records from the museum, because Evans did not always collect observed bees if he was confident in their identification (especially Bombus spp. and oligolectic species, e.g., Andrena rudbeckiae Robertson, 1891 and Dufourea monardae (Viereck, 1924); (Evans, 1986)), and these records come only from the site now called Evans’ Old Field, whereas the exact sampling locations within the ESGR of many other specimens in the collection are not known. Therefore, his original dataset provides a more complete representation of the community he encountered at the Evans’ Old Field location.

Evans describes his sampling as: “records of the dates and duration of flowering were made at frequent intervals (2-3 days every week) throughout the flowering season…Observation of visitation by bees was usually made between 9:00 am and 4:00 pm and on any given day was limited to a maximum of 30-40 minutes per flower species…no orderly system of monitoring was developed. More attention was given to abundant resources when they were being heavily visited than was paid to them near the beginning or end of their flower periods or to less frequently encountered species” (Evans, 1986).

Data analyses

We conducted analyses across the near century of bee records (1921-2018), with separate analyses comparing the major sampling periods in 1972/1973 (“historical” period) and 2017/2018 (“contemporary” period). See Fig. S1 for a graphic representing our data structure and analyses. We discuss possible limitations to our study based on differences in collection efforts between each focused sampling period (historical vs contemporary) in Suppl. Methods.

Community change across a century of data.

We employed two different methods to analyze the museum records. Each method accounts for biases in the data in different ways. Our first approach was similar to that used by Bartomeus et al. (Bartomeus et al., 2013). To account for changes in practices across a century of records (1921-2018), such as including only a synoptic collection in the museum, we removed any repeat specimens from the dataset. These were records of the same species collected on the same date (e.g., if there were five Bombus impatiens Cresson, 1893 records for June 3, 1979, we only kept one of these records in the dataset for analyses) (see methods in Bartomeus et al. (Bartomeus et al., 2013)). Then, to account for uneven sampling, we binned specimens into seven groupings of consecutive years to achieve more even sample numbers in each bin prior to comparison of species richness (Bartomeus et al., 2013) (Fig. S1; Table 1). We then rarefied all bins to the bin with the lowest sample size (n = 320) using the rarefy function (package: vegan (Oksanen et al., 2019)) in R (R Core Team, 2019) to calculate the mean species richness and standard error of the mean. Sampling primarily occurred between April and September, though there were occasional specimen collections in October and November. The number of specimens collected each month are reported in the Supplementary Materials (Table S1). To account for uneven sampling across all months, we also report species richness and rarefied species richness, as above, on a reduced dataset that only includes specimens collected from May – September.

Our second approach for determining changes in the community across the near century of records was primarily to reduce the influence of under-sampled years. We accomplished this by calculating rarefied species richness per year to compare equal sample sets for richness, using the R package vegan. To limit effects of low sample years, we included years with 20, 30, 40, 50, 60, or 70 observations. Due to apparent quadratic patterns, richness was regressed across sample years using generalized additive models (GAMs) with year as a smoothing factor and minimized generalized cross-validation (GCV) for fitting (function: gam, package: mgcv; (Wood, 2004; Wood, 2011; Wood, 2022)).

Measuring change in community composition between contemporary and historical sampling periods.

We compared the contemporary sampling (years 2017 and 2018) to Evans’ historical dataset (1972 and 1973). We chose these years to compare due to similarities in sampling methods and intensity. However, only data from months with sampling efforts in both the contemporary and historical data sets were included (May, June, July, August, September). We rarefied the species richness for the historical data and the contemporary data (function: rarefy, package: vegan) to the lowest sample size (contemporary data: n = 1271). We also calculated Shannon diversity (Hill number) in R for each sampling effort (function: hill_div; package: hilldiv (Alberdi, 2019)), which accounts for both species richness and evenness in the population. A higher diversity estimate would indicate high species richness and evenness, whereas lower diversity can be due to low richness or evenness or both. We rounded to whole numbers to represent an estimate of the number of species (as Hill numbers represent the effective number of species). Shannon diversity was statistically compared between datasets using Hutcheson t-test for two communities in R (function: Hutcheson_t_test, package: ecolTest (Salinas, 2021)). Additionally, we calculated Pielou’s evenness index ([Shannon diversity]/log(species richness)) for both datasets in R.

Non-metric multidimensional scaling (NMDS) ordinations were used to visualize differences in species composition between the historical and contemporary datasets (vegan package). We then used permutational multivariate analysis of variance (PERMANOVA) (function: adonis, package: vegan) (Anderson, 2001) in R to determine whether the bee communities were significantly dissimilar in ordination between the two sampling periods. Since PERMANOVA is sensitive to differences in dispersion among groups, regardless of ordination, we first tested for homogeneity of dispersion using the betadisper function (package: vegan). This is a multivariate analog of Levene’s test for homogeneity of variances that implements PERMDISP2 (Anderson, 2004). Homogeneity of dispersion paired with significant differences between groups according to the PERMANOVA would therefore indicate that the community composition differed between historical and contemporary samples. A dummy species was added to the species matrix prior to analyses and analyses were based on Bray-Curtis dissimilarity, with 999 permutations. We also used the envfit function (package: vegan) to determine which members of the bee community were significantly (p < 0.01) contributing to ordination in the NMDS. We fit vectors on the NMDS plot according to species contributions, with the length of the vector corresponding to the strength of the contribution. As Bray-Curtis can be sensitive to overall abundance and relative abundance, we also conducted the same analyses as above using relative abundance (relative abundance = [raw species abundance]/[total abundance of all species for that month] * 1000).

To compare dominant species within the two communities, we calculated the percent composition of each species to the overall community in the historical sampling years and the contemporary years. These values were compared for species that made up over 2%. We also calculated the percent change in relative abundance of species that were collected in both the historical and contemporary samples. To calculate relative abundance, we summed the number of records for each species collected in each sampling period (records were summed for 1972/1973 and for 2017/2018). We then calculated the relative abundance by dividing the summed species record by the total number of specimens in that collection period. This was to account for more sampling effort (more days sampling and more specimens collected) in the historical period. We then calculated the percent change in relative abundance for species with at least 30 records across the two sampling periods. Species were designated as declining if they had an abundance decline of more than 30% and increasing if they had an increase of more than 30%. Species in between were considered stable. This cutoff was chosen to match the IUCN designation for vulnerable species (IUCN Standards and Petitions Subcommittee, 2014).

Trait Database

Natural history characteristics and traits were collected and collated for the species in our study. Trait data were sourced via literature review, expert knowledge, specimen measurement, the Global Biodiversity Information Facility (DOI: https://doi.org/10.15468/dl.ta4zxp GBIF 2023 (GBIF, 2023)), and geographic measurements taken using the sf package in R (Pebesma, 2018). Finally, for all species present in the historical (1972/1973) period, we created a binary “extirpation” variable by numerically labeling the species as persistent (0) or extirpated (1) depending on their sample status in the contemporary (2017/2018) sample period. See Supplementary Methods, Fig. S2, and Tables S2 & S3 for a full description of each trait and the process used to obtain the information.

Trait data were used in two types of analyses: 1) At the species level we used traits as potential predictors of species persistence or local extirpation in the ESGR between the historical and contemporary periods, and 2) At the community and species levels, we compared trait composition between the historical and contemporary sampling periods, and across the entire near century of data.

Trait based predictors of species extirpation: Artificial Neural Network analysis

At the species level we used traits as potential predictors of species extirpation from the ESGR between the historical (1972/1973) and contemporary (2017/2018) periods. Initial use of mixed modeling approaches (given the mixed categorical and continuous trait data) proved incapable of efficiently exploring the broad possibilities while achieving explanatory power. Ordination methods fared better, but presented their own limitations (though, see Boersma et al. (Boersma et al., 2016) for more on ordination analysis in trait space). PERMANOVA and NMDS explained little variance (see Supplementary Results; Fig. S3) and PCA analysis was limited by the high dimensionality required to achieve any explanatory power and the number of predictors that highly contributed to multiple principal components (Fig. S3).

Therefore, to study the relationship between bee traits and extirpation, we pursued neural networks due to their flexible non-parametric nature, ability to handle both categorical and continuous data, and increased capability to capture variance and feature effects despite lingering collinearity and limited data, respectively. Here we provide only a general description of our neural network analysis in R (“nnet” package; (Venables and Ripley, 2002)), please see Supplementary Methods for further details. We used single-layer neural networks with a limited number of nodes (3-6 nodes; see Fig. S4) to restrain overfitting (Heaton, 2008) and a decay rate during training ranging between 5e-2 and 5e-4. Species’ trait data were cleaned/prepared (see Supplementary Methods) to function as “input features” (analogous to predictor variables) in order to predict species extirpation in the ESGR. To control the number of input features and avoid collinearity between categorical variables (see Supplementary Methods), we combined categorical variables into 4 dichotomous categorical traits: “polylectic ground-nesting”, “polylectic cavity-nesting”, “oligolectic ground-nesting,” and “kleptoparasitic.” We did not include oligolectic cavity-nesting bees as a separate input feature because only three sampled species matched the designation. After data cleaning, 11 traits remained as input features, including our 4 dichotomous categorical traits and 7 quantitative traits: intertegular distance (ITD; mm), species’ North American (NA) latitudinal max/min range, species North American (NA) longitudinal max/min, phenological mean date (day of the year), phenological range (day of the year; see Table S3 in the Supplementary Methods).

For each unique feature set (i.e., unique set of species’ traits and extirpation/persistence data; see results), we determined the simplest network structure that consistently limited error while producing high predictive performance (i.e., >95% accuracy). This structure was then initialized with distinct random initial weight connections between nodes and uniquely trained 1000 times (see Beck 2018). In each unique network, feature influence over model prediction was measured with Olden importance values (NeuralNetTool package; (Beck, 2018)). The Olden method produces continuous values with positive values indicating a positive relationship with the active state of the predicted variable (i.e., species extirpation) and negative values indicating the opposite (i.e., species persistence). For every unique network, we standardized Olden importance values by rescaling them between -1 and 1 in order to preserve relative trait importance and test for consistency of effect across networks (see Supplementary Methods). For brevity, we refer to this rescaled Olden value as RSO importance value.

To obtain high predictive accuracy, neural networks can rely on both subtle or complex relationships in training data that do not necessarily represent relevant ecological relationships in nature. However, by focusing on the clearest relationships in our RSO values, such as highest absolute values (abs) & lowest dispersion, we can aid in the selection of relevant trait variables to statistically “vet” for significant ecological relationships. Specifically, we selected traits with median abs(RSO) >0.5 as candidate predictor variables in binomial regressions. Exact equation structure of the binomial regressions was developed heuristically with the aid of partial dependence plots (“iml” package; (Molnar et al., 2018)) and fundamental domain knowledge (see Supplementary Methods, Fig. S5 & S6).

We tested RSO importance values from all trained network structures in our analysis for robustness against the potentially confounding issues of phylogenetic autocorrelation and the influence of rare species in our dataset. Using a recent molecular bee phylogeny (Henríquez-Piskulich et al., 2024), we found no phylogenetic signal in extirpation/persistence across species in our neural network analysis (see Fig. S7 & S8). Furthermore, we explored any effects of phylogenetic signal in our traits over network training, accuracy, and RSO importance values using Phylogenetic Eigenvector Maps (MPSEM package; (Guenard and Legendre, 2022)) as separate or additional input features in our neural networks (de Oliveira Caetano et al., 2022). Additionally, we considered the effects of outlying rare species on RSO importance values by comparing RSO importance values to those from a corresponding analysis with singletons and doubletons removed. See Supplementary Methods for full descriptions.

Trait composition across sampling periods

We compared the distributions of individual quantitative traits between the two sampling periods at two levels: the sample community level and the species level (see Fig. S9 for graphic representation). The sample community level accounts for sample abundance and defines community trait distributions as a function of all sampled bee individuals per period (Fig. S9a). On the other hand, the species level does not account for sample abundance and instead defines community trait distributions across all the uniquely sampled bee species per period (Fig. S9b). Trait distributions between the major historical (1972/1973) and contemporary (2017/2018) sampling periods were compared at the sample and species level. Trait distributions across the seven sample period bins (see Methods; Table 1, Fig. S1) used during analyses of the near century of data were compared at the species level only.

Furthermore, we used ordination methods to develop a holistic understanding of the trait composition between the major historical and contemporary periods. Specifically, we separated community composition via averages of quantitative traits and percent makeup of each categorical trait (e.g., nesting location or lecty) by sample period and month. We visualized trait composition differences between the historical and contemporary communities via NMDS (see methods described above) and used PERMANOVA to test for statistical differences after confirming no differences in dispersion.

Change in floral hosts

To test if there was a significant change in species richness of floral hosts (flowers that bees were netted on) we used a chi-square test to compare floral species richness across the four years (1972, 1973, 2017, 2018). This was conducted in GraphPad Prism. Percent of specimens collected from each floral host in each year was also calculated to compare the dominant host species across years.

Alberdi A. 2019. Integral Analysis of Diversity Based on Hill Numbers. Available from https://github.com/anttonalberdi/hilldiv.

Anderson MJ. 2001. A new method for non-parametric multivariate analysis of variance. Austral Ecol. 26(1):32–46. https://doi.org/10.1046/j.1442-9993.2001.01070.x.

Anderson MJ. 2004. PERMDISP: a FORTRAN computer program for permutational analysis of multivariate dispersions (for any two-factor ANOVA design) using permutation tests.

Bartomeus I, Ascher JS, Gibbs J, et al. 2013. Historical changes in northeastern US bee pollinators related to shared ecological traits. Proc. Natl. Acad. Sci. U. S. A. 110(12):4656–60. https://doi.org/10.1073/pnas.1218503110.

Beck MW. 2018. NeuralNetTools: Visualization and Analysis Tools for Neural Networks. J. Stat. Softw. 85(11):1–20.

Boersma KS, Dee LE, Miller SJ, et al. 2016. Linking multidimensional functional diversity to quantitative methods: a graphical hypothesis-evaluation framework. Ecology. 97(3):583–593. https://doi.org/10.1890/15-0688.

Bouseman JK, LaBerge WE. 1978. A revision of the bees of the genus Andrena of the Western Hemisphere. Part IX. Subgenus Melandrena. Trans. Am. Entomol. Soc. 104(3/4):275–389. Available from https://about.jstor.org/terms.

Coelho B. 2004. A review of the bee genus Augochlorella (Hymenoptera: Halictidae: Augochlorini). Syst. Entomol. 29(3):282–323. https://doi.org/10.1111/j.0307-6970.2004.00243.x.

Evans FC. 1986. Bee - Flower Interactions on an Old Field in Southeastern Michigan. In: Clambey GK, Pembley RH, editors. The prairie: past, present and future. Proceedings of the ninth North American Prairie Conference. Fargo, ND: Tri-College University Center for Environmental Studies. p. 103–109. Available from https://digitalcommons.usu.edu/bee_lab_er/20.

GBIF. 2023. GBIF Occurrence Download. Available from https://doi.org/10.15468/dl.ta4zxp Accessed from R via rgbif (https://github.com/ropensci/rgbif) on 2023-04-26.

Gibbs J. 2011. Revision of the metallic Lasioglossum (Dialictus) of eastern North America (Hymenoptera: Halictidae: Halictini). Zootaxa. 3073:1–216. Available from www.mapress.com/zootaxa/.

Gibbs J, Packer L, Dumesh S, et al. 2013. Revision and reclassification of Lasioglossum (Evylaeus), L. (Hemihalictus) and L. (Sphecodogastra) in eastern North America (Hymenoptera: Apoidea: Halictidae). Zootaxa. 3672:1–117. https://doi.org/10.11646/zootaxa.3672.1.1.

Guenard G, Legendre P. 2022. MPSEM: Modeling Phylogenetic Signals using Eigenvector Maps. Available from https://cran.r-project.org/package=MPSEM.

Heaton J. 2008. Introduction to neural networks with Java. Heaton Research, Inc.

Henríquez-Piskulich P, Hugall AF, Stuart-Fox D. 2024. A supermatrix phylogeny of the world’s bees (Hymenoptera: Anthophila). Mol. Phylogenet. Evol. 190:107963. https://doi.org/10.1016/J.YMPEV.2023.107963.

IUCN Standards and Petitions Subcommittee. 2014. Guidelines for Using the IUCN Red List Categories and Criteria. Available from http://www.iucnredlist.org/documents/RedListGuidelines.pdf.

Joly A, Bonnet P, Goëau H, et al. 2016. A look inside the Pl@ntNet experience: The good, the bias and the hope. Multimed. Syst. 22(6):751–766. https://doi.org/10.1007/s00530-015-0462-9.

LaBerge WE. 1971. A revision of the bees of the genus Andrena of the Western Hemisphere. Part IV. Scrapteropsis, Xiphandrena and Raphandrena. Trans. Am. Entomol. Soc. 97:441–520.

LaBerge WE. 1973. A revision of the bees of the genus Andrena of the Western Hemisphere. Part VI. Subgenus Trachandrena. Trans. Am. Entomol. Soc. 99(3):235–371. Available from https://www.jstor.org/stable/pdf/25078133.pdf?refreqid=excelsior%3A77d65d640a3a193bb61c919e3bcbe271.

LaBerge WE. 1977. A revision of the bees of the genus Andrena of the Western Hemisphere. Part VIII. Subgenera Thysandrena, Dasyandrena, Psammandrena, Rhacandrena, Euandrena, Oxyandrena. Trans. Am. Entomol. Soc. 103(1):1–143. Available from https://about.jstor.org/terms.

LaBerge WE. 1980. A revision of the bees of the genus Andrena of the western hemisphere. Part X. Subgenus Andrena. Trans. Am. Entomol. Soc. 106(4):395–525. Available from https://about.jstor.org/terms.

LaBerge WE. 1985. A revision of the bees of the genus Andrena of the Western Hemisphere. Part XI. Minor subgenera and subgeneric key. Trans. Am. Entomol. Soc. 111(4):441–567. Available from https://about.jstor.org/terms.

LaBerge WE. 1989. A revision of the bees of the genus Andrena of the Western Hemisphere. Part XIII. Subgenera Simandrena and Taeniandrena. Trans. Am. Entomol. Soc. 115(1):1–56. Available from https://www.jstor.org/stable/pdf/25078446.pdf.

LaBerge WE, Bouseman JK. 1970. A revision of the bees of the genus Andrena of the Western Hemisphere. Part III. Tylandrena. Trans. Am. Entomol. Soc. 4:543–605. Available from https://about.jstor.org/terms.

McGinley RJ. 1986. Studies of Halictinae (Apoidea: Halictidae), I: Revision of New World Lasioglossum Curtis. Smithson. Contrib. to Zool. 429:1–294.

Mitchell TB. 1960. Bees of the Eastern United States: volume I. N. C. Agric. Exp. Stn. Tech. Bull. 141:1–538.

Mitchell TB. 1962. Bees of the Eastern United States: volume II. N. C. Agric. Exp. Stn. Tech. Bull. 152:1–557.

Molnar C, Bischl B, Casalicchio G. 2018. iml: An R package for Interpretable Machine Learning. https://doi.org/10.21105/joss.00786.

Newcomb L. 1977. Newcomb’s Wildflower guide: The classic field guide for quick identification of wildflowers, flowering shrubs, and vines. First Edition. Little, Brown and Company. 490 p.

Oksanen J, Blanchet FG, Friendly M, et al. 2019. vegan: Community Ecology Package. Available from https://cran.r-project.org/package=vegan.

de Oliveira Caetano GH, Chapple DG, Grenyer R, et al. 2022. Automated assessment reveals that the extinction risk of reptiles is widely underestimated across space and phylogeny. PLOS Biol. 20(5):e3001544. https://doi.org/10.1371/JOURNAL.PBIO.3001544.

Pebesma E. 2018. Simple Features for R: Standardized Support for Spatial Vector Data. R J. 10(1):439–446. Available from https://doi.org/10.32614/RJ-2018-009.

R Core Team. 2019. R: A Language and Environment for Statistical Computing. Available from https://www.r-project.org/.

Rehan SM, Sheffield CS. 2011. Morphological and molecular delineation of a new species in the Ceratina dupla species-group (Hymenoptera: Apidae: Xylocopinae) of eastern North America 1. Zootaxa. 2873:35–50. Available from www.mapress.com/zootaxa/.

Ribble DW. 1967. The Monotypic North American Subgenus Larandrena of Andrena (Hymenoptera: Apoidea). Bull. Univ. Nebraska State Museum. 6(3):27–42. Available from http://digitalcommons.unl.edu/museumbulletin.

Salinas H. 2021. ecolTest: Community Ecology Tests. Available from https://github.com/hugosal/ecolTest.

Venables WN, Ripley BD. 2002. Modern Applied Statistics with S. Fourth. New York: Springer.

Wood S. 2022. mgcv: Mixed GAM Computation Vehicle with Automatic Smoothness Estimation. Available from https://cran.r-project.org/package=mgcv.

Wood SN. 2004. Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Am. Stat. Assoc. 99(467):673–686. https://doi.org/10.1198/016214504000000980.

Wood SN. 2011. Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J. R. Stat. Soc. Ser. B (Statistical Methodol. 73(1):3–36. https://doi.org/10.1111/J.1467-9868.2010.00749.X.