Data from: Range-wide genetic analysis of an endangered bumble bee (Bombus affinis) reveals population structure, isolation by distance, and low colony abundance
Data files
Mar 22, 2024 version files 199.54 MB
- README.md
- rpbb_popgen_archive.zip
Abstract
Declines in bumblebee species ranges and abundances are documented across multiple continents and have prompted the need for research to aid species recovery and conservation. The rusty patched bumblebee (Bombus affinis) is the first federally-listed bumblebee species in North America. We conducted a range-wide population genetics study of B. affinis from across all extant conservation units to inform conservation efforts. To understand the species’ vulnerability and help establish recovery targets, we examined population structure, patterns of genetic diversity, and population differentiation. Additionally, we conducted site-level analysis of colony abundance to inform prioritizing areas for conservation, translocation, and other recovery actions. We find substantial evidence of population structuring along an east-to-west gradient. Putative populations show evidence of isolation by distance, high inbreeding coefficients, and a range-wide male diploidy rate of ~15%. Our results suggest the Appalachians represent a genetically distinct cluster with high levels of private alleles and substantial differentiation from the rest of the extant range. Site-level analyses suggest low colony abundance estimates for B. affinis compared to similar datasets of stable, co-occurring species. These results lend genetic support to trends from observational studies suggesting B. affinis has undergone a recent decline and exhibits substantial spatial structure. The low colony abundances observed here suggest caution in overinterpreting the stability of populations even where B. affinis is reliably detected interannually. These results help delineate informed management units, provide context for the potential risks of translocation programs, and can help set clear recovery targets for this and other threatened bumblebee species.
README
*CAUTION* Results that depend on latitude and longitude will differ slightly from the published version, because the collection coordinates of this endangered species have been obscured. If precise reproducibility is required and the research team holds the necessary credentials/permits, exact coordinates can be provided upon request.
EXPLANATION OF FILES AND DIRECTORY STRUCTURE
This repository is intended to document the code behind the project "Range-wide genetic analysis of an endangered bumble bee (Bombus affinis) reveals population structure, isolation by distance, and low colony abundance". The code is all written in R.
Anyone who downloads this repository should be able to reconstruct the results of the analysis. Two steps require software outside of R: COLONY and STRUCTURE. The scripts include the steps that prepare the input for these programs and the steps that wrangle their output in R. The scripts indicate when you need to run an analysis outside of R, but for convenience all COLONY and STRUCTURE outputs are provided.
The scripts are all intended to be run in alphanumeric order (e.g. 01a, 01b, 02a, 02a01, ...). Occasionally scripts have oddly long sub-prefixes (e.g. 02a01b...). These exist simply because I had some prior branch, but the idea is the same: if you sort by name, the scripts are in the order they are intended to be run.
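The ordering rule can be sketched in R as follows. This is a minimal illustration, not part of the archive: the helper name `order_scripts`, the `.R` extension, and the example file names are assumptions. It orders on the prefix before the first underscore because a plain byte-wise sort would place "03a01_..." ahead of "03a_..." (the digit "0" sorts before "_" in ASCII), whereas the intended order runs 03a first.

```r
# Order script paths by the prefix before the first underscore
# (so "03a" comes before "03a01"), breaking ties on the full file name.
# method = "radix" compares strings byte-wise, independent of locale.
order_scripts <- function(paths) {
  name <- basename(paths)
  prefix <- sub("_.*$", "", name)
  paths[order(prefix, name, method = "radix")]
}

# Hypothetical file names, used only to illustrate the ordering.
example <- c("03a01_quality.R", "03a_create.R",
             "02a01_preparing.R", "01b_wrangle.R")
ordered <- order_scripts(example)
print(ordered)
```

To run a folder of scripts this way, one could pass `list.files("analyses", pattern = "\\.R$", full.names = TRUE)` to `order_scripts()` and call `source()` on each element in turn, pausing where a script says COLONY or STRUCTURE must be run externally.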
For convenience, this directory has the outputs of all data wrangling and data analysis steps already included (though not all figures, necessarily). To conduct a clean run, you could delete all of the files in data_output and analyses_output, then run everything to yield clean outputs.
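A clean run could start with something like the sketch below. It assumes that data_output and analyses_output are reachable from the working directory under those names, which may not match where they sit in the archive; the helper `clear_outputs` is hypothetical. It defaults to a dry run so that nothing is deleted by accident.

```r
# List (and optionally delete) every file under the given output folders.
# Hidden files are left alone because list.files() skips them by default.
clear_outputs <- function(dirs, dry_run = TRUE) {
  files <- list.files(dirs, recursive = TRUE, full.names = TRUE)
  if (dry_run) {
    message("Would delete ", length(files), " file(s)")
  } else {
    file.remove(files)
  }
  invisible(files)
}

# Dry run: report what would be removed without touching anything.
# The folder locations are assumptions; adjust the paths as needed.
to_delete <- clear_outputs(c("data_output", "analyses_output"))
```

After checking `to_delete`, calling `clear_outputs(..., dry_run = FALSE)` removes the files, and the scripts can then be rerun in order to regenerate them.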
Should you have any questions or encounter issues, please contact me at john.mola@colostate.edu
---------
data/data_wrangling/
01b_wrangle... -- conducts some basic data cleaning
01c_spatial_data... -- cluster into putative populations
01d_merge_genotype... -- combine genotype data with specimen metadata
analyses/
02a01_preparing... -- filters and prepares data for COLONY program
02c01_batch... -- further prepares a batch run (i.e. all putative pops separately) for COLONY
[intermediate steps run in COLONY]
03a_create_geneind ... -- creates a genind object from the COLONY output for use in estimating pop-gen statistics
03a01_quality... -- checks HWE, LD, etc.
03a02_genetic... -- calculates F-statistics, allelic richness, etc. from the genind object
03a03_Fst_calc... -- runs models to check pairwise Fst across range
03b_diploid... -- counts frequency of diploid males
03c_capwire... -- runs genetic mark-recapture on COLONY output to determine colony abundance
03d_colony_site... -- compares B. affinis worker:colony ratios to those from Cameron et al. 2011
03e_STRUCTURE... -- prepares data to be run in program STRUCTURE, which is run externally, and then wrangles the data after STRUCTURE outputs are produced
03f_AMOVA... -- runs AMOVA procedure
03g_dapc... -- runs DAPC procedure
figures/figure_wrangling/
04a_... -- generates Figure 1 map
04a_figure_supplement... -- generates supplemental map
04b_... -- generates figure of heterozygosity, allelic richness, etc.
04c_... -- generates Fst pairwise figures
04d_... -- generates output figures from STRUCTURE
04e_... -- generates DAPC figure
04g_... -- generates figure comparing Cameron et al. 2011 data to ours
-----------
Explanations of raw data files
Details of their use are described in the data wrangling steps above, in scripts 01b through 01d.
Bombus_affinis_repository__msatdata_ver_22September2022
This Excel workbook contains 3 sheets:
- Main data: contains extensive information on the specimen storage and access data used by the USDA. Columns relevant to our study include the Internal Barcode, which is a unique ID for each specimen, the sex of each specimen (Male/Female), latitude and longitude, and state and county information. These columns are redundant with the information from collectors, as explained below.
- MSAT data with barcode: contains the internal barcode (i.e. specimen ID) and then the genotypes (columns D-AH) for each specimen
- MSAT error rate: contains the locus ID, the error rate from regenotyping at each locus, and standard information relevant to calling microsatellite alleles.
cameron2011_colony_counts
- This file is blank, but can be populated by downloading the original source data from Cameron et al. 2011 PNAS. It is available in Table S8 here: https://doi.org/10.1073/pnas.1014743108
- The columns and data types are explained therein. Once the data are downloaded, users can create a filled-in version of this file to run the analysis following our script.
Files within from_collectors (each of these files contains common columns):
- unique_id -- a column of a unique identifier for any given specimen
- sex -- indicates the sex of the specimen
- longitude/latitude -- GPS coordinates of the specimen (these coordinates are jittered because of the endangered status of the species)
- site -- a custom site name for where the specimen was collected
- state, county, and other locality information -- these columns are provided and self-explanatory
- collection date -- date the specimen was collected
- some files contain additional self-explanatory columns which are not used in our study
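Because the collector files share a set of common columns and differ only in their extras, they can be stacked on the shared columns alone. The sketch below is one way to do that in base R and is not taken from the archive's scripts: the helper `combine_common`, the exact column spellings, and every value in the toy tables are invented for illustration.

```r
# Toy stand-ins for two collector files; all values are made up and
# do not correspond to real specimens or localities.
collector_a <- data.frame(
  unique_id = c("A1", "A2"), sex = c("Female", "Male"),
  longitude = c(-90.1, -90.2), latitude = c(43.1, 43.2),
  site = c("site_1", "site_1"), notes = c("x", "y")
)
collector_b <- data.frame(
  unique_id = "B1", sex = "Female",
  longitude = -80.5, latitude = 38.9,
  site = "site_2", collector = "someone"
)

# Keep only the columns present in every table, then bind the rows.
combine_common <- function(tables) {
  shared <- Reduce(intersect, lapply(tables, names))
  do.call(rbind, lapply(tables, function(x) x[, shared, drop = FALSE]))
}

specimens <- combine_common(list(collector_a, collector_b))
print(names(specimens))
```

Columns that appear in only some files (here `notes` and `collector`) are dropped, which matches the note that such extra columns are not used in the study.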
Files within meta_rpbb_external:
- bombus_affinis_shapefiles -- contains a shapefile detailing priority zones and historic counties, as indicated by the US Fish and Wildlife Service's endangered species priority zones
- rpbb_extant_counties.csv -- this file contains two columns, indicating the state and counties in the U.S. which are occupied by Bombus affinis in recent years
- rpbb_historic_counties.csv -- this file contains two columns, indicating the state and counties in the U.S. which were occupied by Bombus affinis at any time
Methods
Please see the published manuscript for full data collection details.