Data from: Range-wide genetic analysis of an endangered bumble bee (Bombus affinis) reveals population structure, isolation by distance, and low colony abundance

Data files

Mar 22, 2024 version files 199.54 MB

README.md -- 6.03 KB

rpbb_popgen_archive.zip -- 199.53 MB

Abstract

Declines in bumblebee species ranges and abundances are documented across multiple continents and have prompted the need for research to aid species recovery and conservation. The rusty patched bumblebee (Bombus affinis) is the first federally-listed bumblebee species in North America. We conducted a range-wide population genetics study of B. affinis from across all extant conservation units to inform conservation efforts. To understand the species’ vulnerability and help establish recovery targets, we examined population structure, patterns of genetic diversity, and population differentiation. Additionally, we conducted site-level analysis of colony abundance to inform prioritizing areas for conservation, translocation, and other recovery actions. We find substantial evidence of population structuring along an east-to-west gradient. Putative populations show evidence of isolation by distance, high inbreeding coefficients, and a range-wide male diploidy rate of ~15%. Our results suggest the Appalachians represent a genetically distinct cluster with high levels of private alleles and substantial differentiation from the rest of the extant range. Site-level analyses suggest low colony abundance estimates for B. affinis compared to similar datasets of stable, co-occurring species. These results lend genetic support to trends from observational studies suggesting B. affinis has undergone a recent decline and exhibits substantial spatial structure. The low colony abundances observed here suggest caution in overinterpreting the stability of populations even where B. affinis is reliably detected interannually. These results help delineate informed management units, provide context for the potential risks of translocation programs, and can help set clear recovery targets for this and other threatened bumblebee species.

*CAUTION* For users of this dataset, results that are sensitive to latitude and longitude will differ slightly from the published version, because the coordinates of collections of an endangered species had to be obscured. If precise reproducibility is required and the research team holds the necessary credentials/permits, exact coordinates can be provided upon request.
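To get a feel for how much obscured coordinates can shift a distance-based result, the sketch below compares the great-circle distance between two sites before and after a small offset. Everything in it is an assumption made for illustration: the function name `haversine_km`, the two site coordinates, and the 0.01 degree offset are hypothetical, and the actual obscuring procedure applied to this dataset is not described here.

```r
# Great-circle (haversine) distance in km between points in decimal degrees.
# Assumption: a spherical Earth with mean radius 6371 km.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical sites (not real collection localities).
site_a <- c(lat = 44.00, lon = -92.00)
site_b <- c(lat = 38.50, lon = -79.50)

# Hypothetical offset of 0.01 degrees applied to one site.
site_a_shifted <- site_a + c(0.01, -0.01)

d_exact <- haversine_km(site_a["lat"], site_a["lon"],
                        site_b["lat"], site_b["lon"])
d_shifted <- haversine_km(site_a_shifted["lat"], site_a_shifted["lon"],
                          site_b["lat"], site_b["lon"])

# Change in pairwise distance (km) introduced by the offset.
unname(d_shifted - d_exact)
```

For sites that are hundreds of kilometers apart, an offset of this size changes the pairwise distance by a small fraction, which is in line with the expectation that results differ only slightly.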

EXPLANATION OF FILES AND DIRECTORY STRUCTURE

This repository is intended to document the code behind the project "Range-wide genetic analysis of an endangered bumble bee (Bombus affinis) reveals population structure, isolation by distance, and low colony abundance". The code is all written in R.

Anyone who downloads this repository should be able to reconstruct the results of the analysis. Two steps require downloading software outside of R: COLONY and STRUCTURE. The scripts include the steps that prepare the inputs for these programs and the steps that wrangle their outputs in R. The scripts indicate when you will need to run these analyses outside of R, but for convenience all outputs of COLONY and STRUCTURE are provided.

The scripts are all intended to be run in alphanumerical order (e.g. 01a, 01b, 02a, 02a01, ...). Occasionally scripts have oddly long sub-prefixes (e.g. 02a01b...); these exist simply because I had some prior branch. The idea is still the same: if you sort by name, the scripts are in the order they are intended to be run.
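As a rough illustration of the run order, the sketch below lists scripts under a project root and sorts them by their prefix. It is not part of the archive: the function name `ordered_scripts` and the assumption that scripts end in `.R` are mine. It sorts on the text before the first underscore because a plain byte-order sort places "0" before "_", which would put a name like 03a01_... ahead of 03a_...; file browsers often order these differently.

```r
# List scripts under `root` and return their paths in intended run order.
# Assumption: scripts end in .R; change `pattern` if they are e.g. .Rmd files.
ordered_scripts <- function(root = ".", pattern = "\\.[Rr]$") {
  paths <- list.files(root, pattern = pattern,
                      recursive = TRUE, full.names = TRUE)
  name <- basename(paths)
  # Prefix is everything before the first underscore, e.g. "02a01".
  prefix <- sub("_.*$", "", name)
  # Sort by prefix first, then by full name to break ties (e.g. two 04a files).
  # method = "radix" gives the same order regardless of the user's locale.
  paths[order(prefix, name, method = "radix")]
}

# Print the order without running anything.
print(basename(ordered_scripts(".")))
```

Printing the list first makes it easy to compare against the directory listings in this README before sourcing anything.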

For convenience, this directory has the outputs of all data wrangling and data analysis steps already included (though not all figures, necessarily). To conduct a clean run, you could delete all of the files in data_output and analyses_output, then run everything to yield clean outputs.
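One cautious way to prepare such a clean run is sketched below. The helper name `clear_outputs` and its `dry_run` argument are hypothetical, and the sketch assumes the data_output and analyses_output directories may sit anywhere beneath the project root. By default it only reports what it would delete.

```r
# Find files inside any directory named data_output or analyses_output.
# With dry_run = TRUE (the default) nothing is deleted; the files are returned.
clear_outputs <- function(root = ".",
                          dirs = c("data_output", "analyses_output"),
                          dry_run = TRUE) {
  all_dirs <- list.dirs(root, recursive = TRUE, full.names = TRUE)
  targets <- all_dirs[basename(all_dirs) %in% dirs]
  files <- unique(list.files(targets, recursive = TRUE, full.names = TRUE))
  if (!dry_run) {
    file.remove(files)
  }
  invisible(files)
}

# Inspect the list first, then rerun with dry_run = FALSE to delete.
to_delete <- clear_outputs(".")
print(to_delete)
```

Reviewing the dry-run list before deleting is worthwhile, since any provided outputs that come from external programs would have to be regenerated outside of R.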

Should you have any questions or encounter issues, please contact me at john.mola@colostate.edu

---------

data/data_wrangling/

01b_wrangle... -- conducts some basic data cleaning

01c_spatial_data... -- clusters the spatial data into putative populations

01d_merge_genotype... -- combines genotype data with specimen metadata

analyses/

02a01_preparing... -- filters and prepares data for COLONY program

02c01_batch... -- further prepares a batch run (i.e. all putative pops separately) for COLONY

[intermediate steps run in COLONY]

03a_create_geneind ... -- creates a genind object from the output of COLONY for use in estimating pop-gen statistics

03a01_quality... -- checks HWE, LD, etc

03a02_genetic... -- calculates F-statistics, allelic richness, etc from genind object

03a03_Fst_calc... -- runs models to check pairwise Fst across range

03b_diploid... -- counts frequency of diploid males

03c_capwire... -- runs genetic mark-recapture on COLONY output to determine colony abundance

03d_colony_site... -- compares B. affinis worker:colony ratios to those from Cameron et al. 2011

03e_STRUCTURE... -- prepares data to be run in program STRUCTURE, which is run externally, and then wrangles the data after STRUCTURE outputs are produced

03f_AMOVA... -- runs AMOVA procedure

03g_dapc... -- runs DAPC procedure

figures/figure_wrangling/

04a_... -- generates Figure 1 map

04a_figure_supplement... -- generates supplemental map

04b_... -- generates figure of heterozygosity, allelic richness, etc

04c_... -- generates Fst pairwise figures

04d_... -- generates output figures from STRUCTURE

04e_... -- generates DAPC figure

04g_... -- generates figure comparing Cameron et al. 2011 data to ours

-----------

Explanations of raw data files

Details of their use are described in the data wrangling steps above in scripts 01b through 01d

Bombus_affinis_repository__msatdata_ver_22September2022

This Excel workbook contains 3 sheets:

Main data -- contains extensive information on the specimen storage and access data used by the USDA. Columns relevant to our study include the Internal Barcode, which is a unique ID for each specimen, the sex of each specimen (Male/Female), latitude and longitude, and state and county information. These latter columns are redundant with the information from collectors, as explained below.
MSAT data with barcode -- contains the internal barcode (i.e. specimen ID) and then the genotypes (columns D-AH) for each specimen
MSAT error rate -- contains the locus ID, error rate from regenotyping at each locus, and standard information relevant to calling microsatellite alleles

cameron2011_colony_counts

This file is blank, but can be populated by downloading the original source data from Cameron et al. 2011 PNAS. It is available in Table S8 here: https://doi.org/10.1073/pnas.1014743108
The columns and data types are explained therein. Once downloaded, users can create a filled-in version of this file to run the analysis following our script.

Files within from_collectors (each of these files contains common columns):

unique_id -- a unique identifier for each specimen
sex -- indicates the sex of the specimen
longitude/latitude -- GPS coordinates of the specimen (these are jittered within these files due to the endangered status of the species)
site -- a custom site name for where the specimen was collected
state, county, and other locality information -- provided as recorded and self-explanatory
collection date -- date the specimen was collected
some files contain additional self-explanatory columns which are not used in our study
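Because the collector files share these columns, they can be stacked into one table. The sketch below is only an illustration and not the archive's own wrangling code (scripts 01b through 01d do that work). It assumes the files are CSVs whose headers are spelled exactly `unique_id`, `sex`, `longitude`, `latitude`, and `site`; the function name `combine_collector_files` and the added `source_file` column are hypothetical.

```r
# Read every CSV in `dir`, keep the shared columns, and stack the rows.
# Assumption: files are CSVs and the shared headers match `cols` exactly.
combine_collector_files <- function(dir,
                                    cols = c("unique_id", "sex", "longitude",
                                             "latitude", "site")) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(files, function(f) {
    d <- read.csv(f, stringsAsFactors = FALSE)
    missing_cols <- setdiff(cols, names(d))
    if (length(missing_cols) > 0) {
      stop("File ", basename(f), " lacks columns: ",
           paste(missing_cols, collapse = ", "))
    }
    # Record which file each row came from, then drop unused columns.
    d$source_file <- basename(f)
    d[, c(cols, "source_file")]
  })
  do.call(rbind, tables)
}
```

Stopping on a missing column, rather than silently filling it in, makes header mismatches between collectors visible early.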

Files within meta_rpbb_external:

bombus_affinis_shapefiles -- contains a shapefile detailing the priority zones and historic counties, as indicated by the US Fish and Wildlife Service's endangered species priority zones
rpbb_extant_counties.csv -- this file contains two columns, indicating the state and counties in the U.S. which are occupied by Bombus affinis in recent years
rpbb_historic_counties.csv -- this file contains two columns, indicating the state and counties in the U.S. which were occupied by Bombus affinis at any time
