Data from: Genome-wide markers test the status of two putative species of North American bumble bees

Data files

Mar 11, 2026 version files (17.04 MB total):

README.md (3.52 KB)
RohdeEtAl2025_BayesianCOITree.newick (8.43 KB)
RohdeEtAl2025_COIContigSlices.zip (1.46 MB)
RohdeEtAl2025_MaximumLikelihoodCOI.newick (8.16 KB)
RohdeEtAl2025_MaximumLikelihoodUCETree.newick (3.84 KB)
RohdeEtAl2025_UCEAlignments.zip (15.55 MB)

Abstract

Accurate species delimitation is critical to identifying the conservation status of species. Molecular species delimitation methods have revealed previously unrecognized cryptic species across the taxonomic spectrum. However, studies vary in the molecular markers selected, analytical approaches used, and taxon sampling, which sometimes results in conflicting conclusions. We tested a two-species hypothesis of the Bombus occidentalis complex using nuclear (ultraconserved elements, UCE) and mitochondrial (cytochrome c oxidase I, COI) markers to infer maximum likelihood and Bayesian phylogenies for the taxa. We extracted tissue and sequenced 102 specimens from across the geographic range of the species complex. Through our analyses, we concluded that the complex represents two species, B. occidentalis and B. mckayi. Here, we include the raw sequences for the UCE and COI analyses and the final Newick-formatted trees from the analyses.

https://doi.org/10.5061/dryad.z612jm6nw

Description of the data and file structure

The goal of this study was to clarify the species status of Bombus occidentalis and B. mckayi by expanding on previous phylogenetic analyses, which focused on a single mitochondrial gene (cytochrome c oxidase I, COI) and included few specimens from a limited portion of the geographic range. Results of phylogenetic studies vary with the molecular markers selected, analytical approaches used, and taxon sampling, which sometimes leads to conflicting conclusions. Studies that rely on a single gene are criticized for misrepresenting the evolutionary history of species because nuclear and mitochondrial genomes, and even individual genes within them, may have different evolutionary patterns. B. occidentalis was once an abundant insect pollinator in western North America but has declined severely since the mid-1990s and is predicted to continue to decline under even optimistic future climate scenarios. We tested a two-species hypothesis of the B. occidentalis complex using nuclear (ultraconserved elements, UCE) and mitochondrial (COI) markers to infer maximum likelihood and Bayesian phylogenies for the taxa, including many more specimens from across the geographic range than were included in previous studies. Here, we include the raw UCE and COI sequence data for all 102 specimens sequenced for the analysis and the Newick-formatted phylogenetic trees from the final analyses.

Files and variables

File: RohdeEtAl2025_MaximumLikelihoodCOI.newick

Description: The Newick-formatted tree file for the phylogenetic tree from our final maximum likelihood analysis of the cytochrome c oxidase I (COI) dataset. The methods used to infer the phylogeny are described in the primary publication.

File: RohdeEtAl2025_COIContigSlices.zip

Description: The raw FASTA-formatted cytochrome c oxidase I (COI) sequence slices that were extracted from the bycatch (off-target reads) of the UCE sequencing for all 102 specimens. These slices were aligned following the methods described in the primary publication.

File: RohdeEtAl2025_BayesianCOITree.newick

Description: The Newick-formatted tree file for the phylogenetic tree from our final Bayesian analysis of the COI dataset. The methods used to infer the phylogeny are described in the primary publication.

File: RohdeEtAl2025_MaximumLikelihoodUCETree.newick

Description: The Newick-formatted tree file for the phylogenetic tree from our final maximum likelihood analysis of the UCE dataset. The methods used to infer the phylogeny are described in the primary publication.

File: RohdeEtAl2025_UCEAlignments.zip

Description: The raw FASTA-formatted alignments of UCE sequences for all 102 specimens. These alignments were trimmed and cleaned following the methods described in the primary publication.

Code/software

Phylogenetic trees:

Several freely available programs can read and display Newick files. We used FigTree version 1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
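Because Newick is plain text, the tip labels in a tree file can also be listed without a tree viewer, for instance to check which specimens appear in a tree. The following is a minimal sketch using only the Python standard library; the function name `newick_tip_labels` is hypothetical, and the scanner only collects tip labels (it does not build a tree or read branch lengths). A dedicated library such as Biopython or ape is a better choice for real analyses.

```python
def newick_tip_labels(newick):
    """Return tip labels in order of appearance (a sketch, not a full parser)."""
    tips = []
    i, n = 0, len(newick)
    expect_tip = True  # a label right after "(" or "," (or at the start) is a tip
    while i < n:
        ch = newick[i]
        if ch == "[":  # skip a bracketed comment
            i = newick.index("]", i) + 1
        elif ch in "(,":
            expect_tip = True
            i += 1
        elif ch == ")":  # a label after ")" belongs to an internal node
            expect_tip = False
            i += 1
        elif ch == ";" or ch.isspace():
            i += 1
        elif ch == ":":  # skip the branch length that follows a colon
            expect_tip = False
            i += 1
            while i < n and newick[i] not in "(),;[":
                i += 1
        else:
            if ch == "'":  # quoted label; '' inside quotes is an escaped quote
                j, buf = i + 1, []
                while j < n:
                    if newick[j] == "'":
                        if j + 1 < n and newick[j + 1] == "'":
                            buf.append("'")
                            j += 2
                            continue
                        break
                    buf.append(newick[j])
                    j += 1
                label, i = "".join(buf), j + 1
            else:  # unquoted label runs until the next structural character
                j = i
                while j < n and newick[j] not in "(),:;[" and not newick[j].isspace():
                    j += 1
                label, i = newick[i:j], j
            if expect_tip:
                tips.append(label)
            expect_tip = False
    return tips
```

To use it on one of the tree files, read the file into a string first, for example `newick_tip_labels(open("RohdeEtAl2025_BayesianCOITree.newick").read())`.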

Sequence data:

The sequence data are stored as .fasta files. FASTA is a text-based format designed to represent nucleotide sequences, and these files are recognized by most software developed to analyze nucleotide sequences.
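The FASTA files can also be inspected directly from the zip archives. The sketch below uses only the Python standard library to report, for each FASTA file in an archive, the number of sequences and the distinct sequence lengths. The function names and the list of file suffixes are assumptions made for illustration; the internal layout of the archives is not described here, so the suffix filter may need adjusting.

```python
import io
import zipfile


def parse_fasta(text):
    """Parse FASTA text into a dict mapping header (without '>') to sequence.

    Sequences that span several lines are joined. If a header is repeated,
    the later record replaces the earlier one.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def summarize_fasta_zip(zip_source, suffixes=(".fasta", ".fa", ".fas")):
    """Return {member name: (number of sequences, sorted distinct lengths)}.

    zip_source may be a path or a binary file-like object. The suffix tuple
    is an assumption about how the files inside the archive are named.
    """
    summary = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(suffixes):
                continue
            text = archive.read(name).decode("utf-8", errors="replace")
            seqs = parse_fasta(text)
            lengths = sorted({len(s) for s in seqs.values()})
            summary[name] = (len(seqs), lengths)
    return summary
```

For example, `summarize_fasta_zip("RohdeEtAl2025_UCEAlignments.zip")` would return one entry per matching file. In an alignment, all sequences are normally the same length, so a single distinct length per file is the expected result; unaligned files such as the COI slices may show several lengths.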

UCE methods:

Methods generally followed those in Branstetter et al. (2021). We extracted DNA from the mid and hind legs of specimens using a Zymo Quick-DNA Miniprep Plus extraction kit and stored extracts in -80°C freezers at PIRU. Most specimens were collected between 1956 and 2017; one specimen dates from 1920.

We used a TapeStation 4150 automated electrophoresis system (Agilent, 5301 Stevens Creek Blvd., Santa Clara, CA 95051, USA) to measure the size of DNA fragments extracted from the specimens and a Qubit 3.0 fluorometer to quantify DNA concentrations. Fragment sizes varied among specimens because of their different ages, collection methods, and storage histories. We sheared the DNA to target fragment sizes of 400 to 600 base pairs using a Q800R2 acoustic sonicator (Qsonica, Newtown, CT, USA). We varied shearing times from 0 to 120 seconds with a 10-second on, 10-second off pulsing pattern: samples with small fragments were sheared for less time and samples with large fragments were sheared for more time. Once sonicated, we purified the DNA samples using a homemade paramagnetic bead solution (Rohland and Reich 2012).

We captured and sequenced UCE loci from our sample specimens following the methods described in Branstetter et al. (2021). We prepared Illumina sequencing libraries using Kapa Hyper Prep kits and custom 8 bp dual-indexing adapters (Glenn et al. 2019). We amplified the libraries using 12 cycles of PCR, cleaned the amplified DNA using 1.0 to 1.2x SPRI beads to remove contaminants and fragments smaller than 200 bp, and quantified the DNA using Qubit. Samples with low measured amounts of DNA were re-amplified for 14 to 16 PCR cycles from an aliquot of the pre-PCR library.

We enriched the samples using an existing UCE baitset identified and optimized for use in the order Hymenoptera (bee-ant-specific Hym-v2; Branstetter et al. 2017, Grab et al. 2019). The baitset was developed using seven genomes from hymenopteran species, including two species from the bee families Apidae (the family that contains all bumble bees) and Halictidae. We pooled up to ten samples per library at equimolar concentrations for enrichment, then enriched the pooled libraries following a combination of the Arbor Biosciences v3.02 protocol (enrichment day 1) and a protocol based on Blumenstiel et al. (2010). Finally, we repeated the PCR amplification, purification, and quantification steps previously described for the pooled enriched samples. Enriched pools were combined into a final sequencing pool and sent to Novogene Inc. for sequencing on an Illumina HiSeq X instrument (paired-end 150 bp reads, PE150).
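As an illustration of what equimolar pooling involves, the sketch below converts a mass concentration and a mean fragment length into a molar concentration and then computes the volume of each library that contains the same number of molecules. It uses the standard approximation of 660 g/mol per base pair of double-stranded DNA. The function names and any input values are hypothetical, and this is not necessarily the calculation the authors used.

```python
def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA concentration (ng/uL) to nanomolar.

    nM = (ng/uL * 1e6) / (660 g/mol/bp * mean fragment length in bp)
    """
    if mean_fragment_bp <= 0:
        raise ValueError("mean_fragment_bp must be positive")
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


def equimolar_volumes_ul(libraries, fmol_per_library):
    """Return the volume (uL) of each library that contains fmol_per_library.

    libraries maps a sample name to (concentration in ng/uL, mean fragment bp).
    Because 1 nM equals 1 fmol/uL, volume = target fmol / molarity in nM.
    """
    return {
        name: fmol_per_library / library_molarity_nM(conc, bp)
        for name, (conc, bp) in libraries.items()
    }
```

With concentrations from a fluorometer and mean fragment sizes from an electrophoresis system, a call such as `equimolar_volumes_ul({"sampleA": (10.0, 500), "sampleB": (20.0, 500)}, 100)` gives the volume of each library to combine; the sample names and numbers here are made up.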

COI methods:

We extracted COI barcodes from the B. occidentalis and B. terricola UCE targeted sequences using the Phyluce program assembly_match_contigs_to_barcodes, with a sequence downloaded from BOLD (BBHYL247) as the bait sequence.