Molecular refinement of the taxonomy of Pyrgulopsis springsnails in western North America
Data files
Abstract
Springsnails in the genus Pyrgulopsis in western North America have long been a dilemma for taxonomists. Defining species boundaries has been complicated by the presence of undiscovered or cryptic taxa, range over- or underestimation, species complexes representing an unknown number of species, and named species synonymous with other taxa. To address these issues, we conducted an array of molecular species delimitation analyses using sequences from up to two mitochondrial and two nuclear genes from over 5,000 specimens representing 140 of the 151 currently recognized taxa. These analyses reaffirmed most species hypotheses, favored synonymy for some taxa and re-drawing of species boundaries for others, and provided evidence of the presence of a host of new or cryptic taxa. There were occasional disagreements among methods and genes with respect to species boundaries, particularly within the P. pilsbryana species complex, and we also observed instances of recent hybridization and potential taxa of hybrid origin. Although these analyses clarified many of the species hypotheses within the genus, further taxonomic resolution of this group will require more spatially comprehensive and intensive field sampling coupled with greater genetic representation of individuals. We suspect that the complex evolutionary histories of members of Pyrgulopsis will constitute an ongoing challenge for characterizing the biodiversity of this group.
Dataset DOI: 10.5061/dryad.95x69p8zn
Description of the data and file structure
These data were used to delimit species boundaries for nearly all known species of Pyrgulopsis, springsnails of the western United States.
Files and variables
File: Concatenated_COI_ND1.fas
Description: Concatenated alignment of the mitochondrial COI and ND1 sequences.
File: ITS2.fas
Description: Alignment of all internal transcribed spacer 2 (ITS2) nuclear sequences, consisting of 198 positions of ITS2, 46 gap-coded positions, and the first 307 positions in 28S. 489 sequences, 551 bp length.
File: ND1.fas
Description: Alignment of all mitochondrial NADH dehydrogenase subunit 1 (ND1) sequences. 943 sequences, 530 bp length.
File: 28S.fas
Description: Alignment consisting of 944 positions in 28S (large nuclear ribosomal subunit) and 44 gap-coded positions. 521 sequences, 998 bp length.
File: COI.fas
Description: Alignment of all mitochondrial cytochrome c oxidase subunit 1 (COI) sequences. 5,481 sequences, 658 bp length.
Code/software
Alignments are in standard FASTA format and can be examined in a plain-text editor (e.g., Notepad), MEGA, or BioEdit. Files were initially aligned in MEGA 7.0. Nuclear sequences were further refined using the online version of MAFFT and Noisy, as implemented on the NGPhylogeny website (Lemoine et al. 2019; https://ngphylogeny.fr/).
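Because the files are plain FASTA, they can also be checked programmatically. The following is a minimal sketch, not part of the original workflow, that counts the sequences in an alignment and reports the distinct sequence lengths it contains. The function names and the toy records are illustrative assumptions; only the Python standard library is used.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:]
            chunks = []
        else:
            chunks.append(line)  # sequences may wrap across several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def summarize_alignment(records):
    """Return (number of sequences, sorted list of distinct lengths)."""
    lengths = sorted({len(seq) for _, seq in records})
    return len(records), lengths


# Toy input (hypothetical records, not taken from the dataset).
toy = StringIO(">snail_A\nACGT-A\n>snail_B\nACG\nTTA\n")
n_seqs, lengths = summarize_alignment(read_fasta(toy))
print(n_seqs, lengths)
```

To apply this to one of the deposited files, pass an open file handle instead of the toy input, for example `with open("COI.fas") as fh: print(summarize_alignment(read_fasta(fh)))`. A properly aligned file should report a single length that can be compared with the value given in its description above.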
Access information
Other publicly accessible locations of the data:
- GenBank
Data was derived from the following sources:
Specimens and sequencing
New samples for molecular analysis consisted of 4,270 specimens from 644 locations (Figure 1, Supplementary Material 2). These locations were generally chosen to address questions about particular species of interest, to obtain samples from at or near the type locations for some taxa, or to represent undersampled regions that might harbor new species.
We used the QIAGEN DNeasy Blood and Tissue kit to extract genomic DNA from tissues, following the manufacturer’s instructions for tissue. We sequenced two mitochondrial regions, cytochrome c oxidase subunit 1 (COI) and NADH dehydrogenase subunit 1 (ND1), and two nuclear genes, the second internal transcribed spacer (ITS2) and the large ribosomal subunit (28S). These four genes were chosen because they exhibited a range of evolutionary rates and exposure to selection, and represented the broadest taxonomic array of reference sequences for Pyrgulopsis. Because of their linkages, we assumed the phylogenetic signal would be similar between COI and ND1, and between ITS2 and 28S. Not all individuals were sequenced at all genes. We began by sequencing all individuals at COI, and from each COI clade exhibiting a marked level of divergence and represented by more than one individual, we selected one or more individuals to sequence at the three additional genes.
We amplified the four regions using previously published primers and primer-specific annealing temperatures (Supplementary Material 3). Reaction volumes of 30 μl contained 50–100 ng DNA, 1× reaction buffer, 2.5 mM MgCl2, 200 μM each dNTP, 1 μM each primer, and 1 U Taq polymerase (Thermo Fisher Scientific, MA). The PCR program was 94 °C/5 min, [94 °C/60 s, annealing temperature/60 s, 72 °C/90 s] × 34 cycles, 72 °C/5 min. Because the reverse primers for both the ITS2 and 28S regions were published as “LSU3”, we added “Benke” to the name of the 28S reverse primer to distinguish them in the lab. We evaluated PCR products by electrophoresis on 1.6% agarose gels. Products were purified using ExoSAP-IT (Thermo Fisher Scientific, MA) according to the manufacturer’s instructions. Reactions were sequenced bidirectionally at Eurofins Genomics (Louisville, KY) using the amplification primers and standard Sanger sequencing protocols.
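The arithmetic behind the PCR program and reaction mix can be sketched as follows. The cycling times and final concentrations come from the text above; the stock concentrations are hypothetical values chosen only for illustration, and the time estimate ignores thermocycler ramp times.

```python
def cycling_time_seconds(initial_s, step_seconds, n_cycles, final_s):
    """Total programmed hold time: initial step + cycles + final extension."""
    return initial_s + n_cycles * sum(step_seconds) + final_s


def stock_volume_ul(final_conc, stock_conc, reaction_ul):
    """Volume of stock to add, from C1 * V1 = C2 * V2 (same units for C1, C2)."""
    return final_conc * reaction_ul / stock_conc


# Program from the text: 94 C/5 min, [60 s, 60 s, 90 s] x 34 cycles, 72 C/5 min.
total_s = cycling_time_seconds(5 * 60, (60, 60, 90), 34, 5 * 60)
print(total_s / 60)  # 129.0 minutes, excluding ramp times

REACTION_UL = 30  # reaction volume stated in the text

# HYPOTHETICAL stock concentrations (not given in the source).
mgcl2_ul = stock_volume_ul(2.5, 25, REACTION_UL)      # mM final, mM stock
dntp_ul = stock_volume_ul(200, 10_000, REACTION_UL)   # uM final, uM stock
primer_ul = stock_volume_ul(1, 10, REACTION_UL)       # uM final, uM stock
print(mgcl2_ul, dntp_ul, primer_ul)
```

With different stock concentrations, only the second argument to `stock_volume_ul` changes; the final concentrations and the 30 μl volume are those reported above.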
Sequences of COI and ND1 were aligned by eye in MEGA 7.0 (Kumar et al. 2016). All lacked indels and were translated into amino acids to verify that stop codons were absent and the reading frame remained consistent. The nuclear sequences had multiple indels and were initially aligned using the online version of MAFFT (Katoh et al. 2019; https://mafft.cbrc.jp/alignment/server/). In some portions of the nuclear sequences, the resulting alignments were nonsensical (but obviously so, because the indel-rich portions of the nuclear sequences were bounded by highly conserved regions that were sometimes incorrectly split by the alignment algorithm), and we corrected these manually. Highly ambiguous portions of these sequences remained, which we identified and removed using Noisy (Dress et al. 2008) with more conservative settings (Tan et al. 2015), implemented via the NGPhylogeny website (Lemoine et al. 2019; https://ngphylogeny.fr/). The final alignments still contained gaps, which we chose to retain because they can be useful for species delimitation and for elucidating deeper phylogenetic relationships (Nagy et al. 2012; Tan et al. 2015). These sequence gaps were coded using FastGap (Borchsenius 2009), with the gap codes appended to the respective nuclear sequences. Because we sought to compare newly collected samples to reference sequences already present in public databases, we adjusted new sequence fragments to maximize overlap. One of the nuclear alignments consisted of 198 positions of ITS2, 46 gap-coded positions, and the first 307 positions in 28S (hereafter referred to as ITS2), whereas the second nuclear alignment consisted of 944 positions in 28S and 44 gap-coded positions (hereafter, 28S). These nuclear alignments shared 233 bases of 28S.
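The stop-codon check on the mitochondrial alignments was done in MEGA; an equivalent check can be sketched in a few lines. This sketch assumes the invertebrate mitochondrial genetic code (NCBI translation table 5), in which TAA and TAG are the only stop codons; the source does not state which code table was used, and the function name and example sequences are hypothetical.

```python
# Stop codons under NCBI translation table 5 (invertebrate mitochondrial).
# ASSUMPTION: the source does not name the genetic code used in MEGA.
STOP_CODONS = {"TAA", "TAG"}


def find_stop_codons(seq, frame=0):
    """Return 0-based nucleotide positions of in-frame stop codons.

    Codons containing gaps or ambiguity codes never match and are skipped;
    a trailing partial codon is ignored.
    """
    seq = seq.upper()
    hits = []
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            hits.append(i)
    return hits


# Hypothetical sequences, not taken from the dataset.
# TGA and AGA are not stops in table 5, so only the final TAA is reported.
print(find_stop_codons("ATGTGAAGATAA"))
print(find_stop_codons("ATGAAATTT"))
```

Because the reading frame of a given alignment is not stated here, a practical check would call the function with `frame` set to 0, 1, and 2 and look for the one frame in which no sequence returns a hit.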
