Data from: Comparative species delimitation of a biological conservation icon
Data files
Jan 27, 2025 version (files total 3.80 GB)
- all_pairwise_Nei_function.R (3.10 KB)
- BPP_MCMCresults_and_GDIcalcs.zip (587.49 MB)
- bpp.ctl (586 B)
- BPPSetPriors.zip (48.41 KB)
- Fst_Nei_descriptive_localityNames.zip (18.35 KB)
- IQTree.slurm (399 B)
- meristic_data.zip (23.85 KB)
- pairwise_fst_hierfstat_function.R (3.05 KB)
- phylogenies.zip (65.44 KB)
- README.md (8.53 KB)
- sequence_alignments.zip (3.08 GB)
- snmf.zip (205.93 KB)
- VCF.zip (129.34 MB)
Abstract
The conservation of biodiversity relies on an accurate assessment of species diversity. We developed a comparative approach to species delimitation that integrates genomic and morphological data for objective assessment of the distinctiveness of species targeted for protection by governmental agencies. We apply this protocol to the Snail Darter (Percina tanasi), a freshwater fish from the Tennessee River that was the focus of the first major legal conflict over protections afforded by the United States Endangered Species Act. Here, we demonstrate the Snail Darter is not a distinct species but is a population of the Stargazing Darter (Percina uranidea) described in 1886. These results illustrate how the integration of multiple lines of evidence in a comparative framework is imperative for properly directing efforts to protect species. This dataset corresponds to the 2025 study in Current Biology titled "Comparative species delimitation of a biological conservation icon."
README: Data from: Comparative species delimitation of a biological conservation icon
https://doi.org/10.5061/dryad.s4mw6m9c9
Refer to Table S1 for Clone Codes that serve as a key to sample names. The Clone Code for a given tissue sample / DNA extraction comprises the abbreviation for its species name and a qualifier, almost always in the form of one or two letters (e.g. EuniB is the second Etheostoma uniporum DNA extraction in Yale's collection, and EuniAA would be the 27th). All files are named with one of the following codes corresponding to sister species pairs:
- EfraEuni = Etheostoma fragi - E. uniporum;
- EsmiEstr = Etheostoma smithi - E. striatulum;
- EatrEsim = Etheostoma atripinne - E. simoterum;
- NmclNsgfNstn = Nothonotus microlepidus - N. sanguifluus - N. starnesi;
- Nden-Ntip = Nothonotus denoncourti - N. tippecanoe;
- NbelNcam = Nothonotus bellus - N. camurus;
- PausPkat = Percina austroperca - P. kathae;
- PapnPbur = Percina apina - P. burtoni;
- PcymPsti = Percina cymatotaenia - P. stictogaster;
- PmcePwil = Percina macrocephala - P. williamsi;
- Ppal = Percina palmaris - P. cf. palmaris;
- PsipPtal = Percina sipsi - P. smithvanizi;
- PtanPura = Percina tanasi - P. uranidea
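The Clone Code convention described above (species abbreviation plus a letter qualifier) can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the four-character abbreviation pattern is an assumption based on the EuniB example. Table S1 remains the authoritative key.

```python
import re


def qualifier_to_ordinal(qualifier: str) -> int:
    """Convert a letter qualifier (A, B, ..., Z, AA, AB, ...) to a 1-based ordinal."""
    n = 0
    for ch in qualifier.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"non-letter qualifier: {qualifier!r}")
        # Bijective base-26: A=1 ... Z=26, AA=27, AB=28, ...
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def split_clone_code(code: str) -> tuple[str, str]:
    """Split a Clone Code into (species abbreviation, qualifier).

    Assumption: the abbreviation is a genus initial plus three lowercase
    letters, and the qualifier is one or two capital letters.
    """
    match = re.fullmatch(r"([A-Z][a-z]{3})([A-Z]{1,2})", code)
    if match is None:
        raise ValueError(f"unrecognized Clone Code: {code!r}")
    return match.group(1), match.group(2)
```

For example, `split_clone_code("EuniAA")` gives the abbreviation `Euni` and the qualifier `AA`, which `qualifier_to_ordinal` maps to 27.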
meristic_data.zip
Contains raw meristic count data tables and specimen information used for the principal component analyses (PCAs). “nd” stands for “no data”; A1 often has no data because it is almost always 2 and was not included in the PCAs. The column headers are defined as follows:
Catalog - this is the catalog number (institutional codes follow Sabaj 2020)
Individual - this denotes the exact individual
Drainage - the river drainage the fish was collected from
Sex - the sex of the fish, if it was possible to determine
SL - standard length (in mm; not included in most datasets because not used in analyses)
LL - lateral line scales
PoreLL - pored lateral-line scales
AbLL - scale rows above the lateral line
BlwLL - scale rows below the lateral line
Trans - transverse scale rows
CD - scales around the caudal peduncle
D1 - dorsal-fin spines
D2 - dorsal-fin rays
P1 - pectoral-fin rays
CD1 - caudal-fin rays
A1 - anal-fin spines (not included in all datasets because not used in analyses)
A2 - anal-fin rays
Historical Discoveries and Introductions
chronology_of_records.png (Zenodo)
Map of the Tennessee River drainage showing the chronology of Snail Darter discoveries. Circles indicate natural populations discovered between 1973 and 1983. Squares represent localities where the Tennessee Valley Authority (TVA) introduced Snail Darters; N is the number of individuals introduced. Triangles represent collections made between 2016 and 2022 during sampling efforts prompted by the 2015 discovery of Snail Darters in Bear Creek and the Elk River.
Phylogenetic Analysis
sequence_alignments.zip
Sequence alignments of double-digest restriction-site-associated DNA (ddRAD) loci generated by ipyrad v.9.5.0 and used to infer the maximum likelihood trees in phylogenies.zip. Details for these alignments are found in the final column of Table S2. Files with the label “m20,” “m50” or “m80” contain loci shared by 20%, 50% or 80% of all samples, respectively.
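As a rough illustration of what the “m20,” “m50” and “m80” labels mean, the sketch below keeps only loci present in at least a given fraction of samples. It is not the authors' workflow: the threshold was applied by ipyrad itself during assembly, and the function name and input layout here are hypothetical.

```python
def loci_shared_by(locus_samples: dict[str, set[str]],
                   n_samples: int,
                   min_fraction: float) -> list[str]:
    """Return names of loci recovered in at least `min_fraction` of all samples.

    locus_samples maps a locus name to the set of samples in which it was
    recovered (a hypothetical layout chosen for this example).
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return sorted(
        locus
        for locus, samples in locus_samples.items()
        if len(samples) / n_samples >= min_fraction
    )
```

With five samples, a locus recovered in four of them passes the 0.2, 0.5 and 0.8 thresholds, while a locus recovered in two passes only the 0.2 threshold.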
phylogenies.zip
Zipped file holding all the phylogenetic trees inferred using IQ-TREE v1.7 for this study. Files with the label “m20,” “m50” or “m80” were inferred with an alignment file of loci shared by 20%, 50% or 80% of all samples, respectively. All trees were inferred using a GTR+G model of molecular evolution, except for PtanPura_m50_modelFinderPlus.treefile, which was inferred using the best-fit model and served as the basis of Fig. S1B.
IQTree.slurm
Slurm batch script containing the IQ-TREE commands used to infer the maximum likelihood trees in this study.
Population Genetics
VCF.zip
Variant call format (VCF) files of variable sites generated by ipyrad, comprising genomic sites shared by 80% of the ingroup individuals (“m80”), and the same files after filtering with VCFtools v.0.1.16 to retain only biallelic sites where <15% of individuals are missing a genotype. The filtered files were used for all the population genetic analyses (i.e., all analyses except the meristic principal component analyses, the phylogenetic analyses, and the inference of genealogical divergence indices); their names end with the suffix “2alleles_mac2_miss15p.vcf” to reflect the additional filtering performed with VCFtools.
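The two filtering criteria stated above (biallelic sites, fewer than 15% of individuals missing a genotype) can be sketched in plain Python as follows. This is not the authors' VCFtools command, and the function names are hypothetical. The “mac2” part of the file suffix is not implemented because this README does not describe that filter.

```python
def keep_site(vcf_line: str, max_missing: float = 0.15) -> bool:
    """Return True if a VCF data line is biallelic and has < max_missing missing genotypes."""
    fields = vcf_line.rstrip("\n").split("\t")
    alt = fields[4]
    # Biallelic: exactly one alternate allele (no comma-separated list, not ".").
    if alt == "." or "," in alt:
        return False
    # Sample columns start at index 9; the genotype is the first colon-separated entry.
    genotypes = [sample.split(":")[0] for sample in fields[9:]]
    if not genotypes:
        return False
    missing = sum(1 for gt in genotypes if "." in gt)
    return missing / len(genotypes) < max_missing


def filter_vcf(lines):
    """Yield header lines unchanged and only those data lines that pass keep_site."""
    for line in lines:
        if line.startswith("#") or keep_site(line):
            yield line
```

A typical use would be to pass an open file handle to `filter_vcf` and write the yielded lines to a new file.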
Fst_Nei_descriptive_localityNames.zip
Compressed folder containing all the files with locality names, used in the R scripts to estimate measures of genetic distance.
pairwise_fst_hierfstat_function.R
R script used to estimate Weir and Cockerham’s pairwise fixation index (Fst) for each sister species pair.
all_pairwise_Nei_function.R
R script used to estimate Nei’s genetic distance for each sister species pair.
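For readers unfamiliar with the statistic, the sketch below computes Nei's (1972) standard genetic distance, D = -ln(Jxy / sqrt(Jx * Jy)), from per-locus allele frequencies. This README does not state which of Nei's distances the R script computes, so the choice of the standard distance is an assumption, and the function name and input layout are hypothetical.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's (1972) standard genetic distance between populations X and Y.

    freqs_x and freqs_y are lists with one entry per locus; each entry is a
    list of allele frequencies, in the same allele order for both populations.
    """
    if not freqs_x or len(freqs_x) != len(freqs_y):
        raise ValueError("both populations need the same non-zero number of loci")
    jx = jy = jxy = 0.0
    for px, py in zip(freqs_x, freqs_y):
        jx += sum(p * p for p in px)                   # homozygosity in X
        jy += sum(q * q for q in py)                   # homozygosity in Y
        jxy += sum(p * q for p, q in zip(px, py))      # shared identity
    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    # Sums are used instead of means; the locus count cancels in the ratio.
    return -math.log(jxy / math.sqrt(jx * jy))
```

Identical allele frequencies give a distance of zero, and populations that share no alleles give an infinite distance.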
snmf.zip
Population structure plots resulting from the snmf analyses assuming K=2, used as the basis of Fig. S3 and the additional results in snmf_results.pdf.
snmf_results.pdf (Zenodo)
Ancestry coefficients inferred for individual Snail Darter and Stargazing Darter samples in the analysis of population structure. The results presented here are based on the same data matrix described in Table S2. Each vertical bar represents an individual colored by the percentage of sites that support the individual’s derivation from a given ancestral population. (A) Ancestry coefficients assuming K = 2 ancestral populations. (B) Ancestry coefficients assuming K = 3 ancestral populations. (C) Estimates of cross-entropy support for a given K number of ancestral populations among the Snail Darter and Stargazing Darter samples.
Genealogical Divergence Index Estimation
Files used to run the gdiPipeline at https://github.com/dmacguigan/gdiPipeline for each sister species pair, as well as the outputs used to generate the distributions of gdi estimates.
bpp.ctl
Control file needed to run BPP, as prepared by the gdiPipeline. The example provided is the control file for the Percina tanasi – P. uranidea analysis shown in Fig. 4B (Model 1). This file reflects the MCMC settings used for all the sister species pairs.
BPP_MCMCresults_and_GDIcalcs.zip
Results of all the BPP analyses assuming models where sister species pairs are divided into two lineages (except for the Snail Darter – Stargazing Darter pairing). For each analysis, the results of all BPP runs were combined into a single spreadsheet (the first sheet in each file), after removing a burn-in of 500,000 generations from each run. GDI summary calculations for plotting can be found highlighted at the bottom of this first sheet. In each document, the remaining sheets reflect the MCMC results for each individual run (after removing the burn-in).
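The genealogical divergence index is commonly computed from BPP posterior samples as gdi = 1 - exp(-2τ/θ), where τ is the divergence time and θ is the population size parameter of the focal lineage. The sketch below applies that formula to each retained MCMC sample. It is not the gdiPipeline code: the function names and dictionary keys are hypothetical, and the number of rows that corresponds to a 500,000-generation burn-in depends on the sampling frequency set in the control file.

```python
import math


def gdi(tau: float, theta: float) -> float:
    """Genealogical divergence index for one posterior sample."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return 1.0 - math.exp(-2.0 * tau / theta)


def gdi_posterior(samples, burnin_rows, tau_key="tau", theta_key="theta"):
    """Drop the first `burnin_rows` samples and return gdi for the rest.

    samples is a list of dicts, one per MCMC sample; the key names are
    placeholders and must be matched to the actual column headers.
    """
    kept = samples[burnin_rows:]
    return [gdi(row[tau_key], row[theta_key]) for row in kept]
```

Summaries such as the mean or a credible interval can then be taken over the returned list.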
BPPSetPriors.zip
Compressed file containing the three R scripts that are run before the GDI pipeline to set biologically reasonable priors (listed in BPP_priors.xlsx). The scripts use the Snail Darter – Stargazing Darter analysis represented in Fig. 4B as an example.
BPP_priors.xlsx
Spreadsheet with all the priors needed to run the gdiPipeline, as estimated by the three R scripts in this folder. Each sheet corresponds to a different BPP analysis (i.e. a different sister species pair or topology for Ptan-Pura).
ddRAD_PairwiseDists.R
Example script to calculate the proportion of sites that differ between two samples, each representing one species in a species pair (used in the calculation of the tau_beta prior in BPP_priorDists.R). For each sister species pair, the samples comprising the phylip file used for this code are noted in BPP_priors.xlsx.
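The quantity computed here, the proportion of differing sites between two aligned sequences, can be sketched in Python as below. This is not a translation of the R script: the function name is hypothetical, and skipping gaps, Ns and ambiguity codes is an assumption about how such sites are handled.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites at which two aligned sequences differ.

    Only sites where both sequences have an unambiguous base (A, C, G or T)
    are compared; all other sites are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared
```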
runPopGenome.R
Example script to use the PopGenome package for the estimation of within-population nucleotide diversity (used in the calculation of the theta_beta prior in BPP_priorDists.R). For each sister species pair, the samples comprising the phylip file used for this code are noted in BPP_priors.xlsx.
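Within-population nucleotide diversity is the average proportion of differing sites over all pairs of sequences sampled from a population. The sketch below computes it directly from aligned sequences. It does not reproduce PopGenome's implementation: the function name is hypothetical, and skipping gaps and ambiguous bases is an assumption.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean pairwise proportion of differing sites among aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    total = 0.0
    n_pairs = 0
    for a, b in combinations(seqs, 2):
        # Compare only sites where both sequences have an unambiguous base.
        sites = [(x, y) for x, y in zip(a.upper(), b.upper())
                 if x in valid and y in valid]
        if not sites:
            continue
        total += sum(1 for x, y in sites if x != y) / len(sites)
        n_pairs += 1
    if n_pairs == 0:
        raise ValueError("no comparable sites in any pair")
    return total / n_pairs
```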
BPP_priorDists.R
Script to estimate and examine the theta_beta and tau_beta priors for BPP. For each sister species pair, the samples comprising the phylip file used for this code are noted in BPP_priors.xlsx.