Data from: Genomic and phenotypic delimitation of species in a temperate aquatic biodiversity hotspot
Data files
Version: Nov 24, 2025 (1.78 GB total)
- BPP.tar: 11.15 MB
- FEEMS.tar: 7.98 MB
- ipyrad.tar: 17.92 KB
- IQTree.tar: 726.35 MB
- LEA_SNMF.tar: 5.12 KB
- PhenoDelimit.tar: 403.97 KB
- README.md: 4.45 KB
- VCFs.tar: 1.03 GB
Abstract
Biologists have relied on morphological characteristics to identify, define, and formally describe species for the past 250 years. The advent of phylogenetic species concepts and the introduction of molecular data have spawned new species delimitation methods applicable to a wide range of eukaryotic lineages. However, these approaches heavily emphasize genomic data, often overlooking phenotypic traits. We present and implement a species delimitation approach that utilizes genome-wide markers from ddRAD-seq and meristic morphological traits, which have long been used to identify and delineate fish species. Our methodology employs unsupervised machine learning to analyze morphological data without a priori species assignments, allowing phenotypic patterns to emerge independently from genomic-based species delimitation. We apply our combined genomic and phenotypic methodology to the freshwater systems of Southeastern North America, a biodiversity hotspot where conservation efforts are hampered by an incomplete knowledge of species diversity. Our investigation focuses on the darter clade Allohistium, a threatened lineage comprising two described species. Through phylogenomic, population genetic, and phenotypic model comparisons, we provide evidence supporting the delimitation of a third species of Allohistium, which we formally describe. Our approach shows how unsupervised machine learning can reveal cryptic morphological diversity that might otherwise be obscured by taxonomic preconceptions. This study demonstrates that model testing using diverse lines of evidence yields a more comprehensive, data-driven hypothesis of species diversity.
Dryad DOI: https://doi.org/10.5061/dryad.r4xgxd2q4
ipyrad.tar
ipyrad parameters for ddRAD assembly
- params-Allohistium_ref.txt: ipyrad parameter file used to assemble the dataset with no missing-data filtering
- params-Allohistium_ref_m80p.txt: ipyrad parameter file used to assemble the dataset with missing-data filtering at a min_samples_locus threshold of 80 samples (at least 80% of samples per locus)
- params-Allohistium_ref_m90p.txt: ipyrad parameter file used to assemble the dataset with missing-data filtering at a min_samples_locus threshold of 90 samples (at least 90% of samples per locus)
- params-Allohistium_ref_m95p.txt: ipyrad parameter file used to assemble the dataset with missing-data filtering at a min_samples_locus threshold of 95 samples (at least 95% of samples per locus)
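To confirm which locus-coverage threshold a given parameter file applies, the min_samples_locus value can be read programmatically. A minimal sketch, assuming the usual ipyrad params layout (a value followed by `## [index] [name]: description`); the sample text is a hypothetical excerpt, not copied from the archived files.

```python
import re

# Assumed layout of one ipyrad params line:
#   "<value>   ## [<index>] [<name>]: <description>"
PARAM_LINE = re.compile(
    r"^\s*(?P<value>.*?)\s*##\s*\[(?P<index>\d+)\]\s*\[(?P<name>[^\]]+)\]"
)


def parse_params(text):
    """Map each ipyrad parameter name to its raw (string) value."""
    params = {}
    for line in text.splitlines():
        match = PARAM_LINE.match(line)
        if match:  # banner lines without "## [n] [name]" are skipped
            params[match.group("name")] = match.group("value")
    return params


# Hypothetical excerpt; only the 80-sample threshold comes from this README.
SAMPLE = """\
------- ipyrad params file (v.0.9.x)-------------------------------------------
Allohistium_ref_m80p            ## [0] [assembly_name]: Assembly name
80                              ## [21] [min_samples_locus]: Min # samples per locus for output
"""

params = parse_params(SAMPLE)
min_samples = int(params["min_samples_locus"])
```

With a real file, pass `open("params-Allohistium_ref_m80p.txt").read()` to `parse_params`.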
IQTree.tar
Data, scripts, and results of IQ-TREE analyses
- m80p
  - Allohistium_ref_m80p.phy: PHYLIP-formatted concatenated ddRAD alignment, minimum 80% of samples per locus
  - runIQTree.sh: script used to run IQ-TREE analyses
  - Allohistium_ref_m80p.phy.*: IQ-TREE results; see the IQ-TREE documentation for details
- m90p
  - Allohistium_ref_m90p.phy: PHYLIP-formatted concatenated ddRAD alignment, minimum 90% of samples per locus
  - runIQTree.sh: script used to run IQ-TREE analyses
  - Allohistium_ref_m90p.phy.*: IQ-TREE results; see the IQ-TREE documentation for details
- m95p
  - Allohistium_ref_m95p.phy: PHYLIP-formatted concatenated ddRAD alignment, minimum 95% of samples per locus
  - runIQTree.sh: script used to run IQ-TREE analyses
  - Allohistium_ref_m95p.phy.*: IQ-TREE results; see the IQ-TREE documentation for details
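Because the three alignments differ only in the minimum fraction of samples per locus, one quick comparison is to read each PHYLIP header (number of taxa and sites) and tally per-sample missing data. A minimal sketch, assuming the .phy files are relaxed sequential PHYLIP with each taxon on a single line; it is independent of runIQTree.sh, and the function names and toy alignment are hypothetical.

```python
def parse_phylip(lines):
    """Parse a relaxed, sequential PHYLIP alignment (one taxon per line).

    Returns (ntax, nchar, {taxon_name: sequence}) after checking both header counts.
    """
    rows = [ln.strip() for ln in lines if ln.strip()]
    ntax, nchar = (int(x) for x in rows[0].split()[:2])
    seqs = {}
    for row in rows[1:]:
        name, seq = row.split(None, 1)
        seqs[name] = seq.replace(" ", "")
    if len(seqs) != ntax:
        raise ValueError(f"header says {ntax} taxa, found {len(seqs)}")
    for name, seq in seqs.items():
        if len(seq) != nchar:
            raise ValueError(f"{name}: header says {nchar} sites, found {len(seq)}")
    return ntax, nchar, seqs


def missing_fraction(seq, missing="N-?"):
    """Fraction of sites coded N, gap, or ?; other IUPAC ambiguity codes are not counted."""
    seq = seq.upper()
    return sum(ch in missing for ch in seq) / len(seq)


# Hypothetical toy alignment; with the real data, pass an open file handle instead:
#   with open("Allohistium_ref_m80p.phy") as fh:
#       ntax, nchar, seqs = parse_phylip(fh)
toy = ["3 8", "sampleA ACGTNNAC", "sampleB ACGT--AC", "sampleC ACGTACAC"]
ntax, nchar, seqs = parse_phylip(toy)
per_taxon_missing = {name: missing_fraction(s) for name, s in seqs.items()}
```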
BPP.tar
Data and scripts used to run BPP on the ddRAD data and calculate genealogical divergence index (GDI) values. See https://github.com/dmacguigan/gdiPipeline for more information.
- Allo_GDIPipeline.R: R script to set up BPP runs and calculate GDI
- Allo_bpp_loci.txt: alignments of ddRAD loci used for BPP analyses
- Allo.tree: species guide tree for nested BPP analyses
- Allo.Imap.txt: map of taxa to tips in guide tree
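The genealogical divergence index is conventionally computed as gdi = 1 − exp(−2τ/θ), where τ is the BPP divergence time between populations A and B and θ is the BPP population size parameter of A; values below about 0.2 are commonly read as suggesting a single species and values above about 0.7 as suggesting distinct species. The sketch below only illustrates that formula on paired MCMC draws; it is not a port of Allo_GDIPipeline.R, which may summarize the posterior differently, and the draws are hypothetical.

```python
import math
from statistics import median


def gdi(tau, theta):
    """Genealogical divergence index for population A relative to sister B.

    tau:   BPP divergence time of A and B (expected substitutions per site)
    theta: BPP population size parameter of A (4 * Ne * mu)
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    return 1.0 - math.exp(-2.0 * tau / theta)


def gdi_posterior(taus, thetas):
    """Apply gdi() to paired MCMC samples and return (median, all values)."""
    values = [gdi(t, th) for t, th in zip(taus, thetas)]
    return median(values), values


# Hypothetical MCMC draws, not results from this study
med, values = gdi_posterior([0.004, 0.005, 0.006], [0.010, 0.010, 0.010])
```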
VCFs.tar
Script and data used to filter ddRAD SNPs
- VCFToolsPipeline.sh: script used to filter ddRAD SNPs for FEEMS and LEA analyses
- Allohistium.vcf: original ddRAD data, outgroups excluded
- Allohistium.m95p.unlinked.vcf: filtered ddRAD data
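A quick way to inspect either VCF is to count samples, SNPs, distinct loci (the CHROM column), and the fraction of missing genotype calls. If, as the file name suggests, the filtered file retains one SNP per ddRAD locus, "sites" and "loci" should be equal; that reading of "unlinked" is an assumption, and VCFToolsPipeline.sh remains the authoritative record of the filters applied. The toy records below are hypothetical.

```python
def vcf_summary(lines):
    """Count samples, SNPs, distinct loci (CHROM) and missing genotype calls."""
    samples, n_sites, n_missing, loci = [], 0, 0, set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow FORMAT
            continue
        n_sites += 1
        loci.add(fields[0])
        for call in fields[9:]:
            genotype = call.split(":", 1)[0]  # GT, when present, is the first FORMAT key
            if "." in genotype:
                n_missing += 1
    total_calls = n_sites * len(samples)
    return {
        "samples": samples,
        "sites": n_sites,
        "loci": len(loci),
        "missing_rate": n_missing / total_calls if total_calls else 0.0,
    }


# Hypothetical two-sample VCF; for the real file use:
#   with open("Allohistium.m95p.unlinked.vcf") as fh:
#       print(vcf_summary(fh))
TOY = """##fileformat=VCFv4.0
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2
RAD_1\t5\t.\tA\tG\t13\t.\t.\tGT\t0/1\t./.
RAD_2\t9\t.\tC\tT\t13\t.\t.\tGT\t1/1\t0/0
"""
summary = vcf_summary(TOY.splitlines())
```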
FEEMS.tar
Script and data for FEEMS analysis
- Allo_feems.ipynb: Jupyter notebook used to run FEEMS
- Allo.boundary.txt: polygon boundary for the FEEMS network
- Allo.coord.txt: sample coordinates, in the same order as the samples in Allohistium.m95p.unlinked.vcf
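Since the coordinate rows follow the VCF sample order, pairing is positional, and a length mismatch would silently misplace samples on the FEEMS grid. A minimal check, assuming Allo.coord.txt holds two whitespace-separated numeric columns (longitude, latitude) with no header; the helper names and toy coordinates are hypothetical.

```python
def read_coords(lines):
    """Parse whitespace-delimited coordinate rows (assumed order: longitude latitude)."""
    coords = []
    for line in lines:
        if line.strip():
            lon, lat = (float(v) for v in line.split()[:2])
            coords.append((lon, lat))
    return coords


def pair_samples(vcf_samples, coords):
    """Zip VCF sample names with coordinate rows, refusing mismatched lengths."""
    if len(vcf_samples) != len(coords):
        raise ValueError(
            f"{len(vcf_samples)} VCF samples but {len(coords)} coordinate rows"
        )
    return list(zip(vcf_samples, coords))


# Hypothetical values; with real data, read Allo.coord.txt and take the
# sample names from the #CHROM header line of the VCF.
coords = read_coords(["-85.10 35.20", "-84.90 35.05"])
pairs = pair_samples(["S1", "S2"], coords)
```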
LEA_SNMF.tar
Script for LEA population structure analysis
- ddRAD_LEA.R: R script to run LEA SNMF population structure analysis, uses Allohistium.m95p.unlinked.vcf as input
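sNMF runs are usually compared across values of K by cross-entropy; in LEA these values are returned by `cross.entropy()` on a project created with `snmf(..., entropy = TRUE)`. Whether ddRAD_LEA.R selects K this way is not stated here. The sketch below illustrates only the common lowest-mean-cross-entropy rule, using hypothetical values; in practice the curve often plateaus, so the minimum is one criterion among several.

```python
from statistics import mean


def choose_k(cross_entropy):
    """Return the K with the lowest mean cross-entropy, plus all the means.

    cross_entropy maps K -> list of cross-entropy values from replicate runs.
    """
    means = {k: mean(values) for k, values in cross_entropy.items()}
    best = min(means, key=means.get)
    return best, means


# Hypothetical values for illustration only (not results from this study)
best_k, means = choose_k({1: [0.52, 0.53], 2: [0.46, 0.47], 3: [0.48, 0.49]})
```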
PhenoDelimit.tar
Scripts and data to run PhenoDelimit analyses
- Allohistium
  - phenoDelimit_Allohistium.R: R script to run PhenoDelimit
  - Allohistium_meristic_specimensIDs.csv: specimen ID data frame; see Table S3 in the Supplementary Materials for additional specimen information
  - Allohistium_meristic_traits.csv: meristic and squamation trait data; rows are in the same order as Allohistium_meristic_specimensIDs.csv
  - Allohistium_models.csv: delimitation models for PhenoDelimit; rows are in the same order as Allohistium_meristic_specimensIDs.csv
- DarterSisterSpecies: 13 subfolders, one per darter species pair, each containing:
  - phenoDelimit.R: script to run PhenoDelimit
  - specimen_IDs.csv: specimen ID data frame
  - traits.csv: meristic trait data; rows are in the same order as specimen_IDs.csv
  - models.csv: delimitation models for PhenoDelimit; rows are in the same order as specimen_IDs.csv
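Because the specimen ID, trait, and model files in each PhenoDelimit folder are linked only by row order, a row-count check before running the analysis is cheap insurance. This sketch assumes each CSV has a single header row; it compares counts only and cannot detect rows that are present but reordered. The function names and in-memory stand-ins are hypothetical.

```python
import csv
import io


def count_rows(handle):
    """Number of non-empty data rows in a CSV that has one header row."""
    reader = csv.reader(handle)
    next(reader)  # skip header
    return sum(1 for row in reader if any(cell.strip() for cell in row))


def check_row_alignment(**handles):
    """Raise if the CSVs passed as keyword arguments differ in row count."""
    counts = {name: count_rows(h) for name, h in handles.items()}
    if len(set(counts.values())) != 1:
        raise ValueError(f"row counts differ: {counts}")
    return counts


# Hypothetical stand-ins for specimen_IDs.csv, traits.csv and models.csv;
# with real files, pass open(path, newline="") handles instead.
counts = check_row_alignment(
    specimen_ids=io.StringIO("ID\nspec001\nspec002\n"),
    traits=io.StringIO("LL,P1\n45,13\n47,14\n"),
    models=io.StringIO("model1,model2\n1,1\n1,2\n"),
)
```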
Trait abbreviations in Allohistium_meristic_traits.csv and all traits.csv files
- LL: number of lateral line scales
- PoreLL: number of pored lateral line scales
- AbLL: number of scale rows above the lateral line
- BlwLL: number of scale rows below the lateral line
- Trans: number of transverse scale rows
- CD: number of circum-caudal-peduncular scale rows
- D1: number of first dorsal fin spines
- D2: number of second dorsal fin rays
- A1: number of anal fin spines
- A2: number of anal fin rays
- P1: number of pectoral fin rays
- Nape: percentage of nape squamation
- Cheek: percentage of cheek squamation
- Opercle: percentage of opercular squamation
- Breast: percentage of breast squamation
- Belly: percentage of belly squamation
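The abbreviations above imply simple range constraints: the first eleven traits are counts and the last five are percentages. A hypothetical sanity check along those lines is sketched below; it assumes missing values are coded as empty cells or "NA", and if any counts were averaged (for example across body sides), the integer check should be relaxed. Columns absent from a file, such as squamation traits in a traits.csv with meristic data only, are simply skipped.

```python
import csv
import io

COUNT_TRAITS = ["LL", "PoreLL", "AbLL", "BlwLL", "Trans", "CD",
                "D1", "D2", "A1", "A2", "P1"]
PERCENT_TRAITS = ["Nape", "Cheek", "Opercle", "Breast", "Belly"]
MISSING = {"", "NA"}  # assumed missing-value codes


def validate_traits(rows):
    """Flag non-integer or negative counts and percentages outside 0-100.

    rows: iterable of dicts (e.g. csv.DictReader); absent columns are skipped.
    Returns a list of (row_number, column, raw_value) problems.
    """
    problems = []
    for i, row in enumerate(rows, start=1):
        for col in COUNT_TRAITS + PERCENT_TRAITS:
            raw = (row.get(col) or "").strip()
            if raw in MISSING:
                continue
            try:
                value = float(raw)
            except ValueError:
                problems.append((i, col, raw))  # non-numeric entry
                continue
            if col in COUNT_TRAITS:
                ok = value >= 0 and value.is_integer()
            else:
                ok = 0 <= value <= 100
            if not ok:
                problems.append((i, col, raw))
    return problems


# Hypothetical toy table; with real data use csv.DictReader(open(path, newline=""))
toy = io.StringIO("LL,Nape\n45,50\n44.5,120\nNA,\n")
issues = validate_traits(csv.DictReader(toy))
```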
