Data and code from: Aquifer geology drives speciation in cave-adapted fishes
Data files
Jun 02, 2026 version; files total 6.24 GB
- ASTRAL_gene_trees.zip (1.24 MB)
- BEAST_input_and_output.zip (3.49 GB)
- BioGeoBears.zip (595.98 KB)
- BPP.zip (732.65 MB)
- cavefish_concat.zip (618.06 KB)
- cavefish_partition_concat.zip (11.20 MB)
- concordance_factors.zip (8.81 KB)
- mtDNA_trees.zip (66.73 KB)
- README.md (5.65 KB)
- scripts.zip (14.94 KB)
- styx_segments.zip (487.95 MB)
- subterraneus_segments.zip (1.52 GB)
Abstract
Subterranean faunas often exhibit numerous adaptations to these harsh environments. Yet, owing to the remoteness of caves, underground speciation remains poorly understood, and whether subterranean ecosystems are evolutionary dead ends has been debated since the 1800s. We use comparative genomics and computed tomography to describe a new cavefish, Typhlichthys styx sp. nov., and show that speciation and secondary contact in Typhlichthys occur across aquifer boundaries. Although minimally distinct on the basis of morphology, different cavefish lineages have dispersed using widespread aquifers contained within Carboniferous-aged limestone formations across southeastern and central North America, facilitating secondary sympatry of cavefish species last sharing common ancestry eight million years ago. Our results establish a mechanism for allopatric speciation underground dependent on subterranean geology.
Dataset DOI: 10.5061/dryad.bk3j9kdr4
Description of the data and file structure
CT scan data were collected at Yale University and segmented in the program 3DSlicer. Sequences were compiled from previously published UCE and whole-genome data.
Files and variables
File: ASTRAL_gene_trees.zip
Description: ASTRAL multispecies coalescent summary tree file, input gene trees from IQ-TREE analysis, and log file from IQ-TREE analysis.
- cave_75p_genetrees.log #log file
- cave_75p_genetrees.treefile #gene trees
- cavefish_final_out.tree #ASTRAL-III tree
File: BioGeoBears.zip
Description: Output files and plots from the BioGeoBEARS analysis.
- clado_subsetXYZ.csv #csv file for plotting types of biogeographic events from stochastic mapping. Node = node number; node.type = internal node or tip; node_ht = node height, in millions of years; time_bp = time before present; clado_event_type = type of cladogenetic event; clado_event_text = areas involved in cladogenetic biogeographic event
- comparisons #model comparisons R table files. "restable_AIC_rellike_formatted", "restable_AIC_rellike", "restable_AICc_rellike_formatted", and "restable_AICc_rellike" contain AIC and corrected AIC scores across models in both formatted and unformatted tables for convenience, "restable" contains model log-likelihoods and parameter values, and "testtable" contains chi-squared tests for comparing models, log-likelihoods, and parameter values.
- Input #input newick tree (cave.newick) and ranges file (geography.txt)
- models #model R data files. These can be accessed by reading the data file into RStudio using load("filename"). These files are generated automatically during the BioGeoBEARS run process.
- plots #output reconstruction plots
- Stochastic_mapping_output #Stochastic mapping output, including RData files, a plot of event counts, and tables containing types of events.
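As a rough illustration of how the clado_subsetXYZ.csv table described above might be summarised, the following sketch tallies rows by the clado_event_type column. The column name comes from this README; the demo rows, the event labels, and the treatment of empty or "NA" cells as missing are assumptions made for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def tally_clado_events(handle):
    """Count rows per clado_event_type, skipping rows with no event recorded."""
    counts = Counter()
    for row in csv.DictReader(handle):
        event = (row.get("clado_event_type") or "").strip()
        # Assumption: tips or nodes without an event are empty or "NA".
        if event and event != "NA":
            counts[event] += 1
    return counts


# Hypothetical rows for illustration only; not real values from the archive.
demo = io.StringIO(
    "node,node.type,node_ht,time_bp,clado_event_type,clado_event_text\n"
    "1,tip,0.0,0.0,,\n"
    '5,internal,8.1,8.1,vicariance (v),"AB->A,B"\n'
    '6,internal,3.2,3.2,sympatry (y),"A->A,A"\n'
    '7,internal,1.0,1.0,vicariance (v),"AB->A,B"\n'
)
event_counts = tally_clado_events(demo)
for event, n in event_counts.most_common():
    print(event, n)
```

To use it on the real file, pass an open file handle instead of the in-memory demo, for example `open("clado_subsetXYZ.csv", newline="")`.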
File: concordance_factors.zip
Description: Output files and plots from the concordance factor analysis in IQ-TREE.
- concordance_factors.cf.stat #stat file holding all calculated concordance factors - columns are node support values and lengths of subtending branches, rows are node numbers.
- concordance_factors.cf.tree #tree file with calculated concordance factors labeled at branches
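The .cf.stat file can be inspected outside IQ-TREE. The sketch below assumes the file is tab-delimited, that lines starting with "#" are comments, and that the first non-comment line is a header containing ID and gCF columns; check these assumptions against the actual file before relying on it. The demo rows and the threshold of 50 are hypothetical.

```python
import csv
import io


def read_cf_stat(handle):
    """Return one dict per branch from a tab-delimited table with '#' comment lines."""
    data_lines = (
        line for line in handle if line.strip() and not line.startswith("#")
    )
    return list(csv.DictReader(data_lines, delimiter="\t"))


def low_support_branches(rows, column="gCF", threshold=50.0):
    """Return the IDs of branches whose value in `column` is below `threshold`."""
    return [row["ID"] for row in rows if float(row[column]) < threshold]


# Hypothetical content for illustration only; not real values from the archive.
demo = io.StringIO(
    "# Concordance factor statistics\n"
    "ID\tgCF\tsCF\tLabel\tLength\n"
    "10\t85.2\t70.1\t100\t0.012\n"
    "11\t31.4\t40.3\t98\t0.003\n"
)
cf_rows = read_cf_stat(demo)
low_ids = low_support_branches(cf_rows)
print(low_ids)
```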
File: scripts.zip
Description: R scripts.
- Aquifer_Map_Detail.r #aquifer map plotting
- Cavefish_BioGeoBears.R #BioGeoBEARS run
- cavefishmapplot.R #map plotting
- GDI.R #Genealogical divergence index plotting
- Pairwise_Dissimilarity_and_Haplotype.R #haplotype mapping
File: mtDNA_trees.zip
Description: mtDNA sequence alignment, output trees, and output files, including:
#Mr_Bayes
- amblyopsidND2.nex.con.tree #output tree from the MrBayes analysis
#IQ-TREE
- amblyopsidND2.fas.contree #output .contree file from the IQ-TREE analysis
- amblyopsidND2.fas.iqtree #output iqtree file from the IQ-TREE analysis
- amblyopsidND2.fas.log #output log file from the IQ-TREE analysis
- amblyopsidND2.fas.treefile #output tree file from the IQ-TREE analysis
#Haplotypes_and_Sequence_Dissimilarity
- amblyopsidND2_sequence_dissimilarity.fas #input fasta file for haplotype analysis
- delimitation.csv #input delimitation file in csv format; column 1 has the sample ID in the input data, column 2 has the species ID, and no values are missing
- Haplotype_IDs.tsv #input delimitation file in tsv format; column 1 has the sample ID in the input data, column 2 has the species ID, and no values are missing
- amblyopsidND2.fas #alignment used in phylogenetic analyses
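Since delimitation.csv is described above as a two-column table (sample ID, then species ID) with no missing values, a minimal sketch for grouping samples by species could look like the following. Whether the file has a header row is not stated in this README, so the has_header switch is an assumption, and the sample and species names in the demo are hypothetical.

```python
import csv
import io
from collections import defaultdict


def samples_by_species(handle, has_header=False):
    """Map each species ID (column 2) to the list of its sample IDs (column 1)."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # discard the header row if one is present
    groups = defaultdict(list)
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        sample_id, species_id = row[0].strip(), row[1].strip()
        groups[species_id].append(sample_id)
    return dict(groups)


# Hypothetical content for illustration only; not real IDs from the archive.
demo = io.StringIO("S1,styx\nS2,styx\nS3,subterraneus\n")
groups = samples_by_species(demo)
for species, samples in groups.items():
    print(species, len(samples), samples)
```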
File: cavefish_concat.zip
Description: Output from the IQ-TREE run on the single-partition concatenated alignment.
- cave_concat_75p.log #log file
- cave_concat_75p.treefile #tree file
File: BPP.zip
Description: Analysis output from BPP for calculation of the genealogical divergence index (GDI)
- run_2_clades_subsample #T. styx as one lineage
- run_4_clades_styx_subsample #T. styx southeastern vs. all other T. styx
- run_6_clades_styx_subsample #All four major lineages of T. styx sampled
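For readers unfamiliar with the genealogical divergence index, the sketch below uses the standard definition gdi = 1 - exp(-2 * tau / theta), where tau is the population divergence time and theta is the population size parameter of the focal lineage, both as estimated by BPP. This is not a copy of GDI.R from scripts.zip; the function names and the demo values are hypothetical, and how the authors paired tau and theta samples is not described in this README.

```python
import math


def gdi(tau, theta):
    """Genealogical divergence index for one (tau, theta) pair."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return 1.0 - math.exp(-2.0 * tau / theta)


def gdi_posterior(taus, thetas):
    """Apply gdi to paired posterior samples, one value per MCMC sample."""
    return [gdi(t, th) for t, th in zip(taus, thetas)]


# Hypothetical posterior samples for illustration only.
demo_taus = [0.0010, 0.0012, 0.0008]
demo_thetas = [0.0020, 0.0020, 0.0016]
demo_gdi = gdi_posterior(demo_taus, demo_thetas)
print([round(value, 3) for value in demo_gdi])
```

A commonly used heuristic treats values below about 0.2 as consistent with a single species and values above about 0.7 as consistent with distinct species, with intermediate values considered ambiguous.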
File: cavefish_partition_concat.zip
Description: Output from the IQ-TREE run on the partitioned concatenated alignment.
- cave_partitioned_75p.log #log file
- cave_partitioned_75p.treefile #tree file
File: BEAST_input_and_output.zip
Description: BEAST input XML, output log, and output combined tree set and maximum clade credibility tree files
- run_1_logs.zip #log files from run 1
- run2_logs.zip #log files from run 2
- run3_logs.zip #log files from run 3
- Typhlichthys_all9_posterior.trees #combined posterior tree sets
- Typhlichthys_posterior.tree #maximum clade credibility tree
- xml #input xml files
File: styx_segments.zip
Description: Segmentations of Typhlichthys styx are new to this paper.
- segments #segmented skeletons and skull bones, in STL format
File: subterraneus_segments.zip
Description: Segmentations of Typhlichthys subterraneus sensu stricto are new to this paper.
- segments #segmented skeletons and skull bones, in STL format
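The STL segment files can be checked quickly without opening a 3D viewer. The sketch below assumes the files are binary STL (an 80-byte header followed by a little-endian 32-bit triangle count); this README does not say whether the files are binary or ASCII, so confirm the encoding first. The demo bytes are synthetic and the function name is hypothetical.

```python
import io
import struct


def binary_stl_triangle_count(stream):
    """Read the triangle count stored after the 80-byte header of a binary STL."""
    header = stream.read(80)
    if len(header) < 80:
        raise ValueError("file too short to be a binary STL")
    raw_count = stream.read(4)
    if len(raw_count) < 4:
        raise ValueError("missing triangle count")
    (count,) = struct.unpack("<I", raw_count)
    return count


# Synthetic two-triangle file for illustration: each triangle record is 50 bytes.
demo = io.BytesIO(b"\0" * 80 + struct.pack("<I", 2) + b"\0" * 100)
n_triangles = binary_stl_triangle_count(demo)
print(n_triangles)
```

For a real file, open it in binary mode, for example `open(path, "rb")`, where path points to one of the STL files in the segments directory.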
Code/software
R 4.5.1 and the packages listed in the scripts in scripts.zip
BEAST v. 2.6.7 # time calibration
IQ-TREE 2 #phylogenetic analysis
MrBayes v. 3.2.1 #phylogenetic analysis of the ND2 alignment
3DSlicer #CT scan segmentation
Blender #Rendering of segments
ASTRAL-III #species tree inference
Access information
Other publicly accessible locations of the data:
- NCBI Genbank
- Morphosource
Data were derived from the following sources:
- NCBI Genbank
