Data from: Ancient climate changes and relaxed selection shape cave colonization in North American cavefishes
Data files
Jun 06, 2025 version files 21.72 MB
- astral_percopsiformes.tre (1.54 KB)
- CDHit_Exonerate.sh (2.73 KB)
- gene_alignments.zip (200.72 KB)
- iqtree_percopsiformes.tre (1.74 KB)
- mrbayes_tipdating_percopsiformes.tre (1.38 KB)
- nhmmer2fasta.py (2.72 KB)
- Pam_Climatic.zip (61.06 KB)
- Percopsiformes_exons_NT_Concatenated_alignments.fasta (21.26 MB)
- Percopsiformes_fossil_coding.csv (1.38 KB)
- README.md (4.36 KB)
- Relax_Intensify_Selection.sh (620 B)
- run_macse2.edit.sh (1.88 KB)
- SuppMat1_Percopsiformes_molecular-specimens.csv (4.84 KB)
- SuppMat2_Percopsiformes.txt (2.86 KB)
- SuppMat3_Percopsiformes_genome-stats.xlsx (14.93 KB)
- SuppMat4_Percopsiformes_SelectionAnalyses_Results.xlsx (93.19 KB)
- SuppMat5_Percopsiformes_GUIS.xlsx (37.38 KB)
- SuppMat6_Percopsiformes_GURS.xlsx (17.01 KB)
- SuppMat7_Percopsiformes_GURS_Sources.txt (14.11 KB)
Abstract
Extreme environments serve as natural laboratories for studying evolutionary processes, with caves offering replicated instances of independent colonisations. The timing, mode, and genetic underpinnings of cave-obligate organismal evolution remain enigmatic. We integrate phylogenomics, fossils, paleoclimatic modeling, and newly sequenced genomes to elucidate the evolutionary history and adaptive processes of cave colonisation in the study group, the North American Amblyopsidae fishes. Amblyopsid fishes present a unique system for investigating cave evolution, encompassing surface, facultative cave-dwelling, and cave-obligate (troglomorphic) species. Using 1,105 exon markers and total-evidence dating, we reconstructed a robust phylogeny that supports the nested position of eyed, facultative cave-dwelling species within blind cavefishes. We identified three independent cave colonisations, dated to the Early Miocene (18.5 Mya), Late Miocene (10.0 Mya), and Pliocene (3.0 Mya). Evolutionary model testing supported a climate-relict hypothesis, suggesting that global cooling trends since the Early–Middle Eocene may have influenced cave colonisation. Comparative genomic analyses of 487 candidate genes revealed both relaxed and intensified selection on troglomorphy-related loci. We found more loci under relaxed selection, supporting neutral mutation as a significant mechanism in cave-obligate evolution. Our findings provide empirical support for climate-driven cave colonisation and offer insights into the complex interplay of selective pressures in extreme environments.
https://doi.org/10.5061/dryad.3bk3j9ktx
Description of the data and file structure
The dataset contains the aligned exons (fasta format) harvested from whole genome resequencing data for the dataset in Hart et al. (in review). Also included is the coding scheme for the morphological matrix used to incorporate fossil Percopsiformes taxa into the total-evidence divergence analyses.
Folders, Files, and variables
File: Percopsiformes_exons_NT_Concatenated_alignments.fasta
Description: All exons harvested from whole genome resequencing data for all specimens aligned.
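As a quick sanity check after download, the alignment can be read with a few lines of Python. The sketch below is not one of the deposited scripts. It assumes a standard FASTA layout (header lines starting with `>`, sequences possibly wrapped over several lines), and the function names `read_fasta` and `alignment_length` are hypothetical.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {sequence name: sequence}."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    # Join wrapped sequence lines once at the end to avoid repeated string copies.
    return {key: "".join(parts) for key, parts in chunks.items()}


def alignment_length(records: Dict[str, str]) -> int:
    """Return the shared sequence length; raise ValueError if lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"expected one alignment length, found {sorted(lengths)}")
    return lengths.pop()
```

Passing an open file handle for `Percopsiformes_exons_NT_Concatenated_alignments.fasta` to `read_fasta` and then calling `alignment_length` on the result would confirm that every specimen has the same number of alignment columns, which a concatenated alignment should.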
File: Percopsiformes_fossil_coding.csv
Description: Coding of fossil taxa into the morphological matrix from Armbruster et al. (2016).
Variables
- Fossil_Taxa
- Order
- Family
- Morphological characters from Armbruster et al. (2016)
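One way to load this file is sketched below. It assumes the CSV has a header row, that the three identifier columns are named exactly as listed above, and that every remaining column is a morphological character; none of this has been checked against the deposited file, and `load_fossil_coding` is a hypothetical helper name.

```python
import csv
import io
from typing import Dict, List

# Identifier columns as named in the variable list (assumed to match the header row).
ID_COLUMNS = ("Fossil_Taxa", "Order", "Family")


def load_fossil_coding(handle) -> Dict[str, List[str]]:
    """Map each fossil taxon to its character states, in column order."""
    reader = csv.DictReader(handle)
    # Every column that is not an identifier is treated as a character.
    char_cols = [c for c in (reader.fieldnames or []) if c not in ID_COLUMNS]
    return {row["Fossil_Taxa"]: [row[c] for c in char_cols] for row in reader}


# Placeholder rows for illustration only; these are not values from the dataset.
demo = io.StringIO(
    "Fossil_Taxa,Order,Family,char1,char2\n"
    "Taxon_A,Order_X,Family_X,0,1\n"
)
coding = load_fossil_coding(demo)
```

States are kept as strings so that any missing-data or polymorphism symbols in the matrix survive unchanged.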
File: astral_percopsiformes.tre
Description: File containing the Percopsiformes phylogeny estimated using ASTRAL.
File: iqtree_percopsiformes.tre
Description: File containing the Percopsiformes phylogeny estimated using IQ-TREE.
File: mrbayes_tipdating_percopsiformes.tre
Description: File containing the tip-dated Percopsiformes phylogeny estimated using MrBayes.
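To check that the three trees share the same taxa, tip labels can be pulled out with the standard library alone. This is a rough sketch that assumes each tree is available as a plain Newick string with unquoted labels; if a file is in NEXUS format, the tree statement would need to be extracted first. The function name `tip_labels` is hypothetical, and a dedicated phylogenetics library would be more robust.

```python
import re
from typing import List


def tip_labels(newick: str) -> List[str]:
    """Return tip labels from a Newick string, in order of appearance."""
    # Drop bracketed comments/annotations such as [&prob=1.0].
    cleaned = re.sub(r"\[[^\]]*\]", "", newick)
    # A tip label directly follows "(" or ","; internal-node labels and
    # support values follow ")" and are therefore not matched.
    return re.findall(r"[(,]\s*([^\s(),:;]+)", cleaned)
```

Comparing `set(tip_labels(...))` across the ASTRAL, IQ-TREE, and MrBayes trees would show which tips differ; the tip-dated tree is expected to carry additional fossil tips.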
Folder: gene_alignments.zip
Description: Folder containing all gene alignments used in this study in fasta format.
File: nhmmer2fasta.py
Description: Python script used to concatenate nhmmer output and convert it to fasta.
File: CDHit_Exonerate.sh
Description: Bash script used to run CD-HIT and Exonerate on the prealignments.
File: run_macse2.edit.sh
Description: Bash script used to align our sequences using MACSE.
File: Relax_Intensify_Selection.sh
Description: Bash script used to test for relaxed and intensified selection.
File: SuppMat1_Percopsiformes_molecular-specimens.csv
Description: Tissue specimens used for whole-genome shotgun resequencing and their associated BioProject numbers.
File: SuppMat2_Percopsiformes.txt
Description: Fossil calibration age priors and integration of fossil and extant species.
File: SuppMat3_Percopsiformes_genome-stats.xlsx
Description: Genome statistics for the Percopsiformes used in this study.
File: SuppMat4_Percopsiformes_SelectionAnalyses_Results.xlsx
Description: Results for the positive selection analyses performed for this study.
File: SuppMat5_Percopsiformes_GUIS.xlsx
Description: Genes found to be under intensified selection.
File: SuppMat6_Percopsiformes_GURS.xlsx
Description: Genes found to be under relaxed selection.
File: SuppMat7_Percopsiformes_GURS_Sources.txt
Description: Sources used to determine the function for genes under relaxed selection.
Folder: Pam_Climatic.zip
Description: Folder containing all data needed to replicate the paleoclimatic analyses used for this study. Folder contents include the R history file (.Rhistory), a text file containing the trait data (Data_w_Out.txt), an Excel file containing the cave state for our taxa (pformes_cave-state_table) needed to run our analysis, a tree file containing the phylogeny (Percopsiformes_TipDating_MrBayes.tre), the different scales used for the analyses (Scaled_Scotese_Deep.csv, Scaled_Scotese_GAT.csv, Scaled_Scotese_Polar.csv, Scaled_Scotese_Tropical.csv), and the results (Results.Pam). An additional subfolder (Faculative_Alternate) contains the facultative alternate scheme. This subfolder contains the two functions used in all of our analyses (BM_OU_Lambda_ML_gauss.r and Climate_ML_LL_Gauss_ME.r), the submission script (Clim_Pam.sh), a text file containing the trait data (Faculative_Data_w_Out), a newick file containing the phylogeny (Newick_Pam_tree.nwk), the R script used for the analyses (Pam_Climatic.R), raw error and output files, result files (Results_Pam.csv), the different scales used for the analyses (Scaled_Scotese_GAT.csv, Scaled_Scotese_Tropical.csv), and the screen log outputs from R, classified as either output (r_output_21248717) or error (r_error_21248717), to provide a complete and transparent view of the analysis.
We performed short-read shotgun genome sequencing on Percopsiformes fishes, then harvested exons using the FishLifeExonHarvesting pipeline of Hughes et al. (2018). Exon markers were used to estimate a phylogeny, and fossil taxa were then added as tips to time-calibrate the tree.
