Reconstructing the phylogeny and evolutionary history of freshwater fishes (Nemacheilidae) across Eurasia since early Eocene
Data files
Mar 05, 2025 version files 260.20 KB
- Nemacheilidae_ASTRAL.tre (30.96 KB)
- Nemacheilidae_fulldataset_ML.tre (52.75 KB)
- Nemacheilidae_fulldataset_MrBayes.tre (52.50 KB)
- Nemacheilidae_reduceddataset_ML.tre (40.72 KB)
- Nemacheilidae_reduceddataset_MrBayes.tre (40.33 KB)
- Nemacheilidae_Timetree_BEAST2.tre (36.02 KB)
- README.md (6.92 KB)
Abstract
Eurasia has undergone substantial tectonic, geological, and climatic changes throughout the Cenozoic, primarily associated with tectonic plate collisions and a global cooling trend. The evolution of present-day biodiversity unfolded in this dynamic environment, characterised by intricate interactions of abiotic factors. However, comprehensive, large-scale reconstructions illustrating the extent of these influences are lacking. We reconstructed the evolutionary history of the freshwater fish family Nemacheilidae across Eurasia and spanning most of the Cenozoic on the basis of 471 specimens representing 279 species and 37 genera. Molecular phylogeny using six genes uncovered six major clades within the family, along with numerous unresolved taxonomic issues. Dating of cladogenetic events and ancestral range estimation traced the origin of Nemacheilidae to Indochina around 48 million years ago. Subsequently, one branch of Nemacheilidae colonised eastern, central, and northern Asia, as well as Europe, while another branch expanded into the Burmese region, the Indian subcontinent, the Near East, and northeast Africa. These expansions were facilitated by tectonic connections, favourable climatic conditions, and orogenic processes. Conversely, aridification emerged as the primary cause of extinction events. Our study marks the first comprehensive reconstruction of the evolution of Eurasian freshwater biodiversity on a continental scale and across deep geological time.
https://doi.org/10.5061/dryad.rxwdbrvhz
Description of the data and file structure
471 samples of more than 250 species from 37 genera of the family Nemacheilidae were examined, including the set of sequences of 364 specimens analysed in our laboratory and sequences of 107 specimens obtained from GenBank. Tissue samples used for the present study were fixed and stored in 96% ethanol. For more details about species, geographical origin and GenBank accession numbers see Supplementary material Table S1.
We sequenced one mitochondrial gene (cytochrome b) and five nuclear genes (RAG1, IRBP2, MYH6, RH1 and EGR3).
Chromatograms were checked and assembled in the SeqMan II module of the DNA Star software package (LASERGENE). Single-gene alignments were produced in BioEdit (Hall, 1999) using the ClustalW multiple alignment algorithm (Larkin et al., 2007). The datasets were concatenated in PhyloSuite v1.2.2 (Zhang et al., 2020).
The best-fit substitution models and partitioning schemes were estimated using PartitionFinder 2 (Lanfear et al., 2016) as implemented in PhyloSuite v1.2.2, based on the corrected Akaike Information Criterion (AICc).
Phylogenetic trees for the 492-specimen dataset were inferred from the concatenated six-locus alignment using maximum likelihood (ML) and Bayesian inference (BI). The ML analyses were performed using IQ-TREE (Nguyen et al., 2015) as implemented in PhyloSuite. The best-fit evolutionary model for each codon partition was determined automatically by ModelFinder (Kalyaanamoorthy et al., 2017). Node support values were obtained with 5000 ultrafast bootstrap (UFBoot) replicates (Hoang et al., 2018). For the BI analyses we used MrBayes 3.2.7a (Ronquist and Huelsenbeck, 2003) on the CIPRES Science Gateway (Miller et al., 2010). The datasets were partitioned by gene and codon position, and analyses were performed in two parallel runs of 10-20 million generations with 8 Metropolis-coupled Markov chain Monte Carlo (MCMCMC) chains. A relative burn-in of 25% was applied, and 50% majority-rule consensus trees were constructed from the remaining trees.
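The burn-in and consensus step can be illustrated with a minimal Python sketch. It is not the MrBayes implementation: the function name, the representation of each sampled tree as a set of clades, and the toy sample are all assumptions made for illustration. It discards the first 25% of sampled trees and keeps every clade found in more than 50% of the rest.

```python
from collections import Counter

def majority_rule_clades(sampled_trees, burnin_fraction=0.25, threshold=0.5):
    """Return {clade: frequency} for clades above the threshold after burn-in.

    Each sampled tree is represented (hypothetically) as a set of clades,
    and each clade as a frozenset of taxon names.
    """
    n_burnin = int(len(sampled_trees) * burnin_fraction)
    kept = sampled_trees[n_burnin:]          # drop the burn-in samples
    counts = Counter(clade for tree in kept for clade in tree)
    return {
        clade: n / len(kept)
        for clade, n in counts.items()
        if n / len(kept) > threshold         # strict majority
    }

# Toy posterior sample of 8 trees (illustrative only, not real data).
AB = frozenset({"A", "B"})
CD = frozenset({"C", "D"})
AC = frozenset({"A", "C"})
samples = [{AC}, {AC}, {AB, CD}, {AB, CD}, {AB, CD}, {AB, CD}, {AB}, {AC}]

consensus = majority_rule_clades(samples)
```

With this toy sample the first two trees are discarded as burn-in, and only the clades AB and CD pass the 50% threshold among the six retained trees.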
To reconstruct the species tree we used the Accurate Species TRee ALgorithm (ASTRAL III) (Zhang et al., 2018). As input for ASTRAL we used unrooted single-gene ML trees reconstructed in IQ-TREE as implemented in PhyloSuite. We used IQ-TREE 2 (Minh et al., 2020) to calculate the gene concordance factor (gCF, the percentage of decisive gene trees containing a particular branch) and the site concordance factor (sCF, the percentage of decisive alignment sites that support a branch in the reference tree) for every branch in the ASTRAL species tree.
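The gCF definition above can be made concrete with a small Python sketch. This is a simplified illustration, not IQ-TREE's code: the function name and the data layout are assumptions. A reference branch is described by the four clades around it (A and B on one side, C and D on the other), and each gene tree by its taxon set and its set of splits. A gene tree counts as decisive when it samples at least one taxon from each of the four clades.

```python
def gene_concordance_factor(branch, gene_trees):
    """Percentage of decisive gene trees that contain the reference branch.

    branch     : tuple (A, B, C, D) of frozensets, the clades around the branch
    gene_trees : list of (taxa, splits); taxa is a frozenset of taxon names,
                 splits is a set of frozensets (one side of each internal branch)
    """
    a, b, c, d = branch
    decisive = concordant = 0
    for taxa, splits in gene_trees:
        # Decisive: at least one taxon from each of the four clades is present.
        if not all(clade & taxa for clade in (a, b, c, d)):
            continue
        decisive += 1
        side = (a | b) & taxa      # reference split restricted to this gene tree
        other = taxa - side
        if side in splits or other in splits:
            concordant += 1
    return 100 * concordant / decisive if decisive else float("nan")

# Toy example with five hypothetical taxa t1..t5 (not real data).
f = frozenset
branch = (f({"t1"}), f({"t2"}), f({"t3"}), f({"t4", "t5"}))
all_taxa = f({"t1", "t2", "t3", "t4", "t5"})
gene_trees = [
    (all_taxa, {f({"t1", "t2"}), f({"t4", "t5"})}),   # concordant
    (all_taxa, {f({"t1", "t3"}), f({"t4", "t5"})}),   # decisive, discordant
    (f({"t1", "t2", "t4", "t5"}), {f({"t1", "t2"})}), # lacks clade C: not decisive
    (f({"t1", "t2", "t3", "t4"}), {f({"t1", "t2"})}), # concordant
]
gcf = gene_concordance_factor(branch, gene_trees)
```

In this toy case three of the four gene trees are decisive and two of those contain the branch, so the gCF is two thirds of 100.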
The ages of cladogenetic events were estimated in BEAST 2.6.4 (Bouckaert et al., 2014) via the CIPRES Science Gateway using four calibration points:
1) The oldest known fossil of the family Catostomidae, Wilsonium brevipinne, from the early Eocene, approximately 56-48 my old (Liu, 2021).
2) The only known nemacheilid fossil, Triplophysa opinata, from the middle-upper Miocene (16.0 to 5.3 mya) of Kyrgyzstan (Böhme and Ilg, 2003; Prokofiev, 2007). For some time, the Miocene fossil species Nemachilus tener from Central Europe was considered a member of Nemacheilidae. However, subsequent research has cast doubt on the placement of this species within loaches (Obrhelova, 1967), leaving Triplophysa opinata as the only known fossil of Nemacheilidae.
3) The oldest known fossil of the genus Cobitis, C. naningensis, from the early to middle Oligocene of southern China, 34-28 my old (Chen et al., 2015).
4) The separation between the species Cobitis tetralineata and C. lutheri, dated to 2.5-3.5 my on the basis of the river history of the southern Korean Peninsula (Kwan et al., 2014).
For the fossil-based calibration points we used the fossil calibration prior implemented in CladeAge (Matschiner et al., 2017); the fourth calibration point was assigned a uniform prior (2.5-3.5 my).
The partitions were unlinked and assigned the estimated evolutionary models. We used a relaxed lognormal molecular clock and a birth-death tree prior. Because the dataset is large and complex, and several preliminary analyses failed to reach satisfactory ESSs even after 1 billion iterations, we reduced the dataset to one specimen per species (reduced dataset of 300 specimens). To improve the performance of the analysis, we also used parallel tempering as implemented in the CoupledMCMC package (Müller and Bouckaert, 2019, 2020). The analysis was configured with four parallel chains of 6 × 10^8 generations, resampling every 1000. Tree and parameter sampling intervals were set to 60,000. The effective sample sizes (ESS) of all parameters were assessed in Tracer 1.7.1 (Rambaut et al., 2018). A maximum clade credibility (MCC) tree was built in TreeAnnotator 2.6.0 (Rambaut and Drummond, 2010) after discarding the first 25% of trees. The final trees were visualised in FigTree 1.4.4 (Rambaut, 2019).
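As a rough check of what these settings imply, the following Python sketch works through the arithmetic. It assumes that the initial state is not counted as a sample and that "resampling every 1000" refers to chain-swap attempts every 1000 generations; both are interpretations, not statements from the analysis files.

```python
chain_length = 6 * 10**8        # generations per chain, as stated above
sampling_interval = 60_000      # tree and parameter logging interval
resample_every = 1_000          # assumed: generations between swap attempts
burnin_fraction = 0.25          # fraction of trees discarded before the MCC tree

# Number of logged samples (initial state not counted, by assumption).
samples_per_chain = chain_length // sampling_interval

# Samples left for the MCC tree after discarding the burn-in.
retained_samples = int(samples_per_chain * (1 - burnin_fraction))

# Swap attempts between coupled chains under the interpretation above.
swap_attempts = chain_length // resample_every

print(samples_per_chain, retained_samples, swap_attempts)
```

Under these assumptions the log holds 10,000 sampled trees, of which 7,500 remain after the 25% burn-in.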
ML and BI analyses were performed on both the full (492 specimens) and the reduced (300 specimens) datasets.
Files and variables
Nemacheilidae_ASTRAL.tre
Description: A species tree reconstructed in ASTRAL-III from unrooted single gene trees
Nemacheilidae_fulldataset_ML.tre
Description: Maximum likelihood tree reconstructed from concatenated 492 specimen dataset in IQ-TREE
Nemacheilidae_fulldataset_MrBayes.tre
Description: Bayesian tree reconstructed from concatenated 492 specimen dataset in MrBayes
Nemacheilidae_reduceddataset_ML.tre
Description: Maximum likelihood tree reconstructed from 300 specimen dataset in IQ-TREE
Nemacheilidae_reduceddataset_MrBayes.tre
Description: Bayesian tree reconstructed from concatenated 300 specimen dataset in MrBayes
Nemacheilidae_Timetree_BEAST2.tre
Description: Calibrated time tree resulting from Bayesian divergence time analysis of concatenated dataset in BEAST 2
Slechtova_et_al_eLife_101080_Supplementary_material_Table_S1_VOR.docx
Description: Table of material used in the current study including the GenBank accession numbers
Code/software
The tree files can be viewed in, for example:
- Mesquite: A modular system for evolutionary analysis, available from https://www.mesquiteproject.org/
- FigTree (latest version v1.4.4), available from https://github.com/rambaut/figtree/releases
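For a quick scripted look at a tree without a viewer, tip labels can be pulled from a Newick string with the Python standard library alone. The sketch below is a hypothetical helper, not part of this dataset. It assumes a plain Newick string with unquoted labels; if a .tre file is in NEXUS format with a translate block, a dedicated phylogenetics library would be needed instead. The example string uses made-up taxon names.

```python
import re

def newick_tip_labels(newick):
    """Return tip labels from a plain Newick string (unquoted labels assumed)."""
    # Remove bracketed comments/annotations such as [&rate=1.0], which may
    # contain commas that would otherwise be mistaken for tip separators.
    without_comments = re.sub(r"\[[^\]]*\]", "", newick)
    # A tip label directly follows "(" or ","; internal node labels follow ")"
    # and branch lengths follow ":", so neither is captured.
    labels = re.findall(r"[(,]\s*([^\s():,;][^():,;]*)", without_comments)
    return [label.strip() for label in labels]

# Made-up example string, not taken from the deposited files.
example = "((taxon_A:0.1,taxon_B:0.1)0.99:0.2,taxon_C[&rate=1.0,x={1,2}]:0.3);"
tips = newick_tip_labels(example)
print(len(tips), tips)
```

To apply it to a file, read the file contents into a string first; counting the returned labels gives a simple way to confirm the number of tips in each tree.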
Access information
Other publicly accessible locations of the data:
- NCBI GenBank, see Supplementary_material_Table_S1
Data was derived from the following sources:
- NCBI GenBank, see Supplementary_material_Table_S1
471 samples of more than 250 species from 37 genera of the family Nemacheilidae were examined, including the set of sequences of 364 specimens analysed in our laboratory and sequences of 107 specimens obtained from GenBank. We sequenced one mitochondrial and five nuclear genes and performed several phylogenetic analyses: maximum likelihood in IQ-TREE, Bayesian inference in MrBayes, and species-tree reconstruction in ASTRAL-III. Divergence times were estimated in BEAST 2.
