Ecological interactions and genomic innovation fueled the evolution of ray-finned fish endothermy
Data files
Mar 17, 2025 version files 447.06 MB
- README.md (23.98 KB)
- Suplemmentary_Datasets.tar.gz (443.23 MB)
- Supplementary_Tables.xlsx (3.81 MB)
Abstract
Endothermy has independently evolved in several vertebrate lineages but remains rare among fishes. Using an integrated approach combining phylogenomic and ecomorphological data for 1,051 ray-finned fishes, a time-dependent evolutionary model, and comparative genomic analyses of 205 marine vertebrates, we show that ecological interactions with modern cetaceans coincided with the evolution of endothermy in ray-finned fishes during the Eocene–Miocene. This result is supported by evidence of temporal and geographical overlap between cetaceans and endothermic fish lineages in the fossil record, as well as correlations between cetacean diversification and the origin of endothermy in fishes. Phylogenetic comparative analyses identified correlations between endothermy, large body sizes, and specialized swimming modes while challenging diet specialization and depth range expansion hypotheses. Comparative genomic analyses identified several genes under selection in endothermic lineages, including carnmt1 (involved in fatty acid metabolism) and dcaf6 (associated with development). Our findings advance the understanding of how ecological interactions and genomic factors shape key adaptations.
https://doi.org/10.5061/dryad.ht76hdrpj
Description of the data and file structure
This repository contains the following files:
1. Supplementary_Datasets.tar.gz
This archive contains the supplementary datasets, which are structured as follows:
- Supplementary Dataset 1: All gene alignments for the Ray-Finned Fish Dataset (RFFD), which provide the foundational data for the phylogenetic and evolutionary analyses.
- Supplementary Dataset 2: All phylograms and chronograms used in this study, supporting divergence time estimation and evolutionary modeling.
- Supplementary Dataset 3: RFFD calibration file using 31 fossil constraints, used for time-calibrated phylogenetic reconstructions.
- Supplementary Dataset 4: RFFD calibration file using 33 fossil constraints, refining divergence estimates within the dataset.
- Supplementary Dataset 5: RFFD calibration file using 34 fossil constraints, further improving calibration accuracy in evolutionary analyses.
- Supplementary Dataset 6: RFFD calibration file using averaged calibrations from previous studies, integrating multiple sources for robust phylogenetic dating.
- Supplementary Dataset 7: Scripts and results for the HiSSE analyses. This dataset includes all files required to replicate the HiSSE analyses, organized by analysis scheme (e.g., all_species, less_species, etc.). Each subfolder contains input data, intermediate files, R scripts, and output files corresponding to the hidden state-dependent diversification models applied in our study. The single R file in each subfolder is the script that replicates the analyses. Output files are named according to the type of analysis the script performs.
- Supplementary Dataset 8: Script for the modified threshold model, including the climatic, Lineage Through Time (LTT), and Disparity Through Time (DTT) analyses. This dataset is organized into two main subfolders:
  - Climatic_Analyses: This folder contains the input data, scripts, and results used for the climatic model analyses. Summarized results are available in an Excel file, while various subfolders categorize results by different analytical scales. Each subfolder contains:
    - Input trees used in the analyses
    - Code files (Climate_ML_LL_Gauss_ME.r and BM_OU_Lambda_ML_gauss.R)
    - The R script for replication (Multi_Tree_Fernan_Climatic.R)
    - The presence/absence matrix (Endothermy_PresAbs.txt)
    - Specific results for each scale analyzed
  - LTT_DTT_Analyses: This folder contains the results of the Lineage Through Time (LTT) and Disparity Through Time (DTT) analyses. It is divided into two subfolders based on the taxa studied (Sharks or Whales), each of which includes:
    - A tree repository (Tree_Repo)
    - Subfolders for each category of analysis performed
    - Scripts and models required for replication (Climate_ML_LL_Gauss_ME.r and BM_OU_Lambda_ML_gauss.R)
    - CSV files containing LTT or DTT curves
    - A folder with the scripts and input data used to calculate the curves
- Supplementary Dataset 9: List of gene IDs for the 894 single-copy orthologs identified in the Marine Vertebrate Dataset (MVD), supporting comparative genomic analyses.
- Supplementary Dataset 10: Single-copy orthologs obtained from OrthoFinder and exon marker alignments for the MVD, providing key genomic data for evolutionary inference.
- Supplementary Dataset 11: A custom Python script that matches amino acid sequences to their corresponding nucleotide sequences, facilitating molecular evolutionary analyses.
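Because the archive is large (443.23 MB), it can help to inspect its layout before extracting it. The following is a minimal sketch, not part of the deposited scripts: the function name `count_files_by_folder` and its `depth` parameter are illustrative assumptions, and the folder names inside the archive are not assumed to match the dataset titles listed above.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def count_files_by_folder(archive_path, depth=1):
    """Count regular files in a .tar.gz archive, grouped by leading folders.

    Nothing is extracted; only the archive index is read.
    `depth` is the number of leading path components used as the group key.
    """
    counts = Counter()
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            # PurePosixPath drops a leading "./" so equivalent paths share a key
            parts = PurePosixPath(member.name).parts
            if parts:
                counts["/".join(parts[:depth])] += 1
    return dict(counts)


# Hypothetical usage; pass the archive's file name exactly as downloaded:
# for folder, n in sorted(count_files_by_folder("path/to/archive.tar.gz").items()):
#     print(f"{n:6d}  {folder}")
```

Raising `depth` to 2 or 3 shows the subfolder structure (for example, the analysis schemes within the HiSSE dataset) without unpacking any files.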
2. Supplementary_Tables.xlsx
An “.xlsx” file containing all of the supplementary tables for this study. The tables are organized by sheet for readability. The following tables are included in the file:
- Supplementary Table S1: List of specimens that compose the Ray-finned Fishes Database (RFFD). This table includes the family, genus, and species of each studied individual alongside their respective museum catalog number, the institution affiliated with the sample, and the code of that institution.
- Supplementary Table S2: Comparison of previously proposed ages for different higher-level clades. The second-to-last column shows the average age obtained from all previous studies, while the last column corresponds to the age assigned in our revised calibration file. Ages that appear in bold reflect the most probable and accurate date for the clade based on the nature of the study that proposed it, and were considered as the final ages regardless of the average value when accounting for other studies. Ages marked with an asterisk (*) represent potential outliers.
- Supplementary Table S3: Multi-factorial data matrix for 1,051 species of ray-finned fishes. The table includes information about the species being analyzed, the depth range (m), average standard length (cm), length-weight relationship (g), diet, swimming mode, presence or absence of endothermy, and endothermy type for each species when applicable.
- Supplementary Table S4: Multi-factorial data matrix of morphometric characteristics for 542 species of ray-finned fishes obtained from the FishShapes database. The table includes information about the species being analyzed, the average standard length (cm), average body depth (cm), average width (cm), average head depth (cm), average mouth width corrected for size (cm), average caudal peduncle width (cm), average weight (g), body depth-length ratio (cm), body length-weight ratio, body length-depth-weight ratio (cm²/g), and presence or absence of endothermy.
- Supplementary Table S5: Outcomes of the univariate phylogenetic generalized logistic regression analyses conducted on distinct phylogenetic trees. The columns within the table represent the following: the analyzed factor, the FishShapes indicator, the number of taxa considered for analysis, the corresponding p-value, the p-value accounting for the false discovery rate (Q-Value), the phylogenetic signal (α), and the Akaike score. The two swimming mode variations correspond to the two possible assignments of swimming mode for istiophorids: sub-carangiform (Variation 1), and tunniform (Variation 2).
- Supplementary Table S6: Outcomes of the multivariate phylogenetic generalized logistic regression analyses conducted on distinct phylogenetic trees. The columns within the table represent the following: the analyzed factor, the number of taxa considered for analysis, the corresponding p-value, the phylogenetic signal (α), and the Akaike score. The two swimming mode variations correspond to the two possible assignments of swimming mode for istiophorids: sub-carangiform (Variation 1), and tunniform (Variation 2).
- Supplementary Table S7: Averaged p-values of the univariate phylogenetic generalized logistic regression analyses conducted on distinct phylogenetic trees. The columns within the table represent the analyzed factor and the corresponding averaged p-value. The two swimming mode variations correspond to the two possible assignments of swimming mode for istiophorids: sub-carangiform (Variation 1), and tunniform (Variation 2).
- Supplementary Table S8: Averaged p-values of the multivariate phylogenetic generalized logistic regression analyses conducted on distinct phylogenetic trees. The columns within the table represent the swimming mode variation type (sub-carangiform (Variation 1) and tunniform (Variation 2)), the analyzed factor, and the corresponding averaged p-values.
- Supplementary Table S9: Model fitting results for the 16,200 simulations conducted using our climatic model. The table includes the model generating the data (Model), the number of species (Taxa), the number of integrations (Integrations), the parameter being generated (Parameter), the Akaike scores for each model fit (AIC), the estimated parameter (Est. Param), the mu value estimated (μ), and the weighted Akaike scores for each model fit (AICw).
- Supplementary Table S10: Model fitting results for each phylogenetic reconstruction of our RFFD. The table includes the phylogenetic tree being assessed (Tree), the paleoclimatic curve used in the analysis (Curve), Akaike scores for each model fit (AIC), weighted Akaike scores for each model fit (AICw), beta (β), mu (μ), and lambda (λ).
- Supplementary Table S11: Model fitting results for the lineages through time values (LTTs) for each of the cetacean groups studied based on each phylogenetic reconstruction of our RFFD. The table includes the phylogenetic tree being assessed (Tree), the cetacean group under study (Group), Akaike scores for each model fit (AIC), weighted Akaike scores for each model fit (AICw), beta (β), mu (μ), and lambda (λ).
- Supplementary Table S12: Model fitting results for the disparity through time values (DTTs) for each of the cetacean groups studied based on each phylogenetic reconstruction of our RFFD. The table includes the phylogenetic tree being assessed (Tree), the cetacean group under study (Group), Akaike scores for each model fit (AIC), weighted Akaike scores for each model fit (AICw), beta (β), mu (μ), and lambda (λ).
- Supplementary Table S13: Model fitting results for the lineages through time values (LTTs) for the Chondrichthyes phylogeny studied based on each phylogenetic reconstruction of our RFFD. The table includes the phylogenetic tree being assessed (Tree), the group under study (Group), Akaike scores for each model fit (AIC), weighted Akaike scores for each model fit (AICw), beta (β), mu (μ), and lambda (λ).
- Supplementary Table S14: Model fitting results for the diversity through time values (DTTs) for the Chondrichthyes phylogeny studied based on each phylogenetic reconstruction of our RFFD. The table includes the phylogenetic tree being assessed (Tree), the group under study (Group), Akaike scores for each model fit (AIC), weighted Akaike scores for each model fit (AICw), beta (β), mu (μ), and lambda (λ).
- Supplementary Table S15: Model fitting results for the lineages through time values (LTTs) for the Carcharhiniformes phylogeny studied based on each phylogenetic reconstruction of our RFFD. The table includes the phylogenetic tree being assessed (Tree), the group under study (Group), Akaike scores for each model fit (AIC), weighted Akaike scores for each model fit (AICw), beta (β), mu (μ), and lambda (λ).
- Supplementary Table S16: Model fitting results for the net diversification through time values (DTTs) for the Carcharhiniformes phylogeny studied based on each phylogenetic reconstruction of our RFFD. The table includes the phylogenetic tree being assessed (Tree), the group under study (Group), Akaike scores for each model fit (AIC), weighted Akaike scores for each model fit (AICw), beta (β), mu (μ), and lambda (λ).
- Supplementary Table S17: List of specimens that compose the Marine Vertebrates Dataset (MVD). This table includes the family, genus, and species of each studied individual alongside their respective museum catalog number or accession number, the institution affiliated with the sample, and the code of that institution.
- Supplementary Table S18: List of all of the genes across our four scenarios that exhibited signs of positive selection. An ‘X’ under the specified scenario indicates that the gene experienced positive selection in that particular scenario.
- Supplementary Table S19: The results from our BUSTED-PH analysis for scenario 1, where all endotherms were included in the foreground and all ectotherms in the background. The ‘ID’ column represents the ID of the alignment used. The ‘Panther Code’ is the code from the Panther Database associated with the alignment. ‘Name’ is the name of the gene. ‘Sequences’ is the number of sequences present in the alignment. ‘Num FG’ is the number of species in the foreground. ‘LRT’ indicates the likelihood ratio test values obtained for that specific gene. ‘BUSTED-E Filtered’ indicates if the alignment required filtering through BUSTED-E. ‘ω1’, ‘ω2’, and ‘ω3’ represent the three omega values estimated by the analysis. ‘FG P-Value’ is the p-value associated with the foreground group. ‘BG P-Value’ is the p-value associated with the background group. ‘DF P-Value’ is the difference between the p-values of the foreground and the background groups. ‘FDR Class’ indicates the type of positive selection observed after the FDR correction: a value of 101 represents selection in the foreground (fg), no selection in the background (bg), and a significant difference in the degree of selection between them; a value of 111 represents selection in both the fg and the bg with a significant difference in the degree of selection, indicating a degree of positive selection in the fg.
- Supplementary Table S20: The results from our relative evolutionary rates (RER) analysis for scenario 1, where all endotherms were included in the foreground and all ectotherms in the background. The ‘ID’ column represents the ID of the alignment used. The ‘Panther Code’ is the code from the Panther Database associated with the alignment. ‘Name’ is the name of the gene. ‘FG Rel. Rate’ represents the relative evolutionary rates of our species in the foreground, whereas ‘BG Rel. Rate’ represents the relative evolutionary rates for our background group. ‘P-Value FG=BG’ and ‘Q-Value FG=BG’ represent the respective p- and q-values of our foreground-background comparison.
- Supplementary Table S21: The results from our BUSTED-PH analysis for scenario 2, where only regional endotherms were included in the foreground, the rest of the endotherms were placed as nuisance species, and all ectotherms remained in the background. The ‘ID’ column represents the ID of the alignment used. The ‘Panther Code’ is the code from the Panther Database associated with the alignment. ‘Name’ is the name of the gene. ‘Sequences’ is the number of sequences present in the alignment. ‘Num FG’ is the number of species in the foreground. ‘LRT’ indicates the likelihood ratio test values obtained for that specific gene. ‘BUSTED-E Filtered’ indicates if the alignment required filtering through BUSTED-E. ‘ω1’, ‘ω2’, and ‘ω3’ represent the three omega values estimated by the analysis. ‘FG P-Value’ is the p-value associated with the foreground group. ‘BG P-Value’ is the p-value associated with the background group. ‘DF P-Value’ is the difference between the p-values of the foreground and the background groups. ‘FDR Class’ indicates the type of positive selection observed after the FDR correction: a value of 101 represents selection in the foreground (fg), no selection in the background (bg), and a significant difference in the degree of selection between them; a value of 111 represents selection in both the fg and the bg with a significant difference in the degree of selection, indicating a degree of positive selection in the fg.
- Supplementary Table S22: The results from our relative evolutionary rates (RER) analysis for scenario 2, where only regional endotherms were included in the foreground, the rest of the endotherms were placed as nuisance species, and all ectotherms remained in the background. The ‘ID’ column represents the ID of the alignment used. The ‘Panther Code’ is the code from the Panther Database associated with the alignment. ‘Name’ is the name of the gene. ‘FG Rel. Rate’ represents the relative evolutionary rates of our species in the foreground, whereas ‘BG Rel. Rate’ represents the relative evolutionary rates for our background group. ‘P-Value FG=BG’ and ‘Q-Value FG=BG’ represent the respective p- and q-values of our foreground-background comparison.
- Supplementary Table S23: The results from our BUSTED-PH analysis for scenario 3, where only eye-brain endotherms were included in the foreground, the rest of the endotherms were placed as nuisance species, and all ectotherms remained in the background. The ‘ID’ column represents the ID of the alignment used. The ‘Panther Code’ is the code from the Panther Database associated with the alignment. ‘Name’ is the name of the gene. ‘Sequences’ is the number of sequences present in the alignment. ‘Num FG’ is the number of species in the foreground. ‘LRT’ indicates the likelihood ratio test values obtained for that specific gene. ‘BUSTED-E Filtered’ indicates if the alignment required filtering through BUSTED-E. ‘ω1’, ‘ω2’, and ‘ω3’ represent the three omega values estimated by the analysis. ‘FG P-Value’ is the p-value associated with the foreground group. ‘BG P-Value’ is the p-value associated with the background group. ‘DF P-Value’ is the difference between the p-values of the foreground and the background groups. ‘FDR Class’ indicates the type of positive selection observed after the FDR correction: a value of 101 represents selection in the foreground (fg), no selection in the background (bg), and a significant difference in the degree of selection between them; a value of 111 represents selection in both the fg and the bg with a significant difference in the degree of selection, indicating a degree of positive selection in the fg.
- Supplementary Table S24: The results from our relative evolutionary rates (RER) analysis for scenario 3, where only eye-brain endotherms were included in the foreground, the rest of the endotherms were placed as nuisance species, and all ectotherms remained in the background. The ‘ID’ column represents the ID of the alignment used. The ‘Panther Code’ is the code from the Panther Database associated with the alignment. ‘Name’ is the name of the gene. ‘FG Rel. Rate’ represents the relative evolutionary rates of our species in the foreground, whereas ‘BG Rel. Rate’ represents the relative evolutionary rates for our background group. ‘P-Value FG=BG’ and ‘Q-Value FG=BG’ represent the respective p- and q-values of our foreground-background comparison.
- Supplementary Table S25: The results from our BUSTED-PH analysis for scenario 4, where only full-bodied endotherms were included in the foreground, the rest of the endotherms were placed as nuisance species, and all ectotherms remained in the background. The ‘ID’ column represents the ID of the alignment used. The ‘Panther Code’ is the code from the Panther Database associated with the alignment. ‘Name’ is the name of the gene. ‘Sequences’ is the number of sequences present in the alignment. ‘Num FG’ is the number of species in the foreground. ‘LRT’ indicates the likelihood ratio test values obtained for that specific gene. ‘BUSTED-E Filtered’ indicates if the alignment required filtering through BUSTED-E. ‘ω1’, ‘ω2’, and ‘ω3’ represent the three omega values estimated by the analysis. ‘FG P-Value’ is the p-value associated with the foreground group. ‘BG P-Value’ is the p-value associated with the background group. ‘DF P-Value’ is the difference between the p-values of the foreground and the background groups. ‘FDR Class’ indicates the type of positive selection observed after the FDR correction: a value of 101 represents selection in the foreground (fg), no selection in the background (bg), and a significant difference in the degree of selection between them; a value of 111 represents selection in both the fg and the bg with a significant difference in the degree of selection, indicating a degree of positive selection in the fg.
- Supplementary Table S26: The results from our relative evolutionary rates (RER) analysis for scenario 4, where only full-bodied endotherms were included in the foreground, the rest of the endotherms were placed as nuisance species, and all ectotherms remained in the background. The ‘ID’ column represents the ID of the alignment used. The ‘Panther Code’ is the code from the Panther Database associated with the alignment. ‘Name’ is the name of the gene. ‘FG Rel. Rate’ represents the relative evolutionary rates of our species in the foreground, whereas ‘BG Rel. Rate’ represents the relative evolutionary rates for our background group. ‘P-Value FG=BG’ and ‘Q-Value FG=BG’ represent the respective p- and q-values of our foreground-background comparison.
- Supplementary Table S27: Data on the ecological interactions of various fish and mammal groups during the Late Miocene and Serravallian intervals, extracted from the Paleobiology Database. The ‘Group’ column identifies the taxonomic group involved in the interaction, while ‘Interaction’ specifies the type of ecological relationship observed. ‘Early Interval’ denotes the geological time frame, with ‘Max Age’ and ‘Min. Age’ indicating its temporal boundaries in million years ago (Mya). The locations of these interactions are represented by coordinates in ‘Paleo-longitude’ and ‘Paleo-latitude’. In this dataset, Thunnini is observed as a predator/prey during both intervals at distinct global positions. This information provides valuable insights into the ecological dynamics of Thunnini during these geological periods.
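The ‘FDR Class’ codes described for the BUSTED-PH tables (S19, S21, S23, and S25) can be read as three flags. The sketch below is one way to decode them, assuming each digit is an independent 0/1 indicator in the order foreground, background, difference. Only the values 101 and 111 are documented above, so the treatment of any other code is an assumption, and the names `FdrClass` and `decode_fdr_class` are illustrative rather than part of the deposited scripts.

```python
from typing import NamedTuple


class FdrClass(NamedTuple):
    foreground_selected: bool     # first digit: selection in the foreground (fg)
    background_selected: bool     # second digit: selection in the background (bg)
    difference_significant: bool  # third digit: fg and bg differ significantly


def decode_fdr_class(code):
    """Split an 'FDR Class' value such as 101 or 111 into three boolean flags.

    Accepts ints, numeric strings, or whole-number floats (spreadsheet readers
    often return 101.0). Assumes every digit is 0 or 1; anything else is rejected.
    """
    digits = str(int(code)).zfill(3)
    if len(digits) != 3 or set(digits) - {"0", "1"}:
        raise ValueError(f"unexpected FDR Class value: {code!r}")
    return FdrClass(*(d == "1" for d in digits))


# Example with the two documented codes
for value in (101, 111):
    print(value, decode_fdr_class(value))
```

Under this reading, rows where `foreground_selected` and `difference_significant` are both true correspond to the two classes the table descriptions treat as positive selection in the foreground.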
3. High-quality image files for all of the figures used in this study.
Found under “Supplemental Information” in Related Works.