Dryad

Supplementary Information for Phylogenetic analyses of ray-finned fishes (Actinopterygii) using collagen type I protein sequences

Cite this dataset

Harvey, Virginia; Keating, Joseph; Buckley, Michael (2021). Supplementary Information for Phylogenetic analyses of ray-finned fishes (Actinopterygii) using collagen type I protein sequences [Dataset]. Dryad. https://doi.org/10.5061/dryad.xgxd254gs

Abstract

Ray-finned fishes (Actinopterygii) are the largest and most diverse group of vertebrates, comprising over half of all living vertebrate species. Phylogenetic relationships among ray-finned fishes have historically pivoted on the study of morphology, which has notoriously failed to resolve higher-order relationships, such as those within the percomorphs. More recently, comprehensive genomic analyses have provided further resolution of actinopterygian phylogeny, including higher-order relationships. Such analyses are rightfully regarded as the ‘gold standard’ for phylogenetics. However, DNA retrieval requires modern or well-preserved tissue, and DNA is less likely to be preserved in archaeological or fossil specimens. In contrast, some proteins, such as collagen, are phylogenetically informative and can survive into deep time. Here, we test the utility of collagen type I amino acid sequences for phylogenetic estimation of ray-finned fishes. We estimate topology using Bayesian approaches and compare the congruence of our estimated trees with published genomic phylogenies. Furthermore, we apply a Bayesian molecular clock approach and compare estimated divergence dates with previously published genomic clock analyses. In our collagen-derived trees, 77% of node positions are congruent with recent genomic-derived trees, with the majority of discrepancies occurring in higher-order node positions, almost exclusively within the Percomorpha. Our molecular clock trees present divergence times that are fairly comparable with genomic-based phylogenetic analyses. We estimate the mean node age of Actinopteri at ~293 million years (Ma), the base of Teleostei at ~211 Ma and the radiation of percomorphs beginning at ~141 Ma (~350 Ma, ~250–283 Ma and ~120–133 Ma in genomic trees, respectively).
Finally, we show that the average rate of collagen (I) sequence evolution is 0.9 amino acid substitutions for every million years of divergence, with the α3 (I) sequence evolving the fastest, followed by the α2 (I) chain. This is the quickest rate known for any vertebrate group. We demonstrate that phylogenetic analyses using collagen type I amino acid sequences generate tangible signals for actinopterygians that are highly congruent with recent genomic-level studies. However, there is limited congruence within percomorphs, perhaps due to clade-specific functional constraints acting upon collagen sequences. Our results provide important insights for future phylogenetic analyses incorporating extinct actinopterygian species via collagen (I) sequencing.

Methods

1. Bayesian topology analysis – Bayesian phylogenetic trees were estimated using MrBayes software (v3.2.7). A collagen (I) sequence dataset was run in PartitionFinder2 using the ‘MrBayes only’ option, with both linked and unlinked branch lengths tested.

2. Tree space visualisation – Tree space was visualised in R using a custom script utilising the phylogenetic packages phangorn, Quartet, vegan and ade4.

3. Bayesian clock analysis – The rate of sequence evolution per collagen (I) α-chain was investigated in MrBayes using both uniform and birth-death clock priors.
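
As an illustration of the idea behind the tree space visualisation step (2), the sketch below builds a pairwise distance matrix between trees, which is the usual input to an ordination such as principal coordinates analysis. It is not the authors' R script. It is written in Python and uses a rooted, clade-based Robinson-Foulds distance, which is an assumption and may differ from the metric used in the deposited analysis. The function names `clades` and `rf_matrix` and the four-taxon example trees are hypothetical, and the parser assumes plain Newick strings with no branch lengths, internal node labels or quoted names.

```python
from itertools import combinations


def clades(newick):
    """Return the non-trivial clades of a rooted tree as frozensets of tip names.

    Assumption: a plain Newick string without branch lengths,
    internal node labels or quoted names.
    """
    stack = []      # one set of tip names per currently open bracket
    found = set()   # every clade closed so far
    token = ""      # tip name currently being read
    for ch in newick.strip().rstrip(";") + ",":
        if ch in "(),":
            if token.strip():
                stack[-1].add(token.strip())
            token = ""
            if ch == "(":
                stack.append(set())
            elif ch == ")":
                tips = stack.pop()
                found.add(frozenset(tips))
                if stack:
                    stack[-1] |= tips
        else:
            token += ch
    # Drop the root clade (all tips), which every tree shares.
    all_tips = frozenset().union(*found) if found else frozenset()
    return found - {all_tips}


def rf_matrix(trees):
    """Pairwise rooted Robinson-Foulds distances: clades present in one tree only."""
    sets = [clades(t) for t in trees]
    n = len(sets)
    dist = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = len(sets[i] ^ sets[j])
    return dist


# Hypothetical four-taxon trees, for illustration only.
example_trees = ["((A,B),(C,D));", "((A,C),(B,D));", "(((A,B),C),D);"]
example_matrix = rf_matrix(example_trees)
```

Here the first and third trees share the clade (A,B) and so lie closer to each other than either does to the second tree. In a real analysis the input would be the trees sampled by MrBayes, and the resulting matrix would be passed to an ordination routine to obtain plotting coordinates.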

Funding

University of Manchester, Award: Dean's Award scholarship funding

Royal Society, Award: UF120473

European Research Council, Award: 788203