Phylogenomics is extremely powerful but introduces new challenges, as no agreement exists on “standards” for data selection, curation and tree inference. We use jawed vertebrates (Gnathostomata) as a model to address these issues. Despite considerable efforts in resolving their evolutionary history and macroevolution, few studies have included the full phylogenetic diversity of gnathostomes, and some relationships remain controversial. We tested a novel bioinformatic pipeline to assemble large and accurate phylogenomic datasets from RNA sequencing and find this phylotranscriptomic approach successful and highly cost-effective. Increased sequencing effort up to ca. 10 Gbp allows more genes to be recovered, but shallower sequencing (1.5 Gbp) is sufficient to obtain thousands of full-length orthologous transcripts. We reconstruct a robust and strongly supported timetree of jawed vertebrates using 7,189 nuclear genes from 100 taxa, including 23 new transcriptomes from previously unsampled key species. Gene jackknifing supports the robustness of our tree and allows genome-wide divergence times to be calculated by overcoming gene sampling bias. Mitochondrial genomes prove insufficient to resolve the deepest relationships due to limited signal and among-lineage rate heterogeneity. Our analyses emphasize the importance of large curated nuclear datasets to increase the accuracy of phylogenomics and provide a reference framework for the evolutionary history of jawed vertebrates.
Supplementary Methods, Tables and Figures
Irisarri_et_al_Supplement_combined.R2.pdf
Custom scripts
detect-divergent-seq-ali.c
detect-problems-arb-3-9-15.c
ParalogDetector_V4-sort.sh
split-out-paralog.c
code-Vertebrata.tgz
Nuclear datasets
0DP gene set – 4593 gene alignments; 1DP gene set – 1162 gene alignments; 2DP gene set – 1434 gene alignments; nuclear test dataset
alignments.zip
ML tree NoDP dataset, RAxML GTR+G
ML tree NoDP dataset, RAxML GTR+G, 100 non-parametric rapid bootstrap
misgen_50-RAXML-PROTGAMMAGTR-100xRAPIDBP.tre
ML tree 0DP dataset, LG+G+F
ML tree 0DP dataset, RAxML LG+G+F, 100 non-parametric rapid bootstrap
misgen_50-RAXML-PROTGAMMALGF-100xRAPIDBP.tre.pdf
Mitochondrial datasets
Concatenated mitochondrial proteins of jawed vertebrates (except ND6), containing either the full set of 106 taxa or a reduced set of 95 taxa after removing the fastest-evolving taxa.
mtP-Gnatho-ND6_datasets.tgz
BI tree, mitochondrial dataset 95 taxa, PhyloBayes CAT+G
mtP-Gnatho_mi11F-ND6_strh04_R95_V2072-CAT_200_1k_con_root.ann.tre
BI tree, mitochondrial dataset 95 taxa, PhyloBayes CATGTR+G
mtP-Gnatho_mi11F-ND6_strh04_R95_V2072-CATGTR_10_1k_con_root.ann.tre
BI tree, mitochondrial dataset 106 taxa, PhyloBayes CAT+G
mtP-Gnatho-ND6_strh04_R106_V2086-CAT_25_1k_con_root.ann.tre
BI tree, mitochondrial dataset 106 taxa, PhyloBayes CATGTR+G
mtP-Gnatho-ND6_strh04_R106_V2086-CATGTR_10_1k_con_root.ann.tre
Species tree of the 0DP dataset, ASTRAL
ASTRAL species tree of the 0DP dataset, estimated from 4593 gene trees; branch support is measured by local posterior probabilities
ASTRAL_0DP-4593.tre
Species tree of the 1DP dataset, ASTRAL
ASTRAL species tree of the 1DP dataset, estimated from 1162 gene trees; branch support is measured by local posterior probabilities
ASTRAL_1DP-1162.tre
Species tree of the 2DP dataset, ASTRAL
ASTRAL species tree of the 2DP dataset, estimated from 1434 gene trees; branch support is measured by local posterior probabilities
ASTRAL_2DP-1434.tre
BI tree, 0DP dataset, PhyloBayes CAT+G
Majority-rule consensus from 100 BI analyses of 100 gene jackknife replicates (alignments with ~50,000 amino acid positions each). PhyloBayes, CAT+G model
jack50000-0DP-all-CATG4.con.ann.tre
BI tree, 1DP dataset, PhyloBayes CAT+G
Majority-rule consensus from 100 BI analyses of 100 gene jackknife replicates (alignments with ~50,000 amino acid positions each). PhyloBayes, CAT+G model
jack50000-1DP-all-CATG-A.con.ann.tre
BI tree, 2DP dataset, PhyloBayes CAT+G
Majority-rule consensus from 100 BI analyses of 100 gene jackknife replicates (alignments with ~50,000 amino acid positions each). PhyloBayes, CAT+G model
jack50000-2DP-all-CATG-A.con.ann.tre
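The gene jackknife replicates behind the three BI trees above each contain about 50,000 amino acid positions. The following is a minimal Python sketch of one way such replicates could be assembled, assuming genes are drawn at random without replacement until the target length is reached. The sampling rule, the function name `gene_jackknife`, and the toy gene lengths are illustrative assumptions and are not taken from the deposited scripts.

```python
import random

def gene_jackknife(gene_lengths, target_positions=50_000, seed=None):
    """Draw genes without replacement until about target_positions are collected.

    gene_lengths maps a gene name to its alignment length in amino acids.
    Returns the chosen gene names and the total number of positions.
    """
    rng = random.Random(seed)
    genes = list(gene_lengths)
    rng.shuffle(genes)  # random order, so taking a prefix samples without replacement
    chosen, total = [], 0
    for gene in genes:
        if total >= target_positions:
            break
        chosen.append(gene)
        total += gene_lengths[gene]
    return chosen, total

# Hypothetical toy input: 1000 genes with lengths between 300 and 600 positions.
toy_lengths = {f"gene{i:04d}": 300 + (i % 7) * 50 for i in range(1000)}

# 100 replicates, each seeded differently so that the run is reproducible.
replicates = [gene_jackknife(toy_lengths, seed=r) for r in range(100)]
```

Each replicate would then be concatenated into its own alignment and analysed separately; the consensus across the resulting trees summarises how sensitive the topology is to gene sampling.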
Genome-averaged timetree, PhyloBayes
Timetree showing dates averaged across 100 timetrees, each estimated in PhyloBayes from one of 100 independent gene jackknife replicates. CATGTR+G substitution model, autocorrelated log-normal clock model, 16 cross-validated calibration points with soft bounds and birth-death tree prior
CATGTR-LN-BD-SB_100jacks.chronogram_mean_compCrI.tre
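Averaging dates across jackknife timetrees can be illustrated with a short Python sketch. It assumes every replicate was dated on the same fixed topology, so that each node can be matched across replicates by a shared label. The function name `average_node_ages`, the node labels, and the ages are hypothetical placeholders, not values from the deposited trees.

```python
from statistics import mean

def average_node_ages(replicate_ages):
    """Average the age of each node across replicate timetrees.

    replicate_ages is a list of dicts, one per replicate, mapping a node
    label to its estimated age. All replicates must share the same nodes.
    """
    nodes = set(replicate_ages[0])
    for ages in replicate_ages:
        if set(ages) != nodes:
            raise ValueError("all replicates must be dated on the same topology")
    return {node: mean(ages[node] for ages in replicate_ages)
            for node in sorted(nodes)}

# Hypothetical ages for two nodes in three replicates (made-up numbers).
toy_replicates = [
    {"nodeA": 450.0, "nodeB": 300.0},
    {"nodeA": 460.0, "nodeB": 310.0},
    {"nodeA": 470.0, "nodeB": 320.0},
]
averaged = average_node_ages(toy_replicates)
```

Because each replicate uses a different random subset of genes, the averaged age of a node is less dependent on any single gene sample than an estimate from one alignment would be.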
Timetree with 30 calibrations, nuclear test dataset, PhyloBayes
Timetree estimated in PhyloBayes under CATGTR+G substitution model, autocorrelated log-normal clock model, 30 calibration points with soft bounds and birth-death tree prior
14K_CATGTR-LN-BD-SB_all30.ch2_sample.chronogram.tre
Timetree with 16 calibrations, nuclear test dataset, PhyloBayes
Timetree estimated in PhyloBayes under CATGTR+G substitution model, autocorrelated log-normal clock model, 16 cross-validated calibration points with soft bounds and birth-death tree prior
14K_CATGTR-LN-BD-SB_CVed16b.ch2_sample.chronogram
Neoceratodus forsteri transcriptome
Neoceratodus_forsteri_transcriptome_trinity_oases.fa.gz
Megophrys nasuta transcriptome
Megophrys_nasuta_transcriptome_JP25.fasta.gz
Discoglossus pictus transcriptome
Discoglossus_pictus_transcriptome_JP15.fasta.gz
Andrias davidianus transcriptome
Andrias_davidianus_transcriptome_JP19.fasta.gz
Calotriton asper transcriptome
Calotriton_asper_transcriptome_JP21.fasta.gz
Lepidosiren paradoxa transcriptome
Lepidosiren_paradoxa_transcriptome_trinity_oases.fa.gz
Protopterus annectens transcriptome
Protopterus_annectens_transcriptome_trinity_oases.fa.gz
Geotrypetes seraphini transcriptome
Geotrypetes_seraphini_transcriptome_JP24.fasta.gz
Hymenochirus curtipes transcriptome
Hymenochirus_curtipes_transcriptome_JP17.fasta.gz
Pipa pipa transcriptome
Pipa_pipa_transcriptome_JP18.fasta.gz
Proteus anguinus transcriptome
Proteus_anguinus_transcriptome_JP22.fasta.gz
Siren lacertina transcriptome
Siren_lacertina_transcriptome_JP16.fasta.gz
Typhlonectes natans transcriptome
Typhlonectes_natans_transcriptome_JP23.fasta.gz
Acipenser baerii transcriptome
Acipenser_baerii_transcriptome.fasta.bz2
Amia calva transcriptome
Amia_calva_transcriptome.fasta.bz2
Lepisosteus platyrhincus transcriptome
Lepisosteus_platyrhincus_transcriptome.fasta.bz2
Pleurodeles waltl transcriptome
Pleurodeles_waltl_transcriptome.fasta.bz2
Polypterus senegalus transcriptome
Polypterus_senegalus_transcriptome.fasta.bz2
Protopterus aethiopicus transcriptome
Protopterus_aethiopicus_transcriptome.fasta.bz2
Raja clavata transcriptome
Raja_clavata_transcriptome.fasta.bz2
Rhinatrema bivittatum transcriptome
Rhinatrema_bivittatum_transcriptome.fasta.bz2
Scyliorhinus canicula transcriptome
Scyliorhinus_canicula_transcriptome.fasta.bz2
Tarentola mauritanica transcriptome
Tarentola_mauritanica_transcriptome.fasta.bz2
Typhlonectes compressicauda transcriptome
Typhlonectes_compressicauda_transcriptome.fasta.bz2
ML tree, mitochondrial dataset 106 taxa, RAxML GTR+G
mtP-Gnatho-ND6_strh04_R106_2773_RaxGTR4g_ML_Long.tre
ML tree, mitochondrial dataset 106 taxa, RAxML MTREV+G
mtP-Gnatho-ND6_strh04_R106_2773_RaxMtRevF4g_ML_Long.tre
ML tree, mitochondrial dataset 95 taxa, RAxML GTR+G
mtP-Gnatho_mi11F-ND6_strh04_R95_2866_RaxGTR4g_ML_Long.tre
ML tree, mitochondrial dataset 95 taxa, RAxML MTREV+G
mtP-Gnatho_mi11F-ND6_strh04_R95_2866_RaxMtRevF4g_ML_Long.tre