The evolution of the cetaceans, from their early transition to an aquatic lifestyle to their subsequent diversification, has been the subject of numerous studies. However, while the higher-level relationships among cetacean families have been largely settled, several aspects of the systematics within these groups remain unresolved. Problematic clades include the oceanic dolphins (37 spp.), which have experienced a recent rapid radiation, and the beaked whales (22 spp.), which have not been investigated in detail using nuclear loci. The combined application of high-throughput sequencing with techniques that target specific genomic sequences provides a powerful means of rapidly generating large volumes of orthologous sequence data for use in phylogenomic studies. To elucidate the phylogenetic relationships within the Cetacea, we combined sequence capture with Illumina sequencing to generate data for ~3200 protein-coding genes for 68 cetacean species and their close relatives, including the pygmy hippopotamus. By combining data from >38,000 exons with existing sequences from 11 cetaceans and seven outgroup taxa, we produced the first comprehensive comparative genomic dataset for cetaceans, spanning 6,527,596 aligned base pairs and 89 taxa. Phylogenetic trees reconstructed with maximum likelihood and Bayesian inference of concatenated loci, as well as with coalescence analyses of individual gene trees, were mostly concordant and well supported. Our results completely resolve the relationships among beaked whales as well as the contentious relationships among oceanic dolphins, especially the problematic subfamily Delphininae, which includes the common and bottlenose dolphins. We performed Bayesian estimation of species divergence times using MCMCTree, integrating as calibration points recently described fossils (e.g., Mystacodon selenensis) that have not been used before.
Integration of new fossil dates in the context of autocorrelated rates indicates that the diversification of Crown Cetacea began before the Late Eocene and that the divergence of Crown Delphinidae occurred as early as the Middle Miocene.
Figure_S1
Maximum likelihood phylogram of Dataset B with the maximum number of partitions. Bootstrap values are 100 for all 3 analyses except at 6 nodes labelled with a red circle; bootstrap values for these are shown in the upper left.
Figure_S2
ASTRAL species tree. All support values are 1.0 unless otherwise noted over the branch.
FigureS3
Time tree of Cetacea using the independent rates (IR) model. Numbers over each node correspond to raw values in Table 3.
Table_S1
Description of values for sequencing (i.e., number of reads), Trinity (i.e., number of contigs), and reciprocal BLAST searches for each sample for which we performed target sequence capture.
Table_S2
List of GenBank accession numbers for sequences included in our analysis for Platanista gangetica and Balaenoptera omurai
DATASET_A.phylip
Dataset A, concatenated alignment
DATASET_B
Dataset B (without Platanista gangetica and Balaenoptera omurai).
Cetacea_gene_partition
RAxML partitions for each gene (3,191)
PartitionFinder Partitions
RAxML partitions generated by PartitionFinder
partitionfindersets
DATASET_A_RAxML_unpartitioned_best_tree
Best tree for unpartitioned analysis of RAxML using DATASET A
RAxML_unpartitioned_best_tree.tree
DATASET_A_RAxML_unpartitioned_bootstrap
DATASET_A_RAxML_unpartitioned_bootstrap
RAxML_unpartitioned_bootstrap.result
DATASET_A_RAxMLpartitionfinder_best_tree
Best tree of RAxML analysis of Dataset A using the partition scheme generated by PartitionFinder.
RAxMLpartitionfinder_best_tree.tre
DATASET_A_RAxML_partitionfinder_bootstrap.result
DATASET_A_RAxML_partitionfinder_bootstrap trees
RAxML_partitionfinder_bootstrap.result.txt
DATASET_A_RAxML_partition_by_gene_best_tree
Best tree of RAxML analysis partitioned by gene and using DATASET A
RAxML_partition_by_gene_best_tree.tre
DATASET_A_RAxML_bootstrap_partition_by_gene
DATASET_A_RAxML_bootstrap_partition_by_gene
RAxML_bootstrap_partition_by_gene.result
DATASET_B_RAxML_unpartitioned_bestTree
DATASET_B_RAxML_unpartitioned_bestTree
RAxML_DATASET_B_unpartitioned_bestTree.result
DATASET_B_RAxML_bootstrap_unpartitioned.result
DATASET_B_RAxML_bootstrap_unpartitioned trees
RAxML_bootstrap_unpartitioned.result.txt
DATASET_B_RAxML_partitionfinder_bestTree
DATASET_B_RAxML_partitionfinder_bestTree
RAxML_DATASET_B_partitionfinder_bestTree.result
DATASET_B_RAxML_bootstrap_partitionfinder.result
DATASET_B_RAxML_bootstrap_partitionfinder.result
RAxML_bootstrap_partitionfinder.result.txt
DATASET_B_RAxML_partition_by_gene_bestTree
DATASET_B_RAxML_partition_by_gene_bestTree
RAxML_DATASET_B_all_genes_bestTree.result
DATASET_B_RAXML_boostrap_partition_by_gene_result
DATASET_B_RAXML_boostrap_partition_by_gene_result
RAXML_boostrap_by_gene_result.txt
Exabayes_tree
Tree resulting from the ExaBayes analysis
Bayes_tree.nex
ASTRAL input of RAxML gene trees for each of the 3,191 genes
ASTRAL input of RAxML gene trees for each of the 3,191 genes
RAXML_gene_trees_ASTRAL_input
ASTRAL_species_tree_results
Results of the ASTRAL species tree analysis
ASTRAL_species_tree.txt
MCMCTree input
Dataset including the top third of genes in terms of divergence between odontocetes and mysticetes. This was the input for all MCMCTree analyses.
GENE_LIST3.phylip
MCMCTREE.tre
Tree input with calibration points for all MCMCTree analyses
Hessian matrix file for input in MCMCTree analyses
in.BV
Output for MCMCTree Strict clock Run 1
Output for MCMCTree Strict clock analysis; Run 1
out_clock_1_1.txt
Output for MCMCTree Strict clock Run 2
Output for MCMCTree Strict clock analysis; Run 2
out_clock_1_2.txt
Output for MCMCTree IR analysis; Run 1
Output for MCMCTree IR analysis; Run 1
out_clock2_1.txt
Output for MCMCTree IR analysis; Run 2
Output for MCMCTree IR analysis; Run 2
out_clock_2_2.txt
Output for MCMCTree AR analysis; Run 1
Output for MCMCTree AR analysis; Run 1
out_clock3_1.txt
Output for MCMCTree AR analysis; Run 2
Output for MCMCTree AR analysis; Run 2
out_clock3_2.txt
Figure_S2
Figure S2. Tracer file showing convergence of -lnL values for both runs of the Bayesian analysis using ExaBayes.
Figure_S3
Figure S3. Species tree of Dataset B generated by ASTRAL. All nodes have posterior probabilities of 1.0, except for those with values listed above the node.
Figure_S4_Tracer_3_AR
Figure S4. Tracer file showing convergence of -lnL values for both runs of the 3-partition analysis with autocorrelated rates using MCMCTree.
Figure_S5_Tracer_3_IR
Figure S5. Tracer file showing convergence of -lnL values for both runs of the 3-partition analysis with independent rates using MCMCTree.
Figure_S6_Tracer_6_AR
Figure S6. Tracer file showing convergence of -lnL values for both runs of the 6-partition analysis with autocorrelated rates using MCMCTree.
Figure_S8
Figure S8. Cetartiodactyl tree with the topology from Figure 3 with nodes labelled corresponding to the list of mean ages and 95% confidence intervals (CIs) for both the AR and IR models of the 6-partition scheme in Table S3.
Figure_S9
Figure S9. Timetree of Cetacea analyzed in the MCMCTree package of PAML 4.9h using 3 partitions and approximate likelihood (Yang, 2007). A time scale in Ma (millions of years) is shown above the tree, with geologic periods labelled below the tree for reference (Q=Quaternary). Above each node the posterior distributions of the AR model (purple) and IR model (white) are shown. Red circles at each node represent calibration
Supplemental_Figure_Captions
Cetacea_ExaBayes Input File
Input file for ExaBayes analyses.
Cetacea_ExaBayes.phy
Configuration file used in ExaBayes analyses
config.nex
Topologies for ExaBayes Run 1
ExaBayes_topologies.run-0.Cetacea_1
Parameters for ExaBayes Run 1
ExaBayes_parameters.run-0.Cetacea_1
Topologies for ExaBayes Run 2
ExaBayes_topologies.run-0.Cetacea_2
Parameters for ExaBayes Run 2
ExaBayes_parameters.run-0.Cetacea_2
Cetacea_partition_mcmctree_3
Alignment file for the 3-partition analyses in MCMCTree
Cetacea_partition_mcmctree_6
Alignment file for the 6-partition analyses in MCMCTree
Hessian matrix file for input in 3-partition MCMCTree analyses
in.BV1-3
Hessian matrix file for input in 6-partition MCMCTree analyses
in.BV1-6
Tree file for MCMCTree analyses
MCMCTREE.tre
Result file for 3-partition mcmctree AR Run 1
parts_3_mcmctree_AR_mcmc.txt
FigTree result for 3-partition mcmctree AR Run 1
FigTree_parts_3_mcmctree_AR_1.tre
Control file for 3-partition AR analyses MCMCTree
mcmctree_3p_AR.ctl
Result file for 3-partition mcmctree AR Run 2
parts_3_mcmctree_AR_2_mcmc.txt
FigTree result for 3-partition mcmctree AR Run 2
FigTreeparts_3_mcmctree_AR_2.tre
Result file for 3-partition mcmctree IR Run 1
parts_3_mcmctree_IR_mcmc.txt
FigTree result for 3-partition mcmctree IR Run 1
FigTree_parts_3_mcmctree_IR.tre
Control file for 3-partition IR analyses MCMCTree
mcmctree_3p_IR.ctl
Result file for 3-partition mcmctree IR Run 2
parts_3_mcmctree_IR_2_mcmc.txt
FigTree result for 3-partition mcmctree IR Run 2
FigTree_parts_3_mcmctree_IR_2.tre
Result file for 6-partition mcmctree AR Run 1
parts_6_mcmctree_AR_mcmc.txt
FigTree result for 6-partition mcmctree AR Run 1
FigTree_parts_6_mcmctree_AR.tre
Control file for 6-partition AR analyses MCMCTree
mcmctree_6p_AR.ctl
Result file for 6-partition mcmctree AR Run 2
parts_6_mcmctree_AR_mcmc.txt
FigTree result for 6-partition mcmctree AR Run 2
FigTree_parts_6_mcmctree_AR.tre
Result file for 6-partition mcmctree IR Run 1
parts_6_mcmctree_IR_mcmc.txt
FigTree result for 6-partition mcmctree IR Run 1
FigTree_parts_6_mcmctree_IR.tre
Control file for 6-partition IR analyses MCMCTree
mcmctree_6p_IR.ctl
Result file for 6-partition mcmctree IR Run 2
parts_6_mcmctree_IR_2_mcmc.txt
FigTree result for 6-partition mcmctree IR Run 2
FigTree_parts_6_mcmctree_IR_2.tre