Data from: Multiple morphological clocks and total-evidence tip-dating in mammals
Lee, Michael, South Australian Museum
Lee, Michael S. Y., South Australian Museum
Published Sep 13, 2016 on Dryad.
https://doi.org/10.5061/dryad.3h4m5
Cite this dataset
Lee, Michael; Lee, Michael S. Y. (2016). Data from: Multiple morphological clocks and total-evidence tip-dating in mammals [Dataset]. Dryad. https://doi.org/10.5061/dryad.3h4m5
Abstract
Morphological integration predicts that correlated characters will coevolve; thus, each distinct suite of correlated characters might be expected to evolve according to a separate clock or ‘pacemaker’. Characters in a large morphological dataset for mammals were found to be evolving according to seven separate clocks, each distinct from the molecular clock. Total-evidence tip-dating using these multiple clocks inflated divergence time estimates, but potentially improved topological inference. In particular, single-clock analyses placed several meridiungulates and condylarths in a heterodox position as stem placentals, but multi-clock analyses retrieved a more plausible and orthodox position within crown placentals. Several shortcomings (including uneven character sampling) currently impact upon the accuracy of total-evidence dating, but this study suggests that when sufficiently large and appropriately constructed phenotypic datasets become more commonplace, multi-clock approaches are feasible and can affect both divergence dates and phylogenetic relationships.
Usage notes
Description of All Data Files on Dryad
Description of All Data Files on Dryad (docx); this is also available on the Biology Letters website. All numbered refs, such as [6], in the descriptions below refer to citations in the main Biology Letters paper or in this document.
Description.docx
A1_CharacterPartitions
File_A1 (Excel). Table describing the 28 partitions (25 morphological, 3 molecular) and listing the included characters. Character numbering is based on the matrix from [6], available as MorphoBank Project 773 (http://dx.doi.org/10.7934/P773).
B1_TreeRef6_MrBayesFile_Morph1-25
File_B1 (plain text). The MrBayes [10] executable file to infer branch lengths for the 25 candidate morphological partitions, using the total-evidence Maximum Likelihood topology from [6]. The matrix consists of only the 46 extant taxa from [6], since molecular branch lengths (to which these morphological branch lengths have to be compared) cannot be ascertained for extinct taxa.
B2_TreeRef6_MrBayesFile_Codons1-3
File_B2 (plain text). The MrBayes executable file to infer branch lengths for the 3 candidate molecular partitions (and overall morphological branch lengths), using the total-evidence Maximum Likelihood topology from [6]. The matrix consists of the 46 extant taxa from [6].
B3_TreeRef6_Parts1-28branchlengths_newick
File_B3 (newick format). The ClockstaR [3] treefile containing the trees with the branch lengths for the 28 candidate partitions obtained from the MrBayes analysis in files B1, B2. Note: ClockstaR treats all trees as unrooted, so different rootings of trees are of no consequence.
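As a quick sanity check before passing a treefile such as B3 to ClockstaR, the trees can be counted and their tip sets compared. The following is a minimal sketch only: it assumes one semicolon-terminated Newick string per partition and taxon labels free of parentheses, commas, colons, and bracketed comments. The toy trees and taxon names are hypothetical, not taken from the dataset.

```python
import re

def split_newick(text):
    """Return the individual Newick strings found in a multi-tree file."""
    return [chunk.strip() + ";" for chunk in text.split(";") if chunk.strip()]

def tip_labels(newick):
    """Extract tip labels, i.e. names that directly follow '(' or ','."""
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)

# Toy stand-in for the file contents (hypothetical taxa and branch lengths).
example = "((A:0.1,B:0.2):0.05,C:0.3);\n((A:0.4,B:0.1):0.02,C:0.2);\n"

trees = split_newick(example)
labels = [sorted(tip_labels(tree)) for tree in trees]

# Every partition tree should contain the same set of taxa.
same_taxa = all(tip_set == labels[0] for tip_set in labels)
print(len(trees), labels[0], same_taxa)
```

For the real files, the expected counts would be 28 trees of 46 taxa each, following the descriptions above.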
B4_TreeRef9_MrBayesFile_Morph1-25
File_B4 (plain text). The MrBayes [10] executable file to infer branch lengths for the 25 candidate morphological partitions, using the topology from [9]. The matrix consists of only the 46 extant taxa from [6], since molecular branch lengths (to which these morphological branch lengths have to be compared) cannot be ascertained for extinct taxa.
B5_TreeRef9_MrBayesFile_Codons1-3
File_B5 (plain text). The MrBayes executable file to infer branch lengths for the 3 candidate molecular partitions (and overall morphological branch lengths), using the topology from [9]. The matrix consists of the 46 extant taxa from [6].
B6_TreeRef9_Parts1-28branchlengths_newick
File_B6 (newick format). The ClockstaR treefile containing the trees with the branch lengths for the 28 candidate partitions obtained from the MrBayes analysis in files B4, B5. Note: ClockstaR treats all trees as unrooted, so different rootings of trees are of no consequence.
B7_ClockstarResults
File_B7 (pdf). ClockstaR partition matrices and gap statistics for analyses using the guide tree from ref [6] (upper panels) and ref [9] (lower panels). From the 28 candidate partitions, both analyses identified an optimal number of 8 clock-partitions (pacemakers) of similar composition. Note that the order of partitions in these tables (molecular partitions uppermost) is different to that in Fig. 1, but the same as in Fig. B10.
B8_Randomised_TOLMorph25p_TOL_Gam
File_B8 (plain text). The MrBayes [10] executable file to infer branch lengths for the 25 randomised morphological partitions. The partitions were of the same size as the 25 original partitions (42-553 characters); morphological characters were randomly shuffled in Excel. The molecular data were not randomised. The matrix otherwise is identical to that in File B1.
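The randomisation described above was done in Excel. An equivalent procedure can be sketched in Python, under the assumption that it amounts to shuffling the character numbers and cutting the shuffled list into blocks matching the original partition sizes. The function name, the seed, and the toy sizes below are hypothetical; the real sizes range from 42 to 553 characters and are listed in File A1.

```python
import random

def shuffle_into_partitions(char_ids, sizes, seed=0):
    """Randomly reassign characters to partitions of the given sizes."""
    if sum(sizes) != len(char_ids):
        raise ValueError("partition sizes must sum to the number of characters")
    rng = random.Random(seed)  # fixed seed so the shuffle is reproducible
    shuffled = list(char_ids)
    rng.shuffle(shuffled)
    partitions, start = [], 0
    for size in sizes:
        # Sort within each partition only to make the output easier to read.
        partitions.append(sorted(shuffled[start:start + size]))
        start += size
    return partitions

# Toy example: 10 characters split into partitions of 5, 3 and 2.
sizes = [5, 3, 2]
chars = list(range(1, 11))
parts = shuffle_into_partitions(chars, sizes)
print(parts)
```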
B9_Randomised_TreeRef6_Parts1-28branchlengths_newick
File_B9 (newick format). The ClockstaR [3] treefile containing the trees with the branch lengths for the 25 randomised morphological partitions, and the original (not randomised) 3 molecular partitions obtained from the MrBayes analysis as per file B2. Note: ClockstaR treats all trees as unrooted, so different rootings of trees are of no consequence.
B10_Randomised_clockstar_Results
File_B10 (pdf). ClockstaR partition matrices and gap statistics for analysis using randomised morphological partitions, using the guide tree from ref [6]. From the 28 candidate partitions, the analysis preferred very many, or very few, morphological partitions, unlike the corresponding analysis of the original data (File B7, top) which preferred an intermediate number. Note that the order of partitions in these tables (molecular partitions uppermost) is different to that in Fig. 1, but the same as in Fig. B7.
B11_ClockstaR_script
File_B11 (plain text). ClockstaR R script (and output) for the above analysis. Note filenames and paths need to be changed as appropriate.
C1_PartitionFinder
File_C1 (zipped, plain text). Matrix with molecular data for 46 extant taxa (extracted from [6]), in Phylip format (mammals.phy); PartitionFinder [12] command file (cfg) with 71 candidate partitions (by genes and by codons, with noncoding genes treated as single candidate partitions); best scheme with 7 partitions requiring separate substitution models, found by PartitionFinder using the Bayesian Information Criterion with unlinked branch lengths.
C2_1clock - MrBayes file
File_C2 (plain text). MrBayes executable file for a total-evidence dating analysis of all 86 taxa in the matrix in [6], using a single relaxed (independent gamma rates) clock for all traits (morphological and molecular). The sampled ancestor birth-death tree prior [17], and the Markov model of morphological evolution [18,19], were used. Optimal substitution models and substitution-model-partitions were found with PartitionFinder for molecular data (see C1) and with stepping-stone analysis in MrBayes for morphological data as implemented in [10]. Numerous (>20) MCMC runs were initially performed for 5 million generations to investigate tuning, mixing and convergence. The final analysis was then performed with 4 runs of 20 million generations, with the first 30% of samples discarded as burnin.
C2_1clock.mrb
C3_1clock_PostburninTrees&Params_folder
File_C3 (zipped, plain text). The full MCMC tree and parameter output files from the MrBayes analysis in file C2. Only post-burnin samples are included to reduce file sizes. To generate consensus trees and statistics from these files, run the MrBayes file in C2 after disabling the MCMC command and setting burnin to 0.
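Disabling the MCMC command can be done by hand in a text editor. As a hedged illustration, the sketch below wraps any `mcmc` command in NEXUS square-bracket comments while leaving `mcmcp` (which only sets parameters) untouched. It assumes the command starts on its own line, ends with a semicolon, and contains no bracketed comments of its own; the burnin setting still has to be edited separately. The example text is a toy block, not the contents of the C2 file.

```python
import re

# Match an 'mcmc' command (but not 'mcmcp') from line start to its semicolon.
MCMC_COMMAND = re.compile(r"^([ \t]*)(mcmc\b[^;]*;)", re.IGNORECASE | re.MULTILINE)

def disable_mcmc(nexus_text):
    """Comment out mcmc commands using NEXUS [ ... ] comment brackets."""
    return MCMC_COMMAND.sub(r"\1[\2]", nexus_text)

# Toy MrBayes block (hypothetical settings).
example = "begin mrbayes;\n  mcmcp ngen=1000;\n  mcmc;\n  sumt burnin=0;\nend;\n"
disabled = disable_mcmc(example)
print(disabled)
```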
C4_1clock_con_fig
File_C4 (approximate nexus format). The majority-rule consensus tree from the MrBayes analysis in files C2-3. Note: the wrong file was previously uploaded onto Dryad; I thank Joseph Brown for pointing this out.
C5_1clock_topologyConvergence
File_C5 (pdf). Convergence diagnostics for tree topologies sampled in C2. AWTY [20] plots demonstrating similar posterior probabilities for all clades (A) across 4 runs, and (B) at different stages in a single run. The topology convergence statistics from MrBayes and AWTY (standard deviation of split frequencies across runs) were also good, i.e. low, being <0.031 across all comparisons (see top right cells of panel A). These patterns are consistent with good convergence. The kink in one of the fitted lines appears to be an artefact of a glitch in AWTY.
C6_1clock_paramConvergence
File_C6 (plain text). Convergence diagnostics for numerical parameters from MrBayes [10] for the analysis in Files C2-4. PSRFs (ratio of within-run to between-run variance) are approximately 1 for all parameters, consistent with the view that the MCMC runs are sampling from the same posterior, and thus with good convergence [10].
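For readers unfamiliar with the PSRF, the sketch below computes it in the standard Gelman-Rubin form, which compares a pooled variance estimate with the mean within-run variance. This is an illustration under that assumption and is not guaranteed to match the exact calculation in MrBayes; the function name and the toy samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def psrf(chains):
    """Potential scale reduction factor for one parameter across runs."""
    n = len(chains[0])
    if any(len(chain) != n for chain in chains):
        raise ValueError("all runs must contain the same number of samples")
    # W: mean of the within-run sample variances.
    within = mean([variance(chain) for chain in chains])
    # B: between-run variance, scaled by the number of samples per run.
    between = n * variance([mean(chain) for chain in chains])
    pooled = (n - 1) / n * within + between / n
    return sqrt(pooled / within)

# Two toy runs sampling the same values (agreeing) versus shifted runs.
agreeing = psrf([[1, 2, 3, 4], [2, 3, 4, 1]])
shifted = psrf([[1, 2, 3, 4], [11, 12, 13, 14]])
print(round(agreeing, 3), round(shifted, 3))
```

With long runs that agree, the value tends towards 1; runs that sample different regions give values well above 1.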
C7_8clocks MrBayes file
File_C7 (plain text). MrBayes executable file for a total-evidence dating analysis of all 86 taxa in the matrix in [6], using 8 separate relaxed clocks (independent gamma rates; 7 for morphology and 1 for molecular data), as found in analysis B-3. The sampled ancestor birth-death tree prior [17], and the Markov model of morphological evolution [18,19], were used. Optimal substitution models and substitution-model-partitions were found with PartitionFinder for molecular data (see C1) and with stepping-stone analysis in MrBayes for morphological data [10]. Numerous (>20) MCMC runs were initially performed for 20 million generations to investigate tuning, mixing and convergence. The final analysis was then performed with 4 runs, and the trees from 10 million post-burnin generations were retained.
Note: The burnin for each run varies considerably, from 20 million to 50 million generations, due to variation in the time taken to reach (apparent) stationarity. For computational efficiency, the 4 runs were performed separately and in all cases the last 10 million steps (after stationarity) were retained. However, the step numbers have been readjusted in file C8 below so that they are identical across runs (i.e. to a common burnin of 50 million), to facilitate downstream analyses, e.g. generating summary statistics in MrBayes.
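The step-number readjustment described in the note can be illustrated as follows. This is a sketch only, assuming tab-separated parameter rows whose first column is the generation number; it is not the procedure actually used to prepare file C8, and tree files would need their own handling. The function name and the toy rows are hypothetical.

```python
def renumber_generations(rows, new_start):
    """Shift the generation column so the first retained sample is new_start."""
    first_gen = int(rows[0].split("\t", 1)[0])
    offset = new_start - first_gen
    shifted = []
    for row in rows:
        gen, rest = row.split("\t", 1)
        shifted.append(f"{int(gen) + offset}\t{rest}")
    return shifted

# Toy rows from a run that reached stationarity at 20 million generations.
run_rows = ["20000000\t-1234.5\t0.81", "20005000\t-1233.9\t0.79"]
aligned = renumber_generations(run_rows, new_start=50_000_000)
print(aligned)
```

Applying the same target start to every run gives identical step numbers across runs while preserving the spacing between samples.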
C7_8clocks.mrb
C8_8clocks_PostburninTrees&Params_folder
File_C8 (zipped, plain text). The full MCMC tree and parameter output files from the MrBayes analysis in file C7. Only post-burnin samples for each of the 4 runs (the last 10 million generations; see the note under File_C7) are included to reduce file sizes. To generate consensus trees and statistics, run the MrBayes file in C7 after disabling the MCMC command and setting burnin to 0.
C9_8clocks_con_fig
File_C9 (approximate nexus format). The majority-rule consensus tree from the MrBayes analysis in files C7-8.
C10_8clocks_topolConvergence
File_C10 (pdf). Convergence diagnostics for tree topologies sampled in C8. AWTY [20] plots demonstrating (A) relatively good correlation (albeit with substantial variance) for all clades across 4 runs, and (B) high variation at different stages in a single run, but no obvious directional trends. The topology convergence statistics from MrBayes and AWTY (standard deviation of split frequencies across runs) were also relatively good, i.e. <0.093 across all comparisons (see top right cells of panel A). These results are consistent with convergence or near-convergence, i.e. runs sampling similar distributions but cycling very slowly through parameter space.
C11_8clocks_param_summary
File_C11 (plain text). Convergence diagnostics for numerical parameters from MrBayes [10] for the analysis in Files C7-9. PSRFs (ratio of within-run to between-run variance) are very close to 1 for most numerical parameters, but approach ~1.7 for a single parameter (due to variance in 1 run). This single outlier is slightly higher than desirable. These diagnostics do not demonstrate convergence, though they are consistent with convergence being approached.
C12_NodeAges
File_C12 (Word docx). Comparison of divergence dates obtained from the single-clock and multi-clock analyses, and two previous studies [6,9]. Numerical dates were not published in [9] so dates were retrieved from a detailed time-tree (Figure S1) and are thus shown as estimated (e.g. ~56).