Phylogenetic signal and bias in paleontology
Data files
Sep 14, 2021 version files 81.35 MB
- appendS4_artex-allMPTs.csv (55.93 MB)
- appendS5_artex-strictCons.csv (25.42 MB)
Abstract
An unprecedented amount of evidence now illuminates the phylogeny of living mammals and birds on the Tree of Life. We use this tree to measure phylogenetic value of data typically used in paleontology (bones and teeth) from six datasets derived from five published studies. We ask three interrelated questions: 1) Can these data adequately reconstruct known parts of the Tree of Life? 2) Is accuracy generally similar for studies using morphology, or do some morphological datasets perform better than others? 3) Does the loss of non-fossilizable data cause taxa to occur in misleadingly basal positions? Adding morphology to DNA datasets usually increases congruence of resulting topologies to the well corroborated tree, but this varies among morphological datasets. Extant taxa with a high proportion of missing morphological characters can greatly reduce phylogenetic resolution when analyzed together with fossils. Attempts to ameliorate this by deleting extant taxa missing morphology are prone to decreased accuracy due to long-branch artefacts. We find no evidence that fossilization causes extinct taxa to incorrectly appear at or near topologically basal branches. Morphology comprises the evidence held in common by living taxa and fossils, and phylogenetic analysis of fossils greatly benefits from inclusion of molecular and morphological data sampled for living taxa, whatever methods are used for phylogeny estimation.
Methods
These files include both the supplementary "data" and "materials" referred to in Asher & Smith, Systematic Biology, "Phylogenetic Signal & Bias in Paleontology":
https://doi.org/10.1093/sysbio/syab072
Usage notes
Supplementary Figures:
Figure S1. Percent completeness of morphological data in fossil templates from each study. Horizontal lines represent median, boxes middle quartiles, and whiskers range.
Figure S2. Congruence of artificial-extinction topologies with well-corroborated trees for each dataset, based only on templates that are at least 53% complete (corresponding to the least complete template from Asher). Each horizontal bar shows the number of shared splits using strict consensuses (top Y-axis) or quartet similarity averaged across all MPTs (bottom Y-axis), averaged across all extant subjects per fossil template. Datasets are ordered from largest (left) to smallest (right) difference in median shared splits obtained by real vs. 01-randomized character states (see Table 2). Boxes denote median and interquartile range, whiskers the range; non-overlapping notches represent a significant difference in medians (Chambers et al. 1983). Characters missing in a fossil template were coded as missing in each extant subject. Remaining characters were coded with the real states in each template (blue), randomized using states drawn from different extant taxa (yellow, "noInfo"), or randomized with states 0 or 1 (red, "random01").
Figure S3. Majority Rule consensus (as shown by percentages adjacent to each node) of 11 topologies derived from the Pattinson dataset using equal and implied weighting values (k = 2, 4, 8, 16, 32, 64, 128, 256, 512, 999) that maximize quartet similarity with well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".
Figure S4. Majority Rule consensus (as shown by percentages adjacent to each node) of 7 topologies derived from the Asher dataset using implied weighting concavity values (k = 8, 16, 32, 64, 128, 512, 999) that maximize quartet similarity with well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".
Figure S5. Single MPT derived from Halliday-All dataset using implied weighting value (k = 2, 11026.82414 steps) that maximizes quartet similarity with well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".
Figure S6. Majority Rule consensus (as shown by numbers adjacent to each node) of 11 topologies derived from the Huttenlocker dataset using equal and implied weighting values (k = 2, 4, 8, 16, 32, 64, 128, 256, 512, 999) that maximize quartet similarity with well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".
Figure S7. Single MPT derived from Livezey-Zusi dataset using implied weighting value (k = 4, 65443.9752 steps) that maximizes quartet similarity with well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".
Figure S8. Single MPT derived from Halliday-50 dataset using implied weighting concavity value (k = 999, 46.99872 steps) that maximizes quartet similarity with well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".
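The character-coding scheme described in the caption of Figure S2 can be illustrated with a short sketch. This is not the authors' R script (Table S2); it is a minimal Python illustration under stated assumptions: each taxon is a string of single-character states, "?" marks missing data, the "real" states are taken to be the extant subject's own observed states, and the function name `artificial_extinction` and its parameters are hypothetical.

```python
import random

MISSING = "?"  # assumed symbol for a missing character state


def artificial_extinction(subject, template, mode="real", donors=None, rng=None):
    """Give an extant subject the missing-data pattern of a fossil template.

    subject, template: equal-length strings of single-character states.
    mode: "real"     keeps the subject's own states (assumption, see lead-in),
          "random01" replaces each retained state with a random 0 or 1,
          "noInfo"   replaces each retained state with the state of a
                     randomly chosen donor taxon at the same character.
    donors: list of state strings from other extant taxa (for "noInfo").
    """
    rng = rng or random.Random()
    if len(subject) != len(template):
        raise ValueError("subject and template must have the same length")
    out = []
    for i, (s, t) in enumerate(zip(subject, template)):
        if t == MISSING:
            # Character unknown in the fossil: also unknown in the subject.
            out.append(MISSING)
        elif mode == "real":
            out.append(s)
        elif mode == "random01":
            out.append(rng.choice("01"))
        elif mode == "noInfo":
            if not donors:
                raise ValueError("mode 'noInfo' needs at least one donor")
            out.append(rng.choice(donors)[i])
        else:
            raise ValueError("unknown mode: " + mode)
    return "".join(out)


if __name__ == "__main__":
    # Characters 2 and 3 are missing in the template, so they are masked.
    print(artificial_extinction("1021", "0??1"))
    print(artificial_extinction("1021", "0??1", mode="random01",
                                rng=random.Random(1)))
```

Polymorphic or multi-character state codings are not handled here; a real matrix would need a parser for the TNT format.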
Supplementary Tables in "asherSmith_suppTablesS1-5.odt":
Table S1, Summary of taxon samples used to join morphology matrices of Asher, Halliday, Huttenlocker to DNA alignment of Upham, and morphology matrix of Livezey-Zusi to DNA alignment of Prum.
Table S2, R script for writing ArtEx TNT batch files
Table S3, R script for writing randomized binary morphological character states (A) and random samples of DNA sites (B) and taxa (C)
Table S4, R script for calculating congruence
Table S5, R script for calculating root-to-node distances
Appendices:
Appendix S1, Methods for matrix assembly & phylogenetic search strategies
Appendix S2, TNT matrices
filename | dataset | morphology source | DNA source
birdNC.tnt | birds | Livezey & Zusi (2006) | Prum et al. (2015)
pattinsonS1comb.tnt | primates | Pattinson et al. (2015), Seiffert et al. (2009) | Springer et al. (2012)
upham-asher.tnt | mammals Asher | Asher (2007) | Upham et al. (2019)
upham-hallid.tnt | mammals Halliday-all | Halliday et al. (2019) | Upham et al. (2019)
upham-hallid50.tnt | mammals Halliday-50 | Halliday et al. (2019) | Upham et al. (2019)
upham-hutt.tnt | mammals Huttenlocker | Huttenlocker et al. (2018) | Upham et al. (2019)
Appendix S3, Newick topologies representing well-corroborated trees (Fig. 1), random samples of DNA sites (Fig. 7) and taxa known for DNA (Fig. 8), bifurcating full-data topologies of extant taxa used to calculate root-to-node distances (Fig. 11), and optimal topologies derived from individual datasets and the literature (Table 3; Figs. S3-S8).
Appendix S4, All MPTs resulting from ArtEx analyses (newick)
Appendix S5, Strict consensuses of MPTs derived from each ArtEx subject-template combination (newick)
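Because Appendices S3 to S5 store topologies as Newick strings, the "shared splits" measure mentioned for Figure S2 can be sketched directly from such strings. The following is a minimal Python illustration, not the authors' R script (Table S4). It assumes both trees contain the same taxa, that taxon names are unquoted and free of parentheses, commas, colons and semicolons, and it treats the trees as unrooted. The function names `leaf_sets`, `splits` and `shared_splits` are hypothetical.

```python
import re


def leaf_sets(newick):
    """Return (all tip names, list of tip sets, one per internal node)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    stack, groups, leaves = [], [], set()
    prev = None
    for tok in tokens:
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            done = stack.pop()
            groups.append(frozenset(done))
            if stack:
                stack[-1] |= done  # tips also belong to the enclosing node
        elif tok not in ",;" and prev != ")":
            # A label directly after ")" is node support or a branch
            # length, not a tip, so only other labels are read as tips.
            name = tok.split(":")[0].strip()
            if name and stack:
                leaves.add(name)
                stack[-1].add(name)
        prev = tok
    return frozenset(leaves), groups


def splits(newick):
    """Non-trivial bipartitions, each stored as the side lacking a reference tip."""
    leaves, groups = leaf_sets(newick)
    ref = min(leaves)  # arbitrary but consistent reference taxon
    found = set()
    for g in groups:
        side = leaves - g if ref in g else g
        if 1 < len(side) < len(leaves) - 1:
            found.add(side)
    return found


def shared_splits(tree_a, tree_b):
    """Count bipartitions present in both trees."""
    if leaf_sets(tree_a)[0] != leaf_sets(tree_b)[0]:
        raise ValueError("trees must contain the same taxa")
    return len(splits(tree_a) & splits(tree_b))


if __name__ == "__main__":
    reference = "((A,B),(C,D),E);"
    estimate = "((A,B),(C,E),D);"
    print(shared_splits(reference, estimate))
```

Storing each split as the side that excludes a fixed reference taxon means the two children of a rooted bifurcating root collapse to the same split, so rooting does not inflate the count. Quartet similarity, the other measure named in the captions, is not covered by this sketch.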