
Phylogenetic signal and bias in paleontology

Citation

Asher, Robert; Smith, Martin (2021), Phylogenetic signal and bias in paleontology, Dryad, Dataset, https://doi.org/10.5061/dryad.w3r2280q3

Abstract

An unprecedented amount of evidence now illuminates the phylogeny of living mammals and birds on the Tree of Life. We use this tree to measure the phylogenetic value of data typically used in paleontology (bones and teeth) from six datasets derived from five published studies. We ask three interrelated questions: 1) Can these data adequately reconstruct known parts of the Tree of Life? 2) Is accuracy generally similar for studies using morphology, or do some morphological datasets perform better than others? 3) Does the loss of non-fossilizable data cause taxa to occur in misleadingly basal positions? Adding morphology to DNA datasets usually increases congruence of resulting topologies to the well corroborated tree, but this varies among morphological datasets. Extant taxa with a high proportion of missing morphological characters can greatly reduce phylogenetic resolution when analyzed together with fossils. Attempts to ameliorate this by deleting extant taxa missing morphology are prone to decreased accuracy due to long-branch artefacts. We find no evidence that fossilization causes extinct taxa to incorrectly appear at or near topologically basal branches. Morphology comprises the evidence held in common by living taxa and fossils, and phylogenetic analysis of fossils greatly benefits from inclusion of molecular and morphological data sampled for living taxa, whatever methods are used for phylogeny estimation.

Methods

These files include both the supplementary "data" and "materials" referred to in Asher & Smith, Systematic Biology, "Phylogenetic Signal & Bias in Paleontology":

https://doi.org/10.1093/sysbio/syab072

Usage Notes

Supplementary Figures:

Figure S1. Percent completeness of morphological data in fossil templates from each study. Horizontal lines represent medians, boxes the middle quartiles, and whiskers the range.
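Figure S1 summarises, per study, the share of morphological characters scored in each fossil template. A minimal Python sketch of that per-template percentage, assuming each template row is a list of state tokens and that both "?" and "-" count as unscored; the function name `percent_complete` and the template rows are hypothetical and do not come from the Dryad files:

```python
from statistics import median

# Assumption: "?" (missing) and "-" (inapplicable/gap) both count as unscored.
MISSING = {"?", "-"}


def percent_complete(states):
    """Percentage of characters in one template that carry a coded state."""
    coded = sum(1 for s in states if s not in MISSING)
    return 100 * coded / len(states)


# Hypothetical templates; a polymorphism such as "[01]" is one token.
templates = {
    "zzFossilA": ["0", "1", "?", "[01]"],
    "zzFossilB": list("01?1-0"),
    "zzFossilC": list("0110"),
}
scores = {name: percent_complete(row) for name, row in templates.items()}
print(scores)
print(median(scores.values()))  # median completeness across templates
```

Treating polymorphisms as coded is a choice made for this sketch; a study could reasonably count them differently.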

Figure S2. Congruence of artificial-extinction topologies with well-corroborated trees for each dataset, based only on templates that are at least 53% complete (corresponding to the least complete template from Asher). Each horizontal bar shows the number of shared splits using strict consensuses (top Y-axis) or quartet similarity averaged across all MPTs (bottom Y-axis), averaged across all extant subjects per fossil template. Datasets are ordered from largest (left) to smallest (right) difference in median shared splits obtained by real vs. 01-randomized character states (see Table 2). Boxes denote the median and interquartile range, whiskers the range; non-overlapping notches represent a significant difference in medians (Chambers et al. 1983). Characters missing in a fossil template were coded as missing in each extant subject. Remaining characters were coded with the extant subject's real states (blue), randomized using states drawn from different extant taxa (yellow, "noInfo"), or randomized with states 0 or 1 (red, "random01").
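The three codings compared in Figure S2 can be sketched as below. This is a minimal Python illustration, not the authors' R code (Table S3): it assumes single-character states, treats "?" or "-" in the fossil template as missing, and reads "noInfo" as drawing each state from a randomly chosen other extant taxon at the same character. The function `artex_codings` and all example strings are hypothetical.

```python
import random

MISSING = {"?", "-"}  # assumption: both codes mean "not scored in the fossil"


def artex_codings(subject, template, other_extant, rng):
    """Return (real, noInfo, random01) codings of an extant subject that has
    been artificially 'fossilized' with a template's missing-data pattern."""
    real, no_info, random01 = [], [], []
    for i, (state, fossil_state) in enumerate(zip(subject, template)):
        if fossil_state in MISSING:
            # Characters missing in the fossil become missing in the subject.
            for coding in (real, no_info, random01):
                coding.append("?")
            continue
        real.append(state)                           # keep the subject's true state
        no_info.append(rng.choice(other_extant)[i])  # state from another extant taxon
        random01.append(rng.choice("01"))            # random binary state
    return "".join(real), "".join(no_info), "".join(random01)


rng = random.Random(42)
real, no_info, rand01 = artex_codings("0120", "1?0?", ["1111", "2222"], rng)
print(real)  # 0?2?
print(no_info, rand01)
```

Passing an explicit `random.Random` instance keeps each randomized replicate reproducible from its seed.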

Figure S3. Majority Rule consensus (as shown by percentages adjacent to each node) of 11 topologies derived from the Pattinson dataset using equal and implied weighting values (k = 2, 4, 8, 16, 32, 64, 128, 256, 512, 999) that maximize quartet similarity with well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".
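Majority-rule consensus keeps every clade present in more than half of the input trees and annotates it with its frequency, as in the node percentages of Figures S3, S4 and S6. A minimal sketch, assuming rooted trees written as nested Python tuples of tip names and clades compared as leaf sets; `clades` and `majority_rule` are hypothetical helpers, not the software used for the figures:

```python
from collections import Counter


def clades(tree):
    """Leaf sets of all non-root internal nodes of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a tip
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    found.discard(walk(tree))  # the root clade is present in every tree
    return found


def majority_rule(trees, threshold=0.5):
    """Map each clade found in more than `threshold` of trees to its percentage."""
    counts = Counter(c for tree in trees for c in clades(tree))
    n = len(trees)
    return {c: 100 * k / n for c, k in counts.items() if k / n > threshold}


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
for clade, pct in majority_rule([t1, t2, t1]).items():
    print(sorted(clade), round(pct))
```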

Figure S4. Majority Rule consensus (as shown by percentages adjacent to each node) of 7 topologies derived from the Asher dataset using implied weighting concavity values (k = 8, 16, 32, 64, 128, 512, 999) that maximize quartet similarity with well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".

Figure S5. Single MPT derived from Halliday-All dataset using implied weighting value (k = 2, 11026.82414 steps) that maximizes quartet similarity with well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".
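In the implied weighting captions, k is the concavity constant. A minimal sketch, assuming Goloboff's formulation in which each character with h extra steps contributes h / (h + k) and the tree with the lowest total is preferred (assumed here to match how TNT scores trees under implied weighting); the extra-step counts are invented for illustration:

```python
def implied_weighting_score(extra_steps, k):
    """Sum of h / (h + k) over characters, where h is a character's extra
    steps on a given tree (assumed formulation). A character without
    homoplasy (h = 0) contributes nothing."""
    return sum(h / (h + k) for h in extra_steps)


# Hypothetical extra-step counts for five characters on one candidate tree.
homoplasy = [0, 1, 3, 0, 6]
for k in (2, 4, 8, 999):
    print(k, round(implied_weighting_score(homoplasy, k), 5))
```

Because h / (h + k) approaches h / k for large k, a very high concavity such as k = 999 ranks trees almost as equal weighting would, whereas small k (e.g. k = 2) strongly discounts highly homoplastic characters. Under this assumption scores are generally fractional, which is consistent with the non-integer values reported in the captions.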

Figure S6. Majority Rule consensus (as shown by numbers adjacent to each node) of 11 topologies derived from the Huttenlocker dataset using equal and implied weighting values (k = 2, 4, 8, 16, 32, 64, 128, 256, 512, 999) that maximize quartet similarity with well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".

Figure S7. Single MPT derived from Livezey-Zusi dataset using implied weighting value (k = 4, 65443.9752 steps) that maximizes quartet similarity with well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".

Figure S8. Single MPT derived from Halliday-50 dataset using implied weighting concavity value (k = 999, 46.99872 steps) that maximizes quartet similarity with well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".
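Figures S3 to S8 each select the topologies that maximize quartet similarity with the well-corroborated tree. A simplified Python sketch of one such measure, the fraction of all four-taxon subsets resolved identically in two trees, assuming both trees are nested tuples over the same tips. It is not necessarily the exact quartet metric used in the paper, which may treat unresolved quartets differently; all names are hypothetical.

```python
from itertools import combinations


def clusters(tree):
    """Leaf sets of every internal node, in post-order (root last)."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.append(below)
        return below

    walk(tree)
    return found


def resolve(quartet, tree_clusters):
    """Return {{a, b}, {c, d}} for the split ab|cd, or None if unresolved."""
    q = frozenset(quartet)
    for cluster in tree_clusters:
        pair = cluster & q
        if len(pair) == 2:  # a clade holding exactly two of the four tips
            return frozenset([pair, q - pair])
    return None


def quartet_similarity(tree1, tree2):
    """Fraction of all quartets that are resolved identically in both trees."""
    c1, c2 = clusters(tree1), clusters(tree2)
    tips = sorted(c1[-1])  # the root cluster holds every tip
    quartets = list(combinations(tips, 4))
    same = 0
    for q in quartets:
        r1 = resolve(q, c1)
        if r1 is not None and r1 == resolve(q, c2):
            same += 1
    return same / len(quartets)


reference = ((("A", "B"), "C"), ("D", "E"))
candidate = ((("A", "C"), "B"), ("D", "E"))
print(quartet_similarity(reference, candidate))  # 0.6
```

The number of quartets grows as n choose 4, so this brute-force loop suits small examples only.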

Supplementary Tables in "asherSmith_suppTablesS1-5.odt":

Table S1, Summary of taxon samples used to join the morphology matrices of Asher, Halliday, and Huttenlocker to the DNA alignment of Upham, and the morphology matrix of Livezey-Zusi to the DNA alignment of Prum.

Table S2, R script for writing TNT batch files for the artificial-extinction (ArtEx) analyses

Table S3, R script for writing randomized binary morphological character states (A) and random samples of DNA sites (B) and of taxa (C)
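Random subsamples of DNA sites and of taxa can be drawn without replacement as sketched below. This Python version assumes an aligned dictionary of equal-length sequences and only illustrates the idea; it is not the R script in Table S3, and the function names and toy alignment are hypothetical.

```python
import random


def sample_sites(alignment, n_sites, rng):
    """Keep the same n_sites randomly chosen columns for every taxon."""
    length = len(next(iter(alignment.values())))
    columns = sorted(rng.sample(range(length), n_sites))  # preserve site order
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


def sample_taxa(alignment, n_taxa, rng):
    """Keep complete sequences for n_taxa randomly chosen taxa."""
    keep = rng.sample(sorted(alignment), n_taxa)
    return {taxon: alignment[taxon] for taxon in keep}


# Hypothetical toy alignment (not from the Dryad files).
aln = {"A": "ACGTAC", "B": "ACGTTT", "C": "GGGTAC"}
rng = random.Random(1)
print(sample_sites(aln, 3, rng))
print(sample_taxa(aln, 2, rng))
```

Sampling column indices once and applying them to every sequence keeps the subsample aligned.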

Table S4, R script for calculating congruence
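Congruence measured as the number of shared splits (as in Figure S2) counts the bipartitions of the tip set that two trees have in common, ignoring where each tree is rooted. A minimal Python sketch, assuming identical tip sets and trees given as nested tuples; it is not the R script in Table S4, and the helper names are hypothetical:

```python
def tips(tree):
    """All tip labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tips(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of the tree read as unrooted. Each split is
    stored as the side that excludes a fixed reference tip."""
    all_tips = tips(tree)
    ref = min(all_tips)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = all_tips - below if ref in below else below
        if 2 <= len(side) <= len(all_tips) - 2:  # skip trivial splits
            found.add(side)
        return below

    walk(tree)
    return found


def shared_splits(tree1, tree2):
    """Number of non-trivial splits present in both trees."""
    return len(splits(tree1) & splits(tree2))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t1_rerooted = ("A", ("B", ("C", ("D", "E"))))
print(shared_splits(t1, t2))
print(shared_splits(t1, t1_rerooted))
```

Storing each split as the side that excludes a fixed reference tip makes the count independent of the root, so `t1` and its rerooted copy share both of their non-trivial splits.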

Table S5, R script for calculating root-to-node distances
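Root-to-node distances (Fig. 11) measure how close a taxon sits to the root, which is how one can ask whether fossils drift toward basal positions. A minimal sketch, assuming distance is counted as the number of edges from the root to a tip in an unweighted rooted topology; the normalisation by the deepest tip is a hypothetical addition for comparing trees of different depth, and the paper's own definition in Table S5 may differ:

```python
def root_to_tip_edges(tree, depth=0, out=None):
    """Number of edges between the root and each tip of a nested-tuple tree
    (equal to the number of internal nodes passed, counting the root)."""
    if out is None:
        out = {}
    if isinstance(tree, str):
        out[tree] = depth
    else:
        for child in tree:
            root_to_tip_edges(child, depth + 1, out)
    return out


# Hypothetical topology; "zz" marks a fossil, following the figure captions.
tree = ((("zzFossil", "B"), "C"), ("D", "E"))
dist = root_to_tip_edges(tree)
deepest = max(dist.values())
relative = {tip: d / deepest for tip, d in dist.items()}  # 1.0 = deepest tip
print(dist)
print(relative)
```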

Appendices:

Appendix S1, Methods for matrix assembly & phylogenetic search strategies

Appendix S2, TNT matrices

Files (filename: dataset; morphology source; DNA source):

birdNC.tnt: birds; Livezey & Zusi (2006); Prum et al. (2015)

pattinsonS1comb.tnt: primates; Pattinson et al. (2015) and Seiffert et al. (2009); Springer et al. (2012)

upham-asher.tnt: mammals (Asher); Asher (2007); Upham et al. (2019)

upham-hallid.tnt: mammals (Halliday-All); Halliday et al. (2019); Upham et al. (2019)

upham-hallid50.tnt: mammals (Halliday-50); Halliday et al. (2019); Upham et al. (2019)

upham-hutt.tnt: mammals (Huttenlocker); Huttenlocker et al. (2018); Upham et al. (2019)

Appendix S3, Newick topologies representing well-corroborated trees (Fig. 1), random samples of DNA sites (Fig. 7) and taxa known for DNA (Fig. 8), bifurcating full-data topologies of extant taxa used to calculate root-to-node distances (Fig. 11), and optimal topologies derived from individual datasets and the literature (Table 3; Figs. S3-S8).
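The topologies in Appendices S3 to S5 are distributed as Newick strings. A minimal Python reader that turns one Newick tree into nested tuples of tip labels is sketched below. It is an illustrative assumption, not a full parser: quoted labels and comments are not handled, so established readers such as Biopython's `Bio.Phylo` or R's `ape::read.tree` are safer for the real files. The function name `parse_newick` is hypothetical.

```python
def parse_newick(text):
    """Parse one Newick tree into nested tuples of tip labels.

    Branch lengths (":0.1") and internal-node labels (e.g. support values)
    are discarded; quoted labels are not supported."""
    text = text.strip()
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),;":
            pos += 1
        return text[start:pos].split(":")[0].strip()  # drop branch length

    def read_node():
        nonlocal pos
        while text[pos].isspace():
            pos += 1
        if text[pos] != "(":
            return read_label()  # a tip
        pos += 1  # consume "("
        children = [read_node()]
        while text[pos] == ",":
            pos += 1
            children.append(read_node())
        if text[pos] != ")":
            raise ValueError(f"expected ')' at position {pos}")
        pos += 1
        read_label()  # discard any internal label or branch length
        return tuple(children)

    return read_node()


# Several trees in one string, separated by ";".
example = "((A,B),(C:0.1,D)90:0.2);\n(zzX,(B,C));"
trees = [parse_newick(chunk) for chunk in example.split(";") if chunk.strip()]
print(trees)
```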

Appendix S4, All MPTs resulting from ArtEx analyses (newick)

Appendix S5, Strict consensuses of MPTs derived from each ArtEx subject-template combination (newick)
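A strict consensus keeps only the clades present in every most-parsimonious tree. A minimal Python sketch, assuming rooted trees written as nested tuples; `clades`, `strict_consensus` and `to_newick` are hypothetical helpers, not the software used to build Appendix S5:

```python
def clades(tree):
    """Leaf sets of all non-root internal nodes of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    found.discard(walk(tree))  # the root clade is trivially shared
    return found


def strict_consensus(trees):
    """Clades found in every input tree."""
    return set.intersection(*(clades(t) for t in trees))


def to_newick(all_tips, kept):
    """Write a set of mutually compatible clades as a Newick string."""
    def build(members):
        inner = [c for c in kept if c < members]
        maximal = [c for c in inner if not any(c < d for d in inner)]
        covered = frozenset().union(*maximal)
        parts = [build(c) for c in sorted(maximal, key=sorted)]
        parts += sorted(members - covered)  # tips left unresolved here
        return "(" + ",".join(parts) + ")"

    return build(frozenset(all_tips)) + ";"


mpt1 = ((("A", "B"), "C"), ("D", "E"))
mpt2 = ((("A", "C"), "B"), ("D", "E"))
consensus = strict_consensus([mpt1, mpt2])
print(to_newick("ABCDE", consensus))
```

Clades that conflict among MPTs, here (A,B) versus (A,C), collapse into a polytomy.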