Inferring the mammal tree: Species-level sets of phylogenies for questions in ecology, evolution, and conservation

Citation

Upham, Nathan S.; Esselstyn, Jacob A.; Jetz, Walter (2019), Inferring the mammal tree: Species-level sets of phylogenies for questions in ecology, evolution, and conservation, v4, Dryad, Dataset, https://doi.org/10.5061/dryad.tb03d03

Abstract

Big, time-scaled phylogenies are fundamental to connecting evolutionary processes to modern biodiversity patterns. Yet inferring reliable phylogenetic trees for thousands of species involves numerous trade-offs that have limited their utility to comparative biologists. To establish a robust evolutionary timescale for all ~6000 living species of mammals, we developed credible sets of trees that capture root-to-tip uncertainty in topology and divergence times. Our ‘backbone-and-patch’ approach to tree-building applies a newly assembled 31-gene supermatrix to two levels of Bayesian inference: (i) backbone relationships and ages among major lineages, using fossil node- or tip-dating; and (ii) species-level ‘patch’ phylogenies with non-overlapping in-groups that each correspond to one representative lineage in the backbone. Species unsampled for DNA are either excluded (‘DNA-only’ trees) or imputed within taxonomic constraints using branch lengths drawn from local birth-death models (‘completed’ trees). Joining time-scaled patches to backbones results in species-level trees of extant Mammalia with all branches estimated under the same modeling framework, thereby facilitating rate comparisons among lineages as disparate as marsupials and placentals. We compare our phylogenetic trees to previous estimates of mammal-wide phylogeny and divergence times, finding that (i) node ages are broadly concordant among studies, and (ii) recent (tip-level) rates of speciation are estimated more accurately in our study than in previous ‘supertree’ approaches where unresolved nodes led to branch length artifacts. Credible sets of mammalian phylogenetic history are now available for download at http://vertlife.org/phylosubsets, enabling investigations of long-standing questions in comparative biology.

Usage Notes

S1 Data. Details of the DNA cleaning steps and updated master taxonomy of mammals used in this study. Three multi-tab Excel files, including the per-gene sampling in the final supermatrix, initial gene lengths, NCBI accession numbers, and the authors of each sequence. (ZIP)

S2 Data. Per-gene DNA alignments for 31 genes and gene tree outputs from RAxML. Includes PDF plots of each gene tree along with the newick tree file and the DNA alignments in phylip format. (ZIP)

S3 Data. Global ML tree for 4098 species of mammals built from the 31-gene supermatrix. Includes the full 31-gene supermatrix alignment, the taxonomy file, the newick tree file and a PDF plot of the global RAxML tree, and R code for dividing the tree into the patch clade segments that scaffold the subsequent Bayesian analyses. (ZIP)

S4 Data. Results of 28 patch clade phylogenies delimited across Mammalia. Maximum clade credibility (MCC) trees for each of the patch clade runs, as both nexus files and PDF plots, and details of the species and gene sampling in each patch clade. (ZIP)

S5 Data. Results and run files for backbone divergence-time analyses in MrBayes. Maximum clade credibility (MCC) trees for each of the three backbone dating analyses in nexus format, as well as the run files and details of taxon sampling and node constraints. (ZIP)

**Additional files** [not referenced in the main text due to large file sizes]:

S6 Data. Patch clade run files for MrBayes with constraints constructed using PASTIS. Nexus files ready for execution, both for the DNA-only patches (‘freeOut’) and the completed patches (‘cons’), as well as the other input files used by the PASTIS R package for forming the topological constraints to place DNA-missing species within given taxa. (ZIP)

S7 Data. Credible tree sets of Mammalia species-level evolutionary history. Four credible sets of 10,000 trees each are provided, one for each combination of data set (DNA-only or completed) and backbone (node-dated or tip-dated). Also provided are the maximum clade credibility (MCC) trees for the DNA-only data sets in PDF and newick formats, and the species-level tip DR calculations summarized across 10,000 completed trees (note that MCC trees are only meaningful for the DNA-only data sets, and tip DR calculations only for the completed data sets). (ZIP)

S8 Data. Plotting code and corresponding data for all figures. The R code, source data, and intermediate files are provided for each figure in the main text and supplement to promote re-use of these resources. Updates to this file will be posted at https://github.com/n8upham/MamPhy_v1 (ZIP)
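
For readers unfamiliar with the tip DR values mentioned under S7 Data, a minimal Python sketch follows. It assumes the commonly used definition of the DR statistic: the inverse of the equal-splits measure, in which each branch on the path from a tip to the root is down-weighted by a further factor of 1/2 for every node crossed. This page does not state the formula, so that definition is an assumption here. The toy tree, the dictionaries `parent` and `length`, and the function `tip_dr` are hypothetical names used only for illustration and are not the authors' code.

```python
# Toy tree ((A:1,B:1)n1:1,C:2)root; stored as child -> parent and
# child -> length of the branch leading to that child.
# All names and values here are illustrative assumptions.
parent = {"A": "n1", "B": "n1", "n1": "root", "C": "root"}
length = {"A": 1.0, "B": 1.0, "n1": 1.0, "C": 2.0}


def tip_dr(tip, parent, length):
    """Return tip DR as 1 / (weighted sum of branch lengths from tip to root).

    The terminal branch has weight 1, the next branch rootward 1/2,
    then 1/4, and so on (assumed equal-splits weighting).
    """
    weighted_sum = 0.0
    weight = 1.0
    node = tip
    # The root is the only node without an entry in `parent`.
    while node in parent:
        weighted_sum += length[node] * weight
        weight /= 2.0
        node = parent[node]
    return 1.0 / weighted_sum


# Tips are the nodes that never appear as a parent.
tips = sorted(set(parent) - set(parent.values()))
dr = {tip: tip_dr(tip, parent, length) for tip in tips}

for tip in tips:
    print(tip, round(dr[tip], 4))
```

In practice the calculation would be repeated over every tree in a credible set and then summarized per species.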

Funding

Division of Environmental Biology, Award: 1441634

Division of Environmental Biology, Award: 1441737

Division of Biological Infrastructure, Award: 1262600

Location

global