Rapidly growing biological data, including molecular sequences and fossils, hold an unprecedented potential to reveal how evolutionary processes generate and maintain biodiversity. However, researchers often have to develop their own idiosyncratic workflows to integrate and analyse these data for reconstructing time-calibrated phylogenies. In addition, divergence times estimated under different methods and assumptions, and based on data of various quality and reliability, should not be combined without proper correction. Here we introduce a modular framework termed SUPERSMART (Self-Updating Platform for Estimating Rates of Speciation and Migration, Ages, and Relationships of Taxa), and provide a proof of concept for dealing with the moving targets of evolutionary and biogeographical research. This framework assembles comprehensive datasets of molecular and fossil data for any taxa and infers dated phylogenies using robust species tree methods, also allowing for the inclusion of genomic data produced through next-generation sequencing techniques. We exemplify the application of our method by presenting phylogenetic and dating analyses for the mammal order Primates and for the plant family Arecaceae (palms). We believe that this framework will provide a valuable tool for a wide range of hypothesis-driven research questions in systematics, biogeography, and evolution. SUPERSMART will also accelerate the inference of a “Dated Tree of Life” where all node ages are directly comparable.
zip of primate phylogeny inference
Zip file containing all input, intermediate, and output files from the inference of the primate phylogeny using the SUPERSMART pipeline. See README.txt for details.
supersmart-primates.zip
zip of palm phylogeny inference
Zip file containing all input, intermediate, and output files from the inference of the palm phylogeny using the SUPERSMART pipeline. See README.txt for details.
supersmart-palms.zip
zip of simulation study data
Zip file containing all input, intermediate, and output files from the simulation study described in the manuscript. See README.txt for details.
supersmart-simulations.zip
Supplementary Figure S1
Backbone supermatrix dimensions for varying minimum marker coverage in the case of the Primates. For each value for minimum marker coverage we varied the maximum divergence for a marker to be accepted in the set from 0.05 to 0.19 in increments of 0.02, hence the scatter around each value. Higher values of maximum accepted divergence mean that more markers are included, which consequently results in more characters as well as more taxa in the supermatrix.
Fig_S1-Marker-thresholds.pdf
Supplementary Figure S2
Average nodal posterior probabilities p as a function of varying minimum marker coverage and varying maximum divergence. Values are binned as p<0.8 (red), 0.8 < p < 0.95 (green), p>0.95 (blue).
Fig_S2-Posterior-probabilities.pdf
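The binning used in Supplementary Figure S2 can be illustrated with a minimal Python sketch. The function names and the example support values below are hypothetical, and the caption does not say how values of exactly 0.8 or 0.95 are treated; this sketch assumes they fall in the middle (green) bin.

```python
from statistics import mean


def average_support(posteriors):
    """Average nodal posterior probability of one tree (hypothetical helper)."""
    return mean(posteriors)


def support_colour(p):
    """Map an average posterior probability to the colour bins of Fig. S2.

    Assumption: the boundary values 0.8 and 0.95 are placed in the green bin,
    because the caption leaves them unspecified.
    """
    if p < 0.8:
        return "red"
    if p <= 0.95:
        return "green"
    return "blue"


# Made-up support values for a single parameter combination.
example_posteriors = [1.0, 0.9, 0.8, 0.7]
example_colour = support_colour(average_support(example_posteriors))
```

Applying this to each combination of minimum marker coverage and maximum divergence would give one colour per combination.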
Supplementary Figure S3
Fully annotated primate phylogeny visualized with FigTree. Posterior support values are displayed on the internal nodes. Family names are displayed on the branches of the respective clades. Taxon names and branches are colour-coded by family. The scale bar on the bottom represents the time relative to the root node in million years.
Fig_S3-Primate-tree.pdf
Supplementary Figure S4
Fully annotated palm phylogeny generated with FigTree. Posterior support values are displayed on the internal nodes. Subfamily names are displayed on the branches of the respective clades. Taxon names and branches are colour-coded by subfamily. The scale bar on the bottom represents the time relative to the root node in million years.
Fig_S4-Arecaceae-tree.pdf
Supplementary Figure S5
Comparison between the palm phylogeny produced by SUPERSMART (right) and that inferred in a previous study (Faurby, S., Eiserhardt, W.L., et al. 2016a, left), with crosses between phylogenies showing the placement of common species. The plot was generated using the R package ‘ape’.
Fig_S5-SUPERSMART-Faurby-comparison.pdf
Supplementary Figure S6
Illustration of how the relative number of dispersals (reported in Fig. 5) is calculated. First, the total branch length within a period (bold branches) is calculated (in this case 17.5 My). To account for the decreasing number of lineages towards the root of the tree, the relative number of dispersals is then calculated by dividing the number of dispersal events in a time bin by the total branch length in that bin, thus expressing the number of dispersal events in relation to the number of lineages available for dispersal. The green line shows a dispersal event at a node, whereas the red and yellow lines show events at branches that partially fall in another time bin.
Fig_S6-Relative-dispersals.pdf
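The calculation illustrated in Supplementary Figure S6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names are hypothetical, branches are given as (older age, younger age) pairs in My before present, the toy values are invented rather than taken from the figure, and the number of dispersal events assigned to the bin is passed in as an input.

```python
def branch_length_in_bin(branch_old, branch_young, bin_old, bin_young):
    """Length (My) of the part of a branch that falls inside a time bin.

    Ages are in My before present, so the older end is the larger number.
    """
    overlap_old = min(branch_old, bin_old)
    overlap_young = max(branch_young, bin_young)
    return max(0.0, overlap_old - overlap_young)


def relative_dispersals(branches, n_dispersals, bin_old, bin_young):
    """Dispersal events in a bin divided by the total branch length in that bin."""
    total_length = sum(
        branch_length_in_bin(old, young, bin_old, bin_young)
        for old, young in branches
    )
    if total_length == 0:
        return 0.0  # assumption: no lineages in the bin means no dispersals
    return n_dispersals / total_length


# Toy example (invented values): time bin from 10 to 5 My before present.
toy_branches = [(12, 6), (8, 0), (10, 5), (4, 0)]
# Overlaps with the bin are 4, 3, 5 and 0 My, so the total branch length is 12 My.
toy_rate = relative_dispersals(toy_branches, n_dispersals=3, bin_old=10, bin_young=5)
# 3 events / 12 My = 0.25 dispersal events per My of branch length
```

Dividing by the summed branch length rather than by the bin width is what keeps bins near the root, where few lineages exist, comparable with bins near the tips.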
Supplementary Figure S7
Ancestral range reconstructions for palms using BioGeoBEARS, the SUPERSMART dated phylogeny, and the bioregions defined in Fig. 5.
Fig_S7-BiogeoBEARS.pdf
Supplementary Figure S8
Validation of the three-step phylogenetic inference process. Cladograms of the simulated tree (left) matched with the tree that was re-estimated from the synthetic dataset using SUPERSMART (right). Species present in both trees are connected by lines, which are colour-coded by the subclades that the backbone tree was decomposed into. Branches in the re-estimated tree that form the backbone are coloured in red. Black lines represent genera that have fewer than three species. The plot was generated using the R package ‘ape’.
Fig_S8-Simulated-Estimated-Trees.pdf
Supplementary Figure S9
The long-term vision of SUPERSMART, including current and planned interactions with other initiatives during different analytical stages. “Global analyses”, when fully implemented, aim to provide continuously updated, dated phylogenies of all species with publicly available molecular sequences, and (in future versions) estimates of diversification and migration rates among and within a set of pre-defined GIS polygons (such as WWF’s realms and biomes). The results may be retrieved by other initiatives and will be deposited in data repositories. “User-defined analyses” are influenced by individual choices, including defined polygons (areas), taxa of interest, and fossil records. The user may also include data that are not yet published or are not public.
Fig_S9-Interactions.pdf
Supplementary Table S1
Palm fossils and morphological justification for their placement in the Arecaceae phylogeny.
Table_S1-palm-fossil-calibrations.xlsx
Supplementary Table S2
Estimated mean crown ages (in millions of years) of major palm clades from the SUPERSMART results, compared with previous studies.
Table_S2-palms-dates-comparison.xlsx