Data from: Bayesian total-evidence dating revisits sloth phylogeny and biogeography: a cautionary tale on morphological clock analyses
Data files
Dec 01, 2023 version files 7.78 MB
-Distribution_matrix.xlsx (13.24 KB)
-README.md (4.12 KB)
-Tejada_et_al_Supp_Info_110623.pdf (7.77 MB)
Abstract
Combining morphological and molecular characters through Bayesian total-evidence dating allows inferring the phylogenetic and timescale framework of both extant and fossil taxa, while accounting for the stochasticity and incompleteness of the fossil record. Such an integrative approach is particularly needed when dealing with clades such as sloths (Mammalia: Folivora), for which developmental and biomechanical studies have shown high levels of morphological convergence whereas molecular data can only account for a limited percentage of their total species richness. Here, we propose an alternative hypothesis of sloth evolution that emphasizes the pervasiveness of morphological convergence and the importance of considering the fossil record and an adequate taxon sampling in both phylogenetic and biogeographic inferences. Regardless of different clock models and morphological datasets, the extant sloth Bradypus is consistently recovered as a megatherioid, and Choloepus as a mylodontoid, in agreement with molecular-only analyses. The recently extinct Caribbean sloths (Megalocnoidea) are found to be a monophyletic sister-clade of Megatherioidea, in contrast to previous phylogenetic hypotheses. Our results contradict previous morphological analyses and further support the polyphyly of “Megalonychidae”, whose members were found in five different clades. Regardless of taxon sampling and clock models, the Caribbean colonization of sloths is compatible with the exhumation of islands along Aves Ridge and its geological time frame. Overall, our total-evidence analysis illustrates the difficulty of positioning highly incomplete fossils, although a robust phylogenetic framework was recovered by an a posteriori removal of taxa with high percentages of missing characters. Elimination of these taxa improved topological resolution by reducing polytomies and increasing node support. 
However, it introduced a systematic and geographic bias because most of these incomplete specimens are from northern South America. This is evident in biogeographic reconstructions, which suggest Patagonia as the area of origin of many clades when taxa are underrepresented, but Amazonia and/or Central and Southern Andes when all taxa are included. More generally, our analyses demonstrate the instability of topology and divergence time estimates when using different morphological datasets and clock models, and thus caution against making macroevolutionary inferences when node support is weak or when uncertainties in the fossil record are not considered.
Summary of methods conducted:
Phylogeny and divergence time estimations:
Bayesian total-evidence dating approach with the fossilized birth-death model (FBD, Heath et al. 2014; Zhang et al. 2016; Gavryushkina et al. 2017), performed with MrBayes 3.2.7a
Molecular partitions, each with its own evolutionary model: three for protein-coding genes (one per codon position) and one for the 12S and 16S ribosomal RNAs.
Models of sequence evolution were sampled with the reversible-jump Markov Chain Monte Carlo (MCMC) with gamma-distributed site rates and the default parameter of four gamma rate categories for rate heterogeneity (Huelsenbeck et al. 2004). Additionally, we performed analyses with an eight-category lognormal rate distribution.
One partition for all the morphological characters.
Morphological evolution modeled under the Markov one-parameter (Mk1) model (Lewis 2001) with gamma-distributed rate variation across characters.
MrBayes analysis consisted of two runs with eight incrementally heated MCMC chains, starting from a random tree.
Chains were run for 20 million generations, with trees and associated model parameters sampled every 2,000 generations.
Analyses conducted using two types of relaxed clock models: (1) an uncorrelated independent gamma rate model (IGR model, Lepage et al. 2007), in which tree branches have their own evolutionary rates that follow a gamma distribution, and (2) an autocorrelated lognormal rate model (TK02 model; Thorne and Kishino 2002) in which branch rates follow a lognormal distribution and the evolutionary rate in a descendant branch is influenced by that of its ancestral branch.
Sampling strategy of taxa was set to diversity (prset samplestrat=diversity)
Fossilization prior set with a beta distribution: prset fossilizationpr=beta(1, 1).
An exponential prior and a beta prior were used for the net speciation rate and the relative extinction rate, respectively: prset speciationpr=exp(10); prset extinctionpr=beta(1, 1)
Node age prior was set to ‘calibrated’ (prset nodeagepr=calibrated).
Topological constraints were set for Pilosa (excluding Pseudoglyptodon) and for Bradypus spp., although results without the constraint on Pilosa are also presented.
Dates of fossil occurrence: fixed for radiometrically dated Quaternary sloths; uniform intervals for all pre-Quaternary sloths.
Root age prior: 72-51 Ma (prset treeagepr=unif(51, 72)) based on previous studies (Meredith et al. 2011; Gibb et al. 2016; Delsuc et al. 2019) and the most ancient xenarthran fossil (†Riostegotherium yanei Oliveira and Bergqvist 1998, from the Itaboraí Formation [50-53 Ma]).
For the reduced matrix of 64 taxa, an age prior was also set on the Mylodontoidea node (45-33.2 Ma) based on the record from Antoine et al. (2021), but total-evidence dating analyses without this internal calibration were also performed.
Convergence was checked for each analysis using the following criteria: average standard deviation of split frequencies (ASDF) <0.01, potential scale reduction factor (PSRF) close to 1.0, and effective sample size (ESS) >200 in Tracer 1.7.1 (Rambaut et al. 2018).
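The run length and convergence thresholds listed above can be expressed as a small check. This is only an illustrative sketch: the function names, the 0.01 tolerance used to interpret "close to 1.0" for the PSRF, and the example diagnostic values are assumptions, not values reported by the authors. The first function gives the number of sampling events per run implied by the stated settings, not counting the starting state or any burn-in.

```python
def samples_per_run(generations: int, sample_every: int) -> int:
    """Number of sampling events in one run, excluding the initial state."""
    return generations // sample_every


def looks_converged(avg_sd_split_freq, psrf_values, ess_values,
                    psrf_tolerance=0.01):
    """Apply the three thresholds stated in the methods summary.

    psrf_tolerance is an assumed reading of "close to 1.0";
    the source does not give a numeric cutoff.
    """
    splits_ok = avg_sd_split_freq < 0.01
    psrf_ok = all(abs(p - 1.0) <= psrf_tolerance for p in psrf_values)
    ess_ok = all(e > 200 for e in ess_values)
    return splits_ok and psrf_ok and ess_ok


if __name__ == "__main__":
    # Settings taken from the text: 20 million generations, sampled every 2,000.
    print(samples_per_run(20_000_000, 2_000))
    # Hypothetical diagnostic values, for illustration only.
    print(looks_converged(0.004, [1.000, 1.002], [850.0, 1320.5]))
```

In practice the diagnostic values would be read from the MrBayes summary output and from Tracer rather than typed in by hand.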
Biogeographic estimations:
Ancestral ranges on the total-evidence time-calibrated phylogenies were estimated using the Dispersal-Extinction-Cladogenesis (DEC) model (Ree and Smith 2008) as implemented in DECX (Beeravolu and Condamine 2016).
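Because DEC treats the geographic range of a lineage as a set of discrete areas, the number of candidate ranges grows quickly with the number of areas: n areas give 2^n - 1 non-empty ranges. The sketch below enumerates them for a toy set of areas. The function name and the optional max_size limit are hypothetical, and nothing here reflects how DECX itself was configured (for example, any cap on range size or adjacency constraints), which this summary does not state.

```python
from itertools import combinations


def candidate_ranges(areas, max_size=None):
    """Return every non-empty combination of areas, smallest ranges first.

    max_size is a hypothetical cap on the number of areas per range;
    None means no cap.
    """
    limit = len(areas) if max_size is None else min(max_size, len(areas))
    ranges = []
    for size in range(1, limit + 1):
        ranges.extend(combinations(areas, size))
    return ranges


if __name__ == "__main__":
    toy_areas = ["A", "B", "C"]  # placeholder area labels
    for geographic_range in candidate_ranges(toy_areas):
        print("+".join(geographic_range))
```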
Description of the data and file structure
-The data file is an Excel file containing the distribution matrix used for the biogeographic reconstructions, as described in the main text and SI.
-Column-header abbreviations in the "Distribution_matrix" Excel file: AM: Amazonia, AF: Atlantic Forest, DD: Dry Diagonal, SSA: Southern South America, NAD: Northern Andes, SAD: Southern Andes, CNA: Central and North America, GrA: Greater Antilles, LeA_N: Northern Lesser Antilles, LeA_S: Southern Lesser Antilles, PR: Puerto Rico, Aves: Aves Ridge
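The column abbreviations can be kept in a lookup table so that a row of the matrix is readable at a glance. This is a minimal sketch that assumes each row is one taxon coded 1 for presence and 0 for absence in each area; that coding, the function name, and the example row are assumptions for illustration and should be checked against the Excel file itself.

```python
# Area codes and names as listed in the file description.
AREA_NAMES = {
    "AM": "Amazonia",
    "AF": "Atlantic Forest",
    "DD": "Dry Diagonal",
    "SSA": "Southern South America",
    "NAD": "Northern Andes",
    "SAD": "Southern Andes",
    "CNA": "Central and North America",
    "GrA": "Greater Antilles",
    "LeA_N": "Northern Lesser Antilles",
    "LeA_S": "Southern Lesser Antilles",
    "PR": "Puerto Rico",
    "Aves": "Aves Ridge",
}


def occupied_areas(row):
    """Return full area names for every column coded 1 in a row.

    row maps area codes to 0/1 (assumed presence/absence coding).
    """
    unknown = set(row) - set(AREA_NAMES)
    if unknown:
        raise ValueError(f"Unrecognized area codes: {sorted(unknown)}")
    return [name for code, name in AREA_NAMES.items() if row.get(code) == 1]


if __name__ == "__main__":
    # Hypothetical row, not taken from the dataset.
    example_row = {"AM": 1, "NAD": 1, "PR": 0}
    print(occupied_areas(example_row))
```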
Sharing/Access information
-All data sources are available in the SI file on Dryad.
