Dryad

Data from: Coalescent-based branch length estimation improves dating of species trees

Data files

Apr 03, 2026 version files 14.19 GB


Abstract

Species trees need to be dated for many downstream applications. Typical molecular dating methods take a phylogenetic tree with branch lengths in substitution units, as well as a set of calibrations, as input and convert the branch lengths of the species tree to units of time, while being consistent with the pre-specified calibrations. When dating species trees from multi-locus genome-scale datasets, the branch lengths and sometimes the topology of the species tree are estimated using concatenation. However, concatenation does not address gene tree heterogeneity across the genome. While Bayesian dating methods can address some forms of gene tree heterogeneity, such as incomplete lineage sorting, they are not scalable to large datasets. In this paper, we introduce a new scalable pipeline for dating species trees that addresses gene tree discordance for both topology and branch length estimation. The pipeline uses discordance-aware methods that account for incomplete lineage sorting for estimating the topology and branch lengths, and maximum likelihood-based methods for the dating step. Our simulation study on datasets with gene tree discordance shows that this pipeline produces more accurate and less biased dates than pipelines that use concatenation. Furthermore, it is substantially more scalable and can handle datasets with thousands of species and genes. Our results on two biological datasets demonstrate that this new pipeline improves the inference of node ages and branch lengths for certain nodes, particularly those closer to the tree tips, and improves the downstream diversification analysis.
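To make the dating step concrete, the following toy sketch converts branch lengths in substitution units to node ages using a single root calibration. It assumes a strict molecular clock (one global rate) and an ultrametric input tree, which is a deliberate simplification and not the maximum likelihood-based dating used by the pipeline described above. The tuple-based tree layout, the function names `node_height` and `date_tree`, and the example branch lengths are all hypothetical.

```python
# Toy strict-clock dating. This is an illustration only, NOT the pipeline
# described in the abstract. Each node is a tuple:
#   (name, branch_length_in_substitutions_per_site, list_of_children)


def node_height(node):
    """Height of a node above its tips, in substitutions per site.

    Uses the longest path to any descendant tip; on an ultrametric tree
    all paths have the same length.
    """
    _name, _length, children = node
    if not children:
        return 0.0
    return max(child[1] + node_height(child) for child in children)


def date_tree(root, root_age):
    """Return {node name: age}, assuming one global rate set by the root calibration."""
    # Rate in substitutions per site per time unit (assumed constant over the tree).
    rate = node_height(root) / root_age
    ages = {}

    def visit(node):
        name, _length, children = node
        ages[name] = node_height(node) / rate
        for child in children:
            visit(child)

    visit(root)
    return ages


# Hypothetical three-taxon tree ((A,B)AB,C)root with made-up branch lengths.
tree = (
    "root", 0.0, [
        ("AB", 0.02, [("A", 0.01, []), ("B", 0.01, [])]),
        ("C", 0.03, []),
    ],
)

# Single calibration: the root is assumed to be 30 time units old.
ages = date_tree(tree, root_age=30.0)
for name, age in ages.items():
    print(f"{name}: {age:.2f}")
```

Real dating methods relax the strict-clock assumption and handle multiple calibrations, so this sketch only shows the unit conversion that every such method performs in some form.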