Dryad

Data from: Species tree branch length estimation despite incomplete lineage sorting, duplication, and loss

Data files

Dec 16, 2025 version (files: 12.41 GB)

Abstract

Phylogenetic branch lengths are essential for many analyses, such as estimating divergence times, analyzing rate changes, and studying adaptation. However, true gene tree heterogeneity due to incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer can complicate the estimation of species tree branch lengths. While several tools exist for estimating the topology of a species tree addressing various causes of gene tree discordance, much less attention has been paid to branch length estimation on multi-locus datasets. For single-copy gene trees, some methods are available that summarize gene tree branch lengths onto a species tree, including coalescent-based methods that account for heterogeneity due to incomplete lineage sorting. However, no such branch length estimation method exists for multi-copy gene family trees that have evolved with gene duplication and loss. To address this gap, we introduce the CASTLES-Pro algorithm for estimating species tree branch lengths while accounting for both gene duplication and loss and incomplete lineage sorting. CASTLES-Pro improves on the existing coalescent-based branch length estimation method CASTLES by increasing its accuracy for single-copy gene trees and extending it to handle multi-copy ones. Our simulation studies show that CASTLES-Pro is generally more accurate than alternatives, eliminating the systematic bias toward overestimating terminal branch lengths often observed when using concatenation. Moreover, while not theoretically designed for horizontal gene transfer, we show that CASTLES-Pro is relatively robust to random horizontal gene transfer, though its accuracy can degrade at the highest levels of horizontal gene transfer.