Dryad

Data from: Leveraging weighted quartet distributions for enhanced species tree inference from genome-wide data

Data files

Nov 11, 2024 version, files: 3.05 GB

Abstract

Species tree estimation from genes sampled throughout the genome is challenging because of gene tree discordance, often caused by incomplete lineage sorting (ILS). Quartet-based summary methods for estimating species trees from a collection of gene trees are becoming popular due to their high accuracy and theoretical guarantees of robustness to arbitrarily high amounts of ILS. ASTRAL, the most widely used quartet-based method, aims to infer species trees by maximizing the number of quartets in the gene trees that are consistent with the species tree. An alternative approach is to infer quartets for all subsets of four species and amalgamate them into a coherent species tree. While summary methods can be sensitive to gene tree estimation error, quartet amalgamation offers an advantage by potentially bypassing gene tree estimation. However, the choice of weighted quartet inference method, and its downstream effects on species tree estimation under realistic model conditions, remains greatly understudied. In this study, we investigated a wide array of methods for generating weighted quartets and critically assessed their impact on species tree inference. Our study provides evidence that the careful generation and amalgamation of weighted quartets, as implemented in methods like wQFM, can lead to significantly more accurate trees than popular methods like ASTRAL, especially in the face of gene tree estimation errors.
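To make the notion of weighted quartets concrete, the following is a minimal Python sketch, not the implementation used by ASTRAL, wQFM, or this study. It assumes one simple weighting scheme (the weight of a quartet topology is the number of gene trees that induce it), represents trees as nested tuples of taxon names, and assumes every tree shares the same taxon set. The function names `clades`, `quartet_topology`, `weighted_quartets`, and `quartet_score` are hypothetical, introduced only for illustration.

```python
from collections import Counter
from itertools import combinations


def clades(tree):
    """Return (leaf set of `tree`, leaf sets of all of its subtrees)."""
    if isinstance(tree, str):  # a leaf is just a taxon name
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found


def quartet_topology(clade_sets, four):
    """Topology induced on four taxa, as a split {{a, b}, {c, d}}.

    Returns None when no edge separates the taxa two-and-two (unresolved).
    """
    four = frozenset(four)
    for clade in clade_sets:
        inside = clade & four
        if len(inside) == 2:
            return frozenset([inside, four - inside])
    return None


def weighted_quartets(gene_trees):
    """Assumed weighting: count the gene trees inducing each quartet topology."""
    weights = Counter()
    for tree in gene_trees:
        leaves, clade_sets = clades(tree)
        for four in combinations(sorted(leaves), 4):
            topo = quartet_topology(clade_sets, four)
            if topo is not None:
                weights[topo] += 1
    return weights


def quartet_score(species_tree, weights):
    """Total weight of the quartets that a candidate species tree displays."""
    _, clade_sets = clades(species_tree)
    total = 0
    for topo, weight in weights.items():
        four = frozenset().union(*topo)
        if quartet_topology(clade_sets, four) == topo:
            total += weight
    return total


# Toy input: two gene trees support AB|CD, one supports AC|BD.
gene_trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
]
weights = weighted_quartets(gene_trees)
score_ab = quartet_score(((("A", "B"), "C"), "D"), weights)
score_ac = quartet_score(((("A", "C"), "B"), "D"), weights)
```

In this toy case the candidate tree grouping A with B displays more quartet weight than the one grouping A with C, which is the quantity a quartet-score-maximizing method would prefer. The enumeration over all four-taxon subsets is exhaustive and only practical for small taxon sets; the methods named in the abstract, and the alternative weighting schemes the study compares, are not reproduced here.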