Dryad

Data for: Gene tree estimation error with ultraconserved elements: An empirical study on Pseudapis bees

Cite this dataset

Bossert, Silas (2021). Data for: Gene tree estimation error with ultraconserved elements: An empirical study on Pseudapis bees [Dataset]. Dryad. https://doi.org/10.5061/dryad.z08kprrb6

Abstract

Summarizing individual gene trees to species phylogenies using two-step coalescent methods is now a standard strategy in the field of phylogenomics. However, practical implementations of summary methods suffer from gene tree estimation error, which is caused by various biological and analytical factors. Greatly understudied is the choice of gene tree inference method and downstream effects on species tree estimation for empirical data sets. To better understand the impact of this method choice on gene and species tree accuracy, we compare gene trees estimated through four widely used programs under different model-selection criteria: PhyloBayes, MrBayes, IQ-Tree and RAxML. We study their performance in the phylogenomic framework of > 800 ultraconserved elements from the bee subfamily Nomiinae (Halictidae). Our taxon sampling focuses on the genus Pseudapis, a distinct lineage with diverse morphological features, but contentious morphology-based taxonomic classifications and no molecular phylogenetic guidance. We approximate topological accuracy of gene trees by assessing their ability to recover two uncontroversial, monophyletic groups, and compare branch lengths of individual trees using the stemminess metric (the relative length of internal branches). We further examine different strategies of removing uninformative loci and the collapsing of weakly supported nodes into polytomies. We then summarize gene trees with ASTRAL and compare resulting species phylogenies, including comparisons to concatenation-based estimates. Gene trees obtained with the reversible jump model search in MrBayes were most concordant on average and all Bayesian methods yielded gene trees with better stemminess values. The only gene tree estimation approach whose ASTRAL summary trees consistently produced the most likely correct topology, however, was IQ-Tree with automated model designation (MFP). 
We discuss these findings and provide practical advice on gene tree estimation for summary methods. Lastly, we establish the first phylogeny-informed classification for Pseudapis s. l. and map the distribution of distinct morphological features of the group.
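
As an illustration of the stemminess metric mentioned above, the following minimal Python sketch computes the share of total tree length that falls on internal branches of a Newick tree. It assumes that stemminess is the sum of internal branch lengths divided by the sum of all branch lengths; the exact formulation used in the paper may differ. The function names are hypothetical, and the parser assumes unquoted taxon labels and no branch length on the root.

```python
import re

# Structural characters, or a branch length such as ":0.0123" or ":1e-05".
# Assumption: taxon labels are unquoted and contain no parentheses, commas or colons.
TOKEN = re.compile(r"[(),]|:[0-9.eE+-]+")


def split_branch_lengths(newick):
    """Return (internal, terminal) lists of branch lengths from a Newick string.

    A branch length that follows a closing parenthesis belongs to a clade
    (internal branch); any other branch length belongs to a tip (terminal
    branch). Support values written between ")" and ":" are skipped.
    """
    internal, terminal = [], []
    last_structural = None
    for token in TOKEN.findall(newick):
        if token.startswith(":"):
            length = float(token[1:])
            if last_structural == ")":
                internal.append(length)
            else:
                terminal.append(length)
        else:
            last_structural = token
    return internal, terminal


def stemminess(newick):
    """Proportion of total tree length contributed by internal branches."""
    internal, terminal = split_branch_lengths(newick)
    total = sum(internal) + sum(terminal)
    if total == 0:
        raise ValueError("tree has no positive branch lengths")
    return sum(internal) / total


# Toy tree: internal branches sum to 4, terminal branches sum to 4.
print(stemminess("((A:1,B:1):2,(C:1,D:1):2);"))  # 0.5
```

Under this definition, a value close to 0 indicates a tree dominated by long terminal branches, and a value close to 1 indicates a tree dominated by long internal branches.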

Methods

This Dryad Digital Repository contains data associated with the research paper "Gene tree estimation error with ultraconserved elements: An empirical study on Pseudapis bees". Specifically, it contains the Trinity assemblies of the de novo sequenced UCEs, the UCE sequences extracted from the included genomes, and the concatenated matrices of the 80% completeness data set. We further provide all species trees and all 853 gene trees inferred with RAxML, IQ-Tree 2 (MFP), IQ-Tree 2 (GTR+G), MrBayes (rj), MrBayes (GTR+G) and PhyloBayes/EZ-PB, as outlined in the original paper.

Funding

National Science Foundation, Award: DEB-1555905