Skip to main content
Dryad

Conflict over the eukaryote root resides in strong outliers, mosaics and missing data sensitivity of site-specific (CAT) mixture models

Cite this dataset

Al Jewari, Caesar; Baldauf, Sandra (2022). Conflict over the eukaryote root resides in strong outliers, mosaics and missing data sensitivity of site-specific (CAT) mixture models [Dataset]. Dryad. https://doi.org/10.5061/dryad.hmgqnk9jk

Abstract

Abstract Phylogenetic reconstruction using concatenated loci (“phylogenomics” or “supermatrix phylogeny”) is a powerful tool for solving evolutionary splits that are poorly resolved in single gene/protein trees (SGTs). However, recent phylogenomic attempts to resolve the eukaryote root have yielded conflicting results, along with claims of various artefacts hidden in the data. We have investigated these conflicts using two new methods for assessing phylogenetic conflict. ConJak uses whole marker (gene or protein) jackknifing to assess deviation from a central mean for each individual sequence, while ConWin uses a sliding window to screen for incongruent protein fragments (mosaics). Both methods allow selective masking of individual sequences or sequence fragments in order to minimize missing data, an important consideration for resolving deep splits with limited data. Analyses focused on a set of 76 eukaryotic proteins of bacterial-ancestry previously used in various combinations to assess the branching order among the three major divisions of eukaryotes: Amorphea (mainly animals, fungi and Amoebozoa), Diaphoretickes (most other well-known eukaryotes and nearly all algae) and Excavata, represented here by Discoba (Jakobida, Heterolobosea, and Euglenozoa). ConJak analyses found strong outliers to be concentrated in under-sampled lineages, while ConWin analyses of Discoba, the most under-sampled of the major lineages, detected potentially incongruent fragments scattered throughout. Phylogenetic analyses of the full data using an LG-gamma model support a Discoba sister scenario (neozoan-excavate root), which rises to 99-100% bootstrap support with data masked according to either protocol. However, analyses with two site-specific (CAT) mixture models yielded widely inconsistent results and a striking sensitivity to missing data. The neozoan-excavate root places Amorphea and Diaphoretickes as more closely related to each other than either is to Discoba, a fundamental relationship that should remain unaffected by additional taxa.

Funding

Swedish Research Council, Award: VR 2017-04351