Data from: Taxon-rich phylogenomic analyses resolve the eukaryotic tree of life and reveal the power of subsampling by sites
Cite this dataset
Katz, Laura A.; Grant, Jessica R. (2014). Data from: Taxon-rich phylogenomic analyses resolve the eukaryotic tree of life and reveal the power of subsampling by sites [Dataset]. Dryad. https://doi.org/10.5061/dryad.db78g
Most eukaryotic lineages are microbial, and many have only recently been sampled for phylogenetic studies or remain in the ‘dark area’ of the tree of life where there are no molecular data. To assess relationships among eukaryotic lineages, we perform a taxon-rich phylogenomic analysis including 232 eukaryotes selected to maximize taxonomic diversity and up to 1554 genes chosen as vertically inherited based on their broad distribution among eukaryotes. We also include sequences from 486 bacteria and 84 archaea to assess the impact of endosymbiotic gene transfer (EGT) from plastids and to detect contamination. Overall, our analyses are consistent with other less taxon-rich estimates of the eukaryotic tree of life and we recover strong support for five major clades: Amoebozoa, Excavata (without the genus Malawimonas), Opisthokonta, Archaeplastida and SAR (Stramenopila, Alveolata and Rhizaria). Our analyses also highlight the existence of ‘orphan’ lineages, which lack robust placement in the eukaryotic tree of life, and indicate the possibility of as yet undiscovered diversity. In analyses including bacteria and archaea, we find that ~10% of the 1554 genes, which we choose because they are found in four or five of the five major eukaryotic clades and hence may be more likely to be inherited vertically, appear to have been acquired from cyanobacteria through EGT in photosynthetic lineages. Removing these EGT genes places the green algae as sister to the glaucophytes instead of the red algae, suggesting that unknowingly including genes of plastid origin, and combining them with genes of nuclear origin, may mislead phylogenetic estimates. Finally, the large size of our dataset allows comparative analyses of subsets of data; alignments built from randomly sampled sites provide greater support, particularly for deep relationships, than do equivalently sized datasets built from randomly sampled genes.
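The contrast between the two subsampling strategies in the final sentence can be illustrated with a minimal Python sketch. It is not the pipeline used to build this dataset: the function names (`concatenate`, `sample_sites`, `sample_genes`), the toy sequences, and the choice to pad missing taxa with gaps are all assumptions made for illustration. Sampling by sites draws individual alignment columns from the full concatenated matrix, whereas sampling by genes adds whole genes until a comparable number of columns is reached.

```python
import random


def concatenate(genes):
    """Join per-gene alignments (gene -> {taxon: sequence}) into one supermatrix."""
    taxa = sorted({t for aln in genes.values() for t in aln})
    matrix = {t: "" for t in taxa}
    for name in sorted(genes):
        aln = genes[name]
        length = len(next(iter(aln.values())))
        for t in taxa:
            # Assumption: taxa missing from a gene are padded with gap characters.
            matrix[t] += aln.get(t, "-" * length)
    return matrix


def sample_sites(matrix, n_sites, rng):
    """Draw n_sites columns at random, without replacement, from the supermatrix."""
    total = len(next(iter(matrix.values())))
    cols = sorted(rng.sample(range(total), n_sites))
    return {t: "".join(seq[c] for c in cols) for t, seq in matrix.items()}


def sample_genes(genes, n_sites, rng):
    """Add whole genes in random order until at least n_sites columns are collected."""
    order = sorted(genes)
    rng.shuffle(order)
    chosen, total = {}, 0
    for name in order:
        if total >= n_sites:
            break
        chosen[name] = genes[name]
        total += len(next(iter(genes[name].values())))
    return concatenate(chosen)


# Hypothetical toy data: three short genes, one of them missing a taxon.
toy_genes = {
    "geneA": {"tax1": "MKVL", "tax2": "MKIL", "tax3": "MRVL"},
    "geneB": {"tax1": "GGAT", "tax2": "GGST"},
    "geneC": {"tax1": "PLQW", "tax2": "PLQW", "tax3": "PIQW"},
}

rng = random.Random(0)  # fixed seed so the subsamples are reproducible
full = concatenate(toy_genes)
by_sites = sample_sites(full, 8, rng)
by_genes = sample_genes(toy_genes, 8, rng)
```

Both subsamples here contain eight columns, but the site-based one can draw from every gene while the gene-based one is limited to two of the three genes. Comparing the two strategies in practice would mean repeating each draw many times and running a tree inference on every replicate, which this sketch leaves out.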