Most eukaryotic lineages are microbial, and many have only recently been sampled for phylogenetic studies or remain in the ‘dark area’ of the tree of life where there are no molecular data. To assess relationships among eukaryotic lineages, we perform a taxon-rich phylogenomic analysis including 232 eukaryotes selected to maximize taxonomic diversity and up to 1554 genes chosen as vertically inherited based on their broad distribution among eukaryotes. We also include sequences from 486 bacteria and 84 archaea to assess the impact of endosymbiotic gene transfer (EGT) from plastids and to detect contamination. Overall, our analyses are consistent with other less taxon-rich estimates of the eukaryotic tree of life and we recover strong support for five major clades: Amoebozoa, Excavata (without the genus Malawimonas), Opisthokonta, Archaeplastida and SAR (Stramenopila, Alveolata and Rhizaria). Our analyses also highlight the existence of ‘orphan’ lineages, which lack robust placement in the eukaryotic tree of life and indicate the possibility of as yet undiscovered diversity. In analyses including bacteria and archaea, we find that ~10% of the 1554 genes appear to have been acquired from cyanobacteria through EGT in photosynthetic lineages. We chose these 1554 genes because they are found in four or five of the five major eukaryotic clades and hence may be more likely to be inherited vertically. Removing the EGT genes places the green algae as sister to the glaucophytes instead of the red algae, suggesting that unknowingly including genes of plastid origin, and combining them with genes of nuclear origin, may mislead phylogenetic estimates. Finally, the large size of our dataset allows comparative analyses of subsets of data; alignments built from randomly sampled sites provide greater support, particularly for deep relationships, than do equivalent-sized datasets built from randomly sampled genes.
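The difference between the two subsampling schemes in the abstract can be illustrated with a small sketch. It is only an illustration under assumptions: the gene names and lengths are hypothetical, and the functions `sample_sites` and `sample_genes` are not taken from the study, whose exact subsampling procedure is not described on this page.

```python
import random

def sample_sites(gene_lengths, n_sites, rng):
    """Draw n_sites distinct columns from the concatenated alignment.

    Columns are indexed 0..total-1 across all genes, so every gene can
    contribute a few sites to the subsample.
    """
    total = sum(gene_lengths.values())
    return sorted(rng.sample(range(total), n_sites))

def sample_genes(gene_lengths, n_sites, rng):
    """Draw whole genes in random order until at least n_sites columns
    are collected, so the subsample comes from a few complete genes."""
    order = list(gene_lengths)
    rng.shuffle(order)
    chosen, total = [], 0
    for gene in order:
        if total >= n_sites:
            break
        chosen.append(gene)
        total += gene_lengths[gene]
    return chosen, total

# Hypothetical gene lengths (alignment columns per gene).
gene_lengths = {"geneA": 300, "geneB": 120, "geneC": 80}
rng = random.Random(0)  # fixed seed so the draw is repeatable

sites = sample_sites(gene_lengths, 150, rng)
genes, n_columns = sample_genes(gene_lengths, 150, rng)
```

Sampling by sites spreads the chosen columns over the whole matrix, whereas sampling by genes keeps each chosen gene intact and can only approximate the target size.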

Supplemental_Data

Includes Table S3: Monophyly of subclades; Table S4: Subsampling results; and tree data as Newick strings

Table_S1

Table_S1.xlsx: Taxon source and code. Taxa are listed by major clade, with the source of the data (e.g. OrthoMCL, GenBank, Moore Transcriptome Project) and the codes used throughout the study. Taxa in red are in the PhyloBayes analysis but were removed from other analyses.

Table_S2

Table_S2.xlsx: Presence/absence of genes by taxon. 1 = present; 0 = absent. Genes are given by OG name (OrthoMCL) and are marked with an * if they belong to the 150 most even gene dataset and with a ^ if they were identified as affected by EGT.
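A minimal sketch of how the * and ^ markers might be read from gene labels follows. It assumes the markers are appended directly to the OG name, which this page does not confirm; the example labels and the helper `parse_gene_label` are hypothetical.

```python
def parse_gene_label(label):
    """Split a gene label into (og_name, in_150_set, egt_affected).

    Assumption: '*' and '^' appear as extra characters on the OG name,
    in any order, and never occur inside the name itself.
    """
    label = label.strip()
    in_150_set = "*" in label
    egt_affected = "^" in label
    og_name = label.replace("*", "").replace("^", "").strip()
    return og_name, in_150_set, egt_affected

# Hypothetical labels, not real rows from Table_S2.xlsx.
labels = ["OG5_000001*", "OG5_000002^", "OG5_000003*^", "OG5_000004"]
parsed = [parse_gene_label(label) for label in labels]

# Names of genes not flagged as EGT-affected, as in the *_noEGT datasets.
egt_free = [name for name, _, egt in parsed if not egt]
```

The same flags could be used to select the 150-gene subset by keeping only labels where the second element is true.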

All_1554

Alignment file; 1554 genes; eukaryotes, bacteria and archaea.

Katz_Grant_README.docx

Euk_1554

Alignment file; 1554 genes; eukaryotes

Katz_Grant_README.docx

All_150

Alignment file; 150 genes; eukaryotes, bacteria and archaea

Katz_Grant_README.docx

Euk_150

Alignment file; 150 genes; eukaryotes.

Katz_Grant_README.docx

All_1554_noEGT

Alignment file; 1554 genes minus EGT genes; eukaryotes, bacteria and archaea

Katz_Grant_README.docx

Euk_1554_noEGT

Alignment file; 1554 genes minus EGT genes; eukaryotes

Katz_Grant_README.docx

All_150_noEGT

Alignment file; 150 genes minus EGT genes; eukaryotes, bacteria and archaea

Katz_Grant_README.docx

Euk_150_noEGT

Alignment file; 150 genes minus EGT genes; eukaryotes

Katz_Grant_README.docx

1554_Single_gene_alignments_All

All 1554 single gene alignments with all taxa. Data were concatenated and columns with > 50% missing data in eukaryotic taxa were removed to make the large matrices analyzed in this study.

Katz_Grant_README.docx

1554_Single_gene_alignments_Euk_only

All 1554 single gene alignments with only eukaryotic taxa. Data were concatenated and columns with > 50% missing data were removed to make the large matrices analyzed in this study.

Katz_Grant_README.docx
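The concatenation and column filtering described for the single gene alignments can be sketched as below. This is an illustration under assumptions, not the authors' pipeline: alignments are represented as plain dictionaries of taxon to sequence, '-' and '?' are treated as missing data, taxa absent from a gene are padded with gaps, and the function names are hypothetical.

```python
def concatenate(alignments, taxa, gap="-"):
    """Concatenate per-gene alignments into one matrix.

    A taxon absent from a gene is padded with gaps of that gene's length.
    """
    parts = {taxon: [] for taxon in taxa}
    for alignment in alignments:
        length = len(next(iter(alignment.values())))
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, gap * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}

def drop_sparse_columns(matrix, reference_taxa, max_missing=0.5, missing="-?"):
    """Remove columns where more than max_missing of the reference taxa
    have missing data; all taxa in the matrix are trimmed alike."""
    n_columns = len(next(iter(matrix.values())))
    keep = []
    for i in range(n_columns):
        n_missing = sum(1 for t in reference_taxa if matrix[t][i] in missing)
        if n_missing / len(reference_taxa) <= max_missing:
            keep.append(i)
    return {t: "".join(seq[i] for i in keep) for t, seq in matrix.items()}

# Toy data: two eukaryotes and one bacterium, two short genes.
gene_a = {"euk1": "MK", "euk2": "M-", "bac1": "MK"}
gene_b = {"euk1": "A-", "bac1": "AL"}  # euk2 lacks this gene
taxa = ["euk1", "euk2", "bac1"]

matrix = concatenate([gene_a, gene_b], taxa)
# Missing data are counted in eukaryotic taxa only, as for the "All" matrices.
filtered = drop_sparse_columns(matrix, reference_taxa=["euk1", "euk2"])
```

For the eukaryote-only matrices, the reference set would simply be every taxon in the matrix.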

Data from: Taxon-rich phylogenomic analyses resolve the eukaryotic tree of life and reveal the power of subsampling by sites
