Genomic analysis finds no evidence of canonical eukaryotic DNA processing complexes in a free-living protist

Salas-Leiva, Dayana1; Tromer, Eelco2; Curtis, Bruce1; Jerlström-Hultqvist, Jon1; Kolisko, Martin3; Yi, Zhenzhen4; Salas-Leiva, Joan5; Gallot-Lavallée, Lucie1; Williams, Shelby1; Kops, Geert6; Archibald, John1; Simpson, Alastair1; Roger, Andrew1

Published Oct 20, 2021 on Dryad. https://doi.org/10.5061/dryad.wh70rxwnv

Abstract

Cells replicate and segregate their DNA with precision. Previous studies showed that these regulated cell-cycle processes were present in the last eukaryotic common ancestor and that their core molecular parts are conserved across eukaryotes. However, some metamonad parasites have secondarily lost components of the DNA processing and segregation apparatuses. To clarify the evolutionary history of these systems in these unusual eukaryotes, we generated a genome assembly for the free-living metamonad Carpediemonas membranifera and carried out a comparative genomics analysis. Here, we show that parasitic and free-living metamonads harbor an incomplete set of proteins for processing and segregating DNA. Unexpectedly, Carpediemonas species are further streamlined, lacking the origin recognition complex, Cdc6 and most structural kinetochore subunits. Carpediemonas species are thus the first known eukaryotes that appear to lack this suite of conserved complexes, suggesting that they rely on yet-to-be-discovered or alternative mechanisms to carry out these fundamental processes.

Supplementary sequences:

Sequences corresponding to ‘Supplementary Data 4. Spindle assembly, kinetochore and APC/C orthologs in 18 diverse eukaryotic genomes’. Each multifasta sequence file is labeled according to the respective aliases reported in that table.

The file 'Orc1_Cdc6_Orc1-Cdc6-like.fasta' is a multifasta sequence file corresponding to ‘Supplementary Data 6. Orc1, Cdc6 and Orc1/Cdc6-like proteins. Information used in Supplementary Figure 3, panels b and d’.
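The multifasta files can be read with any standard FASTA parser. As a minimal sketch, assuming that the alias from the corresponding table is the first whitespace-delimited token of each header line, the Python functions below (`parse_fasta` and `protein_lengths` are hypothetical helpers, not part of the dataset) load a file into a dictionary keyed by alias and report sequence lengths, the kind of per-protein length summary compared in Supplementary Figure 3b.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse multifasta text into an {alias: sequence} dictionary.

    Assumption (not stated in the dataset): the alias is the first
    whitespace-delimited token after '>' on each header line.
    """
    records: Dict[str, str] = {}
    alias: Optional[str] = None
    chunks: List[str] = []

    def flush() -> None:
        # Store the sequence collected for the current header, if any.
        if alias is None:
            return
        if alias in records:
            raise ValueError(f"duplicate alias: {alias}")
        records[alias] = "".join(chunks)

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            flush()
            tokens = line[1:].split()
            alias = tokens[0] if tokens else ""
            chunks = []
        elif alias is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    flush()
    return records


def protein_lengths(path: str) -> Dict[str, int]:
    """Return the length of each sequence in a multifasta file, keyed by alias."""
    with open(path) as fh:
        return {alias: len(seq) for alias, seq in parse_fasta(fh).items()}
```

For example, `protein_lengths("Orc1_Cdc6_Orc1-Cdc6-like.fasta")` would return one length per sequence in that file. Raising an error on duplicate aliases, rather than silently overwriting them, is a deliberate choice so that header-labeling problems surface early; Biopython's `SeqIO.parse(handle, "fasta")` is an established alternative if that library is available.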

The figures provided here are high-resolution versions of those presented in the Supplementary Information of the source manuscript.

Supplementary figure legends:

Supplementary Fig. 1: Maximum-likelihood reconstruction of the phylogenetic relationships within the Metamonada clade. An initial reconstruction was carried out in IQ-TREE with the LG+C60+F+Γ model and 1000 ultrafast bootstraps; this was followed by tree inference under the LG+PMSF(C60)+F+Γ model with 100 nonparametric bootstraps. The alignment comprised 181 genes encompassing 48,341 sites. The tree is rooted on the ancestral branch of Amorphea. The scale bar shows the inferred number of amino acid substitutions per site. Branch support values are represented as shaded dots on each branch, in the following order: SH-aLRT support percentage/aBayes/nonparametric bootstrap.

Supplementary Fig. 2: Phylogenetic reconstruction of Orc5 proteins inferred with IQ-TREE¹⁵ under the LG+C60+F+Γ model using 1000 ultrafast bootstraps (SH-aLRT support percentage/aBayes/bootstrap). Dots show the value ranges for branches; a red dot indicates that the values apply to every node within the clade. The alignment consists of 60 taxa with 422 sites after trimming. For simplicity, only the domain architectures of metamonads, S. cerevisiae, A. thaliana and H. sapiens are depicted on the tree.

Supplementary Fig. 3: Orc1-6 and Cdc6 proteins. (a) Left: typical domain architecture observed for Orc1-6 and Cdc6 in Saccharomyces cerevisiae. Right: representative domain architecture of metamonad proteins, drawn to reflect the most common protein size. If no species name is given, the depicted domain structure was found in all metamonads in which the protein is present. Numbers on the right of each depiction give the total protein length, or its range in the case of metamonads (additional information in Supplementary Data 2). (b) Comparison of Orc1, Cdc6 and Orc1/Cdc6-like protein lengths across 81 eukaryotes, including metamonads and non-metamonad protists (source information in Supplementary Data 6). Metamonad proteins are highlighted with green shaded bubbles in the background. (c) Partial Orc1/Cdc6 ATPase domain showing the Walker A and Walker B motifs and the R-finger. Reference species are shown at the top. The multiple sequence alignment was visualized with Jalview⁷² using the Clustal colouring scheme. (d) Phylogenetic reconstruction of Orc1, Cdc6 and Orc1/Cdc6-like proteins inferred with IQ-TREE¹⁵ under the LG+C10+F+Γ model using 1000 ultrafast bootstraps (bootstrap value ranges for branches are shown with black and grey dots). The alignment consists of 81 taxa with 367 sites after trimming. Orc1/Cdc6-like proteins do not form a clade with bona fide Orc1 and Cdc6 proteins, so it is not possible to establish definitively whether they are orthologs.

Supplementary Fig. 4: The distribution of core molecular systems of the replisome, double-strand break repair and endonucleases in the nucleomorph genomes of cryptophytes and chlorarachniophytes.

Supplementary Fig. 5: The distribution of core molecular systems of DNA repair across eukaryotic diversity. A schematic global eukaryote phylogeny is shown on the left, with the classification of the major metamonad lineages indicated. Double-strand break repair and endonuclease sets are shown. ***Carpediemonas-like organisms. ‘?’ is used where orthology was difficult to establish; in such cases the protein name appears with the suffix ‘-like’ in the tables.

Supplementary Fig. 6: Presence/absence diagram of LECA kinetochore components in eukaryotes, with expanded sampling of metamonads including C. membranifera and C. frisia. Left: matrix of presences (coloured) and absences (light grey) of the kinetochore, SAC and APC/C proteins that were present in LECA. Top: subunit names; single letters (A-X) denote centromere proteins A-X (e.g., CenpA) and numbers denote APC/C subunits 1-15 (e.g., Apc1). E2S and E2C refer to the E2 ubiquitin conjugases S and C, respectively. Colour schemes correspond to the kinetochore overview on the right and to those used in Figure 3. Right: cartoon of the components of the kinetochore, SAC signalling, the APC/C and its substrates (Cyclin A/B) in LECA and in Carpediemonas species; lost components are shaded light grey. Blue lines indicate proteins that are part of the MCC. Asterisk: Apc10 has three paralogs in C. membranifera and two in C. frisia. One is the canonical Apc10; the other(s) are fused to a BTB-Kelch protein whose closest homolog is likely an adapter for the E3 ubiquitin ligase Cullin 3.

Supplementary Fig. 7: Carpediemonas harbours three different types of histone H3 proteins, including a centromere-specific variant (CenpA). Multiple sequence alignment of histone H3 variants in eukaryotes and metamonads, with the secondary structure of canonical human H3 (PDB: 6ESF_A). CenpA orthologs are characterized by extended amino and carboxy termini and a large L1 loop. Red names in the CenpA panel indicate species in which centromere/kinetochore localization has been confirmed. In addition to CenpA and canonical histone H3, multiple eukaryotes, including C. membranifera and C. frisia, harbour other divergent H3 variants. Such divergent variants make the annotation of histone H3 homologs ambiguous (see asterisks; incomplete sequences). Multiple sequence alignments were visualized with Jalview⁷², using the Clustal colour scheme. Asterisks indicate two potential CenpA candidates in T. vaginalis.

Supplementary Fig. 8: Likely presence of SAC signalling in Carpediemonas. (a) Short linear motifs form the basis of SAC signalling. During prometaphase, unattached kinetochores catalyse the production of an inhibitor of the cell-cycle machinery, a phenomenon known as the spindle assembly checkpoint (SAC)⁷³. (I) The main protein scaffold of SAC signalling is the kinase MadBub (paralogs Mad3/Bub1 exist in eukaryotes), which consists of many short linear motifs (SLiMs) that mediate the interactions between SAC components and the APC/C (light blue)^74,75. MadBub itself is recruited to the kinetochore through interaction with Bub3 (via the GLEBS motif), which in turn binds repeated phosphomotifs in Knl1^76-78. The CDI or CMI motif helps recruit Mad1^79-81, which has a Mad2-interaction motif (MIM) that mediates the kinetochore-dependent conversion of open Mad2 to closed Mad2⁸². (II) Mad2, MadBub, Bub3 and two Cdc20 molecules (the APC/C co-activator) form the mitotic checkpoint complex (MCC), which blocks the APC/C^75,83,84. MadBub contains three different APC/C degrons (D-box, KEN box and ABBA motif)⁷⁴ that direct its interaction with the two Cdc20 molecules and effectively make the MCC a pseudosubstrate of the APC/C. (III) As kinetochore-microtubule attachments increase, MCC production at kinetochores is silenced and the APC/C is released. Cdc20 then presents its substrates Cyclin A and Cyclin B (some eukaryotes have additional substrates, but these are not universally conserved) for ubiquitination and subsequent degradation through recognition of a D-box motif⁸⁵. Chromosome segregation then begins (anaphase). (b) Presence/absence matrix of motifs involved in SAC signalling in a selection of eukaryotes and metamonads, including C. membranifera and C. frisia. Colours correspond to the motifs in panel a; light grey indicates motif loss. N is the number of MadBub homologs present in each species. ‘Incomplete’ marks sequences that are incomplete due to gaps in the genome assembly. Question marks indicate uncertainty about the presence of a particular motif. Although metamonads have all four MCC components (Mad2, Bub3, MadBub and Cdc20), most homologs lack the motifs needed to elicit canonical SAC signalling, so these species likely lack a SAC response. The exceptions are C. membranifera, C. frisia and Kipferlia bialata, which retain the N-terminal KEN boxes and one ABBA motif involved in binding two Cdc20 molecules, as well as a Mad2-interaction motif (MIM) in Mad1 and Cdc20. (c) Multiple sequence alignments of the motifs from panels a and b. Coloured motif boxes correspond to panels a and b. Multiple sequence alignments were visualized with Jalview⁷², using the Clustal colouring scheme. Asterisks indicate ambiguous motifs in Carpediemonas membranifera.

Supplementary Fig. 9: Histogram of the frequency distribution of single-nucleotide variants in the C. membranifera genome. The diagram shows the distribution typical of a haploid genome.

Supplementary Fig. 10: Maximum-likelihood reconstruction of Endonuclease IV. The unrooted tree contains eukaryotic and prokaryotic Endo IV sequences and shows Carpediemonas sequences emerging within bacterial proteins. The tree was inferred with IQ-TREE under the LG+I+C20 model with 1000 ultrafast bootstraps; the alignment length was 276 sites. The scale bar shows the inferred number of amino acid substitutions per site.

Supplementary Fig. 11: Maximum-likelihood reconstruction of RarA. The unrooted tree contains eukaryotic and prokaryotic sequences and shows Carpediemonas sequences emerging within bacterial proteins. The tree was inferred with IQ-TREE under the LG+I+C20 model with 1000 ultrafast bootstraps; the alignment length was 414 sites. The scale bar shows the inferred number of amino acid substitutions per site.

Supplementary Fig. 12: Maximum-likelihood reconstruction of RNase H1. Carpediemonas sequences emerge within bacterial proteins. The placement of the Parabasalia and Diplomonada proteins indicates that these proteins were acquired in separate events. The tree was inferred with IQ-TREE under the LG+I+G+C20 model with 1000 ultrafast bootstraps; the alignment length was 149 sites. The scale bar shows the inferred number of amino acid substitutions per site.
