Comment on “Ancient origins of allosteric activation in a Ser-Thr kinase”

Cite this dataset

Patton, Jaeda; Park, Yeonwoo; Hochberg, Georg; Thornton, Joseph (2020). Comment on “Ancient origins of allosteric activation in a Ser-Thr kinase” [Dataset]. Dryad. https://doi.org/10.5061/dryad.cvdncjt2b

Abstract

Hadzipasic et al. used ancestral sequence reconstruction to identify historical sequence substitutions that putatively caused Aurora kinases to evolve allosteric regulation. We show that their results arise from an implausible phylogeny and sparse sequence sampling. Addressing either problem reverses their inferences: allostery and the amino acids that confer it were not gained during the diversification of eukaryotes but were lost in a subgroup of Fungi.

Methods

Ancestral sequence reconstruction using Hadzipasic et al. sequences under congruence constraint (Fig. 1E, G)

We acquired the AURK and PLK sequences analyzed by Hadzipasic et al. We aligned them using MUSCLE (v3.8.425) (1), removed sequence-specific insertions and ambiguously aligned sites, and trimmed the N- and C-termini, matching the sequence boundaries set by Hadzipasic et al. We used RAxML (v8.2.12) (2) to infer the constrained ML phylogeny, imposing the constraint shown in Fig. 1B, and used PAML (v4.8) (3) to perform ASR. For both phylogenetics and ASR, we used the same model of sequence evolution as used by Hadzipasic et al. (LG + G + X, four gamma rate categories).

Phylogenetics and ASR with improved sequence sampling (Fig. 2)

To obtain a broad sample of eukaryotic AURK and PLK sequences, we used a reciprocal best-hit protein BLAST strategy against the NCBI protein database (4), with human AURKA and PLK4 as query sequences. We conducted taxonomically restricted BLAST searches that together encompassed all species within the five major eukaryotic kingdoms/subkingdoms (Fungi, Holozoa, Amoebozoa, Archaeplastida, and SAR). BLAST hits of anomalous length (<250 or >600 amino acids for AURK, <250 or >1000 amino acids for PLK) were discarded. Redundant sequences were eliminated at a similarity cutoff of 0.85 using CD-HIT (v4.8.1) (5). Each remaining BLAST hit was then used as the query in a reciprocal BLAST search against human proteins, and all sequences for which the best hit in humans was an AURK or PLK were retained.
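The length and reciprocal best-hit filters described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout (`id`, `family`, `sequence`, `best_human_hit`) and the function names are hypothetical, the list of human gene symbols is an assumption about how a best hit would be recognised as an AURK or PLK, and the CD-HIT redundancy step is omitted.

```python
# Length bounds in amino acids, as stated in the text (hits outside them are discarded).
LENGTH_BOUNDS = {"AURK": (250, 600), "PLK": (250, 1000)}

# Assumption: human gene symbols that count as "an AURK or PLK" for the reciprocal check.
HUMAN_TARGETS = {"AURKA", "AURKB", "AURKC", "PLK1", "PLK2", "PLK3", "PLK4", "PLK5"}


def has_plausible_length(family, sequence):
    """Keep a hit only if its length lies within the bounds for its family."""
    low, high = LENGTH_BOUNDS[family]
    return low <= len(sequence) <= high


def passes_reciprocal_check(best_human_hit):
    """Keep a hit only if its best hit among human proteins is an AURK or PLK."""
    return best_human_hit in HUMAN_TARGETS


def filter_hits(hits):
    """Return the ids of hits that pass both filters.

    `hits` is a list of dicts with the hypothetical keys
    'id', 'family', 'sequence' and 'best_human_hit'.
    """
    return [
        hit["id"]
        for hit in hits
        if has_plausible_length(hit["family"], hit["sequence"])
        and passes_reciprocal_check(hit["best_human_hit"])
    ]
```

In practice the `best_human_hit` field would come from parsing the reciprocal BLAST output, which is not shown here.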

Sequence alignment of these hits was performed hierarchically using MUSCLE software. We first aligned sequences from within defined profile groups of species (each usually a superphylum or phylum). We trimmed the N- and C-termini, leaving sites corresponding to human AURKA sites 133 to 383, and then removed sites representing species-specific insertions and ambiguously aligned sites. We discarded sequences that were missing 10 or more consecutive amino acids present in the majority of other sequences. We then inferred the phylogeny of the profile group using FastTree (v2.1.11) (6). To minimize long branch attraction, we removed all sequences or groups of sequences subtended by branches of length >0.5. We also removed sequences/small groups that were assigned to entirely different phyla (e.g., annelid sequences placed inside the molluscs, or green algae sequences placed inside land plants), as well as taxon-specific paralogs with long branches that were pulled outside of the entire profile group being aligned. We then used profile-profile alignment in MUSCLE to progressively align the group-specific alignments to each other, yielding a global AURK/PLK alignment.
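One of the filters above, discarding sequences that lack 10 or more consecutive residues present in most other sequences, might look like the sketch below. It assumes an alignment held as a dict of equal-length strings with `-` for gaps, and it makes one interpretive choice not stated in the text: columns that are not occupied in a majority of the other sequences are ignored rather than treated as breaking a run. The function names are hypothetical.

```python
def longest_missing_run(sequence, others):
    """Longest run of gaps in `sequence` over columns occupied in most of `others`."""
    run = best = 0
    for column, residue in enumerate(sequence):
        occupied = sum(1 for other in others if other[column] != "-")
        if occupied * 2 <= len(others):
            continue  # column is not majority-occupied: ignored (assumption)
        if residue == "-":
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def drop_gappy_sequences(alignment, max_run=10):
    """Return the alignment without sequences missing `max_run` or more consecutive residues."""
    kept = {}
    for name, sequence in alignment.items():
        others = [s for other_name, s in alignment.items() if other_name != name]
        if longest_missing_run(sequence, others) < max_run:
            kept[name] = sequence
    return kept
```

The branch-length and misplacement filters described in the same paragraph depend on the inferred tree and are not covered by this sketch.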

We used RAxML to infer the ML AURK-PLK phylogeny from this global alignment, using the best-fit model of evolution (LG + G + X). For all RAxML analyses, we iterated the topology search 50 times using different random number seeds and chose the iteration with the highest likelihood. On the ML phylogeny, AURKs from a few lower-level groups of Ecdysozoa and Platyhelminthes subtended by long branches were placed in kingdoms other than the animals; drastic long-branch misplacements also moved a few small groups of AURKs from Fungi and Alveolates into other kingdoms/superphyla and affected some PLK sequences. These sequences were removed to yield the final alignment, and the analysis was repeated to infer the final ML phylogeny. An approximate likelihood ratio test was performed using PhyML (v3.3) (7). For the maximum congruence constraint analysis, we imposed the topological constraint shown in Fig. 2A and used RAxML to find the ML tree, branch lengths, and other parameters given this constraint. We used a similar approach to find the ML tree consistent with the Fungi-out constraint (the same constraint as in Fig. 2A, except that Fungi are the most basally branching group). Ancestral sequences were inferred using the marginal reconstruction algorithm in PAML, using LG + G and the amino acid frequencies inferred on the ML tree by RAxML.
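Choosing among the 50 search iterations amounts to taking the run with the highest log-likelihood. A minimal sketch, assuming the per-run log-likelihoods have already been collected into a dict keyed by a run label (the function name, labels, and values are hypothetical, not results from this dataset):

```python
def best_search(log_likelihoods):
    """Return (run_label, lnL) for the search iteration with the highest log-likelihood."""
    if not log_likelihoods:
        raise ValueError("no search iterations supplied")
    best_label = max(log_likelihoods, key=log_likelihoods.get)
    return best_label, log_likelihoods[best_label]


# Illustrative values only.
example_runs = {"seed_1": -1050.4, "seed_2": -1042.9, "seed_3": -1047.0}
best_label, best_lnl = best_search(example_runs)
```

Because log-likelihoods are negative, the highest value is the one closest to zero.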

The Shimodaira-Hasegawa (SH) test was used to evaluate relative support for the ML vs. MC trees. We used the R package phangorn to execute the SH test (8), comparing the ML tree found in the unconstrained ML search to the ML tree from the MC-constrained search (lnL = -121973.5 and -121992.0, respectively); this returned a nonsignificant result (p-value = 0.36). Because the heuristic searches may not have identified the globally optimal tree in each case, we also compared the tree from the search iteration with the highest likelihood in the unconstrained ML analysis to the tree recovered from the iteration with the lowest likelihood in the MC-constrained analysis (lnL = -122032.7, p-value = 0.21).
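For orientation, the log-likelihood differences underlying the two comparisons can be computed directly from the reported values. This is only arithmetic on the numbers given above; it does not reproduce the SH test, which assesses such a difference against a resampling-based null distribution. The variable and function names are hypothetical.

```python
# Log-likelihoods reported in the text.
LNL_ML = -121973.5        # best tree from the unconstrained ML search
LNL_MC_BEST = -121992.0   # best tree from the MC-constrained search
LNL_MC_WORST = -122032.7  # lowest-likelihood iteration of the MC-constrained search


def lnl_difference(lnl_a, lnl_b):
    """Difference in log-likelihood (a minus b), rounded to the reported precision."""
    return round(lnl_a - lnl_b, 1)


delta_best = lnl_difference(LNL_ML, LNL_MC_BEST)    # 18.5 log-likelihood units
delta_worst = lnl_difference(LNL_ML, LNL_MC_WORST)  # 59.2 log-likelihood units
```

Neither difference was significant under the SH test as reported (p-values of 0.36 and 0.21).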

Usage notes

See File_descriptions.xlsx

Funding

Howard Hughes Medical Institute

United States Department of Energy, Award: DE-FG02-05ER15699

Office of the Director, Award: GM100966

Damon Runyon Cancer Research Foundation, Award: DRG-2343-18