
Excessive parallelism in protein evolution of Lake Baikal amphipod species flock

Cite this dataset

Burskaia, Valentina (2020). Excessive parallelism in protein evolution of Lake Baikal amphipod species flock [Dataset]. Dryad.


Repeated emergence of similar adaptations is often explained by parallel evolution of underlying genes. However, evidence of parallel evolution at the amino acid level is limited. When the analyzed species are highly divergent, this can be due to epistatic interactions underlying the dynamic nature of the amino acid preferences: the same amino acid substitution may have different phenotypic effects on different genetic backgrounds. Distantly related species also often inhabit radically different environments, which makes the emergence of parallel adaptations less likely. Here, we hypothesize that parallel molecular adaptations are more prevalent between closely related species. We analyze the rate of parallel evolution in genome-size sets of orthologous genes in three groups of species with widely ranging levels of divergence: 47 species of the relatively recent Lake Baikal amphipod radiation, a species flock of very closely related cichlids, and a set of significantly more divergent vertebrates. In genes of amphipods, the rate of parallel substitutions at nonsynonymous sites exceeded that at synonymous sites, suggesting rampant selection driving parallel adaptation. By contrast, in cichlids, the rate of nonsynonymous parallel evolution nearly equalled that at synonymous sites, while in vertebrates, this rate was lower than that at synonymous sites, indicating the role of drift in fixation of parallel substitutions. Further data are needed to clarify the cause of the excessive parallelism observed in gammarids.


Transcriptomic analysis:

We used the transcriptomic sequences of closely related gammarid species from Lake Baikal (Naumenko et al. 2017). Of the 67 species analyzed in that work, we picked the 47 species for which the sequenced sample was based on exactly one individual. Orthologous groups of genes were inferred with OrthoMCL 2.0.9 with the inflation parameter set to 1.5 (Li 2003). If a particular species carried multiple paralogous sequences of a gene, that species was excluded from the analysis of that gene. Codon-aware alignments for orthogroups were obtained with TranslatorX (Abascal et al. 2010) using the MUSCLE method (Edgar 2004). Poorly aligned sequences were detected and removed from the alignments using the following rule:

1) A column in an alignment was considered "good" if it carried the same nucleotide in at least 50% of species;

2) Sequences for which fewer than 50% of positions were "good" were removed from the alignment.
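The two-step rule above can be illustrated with a minimal Python sketch. It is not the trimAl implementation, and several details are assumptions made only for illustration: the function names are hypothetical, a position in a sequence is counted as "good" when it lies in a "good" column and carries that column's majority nucleotide, the fraction is taken over the sequence's non-gap positions, and the filter is applied in a single pass.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")


def majority_nucleotides(alignment, min_share=0.5):
    """Return, per column, the majority nucleotide if the column is "good", else None.

    A column is "good" when one nucleotide is present in at least
    `min_share` of all species in the alignment (rule 1).
    """
    n_species = len(alignment)
    length = len(next(iter(alignment.values())))
    majority = []
    for i in range(length):
        counts = Counter(
            seq[i].upper()
            for seq in alignment.values()
            if seq[i].upper() in NUCLEOTIDES
        )
        if counts:
            nucleotide, count = counts.most_common(1)[0]
        else:
            nucleotide, count = None, 0
        majority.append(nucleotide if count >= min_share * n_species else None)
    return majority


def filter_alignment(alignment, min_share=0.5, min_good=0.5):
    """Drop sequences with fewer than `min_good` "good" positions (rule 2).

    Assumption: a position is "good" for a sequence if the column is "good"
    and the sequence carries the column's majority nucleotide.
    """
    majority = majority_nucleotides(alignment, min_share)
    kept = {}
    for name, seq in alignment.items():
        # Only non-gap, unambiguous nucleotides are considered.
        sites = [i for i, c in enumerate(seq) if c.upper() in NUCLEOTIDES]
        if not sites:
            continue
        good = sum(1 for i in sites if majority[i] == seq[i].upper())
        if good / len(sites) >= min_good:
            kept[name] = seq
    return kept


toy = {
    "sp1": "ATGCCA",
    "sp2": "ATGCCA",
    "sp3": "ATGCCT",
    "sp4": "TCATG-",  # looks misaligned relative to the other three
}
kept = filter_alignment(toy)
```

In this toy alignment every column is "good", but `sp4` matches the majority nucleotide at none of its five non-gap positions, so it is the only sequence dropped.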

This exclusion process was performed using trimAl 1.4 (Capella-Gutierrez et al. 2009). In total, we obtained 4366 orthologous groups of genes. Alignments for all genes were concatenated, and a phylogenetic tree was reconstructed using RAxML 8.1.20 (Stamatakis 2014) with the GTR+Gamma model, 20 starting maximum parsimony trees, and 100 bootstrap pseudoreplicates. Because mutations at the third positions of codons are often synonymous, third positions accumulate substitutions faster than the first two. Therefore, we used partitioning, with separate substitution models for the first two codon positions and for the third codon position. The resulting tree was similar to that obtained previously.
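The codon-position partitioning described above can be expressed in a RAxML-style partition file. The sketch below generates one for a concatenated codon alignment. The function name and the partition labels `pos12` and `pos3` are hypothetical, and the source does not give the actual partition file or command line, so this is only one plausible way to encode the scheme.

```python
def codon_partitions(alignment_length):
    """Build a two-partition scheme: codon positions 1+2 versus position 3.

    Assumes the concatenated alignment is in frame, so its length
    must be a multiple of 3.
    """
    if alignment_length <= 0 or alignment_length % 3 != 0:
        raise ValueError("alignment length must be a positive multiple of 3")
    n = alignment_length
    # "start-end\3" selects every third site beginning at `start`.
    return (
        f"DNA, pos12 = 1-{n}\\3, 2-{n}\\3\n"
        f"DNA, pos3 = 3-{n}\\3\n"
    )


partition_text = codon_partitions(9)
```

A file with this content would typically be passed to RAxML through its `-q` option, so that each partition gets its own substitution model parameters.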

Sanger sequencing:

Purified PCR products were bidirectionally sequenced on an ABI 3500 Genetic Analyzer (Applied Biosystems) using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) and the same primers as for PCR.

Usage notes

The README.txt file contains a description of the provided data.