Excessive parallelism in protein evolution of Lake Baikal amphipod species flock
Cite this dataset
Burskaia, Valentina (2020). Excessive parallelism in protein evolution of Lake Baikal amphipod species flock [Dataset]. Dryad. https://doi.org/10.5061/dryad.05qfttdzp
Abstract
Methods
Transcriptomic analysis:
We used the transcriptomic sequences of closely related gammarid species from Lake Baikal (Naumenko et al. 2017). Of the 67 species analyzed in that work, we selected the 47 species for which the sequenced sample was based on exactly one individual. Orthologous groups of genes were inferred with OrthoMCL 2.0.9 with the inflation parameter set to 1.5 (Li 2003). If a species carried multiple paralogous sequences of a gene, that species was excluded from the analysis of that gene. Codon-aware alignments for orthogroups were obtained with TranslatorX (Abascal et al. 2010) using the MUSCLE method (Edgar 2004). Poorly aligned sequences were detected and removed from the alignments using the following rule:
1) A column in an alignment was considered "good" if it carried the same nucleotide in at least 50% of species;
2) Sequences for which fewer than 50% of positions were "good" were removed from the alignment.
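The two-step rule above can be sketched in Python as follows. This is only an illustrative reading of the rule as worded, not a reimplementation of any particular trimming tool. It rests on assumptions that are not stated in the source: gaps and ambiguous characters never count as nucleotides, the 50% share is computed over all sequences in the alignment, and a position counts as "good" for a given sequence when that sequence carries a nucleotide shared by at least 50% of species in that column. The function names and the dict-based alignment format are hypothetical.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")  # assumption: gaps and ambiguity codes are never "good"


def good_fraction(seq, column_counts, n_seqs, min_col_share=0.5):
    """Fraction of positions where seq carries a nucleotide shared by
    at least min_col_share of all sequences in that column."""
    if not seq:
        return 0.0
    good = 0
    for base, counts in zip(seq, column_counts):
        if base in NUCLEOTIDES and counts[base] / n_seqs >= min_col_share:
            good += 1
    return good / len(seq)


def filter_alignment(alignment, min_col_share=0.5, min_good_share=0.5):
    """Drop sequences with fewer than min_good_share "good" positions.

    alignment: dict mapping species name -> aligned sequence (equal lengths).
    Returns a new dict containing only the retained sequences.
    """
    if not alignment:
        return {}
    seqs = {name: s.upper() for name, s in alignment.items()}
    if len({len(s) for s in seqs.values()}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(seqs)
    # One Counter per alignment column, counting every character seen there.
    column_counts = [Counter(col) for col in zip(*seqs.values())]
    return {
        name: alignment[name]
        for name, s in seqs.items()
        if good_fraction(s, column_counts, n_seqs, min_col_share) >= min_good_share
    }
```

With a toy alignment of three well-aligned sequences and one divergent, gap-rich sequence, only the divergent sequence falls below the 50% threshold and is dropped.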
This exclusion process was performed using TrimAl 1.4 (Capella-Gutierrez et al. 2009). After filtering, we obtained 4366 orthologous groups of genes. Alignments for all genes were concatenated, and a phylogenetic tree was reconstructed using RAxML 8.1.20 (Stamatakis 2014) with the GTR+Gamma model, 20 starting maximum parsimony trees and 100 bootstrap pseudoreplicates. Because mutations in the third positions of codons are often synonymous, third positions accumulate substitutions faster than the first two. Therefore, we used partitioning, with separate substitution models for the first two and for the third codon positions. The resulting tree was similar to the one obtained previously.
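As an illustration of the codon-position partitioning, the sketch below builds the two lines of a RAxML-style partition file for a concatenated alignment. It assumes that every gene alignment is codon-aware, so that its length is a multiple of three and the reading frame is preserved across the concatenation. The partition names pos12 and pos3 and the function name are hypothetical, and the partition file actually used by the authors is not given in the source.

```python
def codon_partitions(gene_lengths):
    """Return RAxML-style partition lines for a concatenated codon alignment.

    gene_lengths: alignment lengths (in nucleotides) of the concatenated genes.
    Positions 1 and 2 of every codon share one partition; position 3 gets another.
    """
    for i, length in enumerate(gene_lengths):
        if length <= 0 or length % 3 != 0:
            raise ValueError(
                f"gene {i} has length {length}, which is not a positive multiple of 3"
            )
    total = sum(gene_lengths)
    # "start-end\3" selects every third site beginning at "start".
    return [
        f"DNA, pos12 = 1-{total}\\3, 2-{total}\\3",
        f"DNA, pos3 = 3-{total}\\3",
    ]


if __name__ == "__main__":
    # Two hypothetical genes of 300 and 600 aligned nucleotides.
    print("\n".join(codon_partitions([300, 600])))
```

Checking that each gene length is a multiple of three guards against a frame shift in one gene silently mixing codon positions in all downstream genes.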
Sanger sequencing:
Purified PCR products were sequenced bidirectionally on an ABI 3500 Genetic Analyzer (Applied Biosystems) using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) and the same primers as for PCR.
Usage notes
The README.txt file describes the provided data.