Supporting trees and alignments for the publication: Cryptic and abundant marine viruses at the evolutionary origins of Earth’s RNA virome
Wainaina, James et al. (2022), Supporting trees and alignments for the publication: Cryptic and abundant marine viruses at the evolutionary origins of Earth’s RNA virome, Dryad, Dataset, https://doi.org/10.5061/dryad.gb5mkkwqx
Whereas DNA viruses are known to be abundant, diverse, and commonly key ecosystem players, RNA viruses are relatively understudied outside disease settings. Here, we analyzed ≈28 terabases of Global Ocean RNA sequences to expand Earth’s RNA virus catalogues and their taxonomy, investigate their evolutionary origins, and assess their marine biogeography from pole to pole. Using new approaches to optimize discovery and classification, we identified RNA viruses that necessitate substantive revisions of taxonomy (doubling phyla and adding >50% new classes) and evolutionary understanding. “Species”-rank abundance determination revealed that viruses of new phyla “Taraviricota”, a missing link in early RNA virus evolution, and “Arctiviricota” are widespread and dominant in the oceans. These efforts provide foundational knowledge critical to integrating RNA viruses into ecological and epidemiological models.
To generate the global phylum-level phylogenetic tree, we combined consensus sequences [an approach used for highly divergent sequences (Grandi et al., 2020; Grandi et al., 2018; Vargiu et al., 2016; Chen et al., 2015; Fernandez-Caso et al., 2019; Alipour et al., 2013; Zhang and Firestein, 2002)] with individual sequences in a single alignment. Each consensus sequence was generated by first aligning the individual sequences within a megataxon and then deriving the consensus of that alignment using Geneious v8.1.9 (https://www.geneious.com). The number of ambiguous residues (i.e., ‘X’s) in each consensus sequence was then determined, and any consensus sequence composed of >20% ambiguous sites was replaced by the individual sequences of its megataxon to preserve the quality of the alignment (Wiens, 2006). Almost all of the new megataxa had >20% ambiguous sites; hence, for consistency, all of them were represented by their individual sequences. Subsequent alignment, trimming, and phylogenetic inference were as described above, the only modification being the use of the -gappyout option during trimming.
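The consensus-replacement rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function names, the dictionary layout, and the `_consensus` label suffix are hypothetical, and it assumes the ambiguous fraction is the number of ‘X’ characters divided by the full length of the consensus sequence (the text does not state how gaps were counted).

```python
from typing import Dict

# Threshold taken from the text: consensus sequences with >20% ambiguous
# sites are replaced by the individual sequences of the megataxon.
AMBIGUITY_THRESHOLD = 0.20


def ambiguous_fraction(consensus: str) -> float:
    """Return the fraction of 'X' residues in a consensus sequence.

    Assumption: the denominator is the full sequence length, gaps included.
    """
    if not consensus:
        raise ValueError("empty consensus sequence")
    return consensus.upper().count("X") / len(consensus)


def select_representatives(
    consensus_by_taxon: Dict[str, str],
    members_by_taxon: Dict[str, Dict[str, str]],
    threshold: float = AMBIGUITY_THRESHOLD,
) -> Dict[str, str]:
    """Choose, per megataxon, either its consensus or its individual sequences.

    consensus_by_taxon maps a megataxon name to its consensus sequence.
    members_by_taxon maps a megataxon name to {sequence name: sequence}.
    Both layouts are hypothetical and chosen only for this illustration.
    """
    selected: Dict[str, str] = {}
    for taxon, consensus in consensus_by_taxon.items():
        if ambiguous_fraction(consensus) > threshold:
            # Too ambiguous: fall back to the individual member sequences.
            selected.update(members_by_taxon[taxon])
        else:
            # Consensus is informative enough to represent the megataxon.
            selected[f"{taxon}_consensus"] = consensus
    return selected
```

With this sketch, a consensus such as "MKXLA" (1 of 5 sites ambiguous, exactly 20%) would be kept, whereas "MXXLA" (40% ambiguous) would be replaced by the member sequences of its megataxon.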
- K. Katoh, D. M. Standley, MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
- S. Capella-Gutiérrez, J. M. Silla-Martínez, T. Gabaldón, trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 25, 1972–1973 (2009).
- M. F. Boni, D. Posada, M. W. Feldman, An exact nonparametric method for inferring mosaic structure in sequence triplets. Genetics. 176, 1035–1047 (2007).
- S. Kalyaanamoorthy, B. Q. Minh, T. K. F. Wong, A. von Haeseler, L. S. Jermiin, ModelFinder: fast model selection for accurate phylogenetic estimates. Nature Methods. 14, 587–589 (2017).
- L.-T. Nguyen, H. A. Schmidt, A. von Haeseler, B. Q. Minh, IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
- N. Grandi, M. P. Pisano, M. Demurtas, J. Blomberg, G. Magiorkinis, J. Mayer, E. Tramontano, Identification and characterization of ERV-W-like sequences in Platyrrhini species provides new insights into the evolutionary history of ERV-W in primates. Mob. DNA. 11, 6 (2020).
- N. Grandi, M. Cadeddu, J. Blomberg, J. Mayer, E. Tramontano, HERV-W group evolutionary history in non-human primates: characterization of ERV-W orthologs in Catarrhini and related ERV groups in Platyrrhini. BMC Evol. Biol. 18, 6 (2018).
- L. Vargiu, P. Rodriguez-Tomé, G. O. Sperber, M. Cadeddu, N. Grandi, V. Blikstad, E. Tramontano, J. Blomberg, Classification and characterization of human endogenous retroviruses; mosaic forms are common. Retrovirology. 13, 7 (2016).
- M. Chen, Y. Ma, C. Yang, L. Yang, H. Chen, L. Dong, J. Dai, M. Jia, L. Lu, The combination of phylogenetic analysis with epidemiological and serological data to track HIV-1 transmission in a sexual transmission case. PLoS One. 10 (2015).
- B. Fernández-Caso, J. Á. Fernández-Caballero, N. Chueca, E. Rojo, A. de Salazar, L. García Buey, L. Cardeñoso, F. García, Infection with multiple hepatitis C virus genotypes detected using commercial tests should be confirmed using next generation sequencing. Sci. Rep. 9, 9264 (2019).
- A. Alipour, S. Tsuchimoto, H. Sakai, N. Ohmido, K. Fukui, Structural characterization of copia-type retrotransposons leads to insights into the marker development in a biofuel crop, Jatropha curcas L. Biotechnol. Biofuels. 6 (2013).
- X. Zhang, S. Firestein, The olfactory receptor gene superfamily of the mouse. Nat. Neurosci. 5 (2002).
- J. J. Wiens, Missing data and the design of phylogenetic analyses. J. Biomed. Inform. 39 (2006).
Gordon and Betty Moore Foundation, Award: 3790
National Science Foundation, Award: OCE#1829831
The Ohio Supercomputer and Ohio State University’s Center of Microbiome Science
Ramon-Areces Foundation Postdoctoral Fellowship
Laulima Government Solutions, LLC prime contract with the U.S. National Institute of Allergy and Infectious Diseases (NIAID), Award: HHSN272201800013C
National Science Foundation, Award: ABI#1759874
National Science Foundation, Award: DBI#2022070