Data from: Viral tagging reveals discrete populations in Synechococcus viral genome sequence space
Data files
Version of Apr 16, 2015 (files total 3.61 GB)
- ANI_2_PCA.txt
- Comm_MG.fna
- ConsensusCGs.zip
- DATA-FIGURES_Replace.xls
- GP23_Sequences.txt
- RandomizationsX1500.FNA
- RAREFACTION.zip
- VT_MG_IL.fastq
- VT_MG.fna
Abstract
Microbes and their viruses drive myriad processes across ecosystems ranging from oceans and soils to bioreactors and humans. Despite this importance, microbial diversity is only now being mapped at scales relevant to nature, while the viral diversity associated with any particular host remains little researched. Here we quantify host-associated viral diversity using viral-tagged metagenomics, which links viruses to specific host cells for high-throughput screening and sequencing. In a single experiment, we screened ~10^7 Pacific Ocean viruses against a single strain of Synechococcus and found that naturally occurring cyanophage genome sequence space is statistically clustered into discrete populations. These population-based, host-linked viral ecological data suggest that, for this single host and seawater sample alone, there are at least 26 double-stranded DNA viral populations with estimated relative abundances ranging from 0.06 to 18.2%. These populations include previously cultivated cyanophages and new viral types missed by decades of isolate-based studies. Nucleotide identities of homologous genes mostly varied by less than 1% within populations, even in hypervariable genome regions, and by 42–71% between populations, which provides benchmarks for viral metagenomics and genome-based viral species definitions. Together these findings showcase a new approach to viral ecology that quantitatively links objectively defined environmental viral populations, and their genomes, to their hosts.
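The within- and between-population comparisons above are stated in terms of nucleotide identity between homologous genes. As a minimal illustration of that metric only (not the authors' pipeline, which this page does not describe), percent identity for a pair of already-aligned sequences can be taken as matching columns divided by compared columns. The function names `percent_identity` and `percent_difference`, and the convention of skipping gapped columns, are hypothetical choices made for this sketch.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent nucleotide identity between two pre-aligned sequences.

    Assumption (hypothetical convention): both strings come from the same
    alignment, so they have equal length and use '-' for gaps. Columns where
    either sequence has a gap are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped alignment columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def percent_difference(seq_a: str, seq_b: str) -> float:
    """How much two aligned sequences vary: the complement of percent identity."""
    return 100.0 - percent_identity(seq_a, seq_b)


if __name__ == "__main__":
    # Five ungapped columns, four matches -> 80% identity, 20% difference.
    print(percent_identity("ACGT-A", "ACGTTT"))
    print(percent_difference("ACGT-A", "ACGTTT"))
```

Under this convention, two 100-nt homologs differing at a single position would differ by 1%, the scale reported within populations. Real comparisons would first need a pairwise or multiple alignment, and other gap-handling conventions give different numbers.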