Wide spectrum and high frequency of genomic structural variation, including transposable elements, in large double stranded DNA viruses
Data files
Dec 18, 2019 version files 9.14 MB
- AcMNPV_genome.csv (9.39 KB)
- AcMNPV_genome.fas (136.23 KB)
- HCMV_genome.csv (14.53 KB)
- HCMV_genome.fas (234.93 KB)
- IIV31_genome.csv (14.71 KB)
- IIV31_genome.fas (219.82 KB)
- IIV6_genome.csv (23.92 KB)
- IIV6_genome.fas (210.83 KB)
- Supplementary_Figures_09_09.docx (7.77 MB)
- Supplementary_Table_S1.xlsx (161.62 KB)
- Supplementary_Table_S3.docx (13.39 KB)
- Suppplementary_Table_S2.docx (15.38 KB)
- Suppplementary_Table_S4.xlsx (84.66 KB)
- Suppplementary_Table_S5.xlsx (123.68 KB)
- Suppplementary_Table_S6.xlsx (57 KB)
- SV_detection_script_in_short_and_long_reads.R (50.11 KB)
Abstract
Our knowledge of the diversity and frequency of genomic structural variation segregating in populations of large double-stranded (ds) DNA viruses is limited. Here we sequenced the genome of a baculovirus (AcMNPV) purified from beet armyworm (Spodoptera exigua) larvae at depths >195,000X using both short-read (Illumina) and long-read (PacBio) technologies. Using a pipeline relying on hierarchical clustering of structural variants (SVs) detected in individual short and long reads by six variant callers, we identified a total of 1,141 SVs in AcMNPV, including 464 deletions, 443 inversions, 160 duplications and 74 insertions. These variants are considered robust and unlikely to result from technical artifacts because they were independently detected in at least three long reads as well as at least three short reads. SVs are distributed along the entire AcMNPV genome and may involve large genomic regions (30,496 bp on average). We show that no less than 39.9% of genomes carry at least one SV in AcMNPV populations, that the vast majority of SVs (75%) segregate at very low frequency (<0.01%) and that very few SVs persist after 10 replication cycles, consistent with a negative impact of most SVs on AcMNPV fitness. Using short-read sequencing datasets, we then show that populations of two iridoviruses and one herpesvirus are also full of SVs, as they contain between 426 and 1,102 SVs carried by 52.4 to 80.1% of genomes. Finally, AcMNPV long reads allowed us to identify 1,757 transposable element (TE) insertions, 895 of which are truncated and occur at one extremity of the reads. This further supports the role of baculoviruses as possible vectors of horizontal transfer of TEs. Altogether, we found that SVs, which evolve mostly under rapid dynamics of gain and loss in viral populations, represent an important feature in the biology of large dsDNA viruses.
The supplementary materials include the R script used to perform the hierarchical clustering of structural variants detected by six variant callers, Figures S1-S17, Tables S1-S6, and the assemblies of the AcMNPV, IIV6, IIV31 and HCMV genomes together with their associated GFF annotation files.
The supplementary tables contain information on the genomic structural variants we have detected in populations of four large double-stranded DNA viruses: the baculovirus AcMNPV, the iridoviruses IIV6 and IIV31 and the herpesvirus HCMV. Tables S1 and S4-S6 are provided in .xlsx format and can be opened in Excel; Tables S2 and S3 are provided in .docx format.
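For readers who want an intuition for the clustering step mentioned in the abstract, the sketch below groups per-read SV calls of the same type whose breakpoints lie close together, then keeps only clusters seen in at least three long reads and at least three short reads (the support rule stated in the abstract). It is not the provided R script: the `SVCall` fields, the function names, the breakpoint distance and the 10 bp cutoff are all hypothetical choices made for illustration. The grouping is single-linkage clustering cut at a fixed height, implemented here as connected components, which may differ from the linkage used in the actual pipeline.

```python
from collections import defaultdict
from typing import NamedTuple


class SVCall(NamedTuple):
    """One SV call made in one read (hypothetical record layout)."""
    sv_type: str   # e.g. "DEL", "INV", "DUP", "INS"
    start: int     # left breakpoint on the reference genome
    end: int       # right breakpoint on the reference genome
    read_id: str
    tech: str      # "short" or "long"


def cluster_calls(calls, max_dist=10):
    """Single-linkage clustering of calls, cut at max_dist (assumed cutoff).

    Two calls are linked when they share an SV type and both breakpoints
    differ by at most max_dist bp; clusters are the connected components.
    """
    parent = list(range(len(calls)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(calls)):
        for j in range(i + 1, len(calls)):
            a, b = calls[i], calls[j]
            if a.sv_type != b.sv_type:
                continue
            if max(abs(a.start - b.start), abs(a.end - b.end)) <= max_dist:
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i, call in enumerate(calls):
        clusters[find(i)].append(call)
    return list(clusters.values())


def robust_clusters(clusters, min_reads=3):
    """Keep clusters supported by >= min_reads distinct long AND short reads."""
    kept = []
    for cluster in clusters:
        long_reads = {c.read_id for c in cluster if c.tech == "long"}
        short_reads = {c.read_id for c in cluster if c.tech == "short"}
        if len(long_reads) >= min_reads and len(short_reads) >= min_reads:
            kept.append(cluster)
    return kept


if __name__ == "__main__":
    # Toy data: one deletion seen in 3 long and 3 short reads, one lone inversion.
    calls = (
        [SVCall("DEL", 1000 + i, 5000 + i, f"L{i}", "long") for i in range(3)]
        + [SVCall("DEL", 1002 + i, 5001 + i, f"S{i}", "short") for i in range(3)]
        + [SVCall("INV", 20000, 42000, "L9", "long")]
    )
    for cluster in robust_clusters(cluster_calls(calls)):
        print(cluster[0].sv_type, len(cluster), "supporting calls")
```

The pairwise comparison is quadratic in the number of calls, which is fine for a toy example but would need a sorted or indexed approach at the sequencing depths reported here.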