Structure and dynamics of enterovirus genotype networks
Data files
Apr 30, 2024 version files 270.64 MB
- CVB3_new_allConsensus.fasta 85.66 MB
- EVA71_6h_P1_allConsensus.fasta 28.87 MB
- EVA71_6h_P3_allConsensus.fasta 21.93 MB
- EVA71_6h_P5_allConsensus.fasta 38.12 MB
- EVA71_allConsensus.fasta 51.92 MB
- EVA71_P1_annot_v2.csv 160.04 KB
- EVA71_P3_annot_v2.csv 142.93 KB
- EVA71_P5_annot_v2.csv 440.39 KB
- EVD68_allConsensus.fasta 38.86 MB
- merged_metadata_genotypes_UMAP.csv 4.52 MB
- README.md 5.74 KB
Abstract
Like all biological populations, viral populations exist as networks of genotypes connected through mutation. Mapping the topology of these networks and quantifying population dynamics across them are crucial to understanding how populations adapt to changes in their selective environment. The influence of mutational networks is especially profound in viral populations, which rapidly explore their mutational neighborhoods via high mutation rates. Using a novel single-cell sequencing method, scRNAseq-Enabled Acquisition of mRNA and Consensus Haplotypes Linking Individual Genotypes and Host Transcriptomes (SEARCHLIGHT), we captured and assembled viral haplotypes from hundreds of individual infected cells to reveal the complexity of viral populations. We obtained these genotypes in parallel with host cell transcriptome information, enabling us to link host cell transcriptional phenotypes to the genetic structures underlying virus adaptation. Our examination of these structures reveals the common evolutionary dynamics of enterovirus populations and illustrates how viral populations reach through mutational ‘tunnels’ to span evolutionary landscapes and maintain connection with multiple adaptive genotypes simultaneously.
https://doi.org/10.5061/dryad.6hdr7sr76
This data repository pertains to the publication "Structure and Dynamics of Enterovirus Genotypic Networks" by Nathânia Dábilla and Patrick Dolan, currently in revision at Science Advances.
Descriptions of the data and file structure are below. Please note: some CSV tables include blank values, exactly as produced by the Python and R scripts. These blanks are left intentionally so that the tables give reproducible results. The files are included in their original form, and empty cells are not marked with 'N/A'.
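Because the blank cells are intentional, downstream code should treat them as missing values rather than as errors. The following is a minimal sketch using only the Python standard library. The inline sample table and its values are hypothetical stand-ins for the much larger annotation files, and `read_rows` is an illustrative helper, not part of the deposited scripts.

```python
import csv
import io

# Hypothetical miniature of an annotation table. The second row leaves
# ref.codon and ref.AA blank, mimicking the intentionally empty cells.
SAMPLE = """pos,base,mutants,ref.codon,ref.AA
3005,A,G,AAA,K
120,C,T,,
"""


def read_rows(handle):
    """Yield each CSV row as a dict, mapping empty cells to None."""
    for row in csv.DictReader(handle):
        yield {key: (value if value != "" else None) for key, value in row.items()}


rows = list(read_rows(io.StringIO(SAMPLE)))
print(rows[1]["ref.codon"])  # blank cell is reported as None
```

To read one of the deposited tables instead, pass an open file handle (for example `open("EVA71_P1_annot_v2.csv", newline="")`) in place of the `io.StringIO` object.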
Correspondence can be sent to Patrick.Dolan@nih.gov
Data Description
Genotype analysis of passaged EVA71 populations
- EVA71_P3_annot_v2.csv 142.93 KB
- EVA71_P5_annot_v2.csv 440.39 KB
- EVA71_P1_annot_v2.csv 160.04 KB
Columns:
- pos: genome position of mutation
- base: reference nucleotide base
- mutants: mutant nucleotide
- CBC_ID: cell barcode identifier
- genotype: full viral genotype recovered in each cell
- BCMutCount: number of mutants relative to reference genotype
- total: total number of cells captured
- count: count of that mutation across all cells
- freq: frequency of that mutation
- ref.codon: reference codon (if applicable)
- ref.resPos: residue position of mutation (if applicable)
- ref.AA: reference residue character (if applicable)
- mut.codon: mutated codon sequence (if applicable)
- mut.resPos: mutated residue position (if applicable)
- mut.AA: mutant residue character (if applicable)
- subName: string denoting substitution name (if applicable)
- subClass: class of residue substitution, i.e. Synonymous or non-synonymous (if applicable)
- genotypeName: translated genotype identifier
- genoFreq: frequency of translated genotype in the population of cells
- haploFreq: frequency of nucleotide genotype in the population of cells
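The `total`, `count`, and `freq` columns can be related with a short sketch. It assumes that `freq` equals `count` divided by `total`, which is the natural reading of the column descriptions but is not stated explicitly here. The records and the `mutation_frequencies` helper are hypothetical and are not taken from the authors' scripts.

```python
from collections import defaultdict

# Hypothetical per-cell records: (CBC_ID, pos, base, mutants).
records = [
    ("cell1", 100, "A", "G"),
    ("cell2", 100, "A", "G"),
    ("cell2", 250, "C", "T"),
    ("cell3", 250, "C", "T"),
    ("cell4", 100, "A", "G"),
]


def mutation_frequencies(records):
    """Return count, total, and freq for each (pos, base, mutant) mutation.

    Assumption: total is the number of distinct cell barcodes seen in the
    records, and freq = count / total. Cells carrying no mutation would
    not appear in such records and would need to be counted separately.
    """
    total = len({cbc for cbc, _, _, _ in records})
    carriers = defaultdict(set)
    for cbc, pos, base, mutant in records:
        carriers[(pos, base, mutant)].add(cbc)
    return {
        mutation: {"count": len(cells), "total": total, "freq": len(cells) / total}
        for mutation, cells in carriers.items()
    }


freqs = mutation_frequencies(records)
print(freqs[(100, "A", "G")])
```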
EVA71 populations merged with the Seurat Analysis and other computed metadata.
- merged_metadata_genotypes_UMAP.csv 4.52 MB
Columns:
- index: unique index for each entry
- name: sample name (e.g. 'EVA71.P1', for EV-A71 passage 1 infected cells).
- BC: barcode sequence
- V1: Full ID, sample name, and barcode
- UMAP_1: UMAP x coordinate
- UMAP_2: UMAP y coordinate
- orig.ident: Cell Ranger sample ID
- nCount_RNA: number of reads per cell
- nFeature_RNA: number of detected genes
- percent.mt: percent mitochondrial reads
- percent.virus: percent viral reads
- percent.virus_log10: log10(percent viral reads)
- InfectedStatus_groups: infection status (i.e. Low, High, NotInfected)
- InfectedStatus: infection status (i.e. Infected, NotInfected)
- InfectedStatus_threshold: cutoff for determining infection (Units: percent virus RNA)
- S.Score: Score for S phase assignment
- G2M.Score: Score for G2/M phase assignment
- Phase: Determined Cell Cycle Phase of cell
- RNA_snn_res.0.5: clustering at 0.5 resolution in Seurat
- seurat_clusters: renamed column with clustering at 0.5 resolution in Seurat
- RNA_snn_res.0.25: clustering at 0.25 resolution in Seurat
- pos: genome position of mutation
- base: reference nucleotide base
- mutants: mutant nucleotide
- CBC_ID: cell barcode identifier
- genotype: full viral genotype recovered in each cell
- BCMutCount: number of mutants relative to reference genotype
- total: total number of cells captured
- count: count of that mutation across all cells
- freq: frequency of that mutation
- ref.codon: reference codon (if applicable)
- ref.resPos: residue position of mutation (if applicable)
- ref.AA: reference residue character (if applicable)
- mut.codon: mutated codon sequence (if applicable)
- mut.resPos: mutated residue position (if applicable)
- mut.AA: mutant residue character (if applicable)
- subName: string denoting substitution name (if applicable)
- subClass: class of residue substitution, i.e. Synonymous or non-synonymous (if applicable)
- genotypeName: translated genotype identifier
- genoFreq: frequency of translated genotype in the population of cells
- haploFreq: frequency of nucleotide genotype in the population of cells
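The relationship between `percent.virus`, `percent.virus_log10`, `InfectedStatus_threshold`, and `InfectedStatus` can be illustrated with a small sketch. How the authors applied the cutoff is not described here, so the comparison operator, the handling of zero values, and the threshold used in the example are all assumptions, and `classify_cell` is a hypothetical helper.

```python
import math


def classify_cell(percent_virus, threshold):
    """Return (log10 of percent viral reads, infection status) for one cell.

    Assumptions: a cell counts as infected when its percent viral reads
    is at or above the threshold, and log10 is left undefined (None) for
    cells with no viral reads.
    """
    log10_value = math.log10(percent_virus) if percent_virus > 0 else None
    status = "Infected" if percent_virus >= threshold else "NotInfected"
    return log10_value, status


# Hypothetical values: 10% viral reads with a 1% cutoff.
print(classify_cell(10.0, 1.0))
print(classify_cell(0.0, 1.0))
```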
Consensus fasta sequence files for all populations in the manuscript.
- EVA71_6h_P5_allConsensus.fasta 38.12 MB
- EVA71_6h_P1_allConsensus.fasta 28.87 MB
- EVA71_6h_P3_allConsensus.fasta 21.93 MB
- EVD68_allConsensus.fasta 38.86 MB
- CVB3_new_allConsensus.fasta 85.66 MB
- EVA71_allConsensus.fasta 51.92 MB
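The consensus files are tens of megabytes each, so reading them one record at a time is convenient. Below is a minimal FASTA reader sketch in plain Python. The header layout of the deposited files (for example, whether each header carries a cell barcode) is not documented here, so the example headers are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record input; sequences may span several lines.
example = io.StringIO(">cellA\nACGT\nTTGA\n>cellB\nGGCC\n")
fasta_records = list(read_fasta(example))
print(fasta_records)
```

For a deposited file, replace the `io.StringIO` object with an open file handle such as `open("EVD68_allConsensus.fasta")` and iterate over the generator instead of building a list.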
Code/Software
Rmarkdowns for Seurat Analysis
- EVA71_Mock_with-virus.Rmd 8.38 KB
- EVA71_P1_with-virus.Rmd 9.24 KB
- Merge_EVA71_Passage_with-virus.Rmd 8.75 KB
- EVA71_P5_with-virus.Rmd 8.81 KB
- EVA71_P3_with-virus.Rmd 8.98 KB
- Subset_EVA71_cluster7.Rmd 3.18 KB
Python scripts
Identify barcoded viral reads and collect them by barcode. "Anchovy" script.
- anchovy.0.2.py 9.86 KB
Script to convert CSV of barcodes into fasta of reads for mapping consensus.
- CBCtoFasta.py 881 B
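The general idea of turning a CSV of barcoded reads into FASTA records can be sketched as follows. This is not the code in CBCtoFasta.py: the column names `barcode`, `read_id`, and `sequence`, the header layout, and the `csv_to_fasta` helper are all assumptions made for illustration.

```python
import csv
import io


def csv_to_fasta(csv_handle, fasta_handle):
    """Write one FASTA record per CSV row and return the number written.

    Assumed (hypothetical) columns: barcode, read_id, sequence.
    """
    written = 0
    for row in csv.DictReader(csv_handle):
        # Header combines barcode and read ID so reads stay traceable to a cell.
        fasta_handle.write(f">{row['barcode']}|{row['read_id']}\n")
        fasta_handle.write(f"{row['sequence']}\n")
        written += 1
    return written


# Hypothetical input with two reads from two cell barcodes.
source = io.StringIO("barcode,read_id,sequence\nAAAC,r1,ACGT\nTTTG,r2,GGCA\n")
target = io.StringIO()
n_written = csv_to_fasta(source, target)
print(target.getvalue())
```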
Job script written for the old Locus cluster at NIH-NIAID, which is no longer available.
(New code for Skyline to be released shortly.)
- AnchovyJob_11124_PB.sh 2.47 KB
These data were collected using 10x single-cell sequencing (10xgenomics.com). After collecting cDNA from the 10x process, the cDNA was processed for long-read sequencing. The resulting short- and long-read sequencing data were then processed through the anchovy script and associated R scripts to generate these outputs, as described in the accompanying manuscript.