Structure and dynamics of enterovirus genotype networks

Published Apr 30, 2024 on Dryad. https://doi.org/10.5061/dryad.6hdr7sr76

Abstract

Like all biological populations, viral populations exist as networks of genotypes connected through mutation. Mapping the topology of these networks and quantifying population dynamics across them is crucial to understanding how populations adapt to changes in their selective environment. The influence of mutational networks is especially profound in viral populations which rapidly explore their mutational neighborhoods via high mutation rates. Using a novel single-cell sequencing method, scRNAseq-Enabled Acquisition of mRNA and Consensus Haplotypes Linking Individual Genotypes and Host Transcriptomes (SEARCHLIGHT), we captured and assembled viral haplotypes from hundreds of individual infected cells to reveal the complexity of viral populations. We obtained these genotypes in parallel with host cell transcriptome information, enabling us to link host cell transcriptional phenotypes to the genetic structures underlying virus adaptation. Our examination of these structures reveals the common evolutionary dynamics of enterovirus populations and illustrates how viral populations reach through mutational ‘tunnels’ to span evolutionary landscapes and maintain connection with multiple adaptive genotypes simultaneously.


This data repository accompanies the publication "Structure and Dynamics of Enterovirus Genotypic Networks" by Nathânia Dábilla and Patrick Dolan, currently in revision at Science Advances.

Descriptions of the data and file structure are below. Please note: some CSV tables include blank values, exactly as the Python and R scripts produce them. These blanks are left in place intentionally so that analyses run on these tables remain reproducible. The files are included in their original form; empty cells are not marked with 'N/A'.
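
Because blank cells are stored as empty strings rather than 'N/A', it can help to convert them explicitly when loading a table. The following is a minimal sketch using only the Python standard library. The two-row table is invented for illustration (column names are taken from the descriptions below, the values are hypothetical), and the helper `read_rows` is not part of the deposited scripts.

```python
import csv
import io

# Hypothetical excerpt; real files such as EVA71_P1_annot_v2.csv have more columns.
SAMPLE = """pos,base,mutants,ref.codon,subClass
3001,A,G,AAA,Synonymous
45,C,T,,
"""

def read_rows(handle):
    """Yield one dict per CSV row, mapping blank cells to None."""
    for row in csv.DictReader(handle):
        yield {key: (value if value != "" else None) for key, value in row.items()}

rows = list(read_rows(io.StringIO(SAMPLE)))
print(rows[1]["ref.codon"])  # None
```

To read a deposited file instead of the sample, pass `open(path, newline="")` as the handle.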

Correspondence can be sent to Patrick.Dolan@nih.gov

Data Description

Genotype analysis of passaged EVA71 populations

EVA71_P3_annot_v2.csv 142.93 KB
EVA71_P5_annot_v2.csv 440.39 KB
EVA71_P1_annot_v2.csv 160.04 KB

Columns:

pos: genome position of mutation
base: reference nucleotide base
mutants: mutant nucleotide
CBC_ID: cell barcode identifier
genotype: full viral genotype recovered in each cell
BCMutCount: number of mutations relative to reference genotype
total: total number of cells captured
count: count of that mutation across all cells
freq: frequency of that mutation
ref.codon: reference codon (if applicable)
ref.resPos: residue position of mutation (if applicable)
ref.AA: reference residue character (if applicable)
mut.codon: mutated codon sequence (if applicable)
mut.resPos: mutated residue position (if applicable)
mut.AA: mutant residue character (if applicable)
subName: string denoting substitution name (if applicable)
subClass: class of residue substitution, i.e. Synonymous or non-synonymous (if applicable)
genotypeName: translated genotype identifier
genoFreq: frequency of translated genotype in the population of cells
haploFreq: frequency of nucleotide genotype in the population of cells
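
The relationship between the count, total, and freq columns can be sketched as follows. This assumes that total is the number of distinct cell barcodes and that freq = count / total. Both are assumptions made for illustration, and the deposited scripts may define these quantities differently. The observations and the function `mutation_frequencies` are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-cell observations: (CBC_ID, pos, base, mutants).
observations = [
    ("cell1", 3001, "A", "G"),
    ("cell2", 3001, "A", "G"),
    ("cell2", 45, "C", "T"),
    ("cell3", 45, "C", "T"),
    ("cell4", 3001, "A", "G"),
]

def mutation_frequencies(observations):
    """Return {(pos, base, mutant): (count, total, freq)}.

    Assumes total is the number of distinct cell barcodes and
    freq = count / total; the deposited scripts may differ.
    """
    cells_by_mutation = defaultdict(set)
    all_cells = set()
    for cbc_id, pos, base, mutant in observations:
        cells_by_mutation[(pos, base, mutant)].add(cbc_id)
        all_cells.add(cbc_id)
    total = len(all_cells)
    return {
        mutation: (len(cells), total, len(cells) / total)
        for mutation, cells in cells_by_mutation.items()
    }

table = mutation_frequencies(observations)
print(table[(3001, "A", "G")])  # (3, 4, 0.75)
```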

EVA71 populations merged with the Seurat Analysis and other computed metadata.

merged_metadata_genotypes_UMAP.csv 4.52 MB

Columns:

index: unique index for each entry
name: sample name (e.g. 'EVA71.P1', for EV-A71 passage 1 infected cells).
BC: barcode sequence
V1: full ID (sample name and barcode)
UMAP_1: UMAP x coordinate
UMAP_2: UMAP y coordinate
orig.ident: Cell Ranger sample ID
nCount_RNA: number of reads per cell
nFeature_RNA: number of detected genes per cell
percent.mt: percent mitochondrial reads
percent.virus: percent viral reads
percent.virus_log10: log10(percent viral reads)
InfectedStatus_groups: infection status (i.e. Low, High, NotInfected)
InfectedStatus: infection status (i.e. Infected, NotInfected)
InfectedStatus_threshold: cutoff for determining infection (Units: percent virus RNA)
S.Score: Score for S phase assignment
G2M.Score: Score for G2/M phase assignment
Phase: Determined Cell Cycle Phase of cell
RNA_snn_res.0.5: clustering at 0.5 resolution in Seurat
seurat_clusters: renamed column with clustering at 0.5 resolution in Seurat
RNA_snn_res.0.25: clustering at 0.25 resolution in Seurat
pos: genome position of mutation
base: reference nucleotide base
mutants: mutant nucleotide
CBC_ID: cell barcode identifier
genotype: full viral genotype recovered in each cell
BCMutCount: number of mutations relative to reference genotype
total: total number of cells captured
count: count of that mutation across all cells
freq: frequency of that mutation
ref.codon: reference codon (if applicable)
ref.resPos: residue position of mutation (if applicable)
ref.AA: reference residue character (if applicable)
mut.codon: mutated codon sequence (if applicable)
mut.resPos: mutated residue position (if applicable)
mut.AA: mutant residue character (if applicable)
subName: string denoting substitution name (if applicable)
subClass: class of residue substitution, i.e. Synonymous or non-synonymous (if applicable)
genotypeName: translated genotype identifier
genoFreq: frequency of translated genotype in the population of cells
haploFreq: frequency of nucleotide genotype in the population of cells
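
As an illustration of how the merged table might be queried, the sketch below counts infected cells per sample. It assumes that a cell can occupy several rows (one per mutation) and that V1 uniquely identifies a cell, so cells are deduplicated on V1. The sample rows, the V1 format, and the function `infected_cells_per_sample` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of merged_metadata_genotypes_UMAP.csv (most columns omitted).
SAMPLE = """name,V1,InfectedStatus,pos
EVA71.P1,EVA71.P1_AAAC,Infected,3001
EVA71.P1,EVA71.P1_AAAC,Infected,45
EVA71.P1,EVA71.P1_GGTT,NotInfected,
EVA71.P3,EVA71.P3_CCGA,Infected,3001
"""

def infected_cells_per_sample(handle):
    """Count distinct infected cells (by V1) for each sample name."""
    cells = defaultdict(set)
    for row in csv.DictReader(handle):
        if row["InfectedStatus"] == "Infected":
            cells[row["name"]].add(row["V1"])
    return {name: len(ids) for name, ids in cells.items()}

counts = infected_cells_per_sample(io.StringIO(SAMPLE))
print(counts)  # {'EVA71.P1': 1, 'EVA71.P3': 1}
```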

Consensus fasta sequence files for all populations in the manuscript.

EVA71_6h_P5_allConsensus.fasta 38.12 MB
EVA71_6h_P1_allConsensus.fasta 28.87 MB
EVA71_6h_P3_allConsensus.fasta 21.93 MB
EVD68_allConsensus.fasta 38.86 MB
CVB3_new_allConsensus.fasta 85.66 MB
EVA71_allConsensus.fasta 51.92 MB
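
The consensus files are large, so streaming them record by record may be preferable to loading them whole. A minimal standard-library sketch follows. The record names and sequences are invented, the header format in the deposited files may differ, and `read_fasta` is a hypothetical helper rather than part of the deposited code.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# Hypothetical records for illustration only.
SAMPLE = """>cell_AAAC
ACGTAC
GTTA
>cell_GGTT
ACGTNN
"""

records = list(read_fasta(io.StringIO(SAMPLE)))
print(len(records), records[0][1])  # 2 ACGTACGTTA
```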

Code/Software

Rmarkdowns for Seurat Analysis

EVA71_Mock_with-virus.Rmd 8.38 KB
EVA71_P1_with-virus.Rmd 9.24 KB
Merge_EVA71_Passage_with-virus.Rmd 8.75 KB
EVA71_P5_with-virus.Rmd 8.81 KB
EVA71_P3_with-virus.Rmd 8.98 KB
Subset_EVA71_cluster7.Rmd 3.18 KB

Python scripts

Identify barcoded viral reads and collect them by barcode. "Anchovy" script.

anchovy.0.2.py 9.86 KB

Script to convert CSV of barcodes into fasta of reads for mapping consensus.

CBCtoFasta.py 881 B
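
The general idea of converting a CSV of barcoded reads into FASTA can be sketched as below. This is not the CBCtoFasta.py script itself: the input column names (CBC_ID, read), the header format, and the function `csv_to_fasta` are assumptions made for illustration.

```python
import csv
import io

# Hypothetical input: one read per row, tagged with its cell barcode.
SAMPLE = """CBC_ID,read
AAACCTGA,ACGTACGT
AAACCTGA,TTGACCAA
GGTTCAGT,ACGTNNAC
"""

def csv_to_fasta(csv_handle, fasta_handle):
    """Write each read as a FASTA record named <barcode>_<n>."""
    counts = {}
    for row in csv.DictReader(csv_handle):
        barcode = row["CBC_ID"]
        # Number reads within each barcode so headers stay unique.
        counts[barcode] = counts.get(barcode, 0) + 1
        fasta_handle.write(f">{barcode}_{counts[barcode]}\n{row['read']}\n")

out = io.StringIO()
csv_to_fasta(io.StringIO(SAMPLE), out)
fasta_text = out.getvalue()
print(fasta_text)
```

Numbering reads within each barcode keeps the headers unique, which most mappers require.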

Job script written for the old Locus cluster at NIH-NIAID, which is no longer available.

(New code to be released for skyline shortly.)

AnchovyJob_11124_PB.sh 2.47 KB