A comprehensive map of evolutionary constraints across the enterovirus A genome
Data files
Jun 19, 2024 version files 238.78 MB
- Dryad_Repository_InDel_Paper_Final_Resubmission.zip (93.89 MB)
- Dryad_Repository_Indel_Paper.zip (144.85 MB)
- README.md (40.34 KB)
Aug 16, 2024 version files 332.64 MB
- Dryad_Repository_InDel_Paper_Final_Resubmission.zip (93.89 MB)
- Dryad_Repository_InDel_Paper_v3.zip (93.87 MB)
- Dryad_Repository_Indel_Paper.zip (144.85 MB)
- README.md (40.54 KB)
Abstract
Insertions and deletions (InDels) are essential sources of novelty in protein evolution. In RNA viruses, InDels cause dramatic phenotypic changes that contribute to the emergence of viruses with altered immune profiles and host engagement. This work aims to comprehensively quantify the mutational tolerance of an RNA virus to insertion, deletion, and substitution. Using Enterovirus A71 (EV-A71) as a prototype for the Enterovirus A species (EV-A) of picornaviruses, we engineered approximately 45,000 insertions, 6,000 deletions, and 41,000 AA substitutions across the nearly 2,200 coding positions of the EV-A71 proteome, quantifying their effects on viral fitness. In contrast with AA changes, the vast majority of InDels are lethal to virus growth. Those that are tolerated primarily reside in a few hotspot regions. These tolerant sites highlight structurally flexible and mutationally plastic regions of EV-A71 proteins that avoid core structural and functional elements but often overlap with key sites of host- and immune recognition, suggesting a complex evolutionary role for InDels and substitutions at these sites. Phylogenetic analysis examining EV-A species isolated from diverse mammalian hosts reveals that many of the experimentally identified hotspots also correspond to sites of natural InDel diversity, suggesting these hotspots of mutational tolerance in EV-A genomes may have contributed to past phenotypic diversification of EV-A. Insights from this and future mutational scanning studies mapping viral evolutionary potential will inform better epidemiological monitoring and Enterovirus vaccine development.
https://doi.org/10.5061/dryad.866t1g1xm
August 16, 2024 changes: Changed relative paths in code to regenerate figures from paper.
Description of the data and file structure
The files are organized in the following fashion:
File directory tree:
.
├── 1_Molecular_Biology_Supp_Information
│ ├── SupplementalFile1_BB_Free_Assembly.fasta
│ ├── SupplementalFile2_Insertion_Capsid.fasta
│ ├── SupplementalFile3_Insertion_Replication.fasta
│ ├── SupplementalFile4_Deletion_Capsid.fasta
│ ├── SupplementalFile5_Deletion_Replication.fasta
│ ├── SupplementalFile6_DMS_Capsid.fasta
│ ├── SupplementalFile7_DMS_Replication.fasta
│ ├── SupplementalFile8_Chloramphenicol_Casette.fasta
│ └── SupplementalFile9_5AA_Oligopool.fasta
├── 2_Enrich2_Dataframes
│ ├── 5_AA_Insertion
│ │ ├── Capsid
│ │ │ ├── main_identifiers_scores_shared_full.tsv
│ │ │ └── main_identifiers_scores.tsv
│ │ └── Replication
│ │ ├── main_identifiers_scores_shared_full.tsv
│ │ └── main_identifiers_scores.tsv
│ ├── Competition_Pool
│ │ ├── main_identifiers_scores_shared_full.tsv
│ │ └── main_identifiers_scores.tsv
│ ├── Deletion
│ │ ├── Capsid
│ │ │ ├── main_identifiers_scores_shared_full.tsv
│ │ │ └── main_identifiers_scores.tsv
│ │ └── Replication
│ │ ├── main_identifiers_scores_shared_full.tsv
│ │ └── main_identifiers_scores.tsv
│ ├── DMS
│ │ ├── Capsid
│ │ │ ├── main_identifiers_scores_shared_full.tsv
│ │ │ └── main_identifiers_scores.tsv
│ │ └── Replication
│ │ ├── main_identifiers_scores_shared_full.tsv
│ │ └── main_identifiers_scores.tsv
│ └── Insertion
│ ├── Capsid
│ │ ├── main_identifiers_scores_shared_full.tsv
│ │ └── main_identifiers_scores.tsv
│ └── Replication
│ ├── main_identifiers_scores_shared_full.tsv
│ └── main_identifiers_scores.tsv
├── 3_R_analysis_scripts
│ ├── DMS_processing_Scripts
│ │ ├── All_Oligos_Capsid.fasta
│ │ ├── All_Oligos_Replication.fasta
│ │ ├── bbfree_2-746-3331.fasta
│ │ ├── bbfree-3332-7324.fasta
│ │ ├── codonCounts
│ │ │ ├── Capsid_input_DMS.codonCounts
│ │ │ ├── Capsid_P1_RepA_DMS_Nextseq.codonCounts
│ │ │ ├── Capsid_P1_RepB_DMS_Nextseq.codonCounts
│ │ │ ├── Capsid_P1_RepC_DMS_Nextseq.codonCounts
│ │ │ ├── Capsid_P2_RepA_DMS_Nextseq.codonCounts
│ │ │ ├── Capsid_P2_RepB_DMS_Nextseq.codonCounts
│ │ │ ├── Capsid_P2_RepC_DMS_Nextseq.codonCounts
│ │ │ ├── Replication_input_DMS_DMS.codonCounts
│ │ │ ├── Replication_P1_RepA_DMS_Nextseq.codonCounts
│ │ │ ├── Replication_P1_RepB_DMS_Nextseq.codonCounts
│ │ │ ├── Replication_P1_RepC_DMS_Nextseq.codonCounts
│ │ │ ├── Replication_P2_RepA_DMS_Nextseq.codonCounts
│ │ │ ├── Replication_P2_RepB_DMS_Nextseq.codonCounts
│ │ │ └── Replication_P2_RepC_DMS_Nextseq.codonCounts
│ │ ├── DMS_Processing_Workflow.R
│ │ ├── filtered
│ │ │ ├── capsid_P0_hgvs.tsv
│ │ │ ├── capsid_P1_A_hgvs.tsv
│ │ │ ├── capsid_P1_B_hgvs.tsv
│ │ │ ├── capsid_P1_C_hgvs.tsv
│ │ │ ├── capsid_P2_A_hgvs.tsv
│ │ │ ├── capsid_P2_B_hgvs.tsv
│ │ │ ├── capsid_P2_C_hgvs.tsv
│ │ │ ├── replication_P0_hgvs.tsv
│ │ │ ├── replication_P1_A_hgvs.tsv
│ │ │ ├── replication_P1_B_hgvs.tsv
│ │ │ ├── replication_P1_C_hgvs.tsv
│ │ │ ├── replication_P2_A_hgvs.tsv
│ │ │ ├── replication_P2_B_hgvs.tsv
│ │ │ └── replication_P2_C_hgvs.tsv
│ │ └── paper_rewrite.Rproj
│ ├── Figure_Generationscript
│ │ ├── Executable_R_script_and_dataframes
│ │ │ ├── 2A_Chimera_Attached_Deletions.csv
│ │ │ ├── 2A_Chimera_Attached_Insertions.csv
│ │ │ ├── alldel_final_input.csv
│ │ │ ├── allStickles_AA_final_input.csv
│ │ │ ├── allStickles_final_input_handle.csv
│ │ │ ├── Capsid_Chimera_deletions_Attached_3bp_p2.csv
│ │ │ ├── Capsid_Chimera_deletions_Attached_6bp_p2.csv
│ │ │ ├── Capsid_Chimera_deletions_Attached_9bp_p2.csv
│ │ │ ├── Capsid_Chimera_DMS_Attached_p2.csv
│ │ │ ├── Capsid_Chimera_insertions_Attached_p2.csv
│ │ │ ├── Capsid_P2_Deletions_shared.csv
│ │ │ ├── Capsid_P2_DMS_Enrich2_shared.csv
│ │ │ ├── Capsid_P2_Scores_shared.csv
│ │ │ ├── Competition_Pool_Input.tsv
│ │ │ ├── Data_Generation_Markdown_InDel_Manuscript.r
│ │ │ ├── Deletions_2C_reindexed.csv
│ │ │ ├── Deletions_3A_reindexed.csv
│ │ │ ├── Deletions_3C_reindexed.csv
│ │ │ ├── Deletions_3D_reindexed.csv
│ │ │ ├── DMS_2A_reindexed_chimera.csv
│ │ │ ├── DMS_2C_reindexed.csv
│ │ │ ├── DMS_3A_reindexed.csv
│ │ │ ├── DMS_3C_reindexed.csv
│ │ │ ├── DMS_3D_reindexed.csv
│ │ │ ├── EV71_4643_Features.csv
│ │ │ ├── Fullproteome_1AA_Enrich2_long.csv
│ │ │ ├── Fullproteome_input_DMS_Enrich2_long.csv
│ │ │ ├── Fullproteome_P2_DMS_Enrich2_long.csv
│ │ │ ├── Insertions_2C_reindexed.csv
│ │ │ ├── Insertions_3A_reindexed.csv
│ │ │ ├── Insertions_3C_reindexed.csv
│ │ │ ├── Insertions_3D_reindexed.csv
│ │ │ ├── main_identifiers_scores_Competition_Pooled_Experiment.tsv
│ │ │ ├── merged_df_indel_DMS.csv
│ │ │ ├── NS2C_2ndary_DMS.csv
│ │ │ ├── NS3D_2ndary_del.csv
│ │ │ ├── Overlap_Residues_Selected_filtered.txt
│ │ │ ├── Replication_P2_Deletion_shared.csv
│ │ │ ├── Replication_P2_DMS_Enrich2_shared.csv
│ │ │ ├── Replication_P2_Scores_shared.csv
│ │ │ ├── Scores_Deletions1AA_Fullproteome_1AA.csv
│ │ │ ├── Scores_Deletions_Fullproteome.csv
│ │ │ ├── Scores_Insertional_Handle_Fullproteome.csv
│ │ │ ├── STRIDE_3N6L.txt
│ │ │ ├── STRIDE_3W95.txt
│ │ │ └── STRIDE_5GQ1.txt
│ │ ├── HTML_version
│ │ │ └── InDel_DMS_resubmission.html
│ │ └── Output_Figures
│ │ ├── AA_histogram_nonlethal_competition.pdf
│ │ ├── Alanine_DMS_2A.pdf
│ │ ├── Alanine_DMS_3D.pdf
│ │ ├── Boxplot_1AA_input.pdf
│ │ ├── Boxplot_Competition_Variants.pdf
│ │ ├── Boxplot_Delsize_input.pdf
│ │ ├── Boxplot_DMS_Aminoacidcounts.pdf
│ │ ├── Competition_input_plot_Capsid.pdf
│ │ ├── Competition_input_plot_Replication.pdf
│ │ ├── del3Adel_hotspot4.pdf
│ │ ├── Deletion_Fitness_Class_3_6_9_bp.pdf
│ │ ├── Deletion_Fitness_Class_3bp.pdf
│ │ ├── deletions_fitness_class_2ndary_2A.pdf
│ │ ├── deletions_fitness_class_2ndary_2C.pdf
│ │ ├── deletions_fitness_class_2ndary_3D.pdf
│ │ ├── del_histogram_nonlethal_competition.pdf
│ │ ├── del_histogram_nonlethal.pdf
│ │ ├── del_histogram.pdf
│ │ ├── DMS_fitness_class_2ndary_2A.pdf
│ │ ├── DMS_fitness_class_2ndary_2C.pdf
│ │ ├── DMS_fitness_class_2ndary_3D.pdf
│ │ ├── DMS_Heatmap_3D_overlap.pdf
│ │ ├── DMS_Heatmap_549_to_601.pdf
│ │ ├── DMS_Heatmap_849_to_901.pdf
│ │ ├── DMS_Heatmap_input.pdf
│ │ ├── DMS_indel_Heatmap_fullproteome.pdf
│ │ ├── DMS_P2_Fitness_Class_Aminoacids.pdf
│ │ ├── DMS_P2_Fitness_Class.pdf
│ │ ├── DMS_plot_input.pdf
│ │ ├── DMS_tophit_competitionpool_2A.pdf
│ │ ├── DMS_tophit_competitionpool_VP1.pdf
│ │ ├── DMS_tophit_EV71_VP1.pdf
│ │ ├── G_input_AA.pdf
│ │ ├── Glu_DMS_2A.pdf
│ │ ├── Glu_DMS_3D.pdf
│ │ ├── Glycine_DMS_2A.pdf
│ │ ├── Glycine_DMS_3D.pdf
│ │ ├── histo_competition_pool_histo_AA.pdf
│ │ ├── histo_competition_pool_histo_Deletions.pdf
│ │ ├── histo_competition_pool_Insertions.pdf
│ │ ├── histo_DMS_sel_plot_nonlethal.pdf
│ │ ├── histo_DMS_sel_plot.pdf
│ │ ├── histo_insertions_sel_nonlethal_plot.pdf
│ │ ├── histo_insertions_sel_plot.pdf
│ │ ├── insertion_fitness_class_2ndary_2A.pdf
│ │ ├── insertion_fitness_class_2ndary_2C.pdf
│ │ ├── insertion_fitness_class_2ndary_3D.pdf
│ │ ├── Insertion_Fitness_Class.pdf
│ │ ├── insertions_histogram_nonlethal_competition.pdf
│ │ ├── IsoLeucine_DMS_2A.pdf
│ │ ├── IsoLeucine_DMS_3D.pdf
│ │ ├── Leucine_DMS_2A.pdf
│ │ ├── Leucine_DMS_3D.pdf
│ │ ├── Lorenz_Curve_Final_del_6bp.pdf
│ │ ├── Lorenz_Curve_Final_del_9bp.pdf
│ │ ├── Lorenz_Curve_Final_del.pdf
│ │ ├── Lorenz_Curve_Final.pdf
│ │ ├── meanPlot_input_3bp.pdf
│ │ ├── meanPlot_input_6bp.pdf
│ │ ├── meanPlot_input_9bp.pdf
│ │ ├── meanPlot_inserts_input_handle.pdf
│ │ ├── met_DMS_2A.pdf
│ │ ├── met_DMS_3D.pdf
│ │ ├── Proline_DMS_2A.pdf
│ │ ├── Proline_DMS_3D.pdf
│ │ ├── Ridge_plot_del_6bp.pdf
│ │ ├── Ridge_plot_del_9bp.pdf
│ │ ├── Ridge_plot_del.pdf
│ │ ├── Ridge_plot.pdf
│ │ ├── Scatter_input_1AA_2AA.pdf
│ │ ├── Scatter_input_1AA_3AA.pdf
│ │ ├── Scatter_input_2AA_3AA.pdf
│ │ ├── Scatter_P2_deletion_RepA_RepB.pdf
│ │ ├── Scatter_P2_deletion_RepA_RepC.pdf
│ │ ├── Scatter_P2_deletion_RepB_RepC.pdf
│ │ ├── Scatter_P2_deletion__Replication_RepA_RepB.pdf
│ │ ├── Scatter_P2_deletion__Replication_RepA_RepC.pdf
│ │ ├── Scatter_P2_deletion__Replication_RepB_RepC.pdf
│ │ ├── Scatter_P2_DMS__Capsid_RepA_RepB.pdf
│ │ ├── Scatter_P2_DMS__Capsid_RepA_RepC.pdf
│ │ ├── Scatter_P2_DMS__Capsid_RepB_RepC.pdf
│ │ ├── Scatter_P2_DMS__Replication_RepA_RepB.pdf
│ │ ├── Scatter_P2_DMS__Replication_RepA_RepC.pdf
│ │ ├── Scatter_P2_DMS__Replication_RepB_RepC.pdf
│ │ ├── Scatter_P2_insertion_RepA_RepB.pdf
│ │ ├── Scatter_P2_insertion_RepA_RepC.pdf
│ │ ├── Scatter_P2_insertion_RepB_RepC.pdf
│ │ ├── Scatter_P2_insertion_replication_RepA_RepB.pdf
│ │ ├── Scatter_P2_insertion_replication_RepA_RepC.pdf
│ │ ├── Scatter_P2_insertion_replication_RepB_RepC.pdf
│ │ ├── Scatter_plot_CapsidDeletion_wt_nonwt.pdf
│ │ ├── Scatter_plot_CapsidDMS_wt_nonwt.pdf
│ │ ├── Scatter_plot_CapsidInsertion_wt_nonwt.pdf
│ │ ├── Scatter_plot_ReplicationDeletion_wt_nonwt.pdf
│ │ ├── Scatter_plot_ReplicationDMS_wt_nonwt.pdf
│ │ ├── Scatter_plot_ReplicationInsertion_wt_nonwt.pdf
│ │ ├── Scores_Deletions1AA_Fullproteome_plot.pdf
│ │ ├── Scores_Deletions1AA_VP1.pdf
│ │ ├── Scores_Deletions2AA_VP1.pdf
│ │ ├── Scores_Deletions3AA_VP1.pdf
│ │ ├── Scores_DMS_Fullproteome_plot.pdf
│ │ ├── Scores_Insertional_Handle_Fullproteome.pdf
│ │ ├── Scores_Insertional_Handle_VP1.pdf
│ │ ├── SingleAA_fullproteome_Fitness_Class.pdf
│ │ ├── SingleAA_Heatmap_1440_to_1492.pdf
│ │ ├── SingleAA_Heatmap_399_to_451.pdf
│ │ ├── SingleAA_Heatmap_549_to_601.pdf
│ │ ├── SingleAA_Heatmap_849_to_901.pdf
│ │ ├── tyrosine_DMS_2A.pdf
│ │ ├── tyrosine_DMS_3D.pdf
│ │ ├── vp1_2a_cleavage_hotspot3.pdf
│ │ └── vp1_nterminus_del_hotspot2.pdf
│ └── Shannon_Entropy_Analysis_Scripts
│ ├── compute_shannon.R
│ ├── Enterovirus_A71_Curated_AA_Aligned.fasta
│ ├── EV71_4643_Features.csv
│ ├── EVA71_EntropyAnalysis.R
│ └── merged_df_indel_DMS.csv
├── 4_MSA_Phylogenetic_Trees_Analysis
│ ├── GapCount_MSA_EVA
│ │ ├── Numberofgaps_counts.pzfx
│ │ └── NumberofgapsMSAEVA.pdf
│ ├── MSA_EVA
│ │ └── Clustered_98_Enterovirus_A_Protein_Sequences_Outgroup_Aligned.fasta
│ ├── MSA_EVA_VP1N-C_termini
│ │ ├── C-terminus_VP1_Alignment.fasta
│ │ ├── Gap_Sizes.txt
│ │ └── N-terminus_VP1_alignment.fasta
│ └── Phylogenetic_Tree_EVA
│ ├── RAxML_bestTree.EVA_98_Tree
│ └── RAxML_bipartitionsBranchLabels.EVA_98_Tree
├── 5_Chimera_Analysis
│ ├── 2A
│ │ ├── Deletions
│ │ │ ├── Attribute_Files
│ │ │ │ └── 2A_Deletions_attribute.txt
│ │ │ ├── Chimera_Sessions
│ │ │ │ ├── 3w95_Active_Site_Session_Deletions.py
│ │ │ │ └── 3w95_Active_Site_Session_Deletions.pyc
│ │ │ └── Screenshots
│ │ │ ├── Angle1.png
│ │ │ └── Angle2.png
│ │ ├── DMS
│ │ │ ├── Attribute_Files
│ │ │ │ └── 2A_DMS_attribute .txt
│ │ │ ├── Chimera_Sessions
│ │ │ │ ├── 3w95_Active_Site_Session_DMS.py
│ │ │ │ └── 3w95_Active_Site_Session_DMS.pyc
│ │ │ └── Screenshots
│ │ │ ├── Angle1.png
│ │ │ └── Angle2.png
│ │ ├── Insertions
│ │ │ ├── Attribute_Files
│ │ │ │ └── 2A_Insertions_attribute.txt
│ │ │ ├── Chimera_Sessions
│ │ │ │ ├── 3w95_Active_Site_Session_Insertions.py
│ │ │ │ └── 3w95_Active_Site_Session_Insertions.pyc
│ │ │ └── Screenshots
│ │ │ ├── Angle1.png
│ │ │ └── Angle2.png
│ │ ├── PDB
│ │ │ ├── 3w95_Active_Site_surface.py
│ │ │ ├── 3w95_Active_Site_surface.pyc
│ │ │ ├── 3w95.pdb
│ │ │ └── Screenshots
│ │ │ └── Cartoon_2A.png
│ │ └── STRIDE
│ │ └── 3W95.txt
│ ├── 2C
│ │ ├── Deletions
│ │ │ ├── Attribute_Files
│ │ │ │ └── 2C_Deletions_attribute.txt
│ │ │ ├── Chimera_Sessions
│ │ │ │ ├── 5gq1_chainA_Deletions_Zoom.py
│ │ │ │ └── 5gq1_chainA_Deletions_Zoom.pyc
│ │ │ └── Screenshots
│ │ │ └── Zoom.png
│ │ ├── DMS
│ │ │ ├── Attribute_Files
│ │ │ │ └── 2C_DMS_attribute .txt
│ │ │ ├── Chimera_Sessions
│ │ │ │ ├── 5gq1_chainA_DMS_Zoom.py
│ │ │ │ └── 5gq1_chainA_DMS_Zoom.pyc
│ │ │ └── Screenshots
│ │ │ └── Zoom.png
│ │ ├── Insertions
│ │ │ ├── Attribute_Files
│ │ │ │ └── 2C_Insertions_attribute.txt
│ │ │ ├── Chimera_Sessions
│ │ │ │ ├── 5gq1_chainA_Insertions_Zoom.py
│ │ │ │ └── 5gq1_chainA_Insertions_Zoom.pyc
│ │ │ └── Screenshots
│ │ │ └── Zoom.png
│ │ ├── PDB
│ │ │ ├── 5gq1_chainA.py
│ │ │ ├── 5gq1_chainA.pyc
│ │ │ └── 5gq1.pdb
│ │ └── STRIDE
│ │ └── 5GQ1.txt
│ ├── 3A
│ │ ├── Deletions
│ │ │ ├── Attribute_Files
│ │ │ │ └── 3A_Deletions_attribute.txt
│ │ │ ├── Chimera_Sessions
│ │ │ │ ├── 6HLW_dimer_Deletions.py
│ │ │ │ └── 6HLW_dimer_Deletions.pyc
│ │ │ └── Screenshots
│ │ │ └── 3A_dimer_Deletions.png
│ │ ├── DMS
│ │ │ ├── Attribute_Files
│ │ │ │ └── 3A_DMS_attribute.txt
│ │ │ ├── Chimera_Sessions
│ │ │ │ ├── 6HLW_dimer_DMS.py
│ │ │ │ └── 6HLW_dimer_DMS.pyc
│ │ │ └── Screenshots
│ │ │ └── 3A_dimer_DMS.png
│ │ ├── Insertions
│ │ │ ├── Attribute_Files
│ │ │ │ └── 3A_Insertions_attribute.txt
│ │ │ ├── Chimera_Sessions
│ │ │ │ ├── 6HLW_dimer_Insertions.py
│ │ │ │ └── 6HLW_dimer_Insertions.pyc
│ │ │ └── Screenshots
│ │ │ └── 3A_dimer_Insertions.png
│ │ └── PDB
│ │ └── 6hlw.pdb
│ ├── 3C
│ │ ├── Deletions
│ │ │ ├── Attribute_Files
│ │ │ │ └── 3C_Deletions_attribute.txt
│ │ │ ├── Chimera_Sessions
│ │ │ │ ├── 3osy_activesite_Deletions.py
│ │ │ │ └── 3osy_activesite_Deletions.pyc
│ │ │ └── Screenshots
│ │ │ └── Angle_Active_Site_Deletions_openconfirmation_3osy.png
│ │ ├── DMS
│ │ │ ├── Attribute_Files
│ │ │ │ └── 3C_DMS_attribute .txt
│ │ │ ├── Chimera_Sessions
│ │ │ │ ├── 3osy_activesite_DMS.py
│ │ │ │ └── 3osy_activesite_DMS.pyc
│ │ │ └── Screenshots
│ │ │ └── Angle_Active_Site_DMS_openconfirmation_3osy.png
│ │ ├── Insertions
│ │ │ ├── Attribute_Files
│ │ │ │ └── 3C_Insertions_attribute.txt
│ │ │ ├── Chimera_Sessions
│ │ │ │ ├── 3osy_activesite_Insertions.py
│ │ │ │ └── 3osy_activesite_Insertions.pyc
│ │ │ └── Screenshots
│ │ │ └── Angle_Active_Site_Insertions_openconfirmation_3osy.png
│ │ ├── PDB
│ │ │ ├── 3osy_activesite.py
│ │ │ ├── 3osy_activesite.pyc
│ │ │ └── 3osy.pdb
│ │ └── STRIDE
│ │ └── 3OSY.txt
│ ├── 3D
│ │ ├── Deletions
│ │ │ ├── Attribute_Files
│ │ │ │ └── 3D_Deletions_attribute.txt
│ │ │ ├── Chimera_Sessions
│ │ │ │ ├── 6KWQ_DMS_zoomactivesite.py
│ │ │ │ └── 6KWQ_DMS_zoomactivesite.pyc
│ │ │ └── Screenshots
│ │ │ └── Angle_Active_Site_3D_Deletions_activesite.png
│ │ ├── DMS
│ │ │ ├── Attribute_Files
│ │ │ │ └── 3D_DMS_attribute.txt
│ │ │ ├── Chimera_Sessions
│ │ │ │ ├── 6KWQ_DMS_Bonds_Overlap_yellowbonds.py
│ │ │ │ ├── 6KWQ_DMS_Bonds_Overlap_yellowbonds.pyc
│ │ │ │ ├── 6KWQ_DMS.py
│ │ │ │ ├── 6KWQ_DMS.pyc
│ │ │ │ ├── 6KWQ_DMS_zoom.py
│ │ │ │ ├── 6KWQ_DMS_zoom.pyc
│ │ │ │ ├── 6KWQ_DMS_zoom_sandwich.py
│ │ │ │ └── 6KWQ_DMS_zoom_sandwich.pyc
│ │ │ └── Screenshots
│ │ │ ├── Angle_Active_Site_3D_DMS_RNA_exponen_KWQ_ACTIVEsite.png
│ │ │ ├── Angle_Active_Site_3D_DMS_RNA_exponen_KWQ.png
│ │ │ └── Angle_Active_Site_3D_DMS_RNA_exponen_KWQ_sandwich.png
│ │ ├── Insertions
│ │ │ ├── Attribute_Files
│ │ │ │ └── 3D_Insertions_attribute.txt
│ │ │ ├── Chimera_Sessions
│ │ │ │ ├── 3n6l_Insertions.py
│ │ │ │ └── 3n6l_Insertions.pyc
│ │ │ └── Screenshots
│ │ │ ├── Angle_Active_Site_3D_Insertions.png
│ │ │ └── Angle_oppositeactive_Site_3D_Insertions.png
│ │ ├── PDB
│ │ │ ├── 3N6L_Cartoon.py
│ │ │ ├── 3N6L_Cartoon.pyc
│ │ │ ├── 3n6l.pdb
│ │ │ ├── 6kwq.pdb1
│ │ │ ├── Screenshots
│ │ │ │ └── 3N6L_Cartoon.png
│ │ │ └── Template_RNA_Contacts
│ │ │ ├── Overlap_Residues_Selected_6KWQ.txt
│ │ │ └── Overlap_Residues_Selected_filtered_6KWQ.txt
│ │ └── STRIDE
│ │ └── 3N6L.txt
│ ├── Capsid
│ │ ├── Deletions
│ │ │ ├── 1AA
│ │ │ │ ├── Attribute_Files
│ │ │ │ │ └── Capsid_1AADeletions_attributes_P2.txt
│ │ │ │ ├── Chimera_Sessions
│ │ │ │ │ ├── Capsid_8E2X_Deletions_1AA.py
│ │ │ │ │ ├── Capsid_8E2X_Deletions_1AA.pyc
│ │ │ │ │ ├── Protomer_Capsid_8E2X_Deletions_1AA.py
│ │ │ │ │ ├── Protomer_Capsid_8E2X_Deletions_1AA.pyc
│ │ │ │ │ ├── Protomer_Internal_External_Deletions.py
│ │ │ │ │ └── Protomer_Internal_External_Deletions.pyc
│ │ │ │ └── Screenshots
│ │ │ │ ├── Capsid_8E2X_External_Deletions_1AA.png
│ │ │ │ ├── Capsid_8E2X_Internal_Deletions_1AA.png
│ │ │ │ ├── Capsid_8E2X_Protomer_N-terminus_Deletions1AA.png
│ │ │ │ └── Capsid_8E2X_Protomer_Surfaceloops_Deletions1AA.png
│ │ │ ├── 2AA
│ │ │ │ ├── Attribute_Files
│ │ │ │ │ └── Capsid_2AADeletions_attributes_P2.txt
│ │ │ │ ├── Chimera_Sessions
│ │ │ │ │ ├── Capsid_8E2X_Deletions_2AA.py
│ │ │ │ │ ├── Capsid_8E2X_Deletions_2AA.pyc
│ │ │ │ │ ├── Protomer_Capsid_8E2X_Deletions_2AA.py
│ │ │ │ │ ├── Protomer_Capsid_8E2X_Deletions_2AA.pyc
│ │ │ │ │ ├── Protomer_Internal_External_Deletions2AA.py
│ │ │ │ │ └── Protomer_Internal_External_Deletions2AA.pyc
│ │ │ │ └── Screenshots
│ │ │ │ ├── Capsid_8E2X_External_Deletions_2AA.png
│ │ │ │ ├── Capsid_8E2X_Internal_Deletions_2AA.png
│ │ │ │ ├── Capsid_8E2X_Protomer_N-terminus_Deletions2AA.png
│ │ │ │ └── Capsid_8E2X_Protomer_Surfaceloops_Deletions2AA.png
│ │ │ └── 3AA
│ │ │ ├── Attribute_Files
│ │ │ │ └── Capsid_3AADeletions_attributes_P2.txt
│ │ │ ├── Chimera_Sessions
│ │ │ │ ├── Capsid_8E2X_Deletions_3AA.py
│ │ │ │ ├── Capsid_8E2X_Deletions_3AA.pyc
│ │ │ │ ├── Protomer_Capsid_8E2X_Deletions_3AA.py
│ │ │ │ ├── Protomer_Capsid_8E2X_Deletions_3AA.pyc
│ │ │ │ └── Protomer_Internal_External_Deletions3AA.py
│ │ │ └── Screenshots
│ │ │ ├── Capsid_8E2X_External_Deletions_3AA.png
│ │ │ ├── Capsid_8E2X_Internal_Deletions_3AA.png
│ │ │ ├── Capsid_8E2X_Protomer_N-terminus_Deletions3AA.png
│ │ │ └── Capsid_8E2X_Protomer_Surfaceloops_Deletions3AA.png
│ │ ├── DMS
│ │ │ ├── Attribute_Files
│ │ │ │ └── Capsid_DMS_attributes_P2.txt
│ │ │ ├── Chimera_Sessions
│ │ │ │ ├── Capsid_8E2X_DMS.py
│ │ │ │ ├── Capsid_8E2X_DMS.pyc
│ │ │ │ ├── Protomer_Capsid_8E2X_DMS.py
│ │ │ │ ├── Protomer_Capsid_8E2X_DMS.pyc
│ │ │ │ ├── Protomer_Internal_External_DMS.py
│ │ │ │ └── Protomer_Internal_External_DMS.pyc
│ │ │ └── Screenshots
│ │ │ ├── Capsid_8E2X_External_DMS.png
│ │ │ ├── Capsid_8E2X_Internal_DMS.png
│ │ │ ├── Capsid_8E2X_Protomer_Nterminus_DMS.png
│ │ │ └── Capsid_8E2X_Protomer_Surfaceloops_DMS.png
│ │ └── Insertions
│ │ ├── Attribute_Files
│ │ │ └── Capsid_Insertions_attribute_p2.txt
│ │ ├── Chimera_Sessions
│ │ │ ├── Capsid_8E2X_Insertions.py
│ │ │ ├── Capsid_8E2X_Insertions.pyc
│ │ │ ├── Protomer_Capsid_8E2X_Insertions.py
│ │ │ ├── Protomer_Capsid_8E2X_Insertions.pyc
│ │ │ ├── Protomer_Internal_External_Insertions.py
│ │ │ └── Protomer_Internal_External_Insertions.pyc
│ │ └── Screenshots
│ │ ├── Capsid_8E2X_External_Insertions.png
│ │ ├── Capsid_8E2X_Internal_Insertions.png
│ │ ├── Capsid_8E2X_Protomer_Nterminus_Insertions.png
│ │ └── Capsid_8E2X_Protomer_Surfaceloops_Insertions.png
│ ├── EVA71_CVA6_Comparison
│ │ ├── CVA6EV71comparison_internal.py
│ │ ├── CVA6EV71comparison_internal.pyc
│ │ ├── CVA6EV71comparison_surface.py
│ │ ├── CVA6EV71comparison_surface.pyc
│ │ └── Screenshots
│ │ ├── CVA6EVA71comparison_internal.png
│ │ └── CVA6EVA71comparison.png
│ ├── Labels
│ │ ├── Labels_Deletions.png
│ │ ├── Labels_DMS.png
│ │ └── Labels_Insertions.png
│ └── Structural_Annotations_Colors.txt
├── 6_Sequencing_Analysis_Scripts
│ ├── DelMapper_v0.2.py
│ └── stickleback.0.2.py
├── Dryad_Repo_InDel_Readme.pdf
└── Dryad_Repo_InDel_Readme.txt
The file descriptions are as follows:
1- Molecular_Biology_Supp_Information
Contains all the molecular biology supplementary information regarding the generation of mutational scanning libraries.
- SupplementalFile1_BB_Free_Assembly.fasta: Sequences necessary for generation of the EV-A71 molecular clone without BsmBI and BsaI sites.
- SupplementalFile2_Insertion_Capsid.fasta: Contains sequences necessary for the generation of insertional handle libraries in the capsid proteins of EV-A71.
- SupplementalFile3_Insertion_Replication.fasta: Contains sequences necessary for the generation of insertional handle libraries in the replication proteins of EV-A71.
- SupplementalFile4_Deletion_Capsid.fasta: Contains sequences necessary for the generation of deletion libraries in the capsid proteins of EV-A71.
- SupplementalFile5_Deletion_Replication.fasta: Contains sequences necessary for the generation of deletion libraries in the replication proteins of EV-A71.
- SupplementalFile6_DMS_Capsid.fasta: Contains sequences necessary for the generation of amino acid change libraries in the capsid proteins of EV-A71.
- SupplementalFile7_DMS_Replication.fasta: Contains sequences necessary for the generation of amino acid change libraries in the replication proteins of EV-A71.
- SupplementalFile8_Chloramphenicol_Casette.fasta: Sequence of the Chloramphenicol cassette for reducing wild-type contamination.
- SupplementalFile9_5AA_Oligopool.fasta: Contains sequences necessary for the generation of 5 AA insertion libraries from the insertional handle library.
2- Enrich2_Dataframes
This folder contains the Enrich2 outputs for all mutational scanning experiments. There are two dataframes for each mutational scanning experiment: “main_identifiers_scores_shared_full.tsv” and “main_identifiers_scores.tsv”.
- main_identifiers_scores_shared_full.tsv: Contains Enrich2 scores and standard errors for each variant in each biological replicate.
  - Row 1: Replicates. Indicates which biological replicate the score and standard error in row 2 belong to.
  - Row 2: hgvs is the nomenclature used to name the variants. score is the Enrich2 score for each variant in the corresponding biological replicate. SE is the standard error for each variant in the corresponding biological replicate.
- main_identifiers_scores.tsv: Contains the mean Enrich2 scores, standard errors, and epsilon for all variants.
  - Row 1: hgvs is the nomenclature used to name the variants. score is the mean Enrich2 score across the biological replicates. SE is the standard error between the biological replicates. epsilon is the change in the standard error after the last iteration of the random-effects model.
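The two-header-row layout described above for main_identifiers_scores_shared_full.tsv can be read with a short parser. The following is a minimal Python sketch, assuming exactly the layout given in this README (row 1 names the replicate for each column, row 2 names the field, and the first column holds the hgvs identifier). The function name `read_shared_full` and the example values are hypothetical, and the actual Enrich2 export may contain additional header rows that would need to be skipped.

```python
import csv
import io


def read_shared_full(handle):
    """Parse a tab-separated table with two header rows.

    Assumed layout (from the README description, not verified against the files):
      row 1: replicate label for each data column
      row 2: field name for each data column (score or SE)
      remaining rows: hgvs identifier followed by the values
    Returns {hgvs: {replicate: {field: value}}}.
    """
    rows = csv.reader(handle, delimiter="\t")
    replicate_row = next(rows)
    field_row = next(rows)
    scores = {}
    for row in rows:
        if not row:
            continue
        per_replicate = scores.setdefault(row[0], {})
        for rep, field, value in zip(replicate_row[1:], field_row[1:], row[1:]):
            if value == "":
                continue  # variant not scored in this replicate
            per_replicate.setdefault(rep, {})[field] = float(value)
    return scores


# Made-up values for illustration only; these are not taken from the dataset.
example = (
    "Replicates\tRepA\tRepA\tRepB\tRepB\n"
    "hgvs\tscore\tSE\tscore\tSE\n"
    "p.Ala10Gly\t-0.50\t0.10\t-0.70\t0.20\n"
    "p.Ala10del\t-3.20\t0.40\t\t\n"
)
table = read_shared_full(io.StringIO(example))
```

To use it on a real file, open the file in text mode and pass the handle, for example `read_shared_full(open(path, newline=""))`, where `path` points to one of the shared_full tables.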
3- R_analysis_scripts
Contains R scripts for data analysis and figure generation.
- Shannon_Entropy_Analysis_Scripts: Contains R scripts for entropy calculation at each residue in the EV-A71 proteome derived from a set of 482 complete sequences of EV-A71.
  - compute_shannon.R: Function to calculate Shannon entropy.
  - EVA71_EntropyAnalysis.R: R script to calculate Shannon entropy and to plot entropy calculations along with the mean fitness effects derived from the deep mutational scanning experiments.
  - Enterovirus_A71_Curated_AA_Aligned.fasta: 482 complete amino acid sequences of EV-A71 genomes.
  - EV71_4643_Features.csv: Contains a list of features of the EV-A71 genome used to plot boundaries of viral proteins.
  - merged_df_indel_DMS.csv: Dataframe including Enrich2 scores for insertion, deletion, and amino acid variants. Position is the position of the variant. AminoAcid is the type of variant introduced. score is the Enrich2 score.
- DMS_processing_Scripts: Contains R scripts required to process deep mutational scanning experiments. These scripts keep only the codons that were designed in our amino acid scanning library.
  - DMS_Processing_Workflow.R and paper_rewrite.Rproj: R script that will filter for only designed codon variants from GATK/Analyze Saturation Mutagenesis codon output.
  - All_Oligos_Capsid.fasta: Oligopools containing every possible amino acid change in the capsid proteins.
  - All_Oligos_Replication.fasta: Oligopools containing every possible amino acid change in the replication proteins.
  - bbfree_2-746-3331.fasta: Nucleotide sequence of the EV-A71 capsid proteins region.
  - bbfree-3332-7324.fasta: Nucleotide sequence of the EV-A71 replication proteins region.
  - codonCounts: GATK/Analyze Saturation Mutagenesis codon output for all amino acid scanning experiments. Each column represents a codon, each row is a position in the viral genome, and the measurements represent the sequencing counts at each position for a certain codon.
  - filtered: Contains all filtered amino acid counts for all amino acid scanning experiments. hgvs is the nomenclature used to name the variants, and count is the sequencing count for each variant.
- Figure_Generationscript: Contains R scripts required to generate figures for deep mutational scanning experiments.
  - Executable_R_script_and_dataframes: “Data_Generation_Markdown_InDel_Manuscript.r” is the R script used for data analysis and generation of figures in the paper. All the data frames required to run the script are included in this folder. The variables in these data frames include: indel, insertion, position, or residue: position of variant; score: relative enrichment values or Enrich2 scores; dataset: dataset where the values were measured; n: count; Seq or Amino Acid: type of variant; hgvs: nomenclature used to name the variants; SE: standard error; epsilon: change in the standard error after the last iteration of the random-effects model; Secondary: secondary structure assignment using STRIDE; Residue_Overlap_3D: residues in 3D(pol) that interact with the template RNA.
  - HTML_version: HTML version of the executable R script where the script can be seen along with the figures generated.
  - Output_Figures: Contains all output figures generated by the R script.
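The per-residue entropy calculation described under Shannon_Entropy_Analysis_Scripts can be illustrated with a short sketch. This is a Python illustration of the general technique, not a translation of compute_shannon.R. It assumes base-2 logarithms and that gap characters are ignored when computing frequencies; the repository's R function may make different choices. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(column, gap_chars="-"):
    """Shannon entropy (in bits) of one alignment column.

    Assumption: gap characters are excluded before computing frequencies.
    """
    residues = [c for c in column if c not in gap_chars]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def column_entropies(aligned_seqs):
    """Entropy for every column of a list of equal-length aligned sequences."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [shannon_entropy(col) for col in zip(*aligned_seqs)]


# Toy alignment for illustration only; not taken from the dataset.
toy_alignment = ["MGA", "MGS", "MAS", "M-T"]
entropies = column_entropies(toy_alignment)
```

A fully conserved column (the first one here) gives an entropy of zero, and entropy grows as residues at a position become more varied, which is what makes it a useful companion to the mean fitness effects from the scanning experiments.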
4- MSA_Phylogenetic_Trees_Analysis
Contains the multiple sequence alignment and the corresponding phylogenetic tree of the EV-A species.
- GapCount_MSA_EVA: Contains measurements of the number of gaps observed for each viral protein in the alignment of Enterovirus A species.
- MSA_EVA: Multiple sequence alignment of 107 Enterovirus A viruses with the ICTV Enterovirus B exemplar isolate sequence (GenBank: AAB59927.1) as an outgroup.
- Phylogenetic_Tree_EVA: Phylogenetic tree produced from the EV-A alignment using the maximum-likelihood method in RAxML.
- MSA_EVA_VP1N-C_termini: Multiple sequence alignment of EV-A focusing on the gaps at the N- and C-termini of VP1. Gap_Sizes.txt contains all the gap sizes and their colors shown in Figure 6 of the paper.
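Gap counts and gap sizes like those summarized in this folder can be derived from an aligned FASTA file by locating runs of gap characters in each sequence. The sketch below is a hedged Python illustration of that idea, not the procedure used to produce Numberofgaps_counts.pzfx or Gap_Sizes.txt. It assumes gaps are written as `-`, and the function names and toy sequences are hypothetical.

```python
import re


def parse_fasta(text):
    """Return {record name: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def gap_runs(aligned_seq):
    """Return (start, length) for each run of '-' using 0-based alignment columns."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned_seq)]


# Toy alignment for illustration only; not taken from the dataset.
toy_fasta = ">virusA\nMG--AS\nT-K\n>virusB\nMGSSAS\nTLK\n"
alignment = parse_fasta(toy_fasta)
gaps = {name: gap_runs(seq) for name, seq in alignment.items()}
```

Restricting the columns to the boundaries of one protein (for example, using the feature coordinates in EV71_4643_Features.csv after mapping them to alignment columns) would give per-protein gap counts; that mapping step is not shown here.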
5- Chimera_Analysis
Contains the chimera sessions for visualizing relative enrichment scores on the structures of the EV-A71 viral proteins.
- Structural_Annotations_Colors.txt: contains all the structural annotations used in Figures 3, 4, 5, 6, and Supplementary Figure 6.
Each viral protein is organized into a folder with the following subfolders and sub-subfolders:
- PDB: The PDB subfolder contains the PDB structure(s) used for each viral protein.
- STRIDE: The STRIDE subfolder contains the secondary structure assignments for each PDB structure.
- Insertions, Deletions, and DMS subfolders: These contain a sub-subfolder called “Attribute_Files” containing the attribute files used to map the relative enrichment scores on the protein structure. Another sub-subfolder called “Chimera_Sessions” contains the Chimera sessions showing the relative enrichment scores of variants mapped on the structure of the viral protein. The “Screenshots” sub-subfolder contains all the screenshots saved from the Chimera sessions shown in Figures 5 and 6 of the paper.
- EVA71_CVA6_Comparison: Contains Chimera sessions for comparison of the EV-A71 and CV-A6 structures at the N- and C-termini of VP1.
- Labels: Contains screenshots of the legends for the relative enrichment values used for insertions, deletions, and amino acid changes.
6- Sequencing_Analysis_Scripts
Contains the two Python scripts used for detecting insertions (stickleback.0.2.py) and deletions (DelMapper_v0.2.py) from sequencing data.
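Because these scripts work from mapped reads, the general idea of locating insertions and deletions can be sketched from the SAM CIGAR string. The Python example below is an illustration of that general technique only; it is not the algorithm implemented in stickleback.0.2.py or DelMapper_v0.2.py, which may use different logic. The function names and the example read are hypothetical.

```python
import re

# CIGAR operations defined by the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def indels_from_cigar(pos, cigar):
    """List (kind, reference_position, length) for each I or D operation.

    pos is the 1-based leftmost mapping position (SAM POS field).
    For deletions, reference_position is the first deleted reference base.
    For insertions, it is the reference base immediately after the insertion.
    """
    events = []
    ref_pos = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "I":
            events.append(("ins", ref_pos, length))
        elif op == "D":
            events.append(("del", ref_pos, length))
        if op in REF_CONSUMING:
            ref_pos += length
    return events


def indels_from_sam_line(line):
    """Extract indel events from one SAM alignment line (not a header line)."""
    fields = line.rstrip("\n").split("\t")
    pos, cigar = int(fields[3]), fields[5]
    if cigar == "*":  # unmapped read or CIGAR unavailable
        return []
    return indels_from_cigar(pos, cigar)


# Made-up alignment record for illustration only.
sam_line = "read1\t0\tEVA71\t100\t60\t50M3D47M\t*\t0\t0\t*\t*"
events = indels_from_sam_line(sam_line)
```

In this made-up record, the read maps at position 100, matches 50 bases, and then skips 3 reference bases, so the deletion is reported at reference position 150 with length 3.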
Sharing/Access information
Short-read data will be published under NCBI project number PRJNA1066851.
Code/Software
Stickleback and DeletionMapper were run in Python environments. See the GitHub repository (https://github.com/QVEU/InDel_Toolkit) for the current versions. These scripts take mapped .sam files as input. R scripts for analysis and figure generation were run using R version 4.0.3 (2020-10-10). R packages used were: ggplot2, tidyverse, tidyr, ggpubr, dplyr, ggridges, ineq, RColorBrewer, stringr, gglorenz, readr, scales. Chimera sessions were all created in Chimera production version 1.16 (build 42360).
These data were collected by sequencing the input and output libraries from deep mutational, insertional, and deletional scanning experiments. Data were processed by next-generation sequencing pipelines, in-house scripts, and published software to interpret the fitness effects of mutations engineered in the EV-A71 genome. Data were visualized using R packages, including ggplot2. All scripts for the analysis and figure generation are included here and are also available through GitHub (see links in the Related Works section).