Inferring strain-level mutational drivers of phage-bacteria interaction phenotypes arising during coevolutionary dynamics
Data files
Dec 03, 2024 version (files: 60.16 KB)
- Data_S1.csv (4.11 KB)
- Data_S2.csv (29.49 KB)
- Data_S3.csv (10.86 KB)
- Data_S4.csv (12.86 KB)
- README.md (2.85 KB)
Abstract
The enormous diversity of bacteriophages and their bacterial hosts makes it challenging to predict which phages infect a focal set of bacteria. Infection is largely determined by complementary – and largely uncharacterized – genetics of adsorption, injection, cell take-over and lysis. Here we present a machine learning approach to predict phage-bacteria interactions, trained on the genome sequences of, and phenotypic interactions among, 51 Escherichia coli strains and 45 phage λ strains that coevolved in laboratory conditions for 37 days. Leveraging multiple inference strategies and without a priori knowledge of driver mutations, this framework predicts both who infects whom and the quantitative levels of infections across a suite of 2,295 potential interactions. We found that the most effective approach inferred interaction phenotypes from independent contributions from phage and bacteria mutations, accurately predicting 86% of interactions while reducing the relative error in the estimated strength of the infection phenotype by 40%. Feature selection revealed key phage λ and E. coli mutations that have a significant influence on the outcome of phage-bacteria interactions, corroborating sites previously known to affect phage λ infections, as well as identifying mutations in genes of unknown function not previously shown to influence bacterial resistance. The method's success in recapitulating strain-level infection outcomes arising during coevolutionary dynamics may also help inform generalized approaches for imputing genetic drivers of interaction phenotypes in complex communities of phage and bacteria.
README: Inferring strain-level mutational drivers of phage-bacteria interaction phenotypes arising during coevolutionary dynamics
The manuscript "Lucia-Sanz A, Peng S, Leung CYJ, Gupta A, Meyer JR, Weitz JS. Inferring strain-level mutational drivers of phage-bacteria interaction phenotypes arising during coevolutionary dynamics" has been accepted for publication in Virus Evolution (11/20/2024).
Supplementary data
We use experimental data from Gupta et al., 2022, in which E. coli B strain REL606 and phage λ strain cI26 were co-cultured for a 37-day period. Samples were taken on checkpoint days for sequencing and pairwise quantitative plaque assays, as described in Gupta et al., 2022.
- Data_S1.csv is a table listing the genome-wide changes (rows) found in 50 bacterial host strains (columns), descended from E. coli B strain REL606, during the 37-day coevolutionary experiment. Additional columns contain information about each mutation.
- Data_S2.csv is a table listing the genome-wide changes (rows) found in 44 phage strains (columns), descended from λ strain cI26, during the 37-day coevolutionary experiment. Additional columns contain information about each mutation.
- Data_S3.csv and Data_S4.csv are tables listing the mutational features with non-zero coefficients in the final models for predicting the presence or absence of infection (POA) and the efficiency of infection (EFF), respectively, based on a linear combination of phage and host mutation profiles. NA: not applicable.
- Supplementary_figures.docx is a document containing the supplementary figures referenced in the main text.
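As a rough illustration of the mutation tables described above (mutations in rows, strains in columns, plus annotation columns), the following Python sketch reads such a table into per-strain 0/1 mutation profiles. The column names ("position", "gene", "strain_A", ...) and the 0/1 cell encoding are assumptions made for this example; they are not taken from Data_S1.csv or Data_S2.csv, so check the real headers before adapting it.

```python
import csv
import io

# Toy stand-in for a mutation table. All column names and values here are
# hypothetical; the README does not specify the real headers or encoding.
TOY_CSV = """position,gene,strain_A,strain_B,strain_C
1200,geneA,1,0,1
3400,geneB,0,0,1
5600,geneC,0,1,0
"""

# Assumed set of annotation (non-strain) columns.
ANNOTATION_COLUMNS = {"position", "gene"}


def read_mutation_table(handle, annotation_columns):
    """Return (mutation_labels, {strain: [0/1 per mutation]}) from a CSV handle."""
    reader = csv.DictReader(handle)
    # Every column that is not an annotation column is treated as a strain.
    strains = [c for c in reader.fieldnames if c not in annotation_columns]
    labels = []
    profiles = {s: [] for s in strains}
    for row in reader:
        labels.append(f"{row['gene']}:{row['position']}")
        for s in strains:
            profiles[s].append(int(row[s]))
    return labels, profiles


labels, profiles = read_mutation_table(io.StringIO(TOY_CSV), ANNOTATION_COLUMNS)
print(labels)
print(profiles["strain_C"])
```

To try this on a real file, replace `io.StringIO(TOY_CSV)` with `open("Data_S1.csv", newline="")` and adjust `ANNOTATION_COLUMNS` to match the actual annotation headers.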
Software: Genotype to phenotype inference model
We developed a machine learning method to infer phage and bacterial mutations driving changes in infection phenotypes.
- genotype_to_phenotype_inference_model-v1.0.0.zip contains all files needed (experimental data and .R scripts) to run this machine learning framework. We recommend using the latest version posted at https://doi.org/10.5281/zenodo.13838669 or referring directly to the GitHub repository.
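The framework itself is implemented in the .R scripts. Purely to illustrate the idea of a linear combination of phage and host mutation profiles, the Python sketch below builds one feature row per host-phage pair by concatenating the two mutation vectors, then scores a pair with an additive model. The strain names, mutation vectors, weights, and intercept are all hypothetical, and this is not the authors' code.

```python
from itertools import product


def pair_features(host_profiles, phage_profiles):
    """Map each (host, phage) pair to its concatenated 0/1 mutation vector."""
    rows = {}
    for (host, hvec), (phage, pvec) in product(
        host_profiles.items(), phage_profiles.items()
    ):
        # Independent contributions: host features first, then phage features,
        # with no host-by-phage interaction terms.
        rows[(host, phage)] = list(hvec) + list(pvec)
    return rows


def predict_additive(features, weights, intercept=0.0):
    """Linear score: intercept plus the sum of weight * feature."""
    return intercept + sum(w * x for w, x in zip(weights, features))


# Hypothetical toy profiles (2 hosts with 2 mutations, 3 phages with 3 mutations).
hosts = {"host_1": [1, 0], "host_2": [0, 1]}
phages = {"phage_1": [0, 0, 0], "phage_2": [0, 1, 1], "phage_3": [1, 0, 0]}

rows = pair_features(hosts, phages)

# Hypothetical coefficients: two host weights followed by three phage weights.
weights = [-2.0, 0.0, 1.0, 0.5, 0.25]
score = predict_additive(rows[("host_1", "phage_2")], weights, intercept=1.0)
print(len(rows), score)
```

With the 51 bacterial and 45 phage strains reported in the abstract, the same construction would give 51 × 45 = 2,295 rows, one per potential interaction.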
How to run the machine learning framework
Step 1. Run approximation_uniq_nonsyn.R
Step 2. Run all_feature_setup.R
Step 3. Run eqtl_trial.R
Step 4. (WARNING: code may take ~4h) Run first_step_logistic_regression.R, or load 'first_step_all_feature_logistic_run_res3.RData' in R.
Step 5. (WARNING: code may take ~4h) Run second_step_linear_model.R, or load 'second_step_all_feature_run_res3.RData' in R.
Step 6. Run final_model_using_all_data.R, or load 'final_model_res3.RData' in R.
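The run order above, including the option to load saved results in place of the long-running steps, can be sketched as a small planning helper. This Python sketch only builds the list of actions and does not execute anything. Invoking scripts with `Rscript` is an assumption (the README only says "run"), and the `load('...')` strings are meant to be evaluated inside an R session.

```python
# (script, saved result) pairs in README order; None means no saved result is listed.
STEPS = [
    ("approximation_uniq_nonsyn.R", None),
    ("all_feature_setup.R", None),
    ("eqtl_trial.R", None),
    ("first_step_logistic_regression.R",
     "first_step_all_feature_logistic_run_res3.RData"),
    ("second_step_linear_model.R", "second_step_all_feature_run_res3.RData"),
    ("final_model_using_all_data.R", "final_model_res3.RData"),
]


def plan(available_files, prefer_cached=True):
    """Return one action string per step, preferring saved results if present."""
    actions = []
    for script, cached in STEPS:
        if prefer_cached and cached is not None and cached in available_files:
            # R statement to evaluate in an R session.
            actions.append(f"load('{cached}')")
        else:
            # Assumed command-line invocation; the README does not specify one.
            actions.append(f"Rscript {script}")
    return actions


# Example: only the step 4 result file is available locally.
example = plan({"first_step_all_feature_logistic_run_res3.RData"})
for action in example:
    print(action)
```

Passing `prefer_cached=False` yields a plan that reruns every script, which is the slow path (steps 4 and 5 may each take about 4 hours according to the warnings above).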