Data and code from: Engineering bacteriophages through deep mining of metagenomic motifs
Data files
Apr 02, 2025 version files 11.95 MB
Abstract
Bacteriophages can adapt to new hosts by altering sequence motifs through recombination or convergent evolution. Where such motifs exist and what fitness advantage they confer remain largely unknown. We report a new method, Metagenomic Sequence Informed Functional Scoring (Meta-SIFT), to discover sequence motifs in metagenomic datasets to engineer phage activity. Meta-SIFT uses experimental deep mutational scanning data to create sequence profiles to mine metagenomes for functional motifs invisible to other searches. We experimentally tested 17,000 Meta-SIFT derived sequence motifs in the receptor-binding protein of the T7 phage. The screen revealed thousands of T7 variants with novel host specificity, with motifs sourced from distant families. Position, substitution and location preferences dictated specificity across a panel of 20 hosts and conditions. To demonstrate therapeutic utility, we engineered active T7 variants against the foodborne pathogen E. coli O121. Meta-SIFT is a powerful tool to unlock the potential encoded in phage metagenomes to engineer bacteriophages.
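As a rough illustration of the profile-based search idea described in the abstract (and not the Meta-SIFT implementation itself), the sketch below slides a position-specific score table over a protein sequence and keeps the windows whose summed score passes a threshold. The toy profile, the threshold, the default penalty, and the function name `scan_windows` are all hypothetical; the real profiles are derived from deep mutational scanning data.

```python
# Illustrative only: a toy position-specific scoring scan, NOT the Meta-SIFT code.
from typing import Dict, List, Tuple

# Hypothetical profile type: one dict per motif position, mapping residue -> score.
Profile = List[Dict[str, float]]


def scan_windows(seq: str, profile: Profile, threshold: float,
                 default: float = -1.0) -> List[Tuple[int, str, float]]:
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(profile)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Sum per-position scores; residues absent from the profile get a penalty.
        score = sum(pos.get(res, default) for pos, res in zip(profile, window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Toy three-position profile and sequence (made up for this example).
toy_profile = [{"G": 2.0, "A": 1.0}, {"K": 2.0}, {"T": 1.5, "S": 1.0}]
for hit in scan_windows("MAGKTLLAKSQ", toy_profile, threshold=3.0):
    print(hit)
```

With this toy input, only the windows `GKT` and `AKS` pass the threshold; every other window contains at least two residues missing from the profile and scores below zero.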
Files and Folders
- motif_finder_tool.py: main motif search tool
- motif_finder_tool_modules.pyx: Cython modules for the main tool, requires compilation
- compile_cython.py: used to compile Cython modules
- IMGVR_v4_human-animal-wastewater.VOG-relevant.cdhit100.faa.gz: curated database of proteins from IMG/VR. Some proteins contained within this database are unpublished and are credited to the original authors. Please see the IMG/VR website and usage policy for details.
- prokaryotic_virus_ncbi_july2019.lim3000.trimmed.VOG-relevant.cdhit100.faa.gz: curated database of proteins from NCBI.
- Hierarchical_Cluster.R: R scripts used for hierarchical clustering
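Both protein databases listed above are gzip-compressed FASTA files (`.faa.gz`). A minimal sketch for streaming records from such a file with the Python standard library is shown here; the helper name `read_fasta_gz` is hypothetical and is not part of the repository.

```python
import gzip
from typing import Iterator, Tuple


def read_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file."""
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:  # "rt" opens the archive as text
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts; emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)
```

Because the function is a generator, a large database can be scanned one record at a time, for example with `for header, seq in read_fasta_gz(path): ...`, without loading the whole file into memory.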
Scripts for “Engineering bacteriophages through deep mining of metagenomic motifs”.
Required Dependencies:
- NumPy
- Cython
Setup and Installation:
- Download the dataset here, or run `git clone https://github.com/raman-lab/Meta-SIFT`
- `cd Meta-SIFT`
- `python3 compile_cython.py build_ext --inplace`
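After the build step, it can be useful to confirm that the dependencies and the compiled extension are importable before running the main tool. The sketch below assumes the extension is built under the same name as the `.pyx` file (`motif_finder_tool_modules`), which is typical for Cython but is not confirmed here; run it from inside the `Meta-SIFT` directory.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# The second name is an assumption derived from motif_finder_tool_modules.pyx.
for name in ("numpy", "motif_finder_tool_modules"):
    status = "found" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

If the extension is reported as missing, rerunning the compile command from the repository root is the first thing to try.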
Usage:
motif_finder_tool.py -i <input_proteins> -m <matrix_table> [options]
Publication Settings:
-n 6 -c 1 -e 1e-50 -t 0.8
-n 10 -c 1 -e 1e-5 -t 0.045
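The two option sets above can be combined with the usage line into full commands. The sketch below only assembles and prints the command lines and does not run the tool. The file names `proteins.faa` and `matrix.tsv` are placeholders, the `python3` prefix is an assumption, and the meaning of each flag should be checked against the tool's own help output.

```python
import shlex
from typing import List

# Option sets copied from "Publication Settings"; flag meanings are not restated here.
PUBLICATION_SETTINGS = [
    ["-n", "6", "-c", "1", "-e", "1e-50", "-t", "0.8"],
    ["-n", "10", "-c", "1", "-e", "1e-5", "-t", "0.045"],
]


def build_command(input_proteins: str, matrix_table: str,
                  options: List[str]) -> List[str]:
    """Build an argv list that follows the documented usage line."""
    return ["python3", "motif_finder_tool.py",
            "-i", input_proteins, "-m", matrix_table, *options]


# Placeholder file names (hypothetical); substitute real inputs.
for opts in PUBLICATION_SETTINGS:
    print(shlex.join(build_command("proteins.faa", "matrix.tsv", opts)))
```

To execute a command instead of printing it, the argv list can be passed to `subprocess.run(cmd, check=True)`.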