Data for: CRISPR spacers acquired from plasmids primarily target backbone genes, making them valuable for predicting potential hosts and host range
Data files
Oct 28, 2024 version files (58.18 MB total)
- README.md (1.70 KB)
- Supplementary_Table_A.xlsx (58.18 MB)
Abstract
Here we provide the full data described and discussed in the manuscript "CRISPR Spacers Acquired from Plasmids Primarily Target Backbone Genes, Making Them Valuable for Predicting Potential Hosts and Host Range", submitted to Microbiology Spectrum.
Abstract of the main manuscript: In recent years, there has been a surge in metagenomic studies focused on identifying plasmids in environmental samples. While these studies have unearthed numerous novel plasmids, enriching our understanding of their environmental roles, a significant gap remains: the scarcity of information regarding the bacterial hosts of these newly discovered plasmids. Furthermore, even when plasmids are identified within bacterial isolates, the reported host is typically limited to the original isolate, with no insight into alternative hosts or the plasmid’s potential host range. Given that plasmids depend on hosts for their existence, investigating plasmids without knowledge of potential hosts offers only a partial perspective. This study introduces a method for identifying potential hosts and host ranges for plasmids through alignment with CRISPR spacers. To validate the method, we compared the PLSDB plasmids database with the CRISPR spacers database, yielding host predictions for 46% of the plasmids. When compared to reported hosts, our predictions achieved an 84% concordance at the family level and 99% concordance at the phylum level. Moreover, the method frequently identified multiple potential hosts for a plasmid, thereby enabling predictions of alternative hosts and the host range. Notably, we found that CRISPR spacers predominantly target plasmid backbone genes while sparing functional genes, such as those linked to antibiotic resistance, aligning with our hypothesis that CRISPR spacers are acquired from plasmid-specific regions rather than insertion elements from diverse sources. Lastly, we illustrate the network of connections among different bacterial taxa through plasmids, revealing potential pathways for horizontal gene transfer.
IMPORTANCE: Plasmids are notorious for their role in distributing antibiotic resistance genes, but they may also carry and distribute other environmentally important genes. Since plasmids are not free-living entities and rely on host bacteria for survival and propagation, predicting their hosts is essential. This study presents a method for predicting potential hosts for plasmids and offers insights into the potential paths for spreading functional genes between different bacteria. Understanding plasmid-host relationships is crucial for comprehending the ecological and clinical impact of plasmids and implications for various biological processes.
https://doi.org/10.5061/dryad.t76hdr87m
Description of the data and file structure
Supplementary Table A: The full alignment of all entries in the plasmid database (PLSDB, version 2021_06_23) against all entries in the spacer database from CRISPRCasdb (version 20210121). Each row represents one match between a plasmid from PLSDB and a spacer from CRISPRCasdb, and lists:
- Column A: NCBI accession number of the plasmid
- Column B: NCBI accession number of the source of the spacer
- Column C: length of the match
- Columns D and E: start and end of the matched region on the plasmid
- Columns F to K: plasmid taxonomic lineage, from species (Column F) to phylum (Column K)
- Columns L to Q: predicted host taxonomic lineage, from species (Column L) to phylum (Column Q)
- Column R: the lowest taxonomic level at which the reported host and the predicted host match
- Column S: the matched sequence
Further analysis of the results, as well as detailed description of the methods, can be found in the main manuscript.
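As an illustration of how the column layout described above might be used, the following minimal Python sketch groups predicted hosts by plasmid and tallies the values in Column R. It is not the authors' pipeline. It assumes the spreadsheet has already been read into a list of 19-element data rows (Columns A to S, without any header row) using a spreadsheet reader of your choice, which is not shown. The function names, the example rows, and the example Column R values are hypothetical and serve only to demonstrate the logic.

```python
from collections import Counter, defaultdict
from string import ascii_uppercase


def col(letter):
    """Zero-based index of a spreadsheet column letter (A -> 0, S -> 18)."""
    return ascii_uppercase.index(letter)


def hosts_per_plasmid(rows, host_column="L"):
    """Map each plasmid accession (Column A) to its set of predicted hosts.

    host_column selects the rank: "L" is the predicted host species and
    "Q" is the predicted host phylum, following the column description.
    """
    hosts = defaultdict(set)
    for row in rows:
        hosts[row[col("A")]].add(row[col(host_column)])
    return dict(hosts)


def match_level_counts(rows):
    """Count matches by the lowest shared taxonomic level (Column R)."""
    return Counter(row[col("R")] for row in rows)


def make_row(plasmid, spacer_source, host_species, host_phylum, level):
    """Build a 19-column row (A to S); columns not needed here stay empty."""
    row = [""] * 19
    row[col("A")] = plasmid
    row[col("B")] = spacer_source
    row[col("L")] = host_species
    row[col("Q")] = host_phylum
    row[col("R")] = level
    return row


# Made-up rows for illustration only; none of these values come from the dataset,
# and the actual labels used in Column R may differ.
example_rows = [
    make_row("PLASMID_1", "GENOME_1", "Species a", "Phylum x", "species"),
    make_row("PLASMID_1", "GENOME_2", "Species b", "Phylum x", "family"),
    make_row("PLASMID_2", "GENOME_3", "Species c", "Phylum y", "phylum"),
]

species_hosts = hosts_per_plasmid(example_rows)
phylum_hosts = hosts_per_plasmid(example_rows, host_column="Q")
level_counts = match_level_counts(example_rows)
```

A plasmid with more than one entry in its host set has several predicted hosts at that rank, which is one simple way to read a host range from the table.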
Sharing/Access information
Data was derived from the following sources:
- PLSDB, version 2021_06_23; PLSDB (uni-saarland.de)
- CRISPRCasdb version 20210121; CRISPR-CAS++ (paris-saclay.fr)
Code/Software
All code used to produce and analyze the data is available on GitHub: https://github.com/Tal-Lab/crispr_plasmidome