Data from: Dissecting molecular evolution of class 1 integron gene cassettes and identifying their bacterial hosts in suburban creeks via epicPCR
Data files
Jul 30, 2023 version files 7.70 GB
- KC1_1.fastq
- KC1_2.fastq
- KC1_3.fastq
- KC2_1.fastq
- KC2_2.fastq
- KC2_3.fastq
- KC3_1.fastq
- KC3_2.fastq
- KC3_3.fastq
- MQ1_1.fastq
- MQ1_2.fastq
- MQ1_3.fastq
- MQ2_1.fastq
- MQ2_2.fastq
- MQ2_3.fastq
- MQ3_1.fastq
- MQ3_2.fastq
- MQ3_3.fastq
- README.md
Abstract
Objectives
Our study aimed to sequence class 1 integrons in uncultured environmental bacterial cells in freshwater from suburban creeks and uncover the taxonomy of their bacterial hosts. We also aimed to characterize integron gene cassettes with altered DNA sequences relative to those from databases or literature and identify key signatures of their molecular evolution.
Methods
We applied a single-cell fusion PCR-based technique—emulsion, paired isolation and concatenation PCR (epicPCR)—to link class 1 integron gene cassette arrays to the phylogenetic markers of their bacterial hosts. The levels of streptomycin resistance conferred by the WT and altered aadA5 and aadA11 gene cassettes that encode aminoglycoside (3″) adenylyltransferases were experimentally quantified in an Escherichia coli host.
Results
Class 1 integron gene cassette arrays were detected in Alphaproteobacteria and Gammaproteobacteria hosts. A subset of three gene cassettes displayed signatures of molecular evolution, namely the gain of a regulatory 5′-untranslated region (5′-UTR), the loss of attC recombination sites between adjacent gene cassettes, and the invasion of a 5′-UTR by an IS element. Notably, our experimental testing of a novel variant of the aadA11 gene cassette demonstrated that gaining the observed 5′-UTR contributed to a 3-fold increase in the MIC of streptomycin relative to the ancestral reference gene cassette in E. coli.
Dissecting the observed signatures of molecular evolution of class 1 integrons allowed us to explain their effects on antibiotic resistance phenotypes, while identifying their bacterial hosts enabled us to make better inferences on the likely origins of novel gene cassettes and IS that invade known gene cassettes.
Methods
Freshwater samples of approximately 200 mL were collected in three independent biological replicates from Kikkiya Creek (-33.776, 151.117) and the Macquarie University Lake section of Mars Creek (-33.772, 151.115) in suburban metropolitan Sydney, NSW, Australia (Figure 1A). Water samples were immediately transported back to the lab, filtered with 40 µm EASYstrainer cell strainers (Greiner AG, Germany) and centrifuged at 3,220 g for 15 min at 4°C. Pellets were resuspended in 300 µL of 0.01 M phosphate buffered saline (PBS) solution (pH 7.4) and stored as 25% glycerol stocks for cryopreservation. Estimates of bacterial cell density were obtained by fluorescence microscopy using previously described procedures (1).
epicPCR to determine the identity of bacterial hosts of class 1 integrons in suburban creeks was performed in three technical replicates for each of the three biological replicates using previously established procedures with minor modifications (1). Approximately 150,000 cells were resuspended in 75 µL of PCR mix containing the following reagents at the final concentrations indicated in parentheses: GC buffer (1x), dNTP mix (0.4 mM), Phusion High-Fidelity DNA polymerase (0.05 U/µL) (Thermo Scientific, United States), bovine serum albumin (1 µg/µL) (Promega, United States), Lucigen Ready-Lyse lysozyme (500 U/µL) (LGC Biosearch Technologies, United States), and three primers: R926 (2 µM), intI1_outer (1 µM), and the R519-qacE bridge primer (0.04 µM) (Table S1, Supplementary Materials). The PCR mix was emulsified in 425 µL ABIL oil at a speed of 4 m s⁻¹ for 45 s using a FastPrep-24 bead beating system (MP Biomedicals, United States), and the resulting emulsion was aliquoted into 8 portions that were subjected to the following conditions: 37°C for 10 min (lysozyme lysis); 98°C for 5 min (initial denaturation); 38 cycles of 98°C for 10 s (denaturation), 55°C for 30 s (annealing), and 72°C for 2 min (extension); and finally, 72°C for 5 min. DNA in the aqueous phase was extracted, purified, and size-selected to deplete <300 bp fragments.
In the second stage of epicPCR, DNA templates from the previous step were PCR amplified in eight aliquots of 12.5 µL reactions containing GC buffer (1x), dNTP mix (0.4 mM), Phusion High-Fidelity DNA polymerase (0.04 U/µL), AP27_short primer (0.8 µM), intI1_nested primer (0.4 µM), and forward and reverse blocking primers (0.32 µM each) under the following conditions: 98°C for 30 s; 23 cycles of 98°C for 10 s, 55°C for 30 s and 72°C for 2 min; followed by a final step of 72°C for 5 min. Next, 80 µL of the recombined PCR product was added to 48 µL of Sera-Mag Select magnetic beads (Cytiva, United States) to deplete epicPCR products containing cassette-less integrons, and the remaining DNA size selection procedures were carried out according to the manufacturer’s instructions. In the third stage of epicPCR, the PCR products were further amplified in eight aliquots of 12.5 µL reactions containing GC buffer (1x), dNTP mix (0.4 mM), Phusion High-Fidelity DNA polymerase (0.04 U/µL), AP27_tail primer (0.4 µM), intI1_nested primer (0.4 µM), and forward and reverse blocking primers (0.32 µM each) under the following conditions: 98°C for 30 s; 12 cycles of 98°C for 10 s, 55°C for 30 s and 72°C for 2 min. The recombined epicPCR products were visualised on a 1% agarose gel by electrophoresis. In the final DNA size selection step, 80 µL of epicPCR product was added to 40 µL of Sera-Mag Select magnetic beads. The purified epicPCR amplicons were sequenced using the same Oxford Nanopore procedures described previously (1).
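The two bead clean-ups above are specified as volumes, and it can help to express them as bead-to-sample ratios, since size selection with beads of this kind generally depends on that ratio (lower ratios tend to retain only longer fragments). The short Python sketch below only redoes this arithmetic from the volumes stated above; the function name is illustrative and not part of the published protocol.

```python
def bead_ratio(bead_ul: float, sample_ul: float) -> float:
    """Return the bead-to-sample volume ratio for a size-selection step."""
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    return bead_ul / sample_ul


# Volumes taken from the text: 48 µL beads per 80 µL product after the
# second stage, and 40 µL beads per 80 µL product in the final step.
second_stage = bead_ratio(48, 80)
final_step = bead_ratio(40, 80)

print(f"second stage: {second_stage:.2f}x, final step: {final_step:.2f}x")
```

Under these volumes the ratios work out to 0.6x and 0.5x respectively.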
References
1. Qi, Q.; Ghaly, T. M.; Penesyan, A.; Rajabal, V.; Stacey, J. A.; Tetu, S. G.; Gillings, M. R. Uncovering bacterial hosts of class 1 integrons in an urban coastal aquatic environment with a single-cell fusion-polymerase chain reaction technology. Environ Sci Technol 2023, 57 (12), 4870-4879. DOI: 10.1021/acs.est.2c09739
Usage notes
Users in the UNIX/Linux environment can use the unzip command to decompress the downloaded .ZIP file.
All Nanopore sequence filtering and processing steps for epicPCR amplicon reads were carried out using an in-house pipeline (https://github.com/timghaly/Int1-epicPCR). First, the pipeline quality-filters Nanopore reads using NanoFilt v2.8.0 (1), removing those with an average read quality of less than 7 or a read length of less than 670 bp, which represents the minimum length of an epicPCR product with a cassette-less integron. Quality-filtered reads are then oriented and trimmed with the final nested epicPCR forward and reverse primer sequences using Pychopper v2.7.0 (https://github.com/epi2me-labs/pychopper). Pychopper identifies both primers in each read using edlib v1.2.3 (2), and orients the reads based on the forward and reverse primer sequences. Reads that do not contain both primers in the correct orientation are discarded. The pipeline then clusters the primer-oriented reads into amplicon-specific clusters using isONclust v0.0.6.1 (3), and performs error correction on each cluster using isONcorrect v0.0.8 (4). Importantly, isONcorrect can jointly use all cassette arrangements of the same integron that occur in different clusters, allowing efficient error correction even for amplicons with low sequencing depths.
After error correction, a consensus sequence is generated for each cluster using spoa v4.0.7 (5). All consensus sequences are then pooled, while removing any redundancies, including reverse complement redundancies, using the dedupe programme from the BBTools v35 software package (https://github.com/kbaseapps/BBTools). Next, the consensus sequences are screened using blastn from the BLAST v2.2.31 package, first for the sequence of the R519-qacE bridging primer and then for the 157 bp region that includes the 5'-end of intI1 and all of attI1. Retained sequences must contain both the bridging primer and the intI1/attI1 region. Any sequences containing more than one hit to either of these sequences are considered unintended chimeras and discarded. Finally, sequences are screened for the correctly fused 16S rRNA gene fragment using Metaxa2 v2.2.3 (6). The final output of the pipeline for each sample is a set of full-length, primer-oriented amplicon consensus sequences that contain a complete attI1 sequence, the R519-qacE bridging primer, and the V4 hypervariable region of the 16S rRNA gene.
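Two of the rules described above are simple enough to sketch in a few lines of Python: the initial read filter (average quality of at least 7 and length of at least 670 bp) and the hit-count screen (exactly one hit to the bridging primer and exactly one to the intI1/attI1 region). This is an illustrative sketch, not the code of the Int1-epicPCR pipeline. It assumes that average quality is computed from per-base error probabilities, and that BLAST hits have already been reduced to (query, subject) pairs; the subject labels `bridge_primer` and `intI1_attI1` are hypothetical names.

```python
import math
from collections import Counter

MIN_QUALITY = 7    # average read quality threshold stated in the text
MIN_LENGTH = 670   # minimum read length (bp) stated in the text


def mean_quality(quality_string: str, offset: int = 33) -> float:
    """Average Phred score via per-base error probabilities (assumed method)."""
    if not quality_string:
        raise ValueError("empty quality string")
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def passes_read_filter(sequence: str, quality_string: str) -> bool:
    """Keep reads that meet both the length and the quality threshold."""
    return (len(sequence) >= MIN_LENGTH
            and mean_quality(quality_string) >= MIN_QUALITY)


def screen_consensus(hits, required=("bridge_primer", "intI1_attI1")):
    """Return query IDs with exactly one hit to each required subject.

    `hits` is an iterable of (query_id, subject_id) pairs, one per BLAST hit.
    Queries missing a required subject, or hitting one more than once
    (treated as unintended chimeras), are dropped.
    """
    counts = Counter(hits)
    queries = {query for query, _ in counts}
    return sorted(q for q in queries
                  if all(counts[(q, s)] == 1 for s in required))


example_hits = [
    ("cons_a", "bridge_primer"), ("cons_a", "intI1_attI1"),
    ("cons_b", "bridge_primer"), ("cons_b", "bridge_primer"),
    ("cons_b", "intI1_attI1"),
    ("cons_c", "bridge_primer"),
]
kept = screen_consensus(example_hits)
```

In this toy input only `cons_a` is kept: `cons_b` has two hits to the bridging primer and `cons_c` lacks the intI1/attI1 region.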
CD-HIT v4.8.1 (7) was used to cluster chimeric epicPCR products, comprising class 1 integron gene cassette arrays and V4 hypervariable regions of the 16S rRNA gene, that shared ≥99% pairwise nucleotide identity in at least 3 replicates of epicPCR. For epicPCR products that were found in fewer than 3 replicates, the NCBI Genome Workbench was used to perform blastn searches of these sequences against a local database created using all the Nanopore reads obtained in this study. Reads with ≥98% pairwise nucleotide identity and ≥98% coverage were aligned using the Geneious bioinformatic software (Biomatters, New Zealand), and consensus sequences were generated from these alignments within each set of epicPCR replicates. Consensus sequences that could be found in at least 3 epicPCR replicates and showed ≥99% pairwise nucleotide identity and ≥99% identical sites were added to the list of epicPCR products that were previously shortlisted by the CD-HIT algorithm. IntegronFinder 2.0 (8) [parameters: --local-max --gbk --promoter-attI --calin-threshold 1] was used to predict the ORFs and attC sites in the epicPCR products. The V4 hypervariable region sequences of the 16S rRNA gene obtained in this study were searched against the SILVA 16S rRNA gene database website (9) using the Alignment, Classification and Tree (ACT) tool with a cut-off threshold of 0.80.
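The replicate-support criterion above (a product must be found in at least 3 epicPCR replicates) can be checked from a CD-HIT `.clstr` file along the following lines. This is a minimal sketch under one loud assumption: each sequence ID is prefixed with its replicate label followed by `|` (for example `KC1_1|read12`). That naming scheme is hypothetical and is not stated for the deposited data.

```python
import re
from collections import defaultdict

# In a .clstr file each member line contains ">sequence_id..." after the length.
MEMBER_ID = re.compile(r">(\S+?)\.\.\.")


def replicates_per_cluster(clstr_text: str) -> dict:
    """Map each cluster name to the set of replicate labels it contains."""
    clusters = defaultdict(set)
    current = None
    for line in clstr_text.splitlines():
        if line.startswith(">Cluster"):
            current = line[1:].strip()
        elif current is not None:
            match = MEMBER_ID.search(line)
            if match:
                # Hypothetical ID scheme: "<replicate>|<read name>"
                clusters[current].add(match.group(1).split("|", 1)[0])
    return dict(clusters)


def supported_clusters(clstr_text: str, min_replicates: int = 3) -> list:
    """Return clusters whose members come from enough distinct replicates."""
    per_cluster = replicates_per_cluster(clstr_text)
    return sorted(name for name, reps in per_cluster.items()
                  if len(reps) >= min_replicates)


example = "\n".join([
    ">Cluster 0",
    "0\t1502nt, >KC1_1|read12... *",
    "1\t1500nt, >KC1_2|read3... at +/99.60%",
    "2\t1501nt, >KC1_3|read8... at +/99.40%",
    ">Cluster 1",
    "0\t1320nt, >KC2_1|read5... *",
    "1\t1320nt, >KC2_1|read9... at +/99.80%",
])
supported = supported_clusters(example)
```

Here `Cluster 0` spans three replicates and is kept, while `Cluster 1` contains two reads from a single replicate and is not.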
References
1. De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018;34(15):2666-9.
2. Šošić M, Šikić M. Edlib: a C/C++ library for fast, exact sequence alignment using edit distance. Bioinformatics. 2017;33(9):1394-5.
3. Sahlin K, Medvedev P. De novo clustering of long-read transcriptome data using a greedy, quality value-based algorithm. J Comput Biol. 2020;27(4):472-84.
4. Sahlin K, Medvedev P. Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis. Nat Commun. 2021;12(1):2.
5. Vaser R, Sović I, Nagarajan N, Šikić M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 2017;27(5):737-46.
6. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DG, et al. METAXA2: Improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Mol Ecol Resour. 2015;15(6):1403-14.
7. Li W, Godzik A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658-9.
8. Neron B, Littner E, Haudiquet M, Perrin A, Cury J, Rocha EPC. IntegronFinder 2.0: Identification and analysis of integrons across bacteria, with a focus on antibiotic resistance in Klebsiella. Microorganisms. 2022;10(4).
9. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013;41(Database issue):D590-6.