Uncovering bacterial hosts of class 1 integrons in an urban coastal aquatic environment with a single-cell fusion-polymerase chain reaction technology
Data files
Dec 05, 2022 version files, 2.70 GB
- Nanopore_FASTQ_Data.zip
- README.md
Abstract
Horizontal gene transfer (HGT) is a key driver of bacterial evolution via transmission of genetic materials across taxa. Class 1 integrons are genetic elements that correlate strongly with anthropogenic pollution and contribute to the spread of antimicrobial resistance (AMR) genes via HGT. Despite their significance to human health, there is a shortage of robust, culture-free surveillance technologies for identifying uncultivated environmental taxa that harbour class 1 integrons. We developed a modified version of epicPCR (emulsion, paired isolation and concatenation PCR) that links class 1 integrons amplified from single bacterial cells to taxonomic markers from the same cells in emulsified aqueous droplets. Using this single-cell genomic approach and Nanopore sequencing, we successfully assigned class 1 integron gene cassette arrays containing mostly AMR genes to their hosts in coastal water samples that were affected by pollution. Our work presents the first application of epicPCR for targeting variable, multi-gene loci of interest. We also identified the Rhizobacter genus as novel hosts of class 1 integrons. These findings establish epicPCR as a powerful tool for linking taxa to class 1 integrons in environmental bacterial communities and offer the potential to direct mitigation efforts towards hotspots of class 1 integron-mediated dissemination of AMR.
Methods
Isolation of bacteria from coastal seawater
Coastal water samples were collected in three 50 mL biological replicates after rain in a rocky intertidal zone downstream from a stormwater outlet at Shark Point (location coordinates: -33.91, 151.27) near Sydney, NSW, Australia (Supplementary Figure 1). The samples were filtered using EASYstrainer cell strainers with a mesh size of 40 µm (Greiner AG, Germany) and centrifuged at 3220g for 15 minutes. Cell pellets were resuspended in phosphate buffered saline (PBS) solution at pH 7.4. Resuspended cells were stored as 25% glycerol stocks.
Bacterial cell counts by fluorescence microscopy
Approximately 10⁵–10⁶ cells were fixed and stained in PBS containing 4% formaldehyde and 2 µg/mL 4′,6-diamidino-2-phenylindole (DAPI) for 30 min in the dark with rotation at room temperature. Resuspended cells were diluted in PBS, transferred to a haemocytometer and imaged under the ×40 objective lens of an Olympus BX63 fluorescence microscope. Differential interference contrast (DIC) and DAPI fluorescence channel images were overlaid and analysed with ImageJ (National Institutes of Health, United States). Bacterial cell densities were calculated from the average number of cells in five 0.2 mm × 0.2 mm square grids of the counting chamber for each sample (Figure S2).
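The density calculation described above can be illustrated with a minimal sketch. The chamber depth (0.1 mm, typical of many haemocytometers) and the dilution factor are assumptions that are not stated in the methods, and the function name is hypothetical.

```python
from statistics import mean


def cells_per_ml(counts, square_side_mm=0.2, chamber_depth_mm=0.1,
                 dilution_factor=1.0):
    """Estimate cell density (cells/mL) from haemocytometer square counts.

    counts           -- cells counted in each square (five squares per sample)
    square_side_mm   -- side length of one counted square (0.2 mm in the text)
    chamber_depth_mm -- ASSUMED chamber depth; check the chamber actually used
    dilution_factor  -- ASSUMED fold-dilution applied before loading
    """
    if not counts:
        raise ValueError("at least one square count is required")
    # Volume above one square: side x side x depth, in mm^3 (1 mm^3 = 1 uL).
    volume_mm3 = square_side_mm ** 2 * chamber_depth_mm
    volume_ml = volume_mm3 * 1e-3  # 1 uL = 1e-3 mL
    return mean(counts) / volume_ml * dilution_factor


# Example with made-up counts from five squares of an undiluted sample.
density = cells_per_ml([4, 6, 5, 5, 5])
print(f"{density:.3g} cells/mL")
```

With these assumed dimensions, each counted square covers 0.004 µL, so the mean count per square is scaled by 250,000 to give cells per mL before any dilution correction.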
Emulsion, paired-isolation and concatenation PCR (epicPCR)
Our hybrid epicPCR procedures were adapted from the protocols of Sakowski et al (1) and Diebold et al (2). Briefly, epicPCR was performed in three technical replicates for the three biological replicates. Approximately 150,000 cells from the bacterial glycerol stocks were pelleted and vortexed in a 75 µL PCR reagent mix containing GC buffer (1x), dNTP mix (0.4 mM), Phusion Hot-Start II polymerase (0.05 U/µL) (Thermo Scientific, United States), bovine serum albumin solution (1 µg/µL) (Promega, United States), Lucigen Ready-Lyse lysozyme (500 U/µL) (LGC Biosearch Technologies, United States) and oligonucleotide primers R926 (2 µM), intI1_outer (1 µM), and R519-qacE bridging primer (0.04 µM) (Table S2). To demonstrate the absence of false associations between 16S rRNA markers and class 1 integrons originating from different cells, we spiked one set of technical replicates with cells of a class 1 integron-free Escherichia coli MG1655 strain at a total population frequency of approximately 10%.
ABIL emulsion oil was prepared by supplementing mineral oil with 4% ABIL EM 90 (Redox, Australia) and 0.05% Triton X-100 (Promega, United States). 425 µL ABIL oil was added to the PCR mix, which was immediately emulsified at 4 m s⁻¹ for 45 s using the FastPrep-24 bead beating system (MP Biomedicals, United States). The water-in-oil emulsion was aliquoted into 8 portions, incubated at 37°C for 10 min, and subjected to the following PCR amplification conditions: 98°C for 5 min; 38 cycles of 98°C for 10 s, 59°C for 20 s, and 72°C for 2 min; and finally 72°C for 5 min. The intI1_outer primer and the R519-qacE bridging primer bind to intI1 in the 5’-conserved segment (CS) and qacE in the 3’-CS of class 1 integrons respectively (Figure 1B). The 18 bp overhang introduced by the bridging primer to the qacE end allows the intermediate PCR products to act as a long primer and amplify the V4-V5 hypervariable region with the R926 reverse primer.
The emulsified PCR mix for each technical replicate was pooled, vortexed in 900 µL isobutanol and 200 µL 5 M NaCl and centrifuged for 1 min at ~20,000g with soft brake. The aqueous phase at the bottom of the tubes was extracted, purified using the Monarch PCR & DNA Cleanup Kit (New England Biolabs, United States), and eluted with nuclease-free water. Sera-Mag Select magnetic beads (Cytiva, United States) were added to the eluted PCR products to deplete <300 bp DNA fragments. The size-selected DNA was used as template in 100 µL Phusion PCR mix containing GC buffer (1x), dNTP mix (0.4 mM), Phusion Hot-Start II polymerase (0.04 U/µL), AP27_short primer (0.8 µM), intI1_nested primer (0.4 µM), and forward and reverse blocking primers (0.32 µM each). The PCR mix was subjected to the following thermocycling conditions in 8 aliquots: 98°C for 30 s; 35 cycles of 98°C for 10 s, 59°C for 20 s and 72°C for 1 min 50 s; followed by a final step of 72°C for 5 min. epicPCR products were visualised on a 1% agarose gel by electrophoresis and treated with Sera-Mag Select magnetic beads to deplete epicPCR products containing cassette-less integrons.
To confirm that fusion between class 1 integron and 16S DNA fragments had occurred (Supplementary Figure 3A), 2 ng of each purified epicPCR product was amplified using qacE_F (0.4 µM) and AP28_short (0.4 µM) primers in GoTaq polymerase master mix (Promega, United States) under the following thermocycling conditions: 95°C for 30 s; 30 cycles of 95°C for 30 s, 55°C for 30 s and 72°C for 30 s; with a final step of 72°C for 5 min. Successful amplification of the qacE-16S rRNA gene chimeric region should produce ~390 bp gel bands.
Oxford Nanopore long-read sequencing
Purified epicPCR amplicons were treated with the NEBNext Ultra II End Repair/dA-Tailing Module (New England Biolabs, United States), purified with JetSeq Clean magnetic beads (Meridian Bioscience, United States), and eluted using nuclease-free water. End-repaired DNA was barcoded using the Native Barcoding Expansion kit (ONT, United Kingdom) and Blunt/TA Ligase Master Mix (New England Biolabs, United States). Barcoded samples were multiplexed and ligated to sequencing adaptor molecules in Adapter Mix II using the NEBNext Quick T4 DNA ligase (New England Biolabs, United States). The ligation products were washed twice with Short Fragment Buffer and eluted with Elution Buffer. The sequencing library was loaded onto a FLO-MIN106D (R9.4.1) flow cell and sequenced on a MinION Mk1B sequencer (ONT, United Kingdom). The minimum threshold for read length was set to 1 kb in the MinKNOW operating software. Basecalling of FAST5 raw data was performed with the high accuracy option using Guppy v6.1.2 (ONT, United Kingdom). The native barcoded samples were demultiplexed using default parameters.
References
1. Sakowski EG, Arora-Williams K, Tian F, Zayed AA, Zablocki O, Sullivan MB, et al. Interaction dynamics and virus-host range for estuarine actinophages captured by epicPCR. Nat Microbiol. 2021;6(5):630-42.
2. Diebold PJ, New FN, Hovan M, Satlin MJ, Brito IL. Linking plasmid-based beta-lactamases to their bacterial hosts using single-cell fusion PCR. eLife. 2021;10.
Usage notes
The sequencing data generated in this study are available in this repository. The nine FASTQ files that are available for download are as follows:
S1_1_Replicate_1.fastq; S1_1_Replicate_2.fastq; S1_1_Replicate_3.fastq; S1_2_Replicate_1.fastq; S1_2_Replicate_2.fastq; S1_2_Replicate_3.fastq; S1_3_Replicate_1.fastq; S1_3_Replicate_2.fastq; S1_3_Replicate_3.fastq
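After the archive has been extracted, a quick completeness check for the nine files listed above might look like the following sketch. The helper names are hypothetical, and the reading of the file names as S1_<biological replicate>_Replicate_<technical replicate> is an assumption based on the three-by-three replicate design described in the Methods.

```python
from pathlib import Path


def expected_fastq_names(n_samples=3, n_replicates=3):
    """Build the nine file names listed in the usage notes.

    ASSUMPTION: the first index is the biological replicate and the
    second is the technical (epicPCR) replicate.
    """
    return [
        f"S1_{sample}_Replicate_{rep}.fastq"
        for sample in range(1, n_samples + 1)
        for rep in range(1, n_replicates + 1)
    ]


def missing_fastq_files(directory):
    """Return expected FASTQ names not found anywhere under `directory`."""
    # rglob is used because the archive may unpack into a subfolder.
    present = {path.name for path in Path(directory).rglob("*.fastq")}
    return [name for name in expected_fastq_names() if name not in present]


if __name__ == "__main__":
    # "." is a placeholder for wherever the archive was extracted.
    missing = missing_fastq_files(".")
    print("All nine files found" if not missing else f"Missing: {missing}")
```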
The downloaded .ZIP file can be decompressed with the unzip command in a UNIX/Linux environment or in the macOS Terminal.
All Nanopore sequence filtering and processing steps for epicPCR amplicon reads were carried out using an in-house pipeline (https://github.com/timghaly/Int1-epicPCR). First, the pipeline quality-filters Nanopore reads using NanoFilt v2.8.0 (1), removing reads with an average quality below 7 or a length below 670 bp, which represents the minimum length of an epicPCR product with a cassette-less integron. Quality-filtered reads are then oriented and trimmed with the final nested epicPCR forward and reverse primer sequences using Pychopper v2.7.0 (https://github.com/epi2me-labs/pychopper). Pychopper identifies both primers in each read using edlib v1.2.3 (2), and orients the reads based on the forward and reverse primer sequences. Reads that do not contain both primers in the correct orientation are discarded. The pipeline then clusters the primer-oriented reads into amplicon-specific clusters using isONclust v0.0.6.1 (3), and performs error correction on each cluster using isONcorrect v0.0.8 (4). Importantly, isONcorrect can jointly use all cassette arrangements of the same integron that occur in different clusters, allowing efficient error correction even for amplicons with low sequencing depths. After error correction, a consensus sequence is generated for each cluster using spoa v4.0.7 (5). All consensus sequences are then pooled, while removing any redundancies, including reverse complement redundancies, using the dedupe programme from the BBTools v35 software package (https://github.com/kbaseapps/BBTools). Next, the consensus sequences are screened with blastn from the BLAST v2.2.31 package, first for the sequence of the R519-qacE bridging primer and then for the 157 bp region that includes the 5'-end of intI1 and all of attI1. Retained sequences must contain both the bridging primer and the intI1/attI1 region. Any sequences containing more than one hit to either of these sequences are considered unintended chimeras and discarded. Finally, sequences are screened for the correctly fused 16S rRNA gene fragment using Metaxa2 v2.2.3 (6). The final output of the pipeline for each sample is a set of full-length, primer-oriented amplicon consensus sequences that contain a complete attI1 sequence, R519-qacE bridging primer, and the V4 hypervariable region of the 16S rRNA gene.
CD-HIT v4.8.1 (7) was used to cluster chimeric epicPCR products comprising class 1 integron gene cassette arrays and V4 hypervariable regions of the 16S rRNA gene with ≥99% pairwise identity in nucleotide sequences in at least 3 replicates of epicPCR. For epicPCR products that were found in fewer than 3 replicates, the NCBI Genome Workbench was used to perform blastn searches of these sequences against a local database created using all the Nanopore reads obtained in this study. Reads with ≥98% pairwise identity in nucleotide sequences and ≥98% coverage were aligned using the Geneious bioinformatic software (Biomatters, New Zealand), and the consensus sequences were generated from these alignments within each set of epicPCR replicates. Consensus sequences that could be found in at least 3 epicPCR replicates and showed ≥99% pairwise identity in nucleotide sequences and ≥99% identical sites were added to the list of epicPCR products that were previously shortlisted by the CD-HIT algorithm.
IntegronFinder 2.0 (8) [parameters: --local-max --gbk --promoter-attI --calin-threshold 1] was used to predict the ORFs and attC sites in the epicPCR products. The V4 hypervariable regions of the 16S rRNA gene sequences obtained in this study were searched against the SILVA 16S rRNA gene database website (9) using the Alignment, Classification and Tree (ACT) tool with a cut-off threshold of 0.80.
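To make the first filtering step of the pipeline concrete, the sketch below applies the same two thresholds (average read quality of at least 7, read length of at least 670 bp) to FASTQ records in plain Python. It is not the pipeline's code and does not call NanoFilt; the function names are hypothetical, and averaging quality via per-base error probabilities with a Phred+33 encoding is an assumption about how the average is defined.

```python
import math


def mean_read_quality(quality_string, phred_offset=33):
    """Average Phred quality, computed from per-base error probabilities.

    ASSUMPTION: Phred+33 encoding and averaging in probability space.
    """
    if not quality_string:
        raise ValueError("empty quality string")
    error_probs = [10 ** (-(ord(ch) - phred_offset) / 10)
                   for ch in quality_string]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def parse_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # skip stray blank lines
            continue
        sequence = handle.readline().rstrip()
        handle.readline()       # '+' separator line, ignored
        quality = handle.readline().rstrip()
        yield header.rstrip(), sequence, quality


def filter_reads(records, min_quality=7.0, min_length=670):
    """Keep reads that meet both the length and mean-quality thresholds."""
    for header, sequence, quality in records:
        if len(sequence) >= min_length and \
                mean_read_quality(quality) >= min_quality:
            yield header, sequence, quality
```

A typical use would be to open one of the FASTQ files, pass the handle to `parse_fastq`, and count the records returned by `filter_reads`. The counts need not match the published pipeline exactly, since the subsequent primer-orientation, clustering and chimera-removal steps are not reproduced here.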
References
1. De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018;34(15):2666-9.
2. Šošić M, Šikić M. Edlib: a C/C++ library for fast, exact sequence alignment using edit distance. Bioinformatics. 2017;33(9):1394-5.
3. Sahlin K, Medvedev P. De Novo Clustering of Long-Read Transcriptome Data Using a Greedy, Quality Value-Based Algorithm. J Comput Biol. 2020;27(4):472-84.
4. Sahlin K, Medvedev P. Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis. Nat Commun. 2021;12(1):2.
5. Vaser R, Sović I, Nagarajan N, Šikić M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 2017;27(5):737-46.
6. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DG, et al. METAXA2: improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Mol Ecol Resour. 2015;15(6):1403-14.
7. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658-9.
8. Neron B, Littner E, Haudiquet M, Perrin A, Cury J, Rocha EPC. IntegronFinder 2.0: Identification and Analysis of Integrons across Bacteria, with a Focus on Antibiotic Resistance in Klebsiella. Microorganisms. 2022;10(4).
9. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013;41(Database issue):D590-6.