Evaluating the diagnostic capabilities of nanopore sequencing for Borrelia burgdorferi detection in blacklegged ticks
Data files
Jan 13, 2026 version files: 28.28 GB
- all_fastqs.tar.gz (28.28 GB)
- README.md (1.73 KB)
Abstract
Nanopore adaptive sampling was performed across seven sequencing experiments on 168 wild-caught adult blacklegged ticks to enrich for all known bacterial pathogens vectored by blacklegged ticks in the US. Raw POD5 files from the sequencing experiments were rebasecalled using the super high-accuracy basecalling model in MinKNOW. The dataset provided here contains the concatenated zipped FASTQ files of the rebasecalled sequence data, with each file representing an individual tick. Unique identifiers correspond to the locations where ticks were collected in Minnesota.
Dataset DOI: 10.5061/dryad.2ngf1vj25
Description of the data and file structure
Blacklegged ticks were collected from three sites across Minnesota: Washington County, Anoka County, and Winona County. A total of 275 adult blacklegged ticks were collected and analyzed by nPCR to detect the presence of Borrelia burgdorferi. Ticks were morphologically validated before molecular analysis. Positive and negative (control) ticks were randomly selected for sequencing to determine the diagnostic capabilities of nanopore adaptive sampling as a rapid molecular surveillance tool. A total of 168 tick genomic DNA extracts were subjected to nanopore sequencing. Seven sequencing experiments (n=24 each) were conducted with nanopore adaptive sampling targeting the whole genome of B. burgdorferi. Each sequencing experiment was performed on an individual flow cell using a MinION instrument. This dataset includes the super high-accuracy basecalled nanopore sequencing reads as zipped FASTQ files (n=168), one per tick, within a zipped tarball file.
Files and variables
File: all_fastqs.tar.gz
Description: Zipped tarball file containing one zipped FASTQ file for each sequenced blacklegged tick (n=168). Each zipped FASTQ file contains the nanopore adaptive sampling reads concatenated for that individual. Data were rebasecalled using the super high-accuracy basecalling model in MinKNOW.
Code/software
To extract the tarball file and obtain the zipped FASTQ files for each adult blacklegged tick that was sequenced:
tar -xzvf all_fastqs.tar.gz
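Once the tarball is extracted, each tick's reads can be inspected directly. The following is a minimal sketch, assuming standard four-line FASTQ records compressed with gzip; the function names and the `.fastq.gz` file naming are illustrative assumptions, not part of the dataset documentation.

```python
import gzip
from pathlib import Path


def iter_fastq(path):
    """Yield (read_id, sequence, quality) tuples from a gzipped FASTQ file.

    Assumes four lines per record (header, sequence, '+', quality).
    """
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            quality = handle.readline().rstrip("\n")
            # Drop the leading '@' and any description after the first space.
            yield header[1:].split()[0], sequence, quality


def summarize_fastq(path):
    """Return the read count and total bases for one tick's FASTQ file."""
    n_reads = 0
    n_bases = 0
    for _read_id, sequence, _quality in iter_fastq(path):
        n_reads += 1
        n_bases += len(sequence)
    return {"file": Path(path).name, "reads": n_reads, "bases": n_bases}


# Hypothetical usage; the directory and file pattern are assumptions:
# for fastq in sorted(Path("all_fastqs").glob("*.fastq.gz")):
#     print(summarize_fastq(fastq))
```

Reading records lazily keeps memory use low, which matters given the 28.28 GB archive size.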
Sample selection
Following PCR analysis, we randomly selected total gDNA extracts from 70% B. burgdorferi-positive and 30% B. burgdorferi-negative ticks for nanopore sequencing experiments. Briefly, positive and negative samples were separated into two datasets, and a random subsample was drawn from each. Assuming an infection prevalence of 70% and a sensitivity of 0.80 under the null hypothesis versus 0.90 under the alternative hypothesis, a total of 153 samples is required (power: 0.819; p-value: 0.040). Because each library preparation kit can multiplex 24 samples, and to ensure a sufficient initial library DNA input (> 1 μg), we increased the sample size to 168 total gDNA extracts for sequencing. Each library contained the same proportion (i.e., 70:30) of nPCR-positive and nPCR-negative samples, respectively. To ensure blinding, ticks from the nPCR-negative and nPCR-positive classes were assigned to a specific run using a random number generator, which determined both their inclusion and their relative barcode order. Within each class, samples were prioritized according to their DNA concentration and quality to maximize the input library concentration for downstream sequencing. All subsequent analyses using sequence data were performed blinded to infection status. ONT native barcoding recommends an initial DNA input of 400 ng per barcode, and gDNA extracts that did not meet this threshold were concentrated by vacuum centrifugation using a Vacufuge Plus (Eppendorf). Concentrated gDNA extracts were quantified using a NanoDrop One spectrophotometer before library preparation. DNA inputs were not normalized before library preparation.
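The sample-size reasoning above can be explored with a short calculation. This is a sketch only: it assumes a one-sided exact binomial test at a nominal alpha of 0.05 applied to the nPCR-positive ticks, which the text does not state, so it may not reproduce the reported power and p-value exactly. All function and variable names are illustrative.

```python
from math import comb


def binom_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exact_power(n_positive, p0, p1, alpha=0.05):
    """One-sided exact binomial test of H0: sensitivity = p0 vs H1: sensitivity = p1 > p0.

    Returns the critical count, the attained type I error, and the power.
    """
    # Smallest number of detected positives whose upper-tail
    # probability under H0 does not exceed the nominal alpha.
    for k in range(n_positive + 1):
        if binom_upper_tail(n_positive, k, p0) <= alpha:
            break
    actual_alpha = binom_upper_tail(n_positive, k, p0)
    power = binom_upper_tail(n_positive, k, p1)
    return k, actual_alpha, power


total_samples = 153
prevalence = 0.70
# Only nPCR-positive ticks inform sensitivity (assumed interpretation).
n_positive = round(total_samples * prevalence)
critical_k, actual_alpha, power = exact_power(n_positive, p0=0.80, p1=0.90)
print(n_positive, critical_k, round(actual_alpha, 3), round(power, 3))
```

Because the binomial distribution is discrete, the attained type I error is generally below the nominal alpha, which is one plausible reason a value such as 0.040 would be reported alongside the power.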
Library preparation
Seven individual libraries, each consisting of 24 ticks, were constructed using the ONT Native Barcoding Kit (SQK-NBD114) following the manufacturer’s instructions for use with a MinION flow cell (R10.4.1). All libraries were prepared identically. DNA ends were first repaired and prepared for barcode and adapter ligation using the NEBNext FFPE DNA Repair Mix and NEBNext Ultra II End Repair/dA-tailing Module (New England Biolabs Inc., Ipswich, MA). Each sample was incubated at 20 °C for 5 minutes and then at 65 °C for 5 minutes before purification and concentration using AMPure XP magnetic beads (Beckman Coulter, A63881). A 1X (1:1) ratio of magnetic beads to sample was used at each bead cleanup step. The samples were then washed with 80% EtOH and eluted in 10 μL of nuclease-free water. ONT molecular barcodes (Oxford Nanopore Technologies), which enable pooling, were ligated to the eluted DNA using Blunt/TA Ligase Master Mix (New England Biolabs Inc.) during a 20-minute incubation at room temperature. Following incubation, 4 μL of EDTA (Oxford Nanopore Technologies), which acts as a chelating agent to inhibit exonucleases from degrading the DNA, was added to each sample, and all samples were pooled into a single tube. The pooled, barcoded library was then purified and concentrated using AMPure XP magnetic beads, incubated at room temperature, washed with 80% EtOH, and eluted in 35 μL of nuclease-free water. Native adapters (ONT), which anchor DNA fragments to individual nanopores on the flow cell for sequencing, were then ligated to the pooled barcoded library using the NEBNext Quick Ligation Reaction Buffer and Quick T4 DNA Ligase (New England Biolabs Inc.) during a 20-minute incubation at room temperature. The library was then purified using AMPure XP magnetic beads, incubated at room temperature, and washed using the ONT Short Fragment Buffer. The final pooled library was eluted in 15 μL of Elution Buffer (ONT).
At each elution step, the libraries were quantified using a Qubit 4 Fluorometer with the dsDNA HS Assay Kit (Thermo Fisher Scientific, Q32854).
Sequencing and basecalling
Each barcoded library (n=7) was sequenced on an individual MinION Flow Cell (R10.4.1) using a MinION Mk1B instrument (ONT). Sequencing experiments were performed on a desktop computer running Pop!_OS with the following specifications: AMD Ryzen 9 7900X 12-core processor (Advanced Micro Devices, Inc., Santa Clara, United States), NVIDIA GeForce RTX 4090 (Nvidia, Santa Clara, United States), 64 GB RAM, 4 TB SSD, and an 8 TB internal hard drive. Sequencing experiments were configured and monitored using MinKNOW (ONT). Each flow cell was checked to ensure that an adequate number of pores (i.e., >800) was available for sequencing before loading the final library. Flow cells were primed with a mixture of ONT Flow Cell Flush and Flow Cell Tether (ONT) that was gently loaded into the priming port, avoiding any introduction of air bubbles. Each library was prepared with ONT Sequencing Buffer and Library Beads (ONT) and then loaded onto the flow cell's SpotOn sample port. After loading, sequencing parameters were specified in MinKNOW, including nanopore adaptive sampling (NAS), which enriches for specific sequences as they pass through individual nanopores. NAS allows users to selectively enrich for or deplete target sequences, including whole genomes. As a DNA strand passes through the nanopore, it disrupts the electrical signal, producing a distinct change that represents the passing nucleotide sequence, which is basecalled in real time using Dorado v0.8.1 (ONT). NAS takes the first few hundred bases of the passing strand and maps them to the user-specified reference file. When enrichment is selected, if the passing nucleotide sequence is at least 70% similar to any sequence in the reference file, sequencing of that DNA fragment continues to completion. However, if the sequenced fragment is <70% similar, it is ejected from the nanopore, and the pore is freed to sequence a new strand.
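The accept-or-eject logic described above can be illustrated with a toy example. This is not how MinKNOW implements adaptive sampling: the real system maps basecalled read prefixes to an indexed reference, whereas this sketch substitutes a naive ungapped, forward-strand identity scan purely to show the 70% threshold decision. All names and sequences are hypothetical.

```python
def best_ungapped_identity(prefix, reference):
    """Highest fraction of matching bases over all ungapped placements
    of the read prefix along the reference (forward strand only)."""
    if not prefix:
        return 0.0
    best = 0.0
    for start in range(len(reference) - len(prefix) + 1):
        window = reference[start:start + len(prefix)]
        matches = sum(a == b for a, b in zip(prefix, window))
        best = max(best, matches / len(prefix))
    return best


def adaptive_sampling_decision(prefix, references, threshold=0.70):
    """Return 'continue' if the prefix resembles any reference, else 'eject'."""
    for sequence in references.values():
        if best_ungapped_identity(prefix, sequence) >= threshold:
            return "continue"
    return "eject"


# Toy reference standing in for the pathogen enrichment file.
toy_references = {"toy_target": "ACGT" * 10}
print(adaptive_sampling_decision("ACGTACGTAC", toy_references))  # on-target prefix
print(adaptive_sampling_decision("TTTTTTTTTT", toy_references))  # off-target prefix
```

In enrichment mode, a "continue" decision lets the strand be sequenced to completion, while "eject" frees the pore for a new strand.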
Here, each sequencing experiment was performed with an enrichment file containing known DNA-based tick-borne pathogens (TBPs) vectored by the blacklegged tick: Borrelia burgdorferi sensu stricto (GCF_000008685), Borrelia miyamotoi (GCF_019668505), Ehrlichia muris eauclairensis (GCF_000508225), Babesia microti (GCF_000691945), and Anaplasma phagocytophilum (GCF_000439775). Each reference genome FASTA file was downloaded from NCBI, concatenated using the command line into a single reference file, and indexed using minimap2 [21]. The indexed pathogen reference file was selected for enrichment and alignment in MinKNOW before starting each sequencing experiment. Additional sequencing parameters were specified in MinKNOW before sequencing, including read quality (Q>8), minimum read length (>250 bp), the location to write output files, and the library preparation kit used to construct the libraries. Each flow cell was sequenced to exhaustion (e.g., 72 hours) without washing and reloading individual libraries.
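Building the combined enrichment reference amounts to joining the five genome FASTA files in order. The sketch below shows one way to do this; the function name and file names are assumptions, and the original work used the command line rather than Python.

```python
from pathlib import Path


def concatenate_fasta(fasta_paths, output_path):
    """Write all input FASTA files, in order, into one combined FASTA file.

    Returns the number of sequence records (header lines) written.
    """
    n_records = 0
    with open(output_path, "w") as out:
        for path in fasta_paths:
            text = Path(path).read_text()
            n_records += sum(1 for line in text.splitlines() if line.startswith(">"))
            out.write(text)
            if text and not text.endswith("\n"):
                out.write("\n")  # keep the next file's header on its own line
    return n_records


# Hypothetical usage; these file names are placeholders:
# concatenate_fasta(
#     ["b_burgdorferi.fna", "b_miyamotoi.fna", "e_muris.fna",
#      "b_microti.fna", "a_phagocytophilum.fna"],
#     "tbp_reference.fasta",
# )
# The combined file could then be indexed with minimap2, for example:
#   minimap2 -d tbp_reference.mmi tbp_reference.fasta
```

Guarding against a missing trailing newline prevents the last sequence line of one genome from fusing with the header of the next.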
Basecalling and pre-processing
Raw data from each sequencing experiment in POD5 file format were rebasecalled, adapter trimmed, and demultiplexed using Dorado with the “super accuracy” model dna_r10.4.1_e8.2_400bps_sup@v4.2.0. Passed FASTQ files, which included reads with a quality score greater than 8 and a minimum read length of 250 bp, were extracted and used for downstream analysis. Each barcode (n=24 in each of 7 experiments) yielded multiple zipped FASTQ files, which were concatenated using the command line into a single zipped FASTQ file comprising all reads for that barcode.
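The per-barcode concatenation and the pass thresholds can be sketched as follows. The concatenation relies on the fact that gzip files joined byte-for-byte remain a valid gzip stream. The mean quality calculation averages per-base error probabilities, which is an assumed convention here rather than something stated in the text, and all function names are illustrative.

```python
import gzip
import math
import shutil


def concatenate_gzip_chunks(chunk_paths, output_path):
    """Append gzipped FASTQ chunks byte-for-byte into one gzipped file.

    Concatenated gzip members decompress as a single continuous stream.
    """
    with open(output_path, "wb") as out:
        for chunk in sorted(chunk_paths):
            with open(chunk, "rb") as handle:
                shutil.copyfileobj(handle, out)


def mean_qscore(quality_string, offset=33):
    """Mean read quality from averaged per-base error probabilities
    (assumed convention; Phred+33 encoding)."""
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def passes_filter(sequence, quality_string, min_q=8.0, min_length=250):
    """Apply the quality (> 8) and minimum length (250 bp) thresholds to one read."""
    return len(sequence) >= min_length and mean_qscore(quality_string) > min_q


# Hypothetical usage; the paths are placeholders:
# from pathlib import Path
# chunks = [str(p) for p in Path("run1/barcode01").glob("*.fastq.gz")]
# concatenate_gzip_chunks(chunks, "run1_barcode01.fastq.gz")
```

Averaging error probabilities rather than raw Phred scores gives low-quality bases more weight, so a read with a few very poor bases scores lower than a simple arithmetic mean would suggest.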
