Data from: Going mobile: Using portable genomic technologies for PCR-free in situ species identification and real-time molecular systematics
Data files
Sep 17, 2025 version files 8.29 GB
- bat_cytb_alignment.phy (52.77 KB)
- CUL01_allhac.fastq (3.07 GB)
- CUL02_allhac.fastq (1.46 GB)
- CUL06_allhac.fastq (186.95 MB)
- culicidae_COI_alignment.phy (48.82 KB)
- insect_mitoDec2022.fasta (52.35 MB)
- Kipp_etal_Computational_Notebook.html (672.92 KB)
- mammal_mito20June2022.fasta (6.76 MB)
- monodelphis_cytb_ualignment.phy (42.44 KB)
- PHL01_allhac.fastq (1.47 GB)
- rattus_cytb_alignment.phy (56.50 KB)
- README.md (5.04 KB)
- sandfly_COI_alignment.phy (48.20 KB)
- TK217506_allhac.fastq (115.82 MB)
- TK217507_allhac.fastq (271.03 MB)
- TK217513_allhac.fastq (48.36 MB)
- TK217523_allhac.fastq (267.27 MB)
- TK217562_allhac.fastq (85.27 MB)
- TK217563_allhac.fastq (109.25 MB)
- TK217600_allhac.fastq (466.70 MB)
- TK217608_allhac.fastq (457.53 MB)
- TK217611_allhac.fastq (225.69 MB)
Abstract
Across the globe, anthropogenic environmental changes are threatening animal biodiversity and contributing to the emergence of vector-borne and zoonotic pathogens through host range shifts. To combat these challenges, accurate and timely biodiversity assessments and molecular species monitoring efforts are critical. Here, we document how implementation of a portable laboratory in combination with targeted long-read nanopore sequencing can facilitate in situ genomic and systematic analyses across several animal taxa. Working at two ecologically divergent field sites in Guyana, South America, we collected small mammals and blood-feeding insects, including bats, rodents, a marsupial, mosquitoes, and a phlebotomine sand fly. For each specimen sampled, genomic DNA was extracted in the field and used for preparation of nanopore sequencing libraries. For field sequencing, we utilized a novel software-based targeted sequencing approach—nanopore adaptive sampling (NAS)—that enabled selective sequencing of mitochondrial reads using mitogenome assemblies of related taxa as enrichment targets. Basecalled reads from our field sequencing experiments were used to assemble complete mitogenomes and to generate mitochondrial biomarker consensus gene sequences for all nine small mammals and four blood-feeding insects sequenced. Confirmatory molecular identifications were made with a combination of local nucleotide BLAST queries and maximum likelihood analyses using biomarker consensus sequences. Importantly, the mitogenome-based targeted sequencing strategies outlined here are amplification-free and allowed us to bypass time-consuming and potentially troublesome PCR-based methods in the field, streamlining library preparation, sequencing experiments, and on-site analyses. 
Our findings describe targeted sequencing with NAS as an effective tool for implementation into portable laboratories to widely enhance field-based biodiversity monitoring and rapid molecular species assessments across vertebrate and invertebrate hosts of consequential emerging pathogens.
Dataset DOI: 10.5061/dryad.bnzs7h4p2
Description of the data and file structure
These sequence data were generated in the field between 5 and 21 June 2022 at two field sites in Guyana, South America, and include multiple species of field-collected small mammals and blood-feeding insects. Libraries were prepared with the SQK-LSK109 library prep kit and EXP-NBD114 barcodes, and were sequenced on flow cells with R9.4.1 chemistry. Sequencing was performed on a Linux laptop running Ubuntu 18.04 with a 16x 11th Gen Intel Core i7 processor and an Nvidia GeForce RTX 3080 Ti laptop GPU (16 GB). Adaptive sampling was enabled during sequencing experiments to target mitochondrial DNA for both insects and small mammals, using whole mitogenome assembly files obtained through NCBI RefSeq and deposited here in FASTA format.
Raw sequence data contained here were basecalled post hoc at each field site after each sequencing experiment finished, using the high-accuracy ('HAC') basecalling model in ONT's Guppy basecaller v6.0.7. All code and shell commands used to perform quality filtering, read mapping, and downstream analyses are contained in the included HTML notebook.
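As a quick sanity check before running the notebook, the basecalled FASTQ files can be summarized with a few lines of Python. The sketch below is illustrative only and is not taken from the authors' notebook. It assumes each record spans exactly four lines (header, sequence, `+`, qualities), and the function names `read_lengths` and `summarize` are hypothetical.

```python
import os
from itertools import islice
from statistics import median


def read_lengths(handle):
    """Yield the sequence length of each record in a four-line FASTQ stream."""
    lines = iter(handle)
    while True:
        record = list(islice(lines, 4))  # header, sequence, '+', qualities
        if not record:
            break
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield len(record[1].strip())


def summarize(handle):
    """Return read count, total bases, median read length, and read N50."""
    lengths = sorted(read_lengths(handle), reverse=True)
    if not lengths:
        return {"reads": 0, "bases": 0, "median": 0, "n50": 0}
    total = sum(lengths)
    running = 0
    n50 = 0
    # N50: length of the read at which the cumulative sum of lengths,
    # taken from longest to shortest, first reaches half of all bases.
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "reads": len(lengths),
        "bases": total,
        "median": median(lengths),
        "n50": n50,
    }


if __name__ == "__main__":
    path = "TK217513_allhac.fastq"  # any *_allhac.fastq file from this dataset
    if os.path.exists(path):
        with open(path) as fh:
            print(summarize(fh))
```

The function holds one integer per read in memory rather than the reads themselves, which should stay manageable even for the multi-gigabyte insect files.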
Files and variables
File: CUL01_allhac.fastq
Description: Concatenated and HAC-basecalled reads for insect sample 'CUL01'
File: CUL06_allhac.fastq
Description: Concatenated and HAC-basecalled reads for insect sample 'CUL06'
File: TK217506_allhac.fastq
Description: Concatenated and HAC-basecalled reads for rodent sample 'TK217506'
File: CUL02_allhac.fastq
Description: Concatenated and HAC-basecalled reads for insect sample 'CUL02'
File: TK217513_allhac.fastq
Description: Concatenated and HAC-basecalled reads for bat sample 'TK217513'
File: TK217507_allhac.fastq
Description: Concatenated and HAC-basecalled reads for rodent sample 'TK217507'
File: TK217562_allhac.fastq
Description: Concatenated and HAC-basecalled reads for bat sample 'TK217562'
File: TK217563_allhac.fastq
Description: Concatenated and HAC-basecalled reads for bat sample 'TK217563'
File: TK217523_allhac.fastq
Description: Concatenated and HAC-basecalled reads for bat sample 'TK217523'
File: TK217611_allhac.fastq
Description: Concatenated and HAC-basecalled reads for bat sample 'TK217611'
File: TK217600_allhac.fastq
Description: Concatenated and HAC-basecalled reads for bat sample 'TK217600'
File: TK217608_allhac.fastq
Description: Concatenated and HAC-basecalled reads for marsupial sample 'TK217608'
File: PHL01_allhac.fastq
Description: Concatenated and HAC-basecalled reads for insect sample 'PHL01'
File: bat_cytb_alignment.phy
Description: Multiple sequence alignment used for maximum likelihood analysis and containing field-generated cytb consensus sequences for all Chiroptera samples evaluated
File: culicidae_COI_alignment.phy
Description: Multiple sequence alignment used for maximum likelihood analysis and containing field-generated COI consensus sequences for all Culicidae samples evaluated
File: monodelphis_cytb_ualignment.phy
Description: Multiple sequence alignment used for maximum likelihood analysis and containing field-generated cytb consensus sequences for the Monodelphis sample evaluated
File: rattus_cytb_alignment.phy
Description: Multiple sequence alignment used for maximum likelihood analysis and containing field-generated cytb consensus sequences for the two Rattus samples evaluated
File: sandfly_COI_alignment.phy
Description: Multiple sequence alignment used for maximum likelihood analysis and containing field-generated COI consensus sequences for the phlebotomine sand fly sample evaluated
File: Kipp_etal_Computational_Notebook.html
Description: Computational notebook containing all code and shell commands used to filter raw FASTQs, assemble mitochondrial genomes, and perform systematic analyses
File: mammal_mito20June2022.fasta
Description: Adaptive sampling enrichment file containing assorted mammal reference mitochondrial genomes from NCBI RefSeq
File: insect_mitoDec2022.fasta
Description: Adaptive sampling enrichment file containing assorted insect reference mitochondrial genomes from NCBI RefSeq
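The reference and alignment files listed above can be inspected without specialized tools. The following sketch is illustrative and not part of the authors' notebook. It lists the records in an enrichment FASTA file and reads the dimensions line of a PHYLIP alignment. It assumes the `.phy` files begin with the standard PHYLIP header (number of sequences, then number of sites) and does not parse the alignment body, since the PHYLIP variant used is not stated here. The function names `fasta_records` and `phylip_dimensions` are hypothetical.

```python
import os


def fasta_records(handle):
    """Yield (header, sequence_length) for each record in a FASTA stream."""
    header, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, length
            header, length = line[1:], 0
        elif header is not None:
            length += len(line)
    if header is not None:
        yield header, length


def phylip_dimensions(handle):
    """Return (n_sequences, n_sites) from the first non-blank line."""
    for line in handle:
        fields = line.split()
        if fields:
            if len(fields) < 2:
                raise ValueError("expected '<n_sequences> <n_sites>' header")
            return int(fields[0]), int(fields[1])
    raise ValueError("empty PHYLIP file")


if __name__ == "__main__":
    fasta_path = "mammal_mito20June2022.fasta"
    if os.path.exists(fasta_path):
        with open(fasta_path) as fh:
            for header, length in fasta_records(fh):
                print(f"{length:>7} bp  {header}")

    phylip_path = "bat_cytb_alignment.phy"
    if os.path.exists(phylip_path):
        with open(phylip_path) as fh:
            n_seqs, n_sites = phylip_dimensions(fh)
            print(f"{phylip_path}: {n_seqs} sequences x {n_sites} sites")
```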
Code/software
Code and shell commands used are contained in the accompanying HTML notebook.
Access information
Other publicly accessible locations of the data:
- Raw data will be deposited on NCBI SRA, and consensus cytb and COI barcode sequences will be deposited on NCBI GenBank upon publication
Data was derived from the following sources:
- Mitogenome enrichment FASTA reference files used to perform targeted sequencing experiments were derived from data deposited on NCBI RefSeq
