Data from: Whole genome sequencing reveals clade-specific genetic variation in blacklegged ticks
Data files
Feb 14, 2025 version files 109.25 GB
-
ipac_CA_mitogenome.fasta
14.77 KB
-
iscap_MN_mitogenome.fasta
14.74 KB
-
iscap_PA_mitogenome.fasta
14.74 KB
-
iscap_TX_mitogenome.fasta
14.74 KB
-
ixodes_pacificus_CA_rawreads.gz
34.81 GB
-
ixodes_pacificus_CA_Rickettsia_monacensis_assembly.fasta
1.94 MB
-
ixodes_scapularis_MN_filtered_variantcallfile.vcf.gz
386.73 MB
-
ixodes_scapularis_MN_rawreads.gz
28.88 GB
-
ixodes_scapularis_MN_Rickettsia_buchneri_assembly.fasta
2.05 MB
-
ixodes_scapularis_PA_filtered_variantcallfile.vcf.gz
384.30 MB
-
ixodes_scapularis_PA_rawreads.gz
19.83 GB
-
ixodes_scapularis_PA_Rickettsia_buchneri_assembly.fasta
2.07 MB
-
ixodes_scapularis_TX_filtered_variantcallfile.vcf.gz
377.11 MB
-
ixodes_scapularis_TX_rawreads.gz
24.57 GB
-
ixodes_scapularis_TX_Rickettsia_buchneri_assembly.fasta
2.22 MB
-
README.md
5.14 KB
Abstract
Ticks and tick-borne pathogens represent the greatest vector-borne disease threat in the US. Blacklegged ticks are responsible for most human cases, yet the disease burden is unevenly distributed across the northern and southern US. Understanding the genetic characteristics influencing phenotypic differences in tick vectors is critical to elucidating disparities in tick-borne pathogen transmission dynamics. Applying evolutionary analyses to molecular variation in natural tick populations across ecological gradients will help identify signatures of local adaptation, which will improve control and mitigation strategies. In this study, we performed whole genome nanopore sequencing of individual (n=1) blacklegged ticks across their geographical range (Minnesota, Pennsylvania, and Texas) to evaluate genetic divergence among populations. Our integrated analyses identified genetic variants associated with numerous biological processes and molecular functions that segregated across populations. Notably, northern populations displayed genetic variants in genes linked to xenobiotic detoxification, transmembrane transport, and sulfation that may underpin key phenotypes influencing tick dispersal, host associations, and vectorial capacity. Nanopore sequencing further allowed the recovery of complete mitochondrial and commensal endosymbiont genomes. Our study provides further evidence of genetic divergence in epidemiologically relevant gene families among blacklegged tick clades. This report emphasizes the need to elucidate the genetic basis driving divergence among conspecific blacklegged tick clades in the US.
https://doi.org/10.5061/dryad.sbcc2frh8
Description of the data and file structure
Nanopore sequencing was performed on individual eastern blacklegged ticks (Ixodes scapularis) from three locations (MN, PA, TX) and an individual western blacklegged tick (Ixodes pacificus) from one location (CA) throughout the US to uncover genomic and mitogenomic variation among blacklegged tick populations. Each tick was sequenced on an Oxford Nanopore Technologies P2 solo instrument using individual flow cells. Raw read sequence data was basecalled using Dorado’s super high accuracy model. The basecalled raw read sequence data from each tick is provided in the dataset with the label ‘rawreads.gz’. Raw read sequencing files were used to generate summary statistics, perform genome-wide SNP analysis, assemble mitochondrial and commensal endosymbiont genomes, and evaluate phylogenetic relationships.
Files and variables
File: ixodes_pacificus_CA_rawreads.fastq.gz
Description: Nanopore sequencing data for the adult female Ixodes pacificus individual from California, USA. Raw read sequencing data was base-called using Dorado’s super high accuracy model.
File: ixodes_pacificus_CA_Rickettsia_monacensis_assembly.fasta
Description: Whole Rickettsia monacensis genome assembly extracted from the Ixodes pacificus (CA, USA) individual’s raw read sequence data. Rickettsia spp. are commensal endosymbionts commonly found infecting blacklegged ticks.
File: ixodes_scapularis_MN_Rickettsia_buchneri_assembly.fasta
Description: Whole Rickettsia buchneri genome assembly extracted from the Ixodes scapularis (MN, USA) individual’s raw read sequence data. Rickettsia spp. are commensal endosymbionts commonly found infecting blacklegged ticks.
File: ixodes_scapularis_PA_Rickettsia_buchneri_assembly.fasta
Description: Whole Rickettsia buchneri genome assembly extracted from the Ixodes scapularis (PA, USA) individual’s raw read sequence data. Rickettsia spp. are commensal endosymbionts commonly found infecting blacklegged ticks.
File: ixodes_scapularis_TX_Rickettsia_buchneri_assembly.fasta
Description: Whole Rickettsia buchneri genome assembly extracted from the Ixodes scapularis (TX, USA) individual’s raw read sequence data. Rickettsia spp. are commensal endosymbionts commonly found infecting blacklegged ticks.
File: ixodes_scapularis_MN_rawreads.fastq.gz
Description: Nanopore sequencing data for the adult female Ixodes scapularis individual from Minnesota, USA. Raw read sequencing data was base-called using Dorado’s super high accuracy model.
File: ixodes_scapularis_PA_rawreads.fastq.gz
Description: Nanopore sequencing data for the adult female Ixodes scapularis individual from Pennsylvania, USA. Raw read sequencing data was base-called using Dorado’s super high accuracy model.
File: ixodes_scapularis_TX_rawreads.fastq.gz
Description: Nanopore sequencing data for the adult female Ixodes scapularis individual from Texas, USA. Raw read sequencing data was base-called using Dorado’s super high accuracy model.
File: iscap_MN_mitogenome.fasta
Description: Whole mitochondrial genome assembly extracted from the Ixodes scapularis (MN, USA) individual’s raw read sequence data. This fasta file corresponds to GenBank accession number PQ557589.
File: iscap_TX_mitogenome.fasta
Description: Whole mitochondrial genome assembly extracted from the Ixodes scapularis (TX, USA) individual’s raw read sequence data. This fasta file corresponds to GenBank accession number PQ557591.
File: ipac_CA_mitogenome.fasta
Description: Whole mitochondrial genome assembly extracted from the Ixodes pacificus (CA, USA) individual’s raw read sequence data. This fasta file corresponds to GenBank accession number PQ557592.
File: iscap_PA_mitogenome.fasta
Description: Whole mitochondrial genome assembly extracted from the Ixodes scapularis (PA, USA) individual’s raw read sequence data. This fasta file corresponds to GenBank accession number PQ557590.
File: ixodes_scapularis_MN_filtered_variantcallfile.vcf.gz
Description: Variant call file for the Ixodes scapularis (MN, USA) individual generated from variant calling using raw read sequence data. Variants were called against the PalLabHifi reference genome (GCF_016920785.2).
File: ixodes_scapularis_PA_filtered_variantcallfile.vcf.gz
Description: Variant call file for the Ixodes scapularis (PA USA) individual generated from variant calling using raw read sequence data. Variants were called against the PalLabHifi reference genome (GCF_016920785.2).
File: ixodes_scapularis_TX_filtered_variantcallfile.vcf.gz
Description: Variant call file for the Ixodes scapularis (TX, USA) individual generated from variant calling using raw read sequence data. Variants were called against the PalLabHifi reference genome (GCF_016920785.2).
Sample Collection
Questing ticks were collected by dragging a 1-m2 cloth along natural vegetation, stopping every ~10 m to collect attached ticks, from four locations throughout the United States: Minnesota (Anoka County, MN, USA), Pennsylvania (Allegheny County, PA, USA), Texas (Polk County, TX, USA), and California (Riverside County, CA, USA). Collected ticks were preserved in 70% EtOH or RNAlater and sent to the University of Minnesota for processing. Ticks were morphologically identified using keys from Keirans and Clifford (1978), and Cooley and Kohls (1945). Ticks were rinsed with molecular grade water and transferred to Zymo DNA/RNA shield until DNA extraction.
DNA Extraction and Sequencing
DNA was extracted using a Qiagen MagAttract kit (Cat. 67563; Aarhus, Denmark) according to the manufacturer's instructions, with a final elution step extended to 1 h at 37°C. DNA was sheared by 20 passes through a standard 30-gauge insulin syringe. Library prep was performed with an SQK-LSK-114 native ligation sequencing kit (ONT, Oxford, UK). Sequencing was performed on a P2 Solo instrument and basecalled using Nvidia 4090 GPUs. Sequencing was performed over 3 days with nuclease flush and library reload every 24 h. Individual adult females were used for sequencing of I. scapularis and I. pacificus (n = 1) from each location.
Raw data from all runs were basecalled together using Dorado v0.8.1 with the “super accuracy” model dna_r10.4.1_e8.2_400bps_supv4.2.0. Read quality was assessed using Nanoq v0.10.0 (Steinig and Coin 2022).
Variant Analysis
1. Variant Calling
Super-high accuracy basecalled reads from the three female adult I. scapularis (MN, PA, TX) specimens were individually mapped to the I. scapularis reference genome, PalLabHiFi (GCF_016920785.2) (De et al. 2023), using Minimap2 v2.28 (Li 2018). The resulting mapped bam file was sorted and indexed using Samtools v1.20 (Li et al. 2009) and served as input to call variants with Clair3 v1.0.10 (Zheng et al. 2022). Variants with quality scores of < 10 and allele frequencies of < 20% were masked using BCFtools v1.20 (Danecek et al. 2021). Filtering variants with a minor allele frequency of 20% and a quality score of 10 reflects a balance between excluding likely sequence errors and retaining biologically meaningful variation. Filtered variant call files generated alternative whole genome assemblies for each tick by replacing variant sites in the reference assembly using BCFtools. Variants were not called for the I. pacificus individual, as the reference assembly was scaffolded using the I. scapularis reference genome, limiting our confidence in the variants called for this sister species.2. Variant Statistics
Summary statistics were generated for each variant call file using VariantQC v1.05 (Yan et al. 2019) and Nanoq (Steinig and Coin 2022). Genome-wide heterozygosity and SNP density were calculated in 100 kb windows as a proxy for genome-wide diversity using VCFtools v0.1.16 (Danecek et al. 2011). Runs of homozygosity (ROH) were analyzed with Plink v1.90 (Purcell et al. 2007) specifying a sliding window of 100 kb, a threshold of 0.05 for overlapping homozygous windows, a minimum of 100 homozygous SNPs per window, a minimum SNP density of 50 kb per SNP, a minimum of 25 homozygous SNPs per ROH, a maximum of one heterozygous position per window, a maximum of 1000 kb gap between SNPs, and a maximum of 100 heterozygous sites in each final ROH (Dehasque et al. 2024). Genome-wide heterozygosity, SNP density, and runs of homozygosity were visualized using circlize v0.4.15 (Gu et al. 2014) in R v4.2 (R Core Team 2022) (Figure 1).3. Variant Annotation
Filtered variant files were annotated using snpEff v5.2 (Cingolani, Platts, et al. 2012). SnpEff requires an annotation database to perform variant effect prediction. We built a custom database by specifying the reference genome (GCF_016920785.2), including the FASTA and GTF annotation file from NCBI. SnpEff annotates variants and predicts the coding effects of genetic variants by creating a data structure that algorithmically identifies intersecting genomic regions to calculate variant effects (Cingolani, Platts, et al. 2012). Annotated variant call files were sifted using snpSift (Cingolani, Patel, et al. 2012) to partition variant annotations, including the impact and effect of each variant, by the 14 longest scaffolds in the PalLabHiFi assembly. Annotation summary statistics for the whole genome and scaffolds were generated by snpSift.4. Gene Ontology
Gene ontology (GO) analysis was performed using the Database for Annotation, Visualization, and Integrated Discovery v2023q4 (DAVID) (Sherman et al. 2022; Huang et al. 2009). GO discovery was performed on two curated datasets. The first dataset was established to identify regions putatively identical by descent among the three female adult I. scapularis individuals (MN, PA, TX). To do so, runs of homozygosity shared across all three individuals were identified using dplyr v1.1.4 (Wickham et al. 2023) in R. Genes falling within these shared runs of homozygosity were intersected with gene annotation files using BedTools v2.31.1 (Quinlan and Hall 2010) to produce a dataset that contained locations of genes, with their respective gene IDs, shared across the individuals. The second dataset included all genes annotated as having a high impact through snpEff. The gene IDs from the two datasets were then independently used as input into DAVID and extracted as tables. GO terms shared among all individuals, some individuals, or unique to an individual were investigated. GO terms were visualized using ggVennDiagram v1.5.2 (Gao et al. 2021) and ggplot2 v3.5.1 (Wickham 2016).Mitogenome Assembly, Annotation, and Phylogenetics
The mitogenomes of the three I. scapularis (MN, PA, TX) and I. pacificus (CA) individuals were extracted from their respective assemblies and characterized with MitoHiFi v3.2.2 (Uliano-Silva et al. 2023). MitoHiFi identifies mitochondrial contigs from whole genome sequencing datasets, assembles the genome, and annotates it for protein and RNA genes through external tools (e.g., minimap2, Hifiasm, MitoFinder, etc.) (Uliano-Silva et al. 2023). MitoHiFi was explicitly designed to handle long-read sequencing data. To ensure proper assembly and annotation, the consensus FASTA generated from MitoHiFi was polished using Medaka v2.0.1 (https://github.com/nanoporetech/medaka) and annotated using Mitos2 v2.1.9 (Bernt et al. 2013) with specifications for metazoan RefSeq and invertebrate genetic code. Manual annotation correction was performed to correct frameshift mutations resulting in premature stop codons within highly conserved protein-coding genes to ensure consistency with conserved mitochondrial gene structures. All frameshift mutations were found in homopolymer regions (i.e., stretches of consecutive adenine and thymine), a known limitation of nanopore sequencing. Manual correction was also performed to confirm tRNA annotations or added if any annotations were absent. Any tRNA annotation additions were performed using Geneious Prime software v2024.0.7 (https://www.geneious.com) by aligning the gene in question from reference mitogenomes with our polished mitogenome assembly using minimap2. Polished and assembled mitogenomes were visualized using OGDRAW v1.3.1 (Greiner et al. 2019).
After the mitogenomes were annotated, each mitogenome, cytochrome c oxidase subunit I (COI), and 16S gene sequences were used in phylogenetic analyses to determine their relatedness to congeneric Ixodes spp. To build the phylogenies, COI, 16S, and mitogenome sequences were downloaded from GenBank for each available North American Ixodes spp. The genus Ixodes (Acari: Ixodidae) is monophyletic and shares a most recent common ancestor with Metastriata, which resides within the same family, Ixodidae. As such, the metastriate species Amblyomma americanum was chosen as the outgroup and included in each analysis to root the phylogeny. Sequences were aligned using CLUSTALO v.1.2.3 (Thompson et al. 1994), and alignments were used in maximum likelihood analyses in RAxML v8.0.0 (Stamatakis 2014) with the GTRGAMMAI model of evolution and 1000 bootstrap replicates. The final tree was visualized in FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).