Data from: Effect of formic acid treatment on Apis mellifera foraging behavior as assayed using nanopore metabarcoding technologies

Wiese, Claudia L.1 ; Probst, Rodolfo S.1 2 ; Briggs, Heather M.1 ; Steffen, Joshua G.1

Published Jan 28, 2026 on Dryad. https://doi.org/10.5061/dryad.xksn02vw8

Abstract

The Western honeybee (Apis mellifera) is a crucial contributor to worldwide agriculture and ecological health, but is experiencing wide population declines linked to Varroa mite infection. Formic Acid (FA) has been increasingly used to control for Varroa, yet its effects on A. mellifera hives, particularly the impact it might have on their foraging preferences, remain unclear. In this study, we used a combination of pollen DNA metabarcoding with real-time nanopore sequencing to assess how FA treatment influences A. mellifera foraging preferences. DNA sequencing was performed on pollen samples from six University of Utah campus honeybee hives separated into FA and control treatments. Samples were collected before, during, and after FA application. We amplified trnL, a chloroplast DNA region useful for plant identification, using portable sequencers from Oxford Nanopore Technologies (ONT). We detected a significant difference in foraging composition between FA-treated and control hives at the end of the experiment. Control hives foraged from a more diverse array of plant genera. We also found that individual hives have unique foraging preferences independent of treatment. These findings suggest that FA treatment can have an impact on A. mellifera foraging behavior. The magnitude of FA impact, however, on hive foraging repertoire remains unclear. Pollen DNA metabarcoding with nanopore technology is an effective method for analyzing bee foraging patterns and holds significant potential for advancing ecological research on pollination health.

Dataset DOI: 10.5061/dryad.xksn02vw8

Description of the data and file structure

This data contains DNA reads sequenced from pollen DNA collected by Apis mellifera using Oxford Nanopore Technology (ONT). A. Mellifera (n=6) received either Formic Acid treatment (to combat Varroa) or a placebo treatment. Corbicular pollen was collected using pollen traps at the hive entrance Before, During, and After the treatment period.

DNA was isolated from collected pollen samples 1 gram of pollen was diluted in 2 mL of lysis solution. DNA was isolated from collected pollen samples using the Qiagen DNeasy® Plant Pro Kit. DNA samples were individually amplified through PCR, targeting the trnL (UAA) c-d chloroplast intronic region, with forward primer 5’-CGAAATCGGTAGACGCTACG-3’ and reverse primer 5’-GGGGATAGAGGGACTTGAAC-3’. To allow for multiplexing, primers were modified by attaching unique 24-base pair indices to four forward and reverse oligo primers. PCR reactions was performed on all samples for 20 cycles under the conditions of denaturation at 95 °C for 30s, annealing 50 °C for 30s, and extension at 72 °C for 1 minute. DNA libraries were prepared using an in-house pipeline in accordance with ONT protocols. Initial basecalling occurred using ONT MinKNOW (v20.06.2), producing real-time results for initial confirmation of sequencing read lengths of trnL barcode and other initial quality metrics. We then re-basecalled the sequences using ONT’s higher fidelity basecaller (SUP) with Dorado (v0.8.0) to limit error rates in the reads and maximize read output. Reads were demultiplexed into their respective original samples using minibar (v0.24). Long and short reads were filtered out. To create a trnL-specific database, we used the protocol constructed by Robert Greenhalgh (Greenhalgh, R. (2021). Stand: A tool for microbial community analysis. GitHub. Retrieved January 10, 2025, from https://github.com/robertgreenhalgh/stand. We used NCBI’s blastn+ (v2.15.0) to verify sequence identity for all obtained sequences.

Before performing statistical analysis, normalized Relative Read Abundance (RRA) was calculated for each plant genus per sample. RRA data was used in all following statistical analyses. Relative read abundances by treatment were visualized with a stacked bar chart and heatmap from ggplot2. Multivariate analysis was also performed (NMDS, PERMANOVA, etc.). pairwise post-hoc testing was then performed using the RVAideMemoire package. Bar charts were created to further visualize and understand differences between groups.

Files and variables

File: hive_key.csv

Description: Key Information regarding each pollen collection and which treatment the hive received and when in the process.

Variables

Hive and Sample Number (First two letter are the hive name, the number is separate samples from the hive)
Treatment (Control/Placebo or Formic Acid Treatment and Pre-, During- and After- Treatment Application)

Files: Ks5_blasted.txt, Ks6_blasted.txt, Ks7_blasted.txt, Hm1_blasted.txt, Ke9_blasted.txt, Ke8_blasted.txt, Ke7_blasted.txt, Ke1_blasted.txt, Ke10_blasted.txt, Ke2_blasted.txt, Hw1_blasted.txt, Km2_blasted.txt, Ke6_blasted.txt, Ke3_blasted.txt, Ke5_blasted.txt, Km1_blasted.txt, Hm2_blasted.txt, Ke4_blasted.txt, Ks3_blasted.txt, Ks2_blasted.txt, Ks4_blasted.txt, Hm6_blasted.txt, Ks1_blasted.txt, Hm5_blasted.txt, Hm7_blasted.txt, Hm4_blasted.txt, Kn1_blasted.txt, Hm3_blasted.txt, Kn2_blasted.txt

Description: These are the NCBI blastn+ (v2.15.0) taxonomic results from sequenced trnL gene DNA from pollen samples organized each collected pollen sample. The top Blast hit was chosen for each DNA read. This is the raw data that all further analysis stems from.

File: (1)relative_abundance_chart.R

Description: This is the first R script where the Blast data was process, ultimately pulling out the top genera match from each DNA sequence. After some data exploration, a relative reads abundance graph is produced.

File: (2)heatmap.R

Description: This is the second R script and produces a heatmap with the same data files that were produced in the file (1)relative_abundance_chart.R.

File: (3)NMDS_dispersion_PERMANOVA.R

Description: In this R script, the data from (1)relative_abundance_chart.R is used to calculate Relative Read Abundance and this data is used for all further statistical analysis. Then the data is tested for normal distribution, Hellinger transformation is applied to the data. Then Dispersion, NMDS and PERMANOVA's are performed sorted by both hive treatment and location. Figures are produced at the end.

File: (4)pairwise_comparisons_postPERMANOVA.R

Description: Post-hoc testing is performed of the PERMANOVA results calculated in (3)NMDS_dispersion_PERMANOVA.R.

File: (5)richness_shannon_diversity_graphs.R

Description: Further statistical analysis is performed on diversity metrics.

File: (6)back_to_back_barplots.R

Description: Back to back barplot for samples of interested are calculated and then graphed.

File: location_condition_RRA_barchart.csv

Description: This is a data files that has all pertinent DNA read data information organized for compatibility for the file: (6)back_to_back_barplots.R.

File: rbind_5-30.csv

Description: This is a comprehensive file with reads organized for compatibility with file: (3)NMDS_dispersion_PERMANOVA.R

File: RRA_data_5-30.csv

Description: This is a comprehensive file with all RRA DNA reads organized for compatibility with file: (3)NMDS_dispersion_PERMANOVA.R

File: hive_key_5-30.csv

Description: This is a comprehensive file with all sample data organized for compatibility with file: (3)NMDS_dispersion_PERMANOVA.R

Code/software

All analyses and figures were generated using different packages in R version 4.4.1 in Rstudio. Packages used include:

library(vegan)

library(tidyverse)

library(readxl)

library(ggplot2)

library(patchwork)

library(viridis)

library(pheatmap)

library(grateful)

Methods

Here, we investigated the effect of Formic Acid (FA) treatment on Apis mellifera foraging using six hives located at the University of Utah campus in Salt Lake City, Utah, USA. We leverage metabarcoding technologies on the collected pollen to understand foraging patterns under different FA treatments.

Experiment design

Hive location and sampling

The study utilized six hives, maintained by the University of Utah Beekeeper Society, situated at two different locations: four hives near the Kahlert building (40.766374, -111.837801) and two hives near the Health Science Education Building (HSEB) building (40.769540, -111.834158). Hives from both groups were selected to be either treated with FA or be a control. The Kahlert hives (denoted by first letter ‘K’) corresponded to 2 controls: Km, Kn; 2 FA: Ke, Ks, while HSEB hives (denoted by first letter ‘H’) corresponded to 1 control: Hm; 1 FA: Hw. Kalhert and HSEB hives were 0.47 km apart.

Pollen collection

Pollen was collected on three occasions: (1) immediately preceding FA or placebo treatment (4/25/2022), (2) three days after applying FA or placebo treatment (4/28/2022), and (3) on the final day of FA or placebo treatment (5/5/2022) (Figure 1 in associated paper).

Pollen was collected with gridded traps placed at the entrance of each hive (Figure 1). These traps are designed to remove the pollen grain from the bees’ corbicula once they return from a foraging trip by forcing workers to enter the hive through a small grid that dislodges the pollen, which then falls into a collection basket. To prevent workers from attempting to enter through other available openings, those were sealed with duct tape. During times when pollen was not being collected, the grid trap was raised.

Before each pollen trap collection, collection baskets were sterilized in lab and lined with aluminum foil to minimize any cross-contamination from previous uses. At the start of pollen collection, the grid trap was lowered. After approximately 120 minutes, pollen was collected from the basket and the grid trap was once again lifted. Exact times for the collection of pollen from pollen traps in the study can be found in Supplemental Table 1. Collected pollen from the baskets was immediately transferred into sterilized plastic vials, labeled, and stored at -20 °C.

Formic Acid and Control Treatment

FA was applied to the hives in the form of Mite Away Quick Strips®, a FA polysaccharide gel strip. As suggested by the manufacturer, two strips were placed between the bottom and second-bottom super to allow for maximum natural fumigation as the acid diffused upwards through the hive. Typically, treatment should be removed in 7 days, but rainy conditions prohibited removal of treatment until 10 days after placement. Control hives received lightly water-soaked pads as placebos, placed in the same manner. Following the treatments, all hives were monitored to confirm the queen's survival.

DNA extraction

DNA was isolated from the pollen samples using the Qiagen DNeasy® Plant Pro Kit. To prepare the sample for DNA extraction, 1 g of pollen was diluted in 2 mL of lysis solution in a 50 mL tube. Pollen grain exine disruption was achieved by vortexing 200 mg of pollen in a glass bead tube for 15 minutes. DNA extraction then proceeded as per the manufacturer’s protocol with the slight modifications of adding 150 µL more APP Buffer than recommended and using only 50 µL of EB Buffer for final elution, the minimum amount recommended.

DNA extraction was successfully performed on a total of 29 samples, 13 samples (8 in the FA group, 5 in the control) represent data taken before any treatment was applied. 6 samples (4 in the FA group, 2 in the control) represent data taken 3 days after the placement of treatment. Lastly, 10 samples (6 in the FAgroup, 4 in the control) represent data from pollen collected at the end of treatment, 10 days after the initial application.

Final DNA concentration was assessed using a Thermo Fisher Scientific™ NanoDrop. A ratio above 1.8 DNA absorbance at 260nm/280nm is recommended by the manufacturer for sequencing, although numbers close to approaching the 1.8 threshold were used in this study since these samples experienced sequencing success. DNA concentration and exact DNA absorbance for each extraction can be found in Supplemental Table 2. Three replicates from the same pollen collection sample were completed when possible. If the collected samples did not yield enough pollen for three replicates, the maximum number of replicates possible was performed based on the available pollen. Lastly, to limit bias in subsequent steps, all DNA extractions were diluted or concentrated to 180 ng/µL.

Library Prep & sequencing

PCR with Indexed Barcodes

DNA samples were individually amplified through PCR, targeting the trnL (UAA) c-d chloroplast intronic region, with forward primer 5’-CGAAATCGGTAGACGCTACG-3’ and reverse primer 5’-GGGGATAGAGGGACTTGAAC-3’. Fragment length of this region is highly variable, usually ranging from 250 bp to 750 bp (Taberlet et al., 2007).

To allow for multiplexing, primers were modified by attaching a unique 24 base pair index to 4 forward and 4 reverse oligo primers (Supplemental Table 3). These 4 unique forward and reverse primers allowed for a total of 16 uniquely indexed PCR samples to be pooled for subsequent sequencing and demultiplexed later.

PCR reactions of extracted DNA from pollen collections (n = 29) were performed in a total volume of 31 µL containing 15 µL of Thermo Fisher Scientific™ Phusion Hot Start Master Mix, 5 µl of nuclease free water, 5 µL of DNA, 3 µL of 5 µM forward, and 3 µL 5 µM reverse primers. PCR conditions consisted of initial denaturation at 95 °C for 2 minutes, followed by 20 cycles of denaturation at 95 °C for 30s, annealing 50 °C for 30s, and extension at 72 °C for 1 minute, concluded by a final extension at 72 °C for 2 minutes. While other studies amplifying this region used 30-35 cycles, our preliminary testing of PCR product concentration through gel-electrophoresis found that 20 cycles produced enough amplicon product for nanopore sequencing (Taberlet et al., 2007).

Amplicon concentration after PCR was measured using a Qubit™ dsDNA HS Assay Kit and the Qubit™ 4.0 fluorometer. To verify correct fragment length of amplicons as expected from trnL, PCR reactions were also run on a gel-electrophoresis. Amplicons were then pooled in equimolarity. In order to adequately sort samples with unique indexing combinations, we divided PCR amplicons into two pools: Pool 1 contained 16 multiplexed samples and Pool 2 contained the remaining 13 multiplexed samples. PCR clean-up was performed on both pools using a homemade mixture of AMPure XP beads. The cleaned amplicons were eluted in 30 µL of nuclease-free water and stored at -20 °C.

Library Preparation & Sequencing

DNA libraries were prepared using an in-house pipeline in accordance with ONT protocols. Library preparation first consisted of end-tail repair and A-tailing reagents using NEBNext® Companion module for ONT Ligation Sequencing. Ligation of adapter attachment and loading preparation was performed using ONT Ligation Sequencing Kit V13, SQK-LSK110. Pool 1 was processed for the MinION Flow Cell (R9.4), which includes an extra step of loading a priming mix to prepare the MinION flow cell. Pool 2 was processed for the MinION Flongle Flow Cell (R9.4). Sequencing was performed on the ONT MinION Mk1C device. Experiments were run for 48 hours or until pore exhaustion.

Bioinformatics

Basecalling

Initial basecalling occurred using ONT MinKNOW (v20.06.2) producing real-time results for initial confirmation of sequencing read lengths of trnL barcode and other initial quality metrics. We re-basecalled the sequences using a higher fidelity basecaller (SUP) with Dorado (v0.8.0) to limit error rates in the reads and maximize read output. To see the impact of higher fidelity basecalling see Supplemental Table 4. After re-basecalling, the reads were categorized as either pass or fail based on a Phred quality score threshold of 8. NanoPlot (v1.19.0) was run on re-basecalled sequences to confirm read quality and read length distribution.

Demultiplexing & Denoising

Reads were demultiplexed into their respective original samples using minibar (v0.24; California Academy of Sciences, 2018). Arguments for minibar require specification of allowed editing distance within the indexes and primers. To maximize the number of reads successfully demultiplexed into samples, various editing distances for the indexes and primers were evaluated. The maximum sorted reads and minimum unknown or multiple matched reads occurred at an allowed editing distance of 9 (“-e 9”) (Supplemental Table 5). Indexes and primers were removed once demultiplexing was complete.

Next, long and short reads were filtered out to reduce the presence of errors from sequencing, remove concatenated reads, and ensure sequences accurately represented the trnL c-d region. To account for significant length variability of the target region (Taberlet et al., 2007), we excluded reads longer than 1050 bp and shorter than 150 bp, using usearch (v 2.15.0; Edgar, 2010). See Supplemental Table 6 for the number of reads removed per sample.

Database Curation

To create a trnL-specific database we used the protocol outlined on the GitHub page, constructed by Robert Greenhalgh (2021). Using the search terms "biomol_genomic[PROP] AND is_nuccore[filter] AND chloroplast[filter]” within the entire NCBI Blast database, a collection of chloroplast-only sequences was compiled. This chloroplast database was further refined to include only trnL sequences by filtering to include only sequences containing the forward 5’-GGGCAATCCTGAGCCAA-3’ and reverse 5’-CCATTGAGTCTCTGCACCTATC-3’ primers. The final database contained 18,462 sequences, each representing a unique taxonomic unit.

BLAST

We used NCBI’s blastn+ (v2.15.0) to verify sequence identity for all obtained sequences. In order to maximize use from reads, taxonomic identity was assigned to genus level. All top matches (best bit score) were included regardless of percent identity match or e-value (San Martin, 2024). To limit the impacts of shallow read-depth, reads that made up less than 1 % of a sample total were removed (following similar recommendations from Leponiemi et al., 2023 and Lee et al., 2018).

Statistical Methods

Before performing statistical analysis, normalized Relative Read Abundance (RRA) was calculated for each plant genus per sample. This RRA data was used in all following statistical analyses. All analyses and figures were generated using different packages in R version 4.4.1 (R Core Team, 2024) using the Rstudio interface (RStudio Team, 2023). P-values of < 0.05 were considered statistically significant.

To visualize the data, a relative read abundance stacked barchart and heatmap organized by treatment was created using ggplot2 (v3.5.1; Wickham, 2016). To identify differences in the data between treatments and location, we used multivariate species analysis techniques. Results were separated in their respective 6 treatments: Pre-Formic Acid, Pre-Control, During-Formic Acid, During-Control, Post-Formic Acid, and Post-Control. First, we transformed the data using Hellinger-transformation (“hellinger_transformed_data”) to reduce the influence of high or low abundance genera. All following calculations were performed with this transformed data (“data”). We then calculated distance matrices between genera using the “vegdist” command from the vegan package (v2.6.4; Oksanen, 2022). To understand the dispersion effect on the distance data (“distance_data”) we used the “betadisper” command from vegan, with “type = centroid”. In order to quantify the relationship between the different treatment groups and/or location (“treatment” / “location”), permutational analysis of variance (PERMANOVA) was performed on the distance data. The following structure was used for this calculation, definitions for included data and variables are defined above:

adonis2(distance_data~as.factor(data$treatment), data=distance_data, permutations=9999)

Since the result was significant, pairwise post-hoc testing was then performed using the RVAideMemoire package (v0.9.83.1; Herve, 2023) to identify potential significance between specific treatment groups (Pre-Formic Acid, During-Formic Acid, Post-Formic Acid, Pre-Control, During-Control, or Post-Control) or location (Hm, Hw, Ke, Km, Kn, or Ks). The following structure was used for these calculations:

pairwise.perm.manova(distance_data, hellinger_transformed_data$treatment, p.method = "holm")

To visualize orientational differences between treatment groups, we applied the unconstrained, Non-Metric MultiDimensional Scaling (NMDS) modeling from the vegan package to the Hellinger-transformed data using the “metaMDS” command from vegan (v2.6.4; Oksanen, 2022).

To further visualize and understand differences between groups, barplots were created. Diversity metrics including Shannon Diversity Index and genera richness were calculated using the vegan package. All graphs were created using the RStudio package ggplot2 (v3.5.1; Wickham, 2016).