Comparison of seven DNA metabarcoding sampling methods to assess diet in a large avian predator
Data files
Aug 23, 2024 version files 41.58 GB
- fastq_files_a.zip
- fastq_files_b.zip
- fastq_files_c.zip
- fastq_files_d.zip
- fastq_files_e.zip
- fastq_files_f.zip
- fastq_files_g.zip
- fastq_files_h.zip
- fastq_files_i.zip
- fastq_files_j.zip
- fastq_files_k.zip
- fastq_files_l.zip
- fastq_files_m.zip
- fastq_files_n.zip
- paprocki_et_all_processed_seqs.zip
- README.md
- rlha_new_12S_taxatable_dryad_revision.csv
Abstract
DNA metabarcoding is a rapidly advancing tool for diet assessment in wildlife ecology. Studies have used a variety of field collection methods to evaluate diet; however, there is a pressing need to understand the differences among sampling methods and the downstream inferential consequences they may have on our ability to document diet accurately and efficiently. We evaluated seven DNA metabarcoding sampling methods to assess the diet of a large avian predator: Buteo lagopus (rough-legged hawk). We collected beak swabs, talon swabs, cheek (buccal) swabs, cloacal swabs, and cloacal loops from captured birds, and collected fecal samples from both captured and uncaptured birds. We described and compared variation in prey recovery within and among the seven sampling methods and identified appropriate analytical methods to compare diet among individuals sampled via different methods. Beak and talon swabs produced the highest prey detection rates, yielded the greatest prey richness per sample, and contributed the most to an individual’s total prey richness per sampling occasion compared to other sampling methods. Within individuals sampled using five methods during a single capture occasion, cloacal swabs and cheek swabs positively predicted prey richness and average prey mass, respectively, from fecal samples. While all methods identified similar dominant prey taxa that were consistent with prior diet studies, beak and talon swabs detected greater prey richness at both the individual and population levels. We propose a food residue duration hypothesis whereby methods that sample areas retaining food DNA consumed over longer and more continuous pre-sampling time intervals recover greater observed prey richness, explaining variation among sampling methods. Choice of sampling method can influence predator diet characterization and is particularly important if researchers wish to quantify uncommon diet items or compare diet metrics using samples collected via different methods.
README: Comparison of seven DNA metabarcoding sampling methods to assess diet in a large avian predator
https://doi.org/10.5061/dryad.rv15dv4hh
Description of the data and file structure
Diet metabarcoding samples were collected using bristle and foam swabs from rough-legged hawk beaks, talons, cheeks, cloacas, and feces. DNA from swabs was extracted in a low-input (clean) DNA laboratory in Moscow, ID. Vertebrates were targeted at the V5 region of the 12S ribosomal RNA gene using primers from Riaz et al. (2011). Libraries were prepared using a two-step protocol, with the locus PCR in the first round and an i5/i7 dual-index PCR in the second round. Samples were sequenced in duplicate with rolling positive replicates between libraries (i.e., one sample from a previous library included in duplicate in a subsequent library to assess inter-library concordance), a duplicate off-target positive control, and a duplicate no-template control in each library. Libraries were run in three batches: a pilot consisting of ~50 samples, and two lanes of a NovaSeq 6000 using S4 chemistry, with 6-8 libraries per lane.
Files and variables
File: paprocki_et_all_processed_seqs.zip
Description: Post-processing sequence files for each replicate. For each replicate sample, sequences have been demultiplexed, F/R reads have been paired, and reads from the same ASV have been collapsed using fastx-collapser. For each sample, ASVs were filtered to remove sequences shorter than 50 bp or at an abundance of <0.1% (0.001) within the sample.
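As a minimal sketch, the per-sample ASV filter described above (length ≥ 50 bp, relative abundance ≥ 0.1%) could be expressed as follows. The data layout is hypothetical; the actual filtering was performed by the pipeline cited under Code/software.

```python
# Sketch of the per-sample ASV filter (hypothetical data structures;
# the real filtering is done inside the processing pipeline).
def filter_asvs(asv_counts, min_len=50, min_rel_abundance=0.001):
    """Keep ASVs >= min_len bp and >= 0.1% of the sample's total reads.

    asv_counts: dict mapping ASV sequence -> read count for one sample.
    """
    total = sum(asv_counts.values())
    return {
        seq: n
        for seq, n in asv_counts.items()
        if len(seq) >= min_len and n / total >= min_rel_abundance
    }

sample = {"A" * 60: 9000, "C" * 60: 5, "G" * 40: 995}  # toy counts
kept = filter_asvs(sample)
# The 40-bp ASV fails the length filter; the 5-read ASV is 0.05% of the
# 10,000 total reads and fails the abundance filter; only the first survives.
```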
Files: fastq_files_a.zip through fastq_files_n.zip
Description: The original fastq files, split across 14 zip archives of approximately 200 files each. Each replicate has two files: forward and reverse reads. Files have been demultiplexed and trimmed of adapter sequences using cutadapt. The fastq files were compressed into 14 larger archives to simplify data management and comply with Dryad's 1000-file maximum, but samples are not organized within or across these archives. We recommend downloading all 14 archives to ensure all samples have their paired F/R reads.
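After unzipping all 14 archives into one directory, a quick check like the following can confirm that every forward read file has its reverse mate. The `_R1`/`_R2` naming pattern is an assumption; adjust it to match the actual file names.

```python
# Sketch: flag forward read files whose reverse mate is missing.
# The _R1/_R2 naming convention is hypothetical.
def unpaired(filenames):
    names = set(filenames)
    missing = []
    for f in names:
        if "_R1" in f and f.replace("_R1", "_R2") not in names:
            missing.append(f)
    return sorted(missing)

files = ["s01_R1.fastq.gz", "s01_R2.fastq.gz", "s02_R1.fastq.gz"]
print(unpaired(files))  # s02 lacks its reverse file
```

In practice, `filenames` would come from `os.listdir()` on the directory holding the unzipped fastq files; any names reported should prompt re-downloading the remaining archives.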
File: rlha_new_12S_taxatable_dryad_revision.csv
Description: The taxonomic table: the full table containing every ASV for each replicate sample (i.e., it is not a unique list of ASVs), with the associated metadata for each replicate sample, including read count, sample type, and taxonomic information for the best-hit species from both the local and remote blastn queries.
Variables
- sample: The replicate sample name. Each replicate appears in n rows, where n is the number of ASVs for that replicate in the processed sequence files provided above. Each replicate corresponds to a well in a 96-well plate used in library prep.
- sample_p: The dereplicated sample name. Each sample corresponds to a specific swab. There are generally two replicates per sample, although some samples were repeated across libraries to assess inter-library batch effects, and thus some samples may have 3 or more replicates.
- replicate: The suffix for the replicate. A/B correspond to the first two replicates on a plate. An extra replicate is noted with "PX" where X is the number assigned to the library for the extra replicate(s).
- sequence: The full sequence of the ASV
- reads: The number of reads associated with that replicate and that ASV
- identity: The % identity score (calculated by NCBI blast) for the best hit in the local reference database
- taxa: The binomial species name of the best hit for that ASV in the local reference database
- taxid: The NCBI taxonomic "taxid" number assigned to the best hit taxon or taxa in the local reference database
- phylum: The phylum of the best hit taxa in the local reference database
- class: The class or classes of the best hit taxa in the local reference database
- order: The order or orders of the best hit taxa in the local reference database
- family: The family or families of the best hit taxa in the local reference database
- genus: The genus or genera of the best hit taxa in the local reference database
- bitscore: The bitscore, calculated by NCBI Blast, of the best hit in the local reference database
- tax_num: The number of equally good best hits in the local reference database (ties generally occur only when sequences are identical, i.e., when the resolution of the marker is too low for species-level assignment)
- all_species_in_best_hit: A list of all the species that are equally-good best hits.
"No Hit" and N/As in the data
"No hit" means that NCBI Blast returned no acceptable hit by its own internal metrics. This taxatable is not filtered for identity but for downstream analysis, only matches above 98% were included. N/As are present in this data in the following columns: taxid, phylum, class, order, family, genus, bitscore, tax_num, all_species_in_best_hit. N/As are present when a) there was no hit found in BLAST (Taxa is "No Hit", taxid, taxonomy and score fields contain "n/a") or when we could not resolve the hit to species, in which case the taxonomic levels below the level of resolution are marked as "n/a".
Code/software
Trimming adapters was done with cutadapt: https://cutadapt.readthedocs.io/en/stable/
Processing sequences was done using the pipeline found here: https://github.com/sckieran/Metabarcoding_Pipeline_Waits
The pipeline was run on the University of Idaho computing cluster, which uses Rocky Linux and the SLURM job manager. It will not currently run on other clusters as-is, but it can be modified for other Linux clusters that use SLURM; please contact Shannon Blair (sckieran@ucdavis.edu). It can also be modified to run in the terminal on certain versions of macOS.
Methods
We collected diet samples from Buteo lagopus throughout their North American nonbreeding range in the conterminous United States and Canada during the nonbreeding season (November – March) from 2020 – 2023. We sampled diet from captured birds via: 1) talon swabs, 2) beak swabs, 3) cheek (i.e., buccal cavity) swabs, 4) cloacal swabs, 5) cloacal fecal loops, and 6) fecal samples. We also collected 7) fecal samples from uncaptured free-ranging birds. In total, we collected 592 samples from 189 individuals including 113 captured and 76 uncaptured Buteo lagopus.
All samples were extracted in a laboratory dedicated to low-quality and low-quantity DNA samples. No high-quality DNA was handled or stored in this laboratory. We collected DNA onto three “substrates” as described above: nylon bristle swabs (beak and talon samples), foam swabs (cheek and cloacal samples), and homogenized feces (fecal samples, fecal loops). Each substrate had its own DNA extraction protocol, but the protocol was the same for each substrate regardless of the source of the DNA.

DNA was extracted from nylon bristle swabs (beak and talon samples) using the Qiagen DNeasy Blood and Tissue Kit. Samples were vortexed for 15 s to dislodge prey cells from the bristles. The 1 mL of DNA/RNA Shield was then split between two new 2 mL microcentrifuge tubes (500 µL in each tube). After overnight incubation, extraction volumes were scaled up to match the amount of starting material. The entire extraction volume for a sample (~3.0 mL) was then spun through one spin column to combine the extraction back into one tube. The extraction was then completed according to the manufacturer’s protocol.

DNA was extracted from foam swabs (cheek and cloacal samples) also using the Qiagen DNeasy Blood and Tissue Kit, as has been done in previous diet metabarcoding studies for cloacal swabs in sharks (Clark et al., 2023) and for esophageal and cloacal swabs in Tyto alba (barn owl; Elmore et al., 2023). The swab and 600 µL of DNA/RNA Shield were transferred to a new 2 mL microcentrifuge tube. As with the bristle swabs, extraction volumes were scaled up to match the volume of starting material (~1.8 mL), and the entire extraction volume was spun through a spin column. The swab tip was carried through to the first centrifugation step and placed in the spin column to ensure all DNA present made it onto the filter.

DNA was extracted from fecal samples using the QIAamp Fast DNA Stool Mini Kit. Samples were vortexed, then 200 µL of homogenized sample was used in the extraction.
An extraction negative containing no sample was included in each extraction to monitor for potential contamination.
We targeted the V5 region of the 12S ribosomal RNA gene. We prepared 14 libraries. All samples were run in duplicate. Additionally, each library contained two negative controls (nanopure water instead of DNA) and two positive controls, a low-concentration equimolar mix of tissue-derived DNA from two vertebrate species unlikely to occur naturally in our samples: Ovis canadensis (bighorn sheep) and Puma concolor (cougar).
We used a two-step metabarcoding library preparation, which consisted of two rounds of PCR. Two-step library preparation methods are popular, have been used for a variety of diet studies (Goldberg et al., 2020; Bourbour et al., 2021), and are generally modified versions of Illumina’s amplicon sequencing guidelines (Illumina, 2013). The first round of PCR, the amplicon PCR, amplified the target gene region using primers that also contained overhangs for Illumina sequencing adapters, which were then added in the second round of PCR, the index PCR. We used custom-designed fusion primers for the amplicon PCR. The primers were based on those designed by Riaz et al. (2011) for vertebrates but had small additional heterogeneity spacers (one base pair) to increase complexity, which can improve sequencing yields (Jensen et al., 2019). Additionally, we synthesized two versions of the reverse primer, with different 5’ initial bases to better capture prey richness.
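The fusion-primer construction above can be illustrated with a short sketch. All sequences below are placeholders, not the primers actually used: the point is only that prepending a single extra base (the heterogeneity spacer) to some primer copies offsets the locus by one position, so neighboring clusters read different bases in early sequencing cycles, increasing base-call complexity.

```python
# Illustration of the 1-bp heterogeneity spacer idea.
# Both sequences below are placeholders, not the study's actual primers.
ADAPTER_OVERHANG = "TCGTCGGCAGCGTC"   # placeholder 5' tail for the index PCR
LOCUS_PRIMER = "ACTGGGATTAGATACCCC"   # placeholder target-specific sequence

def fusion_primer(spacer):
    """Assemble overhang + heterogeneity spacer + locus primer."""
    return ADAPTER_OVERHANG + spacer + LOCUS_PRIMER

# Two variants: no spacer vs. a one-base spacer, shifting read
# phasing by a single position across the pooled library.
variants = [fusion_primer(""), fusion_primer("A")]
```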
All sequencing was performed at the University of Oregon Genomic and Cell Characterization Core Facility (UO GC3F) in Eugene, Oregon. Libraries were run in three rounds: round one contained our pilot, round two contained libraries 2-7 and round three contained libraries 8-14. Each library was afforded approximately 15% of a PE 150-bp Illumina NovaSeq 6000 SP sequencing lane. We did not use a host-blocking primer because 1) we wanted to maximize the recovery of avian prey taxa that are not well differentiated at 12S, and 2) designing and validating a primer specific to rough-legged hawk was outside the scope of this project. We anticipated high host read counts, particularly within cheek, cloacal, and fecal samples and consequently increased our mean reads per library to maximize prey read recovery.
Bioinformatics
Samples were demultiplexed by the UO GC3F. Next, adapters and primers were trimmed using cutadapt (Martin, 2011). From there, samples followed the pipeline found here: https://github.com/sckieran/Metabarcoding_Pipeline_Waits, which includes all scripts used. Briefly, forward and reverse reads were merged with PEAR (CC licensed: https://cme.h-its.org/exelixis/web/software/pear/). Next, unique reads (ASVs, no error correction) were pulled from each sample using FASTX-collapser (open source: http://hannonlab.cshl.edu/fastx_toolkit/index.html), and singletons and doubletons were removed to reduce noise. We then filtered sequence data following the general approach outlined in De Barba et al. (2014). First, we removed low-frequency noise by relative read abundance and read count: we removed any sequence shorter than 50 bp after merging, and any sequence not present above 0.1% of the total read count within its sample. In total, we removed 120,319 ASVs before BLAST. However, to ensure that our filters were appropriately conservative, we separately BLASTed all ASVs removed in this way. A summary of the removed ASVs, including mean and median read count and the number of samples each ASV was found in, can be found in the Supplemental Material (section 3; Table S2), along with further details about the filtering methods. Overall, no taxa were removed from any sample in our final dataset due to the pre-BLAST ASV filters, suggesting that we successfully reduced noise without affecting downstream detections. We built a custom local BLAST-formatted reference database that included all potential prey taxa and several common contaminants and controls. We queried each unique sequence in our filtered samples against our BLAST database and assigned it the lowest unambiguous taxonomic rank present in the top-scoring BLAST hit by bitscore. We then removed ASVs matching at <98% identity.
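The collapse-and-denoise step can be sketched as follows: tally identical merged reads into ASVs (the operation FASTX-collapser performs) and drop singletons and doubletons. The data here are toy sequences; the real pipeline operates on fastq files.

```python
from collections import Counter

# Sketch of the collapse step: count identical merged reads as one ASV
# (as fastx-collapser does) and drop singletons and doubletons.
def collapse_and_denoise(merged_reads, min_count=3):
    counts = Counter(merged_reads)
    return {seq: n for seq, n in counts.items() if n >= min_count}

reads = ["ACGT"] * 5 + ["ACGA"] * 2 + ["ACGC"]
asvs = collapse_and_denoise(reads)
# "ACGA" (2 copies) and "ACGC" (1 copy) are removed as doubleton/singleton.
```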
Within each sample, we removed ASVs present in <2 PCR replicates and built consensus ASV profiles between replicates by taking the mean of ASV read counts. We then examined read counts in the PCR negatives and filtered each sample in one of two ways: 1) removing ASVs with fewer reads than the maximum read count observed in any PCR negative (filter1; read count threshold = 4,379); or 2) removing ASVs with fewer reads than the maximum read count of any non-host, non-human taxon in any PCR negative (filter2; read count threshold = 449). We took additional steps to ensure the geographic and taxonomic accuracy of taxonomic assignments (see Supplemental Material, section 3, for additional details regarding taxonomic assignment and sequence filtering).
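As a sketch of the replicate-consensus and negative-control filtering, under the assumptions that an ASV must appear in at least two PCR replicates, that consensus counts are the mean across the replicates in which it appears, and that ASVs below the negative-control threshold are removed (data layout hypothetical):

```python
# Sketch: consensus ASV profile for one sample from its PCR replicates,
# followed by a negative-control read-count filter (hypothetical layout).
def consensus_profile(replicates, neg_threshold):
    """replicates: list of dicts mapping ASV -> read count, one per replicate."""
    all_asvs = {asv for rep in replicates for asv in rep}
    profile = {}
    for asv in all_asvs:
        hits = [rep[asv] for rep in replicates if asv in rep]
        if len(hits) < 2:          # require presence in >= 2 replicates
            continue
        mean_reads = sum(hits) / len(hits)
        if mean_reads >= neg_threshold:
            profile[asv] = mean_reads
    return profile

rep_a = {"asv1": 6000, "asv2": 300, "asv3": 10}
rep_b = {"asv1": 5000, "asv2": 500}
# With filter2's threshold of 449: asv2 (mean 400) fails the threshold and
# asv3 appears in only one replicate, so only asv1 (mean 5500) is retained.
profile = consensus_profile([rep_a, rep_b], neg_threshold=449)
```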