Effects of consumer surface sterilization on diet DNA metabarcoding data of terrestrial invertebrates in natural environments and feeding trials
Miller-ter Kuile, Ana; Apigo, Austen; Young, Hillary S. (2021), Effects of consumer surface sterilization on diet DNA metabarcoding data of terrestrial invertebrates in natural environments and feeding trials, Dryad, Dataset, https://doi.org/10.5061/dryad.gqnk98snc
These are data and code from a study examining the potential for surface contamination to influence diet DNA metabarcoding datasets when DNA is sequenced from full body parts (in this case, the opisthosomas of spider individuals). These datasets include the raw sequencing data, all downstream datasets, and taxonomic assignments collected from database searches on BOLD and GenBank (accessed 2019). The code includes code to reproduce all bioinformatics (merge, filter, match to taxonomies, rarefy, sort) as well as all statistics and figures generated from analyses. Raw data are from DNA extractions of predator gut regions (opisthosomas) and amplification of the CO1 gene using PCR. The predator species is Heteropoda venatoria collected individually with sterilized implements and either the diet sequences from their natural diets were extracted or their diets following feeding spiders in a feeding trial.
Field site and collections
Specimens were collected on Palmyra Atoll, Northern Line Islands, USA (5º53’ N, 162º05’W). Heteropoda venatoria is a generalist, active hunting spider species (Heteropoda venatoria). All individuals were stored individually in sterilized containers.
Natural environment consumer collection
In 2015, we collected consumers (n = 47) from natural environments, which had fed on available diet items and come into contact with environmental surfaces, to test if DNA metabarcoding detects diet DNA effectively. We froze all individuals at -80ºC immediately following collection until surface sterilization and DNA extraction in 2019.
Feeding trial consumer set-up and feeding
In 2017, we conducted laboratory trials (n = 26) to test if DNA metabarcoding detects DNA from diet items offered in a contained environment. We created feeding environments from one-liter yogurt containers with holes for air transfer, and placed one H. venatoria in each container. After 12 hours, we placed one large grasshopper (Oxya japonica) in each container and left all containers for 24 hours. We then froze (-20ºC) each H. venatoria that had killed the grasshopper (n = 25, consumption was not easily detectable and thus not considered in analyses). We cleaned all containers between trials with 10% bleach solution.
To test surface sterilization’s efficacy at removing possible contaminants, we used a surface sterilization treatment on ~half the consumers for each set: those collected from the natural environment, and those subjected to controlled feeding trials. We submerged and stirred each (whole) consumer in 10% commercial bleach by volume for two minutes and washed each in deionized water for two minutes. Similar bleach submersion leads to undetectable DNA degradation in similar soft-exoskeleton consumers. Natural environment consumers (2015) had been frozen at -80ºC since collection; we surface sterilized these consumers in a sterilized laminar flow hood in 2019 just before DNA extraction (n = 22 surface sterilized, n = 25 not surface sterilized). We surface sterilized feeding trial consumers (2017) in the lab on the atoll in 2017 following freezing at -20ºC, then stored each in individual vials of 95% ethanol in a -20ºC freezer until DNA extraction (no -80ºC freezer was available at the field station that year) (n = 10 surface sterilized; n = 14 not surface sterilized). Prior to DNA extraction, all samples dried for 1-3 hours in a sterilized laminar flow hood and the opisthosoma (hind gut region) was removed using a sterilized scalpel. Between all steps, tools were sterilized with either ethanol and flame (scalpels and forceps) or 10% bleach (surfaces) between handling each individual.
DNA extraction and removal of consumer DNA with Ampure XP beads
We extracted DNA from each consumer following a modified CTAB extraction protocol. We quantified DNA using a Qubit (Invitrogen) fluorometer with the high sensitivity double-stranded DNA quantification kit. We followed Krehenwinkel et al. (2017) to isolate a proportion of lower molecular weight DNA with Ampure XP beads prior to PCR. We diluted each DNA sample to 20ng/microliter (creating a total sample volume of 40microLiter), mixed each sample using Ampure XP beads (0.75x bead-to-DNA ratio), and kept the supernatant. With the supernatant, we precipitating the DNA pellets with isopropanol and 5M potassium acetate, and washed DNA pellets with ethanol. We quantified this cleaned DNA again using a Qubit fluorometer and diluted all samples to 10ng/microLiter prior to PCR steps. All DNA pellets were stored in and diluted with TE buffer.
PCR amplification, library preparation, and sequencing
We amplified the CO1 gene with general metazoan primers (m1COIF/Fol-degen-rev). We performed all PCR preparation steps in a UV-sterilized biosafety cabinet. We used PCR reaction volumes of 25microLiter (9microLiter nuclease free water, 12.5microLiter GoTaq Green Master Mix (Promega Corp.), 1.25 microLiter of each of the primers (at 10mM), and 1 microLiter of DNA template (at 10ng/microLiter)). We ran each sample in duplicate along with duplicated negative samples each PCR run. PCR reactions are as follows: initial denaturation step at 95C for 3 minutes, then 35 cycles of: 1) 95ºC for 30 seconds, 2) 46ºC for 30 seconds, and 3) 72ºC for one minute, followed by a final 5 minutes at 72ºC. We cleaned PCR products with Ampure XP beads at a 0.8x bead-to-DNA ratio and resuspended from beads using a 10mM TRIS buffer.
We attached Illumina index primers with an additional PCR step following standard protocols (Nextera XT Index Kit v2, Illumina, 2019). We combined duplicate samples for which both duplicates successfully amplified and diluted to a concentration of 5nM. We multiplexed all samples with one negative control and two fungal clone positive controls (GenBank accession numbers: MG840195 and MG840196). We submitted multiplexed samples for sequencing at the University of California, Santa Barbara Biological Nanostructures Laboratory Genetics Core. Samples were run on an Illumina MiSeq platform (v2 chemistry, 500 cycles, paired-end reads) with a 15% spike-in of PhiX. Following sequencing, samples were demultiplexed using Illumina’s bcl2fastq conversion software (v2.20) at the Core facility.
Sequence merging, filtering, and clustering with UNOISE3
We merged, filtered (max ee = 1.0), and denoised (clustered) our sequences around amplicon sequence variants (ASVs) using the UNOISE3 algorithm (unoise3 command in the open-source USEARCH 32-bit version 11.0.667). Prior to denoising with UNOISE3, we used cutadapt (version 1.18) to remove primers from each sequence. We also repeated analyses with the DADA2 algorithm run through R (dada2 package version 22.214.171.124) and with a data cleaning step run through BBSplit to remove consumer DNA prior to ASV assignment (because ASV assignment is abundance-sensitive). We considered analyses from the UNOISE3 algorithm only because UNOISE3 assigned more sequence reads to positive controls than DADA2 (on average, 3x as many reads per positive control) and the cleaning step paired with either DADA2 or UNOISE3 did not increase potential diet DNA detection.
We created a list of unique ASVs and a matrix of ASV abundances across samples. We matched ASVs to taxonomies in the GenBank and BOLD databases. For GenBank, we used BLAST (version 2.7.1) with the blastn command for taxonomic assignment of each ASV using the computing cluster at UC Santa Barbara, comparing against the GenBank nucleotide database with an evalue of 0.01 (downloaded on November 20, 2019). We visualized and exported taxonomic alignment using MEGAN Community Edition (version 6.18.0), using default settings (LCA=naïve, MinScore = 50.0, MaxExpected = 0.01, TopPercent = 10.0, MinSupportPercent = 0.05) and selecting the subtree with all possible diet items for this species (Kingdom: Animalia, Clade: Bilateria). For taxonomies which were not assigned below the order level (n =24), we submitted each ASV individually to the BLAST Basic Local Alignment Search Tool and assigned them a family based on the best sequence match in the database, given that the top ten database matches were from the same family. For BOLD taxonomic assignment, we used the BOLD IDEngine of the CO1 gene with Species Level Barcode Records (accessed February 5-16, 2020; 3,825,490 Sequences, 216,704 Species, and 95,537 Interim Species in database) to match each ASV list to taxonomies. We combined taxonomic assignments from both programs and discarded taxonomic assignments that were mismatched at the family level or higher.
Detection of potential diet items
For consumers from both the natural environment and feeding trials, we asked whether surface sterilization altered detection of potential diet items for each consumer. For natural environment consumers, we examined all potential diet items (which could represent either diet or surface contaminants). For feeding trial consumers, we focused our detection analysis on the offered diet item we provided the consumers in the feeding trial environment (O. japonica, which all consumers were observed to have killed, but not necessarily ingested). We rarefied based on the sample with the lowest sequencing depth which had been sequenced with 95%+ sampling completeness based on iNEXT (version 2.0.20) interpolation and extrapolation methods (16,004 reads for natural environment and 55,205 reads for feeding trial consumers). We rarefied using the rrarefy() function in the vegan (version 2.5.6) package in R and rarefied the field and lab consumers separately.
We then selected all ASVs that matched potential diet items for the natural environment consumers (Kingdom: Animalia; Clade: Bilateria, excluding consumer DNA) and just the offered diet item for the feeding trial consumers (including species: Oxya japonica, genus: Oxya, family: Acrididae, excluding those which only matched to order). Because the consumer species H. venatoria is the only species in the family Sparassidae on Palmyra Atoll, removing consumer DNA meant excluding all ASVs that received a family-level taxonomic assignment of “Sparassidae”. As all ASVs received family-level taxonomic assignment, we pooled ASVs that matched at the family level into one taxonomic unit using cumulative read abundance (i.e. all ASVs matched to diet family A were pooled into diet family A taxonomic unit), a practice common in diet metabarcoding and predator-prey interaction studies.
For potential diet detection and rarefied abundance in both sets of consumers (natural environment and feeding trial) we used generalized linear models to assess the effect of surface sterilization treatment. For prey detection, we used all potential (natural environment) or offered (feeding trial) diet item detection (presence-absence per sample) as the response variable in the full model with surface sterilization as a fixed effect and a binomial distribution. For rarefied diet abundance, we only assessed consumers for which we had detected diet and not those with no diet detection (n = 33 of 37 for natural environment; n = 14 of 19 for feeding trials). For this model, we treated the number of all potential (natural environment) or offered (feeding trials) diet DNA reads per sample as the response variable, surface sterilization treatment as a fixed effect, total read abundance of the sample (constant across all) as an offset term, and a Poisson or negative binomial distribution (to correct for overdispersion when needed). We assessed differences in per sample potential diet richness among sterilization treatments for the natural environment consumers using generalized linear models with the number of potential diet items per sample as the response variable (both family-level taxonomic units or ASVs), surface sterilization treatment as the fixed effect and a Poisson or negative binomial distribution (to correct for overdispersion when needed). We assessed differences in potential diet item composition with family-level taxonomic units between surface sterilized and unsterilized consumers using a presence-absence PERMANOVA model fit with a binomial mixed effects model with surface sterilization treatment as a fixed effect, a random intercept term for potential diet item, and a random slope term for surface sterilization treatment. We also assessed ASV composition as a representation of potential prey composition using a canonical correspondence analysis (CCA) with surface sterilization as a predictor variable.
For all generalized linear models and mixed models, we performed model selection by comparing the full model (including the fixed effect of surface sterilization treatment) to a null model without this effect. All models were called in the glmmTMB package (version 1.0.0) in R (version 3.6.1) We chose the best fitting model based on size corrected AIC values (MuMIn package version 1.43.15). For responses for which the best model included the surface sterilization treatment term, we examined the model summary to determine the standardized coefficients (β) and p-value of the significance between marginal means of the levels of the surface sterilization fixed effect. We assessed model fit using diagnostics in the DHARMa package (version 0.2.7), including tests for heteroskedasticity, and for count models (Poisson or negative binomial), zero inflation and overdispersion. We performed the CCA analysis using the vegan package in R, comparing a model with surface sterilization as a fixed effect to a null model using an ANOVA.
The corresponding raw DNA sequencing data associated with this project can be found on GenBank (BioProject: PRJNA639981).
National Science Foundation, Award: DEB #1457371
National Geographic Society
University of California Faculty Senate Grant
University of California Faculty Senate Grant