Data from: To pool or not to pool: Pooled metabarcoding does not affect estimates of prey diversity in spider gut content analysis
Data files
Jan 15, 2025 version files 857.43 MB
- metadata.csv (5.23 KB)
- raw_sequences.zip (856.79 MB)
- README.md (2.31 KB)
- tax.csv (625.66 KB)
Abstract
Gut content metabarcoding has provided important insights into the food web ecology of spiders, the most dominant terrestrial arthropod predators. In small invertebrates, like spiders, gut content analysis is often performed on whole body DNA extracts of individual predators, from which prey sequences are selectively enriched and sequenced. Since many spider species are generalist predators, large numbers of samples comprising individual spider specimens must be analyzed to recover an exhaustive image of a spider species’ prey spectrum, which is costly and time-consuming. Pooled processing of bulk samples of multiple specimens has been suggested to reduce the necessary workload and cost while still recovering a representative estimate of the prey diversity. However, it is still unclear if pooling approaches lead to bias in recovering the prey spectrum and if the results are comparable to data from individually processed spiders. Here, we test the effect of metabarcoding pooled spider gut content on the recovered taxonomic diversity and composition of prey. Using a newly adapted primer pair, which efficiently enriches COI barcode sequences of diverse arthropod prey groups while suppressing spider amplification, we test if pooling leads to reduced taxonomic diversity or skewed estimates of prey composition. Our results show that pooling and individual processing recover highly correlated taxonomic diversity and composition of prey. The only exceptions were very rare prey items, which were less well recovered by pooling. Our results support pooling as a cost-effective and time-efficient approach to recover the diet of generalist predators for population-level studies of spider trophic interactions.
README: Data from: To pool or not to pool: Pooled metabarcoding does not affect estimates of prey diversity in spider gut content analysis
https://doi.org/10.5061/dryad.2jm63xszk
Files and variables
File: raw_sequences.zip
Description: This folder contains 150 fastq.gz files with the raw sequence data from Illumina sequencing (FASTQ format) for the different spider species.
File: metadata.csv
Description: metadata of the samples
Variables
- sample_id: unique sample identifier (the naming scheme corresponds only to the collector, not to any other quality of the sample). Note that some single specimen samples have been used in multiple DNA pools and therefore occur multiple times in this list, but with different pool IDs and rarefaction depths.
- type: sample type (single specimen sample = sample consisting of the DNA extracted from a single spider individual; DNA pool = sample consisting of multiple "single specimen samples" combined)
- pool_id: name of the pool sample each sample belongs to
- species: spider species of the sample; diverse = pool mixed from diverse species, see associated individual samples for details
- size_mm: prosoma width in mm; NA = not applicable to pools, as they contain multiple spider individuals with varying sizes
- prey_reads: sum of reads obtained from the respective sample that belong to Arthropoda, but not Araneae, zOTUs
- rarefied_to: rarefaction depth (for samples that occur multiple times, the depth differs depending on the pool)
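Because a single specimen sample can appear once per pool it contributed to, it is usually safer to key rows by the pair (sample_id, pool_id) than by sample_id alone. The following is a minimal sketch of reading the table and listing the members of each pool. The example rows and IDs are invented for illustration, and the exact labels stored in the `type` column ("single specimen sample", "DNA pool") are assumed from the description above and may differ in the actual file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the column layout described above; IDs and values
# are invented, and the real file may use different `type` labels.
EXAMPLE_CSV = """sample_id,type,pool_id,species,size_mm,prey_reads,rarefied_to
A01,single specimen sample,P1,Mangora acalypha,1.6,5200,1000
A02,single specimen sample,P1,Mangora acalypha,1.8,830,1000
A01,single specimen sample,P5,Mangora acalypha,1.6,5200,500
P1,DNA pool,P1,Mangora acalypha,NA,9100,1000
"""


def read_metadata(handle):
    """Parse metadata rows, turning the 'NA' marker into None."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({key: (None if value == "NA" else value)
                     for key, value in row.items()})
    return rows


def members_by_pool(rows):
    """Map each pool_id to the single specimen sample_ids assigned to it."""
    pools = defaultdict(list)
    for row in rows:
        if row["type"] != "DNA pool":  # assumed label, see note above
            pools[row["pool_id"]].append(row["sample_id"])
    return dict(pools)


rows = read_metadata(io.StringIO(EXAMPLE_CSV))
pools = members_by_pool(rows)
```

To use this with the deposited file, replace the `io.StringIO(EXAMPLE_CSV)` handle with `open("metadata.csv", newline="")`.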
File: tax.csv
Description: taxonomic annotation of the quality-filtered zOTUs
Variables
- otu_id: unique identifier of each zero-radius OTU
- perc: similarity to the NCBI database hit
- len: length of the matching sequence between query sequence and NCBI sequence
- phylum, class, subclass, order, superfamily, genus, species: taxonomic annotation of each otu_id from the NCBI database
- missing values identifier: NA
Methods
Individual spiders were collected using beat sheets, sweep nets, and hand sampling between June and July 2021 at a grassland site in Kimmlingen, Rhineland-Palatinate, Germany (49°49'58.4"N 6°36'05.8"E). Immediately after collection, all specimens were separated into individual tubes with pure ethanol and stored at room temperature. In the lab they were morphologically identified to species level, measured for body size (prosoma width), and stored at -20°C. Adult males were excluded from further analysis due to their reduced feeding activity (Pollard et al., 1995), as were specimens with visible damage, to minimize the risk of contamination from external DNA fragments.
DNA was extracted from each spider individually. Prior to DNA extraction, all samples were treated with 0.15% bleach (NaOCl) for 30 minutes according to Greenstone et al. (2012) to remove external DNA contamination. Since entire bodies were used for DNA extraction, they were homogenized for 30 seconds at maximum speed (SPEX 1600MiniG, Metuchen, New Jersey, USA), each with two sterile stainless steel beads in 600µl lysis buffer with 3µl Proteinase K (Invitrogen, Waltham, United States). Cell lysis was then performed at 55°C for 16 hours. Subsequent DNA extraction used the Qiagen Puregene Kit and followed the manufacturer's protocol (Qiagen, Hilden, Germany). GlycoBlue (Invitrogen, Waltham, United States) was added as coprecipitant (1:600) during the DNA precipitation step to visualize DNA and maximize its yield.
To compare the performance of pooling and individual processing of gut content samples, we generated pools of DNA from a diverse set of spider species. We prepared eight pools, each consisting of equal volumes of ten DNA extracts from different spiders. Four pools were species-specific and comprised ten individuals of one species each (either Agelena labyrinthica, Evarcha arcuata, Mangora acalypha or Synema globosum). The other four pools were composed of ten individuals that belonged to different species, but were of approximately similar size (XS = 0.5-1.0mm, S = 1.5-2.0mm, M = 2.5-3.0mm, L = 3.0-3.5mm prosoma width, Supplemental table 1). These DNA pools will be referred to as “Pool” hereafter. Each spider in a pool was also amplified and sequenced individually. After DNA sequencing, the same combinations of samples as before were merged computationally (see Fig. 1B, Supplemental table 1), hence creating an exact individually processed replicate of the pooled sample. These computationally merged pools will be referred to as “Reference Pool” hereafter.
Please note that we chose to pool extracted DNA rather than to pool spiders before DNA isolation. The latter approach would have made a comparison of the recovered prey richness and composition with individually processed spiders impossible. By pooling DNA extracts while still processing the same extracts individually, the effect of pooling on patterns of prey diversity can be compared exactly. The aim of this study, however, is to provide insight into the suitability of extracting DNA from bulk samples instead of individual extractions.
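The computational merging that yields a Reference Pool can be sketched as a sum of per-sample zOTU read counts over the individuals that went into the corresponding DNA pool. This is an illustrative sketch only: the sample IDs, zOTU names, and counts are invented, and it assumes the per-sample counts have already been brought to whatever read depth the comparison requires (the rarefaction procedure itself is not reproduced here).

```python
from collections import Counter


def merge_reference_pool(sample_counts, member_ids):
    """Sum zOTU read counts over the listed single specimen samples."""
    merged = Counter()
    for sample_id in member_ids:
        merged.update(sample_counts[sample_id])  # adds counts per zOTU
    return dict(merged)


# Hypothetical per-sample zOTU read counts.
sample_counts = {
    "A01": {"Zotu1": 40, "Zotu2": 10},
    "A02": {"Zotu2": 5, "Zotu3": 25},
}

reference_pool = merge_reference_pool(sample_counts, ["A01", "A02"])

# Prey richness of the merged pool is the number of zOTUs with reads.
richness = sum(1 for reads in reference_pool.values() if reads > 0)
```

The same richness calculation applied to the sequenced Pool gives the quantity that is compared against the Reference Pool.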
PCRs were performed using the Qiagen Multiplex PCR Kit in 10 µl volumes with 1 µl DNA and 0.5 µl of each 10 µM primer in 5µl Multiplex and 3µl RNAse free water. The PCR amplification was performed in two rounds. The first round consisted of an initial denaturation at 95°C for 15 min, followed by 32 cycles with annealing at 45°C (with additional increments up to 50°C in the gradient PCR) for 90s and extension at 72°C for 90s, omitting final elongation. This PCR used the new primers with 20bp long tails added to the 5’-end, which served as templates for the following indexing PCR. The indexing PCR consisted of 5 cycles of the same protocol as before, but with an annealing temperature of 56°C, to introduce the Illumina TruSeq adapters and dual indices. Amplification success of each PCR step was verified on a 2% agarose gel stained with GelRed. Amplicons were combined into the final library using approximately equal amounts of DNA, as judged by their band intensity on the agarose gel. Final libraries were purified with 1:1 AMPure XP beads (Beckman Coulter, California, USA) and sequenced in multiple runs on an Illumina MiSeq platform with V3 chemistry (300 cycles). To control for contamination, blank extractions and blank PCRs were included in each respective batch and sequenced alongside the experimental samples.
Data analysis
Reads were demultiplexed using CASAVA (Illumina, San Diego, CA, USA), allowing no mismatches in indices. The demultiplexed reads were then merged using PEAR (Zhang et al., 2014) with a minimum overlap of 50bp and a quality threshold of 20. The resulting merged reads were quality-filtered to retain those with at least 90% of bases exceeding Q30, and then converted to FASTA files using the FastX toolkit (Gordon and Hannon, 2010). Valid sequences were selected by retaining only sequences beginning with the forward primer and ending with the reverse primer, allowing for variation only in degenerate sites of the primer sequences. Primer sequences were then trimmed with sed in UNIX.
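The primer-based selection described above, where mismatches are tolerated only at degenerate primer positions, can be illustrated by expanding IUPAC codes into regular-expression character classes. The sketch below is not the sed-based procedure used for the dataset. The primer sequences are short invented placeholders, and it assumes that merged reads are in forward orientation, so that they end with the reverse complement of the reverse primer.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table that also maps degenerate codes to their complements.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_regex(primer):
    """Regex that allows variation only at degenerate primer sites."""
    return "".join(IUPAC[base] for base in primer.upper())


def reverse_complement(seq):
    """Reverse complement of a sequence that may contain IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def trim_primers(read, fwd, rev):
    """Return the insert between the primers, or None if the read is invalid."""
    pattern = (primer_regex(fwd) + "([ACGT]+)"
               + primer_regex(reverse_complement(rev)))
    match = re.fullmatch(pattern, read.upper())
    return match.group(1) if match else None


# Invented placeholder primers, not the primers used in the study.
FWD = "ACGTRC"
REV = "GGWTTC"

insert = trim_primers("ACGTACTTTGGGAAAGAATCC", FWD, REV)
```

Reads with a mismatch at a non-degenerate primer position return None and would be discarded.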
Reads were dereplicated using USEARCH (Edgar, 2010) and the dereplicated sequences were clustered into zero-radius OTUs (zOTUs) using the unoise3 command (Edgar, 2016) with de novo chimera removal. Taxonomic identity was assigned to zOTU sequences using BLASTn (Altschul et al., 1990) against the complete NCBI nucleotide database (downloaded 12/2022), with the top 10 hits retained. A custom Python script (Schoeneberg, 2023) assigned taxonomy from the BLAST output. Sequences with non-arthropod hits among the top ten BLAST hits were excluded from further analyses. For all others, the first hit was used for zOTU annotation. This resulted in an OTU table consisting only of zOTUs belonging to Arthropoda. Annotated zOTUs were then filtered to a minimum percent identity of 90% and a minimum fragment length of 60bp.
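The annotation rules described above can be summarised in a few lines of Python. This is an illustrative re-statement of those rules, not the custom script cited in the text. The input structure (a rank-ordered list of hit dictionaries) and the example hits are assumptions, and the key names `perc`, `len`, and `phylum` are borrowed from the tax.csv column names.

```python
def annotate_zotu(hits, min_identity=90.0, min_length=60):
    """Return the annotation for one zOTU, or None if it is excluded.

    `hits` is a list of dicts in BLAST rank order, each with at least
    the keys 'phylum', 'perc' (percent identity) and 'len' (match length).
    """
    top = hits[:10]
    if not top:
        return None  # no BLAST hit at all
    # Exclude the zOTU if any of the top ten hits is non-arthropod.
    if any(hit["phylum"] != "Arthropoda" for hit in top):
        return None
    best = top[0]  # the first hit provides the annotation
    if best["perc"] < min_identity or best["len"] < min_length:
        return None
    return best


# Hypothetical BLAST hits for three zOTUs.
hits_ok = [
    {"phylum": "Arthropoda", "order": "Diptera", "perc": 98.5, "len": 300},
    {"phylum": "Arthropoda", "order": "Diptera", "perc": 97.0, "len": 300},
]
hits_mixed = [
    {"phylum": "Arthropoda", "order": "Diptera", "perc": 99.0, "len": 300},
    {"phylum": "Chordata", "order": "Primates", "perc": 95.0, "len": 280},
]
hits_low_identity = [
    {"phylum": "Arthropoda", "order": "Hemiptera", "perc": 85.0, "len": 300},
]

annotation = annotate_zotu(hits_ok)
```

Removing Araneae zOTUs to obtain prey reads, as described for the prey_reads variable, would be a further filter on the returned annotation.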