
Money spider dietary choice in pre- and post-harvest cereal crops using metabarcoding

Cite this dataset

Cuff, Jordan (2020). Money spider dietary choice in pre- and post-harvest cereal crops using metabarcoding [Dataset]. Dryad.


  1. Money spiders (Linyphiidae) are an important component of conservation biological control in cereal crops, but they rely on alternative prey when pests are not abundant, such as between cropping cycles. To optimally benefit from these generalist predators, prey choice dynamics must first be understood.
  2. Money spiders and their locally available prey were collected from cereal crops two weeks pre- and post-harvest. Spider gut DNA was amplified with two novel metabarcoding primer pairs designed for spider dietary analysis, and sequenced.
  3. The combined general and spider-exclusion primers successfully identified prey from 15 families in the guts of the 46 linyphiid spiders screened, whilst avoiding amplification of Erigone spp. The primers show promise for application to the diets of other spider families such as Agelenidae and Pholcidae.
  4. Distinct invertebrate communities were identified pre- and post-harvest, and changes in spider diet and, to a lesser extent, prey choice reflected this. Spiders were found to consume one another more than expected, indicating their propensity toward intraguild predation, but also consumed common pest families.
  5. Changes in spider prey choice may redress prey community changes to maintain a consistent dietary intake. Consistent provision of alternative prey via permanent refugia should be considered to sustain effective conservation biocontrol.


Primer development and testing

Existing PCR primers were tested and ultimately redesigned to better match the target taxa of this study. Two novel primer pairs were used for amplification of DNA for the dietary analysis of spider gut contents to overcome the problems associated with the taxonomic proximity of spiders and their prey (particularly other spider species). Novel PCR primers were adapted for the exclusion of all spider DNA, with a focus on linyphiids (henceforth spider exclusion primers, titled TelperionF-LaurelinR), based upon a primer site slightly 3’ of the general animal barcoding primers LCO1490 (forward primer Folmer et al. 1994), and mICOIintR (Leray et al. 2013). A second primer pair was employed for broad amplification of both spiders and their prey (henceforth general primers, titled BerenF-LuthienR), based upon mICOIintF (Leray et al. 2013) and HCO2198 (Folmer et al. 1994). Both primer pairs were adapted via base changes designed with reference to mass alignments of invertebrate COI sequences and tested in silico and in vitro. The spider-exclusion primers were designed to overcome the loss of reads to predator DNA, whilst the general primers were designed to avoid the taxonomic biases associated with the exclusion primers.

Mass-alignments of COI sequences were batch-downloaded from GenBank (NCBI) and BOLD (Ratnasingham and Hebert 2007) using PrimerMiner (Elbrecht and Leese 2016) in R v.3.3.4 (R Core Team 2020) to aid visual inspection of existing and novel primer sites. PrimerMiner clusters batch downloads into operational taxonomic units (OTUs) based on sequence similarity and visualises mass alignments of sequence data for primer design. By merging overrepresented and duplicate sequences through taxonomy-independent clustering, PrimerMiner accounts for within-species variation and cryptic species whilst ignoring rare haplotypes (Elbrecht and Leese 2016). Sequences were downloaded for all terrestrial invertebrate orders available, and consensus sequences were created by clustering these into OTUs for each order. The COI sequences were trimmed to include only the Folmer region (Folmer et al. 1994) using Geneious R10 (Kearse et al. 2012) for subsequent use in PrimerMiner. Alignments of prey sequences created via PrimerMiner included cereal crop spiders, in order to find primer sites conserved between a wide range of potential prey, but different for spiders. Where these sites were 100-400 base pairs apart on sequences from one another or from existing primer sites, they were paired, and primers designed (Table 1, Figures S2-S3). Existing general invertebrate primer sites were compared against the PrimerMiner alignments to identify any potential improvements to the primers for the amplification of cereal spider prey. The coverage of primers (% amplified) was determined via PrimerMiner using the same mass alignments used for primer design. PrimerMiner uses a taxonomy-independent database and accounts for adjacent base mismatches and the position of each base in the primer. Primers were also analysed using the online ThermoFisher Scientific Multiple Primer Analyzer tool. After the primers were deemed successful in silico, they were tested in vitro.
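The pairing rule described above (candidate sites 100-400 base pairs apart) can be illustrated with a minimal sketch. The site coordinates are hypothetical, and measuring the spacing from site start to site start is an assumption made for illustration; the actual selection was done by visual inspection of the PrimerMiner alignments.

```python
from itertools import combinations

def pair_sites(site_positions, min_gap=100, max_gap=400):
    """Return (upstream, downstream) site pairs whose spacing lies within the range."""
    pairs = []
    for upstream, downstream in combinations(sorted(site_positions), 2):
        if min_gap <= downstream - upstream <= max_gap:
            pairs.append((upstream, downstream))
    return pairs

# Hypothetical alignment coordinates of conserved candidate primer sites
candidate_sites = [20, 150, 330, 640]
site_pairs = pair_sites(candidate_sites)
print(site_pairs)
```

Pairs spaced more widely than 400 bp, such as the sites at 20 and 640 in this example, are not returned.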

The primer pairs were tested in vitro against a wide range of extracted invertebrate DNA including spiders, common spider prey and additional invertebrates. For this, invertebrate samples included those collected from the field site at Burdons Farm, Wenvoe, South Wales, the study site used for subsequent ecological analysis. Invertebrates were found via manual searching, collected via aspirator and placed in microcentrifuge tubes of 100% ethanol. These were identified at 20-50X magnification using a light stereomicroscope and taxonomic keys (Goulet and Huber 1993; Roberts 1993; Unwin 2001; Ball 2008; Barber 2008; Duff 2012; Dallimore and Shaw 2013). Additional invertebrates and DNA were taken from existing archived collections within Cardiff University. Extraction of DNA used DNeasy Blood & Tissue Kits (QIAGEN Inc., Chatsworth, CA, USA) following the manufacturer’s protocol for animal tissue. For predatory invertebrates, DNA was extracted from the lower legs, excluding the femur, to avoid the inclusion of prey DNA in the gut diverticula and leg coxae (Macías-Hernández et al. 2018). To verify successful extraction, the DNA and negative controls were amplified via PCR using the universal invertebrate primers LCO1490 and HCO2198 (Folmer et al. 1994) and the Qiagen PCR Multiplex Kit (Qiagen): 95 °C for 15 minutes to activate the HotStarTaq® DNA polymerase, 35 cycles of 95 °C for 30 seconds, 40 °C for 90 seconds and 72 °C for 90 seconds, followed by a final extension at 72 °C for 10 minutes. PCR reactions comprised 25 µl reaction volumes containing 12.5 µl Qiagen PCR Multiplex kit, 0.2 µM final concentration (0.5 µl of 10 µM stock) of each primer, 6.5 µl DNase-free water and 5 µl template DNA. Amplification was confirmed by gel electrophoresis.
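As a quick check of the reaction mix above, the stated volumes can be summed and the final primer concentration derived from the dilution relation C1 × V1 = C2 × V2. This is only an arithmetic sketch; the component labels are descriptive names chosen here, not identifiers from the dataset.

```python
def final_concentration_um(stock_um, stock_volume_ul, total_volume_ul):
    """Solve C1 * V1 = C2 * V2 for the final concentration C2 (in µM)."""
    return stock_um * stock_volume_ul / total_volume_ul

# Volumes (µl) as stated in the text for one reaction
components_ul = {
    "Qiagen PCR Multiplex kit": 12.5,
    "forward primer (10 µM stock)": 0.5,
    "reverse primer (10 µM stock)": 0.5,
    "DNase-free water": 6.5,
    "template DNA": 5.0,
}

total_ul = sum(components_ul.values())
primer_final_um = final_concentration_um(10.0, 0.5, total_ul)
print(total_ul, primer_final_um)
```

The components sum to the 25 µl reaction volume, and 0.5 µl of 10 µM stock in 25 µl corresponds to a final concentration of 0.2 µM per primer.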

Primers were initially tested against a small selection of spider and non-spider DNA. Temperature-gradient PCRs were used to determine the optimal annealing temperatures for the primer pairs selected, with temperatures between 40 °C and 60 °C considered and initial tests starting 5 °C lower than the mean melting temperature of both primers. Inclusion of the Q reagent supplied with Multiplex Kits was trialled for each pair to ascertain whether this could improve performance, but was ultimately excluded in all cases. PCR conditions were: 95 °C for 15 minutes to activate the HotStarTaq® DNA polymerase, 35 cycles of 95 °C, the annealing temperature and 72 °C for 30, 90 and 90 seconds, respectively, and a final extension at 72 °C for 10 minutes. PCR reactions comprised 25 µl reaction volumes containing 12.5 µl Qiagen PCR Multiplex kit, 0.2 µM final concentration (0.5 µl of 10 µM stock) of each primer, 6.5 µl DNase-free water and 5 µl template DNA. Successful amplification was confirmed by gel electrophoresis. Once optimised either to amplify a range of non-spider species whilst amplifying few spiders, or to amplify a broad range of all species included, primers were further tested on a broader range of DNA (Table S3).

The TelperionF–LaurelinR primer pair has well-conserved sites, facilitating broad coverage with few degenerate bases necessary. The terminal base at the 3’ end of Laurelin, being a thymine base, critically mismatches with the guanine base present for most spider taxa tested; this should theoretically prevent or at least severely reduce amplification of spiders with little cost to amplification breadth otherwise. The BerenF–LuthienR pair similarly makes use of conserved primer sites employed in other studies but adapted for universal amplification of the focal taxa of this study.

Field collection and identification

Linyphiids were visually located on transects through two adjacent spring barley fields at Burdons Farm, Wenvoe in South Wales (51°26'24.8"N, 3°16'17.9"W), and collected from occupied webs and the ground, in August and September 2017. Each transect comprised 4 m² searching areas at least 10 m apart and all available linyphiids were collected. Spiders were taken from 20 locations along the aforementioned transects, 10 pre-harvest and 10 post-harvest. Spiders were collected two weeks prior (7-13th August) to harvest (~20th August) of the crop and two weeks after harvest (4-8th September) in crop stubble and placed in 100% ethanol using an aspirator. Ground-active linyphiid spiders were collected when webs were not abundant. Spiders were taken to Cardiff University, transferred to fresh 100% ethanol, adults identified to species-level and juveniles to genus, and stored at -20 °C until subsequent DNA extraction.

Invertebrate prey communities were collected using a converted McCulloch GBV 325 G-vac leaf blower suction sampler for 1 minute over 4 m² areas near to those from which spiders for DNA analysis were collected. Samples were taken in transects, with 10 samples each pre- and post-harvest (20 total), split evenly between two adjacent fields. Invertebrate prey community samples were taken approximately 10 m apart, in sites near to those from which spiders were collected, with different sites used pre- and post-harvest. Invertebrates were killed with ethyl acetate and stored in 70% ethanol at -20 °C, as these samples were for measurement of the invertebrate community and not for molecular analysis. All invertebrates were identified to family level under an Olympus SZX7 stereomicroscope using morphological keys, except for springtails of Sminthuroidea (Sminthuridae and Bourletiellidae, which were often indistinguishable following vacuum sampling and preservation due to the fine features necessary to distinguish them) which were left at super-family and mites (many of which were immature or in poor condition), which were identified to order level.

Erigone atra and E. dentipalpis (Erigoninae), and Tenuiphantes tenuis (Blackwall, 1852; Linyphiinae) were the focus of our study, although a few juveniles were included from other genera due to the difficulties associated with morphological identification of linyphiid juveniles; these misidentifications were confirmed in the subsequent metabarcoding. In total, 66 spiders were screened (Table S2), unevenly split across the 20 corresponding prey sampling sites. Spiders were washed in and transferred to fresh 100 % ethanol to reduce external contaminants prior to identification using Roberts (1993) morphological key. Abdomens were removed from spiders and washed again in fresh 100 % ethanol. Only abdomens were used for molecular analysis of their gut contents given their higher concentration of prey DNA than that of the cephalothorax (Krehenwinkel et al. 2016; Macías-Hernández et al. 2018). To ascertain optimal extraction technique, samples were split into two groups. From one group, DNA was extracted from the abdomens via Qiagen TissueLyser II as per the DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany) manufacturer’s protocol and abdomens kept in the lysis buffer during incubation. From the other group, DNA was extracted by splitting the abdomen with a sterile micropestle, swilling it in the lysis buffer and then removing the bulk tissue. Neither method ultimately afforded a significantly greater proportion of prey DNA reads post-amplification (Figure S1), so the two groups were combined for analysis. Post-lysis, all extractions followed the DNeasy Blood & Tissue Kit (Qiagen) manufacturer’s protocol but with an extended lysis time of 12 h (recommended: 1-3 h) to account for the complex and branched gut system in spider abdomens (Krehenwinkel et al. 2016). Per 12 spiders, each DNA extraction session included at least one negative control consisting of an empty tube treated identically to the samples.

Primers were labelled with unique 10 bp molecular identifier tags (MID-tags) and samples had a unique pairing of forward and reverse tags for identification of each sample post-sequencing. PCR reactions of 25 µl reaction volumes contained 12.5 µl Qiagen PCR Multiplex kit, 0.2 µM final concentration (2.5 µl of 2 µM stock) of each primer, 2.5 µl DNase-free water and 5 µl template DNA. Reactions were carried out in the same Veriti Thermal Cycler (ThermoFisher Scientific, Waltham, USA), with annealing temperatures optimised via temperature gradient PCRs in the same machine. PCRs comprised 15 min at 95 °C, followed by 35 cycles of: 95 °C for 30 seconds, the primer-specific annealing temperature for 90 seconds, and 72 °C for 90 seconds; followed by a final extension at 72 °C for 10 minutes. The new primers, designated BerenF-LuthienR (universal) and TelperionF-LaurelinR (spider-excluding), used annealing temperatures of 52 °C and 42 °C, respectively.
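The unique forward/reverse tag pairing described above can be sketched as follows. The tag and sample names are hypothetical placeholders (the real MID-tags are 10 bp sequences), and the assignment order is an arbitrary choice for illustration rather than the scheme used in the study.

```python
from itertools import product

def assign_tag_pairs(sample_ids, forward_tags, reverse_tags):
    """Give each sample its own (forward, reverse) tag combination."""
    combos = list(product(forward_tags, reverse_tags))
    if len(sample_ids) > len(combos):
        raise ValueError("not enough tag combinations for the samples")
    return dict(zip(sample_ids, combos))

# Hypothetical tag labels and sample names
forward_tags = ["F1", "F2", "F3"]
reverse_tags = ["R1", "R2"]
samples = ["spider_01", "spider_02", "spider_03", "spider_04"]

tag_map = assign_tag_pairs(samples, forward_tags, reverse_tags)
# Combinations left unassigned can later be checked for stray reads
unused_combos = set(product(forward_tags, reverse_tags)) - set(tag_map.values())
print(tag_map)
print(unused_combos)
```

Because every sample carries a distinct combination, reads can be attributed to a sample after sequencing, and any reads appearing under an unused combination point to tag-jumping or contamination.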

Within each PCR 96-well plate, 12 negative (extraction and PCR) and two positive controls were included following Taberlet et al. (2018). Negative PCR controls consisted of DNase-free water. Positive controls comprised known-concentration mixtures of the invertebrate DNA used for primer testing, detailed above, quantified using Qubit dsDNA High-sensitivity Assay Kits (ThermoFisher Scientific) to ascertain any effects of primer bias. All concentrations were standardised at 0.1 ng µl⁻¹ by diluting the DNA in DNase-free water. Five mixtures of different species richness and proportions were prepared (Table S1). A negative control was present for each MID-tag to identify any contamination of primers. Each plate was pooled according to concentrations determined by Qiaxcel Advanced System (Qiagen). Each pool was cleaned via SPRIselect beads (Beckman Coulter, Brea, USA), with a left-side size selection using a 1:1 ratio (retaining ~300-1000 bp fragments). The concentration of the pooled DNA was determined via Qubit dsDNA High-sensitivity Assay Kits, quality-checked via TapeStation 2200 (Agilent, Santa Clara, USA) and all pools sharing the same primer pair were pooled again into a ‘super pool’, thus forming one pool per primer pair. Library preparation for Illumina sequencing was carried out on these cleaned ‘super pools’ via NEXTflex Rapid DNA-Seq Kit (Bioo Scientific, Austin, USA) and samples were sequenced on an Illumina MiSeq via a Nano chip with 2x250 bp paired-end reads (expected capacity ≤1,000,000 reads).

Bioinformatic analysis

The Illumina run generated 405,270 and 482,249 reads using BerenF-LuthienR (universal) and TelperionF-LaurelinR (spider-excluding), respectively. All reads were quality-checked and trimmed in Trimmomatic v0.38 (Bolger et al. 2014) with a minimum quality score and sliding window of 20 and 4 bp, respectively, and a minimum length of 135 bp. The read pairs were aligned via FLASH v1.2.11 (Magoč and Salzberg 2011) and demultiplexed via Mothur v1.39.5 (Schloss et al. 2009), removing the MID and primer sequences. Replicates were removed, and denoising and clustering to zero-radius OTUs (ZOTUs; clustered without % identity to avoid multiple species represented within a single OTU) completed via Unoise3 in Usearch11 (Edgar 2010). The resultant sequences were assigned a taxonomic identity from GenBank via BLASTn v2.7.1. (Camacho et al. 2009) using a 97% identity threshold (Alberdi et al. 2017). The BLAST output was analysed in MEGAN v6.15.2 (Huson et al. 2016). Where the top BLAST hit, determined by lowest e-value, was resolved at a higher taxonomic level than species-level, the results were checked by blasting the sequence manually in GenBank and comparing the results; where possibly erroneous entries were preventing species-level assignment (e.g. poorly-resolved identifications on GenBank), finer resolution was considered. Where ZOTUs were assigned the same taxon, these were aggregated. Given the prevalence of family-level assignments (e.g. Chloropidae), the data were eventually converted to family-level, but were retained at their respective output assignments for clean-up.
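To illustrate the trimming parameters above (a 4 bp sliding window, a minimum quality of 20 and a minimum length of 135 bp), the following is a simplified sketch of sliding-window trimming. It is not Trimmomatic's implementation: it assumes the read is cut at the start of the first window whose mean quality falls below the threshold, and the quality values are invented for the example.

```python
def sliding_window_trim(qualities, window=4, min_quality=20):
    """Return how many leading bases are kept before the first failing window."""
    for start in range(0, len(qualities) - window + 1):
        window_scores = qualities[start:start + window]
        if sum(window_scores) / window < min_quality:
            return start
    return len(qualities)

def passes_length_filter(kept_length, min_length=135):
    """Reads shorter than the minimum length after trimming are discarded."""
    return kept_length >= min_length

# Hypothetical per-base Phred scores for two 150 bp reads
good_read = [35] * 150
bad_tail = [35] * 100 + [10] * 50

kept_good = sliding_window_trim(good_read)
kept_bad = sliding_window_trim(bad_tail)
print(kept_good, passes_length_filter(kept_good))
print(kept_bad, passes_length_filter(kept_bad))
```

Under these assumptions the uniformly high-quality read is retained in full, whereas the read with a degraded tail is cut short of 135 bp and would be dropped.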

To clean data prior to statistical analysis, all read counts less than the maximum read count present in blanks (negative controls and unused MID-tag combinations) for its respective ZOTU were removed. Instances of non-positive control taxa present in positive controls were calculated as a percentage of the maximum read count for that taxon. The greatest of these percentages was used to guide a universal percentage of the maximum read within each taxon to be removed. This accounts for tag-jumping and “bleeding” of over-represented taxa into other samples during sequencing. For BerenF-LuthienR, 0.54 % was optimal, whilst there were no obvious instances for TelperionF-LaurelinR, so the conservative 0.54 % was also applied for that library.
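The two thresholds described above can be sketched as below, assuming read counts are held as a nested mapping of ZOTU to sample to count. The function names, sample names and counts are hypothetical, and the comparisons follow the wording of the text ("less than"); the shipped R script may implement the steps differently.

```python
def remove_below_blank_max(counts, blank_samples):
    """Zero sample counts that are less than the ZOTU's maximum count in any blank."""
    cleaned = {}
    for zotu, by_sample in counts.items():
        blank_max = max((by_sample.get(b, 0) for b in blank_samples), default=0)
        cleaned[zotu] = {
            sample: (n if n >= blank_max else 0)
            for sample, n in by_sample.items()
            if sample not in blank_samples
        }
    return cleaned

def remove_below_taxon_max_fraction(counts, fraction):
    """Zero counts that are less than a fixed fraction of each taxon's maximum count."""
    cleaned = {}
    for taxon, by_sample in counts.items():
        cutoff = fraction * max(by_sample.values(), default=0)
        cleaned[taxon] = {s: (n if n >= cutoff else 0) for s, n in by_sample.items()}
    return cleaned

# Hypothetical read counts, including one blank
reads = {
    "ZOTU_a": {"spider_01": 2000, "spider_02": 8, "spider_03": 40, "blank_01": 12},
    "ZOTU_b": {"spider_01": 0, "spider_02": 500, "spider_03": 2, "blank_01": 0},
}

step1 = remove_below_blank_max(reads, blank_samples={"blank_01"})
step2 = remove_below_taxon_max_fraction(step1, fraction=0.0054)  # 0.54 %
print(step2)
```

In this example the count of 8 falls below the blank maximum of 12, and the count of 2 falls below 0.54 % of its taxon's maximum of 500, so both are set to zero.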

Simultaneously, known lab contaminants (e.g. German cockroach Blattella germanica (L.) and various species for which molecular analysis was recently undertaken that could be differentiated from the target taxa in this study, such as tropical species) were identified and the percentage of these occurrences of the total read count for their respective samples was calculated. The highest of these percentages was used as a guide for the universal percentage of each total sample read count to be removed. This accounts for environmental and lab contamination, and artefacts and errors of the sequencing process, which for BerenF-LuthienR and TelperionF-LaurelinR were 0.43 % and 0.45 %, respectively. The data from the two libraries were then aggregated together, first removing non-target taxa (e.g. fungi) and instances in which predator DNA was amplified (i.e. ZOTUs matching the individual spider’s morphological identity). All taxa were converted to family-level to standardise the taxonomic level since many ZOTUs could not be resolved further; this also increases evenness for subsequent analyses. Whilst all conspecific reads were removed to account for predator amplification, interspecific linyphiid interactions were still retained, thus any counts of linyphiids in the diet exclusively represent the consumption of other species. Finally, read counts were converted to presence-absence data.
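The per-sample threshold, family-level conversion and presence-absence step can be sketched as below. The data layout (taxon to sample to count), the taxon and family labels and the counts are hypothetical; removal of non-target taxa and conspecific predator reads is omitted for brevity.

```python
def remove_below_sample_fraction(counts, fraction):
    """Zero counts that are less than a fixed fraction of each sample's total reads."""
    totals = {}
    for by_sample in counts.values():
        for sample, n in by_sample.items():
            totals[sample] = totals.get(sample, 0) + n
    return {
        taxon: {s: (n if n >= fraction * totals[s] else 0) for s, n in by_sample.items()}
        for taxon, by_sample in counts.items()
    }

def to_family_presence(counts, family_of):
    """Merge taxa into families and record 1 if any member taxon has reads, else 0."""
    presence = {}
    for taxon, by_sample in counts.items():
        family = presence.setdefault(family_of[taxon], {})
        for sample, n in by_sample.items():
            family[sample] = 1 if (n > 0 or family.get(sample, 0) == 1) else 0
    return presence

# Hypothetical read counts and taxon-to-family lookup
reads = {
    "taxon_1": {"spider_01": 900, "spider_02": 3},
    "taxon_2": {"spider_01": 100, "spider_02": 0},
    "taxon_3": {"spider_01": 2, "spider_02": 997},
}
family_of = {"taxon_1": "Family_A", "taxon_2": "Family_A", "taxon_3": "Family_B"}

filtered = remove_below_sample_fraction(reads, fraction=0.0043)  # 0.43 %
diet = to_family_presence(filtered, family_of)
print(diet)
```

Here the low counts (3 and 2 reads) fall below 0.43 % of their sample totals and are removed, so each spider is recorded as having consumed only one of the two families.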

Usage notes

TelperionFLaurelinRflushcrush.csv gives the percentage of each sample comprised of prey reads for the two extraction methods employed: "crushing", which involved bead beating of spider abdomens and retention of all tissue for lysis, and "flushing", which involved splitting the abdomens, swishing them around the tube and removing them prior to lysis.

BerenFLuthienRflushcrush.csv gives the percentage of each sample comprised of prey reads for the two extraction methods employed: "crushing", which involved bead beating of spider abdomens and retention of all tissue for lysis, and "flushing", which involved splitting the abdomens, swishing them around the tube and removing them prior to lysis.

InvertData.csv presents the invertebrate community data of those invertebrates collected by vacuum sampling. Abundances are absolute. The ENNR code corresponds to the grouping for the EcoNullNetR analyses, based upon pre- and post-harvest and the two adjacent fields sampled.

DietaryData.csv presents the data derived from HTS of spider gut contents as presence/absence values. As with the InvertData.csv, the ENNR code corresponds to the grouping for the EcoNullNetR analyses, based upon pre- and post-harvest and the two adjacent fields sampled.

ENNRDietData.csv gives the dietary data presented in a format compatible with EcoNullNetR.

ENNRInvertData.csv gives the invertebrate community data presented in a format compatible with EcoNullNetR.

ENNRDiet.fl.csv gives the allowed links for the EcoNullNetR analysis.

SpiderDietSummary.xlsx gives the dietary data summarised as percentages for each prey taxon and grouping.

PrimerMinerInSilicoResults.csv gives the percentage of each taxonomic database that was deemed amplifiable by each primer pair (with their differences indicating bias).

Mock Community Mix 1-5.csv are five files, each denoting the proportion of reads for each taxon with the two primer pairs, and the expected proportion given the proportions used in the mixes.

HarvestMoneySpiderDiet.R is the R script used to generate these data.

README.csv contains these descriptions.


Biotechnology and Biological Sciences Research Council, Award: BB/M009122/1