Targeted Next Generation Sequencing data for European Green Crab

Published Nov 02, 2021 on Dryad. https://doi.org/10.5061/dryad.pvmcvdnn0

Abstract

In the northeast Pacific Ocean there is high interest in developing eDNA-based survey methods to aid management of invasive populations of European green crab (Carcinus maenas). Expected benefits are improved sensitivity for early detection of secondary spread and to assess the outcome of eradication efforts. A new eDNA-based approach we term ‘Targeted Next Generation Sequencing (tNGS)’ is introduced here and shown to improve detection relative to qPCR at sites with lower green crab CPUE values measured by trapping. DNA standards (gBlock) with starting molecule copies that were 10- to 100- times lower than the qPCR limit of detection returned significant numbers of sequencing reads, which in our field assessments translated to a 7% - 10% increase in detection probability from tNGS relative to qPCR at sites with lower CPUE. We also found the number of sequencing reads from tNGS was significantly correlated with green crab CPUE whereas Ct values from qPCR were not. When sources of variation were partitioned for each assay, we found the difference between mean within-site and mean between-site variation was much larger and had non-overlapping confidence intervals for tNGS relative to qPCR, suggesting the former may offer more power for detecting spatial variation in eDNA availability. Results presented here indicate this approach is suitable for species of known low abundances where a positive detection has high economic or environmental consequences. Any species with an existing qPCR assay can be easily tested with a tNGS assay. We conclude with a discussion on the fit for purpose applications of tNGS vs. qPCR on how to best apply molecular surveys in management programs.

Methods:

qPCR:

Quantitative PCR was performed for replicate samples by Roux et al. (2020) and was performed here for site and zooplankton samples following the same protocol exactly and in the same lab. The assay targets a 100 bp fragment of the mitochondrial cytochrome oxidase sub-unit I gene region (Roux et al., 2020). The TaqMan™ MGB probe (Applied Biosystems®) assay successfully underwent optimization and specificity testing (Roux et al., 2020). The assay LOD at 50% of replicates (Biggs et al., 2015; Harper et al., 2018) was Ct = 38 and LOQ at 100% replicates was Ct = 35 (Roux et al., 2020). However, qPCR results from eDNA samples in the current study will be used strictly in a qualitative manner for relative comparisons, therefore, we adhere the recommendations of Klymus et al. (2020) to use eDNA qPCR Ct values up to 40 (the number of PCR cycles performed) within this context. Triplicate qPCR reactions were performed for each eDNA and zooplankton sample as described in the data sets section (see 2.3.1). Each reaction plate included a triplicate artificial positive gBlock control in a 10-fold dilution series from 1 x 10-4 to 1 x 10-11 gene copies/uL. Results from plates with a positive control coefficient of determination (R2) > 0.98 were used. PCR inhibition was tested in eDNA and zooplankton samples by spiking gBlock in an eDNA sample that had tested negative to a final concentration of 2.17 x 10-5 copies/uL. There were no shifts in Ct values above 3 cycles (Hinlo et al., 2017) therefore inhibition was not detected.

In cases where only one qPCR technical replicate returned a Ct value, that value was used to represent the whole sample; in cases where more than one qPCR replicate returned a Ct value, these were averaged to represent the whole sample. A sample was labelled non-detect if all technical replicates did not to return a Ct value. Two alternative methods to deal with non-detects were compared in the modelling analyses described below: (1) remove them from the data set; or (2) assign them a Ct value of 40 (the number of PCR cycles). Since there were no differences in model output conclusions from both methods, the most parsimonious method of full removal was used for final analyses.

Targeted Next Generation Sequencing:

Sequencing was performed by amplifying in a single-step PCR using the same species-specific primers as used for qPCR, but nested within longer fusion primers that also included the Illumina sequencing tags (Supplementary Table S2). Single-step PCR eliminates product mass variation due to ligation efficiencies, bead recovery, and other steps involved in Illumina library preparation and normalization. Fusion primer design and pairing scheme roughly followed that from Elbrecht & Leese (2015) with an additional pentamer barcode to aid in demultiplexing (Supplementary Table S2). All eDNA samples and blanks were assigned a unique dual index. High positional diversity was ensured by: (1) 15% PhiX spike; (2) shifting the base composition of any given position by adding zero to four bases to the primers; and (3) simultaneously sequencing forward and reverse primers (Elbrecht & Leese, 2015) (Supplementary Table S2). PCR reactions were performed in triplicate with 3 µL template in a 25 µL reaction with final concentrations: 1µM each primer, 2mM dNTP, 1 mg/mL BSA, 2.5mM MgCl2, 1X buffer, and 2U Amplitaq Gold. Cycling conditions were 95ºC for 10 min followed by 40 cycles of 95ºC for 10 sec, 60ºC for 1 min. PCR replicates of all water and zooplankton samples, blanks, and standard curve concentrations (see next paragraph) were pooled at equal volumes into two sequencing pools and cleaned twice with SPRI beads at 0.9x ratio. Pooling equal volumes of PCR reactions forms the premise of the assay that NGS read numbers will reflect starting eDNA concentrations. Libraries were quantified using a SYBR-Green fluorometric assay read on the Tecan Infinite Pro (Alam et al., 2017) and 15 pmoles of each library were loaded on two runs of MiSeq v2 300 cycles (150bp paired end).

Raw fastq files were trimmed of adapter sequence, low quality regions (phred33 score < 2), and N’s using Adapter Removal v2 (Schubert et al., 2016). OBITools v 1.2.11 (Boyer et al., 2016) was used to merge paired-end reads (‘illuminapairedend’, minimum alignment score 40). JAMP v0.67 (github.com/VascoElbrecht/JAMP) was used to demultiplex the samples. All samples were screened for green crab reads using BLAST+ with a custom database of all green crab COI sequences available in the NCBI non-redundant nucleotide database (downloaded Feb 27, 2019) (Camacho et al., 2009). All non-target sequences were screened by matching to the PhiX genome used by Illumina (Genbank Accession NC001422) with BLAST+.

Standard curves were included in each run to determine the relationship between starting DNA concentration and the number of NGS reads. The curve was generated using the same gBlock artificial double-stranded DNA construct from Roux et al. (2020) using the same set of serial dilutions but slightly shifted from 1 x 10-6 to 1 x 10-13 copies/uL. These were amplified in triplicate, pooled and added to each MiSeq run. Technical PCR replicates of eDNA samples were produced with the same molecular identification tag and therefore not analyzed separately. The standard curves were analyzed in R (R Core Team, 2018) using linear regression of log-transformed raw sequencing reads and DNA concentrations (lm in stats). The limit of detection and limit of quantification could not be calculated for tNGS in the same way as for qPCR assays because multiple runs were not performed on each sample. However, a minimum number of sequences in a sample, below which sequences are discarded, must be set to remove PCR artifacts and potential contamination. The use of unique dual indexes for all eDNA samples removes the possibility of index hopping. PCR errors and chimeras are filtered out by aligning to known green crab sequences within 99% similarity. There was just a single green crab read in all of the blanks. The minimum number of reads for an eDNA sample to call a positive detection was set at five and we are confident that any eDNA samples with 5 or more reads represents a true eDNA signal and not an artifact. This threshold will remove error due to contamination as informed by blank samples, yet is still low enough to detect the presence of species’ with very low starting eDNA amounts.

qPCR
The excel spreadsheet provided ('qPCR_results.xlsx') contains raw Ct values for the qPCR assay described in Methods for each sample used in the replicate and site data sets. Samples are identified by site, day, and replicate number (for replicate data set).

Targeted Next-generation sequencing:
All raw FASTQ files were trimmed of adapter sequence, low quality regions (phred33 score < 2), and N’s using Adapter Removal v2 (Schubert et al., 2016). OBITools v 1.2.11 (Boyer et al., 2016) was used to merge paired-end reads (‘illuminapairedend’, minimum alignment score 40). JAMP v0.67 (github.com/VascoElbrecht/JAMP) was used to demultiplex the samples. The data files provided are JAMP output files in FASTQ.GZ format. Note: these files contain both green crab and PhiX reads. The following table describes the sample identity and associated filename.

Replicate
STATION   DAY   REP   FILE ROOT   Ext
Mayne Bay   1   1   D1A1   .fastq.gz
Mayne Bay   1   2   D1A2   .fastq.gz
Mayne Bay   1   3   D1A3   .fastq.gz
Mayne Bay   1   4   D1A4   .fastq.gz
Mayne Bay   1   5   D1A5   .fastq.gz
Mayne Bay   1   6   D1A6   .fastq.gz
Mayne Bay   2   1   D2A1   .fastq.gz
Mayne Bay   2   2   D2A2   .fastq.gz
Mayne Bay   2   3   D2A3   .fastq.gz
Mayne Bay   2   4   D2A4   .fastq.gz
Mayne Bay   2   5   D2A5   .fastq.gz
Mayne Bay   2   6   D2A6   .fastq.gz
Pipestem Inlet   1   1   D1C1   .fastq.gz
Pipestem Inlet   1   2   D1C2   .fastq.gz
Pipestem Inlet   1   3   D1C3   .fastq.gz
Pipestem Inlet   1   4   D1C4   .fastq.gz
Pipestem Inlet   1   5   D1C5   .fastq.gz
Pipestem Inlet   1   6   D1C6   .fastq.gz
Pipestem Inlet   2   1   D2C1   .fastq.gz
Pipestem Inlet   2   2   D2C2   .fastq.gz
Pipestem Inlet   2   3   D2C3   .fastq.gz
Pipestem Inlet   2   4   D2C4   .fastq.gz
Pipestem Inlet   2   5   D2C5   .fastq.gz
Pipestem Inlet   2   6   D2C6   .fastq.gz
Ritherdon Bay   1   1   D1D1   .fastq.gz
Ritherdon Bay   1   2   D1D2   .fastq.gz
Ritherdon Bay   1   3   D1D3   .fastq.gz
Ritherdon Bay   1   4   D1D4   .fastq.gz
Ritherdon Bay   1   5   D1D5   .fastq.gz
Ritherdon Bay   1   6   D1D6   .fastq.gz
Ritherdon Bay   2   1   D2D1   .fastq.gz
Ritherdon Bay   2   2   D2D2   .fastq.gz
Ritherdon Bay   2   3   D2D3   .fastq.gz
Ritherdon Bay   2   4   D2D4   .fastq.gz
Ritherdon Bay   2   5   D2D5   .fastq.gz
Ritherdon Bay   2   6   D2D6   .fastq.gz
San Mateo Bay   1   1   D1E1   .fastq.gz
San Mateo Bay   1   2   D1E2   .fastq.gz
San Mateo Bay   1   3   D1E3   .fastq.gz
San Mateo Bay   1   4   D1E4   .fastq.gz
San Mateo Bay   1   5   D1E5   .fastq.gz
San Mateo Bay   1   6   D1E6   .fastq.gz
San Mateo Bay   2   1   D2E1   .fastq.gz
San Mateo Bay   2   2   D2E2   .fastq.gz
San Mateo Bay   2   3   D2E3   .fastq.gz
San Mateo Bay   2   4   D2E4   .fastq.gz
San Mateo Bay   2   5   D2E5   .fastq.gz
San Mateo Bay   2   6   D2E6   .fastq.gz

Site
STATION   DAY   FILE ROOT   Ext
Mayne Bay   1   D1A   .fastq.gz
Mayne Bay   2   D2A   .fastq.gz
Hilliers Island   1   D1B   .fastq.gz
Hilliers Island   2   D2B   .fastq.gz
Pipestem Inlet   1   D1C   .fastq.gz
Pipestem Inlet   2   D2C   .fastq.gz
Ritherdon Bay   1   D1D   .fastq.gz
Ritherdon Bay   2   D2D   .fastq.gz
San Mateo Bay   1   D1E   .fastq.gz
San Mateo Bay   2   D2E   .fastq.gz

Targeted Next Generation Sequencing data for European Green Crab

Data files

Abstract

Methods

Usage notes

Works referencing this dataset