Environmental DNA metabarcoding differentiates between micro-habitats within the rocky intertidal
Data files
Mar 14, 2024 version files (3.41 GB total):
- Anacapa_Toolkit_Output.zip (14.89 MB)
- FASTQ_Files.zip (3.39 GB)
- Processed_eDNA_Data.zip (573.53 KB)
- README.md (8.95 KB)
Abstract
While the utility of environmental DNA (eDNA) metabarcoding surveys for biodiversity monitoring continues to be demonstrated, the spatial and temporal variability of eDNA, and thus the limits of the differentiability of an eDNA signal, remains under-characterized. In this study, we collected eDNA samples from distinct micro-habitats (~40 m apart) in a rocky intertidal ecosystem over their exposure period in a tidal cycle. During this period, the micro-habitats transitioned from being interconnected, to physically isolated, to interconnected again. Using a well-established eukaryotic (cytochrome oxidase subunit I) metabarcoding assay, we detected 415 species across 28 phyla. Across a variety of univariate and multivariate analyses, using exclusively taxonomically assigned data as well as all detected amplicon sequence variants (ASVs), we identified unique eDNA signals from the different micro-habitats sampled. This difference paralleled expected ecological gradients and increased as the sites became more physically disconnected. Our results demonstrate that eDNA biomonitoring can differentiate micro-habitats in the rocky intertidal only 40 m apart, that these differences reflect known ecology in the area, and that physical connectivity informs the degree of differentiation possible. These findings showcase the potential power of eDNA biomonitoring to increase the spatial and temporal resolution of marine biodiversity data, aiding research, conservation, and management efforts.
This repository includes the necessary data to process marine eDNA metabarcoding samples collected from Pillar Point, Half Moon Bay, CA, USA on 28 January 2022.
Description of Data, Code, & File Structure
FASTQ_Files.zip
This archive contains compressed FASTQ sequencing files generated from an Illumina MiSeq PE 2x250bp (500 cycles) sequencing run. More details on sequencing methods can be found in the “Methods” section on Dryad and the manuscript. The zip file also contains FASTA files of the forward and reverse COI primers used. These FASTA and FASTQ files are formatted for use with the modified Anacapa Container described below.
Anacapa_Toolkit_Output.zip
This archive contains the unprocessed output from a run of the modified Anacapa Container on 31 January 2023 on the FASTQ files contained in FASTQ_Files.zip.
In the archive are three folders:
CO1
This folder contains the primary Anacapa Container output. The top level contains two summary .txt files.
- CO1_ASV_taxonomy_brief.txt: The rows are each ASV generated through the Anacapa Container, and columns are each eDNA sample. The numbers reflect the number of sequence reads of a given ASV in a given sample. The final columns contain the taxonomic classification of the given ASV, if any, the confidence level for each taxonomic assignment, and accession numbers.
- CO1_ASV_taxonomy_detailed.txt: This contains the same information as CO1_ASV_taxonomy_brief.txt, with some additional columns. “sequence”, “sequencesF”, and “sequencesR” include the sequence information for the given ASV. Several additional columns are added automatically by the Anacapa Container but are not used in any subsequent analysis or interpretation: “forward_CO1_seq_number”, “merged_CO1_seq_number”, “reverse_CO1_seq_number”, “single_or_multiple_hit”, “end_to_end_or_local”, “max_percent_id”, and “input_sequence_length”.
The “Summary_by_percent_confidence” folder contains inner folders (40, 50, 60, 70, 80, 90, 95, 100). Each inner folder retains only those taxonomic assignments from the Anacapa Container output that had a BLCA bootstrap confidence score of FOLDER-NAME or higher. Each folder contains two .txt files, identically formatted across folders.
- CO1_ASV_raw_taxonomy_FOLDER-NAME.txt: The rows are each ASV generated through the Anacapa Container, and columns are each eDNA sample. The numbers reflect the number of sequence reads of a given ASV in a given sample. The final columns contain the taxonomic classification of the given ASV, if any, the confidence level for each taxonomic assignment, and accession numbers. This is an identical format to CO1_ASV_taxonomy_brief.txt above.
- CO1_ASV_sum_by_taxonomy_FOLDER-NAME.txt: The rows are each unique taxonomic assignment, summed across all ASVs from CO1_ASV_raw_taxonomy_FOLDER-NAME.txt with the same assignment, and columns are each eDNA sample. The numbers reflect the number of sequence reads of a given taxonomic assignment in a given sample.
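The relationship between the two files in each folder can be sketched in Python with pandas, assuming a table with one row per ASV, one read-count column per sample, and a single taxonomy column. The column names (`ASV`, `sampleA`, `sampleB`, `taxonomy`) and the toy values below are hypothetical; the real Anacapa files use their own headers and also carry confidence and accession columns.

```python
import pandas as pd


def sum_by_taxonomy(raw: pd.DataFrame, sample_cols: list, tax_col: str) -> pd.DataFrame:
    """Collapse ASV rows that share an identical taxonomy string by summing reads per sample.

    Rows whose taxonomy is missing (NaN) are dropped, because groupby uses dropna=True by default.
    """
    return raw.groupby(tax_col, sort=True)[sample_cols].sum()


# Hypothetical toy table: one row per ASV, one read-count column per sample.
raw = pd.DataFrame({
    "ASV": ["ASV_1", "ASV_2", "ASV_3"],
    "sampleA": [10, 5, 0],
    "sampleB": [0, 3, 7],
    "taxonomy": ["Eukaryota;Mollusca", "Eukaryota;Mollusca", "Eukaryota;Chordata"],
})
summed = sum_by_taxonomy(raw, ["sampleA", "sampleB"], "taxonomy")
print(summed)
```

Here ASV_1 and ASV_2 share an assignment, so their reads are added into a single Mollusca row.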
Metadata-Added
This folder contains four .txt files with the metadata we compiled to integrate with the Anacapa Container output. Most of the information in these files is synthesized into the SampleData.csv file in the Processed_eDNA_Data.zip archive below.
- PillarPoint_Sample_Info_EnviroData.txt: contains the “T” (temperature; degrees Celsius) and “S” (salinity; ppt) measured at each time (PST) and location combination with an Orion Model 1230 meter (Orion Research Inc., Beverly, MA, USA)
- PillarPoint_SampleInfo_FieldSampling.txt: contains the sample number for each sample, its corresponding field time (PST) and location, and the extraction group in which it was processed in the laboratory
- PillarPoint_SampleInfo_Location.txt: contains the GPS coordinates for the three sampling locations
- PillarPoint_SampleInfo_Sequencing.txt: contains the sample name, several descriptive portions of the sample name, whether it is a “sample” or “control”, and the PCR plate in which it was processed in the laboratory
Run_info
This folder contains a variety of scripts, outputs, and logs generated automatically during a run of the modified Anacapa Container. These will likely only be relevant to users who are trying to run the modified Anacapa Container (see below) on their own, and need to troubleshoot why their output is not identical to our output. More details on these files, and the Anacapa Toolkit pipeline, can be found on its original GitHub page.
Processed_eDNA_Data.zip
This archive contains processed versions of the ASVs, taxonomy assignments, and sample information from the Anacapa Toolkit Output. More details on processing can be found in the “Methods” section on Dryad and in the manuscript, but in short, we used only taxonomic assignments from the Anacapa Toolkit output (above) that had a bootstrap confidence cutoff score of 60 or higher in BLCA. Then, we removed singletons; we removed all ASVs that appeared in any negative field control, negative extraction control, or no-template negative PCR control; and we rarefied to the minimum number of reads of any sample. These processing steps can also all be found and reproduced in PillarPoint.Rmd in the Intertidal-eDNA GitHub Repo.
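As a rough Python illustration of the order of these steps (not the R code in PillarPoint.Rmd, which uses ampvis2 and phyloseq), the sketch below starts from an ASV-by-sample read-count table that is assumed to be already restricted to assignments at the 60 confidence level. It assumes a singleton is an ASV with one read or fewer across the whole table, that control columns are named explicitly, and that only field-sample columns are rarefied; these choices, the function names, and the toy table are hypothetical.

```python
import numpy as np
import pandas as pd


def remove_singletons(asv: pd.DataFrame) -> pd.DataFrame:
    """Drop ASVs (rows) with one read or fewer across the whole table (assumed definition)."""
    return asv[asv.sum(axis=1) > 1]


def remove_control_asvs(asv: pd.DataFrame, control_cols: list) -> pd.DataFrame:
    """Drop every ASV with at least one read in any of the named control columns."""
    in_control = (asv[control_cols] > 0).any(axis=1)
    return asv.loc[~in_control]


def rarefy(asv: pd.DataFrame, seed: int = 1) -> pd.DataFrame:
    """Subsample each column without replacement down to the smallest column total."""
    rng = np.random.default_rng(seed)
    depth = int(asv.sum(axis=0).min())
    out = {
        col: rng.multivariate_hypergeometric(asv[col].to_numpy(dtype=np.int64), depth)
        for col in asv.columns
    }
    return pd.DataFrame(out, index=asv.index)


# Hypothetical toy table: rows are ASVs; "ctrl" stands in for a negative control.
asv = pd.DataFrame(
    {"s1": [10, 0, 3, 5], "s2": [4, 1, 2, 6], "ctrl": [0, 0, 1, 0]},
    index=["ASV1", "ASV2", "ASV3", "ASV4"],
)
filtered = remove_control_asvs(remove_singletons(asv), ["ctrl"])
rarefied = rarefy(filtered[["s1", "s2"]])  # rarefy field samples only (assumption)
print(rarefied)
```

In this toy example ASV2 is dropped as a singleton and ASV3 because it appears in the control; both remaining samples are then subsampled to 10 reads, the smaller of the two totals.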
The processed eDNA dataset represents the core dataset analyzed in the manuscript.
In the archive are three .csv files: ASVTable.csv, SampleData.csv, and TaxTable.csv.
ASVTable.csv
Rows are each ASV generated through the Anacapa Toolkit pipeline, and columns are each eDNA sample. The numbers reflect the number of sequence reads of a given ASV in a given sample.
SampleData.csv
This file contains 12 columns of additional context for the eDNA samples listed in ASVTable.csv:
- “Sample” & “X”: contain the same eDNA sample names from ASVTable.csv
- “Sample_Name” & “Dereplicated_Sample_Name”: extract descriptive portions of the full sample name that are useful for differentiating samples
- “Type”: describes whether a given row represents a sample or a control
- “PCR”: denotes whether a sample was processed in the first or second PCR plate during laboratory work
- “Time”: sample collection time (PST)
- “Location”: sample collection location (S1, S2 or N; context for these locations in section 1.2 of Dryad Methods or in manuscript)
- “Extraction”: denotes the group in which this sample was extracted during laboratory work
- “T” & “S”: the temperature (degrees Celsius) and salinity (ppt) measured at the time the sample was collected with an Orion Model 1230 meter (Orion Research Inc., Beverly, MA, USA)
- “SiteByTime”: a composite of “Location” and “Time” columns used to differentiate samples in some analyses
TaxTable.csv
Rows are each ASV generated through the Anacapa Toolkit pipeline (same as ASVTable.csv), and columns are the taxonomic information assigned to that ASV, if any, through the Anacapa Toolkit pipeline.
Related Works: Software, Code, and Data
Anacapa Container (modified)
The modified Anacapa Container, published to Zenodo and found in the “Related Works” on Dryad, is the workflow used to process FASTQ_Files.zip. The output from this workflow is Anacapa_Toolkit_Output.zip. More details on the Anacapa Container and how it works can be found on its Zenodo page.
Intertidal-eDNA GitHub Repository
The Intertidal-eDNA GitHub repository, published to Zenodo and found in the “Related Works” on Dryad, contains all of the data and code needed to reproduce the analyses in the manuscript, starting from Anacapa_Toolkit_Output.zip (which is already included in the repo). Using PillarPoint.Rmd, you can generate the processed eDNA dataset saved here as Processed_eDNA_Data.zip and use that dataset to reproduce all of the analyses, as well as most of the figures, in the manuscript. More details on the Intertidal-eDNA GitHub repository can be found on its Zenodo page.
NCBI SRA Sequence Data Submission
The FASTQ files in FASTQ_Files.zip have also been submitted to the NCBI Sequence Read Archive; the BioProject containing all of these files can be found with accession: PRJNA1083727.
GBIF and OBIS Biodiversity Data Submission
Biodiversity data generated from this project have also been submitted to the Global Biodiversity Information Facility (GBIF) and the Ocean Biodiversity Information System (OBIS). Note that the biodiversity information submitted to GBIF/OBIS differs slightly from the Processed_eDNA_Data.zip files above, because best practices for submitting DNA-derived data to GBIF recommend neither rarefying to the minimum number of reads of any sample nor including any controls. The exact data submitted to GBIF/OBIS can be produced through the Intertidal-eDNA GitHub repository published to Zenodo.
1.1. Reproducibility
To improve reproducibility (e.g. Dickie et al., 2018; Shea et al., 2023), enable open data science (e.g. Fredston & Lowndes, 2024), and aid in the initiation of new eDNA biomonitoring projects, we have published detailed, step-by-step protocols for many of the methods described here, including the specific materials used, photographs, and additional methodological notes that could not be included here. See Shea and Boehm (2023b, 2023c, 2023d, 2023e) for sample collection and filtering, DNA extraction, PCR amplification, and sample shipping, respectively. Additionally, we have published all data and code needed to replicate our analyses via Dryad, including FASTQ files and eDNA datasets (pre-processing and post-processing). Our modified Anacapa Container and bioinformatics scripts (Shea & Boehm, 2023a), as well as a GitHub repository with an R Markdown file that reproduces all methods and results (Shea & Boehm, 2023f), are also accessible through Dryad and Zenodo.
1.2. Sampling Area
To better understand spatial and temporal differences in eDNA signals in a complex coastal environment, we sought a rocky intertidal field location with consistent, large, accessible tide pools that were fully isolated from one another at some low tides but interconnected during other parts of their exposure period. We selected the intertidal at Pillar Point, a headland promontory to the west of Pillar Point Harbor in San Mateo County, California, USA. Pillar Point is a popular recreational intertidal area directly adjacent to the Pillar Point State Marine Conservation Area (no specific use scientific collection permit was required).
Within Pillar Point, we sampled three discrete locations: two individual tide pools whose physical connectivity varied with the tide (Tide Pool 1, S1: 37.495306°, -122.498744°; Tide Pool 2, S2: 37.494992°, -122.498955°) and an equidistant location (Nearshore, N: 37.495288°, -122.499198°) with well-mixed offshore water for the duration of the tidal cycle. Each site was approximately 40 meters from the other two. Tide Pool 1 and Tide Pool 2 are fully isolated at tidal heights of around 0 m (mean lower low water, MLLW) or lower, and substantially connected at around 0.25 m or higher. On the day we sampled, this meant that water actively flowed between the locations at the start (11:30 PST) and end (17:00 PST) of the sampling period, but the sites were disconnected at low tide in the middle of the sampling period. Ecologically, Tide Pool 1 was closer to shore, on the interior side of a channel that divides the Pillar Point intertidal, in an area characteristic of the high to middle intertidal. Tide Pool 2 was across the channel and farther from shore, in an area characteristic of the low intertidal.
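The ~40 m spacing can be checked from the coordinates above with the haversine great-circle formula. This is a minimal sketch that assumes a spherical Earth of radius 6,371 km, which is accurate to well under a metre at these distances; the function and dictionary names are illustrative.

```python
import math

# Site coordinates (decimal degrees) as given in the text.
SITES = {
    "S1": (37.495306, -122.498744),
    "S2": (37.494992, -122.498955),
    "N": (37.495288, -122.499198),
}


def haversine_m(a, b, radius_m=6_371_000.0):
    """Great-circle distance in metres between two (lat, lon) points on a sphere."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(h))


distances = {
    (p, q): haversine_m(SITES[p], SITES[q])
    for p, q in [("S1", "S2"), ("S1", "N"), ("S2", "N")]
}
for (p, q), d in distances.items():
    print(f"{p}-{q}: {d:.1f} m")
```

With these coordinates, each pair of sites comes out at roughly 39 to 40 m.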
1.3. Sample Collection & Filtration
We collected 1 L surface samples from each site every 30 minutes while the rocky intertidal was exposed on 28 January 2022, using single-use enteral feeding pouches (Covidien, Dublin, Ireland). A 1 L sampling volume is sufficient for detecting a representative range of marine organisms in nearshore locations (Gold et al., 2022) and is commonly used in aquatic eDNA studies (Takahashi et al., 2023). Sampling commenced at 11:30 PST. At each site, samples were collected from a consistent position across time points. Following the approach used by Gold, Sprague, et al. (2021), we attached a sterile 0.22 μm pore size Sterivex cartridge (MilliporeSigma, Burlington, MA, USA) to the tubing of each feeding pouch, allowing samples to be gravity filtered immediately in the field. During gravity filtration (1-2 hours per sample), samples were shaded with an awning to limit degradation by sunlight (Andruszkiewicz et al., 2017). One sample fell during gravity filtration, resulting in a missing sample from S1 at 16:00 PST.
At three time points (the beginning and end of the sampling period, and low tide at 14:00 PST), we collected triplicate 1 L samples from each location as biological replicates. At the beginning and end of the sampling period, we also filtered 1 L of MilliQ water following the procedure described above to serve as negative field controls. Additionally, using an Orion Model 1230 meter (Orion Research Inc., Beverly, MA, USA), we recorded temperature and salinity at each location immediately after samples were collected.
Once filtration was complete, we dried the Sterivex cartridges by pushing air through them with a sterile 3 mL syringe, capped them, and placed them in sterile Whirl-Pak bags (Whirl-Pak, Madison, WI, USA). Samples were then stored in a cooler on ice until they were transported back to the laboratory at the end of the sampling period, where they were held in a -20°C freezer for up to 18 days before nucleic acids were extracted from the captured material. This sampling scheme resulted in 53 field samples.
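The total of 53 field samples follows from the schedule described above. The sketch below reproduces that arithmetic under one reading of the text, stated here as an assumption: sampling ran every 30 minutes from 11:30 to 17:00 PST inclusive, each triplicated time point consisted of the routine sample plus two extra replicates per site, and one sample (S1 at 16:00) was lost. The function name and parameters are illustrative.

```python
from datetime import datetime, timedelta


def field_sample_count(start="11:30", end="17:00", step_min=30, n_sites=3,
                       replicate_times=3, replicates=3, lost=1):
    """Return (number of time points, number of field samples) under the stated assumptions."""
    t0 = datetime.strptime(start, "%H:%M")
    t1 = datetime.strptime(end, "%H:%M")
    n_times = int((t1 - t0) / timedelta(minutes=step_min)) + 1  # inclusive of both ends
    routine = n_times * n_sites                                  # one sample per site per time
    extra = replicate_times * n_sites * (replicates - 1)         # two extra replicates per site
    return n_times, routine + extra - lost                       # minus the sample that fell


print(field_sample_count())  # (12, 53)
```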
1.4. DNA Extraction & Library Preparation
Within 18 days of collection, we extracted and purified DNA from the Sterivex cartridges using the DNeasy Blood and Tissue Kit (Qiagen, Germantown, MD, USA) with the modifications described in Spens et al. (2017). In short, we incubated each filter cartridge overnight with proteinase K and Buffer ATL. We then extracted the liquid from the cartridge with a syringe and mixed it with equal volumes of Buffer AL and 0°C ethanol before proceeding with the manufacturer’s extraction protocol. One negative extraction control (DNA-grade water in place of a sample) was included in each of the four extraction batches. Nucleic acids were stored at -20°C for up to 6 months. DNA extractions were conducted at a lab bench distant from all subsequent handling of PCR products.
We PCR amplified the extracted DNA in triplicate within 6 months of storage, using the mlCOIintF/jgHCO2198 primer set targeting a 313 bp fragment of the mitochondrial COI region optimized by Leray et al. (2013) (Forward: GGWACWGGWTGAACWGTWTAYCCYCC; Reverse: TAIACYTCIGGRTGICCRAARAAYCA) with Nextera modifications. Following Curd et al. (2019), we used a 25 μl PCR reaction mixture consisting of 12.5 μl of Qiagen Multiplex Mix (Qiagen, Germantown, MD, USA), 2.5 μl each of the forward and reverse primers at 2 μM (Integrated DNA Technologies, Inc., Coralville, IA, USA), 6.5 μl of PCR-grade water, and 1 μl of undiluted DNA template. The touchdown PCR thermocycling profile began with an initial denaturation at 95 °C for 15 minutes to activate the DNA polymerase, followed by 13 cycles of denaturation (94 °C for 30 seconds), annealing (30 seconds, starting at 69.5 °C and decreasing by 1.5 °C each cycle), and extension (72 °C for 1 minute). An additional 35 cycles were then run with the same denaturation and extension steps and an annealing temperature of 50 °C, followed by a final extension at 72 °C for 10 minutes. PCR reactions were prepared in a designated DNA-free hood until the template was added.
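The reaction recipe and touchdown program above can be checked with a little arithmetic. This sketch assumes the 1.5 °C decrement is applied after each touchdown cycle, so the first cycle anneals at 69.5 °C and the thirteenth at 51.5 °C; the dictionary keys and function names are illustrative.

```python
# Volumes (μl) per 25 μl reaction, as listed in the text.
REACTION_UL = {
    "Qiagen Multiplex Mix": 12.5,
    "forward primer (2 uM)": 2.5,
    "reverse primer (2 uM)": 2.5,
    "PCR-grade water": 6.5,
    "DNA template": 1.0,
}


def final_conc_uM(stock_uM, volume_ul, total_ul):
    """C1 * V1 = C2 * V2, solved for the in-reaction concentration C2."""
    return stock_uM * volume_ul / total_ul


def touchdown_schedule(start=69.5, step=1.5, n_touchdown=13, final=50.0, n_final=35):
    """Annealing temperature (°C) for every cycle, assuming the decrement follows each cycle."""
    touchdown = [round(start - step * i, 1) for i in range(n_touchdown)]
    return touchdown + [final] * n_final


total_ul = sum(REACTION_UL.values())
primer_uM = final_conc_uM(2.0, REACTION_UL["forward primer (2 uM)"], total_ul)
schedule = touchdown_schedule()
print(total_ul, primer_uM, len(schedule), schedule[12])
```

Under this reading the mixture sums to 25 μl, each primer ends up at 0.2 μM in the reaction, and the full program has 48 cycles.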
PCR amplification was conducted in two batches; in each batch, we included one no-template negative PCR control (DNA-grade water used as template). Additionally, using the standard tissue extraction protocol of the DNeasy Blood and Tissue Kit (Qiagen, Germantown, MD, USA), we extracted and purified DNA from the tissue of five organisms spanning two phyla (Mollusca and Chordata) that we expected to amplify with the mlCOIintF/jgHCO2198 primers but did not expect to be present at Pillar Point (Mytilus edulis, Mizuhopecten yessoensis, Xiphias gladius, Mercenaria mercenaria, Lutjanus campechanus). We obtained these tissues from a local grocery store and assumed they were correctly labeled, although previous work has shown that seafood mislabeling occurs (e.g. Willette et al., 2017). Extracts from the 5 tissue samples were combined in equimolar amounts to form a mock community, which served as a positive PCR control in each batch. Triplicate PCR amplicons from both samples and controls were not pooled but were carried through the remaining library preparation and sequencing steps as technical replicates. We electrophoresed and visualized a subset of PCR products on a 1.5% agarose gel stained with GelRed® (Biotium, Fremont, CA, USA) to confirm successful amplification, correct product size, and absence of contamination.
Post-PCR library preparation and sequencing were conducted at the Georgia Genomics and Bioinformatics Core (GGBC, UGA, Athens, GA, RRID: SCR_010994). In short, the PCR amplicons we provided were cleaned using AMPure XP magnetic beads (Beckman Coulter, Indianapolis, IN, USA), barcoded with Nextera adapters (Illumina, San Diego, CA, USA) during a second PCR (3 min at 95 °C; 15 cycles of 30 sec at 95 °C, 30 sec at 67 °C, and 30 sec at 72 °C; and 4 min at 72 °C), cleaned again using AMPure XP magnetic beads, and pooled in equimolar ratios. The resulting library was sequenced on a MiSeq PE 2x250bp (500 cycles) run using Reagent Kit V2 with a 25% PhiX spike-in (Illumina, San Diego, CA, USA). Given our technical replication of samples and controls, our final library included 6 negative field controls (1 each at the beginning and end of field sampling, amplified in triplicate), 12 negative extraction controls (1 in each of 4 extraction sets, amplified in triplicate), 2 positive PCR controls (1 in each of 2 amplification batches), 2 no-template negative PCR controls (1 in each of 2 amplification batches), and 159 samples (53 field samples, amplified in triplicate). FASTQ files are in this Dryad repository as FASTQ_Files.zip and are also available through the NCBI SRA (Accession: PRJNA1083727).
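The library composition listed above can be tallied as a quick bookkeeping check, taking the counts exactly as stated (field samples and negative field and extraction controls amplified in triplicate; one positive and one no-template control per amplification batch). The variable names are illustrative.

```python
FIELD_SAMPLES = 53
PCR_REPLICATES = 3

# Libraries per category, using the counts stated in the text.
LIBRARY = {
    "field samples": FIELD_SAMPLES * PCR_REPLICATES,
    "negative field controls": 2 * PCR_REPLICATES,       # start and end of sampling
    "negative extraction controls": 4 * PCR_REPLICATES,  # one per extraction set
    "positive PCR controls": 2,                          # one per amplification batch
    "no-template PCR controls": 2,                       # one per amplification batch
}
total_libraries = sum(LIBRARY.values())
print(total_libraries)  # 181
```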
1.5. Bioinformatics
We processed sequencing data using the Anacapa Toolkit, which contains two core modules: one for quality control and ASV parsing, and one for taxonomic classification (Curd et al., 2019). Briefly, we ran the first module with default parameters; it uses cutadapt (version 1.16) (Martin, 2011) for adapter and primer trimming, FASTX-Toolkit (version 0.0.13) (Gordon & Hannon, 2010) for quality trimming, and dada2 (version 1.6) (Callahan et al., 2016) for assigning ASVs. For the second module, we used the MIDORI2 reference database, a quality-controlled, technically validated database built from GenBank release 253 (20 December 2022) (Leray et al., 2022). Following Gold et al. (2022), we raised the minimum percent identity and query coverage to 95% (default: 80%) to account for the relative incompleteness of the broad COI reference database compared with more taxonomically specific databases (Curd et al., 2019). The second module relies on Bowtie 2 (version 2.3.5) (Langmead & Salzberg, 2012) and a modified instance of BLCA (Gao et al., 2017) as dependencies. The Anacapa Toolkit output is in this Dryad repository as Anacapa_Toolkit_Output.zip. Following Gold, Curd, et al. (2021), we kept only taxonomic assignments with a BLCA bootstrap confidence score of 60 or higher, to avoid spurious assignments arising from the incomplete reference database. We modified the Anacapa Container (Ogden, 2018), a Singularity container with all the dependencies needed to execute the Anacapa Toolkit, so that the pipeline can run in a high-performance computing environment requiring two-step authentication; the updated container, scripts, and reference database with the required Bowtie 2 index library needed to reproduce our bioinformatics process are archived on Zenodo (Shea & Boehm, 2023a).
Raw ASVs, taxonomy assignments, and sample information were converted into interchangeable ampvis2 (version 2.8.6) (Andersen et al., 2018) and phyloseq (version 1.46.0) (McMurdie & Holmes, 2013) objects in R (version 4.3.1) to facilitate decontamination and further analyses. Singletons were removed using ampvis2, and samples were further decontaminated using phyloseq by removing all ASVs that appeared in any negative field control, negative extraction control, or no-template negative PCR control, a choice made because very few ASVs overlapped between samples and negative controls (Table S1). Samples were rarefied to the minimum number of reads of any sample using ampvis2. This processed eDNA dataset is in this Dryad repository as Processed_eDNA_Data.zip. None of these decontamination steps changed the interpretation of subsequent analyses, as verified by replicating all analyses with datasets representing all 8 combinations of the presence or absence of the three decontamination and processing steps.
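One way to picture the bootstrap confidence cutoff is as truncating each taxonomic path at the first rank whose confidence falls below the threshold. The sketch below assumes a hypothetical format of semicolon-delimited ranks with a parallel semicolon-delimited list of per-rank confidences; it illustrates the idea only and is not the Anacapa Toolkit's own code or file format.

```python
def truncate_taxonomy(taxonomy: str, confidences: str, cutoff: float = 60.0) -> str:
    """Keep ranks from the top down until the first rank whose confidence is below cutoff.

    Both arguments use a hypothetical semicolon-delimited format with one entry per rank.
    """
    ranks = taxonomy.split(";")
    scores = [float(s) for s in confidences.split(";")]
    if len(ranks) != len(scores):
        raise ValueError("taxonomy and confidences must have the same number of ranks")
    kept = []
    for rank, score in zip(ranks, scores):
        if score < cutoff:
            break  # every rank below a low-confidence rank is discarded too
        kept.append(rank)
    return ";".join(kept)


print(truncate_taxonomy(
    "Eukaryota;Mollusca;Bivalvia;Mytilida;Mytilidae;Mytilus;Mytilus edulis",
    "100;100;98;85;70;55;40",
))  # Eukaryota;Mollusca;Bivalvia;Mytilida;Mytilidae
```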
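The eight datasets come from switching each of three steps on or off (2^3 = 8). A minimal sketch of that enumeration, assuming the three steps are singleton removal, control-ASV removal, and rarefaction; the step labels are hypothetical.

```python
from itertools import product

# Hypothetical labels for the three optional steps.
STEPS = ("remove_singletons", "remove_control_asvs", "rarefy")


def processing_variants(steps=STEPS):
    """Yield every on/off combination of the steps: 2 ** len(steps) variants."""
    for flags in product((False, True), repeat=len(steps)):
        yield dict(zip(steps, flags))


variants = list(processing_variants())
print(len(variants))  # 8
```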
To check the plausibility of taxonomic assignments, we used the spocc package (version 1.2.2) (S. Chamberlain, 2021) to retrieve occurrence data from the Global Biodiversity Information Facility and assess whether each identified species had occurrence records in the California Current System and a known range encompassing Pillar Point. We created a phylogenetic tree based on taxonomic assignments using the taxize (version 0.9.100) (S. A. Chamberlain & Szöcs, 2013) and ggtreeExtra (version 1.12.0) (Xu et al., 2021) packages.
1.6. Data Analysis
To investigate whether eDNA signals could be distinguished by location, we first analyzed individual-level differences between locations; that is, whether ASVs or individual taxa (agglomerated species-level taxonomic assignments) were unique to, or associated with, particular locations. We calculated and visualized unique ASVs and taxa using the eulerr (version 7.0.0) (Larsson, 2022) package. However, with metabarcoding data in particular, taxa or ASVs that are unique to a given location are not necessarily ecologically meaningful: such lists could include rare taxa that are present elsewhere but were not amplified there, and they exclude taxa that are well correlated with particular locations but occasionally detected at others. Thus, we also analyzed ASVs and taxa using an indicator species framework (Dufrêne & Legendre, 1997). In this framework, the null hypothesis is that the frequency of taxon or ASV presence in samples from a particular location is not higher than its frequency in samples from other locations. For each location, we identified all statistically significant indicator taxa and ASVs with indicator values above 0.7 (that is, indicators well associated with a site group even if they are also detected in samples from other sites), based on per-sample presence-absence data, using the indicspecies package (version 1.7.14) (Cáceres & Legendre, 2009).
We followed our individual-level analyses with approaches to test whether community composition varied by location. We calculated a Jaccard dissimilarity matrix across all samples using vegan (version 2.6.4) (Oksanen et al., 2022). We then tested for differences in community composition among locations using a permutational multivariate analysis of variance (PERMANOVA) with the model eDNA Presence ~ Location + Time + Biological Replicates, using the adonis function in vegan. We confirmed that locational differences found via PERMANOVA were not a result of differences in dispersion by testing for homogeneity of dispersions with the betadisper function in vegan. We also visualized Jaccard dissimilarity with non-metric multidimensional scaling (NMDS) using the metaMDS function in vegan. We coupled these analyses with a partitioning around medoids (PAM) algorithm (Kaufman & Rousseeuw, 1990) implemented with the pam function in the cluster package (version 2.1.6) (Maechler et al., 2022). Rather than assuming location clusters a priori, we selected the number of clusters (k) that maximized the average silhouette width, a measure of how well structured the clusters are. Finally, to better understand how the difference between locations varied over the sampling period, at each time point we calculated the Jaccard dissimilarity between each unique pair of replicates within each site, and then the pairwise Jaccard dissimilarity between each unique pair of samples across the three pairs of sites: S1-N, S2-N, and S1-S2.
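For presence-absence data, the group-equalized indicator value combines specificity A (a taxon's detection frequency in one group divided by the sum of its detection frequencies across all groups) and fidelity B (its detection frequency within that group), reported as sqrt(A·B). The sketch below is a simplified NumPy version that scores single groups only and computes a label-permutation p-value; indicspecies can also evaluate combinations of groups, and the function names and toy data here are hypothetical.

```python
import numpy as np


def indval_g(pa, groups):
    """Group-equalized indicator values sqrt(A * B) for presence-absence data.

    pa: (n_samples, n_taxa) array of 0/1; groups: (n_samples,) array of labels.
    Returns the sorted group labels and an (n_groups, n_taxa) array of statistics.
    """
    labels = np.unique(groups)
    # Detection frequency of each taxon within each group.
    freq = np.vstack([pa[groups == g].mean(axis=0) for g in labels])
    total = freq.sum(axis=0)
    # Specificity A: share of the taxon's summed group frequencies that falls in each group.
    a = np.divide(freq, total, out=np.zeros_like(freq), where=total > 0)
    # Fidelity B: with 0/1 data this is simply the within-group detection frequency.
    b = freq
    return labels, np.sqrt(a * b)


def indicator_test(pa, groups, n_perm=999, seed=0):
    """Best group, its statistic, and a permutation p-value for each taxon."""
    rng = np.random.default_rng(seed)
    labels, stat = indval_g(pa, groups)
    best_group = labels[stat.argmax(axis=0)]
    observed = stat.max(axis=0)
    exceed = np.zeros(pa.shape[1])
    for _ in range(n_perm):
        _, perm_stat = indval_g(pa, rng.permutation(groups))
        exceed += perm_stat.max(axis=0) >= observed
    p_values = (exceed + 1) / (n_perm + 1)
    return best_group, observed, p_values


groups = np.array(["S1"] * 3 + ["S2"] * 3 + ["N"] * 3)
# Taxon 0 is detected only at S1; taxon 1 is detected everywhere.
pa = np.array([[1, 1]] * 3 + [[0, 1]] * 6)
best, stat, p = indicator_test(pa, groups, n_perm=199)
print(best, stat.round(3), p)
```

In this toy example taxon 0 scores 1 for S1, while the ubiquitous taxon 1 scores sqrt(1/3), about 0.58, in every group and so would fall below a 0.7 threshold.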
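The Jaccard dissimilarity and the PERMANOVA pseudo-F statistic can be sketched in NumPy for a single grouping factor, using the standard partitioning of squared distances into among- and within-group sums of squares. This is a simplified stand-in for vegan's vegdist and adonis, not a reimplementation: the model in the text also includes Time and biological replicate terms, and betadisper, NMDS, and PAM are not covered. Treating two empty samples as identical (dissimilarity 0) is an assumption of this sketch, and the toy data are hypothetical.

```python
import numpy as np


def jaccard_matrix(pa):
    """Pairwise Jaccard dissimilarity (1 - |A & B| / |A | B|) between rows of a 0/1 matrix."""
    x = np.asarray(pa, dtype=bool).astype(int)
    inter = x @ x.T
    counts = x.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter
    with np.errstate(invalid="ignore", divide="ignore"):
        sim = np.where(union > 0, inter / union, 1.0)  # two empty rows treated as identical
    return 1.0 - sim


def pseudo_f(d, groups):
    """One-factor PERMANOVA pseudo-F from a dissimilarity matrix."""
    n = len(groups)
    labels = np.unique(groups)
    d2 = d ** 2
    ss_total = d2[np.triu_indices(n, 1)].sum() / n
    ss_within = 0.0
    for g in labels:
        idx = np.flatnonzero(groups == g)
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), 1)].sum() / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))


def permanova(d, groups, n_perm=999, seed=0):
    """Observed pseudo-F and a permutation p-value obtained by shuffling group labels."""
    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(d, groups)
    exceed = sum(int(pseudo_f(d, rng.permutation(groups)) >= f_obs) for _ in range(n_perm))
    return f_obs, (exceed + 1) / (n_perm + 1)


pa = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
])
groups = np.array(["S1"] * 3 + ["S2"] * 3)
d = jaccard_matrix(pa)
f, p = permanova(d, groups, n_perm=199)
print(round(f, 3), p)
```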
To further investigate whether differences in eDNA detections corresponded with underlying ecological gradients, we compared the unique and indicator taxa identified at each site to their ecological zonation in a highly regarded field guide to the Pacific intertidal, Between Pacific Tides (Ricketts et al., 1985). To account for potential differences in taxonomic names between Between Pacific Tides and the MIDORI2 reference database, we used the taxize package (version 0.9.100) (S. A. Chamberlain & Szöcs, 2013) to identify all synonymized names in the World Register of Marine Species (WoRMS) for each unique and indicator taxon. We then searched Between Pacific Tides for all synonyms, both manually using the index and in R by extracting the text from a PDF of Between Pacific Tides with the pdftools package (version 3.4.0) (Ooms, 2023). Finally, we compared the proportions of Zone 1 (uppermost horizon), Zone 2 (high intertidal), and Zone 3 (middle intertidal) species identified at each location using a chi-squared test with p-values computed via Monte Carlo simulation.
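A Monte Carlo chi-squared test of this kind conditions on the table's row and column totals: shuffling the column labels of individual observations while keeping their row labels fixed generates random tables with the same margins, and the p-value is the share of simulated statistics at least as large as the observed one. The sketch below assumes a locations-by-zones count table; the example counts are made up and are not results from this study, and the function names are illustrative.

```python
import numpy as np


def chi2_stat(table):
    """Pearson chi-squared statistic (no row or column may sum to zero)."""
    table = np.asarray(table, dtype=float)
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    return ((table - expected) ** 2 / expected).sum()


def chi2_monte_carlo(table, n_sim=2000, seed=0):
    """Observed statistic and Monte Carlo p-value from tables with the same margins."""
    rng = np.random.default_rng(seed)
    table = np.asarray(table, dtype=int)
    rows, cols = np.indices(table.shape)
    # One (row, column) label pair per counted species.
    row_labels = np.repeat(rows.ravel(), table.ravel())
    col_labels = np.repeat(cols.ravel(), table.ravel())
    observed = chi2_stat(table)
    exceed = 0
    for _ in range(n_sim):
        shuffled = rng.permutation(col_labels)  # keeps both row and column totals fixed
        sim = np.zeros_like(table)
        np.add.at(sim, (row_labels, shuffled), 1)
        exceed += int(chi2_stat(sim) >= observed * (1 - 1e-7))  # tolerance for float ties
    return observed, (exceed + 1) / (n_sim + 1)


# Made-up counts: rows are locations, columns are Zones 1-3.
stat, p = chi2_monte_carlo([[6, 3, 1], [2, 4, 4], [1, 2, 7]], n_sim=999)
print(round(stat, 3), p)
```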
1.7. Citations
- Andersen, K. S., Kirkegaard, R. H., Karst, S. M., & Albertsen, M. (2018). ampvis2: An R package to analyse and visualise 16S rRNA amplicon data (p. 299537). bioRxiv. https://doi.org/10.1101/299537
- Andruszkiewicz, E. A., Sassoubre, L. M., & Boehm, A. B. (2017). Persistence of marine fish environmental DNA and the influence of sunlight. PLOS ONE, 12(9), e0185043. https://doi.org/10.1371/journal.pone.0185043
- Cáceres, M. D., & Legendre, P. (2009). Associations between species and groups of sites: Indices and statistical inference. Ecology, 90(12), 3566–3574. https://doi.org/10.1890/08-1823.1
- Callahan, B. J., McMurdie, P. J., Rosen, M. J., Han, A. W., Johnson, A. J. A., & Holmes, S. P. (2016). DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods, 13(7), Article 7. https://doi.org/10.1038/nmeth.3869
- Chamberlain, S. (2021). spocc: Interface to Species Occurrence Data Sources (1.2.0) [R package]. https://CRAN.R-project.org/package=spocc
- Chamberlain, S. A., & Szöcs, E. (2013). taxize: Taxonomic search and retrieval in R. F1000Research, 2, 191. https://doi.org/10.12688/f1000research.2-191.v2
- Curd, E. E., Gold, Z., Kandlikar, G. S., Gomer, J., Ogden, M., O’Connell, T., Pipes, L., Schweizer, T. M., Rabichow, L., Lin, M., Shi, B., Barber, P. H., Kraft, N., Wayne, R., & Meyer, R. S. (2019). Anacapa Toolkit: An environmental DNA toolkit for processing multilocus metabarcode datasets. Methods in Ecology and Evolution, 10(9), 1469–1475. https://doi.org/10.1111/2041-210X.13214
- Dickie, I. A., Boyer, S., Buckley, H. L., Duncan, R. P., Gardner, P. P., Hogg, I. D., Holdaway, R. J., Lear, G., Makiola, A., Morales, S. E., Powell, J. R., & Weaver, L. (2018). Towards robust and repeatable sampling methods in eDNA-based studies. Molecular Ecology Resources, 18(5), 940–952. https://doi.org/10.1111/1755-0998.12907
- Dufrêne, M., & Legendre, P. (1997). Species Assemblages and Indicator Species: The Need for a Flexible Asymmetrical Approach. Ecological Monographs, 67(3), 345–366. https://doi.org/10.1890/0012-9615(1997)067[0345:SAAIST]2.0.CO;2
- Fredston, A. L., & Lowndes, J. S. S. (2024). Welcoming More Participation in Open Data Science for the Oceans. Annual Review of Marine Science, 16(1), annurev-marine-041723-094741. https://doi.org/10.1146/annurev-marine-041723-094741
- Gao, X., Lin, H., Revanna, K., & Dong, Q. (2017). A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy. BMC Bioinformatics, 18(1), 247. https://doi.org/10.1186/s12859-017-1670-4
- Gold, Z., Curd, E. E., Goodwin, K. D., Choi, E. S., Frable, B. W., Thompson, A. R., Walker Jr., H. J., Burton, R. S., Kacev, D., Martz, L. D., & Barber, P. H. (2021). Improving metabarcoding taxonomic assignment: A case study of fishes in a large marine ecosystem. Molecular Ecology Resources, 21(7), 2546–2564. https://doi.org/10.1111/1755-0998.13450
- Gold, Z., Sprague, J., Kushner, D. J., Marin, E. Z., & Barber, P. H. (2021). eDNA metabarcoding as a biomonitoring tool for marine protected areas. PLOS ONE, 16(2), e0238557. https://doi.org/10.1371/journal.pone.0238557
- Gold, Z., Wall, A. R., Schweizer, T. M., Pentcheff, N. D., Curd, E. E., Barber, P. H., Meyer, R. S., Wayne, R., Stolzenbach, K., Prickett, K., Luedy, J., & Wetzer, R. (2022). A manager’s guide to using eDNA metabarcoding in marine ecosystems. PeerJ, 10, e14071. https://doi.org/10.7717/peerj.14071
- Gordon, A., & Hannon, G. (2010). FASTX-Toolkit (0.0.13) [Computer software]. http://hannonlab.cshl.edu/fastx_toolkit/index.html
- Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. Wiley.
- Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), Article 4. https://doi.org/10.1038/nmeth.1923
- Larsson, J. (2022). eulerr: Area-Proportional Euler and Venn Diagrams with Ellipses (7.0.0) [R package]. https://CRAN.R-project.org/package=eulerr
- Leray, M., Knowlton, N., & Machida, R. J. (2022). MIDORI2: A collection of quality controlled, preformatted, and regularly updated reference databases for taxonomic assignment of eukaryotic mitochondrial sequences. Environmental DNA, 4(4), 894–907. https://doi.org/10.1002/edn3.303
- Leray, M., Yang, J. Y., Meyer, C. P., Mills, S. C., Agudelo, N., Ranwez, V., Boehm, J. T., & Machida, R. J. (2013). A new versatile primer set targeting a short fragment of the mitochondrial COI region for metabarcoding metazoan diversity: Application for characterizing coral reef fish gut contents. Frontiers in Zoology, 10(1), 34. https://doi.org/10.1186/1742-9994-10-34
- Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., & Hornik, K. (2022). cluster: Cluster Analysis Basics and Extensions [R package]. https://CRAN.R-project.org/package=cluster
- Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1), 10–12. https://doi.org/10.14806/ej.17.1.200
- McMurdie, P. J., & Holmes, S. (2013). phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data. PLOS ONE, 8(4), e61217. https://doi.org/10.1371/journal.pone.0061217
- Ogden, M. (2018). CALeDNA Anacapa/CRUX Dat Container (Linux/HPC) (Version 9, 3,224,287,111 bytes) [Dataset]. Dryad. https://doi.org/10.6071/M31H29
- Oksanen, J., Simpson, G. L., Blanchet, F. G., Kindt, R., Legendre, P., Minchin, P. R., O’Hara, R. B., Solymos, P., Stevens, M. H. H., Szoecs, E., Wagner, H., Barbour, M., Bedward, M., Bolker, B., Borcard, D., Carvalho, G., Chirico, M., Caceres, M. D., Durand, S., … Weedon, J. (2022). vegan: Community Ecology Package [R package]. https://CRAN.R-project.org/package=vegan
- Ooms, J. (2023). pdftools: Text Extraction, Rendering and Converting of PDF Documents [R package]. https://CRAN.R-project.org/package=pdftools
- Ricketts, E. F., Calvin, J., & Hedgpeth, J. W. (1985). Between Pacific Tides (5th ed.). Stanford University Press.
- Shea, M. M., & Boehm, A. B. (2023a). CALeDNA Anacapa Container for Linux/HPC (modified). https://doi.org/10.5281/zenodo.8201140
- Shea, M. M., & Boehm, A. B. (2023b). Coastal Environmental DNA Sampling & Gravity Filtration Protocol. https://www.protocols.io/view/coastal-environmental-dna-sampling-amp-gravity-fil-cws6xehe
- Shea, M. M., & Boehm, A. B. (2023c). DNA Extraction Protocol from Sterivex Filters. https://www.protocols.io/view/dna-extraction-protocol-from-sterivex-filters-cvtzw6p6
- Shea, M. M., & Boehm, A. B. (2023d). Environmental DNA (eDNA) COI PCR Amplification and Gel Electrophoresis Protocol. https://www.protocols.io/view/environmental-dna-edna-coi-pcr-amplification-and-g-cxe9xjh6
- Shea, M. M., & Boehm, A. B. (2023e). Environmental DNA (eDNA) Sample Shipping Protocol. https://www.protocols.io/view/environmental-dna-edna-sample-shipping-protocol-cvxmw7k6
- Shea, M. M., & Boehm, A. B. (2023f). meghanmshea/intertidal-eDNA: V1.0.0 [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.8213050
- Shea, M. M., Kuppermann, J., Rogers, M. P., Smith, D. S., Edwards, P., & Boehm, A. B. (2023). Systematic review of marine environmental DNA metabarcoding studies: Toward best practices for data usability and accessibility. PeerJ, 11, e14993. https://doi.org/10.7717/peerj.14993
- Spens, J., Evans, A. R., Halfmaerten, D., Knudsen, S. W., Sengupta, M. E., Mak, S. S. T., Sigsgaard, E. E., & Hellström, M. (2017). Comparison of capture and storage methods for aqueous macrobial eDNA using an optimized extraction protocol: Advantage of enclosed filter. Methods in Ecology and Evolution, 8(5), 635–645. https://doi.org/10.1111/2041-210X.12683
- Takahashi, M., Saccò, M., Kestel, J. H., Nester, G., Campbell, M. A., van der Heyde, M., Heydenrych, M. J., Juszkiewicz, D. J., Nevill, P., Dawkins, K. L., Bessey, C., Fernandes, K., Miller, H., Power, M., Mousavi-Derazmahalleh, M., Newton, J. P., White, N. E., Richards, Z. T., & Allentoft, M. E. (2023). Aquatic environmental DNA: A review of the macro-organismal biomonitoring revolution. Science of the Total Environment, 873, 162322. https://doi.org/10.1016/j.scitotenv.2023.162322
- Willette, D. A., Simmonds, S. E., Cheng, S. H., Esteves, S., Kane, T. L., Nuetzel, H., Pilaud, N., Rachmawati, R., & Barber, P. H. (2017). Using DNA barcoding to track seafood mislabeling in Los Angeles restaurants. Conservation Biology, 31(5), 1076–1085. https://doi.org/10.1111/cobi.12888
- Xu, S., Dai, Z., Guo, P., Fu, X., Liu, S., Zhou, L., Tang, W., Feng, T., Chen, M., Zhan, L., Wu, T., Hu, E., Jiang, Y., Bo, X., & Yu, G. (2021). ggtreeExtra: Compact Visualization of Richly Annotated Phylogenetic Data. Molecular Biology and Evolution, 38(9), 4039–4042. https://doi.org/10.1093/molbev/msab166