Data from: Taking eDNA underground: factors affecting eDNA detection of subterranean fauna in groundwater
Data files
Mar 29, 2023 version files 218.66 MB
-
Barrow_18S_combined_tax7.csv
310.63 KB
-
Barrow_18S_sample_data2.csv
35.74 KB
-
Barrow_18S_zotus_trimmed.fasta
1.36 MB
-
Barrow_18S_zotus.fasta
1.58 MB
-
Barrow_18S.fasta.zip
212.89 MB
-
README.md
2.64 KB
-
zotuTable.csv
2.47 MB
Abstract
Stygofauna are ancient aquatic fauna that have evolved to live underground. The impacts of anthropogenic climate change, extraction and pollution on groundwater pose major threats to groundwater health, prompting the need for efficient and reliable means to detect and monitor stygofaunal communities. Traditional survey techniques for these species rely on morphological identification and can be biased, labour intensive, and often indeterminate to lower taxonomic levels. By contrast, environmental DNA (eDNA)-based methods have the potential to dramatically improve on existing stygofaunal survey methods in a large range of habitats and for all life stages, reducing the need for the destructive manual collection of often critically endangered species or specialized taxonomic expertise. We compared eDNA and haul-net samples collected in 2020 and 2021 from 19 groundwater bores and a cave on Barrow Island, located northwest of Western Australia, and assessed how sampling factors influenced the quality of eDNA detection of stygofauna. The two detection methods were complementary, eDNA metabarcoding was able to detect soft-bodied taxa and fish often missed by nets, but only detected seven of the nine stygofaunal crustacean orders identified from haul-net specimens. Our results also indicated that eDNA metabarcoding could detect 54-100% of stygofauna from shallow water samples and 82-90% from sediment samples. However, there was significant variation in stygofauna diversity between sample years and sampling types. The findings of this study demonstrate that haul net sampling has a tendency to underestimate stygofaunal diversity and that eDNA metabarcoding of groundwater can substantially improve efficiency of stygofaunal surveys.
Methods
Study site
Barrow and Boodie island are located northwest of mainland Western Australia on the edge of the Australian continential shelf. The climate of the islands is arid and subtropical with hot, humid summers with daily mean maxima reaching 34° C and daily mean minima of 20° C. The winter daily mean maxima are lower at 26° C and daily mean minima of 17° C (Moro & Lagdon, 2013). Rainfall is seasonal, most prevalent November-April, and dependent on the passage of summer monsoons. The average annual rainfall has historically been 320 mm; however, the first year of sampling (2020) was during a particularly dry period when the average annual rain for the previous three years was only 105 mm (www.bom.gov.au). The following year of sampling (2021) was during a wetter year with 377 mm of rain, the majority occurring between March and May.
Groundwater on Barrow Island comprises an exceptional example of an anchialine system (Humphreys, 2001). That is, stratified waters where a freshwater lens that originates from seasonal rainfall overlies seawater with a transitional zone in between. The transitional zone is of comparable thickness to the freshwater layer having been expanded due to tidal forces. As a smaller island, Boodie Island has less freshwater and is almost entirely marine. Access to the groundwater and in situ stygofaunal sampling is conducted via existing bores that have been installed for specific operational or monitoring purposes. The bores are typically constructed of 50 or 100 mm diameter, Polyvinyl Chloride (PVC) casing that is slotted at discrete intervals depending on their purpose. Slots vary from 0.5 - 5 mm in width. Bores with larger slot sizes can have mesh across them to prevent debris from accumulating inside the bore. Depth of bores vary depending on the location on the island (bores at higher elevation need to be deeper to reach the water column) and the initial purpose of the bore. Bores designed to monitor stygofauna or groundwater have slotted intervals in shallower parts of the aquifer (i.e. water several metres deep below the water table) compared to those constructed for the cathodic protection system (tens of metres below the water table). Many bores are slotted below the water table.
Sample Collection and Processing
Stygofaunal specimens and eDNA samples were collected from 17 bores in December 2020 and November 2021 (Figure 1). An additional two bores were sampled in 2021 (November) from Boodie Island south of Barrow Island. Groundwater at the base of a cave on the west side of Barrow Island was also sampled in November 2021. These sites are particularly difficult to access, and were unavailable for sampling during the 2020 sampling period. Bores chosen for sampling were spatially stratified across the island and high in stygofaunal diversity according to previous monitoring efforts (G. Humphreys et al., 2013). Two ‘control’ bores (BGMW08, WFS2MW04) with no previous detection of stygofauna were also selected.
Stygofaunal Collection and Processing
Whole specimens used to assess a baseline for morphological species diversity were collected using haul nets (similar to plankton nets – see Sacco et al., 2022), which were 45 mm or 95 mm in diameter, depending on the size of the PVC casing. Five (Allford, Cooper, Humphreys, & Austin, 2008) to six (EPA guidance) haul net samples were taken at each bore, half with 150 mm mesh and half with 50 mm mesh, alternating mesh size between hauls. We favoured haul net sampling for stygofauna prior to collection of eDNA samples to ensure that live specimens were not dispersed by the eDNA sampling equipment. Haul net samples were sorted and identified using a dissecting microscope and specimens of each morphotype were stored in glass vials in 100% ethanol and kept frozen at -20° C. Stygofauna experts (NS and MTG) provided the morphological identifications.
eDNA Sample Collection
Water samples were collected for eDNA analysis after hauling. Four 1 L water samples were collected from the upper 2 m of the water column using a 1L plastic disposable bailer (shallow water samples). One 1L water sample was collected from the bottom of the bore using a 1L steel bailer (deeper water sample). Sediment samples (one per bore) were collected from the deeper eDNA water samples where there was an abundance of sediment leftover that could not be filtered. With six samples per bore (i.e. 4x 1L eDNA, 1x 1L deeper, 1x sediment), there were 102 samples collected in 2020, and 120 samples in 2021. For five bores the scope was extended to include collecting the eDNA samples before and after the haul net samples to determine if the order of sampling was significant (n=30). All equipment was sterilized between bores in 10% bleach for 10 min and rinsed in RO water, and disposable gloves were used at each bore hole to prevent contamination. Fresh bleach solution and RO water was used for every bore hole for equipment that was not decontaminated in the lab ahead of time, and samples of the RO water used for rinsing equipment (i.e. field controls) were collected to assess sources of contamination. Additional quality control samples included: two blank filter membranes carried in the field and a sample of all RO water sources for a total of 34 field controls.
eDNA and haul net samples were kept on ice until they could be transferred to a 4°C refrigerator at the end of the collecting day. eDNA water samples were filtered within 24 h across 0.45 μm Supor polyethersulfone membranes using a Pall Sentino Microbiology pump (Pall Corporation, Port Washington, USA). For samples with high turbidity, up to two membranes were used to increase the volume of water filtered until the filters clogged. The samples were immediately frozen and stored at –20°C prior to, and on-ice during, their transportation to the Trace & Environmental DNA (TrEnD) Laboratory at Curtin University in Perth, Western Australia.
Water physicochemical properties were measured for each bore hole using a YSI ProDSS either two days before, or the day after sampling, to avoid cross-contamination between bore holes. Measurements of turbidity (FNU), conductivity (µS/cm), salinity (psu), temperature (°C), dissolved oxygen (%sat, mg/L), pH, and depth (m) were recorded for each bore hole. Water chemistry at various depths was only measured for bores with a water column that was greater than several metres.
DNA Barcoding of Specimens and Custom Reference Library
A leg was removed from 15 morphologically identified specimens (Table S1) collected via haul nets and DNA was extracted using the DNeasy Blood and Tissue kit (Qiagen) following the manufacturer’s instructions with the following modification – genomic DNA was eluted from the silica column with 50 μl of AE buffer. The DNA was then amplified using polymerase chain reaction (PCR) with a universal eukaryote18S assay (Table S2)(Pochon et al., 2013). This assay has been previously used to detect stygofauna (West et al., 2020) and is broad enough to detect a wide range of organisms, which might otherwise require multiple assays (Pochon, Bott, Smith, & Wood, 2013). Each PCR reaction for barcoding was carried out in duplicate in a total volume of 25 μl containing: 1X AmpliTaq Gold®PCR buffer (Life Technologies, Massachusetts, USA), 2.5 mM MgCl2, 0.25 mM dNTPs, 0.4 μM each of forward and reverse primers (Integrated DNA Technologies, Australia), 0.4 mg/mL BSA (Fisher Biotec, Australia), 0.6 μl of 5X SYBR® Green (Life Technologies), 1 U AmpliTaq Gold® DNA Polymerase (Life Technologies), 2 μl of genomic DNA template, and made to volume with Ultrapure Distilled Water (Life Technologies). The cycling conditions were initial denaturation at 95° C for 5 min, followed by 40-45 cycles of 95° C for 30 s, 52° C for 30 s, 72° C for 45 s, a melt curve stage of 95° C for 15 s, 60° C for 1 min, and 95° C for 15 s, and a final extension at 72° C for 10 min. The expected PCR product size was visualised on a 2% agarose gel and sent to the Australian Genome Research Facility (Perth) for sanger sequencing in both directions. The forward and reverse sequences were then aligned, the primers were removed and the consensus sequences were used to create a custom library for reference analyses (i.e., taxonomic identification) of the 18S eDNA sequences obtained from the water and sediment samples. The custom DNA sequence library also included sequences of stygofauna from the Pilbara region sourced from BOLD and GenBank (www.ncbi.nlm.nih.gov/genbank), as well as previous consulting work done in the area for a total of 207 18S sequences.
eDNA Laboratory Processing
DNA was extracted from filter membranes within 1 month of collection and filtering using a DNeasy Blood and Tissue Kit (Qiagen) with the following modifications: 540 μl of ATL lysis buffer and 60 μl of Proteinase K were used during the cell lysis phase. A total of seven DNA extraction controls, containing the solutions and plastics supplied in the extraction kit were used and processed alongside all eDNA and sanger sequencing samples in order to detect any laboratory or cross-contamination of samples. Sediment samples were homogenized for 1 min at 30 Hz in a TissueLyser (Qiagen) and DNA was extracted from 0.25 g of homogenized sediment using a DNeasy PowerLyser PowerSoil kit (Qiagen). All eDNA extractions (post-lysis stage) were performed using an automated DNA extraction platform (QIAcube; Qiagen) with a customised eDNA protocol that elutes the DNA off the silica membrane in 100 ul of AE buffer.
DNA extracts were screened for quality and quantity of DNA using quantitative PCR (qPCR) to determine the presence of inhibitors and the quantity of target template molecules present in each DNA extract (Murray, Coghlan, & Bunce, 2015). Three dilutions (neat, 1/5 and 1/10) were used for a subset of 30 samples, while the remaining were assessed using only the neat and 1/10 dilution. Each PCR reaction for the 18S universal assay was carried out with the same master mix as described above. The cycling conditions were initial denaturation at 95° C for 5 min, followed by 45 cycles of 95° C for 30 s, annealing at 52° C for 30 s, 72° C for 45 s, a melt curve stage of 95° C for 15 s, 60° C for 1 min, and 95° C for 15 s, and a final extension at 72° C for 10 min. All PCR plates included a negative control and a positive control of chicken DNA.
Each sample, including all controls, was assigned a unique combination of multiplex identifier (MID) tags for the 18S universal assay. These MID tags were incorporated into fusion tagged primers, and none of the primer-MID tag combinations had been used previously in the lab to prevent cross contamination. Fusion PCRs were done in duplicate to minimize PCR stochasticity, and the mixes were prepared in a dedicated ultra clean room before DNA was added. The PCRs were done with the same conditions as the standard qPCRs described above. Samples were then pooled into approximately equimolar concentrations to produce a PCR amplicon library that was size-selected to remove any primer-dimer that may have accumulated during fusion PCR. Size selection was performed (250-600bp) using a PippinPrep 2% ethidium bromide cassette (Sage Science, Beverly, MA, U.S.A). Libraries were cleaned using a QIAquick PCR Purification Kit (Qiagen, Germany) and quantified using Qubit Fluorometric Quantitation (Thermo Fisher Scientific). Sequencing was performed on the Illumina MiSeq platform using the 500 cycle paired-end V2 as per manufacturer's instructions
Bioinformatics
DNA sequences were processed using eDNAFlow, an automated bioinformatics workflow designed for analysis of eDNA metabarcoding data (Mousavi-Derazmahalleh et al., 2021). This pipeline uses AdaptorRemoval (Schubert et al., 2016) and FASTQC (Andrews, 2010) to quality filter and demultiplex sequences using ‘obitools’ (Boyer et al., 2016). The minimum Phred quality score was set at 20, with a minimum alignment of 12 for the paired-end reads, minimum length of 100 bp, and maximum 2 primer mismatches allowed, but no mismatches allowed in the MID-tag sequences. The demultiplexed sequences were then denoised using USEARCH unoise3 (Edgar, 2016) to create Zero-radius operational taxonomic units (ZOTUs) with a minimum abundance of 8. The resulting ZOTU table was then curated using ‘LULU’ to remove spurious ZOTUs (Frøslev et al., 2017). The ZOTU sequences were then queried against a custom barcode reference library for stygofauna and GenBank (NCBI) using blastn with the following parameters (-outfmt "5" -perc_identity 95 -qcov_hsp_perc 95 -query -max_target_seqs 10). Taxonomic assignment was performed using a simple lowest common ancestor algorithm on MEGAN (Huson, Auch, Qi, & Schuster, 2007) with a minimum score of 450.
Niches (i.e. stygofaunal, troglofaunal, non-subterranean taxon groups) were assigned to all metazoan taxa identified from ZOTU’s. This allowed us to parse stygofauna and troglofaunal taxa from non-subterranean taxa. ZOTUs belonging to subterranean taxa were then extracted an aligned along with sequences from the custom reference database using MUSCLE (Robert et al. 2004) in Geneious 2021.0.3 (https://www.geneious.com) with 10 iterations of the default settings. The resulting tree was used to infer putative species with a Poisson tree processes (PTP) model (Zhang, Kapli, Pavlidis, & Stamatakis, 2013). Some subterranean ZOTU taxa names were modified to reflect the species inferences from the model such that all ZOTUs assigned to a species had the same taxa name, and no two subterranean ‘species’ had the same name. For example, three ZOTU’s identified as ‘Atyidae’ were renamed to Stygiocaris stylifera because they belonged to the same putative species. Details on initial and modified taxon names can be found in the taxa table on dryad digital repository.
Statistical analysis
Statistical analyses were performed on R 4.0.5 (R Core Team, 2018). We used the R package ‘phyloseq’ (McMurdie & Holmes, 2013) to filter the dataset; removing samples with less than 2,000 reads (Figure S1) and pruning ZOTUs that were present in extraction controls and rinsate blanks. This removed 18 ZOTUs that were detected in extraction controls, and 166 ZOTUs from the rinsate controls, although only 3 of those were stygofauna. Two of the potential stygofauna ZOTUs removed were Nematode ZOTUs identified as Monhysterida found in RO water used to rinse equipment. ZOTUs with less than 5 reads were removed, and abundances were converted to presence or absence. We selected only the ZOTUs from presumed subterranean taxa (both stygofauna and troglofaunal), and combined the detections from all sample types for the comparison between subterranean taxa detected using morphological identification of haul-net samples, and taxa detected using eDNA. Differences in the taxa detected using each method were visualized using a bubble plot to compare taxa at the Order level for both stygofauna and troglofauna.
The effect of substrate types (deeper, shallow, sediment) on stygofauna community composition was tested on a dataset containing the one sediment sample, one deeper water sample, and one shallow water sample per bore hole. The effect of sample type on stygofauna ZOTU richness was tested using a Kruskal-Wallis test. For the data from 2021, the analysis was performed on the data for all sites, and without the three extra sites (Boodie Island and Ledge cave), to ensure the addition of those sites was not influencing the patterns observed. To examine which stygofaunal taxa could be detected using each sample type, the ZOTUs were collapsed by Taxon using the tax_glom function from the ‘phyloseq’ (McMurdie & Holmes, 2013). The taxa detected by each substrate were then illustrated using a venn diagram function (ps_venn) from the ‘MicEco’ package (Russel, 2021).
For the five bores where we collected eDNA samples both before and after using the haul nets, we tested whether the order of sampling had an effect on stygofauna ZOTU richness using an analysis of variance (ANOVA) with ‘bore’ as a random factor. Differences between community composition before and after haul net sampling were likewise tested using a Permutational multivariate analyses of variance (PERMANOVA, distance=’jaccard’, permutations=999) with bore as a random factor. Finally, a multipattern analysis was run using the R package ‘indicspecies’ (de Caceras &Legendre, 2009) to determine which taxa were more prevalent in the sampling before or after net-haul samples.
An asymptotic regression curve was fitted to the accumulation curves for each replicate using the ‘AR.2’ function in the R package ‘drc’ (Ritz, Baty, Streibig, & Gerhard, 2015). The replicates for each bore hole were then merged and a Mantel-test from the R package ‘ade4’ (Dray & Dufour, 2007) was used to test the correlations between similarity in community composition and the distances between the bores. The clustering of bores was analysed using the SIMPROF (Similarity Profile Analysis) function in the ‘CLUSTIG’ package (Whitaker & Christman, 2014).