Skip to main content

Environmental DNA in a global biodiversity hotspot: Lessons from coral reef fish diversity across the Indonesian archipelago

Cite this dataset

Marwayana, Onny; Gold, Zachary; Meyer, Christopher; Barber, Paul (2021). Environmental DNA in a global biodiversity hotspot: Lessons from coral reef fish diversity across the Indonesian archipelago [Dataset]. Dryad.


Indonesia is the heart of the Coral Triangle, the world’s most diverse marine ecosystem. Preserving the biological and economic value of this marine biodiversity requires efficient and economical ecosystem monitoring, yet our understanding of marine biodiversity in this region remains limited. Towards this end, this study uses environmental DNA (eDNA) to survey fish communities across a well-documented biodiversity gradient in Indonesia. A total of  6,608,693 sequence reads of MiFish 12S rRNA from 39 sites spanning 7 regions of Indonesia revealed 1,099 fish Amplified Sequence Variants (ASVs), 80.4% of which could be identified to species through the inclusion of new reference sequences from Mo’orea BIOCODE. Patterns of regional fish diversity inferred from eDNA broadly conformed to expectations, with the highest fish biodiversity in Raja Ampat and lowest in Western Indonesia. Similarly, zeta diversity analysis showed greater community turnover in higher diversity reefs of Eastern Indonesia, and greater community similarity in low diversity regions of Western Indonesia. However, several results highlight challenges for eDNA in megadiverse ecosystems. Despite a two-fold difference in fish diversity between Eastern and Western Indonesia, mean ASVs recovered per one-liter seawater was relatively similar across 7 regions of Indonesia. Moreover, although ASV recovery from individual seawater samples saturated, ASV recovery did not saturate at the site or region level, indicating that sampling/sequencing efforts employed in lower diversity temperate marine ecosystems are insufficient for biodiversity hotspots like the Coral Triangle. Despite these limitations, 36.3% to 84.1% (mean 57.1%) of fish species detected by eDNA at 8 sites within Raja Ampat were not observed during intensive visual surveys. Taxa missed include pelagic (tuna, jacks, scads, mackerels), nocturnal (soldierfish, lanternfish), and crevice dwelling species (eels, blennies, gobies) that are difficult to document in visual surveys. These results demonstrate the added value of eDNA in biodiversity hotspots like the Coral Triangle and the need for further research to understand how best to sample eDNA in high diversity regions like the Coral Triangle to deliver on the promise of eDNA as a tool to monitor marine biodiversity effectively and efficiently.  


Sampling Sites

We collected eDNA samples across Indonesia, spanning a strong fish biodiversity gradient (Bellwood & Meyer, 2009; Roberts et al., 2002). Sampling focused on 7 reef systems within three regions (Fig. 1), including: 1) outside the Coral Triangle in Western Indonesia (Aceh and Batam-Bintan), 2) lower diversity regions of the Coral Triangle in Central Indonesia (Derawan and Wakatobi), and 3) high diversity regions of the Coral Triangle (Eastern Indonesia: Lembeh Strait, Ternate, and Raja Ampat) that have the world’s highest reef fish biodiversity (Allen & Werner, 2002; Bellwood & Meyer, 2009; Roberts et al., 2002).


eDNA Sampling

To assess marine fish diversity with eDNA, we employed a hierarchical sampling design (Table S1). To capture spatial variability in both habitat and eDNA signatures, within each of the 7 reef systems above, we sampled multiple sites separated by at least five kilometers (Andruszkiewicz et al., 2017; Miya et al., 2015), except for Wakatobi, where sites were 2.5 km distant. To maximize the capture of diversity at each sampling site and account for fine-scale spatial heterogeneity in local eDNA signatures, we collected three one-liter “biological replicates” at each site following standard sampling protocols used in temperate ecosystems (Miya et al., 2015); each replicate was no more than 3 meters from each other. To minimize variation in community composition associated with depth, each seawater sample was collected on SCUBA, 1 m off the reef, at depths between 11-15 m.

To isolate eDNA from seawater samples, we filtered one liter of seawater through a 0.22-micron Sterivex™ filter (Millipore®, SIGMA MILLIPORE) following the methods of Miya et al. (2015) and Curd et al. (2018) with one key modification: we collected individual water samples in sterile one liter Kangaroo™ Gravity Feeding Bags (similar to intravenous drip bags) that allow for gravity filtration through the Sterivex columns, a method ideally suited to remote field locations. In addition, we filtered one liter of bottled drinking water at each location to serve as a “field blank” negative control. Filters were stored in a -20˚ C freezer until eDNA was extracted at the Indonesian Biodiversity Research Center (now “BIONESIA”).


eDNA Extraction, Amplification, and Sequencing

We extracted eDNA samples and blanks using the DNeasy Blood & Tissue Kit (QIAGEN, Hilden, Germany) following the modified extraction protocol of Spens et al. (2017), adding 720 µL of ATL buffer and 80 µL of proteinase K directly into the filter cartridge. Extracted DNA was then stored at -20˚C and transported to the University of California, Los Angeles for amplification and library preparation.

We amplified extracted eDNA using the Multiplex PCR Kit (QIAGEN, Hilden, Germany), using the MiFish Teleost 12S rRNA mitochondrial DNA (mtDNA) primers (Miya et al., 2015). We conducted each individual eDNA sample via Polymerase Chain Reactions (PCR) in triplicate to account for potential PCR bias (Andruszkiewicz et al., 2017; Miya et al., 2015; Taberlet et al., 2012). Each PCR reaction consisted of 12.5 µL Qiagen 2x Master Mix, 2.5 µL (2 mM) of the primer, 6.5µL nuclease free water, and one µL the DNA extract. Thermocycling parameters utilized a touchdown protocol, beginning with a 15-minute pre-denaturation step at a 95 °C, followed by a touchdown thermocycling profile consisting of 30 seconds denaturing at 94 °C, 30 seconds annealing at 69.5 °C, and 30 seconds extension at 72 °C, with the annealing temperature dropping by 1.5 °C per cycle until 50 °C. Following this initial touchdown phase, the main cycle consisted of 25 cycles of 94 °C for 30 seconds for denaturation, 50 °C for 30 seconds for annealing and 72 °C for 45 seconds for extension, concluding with a 10-minute final extension at 72 °C. We confirmed successful PCR reactions through electrophoresis of 5µL products for 30 minutes at 150 volts on 2% agarose gels stained with 6x SYBR™ Green (ThermoFisher Scientific, Waltham, MA, USA) for visualization.

To create the sequencing libraries, we pooled the triplicate PCR products from each individual eDNA sample into a single tube and purified these pooled PCR products using the Serapure bead protocol (Faircloth et al., 2014). Next, we quantified the DNA concentration (ng/µL) of each pooled PCR sample using the Qubit Broad Range dsDNA assay (ThermoFisher, Waltham, MA, USA) following the manufacturer protocol, and then then subsequently normalized pooled PCRs to equimolar concentration. We then used the Nextera DNA Library Preparation Kit (Illumina, San Diego, CA, USA) to index each pooled sample using a unique combination of Nextera i5 and i7 primers in a second PCR reaction, following the manufacturer protocol (Curd et al., 2018). The indexing PCR reaction consisted of 12.5 µL Kapa High Fidelity Master Mix (Roche, Basel, Switzerland) 0.625 µL of 1 µM i5 Illumina Nextera indices, 0.625 µL of 1 µM i7 Illumina Nextera indices, and 11.25 µL of PCR product for a total of 10ng of DNA. The indexing PCR thermocycling parameters began with an initial denaturation of 95 ˚C for 5 minutes, followed by 8 cycles of: 98 ˚C denaturation for 30 seconds, 56 ˚C annealing for 30 seconds, and 72 ˚C extension for 3 minutes, ending with a 72 ˚C extension for five minutes. To ensure the indexing PCR was successful, we electrophoresed indexed PCR products at 120 V for 45 minutes on a 2% agarose gel. We then created the final sequencing library by combining the cleaned and quantified indexed PCR products in equal concentration of 10 ng/µl per sample. Libraries were pooled by barcode resulting in two pooled libraries. The final libraries were sequenced at the University of California Berkeley sequencing core on an Illumina MiSeq platform utilizing 300 base pair paired end sequencing.


Bioinformatics and Data Analysis

To determine fish community composition, we used the Anacapa Toolkit (version: 1) to conduct quality control, amplicon sequence variant (ASV) parsing, and taxonomic assignment using user-generated custom reference databases (Curd et al., 2018). The Anacapa Toolkit sequence QC and ASV parsing module relies on cutadapt (version: 1.16) (Dobin et al., 2013), FastX-toolkit (version: 0.0.13) (Gordon & Hannon, 2010), and DADA2 (version 1.6) (Callahan et al., 2016) as dependencies, and the Anacapa classifier modules relies on Bowtie2 (version 2.3.5) (Langmead & Salzberg, 2012) and a modified version of BLCA (Gao, et al., 2017) as dependencies. We processed sequences using the default parameters and assigned taxonomy using a CRUX-generated reference database supplemented with additional 12S reference sequences generated by the Smithsonian (GenBank accessions MZ597881-MZ598481, Table S2). We note that CRUX relies on ecoPCR (version: 1.0.1) (Boyer et al., 2016), blastn (version: 2.6.0) (Altschul et al., 1990; Camacho et al., 2009), and Entrez-qiime (version: 2.0) (Baker, 2016) as dependencies.

The raw ASV community table was decontaminated following Kelly et al. (2018) and McKnight et al. (2019) (See Supplemental Methods). However, we did not conduct site occupancy modeling as we were specifically interested in exploring under sampling of total ASV diversity. Because of noted issues with incomplete reference databases (Juhel et al., 2020), diversity analyses focused on ASVs rather than ASVs successfully assigned to species. In addition, because we focused on fish and the MiFish primers amplify vertebrates more broadly, we filtered out any ASVs that were not Actinopterygii and Chondrichthyes ASVs. Similarly, because of sequence similarities in 12S, ASVs can be erroneously assigned to taxa that are not native to local ecosystems (Gold et al., 2021). As such, we filtered out any ASVs assigned to fishes not known from the Coral Triangle.  After applying these filters, we then transformed all read counts into an eDNA index for beta-diversity statistics (Kelly et al., 2019).


Patterns of Biodiversity Across Indonesia

We conducted analyses of fish biodiversity in a hierarchical fashion. First, we examined diversity at the level of individual sample replicate, where fish diversity was represented by eDNA sequences amplified from each of the three one-liter water samples at an individual sampling site. Second, we examined diversity at the site level, where fish diversity was represented by combining replicate eDNA sequences amplified from each site (Table S1). Lastly, we examined diversity in each of the seven sampled regions by combining diversity across all sites sampled. To better understand the efficacy of standard eDNA sampling protocols in high diversity marine ecosystems, we compared ASV recovery from our high diversity Indonesian reefs to Scorpion Bay, Santa Cruz Island, California, a lower diversity temperate marine ecosystem (Gold et al., 2021) as samples were collected, processed, an analyzed using the same methods outlined above. All analyses were conducted in in R (R Core Team, 2020).

We examined patterns of alpha diversity by exploring ASV richness at each hierarchical level. We then compared sample coverage estimates by conducting ASV accumulation curves using the iNext package (version: 2.0.20) in R (Hsieh et al., 2016). Specifically, we explored the accumulation of ASVs 1) across all replicate samples within each site, 2) within all replicate samples within each region, and 3) within all replicate’s sites within each region. We then explored betadiversity patterns across sample replicates, sites, and regions by calculating Jaccard-binary dissimilarity indices and running PERMANOVA and companion homogeneity of dispersion test (adonis and betadisper functions from vegan (version: 2.5-6) in R). Results of the PERMANOVA were visualized using a nonmetric multidimensional scaling (NMDS) through the phyloseq (version: 1.32.0) and vegan packages in R (R Core Team, 2020).

To explore the scale of species turnover across the biogeographic regions, we conducted zeta-diversity analyses using the zetadiv (version 1.2.0) package (Latombe et al., 2017). Zeta diversity measures the degree of overlap between any set of observed communities (Hui & McGeoch, 2014), in contrast to betadiversity which only compares pairwise overlap (Hui et al., 2018; Latombe et al., 2017; McGeoch et al., 2019). Importantly, the zeta diversity framework allows for the calculation of diversity metrics across multiple samples and sampling strata to better understand patterns of biodiversity (McGeoch et al., 2019), particularly the scale and shape of community turnover (Hui et al., 2018; McGeoch et al., 2019; Simon et al., 2019). From this framework there are two important zeta diversity metrics: zeta diversity decay and species retention rates (McGeoch et al., 2019). Zeta diversity decay measures the decreasing number of shared taxa across greater observed communities (McGeoch et al., 2019). For small numbers of samples (n<5) this can be visually represented as the center of the Venn Diagram between multiple observed communities (i.e., individual eDNA samples).

We compared species retention rates and zeta diversity decay (i.e., the decreasing number of shared taxa across more observed communities) across 1) all replicate samples within each site, 2) within all replicate samples within each region, and 3) within all replicate sites within each region. In particular, we compare the rate of zeta diversity decay and species retention rates across the region to identify sites with lower community turnover across sampling strata. We further explore zeta diversity decay over distance to explore the scale of community turnover across sample replicates, sites, and regions through eDNA data.


eDNA Survey and Visual Census Data

Many eDNA studies do contemporaneous paired comparisons of eDNA to visual census data to compare the efficacy of each approach for biodiversity monitoring (e.g. Gold et al., 2021; Valdivia-Carrillo et al., 2019; Fernández et al., 2020, Stoeckle et al., 2021, Stat et al., 2019). However, comprehensive fish biodiversity studies in the Coral Triangle involve high-intensity visual census surveys involving multiple divers spending 6 hours per day over 10-21 days across a wide range of habitats to capture as much fish diversity as possible (G. Allen, M. Erdmann, pers comm.). Because we could not conduct contemporaneous paired studies, we explored the differences in eDNA and high-intensity visual surveys in capturing local fish diversity by sampling eDNA at three reef sites within Raja Ampat that G. Allen and M. Erdmann comprehensively sampled previously with visual surveys during multiple dives over multiple days. We then compared taxa recovered by eDNA to the visual census data, not to determine which was superior, but instead to better understand the relative strengths of weaknesses of each approach.