Estimating the extended and hidden species diversity from environmental DNA in hyper-diverse regions

Cite this dataset

Juhel, Jean-Baptiste et al. (2022). Estimating the extended and hidden species diversity from environmental DNA in hyper-diverse regions [Dataset]. Dryad.


Species inventories are the building blocks of our assessment of biodiversity patterns and human impact. Yet historical inventories based on visual observations are often incomplete, impairing subsequent analyses of ecological mechanisms, extinction risk and management success. Environmental DNA (eDNA) metabarcoding is an emerging tool that can provide wider biodiversity assessments than classical visual-based surveys. However, eDNA-based inventories remain limited by sampling effort and reference database incompleteness. In this study, we propose a new framework coupling eDNA surveys and sampling-theory methods to estimate species richness in under-sampled and hyper-diverse regions where some species remain absent from the checklist or undetected by visual surveys. We applied this framework to the coastal fish diversity in the heart of the Coral Triangle, the richest marine biodiversity hotspot worldwide. Combining data from 279 underwater visual censuses, 92 eDNA samples and an extensive custom genetic reference database, we show that eDNA metabarcoding recorded 196 putative species not detected by underwater visual census, including 37 species absent from the regional checklist. We provide an updated checklist of marine fishes in the 'Raja Ampat Bird's Head Peninsula' ecoregion with 2,534 species, including 1,761 confirmed and 773 highly probable presences. The Chao lower-bound diversity estimator, based on the incidence of rare species, shows that the region potentially hosts an additional 123 fish species, including pelagic, cryptobenthic and vulnerable species. The extended and hidden biodiversity, along with their asymptotic estimates, highlight the ability of eDNA to expand regional inventories and species distributions to better guide conservation strategies.
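To make the lower-bound idea concrete, the following is a minimal Python sketch of an incidence-based Chao estimator. It assumes the classic Chao2 form, which this description does not specify; the function name `chao2` and the toy samples are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def chao2(samples):
    """Classic incidence-based Chao2 lower bound on species richness.

    `samples` is a list of sets, one per sampling unit (e.g. one eDNA
    filter), each holding the species detected in that unit.
    Assumption: the classic Chao2 formula, not necessarily the exact
    variant used in the study.
    """
    m = len(samples)
    if m == 0:
        raise ValueError("at least one sampling unit is required")

    # Number of sampling units in which each species was detected.
    incidence = Counter(sp for unit in samples for sp in set(unit))
    s_obs = len(incidence)
    q1 = sum(1 for n in incidence.values() if n == 1)  # uniques
    q2 = sum(1 for n in incidence.values() if n == 2)  # duplicates

    k = (m - 1) / m
    if q2 > 0:
        undetected = k * q1 * q1 / (2 * q2)
    else:
        # Fallback form used when no species occurs in exactly two units.
        undetected = k * q1 * (q1 - 1) / 2
    return s_obs + undetected


# Toy data: 5 observed species, 3 uniques, 1 duplicate, 4 units.
toy = [{"a", "b", "c"}, {"a", "d"}, {"a", "b", "e"}, {"a"}]
estimate = chao2(toy)
```

The estimate grows with the number of species seen in only one sampling unit, which is why rare detections drive the projected number of undetected species.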


Updated Marine fish checklist

We constructed an extensive species checklist of the 'Bird's Head Peninsula' (BHP) region of West Papua Province based on historical fishing records and visual surveys (Kulbicki et al. 2013) covering the ecoregions of the study area, extended with species occurring within and in the adjacent ecoregions with similar environments (Allen & Erdmann 2012, Froese 2020), and with the specimens collected and observed during the 2017 survey. Species names were checked and updated using the authoritative reference and searchable online database Eschmeyer's Catalog of Fishes (2020). This extensive checklist identifies 2,534 marine fish species, including 1,761 species with confirmed occurrences belonging to 582 genera and 144 families; it also includes 736 species that are present in nearby regions and similar environments (see Appendix S4). This exceptional fish diversity is subject to a range of threats (Mangubhai et al. 2012, Campbell et al. 2020).

Underwater visual census

We retrieved data from 186 underwater visual census (UVC) transects performed during August-September 2014, September 2015 and March 2018 from the Reef Life Survey initiative. Additionally, we used data from 93 UVC transects performed between 2004 and 2013 in the region (Cinner et al. 2016, Fig. 1). All surveys used standardized protocols with two divers recording fish identity, abundance and size in 5 x 50 m blocks (2.5 x 50 m for Cinner et al. 2016) on either side of the transect line. The two transect blocks provide independent counts that are averaged to characterize the transect (Edgar et al. 2020).

Environmental DNA filtering and processing

We collected 92 water samples along the south coast of the BHP region of West Papua between October and November 2017 across different reef habitats (estuarine and brackish waters excluded) distributed over an area of 500 km from East to West, with a focus (80 of the 92 samples, or 87%) on the easternmost 210 km sector (Fig. 1). We collected the water samples in DNA-free plastic bags from a dinghy, during closed-circuit rebreather diving (depths between 10 and 100 m) as close as possible to the habitat, or using Niskin water samplers (depths between 100 and 300 m) (Hocdé et al. 2020). Every water sampling session was performed before, and never at the same time as, fish collection to avoid in situ contamination. We coupled a pressure and temperature sensor to the Niskin bottle to control the sampling depth and characterize the water mass via the vertical temperature profile. For each sample, we filtered 2 L of seawater with sterile Sterivex filter capsules (Merck© Millipore; pore size 0.22 µm) and disposable sterile syringes. Immediately after, we filled the filter units with lysis conservation buffer (CL1 buffer SPYGEN©) and stored them in 50 mL screw-cap tubes at -20°C. The DNA extraction and amplification were performed following a modified protocol of Pont et al. (2018), including 12 separate PCR amplifications per sample. A teleost-specific 12S mitochondrial rDNA primer pair (teleo; forward primer ACACCGCCCGTCACTCT, reverse primer CTTCCGGTACACTTACCATG; Valentini et al. 2016) was used for the amplification of metabarcoding sequences (see Appendix S1 for laboratory analyses and bioinformatic analyses).

Among fish eDNA 12S primers, teleo performs strongly in detecting fish diversity, even in highly diverse ecosystems (Collins et al. 2019, Polanco Fernández et al. 2022). Although alternative fish eDNA primers might cover a larger proportion of fishes in the reference database and hence be more informative for species identification, there are currently no primers located outside the 12S with similar performance (Zhang et al. 2020).

We followed a contamination control protocol during both field and laboratory stages (Valentini et al. 2016). Water sample processing included the use of disposable gloves and single-use filtration equipment, and the bleaching (50% bleach) of Niskin bottles between samples. Staff who performed eDNA filtration were not involved in tissue sampling of fish and used a dedicated workspace to avoid both contact and airborne contamination.

Genetic reference database completion

During the same survey along the south-western coast of the BHP in West Papua, we collected 1,466 individuals from 413 species, 180 genera and 69 families of fishes along the shore. The specimens were mainly collected by hand or with 4 to 8 m long bottom gillnets deployed by open-circuit and closed-circuit divers in the 0-100 m depth range (Hocdé et al. 2020). Some brackish and estuarine fishes were also collected with 10 m beach purse seines, and pelagic fish with line fishing and spearfishing. We used morphological features and 652 bp CO1 (Cytochrome Oxidase 1) targeted genetic sequencing to identify the specimens. We then amplified and sequenced the individuals on a large fraction of the 12S mitochondrial rDNA region (480 bp) with two distinct pairs of primers, designed for teleosts and elasmobranchs respectively, to improve sequencing results. Finally, the 12S teleo region defined in Valentini et al. (2016) was extracted from the obtained sequences to complement the EMBL genetic reference database (European Molecular Biology Laboratory, version 141, downloaded in January 2020, Baker et al. 2000) and improve taxonomic assignments (see Appendix S2 and S3 for the reference database and the methodological details of its completion).

To evaluate the completeness of the online database for the teleo region of the 12S mitochondrial DNA, we performed an in silico PCR on the EMBL database with ecoPCR (Ficetola et al. 2010) using the teleo primer sequences, allowing up to 3 mismatches. We compared the generated list of sequenced species to the extensive species checklist of the BHP ecoregion. Among the 1,761 species of the Bird’s Head Peninsula checklist for which presence is confirmed, only 496 species (28%) were sequenced in EMBL for the teleo region. The addition of sequences retrieved from our fish sampling increased this list to 762 sequenced species (43.4%). Additionally, 21 species absent from the historical checklist were collected, or observed and clearly identified, during the development of the genetic reference database (see Appendix S4 for the extensive checklist).
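The in silico PCR and the completeness check can be illustrated with a short Python sketch. This is not ecoPCR itself: it is a simplified stand-in that only counts substitutions (no indels, no 3'-end constraints), and the helper names `in_silico_amplicon` and `checklist_coverage` as well as the toy template are hypothetical. Only the teleo primer sequences and the 3-mismatch tolerance come from the text above.

```python
# teleo primers as given in the text (Valentini et al. 2016).
TELEO_FWD = "ACACCGCCCGTCACTCT"
TELEO_REV = "CTTCCGGTACACTTACCATG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(seq, primer, max_mismatch):
    """Start positions where `primer` matches `seq` within the tolerance."""
    n = len(primer)
    return [i for i in range(len(seq) - n + 1)
            if count_mismatches(seq[i:i + n], primer) <= max_mismatch]


def in_silico_amplicon(seq, fwd=TELEO_FWD, rev=TELEO_REV, max_mismatch=3):
    """Return the barcode between the two primers, or None if no product.

    Simplification: substitutions only, and the first valid primer pair
    along the sequence is taken.
    """
    rev_sites = find_sites(seq, reverse_complement(rev), max_mismatch)
    for start in find_sites(seq, fwd, max_mismatch):
        inner_start = start + len(fwd)
        for end in rev_sites:
            if end >= inner_start:
                return seq[inner_start:end]
    return None


def checklist_coverage(checklist, sequenced):
    """Fraction of checklist species that have a reference sequence."""
    checklist = set(checklist)
    return len(checklist & set(sequenced)) / len(checklist)


# Toy template: flank + forward primer + barcode + reverse site + flank.
toy_barcode = "TTTTAAAATTTT"
toy_template = ("GGGG" + TELEO_FWD + toy_barcode
                + reverse_complement(TELEO_REV) + "GGGG")
amplicon = in_silico_amplicon(toy_template)
```

Applying `checklist_coverage` to the confirmed checklist and to the set of species recovered by the in silico PCR yields the kind of completeness proportion reported above.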

Taxonomic assignments

The metabarcoding workflow was based on the VSEARCH toolkit and the clustering algorithm SWARM, which groups multiple sequence variants into MOTUs (Molecular Operational Taxonomic Units, Mahé et al. 2014) to clean PCR and sequencing errors. We performed taxonomic assignments using the ecotag program (lowest common ancestor algorithm) from the OBITOOLS toolkit (Boyer et al. 2016) against our custom reference database and the global public EMBL genetic database (release 141, downloaded in January 2020). For each MOTU, we chose the taxonomic assignment with the highest similarity from either the custom reference database or EMBL. We only retained the assignments with 100% similarity to either reference database, that is, those matching perfectly over the full length of the sequence (see Appendix S1). Some sequences could match at 100% but correspond to several species due to the limited taxonomic resolution of our marker region, preventing a taxonomic assignment at the species level. For those sequences, we determined, where possible, the most probable species being detected based on the list of species corresponding to the sequence and the known spatial distribution of those species. For other sequences, it was not possible to narrow down the list of possible species because all of them are known to occur in the region or in its vicinity, so these sequences were tagged with a list of possible assignments (Appendix S6) and removed from the analyses.
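The retention rule described above can be sketched in Python as follows. This is a simplified illustration, not the ecotag workflow: the function `assign_motu`, the tuple layout of the hits, the status labels and the placeholder species names are all hypothetical, and the distribution-based step is reduced to a membership test against a regional species set.

```python
def assign_motu(hits_custom, hits_embl, regional_species):
    """Apply a 100%-similarity filter to the hits of one MOTU.

    Each hit is a (species, similarity) pair with similarity in [0, 1].
    Returns (status, candidates), where status is one of:
      "unassigned" - no perfect match in either database
      "species"    - exactly one plausible species remains
      "ambiguous"  - several species remain; excluded from analyses
    Assumption: a candidate is "plausible" if it belongs to
    `regional_species`; the study relied on expert knowledge of
    species distributions instead.
    """
    hits = list(hits_custom) + list(hits_embl)
    # Keep only perfect matches, pooled over both reference databases.
    perfect = sorted({sp for sp, sim in hits if sim == 1.0})
    if not perfect:
        return ("unassigned", [])
    if len(perfect) == 1:
        return ("species", perfect)

    # Several species share the same barcode: try to narrow the list
    # using the known regional distribution.
    in_region = [sp for sp in perfect if sp in regional_species]
    if len(in_region) == 1:
        return ("species", in_region)
    return ("ambiguous", in_region if in_region else perfect)


# Placeholder names, not real detections.
regional = {"sp_A", "sp_B"}
result = assign_motu([("sp_A", 1.0)], [("sp_C", 1.0)], regional)
```

In this toy call, two species share a perfect match but only one occurs in the regional set, so the MOTU is resolved to that species; had both been regional, it would be tagged as ambiguous and left out of the analyses.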

Fish traits

The extended fish diversity may be characterized by certain traits or behaviors that limit detection by classical (fishing or visual records) and eDNA surveys (Thalinger et al. 2021). To investigate this bias, we retrieved available data on habitat (reef or pelagic), diet, circadian activity, maximum body length, and IUCN (International Union for Conservation of Nature) conservation status for all the species detected by eDNA from FishBase and compared them among the different sets of species.