From sight to sequence: Underwater visual census vs eDNA metabarcoding for the monitoring of taxonomic and functional fish diversity
Data files
Jun 10, 2025 version files (754.56 MB)
- README.md (3.99 KB)
- Roblet_Ecological_indicators_2024.zip (754.55 MB)
Abstract
Fish monitoring is essential for assessing the effects of natural and anthropic stressors on marine ecosystems. In this context, environmental DNA (eDNA) metabarcoding appears to be a promising tool, due to its efficiency in species detection. However, before this method can be fully implemented in monitoring programs, more studies are needed to evaluate its ability to assess the composition of fish assemblages compared with traditional survey methods that have been used for decades. Here, we used both eDNA metabarcoding and Underwater Visual Census (UVC) to assess the taxonomic and functional diversity (presence-absence data) of Mediterranean fish communities. We collected eDNA samples and performed UVC strip transects inside and outside four Marine Protected Areas in the Mediterranean Sea. Samples for eDNA analysis were collected by filtering seawater simultaneously at the surface and the bottom, and DNA was amplified using a combination of three sets of primers. We found that eDNA alone made an outstanding characterisation of fish composition with the detection of 95% of the 60 taxa identified in this study, whereas UVC recovered only 58% of them. Functional diversity was better evaluated with eDNA than with UVC, with the detection of a greater breadth of functional traits. eDNA was even better at characterising functional than taxonomic diversity, providing reliable information on ecosystem functioning with little sampling effort. Together these results suggest that eDNA metabarcoding offers great potential for surveying complex marine ecosystems. Combining eDNA metabarcoding and UVC in integrated monitoring programs would therefore improve monitoring strategies and enhance our understanding of fish communities, a key step promoting their conservation.
https://doi.org/10.5061/dryad.9cnp5hqt5
Description of the data and file structure
eDNA raw data description
- Library AOYG-211 contains sequences obtained using AcMDB07 primer set
- Library AOYG-209-merged contains sequences obtained using Fish16S primer set
- Library AOYG-197 contains sequences obtained using Vert16S primer set
Each library folder contains two files for every PCR replicate, including controls: one for R1 reads and one for R2 reads. Some fastq files are empty because no sequence was amplified and sequenced for the corresponding PCR replicates.
The names of PCR replicates corresponding to field samples are:
ECSBXXX (XXX corresponding to the sample’s number)
Various quality controls were conducted at each step of the protocol to identify potential contamination, ensuring an accurate interpretation of the results. For each primer set, the following controls were performed: four negative extraction controls, three negative PCR controls, two positive controls, and eight bioinformatic controls.
Positive controls correspond to eDNA samples collected from previous projects analysed by Argaly and targeting marine fishes.
The names of PCR replicates corresponding to controls are:
- BLNKXXX, for bioinformatics blanks that allow tag jumps to be filtered
- CEXTXXX, for extraction controls
- CPCRXXX, for PCR controls
- CPOSXXX, for positive controls
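The naming scheme above can be used to sort replicates into field samples and controls before analysis. The following is a minimal sketch, assuming replicate names always begin with one of the five four-letter prefixes listed here; the function name and the example replicate names are hypothetical.

```python
# Prefix-to-category map taken from the naming scheme described above.
REPLICATE_PREFIXES = {
    "ECSB": "field sample",
    "BLNK": "bioinformatic blank",
    "CEXT": "extraction control",
    "CPCR": "PCR control",
    "CPOS": "positive control",
}


def classify_replicate(name: str) -> str:
    """Return the category of a PCR replicate from its four-letter prefix."""
    return REPLICATE_PREFIXES.get(name[:4].upper(), "unknown")


# Hypothetical replicate names, used only for illustration.
for example in ["ECSB012", "BLNK003", "CPOS001"]:
    print(example, "->", classify_replicate(example))
```

Names that do not match any listed prefix are reported as "unknown" rather than guessed.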
For each primer set, a text file (.ngsfilter) describes the tag combinations associated with each PCR replicate.
UVC raw data
A CSV file contains the raw UVC data.
Each row corresponds to one UVC transect. For example, 220228.PQR.2.IN.Roche.P.1.1 is transect 1 of the site Roche 1 (Roche.P.1.1), located inside the Pequerolle MPA (PQR.IN) and surveyed on 28 February 2022 (220228).
Each column corresponds to a given fish taxon (species, genus, or family).
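Transect identifiers can be split into their components when loading the UVC table. The sketch below assumes every identifier follows the dot-separated layout of the single example given above (date as YYMMDD, MPA code, one field whose meaning is not documented here, a protection code, then the site label). The function and key names are hypothetical.

```python
from datetime import date, datetime


def parse_transect_id(transect_id: str) -> dict:
    """Split a UVC transect identifier into its dot-separated components."""
    parts = transect_id.split(".")
    return {
        # First field: survey date encoded as YYMMDD.
        "date": datetime.strptime(parts[0], "%y%m%d").date(),
        "mpa_code": parts[1],
        # The meaning of the third field is not documented in this README.
        "undocumented_field": parts[2],
        # Only the "IN" code is explained above; other codes are kept raw.
        "protection_code": parts[3],
        "inside_mpa": parts[3] == "IN",
        # Remaining fields form the site and transect label.
        "site": ".".join(parts[4:]),
    }


info = parse_transect_id("220228.PQR.2.IN.Roche.P.1.1")
print(info["date"], info["mpa_code"], info["inside_mpa"], info["site"])
```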
Functional raw data
A CSV file contains the values, for each fish taxon, of the functional traits used in this study to compute functional diversity metrics.
Functional traits:
- Maximum size
  - 1 = very small (< 10 cm)
  - 2 = small (10 - 30 cm)
  - 3 = medium (30 - 50 cm)
  - 4 = large (50 - 100 cm)
  - 5 = very large (> 100 cm)
- Schooling behaviour
  - 1 = non-schooling (solitary)
  - 2 = facultative schooler (can form schools)
  - 3 = obligate schooler (always in schools)
- Depth range
  - 1 = shallow (< 10 m)
  - 2 = medium (up to 50 m)
  - 3 = deep (> 50 m)
  - 4 = broad (covering more than one range)
- Mobility
  - 1 = sedentary
  - 2 = mobile (mobile within a reef)
  - 3 = very mobile (mobile between reefs)
- Period of activity
  - 1 = diurnal
  - 2 = both diurnal and nocturnal
  - 3 = nocturnal
- Position in the water column
  - 1 = benthic
  - 2 = demersal
  - 3 = pelagic
- Diet (categories based on trophic level, TL)
  - Herbivore: TL 2 - 2.1
  - Omnivore1 (omnivores with a preference for vegetable material): 2.1 < TL < 2.9
  - Omnivore2 (omnivores with a preference for animal material): 2.9 < TL < 3.7
  - Carnivore (preference for invertebrates): 3.7 < TL < 4
  - Piscivore (preference for fish and cephalopods): TL > 4
Example of interpretation for the species Anthias anthias:
| Species | Size | Position | Schooling | Activity | Mobility | Depth range | Diet |
|---|---|---|---|---|---|---|---|
| Anthias anthias | 2 | 2 | 2 | 1 | 1 | 4 | Carnivore |
Size: 2 = Small
Position: 2 = Demersal
Schooling: 2 = Facultative schooler
Activity: 1 = Diurnal
Mobility: 1 = Sedentary
Depth range: 4 = Broad
Diet: Carnivore = Carnivore
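The same decoding can be applied programmatically to every row of the trait table. This is a minimal sketch, assuming the CSV columns are named as in the example table above (the actual file headers may differ); the dictionary and function names are hypothetical.

```python
# Code-to-label maps copied from the trait definitions in this README.
TRAIT_LEVELS = {
    "Size": {1: "very small", 2: "small", 3: "medium", 4: "large", 5: "very large"},
    "Position": {1: "benthic", 2: "demersal", 3: "pelagic"},
    "Schooling": {1: "non-schooling", 2: "facultative schooler", 3: "obligate schooler"},
    "Activity": {1: "diurnal", 2: "both diurnal and nocturnal", 3: "nocturnal"},
    "Mobility": {1: "sedentary", 2: "mobile", 3: "very mobile"},
    "Depth range": {1: "shallow", 2: "medium", 3: "deep", 4: "broad"},
}


def decode_traits(row: dict) -> dict:
    """Replace numeric trait codes with labels; other columns pass through."""
    decoded = {}
    for column, value in row.items():
        levels = TRAIT_LEVELS.get(column)
        decoded[column] = levels[int(value)] if levels else value
    return decoded


# Row taken from the Anthias anthias example above.
anthias = {
    "Species": "Anthias anthias", "Size": 2, "Position": 2, "Schooling": 2,
    "Activity": 1, "Mobility": 1, "Depth range": 4, "Diet": "Carnivore",
}
print(decode_traits(anthias))
```

Diet is already stored as a category name, so it is left unchanged.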
Sharing/Access information
n/a
Code/Software
n/a
Metabarcoding: water sampling
We followed the eDNA metabarcoding sampling method described in Roblet et al. (2024), which is highly effective in detecting fish species. This sampling strategy relies on the simultaneous filtration of two samples along the same transect, one from the surface (i.e., one meter below the surface) and the other from the bottom (i.e., one meter above the seafloor). For bottom samples, seawater filtration was performed by divers with the underwater pump attached to a diver propulsion vehicle. For surface samples, seawater filtration was conducted from a boat following the divers to ensure that surface and bottom samples were collected along the same transect. Four transects, each with two samples (one surface and one bottom), were conducted in each of the four MPAs, resulting in a total of 32 eDNA samples of 30 L.
Immediately after sampling, 50 mL of Longmire buffer solution (Longmire et al., 1997) was injected into the eDNA capsules to allow the long-term conservation of eDNA before the laboratory procedure. eDNA samples were always manipulated with gloves to avoid contamination. Back at the lab, capsules were shaken vigorously and the eDNA extract was stored at room temperature in the dark until extraction.
During the field campaign, several negative field controls were performed to check for contamination that could have occurred during capsule handling on the boat. These controls consisted of 1 L of ultrapure water filtered with a capsule connected to the surface pump. They were treated the same way as true field samples.
Metabarcoding: lab processing
DNA extraction and PCR amplification were performed by Argaly (Sainte-Hélène-du-Lac, France), in dedicated laboratories for handling eDNA samples. Extraction was conducted following the NucleoSpin Soil kit protocol (Macherey Nagel) with the following modifications: the 50 mL falcon tubes were centrifuged for 1 h at 12,000 g. The pellets were then resuspended in ATL buffer and proteinase K for 2 h at 56°C to lyse cells and cell debris. The extraction procedure was continued according to the manufacturer's protocol and the resulting DNA extracts were eluted in a final volume of 100 μL of elution buffer.
In this study, we used three primer sets selected from the recent pilot study (Roblet et al., 2024): Fish16S, an Actinopterygii-specific primer set that targets the 16S rRNA locus; AcMDB07, which is also Actinopterygii-specific and targets the 12S rRNA gene; and Vert16S, which targets the 16S locus but is Vertebrate-specific. Together, these primer sets can therefore detect both actinopterygian and chondrichthyan fish species. Each primer set showed promising results, allowing the detection of many fish species, and the three were complementary, which justifies combining them (Roblet et al., 2024).
For each of these three primer sets, the extracted DNA from each sample was amplified in 12 replicates. Each PCR replicate was uniquely identified by a combination of two eight-base tags appended to the PCR primer at the 5’ end. These tags were used during bioinformatics analysis to assign sequences to the corresponding replicate. Following amplification, all samples were purified with the MinElute purification kit (Qiagen). Library constructions and sequencing were then performed by Fasteris (Geneva, Switzerland). The libraries were prepared according to the Metafast protocol, designed to minimize sequencing artefacts. The libraries were then sequenced in several Illumina MiSeq runs with paired-end reads of 2 x 250 bp (for AcMDB07 and Vert16S) or 2 x 150 bp (for Fish16S).
Various quality controls were conducted at each step of the protocol to identify potential contamination, ensuring an accurate interpretation of the results. For each primer set, the following controls were performed: four negative extraction controls, three negative PCR controls, two positive controls and eight bioinformatic controls. The success of the amplifications and purifications was confirmed on a 2% agarose gel (E-Gel Power Snap, Invitrogen).
Metabarcoding: Bioinformatics
Argaly conducted the bioinformatic steps, using the following procedure: the raw sequence data for each primer set were analysed using the suite of OBITools programs (https://pythonhosted.org/OBITools/welcome.html; Boyer et al., 2016), which is designed specifically for metabarcoding data analysis. For each primer set, paired sequences were assembled (“illuminapairedend” command). Then, sequences with an alignment score >= 40 (i.e., corresponding to an overlap of at least 10 bases) were assigned to the corresponding amplification replicate using the tags inserted at the 5' end of the primers (“ngsfilter” command). The resulting dataset was dereplicated (“obiuniq” command), then filtered (“obigrep” command) to remove low-quality sequences (i.e., containing at least one N), sequences whose length fell outside the length range observed in silico for the target group, and singletons (i.e., sequences observed only once in the dataset). Sequences sharing 97% identity were grouped into clusters using SumaClust (Mercier et al., 2013). The abundances of sequences within each cluster were summed for each PCR replicate. The cluster head, representing the most abundant sequence in the cluster, was chosen as the representative sequence. Clusters appearing fewer than 10 times in a sample were deleted. A taxonomic assignment of the cluster heads was performed using the “ecotag” command to obtain a list of MOTUs (Molecular Operational Taxonomic Units). The reference sequences used for this taxonomic assignment were obtained by performing an in silico PCR on the public sequence database GenBank (v.249) using the ecoPCR program (Ficetola et al., 2010). This in silico PCR was conducted using the PCR primers associated with each marker, allowing a maximum of three mismatches per primer and retaining only sequences assigned at least at the family level.
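To make the dereplication and filtering criteria concrete, the sketch below reproduces their logic in plain Python on a toy list of reads. It is not the OBITools implementation and does not call OBITools; the function name, the length bounds and the example reads are hypothetical.

```python
from collections import Counter


def dereplicate_and_filter(reads, min_len, max_len):
    """Count identical reads, then apply the three filters described above."""
    counts = Counter(reads)  # dereplication: one entry per unique sequence
    kept = {}
    for seq, n in counts.items():
        if "N" in seq.upper():
            continue  # low quality: contains at least one ambiguous base
        if not (min_len <= len(seq) <= max_len):
            continue  # outside the expected amplicon length range
        if n == 1:
            continue  # singleton: observed only once in the dataset
        kept[seq] = n
    return kept


# Toy reads, far shorter than real amplicons, for illustration only.
toy_reads = ["ACGT"] * 3 + ["ACNT"] * 2 + ["AC"] * 2 + ["ACGA"]
print(dereplicate_and_filter(toy_reads, min_len=3, max_len=5))
```

In this toy input only the sequence seen three times passes all filters; the others are removed for containing an N, for being too short, or for being a singleton.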
The R package “metabaR” (Zinger et al., 2021) was then used to remove artefactual sequences that are present in low abundance in the metabarcoding data, but which may influence the ecological conclusions that can be drawn from them (Calderón‐Sanou et al., 2020). This process included removing i) MOTUs with a sequence similarity below 0.95 to any sequence in the reference database, as they are potential chimeras; ii) MOTUs whose frequency over the entire dataset is maximum in at least one negative control ("max" method of the “contaslayer” function), because they are potential contaminants; and iii) MOTUs with a relative frequency < 0.03% within a PCR replicate (“tagjumpslayer” function), because they are potential artefacts generated during library construction for sequencing (i.e., "tag jumps"; Schnell et al., 2015). PCR replicates with a sequencing coverage < 100 sequences were also removed, and the remaining PCR replicates were then aggregated by sample using the “aggregate_pcrs” function. Finally, MOTUs observed fewer than 10 times in a sample were recoded as absent in that sample.
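The abundance thresholds in this step can be illustrated with a small Python sketch. It follows the wording of the paragraph above rather than the internals of metabaR, and it assumes that replicate coverage is evaluated before the tag-jump filter is applied. All names and the toy counts are hypothetical.

```python
from collections import defaultdict

TAGJUMP_MIN_FREQ = 0.0003    # 0.03 % relative frequency within a PCR replicate
MIN_REPLICATE_READS = 100    # minimum sequencing coverage per PCR replicate
MIN_SAMPLE_READS = 10        # minimum reads for a MOTU to count as present


def clean_and_aggregate(replicates, replicate_to_sample):
    """Apply the thresholds, sum replicates by sample, return presence sets."""
    sample_counts = defaultdict(lambda: defaultdict(int))
    for rep, motus in replicates.items():
        total = sum(motus.values())
        if total < MIN_REPLICATE_READS:
            continue  # replicate removed: insufficient coverage
        for motu, n in motus.items():
            if n / total >= TAGJUMP_MIN_FREQ:  # otherwise treated as a tag jump
                sample_counts[replicate_to_sample[rep]][motu] += n
    return {
        sample: {m for m, n in counts.items() if n >= MIN_SAMPLE_READS}
        for sample, counts in sample_counts.items()
    }


# Toy read counts per PCR replicate (illustration only).
toy = {
    "r1": {"A": 9990, "B": 2, "C": 8},
    "r2": {"A": 50},
    "r3": {"A": 500, "C": 1},
}
mapping = {"r1": "s1", "r2": "s1", "r3": "s1"}
print(clean_and_aggregate(toy, mapping))
```

In the toy data, MOTU B falls below the 0.03% threshold in r1, replicate r2 is dropped for low coverage, and MOTU C totals only 9 reads in the sample, so it is recoded as absent.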
Manual verification of the data output from the bioinformatic pipeline and modification of the automatic taxonomic assignations were performed for each primer set dataset as follows:
- Non-fish taxa and freshwater fish MOTUs were deleted.
- MOTUs assigned to marine fish were reviewed by blasting the sequences against GenBank and were modified if needed according to these criteria:
  - Based on biogeographic data, if a sequence was assigned to a non-Mediterranean species, the assignment was changed to the next lowest possible taxonomic rank known to occur in the Mediterranean Sea. If only one species of this particular genus or family was known to occur in the Mediterranean Sea, the assignment was changed to that species.
  - Based on biogeographic data, when a sequence was assigned to a taxonomic rank higher than the species level and only one species of this genus or family was known to occur in the Mediterranean Sea, the assignment was changed to that species.
The datasets of the three primer sets were then transformed into presence-absence data and pooled into a single common dataset. As we were not interested in intraspecific diversity, MOTUs assigned to the same taxon were pooled.
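The pooling step amounts to a per-sample union of the taxa detected by each primer set. The following is a minimal sketch under that reading; the function name, sample identifiers and detections are hypothetical and do not come from the dataset.

```python
def pool_primer_sets(datasets):
    """Merge per-primer-set detections into one presence set per sample.

    `datasets` maps primer set -> sample -> iterable of taxon names, one
    entry per MOTU. Using sets merges MOTUs assigned to the same taxon.
    """
    pooled = {}
    for per_sample in datasets.values():
        for sample, taxa in per_sample.items():
            pooled.setdefault(sample, set()).update(taxa)
    return pooled


# Hypothetical detections, for illustration only.
detections = {
    "Fish16S": {"sample_1": ["Anthias anthias", "Anthias anthias"]},
    "AcMDB07": {"sample_1": ["Anthias anthias"], "sample_2": ["Mullus"]},
    "Vert16S": {"sample_2": ["Mullus", "Sparidae"]},
}
print(pool_primer_sets(detections))
```

A taxon is recorded as present in a sample as soon as any one primer set detects it.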
