Data from: Primer sets evaluation and sampling method assessment for the monitoring of fish communities in the North-western part of the Mediterranean Sea through eDNA metabarcoding

Sabourault, Cecile1; Roblet, Sylvain1; Benoit, Derijard2; Priouzeau, Fabrice1; Gambini, Gilles1

Research facility: Ecology and Conservation Science for Sustainable Seas

Published Jul 03, 2024 on Dryad. https://doi.org/10.5061/dryad.612jm648m

Abstract

Environmental DNA (eDNA) metabarcoding appears to be a promising tool for surveying fish communities. However, the effectiveness of this method relies on primer set performance and on a robust sampling strategy. While some studies have evaluated the efficiency of several primers for fish detection, this efficiency has not yet been assessed in situ for the Mediterranean Sea. In addition, mainly surface waters were sampled and no filter porosity testing was performed. In this pilot study, our aim was to evaluate the ability of six primer sets, targeting 12S rRNA (AcMDB07; MiFish; Tele04) or 16S rRNA (Fish16S; Fish16SFD; Vert16S) loci, to detect fish species in the Mediterranean Sea using a metabarcoding approach. We also assessed the influence of sampling depth and filter pore size (0.45 µm versus 5 µm filters). To achieve this, we developed a novel sampling strategy allowing simultaneous on-site surface and bottom filtration of large water volumes along the same transect. We found that 16S rRNA primer sets enabled more fish taxa to be detected across each taxonomic level. The best combination was Fish16S/Vert16S/AcMDB07, which recovered 95% of the 97 fish species detected in our study. There were highly significant differences in species composition between surface and bottom samples. Filters of 0.45 µm led to the detection of significantly more fish species. Therefore, to maximize fish detection in the studied area, we recommend filtering both surface and bottom waters through 0.45 µm filters and using a combination of these three primer sets.

Sampling strategy

This study was conducted over a two-day period (3-4 May 2022) in the Marine Protected Area (MPA) of Cap Roux, located in the North-western part of the Mediterranean Sea, between Cannes and Saint-Raphael, France. Established in 2003, it covers an area of 450 ha spanning from the shoreline to the 100 m isobath. All types of fishing are prohibited but permanent surveillance is lacking. The habitats within the MPA consist of typical ecosystems of the Mediterranean Sea such as Posidonia oceanica meadows, rocky and coralligenous reefs. We chose this study area for its high species richness, which makes it an ideal sampling location to test our metabarcoding strategy.

For this study, we developed a novel sampling strategy allowing the simultaneous filtration of surface and bottom water. Two transects of approximately 1.3 km in length were designed inside the MPA, crossing several habitats to allow the detection of a broad range of species. Along these transects, we collected two samples concomitantly. The first one consisted of 30 L of surface water filtered one meter below the surface with a diaphragm pump (Argaly; flow: 1.0 L/min) attached to a boat. The second sample consisted of 30 L of bottom water filtered one meter above the substrate with a custom-made waterproof pump (flow: 1.0 L/min) fixed on a Diver Propulsion Vehicle (DPV) driven by a scuba diver. Both pumps were started at the same moment. To enable the simultaneous pumping of water from both the surface and the bottom along the same transect, a second diver carried a buoy, enabling the boat to closely follow the divers and adapt its speed to their pace. The scuba divers navigated effectively underwater by identifying the capes and determining the appropriate time to spend in each habitat. Each transect was duplicated, resulting in a total of eight samples. Both pumps (i.e., surface and bottom) were connected to a filtration capsule (eDNA water filter, Waterra; 600 cm²; Polyethersulfone), allowing immediate filtration of large water volumes. Two pore sizes, 0.45 µm and 5 µm, were tested. Since field sampling occurred during the plankton bloom period, we expected potential filter clogging. We therefore tested water filtration through a larger pore size (i.e., 5 µm) to assess whether this porosity would perform better than conventional 0.45 µm filters in our area.

After the filtration step, 50 mL of Longmire buffer solution (Longmire et al., 1997), an effective solution for the long-term conservation of eDNA samples, was injected directly into the capsules, which were then shaken by hand. eDNA capsules were always handled with gloves to avoid contamination. Upon returning to the laboratory, the capsules were vigorously agitated again for 1 minute. The 50 mL extract was finally stored at room temperature in the dark until DNA extraction.

In addition, we collected two water samples in the aquarium of the Monaco Oceanographic Museum (MOM) (Table S2). Sampling in an aquarium was performed because we had prior knowledge of the species composition, enabling us to assess the performance of our metabarcoding strategy by comparing the number of detected species with the known list of fish species. Moreover, since fish composition is different between the aquarium and field samples (i.e., some species absent from the field samples might be found in the aquarium), it gave us more information on the taxonomic coverage of each primer set. For each capsule, 30 L of water was filtered from the surface across three different tanks containing only Mediterranean fish species. One sample was collected with a 0.45 µm pore size capsule and the other with a 5 µm capsule. Following the filtration step, the capsules were treated in the same manner as the Cap Roux field samples.
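
The comparison between detected species and a known species list can be sketched in shell as follows. This is a minimal illustration and not the authors' code: the function name detection_summary, the file names and the species lists are hypothetical placeholders, and the real tank inventory is not reproduced here.

```shell
#!/bin/sh
# Minimal sketch: how many species from a known list were detected?
# All species lists below are hypothetical placeholders, not study data.
workdir=$(mktemp -d)
printf '%s\n' "Diplodus sargus" "Epinephelus marginatus" \
    "Muraena helena" "Sparus aurata" > "$workdir/known.txt"
printf '%s\n' "Diplodus sargus" "Mullus surmuletus" \
    "Sparus aurata" > "$workdir/detected.txt"

# detection_summary KNOWN DETECTED
# Prints "<known species detected> <known species> <percent, rounded>".
detection_summary() {
    awk 'NR == FNR { known[$0] = 1; n++; next }    # first file: known list
         ($0 in known) && !seen[$0]++ { hit++ }    # second file: detections
         END {
             pct = n ? int(100 * hit / n + 0.5) : 0
             printf "%d %d %d\n", hit, n, pct
         }' "$1" "$2"
}

detection_summary "$workdir/known.txt" "$workdir/detected.txt"
```

With the placeholder lists above, two of the four known species are counted as detected. A species that is detected but absent from the known list is ignored by this count, so it would need to be examined separately.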

Metabarcoding analysis: Extraction, PCR, sequencing

All metabarcoding laboratory steps were performed by Argaly (Sainte-Hélène-du-Lac, France) using the following protocol. DNA extraction from the 10 samples was carried out in a laboratory dedicated to handling eDNA water samples, following the NucleoSpin Soil kit protocol (Macherey Nagel) with the following modifications: the 50 mL Falcon tubes were centrifuged for 1 h at 12,000 g. The pellets were then resuspended in ATL buffer and proteinase K, and placed for 2 h at 56 °C to lyse cells and cell debris. The extraction procedure was continued according to the manufacturer's protocol and the resulting DNA extracts were eluted in a final volume of 100 μL of elution buffer.

Subsequently, DNA from each sample was amplified in 12 replicates for each primer set. Each PCR replicate was uniquely identified by a combination of two eight-base tags appended to the 5’ end of the PCR primers. These tags were used during bioinformatics analysis to assign sequences to the corresponding replicate. Following amplification, all samples were purified with the MinElute purification kit (Qiagen). Library constructions and sequencing were then performed by Fasteris (Geneva, Switzerland). The libraries were prepared according to the Metafast protocol (analysis), designed to minimize sequencing artefacts. The libraries were then sequenced in several Illumina MiSeq runs with paired-end reads of 2 x 150 bp or 2 x 250 bp depending on amplicon length.
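
The idea of assigning reads to PCR replicates from a pair of tags can be sketched as below. This is a simplified illustration, not the ngsfilter implementation: it assumes that each assembled read starts with the forward tag and ends with the reverse complement of the reverse tag, and it ignores primer matching entirely. The function name assign_replicate, the tag sequences, the replicate name and the reads are all made up.

```shell
#!/bin/sh
# Minimal sketch of tag-pair demultiplexing (not ngsfilter itself).
# Assumption: each assembled read starts with the forward tag and ends with
# the reverse complement of the reverse tag. Tags and reads are made up.
workdir=$(mktemp -d)
printf 'acgtacgt\tttgcaagc\trep01\n' > "$workdir/tags.tsv"
printf '%s\n' "acgtacgtggggccccgcttgcaa" \
    "ttttttttggggccccgcttgcaa" > "$workdir/reads.txt"

# assign_replicate TAGS READS
# TAGS:  forward_tag <TAB> reverse_tag <TAB> replicate name
# READS: one sequence per line
# Prints one replicate name (or "unidentified") per read.
assign_replicate() {
    awk -F '\t' '
        BEGIN { comp["a"] = "t"; comp["t"] = "a"; comp["c"] = "g"; comp["g"] = "c" }
        function revcomp(s,    i, c, out) {
            out = ""
            for (i = length(s); i >= 1; i--) {
                c = substr(s, i, 1)
                out = out ((c in comp) ? comp[c] : "n")
            }
            return out
        }
        NR == FNR { rep[$1 ":" $2] = $3; next }    # load the tag table
        {
            seqstr = tolower($0)
            fwd = substr(seqstr, 1, 8)                          # first 8 bases
            rev = revcomp(substr(seqstr, length(seqstr) - 7))   # last 8 bases
            key = fwd ":" rev
            print ((key in rep) ? rep[key] : "unidentified")
        }' "$1" "$2"
}

assign_replicate "$workdir/tags.tsv" "$workdir/reads.txt"
```

Using a pair of tags rather than a single tag means that a read is only assigned when both ends agree with a declared combination; the second placeholder read carries an unknown forward tag and is therefore reported as unidentified.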

Various quality controls were conducted at each step of the protocol to identify potential contamination, ensuring an accurate interpretation of the results. For each PCR replicate, the following controls were performed: a negative extraction control, a negative PCR control, a positive control and eight bioinformatic controls. The positive control corresponded to a DNA sample from fish stomach contents diluted to 1/10th previously sequenced by Argaly. The success of the amplifications and purifications was confirmed on a 2% agarose gel (E-Gel Power Snap, Invitrogen).

Metabarcoding analysis: Bioinformatics

Argaly conducted the bioinformatic steps, using the following procedure: the raw sequence data for each primer set were analysed using the suite of OBITools programs (https://pythonhosted.org/OBITools/welcome.html; Boyer et al., 2016) and the SumaClust clustering tool (Mercier et al., 2013), which are specifically designed for analysing metabarcoding data. More specifically, the paired sequences were first assembled (“illuminapairedend” command), then only the sequences with an alignment score >= 40 (i.e., corresponding to an overlap of at least 10 bases) were assigned to the corresponding amplification replicate, using the tags inserted at the 5' end of the primers (“ngsfilter” command). The resulting dataset was dereplicated (“obiuniq” command), then filtered (“obigrep” command) to remove low quality sequences (i.e., containing at least one N), sequences whose length falls outside the length range observed in silico for the target group, and singletons (i.e., sequences observed only once in the dataset). SumaClust was then used to group sequences sharing 97% identity into clusters. The abundances of sequences belonging to each cluster were summed for each PCR replicate. The cluster head, representing the most abundant sequence in the cluster, was chosen as the representative sequence, and clusters appearing fewer than 10 times in a sample were deleted. A taxonomic assignment of the cluster heads was then performed with the “ecotag” command, to obtain a list of MOTUs (Molecular Operational Taxonomic Units). The reference sequences used for this taxonomic assignment were obtained by performing an in silico PCR on the public sequence database GenBank (v.249) with the ecoPCR program (Ficetola et al., 2010). This in silico PCR was conducted using the PCR primers associated with each marker, allowing a maximum of three mismatches per primer and retaining only sequences assigned at least at the family level.

The R package “metabaR” (Zinger et al., 2021) was then used to remove artefactual sequences from the resulting dataset that are present in low abundance in the metabarcoding data, but which may influence the ecological conclusions that can be drawn from them (Calderón‐Sanou et al., 2020). This included removing (1) MOTUs with sequence similarity to any sequence in the reference database below 0.95, as they are potential chimeras; (2) MOTUs whose frequency over the entire dataset is highest in at least one negative control ("max" method of the “contaslayer” function), because they are potential contaminants; and (3) MOTUs with a relative frequency < 0.03% within a PCR replicate (“tagjumpslayer” function), because they are potentially artefacts generated during sequencing library construction (i.e., "tag jumps"; Schnell et al., 2015). PCR replicates with a sequencing coverage < 1000 sequences were also removed and then the remaining PCR replicates were aggregated by sample using the “aggregate_pcrs” function. Finally, MOTUs observed fewer than 10 times in a sample were recoded as absent in that sample.
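
Three of the thresholds described above (similarity below 0.95, PCR coverage below 1000 sequences, and fewer than 10 reads per sample) can be illustrated with a short awk sketch. This is not metabaR and does not reproduce the contaminant or tag-jump filters. The function name filter_motus, the table layout and the values are hypothetical, and the sketch assumes that PCR coverage is computed after the low-similarity MOTUs have been dropped, which the text does not specify.

```shell
#!/bin/sh
# Simplified sketch of three thresholds from the text, written in awk.
# This is NOT metabaR: the contaminant and tag-jump filters are omitted,
# and the table layout and values below are hypothetical.
workdir=$(mktemp -d)
# Columns: motu, sample, pcr_replicate, reads, best_identity
printf '%s\t%s\t%s\t%s\t%s\n' \
    m1 S1 p1 900  0.99 \
    m2 S1 p1 200  0.98 \
    m3 S1 p1 500  0.90 \
    m1 S1 p2 300  1.00 \
    m2 S1 p2 5    0.98 \
    m1 S2 p3 1200 0.99 \
    m2 S2 p3 4    0.98 > "$workdir/motus.tsv"

# filter_motus TABLE
# Reads the table twice: pass 1 sums coverage per PCR replicate,
# pass 2 aggregates the retained PCR replicates by sample.
# Prints "motu <TAB> sample <TAB> reads" for MOTUs kept as present.
filter_motus() {
    awk -F '\t' '
        $5 < 0.95 { next }                          # drop low-similarity MOTUs
        NR == FNR { cov[$3] += $4; next }           # pass 1: coverage per PCR
        cov[$3] >= 1000 { agg[$1 "\t" $2] += $4 }   # pass 2: sum kept PCRs by sample
        END {
            for (k in agg)
                if (agg[k] >= 10) print k "\t" agg[k]   # under 10 reads: absent
        }' "$1" "$1" | LC_ALL=C sort
}

filter_motus "$workdir/motus.tsv"
```

In this toy table, m3 is dropped for low similarity, replicate p2 is dropped for low coverage, and m2 is recoded as absent in sample S2 because only 4 reads remain after aggregation.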

After receiving the results from Argaly, the taxonomic assignments were manually verified and corrected. Non-fish taxa and freshwater fish sequences were deleted, and marine fish sequences were reviewed based on biogeographic data according to the criteria used by Aglieri et al. (2021).

Bioinformatic pipeline code: OBIToolsScript_Fish16S

#!/bin/bash
#

# Read assembly
illuminapairedend -r AOYG-107_R2.fastq AOYG-107_R1.fastq > AOYG-107.fastq

# Selection of sequences with an alignment score > 40
obiannotate -S goodAli:'"Alignment" if score>40.00 else "Bad"' AOYG-107.fastq | obisplit -t goodAli -p AOYG-107_

# Sequence assignment to the right PCR replicate based on the descriptor file AOYG-107.ngs
ngsfilter -t AOYG-107.ngs -u unidentified_AOYG-107.fastq AOYG-107_Alignment.fastq > AOYG-107_ngs.fastq

# Sequence dereplication
obiuniq -m sample AOYG-107_ngs.fastq > AOYG-107_uniq.fasta

# Discarding of sequences observed only once, containing ambiguous nucleotides, and/or whose length is outside the range defined in silico
obigrep -s '^[ACGT]+$' -p 'count>1' -p 'seq_length>11' -p 'seq_length<95' AOYG-107_uniq.fasta > AOYG-107_filt.fasta

# Clustering (and identification of clusters)
sumaclust -n -t 0.97 AOYG-107_filt.fasta > AOYG-107_suma.fasta

# Gathering of sequences by clusters and addition of their counts
obiselect -c cluster --merge=sample -f "cluster_center==True" AOYG-107_suma.fasta > AOYG-107_centers.fasta

# Selection of clusters with at least 10 reads in at least one PCR replicate
obigrep -p 'max(merged_sample.values()) >= 10' AOYG-107_centers.fasta > AOYG-107_centers10.fasta

# Taxonomic assignment using the reference database db_Fish16S_genbank249.fasta and the GenBank taxonomy v249
ecotag -d /bioinfo/Databases/Genbank/Genbank_249.0_light/genbank249 -R db_Fish16S_genbank249.fasta AOYG-107_centers10.fasta > AOYG-107_tag.fasta

# File annotation and preparation for exportation
obiannotate -d /bioinfo/Databases/Genbank/Genbank_249.0_light/genbank249 --with-taxon-at-rank=class \
 --with-taxon-at-rank=order --with-taxon-at-rank=phylum --with-taxon-at-rank=kingdom AOYG-107_tag.fasta  \
 | obiannotate -k species_name -k genus_name -k family_name -k order_name -k class_name -k phylum_name  \
 -k kingdom_name -k species -k genus -k family -k order -k class -k phylum -k kingdom -k scientific_name \
  -k species_list -k taxid -k best_identity -k scientific_name_by_db -k merged_sample -k count | \
  obisort -r -k count  | obiannotate --seq-rank | obiannotate --set-identifier='"'Fish16S'_%05d" % seq_rank' \
   | obiannotate --delete-tag=seq_rank  | obitab -o -d > AOYG-107.obitab
