Integration of environmental DNA metabarcoding technique to reinforce fish biodiversity assessments in seagrass ecosystems: A case study of Gazi Bay Seagrass meadows
Data files
Oct 06, 2023 version files 332.73 MB
- README.md (1.13 KB)
- SM-A1_S68_L001_R1_001.fastq.gz (26.98 MB)
- SM-A1_S68_L001_R2_001.fastq.gz (36.55 MB)
- SM-A2_S69_L001_R1_001.fastq.gz (28.55 MB)
- SM-A2_S69_L001_R2_001.fastq.gz (37.57 MB)
- SM-A3_S70_L001_R1_001.fastq.gz (24.81 MB)
- SM-A3_S70_L001_R2_001.fastq.gz (33.06 MB)
- SM-A4_S71_L001_R1_001.fastq.gz (11.36 MB)
- SM-A4_S71_L001_R2_001.fastq.gz (15.49 MB)
- SM-B1_S72_L001_R1_001.fastq.gz (11.78 MB)
- SM-B1_S72_L001_R2_001.fastq.gz (15.65 MB)
- SM-B2_S73_L001_R1_001.fastq.gz (14.03 MB)
- SM-B2_S73_L001_R2_001.fastq.gz (19.37 MB)
- SM-B3_S74_L001_R1_001.fastq.gz (25.45 MB)
- SM-B3_S74_L001_R2_001.fastq.gz (32.07 MB)
Abstract
Assessing biodiversity in marine nearshore ecosystems is crucial for effective management, especially in the context of climate change and overexploitation of marine resources. Conventional methods often fall short in providing comprehensive information for managing seagrass ecosystems. However, the emergence of environmental DNA (eDNA) techniques has transformed the field by enabling non-invasive, cost-effective surveys that provide detailed, high-resolution information. In this study, we used eDNA to assess fish diversity and compared its effectiveness to conventional techniques such as catch assessment surveys and underwater surveys. We sampled three habitats (A: mangrove-seagrass, B: seagrass only, and C: coral-seagrass) with four replicates each. Site A recorded 8 fish species, site B had 16 species, and site C, characterized by coral and seagrass habitats, exhibited the highest fish diversity with 45 species (mean H' index = 2.455), underscoring its ecological importance. To ensure accurate taxonomic identification, we used an updated MiFish reference database containing more fish species than the initial library. This expanded reference database, with 9,569 fish species, facilitated more precise identification and enhanced the reliability of our findings. Notably, the eDNA technique outperformed conventional methods by detecting 23 additional fish species that went undetected using traditional surveys. Moreover, our study documented five fish species previously unknown to occur within the study region, further emphasizing the value of eDNA analysis in uncovering hidden biodiversity. These findings strongly advocate for integrating eDNA techniques into the monitoring and assessment of biodiversity in shallow tropical habitats of the Western Indian Ocean. By leveraging eDNA surveys, we can gain valuable insights into fish diversity, discover hidden species, and make informed decisions for the conservation and management of these ecologically significant areas.
https://doi.org/10.5061/dryad.x69p8czqk
The data was generated from sequencing eDNA samples obtained from Gazi Bay. Sampling was done in three sites with four replicates.
Description of the data and file structure
The data set is labelled by site (A, B and C), and the numbers 1-4 denote the replicates. The data are raw FASTQ files generated by Illumina sequencing.
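The labelling scheme can be read straight off the file names. The following is a minimal sketch, assuming the remaining fields follow the usual Illumina naming convention (sample number `S..`, lane `L...`, read `R1`/`R2`); that interpretation, the `ReadFile` type and the `parse_fastq_name` helper are illustrative assumptions, not part of the deposited data.

```python
import re
from typing import NamedTuple


class ReadFile(NamedTuple):
    site: str          # sampling site: A, B or C
    replicate: int     # replicate number, 1-4
    sample_index: int  # assumed: Illumina sample number (the S.. field)
    lane: int          # assumed: flow-cell lane (the L... field)
    read: int          # assumed: 1 = forward read, 2 = reverse read


# Pattern for names such as SM-A1_S68_L001_R1_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^SM-(?P<site>[A-C])(?P<rep>[1-4])"
    r"_S(?P<idx>\d+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def parse_fastq_name(name: str) -> ReadFile:
    """Split a file name into site, replicate and sequencing fields."""
    match = FASTQ_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return ReadFile(
        site=match["site"],
        replicate=int(match["rep"]),
        sample_index=int(match["idx"]),
        lane=int(match["lane"]),
        read=int(match["read"]),
    )


print(parse_fastq_name("SM-A1_S68_L001_R1_001.fastq.gz"))
```

Grouping the parsed records by `(site, replicate)` pairs each forward file with its reverse mate before the reads are merged.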
Sharing/Access information
The raw DNA sequence data have been submitted to GenBank (www.ncbi.nlm.nih.gov/genbank) under BioProject PRJNA777404, and the voucher specimens were assigned accession numbers OP787891-4. Scripts and datasets generated and used to obtain the results of this study can be accessed at https://github.com/Kishaz/Gazi_Bay_eDNA.git.
Code/Software
Scripts and datasets generated and used to obtain results in this study can be accessed through (https://github.com/Kishaz/Gazi_Bay_eDNA.git).
Study site
Located in Kenya’s South Coast region and covering ~7 km², Gazi Bay’s seagrass meadows are surrounded by a dynamic fishing community (Hemminga et al., 1995; Tuda & Wolff, 2018; Pendleton et al., 2012). The bay lies at 4°25’S, 39°30’E, ~55 km from Mombasa City. It is shallow, with a mean depth of ~5 m, and is ~1.75-3.5 km wide and 3.25 km long, with a surface area of ~10 km² (Bouillon et al., 2007). A shallow tropical coastal water system (Musembi et al., 2019), it is bordered by fringing mangrove forests on the landward side and sheltered by coral reefs on the eastern, seaward side; it receives freshwater inflow from two rivers and has extensive seagrass beds in the shallow continental waters (Figure 1). The bay opens into the Indian Ocean through a relatively wide but shallow (3-8 m deep) entrance in the southern part and has two creeks (Western and Eastern creeks). The western creek is characterised by two freshwater inflows: River Kidogoweni to the north and River Mkurumunji to the west. Among the most active landing sites on Kenya's South Coast, with dominance in fishing and fishery-related activities, Gazi Bay has long supported small-scale artisanal multi-species and multi-gear fishing (Kimani et al., 1996; Musembi et al., 2019).
Samples and data collection
All reusable apparatus and reagents used in this study were sterilized by autoclaving at 121°C for 15 minutes. Heat-labile apparatus was UV sterilized for 1 hour, while heat-labile reagents were filter-sterilized using a 0.22 µm nitrocellulose filter membrane. All working surfaces and equipment were decontaminated using 10% bleach and 70% ethanol. Sample processing, including DNA extraction, quantification, and amplification, was performed by a single individual in separate rooms.
The sampling activities were conducted on 12th November 2020 and started just before the ebb current. The sampling scheme was customized following the guidance of the eDNA Society's sampling standardization method (Miya & Sado, 2019). Three sampling sites (10 m × 8 m transects) within the bay were identified and labeled Site A, B, and C (Figure 1), with their respective GPS coordinates given in Table S1. The sampling sites were selected based on the proximity between seagrass and other habitats. Site A represented the seagrass-mangrove habitat, Site B represented the seagrass-only habitat, and Site C represented the seagrass-coral reef habitat. At each site, we randomly collected 1-liter water samples from the sea surface using a sterile Nansen bottle in four replicates at 2-minute intervals, as recommended by Ficetola et al. (2015).
In addition to the sampling process, three 0.5-liter bottles filled with nuclease-free water were intentionally left open during the collection to serve as negative controls. These control samples were handled in the same manner as the actual samples, including exposure to the surrounding environment, but without any target organisms. By including these negative controls, we could monitor and account for any potential contamination or false positive results that might arise from the sampling and laboratory procedures.
The physicochemical parameters of the seawater were measured, and the description of the habitat was recorded (see Table S2). The sampling process took approximately 10 min per site. The collected water samples were immediately preserved in a cooler box with ice packs and transported to the molecular biology laboratory at the Kenya Marine and Fisheries Research Institute (KMFRI) for filtration. Once the seawater sampling was completed, an underwater visual survey and a multi-gear fish catch survey followed immediately.
Underwater visual surveys were conducted to directly observe fish species present at each study site, covering a predetermined 30 m × 30 m transect coinciding with the seawater sampling points. Divers equipped with snorkels visually surveyed the underwater habitats, including seagrass beds, mangroves, and coral reefs, immediately after seawater sampling. They carefully recorded the fish species observed and their abundance.
The multi-gear fish catch survey involved the use of various fishing gears, including basket traps, hand lines, and reef seines. It was conducted by the local fishers as part of their routine work. Basket traps were deployed in submerged seagrass locations within the bay for 6 hours to capture fish. Hand lines, consisting of a line with a baited hook, were used to catch fish manually. Reef seines, which are large nets with weights and floats, were dragged through the water column to capture fish. These fishing gears were deployed at different locations and depths within the bay, guided by the fishers, to sample the fish population. The landed catch was carefully documented, and four voucher specimens of the most dominant fish in the landed catch (Siganus sutor (Valenciennes, 1835)) were obtained and preserved in a cooler box.
DNA extraction, amplification and library preparation
The water samples were filtered through sterile 0.45 µm nitrocellulose filter papers using a manifold filtration system. These filters were then stored at -80°C until further processing. Total genomic DNA was extracted from the filter papers, as well as from four fish voucher specimens (Siganus sutor (Valenciennes, 1835)) and three DNA extraction negative controls. The DNA extraction was performed using a CTAB-based method, following the protocol described by Miya and Sado (2019). To assess the concentration and purity of the extracted DNA, spectrophotometry was performed using an Eppendorf Bio Spectrometer with software version 4.3.5.0.
The DNA samples were evaluated for their quality and quantity. For amplification of the hypervariable region of the 12S rRNA gene, the universal primer pair MiFish-U-F: GTCGGTAAAACTCGTGCCAGC and MiFish-U-R: CATAGTGGGGTATCTAATCCCAGTTTG was used (Miya et al., 2015). The expected amplicon length was approximately 172 bp, ranging from 163 to 185 bp. In addition, the specific primers F1: TCAACCAACCACAAAGACATTGGCAC and R1: TAGACTTCTGGGTGGCCAAAGAATCA (Tabassum et al., 2017), targeting the cytochrome c oxidase subunit I gene, were used to identify the voucher specimens. The amplification reaction was conducted in a 12 μl reaction volume comprising 6.0 μl of 2 × KAPA HiFi HotStart ReadyMix (KAPA Biosystems), 1.4 μl of each primer (5 μM primer F/R), 2.6 μl of sterile distilled water, and 2.0 μl of DNA template. The reaction was designed in accordance with Miya et al. (2015), and amplification was performed with an initial denaturation step at 95°C for 5 min, followed by 35 cycles of denaturation at 95°C for 30 sec, annealing at 55°C for 30 sec, and extension at 72°C for 1 min, and a final extension step at 72°C for 10 min. The PCR products from three rounds of amplification were pooled together. Negative control samples from the field, as well as from the DNA extraction and amplification steps, were also pooled into one sample.
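As a quick arithmetic check on the recipe and cycling programme described above, the sketch below sums the listed component volumes and the programmed hold times. It is illustrative only: the text does not say whether the 1.4 μl primer volume is per primer or for the pair combined, so both readings are computed, and ramp times between temperatures are ignored. The names `components_ul`, `total_volume` and `hold_minutes` are hypothetical.

```python
# Component volumes in microlitres, as listed in the text (primers handled separately).
components_ul = {
    "2x KAPA HiFi HotStart ReadyMix": 6.0,
    "sterile distilled water": 2.6,
    "DNA template": 2.0,
}
PRIMER_UL = 1.4  # stated primer volume; per-primer vs combined is an assumption


def total_volume(primer_volume_is_per_primer: bool) -> float:
    """Total reaction volume under one of the two readings of the primer volume."""
    primers = 2 * PRIMER_UL if primer_volume_is_per_primer else PRIMER_UL
    return round(sum(components_ul.values()) + primers, 2)


def hold_minutes(cycles: int = 35) -> float:
    """Sum of programmed hold times: 5 min + cycles x (0.5 + 0.5 + 1.0) min + 10 min."""
    return 5.0 + cycles * (0.5 + 0.5 + 1.0) + 10.0


print(total_volume(primer_volume_is_per_primer=True))   # 1.4 ul of each primer
print(total_volume(primer_volume_is_per_primer=False))  # 1.4 ul for both primers together
print(hold_minutes())
```

Under these assumptions, only the combined-volume reading reproduces the stated 12 μl total; the per-primer reading gives 13.4 μl. The programmed hold times add up to 85 minutes.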
The pooled samples, including the amplicons and the negative controls, were sent to Inqaba Biotechnical Industries, a commercial next-generation sequencing (NGS) service provider in Pretoria, South Africa, for sequencing. The amplicons were purified, end-repaired, and ligated to Illumina-specific adapter sequences using the NEBNext Ultra II DNA library prep kit. After quantification, the samples were individually indexed using NEBNext Multiplex Oligos for Illumina (Dual Index Primers Set 1), and an additional purification step was performed using AMPure XP beads. The libraries were quantified using Agilent Technologies 2100 Bioanalyzer, normalized, and sequenced on the Illumina MiSeq platform using a MiSeq v3 (600 cycles) kit. Additionally, the four voucher specimens were subjected to Sanger sequencing for further confirmation and validation.
Reconstruction of the Gazi Bay historical fish species list
The historical fish species list for Gazi Bay was collated from published papers archived in online journal repositories. The key search terms “Gazi Bay” and “Fisheries” were used in the Google Scholar search engine to find relevant papers and information. The search was done in July 2020. A total of six papers published since 1996 were found to contain relevant information on fish species recorded in Gazi Bay (Kimani et al., 1996; De Troch et al., 1998; Crona & Rönnbäck, 2007; Nyunja et al., 2009; Samoilys et al., 2017; Musembi et al., 2019). Notably, KMFRI has been conducting fish catch assessment surveys (CAS) in major landing bays in Kenya since 2017; although only partially published (Kimani et al., 2018), these surveys were pivotal in compiling the Gazi Bay fish species list.
Data analysis
Taxonomic assignment
The quality of the raw demultiplexed Illumina MiSeq sequences was assessed using FastQC (v0.11.9) to inform downstream analysis parameters (Table S3). The raw sequences were processed through the web-based MiFish pipeline (http://mitofish.aori.u-tokyo.ac.jp/mifish/) with a 200 bp ± 25 bp read length filter and a 97% identity cut-off. The pipeline was first accessed on 20th April 2021 using the original database (v.1.00 2019, 7565 fishes, original MiFish DB ver.30) and again on 2nd December 2022 using the updated database (v.3.85 2022-11-01, 9569 fishes) (Iwasaki et al., 2013; Sato et al., 2018). In brief, the pipeline evaluated the quality of the paired-end reads and trimmed low-quality tails using fastp (v0.23.2) (Chen et al., 2018). FLASH (v1.2.11) (Magoč & Salzberg, 2011) was used to assemble the paired-end reads, and Cutadapt (v4.1) (Martin, 2011) was employed to remove primers. The reads were further processed through denoising, chimera removal, and operational taxonomic unit (OTU) detection at 99% using usearch (Edgar, 2010). Finally, a sequence similarity search was conducted against the selected database using BLAST+ (v2.9.0) (Camacho et al., 2009) with a 97% identity threshold and an E-value of 10⁻⁵.
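The thresholds named above (200 bp ± 25 bp length window, 97% identity, E-value of 10⁻⁵) can be illustrated with a small filter. This is a minimal sketch under stated assumptions, not the MiFish pipeline's own code: the `Hit` record, the inclusive bounds, the tie-breaking rule and the placeholder hits are all hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    otu: str                 # query OTU identifier
    subject: str             # name of the matched reference sequence
    percent_identity: float  # BLAST percent identity
    evalue: float            # BLAST E-value


MIN_IDENTITY = 97.0    # identity cut-off from the text
MAX_EVALUE = 1e-5      # E-value threshold from the text
TARGET_LENGTH = 200    # centre of the length window (bp)
LENGTH_TOLERANCE = 25  # +/- tolerance (bp); inclusive bounds are an assumption


def passes_length_filter(sequence: str) -> bool:
    """True if the sequence length lies within 200 bp +/- 25 bp."""
    return abs(len(sequence) - TARGET_LENGTH) <= LENGTH_TOLERANCE


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep hits that meet both thresholds and return the best one per OTU.

    "Best" here means highest identity, then lowest E-value (an assumption).
    """
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.percent_identity < MIN_IDENTITY or hit.evalue > MAX_EVALUE:
            continue
        current = best.get(hit.otu)
        if current is None or (hit.percent_identity, -hit.evalue) > (
            current.percent_identity,
            -current.evalue,
        ):
            best[hit.otu] = hit
    return best


# Hypothetical hits for illustration only.
example_hits = [
    Hit("OTU_1", "Siganus sutor", 99.4, 1e-80),
    Hit("OTU_1", "hypothetical species X", 97.1, 1e-70),
    Hit("OTU_2", "hypothetical species Y", 95.0, 1e-60),
]
assigned = best_hits(example_hits)
print({otu: hit.subject for otu, hit in assigned.items()})
```

In this toy input, `OTU_2` receives no assignment because its only hit falls below the 97% identity cut-off.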
To confirm that all listed species were fish or related to the marine ecosystem, a name check was performed against FishBase (https://www.fishbase.de/), WoRMS (https://www.marinespecies.org/index.php), and NCBI (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi) databases using a custom Python script developed specifically for this study (https://github.com/yamaton/fishbase-scraper.git). The raw sequences of the voucher specimens were manually curated and queried against the NCBI database for further verification. This comprehensive approach ensured the reliability and accuracy of the species identification by cross-referencing with authoritative databases and employing stringent bioinformatic analysis techniques.
All species flagged as new records for the study site were further investigated by querying their representative sequences against the NCBI database; the results are reported as percentage identity.
Phylogenetic analysis
The resulting species-assigned sequences were aligned using MAFFT (v7.490) with the local pairwise alignment approach and 1000 iterations (Katoh & Standley, 2013). To construct a phylogenetic tree, we used FastTree (v2.1.11), which implements a maximum-likelihood method (Price, Dehal & Arkin, 2010). The resulting tree was generated in Newick format and imported into Interactive Tree of Life (iTOL) for annotation (Letunic & Bork, 2021).
Diversity analysis
Diversity indices were generated using the ‘vegan’ (v2.6-2) package (Oksanen et al., 2013). Shannon’s H’ index for diversity within samples and the Bray-Curtis dissimilarity index for diversity between samples were calculated from OTU read abundances, as described by Gelis et al. (2021). To ascertain whether fish assemblages differed between sites and, by extension, to assess the impact of physicochemical parameters, we applied non-metric multidimensional scaling (NMDS), using PCoA to create the initial configuration. This approach was best suited to our data because the relationship between data dissimilarity and ordination distance was non-linear. To describe the community in a way that gave all species and sampled sites an equal opportunity to influence the patterns, we standardised the species abundances using the Wisconsin double standardisation approach and generated a distance matrix reflecting the multidimensional distance between each pair of sites using the Bray-Curtis dissimilarity index. We tested the effects of habitat variance on fish species distribution patterns using permutational multivariate analysis of variance (PERMANOVA) (Anderson, 2014) in the ‘vegan’ package.
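The indices named above were computed with the R package ‘vegan’; the sketch below re-expresses the standard formulas in plain Python purely for illustration. It assumes the natural-log form of Shannon's index, H' = -Σ p_i ln p_i, Bray-Curtis dissimilarity as Σ|a_i - b_i| / Σ(a_i + b_i), and Wisconsin double standardisation as division by species maxima followed by division by site totals. The abundance table is hypothetical, not data from this study.

```python
import math
from typing import List, Sequence


def shannon(counts: Sequence[float]) -> float:
    """Shannon's H' with natural logarithms; zero counts contribute nothing."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def wisconsin(table: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each species (column) by its maximum, then each site (row) by its total."""
    col_max = [max(col) for col in zip(*table)]
    by_species = [
        [v / m if m > 0 else 0.0 for v, m in zip(row, col_max)] for row in table
    ]
    result = []
    for row in by_species:
        row_total = sum(row)
        result.append([v / row_total if row_total > 0 else 0.0 for v in row])
    return result


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical samples, 1 for no shared species."""
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Hypothetical read counts: rows are samples, columns are OTUs.
reads = [
    [120, 30, 0, 0],
    [80, 0, 40, 10],
    [5, 60, 60, 75],
]
standardised = wisconsin(reads)
for i, row in enumerate(reads):
    print(f"sample {i}: H' = {shannon(row):.3f}")
print(f"Bray-Curtis(0, 1) = {bray_curtis(standardised[0], standardised[1]):.3f}")
```

Standardising before computing Bray-Curtis keeps a few very abundant OTUs from dominating the between-site distances, which is the stated motivation for the Wisconsin step.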