Experimental evaluation of genetic variability based on DNA metabarcoding from the aquatic environment: Insights from the Leray COI fragment

Creator: Sergei Turanov

Published Jun 07, 2024 on Dryad. https://doi.org/10.5061/dryad.sf7m0cgfk

Data files

Jun 07, 2024 version files (247.99 KB total):

- README.md (4.83 KB)

- Supplement_B.zip (243.16 KB)

Abstract

Intraspecific genetic variation is important for the assessment of organisms’ resistance to changing environments and anthropogenic pressures. Aquatic DNA metabarcoding provides a non-invasive method in biodiversity research, including investigations at the within-species level. Through the analysis of eDNA samples collected from the Peter the Great Gulf of the Japan Sea, in this study we aimed to evaluate the identification of Amplicon Sequence Variants (ASVs) in marine eDNA among abundant species of the Zostera sp. community: Hexagrammos octogrammus, Pholidapus dybowskii (Teleostei: Perciformes), and Pandalus latirostris (Arthropoda: Decapoda). These species were collected from two distant locations to produce mock communities and gather aquatic eDNA both on the community and individual level. Our approach highlights the efficacy of eDNA metabarcoding in capturing haplotypic diversity and the potential for this methodology to track genetic diversity accurately, contributing to conservation efforts and ecosystem management. Additionally, our results elucidate the impact of nuclear mitochondrial DNA segments (NUMTs) on the reliability of metabarcoding data, indicating the necessity for cautious interpretation of such data in ecological studies. Moreover, we analyzed 83 publicly available COI sequence datasets from common groups of multicellular organisms (Mollusca, Echinodermata, Crustacea, Polychaeta, and Actinopterygii). The results reflect the decrease in population diversity that arises from using the metabarcode compared to the COI barcode.

Sergei Turanov

Laboratory of Deep Sea Research, A.V. Zhirmunsky National Scientific Center of Marine Biology, Far Eastern Branch, Russian Academy of Sciences, Vladivostok, Russia

Dataset Description

This dataset contains information on the experimental evaluation of genetic variability using DNA metabarcoding techniques focused on the Leray COI fragment. The data were obtained from environmental DNA (eDNA) of aquatic animals from the Japan Sea. In addition, we analyzed 83 publicly available COI sequence datasets from common groups of multicellular organisms (Mollusca, Echinodermata, Crustacea, Polychaeta, and Actinopterygii).

File Organization

The dataset is organized into several files and supplements as described below:

Supplement A.

Files with sample and primer information used for read demultiplexing with the Begum pipeline (Zepeda-Mendoza et al., 2016; Yang et al., 2021).

1. Primers_LerayCOI.txt: Contains sequences of primers used for amplifying the COI Leray fragment.

- Forward Primer: GGWACWGGWTGAACWGTWTAYCCYCC

- Reverse Primer: TANACYTCNGGRTGNCCRAARAAYCA

2. Samples_LerayCOI.txt: Lists the samples and their associated primer tags.

- Format: SampleID, Forward Primer Tag, Reverse Primer Tag, Pool

- Example: PL20-2_hap2, F1R1, F1R1, pool

3. Tags_LerayCOI.txt: Provides the sequences of the 7-nucleotide tags used in the study.
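Both primers listed above contain IUPAC degenerate bases (W, Y, R, N), so a literal string search will not find them in reads. The following is a minimal Python sketch of one way to handle them, not part of the Begum pipeline: it expands a degenerate primer into a regular expression, reverse-complements IUPAC sequences, and splits one sample line. The function names are hypothetical, and the comma delimiter is an assumption based on the example line shown above.

```python
import re

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of every IUPAC code (for example R = A/G pairs with Y = T/C).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Primer sequences as given in Primers_LerayCOI.txt.
FORWARD = "GGWACWGGWTGAACWGTWTAYCCYCC"
REVERSE = "TANACYTCNGGRTGNCCRAARAAYCA"


def primer_to_regex(primer):
    """Compile a degenerate primer into a regex matching every variant."""
    parts = []
    for base in primer.upper():
        bases = IUPAC[base]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))


def reverse_complement(seq):
    """Reverse-complement a sequence that may contain IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def parse_sample_line(line):
    """Split one sample line into its four fields.

    Assumption: fields are comma-separated, as in the README example;
    the actual file may use another delimiter.
    """
    sample_id, fwd_tag, rev_tag, pool = [f.strip() for f in line.split(",")]
    return {
        "sample_id": sample_id,
        "fwd_tag": fwd_tag,
        "rev_tag": rev_tag,
        "pool": pool,
    }
```

In a merged read, the reverse primer would be expected as its reverse complement at the 3' end, so `primer_to_regex(reverse_complement(REVERSE))` is the pattern to search for there.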

Supplement B.

Contains FASTA files of COI sequences for various species. Each file is named after the species and indicates whether it holds standard COI barcode sequences or Leray fragment sequences. Example files: Acrocnida_brachiata_COI.fas, Halicampus_grayi_COI_leray.fas, etc.
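To work with these files programmatically, the naming convention can be used to pair each standard barcode file with its Leray counterpart. The sketch below is illustrative only: it assumes the suffixes `_COI.fas` and `_COI_leray.fas` seen in the example names, and the helper names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped; concatenate them.
            records[header] += line
    return records


def pair_supplement_files(names):
    """Group file names by species as (standard_file, leray_file).

    Assumes the '_COI.fas' and '_COI_leray.fas' suffixes from the README
    examples. A missing partner is reported as None; other files are skipped.
    """
    pairs = {}
    for name in sorted(names):
        if name.endswith("_COI_leray.fas"):
            species, slot = name[: -len("_COI_leray.fas")], 1
        elif name.endswith("_COI.fas"):
            species, slot = name[: -len("_COI.fas")], 0
        else:
            continue
        pairs.setdefault(species, [None, None])[slot] = name
    return {species: tuple(files) for species, files in pairs.items()}
```

After unpacking Supplement_B.zip, the file names in the resulting directory can be passed to `pair_supplement_files`, and the contents of each file to `read_fasta`.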

Supplement C.

Scripts used to format and analyze the COI datasets.

SNPs_them.sh – shell script generating datasets based on SNP sites only.

Format.sh – shell script making the resulting SNP files suitable as input for the Geneland program.

Clusters with GenMAper.R – R script running the Geneland population cluster analysis on all the FASTA files in the current working directory.
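For readers unfamiliar with the idea of an SNP-only dataset, the following Python sketch shows the general technique of keeping only the polymorphic columns of an alignment. It is not a port of the shell scripts above, whose internals are not described here; the function names and the decision to ignore gaps and `N` are assumptions of this example.

```python
def variable_sites(aligned):
    """Return 0-based indices of polymorphic alignment columns.

    `aligned` maps sequence names to aligned sequences of equal length.
    Assumption of this sketch: gaps ('-') and 'N' do not count as states.
    """
    seqs = [s.upper() for s in aligned.values()]
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for i in range(length):
        states = {s[i] for s in seqs} - {"-", "N"}
        if len(states) > 1:
            sites.append(i)
    return sites


def snp_matrix(aligned):
    """Reduce every sequence to its characters at the variable sites."""
    sites = variable_sites(aligned)
    return {
        name: "".join(seq.upper()[i] for i in sites)
        for name, seq in aligned.items()
    }
```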

Methods Summary

Sample Collection

- Animals were collected from Vostok Bay and Vityaz Bay in the Japan Sea using a fish fry net.

- Collected species: Hexagrammos octogrammus, Pholidapus dybowskii, and Pandalus latirostris.

Experimental Setup

- Animals were placed in two separate 150 L aquaria and maintained at 15°C.

- DNA was collected from both the environment (eDNA) and individual animals.

- DNA was isolated from both the eDNA samples and the individual animals (total DNA), and COI fragments were amplified using sample-specific primer pairs (Samples_LerayCOI.txt).

Sequencing and Data Processing

- Amplicons were sequenced using Illumina NovaSeq 6000.

- Reads were processed to remove adapters, correct errors, merge paired-end reads, and de-multiplex using various software tools.

- Individual genotyping and eDNA sequencing data were compared, which revealed additional haplotypes that are likely to be NUMTs.
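The comparison in the last step can be pictured as a set operation: ASVs recovered from eDNA either match a haplotype obtained by genotyping an individual or they do not. The sketch below illustrates that idea only and is not the analysis used in the study. The function names are hypothetical, and the length check is a generic, non-conclusive hint (a length difference that is not a multiple of three implies a frameshift in a protein-coding gene), not a criterion stated in this dataset.

```python
def classify_asvs(asvs, reference_haplotypes):
    """Split eDNA ASVs by whether individual genotyping confirms them.

    asvs: {asv_id: sequence}
    reference_haplotypes: {haplotype_id: sequence} from individual animals
    Returns (confirmed, unmatched), where confirmed maps asv_id to the
    matching haplotype_id and unmatched maps asv_id to its sequence.
    """
    ref_by_seq = {seq.upper(): hap for hap, seq in reference_haplotypes.items()}
    confirmed, unmatched = {}, {}
    for asv_id, seq in asvs.items():
        hap = ref_by_seq.get(seq.upper())
        if hap is not None:
            confirmed[asv_id] = hap
        else:
            unmatched[asv_id] = seq
    return confirmed, unmatched


def length_suggests_frameshift(seq, reference_length):
    """True if the length differs from the reference by a non-multiple of 3.

    Heuristic only: unmatched ASVs can also be true rare haplotypes or
    sequencing artifacts, so this flag needs manual follow-up.
    """
    return (len(seq) - reference_length) % 3 != 0
```

Unmatched ASVs are candidates for closer inspection rather than confirmed NUMTs.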

Assessing the Genetic Variation of COI Fragments in Population Datasets Retrieved from GenBank

- The most common groups of multicellular organisms were chosen for this analysis: Mollusca, Echinodermata, Crustacea (a subphylum of Arthropoda), Polychaeta (a class of Annelida), and Actinopterygii (a class of Chordata).

- Taxonomy-based searches were performed in the NCBI PopSet database, which holds sequence sets from population genetic studies.

- A total of 83 datasets were selected, with 9–20 sets for each group and at least 17 sequences in each dataset.

- The retrieved sequences were trimmed to the Leray fragment length by aligning them with reference datasets.

- Sequence matrices were generated and analyzed to estimate haplotypic variability, the number of population clusters, and other genetic variations.
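The effect of trimming on haplotypic variability can be illustrated with a small Python sketch that counts haplotypes and computes Nei's haplotype diversity, h = n/(n-1) * (1 - sum(p_i^2)), for a full alignment and for a sub-window of it. This is an illustration under stated assumptions, not the workflow used for the 83 datasets: the window coordinates `start` and `end` are hypothetical parameters, whereas the study located the Leray fragment by alignment with reference datasets.

```python
from collections import Counter


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: h = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(seqs)
    if n < 2:
        return 0.0
    counts = Counter(s.upper() for s in seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def compare_barcode_to_fragment(aligned_seqs, start, end):
    """Compare variability of full sequences with a sub-window.

    aligned_seqs: list of aligned sequences of equal length
    start, end: 0-based, end-exclusive window (hypothetical coordinates
    standing in for the position of the shorter fragment)
    """
    full = [s.upper() for s in aligned_seqs]
    fragment = [s[start:end] for s in full]
    return {
        "n_haplotypes_full": len(set(full)),
        "n_haplotypes_fragment": len(set(fragment)),
        "h_full": haplotype_diversity(full),
        "h_fragment": haplotype_diversity(fragment),
    }
```

Variable sites that fall outside the window are lost after trimming, so the fragment can never show more haplotypes than the full sequences.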

References

Yang, C., Bohmann, K., Wang, X., Cai, W., Wales, N., Ding, Z., Gopalakrishnan, S., & Yu, D. W. (2021). Biodiversity Soup II: A bulk-sample metabarcoding pipeline emphasizing error reduction. Methods in Ecology and Evolution. https://doi.org/10.1111/2041-210X.13602

Zepeda-Mendoza, M. L., Bohmann, K., Carmona Baez, A., & Gilbert, M. T. P. (2016). DAMe: A toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses. BMC Research Notes. https://doi.org/10.1186/s13104-016-2064-9