Bony fish 12S rRNA sequencing data from coastal water samples in the Gulf of Maine
Data files
Aug 28, 2025 version (4.37 GB total)
- mdat.v3.tsv (1.92 KB)
- README.md (1.63 KB)
- Sequence_data.zip (4.37 GB)
- V4-gs.csv (3.59 KB)
Abstract
Ecosystems in coastal waters of the Gulf of Maine (GOM) are undergoing environmental challenges in response to climate change and anthropogenic stressors. eDNA metabarcoding, a powerful tool for assessing fish community structure, was used to identify fish communities in three types of GOM aquatic environments (sand, macroalgae, and eelgrass) in Maine and New Hampshire, USA. The available 12S rRNA fish universal primer analysis system was modified using nested PCR (MiFish and 12S-V5) to improve targeting of fish products and reduce non-target products. The nested PCR strategy allowed successful amplification of 12S rRNA genes in fishes without production of non-target products and identified 28 fish groups at the genus level. Presence/absence data and relative abundance showed significant differences among locales but not among habitats. Myoxocephalus sp. were found at all sampling sites. Relative abundance data revealed that Menidia menidia and Brevoortia sp. were statistical indicator species in Goosefare, Maine and New Castle, New Hampshire, respectively. Although beta diversity indicated that fish communities were not different across habitats, statistical analysis found that Pholis sp. and Ammodytes sp. were dominant species in macroalgae and sand, respectively. To our knowledge, this is the first metabarcoding study to assess fish communities in the Western Atlantic region using the MiFish primer set, and the study suggests that metabarcoding is useful for mapping geographic and temporal marine fish diversity.
https://doi.org/10.5061/dryad.47d7wm3q6
Description of the data and file structure
Sequence_data.zip
This folder includes paired-end raw sequencing data generated on the Illumina NovaSeq 6000 for 36 samples.
mdat.v3.tsv
This is the metadata file for the 36 water samples collected for this study.
V4-gs.csv
This is the sequence abundance file that was used for the analysis in R.
Files and variables
File: mdat.v3.tsv
Description:
Variables
- sampleid: sample name
- Site: the four locales where samples were collected: New Castle (NC), Fort Foster (FF), Kennebunk Port (BK), Goosefare (GF)
- Habitat: the three habitats where samples were collected: sand (S), eelgrass (Eg), macroalgae (Ma)
- Replicate: number of sample replicates
- Date Collected: date on which the sample was collected
- year: all samples were collected in 2022
- month: month that samples were collected
- primer: primer used to amplify DNA (MiFish-U)
- PCR: PCR method used for amplification (Nested: nested PCR)
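The variables listed above can be used to group samples by locale and habitat. The following is a minimal Python sketch, assuming the header row of mdat.v3.tsv uses the column names exactly as listed (sampleid, Site, Habitat) and that a QIIME 2 style "#q2:types" directive row might be present; both are assumptions, and the rows in the example are made up for illustration, not taken from the file.

```python
import csv
import io
from collections import Counter


def read_metadata(handle):
    """Return a list of row dicts from a tab-separated metadata file."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        # Assumption: skip an optional QIIME 2 "#q2:types" directive row.
        if row["sampleid"].startswith("#q2:"):
            continue
        rows.append(row)
    return rows


def samples_per_group(rows):
    """Count samples for each (Site, Habitat) combination."""
    return Counter((row["Site"], row["Habitat"]) for row in rows)


# Made-up rows for illustration only; they are not taken from mdat.v3.tsv.
example = io.StringIO(
    "sampleid\tSite\tHabitat\tReplicate\n"
    "demo1\tNC\tS\t1\n"
    "demo2\tNC\tS\t2\n"
    "demo3\tGF\tEg\t1\n"
)
counts = samples_per_group(read_metadata(example))
for (site, habitat), n in sorted(counts.items()):
    print(site, habitat, n)
```

To run this on the real file, the in-memory example would be replaced by something like `open("mdat.v3.tsv", newline="")`, with the column names adjusted if the actual header differs.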
File: V4-gs.csv
Description:
Variables
- 1st column: sample name
- remaining columns: taxa identified by the metabarcoding analysis
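Given that layout (sample name in the first column, one column per taxon), the presence/absence and relative abundance tables mentioned in the abstract could be derived roughly as follows. This is a sketch only, assuming the cells hold numeric sequence counts and the file has a single header row; the taxon columns and values in the example are invented for illustration.

```python
import csv
import io


def read_abundance(handle):
    """Return {sample: {taxon: count}} from a CSV whose first column is the sample name."""
    reader = csv.reader(handle)
    header = next(reader)
    taxa = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        table[row[0]] = {t: float(v) for t, v in zip(taxa, row[1:])}
    return table


def presence_absence(table):
    """Convert counts to 1 (detected) or 0 (not detected)."""
    return {s: {t: int(c > 0) for t, c in counts.items()} for s, counts in table.items()}


def relative_abundance(table):
    """Divide each count by the sample total; samples with no reads get 0.0."""
    result = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        result[sample] = {t: (c / total if total else 0.0) for t, c in counts.items()}
    return result


# Invented values for illustration only; they are not taken from V4-gs.csv.
example = io.StringIO("sample,Pholis,Ammodytes\ndemo1,30,10\ndemo2,0,5\n")
table = read_abundance(example)
pa = presence_absence(table)
rel = relative_abundance(table)
print(pa["demo2"])
print(rel["demo1"])
```

The original analysis was done in R; this Python version only illustrates the two transformations and is not the authors' script.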
File: Sequence_data.zip
Description: paired-end .fastq.gz files for all samples
Access information
Other publicly accessible locations of the data:
- National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under bioproject accession number PRJNA1104637
Water samples were collected from 3 habitat types (sand, eelgrass, and macroalgae) in four different GOM locations: New Castle (NC) in New Hampshire, and Fort Foster (FF), Kennebunk Port (KB), and Goosefare (GF) in Maine. Three-liter samples were collected ~1 meter above the seafloor, either manually by divers carrying Niskin bottles (3 × 1 L) or by using a Van Dorn sampler and filling 3 Niskin bottles aboard the sampling vessel. All 36 samples were collected between June and October in 2022. Total DNA was extracted using the DNeasy PowerSoil Pro™ kit (Qiagen) after filtration. Nested PCR based on 12S rRNA was performed to amplify fish DNA. Three PCR replicates were conducted for each DNA sample and then pooled for sequencing analysis after confirmation of no contamination in both negative controls. Final amplicons were purified with AMPure XP SPRI™ reagent to remove remnant primers and dNTPs. The purified amplicons were indexed by barcode and sequenced on the Illumina NovaSeq 6000 platform at the Hubbard Center for Genome Studies (HCGS) at the University of New Hampshire (Durham, NH).
Removal of sequencing adaptors and quality trimming with a Phred score cutoff of 33 were performed using Trimmomatic 0.39, after which sequence reads < 200 bp were eliminated from the data set. Trimmed reads were imported into the QIIME2 bioinformatics software with a metadata file. Denoising was conducted with the DADA2 plug-in using the following parameters: --p-trim-left-f 21, --p-trim-left-r 27, --p-trunc-len-f 200, and --p-trunc-len-r 200. Taxonomic classification of amplicon sequences was performed with the QIIME2 function feature-classifier classify-consensus-vsearch with the percentage sequence similarity to reference sequences (--p-perc-identity 0.97), query coverage (--p-query-cov 0.9), and maximum number of hits to keep for each query (--p-maxaccepts all).
The reference database was the Mitohelper 12S rDNA database released in Dec 2023 (https://github.com/aomlomics/mitohelper/tree/master/QIIME-compatible). Unassigned sequences were removed using the QIIME2 function (qiime taxa filter-table --p-exclude Unassigned). Phylogeny of the filtered amplicon sequence variants (ASVs) was reconstructed using (qiime phylogeny align-to-tree-mafft-fasttree) to group ASVs into operational taxonomic units (OTUs). When an OTU was assigned to several species at the same or higher taxonomic levels, it was manually confirmed by BLAST analysis against the NCBI and MitoFish (https://mitofish.aori.u-tokyo.ac.jp/blast/simple/) reference databases.
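The QIIME2 steps described above can be assembled into command lines as in the Python sketch below. It only builds and prints the commands; it does not run them. The plugin names, actions, and parameter values come from the description; every .qza file name and output directory is a hypothetical placeholder, and the use of --output-dir instead of individual output flags is a choice made here, not something the description states.

```python
import shlex


def qiime_command(plugin, action, **options):
    """Build an argv list; keyword names map to flags (underscores become hyphens)."""
    argv = ["qiime", plugin, action]
    for name, value in options.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


# All file and directory names below are hypothetical placeholders.
steps = [
    # Denoising with DADA2, using the trim/truncation values from the text.
    qiime_command(
        "dada2", "denoise-paired",
        i_demultiplexed_seqs="demux.qza",
        p_trim_left_f=21, p_trim_left_r=27,
        p_trunc_len_f=200, p_trunc_len_r=200,
        output_dir="dada2_out",
    ),
    # Taxonomic classification against the reference database.
    qiime_command(
        "feature-classifier", "classify-consensus-vsearch",
        i_query="dada2_out/representative_sequences.qza",
        i_reference_reads="ref_seqs.qza",
        i_reference_taxonomy="ref_tax.qza",
        p_perc_identity=0.97, p_query_cov=0.9, p_maxaccepts="all",
        output_dir="vsearch_out",
    ),
    # Removal of features classified as Unassigned.
    qiime_command(
        "taxa", "filter-table",
        i_table="dada2_out/table.qza",
        i_taxonomy="vsearch_out/classification.qza",
        p_exclude="Unassigned",
        o_filtered_table="table_assigned.qza",
    ),
    # Phylogeny; how the sequences were filtered beforehand is not described,
    # so the input name here is a placeholder for that filtered set.
    qiime_command(
        "phylogeny", "align-to-tree-mafft-fasttree",
        i_sequences="rep_seqs_filtered.qza",
        output_dir="phylogeny_out",
    ),
]

for argv in steps:
    print(shlex.join(argv))
```

Building argument lists instead of shell strings keeps each flag and value explicit; the lists could be passed to `subprocess.run` in an environment where QIIME2 is installed, after checking the flags against the installed version.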
