Overwintering fish community in ice-covered environments in Hokkaido, Japan, inferred from environmental DNA
Data files
Mar 12, 2025 version files 4.74 MB
-
Metadata.zip
17.28 KB
-
MiFish_outputs.zip
1.27 MB
-
Processed_data.zip
26.10 KB
-
README.md
19.74 KB
-
Results.zip
114.48 KB
-
Taxonomic_assignment.zip
3.29 MB
Abstract
The overwintering ecology of fish in the seasonal ice zones (SIZs) remains largely unexplored owing to methodological limitations. Environmental DNA (eDNA) can reveal the distribution and diversity of fish species in various aquatic environments, thereby offering a possible solution to the methodological limitations of SIZ studies. Therefore, we aimed to detect the overwintering fish community in the ice-covered Saroma-ko Lagoon, located on the Okhotsk Sea coast of Hokkaido, and its inflow, using eDNA metabarcoding. eDNA extracted from under-ice seawater collected from the lagoon yielded 28 fish taxa, predominantly Clupea pallasii based on the relative DNA read abundance. Dissimilarity analysis suggested short-term temporal variations in eDNA composition in under-ice seawater, even at the same site. Nineteen fish taxa, predominantly Tribolodon brandtii and T. hakonensis, were detected in the eDNA extracted from under-ice river water. The high dissimilarity between eDNA results from under-ice seawater and river water suggested segregation of the overwintering community between the lagoon and river. Fish eDNA detected in meltwater from sea ice was assigned to five taxa, suggesting the entrainment of particulate matter containing fish eDNA during ice growth. The true species richness estimated based on eDNA results and discrepancies with historical reports suggested that sampling efforts need to be optimized for ice-covered environments to promote more comprehensive species detection. This study demonstrated the usefulness of using fish eDNA metabarcoding to study the ecology of overwintering fish under ice. Clarifying eDNA shedding patterns, persistence, and dispersal in under-ice environments would improve the reliability of this technique and expand its use in SIZs.
https://doi.org/10.5061/dryad.rn8pk0pnn
Description of the data and file structure
This repository contains the data and R script required to reproduce the main results of the manuscript, as well as a prototype of the figures and tables. Raw sequence data files were deposited in the DNA Data Bank of Japan (DDBJ) Sequence Read Archive (DRA) as DRA014723 and DRA018892.
Files and variables
File: Metadata.zip
This ZIP file includes the following metadata and supporting data for the analysis.
Sampling_detail.csv
Detailed information of eDNA sampling.
- Sample_ID: ID for each eDNA sample.
- Filter_ID: ID for the filter to process the sample.
- Description: Description of the eDNA source.
- Sampler: The type of water sampler used to collect the seawater.
- Station: The name of the sampling station.
- Latitude: Latitude of the station (decimal degrees).
- Longitude: Longitude of the station (decimal degrees).
- Depth_m: The water depth (m) at the station.
- Collection_depth_m: The depth (m) at which the water sample was collected.
- Ice_tickness_m: The thickness of sea ice (m) at the station.
- Collection_time_JST: The time (Japanese Standard Time) at which the sample was collected.
- Collection_time_UTC: The time (Universal Time) at which the sample was collected.
- Volume_filtered_L: The volume of water (L) filtered per filter.
- Water_temperature_C: The water temperature (°C) at the station.
- Salinity: Salinity (psu) at the station.
Sequence_detail.csv
Detailed information of sequencing runs.
- Run_ID: ID for sequencing runs.
- Library_ID: ID for each library.
- Sample_ID: ID for each eDNA sample.
- Filter_ID: ID for the filter to process the sample.
- Volume_of_2ndPCR_products_input_uL: The amount of PCR products (uL) used for library construction.
Amaoka_DB.txt
A list of NCBI taxonomy ID of fish species reported in Amaoka et al. (2020).
Nelson_family.csv
A list of fish families as presented in in Fishes of the World, 5th edition (Nelson et al., 2016).
- number: Sequencial number for each family, as presented in Nelson et al. (2016).
- family: The scientific name of the family.
- english_name: The English common name for the family.
Nelson_order.csv
A list of fish orders as presented in in Fishes of the World, 5th edition (Nelson et al., 2016).
- number: Sequencial number for each family, as presented in Nelson et al. (2016).
- order: The scientific name of the order.
- english_name: The English common name for the order.
Saroma_Tokoro_fishlist.csv
A list of fish species previously reported in the Saroma-ko Lagoon (Ministry of the Environment, Japan, 1993; available at https://www.biodic.go.jp/reports2/4th/kosho/4_kosho_allm.pdf) and the Tokoro River (Ministry of Land, Infrastructure, Transport, and Tourism, Japan, 2024; available at https://www.nilim.go.jp/lab/fbg/ksnkankyo/).
- location: The geographical location where the species was reported (lagoon = Saroma-ko Lagoon, river = Tokoro River).
- order: The scientific name of the order.
- family: The scientific name of the family.
- species: The scientific name of the species.
- ncbi_id: ID for the species in the NCBI Taxonomy database.
- reference: The reference reporting the presence of the species.
File: MiFish_outputs.zip
This zip file contains output files generated using the online version of the Mifish pipeline v3.91 (available at:https://mitofish.aori.u-tokyo.ac.jp/mifish/). For more detailed information about data structure, please see the web site.
The summary of sequencing quality.
The statistics related to the processing of sequencing reads for each sample.
- Sample Name: The sample name used in the sequencing run.
- Read Type: The type of sequencing reads. In this data set, all samples have the same value (Pair-End FASTQ).
- No. Reads: The total number of reads obtained from the initial step of the pipeline.
- No. Reads after Quality Filter: The number of reads remaining after quality filtering.
- No. Assembled Reads: The number of reads resulting from the merging paired-end reads.
- No. Reads Without N: The number of reads remaining after removing the reads containing ambiguous bases (N).
- No. Reads Fit Length: The number of reads that passed length filtering.
- No. Unique Reads: The number of unique reads.
- No. Denoised Haploids: The number of haploids obtained after denoising.
- No. Species: The number of species assigned in the output of the pipeline.
The sample-species table exported by the MiFish pipeline. An excel file contains three sheets: Comparison of Samples, List of Sample Details, and Haploids with Low Identities.
Sheet 1: Comparison of Samples
- Class: The scientific name of the class for the species.
- Order: The scientific name of the order for the species.
- Family: The scientific name of the family for the species.
- Scientific Name: The scientific name of the species assigned by the MiFish pipeline.
- Common Name: The English common name for the species, presented in FishBase. Blank entries indicate no data.
- Ave. Confidence: The average of confidence value for the assignment.
- Water area: The aquatic environment that the species inhabit. Blank entries indicate no data.
- Habitat: The habitat of the species. Blank entries indicate no data.
- DepthS: The minimum depth (meters) at which the species has been reported to occur. Blank entries indicate no data.
- DepthD: The maximum depth (meters) at which the species has been reported to occur. Blank entries indicate no data.
- IUCN Red List Status: The conservation status of the fish species, as defined by the IUCN Red List categories.
- Importance in Fisheries: The importance in fisheries of the species. Blank entries indicate no data.
- Threat to Humans: Potential threat of the species to humans. Blank entries indicate no data.
- Abundances: The read abundance of the species in each sample.
Sheet 2: List of Sample Details
- Sample name: The sample name used in the sequencing run.
- Species: The scientific name of the species assigned by the MiFish pipeline, representing the top-hit species with the highest percent sequence identity.
- Total read: The total number of sequencing reads of the temporarily identified species.
- Representative Sequence: The most abundant sequence within a sample, among those sharing the same top-hit species.
- Haploid ID: ID for each unique haploid within a sample.
- Size: The read abundance of each haploid.
- Confidence: The categorized comfidence score.
- Identity(%): The percent sequence identity between each haploid sequence and its top-hit sequence.
- Confidence Score: The confidence score for each taxonomic assignment.
- Align Len: The alignment length between each haploid sequence and its top-hit sequence.
- Mismatch: The number of mismatch between each haploid sequence and its top-hit sequence.
- 2nd-sp Name: The scientific name of the second-hit species based on the percent sequence identity.
- 2nd-sp Align Len: The alignment length between each haploid sequence and its second-hit sequence.
- 2nd-sp Mismatch: The number of mismatch between each haploid sequence and its second-hit sequence.
- Sequence: The sequence of each haploid.
Sheet 3: Haploids with Low Identities
The column descriptions are consistent with those provided above.
File: Processed_data.zip
This ZIP file contains processed MiFish outputs. First, an OTU-sample table and a ZOTU-sample table were compiled based on the initial assignment. Then, potential contaminant reads were removed.
MiFish_initial_table.csv
An OTU-sample table compiled from the data presented in the "Comparison of Samples" sheets within the "/MiFish_outputs/Taxonomy/taxonomy_*.xlsx".
The column descriptions are consistent with those provided above.
MiFish_initial_assignment.csv
A list of sequence for each haploid compiled from the data presented in the "List of Sample Details" sheets within the "/MiFish_outputs/Taxonomy/taxonomy_*.xlsx".
The column descriptions are consistent with those provided above.
MiFish_initial_ZOTUtable.csv
A combined table generated by merging the OTU-sample table (MiFish_initial_table.csv) with the results of taxonomic assignments (MiFish_initial_assignment.csv). This table links sequence of each haploid to their read counts and initial taxonomic assignments.
The column descriptions are consistent with those provided above.
MiFish_decontam.csv
A table showing read counts per ZOTU (zero-radius operational taxonomic unit) per sample, after correction and decontamination.
- Run_ID: ID for sequencing runs.
- sample_name: The sample name used in the sequencing run.
- zotu_ID: ID for each ZOTU.
- nreads: The raw read abundance of each ZOTU for each sample.
- nreads_corrected: The read counts adjusted based on the volume of PCR products input into the sequencing library.
- nreads_decontam: The read counts after decontamination, calculated by subtracting the number of reads detected in negative controls from nreads_corrected.
MiFish_decontam_ZOTUtable.csv
An OTU-sample table generated from the "MiFish_decontam.csv". Each column indicates the decontaminated read counts for each sample, with rows representing ZOTUs.
File: Taxonomic_assignment.zip
This ZIP file contains the files required for taxonomic assignment and its results.
Sequences.fasta
A FASTA file containing the ZOTU sequences.
CZ8F3X14016-Alignment-HitTable.csv
Results of GenBank BLAST for the ZOTU sequences, formatted as a Hit Table (CSV).
- column1 (zotu_ID): Query ID for each ZOTU sequence.
- column2 (accession): GenBank accession number of significant hit sequences for the query.
- column3 (percent_identity): The percent sequence identity between the query and hit sequence.
- column4 (alignment_length): The alignment length between the query and hit sequence.
- column5 (mismatches): The number of mismatches between the query and hit sequence.
- column6 (gap_opens): The number of gap openings within the alignment.
- column7 (q_start): The start position of query sequence used in the alignment.
- column8 (q_end): The end position of query sequence used in the alignment.
- column9 (s_start): The start position of hit sequence used in the alignment.
- column10 (s_end): The end position of hit sequence used in the alignment.
- column11 (evalue): The E-value of the BLAST hit.
- column12 (bit_score): The bit-score of the BLAST hit.
CZ8F3X14016-Alignment.txt
Results of GenBank BLAST for the ZOTU sequences, formatted as text.
- Description: Description of the hit sequence from GenBank.
- Scientific Name: The scientific name of the species associated with the hit sequence.
- Common Name: The English common name of the species.
- Taxid: NCBI taxonomic ID for the species.
- Max Score: The maximum bit-score of the BLAST hit.
- Total Score: The total bit-score of the BLAST hit.
- Query cover: The percentage of query coverage in the alignment.
- E Value: The E-value of the BLAST hit.
- Per. Ident: The percent sequence identity between the query and hit sequence.
- Acc. Len: The length of the hit sequence.
- Accession: The GenBank accession number of the hit sequence.
BLAST_results.csv
Summary of the BLAST results compiled from the data presented in the "CZ8F3X14016-Alignment-HitTable.csv" and "CZ8F3X14016-Alignment.txt".
The column descriptions are consistent with those provided above.
BLAST_results_classified.csv
The modified version of "BLAST_results.csv" where taxonomic information for the candidate species has been added, based on the NCBI taxonomy database.
The column descriptions are consistent with those provided above.
Assignment_results.csv
The results of the taxonomic assignment of ZOTUs based on sequence identity.
- zotu_ID: ID for each ZOTU.
- name: The scientific name of the taxon assigned to the ZOTU.
- rank: Taxonomic rank of the assigned taxon.
- candidates: The scientific name of the candidate species to determine the assigned taxon.
- class: The scientific name of the class for the assigned taxon.
- order: The scientific name of the order for the assigned taxon.
- family: The scientific name of the family for the assigned taxon.
- genus: The scientific name of the genus for the assigned taxon.
- species: The scientific name of the species for the assigned taxon. "NA" indicates that the ZOTU was not assigned to the species level.
File: Results.zip
This ZIP file contains the results of the community-level analysis. Prototypes of figures and tables based on these results are also included.
MiFish_decontam_assigned.csv
This file was created by combining "MiFish_decontam_ZOTUtable.csv" and "Assignment_results.csv". ZOTUs assigned to the same taxon were collapsed into one, and the read counts (nreads_decontam) were summed for each taxon (total_reads).
- sample_name: The sample name used in the sequencing run.
- name: The scientific name of the taxon assigned to the ZOTU.
- order: The scientific name of the order for the assigned taxon.
- family: The scientific name of the family for the assigned taxon.
- total_reads: The total decontaminated read counts per assigned taxon per sample.
MiFish_decontam_assigned_table.csv
An OTU-sample table generated from the file "MiFish_decontam_assigned.csv".
The column descriptions are consistent with those provided above. Each column indicates the decontaminated read counts for each sample, with rows representing ZOTUs.
Rarefied_for_sample.csv
This file contains rarefied read counts for each unique sample, normalized to a consistent sequencing depth among the samples.
- group: The group of samples subjected to the rarefaction.
- name: The scientific name of the taxon.
- size_rarefied: The read counts per taxon after rarefaction.
- proportion: Relative abundance of the rarefied read counts.
Rarefied_for_site.csv
The rarefied read counts for sampling site in the lagoon (St. L) and the river (St. R).
The column descriptions are consistent with those provided above.
Rarefied_for_source.csv
The rarefied read counts for eDNA sources (Under-ice seawater, Under-ice riverwater, and Meltwater from sea-ice).
The column descriptions are consistent with those provided above.
Table2_Rarefied_for_sample_table.csv
Prototype of Table 2: A table showing the relative abundance of rarefied read counts for each taxon.
Figure2_Taxonomic_composition.pdf
Prototype of Figure 2: A bar graph showing taxonomic composition of unique samples.
Figure3_Overlap_within_lagoon.pdf
Prototype of Figure 3: A Venn diagram showing taxonomic overlap among unique samples within the lagoon site. The file "Rarefied_for_sample.csv" was used to create this figure.
Figure3_Overlap_within_lagoon.csv
Table showing the overlapping taxa among unique samples within the lagoon site.
Figure4_Overlap_between_sites.csv
Prototype of Figure 4: A Venn diagram showing taxonomic overlap between the lagoon site (St. L) and the river site (St. R). The file "Rarefied_for_site.csv" was used to create this figure.
Figure4_Overlap_between_sites.pdf
Table showing the overlapping taxa between the lagoon site (St. L) and the river site (St. R).
Figure5_Overlap_among_sources.pdf
Prototype of Figure 5: A Venn diagram showing taxonomic overlap among eDNA sources (Under-ice seawater, Under-ice riverwater, and Meltwater from sea-ice). The file "Rarefied_for_source.csv" was used to create this figure.
Figure5_Overlap_among_sources.csv
Table showing the overlapping taxa among eDNA sources (Under-ice seawater, Under-ice riverwater, and Meltwater from sea-ice).
Figure6_Aquapams_parameters.csv
The species-specific environmental envelopes retrieved from Aquamaps using "aquamapsdata" in R.
Figure6_Aquamaps_supplement_data.csv
Supplementary data for species-specific environmental envelopes that are not automatically retrieved by "aquamapsdata" in R.
Figure6_WT_preferences.pdf
Prototype of Figure 6: A figure showing the environmental envelopes for water temperature (Temp) of marine species that were detected by eDNA metabarcoding.
Figure6_Ice_preferences.pdf
Prototype of Figure 6: A figure showing the environmental envelopes for ice concentration (IceCon) of marine species that were detected by eDNA metabarcoding.
TableS1_Sequence_statistics.csv
Prototype of Table S1: A table showing the read counts at each bioinformatics step.
TableS2_Fishfauna.csv
Prototype of Table S2: A table combining fish taxa detected by eDNA metabarcoding (this study) and previous reports. Habitat information derived from FishBase (Oceanic, Estuaries, Stream, Lakes) was also added to the table.
TableS3_Pairwise_dissimilarity.csv
Prototype of Table S3: Pairwise Jaccard dissimilarity between unique samples, sites, and eDNA sources. The dissimilarity was also calculated between eDNA results and the fish fauna previously reported.
FigureS1_Species_coverage.pdf
Prototype of Figure S1: Figures showing the number of taxa against read counts for each filter replicate.
FigureS2_Chao2_estimator.pdf
Prototype of Figure S2: Rarefaction and extrapolation curve of taxa in relation to the number of unique samples at St. L.
FigureS3_Overlap_with_pastreport.pdf
Prototype of Figure S3: A Venn diagram showing taxonomic overlap among eDNA results and the previous reports. The file "TableS2_Fishfauna.csv" was used to create this figure.
FigureS3_Overlap_with_previousreport_lagoon.csv
A table showing the overlapping taxa between eDNA results and the previous report in the lagoon.
FigureS3_Overlap_with_previousreport_river.csv
A table showing the overlapping taxa between eDNA results and the previous report in the river.
FigureS4_Ice_preferences.pdf
Prototype of Figure S4: A figure showing the environmental envelopes for water temperature (Temp) of marine species that were not detected by eDNA metabarcoding.
FigureS4_WT_preferences.pdf
Prototype of Figure S4: A figure showing the environmental envelopes for ice concentration (IceCon) of marine species that were not detected by eDNA metabarcoding.
Code/software
Rscript_kawakami.rmd
R Markdown file containing the script executed in R version 4.4.0 (2024-04-24) using RStudio (2024.12.0+467).
Following packages were used in the script.
- tidyverse_2.0.0
- cowplot_1.1.3
- iNEXT_3.0.1
- vegan_2.6-6.1
- betapart_1.6
- ggVennDiagram_1.5.2
- broom_1.0.6
- ggsci_3.1.0
- taxize_0.9.100
- readxl_1.4.3
- rfishbase_4.1.2
- aquamapsdata_0.1.6
Access information
Data was derived from the following sources:
Study site
The Saroma-ko Lagoon is an enclosed coastal lagoon with a surface area of 151.59 km2 and a maximum depth of 22 m. It is connected to the southern part of the Sea of Okhotsk by two inlets (Figure 1). The water mass of the eastern part of the lagoon consists primarily of Okhotsk Sea water, with freshwater input from the Saromabetsu River (Nomura et al., 2009). The sediments in the lagoon consist primarily of silt and sand (Nishihama & Hoshikawa, 1992). Almost the entire surface of the lagoon is covered with sea ice from early January to early April (Shirasawa et al., 2005). A weak but steady horizontal and vertical current (<0.02 m/s) likely driven by tidal fluctuations is observed beneath the sea ice (Nomura et al., 2024). The lagoon has moderate fish diversity (49 species) and is a composite of coastal marine fish (e.g., Clupea pallasii, Sebastes spp., Pleuronectidae spp.) and brackish water fish (e.g., Mugil cephalus) (Ministry of the Environment, Japan, 1993). Fish fauna include species targeted for under-ice fishing, such as Eleginus gracialis and Hypomesus nipponensis.
eDNA sampling
Sampling methods for eDNA collection are detailed by Nomura et al. (2022). Briefly, under-ice water was collected at the eastern part of the lagoon (St. L, which corresponds to the “eDNA site” in Nomura et al., 2022) and river mouth of the Saromabetsu River (St. R) on March 5 and 6, 2021 (Figure 1, Table 1). The water depth was approximately 12 m at St. L and 2 m at St. R. The distance between these two stations was approximately 4.25 km. The ice thickness at both sites was approximately 0.5 m. Water temperature and salinity in the lagoon ranged from -1.2 to -0.8 °C and 25.0 to 32.0 at St. L (Nomura et al., 2022), and -0.1 °C and 0.3 at St. R (Table 1), respectively. Surface sea water temperature off the Saroma-ko Lagoon in early March was -1.74 °C on average, with partial coverage of drifting sea ice (data was available in Japan Meteorological Agency: https://ds.data.jma.go.jp/kaiyou/data/db/kaikyo/series/engan/engandata_SP.html and Ice Information Center, Japan: https://www1.kaiho.mlit.go.jp/KAN1/drift_ice/ice_chart/calendar/2021/03.html).
Under-ice water was collected from a depth of 1.5 m (approximately 1 m below sea ice) using a water sampler or pump through an ice hole cut using an ice saw as described below. At St. L, water collection was repeated four times during the ebb tide (from 9:00 to 14:00) at approximately 1–2 hour intervals, considering short-term temporal variation in eDNA signals (Jensen et al., 2022). To evaluate the collection efficiency of under-ice seawater, two types of water samplers (a 1 L Kitahara sampler [Rigosha, Japan] and a 2 L Van Dorn sampler [Rigosha, Japan]) and two types of pumps (a hand pump [GH-0400D, Bosworth Company, East Providence, RI, USA] and an electric pump [YS-20, Sendak, Tokyo, Japan]) were used. At St. R, a peristaltic pump (Geopump, Geotech Environmental Equipment Inc., Denver, CO, US) was used to collect under-ice river water. Each collection filled at least two sterilized 10 L foldable plastic bags, and a benzalkonium chloride (BAC) solution (final concentration of 0.01%) was added immediately to prevent eDNA degradation (Jo et al., 2021). Water samples were transported to a laboratory installed in our accommodation using a sledge. Although the systematic effects of differences in the apparatus used for water collection on eDNA detection are not clarified, we assumed the results were comparable.
Sea-ice samples were collected from St. L on March 8, 2021. An ice block measuring 0.3 × 0.3 m in area and 0.5 m thickness was collected, and the bottom 5 cm was cut using an ice saw. This layer was composed of columnar ice, which was characterized by vertically aligned ice crystals. This layer likely formed within 2 months prior to sampling. The ice samples were placed in a clean plastic bag, transported in a cooler box to the laboratory at the Faculty of Fisheries Sciences, Hokkaido University (Hakodate, Hokkaido), and subsequently melted at 4 °C in the dark.
Seawater samples and meltwater from sea ice were filtered through a 0.45 µm pore size cartridge filter (Sterivex HV, Millipore, Billerica, MA, USA) using a peristaltic tubing pump (Masterflex 07528-10, Cole-Parmer, IL, USA) at a flow rate of 100 mL/min. Filtration stopped when the filter was severely clogged. The water volume filtered through one filter ranged from 1.3 to 8.5 L for seawater and approximately 0.3 L for meltwater from sea ice (Table 1). The filters were stored at -20 °C until DNA extraction. Prior to sample filtration, 500 mL purified water was filtered in the same manner as the negative control. All equipment used for water collection was decontaminated using a bleach solution (approximately 0.12% NaOCl) and rinsed with distilled water before use. Researchers wore new gloves and masks at each step of eDNA sampling. The workspace was wiped with a breach disinfectant spray (approximately 0.1% NaOCl) and distilled water before use.
DNA extraction, PCR, and sequencing
DNA was extracted from each filter using the Qiagen DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany) following the protocol of Wong et al. (2020). Detailed information on DNA extraction is provided by Kawakami et al. (2023b). DNA extract (150 µL) was obtained from each filter and stored at -20 °C.
The sequencing data used in this study were obtained from two distinct sequencing runs (RIR-005 and HFS-013, Table S1) following the Illumina paired-end index sequencing protocol on an iSeq 100 (Illumina, San Diego, CA, USA). The composition of the PCR mixture and tagging protocol for Illumina sequencing differed slightly between the runs because the methods were conducted in two distinct laboratories and the protocol had to be modified to decrease index hopping, which is the misassignment of DNA reads between samples.
As the first step, the partial mitochondrial 12S rRNA gene (approximately 170 bp) was amplified using MiFish primers (Miya et al., 2015). Selecting optimal primers for eDNA metabarcoding with adequate species coverage and taxonomic resolution is crucial for a comprehensive overview of the biodiversity at the study site (Zhu & Iwasaki, 2023). The MiFish database (v3.83, available at https://github.com/billzt/MiFish), which is a subset of GenBank sequences, completely covers the species that inhabit the Saroma-ko Lagoon, with high taxonomic coverage of 86.5% at the species level and 94.9% at the family level for fish reported in waters around Hokkaido (Amaoka et al., 2020). Therefore, using MiFish primers are expected to provide a comprehensive view of fish diversity in the Saroma-ko Lagoon.
The first PCR for RIR-005 was performed in a 20 µL reaction containing 1× PCR Buffer for KOD FX Neo (Toyobo, Osaka, Japan), 0.4 mM dNTP mix, 0.4 U KOD FX Neo polymerase, 0.29 μM of each MiFish-U primer, 0.15 μM of each MiFish-E primer, and 2 μL of the template. The PCR for HFS-013 was performed in a 12 µL reaction containing 1× PCR Buffer for KOD FX Neo (Toyobo), 0.4 mM dNTP mix, 0.4 U KOD FX Neo polymerase, 0.3 µM of each MiFish-U primers, 0.15 µM of each MiFish-E primer, and 2 µL of template. The same PCR conditions were applied: 94 °C for 2 min; 40 cycles of 98 °C for 10 s, 65 °C for 30 s, and 68 °C for 30 s, followed by a final extension at 68 °C for 5 min. Eight PCR replicates were performed for each sample. Each PCR replicate was pooled and subsequently purified using ExoSAP-IT (Thermo Fisher Scientific, Waltham, MA, USA). A no-template control (NTC) was introduced in each run of the first PCR to evaluate cross-contamination and sequencing with 2.0 μL of nuclease-free water (UltraPure; Invitrogen, Waltham, MA, USA) instead of extracted DNA.
A second PCR was performed to attach the Illumina sequence adapter and dual-index tags (i7 and i5) to the first PCR products. For RIR-005, combinational indexing was applied, with each sample having a unique combination of indices. The 13 µL reaction included 1× PCR Buffer for KOD FX Neo, 0.37 mM dNTP mix, 0.24 U KOD FX Neo polymerase, 0.27 μM of each index primer, and 2 μL of the first PCR product. For HFS-013, non-redundant indexing was applied, ensuring that each sample had completely unique indices. Although the Illumina sequencer equipped with patterned flow cell, such as iSeq 100, may generate index hopping at a high frequency (0.29% in HiSeq 2500), non-redundant indexing effectively reduces the risk (MacConaill et al., 2018). The reaction volume was 12 µL, and it contained 1× PCR Buffer for KOD FX Neo, 0.4 mM dNTP mix, 0.24 U KOD FX Neo polymerase, 0.3 µM of each index primer, and 1 µL of the first PCR product.
For each iSeq run, the second PCR products were pooled in one tube, using 2–6 µL per sample based on band brightness from gel electrophoresis, and 4 µL for the NTC and negative controls. The band brightness was visually assessed by comparing it to that of DNA ladder markers (OneSTEP Ladder 50, Nippon Gene, Tokyo, Japan). The PCR products were not diluted to normalize concentration, as DNA from high-biomass species can mask the detection of low-biomass species (Deagle et al., 2018). The pooled library was purified using AMPure XP (Beckman Coulter, Brea, CA, USA). The second PCR products with a length of approximately 370 bp, were excised using a 2% E-Gel SizeSelect Agarose Gel (Thermo Fisher Scientific) and quantified using a Qubit dsDNA HS Assay Kit and Qubit Fluorometer (Thermo Fisher Scientific). The purity was evaluated using an Agilent 2100 Bioanalyzer system (Agilent, Santa Clara, CA, USA). The library was diluted to 40 pM with Buffer EB (Qiagen) and sequenced using an iSeq 100 with a 150 bp paired-end reading and 30% PhiX spike-in (Illumina) for quality control. Additionally, negative controls for field sampling and PCR were sequenced (Table S1).
The samples were processed in a laboratory dedicated to eDNA analysis for DNA extraction and subsequent PCR preparation. PCR amplification and library preparation were conducted in a separate room to prevent cross-contamination. Researchers wore new gloves and masks at each step of molecular work. The workspace and equipment (micropipettes, incubator, and centrifuge) were wiped with DNA-OFF (Takara, Shiga, Japan) and distilled water before use. In addition, the extraction equipment (manifold, valves, and tubes) were decontaminated using a bleach solution (approximately 0.12% NaOCl), as described in the eDNA sampling section.
Sequencing data processing
Raw sequences were demultiplexed using the Generate FastQ analysis module (v.2.0.1) on iSeq and exported as Fastq files. The Fastq files were processed using the Mifish pipeline v3.91 (Zhu et al., 2023; available at: https://mitofish.aori.u-tokyo.ac.jp/mifish/) with default parameters and a 97% similarity threshold. This pipeline generates zero-radius operational taxonomic units (ZOTUs), which are valid sequences corrected for sequencing errors (Edgar, 2016) as a consequence of consecutive processes, including quality checks, assembly of sequence reads, length filtering, removal of primer sequences, denoising, and chimera removal (Zhu et al., 2023). Subsequently, this pipeline automatically assigned ZOTUs to a known species with the highest similarity based on a homology search against the implemented DNA database. ZOTUs that shared the same species names were compulsorily selected and pooled to produce a sample-by-species table. Accordingly, this assignment procedure can yield spurious results when multiple species cannot be discriminated based on sequence similarity. Therefore, to obtain robust and conservative assignments, we disregarded the results exported from the pipeline and refined the taxonomic assignments for each ZOTU as described in the taxonomic assignment section below.
To remove the probable contaminant reads, the maximum read count of each ZOTU detected in the negative controls (field blank and NTCs) was subtracted from the read count detected in the sample. Subsequently, ZOTUs were considered positive when the read count was ≥10. Prior to decontamination, read counts were corrected in proportion to the volume of the second PCR products of each sample used for library construction (Table S1). This adjustment was performed for each iSeq run, as the relationship between DNA concentration and read count could vary across runs.
Taxonomic assignment
All data handling and statistical analyses were performed using R (R Core Team, 2021) v.4.1.2, in combination with R studio (2023.12.0+369).
The ZOTUs were searched against GenBank’s nr database using BLASTN 2.14.1+ (as of 5 August 5, 2023, https://blast.ncbi.nlm.nih.gov/Blast.cgi) after combining them into a FASTA file. To reduce the risk of missing species with similar sequences owing to the dominance of sequences from a few species, the maximum number of aligned sequences displayed was set at 250. BLAST results were downloaded and parsed using a custom R script. To avoid uncertain species-associations only sequences identical to ZOTUs (100% identity) with 100% coverage were retained as candidates. Taxonomic information for these candidate species was retrieved from the National Center for Biotechnology Information (NCBI) taxonomy database using “classification” function in taxize v.0.9.100 (Chamberlain & Szocs, 2013).
To improve the accuracy of the assignment, candidate sequences derived from species found in the waters surrounding Hokkaido were selected according to a species list previously documented by Amaoka et al. (2020). ZOTUs with hit sequences from mammals were excluded from analysis. For consistency, synonyms were consolidated according to the scientific names used in the NCBI taxonomy. The ZOTUs were subsequently assigned to the lowest common ancestor (LCA), which is the lowest taxonomic level shared by the candidate species using the “lowest_common” function in the taxize. For ZOTUs annotated above the genus level, the species name was designated as “sp.” because it was unclear whether the DNA was derived from a single or multiple species. ZOTUs assigned to the same LCA collapsed to form taxonomic units unless their candidate species were completely distinct from each other.
Statistical analyses
To assess sample completeness, species accumulation curves were generated for each sample using the “iNEXT” function in iNEXT v.2.0.20 (Hsieh et al., 2016) based on corrected read counts. Given that all curves reached a plateau (Figure S1), it was assumed that the sequencing depth was sufficient to detect all the taxa contained in a filter replicate. As fluctuations in filtration volume per filter can affect the fish diversity detected by eDNA metabarcoding (Kawakami et al., 2023b), read counts were rarefied to the minimum read count within each unique sample prior to combining the results from filtration duplicates for the sample. Rarefaction was performed using the functions in vegan 2.6–6.1 (Oksanen et al., 2024): species occurrence probabilities in a rarefied community were estimated with the “drarefy” function, and species with probabilities greater than 0.5 were retained. The rarefied read count for each species was expressed as the mean of 100 iterations of the “rrarefy” function.
The taxonomic composition was compared among the samples within the under-ice seawater at St. L, between the under-ice seawater at St. L and under-ice river water at St. R, and between eDNA sources (under-ice seawater at St. L and meltwater from sea ice at St. L, under-ice river water at St. R and meltwater from sea ice at St. L). For each comparison, samples were rarefied to the minimum read count within the combination, following the same method described above. Compositional dissimilarity was evaluated using the Jaccard index (ßjac) and Venn diagrams. ßjac was further partitioned into turnover (βjtu) and nestedness components (βjne). The turnover component indicates the replacement of some species with others between samples, whereas the nestedness component indicates the extent to which the taxonomic composition of a sample with fewer species is a subset of a richer sample (Baselga, 2010). The contribution of each component to the total dissimilarity is expressed as the ratio of ßjtu or ßjne to ßjac. The dissimilarity indices were calculated using the “beta.pair” function in betapart v.1.6 (Baselga et al., 2012) after converting the taxon-by-sample table into presence/absence data. Additionally, the Chao2 estimator (Schao2), which estimates true taxonomic richness (Hsieh et al., 2016), was calculated based on results from the four samples of under-ice seawater at St. L using the “iNEXT” function (Hsieh et al., 2016) after being rarefied as previously described.
To discuss the possibility of fish overwintering within the frozen lagoon and river, the congruence of data between taxa detected by eDNA and known fish fauna was evaluated based on the National Survey on the Natural Environment, which was periodically conducted by the Ministry of the Environment, Japan, from 1973 to 2004, and the National Census on River Environments, which was continually conducted by the Ministry of Land, Infrastructure, Transport, and Tourism, Japan, after 1990. Fish species lists were downloaded from the Biodiversity Center of Japan, Ministry of the Environment (Ministry of the Environment, Japan, 1993; available at https://www.biodic.go.jp/reports2/4th/kosho/4_kosho_allm.pdf) and River Environmental Database (Ministry of Land, Infrastructure, Transport, and Tourism, Japan, 2024; available at https://www.nilim.go.jp/lab/fbg/ksnkankyo/). As no comprehensive report on fish fauna is available for the Saromabetsu River, we selected the Tokoro River as an alternative under the assumption that they have similar fish communities. In these reports, fish fauna were surveyed through fish collection, compilation of fishery catch records, visual inspection, and interviews with residents. Only taxa identified or assigned at the species level were used for comparison. Since the Actinopteri species known from these references were completely covered by the MiFish database (v.3.83), we assumed that the risk of false-negative results due to the lack of DNA data was negligible. This possibility was further evaluated by integrating environmental preferences. Habitat information of each species was retrieved from Fishbase using the “ecology” function in rfishbase v.4.1.2 (Boettiger et al., 2012) with Amaoka et al. (2020) as a complementary. The ranges of preferred parameters and absolute minimum or maximum parameter thresholds of certain species regarding water temperature and ice concentration were obtained from AquaMaps (https://www.aquamaps.org/) using aquamapsdata v.0.1.6 (Kaschner et al., 2019). As AquaMaps compiles natural occurrence data for marine species, our comparison also focused on marine fish.
