Data from: Stability of environmental DNA methylation and its utility in tracing spawning of fish
Data files
Aug 05, 2024 version files 74.22 MB
- data_03_table.csv (25.06 MB)
- Data_processing.R (3.22 KB)
- Fasta.zip (854 B)
- Figure_S2_data_.xlsx (21.48 KB)
- MethylationRate.xlsx (20.75 MB)
- NGS_data.zip (28.32 MB)
- qPCR_data.zip (48.88 KB)
- README.md (5.71 KB)
Abstract
The use of environmental DNA (eDNA) as a new method of ecological monitoring is widely applied. Although eDNA can provide important information on the distribution and biomass of particular taxa, an organism’s DNA sequences remain unaltered throughout its existence, which complicates identifying crucial events, including reproduction, with high accuracy. We thus examined DNA methylation as a novel source of information from eDNA, considering that methylation patterns of eggs and sperm released during reproduction differ from those of somatic tissues.
Despite its potential applications, little is known about eDNA methylation, including its stability and methods for detection and quantification. Therefore, we conducted tank experiments and performed methylation analysis targeting 18S rDNA through bisulfite amplicon sequencing.
Methylation of eDNA was not affected by degradation and was equivalent to the rate of genomic DNA from somatic tissues. Unmethylated DNA, which is abundant in the ovary, was detected in eDNA during reproductive activity of fish.
These results indicate that eDNA methylation is a stable signal reflecting genomic methylation and demonstrate that germ cell-specific methylation patterns can be used as markers for detecting reproductive activity.
This dataset is supporting information for the submitted manuscript MER-24-0037.R1.
Please refer to the manuscript for information on the equipment, workflows, and materials used.
Description of the data and file structure
qPCR data
In the manuscript, section 2.6 of MATERIALS AND METHODS states that the qPCR results for Tank Experiment 1 are found in "220901_qPCR_ITS2_1_data.csv," while the results for Tank Experiment 2 are in "220901_qPCR_ITS2_2_data.csv," "220901_qPCR_ITS2_3_data.csv," and "220901_qPCR_ITS2_4_data.csv."
The "Well" column indicates the position on the 96-well plate. The "Sample Name" column indicates the template sample. For example, "Tank1-1-1" to "Tank1-1-7" in "220901_qPCR_ITS2_1_data.csv" represent the seven time points of Tank1-1 in Tank Experiment 1 in the manuscript. "Tank2-1-1" to "Tank2-1-16" in "220930_qPCR_ITS2_2_data.csv", "220930_qPCR_ITS2_3_data.csv", and "220930_qPCR_ITS2_4_data.csv" represent the 16 time points of Tank2-1 in Tank Experiment 2. "blank" indicates a blank tank sample.
"Dre-ITS2" in the "Target Name" column indicates that the target region of the PCR is ITS2 of zebrafish (Danio rerio). The "Task" column indicates whether the template is a standard, a sample of unknown concentration, or a negative control. The "Reporter" and "Quencher" columns indicate the reporter dye and the quencher used for the probe, respectively.
The "Ct" column is the Ct value for each replicate determined by real-time PCR, and the "Ct Mean" and "Ct SD" columns are the mean and standard deviation of the three replicates for each sample. The "Quantity", "Quantity Mean", and "Quantity SD" columns were calculated from calibration curves generated from the standards. "Undetermined" indicates non-detection. Where no mean or standard deviation could be calculated because of non-detection, the value is given as n/a (not applicable). For standards, these values are also given as n/a because the concentration is known. "Ct Threshold" was calculated automatically by the software installed on the PCR instrument.
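As an illustration of how the replicate columns relate to each other, the following Python sketch groups Ct values by "Sample Name", treats "Undetermined" and n/a as missing, and computes a mean and sample standard deviation over the detected replicates. It is not the instrument software's procedure. The sample rows are hypothetical, the function names are invented for this example, and the real files contain more columns than shown here. How the software handles partially detected triplicates is an assumption.

```python
import csv
import io
import statistics

# Hypothetical excerpt using three of the columns described above.
SAMPLE_CSV = """Well,Sample Name,Ct
A1,Tank1-1-1,30.0
A2,Tank1-1-1,31.0
A3,Tank1-1-1,Undetermined
B1,blank,Undetermined
B2,blank,Undetermined
B3,blank,Undetermined
"""


def parse_ct(value):
    """Return the Ct as a float, or None for non-detection or n/a."""
    text = value.strip()
    if text.lower() in {"undetermined", "n/a", ""}:
        return None
    return float(text)


def summarize_ct(rows):
    """Group replicate Ct values by sample and compute mean and SD."""
    by_sample = {}
    for row in rows:
        by_sample.setdefault(row["Sample Name"], []).append(parse_ct(row["Ct"]))

    summary = {}
    for sample, values in by_sample.items():
        detected = [v for v in values if v is not None]
        # Assumption: statistics are taken over detected replicates only.
        mean = statistics.mean(detected) if detected else None
        sd = statistics.stdev(detected) if len(detected) >= 2 else None
        summary[sample] = {"n_detected": len(detected), "ct_mean": mean, "ct_sd": sd}
    return summary


summary = summarize_ct(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for sample, stats in summary.items():
    print(sample, stats)
```

To apply this to a real file, replace the `io.StringIO(SAMPLE_CSV)` argument with an open file handle, after checking that the header row matches the column names used here.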
Figure_S2_data
This file contains Figure S2, referred to in section 3.1 of RESULTS in the manuscript, together with the underlying data and the calculation method.
Sample names in the "Sample" column follow the naming described in the qPCR data section of this README. The "MethylRate_before" column is the methylation rate for each sample obtained by skipping the Identity filtering described in section 2.8.4 of MATERIALS AND METHODS. The "MethylRate_after" column is the methylation rate for each sample obtained by performing Identity filtering as described in section 2.8.4. The "Metyl_change" column is the value obtained by dividing the difference between A and B by the value of B. This value, calculated for each sample, is the y-coordinate of the graph.
The "% of removed sequences" is the ratio of the sequences after Identity filtering to the raw sequences, which can also be found in Table S2 of the Appendix. The "Metyl_change" and "% of removed sequences" values for each sample are plotted on the graph as the y-coordinate and x-coordinate, respectively.
Fasta_format data
The "18S_sanger_consensus.fasta" is the BSAS target region sequence of the zebrafish somatic tissue genome determined by Sanger sequencing.
The "18S_sanger_ovary.fasta" is the BSAS target region sequence of the zebrafish ovary genome determined by Sanger sequencing.
These PCR products were amplified using the primer set of EF3 and ER1 (see the paper for primer sequences).
The reference sequence for BSAS is "18SB2_amplicon_ref_221115.fasta". It represents the target region after bisulfite treatment: every C is converted to T except those in a CG context. These sequences are deposited in DDBJ under accession numbers LC813236-LC813237.
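The conversion rule used for the reference (every C becomes T unless it is followed by G) can be sketched in Python as below. This is a minimal illustration under that stated rule, not the procedure the authors used to build "18SB2_amplicon_ref_221115.fasta". The function name and the short example sequence are hypothetical.

```python
import re


def bisulfite_convert_reference(seq):
    """Convert every C to T unless it is immediately followed by G (CG context)."""
    # The negative lookahead (?!G) leaves the C of each CG dinucleotide unchanged.
    return re.sub(r"C(?!G)", "T", seq.upper())


example = "ACGTCCAGCGC"  # hypothetical sequence, not taken from the dataset
converted = bisulfite_convert_reference(example)
print(converted)  # ACGTTTAGCGT
```

A C at the very end of the sequence has no following G, so it is converted as well. Whether that matches the authors' handling of the amplicon boundary is not stated in the source.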
NGS data
This archive contains compressed FASTQ files generated from an Illumina iSeq paired-end (2 x 150 bp) sequencing run. More details on the sequencing methods can be found in section 2.8 of MATERIALS AND METHODS in the manuscript.
R code & Methylation rate calculation
The R code ("Data_processing.R") generates a list of unique sequences and their read counts for each sample from the FASTQ data, stored in the "data_03_table.csv" file. Using the "data_03_table.csv," a spreadsheet ("MethylationRate.xlsx") was created and used in the study.
"data_03_table.csv"
After screening, the FASTQ data for each sample were grouped by sequence and output as "data_03_table.csv".
The "uniq" column indicates the ID of each unique sequence.
The "seq.t" column contains the nucleotide sequence itself.
The table lists the read count of each sequence in each sample, whether extracted from tissue or from tank water.
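The layout of this table (one row per unique sequence, one count column per sample) can be illustrated with the short Python sketch below. It is not a port of "Data_processing.R" and skips the screening step entirely. The sample names and reads are hypothetical, and ordering the IDs by total abundance is an assumption made only so that the example gives stable output.

```python
from collections import Counter


def build_count_table(reads_by_sample):
    """Return (rows, samples): one row per unique sequence with per-sample counts."""
    samples = list(reads_by_sample)
    counts = {s: Counter(reads) for s, reads in reads_by_sample.items()}

    # Total abundance of each sequence across all samples.
    totals = Counter()
    for per_sample in counts.values():
        totals.update(per_sample)

    # Assumption: IDs are assigned by decreasing total count, ties alphabetically.
    ordered = sorted(totals, key=lambda seq: (-totals[seq], seq))

    rows = []
    for i, seq in enumerate(ordered, start=1):
        row = {"uniq": f"uniq{i}", "seq.t": seq}
        for s in samples:
            row[s] = counts[s][seq]  # Counter returns 0 for absent sequences
        rows.append(row)
    return rows, samples


# Hypothetical reads, already screened.
reads = {
    "sample_A": ["TTCGA", "TTCGA", "TTTGA"],
    "sample_B": ["TTTGA", "TTCGA", "GGCGA"],
}
rows, samples = build_count_table(reads)
for row in rows:
    print(row)
```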
Methylation rate calculation
"MethylationRate.xlsx" was prepared based on "data_03_table.csv". "seq.t-CG" (column C in the sheet) contains the sequence of "seq.t" with every CG removed. The number of CGs in a sequence is the number of characters in "seq.t" minus the number of characters in "seq.t-CG", divided by 2, and is stored in "CG number" (column D). Columns E onward list the read count of each sequence in each sample, and the third row lists the total number of reads. (For example, the number of uniq1 reads detected in 1_F1_skin is 76 and the total number of sequences is 1150.) The second row shows the methylation rate in %. The methylation rate in the target region of each sample was calculated as (sum of CGs)/(sum of CpG sites), where the sum of CpG sites is the total read count multiplied by the 19 CpG sites used as the divisor in the formula. For example, the methylation rate of 1_F1_skin is expressed by the following formula in Excel.
=SUMPRODUCT($D$4:$D$61001*E4:E61001)/SUM(E4:E61001)/19
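For readers who prefer to check the calculation outside Excel, the Python sketch below mirrors the formula above: count the CGs in each sequence by the same length-difference trick, weight by read count, then divide by the total read count and by the number of CpG sites. The function names are hypothetical and the toy sequences are not from the dataset. The default divisor of 19 is taken from the Excel formula.

```python
N_CPG_SITES = 19  # divisor used in the Excel formula above


def count_cg(seq):
    """Number of CG dinucleotides, computed as (len(seq.t) - len(seq.t-CG)) / 2."""
    seq_without_cg = seq.replace("CG", "")
    return (len(seq) - len(seq_without_cg)) // 2


def methylation_rate(seqs, read_counts, n_sites=N_CPG_SITES):
    """Return the methylation rate as a fraction for one sample."""
    total_reads = sum(read_counts)
    if total_reads == 0:
        raise ValueError("sample has no reads")
    # Equivalent of SUMPRODUCT(CG number, read count).
    retained_cg = sum(count_cg(s) * n for s, n in zip(seqs, read_counts))
    return retained_cg / total_reads / n_sites


# Toy example with 2 CpG sites per read instead of 19.
toy_seqs = ["TCGTTCGT", "TCGTTTGT", "TTGTTTGT"]
toy_counts = [5, 3, 2]
rate = methylation_rate(toy_seqs, toy_counts, n_sites=2)
print(f"{rate * 100:.1f} %")  # 65.0 %
```

In the toy example, 13 CGs are retained across 10 reads with 2 sites each, giving 13/20 = 65 %. Multiplying the fraction by 100 corresponds to the percentage shown in the second row of the spreadsheet.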
- Hirayama, Itsuki; Minamoto, Toshifumi; Wu, Luhan (2024). Stability of environmental DNA methylation and its utility in tracing reproductive activities of fish [Preprint]. Authorea, Inc. https://doi.org/10.22541/au.171007239.91091562/v1
- Hirayama, Itsuki T.; Wu, Luhan; Minamoto, Toshifumi (2024). Stability of environmental DNA methylation and its utility in tracing spawning in fish. Molecular Ecology Resources. https://doi.org/10.1111/1755-0998.14011
