An epigenetic clock for accurate age prediction in Atlantic cod populations for improved fisheries management
Data files
Feb 18, 2026 version files 344.19 MB
- README.md (3.20 KB)
- samples-details-lat-lon.txt (8.48 KB)
- samples.zip (344.18 MB)
Abstract
Fisheries management relies on accurate stock assessments, which in turn depend on precise age information. Recent molecular tools called “epigenetic clocks” harness age-related DNA methylation changes to build accurate and precise age-prediction models. However, the influences of intrinsic and extrinsic factors on clock performance remain uncertain. In this study, we examined Atlantic cod aged 0 to 7 years, sampled from various locations across the North Sea, and developed an epigenetic clock using bisulfite restriction-site associated DNA sequencing (bis-RAD-seq) DNA methylation data from 73 CpG sites obtained from fin clips. This clock predicted age with 97.5 % accuracy and a precision of 2.8 months and generalized well in unseen data. Further, we addressed critical variables such as sex and maturity status, which are often overlooked, and we showed that clock performance was unaffected by sex-specific differences in growth, and it was lower in advanced sexually mature individuals, reflecting a slight bias towards younger fish. A key finding of our study is the discovery of a latitudinal cline in global DNA methylation patterns. We found that DNA methylation varied with latitude, despite the absence of genetic differences, while our clock maintained consistent performance across geographic locations. This resolves a major question regarding how generalizable epigenetic clocks are within the distribution of a species. Our clock demonstrates extensive applicability and enhanced practicality for real-world fisheries management. It provides accurate and precise age prediction for Atlantic cod irrespective of intrinsic differences or environmental influences associated with geographic locations.
https://doi.org/10.5061/dryad.tmpg4f565
Description of the data and file structure
This dataset contains the data required to construct an epigenetic clock to accurately predict age in populations of the iconic Atlantic cod. One hundred twenty individuals from 8 age classes (0-7 years old) from the North Sea were aged via otolith readings. Fin-clips were obtained and Bisulfite Restriction site Associated DNA markers sequencing (Bis-RAD-seq) was performed to assess genome-wide DNA methylation patterns. The epigenetic clock predicted age with 97.5 % accuracy and a precision of 2.8 months and generalized well to unseen data. Our clock maintained consistent performance across sexes and different geographic locations.
Files and variables
Each file represents an individual fin-clip sample. The sequencing files follow the structure described below. The genome assembly used was version gadMor3.0 (INSDC Assembly GCA_902167405.1, Jul 2019) publicly available on Ensembl.
- chrBase: chromosome and position separated by “.”
- chr: chromosome
- base: position
- strand: F = forward, R = reverse
- coverage: total number of reads covering the position
- freqC: frequency of Cs in reads of the position (methylated Cs that have not been converted to Ts after bisulfite treatment)
- freqT: frequency of Ts in reads of the position (unmethylated Cs that have been converted to Ts after bisulfite treatment)
The metadata file (samples-details-lat-lon.txt) contains the following variables:
- filename: name of .txt file containing the DNA methylation data
- sample_id: numbered ids of the sampled individuals
- age: age (years) estimated by otolith readings
- sample: descriptive id containing the year (y) and month (m) of each sample
- batch: two sequencing batches were run and samples belong in either
- station: sampling station during research cruise no. 428 of the German fisheries research vessel ‘Walther Herwig III’ (July/August 2019) as part of the International Bottom Trawl Survey (IBTS)
- length: length of fish (cm)
- weight: total weight of fish (g)
- gutted_weight: weight after removing guts (g); NA when the fish could not be weighed
- sex: sex of fish coded as:
  - F = female
  - M = male
  - NA = undetermined because individuals are still undifferentiated
- maturity: sexual maturity status coded as:
  - a = immature/juvenile
  - b = maturing
  - c = spent
  - UN = undetermined maturity
- geo_location: locations of sampling across the North Sea with letters (A, C, L, M) indicating the four areas of origin where A and C are in the Southern area and L and M in the Viking area
- lat: latitude
- long: longitude
- ml: indicates whether the sample was used in the machine learning (ml) process (y) or was filtered out during previous steps (n)
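As a rough illustration of the per-sample file layout listed above, the following Python sketch parses a small methylKit-style table and converts freqC back into an approximate count of methylated reads. It assumes the files are tab-delimited with a header row and that freqC and freqT are percentages (0-100); neither assumption is stated in this description, and the example rows are invented for illustration only.

```python
import csv
import io


def read_methylation(handle, delimiter="\t"):
    """Yield one dict per cytosine position with typed values.

    Assumes a header row with the column names listed above
    (chrBase, chr, base, strand, coverage, freqC, freqT).
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    for row in reader:
        yield {
            "chr": row["chr"],
            "base": int(row["base"]),
            "strand": row["strand"],          # F = forward, R = reverse
            "coverage": int(row["coverage"]),
            "freqC": float(row["freqC"]),     # assumed to be a percentage
            "freqT": float(row["freqT"]),     # assumed to be a percentage
        }


# Invented example rows, not taken from the dataset.
example = (
    "chrBase\tchr\tbase\tstrand\tcoverage\tfreqC\tfreqT\n"
    "1.1045\t1\t1045\tF\t20\t75.00\t25.00\n"
    "1.2310\t1\t2310\tR\t12\t0.00\t100.00\n"
)

records = list(read_methylation(io.StringIO(example)))
for rec in records:
    # Approximate number of reads supporting a methylated C at this position.
    methylated_reads = round(rec["coverage"] * rec["freqC"] / 100)
    print(rec["chr"], rec["base"], rec["strand"], methylated_reads)
```

To read a real file, replace the `io.StringIO` object with an open file handle and adjust the delimiter if the files turn out to be formatted differently.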
Code/software
The code required to analyze the data and construct the epigenetic clock is available here: https://codeberg.org/dafanast/Epigenetic_clock
Overall design: We used Bisulfite Restriction site Associated DNA markers sequencing (Bis-RAD-seq) to assess genome-wide DNA methylation patterns in fin-clips of Atlantic cod individuals from the North Sea aged via otolith readings. One hundred twenty individuals from 8 age classes (0-7 years old) were included, spanning 1) sex: females, males and undifferentiated, 2) sexual maturity status: immature, maturing, and undetermined and 3) geographic origin along the North Sea (A, C, L, M according to the country of origin). Bis-RAD-seq libraries were prepared and sequenced in 2 batches of equal size: 60 samples were processed first to develop the bis-RAD-seq protocol (batch 1), and 60 more samples were then processed (batch 2). Bis-RAD-seq libraries were prepared by Floragenex (Eugene, USA) after adapting the original protocol by Trucchi et al. (2016, doi:10.1111/mec.13550).
Description of protocols: Samples were collected on research cruise no. 428 of the German fisheries research vessel “Walther Herwig III” during July/August 2019 as part of the International Bottom Trawl Survey (IBTS) in the 3rd quarter. Individual cod were sampled throughout the survey area in the North Sea. For each fish, total length, weight, sex and sexual maturity stage were recorded, and otoliths were collected for age reading. Fin-clips were stored in 80 % ethanol at 4 °C until DNA extraction. Genomic DNA was extracted using the DNeasy Blood & Tissue Kit (Qiagen, Germany) according to the manufacturer's protocol. Bis-RAD-seq libraries were prepared as in Trucchi et al. (2016, doi:10.1111/mec.13550). Briefly, 200 ng of DNA were digested with the restriction enzyme SbfI-HF and ligated with an individually barcoded, restriction-site overhang-specific P1 adapter with methylated cytosines. Five samples, each carrying a unique P1 barcoded adapter, were combined in a single pool. DNA was sheared to an average size of 500 bp and cleaned up; fragments of 300-500 bp were then selected, cleaned up, end-repaired and dA-tailed. The cytosine-methylated P2 adapter, a “Y” adapter with divergent ends that contains a 3′ dT overhang, was ligated onto the ends of DNA fragments with corresponding dA overhangs. DNA was bisulfite converted and high-fidelity PCR amplification was performed. Library fragments of 400-600 bp were size selected and cleaned up. Final libraries were quantified and their quality evaluated. Batch 1 was sequenced paired end 150 bp and batch 2 was sequenced single end 150 bp.
Description of data processing: Bioinformatics and machine learning data analyses were performed as described in Anastasiadi and Piferrer (2023, doi:10.3389/fmars.2023.1096909). The quality of demultiplexed raw sequencing reads was assessed using fastqc, with multiqc used to summarize reports across samples. Quality filtering was performed using process_radtags from Stacks (v. 2.2) with the following parameters activated: -e sbfI, -r (--rescue: rescue barcodes and RAD-Tags), -c (--clean: clean data, remove any read with an uncalled base), -q (--quality: discard reads with low quality scores), --disable_rad_check. PCR duplicates were partially removed using clone_filter from Stacks (v. 2.2). Quality controls of the filtered data were performed again using fastqc and multiqc.
A bisulfite-converted version of the genome was created using Bismark (v. 20.0). We used the assembly version gadMor3.0 (INSDC Assembly GCA_902167405.1, Jul 2019) publicly available on Ensembl. Filtered reads were aligned to the bisulfite-converted genome with Bismark, checking for alignments to all possible strands and relaxing the stringency setting --score-min to L,0,-0.6. Methylation values were extracted from aligned reads using bismark_methylation_extractor, ignoring the first 4 bases since they correspond to the P1 adapter overhang and merging the DNA methylation values in non-CpG context. Individual files containing DNA methylation values (n = 120) were read into R via methylKit. This process requires assignment of samples to one of two groups (control vs treated), so samples were randomly assigned to one or the other. CpGs covered by fewer than 10 reads or with coverage exceeding the 99.9th percentile of the distribution were filtered out, and coverage was normalized across samples. Only CpGs present in at least 48 out of 60 samples (80 %) per group were kept. Individuals with fewer than 100,000 CpGs covered were eliminated as outliers, resulting in a total of 110 individuals.
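The coverage and sample-presence filters described above can be sketched in plain Python to make the logic explicit. This is not the methylKit implementation used in the study: the function names are hypothetical, the percentile uses a simple nearest-rank rule that may differ from R's quantile calculation, and the sketch handles a single group of samples rather than the two groups mentioned in the text.

```python
import math

MIN_COVERAGE = 10        # minimum reads per CpG, from the text
HIGH_PERCENTILE = 99.9   # upper coverage cut-off, from the text
MIN_FRACTION = 0.8       # 48 of 60 samples per group, from the text


def nearest_rank_percentile(values, q):
    """Return the q-th percentile (0-100) using the nearest-rank rule."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def filter_by_coverage(sample):
    """Drop CpGs with too few reads or unusually high coverage.

    `sample` maps a CpG identifier (e.g. chrBase) to its coverage.
    """
    cap = nearest_rank_percentile(list(sample.values()), HIGH_PERCENTILE)
    return {cpg: cov for cpg, cov in sample.items()
            if MIN_COVERAGE <= cov <= cap}


def shared_cpgs(samples, min_fraction=MIN_FRACTION):
    """Return CpGs present in at least `min_fraction` of the samples."""
    # Rounding guards against floating-point noise before taking the ceiling.
    needed = math.ceil(round(min_fraction * len(samples), 9))
    counts = {}
    for sample in samples:
        for cpg in sample:
            counts[cpg] = counts.get(cpg, 0) + 1
    return {cpg for cpg, n in counts.items() if n >= needed}


# Invented toy data: five samples, three CpG positions.
raw = [
    {"1.100": 25, "1.200": 12, "1.300": 4},
    {"1.100": 30, "1.200": 15, "1.300": 11},
    {"1.100": 18, "1.200": 10, "1.300": 3},
    {"1.100": 22, "1.200": 9, "1.300": 14},
    {"1.100": 40, "1.200": 16, "1.300": 13},
]
filtered = [filter_by_coverage(s) for s in raw]
kept = shared_cpgs(filtered)
print(sorted(kept))
```

With only a handful of positions per sample, the 99.9th percentile equals the maximum coverage, so the upper cut-off removes nothing in this toy example; it only has an effect on files with many thousands of positions.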
In this data package, we provide 1) the 120 individual files containing DNA methylation values in a format compatible with methylKit to be read directly into R and 2) a metadata file containing all information about the 110 samples used in final analyses.
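For readers who want to pair the methylation files with their metadata, a minimal Python sketch is shown here. It selects the rows whose ml column is "y" and maps each file name to its otolith age. The tab delimiter, the header row, and the example file names are assumptions made for illustration; only the column names (filename, sample_id, age, ml) come from the variable list in this description.

```python
import csv
import io


def ml_samples(handle, delimiter="\t"):
    """Return metadata rows flagged as used in the machine learning step."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [row for row in reader if row["ml"].strip().lower() == "y"]


# Invented example rows; real file names and values will differ.
example = (
    "filename\tsample_id\tage\tml\n"
    "s1.txt\t1\t0\ty\n"
    "s2.txt\t2\t3\tn\n"
    "s3.txt\t3\t7\ty\n"
)

selected = ml_samples(io.StringIO(example))
# Map each retained methylation file to the age read from the otoliths.
ages = {row["filename"]: int(row["age"]) for row in selected}
print(ages)
```

The resulting mapping can then be used to decide which per-sample files to load and which age label to attach to each.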
