
Data from: Proportion methylation at a set of CpGs from 4 amplicons

Cite this dataset

Little, Tom (2020). Data from: Proportion methylation at a set of CpGs from 4 amplicons [Dataset]. Dryad.


The age structure of populations, or the ageing rate of individuals, impacts aspects of animal ecology, epidemiology and conservation. Yet for many wild organisms, age is an inaccessible trait. In many cases measuring age or ageing rates in the wild requires molecular biomarkers of age. Epigenetic clocks based on DNA methylation have been shown to accurately estimate the age of humans and laboratory mice, but they also show variable ticking rates that are associated with mortality risk above and beyond that predicted by chronological age. Thus, epigenetic clocks are proving to be useful markers of both chronological and biological age, and they are beginning to be applied to wild mammals and birds. We have acquired strong evidence that an accurate clock will be possible for the wood mouse Apodemus sylvaticus by adapting epigenetic information from the laboratory mouse. Apodemus sylvaticus is a well-studied field system that is amenable to experimental perturbations and longitudinal sampling of individuals across their lives, and these features of the wood mouse offer opportunities to disentangle causal relationships between ageing rates and environmental stress. Our wood mouse epigenetic clock is PCR-based, and so requires tiny amounts of tissue and non-destructive sampling. We quantified methylation using Oxford Nanopore sequencing technology and present a new bioinformatics pipeline for data analysis. We thus describe a new and generalizable system that should enable ecologists and other field biologists to go from tiny tissue samples to an epigenetic clock for their study animal.   


DNA was extracted using a phenol-chloroform method from ear punches taken from individuals of known age and kept under identical conditions in a wood mouse colony at the University of Edinburgh. We obtained DNA from 48 mice spanning from 88 to 496 days old. This slightly unusual choice of ages arises because we utilised mice that were part of other experiments, thus increasing the information gained from each animal sacrificed. Lifespan in wild mice is not known for certain, and many will die from extrinsic mortality (e.g. predation), but in our own field work we recapture 10-20% of tagged mice the following year, meaning wild mice may often live for hundreds of days.

We used the polymerase chain reaction (PCR) to identify CpG sites in Apodemus sylvaticus that could contribute to a methylation-based epigenetic clock. We first bisulphite-treated the DNA using the ZYMO-Gold DNA bisulphite conversion kit. Bisulphite treatment converts unmethylated cytosine to uracil, which is then converted to thymine during PCR. Methylated cytosines are unchanged by the treatment. For PCR of bisulphite-converted DNA, we focused initially on five genes studied by Han et al (8): Prima1, HSP4, Kcns1, Gm9312 and Gm7325. From Ensembl we obtained Mus musculus DNA sequence that included the key sites identified by Han et al (8). We BLASTed 200-300 bp of Mus sequence for each gene against the Apodemus sylvaticus whole genome shotgun sequence available on NCBI (taxonID: 40375) to retrieve the homologous wood mouse sequence. We then designed Apodemus-specific primers for each gene in approximately the same location as that used by Han et al (8). In some cases, Apodemus contained a CpG at the desired primer location, but it was usually possible to design primers slightly upstream or downstream. In two cases, however, we inserted an ambiguous base at a CpG location.
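The effect of bisulphite treatment followed by PCR can be illustrated in silico. The sketch below is only a toy model, not part of the published pipeline: the function name `bisulphite_convert`, the toy sequence and the chosen methylated positions are all hypothetical.

```python
def bisulphite_convert(seq, methylated_positions=frozenset()):
    """Return the sequence as it would read after bisulphite treatment and PCR.

    Unmethylated C is read as T; a C whose 0-based index is listed in
    methylated_positions is protected and stays C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


# Hypothetical toy sequence with CpGs starting at indices 1 and 5.
reference = "ACGTCCGA"
fully_unmethylated = bisulphite_convert(reference)
cpg_methylated = bisulphite_convert(reference, methylated_positions={1, 5})
```

In this toy case the non-CpG cytosine at index 4 is always converted, whereas the two CpG cytosines survive only when marked as methylated, which is the C versus T signal that the sequencing step later reads.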

We used an Oxford Nanopore Flongle to sequence our amplicons and determine the % methylation at all CpG sites (Figure 1). The Flongle sequences up to ~1 Gb of DNA, which, given the short length of our amplicons, provides considerable depth of coverage. Our molecular methods follow Quick et al 2017 (18), with minor modifications. Amplicons were pooled by mouse and quantified with a Qubit, the aim being to normalise DNA quantities so that ~200 fmol of DNA were loaded onto the Flongle. Otherwise, the MinION was run according to the manufacturer's instructions (Ligation Sequencing Kit, SQK-LSK109).
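Normalising to a molar target means converting the mass concentration reported by a fluorometer into femtomoles. A minimal sketch of that arithmetic follows, assuming the commonly used approximation of 660 g/mol per base pair of double-stranded DNA and a hypothetical 300 bp amplicon; neither value is stated in the methods above.

```python
# Approximate average molar mass of one base pair of double-stranded DNA.
# This is a textbook approximation and an assumption here, not a value
# taken from the dataset description.
G_PER_MOL_PER_BP = 660.0


def ng_to_fmol(mass_ng, length_bp):
    """Convert a mass of dsDNA (ng) to femtomoles for a given fragment length."""
    # mol = g / (g/mol); 1 ng = 1e-9 g and 1 mol = 1e15 fmol, so the net factor is 1e6.
    return mass_ng * 1e6 / (length_bp * G_PER_MOL_PER_BP)


def fmol_to_ng(amount_fmol, length_bp):
    """Mass of dsDNA (ng) needed to reach a target number of femtomoles."""
    return amount_fmol * length_bp * G_PER_MOL_PER_BP / 1e6


# Hypothetical 300 bp amplicon and the ~200 fmol target.
needed_ng = fmol_to_ng(200, 300)
```

Because the amplicons are short, the mass corresponding to a fixed molar amount is small, which is consistent with the method requiring very little tissue.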

We developed the custom software `Paramether` on a Snakemake (20) framework for easy-to-use estimation of CpG methylation from nanopore sequences. `Paramether` first bins the sequences by barcode using Porechop. We curated a database of reference genes representing the bisulphite-treated, amplified gene sequences. `Paramether` queries this database using parasail in order to classify which gene each read corresponds to. Using the alignment with the reference genes, key sites are identified in the read and `Paramether` determines whether each site is methylated (read as a C) or unmethylated (read as a T). This process is repeated for each read, and counts of methylation are output per gene per barcode.
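The per-site counting step can be sketched as follows. This is not the `Paramether` code: it assumes reads have already been demultiplexed, assigned to a gene and aligned so that each read shares coordinates with its reference, and the function name `count_calls`, the toy reads and the site positions are hypothetical.

```python
from collections import Counter


def count_calls(aligned_reads, cpg_positions):
    """Count C (methylated) and T (unmethylated) calls at each key site.

    aligned_reads: strings in reference coordinates (alignment assumed done).
    cpg_positions: 0-based indices of the cytosine in each CpG of interest.
    """
    counts = {pos: Counter() for pos in cpg_positions}
    for read in aligned_reads:
        for pos in cpg_positions:
            if pos >= len(read):
                continue  # read does not cover this site
            base = read[pos].upper()
            if base in ("C", "T"):
                counts[pos][base] += 1  # any other base is ignored as uninformative
    return counts


# Hypothetical reads from one barcode and one gene, with CpGs at indices 1 and 5.
reads = ["ACGTTCGA", "ATGTTCGA", "ATGTTTGA", "ACGTTNGA"]
counts = count_calls(reads, [1, 5])
```

Bases other than C or T at a key site, such as the N in the last toy read, are skipped here so that they affect neither the methylation count nor the depth at that site; how the real pipeline treats such calls is not stated above.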

The proportion of methylated Cs at each of the 85 CpGs in the sequences was analysed using penalised regression to identify sites that correlate with age. This was achieved with the glmnet package in R (19) using a LASSO model (mixing parameter alpha=1) and leave-one-out cross validation (nfolds=nrow). The code for this is presented in the supplementary data.
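The idea behind LASSO with leave-one-out cross validation can be sketched in a few lines. The example below is not the supplementary R code and does not use glmnet: it is a plain-Python coordinate-descent illustration of the same penalised objective, without glmnet's default standardisation, and the data, function names and penalty values are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def fit_lasso(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred predictors
    resid = [v - y_mean for v in y]  # residual with all coefficients at zero
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [Xc[i][j] for i in range(n)]
            denom = sum(c * c for c in col) / n
            if denom == 0.0:
                continue  # a site with no variation cannot carry a coefficient
            # Covariance of column j with the residual that excludes column j.
            rho = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / denom
            delta = new_b - beta[j]
            if delta != 0.0:
                resid = [resid[i] - col[i] * delta for i in range(n)]
                beta[j] = new_b
    intercept = y_mean - sum(beta[j] * x_mean[j] for j in range(p))
    return intercept, beta


def predict(model, row):
    intercept, beta = model
    return intercept + sum(b * x for b, x in zip(beta, row))


def loocv_mse(X, y, lam):
    """Leave-one-out cross validation: refit without each sample, then predict it."""
    errors = []
    for i in range(len(X)):
        model = fit_lasso(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        errors.append((predict(model, X[i]) - y[i]) ** 2)
    return sum(errors) / len(errors)


# Hypothetical data: one age-associated site and one site that never varies.
X = [[0.10, 0.5], [0.20, 0.5], [0.30, 0.5], [0.40, 0.5], [0.50, 0.5]]
ages = [100.0, 200.0, 300.0, 400.0, 500.0]
intercept, beta = fit_lasso(X, ages, lam=0.0)
```

In practice the penalty would be chosen by comparing `loocv_mse` across a grid of values, and sites whose coefficients are shrunk to exactly zero drop out of the clock.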

Usage notes

The file cpg_wide_clock.csv contains the proportion of methylation at each CpG. Rows or columns with missing values have been removed because glmnet does not accept them. The file cpg_counts.csv provides the number of Cs and Ts counted at each CpG, and thus also the depth of coverage.
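The relationship between the two files can be sketched as follows: depth is the sum of the C and T counts, and the proportion methylated is C divided by that depth. The column names and values below are hypothetical, since the actual layout of cpg_counts.csv is not described here.

```python
import csv
import io

# Hypothetical layout; the real cpg_counts.csv columns may differ.
toy_csv = """sample,site,C,T
m01,cpg_1,30,70
m01,cpg_2,0,0
m02,cpg_1,45,55
"""


def summarise(rows):
    """Return (sample, site, depth, proportion methylated) for each row."""
    out = []
    for row in rows:
        c, t = int(row["C"]), int(row["T"])
        depth = c + t
        # With zero coverage the proportion is undefined, so report None.
        proportion = c / depth if depth else None
        out.append((row["sample"], row["site"], depth, proportion))
    return out


summary = summarise(csv.DictReader(io.StringIO(toy_csv)))
```

Sites with no coverage yield a missing value in this sketch, which corresponds to the kind of entry that has to be removed before the regression is fitted.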


Wellcome Trust

The Moray Fund
