DNA methylation-based age prediction and sex-specific epigenetic aging in a lizard with female-biased longevity

Shealy, Ethan 1; Schwartz, Tonia 2; Cox, Robert 3; Reedy, Aaron 4; Parrott, Ben 1

Research facility: University of Georgia

Published Jan 15, 2025 on Dryad. https://doi.org/10.5061/dryad.3j9kd51sw

Data files

Jan 15, 2025 version files 5.69 GB

Abstract

Sex differences in lifespan are widespread across animal taxa, but their causes remain unresolved. Alterations to the epigenome are hypothesized to contribute to vertebrate aging, and DNA methylation-based aging clocks allow for quantitative estimation of biological aging trajectories. Here, we investigate the influence of age, sex, and their interaction on genome-wide DNA methylation patterns in the brown anole (Anolis sagrei), a lizard with pronounced female-biased survival and longevity. We develop a series of age predictor models and find that contrary to our predictions, rates of epigenetic aging were not slower in female lizards. However, methylation states at loci acquiring age-associated changes appear to be more “youthful” in young females, suggesting that female DNA methylomes are preemptively fortified in early life in opposition to the direction of age-related drift. Collectively, our findings provide new insights into epigenetic aging in reptiles and suggest that early-life epigenetic profiles are more informative than rates of change over time for predicting sex biases in longevity.

Description of the data and file structure

This repository is associated with the publication: Shealy et al. (2025). DNA methylation-based age prediction and sex-specific epigenetic aging in a lizard with female-biased longevity. Science Advances. In Press.

Briefly, in this study, DNA was obtained from Anolis sagrei red blood cells from individuals of various age groups and both sexes, and DNA methylation data was generated using a high-throughput sequencing approach.

This repository contains the methylation frequency tables which, along with the coordinates of differentially methylated cytosines in the supplementary material of the manuscript, allow the main findings of the study to be reproduced.

The scripts necessary to process Bismark methylation data into the matrices found here can be located at https://github.com/ethan-shealy/AnolisEA2024.

Two data files are included: one unfiltered matrix of the DNA methylation data (labeled_allSites_stranded.tab) and a filtered, de-stranded version (labeled_total80perc.tab).

Unfiltered dataset - labeled_allSites_stranded.tab

This file is a tab-separated table describing one CpG dinucleotide per row. The first four columns describe the genomic location of the CpG using coordinates from the AnoSag2.1 genome assembly (Refseq: GCF_025583915.1). The first column (heading: "chr") is the sequence (scaffold) name in the assembly. The second and third columns are identical (headed "start" and "end", respectively), and both give the 1-based coordinate of the cytosine in question. The fourth column (heading "strand") gives the strandedness ("+" or "-") of the cytosine with respect to the reference.

Each column following these first four corresponds to data from a single sampled individual (with a heading containing the sample's ID number). The column for each individual contains a methylation beta value for each CpG site, calculated as the frequency with which the site was observed as methylated in the EM-seq experiment and expressed as a percentage (50.00 = 50% methylation). Missing values in the matrix represent sites that did not receive any read coverage for a given individual during sequencing, so methylation frequency could not be estimated. These missing values are represented by empty cells in the matrix.
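
The layout described above can be read with a few lines of standard-library Python. The following is a minimal sketch, not the authors' code: it assumes the header row and empty-cell convention are exactly as documented here, and the scaffold name, sample IDs, and values in the example are made up for illustration.

```python
import csv
import io


def read_methylation_table(handle):
    """Parse a tab-separated methylation matrix.

    Returns (sample_ids, rows). Empty cells (no read coverage) become None;
    all other cells become floats holding percent methylation (0-100).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_ids = header[4:]  # columns after chr, start, end, strand
    rows = []
    for fields in reader:
        chrom, start, _end, strand = fields[:4]
        values = [float(v) if v.strip() else None for v in fields[4:]]
        rows.append({
            "chr": chrom,
            "pos": int(start),  # start and end are identical, so keep one
            "strand": strand,
            "beta": dict(zip(sample_ids, values)),
        })
    return sample_ids, rows


# Tiny made-up example in the documented layout (not real data).
example = (
    "chr\tstart\tend\tstrand\t101\t102\n"
    "scaffold_1\t1500\t1500\t+\t50.00\t\n"
    "scaffold_1\t1501\t1501\t-\t\t75.00\n"
)
sample_ids, rows = read_methylation_table(io.StringIO(example))
```

For the real file, the same function could be called on an open file handle, for example `open("labeled_allSites_stranded.tab", newline="")`. Given the size of the dataset, streaming rows rather than collecting them in a list may be preferable.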

De-stranded and filtered dataset - labeled_total80perc.tab

Similar to the previous matrix, this file is a tab-separated table describing one CpG dinucleotide per row. However, this matrix has undergone the following filtering procedures:

Methylation information was first collapsed by strand. Because CpG dinucleotides are palindromic with respect to strand, each CpG on the top ("Watson") strand has a corresponding CpG in the opposite direction on the bottom ("Crick") strand. Because studies have shown that methylation at these CpG palindromes is often tightly associated across strands, it is common to summarise information from the top and bottom strands when estimating methylation frequency. Therefore, each cell in this dataset represents the combined information from reads aligning to both strands.
For a methylation beta value estimate (table cell) to be included in the matrix for a given individual at a given CpG site, the site must have received a read coverage depth of at least 5x (i.e. the sum of reads aligning to the top and bottom strands must be >= 5).
For a CpG site (row) to be included in the final matrix, it must have a methylation estimate present in at least 30 individuals (while meeting the previous criterion of 5x coverage in each of those individuals), which is roughly 80% of the total sample size.
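
The three filtering steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline (which used the R package methylKit): the input is a hypothetical list of per-strand read counts, the function name `destrand_and_filter` is invented here, and a bottom-strand cytosine is mapped to its top-strand partner by subtracting one from its coordinate.

```python
from collections import defaultdict

MIN_DEPTH = 5      # minimum combined read depth per site per individual
MIN_SAMPLES = 30   # minimum individuals with an estimate per site


def destrand_and_filter(calls, min_depth=MIN_DEPTH, min_samples=MIN_SAMPLES):
    """Collapse strands, then apply the depth and completeness filters.

    calls: iterable of (sample, chrom, pos, strand, n_meth, n_total),
           a hypothetical per-strand count format.
    Returns {(chrom, pos): {sample: percent_methylation}} for retained
    sites, keyed by the top-strand cytosine coordinate.
    """
    # Step 1: pool read counts from both strands of each CpG.
    counts = defaultdict(lambda: [0, 0])
    for sample, chrom, pos, strand, n_meth, n_total in calls:
        # The bottom-strand cytosine of a CpG lies one base downstream
        # of the top-strand cytosine, so shift it back by one.
        top_pos = pos - 1 if strand == "-" else pos
        entry = counts[(chrom, top_pos, sample)]
        entry[0] += n_meth
        entry[1] += n_total

    # Step 2: keep a cell only if combined depth is at least min_depth.
    sites = defaultdict(dict)
    for (chrom, top_pos, sample), (n_meth, n_total) in counts.items():
        if n_total >= min_depth:
            sites[(chrom, top_pos)][sample] = 100.0 * n_meth / n_total

    # Step 3: keep a site only if enough individuals have an estimate.
    return {
        site: betas for site, betas in sites.items()
        if len(betas) >= min_samples
    }


# Made-up counts; min_samples is lowered to 2 so the toy data can pass.
calls = [
    ("S1", "scaffold_1", 100, "+", 2, 3),
    ("S1", "scaffold_1", 101, "-", 2, 2),  # pooled with the row above: 4/5
    ("S2", "scaffold_1", 100, "+", 3, 6),
    ("S3", "scaffold_1", 100, "+", 1, 2),  # depth 2, cell dropped
    ("S1", "scaffold_1", 200, "+", 5, 5),  # only one individual, site dropped
]
result = destrand_and_filter(calls, min_samples=2)
```

With the default thresholds, a site is retained only when at least 30 of the 37 analysed individuals (about 81%) each have a combined depth of 5 or more.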

Again, the first four columns of this table describe the genomic location of each CpG passing filtering using coordinates from the AnoSag2.1 genome assembly (Refseq: GCF_025583915.1). The first column (heading: "chr") is the scaffold name. The second and third columns are identical (headed "start" and "end", respectively), and both give the 1-based coordinate of the cytosine in question. The fourth column (heading "strand") is not meaningful in this matrix, because strand information has been collapsed, and is recorded only as "+".

Each column following these first four corresponds to a single sampled individual (with a heading containing the sample's ID number prefixed with an "S"). The column for each individual contains a methylation beta value for each CpG site, calculated as the frequency with which the site was observed as methylated in the EM-seq experiment and expressed as a percentage (64.70 = 64.7% methylation). Missing values in the matrix represent sites that did not receive adequate read coverage (5x) for a given individual during sequencing, so methylation frequency could not be estimated. These missing values are represented by empty cells in the matrix.
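
As a quick sanity check on the filtered matrix, one can count how many individuals have an estimate at each site and compare that against the stated threshold of 30. The sketch below is illustrative only: the function name, scaffold name, sample headings, and values are assumptions, and it relies on empty cells being the only encoding of missing data, as described above.

```python
import csv
import io


def count_estimates_per_site(handle):
    """Yield (chr, position, n_individuals_with_estimate) for each row."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    n_samples = len(header) - 4  # columns after chr, start, end, strand
    for fields in reader:
        cells = fields[4:4 + n_samples]
        present = sum(1 for value in cells if value.strip())
        yield fields[0], int(fields[1]), present


# Made-up three-sample example in the documented layout (not real data).
example = (
    "chr\tstart\tend\tstrand\tS1\tS2\tS3\n"
    "scaffold_1\t1500\t1500\t+\t64.70\t\t80.00\n"
    "scaffold_1\t2210\t2210\t+\t10.00\t12.50\t0.00\n"
)
summary = list(count_estimates_per_site(io.StringIO(example)))
```

Applied to labeled_total80perc.tab, every row would be expected to report a count of at least 30 if the file matches this description.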

Sharing/Access information

Raw reads and sample metadata are available via the GEO accession GSE285624.

Code/Software

See github.com/ethan-shealy/AnolisEA2024

Organism: Anolis sagrei

Tissue: Red blood cells

Genotype: WT

Sampling protocol: Animals were immediately euthanized by decapitation. Blood was collected from the trunk and stored on ice prior to separation of plasma and blood cells by centrifugation. All tissue samples and blood components were snap-frozen in liquid nitrogen.

Husbandry protocol: Animals used in this study were captive-bred descendants of stock originally collected from native A. sagrei populations near Georgetown, Great Exuma, in the Commonwealth of the Bahamas (23°29′N, 75°45′W) and imported under permits from the Bahamas Environment, Science, and Technology Commission, the Bahamas Ministry of Agriculture, and the United States Fish and Wildlife Service. Animals were housed at the University of Virginia and all procedures were approved under UVA ACUC protocol 3896. The exact hatch date of each captive-bred animal was recorded such that age was known with certainty.

Extracted molecule: genomic DNA

Extraction protocol: DNA was isolated from the blood cell pellet (majority nucleated red blood cells) using Qiagen’s PureGene CELL extraction kit (method described in Lindsey and Schwartz 2022, "DNA Isolation from Reptile Blood using Gentra Puregene (Qiagen) DNA Isolation Kit" published on protocols.io).

Sequencing protocol: Libraries were prepared using the NEBNext Enzymatic Methyl-seq kit from New England BioLabs with 200 ng genomic DNA as input. Prior to enzymatic conversion, libraries were sheared to a target fragment size of 300 bp using a Covaris sonication instrument. After assessing library quality using Bioanalyzer and Qubit instruments, 40 enzymatically converted libraries were sequenced on two lanes of an Illumina NovaSeq 6000 instrument at the University of Florida’s Interdisciplinary Center for Biotechnology Research to yield an average of ~120 million paired-end, 150 bp reads per sample.

Bioinformatic protocol: Reads were trimmed and quality-checked with Trim Galore!, a wrapper script for Cutadapt and FastQC, using a Phred score cutoff of 20 and a 4 bp hard clip on both read ends to reduce methylation bias. Alignment to the A. sagrei reference genome, AnoSag2.1 (Refseq: GCF_025583915.1), was conducted using Bismark with default parameters, and duplicate reads resulting from PCR bias were removed using the deduplicate_bismark tool. Alignment files were sorted and converted to .sam files using SAMtools. Three samples were excluded from downstream analysis due to low alignment rates (<5%), leaving a total of 37 samples. SAM files were imported into R, converted to a methylation matrix, and filtered using the R package methylKit.