Epigenetic aging of Māui and Hector's dolphins
Data files (Jan 26, 2024 version; 18.96 KB)
- README.md
- Revised_Ceph_Samples.csv
Abstract
The age of an individual is an essential demographic parameter but is difficult to estimate without long-term monitoring or invasive sampling. Epigenetic approaches are increasingly used to age organisms, including non-model organisms such as cetaceans. Māui dolphins (Cephalorhynchus hectori maui) are a critically endangered subspecies endemic to Aotearoa New Zealand, and the age structure of this population is important for informing conservation. Here we present an epigenetic clock for aging Māui and Hector's dolphins (C. h. hectori), developed from methylation data using DNA from tooth-aged individuals (n = 48). Based on this training dataset, the optimal model required only eight methylation sites, provided an age correlation of 0.95, and had a median absolute age error of 1.54 years. A leave-one-out cross-validation analysis with the same parameters resulted in an age correlation of 0.87 and a median absolute age error of 2.09 years. To improve age estimates, we included previously published beluga whale (Delphinapterus leucas) data to develop a joint beluga/dolphin clock, resulting in a clock with comparable performance and improved estimation of older individuals. Application of the models to DNA from skin biopsy samples of living Māui dolphins revealed that the median age shifted from 8–9 years to 7–8 years over 10 years, indicating a younger population. These models could be applied to other dolphin species and demonstrate that a clock can be constructed even when the number of known-age samples is limited, removing this impediment to estimating demographic parameters vital to the conservation of critically endangered species.
README: Epigenetic aging of Māui and Hector's dolphins
https://doi.org/10.5061/dryad.hx3ffbgm3
The files linked here accompany the article by Hernandez et al., "Using epigenetic clocks to investigate changes in the age structure of critically endangered Māui dolphins." Included here are the R code for constructing the epigenetic clock and running the accompanying analyses, and the files needed to replicate this work. Supplementary material is available from the journal website.
Description of the data and file structure
The methylation data are structured so that each row is a cytosine-guanine dinucleotide (CpG) site on the methylation array and each column is an individual sample, identified by the variables Basename and ExternalSampleID. The metadata are structured so that each row is an individual sample and the columns are the associated metadata variables. Missing data are indicated by NA. Individuals with "no" in the CanBeUsedForAging variable or with a DOC_Cat of "red" were excluded from epigenetic clock development because they were technical outliers or were considered unsuitable for aging.
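Because the methylation table is oriented with CpGs as rows and samples as columns, while the metadata have one row per sample, the two must be aligned before modeling. The Python sketch below shows one way to do this. It assumes a single header row whose sample columns are labeled by Basename and whose first column holds the CpG identifier. That layout, the column name `CpG`, and all IDs and values are hypothetical stand-ins, not the actual file contents. NA cells become `None`.

```python
import csv
import io

# Hypothetical stand-ins for the two files; IDs and values are made up.
METHYLATION_CSV = """CpG,BASE_A,BASE_B,BASE_C
cg_0001,0.12,0.15,NA
cg_0002,0.80,0.77,0.79
"""
METADATA_CSV = """ExternalSampleID,Basename,Age,CanBeUsedForAging
OSU-1,BASE_A,4,yes
OSU-2,BASE_B,11,no
OSU-3,BASE_C,NA,maybe
"""

def parse_value(text):
    """Convert an 'NA' cell to None and anything else to float."""
    return None if text == "NA" else float(text)

def samples_from_matrix(text):
    """Turn a CpG-by-sample table (rows = CpGs) into {Basename: {CpG: beta}}."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    basenames = header[1:]  # assumed: first column holds the CpG identifier
    per_sample = {b: {} for b in basenames}
    for row in body:
        cpg = row[0]
        for basename, cell in zip(basenames, row[1:]):
            per_sample[basename][cpg] = parse_value(cell)
    return per_sample

def join_metadata(per_sample, metadata_text):
    """Attach each metadata row (one per sample) to its methylation values."""
    joined = []
    for meta in csv.DictReader(io.StringIO(metadata_text)):
        betas = per_sample.get(meta["Basename"])
        if betas is not None:  # skip metadata rows with no array column
            joined.append({**meta, "betas": betas})
    return joined

samples = join_metadata(samples_from_matrix(METHYLATION_CSV), METADATA_CSV)
for s in samples:
    print(s["ExternalSampleID"], s["betas"])
```

Joining on Basename rather than on column position guards against the metadata rows and the array columns being in different orders.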
1. Revised_Ceph_Samples.csv
Contains the metadata and identifiers for the Māui and Hector's dolphin samples analyzed in the study that are needed for the R analyses.
Variables
- ExternalSampleID: In-house sample identifiers from Oregon State University
- DOC_Code: An individual identifier assigned by the New Zealand Department of Conservation
- Basename: An identifier for the sample when it was run on the methylation array by UCLA
- Age: The initial tooth age estimate (in years) provided by Massey University
- CanBeUsedForAging: Whether the sample is suitable for aging, based on hierarchical clustering of the methylation array data generated by UCLA. One of yes, no or maybe. Samples with "no" were not used in clock calibration.
- Female: A binary representation of sex, where 1 = female and 0 = male
- SpeciesCommonName: The common name of the subspecies from which the sample originated
- SpeciesLatinName: The scientific (Latin) name of the subspecies from which the sample originated
- DOC_cat: A categorical variable for samples that were analyzed for tooth aging. 'Green' denotes a confident age estimate. 'Yellow' denotes some uncertainty in the age estimate; these samples should be used with caution in clock calibration. 'Orange' denotes samples for which only a minimum age estimate could be reached. 'Red' denotes samples whose tooth was unreadable, so no age estimate was possible; these samples were recommended not to be used for clock calibration. NA denotes samples that did not have a tooth read for aging. Additional details about tooth aging are provided in Betty et al. 2022 (see the citation in Hernandez et al. 2023 for full reference information).
- ToothBatch: For samples whose tooth was read for aging, the analysis batch number (1, 2 or 3); NA otherwise.
- SampleSource: Source for the tissue sample used for epigenetic aging, either 'Biopsy' for tissue from a biopsy dart, or 'Necropsy' for post-mortem collection.
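The variable coding and exclusion rules above can be applied to Revised_Ceph_Samples.csv roughly as in the following Python sketch. Several assumptions apply:

- The CSV header uses the variable names exactly as listed here, including `DOC_cat`. The earlier paragraph spells it `DOC_Cat`.
- Category labels are compared case-insensitively.
- Rows without a tooth age are also dropped, since calibration needs a known age.
- "maybe", 'Yellow' and 'Orange' samples are kept. The handling of these samples is not specified beyond the caution noted above.

The toy rows are invented for illustration.

```python
import csv
import io

def na(value):
    """Map the README's missing-data marker 'NA' (or an empty cell) to None."""
    return None if value in ("NA", "") else value

def parse_sample(row):
    """Convert one metadata row into typed fields (column names assumed)."""
    age = na(row["Age"])
    batch = na(row["ToothBatch"])
    female = na(row["Female"])
    return {
        "id": row["ExternalSampleID"],
        "age": float(age) if age is not None else None,
        "female": None if female is None else female == "1",  # 1 = female
        "tooth_batch": int(batch) if batch is not None else None,
        "usable": (na(row["CanBeUsedForAging"]) or "").strip().lower(),
        "doc_cat": (na(row["DOC_cat"]) or "").strip().lower(),
        "source": row["SampleSource"],
    }

def eligible_for_calibration(sample):
    """Exclude 'no' and 'red' samples, plus any sample lacking a tooth age."""
    return (
        sample["age"] is not None
        and sample["usable"] != "no"
        and sample["doc_cat"] != "red"
    )

# Invented toy rows, not study data.
TOY = """ExternalSampleID,Age,CanBeUsedForAging,Female,DOC_cat,ToothBatch,SampleSource
S1,6,yes,1,Green,1,Necropsy
S2,12,no,0,Green,2,Necropsy
S3,NA,yes,1,Red,3,Necropsy
S4,NA,yes,0,NA,NA,Biopsy
S5,9,maybe,0,Yellow,2,Necropsy
"""

samples = [parse_sample(r) for r in csv.DictReader(io.StringIO(TOY))]
kept = [s["id"] for s in samples if eligible_for_calibration(s)]
print(kept)  # ['S1', 'S5']
```

Biopsy samples without a tooth age (such as S4) are not discarded from the study. They are simply the samples the fitted clock is applied to, rather than used to train it.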
2. HM_Clock_Clean.R
The R code used to analyze the methylation data; variables are defined within the script. The data files are available through this repository or from NCBI GEO.
Sharing/Access information
The manifest for the methylation array (HorvathMammalMethylChip40) is available at the Gene Expression Omnibus (GPL28271: Illumina HorvathMammalianMethylChip40 Bead Chip). Beluga methylation data and some of the Māui and Hector's dolphin samples are archived in the NCBI GEO database, GSE164465, “Genome Methylation in Wild Beluga Whales.” Remaining Māui and Hector's dolphin methylation data are archived in the NCBI GEO database, GSE242072, “Epigenetic aging of Māui and Hector’s Dolphins” (Accession numbers GSM7748508 – GSM7748648). Beluga whale metadata are available through the supporting information for Bors et al. (2021) at https://doi.org/10.1111/eva.13195.
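Each GEO record can be viewed through GEO's standard accession query page (`https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=` followed by the accession). The small Python sketch below builds those URLs and expands the GSM range given above into individual sample accessions. It only constructs strings and does not download anything. The helper name `gsm_range` is hypothetical.

```python
GEO_QUERY = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc="

def gsm_range(first, last):
    """Expand an inclusive GSM accession range into individual accessions."""
    start = int(first.removeprefix("GSM"))  # str.removeprefix: Python 3.9+
    stop = int(last.removeprefix("GSM"))
    if stop < start:
        raise ValueError("range end precedes range start")
    return [f"GSM{n}" for n in range(start, stop + 1)]

# Platform (GPL) and series (GSE) accessions named in this README.
records = ["GPL28271", "GSE164465", "GSE242072"]
dolphin_gsms = gsm_range("GSM7748508", "GSM7748648")

print(len(dolphin_gsms))  # 141
for acc in records:
    print(GEO_QUERY + acc)
```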
Code/Software
All analyses were run in R version 4.2.1 within RStudio. The necessary packages include the tidyverse suite, cowplot (for generating multi-panel figures) and glmnet (for clock models).