
Epigenetic models developed for plains zebras predict age in domestic horses and endangered equids

Cite this dataset

Larison, Brenda; Pinho, Gabriela (2022). Epigenetic models developed for plains zebras predict age in domestic horses and endangered equids [Dataset]. Dryad.


Effective conservation and management of threatened wildlife populations require an accurate assessment of age structure to estimate demographic trends and population viability. Epigenetic aging models are promising developments because they estimate individual age with high accuracy, accurately predict age in related species, and do not require invasive sampling or intensive long-term studies. Using blood and biopsy samples from known-age plains zebras (Equus quagga), we modeled epigenetic aging using two approaches: the epigenetic clock (EC) and the epigenetic pacemaker (EPM). The plains zebra EC has the potential for broad application within the genus Equus given that five of the seven extant wild species of the genus are threatened. We test the EC's ability to predict age in sister taxa, including two endangered species and the more distantly related domestic horse, demonstrating high accuracy in each case. Comparing estimated and chronological age (age acceleration) can indicate health status in known-age populations. Our epigenetic models leverage samples from a population with a complex pedigree, allowing us to measure the association between inbreeding and age acceleration. The EPM model highlights an interaction between age and inbreeding associated with accelerated aging, suggesting that the effects of inbreeding on epigenetic aging increase with age.


We obtained both whole blood (n=96) and remote biopsy (n=24) samples from a captive population of zebras maintained in a semi-wild state by the Quagga Project [38] in the Western Cape of South Africa. The collection of 188 whole-blood samples from domestic horses is described in detail in [39]. The Grevy’s zebra (n=5) and Somali wild ass (n=7) samples are from zoo-based animals and were opportunistically collected and banked during routine health exams. The DNA methylation profiles from these samples have been reported previously [40].

We generated all DNA methylation data (plains zebra, horse, Somali wild ass, Grevy's zebra) using a custom Illumina methylation array (HorvathMammalMethylChip40). The array contains 36 thousand probes, 31,836 of which mapped uniquely to the horse genome. We normalized methylation values from each species (plains zebra, horse, Somali wild ass, and Grevy’s zebra) and tissue (blood and biopsy) using SeSAMe.

We studied epigenetic aging in plains zebras using both epigenetic clock (EC) and epigenetic pacemaker (EPM) models. The ECs were developed by fitting a generalized linear model with elastic-net penalization (alpha=0.5) using leave-one-out (LOO) cross-validation in glmnet v.4.0-2 in R v.4.1.0.
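As a rough illustration of the clock-fitting idea described above, the following is a minimal Python sketch of elastic-net regression (alpha=0.5) evaluated with leave-one-out cross-validation. It is not the authors' glmnet code: it uses only the Python standard library, a simple coordinate-descent solver with standardized predictors, and a small synthetic data set. The function names, the lambda grid, and the simulated methylation values are all assumptions made for this example.

```python
import random
import statistics


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the lasso part of the penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net(X, y, lam, alpha=0.5, max_iter=300, tol=1e-6):
    """Coordinate descent for a Gaussian elastic net (hypothetical helper).

    Predictors are centered and scaled to unit variance; the intercept is
    unpenalized. Returns (intercept, coefficients) on the original scale.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        sds.append(var ** 0.5 or 1.0)  # guard against constant columns
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residuals when all coefficients are 0
    beta = [0.0] * p
    for _ in range(max_iter):
        biggest_change = 0.0
        for j in range(p):
            # Covariance of column j with the partial residual (column j removed).
            rho = sum(Z[i][j] * resid[i] for i in range(n)) / n + beta[j]
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= Z[i][j] * delta
                beta[j] = new
                biggest_change = max(biggest_change, abs(delta))
        if biggest_change < tol:
            break
    coefs = [b / s for b, s in zip(beta, sds)]
    intercept = y_mean - sum(c * m for c, m in zip(coefs, means))
    return intercept, coefs


def loo_predictions(X, y, lam, alpha=0.5):
    """Predict each sample from a model fitted to all the other samples."""
    preds = []
    for i in range(len(y)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        intercept, coefs = fit_elastic_net(X_train, y_train, lam, alpha)
        preds.append(intercept + sum(c * x for c, x in zip(coefs, X[i])))
    return preds


# Synthetic data (assumption): 24 animals, 5 CpG sites, 2 of them age-related.
rng = random.Random(1)
ages = [rng.uniform(0.5, 25.0) for _ in range(24)]
meth = [[0.2 + 0.02 * a + rng.gauss(0, 0.03),    # gains methylation with age
         0.8 - 0.015 * a + rng.gauss(0, 0.03),   # loses methylation with age
         rng.random(), rng.random(), rng.random()]  # uninformative sites
        for a in ages]


def loo_mse(lam):
    preds = loo_predictions(meth, ages, lam)
    return sum((p - a) ** 2 for p, a in zip(preds, ages)) / len(ages)


# Choose the penalty strength from a small illustrative grid by LOO error.
best_lam = min([0.01, 0.1, 1.0], key=loo_mse)
loo_pred = loo_predictions(meth, ages, best_lam)
median_abs_error = statistics.median(abs(p - a) for p, a in zip(loo_pred, ages))
print(best_lam, round(median_abs_error, 2))
```

The leave-one-out predictions can then be compared with chronological age (for example by correlation or median absolute error) to judge how well a clock generalizes to animals it was not trained on.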

We derived the genotypes used to estimate inbreeding (F and FROH) from two sources: RAD sequencing (42 samples) and imputed genotypes at the same set of loci from low-coverage whole-genome sequencing data (28 additional samples). F and ROH were estimated using PLINK. We used multiple linear regressions to assess whether inbreeding is associated with age acceleration in the plains zebra population. Age acceleration was the dependent variable and was calculated as the residuals of chronological age regressed on predicted age. The independent variables were sex, chronological age, inbreeding, and the interaction between chronological age and inbreeding.
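The regression described above can be sketched in a few lines of Python. This is an illustrative example only, not the authors' script: the data are simulated, the helper names (solve_linear_system, ols, residuals) are hypothetical, and the direction of the residual regression follows the wording of the text literally (chronological age regressed on predicted age). Standard errors and p-values, which a full analysis would report, are omitted.

```python
import random


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve_linear_system(XtX, Xty)


def residuals(response, predictor):
    """Residuals from a simple linear regression of response on predictor."""
    b0, b1 = ols([predictor], response)
    return [r - (b0 + b1 * x) for r, x in zip(response, predictor)]


# Simulated animals (assumption): sex coded 0/1, inbreeding coefficient F.
rng = random.Random(2)
n = 60
sex = [float(rng.randint(0, 1)) for _ in range(n)]
age = [rng.uniform(1.0, 25.0) for _ in range(n)]
inbreeding = [rng.uniform(0.0, 0.3) for _ in range(n)]
predicted_age = [a + 2.0 * a * f + rng.gauss(0, 1.0)
                 for a, f in zip(age, inbreeding)]

# Age acceleration as residuals, in the direction stated in the text.
age_accel = residuals(age, predicted_age)

# Multiple regression: sex, age, inbreeding, and the age x inbreeding term.
interaction = [a * f for a, f in zip(age, inbreeding)]
coef = ols([sex, age, inbreeding, interaction], age_accel)
for name, value in zip(["intercept", "sex", "age", "F", "age:F"], coef):
    print(name, round(value, 3))
```

The last coefficient is the interaction term: a non-zero value means that the association between inbreeding and age acceleration changes with chronological age.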

More details are available in Larison et al., "Epigenetic models developed for plains zebras predict age in domestic horses and endangered equids", on bioRxiv [currently in review: Communications Biology].

Usage notes

Data are provided in three zipped folders: EC, EPM, and Inbreeding. Each folder contains a README file. In brief:

Scripts are provided to process each data set. They can be found on Zenodo at

The EC folder includes everything necessary to create the plains zebra epigenetic clocks. The methylation data in this folder have already been normalized; the non-normalized data are available on Gene Expression Omnibus (GEO, GSE184223). The EC script also includes tests of the plains zebra blood clock in three other equids. These commands should be run on data normalized with SeSAMe.

Data in the EPM folder contain everything needed to run the epigenetic pacemaker models, plus an Excel sheet listing the Pearson correlation coefficients between methylation and age for plains zebra CpGs that map to the horse genome.

Data in the inbreeding folder include files needed to estimate ROH and F and assess the relationship between inbreeding and age acceleration. The other files are found in the EC and EPM folders as noted. 


National Geographic Society, Award: 8941-11

Allen Institute

Science Without Borders program of the National Council of Technological and Scientific Development of Brazil