Age estimation of captive Asian elephants (Elephas maximus) based on DNA methylation: An exploratory analysis using methylation-sensitive high-resolution melting (MS-HRM)
Data files
Nov 28, 2023 version files: 8.63 KB
- LOIOCV_trial.csv (6.43 KB)
- README.md (2.20 KB)
Abstract
Age is an important parameter for understanding biodemographic trends (development, survival, reproduction, and environmental effects) that are critical for conservation. However, current age estimation methods are difficult to apply to many species, and no standardised technique has yet been adopted. This study examined the potential of methylation-sensitive high-resolution melting (MS-HRM), a labour-, time-, and cost-effective method, to estimate chronological age from DNA methylation in Asian elephants (Elephas maximus). The objective was to assess the accuracy and validity of MS-HRM for age determination in long-lived species such as Asian elephants, whose average lifespan is 50 to 70 years, although some individuals have survived for more than 80 years. DNA was extracted from 53 blood samples of captive Asian elephants from 11 zoos in Japan, with known ages ranging from a few months to 65 years. The methylation rates of two candidate age-associated genes, RALYL and TET2, were significantly correlated with chronological age. Finally, we established a linear, unisex age estimation model with a mean absolute error (MAE) of 7.36 years. This exploratory study suggests that MS-HRM merits further exploration as an alternative method for estimating the chronological age of Asian elephants.
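The abstract reports model accuracy as a mean absolute error. As a worked illustration only: the abstract does not state the exact terms of the linear model, so the first line below assumes, hypothetically, that it uses both methylation rates ($m_{\mathrm{RALYL}}$, $m_{\mathrm{TET2}}$) as predictors with fitted coefficients $\beta_0, \beta_1, \beta_2$. The second line is the standard MAE definition over $n$ samples with known age $y_i$ and predicted age $\hat{y}_i$.

$$
\hat{y}_i = \beta_0 + \beta_1\, m_{\mathrm{RALYL},i} + \beta_2\, m_{\mathrm{TET2},i}
$$

$$
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|
$$

An MAE of 7.36 years therefore means that predicted ages differ from known chronological ages by 7.36 years on average, regardless of the direction of the error.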
README: Estimation of captive Asian elephant (Elephas maximus) age based on DNA methylation: An exploratory analysis using methylation-sensitive high-resolution melting (MS-HRM)
https://doi.org/10.5061/dryad.qjq2bvqnb
Description of the data and file structure
The raw methylation data for RALYL and TET2 used in this analysis are provided in CSV format (LOIOCV_trial.csv).
This is the dataset used to develop the age estimation model. Its columns are:
- 'subject ID': identifies the individual elephant; rows sharing a subject ID come from the same individual.
- 'sample': identifies each sample collection, as individuals were sampled repeatedly over time. Cells containing 'n/a' in this column are recently collected samples that had not yet been assigned a numeric ID (please refer to the Supplementary Information of the manuscript for more details on sampling dates).
- 'sex': sex of the individual (F: female, M: male).
- 'age': chronological age of the individual at the time of sampling.
- 'ralyl_methylationrate_ave' and 'tet2_methylationrate_ave': methylation rate of the RALYL and TET2 regions, respectively, averaged over two duplicate measurements of each sample.
- 'ralyl_df_ave' and 'tet2_df_ave': Df value of the RALYL and TET2 regions, respectively, averaged over two duplicate measurements of each sample.
- 'ralyl_before', 'tet2_before', and 'all_svm': age predicted by the SVM model before leave-one-individual-out cross-validation, for the RALYL model, the TET2 model, and the combined model, respectively.
- 'ralyl_loiocv_after', 'tet2_loiocv', and 'all_loiocv': age predicted after leave-one-individual-out cross-validation, for the RALYL model, the TET2 model, and the combined model, respectively.
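A minimal sketch of how the predicted-age columns described above could be compared with the 'age' column, computing the mean absolute error of each model before and after cross-validation. It assumes the CSV header spells the column names exactly as listed in this README and treats blank or 'n/a' cells as missing; the function names (`parse_number`, `mae_by_column`, etc.) are hypothetical helpers, not part of the authors' R script.

```python
import csv
from typing import Dict, Iterable, List, Optional, Tuple

# Predicted-age columns as named in this README. Assumption: the CSV header
# spells them exactly this way (including the differing suffixes).
PREDICTION_COLUMNS = [
    "ralyl_before", "tet2_before", "all_svm",           # before LOIOCV
    "ralyl_loiocv_after", "tet2_loiocv", "all_loiocv",  # after LOIOCV
]


def parse_number(value: Optional[str]) -> Optional[float]:
    """Return a float, or None for missing, blank, or 'n/a' cells."""
    if value is None:
        return None
    value = value.strip()
    if value == "" or value.lower() == "n/a":
        return None
    return float(value)


def mean_absolute_error(pairs: List[Tuple[float, float]]) -> float:
    """MAE over (observed age, predicted age) pairs."""
    return sum(abs(pred - obs) for obs, pred in pairs) / len(pairs)


def mae_by_column(lines: Iterable[str]) -> Dict[str, float]:
    """Compute the MAE of every predicted-age column present in the CSV."""
    reader = csv.DictReader(lines)
    rows = list(reader)
    fieldnames = reader.fieldnames or []
    results: Dict[str, float] = {}
    for column in PREDICTION_COLUMNS:
        if column not in fieldnames:
            continue  # skip columns this file does not contain
        pairs = []
        for row in rows:
            age = parse_number(row.get("age"))
            pred = parse_number(row.get(column))
            if age is not None and pred is not None:
                pairs.append((age, pred))
        if pairs:
            results[column] = mean_absolute_error(pairs)
    return results
```

Usage: `with open("LOIOCV_trial.csv", newline="") as f: print(mae_by_column(f))`. A file object is an iterable of lines, so it can be passed directly to `csv.DictReader`.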
The R script used for the statistical analysis and cross-validation in this study is provided as a PDF file.
The code was adapted from the R script of Qi et al. (2021) (https://doi.org/10.5061/dryad.66t1g1k2t).
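Model performance was validated with leave-one-individual-out cross-validation (LOIOCV) because samples were collected from the same individuals over time; holding out all samples of one individual at once prevents repeated samples of the test individual from also appearing in the training set.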
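The grouping logic of leave-one-individual-out cross-validation can be sketched as follows. This is not the authors' R code: the study used SVM models adapted from Qi et al. (2021), whereas this sketch swaps in ordinary least-squares regression on a single methylation rate so that it runs with the Python standard library alone. The names `fit_simple_linear` and `loiocv_predictions` are hypothetical. The key point is that every sample sharing a subject ID is held out together, and the model is refit on all other individuals.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def fit_simple_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct predictor values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loiocv_predictions(
    subject_ids: Sequence[str],
    methylation: Sequence[float],
    ages: Sequence[float],
) -> List[float]:
    """Predict each sample's age from a model trained without its individual."""
    # Group sample indices by individual.
    by_subject: Dict[str, List[int]] = defaultdict(list)
    for i, sid in enumerate(subject_ids):
        by_subject[sid].append(i)

    predictions = [0.0] * len(ages)
    for held_out in by_subject.values():
        held = set(held_out)
        # Train only on samples from the other individuals.
        train = [i for i in range(len(ages)) if i not in held]
        intercept, slope = fit_simple_linear(
            [methylation[i] for i in train], [ages[i] for i in train]
        )
        # Predict every sample of the held-out individual.
        for i in held_out:
            predictions[i] = intercept + slope * methylation[i]
    return predictions
```

Holding out whole individuals, rather than single samples, gives a more honest error estimate for new animals: with single-sample hold-out, a second sample from the same elephant could remain in the training set and make predictions look more accurate than they are.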