Identification of age-related CpG sites from longitudinal avian methylomes
Data files
Jun 21, 2024 version files (380.38 MB):
- JD_AgeRelated_CpG_noMT_Final.csv (1.58 MB)
- JD_AgeRelated_CpG_noMT_LENIENT_Final.csv (8.56 MB)
- JD_Annotation_All.rds (193.37 MB)
- JD_ARCpGs_AllGenes.csv (40.60 KB)
- JD_Chromosome_Length.csv (555 B)
- JD_CpG_Info_Final.csv (488.40 KB)
- JD_Positions_AR_Final_Len.csv (82.55 KB)
- JD_Positions_AR_Final.csv (15.36 KB)
- JD_PositionsPerChr_All_Final.csv (484 B)
- README.md (8.64 KB)
- ZF_AgeRelated_CpG_noMT_Final.csv (1.25 MB)
- ZF_AgeRelated_CpG_noMT_LENIENT_Final.csv (7.13 MB)
- ZF_Annotation_All.rds (167.40 MB)
- ZF_ARCpGs_AllGenes.csv (35.46 KB)
- ZF_Chromosome_Length.csv (538 B)
- ZF_CpG_Info_Final.csv (335.11 KB)
- ZF_Positions_AR_Final_Len.csv (72.64 KB)
- ZF_Positions_AR_Final.csv (12.82 KB)
- ZF_PositionsPerChr_All_Final.csv (469 B)
Jun 26, 2025 version files (380.47 MB):
- 02_Pipeline.R.R (13.11 KB)
- 03_Annotation_Length.sh (7.52 KB)
- 04_Plots_Stats.R.R (64.21 KB)
- JD_AgeRelated_CpG_noMT_Final.csv (1.58 MB)
- JD_AgeRelated_CpG_noMT_LENIENT_Final.csv (8.56 MB)
- JD_Annotation_All.rds (193.37 MB)
- JD_ARCpGs_AllGenes.csv (40.60 KB)
- JD_Chromosome_Length.csv (555 B)
- JD_CpG_Info_Final.csv (488.40 KB)
- JD_Positions_AR_Final_Len.csv (82.55 KB)
- JD_Positions_AR_Final.csv (15.36 KB)
- JD_PositionsPerChr_All_Final.csv (484 B)
- length_category_chromosome.JD.txt (2.35 KB)
- length_category_chromosome.ZF.txt (2.26 KB)
- README.md (9.65 KB)
- ZF_AgeRelated_CpG_noMT_Final.csv (1.25 MB)
- ZF_AgeRelated_CpG_noMT_LENIENT_Final.csv (7.13 MB)
- ZF_Annotation_All.rds (167.40 MB)
- ZF_ARCpGs_AllGenes.csv (35.46 KB)
- ZF_Chromosome_Length.csv (538 B)
- ZF_CpG_Info_Final.csv (335.11 KB)
- ZF_Positions_AR_Final_Len.csv (72.64 KB)
- ZF_Positions_AR_Final.csv (12.82 KB)
- ZF_PositionsPerChr_All_Final.csv (469 B)
Sep 25, 2025 version files (363.25 MB):
- 03_Annotation_Length.sh (7.52 KB)
- 04.Plots_Stats.R (65.23 KB)
- JD_Annotation_All.rds (193.37 MB)
- JD_ARCpGs_AllGenes.csv (40.60 KB)
- JD_ARCpGs_Promoters.csv (7.87 KB)
- JD_ARsites_PerChr_Stats.csv (991 B)
- JD_Chromosome_Length.csv (555 B)
- JD_Filtered_Expected.csv (1.22 MB)
- length_category_chromosome.JD.txt (2.35 KB)
- length_category_chromosome.ZF.txt (2.26 KB)
- README.md (12.77 KB)
- ZF_Annotation_All.rds (167.40 MB)
- ZF_ARCpGs_AllGenes.csv (35.46 KB)
- ZF_ARCpGs_Promoters.csv (20.25 KB)
- ZF_ARsites_PerChr_Stats.csv (1.22 KB)
- ZF_Chromosome_Length.csv (538 B)
- ZF_Filtered_Expected.csv (1.07 MB)
Abstract
Sex chromosomes are thought to play an important role in sex-dependent ageing, yet they are neglected in epigenetic ageing research. We identified genome-wide age-related CpG (AR-CpG) sites in two avian species, the zebra finch and the jackdaw, and found AR-CpG sites to be overrepresented on the haploid, female-specific W chromosome in both species and on the Z chromosome in the zebra finch.
https://doi.org/10.5061/dryad.wm37pvmw8
Data for identification of age-related CpG sites from longitudinal avian methylomes
Description of the data and file structure
Code:
Code #1 (01_Bioinformatics):
- Bioinformatics code for DNA methylation extraction from WGBS and EMSeq data
Code #2 (02_Pipeline.R):
- Merging DNA methylation measurements per CpG site per sample in one data frame per species
- Running the initial part of the pipeline for identification of age-related CpG sites
Code #3 (03_Annotation.sh + 03_Annotation_Length.sh):
- Functional annotation of all CpG sites captured by our analysis and calculation of the length of each annotation category
Code #4 (04.Plots_Stats.R):
- Final steps of the pipeline
- Functional annotation of age-related CpG sites
- Statistical analyses
- Plots
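As a rough illustration of the merging step in Code #2, the Python sketch below combines per-sample methylation calls into one site-by-sample table and counts how many samples share each site (the quantity reported as "SampleShared"). It is not the authors' R code; the input layout (a dict of sample ID to {Pos: MethylationPercentage}), the function name merge_samples and the sample IDs and positions are hypothetical.

```python
from collections import defaultdict


def merge_samples(per_sample):
    """Merge per-sample calls into one site-by-sample table.

    per_sample: dict of sample ID -> dict of Pos ("Chromosome_position")
    -> MethylationPercentage (hypothetical input layout).
    Returns (table, shared): table maps Pos -> {sample ID: percentage};
    shared maps Pos -> number of samples in which the site was measured.
    """
    table = defaultdict(dict)
    for sample, sites in per_sample.items():
        for pos, pct in sites.items():
            table[pos][sample] = pct
    # A "SampleShared"-style count is the number of samples covering the site.
    shared = {pos: len(samples) for pos, samples in table.items()}
    return dict(table), shared


# Hypothetical sample IDs and positions.
per_sample = {
    "S1": {"chr1_100": 80.0, "chr1_200": 10.0},
    "S2": {"chr1_100": 75.0},
}
table, shared = merge_samples(per_sample)
print(shared)  # {'chr1_100': 2, 'chr1_200': 1}
```

A dict of dicts keeps sites with missing samples without padding, which mirrors how coverage differs between samples in bisulfite data.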
File descriptions:
"JD_AgeRelated_CpG_noMT_Final.csv" - Contains information on all age-related CpG sites per sample for the jackdaw.
"Pos"= age-related CpG site position (Chromosome_position)
"SampleShared"= number of samples that share this age-related CpG site
"Sample"= individual ID
"SampleL"= denotes longitudinal samples taken from the same individual
"StartPosition"= position of site
"Endposition"= position of site ("StartPosition" and "Endposition" should be the same)
"MethylationPercentage"= percentage of DNA methylation per site
"CountMethylated"= count of methylated reads per site
"CountNonMethylated"= count of unmethylated reads per site
"Chromosome"= chromosome in which the site is located
"Coverage"= coverage per site
"Dage"= delta age per sample (the difference between an individual's chronological age at sampling and its average age across its samples)
"JD_AgeRelated_CpG_noMT_LENIENT_Final.csv" - Contains information on all age-related CpG sites per sample for the jackdaw when using the lenient pipeline (more lenient cut-offs).
"Pos"= age-related CpG site position (Chromosome_position)
"SampleShared"= number of samples that share this age-related CpG site
"Sample"= individual ID
"SampleL"= denotes longitudinal samples taken from the same individual
"StartPosition"= position of site
"Endposition"= position of site ("StartPosition" and "Endposition" should be the same)
"MethylationPercentage"= percentage of DNA methylation per site
"CountMethylated"= count of methylated reads per site
"CountNonMethylated"= count of unmethylated reads per site
"Chromosome"= chromosome in which the site is located
"Coverage"= coverage per site
"Dage"= delta age per sample (the difference between an individual's chronological age at sampling and its average age across its samples)
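Dage is a within-individual centred age. A minimal sketch of the calculation, assuming an individual's "average age" is the mean of its ages at its own sampling points (the exact formula lives in the authors' R code and is not spelled out here); the function name delta_age, the individual IDs and the ages are hypothetical.

```python
from collections import defaultdict


def delta_age(records):
    """records: list of (individual ID, age at sampling), one per sample.

    Returns Dage for each sample, in input order, assuming Dage = age at
    sampling minus the mean of that individual's ages across its samples.
    """
    ages = defaultdict(list)
    for individual, age in records:
        ages[individual].append(age)
    mean_age = {ind: sum(a) / len(a) for ind, a in ages.items()}
    return [age - mean_age[individual] for individual, age in records]


# Hypothetical individuals, each sampled twice (ages in years).
records = [("A", 2.0), ("A", 6.0), ("B", 3.0), ("B", 5.0)]
print(delta_age(records))  # [-2.0, 2.0, -1.0, 1.0]
```

With exactly two samples per individual, as in this study, an individual's two Dage values are equal in magnitude and opposite in sign, so Dage isolates within-individual change from between-individual differences in age.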
"JD_ARCpGs_AllGenes.csv" - Contains information on all genes of the jackdaw in which we located age-related CpG sites and the genomic annotation of each of them.
"Pos": age-related CpG site (Chromosome_position)
"gene.name": gene name
"category": annotation category of the site (promoter, exon, intron or intergenic)
"JD_Annotation_All.rds"- RDS file containing the genomic annotation of all CpG sites captured by our analysis in the jackdaw
"chromossome"= chromosome
"site"=position of site
"dist.to.feature"= distance to nearest feature
"feature.name"= feature name
"feature.strand"= feature strand
"prom"= promoter (0=no, 1=yes)
"exon"= exon (0=no, 1=yes)
"intron"= intron (0=no, 1=yes)
"gene.name"= gene name
"Pos"= position in the genome (Chromosome_position)
"category"= assigned annotation category (promoter, exon, intron or intergenic)
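The "category" column can be read as collapsing the "prom"/"exon"/"intron" flags into a single label. A sketch, assuming a precedence of promoter > exon > intron, with sites that match none labelled intergenic (the methods state only promoter > exon; placing intron after exon is our assumption, and assign_category is a hypothetical helper):

```python
def assign_category(prom, exon, intron):
    """Collapse 0/1 overlap flags into one annotation category.

    Assumed precedence: promoter > exon > intron; a site with no overlap
    is labelled intergenic.
    """
    if prom == 1:
        return "promoter"
    if exon == 1:
        return "exon"
    if intron == 1:
        return "intron"
    return "intergenic"


print(assign_category(prom=1, exon=1, intron=0))  # promoter
print(assign_category(prom=0, exon=0, intron=1))  # intron
```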
"JD_Chromosome_Length.csv" - Contains the chromosome lengths of the Hawaiian crow reference genome, to which the jackdaw data were aligned
"Chromosome"= chromosome
"Length"= length (bp)
"JD_CpG_Info_Final.csv"- Contains additional information on the age-related CpG sites in the jackdaw
"Pos"= age-related CpG site (Chromosome_position)
"MeanC"= mean coverage over all samples
"SDC"= standard deviation of coverage over all samples
"mCor"= correlation between DNA methylation and Dage over all samples
"abs"= absolute value of the correlation between DNA methylation and Dage over all samples
"MeanM"= mean DNA methylation over all samples
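A sketch of how these per-site summaries could be computed from one site's values across samples. It assumes "mCor" is a Pearson correlation and "SDC" is the sample standard deviation (n-1 denominator, as in R's sd()); neither detail is stated in this README, and pearson and site_summary are hypothetical helpers.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either input is constant


def site_summary(methylation, dage, coverage):
    """Per-site summaries keyed by the CpG_Info column names."""
    m_cor = pearson(methylation, dage)
    n = len(coverage)
    mean_c = sum(coverage) / n
    sd_c = math.sqrt(sum((c - mean_c) ** 2 for c in coverage) / (n - 1))
    return {
        "MeanC": mean_c,
        "SDC": sd_c,
        "mCor": m_cor,
        "abs": abs(m_cor),
        "MeanM": sum(methylation) / len(methylation),
    }


# Hypothetical values for one site across three samples.
print(site_summary([80.0, 70.0, 60.0], [-1.0, 0.0, 1.0], [10, 20, 30]))
```

Ranking sites by "abs" rather than "mCor" treats age-related gains and losses of methylation alike.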
"JD_Positions_AR_Final.csv"- Location of all age-related CpG sites for the jackdaw
"Chromosome"=chromosome
"Position"=position
"JD_Positions_AR_Final_Len.csv"- Location of all age-related CpG sites for the jackdaw when using the lenient pipeline (more lenient cut-offs).
"Chromosome"=chromosome
"Position"=position
"JD_PositionsPerChr_All_Final.csv"- Count of unique CpG sites per chromosome captured by our analysis in the jackdaw
"Chromosome"= chromosome
"Unique_Pos_Count"= count of unique CpG sites per chromosome captured by our analysis
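If chromosome names contain underscores themselves (as RefSeq-style names beginning with NC_ do), "Pos" values should be split back into chromosome and position on the last underscore only. A sketch that does this and reproduces a per-chromosome count of distinct positions in the spirit of "Unique_Pos_Count"; the function names and the chromosome names in the example are hypothetical.

```python
from collections import defaultdict


def split_pos(pos):
    """Split "Chromosome_position" on the LAST underscore only."""
    chrom, position = pos.rsplit("_", 1)
    return chrom, int(position)


def unique_positions_per_chromosome(pos_values):
    """Count distinct positions per chromosome."""
    sites = defaultdict(set)
    for pos in pos_values:
        chrom, position = split_pos(pos)
        sites[chrom].add(position)
    return {chrom: len(positions) for chrom, positions in sites.items()}


# Hypothetical chromosome names; the first contains an underscore itself.
pos_values = ["NC_000001.1_100", "NC_000001.1_100", "NC_000001.1_250", "chrW_7"]
print(split_pos("NC_000001.1_250"))  # ('NC_000001.1', 250)
print(unique_positions_per_chromosome(pos_values))  # {'NC_000001.1': 2, 'chrW': 1}
```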
"JD_Filtered_Expected.csv" - CpG sites that passed the filters in pipeline steps 1, 2 and 3 for the jackdaw
"Pos"= Position as chromosome_position
"r" = correlation value
"pval" = p-value
"JD_ARsites_PerChr_Stats.csv" - Results of chi-squared tests of observed vs. expected proportions of age-related CpG sites per chromosome for the jackdaw
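"Chromosome"= chromosome
"ARsites"= count of age-related CpG sites on the chromosome
"prop_exp"= expected proportion of sites on the chromosome (based on the sites that passed filtering steps 1, 2 and 3)
"prop_obs"= observed proportion of age-related CpG sites on the chromosome
"p_adj"= FDR-corrected p-value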
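One way to obtain per-chromosome statistics of this kind is a one-degree-of-freedom chi-squared goodness-of-fit test for each chromosome (age-related sites on the chromosome vs. elsewhere, against the expected proportion), followed by Benjamini-Hochberg FDR correction. The sketch below assumes that design; the exact set-up in Code #4 may differ, the helper names are hypothetical and the counts in the example are invented for illustration.

```python
import math


def chisq_one_chromosome(observed, total, prop_exp):
    """1-df chi-squared test: AR sites on one chromosome vs. all others.

    observed: AR sites on the chromosome; total: AR sites genome-wide;
    prop_exp: expected proportion for the chromosome (0 < prop_exp < 1).
    """
    expected = total * prop_exp
    # The two-cell statistic reduces to (O - E)^2 / (N p (1 - p)).
    stat = (observed - expected) ** 2 / (total * prop_exp * (1 - prop_exp))
    # Upper tail of chi-squared with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (same rule as R's p.adjust "BH")."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 100 AR sites in total on three chromosomes.
counts = {"chr1": 30, "chrZ": 50, "chrW": 20}
prop_exp = {"chr1": 0.5, "chrZ": 0.4, "chrW": 0.1}
total = sum(counts.values())
results = {c: chisq_one_chromosome(counts[c], total, prop_exp[c]) for c in counts}
p_adj = bh_adjust([results[c][1] for c in counts])
for chrom, padj in zip(counts, p_adj):
    print(chrom, counts[chrom] / total, prop_exp[chrom], round(padj, 4))
```

The chi-squared approximation relies on reasonably large expected counts; for small chromosomes an exact binomial test would be the more conservative choice.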
"JD_ARCpGs_Promoters.csv" - Information on age-related CpG sites located in promoters in the jackdaw
"Pos"= age-related CpG site (Chromosome_position)
"Chromosome"= chromosome
"Position" = position
"r" = correlation value
"pval" = p-value
"MeanC"= mean coverage over all samples
"coverage_bin" = coverage bin at which the CpG site was located
"SDC"= standard deviation of coverage over all samples
"mCor"= correlation between DNA methylation and Dage over all samples
"abs"= absolute value of the correlation between DNA methylation and Dage over all samples
"MeanM"= mean DNA methylation over all samples
"gene.name" = name of the gene in whose promoter the CpG site is located
"category" = annotation category
"ZF_AgeRelated_CpG_noMT_Final.csv" - Contains information on all age-related CpG sites per sample for the zebra finch.
"Pos"= age-related CpG site position (Chromosome_position)
"SampleShared"= number of samples that share this age-related CpG site
"Sample"= individual ID
"SampleL"= denotes longitudinal samples taken from the same individual
"Chromosome"= chromosome in which the site is located
"StartPosition"= position of site
"Endposition"= position of site ("StartPosition" and "Endposition" should be the same)
"MethylationPercentage"= percentage of DNA methylation per site
"CountMethylated"= count of methylated reads per site
"CountNonMethylated"= count of unmethylated reads per site
"Coverage"= coverage per site
"Dage"= delta age per sample (the difference between an individual's chronological age at sampling and its average age across its samples)
"ZF_AgeRelated_CpG_noMT_LENIENT_Final.csv" - Contains information on all age-related CpG sites per sample for the zebra finch when using the lenient pipeline (more lenient cut-offs).
"Pos"= age-related CpG site position (Chromosome_position)
"SampleShared"= number of samples that share this age-related CpG site
"Sample"= individual ID
"SampleL"= denotes longitudinal samples taken from the same individual
"Chromosome"= chromosome in which the site is located
"StartPosition"= position of site
"Endposition"= position of site ("StartPosition" and "Endposition" should be the same)
"MethylationPercentage"= percentage of DNA methylation per site
"CountMethylated"= count of methylated reads per site
"CountNonMethylated"= count of unmethylated reads per site
"Coverage"= coverage per site
"Dage"= delta age per sample (the difference between an individual's chronological age at sampling and its average age across its samples)
"ZF_Annotation_All.rds" - RDS file containing the genomic annotation of all CpG sites captured by our analysis in the zebra finch
"chromossome"= chromosome
"site"= position of site
"dist.to.feature"= distance to nearest feature
"feature.name"= feature name
"feature.strand"= feature strand
"prom"= promoter (0=no, 1=yes)
"intron"= intron (0=no, 1=yes)
"exon"= exon (0=no, 1=yes)
"gene.name"= gene name
"Pos"= position in the genome (Chromosome_position)
"category"= assigned annotation category (promoter, exon, intron or intergenic)
"ZF_ARCpGs_AllGenes.csv" - Contains information on all genes in which we located age-related CpG sites and the genomic annotation of each of them
"Pos": age-related CpG site (Chromosome_position)
"gene.name": gene name
"Location": annotation category of the site (promoter, exon, intron or intergenic)
"ZF_Chromosome_Length.csv"- Contains information on the zebra finch chromosome length
"Chromosome"= chromosome
"Length"= length (bp)
"ZF_CpG_Info_Final.csv"- Contains additional information on the age-related CpG sites in the zebra finch
"Pos"= age-related CpG site (Chromosome_position)
"MeanC"= mean coverage over all samples
"SDC"= standard deviation of coverage over all samples
"mCor"= correlation between DNA methylation and Dage over all samples
"abs"= absolute value of the correlation between DNA methylation and Dage over all samples
"MeanM"= mean DNA methylation over all samples
"ZF_Positions_AR_Final.csv"- Location of all age-related CpG sites for the zebra finch
"Chromosome"=chromosome
"Position"=position
"ZF_Positions_AR_Final_Len.csv"- Location of all age-related CpG sites for the zebra finch when using the lenient pipeline (more lenient cut-offs).
"Chromosome"=chromosome
"Position"=position
"ZF_PositionsPerChr_All_Final.csv"- Count of unique CpG sites per chromosome captured by our analysis in the zebra finch
"Chromosome"= chromosome
"Unique_Pos_Count"= count of unique CpG sites per chromosome captured by our analysis
"ZF_Filtered_Expected.csv" - CpG sites that passed the filters in pipeline steps 1, 2 and 3 for the zebra finch
"Pos"= Position as chromosome_position
"r" = correlation value
"pval" = p-value
"ZF_ARsites_PerChr_Stats.csv" - Results of chi-squared tests of observed vs. expected proportions of age-related CpG sites per chromosome for the zebra finch
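"Chromosome"= chromosome
"ARsites"= count of age-related CpG sites on the chromosome
"prop_exp"= expected proportion of sites on the chromosome (based on the sites that passed filtering steps 1, 2 and 3)
"prop_obs"= observed proportion of age-related CpG sites on the chromosome
"p_adj"= FDR-corrected p-value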
"ZF_ARCpGs_Promoters.csv" - Information on age-related CpG sites located in promoters in the zebra finch
"Pos"= age-related CpG site (Chromosome_position)
"Chromosome"= chromosome
"Position" = position
"r" = correlation value
"pval" = p-value
"MeanC"= mean coverage over all samples
"coverage_bin" = coverage bin at which the CpG site was located
"SDC"= standard deviation of coverage over all samples
"mCor"= correlation between DNA methylation and Dage over all samples
"abs"= absolute value of the correlation between DNA methylation and Dage over all samples
"MeanM"= mean DNA methylation over all samples
"gene.name" = name of the gene in whose promoter the CpG site is located
"category" = annotation category
"length_category_chromosome.ZF.txt" and "length_category_chromosome.JD.txt"- length of each annotation category per chromosome for each species
"Chromosome"= chromosome
"exon_length"= length of exons (bp)
"intron_length" = length of introns (bp)
"promoterlength" = length of promoters (bp)
"intergenic_length" = length of intergenic regions (bp)
"gene_length" = length of genes (bp)
"chrom_size" = chromosome size (bp)
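The change log mentions an enrichment/depletion test for AR-CpG sites per annotation category based on these lengths. As a sketch of the underlying arithmetic only, assuming the expected number of age-related sites in a category is proportional to that category's share of the summed category lengths (the exact formula and the significance test used in Code #4 are not given here; category_obs_exp is a hypothetical helper and the numbers are invented):

```python
def category_obs_exp(observed, lengths):
    """Observed vs. expected AR-site counts per annotation category.

    observed: category -> number of AR-CpG sites in that category.
    lengths: category -> total length in bp (e.g. summed over chromosomes).
    Assumes expected counts are proportional to each category's share of the
    summed category lengths.
    """
    total_sites = sum(observed.values())
    total_length = sum(lengths.values())
    result = {}
    for category, length in lengths.items():
        expected = total_sites * length / total_length
        obs = observed.get(category, 0)
        ratio = obs / expected if expected > 0 else float("nan")
        # ratio > 1 suggests enrichment, < 1 depletion (no significance test).
        result[category] = (obs, expected, ratio)
    return result


# Hypothetical counts and lengths.
observed = {"promoter": 10, "exon": 20, "intron": 30, "intergenic": 40}
lengths = {"promoter": 100, "exon": 100, "intron": 300, "intergenic": 500}
for category, (obs, exp, ratio) in category_obs_exp(observed, lengths).items():
    print(category, obs, round(exp, 1), round(ratio, 2))
```

Running the same calculation on subsets of chromosomes (autosomes, Z, W) only requires summing the per-chromosome lengths within each subset first.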
Raw sequencing data
Raw sequencing data for both species can be found in: https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA1108628
Change Log
2024-06-21: Initial data and code submission.
2025-06-24: Added calculation of the length of each annotation category (Code #3) and an enrichment/depletion test for AR-CpG sites in each annotation category, both genome-wide and separately for autosomes, Z and W (Code #4). The raw data on the length of each annotation category per chromosome for each species were also added as "length_category_chromosome.ZF.txt" and "length_category_chromosome.JD.txt".
2025-09-25: Changed the order of the steps of the pipeline for identification of age-related CpG sites (Code #2). Additionally, the "expected" distribution of age-related CpG sites is now the distribution of sites that passed filtering steps 1, 2 and 3 of the pipeline; information on these sites for each species was added as "ZF_Filtered_Expected.csv" and "JD_Filtered_Expected.csv". All plots and statistical analyses in Code #4 have been updated accordingly.
Zebra finch blood samples were collected during a long-term experiment in which birds were housed in outdoor aviaries (320 × 150 × 210 cm), each containing a single-sex group of 18–24 adults. Twenty longitudinal samples were collected from ten known-age individuals (six males and four females), each sampled twice during its lifetime, with an average interval of 1,470 days between the two sampling points. Jackdaw blood samples were collected from individuals of a free-ranging population breeding in nest-boxes south of Groningen, the Netherlands (53.1708°N, 6.6064°E). We analyzed 22 longitudinal blood samples collected during 2007–2021 from 11 known-age adults (five males and six females), with an average sampling interval of 2,429 days. Samples were taken between June 2008 and December 2014 and stored at -80 °C in EDTA buffer.
We extracted total-cell DNA from 3 μl of (nucleated) red blood cells using the innuPREP DNA Mini Kit (Analytik Jena GmbH+Co) according to the manufacturer's protocol. Whole-genome sequencing was performed by The Hospital for Sick Children (Toronto, Canada), where paired-end (150 bp) Illumina next-generation sequencing was carried out on either an Illumina HiSeqX™ (12 zebra finch samples) or an Illumina NovaSeq™ sequencer (eight zebra finch samples and 22 jackdaw samples). Libraries were prepared using the Swift Biosciences Inc. Accel NGS Methyl Seq kit (part no. 30024 and 30096), and the DNA was bisulfite converted using the EZ-96 DNA Methylation-Gold kit from Zymo Research Inc. (part no. D5005) as per the manufacturer's protocol and subsequently subjected to whole-genome amplification.
Sequences were trimmed using Trim Galore! v. 0.6.10 (38) in paired-end mode. Visual quality control of the data was carried out before and after trimming using FastQC v. 0.11.9 (39) and MultiQC v. 1.14 (40). Because the Swift Biosciences Inc. Accel NGS Methyl Seq kit was used for library preparation, the first ~10 bp of reads showed extreme biases in sequence composition and M-bias; after inspecting the M-bias plots, we therefore trimmed the first 10 bp from each sequence.
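The 5' hard-clipping step amounts to dropping the first 10 bases, and their quality scores, from every read. A minimal sketch on raw FASTQ lines, assuming four-line records without wrapped sequence lines; in practice Trim Galore provides --clip_R1 and --clip_R2 options for this, but which mechanism the authors used is not stated here, and the helper names are hypothetical.

```python
def clip_5prime(seq, qual, n=10):
    """Remove the first n bases (and their quality scores) from one read."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return seq[n:], qual[n:]


def clip_fastq_lines(lines, n=10):
    """Hard-clip n bases from the 5' end of every 4-line FASTQ record."""
    out = []
    it = iter(lines)
    # zip over the same iterator groups consecutive lines into records of four.
    for header, seq, plus, qual in zip(it, it, it, it):
        seq, qual = clip_5prime(seq, qual, n)
        out.extend([header, seq, plus, qual])
    return out


record = ["@read1", "T" * 10 + "ACGCG", "+", "I" * 10 + "FFFFF"]
print(clip_fastq_lines(record))  # ['@read1', 'ACGCG', '+', 'FFFFF']
```

For paired-end data the same clip would be applied to both R1 and R2 files.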
Alignments were performed using Bismark v. 0.14.433 with the Bowtie 2 v. 2.4.5 alignment algorithm, both for in silico bisulfite conversion of the reference genomes and for alignment. For the zebra finch, trimmed reads were aligned against the in silico bisulfite-converted zebra finch (Taeniopygia guttata) reference genome (GCA_003957565.4), with an average mapping efficiency of 64.5% (SD: 3.22). Because the jackdaw reference genome did not contain an assembled W chromosome, the jackdaw bisulfite sequencing data were aligned to an in silico bisulfite-converted Hawaiian crow genome (Corvus hawaiiensis, GCA_020740725.1; the closest relative species with an assembled W chromosome), with an average mapping efficiency of 64.7% (SD: 0.98).
The GTF-formatted annotation files for the zebra finch (GCF_003957565.2) and the Hawaiian crow (GCF_020740725.1) were converted into BED12 format using the University of California, Santa Cruz (UCSC) utilities gtfToGenePred and genePredToBed (available at https://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads). AR-CpG sites were annotated using the function annotateWithGeneParts from the R 4.1.2 package genomation 1.4.1. This function hierarchically classifies sites into predefined functional regions, i.e., promoter, exon, intron or intergenic, hereafter referred to as annotation categories. The predefined functional regions were based on the annotation information in the BED12 files, read with the genomation function readTranscriptFeatures. Annotations were assigned hierarchically (promoter > exon). Subsequently, a custom R script was used to integrate the annotation results of AR-CpG sites with their respective annotation category information.
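For readers unfamiliar with BED12, the sketch below shows how exon, intron and promoter intervals can be derived from one BED12 record (0-based, half-open coordinates; blockStarts are relative to chromStart). The 1000 bp upstream/downstream promoter flanks are an assumption that we understand to match the readTranscriptFeatures defaults (up.flank, down.flank); the flanks actually used are not stated here. bed12_features is a hypothetical helper and the record is invented.

```python
def bed12_features(line, up_flank=1000, down_flank=1000):
    """Derive exons, introns and a promoter interval from one BED12 record.

    Coordinates are 0-based, half-open; blockStarts are relative to chromStart.
    The promoter spans TSS - up_flank to TSS + down_flank, strand-aware.
    """
    f = line.rstrip("\n").split("\t")
    chrom, start, end, strand = f[0], int(f[1]), int(f[2]), f[5]
    sizes = [int(x) for x in f[10].split(",") if x]   # blockSizes
    starts = [int(x) for x in f[11].split(",") if x]  # blockStarts
    exons = [(start + s, start + s + size) for s, size in zip(starts, sizes)]
    introns = [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]
    if strand == "-":
        tss = end  # transcription starts at the right-hand end on the minus strand
        promoter = (max(0, tss - down_flank), tss + up_flank)
    else:
        tss = start
        promoter = (max(0, tss - up_flank), tss + down_flank)
    return {"chrom": chrom, "strand": strand, "exons": exons,
            "introns": introns, "promoter": promoter}


# Hypothetical three-exon transcript on the plus strand.
rec = "\t".join(["chr1", "1000", "5000", "tx1", "0", "+", "1000", "5000",
                 "0", "3", "100,200,300,", "0,1500,3700,"])
feat = bed12_features(rec)
print(feat["exons"])     # [(1000, 1100), (2500, 2700), (4700, 5000)]
print(feat["introns"])   # [(1100, 2500), (2700, 4700)]
print(feat["promoter"])  # (0, 2000)
```

Because promoters can overlap the first exon, a site can fall in more than one region, which is why a precedence rule is needed when assigning a single annotation category.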
