Identification of age-related CpG sites from longitudinal avian methylomes

Cite this dataset

Tangili, Marianthi et al. (2024). Identification of age-related CpG sites from longitudinal avian methylomes [Dataset]. Dryad. https://doi.org/10.5061/dryad.wm37pvmw8

Abstract

Sex chromosomes are thought to play an important role in sex-dependent ageing, yet they are neglected in epigenetic aging research. We identified genome-wide age-related CpG (AR-CpG) sites in two avian species (zebra finch and jackdaw) and found AR-CpG sites to be overrepresented on the haploid, female-specific W chromosome in both species, and on the Z chromosome in the zebra finch. 

README: Identification of age-related CpG sites from longitudinal avian methylomes

https://doi.org/10.5061/dryad.wm37pvmw8

Data for identification of age-related CpG sites from longitudinal avian methylomes

Description of the data and file structure

Codes:

Code #1:

  • Bioinformatics code for DNA methylation extraction from WGBS and EMSeq data

Code #2:

  • Merging DNA methylation measurements per CpG site per sample in one data frame per species

  • Running the initial part of the pipeline for identification of age-related CpG sites

Code #3:

  • Functional annotation of all CpG sites captured by our analysis

Code #4:

  • Final steps of the pipeline

  • Functional annotation of age-related CpG sites

  • Statistical analyses

  • Plots

File descriptions:

"JD_AgeRelated_CpG_noMT_Final.csv"- Contains information for all age-related CpG sites per sample for the jackdaw.

"Pos"= age-related CpG site position (Chromosome_position)

"SampleShared"= number of samples that share this age-related CpG site

"Sample"= individual ID

"SampleL"= denotes longitudinal samples taken from the same individual

"StartPosition"= position of site

"Endposition"= position of site ("StartPosition" and "Endposition" should be the same)

"MethylationPercentage"= percentage of DNA methylation per site

"CountMethylated"= count of methylated reads per site

"CountNonMethylated"= count of unmethylated reads per site

"Chromosome"= chromosome in which the site is located

"Coverage"= coverage per site

"Dage"= delta age per sample (the difference between an individual's chronological age and its average age)

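As a minimal sketch of how the columns above relate to each other, the Python snippet below splits a "Pos" value into chromosome and position and computes delta age by subtracting each individual's mean age from its age at each sample. The function names, sample IDs, and ages are hypothetical. The methylation formula (methylated reads divided by all reads, times 100) is an assumption based on the column descriptions, not something this README states.

```python
from collections import defaultdict


def split_pos(pos):
    """Split a 'Chromosome_position' string at its last underscore.

    Splitting from the right is an assumption: chromosome names
    may themselves contain underscores.
    """
    chromosome, position = pos.rsplit("_", 1)
    return chromosome, int(position)


def methylation_percentage(count_methylated, count_non_methylated):
    """Assumed formula: methylated reads as a percentage of all reads."""
    coverage = count_methylated + count_non_methylated
    return 100.0 * count_methylated / coverage


def delta_age(samples):
    """Return each sample's age minus the mean age of its individual.

    `samples` is a list of (individual_id, age) pairs, one per sample.
    """
    ages = defaultdict(list)
    for individual, age in samples:
        ages[individual].append(age)
    mean_age = {ind: sum(a) / len(a) for ind, a in ages.items()}
    return [age - mean_age[ind] for ind, age in samples]


# Hypothetical individuals, each sampled twice (ages in years).
samples = [("bird_A", 1.0), ("bird_A", 5.0), ("bird_B", 2.0), ("bird_B", 6.5)]
print(delta_age(samples))            # [-2.0, 2.0, -2.25, 2.25]
print(split_pos("chrZ_100"))         # ('chrZ', 100)
print(methylation_percentage(3, 1))  # 75.0
```

With two samples per individual, the two "Dage" values of that individual are symmetric around zero.
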
"JD_AgeRelated_CpG_noMT_LENIENT_Final.csv"- Contains information for all age-related CpG sites per sample for the jackdaw when using the lenient pipeline (more lenient cut-offs).

"Pos"= age-related CpG site position (Chromosome_position)

"SampleShared"= number of samples that share this age-related CpG site

"Sample"= individual ID

"SampleL"= denotes longitudinal samples taken from the same individual

"StartPosition"= position of site

"Endposition"= position of site ("StartPosition" and "Endposition" should be the same)

"MethylationPercentage"= percentage of DNA methylation per site

"CountMethylated"= count of methylated reads per site

"CountNonMethylated"= count of unmethylated reads per site

"Chromosome"= chromosome in which the site is located

"Coverage"= coverage per site

"Dage"= delta age per sample (the difference between an individual's chronological age and its average age)

"JD_ARCpGs_AllGenes.csv" - Contains information on all genes of the jackdaw in which we located age-related CpG sites and the genomic annotation of each of them.

"Pos": age-related CpG site (Chromosome_position)

"gene.name": gene name

"category": annotation category of the site (promoter, exon, intron or intergenic)

"JD_Annotation_All.rds"- RDS file containing the genomic annotation of all CpG sites captured by our analysis in the jackdaw

"chromossome"= chromosome

"site"=position of site

"dist.to.feature"= distance to nearest feature

"feature.name"= feature name

"feature.strand"= feature strand

"prom"= promoter (0=no,1=yes)

"exon"= exon (0=no,1=yes)

"intron"= intron (0=no,1=yes)

"gene.name"= gene name

"Pos"= position in the genome (Chromosome_position)

"category"= assigned annotation category (promoter, exon, intron or intergenic)

"JD_Chromosome_Lengt.csv"- Contains the chromosome lengths of the Hawaiian crow reference genome, to which the jackdaw reads were aligned

"Chromosome"= chromosome

"Length"= length (bp)

"JD_CpG_Info_Final.csv"- Contains additional information on the age-related CpG sites in the jackdaw

"Pos"= age-related CpG site (Chromosome_position)

"MeanC"= mean coverage over all samples

"SDC"= standard deviation of coverage over all samples

"mCor"= correlation of DNA methylation and Dage over all samples

"abs"= absolute correlation of DNA methylation and Dage over all samples

"MeanM"= mean DNA methylation over all samples

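The per-site summary columns above could be computed along the following lines. This is a sketch under stated assumptions, not the pipeline code: it assumes "mCor" is a Pearson correlation and "SDC" a sample standard deviation, neither of which is specified here. The function names and input values are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def site_summary(coverage, methylation, dage):
    """Summarise one CpG site across samples, mirroring the columns above."""
    r = pearson(methylation, dage)
    return {
        "MeanC": statistics.mean(coverage),    # mean coverage
        "SDC": statistics.stdev(coverage),     # sample SD (assumption)
        "mCor": r,                             # Pearson r (assumption)
        "abs": abs(r),                         # absolute correlation
        "MeanM": statistics.mean(methylation), # mean methylation
    }


# Hypothetical values for one site measured in four samples.
summary = site_summary(
    coverage=[12, 15, 10, 18],
    methylation=[40.0, 55.0, 42.0, 60.0],
    dage=[-2.0, 2.0, -2.25, 2.25],
)
print(summary)
```

The correlation is undefined when methylation or delta age is constant across samples, so such sites would need to be filtered out before calling `pearson`.
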
"JD_Positions_AR_Final.csv"- Location of all age-related CpG sites for the jackdaw

"Chromosome"=chromosome

"Position"=position

"JD_Positions_AR_Final_Len.csv"- Location of all age-related CpG sites for the jackdaw when using the lenient pipeline (more lenient cut-offs).

"Chromosome"=chromosome

"Position"=position

"JD_PositionsPerChr_All_Final.csv"- Count of unique CpG sites per chromosome captured by our analysis in the jackdaw

"Chromosome"= chromosome

"Unique_Pos_Count"= count of unique CpG sites per chromosome captured by our analysis

"ZF_AgeRelated_CpG_noMT_Final.csv"- Contains information for all age-related CpG sites per sample for the zebra finch.

"Pos"= age-related CpG site position (Chromosome_position)

"SampleShared"= number of samples that share this age-related CpG site

"Sample"= individual ID

"SampleL"= denotes longitudinal samples taken from the same individual

"Chromosome"= chromosome in which the site is located

"StartPosition"= position of site

"Endposition"= position of site ("StartPosition" and "Endposition" should be the same)

"MethylationPercentage"= percentage of DNA methylation per site

"CountMethylated"= count of methylated reads per site

"CountNonMethylated"= count of unmethylated reads per site

"Coverage"= coverage per site

"Dage"= delta age per sample (the difference between an individual's chronological age and its average age)

"ZF_AgeRelated_CpG_noMT_LENIENT_Final.csv"- Contains information for all age-related CpG sites per sample for the zebra finch when using the lenient pipeline (more lenient cut-offs).

"Pos"= age-related CpG site position (Chromosome_position)

"SampleShared"= number of samples that share this age-related CpG site

"Sample"= individual ID

"SampleL"= denotes longitudinal samples taken from the same individual

"Chromosome"= chromosome in which the site is located

"StartPosition"= position of site

"Endposition"= position of site ("StartPosition" and "Endposition" should be the same)

"MethylationPercentage"= percentage of DNA methylation per site

"CountMethylated"= count of methylated reads per site

"CountNonMethylated"= count of unmethylated reads per site

"Coverage"= coverage per site

"Dage"= delta age per sample (the difference between an individual's chronological age and its average age)

"ZF_Annotation_All.rds"- RDS file containing the genomic annotation of all CpG sites captured by our analysis in the zebra finch

"chromossome"= chromosome

"site"=position of site

"dist.to.feature"= distance to nearest feature

"feature.name"= feature name

"feature.strand"= feature strand

"prom"= promoter (0=no,1=yes)

"intron"= intron (0=no,1=yes)

"exon"= exon (0=no,1=yes)

"gene.name"= gene name

"Pos"= position in the genome (Chromosome_position)

"category"= assigned annotation category (promoter, exon, intron or intergenic)

"ZF_ARCpGs_AllGenes.csv" - Contains information on all genes in which we located age-related CpG sites and the genomic annotation of each of them

"Pos": age-related CpG site (Chromosome_position)

"gene.name": gene name

"Location": annotation category of the site (promoter, exon, intron or intergenic)

"ZF_Chromosome_Length.csv"- Contains information on the zebra finch chromosome length

"Chromosome"= chromosome

"Length"= length (bp)

"ZF_CpG_Info_Final.csv"- Contains additional information on the age-related CpG sites in the zebra finch

"Pos"= age-related CpG site (Chromosome_position)

"MeanC"= mean coverage over all samples

"SDC"= standard deviation of coverage over all samples

"mCor"= correlation of DNA methylation and Dage over all samples

"abs"= absolute correlation of DNA methylation and Dage over all samples

"MeanM"= mean DNA methylation over all samples

"ZF_Positions_AR_Final.csv"- Location of all age-related CpG sites for the zebra finch

"Chromosome"=chromosome

"Position"=position

"ZF_Positions_AR_Final_Len.csv"- Location of all age-related CpG sites for the zebra finch when using the lenient pipeline (more lenient cut-offs).

"Chromosome"=chromosome

"Position"=position

"ZF_PositionsPerChr_All_Final.csv"- Count of unique CpG sites per chromosome captured by our analysis in the zebra finch

"Chromosome"= chromosome

"Unique_Pos_Count"= count of unique CpG sites per chromosome captured by our analysis

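The per-chromosome counts and the age-related site positions described above can be combined to compare chromosomes. The sketch below is only an illustration and not the statistical analysis used for this dataset. It divides each chromosome's share of age-related sites by the genome-wide share, so values above 1 indicate more age-related sites than expected from the number of captured sites. The function name and the counts are hypothetical.

```python
from collections import Counter


def enrichment_per_chromosome(ar_chromosomes, captured_counts):
    """Ratio of per-chromosome to genome-wide age-related site rate.

    `ar_chromosomes`: one chromosome label per age-related site
    (e.g. the "Chromosome" column of a *_Positions_AR_Final.csv file).
    `captured_counts`: chromosome -> "Unique_Pos_Count".
    """
    ar_counts = Counter(ar_chromosomes)
    genome_rate = sum(ar_counts.values()) / sum(captured_counts.values())
    return {
        chrom: (ar_counts.get(chrom, 0) / n) / genome_rate
        for chrom, n in captured_counts.items()
        if n > 0  # skip chromosomes without captured sites
    }


# Hypothetical counts: 3 of 100 captured sites on W are age-related,
# compared with 1 of 900 on chromosome 1.
ratios = enrichment_per_chromosome(
    ar_chromosomes=["W", "W", "W", "1"],
    captured_counts={"W": 100, "1": 900},
)
print(ratios)
```
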
Raw sequencing data

Raw sequencing data for both species can be found in: https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA1108628

Methods

Zebra finch blood samples were collected during a long-term experiment in which birds were housed in outdoor aviaries (320 × 150 × 210 cm), each containing single-sex groups of 18–24 adults. Twenty longitudinal samples were collected from ten known-age individuals (six males and four females), each sampled twice during its lifetime, with an average interval of 1,470 days between the two sampling points. Jackdaw blood samples were collected from individuals of a free-ranging population breeding in nest boxes south of Groningen, the Netherlands (53.1708°N, 6.6064°E). We analyzed 22 longitudinal blood samples collected during 2007–2021 from 11 known-age adults (five males and six females) with an average sampling interval of 2,429 days. Samples were taken between June 2008 and December 2014 and stored at -80 °C in EDTA buffer.

We extracted total-cell DNA using the innuPREP DNA Mini Kit (Analytik Jena GmbH+Co) from 3 μl of (nucleated) red blood cells according to the manufacturer’s protocol. Whole-genome sequencing was performed by The Hospital for Sick Children (Toronto, Canada), where paired-end Illumina next-generation sequencing (150 bp) was carried out on either an Illumina HiSeqX™ (12 zebra finch samples) or an Illumina NovaSeq™ sequencer (eight zebra finch samples and 22 jackdaw samples). Libraries were prepared using the Swift Biosciences Inc. Accel NGS Methyl Seq kit (part no. 30024 and 30096), and the DNA was bisulfite converted using the EZ-96 DNA Methylation-Gold kit from Zymo Research Inc. (part no. D5005) as per the manufacturer's protocol and subsequently subjected to whole-genome amplification.

Sequences were trimmed using Trim Galore! v. 0.6.10 (38) in paired-end mode. Visual quality controls of the data were carried out before and after trimming using FastQC v. 0.11.9 (39) and MultiQC v. 11.14 (40). Because the Swift Biosciences Inc. Accel NGS Methyl Seq kit was used for library preparation, the first ~10 bp showed extreme biases in sequence composition and M-bias. After checking the M-bias plots, we therefore trimmed a further 10 bp from the start of each sequence.

Alignments were performed using Bismark v. 0.14.433 with the Bowtie 2 v. 2.4.5 alignment algorithm for both in silico bisulfite conversion of the reference genomes and alignments. For the zebra finch, trimmed reads were aligned against the in silico bisulfite-converted zebra finch (Taeniopygia guttata) reference genome (GCA_003957565.4), and the average mapping efficiency was 64.5% (SD: 3.22). Because the jackdaw reference genome did not contain an assembled W chromosome, the jackdaw bisulfite sequencing data were aligned to an in silico bisulfite-converted Hawaiian crow genome (Corvus hawaiiensis, GCA_020740725.1, the closest related species with an assembled W chromosome); the average mapping efficiency was 64.7% (SD: 0.98).

The GTF-formatted annotation files for the zebra finch (GCF_003957565.2) and the Hawaiian crow (GCF_020740725.1) were converted into BED12 format using the University of California, Santa Cruz (UCSC) utilities gtfToGenePred and genePredToBed (available at https://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads). AR-CpG sites were annotated using the tool annotateWithGeneParts from the R 4.1.2 package genomation 1.4.1. This tool hierarchically classifies the sites into pre-defined functional regions, i.e., promoter, exon, intron, or intergenic, hereafter referred to as annotation categories. The predefined functional regions were based on the annotation information present in the BED12 files, accessed with the genomation tool readTranscriptFeatures. Annotations were performed as a hierarchical assignment (promoter > exon). Subsequently, a customized R script was employed to integrate the annotation results of AR-CpG sites with their respective annotation category information.
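The hierarchical assignment described above can be sketched as follows, using the 0/1 flags "prom", "exon" and "intron" from the annotation files. The sketch assumes the precedence continues as promoter > exon > intron > intergenic, whereas the text only spells out promoter > exon. The function name is hypothetical, and this is an illustration in Python rather than the customized R script mentioned above.

```python
def assign_category(prom, exon, intron):
    """Map 0/1 overlap flags to a single annotation category.

    Assumed precedence: promoter > exon > intron > intergenic.
    A site overlapping several feature types gets the first match.
    """
    if prom == 1:
        return "promoter"
    if exon == 1:
        return "exon"
    if intron == 1:
        return "intron"
    return "intergenic"


# A site flagged as both promoter and exon is reported as promoter.
print(assign_category(prom=1, exon=1, intron=0))  # promoter
# A site with no overlapping feature is intergenic.
print(assign_category(prom=0, exon=0, intron=0))  # intergenic
```
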

 

Funding

University of Groningen, Adaptive Life

European Commission, Award: 101025890, Marie Skłodowska-Curie grant

European Commission, Award: 813383, Marie Skłodowska-Curie grant