Dosage compensation and sexual conflict in female heterogametic methylomes
Data files
Oct 16, 2024 version files 360.92 MB
- 06_zfnoMT.csv (49.47 KB)
- 07_jdnoMT.csv (57.65 KB)
- 08_MT.csv (2.30 KB)
- 09_jdtoJD.csv (39.53 KB)
- 10_ZF_Annotation_All.rds (167.40 MB)
- 11_JD_Annotation_All.rds (193.37 MB)
- README.md (8.83 KB)
Abstract
DNA methylation (DNAm) suppresses gene expression and contributes to dosage compensation in mammals, but whether DNAm plays a similar role in species with female (ZW) heterogamety remains unresolved. We assessed chromosome-level DNAm using whole-genome bisulphite sequencing in two avian species, zebra finches and jackdaws. Dosage compensation by DNAm would result in higher and more variable DNAm levels on the Z chromosome in males than in females. However, we found that both the level of DNAm and its variance on the Z chromosome were lower in males. Moreover, Z-chromosome gene promoters were more frequently hypomethylated in males than in females, indicating an absence of upregulation on a gene-by-gene basis across the female Z chromosome.
title: Dosage compensation and sexual conflict in female heterogametic methylomes
author: Joanna Sudyka & Marianthi Tangili
date: October 2024
output: html_document
This dataset contains the average methylation percentage and its standard deviation per sample per chromosome in two bird species, Taeniopygia guttata and Corvus monedula. Also included are the sex and age of the individuals and the individual id that links the two longitudinal samples of each bird. Genomic annotation of all CpG sites captured by our analysis is included. See the File list below for more information.
Description of the Data and file structure
The data are structured into separate .csv and .rds files. A .txt file containing the steps applied for bioinformatic processing of the genomic data and R scripts to analyze each .csv file are included.
Sharing/access Information
These data are not publicly available in any other location.
Was data derived from another source?
If yes, list source(s):
The reference genomes used for the bisulfite alignments can be found on NCBI: for zebra finches (bTaeGut1.4.pri; https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_003957565.2/) and for jackdaws (Corvus hawaiiensis alignment, bCorHaw1.pri.cur; https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_020740725.1/; Corvus monedula alignment, ASM1340703v1; https://www.ncbi.nlm.nih.gov/data-hub/genome/GCA_013407035.1/).
Raw sequencing data for both species can be found in: https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA1108628
File list:
Codes (Zenodo):
1. bioinformaticsDNAm.txt
2. avian_DNAm.R
3. Annotation.sh
4. DNAm_SexChrom_ZF_JD_Annotation_Final.R
5. DNAm_SexChrom_ZF_JD_Figures_Final.R
Data files:
6. zfnoMT.csv
7. jdnoMT.csv
8. MT.csv
9. jdtoJD.csv
10. ZF_Annotation_All.rds
11. JD_Annotation_All.rds
1. File name: “bioinformaticsDNAm.txt”
Description: This file contains scripts used for bioinformatics and processing of genomic data to obtain the files on average methylation percentage per sample per chromosome and standard deviation data for DNA methylation per sample per chromosome in zebra finches Taeniopygia guttata and jackdaws Corvus monedula. It also contains a script to prepare files for annotation of hypo- and hypermethylated CpG sites.
2. File name: “avian_DNAm.R”
Description: This file contains scripts used for statistical analysis of DNA methylation scores in both species.
3. File name: “Annotation.sh”
Description: This file contains functional annotation of all CpG sites captured by our analysis.
4. File name: “DNAm_SexChrom_ZF_JD_Annotation_Final.R”
Description: This file contains functional annotation of hypo- and hypermethylated CpG sites.
5. File name: “DNAm_SexChrom_ZF_JD_Figures_Final.R”
Description: This file contains code to plot all results.
6. File name: “zfnoMT.csv”
Description: This file contains longitudinal data for average methylation percentage per sample per chromosome and standard deviation data for DNA methylation per sample per chromosome in zebra finches Taeniopygia guttata. Whole-genome bisulfite sequencing data of the zebra finch samples were aligned to a bisulfite converted zebra finch genome (NCBI: bTaeGut1.4.pri). Data on mitochondrial DNA methylation were excluded from this file.
-Number of variables: 9
-Number of cases/rows: 936
-Variable List (and units):
- Chromosome = chromosome number according to bTaeGut1.4.pri assembly
- Sample = number of sample sequenced
- MeanM = average methylation percentage per sample per chromosome
- SDM = standard deviation of methylation percentage per sample per chromosome
- Length = chromosome length in base pairs according to bTaeGut1.4.pri assembly
- Sex = bird’s sex (Male and Female)
- Age = bird’s age (Young = first sample vs Old = second sample per individual)
- SampleL = longitudinal id = individual id
- gen = chromosome type: autosome, Z or W
-Missing data code: n/a
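A minimal sketch of how a file with this layout could be summarised, using only the Python standard library. The function name `mean_by_sex_and_type`, the comma delimiter, the column order, and the toy rows are assumptions made for illustration; the deposited analysis scripts are written in R and are not reproduced here.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_by_sex_and_type(handle):
    """Average MeanM per (Sex, gen) group from a file laid out like zfnoMT.csv."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[(row["Sex"], row["gen"])].append(float(row["MeanM"]))
    return {key: mean(values) for key, values in groups.items()}


# Toy rows with made-up values, only to show the assumed column layout.
toy = io.StringIO(
    "Chromosome,Sample,MeanM,SDM,Length,Sex,Age,SampleL,gen\n"
    "1,1,60.0,30.0,1000,Male,Young,A,autosome\n"
    "Z,1,50.0,28.0,800,Male,Young,A,Z\n"
    "Z,2,54.0,31.0,800,Female,Young,B,Z\n"
    "Z,3,52.0,29.0,800,Male,Old,A,Z\n"
)
summary = mean_by_sex_and_type(toy)
```

With the real file, the same function could be called on an open file handle, for example `open("06_zfnoMT.csv", newline="")`, provided the header names match the variable list above.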
7. File name: “jdnoMT.csv”
Description: This file contains longitudinal data for average methylation percentage per sample per chromosome and standard deviation data for DNA methylation per sample per chromosome in jackdaws Corvus monedula. Whole-genome bisulfite sequencing data of the jackdaw samples were aligned to a bisulfite converted Hawaiian crow genome (NCBI: bCorHaw1.pri.cur). Data on mitochondrial DNA methylation were excluded from this file.
-Number of variables: 9
-Number of cases/rows: 806
-Variable List (and units):
- Chromosome = chromosome number according to bCorHaw1.pri.cur assembly
- Sample = number of sample sequenced
- MeanM = average methylation percentage per sample per chromosome
- SDM = standard deviation of methylation percentage per sample per chromosome
- Length = chromosome length in base pairs according to bCorHaw1.pri.cur assembly
- Sex = bird’s sex (Male and Female)
- Age = bird’s age (Young = first sample vs Old = second sample per individual)
- SampleL = longitudinal id = individual id
- gen = chromosome type: autosome, Z or W
-Missing data code: n/a
8. File name: “MT.csv”
Description: This file contains longitudinal data for average methylation percentage and standard deviation per sample for mitochondrial DNA methylation in zebra finches Taeniopygia guttata (NCBI: bTaeGut1.4.pri) and jackdaws Corvus monedula (NCBI: bCorHaw1.pri.cur).
-Number of variables: 8
-Number of cases/rows: 42
-Variable List (and units):
- Chromosome = chromosome number according to bTaeGut1.4.pri and bCorHaw1.pri.cur assemblies
- Sample = number of sample sequenced
- MeanM = average methylation percentage per sample per chromosome
- SDM = standard deviation of methylation percentage per sample per chromosome
- Sex = bird’s sex (Male and Female)
- Age = bird’s age (Young = first sample vs Old = second sample per individual)
- SampleL = longitudinal id = individual id
- Species = species of the bird (zebra finch and jackdaw)
-Missing data code: n/a
9. File name: “jdtoJD.csv”
Description: This file contains longitudinal data for average methylation percentage per sample per chromosome and standard deviation data for DNA methylation per sample per chromosome in jackdaws Corvus monedula. Whole-genome bisulfite sequencing data of the jackdaw samples were aligned to a bisulfite converted jackdaw genome (NCBI: ASM1340703v1). The results for this dataset are auxiliary analyses presented in the Supplementary material.
-Number of variables: 9
-Number of cases/rows: 638
-Variable List (and units):
- Chromosome = chromosome number according to ASM1340703v1 assembly
- Sample = number of sample sequenced
- MeanM = average methylation percentage per sample per chromosome
- SDM = standard deviation of methylation percentage per sample per chromosome
- Length = chromosome length in base pairs according to ASM1340703v1 assembly
- Sex = bird’s sex (Male and Female)
- Age = bird’s age (Young = first sample vs Old = second sample per individual)
- SampleL = longitudinal id = individual id
- gen = chromosome type: autosome or Z
-Missing data code: n/a
10. File name: “ZF_Annotation_All.rds”
Description: This RDS file contains the genomic annotation of all CpG sites captured by our analysis in the zebra finch
Variable List (and units):
- chromosome= chromosome number according to bTaeGut1.4.pri assembly
- site=position of site
- dist.to.feature= distance to nearest feature
- feature.name= feature name
- feature.strand= feature strand
- prom= promoter (0=no,1=yes)
- intron= intron (0=no,1=yes)
- exon= exon (0=no,1=yes)
- gene.name= gene name
- Pos= position in the genome (Chromosome_position)
- category= assigned annotation category (promoter, exon, intron or intergenic)
Missing data code: n/a
11. File name: “JD_Annotation_All.rds”
Description: This RDS file contains the genomic annotation of all CpG sites captured by our analysis in the jackdaw
Variable List (and units):
- chromosome= chromosome number according to bCorHaw1.pri.cur assembly
- site=position of site
- dist.to.feature= distance to nearest feature
- feature.name= feature name
- feature.strand= feature strand
- prom= promoter (0=no,1=yes)
- intron= intron (0=no,1=yes)
- exon= exon (0=no,1=yes)
- gene.name= gene name
- Pos= position in the genome (Chromosome_position)
- category= assigned annotation category (promoter, exon, intron or intergenic)
-Missing data code: n/a
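The relation between the 0/1 flags and the `category` column in the two annotation files can be sketched as follows. This assumes the precedence promoter > exon > intron > intergenic; the function name `assign_category` is hypothetical and the code is an illustration in Python, not the script used to build the .rds files.

```python
def assign_category(prom, exon, intron):
    """Hierarchical call, assuming promoter > exon > intron > intergenic."""
    if prom == 1:
        return "promoter"
    if exon == 1:
        return "exon"
    if intron == 1:
        return "intron"
    # No flag set: the site overlaps none of the gene parts.
    return "intergenic"


# A site flagged as both promoter and exon is reported once, as a promoter.
example = assign_category(prom=1, exon=1, intron=0)
```

Under this assumption each CpG site receives exactly one category even when it overlaps several gene parts.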
Exact age at sampling of all individuals was known because subjects were followed since birth in both species. Zebra finch samples were collected in the context of a long-term experiment in outdoor aviaries (320 × 150 × 210 cm), each containing single-sex flocks of 18–24 adults. Selected blood samples were taken from ten individuals (six males and four females), each sampled twice (20 genomes) with an average interval of 1,470 days, between June 2008 and December 2014. Average age at sampling was 464 days (SD: 237.0) and 1,934 days (SD: 767.2) at collection of the first and second blood sample, respectively (see Table S1 for exact ages). Jackdaws were sampled in the context of a long-term study of a free-ranging population breeding in nest-boxes south of Groningen, the Netherlands (53.1708°N, 6.6064°E). For the present study, we selected 22 blood samples (genomes) collected from 11 known-age adults (five males and six females); all individuals were sampled twice with an average interval of 2,429 days in the years 2007 to 2021. The average age at sampling was 877 days (SD: 241.2) and 3,306 days (SD: 947.9) at collection of the first and second samples, respectively.
We extracted DNA according to the manufacturer’s protocol using the innuPREP DNA Mini Kit (Analytik Jena GmbH) from 3 µL of (nucleated) red blood cells stored in glycerol storage buffer (40% glycerol, 50 mM TRIS, 5 mM MgCl2, 0.1 mM EDTA) at -80 °C. Next-generation sequencing was outsourced to The Hospital for Sick Children (Toronto, Canada), where paired-end Illumina sequencing (150 bp) was carried out on either an Illumina HiSeqX™ (12 zebra finch samples) or an Illumina NovaSeq™ S4 flowcell sequencer (eight zebra finch samples and 22 jackdaw samples). Libraries were prepared using the Swift Biosciences Inc. Accel NGS Methyl Seq kit (part no. 30024 and 30096) and DNA was bisulphite converted using the EZ-96 DNA Methylation-Gold kit (Zymo Research Inc., part no. D5005) as per the manufacturer's protocol and subsequently subjected to whole-genome amplification.
Sequences were trimmed using Trim Galore! in paired-end mode, while filtering out low-quality bases (Phred score < 20). Visual quality checks of the data (sequence quality, duplication levels, adapter content, etc.) were carried out before and after trimming using FastQC and MultiQC. Because the Accel-NGS Swift kit was used for library preparation, the first ~10 bp showed extreme biases in sequence composition and M-bias, so after checking the M-bias plots, the first 10 bp were trimmed from each sequence.
Alignments were performed using Bismark v. 0.14.4 with the Bowtie 2 alignment algorithm, both for in silico bisulphite conversion of the reference genomes and for the alignments (see codes). For the zebra finch, trimmed reads were aligned against the in silico bisulphite converted zebra finch reference genome (bTaeGut1.4.pri; https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_003957565.2/). While some bird genome assemblies contain the sex chromosomes, many reference genomes (including the current jackdaw reference genome assembly) originate from males and thus lack a W chromosome sequence. Therefore, the jackdaw sequencing data were aligned to a bisulphite converted Hawaiian crow genome (Corvus hawaiiensis, bCorHaw1.pri.cur; https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_020740725.1/), which is a chromosome-level, annotated Vertebrate Genomes Project assembly and therefore allows the functional consequences of DNAm levels to be assessed. The Hawaiian crow is the most closely related species for which a high-quality genome assembly was available. The divergence time between the crow and jackdaw clades has been estimated at 13 million years. To verify the robustness of our findings, we repeated the analyses aligning the jackdaw sequences to the bisulphite converted jackdaw reference genome (Table S2) along with the mitochondrial alignment.
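As a conceptual illustration of what in silico bisulphite conversion of a reference means, the sketch below produces the two fully converted copies of a sequence (C to T, and G to A for the opposite strand). It is a toy in Python with a hypothetical function name, `bisulfite_convert`, and is not Bismark's own implementation.

```python
def bisulfite_convert(sequence):
    """Return the (C->T, G->A) fully converted copies of a reference sequence."""
    seq = sequence.upper()
    c_to_t = seq.replace("C", "T")  # forward-strand conversion
    g_to_a = seq.replace("G", "A")  # reverse-strand conversion, written on the forward strand
    return c_to_t, g_to_a


ct, ga = bisulfite_convert("ACGTCCGG")
# ct == "ATGTTTGG", ga == "ACATCCAA"
```

Reads are converted in the same way before alignment, so methylation state does not affect mapping and is recovered afterwards by comparing the original read with the unconverted reference.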
DNAm calling was performed using the Bismark methylation extractor of the Bismark Bisulfite Mapper. We selected Bismark, which uses Bowtie 2 as its aligner, because of its integrated features and its relative resistance to error across ranges of DNAm levels compared to other packages. We obtained the mean DNAm percentage and its standard deviation per chromosome per sample, averaged over all CpG sites. Mapping efficiencies were 64.5% (SD: 3.22) and 64.7% (SD: 0.98) for the zebra finch and jackdaw, respectively.
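The per-chromosome summary described above can be sketched as follows for a single sample. The input layout (pairs of chromosome and per-site methylation percentage), the function name `chromosome_summary`, and the use of the sample standard deviation (n - 1 denominator) are assumptions; the toy values are made up.

```python
from collections import defaultdict
from statistics import mean, stdev


def chromosome_summary(sites):
    """Mean and SD of per-site methylation (%) for each chromosome of one sample."""
    by_chrom = defaultdict(list)
    for chrom, pct in sites:
        by_chrom[chrom].append(pct)
    # stdev needs at least two sites; report 0.0 for a single-site chromosome.
    return {
        chrom: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for chrom, values in by_chrom.items()
    }


# Made-up per-site percentages for two chromosomes.
toy_sites = [("Z", 0.0), ("Z", 100.0), ("Z", 50.0), ("1", 80.0), ("1", 90.0)]
per_chromosome = chromosome_summary(toy_sites)
```

Each resulting (mean, SD) pair corresponds to one row of MeanM and SDM in the .csv files.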
The files containing DNAm level information for each sample were merged and used to identify hypo- and hypermethylated sites across all samples. We calculated the average DNAm percentage for each CpG site separately for each sex, weighted by the sample-dependent coverage of that site. Based on these averages, we scored sites that were present in all samples as being either hypomethylated (<= 10% on average, enhanced potential for transcription) or hypermethylated (>= 90% on average, reduced transcription). We then combined the location of these sites with a functional annotation of the genomes. The annotation files for the zebra finch (GCF_003957565.2) and the jackdaw (Hawaiian crow assembly, GCF_020740725.1) reference genomes were retrieved in gene transfer format (GTF). These GTF annotation files were then converted into BED12 format using the University of California, Santa Cruz (UCSC) utilities gtfToGenePred and genePredToBed (available at https://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads). CpG sites were annotated using annotateWithGeneParts from the R 4.1.2 package Genomation 1.4.1. This tool hierarchically classifies sites into pre-defined functional regions, i.e., promoter, exon, intron, or intergenic. The pre-defined functional regions were based on the annotation information present in the BED12 files accessed with the Genomation tool readTranscriptFeatures. A customized script was employed to integrate the annotation results of CpG sites with their respective gene symbol information.
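A minimal sketch of the coverage-weighted average and the 10% / 90% scoring for one CpG site within one sex. The function names, the input layout (one pair of methylation percentage and read coverage per sample), and the label "intermediate" for unscored sites are assumptions added for illustration; the toy numbers are made up.

```python
def weighted_methylation(observations):
    """Coverage-weighted mean methylation (%) of one CpG site across samples."""
    total_coverage = sum(coverage for _, coverage in observations)
    return sum(pct * coverage for pct, coverage in observations) / total_coverage


def classify_site(avg_pct, low=10.0, high=90.0):
    """Score a site from its average methylation, using the thresholds in the text."""
    if avg_pct <= low:
        return "hypomethylated"
    if avg_pct >= high:
        return "hypermethylated"
    return "intermediate"  # hypothetical label for sites scored as neither


# Made-up (methylation %, coverage) pairs for one site in two samples.
site_avg = weighted_methylation([(5.0, 10), (10.0, 10)])
site_call = classify_site(site_avg)
```

Weighting by coverage lets samples with more reads at a site contribute more to its average than samples where the site is only sparsely covered.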