Spermatozoa RRBS data: Post-Bismark aligned files extracted for the methylation call for every single C in the CpG context of sperm cells of Murrah buffalo bulls under heat stress
Abstract
DNA was isolated from purified spermatozoa samples collected from Murrah buffalo bulls (n=10) during the hot summer season and was sequenced using RRBS. The animals belonged to two groups, seasonally affected (SA, n=5) and seasonally not affected (SNA, n=5), defined by semen quality parameters (sperm viability, hypo-osmotic swelling test, and acrosomal integrity) that varied significantly between the two groups across the four seasons. The study is the first of its kind to generate the sperm methylome of heat-stress-affected Murrah buffalo bulls. These data may help clarify how heat stress epigenetically regulates altered sperm function and semen quality.
https://doi.org/10.5061/dryad.ns1rn8q1j
The dataset contains 10 .cov files, one per animal, derived from the BAM files produced by aligning the RRBS sequence data against the Bubalus bubalis assembly (UOA_WB1) with the Bismark suite.
Description of the data and file structure
The cov file (coverage file) is a 6-column file containing the following information:
Column No | Statistics | Information |
---|---|---|
1 | Chromosome | The chromosome name. |
2 | Start position | The genomic start position. |
3 | End position | The genomic end position. |
4 | Methylation Percentage | The percentage of methylation at that position. |
5 | Count Methylated | The number of C bases that are methylated. |
6 | Count Unmethylated | The number of C bases that are unmethylated. |
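As a minimal sketch of how the six columns can be read, the following Python snippet parses one coverage line into a record and recomputes the methylation percentage from the two counts. It assumes the files are tab-delimited with no header line; the `CovRecord` class, the helper functions, and the sample line (including the chromosome name `chr1`) are hypothetical and are not taken from the dataset.

```python
from typing import NamedTuple


class CovRecord(NamedTuple):
    """One row of a coverage file (hypothetical helper type)."""
    chrom: str
    start: int
    end: int
    pct_meth: float
    count_meth: int
    count_unmeth: int


def parse_cov_line(line: str) -> CovRecord:
    """Split a tab-delimited line into the six documented columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}")
    chrom, start, end, pct, meth, unmeth = fields
    return CovRecord(chrom, int(start), int(end), float(pct), int(meth), int(unmeth))


def recomputed_pct(rec: CovRecord) -> float:
    """Methylation percentage derived from columns 5 and 6."""
    total = rec.count_meth + rec.count_unmeth
    return 100.0 * rec.count_meth / total if total else 0.0


# Made-up example line, not real data from the dataset.
example = "chr1\t10468\t10468\t75\t3\t1"
rec = parse_cov_line(example)
```

Comparing `recomputed_pct(rec)` with `rec.pct_meth` is a quick consistency check that columns 4 to 6 were read in the right order.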
Sharing/Access information
Data was derived from the following sources:
- The fastq files used to derive these cov files are under embargo (NCBI SRA submission ID: SUB12949619)
- The data describing the genes that contain differentially methylated cytosines in the promoter TSS region have been published (DOI: 10.1016/j.gene.2024.148233)
Code/Software
The Bismark suite used to generate the cov files from the RRBS data is documented by the Bismark team (https://github.com/FelixKrueger/Bismark/tree/1675a9c07b49d51cbf9ae42e9f4bbbbde11f992f/Docs). The differentially methylated cytosines were derived from these cov files with methylKit (a Bioconductor package). A tutorial for methylKit is available here (https://bioconductor.riken.jp/packages/3.9/bioc/vignettes/methylKit/inst/doc/methylKit.html).
The sequences were aligned against the Bubalus bubalis assembly using the Bismark suite. The unsorted BAM files were then passed to the next step, in which the methylation information for the CpGs was extracted. This step is not mandatory, but generating the context-wise methylation report helps guard against errors when the differentially methylated cytosines (DMCs) are calculated in later steps. Methylation extraction is performed by the script 'bismark_methylation_extractor' within the Bismark suite. Depending on the context (CpG, CHG, CHH), the position of every cytosine in the BAM file was written to a new output file, with methylated cytosines given a '+' (forward) orientation and unmethylated cytosines given a '-' (reverse) orientation.
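The relationship between the per-read extractor output and the per-position counts in the cov files can be sketched as follows. This is an illustration only, not the Bismark implementation: it assumes each extractor line has five tab-separated fields (read ID, '+' or '-' orientation, chromosome, position, call letter), with '+' marking a methylated call and '-' an unmethylated one. The function name `aggregate_calls` and the sample calls are hypothetical.

```python
from collections import defaultdict


def aggregate_calls(lines):
    """Collapse per-read calls into per-position coverage rows.

    Returns tuples of (chrom, start, end, pct_meth, count_meth, count_unmeth).
    """
    counts = defaultdict(lambda: [0, 0])  # (chrom, pos) -> [methylated, unmethylated]
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip header or malformed lines
        _read_id, orientation, chrom, pos, _call = fields
        if orientation == "+":
            counts[(chrom, int(pos))][0] += 1
        elif orientation == "-":
            counts[(chrom, int(pos))][1] += 1

    rows = []
    for (chrom, pos), (meth, unmeth) in sorted(counts.items()):
        total = meth + unmeth
        if total == 0:
            continue  # position seen only with an unrecognised orientation
        rows.append((chrom, pos, pos, 100.0 * meth / total, meth, unmeth))
    return rows


# Made-up calls for illustration; not taken from the dataset.
calls = [
    "read1\t+\tchr1\t100\tZ",
    "read2\t-\tchr1\t100\tz",
    "read3\t+\tchr1\t100\tZ",
    "read4\t-\tchr1\t105\tz",
]
rows = aggregate_calls(calls)
```

Here position 100 is covered by two methylated calls and one unmethylated call, and position 105 by a single unmethylated call, so `rows` holds one six-field entry per position in the same column order as the cov files.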