Flongle data for KnowYourCG: Facilitating base-level sparse methylome interpretation
Data files
Version of Sep 10, 2025 (5 files, 11.52 MB total):
- blood.tar (2.59 MB)
- cortex.tar (203.78 KB)
- lung.tar (2.80 MB)
- README.md (2.61 KB)
- uterus.tar (5.93 MB)
Abstract
This dataset provides ONT-based 5mC and 5hmC signals across four mouse tissues (lung, blood, uterus, and cortex), profiled with low-pass Flongle flow cells (~1M CpGs per sample). Decoding DNA methylomes for biological insight is critical in epigenetics research. We present KnowYourCG (KYCG), a data interpretation framework designed for functional DNA methylation analysis. Unlike existing tools that target genes or genomic intervals, KYCG directly screens, at base level, diverse biological and technical influences, including sequence motifs, transcription factor binding, histone modifications, replication timing, cell-type-specific methylation, and trait associations. By implementing an efficient infrastructure that rapidly screens and investigates thousands of knowledge bases, KYCG addresses the data sparsity of many methylation datasets, including low-pass or single-cell DNA methylomes, 5-hydroxymethylation profiles, spatial DNA methylation maps, and array-based datasets for epigenome-wide association studies. Applying KYCG to these datasets yields insights into cell differentiation, cancer origins, and epigenome-trait associations, and exposes technical issues such as array artifacts, single-cell batch effects, and the accuracy of Nanopore 5hmC detection. KYCG simplifies large-scale methylation analysis and integrates with standard assay technologies.
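The central idea above, testing a query set of CpGs against knowledge-base CpG sets while accounting for sparsity, can be illustrated with a generic overlap test. This is a minimal sketch under stated assumptions, not KYCG's or YAME's actual implementation: it restricts both sets to the CpGs measured in the sample, builds a 2x2 table, and reports a log2 odds ratio (the 0.5 pseudocount is our choice) plus a one-sided hypergeometric p-value. The function name `overlap_enrichment` and the CpG identifiers are hypothetical.

```python
from math import comb, log2


def overlap_enrichment(query, database, universe):
    """Test whether query CpGs are enriched for database CpGs.

    All arguments are sets of CpG identifiers (hypothetical, e.g. "chr1:3000827").
    Query and database are first restricted to `universe`, the CpGs actually
    measured in the sparse sample, so unmeasured sites are never counted.
    """
    q = query & universe
    d = database & universe
    n = len(universe)
    a = len(q & d)          # in query and in database
    b = len(q) - a          # in query only
    c = len(d) - a          # in database only
    e = n - a - b - c       # measured but in neither set
    # log2 odds ratio with a 0.5 pseudocount so empty cells stay finite.
    log2_or = log2(((a + 0.5) * (e + 0.5)) / ((b + 0.5) * (c + 0.5)))
    # One-sided hypergeometric tail P(X >= a): draw len(q) of n measured CpGs,
    # len(d) of which belong to the database set.
    tail = sum(
        comb(len(d), k) * comb(n - len(d), len(q) - k)
        for k in range(a, min(len(q), len(d)) + 1)
    )
    p_value = tail / comb(n, len(q))
    return {"overlap": a, "log2_or": log2_or, "p_value": p_value}
```

Restricting to the measured universe is the design point: in a ~1M-CpG Flongle sample, a database CpG that was never sequenced should not count as evidence of depletion. At that scale the exact big-integer sums become slow, and a library routine such as `scipy.stats.hypergeom` is the practical choice.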
Dataset DOI: 10.5061/dryad.zgmsbccq9
Description of the data and file structure
Oxford Nanopore Technologies (ONT) sequencing is an emerging approach that directly discriminates 5mC, 5hmC, and unmodified C from ion current signals, bypassing cytosine deamination methods (e.g., bisulfite conversion), which cannot separate 5mC from 5hmC. Here we provide ONT-based 5mC signals across four mouse tissues (lung, blood, uterus, and cortex), profiled with low-pass Flongle flow cells (~1M CpGs per sample). The methylation data for each tissue is stored as a .cg file, packaged in a .tar archive.
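The limitation of deamination-based readouts can be stated as a simple identity. Bisulfite and enzymatic (EM-seq) conversion both leave 5mC and 5hmC unconverted, so the methylation level $\beta$ they report at a CpG is the sum of the two modified fractions, whereas ONT calls estimate each fraction directly. With hypothetical fractions $f_{\text{5mC}} = 0.6$ and $f_{\text{5hmC}} = 0.2$, a deamination assay reports $\beta = 0.8$, the same value it would report for a site with 80% 5mC and no 5hmC.

```latex
\beta_{\text{deamination}} = f_{\text{5mC}} + f_{\text{5hmC}},
\qquad
f_{\text{5mC}} + f_{\text{5hmC}} + f_{\text{C}} = 1
```

Here $f_{\text{C}}$ denotes the unmodified fraction.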
Files and variables
File: uterus.tar
Description: uterus 20231207_1654_MN36407_AQY317_deaa9eb5
- Contents: One .cg file (uterus_20231207_1654_MN36407_AQY317_deaa9eb5.cg)
- Tissue type: Mouse uterus
- Sequencing date: 2023-12-07
- Device ID: MN36407
- Flow cell ID: AQY317
- Approx. CpGs covered: ~1M
- Notes: Low-pass Flongle run capturing genome-wide 5mC signal calls.
File: blood.tar
Description: blood 20231213_1441_MN36407_ARH615_cbdd1451
- Contents: One .cg file (blood_20231213_1441_MN36407_ARH615_cbdd1451.cg)
- Tissue type: Mouse blood
- Sequencing date: 2023-12-13
- Device ID: MN36407
- Flow cell ID: ARH615
- Approx. CpGs covered: ~1M
- Notes: Profiles peripheral blood; sparse coverage typical of Flongle output.
File: lung.tar
Description: lung 20231128_1200_MN36407_AQY190_5c40b3aa
- Contents: One .cg file (lung_20231128_1200_MN36407_AQY190_5c40b3aa.cg)
- Tissue type: Mouse lung
- Sequencing date: 2023-11-28
- Device ID: MN36407
- Flow cell ID: AQY190
- Approx. CpGs covered: ~1M
- Notes: Genome-wide 5mC calls at low-pass resolution.
File: cortex.tar
Description: cortex 20221226_1552_MN36407_ANB527_cd04e856
- Contents: One .cg file (cortex_20221226_1552_MN36407_ANB527_cd04e856.cg)
- Tissue type: Mouse cortex
- Sequencing date: 2022-12-26
- Device ID: MN36407
- Flow cell ID: ANB527
- Approx. CpGs covered: ~1M
- Notes: Captures the brain cortex methylation landscape at sparse genome coverage.
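Each run descriptor above appears to follow the MinKNOW run-folder layout `<date>_<time>_<device ID>_<flow cell ID>_<short run ID>`. Assuming that layout (the dataset does not state it explicitly), a small parser can recover the fields from a .cg file name. The function `parse_run_name`, the regular expression, and the field names are hypothetical.

```python
import re
from datetime import datetime

# Hypothetical pattern for names such as
# "uterus_20231207_1654_MN36407_AQY317_deaa9eb5.cg", assuming the layout
# <tissue>_<date>_<time>_<device ID>_<flow cell ID>_<short run ID>.
RUN_RE = re.compile(
    r"^(?P<tissue>[a-z]+)_"
    r"(?P<date>\d{8})_(?P<time>\d{4})_"
    r"(?P<device_id>[A-Z0-9]+)_(?P<flow_cell_id>[A-Z0-9]+)_"
    r"(?P<run_id>[0-9a-f]+)$"
)


def parse_run_name(name):
    """Split a .cg file name into tissue, run start time, and run identifiers."""
    stem = name.removesuffix(".cg")
    m = RUN_RE.match(stem)
    if m is None:
        raise ValueError(f"unrecognized run name: {name}")
    fields = m.groupdict()
    # Combine the YYYYMMDD date and HHMM time fields into one timestamp.
    started = datetime.strptime(fields.pop("date") + fields.pop("time"), "%Y%m%d%H%M")
    return {**fields, "started": started}
```

For example, `parse_run_name("lung_20231128_1200_MN36407_AQY190_5c40b3aa.cg")` returns the tissue `"lung"`, device `"MN36407"`, flow cell `"AQY190"`, and a start time of 2023-11-28 12:00, which makes it easy to tabulate the four runs side by side.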
Code/software
The sequence-level enrichment analysis is implemented in YAME, a command-line C program available at https://github.com/zhou-lab/YAME (documentation: https://zhou-lab.github.io/YAME/). To unpack the data, first extract each archive (e.g., tar -xf uterus.tar), then run yame unpack on the resulting .cg file; it outputs the methylation matrix for all samples in the file.
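The extract-then-unpack steps can be scripted for all four archives. This is a minimal sketch that assumes each archive contains the .cg file(s) at any depth and that `yame unpack <file.cg>` with no extra options writes the matrix to standard output; confirm the exact options against the YAME documentation. The helper names and the `.tsv` output naming are hypothetical.

```python
import shutil
import subprocess
import tarfile
from pathlib import Path


def extract_cg_files(tar_path, out_dir):
    """Extract one tissue archive and return the paths of the .cg files inside."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        members = [m for m in tar.getmembers() if m.isfile() and m.name.endswith(".cg")]
        # Use the "data" extraction filter where available (Python 3.12+ and
        # recent security backports) to reject unsafe member paths.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(out, members=members, **kwargs)
    return [out / m.name for m in members]


def unpack_command(cg_path):
    """Assumed invocation: `yame unpack <file.cg>` prints the matrix to stdout."""
    return ["yame", "unpack", str(cg_path)]


def unpack_all(tar_paths, out_dir):
    """Extract every archive and, if yame is on PATH, save each unpacked matrix."""
    for tar_path in tar_paths:
        for cg in extract_cg_files(tar_path, out_dir):
            if shutil.which("yame") is None:
                print("yame not found; run:", " ".join(unpack_command(cg)))
                continue
            with open(cg.with_suffix(".tsv"), "w") as fh:
                subprocess.run(unpack_command(cg), stdout=fh, check=True)
```

Usage: `unpack_all(["blood.tar", "cortex.tar", "lung.tar", "uterus.tar"], "cg_data")`. When yame is not installed, the script only extracts the archives and prints the commands to run, so the extraction step can be checked on its own.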
