Pathogenic CGG expansions in oculopharyngodistal myopathy exhibit distinct characteristics of each causative gene on the flanking sequences as well as methylation status
Data files
Mar 19, 2026 version files 8.95 GB
- OPDM_BAM.zip (8.95 GB)
- README.md (4.08 KB)
Abstract
Background: Oculopharyngodistal myopathy (OPDM) is a hereditary muscle disease caused by CGG/CCG repeat expansions in six genes. Although the clinical features, such as ptosis, dysphagia, and distal muscle weakness, are often similar, the age at onset varies widely, and the mechanisms underlying this variation remain unclear. In particular, the contributions of repeat size, flanking sequence variation, and DNA methylation to phenotype have not been systematically explored at single-molecule resolution.
Methods: We applied CRISPR/Cas9-targeted nanopore sequencing (nCATS) to genomic DNA from 91 individuals carrying expanded CGG repeats in three OPDM-related genes (LRP12, GIPC1, and NOTCH2NLC). This approach enabled the simultaneous analysis of CGG repeat length, flanking sequence architecture, single-nucleotide variant haplotypes, structural variation, and CpG methylation profiles. Genotype–phenotype correlations were evaluated by integrating molecular and clinical data.
Results: Expanded LRP12 and GIPC1 alleles each showed a gene-specific single-nucleotide variant pattern around the repeat region, suggesting founder haplotypes. Repeat regions essentially comprised pure CGG expansions but varied in size, even within individual patients. Additionally, LRP12-expanded repeats lacked flanking nucleotide sequences present in non-expanded repeats, whereas GIPC1-expanded repeats contained specific discontinued CGG patterns in their 5' regions. Structural variations were also identified in some patients. A significant inverse correlation was observed between repeat length and age at onset in patients with GIPC1 or NOTCH2NLC expansions, whereas in patients with LRP12 expansions this correlation was disturbed by higher methylation of upstream regions, leading to delayed onset.
Conclusions: This study highlights gene-specific differences in CGG repeat architecture and epigenetic regulation in OPDM. Founder haplotypes, expanded allele-specific flanking sequences, and the combined effects of repeat size and methylation contribute to the regional frequency of patients, repeat stability, and clinical variability, respectively, offering insight into disease pathomechanisms and potential therapeutic targets.
Dataset DOI: 10.5061/dryad.1rn8pk18t
Description of the data and file structure
Genomic DNA samples were collected from individuals carrying CGG repeat expansions associated with oculopharyngodistal myopathy (OPDM). CRISPR/Cas9-targeted nanopore sequencing (nCATS) was performed to characterize repeat length, flanking sequence variation, and CpG methylation patterns at disease-associated loci including LRP12, GIPC1, and NOTCH2NLC.
Files and variables
File: OPDM_BAM.zip
Description: A compressed archive containing de-identified individual-level long-read sequencing alignment files (BAM format) generated using CRISPR/Cas9-targeted nanopore sequencing (nCATS), along with corresponding BAM index files (.bai).
The compressed archive "OPDM_BAM" contains three top-level directories corresponding to the causative genes: LRP12, GIPC1, and NOTCH2NLC.
The LRP12 directory includes BAM and corresponding BAI files from both familial and sporadic cases. Familial samples are labeled using the prefix “F” followed by family and individual identifiers (e.g., F1-1, F1-2, F2-1 through F7-5), comprising 20 BAM files and 20 BAI files in total. Sporadic cases are labeled using the prefix “S” (S1 through S53), including two independent sequencing runs for samples S26 and S49, resulting in 55 BAM files and 55 BAI files.
The GIPC1 directory contains BAM and BAI files from 10 individuals labeled G1 through G10, with sample G3 sequenced in two independent runs, resulting in a total of 11 BAM files and 11 BAI files.
The NOTCH2NLC directory contains BAM and BAI files from 8 individuals labeled N1 through N7 and N10, resulting in 8 BAM files and 8 BAI files.
Each BAM file is accompanied by its corresponding index file (.bai) and contains aligned long-read sequencing data mapped to the human reference genome (GRCh38).
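As a convenience, the file counts described above can be checked after extraction with a short script. The following is a minimal sketch rather than part of the dataset: it assumes the archive has been extracted so that the three gene directories sit directly under one root folder, and that each index file is named either `sample.bam.bai` or `sample.bai`. Neither naming detail is stated in this README, so adjust as needed. The function names `inventory` and `report` are illustrative only.

```python
from collections import Counter
from pathlib import Path

# BAM counts per gene directory as stated in this README
# (LRP12: 20 familial + 55 sporadic files).
EXPECTED_BAMS = {"LRP12": 75, "GIPC1": 11, "NOTCH2NLC": 8}


def inventory(root):
    """Count BAM files per gene directory and list BAMs lacking an index."""
    root = Path(root)
    counts = Counter()
    missing_index = []
    for gene in EXPECTED_BAMS:
        gene_dir = root / gene
        if not gene_dir.is_dir():
            continue  # directory absent: its count stays at zero
        for bam in sorted(gene_dir.rglob("*.bam")):
            counts[gene] += 1
            # Assumption: the index sits next to the BAM and is named either
            # sample.bam.bai or sample.bai.
            candidates = [bam.with_name(bam.name + ".bai"), bam.with_suffix(".bai")]
            if not any(c.exists() for c in candidates):
                missing_index.append(bam)
    return counts, missing_index


def report(root):
    """Print observed versus expected BAM counts and any unindexed BAMs."""
    counts, missing_index = inventory(root)
    for gene, expected in EXPECTED_BAMS.items():
        status = "ok" if counts[gene] == expected else "MISMATCH"
        print(f"{gene}: {counts[gene]} BAM files (expected {expected}) {status}")
    for bam in missing_index:
        print(f"no index found for {bam}")
```

Calling `report("OPDM_BAM")` (the folder name here is a placeholder for wherever the archive was extracted) prints one line per gene directory plus one line for each BAM file without an index.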
Code/software
The BAM files included in this dataset can be viewed and processed using standard open-source bioinformatics software compatible with long-read sequencing alignment data.
The following software tools are recommended:
- samtools: Used for indexing, viewing, and manipulating BAM files.
- Integrative Genomics Viewer (IGV): Used for visualization of aligned long-read sequencing data.
All BAM files are aligned to the human reference genome (GRCh38). Users should load the corresponding reference genome when visualizing the data in IGV or related software.
No custom scripts or proprietary software are required to access or interpret the alignment data.
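For users who prefer to script basic checks, the sketch below shows one way to call samtools from Python. It is an illustration under stated assumptions, not a required workflow: it assumes samtools is installed and on the PATH, and it uses only the `samtools quickcheck` and `samtools view -c` subcommands. The helper names (`quickcheck_cmd`, `count_cmd`, `count_reads`) are hypothetical. Any region string must be supplied by the user and must match the contig naming of the GRCh38 reference used for alignment.

```python
import shutil
import subprocess


def quickcheck_cmd(bam):
    # `samtools quickcheck` exits non-zero if the file looks truncated or malformed.
    return ["samtools", "quickcheck", str(bam)]


def count_cmd(bam, region=None):
    # `samtools view -c` prints the number of alignments. An optional region
    # (format "contig:start-end") restricts the count and requires the .bai index.
    cmd = ["samtools", "view", "-c", str(bam)]
    if region:
        cmd.append(region)
    return cmd


def count_reads(bam, region=None):
    """Return the alignment count for a BAM file, optionally within a region."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools was not found on PATH")
    # Fail early if the BAM file is incomplete.
    subprocess.run(quickcheck_cmd(bam), check=True)
    result = subprocess.run(count_cmd(bam, region), check=True,
                            capture_output=True, text=True)
    return int(result.stdout.strip())
```

For example, `count_reads("OPDM_BAM/GIPC1/G1.bam")` would return the number of alignments in that file. The path is a placeholder, since the exact file names inside the archive are not listed here.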
Access information
Other publicly accessible locations of the data:
- None. These data are not available in any other publicly accessible repository.
Data was derived from the following sources:
- The data were generated from genomic DNA samples collected from study participants as part of this research and were not derived from any previously published or publicly available datasets.
Human subjects data
All participants provided written informed consent for the sharing of de-identified research data in public repositories. The data included in this Dryad submission were generated using CRISPR/Cas9-targeted nanopore sequencing (nCATS) and are limited to specific disease-associated loci (LRP12, GIPC1, NOTCH2NLC, and RILPL1). These data do not contain comprehensive individual genomic profiles or sufficient genomic variation to enable identification of individual participants. All direct personal identifiers have been removed prior to submission. Based on the limited scope of the sequencing data and the minimal risk of re-identification, the dataset is considered appropriately de-identified for public release under a CC0 license. The study protocol was approved by the institutional ethics review board, and all data were handled in accordance with applicable legal and ethical guidelines.
