An expanded CAG repeat in the huntingtin gene (HTT) causes Huntington's disease (HD). Since the length of the uninterrupted CAG repeat, rather than polyglutamine length, determines the age-at-onset in HD, base editing strategies to convert CAG to CAA are anticipated to delay onset by shortening the uninterrupted CAG repeat. Here, we developed base editing (BE) strategies to convert CAG in the repeat to CAA and determined their molecular outcomes and effects on relevant disease phenotypes. BE strategies employing combinations of cytosine base editors and gRNAs efficiently converted CAG to CAA at various sites in the CAG repeat without generating significant indels, off-target edits, or transcriptome alterations, demonstrating their feasibility and specificity. Candidate BE strategies converted CAG to CAA on both expanded and non-expanded CAG repeats without altering HTT mRNA and protein levels. In addition, somatic CAG repeat expansion, which is the major disease driver in HD, was significantly decreased in the liver of HD knock-in mice carrying canonical CAG repeats after treatment with a candidate BE strategy. Notably, CAG repeat expansion was abolished entirely in HD knock-in mice carrying CAA-interrupted repeats, supporting the therapeutic potential of CAG-to-CAA conversion strategies in HD and potentially other repeat expansion disorders.

To determine the molecular consequences of candidate BE strategies, we performed RNAseq analysis. We transfected HEK293 cells with BE4max+empty vector, BE4max+gRNA 1, or BE4max+gRNA 2 for 72 hours. Subsequently, genomic DNA for MiSeq analysis and cell pellets for RNAseq analysis were generated from replica plates. Genome-wide RNAseq analysis (Tru-Seq strand-specific large insert RNA sequencing) was performed by the Broad Institute. Sequence data were processed with the STAR aligner (Dobin 2013, 23104886) as part of the Broad Institute's standard RNAseq analysis pipeline. For differential gene expression (DGE) analysis, we used transcripts per million (TPM) data computed by TPMCalculator (https://github.com/ncbi/TPMCalculator) (Alvarez 2019, 30379987). Expression levels of approximately 19,000 protein-coding genes based on Ensembl (ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/) were normalized. The DGE analysis was performed with a generalized linear model using the "glm" function in R v3.3.1 (https://www.r-project.org/), adjusting for two principal components derived from the RNAseq data, followed by multiple test correction using an FDR method. A multiple-test-corrected p-value less than 0.05 was considered statistically significant.
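The multiple-test correction step can be illustrated with a minimal sketch. The methods state only that "an FDR method" was used; the Benjamini-Hochberg procedure shown here is an assumption, as are the function name `benjamini_hochberg` and the example p-values. The original analysis was run in R, so this Python version is for illustration only and is not the authors' code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the dataset's "FDR method" is not named in the methods;
    BH is used here only as a common example of such a correction.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the tests, ordered by ascending raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values, e.g. from a per-gene model fit.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
# Apply the 0.05 threshold stated in the methods to the adjusted values.
significant = [q < 0.05 for q in adj_p]
```

Applying the threshold to the adjusted values, rather than to the raw p-values, is what keeps the expected share of false positives among the significant genes below 5% when roughly 19,000 genes are tested.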

The expression data and metadata files can be opened with a text editor, Microsoft Excel, or R. The README.md file can be opened with any text editor.
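For readers who prefer a scripting language, the two data files can also be read with Python's standard `csv` module. The following is a minimal sketch: the helper name `read_delimited` is hypothetical, and it assumes that the `.txt` expression file is tab-delimited and that both files start with a header row, neither of which is stated in this README.

```python
import csv


def read_delimited(path, delimiter):
    """Read a delimited text file and return (header, rows).

    Assumption: the first line of the file is a header row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)
        rows = [row for row in reader]
    return header, rows


# Hypothetical usage with the file names listed in this README
# (tab delimiter for the .txt file is an assumption):
# meta_header, meta_rows = read_delimited("HD.BE.RNAseq.Meta.Data.230116.csv", ",")
# expr_header, expr_rows = read_delimited("HD.BE.RNAseq.12.Sample.230116.txt", "\t")
```

All values are returned as strings, so expression values need to be converted with `float()` before any numerical analysis.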

Base editing strategies to convert CAG to CAA diminish the disease-causing mutation in Huntington's disease

Data files

Abstract

HD.BE.RNAseq.Meta.Data.230116.csv: Sample characteristics and meta-data

HD.BE.RNAseq.12.Sample.230116.txt: RNAseq expression data

README: Base editing strategies to convert CAG to CAA in Huntington's disease

Methods

Usage notes
