Genetic modifiers of somatic expansion and clinical phenotypes in Huntington’s disease reveal shared and tissue-specific effects
Data files
Mar 21, 2025 version files 14.83 GB
- blood.instability.abi.pps.txt (1.07 GB)
- blood.instability.miseq.expansion.ratio.txt (986.95 MB)
- clinical.dcl4.txt (1.07 GB)
- clinical.onset.txt (1.07 GB)
- clinical.tfc6.euro.txt (1.07 GB)
- clinical.tfc6.txt (1.07 GB)
- quantile.bradykinesia.txt (1.07 GB)
- quantile.chorea.txt (1.05 GB)
- quantile.dystonia.txt (1.07 GB)
- quantile.oculomotor.txt (1.07 GB)
- quantile.rigidity.txt (1.05 GB)
- quantile.sdmt.txt (1.07 GB)
- quantile.stroopword.txt (1.07 GB)
- quantile.tms.txt (1.07 GB)
- README.md (2.59 KB)
Abstract
An inherited, expanded CAG repeat in HTT undergoes further somatic expansion to cause Huntington’s disease (HD). To gain insights into this molecular mechanism, we compared genome-wide association studies of somatic expansion in blood and somatic expansion-driven HD clinical phenotypes. Here, we show that somatic expansion is driven by a mismatch repair-related process whose genetic modification and consequences show unexpected complexity, including cell-type specificity. The HD clinical trajectory is further modified by non-DNA repair genes that differentially influence measures of cognitive and motor dysfunction. In addition to shared (DNA repair genes MSH3, PMS2, and FAN1) and distinct trans-modifiers, a synonymous CAG-adjacent variant in HTT dramatically hastens motor onset without increasing somatic expansion, while a cis-acting 5’-UTR variant promotes blood repeat expansion without influencing clinical HD. Our findings are directly relevant to the therapeutic suppression of expansion in DNA repeat disorders and provide additional clues to HD pathogenic mechanisms beyond somatic expansion.
Genome-wide association studies (GWAS) of clinical and molecular phenotypes of Huntington’s disease were performed, revealing overlapping and distinct genetic modifiers of clinical landmarks and somatic expansion in blood DNA. GWAS summary results for 14 phenotypes are available.
Name of the GWAS summary file
In our study, we analyzed three groups of phenotypes:
- Blood instability phenotypes (2 files)
- Clinical phenotypes (4 files)
- Algorithmically predicted quantile phenotypes (8 files)
The first part of the GWAS summary file name indicates the phenotype group, while the second part specifies the particular phenotype within that group.
Blood instability phenotypes: We analyzed ABI PPS and MiSeq expansion ratio phenotypes.
Clinical phenotypes: We examined DCL4, onset, and TFC6. For TFC6, we conducted an additional analysis restricted to Europeans.
Algorithmically predicted quantile phenotypes: We assessed bradykinesia, chorea, dystonia, oculomotor function, rigidity, SDMT (Symbol Digit Modalities Test), Stroop word, and TMS (Total Motor Score).
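The naming convention described above can be sketched in a few lines of Python. This is only an illustration: the function name `parse_summary_file_name` and the choice to treat `blood.instability` (rather than `blood` alone) as the group prefix are assumptions based on the listed file names, not something the authors specify.

```python
from typing import NamedTuple

# Group prefixes inferred from the listed file names (an assumption, not an
# official specification from the dataset authors).
GROUP_PREFIXES = ("blood.instability", "clinical", "quantile")


class SummaryFileName(NamedTuple):
    group: str
    phenotype: str


def parse_summary_file_name(file_name: str) -> SummaryFileName:
    """Split a summary file name into its phenotype group and phenotype.

    For example, 'clinical.tfc6.euro.txt' becomes ('clinical', 'tfc6.euro').
    """
    suffix = ".txt"
    if not file_name.endswith(suffix):
        raise ValueError(f"not a GWAS summary file: {file_name!r}")
    stem = file_name[: -len(suffix)]
    for prefix in GROUP_PREFIXES:
        if stem.startswith(prefix + "."):
            phenotype = stem[len(prefix) + 1:]
            if phenotype:
                return SummaryFileName(prefix, phenotype)
    raise ValueError(f"unrecognized phenotype group in: {file_name!r}")
```

Calling `parse_summary_file_name("quantile.sdmt.txt")` would return the group `quantile` and the phenotype `sdmt`; `README.md` is rejected because it is not a summary file.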
Description of the data and file structure
Each GWAS summary file contains the following columns: SNP (variant name), CHR (chromosome), BP (base-pair position), SAMPLE_SIZE (sample size), TEST_ALLELE (test allele), BETA (effect size of the test allele), SE (standard error), and P (p-value from the mixed-effects model association analysis). The summary files are plain text: they can be opened with a text editor on a UNIX system and used for meta-analysis and other downstream analyses.
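Because each file is roughly 1 GB, it may be more practical to stream it than to open it in an editor. The sketch below reads a summary file row by row using only the Python standard library. It assumes the file starts with a header line carrying the column names listed above and that fields are separated by whitespace; the delimiter is not stated here, so check it against the actual files. The function names and the 5e-8 cutoff (a conventional genome-wide significance threshold, not a value taken from this dataset) are illustrative.

```python
import io
from typing import Dict, Iterator, List, TextIO

# Column names as documented for the summary files.
EXPECTED_COLUMNS = ["SNP", "CHR", "BP", "SAMPLE_SIZE",
                    "TEST_ALLELE", "BETA", "SE", "P"]


def iter_summary_rows(handle: TextIO) -> Iterator[Dict[str, object]]:
    """Yield one typed record per variant without loading the whole file."""
    header = handle.readline().split()  # assumption: whitespace-delimited
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        yield {
            "SNP": row["SNP"],
            "CHR": row["CHR"],
            "BP": int(row["BP"]),
            "SAMPLE_SIZE": int(float(row["SAMPLE_SIZE"])),
            "TEST_ALLELE": row["TEST_ALLELE"],
            "BETA": float(row["BETA"]),
            "SE": float(row["SE"]),
            "P": float(row["P"]),
        }


def significant_rows(handle: TextIO, threshold: float = 5e-8) -> List[Dict[str, object]]:
    """Collect variants whose p-value is below the given threshold."""
    return [r for r in iter_summary_rows(handle) if r["P"] < threshold]


if __name__ == "__main__":
    # Synthetic two-row example; these values are made up for illustration.
    toy = io.StringIO(
        "SNP CHR BP SAMPLE_SIZE TEST_ALLELE BETA SE P\n"
        "variantA 4 100 9000 A 0.5 0.05 1e-20\n"
        "variantB 5 200 9000 G -0.01 0.05 0.8\n"
    )
    for hit in significant_rows(toy):
        print(hit["SNP"], hit["P"])
```

For a real file, the in-memory example would be replaced by something like `with open("clinical.onset.txt") as handle: hits = significant_rows(handle)`.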
Sharing/Access information
Original data will be made available on request. Individual-level genetic data cannot be deposited into a public or controlled repository due to restrictions imposed by the General Data Protection Regulation (GDPR) of the European Union and by the CHDI Foundation, sponsor of the Enroll-HD Platform. Qualified investigators can obtain these data, given their institutional assurance of subject confidentiality and compliance with GDPR requirements with respect to personal data, by emailing info@chdifoundation.org with the words “GWAS123456 data” in the subject line.
Code/Software
Standard analysis programs, including PLINK (genotype QC, data management), GEMMA (association analysis), and R (plotting, data management), were used to generate the GWAS results. A detailed description of the software and code will be available in the published manuscript.
Typed genotype data from HD subjects underwent quality control (QC) using filters such as call rate < 95% or minor allele frequency < 1%, followed by pre-imputation checking with the "HRC or 1000G Imputation preparation and checking" program (https://www.chg.ox.ac.uk/~wrayner/tools/; Version 4.3.0). Genotypes were then imputed using the TOPMed Imputation Server (https://imputation.biodatacatalyst.nhlbi.nih.gov/#!), which involves MINIMAC4 and EAGLE V2.4, with TOPMed r2 as the reference panel (i.e., all populations). Post-imputation, SNVs were removed if they had 1) imputation r2 value < 0.5, 2) call rate < 100%, 3) Hardy-Weinberg equilibrium p-value < 1E-6, except in the chromosome 4:1-5,000,000 region, or 4) minor allele frequency < 0.1%. These QC filters yielded approximately 19,000,000 imputed SNVs for GWAS of clinical phenotypes (age at onset, DCL4, TFC6, TFC6 in Europeans), repeat instability phenotypes (blood instability based on ABI, blood instability based on MiSeq), and quantile phenotypes (bradykinesia, chorea, dystonia, oculomotor, rigidity, SDMT, stroopword, TMS) in participants. GWAS was based on a mixed-effects model using the GEMMA program. Summary results for each phenotype GWAS are available on Dryad.
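The four post-imputation filters can be written as a single predicate, as in the sketch below. This is not the authors' pipeline (which used PLINK and related tools); the class `ImputedVariant`, its field names, and the representation of call rate as a fraction are hypothetical choices made for illustration. Only the thresholds and the chromosome 4 exemption come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImputedVariant:
    """Hypothetical per-variant QC metrics (field names are illustrative)."""
    chrom: str
    bp: int
    imputation_r2: float
    call_rate: float  # fraction between 0 and 1
    hwe_p: float      # Hardy-Weinberg equilibrium p-value
    maf: float        # minor allele frequency as a fraction


def in_hwe_exempt_region(v: ImputedVariant) -> bool:
    """True inside chromosome 4:1-5,000,000, where the HWE filter is skipped."""
    return v.chrom == "4" and 1 <= v.bp <= 5_000_000


def passes_post_imputation_qc(v: ImputedVariant) -> bool:
    """Apply the four removal criteria; a variant is kept only if none applies."""
    if v.imputation_r2 < 0.5:   # 1) imputation r2 < 0.5
        return False
    if v.call_rate < 1.0:       # 2) call rate < 100%
        return False
    if v.hwe_p < 1e-6 and not in_hwe_exempt_region(v):  # 3) HWE p < 1E-6
        return False
    if v.maf < 0.001:           # 4) minor allele frequency < 0.1%
        return False
    return True
```

With this predicate, a variant at chromosome 4, position 3,000,000 that fails the Hardy-Weinberg test is still retained if it passes the other three criteria, whereas the same variant on any other chromosome is removed.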