Genetic modifiers of Huntington’s disease differentially influence motor and cognitive domains
Lee, Jong-Min (2022), Genetic modifiers of Huntington’s disease differentially influence motor and cognitive domains, Dryad, Dataset, https://doi.org/10.5061/dryad.dr7sqvb11
Genome-wide association studies (GWAS) of Huntington’s disease (HD) have identified six DNA maintenance gene loci (among others) as modifiers and implicated a two-step mechanism of pathogenesis: somatic instability of the causative HTT CAG repeat with subsequent triggering of neuronal damage. The largest studies have been limited to HD individuals with a rater-estimated age at motor onset. To capitalize on the wealth of phenotypic data in several large HD natural history studies, we have performed algorithmic prediction using common motor and cognitive measures to predict age at other disease landmarks as additional phenotypes for GWAS. Combined with imputation using the Trans-Omics for Precision Medicine reference panel, predictions using integrated measures provided objective landmark phenotypes with greater power to detect most modifier loci. Importantly, substantial differences in the relative modifier signal across loci, highlighted by comparing common modifiers at MSH3 and FAN1, revealed that individual modifier effects can act preferentially in the motor or cognitive domains. Individual components of the DNA maintenance modifier mechanisms may therefore act differentially on the neuronal circuits underlying the corresponding clinical measures. In addition, we identified new modifier effects at the PMS1 and PMS2 loci and implicated a potential new locus on chromosome 7. These findings indicate that broadened discovery and characterization of HD genetic modifiers based on additional quantitative or qualitative phenotypes offers not only the promise of in-human validated therapeutic targets, but also a route to dissecting the mechanisms and cell types involved in both the somatic instability and toxicity components of HD pathogenesis.
Five batches of GWA data sets were produced and subsequently imputed using the HRC reference panel, as described previously (Cell 2019, 178, 887). Genotype imputation using the TOPMed reference panel was performed similarly. Briefly, each individual genotype data set (i.e., GWA1, 2, 3, 4, and 5), comprising subjects with genotype call rate > 90% and single nucleotide variants (SNVs) with call rate > 95% and minor allele frequency (MAF) > 1%, was subjected to quality control (QC) by the “HRC or 1000G Imputation preparation and checking” program. Additional QC was performed using the TOPMed Imputation Server, revealing genomic regions of 10 Mb with at least one sample with call rate < 50%. Those samples with low call rates at certain genomic regions were further excluded from genotype imputation. The final QC-passed typed data set was used for imputation by the TOPMed Imputation Server. Genomic coordinates of the TOPMed imputation data were converted to GRCh37/hg19 to make this data set directly comparable to the HRC imputation data set. Finally, we removed SNVs with imputation r-square < 0.5 in any of the GWA data sets, Hardy-Weinberg equilibrium (HWE) p-value < 1E-6 (excluding the chr4:1-5000000 region containing HTT), or MAF < 0.1%. For a given phenotype, genome-wide association analysis was performed for individuals of European ancestry using a mixed effects model with a relationship matrix and a set of covariates including sex, batch, and the first four principal component values from genetic ancestry analysis. For selected candidate regions with significant association signals, conditional analysis was performed using a fixed effect linear model that included one or more additional covariates representing the minor allele counts of conditioned SNVs. SNVs are expressed as chromosome_coordinate_reference-allele_alternative-allele based on the GRCh37/hg19 genome assembly.
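The post-imputation SNV filter and the SNV naming convention described above can be illustrated with a minimal Python sketch. This is not the pipeline used to produce the data set; the `Snv` record, its field names, and the helper functions are hypothetical, and only the thresholds (r-square < 0.5 in any data set, HWE p-value < 1E-6 outside chr4:1-5000000, MAF < 0.1%) come from the text. The underscore-separated identifier is one reading of the stated naming format.

```python
from dataclasses import dataclass
from typing import Sequence

# Thresholds quoted in the methods text.
MIN_R2 = 0.5          # imputation r-square, checked in every GWA data set
MIN_HWE_P = 1e-6      # Hardy-Weinberg equilibrium p-value
MIN_MAF = 0.001       # 0.1% minor allele frequency
HTT_CHROM, HTT_START, HTT_END = "4", 1, 5_000_000  # exempt from the HWE filter


@dataclass
class Snv:
    """Hypothetical per-variant record; field names are assumptions."""
    chrom: str
    pos: int                      # GRCh37/hg19 coordinate
    ref: str
    alt: str
    r2_by_dataset: Sequence[float]  # one imputation r-square per GWA data set
    hwe_p: float
    maf: float


def snv_id(v: Snv) -> str:
    """Build an ID as chromosome_coordinate_reference-allele_alternative-allele."""
    return f"{v.chrom}_{v.pos}_{v.ref}_{v.alt}"


def in_htt_region(v: Snv) -> bool:
    """True if the variant lies in chr4:1-5000000, the region containing HTT."""
    return v.chrom == HTT_CHROM and HTT_START <= v.pos <= HTT_END


def passes_qc(v: Snv) -> bool:
    """Apply the three removal criteria; return True if the variant is kept."""
    if min(v.r2_by_dataset) < MIN_R2:      # fails in any one data set
        return False
    if v.hwe_p < MIN_HWE_P and not in_htt_region(v):
        return False
    if v.maf < MIN_MAF:
        return False
    return True
```

A caller would keep only variants for which `passes_qc` returns `True` and label them with `snv_id`. Taking the minimum r-square across data sets implements the "in any of the GWA data sets" wording: a variant is dropped if it is poorly imputed in even one batch.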
CAG repeat lengths for each subject were generated by a standard ABI fragment analysis assay (refs. 3, 28). For individuals identified in the TOPMed imputation as carrying HTT CAA-loss and CAACAG-duplication haplotypes, whose CAG lengths are systematically misestimated by the fragment analysis assay, the pure CAG repeat size was determined by MiSeq (Illumina) DNA sequencing analysis using the previously reported assay method.