
Not all roads lead to the immune system: The genetic basis of multiple sclerosis severity

Cite this dataset

Jokubaitis, Vilija G. et al. (2022). Not all roads lead to the immune system: The genetic basis of multiple sclerosis severity [Dataset]. Dryad.


Multiple sclerosis is a leading cause of neurological disability in adults. Heterogeneity in multiple sclerosis clinical presentation has posed a major challenge for identifying genetic variants associated with disease outcomes. To overcome this challenge, we used prospectively ascertained clinical outcomes data from the largest international multiple sclerosis registry, MSBase. We assembled a cohort of deeply phenotyped individuals of European ancestry with relapse-onset multiple sclerosis. We used unbiased genome-wide association study and machine learning approaches to assess the genetic contribution to longitudinally defined multiple sclerosis severity phenotypes in 1,813 individuals. Our primary analyses did not identify any genetic variants of moderate to large effect sizes that met genome-wide significance thresholds. The strongest signal was associated with rs7289446 (β = −0.4882, P = 2.73 × 10⁻⁷), intronic to SEZ6L on chromosome 22. However, we demonstrate that clinical outcomes in relapse-onset multiple sclerosis are associated with multiple genetic loci of small effect sizes. Using a machine learning approach incorporating over 62,000 variants together with clinical and demographic variables available at multiple sclerosis disease onset, we could predict severity with an area under the receiver operating characteristic curve of 0.84 (95% CI 0.79–0.88). Our machine learning algorithm achieved a positive predictive value for outcome assignment of 80% and a negative predictive value of 88%. This outperformed our machine learning algorithm that contained clinical and demographic variables alone (area under the receiver operating characteristic curve 0.54, 95% CI 0.48–0.60). Secondary, sex-stratified analyses identified two genetic loci that met genome-wide significance thresholds: one in females (rs10967273; βfemale = 0.8289, P = 3.52 × 10⁻⁸) and the other in males (rs698805; βmale = −1.5395, P = 4.35 × 10⁻⁸), providing some evidence for sex dimorphism in multiple sclerosis severity.
Tissue enrichment and pathway analyses identified an overrepresentation of genes expressed in central nervous system compartments generally, and specifically in the cerebellum (P = 0.023). These involved mitochondrial function, synaptic plasticity, oligodendroglial biology, cellular senescence, and calcium and G-protein receptor signalling pathways. We further identified six variants with strong evidence for regulating clinical outcomes, the strongest signal again intronic to SEZ6L (adjusted hazard ratio 0.72, P = 4.85 × 10⁻⁴). Here we report a milestone in our progress towards understanding the clinical heterogeneity of multiple sclerosis outcomes, implicating mechanisms functionally distinct from those underlying multiple sclerosis risk. Importantly, we demonstrate that machine learning using common single nucleotide variant clusters, together with clinical variables readily available at diagnosis, can improve prognostic capabilities at diagnosis and, with further validation, has the potential to translate to meaningful clinical practice change.
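For readers less familiar with the predictive values quoted above, the following minimal sketch shows how positive and negative predictive value are derived from confusion-matrix counts. The counts are hypothetical, chosen only so that the arithmetic reproduces the reported 80% and 88%; they are not taken from the study data.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts."""
    # PPV: share of individuals predicted to have the outcome who truly have it
    ppv = tp / (tp + fp)
    # NPV: share of individuals predicted not to have the outcome who truly do not
    npv = tn / (tn + fn)
    return ppv, npv


# Hypothetical counts for illustration only (not from the MSBase cohort)
ppv, npv = predictive_values(tp=40, fp=10, tn=88, fn=12)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Both values depend on how common the outcome is in the tested cohort, so they would not necessarily carry over unchanged to a population with a different proportion of severe disease.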


Data collection:

Participants of European descent were recruited from eight tertiary-referral multiple sclerosis-specialist centres in three countries (Australia, Spain, and Czech Republic) participating in the MSBase Registry. MSBase is an international, prospective, observational, multiple sclerosis clinical outcomes registry, registered with the World Health Organization International Clinical Trials Registry Platform, ID ACTRN12605000455662. Clinical data used to derive phenotypic information were entered by neurologists in or near real time, including participant demographics, disease phenotype, expanded disability status scale (EDSS) scores, relapse information, and disease-modifying therapy use. Clinical assessments occur on average every 6 months.

This study was approved by the Melbourne Health Human Research Ethics Committee, and by institutional review boards at all participating centres. All participants gave written informed consent for participation in the MSBase Registry, together with additional informed consent to participate in genetic research (HREC/13/MH/189 and per local approvals elsewhere).

Data processing:

Peripheral blood samples were taken from participants, and genomic DNA was extracted. Samples were genotyped at the John P. Hussman Institute for Human Genomics (University of Miami, FL, USA) using the Illumina MegaEx BeadChip array, and imputed on the Michigan Imputation Server (Das et al., Nature Genetics 2016; 48: 1284–1287) using the Haplotype Reference Consortium panel. Quality control (QC) and data processing were performed in PLINK 1.9 using standard procedures. Association testing was performed in PLINK 1.9 and 2.0, adjusting for the first 5 principal components and other imbalanced covariates as per our manuscript.
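As a rough illustration of what a covariate-adjusted association test estimates, the sketch below fits a linear model of a quantitative phenotype on allele dosage plus five principal components using ordinary least squares. It is not the PLINK implementation and does not use the study data: the data are simulated, the effect size is arbitrary, and the helper names `solve` and `ols` are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Simulated cohort: dosage in {0, 1, 2}, five principal components, Gaussian noise
rng = random.Random(0)
true_beta = -0.5  # arbitrary effect size chosen for the simulation
rows, y = [], []
for _ in range(500):
    dosage = rng.choice([0, 1, 2])
    pcs = [rng.gauss(0, 1) for _ in range(5)]
    rows.append([1.0, float(dosage)] + pcs)  # intercept, SNP, PC1..PC5
    y.append(true_beta * dosage + 0.3 * pcs[0] + rng.gauss(0, 1))

coef = ols(rows, y)
snp_beta = coef[1]  # per-allele effect, adjusted for the five PCs
print(f"estimated per-allele beta: {snp_beta:.3f}")
```

In a genome-wide scan this fit is repeated for every variant, and the resulting P values are compared against the conventional genome-wide significance threshold of 5 × 10⁻⁸.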

Weighted genetic risk scores were calculated according to De Jager et al. (Lancet Neurol. 2009 Dec;8(12):1111-9), based on directly genotyped SNVs described by the International Multiple Sclerosis Genetics Consortium (Science. 2019 Sep 27;365(6460)).
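The general form of a weighted genetic risk score can be sketched as follows: each variant contributes its risk-allele count multiplied by a weight, here assumed to be the natural log of the published odds ratio. The variant identifiers and odds ratios below are hypothetical placeholders, not values from the cited papers or from this dataset.

```python
import math


def weighted_grs(risk_allele_counts, odds_ratios):
    """Sum of ln(odds ratio) x risk-allele count over all scored variants."""
    if risk_allele_counts.keys() != odds_ratios.keys():
        raise ValueError("genotypes and weights must cover the same variants")
    return sum(
        math.log(odds_ratios[snv]) * count
        for snv, count in risk_allele_counts.items()
    )


# Hypothetical variants and odds ratios, for illustration only
odds_ratios = {"snv_a": 1.20, "snv_b": 1.08, "snv_c": 1.35}
genotype = {"snv_a": 2, "snv_b": 0, "snv_c": 1}  # risk-allele counts (0, 1 or 2)

score = weighted_grs(genotype, odds_ratios)
print(f"weighted genetic risk score: {score:.4f}")
```

Weighting by the log odds ratio means that variants with larger reported effects contribute more to the score than an unweighted count of risk alleles would allow.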

Survival analyses were performed in Stata v17 (StataCorp, College Station, TX) using Cox or Weibull models as appropriate, as noted in the data tables.

Usage notes

Data files can be opened with Excel.


Funding

Multiple Sclerosis Research Australia, Award: 16-0206

Royal Melbourne Hospital, Award: Home Lottery Grant MH2013-055

MSBase Foundation

Monash University

CharityWorks for MS