Comparing exercise with virtual reality gaming on gait and cognition in relapsing-remitting multiple sclerosis: a randomized controlled trial
Data files
Mar 16, 2026 version; files total 40.06 KB
- DisciplineSpecificMetadata.json (8.83 KB)
- MSRCT_ALL_Class__anonymized.csv (6.96 KB)
- MSRCT_ALL_Mean_SD__anonymized.csv (869 B)
- MSRCT_Data_Set__anonymized.csv (8.65 KB)
- README.md (14.75 KB)
Abstract
Background: Exercise and virtual reality gaming may mitigate gait and cognitive deficits in relapsing-remitting multiple sclerosis (RRMS). The main aim was to compare the efficacy of both interventions on gait and cognition in RRMS. Secondary aims were to explore the efficacy of both interventions on serum biomarkers and to explore the predictors of treatment response.
Methods: Forty-eight participants with RRMS were randomized to exercise (n=19), VR (n=19), or wait-list control (n=10) for eight weeks. Primary outcomes were the 10-meter walk test (10MWT) and the Symbol Digit Modalities Test (SDMT). Secondary outcomes included serum levels of neurofilament light chain (NfL), brain-derived neurotrophic factor (BDNF), and insulin-like growth factor-1 (IGF-1). Extreme Gradient Boosting (XGBoost), Random Forest, and logistic regression models were trained to predict treatment response.
Results: The exercise group improved 10MWT performance by 2.41 seconds and increased IGF-1 levels by 100.25 ng/ml, significantly more than the VR and control groups (both p<0.001). The VR group improved on the SDMT by 1.95 points (p=0.001 vs. control; p=0.05 vs. exercise). Both interventions reduced NfL concentrations compared to control (exercise: –2.07 pg/ml; VR: –0.60 pg/ml), with exercise showing a greater reduction than VR (p=0.02). XGBoost demonstrated the highest predictive accuracy (10MWT: 87%; SDMT: 86%). SHapley Additive exPlanations (SHAP) analysis identified baseline IGF-1 and BDNF as top predictors of 10MWT, and baseline CognICA, BDNF, and age as predictors of SDMT performance.
Conclusion: Exercise preferentially improves gait and IGF-1, whereas VR gaming yields modest cognitive gains. Serum biomarkers enhance machine learning prediction of treatment response, supporting a precision rehabilitation approach in RRMS.
Keywords: Multiple Sclerosis, Virtual Reality, Exercise, Gait, Cognition, Machine Learning
Dataset DOI: 10.5061/dryad.x69p8czzx
Description of the data and file structure
1. Study Description
––––––––––––––––––––––––––––––––––––––
This dataset was generated from a single-blinded randomized controlled trial investigating the effects of exercise training and immersive virtual reality (VR) gaming compared with usual care on gait performance, cognitive function, and circulating neurobiological biomarkers in individuals with relapsing-remitting multiple sclerosis (RRMS).
Participants were randomly assigned to one of three groups: exercise intervention, VR gaming intervention, or wait-list control. All outcomes were assessed at baseline (pretest) and immediately after an 8-week intervention period (post).
The data were used for conventional statistical analyses and for supervised machine learning models to predict individual treatment response, supporting personalized rehabilitation strategies in multiple sclerosis.
––––––––––––––––––––––––––––––––––––––
2. Study Design
––––––––––––––––––––––––––––––––––––––
Study type: Randomized controlled trial
Blinding: Outcome assessor blinded
Intervention duration: 8 weeks
Assessment points: Pretest and Post
Groups:
Exercise group (supervised aerobic and resistance training)
Virtual Reality (VR) group (immersive motor-cognitive VR gaming)
Control group (usual care / wait-list)
Total participants: 48
––––––––––––––––––––––––––––––––––––––
3. Participants
––––––––––––––––––––––––––––––––––––––
Diagnosis: Relapsing-remitting multiple sclerosis (2010 McDonald criteria)
Age range: 18–55 years
Disability level: EDSS ≤ 5.0
Disease-modifying therapy stable for ≥ 6 months
All data are fully anonymized. Participant identifiers contain no personal or indirect identifiers.
––––––––––––––––––––––––––––––––––––––
4. File Contents
––––––––––––––––––––––––––––––––––––––
The dataset is provided as tabular CSV files containing:
Participant demographics
Clinical disability scores
Cognitive test scores
Gait, balance, and functional mobility outcomes
Serum biomarker concentrations
Pretest and post values
Calculated change scores (post minus pretest)
––––––––––––––––––––––––––––––––––––––
5. Definition of Treatment Responders
––––––––––––––––––––––––––––––––––––––
Responder labels were defined a priori and used for supervised machine learning analyses.
Gait Responder:
A participant is classified as a gait responder if they demonstrate at least a 20 percent improvement in the 10-metre timed walk test time:
(Pretest − Post) / Pretest × 100 ≥ 20%
Cognitive Responder:
A participant is classified as a cognitive responder if they demonstrate an increase of at least 4 points on the SDMT from pretest to post.
These thresholds correspond to established minimal clinically important differences in multiple sclerosis rehabilitation research.
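The two responder rules above can be expressed as a minimal Python sketch. The function names and default arguments are illustrative assumptions, not taken from the authors' analysis code; only the 20 percent and 4-point thresholds come from the definitions in this section.

```python
def gait_percent_improvement(pretest_s: float, post_s: float) -> float:
    """Percent reduction in 10-metre walk time: (pretest - post) / pretest * 100."""
    if pretest_s <= 0:
        raise ValueError("pretest time must be positive")
    return (pretest_s - post_s) / pretest_s * 100.0


def is_gait_responder(pretest_s: float, post_s: float,
                      threshold_pct: float = 20.0) -> bool:
    """True if walk time improved by at least threshold_pct percent."""
    return gait_percent_improvement(pretest_s, post_s) >= threshold_pct


def is_cognitive_responder(sdmt_pre: float, sdmt_post: float,
                           threshold_points: float = 4.0) -> bool:
    """True if the SDMT score increased by at least threshold_points."""
    return (sdmt_post - sdmt_pre) >= threshold_points


# Made-up illustrative values, not taken from the dataset.
print(is_gait_responder(10.0, 7.5))       # 25 percent faster
print(is_cognitive_responder(40, 43))     # gain of 3 points
```

Note that the gait rule uses pretest minus post, because a shorter walk time is an improvement, whereas the change scores stored in the dataset are post minus pretest.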
––––––––––––––––––––––––––––––––––––––
6. Data Processing Notes
––––––––––––––––––––––––––––––––––––––
Data are provided in raw, non-imputed form
No normalization or scaling has been applied
Change scores are explicitly calculated as post minus pretest
Data are suitable for replication, reanalysis, meta-analysis, and machine learning applications
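Since change scores are defined as post minus pretest, a reuser may want to confirm that the stored "diff" columns match that convention. The following is a minimal pandas sketch of such a check; the helper name `change_score_mismatches` and the tolerance are assumptions, and the column headers follow the variable names listed in this README, which should be verified against the actual CSV header before use.

```python
import pandas as pd


def change_score_mismatches(df, pre_col, post_col, diff_col, atol=1e-6):
    """Return rows where the stored diff differs from post minus pretest."""
    recomputed = df[post_col] - df[pre_col]
    mismatch = (recomputed - df[diff_col]).abs() > atol
    return df.loc[mismatch, [pre_col, post_col, diff_col]]


# Made-up illustrative values, not taken from the dataset.
demo = pd.DataFrame({
    "SDMT pretest": [40, 45, 50],
    "SDMT post": [44, 45, 49],
    "SDMT diff": [4, 0, 1],  # last row is deliberately wrong (should be -1)
})

bad = change_score_mismatches(demo, "SDMT pretest", "SDMT post", "SDMT diff")
print(bad)
```

An empty result means every stored change score agrees with post minus pretest within the tolerance.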
––––––––––––––––––––––––––––––––––––––
7. Trial Registration
––––––––––––––––––––––––––––––––––––––
This randomized controlled trial was registered in the Iranian Registry of Clinical Trials (IRCT).
Registry: Iranian Registry of Clinical Trials (IRCT)
Registration number: IR.SSRI.REC.1398.258
Registry URL: https://www.irct.ir
The Iranian Registry of Clinical Trials (IRCT) is a Primary Registry in the World Health Organization’s International Clinical Trials Registry Platform (WHO ICTRP) network.
WHO ICTRP Primary Registry reference:
https://www.who.int/tools/clinical-trials-registry-platform/network/primary-registries
Because the IRCT website may occasionally be inaccessible from certain locations (including the United States), the WHO ICTRP reference is provided to confirm the registry’s international recognition and status as a WHO Primary Registry.
––––––––––––––––––––––––––––––––––––––
8. Ethical Approval
––––––––––––––––––––––––––––––––––––––
The study was approved by the Royan Institute Ethics Committee and conducted in accordance with the Declaration of Helsinki.
Ethics approval ID: IR.ACECR.ROYAN.REC.1396.98
All participants provided written informed consent.
––––––––––––––––––––––––––––––––––––––
9. Data License
––––––––––––––––––––––––––––––––––––––
This dataset is released under the CC0 Public Domain Dedication. This waiver allows anyone to copy, modify, distribute, and use the data for any purpose without restriction.
––––––––––––––––––––––––––––––––––––––
Files and variables
Files: MSRCT_Data_Set__anonymized.csv, MSRCT_ALL_Mean_SD__anonymized.csv, and MSRCT_ALL_Class__anonymized.csv
Description: These files contain the complete, de-identified participant-level dataset from a single-blind randomized controlled trial investigating the effects of exercise training, immersive virtual-reality (VR) exergaming, and wait-list control on cognitive performance, mobility, balance, flexibility, and serum biomarkers in individuals with relapsing–remitting multiple sclerosis (RRMS).
Data were collected at two time points:
- Pretest (baseline)
- Post (after 8 weeks of intervention)
The dataset includes demographic characteristics, clinical disability scores, blood-based biomarkers (IGF-1, BDNF, NfL), cognitive outcomes (ICA, SDMT), and physical performance measures. All participants provided informed consent, and all data have been anonymized in accordance with ethical approval.
Variables: Each row represents one participant. All measurements are numeric unless otherwise stated.
Identifiers and grouping
- ID – Unique anonymized participant identifier
- group name – Intervention group (exercise, VR, control)
- gender – Biological sex (m, f)
Demographic and clinical characteristics
- age – Age (years)
- EDSS (pretest) – Expanded Disability Status Scale score at baseline
- EDSS (post) – EDSS score post-intervention
- EDSS diff – Change in EDSS score (post − pre)
Serum biomarkers
- IGF-1 (ng/ml) pretest – Insulin-like growth factor-1 at baseline
- IGF-1 (ng/ml) post – IGF-1 after intervention
- IGF-1 diff – Change in IGF-1 (ng/ml)
- BDNF (pg/ml) pretest – Brain-derived neurotrophic factor at baseline
- BDNF (pg/ml) post – BDNF after intervention
- BDNF diff – Change in BDNF (pg/ml)
- NFL (pg/ml) pretest – Serum neurofilament light chain at baseline
- NFL (pg/ml) post – NfL after intervention
- NFL diff – Change in NfL (pg/ml)
Cognitive outcomes
- ICA score pretest – Integrated Cognitive Assessment (CognICA) composite score
- ICA score post – ICA score after intervention
- ICA diff – Change in ICA score
- SDMT pretest – Symbol Digit Modalities Test score (number correct)
- SDMT post – SDMT score after intervention
- SDMT diff – Change in SDMT score
Functional and mobility outcomes
- Timed get up & go test (s) pretest – Timed Up and Go test (seconds)
- Timed get up & go test (s) post – Post-intervention TUG
- Timed get up & go test diff – Change in TUG time (seconds)
- Three minutes step test pretest – Number of steps completed
- Three minutes step test post – Post-intervention score
- Three minutes step test diff – Change in step test score
- 10-metre timed walk test (s) pretest – 10-meter walk test (seconds)
- 10-metre timed walk test (s) post – Post-intervention 10MWT
- 10-metre timed walk test diff – Change in 10MWT time (seconds)
- Standing balance test (s) pretest – Standing balance duration (seconds)
- Standing balance test (s) post – Post-intervention balance time
- Standing balance test diff – Change in balance time (seconds)
- The sit & reach-A test (cm) pretest – Flexibility score (cm)
- The sit & reach-A test (cm) post – Post-intervention score
- The sit & reach-A test diff – Change in flexibility (cm)
File: MSRCT_ALL_Class__anonymized.csv
Description:
This file contains a derived dataset used for classification analyses in the machine-learning component of the study. Each row represents one participant. The variables correspond to baseline measurements that were used as input features for predictive models of treatment response.
Variables:
- class – Binary treatment-response class label used for supervised machine-learning analysis. This variable also indicates the measurement time point for the observation (1 = pretest baseline, 2 = post-intervention).
- ICA score – Integrated Cognitive Assessment (CognICA) composite cognitive score.
- SDMT – Symbol Digit Modalities Test score (number of correct responses).
- Timed get up & go test (s) – Timed Up and Go test result measured in seconds.
- Three minutes step test – Number of steps completed during the 3-minute step test.
- 10-metre timed walk test (s) – Time required to walk 10 meters, measured in seconds.
- Standing balance test (s) – Static standing balance duration measured in seconds.
- The sit & reach-A test (cm) – Flexibility score measured using the sit-and-reach test in centimeters.
File: DisciplineSpecificMetadata.json
Description:
This file contains discipline-specific metadata automatically generated during the Dryad submission process. It provides structured metadata describing the dataset for repository indexing and interoperability. This file is not required for analysis of the dataset itself.
Missing values
No missing values are present in the final dataset.
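The statement above can be checked after loading a file. The following is a minimal pandas sketch; the helper name `missing_value_report` is an assumption, and the inline CSV is made-up data with one deliberately empty cell so that the check has something to report.

```python
import io

import pandas as pd


def missing_value_report(df):
    """Return the count of missing cells for each column that has any."""
    counts = df.isna().sum()
    return counts[counts > 0]


# Made-up illustrative rows, not taken from the dataset.
demo_csv = io.StringIO("ID,age,SDMT pretest\n1,34,40\n2,,45\n")
demo = pd.read_csv(demo_csv)

report = missing_value_report(demo)
print(report)
```

For the deposited files, the same function would be applied to the result of `pd.read_csv("MSRCT_Data_Set__anonymized.csv")`, and an empty report would be consistent with the statement above.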
Code/software
The dataset is provided as CSV files (.csv), which can be viewed using any standard spreadsheet software, including:
- Microsoft Excel (version 2016 or later)
- LibreOffice Calc (open-source, version 7.0 or later)
- Google Sheets (web-based)
No proprietary or custom software is required to view or interpret the raw data.
Analysis software and workflow
Statistical analyses and machine-learning modeling described in the associated manuscript were conducted using Python (version 3.9 or later). The analysis workflow included data preprocessing, feature engineering, model training, validation, and interpretability analysis.
Key Python packages used include:
- NumPy – numerical computations
- Pandas – data manipulation and preprocessing
- Scikit-learn – machine learning models (Logistic Regression, Random Forest) and evaluation
- XGBoost – gradient boosting model for prediction
- SHAP – model interpretability and feature attribution
- SciPy – statistical testing
The machine-learning workflow involved:
- Importing the de-identified dataset from Excel format
- Separating baseline features and outcome variables
- Defining responder vs. non-responder labels based on post–pre changes (as defined in the manuscript)
- Training and validating predictive models using cross-validation
- Interpreting model outputs using SHAP values
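The cross-validation step in the workflow above might look roughly like the following scikit-learn sketch. It is not the authors' code: the features and labels are synthetic, the fold count and hyperparameters are assumptions, and XGBoost and SHAP are omitted to keep the example dependent on scikit-learn and NumPy only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for baseline features and responder labels
# (48 rows to mirror the trial size; values are random, not real data).
rng = np.random.default_rng(0)
n_participants = 48
X = rng.normal(size=(n_participants, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=n_participants) > 0).astype(int)

# Two of the model families named in this README; settings are assumptions.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Stratified folds keep the responder / non-responder ratio similar per fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

cv_accuracy = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, acc in cv_accuracy.items():
    print(f"{name}: mean CV accuracy = {acc:.2f}")
```

With a sample of this size, fold-level accuracy estimates vary considerably, so repeated or nested cross-validation may be worth considering when reanalysing the data.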
No executable scripts are required to reuse the dataset itself. The dataset is self-contained and fully documented via the accompanying README file.
Access information
The dataset was originally generated by the authors as part of a randomized controlled trial and was not derived from any external or third-party data sources.
The complete, de-identified dataset is made publicly available exclusively through the Dryad Digital Repository in accordance with the PLOS Digital Health data availability policy.
No other publicly accessible locations currently host this dataset.
All data are released under the CC0 Public Domain Dedication, allowing unrestricted reuse, distribution, and reproduction.
Human subjects data
All data included in this dataset were collected from human participants following approval by the relevant institutional ethics committee.
Written informed consent was obtained from all participants, including explicit consent for the use and publication of de-identified research data in the public domain.
The dataset has been fully anonymized prior to deposition. All direct identifiers (such as names, contact information, national identification numbers, and exact dates of birth) were removed. Participants are represented only by randomly assigned study IDs.
The dataset contains only non-identifiable demographic variables (e.g., age in years, sex), clinical measures, biochemical markers, and functional test outcomes.
No personally identifiable information (PII) is included in this dataset, and the risk of re-identification is minimal.
