Full list of potential drug repurposing candidates for rare neuro-muscular disorders
Data files
May 05, 2026 version files (618.10 KB)
- Drug-Repurposing-Compound-Full-List.tsv (612.43 KB)
- Known-drug-indications-for-NMDs.tsv (1.95 KB)
- README.md (3.73 KB)
Abstract
Drug repurposing is particularly challenging yet essential for rare diseases, where limited patient populations and scarce biomedical evidence hinder traditional discovery pipelines. This work presents a holistic machine learning approach for drug-disease link prediction that leverages multiple heterogeneous sources, including biomedical literature, structured databases, and textual descriptions of diseases. Focusing on seven rare neuro-muscular disorders, we construct a biomedical knowledge graph from literature and open databases and use it to evaluate a suite of rule-based, graph neural network, and path-encoding models. An ensemble of the best-performing methods, further enriched with disease similarity features derived from text-based embeddings, is used to generate candidate treatments for each disorder. Experimental results show that an established graph neural network approach (CompGCN) and a path-encoding method (the Prime Adjacency Matrix framework) outperform the other approaches on metrics such as Mean Reciprocal Rank (MRR). The ensemble of the best-performing methods further improves those metrics, reaching MRR = 0.3145. A manual validation of the top-ranked drugs by rare disease experts indicates high precision (> 50%) for drugs that potentially treat a rare disorder or its symptoms. The scarcity of publications and known drug indications for rare neuro-muscular disorders poses serious challenges for identifying potential therapies and symptom relievers. The ensemble predictor combines rule-based, graph neural network, and path-encoding techniques to improve drug repurposing prediction performance on a biomedical knowledge graph created from open data. Expert evaluation indicates that an ensemble of knowledge graph link prediction methods can produce promising repurposing hypotheses for disorders lacking any approved therapies.
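For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how Mean Reciprocal Rank is commonly computed. The function name and the input format (one 1-based rank per test query) are assumptions made for illustration; the exact ranking protocol used in the paper (for example, filtered versus raw ranks) is not stated here.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries.

    `ranks` holds, for each test drug-disease pair, the 1-based position
    at which the correct drug appears in the model's ranked list.
    (Hypothetical helper; not taken from the authors' code.)
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    return sum(1.0 / r for r in ranks) / len(ranks)
```

A value of 1.0 would mean the correct drug is always ranked first; lower values mean the correct drug tends to appear further down the list.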
This repository contains the data sources accompanying the work:
Identifying Drug Repurposing Candidates for Rare Neuro-muscular Disorders Using Different AI Methods on the Literature Knowledge Graph (2025)
Papadimas, F., Svolou, S., Bougiatiotis, K., Aisopos, F., Krithara, A., and Paliouras, G.
This work was conducted in the context of the SIMPATHIC project, funded by the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 101080249).
Drug repurposing is a critical yet challenging task for rare diseases, where limited patient populations and sparse curated biomedical evidence hinder traditional drug discovery pipelines. This project presents a computational framework for drug–disease link prediction, integrating heterogeneous biomedical evidence into a unified literature-based knowledge graph.
Focusing on seven rare neurological, neurometabolic, and neuromuscular disorders:
1. SpinoCerebellar Ataxia type 3 (SCA3)
2. Congenital NeuroTransmitter defects (CNT)
3. Pyridoxine Dependent Epilepsy (PDE)
4. Congenital Disorder of Glycosylation (PMM2)
5. Zellweger Spectrum Disorders (ZSD)
6. Myotonic Dystrophy type 1 (DM1)
7. Congenital Myasthenic Syndrome (CMS)
we construct a disease-centered biomedical knowledge graph and evaluate multiple artificial intelligence approaches.
Description of the data and file structure
The following two files, which are used by the link prediction approaches, are shared:
- Drug-Repurposing-Compound-Full-List.tsv:
A long list of all approved and non-approved drugs and compounds that can be considered as candidates for the aforementioned disorders.
The following information is provided for each drug: Drug Name, Synonyms, CAS Number, Drug Target, Drug Pathway(s), Research Area, and related Clinical Information.
Drugs with no synonyms have "n/a" as the value in the Synonyms column.
- Known-drug-indications-for-NMDs.tsv:
A short list of drug indications reported for the aforementioned disorders, used as ground truth for our prediction methods.
This list includes disorder and drug names and their UMLS CUI identifiers.
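The compound list can be read with any tab-separated parser. The sketch below uses Python's standard `csv` module and turns the "n/a" placeholder into an empty list. The function name, the exact header string `Synonyms`, and the `;` separator between synonyms are assumptions for illustration; check the header row of the actual file before relying on them.

```python
import csv
import io


def load_compound_list(handle, synonyms_column="Synonyms", separator=";"):
    """Read a tab-separated compound list into a list of dicts.

    Assumptions (not confirmed by the dataset description):
    - the synonyms header is exactly `synonyms_column`;
    - multiple synonyms are separated by `separator`.
    """
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        raw = (row.get(synonyms_column) or "").strip()
        if raw.lower() in ("", "n/a"):
            # "n/a" marks a drug with no synonyms.
            row[synonyms_column] = []
        else:
            row[synonyms_column] = [s.strip() for s in raw.split(separator) if s.strip()]
        rows.append(row)
    return rows
```

With the shared file this might be called as `load_compound_list(open("Drug-Repurposing-Compound-Full-List.tsv", newline="", encoding="utf-8"))`, where the UTF-8 encoding is also an assumption.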
Sharing/Access information
The known drug indications were collected from the following online databases:
- TTD (Zhou Y, Zhang Y, Zhao D, Yu X, Shen X, Zhou Y, et al. TTD: Therapeutic Target Database describing target druggability information. Nucleic acids research. 2024;52(D1):D1465-77.)
- DrugCentral (Avram S, Wilson TB, Curpan R, Halip L, Borota A, Bora A, et al. DrugCentral 2023 extends human clinical data and integrates veterinary drugs. Nucleic acids research. 2023;51(D1):D1276-87.)
- Open Targets (Buniello A, Suveges D, Cruz-Castillo C, Llinares MB, Cornu H, Lopez I, et al. Open Targets Platform: facilitating therapeutic hypotheses building in drug discovery. Nucleic acids research. 2025;53(D1):D1467-75.)
- DrugBank (Knox C, Wilson M, Klinger CM, Franklin M, Oler E, Wilson A, et al. DrugBank 6.0: the DrugBank knowledgebase for 2024. Nucleic acids research. 2024;52(D1):D1265-75.)
Code/Software
Python 3 was used to implement all models presented in this work, except for the Path Analysis feature extraction, which relied on a Java-based implementation.
The GNN-based models, R-GCN and CompGCN, were implemented using the PyTorch Geometric library, with CompGCN additionally employing PyTorch.
The code for all methods is available at the following link: https://github.com/fotais/simpathic-computational-drug-repurposing
