Dryad

Data from: Inferring new relations between medical entities using literature curated term co-occurrences

Cite this dataset

Spiro, Adam; Fernández García, Jonatan; Yanover, Chen (2019). Data from: Inferring new relations between medical entities using literature curated term co-occurrences [Dataset]. Dryad. https://doi.org/10.5061/dryad.j6n470c

Abstract

Objectives: Identifying new relations between medical entities, such as drugs, diseases, and side effects, is typically a resource-intensive task, involving experimentation and clinical trials. The increased availability of related data and curated knowledge enables a computational approach to this task, notably by training models to predict likely relations. Such models rely on meaningful representations of the medical entities being studied. We propose a generic feature vector representation that leverages co-occurrences of medical terms, linked with PubMed citations.

Materials and Methods: We demonstrate the usefulness of the proposed representation by inferring two types of relations: a drug causes a side effect, and a drug treats an indication. To predict these relations and assess their effectiveness, we applied two modeling approaches: multi-task modeling using neural networks, and single-task modeling based on gradient-boosting machines and logistic regression.

Results: These trained models, which predict either side effects or indications, obtained significantly better results than baseline models that use a single direct co-occurrence feature. The results demonstrate the advantage of a comprehensive representation.

Discussion: Selecting the appropriate representation has an immense impact on the predictive performance of machine learning models. Our proposed representation is powerful, as it spans multiple medical domains and can be used to predict a wide range of relation types.

Conclusion: The discovery of new relations between various medical entities can be translated into meaningful insights, for example, related to drug development or disease understanding. Our representation of medical entities can be used to train models that predict such relations, thus accelerating healthcare-related discoveries.

Usage notes