Dryad

Medical publications with information as to whether a publication reports a randomized controlled trial and/or if it covers an oncology topic

Data files

Jul 13, 2024 version files: 5.21 MB

Abstract

Background:

Most tools that automatically extract information from medical publications are domain-agnostic and process publications from any field. However, retrieving only trials from a dedicated field could have advantages for further processing of the data.

Dataset collection:

A random sample of 900 publications from seven major journals (British Medical Journal, JAMA, JAMA Oncology, Journal of Clinical Oncology, Lancet, Lancet Oncology, New England Journal of Medicine) published between 2010 and 2022 was annotated. Publications that described randomized controlled trials (RCTs) received the label “RCT”. Publications that covered oncological topics received the label “ONCOLOGY”. Publications that fulfilled both criteria were assigned both labels. Publications that neither described RCTs nor covered oncology topics were assigned no label. One hundred randomly sampled publications from the New England Journal of Medicine were used as the unseen test set because the journal publishes both oncology and non-oncology articles.
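The labeling rule described above amounts to a two-flag, multi-label scheme with four possible outcomes. A minimal sketch in Python follows; the function name `assign_labels` and the two constants are hypothetical and not part of the dataset, and the exact label strings should be checked against the CSV file before use.

```python
from typing import List

# Assumed label strings; verify the exact spelling used in the CSV file.
RCT_LABEL = "RCT"
ONCOLOGY_LABEL = "ONCOLOGY"


def assign_labels(is_rct: bool, is_oncology: bool) -> List[str]:
    """Return the labels a publication would receive under the stated rule.

    A publication can receive both labels, one label, or no label at all.
    """
    labels: List[str] = []
    if is_rct:
        labels.append(RCT_LABEL)
    if is_oncology:
        labels.append(ONCOLOGY_LABEL)
    return labels
```

Because the two criteria are independent, a classifier trained on this data would typically treat the task as multi-label rather than as a single four-way choice.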

Data properties:

Each trial is a row in the CSV file. For each trial, there is a DOI, a publication date, a title, an abstract, the abstract sections (introduction, methods, results, conclusion), several tags associated with the annotation process (text, _input_hash, _task_hash, options, _view_id, config, accept, answer, _timestamp, _annotator_id, _session_id), and the assigned labels (answer).
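As a rough illustration of working with this layout, the sketch below tallies how many rows carry each label combination using only the Python standard library. Several things here are assumptions rather than facts about the dataset: the helper name `count_label_combinations`, the label strings, the idea that the label cell contains those strings somewhere in its text, and the placeholder rows, which are invented purely to make the example runnable.

```python
import csv
import io
from collections import Counter


def count_label_combinations(csv_file, label_column="answer",
                             label_names=("RCT", "ONCOLOGY")):
    """Tally rows by the combination of labels found in `label_column`."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        cell = row.get(label_column) or ""
        # Assumption: the cell contains the label strings somewhere in its
        # text (for example as a serialized list). Substring matching avoids
        # guessing the exact serialization format.
        labels = tuple(name for name in label_names if name in cell)
        counts[labels] += 1
    return counts


# Invented placeholder rows, NOT taken from the dataset.
sample = io.StringIO(
    "doi,title,answer\n"
    "10.0000/placeholder-1,Placeholder title A,\"['RCT', 'ONCOLOGY']\"\n"
    "10.0000/placeholder-2,Placeholder title B,['RCT']\n"
    "10.0000/placeholder-3,Placeholder title C,[]\n"
)
counts = count_label_combinations(sample)
```

To run this against the downloaded file, the in-memory sample can be replaced with a file handle opened with `newline=""` and an explicit encoding, and `label_column` or `label_names` adjusted if the actual column contents differ from what is assumed here.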