Clinical case reports (CCRs) provide an important means of sharing clinical experiences about atypical disease phenotypes and new therapies. However, published case reports contain largely unstructured and heterogeneous clinical data, which makes relevant information difficult to mine. Current indexing approaches generally capture document-level features and have not been designed specifically for CCRs. To address this gap, we developed a standardized metadata template and identified text corresponding to medical concepts within 3,100 curated CCRs spanning 15 disease groups, including more than 750 reports of rare diseases. We also prepared a metadata subset for reports of selected mitochondrial diseases and assigned ICD-10 diagnostic codes to each report. The resulting resource, Metadata Acquired from Clinical Case Reports (MACCRs), contains text associated with high-level clinical concepts, including demographics, disease presentation, treatments, and outcomes, for each report. Our template and MACCR set make CCRs more findable, accessible, interoperable, and reusable (FAIR) while serving as valuable resources for key user groups, including researchers, physician investigators, clinicians, data scientists, and those shaping government policies for clinical trials.
MACCR Supplementary File 1
This GIF animation visualizes the geographic distribution of cases used in the assembly of the MACCR set. Darker colors indicate that more clinical case reports originating from a location contributed to the set; location was determined by the institutional affiliations of the authors. Each animation frame represents a different disease group.
MACCR_Supplementary_File_1.gif
MACCR_citations
This BibTeX file contains citations for the source clinical case reports from which metadata were extracted in the assembly of the MACCR set. No abstract text is included.
MACCR_RMD_ICD10_Categories
This file contains a set of scores indicating the presence of ICD-10-CM codes, grouped into categories, as determined by a panel of domain experts reading clinical case reports that describe presentations of rare mitochondrial diseases. These reports are a subset of those used in the assembly of the MACCR set. Each row represents a single report, and each column contains a value of 0 (no material corresponding to any code in the category named in the header was observed) or 1 (material corresponding to at least one code in that category was described in the report text). Reports are identified by their PubMed IDs.
MACCR_RMD_ICD10
This file contains a set of scores indicating the presence of ICD-10-CM codes, as determined by a panel of domain experts reading clinical case reports that describe presentations of rare mitochondrial diseases. These reports are a subset of those used in the assembly of the MACCR set. Each row represents a single report, and each column contains a value of 0 (material corresponding to the code named in the header was not observed) or 1 (material corresponding to that code was described in the report text). Reports are identified by their PubMed IDs.
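The two ICD-10-CM files are related: a category is scored 1 for a report whenever at least one code in that category is scored 1. The Python sketch below illustrates that roll-up on a tiny inline table. It assumes a tab-delimited layout with a `PMID` identifier column, and the sample PubMed IDs, the codes shown, and the `CATEGORY_OF` mapping are all hypothetical placeholders; the actual headers, delimiter, and category grouping are those defined by the data set itself.

```python
import csv
import io

# Hypothetical excerpt of a code-level score table. The "PMID" column name,
# tab delimiter, IDs, and codes are illustrative assumptions only.
SAMPLE = (
    "PMID\tE88.41\tG40.909\tH47.20\n"
    "10000001\t1\t0\t1\n"
    "10000002\t0\t1\t0\n"
)

# Hypothetical code-to-category mapping; the real grouping used for the
# category-level file is defined by the data set authors, not by this sketch.
CATEGORY_OF = {
    "E88.41": "Metabolic",
    "G40.909": "Nervous system",
    "H47.20": "Eye",
}


def load_scores(text, id_column="PMID", delimiter="\t"):
    """Parse a 0/1 score table into {report_id: {code: score}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    scores = {}
    for row in reader:
        report_id = row.pop(id_column)
        scores[report_id] = {code: int(value) for code, value in row.items()}
    return scores


def to_categories(code_scores, category_of):
    """Collapse code-level scores: a category is 1 if any of its codes is 1."""
    result = {}
    for report_id, codes in code_scores.items():
        # Start every category at 0, then flip to 1 on any observed code.
        cats = {cat: 0 for cat in set(category_of.values())}
        for code, value in codes.items():
            if value == 1:
                cats[category_of[code]] = 1
        result[report_id] = cats
    return result


categories = to_categories(load_scores(SAMPLE), CATEGORY_OF)
```

Using a logical OR (rather than a count) mirrors the 0/1 definition given for the category-level file; a count per category would require a different scoring scheme than the one described.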
MACCR_entities.tsv
This file contains named entities (MeSH descriptors and SNOMED CT terms) identified within the metadata records in the MACCR set. Terms up to three words in length were identified. Terms are grouped by the field of the primary MACCR file in which they were identified; the field headings in this file are identical to those in the primary file.
MeSH terms correspond to the 2018 version of MeSH.
Please note that the U.S. National Library of Medicine is the creator, maintainer, and provider of MeSH. No proprietary rights to any MeSH content are claimed.
TEMPLATE
This Excel spreadsheet contains a template for extracting metadata from the text of clinical case reports.
MACCR_mesh.tsv
This file contains the MeSH descriptors associated with all source clinical case reports used in the assembly of the MACCR set. Each row contains a single descriptor, followed by its single-letter category. Each descriptor appears only once. Modifiers are not included. Terms correspond to the 2018 version of MeSH.
Please note that the U.S. National Library of Medicine is the creator, maintainer, and provider of MeSH. No proprietary rights to any MeSH content are claimed.
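Because each row of MACCR_mesh.tsv pairs one unique descriptor with its single-letter category, a per-category tally is a simple count over the second column. The Python sketch below assumes the file is tab-delimited with no header row; the sample rows are illustrative and are not taken from the actual file.

```python
import csv
import io
from collections import Counter

# Illustrative rows in the described layout (descriptor, then single-letter
# category). Assumed: tab-delimited, no header row. Not actual file content.
SAMPLE = (
    "MELAS Syndrome\tC\n"
    "Mitochondrial Diseases\tC\n"
    "Adult\tM\n"
    "Lactic Acid\tD\n"
)


def count_by_category(text):
    """Count descriptors per single-letter MeSH category."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    # Skip malformed rows that lack a category column.
    return Counter(row[1] for row in reader if len(row) >= 2)


counts = count_by_category(SAMPLE)
```

Since descriptors are not repeated in this file, the counts equal the number of distinct descriptors per category; no de-duplication step is needed.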
MACCRs.tsv
This file contains 3,100 sets of metadata extracted from clinical case reports. Each metadata record includes information identifying the source report, text corresponding to high-level medical concepts, and funding details.
README
This file contains documentation for the MACCR set and its assembly. This documentation is not intended to replace the detailed manuscript describing the set's creation and intended use, but it provides a general overview of its files and formats.
Metadata Extraction Guide
This file provides a guide to the process performed in assembly of the Metadata Acquired from Clinical Case Reports (MACCR) data set.
Metadata_Extraction_Guide.pdf
MACCR File Guide
This file provides documentation for the primary MACCR metadata record file, including full descriptions of all data fields and their formats.
MACCR_File_Guide.pdf