Agreement rate data set for GENETEX manuscript

Published Sep 27, 2021 on Dryad. https://doi.org/10.5061/dryad.dbrv15f25

Data files

Sep 27, 2021 version files: 66.64 KB

GENETEX_Agreement_Rates.csv (47 KB)
GENETEX_README.docx (19.64 KB)

Abstract

Objectives: Clinico-genomic data (CGD) acquired through routine clinical practice have the potential to improve our understanding of clinical oncology. However, these data often reside in heterogeneous, semi-structured sources, which prolongs time to analysis.
Materials and Methods: We created GENETEX, an R package and Shiny application for text mining genomic reports from the electronic health record (EHR) and importing the results directly into REDCap®.
Results: GENETEX facilitates the abstraction of CGD from the EHR and streamlines the capture of structured data in REDCap®. Its functions include natural language processing of key genomic information, transformation of semi-structured data into structured data, and importation into REDCap®. When evaluated against manual abstraction, GENETEX had >99% agreement and captured CGD in approximately one-fifth of the time.
Conclusions: GENETEX is freely available under the Massachusetts Institute of Technology (MIT) license and can be obtained from GitHub. GENETEX is executed in R and deployed as a Shiny application for non-R users. It produces high-fidelity abstraction of CGD in a fraction of the time required for manual abstraction.
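
For readers unfamiliar with the metric, an agreement rate of the kind reported in the Results can be read as the fraction of abstracted fields for which two abstraction methods record the same value. The sketch below is illustrative only: the function name `agreement_rate` and the example values are hypothetical, are not taken from GENETEX_Agreement_Rates.csv, and may not match how the manuscript computed its figure.

```python
def agreement_rate(manual, automated):
    """Return the fraction of fields where both abstractions give the same value."""
    if len(manual) != len(automated):
        raise ValueError("both abstractions must cover the same fields")
    if not manual:
        raise ValueError("no fields to compare")
    # Count position-by-position matches between the two abstractions.
    matches = sum(m == a for m, a in zip(manual, automated))
    return matches / len(manual)


# Hypothetical field values for illustration (not from the data set).
manual = ["EGFR", "L858R", "KRAS", "G12C"]
automated = ["EGFR", "L858R", "KRAS", "G12D"]

rate = agreement_rate(manual, automated)
print(f"Agreement: {rate:.0%}")
```

In this toy comparison three of four fields match. The column layout of the actual CSV is described in GENETEX_README.docx and should be consulted before adapting this sketch.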