Mapping between Human Phenotype Ontology and phecode terminologies

Cite this dataset

McArthur, Evonne; Capra, John (2023). Mapping between Human Phenotype Ontology and phecode terminologies [Dataset]. Dryad. https://doi.org/10.7272/Q6H70D20

Abstract

This biomedical data repository contains a mapping between the Human Phenotype Ontology (HPO) and phecodes, which are curated groupings of International Classification of Diseases (ICD) codes. These data correspond to the mappings published and evaluated in the JAMIA Open manuscript, "Linking rare and common disease vocabularies by mapping between the Human Phenotype Ontology and phecodes." The mapping was created using a variety of data sources and methods, including text matching, the National Library of Medicine's Unified Medical Language System (UMLS), Wikipedia, SORTA, and PheMap. It includes 38,950 links, and the files let users tailor the HPO-phecode links with a variety of filters for diverse applications across the spectrum of monogenic to polygenic diseases. Other intermediate files and the most up-to-date mappings can be found at the "phecode-HPO-map" GitHub repository (https://github.com/emcarthur/phecode-HPO-map/).

Methods

The map between phecodes and HPO terms is constructed using multiple types of evidence, including string or sub-string match, UMLS, SORTA, WikiMedMap, and PheMap. The HPO terms are from the 2022-04-14 release, and the phecodes are from version 1.2, available in the PheWAS catalog (https://phewascatalog.org/). Mappings are also replicated with Phecode X, which has increased granularity and coverage of terms related to pregnancy, congenital anomalies, and neonatology. Further technical details of the integration process can be found in the manuscript methods. The code used to process, plot, and create the data is in the "phecode-HPO-map" GitHub repository (https://github.com/emcarthur/phecode-HPO-map/).
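As a rough illustration of the string and sub-string match evidence type named above, the sketch below classifies a pair of term labels as an exact match, a sub-string match, or no match after simple normalization. It is not the authors' implementation: the function names, the normalization rules, and the example labels are assumptions made for illustration, and the actual procedure is described in the manuscript methods and the repository code.

```python
def normalize(label):
    """Lowercase a label, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(label.lower().replace("-", " ").split())


def match_type(hpo_label, phecode_label):
    """Return 'exact', 'substring', or None for a pair of term labels.

    Hypothetical helper: the normalization and matching rules here are
    illustrative assumptions, not the published method.
    """
    a = normalize(hpo_label)
    b = normalize(phecode_label)
    if a == b:
        return "exact"
    if a in b or b in a:
        return "substring"
    return None


# Example labels are made up for illustration, not taken from the dataset.
exact = match_type("Atrial fibrillation", "atrial  fibrillation")
partial = match_type("Asthma", "Asthma with exacerbation")
unrelated = match_type("Seizure", "Hypertension")
```

Plain sub-string matching of this kind can over-match short labels, which is one reason to combine it with the other evidence sources listed above.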

Usage notes

Files are tab-delimited text files that can be opened in any text editor or spreadsheet software. Usage examples with dataframes in Python, along with a flowchart recommending mappings for specific needs, are available in the "phecode-HPO-map" GitHub repository (https://github.com/emcarthur/phecode-HPO-map/).
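A minimal sketch of reading a tab-delimited mapping file into a pandas dataframe and filtering it might look like the following. The column names (`hpo_id`, `phecode`, `n_evidence`) and the toy rows are invented placeholders, not the real file contents; check the actual file headers and the repository examples before adapting it.

```python
import io

import pandas as pd

# Toy stand-in for one of the tab-delimited files. The column names and
# rows are invented for illustration only.
TOY_TSV = (
    "hpo_id\tphecode\tn_evidence\n"
    "HP:0000001\t001.1\t3\n"
    "HP:0000002\t002\t1\n"
    "HP:0000003\t001.1\t2\n"
)


def load_mapping(source):
    """Read a tab-delimited mapping file (path or file-like object).

    dtype=str keeps codes such as "002" from being parsed as numbers,
    which would drop leading or trailing zeros.
    """
    return pd.read_csv(source, sep="\t", dtype=str)


def filter_by_evidence(df, min_evidence, column="n_evidence"):
    """Keep rows whose (hypothetical) evidence count meets a threshold."""
    counts = pd.to_numeric(df[column])
    return df[counts >= min_evidence].reset_index(drop=True)


mapping = load_mapping(io.StringIO(TOY_TSV))
strict = filter_by_evidence(mapping, min_evidence=2)
print(strict)
```

To use a downloaded file instead of the toy table, pass its path to `load_mapping` in place of the `io.StringIO` object.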

Funding

National Human Genome Research Institute, Award: F30HG011200

National Institute of General Medical Sciences, Award: T32GM007347

National Institute of General Medical Sciences, Award: R35GM127087