Longitudinal trends of EHR concepts in pediatric patients

Cite this dataset

Giangreco, Nicholas (2022). Longitudinal trends of EHR concepts in pediatric patients [Dataset]. Dryad.


The longitudinal nature of the data motivated temporal trend identification in the pediatric EHR datatypes. Over the past three decades (1980-2018), we identified and quantified the temporal trend of 16,460 EHR concepts across measurement, visit, diagnosis, drug, and procedure datatypes. 


See the Methods of the associated JAMIA Open manuscript. 

We defined trends for clinical concepts per EHR datatype per year across participants. We first calculated a standardized z-score, (x - mu)/sd, where x was the percent of a concept, that is, the number of participants with a recorded concept within a year and for a datatype out of the total number of concepts for that datatype; mu was the average percent across concepts for a year; and sd was the sample standard deviation of the percent across concepts.

We then fit a linear model to estimate the association of concept z-scores with time. We quantified the slope and R squared coefficient between the z-score and year, across all years where EHR data were provided. This generated a beta coefficient for each year representing a trend in clinical concepts, relative to other EHR concepts, recorded in participants' EHRs. Significant trends were defined as those with a linear model beta coefficient greater than 0, a beta coefficient confidence interval not containing the null association, and an R squared coefficient between the date and z-scores greater than 0.8.

We performed summarization, visualization, and statistical analyses using R packages including tidyverse and data.table and Python 3 libraries NumPy, Matplotlib, pandas, scikit-learn, and seaborn.
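The z-score and linear-model steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `concept_zscores` and `concept_trend`, the use of `scipy.stats.linregress`, the 95% t-based confidence interval, the choice of 0 as the null slope, and the toy input values are all assumptions made for the example.

```python
import numpy as np
from scipy import stats


def concept_zscores(percent):
    """Standardize a (concepts x years) array of percents within each year.

    For every year (column), subtract the mean across concepts and divide
    by the sample standard deviation across concepts: (x - mu) / sd.
    """
    percent = np.asarray(percent, dtype=float)
    mu = percent.mean(axis=0)
    sd = percent.std(axis=0, ddof=1)  # ddof=1 gives the sample standard deviation
    return (percent - mu) / sd


def concept_trend(years, z, alpha=0.05, r2_min=0.8):
    """Regress one concept's z-scores on year and apply the three criteria.

    Assumption: the null association is a slope of 0, and the confidence
    interval is the usual t-based interval for a simple regression slope.
    """
    fit = stats.linregress(years, z)
    t_crit = stats.t.ppf(1 - alpha / 2, df=len(years) - 2)
    lwr = fit.slope - t_crit * fit.stderr
    upr = fit.slope + t_crit * fit.stderr
    r2 = fit.rvalue ** 2
    significant = bool(fit.slope > 0 and (lwr > 0 or upr < 0) and r2 > r2_min)
    return {
        "beta": float(fit.slope),
        "lwr": float(lwr),
        "upr": float(upr),
        "r2": float(r2),
        "significant": significant,
    }


# Toy input, invented for illustration: three concepts over ten years.
years = np.arange(2000, 2010)
t = years - years[0]
percent = np.vstack([
    1.0 + 0.5 * t,        # a concept recorded more often over time
    np.full(t.shape, 5.0),  # a concept recorded at a constant rate
    12.0 - 0.3 * t,       # a concept recorded less often over time
])

z = concept_zscores(percent)
for row in z:
    print(concept_trend(years, row))
```

Because each year is standardized across concepts, a concept's z-score can rise even when its raw percent is flat, provided other concepts decline; the slope therefore describes a trend relative to the other concepts in the same datatype.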

Usage notes

Field : Description

datatype : The OMOP-defined EHR domain.

concept_id : The OMOP-defined concept identifier. 

concept_name : The OMOP-defined concept name.

lwr : The lower bound of the 95% confidence interval of the odds ratio quantified by the linear model.

odds : The odds ratio of the EHR concept z-score across three decades (year units) quantified by the linear model.

upr : The upper bound of the 95% confidence interval of the odds ratio quantified by the linear model.
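As a rough illustration of how these fields might be used together, the sketch below keeps the rows whose [lwr, upr] interval excludes a chosen null value. The helper name `interval_excludes_null` and every row in the toy table are invented for the example and are not taken from the dataset. The null value is passed explicitly: 1 is the conventional null for a ratio, whereas 0 would apply if the columns were on the slope scale.

```python
import pandas as pd


def interval_excludes_null(trends: pd.DataFrame, null_value: float) -> pd.DataFrame:
    """Keep rows whose [lwr, upr] interval lies entirely on one side of null_value."""
    excludes = (trends["lwr"] > null_value) | (trends["upr"] < null_value)
    return trends.loc[excludes].sort_values("odds", ascending=False)


# Placeholder rows, invented for illustration only (not from the dataset).
toy = pd.DataFrame({
    "datatype": ["drug", "procedure", "measurement"],
    "concept_id": [1, 2, 3],
    "concept_name": ["example concept A", "example concept B", "example concept C"],
    "lwr": [1.02, 0.95, 0.80],
    "odds": [1.10, 1.01, 0.88],
    "upr": [1.18, 1.07, 0.96],
})

kept = interval_excludes_null(toy, null_value=1.0)
print(kept[["datatype", "concept_name", "odds"]])
```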


National Institutes of Health