
Data from: Metabolomics - Emory Cardiovascular Biobank

Cite this dataset

Mehta, Anurag; Liu, Chang; Uppal, Karan; Quyyumi, Arshed (2020). Data from: Metabolomics - Emory Cardiovascular Biobank [Dataset]. Dryad.


Untargeted high-resolution plasma metabolomic profiling among patients with coronary artery disease. Patients were recruited from the Emory Cardiovascular Biobank into independent discovery and validation cohorts.


Dataset collected as part of the Emory Cardiovascular Biobank, a prospective registry of patients with coronary artery disease.

Usage notes

This readme file was generated on 2020-08-20 by Anurag Mehta and Chang Liu.


1. Title of Dataset: Emory Cardiovascular Biobank Metabolomics 

2. Author Information
    A. Principal Investigator Contact Information
        Name: Arshed A. Quyyumi, MD
        Institution: Emory University School of Medicine
        Address: 1462 Clifton Road NE, Suite 507, Atlanta, Georgia 30329

    B. Associate or Co-investigator Contact Information
        Name: Anurag Mehta, MD
        Institution: Emory University School of Medicine
        Address: 1462 Clifton Road NE, Suite 513, Atlanta, Georgia 30329

    C. Alternate Contact Information
        Name: Chang Liu
        Institution: Emory University School of Medicine
        Address: 1462 Clifton Road NE, Atlanta, Georgia 30329

3. Date of data collection: 2004 to 2016

4. Information about funding sources that supported the collection of the data: National Heart, Lung, and Blood Institute grant 1P20HL113451

1. Links to publications that cite or use the data:

2. Links to other publicly accessible locations of the data:

3. Links/relationships to ancillary data sets: none

4. Was data derived from another source? no

5. Recommended citation for this dataset: Mehta A, Liu C, Quyyumi AA. Emory Cardiovascular Biobank Metabolomics. 2020


1. File List: Analysis_data_first_cohort.csv; Analysis_data_second_cohort.csv

2. Relationship between files: data of two separate cohorts

3. Additional related data collected that was not included in the current data package: none

4. Are there multiple versions of the dataset? no

METHODOLOGICAL INFORMATION: please see description in materials and methods section of the manuscript

DATA-SPECIFIC INFORMATION FOR: Analysis_data_first_cohort.csv

1. Number of variables: 6796

2. Number of cases/rows: 454

3. Variable List: 
GENEID: subject IDs
timetodeath3yr: time to death event or censoring at three years
death3yr: death event at three years, 1=yes, 0=no
age: age in years
male: male gender, 1=male, 0=female
BlackRace: black race, 1=black, 0=non-black
batch: batch of metabolomics profiling
Strokehx: history of stroke, 1=yes, 0=no
CABGhx: prior CABG, 1=yes, 0=no
PVDhx: peripheral artery disease, 1=yes, 0=no
eGFR_max120_lt60: estimated glomerular filtration rate less than 60 mL/min/1.73 m², 1=yes, 0=no
curr_smoking: current smoking, 1=yes, 0=no
HFhx: heart failure history, 1=yes, 0=no
HTN: hypertension, 1=yes, 0=no
DM: diabetes, 1=yes, 0=no
mzXX_tXX: intensities of metabolic features

4. Missing data codes: NA
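
The variable list above can be turned into a small loading routine. The following is a minimal Python sketch, not the authors' analysis code. It assumes that every metabolic feature column name starts with "mz" (following the mzXX_tXX pattern) and that all feature intensities are numeric. The function name load_cohort and the demo rows are hypothetical; the demo values are made up so the sketch runs without the data file.

```python
import csv
import io

# Clinical and bookkeeping columns named in the variable list.
CLINICAL_COLUMNS = [
    "GENEID", "timetodeath3yr", "death3yr", "age", "male", "BlackRace",
    "batch", "Strokehx", "CABGhx", "PVDhx", "eGFR_max120_lt60",
    "curr_smoking", "HFhx", "HTN", "DM",
]


def load_cohort(handle):
    """Read a cohort CSV from an open text handle.

    Returns (clinical_rows, feature_rows): two lists of dicts, one dict
    per subject. Cells coded "NA" become None; feature intensities are
    converted to float, clinical values are left as strings.
    """
    clinical_rows, feature_rows = [], []
    for row in csv.DictReader(handle):
        clinical, features = {}, {}
        for name, value in row.items():
            value = None if value == "NA" else value
            if name in CLINICAL_COLUMNS:
                clinical[name] = value
            elif name.startswith("mz"):  # assumption: feature columns start with "mz"
                features[name] = None if value is None else float(value)
        clinical_rows.append(clinical)
        feature_rows.append(features)
    return clinical_rows, feature_rows


# Synthetic stand-in for the real file (arbitrary values, NOT real data).
# The readme does not state the unit of timetodeath3yr.
demo = io.StringIO(
    "GENEID,timetodeath3yr,death3yr,age,male,mz85.02_t30.1,mz120.5_t45.0\n"
    "A1,1095,0,63,1,1500.5,NA\n"
    "A2,400,1,71,0,NA,220.0\n"
)
clinical_rows, feature_rows = load_cohort(demo)
```

To run this against the actual file, open Analysis_data_first_cohort.csv in text mode (with newline="") and pass the handle to load_cohort in place of the demo object.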

DATA-SPECIFIC INFORMATION FOR: Analysis_data_second_cohort.csv

1. Number of variables: 8729

2. Number of cases/rows: 322

3. Variable List: 
GENEID: subject IDs
timetodeath3yr: time to death event or censoring at three years
death3yr: death event at three years, 1=yes, 0=no
age: age in years
male: male gender, 1=male, 0=female
BlackRace: black race, 1=black, 0=non-black
batch: batch of metabolomics profiling
Strokehx: history of stroke, 1=yes, 0=no
CABGhx: prior CABG, 1=yes, 0=no
PVDhx: peripheral artery disease, 1=yes, 0=no
eGFR_max120_lt60: estimated glomerular filtration rate less than 60 mL/min/1.73 m², 1=yes, 0=no
curr_smoking: current smoking, 1=yes, 0=no
HFhx: heart failure history, 1=yes, 0=no
HTN: hypertension, 1=yes, 0=no
DM: diabetes, 1=yes, 0=no
mzXX_tXX: intensities of metabolic features

4. Missing data codes: NA
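
Because the two files list different numbers of variables (6796 and 8729), their metabolic feature columns cannot be assumed to line up one-to-one. The Python sketch below shows one way to find feature names that appear in both headers. It assumes the XX placeholders in mzXX_tXX are plain decimal numbers (an m/z value and a retention time); the readme does not confirm this. The function names and the demo headers are hypothetical.

```python
import re

# Assumed naming scheme: "mz<number>_t<number>", e.g. "mz120.5_t45".
FEATURE_PATTERN = re.compile(r"^mz(?P<mz>\d+(?:\.\d+)?)_t(?P<rt>\d+(?:\.\d+)?)$")


def parse_feature_name(name):
    """Return (mz, retention_time) as floats, or None if the name does not match."""
    match = FEATURE_PATTERN.match(name)
    if match is None:
        return None
    return float(match.group("mz")), float(match.group("rt"))


def shared_features(header_a, header_b):
    """Feature names present in both headers, kept in header_a order."""
    in_b = set(header_b)
    return [
        name for name in header_a
        if name in in_b and parse_feature_name(name) is not None
    ]


# Made-up headers standing in for the first row of each CSV file.
first_header = ["GENEID", "age", "mz85.0284_t30.1", "mz120.5_t45"]
second_header = ["GENEID", "age", "mz120.5_t45", "mz300.1_t60.2"]
common = shared_features(first_header, second_header)
```

With the real files, each header can be read as the first row returned by csv.reader. Exact name matching is a deliberately simple choice: if the same metabolite was recorded with a slightly different m/z or retention time in the two cohorts, it would not be matched, and a tolerance-based comparison on the parsed values may be more appropriate.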


National Heart Lung and Blood Institute, Award: 1P20HL113451

American Heart Association, Award: 19POST34400057