
Molecular subtyping of Alzheimer’s disease with consensus non-negative matrix factorization

Cite this dataset

Zheng, Chunlei; Xu, Rong (2021). Molecular subtyping of Alzheimer’s disease with consensus non-negative matrix factorization [Dataset]. Dryad.


Alzheimer’s disease (AD) is a heterogeneous disease with diverse clinical presentations and patterns of disease progression. Some pathological and anatomical subtypes have been proposed, but these subtypes provide only a limited mechanistic understanding of AD. Leveraging gene expression data from 222 AD patients in the Religious Orders Study and Memory and Aging Project (ROSMAP), we identified two AD molecular subtypes (synaptic type and inflammatory type) using consensus non-negative matrix factorization (NMF). The synaptic type is characterized by disrupted synaptic vesicle priming and recycling and disrupted synaptic plasticity. The inflammatory type is characterized by disrupted IL2, interferon alpha, and interferon gamma pathways. The two AD molecular subtypes were validated using independent data from Gene Expression Omnibus. We further demonstrated that the two molecular subtypes are associated with APOE genotypes, with the synaptic type more prevalent in AD patients with the E3E4 genotype and the inflammatory type more prevalent in AD patients with the E3E3 genotype (p = 0.031). In addition, the two molecular subtypes are differentially represented in male and female AD patients, with the synaptic type more prevalent in male patients and the inflammatory type more prevalent in female patients (p = 0.051). Identifying AD molecular subtypes has the potential to facilitate understanding of disease mechanisms, clinical trial design, drug discovery, and precision medicine for AD.


ROSMAP gene expression data and corresponding metadata were downloaded from (syn3219045). Raw count data were normalized and processed following the commonly used procedure described for edgeR (version 3.28.0). Data were first normalized by sequencing library size. Non-expressed genes, defined as those with counts per million (CPM) below 5 in 80% of samples, were then filtered out, leaving 12281 genes. The gene set was further narrowed to 2456 genes by keeping the top 20% ranked by interquartile range (IQR) for NMF-based clustering.
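The filtering steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline, which used edgeR in R. The function names are hypothetical. CPM is computed here from raw library sizes, "in 80% of samples" is read as "in at least 80% of samples", and the IQR uses the default quartile method of Python's statistics module. Any of these may differ from what was actually run.

```python
from statistics import quantiles


def counts_per_million(counts):
    """Scale raw counts to counts per million (CPM) per sample.

    counts: list of gene rows, each a list of per-sample raw counts.
    Assumes every sample has a non-zero library size.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [[row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)]
            for row in counts]


def filter_expressed(cpm, min_cpm=5.0, low_fraction=0.8):
    """Return indices of genes kept after dropping non-expressed genes.

    Assumption: a gene is non-expressed when its CPM is below min_cpm
    in at least low_fraction of the samples.
    """
    kept = []
    for i, row in enumerate(cpm):
        n_low = sum(1 for value in row if value < min_cpm)
        if n_low < low_fraction * len(row):
            kept.append(i)
    return kept


def top_iqr_genes(cpm, gene_idx, top_fraction=0.2):
    """Return the top_fraction of gene_idx ranked by descending IQR."""
    def iqr(values):
        q1, _, q3 = quantiles(values, n=4)
        return q3 - q1

    ranked = sorted(gene_idx, key=lambda i: iqr(cpm[i]), reverse=True)
    n_keep = round(top_fraction * len(ranked))
    return ranked[:n_keep]


# Toy data: 3 genes (rows) by 5 samples (columns).
toy_counts = [
    [100, 100, 100, 100, 100],
    [0, 0, 0, 0, 1000],          # below threshold in 4 of 5 samples
    [10, 200, 400, 600, 800],
]
toy_cpm = counts_per_million(toy_counts)
toy_kept = filter_expressed(toy_cpm)
toy_top = top_iqr_genes(toy_cpm, toy_kept, top_fraction=0.5)
```

With the default top_fraction of 0.2, a set of 12281 expressed genes gives round(0.2 × 12281) = 2456 genes, which matches the count reported above.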

Usage notes

Please see the readme file for the datafile description and the data dictionaries.