A major goal in translational cancer research is to identify biological signatures driving cancer progression and metastasis. A common technique applied in genomics research is to cluster patients using gene expression data from a candidate prognostic gene set, and if the resulting clusters show statistically significant outcome stratification, to associate the gene set with prognosis, suggesting its biological and clinical importance. Recent work has questioned the validity of this approach by showing in several breast cancer data sets that "random" gene sets tend to cluster patients into prognostically variable subgroups. This work suggests that new rigorous statistical methods are needed to identify biologically informative prognostic gene sets. To address this problem, we developed Significance Analysis of Prognostic Signatures (SAPS), which integrates standard prognostic tests with a new prognostic significance test based on stratifying patients into prognostic subtypes with random gene sets. SAPS ensures that a significant gene set is not only able to stratify patients into prognostically variable groups, but is also enriched for genes showing strong univariate associations with patient prognosis, and performs significantly better than random gene sets. We use SAPS to perform a large meta-analysis (the largest completed to date) of prognostic pathways in breast and ovarian cancer and their molecular subtypes. Our analyses show that only a small subset of the gene sets found statistically significant using standard measures achieve significance by SAPS. We identify new prognostic signatures in breast and ovarian cancer and their corresponding molecular subtypes, and we show that prognostic signatures in ER negative breast cancer are more similar to prognostic signatures in ovarian cancer than to prognostic signatures in ER positive breast cancer.
SAPS is a powerful new method for deriving robust prognostic biological signatures from clinically annotated genomic datasets.
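
The comparison against random gene sets described in the abstract can be illustrated with a small empirical p value calculation. This is only a sketch in Python, not the authors' R implementation: the function name `empirical_p_value`, the add-one correction, and the example statistics are all assumptions made for illustration, and the abstract does not say which stratification statistic SAPS uses.

```python
def empirical_p_value(observed, null_stats):
    """Share of null statistics at least as large as the observed one.

    The add-one correction (an assumption, not taken from the source)
    keeps the estimate strictly above zero for a finite number of draws.
    """
    hits = sum(1 for s in null_stats if s >= observed)
    return (hits + 1) / (len(null_stats) + 1)


# Hypothetical stratification statistics obtained from three random
# gene sets of the same size as the candidate gene set.
random_set_stats = [1.0, 2.0, 3.0]

# Hypothetical statistic for the candidate gene set.
candidate_stat = 5.0

print(empirical_p_value(candidate_stat, random_set_stats))  # 0.25
```

A candidate gene set whose statistic is routinely matched by random gene sets gets a value near 1, which is the situation the abstract warns about.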

Breast Cancer Data

Breast cancer data. This R-workspace contains the objects: dat, dat.st, event, st, and time.

Breast.zip

Ovary_NonAngio_GSEA_Results

Results from GSEA Analysis in Non-Angiogenic subtype of ovarian cancer

Ovary_NonAngio.zip

Breast_Global_GSEA_Results

Results from GSEA Analysis in Global Breast Cancer Analysis

Breast_Global.zip

Breast_Her2_GSEA_Results

Results from GSEA Analysis in HER2+ subtype of breast cancer

Breast_Her2.zip

Ovary_Angio_GSEA_Results

Results from GSEA Analysis in Angiogenic subtype of ovarian cancer

Ovary_Angio.zip

Breast_ERHigh_GSEA_Results

Results from GSEA Analysis in ER+ high proliferation subtype of breast cancer

Breast_ERHigh.zip

Breast_ERNegHer2Neg_GSEA_Results

Results from GSEA Analysis in ER Neg HER2 Neg subtype of breast cancer

Breast_ERNegHer2Neg.zip

Ovary_Global_GSEA_Results

Results from GSEA Analysis in Global ovarian cancer analysis

Ovary_Global.zip

Breast_ERLow_GSEA_Results

Results from GSEA Analysis in ER+ low proliferation subtype of breast cancer

Breast_ERLow.zip

Ovarian Cancer Data

Ovarian cancer data. This R-workspace contains the objects: dat, dat.st, event, st, and time.

Ovary.zip

Breast.Ps.OnPermutedData.RData

Breast.Ps.OnPermutedData.RData contains the results of performing SAPS using permuted gene sets on the breast data. P_enrich, p_pure, and p_rand are each 8 x 10000 x 6 arrays holding the P_enrich, P_pure, and P_random p values from permuted gene sets.

Ovary.Ps.OnPermutedData.RData

Ovary.Ps.OnPermutedData.RData contains the results of performing SAPS using permuted gene sets on the ovarian data. P_enrich, p_pure, and p_rand are arrays holding the P_enrich, P_pure, and P_random p values from permuted gene sets.

FinalOutput_Breast

FinalOutput_Breast.RData contains the results from the subtype-specific analysis in breast cancer, including the results of the permutation-based procedure to compute p values and q values for the SAPS scores.

FinalOutput_Ovary

FinalOutput_Ovary.RData contains the results from the traditional scaled data set in ovarian cancer, including the results of the permutation-based procedure to compute p values and q values for the SAPS scores.

molsigdb.v3.0.entrezForR.txt

molsigdb.v3.0.entrezForR.txt contains the Molecular Signatures Database (MSigDB) v3.0 gene sets, downloaded from the Broad Institute. The file is used to read these gene sets into R.

BreastOutput_TradScaled

BreastOutput_TradScaled.RData is an R-workspace containing the objects: allPs, allPs.adj, and sumTable. These were generated by applying the SAPS method to the breast cancer meta-data set, scaled by transforming each feature into its Z score across all patients in a data-set prior to merging across data-sets.
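
The scaling step described here (Z scores per feature within each data-set, then merging) can be sketched as follows. This is an illustrative Python sketch, not the code used to build the archive: the function names, the dictionary layout, the choice of the sample standard deviation, the handling of constant features, and the restriction to genes shared by all data-sets are assumptions.

```python
from statistics import mean, stdev


def zscore_features(dataset):
    """Z-score each gene across the patients of one data-set.

    `dataset` maps a gene name to its expression values, one per patient.
    Uses the sample standard deviation (an assumption); constant
    features are mapped to 0.0 to avoid dividing by zero.
    """
    scaled = {}
    for gene, values in dataset.items():
        mu = mean(values)
        sd = stdev(values)
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


def merge_datasets(datasets):
    """Scale every data-set separately, then concatenate patients
    for the genes present in all data-sets (an assumption)."""
    common = set.intersection(*(set(d) for d in datasets))
    merged = {gene: [] for gene in sorted(common)}
    for d in datasets:
        scaled = zscore_features(d)
        for gene in merged:
            merged[gene].extend(scaled[gene])
    return merged


# Two hypothetical data-sets measured on very different scales.
study_a = {"GENE1": [1, 2, 3], "GENE2": [5, 5, 5]}
study_b = {"GENE1": [10, 20, 30], "GENE3": [1, 2, 3]}

print(merge_datasets([study_a, study_b]))
# {'GENE1': [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]}
```

Scaling before merging puts the two studies on a common scale, so the merged values for GENE1 are comparable even though the raw measurements differ by a factor of ten.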

BreastOutput_SubScaled

BreastOutput_SubScaled.RData is an R-workspace containing the objects: allPs, allPs.adj, and sumTable. These were generated by applying the SAPS method to the breast cancer meta-data set, scaled by transforming each feature into its Z score across all patients within a breast cancer subtype data-set prior to merging across data-sets.

BreastSubtypeSpecScaleRankDir

BreastSubtypeSpecScaleRankDir contains the ranked gene lists of concordance indices used to perform the GSEA in breast cancer.
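
As an illustration of how a ranked gene list of concordance indices might be produced, the Python sketch below computes a Harrell-style concordance index per gene and sorts the genes by it. The exact definition used for the archived lists is not stated here, so the pair-counting rule, the tie handling, the function names, and the toy data are all assumptions.

```python
def concordance_index(scores, times, events):
    """Harrell-style concordance between a risk score and survival.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) before patient j's follow-up time. The pair is
    concordant when the earlier failure has the higher score; tied
    scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def rank_genes(expression, times, events):
    """Return (gene, concordance index) pairs, highest index first."""
    ci = {g: concordance_index(v, times, events) for g, v in expression.items()}
    return sorted(ci.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical follow-up times, event indicators (1 = event, 0 = censored),
# and expression values for four patients.
times = [1, 2, 3, 4]
events = [1, 1, 1, 0]
expression = {
    "protective": [1, 2, 3, 4],
    "flat": [1, 1, 1, 1],
    "risk": [4, 3, 2, 1],
}

print(rank_genes(expression, times, events))
# [('risk', 1.0), ('flat', 0.5), ('protective', 0.0)]
```

A list ranked this way places genes whose high expression accompanies early events at the top and genes with the opposite pattern at the bottom, which is the kind of pre-ranked input a GSEA run expects.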

OvaryOutput_TradScaled

OvaryOutput_TradScaled.RData is an R-workspace containing the objects: allPs, allPs.adj, and sumTable. These were generated by applying the SAPS method to the ovarian cancer meta-data set, scaled by transforming each feature into its Z score across all patients in a data-set prior to merging across data-sets.

OvaryOutput_SubScaled

OvaryOutput_SubScaled.RData is an R-workspace containing the objects: allPs, allPs.adj, and sumTable. These were generated by applying the SAPS method to the ovarian cancer meta-data set, scaled by transforming each feature into its Z score across all patients within an ovarian cancer subtype data-set prior to merging across data-sets.

OvaryTradScaleRankDir

OvaryTradScaleRankDir contains the ranked gene lists of concordance indices used to perform the GSEA in ovarian cancer.

BreastOvary_HCv2

BreastOvary_HCv2.zip – This zip directory contains files to generate Figure 10 (Hierarchical clustering of breast and ovarian cancer subtypes based on SAPS scores) using JavaTreeView (http://jtreeview.sourceforge.net/)

runSAPSonPermutedData

runSAPSonPermutedData.R – This R script generates the P_pure, P_random, and P_enrichment values on random gene sets. This "biologically null" set of SAPS scores is used to compute the SAPS_q_values on the MSigDB gene sets.

saps

saps.R – This R script provides R commands for loading data, applying the SAPS method, and generating the SAPS p values. The script is interactive: the user must specify the working directory and whether the analysis is run on the ovarian or breast data.

sapsFigures

sapsFigures.R – This R script generates the figures, the tables, and the file used for clustering.

computeSAPS.Permute.PValue.R

computeSAPS.Permute.PValue.R – This script generates permutation-based p and q values for the SAPS scores obtained in breast and ovarian cancer.
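
To show what turning a set of p values into q values can look like, the Python sketch below applies the Benjamini-Hochberg adjustment. This is a stand-in chosen for illustration: the description above does not say which q value procedure the script uses, and the function name and the example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values (q values), in input order.

    Walks from the largest p value to the smallest, keeping a running
    minimum so the adjusted values stay monotone in the p value rank.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical permutation-based p values for four gene sets.
p_values = [0.01, 0.04, 0.03, 0.5]
for p, q in zip(p_values, benjamini_hochberg(p_values)):
    print(f"p = {p:.2f}  q = {q:.4f}")
```

The second and third gene sets end up with the same q value because the running minimum prevents a smaller p value from receiving a larger adjusted value than a bigger one.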

Data from: Significance Analysis of Prognostic Signatures
