Dryad

Reproducible data, example subsets, and analysis pipeline for the extended TAaCGH study of breast cancer genomic and transcriptomic profiles

Data files

Jan 03, 2026 version files (48.35 KB)

Abstract

This repository contains the R-based computational framework used to implement and extend the TAaCGH (Topological Analysis of array CGH) pipeline for the associated study. The materials provide a reproducible workflow that takes the user from data preprocessing through survival analysis and subtype-specific modeling. When applied to the referenced publicly available datasets, the scripts reproduce all analyses reported in the manuscript, including the maximally selected rank statistic (MaxStat) calculations and the region-to-gene mapping steps. The workflow is designed to run end-to-end and produces a consistent set of outputs, such as multiple-testing-adjusted significance tables, genomic interval annotations, and gene biotype summaries. The pipeline also generates diagnostic plots and gene metadata that support interpretation of subtype-specific genomic patterns. The accompanying README describes the directory structure and execution steps in detail to support reproducibility and reuse.