Healthy and B-cell precursor Acute Lymphoblastic Leukemia (ALL) cells analyzed via CyTOF
Data files
Version of Apr 22, 2025 (129.32 MB in total):
- healthy_1_bm_healthy.fcs (2.90 MB)
- healthy_2_bm_healthy.fcs (2.23 MB)
- healthy_3_bm_healthy.fcs (6.96 MB)
- patient_1_bm_diagnosis.fcs (12.02 MB)
- patient_1_pb__15.fcs (524.10 KB)
- patient_1_pb__8.fcs (4.62 MB)
- patient_1_pb_diagnosis.fcs (21.92 MB)
- patient_2_bm_diagnosis.fcs (27.21 MB)
- patient_2_pb__15.fcs (3.09 MB)
- patient_2_pb__8.fcs (8.82 MB)
- patient_2_pb_diagnosis.fcs (5.22 MB)
- patient_3_bm_diagnosis.fcs (24.30 MB)
- patient_3_pb__15.fcs (517.01 KB)
- patient_3_pb__8.fcs (4.65 MB)
- patient_3_pb_diagnosis.fcs (4.35 MB)
- README.md (4.62 KB)
Abstract
This is a dataset of 1,108,853 blood and bone marrow cells collected from 3 pediatric B-cell Precursor Acute Lymphoblastic Leukemia (BCP-ALL) patients and 3 healthy controls. Each BCP-ALL sample is a mixture of cancer cells and healthy cells, whereas the healthy samples do not contain any cancer cells. This dataset can be used to evaluate models trained to classify cells as either cancerous or non-cancerous.
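As a sketch of such an evaluation, the function below scores a classifier's per-cell predictions against the per-cell labels stored in the cell_type column (0 = healthy, 1 = cancerous). The choice of metrics (sensitivity, specificity, precision, F1) and the name `score_predictions` are illustrative assumptions, not an evaluation protocol defined by the dataset authors.

```python
from typing import Dict, Sequence


def score_predictions(labels: Sequence[int], preds: Sequence[int]) -> Dict[str, float]:
    """Compare predictions to cell_type labels (0 = healthy, 1 = cancerous)."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of dividing by zero (e.g. no cancerous cells in a healthy sample).
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # cancerous cells correctly flagged
        "specificity": ratio(tn, tn + fp),   # healthy cells correctly left unflagged
        "precision": ratio(tp, tp + fp),     # flagged cells that really are cancerous
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


print(score_predictions([1, 1, 0, 0, 0], [1, 0, 0, 1, 0]))
```

On the healthy samples, which contain no cancerous cells, sensitivity is undefined (returned as NaN); there, 1 minus specificity gives the fraction of healthy cells wrongly called cancerous.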
Each BCP-ALL patient has samples collected at 3 timepoints and from 2 tissues: diagnosis (bone marrow and blood), day 8 after the start of chemotherapy (blood), and day 15 after the start of chemotherapy (blood). Healthy donors have samples from only a single timepoint (the time of donation) and one tissue (bone marrow). These different tissues and timepoints can be used to assess a classifier's ability to generalize to new contexts (e.g., from bone marrow to blood, or from the diagnostic timepoint to a later timepoint in treatment).
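One way to set up these generalization tests, sketched here under our own assumptions rather than as the authors' protocol, is to train on all bone marrow samples and hold out the peripheral blood samples, separating blood at diagnosis from blood later in treatment. The file names are taken from the listing above; `split_by_context` and the split names are hypothetical.

```python
from typing import Dict, Iterable, List

# File names as listed under "Data files".
FILES = [
    "healthy_1_bm_healthy.fcs", "healthy_2_bm_healthy.fcs", "healthy_3_bm_healthy.fcs",
    "patient_1_bm_diagnosis.fcs", "patient_1_pb__15.fcs", "patient_1_pb__8.fcs",
    "patient_1_pb_diagnosis.fcs", "patient_2_bm_diagnosis.fcs", "patient_2_pb__15.fcs",
    "patient_2_pb__8.fcs", "patient_2_pb_diagnosis.fcs", "patient_3_bm_diagnosis.fcs",
    "patient_3_pb__15.fcs", "patient_3_pb__8.fcs", "patient_3_pb_diagnosis.fcs",
]


def split_by_context(files: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical split: train on bone marrow, evaluate on blood at and after diagnosis."""
    splits: Dict[str, List[str]] = {"train": [], "new_tissue": [], "later_timepoint": []}
    for name in files:
        if not name.endswith(".fcs"):
            continue  # skip README.md and any other non-data files
        stem = name[: -len(".fcs")]
        if "_bm_" in stem:
            # All bone marrow samples: BCP-ALL at diagnosis plus the healthy donors.
            splits["train"].append(name)
        elif stem.endswith("_diagnosis"):
            # Peripheral blood at diagnosis: same timepoint, new tissue.
            splits["new_tissue"].append(name)
        else:
            # Peripheral blood on day 8 or day 15 of chemotherapy.
            splits["later_timepoint"].append(name)
    return splits


for key, names in split_by_context(FILES).items():
    print(key, len(names))  # train 6, new_tissue 3, later_timepoint 6
```

The later-timepoint files differ from the training set in both tissue and time, so comparing performance on them against the diagnosis-blood set helps separate the two effects. Because the same patients appear in training and evaluation, a stricter variant would also hold out whole patients.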
https://doi.org/10.5061/dryad.8gtht76vw
This repository contains 15 .FCS (Flow Cytometry Standard) data files. Each file represents a single sample collected from either a BCP-ALL patient or a healthy donor.
Description of the data and file structure
The patient, tissue, and timepoint of each sample are encoded in its .FCS filename, which is formatted as follows:
{patient_name}_{tissue}_{timepoint}.fcs
In this string, {patient_name} is one of the following:
- "patient_1" (a BCP-ALL patient)
- "patient_2" (a BCP-ALL patient)
- "patient_3" (a BCP-ALL patient)
- "healthy_1" (a healthy donor)
- "healthy_2" (a healthy donor)
- "healthy_3" (a healthy donor)
{tissue} is one of the following:
- "bm" (bone marrow)
- "pb" (peripheral blood)
and {timepoint} is one of the following:
- "diagnosis" (the diagnostic timepoint for a BCP-ALL patient)
- "+8" (the timepoint 8 days after the start of chemotherapy)
- "+15" (the timepoint 15 days after the start of chemotherapy)
- "healthy" (the timepoint at which the healthy donor donated their cells, i.e., the only timepoint available for healthy controls)
In the distributed file names, the "+" in "+8" and "+15" appears as an underscore, e.g., patient_1_pb__8.fcs.
All cells were analyzed for the presence of 28 proteins, as previously described, using mass cytometry (CyTOF). CyTOF is a high-dimensional cytometry platform similar to the multicolor flow cytometers commonly used to analyze leukemic tissue specimens in clinical laboratories, but it allows a higher-dimensional characterization of each sample than conventional multicolor flow cytometry.
Each file contains the measurements of these 28 proteins as read off the mass cytometer (unit: ion counts), plus an additional column called cell_type that encodes healthy cells as 0 and cancerous cells as 1. These labels were manually annotated by an expert BCP-ALL cytometrist and verified by a physician-scientist board-certified in pediatric hematology and oncology. The samples were extracted, debarcoded, and filtered for doublets and dead cells according to standard mass cytometry protocols. The ion counts for each protein have NOT been transformed in any way.
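Because the ion counts are raw, most analyses apply a variance-stabilizing transform first. A common convention for mass cytometry, not a step prescribed by this dataset, is arcsinh(x / 5). The minimal sketch below assumes the data are held as a dictionary of columns; the cofactor of 5 is that convention, "CD19" is a hypothetical column name (the 28 protein channels are not listed here), and the cell_type label column is deliberately left untransformed.

```python
import math
from typing import Dict, List

LABEL_COLUMN = "cell_type"  # 0 = healthy, 1 = cancerous; must not be transformed


def arcsinh_transform(
    columns: Dict[str, List[float]], cofactor: float = 5.0
) -> Dict[str, List[float]]:
    """Return a copy with arcsinh(x / cofactor) applied to every protein column."""
    transformed: Dict[str, List[float]] = {}
    for name, values in columns.items():
        if name == LABEL_COLUMN:
            transformed[name] = list(values)  # keep labels exactly as stored
        else:
            transformed[name] = [math.asinh(v / cofactor) for v in values]
    return transformed


# "CD19" is a hypothetical column name used only for illustration.
raw = {"CD19": [0.0, 5.0, 500.0], "cell_type": [0.0, 1.0, 1.0]}
print(arcsinh_transform(raw))
```

The arcsinh is roughly linear near zero and logarithmic for large counts, which compresses the long right tail of ion counts while keeping zero and near-zero values well behaved.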
Software and Workflow
The data files are in Flow Cytometry Standard (.fcs) format and can be analyzed using free, open-source software.
R
We recommend using the following R packages:
- flowCore: for reading and preprocessing .fcs files
- tidyFlowCore: for tidyverse-style workflows with flow cytometry data
- tidytof: for downstream analysis, including clustering and visualization
Python
For Python users, we recommend pytometry for reading, transforming, and analyzing .fcs files within the Scanpy ecosystem.
These packages allow users to load, subset, and analyze single-cell protein expression data from mass cytometry experiments. No analysis scripts are included in this dataset submission.
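The packages above are the recommended route. For a dependency-free look at the file contents, the sketch below parses an FCS 3.x file with the Python standard library only. It assumes list-mode ($MODE L) float data ($DATATYPE F or D), which is typical of mass cytometry exports but is not confirmed for these files, and it names columns by their $PnN keywords; if cell_type is stored under another keyword (e.g. $PnS), adjust accordingly. `read_fcs` is a hypothetical helper, not part of any package.

```python
import struct
from typing import Dict, List, Tuple


def read_fcs(path: str) -> Tuple[Dict[str, str], Dict[str, List[float]]]:
    """Minimal reader for FCS 3.x files holding list-mode float data."""
    with open(path, "rb") as fh:
        raw = fh.read()

    # HEADER: bytes 10-41 hold four 8-byte ASCII offsets:
    # TEXT start, TEXT end, DATA start, DATA end (end offsets are inclusive).
    text_start, text_end, data_start, data_end = (
        int(raw[i:i + 8].strip() or b"0") for i in range(10, 42, 8)
    )

    # TEXT: keyword/value pairs separated by a delimiter (the segment's first character);
    # a doubled delimiter inside a value stands for a literal delimiter.
    text = raw[text_start:text_end + 1].decode("latin-1")
    delim = text[0]
    tokens = [t.replace("\x00", delim) for t in text[1:].replace(delim * 2, "\x00").split(delim)]
    meta = {tokens[i].upper(): tokens[i + 1] for i in range(0, len(tokens) - 1, 2)}

    # Large files store the DATA offsets in TEXT instead of the HEADER.
    if data_start == 0 and data_end == 0:
        data_start, data_end = int(meta["$BEGINDATA"]), int(meta["$ENDDATA"])

    if meta.get("$MODE", "L").upper() != "L":
        raise NotImplementedError("only list-mode ($MODE L) data is handled")
    datatype = meta["$DATATYPE"].upper()
    if datatype not in ("F", "D"):
        raise NotImplementedError("only float ($DATATYPE F or D) data is handled")
    code, width = ("f", 4) if datatype == "F" else ("d", 8)
    endian = "<" if meta["$BYTEORD"].strip().startswith("1") else ">"  # "1,2,3,4" = little-endian

    n_par, n_events = int(meta["$PAR"]), int(meta["$TOT"])
    count = n_par * n_events
    if data_end - data_start + 1 < count * width:
        raise ValueError("DATA segment is shorter than $PAR * $TOT values")
    values = struct.unpack(f"{endian}{count}{code}", raw[data_start:data_start + count * width])

    # Events are stored one after another, each as n_par consecutive values.
    names = [meta[f"$P{i}N"] for i in range(1, n_par + 1)]
    columns = {name: list(values[j::n_par]) for j, name in enumerate(names)}
    return meta, columns
```

For example, `meta, cols = read_fcs("patient_1_bm_diagnosis.fcs")` followed by `sum(cols["cell_type"])` would count the cells labeled cancerous, provided the label column's $PnN name is cell_type.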
Sharing/Access information
This is a proprietary dataset that has not been published anywhere else. If you access it for use in an academic paper, we encourage citing this Dryad repository.
Human subjects data
All human samples and associated data in this study were collected under protocols approved by the Stanford University Institutional Review Board (IRB). Informed consent was obtained from all participants or their legal guardians, including explicit consent to publish de-identified data in public repositories.
The data shared in this submission have been fully de-identified in accordance with applicable legal and ethical standards. No personally identifiable information, including names, dates of birth, medical record numbers, or geographic identifiers, is included. Sample identifiers (e.g., ID numbers) are randomly assigned codes that cannot be traced back to individual participants.
Only single-cell protein expression data derived from mass cytometry are included. These data do not contain genetic information or any direct or indirect identifiers.