Fluoro-Forest CODEX data for random forest-based cell type annotation
Data files (version of Dec 30, 2025; 408.91 MB total)
- C-7_coords.csv (515.27 KB)
- C-7_expression.csv (13.33 MB)
- C-7.geojson (6.79 MB)
- C-7.ome.tif (159.29 MB)
- channelnames.txt (156 B)
- N-12_coords.csv (794.55 KB)
- N-12_expression.csv (21.56 MB)
- N-12.geojson (9.98 MB)
- N-12.ome.tif (196.65 MB)
- README.md (450 B)
Abstract
High-plex immunofluorescence (IF) workflows typically rely on unsupervised clustering followed by cluster-level cell type annotation. Most of these methods assign cell types from average marker expression per cluster without statistically evaluating the resulting annotations, which can lead to misclassification. Here, we propose an end-to-end pipeline that uses a semi-supervised random forest approach to predict cell type annotations. The pipeline includes cluster-based sampling of training data, cell type prediction, and downstream visualization that makes the annotations interpretable, which ultimately improves classification results. We show that our workflow annotates cells more accurately while using a training set of less than 5% of the total number of cells tested. In addition, the pipeline outputs per-cell annotation probabilities and model performance metrics so that users can judge whether it improves the results of their existing clustering-based workflow for complex IF data.
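The cluster-based sampling step can be illustrated with a minimal sketch. It assumes training cells are drawn by stratified random sampling: the same fraction from every unsupervised cluster, with a per-cluster floor so that small clusters are still represented. The function `sample_training_cells` and its parameters are hypothetical; this is not the authors' implementation, and the random forest itself is not shown.

```python
import random
from collections import defaultdict


def sample_training_cells(cluster_labels, fraction=0.05, min_per_cluster=1, seed=0):
    """Draw a cluster-stratified subset of cell IDs to label as training data.

    cluster_labels: dict mapping cell ID -> unsupervised cluster label.
    Returns a sorted list of the selected cell IDs.
    (Hypothetical helper; stratified random sampling is an assumption.)
    """
    rng = random.Random(seed)  # fixed seed for a reproducible training set
    by_cluster = defaultdict(list)
    for cell_id, cluster in cluster_labels.items():
        by_cluster[cluster].append(cell_id)

    selected = []
    for cluster in sorted(by_cluster):
        members = sorted(by_cluster[cluster])
        # Same fraction from every cluster (rounded down), but never fewer
        # than min_per_cluster and never more than the cluster holds.
        k = max(min_per_cluster, int(len(members) * fraction))
        k = min(k, len(members))
        selected.extend(rng.sample(members, k))
    return sorted(selected)


# Toy example: 1,000 cells spread evenly over 4 hypothetical clusters.
labels = {i: i % 4 for i in range(1000)}
train = sample_training_cells(labels, fraction=0.05)
print(len(train))  # 48 cells, i.e. 12 per cluster and under 5% of all cells
```

In a full pipeline the sampled cells would be hand-labeled and used to train a random forest classifier; scikit-learn's `RandomForestClassifier`, for instance, provides `predict_proba` for per-class probabilities of the kind described above.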
Example data for random forest-based cell type annotation
Dataset DOI: 10.5061/dryad.hqbzkh1v1
File overview
TMA cores: C-7.ome.tif, N-12.ome.tif
Channel names (row number = image slice index): channelnames.txt
Segmentations from StarDist: C-7.geojson, N-12.geojson
Log2 expression data: C-7_expression.csv, N-12_expression.csv
Cell coordinates (indexed by cell ID): C-7_coords.csv, N-12_coords.csv
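The expression and coordinate tables for each core can be combined per cell. The sketch below assumes that both CSVs have a header row, that the first column holds the cell ID, and that all remaining columns are numeric. The actual column names are not documented here, so `cell_id`, `x` and `y` in the toy data are hypothetical, as are the helper names `read_table` and `join_by_cell_id`.

```python
import csv
import io


def read_table(handle):
    """Read a CSV whose first column is assumed to be the cell ID.

    Returns (header, {cell_id: [remaining values as strings]}).
    """
    reader = csv.reader(handle)
    header = next(reader)
    rows = {}
    for row in reader:
        if row:  # csv.reader yields [] for blank lines; skip them
            rows[row[0]] = row[1:]
    return header, rows


def join_by_cell_id(expr_handle, coord_handle):
    """Merge expression and coordinate tables on the shared cell ID column."""
    expr_header, expr = read_table(expr_handle)
    coord_header, coords = read_table(coord_handle)
    merged_header = expr_header + coord_header[1:]  # drop the duplicate ID column
    merged = {}
    for cell_id, values in expr.items():
        if cell_id in coords:  # keep only cells present in both files
            merged[cell_id] = [float(v) for v in values + coords[cell_id]]
    return merged_header, merged


# Toy data standing in for C-7_expression.csv and C-7_coords.csv
# (column names are hypothetical; expression values are already log2).
expr_csv = io.StringIO("cell_id,DAPI,Ki67\n1,10.2,3.1\n2,9.8,0.4\n")
coord_csv = io.StringIO("cell_id,x,y\n1,120.5,88.0\n2,130.0,91.5\n3,5.0,6.0\n")
header, table = join_by_cell_id(expr_csv, coord_csv)
print(header)      # ['cell_id', 'DAPI', 'Ki67', 'x', 'y']
print(table["1"])  # [10.2, 3.1, 120.5, 88.0]
```

For the real files, pass handles opened with `open("C-7_expression.csv", newline="")` and `open("C-7_coords.csv", newline="")`.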
Selected samples from anal precancers and cancers were used to construct a tissue microarray (TMA) for spatial phenotyping with the Akoya PhenoCycler-Fusion (formerly known as CODEX). CODEX data were generated at The Bursky Center for Human Immunology and Immunotherapy Programs, Washington University School of Medicine.
This approach uses tissue-based cyclic immunofluorescence for highly multiplexed immunofluorescence imaging of FFPE specimens on glass slides. Data shown in this manuscript were taken from two 2 mm core biopsies sourced from the University of Wisconsin-Madison and imaged with a custom panel. The final stitched images of the cores, the StarDist segmentation results, and the expression summaries are used within the workflow for processing and cell annotation.
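Per-cell morphology can be derived from the StarDist segmentation GeoJSON. The sketch below assumes each cell is a GeoJSON Feature with a `Polygon` geometry whose first ring is the cell outline, and that the file is either a FeatureCollection or a bare list of Features (both are common export layouts). Area and centroid come from the shoelace formula and are in the GeoJSON's coordinate units, presumably image pixels. The function names are hypothetical.

```python
import json


def polygon_area_centroid(ring):
    """Shoelace area and centroid of a polygon ring [[x, y], ...].

    Returns (area, (cx, cy)), or (0.0, None) for a degenerate ring.
    """
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]  # GeoJSON rings repeat the first vertex at the end
    a = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5  # signed area; the sign depends on vertex orientation
    if a == 0.0:
        return 0.0, None
    return abs(a), (cx / (6 * a), cy / (6 * a))


def load_cell_shapes(text):
    """Parse GeoJSON text and return id, area and centroid for each cell polygon."""
    data = json.loads(text)
    if isinstance(data, dict):
        # FeatureCollection, or a single Feature treated as a one-item list
        features = data.get("features", [data])
    else:
        features = data
    shapes = []
    for feat in features:
        geom = feat.get("geometry") or {}
        if geom.get("type") != "Polygon":
            continue  # this sketch ignores MultiPolygon and other geometry types
        area, centroid = polygon_area_centroid(geom["coordinates"][0])
        if centroid is None:
            continue  # skip degenerate (zero-area) outlines
        shapes.append({"id": feat.get("id"), "area": area, "centroid": centroid})
    return shapes


# Toy FeatureCollection with one 2 x 2 square "cell" (hypothetical content).
example = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "id": "cell-1",
        "geometry": {"type": "Polygon",
                     "coordinates": [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]]},
        "properties": {},
    }],
})
print(load_cell_shapes(example))
# [{'id': 'cell-1', 'area': 4.0, 'centroid': (1.0, 1.0)}]
```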
30-marker panel: each image slice corresponds to one marker, in the order shown (row by row, left to right):
DAPI, Ki67, CD31, FOXP3, CD56
CD34, CD4, CD20, CD45, CD163
HLA-A, LAG3, CD8, SMA, PDL1
CD21, PanCK, IDO1, bCat1, CD14
PD-1, CD44, CD3e, CD45RO, CD68
GZMB, HLA-DR, ICOS, HIF1A, CK17
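Because the row number in channelnames.txt gives the image slice index, a name-to-slice lookup can be built directly from that file. This sketch assumes one marker name per line and zero-based indexing (the first row maps to slice 0), as when indexing a channel axis of an array in Python; the helper name `channel_index` is hypothetical.

```python
def channel_index(lines):
    """Map each marker name to its image slice index (row order, 0-based)."""
    names = [line.strip() for line in lines]  # strips newlines and stray spaces
    while names and not names[-1]:
        names.pop()  # ignore trailing blank lines only, so indices do not shift
    index = {}
    for i, name in enumerate(names):
        if name in index:
            raise ValueError(f"duplicate channel name: {name!r}")
        index[name] = i
    return index


# Panel order as listed above; with the real file, use
#   with open("channelnames.txt") as fh: idx = channel_index(fh)
PANEL = [
    "DAPI", "Ki67", "CD31", "FOXP3", "CD56",
    "CD34", "CD4", "CD20", "CD45", "CD163",
    "HLA-A", "LAG3", "CD8", "SMA", "PDL1",
    "CD21", "PanCK", "IDO1", "bCat1", "CD14",
    "PD-1", "CD44", "CD3e", "CD45RO", "CD68",
    "GZMB", "HLA-DR", "ICOS", "HIF1A", "CK17",
]
idx = channel_index(PANEL)
print(len(idx), idx["CD8"], idx["CK17"])  # 30 12 29
```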
