Dryad

Fluoro-Forest CODEX data for random Forest-based cell type annotation

Data files

Dec 30, 2025 version files (408.91 MB)


Abstract

High-plex immunofluorescence (IF) workflows typically rely on unsupervised clustering followed by cell type annotation at the cluster level. Most of these methods assign cell types from average marker expression without any statistical evaluation of the annotations, which can result in misclassification. Here, we propose an end-to-end pipeline that uses a semi-supervised random forest approach to predict cell type annotations. Our method includes cluster-based sampling of training data, cell type prediction, and downstream visualization that makes the annotations interpretable and ultimately improves classification results. We show that our workflow can annotate cells more accurately with a training set of less than 5% of the total number of cells tested. In addition, our pipeline outputs cell type annotation probabilities and model performance metrics so that users can decide whether it could improve the results of their existing clustering-based workflows for complex IF data.
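
The cluster-based sampling step mentioned in the abstract could look roughly like the sketch below. It is an illustration only, not the pipeline's actual code: the function name `sample_training_cells`, its parameters, and the proportional-per-cluster scheme are all assumptions. The idea is to draw a small share of cells from every cluster so that rare clusters are still represented in the training set to be annotated.

```python
import random
from collections import defaultdict


def sample_training_cells(cluster_labels, fraction=0.04, min_per_cluster=1, seed=0):
    """Pick cell indices for manual annotation, stratified by cluster.

    Hypothetical helper, not part of the published pipeline.
    cluster_labels[i] is the cluster ID of cell i (IDs must be sortable).
    fraction is the share of each cluster to sample.
    Returns a sorted list of selected cell indices.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible

    # Group cell indices by their cluster ID.
    by_cluster = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        by_cluster[label].append(idx)

    selected = []
    for label in sorted(by_cluster):
        members = by_cluster[label]
        # Round down to stay within the budget, but take at least
        # min_per_cluster cells so small clusters are not dropped.
        k = max(min_per_cluster, int(len(members) * fraction))
        k = min(k, len(members))
        selected.extend(rng.sample(members, k))
    return sorted(selected)


# Toy example: three clusters of very different sizes (made-up data).
labels = [0] * 1000 + [1] * 500 + [2] * 40
train_idx = sample_training_cells(labels, fraction=0.04)
sampled_clusters = {labels[i] for i in train_idx}
```

Note that `min_per_cluster` can push the total above `fraction` when there are many very small clusters, so the overall training share should be checked after sampling. The selected cells would then be labeled by hand and used to train the classifier.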