
Cell type labels for all clustering and normalization combinations compared for CODEX multiplexed imaging

Cite this dataset

Hickey, John (2022). Cell type labels for all clustering and normalization combinations compared for CODEX multiplexed imaging [Dataset]. Dryad.


We performed CODEX (co-detection by indexing) multiplexed imaging on four sections of the human colon (ascending, transverse, descending, and sigmoid) using a panel of 47 oligonucleotide-barcoded antibodies. The images then underwent standard CODEX image processing (tile stitching, drift compensation, cycle concatenation, background subtraction, deconvolution, and determination of the best focal plane) and single-cell segmentation. The output of this process was a dataframe of nearly 130,000 cells with fluorescence values quantified for each marker. We used this dataframe as input to five normalization conditions: z, double-log(z), min/max, and arcsinh normalization, plus the original unmodified data for comparison. We then used each normalized dataframe as input to four unsupervised clustering algorithms: k-means, Leiden, X-shift Euclidean, and X-shift angular.
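As a rough illustration of three of the normalizations named above, the sketch below applies z, min/max, and arcsinh scaling to one marker's fluorescence values. It is a minimal sketch, not the processing code behind this dataset: applying each transform per marker across cells, the use of the population standard deviation, and the arcsinh `cofactor` value are all assumptions, and double-log(z) is omitted because its exact definition is not given here.

```python
import math
import statistics


def z_normalize(values):
    """Center on the mean and scale by the population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # Constant marker: every cell sits exactly at the mean.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def min_max_normalize(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def arcsinh_normalize(values, cofactor=5.0):
    """Inverse hyperbolic sine transform; the cofactor is a hypothetical choice."""
    return [math.asinh(v / cofactor) for v in values]
```

Each function takes a plain list of numbers for a single marker, so a full table would be normalized by calling the chosen function once per marker column.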

From the clustering outputs, we labeled the resulting clusters with the cell types observed in the data, producing 20 unique cell type labels (one for each combination of the 5 normalizations and 4 clustering algorithms). We also labeled cell types by hierarchical hand-gating of the data within CellEngine. We created another gold standard for comparison by overclustering unnormalized data with X-shift angular clustering. Finally, we created one last label as the majority cell type call for each cell across all 21 cell type labels in the dataset.
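The final label described above can be read as a per-cell vote across the other labels. The sketch below shows one way such a vote could be computed for a single cell. It assumes the call is a simple plurality, and its tie handling (the label seen first wins) is an arbitrary choice that may differ from what was done for this dataset.

```python
from collections import Counter


def majority_label(labels):
    """Return the most frequent label among one cell's calls.

    Ties are broken in favor of the label that appears first in the input,
    which is an assumption of this sketch.
    """
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    return counts.most_common(1)[0][0]
```

For example, a cell called "B cell" by two methods and "T cell" by one would receive "B cell" as its voted label.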

Each row of the dataset is an individual segmented cell. Columns give the X and Y position of the cell, in pixels, within the overall montage image, and indicate which of the four regions the cell came from. The remaining columns are the labels generated by all the clustering and normalization techniques compared in the manuscript. These data were also used for the neighborhood analysis in the last figure of the manuscript. Labels are provided at all four levels of cell type granularity (from 7 cell types to 35 cell types).

Usage notes

This dataset could be used to observe relationships among cell type calls for CODEX multiplexed imaging data across different combinations of normalization and clustering techniques. It could also be used to test the accuracy of machine learning algorithms for cell type label transfer.
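As one possible starting point for such comparisons, the sketch below measures how often two label columns agree and counts every pair of labels that co-occur on the same cell. The function names are hypothetical, and the inputs are plain lists of per-cell labels in the same cell order; reading the actual columns from the dataset file is left out because the column names are not given here.

```python
from collections import Counter


def agreement_fraction(labels_a, labels_b):
    """Fraction of cells that receive the same label from both methods."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must cover the same cells")
    if not labels_a:
        raise ValueError("label lists must not be empty")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def label_crosstab(labels_a, labels_b):
    """Count how often each (label_a, label_b) pair occurs on the same cell."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must cover the same cells")
    return Counter(zip(labels_a, labels_b))
```

Simple agreement only makes sense when both columns use the same cell type names at the same level of granularity; the pair counts are more informative when the two label sets differ in how finely they split cell types.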