CODEX multiplexed imaging cell datasets used for using STELLAR to transfer cell type annotations to other tissues and donors
Hickey, John (2022), CODEX multiplexed imaging cell datasets used for using STELLAR to transfer cell type annotations to other tissues and donors, Dryad, Dataset, https://doi.org/10.5061/dryad.g4f4qrfrc
We performed CODEX (co-detection by indexing) multiplexed imaging on 24 sections of the human intestine from 3 donors (B004, B005, B006) using a panel of 47 oligonucleotide-barcoded antibodies. We also performed CODEX imaging on both human tonsil and Barrett's esophagus (BE) using a panel of 57 oligonucleotide-barcoded antibodies. The images then underwent standard CODEX image processing (tile stitching, drift compensation, cycle concatenation, background subtraction, deconvolution, and determination of best focal plane), single-cell segmentation, and z-normalization of each marker column by tissue. The output of this process was dataframes of 870,000 cells and 220,000 cells for the intestine and the tonsil/BE experiments, respectively, with fluorescence values quantified for each marker.
See README file.
This dataset could be used to benchmark the accuracy of machine learning methods for cell type label transfer. It could also be used to study cell type relationships in tonsil, intestine, and Barrett's esophagus tissues.
Each row of a dataset is a single segmented cell. Columns give the X and Y position of the cell, in pixels, within the overall montage image of the dataset, the region the cell came from, and, for annotated data, a cell type label generated from expert annotation. The remaining columns hold the staining intensity of each antibody against its target protein within the tissue, quantified at the single-cell level. Each value is the fluorescence intensity averaged over the cell area, subsequently z-normalized along each column as described above.
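To make the per-column z-normalization concrete, here is a minimal sketch that uses only the Python standard library. The column names (tissue, CD4, CD8), the toy values, and the use of the population standard deviation are assumptions made for illustration; the README defines the actual columns, and the released values are already normalized, so this is not the authors' processing code.

```python
from collections import defaultdict
from statistics import mean, pstdev


def z_normalize_by_group(rows, marker_cols, group_col):
    """Return copies of rows with each marker column z-scored within each group."""
    # Collect the raw values of every marker separately for every group.
    values = defaultdict(list)
    for row in rows:
        for col in marker_cols:
            values[(row[group_col], col)].append(row[col])

    # Mean and population standard deviation per (group, marker) pair.
    # Assumption: population (not sample) standard deviation.
    stats = {key: (mean(v), pstdev(v)) for key, v in values.items()}

    normalized = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for col in marker_cols:
            mu, sigma = stats[(row[group_col], col)]
            # A constant column has sigma == 0; map it to 0.0 rather than divide by zero.
            new_row[col] = (row[col] - mu) / sigma if sigma > 0 else 0.0
        normalized.append(new_row)
    return normalized


# Hypothetical toy table: one dict per segmented cell.
cells = [
    {"x": 10, "y": 12, "tissue": "colon", "CD4": 1.0, "CD8": 4.0},
    {"x": 40, "y": 18, "tissue": "colon", "CD4": 3.0, "CD8": 8.0},
    {"x": 75, "y": 90, "tissue": "small_intestine", "CD4": 10.0, "CD8": 2.0},
    {"x": 80, "y": 95, "tissue": "small_intestine", "CD4": 30.0, "CD8": 6.0},
]
normalized_cells = z_normalize_by_group(cells, ["CD4", "CD8"], "tissue")
```

After normalization, every marker has mean 0 and unit standard deviation within each tissue, so values are comparable across markers but no longer carry absolute intensity.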
For the B004_training_dryad.csv dataset, cells from donor B004 were annotated by experts for cell type across the small intestine and colon (~250,000 cells); the file contains cell type labels in addition to protein marker expression and x, y positions. Each donor has 4 regions from the colon and 4 regions from the small intestine. For the intra-region comparisons, we used the B004 colon regions, training on 3 regions and then predicting on the fourth.
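The train-on-3, predict-on-1 setup for the colon regions can be sketched as a simple hold-out split by region. The column names (region, cell_type) and the region identifiers below are hypothetical placeholders, and the sketch shows only the splitting step, not STELLAR itself.

```python
def leave_one_region_out(rows, region_col, held_out_region):
    """Split rows into (train, test), where test is the single held-out region."""
    train = [r for r in rows if r[region_col] != held_out_region]
    test = [r for r in rows if r[region_col] == held_out_region]
    return train, test


# Hypothetical toy table with 4 colon regions and 2 cells per region.
colon_cells = [
    {"region": region, "cell_type": label}
    for region in ("reg1", "reg2", "reg3", "reg4")
    for label in ("T cell", "B cell")
]

# Train on three regions and hold out the fourth for prediction.
train_cells, test_cells = leave_one_region_out(colon_cells, "region", "reg4")
train_regions = sorted({r["region"] for r in train_cells})
test_regions = sorted({r["region"] for r in test_cells})
```

Splitting by region rather than by random cell keeps spatially neighboring cells from appearing in both the training and the prediction set.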
The B0056_unnanotated_dryad.csv dataset contains unannotated samples from donors B005 and B006, to which we transferred cell type labels from the B004 training dataset. B0056 therefore has the preprocessed, quantified single-cell protein expression with x, y positions from CODEX imaging, but no cell type annotation labels.
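Before transferring labels from an annotated table to an unannotated one, it can help to confirm which marker columns the two files share. The following is a minimal sketch using in-memory CSV text; the headers and the column names (x, y, region, cell_type, CD4, CD8) are hypothetical stand-ins for the real files, whose columns are documented in the README.

```python
import csv
import io


def shared_feature_columns(annotated_header, unannotated_header, non_feature_cols):
    """Return the columns present in both headers, excluding metadata and labels."""
    unannotated = set(unannotated_header)
    return [
        col for col in annotated_header
        if col in unannotated and col not in non_feature_cols
    ]


# Hypothetical stand-ins for the annotated and unannotated CSV files.
annotated_csv = io.StringIO("x,y,region,cell_type,CD4,CD8\n10,12,reg1,T cell,0.5,-0.2\n")
unannotated_csv = io.StringIO("x,y,region,CD4,CD8\n33,41,reg1,-0.1,0.9\n")

# Only the first line (the header) of each file is needed here.
annotated_header = next(csv.reader(annotated_csv))
unannotated_header = next(csv.reader(unannotated_csv))

features = shared_feature_columns(
    annotated_header, unannotated_header, {"x", "y", "region", "cell_type"}
)
has_labels = "cell_type" in unannotated_header
```

For the real files, the same check would read the headers with open(...) in place of io.StringIO.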
The TonsilBE_processed_0.5 pkl file is the preconstructed graph for the tonsil and BE datasets. It is included as supplementary data so that the STELLAR demo example on our GitHub (https://github.com/snap-stanford/stellar) runs faster.