Processed single cell data from CODEX multiplexed imaging of the human intestine

Hickey, John

Published Nov 10, 2022; Updated Sep 13, 2023 on Dryad. https://doi.org/10.5061/dryad.pk0p2ngrf

Data files

Nov 10, 2022 version files 2.95 GB

Sep 13, 2023 version files 2.91 GB

23_09_CODEX_HuBMAP_alldata_Dryad_merged.csv (2.91 GB)
donor_metadata.csv (484 B)
README.md (6.53 KB)

Abstract

We performed CODEX (co-detection by indexing) multiplexed imaging on 64 sections of the human intestine (~16 mm²) from 8 donors (B004, B005, B006, B008, B009, B010, B011, and B012) using a panel of 57 oligonucleotide-barcoded antibodies. Subsequently, images underwent standard CODEX image processing (tile stitching, drift compensation, cycle concatenation, background subtraction, deconvolution, and determination of best focal plane), single-cell segmentation, and column marker z-normalization by tissue. The outputs of this process were data frames of 2.6 million cells with 57 antibody fluorescence values quantified for each cell, one per marker. Each cell has its cell type, cellular neighborhood, community of neighborhoods, and tissue unit defined, with x, y coordinates representing pixel location in the original image. In total, the data comprise 25 cell types, 20 multicellular neighborhoods, 10 communities of neighborhoods, and 3 tissue segments that could be used to understand the cellular interactions, composition, and structure of the human intestine from the duodenum to the sigmoid colon, and to understand differences between areas of the intestine. This data could be used as a healthy baseline for comparison with other single-cell datasets of the human intestine, particularly multiplexed imaging ones.

The overall structure of the dataset is one segmented cell per row. Columns MUC2 through CD161 are the markers used for clustering the cell types. These columns hold the antibody staining values for each target protein within the tissue, quantified at the single-cell level. Each value is the per cell/area averaged fluorescent intensity that has subsequently been z-normalized along each column as described above. OLFM4 through MUC6 were captured in the quantification but not used in the clustering of cell types. Other columns are explained in the table in the Usage Notes section below.
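The column-wise z-normalization mentioned above can be illustrated with a minimal sketch. It assumes that "by tissue" means each marker column is standardized separately within each tissue group, and it uses the population standard deviation; neither detail is confirmed by this description, and the function name `z_normalize_by_group` and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def z_normalize_by_group(rows, marker, group_key):
    """Return z-scores of `marker`, computed separately within each group."""
    values_by_group = defaultdict(list)
    for row in rows:
        values_by_group[row[group_key]].append(float(row[marker]))

    # Mean and population standard deviation per group (an assumption).
    stats = {g: (mean(v), pstdev(v)) for g, v in values_by_group.items()}

    z_scores = []
    for row in rows:
        mu, sigma = stats[row[group_key]]
        # Guard against a marker that is constant within a group.
        z_scores.append((float(row[marker]) - mu) / sigma if sigma > 0 else 0.0)
    return z_scores


# Toy raw intensities (hypothetical values, not taken from the dataset).
toy = [
    {"tissue": "A", "MUC2": 1.0},
    {"tissue": "A", "MUC2": 3.0},
    {"tissue": "B", "MUC2": 10.0},
    {"tissue": "B", "MUC2": 30.0},
]
z_scores = z_normalize_by_group(toy, "MUC2", "tissue")
```

Because the published marker columns are already z-normalized, a sketch like this is mainly useful for understanding the scale of the values, not for reprocessing them.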

Along with this main data table, there is also a donor metadata table that links the donor IDs to clinical metadata such as age, sex, race, BMI, history of diabetes, history of cancer, history of hypertension, and history of gastrointestinal disease.

The raw imaging data can be found at the HuBMAP portal (https://portal.hubmapconsortium.org/). We have created a landing page with links to all the raw dataset IDs. The HuBMAP ID for this Collection is HBM692.JRZB.356 and the DOI is 10.35079/HBM692.JRZB.356. This can also be used to pair the data with the matched snRNAseq and snATACseq for each section of tissue.


We performed CODEX (co-detection by indexing) multiplexed imaging on 64 sections of the human intestine from 8 donors (B004, B005, B006, B008, B009, B010, B011, and B012) using a panel of 57 oligonucleotide-barcoded antibodies. Subsequently, images underwent standard CODEX image processing (tile stitching, drift compensation, cycle concatenation, background subtraction, deconvolution, and determination of best focal plane), single-cell segmentation, and column marker z-normalization by tissue. The output of this process was a data frame of 2.6 million cells with 57 antibody fluorescence values quantified for each cell, one per marker.

Methods

For a detailed description of each protocol and processing step used to obtain this data, see the materials and methods of the associated manuscript. Briefly, intestine pieces from 8 different sites across the small intestine and colon were taken and frozen in OCT. These were assembled into an array of 4 tissues, cut into 7 µm slices, and stained with a panel of 54 CODEX DNA-oligonucleotide barcoded antibodies. Tissues were imaged on a Keyence microscope with a 20x objective and then processed using image stitching, drift compensation, deconvolution, and cycle concatenation. Processed data were then segmented using CellVisionSegmenter, a neural network R-CNN-based single-cell segmentation algorithm. Cell type analysis was completed on donors B004, B005, and B006 by z-normalizing the protein markers used for clustering and then overclustering with Leiden-based clustering. The cell type labels were verified by looking back at the original images. Cell type labels were transferred to the other donors using STELLAR, a framework for annotating spatially resolved single-cell data, as described in detail in the companion Nature Methods manuscript. With the cell type labels set, we performed neighborhood analysis by clustering windows of the 10 nearest neighbors around each cell; the resulting neighborhoods were named based on cell type enrichment and location in the tissue. Similarly, communities of neighborhoods were determined by taking the 100 nearest neighbors with their neighborhood labels and clustering them. Finally, tissue segments were determined through multiple rounds of clustering the 300 nearest neighbors of each cell using the community labels. Broad categories for cell types, neighborhoods, and communities were expert-annotated based on epithelial, immune, or other stromal compartment.
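The nearest-neighbor windowing step described above can be sketched as follows. This is an illustration only, not the authors' code: it uses a brute-force search, counts the cell itself as part of its own window (an assumption), and uses hypothetical cell type labels and coordinates. The subsequent clustering of the window vectors is not shown.

```python
from collections import Counter
from math import dist


def neighbor_windows(cells, k):
    """For each cell, return the fraction of each label among its k nearest cells.

    The cell itself is included in its window because its distance to itself is zero.
    """
    labels = sorted({c["label"] for c in cells})
    windows = []
    for c in cells:
        origin = (c["x"], c["y"])
        # Brute-force search: sort all cells by distance and keep the first k.
        nearest = sorted(cells, key=lambda o: dist(origin, (o["x"], o["y"])))[:k]
        counts = Counter(o["label"] for o in nearest)
        windows.append([counts[label] / k for label in labels])
    return labels, windows


# Toy cells with hypothetical labels and coordinates.
toy_cells = [
    {"x": 0, "y": 0, "label": "Goblet"},
    {"x": 1, "y": 0, "label": "Goblet"},
    {"x": 0, "y": 1, "label": "Enterocyte"},
    {"x": 10, "y": 10, "label": "Plasma"},
    {"x": 11, "y": 10, "label": "Plasma"},
    {"x": 10, "y": 11, "label": "Plasma"},
]
labels, windows = neighbor_windows(toy_cells, k=3)
```

Each window is a composition vector over cell types; clustering those vectors yields neighborhood labels. The same pattern applies to the 100-neighbor and 300-neighbor steps, with neighborhood or community labels in place of cell types. A brute-force search is quadratic in the number of cells, so a dataset of this size would need a spatial index, and windows would have to be computed within a single imaged region at a time.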

Description of the data and file structure

This dataset of 8 donors with 8 individual tissue regions each (64 tissues imaged) spans 2.6 million cells, with 25 cell types, 20 multicellular neighborhoods, 10 communities of neighborhoods, and 3 tissue segments. It could be used to understand the cellular interactions, composition, and structure of the human intestine from the duodenum to the sigmoid colon, and to understand differences between areas of the intestine.
The overall structure of the dataset is one segmented cell per row. Columns MUC2 through CD161 are the markers used for clustering the cell types. These columns hold the antibody staining values for each target protein within the tissue, quantified at the single-cell level. Each value is the per cell/area averaged fluorescent intensity that has subsequently been z-normalized along each column as described above. OLFM4 through MUC6 were captured in the quantification but not used in the clustering of cell types. Other columns are explained in the table below:

Column	Explanation
x	Tissue x position in each region imaged
y	Tissue y position in each region imaged
array	tissue array from which each region was imaged
Xcorr	Corrected x position in each array imaged
Ycorr	Corrected y position in each array imaged
Tissue_location	Segment of the intestine where the tissue came from
tissue	Whether it was from small bowel or colon
donor	Which donor the cells came from
unique_region	Label for unique region from both Tissue_location and donor
region	Number region from initial imaging data
Cell Type	Cell type labels used for the paper analysis
Cell Type em	Cell type labels from a subset of samples where MUC6 was used (B009-B012)
Cell subtype	Major categories of Cell Type column used for subsetting cell types
Neighborhood	Neighborhood labels from data analyzed all together
Neigh_sub	Major categories of Neighborhood column used for subsetting neighborhoods
Neighborhood_Ind	Neighborhood labels from data analyzed by Tissue_location
NeighInd_sub	Major categories of Neighborhood_Ind column used for subsetting neighborhoods
Community	Community labels from data analyzed all together
Major Community	Major categories of Community column used for subsetting communities
Tissue Segment	Tissue segment labels for each cell from data analyzed together
Tissue Unit	Tissue unit labels for each cell from data analyzed together (except for two replicate conditions)
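Because the main table is about 2.9 GB, it can help to stream it row by row instead of loading it whole. The sketch below uses only the Python standard library and assumes the clustering markers sit contiguously in the header from MUC2 through CD161, as described above. The helper names, the placeholder column `MARKER_B`, and all sample values are hypothetical.

```python
import csv
import io
from collections import Counter


def marker_columns(header, first="MUC2", last="CD161"):
    """Return the header names from `first` through `last`, inclusive."""
    start, stop = header.index(first), header.index(last)
    return header[start:stop + 1]


def count_by_column(handle, column):
    """Stream a CSV file object and count rows per value of `column`."""
    reader = csv.DictReader(handle)
    return Counter(row[column] for row in reader)


# Tiny stand-in for the real file; MARKER_B and all values are placeholders.
sample_text = (
    "MUC2,MARKER_B,CD161,OLFM4,donor,Cell Type\n"
    "0.5,-0.1,1.2,0.0,B004,Goblet\n"
    "-0.3,0.8,0.1,0.2,B004,Enterocyte\n"
    "1.1,0.2,-0.4,0.1,B005,Goblet\n"
)
header = next(csv.reader(io.StringIO(sample_text)))
clustering_markers = marker_columns(header)
cells_per_type = count_by_column(io.StringIO(sample_text), "Cell Type")
```

For the real data, the `io.StringIO` objects would be replaced by a file opened with `open("23_09_CODEX_HuBMAP_alldata_Dryad_merged.csv", newline="")`.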

Along with this main data table, there is also a donor metadata table that links the donor IDs to clinical metadata such as age, sex, race, BMI, history of diabetes, history of cancer, history of hypertension, and history of gastrointestinal disease.
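Linking the two tables amounts to a lookup on the donor ID. The following sketch assumes that both tables share a column named `donor` and that the metadata table has an `age` column; the actual header names in donor_metadata.csv are not listed here, so treat these names, the helper functions, and the sample values as hypothetical.

```python
import csv
import io


def load_donor_lookup(handle, key="donor"):
    """Read the donor metadata table into a dict keyed by donor ID."""
    return {row[key]: row for row in csv.DictReader(handle)}


def attach_donor_field(cell_rows, lookup, field, key="donor"):
    """Yield each cell row with one donor-level field added (None if missing)."""
    for row in cell_rows:
        donor_info = lookup.get(row[key], {})
        yield {**row, field: donor_info.get(field)}


# Hypothetical column names and values, for illustration only.
metadata = io.StringIO("donor,age,sex\nB004,50,F\nB005,60,M\n")
cells = [
    {"donor": "B004", "Cell Type": "Goblet"},
    {"donor": "B012", "Cell Type": "Plasma"},
]
lookup = load_donor_lookup(metadata)
annotated = list(attach_donor_field(cells, lookup, "age"))
```

The generator form keeps memory use flat when the cell rows come from a streamed `csv.DictReader` over the main table.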

Sharing/Access information

Data is connected to several other sources of data (snRNAseq, snATACseq, imaging):

*As of 09/11/2023, the single-cell .csv file was updated to include the Tissue Unit column, which was present in the original analysis but not saved with the original .csv file uploaded to Dryad. Also, two of the tissue replicates were renamed within the unique_region column to enable more accurate spatial calculations using this column as a unique identifier of tissue coordinates.

Code/Software

n/a
