Processed single cell data from CODEX multiplexed imaging of the human intestine
Data files
Nov 10, 2022 version files (2.95 GB)
- CODEX_HuBMAP_alldata_Dryad.csv
- donor_metadata.csv
- README.docx
Sep 13, 2023 version files (2.91 GB)
- 23_09_CODEX_HuBMAP_alldata_Dryad_merged.csv
- donor_metadata.csv
- README.md
Abstract
We performed CODEX (co-detection by indexing) multiplexed imaging on 64 sections of the human intestine (~16 mm²) from 8 donors (B004, B005, B006, B008, B009, B010, B011, and B012) using a panel of 57 oligonucleotide-barcoded antibodies. The images then underwent standard CODEX image processing (tile stitching, drift compensation, cycle concatenation, background subtraction, deconvolution, and determination of the best focal plane), single-cell segmentation, and z-normalization of each marker column by tissue. The output of this process was data frames covering 2.6 million cells, each with fluorescence values quantified for all 57 antibody markers. Each cell is annotated with its cell type, cellular neighborhood, community of neighborhoods, and tissue unit, along with x, y coordinates giving its pixel location in the original image. In total, the dataset contains 25 cell types, 20 multicellular neighborhoods, 10 communities of neighborhoods, and 3 tissue segments. These can be used to study the cellular interactions, composition, and structure of the human intestine from the duodenum to the sigmoid colon, and to compare different areas of the intestine. The data could also serve as a healthy baseline for comparison with other single-cell datasets of the human intestine, particularly multiplexed imaging datasets.
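The marker values in the released CSV are already z-normalized, so this step does not need to be repeated. The sketch below only illustrates what "z-normalization of each marker column by tissue" means, for example when preparing comparable data from another source. Several details are assumptions and are not taken from the original pipeline: the grouping column (here `unique_region` stands in for "tissue"), the use of the population standard deviation, and mapping constant columns to 0.0. The helper name `z_normalize_by_group` is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def z_normalize_by_group(rows, markers, group_key):
    """Hypothetical helper: z-score each marker within each group of rows.

    rows: list of dicts, one per cell; markers: marker column names;
    group_key: column assumed to identify the tissue a cell came from.
    Returns new dicts; the input rows are not modified.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    # Mean and population standard deviation per (group, marker).
    stats = {}
    for group, members in groups.items():
        for marker in markers:
            values = [float(r[marker]) for r in members]
            stats[(group, marker)] = (mean(values), pstdev(values))

    normalized = []
    for row in rows:
        new_row = dict(row)
        for marker in markers:
            mu, sd = stats[(row[group_key], marker)]
            # A marker that is constant within a group (sd == 0) is mapped to 0.0.
            new_row[marker] = (float(row[marker]) - mu) / sd if sd > 0 else 0.0
        normalized.append(new_row)
    return normalized


# Toy input (made-up values): two tissues with different intensity scales.
cells = [
    {"unique_region": "A", "MUC2": 1.0},
    {"unique_region": "A", "MUC2": 3.0},
    {"unique_region": "B", "MUC2": 10.0},
    {"unique_region": "B", "MUC2": 30.0},
]
result = z_normalize_by_group(cells, ["MUC2"], "unique_region")
print([r["MUC2"] for r in result])  # [-1.0, 1.0, -1.0, 1.0]
```

Normalizing within each tissue removes differences in overall staining intensity between sections. That is why both toy tissues end up on the same scale even though their raw values differ tenfold.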
Each row of the dataset is a single segmented cell. Columns MUC2 through CD161 are the markers used to cluster cell types. Their values quantify antibody staining of the target protein in the tissue at the single-cell level. Each value is the per-cell fluorescence intensity averaged over the cell area, which was then z-normalized along each column as described above. Markers OLFM4 through MUC6 were quantified but not used for cell type clustering. The other columns are explained in the table in the Usage Notes section below.
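Because the clustering markers are described as a contiguous range of columns ("MUC2 through CD161"), they can be selected by name from the CSV header instead of being hard-coded. The following is a minimal sketch with a hypothetical helper, `column_range`. The toy header is made up and abbreviated (`MARKER_A` and `MARKER_B` are placeholders, not real column names). It assumes the columns in the real file appear in the order the text describes.

```python
import csv
import io


def column_range(header, first, last):
    """Return the header names from `first` through `last`, inclusive."""
    start = header.index(first)  # raises ValueError if the column is missing
    stop = header.index(last)
    if stop < start:
        raise ValueError(f"{last!r} appears before {first!r} in the header")
    return header[start:stop + 1]


# Made-up, abbreviated header standing in for the real CSV's first line.
toy_csv = io.StringIO("MUC2,MARKER_A,CD161,OLFM4,MARKER_B,MUC6,x,y,Cell Type\n")
header = next(csv.reader(toy_csv))

clustering_markers = column_range(header, "MUC2", "CD161")  # used for clustering
extra_markers = column_range(header, "OLFM4", "MUC6")       # quantified only
print(clustering_markers)
print(extra_markers)
```

To read the real header, open `23_09_CODEX_HuBMAP_alldata_Dryad_merged.csv` with `newline=""` and pass it to `csv.reader` in the same way. With pandas, the equivalent is label slicing, `df.loc[:, "MUC2":"CD161"]`, which also includes both endpoints.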
Along with this main data table, there is also a donor metadata table that links the donor IDs to clinical metadata such as age, sex, race, BMI, and history of diabetes, cancer, hypertension, and gastrointestinal disease.
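A common first step is to attach each cell's donor metadata through the shared donor ID. The sketch below uses only the standard library so that the large cell table can be streamed row by row. It assumes that `donor_metadata.csv` has a column named `donor` whose values match the `donor` column of the cell table. That column name is not confirmed here. The helper names, the `sex` values, and the unmatched ID `UNKNOWN` are placeholders.

```python
import csv
import io


def load_donor_metadata(handle, key="donor"):
    """Index donor metadata rows by donor ID (column name `key` is assumed)."""
    return {row[key]: row for row in csv.DictReader(handle)}


def attach_metadata(cells, metadata, key="donor"):
    """Yield each cell dict merged with its donor's metadata fields.

    Cell fields take precedence on name collisions; cells whose donor is
    not in the metadata are passed through unchanged.
    """
    for cell in cells:
        extra = metadata.get(cell[key], {})
        yield {**extra, **cell}


# Placeholder metadata and cells; the values are not from the real files.
meta_csv = io.StringIO("donor,sex\nB004,placeholder_1\nB005,placeholder_2\n")
metadata = load_donor_metadata(meta_csv)
cells = [
    {"donor": "B004", "Cell Type": "type_a"},
    {"donor": "UNKNOWN", "Cell Type": "type_b"},
]
merged = list(attach_metadata(cells, metadata))
```

Passing unmatched cells through silently keeps the sketch short. In a real analysis, it may be safer to count or flag donors that have no metadata row.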
The raw imaging data can be found on the HuBMAP portal (https://portal.hubmapconsortium.org/). We have created a landing page with links to all of the raw dataset IDs. The HuBMAP ID for this Collection is HBM692.JRZB.356, and the DOI is 10.35079/HBM692.JRZB.356. The Collection can also be used to pair these data with the matched snRNAseq and snATACseq data for each tissue section.
README: Processed single cell data from CODEX multiplexed imaging of the human intestine
https://doi.org/10.5061/dryad.pk0p2ngrf
We performed CODEX (co-detection by indexing) multiplexed imaging on 64 sections of the human intestine from 8 donors (B004, B005, B006, B008, B009, B010, B011, and B012) using a panel of 57 oligonucleotide-barcoded antibodies. Subsequently, the images underwent standard CODEX image processing (tile stitching, drift compensation, cycle concatenation, background subtraction, deconvolution, and determination of the best focal plane), single-cell segmentation, and z-normalization of each marker column by tissue. The output of this process was a data frame of 2.6 million cells, each with fluorescence values quantified for all 57 antibody markers.
Methods
For a detailed description of each step of the protocols and processes used to obtain these data, see the materials and methods of the associated manuscript. Briefly, intestine pieces from 8 different sites across the small intestine and colon were collected and frozen in OCT. These were assembled into arrays of 4 tissues, cut into 7 µm sections, and stained with a panel of 54 CODEX DNA-oligonucleotide-barcoded antibodies. Tissues were imaged on a Keyence microscope with a 20x objective and then processed by image stitching, drift compensation, deconvolution, and cycle concatenation. The processed images were then segmented using CellVisionSegmenter, an R-CNN-based neural network algorithm for single-cell segmentation. Cell type analysis was first completed on donors B004, B005, and B006: the protein markers used for clustering were z-normalized and then over-clustered using Leiden-based clustering. The cell type labels were verified by referring back to the original images. The labels were then transferred to the other donors using STELLAR, a framework for annotating spatially resolved single-cell data, as described in detail in the companion Nature Methods manuscript. With the cell type labels fixed, we performed neighborhood analysis by clustering windows of the 10 nearest neighbors around each cell. The resulting neighborhoods were named based on cell type enrichment and location in the tissue. Similarly, communities of neighborhoods were determined by clustering windows of the 100 nearest neighbors using their neighborhood labels. Finally, tissue segments were determined through multiple rounds of clustering the 300 nearest neighbors of each cell using their community labels. Broad categories for cell types, neighborhoods, and communities were expert-annotated according to epithelial, immune, or other stromal compartment.
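The neighborhood, community, and tissue-segment steps share the same building block. Each cell is described by the label composition of its k nearest neighbors, and those composition vectors are then clustered. The following is a minimal sketch of the window-building step only, under several stated assumptions: a window includes the center cell, neighbors are searched only within the same `unique_region` (because x, y are per-region coordinates), and composition is expressed as fractions. The clustering algorithm applied afterward is not named in this text, so it is left out. `window_compositions` is a hypothetical name and not the authors' code.

```python
import math
from collections import Counter, defaultdict


def window_compositions(cells, k, label_key, region_key="unique_region",
                        x_key="x", y_key="y"):
    """For each cell, return the fraction of each label among its k nearest
    cells within the same region (the cell itself, at distance 0, normally
    falls inside its own window).

    Brute force, O(n^2) per region: fine for a toy example, far too slow
    for 2.6 million cells (a spatial index such as a k-d tree would be needed).
    """
    labels = sorted({c[label_key] for c in cells})
    by_region = defaultdict(list)
    for i, c in enumerate(cells):
        by_region[c[region_key]].append(i)

    vectors = [None] * len(cells)
    for indices in by_region.values():
        for i in indices:
            xi, yi = float(cells[i][x_key]), float(cells[i][y_key])
            # Keep the k same-region cells closest to cell i.
            nearest = sorted(
                indices,
                key=lambda j: math.hypot(float(cells[j][x_key]) - xi,
                                         float(cells[j][y_key]) - yi),
            )[:k]
            counts = Counter(cells[j][label_key] for j in nearest)
            vectors[i] = [counts[lab] / len(nearest) for lab in labels]
    return labels, vectors


# Made-up toy cells: R2 reuses R1's coordinates but must not mix with it.
cells = [
    {"unique_region": "R1", "x": 0, "y": 0, "Cell Type": "A"},
    {"unique_region": "R1", "x": 1, "y": 0, "Cell Type": "A"},
    {"unique_region": "R1", "x": 10, "y": 0, "Cell Type": "B"},
    {"unique_region": "R2", "x": 0, "y": 0, "Cell Type": "B"},
]
labels, vecs = window_compositions(cells, k=2, label_key="Cell Type")
```

Under the same assumptions, the later steps reuse the helper with different inputs: `k=10, label_key="Cell Type"` for neighborhoods, `k=100, label_key="Neighborhood"` for communities, and `k=300, label_key="Community"` for tissue segments. Restricting windows to one `unique_region` keeps cells from different sections, which can share pixel coordinates, from being counted as neighbors.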
Description of the data and file structure
This dataset of 8 donors, each with 8 tissue regions (64 tissues imaged), comprises 2.6 million cells annotated with 25 cell types, 20 multicellular neighborhoods, 10 communities of neighborhoods, and 3 tissue segments. It can be used to study the cellular interactions, composition, and structure of the human intestine from the duodenum to the sigmoid colon, and to compare different areas of the intestine.
Each row of the dataset is a single segmented cell. Columns MUC2 through CD161 are the markers used to cluster cell types. Their values quantify antibody staining of the target protein in the tissue at the single-cell level. Each value is the per-cell fluorescence intensity averaged over the cell area, which was then z-normalized along each column as described above. Markers OLFM4 through MUC6 were quantified but not used for cell type clustering. The other columns are explained in the table below:
Column | Explanation |
---|---|
x | x position of the cell (pixels) within its imaged region |
y | y position of the cell (pixels) within its imaged region |
array | Tissue array in which the region was imaged |
Xcorr | Corrected x position within the imaged array |
Ycorr | Corrected y position within the imaged array |
Tissue_location | Segment of the intestine the tissue came from |
tissue | Whether the tissue came from the small bowel or the colon |
donor | Donor the cells came from |
unique_region | Unique region label combining Tissue_location and donor |
region | Region number from the initial imaging data |
Cell Type | Cell type labels used for the paper analysis |
Cell Type em | Cell type labels from a subset of samples where MUC6 was used (B009-B012) |
Cell subtype | Major categories of Cell Type column used for subsetting cell types |
Neighborhood | Neighborhood labels from data analyzed all together |
Neigh_sub | Major categories of Neighborhood column used for subsetting neighborhoods |
Neighborhood_Ind | Neighborhood labels from data analyzed by Tissue_location |
NeighInd_sub | Major categories of Neighborhood_Ind column used for subsetting neighborhoods |
Community | Community labels from data analyzed all together |
Major Community | Major categories of Community column used for subsetting communities |
Tissue Segment | Tissue segment labels for each cell from data analyzed together |
Tissue Unit | Tissue unit labels for each cell from data analyzed together (except for two replicate conditions) |
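As one example of comparing areas of the intestine using the columns above, the sketch below computes the cell type composition of each `Tissue_location` from the `Tissue_location` and `Cell Type` columns. The helper name is hypothetical, and the location and cell type values in the toy input are placeholders, not values from the dataset.

```python
from collections import Counter, defaultdict


def composition_by(cells, group_key="Tissue_location", label_key="Cell Type"):
    """Return {group: {label: fraction of that group's cells}}.

    `cells` can be any iterable of dicts, e.g. a csv.DictReader, so the
    file is read in a single pass without loading it all into memory.
    """
    counts = defaultdict(Counter)
    for cell in cells:
        counts[cell[group_key]][cell[label_key]] += 1

    result = {}
    for group, label_counts in counts.items():
        total = sum(label_counts.values())
        result[group] = {label: n / total for label, n in label_counts.items()}
    return result


# Placeholder rows standing in for the real CSV.
cells = [
    {"Tissue_location": "loc_1", "Cell Type": "type_a"},
    {"Tissue_location": "loc_1", "Cell Type": "type_a"},
    {"Tissue_location": "loc_1", "Cell Type": "type_b"},
    {"Tissue_location": "loc_2", "Cell Type": "type_b"},
]
comp = composition_by(cells)
```

The same function works for other label columns, for example `label_key="Neighborhood"` or `label_key="Community"`, or with `group_key="donor"` to compare donors.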
Along with this main data table, there is also a donor metadata table that links the donor IDs to clinical metadata such as age, sex, race, BMI, and history of diabetes, cancer, hypertension, and gastrointestinal disease.
Sharing/Access information
These data are linked to several other data sources (snRNAseq, snATACseq, imaging):
- https://doi.org/10.5061/dryad.76hdr7t1p
- https://doi.org/10.5061/dryad.gmsbcc2sq
- https://doi.org/10.5061/dryad.8pk0p2ns8
- https://doi.org/10.5061/dryad.0zpc8672f
*As of 09/11/2023, the single-cell CSV file was updated to include the Tissue Unit column, which was present in the original analysis but not saved in the .csv file originally uploaded to Dryad. In addition, two of the tissue replicates were renamed in the unique_region column so that this column can serve as a unique identifier of tissue coordinates, enabling more accurate spatial calculations.
Code/Software
n/a