Dryad

scRNA data from: Organization of the human Intestine at single cell resolution

Cite this dataset

Becker, Winston (2023). scRNA data from: Organization of the human Intestine at single cell resolution [Dataset]. Dryad. https://doi.org/10.5061/dryad.8pk0p2ns8

Abstract

The adult human intestinal system is a complex organ, approximately 9 meters long, that performs a variety of functions including digestion, nutrient absorption, and immune surveillance. We performed snRNA-seq on 8 regions of the human intestine (duodenum, proximal-jejunum, mid-jejunum, ileum, ascending colon, transverse colon, descending colon, and sigmoid colon) from 9 donors (B001, B004, B005, B006, B008, B009, B010, B011, and B012). In the corresponding paper, we find that cell compositions differ dramatically across regions of the intestine, and we demonstrate the complexity of epithelial subtypes. We map gene regulatory differences in these cells that suggest a regulatory differentiation cascade, and we associate intestinal disease heritability with specific cell types. These results describe the complexity of the cell composition, regulation, and organization in the human intestine, and serve as an important reference map for understanding human biology and disease.

Methods

For a detailed description of how these data were generated, see the Materials and Methods of the associated manuscript. Briefly, intestine pieces from 8 different sites across the small intestine and colon were flash frozen. Nuclei were isolated from each sample and processed with either 10x snRNA-seq, using Chromium Next GEM Single Cell 3’ Reagent Kits v3.1 (10x Genomics, 1000121) with Chromium Next GEM Chip G Single Cell Kits (10x Genomics, 1000120), or 10x multiome sequencing, using Chromium Next GEM Single Cell Multiome ATAC + Gene Expression Kits (10x Genomics, 1000283).

Initial processing of the snRNA-seq data was done with the Cell Ranger pipeline (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger): cellranger mkfastq was first run to demultiplex the BCL files, followed by cellranger count. Because nuclear RNA was sequenced, reads were aligned to a pre-mRNA reference. Initial processing of the multiome data, including alignment and generation of fragment files and expression matrices, was performed with the Cell Ranger ARC pipeline. The raw expression matrices from these pipelines are included here. Downstream processing was performed in R using the Seurat package.

Usage notes

The dataset includes the raw expression matrices generated by Cell Ranger or Cell Ranger ARC for each individual sample in the study. Each raw expression matrix is saved as 3 files (***_barcodes.tsv.gz, ***_features.tsv.gz, and ***_matrix.mtx.gz), where *** is replaced by the sample name. These are gzipped plain-text files (the matrix is in Matrix Market format) and can be read in any programming language.
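As an illustration of reading one sample's three files, here is a minimal standard-library Python sketch. It assumes the file-name pattern described above and the usual Cell Ranger layout, in which the .mtx file is in Matrix Market coordinate format with features as rows and barcodes as columns. The helpers read_10x_matrix and counts_per_barcode are hypothetical names introduced here; they are not part of the dataset.

```python
import csv
import gzip
from pathlib import Path


def read_10x_matrix(directory, sample):
    """Load one sample's raw matrix from <sample>_barcodes/features/matrix files.

    Assumption: the .mtx file is Matrix Market coordinate format with features
    as rows and barcodes as columns. Returns (barcodes, features, entries),
    where entries maps (feature_index, barcode_index) -> count (0-based).
    """
    d = Path(directory)
    with gzip.open(d / f"{sample}_barcodes.tsv.gz", "rt") as fh:
        barcodes = [line.rstrip("\n") for line in fh if line.strip()]
    with gzip.open(d / f"{sample}_features.tsv.gz", "rt", newline="") as fh:
        # Each row holds a feature ID, a feature name and, usually, a feature type.
        features = [row for row in csv.reader(fh, delimiter="\t") if row]

    entries = {}
    with gzip.open(d / f"{sample}_matrix.mtx.gz", "rt") as fh:
        header = fh.readline()
        if not header.lower().startswith("%%matrixmarket matrix coordinate"):
            raise ValueError(f"unexpected Matrix Market header: {header!r}")
        line = fh.readline()
        while line.startswith("%"):  # skip comment lines after the banner
            line = fh.readline()
        n_rows, n_cols, n_entries = map(int, line.split())
        if (n_rows, n_cols) != (len(features), len(barcodes)):
            raise ValueError("matrix shape does not match features/barcodes")
        for _ in range(n_entries):
            row, col, value = fh.readline().split()
            # Matrix Market indices are 1-based; convert to 0-based.
            entries[(int(row) - 1, int(col) - 1)] = int(float(value))
    return barcodes, features, entries


def counts_per_barcode(barcodes, entries):
    """Sum counts over all features for each barcode."""
    totals = dict.fromkeys(barcodes, 0)
    for (_, col), value in entries.items():
        totals[barcodes[col]] += value
    return totals
```

A dictionary of entries keeps the sketch dependency-free but uses a lot of memory for full-size samples. In practice a dedicated reader such as scipy.io.mmread, which accepts an open file object (for example one returned by gzip.open), is usually more practical.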

In addition to the individual expression matrices, the processed snRNA-seq data are included as Seurat objects saved as .rds files. The Seurat objects are divided into an object for all immune cells (clustered_immune_object.rds), an object for all stromal cells (clustered_stromal_object.rds), and four objects for epithelial cells from the four primary regions of the intestine (clustered_duodenum_object.rds, clustered_jejunum_object.rds, clustered_ileum_object.rds, and clustered_colon_object.rds). Subclustered populations of enteroendocrine cells (clustered_enteroendocrine_object.rds) and specialized secretory epithelial cells (clustered_secretory_special_object.rds) are also included as separate Seurat objects. The Seurat objects contain the DecontX-corrected counts (corrected for ambient RNA contamination) in addition to the raw counts. These files can be opened in R, for example with readRDS() in a session with the Seurat package installed.

Files named ***_UMAP_CellType.tsv provide the UMAP coordinates and our cell type annotations for the filtered cells in the study. Each file has four columns: Cell ID (unique name of the cell, consisting of the sample name and RNA barcode), UMAP_1 (first UMAP coordinate), UMAP_2 (second UMAP coordinate), and CellType (the cell type annotation we assigned to each cell). These TSV files can be opened in any programming language.
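As an example of working with these annotation files, the Python sketch below reads one ***_UMAP_CellType.tsv file and computes, for each cell type, the number of cells and the mean UMAP position (useful, for instance, for placing labels on a UMAP plot). It assumes the file is tab-separated with a single header row and the four columns in the order listed above; the helper names read_umap_celltype and summarize_cell_types are hypothetical.

```python
import csv
from collections import defaultdict


def read_umap_celltype(path):
    """Return a list of (cell_id, umap_1, umap_2, cell_type) tuples.

    Assumption: tab-separated, one header row, columns ordered as
    Cell ID, UMAP_1, UMAP_2, CellType.
    """
    rows = []
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        next(reader)  # skip the assumed header row
        for row in reader:
            if not row:  # ignore blank lines
                continue
            cell_id, u1, u2, cell_type = row
            rows.append((cell_id, float(u1), float(u2), cell_type))
    return rows


def summarize_cell_types(rows):
    """Map each cell type to (n_cells, mean UMAP_1, mean UMAP_2)."""
    sums = defaultdict(lambda: [0, 0.0, 0.0])
    for _, u1, u2, cell_type in rows:
        s = sums[cell_type]
        s[0] += 1
        s[1] += u1
        s[2] += u2
    return {ct: (n, x / n, y / n) for ct, (n, x, y) in sums.items()}
```

Columns are read by position rather than by header name so the sketch does not depend on the exact spelling of the header; if a file turns out to lack a header row or to carry an extra row-name column, the parsing step would need adjusting.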

A metadata file is included that contains the names of all samples in the study (SampleNameRNA), whether the sample was processed with the multiome kit (Multiome), the donor (Donor), and the location in the intestine the sample was collected from (Location).
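To show how the metadata columns can be used to select samples, here is a minimal Python sketch that loads the table and groups sample names by intestinal location, optionally keeping only multiome or only non-multiome samples. The file name, the tab delimiter, and the way the Multiome column encodes yes/no are all assumptions (the sketch treats values such as TRUE, yes, or 1 as multiome); load_metadata and samples_by_location are hypothetical helpers.

```python
import csv
from collections import defaultdict

# Assumed encodings of "yes" in the Multiome column (not confirmed by the dataset).
TRUTHY = {"true", "t", "yes", "y", "1"}


def load_metadata(path, delimiter="\t"):
    """Read the metadata table into a list of dicts keyed by column name.

    Assumption: a delimited text file whose header includes SampleNameRNA,
    Multiome, Donor and Location.
    """
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh, delimiter=delimiter))


def samples_by_location(metadata, multiome=None):
    """Map each Location to its SampleNameRNA values.

    multiome=None keeps all samples; True or False keeps only multiome or
    only non-multiome samples, respectively.
    """
    groups = defaultdict(list)
    for row in metadata:
        is_multiome = row["Multiome"].strip().lower() in TRUTHY
        if multiome is not None and is_multiome != multiome:
            continue
        groups[row["Location"]].append(row["SampleNameRNA"])
    return dict(groups)
```

The same pattern extends to grouping by Donor, for example to check which of the 8 locations were sampled for each of the 9 donors.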

Funding

National Institutes of Health