Single cell RNA sequencing of human tissue along the stomach-intestinal tract

Beumer, Joep1; Geurts, Maarten H2; Geurts, Veerle1; Andersson-Rolf, Amanda1; Akkerman, Ninouk1; Völlmy, F3; Krueger, Daniel1; Busslinger, Georg A4; Martínez-Silgado, Adriana1; Boot, Charelle1; Yousef Yengej, Fjodor A1; Puschhof, Jens1; Van de Wetering, Wiline J5; Knoops, Kevin5; Peters, Peter J5; Vivié, Judith A6; Mooijman, Dylan6; Van Es, Johan1; Clevers, Hans7

Published Oct 30, 2024 on Dryad. https://doi.org/10.5061/dryad.zgmsbccn1

Data files

Oct 30, 2024 version files (503.20 MB in total):

- counts.csv (499.63 MB)
- obs_metadata.csv (449.02 KB)
- README.md (4.88 KB)
- var_metadata.csv (3.12 MB)

Abstract

Enteroendocrine cells (EECs) are gut epithelial cells that respond to intestinal contents by secreting hormones, including the incretins GLP-1 and GIP, which regulate multiple physiological processes. Hormone release is controlled through metabolite-sensing proteins. Low expression, interspecies differences, and the existence of multiple EEC subtypes have posed challenges to the study of these sensors. We describe the differentiation of stomach EECs to complement existing intestinal organoid protocols. CD200 emerged as a pan-EEC surface marker, allowing deep transcriptomic profiling from primary human tissue along the stomach-intestinal tract. We generated loss-of-function mutations in 22 receptors and subjected organoids to ligand-induced secretion experiments. We delineate the role of individual human EEC sensors in hormone secretion, including GLP-1. These represent potential pharmacological targets to influence appetite, bowel movement, insulin sensitivity and mucosal immunity.


Description of the data and file structure

Processed, normalized single-cell RNA-seq data for all called genes, processed according to the VASA-Seq manuscript (DOI: https://doi.org/10.1038/s41587-022-01361-8).

This dataset includes the following files:

- counts.csv: Count matrix.
- obs_metadata.csv: Observation metadata table with information about tissue origin, cluster number, cell type and more.
- var_metadata.csv: Variable metadata table with information about gene name, total counts, mean counts and more.

Data structure

The data are structured according to the AnnData object layout:
https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.html

- Cells are rows (observations).
- Genes are columns (variables).
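A minimal Python sketch for loading the three files with pandas and, optionally, assembling them into an AnnData object. It assumes (this README does not say so) that the first column of each CSV holds the row labels: cell identifiers in counts.csv and obs_metadata.csv, and gene identifiers in var_metadata.csv. The function names load_dataset and to_anndata are hypothetical helpers, not part of any published code for this dataset.

```python
import pandas as pd


def load_dataset(counts_path="counts.csv",
                 obs_path="obs_metadata.csv",
                 var_path="var_metadata.csv"):
    """Load counts, obs and var tables and check that they line up.

    Assumption: the first column of every CSV holds the row labels
    (cell IDs for counts/obs, gene IDs for var).
    """
    counts = pd.read_csv(counts_path, index_col=0)  # cells x genes
    obs = pd.read_csv(obs_path, index_col=0)        # one row per cell
    var = pd.read_csv(var_path, index_col=0)        # one row per gene

    # obs must describe the rows and var the columns, in the same order.
    if not counts.index.equals(obs.index):
        raise ValueError("cell order differs between counts and obs")
    if not counts.columns.equals(var.index):
        raise ValueError("gene order differs between counts and var")
    return counts, obs, var


def to_anndata(counts, obs, var):
    """Wrap the aligned tables in an AnnData object (needs `anndata`)."""
    import anndata as ad  # imported lazily so pandas-only use still works

    return ad.AnnData(X=counts.to_numpy(dtype="float32"), obs=obs, var=var)
```

With the files in the working directory, `counts, obs, var = load_dataset()` followed by `adata = to_anndata(counts, obs, var)` would give one AnnData object. Checking the order explicitly, rather than silently reindexing, makes any mismatch between the files visible. Loading the full ~500 MB text matrix into a dense DataFrame may need considerably more memory than the file size.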

File details

Details for: counts.csv

Description: A comma-delimited file containing normalized read counts, i.e. the normalized number of sequencing reads mapped to each gene/transcript in each cell.
Format(s): .csv
Sequencing Technology: Illumina
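If only a handful of genes are needed, reading the whole ~500 MB matrix can be avoided. The sketch below reads the header row first and then loads only the cell-ID column plus the requested gene columns. It assumes that cell identifiers are in the first column and that the header row uses the same gene labels as var_metadata.csv; read_gene_columns is a hypothetical helper.

```python
import pandas as pd


def read_gene_columns(path, genes):
    """Load the cell-ID column plus selected gene columns of a cells x genes CSV.

    Assumption: column 0 holds cell IDs and the header row holds gene labels.
    Genes absent from the header are skipped, so check the returned columns.
    """
    header = pd.read_csv(path, nrows=0).columns  # parse the header row only
    wanted = set(genes)
    positions = [0] + [i for i, name in enumerate(header)
                       if i > 0 and name in wanted]
    subset = pd.read_csv(path, usecols=positions)
    # Column 0 keeps its header name (pandas calls an empty one "Unnamed: 0").
    return subset.set_index(subset.columns[0])
```

For example, `read_gene_columns("counts.csv", ["GCG", "CHGA"])` would return a cells x 2 DataFrame if both example symbols appear in the header.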

Details for: obs_metadata.csv

Description: A comma-delimited file that contains observation metadata, providing information about each cell in the scRNA-seq dataset. Each row corresponds to a single cell, and each column represents a different metadata attribute.
Format(s): .csv
Variables:
- Plate_number: The plate number on which the cells were processed (Pl1 to Pl11).
- Plate_ID: A unique identifier for each plate.
  - 'HUB_AM_003', 'HUB_JB_058 ', 'HUB_JB_067', 'HUB_JB_070', 'HUB_JB_072', 'HUB_JB_073', 'HUB_JB_75', 'HUB_JB_76', 'HUB_JB_77', 'HUB_JB_78', 'HUB_JB_s050'
- Library_prep: The library preparation method used for scRNA-seq.
  - VASA-Seq
  - CEL-Seq2
- CD200_status: The expression status of the CD200 marker
  - Positive
  - Negative
  - Tbd (unknown at the time of sorting)
- Tissue: The tissue from which the cells were isolated.
  - colon
  - corpus
  - duodenum
  - ileum
  - ileum_crypt
  - ileum_villus
  - pylorus
- Patient_number: An identifier that distinguishes samples from different patients.
  - 1-4
- n_genes_by_counts: The number of genes detected in each cell based on read counts.
- total_counts: The total number of reads mapped to genes in each cell.
- total_counts_mt: The total number of reads mapped to mitochondrial genes in each cell.
- pct_counts_mt: The percentage of reads mapped to mitochondrial genes in each cell.
- total_counts_ribo: The total number of reads mapped to ribosomal genes in each cell.
- pct_counts_ribo: The percentage of reads mapped to ribosomal genes in each cell.
- percent_mt2: Same as pct_counts_mt.
- n_genes: The number of genes detected in each cell.
- S_score: A score representing the cell cycle S phase activity.
- G2M_score: A score representing the cell cycle G2/M phase activity.
- phase: The assigned cell cycle phase based on the scores (e.g., G1, S, G2M).
- leiden_0.4 to leiden_4.5: Cluster assignments from the Leiden clustering algorithm at different resolutions (0.4 to 4.5).
- louvain_0.4 to louvain_2.0: Cluster assignments from the Louvain clustering algorithm at different resolutions (0.4 to 2.0).
- cell type: The annotated cell type of each cell.
  - Brunner gland
  - CBCs
  - Colonocyte
  - D cells
  - ECL
  - ECs
  - Endothelial
  - Enterocytes
  - G cells
  - Goblet cells
  - Immune/plasma cells
  - MX cells
  - Mucinous (gastric)
  - Progenitors
  - TA cells
  - Tuft cells
  - X cells
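The QC column names above match those produced by scanpy's `pp.calculate_qc_metrics`, although this README does not state which tool was used. As a sketch of how these columns relate to the count matrix, the hypothetical helper below recomputes n_genes_by_counts, total_counts, total_counts_mt and pct_counts_mt from a cells x genes DataFrame and the `mt` flag in var_metadata.csv. Because counts.csv holds normalized values, and the stored QC columns may have been computed on raw counts, the recomputed numbers are not guaranteed to match obs_metadata.csv.

```python
import pandas as pd


def per_cell_qc(counts, is_mito):
    """Recompute basic per-cell QC columns from a cells x genes DataFrame.

    `is_mito` is a boolean Series indexed by gene, e.g. the `mt` column of
    var_metadata.csv; genes missing from it are treated as non-mitochondrial.
    """
    mito = is_mito.reindex(counts.columns, fill_value=False).astype(bool)
    total = counts.sum(axis=1)
    total_mt = counts.loc[:, mito].sum(axis=1)
    return pd.DataFrame({
        "n_genes_by_counts": (counts > 0).sum(axis=1),  # genes with count > 0
        "total_counts": total,
        "total_counts_mt": total_mt,
        "pct_counts_mt": 100 * total_mt / total,  # NaN for an all-zero cell
    })
```

The same pattern with the `ribo` flag gives total_counts_ribo and pct_counts_ribo.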

Details for: var_metadata.csv

Description: A comma-delimited file that contains variable metadata, providing information about each gene in the scRNA-seq dataset. Each row corresponds to a gene, and each column represents a different metadata attribute.
Format(s): .csv
Variables:
- mt: Indicates whether a gene is mitochondrial. (e.g., True, False)
- ribo: Indicates whether a gene is ribosomal. (e.g., True, False)
- n_cells_by_counts: The number of cells expressing a particular gene (where expression is defined as having one or more counts).
- mean_counts: The average expression level (mean count) of a gene across all cells.
- pct_dropout_by_counts: The percentage of cells where a particular gene has zero counts (not detected).
- total_counts: The total number of reads mapped to a particular gene across all cells.
- n_cells: The number of cells in which the gene is detected.
- Gene_Name: The official gene symbol or name.
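The per-gene columns follow simple formulas; in particular, pct_dropout_by_counts = 100 * (1 - n_cells_by_counts / number of cells). The hypothetical helper below recomputes them from a cells x genes DataFrame, with the same caveat as for the per-cell metrics: counts.csv is normalized, while the stored values may have been computed on raw counts.

```python
import pandas as pd


def per_gene_qc(counts):
    """Recompute the per-gene QC columns from a cells x genes DataFrame."""
    n_cells_total = counts.shape[0]
    detected = (counts > 0).sum(axis=0)  # cells with at least one count
    return pd.DataFrame({
        "n_cells_by_counts": detected,
        "mean_counts": counts.mean(axis=0),  # averaged over all cells
        "pct_dropout_by_counts": 100 * (1 - detected / n_cells_total),
        "total_counts": counts.sum(axis=0),
    })
```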

Works referencing this dataset

https://www.science.org/doi/10.1126/science.adl1460