Single cell RNA sequencing of human tissue along the stomach-intestinal tract
Data files
Oct 30, 2024 version, files total 503.20 MB
- counts.csv (499.63 MB)
- obs_metadata.csv (449.02 KB)
- README.md (4.88 KB)
- var_metadata.csv (3.12 MB)
Abstract
Enteroendocrine cells (EECs) are gut epithelial cells that respond to intestinal contents by secreting hormones, including incretins GLP-1 and GIP, which regulate multiple physiological processes. Hormone release is controlled through metabolite-sensing proteins. Low expression, interspecies differences, and existence of multiple EEC subtypes have posed challenges to the study of these sensors. We describe differentiation of stomach EECs to complement existing intestinal organoid protocols. CD200 emerged as a pan-EEC surface marker, allowing deep transcriptomic profiling from primary human tissue along the stomach-intestinal tract. We generated loss-of-function mutations in 22 receptors and subjected organoids to ligand-induced secretion experiments. We delineate the role of individual human EEC sensors in hormone secretion, including GLP-1. These represent potential pharmacological targets to influence appetite, bowel movement, insulin sensitivity and mucosal immunity.
https://doi.org/10.5061/dryad.zgmsbccn1
Description of the data and file structure
Processed, normalized RNA-seq data for all called genes, generated according to the VASA-Seq manuscript (DOI: https://doi-org.utrechtuniversity.idm.oclc.org/10.1038/s41587-022-01361-8).
This set includes:
- File: counts.csv \
  Count matrix
- File: obs_metadata.csv \
  Observation metadata table with information about tissue origin, cluster number, cell type, etc.
- File: var_metadata.csv \
  Variable metadata table with information about gene name, total counts, mean counts, etc.
Data structure
Data is structured as an AnnData object: \
https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.html
Cells are rows (observations)
Genes are columns (variables)
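The row/column convention above can be checked before any analysis. The following is a minimal sketch using only the Python standard library. It assumes (this is not stated in the README) that each CSV has a header row and that its first column holds the cell or gene identifier. The function name `check_alignment` is hypothetical.

```python
import csv


def _first_column(path):
    """Return the first field of every non-empty data row (header skipped)."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # assumption: the first row is a header
        return [row[0] for row in reader if row]


def check_alignment(counts_path, obs_path, var_path):
    """Compare the count matrix axes with the two metadata tables.

    Assumes cells are rows and genes are columns in counts.csv, as the
    README states, and that the first column of each file is an index.
    """
    obs_ids = _first_column(obs_path)
    var_ids = _first_column(var_path)
    with open(counts_path, newline="") as handle:
        reader = csv.reader(handle)
        gene_ids = next(reader)[1:]  # header row minus the index column
        cell_ids = [row[0] for row in reader if row]  # streamed row by row
    return {
        "n_cells": len(cell_ids),
        "n_genes": len(gene_ids),
        "cells_match_obs": cell_ids == obs_ids,
        "genes_match_var": gene_ids == var_ids,
    }


# Usage (paths refer to the files listed above):
# report = check_alignment("counts.csv", "obs_metadata.csv", "var_metadata.csv")
```

Reading counts.csv row by row keeps memory use low for the roughly 500 MB matrix. If the identifiers match, the three tables can be combined into a single AnnData object with the tooling of your choice.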
File details
Details for: counts.csv
- Description: A comma-delimited file that contains read counts that represent the normalized number of sequencing reads mapped to each gene/transcript.
- Format(s): .csv
- Sequencing Technology: Illumina
Details for: obs_metadata.csv
- Description: A comma-delimited file that contains observation metadata, providing information about each cell in the scRNA-seq dataset. Each row corresponds to a single cell, and each column represents a different metadata attribute.
- Format(s): .csv
- Variables:
  - Plate_number: The plate number on which the cells were processed (pl1 to pl11).
  - Plate_ID: A unique identifier for each plate.
    - 'HUB_AM_003', 'HUB_JB_058', 'HUB_JB_067', 'HUB_JB_070', 'HUB_JB_072', 'HUB_JB_073', 'HUB_JB_75', 'HUB_JB_76', 'HUB_JB_77', 'HUB_JB_78', 'HUB_JB_s050'
  - Library_prep: The library preparation method used for scRNA-seq.
    - VASA-Seq
    - CEL-Seq2
  - CD200_status: The expression status of the CD200 marker.
    - Positive
    - Negative
    - Tbd (unknown at time of sorting)
  - Tissue: The tissue from which the cells were isolated.
    - colon
    - corpus
    - duodenum
    - ileum
    - ileum_crypt
    - ileum_villus
    - pylorus
  - Patient_number: Identifier used to separate samples.
    - 1 to 4
  - n_genes_by_counts: The number of genes detected in each cell based on read counts.
  - total_counts: The total number of reads mapped to genes in each cell.
  - total_counts_mt: The total number of reads mapped to mitochondrial genes in each cell.
  - pct_counts_mt: The percentage of reads mapped to mitochondrial genes in each cell.
  - total_counts_ribo: The total number of reads mapped to ribosomal genes in each cell.
  - pct_counts_ribo: The percentage of reads mapped to ribosomal genes in each cell.
  - percent_mt2: Same as pct_counts_mt.
  - n_genes: The number of genes detected in each cell.
  - S_score: A score representing the cell cycle S phase activity.
  - G2M_score: A score representing the cell cycle G2/M phase activity.
  - phase: The assigned cell cycle phase based on the scores (e.g., G1, S, G2M).
  - leiden_0.4 to leiden_4.5: Cluster assignments from the Leiden clustering algorithm at different resolutions (0.4 to 4.5).
  - louvain_0.4 to louvain_2.0: Cluster assignments from the Louvain clustering algorithm at different resolutions (0.4 to 2.0).
  - cell type: The annotated cell type for each cell.
    - Brunner gland
    - CBCs
    - Colonocyte
    - D cells
    - ECL
    - ECs
    - Endothelial
    - Enterocytes
    - G cells
    - Goblet cells
    - Immune/plasma cells
    - MX cells
    - Mucinous (gastric)
    - Progenitors
    - TA cells
    - Tuft cells
    - X cells
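As an illustration of how the observation metadata might be used, the sketch below tallies cells per tissue and CD200 status and checks that pct_counts_mt equals 100 × total_counts_mt / total_counts, which is what the definitions above imply. It assumes the CSV header uses exactly the variable names listed above. The function name `summarize_obs` is hypothetical.

```python
import csv
from collections import Counter


def summarize_obs(obs_path):
    """Tally cells per (Tissue, CD200_status) and check the mt percentage.

    Returns the tally and the largest absolute difference between the
    stored pct_counts_mt and the value recomputed from the two totals.
    """
    tally = Counter()
    max_diff = 0.0
    with open(obs_path, newline="") as handle:
        for row in csv.DictReader(handle):
            tally[(row["Tissue"], row["CD200_status"])] += 1
            total = float(row["total_counts"])
            if total > 0:  # skip empty cells to avoid division by zero
                recomputed = 100.0 * float(row["total_counts_mt"]) / total
                stored = float(row["pct_counts_mt"])
                max_diff = max(max_diff, abs(recomputed - stored))
    return tally, max_diff


# Usage:
# tally, max_diff = summarize_obs("obs_metadata.csv")
# for (tissue, status), n in sorted(tally.items()):
#     print(tissue, status, n)
```

A small non-zero `max_diff` would be expected from floating-point rounding in the stored values.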
Details for: var_metadata.csv
- Description: A comma-delimited file that contains variable metadata, providing information about each gene in the scRNA-seq dataset. Each row corresponds to a gene, and each column represents a different metadata attribute.
- Format(s): .csv
- Variables:
  - mt: Indicates whether a gene is mitochondrial (e.g., True, False).
  - ribo: Indicates whether a gene is ribosomal (e.g., True, False).
  - n_cells_by_counts: The number of cells expressing a particular gene (where expression is defined as having one or more counts).
  - mean_counts: The average expression level (mean count) of a gene across all cells.
  - pct_dropout_by_counts: The percentage of cells where a particular gene has zero counts (not detected).
  - total_counts: The total number of reads mapped to a particular gene across all cells.
  - n_cells: The total number of cells in the dataset.
  - Gene_Name: The official gene symbol or name.
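The per-gene metrics defined above are related to each other. The sketch below computes them for one gene from a list of per-cell counts, following the definitions in this list. The column names resemble those produced by scanpy's `calculate_qc_metrics`, but that the authors used that function, or computed the metrics on the same normalized values stored in counts.csv, is an assumption. The function name `gene_qc_metrics` is hypothetical.

```python
def gene_qc_metrics(counts_per_cell):
    """Compute the per-gene metrics described above for a single gene.

    counts_per_cell: one count value per cell for this gene.
    """
    n_total = len(counts_per_cell)
    if n_total == 0:
        raise ValueError("at least one cell is required")
    # A gene counts as expressed in a cell if it has a non-zero count.
    n_expressing = sum(1 for value in counts_per_cell if value > 0)
    total = float(sum(counts_per_cell))
    return {
        "n_cells_by_counts": n_expressing,
        "mean_counts": total / n_total,
        "pct_dropout_by_counts": 100.0 * (1.0 - n_expressing / n_total),
        "total_counts": total,
    }


# Worked example: a gene detected in 2 of 4 cells.
example = gene_qc_metrics([0, 2, 0, 6])
# example["pct_dropout_by_counts"] is 50.0 and example["mean_counts"] is 2.0
```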
Human stomach and intestine library preparation and sequencing
Using a BD FACS Fusion, single cells were sorted into 384-well hardshell plates containing CEL-Seq2/SORT-seq primers. After sorting, plates were sealed (Greiner, SILVERseal sealer, 676090) and spun down at 2,000 relative centrifugal force (r.c.f.) for 2 minutes (Eppendorf 5810R). Plates were stored at −80 °C before being processed. All plates except plate JB-s050 were processed according to the previously published VASA-plate library preparation protocol. Plate JB-s050 was processed using the CEL-Seq2 library preparation protocol. Samples were sequenced on a NextSeq 2000 with the following parameters: 150,000 reads/cell, Read 1 26 cycles (Index 6 cycles), Read 2 60 cycles. Plate AM003 was an exception, for which Read 1 was 30 cycles and Read 2 was 120 cycles. FASTQ file pre-processing (VASA-plate) and mapping of VASA data were performed according to previously published methods. In short, read 1 contains the 6 nt long UMI and the cell-specific barcode (one unique barcode per well in a 384-well plate) and is thus used to assign reads to a specific cell/well. Read 2 was trimmed and mapped to the human GRCh38 genome (Ensembl 99). Data from all plates were mapped to the genome indexed for 60 nucleotides.
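To illustrate how read 1 assigns reads to wells, here is a minimal sketch of barcode-based assignment with UMI counting. It is not the published pipeline. The 6 nt UMI length comes from the text above. The 8 nt barcode length and the UMI-before-barcode order are assumptions chosen for illustration, and the function names and the example barcode are hypothetical.

```python
from collections import defaultdict

UMI_LEN = 6      # stated in the methods text
BARCODE_LEN = 8  # assumption: barcode length is not given in the text


def split_read1(seq, umi_len=UMI_LEN, barcode_len=BARCODE_LEN):
    """Split read 1 into (UMI, cell barcode); assumes the UMI comes first."""
    if len(seq) < umi_len + barcode_len:
        raise ValueError("read 1 is shorter than UMI plus barcode")
    return seq[:umi_len], seq[umi_len:umi_len + barcode_len]


def count_umis(records, barcode_to_well):
    """Count unique UMIs per well and gene.

    records: iterable of (read1_sequence, gene) pairs, where gene is the
    feature that the paired read 2 was mapped to.
    barcode_to_well: dict mapping each known barcode to a well name.
    """
    seen = defaultdict(set)
    for read1, gene in records:
        umi, barcode = split_read1(read1)
        well = barcode_to_well.get(barcode)
        if well is None:
            continue  # discard reads whose barcode matches no well
        seen[(well, gene)].add(umi)  # duplicate UMIs collapse in the set
    counts = defaultdict(dict)
    for (well, gene), umis in seen.items():
        counts[well][gene] = len(umis)
    return dict(counts)
```

Collapsing identical UMIs per well and gene means that PCR duplicates of one original molecule are counted once.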
Libraries were generated following the protocol described for VASA-Seq: https://doi-org.utrechtuniversity.idm.oclc.org/10.1038/s41587-022-01361-8
For references, see the original manuscript: https://www.science.org/doi/10.1126/science.adl1460