Cell types, positional codes, and enhancers contributing to human facial individuality and pathology

Erickson, Alek1 2; Gerstein, Yakov3; Galimullina, Rozalina3; Waern, Felix2; Riba, Tonyak1 2; Kaiser, Marketa4; Schnyder, Daniela2; Li, Lei5; Vaulin, Nikita3; Samuelsson, Saga5; Murtazina, Aliia2; Isaev, Sergei3; Bouderlique, Thibault3; Parobkova, Viktoria6; Zeberg, Hugo2; Zikmund, Tomas6; Kaiser, Jozef6; Fried, Kaj2; Shagimardanova, Elena7; Shigapova, Leyla7; Gazizova, Guzel7; Reva, Ivan7; Deviatiiarov, Ruslan8 7; Vymazalova, Katerina9; Goovaerts, Seppe10; Lor, Yuk Kit2; Castelo-Branco, Goncalo2; Claes, Peter10; Li, Xiaofei2; Gusev, Oleg7 8; Chagin, Andrei2 5; Sundstrom, Erik2; Kharchenko, Peter11; Adameyko, Igor3 2

Published May 28, 2026 on Dryad. https://doi.org/10.5061/dryad.7m0cfxq8t

Data files

May 28, 2026 version files (71.16 GB total)

File	Size
README.md	28.43 KB
RNAseq.tar.gz	5.74 GB
single_cell_multiom.tar.gz	62.64 GB
spatial.tar.gz	2.78 GB

Abstract

We created a multi-modal facial atlas across embryonic weeks 6-11, providing single cell transcriptomics, chromatin accessibility and spatial transcriptomics, all at single cell resolution. We characterized 56 cell states, mapping mesenchymal subtypes and their gene-enhancer cis-regulatory landscapes in space and time. Gene expression associations with facial traits were strongest in early mesenchymal progenitor cells, gradually becoming more region-restricted. Autocorrelation analysis revealed patterning genes that define spatial neighborhoods of mesenchyme, potentially explaining trait-specificity of their nearby variants. Enhancers of key disease-related genes were found to be likely vehicles for generating facial variation in modern human populations. One such enhancer, linked to PAX1 expression, appears to be essential for normal skeletal development in mice. Finally, GWAS prediction, cell signaling interaction analysis and validation in mice revealed that peripheral nerves fine-tune maxilla shape during embryonic development. This multi-modal atlas should be a useful resource for researchers studying mechanisms of craniofacial disorders and the gene regulatory mechanisms driving human facial individuality.

Dataset DOI: 10.5061/dryad.7m0cfxq8t

Description of the data and file structure

Sample preparation for Single Cell RNA-seq and Single Cell Multiome ATAC + Gene Expression

Each sample was dissociated separately using Collagenase P, 1 mL of 25 U/mL Dispase, 1 mL of 10X TrypLE Select, and 7 mL of HBSS. The chondrocranial region and surrounding tissue were separated and placed in ice-cold PBS prior to tissue dissociation. Dissection of different regions was then performed under a stereoscope with darkfield illumination to enrich samples for cells originating from a specific facial region (mandibular/lower jaw region, eye/maxillary region, external nose region, external ear region, forehead region). A subset of samples was processed directly without regional separation. Each dissected tissue was chopped into 1 mm pieces, placed into digestion buffer, and incubated in a shaking incubator at 37°C for 4 periods of 5 minutes. The supernatant was transferred to a new 15 mL Falcon tube, and 2% FBS in HBSS was added on ice until each tube was filled to 10 mL; from then on, samples were kept on ice. Samples were centrifuged at 500g for 15 minutes at 4°C, and each pellet was resuspended in 2 mL of 2% FBS/HBSS and subsequently passed through a 40-micron filter. Cell suspensions were placed into 2 mL Eppendorf tubes and spun at 500g for 5 min at 4°C on a tabletop centrifuge. Supernatant was removed and cell pellets were resuspended in PBS/0.05% BSA.

For 10X 3' Gene Expression libraries, 400 mL methanol was added to the suspension, and samples were stored at -80°C. For 10X snRNA/ATAC, pellets were suspended in 20% FBS and 10% DMSO in RPMI. Cells were brought to +4°C and pelleted. Supernatant was removed and 500 µL of DPBS containing 1.0% BSA and 0.5 U/µL RNAse Out was used to wash cells before sorting by FACS. After sorting, cells were concentrated to 10,000 cells in 50 µL. Nuclear isolation for multi-omics experiments was performed in accordance with the 10x Genomics Chromium Single Cell Multiome ATAC + Gene Expression Kit.

Library preparation was done in accordance with the 10x Genomics Chromium Single Cell 3' protocol for Reagent Kits v3.1 and the 10x Genomics Chromium Single Cell Multiome ATAC + Gene Expression Kit. For human fetal cells we aimed to recover 10,000 cells or nuclei per chip. Libraries were sequenced on an Illumina NovaSeq 6000 Sequencing System (using the NovaSeq 6000 S1 Reagent Kit or the NovaSeq 6000 S2 Reagent Kit). The read setup was the standard one recommended by 10X Genomics: Read 1: 28 cycles (cell barcode and UMI), i7 index: 8 cycles (sample index), Read 2: 91 cycles.

Visium and Stereoseq sample preparation

For spatial analysis, human tissue was snap-frozen in liquid nitrogen and stored at -80°C. Tissue blocks were sectioned on a cryostat and placed on a 10X Visium capture slide. RNA quality of all samples was determined to be RIN > 9. Library preparation was performed according to the user manuals for Stereoseq (en.stomics.tech) and Visium (10xgenomics.com).

Files and variables

/single_cell_multiom.tar.gz

`single_cell_multiom/` — Raw single-cell multiome data

This directory contains single-cell multiome data (joint gene expression and chromatin accessibility, ATAC) for 12 samples. Each sample has three files: a feature-barcode matrix, an ATAC fragment file, and a tabix index for the fragment file. All filenames follow the pattern <filetype>_<region>_<sampleindex>.

Feature-barcode matrices

Each filtered_feature_bc_matrix_*.h5 file is an HDF5-formatted feature-barcode matrix produced by 10x Genomics Cell Ranger ARC. In each matrix, rows represent features — the genes (gene expression) and peaks (chromatin accessibility / ATAC) detected — and columns represent the cell-associated barcodes, i.e. all barcodes identified as corresponding to a cell. Values are the counts for each feature in each cell barcode.

ATAC fragment files

For each sample there are two accompanying ATAC-seq files:

atac_fragments_*.tsv.gz — a gzip-compressed tabular file in which each line represents a unique ATAC-seq fragment captured by the assay. Each record gives the genomic interval of the fragment, the associated cell barcode, and a read-support count.
atac_fragments_*.tsv.gz.tbi — a tabix index of the corresponding fragment file. It indexes the fragment intervals and enables fast random access to records from an arbitrary genomic interval, without decompressing the whole file. Each .tbi file must be kept together with its matching .tsv.gz file.
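
As a rough illustration of how the fragment files described above can be read without any ATAC-specific tooling, the sketch below counts fragments per cell barcode. It assumes the usual 10x tab-separated column order (chromosome, start, end, barcode, read support) and that header lines start with `#`; the function names `count_fragments` and `fragments_per_barcode` are hypothetical helpers, not part of the dataset or of any library.

```python
import gzip
from collections import Counter


def count_fragments(lines):
    """Count fragment records per cell barcode from an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        # Assumed column order: chrom, start, end, barcode, read support.
        chrom, start, end, barcode, support = line.rstrip("\n").split("\t")[:5]
        counts[barcode] += 1
    return counts


def fragments_per_barcode(path):
    """Open a gzip-compressed fragment file and count fragments per barcode."""
    with gzip.open(path, "rt") as handle:
        return count_fragments(handle)
```

For example, `fragments_per_barcode("atac_fragments_nose_MO_23_009.tsv.gz")` would stream the whole file once. For region-level queries, the accompanying .tbi index with tabix is the better route.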

Sample metadata and file mapping

The sample index in each filename identifies the sample. Region is the anatomical region the sample was dissected from, and Age (weeks) is the developmental embryonic stage at collection.

Sample index	Region	Age (weeks)	Matrix file	Fragment file	Tabix index
MO_23_009	nose	6.5	filtered_feature_bc_matrix_nose_MO_23_009.h5	atac_fragments_nose_MO_23_009.tsv.gz	atac_fragments_nose_MO_23_009.tsv.gz.tbi
MO_23_010	eye	6.5	filtered_feature_bc_matrix_eye_MO_23_010.h5	atac_fragments_eye_MO_23_010.tsv.gz	atac_fragments_eye_MO_23_010.tsv.gz.tbi
MO_23_011	jaw	6.5	filtered_feature_bc_matrix_jaw_MO_23_011.h5	atac_fragments_jaw_MO_23_011.tsv.gz	atac_fragments_jaw_MO_23_011.tsv.gz.tbi
Px_101	anterior	6.5	filtered_feature_bc_matrix_anterior_Px_101.h5	atac_fragments_anterior_Px_101.tsv.gz	atac_fragments_anterior_Px_101.tsv.gz.tbi
Px_102	posterior	6.5	filtered_feature_bc_matrix_posterior_Px_102.h5	atac_fragments_posterior_Px_102.tsv.gz	atac_fragments_posterior_Px_102.tsv.gz.tbi
MO_21_002	nose	10.5	filtered_feature_bc_matrix_nose_MO_21_002.h5	atac_fragments_nose_MO_21_002.tsv.gz	atac_fragments_nose_MO_21_002.tsv.gz.tbi
MO_21_003	jaw	10.5	filtered_feature_bc_matrix_jaw_MO_21_003.h5	atac_fragments_jaw_MO_21_003.tsv.gz	atac_fragments_jaw_MO_21_003.tsv.gz.tbi
MO_21_004	eye	10.5	filtered_feature_bc_matrix_eye_MO_21_004.h5	atac_fragments_eye_MO_21_004.tsv.gz	atac_fragments_eye_MO_21_004.tsv.gz.tbi
MO_24_019	jaw	11	filtered_feature_bc_matrix_jaw_MO_24_019.h5	atac_fragments_jaw_MO_24_019.tsv.gz	atac_fragments_jaw_MO_24_019.tsv.gz.tbi
MO_24_020	eye	11	filtered_feature_bc_matrix_eye_MO_24_020.h5	atac_fragments_eye_MO_24_020.tsv.gz	atac_fragments_eye_MO_24_020.tsv.gz.tbi
MO_24_021	nose	11	filtered_feature_bc_matrix_nose_MO_24_021.h5	atac_fragments_nose_MO_24_021.tsv.gz	atac_fragments_nose_MO_24_021.tsv.gz.tbi
MO_24_022	rest (remaining tissue)	11	filtered_feature_bc_matrix_rest_MO_24_022.h5	atac_fragments_rest_MO_24_022.tsv.gz	atac_fragments_rest_MO_24_022.tsv.gz.tbi

The three files for any given sample (matrix, fragment file, fragment index) share the same <region>_<sampleindex> suffix and belong together.
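
The naming rule above can be turned into a small check that all three files for a sample are present after unpacking. This is only a sketch: `multiome_files` and `missing_files` are hypothetical helper names, and the region and sample index values are taken from the table above.

```python
def multiome_files(region, sample_index):
    """Return the three expected file names for one multiome sample."""
    suffix = f"{region}_{sample_index}"
    return {
        "matrix": f"filtered_feature_bc_matrix_{suffix}.h5",
        "fragments": f"atac_fragments_{suffix}.tsv.gz",
        "index": f"atac_fragments_{suffix}.tsv.gz.tbi",
    }


def missing_files(region, sample_index, present):
    """List expected file names that are absent from a collection of names."""
    present = set(present)
    expected = multiome_files(region, sample_index).values()
    return sorted(name for name in expected if name not in present)
```

Passing `os.listdir("single_cell_multiom")` as `present` gives a quick completeness check before loading a sample.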

Loading the data

The .h5 matrices can be loaded with standard single-cell tools such as Cell Ranger ARC, Scanpy (scanpy.read_10x_h5), or Seurat/Signac (Read10X_h5). The atac_fragments_*.tsv.gz files, together with their .tbi indices, are read by ATAC-aware tools such as Signac (CreateFragmentObject) or accessed directly with tabix.

Processed files:

/multiom_all_processed.rds - Seurat object saved as an RDS file containing all multiome data, preprocessed and normalized, with metadata included. It can be loaded with readRDS("$filepath") and analyzed further with the Seurat and Signac R packages.

/subset_multiom_processed.rds - a subsampled version of multiom_all_processed.rds containing the same assays and metadata as the full object, with around 100 cells per major cell type.

Software requirements for processed files

These objects are Seurat v5 objects and open in R using the Seurat and Signac packages. They were created under Seurat 5.0.0 and verified to load correctly under the following versions; we recommend using these or newer:

R version 4.4 or newer (verified on 4.5.2)
Seurat version 5.0.0 or newer (verified on 5.4.0)
SeuratObject version 5.0.2 or newer (verified on 5.3.0)
Signac version 1.16.0 or newer (verified on 1.17.1)

The objects require Seurat v5 and will not load correctly under Seurat v4.

To load preprocessed files (in R):


## install.packages("Signac")
## install.packages("Seurat")
library(Signac)
library(Seurat)

obj <- readRDS("subset_multiom_processed.rds")

## basic object overview
obj

/RNAseq.tar.gz

This folder contains RNA sequencing data from various parts of the face.

The dataset includes both single-cell RNA-seq (scRNAseq) and single-nucleus RNA-seq (snRNAseq) files in HDF5 (.h5) format.

/count_matrices - folder structure for raw counts

scRNAseq/ : single-cell RNA-seq count matrices

snRNAseq/ : single-nucleus RNA-seq count matrices

Each file name indicates the tissue/region and the developmental age in post-conception weeks (PCW 6.5 to PCW 11.5), corresponding to different embryonic stages.

File Description by Part of the Face

Whole Face Samples

F1_6_5.h5
F2_6_5.h5
half_9_5.h5
half_11_5.h5

Ear

ear_6_5.h5
ear_9.h5
ear_11p5.h5

Eye

eye_6_5.h5
eye_7.h5
eye_9.h5
eye_11.h5
eye_11p5.h5
eyes_10_5.h5

Jaw

jaw_6_5.h5
jaw_7.h5
jaw_9.h5
jaw_10_5.h5
jaw_11.h5
jaw_11p5.h5

Nose

nose_6_5.h5
nose_7.h5
nose_9.h5
nose_10_5.h5
nose_11.h5
nose_11p5.h5

Other Facial Parts

rest_7.h5
rest_11.h5
suture_10_5.h5

Twin Samples

twin1_7_5.h5
twin2_7_5.h5

Single-Nucleus RNAseq (snRNAseq)

Eye

sn_eye_6.h5
sn_eye_10.h5
sn_eye_11.h5

Jaw

sn_jaw_6.h5
sn_jaw_10.h5
sn_jaw_11.h5

Nose

sn_nose_6.h5
sn_nose_10.h5
sn_nose_11.h5

Other Regions

sn_anterior_6_5.h5
sn_posterior_6_5.h5
sn_rest_11.h5
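
The file names listed above can be split into assay, region, and age programmatically. The sketch below rests on an assumption inferred from the names and the stated PCW 6.5 to 11.5 range, not on anything documented: that `_5` and `p5` both encode a half week (so `6_5` and `11p5` mean 6.5 and 11.5) and that an `sn_` prefix marks single-nucleus samples. `parse_sample_name` is a hypothetical helper.

```python
def parse_sample_name(filename):
    """Split a count-matrix file name into assay, region, and age (PCW)."""
    stem = filename[:-3] if filename.endswith(".h5") else filename
    # Assumption: an 'sn_' prefix marks single-nucleus RNA-seq samples.
    is_nucleus = stem.startswith("sn_")
    if is_nucleus:
        stem = stem[3:]
    # Everything before the first underscore is the region label.
    region, _, age_token = stem.partition("_")
    # Assumption: '6_5' and '11p5' both encode half weeks (6.5, 11.5).
    age = float(age_token.replace("p", ".").replace("_", "."))
    return {
        "assay": "snRNAseq" if is_nucleus else "scRNAseq",
        "region": region,
        "age_pcw": age,
    }
```

Calling `parse_sample_name("jaw_11p5.h5")` yields the region `jaw` with an age of 11.5, which is convenient for building a per-sample metadata table.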

/all_merged1.h5ad - all scRNAseq samples merged and preprocessed

Preprocessing and merging procedure for all_merged1.h5ad file

Processing of count matrices

The RNA sequencing data were processed in a Linux environment using Python-based single-cell analysis workflows implemented primarily with Scanpy, with additional downstream analyses using scVelo, CellRank, Harmony, and scFates where applicable.

Initial per-sample preprocessing

Each dataset was processed independently from 10x Genomics HDF5 count matrices using scanpy.read_10x_h5. Gene names were made unique.

Initial filtering steps applied across all datasets included removal of cells with fewer than 200 detected genes and removal of genes detected in fewer than 3 cells.

Mitochondrial genes were annotated as genes beginning with MT-, and per-cell QC metrics were computed, including total counts, number of detected genes, and percentage of mitochondrial counts.

For each dataset, QC distributions were visually inspected using histograms of these metrics. Based on these distributions, dataset-specific thresholds were applied for gene complexity and total counts. In contrast, mitochondrial filtering was performed using a fixed threshold across all datasets, with cells exceeding 5% mitochondrial counts removed.

QC metrics were recalculated after filtering.
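
To make the fixed thresholds above concrete, here is a library-free sketch of the same filtering logic on a toy count table. The actual workflow used Scanpy. This stand-in only mirrors the three fixed rules (at least 200 detected genes per cell, genes detected in at least 3 cells, at most 5% mitochondrial counts) and omits the dataset-specific thresholds. The function name `qc_filter` and the dict-of-dicts input layout are assumptions made for illustration.

```python
from collections import Counter


def qc_filter(cells, min_genes=200, min_cells=3, max_pct_mt=5.0):
    """Apply fixed QC thresholds to {barcode: {gene: count}} data."""
    # 1. Drop cells with too few detected (non-zero) genes.
    kept = {
        bc: genes
        for bc, genes in cells.items()
        if sum(1 for c in genes.values() if c > 0) >= min_genes
    }
    # 2. Drop genes detected in too few of the remaining cells.
    detected = Counter(
        gene for genes in kept.values() for gene, c in genes.items() if c > 0
    )
    keep_genes = {gene for gene, n in detected.items() if n >= min_cells}
    kept = {
        bc: {g: c for g, c in genes.items() if g in keep_genes}
        for bc, genes in kept.items()
    }
    # 3. Drop cells whose mitochondrial (MT-) share exceeds the threshold.
    result = {}
    for bc, genes in kept.items():
        total = sum(genes.values())
        mito = sum(c for g, c in genes.items() if g.startswith("MT-"))
        pct_mt = 100.0 * mito / total if total else 0.0
        if pct_mt <= max_pct_mt:
            result[bc] = genes
    return result
```

On real data the defaults apply as stated. Smaller values for `min_genes` and `min_cells` are only useful for trying the function on toy input.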

Doublet detection and removal

Doublets were identified using Scrublet with a threshold of 0.2. The score distribution was inspected, and cells classified as predicted doublets were removed prior to downstream analysis.

Gene filtering

After QC and doublet removal, a predefined list of sex-linked genes was excluded from the dataset if present. This list included X- and Y-chromosome genes (e.g., XIST, SRY, ZFY, RPS4Y1, DDX3Y, KDM5D).

Dataset concatenation

Each processed dataset was saved as an AnnData (.h5ad) object. Datasets were merged in a two-step procedure. First, samples were grouped and concatenated using an outer join to preserve the union of genes within each group.

Batch 1 (outer join):

ear_6_5, eyes_10_5, eye_6_5, half_11_5, half_9_5, jaw_10_5, jaw_6_5, nose_10_5, nose_6_5, suture_10_5, twin1_7_5, twin2_7_5, sn_anterior_6_5, sn_posterior_6_5

Batch 2 (outer join):

ear_11p5, ear_9, eye_11, eye_11p5, eye_7, eye_9, F1_6_5, F2_6_5, jaw_11, jaw_11p5, jaw_7, jaw_9, nose_11, nose_11p5, nose_7, nose_9, rest_11, rest_7, sn_eye_10, sn_eye_11, sn_eye_6, sn_jaw_10, sn_jaw_11, sn_jaw_6, sn_nose_10, sn_nose_11, sn_nose_6, sn_rest_11

These grouped datasets were then merged using an inner join to retain only genes shared across all datasets, ensuring a consistent feature space for downstream analysis.
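
The effect of the two-step merge on the gene space can be illustrated with plain sets: a union of genes within each batch, followed by an intersection across batches. This is a simplified sketch of the join logic only (the actual merge operated on AnnData objects), and `merged_gene_space` is a hypothetical helper name.

```python
def merged_gene_space(groups):
    """Return genes kept after an outer join within groups and an inner join across them.

    `groups` is a list of batches; each batch is a list of per-dataset gene sets.
    """
    # Outer join: union of genes over the datasets inside each batch.
    unions = [set().union(*datasets) for datasets in groups]
    # Inner join: keep only genes present in every batch-level union.
    return set.intersection(*unions)
```

Under this scheme a gene survives if it appears in at least one dataset of each batch, so the final feature space can be somewhat larger than a strict intersection over all individual samples.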

Cell cycle scoring and regression

Following dataset concatenation, cell cycle effects were quantified and regressed out. Cell cycle genes were obtained from a predefined list of human cell cycle–associated genes and divided into S phase and G2/M phase gene sets. Cells were scored using scanpy.tl.score_genes_cell_cycle, and the resulting S and G2M scores were regressed out using scanpy.pp.regress_out. This step was performed prior to downstream dimensionality reduction and clustering.

The final merged object used for downstream analysis is provided as all_merged1.h5ad.

Code availability

All analysis steps described above are implemented in the provided Jupyter notebooks and can be used to reproduce the results.

/spatial.tar.gz

References for this data description:

Visium: https://www.10xgenomics.com/support/software/space-ranger/latest/analysis/outputs/spatial-outputs

Visium: https://www.10xgenomics.com/support/software/space-ranger/latest/analysis/outputs/output-overview

Stereo-seq: https://en.stomics.tech/service/new-saw-operation-manual.html

Description of the data and file structure

The files in this repository are meant to be processed in a Linux environment, using conda as the package manager and Python and R as the main processing software. The folder structure is organized as technology/sample/files.

Visium

V12U07-297_A1_alltissue
1. outs
  1. analysis
    1. clustering
      1. gene_expression_graphclust
        
        clusters.csv
      2. gene_expression_kmeans_10_clusters
        
        clusters.csv
      3. gene_expression_kmeans_2_clusters
        
        clusters.csv
      4. gene_expression_kmeans_3_clusters
        
        clusters.csv
      5. gene_expression_kmeans_4_clusters
        
        clusters.csv
      6. gene_expression_kmeans_5_clusters
        
        clusters.csv
      7. gene_expression_kmeans_6_clusters
        
        clusters.csv
      8. gene_expression_kmeans_7_clusters
        
        clusters.csv
      9. gene_expression_kmeans_8_clusters
        
        clusters.csv
      10. gene_expression_kmeans_9_clusters
        
        clusters.csv
    2. diffexp
      1. gene_expression_graphclust
        
        differential_expression.csv
      2. gene_expression_graphclust_10_clusters
        
        differential_expression.csv
      3. gene_expression_graphclust_2_clusters
        
        differential_expression.csv
      4. gene_expression_graphclust_3_clusters
        
        differential_expression.csv
      5. gene_expression_graphclust_4_clusters
        
        differential_expression.csv
      6. gene_expression_graphclust_5_clusters
        
        differential_expression.csv
      7. gene_expression_graphclust_6_clusters
        
        differential_expression.csv
      8. gene_expression_graphclust_7_clusters
        
        differential_expression.csv
      9. gene_expression_graphclust_8_clusters
        
        differential_expression.csv
      10. gene_expression_graphclust_9_clusters
        
        differential_expression.csv
    3. pca
      1. gene_expression_10_components
        
        components.csv
        
        dispersion.csv
        
        features_selected.csv
        
        projection.csv
        
        variances.csv
    4. tsne
      1. gene_expression_2_components
        
        projection.csv
    5. umap
      1. gene_expression_2_components
        
        projection.csv
  2. filtered_feature_bc_matrix
    - barcodes.tsv.gz
    - features.tsv.gz
    - matrix.mtx.gz
  3. raw_feature_bc_matrix
    - barcodes.tsv.gz
    - features.tsv.gz
    - matrix.mtx.gz
  4. spatial
    - aligned_fiducials.jpg
    - detected_tissue_image.jpg
    - scalefactors_json.json
    - spatial_enrichment.csv
    - tissue_hires_image.png
    - tissue_lowres_image.png
    - tissue_positions.csv
  5. cloupe.cloupe
  6. filtered_feature_bc_matrix.h5
  7. metrics_summary.csv
  8. molecule_info.h5
  9. raw_feature_bc_matrix.h5
  10. web_summary.html
V12U07-297_B1_alltissue
1. outs
  1. analysis
    1. clustering
      1. gene_expression_graphclust
        
        clusters.csv
      2. gene_expression_kmeans_10_clusters
        
        clusters.csv
      3. gene_expression_kmeans_2_clusters
        
        clusters.csv
      4. gene_expression_kmeans_3_clusters
        
        clusters.csv
      5. gene_expression_kmeans_4_clusters
        
        clusters.csv
      6. gene_expression_kmeans_5_clusters
        
        clusters.csv
      7. gene_expression_kmeans_6_clusters
        
        clusters.csv
      8. gene_expression_kmeans_7_clusters
        
        clusters.csv
      9. gene_expression_kmeans_8_clusters
        
        clusters.csv
      10. gene_expression_kmeans_9_clusters
        
        clusters.csv
    2. diffexp
      1. gene_expression_graphclust
        
        differential_expression.csv
      2. gene_expression_graphclust_10_clusters
        
        differential_expression.csv
      3. gene_expression_graphclust_2_clusters
        
        differential_expression.csv
      4. gene_expression_graphclust_3_clusters
        
        differential_expression.csv
      5. gene_expression_graphclust_4_clusters
        
        differential_expression.csv
      6. gene_expression_graphclust_5_clusters
        
        differential_expression.csv
      7. gene_expression_graphclust_6_clusters
        
        differential_expression.csv
      8. gene_expression_graphclust_7_clusters
        
        differential_expression.csv
      9. gene_expression_graphclust_8_clusters
        
        differential_expression.csv
      10. gene_expression_graphclust_9_clusters
        
        differential_expression.csv
    3. pca
      1. gene_expression_10_components
        
        components.csv
        
        dispersion.csv
        
        features_selected.csv
        
        projection.csv
        
        variances.csv
    4. tsne
      1. gene_expression_2_components
        
        projection.csv
    5. umap
      1. gene_expression_2_components
        
        projection.csv
  2. filtered_feature_bc_matrix
    - barcodes.tsv.gz
    - features.tsv.gz
    - matrix.mtx.gz
  3. raw_feature_bc_matrix
    - barcodes.tsv.gz
    - features.tsv.gz
    - matrix.mtx.gz
  4. spatial
    - aligned_fiducials.jpg
    - detected_tissue_image.jpg
    - scalefactors_json.json
    - spatial_enrichment.csv
    - tissue_hires_image.png
    - tissue_lowres_image.png
    - tissue_positions.csv
  5. cloupe.cloupe
  6. filtered_feature_bc_matrix.h5
  7. metrics_summary.csv
  8. molecule_info.h5
  9. raw_feature_bc_matrix.h5
  10. web_summary.html
V12U07-297_D1_alltissue
1. outs
  1. analysis
    1. clustering
      1. gene_expression_graphclust
        
        clusters.csv
      2. gene_expression_kmeans_10_clusters
        
        clusters.csv
      3. gene_expression_kmeans_2_clusters
        
        clusters.csv
      4. gene_expression_kmeans_3_clusters
        
        clusters.csv
      5. gene_expression_kmeans_4_clusters
        
        clusters.csv
      6. gene_expression_kmeans_5_clusters
        
        clusters.csv
      7. gene_expression_kmeans_6_clusters
        
        clusters.csv
      8. gene_expression_kmeans_7_clusters
        
        clusters.csv
      9. gene_expression_kmeans_8_clusters
        
        clusters.csv
      10. gene_expression_kmeans_9_clusters
        
        clusters.csv
    2. diffexp
      1. gene_expression_graphclust
        
        differential_expression.csv
      2. gene_expression_graphclust_10_clusters
        
        differential_expression.csv
      3. gene_expression_graphclust_2_clusters
        
        differential_expression.csv
      4. gene_expression_graphclust_3_clusters
        
        differential_expression.csv
      5. gene_expression_graphclust_4_clusters
        
        differential_expression.csv
      6. gene_expression_graphclust_5_clusters
        
        differential_expression.csv
      7. gene_expression_graphclust_6_clusters
        
        differential_expression.csv
      8. gene_expression_graphclust_7_clusters
        
        differential_expression.csv
      9. gene_expression_graphclust_8_clusters
        
        differential_expression.csv
      10. gene_expression_graphclust_9_clusters
        
        differential_expression.csv
    3. pca
      1. gene_expression_10_components
        
        components.csv
        
        dispersion.csv
        
        features_selected.csv
        
        projection.csv
        
        variances.csv
    4. tsne
      1. gene_expression_2_components
        
        projection.csv
    5. umap
      1. gene_expression_2_components
        
        projection.csv
  2. filtered_feature_bc_matrix
    - barcodes.tsv.gz
    - features.tsv.gz
    - matrix.mtx.gz
  3. raw_feature_bc_matrix
    - barcodes.tsv.gz
    - features.tsv.gz
    - matrix.mtx.gz
  4. spatial
    - aligned_fiducials.jpg
    - detected_tissue_image.jpg
    - scalefactors_json.json
    - spatial_enrichment.csv
    - tissue_hires_image.png
    - tissue_lowres_image.png
    - tissue_positions.csv
  5. cloupe.cloupe
  6. filtered_feature_bc_matrix.h5
  7. metrics_summary.csv
  8. molecule_info.h5
  9. raw_feature_bc_matrix.h5
  10. web_summary.html

Stereo-seq
- A02989E1.tissue.gef
- A02994D6.tissue.gef
- C01939A4.tissue.gef
- C01939A6.tissue.gef
- A02993E1.tissue.gef
- A02994E6.tissue.gef
- C01939A5.tissue.gef
- C01939B2.tissue.gef

Files and folders

visium_sample/outs/analysis/clustering/gene_expression_kmeans_x_clusters/clusters.csv

Mapping of spot barcodes to k-means cluster assignments, where x is the number of clusters (2 to 10).

visium_sample/outs/analysis/diffexp/gene_expression_graphclust_X_clusters/differential_expression.csv

Marker genes for each cluster from graph-based clustering.

visium_sample/outs/analysis/pca/gene_expression_10_components

- components.csv: PC loadings showing which genes contribute to each PC
- dispersion.csv: Gene dispersion 
- features_selected.csv: List of highly variable genes used for PCA
- projection.csv: PC coordinates for each spot
- variances.csv: Variance explained by each PC

visium_sample/outs/analysis/tsne/gene_expression_2_components/projection.csv

2D t-SNE coordinates for each spot

visium_sample/outs/analysis/umap/gene_expression_2_components/projection.csv

2D UMAP coordinates for each spot

visium_sample/outs/filtered_feature_bc_matrix

Filtered Space Ranger feature-barcode matrix in Matrix Market format, with the barcodes and features stored as separate files.
- barcodes.tsv.gz: List of spatial barcodes
- features.tsv.gz: Gene information 
- matrix.mtx.gz: Sparse matrix of gene counts

visium_sample/outs/raw_feature_bc_matrix

Raw Space Ranger feature-barcode matrix in Matrix Market format, with the barcodes and features stored as separate files.
- barcodes.tsv.gz: List of spatial barcodes
- features.tsv.gz: Gene information 
- matrix.mtx.gz: Sparse matrix of gene counts

visium_sample/outs/spatial

Files relating to the H&E tissue image. 
- aligned_fiducials.jpg: File to verify that fiducial alignment was successful.
- detected_tissue_image.jpg: Shows Space Ranger's automated tissue detection
- scalefactors_json.json: Conversion factors between coordinate systems
- spatial_enrichment.csv: Contains table of Moran's I values for each feature when specific conditions are met
- tissue_hires_image.png: Full-resolution image of tissue
- tissue_lowres_image.png: Downsampled image of tissue
- tissue_positions.csv: This text file contains a table with rows that correspond to spots
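
As a sketch of how the position table and the scale factors fit together, the example below converts full-resolution spot coordinates into pixel positions on the high-resolution image. It assumes the header-bearing tissue_positions.csv layout written by recent Space Ranger versions (columns `barcode`, `in_tissue`, `pxl_row_in_fullres`, `pxl_col_in_fullres`) and the `tissue_hires_scalef` key in scalefactors_json.json; `spots_in_hires` is a hypothetical helper that takes the two files' contents as text.

```python
import csv
import io
import json


def spots_in_hires(positions_csv_text, scalefactors_json_text):
    """Map in-tissue spot barcodes to (row, col) pixels in the hires image."""
    # Assumed key: scale factor from full-resolution to hires image pixels.
    scale = json.loads(scalefactors_json_text)["tissue_hires_scalef"]
    coords = {}
    for row in csv.DictReader(io.StringIO(positions_csv_text)):
        # Keep only spots flagged as lying under tissue.
        if row["in_tissue"] != "1":
            continue
        coords[row["barcode"]] = (
            float(row["pxl_row_in_fullres"]) * scale,
            float(row["pxl_col_in_fullres"]) * scale,
        )
    return coords
```

Reading both files with `open(...).read()` and passing the text in keeps the function easy to try on small inputs. Using `tissue_lowres_scalef` instead would target the downsampled image.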

visium_sample/outs/cloupe.cloupe

Loupe Browser visualization and analysis file

visium_sample/outs/filtered_feature_bc_matrix.h5

Same information as filtered_feature_bc_matrix/ but in HDF5 format.

visium_sample/outs/metrics_summary.csv

Run summary metrics in CSV format

visium_sample/outs/molecule_info.h5

Contains per-molecule information for all molecules that contain a valid barcode, valid UMI, and were assigned with high confidence to a gene barcode or bin.

visium_sample/outs/raw_feature_bc_matrix.h5

Same information as raw_feature_bc_matrix/ but in HDF5 format.

visium_sample/outs/web_summary.html

Run summary metrics and plots in HTML format

Stereo-seq/sample.gef

Compressed output of the SAW pipeline in gene expression file (GEF) format.

Each file holds the feature expression matrix for the region covered by tissue. It is also a visualization GEF that includes expression counts for bin sizes 1, 5, 10, 20, 50, 100, 150, and 200.

The files were read in Python with the Stereopy package at the start of processing and later saved as AnnData objects in h5ad format. Each file contains:

- Gene expression matrix
- Spatial coordinates
- Cell/bin metadata.
- Gene information
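
The bin sizes mentioned above can be understood as square windows over the capture grid. The sketch below shows one way to aggregate bin1-style records into larger bins, assuming that a bin of size N groups N x N neighboring capture positions and that each record is an `(x, y, gene, count)` tuple. Both the record layout and the helper name `bin_counts` are illustrative assumptions, and this is not how SAW or Stereopy implement binning internally.

```python
from collections import defaultdict


def bin_counts(records, bin_size):
    """Sum counts of (x, y, gene, count) records over square bins."""
    binned = defaultdict(int)
    for x, y, gene, count in records:
        # Snap each coordinate to the lower-left corner of its bin.
        key = ((x // bin_size) * bin_size, (y // bin_size) * bin_size, gene)
        binned[key] += count
    return dict(binned)
```

Larger bins trade spatial resolution for more counts per unit, which is why the visualization GEF stores several bin sizes side by side.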

Human subjects data

Human prenatal tissue was obtained from elective terminations of pregnancy (induced abortions) with informed oral and written consent from the patient, and pseudonymized.

All procedures regarding the acquisition of human prenatal tissue for research were approved by the Swedish Ethical Research Authority and the National Board of Health and Welfare (ethical reference 2018/769-31). The material was donated for general non-profit research purposes, especially with a focus on embryonic tissue development. Sample staging involved clinical ultrasound, using crown-rump length (CRL) and anatomical landmarks.
