Whole-embryo spatial transcriptomics at subcellular resolution from gastrulation to organogenesis
Data files
Dec 11, 2025 version files 128.97 GB
- matched_A_50p_E1_sphere_logFC.h5ad: 95.83 MB
- matched_A_50p_E2_sphere_logFC.h5ad: 98.04 MB
- matched_B_75p_E1_sphere_logFC.h5ad: 139.03 MB
- matched_B_75p_E2_sphere_logFC.h5ad: 143.21 MB
- README.md: 4.60 KB
- RNA_loc_A_50p_E1.csv: 15.23 GB
- RNA_loc_A_50p_E2.csv: 18.11 GB
- RNA_loc_B_75p_E1.csv: 19.36 GB
- RNA_loc_B_75p_E2.csv: 17.08 GB
- RNA_loc_C_6s_E1.csv: 17.09 GB
- RNA_loc_C_6s_E2.csv: 19.56 GB
- Tracking_Matrix.csv: 168.26 MB
- weMERFISH_combined_A_50p_E1.h5ad: 1.53 GB
- weMERFISH_combined_A_50p_E2.h5ad: 1.84 GB
- weMERFISH_combined_B_75p_E1.h5ad: 3.37 GB
- weMERFISH_combined_B_75p_E2.h5ad: 3.25 GB
- weMERFISH_combined_C_6s_E1_rescaled_z.h5ad: 5.03 GB
- weMERFISH_combined_C_6s_E2_rescaled_z.h5ad: 5.78 GB
- weMERFISH_measured_A_50p_E1.h5ad: 49.02 MB
- weMERFISH_measured_A_50p_E2.h5ad: 65.36 MB
- weMERFISH_measured_B_75p_E1.h5ad: 93.89 MB
- weMERFISH_measured_B_75p_E2.h5ad: 88.07 MB
- weMERFISH_measured_C_6s_E1_rescaled_z.h5ad: 372.98 MB
- weMERFISH_measured_C_6s_E2_rescaled_z.h5ad: 429.98 MB
Abstract
This dataset supports a whole-embryo investigation of the spatial and temporal regulation of gene expression during early zebrafish development. It includes high-resolution weMERFISH measurements of 495 genes at subcellular resolution across three developmental stages (50% epiboly/early gastrula, 75% epiboly/mid gastrula, and 6-somite stage/early organogenesis), together with imputed spatial expression patterns for 25,872 additional genes. For each embryo, the dataset provides cell-level transcript counts, spatial coordinates, subcellular localization of individual transcripts, and corresponding tissue and cluster annotations. In addition, the dataset contains the MERFISH-FATE cell correspondence tables that link fixed embryos to whole-embryo live imaging, enabling analysis of gene expression changes in the context of cell movements and lineage trajectories. These data collectively form a comprehensive spatial multiomics atlas of zebrafish gastrulation and early organogenesis, supporting studies of gene regulation, tissue patterning, boundary formation, and the integration of molecular and morphogenetic processes.
Dataset DOI: 10.5061/dryad.j0zpc86v9
Description of the data and file structure
This repository contains weMERFISH data: single-cell measured and imputed gene expression, and the subcellular localization of individual transcripts. It includes whole-embryo spatial transcriptomics measurements for 495 genes acquired at subcellular resolution across three zebrafish developmental stages, together with imputed spatial expression patterns for 25,872 additional genes generated through integration with single-cell multiome data. For each embryo, the dataset provides cell-level transcript counts, 3D spatial coordinates, transcript-level subcellular positions, and tissue and cluster annotations.
The repository also contains the MERFISH-FATE correspondence tables that link fixed embryos to whole-embryo live imaging, enabling exploration of gene expression dynamics in the context of cell movement and lineage trajectories.
Files and variables
The following files contain the locations of individual RNA molecules:
- RNA_loc_A_50p_E1.csv
- RNA_loc_A_50p_E2.csv
- RNA_loc_B_75p_E1.csv
- RNA_loc_B_75p_E2.csv
- RNA_loc_C_6s_E1.csv
- RNA_loc_C_6s_E2.csv
The columns correspond to:
- : transcript ID
- z,x,y: coordinates of the transcripts in imaging/pixel space
- Cor_: signal correlation with Gaussian
- Xdist_: distance between the three decoded dots
- Hraw: raw intensity
- memdist: distance to membrane in 3D
- memdist2d: distance to membrane in 2D
- isnuc: whether the transcript is in nuclei (TRUE) or cytoplasm (FALSE)
- icodesN: MERFISH codebook ID
- cid: cell ID
- gene: gene name
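Each RNA_loc file is 15 to 20 GB, so it may be more practical to stream it row by row than to load it whole. The following is a minimal sketch using only the Python standard library. It assumes that the header row uses the column names listed above (`gene`, `isnuc`) and that `isnuc` is stored as text such as TRUE/FALSE; the function names are hypothetical and not part of the dataset.

```python
import csv
from collections import Counter


def tally_transcripts(rows):
    """Count transcripts per gene and how many of them are nuclear.

    `rows` is any iterable of dicts with 'gene' and 'isnuc' keys,
    for example a csv.DictReader over one RNA_loc_*.csv file.
    """
    total = Counter()
    nuclear = Counter()
    for row in rows:
        gene = row["gene"]
        total[gene] += 1
        # Assumption: isnuc is text such as TRUE/FALSE (case-insensitive).
        if row["isnuc"].strip().upper() == "TRUE":
            nuclear[gene] += 1
    return total, nuclear


def tally_file(path):
    """Stream one RNA_loc file row by row instead of loading it whole."""
    with open(path, newline="") as handle:
        return tally_transcripts(csv.DictReader(handle))
```

For example, `total, nuclear = tally_file("RNA_loc_A_50p_E1.csv")` would give per-gene counts, and `nuclear[g] / total[g]` the nuclear fraction of gene `g`, provided the assumptions above hold for the actual files.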
The following files contain measured weMERFISH signal assigned to single cells
- weMERFISH_measured_A_50p_E1.h5ad
- weMERFISH_measured_A_50p_E2.h5ad
- weMERFISH_measured_B_75p_E1.h5ad
- weMERFISH_measured_B_75p_E2.h5ad
- weMERFISH_measured_C_6s_E1_rescaled_z.h5ad
- weMERFISH_measured_C_6s_E2_rescaled_z.h5ad
There are two datasets (embryos E1 and E2) per developmental stage. Single-cell measurements are stored in the scanpy AnnData format:
adata.obsm['X_raw'] contains the measured gene levels (same dimensions as adata.X)
adata.obs['clusters'] contains the cluster annotation of the cells.
For 50% epiboly (A_50p) and 75% epiboly (B_75p) data,
adata.obsm['global_sphere'] has the spherical coordinates (x,y,z)
adata.obs['theta'] and adata.obs['phi'] contain the polar coordinates of the cells on the sphere.
For 6-somite stage (C_6s) data,
adata.obsm['spatial'] should be used to visualize the coordinates, with z scaled to real-world sample thickness
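The stage-dependent choice of coordinates described above can be sketched as a small helper. This is an illustrative sketch, not the authors' code: `coords_key` and `load_with_coords` are hypothetical names, the stage is inferred from the file name, and `anndata.read_h5ad` is used to read the file.

```python
def coords_key(filename):
    """Return the .obsm key holding the coordinates to plot for a given file."""
    # 50% and 75% epiboly embryos are described on a sphere.
    if "_A_50p_" in filename or "_B_75p_" in filename:
        return "global_sphere"
    # 6-somite embryos use the z-rescaled 'spatial' coordinates.
    if "_C_6s_" in filename:
        return "spatial"
    raise ValueError(f"unrecognized stage in file name: {filename}")


def load_with_coords(path):
    """Load one file and return (adata, coordinates, cluster labels)."""
    # Imported lazily so that coords_key can be used without anndata installed.
    import anndata as ad

    adata = ad.read_h5ad(path)
    return adata, adata.obsm[coords_key(path)], adata.obs["clusters"]
```

A call such as `load_with_coords("weMERFISH_measured_A_50p_E1.h5ad")` would then return the coordinates from `adata.obsm['global_sphere']` alongside the cluster labels.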
The following files contain imputed gene expression data from scMultiome data:
- weMERFISH_combined_A_50p_E1.h5ad
- weMERFISH_combined_A_50p_E2.h5ad
- weMERFISH_combined_B_75p_E1.h5ad
- weMERFISH_combined_B_75p_E2.h5ad
- weMERFISH_combined_C_6s_E1_rescaled_z.h5ad
- weMERFISH_combined_C_6s_E2_rescaled_z.h5ad
These files contain the same data structure as the weMERFISH_measured files above, and additionally:
adata.obsm['X_imputed'] contains the imputed gene levels
adata.uns['imputed_gene_names'] contains the names of the imputed genes in the matrix above
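Because the imputed matrix is stored in `.obsm` rather than `.X`, looking up one imputed gene requires finding its position in `adata.uns['imputed_gene_names']` first. A minimal sketch follows. It assumes that the order of the names matches the column order of `adata.obsm['X_imputed']` and that the matrix supports NumPy-style column indexing; both function names are hypothetical.

```python
def imputed_gene_index(gene_names, gene):
    """Return the column index of `gene` in the imputed matrix."""
    names = [str(name) for name in gene_names]
    try:
        return names.index(gene)
    except ValueError:
        raise KeyError(f"{gene!r} is not among the imputed genes") from None


def imputed_expression(adata, gene):
    """Return the imputed expression of one gene across all cells."""
    idx = imputed_gene_index(adata.uns["imputed_gene_names"], gene)
    # Assumption: columns of X_imputed follow the order of imputed_gene_names.
    return adata.obsm["X_imputed"][:, idx]
```

If `X_imputed` turns out to be stored as a sparse matrix, the returned column would be sparse as well and may need to be converted to a dense array before plotting.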
For MERFISH-FATE
The two pairs of datasets, [matched_A_50p_E1_sphere_logFC.h5ad and matched_B_75p_E1_sphere_logFC.h5ad] and [matched_A_50p_E2_sphere_logFC.h5ad and matched_B_75p_E2_sphere_logFC.h5ad], are weMERFISH datasets connected through live imaging data. Within a pair, cells that share the same .obs['weMERFISH_id'] occupy corresponding lineage/spatial locations at the two stages. The ID of the matched cell in the live tracking dataset is stored in .obs['Amat_id'], which corresponds to the first column (node id) of the csv file listed below.
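The correspondence through `weMERFISH_id` can be sketched as a lookup between the two files of a pair. This is a sketch under assumptions: it treats each ID as identifying at most one cell per file (the first occurrence is kept otherwise), and `pair_cells` is a hypothetical helper name.

```python
def pair_cells(ids_50p, ids_75p):
    """Map each shared weMERFISH_id to (row in 50% file, row in 75% file)."""
    first_row = {}
    for row, cell_id in enumerate(ids_50p):
        # Keep the first occurrence if an ID appears more than once.
        first_row.setdefault(cell_id, row)

    pairs = {}
    for row, cell_id in enumerate(ids_75p):
        if cell_id in first_row and cell_id not in pairs:
            pairs[cell_id] = (first_row[cell_id], row)
    return pairs
```

With both files of a pair loaded, `pair_cells(adata_50p.obs['weMERFISH_id'], adata_75p.obs['weMERFISH_id'])` would give the row positions of matched cells, which can then be used to compare expression between the two stages.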
Tracking_Matrix.csv is the tracking result of the live imaging data, with TM0 being 50% epiboly and TM230 being 75% epiboly.
Variables
- node id: ID of the cell
- timepoint: time point of the cell in the movie; the time step is 90 s
- x: x coordinates
- y: y coordinates
- z: z coordinates
- number of children: number of children of the listed cell
- number of parents: number of parents of the listed cell
- sum number of children and parents: total number of children and parents of the listed cell
- child 1 id: ID of child cell 1, corresponds to the first column
- child 2 id: ID of child cell 2, corresponds to the first column
- parent id: ID of the parent cell, corresponds to the first column
- ancestor id: ancestor cell ID from time point 0 or when the lineage started
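The parent links in the variables above allow a cell to be traced back through the movie. The following is a minimal standard-library sketch. It assumes that the CSV header uses the names listed above (`node id`, `parent id`) and that a cell without a parent carries a value that is not itself a node ID; the function names are hypothetical.

```python
import csv

STEP_SECONDS = 90  # time step between consecutive time points, as stated above


def read_parents(path, id_col="node id", parent_col="parent id"):
    """Read the tracking table into a {node id: parent id} dictionary."""
    parents = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            parents[row[id_col]] = row[parent_col]
    return parents


def trace_back(parents, node_id):
    """Follow parent links from node_id back to the start of its lineage."""
    path = [node_id]
    seen = {node_id}
    while True:
        parent = parents.get(path[-1])
        # Stop at a missing parent or at a repeated node (guards against cycles).
        if parent is None or parent not in parents or parent in seen:
            return path
        path.append(parent)
        seen.add(parent)


def elapsed_minutes(timepoint):
    """Convert a time point index (TM number) to minutes since TM0."""
    return timepoint * STEP_SECONDS / 60
```

With a 90 s time step, TM230 corresponds to 345 minutes after TM0. Starting `trace_back` from a cell's `.obs['Amat_id']` value would give its path through the live imaging data, provided the IDs are compared in the same type (strings here, since `csv` reads text).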
Code/software
Python 3.9.13
scanpy 1.10.3
