Lineage-resolved analysis of embryonic gene expression evolution in C. elegans and C. briggsae
Data files
Jun 11, 2025 version files 10.45 GB
- briggsae_WS290_cistarget.zip (114.92 MB)
- cbr_markers.txt (53.80 MB)
- cbr_tpm_out.txt (732.23 MB)
- cel_markers.txt (71.55 MB)
- cel_tpm_out.txt (690.49 MB)
- cell_data_bins.txt (237.40 KB)
- cell_data_mean.txt (170.75 KB)
- cell_plots.zip (236.53 MB)
- clist.rds (80.85 MB)
- config.yml (210 B)
- deg.txt (136.23 MB)
- elegans_WS290_cistarget.zip (142.96 MB)
- eset.rds (1.68 GB)
- gene_data.txt (18.80 MB)
- gene_plots.zip (6.49 GB)
- README.md (32.93 KB)
Abstract
What constraints govern the evolution of gene expression patterns across development remains a fundamental question. Single-cell RNA-sequencing can detail these constraints by systematically profiling homologous cells. The conserved invariant embryonic lineage of C. elegans and C. briggsae makes them ideal for comparing cell-type gene expression across evolution. Measuring the spatiotemporal divergence of gene expression across embryogenesis, we find a high level of similarity in gene expression programs between species despite tens of millions of years of evolutionary divergence. Nonetheless, thousands of genes show divergence in their cell-type-specific expression patterns, with enrichment for functions in environmental response and behavior. Neuronal cell types show higher divergence than others, such as the intestine and germline. This work identifies likely constraints on the evolution of developmental gene expression.
Dataset DOI: 10.5061/dryad.1rn8pk15n
Description of the data and file structure
Files and variables
File: config.yml
Description: Config file needed for VisCello
File: clist.rds
Description: Cello list of UMAP projections
File: cell_plots.zip
Description: Collection of plots describing each progenitor and terminal cell type. Below are the elements in the plots:
- The relative TPM of every gene in C. elegans and C. briggsae. Whether a gene is a cell-type marker within that species or both is labeled.
- A barplot of the cell-type markers from C. elegans binned by their WormCat gene category. On top is their fold-enrichment and below is their count.
- The top cell-type markers that are shared between species (black outline), private to C. elegans (green), or private to C. briggsae (blue). The private markers can also include genes that weren't annotated as being directly orthologous between the species.
- A set of cell-type metrics, where the values for that cell type are shown in green for C. elegans and blue for C. briggsae (red for both) on top of the dataset-wide distribution.
- Cell count: number of cells in the dataset.
- The number of genes ‘detected’ in that cell type. Calculated by generating 1000 bootstraps of the TPM, then selecting genes whose 95% lower CI doesn’t intersect 0.
- The total number of markers of that cell-type.
- How many of the markers of that cell-type are just in one species versus the total markers (shared + private).
- Gini coefficient: A measure of inequality that shows how evenly distributed the TPM values are (0 = even, 1 = skewed).
- The number of UMI-collapsed sequencing reads that are associated with the cell-type.
- Jensen-Shannon Distance: Metric of distance between the two species cell transcriptomes.
- Pearson Correlation: Metric of similarity between the two species cell transcriptomes.
- Cosine distance: Metric of distance between the two species cell transcriptomes, computed as one minus the cosine of the angle between them.
- The number of differentially expressed genes between the species.
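The metrics listed above can be illustrated with a minimal sketch in plain Python. It assumes each cell type is represented by a vector of non-negative TPM values with genes in the same order for both species, and that the Jensen-Shannon distance uses log base 2. Whether the values were transformed before these metrics were computed for the deposited files is not stated here, so treat the function names and conventions below as illustrative assumptions rather than the study's implementation.

```python
import math

def gini(values):
    """Gini coefficient of non-negative values (0 = even, larger = more skewed)."""
    xs = sorted(float(v) for v in values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return float("nan")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

def _normalize(values):
    """Rescale a non-negative vector so that it sums to one."""
    total = float(sum(values))
    return [v / total for v in values]

def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (log base 2, range 0 to 1)."""
    p, q = _normalize(p), _normalize(q)
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a zero numerator contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))

def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def pearson_correlation(a, b):
    """Pearson correlation between two equal-length, non-constant vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical TPM vectors for one homologous cell type (same gene order).
elegans_tpm = [120.0, 0.0, 35.0, 8.0]
briggsae_tpm = [100.0, 5.0, 40.0, 0.0]
print(jensen_shannon_distance(elegans_tpm, briggsae_tpm))
print(cosine_distance(elegans_tpm, briggsae_tpm))
print(pearson_correlation(elegans_tpm, briggsae_tpm))
print(gini(elegans_tpm), gini(briggsae_tpm))
```

The two distances grow as the transcriptomes diverge, while the Pearson correlation falls, which is why they are reported side by side.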
File: cbr_markers.txt
Description: Gene markers identified in C. briggsae. Values that are NA for the C. elegans columns are because of a lack of 1:1 orthology between the genes.
- p_val.species - The unadjusted p-value between the cells of that cell-type and the cells of rest of the dataset for that species.
- avg_log2FC.species - The log2 fold-change between the cells of that cell-type and the cells of rest of the dataset for that species.
- pct.1.species - The fraction of cells for which this gene was detected in that cell-type for that species.
- pct.2.species - The fraction of cells for which this gene was detected in the rest of the cells in the dataset for that species.
- p_val_adj.species - The adjusted p-value between the cells of that cell-type and the cells of rest of the dataset for that species.
- cell_type - The cell-type in which the gene was tested for its marker status.
- gene - The gene that was tested for its marker status.
- cell_gene - The joint name of the cell type and the gene.
- p_val.other_species - If the gene was tested in the other species for its marker status, the values for that test are shown here.
- avg_log2FC.other_species - If the gene was tested in the other species for its marker status, the values for that test are shown here.
- pct.1.other_species - If the gene was tested in the other species for its marker status, the values for that test are shown here.
- pct.2.other_species - If the gene was tested in the other species for its marker status, the values for that test are shown here.
- p_val_adj.other_species - If the gene was tested in the other species for its marker status, the values for that test are shown here.
- cel_tpm_log2fc - The log2FC using the TPM pseudobulk values instead of the single-cell estimates.
- cbr_tpm_log2fc - The log2FC using the TPM pseudobulk values instead of the single-cell estimates.
- cel_tpm_log2fc_just_pro - The log2FC using the TPM pseudobulk values from just the progenitor cell types instead of the single-cell estimates. NA values in this column are either due to a lack of orthology or the gene not being expressed in the progenitors.
- cbr_tpm_log2fc_just_pro - The log2FC using the TPM pseudobulk values from just the progenitor cell types instead of the single-cell estimates. NA values in this column are either due to a lack of orthology or the gene not being expressed in the progenitors.
- cel_tpm - The TPM of that gene in that cell-type for C. elegans.
- cbr_tpm - The TPM of that gene in that cell-type for C. briggsae.
- cel_max_tpm_term - The maximum TPM of that gene in terminal cell types for C. elegans.
- cbr_max_tpm_term - The maximum TPM of that gene in terminal cell types for C. briggsae.
- cel_max_tpm_pro - The maximum TPM of that gene in progenitor cell types for C. elegans.
- cbr_max_tpm_pro - The maximum TPM of that gene in progenitor cell types for C. briggsae.
- cel_mean_tpm_term - The mean TPM of that gene in terminal cell types for C. elegans.
- cbr_mean_tpm_term - The mean TPM of that gene in terminal cell types for C. briggsae.
- cel_mean_tpm_pro - The mean TPM of that gene in progenitor cell types for C. elegans.
- cbr_mean_tpm_pro - The mean TPM of that gene in progenitor cell types for C. briggsae.
- cel_tau_pro - The broadness of the gene expression pattern in just progenitor cell types for C. elegans.
- cbr_tau_pro - The broadness of the gene expression pattern in just progenitor cell types for C. briggsae.
- cel_tau_term - The broadness of the gene expression pattern in just terminal cell types for C. elegans.
- cbr_tau_term - The broadness of the gene expression pattern in just terminal cell types for C. briggsae.
- cel_tau_joint - The broadness of the gene expression pattern across the dataset for C. elegans.
- cbr_tau_joint - The broadness of the gene expression pattern across the dataset for C. briggsae.
- gene.type - Whether the gene is shared between species, or is specific to one or the other.
- orthology_conf - The confidence in the orthology classification
- OG - The orthogroup name for this gene.
- cel_OG_count - The number of genes from C. elegans in this orthogroup.
- cbr_OG_count - The number of genes from C. briggsae in this orthogroup.
- WormCat.1 - The WormCat (Holdorf, et al., 2020) category of this gene at a tier one level.
- WormCat.2 - The WormCat category of this gene at a tier two level.
- WormCat.3 - The WormCat category of this gene at a tier three level.
- in_species - Whether this gene marker is also a marker in the other species.
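The tau columns above summarize how broad a gene's expression pattern is. As a minimal sketch, the function below implements one widely used definition of Tau (the specificity index of Yanai et al., 2005), in which 0 indicates uniform expression across cell types and 1 indicates expression restricted to a single cell type. That the deposited values follow exactly this formula, or that it was applied to untransformed TPM, is an assumption made for illustration.

```python
def tau(expression):
    """Tau specificity index for one gene across cell types.

    expression: non-negative expression values, one per cell type.
    Returns NaN when fewer than two cell types are given or the gene
    is not expressed anywhere.
    """
    values = [float(v) for v in expression]
    n = len(values)
    if n < 2:
        return float("nan")
    peak = max(values)
    if peak <= 0:
        return float("nan")
    return sum(1.0 - v / peak for v in values) / (n - 1)

# Hypothetical TPM profiles across four cell types.
broad_gene = [50.0, 50.0, 50.0, 50.0]    # evenly expressed
specific_gene = [0.0, 0.0, 0.0, 80.0]    # restricted to one cell type
print(tau(broad_gene), tau(specific_gene))
```

Returning NaN for unexpressed genes mirrors the NA values that poorly sampled genes can carry in these tables.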
File: cell_data_bins.txt
Description: Cell data, calculated on progenitor and terminal cell type bins
- cell-type - The name of the cell type.
- cell_type_bin - The cell type time bins associated with this cell type.
- cell_class - The tissue subset the cell is a part of.
- jsd_median - The median Jensen-Shannon distance between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- jsd_lower - The lower 95% confidence interval of the Jensen-Shannon distance between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- jsd_upper - The upper 95% confidence interval of the Jensen-Shannon distance between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- cor_median - The median Pearson correlation between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- cor_lower - The lower 95% confidence interval of the Pearson correlation between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- cor_upper - The upper 95% confidence interval of the Pearson correlation between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- cos_median - The median cosine angle between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- cos_upper - The upper 95% confidence interval of the cosine angle between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- cos_lower - The lower 95% confidence interval of the cosine angle between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- cel_gini_median - C. elegans median Gini coefficient, calculated on 1000 bootstrapped TPM values.
- cbr_gini_median - C. briggsae median Gini coefficient, calculated on 1000 bootstrapped TPM values.
- cel_gini_upper - C. elegans upper 95% confidence interval of the Gini coefficient, calculated on 1000 bootstrapped TPM values.
- cel_gini_lower - C. elegans lower 95% confidence interval of the Gini coefficient, calculated on 1000 bootstrapped TPM values.
- cbr_gini_upper - C. briggsae upper 95% confidence interval of the Gini coefficient, calculated on 1000 bootstrapped TPM values.
- cbr_gini_lower - C. briggsae lower 95% confidence interval of the Gini coefficient, calculated on 1000 bootstrapped TPM values.
- cel_markers - Number of C. elegans markers.
- cbr_markers - Number of C. briggsae markers.
- cel_markers_common - Number of C. elegans shared markers.
- cbr_markers_common - Number of C. briggsae shared markers.
- cel_markers_non_one_to_one - Number of markers that are private to C. elegans.
- cbr_markers_non_one_to_one - Number of markers that are private to C. briggsae.
- genes_detected_bootstrap_cel - Number of genes detected using binarization using the 95% CI on bootstrapped TPM.
- genes_detected_bootstrap_cbr - Number of genes detected using binarization using the 95% CI on bootstrapped TPM.
- cel_cell_count - Cell count for C. elegans.
- cbr_cell_count - Cell count for C. briggsae.
- cel_median_umi - Median number of UMI per cell.
- cbr_median_umi - Median number of UMI per cell.
- deg - The number of differentially expressed genes between the homologous cell types of C. elegans and C. briggsae.
- lineage_group - Naming of progenitor cell groups (Not applicable for terminally differentiated cell types).
- div_stage - The embryonic stage of that cell type.
- min_cell_count - The minimum cell count from either species.
- embryo_time - The mean embryo time of that cell type.
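As a minimal sketch of how this table might be used, the example below ranks cell types by their median Jensen-Shannon distance after excluding sparsely sampled ones. It assumes the file is tab-delimited with a header row and that missing values are written as NA, neither of which is stated above. The rows, cell-type names, and the `min_cells` threshold are hypothetical and are embedded in the script so it runs without the real file.

```python
import csv
import io

# Hypothetical excerpt; the real file has many more columns and rows.
EXAMPLE_TABLE = (
    "cell-type\tcell_class\tjsd_median\tjsd_lower\tjsd_upper\tmin_cell_count\n"
    "example_neuron\tneuron\t0.42\t0.40\t0.44\t150\n"
    "example_intestine\tintestine\t0.30\t0.29\t0.31\t900\n"
    "example_rare\tneuron\t0.55\t0.35\t0.70\t12\n"
    "example_missing\tmuscle\tNA\tNA\tNA\t300\n"
)

def load_rows(handle):
    """Read a tab-delimited table into a list of dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))

def rank_by_jsd(rows, min_cells=50):
    """Sort cell types from most to least divergent.

    Rows with a missing jsd_median, or with fewer than min_cells cells
    in either species, are dropped before sorting.
    """
    kept = [
        row for row in rows
        if row["jsd_median"] != "NA"
        and float(row["min_cell_count"]) >= min_cells
    ]
    return sorted(kept, key=lambda row: float(row["jsd_median"]), reverse=True)

rows = load_rows(io.StringIO(EXAMPLE_TABLE))
ranked = rank_by_jsd(rows)
for row in ranked:
    print(row["cell-type"], row["jsd_median"], row["jsd_lower"], row["jsd_upper"])
```

To apply this to the deposited table, replace the `io.StringIO` object with an opened file handle, for example `open("cell_data_bins.txt", newline="")`.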
File: cell_data_mean.txt
Description: Cell data, calculated on progenitor and the mean values of terminal cell type bins
- cell-type - The name of the cell type.
- cell_type_bin - The cell type time bins associated with this cell type.
- cell_class - The tissue subset the cell is a part of.
- jsd_median - The median Jensen-Shannon distance between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- jsd_lower - The lower 95% confidence interval of the Jensen-Shannon distance between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- jsd_upper - The upper 95% confidence interval of the Jensen-Shannon distance between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- cor_median - The median Pearson correlation between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- cor_lower - The lower 95% confidence interval of the Pearson correlation between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- cor_upper - The upper 95% confidence interval of the Pearson correlation between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- cos_median - The median cosine angle between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- cos_upper - The upper 95% confidence interval of the cosine angle between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- cos_lower - The lower 95% confidence interval of the cosine angle between the transcriptomes of the homologous cell-types between C. elegans and C. briggsae, calculated on 1000 bootstrapped TPM values.
- cel_gini_median - C. elegans median Gini coefficient, calculated on 1000 bootstrapped TPM values.
- cbr_gini_median - C. briggsae median Gini coefficient, calculated on 1000 bootstrapped TPM values.
- cel_gini_upper - C. elegans upper 95% confidence interval of the Gini coefficient, calculated on 1000 bootstrapped TPM values.
- cel_gini_lower - C. elegans lower 95% confidence interval of the Gini coefficient, calculated on 1000 bootstrapped TPM values.
- cbr_gini_upper - C. briggsae upper 95% confidence interval of the Gini coefficient, calculated on 1000 bootstrapped TPM values.
- cbr_gini_lower - C. briggsae lower 95% confidence interval of the Gini coefficient, calculated on 1000 bootstrapped TPM values.
- cel_markers - Number of C. elegans markers.
- cbr_markers - Number of C. briggsae markers.
- cel_markers_common - Number of C. elegans shared markers.
- cbr_markers_common - Number of C. briggsae shared markers.
- cel_markers_non_one_to_one - Number of markers that are private to C. elegans.
- cbr_markers_non_one_to_one - Number of markers that are private to C. briggsae.
- genes_detected_bootstrap_cel - Number of genes detected using binarization using the 95% CI on bootstrapped TPM.
- genes_detected_bootstrap_cbr - Number of genes detected using binarization using the 95% CI on bootstrapped TPM.
- cel_cell_count - Cell count for C. elegans.
- cbr_cell_count - Cell count for C. briggsae.
- cel_median_umi - Median number of UMI per cell.
- cbr_median_umi - Median number of UMI per cell.
- deg - The number of differentially expressed genes between the homologous cell types of C. elegans and C. briggsae.
- lineage_group - Naming of progenitor cell groups (Not applicable for terminally differentiated cell types).
- div_stage - The embryonic stage of that cell type.
- min_cell_count - The minimum cell count from either species.
- embryo_time - The mean embryo time of that cell type.
File: gene_data.txt
Description: Gene data information. For genes that are poorly sampled, some of the distance and broadness metrics may be NA, meaning there was not enough information to make a conclusion. For columns that come from matching with other datasets, NA values refer to where no match was able to be made due to gene naming differences.
- jsd_median_term - Jensen-Shannon distance calculated on 1000x bootstraps of the terminal cell type TPM values
- pcor_median_term - Pearson correlation calculated on 1000x bootstraps of the terminal cell type TPM values
- scor_median_term - Spearman correlation calculated on 1000x bootstraps of the terminal cell type TPM values
- cos_median_term - Cosine angle calculated on 1000x bootstraps of the terminal cell type TPM values
- cel_tau_median_term - C. elegans Tau calculated on 1000x bootstraps of the terminal cell type TPM values
- cbr_tau_median_term - C. briggsae Tau calculated on 1000x bootstraps of the terminal cell type TPM values
- jsd_median_pro - Jensen-Shannon distance calculated on 1000x bootstraps of the progenitor cell type TPM values
- pcor_median_pro - Pearson correlation calculated on 1000x bootstraps of the progenitor cell type TPM values
- scor_median_pro - Spearman correlation calculated on 1000x bootstraps of the progenitor cell type TPM values
- cos_median_pro - Cosine angle calculated on 1000x bootstraps of the progenitor cell type TPM values
- cel_tau_median_pro - C. elegans Tau calculated on 1000x bootstraps of the progenitor cell type TPM values
- cbr_tau_median_pro - C. briggsae Tau calculated on 1000x bootstraps of the progenitor cell type TPM values
- jsd_median_joint - Jensen-Shannon distance calculated on 1000x bootstraps of the joint cell type TPM values
- pcor_median_joint - Pearson correlation calculated on 1000x bootstraps of the joint cell type TPM values
- scor_median_joint - Spearman correlation calculated on 1000x bootstraps of the joint cell type TPM values
- cos_median_joint - Cosine angle calculated on 1000x bootstraps of the joint cell type TPM values
- cel_tau_median_joint - C. elegans Tau calculated on 1000x bootstraps of the joint cell type TPM values
- cbr_tau_median_joint - C. briggsae Tau calculated on 1000x bootstraps of the joint cell type TPM values
- cel_max_tpm_term - C. elegans maximum TPM value across all terminal cell types
- cbr_max_tpm_term - C. briggsae maximum TPM value across all terminal cell types
- max_tpm_term - Maximum TPM value across all terminal cell types across species
- cel_max_tpm_pro - C. elegans maximum TPM value across all progenitor cell types
- cbr_max_tpm_pro - C. briggsae maximum TPM value across all progenitor cell types
- max_tpm_pro - Maximum TPM value across all progenitor cell types across species
- elegans_id - C. elegans WBGene name
- elegans_gene_long_name - C. elegans transcript name
- briggsae_id - C. briggsae WBGene name
- briggsae_gene_short_name - C. briggsae gene short name
- briggsae_gene_long_name - C. briggsae transcript name
- orthology_conf - Confidence in the orthology assignment with 1:1 being the highest, followed by confident canonical then canonical
- percent_identity - Percent identity in a Smith-Waterman alignment between C. elegans and C. briggsae
- percent_similarity - Percent similarity in a Smith-Waterman alignment between C. elegans and C. briggsae
- syntenic - Whether the gene has been found to be syntenic between species
- OG - The orthogroup number for reference with the orthogroup supplementary table
- cel_OG_count - Number of C. elegans genes in the orthogroup
- cbr_OG_count - Number of C. briggsae genes in the orthogroup
- WormCat.1 - WormCat tier one term
- WormCat.2 - WormCat tier two term
- WormCat.3 - WormCat tier three term
- omega - The dN/dS values calculated in this study
- Cutter_Ka - Ka values calculated in Tu et al. 2015
- Cutter_Ks - Ks values calculated in Tu et al. 2015
- Cutter_Ka.Ks - Ka/Ks values calculated in Tu et al. 2015
- PS.Value - Phylostrata values calculated in Ma et al., 2023
- PS.Name - Phylostrata names calculated in Ma et al., 2023
- maternal - Whether the gene was classified as maternally inherited, based on having expression above zero at the first cell division in Tintori et al., 2016
File: cel_markers.txt
Description: Gene markers of cell type calculated in C. elegans. Values that are NA for the C. briggsae columns are because of a lack of 1:1 orthology between the genes.
- p_val.species - The unadjusted p-value between the cells of that cell-type and the cells of rest of the dataset for that species.
- avg_log2FC.species - The log2 fold-change between the cells of that cell-type and the cells of rest of the dataset for that species.
- pct.1.species - The fraction of cells for which this gene was detected in that cell-type for that species.
- pct.2.species - The fraction of cells for which this gene was detected in the rest of the cells in the dataset for that species.
- p_val_adj.species - The adjusted p-value between the cells of that cell-type and the cells of rest of the dataset for that species.
- cell_type - The cell-type in which the gene was tested for its marker status.
- gene - The gene that was tested for its marker status.
- cell_gene - The joint name of the cell type and the gene.
- p_val.other_species - If the gene was tested in the other species for its marker status, the values for that test are shown here.
- avg_log2FC.other_species - If the gene was tested in the other species for its marker status, the values for that test are shown here.
- pct.1.other_species - If the gene was tested in the other species for its marker status, the values for that test are shown here.
- pct.2.other_species - If the gene was tested in the other species for its marker status, the values for that test are shown here.
- p_val_adj.other_species - If the gene was tested in the other species for its marker status, the values for that test are shown here.
- cel_tpm_log2fc - The log2FC using the TPM pseudobulk values instead of the single-cell estimates.
- cbr_tpm_log2fc - The log2FC using the TPM pseudobulk values instead of the single-cell estimates.
- cel_tpm_log2fc_just_pro - The log2FC using the TPM pseudobulk values from just the progenitor cell types instead of the single-cell estimates. NA values in this column are either due to a lack of orthology or the gene not being expressed in the progenitors.
- cbr_tpm_log2fc_just_pro - The log2FC using the TPM pseudobulk values from just the progenitor cell types instead of the single-cell estimates. NA values in this column are either due to a lack of orthology or the gene not being expressed in the progenitors.
- cel_tpm - The TPM of that gene in that cell-type for C. elegans.
- cbr_tpm - The TPM of that gene in that cell-type for C. briggsae.
- cel_max_tpm_term - The maximum TPM of that gene in terminal cell types for C. elegans.
- cbr_max_tpm_term - The maximum TPM of that gene in terminal cell types for C. briggsae.
- cel_max_tpm_pro - The maximum TPM of that gene in progenitor cell types for C. elegans.
- cbr_max_tpm_pro - The maximum TPM of that gene in progenitor cell types for C. briggsae.
- cel_mean_tpm_term - The mean TPM of that gene in terminal cell types for C. elegans.
- cbr_mean_tpm_term - The mean TPM of that gene in terminal cell types for C. briggsae.
- cel_mean_tpm_pro - The mean TPM of that gene in progenitor cell types for C. elegans.
- cbr_mean_tpm_pro - The mean TPM of that gene in progenitor cell types for C. briggsae.
- cel_tau_pro - The broadness of the gene expression pattern in just progenitor cell types for C. elegans.
- cbr_tau_pro - The broadness of the gene expression pattern in just progenitor cell types for C. briggsae.
- cel_tau_term - The broadness of the gene expression pattern in just terminal cell types for C. elegans.
- cbr_tau_term - The broadness of the gene expression pattern in just terminal cell types for C. briggsae.
- cel_tau_joint - The broadness of the gene expression pattern across the dataset for C. elegans.
- cbr_tau_joint - The broadness of the gene expression pattern across the dataset for C. briggsae.
- gene.type - Whether the gene is shared between species, or is specific to one or the other.
- orthology_conf - The confidence in the orthology classification
- OG - The orthogroup name for this gene.
- cel_OG_count - The number of genes from C. elegans in this orthogroup.
- cbr_OG_count - The number of genes from C. briggsae in this orthogroup.
- WormCat.1 - The WormCat (Holdorf, et al., 2020) category of this gene at a tier one level.
- WormCat.2 - The WormCat category of this gene at a tier two level.
- WormCat.3 - The WormCat category of this gene at a tier three level.
- in_species - Whether this gene marker is also a marker in the other species.
File: deg.txt
Description: A list of all differentially expressed genes between C. elegans and C. briggsae. To identify genes differentially expressed between C. elegans and C. briggsae within the homologous cell-types, we used Seurat V5. A Wilcoxon Rank Sum test was run between the cells of that cell-type from C. elegans against the cells of that cell-type from C. briggsae. The data have been filtered for an adjusted p-value less than 0.05 and a log2 fold-change of greater than 1 or less than -1. The columns in the table are as below:
- p_val - The unadjusted p-value between C. elegans cells and C. briggsae cells for that gene in that cell-type.
- avg_log2FC - The log2 fold-change between C. elegans cells and C. briggsae cells for that gene in that cell-type.
- pct.1 - The fraction of cells for which this gene was detected in C. elegans.
- pct.2 - The fraction of cells for which this gene was detected in C. briggsae.
- p_val_adj - The adjusted p-value between C. elegans cells and C. briggsae cells for that gene in that cell-type.
- cell_type - The cell-type in which the gene was tested for its differential expression.
- gene - The gene that was tested for its differential expression.
- cell_type_bin - The cell type bin that this differentially expressed gene was found in.
- cel_tpm - The TPM of that gene in that cell-type for C. elegans.
- cbr_tpm - The TPM of that gene in that cell-type for C. briggsae.
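The filtering criteria described for this file (adjusted p-value below 0.05 and a log2 fold-change greater than 1 or less than -1) can be sketched as follows. The deposited table is already filtered, so this is only useful for reapplying the thresholds or tightening them. The rows and gene names below are hypothetical, and the helper names are not part of the deposited code.

```python
def passes_deg_filter(row, max_padj=0.05, min_abs_log2fc=1.0):
    """Return True when a row meets the significance and effect-size cutoffs."""
    return (
        float(row["p_val_adj"]) < max_padj
        and abs(float(row["avg_log2FC"])) > min_abs_log2fc
    )

def count_degs_per_cell_type(rows, **thresholds):
    """Count the rows passing the filter within each cell type."""
    counts = {}
    for row in rows:
        if passes_deg_filter(row, **thresholds):
            counts[row["cell_type"]] = counts.get(row["cell_type"], 0) + 1
    return counts

# Hypothetical rows using the column names documented above.
example_rows = [
    {"cell_type": "type_A", "gene": "gene_1", "avg_log2FC": "2.3", "p_val_adj": "0.001"},
    {"cell_type": "type_A", "gene": "gene_2", "avg_log2FC": "-1.6", "p_val_adj": "0.02"},
    {"cell_type": "type_A", "gene": "gene_3", "avg_log2FC": "0.4", "p_val_adj": "0.0001"},
    {"cell_type": "type_B", "gene": "gene_1", "avg_log2FC": "3.0", "p_val_adj": "0.2"},
]

print(count_degs_per_cell_type(example_rows))
print(count_degs_per_cell_type(example_rows, min_abs_log2fc=2.0))
```

Taking the absolute value of the fold-change keeps genes that are higher in either species.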
File: cbr_tpm_out.txt
Description: The summary expression values for every gene in C. briggsae. This file contains the expression value of every gene in the C. briggsae genome, summarized as transcripts per million (TPM) on pseudobulked progenitor and terminal cell-types. To evaluate the variation in these TPM values, we used bootstrapping to resample the cells and generate several confidence intervals on the cellular expression values. A gene can be considered confidently detected in a cell-type if its lower 95% confidence interval does not intersect zero. Additionally, we report the percentage of cells from each cell-type in which expression was detected.
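The bootstrap-based detection rule described for the TPM tables can be sketched as below. This is an illustrative reconstruction, not the study's code: it assumes cells are resampled with replacement within one cell type, that the pseudobulk is the summed UMI counts rescaled to one million, and that the confidence bounds are simple percentiles of the bootstrap distribution. The function names and the toy count matrix are hypothetical.

```python
import random

def pseudobulk_tpm(cells):
    """Sum counts over cells and rescale so that gene values total one million."""
    n_genes = len(cells[0])
    totals = [sum(cell[g] for cell in cells) for g in range(n_genes)]
    depth = sum(totals)
    if depth == 0:
        return [0.0] * n_genes
    return [1e6 * t / depth for t in totals]

def percentile(sorted_values, fraction):
    """Simple non-interpolated percentile on a pre-sorted list."""
    index = int(fraction * len(sorted_values))
    index = min(len(sorted_values) - 1, max(0, index))
    return sorted_values[index]

def bootstrap_detection(cells, n_boot=1000, seed=0):
    """Bootstrap cells and flag genes whose lower 95% bound is above zero."""
    rng = random.Random(seed)
    n_genes = len(cells[0])
    draws = [[] for _ in range(n_genes)]
    for _ in range(n_boot):
        # Resample the same number of cells, with replacement.
        sample = [rng.choice(cells) for _ in cells]
        for g, value in enumerate(pseudobulk_tpm(sample)):
            draws[g].append(value)
    results = []
    for values in draws:
        values.sort()
        lower = percentile(values, 0.025)
        upper = percentile(values, 0.975)
        results.append({"lower": lower, "upper": upper, "detected": lower > 0})
    return results

# Toy counts for 20 cells and 3 genes: gene 0 is in every cell,
# gene 1 is in a single cell, and gene 2 is never observed.
toy_cells = [[5, 0, 0] for _ in range(19)] + [[5, 3, 0]]
for gene_index, summary in enumerate(bootstrap_detection(toy_cells)):
    print(gene_index, summary["detected"])
```

A gene seen in only one cell drops out of many resamples, so its lower bound reaches zero and it is not called as detected, which is the behavior the rule is designed to capture.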
File: eset.rds
Description: The cell and gene information for VisCello.
Meta data columns in the VisCello Objects:
- lineage - The manually annotated cell lineage. For ambiguities in division orientation, an x is used (e.g. MSx to refer to MSa and MSp).
- cell_type - The terminal cell-type identities, manually annotated using homologous marker genes.
- species - Whether the cell is from C. elegans or C. briggsae.
- embryo_time - The estimated age of the embryo from which the cell was drawn. See Packer and Zhu et al., 2019 for more details on how this was calculated. C. briggsae embryo_time was estimated using the orthologous genes between the species.
- dataset - Which collection batch the cells come from.
- n_umi - The number of UMI-collapsed sequencing reads that are associated with the cell.
- genotype - The genotype of the animals from which the cell came. Some of the C. elegans cells are from mutant animals.
- Wild-type C. elegans: N2 and VC2010
- Wild-type C. briggsae: AF16
- Mutant C. elegans for mec-3: VC2396 mec-3(gk1126). Mutants for mec-3 appear to be missing their touch neurons and markers of the touch neurons are not detected.
- Mutant C. elegans for M03D4.4: VC4183 M03D4.4(gk5269[loxP + myo-2p::GFP::unc-54 3' UTR + rps-27p::neoR::unc-54 3' UTR + loxP]). This mutant strain for M03D4.4 appears otherwise wild-type in cell composition and expression.
- Mutant C. elegans for ceh-9: YL633 ceh-9(tm2747). This mutant strain for ceh-9 appears otherwise wild-type in cell composition and expression.
- potential_low_quality_cell - Using a variety of manual annotation strategies, we have identified some cells that don't behave consistently across UMAP embeddings due to a variety of technical reasons. These have been left in the dataset as they often represent 'normal cells', but have been labeled as being potentially low-quality.
- high_background - The amount of background reads was estimated for every cell similar to Packer and Zhu et al., 2019. The cells labeled here as TRUE had a fraction of reads from background higher than 0.75.
- possible_doublet - Droplets that were annotated as possibly containing two or more cells. Not all cells annotated as possible doublets are true doublets. Please see Packer and Zhu et al., 2019 for details on how the background was estimated.
- packer_cell_type - Cell type annotation from Packer and Zhu et al., 2019.
- packer_cell_subtype - Cell type annotation from Packer and Zhu et al., 2019.
- packer_plot_cell_type - Cell type annotation from Packer and Zhu et al., 2019.
- SizeFactor - A column used to estimate the library size.
- smoothed_embryo_time - The estimated embryo time calculated as above, with an additional nearest neighbor smoothing algorithm to use the neighboring cell's embryo time and transcriptome to better approximate the age of the embryo.
- embryo_time_bin - Binned smoothed embryo time, with lt_100 meaning 'less than 100' and gt_710 meaning 'greater than 710'.
- Gene Expression - Used to inspect gene expression.
File: cel_tpm_out.txt
Description: The summary expression values for every gene in C. elegans. This file contains the expression value of every gene in the C. elegans genome, summarized as transcripts per million (TPM) on pseudobulked progenitor and terminal cell-types. To evaluate the variation in these TPM values, we used bootstrapping to resample the cells and generate several confidence intervals on the cellular expression values. A gene can be considered confidently detected in a cell-type if its lower 95% confidence interval does not intersect zero. Additionally, we report the percentage of cells from each cell-type in which expression was detected.
File: gene_plots.zip
Description: A collection of summary plots for every gene that is shared in C. elegans and C. briggsae with some expression. Below are the elements in the plots:
- Global UMAP showing the expression of your gene of interest.
- Cell subset UMAP showing the expression of your gene of interest. The choice of which UMAP is shown is based on which cell-type shows maximum expression across all cell-types between the two species.
- Terminal cell-type comparative TPM values shown in log2 space. The cell-types are summarized by their cell class.
- Progenitor cell-type comparative TPM values shown in log2 space. The circle plots summarize the division patterns in the embryo.
- A set of gene metrics, where the values for this gene are shown in red as a confidence interval (CI) range on top of the dataset-wide distribution. These metrics are shown for the jointly estimated, terminal, and progenitor cell types.
- Gene expression pattern distance shown as the Jensen-Shannon Distance (JSD). The CI and median for the gene JSD were calculated on the bootstrap-resampled TPM values.
- The broadness of gene expression pattern shown as the Tau value for C. elegans.
- The broadness of gene expression pattern shown as the Tau value for C. briggsae.
File: briggsae_WS290_cistarget.zip
Description: Database files for running SCENIC on C. briggsae expression data (WS290). Please see the following publication for how to run SCENIC using the enclosed feather databases:
Van de Sande, B., Flerin, C., Davie, K. et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat Protoc 15, 2247–2276 (2020). https://doi.org/10.1038/s41596-020-0336-2
File: elegans_WS290_cistarget.zip
Description: Database files for running SCENIC on C. elegans expression data (WS290). Please see the following publication for how to run SCENIC using the enclosed feather databases:
Van de Sande, B., Flerin, C., Davie, K. et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat Protoc 15, 2247–2276 (2020). https://doi.org/10.1038/s41596-020-0336-2
Code/software
Included is all code needed to generate the summary metrics and visualizations presented in the associated manuscript. Additional code for the ShinyApp is also available here as a zip file.
Access information
Other publicly accessible locations of the data:
