Simulations of gene regulatory networks with transcriptional adaptation
Data files
Aug 02, 2024 version files 2.61 GB
-
README.md
18.01 KB
-
Simulations.zip
2.61 GB
Abstract
Background
Cells and tissues have a remarkable ability to adapt to genetic perturbations via a variety of molecular mechanisms. Transcriptional adaptation has recently emerged as one such mechanism, in which nonsense mutations in a gene trigger upregulation of related genes, possibly conferring robustness at cellular and organismal levels. However, beyond a handful of developmental contexts and curated sets of genes, no comprehensive genome-wide investigation of this behavior has been undertaken for mammalian cell types and contexts. Further, how the regulatory-level effects of inherently stochastic compensatory gene networks contribute to phenotypic penetrance in single cells remains unclear.
Results
In the corresponding manuscript, we analyze existing bulk and single-cell transcriptomic datasets to infer the prevalence of transcriptional adaptation in mammalian systems across diverse contexts and cell types. In the data presented here, stochastic mathematical modeling of minimal compensatory gene networks qualitatively recapitulates several aspects of transcriptional adaptation. Combined with machine learning analysis of network features of interest, our framework offers potential explanations for which regulatory steps are most important for transcriptional adaptation.
Conclusions
We provide a formal quantitative framework to test and refine models of transcriptional adaptation.
https://doi.org/10.5061/dryad.nk98sf82j
Project Description
Outputs of simulations of gene regulatory networks with transcriptional adaptation (also known as nonsense-induced transcriptional compensation) as described in
Mellis IA, Melzer ME, Bodkin N, and Goyal Y. 2024. Prevalence of and gene regulatory constraints on transcriptional adaptation in single cells. Genome Biology.
We performed simulations of gene regulatory networks with transcriptional adaptation, where genes are represented as nodes and regulatory relationships as edges.
The regulatory networks were constructed using a modified telegraph model for transcriptional bursting, where each gene’s alleles switch between active (transcribing) and inactive (quiescent) states, for an ancestral regulator gene (A), its paralogs (A’ for 1-paralog models, A’1 and A’2 for multiparalog models), and a downstream target gene (B). To capture the dynamics of gene regulation, we implemented stochastic simulations using Gillespie’s next reaction method. We accounted for differential regulatory effects of different gene products. We included models for both activating and repressing interactions, as well as models with more than one paralog gene.
Parameter ranges were defined based on previously reported literature on transcriptional regulation and other related phenomena, as described in the corresponding manuscript. Parameter sets for simulations were sampled with latin hypercube sampling from the defined parameter ranges. Each simulation ran for 300,000 timesteps, simulating three genotypes: wildtype, heterozygous, and homozygous-mutant for 100,000 timesteps each. A total of approximately 60,000 simulations were performed across different network models.
Usage
The companion publication to this dataset is
Mellis IA, Melzer ME, Bodkin N, and Goyal Y. 2024. Prevalence of and gene regulatory constraints on transcriptional adaptation in single cells. Genome Biology.
To use the data herein:
- Review the companion publication Methods section and its Additional File 2: Supplementary Note for all details about simulation rationale, setup, and execution.
- Download necessary software and packages: MATLAB R2017a, R2021b, or R2024a and R v4.3.2 using packages readxl v1.4.0, partykit v1.2-16, mvtnorm v1.1-3, libcoin v1.0-9, ggalluvial v0.12.3, entropy v1.3.1, svglite v2.1.0, corrplot v0.92, ggrepel v0.9.1, e1071 v1.7-11, diptest v0.76-0, gridExtra v2.3, Hmisc v4.7-0, Formula v1.2-4, survival v3.3-1, lattice v0.20-45, ineq v0.2-13, magrittr v2.0.3, forcats v0.5.1, stringr v1.4.0, dplyr v1.0.9, purrr v0.3.4, readr v2.1.2, tidyr v1.2.0, tibble v3.1.7, ggplot2 v3.3.6, tidyverse v1.3.1, gprofiler2 v0.2.3.
- Clone the repository used to generate and analyze this dataset from GitHub or Zenodo.
- Download parameter set values used to generate the outputs in this dataset in the companion publication Additional File 1: Table S4, which includes all parameter sets used to generate every file in this dataset. Random seeds are included in the corresponding files in the code repository in Step 3.
- Execute analysis code from the GitHub or Zenodo respository Analysis/simulations subfolder after downloading the above software, scripts, and metadata.
Description of the file structure
The data here are sets of tables in which each row is a sampled “pseudo-single-cell” at one point in time and each column is a feature, such as a gene product abundance or the status of an allele’s activation state. The file structure is organized according to the type of gene regulatory network simulated. Details of Activator_1paralog, Multiparalog, and Repressor networks are all extensively described in the companion publication.
For each network, there is a subfolder for each of several different simulation conditions, also described in the companion publication. The companion publication Additional File 1: Table S4 lists exact parameter sets simulated for each subfolder here. The companion publication Methods section and Additional File 2: Supplementary Note of the companion publication provides a complete explanation of the simulation goals, setup, and analysis workflow.
All named features used in the tables below were generated programmatically and their roles in simulation execution and output is described in the companion publication. All named features are explained in the “Contents of data files” section, below.
Subfolders and contents:
Activator_1paralog
- stores simulation outputs from several 1-paralog models in which upstream factors A and A’ activate expression of downstream effector B. Contains subfolders for each of several model conditionstrimodal_resample_v1-6-5-1
- stores simulation outputs after parameter set resampling with a custom radius around the index trimodal case described in the companion publicationinitialsim_rates*.csv
- created programmatically by simulation code, where * is in [1,100]. Each column contains one entry for the specified parameter values for that parameter set * ID.initialsim_species*_q300.csv
- created programmatically by simulation code, where * is in [1,100]. Each column contains species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
trimodal_resampled2_v1_6_5_1_2
- stores simulation outputs after parameter set resampling with a minimum observed radius around the index trimodal case described in the companion publicationinitialsim_rates*.csv
- created programmatically by simulation code, where * is in [1,100]. Each column contains one entry for the specified parameter values for that parameter set * ID.initialsim_species*_q300.csv
- created programmatically by simulation code, where * is in [1,100]. Each column contains species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
trimodal_resampled3_v1_6_5_1_3
- stores simulation outputs after parameter set resampling with a 10th percentile observed radius around the index trimodal case described in the companion publicationinitialsim_rates*.csv
- created programmatically by simulation code, where * is in [1,100]. Each column contains one entry for the specified parameter values for that parameter set * ID.initialsim_species*_q300.csv
- created programmatically by simulation code, where * is in [1,100]. Each column contains species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
trimodal_resampled4_v1_6_5_1_4
- stores simulation outputs after parameter set resampling with a 10th percentile observed radius around the index trimodal case described in the companion publicationinitialsim_rates*.csv
- created programmatically by simulation code, where * is in [1,100]. Each column contains one entry for the specified parameter values for that parameter set * ID.initialsim_species*_q300.csv
- created programmatically by simulation code, where * is in [1,100]. Each column contains species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
unimodal_robust_resample_v1-6-5-3
- stores simulation outputs after parameter set resampling within the candidate unimodal symmetric fully-robust parameter subspace described in the companion publicationall_species_q300.csv
- created programmatically by simulation code. A table in which each column contains a parameter set ID or species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
unimodal_shape_v1_6_5_2_1
- stores simulation outputs after parameter set resampling within the candidate unimodal symmetric shape-robust parameter subspace described in the companion publicationinitialsim_rates*.csv
- created programmatically by simulation code, where * is in [1,100]. Each column contains one entry for the specified parameter values for that parameter set * ID.initialsim_species*_q300.csv
- created programmatically by simulation code, where * is in [1,100]. Each column contains species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
with_basal_paralog_v1-6-6-1
- stores simulation outputs over the entire parameter space for a network in which there is basal expression of A’all_species_q300.csv
- created programmatically by simulation code. A table in which each column contains a parameter set ID or species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
without_basal_paralog_v1-6-5-2
- stores simulation outputs over the entire parameter space for a network in which there is no basal expression of A’all_species_q300.csv
- created programmatically by simulation code. A table in which each column contains a parameter set ID or species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
Multiparalog
- stores simulation outputs from 2-paralog models in which upstream factors A, A’1, and A’2 activate expression of downstream effector B. Contains subfolders for each model conditiondifferent_parameters_v1_7_2
- stores simulation outputs over the entire parameter space for a network in which A’1 and A’2 are allowed to have different parameter parameter valuesall_species_q300.csv
- created programmatically by simulation code. A table in which each column contains a parameter set ID or species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
same_parameters_v1_7_1
- stores simulation outputs over the entire parameter space for a network in which A’1 and A’2 are not allowed to have different parameter parameter valuesall_species_q300.csv
- created programmatically by simulation code. A table in which each column contains a parameter set ID or species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
Repressor
- stores simulation outputs from 1-paralog model in which upstream factors A and A’ repress expression of downstream effector B, in which B has basal expression. Contains subfolders for each model conditionwith_basal_paralog_v1_6_10_1
- stores simulation outputs over the entire parameter space for a network in which there is basal expression of A’all_species_q300.csv
- created programmatically by simulation code. A table in which each column contains a parameter set ID or species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
without_basal_paralog_v1_6_11_1
- stores simulation outputs over the entire parameter space for a network in which there is not basal expression of A’all_species_q300.csv
- created programmatically by simulation code. A table in which each column contains a parameter set ID or species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
Contents of data files
Descriptions of the columns in all data files with the corresponding basename in the folders listed above.
all_species_q300
- created programmatically by simulation code. A table in which each column contains a parameter set ID or species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”A1
- species count for A, wild typeAnonsense1
- species count for A, mutatedAprime1
- species count for A’ in 1-paralog models (or A’1 in the multiparalog model)Aprime2
- species count for A’2 in the multiparalog model. Not included in other 1-paralog model outputB1
- species count for Bparamset
- parameter set ID, corresponding to parameter set in the corresponding table from “Usage” point 4.version
- analysis version identifier, in the “Description of the file structure” subfolder names, with periods instead of hyphens or underscorestime
- an integer, corresponding to the simulation timepoint at which the pseudo-single-cell was observedmutated_alleles
- an integer, corresponding to the number of alleles of A mutated. 0, 1, or 2.
initialsim_species*_q300
- created programmatically by simulation code. A table for each parameter set in which each column contains a species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”A1
- species count for A, wild typeAnonsense1
- species count for A, mutatedAprime1
- species count for A’B1
- species count for BBurst1_onorig_targ_allele1
- 0 or 1, corresponding to whether B allele 1 is in the on-state catalyzed by A. 0 = off, 1 = onBurst1_onpara_targ_allele1
- 0 or 1, corresponding to whether B allele 1 is in the on-state catalyzed by A’. 0 = off, 1 = onBurst1_off_targ_allele1
- 0 or 1, corresponding to whether B allele 1 is in the off-state. 0 = on, 1 = offBurst1_onorig_targ_allele2
- 0 or 1, corresponding to whether B allele 2 is in the on-state catalyzed by A. 0 = off, 1 = onBurst1_onpara_targ_allele2
- 0 or 1, corresponding to whether B allele 2 is in the on-state catalyzed by A’. 0 = off, 1 = onBurst1_off_targ_allele2
- 0 or 1, corresponding to whether B allele 2 is in the off-state. 0 = off, 1 = onBurst1_on_para_allele1
- 0 or 1, corresponding to whether A’ allele 1 is in the on state. 0 = off, 1 = onBurst1_on_para_allele2
- 0 or 1, corresponding to whether A’ allele 2 is in the on state. 0 = off, 1 = onBurst1_on_orig_allele1
- 0 or 1, corresponding to whether A allele 1 is in the on state. 0 = off, 1 = onBurst1_on_orig_allele2
- 0 or 1, corresponding to whether A allele 2 is in the on state. 0 = off, 1 = onBurst1_is_mutated_allele1
- 0 or 1, corresponding to whether A allele 1 is mutated. 0 = wild type, 1 = mutatedBurst1_is_mutated_allele2
- 0 or 1, corresponding to whether A allele 2 is mutated. 0 = wild type, 1 = mutatedtime
- an integer, corresponding to the simulation timepoint at which the pseudo-single-cell was observed
initialsim_rates*
- created programmatically by simulation code, where * is in [1,100]. Each column contains one entry for the specified parameter values for that parameter set * ID.r_prodbasal_A1
- Basal production rate of mRNA from an allele of A, wild typer_prodbasal_Anonsense1
- Basal production rate of mRNA from an allele of A, mutatedr_prodbasal_Aprime1
- Basal production rate of mRNA from an allele of A’r_prodbasal_B1
- Basal production rate of mRNA from an allele of Br_prodon_A1
- Production rate of mRNA from an active allele of A, wild typer_prodon_Anonsense1
- Production rate of mRNA from an active allele of A, mutatedr_prodon_Aprime1
- Production rate of mRNA from an active allele of A’r_prodon_B1
- Production rate of mRNA from an active allele of Bd_Aprime1_B1
- Factor by which the mRNA production rate of B is lower in the A’-directed B active state than in A-directed B active stater_deg_A1
- Degradation rate of mRNA of A, wild typer_deg_Anonsense1
- Degradation rate of mRNA of A, mutatedr_deg_Aprime1
- Degradation rate of mRNA of A’r_deg_B1
- Degradation rate of mRNA of Br_onbasal_A1
- Basal activation rate of an allele of A, wild typer_onbasal_Anonsense1
- Basal activation rate of an allele of A, mutatedr_onbasal_Aprime1
- Basal activation rate of an allele of A’r_onbasal_B1
- Basal activation rate of an allele of Br_nitc_byAnonsense1_A1
- Additional activation of an allele of A, wild type, upon nonsense-induced transcriptional compensation caused by product Anonsenser_nitc_byAnonsense1_Anonsense1
- Additional activation of an allele of A, mutated, upon nonsense-induced transcriptional compensation caused by product Anonsenser_nitc_byAnonsense1_Aprime1
- Additional activation of an allele of A’ upon nonsense-induced transcriptional compensation caused by product Anonsenser_addon_byA1_B1
- Activation rate of B by A to one of two respective B active states specified in the modelr_addon_byAprime1_B1
- Activation rate of B by A’ to one of two respective B active states specified in the modelr_off_A1
- Inactivation rate of an allele of A, wild typer_off_Anonsense1
- Inactivation rate of an allele of A, mutatedr_off_Aprime1
- Inactivation rate of an allele of A’r_offorig_B1
- Inactivation rate of an allele of B in the on-state catalyzed by Ar_offpara_B1
- Inactivation rate of an allele of B in the on-state catalyzed by A’k_A1
- Dissociation constant of the Hill function for A, wild typek_Anonsense1
- Dissociation constant of the Hill function for A, mutatedk_Aprime1
- Dissociation constant of the Hill function for A’k_B1
- Dissociation constant of the Hill function for Bn_A1
- Hill coefficient for A, wild typen_Anonsense1
- Hill coefficient for A, mutatedn_Aprime1
- Hill coefficient for A’n_B1
- Hill coefficient for B
Sharing/Access information
Links to other publicly accessible locations of the data:
Data was derived from the following sources: simulations of gene regulatory networks with transcriptional adaptation as described in
Mellis IA, Melzer ME, Bodkin N, and Goyal Y. 2024. Prevalence of and gene regulatory constraints on transcriptional adaptation in single cells. Genome Biology.
Code/Software
All code required to reproduce these simulation outputs is available on Zenodo and Github.
The dataset was generated through simulations of gene regulatory networks with transcriptional adaptation, where genes are represented as nodes and regulatory relationships as edges. Please see the corresponding manuscript for a complete description of the model, simulation conditions, and analyses.
The regulatory networks were constructed using a modified telegraph model for transcriptional bursting, where each gene's alleles switch between active (transcribing) and inactive (quiescent) states, for an ancestral regulator gene (A), its paralogs (A'1 and A'2), and a downstream target gene (B). To capture the dynamics of gene regulation, we implemented stochastic simulations using Gillespie’s next reaction method. We accounted for differential regulatory effects of different gene products. We included models for both activating and repressing interactions, as well as models with more than one paralog gene.
Parameter ranges were defined based on previously reported literature on transcriptional regulation and other related phenomena, as described in the corresponding manuscript. Parameter sets for simulations were sampled with latin hypercube sampling from the defined parameter ranges. Each simulation ran for 300,000 timesteps, simulating three genotypes: wildtype, heterozygous, and homozygous-mutant for 100,000 timesteps each. A total of approximately 60,000 simulations were performed across different network models.
The simulations were performed in MATLAB R2017a, R2021b, and R2024a. This extensive dataset and the corresponding analysis provides valuable insights into the stochastic behavior of gene regulatory networks with transcriptional adaptation.
For a full description of the models and simulations, please see the preprint and corresponding forthcoming publication.