Simulations of gene regulatory networks with transcriptional adaptation

Mellis, Ian1 ; Melzer, Madeline2 ; Bodkin, Nicholas2 ; Goyal, Yogesh 2 ; Sohini, Rohan 2

Research facility: Northwestern University

Published Aug 02, 2024 on Dryad. https://doi.org/10.5061/dryad.nk98sf82j

Data files

Aug 02, 2024 version files 2.61 GB

README.md
18.01 KB
Simulations.zip
2.61 GB

Abstract

Background

Cells and tissues have a remarkable ability to adapt to genetic perturbations via a variety of molecular mechanisms. Transcriptional adaptation has recently emerged as one such mechanism, in which nonsense mutations in a gene trigger upregulation of related genes, possibly conferring robustness at cellular and organismal levels. However, beyond a handful of developmental contexts and curated sets of genes, no comprehensive genome-wide investigation of this behavior has been undertaken for mammalian cell types and contexts. Further, how the regulatory-level effects of inherently stochastic compensatory gene networks contribute to phenotypic penetrance in single cells remains unclear.

Results

In the corresponding manuscript, we analyze existing bulk and single-cell transcriptomic datasets to infer the prevalence of transcriptional adaptation in mammalian systems across diverse contexts and cell types. In the data presented here, stochastic mathematical modeling of minimal compensatory gene networks qualitatively recapitulates several aspects of transcriptional adaptation. Combined with machine learning analysis of network features of interest, our framework offers potential explanations for which regulatory steps are most important for transcriptional adaptation.

Conclusions

We provide a formal quantitative framework to test and refine models of transcriptional adaptation.

https://doi.org/10.5061/dryad.nk98sf82j

Project Description

Outputs of simulations of gene regulatory networks with transcriptional adaptation (also known as nonsense-induced transcriptional compensation) as described in

Mellis IA, Melzer ME, Bodkin N, and Goyal Y. 2024. Prevalence of and gene regulatory constraints on transcriptional adaptation in single cells. Genome Biology.

We performed simulations of gene regulatory networks with transcriptional adaptation, where genes are represented as nodes and regulatory relationships as edges.

The regulatory networks were constructed using a modified telegraph model for transcriptional bursting, where each gene’s alleles switch between active (transcribing) and inactive (quiescent) states, for an ancestral regulator gene (A), its paralogs (A’ for 1-paralog models, A’1 and A’2 for multiparalog models), and a downstream target gene (B). To capture the dynamics of gene regulation, we implemented stochastic simulations using Gillespie’s next reaction method. We accounted for differential regulatory effects of different gene products. We included models for both activating and repressing interactions, as well as models with more than one paralog gene.

Parameter ranges were defined based on previously reported literature on transcriptional regulation and other related phenomena, as described in the corresponding manuscript. Parameter sets for simulations were sampled with latin hypercube sampling from the defined parameter ranges. Each simulation ran for 300,000 timesteps, simulating three genotypes: wildtype, heterozygous, and homozygous-mutant for 100,000 timesteps each. A total of approximately 60,000 simulations were performed across different network models.

Usage

The companion publication to this dataset is

Mellis IA, Melzer ME, Bodkin N, and Goyal Y. 2024. Prevalence of and gene regulatory constraints on transcriptional adaptation in single cells. Genome Biology.

To use the data herein:

Review the companion publication Methods section and its Additional File 2: Supplementary Note for all details about simulation rationale, setup, and execution.
Download necessary software and packages: MATLAB R2017a, R2021b, or R2024a and R v4.3.2 using packages readxl v1.4.0, partykit v1.2-16, mvtnorm v1.1-3, libcoin v1.0-9, ggalluvial v0.12.3, entropy v1.3.1, svglite v2.1.0, corrplot v0.92, ggrepel v0.9.1, e1071 v1.7-11, diptest v0.76-0, gridExtra v2.3, Hmisc v4.7-0, Formula v1.2-4, survival v3.3-1, lattice v0.20-45, ineq v0.2-13, magrittr v2.0.3, forcats v0.5.1, stringr v1.4.0, dplyr v1.0.9, purrr v0.3.4, readr v2.1.2, tidyr v1.2.0, tibble v3.1.7, ggplot2 v3.3.6, tidyverse v1.3.1, gprofiler2 v0.2.3.
Clone the repository used to generate and analyze this dataset from GitHub or Zenodo.
Download parameter set values used to generate the outputs in this dataset in the companion publication Additional File 1: Table S4, which includes all parameter sets used to generate every file in this dataset. Random seeds are included in the corresponding files in the code repository in Step 3.
Execute analysis code from the GitHub or Zenodo respository Analysis/simulations subfolder after downloading the above software, scripts, and metadata.

Description of the file structure

The data here are sets of tables in which each row is a sampled “pseudo-single-cell” at one point in time and each column is a feature, such as a gene product abundance or the status of an allele’s activation state. The file structure is organized according to the type of gene regulatory network simulated. Details of Activator_1paralog, Multiparalog, and Repressor networks are all extensively described in the companion publication.

For each network, there is a subfolder for each of several different simulation conditions, also described in the companion publication. The companion publication Additional File 1: Table S4 lists exact parameter sets simulated for each subfolder here. The companion publication Methods section and Additional File 2: Supplementary Note of the companion publication provides a complete explanation of the simulation goals, setup, and analysis workflow.

All named features used in the tables below were generated programmatically and their roles in simulation execution and output is described in the companion publication. All named features are explained in the “Contents of data files” section, below.

Subfolders and contents:

Activator_1paralog - stores simulation outputs from several 1-paralog models in which upstream factors A and A’ activate expression of downstream effector B. Contains subfolders for each of several model conditions
- trimodal_resample_v1-6-5-1 - stores simulation outputs after parameter set resampling with a custom radius around the index trimodal case described in the companion publication
  - initialsim_rates*.csv - created programmatically by simulation code, where * is in [1,100]. Each column contains one entry for the specified parameter values for that parameter set * ID.
  - initialsim_species*_q300.csv - created programmatically by simulation code, where * is in [1,100]. Each column contains species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
- trimodal_resampled2_v1_6_5_1_2 - stores simulation outputs after parameter set resampling with a minimum observed radius around the index trimodal case described in the companion publication
  - initialsim_rates*.csv - created programmatically by simulation code, where * is in [1,100]. Each column contains one entry for the specified parameter values for that parameter set * ID.
  - initialsim_species*_q300.csv - created programmatically by simulation code, where * is in [1,100]. Each column contains species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
- trimodal_resampled3_v1_6_5_1_3 - stores simulation outputs after parameter set resampling with a 10th percentile observed radius around the index trimodal case described in the companion publication
  - initialsim_rates*.csv - created programmatically by simulation code, where * is in [1,100]. Each column contains one entry for the specified parameter values for that parameter set * ID.
  - initialsim_species*_q300.csv - created programmatically by simulation code, where * is in [1,100]. Each column contains species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
- trimodal_resampled4_v1_6_5_1_4 - stores simulation outputs after parameter set resampling with a 10th percentile observed radius around the index trimodal case described in the companion publication
  - initialsim_rates*.csv - created programmatically by simulation code, where * is in [1,100]. Each column contains one entry for the specified parameter values for that parameter set * ID.
  - initialsim_species*_q300.csv - created programmatically by simulation code, where * is in [1,100]. Each column contains species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
- unimodal_robust_resample_v1-6-5-3 - stores simulation outputs after parameter set resampling within the candidate unimodal symmetric fully-robust parameter subspace described in the companion publication
  - all_species_q300.csv - created programmatically by simulation code. A table in which each column contains a parameter set ID or species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
- unimodal_shape_v1_6_5_2_1 - stores simulation outputs after parameter set resampling within the candidate unimodal symmetric shape-robust parameter subspace described in the companion publication
  - initialsim_rates*.csv - created programmatically by simulation code, where * is in [1,100]. Each column contains one entry for the specified parameter values for that parameter set * ID.
  - initialsim_species*_q300.csv - created programmatically by simulation code, where * is in [1,100]. Each column contains species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
- with_basal_paralog_v1-6-6-1 - stores simulation outputs over the entire parameter space for a network in which there is basal expression of A’
  - all_species_q300.csv - created programmatically by simulation code. A table in which each column contains a parameter set ID or species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
- without_basal_paralog_v1-6-5-2 - stores simulation outputs over the entire parameter space for a network in which there is no basal expression of A’
  - all_species_q300.csv - created programmatically by simulation code. A table in which each column contains a parameter set ID or species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
Multiparalog - stores simulation outputs from 2-paralog models in which upstream factors A, A’1, and A’2 activate expression of downstream effector B. Contains subfolders for each model condition
- different_parameters_v1_7_2 - stores simulation outputs over the entire parameter space for a network in which A’1 and A’2 are allowed to have different parameter parameter values
  - all_species_q300.csv - created programmatically by simulation code. A table in which each column contains a parameter set ID or species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
- same_parameters_v1_7_1 - stores simulation outputs over the entire parameter space for a network in which A’1 and A’2 are not allowed to have different parameter parameter values
  - all_species_q300.csv - created programmatically by simulation code. A table in which each column contains a parameter set ID or species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
Repressor - stores simulation outputs from 1-paralog model in which upstream factors A and A’ repress expression of downstream effector B, in which B has basal expression. Contains subfolders for each model condition
- with_basal_paralog_v1_6_10_1 - stores simulation outputs over the entire parameter space for a network in which there is basal expression of A’
  - all_species_q300.csv - created programmatically by simulation code. A table in which each column contains a parameter set ID or species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
- without_basal_paralog_v1_6_11_1 - stores simulation outputs over the entire parameter space for a network in which there is not basal expression of A’
  - all_species_q300.csv - created programmatically by simulation code. A table in which each column contains a parameter set ID or species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”

Contents of data files

Descriptions of the columns in all data files with the corresponding basename in the folders listed above.

all_species_q300 - created programmatically by simulation code. A table in which each column contains a parameter set ID or species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
- A1 - species count for A, wild type
- Anonsense1 - species count for A, mutated
- Aprime1 - species count for A’ in 1-paralog models (or A’1 in the multiparalog model)
- Aprime2 - species count for A’2 in the multiparalog model. Not included in other 1-paralog model output
- B1 - species count for B
- paramset - parameter set ID, corresponding to parameter set in the corresponding table from “Usage” point 4.
- version - analysis version identifier, in the “Description of the file structure” subfolder names, with periods instead of hyphens or underscores
- time- an integer, corresponding to the simulation timepoint at which the pseudo-single-cell was observed
- mutated_alleles - an integer, corresponding to the number of alleles of A mutated. 0, 1, or 2.
initialsim_species*_q300 - created programmatically by simulation code. A table for each parameter set in which each column contains a species values or states, and each row corresponds to one observation of a “pseudo-single-cell.”
- A1 - species count for A, wild type
- Anonsense1 - species count for A, mutated
- Aprime1 - species count for A’
- B1 - species count for B
- Burst1_onorig_targ_allele1 - 0 or 1, corresponding to whether B allele 1 is in the on-state catalyzed by A. 0 = off, 1 = on
- Burst1_onpara_targ_allele1 - 0 or 1, corresponding to whether B allele 1 is in the on-state catalyzed by A’. 0 = off, 1 = on
- Burst1_off_targ_allele1 - 0 or 1, corresponding to whether B allele 1 is in the off-state. 0 = on, 1 = off
- Burst1_onorig_targ_allele2 - 0 or 1, corresponding to whether B allele 2 is in the on-state catalyzed by A. 0 = off, 1 = on
- Burst1_onpara_targ_allele2 - 0 or 1, corresponding to whether B allele 2 is in the on-state catalyzed by A’. 0 = off, 1 = on
- Burst1_off_targ_allele2 - 0 or 1, corresponding to whether B allele 2 is in the off-state. 0 = off, 1 = on
- Burst1_on_para_allele1 - 0 or 1, corresponding to whether A’ allele 1 is in the on state. 0 = off, 1 = on
- Burst1_on_para_allele2 - 0 or 1, corresponding to whether A’ allele 2 is in the on state. 0 = off, 1 = on
- Burst1_on_orig_allele1 - 0 or 1, corresponding to whether A allele 1 is in the on state. 0 = off, 1 = on
- Burst1_on_orig_allele2 - 0 or 1, corresponding to whether A allele 2 is in the on state. 0 = off, 1 = on
- Burst1_is_mutated_allele1 - 0 or 1, corresponding to whether A allele 1 is mutated. 0 = wild type, 1 = mutated
- Burst1_is_mutated_allele2 - 0 or 1, corresponding to whether A allele 2 is mutated. 0 = wild type, 1 = mutated
- time - an integer, corresponding to the simulation timepoint at which the pseudo-single-cell was observed
initialsim_rates* - created programmatically by simulation code, where * is in [1,100]. Each column contains one entry for the specified parameter values for that parameter set * ID.
- r_prodbasal_A1 - Basal production rate of mRNA from an allele of A, wild type
- r_prodbasal_Anonsense1 - Basal production rate of mRNA from an allele of A, mutated
- r_prodbasal_Aprime1 - Basal production rate of mRNA from an allele of A’
- r_prodbasal_B1 - Basal production rate of mRNA from an allele of B
- r_prodon_A1 - Production rate of mRNA from an active allele of A, wild type
- r_prodon_Anonsense1 - Production rate of mRNA from an active allele of A, mutated
- r_prodon_Aprime1 - Production rate of mRNA from an active allele of A’
- r_prodon_B1 - Production rate of mRNA from an active allele of B
- d_Aprime1_B1 - Factor by which the mRNA production rate of B is lower in the A’-directed B active state than in A-directed B active state
- r_deg_A1 - Degradation rate of mRNA of A, wild type
- r_deg_Anonsense1 - Degradation rate of mRNA of A, mutated
- r_deg_Aprime1 - Degradation rate of mRNA of A’
- r_deg_B1 - Degradation rate of mRNA of B
- r_onbasal_A1 - Basal activation rate of an allele of A, wild type
- r_onbasal_Anonsense1 - Basal activation rate of an allele of A, mutated
- r_onbasal_Aprime1 - Basal activation rate of an allele of A’
- r_onbasal_B1 - Basal activation rate of an allele of B
- r_nitc_byAnonsense1_A1 - Additional activation of an allele of A, wild type, upon nonsense-induced transcriptional compensation caused by product Anonsense
- r_nitc_byAnonsense1_Anonsense1 - Additional activation of an allele of A, mutated, upon nonsense-induced transcriptional compensation caused by product Anonsense
- r_nitc_byAnonsense1_Aprime1 - Additional activation of an allele of A’ upon nonsense-induced transcriptional compensation caused by product Anonsense
- r_addon_byA1_B1 - Activation rate of B by A to one of two respective B active states specified in the model
- r_addon_byAprime1_B1 - Activation rate of B by A’ to one of two respective B active states specified in the model
- r_off_A1 - Inactivation rate of an allele of A, wild type
- r_off_Anonsense1 - Inactivation rate of an allele of A, mutated
- r_off_Aprime1 - Inactivation rate of an allele of A’
- r_offorig_B1 - Inactivation rate of an allele of B in the on-state catalyzed by A
- r_offpara_B1 - Inactivation rate of an allele of B in the on-state catalyzed by A’
- k_A1 - Dissociation constant of the Hill function for A, wild type
- k_Anonsense1 - Dissociation constant of the Hill function for A, mutated
- k_Aprime1 - Dissociation constant of the Hill function for A’
- k_B1 - Dissociation constant of the Hill function for B
- n_A1 - Hill coefficient for A, wild type
- n_Anonsense1 - Hill coefficient for A, mutated
- n_Aprime1 - Hill coefficient for A’
- n_B1 - Hill coefficient for B

Sharing/Access information

Links to other publicly accessible locations of the data:

Northwestern University SharePoint

Data was derived from the following sources: simulations of gene regulatory networks with transcriptional adaptation as described in

Mellis IA, Melzer ME, Bodkin N, and Goyal Y. 2024. Prevalence of and gene regulatory constraints on transcriptional adaptation in single cells. Genome Biology.

Code/Software

All code required to reproduce these simulation outputs is available on Zenodo and Github.

Citation

Subject keywords

Funding

Northwestern University , Startup Funds
Burroughs Wellcome Fund , Career Awards at the Scientific Interface
National Institute of Health : GM144295, T32
National Institute of Health : GM142604, T32
National Science Foundation : DMS-2235451, National Institute for Theory and Mathematics in Biology
Simons Foundation : MPTMPS-00005320

Simulations of gene regulatory networks with transcriptional adaptation

Data files

Abstract

README: Simulations of Gene Regulatory Networks with Transcriptional Adaptation

Project Description

Usage

Description of the file structure

Contents of data files

Sharing/Access information

Code/Software

Methods

Works referencing this dataset