Data from: Gene network topology drives the mutational landscape of gene expression
Data files
Jul 03, 2025 version, 13.78 MB in total:
- GeneRation_Data_Analysis_pack.zip (13.77 MB)
- README.md (7.40 KB)
Abstract
Regulatory mutations, coding sequence variations, and gene deletions and duplications are generally expected to have qualitatively different effects on fitness during adaptation. We aim to ground this expectation within a theoretical framework using evolutionary simulations of gene regulatory networks (GRNs) controlling the expression of fitness-related genes. We examined the distribution of fitness effects as a function of the type of mutation and the topology of the gene network. Contrary to our expectation, the GRN topology had more influence on the effect of mutations than the type of mutation itself. In particular, the topology conditioned (i) the speed of adaptation, (ii) the distribution of fitness effects, and (iii) the degree of pleiotropy, which acts as an explanatory factor for all mutation types. All mutations had the potential to participate in adaptation, although their propensity to generate beneficial variants differed according to the network topology. In scale-free networks, arguably the most common topology for biological networks, coding mutations were more pleiotropic and overrepresented in both beneficial and deleterious mutations, while regulatory mutations were more often neutral. However, this observation was not general, as this pattern was reversed in the other network topologies. These results highlight the critical role of gene interactions in defining mutations' contributions to adaptation.
Project organization
The project repository contains the following directories:
| Directory | Location |
|---|---|
| Extracted_data | Dryad |
| figures | empty |
| movies | empty |
| SIMULATIONS | Dryad |
| src | Zenodo (Software Related work) |
| various R scripts | Zenodo (Software Related work) |
Both the code (from Zenodo) and the simulation data must be in the same directory to generate the figures. Simulation data can either be downloaded from Dryad or re-generated from the code (which takes several days, depending on the hardware).
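Before running the analysis, it can help to confirm that the code and data ended up side by side. The following is a minimal sketch, assuming the directory names from the table above; the helper name `check_project_layout` and the `root` argument are hypothetical and not part of the GeneRation code.

```r
# Hypothetical helper: report which expected project directories are missing.
# Directory names come from the table above; `root` is an assumed project path.
check_project_layout <- function(root = ".") {
  expected <- c("Extracted_data", "figures", "movies", "SIMULATIONS", "src")
  present <- dir.exists(file.path(root, expected))
  expected[!present]
}

missing_dirs <- check_project_layout(".")
if (length(missing_dirs) > 0) {
  message("Missing directories: ", paste(missing_dirs, collapse = ", "))
}
```

An empty result means all five directories were found under `root`.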
Simulation data in Dryad
This Dryad repository contains a zip file (GeneRation_Data_Analysis_pack.zip) with two directories:
- SIMULATIONS folder: raw simulation results (two replicated runs for each network topology: HIGHCO (highly connected), RANDOM (random), SCALEF (scale free)). Each run reports: the parameter file (.param), the initial and final populations (.rds files), the full simulation table (.table), and two PNG image files (.graphall and .heatmap) that monitor the behavior of the simulations (warning: depending on the system setting, it might be necessary to add the .png extension to these files for a correct visualization). The folder names in the Dryad repository are:
  - 0612_215807_HIGHCO_002_november
  - 0612_215807_HIGHCO_004_november
  - 0613_091236_RANDOM_001_november
  - 0613_091244_RANDOM_002_november
  - 0613_222625_SCALEF_002_november
  - 0613_222625_SCALEF_002_november
- Extracted_data: processed simulation results
  - allmutations_20250619003541_001_TUTO_SIMUS.csv
This file is a table containing the effects of mutations calculated in the second step of the analysis, from the last generations of the simulations.
The columns in this CSV file are:
- id: unique mutation ID (line number)
- name: simulation series (one of the folder names in the SIMULATIONS directory)
- topology: either HIGHCO, RANDOM, or SCALEF
- mut_type: either REG, COD, or DUP (regulatory, coding, or duplication, respectively)
- i, j: coordinates of the affected slot in the W matrix. For COD mutations, j=11 (technically, the vector of coding values was the 11th column of the genotype matrix). DUP mutations are not associated with i,j coordinates, so the value is NA.
- mutvalue_init and mutvalue_final: value of the slot i,j before and after the mutation (NA for DUP mutations)
- mutvalue_delta: mutvalue_final - mutvalue_init
- dupdel_number: number of duplicated genes (NA if not DUP)
- dupdel_genes: identity of duplicated genes (indexed from 1 to 10, e.g. 594 stands for genes 5, 9, and 4; 710 stands for genes 7 and 10)
- fit_pop: fitness of the average genotype in the population
- fit_indE1: fitness before mutation in environment 1 (= fit_pop, since mutations were performed from the average genotype)
- fit_indE2: fitness before mutation in environment 2
- fit_mutE2: fitness after mutation in environment 2
- fit_delta: difference before - after mutation
- fit_effect: NEUTR, BENEF, or DELET (neutral, beneficial, or deleterious, respectively)
- fit_thresh (always 0.001): threshold used to determine whether a mutation is neutral or not
- pleiotropy: number of genes whose expression was affected by the mutation
- affected_genes_thresh: phenotypic threshold (in gene expression units) used to determine whether a mutation affects gene expression
- reg_muteff, cod_mutef: same as mutvalue_delta, but each column is specific to one type of mutation
- notes: comments
The next columns are relevant only for mutations that have a phenotypic effect. They give the gene id (gene), the change in gene expression (dExpr), and the nature of the effect (cis or trans) for a random gene (rand suffix), the minimum expression change (min suffix), the maximum expression change (max suffix), and the mean expression change (mean suffix). The last column (Expr_effect_cisness) is the cis/(cis+trans) ratio (1 for cis-only mutations, 1/10 for mutations that affect all 10 genes).
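The dupdel_genes encoding concatenates gene indices without a separator, so decoding it needs a small rule for the two-digit index 10. The sketch below is one possible reading, not the authors' code: it assumes that the digits "10" always denote gene 10. The function name `decode_dupdel_genes` is hypothetical.

```r
# Split a dupdel_genes code into gene indices (1 to 10).
# Assumption: "10" is always read as gene 10, never as gene 1 followed by a 0.
decode_dupdel_genes <- function(code) {
  if (is.na(code)) return(integer(0))
  # Avoid scientific notation if the column was read as numeric.
  code <- if (is.numeric(code)) {
    format(code, scientific = FALSE, trim = TRUE)
  } else {
    as.character(code)
  }
  tokens <- regmatches(code, gregexpr("10|[1-9]", code))[[1]]
  as.integer(tokens)
}

decode_dupdel_genes(594)  # genes 5, 9, 4
decode_dupdel_genes(710)  # genes 7, 10
```

Reading the column as character when importing the CSV sidesteps any numeric formatting issue altogether.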
Other files:
- storeFits_TUTO_SIMUS: csv file, average fitnesses at every generation for the simulation runs from the SIMULATIONS directory
- storeJustLastFits_TUTO_SIMUS: csv file, the same but only the last generation
- storeLasts_TUTO_SIMUS: csv file, full simulation summary (from the simulation tables in the SIMULATIONS directory) but only for the last generation.
These simulated data were the ones used to generate figures and tables in the associated publication.
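As an illustration of how the mutation table can be summarised, the sketch below computes the proportion of each fit_effect class per topology and mutation type. The toy values are invented for illustration; only the column names (topology, mut_type, fit_effect) follow the description of the allmutations file, and the helper name `effect_proportions` is hypothetical.

```r
# Toy stand-in for the allmutations table: values are invented for
# illustration; only the column names follow the file description.
toy <- data.frame(
  topology   = c("SCALEF", "SCALEF", "SCALEF", "RANDOM", "RANDOM", "HIGHCO"),
  mut_type   = c("COD", "COD", "REG", "REG", "COD", "DUP"),
  fit_effect = c("BENEF", "DELET", "NEUTR", "DELET", "NEUTR", "NEUTR")
)

# Proportion of each fitness effect within every topology x mutation type cell.
effect_proportions <- function(d) {
  counts <- table(paste(d$topology, d$mut_type, sep = ":"), d$fit_effect)
  prop.table(counts, margin = 1)
}

props <- effect_proportions(toy)
print(round(props, 2))
```

To apply the same summary to the real data, the data frame could be loaded with `read.csv()` from the allmutations file in Extracted_data (the exact path depends on where the zip file was unpacked).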
GeneRation software
The associated Zenodo repository contains the R code to reproduce results of Pouzet & Le Rouzic 2025 - Evolution: "Gene network topology drives the mutational landscape of gene expression".
Packages required
This work was carried out with R version 4.0.4 (R Core Team 2021).
- rlist (v0.4.6.2) - save and load simulations
- RColorBrewer (v1.1.2) - color generation
- igraph (v1.2.10) - display networks
- MASS (v7.3.53.1) - fits for degree distributions
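A quick way to see which of these packages are absent from the current R library is sketched below. It only checks availability and does not enforce the versions listed above; the helper name `missing_packages` is hypothetical.

```r
# Package names are taken from the list above; versions are not enforced here.
required <- c("rlist", "RColorBrewer", "igraph", "MASS")

# Return the subset of `pkgs` whose namespace cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install the current CRAN versions
}
```

Note that `install.packages()` fetches the current CRAN releases, which may differ from the versions used for the publication.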
Reproduction of the results
- Make sure R has the packages installed.
- Launch simulations: in bash, use cd /SIMULATIONS to move to the correct directory and run sh ../0-run_simulations.sh, which launches 9 small simulations labeled "TUTO_SIMUS"
- Extract simulation features and isolate the successful simulations (1-data_extract_from_simus.R)
- Carry out mutation tests (2-run_mutation_tests.R)
- Plot figure 2: check and plot simulations, fit and plot degree distributions (3-Plot_fig2.R)
- Plot figure 3: compare adaptation profiles (4-Plot_fig3.R)
- Plot figure 4: plot fitness effects (5-Plot-fig4.R)
- Plot figure 5: plot cis-effects and pleiotropy (6-Plot-fig5.R)
- Plot figure 6: enrichment analysis (7-Plot-fig6.R)
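Once the simulations have finished, the analysis and plotting steps above can be chained from a single R session. This is a minimal sketch rather than a script shipped with the repository: the `run_pipeline` helper and its `script_dir` and `dry_run` arguments are hypothetical, and it assumes the numbered scripts sit in the project root and that `Rscript` is on the PATH.

```r
# Analysis and plotting scripts, in the order given in the steps above.
pipeline_scripts <- c(
  "1-data_extract_from_simus.R",
  "2-run_mutation_tests.R",
  "3-Plot_fig2.R",
  "4-Plot_fig3.R",
  "5-Plot-fig4.R",
  "6-Plot-fig5.R",
  "7-Plot-fig6.R"
)

# Run each script with Rscript and stop at the first failure.
# Assumption: the scripts sit in `script_dir` (the project root by default).
run_pipeline <- function(scripts = pipeline_scripts, script_dir = ".",
                         dry_run = FALSE) {
  paths <- file.path(script_dir, scripts)
  if (dry_run) return(paths)  # only list what would be run
  for (p in paths) {
    message("Running ", p)
    status <- system2("Rscript", shQuote(p))
    if (status != 0) stop("Script failed: ", p)
  }
  invisible(paths)
}

run_pipeline(dry_run = TRUE)
```

Calling `run_pipeline()` without `dry_run = TRUE` would execute the scripts in sequence; the simulation step (0-run_simulations.sh) is left out on purpose because it is launched from bash.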
Folders
- /SIMULATIONS folder: where raw simulations results are placed
- /Extracted_data folder: contains results extracted from the simulations and mutation tests
- /figures folder: contains generated figures
- /src folder: source files and parameters. Find more details on data structure and parameter files in the readme. It contains the 3 sets of parameter files used to obtain the 3 topologies described in the article, as well as a detailed account of each parameter used in the model.
- src/playground: see this folder for a small tutorial and useful functions. It contains a step-by-step template R script to get familiar with the treatment of the data generated from the simulations. Two simulation outputs are also included.
Notes
- Figure 1 is the materials and methods figure, so it is not part of the results reproduction.
- Each subfolder has a corresponding README which will provide additional detail.
Useful links
- Access to the paper is via https://doi.org/10.1093/evolut/qpaf068
- This GitHub repository is https://github.com/spouze/GeneRation_Pouzet2025/
- The corresponding Dryad repository is https://doi.org/10.5061/dryad.2fqz61312
- The former bioRxiv entry is https://www.biorxiv.org/content/10.1101/2024.11.28.625874v1
