Mechanistic interactions as the origin of modularity in biological networks
Data files
Mar 14, 2024 version files (21.30 MB)
- modularity_in_biological_networks.zip
- README.md
Abstract
Biological networks are often modular. Explanations for this peculiarity either assume an adaptive advantage of a modular design such as higher robustness, or attribute it to neutral factors such as constraints underlying network assembly. Interestingly, most insights on the origin of modularity stem from models in which interactions are either determined by highly simplistic mechanisms or have no mechanistic basis at all. Yet, empirical knowledge suggests that biological interactions are often mediated by complex structural or behavioural traits. Here, we investigate the origins of modularity using a model in which interactions are determined by potentially complex traits. Specifically, we model system elements - such as the species in an ecosystem - as finite-state machines (FSMs) and determine their interactions by means of communication between the corresponding FSMs. Using this model, we show that modularity likely emerges for free. We further find that the more modular an interaction network is, the less complex are the traits that mediate the interactions. Altogether, our results suggest that the conditions for modularity to evolve may be much broader than previously thought.
README: Mechanistic interactions as the origin of modularity in biological networks - Code & Data
The repository contains the Python code (model and analysis) and data for:
Wechsler, D., and J. Bascompte. Mechanistic interactions as the origin of modularity in biological networks. Unpublished.
modularity_in_biological_networks.zip
├── core/ Model code and measures
│ ├── basic_measures.py Functions to compute e.g. connectance
│ ├── finite_state_machine.py Finite-state machine class
│ ├── fsm_minimization.py Implementation of the procedure to minimize a set of FSMs (used to compute trait complexity)
│ ├── modularity.py Functions to compute modularity (invokes R package BiRewire)
│ ├── modularity_maximization.py Algorithm to find the maximal modularity of a network with a given connectance or degree sequence (see sections S3 and S4 in supplementary material).
│ ├── network_evolution.py Evolves networks of a given connectance and modularity (Experiment 2 - see also S7 in supplementary material)
│ └── utilities.py
│
├── data/ The data obtained from Experiments 1 and 2 (analysed in the Jupyter notebook 'results_csv.ipynb'). See the description of the columns below.
│ ├── random_assemblages_modularity.csv Data from experiment 1
│ ├── random_assemblages_complexity_NN.csv Data from experiment 1
│ ├── random_assemblages_complexity_n.csv Data from experiment 1
│ ├── opt_modularity_NN.csv Data from experiment 2
│ ├── opt_modularity_c.csv Data from experiment 2
│ ├── specificity_vs_connectance.csv Data for supplementary figure S2
│ └── genotype_phenotype/
│ ├── phenotypes.csv Data from sampling genotype space (figure 6)
│ ├── discovery_times.csv Phenotype discovery time
│ ├── delta1/ Data for Figure S12
│ └── delta05/ Data for Figure S13
│
├── lib/ Some functions used for plotting in 'results_csv.ipynb'
├── plots/ The plots generated by the analysis script 'results_csv.ipynb'
├── experiment_1.py A script illustrating the setup used in Experiment 1
├── experiment_2.py A script illustrating the setup used in Experiment 2
├── sample_phenotypes.py A script to sample genotype space (Figure 6)
├── readme.txt This file.
└── results_csv.ipynb Jupyter notebook containing the analysis scripts (uses csv files in data)
Description of columns in data (csv files)
Each row corresponds to a single simulation run.
Some columns corresponding to variables not considered in the analysis contain no values.
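Because some columns are empty, it helps to turn blank cells into explicit missing values when loading a file. The following is a minimal sketch using only the Python standard library. It assumes the files are comma-separated with a header row; the helper name `read_runs` and the sample values are hypothetical and not part of the repository.

```python
import csv
import io


def read_runs(handle):
    """Read simulation runs from an open CSV file, one dict per row.

    Empty cells become None; all other values are kept as strings.
    """
    reader = csv.DictReader(handle)
    return [
        {column: (value if value != "" else None) for column, value in row.items()}
        for row in reader
    ]


# Tiny in-memory stand-in for one of the data files (values are made up).
sample = io.StringIO("NN,n,th,c,Q\n20,4,0.5,0.31,\n20,4,0.5,0.28,0.12\n")
runs = read_runs(sample)

# Keep only the runs for which modularity was actually recorded.
complete = [run for run in runs if run["Q"] is not None]
```

With a real file, the same function could be called as `read_runs(open("data/random_assemblages_modularity.csv", newline=""))`, provided the comma-delimiter assumption holds.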
Experiment 1
- random_assemblages_complexity_n.csv
- random_assemblages_complexity_NN.csv
- random_assemblages_modularity.csv
Parameter values (inputs):
Column | Description |
---|---|
NN | Number of species N. |
n | Number of FSM states n. |
th | Interaction specificity δ. |
FSM_COMPLEXITY_MAX_NO_IMPROVE | Defines after how many consecutive iterations without a reduction of average complexity the algorithm to compute trait complexity stops. |
NUM_RANDOMIZATIONS | Number of randomizations to calculate Q_rand. |
MAX_MODULARITY_MAX_NO_IMPROVEMENT | Defines after how many consecutive iterations without an increase in modularity the greedy algorithm to approximate Q_max stops (see supplementary document S4). |
Measurements (outputs):
Column | Description |
---|---|
c | Connectance of the interaction network of the sampled assemblage. |
mean_rel_complexity | Average complexity of the FSMs. |
rel_complexity_trace | The sequence of complexity values traversed during minimization. |
Q | Modularity of the interaction network of the sampled assemblage. |
Q_rand | Modularity of a random network with the same degree sequence as the original network. |
Q_norm | Normalized modularity of the interaction network of the sampled assemblage. |
Q_max | Maximum value of modularity. |
num_modules | Number of modules detected in the network. |
Q_p_value | P-value of modularity. |
Q_z_score | Z-score of modularity. |
Q_max_trace | The sequence of modularity values traversed towards Q_max. |
I | Average pairwise mutual information of the interaction network. |
duration_Q | Time (in seconds) it took to compute normalized modularity. |
duration_complexity | Time (in seconds) it took to compute average complexity. |
modularity_significant | Whether the network is significantly modular. |
complexity_normalized | rel_complexity_trace divided by n . |
The following columns store meta-information about the simulation run:
SIM_ID, ID, REPLICATE, STATUS, DURATION, RANDOM_SEED, DISPATCHER_ID, MSG, FK_SIMULATION, ID
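
The columns Q, Q_rand and Q_max relate to Q_norm, and Q relates to Q_z_score through a set of randomizations. This README does not state the formulas, so the sketch below uses two common definitions as assumptions: a normalization that maps Q_rand to 0 and Q_max to 1, and a z-score against the mean and sample standard deviation of the randomized modularity values. The function names are hypothetical; the definitions actually used are in the supplementary material and in `core/modularity.py`.

```python
import statistics


def normalized_modularity(q, q_rand, q_max):
    """Assumed normalization: 0 at the random expectation, 1 at the maximum."""
    if q_max == q_rand:
        raise ValueError("Q_max equals Q_rand; normalization is undefined")
    return (q - q_rand) / (q_max - q_rand)


def modularity_z_score(q, rand_samples):
    """Assumed z-score of q against modularity values of randomized networks."""
    mean = statistics.mean(rand_samples)
    spread = statistics.stdev(rand_samples)  # sample standard deviation
    return (q - mean) / spread


# Made-up values, for illustration only.
q_norm = normalized_modularity(q=0.30, q_rand=0.10, q_max=0.50)
z = modularity_z_score(0.30, [0.08, 0.10, 0.12])
```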
Experiment 2
- opt_modularity_c.csv
- opt_modularity_NN.csv
Parameter values (inputs):
Column | Description |
---|---|
NN | Number of species N. |
n | Number of FSM states n. |
th | Interaction specificity δ. |
c_target | Target value of connectance (see supplementary document S7). |
Q_target | Target value of normalized modularity (see supplementary document S7). |
MAX_UPDATES | The number of iterations at which the algorithm to evolve assemblages with a network of desired connectance and modularity is stopped. |
FSM_COMPLEXITY_MAX_NO_IMPROVE | Defines after how many consecutive iterations without a reduction of average complexity the algorithm to compute trait complexity stops. |
NUM_RANDOMIZATIONS | Number of randomizations to calculate Q_rand. |
MAX_MODULARITY_MAX_NO_IMPROVEMENT | Defines after how many consecutive iterations without an increase in modularity the greedy algorithm to approximate Q_max stops (see supplementary document S4). |
C_EPSILON | Specifies allowed deviation from c_target (precision). |
Q_NORM_EPSILON | Specifies allowed deviation from Q_target (precision). |
MUT_EPSILON | Specifies allowed deviation from average pairwise mutual information of a random network M_rand (precision). |
NUM_Q_MAX_SAMPLES | Defines how often the greedy algorithm to approximate Q_max is started (the highest value reached among the runs is taken as Q_max). |
Measurements (outputs):
Column | Description |
---|---|
c_start | Connectance of the network of the initial assemblage. |
c_end | Connectance of the network of the final assemblage (should be close to c_target). |
avg_rel_complexity_end | Average complexity of the final assemblage. |
Q_start | Modularity of the network of the initial assemblage. |
Q_end | Modularity of the network of the final assemblage. |
Q_rand | The expected value of modularity of a random network (with same number of nodes and connectance). |
Q_norm_start | Normalized modularity of the network of the initial assemblage. |
Q_norm_end | Normalized modularity of the network of the final assemblage (should be close to Q_target). |
Q_max | The maximum possible value of modularity. |
Q_p_value_start | P-value of modularity of the network of the initial assemblage. |
Q_z_score_start | Z-score of modularity of the network of the initial assemblage. |
Q_p_value_end | P-value of modularity of the network of the final assemblage. |
Q_z_score_end | Z-score of modularity of the network of the final assemblage. |
E_I | Expected average pairwise mutual information of a random network. |
I_end | Value of average pairwise mutual information after evolving the assemblage towards E_I (i.e., I_end should be close to E_I). |
Q_rand_samples | The modularity values of the randomizations. |
G_start | Adjacency matrix of the initial network. |
G_end | Adjacency matrix of the final network. |
num_groups_start | Number of modules in the network of the initial assemblage. |
num_groups_end | Number of modules in the network of the final assemblage. |
num_iterations_modularity | The number of iterations it took to evolve the assemblage to have a network with modularity equal to Q_target . |
num_iterations_randomization | The number of iterations it took to evolve the assemblage to have a network with E_I (average pairwise mutual information of a random network). |
duration_randomization | Time (in seconds) it took to evolve the assemblage to have a network with E_I (average pairwise mutual information of a random network). |
duration_modulairty | Time (in seconds) it took to evolve the assemblage to have a network with modularity equal to Q_target. |
duration_complexity | Time (in seconds) it took to compute average complexity. |
duration_Q_max | Time (in seconds) it took to compute maximum modularity. |
duration_Q_rand | Time (in seconds) it took to compute the modularity of a random network. |
Columns storing meta-information about the simulation run:
SIM_ID, ID, REPLICATE, STATUS, DURATION, RANDOM_SEED, DISPATCHER_ID, MSG, FK_SIMULATION, ID
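
Since c_end and Q_norm_end should lie close to their targets, a natural first check on the Experiment 2 data is whether each run ended within the tolerances given by C_EPSILON and Q_NORM_EPSILON. The sketch below assumes that "allowed deviation" means an absolute difference compared with `<=`; the function name `reached_targets` and the example rows are hypothetical.

```python
def reached_targets(run):
    """Return True if a run ended within the allowed deviations from both targets.

    `run` maps column names to values (strings or numbers), e.g. one CSV row.
    """
    c_ok = abs(float(run["c_end"]) - float(run["c_target"])) <= float(run["C_EPSILON"])
    q_ok = (
        abs(float(run["Q_norm_end"]) - float(run["Q_target"]))
        <= float(run["Q_NORM_EPSILON"])
    )
    return c_ok and q_ok


# Two made-up rows: the first is within both tolerances, the second misses Q_target.
rows = [
    {"c_end": "0.305", "c_target": "0.3", "C_EPSILON": "0.01",
     "Q_norm_end": "0.495", "Q_target": "0.5", "Q_NORM_EPSILON": "0.01"},
    {"c_end": "0.305", "c_target": "0.3", "C_EPSILON": "0.01",
     "Q_norm_end": "0.52", "Q_target": "0.5", "Q_NORM_EPSILON": "0.01"},
]
converged = [reached_targets(row) for row in rows]
```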
Sampling Phenotypes
- phenotypes.csv
Parameter values (inputs):
Column | Description |
---|---|
phenotype | Binary representation of the phenotype. |
count | How many genotypes were sampled with that phenotype. |
- discovery_times.csv
Parameter values (inputs):
Column | Description |
---|---|
sample_number | The sample number (i.e. time). |
num_phenotypes | The number of phenotypes discovered until the given sample (i.e., time). |
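
To illustrate how the two files relate, the sketch below derives both tables from a single sequence of sampled phenotypes: the per-phenotype counts (as in phenotypes.csv) and the cumulative number of distinct phenotypes seen after each sample (as in discovery_times.csv). It is an illustration of the column semantics only, not the code in `sample_phenotypes.py`; the function names, the toy sequence, and the choice to start sample numbers at 1 are assumptions.

```python
from collections import Counter


def phenotype_counts(sampled_phenotypes):
    """Map each phenotype to the number of sampled genotypes that produced it."""
    return Counter(sampled_phenotypes)


def discovery_curve(sampled_phenotypes):
    """Return (sample_number, num_phenotypes) pairs for a sequence of samples."""
    seen = set()
    curve = []
    for sample_number, phenotype in enumerate(sampled_phenotypes, start=1):
        seen.add(phenotype)
        curve.append((sample_number, len(seen)))
    return curve


# Toy sequence of phenotypes in binary representation (made up).
samples = ["0110", "0110", "1001", "0000", "1001"]
counts = phenotype_counts(samples)
curve = discovery_curve(samples)
```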