Mechanistic interactions as the origin of modularity in biological networks
Data files
Mar 14, 2024 version, files 21.30 MB in total:
modularity_in_biological_networks.zip
README.md
Abstract
Biological networks are often modular. Explanations for this peculiarity either assume an adaptive advantage of a modular design, such as higher robustness, or attribute it to neutral factors, such as constraints underlying network assembly. Interestingly, most insights on the origin of modularity stem from models in which interactions are either determined by highly simplistic mechanisms or have no mechanistic basis at all. Yet, empirical knowledge suggests that biological interactions are often mediated by complex structural or behavioural traits. Here, we investigate the origins of modularity using a model in which interactions are determined by potentially complex traits. Specifically, we model system elements (such as the species in an ecosystem) as finite-state machines (FSMs) and determine their interactions by means of communication between the corresponding FSMs. Using this model, we show that modularity likely emerges for free. We further find that the more modular an interaction network is, the less complex are the traits that mediate the interactions. Altogether, our results suggest that the conditions for modularity to evolve may be much broader than previously thought.
README: Mechanistic interactions as the origin of modularity in biological networks - Code & Data
The repository contains the Python code (model and analysis) and data for:
Wechsler, D., and J. Bascompte. Mechanistic interactions as the origin of modularity in biological networks.
unpublished.
modularity_in_biological_networks.zip
├── core/ Model code and measures
│ ├── basic_measures.py Functions to compute e.g. connectance
│ ├── finite_state_machine.py Finite-state machine class
│ ├── fsm_minimization.py Implementation of the procedure to minimize a set of FSMs (used to compute trait complexity)
│ ├── modularity.py Functions to compute modularity (invokes R package BiRewire)
│ ├── modularity_maximization.py Algorithm to find the maximal modularity of a network with a given connectance or degree sequence (see sections S3 and S4 in supplementary material).
│ ├── network_evolution.py Evolves networks of a given connectance and modularity (Experiment 2; see also S7 in supplementary material)
│ └── utilities.py
│
├── data/ The data obtained from Experiments 1 and 2 (analysed in the Jupyter notebook 'results_csv.ipynb'). See the description of the columns below.
│ ├── random_assemblages_modularity.csv Data from experiment 1
│ ├── random_assemblages_complexity_NN.csv Data from experiment 1
│ ├── random_assemblages_complexity_n.csv Data from experiment 1
│ ├── opt_modularity_NN.csv Data from experiment 2
│ ├── opt_modularity_c.csv Data from experiment 2
│ ├── specificity_vs_connectance.csv Data for supplementary figure S2
│ └── genotype_phenotype/
│     ├── phenotypes.csv Data from sampling genotype space (Figure 6)
│     ├── discovery_times.csv Phenotype discovery time
│     ├── delta1/ Data for Figure S12
│     └── delta05/ Data for Figure S13
│
├── lib/ Some functions used for plotting in 'results_csv.ipynb'
├── plots/ The plots generated by the analysis script 'results_csv.ipynb'
├── experiment_1.py A script illustrating the setup used in Experiment 1
├── experiment_2.py A script illustrating the setup used in Experiment 2
├── sample_phenotypes.py A script to sample genotype space (Figure 6)
├── readme.txt This file.
└── results_csv.ipynb Jupyter notebook containing the analysis scripts (uses csv files in data)
Description of columns in data (csv files)
Each row corresponds to a single simulation run.
Some columns corresponding to variables not considered in the analysis contain no values.
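
Since some columns carry no values, it can help to drop them before analysis. The following is a minimal sketch using only the Python standard library. It assumes the files are comma-delimited with a header row (the README does not state the delimiter), and `load_runs` is a hypothetical helper, not part of the repository code.

```python
import csv
import io


def load_runs(handle):
    """Read simulation runs (one per row) and drop columns that are empty in every row."""
    reader = csv.DictReader(handle)  # assumption: comma-delimited, first row is the header
    rows = list(reader)
    columns = reader.fieldnames or []
    # Keep a column only if at least one run has a non-blank value in it.
    used = [col for col in columns
            if any((row.get(col) or "").strip() for row in rows)]
    return [{col: row[col] for col in used} for row in rows]


# In-memory stand-in for one of the files in data/ (the values are made up).
sample = io.StringIO("NN,n,th,MSG\n10,4,0.5,\n20,4,0.5,\n")
runs = load_runs(sample)
print(runs)
```

To use it on a real file, pass an open file handle instead, for example `open("data/random_assemblages_modularity.csv", newline="")`. Values are returned as strings and need to be converted to numbers as required.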
Experiment 1
 random_assemblages_complexity_n.csv
 random_assemblages_complexity_NN.csv
 random_assemblages_modularity.csv
Parameter values (inputs):
| Column | Description |
|---|---|
| NN | Number of species N. |
| n | Number of FSM states n. |
| th | Interaction specificity δ. |
| FSM_COMPLEXITY_MAX_NO_IMPROVE | Defines after how many consecutive iterations without a reduction of average complexity the algorithm to compute trait complexity stops. |
| NUM_RANDOMIZATIONS | Number of randomizations to calculate Q_rand. |
| MAX_MODULARITY_MAX_NO_IMPROVEMENT | Defines after how many consecutive iterations without an increase in modularity the greedy algorithm to approximate Q_max stops (see supplementary document S4). |

Measurements (outputs):
| Column | Description |
|---|---|
| c | Connectance of the interaction network of the sampled assemblage. |
| mean_rel_complexity | Average complexity of the FSMs. |
| rel_complexity_trace | The sequence of complexity values traversed during minimization. |
| Q | Modularity of the interaction network of the sampled assemblage. |
| Q_rand | Modularity of a random network with the same degree sequence as the original network. |
| Q_norm | Normalized modularity of the interaction network of the sampled assemblage. |
| Q_max | Maximum value of modularity. |
| num_modules | Number of modules detected in the network. |
| Q_p_value | P-value of modularity. |
| Q_z_score | Z-score of modularity. |
| Q_max_trace | The sequence of modularity values traversed towards Q_max. |
| I | Average pairwise mutual information of the interaction network. |
| duration_Q | Time (in seconds) it took to compute normalized modularity. |
| duration_complexity | Time (in seconds) it took to compute average complexity. |
| modularity_significant | Whether the network is significantly modular. |
| complexity_normalized | rel_complexity_trace divided by n. |

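
The table above lists Q, Q_rand, Q_max, Q_norm and Q_z_score without giving formulas. The sketch below shows one common way such quantities relate to each other. The normalization `(Q - Q_rand) / (Q_max - Q_rand)` and the use of the population standard deviation are assumptions made here for illustration; the definitions actually used are in the paper and in `core/modularity.py`. Both function names are hypothetical.

```python
from statistics import mean, pstdev


def normalized_modularity(q, q_rand, q_max):
    """Rescale Q so that 0 matches the random expectation and 1 matches Q_max.

    Assumed formula, not confirmed by the README.
    """
    if q_max == q_rand:
        raise ValueError("Q_max equals Q_rand; normalization is undefined")
    return (q - q_rand) / (q_max - q_rand)


def modularity_z_score(q, q_rand_samples):
    """Z-score of Q against the modularity values of randomized networks.

    Assumption: population standard deviation (pstdev); a sample
    standard deviation would be an equally plausible choice.
    """
    spread = pstdev(q_rand_samples)
    if spread == 0:
        raise ValueError("randomized modularity values do not vary")
    return (q - mean(q_rand_samples)) / spread


# Made-up values, purely for illustration.
print(normalized_modularity(0.5, 0.25, 0.75))
print(modularity_z_score(1.0, [0.25, 0.75]))
```

Comparing the output of such a function with the stored Q_norm or Q_z_score column is a quick way to check which definition the data follow.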
The following columns store meta-information about the simulation run:
SIM_ID, ID, REPLICATE, STATUS, DURATION, RANDOM_SEED, DISPATCHER_ID, MSG, FK_SIMULATION, ID
Experiment 2
 opt_modularity_c.csv
 opt_modularity_NN.csv
Parameter values (inputs):
| Column | Description |
|---|---|
| NN | Number of species N. |
| n | Number of FSM states n. |
| th | Interaction specificity δ. |
| c_target | Target value of connectance (see supplementary document S7). |
| Q_target | Target value of normalized modularity (see supplementary document S7). |
| MAX_UPDATES | The number of iterations at which the algorithm to evolve assemblages with a network of desired connectance and modularity is stopped. |
| FSM_COMPLEXITY_MAX_NO_IMPROVE | Defines after how many consecutive iterations without a reduction of average complexity the algorithm to compute trait complexity stops. |
| NUM_RANDOMIZATIONS | Number of randomizations to calculate Q_rand. |
| MAX_MODULARITY_MAX_NO_IMPROVEMENT | Defines after how many consecutive iterations without an increase in modularity the greedy algorithm to approximate Q_max stops (see supplementary document S4). |
| C_EPSILON | Specifies allowed deviation from c_target (precision). |
| Q_NORM_EPSILON | Specifies allowed deviation from Q_target (precision). |
| MUT_EPSILON | Specifies allowed deviation from average pairwise mutual information of a random network M_rand (precision). |
| NUM_Q_MAX_SAMPLES | Defines how often the greedy algorithm to approximate Q_max is started (the highest value reached among the runs is taken as Q_max). |

Measurements (outputs):
| Column | Description |
|---|---|
| c_start | Connectance of the network of the initial assemblage. |
| c_end | Connectance of the network of the final assemblage (should be close to c_target). |
| avg_rel_complexity_end | Average complexity of the final assemblage. |
| Q_start | Modularity of the network of the initial assemblage. |
| Q_end | Modularity of the network of the final assemblage. |
| Q_rand | The expected value of modularity of a random network (with the same number of nodes and connectance). |
| Q_norm_start | Normalized modularity of the network of the initial assemblage. |
| Q_norm_end | Normalized modularity of the network of the final assemblage (should be close to Q_target). |
| Q_max | The maximum possible value of modularity. |
| Q_p_value_start | P-value of modularity of the network of the initial assemblage. |
| Q_z_score_start | Z-score of modularity of the network of the initial assemblage. |
| Q_p_value_end | P-value of modularity of the network of the final assemblage. |
| Q_z_score_end | Z-score of modularity of the network of the final assemblage. |
| E_I | Expected average pairwise mutual information of a random network. |
| I_end | Value of average pairwise mutual information after evolving the assemblage towards E_I (i.e., I_end should be close to E_I). |
| Q_rand_samples | The modularity values of the randomizations. |
| G_start | Adjacency matrix of the initial network. |
| G_end | Adjacency matrix of the final network. |
| num_groups_start | Number of modules in the network of the initial assemblage. |
| num_groups_end | Number of modules in the network of the final assemblage. |
| num_iterations_modularity | The number of iterations it took to evolve the assemblage to have a network with modularity equal to Q_target. |
| num_iterations_randomization | The number of iterations it took to evolve the assemblage to have a network with E_I (average pairwise mutual information of a random network). |
| duration_randomization | Time (in seconds) it took to evolve the assemblage to have a network with E_I (average pairwise mutual information of a random network). |
| duration_modulairty | Time (in seconds) it took to evolve the assemblage to have a network with modularity equal to Q_target. |
| duration_complexity | Time (in seconds) it took to compute average complexity. |
| duration_Q_max | Time (in seconds) it took to compute maximum modularity. |
| duration_Q_rand | Time (in seconds) it took to compute the modularity of a random network. |

Columns storing meta-information about the simulation run:
SIM_ID, ID, REPLICATE, STATUS, DURATION, RANDOM_SEED, DISPATCHER_ID, MSG, FK_SIMULATION, ID
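
The descriptions of c_end and Q_norm_end say these values "should be close to" their targets, with C_EPSILON and Q_NORM_EPSILON as the allowed deviations. A minimal sketch of such a check follows. It assumes that each epsilon is an absolute, inclusive tolerance and that the values have already been converted to floats; `within_tolerance` and `run_reached_targets` are hypothetical helpers, not functions from the repository.

```python
def within_tolerance(value, target, epsilon):
    """True if value deviates from target by at most epsilon (assumed absolute, inclusive)."""
    return abs(value - target) <= epsilon


def run_reached_targets(run):
    """Check one Experiment 2 run, given as a dict of column name -> float."""
    return (
        within_tolerance(run["c_end"], run["c_target"], run["C_EPSILON"])
        and within_tolerance(run["Q_norm_end"], run["Q_target"], run["Q_NORM_EPSILON"])
    )


# Made-up run, purely for illustration.
example_run = {
    "c_target": 0.2, "c_end": 0.21, "C_EPSILON": 0.02,
    "Q_target": 0.5, "Q_norm_end": 0.48, "Q_NORM_EPSILON": 0.05,
}
print(run_reached_targets(example_run))
```

A filter of this kind can be used to exclude runs that stopped at MAX_UPDATES before reaching the requested connectance and modularity.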
Sampling Phenotypes
 phenotypes.csv
Parameter values (inputs):
| Column | Description |
|---|---|
| phenotype | Binary representation of the phenotype. |
| count | How many genotypes were sampled with that phenotype. |

 discovery_times.csv
Parameter values (inputs):
| Column | Description |
|---|---|
| sample_number | The sample number (i.e., time). |
| num_phenotypes | The number of phenotypes discovered until the given sample (i.e., time). |
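
The two files in this section can be read as two summaries of the same stream of sampled phenotypes: a count per phenotype and a cumulative discovery curve. The sketch below illustrates that relationship on a made-up sequence. It assumes that sample numbers start at 1, which the README does not state, and `summarize_samples` is a hypothetical helper rather than the code in `sample_phenotypes.py`.

```python
from collections import Counter


def summarize_samples(phenotypes):
    """Return (counts per phenotype, list of (sample_number, num_phenotypes))."""
    counts = Counter()
    discovery = []
    # Assumption: sample numbering starts at 1.
    for sample_number, phenotype in enumerate(phenotypes, start=1):
        counts[phenotype] += 1
        # Number of distinct phenotypes seen up to and including this sample.
        discovery.append((sample_number, len(counts)))
    return counts, discovery


# Made-up binary phenotype strings, purely for illustration.
sampled = ["0110", "0110", "1001", "0110", "1111"]
counts, discovery = summarize_samples(sampled)
print(dict(counts))
print(discovery)
```

Here `counts` corresponds to the phenotype and count columns, and `discovery` to the sample_number and num_phenotypes columns.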