Mechanistic interactions as the origin of modularity in biological networks

Published Mar 14, 2024 on Dryad. https://doi.org/10.5061/dryad.prr4xgxq3

Data files

Mar 14, 2024 version files, 21.30 MB in total:

modularity_in_biological_networks.zip (21.29 MB)
README.md (16.45 KB)

Abstract

Biological networks are often modular. Explanations for this peculiarity either assume an adaptive advantage of a modular design such as higher robustness, or attribute it to neutral factors such as constraints underlying network assembly. Interestingly, most insights on the origin of modularity stem from models in which interactions are either determined by highly simplistic mechanisms or have no mechanistic basis at all. Yet, empirical knowledge suggests that biological interactions are often mediated by complex structural or behavioural traits. Here, we investigate the origins of modularity using a model in which interactions are determined by potentially complex traits. Specifically, we model system elements - such as the species in an ecosystem - as finite-state machines (FSMs) and determine their interactions by means of communication between the corresponding FSMs. Using this model, we show that modularity likely emerges for free. We further find that the more modular an interaction network is, the less complex are the traits that mediate the interactions. Altogether, our results suggest that the conditions for modularity to evolve may be much broader than previously thought.

The repository contains the Python code (model and analysis) and data for:

Wechsler, D., and J. Bascompte. Mechanistic interactions as the origin of modularity in biological networks. Unpublished.

modularity_in_biological_networks.zip
├── core/                           Model code and measures
│   ├── basic_measures.py           Functions to compute e.g. connectance
│   ├── finite_state_machine.py     Finite-state machine class
│   ├── fsm_minimization.py         Implementation of the procedure to minimize a set of FSMs (used to compute trait complexity)
│   ├── modularity.py               Functions to compute modularity (invokes R package BiRewire)
│   ├── modularity_maximization.py  Algorithm to find the maximal modularity of a network with a given connectance or degree sequence (see sections S3 and S4 in supplementary material).
│   ├── network_evolution.py        Evolves networks of a given connectance and modularity (Experiment 2 - see also S7 in supplementary material)
│   └── utilities.py
│
├── data/                           The data obtained from Experiments 1 and 2 (analysed in the Jupyter notebook 'results_csv.ipynb'). See the description of the columns below.
│   ├── random_assemblages_modularity.csv       Data from Experiment 1
│   ├── random_assemblages_complexity_NN.csv    Data from Experiment 1
│   ├── random_assemblages_complexity_n.csv     Data from Experiment 1
│   ├── opt_modularity_NN.csv                   Data from Experiment 2
│   ├── opt_modularity_c.csv                    Data from Experiment 2
│   ├── specificity_vs_connectance.csv          Data for supplementary Figure S2
│   └── genotype_phenotype/
│       ├── phenotypes.csv                      Data from sampling genotype space (Figure 6)
│       ├── discovery_times.csv                 Phenotype discovery time
│       ├── delta1/                             Data for Figure S12
│       └── delta05/                            Data for Figure S13
│
├── lib/                            Some functions used for plotting in 'results_csv.ipynb'
├── plots/                          The plots generated by the analysis script 'results_csv.ipynb'
├── experiment_1.py                 A script illustrating the setup used in Experiment 1
├── experiment_2.py                 A script illustrating the setup used in Experiment 2
├── sample_phenotypes.py            A script to sample genotype space (Figure 6)
├── readme.txt                      This file.
└── results_csv.ipynb               Jupyter notebook containing the analysis scripts (uses the CSV files in data/)
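The tree lists fsm_minimization.py as the procedure that minimizes FSMs to compute trait complexity. As a point of reference only, the sketch below shows textbook Moore partition refinement for a single machine: the number of states that remain after merging behaviourally equivalent states. It assumes a complete, deterministic Moore-type machine stored as hypothetical dictionaries `transitions[state][symbol]` and `outputs[state]`. The repository minimizes a set of machines and may define complexity differently, so this is not its implementation.

```python
from collections import deque


def reachable_states(transitions, start):
    """Return the set of states reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions[state].values():
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def minimal_state_count(transitions, outputs, start):
    """Number of states of the minimal equivalent Moore machine.

    Assumes (hypothetical format) a complete, deterministic machine:
    transitions[state][symbol] exists for every reachable state and symbol.
    """
    states = reachable_states(transitions, start)
    symbols = sorted({sym for st in states for sym in transitions[st]})
    # Initial partition: states are grouped by their output.
    block = {st: outputs[st] for st in states}
    num_blocks = len(set(block.values()))
    while True:
        # A state's signature: its own block plus the blocks it moves to.
        signature = {
            st: (block[st],) + tuple(block[transitions[st][sym]] for sym in symbols)
            for st in states
        }
        ids = {}
        block = {st: ids.setdefault(signature[st], len(ids)) for st in states}
        if len(ids) == num_blocks:
            # No block was split, so the partition is stable.
            return num_blocks
        num_blocks = len(ids)


# Usage: states 0 and 1 behave identically, so 3 states reduce to 2.
transitions = {0: {"a": 1, "b": 2}, 1: {"a": 1, "b": 2}, 2: {"a": 2, "b": 2}}
outputs = {0: 0, 1: 0, 2: 1}
print(minimal_state_count(transitions, outputs, start=0))
```

Unreachable states are discarded before refinement, because they cannot influence the machine's observable behaviour.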

Description of columns in data (csv files)

Each row corresponds to a single simulation run.

Some columns corresponding to variables not considered in the analysis contain no values.

Experiment 1

random_assemblages_complexity_n.csv
random_assemblages_complexity_NN.csv
random_assemblages_modularity.csv

Parameter values (inputs):

Column	Description
NN	Number of species `N`.
n	Number of FSM states `n`.
th	Interaction specificity `δ`.
FSM_COMPLEXITY_MAX_NO_IMPROVE	Defines after how many consecutive iterations without a reduction of average complexity the algorithm to compute trait complexity stops.
NUM_RANDOMIZATIONS	Number of randomizations to calculate `Q_rand`.
MAX_MODULARITY_MAX_NO_IMPROVEMENT	Defines after how many consecutive iterations without an increase in modularity the greedy algorithm to approximate `Q_max` stops (see supplementary document S4).
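Both FSM_COMPLEXITY_MAX_NO_IMPROVE and MAX_MODULARITY_MAX_NO_IMPROVEMENT describe a patience-based stopping rule: the search halts once a fixed number of consecutive iterations brings no improvement. The generic sketch below illustrates that rule only; `greedy_with_patience`, `propose` and `score` are hypothetical names, and the actual search moves are defined in the repository and in supplementary document S4. The returned trace plays a role similar to the `*_trace` output columns.

```python
import random


def greedy_with_patience(initial, propose, score, max_no_improve, maximize=True):
    """Greedy search that stops after `max_no_improve` consecutive failures.

    Returns the best solution found and the trace of best scores, starting
    with the score of `initial` and growing by one entry per improvement.
    """
    best = initial
    best_score = score(best)
    trace = [best_score]
    stalled = 0
    while stalled < max_no_improve:
        candidate = propose(best)
        cand_score = score(candidate)
        improved = cand_score > best_score if maximize else cand_score < best_score
        if improved:
            best, best_score = candidate, cand_score
            trace.append(best_score)
            stalled = 0  # any improvement resets the patience counter
        else:
            stalled += 1
    return best, trace


# Usage: minimise x**2 over the integers with random +/-1 steps.
rng = random.Random(0)
best, trace = greedy_with_patience(
    initial=10,
    propose=lambda x: x + rng.choice((-1, 1)),
    score=lambda x: x * x,
    max_no_improve=50,
    maximize=False,
)
print(best, trace)
```

A larger patience value makes an early stop on a plateau less likely, at the cost of more iterations per run.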

Measurements (outputs):

Column	Description
c	Connectance of the interaction network of the sampled assemblage.
mean_rel_complexity	Average complexity of the FSMs.
rel_complexity_trace	The sequence of average complexity values traversed during minimization.
Q	Modularity of the interaction network of the sampled assemblage.
Q_rand	Modularity of a random network with the same degree sequence as the original network.
Q_norm	Normalized modularity of the interaction network of the sampled assemblage.
Q_max	Maximum value of modularity.
num_modules	Number of modules detected in the network.
Q_p_value	P-value of modularity.
Q_z_score	Z-score of modularity.
Q_max_trace	The sequence of modularity values traversed towards `Q_max`.
I	Average pairwise mutual information of the interaction network.
duration_Q	Time (in seconds) it took to compute normalized modularity.
duration_complexity	Time (in seconds) it took to compute average complexity.
modularity_significant	Whether the network is significantly modular.
complexity_normalized	`rel_complexity_trace` divided by `n`.
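The table lists `Q`, `Q_rand`, `Q_max`, `Q_norm`, `Q_z_score` and `Q_p_value` without formulas. The sketch below shows one common way to relate them, and every formula in it is an assumption. It takes `Q_norm = (Q - Q_rand) / (Q_max - Q_rand)`, the z-score against the randomized modularity values, and a one-sided empirical p-value with a +1 correction. The definitions actually used are in modularity.py and the supplementary material and may differ. `modularity_summary` is a hypothetical helper.

```python
import math
import statistics


def modularity_summary(q, q_rand_samples, q_max):
    """Assumed post-processing of one run (not necessarily the repository's).

    q              -- modularity of the observed network
    q_rand_samples -- modularity of each randomized network (needs >= 2)
    q_max          -- (approximate) maximum modularity
    """
    q_rand = statistics.fmean(q_rand_samples)
    sd = statistics.stdev(q_rand_samples)
    # Assumed normalization: 0 at the random expectation, 1 at the maximum.
    q_norm = (q - q_rand) / (q_max - q_rand)
    z = (q - q_rand) / sd if sd > 0 else math.nan
    # Assumed one-sided empirical p-value with the usual +1 correction.
    exceed = sum(1 for s in q_rand_samples if s >= q)
    p = (exceed + 1) / (len(q_rand_samples) + 1)
    return {"Q_rand": q_rand, "Q_norm": q_norm, "Q_z_score": z, "Q_p_value": p}


summary = modularity_summary(q=0.5, q_rand_samples=[0.2, 0.3, 0.4], q_max=0.8)
print(summary)
```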

The following columns store meta-information about the simulation run:

SIM_ID, ID, REPLICATE, STATUS, DURATION, RANDOM_SEED, DISPATCHER_ID, MSG, FK_SIMULATION, ID
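The meta-column list names `ID` twice, and some columns are empty throughout. If the CSV header really repeats a name, a plain `csv.DictReader` would silently keep only the last value. The stdlib-only sketch below renames repeated headers (`ID`, `ID_2`, ...) and drops all-empty columns. It assumes comma-separated files with a single header row; `load_runs` is a hypothetical helper, and all values are returned as strings.

```python
import csv
import io


def load_runs(f):
    """Read a results CSV (one row per simulation run) into a list of dicts."""
    reader = csv.reader(f)
    header = next(reader)
    counts = {}
    names = []
    for name in header:
        counts[name] = counts.get(name, 0) + 1
        # Keep the first occurrence as-is; later duplicates become ID_2, ID_3, ...
        names.append(name if counts[name] == 1 else f"{name}_{counts[name]}")
    rows = [dict(zip(names, values)) for values in reader]
    # Drop columns that carry no value in any row.
    keep = [n for n in names if any(row.get(n, "") != "" for row in rows)]
    return [{n: row.get(n, "") for n in keep} for row in rows]


# Usage with an in-memory sample; for real data, open e.g.
# data/random_assemblages_modularity.csv with newline="".
sample = io.StringIO("ID,NN,Q,ID,MSG\n1,10,0.3,7,\n2,20,0.4,8,\n")
runs = load_runs(sample)
print(runs)
```

Numeric columns such as `NN` or `Q` still need converting with `int()` or `float()` before analysis.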

Experiment 2

opt_modularity_c.csv
opt_modularity_NN.csv

Parameter values (inputs):

Column	Description
NN	Number of species `N`.
n	Number of FSM states `n`.
th	Interaction specificity `δ`.
c_target	Target value of connectance (see supplementary document S7).
Q_target	Target value of normalized modularity (see supplementary document S7).
MAX_UPDATES	The number of iterations at which the algorithm to evolve assemblages with a network of desired connectance and modularity is stopped.
FSM_COMPLEXITY_MAX_NO_IMPROVE	Defines after how many consecutive iterations without a reduction of average complexity the algorithm to compute trait complexity stops.
NUM_RANDOMIZATIONS	Number of randomizations to calculate `Q_rand`.
MAX_MODULARITY_MAX_NO_IMPROVEMENT	Defines after how many consecutive iterations without an increase in modularity the greedy algorithm to approximate `Q_max` stops (see supplementary document S4).
C_EPSILON	Specifies allowed deviation from `c_target` (precision).
Q_NORM_EPSILON	Specifies allowed deviation from `Q_target` (precision).
MUT_EPSILON	Specifies allowed deviation from average pairwise mutual information of a random network `M_rand` (precision).
NUM_Q_MAX_SAMPLES	Defines how often the greedy algorithm to approximate `Q_max` is started (the highest value reached among the runs is taken as Q_max).
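C_EPSILON and Q_NORM_EPSILON define how close a run must get to its targets, and NUM_Q_MAX_SAMPLES sets how many greedy restarts contribute to `Q_max`. The sketch below is a hypothetical reading of those two rules: an absolute-deviation check and a best-of-restarts maximum. The actual evolution procedure is described in supplementary document S7. `within_tolerance` and `q_max_estimate` are illustrative names.

```python
def within_tolerance(c, q_norm, c_target, q_target, c_epsilon, q_norm_epsilon):
    """True if both connectance and normalized modularity are close enough.

    Assumes the allowed deviation is an absolute difference.
    """
    return abs(c - c_target) <= c_epsilon and abs(q_norm - q_target) <= q_norm_epsilon


def q_max_estimate(run_greedy, num_q_max_samples):
    """Highest modularity reached over several independent greedy runs."""
    return max(run_greedy() for _ in range(num_q_max_samples))


# Usage: three hypothetical greedy runs reaching different local optima.
results = iter([0.31, 0.52, 0.47])
q_max = q_max_estimate(lambda: next(results), num_q_max_samples=3)
print(q_max, within_tolerance(0.205, 0.49, 0.2, 0.5, 0.02, 0.05))
```

Restarting several times and keeping the best value reduces the risk that a single greedy run gets stuck on a low local optimum.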

Measurements (outputs):

Column	Description
c_start	Connectance of the network of the initial assemblage.
c_end	Connectance of the network of the final assemblage (should be close to `c_target`).
avg_rel_complexity_end	Average complexity of the final assemblage.
Q_start	Modularity of the network of the initial assemblage.
Q_end	Modularity of the network of the final assemblage.
Q_rand	The expected value of modularity of a random network (with the same number of nodes and connectance).
Q_norm_start	Normalized modularity of the network of the initial assemblage.
Q_norm_end	Normalized modularity of the network of the final assemblage (should be close to `Q_target`).
Q_max	The maximum possible value of modularity.
Q_p_value_start	P-value of modularity of the network of the initial assemblage.
Q_z_score_start	Z-score of modularity of the network of the initial assemblage.
Q_p_value_end	P-value of modularity of the network of the final assemblage.
Q_z_score_end	Z-score of modularity of the network of the final assemblage.
E_I	Expected average pairwise mutual information of a random network.
I_end	Value of average pairwise mutual information after evolving the assemblage towards `E_I` (i.e., `I_end` should be close to `E_I`).
Q_rand_samples	The modularity values of the randomizations.
G_start	Adjacency matrix of the initial network.
G_end	Adjacency matrix of the final network.
num_groups_start	Number of modules in the network of the initial assemblage.
num_groups_end	Number of modules in the network of the final assemblage.
num_iterations_modularity	The number of iterations it took to evolve the assemblage to have a network with modularity equal to `Q_target`.
num_iterations_randomization	The number of iterations it took to evolve the assemblage to have a network with `E_I` (average pairwise mutual information of a random network).
duration_randomization	Time (in seconds) it took to evolve the assemblage to have a network with `E_I` (average pairwise mutual information of a random network).
duration_modulairty	Time (in seconds) it took to evolve the assemblage to have a network with modularity equal to `Q_target`.
duration_complexity	Time (in seconds) it took to compute average complexity.
duration_Q_max	Time (in seconds) it took to compute maximum modularity.
duration_Q_rand	Time (in seconds) it took to compute the modularity of a random network.
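`G_start` and `G_end` store adjacency matrices, so `c_start` and `c_end` can be recomputed from them as a consistency check. The README does not say how a matrix is serialized in a CSV cell. The sketch below assumes a nested-list literal such as `[[0, 1], [1, 0]]`. It also assumes an undirected network without self-loops, so connectance is the number of links divided by N(N-1)/2. basic_measures.py may use a different denominator. `parse_adjacency` and `connectance` are hypothetical helpers.

```python
import ast


def parse_adjacency(cell):
    """Parse an adjacency matrix stored as a nested-list literal (assumed format)."""
    matrix = ast.literal_eval(cell)
    n = len(matrix)
    if n < 2 or any(len(row) != n for row in matrix):
        raise ValueError("expected a square matrix with at least two nodes")
    return matrix


def connectance(matrix):
    """Fraction of possible links realized (undirected, no self-loops assumed)."""
    n = len(matrix)
    # Count each undirected link once, using the upper triangle only.
    links = sum(1 for i in range(n) for j in range(i + 1, n) if matrix[i][j])
    return links / (n * (n - 1) / 2)


G = parse_adjacency("[[0, 1, 1], [1, 0, 0], [1, 0, 0]]")
print(connectance(G))  # 2 of 3 possible links
```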

Columns storing meta-information about the simulation run:

SIM_ID, ID, REPLICATE, STATUS, DURATION, RANDOM_SEED, DISPATCHER_ID, MSG, FK_SIMULATION, ID

Sampling Phenotypes

phenotypes.csv

Columns:

Column	Description
phenotype	Binary representation of the phenotype.
count	How many genotypes were sampled with that phenotype.
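phenotypes.csv pairs each phenotype with the number of sampled genotypes that produce it, so the genotype-to-phenotype map can be summarized by ranking phenotypes by frequency. The sketch below assumes a comma-separated file with the header `phenotype,count`. It keeps `phenotype` as a string so that leading zeros in the binary representation survive. `phenotype_frequencies` is a hypothetical helper.

```python
import csv
import io


def phenotype_frequencies(f):
    """Return (phenotype, count, share) tuples sorted by decreasing count."""
    rows = [(r["phenotype"], int(r["count"])) for r in csv.DictReader(f)]
    total = sum(count for _, count in rows)
    rows.sort(key=lambda item: item[1], reverse=True)
    return [(p, c, c / total) for p, c in rows]


# Usage with an in-memory sample; for real data, open
# data/genotype_phenotype/phenotypes.csv with newline="".
freqs = phenotype_frequencies(io.StringIO("phenotype,count\n0110,5\n1111,15\n"))
print(freqs)
```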

discovery_times.csv

Columns:

Column	Description
sample_number	The sample number (i.e. time).
num_phenotypes	The number of phenotypes discovered until the given sample (i.e., time).
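discovery_times.csv records how many distinct phenotypes have been seen after each sample. The sketch below shows how such a curve follows from a sequence of sampled phenotypes. It assumes samples are numbered from 1 and that every sample is recorded; the file may instead start at 0 or store a thinned subset. `discovery_curve` is a hypothetical helper, not taken from sample_phenotypes.py.

```python
def discovery_curve(samples):
    """Cumulative number of distinct phenotypes after each sample.

    Returns (sample_number, num_phenotypes) pairs, numbering samples from 1.
    """
    seen = set()
    curve = []
    for sample_number, phenotype in enumerate(samples, start=1):
        seen.add(phenotype)
        curve.append((sample_number, len(seen)))
    return curve


curve = discovery_curve(["01", "10", "01", "11"])
print(curve)
```

The curve can only rise or stay flat, and a long flat tail suggests that most reachable phenotypes have already been found.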
