Supporting information for: Accounting for the topology of road networks to better explain human-mediated dispersal in terrestrial landscapes
Human trade and movements are central to biological invasions worldwide. Human activities not only transport species across biogeographical barriers but also accelerate their post-introduction spread in the landscape. Thus, by constraining human movements, the spatial structure of road networks might greatly affect the regional spread of invasive species. However, few invasion models have accounted for the topology of road networks so far, and its importance for explaining the regional distribution of invasive species remains mostly unexplored. To address this issue, we developed a spatially explicit and mechanistic human-mediated dispersal model that accounts and tests for the influence of transport networks on the regional spread of invasive species. Using as a model the spread of the invasive ant Lasius neglectus in the middle Rhône valley (France), we show that accounting for the topology of road networks improves our ability to explain the current distribution of the invasive ant. In contrast, we found that using human population density as a proxy for the frequency of transport events decreases models’ performance and might thus not be as appropriate as previously thought. Finally, by differentiating road networks into sub-networks, we show that national and regional roads are more important than smaller roads for explaining spread patterns. Overall, our results demonstrate that the topology of transport networks can strongly bias regional invasion patterns and highlight the importance of better incorporating it into future invasion models. The mechanistic modelling approach developed in this study should help invasion scientists explore how human-mediated dispersal and topography shape invasion dynamics in landscapes. Ultimately, our approach could be combined with demographic, natural dispersal and environmental suitability models to refine spread scenarios and improve invasive species monitoring and management at regional to national scales.
README: Supporting Information for the publication "Accounting for the topology of road networks to better explain human-mediated dispersal in terrestrial landscapes"
This Dryad repository contains all the supporting information, data and the simulation framework related to the publication "Accounting for the topology of road networks to better explain human-mediated dispersal in terrestrial landscapes" (Rocabert et al. 2023).
The reader will find here a complete description of the repository, as well as guidelines to re-run the pipeline used to produce the results presented in the main manuscript.
Table of contents
- 1) Repository content summary
- 2) MoRIS (Model of Routes of Invasive Spread) software
- 3) Description of the dataset
- 3.1) Supplementary figures and tables
- 3.2) Supporting information documents
- 3.3) Input files
- 4) Complete pipeline (Supporting Information 5)
- 4.1) Pipeline organization
- 4.2) Content description
- 5) Running the pipeline
- 5.1) Introduction
- 5.2) Supported platforms and dependencies
- 5.3) Dependencies
- 5.4) Compile the simulation executable
- 5.5) Run the validation of the CMA-ES outputs
- 5.6) Find and run the best parameters set of each scenario
- 5.7) Compute performance metrics distributions
- 5.8) Generate the figures of the manuscript
- 5.9) Convert figures
1) Repository content summary
- Figure S1. General overview of the human-mediated dispersal algorithm.
- Figure S2. Log-likelihood distribution of each calibrated model.
- Table S1. Performance metrics of each calibrated model.
- Animation S1. Animation of the spatial spread through time for each selected model.
- Supporting Information 1. Input files for the simulation framework.
- Supporting Information 2. Runtime information.
- Supporting Information 3. Spatial distribution of experimental and simulated presences/absences of each calibrated model.
- Supporting Information 4. Description of the invasion dynamics of the road network model.
- Supporting Information 5. Complete pipeline (including the simulation framework) used to produce post-analyses, figures and gif animation.
2) MoRIS (Model of Routes of Invasive Spread) software
The software developed and used for this publication (MoRIS) is freely available on GitHub. Please consult the MoRIS GitHub page for details about the software, a first-use tutorial, and guidelines for building input files. Please contact the authors if you plan to use MoRIS for scientific purposes.
3) Description of the dataset
The content of the repository is described below. Please consult the main manuscript for more context (Rocabert et al. 2023).
3.1) Supplementary figures and tables
Supplementary figures (Figure S1; Figure S2) and the supplementary Table S1 are available (also in tabular format).
A gif animation showing one simulated example of human-mediated dispersal for each tested scenario is available in Animation S1.
3.2) Supporting information documents
Three supporting information documents are available:
- Supporting Information 2: Runtime information;
- Supporting Information 3: Spatial distribution of experimental and simulated presences/absences of each calibrated model;
- Supporting Information 4: Description of the invasion dynamics of the road network model.
3.3) Input files (Supporting Information 1)
Three input files are necessary to run human-mediated dispersal (HMD) simulations. As described in the main manuscript, these files are:
- The map file, containing the area of interest discretized into 2×2 km square cells (here, the Rhône valley around the Lyon urban area, France);
- The network file, containing a discretized version of the road network connecting cells on the map;
- The sample file, containing the sampling effort for the invasive species of interest, cell by cell.
A guideline to build these files is available on the MoRIS GitHub page.
Files are structured as follows:
• map.txt
This file describes the properties of each cell in the discretized map.
- Column 1: Cell identifier;
- Column 2: X coordinate (in meters, cell centroid);
- Column 3: Y coordinate (in meters, cell centroid);
- Column 4: Cell's area (square meters);
- Column 5: Cell's suitable area (square meters, not used here);
- Column 6: Population size (not used here);
- Column 7: Population density;
- Column 8: Road density (not used here);
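The column layout above can be read with a few lines of Python. The sketch below is a minimal, hypothetical loader (`MapCell`, `parse_map` and `euclidean_distance` are illustrative names, not part of MoRIS); it assumes whitespace-separated columns in the documented order and skips lines that cannot be parsed, such as a possible header. The distance between centroids is the same kind of quantity reported as the Euclidean distance in lineage_tree.txt.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class MapCell:
    """One row of map.txt (only the columns used in this study are kept)."""
    cell_id: int
    x: float  # centroid X coordinate (m)
    y: float  # centroid Y coordinate (m)
    area: float  # cell area (m²)
    population_density: float


def parse_map(lines: Iterable[str]) -> Dict[int, MapCell]:
    """Parse map.txt lines into a dict keyed by cell identifier.

    Assumption: whitespace-separated columns, 8 per line; unparsable
    lines (e.g. a header) are skipped.
    """
    cells: Dict[int, MapCell] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue
        try:
            cell = MapCell(
                cell_id=int(fields[0]),
                x=float(fields[1]),
                y=float(fields[2]),
                area=float(fields[3]),
                population_density=float(fields[6]),  # column 7
            )
        except ValueError:
            continue
        cells[cell.cell_id] = cell
    return cells


def euclidean_distance(a: MapCell, b: MapCell) -> float:
    """Distance in meters between two cell centroids."""
    return math.hypot(a.x - b.x, a.y - b.y)
```

For instance, `with open("map.txt") as f: cells = parse_map(f)` would load every cell; two horizontally adjacent 2×2 km cells are 2,000 m apart.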
• network.txt
This file is an adjacency list describing road connectivity between cells. Identifier -1 symbolizes the map border.
- Column 1: Cell 1 identifier;
- Column 2: Cell 2 identifier;
- Column 3: Number of category I roads connecting the two cells;
- Column 4: Number of category II roads connecting the two cells;
- Column 5: Number of category III roads connecting the two cells;
- Column 6: Number of category IV roads connecting the two cells;
- Column 7: Number of category V roads connecting the two cells (not used here);
- Column 8: Number of category VI roads connecting the two cells (not used here);
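Because network.txt is an adjacency list, the road graph and the "geodesic" distance reported in lineage_tree.txt (shortest path length on the connectivity graph, in number of cells) can be illustrated with a breadth-first search. This is a hypothetical sketch, not the MoRIS implementation: it assumes whitespace-separated columns, treats roads as undirected, ignores links to the map border (-1), and lets the caller keep only some road categories, in the spirit of the sub-network comparisons of the main manuscript.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Optional, Sequence, Set

BORDER_ID = -1  # identifier used for the map border in network.txt


def parse_network(
    lines: Iterable[str], categories: Sequence[int] = (1, 2, 3, 4)
) -> Dict[int, Set[int]]:
    """Build an undirected adjacency dict from network.txt lines.

    Two cells are linked if at least one road of the selected
    categories (1 = category I, ..., 6 = category VI) connects them.
    """
    adjacency: Dict[int, Set[int]] = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue
        try:
            a, b = int(fields[0]), int(fields[1])
            counts = [int(float(value)) for value in fields[2:8]]
        except ValueError:
            continue  # e.g. a header line
        if BORDER_ID in (a, b):
            continue
        if sum(counts[c - 1] for c in categories) > 0:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def geodesic_distance(
    adjacency: Dict[int, Set[int]], start: int, end: int
) -> Optional[int]:
    """Shortest path length (in number of cells) via breadth-first search.

    Returns None when the two cells are not connected.
    """
    if start == end:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == end:
                return dist + 1
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None
```

Calling `parse_network(lines, categories=(1, 2))` keeps only category I and II roads, so two cells linked only by smaller roads become disconnected in that sub-network.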
• sample.txt
This file contains the Lasius neglectus experimental sampling dataset.
- Column 1: Cell identifier;
- Column 2: Number of positive samples in the cell (presence of L. neglectus);
- Column 3: Total number of samples in the cell;
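A minimal, hypothetical way to turn sample.txt into observed presences/absences is sketched below. It assumes whitespace-separated columns and treats a cell as an observed presence when at least one sample is positive, leaving out cells without any sample; these conventions are illustrative assumptions, and the main manuscript describes how the sampling data are actually used.

```python
from typing import Dict, Iterable, Tuple


def parse_samples(lines: Iterable[str]) -> Dict[int, Tuple[int, int]]:
    """Map each cell identifier to (positive samples, total samples)."""
    samples: Dict[int, Tuple[int, int]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            cell, positives, total = (int(float(v)) for v in fields[:3])
        except ValueError:
            continue  # e.g. a header line
        samples[cell] = (positives, total)
    return samples


def observed_presence(samples: Dict[int, Tuple[int, int]]) -> Dict[int, bool]:
    """Presence (True) / absence (False) for every sampled cell.

    Assumed rule: a cell is a presence if it has at least one positive sample.
    """
    return {cell: pos > 0 for cell, (pos, total) in samples.items() if total > 0}
```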
4) Complete pipeline (Supporting Information 5)
The compressed archive SupportingInformation5.tar.gz contains the complete pipeline to reproduce the data analysis, the figures and the animation presented in this work. It also includes the simulation framework (see the GitHub repository for the latest version of the software).
The pipeline is organized around several folders that are pre-filled with simulation and post-processed data, to avoid time-consuming calculations for end-users and readers. Several scripts are also available to run the pipeline step by step.
4.1) Pipeline organization
• 1_simulation_results folder:
This folder contains the raw CMA-ES optimization results. For each of the four scenarios (isotropic, human activity, road network, and combined; see main manuscript), two files are provided, identified by their file suffix:
- Best optimization result: the best point ever found during the optimization run;
- Mean optimization result (not used here): the center of the best multivariate normal distribution found by CMA-ES during the optimization process (see Hansen & Auger, 2011).
All files have the same structure, each line being the result of one optimization run. One column holds the log-likelihood score; all other columns are optimized or pre-defined simulation parameters (see Section 4.2, the main manuscript and the GitHub repository for detailed explanations).
• 2_cmaes_validation folder:
This folder contains log-likelihood distributions re-calculated from the best parameter sets found by CMA-ES (see above). For each scenario, the mean and variance of the log-likelihood distribution are stored in dedicated columns of the file XXX_replayed.txt (the mean in column replay_mean), where XXX refers to the scenario.
• 3_best_models folder:
This folder contains the output of one single simulation run with the best parameter set found for each scenario. The organization of this output is described below.
• 4_models_evaluation folder:
This folder contains the computed performance metrics (AUC, TSS, etc.). For each of the four scenarios, performance metrics are computed 100 times to obtain a distribution (see main manuscript). For each scenario, metrics are stored in the file score_distribution.txt.
• 5_models_complete_evaluation folder:
This folder contains some performance metrics computed at every time step during a simulation. For each scenario, metrics are stored in the file complete_evaluation_all.txt.
• Input data folder:
This folder contains the three input files (see Supporting Information 1) and two data files (including a .shx file) describing the area of interest, used to generate graphics.
• figures and gif folders:
These folders will contain the generated figures and gif animations.
• src, cmake and build folders:
These folders contain all the material (source code, compilation scripts, binary folders) needed to run numerical simulations (see instructions below).
• scripts folder:
This folder contains lower-level scripts used to run the pipeline.
4.2) Content description
• 1_simulation_results files content:
- Relative path of the MoRIS executable file;
- Relative path of the map file;
- Relative path of the network file;
- Relative path of the sample file;
- Type of experimental data (in this study, always "PRESENCE_ABSENCE");
- Optimization function used to calculate the minimization score (here, always "LOG_LIKELIHOOD");
- Probability distribution law of the dispersal length (in number of cells; always "LOG_NORMAL");
- Seed of the pseudo-random number generator;
- Number of iterations (here, always 25 years);
- Number of repetitions (here, always 1,000);
- Minimal connection weight between cells;
- Probability of presence in the cell of introduction (always 1 in this study);
- Boolean indicating whether human activity metrics should be used in simulations;
- Identifier of the cell of introduction;
- X-coordinate of the point of introduction;
- Y-coordinate of the point of introduction;
- Best mu value found by CMA-ES;
- Best lambda value found by CMA-ES;
- Best sigma value found by CMA-ES;
- Parameter never used here (always 0);
- Best category I road weight found by CMA-ES;
- Best category II road weight found by CMA-ES;
- Best category III road weight found by CMA-ES;
- Best category IV road weight found by CMA-ES;
- Category V road weight (always 0 in this study);
- Category VI road weight (always 0 in this study);
- Best log-likelihood found by CMA-ES (stored under the generic name "score", because MoRIS supports other scores; see MoRIS software);
• 2_cmaes_validation files content:
File content is similar to the above, apart from two columns of interest:
- Mean of the distribution of re-calculated log-likelihoods;
- Variance of the distribution of re-calculated log-likelihoods;
• 3_best_models files content:
For each scenario, the output folder contains the typical output of a simulation:
The final_state.txt file contains the final state of the simulation (here, after 25 simulated years). Similar files, with an identical structure, are also generated for each time step (from 0 to 24 years). These files are structured as follows:
- Cell identifier;
- Cell X-coordinate (centroid);
- Cell Y-coordinate (centroid);
- Number of experimental presences;
- Total number of experimental observations;
- Proportion of positive observations;
- Total number of simulated introductions;
- Mean number of simulated introductions per repetition;
- Variance of the number of simulated introductions per repetition;
- Number of simulated presences;
- Number of simulated observations;
- Proportion of simulated presences;
- Mean time of the first invasion (in years);
- Variance of the time of the first invasion (in years²);
- Mean time of the last invasion (in years);
- Variance of the time of the last invasion (in years²);
- Likelihood;
- Log-likelihood with no simulated invasion;
- Log-likelihood when the simulation perfectly matches the experimental data;
- Score with no simulated invasion;
- Score when the simulation perfectly matches the experimental data;
The lineage_tree.txt file contains the complete list of HMD events during a simulation (for all repetitions), allowing the spread history to be reconstructed. Each line describes one HMD event:
- Repetition of the simulation;
- Identifier of the starting cell of the HMD event;
- Identifier of the ending cell of the HMD event;
- "Geodesic" distance between cells (shortest distance between the two cells on the connectivity graph, in number of cells);
- Euclidean distance (in meters) between cell centroids;
- Iteration (here, current year of the simulation);
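As an illustration of how the spread history can be reconstructed from these columns, the sketch below traces, for one repetition, the chain of HMD events leading from the introduction cell to a given cell. It is a hypothetical helper, not part of the pipeline: it assumes whitespace-separated columns, and that a cell is colonized by the first HMD event reaching it (earliest iteration, ties resolved in file order).

```python
from typing import Dict, Iterable, List, NamedTuple


class HmdEvent(NamedTuple):
    """One line of lineage_tree.txt."""
    repetition: int
    start: int  # starting cell identifier
    end: int  # ending cell identifier
    geodesic: int  # distance on the connectivity graph (cells)
    euclidean: float  # distance between centroids (m)
    iteration: int  # simulation year


def parse_lineage(lines: Iterable[str]) -> List[HmdEvent]:
    events: List[HmdEvent] = []
    for line in lines:
        f = line.split()
        if len(f) < 6:
            continue
        try:
            events.append(HmdEvent(int(f[0]), int(f[1]), int(f[2]),
                                   int(float(f[3])), float(f[4]), int(float(f[5]))))
        except ValueError:
            continue  # e.g. a header line
    return events


def spread_path(events: List[HmdEvent], repetition: int,
                target: int, introduction: int) -> List[int]:
    """Chain of cells from the introduction cell to `target` (assumed first-arrival rule)."""
    first_source: Dict[int, int] = {}
    same_rep = (e for e in events if e.repetition == repetition)
    for ev in sorted(same_rep, key=lambda e: e.iteration):  # stable sort keeps file order
        if ev.end != introduction and ev.end not in first_source:
            first_source[ev.end] = ev.start
    path = [target]
    while path[-1] != introduction and path[-1] in first_source:
        previous = first_source[path[-1]]
        if previous in path:  # guard against cycles in malformed input
            break
        path.append(previous)
    return path[::-1]
```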
The parameters.txt file contains all the input parameters of the simulation (see MoRIS software for a full description).
• 4_models_evaluation files content:
For each scenario, the file score_distribution.txt contains the computed performance metrics:
- Repetition;
- Likelihood;
- Likelihood when there is no simulated presence;
- Likelihood when the simulation perfectly matches experimental data;
- Log-likelihood when there is no simulated presence;
- Log-likelihood;
- Area under the ROC (receiver operating characteristic) curve (AUC);
- Partition threshold minimizing the Euclidean distance to the top-left corner (max(TPR) and min(FPR)) of the ROC curve (threshold used to split simulated data into presences/absences, see main manuscript);
- Euclidean distance to the top-left corner (max(TPR) and min(FPR)) of the ROC curve;
- Sensitivity;
- 1-Specificity;
- Partition threshold maximizing the accuracy score;
- Accuracy score;
- Partition threshold maximizing the F1 score;
- F1 score;
- Partition threshold maximizing the standard Kappa score;
- Standard Kappa score;
- Quantity disagreement;
- Allocation disagreement;
- Partition threshold maximizing the true skill statistic;
- True skill statistic;
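To make the ROC-based columns above more concrete, the sketch below computes, from observed presences/absences and the simulated proportion of presences per cell, the AUC (rank-based Mann-Whitney formulation), the partition threshold minimizing the Euclidean distance to the ideal ROC point (TPR = 1, FPR = 0), and the true skill statistic (TSS = sensitivity + specificity - 1 = TPR - FPR). It is an illustrative re-implementation using these textbook definitions, not the pipeline's evaluation script, and all function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def rates(observed: Sequence[bool], predicted: Sequence[float],
          threshold: float) -> Tuple[float, float]:
    """Return (TPR, FPR) when cells with predicted >= threshold are called presences."""
    tp = fn = fp = tn = 0
    for obs, p in zip(observed, predicted):
        called = p >= threshold
        if obs and called:
            tp += 1
        elif obs:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def auc(observed: Sequence[bool], predicted: Sequence[float]) -> float:
    """Probability that a presence scores higher than an absence (ties count 1/2)."""
    pos = [p for o, p in zip(observed, predicted) if o]
    neg = [p for o, p in zip(observed, predicted) if not o]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def best_roc_threshold(observed: Sequence[bool], predicted: Sequence[float]
                       ) -> Optional[Tuple[float, float, float, float]]:
    """(threshold, distance, TPR, FPR) minimizing the distance to the point FPR=0, TPR=1."""
    best = None
    for t in sorted(set(predicted)):
        tpr, fpr = rates(observed, predicted, t)
        dist = math.hypot(fpr, 1.0 - tpr)
        if best is None or dist < best[1]:
            best = (t, dist, tpr, fpr)
    return best


def true_skill_statistic(tpr: float, fpr: float) -> float:
    """TSS = sensitivity + specificity - 1 = TPR - FPR."""
    return tpr - fpr
```

Only the observed values present in `predicted` are tried as thresholds, which is enough to visit every distinct point of an empirical ROC curve.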
• 5_models_complete_evaluation files content:
For each scenario, the file complete_evaluation_all.txt contains some performance metrics computed at every time step during a simulation:
- Repetition;
- Time step (in years);
- Log-likelihood;
- Corresponding AIC;
- Total number of simulated colonies (i.e. presences);
- Area under the curve;
- Boyce index;
- Associated p-value;
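The AIC column presumably follows the usual definition AIC = 2k - 2 ln L, where k is the number of estimated parameters and ln L the log-likelihood. The helper below is a hedged sketch: which parameters count towards k for each scenario is described in the main manuscript, and the sign convention of a stored score (log-likelihood or its negative, since the optimization minimizes a score) should be checked before reusing it. `aic` and `delta_aic` are hypothetical names.

```python
from typing import List, Sequence


def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln(L).

    `log_likelihood` must be ln(L) itself (usually negative); if a file stores
    the negative log-likelihood as a minimization score, negate it first.
    """
    if n_parameters < 0:
        raise ValueError("n_parameters must be non-negative")
    return 2 * n_parameters - 2 * log_likelihood


def delta_aic(aic_values: Sequence[float]) -> List[float]:
    """Difference of each model's AIC to the lowest (best) AIC."""
    best = min(aic_values)
    return [value - best for value in aic_values]
```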
• scripts folder content:
This folder contains all the low-level scripts needed to run the pipeline. This includes Python and R-scripts to handle simulation outputs and run post-treatments, and R-scripts to generate and convert figures and the gif animation. The reader does not need to call these scripts directly, as higher-level shell scripts are provided to run the pipeline (see below).
- A Python script running the validation pipeline (related to the 2_cmaes_validation folder);
- A Python script running simulation examples for the best parameter set of each scenario (related to the 3_best_models folder);
- A Python script evaluating the performance metrics of each best scenario (related to the 4_models_evaluation folder);
- The R-script associated with the previous Python script;
- A Python script evaluating performance metrics at every time step of each best scenario (related to the 5_models_complete_evaluation folder);
- The R-script associated with the previous Python script;
- An R-script displaying the log-likelihood and AIC of each best scenario;
- An R-script displaying the performance metrics of each best scenario;
- An R-script generating Figure 3 of the main manuscript;
- An R-script generating Figure 4 of the main manuscript;
- An R-script generating Figure 5 of the main manuscript;
- An R-script generating Figure S2;
- An R-script generating the figures of the Supporting Information 3 document;
- An R-script generating Figure 1 of the Supporting Information 4 document;
- An R-script generating Figure 3 of the Supporting Information 4 document;
- An R-script generating Figure 4 of the Supporting Information 4 document;
- An R-script generating the components of the Animation S1 gif;
• Simulation framework:
The source-code, cmake and build folders contain the source code (C++), compilation scripts and executables of the simulation framework. Please consult the MoRIS GitHub page for further details.
Source-code folder:
- Main simulation executable;
- lib folder:
  - Enumerations;
  - Prng.h: Pseudo-random number generator class declaration;
  - Prng.cpp: Pseudo-random number generator class definition;
  - Node.h: Node (here, corresponding to map cells) class declaration;
  - Node.cpp: Node class definition;
  - Graph.h: Graph (here, corresponding to the road network) class declaration;
  - Graph.cpp: Graph class definition;
  - Parameters.h: Simulation parameters class declaration;
  - Parameters.cpp: Simulation parameters class definition;
  - Simulation.h: Simulation class declaration;
  - Simulation.cpp: Simulation class definition;
cmake folder:
- Make clean script;
- Compilation script in debug mode;
- Compilation script in optimized mode;
- modules folder:
  - Header file containing the versioning of the software;
  - FindGSL.cmake: Module used by CMake to find the GSL library;
build folder: will contain the binary executable HMD_model_run after compilation.
5) Running the pipeline
5.2) Supported platforms and dependencies
The pipeline has been tested on Unix/Linux and macOS platforms.
5.3) Dependencies
- A C++ compiler (GCC, LLVM, ...);
- CMake (command line version);
- GSL for C/C++;
- CBLAS for C/C++;
- Python ≥ 3 (Packages CMA-ES and numpy are required);
- R (packages ggplot2, cowplot, ggpubr, sf, viridis and scales are required);
- ImageMagick;
- poppler;
- pdf2svg;
5.4) Compile the simulation executable
To compile the executable, navigate to the cmake folder and run the following command line in a terminal:
5.5) Run the validation of the CMA-ES outputs
To compute the log-likelihood distribution of the parameter sets found by the optimization algorithm (100 repetitions, see main manuscript), run the following command line in a terminal:
Resulting files will be saved in the 2_cmaes_validation folder. This script may take several hours to complete.
5.6) Find and run the best parameters set of each scenario
The next script finds the best parameter set of each model by comparing the average log-likelihood scores and selecting the lowest one (see main manuscript). The script then launches a simulation with N = 1,000 repetitions. Run the following command line in a terminal:
Resulting files will be saved in the 3_best_models folder.
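The selection rule described above (lowest average score over re-calculated log-likelihoods) can be illustrated as follows. This is a hypothetical sketch, not the pipeline script: it assumes that a XXX_replayed.txt file is whitespace-separated with a header line naming its columns, including replay_mean, and that lower values are better because the score is minimized.

```python
from typing import List, Sequence


def best_parameter_row(lines: Sequence[str], column: str = "replay_mean") -> List[str]:
    """Return the data row with the lowest value in `column`.

    Assumption: the first line is a header naming the columns.
    """
    header = lines[0].split()
    index = header.index(column)  # raises ValueError if the column is missing
    rows = [line.split() for line in lines[1:] if line.strip()]
    if not rows:
        raise ValueError("no data rows")
    return min(rows, key=lambda row: float(row[index]))
```

For example, `best_parameter_row(open(path).read().splitlines())` would return the parameter set with the lowest mean score for the scenario stored at `path`.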
5.7) Compute performance metrics distributions
To compute the various performance metrics associated with each calibrated model (see main manuscript), run the following scripts:
This operation may also take some time. Resulting files will be saved in the 4_models_evaluation and 5_models_complete_evaluation folders.
5.8) Generate the figures of the manuscript
To generate the figures of the manuscript, simply execute the following script (the Unix tools poppler, pdf2svg and ImageMagick are needed, as well as the R packages ggplot2, cowplot, sf, ggpubr, viridis and scales):
All the figures are saved in the figures folder. The AnimationS1 gif is saved in the gif folder.
5.9) Convert figures
To convert figures to png and svg formats, run:
Converted figures are saved in the figures folder.