Supporting information for: Accounting for the topology of road networks to better explain human-mediated dispersal in terrestrial landscapes
Data files
Nov 13, 2023 version (files: 25.04 MB)
- AnimationS1.gif
- FigureS1.pdf
- FigureS2.pdf
- README.md
- SupportingInformation1.tar.gz
- SupportingInformation2.pdf
- SupportingInformation3.pdf
- SupportingInformation4.pdf
- SupportingInformation5.tar.gz
- Table1.csv
- TableS1.pdf
Abstract
Human trade and movements are central to biological invasions worldwide. Human activities not only transport species across biogeographical barriers but also accelerate their post-introduction spread in the landscape. Thus, by constraining human movements, the spatial structure of road networks might greatly affect the regional spread of invasive species. However, few invasion models have accounted for the topology of road networks so far, and its importance for explaining the regional distribution of invasive species remains mostly unexplored. To address this issue, we developed a spatially explicit and mechanistic human-mediated dispersal model that accounts and tests for the influence of transport networks on the regional spread of invasive species. Using as a model the spread of the invasive ant Lasius neglectus in the middle Rhône valley (France), we show that accounting for the topology of road networks improves our ability to explain the current distribution of the invasive ant. In contrast, we found that using human population density as a proxy for the frequency of transport events decreases models’ performance and might thus not be as appropriate as previously thought. Finally, by differentiating road networks into sub-networks, we show that national and regional roads are more important than smaller roads for explaining spread patterns. Overall, our results demonstrate that the topology of transport networks can strongly bias regional invasion patterns and highlight the importance of better incorporating it into future invasion models. The mechanistic modelling approach developed in this study should help invasion scientists explore how human-mediated dispersal and topography shape invasion dynamics in landscapes. Ultimately, our approach could be combined with demographic, natural dispersal and environmental suitability models to refine spread scenarios and improve invasive species monitoring and management at regional to national scales.
README: Supporting Information for the publication "Accounting for the topology of road networks to better explain human-mediated dispersal in terrestrial landscapes"
https://doi.org/10.5061/dryad.tdz08kq5j
This Dryad repository contains all the supporting information, data and the simulation framework related to the publication "Accounting for the topology of road networks to better explain human-mediated dispersal in terrestrial landscapes" (Rocabert et al. 2023).
The reader will find here a complete description of the repository, as well as guidelines to re-run the pipeline used to produce the results presented in the main manuscript.
Table of contents
- 1) Repository content summary
- 2) MoRIS (Model of Routes of Invasive Spread) software
- 3) Description of the dataset
  - 3.1) Supplementary figures and tables
  - 3.2) Supporting information documents
  - 3.3) Input files
- 4) Complete pipeline (Supporting Information 5)
  - 4.1) Pipeline organization
  - 4.2) Content description
- 5) Running the pipeline
  - 5.1) Introduction
  - 5.2) Supported platforms and dependencies
  - 5.3) Dependencies
  - 5.4) Compile the simulation executable
  - 5.5) Run the validation of the CMA-ES outputs
  - 5.6) Find and run the best parameters set of each scenario
  - 5.7) Compute performance metrics distributions
  - 5.8) Generate the figures of the manuscript
  - 5.9) Convert figures
1) Repository content summary
- Figure S1. General overview of the human-mediated dispersal algorithm.
- Figure S2. Log-likelihood distribution of each calibrated model.
- Table S1. Performance metrics of each calibrated model.
- Animation S1. Animation of the spatial spread through time for each selected model.
- Supporting Information 1. Input files for the simulation framework.
- Supporting Information 2. Runtime information.
- Supporting Information 3. Spatial distribution of experimental and simulated presences/absences of each calibrated model.
- Supporting Information 4. Description of the invasion dynamics of the road network model.
- Supporting Information 5. Complete pipeline (including the simulation framework) used to produce post-analyses, figures and gif animation.
2) MoRIS (Model of Routes of Invasive Spread) software
The software developed and used for this publication is freely available at https://github.com/charlesrocabert/MoRIS. See the MoRIS GitHub page for details about the software, an introductory tutorial, and guidelines for constructing input files. Please contact the authors if you plan to use MoRIS for scientific purposes.
3) Description of the dataset
The content of the repository is described below. Please consult the main manuscript for more context (Rocabert et al. 2023).
3.1) Supplementary figures and tables
Supplementary figures (Figure S1; Figure S2) and the supplementary Table S1 are available (also in tabular format).
A gif animation showing one simulated example of human-mediated dispersal for each tested scenario is available in Animation S1.
3.2) Supporting information documents
Three supplementary information documents are available:
- Supporting Information 2 (SupportingInformation2.pdf): Runtime information.
- Supporting Information 3 (SupportingInformation3.pdf): Spatial distribution of experimental and simulated presences/absences of each calibrated model.
- Supporting Information 4 (SupportingInformation4.pdf): Description of the invasion dynamics of the road network model.
3.3) Input files (Supporting Information 1)
Three input files are necessary to run human-mediated dispersal (HMD) simulations. As described in the main manuscript, these files are:
- The map file (map.txt), containing the area of interest, i.e. the Rhône valley around the Lyon urban area (France), discretized into 2 × 2 km square cells;
- The network file (network.txt), containing a discretized version of the road network connecting cells on the map;
- The sample file (sample.txt), containing the sampling effort for the invasive species of interest, cell by cell.
Guidelines for building these files are available at https://github.com/charlesrocabert/MoRIS/blob/master/INPUT_FILES_TUTORIAL.md.
The files are structured as follows:
• map.txt:
This file describes the properties of each cell in the discretized map.
- Column 1: Cell identifier;
- Column 2: X coordinate (in meters, cell centroid);
- Column 3: Y coordinate (in meters, cell centroid);
- Column 4: Cell's area (square meters);
- Column 5: Cell's suitable area (square meters, not used here);
- Column 6: Population size (not used here);
- Column 7: Population density;
- Column 8: Road density (not used here);
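The column layout above can be read with a few lines of Python. This is a minimal sketch, not the MoRIS parser: it assumes the rows are whitespace-separated with no header line (the README does not state the delimiter), and the `Cell` class, the `parse_map` function and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Cell:
    identifier: int            # column 1
    x: float                   # column 2, centroid X (m)
    y: float                   # column 3, centroid Y (m)
    area: float                # column 4 (square meters)
    suitable_area: float       # column 5 (not used in the study)
    population: float          # column 6 (not used in the study)
    population_density: float  # column 7
    road_density: float        # column 8 (not used in the study)


def parse_map(lines: Iterable[str]) -> Dict[int, Cell]:
    """Parse map rows, ASSUMING whitespace-separated fields and no header."""
    cells: Dict[int, Cell] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed rows
        cell = Cell(int(fields[0]), *(float(v) for v in fields[1:8]))
        cells[cell.identifier] = cell
    return cells


# Illustrative rows only (a 2 x 2 km cell has an area of 4,000,000 square meters).
example_rows = [
    "0 1000.0 1000.0 4000000 4000000 120 30.0 0.5",
    "1 3000.0 1000.0 4000000 3500000 80 20.0 0.2",
]
cells = parse_map(example_rows)
print(cells[1].population_density)
```

To read the real file, pass an open file handle instead of the example list, e.g. `parse_map(open("map.txt"))`, after checking the delimiter.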
• network.txt:
This file is an adjacency list describing road connectivity between cells. Identifier -1 symbolizes the map border.
- Column 1: Cell 1 identifier;
- Column 2: Cell 2 identifier;
- Column 3: Number of category I roads connecting the two cells;
- Column 4: Number of category II roads connecting the two cells;
- Column 5: Number of category III roads connecting the two cells;
- Column 6: Number of category IV roads connecting the two cells;
- Column 7: Number of category V roads connecting the two cells (not used here);
- Column 8: Number of category VI roads connecting the two cells (not used here);
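As an illustration of the adjacency-list layout above, the sketch below loads the rows into a dictionary and lists the cells that touch the map border. It is not the MoRIS implementation: it assumes whitespace-separated rows without a header, treats links as undirected, and assumes each pair of cells appears once. The names `parse_network` and `border_cells` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

BORDER = -1  # identifier reserved for the map border (see description above)

Adjacency = Dict[int, List[Tuple[int, List[int]]]]


def parse_network(lines: Iterable[str]) -> Adjacency:
    """Map each cell to [(neighbour, road counts for categories I..VI)].

    ASSUMPTIONS: whitespace-separated fields, no header, undirected links.
    """
    adjacency = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed rows
        a, b = int(fields[0]), int(fields[1])
        counts = [int(v) for v in fields[2:8]]
        adjacency[a].append((b, counts))
        adjacency[b].append((a, counts))
    return dict(adjacency)


def border_cells(adjacency: Adjacency) -> List[int]:
    """Cells with at least one road leaving the map."""
    return sorted(neighbour for neighbour, _ in adjacency.get(BORDER, []))


# Illustrative rows only.
example_rows = [
    "0 1 1 0 2 0 0 0",
    "1 2 0 1 0 0 0 0",
    "2 -1 0 0 1 0 0 0",
]
network = parse_network(example_rows)
print(border_cells(network))
```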
• sample.txt:
This file contains the Lasius neglectus experimental sampling dataset.
- Column 1: Cell identifier;
- Column 2: Number of positive samples in the cell (presence of L. neglectus);
- Column 3: Total number of samples in the cell;
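The two count columns give the observed proportion of positive samples per cell (the quantity reported as p_obs in the simulation outputs). A minimal sketch, assuming whitespace-separated rows without a header; the function name `observed_proportions` and the example rows are hypothetical.

```python
from typing import Dict, Iterable, Optional


def observed_proportions(lines: Iterable[str]) -> Dict[int, Optional[float]]:
    """Return positives / total per cell, or None for cells never sampled.

    ASSUMPTION: whitespace-separated fields, no header line.
    """
    proportions: Dict[int, Optional[float]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        cell, positives, total = (int(v) for v in fields[:3])
        proportions[cell] = positives / total if total > 0 else None
    return proportions


# Illustrative rows only: cell id, positive samples, total samples.
example_rows = ["12 0 4", "13 3 5", "14 0 0"]
proportions = observed_proportions(example_rows)
print(proportions)
```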
4) Complete pipeline (Supporting Information 5)
The compressed archive SupportingInformation5.tar.gz contains the complete pipeline to reproduce the data analysis, the figures and the animation presented in this work. It also includes the simulation framework (see the GitHub repository for the latest version of the software).
The pipeline is organized around several folders that are pre-filled with simulation and post-processed data, to avoid time-consuming calculations for end-users and readers. Several scripts are also available to run the pipeline step by step.
4.1) Pipeline organization
• 1_simulation_results:
This folder contains the raw CMA-ES optimization results. For each of the four scenarios (isotropic, human activity, road network, and combined; see main manuscript), two files are provided:
- Best optimization result (file suffix _best.txt): the best point ever found for this optimization run;
- Mean optimization result (file suffix _mean.txt, not used here): the center of the best multivariate normal distribution found by CMA-ES during the optimization process (see Hansen & Auger, 2011).
All files have the same structure, each line being the result of one optimization run. The score column corresponds to the log-likelihood; all other columns are optimized or pre-defined simulation parameters (please consult the main manuscript and the GitHub repository for detailed explanations).
• 2_cmaes_validation:
This folder contains re-calculated log-likelihood distributions from the best parameter sets found by CMA-ES (see above). For each scenario, the mean and variance of the log-likelihood distribution are in columns replay_mean and replay_var of the file XXX_replayed.txt (with XXX the scenario).
• 3_best_models:
This folder contains the output of a single simulation executed with the best parameter set found for each scenario. The organization of this output is described below.
• 4_models_evaluation:
This folder contains the result of the calculation of performance metrics (AUC, TSS, etc). For each of the four scenarios, performance metrics are computed 100 times to obtain a distribution (see main manuscript). For each scenario, metrics are stored in the file score_distribution.txt.
• 5_models_complete_evaluation:
This folder contains the result of the calculation of some performance metrics at every time step during a simulation. For each scenario, metrics are stored in the file complete_evaluation_all.txt.
• input_files:
This folder contains the three input files (see Supporting Information 1), and two data files (.shx and .shp) describing the area of interest and used to generate graphics.
• figures, gif:
These folders will contain the generated figures and gif animations.
• src, cmake, build:
These folders contain all the material (source code, compilation scripts, binary folders) to run numerical simulations (see instructions below).
• scripts:
This folder contains lower-level scripts used to run the pipeline.
4.2) Content description
• 1_simulation_results files content:
- Column exec: Relative path of the MoRIS executable file;
- Column map: Relative path of the map file;
- Column network: Relative path of the network file;
- Column sample: Relative path of the sample file;
- Column typeofdata: Type of experimental data. In the case of this study, always "PRESENCE_ABSENCE";
- Column optimfunc: Optimization function used to calculate the minimization score (here, always "LOG_LIKELIHOOD");
- Column law: Law of the probability distribution of dispersal length (in number of cells; always "LOG_NORMAL");
- Column seed: Seed of the pseudo-random number generator;
- Column iters: Number of iterations (here, always 25 years);
- Column reps: Number of repetitions (here, always 1,000);
- Column wmin: Minimal connection weight between cells;
- Column pintro: Probability of presence in the cell of introduction (always 1 in this study);
- Column humanactivity: Boolean indicating if human activity metrics should be used in simulations;
- Column cell_id: Identifier of the cell of introduction;
- Column xintro: X-coordinate of the point of introduction;
- Column yintro: Y-coordinate of the point of introduction;
- Column mu: Best mu value found by CMA-ES;
- Column lambda: Best lambda value found by CMA-ES;
- Column sigma: Best sigma value found by CMA-ES;
- Column gamma: Parameter never used here (always 0);
- Column w1: Best category I road weight found by CMA-ES;
- Column w2: Best category II road weight found by CMA-ES;
- Column w3: Best category III road weight found by CMA-ES;
- Column w4: Best category IV road weight found by CMA-ES;
- Column w5: Category V road weight (always 0 in this study);
- Column w6: Category VI road weight (always 0 in this study);
- Column score: Best log-likelihood found by CMA-ES ("score" is a generic term because other scores are possible, see MoRIS software).
• 2_cmaes_validation files content:
File content is similar to above, except for two columns of interest:
- Column replay_mean: Mean of the distribution of re-calculated log-likelihoods;
- Column replay_var: Variance of the distribution of re-calculated log-likelihoods.
• 3_best_models files content:
For each scenario, the output folder contains the typical output of a simulation:
The final_state.txt file contains the final state (here, after 25 simulated years) of the simulation. Similar files are also generated for each time step (from 0 to 24 years), with an identical structure. These files are structured as follows:
- Column id: Cell identifier;
- Column x: Cell X-coordinate (centroid);
- Column y: Cell Y-coordinate (centroid);
- Column y_obs: Number of experimental presences;
- Column n_obs: Total number of experimental observations;
- Column p_obs: Proportion of positive observations;
- Column total_nb_intros: Total number of simulated introductions;
- Column mean_nb_intros: Mean number of simulated introductions per repetition;
- Column var_nb_intros: Variance of the number of simulated introductions per repetition;
- Column y_sim: Number of simulated presences;
- Column n_sim: Number of simulated observations;
- Column p_sim: Proportion of simulated presences;
- Column mean_first_invasion: Mean time of the first invasion (in years);
- Column var_first_invasion: Variance of the time of the first invasion (in years);
- Column mean_last_invasion: Mean time of the last invasion (in years);
- Column var_last_invasion: Variance of the time of the last invasion (in years);
- Column L: Likelihood;
- Column empty_L: Log-likelihood with no simulated invasion;
- Column max_L: Log-likelihood when the simulation perfectly matches the experimental data;
- Column empty_score: Score with no simulated invasion;
- Column score: Score when the simulation perfectly matches the experimental data;
The lineage_tree.txt file contains the complete list of HMD events during a simulation (for all repetitions), making it possible to reconstruct the spread history. Each line describes one HMD event:
- Column repetition: Repetition of the simulation;
- Column start_node: Identifier of the starting cell of the HMD event;
- Column end_node: Identifier of the ending cell of the HMD event;
- Column geodesic_dist: "Geodesic" distance between cells (shortest distance between the two cells on the connectivity graph, in number of cells);
- Column euclidean_dist: Euclidean distance (in meters) between cell centroids;
- Column iteration: Iteration (here, current year of the simulation);
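One way to use these events to reconstruct a spread history is to keep, for each cell, the earliest event that reached it, and then walk back from a cell of interest towards the introduction cell. The sketch below illustrates that idea on hand-written tuples; it is not part of the pipeline scripts, and the names `first_arrivals` and `trace_back` are hypothetical. Reading the events from the file (delimiter, header) is left out because the file layout is not specified here.

```python
from typing import Dict, Iterable, List, Tuple

# (repetition, start_node, end_node, iteration): a subset of the columns above.
Event = Tuple[int, int, int, int]


def first_arrivals(events: Iterable[Event], repetition: int) -> Dict[int, Tuple[int, int]]:
    """Map each cell to (source cell, iteration) of its earliest recorded arrival."""
    arrivals: Dict[int, Tuple[int, int]] = {}
    for rep, start, end, iteration in events:
        if rep != repetition:
            continue
        if end not in arrivals or iteration < arrivals[end][1]:
            arrivals[end] = (start, iteration)
    return arrivals


def trace_back(cell: int, arrivals: Dict[int, Tuple[int, int]]) -> List[int]:
    """Follow earliest arrivals from `cell` back towards the introduction cell."""
    path, seen = [cell], {cell}
    while path[-1] in arrivals:
        source = arrivals[path[-1]][0]
        if source in seen:
            break  # guard against loops caused by re-introductions
        path.append(source)
        seen.add(source)
    return path


# Illustrative events only: cell 15 is reached twice in repetition 1.
events = [(1, 10, 11, 0), (1, 11, 15, 2), (1, 10, 15, 4), (2, 10, 12, 1)]
arrivals = first_arrivals(events, repetition=1)
print(trace_back(15, arrivals))  # [15, 11, 10]
```

Keeping only the earliest arrival is a simplification: a cell may receive several introductions, and other summaries (for example, counting events per year) may suit other questions better.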
The parameters.txt file contains all the input parameters of the simulation (see MoRIS software for a full description).
• 4_models_evaluation files content:
For each scenario, the file score_distribution.txt contains the result of the calculation of performance metrics:
- Column REP: Repetition;
- Column likelihood: Likelihood;
- Column empty_likelihood: Likelihood when there is no simulated presence;
- Column max_likelihood: Likelihood when the simulation perfectly matches experimental data;
- Column empty_score: Log-likelihood when there is no simulated presence;
- Column score: Log-likelihood;
- Column AUC: Area under the ROC (receiver operating characteristic) curve;
- Column d_th: Partition threshold minimizing the Euclidean distance to the top-left corner (max(TPR) and min(FPR)) of the ROC curve (threshold used to split simulated data in presences/absences, see main manuscript);
- Column d: Euclidean distance to the top-left corner (max(TPR) and min(FPR)) of the ROC curve;
- Column TPR: Sensitivity;
- Column FPR: 1-Specificity;
- Column ACC_th: Partition threshold maximizing the accuracy score;
- Column ACC: Accuracy score;
- Column F1_th: Partition threshold maximizing the F1 score;
- Column F1: F1 score;
- Column KAPPA_th: Partition threshold maximizing the standard Kappa score;
- Column KAPPA: Standard Kappa score;
- Column QDIS: Quantity disagreement;
- Column ADIS: Allocation disagreement;
- Column TSS_th: Partition threshold maximizing the true skill statistic;
- Column TSS: True skill statistic;
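For readers unfamiliar with these metrics, the sketch below computes several of them from the four counts of a confusion matrix, using their standard textbook definitions. It is an illustration only and is not taken from evaluation.R, whose exact conventions may differ; the function name `confusion_metrics` and the counts are hypothetical. Quantity and allocation disagreement are not covered.

```python
import math
from typing import Dict


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard presence/absence metrics (assumes both classes are present)."""
    n = tp + fp + tn + fn
    tpr = tp / (tp + fn)               # sensitivity
    fpr = fp / (fp + tn)               # 1 - specificity
    acc = (tp + tn) / n                # accuracy
    f1 = 2 * tp / (2 * tp + fp + fn)   # F1 score
    # Agreement expected by chance, used by Cohen's kappa.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (acc - expected) / (1 - expected)
    return {
        "TPR": tpr,
        "FPR": fpr,
        "ACC": acc,
        "F1": f1,
        "KAPPA": kappa,
        "TSS": tpr - fpr,                   # sensitivity + specificity - 1
        "d": math.hypot(fpr, 1.0 - tpr),    # distance to the point FPR = 0, TPR = 1
    }


# Illustrative counts only.
metrics = confusion_metrics(tp=40, fp=10, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Each "_th" column then corresponds to the threshold on simulated presence probabilities at which the matching metric is optimal.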
• 5_models_complete_evaluation files content:
For each scenario, the file complete_evaluation_all.txt contains the result of the calculation of some performance metrics at every time step during a simulation:
- Column rep: Repetition;
- Column t: Time step (in years);
- Column logL: Log-likelihood;
- Column AIC: Corresponding AIC;
- Column nb_colonies: Total number of simulated colonies (i.e. presences);
- Column AUC: Area under the curve;
- Column BOYCE_index: Boyce index;
- Column BOYCE_pvalue: Associated p-value;
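As a reminder of how the AIC column relates to the log-likelihood, the sketch below applies the usual definition AIC = 2k - 2 logL, where k is the number of fitted parameters, and ranks models by their difference to the best one. The number of parameters per scenario is not stated in this README, so the values below are purely hypothetical, as are the function names.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 logL (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def delta_aic(models: Dict[str, Tuple[float, int]]) -> Dict[str, float]:
    """AIC difference of each model to the best (lowest-AIC) model."""
    scores = {name: aic(logl, k) for name, (logl, k) in models.items()}
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}


# HYPOTHETICAL (logL, k) pairs, for illustration only.
models = {"model_a": (-120.0, 3), "model_b": (-110.0, 7)}
print(delta_aic(models))  # {'model_a': 12.0, 'model_b': 0.0}
```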
• scripts folder content:
This folder contains all the low-level scripts needed to run the pipeline. This includes Python and R scripts to handle simulation outputs and run post-treatments, and R scripts to generate and convert figures and the gif animation. The reader does not need to call these scripts directly, as higher-level shell scripts are provided to run the pipeline (see below).
- Script validate.py: This Python script runs the validation pipeline (related to the 2_cmaes_validation folder);
- Script best_model.py: This Python script runs simulation examples for the best parameter set of each scenario (related to the 3_best_models folder);
- Script evaluate.py: This Python script evaluates performance metrics of each best scenario (related to the 4_models_evaluation folder);
- Script evaluation.R: R script associated with the evaluate.py script;
- Script complete_evaluation.py: This Python script evaluates performance metrics at every time step of each best scenario (related to the 5_models_complete_evaluation folder);
- Script complete_evaluation.R: R script associated with the complete_evaluation.py script;
- Script Print_LogLikelihood_AIC_metrics.R: This R script displays log-likelihood and AIC for each best scenario;
- Script Print_Evaluation_metrics.R: This R script displays performance metrics for each best scenario;
- Script Figure3.R: This R script generates Figure 3 of the main manuscript;
- Script Figure4.R: This R script generates Figure 4 of the main manuscript;
- Script Figure5.R: This R script generates Figure 5 of the main manuscript;
- Script FigureS2.R: This R script generates Figure S2 of the main manuscript;
- Script Figures_SupportingInformation3.R: This R script generates the figures of the Supporting Information 3 document;
- Script Figure1_SupportingInformation4.R: This R script generates Figure 1 of the Supporting Information 4 document;
- Script Figure3_SupportingInformation4.R: This R script generates Figure 3 of the Supporting Information 4 document;
- Script Figure4_SupportingInformation4.R: This R script generates Figure 4 of the Supporting Information 4 document;
- Script AnimationS1.R: This R script generates the components of the Animation S1 gif.
• Simulation framework:
The src, cmake and build folders contain the source code (C++), compilation scripts and executables of the simulation framework. Please consult the MoRIS GitHub page for details.
- src folder:
  - HMD_model_run.cpp: Main simulation executable;
  - lib folder:
    - Enums.h: Enumerations;
    - Prng.h: Pseudo-random number generator class declaration;
    - Prng.cpp: Pseudo-random number generator class definition;
    - Node.h: Node (here, corresponding to map cells) class declaration;
    - Node.cpp: Node class definition;
    - Graph.h: Graph (here, corresponding to the road network) class declaration;
    - Graph.cpp: Graph class definition;
    - Parameters.h: Simulation parameters class declaration;
    - Parameters.cpp: Simulation parameters class definition;
    - Simulation.h: Simulation class declaration;
    - Simulation.cpp: Simulation class definition;
- cmake folder:
  - make_clean.sh: Make clean script;
  - make_debug.sh: Compilation script in debug mode;
  - make_release.sh: Compilation script in optimized mode;
  - modules folder:
    - Config.h.in: Header file containing the versioning of the software;
    - FindGSL.cmake: Module used by CMake to find the GSL library;
- build folder:
  - bin folder: will contain the binary executable HMD_model_run after compilation.
[…]fully tested on Unix/Linux and macOS platforms.
5.3) Dependencies
- A C++ compiler (GCC, LLVM, ...);
- CMake (command line version);
- GSL for C/C++;
- CBLAS for C/C++;
- Python ≥ 3 (Packages CMA-ES and numpy are required);
- R (packages ggplot2, cowplot, ggpubr, sf, viridis and scales are required);
- ImageMagick;
- poppler;
- pdf2svg;
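Before running the pipeline, it can be convenient to check which dependencies are visible from the current environment. The sketch below is one possible check and is not part of the pipeline. The executable names (`convert` for ImageMagick, `pdftoppm` for poppler, `Rscript` for R) and the Python module name `cma` for the CMA-ES package are assumptions: they vary between installations and are not confirmed by this README. R packages, GSL and CBLAS are not checked.

```python
import importlib.util
import shutil
from typing import Dict, List, Sequence

# ASSUMED executable and module names; adapt them to your installation.
TOOLS = ("cmake", "Rscript", "convert", "pdftoppm", "pdf2svg")
MODULES = ("numpy", "cma")


def missing_dependencies(
    tools: Sequence[str] = TOOLS, modules: Sequence[str] = MODULES
) -> Dict[str, List[str]]:
    """List command-line tools and Python modules that cannot be found."""
    missing_tools = [t for t in tools if shutil.which(t) is None]
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    return {"tools": missing_tools, "modules": missing_modules}


report = missing_dependencies()
print("Missing tools:", report["tools"])
print("Missing Python modules:", report["modules"])
```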
5.4) Compile the simulation executable
To compile the executable, navigate to the folder cmake and run the following command line in a terminal:
sh make_release.sh
5.5) Run the validation of the CMA-ES outputs
To compute the log-likelihood distribution of the parameter sets found by the optimization algorithm (100 repetitions, see main manuscript), run the following command line in a terminal:
sh A_run_validation.sh
Resulting files will be saved in the folder 2_cmaes_validation.
This script may take several hours to complete.
5.6) Find and run the best parameters set of each scenario
The next script finds the best parameter set of each model by comparing the average log-likelihoods and selecting the lowest one (see main manuscript). The script then launches a simulation with N=1,000 repetitions. Run the following command line in a terminal:
sh B_run_best_models.sh
Resulting files will be saved in the folder 3_best_models.
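The selection rule described in this section can be illustrated in a few lines of Python. This is a sketch of the idea, not the code of best_model.py: it assumes that the validation files have a header row and whitespace-separated fields, and it follows the statement above that the lowest average score is selected. The functions `read_table` and `select_best` and the example rows are hypothetical.

```python
from typing import Dict, Iterable, List


def read_table(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Read a table with a header row (ASSUMED whitespace-separated)."""
    rows = [line.split() for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def select_best(rows: List[Dict[str, str]]) -> Dict[str, str]:
    """Return the row with the lowest replay_mean, per the rule stated above."""
    return min(rows, key=lambda row: float(row["replay_mean"]))


# Illustrative content only; real files contain many more columns.
example_lines = [
    "seed score replay_mean replay_var",
    "101 152.3 153.1 0.8",
    "102 149.9 150.4 1.2",
    "103 151.0 151.7 0.5",
]
best = select_best(read_table(example_lines))
print(best["seed"], best["replay_mean"])
```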
5.7) Compute performance metrics distributions
To compute the various performance metrics associated with each calibrated model (see main manuscript), run the following scripts:
sh C_compute_evaluation_distributions.sh
And:
sh D_compute_complete_evaluation_distributions.sh
This operation could also take some time. Resulting files will be saved in the folders 4_models_evaluation and 5_models_complete_evaluation.
5.8) Generate the figures of the manuscript
To generate the figures of this manuscript, simply execute the following script (the Unix libraries poppler, pdf2svg and ImageMagick are needed, as well as the R-packages ggplot2, cowplot, sf, ggpubr, viridis and scales):
sh E_generate_figures.sh
All the figures are saved in the folder figures. The AnimationS1 gif is saved in the folder gif.
5.9) Convert figures
To convert the figures to PNG and SVG formats, run:
sh F_convert_figures.sh
Converted figures are saved in the folder figures.
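The steps of sections 5.4 to 5.9 can also be chained from a single driver. The sketch below is one way to do it with Python's subprocess module and is not provided with the pipeline. It assumes that the scripts A to F sit at the root of the extracted archive and that make_release.sh must be run from the cmake folder; the function name `run_pipeline` and the example path are hypothetical. By default it only lists the commands (dry run), since several steps can take hours.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

# (working folder relative to the pipeline root, script), in execution order.
# ASSUMPTION: scripts A to F are located at the pipeline root.
STEPS = [
    ("cmake", "make_release.sh"),
    (".", "A_run_validation.sh"),
    (".", "B_run_best_models.sh"),
    (".", "C_compute_evaluation_distributions.sh"),
    (".", "D_compute_complete_evaluation_distributions.sh"),
    (".", "E_generate_figures.sh"),
    (".", "F_convert_figures.sh"),
]


def run_pipeline(root: str, dry_run: bool = True) -> List[Tuple[str, List[str]]]:
    """Run (or, in dry-run mode, only list) every step; stop at the first failure."""
    commands = []
    for folder, script in STEPS:
        cwd = Path(root) / folder
        command = ["sh", script]
        commands.append((str(cwd), command))
        if not dry_run:
            # check=True raises CalledProcessError if a script exits with an error.
            subprocess.run(command, cwd=cwd, check=True)
    return commands


# Dry run on a HYPOTHETICAL location of the extracted archive.
for cwd, command in run_pipeline("SupportingInformation5"):
    print(f"(in {cwd}) {' '.join(command)}")
```

Calling `run_pipeline(root, dry_run=False)` executes the steps in order; since the folders are pre-filled with results, individual steps can also be skipped by editing `STEPS`.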