Predicting current and future Aedes vexans occurrence in the Netherlands
Data files
Sep 24, 2024 version files 17.68 MB
- correlations.csv (2.87 KB)
- futureLandUse.zip (53.70 KB)
- h2oModels.zip (7.16 MB)
- rasterTemplate.asc (802.07 KB)
- README.md (4.70 KB)
- results.zip (6.35 MB)
- SSP1.zip (1.65 MB)
- SSP5.zip (1.66 MB)
Abstract
We have created predictions of the current and future occurrence of Aedes vexans (Meigen, 1830) mosquitoes in the Netherlands. Aedes vexans can transmit many pathogens, including western and eastern equine encephalitis viruses, Tahyna virus, West Nile virus and Rift Valley fever virus. However, a lack of occurrence maps, especially at a local scale, has hampered accurate disease modelling. We used extensive occurrence data collected by the Netherlands Centre for Monitoring of Vectors to train models using AutoML. We made future predictions for 2050 using a combination of climate scenarios (from the Dutch Meteorological Organisation) and socio-economic scenarios (the Dutch One Health SSPs). We made predictions for individual days rather than for the season as a whole, allowing us to consider future changes in seasonal dynamics. This is the first time a seasonal model for Aedes vexans has been developed and the first time future predictions have been made for this species at a national scale.
README: Predicting current and future Aedes vexans occurrence in the Netherlands
https://doi.org/10.5061/dryad.tb2rbp08d
We provide the h2o models and the data to make future predictions, as well as the code used to generate these models, evaluate model uncertainty and make predictions.
Description of the data and file structure
Mosquito data
See supplementary materials on Zenodo.
Models
The h2o models are available in h2oModels.zip. Within this folder are 4 sub-folders: ratio1, ratio2, ratio5 and ratio10. These refer to the presence:absence ratio in the training data used to generate the models. Within each of these sub-folders are 10 more sub-folders, each containing an h2o model file. These models can only be read using h2o version 3.42.0.2.
The most effective ratio was found to be 1:2; the models for this ratio are found in the 'ratio2' sub-folder. Our final model was the ensemble of the 10 models contained in this sub-folder.
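As an illustration of the folder layout described above, the following minimal Python sketch lists the model sub-folders for one ratio. It assumes the archive has been extracted to a folder named `h2oModels`; the function name `list_model_dirs` is hypothetical and is not part of the project's R scripts.

```python
from pathlib import Path


def list_model_dirs(root, ratio="ratio2"):
    """Return the sorted sub-folders of root/ratio (one per h2o model)."""
    ratio_dir = Path(root) / ratio
    # Keep directories only; stray files at this level are ignored.
    return sorted(p for p in ratio_dir.iterdir() if p.is_dir())
```

For the final ensemble, `list_model_dirs("h2oModels")` would be expected to return 10 paths. Each model file could then be loaded with the h2o loader (`h2o.loadModel` in R), using h2o version 3.42.0.2.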
Future data
Raster files containing data for the future (2050) non-climate predictors are found in zip folders SSP1 and SSP5, representing these two scenarios. These are used by the futureScenario.R script listed below. These contain the following predictors:
- agriAreas.asc: Percentage of agricultural land cover in 1km gridsquare
- artificial.asc: Percentage of artificial land cover in 1km gridsquare
- bulk_density.asc: Soil bulk density (tonnes per cubic metre)
- clay.asc: Soil clay content (percentage)
- distanceNature.asc: Distance to the nearest nature area in metres. Nature areas are Natura2000 areas, national parks and Natuurnetwerk Nederland areas
- floodrisk.asc: Flood risk, calculated as 0.1*(max depth of ‘1 in 10 year’ flood event) + 0.01*(max depth of ‘1 in 100 year’ flood event) + 0.001*(max depth of ‘1 in 1000 year’ flood event) + 0.00001*(max depth of ‘1 in 100,000 year’ flood event)
- permWater.asc: Percentage of permanent water in 1km gridsquare
- permWet.asc: Percentage of permanent wetland in 1km gridsquare
- shrubs.asc: Percentage shrub cover, includes shrubs between 1m and 2.5m tall, in 1km gridsquare
- surfaceSalinity.asc: Chloride concentration in surface water (mg/l)
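The flood risk formula listed for floodrisk.asc weights each maximum flood depth by the annual probability of the corresponding event (1/10, 1/100, 1/1000 and 1/100,000). A minimal Python sketch of that calculation for a single grid cell follows; the function name and the example depths are hypothetical, and the depth unit is whatever the source flood maps use.

```python
def flood_risk(depth_10, depth_100, depth_1000, depth_100000):
    """Weighted sum of maximum flood depths, as defined for floodrisk.asc.

    Each argument is the maximum depth of the '1 in N year' flood event.
    """
    return (0.1 * depth_10
            + 0.01 * depth_100
            + 0.001 * depth_1000
            + 0.00001 * depth_100000)


# Hypothetical depths for one cell:
# 0.05 + 0.01 + 0.002 + 0.00003, approximately 0.06203
example = flood_risk(0.5, 1.0, 2.0, 3.0)
```

Because the weights fall by an order of magnitude at each step, frequent shallow floods dominate the score over rare deep ones.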
The data for the climate predictors can be downloaded from https://klimaatscenarios-data.knmi.nl/downloads.
In addition, the future land use maps which were used to create the future non-climate predictors are contained in futureLandUse.zip. These are raster files with land uses coded as follows: 1 - urban, 2 - pasture, 3 - crops, 4 - forest, 5 - non-forest nature.
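The land use codes can be decoded with a simple lookup. The Python sketch below also shows one way a percentage-cover value could be derived from a block of land use cells; this aggregation step is an assumption made for illustration and is not necessarily how the predictors in SSP1.zip and SSP5.zip were produced. The names `LAND_USE` and `percent_cover` are hypothetical.

```python
# Land use codes as documented for the rasters in futureLandUse.zip.
LAND_USE = {
    1: "urban",
    2: "pasture",
    3: "crops",
    4: "forest",
    5: "non-forest nature",
}


def percent_cover(cells, code):
    """Percentage of valid cells in `cells` that carry land use `code`.

    Cells with a value outside the documented codes (for example a
    no-data value) are ignored. Returns None if no valid cells remain.
    """
    valid = [c for c in cells if c in LAND_USE]
    if not valid:
        return None
    return 100.0 * sum(1 for c in valid if c == code) / len(valid)


# Hypothetical block of 8 cells, 3 of which are crops.
crops_share = percent_cover([1, 1, 2, 3, 3, 3, 4, 5], 3)  # 37.5
```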
Results
The results.zip file contains the following raster files:
- the predicted mean, minimum and maximum occurrence probability for each scenario
- the standard deviation of the occurrence probability predictions across the 30-year period for each scenario
- the 95% confidence interval of the model predictions over the 2022 mosquito season, given both as absolute uncertainty and as uncertainty expressed as a percentage of the occurrence probability
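To make the two uncertainty measures concrete, here is a minimal Python sketch for a single grid cell. It assumes that the interval is taken as the 2.5th to 97.5th percentile of repeated (for example bootstrap) predictions, that absolute uncertainty means the width of that interval, and that the percentage is taken relative to the mean prediction. None of these choices is confirmed by this README; see uncertainty.R and the paper for the actual procedure. All names are hypothetical.

```python
def percentile(sorted_vals, q):
    """Percentile by linear interpolation; q is a fraction in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def uncertainty(predictions):
    """Return (absolute, relative) uncertainty for one grid cell.

    absolute: width of the 2.5%-97.5% percentile interval (assumed).
    relative: absolute width as a percentage of the mean prediction,
              or None when the mean prediction is zero.
    """
    vals = sorted(predictions)
    lower = percentile(vals, 0.025)
    upper = percentile(vals, 0.975)
    mean = sum(vals) / len(vals)
    absolute = upper - lower
    relative = 100.0 * absolute / mean if mean > 0 else None
    return absolute, relative
```

Reporting both measures is useful because a cell with a low occurrence probability can have a small absolute interval that is nevertheless large relative to the prediction itself.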
Other
The correlations.csv file contains the correlations between the different predictor variables we considered for this study.
The Data_summary.docx file provides figures summarising the Aedes vexans data used in this study.
The figSM1_uncertainty_process.png file shows the process used for calculating the model uncertainty.
We also provide the file rasterTemplate.asc. This is a blank raster file showing the grid we have used for all our work in this study. This is a 1km grid in the coordinate reference system EPSG:28992.
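Files with the .asc extension are normally Esri ASCII grids, which begin with a short header of keyword and value pairs (such as ncols, nrows, xllcorner, yllcorner, cellsize and NODATA_value) followed by the cell values. Assuming rasterTemplate.asc follows that convention, the grid definition can be inspected with a few lines of Python. The function name `read_asc_header` is hypothetical and the header values in the example are made up for illustration, not taken from the actual file.

```python
def read_asc_header(lines):
    """Parse the leading 'keyword value' lines of an Esri ASCII grid.

    `lines` is any iterable of text lines (for example an open file).
    Parsing stops at the first line that is not a keyword/value pair.
    """
    header = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[0][0].isalpha():
            break  # first data row reached
        header[parts[0].lower()] = float(parts[1])
    return header


# Made-up header for illustration only.
sample = [
    "ncols 3",
    "nrows 2",
    "xllcorner 0",
    "yllcorner 300000",
    "cellsize 1000",
    "NODATA_value -9999",
    "-9999 1 2",
    "3 4 5",
]
header = read_asc_header(sample)
```

EPSG:28992 uses metres, so a 1km grid corresponds to a cellsize of 1000. With the real file, the call would be `read_asc_header(open("rasterTemplate.asc"))`.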
Code/Software
All coding was performed in R v4.3.1. We used the h2o package v3.42.0.2.
Available scripts:
- modelTraining.R
- This runs the autoML process for all the training datasets, makes predictions based on the model comparison 2 dataset (see accompanying paper for details) and records the variable importance for each individual model
- modelSelection.R
- This calculates which presence:absence ratio is optimal
- modelValidation.R
- This compares the final model with the validation dataset, including calculating a confusion matrix and the balanced accuracy
- variableImportance.R
- Calculates the variable importance for the model ensemble based on the training data with presence:absence ratio 1:2 and makes a graph
- uncertainty.R
- Finds the uncertainty in the model derivation process using bootstrap sampling. This also plots the results.
- futureScenario.R
- Makes future occurrence predictions for a given scenario (in this case the scenario RCP2.6/SSP1 - dry, but it is easily adapted to other scenarios)
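For readers unfamiliar with the validation metrics mentioned for modelValidation.R, balanced accuracy is the mean of sensitivity (the proportion of presences correctly predicted) and specificity (the proportion of absences correctly predicted). The following Python sketch shows the standard definition on made-up data; it is not a translation of the R script, and the function names are hypothetical.

```python
def confusion_matrix(observed, predicted):
    """Count (tp, fp, tn, fn) for binary labels, 1 = presence, 0 = absence."""
    tp = fp = tn = fn = 0
    for obs, pred in zip(observed, predicted):
        if obs == 1 and pred == 1:
            tp += 1
        elif obs == 0 and pred == 1:
            fp += 1
        elif obs == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity.

    Assumes at least one observed presence and one observed absence.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Made-up validation labels: 3 presences and 5 absences.
observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 0]
counts = confusion_matrix(observed, predicted)  # (2, 1, 4, 1)
score = balanced_accuracy(*counts)              # (2/3 + 4/5) / 2
```

Unlike plain accuracy, this metric is not inflated when absences greatly outnumber presences, which matters for data with unequal presence:absence ratios.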
Methods
Full details can be found in our paper: Current and future habitat suitability of the floodwater mosquito Aedes vexans (Meigen, 1830) in the Netherlands.
We used occurrence data collected by The Netherlands Centre for Monitoring of Vectors (CMV). Predictors used for modelling were temperature, precipitation, soil bulk density, soil clay content, shrub density, flood risk, surface water salinity, agricultural land cover, artificial land cover, distance to nature, permanent water and permanent wetland. We tried different models using training data in different presence:absence ratios. Modelling techniques tried were random forest, generalised linear models, XGBoost, gradient boosting machines and neural networks. We used an autoML approach to select the best models using R's h2o package. The final best model was an ensemble of several other models. To make future predictions, we used climate scenarios produced by the Dutch Meteorological Organisation together with the Dutch One Health SSPs. We considered 4 different scenarios and made predictions for 2050 on a 1km grid. Our model also allowed us to predict seasonal variation in the occurrence of this species and we also predicted how this would change in the future.
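The idea of a fixed presence:absence ratio can be illustrated with a short Python sketch that keeps all presence records and randomly draws a matching number of absence records. This README does not describe how the training datasets were actually assembled, so the random subsampling shown here is an assumption; the function name, the `seed` parameter and the example records are hypothetical.

```python
import random


def subsample_absences(presences, absences, ratio, seed=0):
    """Build a training set with a 1:`ratio` presence:absence ratio.

    All presences are kept; `ratio` absences per presence are drawn at
    random without replacement (fewer if not enough absences exist).
    """
    n_absent = min(len(absences), ratio * len(presences))
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return list(presences), rng.sample(list(absences), n_absent)


# Made-up record identifiers: 3 presences, 20 absences, ratio 1:2.
pres, absn = subsample_absences(["p1", "p2", "p3"],
                                ["a%d" % i for i in range(20)],
                                ratio=2)
```

With `ratio=2` this corresponds to the 1:2 ratio stored in the ratio2 sub-folder.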