Data from: Proximity to seabird colonies and water availability shape moss distributions in Antarctica
Data files
Oct 27, 2025 version files (167.94 KB):
- code.zip (46.21 KB)
- README.md (14.62 KB)
- sensitivity_analysis.xlsx (107.11 KB)
Nov 12, 2025 version files (33.49 MB):
- all_pred_prob_SB.dbf (5.10 MB)
- all_pred_prob_SB.prj (394 B)
- all_pred_prob_SB.shp (3.17 MB)
- all_pred_prob_SB.shx (60.76 KB)
- all_pred_prob.dbf (5.10 MB)
- all_pred_prob.prj (394 B)
- all_pred_prob.shp (3.17 MB)
- all_pred_prob.shx (60.76 KB)
- all_pred_sd_SB.dbf (5.10 MB)
- all_pred_sd_SB.prj (388 B)
- all_pred_sd_SB.shp (3.17 MB)
- all_pred_sd_SB.shx (60.76 KB)
- all_pred_sd.dbf (5.10 MB)
- all_pred_sd.prj (388 B)
- all_pred_sd.shp (3.17 MB)
- all_pred_sd.shx (60.76 KB)
- code.zip (46.21 KB)
- README.md (16.16 KB)
- sensitivity_analysis.xlsx (107.11 KB)
Abstract
Understanding species distributions across Antarctica is crucial for biodiversity conservation under climate change, but continental-scale analyses of key terrestrial species remain scarce. Here, we modeled distributions of 28 moss species across Antarctica using Log-Gaussian Cox Process models and environmental covariates, including topographic wetness index, distance to seabird colonies, and temperature. Broad-scale distributions were primarily driven by proximity to seabird colonies, while species exhibited distinct responses to water availability and temperature. Species exclusive to maritime Antarctica showed negative relationships with topographic wetness index, whereas continent-wide species responded positively to water accumulation potential, reflecting regional differences in water availability and habitat preferences. Bias-corrected predictions revealed the highest moss diversity in coastal regions, with inland areas supporting ecologically distinct assemblages. Our Bayesian modeling approach provides a foundation for forecasting biodiversity responses to environmental change in data-poor systems, offering critical insights for evidence-based conservation planning under increasing anthropogenic pressures.
Dataset DOI: 10.5061/dryad.wstqjq2zf
This repository contains the data sources and the code used for the paper "Proximity to seabird colonies and water availability shape moss distributions in Antarctica", published in Ecography, 10.1002/ecog.08166.
The main results of the paper can also be visualized interactively here: https://moss-app-c63b-prod.app.oceanum.io/
Data Availability Note
The input data files required to run the code are not included in this repository due to licensing restrictions and file size constraints. All data can be obtained from the publicly available sources listed below. We provide direct links and access instructions for each data source to facilitate reproduction of our analysis.
Files
sensitivity_analysis.xlsx: Results of the sensitivity analysis for prior settings in the Bayesian spatial models. The table includes identifiers (sp_id, mesh_id), prior settings, posterior summaries for the intercept and spatial field (mean ± credible intervals), and model performance metrics (DIC, CPO, WAIC, MLL).
code.zip: contains the code used in the manuscript
This dataset also includes shapefiles containing the spatial predictions from our species distribution models. These files can be used to reproduce the maps presented in the paper without re-running the computationally intensive modelling steps.
- all_pred_prob_SB.shp: Model predictions of presence probabilities for all species without sampling bias correction
  - Use these predictions to see the raw environmental suitability without accounting for proximity to research stations
- all_pred_prob.shp: Model predictions of presence probabilities for all species with sampling bias correction
  - These predictions exclude the distance to scientific stations as a covariate to account for sampling bias in the occurrence data
  - Use these predictions to understand the actual distribution patterns after correcting for uneven sampling effort
Both shapefiles include predicted presence probabilities for the 28 species studied, as well as the standard deviations of the predictions (uncertainty estimates).
Using the prediction shapefiles:
These shapefiles can be loaded directly into R using the sf package:
```r
library(sf)
# Load predictions without sampling bias correction
pred_with_bias <- st_read("all_pred_prob_SB.shp")
# Load predictions with sampling bias correction
pred_no_bias <- st_read("all_pred_prob.shp")
```
The predictions are provided at the same spatial resolution used in the analysis and cover all ice-free areas of Antarctica as defined by our study boundaries.
Code/Software
The R code consists of the scripts described below. Data sources are referenced by their number in the Data Sources list that follows the script descriptions.
01cleanData.R
Processes species occurrence data for analysis. It also creates the buffers of ice-free areas used as boundaries for model fitting and prediction.
Data used in this script come from:
- Data Source 1 (The biodiversity of ice-free Antarctica database): loaded as db object
- Data Source 2 (ADD rock outcrops): loaded as rockout object
- Data Source 3 (Antarctic coastline): loaded as coastline object
```r
# load data
db <- read_excel("data/Ant_Terr_Bio_Data_FINAL_July28_2021.xlsx", sheet=1)
rockout <- st_read("data/ADD_RockOutcrops_Landsat8.shp")
...
coastline <- st_read("add_coastline_high_res_line_v7_7.shp")
```
Data also come from Data Source 4 (Tropicos database), whose API was used to download all current bryophyte names in the database (as of August 2023):
```r
# read bryophytes database
tsv <- readr::read_tsv("data/Taxon.tsv")
```
02covariates.R
Processes geospatial data by upscaling topographic variables (TWI, slope, aspect), calculating derived metrics (northness, eastness), computing distances to important bird areas and research stations, and exporting the processed raster files for subsequent analysis.
Data used in this script come from:
- Data Source 2 (ADD rock outcrops): loaded as landsat object (same shapefile as above)
- Data Source 5 (REMA topographic variables): loaded as twi, aspect, slope rasters
- Data Source 6 (COMNAP research stations): loaded as stations object
- Data Source 7 (Important Bird Areas): loaded as iba object
```r
#### Load data ----------
landsat <- st_read("ADD_landsat_rockout_3031_60S_clean.shp")
# topographic variables
twi <- rast("data/topography/twi.tif")
aspect <- rast("data/topography/aspect.tif")
slope <- rast("data/topography/slope.tif")
# sampling
stations <- st_read("data/stations/COMNAP_Antarctic_Facilities.shp")
...
iba <- st_read("data/iba/ImportantBirdAreas.shp")
```
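The derived metrics mentioned for 02covariates.R (northness, eastness) are commonly obtained as the cosine and sine of aspect. The sketch below illustrates that convention with plain vectors; it assumes aspect is given in degrees clockwise from north, and the values are hypothetical, not taken from the authors' rasters.

```r
# Hypothetical aspect values in degrees, measured clockwise from north
aspect_deg <- c(0, 90, 180, 270)

# Convert to radians before applying trigonometric functions
aspect_rad <- aspect_deg * pi / 180

# Northness is 1 for north-facing and -1 for south-facing slopes;
# eastness is 1 for east-facing and -1 for west-facing slopes
northness <- cos(aspect_rad)
eastness  <- sin(aspect_rad)

round(data.frame(aspect_deg, northness, eastness), 3)
```

Splitting aspect this way avoids the discontinuity of a circular variable, where 359 degrees and 1 degree are numerically far apart but describe nearly the same orientation.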
Note: Generating Topographic Variables
The topographic variables (TWI, aspect, and slope) were derived from the Reference Elevation Model of Antarctica (REMA, reference 5) using QGIS SAGA tools. Full methodological details are provided in the paper. The workflow follows:
Kopecký M, Macek M, Wild J. Topographic Wetness Index calculation guidelines based on measured soil moisture and plant species composition. Science of the Total Environment. 2021 Feb 25;757:143785.
Processing steps:
- Preprocessing: Terrain sinks and flat pixels were removed from the DEM using the Fill Sinks XXL (Wang & Liu) function in QGIS.
- Slope and Aspect: Generated using the Slope, Aspect, Curvature function with a 3rd-degree polynomial and 10 parameters.
- TWI Calculation:
  - Flow accumulation (total catchment area) was calculated using the FD8 method with cell-specific flow dispersion based on maximum downslope gradient
  - Flow width and specific catchment area were set equal to the raster cell size
  - The Topographic Wetness Index was computed using slope, total catchment area, and flow width
The complete workflow is illustrated in Figure 1 of Kopecký et al. (2021).
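The final TWI step can be written out as a short sketch. It assumes the commonly used form TWI = ln(SCA / tan(slope)), with specific catchment area SCA = total catchment area / flow width. The function name, the `min_slope` guard, and the input values are hypothetical and are not taken from the authors' QGIS workflow.

```r
# Sketch of the standard TWI formula (assumed form, not the authors' code)
twi_from_terrain <- function(catchment_area, flow_width, slope_rad,
                             min_slope = 1e-3) {
  # Specific catchment area: upslope area per unit contour width
  sca <- catchment_area / flow_width
  # Guard against tan(0) = 0 on flat cells (hypothetical safeguard)
  slope_rad <- pmax(slope_rad, min_slope)
  log(sca / tan(slope_rad))
}

cell_size <- 100                      # hypothetical cell size in metres
tca       <- c(1e4, 1e5, 1e6)         # hypothetical catchment areas in m^2
slope     <- c(10, 5, 1) * pi / 180   # hypothetical slopes, in radians

twi <- twi_from_terrain(tca, cell_size, slope)
twi
```

Larger catchments and gentler slopes both raise the index, which is why it is used as a proxy for water accumulation potential.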
03organizeData.R
Projects all spatial data to a common CRS, interpolates missing values, scales environmental covariates, simplifies the ice-free boundary to use for modelling, and tests for collinearity among predictors for subsequent analysis. Saves all the files for the next script.
Maximum temperature and the sum of degree days above 0 °C in the season were taken from Data Source 8 (AntAirICE dataset), loaded as max_temp and degday_season, and calculated following the code available at github.com/evabendix/AntAirICE-processing.
```r
max_temp <- rast("data/antair/AntAir_Yr_max.tif")
degday_season <- rast("antair/Sum_DegDay_season.tif")
```
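The scaling and collinearity checks described for 03organizeData.R might look like the following sketch. It uses simulated covariate values in a data frame instead of rasters; the column names, the Pearson correlation, and the 0.7 threshold are illustrative assumptions and are not taken from the authors' script.

```r
set.seed(1)

# Simulated covariates (hypothetical values); degday is built to be
# strongly correlated with max_temp so that the check flags something
covs <- data.frame(
  twi      = rnorm(200, mean = 8, sd = 2),
  max_temp = rnorm(200, mean = -5, sd = 3)
)
covs$degday   <- 2 * covs$max_temp + rnorm(200, sd = 1)
covs$dist_iba <- rexp(200, rate = 1 / 50)

# Centre each covariate on 0 and scale it to unit standard deviation
scaled <- as.data.frame(scale(covs))

# Pairwise Pearson correlations among the scaled covariates
cor_mat <- cor(scaled, method = "pearson")

# Flag pairs above a hypothetical threshold, upper triangle only
threshold <- 0.7
high <- which(abs(cor_mat) > threshold & upper.tri(cor_mat), arr.ind = TRUE)
flagged <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]],
  r    = cor_mat[high]
)
flagged
```

When a pair is flagged, one of the two covariates is usually dropped so that coefficient estimates remain interpretable.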
04makeMeshes.R
Creates three different meshes with different resolutions to test for sensitivity of the models to mesh specification, and prepares species occurrence data for modelling (creates a list used in the subsequent scripts).
05modelsSPDE.R
Performs the sensitivity analysis on mesh and prior parameters, running parallel computations to test multiple combinations of meshes, spatial correlation ranges, and variance priors for every species included in the manuscript.
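The combinations tested in such a sensitivity analysis can be laid out as a grid, as in the sketch below. The `sp_id` and `mesh_id` names follow the identifier columns of sensitivity_analysis.xlsx, and the 28 species and three meshes come from the descriptions above; the prior values and the `fit_stub` function are hypothetical placeholders and do not reproduce the authors' settings or model calls.

```r
# One row per species x mesh x prior combination
grid <- expand.grid(
  sp_id       = 1:28,              # 28 moss species
  mesh_id     = 1:3,               # three mesh resolutions
  range_prior = c(50, 100, 200),   # hypothetical spatial range priors
  sigma_prior = c(0.5, 1),         # hypothetical variance priors
  KEEP.OUT.ATTRS = FALSE
)
nrow(grid)

# Placeholder for the model fit: returns the settings plus an empty metric
fit_stub <- function(settings) {
  data.frame(settings, waic = NA_real_)
}

# Sequential loop over combinations; a parallel version could swap
# lapply() for parallel::parLapply() with a cluster
results <- do.call(
  rbind,
  lapply(seq_len(nrow(grid)), function(i) fit_stub(grid[i, ]))
)
head(results)
```

Keeping every combination in a single table makes it straightforward to join the fitted metrics back to their settings afterwards.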
06sensitivityAnalysis.R
Analyses the sensitivity of the models to different mesh resolutions and prior specifications using mixed-effects models, creates visualisations of model parameters (intercept means, spatial field ranges), compares model fit metrics (DIC, CPO, WAIC), and generates tables and plots for the supplementary material.
Data for this script is available with this code.
```r
data <- read_excel("meshes_models/sensitivity_analysis.xlsx")
```
07chooseModels.R
Fits LGCP species distribution models for 28 Antarctic moss species with the mesh and priors selected in the sensitivity analyses and environmental covariates (temperature, topography, distance to stations/bird areas). Also makes spatial predictions both with and without the sampling bias variable, and saves results as spatial data files for further analysis.
08modelValidation.R
Loads previously built individual species models, converts predicted intensities to probabilities, creates species distribution maps, evaluates model performance with CRPS metrics, calculates observed versus predicted counts for each species, generates summaries of species richness patterns with and without biases, and exports objects for further analysis.
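One common way to convert a point-process intensity into a presence probability is the Poisson relation P(at least one occurrence) = 1 - exp(-lambda * area). The sketch below uses that relation as an assumption, since the exact conversion is defined in the authors' script; the function name and all values are hypothetical. It also shows expected richness per cell as the sum of species probabilities.

```r
# Probability of at least one occurrence in a cell under a Poisson
# process with intensity lambda (assumed conversion, not the authors' code)
intensity_to_prob <- function(lambda, area = 1) {
  1 - exp(-lambda * area)
}

lambda <- c(0, 0.1, 1, 5)   # hypothetical intensities per unit area
prob   <- intensity_to_prob(lambda, area = 1)
prob

# Expected species richness per cell: sum of presence probabilities
# across species (rows are cells, columns are hypothetical species)
cell_probs <- rbind(
  cell_a = c(0.9, 0.5, 0.1),
  cell_b = c(0.2, 0.1, 0.0)
)
expected_richness <- rowSums(cell_probs)
expected_richness
```

The probability saturates towards 1 as intensity grows, so differences among high-intensity cells are compressed on the probability scale.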
09modelComparison.R
Performs multivariate analyses (distance-based redundancy analysis, dbRDA) of predicted species presences, creates ordination axes in the RGB scale, and compares predicted species distributions with existing vegetation maps (Plantarctica).
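Mapping ordination axes to an RGB scale can be sketched as follows. To stay within base R, the example uses classical multidimensional scaling (`cmdscale`) as a stand-in for dbRDA, so it is not the authors' analysis; the simulated probabilities, the Euclidean distance, and the helper `rescale01` are hypothetical.

```r
set.seed(42)

# Simulated presence probabilities: 50 cells x 6 hypothetical species
probs <- matrix(runif(50 * 6), nrow = 50,
                dimnames = list(NULL, paste0("sp", 1:6)))

# Stand-in ordination: classical MDS on Euclidean distances, 3 axes
d    <- dist(probs)
axes <- cmdscale(d, k = 3)

# Rescale each axis to the 0-1 range expected by rgb()
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))
rgb_axes  <- apply(axes, 2, rescale01)

# One colour per cell: axis 1 = red, axis 2 = green, axis 3 = blue
cell_colours <- rgb(rgb_axes[, 1], rgb_axes[, 2], rgb_axes[, 3])
head(cell_colours)
```

Cells with similar predicted assemblages end up with similar colours, which lets a single map show compositional turnover.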
10paperFigures.R
Creates species richness maps, bivariate maps of model residuals, coefficient plots, probability distribution maps for three species with uncertainty estimates, and ordination plots that visualise community composition patterns across Antarctica based on distance-based redundancy analysis.
11partialPredictions.R
Generates partial predictions of species responses to three key environmental variables (distance to Important Bird Areas, temperature, topographic wetness), creates response curves for individual species, calculates predicted species richness along environmental gradients with confidence intervals, and conducts multivariate analyses of community composition changes. Results are included in the supplementary material.
/functions
Various functions used throughout the analysis. Separated by model fitting, prediction, evaluation, and handling of rasters.
Session Info
inlabru, INLA, and fmesher are key packages for the models, and they are under active development, so it is important to keep track of the versions used. The version of R is also important, as the packages are built for specific versions of R.
sessionInfo()
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_New Zealand.utf8 LC_CTYPE=English_New Zealand.utf8
[3] LC_MONETARY=English_New Zealand.utf8 LC_NUMERIC=C
[5] LC_TIME=English_New Zealand.utf8
time zone: Pacific/Auckland
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] terra_1.7-78 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
[5] dplyr_1.1.4 purrr_1.0.2 readr_2.1.5 tidyr_1.3.1
[9] tibble_3.2.1 ggplot2_3.5.1 tidyverse_2.0.0 sf_1.0-16
[13] inlabru_2.11.1 fmesher_0.2.0.9000 INLA_24.11.07-4 Matrix_1.7-1
[17] sp_2.1-4
loaded via a namespace (and not attached):
[1] utf8_1.2.4 generics_0.1.3 class_7.3-22 KernSmooth_2.23-24
[5] stringi_1.8.4 lattice_0.22-6 hms_1.1.3 magrittr_2.0.3
[9] grid_4.4.2 timechange_0.3.0 jsonlite_1.8.9 e1071_1.7-14
[13] DBI_1.2.3 fansi_1.0.6 scales_1.3.0 codetools_0.2-20
[17] cli_3.6.3 rlang_1.1.4 units_0.8-5 munsell_0.5.1
[21] splines_4.4.2 withr_3.0.1 tools_4.4.2 tzdb_0.4.0
[25] colorspace_2.1-0 vctrs_0.6.5 R6_2.5.1 proxy_0.4-27
[29] lifecycle_1.0.4 classInt_0.4-10 pkgconfig_2.0.3 pillar_1.9.0
[33] gtable_0.3.5 glue_1.7.0 Rcpp_1.0.12 tidyselect_1.2.1
[37] compiler_4.4.2
Data Sources
The input data files required to run the above-mentioned code are derived from the following sources:
1. The biodiversity of ice-free Antarctica database
(used in 01cleanData.R as db object)
- Citation: Terauds et al. (2025)
- DOI: 10.1002/ecy.70000
- How to access:
- Visit the article at https://doi.org/10.1002/ecy.70000
- Download the supplementary data file.
- Note: This dataset may have been updated since 2023, when we got access to the first version of this dataset.
2. Automatically extracted rock outcrop dataset for Antarctica
(used in 01cleanData.R as rockout object and 02covariates.R as landsat object)
- Citation: Gerrish, L. (2020)
- DOI: https://doi.org/10.5285/178ec50d-1ffb-42a4-a4a3-1145419da2bb
- How to access:
- Visit the UK Polar Data Centre at the DOI link above
- Download the "ADD_RockOutcrops_Landsat8.shp" shapefile (and associated files)
- Note: The file "ADD_landsat_rockout_3031_60S_clean.shp" is a processed version of this dataset, generated in script 01.
3. High resolution vector polylines of the Antarctic coastline
(used in 01cleanData.R as coastline object)
- Citation: Gerrish, L. et al. (2023)
- DOI: https://doi.org/10.5285/70AC5759-34EE-4F39-9069-2116DB592340
- How to access:
- Visit the UK Polar Data Centre at the DOI link above
- Download the high-resolution coastline shapefile (version 7.7)
4. Tropicos.org bryophyte taxonomy database
(used in 01cleanData.R as tsv object)
- Source: Tropicos.org, Missouri Botanical Garden
- Access date: 14 August 2023
- URL: https://tropicos.org
- How to access:
- Visit https://tropicos.org
- Use their API or download interface to obtain bryophyte taxonomy data
- Export as tab-separated values
- Note: Taxonomy may have been updated since August 2023
5. Reference Elevation Model of Antarctica (REMA)
(used in 02covariates.R as twi, aspect, slope rasters)
- Citation: Howat et al. (2022)
- DOI: https://doi.org/10.7910/DVN/EBW8UC
- How to access:
- Visit the Harvard Dataverse at the DOI link above
- Download the REMA DEM mosaic
- Process using QGIS SAGA tools following the methodology described in the "Generating Topographic Variables" section above
- Save processed files as:
  - data/topography/twi.tif
  - data/topography/aspect.tif
  - data/topography/slope.tif
6. COMNAP Antarctic facilities information
(used in 02covariates.R as stations object)
- Source: Council of Managers of National Antarctic Programs (COMNAP)
- URL: https://www.comnap.aq/antarctic-facilities-information
- License: Available for public use with attribution requirements
- How to access:
- Visit https://www.comnap.aq/antarctic-facilities-information
- Download the Antarctic facilities shapefile
7. Important Bird Areas in Antarctica
(used in 02covariates.R as iba object)
- Citation: Harris et al. (2015)
- URL: https://environments.aq/publications/important-bird-areas-in-antarctica/
- How to access:
- Download via Quantarctica: https://npolar.no/quantarctica/#toggle-id-11
- Or visit the Environmental Information Portal at the URL above
8. Antarctic daily mesoscale air temperature dataset (AntAirICE)
(used in 03organizeData.R as max_temp and degday_season rasters)
- Citation: Nielsen et al. (2024)
- DOI: https://doi.org/10.1038/s41597-023-02720-z
- How to access:
- Visit the article at the DOI link above
- Download the MODIS-derived air temperature dataset
- Process the data following the code available at: github.com/evabendix/AntAirICE-processing
- Calculate maximum temperature and sum of degree days above 0°C for the growing season
- Save processed files as:
  - data/antair/AntAir_Yr_max.tif
  - antair/Sum_DegDay_season.tif
Note: All data sources listed above are CC-BY licensed and are not included in this repository. Please download them from their original sources and cite appropriately when using their data.
Changes after Oct 27, 2025:
The repository has been updated to include shapefiles containing the spatial predictions from our species distribution models. These files can be used to reproduce the maps presented in the paper without needing to re-run the computationally intensive modelling steps.
