Data from: Species associations in joint species distribution models: from missing variables to conditional predictions

Data files

Version of May 22, 2026 (total size: 2.05 MB)

1_-_Fit_model_jSDM.R (5.54 KB)
2_-_Evaluate_prediction_model.R (3.51 KB)
3_-_Traits_correlation_with_association_matrix.R (9.70 KB)
Dataframe_STOC_input.RData (2.03 MB)
README.md (4.62 KB)

Abstract

Aim: The abundance and distribution of multiple species are interconnected through various mechanisms (e.g., biotic interactions or common responses to the environment) that shape communities. Joint species distribution models (jSDMs) have been introduced as a potential tool to integrate these mechanisms when modelling the distributions of multiple species, by inferring a residual matrix of species associations that could inform on biotic interactions. However, the direct link between these residual associations and biotic interactions has been challenged. Here, we test how data type, resolution, and sampling size affect the species associations identified by jSDMs, and how useful these associations are for predicting a species given the known state of the others (i.e. conditional prediction).
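For readers new to jSDMs, one common formulation is sketched below. It is a generic probit model, not necessarily the exact model fitted in the article, which may use a different link for abundance data or a latent-variable approximation of the residual covariance. The latent value of species $j$ at site $i$ is written as:

```latex
z_{ij} = \mathbf{x}_i^\top \boldsymbol{\beta}_j + \varepsilon_{ij}, \qquad
\boldsymbol{\varepsilon}_i = (\varepsilon_{i1}, \dots, \varepsilon_{iJ})^\top \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}), \qquad
y_{ij} = \mathbb{1}(z_{ij} > 0)

R_{jk} = \frac{\Sigma_{jk}}{\sqrt{\Sigma_{jj}\,\Sigma_{kk}}}
```

Here $\mathbf{x}_i$ holds the environmental covariates of site $i$ and $\boldsymbol{\beta}_j$ the responses of species $j$. The off-diagonal entries $R_{jk}$ of the residual correlation matrix are the "species associations". They capture covariation between species $j$ and $k$ that the covariates do not explain. This covariation may reflect biotic interactions or, as discussed in this study, missing environmental variables.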

Location: France

Taxon: Birds

Method: Using standardized co‐abundances of 40 common bird species across 7040 monitoring sites, together with eight environmental variables at high resolution (200 m), we compared jSDM residual associations across data types (abundance vs. occurrence), resolutions and sampling sizes. Additionally, we investigated correlations between residual associations and species functional similarity (based on eight traits). We then assessed to what extent residual associations contain valuable information for conditional predictions.

Results: Our results show that the species associations identified by jSDMs are strongly influenced by data resolution and sampling size, rather than by data type (abundance vs. occurrence). We find positive correlations between species associations and functional similarity, which challenges the inference of the negative biotic interactions expected from niche partitioning. However, using these high‐resolution residual species associations for conditional predictions improved predictive quality for all species (+235% on average), possibly because they synthesize missing variables that are difficult to capture in the field.
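The idea behind conditional prediction can be sketched with Gaussian conditioning on the residual covariance of a probit jSDM. If the latent values $z_o$ of the other species are known, the conditional mean of the target species becomes $\mu_t + \Sigma_{to}\Sigma_{oo}^{-1}(z_o - \mu_o)$ and its variance becomes $\Sigma_{tt} - \Sigma_{to}\Sigma_{oo}^{-1}\Sigma_{ot}$. This is a minimal plain-Python sketch under stated assumptions. The means, covariance matrix and species indices are hypothetical. The other species' latent values are treated as known exactly, whereas with presence/absence data they would in practice be sampled from truncated normals. It is not necessarily the procedure implemented in 2_-_Evaluate_prediction_model.R.

```python
"""Hypothetical sketch of a Gaussian conditional prediction for one species."""
import math


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def conditional_normal(mu, sigma, target, known):
    """Mean and variance of latent z[target] given z[k] = known[k] for each known k."""
    idx = sorted(known)
    s_oo = [[sigma[a][b] for b in idx] for a in idx]  # covariance among known species
    s_to = [sigma[target][k] for k in idx]            # covariance target vs known species
    dev = [known[k] - mu[k] for k in idx]             # deviation of known species from means
    w = solve(s_oo, dev)                              # Sigma_oo^-1 (z_o - mu_o)
    mean = mu[target] + sum(s * x for s, x in zip(s_to, w))
    v = solve(s_oo, s_to)                             # Sigma_oo^-1 Sigma_ot
    var = sigma[target][target] - sum(s * x for s, x in zip(s_to, v))
    return mean, var


def probit(z_mean, z_var):
    """P(z > 0) for z ~ N(z_mean, z_var), using the standard normal CDF."""
    return 0.5 * (1 + math.erf(z_mean / math.sqrt(2 * z_var)))


# Hypothetical 3-species example: species 0 is positively associated with species 1.
mu = [0.0, 0.5, -0.2]
sigma = [[1.0, 0.6, 0.1], [0.6, 1.0, 0.0], [0.1, 0.0, 1.0]]
m, v = conditional_normal(mu, sigma, target=0, known={1: 1.5, 2: -0.2})
print("marginal P(presence):", probit(mu[0], sigma[0][0]))
print("conditional P(presence):", probit(m, v))
```

Knowing that species 1 is above its expected latent value raises the predicted presence probability of species 0. This is the kind of information the residual associations can add to predictions.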

Main Conclusions: We highlight that species associations identified by jSDMs from fine‐resolution co‐abundance datasets do not reflect the biotic interactions expected from niche partitioning (as shown by their positive correlation with functional similarity), but more likely reflect missing environmental variables. Nonetheless, these residual associations contain valuable information that can enhance predictive performance through conditional predictions, which are currently underutilized.

https://doi.org/10.5061/dryad.bnzs7h4g3

Description of the data and file structure

Data and code description for Vallé et al. 2024 - Species associations in joint species distribution models: from missing variables to conditional predictions.

DATA-SPECIFIC INFORMATION FOR: Dataframe_STOC_input.RData

1. Number of variables: 96 (including species)

2. Number of rows: 26667

3. Key Variable List:

id_point_annee: ID survey point + year.
point: ID point.
carre: ID plot.
latitude_WGS84: latitude of the point in WGS84.
longitude_WGS84: longitude of the point in WGS84.
latitude_grid_WGS84: latitude of the plot in WGS84.
longitude_grid_WGS84: longitude of the plot in WGS84.
annee.x: Year of the data.

Used in models:

NDVI: NDVI at 250 m from MODIS, extracted with the MODISTools R package.
light_pollution: Light pollution at 1 km from Li et al. 2020.
tmp_spring: Mean spring temperature at 10 km resolution from the European climate rasters (E-OBS) provided by ECA&D, extracted with the R package climateExtract.
precip_spring: Total spring precipitation at 10 km resolution from the European climate rasters (E-OBS) provided by ECA&D, extracted with the R package climateExtract.
Landscape.PCAx: dimension x (1 to 4) of the PCA performed on CESBIO 2018 raster categories (see Methods in the article).
Habitat_V2: Habitat category according to Julliard et al. 2006 (see Table S2 for distribution).

4. Notes: This dataset is a subset of the French Breeding Bird Survey (FBBS), restricted to the years and species pool studied.
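Points (point) are nested within plots (carre), so a coarser-resolution dataset can be derived by pooling point counts within each plot and year. The sketch below is a hypothetical plain-Python illustration. It assumes the table has already been loaded as a list of row dictionaries, for example by reading the .RData file with the pyreadr package and converting the data frame with to_dict("records"). The species column names sp_A and sp_B are placeholders. Summing counts is only one possible aggregation rule and is not necessarily the one used in the article.

```python
from collections import defaultdict


def aggregate_to_plot(rows, species):
    """Sum point-level counts of each species within every (carre, annee.x) pair."""
    totals = defaultdict(lambda: dict.fromkeys(species, 0))
    for row in rows:
        key = (row["carre"], row["annee.x"])  # plot ID and survey year
        for sp in species:
            totals[key][sp] += row[sp]
    return dict(totals)


# Hypothetical rows: two points in plot C1 and one point in plot C2, same year.
rows = [
    {"point": "P1", "carre": "C1", "annee.x": 2015, "sp_A": 2, "sp_B": 0},
    {"point": "P2", "carre": "C1", "annee.x": 2015, "sp_A": 1, "sp_B": 3},
    {"point": "P3", "carre": "C2", "annee.x": 2015, "sp_A": 0, "sp_B": 1},
]
plots = aggregate_to_plot(rows, ["sp_A", "sp_B"])
```

Environmental covariates would also need to be brought to plot level (for instance by averaging them over the points of a plot) before fitting a coarser-resolution model.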

All variable descriptions, in the order in which they appear in the .RData file:

id_point_annee = ID survey point + year
NDVI = NDVI at 250m (dimensionless ratio)
light_pollution = Light pollution at 1km from Li et al. 2020
annee.x = Year
precip_spring = Total spring precipitation (in mm/day)
tmp_spring = Mean spring temperature (in °C)
p_type = Internal variable from the dataset extraction process, used to retrieve the Habitat_V2 category.
p_milieu = Internal variable from the dataset extraction process, used to retrieve the Habitat_V2 category.
point = ID point.
carre = ID plot.
altitude = Altitude (in m)
latitude_WGS84: latitude of the point in WGS84.
longitude_WGS84: longitude of the point in WGS84.
latitude_grid_WGS84: latitude of the plot in WGS84.
longitude_grid_WGS84: longitude of the plot in WGS84.
department = French department number (administrative division of France)
qualite_inventaire_STOC = Internal variable from the dataset extraction process, not used in this study.
habitat_principal = Internal variable from the dataset extraction process, used to retrieve the Habitat_V2 category.
habitat_secondaire = Internal variable from the dataset extraction process, used to retrieve the Habitat_V2 category.
foret_p = Boolean (TRUE/FALSE) indicating whether the plot is in forest habitat according to the Habitat_V2 category.
agricole_p = Boolean (TRUE/FALSE) indicating whether the plot is in farmland habitat according to the Habitat_V2 category.
urbain_p = Boolean (TRUE/FALSE) indicating whether the plot is in urban habitat according to the Habitat_V2 category.
ouvert_p = Boolean (TRUE/FALSE) indicating whether the plot is in open-field habitat according to the Habitat_V2 category.
species_names [columns 24 to 63]: number of individuals recorded for each species (one column per species).
Habitat_V2 = One of 18 habitat categories according to Julliard et al. 2006 (see Table S2 of the associated article for distribution and definitions).
test$shape_df.short_name = Biogeographical region, not used in this study.
Total = Total number of individual birds surveyed in the plot (sum of columns 24 to 63)
Landscape.PCA1 = first dimension of the PCA made with CESBIO 2018 raster categories (see Methods in article).
Landscape.PCA2 = second dimension of the PCA made with CESBIO 2018 raster categories (see Methods in article).
Landscape.PCA3 = third dimension of the PCA made with CESBIO 2018 raster categories (see Methods in article).
Landscape.PCA4 = fourth dimension of the PCA made with CESBIO 2018 raster categories (see Methods in article).
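The species block can be selected by position and converted to occurrence (presence/absence) data, which is the alternative data type compared in the article. This sketch assumes the column order matches the list above. R's 1-based columns 24 to 63 correspond to Python's 0-based slice [23:63], which gives 40 species columns. The function names and the two-species row are hypothetical. The Total check simply restates the definition of Total given above.

```python
def species_columns(colnames):
    """Return the species column names (R columns 24 to 63, 1-based)."""
    return colnames[23:63]


def to_occurrence(row, species):
    """Convert counts to presence (1) / absence (0) for each species."""
    return {sp: int(row[sp] > 0) for sp in species}


def total_matches(row, species):
    """Check that Total equals the sum of the species counts in the row."""
    return row["Total"] == sum(row[sp] for sp in species)


row = {"sp_A": 3, "sp_B": 0, "Total": 3}  # hypothetical row with two species
print(to_occurrence(row, ["sp_A", "sp_B"]))  # {'sp_A': 1, 'sp_B': 0}
```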

CODE MAIN INFORMATION

1 - Fit model jSDM --> script used to fit the jSDM, with settings that can be changed to reproduce all models/association matrices fitted in the article (Figures 1, 2 and 3) and described in Table S3.

2 - Evaluate prediction model --> script used to compute the evaluation metrics presented in Figure 5.

3 - Traits correlation with association matrix --> script used to test the correlation between associations and traits presented in Figure 4 (see the Methods and References sections of the article for the sources of the traits).
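The association-trait comparison can be illustrated with a Mantel-style test. The upper-triangle entries of the species association matrix are correlated with those of a functional similarity matrix, and the observed Pearson r is then compared against values obtained after randomly permuting the species labels. This is a hypothetical plain-Python sketch with invented 4-species matrices. The similarity measure, correlation statistic and test actually used in 3_-_Traits_correlation_with_association_matrix.R may differ.

```python
import math
import random


def upper_triangle(m):
    """Off-diagonal entries above the diagonal, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(assoc, simil, n_perm=999, seed=1):
    """Observed r between upper triangles and a two-sided permutation p-value."""
    n = len(assoc)
    y = upper_triangle(simil)
    r_obs = pearson(upper_triangle(assoc), y)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # permute species labels (rows and columns together)
        permuted = [[assoc[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if abs(pearson(upper_triangle(permuted), y)) >= abs(r_obs):
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical functional similarity and association matrices for 4 species.
simil = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.4],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.4, 0.8, 1.0],
]
assoc = [[0.5 * v for v in row] for row in simil]  # perfectly aligned with simil
r, p = mantel(assoc, simil, n_perm=199)
```

Permuting whole species (rows and columns together) rather than individual pairs keeps the dependence among pairs that share a species, which is why a Mantel-style test is preferred over a naive correlation test on the pairwise values.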