Benthic invertebrates in the Wadden Sea form a stable community characterized by facilitating relationships
Data files
Jan 31, 2025 version files (10.75 MB)
- DRYAD_Ecosphere.zip (10.75 MB)
- README.md (6.74 KB)
Abstract
Entire tidal food webs rely on the presence and productivity of benthic invertebrates. These invertebrates recycle nutrients, decompose organic matter, and function as food for myriad species at higher trophic levels. The interactions between benthic invertebrate species also play an important role in shaping the ecological functioning of these ecosystems. Here, we used a deep-learning species distribution model to characterise the interspecific interactions occurring in an intertidal benthic invertebrate community while accounting for abiotic factors. The data include > 30 000 samples collected between 2008 and 2020, over a spatial grid of more than 2 400 km² in the Wadden Sea. The benthic invertebrates in the Wadden Sea were shown to form a stable community where species engage in relatively few strong interactions in a larger network of weak interactions. This corroborates classical theory on stability-connectivity relations. We provide a stepping stone for species-specific analysis by showing that the number of interactions of a species is linked to its functional traits. However, the biological interpretation of these links remains open. We conclude that rather than posing a catch-all solution for improving our understanding of benthic invertebrate communities, our approach provides a baseline interaction mapping tool and starting point for more targeted experiments to elucidate underlying mechanisms.
README: Benthic invertebrates in the Wadden Sea form a stable community characterised by facilitating relationships
https://doi.org/10.5061/dryad.gqnk98svd
This repository contains information accompanying the manuscript "Benthic invertebrates in the Wadden Sea form a stable community characterized by facilitating relationships" (2024, accepted).
Description of the data and file structure
The data and scripts used in the analysis presented in the manuscript are contained in the DRYAD folder. The folder has two subdirectories (DATA and CODE). Note: The sampling data from the SIBES campaign used in the Python code, including data on sediment composition and on species abundance and biomass per sample, are stored and accessible via Bijleveld, A. I., Tacoma, M. & Koolhaas, A. SIBES dataset (2024) doi:10.25850/nioz/7b.b.ug.
- The DATA folder contains three subdirectories:
- heatmap_data (1 file)
- All_species_variables_feature_importance.xlsx - This dataset, with columns delimited by semicolons (;), lists the Shapley feature weights of all covariates in the model: the covariate (column 1), the predicted target species (column 2), the feature weight (column 3), the type of relationship (column 4), and the associated PR AUC score of the DL-SDM of the target species (columns 5 and 6).
- This dataset was used with the script Heatmap.R in the CODE/R subdirectory to create a heatmap of the abiotic interactions and interspecific interactions across species.
- trait_data (1 file)
- TRAIT_MATRIX.xlsx - This dataset contains the trait scoring from Clare et al. 2022 (https://doi.org/10.1038/s41597-022-01442-y) for each species for which a DL-SDM was run, plus a column listing the number of interspecific interactions identified for each of these species.
- This dataset was used with the script Trait_PCA_table.R in the CODE/R subdirectory to conduct a PCA of trait structuring and a multivariate regression analysis, with the PCA trait loading scores on each principal component as covariates and the number of interspecific interactions of a species as the dependent variable.
- Empty cells (-) in the columns GENUS or SIBES GENUS mean that individuals could only be classified down to the family level.
- raw_abiotic_data (5 files)
- All abiotic variable data used as features in the DL-SDM are stored here, spread over five .tif files and two .csv files.
- dvd2019_20m_wgs84.tif - A raster with the dryfall duration in the Wadden Sea used to extract covariate values at each sampling location using the 2.Adding_covariates.ipynb notebook located in the CODE/PYTHON subdirectory.
- salinity_raster.tif - A raster with the salinity in the Wadden Sea used to extract covariate values at each sampling location using the 2.Adding_covariates.ipynb notebook located in the CODE/PYTHON subdirectory.
- shear_stress_EPSG4326.tif - A raster with the bed shear stress in the Wadden Sea used to extract covariate values at each sampling location using the 2.Adding_covariates.ipynb notebook located in the CODE/PYTHON subdirectory.
- wave_forcing_EPSG4326.tif - A raster with the wave forcing in the Wadden Sea used to extract covariate values at each sampling location using the 2.Adding_covariates.ipynb notebook located in the CODE/PYTHON subdirectory.
- zoutgehalte.tif - A secondary raster with the salinity in the Wadden Sea used to reproject salinity_raster.tif using the 2.Adding_covariates.ipynb notebook located in the CODE/PYTHON subdirectory.
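The covariate extraction described for the rasters above amounts to looking up, for each sampling location, the raster cell that contains it. The following is a minimal sketch of that lookup, not the code in 2.Adding_covariates.ipynb (which presumably relies on a geospatial raster library). It assumes a north-up raster whose origin is the top-left corner; the function name `sample_raster`, the toy grid, and the coordinates are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def sample_raster(
    grid: Sequence[Sequence[float]],
    origin_x: float,
    origin_y: float,
    pixel_width: float,
    pixel_height: float,
    x: float,
    y: float,
) -> Optional[float]:
    """Return the value of the raster cell containing point (x, y).

    Assumption: a north-up raster, so (origin_x, origin_y) is the top-left
    corner, columns grow with x and rows grow as y decreases.
    Returns None when the point falls outside the raster.
    """
    col = int((x - origin_x) // pixel_width)
    row = int((origin_y - y) // pixel_height)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Hypothetical 3 x 3 salinity raster with 0.1 degree cells (toy values).
toy_salinity = [
    [28.0, 29.0, 30.0],
    [27.0, 28.5, 29.5],
    [26.0, 27.5, 29.0],
]

# Hypothetical sampling locations as (longitude, latitude) pairs.
stations = [(5.15, 53.45), (5.25, 53.25), (4.90, 53.45)]

# One covariate value per station; None marks a station outside the raster.
covariate = [
    sample_raster(toy_salinity, 5.0, 53.5, 0.1, 0.1, lon, lat)
    for lon, lat in stations
]
```

For this lookup to be valid, the sampling coordinates and the raster must share one coordinate reference system, which is presumably why several of the rasters listed above carry the WGS84 / EPSG:4326 designation in their file names.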
- The CODE folder contains two subdirectories:
- PYTHON (3 files)
- Here, all the scripts related to loading the raw data, data preparation, and the DL-SDM analysis are presented in three Jupyter notebook files.
- 1.Species_selection.ipynb - The first script to run, taking the raw biotic data as input and creating a dataset with the 25 most common species with non-zero abundances as output (saved as df_abundance_nz.csv), used as input in the next notebook. In addition, the notebook can be used to check the interannual variation in co-occurrence between species.
- 2.Adding_covariates.ipynb - Takes the df_abundance_nz.csv file created in the previous notebook as input. Then uses the environmental data contained in the raw_abiotic_data subdirectory to add environmental covariates to each sample. Output is stored in df_abundance_nz_covariates_no_radius.csv.
- 3.DL_SDM_Imbalanced_class_approach.ipynb - Takes the df_abundance_nz_covariates_no_radius.csv created in the previous script as input and uses it to run a DL-SDM for each of the target species, with performance measured in PR AUC and stored in species performance_8_.csv.
- For feature importance, two separate outputs are generated for each DL-SDM: a .csv file listing the mean Shapley value for each covariate across all subsampled points (8 feature importance (species name) abundance m2.csv), and a .png file visualizing the individual Shapley values across all subsampled points (8 feature plot (species name) abundance m2.png).
- These outputs have been combined across all species in All_species_variables_feature_importance.xlsx in the DATA/heatmap_data subdirectory. The plots indicate the type of relationship: high positive Shapley values paired with high feature values (and vice versa) indicate a 'positive relationship', high positive Shapley values paired with low feature values (and vice versa) indicate a 'negative relationship', and a mixture of these represents an 'indeterminate' relationship.
- R (2 files)
- Here, all the scripts related to analyzing the feature importances (as abiotic or interspecific interactions) and the relationship between interspecific interactions and traits are stored.
- heatmap.R - Takes All_species_variables_feature_importance.xlsx as input and creates heatmaps of the biotic and abiotic relationships in the dataset.
- Trait_PCA_table.R - The first part of the script takes the TRAIT_MATRIX.xlsx as input and fits a categorical principal component analysis on all traits (except for the number of interspecific interactions) using the Gifi package to structure traits. The second part of the script takes the identified principal components as input and examines the relationship between the number of interspecific interactions of a species and trait structuring across the principal components using multivariate linear regression.
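The relationship types described for the Shapley feature plots of 3.DL_SDM_Imbalanced_class_approach.ipynb ('positive', 'negative', 'indeterminate') were read off the plots. As a rough illustration of the same idea, the sketch below labels a covariate by the sign of the correlation between its feature values and its Shapley values. This is an assumption made for illustration, not the procedure used in the manuscript; the function names and the 0.5 cut-off are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def classify_relationship(
    feature_values: Sequence[float],
    shap_values: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not from the manuscript
) -> str:
    """Label a covariate from paired feature values and Shapley values.

    High Shapley values at high feature values give a strong positive
    correlation ('positive'); the reverse pattern gives 'negative';
    anything weaker than the threshold is 'indeterminate'.
    """
    r = pearson(feature_values, shap_values)
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "indeterminate"


# Toy values for one covariate across four subsampled points.
features = [1.0, 2.0, 3.0, 4.0]
increasing = classify_relationship(features, [-0.2, -0.1, 0.1, 0.2])
decreasing = classify_relationship(features, [0.2, 0.1, -0.1, -0.2])
mixed = classify_relationship(features, [0.2, -0.2, -0.2, 0.2])
```

A correlation-based label is coarser than visual inspection: a U-shaped response, as in the `mixed` example, ends up as 'indeterminate' even though the plot would show a clear pattern.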