Linking environmental gradients, functional traits, and phylogenetic structure in Meloidae (Coleoptera) assemblages of inner western Anatolia
Data files
Version of Mar 12, 2026 (254.18 KB in total):
- environment_filt.csv (145.29 KB)
- HMSC_Script1.R (13.74 KB)
- HMSC_Script2.R (1.37 KB)
- HMSC_Script3.R (9.30 KB)
- HMSC_Script4.R (1.85 KB)
- HMSC_Script5.R (13.74 KB)
- HMSC_Script6.R (18.31 KB)
- HMSC_Script7.R (6.20 KB)
- README.md (8.43 KB)
- species_incidence.csv (32.69 KB)
- traits.csv (3.28 KB)
Abstract
This dataset contains species incidence records, environmental predictors, trait information, and the R scripts used for the Hierarchical Modelling of Species Communities (HMSC) analysis of Meloidae (Coleoptera) assemblages in the Inner Western Anatolia region of Türkiye. It includes three input files: (i) species_incidence.csv, a presence–absence matrix of Meloidae species across sampled sites; (ii) environment_filt.csv, a matrix of environmental predictors (bioclimatic, topographic, land-cover, and vegetation variables); and (iii) traits.csv, species traits covering larval dispersal strategy (phoresy), larval host type, and hind-wing morphology. In addition, seven R scripts (HMSC_Script1–7.R) provide the complete workflow for data processing, model fitting, convergence assessment, prediction, and visualization.
Dataset DOI: 10.5061/dryad.9w0vt4bv5
Description of the data and file structure
This dataset was collected to investigate the environmental drivers, trait–environment interactions, and phylogenetic structure of Meloidae (Coleoptera) communities in Inner Western Anatolia, Türkiye. Field surveys were conducted in 2019, 2021, and 2022 at 1416 sites, of which 487 were retained for modeling. Species presence–absence data, site-level environmental predictors, and species trait information were assembled to fit Hierarchical Modelling of Species Communities (HMSC) models. The data support analyses of species distributions, community structure, and biodiversity responses to environmental gradients.
Files and variables
File: traits.csv
Description: Species-level traits used in HMSC models.
Variables
- species: Species name
- genus, tribus, subfamily: Taxonomic classification
- triungulin_phoresy: Presence/absence of phoresy in first-instar larvae (0/1)
- triungulin_host: Host type (bee nests / grasshopper egg pods)
- CS_mean: Mean centroid size of male hind wings derived from geometric morphometric analysis. Centroid size is calculated as the square root of the summed squared distances of all landmarks from their centroid. Unit: pixels (relative geometric morphometric units)
- CS_cv: Coefficient of variation of centroid size (dimensionless)
- PC1, PC2, PC3: Principal components summarizing hind wing shape variation derived from geometric morphometric analysis
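The centroid-size definition above, and the coefficient of variation reported as CS_cv, can be sketched in a few lines of Python. This is a minimal illustration of the standard formulas, not the authors' morphometric pipeline; it assumes 2-D landmark coordinates and assumes that CS_cv is the sample standard deviation divided by the mean, with no ×100 percentage scaling.

```python
import math
from statistics import mean, stdev


def centroid_size(landmarks):
    """Square root of the summed squared distances of 2-D landmarks from their centroid."""
    n = len(landmarks)
    cx = sum(x for x, _ in landmarks) / n
    cy = sum(y for _, y in landmarks) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in landmarks))


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (dimensionless ratio)."""
    return stdev(values) / mean(values)


# Hypothetical example: four landmarks at the corners of a 2 x 2 square.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(centroid_size(square))                         # sqrt(8), about 2.828
print(coefficient_of_variation([10.0, 12.0, 14.0]))  # 2 / 12, about 0.167
```

Because centroid size is computed from distances to the centroid, it is unaffected by translating or rotating the landmark configuration, which is why it serves as the size measure separate from the shape components (PC1 to PC3).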
File: species_incidence.csv
Description: Presence–absence data matrix of Meloidae species across sites.
Variables
- id: Unique site identifier
- All other columns: one column per species, coded as presence (1) or absence (0).
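A minimal sketch of loading species_incidence.csv and deriving per-site observed richness and per-species prevalence (the quantity a rare-species filter is usually based on). It assumes a comma-separated file with a header row, an id column, and integer 0/1 entries; the species names in the example are hypothetical placeholders.

```python
import csv
import io


def read_incidence(handle):
    """Read a site-by-species 0/1 table keyed by an 'id' column."""
    reader = csv.DictReader(handle)
    species = [name for name in reader.fieldnames if name != "id"]
    matrix = {}
    for row in reader:
        values = {sp: int(row[sp]) for sp in species}
        bad = [sp for sp, v in values.items() if v not in (0, 1)]
        if bad:
            raise ValueError(f"site {row['id']}: non-binary values for {bad}")
        matrix[row["id"]] = values
    return species, matrix


def site_richness(matrix):
    """Observed number of species per site."""
    return {site: sum(values.values()) for site, values in matrix.items()}


def species_prevalence(species, matrix):
    """Number of sites at which each species was recorded."""
    return {sp: sum(values[sp] for values in matrix.values()) for sp in species}


# Hypothetical two-site, three-species table (real species columns differ).
demo = io.StringIO("id,sp_a,sp_b,sp_c\nS1,1,0,1\nS2,0,0,1\n")
species, matrix = read_incidence(demo)
print(site_richness(matrix))                # {'S1': 2, 'S2': 1}
print(species_prevalence(species, matrix))  # {'sp_a': 1, 'sp_b': 0, 'sp_c': 2}
```

For the real file, pass an open handle, e.g. `with open("species_incidence.csv", newline="") as fh: species, matrix = read_incidence(fh)`.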
File: environment_filt.csv
Description: Environmental predictors at each site.
Variables
- id: Site identifier (matches species_incidence.csv)
- Climatic variables (bio_1–bio_19): Bioclimatic variables derived from WorldClim v2.1 (Fick & Hijmans 2017). These variables summarise annual trends, seasonality, and extreme or limiting environmental factors related to temperature and precipitation; examples include annual mean temperature, temperature seasonality, precipitation of the wettest month, and precipitation seasonality.
- Topographic variables: Derived from the EarthEnv topography dataset (Amatulli et al. 2018), based on SRTM elevation data.
  - elev: Elevation above sea level (meters).
  - roughness: Terrain roughness index, representing variation in elevation within a grid cell (meters).
  - slope: Terrain slope (degrees).
  - tpi: Topographic Position Index, indicating whether a location lies on a ridge, slope, or valley relative to the surrounding terrain (dimensionless); positive values indicate ridges or hilltops, negative values indicate valleys.
  - tri: Terrain Ruggedness Index, representing elevation heterogeneity within the surrounding area (dimensionless).
- Land cover variables: Derived from Copernicus Global Land Monitoring Service land cover maps (100 m; Buchhorn et al. 2020). Values are the proportion of each land cover type within the grid cell (range 0–1).
  - crops: Proportion of cropland cover.
  - grass: Proportion of grassland cover.
  - shrub: Proportion of shrubland cover.
- Local environmental conditions:
  - instant_temperature: Air temperature measured in situ at the time of sampling (°C).
  - monthly_temperature: Mean air temperature of the sampling month, derived from WorldClim (°C).
  - monthly_precipitation: Total precipitation during the sampling month (mm).
  - NDVI: Normalized Difference Vegetation Index derived from MODIS satellite imagery (MOD13Q1), representing vegetation greenness and productivity (dimensionless; range approximately −1 to 1).
- n, e: Latitude (northing) and longitude (easting) in decimal degrees (WGS84).
- date: Sampling date (DD.MM.YYYY).
- hour: Sampling hour (24-hour format).
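The shared id column, the DD.MM.YYYY date field, and the WGS84 n/e coordinates can be checked with a short sketch like the one below. It is illustrative only: it treats the Earth as a sphere of radius 6371 km for the distance calculation, and it does not reflect how HMSC_Script1.R actually handles coordinates for the spatial random effect, which this README does not specify. The site ids in the example are hypothetical.

```python
import math
from datetime import datetime


def parse_sampling_date(text):
    """Parse a DD.MM.YYYY string such as '05.06.2021' into a date."""
    return datetime.strptime(text.strip(), "%d.%m.%Y").date()


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two WGS84 points, approximating the Earth as a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def unmatched_ids(env_ids, incidence_ids):
    """Site ids present in only one of the two files (both lists should be empty)."""
    env, inc = set(env_ids), set(incidence_ids)
    return sorted(env - inc), sorted(inc - env)


print(parse_sampling_date("05.06.2021"))           # 2021-06-05
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))  # one degree along the equator, about 111.19 km
print(unmatched_ids(["S1", "S2"], ["S2", "S3"]))   # (['S1'], ['S3'])
```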
File: HMSC_Script1.R
Description: Data preparation and model setup for HMSC. Imports species incidence, environmental predictors, and functional trait data; filters rare species; preprocesses and standardizes environmental variables; reduces multicollinearity among predictors; and constructs the phylogenetic and trait data structures. Defines the study design and spatial and temporal random effects, and builds the unfitted HMSC presence–absence model used for subsequent Bayesian inference.
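The preprocessing steps named for HMSC_Script1.R (rare-species filtering, standardization, and reducing multicollinearity) can be sketched as follows. The minimum-prevalence threshold, the |r| < 0.7 correlation cut-off, and the greedy keep-in-order rule are hypothetical choices for illustration; the script may use different thresholds or another method such as variance inflation factors.

```python
from statistics import mean, stdev


def filter_rare(prevalence, min_sites):
    """Keep species recorded at min_sites sites or more (threshold is hypothetical)."""
    return [sp for sp, n in prevalence.items() if n >= min_sites]


def standardize(values):
    """Centre to mean 0 and scale by the sample standard deviation, as R's scale() does."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def drop_collinear(predictors, threshold=0.7):
    """Greedy screen: keep a predictor only if |r| < threshold with every one kept so far."""
    kept = []
    for name, values in predictors.items():
        if all(abs(pearson(values, predictors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical data; whatever thresholds HMSC_Script1.R uses are not reproduced here.
print(filter_rare({"sp_a": 12, "sp_b": 2, "sp_c": 5}, min_sites=5))  # ['sp_a', 'sp_c']
print(drop_collinear({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]}))  # ['a', 'c']
```

The greedy screen keeps whichever member of a correlated pair comes first, so listing predictors in order of ecological priority determines which one survives.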
File: HMSC_Script2.R
Description: Bayesian model fitting using MCMC. Runs Markov chain Monte Carlo sampling for the unfitted HMSC model using multiple parallel chains and saves posterior samples for all parameters, providing the basis for convergence diagnostics, model evaluation, and ecological inference in subsequent analyses.
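The amount of sampling in Hmsc is controlled by the samples, thin, transient, and nChains arguments of Hmsc::sampleMcmc. The arithmetic linking them can be sketched as below; the values in the example are hypothetical and are not the settings used in HMSC_Script2.R.

```python
def mcmc_budget(samples, thin, transient, n_chains):
    """Iteration counts implied by Hmsc-style sampling settings.

    transient: burn-in iterations discarded at the start of each chain
    thin:      iterations between consecutive recorded draws
    samples:   recorded posterior draws per chain
    """
    per_chain = transient + samples * thin
    return {
        "iterations_per_chain": per_chain,
        "total_iterations": per_chain * n_chains,
        "retained_draws": samples * n_chains,
    }


# Hypothetical settings: 250 draws per chain, thin = 100, burn-in of 12,500, 4 chains.
print(mcmc_budget(samples=250, thin=100, transient=12_500, n_chains=4))
# {'iterations_per_chain': 37500, 'total_iterations': 150000, 'retained_draws': 1000}
```

Because retained draws are pooled across chains for inference, adding chains enlarges the posterior sample without lengthening any single chain, which is what makes running them in parallel attractive.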
File: HMSC_Script3.R
Description: MCMC convergence diagnostics and summary outputs. Evaluates convergence and sampling efficiency of the fitted HMSC model by computing the Gelman–Rubin potential scale reduction factor (PSRF) and effective sample size (ESS) for key parameter groups. Produces summary tables and diagnostic plots used to assess model convergence prior to interpretation of results.
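For intuition about these diagnostics, a textbook version of the Gelman–Rubin PSRF and a simple autocorrelation-based ESS are sketched below. These are simplified forms: coda::gelman.diag and coda::effectiveSize, which the package list suggests the scripts rely on, use refined estimators (a sampling-variability correction and a spectral-density estimate, respectively), so values will not match exactly. PSRF values close to 1 indicate that the chains agree.

```python
from statistics import mean, variance


def psrf(chains):
    """Basic Gelman-Rubin potential scale reduction factor for one parameter.

    chains: list of m >= 2 equal-length lists of posterior draws.
    """
    m, n = len(chains), len(chains[0])
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)  # between-chain
    w = mean([variance(c) for c in chains])                              # within-chain
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5


def autocorrelation(x, lag):
    """Lag-k sample autocorrelation of a single chain."""
    mu = mean(x)
    denom = sum((v - mu) ** 2 for v in x)
    return sum((x[t] - mu) * (x[t + lag] - mu) for t in range(len(x) - lag)) / denom


def effective_sample_size(x):
    """n / (1 + 2 * sum of autocorrelations), truncated at the first negative lag."""
    rho_sum = 0.0
    for lag in range(1, len(x)):
        rho = autocorrelation(x, lag)
        if rho < 0:
            break
        rho_sum += rho
    return len(x) / (1 + 2 * rho_sum)


print(psrf([[1, 2, 3, 4], [11, 12, 13, 14]]))       # about 5.55: chains disagree
print(effective_sample_size([1, -1] * 50))          # 100.0
print(effective_sample_size([0] * 50 + [1] * 50))   # about 3.0: strongly autocorrelated
```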
File: HMSC_Script4.R
Description: Cross-validation and predictive model assessment. Performs k-fold cross-validation of the fitted HMSC model: in each fold, a subset of sampling units is withheld, the model is refitted to the remaining data, and out-of-sample predictions are generated for the withheld units. Saves fold-specific prediction outputs used to evaluate model predictive performance.
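Hmsc provides createPartition for assigning sampling units to cross-validation folds. The plain-Python sketch below shows the same idea (each site lands in exactly one fold, folds have near-equal size, one fold is withheld at a time); the fold count of 5 and the seed are hypothetical, as this README does not state the k used in HMSC_Script4.R.

```python
import random
from collections import Counter


def create_partition(site_ids, n_folds, seed=1):
    """Randomly assign each site to one of folds 1..n_folds with near-equal fold sizes."""
    ids = list(site_ids)
    random.Random(seed).shuffle(ids)
    return {site: i % n_folds + 1 for i, site in enumerate(ids)}


def train_test_split(partition, fold):
    """Sites used to refit the model, and withheld sites that are predicted out of sample."""
    train = [s for s, f in partition.items() if f != fold]
    test = [s for s, f in partition.items() if f == fold]
    return train, test


# Hypothetical: 487 retained sites split into 5 folds.
sites = [f"S{i}" for i in range(1, 488)]
partition = create_partition(sites, n_folds=5)
print(sorted(Counter(partition.values()).values()))  # [97, 97, 97, 98, 98]
```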
File: HMSC_Script5.R
Description: Model evaluation and comparison (cross-validated vs in-sample). Summarises predictive performance of the HMSC model by comparing cross-validation (out-of-sample) fit metrics with in-sample fit metrics, and computes WAIC for model comparison. Exports tables and diagnostic summaries used to report model performance and to check potential overfitting.
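As an illustration of the kinds of metrics involved, the sketch below computes AUC and Tjur's R² (the measures Hmsc::evaluateModelFit reports for probit presence–absence models) from observed incidences and predicted probabilities, plus WAIC from a matrix of pointwise log-likelihoods. Inputs are hypothetical. WAIC is returned on the deviance scale here; since implementations differ in scaling (for example, per observation), values should only be compared within one convention.

```python
import math
from statistics import mean, variance


def tjur_r2(y, p):
    """Mean predicted probability at presences minus mean at absences."""
    return mean([pi for yi, pi in zip(y, p) if yi == 1]) - mean([pi for yi, pi in zip(y, p) if yi == 0])


def auc(y, p):
    """Probability that a random presence outranks a random absence (ties count 0.5)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def waic(loglik):
    """WAIC = -2 * (lppd - p_waic); loglik[s][i] is observation i under posterior draw s."""
    lppd = p_waic = 0.0
    for i in range(len(loglik[0])):
        column = [draw[i] for draw in loglik]
        top = max(column)  # subtract the max so the log-mean-exp is numerically stable
        lppd += top + math.log(mean([math.exp(v - top) for v in column]))
        p_waic += variance(column)
    return -2 * (lppd - p_waic)


y = [1, 1, 0, 0]          # hypothetical observed incidences
p = [0.9, 0.8, 0.3, 0.1]  # hypothetical predicted probabilities
print(auc(y, p), round(tjur_r2(y, p), 2))  # 1.0 0.65
```

Computing the same metrics on cross-validated and on in-sample predictions is what exposes overfitting: a large drop out of sample suggests the model is fitting noise.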
File: HMSC_Script6.R
Description: Posterior inference and visualisation (Beta, Gamma, Omega). Extracts posterior estimates from the fitted HMSC model to summarise (i) environmental responses (Beta), (ii) trait–environment interactions (Gamma), and (iii) residual species associations at each random-effect level (Omega). Produces variance partitioning summaries, heatmaps, and exports posterior mean/support tables used for figures and interpretation in the manuscript.
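The posterior mean/support tables can be understood through the summary below, computed for a single coefficient from its posterior draws; it mirrors the mean, support, and supportNeg quantities returned by Hmsc::getPostEstimate. The 0.95 support level used to assign a sign is a common convention, assumed here rather than taken from the scripts, and the draws are hypothetical.

```python
from statistics import mean


def summarize_draws(draws, level=0.95):
    """Posterior mean and sign support for one coefficient (e.g. one Beta or Gamma entry)."""
    n = len(draws)
    support = sum(d > 0 for d in draws) / n      # posterior Pr(coefficient > 0)
    support_neg = sum(d < 0 for d in draws) / n  # posterior Pr(coefficient < 0)
    if support >= level:
        direction = "+"
    elif support_neg >= level:
        direction = "-"
    else:
        direction = "0"  # neither sign supported at this level
    return {"mean": mean(draws), "support": support, "supportNeg": support_neg, "sign": direction}


# Hypothetical posterior draws for two coefficients.
print(summarize_draws([0.2, 0.5, 0.1, 0.4])["sign"])    # +
print(summarize_draws([-0.3, 0.2, -0.1, 0.4])["sign"])  # 0
```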
File: HMSC_Script7.R
Description: Total and marginal effects on species richness. Quantifies total and marginal effects of environmental predictors on community-level species richness using posterior predictions from the fitted HMSC model. Summarises directional responses and associated posterior probabilities, and exports tables used to interpret richness patterns along environmental gradients.
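Community-level richness is typically obtained from HMSC predictions by summing species' occurrence probabilities at each point of an environmental gradient, separately for each posterior draw. The sketch below shows that step and a posterior probability that richness increases along the gradient. For a total effect, the non-focal predictors are allowed to co-vary with the focal one; for a marginal effect, they are held fixed (in Hmsc this is set when the gradient is built with constructGradient). The nested-list layout and the values are hypothetical.

```python
def richness_draws(pred):
    """pred[s][g][j]: occurrence probability of species j at gradient point g in draw s.

    Expected richness at a gradient point is the sum of the species' occurrence probabilities.
    """
    return [[sum(point) for point in draw] for draw in pred]


def prob_richness_increases(richness):
    """Share of draws in which richness at the last gradient point exceeds the first."""
    return sum(r[-1] > r[0] for r in richness) / len(richness)


pred = [  # hypothetical: 2 draws x 2 gradient points x 2 species
    [[0.1, 0.2], [0.5, 0.6]],
    [[0.4, 0.4], [0.3, 0.2]],
]
rich = richness_draws(pred)
print(prob_richness_increases(rich))  # 0.5
```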
Code/software
All analyses were conducted in R (version 4.3.x, R Core Team 2023) using the open-source package Hmsc (v3.0-13; Tikhonov et al. 2020).
Additional packages required include:
- coda (MCMC diagnostics)
- ggplot2, dplyr, tidyr, readr, reshape2, data.table (data handling and visualisation)
- ComplexHeatmap, circlize, RColorBrewer, scales (heatmaps and colour scales)
- ape, Matrix (phylogenetic and matrix operations)
Access information
Other publicly accessible locations of the data:
- N/A
Data were derived from the following sources:
- WorldClim v2.1 (Fick & Hijmans 2017): Global climate database providing bioclimatic variables at ~1 km spatial resolution derived from historical weather station data.
- EarthEnv (Amatulli et al. 2018): Global environmental layers derived from digital elevation models, based on SRTM (Shuttle Radar Topography Mission) version 4.1 digital elevation data.
- Copernicus Global Land Monitoring Service (Buchhorn et al. 2020): Global land cover maps at 100 m spatial resolution derived from satellite observations.
- MODIS MOD13Q1: Moderate Resolution Imaging Spectroradiometer (MODIS) vegetation index product providing NDVI values at 250 m spatial resolution in 16-day composites, distributed by NASA LP DAAC.
- Field surveys (2019–2022): Species occurrence data and some functional trait measurements collected directly by the authors during field sampling campaigns in Inner Western Anatolia, Türkiye.
