The dataset of Liquidambar orientalis for species distribution models
Data files
Oct 06, 2023 version files 359.60 KB
- auc_tss_liqori.xls (70.66 KB)
- bioclim_var_scores.xls (239.62 KB)
- README.md (5.30 KB)
- Suitability.xls (23.04 KB)
- vif_results.xls (20.99 KB)
Abstract
The primary objective of this study was to predict the current geographic range of Liquidambar orientalis, commonly known as the oriental sweetgum. To assess the potential effects of climate change on the species, the study used species distribution models and projected them to future periods. An ensemble modeling approach, implemented with the biomod2 package in R, was applied under two Shared Socioeconomic Pathways (SSP1-2.6 and SSP5-8.5) to analyze changes in the spatial distribution of the species in future periods (the 2035s, 2055s, and 2070s).
README: The dataset of Liquidambar orientalis for species distribution models
https://doi.org/10.5061/dryad.1ns1rn914
The dataset includes
1. Occurrence points of Liquidambar orientalis obtained from GBIF and EUFORGEN,
2. Bioclimatic data for the occurrences obtained from CHELSA,
3. Variance Inflation Factor (VIF) results for four variables (bio1, bio2, bio13, and bio18),
4. The Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and the True Skill Statistic (TSS) results of the modeling,
5. Importance scores of the bioclimatic variables after modeling,
6. The ROC graph (including each algorithm's total scores),
7. The area calculation of suitable habitats ("Suitability.xls").
Description of the data and file structure
1. "occurence_points.xls" contains 81 geographical coordinates (latitude and longitude) of Liquidambar orientalis in Anatolia and Rhodos Island.
2. "bioclim_variables.xls" contains nineteen bioclimatic variables (bio1 to bio19). These variables represent climatic factors and were downloaded at a spatial resolution of 30 arc-seconds, approximately 1 km². The dataset covers four time periods: 1981-2010, 2011-2040, 2041-2070, and 2071-2100. Each bioclimatic variable value was extracted from the corresponding grid cell using QGIS 3.18.2. Temperatures are in degrees Celsius and precipitation is in mm.
- BIO1 = Annual Mean Temperature
- BIO2 = Mean Diurnal Range (Mean of monthly (max temp - min temp))
- BIO3 = Isothermality (BIO2/BIO7) (×100)
- BIO4 = Temperature Seasonality (standard deviation ×100)
- BIO5 = Max Temperature of Warmest Month
- BIO6 = Min Temperature of Coldest Month
- BIO7 = Temperature Annual Range (BIO5-BIO6)
- BIO8 = Mean Temperature of Wettest Quarter
- BIO9 = Mean Temperature of Driest Quarter
- BIO10 = Mean Temperature of Warmest Quarter
- BIO11 = Mean Temperature of Coldest Quarter
- BIO12 = Annual Precipitation
- BIO13 = Precipitation of Wettest Month
- BIO14 = Precipitation of Driest Month
- BIO15 = Precipitation Seasonality (Coefficient of Variation)
- BIO16 = Precipitation of Wettest Quarter
- BIO17 = Precipitation of Driest Quarter
- BIO18 = Precipitation of Warmest Quarter
- BIO19 = Precipitation of Coldest Quarter
3. "vif_results" contains Variance Inflation Factor (VIF) scores. These were computed for the climatic variables with the usdm package in R to avoid multicollinearity. Environmental variables with a VIF above 10 were eliminated from the analysis. The final set comprised two temperature-related variables (bio1 and bio2) and two precipitation-related variables (bio13 and bio18). Species distribution modeling continued with these four variables.
4. "auc_tss_liqori.xls" contains the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and the True Skill Statistic (TSS) results of each algorithm and the ensemble model. The modeling was performed in five replicates. Yellow highlighting marks the averages of the five replicates.
The headers' explanation (from biomod2's pdf [https://cran.r-project.org/web/packages/biomod2/biomod2.pdf])
- cutoff : the associated cut-off used to transform the continuous values into binary.
- sensitivity : the sensitivity obtained on fitted values with this threshold.
- specificity : the specificity obtained on fitted values with this threshold.
Evaluation metrics are calculated on the calibration data (column calibration), on the cross-validation data (column validation), or on the evaluation data (column evaluation).
- calibration : IDs of elements selected for calibration.
- validation : IDs of elements selected for validation (complementary to the calibration set).
- biomod2_ensemble_model : model_class is EM.
- EMmean_biomod2_model : model_class is EMmean.
- EMmedian_biomod2_model : model_class is EMmedian.
- EMcv_biomod2_model : model_class is EMcv.
- EMci_biomod2_model : model_class is EMci.
- EMca_biomod2_model : model_class is EMca.
- EMwmean_biomod2_model : model_class is EMwmean.
5. "bioclim_var_scores.xls" contains the importance scores of the four bioclimatic variables from five modeling replicates. Yellow highlighting marks the averages of the five replicates.
The headers' explanation (from biomod2's pdf [https://cran.r-project.org/web/packages/biomod2/biomod2.pdf])
- expl.var : the considered explanatory variable (the one permuted)
- var.imp : the variable’s importance score. The number of permutations used per variable to estimate importance is set by an optional integer (default NULL).
- rand : the ID of the permutation run.
6. "Rplot_roc_graph.tiff" was obtained from biomod2 after modeling. This graph shows the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and the True Skill Statistic (TSS) results of each algorithm, excluding the ensemble model.
7. "Suitability.xls" contains the area calculation of five habitat suitability categories for the current and future periods. The categories were defined as follows: unsuitability (0-0.2), low suitability (0.2-0.4), medium suitability (0.4-0.6), suitability (0.6-0.8), and high suitability (0.8-1.0).
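The five suitability categories can be illustrated with a minimal Python sketch that bins suitability scores and tallies area per category. The class limits and names come from the description above. Everything else is an assumption made for illustration: the function names, the cell area, and the choice to assign a boundary value such as 0.2 to the higher category (the README does not state which side is inclusive).

```python
from bisect import bisect_right

# Upper limits of the first four classes and the class names, as listed in the README.
UPPER_BOUNDS = [0.2, 0.4, 0.6, 0.8]
LABELS = [
    "unsuitability",
    "low suitability",
    "medium suitability",
    "suitability",
    "high suitability",
]


def classify(score):
    """Return the suitability category for a score in [0, 1].

    Assumption: a value on a class limit (e.g. 0.2) falls into the higher class.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("suitability score must lie between 0 and 1")
    return LABELS[bisect_right(UPPER_BOUNDS, score)]


def area_by_class(scores, cell_area_km2):
    """Sum the area per category, assuming every grid cell has the same area."""
    areas = {label: 0.0 for label in LABELS}
    for score in scores:
        areas[classify(score)] += cell_area_km2
    return areas


if __name__ == "__main__":
    # Hypothetical cell scores and a hypothetical cell area of 1 km2.
    for label, area in area_by_class([0.1, 0.5, 0.55, 0.9], 1.0).items():
        print(f"{label}: {area} km2")
```

Real grid cells at 30 arc-seconds do not all have the same area, so an actual area calculation would need a per-cell area or an equal-area projection.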
Methods
1. Occurrence data
Eighty-one occurrence records were obtained from two sources: the Global Biodiversity Information Facility (GBIF 2023, www.gbif.org) and the European Forest Genetic Resources Programme (EUFORGEN 2023). Erroneous and redundant records were removed from the GBIF dataset.
2. Environmental data
The dataset used in this study consisted of nineteen bioclimatic variables (BIO1 to BIO19) obtained from CHELSA version 2.1 (https://chelsa-climate.org/). These variables, in .tiff format, represent climatic factors and were downloaded at a spatial resolution of 30 arc-seconds. The dataset covers four time periods: 1981-2010, 2011-2040, 2041-2070, and 2071-2100. The bioclimatic variable values of the grid cells were extracted using QGIS 3.18.2. The data were obtained from two Global Circulation Models (GCMs), namely the Max Planck Institute Earth System Model (MPI-ESM1-2-HR) and the Meteorological Research Institute Earth System Model Version 2.0 (MRI-ESM2.0).
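As a rough worked example of what a 30 arc-second resolution means on the ground, the sketch below converts the cell size to kilometres. It assumes a spherical Earth with the mean radius given in the code, and the latitude of 37° N is an illustrative value, not one taken from the dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation (assumption)
CELL_DEG = 30 / 3600         # 30 arc-seconds expressed in decimal degrees


def cell_size_km(lat_deg):
    """Return (north-south, east-west) extent in km of one 30 arc-second cell."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360
    north_south = CELL_DEG * km_per_degree
    # Meridians converge towards the poles, so the east-west extent shrinks with latitude.
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


if __name__ == "__main__":
    ns, ew = cell_size_km(37.0)  # illustrative latitude only
    print(f"north-south: {ns:.3f} km, east-west: {ew:.3f} km")
```

Under these assumptions a cell is a little under 1 km from north to south at any latitude and becomes narrower from east to west away from the equator, which is why the resolution is usually quoted as approximately 1 km.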
Variance Inflation Factors (VIF) were computed for climatic variables to avoid the issue of multi-collinearity. The usdm package in R was utilized for this purpose.
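The VIF of a variable j is 1 / (1 - R_j²), where R_j² is obtained by regressing variable j on all the other variables. Equivalently, it is the j-th diagonal element of the inverse of the correlation matrix. The Python sketch below illustrates that calculation and a simple stepwise removal at a threshold of 10. It is not the usdm implementation: the README does not say which usdm function was called, and all function names and the toy values here are hypothetical.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        standardized.append([(v - m) / s for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in standardized]
            for zi in standardized]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: variables are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each column: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


def drop_collinear(data, threshold=10.0):
    """Repeatedly drop the variable with the highest VIF until all are <= threshold."""
    kept = dict(data)
    while len(kept) > 1:
        scores = dict(zip(kept, vif(list(kept.values()))))
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return list(kept)


if __name__ == "__main__":
    # Toy values, not taken from the dataset; "bio5" nearly duplicates "bio1".
    toy = {
        "bio1": [1, 2, 3, 4, 5, 6],
        "bio5": [1.1, 2.0, 2.9, 4.1, 5.0, 5.9],
        "bio13": [5, 3, 6, 2, 7, 4],
    }
    print(drop_collinear(toy))
```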
3. Species distribution modeling
The species distribution models, including the ensemble, were built using the R programming language (https://www.r-project.org/) and the biomod2 package version 4.2-4 (https://cran.r-project.org/web/packages/biomod2/biomod2.pdf). The following techniques were employed: Generalized Linear Model (GLM), Random Forest (RF), Generalized Boosted Model (GBM), Generalized Additive Model (GAM), Maximum Entropy (Maxent), and the ensemble model.
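To make the evaluation and ensemble terms concrete, the following Python sketch computes the TSS from sensitivity and specificity (TSS = sensitivity + specificity - 1) and combines per-algorithm predictions for one grid cell by mean, median, and a score-weighted mean. It is a sketch under stated assumptions, not biomod2's code: sensitivity and specificity are taken as proportions between 0 and 1, the weights are assumed to be directly proportional to the evaluation score, and all numbers and function names are hypothetical.

```python
from statistics import mean, median


def tss(sensitivity, specificity):
    """True Skill Statistic from sensitivity and specificity given as proportions (0-1)."""
    return sensitivity + specificity - 1.0


def ensemble(predictions, scores):
    """Combine per-algorithm suitability values for a single grid cell.

    predictions: {algorithm: suitability between 0 and 1}
    scores:      {algorithm: evaluation score used as the weight}
    Assumption: weights are directly proportional to the scores.
    """
    values = list(predictions.values())
    total_weight = sum(scores[name] for name in predictions)
    weighted = sum(predictions[name] * scores[name] for name in predictions)
    return {
        "mean": mean(values),
        "median": median(values),
        "weighted_mean": weighted / total_weight,
    }


if __name__ == "__main__":
    # Hypothetical values for three of the algorithms named above.
    cell_predictions = {"GLM": 0.6, "RF": 0.8, "GBM": 0.7}
    cell_scores = {"GLM": 0.5, "RF": 1.0, "GBM": 0.5}
    print(tss(0.9, 0.8))
    print(ensemble(cell_predictions, cell_scores))
```

A weighted mean lets better-performing algorithms contribute more to the ensemble, while the plain mean and median treat all algorithms equally.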