Data for: The meta-analysis of the effects of spatial sampling bias correction on presence only species distribution models

Cite this dataset

Baker, David (2023). Data for: The meta-analysis of the effects of spatial sampling bias correction on presence only species distribution models [Dataset]. Dryad. https://doi.org/10.5061/dryad.9zw3r22j1

Abstract

This dataset contains information extracted from 70 studies identified through a systematic review of the peer-reviewed literature (Web of Science and SCOPUS databases, both searched on 13/02/2023) to evaluate the effect of spatial sampling bias correction methods in presence-only species distribution models.

README


Data for the meta-analysis of the effects of spatial sampling bias correction on presence-only species distribution models

This dataset contains information extracted from 220 studies identified through a systematic review of the peer-reviewed literature (Web of Science and SCOPUS databases, both searched on 13/02/2023) to evaluate the usage and effect of spatial sampling bias correction methods in presence-only species distribution models.

Description of the data and file structure

The dataset contains the following columns:

  • study_id = Unique number for identifying each study.
  • bibtexkey = Author/year key.
  • journal = Name of journal publishing the study.
  • title = Title of article.
  • volume = Volume number of published study.
  • year = Year of publication for study.
  • doi = Digital object identifier (DOI) of the study.
  • focal_kingdom = Kingdom of focal species (all named, separated by '|').
  • focal_phylum = Phylum of focal species (all named, separated by '|').
  • focal_class = Class of focal species (all named, separated by '|').
  • focal_order = Order of focal species (all named, separated by '|').
  • focal_family = Family of focal species (all named, separated by '|').
  • focal_genus = Genus of focal species (all named, separated by '|').
  • focal_species = Species name for focal species (all named, separated by '|').
  • species_common_name = Species common name in English (all named, separated by '|').
  • number_species = Number of species included in analysis.
  • min_number_rec = Minimum number of records used in building species distribution models in the study.
  • max_number_rec = Maximum number of records used in building species distribution models in the study.
  • location = Geographic location of study.
  • grain = Spatial grain of analysis (if multiple, separated by '|').
  • sdm_method = Species distribution model (SDM) method used (if multiple, separated by '|').
  • n_bkgrd = Number of background points used in building SDMs (if multiple, separated by '|').
  • bias_correction_method = Spatial sampling bias correction method used (if multiple, separated by '+').
  • bias_correction_treatment = A numeric identifier that distinguishes treatments within a study, used where multiple bias correction treatments are reported.
  • bias_correction_notes = Additional notes on spatial sampling bias correction method.
  • basisOfWeights = The approach used to create spatial sampling bias weights (i.e. for sampling background data).
  • covariates = Covariates included in model to correct for spatial sampling bias.
  • basisOfBuffer = The approach used to define a buffer to capture spatial sampling bias (i.e. for sampling background data).
  • comparison_to_uncorrected = Whether there was a comparison between models with and without spatial sampling bias correction.
  • performance_metric = The metric used to evaluate model predictive performance.
  • corrected_m = Reported mean value of the performance_metric for corrected models.
  • corrected_sd = Reported standard deviation of the performance_metric for corrected models.
  • corrected_n = Reported sample size for the corrected models.
  • corrected_lci = Reported lower bound of the 95% confidence interval of the performance_metric for corrected models.
  • corrected_uci = Reported upper bound of the 95% confidence interval of the performance_metric for corrected models.
  • uncorrected_m = Reported mean value of the performance_metric for uncorrected models.
  • uncorrected_sd = Reported standard deviation of the performance_metric for uncorrected models.
  • uncorrected_n = Reported sample size for the uncorrected models.
  • uncorrected_lci = Reported lower bound of the 95% confidence interval of the performance_metric for uncorrected models.
  • uncorrected_uci = Reported upper bound of the 95% confidence interval of the performance_metric for uncorrected models.
  • test_dataset = Whether the models were evaluated using internal or independent test data.
  • quant_metric = The type of summary statistic reported for the performance_metric (e.g. mean_sd, mean_95ci, median_iqr).
  • notes = Additional notes.
  • useInMa = Whether to use the entry in the meta-analysis. This is used where values for multiple performance_metrics (e.g. AUC, TSS) have been extracted for the same model predictions.

NA values in cells indicate that the information was not available or not relevant to that entry.
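The conventions above (multi-valued cells separated by '|' or '+', and NA for missing values) could be handled along the following lines. This is a minimal sketch using only the Python standard library. The two sample rows and their values (including the method names spatial_filtering and bias_file) are invented for illustration and are not taken from the dataset, and the helper names split_field and to_float are hypothetical.

```python
import csv
import io

# Two made-up rows that mimic the documented column layout (not real data).
SAMPLE = (
    "study_id,focal_genus,sdm_method,bias_correction_method,corrected_m\n"
    "1,Panthera|Lynx,maxent|glm,spatial_filtering+bias_file,0.85\n"
    "2,Quercus,maxent,NA,NA\n"
)


def split_field(value, sep):
    """Split a delimited cell into a list; 'NA' or empty cells give []."""
    if value is None or value.strip() in ("", "NA"):
        return []
    return [part.strip() for part in value.split(sep)]


def to_float(value):
    """Convert a numeric cell to float; 'NA' or empty cells give None."""
    if value is None or value.strip() in ("", "NA"):
        return None
    return float(value)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
parsed = [
    {
        "study_id": row["study_id"],
        # Taxonomy and SDM method columns use '|' as the separator.
        "genera": split_field(row["focal_genus"], "|"),
        "sdm_methods": split_field(row["sdm_method"], "|"),
        # bias_correction_method is documented as using '+' instead.
        "bias_methods": split_field(row["bias_correction_method"], "+"),
        "corrected_m": to_float(row["corrected_m"]),
    }
    for row in rows
]
```

Keeping the separator as an explicit argument makes the difference between the '|' columns and the '+' column visible at each call site.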

Sharing/access Information

The dataset contains data extracted from published literature and the literature sources are referenced therein. In most cases the data was extracted from the published study or supplementary material, but where this was not possible the study authors were contacted and asked to supply the raw data.

Methods

Web of Science and SCOPUS databases were searched on 13/02/2023 using the following search string:

ALL=(("species distribution*" OR SDM OR "environmental niche" OR ENM OR "resource selection" OR "habitat selection" OR suitability OR occurrence) AND ("presence-only" OR "presence data" OR "presence-background" OR "pseudo absence" OR opportunistic OR "citizen science" OR preferential OR maxent OR biomod))

After removing duplicates, the search returned 8564 unique studies. These were first screened by title and abstract to remove studies that fell outside the review subject area. The remaining studies were then screened by content against two criteria: that they built SDMs using presence-only (PO) data (i.e. no absence information, including absences inferred from complete species lists), and that they included a direct comparison between SDMs that attempted to correct for spatial sampling bias (SSB) and SDMs without this correction. To avoid ambiguity, studies were required to state explicitly that a particular analytical approach was designed to account for SSB (e.g., not “filtering to reduce spatial autocorrelation”, which is ambiguous as to the cause of the spatial autocorrelation). This identified 70 studies from which information on the effect of SSB correction on model performance was extracted, along with metadata on species taxonomy, sample sizes of occurrence data, and details of the SDM methods and SSB correction approach used.
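The extracted summary statistics (the corrected_* and uncorrected_* columns) could be turned into an effect size in several ways. The source does not state which effect size the meta-analysis used, so the sketch below is only one common option: a standardized mean difference with Hedges' small-sample correction. It also shows the standard conversion from a 95% confidence interval of a mean to a standard deviation, which assumes a normal-based interval. The function names and the example numbers are hypothetical.

```python
import math


def sd_from_ci(lci, uci, n, z=1.96):
    """Approximate the SD from a confidence interval of a mean.

    Assumes the interval is mean +/- z * SD / sqrt(n) (normal-based).
    """
    return math.sqrt(n) * (uci - lci) / (2 * z)


def hedges_g(m_c, sd_c, n_c, m_u, sd_u, n_u):
    """Return (g, var_g) for corrected versus uncorrected models.

    Positive g means the corrected models scored higher on the metric.
    """
    dof = n_c + n_u - 2
    pooled_sd = math.sqrt(
        ((n_c - 1) * sd_c ** 2 + (n_u - 1) * sd_u ** 2) / dof
    )
    d = (m_c - m_u) / pooled_sd
    # Small-sample correction factor.
    j = 1 - 3 / (4 * dof - 1)
    g = j * d
    var_d = (n_c + n_u) / (n_c * n_u) + d ** 2 / (2 * (n_c + n_u))
    return g, j ** 2 * var_d


# Made-up numbers for illustration only (not taken from the dataset).
g, var_g = hedges_g(m_c=0.85, sd_c=0.05, n_c=10, m_u=0.80, sd_u=0.05, n_u=10)
sd = sd_from_ci(lci=0.80, uci=0.90, n=16)
```

Where a row reports corrected_lci and corrected_uci but no corrected_sd, sd_from_ci could supply the missing standard deviation before hedges_g is called, provided the normality assumption is reasonable for the metric.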

Usage notes

The file can be opened with any software capable of reading a .csv file.
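As one example, the file could be read with Python's standard csv module. This is a sketch under stated assumptions: the path passed to load_rows is a placeholder because the file name is not given here, the UTF-8 encoding is assumed rather than documented, and the function name and the subset of columns it checks are illustrative choices.

```python
import csv

# A few of the documented column names, used as a sanity check on the header.
EXPECTED_COLUMNS = {
    "study_id",
    "bias_correction_method",
    "performance_metric",
    "useInMa",
}


def load_rows(path):
    """Read the CSV into a list of dicts keyed by column name.

    Raises ValueError if any of the expected columns is missing.
    The encoding is an assumption; "utf-8-sig" also tolerates a leading BOM.
    """
    with open(path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        return list(reader)


# Example call (placeholder path, replace with the downloaded file):
# rows = load_rows("path/to/downloaded_file.csv")
```

All values are returned as strings, including the literal NA, so numeric columns such as corrected_m need converting before analysis.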

Funding

Natural Environment Research Council, Award: NE/V007726/1

Natural Environment Research Council, Award: NE/W004941/1