
Cannot see the diversity for all the species: evaluating inclusion criteria for local species lists when using abundant citizen science data

Cite this dataset

Ruete, Alejandro et al. (2021). Cannot see the diversity for all the species: evaluating inclusion criteria for local species lists when using abundant citizen science data [Dataset]. Dryad.


Abundant citizen science data on species occurrences is becoming increasingly available and enables identifying the composition of communities occurring at multiple sites with high temporal resolution. However, for species displaying temporary patterns of local occurrence, i.e. species that are transient at some sites, biodiversity measures clearly depend on the criteria used to include species in local species lists. Using abundant opportunistic citizen science data from frequently visited wetlands, we investigated the sensitivity of α- and β-diversity estimates to the use of raw vs. detection-corrected data and to the use of inclusion criteria for species presence reflecting alternative site use. We tested 7 inclusion criteria (with varying number of days required to be present) on time series of daily occurrence status during a breeding season of 90 days for 77 wetland bird species. We show that even when opportunistic presence-only observation data is abundant, raw data may not produce reliable local species richness estimates and may rank sites very differently in terms of species richness. Furthermore, occupancy-model-based α- and β-diversity estimates were sensitive to the inclusion criteria used. Total species lists (all species observed at least once during a season) may therefore mask diversity differences among sites in local communities of species, e.g. by including vagrant species in potentially breeding communities, and change the relative rank order of sites in terms of species richness. Very high sampling effort does not necessarily free opportunistic data from its inherent bias and can produce a pattern in which many species are observed at least once almost everywhere, thus leading to a possible paradox: the large amount of biological information may hinder its usefulness. Therefore, when prioritizing among sites to manage or preserve, species diversity estimates need to be carefully related to relevant inclusion criteria depending on the diversity estimate in focus.
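
To illustrate how an inclusion criterion based on days present can change a local species list, the following is a minimal sketch and not the authors' analysis code. The function name `species_list`, the species labels, the 10-day series and the thresholds are all hypothetical; the study itself used seven criteria on 90-day, detection-corrected occurrence series, which are not reproduced here.

```python
from typing import Dict, List, Set


def species_list(daily_presence: Dict[str, List[int]], min_days: int) -> Set[str]:
    """Return the species present on at least `min_days` days of the season.

    `daily_presence` maps a species name to a list of daily occurrence
    statuses (1 = present, 0 = absent) at one site.
    """
    return {sp for sp, days in daily_presence.items() if sum(days) >= min_days}


# Hypothetical 10-day occurrence series for a single site (illustrative only).
site = {
    "resident": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "regular_visitor": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "vagrant": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
}

# Species richness under three assumed inclusion thresholds (days present).
richness_by_threshold = {k: len(species_list(site, k)) for k in (1, 5, 10)}
print(richness_by_threshold)  # {1: 3, 5: 2, 10: 1}
```

With a threshold of one day the list matches a total species list and includes the vagrant; stricter thresholds progressively restrict the list to species that use the site more regularly.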


This table summarises the site- and species-specific probability of detection obtained from each species' occupancy model. Values are point estimates representing the median of the posterior probability distribution of the probability of detection per site, given a field visit of quality equal to the maximum observed quality (Species List Length = 50). That is, each value represents the maximum expected detection probability per site.


Swedish Research Council

Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning