Assigning occurrence data to cryptic taxa improves climatic niche assessments: biodecrypt, a new tool tested on European butterflies

Cite this dataset

Dapporto, Leonardo et al. (2020). Assigning occurrence data to cryptic taxa improves climatic niche assessments: biodecrypt, a new tool tested on European butterflies [Dataset]. Dryad.


Occurrence data are fundamental to macroecology, but their accuracy is often compromised when multiple units are lumped together (e.g. in recently separated cryptic species or citizen science records). Using amalgamated data leads to inaccurate species mapping, biased beta-diversity assessments and potentially erroneous predictions of responses to climate change. We provide a set of R functions (biodecrypt) to objectively attribute undetermined occurrences to the most probable taxon based on a subset of identified records.

Biodecrypt assumes that unknown occurrences can only be attributed at certain distances from areas of sympatry. The function draws concave hulls based on the subset of identified records; subsequently, based on hull geometry, it attributes (or not) unknown records to a given taxon. Concavity can be imposed with an alpha value and sea or land areas can be excluded. A cross-validation function tests attribution reliability and another function optimizes the parameters (alpha, buffer, distance ratio between hulls). We applied the procedure to 16 European butterfly complexes recently separated into 33 cryptic species for which most records were amalgamated. We compared niche similarity and divergence between cryptic taxa, and we re-calculated and contributed updated CLIMBER variables for climatic preferences.
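
The hull-based attribution described above can be illustrated with a minimal Python sketch. It is not the biodecrypt code (which is a set of R functions) and it rests on several assumptions made only for illustration: convex hulls stand in for the alpha-controlled concave hulls, distances are planar rather than geographic, sea and land masking is omitted, and the decision rule shown here (a buffer around the other hull plus a minimum ratio between the two nearest hull distances) is one plausible reading of the "buffer" and "distance ratio" parameters. All function and variable names are hypothetical.

```python
import math

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def distance_to_hull(p, hull):
    """0 if p lies inside (or on) the hull, else distance to the nearest edge."""
    n = len(hull)
    if n >= 3 and all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n)):
        return 0.0
    return min(point_segment_distance(p, hull[i], hull[(i + 1) % n]) for i in range(n))

def attribute(record, known_by_taxon, ratio=2.0, buffer=1.0):
    """Return the taxon an unknown record is attributed to, or None.

    Assumed rule (illustrative only, needs at least two taxa):
    - leave the record unattributed if it lies within `buffer` of the
      second-nearest hull (possible sympatry);
    - otherwise attribute it to the nearest hull if the record is inside it,
      or if the second-nearest hull is at least `ratio` times farther away.
    """
    dists = sorted(
        (distance_to_hull(record, convex_hull(pts)), taxon)
        for taxon, pts in known_by_taxon.items()
    )
    (d1, nearest), (d2, _) = dists[0], dists[1]
    if d2 <= buffer:
        return None
    if d1 == 0 or d2 / d1 >= ratio:
        return nearest
    return None

# Hypothetical identified records for two parapatric taxa.
known = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(10, 0), (12, 0), (12, 2), (10, 2)],
}
for rec in [(1, 1), (3, 1), (6, 1), (9.5, 1)]:
    print(rec, attribute(rec, known))
```

With these made-up coordinates, the records at (1, 1) and (3, 1) are attributed to taxon A, the record at (9.5, 1) to taxon B, and the record at (6, 1), which is equidistant from both hulls, is left unattributed. Raising `ratio` or `buffer` makes the rule more conservative, which is the kind of trade-off the parameter-optimization function is described as exploring.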

Main conclusions
In cross-validation, biodecrypt correctly attributed at least 98% of known records in every case, and it attributed more than 80% of unknown records to the most likely taxon in parapatric species. The functions determined where records can be assigned even for largely sympatric species, and highlighted areas where further sampling is required. All the cryptic taxa showed significantly diverging climatic niches, reflected in values of mean temperature and precipitation that differ from those originally provided in the CLIMBER database. The substantial fraction of cryptic taxa existing across different taxonomic groups and their divergence in climatic niches highlight the importance of using reliably assigned occurrence data in macroecology.

Usage notes

The script "Script.R" contains all the scripts needed to run the examples and the analyses that separate occurrence data among cryptic taxa.

The "biodecrypt.R" file contains the functions.

The "Total_data.txt" file contains the data used for the analyses, with duplicates within each cell removed.

The "all_data.txt" file contains all the occurrence data (including duplicates), with geographic locations and BOLD and GenBank IDs.

The "Appendix_S2.txt" file contains the new CLIMBER data.