Artificial Hotspot Occurrence Inventory (AHOI)
Park, Daniel et al. (2022), Artificial Hotspot Occurrence Inventory (AHOI), Dryad, Dataset, https://doi.org/10.5061/dryad.v41ns1s0p
Aim: Species occurrence records are essential to understanding Earth’s biodiversity and addressing global environmental issues, but do not always reflect actual locations of occurrence. Certain geographic coordinates are assigned repeatedly to thousands of observation/collection records. This may result from imperfect data management and georeferencing practices, and can greatly bias the inferred distribution of biodiversity and associated environmental conditions. Nonetheless, these ‘biodiverse’ coordinates are often overlooked in taxon-centric studies, as they are identifiable only in aggregate across taxa and datasets, and it is difficult to determine their true circumstance without in-depth, focused investigation. Here we assess highly recurring coordinates in biodiversity data to determine artificial hotspots of occurrences.
Taxon: Land plants, birds, mammals, insects
Methods: We identified highly recurring coordinates across plant, bird, insect, and mammal records in the Global Biodiversity Information Facility, the largest aggregator of biodiversity data. We determined which are likely artificial hotspots by examining metadata from over 40 million records; assessing spatial distributions of associated datasets; contacting data managers; and reviewing literature. These results were compiled into the Artificial Hotspot Occurrence Inventory (AHOI).
Results: Artificial biodiversity hotspots generally comprised geopolitical and grid centroids. The associated uncertainty ranged from several square kilometers to millions. Such artificial biodiversity hotspots were most prevalent in plant records. For instance, over 100,000 plant occurrence records were assigned the centroid coordinates of Brazil, and points that have at least 1,000 associated occurrences comprised over 9 million records. In contrast, highly recurring coordinates in animal data more often reflected actual sites of observation.
Main Conclusions: AHOI can be used to i) improve accuracy of biodiversity assessments; ii) estimate uncertainty associated with records from artificial hotspots and make informed decisions on whether to include them in scientific studies; and iii) identify problems in biodiversity informatics workflows and priorities for improvement.
Primary biodiversity records were queried from the Global Biodiversity Information Facility on January 30 and May 10, 2021 for plants (Plantae; https://doi.org/10.15468/dl.th5tn8; https://doi.org/10.15468/dl.76jc24), June 3, 2022 for birds (Aves; https://doi.org/10.15468/dl.jh3u2u), and August 23, 2021 for insects (Insecta; https://doi.org/10.15468/dl.4q2972) and mammals (Mammalia; https://doi.org/10.15468/dl.cujmgz). We then assessed the frequency of the geographic coordinates and identified the most frequently recurring sets of coordinates across each taxonomic group. To be conservative, coordinates were assessed exactly as provided in the “decimalLatitude” and “decimalLongitude” columns of the downloaded data, without any rounding. Rounding coordinates before assessing their frequency would increase the overall number of records associated with each set of coordinates and increase the risk of associating true points with georeferenced ones. Only exact matches were counted to calculate the frequency of each unique set of coordinates.
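The exact-match counting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a tab-delimited occurrence file with “decimalLatitude” and “decimalLongitude” columns, and the function name count_exact_coordinates and the sample rows are hypothetical. Coordinates are compared as the raw text found in the file, so no rounding takes place.

```python
import csv
import io
from collections import Counter


def count_exact_coordinates(handle, delimiter="\t"):
    """Count records per unique (latitude, longitude) pair, compared as raw text."""
    counts = Counter()
    # QUOTE_NONE is an assumption: it treats stray quote characters as plain text.
    reader = csv.DictReader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        lat = (row.get("decimalLatitude") or "").strip()
        lon = (row.get("decimalLongitude") or "").strip()
        if lat and lon:  # skip records that have no coordinates
            counts[(lat, lon)] += 1
    return counts


# Hypothetical sample rows, for illustration only.
sample = io.StringIO(
    "gbifID\tdecimalLatitude\tdecimalLongitude\n"
    "1\t-10.0\t-55.0\n"
    "2\t-10.0\t-55.0\n"
    "3\t-10.04\t-55.0\n"
    "4\t\t\n"
)
counts = count_exact_coordinates(sample)
top = counts.most_common(2)  # most frequent coordinate pairs first
```

Because the comparison is textual, values such as "-10.0" and "-10.00" would be counted separately in this sketch. Rounding to one decimal place would instead merge the third sample record with the first two, which is the inflation the text warns about.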
We determined which of the highly-recurrent coordinates are likely artificial by examining metadata and images from datasets comprising over 40 million records to date; assessing spatial distributions of associated datasets; contacting data managers; and reviewing literature (Fig. 2). We used QGIS software to validate grid centroid coordinates by plotting the grid systems over the reported occurrence coordinates to confirm the grid centroid, grid size and the coordinate reference system. Countries represented in our dataset that utilized such grids were identified through occurrence record metadata, visual inspection of associated datasets, literature review, and data managers, and included France, the United Kingdom, Germany, the Netherlands, Belgium, Switzerland, and Spain. For each group, we started by evaluating the most recurrent set of coordinates and proceeded in order of decreasing frequency. We initially examined the top 100 recurring coordinates for plants and the top 50 recurring coordinates for each animal group. These coordinates were manually curated into the following categories when possible: grid centroid, geopolitical centroid, georeferenced location, and true observation or collection site. Some coordinates could be associated with multiple categories. It is possible that the determinations we made for highly-recurrent coordinates could also be extended to additional, less recurrent, coordinates that were assigned to other records in the datasets they belonged to (but not included in our initial survey). These data were compiled into AHOI, an inventory of highly-recurrent GBIF coordinates, with their descriptions and determinations.
To validate our approach and assess whether artificial biodiversity hotspots are the result of systemic practices or errors, we additionally evaluated data from the Field Museum of Natural History, as some of the top 100 most recurring coordinates were associated with the institution. We downloaded all plant records from this dataset and evaluated all coordinates that were assigned to at least 1,000 records. We found that the coordinates from this dataset represented artificial aggregates of specimens around geopolitical centroids. These verifications were also included in AHOI. Further, we listed the rationale for each individual coordinate determination in the “reasoning” field and provided examples of relevant information from occurrence record metadata in the “example_description” field.
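The threshold step used for the Field Museum records (keeping coordinates assigned to at least 1,000 records) can be sketched as below. This assumes a mapping from coordinate pairs to record counts is already available; the function name coordinates_at_or_above and the example counts are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def coordinates_at_or_above(counts, min_records=1000):
    """Return (coordinate, count) pairs with at least min_records records, most frequent first."""
    selected = [(coord, n) for coord, n in counts.items() if n >= min_records]
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Hypothetical counts, for illustration only.
example_counts = Counter({
    ("10.0", "20.0"): 2500,
    ("11.5", "21.5"): 1000,
    ("12.25", "22.75"): 999,
})
candidates = coordinates_at_or_above(example_counts)
```

Each coordinate pair returned this way is only a candidate for manual review; deciding whether it is a centroid, a georeferenced location, or a true collection site still requires the metadata checks described above.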
Any text editor or spreadsheet software.