The global distribution of known and undiscovered ant biodiversity

Cite this dataset

Kass, Jamie et al. (2022). The global distribution of known and undiscovered ant biodiversity [Dataset]. Dryad.


Invertebrates constitute the majority of animal species and are critical for ecosystem functioning and services. Nonetheless, global invertebrate biodiversity patterns and their congruences with vertebrates remain largely unknown. We resolve the first high-resolution (~20-km) global diversity map for a major invertebrate clade, ants, using biodiversity informatics, range modeling, and machine learning to synthesize existing knowledge and predict the distribution of undiscovered diversity. We find that ants and different vertebrate groups have distinct features in their patterns of richness and rarity, underscoring the need to consider a diversity of taxa in conservation. However, despite their phylogenetic and physiological divergence, ant distributions are not highly anomalous relative to variation among vertebrate clades. Furthermore, our models predict rarity centers largely overlap (78%), suggesting that general forces shape endemism patterns across taxa. This raises confidence that conservation of areas important for small-ranged vertebrates will benefit invertebrates while providing a “treasure map” to guide future discovery.


This is the Data S1 supplemental archive for the paper "The global distribution of known and undiscovered ant biodiversity" in Science Advances by Kass et al. All methodological details can be found in the Materials and Methods section of the paper. The archive contains three data directory archives, programming code archived as software with Zenodo, supplementary figures and model metadata archived as supplemental information with Zenodo, and a master README file. Please see more detailed descriptions below.

Data

The directory "main_analysis_data", archived in the "" file, contains all the core data used in the analyses described in the paper, except the data described below, which were too large to include in one compressed file. It includes the Global Ant Biodiversity Informatics (GABI) data, both raw and after cleaning and geocoding, all diversity estimate raster data shown in the figures of the paper, and other related data. Please consult "main_analysis_data/README_main_analysis_data.txt" for more details. NOTE: If researchers encounter issues reproducing results with the code provided, please contact <>.

Additional results for species and genera, including individual range estimates from polygons and species distribution models, individual datasets used for modeling, and intermediate occurrence data subsets for the geocoding analysis, are found in "results_species_add" and "results_genus_add", respectively. Intermediate occurrence data subsets for the geocoding analysis are also archived in "processing_data_add". Also included are the fitted Random Forest models used to predict unknown diversity centers under a global high-sampling scenario, with variable importance calculated.

Please see more details in README_MASTER.txt.

Software

A simple package containing all the R and Python scripts used to conduct the analysis and generate the figures in the paper. This folder has its own separate README with more details. The easiest way to run the code is to open the .Rproj file in RStudio and press the "Install and Restart" button under the "Build" tab in the Environment frame. This installs and loads the package, making all of its functions available in the programming environment. The main analysis script is located in analysis/main_analysis.R.

Supplemental Information

Example figure plots made by Kass_et_al_2022_SciAdv_prog_code/analysis/figures.R. The figures displayed in the paper were made with ArcGIS using the same underlying data. Fig. 3 is not represented here because it was made with data processed in ArcGIS, but these processed files can be found in main_analysis_data/overlays/for_fig3.

ODMAP_model_metadata.csv: Metadata for the Maxent species distribution models and Random Forest models, structured according to the ODMAP (Overview, Data, Model, Assessment and Prediction) framework formalized by Zurell et al. (2020). This metadata was created using the ODMAP shiny app and was lightly edited by hand to include some extra detail.
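As a quick way to inspect the ODMAP metadata file, the following is a minimal Python sketch that uses only the standard library. It makes no assumption about the column names in ODMAP_model_metadata.csv: it reports whatever header the file contains and how many cells in each column are filled. The helper names `load_odmap_metadata` and `count_filled` are hypothetical and are not part of the archived code.

```python
import csv


def load_odmap_metadata(csv_path):
    """Read a metadata CSV and return (column_names, rows).

    Each row is a dict keyed by the column names found in the header.
    'utf-8-sig' is used so a leading byte-order mark, if present, is ignored.
    """
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return columns, rows


def count_filled(columns, rows):
    """Count the non-empty cells in each column."""
    return {
        col: sum(1 for row in rows if (row.get(col) or "").strip())
        for col in columns
    }


if __name__ == "__main__":
    # Assumed location: adjust to wherever the supplemental file was unpacked.
    columns, rows = load_odmap_metadata("ODMAP_model_metadata.csv")
    print(f"{len(rows)} rows, {len(columns)} columns")
    for name, filled in count_filled(columns, rows).items():
        print(f"{name}: {filled} filled")
```

Printing the header first is a convenient way to see how the ODMAP sections are laid out in the file before filtering or joining on any particular field.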

Usage notes

The programming code included in "Kass_et_al_2022_SciAdv_prog_code", archived on Zenodo as software, is written in both R and Python and is structured into a simple R package for ease of use. Additionally, ArcGIS software was used to build the geodatabase in "results_species_add/centers_projected/hs_proj_IRS60K_pairwiseInt.gdb" and can be used to access the data within, but the data can also be accessed via R or Python.
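For readers without ArcGIS, the sketch below shows one possible way to open the geodatabase from Python. It assumes the third-party packages fiona and geopandas are installed with a GDAL build that can read Esri file geodatabases, and that the layers of interest are vector layers; neither assumption is confirmed by the archive description. The helper names and the `archive_root` argument are hypothetical, and layer names are listed at run time rather than guessed.

```python
from pathlib import Path

# Path of the geodatabase relative to the unpacked archive, as given in the usage notes.
GDB_RELATIVE = Path(
    "results_species_add/centers_projected/hs_proj_IRS60K_pairwiseInt.gdb"
)


def gdb_path(archive_root):
    """Build the full path to the geodatabase under a local archive root."""
    return Path(archive_root) / GDB_RELATIVE


def list_gdb_layers(path):
    """Return the layer names in the geodatabase (requires fiona)."""
    import fiona  # imported here so the path helper works without it

    return fiona.listlayers(str(path))


def read_gdb_layer(path, layer):
    """Read one vector layer into a GeoDataFrame (requires geopandas)."""
    import geopandas as gpd

    return gpd.read_file(str(path), layer=layer)


if __name__ == "__main__":
    # Assumption: the data archives were unpacked into the current directory.
    path = gdb_path(".")
    for name in list_gdb_layers(path):
        print(name)
```

Listing the layers first avoids hard-coding names that may differ from what the geodatabase actually contains; any name it prints can then be passed to `read_gdb_layer`.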


Okinawa Institute of Science and Technology Graduate University, Award: subsidy funding

Japan Society for the Promotion of Science, Award: KAKENHI 17K15180

Japan Society for the Promotion of Science, Award: Postdoctoral Fellowships for Foreign Researchers

Ministry of the Environment, Award: 4-1904

Leverhulme Trust, Award: RPG-2017-271

National Science Foundation, Award: DEB-1932405