Effectively and accurately mapping global biodiversity patterns for different regions and taxa
Hughes, Alice; Orr, Michael C.; Yang, Qinmin; Qiao, Huijie (2021), Effectively and accurately mapping global biodiversity patterns for different regions and taxa, Dryad, Dataset, https://doi.org/10.5061/dryad.hhmgqnkgd
To understand the representativeness and accuracy of expert range maps, and explore alternate methods for accurately mapping species distributions.
Major taxa studied
Terrestrial vertebrates, and Odonata
We analyzed the biases in 50,768 animal IUCN, GARD and BirdLife species maps, assessed the links between these maps and existing political and various non-ecological boundaries to assess their accuracy for certain types of analysis. We cross-referenced each species map with data from GBIF to assess if maps captured the whole range of a species, and what percentage of occurrence points fall within the species’ assessed ranges. In addition, we use a number of alternate methods to map diversity patterns and compare these to high resolution models of distribution patterns.
On average 20-30% of species’ non-coastal range boundaries overlapped with administrative national boundaries. In total, 60% of areas with the highest spatial turnover in species (high densities of species range boundaries marking high levels of shift in the community of species present) occurred at political boundaries, especially commonly in Southeast Asia. Different biases existed for different taxa, with gridded analysis in reptiles, river-basins in Odonata (except the Americas) and county-boundaries for Amphibians in the US. On average, up to half (25-46%) species recorded range points fall outside their mapped distributions. Filtered Minimum-convex polygons performed better than expert range maps in reproducing modeled diversity patterns.
Expert range maps showed high bias at administrative borders in all taxa, but this was highest at the transition from tropical to subtropical regions. Methods used were inconsistent across space, time and taxa, and ranges mapped did not match species distribution data. Alternate approaches can better reconstruct patterns of distribution than expert maps, and data driven approaches are needed to provide reliable alternatives to better understand species distributions.
Materials and methods
We use a combination of approaches to explore the relationship between species range maps and geopolitical boundaries and a subset of geographic features. In some cases we used the density of species range boundaries to explore the relationship between these and various features (i.e. administrative boundaries, river basin boundaries etc.). Additionally, species richness and spatial turnover are used to explore changes in richness over short geographic distances. Analyses were conducted in R statistical software unless noted otherwise. All code scripts are available at https://github.com/qiaohj/iucn_fix. Workflows are shown in Figure S1a-c with associated scripts listed.
Species ranges and boundary density maps
ERMs (Expert range maps) were downloaded from the IUCN RedList website for mammals (5,709 species), odonates (2,239 species) and amphibians (6,684 species; https://www.iucnredlist.org/resources/grid/spatial-data). Shapefile maps for birds were downloaded from BirdLife (10,423 species, http://datazone.birdlife.org/species/requestdis), and for reptiles from the Global Assessment of Reptile Distributions (GARD) (10,064 species; Roll et al., 2017). Each species’ polygon boundaries were converted to a polylines to show the boundary of each species range (Figure S1a-II; codes are lines 7 – 18 in line2raster_xxxx.r ; xxxx varies based on the taxa). The associated shapefile was then split to produce independent polyline files for each species within each taxon (see Figure S1a-I, codes are lines 29 to 83 in the same file above.).
To generate species boundary density maps, species range boundaries were rasterized at 1km spatial resolution with an equal area projection (Eckert-IV), and stacked to form a single raster for each taxon (at the level of amphibians, odonates, etc.). This represented the number of species in each group and their overlapping range boundaries (Figure S1b-II, codes are in line2raster_all.r). Each cell value indicated the number of species whose distribution boundaries overlapped with each cell, enabling us to overlay this rasterized information with other features (i.e. administrative boundaries) so that the overlaps between them can be calculated in R. These species boundary density maps underlie most subsequent analyses. R code and caveats are given in the supplements, links are provided in text and Figure S1.
Spatial exploration of species range boundaries in ArcGIS suggested that numerous geographic datasets (i.e. political and in few cases geographic features such as river basins) were used to delineate the species ranges for different regions and taxa (this is sometimes part of the methodology in developing ERMs as detailed by Ficetola et al., 2014). Thus in addition to analyzing the administrative bias and the percentage of occurrence records within each species’ ERM for all taxa, additional analyses were conducted when other biases were evident in any given taxa or region (detailed later in methods on a case-by-case basis).
For all taxa, we assessed the percentage of overlap between species range boundaries and national and provincial boundaries by digitizing each to 1km (equivalent to buffering thie polyline by 500m), both with and without coastal boundaries. An international map was used because international (Western) assessors use them, and does not necessarily denote agreed country boundaries (https://gadm.org/). The different buffers (500m, 1000m, 2500m, 5000m) were added to these administrative boundaries in ArcMap to account for potential, insignificant deviations from political boundaries (Figure S1b). An R script for the same function is provided in “country_line_buffer.r”.
To establish where multiple species shared range boundaries we reclassified the species range boundary density rasters for each taxa into richness classes using the ArcMap quartile function (Figure S1). From these ten classes the percentage of the top-two, and top-three quartiles of range densities within different buffers (500m, 1000m, 2500m, 5000m) was calculated per country to determine what percentage of highest range boundary density approximately followed administrative borders. This was done because people drawing ERMs may use detailed administrative maps or generalize near political borders, or may use political shapefiles that deviate slightly. It is consequently useful to include varying distances from administrative features to assess how range boundary densities vary in relation to administrative boundaries. Analyses of relationships between individual species range boundaries and administrative boundaries (coastal, non-coastal) were made in R and scripts provided (quantile_country_buffer_overlap.r).
Spatial turnover and administrative boundaries
Heatmaps of species richness were generated by summing entire sets of compiled species ranges for each taxon in polygonal form (Figure 1; Figure S1b-I). To assess abrupt diversity changes, standard deviations for 10km blocks were calculated using the block statistics function in ArcMap. Abrupt changes in diversity were signified by high standard deviations based on the cell statistics function in ArcGIS, which represented rapid changes in the number of species present. Maps were then classified into ten categories using the quartile function. Given the high variation in maximum diversity and taxonomic representation, only the top two –three richness categories were retained per taxon. This was then extracted using 1km buffers of national administrative boundaries to assess percentages of administrative boundaries overlapping turnover hotspots by assessing what proportion of political boundaries were covered by these turnover hotspots.
Data exploration and mapping exposed taxon and regional-specific biases requiring additional analysis. Where other biases and irregularities were clear from visual inspection of the range boundary density maps for each taxa, the possible causes of biases were assessed by comparing range boundary density maps to high-resolution imagery and administrative maps via the ArcGIS server (AGOL). Standardized overlay of the taxon boundary sets with administrative or geophysical features from the image-server revealed three types of bias which were either spatially or taxonomically limited between: 1) amphibians with county borders in the United States, 2) dragonflies and river basins globally and 3) gridding of distributions of reptiles. In these cases, species boundary density maps were used as a basis to identify potential biases which were then explored empirically using appropriate methods.
For amphibians, counties in the United States (US) were digitized using a county map from the US (https://gadm.org/), then buffered by with 2.5km either side. Amphibian species range boundary density maps were reclassified showing where species range boundaries existed (with other non-range boundary areas reclassified as “no data,”) and all species boundaries numerically indicated (i.e. values of 1 indicates one species range boundary, values of 10 indicates ten species range boundaries). Percentages of species boundary areas falling on county and in the buffers, in addition to species range boundaries which did not overlap with county boundaries were calculated to give measures of what percentage of the species boundaries fell within 2.5km of county boundaries.
For Odonata, many species were mapped to river basin borders. We used river basins of levels 6-8 (sub-basin to basin) in the river hierarchy (https://hydrosheds.org) to assess the relationship between Odonata boundaries and river boundaries. Two IUCN datasets exist for Odonata; the IUCN Odonata specialist group spatial dataset (https://www.iucnredlist.org/resources/spatial-data-download), and a larger dataset available via the RedList website (https://www.iucnredlist.org/resources/grid/spatial-data) containing an additional 1000 polygons relative to the previous file (as of September 2019), predominately in Latin America, (and often showing extensions of species ranges, or range fragments rather than 1000 additional species). We examined both, as either may be used for contemporary analyses on Odonata.
For reptiles, two grid resolutions were visible when mapping species range boundary density (1 and 0.5 decimal degrees) (Figure S2c shows these grids, and why further analysis was conducted). Gridding in range delineation was examined by developing 1- and 0.5-degree fishnet grids globally (matching the observed grid resolutions). Grids were then aligned with the noted reptile range boundary grids in central Africa (the closest area to 0, 0); if grids were not a genuine artifact of digitization, this would not be possible, or it would be inconsistent in different regions (alignment between the digitized fishnet grid and range boundaries was reconfirmed in Central Asia and South America). Grids were then clipped to land-areas and merged with national political boundaries into a combined shapefile. Species range boundary density was quantified and layers reclassified for areas where >3 species boundaries overlapped, this was then intersected with both grid-sizes to quantify percentages of boundary hotspots overlapping with grids or national borders.
We used occurrence records from the Global Biodiversity Information Facility (GBIF) to compare the downloaded ERMs with locations of known species occurrence. GBIF data are useful for understanding species distributions, and assessing how accurate mapped species ranges are. To ensure exclusion of inaccurate localities, we filtered GBIF point data using a stepwise approach before assessing ERMs (Figure S1c). Firstly, oceanic records (i.e., those geo-referenced outside of terrestrial land) were removed with a global land area mask. A biogeographic realm filter (https://ecoregions2017.appspot.com/) was then used to filter samples clearly in the wrong localities using the realms species occupy according to IUCN data (as IUCN assessment data lists what realm each species is found in). Corrections were made when the realm listed in the IUCN assessment was inconsistent with the associated ERM. In cases where IUCN assessment and ERM had different realms, further analysis was run to assay what realm the ERM fell into as a basis for developing a matching realm filter for GBIF filtering (IUCN_Real_Overlap_polygon.r and IUCN_Real_Dist.r). Once correct realms had been assessed from IUCN data, they were used to filter out GBIF data from the incorrect realm, before the percentages of occurrence points in and outside each species ranges were assessed, these were performed in R with the scripts “IUCN_GBIF_Overlap.r” and “IUCN_GBIF_Overlap_Bird.r”.
As GBIF data includes some synonyms, these were also corrected prior to their use. Synonym lists were developed via IUCN-lists; for birds, the Clements bird checklist was used (https://www.worldbirdnames.org/ioc-lists/master-list-2/). As IUCN lists sometimes gave species as both synonyms and true species, any species listed as both was corrected during filtering using R (using the codes “synonyms_analysis.r” and “merge_gbif_by_syn.r”). Given GBIF efforts to update data filters and the slow rate of taxonomic updates on the RedList (e.g only 45% of amphibian species described between 2004- 2016 were assessed by the IUCN (Tapley et al., 2018)), our approaches are at an ecologically meaningful resolution (500m) and accuracy to assess under-estimation in species ranges defined by their polygon data. Following GBIF filtering, the percentage of GBIF points within each corresponding species polygon was calculated; analyses were run in R and scripts provided (merge_gbif_by_syn_count.r).
Global diversity maps are clearly useful for ecological research, and various filters to remove commission errors have been proposed (i.e. Brooks et al., 2019), but their ability to accurately map distribution or overcome existing biases has not been well assessed. Trimming of ERMs by land-cover and elevation is regularly promoted as a means to trim ERMs to improve accuracy, but it is unknown if simple elevation and landcover trimming effectively corrects for spatial biases, and potentially reduce errors of commission. We used a pre-existing, high resolution dataset for which we already had reliable published diversity maps as a case study to test diversity patterns generated via original ERMs versus those from analysis of point data with and without trimming and compared these all with published diversity models for bats (Hughes, 2017). This enables a high resolution comparison of regional bat richness as a case-study. For this analysis of alternative approaches, we used one taxa in a single region to provide a proof of concept of how alternate approaches could be used. The workflow for this analysis is shown in Figure S1d. To do this, we developed elevation and landcover filters and applied these to the ERMs and to polygons derived from the recent species occurrence point data. Point data were clipped for Eurasia to match existing data and minimum convex polygons (MCPs, polygons bounding the outermost point data from each species’ range) were created in ArcMap for species with at least five unique localities. Filters were created for each species based on elevation and landcover, both 1) using IUCN assessment data exclusively and 2) based on extracting environmental data from points, and these were then paired with associated environmental data to clip species ranges on a per species basis.
We used point data to extract elevations from a 1km-resolution digital elevation model (DEM: http://www.earthenv.org//DEM), with min, max, mean and standard deviation per species calculated from summary statistics. Species exclusively recorded at <1000m=lowland, 1000-2000m=mid, >2000m=high, and between these ranges were ranked accordingly: lowland, low-mid, low-high, mid, high. DEMs were reclassified to corresponding elevation categories. IUCN assessment listings of elevational preference were recorded. An “integrated” status was determined based on comparing the point-based assessments with IUCN-based assessments (when species were assessed in IUCN and had sufficient point data): where only one assessment was given it was retained, where the two agreed it was retained, and where they differed we used the point-based data, given higher precision.
For habitat intactness, we collated IUCN assessments and data extracted from point data. For IUCN assessments, we used keywords to assay disturbance tolerance. Habitat listings which referenced roosting in buildings, houses, or tunnels were assigned as generalists. Species listed in cultivated areas, paddies, plantations, agriculture were assigned as semi-intact and those listing forest and no other “disturbed” habitats assigned as intact. For point data, we classified population layers at the 1km scale to under 50 people per kilometer as intact, 51-100 as semi-intact and over 100 per km as generalist (Ciesen 2020). From point data, species with over 50% of localities in the generalist category were listed as generalists, and species with at least 75% of records in the under 50 people were classed as intact. The IUCN and point-generated categories were then compared; where the two categories differed we selected the “final” classification based on further searches of the literature or direct experience with the species listed.
For richness mapping, we joined the elevation field based on species names using the join function in ArcGIS, split into our five elevation categories, each of which was then clipped by a polygon layer of the appropriate elevation bands. Then the clipped maps for all species were merged. This was repeated for the MCP layer and ERM layers. The ERM layer was run twice, for the “integrated” assessment data using the “integrated” category, and once for IUCN elevation assessments. These were then merged to form three species elevation trimmed species collations (one MCP, two ERM). Layers were then joined to intactness categories, and split into three categories prior to trimming with the appropriate intactness filter (intact, semi-intact, generalist). These were then merged before using the count overlap toolbox in ArcMap to count the number of species overlapping in any given area. This enabled comparison of trimmed and untrimmed layers to a previously published Maxent generated layer of bat species richness (Hughes, 2017) to assess how useful these alternate approaches are for approximating accepted richness patterns. The outputs include untrimmed ERM and MCP layers, one trimmed MCP layer (based on the integrated assessments) and two trimmed ERM layers (based on limited and integrated assessments) in addition to the Maxent (Version 3.4.1) map of regional bat species richness (based on the same point-dataset).