Data and code for: Plants with higher dispersal capabilities follow ‘abundant-centre’ distributions but such patterns remain rare in animals
Data files
May 28, 2025 version files, 1.08 MB
- animalia_v2.csv (856.81 KB)
- plantae_v2.csv (216.21 KB)
- README.md (5.15 KB)
Abstract
The ‘abundant-centre’ hypothesis posits that a species’ abundance is highest at its range centre and declines towards its range edge. Recently, the hypothesis has been much debated, with supporting empirical evidence remaining limited. Here, we provide the largest global test of the hypothesis to date, on 3,660 species using 5,703,589 abundance observations. We summarise species-level patterns and test the effects of dispersal-related species traits and phylogeny on abundance–distance relationships. Support for the hypothesis varied by taxonomic group, with abundant-centre patterns being more pronounced across all plants but non-significant when summarised across all animals. Dispersal did not explain abundance–distance relationships in animals but likely explains such patterns in non-woody plants. Phylogeny improved models of abundance–distance patterns for plants but not for animals. Despite this, controlling for phylogeny yielded non-significant group-level results for plants, suggesting that only certain plant groups may conform to abundant-centre patterns. Overall, we demonstrate that abundant-centre patterns are not a general ecological phenomenon; they tend to not apply to animals but can manifest in certain plant groups, depending on dispersal capabilities and evolutionary histories. Leveraging species’ traits that account for dispersal can improve models of abundant-centre patterns across geographic space.
animalia_v2.csv
Data set containing animal abundance-distance correlations, original sources and dispersal-related species traits.
Animal sample size = 3,060 species.
Binomial = genus and species name of the taxon studied (one species per row).
Genus = genus name.
Species = species name.
Study = original study from which species-specific abundance data were extracted.
Year = publication year.
AbundCent.test = binary variable (1, 0) indicating whether the included study specifically tested the abundant-centre hypothesis.
Data.Source = original dataset from which abundance data were sourced (note some studies used third-party data).
DOI.title = Digital Object Identifier for the included studies.
Environment = biogeographic realm of the study species, e.g., terrestrial or marine.
Kingdom = taxonomic kingdom.
Group = taxonomic group for animal species, e.g., birds, freshwater fishes, mammals or reef fishes.
coef = abundance-distance correlation coefficient (based on Spearman Rank Correlations).
t = Spearman Rank test statistic.
p = Spearman Rank P-value.
extent = original study extent, e.g., continental/global or local.regional (see Methods).
n = abundance sample size for each species.
invasive = binary variable (1, 0) indicating whether the species is considered invasive in parts of its range (see Methods).
feeding.guild = animal feeding guild, e.g., carnivore, omnivore or herbivore.
mean.latitude = mean latitude across a species range, later transformed into absolute latitude (see associated R code).
grain = categorical grain size of the original study (see Methods).
log.body.mass = log10-transformed body size variable (body mass in g for all animal groups excluding fishes; snout-vent length, SVL, in cm for fishes) (see Methods).
log.range size = log10 transformed range size estimates derived from the IUCN Red List (km2) (see Methods).
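As a minimal sketch of how the variables above might be summarised, the following Python snippet counts, per taxonomic group, how many species have a negative `coef` and how many of those are significant. Python is used purely for illustration (the accompanying analysis code is in R). The column names `Group`, `coef` and `p` are taken from the list above, but the sample rows, the `alpha = 0.05` threshold and the function name `summarise_coefficients` are assumptions made for this example.

```python
import csv
import io


def summarise_coefficients(rows, alpha=0.05):
    """Count species per group by sign and significance of `coef`.

    `alpha` is an illustrative threshold, not one stated in this README.
    """
    summary = {}
    for row in rows:
        counts = summary.setdefault(
            row["Group"], {"n": 0, "negative": 0, "negative_significant": 0}
        )
        coef = float(row["coef"])
        p_value = float(row["p"])
        counts["n"] += 1
        if coef < 0:  # negative coefficients are consistent with an abundant centre
            counts["negative"] += 1
            if p_value < alpha:
                counts["negative_significant"] += 1
    return summary


# Hypothetical rows standing in for animalia_v2.csv (values are invented).
SAMPLE = """Binomial,Group,coef,p
Species a,birds,-0.42,0.01
Species b,birds,0.10,0.60
Species c,mammals,-0.05,0.80
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
summary = summarise_coefficients(rows)
print(summary)
```

To run this on the real file, the sample could be replaced with `csv.DictReader(open("animalia_v2.csv", newline=""))`, assuming the file has no missing values in `coef` or `p`; rows with missing values would need to be skipped first.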
Sharing/Access information
Data were derived from the following sources:
Chaiyes et al. (2020) Ecosphere. 10.1002/ecs2.3134
Dallas et al. (2017) Ecology Letters. 10.1111/ele.12860
Feldman et al. (2015) Global Ecology and Biogeography. 10.1111/geb.12323
Freeman & Beehler (2018) Journal of Biogeography. 10.1111/jbi.13370
Martinez-Guiterrez et al. (2017) Diversity and Distributions. 10.1111/ddi.12662
Santini et al. (2018) Ecography. 10.1111/ecog.04027
Shalom et al. (2020) Journal of Biogeography. 10.1111/jbi.13920
Wen et al. (2020) Journal of Biogeography. 10.1111/jbi.14025
plantae_v2.csv
Data set containing plant abundance-distance correlations, original sources and dispersal-related species traits.
Plant sample size = 600 species.
Binomial = genus and species name of the taxon studied (one species per row).
Genus = genus name.
Species = species name.
Study = original study from which species-specific abundance data were extracted.
Year = publication year.
AbundCent.test = binary variable (1, 0) indicating whether the included study specifically tested the abundant-centre hypothesis.
Data.Source = original dataset from which abundance data were sourced (note some studies used third-party data).
DOI.title = Digital Object Identifier for the included studies.
Environment = biogeographic realm of the study species, e.g., terrestrial or marine.
Kingdom = taxonomic kingdom.
Group = higher taxonomic group, e.g., plants.
coef = abundance-distance correlation coefficient (based on Spearman Rank Correlations).
t = Spearman Rank test statistic.
p = Spearman Rank P-value.
extent = original study extent, e.g., continental/global or local.regional (see Methods).
n = abundance sample size for each species.
invasive = binary variable (1, 0) indicating whether the species is considered invasive in parts of its range (see Methods).
functional.group = functional group variable, e.g., trees, shrubs, herbs and grasses.
feeding.guild = animal feeding guild, e.g., carnivore, omnivore or herbivore.
life.form = plant life forms based on the classification system of Raunkiær (1934), e.g., phanerophyte, hemicryptophyte, geophyte and therophyte.
life.span = plant life span, e.g., annual, annual/biennial, biennial, biennial/perennial and perennial.
mean.latitude = mean latitude across a species range, later transformed into absolute latitude (see associated R code).
grain = categorical grain size of the original study (see Methods).
log.range size = log10 transformed range size estimates (see Methods).
lofe.mean.plant.height.m = log10 transformed mean plant height (m) estimate (see Methods).
log.seed.mass.mg = log10 transformed seed mass (mg) estimates (see Methods).
Sharing/Access information
Data were derived from the following sources:
Baer & Maron (2019) Journal of Ecology. 10.1111/1365-2745.13086
Dallas et al. (2017) Ecology Letters. 10.1111/ele.12860
Dixon et al. (2013) Molecular Ecology. 10.1111/mec.12207
Gao et al. (2017) Ecosphere. 10.1002/ecs2.1737
McMinn et al. (2016) Journal of Biogeography. 10.1111/jbi.12879
Phiri et al. (2015) Polar Biology. 10.1007/s00300-015-1749-1
Sporbert et al. (2020) Journal of Biogeography. 10.1111/jbi.13926
Code/Software
This data set should be used to run the accompanying R code "all analyses together - version 4 - SK.R".
Literature searches
We conducted a systematic literature search on 23rd July 2021 by querying the ISI Web of Science database (apps.webofknowledge.com) with the following search string for an initial broad search: “(abundan* OR abundance-cent* OR abundant niche-cent* OR niche cent* OR abundant-centre hypothesis) AND (range OR geographic range OR range size OR range edge OR species distribution)” using the TITLE field. We retained all studies that 1) were peer-reviewed primary studies, 2) presented globally extensive abundance point observations across all taxonomic groups, 3) were published between 1990 and 2020, 4) included extractable data relating to observed/estimated abundance counts and 5) were published in English, French or Spanish.
Examination of the returned studies (N = 818) revealed that some key literature was missing from our results. Therefore, we used the studies returned from our initial search as reference sources for additional key search terms to derive an optimised search string (using the R package ‘litsearchr’ (Grames et al. 2019)). Search terms were extracted from unique study titles, abstracts and tagged keywords (e.g., terms such as ‘range edge’, ‘abundance’ and ‘species range’). We built a keyword co-occurrence network and quantitatively assessed potential search terms using a 60% cumulative cut-off point (see Grames et al. 2019). Resulting search terms (N = 326) were grouped into either 1) the ‘species group’, 2) ‘geographic group’ or 3) ‘both groups’ depending on whether the term referred to a species concept or geographic concept. Grouping refers to the string of search terms either side of the Boolean operator ‘AND’. We manually removed irrelevant search terms (N = 268; e.g., ‘field sites’, ‘habitat patch’ and ‘statistically significant’) and retained the most relevant search terms (N = 58) which formed our optimised search string (Table S13). To verify whether our resulting search string was fully optimised, we cross-referenced four key articles that we expected to be included in the optimised search results (Virgós et al. 2011; Dixon et al. 2013; Baldanzi et al. 2013 and Dallas et al. 2017), all of which were included. We queried the WoS database using our optimised search string and obtained 531 studies. After screening of titles and abstracts, we retained 23 studies for data extraction.
We supplemented our Web of Science literature search with the studies included within the foundational synthesis by Sagarin and Gaines (2002). We then conducted a snowball search of the literature that cited Sagarin and Gaines (2002) up until 31st December 2020. This resulted in a literature database of 1,109 studies of which we removed a set of 1,000 studies after screening titles and abstracts and an additional set of 95 studies after screening of full texts, leaving us with an additional 14 studies that were suitable for data extraction.
Data extraction and processing
From each study and for each species, we extracted raw abundance values and distances from the species’ geographic range centroid (in km). Corresponding authors were contacted by email if data were not publicly available. Where data were not publicly available and the corresponding author was unable to provide them, we extracted abundance and distance data from appropriate figures within the published articles using the web-based tool WebPlotDigitizer version 4.5 (Rohatgi 2021; https://automeris.io/WebPlotDigitizer). If abundance data were available but distance values were not, global range maps were obtained in shapefile format for each species and range centroids were calculated. We obtained global range shapefiles for terrestrial mammals from the IUCN Red List (IUCN 2021; https://www.iucnredlist.org) and for birds from the BirdLife Data Zone version 2020.1 (BirdLife International and the Handbook of the Birds of the World 2020; http://datazone.birdlife.org/home), thus accounting for the entire ranges of migratory and non-migratory species. IUCN range maps have been criticized for oversimplifying species’ ranges owing to sampling bias (Herkt et al. 2017), but they represent the most comprehensive spatial data set available for our study species. If global polygon range maps were unavailable for particularly under-sampled taxonomic groups, e.g., invertebrates and some plant species (N spp. = 8), we downloaded species occurrence point data from the Global Biodiversity Information Facility (GBIF; https://www.gbif.org). Occurrence data were then cleaned using the ‘CoordinateCleaner’ R package (Zizka et al. 2019) and manual checks were performed to remove any remaining outliers (Zizka et al. 2020; Panter et al. 2020).
For terrestrial species, we calculated minimum convex hulls that accounted for unsuitable terrestrial environments, so as not to overestimate global range sizes. We interpreted these hulls as proxies for global species ranges and calculated range centroids using QGIS 3.14.16 (QGIS.org 2022). Then, we calculated the geodesic distance on a sphere (km) between each sampling site with an associated abundance value and the species’ range centroid, using the WGS84 co-ordinate reference system. Abundance and distance values were log10-transformed prior to statistical analyses to account for scaling inconsistencies. Species with unresolved species-level taxonomies (N = 106) as well as species with fewer than five observations were omitted from the analysis. We calculated Spearman Rank Correlation Coefficients (rs) between log10(abundance) and log10(distance) values (Fig. 2). Negative rs values are consistent with an ‘abundant-centre’ distribution (see Fig. 2).
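The distance and correlation steps described above can be sketched as follows. This is an illustration in Python, not the authors' R/QGIS workflow: it assumes the haversine formula with a mean Earth radius of 6371.0088 km for the distance on a sphere, and it computes Spearman's rs as the Pearson correlation of average ranks. The centroid, site coordinates and abundance values are invented.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius; not stated in the source


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical species: range centroid plus five sites (lat, lon, abundance).
centroid = (0.0, 0.0)
sites = [(0.0, 1.0, 50), (0.0, 2.0, 40), (0.0, 3.0, 30), (0.0, 4.0, 20), (0.0, 5.0, 10)]

log_distance = [math.log10(haversine_km(centroid[0], centroid[1], lat, lon)) for lat, lon, _ in sites]
log_abundance = [math.log10(n) for _, _, n in sites]
rs = spearman_rs(log_abundance, log_distance)
print(round(rs, 3))
```

In this invented example abundance declines monotonically with distance, so rs is -1, the strongest possible 'abundant-centre' signal. Because Spearman's rs depends only on ranks, the log10 transformation does not change its value here. Sites with zero abundance or zero distance cannot be log10-transformed and would need separate handling, which this sketch does not attempt.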
Scale effects on abundance–distance relationships
We used three measurements to explore the effect of scale on ‘abundant-centre’ patterns. 1) We calculated the study extent (km), i.e., the spatial extent at which the study was conducted, encompassing the total study area between sampling locations in the data. This was measured using both latitudinal and longitudinal measurements. Initially, we used four categorical levels: ‘Local’ (≤ 250 km), ‘Landscape’ (251-500 km), ‘Regional’ (501-1,500 km) and ‘Continental’ (> 1,501 km). 2) We calculated the grain (km2), i.e., the spatial scale at which data were collected, which is important because the area of the base unit defines the spatial scale of the study (Field et al. 2009), and variation in grain may be reflected in population abundance estimates (see Caten et al. 2022). Grain was extracted from each study by taking the base unit area for each sampling technique, e.g., sampling units measured in km2 (Kallimanis & Koutsias 2012). 3) We calculated study focus (km2), defined as the spatial scale at which data were analysed. Often, abundance estimates from individual sampling sites are averaged across larger sampling areas, e.g., a protected area sampled using a number of line transects and multiple sampling points along each transect, with abundance values averaged across all of the sampling points to produce an estimate for each transect. In most cases grain and focus remained the same for each study, e.g., when abundance data were recorded in the form of points within a species’ geographic range. Grain and focus values were log10-transformed due to the large variation in the range of these values. We then plotted the distribution of the log10-transformed values on separate histograms and visualised the natural breaks in the data. Using these, we binned both grain and focus into two new categorical levels: ‘small’ and ‘large’ (-10 to -3 and -3 to 3 on the log10 scale, respectively).
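A minimal Python sketch of the two binning schemes described above is given below. The function names are hypothetical, and two boundary choices are assumptions because the text does not specify them: extent values falling between the stated classes (e.g., between 250 and 251 km, or between 1,500 and 1,501 km) are assigned to the higher class, and a grain value exactly at the log10 break of -3 is assigned to ‘large’.

```python
import math


def extent_class(extent_km):
    """Initial four-level extent category from the study extent in km."""
    if extent_km <= 250:
        return "Local"
    if extent_km <= 500:
        return "Landscape"
    if extent_km <= 1500:
        return "Regional"
    return "Continental"  # assumption: anything above 1,500 km


def grain_class(grain_km2, break_log10=-3.0):
    """Two-level grain (or focus) category based on the log10 natural break."""
    # Assumption: a value exactly at the break is treated as 'large'.
    return "small" if math.log10(grain_km2) < break_log10 else "large"


# Hypothetical inputs: a 1 m2 plot (1e-6 km2) and a 1 km2 grid cell.
print(extent_class(300), grain_class(1e-6), grain_class(1.0))
```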
We dropped extent from our analyses due to uneven sample sizes: extent data were available for only three groups (birds, mammals and plants), with 2,717 species at the global extent vs. 146 species at the local extent. We also dropped focus from our analyses because the natural break categorical bins were identical to those for grain. Grain was subsequently omitted from the statistical analyses because it was strongly correlated with the animal species group and plant functional group variables and thus could not be included within the same models.
Compilation of dispersal-related species traits and geographic variables
We compiled six dispersal-related species traits for animal and eight traits for plant species to examine their effects on abundance–distance relationships (Table 1; Table S2). Traits were selected based on the morphological and/or ecological characteristics of the study species and included: for animals 1) taxonomic group (categorical), 2) body size (continuous), 3) invasiveness (binary 1,0) and 4) feeding guild (categorical); and for plants: 1) functional group (categorical), 2) mean plant height (m), 3) seed mass (mg), 4) invasiveness (binary 1,0), 5) life span (categorical) and 6) life form (categorical) (see Table 1 for an overview and Table S3 for justifications for the inclusion of each species trait/geographic variable). To explore spatial patterns within global abundance–distance relationships, we also compiled geographic data for species-level range sizes (km2) and absolute latitudes (°).
Due to small sample sizes (N = 9 species), invertebrates were dropped prior to statistical analysis. To examine the effects of body size, we compiled body mass (g) data for mammals and birds, and snout-vent lengths (SVL; cm) for freshwater and reef fishes to produce the trait variable ‘body size’. For plants, we used mean plant height (m) as a proxy for body size. Mean plant height was used instead of maximum plant height as these were the only data available for our selected species, and notable effects of plant height on species abundance patterns would be reflected in either measurement. Where plant height and seed mass data were unavailable, we supplemented our data with gap-filled measurements from Bruelheide et al. (2018) and Kattge et al. (2020) which were estimated using Bayesian Hierarchical Probabilistic Matrix Factorization (BHPMF; Schrodt et al. 2015). Trait data for plant functional groups were sourced from the BiolFlor database (Kühn et al. 2004) and the corresponding levels ‘dwarf shrub’ (N = 11 species) and ‘subshrub’ (N = 4 species) were merged into the level ‘shrub’ to produce four distinct categorical levels: ‘grasses’, ‘herbs’, ‘shrubs’ and ‘trees’. Plant life-form data followed the classification of Raunkiær (1934) but due to small sample sizes for ‘chamaephytes’ (woody plants with perennating buds borne close to the soil surface), we merged these with the ‘phanerophytes’ (woody perennial plants with buds at a distance from the surface, such as trees and shrubs). Invasiveness was assessed using a binary approach (1 = invasive and 0 = non-invasive) according to the Invasive Species Specialist Group’s Global Invasive Species Database (ISSG GISD; http://www.iucngisd.org/gisd/). For both animal and plant species, absolute latitude (°) was calculated as the absolute value of the range centroid. 
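The level merging and the absolute-latitude step described above can be sketched as follows. This is a Python illustration only: the exact spellings of the factor levels in the data files are not given here, so the dictionary keys, the lower-casing of labels and the function names are assumptions.

```python
# Hypothetical level spellings; the data files may use different labels.
FUNCTIONAL_GROUP_MERGE = {"dwarf shrub": "shrub", "subshrub": "shrub"}
LIFE_FORM_MERGE = {"chamaephyte": "phanerophyte"}


def merge_level(value, mapping):
    """Return the merged level for `value`, or the cleaned value if no merge applies."""
    key = value.strip().lower()
    return mapping.get(key, key)


def absolute_latitude(centroid_latitude):
    """Absolute latitude (degrees) of a range centroid."""
    return abs(centroid_latitude)


print(merge_level("Dwarf shrub", FUNCTIONAL_GROUP_MERGE))
print(merge_level("chamaephyte", LIFE_FORM_MERGE))
print(absolute_latitude(-23.5))
```

Keeping the merges in explicit dictionaries makes each recoding decision visible and easy to reverse if sample sizes later allow the original levels to be analysed separately.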
The following species traits/geographic variables were log10-transformed prior to analysis to account for right-skew within the data: body size (cm; g), mean plant height (m), seed mass (mg) and range size (km2). We tested for, but did not find, collinearity between continuous explanatory variables using a correlation threshold value ≥ 0.70 in the R package ‘Hmisc’ (Harrell Jr 2022) and visualised this using the ‘pheatmap’ package (Kolde 2019) (Fig. S1).
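A collinearity screen of the kind described above can be sketched in Python as follows. The text gives the 0.70 threshold but not the type of correlation coefficient, so Pearson's r is an assumption here, as are the function names and the invented trait values. The example data are deliberately constructed so that one pair exceeds the threshold; in the actual study no collinearity was found.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def collinear_pairs(columns, threshold=0.70):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for name_a, name_b in combinations(sorted(columns), 2):
        r = pearson_r(columns[name_a], columns[name_b])
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical, already log10-transformed continuous variables.
columns = {
    "trait_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "trait_b": [2.0, 4.0, 6.0, 8.0, 10.5],
    "trait_c": [3.0, 1.0, 4.0, 1.0, 5.0],
}

flagged = collinear_pairs(columns)
print(flagged)
```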
For full methodological details, see the "Methods" section of the associated manuscript.
Data sets are provided as comma-separated values (.csv) files and the code is provided as an R script (.R file) for the software R.
- Panter, Connor; Bachman, Steven; Baines, Oliver et al. (2025). Data and code for: Plants with higher dispersal capabilities follow 'abundant-centre' distributions but such patterns remain rare in animals. Zenodo. https://doi.org/10.5281/zenodo.8211933
- Panter, Connor; Bachman, Steven; Baines, Oliver et al. (2025). Data and code for: Plants with higher dispersal capabilities follow 'abundant-centre' distributions but such patterns remain rare in animals. Zenodo. https://doi.org/10.5281/zenodo.8211934
- Panter, Connor T.; Bachman, Steven P.; Baines, Oliver et al. (2023). Species abundances often conform to ‘abundant-centre’ patterns depending on dispersal capabilities [Preprint]. Cold Spring Harbor Laboratory. https://doi.org/10.1101/2023.03.31.535106
- Panter, Connor T.; Kambach, Stephan; Bachman, Steven P. et al. (2025). Plants with higher dispersal capabilities follow ‘abundant-centre’ distributions but such patterns remain rare in animals. Nature Communications. https://doi.org/10.1038/s41467-025-63566-0
