
Quantifying niche similarity among new world seed plants--Species Distribution Models (SDMs) & associated metadata

Cite this dataset

Figueroa, Hector et al. (2021). Quantifying niche similarity among new world seed plants--Species Distribution Models (SDMs) & associated metadata [Dataset]. Dryad.


Niche shift and conservatism are often framed as mutually exclusive. However, both processes could contribute to biodiversity patterns. We tested this expectation by quantifying the degree of climatic niche similarity among New World seed plants.

To incorporate the biological reality that species experience varied abiotic conditions across their range, we assembled distribution models and used these to characterize temperature, precipitation, and elevation niches for species as continuously-valued distributions. We then quantified niche similarity (distributional overlap) and identified statistically significant differences compared to a randomized null.
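As a rough illustration of "distributional overlap compared to a randomized null", the sketch below scores two samples of climate values with a histogram overlap coefficient and compares that score to label-shuffled permutations. The overlap metric, the bin count, and the permutation scheme are assumptions made for this example; the associated publication defines the procedure actually used.

```python
import random

def histogram(values, lo, hi, bins):
    """Return a normalized histogram (proportions summing to 1)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the top edge
        counts[idx] += 1
    return [c / len(values) for c in counts]

def overlap(a, b, bins=20):
    """Overlap coefficient: per-bin minimum of the two proportions, summed (0 to 1)."""
    lo, hi = min(min(a), min(b)), max(max(a), max(b))
    if lo == hi:
        return 1.0  # every value is identical
    ha, hb = histogram(a, lo, hi, bins), histogram(b, lo, hi, bins)
    return sum(min(x, y) for x, y in zip(ha, hb))

def permutation_p_value(a, b, n_perm=999, seed=0):
    """Return (observed overlap, share of shuffled overlaps <= observed)."""
    rng = random.Random(seed)
    observed = overlap(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # assumed null: species labels are exchangeable
        if overlap(pooled[:len(a)], pooled[len(a):]) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value here means the two samples overlap less than expected if the values were interchangeable between species, which is one way to read "significantly different in niche".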

The degree of niche similarity differed among climate variables, plant lineages, and at different phylogenetic scales. For example, ~17% of all seed plants were significantly different in elevational niche from their closest relative(s), whereas for precipitation, this value was only ~4%. Average niche similarity decreased with increasing phylogenetic distance, consistent with niche conservatism; however, variance in niche similarity among close relatives was large, such that there always existed niche differences equaling those among distantly related species.

Our results suggest researchers should incorporate both niche shift and conservatism as important, scale-dependent factors shaping biodiversity patterns as these processes are not mutually exclusive, nor do they contribute equally to patterns among different plant lineages or niche variables.


We obtained a dated phylogeny for all seed plants from Smith & Brown (2018) (ALLMB phylogeny) and left polytomies unresolved. This phylogeny generated a species list with which to query American occurrence records from the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio). Records were then cleaned and filtered using the BiotaPhy Platform interface (Soltis & Soltis, 2016), following their accepted best practices.

The full GBIF dataset (N = 36,335,199 records) is described and accessible online, as well as in the Online Supplement. GBIF records with any of the following flags were removed: TAXON_MATCH_FUZZY, TAXON_MATCH_HIGHER_RANK, TAXON_MATCH_NONE. Further processing was performed after aggregating GBIF and iDigBio records. For iDigBio, data cleaning and filtering produced a dataset of 13,667,523 records (initial N = 58,384,427; 23.4% retained). Initial records were filtered by removing those with any of the following flags: GEOPOINT_DATUM_MISSING, GEOPOINT_BOUNDS, GEOPOINT_DATUM_ERROR, GEOPOINT_SIMILAR_COORD, REV_GEOCODE_MISMATCH, REV_GEOCODE_FAILURE, GEOPOINT_0_COORD, TAXON_MATCH_FAILED, DWC_KINGDOM_SUSPECT, DWC_TAXONRANK_INVALID, DWC_TAXONRANK_REMOVED (see the Online Supplement for full details).
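The flag-based filtering described above could be sketched as follows. The flag names are taken from the text (written with underscores throughout); the record layout, a dict with a "flags" entry, is a hypothetical stand-in and not the BiotaPhy, GBIF, or iDigBio data model.

```python
# Flags named in the text for each source.
GBIF_EXCLUDED = {"TAXON_MATCH_FUZZY", "TAXON_MATCH_HIGHER_RANK", "TAXON_MATCH_NONE"}
IDIGBIO_EXCLUDED = {
    "GEOPOINT_DATUM_MISSING", "GEOPOINT_BOUNDS", "GEOPOINT_DATUM_ERROR",
    "GEOPOINT_SIMILAR_COORD", "REV_GEOCODE_MISMATCH", "REV_GEOCODE_FAILURE",
    "GEOPOINT_0_COORD", "TAXON_MATCH_FAILED", "DWC_KINGDOM_SUSPECT",
    "DWC_TAXONRANK_INVALID", "DWC_TAXONRANK_REMOVED",
}

def drop_flagged(records, excluded):
    """Keep only records that carry none of the excluded flags.

    Each record is assumed (for this sketch) to be a dict whose optional
    "flags" entry is an iterable of flag strings.
    """
    return [r for r in records if not (set(r.get("flags", ())) & excluded)]
```

For example, `drop_flagged(gbif_records, GBIF_EXCLUDED)` would return the GBIF records with no taxon-matching issue, where `gbif_records` is whatever list of record dicts the user has assembled.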

Aggregated GBIF and iDigBio records were then further processed by excluding points with any of the following issues: 1) falling outside the study area (the Americas); 2) coordinates with fewer than four decimal places of precision; 3) duplicate localities; 4) falling outside polygons describing accepted species’ distributions (defined by Plants of the World Online, POWO; Brummitt, 2001); 5) belonging to species with fewer than twelve records.
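A minimal sketch of these exclusion steps is given below, under several assumptions: the bounding box is a hypothetical stand-in for "the Americas", coordinate precision is read from the text form of each coordinate, and the POWO polygon check is left out because it needs the polygon data. It is not the authors' pipeline.

```python
from collections import defaultdict

# Hypothetical box standing in for the study area; the text gives no coordinates.
AMERICAS_BBOX = (-170.0, -60.0, -30.0, 85.0)  # (min_lon, min_lat, max_lon, max_lat)
MIN_DECIMALS = 4   # coordinates need at least four decimal places
MIN_RECORDS = 12   # species with fewer cleaned records are dropped

def decimals(text):
    """Count the digits after the decimal point in a coordinate string."""
    return len(text.partition(".")[2])

def clean(records):
    """Filter (species, lat_text, lon_text) tuples; return {species: [(lat, lon), ...]}."""
    min_lon, min_lat, max_lon, max_lat = AMERICAS_BBOX
    seen = set()
    kept = defaultdict(list)
    for species, lat_text, lon_text in records:
        if decimals(lat_text) < MIN_DECIMALS or decimals(lon_text) < MIN_DECIMALS:
            continue  # too imprecise
        lat, lon = float(lat_text), float(lon_text)
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            continue  # outside the study area
        key = (species, lat, lon)
        if key in seen:
            continue  # duplicate locality for this species
        seen.add(key)
        kept[species].append((lat, lon))
    # The POWO polygon check would go here; it is omitted from this sketch.
    return {sp: pts for sp, pts in kept.items() if len(pts) >= MIN_RECORDS}
```

The record-count threshold is applied last so that it counts only records that survived the earlier filters.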

Cleaned records were then passed to MaxEnt (version 3.1.4; Phillips et al., 2004, 2006) along with 2.5′ (arc-minute) resolution climate data from WorldClim (Fick & Hijmans, 2017) to build species distribution models (SDMs; climate layers are described in the Online Supplement). We chose to perform our analyses using SDMs rather than point occurrence records for two reasons. First, SDMs offer a probabilistic way of describing expected species’ ranges based on the climate at sites where the species has been observed. In this way, SDMs convert presence/absence data into a continuously valued function, allowing us to ask how distributions are impacted by abiotic factors without having to arbitrarily bin species as, for example, alpine or montane. Second, using SDMs helps overcome some sampling limitations by providing insight into the climatic tolerances of where species might occur, even if they have not been sampled at that precise location. Although this could lead to erroneous predictions (for example, that a northern boreal species should occur at extreme southern latitudes), we overcame this obstacle by masking the SDMs with polygons provided by POWO that define geographically broad areas where each species occurs based on expert assessments. This approach thus constrained SDMs by both known areas of occurrence and climatic tolerances.
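The masking step can be illustrated with a toy example: each cell of a suitability grid is kept only if its center falls inside a polygon. The grid layout (rows running north to south, square cells), the function names, and the ray-casting test are assumptions made for this sketch; real workflows would normally rely on a GIS library and the actual POWO polygons.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as [(x, y), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def mask_sdm(suitability, polygon, x_min, y_max, cell_size, nodata=None):
    """Replace suitability values whose cell center lies outside the polygon."""
    masked = []
    for r, row in enumerate(suitability):
        y = y_max - (r + 0.5) * cell_size  # row 0 is the northernmost row
        out = []
        for c, value in enumerate(row):
            x = x_min + (c + 0.5) * cell_size
            out.append(value if point_in_polygon(x, y, polygon) else nodata)
        masked.append(out)
    return masked
```

Cells that are climatically suitable but lie outside the expert-drawn area are set to `nodata`, so they no longer contribute to any downstream niche summary.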

Usage notes

This Dryad deposit contains a zipped folder, which in turn has a subfolder for each genus in our dataset. The genus folders contain the Species Distribution Models (SDMs) as raster files in TIFF format. SDMs should be accessible with any software suitable for handling such rasters (R, Python, QGIS, etc.).
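Given the one-folder-per-genus layout described above, a short standard-library sketch such as the following could build an index of raster files by genus. The accepted file suffixes and the idea that each genus folder holds rasters directly (rather than in nested folders) are assumptions; file names inside the deposit may differ.

```python
from pathlib import Path

RASTER_SUFFIXES = {".tif", ".tiff"}  # assumed suffixes, compared case-insensitively

def index_sdms(root):
    """Map each genus folder name under root to a sorted list of its raster paths."""
    index = {}
    for genus_dir in sorted(Path(root).iterdir()):
        if not genus_dir.is_dir():
            continue
        rasters = sorted(
            p for p in genus_dir.iterdir() if p.suffix.lower() in RASTER_SUFFIXES
        )
        if rasters:
            index[genus_dir.name] = rasters
    return index
```

The resulting paths can then be passed to whichever raster reader the user prefers.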

Please note that this folder contains SDMs for >72,000 species and as such is quite large (~800 GB when unzipped). We highly recommend using a higher-performance computer with suitable storage capabilities (or an external storage device); opening the folder on a personal laptop may not be feasible for many users.
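Users who need only a few genera may be able to avoid unzipping everything. The sketch below uses Python's standard zipfile module to extract only the members that sit under a folder named after one genus. The archive layout and the example names in the usage note are assumptions based on the description above, and the full archive still has to be downloaded first.

```python
import zipfile
from pathlib import PurePosixPath

def extract_genus(zip_path, genus, dest):
    """Extract only files located under a folder named `genus`; return their names."""
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # directory entry, nothing to extract
            folders = PurePosixPath(name).parts[:-1]
            if genus in folders:
                archive.extract(name, dest)
                extracted.append(name)
    return extracted
```

For instance, `extract_genus("sdms.zip", "Quercus", "sdm_subset")` would write only the Quercus folder into `sdm_subset`, where both the archive name and the output folder are placeholders.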

SDMs were used to characterize the climatic niches of American seed plants with respect to temperature, precipitation, and elevation, as described in the associated publication. Custom scripts used to parse each SDM into each of these climate variables are available in the Online Supplement (Scripts folder), as well as within this Dryad deposit. Climate data used to parse SDMs were obtained from WorldClim, and users could easily parse SDMs into other climate layers or raster/shape files for their own purposes.
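One plausible way to "parse" an SDM into a climate variable is to weight each cell's climate value by its predicted suitability and summarize the result as a normalized histogram. The sketch below does this for two aligned grids held as nested lists; the weighting scheme, binning, and function name are assumptions for illustration, and the scripts in the deposit remain the authoritative implementation.

```python
def weighted_niche(suitability, climate, bins, lo, hi):
    """Suitability-weighted histogram of a climate layer; weights sum to 1.

    Both grids must share the same shape. Cells that are None in either
    grid, or whose climate value is outside [lo, hi], are skipped.
    """
    weights = [0.0] * bins
    width = (hi - lo) / bins
    for srow, crow in zip(suitability, climate):
        for s, c in zip(srow, crow):
            if s is None or c is None or not (lo <= c <= hi):
                continue
            idx = min(int((c - lo) / width), bins - 1)  # clamp the top edge
            weights[idx] += s
    total = sum(weights)
    if total == 0:
        raise ValueError("no suitable cells fall inside the climate range")
    return [w / total for w in weights]
```

Swapping in a different climate grid (for example, a precipitation layer instead of temperature) yields the corresponding niche distribution for the same species.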


National Science Foundation, Award: 1930005

National Science Foundation, Award: 1930007

National Science Foundation, Award: 1930030

National Science Foundation, Award: 1338694