Factors influencing transferability in species distribution models

Cite this dataset

Rousseau, Josée; Betts, Matthew (2022). Factors influencing transferability in species distribution models [Dataset]. Dryad. https://doi.org/10.5061/dryad.08kprr54c

Abstract

Species distribution models (SDMs) provide insights into species’ ecology and distributions and are frequently used to guide conservation priorities. However, many uses of SDMs require model transferability, which refers to the degree to which a model built in one place or time can successfully predict distributions in a different place or time. If a species’ model has high spatial transferability, the relationship between abundance and predictor variables should be consistent across its geographical distribution. We used Breeding Bird Surveys, climate and remote sensing data, and a novel method for quantifying model transferability to test whether SDMs can be transferred across the geographic ranges of 129 species of North American birds. We also assessed whether species’ traits are correlated with model transferability. We expected that prediction accuracy between modeled regions would 1) decrease with geographical distance, 2) decrease with degree of extrapolation, and 3) be affected by a ‘core-boundary’ effect, which relates to distance from the boundary of a distribution. Our results suggest that very few species have a high model transferability index (MTI). Species with large distributions, with distributions located in areas with low topographic relief, and with short lifespans are more likely to exhibit low transferability. Transferability between modeled regions also decreased with geographical distance and degree of extrapolation. We suspect that low transferability in SDMs resulted from both ecological non-stationarity (i.e., biological differences within a species across its range) and over-extrapolation. Accounting for non-stationarity and extrapolation should substantially increase the prediction success of species distribution models, thereby enhancing the success of conservation efforts.

Methods

Data – Bird species and abundance

We used data from the USGS Breeding Bird Survey (BBS) to extract abundances of bird species in Canada and the USA (Pardieck et al. 2019). The BBS consists of routes surveyed once a year during the breeding season (typically June). Each BBS route includes 50 three-minute point counts, separated by a distance of at least 0.5 miles (Sauer et al. 2003).

We selected bird species (n = 138) using three criteria. First, a species had to be detected in at least 30 different routes per year. This ensured a minimum sample size in the number of routes where a species was present and helped ensure our models would predict reasonably well (Hernandez et al. 2006, Wisz et al. 2008). Second, we selected species with a prevalence (i.e., the percentage of routes where the species was present) of at least 20%, and we excluded highly common species (prevalence >75%). These prevalence values are recommended to improve the fit of SDMs (McPherson et al. 2004). Third, a minimum of 80% of the breeding distribution of each species had to be within the area covered by the BBS routes. We used breeding bird distributions from BirdLife International (2018). The area covered by the BBS routes was determined using a minimum convex polygon surrounding all BBS routes.
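The three selection criteria can be illustrated with a short Python sketch. This is not the code used for the dataset: the `SpeciesSummary` structure, its field names, and the way prevalence is averaged across years are assumptions made for illustration, and the fraction of the breeding range inside the BBS polygon is assumed to have been computed beforehand.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SpeciesSummary:
    """Hypothetical per-species summary; all field names are illustrative."""
    name: str
    routes_present_by_year: Dict[int, int]  # year -> routes with a detection
    total_routes: int                       # routes surveyed (assumed constant)
    range_fraction_in_bbs: float            # share of breeding range in BBS area


def prevalence(s: SpeciesSummary) -> float:
    # Assumption: prevalence is the mean yearly number of occupied routes
    # divided by the total number of routes.
    counts = s.routes_present_by_year.values()
    return (sum(counts) / len(counts)) / s.total_routes


def passes_selection(s: SpeciesSummary,
                     min_routes: int = 30,
                     min_prev: float = 0.20,
                     max_prev: float = 0.75,
                     min_range_fraction: float = 0.80) -> bool:
    # Criterion 1: detected on at least `min_routes` routes in every year.
    enough_routes = min(s.routes_present_by_year.values()) >= min_routes
    # Criterion 2: prevalence of at least 20% and not above 75%.
    prevalence_ok = min_prev <= prevalence(s) <= max_prev
    # Criterion 3: at least 80% of the breeding range inside the BBS area.
    range_ok = s.range_fraction_in_bbs >= min_range_fraction
    return enough_routes and prevalence_ok and range_ok
```

The thresholds are exposed as keyword arguments so that the sensitivity of the species list to each criterion can be checked separately.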

For each BBS route and species, we used the mean abundance for the years 2013 to 2017 (inclusive; Howard et al. 2014). This range of years represents the latest five years available at the time of download. Using the mean abundance across a short time frame enabled us to reduce the noise caused by yearly changes in detections, while limiting the impact of long-term changes in habitat and climate on bird abundance (Gutiérrez-Illán et al. 2014, Betts et al. 2019).
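As a minimal sketch of this averaging step, the function below computes a mean abundance per route and species over 2013 to 2017. The record layout (`route`, `species`, `year`, `count`) is hypothetical, and the sketch assumes that years in which a route was surveyed but the species went undetected appear as explicit zero counts; years without a record are simply left out of the mean.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, int, float]  # (route, species, year, count)


def mean_abundance(records: Iterable[Record],
                   first_year: int = 2013,
                   last_year: int = 2017) -> Dict[Tuple[str, str], float]:
    """Mean count per (route, species) over the inclusive year range."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    n_years: Dict[Tuple[str, str], int] = defaultdict(int)
    for route, species, year, count in records:
        if first_year <= year <= last_year:
            totals[(route, species)] += count
            n_years[(route, species)] += 1
    # Divide by the number of years that actually have a record.
    return {key: totals[key] / n_years[key] for key in totals}
```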

Data – Environmental covariates

We used climatic and land cover covariates known to be correlated with bird abundance (Austin 2002, Shirley et al. 2013, Howard et al. 2015). Data were obtained from Google Earth Engine (Gorelick et al. 2017) and were summarized for each BBS route and year (2013 to 2017), using a 400 m buffer (Bahn and McGill 2013). Datasets were selected based on their availability across North America. Climatic covariates were obtained from Daymet V3 (Thornton et al. 2017) and included summer precipitation (prcpSummer), winter precipitation (prcpWinter), maximum summer temperature (tMax), and minimum winter temperature (tMin). As land cover variables, we used the equivalents of Landsat 7 bands 3 (B3) and 4 (B4), derived from Landsat 5, 7, and 8 imagery. These land cover data were summarized using the LandTrendr tools (Kennedy et al. 2018). LandTrendr pre-processes the images, including geometric rectification and cloud and shadow screening, and creates a yearly surface reflectance composite, which we used to summarize data for each BBS route. We used B3 to discriminate between built-up environments and vegetation, and B4 to compare rates of chlorophyll absorption, which is useful for distinguishing conifer from broadleaf vegetation as well as young from senescent vegetation (Cohen and Goward 2004). The climatic and land cover covariates used in the analysis are summarized in Table 1 of the paper. To be consistent with the bird data and to increase model transferability (Tuanmu et al. 2011), covariates for each BBS route were then averaged over the period from 2013 to 2017.
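The final averaging of covariates can be sketched as follows for a single route. The covariate names follow the text, but the input layout (a dictionary of yearly values per route) is an assumption for illustration; the yearly values themselves are taken as already extracted from Google Earth Engine within the 400 m buffer.

```python
from statistics import fmean
from typing import Dict, Iterable

COVARIATES = ("prcpSummer", "prcpWinter", "tMax", "tMin", "B3", "B4")


def average_covariates(yearly: Dict[int, Dict[str, float]],
                       years: Iterable[int] = range(2013, 2018)
                       ) -> Dict[str, float]:
    """Average each covariate over the requested years for one route.

    `yearly` maps year -> {covariate name: value}; this layout is hypothetical.
    """
    rows = [yearly[y] for y in years if y in yearly]
    if not rows:
        raise ValueError("no covariate values available for the requested years")
    return {name: fmean(row[name] for row in rows) for name in COVARIATES}
```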

Funding