Habitats as predictors in species distribution models: Shall we use continuous or binary data?

Cite this dataset

Gábor, Lukáš et al. (2022). Habitats as predictors in species distribution models: Shall we use continuous or binary data? [Dataset]. Dryad.


The representation of a land cover type (i.e., habitat) within an area is often used as an explanatory variable in species distribution models. However, it is possible that a simple binary presence/absence of the suitable habitat might be the most important determinant of the presence/absence of some species and, thus, be a better predictor of species occurrence than the continuous parameter (area). We hypothesize that the binary predictor is more suitable for relatively rare habitats (e.g., wetlands) while for common habitats (e.g., forests) the amount of the focal habitat is a better predictor. We used the Third Atlas of Breeding Birds in the Czech Republic as the source of species distribution data and the CORINE Land Cover inventory as the source of the land cover information. To test our hypothesis, we fitted generalized linear models of 32 water and 32 forest bird species. Our results show that for water bird species, models using binary predictors (presence/absence of the habitat) performed better than models with continuous predictors (i.e., the amount of the habitat); for forest species, however, we observed the opposite. Thus, future studies using habitats as predictors of species occurrences should consider the prevalence of the habitat in the landscape and the biological role of the habitat type in the particular species’ life history. In addition, a preliminary comparison of the performance of the binary and continuous versions of habitat predictors (e.g., using information criteria) during variable selection, prior to modelling, can be beneficial. These are simple steps that will improve the explanatory and predictive performance of models of species distributions in biogeography, community ecology, macroecology, and ecological conservation.


Study area and bird distribution data

The study area was the territory of the Czech Republic, a central European country covering almost 79,000 km² (see Figure 2a). Data on bird species were obtained from the Third Atlas of Breeding Bird Distribution in the Czech Republic (Šťastný et al. 2006). The study area is divided into 628 grid squares of approx. 134 km² (10’ east longitude × 6’ north latitude; hereafter referred to as mapping squares), to which bird occurrences and environmental predictors are referenced. The fieldwork for the atlas was conducted by volunteers between 2001 and 2003, during which the breeding status of all species was recorded in each mapping square. Field observations of the bird species occurring in each mapping square were recorded using 17 numerical breeding codes (Hagemeijer and Blair 1997). Breeding occurrence of each bird species within a given mapping square was classified into one of the following categories: 0 – Non-breeding (where no observations of the species were made, or where the species was observed but no breeding evidence was found), A – Possible breeding, B – Probable breeding or C – Confirmed breeding. For the purpose of our study, all breeding categories (A, B and C) were treated as presences, whereas category 0 was treated as absence. We prepared data for 85 bird species, 36 of them nesting in wetlands and surrounding habitats (e.g., standing water, littoral zones of ponds, swamps) and 49 nesting in forests, following the classification of Reif et al. (2006). However, we had to remove 21 species with either relatively low occupancy (fewer than 30 presence cells out of 628) or relatively high occupancy (more than 598 presence cells out of 628). Therefore, 32 water and 32 forest bird species (see Table A1) were included in the study.
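The presence/absence coding and the occupancy filter described above can be sketched as follows. This is a minimal illustration, not the authors' processing script: the species names and code vectors are invented, and the function names `to_presence` and `keep_species` are hypothetical. Only the category-to-presence rule and the 30 and 598 cell limits come from the text.

```python
# Atlas breeding categories: "0" = non-breeding, "A" = possible,
# "B" = probable, "C" = confirmed. A, B and C count as presence.
PRESENCE_CODES = {"A", "B", "C"}
MIN_PRESENCES = 30    # species with fewer presence cells are removed
MAX_PRESENCES = 598   # species with more presence cells are removed


def to_presence(codes):
    """Convert atlas breeding categories to 1 (presence) or 0 (absence)."""
    return [1 if code in PRESENCE_CODES else 0 for code in codes]


def keep_species(presences, lo=MIN_PRESENCES, hi=MAX_PRESENCES):
    """Return True if the number of presence cells lies within [lo, hi]."""
    n_presences = sum(presences)
    return lo <= n_presences <= hi


# Invented example data: one category per mapping square (628 squares each).
records = {
    "rare_species": ["A"] * 12 + ["0"] * 616,
    "moderate_species": ["C"] * 200 + ["B"] * 50 + ["0"] * 378,
    "ubiquitous_species": ["C"] * 620 + ["0"] * 8,
}

retained = [
    name for name, codes in records.items()
    if keep_species(to_presence(codes))
]
print(retained)
```

In this invented example only the species with 250 presence cells passes the filter; the other two fall outside the 30 to 598 range.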

Habitat variables

We derived four habitat predictors from the CORINE Land Cover database at 100 m resolution (Feranec et al., 2010). Specifically, within each mapping square, we derived the area of agricultural land (CORINE class 2), the area of artificial surfaces divided into four classes (0, 0–20, 20–40, > 40 km²; CORINE class 1), the continuous area of water bodies (CORINE class 5.1.2) and the area of forest (CORINE class 3.1). In addition, binary factors representing the presence or absence of water bodies and forests, respectively, were calculated. To generate binary habitat maps, it is necessary to determine an area threshold that defines the presence or absence of the habitat. An appropriate threshold should consider the prevalence of the habitat across the region of interest, the grain size at which the variable is being considered (i.e., the size of the grid cells at which the species are recorded) and the original grain size from which the habitat variable is aggregated (i.e., the size of the grid cells of the original land-cover data, which are then aggregated to the larger modelling grain size). Because water habitats are rare and the resolution of CORINE Land Cover is coarse, we considered any amount of water habitat in a cell as presence (i.e., the threshold on the proportion of the cell occupied was set to > 0%, so a single one-hectare pixel was sufficient). Forest pixels, on the other hand, are present in all mapping squares across the study region and, for this reason, we tested several thresholds (10%, 20%, 30%, 40%, and 50%) to derive the binary predictor.
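The thresholding step can be illustrated with a short sketch. The habitat areas below are invented, the function names are hypothetical, and the use of a strict "greater than" comparison for the forest thresholds is an assumption (the text states it only for the > 0% water threshold).

```python
SQUARE_AREA_KM2 = 134.0  # approximate mapping-square area given in the text
FOREST_THRESHOLDS = [0.10, 0.20, 0.30, 0.40, 0.50]


def habitat_proportion(habitat_km2, square_km2=SQUARE_AREA_KM2):
    """Proportion of a mapping square covered by the habitat."""
    return habitat_km2 / square_km2


def binarize(proportion, threshold=0.0):
    """Return 1 if the proportion exceeds the threshold, else 0.

    Assumption: strict inequality is used for every threshold.
    """
    return 1 if proportion > threshold else 0


# Invented example: a square with one hectare (0.01 km2) of water
# and 45 km2 of forest.
water_binary = binarize(habitat_proportion(0.01))  # any water counts
forest_binary = {
    threshold: binarize(habitat_proportion(45.0), threshold)
    for threshold in FOREST_THRESHOLDS
}
print(water_binary, forest_binary)
```

Here the forest proportion is roughly 0.34, so the square counts as forest presence under the 10%, 20% and 30% thresholds and as absence under 40% and 50%.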

Other environmental variables

Although the habitat predictors were our main focus, other predictors, such as climate, may also be important in determining the distributions of species. As climatic predictors, we used current climatic data from WorldClim (Hijmans et al., 2005). Following previous studies, we used two predictors: mean temperature and mean precipitation during the breeding season, i.e., April–June (e.g., Moudrý and Šímová 2013, Venne and Currie 2021). We downloaded these at a resolution of 30 arc seconds (~ 1 km²) and averaged them within each mapping square to match the grid resolution of the species distribution data (~ 100 km²). We also considered using elevation predictors such as the maximum, minimum, and range of elevation derived from the Shuttle Radar Topography Mission (SRTM, Farr et al. 2007; Moudrý et al. 2018), as they might be ecologically important to birds (e.g., Kosicky 2017). However, as these variables were highly correlated with the mean temperature in April–June, we decided not to include them. The data were processed in ArcGIS 10.7.1 (ESRI, CA, USA) and R (R Development Core Team).
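The aggregation and the correlation screening described above can be sketched in a few lines. The original processing was done in ArcGIS and R, so this Python version is only an illustration: all values are invented, the function names are hypothetical, and the 0.7 cutoff for "highly correlated" is an assumption because the text gives no numerical criterion.

```python
from math import sqrt


def square_means(cell_values, cell_square_ids):
    """Average fine-resolution cell values within each mapping square."""
    totals, counts = {}, {}
    for value, square_id in zip(cell_values, cell_square_ids):
        totals[square_id] = totals.get(square_id, 0.0) + value
        counts[square_id] = counts.get(square_id, 0) + 1
    return {sq: totals[sq] / counts[sq] for sq in totals}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented ~1 km2 temperature cells and the mapping square each falls in.
temperature_cells = [10.0, 12.0, 8.0, 6.0, 14.0, 16.0]
square_ids = ["s1", "s1", "s2", "s2", "s3", "s3"]
mean_temperature = square_means(temperature_cells, square_ids)

# Invented elevation range per mapping square (metres).
elevation_range = {"s1": 400.0, "s2": 800.0, "s3": 200.0}

squares = sorted(mean_temperature)
r = pearson_r(
    [mean_temperature[sq] for sq in squares],
    [elevation_range[sq] for sq in squares],
)

CUTOFF = 0.7  # assumed cutoff, not taken from the source
drop_elevation = abs(r) > CUTOFF
print(mean_temperature, drop_elevation)
```

With these invented values temperature and elevation range are strongly negatively correlated, so the elevation predictor would be dropped under the assumed cutoff.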

Usage notes

Please read the README.txt file for more information.


Internal Grant Agency of Faculty of Environmental Sciences, Award: 2021B0009