
Data from: Breeding habitat and nest site selection by an obligatory “nest-cleptoparasite”, the Amur Falcon Falco amurensis

Cite this dataset

Frommhold, Martin et al. (2019). Data from: Breeding habitat and nest site selection by an obligatory “nest-cleptoparasite”, the Amur Falcon Falco amurensis [Dataset]. Dryad.


Nest-site selection is crucial for the reproductive success of birds. Animals that re-use or occupy nest sites constructed by other species often have limited choice. Little is known about the criteria that nest-stealing species use to choose suitable nesting sites and habitats. Here, we analyze breeding-site selection by an obligatory “nest-cleptoparasite”, the Amur Falcon Falco amurensis. We collected data on nest sites at Muraviovka Park in the Russian Far East, where the species breeds exclusively in nests of the Eurasian Magpie Pica pica. We sampled 117 Eurasian Magpie nests, 38 of which were occupied by Amur Falcons. We assessed nest-specific variables and derived landscape metrics from a recently developed habitat classification map. Amur Falcons chose a wide range of nesting sites but significantly preferred nests with a domed roof. Breeding pairs of Eurasian Hobby Falco subbuteo and Eurasian Magpie were often found breeding near Amur Falcon nests, at about the same distance as neighboring Amur Falcon pairs. The occurrence of the species was positively associated with bare soil cover, forest cover and shrub patches within the home range, and negatively associated with the distance to wetlands. Wetlands and fallow land might be used for foraging, since Amur Falcons depend mostly on an insect diet. Rarely burned habitats were also preferred. Overall, landscape variables appeared to have a rather small effect on the choice of actual nest sites. Of the classification methods we used to predict the probability of occurrence, random forest showed the highest accuracy. The areas predicted as suitable habitat showed high concordance with the actual nest locations. We conclude that Amur Falcons prefer to occupy newly built (domed) nests to ensure high nest quality, as well as nests surrounded by available foraging habitat.


Study area

Our study area, the Muraviovka Park for Sustainable Land Use and its surroundings, is situated at the southern end of the Zeya-Bureya plain on the middle section of the Amur River in the Russian Far East (Fig. 2). The area extends about 16 km from south to north and 11.5 km from east to west, covering about 13,289 ha. Altitudes in the valley of the Amur River and its first terraces range from 105 to 348 m above sea level.

The landscape is dominated by wetlands (6346 ha) with Carex meyeriana, Carex lasiocarpa, Iris laevigata and Menyanthes trifoliata (Akhtymaov et al., 2002). Other land-cover types include agricultural fields (480 ha), where crops such as soy and buckwheat change from year to year; fallow fields (1392 ha); forest islands (475 ha) with Quercus mongolica, Betula dahurica and Lespedeza bicolor; reed beds (323 ha) dominated by Phragmites australis; and shrubland (268 ha) comprising Corylus heterophylla, Salix bebbiana and Lespedeza bicolor (Akhtymaov et al., 2002).

One year after its establishment in 1994, the Park and its adjacent territories became part of the Ramsar List of Wetlands of International Importance. The Amur Bird Project has been investigating the threatened avifauna together with the staff of Muraviovka Park since 2011 (Heim & Smirenski, 2013; 2017).

Data collection

We searched the entire study area for nests of magpies, other corvids and raptors from April to July 2013. Nests were easy to locate because of the limited number of trees in the area (Fig. 1). We recorded nest locations with a handheld GPS (Garmin eTrex 10) and assessed the following nest-specific variables: tree genus, nest height, roof status (old magpie nests often lose their roof) and nest content (breeding species, number of eggs or chicks). To identify the nest content reliably, we either climbed the trees or used an extendable pole fitted with a camera.

Data analysis

Data were checked for consistency, and nest locations were intersected with the habitat classification map (Heim, 2018) in ArcGIS (version 10.4). Around each nest location, we created a buffer with a radius of 2500 m to approximate the home range of the individual breeding pair. This home-range size (π × 2500² m² ≈ 1963.5 ha) was chosen on the basis of home-range estimates for the closely related Red-footed Falcon Falco vespertinus (38 to 3467 ha) (Fehérvári et al., 2009; Palatitz et al., 2015). The buffers were then intersected with the habitat classification map to obtain the set of habitat patches around each nest. We used a fire frequency map (Heim et al., 2019) to test whether fire influenced the occurrence of the species.
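The buffer step can be illustrated with a small sketch. The analysis itself used vector buffers and intersections in ArcGIS. The code below is only a raster approximation under stated assumptions: the land-cover map is a hypothetical grid of class labels with square cells, and a cell counts as inside the buffer when its centre lies within the radius. The function names are hypothetical. The code also verifies the 1963.5 ha figure for a 2500 m radius.

```python
import math
from collections import Counter


def buffer_area_ha(radius_m):
    """Area of a circular buffer in hectares (1 ha = 10,000 m²)."""
    return math.pi * radius_m ** 2 / 10_000


def class_shares_in_buffer(grid, cell_size_m, nest_xy, radius_m):
    """Share of each land-cover class among cells whose centre lies in the buffer.

    Hypothetical raster approximation of a vector buffer/intersect:
    grid[r][c] holds a class label, and cell (r, c) has its centre at
    ((c + 0.5) * cell_size_m, (r + 0.5) * cell_size_m).
    """
    nx, ny = nest_xy
    counts = Counter()
    for r, row in enumerate(grid):
        for c, label in enumerate(row):
            cx = (c + 0.5) * cell_size_m
            cy = (r + 0.5) * cell_size_m
            if (cx - nx) ** 2 + (cy - ny) ** 2 <= radius_m ** 2:
                counts[label] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

With a real map, the cell size and coordinates would come from the raster's georeference. Smaller cells bring the raster shares closer to the exact vector intersection.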

The resulting nest buffers were then used to calculate landscape metrics in Fragstats (version 4.2.1). The environmental variables at the landscape level included proximity, area-edge, shape, aggregation and diversity metrics. The splitting index describes the fragmentation of the landscape (McGarigal, 2017; Schindler et al., 2008). High values indicate a landscape subdivided into many small patches, i.e., a mosaic-like structure of different habitat patches. The perimeter-area fractal dimension describes how patch perimeter scales with patch area across the landscape, i.e., the shape complexity of the patches. Its values range from 1 for simple shapes such as squares to 2 for highly convoluted perimeters, and it is used as an indicator of environmental gradients between the patches of a landscape (McGarigal, 2017; Schindler et al., 2013; Wang & Malanson, 2007).
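The two landscape metrics can be written out directly from patch areas and perimeters. The sketch below assumes the FRAGSTATS landscape-level definitions:

- SPLIT = A² / Σ a², where A is the total landscape area and a is each patch area.
- PAFRAC = 2 divided by the slope of the regression of ln(patch area) on ln(patch perimeter).

The function names are hypothetical. The code is not Fragstats and does not handle Fragstats' own special cases. For example, the regression is undefined when all patches have the same size.

```python
import math


def splitting_index(patch_areas, total_area=None):
    """Landscape splitting index: A**2 / sum(a**2) (units cancel)."""
    A = sum(patch_areas) if total_area is None else total_area
    return A ** 2 / sum(a ** 2 for a in patch_areas)


def pafrac(areas, perimeters):
    """Perimeter-area fractal dimension: 2 / slope of ln(area) on ln(perimeter).

    Raises ZeroDivisionError if all perimeters are equal (no regression possible).
    """
    n = len(areas)
    lp = [math.log(p) for p in perimeters]
    la = [math.log(a) for a in areas]
    sxy = n * sum(x * y for x, y in zip(lp, la)) - sum(lp) * sum(la)
    sxx = n * sum(x * x for x in lp) - sum(lp) ** 2
    slope = sxy / sxx  # ordinary least-squares slope
    return 2.0 / slope


# Toy check: squares of side s have area s**2 and perimeter 4*s,
# so ln(area) rises twice as fast as ln(perimeter) and PAFRAC is 1.
sides = [10, 20, 40, 80]
square_pafrac = pafrac([s * s for s in sides], [4 * s for s in sides])
```

The splitting index equals 1 for a single patch covering the landscape and equals n for n equal-sized patches. This is why higher values indicate a finer subdivision.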

Descriptive statistics were computed, and the variable distributions were checked for the requirements of the subsequent statistical analyses, using the statistical software R (version 3.3.3). Before the analyses, the categorical variables roof (unknown, no, yes), nesting habitat (willow shrubs, wetland, steppe, shrubs, forest, water, reed, field, bare soil) and tree genus (Betula, Quercus, Prunus, Ulmus, Salix, Populus, Tilia, Crataegus, dead unidentified tree) were converted into factor variables.

Differences among the nest occupants are reported as the median and the median absolute deviation (MAD). The MAD is an alternative to the standard deviation or the interquartile range and is considered a robust measure of scale, especially in the presence of outliers. It is calculated as the median of the absolute deviations from the median (Leys et al., 2013; Rousseeuw & Croux, 1993).
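The MAD definition translates directly into a few lines of code. The `constant` parameter is an assumption added for illustration. R's `mad()` multiplies by 1.4826 by default so that the MAD is consistent with the standard deviation under normality. The text does not state whether the reported values are scaled.

```python
from statistics import median


def mad(values, constant=1.0):
    """Median absolute deviation: median(|x - median(x)|), optionally scaled."""
    values = list(values)
    m = median(values)
    return constant * median([abs(v - m) for v in values])


# A single extreme value barely moves the MAD, unlike the standard deviation.
robust_example = mad([1, 1, 2, 2, 4, 6, 900])
```

Replacing 9 with 900 in the sample above leaves the MAD at 1. This is the robustness property the text refers to.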

The dataset was divided into a training (70 %), test (15 %) and validation (15 %) set. Using the R package Rattle (Williams, 2009), we tested two classification methods: decision trees and random forests. A decision tree is built by recursively splitting the training data into two groups, at each step choosing the predictor variable and split point that best separate the classes. Every new observation is then passed down the tree and assigned to one of the two classes. The aim is to create subsets of the data that are as homogeneous as possible. The classification thresholds can be read from the graph of the decision tree (Kabacoff, 2015). A random forest combines many classification trees to produce more accurate classifications (Cutler et al., 2007).
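A single binary split of a decision tree can be sketched for one numeric predictor and a 0/1 response (for example, nest occupied by Amur Falcons or not). The sketch assumes the Gini index as the homogeneity criterion, which is rpart's default for classification. The text does not name the criterion used. `min_bucket` mimics the minimum number of observations allowed in each resulting node. The function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 for an empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(x, y, min_bucket=1):
    """Best threshold on one numeric predictor x for 0/1 labels y.

    Returns (threshold, weighted_gini). threshold is None when no split
    leaves at least min_bucket observations on both sides.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best_thr, best_imp = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # tied values cannot be separated by a threshold
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        if len(left) < min_bucket or len(right) < min_bucket:
            continue
        # Size-weighted impurity of the two child nodes (lower = more homogeneous)
        imp = (len(left) * gini(left) + len(right) * gini(right)) / n
        if imp < best_imp:
            best_thr, best_imp = (lo + hi) / 2, imp
    return best_thr, best_imp
```

A full tree applies this search to every predictor, keeps the best split, and recurses on both children. A random forest grows many such trees on bootstrap samples and combines their votes.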

First, all variables were incorporated into the machine learning algorithms. The splitting variables of the decision trees, the variable importance measure of the random forest application and the p-values of the Chi-square (χ²) test from the logistic regression served as indicators of influential variables. Both decision trees and random forests are biased towards variables with many categories. Such variables are effectively divided into many auxiliary variables and are therefore more likely to be chosen. Because of this bias, nesting habitat and tree genus had to be excluded. The minimum bucket size (the minimum number of observations in a terminal node) of the classification trees was set to seven. The number of trees in the random forest application was manually increased to 5000 to obtain better results. The model run started with 63 variables, and poorly performing variables were removed step by step. The set of variables with the lowest Akaike information criterion (AIC) value was chosen and incorporated into the classification procedure (Fig. 3).
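Counting candidate splits illustrates why variables with many categories are favoured. In a CART-style tree, an unordered factor with k levels can be divided into two groups in 2^(k−1) − 1 ways. A numeric predictor with m distinct values offers only m − 1 thresholds. The level counts below come from the factor levels listed for roof, nesting habitat and tree genus. The dictionary keys and function names are illustrative, not taken from the original analysis.

```python
def n_splits_categorical(k):
    """Distinct binary partitions of k unordered categories: 2**(k-1) - 1."""
    return 2 ** (k - 1) - 1 if k >= 2 else 0


def n_splits_numeric(m):
    """Distinct thresholds for a numeric predictor with m distinct values."""
    return max(m - 1, 0)


# Number of factor levels per categorical variable, as listed in the methods
levels = {
    "roof": 3,             # unknown, no, yes
    "nesting_habitat": 9,  # willow shrubs ... bare soil
    "tree_genus": 9,       # Betula ... dead unidentified tree
}
candidate_splits = {name: n_splits_categorical(k) for name, k in levels.items()}
```

With 255 candidate partitions each, nesting habitat and tree genus have far more chances than the roof variable (3 partitions) to produce a split that looks good purely by chance.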

The random forest application provides a built-in variable importance measure called Mean Decrease Accuracy (MDA). The MDA measures how much the accuracy of the model decreases when the values of a variable are randomly permuted in the out-of-bag samples, which breaks the link between that variable and the response. The difference in error rate with and without permutation is calculated for every predictor on each tree and then averaged over all trees of the forest. Predictors with high MDA values are considered important for classifying the data, because the predictive accuracy of the model would decrease if those variables were left out (Breiman, 2001).
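The permutation idea behind the MDA can be shown with a simplified sketch. This is generic permutation importance for one fitted model on one held-out set. It does not reproduce the per-tree out-of-bag computation of a random forest. The toy data and the `predict` rule are hypothetical.

```python
import random


def accuracy(predict, X, y):
    """Proportion of rows classified correctly."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value
                X_perm.append(new_row)
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: only feature 0 determines the class.
X = [[i % 2, (i * 7) % 5] for i in range(40)]
y = [row[0] for row in X]


def predict(row):
    return 1 if row[0] > 0.5 else 0


imp = permutation_importance(predict, X, y)
```

Shuffling the unused feature leaves the accuracy unchanged, so its importance is zero. Shuffling the informative feature drops the accuracy towards chance level.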

The performances of the models, each of which included a different set of predictor variables, were compared using the area under the ROC curve (AUC) and the overall error. An AUC value of 1 indicates perfect discrimination between the classes, whereas 0.5 indicates a classification no better than chance. The overall error measures the accuracy of the classification algorithms as the proportion of all misclassified cases (Williams, 2009).
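Both comparison measures can be computed from predicted scores and observed classes. The sketch uses the rank (Mann-Whitney) formulation of the AUC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This is equivalent to the area under the ROC curve. It is not necessarily how Rattle computes the value internally. The function names are hypothetical.

```python
def auc(scores, labels):
    """AUC as P(score_pos > score_neg), ties counted as 0.5; labels are 0/1."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    total = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                total += 1.0
            elif p == q:
                total += 0.5
    return total / (len(pos) * len(neg))


def overall_error(predicted, labels):
    """Proportion of misclassified cases."""
    return sum(p != lab for p, lab in zip(predicted, labels)) / len(labels)
```

Perfectly separated scores give an AUC of 1.0. Identical scores for both classes give 0.5, which corresponds to a chance-level classifier.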