Skip to main content

Biological traits of seabirds predict extinction risk and vulnerability to anthropogenic threats

Cite this dataset

Richards, Cerren; Cooke, Robert; Bates, Amanda (2021). Biological traits of seabirds predict extinction risk and vulnerability to anthropogenic threats [Dataset]. Dryad.



Seabirds are heavily threatened by anthropogenic activities and their conservation status is deteriorating rapidly. Yet, these pressures are unlikely to uniformly impact all species. It remains an open question if seabirds with similar ecological roles are responding similarly to human pressures. Here we aim to: 1) test whether threatened vs non-threatened seabirds are separated in trait space; 2) quantify the similarity of species’ roles (redundancy) per IUCN Red List Category; and 3) identify traits that render species vulnerable to anthropogenic threats.



Time period


Major taxa studied



We compile and impute eight traits that relate to species’ vulnerabilities and ecosystem functioning across 341 seabird species. Using these traits, we build a mixed-data PCA of species’ trait space. We quantify trait redundancy using the unique trait combinations (UTCs) approach. Finally, we employ a SIMPER analysis to identify which traits explain the greatest difference between threat groups.


We find seabirds segregate in trait space based on threat status, indicating anthropogenic impacts are selectively removing large, long-lived, pelagic surface feeders with narrow habitat breadths. We further find that threatened species have higher trait redundancy, while non-threatened species have relatively limited redundancy. Finally, we find that species with narrow habitat breadths, fast reproductive speeds, and varied diets are more likely to be threatened by habitat-modifying processes (e.g., pollution and natural system modifications); whereas pelagic specialists with slow reproductive speeds and varied diets are vulnerable to threats that directly impact survival and fecundity (e.g., invasive species and biological resource use) and climate change. Species with no threats are non-pelagic specialists with invertebrate diets and fast reproductive speeds.

Main conclusions

Our results suggest both threatened and non-threatened species contribute unique ecological strategies. Consequently, conserving both threat groups, but with contrasting approaches may avoid potential changes in ecosystem functioning and stability.


​​​​Trait Selection and Data

We compiled data from multiple databases for eight traits across all 341 extant species of seabirds. Here we recognise seabirds as those that feed at sea, either nearshore or offshore, but excluding marine ducks. These traits encompass the varying ecological and life history strategies of seabirds, and relate to ecosystem functioning and species’ vulnerabilities. We first extracted the trait data for body mass, clutch size, habitat breadth and diet guild from a recently compiled trait database for birds (Cooke, Bates, et al., 2019). Generation length and migration status were compiled from BirdLife International (, and pelagic specialism and foraging guild from Wilman et al. (2014). We further compiled clutch size information for 84 species through a literature search.


Foraging and diet guild describe the most dominant foraging strategy and diet of the species. Wilman et al. (2014) assigned species a score from 0 to 100% for each foraging and diet guild based on their relative usage of a given category. Using these scores, species were classified into four foraging guild categories (diver, surface, ground, and generalist foragers) and three diet guild categories (omnivore, invertebrate, and vertebrate & scavenger diets). Each was assigned to a guild based on the predominant foraging strategy or diet (score > 50%). Species with category scores < 50% were classified as generalists for the foraging guild trait and omnivores for the diet guild trait. Body mass was measured in grams and was the median across multiple databases. Habitat breadth is the number of habitats listed as suitable by the International Union for Conservation of Nature (IUCN, Generation length describes the mean age in years at which a species produces offspring. Clutch size is the number of eggs per clutch (the central tendency was recorded as the mean or mode). Migration status describes whether a species undertakes full migration (regular or seasonal cyclical movements beyond the breeding range, with predictable timing and destinations) or not. Pelagic specialism describes whether foraging is predominantly pelagic. To improve normality of the data, continuous traits, except clutch size, were log10 transformed.

Multiple Imputation

All traits had more than 80% coverage for our list of 341 seabird species, and body mass and habitat breadth had complete species coverage. To achieve complete species trait coverage, we imputed missing data for clutch size (4 species), generation length (1 species), diet guild (60 species), foraging guild (60 species), pelagic specialism (60 species) and migration status (3 species). The imputation approach has the advantage of increasing the sample size and consequently the statistical power of any analysis whilst reducing bias and error (Kim, Blomberg, & Pandolfi, 2018; Penone et al., 2014; Taugourdeau, Villerd, Plantureux, Huguenin-Elie, & Amiaud, 2014).


We estimated missing values using random forest regression trees, a non-parametric imputation method, based on the ecological and phylogenetic relationships between species (Breiman, 2001; Stekhoven & Bühlmann, 2012). This method has high predictive accuracy and the capacity to deal with complexity in relationships including non-linearities and interactions (Cutler et al., 2007). To perform the random forest multiple imputations, we used the missForest function from package “missForest” (Stekhoven & Bühlmann, 2012). We imputed missing values based on the ecological (the trait data) and phylogenetic (the first 10 phylogenetic eigenvectors, detailed below) relationships between species. We generated 1,000 trees - a cautiously large number to increase predictive accuracy and prevent overfitting (Stekhoven & Bühlmann, 2012). We set the number of variables randomly sampled at each split (mtry) as the square-root of the number variables included (10 phylogenetic eigenvectors, 8 traits; mtry = 4); a useful compromise between imputation error and computation time (Stekhoven & Bühlmann, 2012). We used a maximum of 20 iterations (maxiter = 20), to ensure the imputations finished due to the stopping criterion and not due to the limit of iterations (the imputed datasets generally finished after 4 – 10 iterations).


Due to the stochastic nature of the regression tree imputation approach, the estimated values will differ slightly each time. To capture this imputation uncertainty and to converge on a reliable result, we repeated the process 15 times, resulting in 15 trait datasets, which is suggested to be sufficient (González-Suárez, Zanchetta Ferreira, & Grilo, 2018; van Buuren & Groothuis-Oudshoorn, 2011). We took the mean values for continuous traits and modal values for categorical traits across the 15 datasets for subsequent analyses.


Phylogenetic data can improve the estimation of missing trait values in the imputation process (Kim et al., 2018; Swenson, 2014), because closely related species tend to be more similar to each other (Pagel, 1999) and many traits display high degrees of phylogenetic signal (Blomberg, Garland, & Ives, 2003). Phylogenetic information was summarised by eigenvectors extracted from a principal coordinate analysis, representing the variation in the phylogenetic distances among species (Jose Alexandre F. Diniz-Filho et al., 2012; José Alexandre Felizola Diniz-Filho, Rangel, Santos, & Bini, 2012). Bird phylogenetic distance data (Prum et al., 2015) were decomposed into a set of orthogonal phylogenetic eigenvectors using the Phylo2DirectedGraph and functions from the “MPSEM” package (Guenard & Legendre, 2018). Here, we used the first 10 phylogenetic eigenvectors, which have previously been shown to minimise imputation error (Penone et al., 2014). These phylogenetic eigenvectors summarise major phylogenetic differences between species (Diniz-Filho et al., 2012) and captured 61% of the variation in the phylogenetic distances among seabirds. Still, these eigenvectors do not include fine-scale differences between species (Diniz-Filho et al., 2012), however the inclusion of many phylogenetic eigenvectors would dilute the ecological information contained in the traits, and could lead to excessive noise (Diniz-Filho et al., 2012; Peres‐Neto & Legendre, 2010). Thus, including the first 10 phylogenetic eigenvectors reduces imputation error and ensures a balance between including detailed phylogenetic information and diluting the information contained in the other traits.


To quantify the average error in random forest predictions across the imputed datasets (out-of-bag error), we calculated the mean normalized root squared error and associated standard deviation across the 15 datasets for continuous traits (clutch size = 13.3 ± 0.35 %, generation length = 0.6 ± 0.02 %). For categorical data, we quantified the mean percentage of traits falsely classified (diet guild = 28.6 ± 0.97 %, foraging guild = 18.0 ± 1.05 %, pelagic specialism = 11.2 ± 0.66 %, migration status = 18.8 ± 0.58 %). Since body mass and habitat breadth have complete trait coverage, they did not require imputation. Low imputation accuracy is reflected in high out-of-bag error values where diet guild had the lowest imputation accuracy with 28.6% wrongly classified on average. Diet is generally difficult to predict (Gainsbury, Tallowin, & Meiri, 2018), potentially due to species’ high dietary plasticity (Gaglio, Cook, McInnes, Sherley, & Ryan, 2018) and/or the low phylogenetic conservatism of diet (Gainsbury et al., 2018). With this caveat in mind, we chose dietary guild, as more coarse dietary classifications are more predictable (Gainsbury et al., 2018), and we investigated the impact of the trait imputation with sensitivity analyses.

Usage notes

We have uploaded the non-imputed trait data and the imputed trait data. Non-imputed data contain NA values. The R code to replicate the imputation is available on Github:


binomial - species' Latin name

Order - the species' taxonomic order

Family - the species' taxomonic family

English - species' English name

clutch - the species' clutch size

body_mass_median - the species' body mass

GL - the species' generation length

hab_breadth - the species' habitat breadth

pelagic_specialist - is the species a pelagic specialist or non-pelagic specialist

foraging_guild - the species' foraging guild

migrate - is the species a migrant or non-migrant

diet_5cat - the species diet

IUCN - the species' IUCN Red List category (extracted in 2020)