# Model for evaluating seabirds preferences for hake offal in Patagonia

## Cite this dataset

Ojeda, Jaime (2024). Model for evaluating seabirds preferences for hake offal in Patagonia [Dataset]. Dryad. https://doi.org/10.5061/dryad.ngf1vhj3r

## Abstract

This data set was used to build a model to evaluate the seabirds' preferences for hake offal derived from an artisanal fishery in Patagonia.

Main results: The fishers’ main contribution to seabirds is the offal of hake catches that they offer them. We observed that seabirds consumed hake liver 99% of the time, whereas they consumed stomach less frequently (24%). Southern giant petrels and black-browed albatrosses consumed more liver, while kelp gulls ate more stomach. The liver comprises 51.6% fat, which is essential for high trophic level marine predators such as black-browed albatrosses.

## README: Model for evaluating seabirds preferences for hake offal in Patagonia

### Dataset Description

We investigated the probability that the seabird assemblage consumed specific hake offal items: gonad, liver, and stomach. For this study, we defined a seabird assemblage attending a fishing boat in a single sampling period as the seabird species present, together with their abundances. In each sampling period, we threw the offal items from the boat into the sea one by one, in random order. Consumption was coded as “0” if no seabird consumed an item and “1” if one or more seabirds consumed it. We conducted twenty-four sampling periods, totaling 1298 observations of item consumption. The binary outcome “offal consumption by seabirds” (consumed or not consumed) served as our response variable, and the explanatory variables were ‘types of offal’ (fixed factor), ‘seasons’ (fixed factor), ‘sampling periods’ (random factor), and ‘abundance of seabird assemblage’ (random factor).

This dataset (`log_sea.xlsx`) contains observations of hake offal consumption by seabirds. It includes the response variable `consum` and the explanatory variables `items`, `seasons`, `times`, and `abun`.

#### Column Descriptions

- `consum`: Binary variable indicating whether an offal item was consumed by seabirds (0 = not consumed, 1 = consumed).
- `items`: Categorical variable indicating the type of hake offal (gonad, liver, stomach).
- `times`: Integer variable indicating the number of the experimental survey (sampling period).
- `abun`: Integer variable indicating total seabird abundance around a fishing boat.
- `seasons`: Categorical variable indicating the season of sampling (winter, spring, summer).
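
As a quick sanity check on this column coding, the following minimal sketch validates a data frame with these five columns. The helper name `check_log_sea` and the toy rows are hypothetical and not part of the dataset. The level spellings are taken from the descriptions above and may differ in the file. In practice `log_sea` would first be read from `log_sea.xlsx` (for example with `readxl::read_excel()`, which is an assumption, since the import step is not given here).

```r
# Hypothetical helper: checks that a data frame follows the column coding
# described above. Not part of the original analysis.
check_log_sea <- function(df) {
  required <- c("consum", "items", "times", "abun", "seasons")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  stopifnot(
    all(df$consum %in% c(0, 1)),
    all(df$items %in% c("gonad", "liver", "stomach")),   # assumed spellings
    all(df$seasons %in% c("winter", "spring", "summer")), # assumed spellings
    all(df$abun >= 0)
  )
  # Treat the categorical columns and the survey identifier as factors
  df$items   <- factor(df$items)
  df$seasons <- factor(df$seasons)
  df$times   <- factor(df$times)
  df
}

# Toy rows for illustration only (invented values, not real observations)
toy <- data.frame(
  consum  = c(1, 0, 1),
  items   = c("liver", "stomach", "gonad"),
  times   = c(1, 1, 2),
  abun    = c(35, 35, 12),
  seasons = c("winter", "winter", "spring")
)
toy_checked <- check_log_sea(toy)
str(toy_checked)
```

Converting `times` to a factor makes it explicit that the survey number is a grouping label rather than a quantity.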

### Data Preparation and Libraries Used

The following libraries were used for the analysis:

```
library("lme4")
library("nlme")
library("ggplot2")
library("MuMIn")
library("emmeans")
library("pROC")
library("ggeffects")
```

### Models and Analysis

#### Logistic Regression Models

Several logistic regression models were created to determine the effect of different variables on consumption:

```
model_1 <- glmer(consum ~ items + seasons + (1 | times) + (1 | abun), data = log_sea, family = binomial)
model_2 <- glmer(consum ~ items + seasons + (1 | times), data = log_sea, family = binomial)
model_3 <- glm(consum ~ items + seasons, data = log_sea, family = binomial)
model_4 <- glmer(consum ~ items + (1 | times), data = log_sea, family = binomial)
model_5 <- glm(consum ~ items, data = log_sea, family = binomial)
model_6 <- glmer(consum ~ 1 + (1 | times), data = log_sea, family = binomial)
```
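
For reference, `model_4` can be written as a random-intercept logistic regression. The notation below is ours, not the dataset's: $y_{ij}$ is the consumption outcome of observation $i$ in sampling period $j$, the $x$ terms are 0/1 indicators of offal type, and gonad is assumed to be the reference level (R's default treatment contrasts with alphabetically ordered levels).

```latex
\begin{aligned}
y_{ij} \mid u_j &\sim \mathrm{Bernoulli}(p_{ij}), \\
\log\frac{p_{ij}}{1 - p_{ij}} &= \beta_0
  + \beta_{\text{liver}}\, x^{\text{liver}}_{ij}
  + \beta_{\text{stomach}}\, x^{\text{stomach}}_{ij}
  + u_j, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2).
\end{aligned}
```

Under the same assumptions, `model_2` adds indicator terms for `seasons` to the linear predictor, `model_1` adds a second random intercept for `abun`, and the `glm` models drop $u_j$ entirely.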

#### Model Selection and Evaluation

The models were evaluated using Akaike Information Criterion (AIC) and corrected AIC (AICc):

```
# Calculate AIC and rank the models from lowest to highest
aic_values <- data.frame(
  Model = c("model_1", "model_2", "model_3", "model_4", "model_5", "model_6"),
  AIC = c(AIC(model_1), AIC(model_2), AIC(model_3), AIC(model_4), AIC(model_5), AIC(model_6))
)
aic_values <- aic_values[order(aic_values$AIC), ]
print(aic_values)

# Calculate AICc (MuMIn), stored separately so the AIC table is not overwritten
aicc_values <- data.frame(
  Model = c("model_1", "model_2", "model_3", "model_4", "model_5", "model_6"),
  AICc = c(AICc(model_1), AICc(model_2), AICc(model_3), AICc(model_4), AICc(model_5), AICc(model_6))
)
aicc_values <- aicc_values[order(aicc_values$AICc), ]
# Difference from the best model, Akaike weights, and cumulative weights
aicc_values$Delta_AICc <- aicc_values$AICc - min(aicc_values$AICc)
aicc_values$Akaike_Weight <- exp(-0.5 * aicc_values$Delta_AICc) / sum(exp(-0.5 * aicc_values$Delta_AICc))
aicc_values$Cumulative_Akaike_Weight <- cumsum(aicc_values$Akaike_Weight)
print(aicc_values)
```
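
The Akaike weight calculation above can be isolated in a small base R function, which makes the arithmetic easier to follow. This is a minimal sketch: the helper name `akaike_weights` is hypothetical, and the AICc values are invented for illustration, not results from this dataset.

```r
# Hypothetical helper: Akaike weights from a named vector of AICc values
akaike_weights <- function(aicc) {
  delta   <- aicc - min(aicc)         # difference from the best model
  rel_lik <- exp(-0.5 * delta)        # relative likelihood of each model
  weights <- rel_lik / sum(rel_lik)   # normalise so the weights sum to 1
  out <- data.frame(
    Model = names(aicc),
    AICc = aicc,
    Delta_AICc = delta,
    Akaike_Weight = weights,
    row.names = NULL
  )
  out[order(out$AICc), ]
}

# Invented AICc values for illustration only
example_aicc <- c(model_a = 100, model_b = 102, model_c = 110)
res <- akaike_weights(example_aicc)
print(res)
```

Each weight can be read as the relative support for a model within the candidate set, so the weights depend on which models are compared.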

#### Final Model Selection

The final model selected based on AIC and AICc values is `model_4`:

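```
# Final model: offal type as fixed effect, sampling period as random intercept
model_4 <- glmer(consum ~ items + (1 | times), data = log_sea, family = binomial)
summary(model_4)
# Compare with model_2 (which also includes seasons)
anova(model_2, model_4, test = "Chi")
```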

#### Post-hoc Analysis

Post-hoc analysis was conducted using the `emmeans` package:

```
lsmeans(model_4, pairwise ~ items)
emmeans(model_4, pairwise ~ items, adjust = "tukey")
```
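
For a binomial model, `emmeans` reports estimated marginal means on the logit (log-odds) scale by default, and pairwise contrasts are differences in log-odds, that is, log odds ratios. The minimal sketch below shows the back-transformation in base R. The log-odds values are invented for illustration and are not output from `model_4`.

```r
# Invented log-odds values for illustration only (not output from model_4)
logit_means <- c(gonad = 0.2, liver = 4.6, stomach = -1.15)

# Back-transform log-odds to probabilities with the inverse logit
probs <- plogis(logit_means)

# A pairwise contrast on the logit scale is a log odds ratio;
# exponentiating it gives the odds ratio
log_or_liver_vs_stomach <- logit_means[["liver"]] - logit_means[["stomach"]]
or_liver_vs_stomach <- exp(log_or_liver_vs_stomach)

print(round(probs, 3))
print(or_liver_vs_stomach)
```

With `emmeans` itself, passing `type = "response"` should give back-transformed estimates directly.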

#### Model Fit and Visualization

The model fit was evaluated using ROC curves and predicted probabilities:

```
# ROC curve
roc_curve <- roc(log_sea$consum, predict(model_4, type = "response"))
plot(roc_curve)

# Predicted probabilities (fixed effects only)
ggpredict(model_4, c("items"), type = "fe")

# Predicted probabilities and confidence limits entered manually for plotting
df <- data.frame(
  items = c("gonads", "liver", "stomach"),
  predicted = c(0.55, 1.00, 0.24),
  lower_ci = c(0.40, 0.99, 0.15),
  upper_ci = c(0.68, 0.99, 0.37)
)
ggplot(df, aes(x = items, y = predicted)) +
  geom_bar(stat = "identity", fill = "blue") +
  geom_errorbar(aes(ymin = lower_ci, ymax = upper_ci), width = 0.4) +
  labs(x = "Item", y = "Predicted probability", title = "Predicted probabilities of consumption") +
  theme_classic()
```
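
To make explicit what the ROC curve summarises, the sketch below computes the area under the curve (AUC) directly from its pairwise definition in base R. The helper name `auc_by_pairs` is hypothetical, and the outcomes and predictions are invented for illustration.

```r
# Hypothetical helper: AUC as the probability that a randomly chosen
# consumed item (1) has a higher predicted probability than a randomly
# chosen non-consumed item (0); ties count as one half.
auc_by_pairs <- function(y, p) {
  pos <- p[y == 1]
  neg <- p[y == 0]
  cmp <- outer(pos, neg, FUN = function(a, b) (a > b) + 0.5 * (a == b))
  mean(cmp)
}

# Invented outcomes and predictions for illustration only
y_toy <- c(1, 1, 1, 0, 0)
p_toy <- c(0.9, 0.8, 0.3, 0.4, 0.1)
auc_toy <- auc_by_pairs(y_toy, p_toy)
print(auc_toy)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1 to perfect separation. With `pROC`, `auc(roc_curve)` reports the area for the fitted curve.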

### Conclusion

The logistic regression models show how the type of offal affects the probability of consumption by seabirds; `model_4` was the best-fitting model based on AIC and AICc values.

## Funding

Universidad de Magallanes