Large marine predator aerial survey data for Hauraki Gulf, New Zealand
Data files
May 01, 2023 version files 2.48 MB
-
Raw_data.7z
-
README.md
Abstract
Large marine predators, such as cetaceans and sharks, play a crucial role in maintaining biodiversity patterns and ecosystem health. Despite the recognised importance of these animals and their over-representation as threatened species, distribution data at appropriate temporal and spatial scales is often lacking or insufficient for effective conservation.
Here, we present sightings of large marine megafauna recorded from a replicate systematic aerial survey undertaken in the Hauraki Gulf, Aotearoa New Zealand during a full year. Using flexible machine learning models (Boosted Regression Tree models), we use these sightings data to investigate relationships between large marine predator occurrence (Bryde’s whales, common and bottlenose dolphins, bronze whalers, pelagic and immature hammerhead sharks) and spatially explicit environmental and biotic variables to predict species richness of large marine predators and investigate their fine-scale spatiotemporal distribution patterns. All models were considered informative (all, AUC > 0.78), and temporally dynamic variables, such as the distribution of prey, were important in predicting the occurrence of the study species and species groups.
Our approach and data highlight the value of multi-species surveys and the importance of considering temporally variable abiotic and biotic drivers for understanding biodiversity patterns when informing ecosystem-scale conservation planning and dynamic ocean management.
We provide data files of:
- Locations of species presence / pseudo absence location over time (in .csv format) and associated environmental and biotic variables for Bryde’s whales, common and bottlenose dolphins, bronze whaler, pelagic and immature hammerhead sharks.
- Monthly estimates of 14 high-resolution spatially explicit environmental and biotic variables (1 km grid resolution): Bathymetry; Slope; Distance from shore; Distance to 40m depth; Sand; Mud; Gravel; Seabed disturbance; Tidal current; Chlorophyll a; Sea surface temp; Distance to plankton; Distance to prey (saved as .R data)
- Model objects, R code, and model outputs of the Boosted Regression Tree modelling.
- Predicted monthly distributions (and associated spatially explicit uncertainty) of Bryde’s whales, common dolphins, bottlenose dolphins, bronze whaler sharks, pelagic sharks, immature hammerhead sharks, and richness of large marine predators in the Hauraki Gulf, New Zealand, (January to December). Monthly richness estimates of large marine predators.
Methods
Study site
The Hauraki Gulf (36⁰ 10’–37⁰ 10’ S; 174⁰ 40–175⁰ 30’E) is a large (~4,000 km2), semi-enclosed embayment situated on the northeast coast of the North Island – Te Ika a Māui, New Zealand.
Replicate systematic aerial survey design and protocol
Twenty-two double observer line-transect aerial surveys were conducted typically twice a month from November 2013 to October 2014, following MacKenzie and Clement (2014). Surveys followed a systematic grid of eight parallel transects evenly spaced apart every 10 km, orientated offshore at 62° from the north. Under this design, it was possible to sample across a wide range of environmental conditions. The survey followed a simple, unstratified design due to a wide range of species with different or unknown distributions (Dawson et al. 2008). To ensure equal coverage probability (i.e., a random sample of the total habitat area in the survey area; Buckland et al. 2004) the start point of each survey was randomly chosen using the striplet method (Fewster 2011). Surveys commenced when the Beaufort sea state (BSS) was ≤3 across the study area, there was no rain or fog, and there was sufficient light, i.e., one hour after sunrise and before sunset.
All 22 surveys were conducted in a Cessna 207 fixed-wing aircraft, flying at 500 feet (152.4 m) and 100 knots (185.2 km/h). The aircraft accommodated two observers on each side who operated independently with no communication during the survey. Observers logged all sightings of large marine predator species, i.e., cetaceans, pinnipeds, sharks, rays, and oceanic fish observed during the flights. Observers also recorded all sightings of potential prey patches, i.e., schooling fish such as kahawai (Arripis trutta), pilchards (Sardinops sagax), jack mackerel (Trachurus spp.), saury (Scomberesox saurus) and zooplankton aggregations.
For each sighting, the observer recorded 1) time of the sighting to the second, 2) species or the highest taxonomic level possible, 3) number of individuals, or, where they could not do an absolute count, an estimate of the minimum, best, and the maximum number of individuals, 4) for cetaceans, the perpendicular angle of the sighting to the plane was measured using an inclinometer and 5) environmental conditions: the BSS, glare direction, glare coverage, and water colour.
Due to the double-observer design, records partly consisted of duplicate sightings, i.e., two records of the same animal or group made by observers seated on the same side of the plane. Duplicate sightings were reconciled post-hoc by comparing all records of the same species made by observers seated on the same side of the plane during the same survey. For cetaceans, sightings made within seven seconds and within 10° of each other were considered a duplicate sighting. Sharks and rays were generally encountered at low densities allowing for easy identification of duplicate records. However, we applied a narrower time frame of five seconds to classify records of sharks and prey to compensate for the reduction in the information available to identify duplicates.
Sightings of sharks exceeding an estimated total length of 2.5 m (Last and Stevens 2009) that share a similar ecological role were combined to form “pelagic sharks” as they typically inhabit similar habitats (pers. comm. Clinton Duffy, Department of Conservation). These included rarely sighted species, i.e., shortfin mako (Isurus oxyrinchus), blue (Prionace glauca), and adult smooth hammerhead (Sphyrna zygaena) sharks. Juvenile hammerhead shark (total length < 2.5 m; Last and Stevens 2009) distribution was modelled separately as they exhibit different habitat preferences to adults.
Species Distribution Modelling
Presence / pseudo-absence records
To estimate the distributions of the large marine predators, binomial BRT models require locations of both presences and absences (Manly et al. 2007). Presence records consisted of sightings recorded during aerial surveys. Given that our sightings only reflect a portion of the animals’ habitat use (i.e., when the animal is at, or close to, the surface of the water), presences records will likely be an underestimate of true distribution. In line with this, absence information collected from our aerial surveys should therefore also not be considered true absences. Instead, pseudo-absences (artificial absence points) were generated (Lobo and Tognelli 2011; Phillips et al. 2009).
When data are collected during systematic surveys (as was the case here), pseudo-absences are distributed within the absence zones, i.e., where no sightings were made (Derville et al. 2016). To capture habitat use at a fine temporal (one-day) scale (Derville et al. 2016), 22 unique sets of pseudo-absences were generated for each species, corresponding to each survey. Pseudo-absences were distributed in the on-effort portions of transects, i.e., where observers were in search mode and within the viewing strip as per distance sampling methods (Hamilton et al. 2018; 420 m transect width for sharks and dolphins; 570 m for Bryde’s whale).
To ensure that pseudo-absence points reflected available but unused habitat in a non-biased manner, sightings collected over the entire period were pooled to create a kernel density map for each taxon using the Spatial Analyst kernel density tool in ArcMap (2019, v10.2). Density contours were overlaid on survey transects and cropped to the width and length of transects. An exclusion zone was created around presence points two times the length of the transect width (880 m, 1140 m for Bryde’s whales) to prevent environmental overlap between presences and pseudo-absences – the length approximate to the 1 km spatial resolution of the study (Torres et al. 2008). We generated stratified random pseudo-absence points with numbers inversely related to the density contours. Pseudo-absences were distributed with a minimum of two times the transect width to avoid serial autocorrelation (Derville et al. 2016). To weight pseudo-absences according to the stratified design, we created a standardised, inverted kernel density map and extracted the kernel density estimate (KDE) of the cell within which a pseudo-absence point fell to assign a weight. The process produced approximately 30 times more pseudo-absences than presence points for sub-sampling during the modelling process.
Environmental and biotic variables
Spatial estimates of 14 high-resolution environmental and biotic variables (1 km grid resolution) that influence the distribution of sharks and cetaceans (Redfern et al. 2006; Schlaff et al. 2014; Tardin et al. 2017; Stephenson et al. 2020) were collated. Some variables were static (no temporal variation, e.g., bathymetry), whereas others were dynamic (temporal variation, e.g. sea surface temperature - SST). Dynamic variables were available as a daily estimate. Two oceanographic variables — seafloor sediment disturbance from wave action and tidal currents — were only available as annual averages and therefore were considered static, despite knowing that these vary through time. Biotic variables included the distance of species’ sightings to zooplankton and fish sightings, representing prey.
Model fitting
Relationships between large marine predator presence / pseudo-absence and environmental and biotic variables were investigated using BRT models. All statistical analyses were undertaken in R (R Core Team 2020) using the ‘Dismo’ package (Hijmans et al. 2017).
Habitat models were built using a portion of the pseudo-absence dataset. Points were randomly selected until the combined weight of the pseudo-absences was equal to the combined weight of the presences, where presences were given a weight of 1 and pseudo-absences were given the weight of the KDE. BRT models were fitted with Bernoulli error distribution with a bag fraction of 0.6, a learning rate (the shrinkage parameter) of 0.001–0.0001 (to fit between 1000 and 2500 trees for each species’ model), and a tree complexity of 3 (the number of interactions fitted) using a random five-fold cross-validation.
Model simplification can be beneficial for small datasets as including more predictor variables than necessary may degrade the quality of model performance (Elith et al. 2008). A BRT model was initially fitted using all available environmental variables, which was then subjected to a simplification process whereby environmental variables were removed from the models, one at a time, using the “simplify” function (Elith et al. 2006).
BRT models were assessed using cross-validated measures of model performance (Elith et al. 2008; Compton et al. 2012), including the deviance explained and the area under the receiver operating characteristic curve (AUC). The explained deviance provides a measure of the goodness-of-fit between the predicted and raw values (total deviance) (Compton et al. 2012). The relative influence of each environmental variable in the models was the number of times it was selected for splitting, weighted by the squared improvement to the model as a result of each split (using in-bag data) (Friedman and Meulman 2003). The association between species and species group presence / pseudo-absence and environmental predictor variables was illustrated using partial dependence plots (i.e., predicted response curve of species probability of occurrence across the gradient of the variable of interest when all other variables are held at their means).
BRT models were bootstrapped 100 times for all species using the same parameter settings as those used in the tuned models. For each bootstrap iteration, a random sample of sightings was drawn with replacement, and random samples of pseudo absences were selected until the weighting of the pseudo-absence (from the KDE) was the same as the overall weighting of sightings available for each species. Sampling of pseudo-absences was drawn without replacement. At each bootstrap iteration, model fits were estimated using both “training data” (the randomly selected data used to tune the model) and the “evaluation” data (the remaining data not randomly selected from the presence data and a random pseudo absence selection as detailed above). Model fits calculated using evaluation data are presented here because these are considered a more robust and conservative method of evaluating goodness-of-fit (Friedman et al., 2001). For each species, bootstrapped BRT models were predicted geographically for each month using mean monthly estimates for those variables that were considered dynamic, and the static variables. The mean probability of occurrence and the coefficient of variation (Anderson et al. 2016) were calculated for each 1 km2 cell to produce 12 monthly habitat prediction maps and 12 spatially explicit maps of uncertainty, respectively. Monthly distribution of species richness was calculated by summing the monthly occurrence probability predictions (ranging from 0 to 1) from individual models (six BRT models) (Ferrier and Guisan 2006; Calabrese et al. 2014). Finally, a mean annual occurrence probability was produced for each species and species group by averaging each species’ monthly predictions.
Usage notes
All files provided can be opened using the open source software R and R-studio.