Predicting time-at-depth weighted biodiversity patterns for sharks of the North Pacific

Siders, Zachary1; Trotta, Lauren1; Caltabellota, Fabio2; Patrone, Will3; Loesser, Katherine4; Baiser, Benjamin1

Published Mar 08, 2024 on Dryad. https://doi.org/10.5061/dryad.6hdr7sr7g

Data files

Mar 08, 2024 version files 13.11 MB

Datafiles.zip (13.10 MB)
README.md (3.56 KB)

Abstract

Depth is a fundamental and universal driver of ocean biogeography, but it is unclear how the biodiversity patterns of larger, more mobile organisms change as a function of depth. Here, we developed a predictive biogeography model to explore how information on mobile species’ depth preferences influences biodiversity patterns. We employed a literature review to collate shark biotelemetry studies and used open-access tools to extract 283 total records from 119 studies of 1,133 sharks from 35 species. We then matched field guide reported depth ranges and IUCN habitat associations for each shark species to use as covariates in a hurdle variant of Ensemble Random Forests. We successfully fit this model (R² = 0.63) to the noisy time-at-depth observations and used it to predict the time budgets of the Northeast Pacific shark regional pool (n = 52). We then assessed how occurrence diversity patterns, informed by minimum and maximum depth of occurrence, compared to time-at-depth weighted diversity patterns. Time-at-depth weighted richness was highest between 0 and 25 m and in the upper part of the mesopelagic zone (250–300 m), resulting in little similarity to common depth or elevational biodiversity patterns, while the occurrence-weighted richness pattern was similar to the “low-plateau” pattern. In the phylogenetic and functional dimensions of biodiversity and over three different distance metrics, we found strong but haphazard differences between the occurrence- and time-at-depth weighted biodiversity patterns. The strong influence of time budgets on biodiversity led us to conclude that occurrence data alone is likely insufficient or even misleading in terms of the depth-driven biogeographic patterns in the open ocean. Utilizing the increasing amount of time-at-depth information from biotelemetry studies in predictive biogeographic models may be critical for capturing the preferences of pelagic, mobile species occupying the largest biome on the planet.

This dataset contains all the collected species-specific summarizations, processed .RData files, and R scripts to produce the manuscript results.

Description of the data and file structure

The datasets are structured into two separate types: (1) raw data, contained in the “Data” folder in the Datafiles.zip, and (2) processed data, generated as outputs of the R scripts and stored in the “RData” folder in the Scripts&Rdata.zip.

Species-specific time-at-depth raw data

In the “Data” folder is a “Species-specific” folder containing a subfolder for each shark species whose time-at-depth summarizations are included in the manuscript. Each subfolder has an Excel workbook with one summarization per sheet. The sheet naming convention indicates which summarization is presented. Each summarization has two columns: the first column is the depth bin (with depth in meters) and the second is the proportion of the total time spent in that bin. For example, the folder “Carcharias_taurus” contains one Excel workbook, “Ctaurus_Data.xlsx”, with three sheets. The sheet names are “Ctaurus1_1”, “Ctaurus1_2”, and “Ctaurus1_3”. The first digit indicates which Carcharias taurus study the summarizations came from [in this case Kneebone, Chisholm, and Skomal (2014) Movement patterns of juvenile sand tigers (Carcharias taurus) along the east coast of the USA. Marine Biology. 161: 1149 - 1163], and the digit after the underscore indicates the specific summarization (see the Metadata files for details).
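The sheet naming convention can be unpacked programmatically. The following is a minimal Python sketch, not part of the dataset's R scripts; it assumes every sheet name follows the pattern of the “Ctaurus1_1” example (letters, a study number, an underscore, a summarization number), and the function name `parse_sheet_name` is hypothetical.

```python
import re

# Assumed pattern, inferred from the "Ctaurus1_1" example only:
# species abbreviation, study number, underscore, summarization number.
SHEET_NAME = re.compile(r"^([A-Za-z]+)(\d+)_(\d+)$")


def parse_sheet_name(name):
    """Split a sheet name into (species abbreviation, study, summarization)."""
    match = SHEET_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected sheet name: {name!r}")
    abbreviation, study, summarization = match.groups()
    return abbreviation, int(study), int(summarization)


for sheet in ["Ctaurus1_1", "Ctaurus1_2", "Ctaurus1_3"]:
    print(sheet, parse_sheet_name(sheet))
```

Sheet names that deviate from the assumed pattern raise a `ValueError` rather than being silently mis-parsed.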

The study and summarization source materials can also be found in the .csv file in the “Metadata” folder. Each row of this file is a unique summarization, with columns giving the species common name, the species scientific name, the priority level, the Excel sheet name corresponding to the summarization, the number of sharks in the summarization, the approximate length the summarization covers, and a link to the primary literature the summarization came from.

Also in the “Data” folder is the file “spp.hab.csv”, a table of which habitat classes each species occupies, indicated by 0 (not occupied) or 1 (occupied) (see the publication for the definition of each habitat class).
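A 0/1 table like this can be converted into a per-species set of occupied habitat classes. The Python sketch below is an illustration only: the column names (`species`, `habitat_1`, ...) and the rows are made-up placeholders, since the actual layout of “spp.hab.csv” is not described here, and `read_habitats` is a hypothetical helper.

```python
import csv
import io

# Made-up stand-in for spp.hab.csv; the real column and species names may differ.
EXAMPLE = """species,habitat_1,habitat_2,habitat_3
species_A,1,0,1
species_B,0,1,0
"""


def read_habitats(handle, species_column="species"):
    """Map each species to the set of habitat classes flagged with 1."""
    occupied = {}
    for row in csv.DictReader(handle):
        name = row.pop(species_column)
        occupied[name] = {habitat for habitat, flag in row.items() if int(flag) == 1}
    return occupied


habitats = read_habitats(io.StringIO(EXAMPLE))
print(habitats)
```

To use this with the real file, pass an open file handle in place of the `io.StringIO` object and set `species_column` to whatever the file's species column is called.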

Code/Software

The folder “Scripts” contains all R scripts needed to process the raw data (01_vertical_distribution_readin.R), calculate the empirical cumulative distribution for each time-at-depth summarization (02_vertical_distribution_ecdf.R), run the hurdle Ensemble Random Forest (03_vertical_distribution_rfr.R), generate the site-by-species matrix (04_site_by_spp.R), and calculate the dimensions of biodiversity metrics (05_alpha_diversity.R). There is also a “figures.R” script that makes the manuscript and supplemental figures, and a “vertical_distribution_helpers.R” script that supplies the function variants some of the main scripts need to conduct the analyses.
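To illustrate what an empirical cumulative distribution of a time-at-depth summarization looks like, here is a minimal Python sketch. It is not the authors' implementation in 02_vertical_distribution_ecdf.R; it assumes each depth bin is represented by a single numeric depth (its deeper edge, in meters), the function name `time_at_depth_ecdf` is hypothetical, and the example values are made up.

```python
from itertools import accumulate


def time_at_depth_ecdf(bins):
    """Return (depth, cumulative proportion of time) pairs, shallowest first.

    `bins` is a list of (depth in m, proportion of time) pairs. Proportions
    are rescaled to sum to 1 in case the source values were rounded.
    """
    ordered = sorted(bins)
    total = sum(proportion for _, proportion in ordered)
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    cumulative = accumulate(proportion / total for _, proportion in ordered)
    return [(depth, value) for (depth, _), value in zip(ordered, cumulative)]


# Made-up values, not taken from the dataset.
example = [(25, 0.25), (10, 0.5), (50, 0.25)]
ecdf = time_at_depth_ecdf(example)
for depth, value in ecdf:
    print(depth, value)
```

The last cumulative value is 1 by construction, so each curve can be read as the proportion of time spent at or above a given depth.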

The “RData” folder contains the .RData files produced or called by the R scripts (see section below for additional details). Its subfolders are “traits”, which brings in the Siders et al. (2022) traits; “phylo”, which brings in the RAxML phylogeny from Siders et al. (2022); “vert_dist”, which stores all the processed time-at-depth data; and “biodiv”, which stores all the processed dimensions of biodiversity metrics.