Large differences in the distribution of pelagic biomass as a result of sonar frequency choice
Data files
Version of Apr 18, 2025 (12.43 GB in total):
- CC0_DRYAD_deposit.zip (12.43 GB)
- README.md (18.03 KB)
Abstract
Recent efforts to understand the global distribution of pelagic fauna have primarily relied on 38 kHz sonar observations, using water-column backscatter as a proxy for biomass. However, backscatter gradients across ocean provinces are not always consistent with biomass observations from net sampling. This mismatch is particularly evident in temperate to polar transition zones due to changes in the resonance of pelagic fauna, which depends not only on sonar frequency but also on the size of resonant organs, such as fish swimbladders or zooplankton gas inclusions. Here, we investigate how sonar frequency choice changes our view of pelagic ecosystems across latitudes. We analyse sonar observations at 38 and 18 kHz along with size distributions of swimbladdered fish species across the Indian Ocean Subantarctic Front. Our results show a shift from 38 kHz to 18 kHz dominance towards the poles. More interestingly, backscatter differences across the Subantarctic Front are four times larger at 38 kHz than at 18 kHz. The size distribution of fish suggests an increase in swimbladder volumes in subantarctic waters, which may explain the observed shift in frequency response. This study highlights the need to address swimbladder resonance variability across latitudes, with the aim of harmonising large-scale sonar observations of pelagic fauna.
Dataset DOI: 10.5061/dryad.3n5tb2rs7
Description of the Data and File Structure
This dataset supports the analyses presented in our paper and its supplementary material. The latter provides detailed information on the methodology and intermediate results used to describe and model the vertical distribution of acoustic backscatter across the Indian sector of the Southern Ocean. Data were collected as part of the Toward Hydroacoustics and Ecology of Mid-trophic levels in Indian and Southern Ocean (THEMISTO) observation programme (DOI: 10.18142/288).
We recommend consulting the supplementary material alongside this dataset and the provided scripts. Data were collected during austral summer cruises in January, February, and March of 2021, 2022, and 2023.
First, we detail the analysis of in situ acoustic observations, focusing on the application of functional data analysis and clustering methods. Second, we explain the backscatter modelling process, which uses Random Forest models with principal components of environmental profiles as predictors. Finally, we describe the net sampling methods used to validate our acoustic observations, including fishing operations conducted during the 2022 acoustic surveys, as well as historical data from museum collections and the literature.
Dataset structure and data formats
File: CC0_DRYAD_deposit.zip
The dataset includes:
- Raw acoustic and environmental data
- Scripts for each step of the analysis
- Output folders (mostly empty) that preserve the folder structure used by the pipeline, for reproducibility
Folder Structure and Contents
The dataset is structured to facilitate reproducibility and transparency. The file organisation follows the standard R project format, and the folder hierarchy reflects the different stages of the analysis.
│ Analysis_Roy_Soc.Rproj # Main R project file
├───00_data # All input datasets
│ │ polyW75.RDS
│ │ zone_bathy.Rdata
│ │
│ ├───acous_data_averaged_by_period # Averaged acoustic data
│ │ days.Rdata
│ │ days_metadata.Rdata
│ │ depths.Rdata
│ │ nights.Rdata
│ │ nights_and_days.Rdata
│ │ nights_and_days_metadata.Rdata
│ │ nights_metadata.Rdata
│ │
│ ├───acous_resonance_model # Excel and CSV files for resonance modelling
│ │ model_resonance_and_dwba_at_multiple_depths.xlsx
│ │ model_resonance_dwba_500m.csv
│ │
│ ├───copernicus-data # Satellite-derived environmental profiles and metadata
│ │ │ depth.rds
│ │ │ DN_metadata.rds
│ │ │ final_CHL_mat.rds
│ │ │ final_OX_mat.rds
│ │ │ final_S_mat.rds
│ │ │ final_T_mat.rds
│ │ │ lat.rds
│ │ │ lon.rds
│ │ │ metadata_themisto.rds
│ │ │ NA_to_rem.rds
│ │ │ smooth_daily_chl_profiles.rds
│ │ │ smooth_daily_ox_profiles.rds
│ │ │ smooth_daily_sal_profiles.rds
│ │ │ smooth_daily_temp_profiles.rds
│ │ │
│ │ ├───2021
│ │ │ ├───daily # Daily NetCDF data (2021)
│ │ │ └───monthly # Monthly NetCDF data (2021)
│ │ ├───2022
│ │ │ ├───daily
│ │ │ └───monthly
│ │ └───2023
│ │ ├───daily
│ │ └───monthly
│ │
│ └───Fronts_bioregions # Oceanic fronts and biogeographic regions
│ DSTF_Graham.csv
│ fronts_Park_2019.nc
│ Kim_fronts_pf.csv
│ Kim_fronts_saccf.csv
│ Kim_fronts_saf.csv
│ Kim_fronts_sbdy.csv
│ MBGCP_reygondeau_18.csv
│
├───01_scripts # All processing scripts
│ │ Step1_acous_MFDA_averaged_18_38.R # Acoustic smoothing + fPCA
│ │ Step2_classif_mfPCA_scores.R # Clustering of acoustic fPCA scores
│ │ Step3A_env_get_satellite_data_21_22_23.R # Download Copernicus data
│ │ Step3B_get_daily_profiles.R # Create daily environmental profiles
│ │ Step3C_fPCA_and_associate_cps_to_cps_21_22_23.R # fPCA on env. profiles and associations
│ │ Step4_predict_acous_pcs_.R # Predict acoustic PCs from env. PCs
│ │ Step5_plot_var_importance.R # Plot variable importance
│ │ Step6_reconstruction_from_acous_pred_pcs.R # Reconstruct acoustic profiles
│ │ Step7_pred_profiles_Sv_to_NASC.R # Convert predictions to NASC
│ │ Step8A_make_figure1.R # Make figure 1
│ │ Step8A_make_figure2.R # Make figure 2
│ │ Step8A_make_figure3.R # Make figure 3
│ │ Step8A_make_figure4.R # Make figure 4
│ │
│ └───functions_and_config # Custom functions and config files
│ jj2date_f.R
│ Step0a_CONFIG_FILE.R
│ Step0b_MAP_FILE.R
│ Step0_to_make_and_save_map_GEBCO.R
│
└───02_outputs # All outputs from the analysis pipeline
├───Article_figures # Final figures for publication
├───Data # Intermediate and processed data
│ ├───mfpca_objects_for_reco # fPCA objects used in reconstructions
│ │ ├───days
│ │ ├───env
│ │ └───nights
│ ├───model_data_after_proj # Predicted profiles
│ │ ├───days_pres
│ │ │ ├───NASC_computation
│ │ │ ├───predicted_data
│ │ │ └───pred_1jan_21 to pred_9mar_23
│ │ └───nights_pres
│ │ ├───NASC_computation
│ │ ├───predicted_data
│ │ └───pred_1jan_21 to pred_9mar_23
│ ├───model_data_before_proj # Input data before projection
│ ├───model_results # Model evaluation and summary results
│ └───res_classif # Clustering results
│ ├───days
│ └───nights
└───Figures # Organised output plots
├───days
│ ├───mfPCA
│ │ └───eigenfunctions
│ │ └───FIG_VMs_and_scores
│ ├───res_classif
│ └───res_pred
├───fPCA_env
│ └───eigenfunctions
├───model_results
├───NASC_predit
│ ├───days
│ └───nights
└───nights
├───mfPCA
│ └───eigenfunctions
│ └───FIG_VMs_and_scores
├───res_classif
└───res_pred
Description of the Data by Folder
Root
Analysis_Roy_Soc.Rproj – This R Project file should be opened first. It defines the working environment and organises all elements of the analysis workflow.
00_data – Contains all raw data used in the analysis, organised as two top-level files and several thematic folders:
- polyW75.RDS – R object defining the 75th-percentile weighted polygon of an Area of Ecological Significance (AES) in the Indian sector of the Southern Ocean (after Hindell et al., 2020).
- zone_bathy.Rdata – Matrix of bathymetry (in metres) for the study region.
Folder: acous_data_averaged_by_period – Averaged vertical acoustic backscatter data:
- depths.Rdata – Vector of standard depths (in metres) corresponding to acoustic measurements for each frequency.
- days.Rdata – Median vertical acoustic backscatter profiles (Sv) for daytime, per station and frequency (18 and 38 kHz).
- days_metadata.Rdata – Metadata for daytime stations (e.g. coordinates, date, sampling period).
- nights.Rdata – Median vertical acoustic backscatter profiles (Sv) for nighttime, per station and frequency (18 and 38 kHz).
- nights_metadata.Rdata – Metadata for nighttime stations.
- nights_and_days.Rdata – Combined day and night backscatter profiles, per station and frequency.
- nights_and_days_metadata.Rdata – Metadata for stations with both day and night observations.
Folder: acous_resonance_model – Files for theoretical scattering model calculations:
- model_resonance_and_dwba_at_multiple_depths.xlsx – Excel file containing outputs of a resonance and DWBA (Distorted Wave Born Approximation) scattering model for key fish families across depths and frequencies.
- model_resonance_dwba_500m.csv – Same model outputs, limited to 500 m depth.
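As background to the resonance outputs in these files, the sketch below uses the classic Minnaert formula for a free spherical gas bubble, f0 = (1 / 2πa) · sqrt(3γP / ρ), with ambient pressure P increasing hydrostatically with depth. It is a deliberate simplification (no surface tension, damping, tissue or swimbladder-shape effects) and is not necessarily the resonance model used to produce these files; the constants and the function name are illustrative assumptions.

```python
import math

RHO_SEAWATER = 1027.0  # kg m^-3, assumed typical seawater density
GAMMA = 1.4            # ratio of specific heats for air (adiabatic assumption)
P_ATM = 101_325.0      # Pa, atmospheric pressure at the sea surface
G = 9.81               # m s^-2, gravitational acceleration


def minnaert_resonance_hz(radius_m: float, depth_m: float) -> float:
    """Resonance frequency (Hz) of a free spherical gas bubble (Minnaert formula).

    Ignores surface tension, viscous/thermal damping and tissue effects, so it
    only indicates how resonance scales with gas-inclusion size and depth.
    """
    pressure = P_ATM + RHO_SEAWATER * G * depth_m  # hydrostatic pressure, Pa
    return math.sqrt(3.0 * GAMMA * pressure / RHO_SEAWATER) / (2.0 * math.pi * radius_m)


# Hypothetical equivalent-sphere radii at 500 m depth
for radius_mm in (0.5, 1.0, 2.0):
    f0 = minnaert_resonance_hz(radius_mm / 1000.0, depth_m=500.0)
    print(f"radius {radius_mm} mm at 500 m: f0 ~ {f0 / 1000:.1f} kHz")
```

Under these assumptions, a 1 mm equivalent radius at 500 m resonates at roughly 23 kHz, and doubling the radius halves f0. This illustrates why larger gas inclusions shift the resonant response from 38 kHz towards 18 kHz.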
Folder: copernicus-data – Environmental data from the Copernicus Marine Environment Monitoring Service:
Subfolders:
- 2021/, 2022/, 2023/ – Contain NetCDF (.nc) files of raw chlorophyll and oxygen profiles:
  - daily/ – Daily-resolution NetCDF files (0–1060 m).
  - monthly/ – Monthly-resolution NetCDF files.
Processed files (in the root of copernicus-data/):
- depth.rds – Depths of environmental observations.
- DN_metadata.rds – Metadata linking Copernicus profiles with acoustic observations.
- lat.rds, lon.rds – Latitude and longitude of environmental observations.
- metadata_themisto.rds – General metadata for THEMISTO Copernicus extractions.
- NA_to_rem.rds – Indices of profiles to be removed for NASC computations and plots.
Processed environmental matrices (monthly and smoothed daily):
- final_CHL_mat.rds – Processed chlorophyll profiles (monthly).
- final_OX_mat.rds – Processed oxygen profiles (monthly).
- final_S_mat.rds – Processed salinity profiles (monthly).
- final_T_mat.rds – Processed temperature profiles (monthly).
- smooth_daily_chl_profiles.rds – Smoothed daily chlorophyll profiles.
- smooth_daily_ox_profiles.rds – Smoothed daily oxygen profiles.
- smooth_daily_sal_profiles.rds – Smoothed daily salinity profiles.
- smooth_daily_temp_profiles.rds – Smoothed daily temperature profiles.
Folder: Fronts_bioregions – Oceanic fronts and large-scale bioregions:
- DSTF_Graham.csv – Position of the Dynamical Subtropical Front (DSTF).
- fronts_Park_2019.nc – NetCDF file with multiple frontal positions from Park et al. (2019).
- Kim_fronts_pf.csv – Polar Front (PF) positions from Kim & Orsi (1995).
- Kim_fronts_saccf.csv – Southern Antarctic Circumpolar Current Front (SACCF) positions.
- Kim_fronts_saf.csv – Subantarctic Front (SAF) positions.
- Kim_fronts_sbdy.csv – Southern Boundary (SBDY) of the Antarctic Circumpolar Current (ACC).
- MBGCP_reygondeau_18.csv – Marine biogeographic classification polygons from Reygondeau et al. (2018).
01_scripts – Analysis Code
The analysis is written entirely in R and structured as an RStudio project (Analysis_Roy_Soc.Rproj).
The pipeline is composed of a sequence of scripts located in the 01_scripts/ folder. These scripts should be run in order, as intermediate outputs are saved and reused in subsequent steps.
Before starting, make sure all required R packages are installed. These are listed and loaded in the files within functions_and_config/, which also contains custom functions and mapping tools used across the pipeline.
Each script is modular but follows a logical progression. Below is a description of the main analysis steps:
1. Step1_acous_MFDA_averaged_18_38.R
Processes raw vertical acoustic backscatter data (Sv) recorded at 18 and 38 kHz.
The script smooths these profiles using Functional Data Analysis (FDA) and performs Functional Principal Component Analysis (fPCA) to extract dominant modes of vertical variability.
The resulting principal components (PCs) are saved for use in clustering and modelling steps.
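The fPCA step can be illustrated on profiles already sampled on a common depth grid. The sketch below is a simplified, discretised stand-in for the R workflow. It skips the basis-function smoothing used in FDA, applies PCA to the centred profile matrix via an SVD, and handles the two frequencies by unweighted concatenation, which is only a crude approximation of multivariate fPCA. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def discrete_fpca(profiles: np.ndarray, n_components: int):
    """PCA of curves sampled on a shared grid (rows = stations, columns = depth bins).

    Returns the mean curve, the leading principal directions (one per row,
    i.e. discretised eigenfunctions), the station scores and the fraction of
    total variance explained by each retained component.
    """
    mean_curve = profiles.mean(axis=0)
    centred = profiles - mean_curve
    # SVD of the centred matrix: rows of vt are orthonormal principal directions
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    eigenvectors = vt[:n_components]
    scores = centred @ eigenvectors.T
    explained = singular_values**2 / np.sum(singular_values**2)
    return mean_curve, eigenvectors, scores, explained[:n_components]


# Hypothetical data: 40 stations, 100 depth bins, one scattering layer whose
# intensity varies between stations
rng = np.random.default_rng(0)
depth = np.linspace(10, 1000, 100)
amp = rng.uniform(5, 15, size=(40, 1))
sv18 = -80 + amp * np.exp(-((depth - 500) / 150) ** 2) + rng.normal(0, 0.5, (40, 100))
sv38 = -85 + 0.8 * amp * np.exp(-((depth - 400) / 100) ** 2) + rng.normal(0, 0.5, (40, 100))

# Crude stand-in for multivariate fPCA: unweighted concatenation of both frequencies
stacked = np.hstack([sv18, sv38])
mean_curve, phi, scores, explained = discrete_fpca(stacked, n_components=3)
print(scores.shape, explained.round(3))
```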
2. Step2_classif_mfPCA_scores.R
Applies unsupervised clustering to the acoustic PCs to identify distinct echobiomes, i.e. groups of stations with similar vertical acoustic structure.
It also computes mean Sv profiles and NASC (Nautical Area Scattering Coefficient) values for each echobiome.
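The clustering algorithm is not specified here, so the sketch below uses a plain k-means on PC scores purely as a stand-in. It then averages each cluster's Sv profiles in the linear domain (s_v = 10^(Sv/10)), the usual convention when averaging backscatter expressed in decibels. Function names and data are hypothetical.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Basic Lloyd's k-means on the rows of x; returns an integer label per row."""
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each station to its nearest centre (Euclidean distance in PC space)
        dist = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each centre to the mean of its members (keep it if the cluster is empty)
        new_centres = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels


def mean_sv_db(sv_db: np.ndarray) -> np.ndarray:
    """Average Sv profiles (dB) in the linear domain, then convert back to dB."""
    return 10.0 * np.log10(np.mean(10.0 ** (sv_db / 10.0), axis=0))


# Hypothetical PC scores for two well-separated groups of 20 stations
rng = np.random.default_rng(1)
pc_scores = np.vstack([rng.normal(-3, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
labels = kmeans(pc_scores, k=2)

profiles = rng.normal(-75, 3, size=(40, 50))  # hypothetical Sv profiles (dB)
echobiome_means = {int(j): mean_sv_db(profiles[labels == j]) for j in np.unique(labels)}
```

Averaging in the linear domain matters: for two values of -60 dB and -80 dB, mean_sv_db returns about -63 dB, not the -70 dB obtained by averaging the decibel values directly.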
3. Step3A_env_get_satellite_data_21_22_23.R
Downloads and processes satellite-derived environmental profiles (temperature, chlorophyll, oxygen) from the Copernicus Marine Service for the years 2021 to 2023.
The data are filtered to the 25–1000 m depth range and formatted for further processing.
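Restricting profiles to the 25–1000 m range amounts to a boolean mask on the depth axis. The minimal NumPy sketch below assumes that profiles are stored as a (profile, depth) array alongside a depth vector. The R script may implement this differently, and the depth levels shown are hypothetical.

```python
import numpy as np


def filter_depth_range(depth, values, z_min=25.0, z_max=1000.0):
    """Keep only the depth levels within [z_min, z_max] metres (inclusive).

    depth: (n_depths,); values: (..., n_depths), depth on the last axis.
    """
    depth = np.asarray(depth)
    keep = (depth >= z_min) & (depth <= z_max)
    return depth[keep], np.asarray(values)[..., keep]


depth = np.array([0.5, 10.0, 25.0, 100.0, 541.0, 1000.0, 1062.0])  # hypothetical levels (m)
temp = np.arange(14.0).reshape(2, 7)                               # 2 profiles x 7 levels
d_kept, t_kept = filter_depth_range(depth, temp)
```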
4. Step3B_get_daily_profiles.R
Processes daily satellite-derived environmental profiles (temperature, chlorophyll, oxygen) from the Copernicus Marine Service for the years 2021 to 2023, retaining the profiles that match the acoustic observations.
The data are filtered to the 25–1000 m depth range and formatted for further processing.
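One simple way to match daily model profiles to acoustic stations is to select the same calendar day and the nearest grid node. The sketch below assumes a regular latitude/longitude grid, exact day matching and a nearest-neighbour search, and it ignores longitude wrap-around, which is acceptable within the Indian sector. The matching criteria actually used in the R script are not described here, and all names are hypothetical.

```python
from datetime import date

import numpy as np


def match_station(grid_dates, grid_lat, grid_lon, st_date, st_lat, st_lon):
    """Return (day index, lat index, lon index) of the grid cell matching a station.

    Uses exact calendar-day matching and the nearest latitude/longitude nodes
    of a regular grid; raises ValueError if the station's day is not available.
    """
    day_idx = grid_dates.index(st_date)  # list.index raises ValueError if missing
    lat_idx = int(np.abs(np.asarray(grid_lat) - st_lat).argmin())
    lon_idx = int(np.abs(np.asarray(grid_lon) - st_lon).argmin())
    return day_idx, lat_idx, lon_idx


# Hypothetical 0.25 degree grid for January 2022
grid_dates = [date(2022, 1, d) for d in range(1, 32)]
grid_lat = np.arange(-60.0, -39.75, 0.25)
grid_lon = np.arange(50.0, 90.25, 0.25)
idx = match_station(grid_dates, grid_lat, grid_lon, date(2022, 1, 15), -48.9, 70.1)
```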
5. Step3C_fPCA_and_associate_cps_to_cps_21_22_23.R
Performs fPCA on the environmental profiles to extract the main modes of environmental variability.
Daily environmental profiles are then projected into this environmental PC space and linked to the acoustic PCs (from Step 1) of the corresponding acoustic observations.
This allows a day-by-day comparison between environmental structure and acoustic features at each station, and the resulting matched dataset is used to train the predictive models.
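Projecting new (daily) profiles into an existing PC space means subtracting the reference mean curve and taking inner products with the reference eigenfunctions. The discretised sketch below assumes orthonormal eigenvectors on a shared depth grid, for example the mean and principal directions of an SVD-based PCA. With several variables (temperature, salinity, chlorophyll, oxygen), the same operation would apply per variable or to concatenated profiles. Names and data are hypothetical.

```python
import numpy as np


def project_onto_pcs(new_profiles, mean_curve, eigenvectors):
    """Scores of new profiles on existing principal directions.

    new_profiles: (n_profiles, n_depths); mean_curve: (n_depths,);
    eigenvectors: (n_components, n_depths) with orthonormal rows.
    """
    return (np.asarray(new_profiles) - mean_curve) @ np.asarray(eigenvectors).T


# Reference PCA fitted on hypothetical environmental profiles
rng = np.random.default_rng(2)
reference = rng.normal(size=(30, 20))
mean_curve = reference.mean(axis=0)
_, _, vt = np.linalg.svd(reference - mean_curve, full_matrices=False)
eigenvectors = vt[:4]

daily = rng.normal(size=(5, 20))  # hypothetical daily profiles
daily_scores = project_onto_pcs(daily, mean_curve, eigenvectors)
```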
6. Step4_predict_acous_pcs_.R
Trains Random Forest models (via the h2o package) to predict acoustic PC scores from environmental PC scores.
The trained models are then applied to monthly environmental climatologies to generate predicted acoustic PCs across the study region.
7. Step5_plot_var_importance.R
Evaluates the relative importance of each environmental variable (e.g., temperature, chlorophyll, oxygen) in predicting acoustic PCs.
Produces visualisations of variable importance derived from the Random Forest models.
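h2o derives Random Forest variable importance from the trees themselves. As a model-agnostic alternative, which is not the method used in the script, permutation importance measures how much the prediction error grows when one predictor column is shuffled. The sketch below uses an ordinary least-squares fit only as a stand-in model so that it runs with NumPy alone. The predictors and the relationship are hypothetical.

```python
import numpy as np


def permutation_importance(predict, x, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of x is randomly permuted."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((predict(x) - y) ** 2)
    importances = np.zeros(x.shape[1])
    for j in range(x.shape[1]):
        increases = []
        for _ in range(n_repeats):
            x_perm = x.copy()
            # Shuffle column j to break its link with the response
            x_perm[:, j] = rng.permutation(x_perm[:, j])
            increases.append(np.mean((predict(x_perm) - y) ** 2) - base_mse)
        importances[j] = np.mean(increases)
    return importances


# Hypothetical data: one acoustic PC score driven mostly by the first env. PC
rng = np.random.default_rng(3)
env_pcs = rng.normal(size=(200, 3))
acoustic_pc = 2.0 * env_pcs[:, 0] + 0.3 * env_pcs[:, 1] + rng.normal(0, 0.1, 200)

# Stand-in model: least-squares linear fit with an intercept
design = np.column_stack([np.ones(len(env_pcs)), env_pcs])
coef, *_ = np.linalg.lstsq(design, acoustic_pc, rcond=None)


def predict(x):
    # Linear prediction from the fitted least-squares coefficients
    return np.column_stack([np.ones(len(x)), x]) @ coef


imp = permutation_importance(predict, env_pcs, acoustic_pc)
```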
8. Step6_reconstruction_from_acous_pred_pcs.R
Reconstructs full vertical Sv profiles from the predicted acoustic PCs by adding to the mean profile the eigenfunctions obtained in Step 1, each weighted by its predicted score. This yields spatially continuous estimates of vertical acoustic structure across the study region.
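In discretised form, the reconstruction is x(z) = μ(z) + Σ_k s_k φ_k(z): the mean profile plus the eigenfunctions weighted by the (predicted) scores. The sketch below uses hypothetical names and synthetic data. A useful sanity check is that, with all components and the original scores, it recovers the input profiles exactly.

```python
import numpy as np


def reconstruct_profiles(mean_curve, eigenvectors, scores):
    """Mean profile plus eigenvectors weighted by scores.

    mean_curve: (n_depths,); eigenvectors: (K, n_depths); scores: (n_profiles, K).
    """
    return np.asarray(mean_curve) + np.asarray(scores) @ np.asarray(eigenvectors)


# Hypothetical check: fit a PCA, then rebuild profiles from the first 3 components
rng = np.random.default_rng(4)
profiles = rng.normal(-75, 3, size=(10, 25))  # hypothetical Sv profiles (dB)
mean_curve = profiles.mean(axis=0)
_, _, vt = np.linalg.svd(profiles - mean_curve, full_matrices=False)
scores = (profiles - mean_curve) @ vt.T
approx = reconstruct_profiles(mean_curve, vt[:3], scores[:, :3])
```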
9. Step7_pred_profiles_Sv_to_NASC.R
Converts the reconstructed Sv profiles into NASC values, a standard acoustic proxy for biomass, by integrating backscatter over specified depth ranges.
The resulting values provide spatial predictions of acoustic energy.
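Under the standard definition, NASC (s_A, m² nmi⁻²) is obtained by converting Sv (dB re 1 m⁻¹) to the linear volume backscattering coefficient s_v = 10^(Sv/10), integrating over depth to obtain the area backscattering coefficient s_a (m² m⁻²), and scaling by 4π·1852². The sketch below uses trapezoidal integration on the profile's depth grid. The function name and depth limits are hypothetical, and the R script may integrate differently (for example, by summing over fixed-thickness layers).

```python
import numpy as np


def nasc_from_sv(sv_db, depth, z_min, z_max):
    """NASC (m^2 nmi^-2) of one Sv profile (dB re 1 m^-1) between z_min and z_max (m)."""
    depth = np.asarray(depth, dtype=float)
    keep = (depth >= z_min) & (depth <= z_max)
    sv_lin = 10.0 ** (np.asarray(sv_db, dtype=float)[keep] / 10.0)  # s_v, m^-1
    z = depth[keep]
    # Trapezoidal integral over depth gives s_a in m^2 m^-2
    s_a = np.sum(0.5 * (sv_lin[1:] + sv_lin[:-1]) * np.diff(z))
    return 4.0 * np.pi * 1852.0**2 * s_a


depth = np.arange(0.0, 1001.0, 10.0)  # hypothetical 10 m grid, 0-1000 m
sv = np.full_like(depth, -70.0)       # uniform -70 dB profile
print(round(nasc_from_sv(sv, depth, 0.0, 1000.0)))
```

For a uniform Sv of -70 dB over 0–1000 m this gives s_a = 10⁻⁴ m² m⁻² and a NASC of about 4310 m² nmi⁻². Because NASC is linear in s_v, a 10 dB increase in Sv multiplies it by 10.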
10. Step8A_make_figure1.R to Step8A_make_figure4.R
Scripts for generating Figures 1 to 4 of the article.
These include visual summaries of the identified echobiomes, spatial distributions of NASC, and reconstructions of acoustic structure.
02_outputs – Results and Figures
This folder is mostly empty in the .zip archive; it is included to preserve the folder structure and facilitate reproducibility.
It will be populated automatically as the scripts are run.
└───02_outputs # All outputs from the workflow
├───Article_figures # Final figures for publication
├───Data # Intermediate and final data products
│ ├───mfpca_objects_for_reco # fPCA objects used to reconstruct observed profiles
│ │ ├───days # fPCA objects for the daytime period
│ │ ├───env # fPCA objects for environmental profiles
│ │ └───nights # fPCA objects for nighttime period
│ ├───model_data_after_proj # Data after projection of environmental PCs
│ │ ├───days_pres # For daytime
│ │ │ ├───NASC_computation # Integrated backscatter (NASC) values
│ │ │ ├───predicted_data # Predicted profiles
│ │ │ └───pred_1jan_21 to pred_9mar_23 # Predicted acoustic PCs for each day
│ │ └───nights_pres # Same for night
│ │ ├───NASC_computation
│ │ ├───predicted_data
│ │ └───pred_1jan_21 to pred_9mar_23
│ ├───model_data_before_proj # Training data for the Random Forest models
│ ├───model_results # Model outputs and performance metrics
│ └───res_classif # Clustering results (echobiome assignments)
│ ├───days # For daytime
│ └───nights # For nighttime
└───Figures # Diagnostic and exploratory plots
├───days # Figures for daytime data
│ ├───mfPCA # Results of the multivariate functional principal component analysis (mfPCA)
│ │ └───eigenfunctions # Eigenfunction plots
│ │ └───FIG_VMs_and_scores # Formatted figures for LaTeX export
│ ├───res_classif # Clustering result figures
│ └───res_pred # Prediction visualisations
├───fPCA_env # fPCA results for environmental data
│ └───eigenfunctions # Eigenfunction plots
├───model_results # Random Forest model result figures
├───NASC_predit # NASC prediction figures
│ ├───days # For daytime
│ └───nights # For nighttime
└───nights # Figures for nighttime data
├───mfPCA # Results of the mfPCA
│ └───eigenfunctions # Eigenfunction plots
│ └───FIG_VMs_and_scores
├───res_classif # Clustering results
└───res_pred # Prediction results
- Izard, Lloyd; Ariza, Alejandro; Fonvieille, Nadège et al. (2025). Large differences in the distribution of pelagic biomass as a result of sonar frequency choice. Proceedings of the Royal Society B: Biological Sciences. https://doi.org/10.1098/rspb.2024.2991
