Community-level egg and larval traits interaction with the Amazon River Plume determines its role as a dispersal barrier
Data files
Version: May 22, 2026 (727.17 MB in total)
- Data_Table_ARM.xlsx (574.70 KB)
- Data_Table_GAM.xlsx (536.14 KB)
- DistanceTraveled_withDVM_EN.pkl (363.01 MB)
- DistanceTraveled_withoutDVM_EN.pkl (363.01 MB)
- Ecoregions.zip (23.14 KB)
- R_ARM.R (3.58 KB)
- R_GAM.R (5.02 KB)
- README.md (6.76 KB)
Abstract
This dataset contains the total horizontal distance traveled (km) by individual virtual particles representing generic planktonic propagules in Lagrangian dispersal simulations designed to estimate community-level connectivity in the Western Tropical Atlantic. Simulations were conducted from 1992 to 2015 using the ICHTHYOP model (Lett et al., 2008), forced by hydrodynamic fields from the Regional Ocean Modeling System (ROMS), and designed to capture seasonal variability in the North Brazil Current system (wet season: February–April; dry season: August–October). A total of 614 spawning habitats (~500 km² each), grouped into five ecoregions along the northern Brazilian continental shelf (Northeast Brazil, São Marcos Bay, Pará–Maranhão, Amazon River Mouth, and Amapá), were defined as release sites. Particles were released within the upper 50 m of the water column, with 70,000 particles per simulation and 420,000 per year.
Dispersal was evaluated across four planktonic larval durations (PLDs: 5, 15, 30, and 45 days), representing a gradient of reproductive strategies from short-lived propagules to long-duration larval stages. Simulations were conducted under two behavioral scenarios, with and without diel vertical migration (DVM), totaling 288 experiments. While particles were treated as a single functional group in the physical model, PLDs allow interpretation of dispersal potential across different biological components of the community. Environmental constraints, including temperature and salinity tolerance thresholds, were incorporated to represent larval survival conditions.
The dataset corresponds to processed simulation outputs and contains the cumulative horizontal distance traveled by each particle over its pelagic duration, summarizing full trajectories into an integrated displacement metric suitable for direct analysis of dispersal potential. Due to storage limitations, raw simulation outputs (including full particle trajectories at high temporal resolution) are not included in this repository but are available upon request from the corresponding author (rmnbatistasantos@gmail.com).
This dataset supports research on marine connectivity, larval ecology, and the biogeographic influence of the Amazon River Plume, and can be reused for model intercomparison, hypothesis testing, or integration with genetic and ecological datasets. No ethical or legal restrictions apply, and all model configurations and biological assumptions are available for replication and adaptation in related studies.
Dataset DOI: 10.5061/dryad.h18932016
Description of the data and file structure
This dataset compiles outputs from Lagrangian dispersal simulations performed with the ICHTHYOP model (Lett et al., 2008), using hydrodynamic fields from the Regional Ocean Modeling System (ROMS) to investigate larval transport processes in the Western Tropical Atlantic between 1992 and 2015. The simulations track the trajectories and survival of virtual larvae constrained by biological and environmental factors, including pelagic larval duration (PLD), diel vertical migration (DVM), and tolerance limits for temperature and salinity. Simulations were conducted under two scenarios (with and without DVM) across 24 years and two hydrological seasons (wet and dry), totaling 288 experiments.
A total of 614 spawning habitats, each covering approximately 500 km², were defined across five coastal ecoregions: Northeast Brazil, São Marcos Bay, Pará–Maranhão, the Mouth of the Amazon River, and Amapá. In each simulation, 70,000 particles were released within the upper 50 m of the water column, totaling 420,000 particles per year. The simulation outputs include particle trajectories, mortality causes, environmental conditions, and metadata describing the spatial and temporal configuration of each release. This repository provides the processed cumulative distances and the tabulated analysis data; the full trajectory outputs are available upon request from the corresponding author.
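The design figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in this README; the split into three releases per season is an inference from those totals (read here as one release per month of each three-month season), not a documented simulation parameter.

```python
# Arithmetic check of the simulation design figures stated in this README.
# ASSUMPTION: "releases per season" is inferred from the stated totals and is
# read as one release per month of each three-month season; it is not
# documented explicitly.

FIRST_YEAR, LAST_YEAR = 1992, 2015
PARTICLES_PER_SIMULATION = 70_000
PARTICLES_PER_YEAR = 420_000
SEASONS = 2        # wet (February-April) and dry (August-October)
DVM_SCENARIOS = 2  # with and without diel vertical migration

n_years = LAST_YEAR - FIRST_YEAR + 1                            # simulated years
sims_per_year = PARTICLES_PER_YEAR // PARTICLES_PER_SIMULATION  # simulations per year
releases_per_season = sims_per_year // SEASONS                  # assumed monthly releases
total_experiments = n_years * SEASONS * releases_per_season * DVM_SCENARIOS

print(n_years, sims_per_year, releases_per_season, total_experiments)  # 24 6 3 288
```

Under this reading, the four PLDs (5, 15, 30, and 45 days) do not multiply the experiment count, which would suggest they are evaluation windows applied to the same runs rather than separate experiments; this is an interpretation of the totals, not a statement from the original configuration.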
Files and variables
File: Data_Table_ARM.xlsx
Description: Tabulated data for the multivariate regression tree.
File: Data_Table_GAM.xlsx
Description: Tabulated data for the Generalized Additive Model.
The tabular data files (Data_Table_GAM.xlsx and Data_Table_ARM.xlsx) contain the following variables:
- Zone – Geographic location identifier of the 614 individual spawning habitats considered in the Lagrangian particle dispersion simulations. Each record represents a spawning habitat used as a source or destination in connectivity analyses. Spatial locations and associated habitat polygons can be visualized in Ecoregions.zip, which contains the shapefiles of spawning zones and marine ecoregions used in the simulations. No units apply.
- Season – Seasonal oceanographic scenario used in the simulations. Categorical, with two classes (no units):
  - Wet: February–April, the period associated with the northwestward flow of the North Brazil Current (NBC).
  - Dry: August–October, the period during which approximately 70% of the plume waters follow the NBC retroflection and feed the North Equatorial Countercurrent (NECC).
- Diel vertical migration (DVM) – Indicates whether diel vertical migration behavior was included in the particle simulations. DVM refers to the synchronized vertical movement of marine organisms within the water column over a 24-hour cycle. Categorical, with two values (no units):
  - With – simulations including vertical migration behavior.
  - Without – simulations assuming passive particles without vertical migration.
- Ecoregion – Marine ecoregion classification assigned to each spawning habitat. For analytical purposes, spawning habitats were grouped into five ecoregions along the northern Brazilian continental shelf, from east to west: (1) Northeast Brazil, (2) São Marcos Bay, (3) Pará–Maranhão, (4) Mouth of the Amazon River, and (5) Amapá. This is a categorical variable with no units.
- Distance(km) – Geographic distance between spawning habitats or connectivity nodes, measured in kilometers (km). Continuous numerical variable.
- Pelagic larval duration(days) – Duration of the pelagic larval stage, which defines the dispersal time window in the simulations. Four PLD scenarios were analyzed: 5, 15, 30, and 45 days. Units: days.
- Depth(m) – Bathymetric depth of the spawning habitat location, measured in meters (m). Continuous numerical variable.
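As an illustration of how these variables relate, the sketch below computes the median Distance(km) for each combination of PLD and DVM scenario. It is a dependency-free sketch that assumes each table row has already been converted into a dictionary keyed by column header; the header strings are hypothetical, chosen to mirror the variable names listed above, and may need to be adjusted to the actual spreadsheet columns.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Mapping, Tuple

# HYPOTHETICAL column headers: assumed to match the variable names listed in
# this README; adjust them to the actual headers in the .xlsx files.
DISTANCE = "Distance(km)"
PLD = "Pelagic larval duration(days)"
DVM = "Diel vertical migration (DVM)"


def median_distance_by(records: Iterable[Mapping], *keys: str) -> Dict[Tuple, float]:
    """Median of the distance column for each combination of the grouping columns."""
    groups = defaultdict(list)
    for row in records:
        group = tuple(row[k] for k in keys)  # e.g. (15, "With")
        groups[group].append(float(row[DISTANCE]))
    # sorted() orders the groups by key, e.g. by PLD first when PLD is the first key
    return {group: median(values) for group, values in sorted(groups.items())}
```

With pandas (plus openpyxl for .xlsx support), the rows can be obtained with `pandas.read_excel("Data_Table_GAM.xlsx").to_dict("records")` and summarized with `median_distance_by(rows, PLD, DVM)`; grouping by Ecoregion or Season only requires passing the corresponding column name instead.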
File: R_ARM.R
Description: R script for a multivariate regression tree.
File: R_GAM.R
Description: R script for the Generalized Additive Model (GAM).
File: DistanceTraveled_withDVM_EN.pkl
Description: Processed results from the dispersal simulations performed in Ichthyop (a Lagrangian tool for simulating ichthyoplankton dynamics), containing the cumulative horizontal distance traveled by each particle over its pelagic duration. Scenario with diel vertical migration (DVM). The .pkl file contains serialized Python objects generated during post-processing of the Ichthyop outputs and can be opened in Python 3.x with either the standard-library pickle module (pickle.load()) or pandas (pandas.read_pickle()). If the serialized objects include pandas data structures, pandas must be installed in either case.
File: DistanceTraveled_withoutDVM_EN.pkl
Description: Processed results from the dispersal simulations performed in Ichthyop (a Lagrangian tool for simulating ichthyoplankton dynamics), containing the cumulative horizontal distance traveled by each particle over its pelagic duration. Scenario without diel vertical migration (DVM). The .pkl file contains serialized Python objects generated during post-processing of the Ichthyop outputs and can be opened in Python 3.x with either the standard-library pickle module (pickle.load()) or pandas (pandas.read_pickle()). If the serialized objects include pandas data structures, pandas must be installed in either case.
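A minimal loading sketch using only the standard-library pickle module is shown below. The function names are hypothetical, and because the internal structure of the stored objects is not documented here, the helper only reports what kind of object was loaded rather than assuming specific fields.

```python
import pickle
from pathlib import Path
from typing import Any, Union


def load_distance_pickle(path: Union[str, Path]) -> Any:
    """Unpickle one of the DistanceTraveled_*.pkl files.

    Unpickling imports the classes of the stored objects, so if the file
    holds pandas objects, pandas must be installed even though only the
    standard-library pickle module is called here. Only unpickle files from
    a trusted source, since unpickling can execute arbitrary code.
    """
    with open(path, "rb") as fh:
        return pickle.load(fh)


def describe(obj: Any) -> str:
    """Return a one-line summary of an unpickled object's structure."""
    if hasattr(obj, "shape") and hasattr(obj, "columns"):
        # pandas.DataFrame (or a similar table): report dimensions and headers
        return f"{type(obj).__name__} with shape {obj.shape} and columns {list(obj.columns)}"
    if isinstance(obj, dict):
        return f"dict with {len(obj)} keys, first keys: {list(obj)[:5]}"
    if isinstance(obj, (list, tuple)):
        return f"{type(obj).__name__} of length {len(obj)}"
    return type(obj).__name__
```

For example, `describe(load_distance_pickle("DistanceTraveled_withDVM_EN.pkl"))` returns a one-line summary of the stored object. Each file is about 363 MB, so loading it may require substantial memory.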
File: Ecoregions.zip
Description: ZIP archive containing the shapefiles of the spawning zones and ecoregions considered in the particle dispersion simulations. The shapefiles can be opened in Geographic Information System (GIS) software such as QGIS or ArcGIS Pro. After extracting the contents of the .zip file, load the .shp file into the GIS environment using the “Add Layer” (QGIS) or “Add Data” (ArcGIS Pro) function. The associated files (.dbf, .shx, .prj, and others) must remain in the same directory as the .shp file to ensure correct display and attribute access. Once loaded, the layers appear as georeferenced vector data representing the spawning zones and ecoregions used in the simulations.
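Because a shapefile layer is only readable when its component files stay together, it can help to check the archive before loading it. The sketch below uses Python's standard zipfile module to list, for each .shp in the archive, which of the usual companion files (.shx, .dbf, and the .prj projection file) are missing. The function name is hypothetical, and the layer names inside Ecoregions.zip are not documented here.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

# .shp, .shx and .dbf are the mandatory parts of an ESRI shapefile;
# .prj stores the coordinate reference system needed for georeferencing.
REQUIRED = {".shp", ".shx", ".dbf"}
RECOMMENDED = {".prj"}


def missing_shapefile_parts(archive) -> dict:
    """Map each shapefile layer in a ZIP archive to the companion files it lacks.

    `archive` may be a path or a binary file-like object (anything that
    zipfile.ZipFile accepts).
    """
    parts = defaultdict(set)
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            p = PurePosixPath(name)  # ZIP member names use "/" as the separator
            if p.suffix:
                # Group files by folder + base name, e.g. "dir/zones.shp" -> "dir/zones"
                parts[str(p.with_suffix(""))].add(p.suffix.lower())
    return {
        layer: sorted((REQUIRED | RECOMMENDED) - exts)
        for layer, exts in parts.items()
        if ".shp" in exts  # report only groups that actually contain a .shp
    }
```

Calling `missing_shapefile_parts("Ecoregions.zip")` should map every complete layer to an empty list; any listed extension points to a companion file that was separated from its .shp.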
Code/software
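The .xlsx tables can be read in R or Python, and the analysis scripts (R_ARM.R and R_GAM.R) run in R. The .pkl files require Python 3.x. The shapefiles in Ecoregions.zip can be opened in GIS software such as QGIS or ArcGIS Pro.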
Access information
Other publicly accessible locations of the data:
- N/A
Data was derived from the following sources:
- N/A
