Spatiotemporal dynamics and machine learning-based prediction of aboveground biomass in the Indus delta mangroves
Abstract
This dataset provides spatially explicit estimates of mangrove aboveground biomass (AGB) and associated environmental variables for the Indus Delta mangrove ecosystem. Field-based AGB spatial data were derived from the NASA CMS Global Mangrove Distribution, Aboveground Biomass, and Canopy Height dataset and used as reference data for model development. Multisource remote sensing data, including Sentinel-1 synthetic aperture radar (SAR) and Sentinel-2 optical imagery, were processed to extract predictor variables such as vegetation indices and surface characteristics. Additional environmental variables, including land surface temperature and land use/land cover, were incorporated to capture ecological controls on biomass distribution.
All satellite datasets underwent standard preprocessing steps, including atmospheric correction, radiometric calibration, cloud masking, and spatial resampling. The processed variables were then integrated into machine learning models (Random Forest, Gradient Boosted Regression Tree, Support Vector Regression, and Classification and Regression Trees) to estimate AGB across the study region.
The best-performing model (Gradient Boosted Regression Tree) was used to generate spatially explicit AGB maps and future projections for 2030, 2040, and 2050. Model outputs were exported as point-based datasets containing geographic coordinates and biomass values, along with corresponding spatial layers for mapping and analysis.
Dataset DOI: 10.5061/dryad.h44j0zq1m
1. README File Description
This dataset contains spatially explicit predictions of aboveground biomass (AGB) for coastal mangrove ecosystems under future climate scenarios for the years 2030, 2040, and 2050. Predictions were generated using multiple machine learning algorithms, including:
- CART (Classification and Regression Trees)
- RF (Random Forest)
- SVR (Support Vector Regression)
- XGBoost (Extreme Gradient Boosting)
The dataset also includes shapefiles defining the study area and buffer zones used in spatial analysis.
2. Description of File Types
This dataset includes several file formats that serve different purposes. The .txt files are plain text tabular data files containing spatial prediction outputs, including coordinates and predicted aboveground biomass (AGB) values. These files can be opened using software such as R, Python, or LibreOffice Calc. Each .txt file may be accompanied by a corresponding .xml file, which provides structured metadata describing the dataset, including variable definitions and file structure, and supports interoperability with GIS and data systems. The schema.ini file is a configuration file that defines how the .txt files should be interpreted, including column formats and data types, ensuring consistent data reading. The dataset also includes shapefiles, where the .shp file contains geographic geometry, the .dbf file stores attribute data, the .shx file serves as an index, and the .prj file defines the coordinate reference system.
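Because schema.ini uses an INI-style layout, its contents can be inspected with Python's standard `configparser` module. The following is a minimal sketch only: the section and key names in `EXAMPLE_SCHEMA` (`Format`, `ColNameHeader`, `Col1`, and so on) are hypothetical placeholders that follow the common schema.ini convention, and the file shipped with this dataset may differ. The helper name `read_schema` is likewise an illustration, not part of the dataset.

```python
import configparser

# Hypothetical schema.ini content, for illustration only;
# the file shipped in AGB_Predictions may use different keys or formats.
EXAMPLE_SCHEMA = """\
[RF_2030.txt]
Format=CSVDelimited
ColNameHeader=True
Col1=X Double
Col2=Y Double
Col3=AGB Double
"""


def read_schema(text):
    """Parse schema.ini text into {file name: {key: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original capitalisation of keys
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


schema = read_schema(EXAMPLE_SCHEMA)
entry = schema["RF_2030.txt"]

# Collect the ColN entries in numeric order and keep only the column names.
column_keys = sorted(
    (key for key in entry if key.startswith("Col") and key[3:].isdigit()),
    key=lambda key: int(key[3:]),
)
columns = [entry[key].split()[0] for key in column_keys]
print(columns)
```

To inspect the real file, the text passed to `read_schema` would be read from `schema.ini` in the AGB_Predictions folder instead of the placeholder string.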
File Name: Data.zip
Subfolders:
- AGB_Predictions
- Shape_Files
3. Files and variables
The dataset is organized as follows:
Data/
│
├── AGB_Predictions/
│ ├── CART_2030.txt
│ ├── CART_2040.txt
│ ├── CART_2050.txt
│ ├── RF_2030.txt
│ ├── RF_2040.txt
│ ├── RF_2050.txt
│ ├── SVR_2030.txt
│ ├── SVR_2040.txt
│ ├── SVR_2050.txt
│ ├── XGBoost_2030.txt
│ ├── XGBoost_2040.txt
│ ├── XGBoost_2050.txt
│ ├── *.xml
│ └── schema.ini
│
└── Shape_Files/
    ├── IDMG.shp
    ├── IDMG.dbf
    ├── IDMG.shx
    ├── IDMG.prj
    ├── IDMG_Buffer.shp
    ├── IDMG_Buffer.dbf
    ├── IDMG_Buffer.shx
    └── IDMG_Buffer.prj
3.1 Schema File Description
The schema.ini file defines the structure and formatting of the tabular .txt files in the AGB_Predictions folder. It specifies how columns are interpreted by compatible software, ensuring consistent data access and preventing misinterpretation of data types.
3.2 XML Metadata Files
Each .txt file in the AGB_Predictions folder is accompanied by an .xml metadata file. These files provide structured metadata describing dataset attributes, variable definitions, and file structure. They are included to enhance data interoperability and support usage in GIS and metadata-aware systems.
3.3 Shapefile Description
- IDMG.shp: Defines the study area boundary
- IDMG_Buffer.shp: Represents the buffer zone used in spatial analysis
Each shapefile includes associated files (.dbf, .shx, .prj) required for proper functionality.
4. File Naming Convention
All prediction files follow the format:
[Model]_[Year].txt
Where:
- Model = CART, RF, SVR, XGBoost
- Year = 2030, 2040, 2050
Examples:
- RF_2030.txt → Random Forest prediction for 2030
- XGBoost_2050.txt → XGBoost prediction for 2050
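The naming convention above can be checked programmatically, which may help when looping over the prediction files. This is a minimal sketch; the function name `parse_prediction_filename` and the constants are illustrative and not part of the dataset.

```python
import re
from itertools import product

MODELS = ("CART", "RF", "SVR", "XGBoost")
YEARS = (2030, 2040, 2050)

# Matches names of the form [Model]_[Year].txt for the listed models and years.
_PATTERN = re.compile(r"^(CART|RF|SVR|XGBoost)_(2030|2040|2050)\.txt$")


def parse_prediction_filename(name):
    """Split a '[Model]_[Year].txt' file name into (model, year)."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a prediction file name: {name!r}")
    return match.group(1), int(match.group(2))


# Every model/year combination listed in the convention.
expected_files = [f"{model}_{year}.txt" for model, year in product(MODELS, YEARS)]

print(parse_prediction_filename("RF_2030.txt"))  # ('RF', 2030)
print(len(expected_files))  # 12
```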
5. Data Description (Variables and Units)
Each .txt file contains spatial prediction data with the following variables:
| Variable | Description | Unit |
| --- | --- | --- |
| X | Longitude coordinate | Decimal degrees |
| Y | Latitude coordinate | Decimal degrees |
| AGB | Predicted aboveground biomass | Mg ha⁻¹ |
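As a rough illustration of how the three variables might be read, the sketch below parses a small in-memory sample with Python's standard `csv` module and computes the mean AGB. It assumes a header row with the column names X, Y, and AGB and a comma delimiter; the actual delimiter is not stated here and should be confirmed against schema.ini. The sample values are made up for the example, and `read_predictions` is a hypothetical helper name.

```python
import csv
import io
from statistics import mean


def read_predictions(stream, delimiter=","):
    """Read (X, Y, AGB) rows from an open text stream as tuples of floats."""
    reader = csv.DictReader(stream, delimiter=delimiter)
    return [
        (float(record["X"]), float(record["Y"]), float(record["AGB"]))
        for record in reader
    ]


# Made-up sample values for illustration only; not taken from the dataset.
sample = io.StringIO("X,Y,AGB\n67.50,24.10,35.0\n67.60,24.20,45.0\n")

rows = read_predictions(sample)
mean_agb = mean(agb for _, _, agb in rows)
print(f"{len(rows)} points, mean AGB = {mean_agb:.1f} Mg/ha")
```

For a real file, the stream would come from something like `open("RF_2030.txt", newline="")`, with `delimiter` adjusted to match the file.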
6. Code/software
The dataset can be accessed and visualized using the following software (free and open source, except for ArcGIS, which is commercial):
Tabular Data (.txt)
- R
- Python
- LibreOffice Calc
Spatial Data (.shp)
- QGIS
- ArcGIS
7. Missing Values
Missing or undefined values (if present) are represented as:
- NA or blank cells
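When reading the files without a library that handles missing values automatically, NA and blank cells need to be converted explicitly. The sketch below shows one possible approach using only the standard library; the comma delimiter, the sample rows, and the helper name `parse_agb` are assumptions made for illustration.

```python
import csv
import io
import math

# Tokens treated as missing, following the convention described above.
MISSING_TOKENS = {"", "NA"}


def parse_agb(raw):
    """Return the AGB value as a float, or NaN when the cell is NA or blank."""
    text = raw.strip() if raw is not None else ""
    return math.nan if text in MISSING_TOKENS else float(text)


# Made-up sample: one valid value, one NA, one blank cell.
sample = io.StringIO(
    "X,Y,AGB\n"
    "67.50,24.10,35.0\n"
    "67.60,24.20,NA\n"
    "67.70,24.30,\n"
)

values = [parse_agb(record["AGB"]) for record in csv.DictReader(sample)]
valid = [value for value in values if not math.isnan(value)]
n_missing = len(values) - len(valid)
print(f"{len(valid)} valid, {n_missing} missing")
```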
Data Usage Notes
- All shapefile components must remain in the same directory to function correctly
- The coordinate reference system (CRS) of the spatial data is GCS_WGS_1984 (geographic WGS 84, coordinates in decimal degrees)
- Prediction outputs are model-derived and subject to uncertainty associated with model assumptions
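The first two notes can be verified before opening a shapefile in GIS software. The following sketch checks that the four component files sit next to each other and that the .prj text names GCS_WGS_1984. It is demonstrated on empty placeholder files in a temporary folder; the helper names and the shortened .prj string are illustrative assumptions, not the contents of the shipped files.

```python
import tempfile
from pathlib import Path

REQUIRED_EXTENSIONS = (".shp", ".dbf", ".shx", ".prj")


def missing_components(shp_path):
    """Return the component extensions that are absent next to the .shp file."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS if not base.with_suffix(ext).exists()]


def declares_wgs84(prj_path):
    """Check whether the .prj text names the GCS_WGS_1984 coordinate system."""
    text = Path(prj_path).read_text(encoding="utf-8", errors="replace")
    return "GCS_WGS_1984" in text


# Demonstration with placeholder files; real use would point at Shape_Files/IDMG.shp.
with tempfile.TemporaryDirectory() as folder:
    shp = Path(folder) / "IDMG.shp"
    for ext in (".shp", ".dbf", ".shx"):
        shp.with_suffix(ext).touch()
    absent_before = missing_components(shp)  # the .prj file is not there yet

    # Shortened placeholder text, not a complete projection definition.
    shp.with_suffix(".prj").write_text('GEOGCS["GCS_WGS_1984"]', encoding="utf-8")
    absent_after = missing_components(shp)
    crs_ok = declares_wgs84(shp.with_suffix(".prj"))

print(absent_before, absent_after, crs_ok)
```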
8. Contact Information
Muhammad Naveed
nawd38@scbg.ac.cn
