Data and code from: Deep reinforcement learning for pressure optimization in water distribution networks with multiple pumping stations: Case study
Data files
Oct 31, 2025 version files (5.57 MB)
- DataSetR02.zip (5.55 MB)
- README.md (17.41 KB)
Abstract
This dataset provides the complete code, model files, and tabular data used to train and evaluate a deep reinforcement learning (DRL) agent for pressure optimization in large-scale water distribution networks with multiple pumping stations. It includes ten Python scripts (for environment definition, training, testing, and evaluation), calibrated EPANET hydraulic model files, and eight Excel workbooks containing control parameters, diurnal demand data, synthetic and observed evaluation sets, and energy-performance summaries. The dataset enables full reproduction of the case-study results and supports reuse for developing alternative DRL algorithms or benchmarking water-network optimization methods. The repository is self-contained and can be executed using Python 3.10 with the package versions specified herein.
Overview
This README documents the dataset accompanying the article "Deep Reinforcement Learning for Pressure Optimization in Water Distribution Networks with Multiple Pumping Stations: Case Study" (ASCE Journal of Water Resources Planning and Management). It describes the folder structure, variable definitions, software environment, and step-by-step instructions required to reproduce the analyses and results. The abstract for this dataset is provided separately on the Dryad record page.
1. Directory Structure and File Descriptions
DataSetR02.zip/
│
├── Python Scripts
│ ├── env_001.py → Custom Gym environment for multi-pump network.
│ ├── evaluate_001.py → Evaluates trained SAC agent on validation sets.
│ ├── Evaluation_conventional_001.py → Baseline evaluation using fixed setpoints.
│ ├── Evaluation_data_collection_001.py → Collects training and evaluation data.
│ ├── Evaluation_diurnal_pattern_001.py → Evaluation under diurnal demand variations.
│ ├── Evaluation_synthetic_DRL_001.py → Evaluation with synthetic demand profiles.
│ ├── reward_tracker.py → Tracks training rewards and performance metrics.
│ ├── SAC_Training_001.py → Main training script using Soft Actor–Critic (SAC).
│ ├── Sys_001.py → Hydraulic solver and reward function.
│ ├── Testing_001.py → Tests trained agent on individual scenarios.
│ ├── Tool_001.py → Predicts optimal setpoint for given flow input.
│ └── sac_water_networkR3.zip → Pre-trained SAC agent weights and replay buffer (PyTorch .zip, ~40 MB).
│
├── EPANET_FILES/
│ ├── network.inp → EPANET input file (network structure: nodes, pipes, pumps, reservoirs).
│ ├── network.rpt → EPANET report (simulation summary).
│ └── network.out → Binary output for data-driven processing.
│
├── Input/
│ ├── Input.xlsx → Primary configuration and control-variable dataset.
│ ├── Input_drnl.xlsx → Diurnal flow pattern input for time-of-day variation training.
│ ├── Input_evaluate.xlsx → Field-scale validation dataset with observed values.
│ └── Input_synt.xlsx → Synthetic data for generalisation and sensitivity analysis.
│
├── excel/
│ ├── Energy_Evaluation.xlsx
│ ├── Energy_Evaluation_synthetic.xlsx
│ ├── output_20250727_083046.xlsx
│ ├── output_conventional_20250716_164258.xlsx
│ ├── output_diurnal_pattern_20250728_093415.xlsx
│ ├── output_synthetic_DRL20250929_181553.xlsx
│ └── output_syth_conventional_20250929_183011.xlsx
│
├── logs/ → TensorBoard training logs (events.out.tfevents.*)
└── plot/
    ├── combined_plot_testing_2025-07-27_083401.png → Predicted vs observed pressures.
    └── training_record_2025-07-27_082501.png → SAC training convergence plot.
2. Excel Data Details
2.1 Input.xlsx
Defines control and training parameters for the DRL environment.
| Column | Description |
|---|---|
| Junction | Junction identifier (EPANET node ID). |
| Demand | Hydraulic demand at the node (L/s). |
| target | Desired pressure (m). |
| target_max | Maximum allowable pressure (m). |
| setting_max_range | Upper range of controllable setpoint for each pump (comma-separated). |
| setting_min_range | Lower range of controllable setpoint for each pump (comma-separated). |
| weight_error_min | Weight applied to pressure undershoot error. |
| weight_error_max | Weight applied to pressure overshoot error. |
| error_lim | Permissible deviation from target pressure. |
| sigma | Standard deviation factor for reward scaling. |
| error_max_te | Maximum tolerated error for termination. |
| reward_max | Upper reward bound for ideal operation. |
| reward_min | Lower reward bound for worst-case operation. |
| flow_max | Maximum expected flow rate (L/s). |
| flow_max_setting | Maximum flow at design pressure. |
| setting_range_1 | Control range for pump group 1. |
| setting_range_2 | Control range for pump group 2. |
| setting_range_3 | Control range for pump group 3. |
| setting_max | Overall maximum setpoint for all pumps. |
| flow_range | Flow variability factor (ratio form). |
| flow_max_range | Maximum flow range in scenario analysis. |
| flow_min_range | Minimum flow range in scenario analysis. |
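The per-pump bounds in setting_max_range and setting_min_range are stored as comma-separated values in a single cell. The sketch below shows one way such a cell could be parsed and used to clamp candidate setpoints. The cell contents, the function names `parse_range` and `clip_setpoints`, and the clamping step itself are illustrative assumptions; they are not taken from the dataset's scripts.

```python
def parse_range(cell):
    """Split a comma-separated cell such as '60, 55, 50' into a list of floats."""
    return [float(part) for part in str(cell).split(",") if part.strip()]


def clip_setpoints(setpoints, lower, upper):
    """Clamp each pump setpoint to its own [lower, upper] interval."""
    if not (len(setpoints) == len(lower) == len(upper)):
        raise ValueError("expected one bound pair per pump")
    return [min(max(s, lo), hi) for s, lo, hi in zip(setpoints, lower, upper)]


# Hypothetical cell contents; the real values are in Input.xlsx.
upper = parse_range("60, 55, 50")
lower = parse_range("30,25,20")

# Three candidate setpoints, one per pump group.
clipped = clip_setpoints([65.0, 40.0, 10.0], lower, upper)
print(clipped)
```

With these made-up bounds, the first setpoint is pulled down to its upper limit, the second is unchanged, and the third is raised to its lower limit.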
2.2 Input_drnl.xlsx
Represents randomised diurnal demand patterns used for training and testing the DRL agent. Each row corresponds to a normalized daily flow pattern for a pump station or demand scenario.
| Column | Description |
|---|---|
| TP | Identifier for the time pattern or station. |
| 1–25 | Normalized flow ratios (0–1) corresponding to hourly intervals from hour 0 to 24, representing a full diurnal cycle (the 25th column repeats hour 0 for continuity). |
Note:
The diurnal profiles were derived and slightly randomized from field demand data to ensure generalisation and prevent overfitting to a single pattern.
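As a minimal sketch of how one 25-point row might be used, the code below interpolates the normalized ratio at a fractional hour and scales it to a flow. The pattern values, the `flow_max` value, and the idea that flow equals ratio times flow_max are assumptions made for illustration; the dataset's scripts may consume the pattern differently.

```python
def ratio_at(pattern, hour):
    """Linearly interpolate a 25-point hourly pattern at an hour in [0, 24]."""
    if len(pattern) != 25:
        raise ValueError("expected 25 hourly values (hours 0 to 24)")
    if not 0 <= hour <= 24:
        raise ValueError("hour must lie in [0, 24]")
    i = min(int(hour), 23)  # index of the hourly value to the left
    frac = hour - i
    return pattern[i] + frac * (pattern[i + 1] - pattern[i])


# Made-up symmetric pattern: ratio rises until midday, then falls.
# The last value equals the first, mirroring the repeated hour-0 column.
pattern = [0.4 + 0.02 * min(h, 24 - h) for h in range(25)]

flow_max = 500.0  # hypothetical maximum flow, same unit as the output flow
flow_at_0630 = ratio_at(pattern, 6.5) * flow_max
print(round(flow_at_0630, 1))
```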
2.3 Input_evaluate.xlsx
Contains evaluation and validation data used to assess the DRL model’s performance under observed operating conditions. Each row corresponds to one network junction or evaluation scenario.
| Column | Description |
|---|---|
| Junction | Identifier of the junction or node in the EPANET model. |
| Demand | Reference demand used for simulation (m³/hr). |
| Demand_actu | Actual or measured demand during validation (m³/hr). |
| Pressure_out | Simulated or observed outlet pressure at the junction (m). |
| Setting_1 | Control setting for Pump Group 1 (e.g., pressure setpoint or speed). |
| Setting_2 | Control setting for Pump Group 2. |
| Setting_3 | Control setting for Pump Group 3. In the workbook this column appears as a second column headed Setting_2. |
| Flow_1 | Flow rate through Pump 1 (m³/hr). |
| Flow_2 | Flow rate through Pump 2 (m³/hr). |
| Flow_3 | Flow rate through Pump 3 (m³/hr). |
| Flow_range_max | Maximum flow range used for this scenario (m³/hr). |
| Flow_range_min | Minimum flow range used for this scenario (m³/hr). |
| Max_flow | Maximum expected flow limit for evaluation (m³/hr). |
| Mini_flow | Minimum expected flow limit for evaluation (m³/hr). |
Note:
This dataset was used to validate the DRL agent’s predictions by comparing simulated and actual performance across multiple flow and pressure conditions.
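Input.xlsx lists demand in L/s, whereas this workbook uses m³/hr, so values from the two files need a common unit before they are compared. The helpers below are a small sketch of that conversion (1 L/s = 3.6 m³/hr); the function names are hypothetical, and whether the dataset's scripts convert units internally is not stated here.

```python
M3H_PER_LPS = 3.6  # 1 L/s = 0.001 m³/s × 3600 s/hr


def lps_to_m3h(flow_lps):
    """Convert a flow from litres per second to cubic metres per hour."""
    return flow_lps * M3H_PER_LPS


def m3h_to_lps(flow_m3h):
    """Convert a flow from cubic metres per hour to litres per second."""
    return flow_m3h / M3H_PER_LPS


# Hypothetical demand of 10 L/s expressed in the unit used by Input_evaluate.xlsx.
print(lps_to_m3h(10.0))
```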
2.4 Input_synt.xlsx
Represents synthetic and randomised demand datasets generated to test the generalisation capability of the trained DRL agent under a wide range of operating conditions.
| Column | Description |
|---|---|
| TP | Identifier for the synthetic time pattern or random test case. |
| 1–100 | Randomised demand samples representing diverse network conditions. |
Note:
Each row defines a synthetic or randomised demand pattern of 100 points, ensuring the model’s robustness and stability during unseen operational scenarios.
2.5 Energy_Evaluation.xlsx and Energy_Evaluation_synthetic.xlsx
These files contain hourly energy and hydraulic performance data from the evaluation of the DRL-based and conventional control strategies. Each row represents one hourly time step during a 24-hour simulation.
| Column | Description |
|---|---|
| Hour | Simulation hour (0–24). |
| Setting_1 | Pump Group 1 control setting or discharge pressure setpoint. |
| Setting_2 | Pump Group 2 control setting or discharge pressure setpoint. |
| Setting_3 | Pump Group 3 control setting or discharge pressure setpoint. |
| Mini_Pressure | Minimum pressure in the network during the hour (m). |
| Max_Pressure | Maximum pressure in the network during the hour (m). |
| Flow_1 | Flow through Pump 1 (m³/hr). |
| Flow_2 | Flow through Pump 2 (m³/hr). |
| Flow_3 | Flow through Pump 3 (m³/hr). |
Note:
- Both files summarize the energy and hydraulic performance for DRL-based (synthetic) and conventional (actual) control cases.
- Data were used to quantify energy savings, pressure stability, and compliance across diurnal and synthetic operating conditions.
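The columns above hold flows and setpoints rather than an explicit energy figure. One rough way to compare two control strategies from such rows is the hydraulic power P = ρ g Q H. The sketch below rests on strong assumptions that should be checked against the article: it treats each setting as the head delivered by the pump group in metres, ignores suction head and pump or motor efficiency, and treats every row as a one-hour interval. It is not presented as the method used to produce the workbooks.

```python
RHO = 1000.0  # water density, kg/m³
G = 9.81      # gravitational acceleration, m/s²


def hydraulic_power_kw(flow_m3h, head_m):
    """Hydraulic power in kW for a flow in m³/hr lifted through head_m metres."""
    flow_m3s = flow_m3h / 3600.0
    return RHO * G * flow_m3s * head_m / 1000.0


def energy_kwh(rows, step_hours=1.0):
    """Sum hydraulic energy over (flow_m3h, head_m) rows of equal duration."""
    return sum(hydraulic_power_kw(q, h) for q, h in rows) * step_hours


# Two hypothetical hourly rows for one pump group: (Flow_1, Setting_1).
rows = [(360.0, 50.0), (180.0, 40.0)]
print(round(energy_kwh(rows), 2))
```

Running the same calculation on the DRL and conventional rows would give a relative comparison only; absolute energy use also depends on efficiency data that these columns do not contain.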
2.6 output_*.xlsx
Each file represents time-series or batch evaluation results for a specific simulation condition (e.g., diurnal, synthetic, or conventional control). The datasets record the DRL agent’s control actions, resulting pressures, and network responses across all demand management areas (DMAs).
| Column | Description |
|---|---|
| Serial_number | Sequential identifier for each test or evaluation scenario. |
| flow_range_fact | Flow scaling factor representing demand intensity (dimensionless). |
| Max_steps | Maximum number of simulation or control steps executed during the scenario. |
| Setting_1 | Pump Group 1 control setting or discharge pressure (m). |
| Setting_2 | Pump Group 2 control setting or discharge pressure (m). |
| Setting_3 | Pump Group 3 control setting or discharge pressure (m). |
| Mini_Pressure | Minimum network pressure recorded during the scenario (m). |
| Max_Pressure | Maximum network pressure recorded during the scenario (m). |
| Flow_1 | Flow through Pump 1 (m³/hr). |
| Flow_2 | Flow through Pump 2 (m³/hr). |
| Flow_3 | Flow through Pump 3 (m³/hr). |
| AD-06 – HUDAY-NETWORK, AD-18, etc. | Individual DMA or network sector pressures (m) at each location (e.g., AD-06, AD-11, ER-01, HUDAY-NETWORK). Columns correspond to the monitored DMAs in the hydraulic model. |
Note:
Each output file (e.g., output_20250727_083046.xlsx, output_synthetic_DRL20250929_181553.xlsx) includes results from simulations under different operating scenarios — such as conventional, synthetic, or diurnal control cases. The data were used to assess the DRL agent’s performance in maintaining target pressures while minimizing energy consumption across all DMAs.
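A simple summary of one output row is the share of DMAs whose pressure falls inside an acceptable band. The sketch below assumes the band runs from target to target_max (the columns defined in Input.xlsx); treating target as a lower bound is an assumption, and the pressures and thresholds shown are made up for illustration.

```python
def compliance(pressures, target, target_max):
    """Return the fraction of DMA pressures within [target, target_max]."""
    values = list(pressures.values())
    if not values:
        raise ValueError("no DMA pressures supplied")
    within = sum(1 for p in values if target <= p <= target_max)
    return within / len(values)


# Hypothetical pressures (m) for one scenario, keyed by DMA column name.
row = {"AD-06": 22.5, "AD-11": 18.0, "ER-01": 41.0, "HUDAY-NETWORK": 27.3}

share = compliance(row, target=20.0, target_max=40.0)
print(share)
```

Here two of the four DMAs lie inside the band, so the share is 0.5.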
3. Software Environment and Execution
| Package | Version | Purpose |
|---|---|---|
| Python | 3.10.13 | Programming environment |
| stable-baselines3 | 2.2.1 | Reinforcement learning algorithms (SAC, PPO, TD3) |
| gymnasium | 0.29.1 | Environment interface |
| epanettools / EPYT | 0.3.0 | EPANET hydraulic simulation toolkit |
| numpy | 1.26.4 | Numerical computation |
| pandas | 2.2.2 | Tabular data handling |
| matplotlib | 3.9.2 | Plotting |
| torch | 2.2.2 | Deep learning framework |
| tensorboard | 2.16.2 | Training visualization |
Tested on: Windows 11 (64-bit) operating system.
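Before running the scripts, it can help to confirm that the installed packages match the versions in the table. The following sketch uses only the standard library; it assumes the package names in the table are also the installed distribution names, and it leaves out the EPANET toolkit because the table lists two possible names for it.

```python
from importlib import metadata

# Versions copied from the table above (EPANET toolkit omitted, see note).
PINNED = {
    "stable-baselines3": "2.2.1",
    "gymnasium": "0.29.1",
    "numpy": "1.26.4",
    "pandas": "2.2.2",
    "matplotlib": "3.9.2",
    "torch": "2.2.2",
    "tensorboard": "2.16.2",
}


def version_mismatches(pinned):
    """Return {package: (expected, found)} for missing or differing packages."""
    problems = {}
    for name, expected in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed
        if found != expected:
            problems[name] = (expected, found)
    return problems


for name, (expected, found) in version_mismatches(PINNED).items():
    print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

No output means every listed package is installed at the pinned version.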
To reproduce results:
Use the SAC_Training_001.py script to retrain and regenerate the agent file (sac_water_networkR3.zip). The trained agent can then be evaluated using the provided input datasets to reproduce the results reported in the study. When applying the workflow to other hydraulic models, SAC_Training_001.py must be adapted to reflect the specific observation and action spaces of those models. All relevant hyperparameters should also be tuned to suit the new configuration.
4. Important Notes
This README contains all information required for independent reuse and re-analysis of the dataset without referring to the published article.
- Folder hierarchy must remain unchanged for scripts to run correctly.
- The dataset is self-contained and can be interpreted independently of the associated publication.
- Minor stochastic differences in results are expected when retraining.
- All data were aggregated and anonymized to remove sensitive information.
- TensorBoard logs can be visualized with `tensorboard --logdir=logs/SAC_3/`.
- Datasets cover Diurnal, Synthetic, and Conventional scenarios used in the article.
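Because the scripts depend on the folder hierarchy, a quick pre-flight check can catch a misplaced folder before a long training run. The sketch below checks a subset of the paths listed in Section 1, relative to the directory where DataSetR02.zip was extracted. The `EXPECTED` list and the `missing_paths` helper are illustrative and not part of the dataset.

```python
from pathlib import Path

# Subset of the layout documented in Section 1, relative to the extraction root.
EXPECTED = [
    "EPANET_FILES/network.inp",
    "Input/Input.xlsx",
    "Input/Input_drnl.xlsx",
    "Input/Input_evaluate.xlsx",
    "Input/Input_synt.xlsx",
    "excel",
    "logs",
    "plot",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


# Check the current working directory; an empty list means nothing is missing.
print(missing_paths("."))
```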
