To realize the potential of autonomous underwater robots that scale up our observational capacity in the ocean, new approaches and techniques are needed. Fleets of autonomous robots could be used to study complex marine systems and animals, either with new imaging configurations or by tracking tagged animals to study their behavior. These activities can then inform new policies for community conservation. The role of animal connectivity via active movement represents a major knowledge gap related to the distribution of deep ocean populations. Tracking underwater targets is a major challenge for observing biological processes in situ, and methods that respond robustly to a changing environment during monitoring missions are needed. Analytical techniques for optimal sensor placement and path planning to locate underwater targets are not straightforward in such cases. The aim of this study is to investigate deep reinforcement learning, whose promising capabilities have been demonstrated in terrestrial scenarios, as a tool for optimizing range-only underwater target tracking. To evaluate its usefulness, a reinforcement learning method was implemented as a path planning system for an autonomous surface vehicle tracking an underwater mobile target. A complete description of an open-source model, performance metrics in simulated environments, and an evaluation of the algorithms based on more than 15 hours of at-sea field experiments are presented. These efforts demonstrate that deep reinforcement learning is a powerful approach that enhances the abilities of autonomous robots in the ocean, and they encourage the deployment of such algorithms for monitoring marine biological systems in the future.
Deep Reinforcement Learning methods for Underwater Target Tracking
This is a set of tools developed to train one or more agents to find the optimal path for localizing and tracking one or more targets.
The deep Reinforcement Learning (RL) algorithms implemented are:
- Deep Deterministic Policy Gradient (DDPG)
- Twin-Delayed DDPG (TD3)
- Soft Actor-Critic (SAC)
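All three algorithms are off-policy actor-critic methods that, in their standard formulations, stabilize training with slowly updated target networks. The sketch below illustrates that shared soft (Polyak) update with plain NumPy arrays. The function name `soft_update`, the list-of-arrays parameter layout, and the value of `tau` are assumptions made for illustration and are not taken from the repository.

```python
import numpy as np


def soft_update(target_params, online_params, tau=0.005):
    """Return target parameters moved a fraction tau toward the online ones.

    Hypothetical helper: parameters are represented as a list of NumPy
    arrays, which is an assumption of this sketch.
    """
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]


# Toy example: the target network starts at zero, the online network at one.
online = [np.ones((2, 2)), np.ones(2)]
target = [np.zeros((2, 2)), np.zeros(2)]
target = soft_update(target, online, tau=0.1)
```

A small `tau` keeps the target network changing slowly, which is what makes the bootstrapped critic targets less prone to oscillation.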
The environment used to train the agents is based on the OpenAI Multi-Agent Particle Environment (MPE).
The main objective is to find the optimal path that an autonomous vehicle (e.g. an autonomous underwater vehicle (AUV) or an autonomous surface vehicle (ASV)) should follow in order to localize and track an underwater target using range-only, single-beacon methods. The target estimation algorithms implemented are based on:
- Least Squares (LS)
- Particle Filter (PF)
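To make the range-only, single-beacon idea concrete, the following is a minimal sketch of a linearized least-squares estimate for a static target in 2D. It assumes noise-free ranges and a vehicle that measures the range from several known positions. The function name `ls_target_estimate` and the circular test trajectory are hypothetical and do not reproduce the repository's implementation.

```python
import numpy as np


def ls_target_estimate(positions, ranges):
    """Estimate a static 2D target position from range-only measurements.

    Squaring the range equation |p_i - x|^2 = r_i^2 gives
        -2 p_i . x + |x|^2 = r_i^2 - |p_i|^2,
    which is linear in the unknowns (x, y, c) with c = |x|^2.
    """
    positions = np.asarray(positions, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    A = np.hstack([-2.0 * positions, np.ones((len(positions), 1))])
    b = ranges**2 - np.sum(positions**2, axis=1)
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    return solution[:2]  # drop the auxiliary unknown c


# Assumed scenario: the vehicle circles the origin and measures exact ranges.
target = np.array([120.0, -40.0])
angles = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
positions = 100.0 * np.column_stack([np.cos(angles), np.sin(angles)])
ranges = np.linalg.norm(positions - target, axis=1)
estimate = ls_target_estimate(positions, ranges)
```

The quality of such an estimate depends on the geometry of the measurement positions, which is why the path followed by the vehicle matters for localization accuracy.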
More information is available in this GitHub repository: https://github.com/imasmitja/RLforUTracking