Reaching the limit in autonomous racing: Optimal control versus reinforcement learning
Data files (Oct 30, 2023 version, 166.94 MB):
- README.md
- ScienceRLDataset.zip
Abstract
A central question in robotics is how to design a control system for an agile mobile robot. This paper studies this question systematically, focusing on a challenging setting: autonomous drone racing. We show that a neural network controller trained with reinforcement learning (RL) outperformed optimal control (OC) methods in this setting. We then investigated which fundamental factors have contributed to the success of RL or have limited OC. Our study indicates that the fundamental advantage of RL over OC is not that it optimizes its objective better but that it optimizes a better objective. OC decomposes the problem into planning and control with an explicit intermediate representation, such as a trajectory, that serves as an interface. This decomposition limits the range of behaviors that can be expressed by the controller, leading to inferior control performance when facing unmodeled effects. In contrast, RL can directly optimize a task-level objective and can leverage domain randomization to cope with model uncertainty, allowing the discovery of more robust control responses. Our findings allowed us to push an agile drone to its maximum performance, achieving a peak acceleration greater than 12 times the gravitational acceleration and a peak velocity of 108 kilometers per hour. Our policy achieved superhuman control within minutes of training on a standard workstation. This work presents a milestone in agile robotics and sheds light on the role of RL and OC in robot control.
README
The structure of the dataset is as follows:
(1) RLvsOC: This folder contains the simulation experiments for the comparison between optimal control and reinforcement learning.
- MPC: Trajectory tracking using nonlinear model predictive control (MPC). It contains 50 trials, indicated by the folder name, from trial_0 to trial_49. Inside each folder, there are 4 different csv files:
- *ref_bem.csv* is the reference trajectory in the BEM simulation
- *ref_nominal.csv* is the reference trajectory in the nominal simulation
- *traj_bem.csv* is the executed trajectory in the BEM simulation
- *traj_nominal.csv* is the executed trajectory in the nominal simulation
- MPCC: Path following using model predictive contouring control (MPCC). It also contains 50 trials, indicated by the folder name. Inside each folder, there are 4 different csv files:
- *ref_bem.csv* is the reference path in the BEM simulation
- *ref_nominal.csv* is the reference path in the nominal simulation
- *traj_bem.csv* is the executed trajectory in the BEM simulation
- *traj_nominal.csv* is the executed trajectory in the nominal simulation
- PPO: Gate progress maximization using proximal policy optimization (PPO). It contains two csv files. Since we use a parallelized environment, each file contains all 50 trials; the trial number is indicated by the *env_id* column inside the csv (see the loading sketch after this list). PPO does not require a reference trajectory or path.
- *bem_traj.csv* is the executed trajectory in the BEM simulation
- *nominal_traj.csv* is the executed trajectory in the nominal simulation
(2) HumanPilots_MarvTrack_6sDrone: This folder contains the best trajectories flown by three professional human pilots. All pilots used a 6s drone.
- best_pilot1.csv: the best lap (trajectory) by Pilot 1 (Alex Vanover)
- best_pilot2.csv: the best lap (trajectory) by Pilot 2 (Thomas Bitmatta)
- best_pilot3.csv: the best lap (trajectory) by Pilot 3 (Marvin Schaepper)
(3) OptimizationMethods: This folder contains the data for the comparison between MPC and RL for trajectory tracking.
- MPC: Using a nonlinear MPC to track a time-optimal trajectory. *mpc_bem_* means the experiment in the BEM simulation, *mpc_nominal_* indicates the experiment in the nominal simulation. *_ref.csv* is the reference trajectory, *_traj.csv* is the executed trajectory.
- PPO: Using a reinforcement learning controller to track the same time-optimal trajectory. *rl_bem_* means the experiment in the BEM simulation, *rl_nominal_* indicates the experiment in the nominal simulation. *_ref.csv* is the reference trajectory, *_traj.csv* is the executed trajectory.
(4) OptimizationObjectives: This folder contains the data for the comparison between optimization objectives using RL.
- PPO_Racing: Using a reinforcement learning controller to optimize gate progress for drone racing. *rl_bem_traj.csv* means the experiment in the BEM simulation, *rl_nominal_traj.csv* indicates the experiment in the nominal simulation.
- PPO_Tracking: Using a reinforcement learning controller to track a time-optimal trajectory. *rl_bem_* means the experiment in the BEM simulation, *rl_nominal_* indicates the experiment in the nominal simulation. *_ref.csv* is the reference trajectory, *_traj.csv* is the executed trajectory.
(5) RL_MarvTrack_6sDrone: This folder contains the best trajectory (best.csv) using the 6s drone in the Marv track for the competition.
(6) RL_SplitS_6sDrone: This folder contains the real-world experiment data using the 6s drone on the SplitS track. It has 5 different folders, each corresponding to a run with a different battery. The name of each folder indicates the experiment date and time. Each folder contains two files: the battery log (battery.csv) and the vehicle states (states.csv).
(7) README.md: a copy of the readme.md
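As an illustration, the snippet below shows one way to load these files with pandas. The paths are assumptions based on the folder layout described above (adjust them to wherever ScienceRLDataset.zip is extracted); the *env_id* split follows the PPO note in (1).

```python
import pandas as pd

# Assumed extraction root; adjust to wherever ScienceRLDataset.zip was unpacked.
ROOT = "ScienceRLDataset"

# MPC: one folder per trial, four CSV files per folder (see RLvsOC/MPC above).
trial = 0
mpc_ref_bem = pd.read_csv(f"{ROOT}/RLvsOC/MPC/trial_{trial}/ref_bem.csv")
mpc_traj_bem = pd.read_csv(f"{ROOT}/RLvsOC/MPC/trial_{trial}/traj_bem.csv")

# PPO: all 50 trials are stored in a single CSV; split them by the env_id column.
ppo_bem = pd.read_csv(f"{ROOT}/RLvsOC/PPO/bem_traj.csv")
ppo_trials = {env_id: group for env_id, group in ppo_bem.groupby("env_id")}

print(f"MPC trial {trial}: {len(mpc_traj_bem)} samples, "
      f"PPO: {len(ppo_trials)} trials")
```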
CSV headers
The column headers for the CSV files are explained as follows (a short usage sketch follows the list):
- t: Time in seconds (s); simulation timestamp.
- px, py, pz: Position coordinates in meters (m) in 3D space, representing the x, y, and z positions.
- qw, qx, qy, qz: Quaternion components (unitless). Quaternions represent rotations in 3D space with the four components w, x, y, and z.
- vx, vy, vz: Velocities in meters per second (m/s) along the x, y, and z directions.
- omex, omey, omez: Angular velocities in radians per second (rad/s) around the x, y, and z axes.
- accx, accy, accz: Accelerations in meters per second squared (m/s^2) along the x, y, and z directions.
- taux, tauy, tauz: Torques (or moments) in newton meters (N·m) around the x, y, and z axes.
- jerkx, jerky, jerkz: Jerk values in meters per second cubed (m/s^3), representing the rate of change of acceleration in the x, y, and z directions.
- snapx, snapy, snapz: Snap values in meters per second to the fourth power (m/s^4) for the x, y, and z directions.
- bomex, bomey, bomez: Placeholder only, not used; the values can be ignored.
- baccx, baccy, baccz: Placeholder only, not used; the values can be ignored.
- mot1, mot2, mot3, mot4: Placeholder only, not used; the values can be ignored.
- motdex1, motdex2, motdex3, motdex4: Placeholder only, not used; the values can be ignored.
- f1, f2, f3, f4: Forces in newtons (N) associated with each of the four motors.
- env_id: Environment ID for the parallelized simulation in Flightmare; no unit.
- done: A binary value indicating completion (0 - not done, 1 - done).
- flightmode: The mode of flight (0 - flying, 1 - passing the gate, 2 - crashing into the gate, 3 - crashing into the tube, 4 - crashing on the ground, 5 - crashing into the world box, 6 - episode done, 7 - inside the gate); no unit.
- flightlap: The lap number of the flight test.
- laptime: Time taken for a specific lap, in seconds (s).
- reward: The reinforcement learning reward for a particular action given a state; no unit.
- value: The output of the state value function; no unit.
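For example, the peak speed and peak acceleration of a run can be recovered directly from the velocity and acceleration columns. A minimal sketch, assuming a trajectory CSV with the headers above has been loaded into a pandas DataFrame:

```python
import numpy as np
import pandas as pd

# Any executed-trajectory CSV with the headers listed above.
df = pd.read_csv("states.csv")

# Per-sample speed and acceleration magnitudes from the per-axis columns.
speed = np.linalg.norm(df[["vx", "vy", "vz"]].to_numpy(), axis=1)        # m/s
accel = np.linalg.norm(df[["accx", "accy", "accz"]].to_numpy(), axis=1)  # m/s^2

print(f"peak speed: {speed.max():.1f} m/s ({speed.max() * 3.6:.0f} km/h)")
print(f"peak acceleration: {accel.max() / 9.81:.1f} g")
```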
Methods
This dataset contains data collected both in simulation and in the real world. In simulation, we ran experiments with three different approaches: trajectory tracking using model predictive control, path following using model predictive contouring control, and gate-progress maximization using reinforcement learning. We compared all methods in a nominal simulator and in a realistic simulator called BEM (blade element momentum). We developed a high-performance drone to push the limits of agile flight and collected trajectories flown with this drone in the real world. We compared our policy against three professional human pilots and collected the best trajectory from each of them.
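As an illustration of how such comparisons can be made from the released CSVs, the sketch below computes a positional tracking error between a reference and an executed trajectory by interpolating the reference onto the executed timestamps. The trial path is an arbitrary example, and monotonically increasing timestamps are assumed.

```python
import numpy as np
import pandas as pd

# Example: one MPC trial in the BEM simulation (paths relative to the dataset root).
ref = pd.read_csv("RLvsOC/MPC/trial_0/ref_bem.csv")
traj = pd.read_csv("RLvsOC/MPC/trial_0/traj_bem.csv")

# Interpolate the reference position onto the executed timestamps, then take
# the Euclidean error per sample.
err = np.column_stack([
    np.interp(traj["t"], ref["t"], ref[axis]) - traj[axis]
    for axis in ("px", "py", "pz")
])
dist = np.linalg.norm(err, axis=1)
print(f"mean position error: {dist.mean():.3f} m, max: {dist.max():.3f} m")
```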