
A Simple Model-Free Open-Loop Baseline for Reinforcement Learning Locomotion Tasks that Does Not Require Sophisticated Models or Computational Resources

Deep Reinforcement Learning (DRL) is advancing robotic control capabilities, albeit with a rising trend of algorithm complexity. This complexity leads to challenging implementation details that hurt the reproducibility of sophisticated algorithms, and it motivates the search for simpler approaches that are less computationally demanding.

A team of international researchers from the German Aerospace Center (DLR) RMC, Sorbonne Université CNRS, and TU Delft CoR has proposed an open-loop, model-free baseline for standard locomotion tasks. The method responds to this complexity issue by providing a simpler alternative that does not demand excessive computational resources. Although unable to outperform RL algorithms in simulation, it offers several advantages for real-world applications, including faster computation, ease of deployment on embedded systems, smooth control outputs, and robustness against sensor noise.
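To make the idea concrete, an open-loop controller of this kind reduces to a handful of fixed oscillator parameters per joint: each joint receives a time-dependent sinusoidal position command that ignores the robot's state entirely. The sketch below is illustrative only; the parameter names and values are assumptions made here, not those of the paper.

    import numpy as np

    def open_loop_action(t, params):
        """Compute desired joint positions from fixed oscillator parameters.

        params: list of (amplitude, frequency_hz, phase, offset) per joint.
        No robot state is used: the command depends only on time t.
        """
        return np.array([
            a * np.sin(2.0 * np.pi * f * t + phi) + b
            for (a, f, phi, b) in params
        ])

    # Illustrative parameters for a 2-joint system (not the paper's values).
    params = [(0.5, 1.5, 0.0, 0.0), (0.5, 1.5, np.pi, 0.0)]
    q_des = open_loop_action(t=0.1, params=params)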

For the RL baselines, the team employed the JAX implementations of Stable-Baselines3 together with the RL Zoo training framework. They optimized the oscillators' parameters over a defined search space and evaluated the new method on MuJoCo v4 locomotion tasks included in the Gymnasium v0.29.1 library.
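The following sketch shows how such a time-only controller might be evaluated on a MuJoCo v4 task from Gymnasium. The environment choice, oscillator settings, and the direct use of the oscillator output as the action are simplifying assumptions made here for illustration; the paper instead converts desired joint positions into torques with a PD controller, discussed further below.

    import gymnasium as gym
    import numpy as np

    env = gym.make("Swimmer-v4")  # a MuJoCo v4 task from Gymnasium
    obs, info = env.reset(seed=0)

    dt = env.unwrapped.dt  # simulated time elapsed per environment step
    t, total_reward = 0.0, 0.0
    for _ in range(1000):
        # Open-loop command: a function of time only, ignoring obs.
        action = np.clip(
            np.sin(2.0 * np.pi * 1.0 * t + np.array([0.0, np.pi])),
            env.action_space.low, env.action_space.high)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        t += dt
        if terminated or truncated:
            break
    env.close()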

To compare against existing deep RL algorithms, the new method was pitted against Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), and Soft Actor-Critic (SAC). The team used the original hyperparameter settings for these methods, except for the swimmer task, where the discount factor was tuned.
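For reference, a deep RL baseline such as PPO can be trained through the JAX implementation of Stable-Baselines3 (the SBX package) in a few lines. The environment, timestep budget, and discount factor shown here are illustrative assumptions rather than the authors' exact settings.

    import gymnasium as gym
    from sbx import PPO  # JAX implementation of Stable-Baselines3 algorithms

    env = gym.make("Swimmer-v4")
    # The swimmer task is the one case where the discount factor was tuned;
    # gamma=0.9999 is shown here as an illustrative assumption.
    model = PPO("MlpPolicy", env, gamma=0.9999, verbose=1)
    model.learn(total_timesteps=1_000_000)
    model.save("ppo_swimmer")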

Overall, this research aims to challenge the limitations of DRL for robotic applications, driving discussion on the costs of complexity and generality. It explores how open-loop oscillators compare to DRL methods in terms of performance, runtime, parameter efficiency, and resilience, and investigates how learned policies transfer to a real robot without further tuning or reward engineering.

The paper states that the open-loop, model-free baseline performs well on locomotion tasks without requiring complex models or computational resources. However, while the comparative results show that DRL struggles under sensor noise and sensor failure, the open-loop baseline has its own limitations, including sensitivity to disturbances and an inability to recover from falls. Moreover, because the method generates desired joint positions without using the robot's state, it relies on a PD controller to convert those positions into torque commands in simulation.
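A PD controller of the kind described above can be sketched as follows; the gain values are illustrative assumptions rather than the paper's settings.

    import numpy as np

    def pd_torque(q_des, q, q_dot, kp=10.0, kd=0.5):
        """Convert desired joint positions into torque commands.

        q_des : desired joint positions from the open-loop oscillators
        q     : measured joint positions
        q_dot : measured joint velocities
        """
        return kp * (q_des - q) - kd * q_dot

    tau = pd_torque(q_des=np.array([0.3, -0.1]),
                    q=np.array([0.25, 0.0]),
                    q_dot=np.array([0.1, -0.2]))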
