Introduction
Welcome To OmniSafe Tutorial
Welcome to OmniSafe! OmniSafe is a comprehensive and reliable benchmark for safe reinforcement learning (Safe RL). It encompasses more than 20 algorithms covering a multitude of Safe RL domains and delivers a new suite of testing environments.
Hint
For beginners, we first introduce Safe RL (Safe Reinforcement Learning). Safe RL can be defined as the process of learning an agent that maximizes the expected return while ensuring reasonable system performance and respecting safety constraints during both learning and deployment.
This tutorial is useful for reinforcement learning learners at many levels.
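As a rough illustration, this definition is commonly formalized as a constrained optimization problem over policies. The notation here is an assumption made for this sketch and may differ from the notation used later in the tutorial: $\pi$ is the policy, $r$ and $c$ are the per-step reward and cost, $\gamma$ is the discount factor, and $d$ is the safety threshold.

```latex
\max_{\pi} \; J^{R}(\pi) = \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} \, r(s_t, a_t) \right]
\quad \text{s.t.} \quad
J^{C}(\pi) = \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} \, c(s_t, a_t) \right] \le d
```

In words: the agent maximizes its expected discounted reward, subject to its expected discounted cost staying at or below the threshold $d$.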
For Beginners
If you are a beginner in machine learning with only some simple knowledge of linear algebra and probability theory, you can start with the mathematical fundamentals section of this tutorial.
For Average Users
If you have a general understanding of RL algorithms but are unfamiliar with the concept of Safe RL, this tutorial provides an introduction to it so you can get started quickly.
For Experts
If you are already an expert in the field of RL, you can also gain new insights from our systematic introduction to Safe RL algorithms. This tutorial will also help you quickly design your own algorithms using OmniSafe.
Why We Built This
In recent years, RL (Reinforcement Learning) algorithms, especially Deep RL algorithms, have achieved good performance in many tasks. Examples include achieving high scores on Atari games with only visual input, completing complex control tasks in high dimensions, and beating human grandmasters at Go. However, during policy updates, RL agents often learn cheating or even dangerous behaviors to improve their performance. An agent that achieves high scores in this way is not the result we want. Safe RL algorithms are therefore dedicated to the problem of training an agent that achieves the desired training goal while not violating constraints.
However
Even experienced RL researchers have difficulty understanding Safe RL algorithms in a short time and quickly programming their implementations.
Therefore, OmniSafe aims to facilitate the subsequent study of Safe RL by providing both a detailed, systematic introduction to the algorithms and streamlined, robust code.
Puzzling Math
Safe RL algorithms are a class of algorithms built on a rigorous mathematical system. These algorithms have a detailed theoretical derivation, but they lack a unified symbolic system, which makes it difficult for beginners to learn them systematically and comprehensively.
Hard-to-find Code
Most existing Safe RL algorithms do not have open-source code. This makes it difficult for beginners to grasp the ideas of the algorithms at the code level, and researchers suffer from incorrect implementations, unfair comparisons, and misleading conclusions.
Friendly Math
The OmniSafe tutorial provides a unified and standardized notation system that allows beginners to learn the theory of Safe RL algorithms in a complete and systematic way.
Robust Code
The OmniSafe tutorial includes a code-level walkthrough for each algorithm. This helps learners who are new to Safe RL theory relate algorithmic ideas to code, and gives experts in the field of Safe RL new insights into algorithm implementation.
Code Design Principles
Consistent and Inherited
Our code has a complete logic system that allows you to understand the connections between algorithms, as well as their similarities and differences. For example, if you understand the Policy Gradient algorithm, you can learn the PPO algorithm by simply reading one new function and immediately grasp its code implementation.
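The inheritance idea can be illustrated with a minimal sketch. The class and method names below (`PolicyGradient`, `PPO`, `compute_loss`, `update`) are hypothetical and chosen only for illustration; they are not claimed to match OmniSafe's actual classes or signatures. The point is that the subclass reuses the shared update logic and overrides a single function.

```python
import math


class PolicyGradient:
    """Hypothetical base algorithm with a vanilla policy-gradient loss."""

    def compute_loss(self, log_prob, old_log_prob, advantage):
        # Vanilla policy gradient: the loss is the negative of the
        # advantage-weighted log-probability. old_log_prob is unused here.
        return -log_prob * advantage

    def update(self, batch):
        # Shared logic: average the per-sample loss over the batch.
        losses = [self.compute_loss(lp, old_lp, adv) for lp, old_lp, adv in batch]
        return sum(losses) / len(losses)


class PPO(PolicyGradient):
    """Hypothetical subclass: only the loss function is new."""

    def __init__(self, clip=0.2):
        self.clip = clip

    def compute_loss(self, log_prob, old_log_prob, advantage):
        # PPO clipped surrogate: bound the probability ratio to
        # [1 - clip, 1 + clip] and take the pessimistic (smaller) objective.
        ratio = math.exp(log_prob - old_log_prob)
        clipped = max(min(ratio, 1.0 + self.clip), 1.0 - self.clip)
        return -min(ratio * advantage, clipped * advantage)


# Each sample is (log_prob, old_log_prob, advantage); the values are made up.
batch = [(-1.0, -1.2, 2.0), (-0.5, -0.4, -1.0)]
pg_loss = PolicyGradient().update(batch)
ppo_loss = PPO(clip=0.2).update(batch)
```

Here `PPO.update` is inherited unchanged from `PolicyGradient`, so a reader who already knows the base class only needs to read `PPO.compute_loss`.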
Robust and Readable
Our code can serve as both a tutorial and a tool. If you are not yet familiar with how Safe RL algorithms are implemented, the highly readable code in OmniSafe can help you get started quickly, and you can see how each algorithm performs. If you want to build your own algorithms, OmniSafe’s robust code can also be an excellent tool!
Long-lived
Unlike other code that relies on a large number of external libraries, OmniSafe minimizes its dependency on third-party libraries. This avoids shortening the life of the project due to iterative changes in third-party library code. It also improves the experience of installing and using OmniSafe, because users do not have to install many dependencies to run it.
Before Reading
Before you start having fun reading the OmniSafe tutorial, we want you to understand the usage of colors in this tutorial. In this tutorial, in general, the light blue boxes indicate mathematically relevant derivations, including but not limited to Theorem, Lemma, Proposition, Corollary, and their proofs, while the green boxes indicate specific implementations, both theoretical and code-based. We give an example below:
You may not yet understand the above theory or the specific meaning of the code, but do not worry: we will introduce both in detail later in the Constrained Policy Optimization tutorial.
Long-Term Support and Support History
OmniSafe is mainly developed by the SafeRL research team directed by Prof. Yaodong Yang. Our SafeRL research team members include Borong Zhang, Jiayi Zhou, JTao Dai, Weidong Huang, Ruiyang Sun, Xuehai Pan and Jiamg Ji. If you have any questions while using OmniSafe, or if you would like to contribute to this project, don’t hesitate to ask on the GitHub issue page; we will reply within 2-3 working days.