# Introduction

## Welcome To OmniSafe Tutorial

Welcome to the OmniSafe tutorial for Safe RL! OmniSafe is an infrastructural framework designed to accelerate safe reinforcement learning (RL) research. It provides a comprehensive and reliable benchmark for safe RL algorithms, as well as an out-of-the-box modular toolkit for researchers. Safe RL aims to develop algorithms that minimize the risk of unintended harm or unsafe behavior.

Hint

**Safe Reinforcement Learning** can be defined as the process of learning policies that maximize the expectation of the return in problems
in which it is important to ensure reasonable system performance and/or respect safety constraints during the learning and/or deployment processes.

**This tutorial is useful for RL learners of roughly three levels.**

For Beginners

If you have only basic knowledge of linear algebra and probability theory and are new to machine learning, we recommend starting with the mathematical fundamentals section of this tutorial.

For Average Users

If you have a general understanding of RL algorithms but need to familiarize yourself with Safe RL, this tutorial introduces some classic Safe RL algorithms to you so you can get started quickly.

For Experts

If you are already an expert in the field of RL, our tutorial can still offer you new insights with its systematic introduction to Safe RL algorithms. Furthermore, it will enable you to quickly design your own algorithms.

## Why We Built This

In recent years, RL algorithms, especially deep RL algorithms, have demonstrated remarkable performance across a variety of tasks. Notable examples include:

Hint

Achieving high scores on Atari using only visual input.

Completing complex control tasks in high dimensions.

Beating human grandmasters at Go tournaments.

However, during policy updates, RL agents sometimes learn to **engage in
cheating or even dangerous behaviors** in order to improve their performance.
While such agents may achieve high scores rapidly, their behavior may not
align with the desired outcome.

Safe RL algorithms therefore aim to train agents that achieve their goals while adhering to safety constraints, addressing the challenge of keeping agents safe throughout the policy-update process. Their primary objective is to ensure that agents avoid behaviors that could lead to negative consequences or violate predefined constraints.
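This constrained objective is commonly formalized as a constrained Markov decision process (CMDP). A standard formulation (the exact notation may differ slightly from later chapters) maximizes the expected discounted reward subject to bounds on the expected discounted costs:

```latex
\max_{\pi \in \Pi} \; J^{R}(\pi)
  = \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\right]
\quad \text{s.t.} \quad
J^{C_i}(\pi)
  = \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} C_i(s_t, a_t)\right]
  \le d_i, \quad i = 1, \dots, m
```

Here \(R\) is the reward function, each \(C_i\) is a cost function encoding a safety requirement, and each \(d_i\) is its allowed budget.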

However

Even experienced researchers in RL may face challenges when it comes to quickly grasping the intricacies of Safe RL algorithms and efficiently programming their implementations.

To address this issue, OmniSafe aims to provide a comprehensive and systematic introduction to Safe RL algorithms, along with streamlined and robust code, making it easier for researchers to delve into Safe RL.

Puzzling Math

Safe RL algorithms are rigorously grounded in a strong mathematical foundation. Although these algorithms come with detailed theoretical derivations, the lack of a unified symbolic system can make systematic, comprehensive learning challenging for beginners.

Hard-to-find Codes

Most of the existing Safe RL algorithms do not have **open-source**
code available, which makes it difficult for beginners to
understand the algorithms at the code level. Furthermore,
researchers may encounter issues such as incorrect implementations,
unfair comparisons, and misleading conclusions, which could have
been avoided with open-source code.

Friendly Math

The OmniSafe tutorial offers a **standardized notation system** that
enables beginners to acquire a complete and systematic
understanding of the theory behind Safe RL algorithms.

Robust Code

The OmniSafe tutorial provides **a comprehensive introduction** to
each algorithm, including a detailed explanation of the code
implementation. Beginners can easily understand the connection
between the algorithmic concepts and the code, while experts can
gain valuable insights into Safe RL by studying the code-level
details of each algorithm.

## Code Design Principles

Consistent and Inherited

Our code follows a comprehensive and logical system, enabling users to understand the interconnection between algorithms. For instance, a user who understands the Policy Gradient algorithm can quickly grasp the code implementation of the PPO algorithm by reading only the newly added functions.
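As an illustrative sketch of this principle (the class and method names here are hypothetical, not OmniSafe's actual API), PPO can be expressed as a subclass of Policy Gradient that overrides a single function:

```python
# Sketch of the "consistent and inherited" design principle.
# Class and method names are hypothetical, not OmniSafe's actual API.


class PolicyGradient:
    """Base algorithm: vanilla policy-gradient surrogate loss."""

    def surrogate_loss(self, ratio: float, advantage: float) -> float:
        # REINFORCE-style objective: maximize ratio * advantage,
        # so the loss is its negation.
        return -(ratio * advantage)


class PPO(PolicyGradient):
    """PPO inherits everything and overrides only the surrogate loss."""

    def __init__(self, clip: float = 0.2) -> None:
        self.clip = clip

    def surrogate_loss(self, ratio: float, advantage: float) -> float:
        # Clipped surrogate: take the pessimistic (smaller) of the
        # unclipped and clipped objectives, then negate for a loss.
        clipped_ratio = max(min(ratio, 1.0 + self.clip), 1.0 - self.clip)
        return -min(ratio * advantage, clipped_ratio * advantage)
```

A reader who already understands `PolicyGradient` needs to study only the new `surrogate_loss` to understand `PPO`; the rest of the training loop is shared.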

Robust and Readable

Our code not only serves as a tutorial but also as a practical tool.
For those who want to learn about the implementation of Safe RL
algorithms, the highly readable code in OmniSafe provides an easy and
quick way to get started. For those who
want to develop their algorithms, OmniSafe’s **highly modular and
reusable** code can be an excellent resource.

Long-lived

Unlike projects that rely heavily on external libraries, OmniSafe minimizes its dependence on third-party code. This design prevents the project from becoming obsolete when third-party libraries change, and improves the user experience by reducing the number of dependencies that must be installed to run OmniSafe.

## Before Reading

Before you start reading the OmniSafe tutorial, we want you to understand the usage of colors in this tutorial.

In this tutorial, the light blue boxes generally indicate mathematically relevant derivations, including but not limited to Theorems, Lemmas, Propositions, Corollaries, and their proofs, while the green boxes indicate implementation details, both theoretical and code-based. We give an example below:
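As a preview of the kind of derivation a light blue box might contain (written here in standard CPO-style notation rather than reproduced from a specific box), the Constrained Policy Optimization update solves a constrained trust-region problem:

```latex
\pi_{k+1} = \arg\max_{\pi \in \Pi_\theta}
  \; \mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi}\!\left[A^{R}_{\pi_k}(s, a)\right]
\quad \text{s.t.} \quad
J^{C}(\pi_k) + \frac{1}{1-\gamma}\,
  \mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi}\!\left[A^{C}_{\pi_k}(s, a)\right] \le d,
\qquad
\bar{D}_{\mathrm{KL}}(\pi \,\|\, \pi_k) \le \delta
```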

You may not yet understand the above theory or the specific meaning of the code, but do not worry: we give a detailed introduction later in the Constrained Policy Optimization tutorial.

## Citing OmniSafe

If you find OmniSafe useful or use OmniSafe in your research, please cite it in your publications.

```
@article{omnisafe,
  title   = {OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research},
  author  = {Jiaming Ji and Jiayi Zhou and Borong Zhang and Juntao Dai and Xuehai Pan and Ruiyang Sun and Weidong Huang and Yiran Geng and Mickel Liu and Yaodong Yang},
  journal = {arXiv preprint arXiv:2305.09304},
  year    = {2023}
}
```

## Long-Term Support and Support History

**OmniSafe** is mainly developed by the Safe RL research team directed by Prof. Yaodong Yang.
Our Safe RL research team members include Borong Zhang, Jiayi Zhou, Juntao Dai, Weidong Huang, Ruiyang Sun, Xuehai Pan, and Jiaming Ji.
If you have any questions while using OmniSafe, or if you would like to contribute to
this project, don't hesitate to open an issue on the GitHub issue page; we will reply within 2-3 working days.