OmniSafe Critic
Base Critic
Documentation
- class omnisafe.models.base.Critic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)
An abstract base class for critics.
A critic approximates the value function, which maps observations to values. It is parameterized by a neural network that takes observations as input (a Q critic also takes actions as input) and outputs the corresponding value estimate.
Note
OmniSafe provides two types of critic: the Q critic (input = observation + action, output = value) and the V critic (input = observation, output = value). You can also implement your own critic by inheriting from this class.
Initialize the base critic.
- Parameters:
obs_space (OmnisafeSpace) – observation space.
act_space (OmnisafeSpace) – action space.
hidden_sizes (list) – hidden layer sizes.
activation (Activation, optional) – activation function. Defaults to ‘relu’.
weight_initialization_mode (InitFunction, optional) – weight initialization mode. Defaults to ‘kaiming_uniform’.
shared (nn.Module, optional) – shared module. Defaults to None.
| CriticBuilder | Implementation of CriticBuilder. |
| QCritic | Implementation of QCritic. |
| VCritic | Implementation of VCritic. |
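The input/output contract of the two critic types described in the note above can be illustrated with a minimal sketch. It is written in plain Python without PyTorch so that it runs anywhere; the names `ToyCritic`, `ToyVCritic`, and `ToyQCritic` and the fixed weights are hypothetical and are not part of OmniSafe, where a critic would be a neural network.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class ToyCritic(ABC):
    """Hypothetical stand-in for an abstract base critic (not OmniSafe's class)."""

    def __init__(self, obs_dim: int, act_dim: int) -> None:
        self.obs_dim = obs_dim
        self.act_dim = act_dim

    @abstractmethod
    def forward(self, *inputs: Sequence[float]) -> float:
        """Map the critic's inputs to a scalar value estimate."""


class ToyVCritic(ToyCritic):
    """V critic: input = observation, output = value."""

    def __init__(self, obs_dim: int, act_dim: int) -> None:
        super().__init__(obs_dim, act_dim)
        # One fixed weight per observation feature; a real critic learns these.
        self.weights: List[float] = [0.5] * obs_dim

    def forward(self, obs: Sequence[float]) -> float:
        assert len(obs) == self.obs_dim
        return sum(w * o for w, o in zip(self.weights, obs))


class ToyQCritic(ToyCritic):
    """Q critic: input = observation + action, output = value."""

    def __init__(self, obs_dim: int, act_dim: int) -> None:
        super().__init__(obs_dim, act_dim)
        # The Q critic sees the observation and the action together.
        self.weights: List[float] = [0.5] * (obs_dim + act_dim)

    def forward(self, obs: Sequence[float], act: Sequence[float]) -> float:
        features = list(obs) + list(act)
        assert len(features) == self.obs_dim + self.act_dim
        return sum(w * f for w, f in zip(self.weights, features))
```

The only difference between the two subclasses is what `forward` receives: the V critic takes an observation alone, while the Q critic takes an observation and an action.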
Critic Builder
Documentation
- class omnisafe.models.critic.CriticBuilder(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)
Implementation of CriticBuilder.
Note
A CriticBuilder is a class for building a critic network. In omnisafe, critic networks are not constructed directly; instead, the various critic types are integrated into the CriticBuilder. The advantage is that every critic type receives its parameters in a uniform way, which makes existing critics easy to use and new critic types easy to add.
Initialize CriticBuilder.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (List[int]) – Hidden sizes of the critic network.
activation (Activation) – Activation function.
weight_initialization_mode (InitFunction) – Weight initialization mode.
num_critics (int) – Number of critics.
use_obs_encoder (bool) – Whether to use an observation encoder. Only used by the Q critic.
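The idea of passing parameters once and then building any critic type from them can be sketched as follows. This is a toy builder, not OmniSafe's implementation: the class names, the `build_critic` method, and the `"q"` / `"v"` type strings are assumptions made for illustration, and the builder returns a plain record rather than a network.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyCriticSpec:
    """Hypothetical record describing the critic that was requested."""

    kind: str
    hidden_sizes: List[int]
    activation: str
    num_critics: int
    use_obs_encoder: bool


class ToyCriticBuilder:
    """Stores the shared arguments once, then builds any critic type from them."""

    def __init__(
        self,
        hidden_sizes: List[int],
        activation: str = "relu",
        num_critics: int = 1,
        use_obs_encoder: bool = False,
    ) -> None:
        self.hidden_sizes = list(hidden_sizes)
        self.activation = activation
        self.num_critics = num_critics
        self.use_obs_encoder = use_obs_encoder

    def build_critic(self, critic_type: str) -> ToyCriticSpec:
        """Dispatch on the critic type; the method name is an assumption."""
        if critic_type == "q":
            return ToyCriticSpec(
                "q", list(self.hidden_sizes), self.activation,
                self.num_critics, self.use_obs_encoder,
            )
        if critic_type == "v":
            # The observation encoder is documented as Q-critic-only,
            # so the toy V critic always ignores it.
            return ToyCriticSpec(
                "v", list(self.hidden_sizes), self.activation,
                self.num_critics, False,
            )
        raise NotImplementedError(f"unknown critic type: {critic_type!r}")
```

With this layout, supporting a new critic type means adding one branch to `build_critic`, while callers keep passing the same constructor arguments.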
Q Critic
Documentation
- class omnisafe.models.critic.QCritic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)
Implementation of QCritic.
A Q-function approximator that uses a multi-layer perceptron (MLP) to map observation-action pairs to Q-values. This class inherits from Critic. You can design your own Q-function approximator by inheriting from this class or from Critic.
Initialize the critic network.
The Q critic network has two modes:
Hint
- use_obs_encoder=False: the input of the network is the concatenation of the observation and the action.
- use_obs_encoder=True: the input of the network is the concatenation of the output of the observation encoder and the action.
For example, in DDPG, the action is not concatenated directly with the observation, but with the output of the observation encoder.
Note
The Q critic network contains multiple critics, and the output of forward() is a list of Q-values. To get the Q-value of a specific critic, index into the list.
- Parameters:
obs_space (OmnisafeSpace) – observation space.
act_space (OmnisafeSpace) – action space.
hidden_sizes (list) – list of hidden layer sizes.
activation (Activation) – activation function.
weight_initialization_mode (InitFunction) – weight initialization mode.
shared (nn.Module) – shared network.
num_critics (int) – number of critics.
use_obs_encoder (bool) – whether to use observation encoder.
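The two input modes selected by use_obs_encoder can be sketched with plain Python lists standing in for tensors. The helper `q_net_input` and the `toy_encoder` function are hypothetical; how OmniSafe actually builds and sizes its observation encoder is not stated here, so the encoder below is only a placeholder.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical encoder type: maps an observation to a feature vector.
Encoder = Callable[[Sequence[float]], List[float]]


def q_net_input(
    obs: Sequence[float],
    act: Sequence[float],
    obs_encoder: Optional[Encoder] = None,
) -> List[float]:
    """Assemble the Q-network input for either mode.

    obs_encoder=None mimics use_obs_encoder=False: concatenate obs and act.
    Otherwise mimic use_obs_encoder=True: concatenate encoder(obs) and act.
    """
    features = list(obs) if obs_encoder is None else list(obs_encoder(obs))
    return features + list(act)


def toy_encoder(obs: Sequence[float]) -> List[float]:
    """Placeholder encoder producing two summary features (an assumption)."""
    return [sum(obs), max(obs)]


obs, act = [1.0, 2.0, 3.0], [0.5]
direct_input = q_net_input(obs, act)                 # observation + action
encoded_input = q_net_input(obs, act, toy_encoder)   # encoder(observation) + action
```

In the first mode the network input has length `len(obs) + len(act)`; in the second it has the encoder's output length plus `len(act)`, so the action always enters the network unencoded.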
- forward(obs, act)
Forward function.
As a multi-critic network, the output is a list of Q-values. To use it as a single-critic network, set the num_critics parameter to 1 when initializing the network and use index 0 to get the Q-value.
- Parameters:
obs (torch.Tensor) – Observation.
act (torch.Tensor) – Action.
- Return type:
List[Tensor]
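Because forward returns one Q-value per critic, callers index into the result. The sketch below shows that pattern with a toy multi-critic whose "networks" are ordinary Python functions; `ToyMultiQCritic` is hypothetical, and taking the minimum over critics is a common pattern in twin-critic algorithms such as TD3, not something this page prescribes.

```python
from typing import Callable, List, Sequence

# Hypothetical Q-function type: (observation, action) -> scalar Q-value.
QFunction = Callable[[Sequence[float], Sequence[float]], float]


class ToyMultiQCritic:
    """Holds several Q-functions and evaluates all of them in forward()."""

    def __init__(self, q_functions: List[QFunction]) -> None:
        self.q_functions = q_functions

    def forward(self, obs: Sequence[float], act: Sequence[float]) -> List[float]:
        # One entry per critic, in the order the critics were given.
        return [q(obs, act) for q in self.q_functions]


# Two critics (analogous to num_critics=2).
twin = ToyMultiQCritic([
    lambda obs, act: sum(obs) + sum(act),
    lambda obs, act: sum(obs) - sum(act),
])
q_values = twin.forward([1.0, 2.0], [0.5])
first_q = q_values[0]            # Q-value of the first critic
pessimistic_q = min(q_values)    # smallest estimate across critics

# One critic (analogous to num_critics=1): the result is still a list.
single = ToyMultiQCritic([lambda obs, act: sum(obs) + sum(act)])
single_q = single.forward([1.0, 2.0], [0.5])[0]
```

Returning a list even in the single-critic case keeps calling code identical regardless of how many critics are configured.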
V Critic
Documentation
- class omnisafe.models.critic.VCritic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1)
Implementation of VCritic.
A V-function approximator that uses a multi-layer perceptron (MLP) to map observations to V-values. This class inherits from Critic. You can design your own V-function approximator by inheriting from this class or from Critic.
Initialize the critic network.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list) – Hidden layer sizes.
activation (Activation) – Activation function.
weight_initialization_mode (InitFunction) – Weight initialization mode.
shared (nn.Module) – Shared network.