OmniSafe Critic
Base Critic
Documentation
- class omnisafe.models.base.Critic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)
An abstract base class for critics.
A critic approximates the value function that maps observations to values. It is parameterized by a neural network that takes observations as input (a Q critic also takes actions) and outputs the corresponding value estimate.
Note
OmniSafe provides two types of critic: the Q critic (input = observation + action, output = value) and the V critic (input = observation, output = value). You can also implement your own critic by inheriting from this class.
Initialize the base critic.
- Parameters:
obs_space (OmnisafeSpace) – observation space.
act_space (OmnisafeSpace) – action space.
hidden_sizes (list) – hidden layer sizes.
activation (Activation, optional) – activation function. Defaults to 'relu'.
weight_initialization_mode (InitFunction, optional) – weight initialization mode. Defaults to 'kaiming_uniform'.
num_critics (int, optional) – number of critics. Defaults to 1.
use_obs_encoder (bool, optional) – whether to use an observation encoder. Defaults to False.
| CriticBuilder | Implementation of CriticBuilder. |
| QCritic | Implementation of QCritic. |
| VCritic | Implementation of VCritic. |
Critic Builder
Documentation
- class omnisafe.models.critic.CriticBuilder(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)
Implementation of CriticBuilder.
Note
A CriticBuilder is a class for building a critic network. In omnisafe, instead of constructing each critic network directly, we build it through the CriticBuilder, which integrates the various critic types. The advantage is that every critic type receives its parameters in a uniform way, which makes existing critics easy to use and new critic types easy to add.
Initialize CriticBuilder.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (List[int]) – Hidden sizes of the critic network.
activation (Activation) – Activation function.
weight_initialization_mode (InitFunction) – Weight initialization mode.
num_critics (int) – Number of critics.
use_obs_encoder (bool) – Whether to use an observation encoder; only used by the Q critic.
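The uniform parameter passing described in the note can be illustrated with a small, dependency-free sketch. Every name here (SketchCritic, SketchQCritic, SketchVCritic, SketchCriticBuilder, build) is hypothetical and chosen for illustration; this is not OmniSafe's implementation, and integer dimensions stand in for OmnisafeSpace objects.

```python
from dataclasses import dataclass
from typing import Dict, List, Type


@dataclass
class SketchCritic:
    """Hypothetical stand-in for a critic; it only stores its configuration."""

    obs_dim: int
    act_dim: int
    hidden_sizes: List[int]
    num_critics: int = 1

    def input_dim(self) -> int:
        raise NotImplementedError


class SketchQCritic(SketchCritic):
    def input_dim(self) -> int:
        # A Q critic consumes the observation and the action.
        return self.obs_dim + self.act_dim


class SketchVCritic(SketchCritic):
    def input_dim(self) -> int:
        # A V critic consumes the observation only.
        return self.obs_dim


class SketchCriticBuilder:
    """Stores the shared arguments once and passes them to any critic type."""

    _registry: Dict[str, Type[SketchCritic]] = {'q': SketchQCritic, 'v': SketchVCritic}

    def __init__(self, obs_dim: int, act_dim: int, hidden_sizes: List[int], num_critics: int = 1):
        self._kwargs = dict(
            obs_dim=obs_dim,
            act_dim=act_dim,
            hidden_sizes=hidden_sizes,
            num_critics=num_critics,
        )

    def build(self, critic_type: str) -> SketchCritic:
        # Every registered critic type receives exactly the same keyword arguments.
        if critic_type not in self._registry:
            raise NotImplementedError(f'unknown critic type: {critic_type!r}')
        return self._registry[critic_type](**self._kwargs)


builder = SketchCriticBuilder(obs_dim=4, act_dim=2, hidden_sizes=[64, 64])
q_critic = builder.build('q')
v_critic = builder.build('v')
print(q_critic.input_dim(), v_critic.input_dim())
```

Under this design, adding a new critic type only requires registering another class that accepts the same arguments.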
Q Critic
Documentation
- class omnisafe.models.critic.QCritic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)
Implementation of QCritic.
A Q-function approximator that uses a multi-layer perceptron (MLP) to map observation-action pairs to Q-values. This class is a subclass of Critic. You can design your own Q-function approximator by inheriting from this class or from Critic.
Initialize the critic network.
The Q critic network has two modes:
Hint
use_obs_encoder=False: the input of the network is the concatenation of the observation and the action.
use_obs_encoder=True: the input of the network is the concatenation of the observation encoder's output and the action.
For example, in DDPG, the action is not concatenated directly with the observation but with the output of the observation encoder.
Note
The Q critic network contains multiple critics, and its forward method returns a list of Q-values. To get the Q-value of a specific critic, index into that list.
- Parameters:
obs_space (OmnisafeSpace) – observation space.
act_space (OmnisafeSpace) – action space.
hidden_sizes (list) – list of hidden layer sizes.
activation (Activation) – activation function.
weight_initialization_mode (InitFunction) – weight initialization mode.
num_critics (int) – number of critics.
use_obs_encoder (bool) – whether to use an observation encoder.
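The two input modes described for the Q critic can be sketched in plain PyTorch. This is a minimal illustration, not OmniSafe's code: the layer sizes and the names plain_q, obs_encoder, and head are assumptions made for the example.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, hidden = 8, 2, 64  # illustrative sizes, not library defaults
obs = torch.randn(5, obs_dim)  # batch of 5 observations
act = torch.randn(5, act_dim)  # batch of 5 actions

# Mode 1 (use_obs_encoder=False): concatenate obs and act, then apply the MLP.
plain_q = nn.Sequential(
    nn.Linear(obs_dim + act_dim, hidden),
    nn.ReLU(),
    nn.Linear(hidden, hidden),
    nn.ReLU(),
    nn.Linear(hidden, 1),
)
q_plain = plain_q(torch.cat([obs, act], dim=-1)).squeeze(-1)

# Mode 2 (use_obs_encoder=True): encode obs first, then concatenate the
# encoding with act and apply the remaining layers.
obs_encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
head = nn.Sequential(
    nn.Linear(hidden + act_dim, hidden),
    nn.ReLU(),
    nn.Linear(hidden, 1),
)
q_encoded = head(torch.cat([obs_encoder(obs), act], dim=-1)).squeeze(-1)

print(q_plain.shape, q_encoded.shape)
```

In both modes the result is one Q-value per observation-action pair; the modes differ only in where the action enters the network.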
- forward(obs, act)
Forward function.
As a multi-critic network, it returns a list of Q-values. To use it as a single-critic network, set the num_critics parameter to 1 when initializing the network, then take index 0 of the returned list to get the Q-value.
- Parameters:
obs (torch.Tensor) – Observation.
act (torch.Tensor) – Action.
- Return type:
List[Tensor]
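The list-valued return of forward can be illustrated with a hypothetical multi-critic module. MultiQ below is a sketch written for this example, not OmniSafe's QCritic; it takes integer dimensions instead of OmnisafeSpace objects and omits the observation encoder.

```python
from typing import List

import torch
import torch.nn as nn


class MultiQ(nn.Module):
    """Hypothetical multi-critic Q-network returning one tensor per critic."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_sizes: List[int], num_critics: int = 1):
        super().__init__()
        self.nets = nn.ModuleList()
        for _ in range(num_critics):
            sizes = [obs_dim + act_dim, *hidden_sizes]
            layers: List[nn.Module] = []
            for in_f, out_f in zip(sizes[:-1], sizes[1:]):
                layers += [nn.Linear(in_f, out_f), nn.ReLU()]
            layers.append(nn.Linear(sizes[-1], 1))
            self.nets.append(nn.Sequential(*layers))

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> List[torch.Tensor]:
        x = torch.cat([obs, act], dim=-1)
        # One Q-value tensor per critic, each of shape (batch,).
        return [net(x).squeeze(-1) for net in self.nets]


obs, act = torch.randn(5, 8), torch.randn(5, 2)

twin = MultiQ(obs_dim=8, act_dim=2, hidden_sizes=[64, 64], num_critics=2)
q_values = twin(obs, act)  # list with two tensors

single = MultiQ(obs_dim=8, act_dim=2, hidden_sizes=[64, 64], num_critics=1)
q = single(obs, act)[0]  # index 0 recovers the single Q-value tensor

print(len(q_values), q.shape)
```

Returning a list even when num_critics is 1 keeps the calling code identical for single-critic and multi-critic setups.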
V Critic
Documentation
- class omnisafe.models.critic.VCritic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1)
Implementation of VCritic.
A V-function approximator that uses a multi-layer perceptron (MLP) to map observations to V-values. This class is a subclass of Critic. You can design your own V-function approximator by inheriting from this class or from Critic.
Initialize the critic network.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list) – Hidden layer sizes.
activation (Activation) – Activation function.
weight_initialization_mode (InitFunction) – Weight initialization mode.
num_critics (int) – Number of critics.
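The observation-to-value mapping of a V critic can be sketched as a small MLP in plain PyTorch. SketchVCritic is a hypothetical name for illustration; the sketch uses an integer observation dimension instead of an OmnisafeSpace and does not model num_critics, so it is not a substitute for VCritic.

```python
from typing import List

import torch
import torch.nn as nn


class SketchVCritic(nn.Module):
    """Hypothetical single-network V critic: observation in, scalar value out."""

    def __init__(self, obs_dim: int, hidden_sizes: List[int]):
        super().__init__()
        sizes = [obs_dim, *hidden_sizes]
        layers: List[nn.Module] = []
        for in_f, out_f in zip(sizes[:-1], sizes[1:]):
            layers += [nn.Linear(in_f, out_f), nn.ReLU()]
        layers.append(nn.Linear(sizes[-1], 1))
        self.net = nn.Sequential(*layers)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Drop the trailing singleton dimension so the output has shape (batch,).
        return self.net(obs).squeeze(-1)


v_critic = SketchVCritic(obs_dim=8, hidden_sizes=[64, 64])
obs = torch.randn(5, 8)  # batch of 5 observations
values = v_critic(obs)
print(values.shape)
```

Unlike the Q critic, no action is passed to the network: the value depends on the observation alone.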