OmniSafe Critic#

Base Critic#

Documentation

class omnisafe.models.base.Critic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)

An abstract base class for critics.

A critic approximates the value function, which maps observations to values. It is parameterized by a neural network that takes observations as input (a Q critic also takes actions as input) and outputs the estimated value.

Note

OmniSafe provides two types of critic: the Q critic (input = observation + action, output = value) and the V critic (input = observation, output = value). You can also implement your own critic by inheriting from this class.
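The difference between the two critic types is easiest to see in the forward signatures. The following is a minimal, framework-free sketch of that interface, not OmniSafe's implementation: the names SketchCritic, SketchVCritic and SketchQCritic are hypothetical, plain Python lists stand in for tensors, and the "network" is a placeholder sum.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class SketchCritic(ABC):
    """Hypothetical stand-in for an abstract critic base class."""

    def __init__(self, obs_dim: int, act_dim: int, num_critics: int = 1) -> None:
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        self.num_critics = num_critics

    @abstractmethod
    def forward(self, *inputs: Sequence[float]) -> List[float]:
        """Return one value estimate per critic head."""


class SketchVCritic(SketchCritic):
    """V critic: input = observation, output = value."""

    def forward(self, obs: Sequence[float]) -> List[float]:
        # Placeholder "network": the value is just the sum of the observation.
        return [sum(obs) for _ in range(self.num_critics)]


class SketchQCritic(SketchCritic):
    """Q critic: input = observation + action, output = value."""

    def forward(self, obs: Sequence[float], act: Sequence[float]) -> List[float]:
        # Placeholder "network": the value depends on both observation and action.
        return [sum(obs) + sum(act) for _ in range(self.num_critics)]


v_values = SketchVCritic(obs_dim=3, act_dim=2).forward([1.0, 2.0, 3.0])
q_values = SketchQCritic(obs_dim=3, act_dim=2, num_critics=2).forward(
    [1.0, 2.0, 3.0], [0.5, 0.5]
)
```

A custom critic would follow the same pattern: subclass the base class and implement forward with the inputs that critic needs.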

Initialize the base critic.

Parameters:
  • obs_space (OmnisafeSpace) – observation space.

  • act_space (OmnisafeSpace) – action space.

  • hidden_sizes (list) – hidden layer sizes.

  • activation (Activation, optional) – activation function. Defaults to 'relu'.

  • weight_initialization_mode (InitFunction, optional) – weight initialization mode. Defaults to 'kaiming_uniform'.

  • num_critics (int, optional) – number of critics. Defaults to 1.

  • use_obs_encoder (bool, optional) – whether to use an observation encoder; only used by the Q critic. Defaults to False.

CriticBuilder(obs_space, act_space, hidden_sizes)

Implementation of CriticBuilder.

QCritic(obs_space, act_space, hidden_sizes)

Implementation of QCritic.

VCritic(obs_space, act_space, hidden_sizes)

Implementation of VCritic.

Critic Builder#

Documentation

class omnisafe.models.critic.CriticBuilder(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)

Implementation of CriticBuilder.

Note

A CriticBuilder is a class for building a critic network. In OmniSafe, critic networks are not constructed directly; instead, the various critic types are integrated into the CriticBuilder and built through it. The advantage is that every critic type receives its parameters in the same way, which makes existing critics easy to use and new critic types easy to add.

Initialize CriticBuilder.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (List[int]) – Hidden sizes of the critic network.

  • activation (Activation) – Activation function.

  • weight_initialization_mode (InitFunction) – Weight initialization mode.

  • num_critics (int) – Number of critics.

  • use_obs_encoder (bool) – Whether to use an observation encoder; only used by the Q critic.

build_critic(critic_type)

Build critic.

Currently, we support two types of critics: q and v. If you want to add a new critic type, you can simply add it here.

Parameters:

critic_type (str) – Critic type.

Return type:

Critic
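The builder pattern described above can be sketched in a few lines of plain Python. This is an illustration only: ToyCritic and ToyCriticBuilder are hypothetical names, the returned object is a simple record rather than a network, and raising NotImplementedError for an unknown type is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyCritic:
    """Hypothetical record standing in for a built critic network."""

    kind: str
    hidden_sizes: List[int]
    num_critics: int


class ToyCriticBuilder:
    """Stores the shared configuration once, then builds any critic type from it."""

    def __init__(self, hidden_sizes: List[int], num_critics: int = 1) -> None:
        self._hidden_sizes = list(hidden_sizes)
        self._num_critics = num_critics

    def build_critic(self, critic_type: str) -> ToyCritic:
        # Every critic type receives the same stored parameters.
        if critic_type == 'q':
            return ToyCritic('q', self._hidden_sizes, self._num_critics)
        if critic_type == 'v':
            return ToyCritic('v', self._hidden_sizes, self._num_critics)
        # A new critic type would be added as another branch above.
        raise NotImplementedError(f'unsupported critic type: {critic_type!r}')


builder = ToyCriticBuilder(hidden_sizes=[64, 64], num_critics=2)
q_critic = builder.build_critic('q')
v_critic = builder.build_critic('v')
```

Because the configuration lives in the builder, the calling code only has to choose the critic type string.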

Q Critic#

Documentation

class omnisafe.models.critic.QCritic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)

Implementation of QCritic.

A Q-function approximator that uses a multi-layer perceptron (MLP) to map observation-action pairs to Q-values. This class inherits from Critic. You can design your own Q-function approximator by inheriting from this class or from Critic.

Initialize the critic network.

The Q critic network has two modes:

Hint

  • use_obs_encoder = False :

    The input of the network is the concatenation of the observation and action.

  • use_obs_encoder = True :

    The input of the network is the concatenation of the output of the observation encoder and action.

For example, in DDPG, the action is not directly concatenated with the observation, but is concatenated with the output of the observation encoder.
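The two modes differ only in what is concatenated with the action. The sketch below shows this with plain Python lists instead of tensors; q_network_input and toy_encoder are hypothetical helpers, and the toy encoder (sum and maximum of the observation) is an arbitrary placeholder for a learned encoder.

```python
from typing import Callable, List, Optional, Sequence


def toy_encoder(obs: Sequence[float]) -> List[float]:
    """Placeholder observation encoder: compresses obs into two features."""
    return [sum(obs), max(obs)]


def q_network_input(
    obs: Sequence[float],
    act: Sequence[float],
    encoder: Optional[Callable[[Sequence[float]], List[float]]] = None,
) -> List[float]:
    """Assemble the input that is fed to the Q network."""
    if encoder is None:
        # use_obs_encoder = False: concatenate the raw observation and the action.
        features = list(obs)
    else:
        # use_obs_encoder = True: concatenate the encoder output and the action.
        features = encoder(obs)
    return features + list(act)


obs, act = [1.0, 2.0, 3.0], [0.5]
raw_input = q_network_input(obs, act)
encoded_input = q_network_input(obs, act, encoder=toy_encoder)
```

With the encoder enabled, the input width of the Q network depends on the encoder's output size rather than on the observation dimension.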

Note

The Q critic network contains multiple critics, and the output of its forward() method is a list of Q-values, one per critic. To get the Q-value of a specific critic, index into that list.

Parameters:
  • obs_space (OmnisafeSpace) – observation space.

  • act_space (OmnisafeSpace) – action space.

  • hidden_sizes (list) – list of hidden layer sizes.

  • activation (Activation) – activation function.

  • weight_initialization_mode (InitFunction) – weight initialization mode.

  • num_critics (int) – number of critics.

  • use_obs_encoder (bool) – whether to use observation encoder.

forward(obs, act)

Forward function.

As a multi-critic network, the output of the network is a list of Q-values. If you want to use it as a single-critic network, you only need to set the num_critics parameter to 1 when initializing the network, and then use the index 0 to get the Q-value.

Parameters:
  • obs (torch.Tensor) – Observation.

  • act (torch.Tensor) – Action.

Return type:

List[Tensor]
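The list-valued output can be illustrated with a toy multi-head critic. This sketch assumes nothing about OmniSafe's internals: ToyMultiQ is a hypothetical class, each head is a single random linear layer rather than an MLP, and floats stand in for tensors.

```python
import random
from typing import List, Sequence


class ToyMultiQ:
    """Hypothetical multi-critic Q function with one linear head per critic."""

    def __init__(self, obs_dim: int, act_dim: int, num_critics: int = 1, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One independent weight vector per critic head.
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(obs_dim + act_dim)]
            for _ in range(num_critics)
        ]

    def forward(self, obs: Sequence[float], act: Sequence[float]) -> List[float]:
        x = list(obs) + list(act)
        # The output is always a list, even when there is only one critic.
        return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in self.weights]


obs, act = [0.1, 0.2, 0.3], [1.0]

# Multi-critic use: one Q-value per critic.
double_q = ToyMultiQ(obs_dim=3, act_dim=1, num_critics=2).forward(obs, act)

# Single-critic use: set num_critics to 1 and take index 0.
single_q = ToyMultiQ(obs_dim=3, act_dim=1, num_critics=1).forward(obs, act)[0]
```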

V Critic#

Documentation

class omnisafe.models.critic.VCritic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1)

Implementation of VCritic.

A V-function approximator that uses a multi-layer perceptron (MLP) to map observations to V-values. This class inherits from Critic. You can design your own V-function approximator by inheriting from this class or from Critic.

Initialize the critic network.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list) – Hidden layer sizes.

  • activation (Activation) – Activation function.

  • weight_initialization_mode (InitFunction) – Weight initialization mode.

  • num_critics (int) – Number of critics.

forward(obs)

Forward function.

Specifically, the V-function approximator maps observations to V-values.

Parameters:

obs (torch.Tensor) – Observations.

Return type:

List[Tensor]