OmniSafe Actor

Base Actor

Documentation

class omnisafe.models.base.Actor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')

An abstract class for actors.

An actor approximates the policy function that maps observations to actions. The actor is parameterized by a neural network that takes observations as input and outputs the mean and standard deviation of the action distribution.

Note

You can implement your own actor by inheriting from this class.

Initialize the base actor.

Parameters:
  • obs_space (OmnisafeSpace) – observation space.

  • act_space (OmnisafeSpace) – action space.

  • hidden_sizes (list) – hidden layer sizes.

  • activation (Activation) – activation function.

  • weight_initialization_mode (InitFunction, optional) – weight initialization mode. Defaults to kaiming_uniform.

  • shared (nn.Module, optional) – shared module. Defaults to None.

abstract _distribution(obs)

Return the distribution of action.

An actor generates a distribution, which is used to sample actions during training. When training, the mean and the variance of the distribution are used to calculate the loss. When testing, the mean of the distribution is used directly as actions.

For example, if the action is continuous, the actor can generate a Gaussian distribution.

\[p(a | s) = \mathcal{N}(a | \mu(s), \sigma(s)) \tag{3}\]

where \(\mu(s)\) and \(\sigma(s)\) are the mean and standard deviation of the distribution.

Warning

_distribution is a private method that is only used to sample actions during training. Do not call it directly in your code; use the public method predict to sample actions instead.

Parameters:

obs (torch.Tensor) – observation.

Returns:

Distribution – the distribution of action.

Return type:

Distribution
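
As a rough illustration of the Gaussian policy in equation (3), the sketch below evaluates the log-density and draws a sample for a single scalar action. It uses only the Python standard library rather than PyTorch or OmniSafe, and the names gaussian_log_prob and sample_action, as well as the numeric values standing in for \(\mu(s)\) and \(\sigma(s)\), are assumptions made for this example.

```python
import math
import random


def gaussian_log_prob(action, mean, std):
    """Log-density of a normal distribution N(mean, std^2) at a scalar action."""
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def sample_action(mean, std, rng):
    """Draw one scalar action from N(mean, std^2)."""
    return rng.gauss(mean, std)


rng = random.Random(0)
mu, sigma = 0.5, 0.2  # made-up stand-ins for mu(s) and sigma(s)
a = sample_action(mu, sigma, rng)
print(a, gaussian_log_prob(a, mu, sigma))
```

For a multi-dimensional action with independent components, the per-dimension log-densities would be summed.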

abstract forward(obs)

Return the distribution of action.

Parameters:

obs (torch.Tensor) – observation.

Return type:

Distribution

abstract log_prob(act)

Return the log probability of action under the distribution.

log_prob can only be called after predict or forward has been called.

Parameters:

act (torch.Tensor) – action.

Returns:

torch.Tensor – the log probability of action under the distribution.

Return type:

Tensor
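
The call-order requirement above makes sense if the actor keeps the most recent distribution and log_prob evaluates against it; that is an assumption about the design, not a description of OmniSafe's source. The pure-Python sketch below mimics the contract with a hypothetical ToyGaussianActor whose "network" is simply the average of the observation.

```python
import math
import random


class ToyGaussianActor:
    """Hypothetical stand-in that mimics the call order; not OmniSafe code."""

    def __init__(self, std=1.0, seed=0):
        self._std = std
        self._rng = random.Random(seed)
        self._current_mean = None  # filled in by forward() or predict()

    def forward(self, obs):
        # Toy "network": the mean is the average of the observation entries.
        self._current_mean = sum(obs) / len(obs)
        return self._current_mean, self._std

    def predict(self, obs, deterministic=False):
        mean, std = self.forward(obs)
        return mean if deterministic else self._rng.gauss(mean, std)

    def log_prob(self, act):
        if self._current_mean is None:
            raise RuntimeError("call forward() or predict() before log_prob()")
        z = (act - self._current_mean) / self._std
        return -0.5 * z * z - math.log(self._std) - 0.5 * math.log(2.0 * math.pi)


actor = ToyGaussianActor()
action = actor.predict([0.1, 0.3])  # caches the distribution parameters
print(actor.log_prob(action))       # valid because predict() ran first
```

Calling log_prob on a freshly constructed ToyGaussianActor raises RuntimeError, which is one way such a contract could be enforced.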

abstract predict(obs, deterministic=False)

Predict deterministic or stochastic action based on observation.

  • deterministic = True or False

When training the actor, one important trick for avoiding local minima is to use stochastic actions, which is achieved by sampling actions from the distribution (set deterministic = False).

When testing the actor, we want to know the actual action that the agent will take, so we should use deterministic actions (set deterministic = True).

\[L = -\mathbb{E}_{s \sim p(s)} [\log p(a | s) A^R (s, a)] \tag{4}\]

where \(p(s)\) is the distribution of observations, \(p(a | s)\) is the distribution of actions, \(\log p(a | s)\) is the log probability of the action under that distribution, and \(A^R (s, a)\) is the advantage function.

Parameters:
  • obs (torch.Tensor) – observation.

  • deterministic (bool, optional) – whether to predict deterministic action. Defaults to False.

Return type:

Tensor
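
The loss in equation (4) can be estimated from a batch of sampled state-action pairs by averaging the product of log probability and advantage. The sketch below is a minimal pure-Python version; policy_gradient_loss is a hypothetical helper and the numbers are made up for illustration, not taken from OmniSafe.

```python
def policy_gradient_loss(log_probs, advantages):
    """Batch estimate of L = -E[log p(a|s) * A^R(s, a)]."""
    if len(log_probs) != len(advantages):
        raise ValueError("log_probs and advantages must have equal length")
    total = sum(lp * adv for lp, adv in zip(log_probs, advantages))
    return -total / len(log_probs)


# Made-up values for three sampled (s, a) pairs.
log_probs = [-1.0, -0.5, -2.0]
advantages = [1.0, -2.0, 0.5]
loss = policy_gradient_loss(log_probs, advantages)
print(loss)
```

In training, the log probabilities would come from actions sampled with deterministic = False, so that the estimate covers the action distribution rather than only its mean.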

ActorBuilder(obs_space, act_space, hidden_sizes)

Class for building actor networks.

GaussianActor(obs_space, act_space, hidden_sizes)

An abstract class for normal distribution actors.

GaussianLearningActor(obs_space, act_space, ...)

Implementation of GaussianLearningActor.

GaussianSACActor(obs_space, act_space, ...)

Implementation of GaussianSACActor.

Actor Builder

Documentation

class omnisafe.models.actor.ActorBuilder(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')

Class for building actor networks.

Initialize ActorBuilder.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list) – List of hidden layer sizes.

  • activation (Activation) – Activation function.

  • weight_initialization_mode (InitFunction) – Weight initialization mode.

build_actor(actor_type)

Build actor network.

Currently, we support the following actor types:
  • gaussian_learning: Gaussian actor with learnable standard deviation parameters.

  • gaussian_sac: Gaussian actor with learnable standard deviation network.

  • mlp: Multi-layer perceptron actor, used in DDPG and TD3.

Parameters:

actor_type (ActorType) – Actor type.

Return type:

Actor
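
The selection by actor_type described above can be pictured as a lookup from a type string to a constructor. The sketch below is a pure-Python illustration only: ToyActorBuilder and ToyActor are hypothetical names, and neither the dispatch logic nor the error raised for unknown types is claimed to match OmniSafe's ActorBuilder.

```python
class ToyActor:
    """Hypothetical placeholder for a constructed actor network."""

    def __init__(self, kind, hidden_sizes):
        self.kind = kind
        self.hidden_sizes = list(hidden_sizes)


class ToyActorBuilder:
    """Hypothetical builder: stores shared settings, dispatches on a type string."""

    # Type names come from the list above; the dispatch itself is an assumption.
    SUPPORTED_TYPES = ("gaussian_learning", "gaussian_sac", "mlp")

    def __init__(self, hidden_sizes):
        self.hidden_sizes = hidden_sizes

    def build_actor(self, actor_type):
        if actor_type not in self.SUPPORTED_TYPES:
            raise NotImplementedError(f"unsupported actor type: {actor_type!r}")
        return ToyActor(actor_type, self.hidden_sizes)


builder = ToyActorBuilder(hidden_sizes=[64, 64])
actor = builder.build_actor("gaussian_learning")
print(actor.kind, actor.hidden_sizes)
```

Keeping the shared settings in the builder means one builder can produce several actor variants with identical layer sizes.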

Gaussian Actor

Documentation

class omnisafe.models.actor.GaussianActor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')

An abstract class for normal distribution actors.

A GaussianActor inherits from Actor and uses a normal distribution to approximate the policy function.

Note

You can implement your own actor by inheriting from this class.

Initialize the base actor.

Parameters:
  • obs_space (OmnisafeSpace) – observation space.

  • act_space (OmnisafeSpace) – action space.

  • hidden_sizes (list) – hidden layer sizes.

  • activation (Activation) – activation function.

  • weight_initialization_mode (InitFunction, optional) – weight initialization mode. Defaults to kaiming_uniform.

  • shared (nn.Module, optional) – shared module. Defaults to None.

abstract property std: float

Get the standard deviation of the normal distribution.
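
An abstract property forces each concrete subclass to say how its standard deviation is obtained. The sketch below shows the pattern with Python's abc module; ToyGaussianBase and FixedStdActor are hypothetical names, and storing the value as a log standard deviation is an assumption made for this example.

```python
import math
from abc import ABC, abstractmethod


class ToyGaussianBase(ABC):
    """Hypothetical base class with an abstract, read-only std property."""

    @property
    @abstractmethod
    def std(self) -> float:
        """Standard deviation of the action distribution."""


class FixedStdActor(ToyGaussianBase):
    """Concrete subclass that derives std from a stored log standard deviation."""

    def __init__(self, log_std=-0.5):
        self._log_std = log_std

    @property
    def std(self) -> float:
        return math.exp(self._log_std)


print(FixedStdActor(log_std=0.0).std)
```

Instantiating ToyGaussianBase directly raises TypeError, because the abstract property has no implementation.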

Gaussian Learning Actor

Documentation

class omnisafe.models.actor.GaussianLearningActor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')

Implementation of GaussianLearningActor.

Initialize GaussianLearningActor.

GaussianLearningActor is a Gaussian actor with a learnable standard deviation. It is used in on-policy algorithms such as PPO and TRPO.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list) – List of hidden layer sizes.

  • activation (Activation) – Activation function.

  • weight_initialization_mode (InitFunction) – Weight initialization mode.

  • shared (nn.Module) – Shared module.

_distribution(obs)

Get the distribution of the actor.

Warning

This method is not supposed to be called by users. You should call forward() instead.

Parameters:

obs (torch.Tensor) – Observation.

Return type:

Distribution

forward(obs)

Forward method.

Parameters:

obs (torch.Tensor) – Observation.

Return type:

Distribution

log_prob(act)

Compute the log probability of the action given the current distribution.

Warning

You must call forward() or predict() before calling this method.

Parameters:

act (torch.Tensor) – Action.

Return type:

Tensor

predict(obs, deterministic=False)

Predict the action given observation.

The predicted action depends on the deterministic flag.

  • If deterministic is True, the predicted action is the mean of the distribution.

  • If deterministic is False, the predicted action is sampled from the distribution.

Parameters:
  • obs (torch.Tensor) – Observation.

  • deterministic (bool) – Whether to use deterministic policy.

Return type:

Tensor

property std: float

Get the standard deviation of the distribution.
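
To illustrate what a learnable standard deviation means, the sketch below treats a single scalar log_std as the trainable parameter and fits it by gradient ascent on the Gaussian log-likelihood. Parameterizing through log_std, making it independent of the observation, and the data and learning rate are all assumptions for this example; the code uses no PyTorch and is not OmniSafe's implementation.

```python
import math


def grad_log_prob_wrt_log_std(action, mean, log_std):
    """d/d(log_std) of log N(action | mean, exp(log_std)^2), which equals z^2 - 1."""
    z = (action - mean) / math.exp(log_std)
    return z * z - 1.0


log_std = 0.0                        # learnable scalar; std starts at exp(0) = 1
mean = 0.0
actions = [0.1, -0.2, 0.15, -0.05]   # made-up actions close to the mean
learning_rate = 0.1

for _ in range(200):
    grad = sum(grad_log_prob_wrt_log_std(a, mean, log_std) for a in actions) / len(actions)
    log_std += learning_rate * grad  # gradient ascent on the mean log-likelihood

std = math.exp(log_std)
print(std)
```

Because the actions lie close to the mean, the fitted standard deviation shrinks toward the root mean square deviation of the data.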

Gaussian SAC Actor

Documentation

class omnisafe.models.actor.GaussianSACActor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')

Implementation of GaussianSACActor.

Initialize GaussianSACActor.

GaussianSACActor is a Gaussian actor whose standard deviation is produced by a learnable network. It is used in SAC and in other offline or model-based algorithms related to SAC.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list) – List of hidden layer sizes.

  • activation (Activation) – Activation function.

  • weight_initialization_mode (InitFunction) – Weight initialization mode.

  • shared (nn.Module) – Shared module.

_distribution(obs)

Get the distribution of the actor.

Warning

This method is not supposed to be called by users. You should call forward() instead.

Specifically, this method clips the log standard deviation to the range [-20, 2], so the standard deviation itself stays between \(e^{-20}\) and \(e^{2}\).

Parameters:

obs (torch.Tensor) – Observation.

Return type:

Distribution
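
The clipping step can be sketched as below, assuming the bounds [-20, 2] apply to the log standard deviation (a standard deviation cannot itself be negative). clipped_std is a hypothetical helper written in pure Python, not the library's code.

```python
import math

LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0  # bounds quoted in the text above


def clipped_std(raw_log_std):
    """Clamp a raw log standard deviation into [-20, 2], then exponentiate."""
    log_std = max(LOG_STD_MIN, min(LOG_STD_MAX, raw_log_std))
    return math.exp(log_std)


for raw in (-100.0, 0.0, 5.0):
    print(raw, clipped_std(raw))
```

Clamping in log space keeps the standard deviation strictly positive and prevents it from collapsing to zero or growing without bound when the network output is extreme.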

forward(obs)

Forward method.

Parameters:

obs (torch.Tensor) – Observation.

Return type:

Distribution

log_prob(act)

Compute the log probability of the action given the current distribution.

Warning

You must call forward() or predict() before calling this method.

Note

In this method, we will regularize the log probability of the action. The regularization is as follows:

\[\log \pi(a|s) = \log \pi(a|s) - \sum_{i=1}^n (2 \log 2 - a_i - \log (1 + e^{-2 a_i})) \tag{6}\]

where \(a\) is the action, \(s\) is the observation, and \(n\) is the dimension of the action.

Parameters:

act (torch.Tensor) – Action.

Return type:

Tensor

predict(obs, deterministic=False)

Predict the action given observation.

The predicted action depends on the deterministic flag.

  • If deterministic is True, the predicted action is the mean of the distribution.

  • If deterministic is False, the predicted action is sampled from the distribution.

Parameters:
  • obs (torch.Tensor) – Observation.

  • deterministic (bool) – Whether to use deterministic policy.

Return type:

Tensor

property std: float

Get the standard deviation of the normal distribution.