OmniSafe Actor#

Base Actor#

Documentation

class omnisafe.models.base.Actor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')[source]#

An abstract class for actors.

An actor approximates the policy function that maps observations to actions. The actor is parameterized by a neural network that takes observations as input and outputs the mean and standard deviation of the action distribution.

Note

You can implement your own actor by inheriting from this class; a minimal sketch follows the parameter list below.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list of int) – List of hidden layer sizes.

  • activation (Activation, optional) – Activation function. Defaults to 'relu'.

  • weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.

Initialize an instance of Actor.
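For illustration, a minimal subclass might look like the following. This is a hedged sketch, not OmniSafe's internals: it assumes continuous Box observation/action spaces, that the base constructor only records its arguments, and that a cached `_current_dist` attribute is how `predict()`/`forward()` make a later `log_prob()` call valid.

    import torch
    import torch.nn as nn
    from torch.distributions import Distribution, Normal

    from omnisafe.models.base import Actor


    class ToyGaussianActor(Actor):
        """Illustrative Gaussian actor with a state-independent, learnable std."""

        def __init__(self, obs_space, act_space, hidden_sizes, **kwargs):
            super().__init__(obs_space, act_space, hidden_sizes, **kwargs)
            # Plain MLP mapping observations to action means (hypothetical layout).
            layers, in_dim = [], obs_space.shape[0]
            for size in hidden_sizes:
                layers += [nn.Linear(in_dim, size), nn.ReLU()]
                in_dim = size
            layers.append(nn.Linear(in_dim, act_space.shape[0]))
            self.mean_net = nn.Sequential(*layers)
            self.log_std = nn.Parameter(torch.zeros(act_space.shape[0]))

        def _distribution(self, obs: torch.Tensor) -> Distribution:
            return Normal(self.mean_net(obs), self.log_std.exp())

        def forward(self, obs: torch.Tensor) -> Distribution:
            self._current_dist = self._distribution(obs)
            return self._current_dist

        def predict(self, obs: torch.Tensor, deterministic: bool = False) -> torch.Tensor:
            self._current_dist = self._distribution(obs)
            return self._current_dist.mean if deterministic else self._current_dist.rsample()

        def log_prob(self, act: torch.Tensor) -> torch.Tensor:
            # Valid only after predict() or forward() has cached the distribution.
            return self._current_dist.log_prob(act).sum(dim=-1)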

abstract _distribution(obs)[source]#

Return the distribution of action.

An actor generates a distribution, which is used to sample actions during training. During training, the mean and variance of the distribution are used to calculate the loss; during testing, the mean of the distribution is used directly as the action.

For example, if the action is continuous, the actor can generate a Gaussian distribution.

(3) \[p (a | s) = \mathcal{N} (\mu (s), \sigma (s))\]

where \(\mu (s)\) and \(\sigma (s)\) are the mean and standard deviation of the distribution.

Warning

_distribution() is a private method, used only to sample actions during training. You should not call it directly in your code; use the public method predict() instead.

Parameters:

obs (torch.Tensor) – Observation from environments.

Returns:

The distribution of action.

Return type:

Distribution

abstract forward(obs)[source]#

Return the distribution of action.

Parameters:

obs (torch.Tensor) – Observation from environments.

Return type:

Distribution

abstract log_prob(act)[source]#

Return the log probability of action under the distribution.

log_prob() can only be called after calling predict() or forward(), as the sketch below illustrates.

Parameters:

act (torch.Tensor) – The action.

Returns:

The log probability of the action under the distribution.

Return type:

Tensor
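A short hedged sketch of this ordering constraint (``actor`` and ``obs`` are placeholders):

    # predict() (or forward()) caches the current distribution,
    # which log_prob() then reads.
    act = actor.predict(obs, deterministic=False)
    logp = actor.log_prob(act)  # valid: a distribution was just cached

    # Calling actor.log_prob(act) before predict()/forward() would fail,
    # since no distribution has been computed yet.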

abstract predict(obs, deterministic=False)[source]#

Predict deterministic or stochastic action based on observation.

When training the actor, an important trick for avoiding local minima is to use stochastic actions, which can be achieved simply by sampling actions from the distribution (set deterministic=False).

When testing the actor, we want to know the actual action that the agent will take, so we should use deterministic actions (set deterministic=True).

(4) \[L = -\mathbb{E}_{s \sim p(s)} [ \log p (a | s) A^R (s, a) ]\]

where \(p (s)\) is the distribution of observations, \(p (a | s)\) is the distribution of actions, \(\log p (a | s)\) is the log probability of the action under the distribution, and \(A^R (s, a)\) is the advantage function.

Parameters:
  • obs (torch.Tensor) – Observation from environments.

  • deterministic (bool, optional) – Whether to predict a deterministic action. Defaults to False.

Return type:

Tensor
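Putting the two modes and the loss in Eq. (4) together, a hedged training-loop fragment might look like this (``advantage`` and the surrounding loop are placeholders, not OmniSafe API):

    # Stochastic actions for exploration during training.
    act = actor.predict(obs, deterministic=False)

    # Deterministic actions (the distribution mean) for evaluation.
    eval_act = actor.predict(obs, deterministic=True)

    # Policy-gradient loss from Eq. (4): L = -E[log p(a|s) * A^R(s, a)].
    logp = actor.log_prob(act)
    loss = -(logp * advantage).mean()
    loss.backward()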

  • ActorBuilder(obs_space, act_space, hidden_sizes) – Class for building actor networks.

  • GaussianActor(obs_space, act_space, hidden_sizes) – An abstract class for normal distribution actors.

  • GaussianLearningActor(obs_space, act_space, ...) – Implementation of GaussianLearningActor.

  • GaussianSACActor(obs_space, act_space, ...) – Implementation of GaussianSACActor.

Actor Builder#

Documentation

class omnisafe.models.actor.ActorBuilder(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')[source]#

Class for building actor networks.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list of int) – List of hidden layer sizes.

  • activation (Activation, optional) – Activation function. Defaults to 'relu'.

  • weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.

Initialize an instance of ActorBuilder.

build_actor(actor_type)[source]#

Build actor network.

Currently, we support the following actor types:
  • gaussian_learning: Gaussian actor with learnable standard deviation parameters.

  • gaussian_sac: Gaussian actor with learnable standard deviation network.

  • mlp: Multi-layer perceptron actor, used in DDPG and TD3.

Parameters:

actor_type (ActorType) – Type of actor network, e.g. gaussian_learning.

Returns:

Actor network: one of ``GaussianLearningActor``, ``GaussianSACActor``, or ``MLPActor``.

Raises:

NotImplementedError – If the actor type is not implemented.

Return type:

GaussianLearningActor | GaussianSACActor | MLPActor
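For example, building an on-policy Gaussian actor might look like the following sketch. It assumes gymnasium-style Box spaces; the space shapes and hidden sizes are illustrative.

    from gymnasium.spaces import Box

    from omnisafe.models.actor import ActorBuilder

    obs_space = Box(low=-1.0, high=1.0, shape=(8,))
    act_space = Box(low=-1.0, high=1.0, shape=(2,))

    builder = ActorBuilder(
        obs_space=obs_space,
        act_space=act_space,
        hidden_sizes=[64, 64],
        activation='tanh',
    )
    actor = builder.build_actor('gaussian_learning')  # GaussianLearningActor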

Gaussian Actor#

Documentation

class omnisafe.models.actor.GaussianActor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')[source]#

An abstract class for normal distribution actor.

GaussianActor inherits from Actor and uses a Normal distribution to approximate the policy function.

Note

You can implement your own actor by inheriting from this class.

Initialize an instance of Actor.

abstract property std: float#

Get the standard deviation of the normal distribution.

Gaussian Learning Actor#

Documentation

class omnisafe.models.actor.GaussianLearningActor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')[source]#

Implementation of GaussianLearningActor.

GaussianLearningActor is a Gaussian actor with a learnable standard deviation. It is used in on-policy algorithms such as PPO and TRPO.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list of int) – List of hidden layer sizes.

  • activation (Activation, optional) – Activation function. Defaults to 'relu'.

  • weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.

Initialize an instance of GaussianLearningActor.

_distribution(obs)[source]#

Get the distribution of the actor.

Warning

This method is not supposed to be called by users. You should call forward() instead.

Parameters:

obs (torch.Tensor) – Observation from environments.

Returns:

A Normal distribution parameterized by the mean and standard deviation computed by the actor.

Return type:

Normal

forward(obs)[source]#

Forward method.

Parameters:

obs (torch.Tensor) – Observation from environments.

Returns:

The current distribution.

Return type:

Distribution

log_prob(act)[source]#

Compute the log probability of the action given the current distribution.

Warning

You must call forward() or predict() before calling this method.

Parameters:

act (torch.Tensor) – Action.

Returns:

Log probability of the action.

Return type:

Tensor

predict(obs, deterministic=False)[source]#

Predict the action given observation.

The predicted action depends on the deterministic flag.

  • If deterministic is True, the predicted action is the mean of the distribution.

  • If deterministic is False, the predicted action is sampled from the distribution.

Parameters:
  • obs (torch.Tensor) – Observation from environments.

  • deterministic (bool, optional) – Whether to use deterministic policy. Defaults to False.

Returns:
The mean of the distribution if ``deterministic`` is ``True``; otherwise, the sampled action.

Return type:

Tensor

property std: float#

Standard deviation of the distribution.
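A brief hedged usage sketch for this class; the spaces and observation tensor are illustrative, as in the builder example above:

    import torch
    from gymnasium.spaces import Box

    from omnisafe.models.actor import GaussianLearningActor

    obs_space = Box(low=-1.0, high=1.0, shape=(8,))
    act_space = Box(low=-1.0, high=1.0, shape=(2,))
    obs = torch.randn(8)

    actor = GaussianLearningActor(obs_space, act_space, hidden_sizes=[64, 64])
    dist = actor(obs)           # forward(): the current Normal distribution
    act = actor.predict(obs)    # stochastic sample (deterministic=False)
    logp = actor.log_prob(act)  # valid: predict() cached the distribution
    print(actor.std)            # current learnable standard deviation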

Gaussian SAC Actor#

Documentation

class omnisafe.models.actor.GaussianSACActor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')[source]#

Implementation of GaussianSACActor.

GaussianSACActor is a Gaussian actor with a learnable standard deviation network. It is used in SAC and in other offline or model-based algorithms related to SAC.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list of int) – List of hidden layer sizes.

  • activation (Activation, optional) – Activation function. Defaults to 'relu'.

  • weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.

Initialize an instance of GaussianSACActor.

_distribution(obs)[source]#

Get the distribution of the actor.

Warning

This method is not supposed to be called by users. You should call forward() instead.

Specifically, this method clips the log standard deviation to the range [-20, 2].

Parameters:

obs (torch.Tensor) – Observation from environments.

Returns:

A Normal distribution parameterized by the mean and standard deviation computed by the actor.

Return type:

Normal
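A minimal sketch of the clipping described above; only the clamp range comes from this page, while the two-headed network output is an assumption of the example:

    import torch
    from torch.distributions import Normal

    LOG_STD_MIN, LOG_STD_MAX = -20, 2

    def make_dist(net_out: torch.Tensor) -> Normal:
        # The network emits mean and log-std; clamping the log-std to
        # [-20, 2] keeps the std within [e^-20, e^2] and the distribution
        # numerically well-behaved.
        mean, log_std = net_out.chunk(2, dim=-1)
        log_std = torch.clamp(log_std, LOG_STD_MIN, LOG_STD_MAX)
        return Normal(mean, log_std.exp())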

forward(obs)[source]#

Forward method.

Parameters:

obs (torch.Tensor) – Observation from environments.

Returns:

The current distribution.

Return type:

TanhNormal

log_prob(act)[source]#

Compute the log probability of the action given the current distribution.

Warning

You must call forward() or predict() before calling this method.

Note

In this method, we will regularize the log probability of the action. The regularization is as follows:

(6) \[\log \pi (a | s) \leftarrow \log \pi (a | s) - \sum_{i=1}^n 2 \left( \log 2 - a_i - \log (1 + e^{-2 a_i}) \right)\]

where \(a\) is the pre-squash action, \(s\) is the observation, and \(n\) is the dimension of the action. Each summand is the numerically stable form of \(\log (1 - \tanh^2 (a_i))\), the change-of-variables correction introduced by the tanh squashing.

Parameters:

act (torch.Tensor) – Action.

Returns:

Log probability of the action.

Return type:

Tensor
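A hedged sketch of this correction, assuming the standard tanh-squashed Gaussian construction (the helper and variable names are illustrative, not OmniSafe internals):

    import math

    import torch
    import torch.nn.functional as F
    from torch.distributions import Normal

    def squashed_log_prob(dist: Normal, raw_act: torch.Tensor) -> torch.Tensor:
        # Gaussian log-prob of the pre-squash action.
        logp = dist.log_prob(raw_act).sum(dim=-1)
        # Numerically stable sum_i log(1 - tanh^2(raw_act_i)) from Eq. (6):
        # log(1 - tanh^2(u)) = 2 * (log 2 - u - softplus(-2u)).
        logp -= (2.0 * (math.log(2.0) - raw_act - F.softplus(-2.0 * raw_act))).sum(dim=-1)
        return logp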

predict(obs, deterministic=False)[source]#

Predict the action given observation.

The predicted action depends on the deterministic flag.

  • If deterministic is True, the predicted action is the mean of the distribution.

  • If deterministic is False, the predicted action is sampled from the distribution.

Parameters:
  • obs (torch.Tensor) – Observation from environments.

  • deterministic (bool, optional) – Whether to use deterministic policy. Defaults to False.

Returns:
The mean of the distribution if ``deterministic`` is ``True``; otherwise, the sampled action.

Return type:

Tensor

property std: float#

Standard deviation of the distribution.