OmniSafe Actor
Base Actor
Documentation
- class omnisafe.models.base.Actor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')
An abstract base class for actors.
An actor approximates the policy function that maps observations to actions. Actor is parameterized by a neural network that takes observations as input, and outputs the mean and standard deviation of the action distribution.
Note
You can implement your own actor by inheriting from this class.
Initialize the base actor.
- Parameters:
obs_space (OmnisafeSpace) – observation space.
act_space (OmnisafeSpace) – action space.
hidden_sizes (list) – hidden layer sizes.
activation (Activation) – activation function.
weight_initialization_mode (InitFunction, optional) – weight initialization mode. Defaults to kaiming_uniform.
shared (nn.Module, optional) – shared module. Defaults to None.
- __init__(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')
Initialize the base actor.
- Parameters:
obs_space (OmnisafeSpace) – observation space.
act_space (OmnisafeSpace) – action space.
hidden_sizes (list) – hidden layer sizes.
activation (Activation) – activation function.
weight_initialization_mode (InitFunction, optional) – weight initialization mode. Defaults to kaiming_uniform.
shared (nn.Module, optional) – shared module. Defaults to None.
- abstract _distribution(obs)
Return the distribution of action.
An actor generates a distribution, which is used to sample actions during training. When training, the mean and the variance of the distribution are used to calculate the loss. When testing, the mean of the distribution is used directly as actions.
For example, if the action is continuous, the actor can generate a Gaussian distribution.
\[p(a | s) = \mathcal{N}(a | \mu(s), \sigma(s)) \tag{3}\]
where \(\mu(s)\) and \(\sigma(s)\) are the mean and standard deviation of the distribution.
Warning
_distribution is a private method that is only used to sample actions during training. Do not call it directly in your code; use the public method predict to sample actions instead.
- Parameters:
obs (torch.Tensor) – observation.
- Returns:
Distribution – the distribution of action.
- Return type:
Distribution
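As a minimal sketch of the Gaussian case described above, the following plain-Python snippet evaluates the log-density of a scalar action and draws a sample. It does not use OmniSafe or PyTorch; the function names gaussian_log_prob and sample_action are hypothetical, and the second argument of the normal distribution is taken to be a standard deviation, as stated above.

```python
import math
import random


def gaussian_log_prob(action: float, mean: float, std: float) -> float:
    """Log-density of a scalar action under N(mean, std)."""
    var = std ** 2
    return -0.5 * ((action - mean) ** 2 / var + math.log(2.0 * math.pi * var))


def sample_action(mean: float, std: float, rng: random.Random) -> float:
    """Draw one scalar action from N(mean, std)."""
    return rng.gauss(mean, std)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
action = sample_action(mean=0.5, std=0.1, rng=rng)
log_p = gaussian_log_prob(action, mean=0.5, std=0.1)
```

The log-density is largest at the mean and decreases as the action moves away from it, which is what a log-probability-based loss relies on.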
- abstract forward(obs)
Return the distribution of action.
- Parameters:
obs (torch.Tensor) – observation.
- Return type:
Distribution
- abstract log_prob(act)
Return the log probability of action under the distribution.
log_prob can only be called after calling predict or forward.
- Parameters:
act (torch.Tensor) – action.
- Returns:
torch.Tensor – the log probability of action under the distribution.
- Return type:
Tensor
- abstract predict(obs, deterministic=False)
Predict deterministic or stochastic action based on observation.
The deterministic flag is either True or False.
When training the actor, one important trick for avoiding local minima is to use stochastic actions, which can be achieved simply by sampling actions from the distribution (set deterministic = False).
When testing the actor, we want to know the actual action that the agent will take, so we should use deterministic actions (set deterministic = True).
\[L = -\mathbb{E}_{s \sim p(s)} [\log p(a | s) A^R (s, a)] \tag{4}\]
where \(p(s)\) is the distribution of observations, \(p(a | s)\) is the distribution of actions, \(\log p(a | s)\) is the log probability of the action under the distribution, and \(A^R (s, a)\) is the advantage function.
- Parameters:
obs (torch.Tensor) – observation.
deterministic (bool, optional) – whether to predict deterministic action. Defaults to False.
- Return type:
Tensor
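The interface above (a private _distribution, a public predict with a deterministic flag, and a log_prob that relies on an earlier predict or forward call) can be illustrated with a toy, scalar-valued stand-in. This is only a sketch under stated assumptions: ToyGaussianActor is a hypothetical class that does not inherit from the OmniSafe Actor, uses a linear mean instead of a neural network, and the advantage value is made up.

```python
import math
import random
from typing import Optional, Tuple


class ToyGaussianActor:
    """Hypothetical stand-in mirroring the actor interface on scalars."""

    def __init__(self, weight: float, std: float, seed: int = 0) -> None:
        self._weight = weight  # mean(obs) = weight * obs
        self._std = std
        self._rng = random.Random(seed)
        # (mean, std) cached by the most recent predict() call.
        self._current: Optional[Tuple[float, float]] = None

    def _distribution(self, obs: float) -> Tuple[float, float]:
        return self._weight * obs, self._std

    def predict(self, obs: float, deterministic: bool = False) -> float:
        self._current = self._distribution(obs)
        mean, std = self._current
        # Deterministic: return the mean. Stochastic: sample from N(mean, std).
        return mean if deterministic else self._rng.gauss(mean, std)

    def log_prob(self, act: float) -> float:
        if self._current is None:
            raise RuntimeError("call predict() before log_prob()")
        mean, std = self._current
        return -0.5 * (((act - mean) / std) ** 2 + math.log(2.0 * math.pi * std ** 2))


actor = ToyGaussianActor(weight=2.0, std=0.5)
greedy = actor.predict(1.5, deterministic=True)  # the mean, 2.0 * 1.5
sampled = actor.predict(1.5)  # a random draw around the mean
advantage = 1.0  # made-up value standing in for A^R(s, a)
loss = -actor.log_prob(sampled) * advantage  # single-sample estimate of L
```

Caching the distribution inside predict is one simple way to satisfy the calling-order requirement on log_prob; other designs are possible.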
ActorBuilder – Class for building actor networks.
GaussianActor – An abstract class for normal distribution actors.
GaussianLearningActor – Implementation of GaussianLearningActor.
GaussianSACActor – Implementation of GaussianSACActor.
Actor Builder
Documentation
- class omnisafe.models.actor.ActorBuilder(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')
Class for building actor networks.
Initialize ActorBuilder.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list) – List of hidden layer sizes.
activation (Activation) – Activation function.
weight_initialization_mode (InitFunction) – Weight initialization mode.
- __init__(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')
Initialize ActorBuilder.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list) – List of hidden layer sizes.
activation (Activation) – Activation function.
weight_initialization_mode (InitFunction) – Weight initialization mode.
- build_actor(actor_type)
Build actor network.
Currently, the following actor types are supported:
- gaussian_learning: Gaussian actor with learnable standard deviation parameters.
- gaussian_sac: Gaussian actor with a learnable standard deviation network.
- mlp: Multi-layer perceptron actor, used in DDPG and TD3.
- Parameters:
actor_type (ActorType) – Actor type.
- Return type:
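The type-based dispatch that build_actor describes can be sketched with a small registry. Everything in this snippet is hypothetical: the Stub classes are empty placeholders, build_actor_sketch is not the OmniSafe method, and the choice of NotImplementedError for unknown types is an assumption made for illustration.

```python
from typing import Dict, Type


class StubActor:
    """Hypothetical placeholder for an actor base class."""


class StubGaussianLearningActor(StubActor):
    """Placeholder for the gaussian_learning actor type."""


class StubGaussianSACActor(StubActor):
    """Placeholder for the gaussian_sac actor type."""


class StubMLPActor(StubActor):
    """Placeholder for the mlp actor type."""


# Maps the actor-type string to the class that should be instantiated.
_REGISTRY: Dict[str, Type[StubActor]] = {
    "gaussian_learning": StubGaussianLearningActor,
    "gaussian_sac": StubGaussianSACActor,
    "mlp": StubMLPActor,
}


def build_actor_sketch(actor_type: str) -> StubActor:
    """Return a new actor instance for a supported actor type."""
    try:
        return _REGISTRY[actor_type]()
    except KeyError:
        raise NotImplementedError(f"unsupported actor type: {actor_type!r}") from None


actor = build_actor_sketch("gaussian_sac")
```

A registry keeps the list of supported types in one place, so adding a new actor type only requires a new dictionary entry.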
Gaussian Actor
Documentation
- class omnisafe.models.actor.GaussianActor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')
An abstract class for normal distribution actors.
A GaussianActor inherits from Actor and uses a normal distribution to approximate the policy function.
Note
You can implement your own actor by inheriting from this class.
Initialize the base actor.
- Parameters:
obs_space (OmnisafeSpace) – observation space.
act_space (OmnisafeSpace) – action space.
hidden_sizes (list) – hidden layer sizes.
activation (Activation) – activation function.
weight_initialization_mode (InitFunction, optional) – weight initialization mode. Defaults to kaiming_uniform.
shared (nn.Module, optional) – shared module. Defaults to None.
- __init__(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')
Initialize the base actor.
- Parameters:
obs_space (OmnisafeSpace) – observation space.
act_space (OmnisafeSpace) – action space.
hidden_sizes (list) – hidden layer sizes.
activation (Activation) – activation function.
weight_initialization_mode (InitFunction, optional) – weight initialization mode. Defaults to kaiming_uniform.
shared (nn.Module, optional) – shared module. Defaults to None.
- abstract property std: float
Get the standard deviation of the normal distribution.
Gaussian Learning Actor
Documentation
- class omnisafe.models.actor.GaussianLearningActor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')
Implementation of GaussianLearningActor.
Initialize GaussianLearningActor.
GaussianLearningActor is a Gaussian actor with a learnable standard deviation. It is used in on-policy algorithms such as PPO and TRPO.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list) – List of hidden layer sizes.
activation (Activation) – Activation function.
weight_initialization_mode (InitFunction) – Weight initialization mode.
shared (nn.Module) – Shared module.
- __init__(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')
Initialize GaussianLearningActor.
GaussianLearningActor is a Gaussian actor with a learnable standard deviation. It is used in on-policy algorithms such as PPO and TRPO.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list) – List of hidden layer sizes.
activation (Activation) – Activation function.
weight_initialization_mode (InitFunction) – Weight initialization mode.
shared (nn.Module) – Shared module.
- _distribution(obs)
Get the distribution of the actor.
Warning
This method is not supposed to be called by users. You should call forward() instead.
- Parameters:
obs (torch.Tensor) – Observation.
- Return type:
Distribution
- forward(obs)
Forward method.
- Parameters:
obs (torch.Tensor) – Observation.
- Return type:
Distribution
- log_prob(act)
Compute the log probability of the action given the current distribution.
- Parameters:
act (torch.Tensor) – Action.
- Return type:
Tensor
- predict(obs, deterministic=False)
Predict the action given observation.
The predicted action depends on the deterministic flag.
- If deterministic is True, the predicted action is the mean of the distribution.
- If deterministic is False, the predicted action is sampled from the distribution.
- Parameters:
obs (torch.Tensor) – Observation.
deterministic (bool) – Whether to use deterministic policy.
- Return type:
Tensor
- property std: float
Get the standard deviation of the distribution.
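One common way to realize a learnable standard deviation is to store a state-independent log standard deviation per action dimension and exponentiate it, which keeps the standard deviation positive no matter how the parameter is updated. The sketch below illustrates that parameterization in plain Python. It is an assumption about the general technique rather than a description of OmniSafe's internals, and LogStdParameter, its step method, and the gradient values are hypothetical.

```python
import math
from typing import List


class LogStdParameter:
    """Hypothetical holder for a learnable, state-independent log-std."""

    def __init__(self, act_dim: int, init_log_std: float = 0.0) -> None:
        self.log_std: List[float] = [init_log_std] * act_dim

    @property
    def std(self) -> List[float]:
        # exp() maps any real-valued parameter to a positive std.
        return [math.exp(value) for value in self.log_std]

    def step(self, grads: List[float], lr: float) -> None:
        """Apply one plain gradient-descent update to the log-std values."""
        self.log_std = [v - lr * g for v, g in zip(self.log_std, grads)]


param = LogStdParameter(act_dim=2)
initial_std = param.std  # exp(0.0) for every dimension
param.step(grads=[1.0, -1.0], lr=0.1)  # made-up gradients
updated_std = param.std
```

Because the update acts on the logarithm, the first dimension shrinks and the second grows, but both remain strictly positive.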
Gaussian SAC Actor
Documentation
- class omnisafe.models.actor.GaussianSACActor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')
Implementation of GaussianSACActor.
Initialize GaussianSACActor.
GaussianSACActor is a Gaussian actor with a learnable standard deviation network. It is used in SAC and in other offline or model-based algorithms related to SAC.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list) – List of hidden layer sizes.
activation (Activation) – Activation function.
weight_initialization_mode (InitFunction) – Weight initialization mode.
shared (nn.Module) – Shared module.
- __init__(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')
Initialize GaussianSACActor.
GaussianSACActor is a Gaussian actor with a learnable standard deviation network. It is used in SAC and in other offline or model-based algorithms related to SAC.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list) – List of hidden layer sizes.
activation (Activation) – Activation function.
weight_initialization_mode (InitFunction) – Weight initialization mode.
shared (nn.Module) – Shared module.
- _distribution(obs)
Get the distribution of the actor.
Warning
This method is not supposed to be called by users. You should call forward() instead.
Specifically, this method clips the log standard deviation to the range [-20, 2].
- Parameters:
obs (torch.Tensor) – Observation.
- Return type:
Distribution
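Since a standard deviation cannot be negative, the [-20, 2] range is read here as a bound on the log standard deviation, which is the usual convention for SAC-style actors. The following sketch shows that clipping step on a scalar; clipped_std and the constant names are hypothetical, and the snippet does not reproduce OmniSafe's actual code.

```python
import math

# Assumed bounds on the log standard deviation, taken from the range above.
LOG_STD_MIN = -20.0
LOG_STD_MAX = 2.0


def clipped_std(raw_log_std: float) -> float:
    """Clamp a raw log-std to [LOG_STD_MIN, LOG_STD_MAX], then exponentiate."""
    log_std = min(max(raw_log_std, LOG_STD_MIN), LOG_STD_MAX)
    return math.exp(log_std)


inside = clipped_std(0.0)    # unchanged by the clamp
too_large = clipped_std(5.0)  # clamped to exp(LOG_STD_MAX)
too_small = clipped_std(-100.0)  # clamped to exp(LOG_STD_MIN)
```

The clamp bounds the standard deviation between exp(-20) and exp(2), which guards against both a collapsed and an exploding distribution.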
- forward(obs)
Forward method.
- Parameters:
obs (torch.Tensor) – Observation.
- Return type:
Distribution
- log_prob(act)
Compute the log probability of the action given the current distribution.
Note
In this method, the log probability of the action is regularized as follows:
\[\log \pi(a|s) = \log \pi(a|s) - \sum_{i=1}^n 2 \left(\log 2 - a_i - \log (1 + e^{-2 a_i})\right) \tag{6}\]
where \(a\) is the action, \(s\) is the observation, and \(n\) is the dimension of the action.
- Parameters:
act (torch.Tensor) – Action.
- Return type:
Tensor
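A correction of this shape typically arises when a Gaussian sample u is squashed through tanh, in which case the change-of-variables term per dimension is log(1 - tanh(u)^2) = 2 (log 2 - u - log(1 + e^{-2u})). The sketch below checks that identity numerically and applies it to a log probability. It assumes the correction is this standard tanh term evaluated on the pre-squash action; the function names are hypothetical and the input values are made up.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def tanh_log_det(u: float) -> float:
    """Stable evaluation of log(1 - tanh(u)^2) for a pre-squash value u."""
    return 2.0 * (math.log(2.0) - u - softplus(-2.0 * u))


def squashed_log_prob(raw_log_prob: float, raw_action: Sequence[float]) -> float:
    """Subtract the per-dimension tanh correction from a Gaussian log-prob."""
    return raw_log_prob - sum(tanh_log_det(u) for u in raw_action)


# Compare the stable form against the direct formula on a few values.
max_error = max(
    abs(tanh_log_det(u) - math.log(1.0 - math.tanh(u) ** 2))
    for u in (-2.0, 0.0, 0.7, 3.0)
)
corrected = squashed_log_prob(-1.0, [0.3, -0.8])  # made-up inputs
```

The softplus form avoids evaluating log(1 - tanh(u)^2) directly, which loses precision when tanh(u) is close to 1 or -1.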
- predict(obs, deterministic=False)
Predict the action given observation.
The predicted action depends on the deterministic flag.
- If deterministic is True, the predicted action is the mean of the distribution.
- If deterministic is False, the predicted action is sampled from the distribution.
- Parameters:
obs (torch.Tensor) – Observation.
deterministic (bool) – Whether to use deterministic policy.
- Return type:
Tensor
- property std: float
Get the standard deviation of the normal distribution.