OmniSafe Lagrange Multiplier#
Base class for Lagrangian-based algorithms.
Lagrange Multiplier#
Documentation
- class omnisafe.common.lagrange.Lagrange(cost_limit, lagrangian_multiplier_init, lambda_lr, lambda_optimizer, lagrangian_upper_bound=None)[source]#
Base class for Lagrangian-based algorithms.
This class implements the Lagrange multiplier update and the Lagrange loss.
Note
Any traditional policy gradient algorithm can be converted to a Lagrangian-based algorithm by inheriting from this class and implementing the _loss_pi() method.
Example
>>> from omnisafe.common.lagrange import Lagrange
>>> def loss_pi(self, data):
>>>     # implement your own loss function here
>>>     return loss
You can also inherit this class to implement your own Lagrangian-based algorithm, with any policy gradient method you like in omnisafe.
Example
>>> from omnisafe.common.lagrange import Lagrange
>>> class CustomAlgo:
>>>     def __init__(self) -> None:
>>>         # initialize your own algorithm here
>>>         super().__init__()
>>>         # initialize the Lagrange multiplier
>>>         self.lagrange = Lagrange(**self._cfgs.lagrange_cfgs)
Initialize Lagrange multiplier.
- __init__(cost_limit, lagrangian_multiplier_init, lambda_lr, lambda_optimizer, lagrangian_upper_bound=None)[source]#
Initialize Lagrange multiplier.
- compute_lambda_loss(mean_ep_cost)[source]#
Penalty loss for Lagrange multiplier.
Note
mean_ep_cost is obtained from self.logger.get_stats('EpCosts')[0], which is already averaged across MPI processes.
- Parameters:
mean_ep_cost (float) – The mean episode cost.
- Return type:
Tensor
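The penalty loss itself is just the negative multiplier times the cost violation. A minimal sketch in plain Python (the plain-float signature is an assumption of this sketch; the real method operates on the internal multiplier parameter and returns a torch Tensor):

```python
def compute_lambda_loss(lagrangian_multiplier: float,
                        mean_ep_cost: float,
                        cost_limit: float) -> float:
    """Penalty loss -lambda * (J_c - J_c^*), sketched with plain floats.

    Minimizing this loss with respect to lambda raises the multiplier
    when the mean episode cost exceeds the cost limit, and lowers it
    otherwise.
    """
    return -lagrangian_multiplier * (mean_ep_cost - cost_limit)
```

For example, with a cost limit of 25 and a mean episode cost of 30, the gradient of this loss with respect to lambda is -5, so a gradient-descent step on the loss increases the multiplier.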
- update_lagrange_multiplier(Jc)[source]#
Update Lagrange multiplier (lambda).
Specifically, we update the Lagrange multiplier by taking a gradient step on the penalty loss, which yields:
(2)#
\[\lambda ^{'} = \lambda + \eta * (J_c - J_c^*)\]
where \(\lambda\) is the Lagrange multiplier, \(\eta\) is the learning rate, \(J_c\) is the mean episode cost, and \(J_c^*\) is the cost limit.
- Parameters:
Jc (float) – The mean episode cost.
- Return type:
None
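The update rule in equation (2) can be sketched with plain floats as below. The non-negativity projection at the end is an assumption of this sketch (a Lagrange multiplier must stay non-negative); the real class also accepts an optional lagrangian_upper_bound, which is omitted here.

```python
def update_lagrange_multiplier(lagrangian_multiplier: float,
                               Jc: float,
                               cost_limit: float,
                               lambda_lr: float) -> float:
    """One gradient step on the penalty loss -lambda * (J_c - J_c^*)."""
    # lambda' = lambda + eta * (J_c - J_c^*)
    new_multiplier = lagrangian_multiplier + lambda_lr * (Jc - cost_limit)
    # Project back so the multiplier stays non-negative.
    return max(new_multiplier, 0.0)
```

With lambda_lr = 0.1 and a cost limit of 25, an episode cost of 30 moves a multiplier of 1.0 up to 1.5, while an episode cost of 20 drives a small multiplier down to the floor at 0.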
Abstract base class for Lagrangian-based algorithms.
PIDLagrange#
Documentation
- class omnisafe.common.pid_lagrange.PIDLagrangian(pid_kp, pid_ki, pid_kd, pid_d_delay, pid_delta_p_ema_alpha, pid_delta_d_ema_alpha, sum_norm, diff_norm, penalty_max, lagrangian_multiplier_init, cost_limit)[source]#
Abstract base class for Lagrangian-based algorithms.
Similar to the Lagrange module, this module implements a PID version of the Lagrangian method.
Note
PID-Lagrange is more general than the plain Lagrange method and can be used in any policy gradient algorithm. Because PID-Lagrange uses a PID controller to control the Lagrangian multiplier, it is more stable than the naive Lagrange method.
References:
Title: Responsive Safety in Reinforcement Learning by PID Lagrangian Methods
Authors: Adam Stooke, Joshua Achiam, Pieter Abbeel.
URL: PID Lagrange
Initialize PIDLagrangian.
- Parameters:
pid_kp (float) – The proportional gain of the PID controller.
pid_ki (float) – The integral gain of the PID controller.
pid_kd (float) – The derivative gain of the PID controller.
pid_d_delay (int) – The delay of the derivative term of the PID controller.
pid_delta_p_ema_alpha (float) – The exponential moving average alpha of the proportional term of the PID controller.
pid_delta_d_ema_alpha (float) – The exponential moving average alpha of the derivative term of the PID controller.
sum_norm (bool) – Whether to normalize the sum of the cost.
diff_norm (bool) – Whether to normalize the difference of the cost.
penalty_max (int) – The maximum penalty.
lagrangian_multiplier_init (float) – The initial value of the Lagrangian multiplier.
cost_limit (int) – The cost limit.
- __init__(pid_kp, pid_ki, pid_kd, pid_d_delay, pid_delta_p_ema_alpha, pid_delta_d_ema_alpha, sum_norm, diff_norm, penalty_max, lagrangian_multiplier_init, cost_limit)[source]#
Initialize PIDLagrangian.
- Parameters:
pid_kp (float) – The proportional gain of the PID controller.
pid_ki (float) – The integral gain of the PID controller.
pid_kd (float) – The derivative gain of the PID controller.
pid_d_delay (int) – The delay of the derivative term of the PID controller.
pid_delta_p_ema_alpha (float) – The exponential moving average alpha of the proportional term of the PID controller.
pid_delta_d_ema_alpha (float) – The exponential moving average alpha of the derivative term of the PID controller.
sum_norm (bool) – Whether to normalize the sum of the cost.
diff_norm (bool) – Whether to normalize the difference of the cost.
penalty_max (int) – The maximum penalty.
lagrangian_multiplier_init (float) – The initial value of the Lagrangian multiplier.
cost_limit (int) – The cost limit.
- pid_update(ep_cost_avg)[source]#
Update the PID controller.
Specifically, the PID controller updates the Lagrangian multiplier according to:
(4)#
\[\lambda_{t+1} = \lambda_t + (K_p e_p + K_i \int e_p \, dt + K_d \frac{d e_p}{d t}) \eta\]
where \(e_p\) is the error between the current episode cost and the cost limit, \(K_p\), \(K_i\), and \(K_d\) are the PID gains, and \(\eta\) is the learning rate.
- Parameters:
ep_cost_avg (float) – The average cost of the current episode.
- Return type:
None
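Equation (4) can be sketched in plain Python. This version keeps only the raw P, I, and D terms plus a non-negativity projection; it omits the EMA smoothing (pid_delta_p_ema_alpha, pid_delta_d_ema_alpha), the derivative delay buffer (pid_d_delay), and the sum/diff normalization options of the real class, and all names and default gains are illustrative assumptions:

```python
def pid_update(ep_cost_avg: float, cost_limit: float, state: dict,
               pid_kp: float = 0.1, pid_ki: float = 0.01,
               pid_kd: float = 0.01) -> float:
    """Sketch of one PID step on the Lagrangian multiplier.

    `state` carries the running integral and the previous episode cost
    between calls; this is a simplification of the class's internals.
    """
    error = ep_cost_avg - cost_limit                       # e_p
    # Integral term, floored at zero so accumulated slack cannot go negative.
    state['integral'] = max(state.get('integral', 0.0) + error, 0.0)
    # Derivative term: only penalize increases in cost.
    derivative = max(ep_cost_avg - state.get('prev_cost', ep_cost_avg), 0.0)
    state['prev_cost'] = ep_cost_avg
    multiplier = (pid_kp * error
                  + pid_ki * state['integral']
                  + pid_kd * derivative)
    return max(multiplier, 0.0)                            # keep lambda >= 0
```

With the illustrative gains above and a cost limit of 25, a first call at an episode cost of 30 yields 0.1 * 5 + 0.01 * 5 = 0.55, and a subsequent rise to 35 grows both the integral and derivative contributions.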