OmniSafe Lagrange Multiplier#

Lagrange(cost_limit, ...[, ...])

Base class for Lagrangian-based algorithms.

Lagrange Multiplier#

Documentation

class omnisafe.common.lagrange.Lagrange(cost_limit, lagrangian_multiplier_init, lambda_lr, lambda_optimizer, lagrangian_upper_bound=None)[source]#

Base class for Lagrangian-based algorithms.

This class implements the Lagrange multiplier update and the Lagrange loss.

Note

Any traditional policy gradient algorithm can be converted to a Lagrangian-based algorithm by inheriting from this class and implementing the _loss_pi() method.

Example

>>> from omnisafe.common.lagrange import Lagrange
>>> def loss_pi(self, data):
...     # implement your own loss function here
...     return loss
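
For a fuller picture of how the multiplier enters the policy loss, here is a minimal sketch of a Lagrangian-penalized loss. The attribute and argument names (self.lagrange, adv_r, adv_c) are illustrative assumptions rather than OmniSafe's exact API, and the (1 + lambda) rescaling is a common convention that keeps the loss scale bounded as the multiplier grows.

>>> def loss_pi(self, log_prob, adv_r, adv_c):
...     # Illustrative sketch: self.lagrange is assumed to hold a Lagrange
...     # instance whose lagrangian_multiplier is a torch scalar.
...     penalty = self.lagrange.lagrangian_multiplier.item()
...     # Penalized advantage, rescaled so its magnitude stays bounded
...     # as the multiplier grows.
...     adv = (adv_r - penalty * adv_c) / (1.0 + penalty)
...     return -(log_prob * adv).mean()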

You can also inherit this class to implement your own Lagrangian-based algorithm, with any policy gradient method you like in omnisafe.

Example

>>> from omnisafe.common.lagrange import Lagrange
>>> class CustomAlgo:
...     def __init__(self) -> None:
...         # initialize your own algorithm here
...         super().__init__()
...         # initialize the Lagrange multiplier
...         self.lagrange = Lagrange(**self._cfgs.lagrange_cfgs)

Initialize Lagrange multiplier.

__init__(cost_limit, lagrangian_multiplier_init, lambda_lr, lambda_optimizer, lagrangian_upper_bound=None)[source]#

Initialize Lagrange multiplier.

compute_lambda_loss(mean_ep_cost)[source]#

Penalty loss for Lagrange multiplier.

Note

mean_ep_cost is obtained from self.logger.get_stats('EpCosts')[0], which is already averaged across MPI processes.

Parameters:

mean_ep_cost (float) – mean episode cost.

Return type:

Tensor
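
Since the penalty loss is linear in the multiplier, its gradient with respect to \(\lambda\) is \(-(J_c - J_c^*)\), so gradient descent on it raises \(\lambda\) exactly when the constraint is violated. A minimal sketch of such a computation, assuming lagrangian_multiplier is a torch scalar parameter and cost_limit a float:

>>> def compute_lambda_loss(self, mean_ep_cost):
...     # Linear in lambda: descending this loss increases lambda
...     # whenever the mean episode cost exceeds the cost limit.
...     return -self.lagrangian_multiplier * (mean_ep_cost - self.cost_limit)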

update_lagrange_multiplier(Jc)[source]#

Update Lagrange multiplier (lambda).

More specifically, the Lagrange multiplier is updated by gradient descent on the penalty loss, which yields the update rule:

\[\lambda' = \lambda + \eta \, (J_c - J_c^*)\]

where \(\lambda\) is the Lagrange multiplier, \(\eta\) is the learning rate, \(J_c\) is the mean episode cost, and \(J_c^*\) is the cost limit.

Parameters:

Jc (float) – mean episode cost.

Return type:

None
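
In practice this update can be realized as one optimizer step on compute_lambda_loss() followed by a projection, rather than by applying the formula directly. A hedged sketch, assuming a torch optimizer is stored in self.lambda_optimizer and the attribute names from __init__:

>>> def update_lagrange_multiplier(self, Jc):
...     self.lambda_optimizer.zero_grad()
...     lambda_loss = self.compute_lambda_loss(Jc)
...     lambda_loss.backward()
...     self.lambda_optimizer.step()
...     # Project lambda back to be non-negative (and optionally below
...     # lagrangian_upper_bound) so the penalty never flips sign.
...     self.lagrangian_multiplier.data.clamp_(
...         min=0.0, max=self.lagrangian_upper_bound)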

PIDLagrangian(pid_kp, pid_ki, pid_kd, ...)

Abstract base class for Lagrangian-based algorithms.

PIDLagrange#

Documentation

class omnisafe.common.pid_lagrange.PIDLagrangian(pid_kp, pid_ki, pid_kd, pid_d_delay, pid_delta_p_ema_alpha, pid_delta_d_ema_alpha, sum_norm, diff_norm, penalty_max, lagrangian_multiplier_init, cost_limit)[source]#

Abstract base class for Lagrangian-based algorithms.

Similar to the Lagrange module, this module implements a PID-controlled version of the Lagrangian method.

Note

The PID-Lagrange method is more general than the naive Lagrange method and can be used with any policy gradient algorithm. Because PID-Lagrange uses a PID controller to regulate the Lagrangian multiplier, it is more stable than the naive Lagrange update.

References:

  • Title: Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

  • Authors: Adam Stooke, Joshua Achiam, Pieter Abbeel.

  • URL: https://arxiv.org/abs/2007.03964

Initialize PIDLagrangian.

Parameters:
  • pid_kp (float) – The proportional gain of the PID controller.

  • pid_ki (float) – The integral gain of the PID controller.

  • pid_kd (float) – The derivative gain of the PID controller.

  • pid_d_delay (int) – The delay of the derivative term of the PID controller.

  • pid_delta_p_ema_alpha (float) – The exponential moving average alpha of the proportional term of the PID controller.

  • pid_delta_d_ema_alpha (float) – The exponential moving average alpha of the derivative term of the PID controller.

  • sum_norm (bool) – Whether to normalize the sum of the cost.

  • diff_norm (bool) – Whether to normalize the difference of the cost.

  • penalty_max (int) – The maximum penalty.

  • lagrangian_multiplier_init (float) – The initial value of the lagrangian multiplier.

  • cost_limit (int) – The cost limit.

__init__(pid_kp, pid_ki, pid_kd, pid_d_delay, pid_delta_p_ema_alpha, pid_delta_d_ema_alpha, sum_norm, diff_norm, penalty_max, lagrangian_multiplier_init, cost_limit)[source]#

Initialize PIDLagrangian.

Parameters:
  • pid_kp (float) – The proportional gain of the PID controller.

  • pid_ki (float) – The integral gain of the PID controller.

  • pid_kd (float) – The derivative gain of the PID controller.

  • pid_d_delay (int) – The delay of the derivative term of the PID controller.

  • pid_delta_p_ema_alpha (float) – The exponential moving average alpha of the proportional term of the PID controller.

  • pid_delta_d_ema_alpha (float) – The exponential moving average alpha of the derivative term of the PID controller.

  • sum_norm (bool) – Whether to normalize the sum of the cost.

  • diff_norm (bool) – Whether to normalize the difference of the cost.

  • penalty_max (int) – The maximum penalty.

  • lagrangian_multiplier_init (float) – The initial value of the lagrangian multiplier.

  • cost_limit (int) – The cost limit.
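
A construction sketch with illustrative hyperparameter values; these numbers are assumptions for demonstration only, not recommended settings (consult the OmniSafe configs for tuned defaults):

>>> from omnisafe.common.pid_lagrange import PIDLagrangian
>>> pid_lagrange = PIDLagrangian(
...     pid_kp=0.1,
...     pid_ki=0.01,
...     pid_kd=0.01,
...     pid_d_delay=10,
...     pid_delta_p_ema_alpha=0.95,
...     pid_delta_d_ema_alpha=0.95,
...     sum_norm=True,
...     diff_norm=False,
...     penalty_max=100,
...     lagrangian_multiplier_init=0.001,
...     cost_limit=25,
... )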

pid_update(ep_cost_avg)[source]#

Update the PID controller.

Specifically, the PID controller updates the Lagrangian multiplier according to the following equation:

\[\lambda_{t+1} = \lambda_t + \left( K_p \, e_p + K_i \int e_p \, dt + K_d \, \frac{d e_p}{d t} \right) \eta\]

where \(e_p\) is the error between the current episode cost and the cost limit, \(K_p\), \(K_i\), \(K_d\) are the PID parameters, and \(\eta\) is the learning rate.

Parameters:

ep_cost_avg (float) – The average cost of the current episode.

Return type:

None
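
A simplified sketch of the controller step with the proportional, integral, and derivative terms made explicit. It omits the EMA smoothing, the derivative delay buffer, and the sum/diff normalization options, and the state names (_integral, _prev_cost) are illustrative rather than the module's actual attributes:

>>> def pid_update(self, ep_cost_avg):
...     # Proportional term: current constraint violation e_p.
...     error = ep_cost_avg - self.cost_limit
...     # Integral term: running sum of violations, kept non-negative.
...     self._integral = max(0.0, self._integral + error)
...     # Derivative term: increase in cost since the previous update.
...     derivative = max(0.0, ep_cost_avg - self._prev_cost)
...     self._prev_cost = ep_cost_avg
...     penalty = (self.pid_kp * error
...                + self.pid_ki * self._integral
...                + self.pid_kd * derivative)
...     # Project the multiplier into [0, penalty_max].
...     self.lagrangian_multiplier = min(max(0.0, penalty), self.penalty_max)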