Adapter#

Online Adapter#

Documentation

class omnisafe.adapter.OnlineAdapter(env_id, num_envs, seed, cfgs)[source]#

Online Adapter for OmniSafe.

OmniSafe is a framework for safe reinforcement learning. It is designed to be compatible with existing RL algorithms. The online adapter is used to adapt the environment to the framework.

OmniSafe provides a set of adapters to adapt the environment to the framework.

  • OnPolicyAdapter: Adapt the environment to the on-policy framework.

  • OffPolicyAdapter: Adapt the environment to the off-policy framework.

  • SauteAdapter: Adapt the environment to the SAUTE framework.

  • SimmerAdapter: Adapt the environment to the SIMMER framework.

Parameters:
  • env_id (str) – The environment id.

  • num_envs (int) – The number of parallel environments.

  • seed (int) – The random seed.

  • cfgs (Config) – The configuration.

Initialize an instance of OnlineAdapter.
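
A minimal construction sketch (not the canonical entry point, which is the algorithm wrapper) is shown below. The environment id and the Config object are assumptions: any installed safety-gymnasium task can serve as env_id, and cfgs is assumed to have been built elsewhere from an algorithm's default YAML configuration.

    # Sketch only: normally the algorithm wrapper builds the config and the adapter.
    from omnisafe.adapter import OnlineAdapter

    adapter = OnlineAdapter(
        env_id='SafetyPointGoal1-v0',  # assumed installed safety-gymnasium task
        num_envs=1,                    # number of parallel environments
        seed=0,                        # random seed
        cfgs=cfgs,                     # Config object, assumed built from a default YAML
    )
    print(adapter.observation_space, adapter.action_space)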

_wrapper(obs_normalize=True, reward_normalize=True, cost_normalize=True)[source]#

Wrap the environment.

Hint

OmniSafe supports the following wrappers:

  • TimeLimit – Limit the time steps of the environment.

  • AutoReset – Reset the environment when the episode is done.

  • ObsNormalize – Normalize the observation.

  • RewardNormalize – Normalize the reward.

  • CostNormalize – Normalize the cost.

  • ActionScale – Scale the action.

  • Unsqueeze – Unsqueeze the step result for the single-environment case.

Parameters:
  • obs_normalize (bool, optional) – Whether to normalize the observation. Defaults to True.

  • reward_normalize (bool, optional) – Whether to normalize the reward. Defaults to True.

  • cost_normalize (bool, optional) – Whether to normalize the cost. Defaults to True.

Return type:

None
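
The sketch below only illustrates the signature of the three toggles; _wrapper is a private method that is normally called once during adapter initialization (with values taken from the training configuration), so invoking it manually is an assumption made purely for demonstration.

    # Signature illustration only: _wrapper is normally invoked inside __init__.
    adapter._wrapper(
        obs_normalize=True,      # apply ObsNormalize
        reward_normalize=False,  # skip RewardNormalize
        cost_normalize=False,    # skip CostNormalize
    )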

property action_space: Box | Discrete#

The action space of the environment.

property observation_space: Box | Discrete#

The observation space of the environment.

reset()[source]#

Reset the environment and return an initial observation.

Returns:
  • observation – The initial observation of the space.

  • info – Some information logged by the environment.

Return type:

tuple[Tensor, dict[str, Any]]

save()[source]#

Save the important components of the environment.

Note

The saved components will be stored in the wrapped environment. If the environment is not wrapped, the saved components will be an empty dict. Common wrappers are obs_normalize, reward_normalize, and cost_normalize.

Returns:

The saved components of the environment, e.g., ``obs_normalizer``.

Return type:

dict[str, Module]
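
A hedged usage sketch follows: the exact keys depend on which wrappers are active, but with observation normalization enabled the returned dict is expected to contain an ``obs_normalizer`` entry. The file path below is purely illustrative.

    import torch

    what_to_save = adapter.save()         # dict[str, Module]; empty if the env is unwrapped
    if 'obs_normalizer' in what_to_save:  # key name as given in the docstring above
        # Illustrative path; persist the normalizer's parameters for later evaluation.
        torch.save(what_to_save['obs_normalizer'].state_dict(), './obs_normalizer.pt')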

step(action)[source]#

Run one timestep of the environment’s dynamics using the agent actions.

Parameters:

action (torch.Tensor) – The action from the agent or random.

Returns:
  • observation – The agent’s observation of the current environment.

  • reward – The amount of reward returned after the previous action.

  • cost – The amount of cost returned after the previous action.

  • terminated – Whether the episode has ended.

  • truncated – Whether the episode has been truncated due to a time limit.

  • info – Some information logged by the environment.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
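
The interaction loop below is a minimal sketch of reset() and step(), assuming the single-environment case (num_envs=1). Actions are drawn from action_space purely for illustration (a trained actor would normally supply them), and the conversion to torch.Tensor reflects the documented signature; whether a leading batch dimension is needed depends on the active wrappers.

    import torch

    obs, info = adapter.reset()
    for _ in range(10):
        # Random action for illustration only; step() expects a torch.Tensor.
        action = torch.as_tensor(adapter.action_space.sample(), dtype=torch.float32)
        obs, reward, cost, terminated, truncated, info = adapter.step(action)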

On Policy Adapter#

Documentation

class omnisafe.adapter.OnPolicyAdapter(env_id, num_envs, seed, cfgs)[source]#

OnPolicy Adapter for OmniSafe.

OnPolicyAdapter is used to adapt the environment to the on-policy training.

Parameters:
  • env_id (str) – The environment id.

  • num_envs (int) – The number of environments.

  • seed (int) – The random seed.

  • cfgs (Config) – The configuration.

Initialize an instance of OnPolicyAdapter.

_log_metrics(logger, idx)[source]#

Log metrics, including EpRet, EpCost, EpLen.

Parameters:
  • logger (Logger) – Logger, to log EpRet, EpCost, EpLen.

  • idx (int) – The index of the environment.

Return type:

None

_log_value(reward, cost, info)[source]#

Log value.

Note

OmniSafe uses RewardNormalizer wrapper, so the original reward and cost will be stored in info['original_reward'] and info['original_cost'].

Parameters:
  • reward (torch.Tensor) – The immediate step reward.

  • cost (torch.Tensor) – The immediate step cost.

  • info (dict[str, Any]) – Some information logged by the environment.

Return type:

None

_reset_log(idx=None)[source]#

Reset the episode return, episode cost and episode length.

Parameters:

idx (int or None, optional) – The index of the environment. Defaults to None (single environment).

Return type:

None

rollout(steps_per_epoch, agent, buffer, logger)[source]#

Roll out the environment and store the data in the buffer.

Warning

As OmniSafe uses AutoReset wrapper, the environment will be reset automatically, so the final observation will be stored in info['final_observation'].

Parameters:
  • steps_per_epoch (int) – Number of steps per epoch.

  • agent (ConstraintActorCritic) – Constraint actor-critic, including actor, reward critic, and cost critic.

  • buffer (VectorOnPolicyBuffer) – Vector on-policy buffer.

  • logger (Logger) – Logger, to log EpRet, EpCost, EpLen.

Return type:

None
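
As a rough sketch of how an on-policy trainer drives this method, each epoch collects steps_per_epoch transitions before the policy and critic updates. The surrounding objects (agent, buffer, logger) are assumed to have been built by the algorithm wrapper, and the numeric values are illustrative only.

    # Sketch: one training loop in an on-policy algorithm. `agent`, `buffer` and
    # `logger` are assumed to be the ConstraintActorCritic, VectorOnPolicyBuffer
    # and Logger already created by the algorithm wrapper.
    epochs, steps_per_epoch = 100, 2048  # illustrative values
    for epoch in range(epochs):
        adapter.rollout(
            steps_per_epoch=steps_per_epoch,
            agent=agent,
            buffer=buffer,
            logger=logger,
        )
        # ... policy and critic updates on the collected buffer would follow here.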

Off Policy Adapter#

Documentation

class omnisafe.adapter.OffPolicyAdapter(env_id, num_envs, seed, cfgs)[source]#

OffPolicy Adapter for OmniSafe.

OffPolicyAdapter is used to adapt the environment to the off-policy training.

Note

Off-policy training needs to update the policy before the episode finishes, so the OffPolicyAdapter stores the current observation in _current_obs. After the policy update, the agent retrieves this stored observation and continues interacting with the environment from it.

Parameters:
  • env_id (str) – The environment id.

  • num_envs (int) – The number of environments.

  • seed (int) – The random seed.

  • cfgs (Config) – The configuration.

Initialize an instance of OffPolicyAdapter.

_log_metrics(logger, idx)[source]#

Log metrics, including EpRet, EpCost, EpLen.

Parameters:
  • logger (Logger) – Logger, to log EpRet, EpCost, EpLen.

  • idx (int) – The index of the environment.

Return type:

None

_log_value(reward, cost, info)[source]#

Log value.

Note

OmniSafe uses RewardNormalizer wrapper, so the original reward and cost will be stored in info['original_reward'] and info['original_cost'].

Parameters:
  • reward (torch.Tensor) – The immediate step reward.

  • cost (torch.Tensor) – The immediate step cost.

  • info (dict[str, Any]) – Some information logged by the environment.

Return type:

None

_reset_log(idx=None)[source]#

Reset the episode return, episode cost and episode length.

Parameters:

idx (int or None, optional) – The index of the environment. Defaults to None (single environment).

Return type:

None

eval_policy(episode, agent, logger)[source]#

Roll out the environment with deterministic agent actions.

Parameters:
  • episode (int) – Number of episodes.

  • agent (ConstraintActorCritic) – Agent.

  • logger (Logger) – Logger, to log EpRet, EpCost, EpLen.

Return type:

None
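
A one-line usage sketch, assuming an agent and logger already exist (e.g., the ones held by the off-policy algorithm wrapper):

    # Evaluate the current deterministic policy for a few episodes.
    adapter.eval_policy(episode=5, agent=agent, logger=logger)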

rollout(rollout_step, agent, buffer, logger, use_rand_action)[source]#

Roll out the environment and store the data in the buffer.

Warning

As OmniSafe uses AutoReset wrapper, the environment will be reset automatically, so the final observation will be stored in info['final_observation'].

Parameters:
  • rollout_step (int) – Number of rollout steps.

  • agent (ConstraintActorCritic) – Constraint actor-critic, including actor, reward critic, and cost critic.

  • buffer (VectorOffPolicyBuffer) – Vector off-policy buffer.

  • logger (Logger) – Logger, to log EpRet, EpCost, EpLen.

  • use_rand_action (bool) – Whether to use random action.

Return type:

None
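
A hedged sketch of the off-policy interleaving described in the note above: short rollouts alternate with policy updates, the stored _current_obs lets the next rollout resume mid-episode, and use_rand_action=True can cover an initial exploration phase. The step counts, warm-up length, and update placement are assumptions, not the library's exact schedule.

    # Sketch: interleaved collection and updates in an off-policy algorithm.
    # `agent`, `buffer` and `logger` are assumed to already exist.
    total_steps, rollout_step, warmup_steps = 100_000, 200, 10_000  # illustrative values
    for step in range(0, total_steps, rollout_step):
        adapter.rollout(
            rollout_step=rollout_step,
            agent=agent,
            buffer=buffer,
            logger=logger,
            use_rand_action=(step < warmup_steps),  # random actions during warm-up
        )
        # ... off-policy updates from the replay buffer would follow here; the
        # adapter resumes from its stored _current_obs on the next call.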