Adapter#
Online Adapter#
- class omnisafe.adapter.OnlineAdapter(env_id, num_envs, seed, cfgs)[source]#
Online Adapter for OmniSafe.
OmniSafe is a framework for safe reinforcement learning, designed to be compatible with existing RL algorithms. The online adapter adapts an environment to the framework. OmniSafe provides the following adapters:
OnPolicyAdapter: Adapt the environment to the on-policy framework.
OffPolicyAdapter: Adapt the environment to the off-policy framework.
SauteAdapter: Adapt the environment to the SAUTE framework.
SimmerAdapter: Adapt the environment to the SIMMER framework.
- Parameters:
env_id (str) – The environment id.
num_envs (int) – The number of parallel environments.
seed (int) – The random seed.
cfgs (Config) – The configuration.
Initialize an instance of OnlineAdapter.
- _wrapper(obs_normalize=True, reward_normalize=True, cost_normalize=True)[source]#
Wrap the environment.
Hint
OmniSafe supports the following wrappers:
TimeLimit – Limit the time steps of the environment.
AutoReset – Reset the environment when the episode is done.
ObsNormalize – Normalize the observation.
RewardNormalize – Normalize the reward.
CostNormalize – Normalize the cost.
ActionScale – Scale the action.
Unsqueeze – Unsqueeze the step result in the single-environment case.
- Parameters:
obs_normalize (bool, optional) – Whether to normalize the observation. Defaults to True.
reward_normalize (bool, optional) – Whether to normalize the reward. Defaults to True.
cost_normalize (bool, optional) – Whether to normalize the cost. Defaults to True.
- Return type:
None
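The normalization wrappers listed above can be illustrated with a minimal running-statistics sketch. The RunningObsNormalize class below is a hypothetical stand-in (plain floats, Welford's algorithm), not OmniSafe's actual ObsNormalize implementation:

```python
class RunningObsNormalize:
    """Minimal sketch of an observation-normalization wrapper.

    Keeps a running mean and variance (Welford's algorithm) and
    returns (obs - mean) / std for every observation it sees.
    """

    def __init__(self, eps: float = 1e-8) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations
        self.eps = eps

    def normalize(self, obs: float) -> float:
        # Update running statistics first, then normalize.
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (obs - self.mean)
        var = self.m2 / self.count if self.count > 1 else 1.0
        return (obs - self.mean) / (var**0.5 + self.eps)


norm = RunningObsNormalize()
outputs = [norm.normalize(o) for o in [1.0, 2.0, 3.0, 4.0]]
```

RewardNormalize and CostNormalize follow the same pattern, applied to scalar rewards and costs instead of observations.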
- property action_space: Box | Discrete#
The action space of the environment.
- property observation_space: Box | Discrete#
The observation space of the environment.
- reset()[source]#
Reset the environment and return an initial observation.
- Returns:
observation – The initial observation of the space.
info – Some information logged by the environment.
- Return type:
tuple[Tensor, dict[str, Any]]
- save()[source]#
Save the important components of the environment.
Note
The saved components will be stored in the wrapped environment. If the environment is not wrapped, the saved components will be an empty dict. Common wrappers are obs_normalize, reward_normalize, and cost_normalize.
- Returns:
The saved components of the environment, e.g., ``obs_normalizer``.
- Return type:
dict[str, Module]
- step(action)[source]#
Run one timestep of the environment’s dynamics using the agent actions.
- Parameters:
action (torch.Tensor) – The action from the agent or random.
- Returns:
observation – The agent’s observation of the current environment.
reward – The amount of reward returned after the previous action.
cost – The amount of cost returned after the previous action.
terminated – Whether the episode has ended.
truncated – Whether the episode has been truncated due to a time limit.
info – Some information logged by the environment.
- Return type:
tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
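The reset()/step() contract documented above can be exercised against a minimal stand-in. DummyAdapter below is a hypothetical stub that only mimics the documented signatures, using plain floats in place of torch.Tensor:

```python
from typing import Any


class DummyAdapter:
    """Hypothetical stub mimicking the documented reset()/step() contract."""

    def __init__(self, horizon: int = 3) -> None:
        self._horizon = horizon
        self._t = 0

    def reset(self) -> tuple[float, dict[str, Any]]:
        self._t = 0
        return 0.0, {}

    def step(self, action: float) -> tuple[float, float, float, bool, bool, dict[str, Any]]:
        self._t += 1
        obs = float(self._t)
        reward, cost = 1.0, 0.0
        terminated = False                   # task never solves/fails here
        truncated = self._t >= self._horizon  # time-limit truncation
        return obs, reward, cost, terminated, truncated, {}


adapter = DummyAdapter()
obs, info = adapter.reset()
ep_ret = 0.0
truncated = False
while not truncated:
    obs, reward, cost, terminated, truncated, info = adapter.step(0.0)
    ep_ret += reward
```

Note the six-tuple: observation, reward, cost, terminated, and truncated, then the info dict, matching the return type above.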
On Policy Adapter#
Documentation
- class omnisafe.adapter.OnPolicyAdapter(env_id, num_envs, seed, cfgs)[source]#
OnPolicy Adapter for OmniSafe.
OnPolicyAdapter is used to adapt the environment to on-policy training.
- Parameters:
env_id (str) – The environment id.
num_envs (int) – The number of environments.
seed (int) – The random seed.
cfgs (Config) – The configuration.
Initialize an instance of OnPolicyAdapter.
- _log_metrics(logger, idx)[source]#
Log metrics, including EpRet, EpCost, and EpLen.
- Parameters:
logger (Logger) – Logger, to log EpRet, EpCost, and EpLen.
idx (int) – The index of the environment.
- Return type:
None
- _log_value(reward, cost, info)[source]#
Log value.
Note
OmniSafe uses the RewardNormalizer wrapper, so the original reward and cost are stored in info['original_reward'] and info['original_cost'].
- Parameters:
reward (torch.Tensor) – The immediate step reward.
cost (torch.Tensor) – The immediate step cost.
info (dict[str, Any]) – Some information logged by the environment.
- Return type:
None
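In practice, the note above means: when a normalizer is active, read the raw values from info rather than from the normalized tensors. A minimal sketch with plain floats and a hypothetical log_value helper (not OmniSafe's private method):

```python
def log_value(reward: float, cost: float, info: dict) -> tuple[float, float]:
    """Prefer the un-normalized values stashed in info by the wrapper."""
    true_reward = info.get("original_reward", reward)
    true_cost = info.get("original_cost", cost)
    return true_reward, true_cost


# With the normalizer active, info carries the raw values.
r, c = log_value(0.12, -0.3, {"original_reward": 5.0, "original_cost": 1.0})
```

Falling back to the passed-in values keeps the helper correct when no normalizer wrapper is applied.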
- _reset_log(idx=None)[source]#
Reset the episode return, episode cost and episode length.
- Parameters:
idx (int or None, optional) – The index of the environment. Defaults to None (single environment).
- Return type:
None
- rollout(steps_per_epoch, agent, buffer, logger)[source]#
Rollout the environment and store the data in the buffer.
Warning
Because OmniSafe uses the AutoReset wrapper, the environment is reset automatically, so the final observation is stored in info['final_observation'].
- Parameters:
steps_per_epoch (int) – Number of steps per epoch.
agent (ConstraintActorCritic) – Constraint actor-critic, including actor, reward critic, and cost critic.
buffer (VectorOnPolicyBuffer) – Vector on-policy buffer.
logger (Logger) – Logger, to log EpRet, EpCost, and EpLen.
- Return type:
None
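The rollout loop and the AutoReset caveat can be sketched as follows. DummyAutoResetEnv and the plain-list buffer are illustrative stand-ins, not OmniSafe classes:

```python
class DummyAutoResetEnv:
    """Toy auto-resetting env: every episode lasts exactly 2 steps."""

    def __init__(self) -> None:
        self._t = 0

    def step(self, action: int):
        self._t += 1
        done = self._t >= 2
        info = {}
        if done:
            # AutoReset behaviour: stash the true last observation,
            # then return the fresh episode's first observation.
            info["final_observation"] = float(self._t)
            self._t = 0
        return float(self._t), 1.0, 0.0, done, info


env = DummyAutoResetEnv()
buffer = []  # stand-in for the vector on-policy buffer
finals = []  # true final observations, for value bootstrapping
obs = 0.0
for _ in range(4):  # steps_per_epoch
    next_obs, reward, cost, done, info = env.step(0)
    buffer.append((obs, reward, cost))
    if done:
        # Bootstrap from the true final observation, not the reset one.
        finals.append(info["final_observation"])
    obs = next_obs
```

Skipping the info['final_observation'] lookup would silently bootstrap the value function from the next episode's initial state.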
Off Policy Adapter#
- class omnisafe.adapter.OffPolicyAdapter(env_id, num_envs, seed, cfgs)[source]#
OffPolicy Adapter for OmniSafe.
OffPolicyAdapter is used to adapt the environment to off-policy training.
Note
Off-policy training updates the policy before the episode finishes, so OffPolicyAdapter stores the current observation in _current_obs. After the policy update, the agent resumes interaction with the environment from this stored observation.
- Parameters:
env_id (str) – The environment id.
num_envs (int) – The number of environments.
seed (int) – The random seed.
cfgs (Config) – The configuration.
Initialize an instance of OffPolicyAdapter.
- _log_metrics(logger, idx)[source]#
Log metrics, including EpRet, EpCost, and EpLen.
- Parameters:
logger (Logger) – Logger, to log EpRet, EpCost, and EpLen.
idx (int) – The index of the environment.
- Return type:
None
- _log_value(reward, cost, info)[source]#
Log value.
Note
OmniSafe uses the RewardNormalizer wrapper, so the original reward and cost are stored in info['original_reward'] and info['original_cost'].
- Parameters:
reward (torch.Tensor) – The immediate step reward.
cost (torch.Tensor) – The immediate step cost.
info (dict[str, Any]) – Some information logged by the environment.
- Return type:
None
- _reset_log(idx=None)[source]#
Reset the episode return, episode cost and episode length.
- Parameters:
idx (int or None, optional) – The index of the environment. Defaults to None (single environment).
- Return type:
None
- eval_policy(episode, agent, logger)[source]#
Roll out the environment with deterministic agent actions.
- Parameters:
episode (int) – Number of episodes.
agent (ConstraintActorCritic) – Agent.
logger (Logger) – Logger, to log EpRet, EpCost, and EpLen.
- Return type:
None
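Deterministic evaluation as described can be sketched with stand-ins. The eval_policy helper, the lambda policy, and TwoStepEnv below are hypothetical, with plain floats in place of tensors:

```python
def eval_policy(episodes, agent, env):
    """Run `episodes` full episodes with deterministic actions; return the mean return."""
    returns = []
    for _ in range(episodes):
        obs, done, ep_ret = env.reset(), False, 0.0
        while not done:
            action = agent(obs)  # deterministic: no sampling from the policy
            obs, reward, done = env.step(action)
            ep_ret += reward
        returns.append(ep_ret)
    return sum(returns) / len(returns)


class TwoStepEnv:
    """Toy env whose episodes last exactly 2 steps, reward 1 per step."""

    def __init__(self):
        self._t = 0

    def reset(self):
        self._t = 0
        return 0.0

    def step(self, action):
        self._t += 1
        return float(self._t), 1.0, self._t >= 2


mean_ret = eval_policy(3, lambda obs: 0, TwoStepEnv())
```

Because evaluation actions are deterministic, every episode of this toy env yields the same return.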
- rollout(rollout_step, agent, buffer, logger, use_rand_action)[source]#
Rollout the environment and store the data in the buffer.
Warning
Because OmniSafe uses the AutoReset wrapper, the environment is reset automatically, so the final observation is stored in info['final_observation'].
- Parameters:
rollout_step (int) – Number of rollout steps.
agent (ConstraintActorCritic) – Constraint actor-critic, including actor, reward critic, and cost critic.
buffer (VectorOffPolicyBuffer) – Vector off-policy buffer.
logger (Logger) – Logger, to log EpRet, EpCost, and EpLen.
use_rand_action (bool) – Whether to use a random action.
- Return type:
None
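The use_rand_action flag typically corresponds to the warm-up phase of off-policy training: act randomly until the replay buffer holds enough samples, then follow the policy. A self-contained sketch with illustrative names (select_action and warmup_steps are not OmniSafe APIs):

```python
import random


def select_action(obs, policy, use_rand_action, action_low=-1.0, action_high=1.0):
    """Random action during warm-up, deterministic policy action afterwards."""
    if use_rand_action:
        return random.uniform(action_low, action_high)
    return policy(obs)


random.seed(0)
warmup_steps, total_steps = 5, 8
actions = []
for t in range(total_steps):
    use_rand = t < warmup_steps  # the caller flips use_rand_action off after warm-up
    actions.append(select_action(0.0, lambda obs: 0.5, use_rand))
```

The first warmup_steps actions are uniform random samples from the action range; the rest come from the policy.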