garage.envs
¶
Garage wrappers for gym environments.
-
class
GridWorldEnv
(desc='4x4', max_episode_length=None)¶ Bases:
garage.Environment
A simple 2D grid environment.
‘S’ : starting point
‘F’ or ‘.’ : free space
‘W’ or ‘x’ : wall
‘H’ or ‘o’ : hole (terminates episode)
‘G’ : goal
-
property
action_space
(self)¶ akro.Space: The action space specification.
-
property
observation_space
(self)¶ akro.Space: The observation space specification.
-
property
spec
(self)¶ EnvSpec: The environment specification.
-
property
render_modes
(self)¶ list: A list of strings representing the supported render modes.
-
reset
(self)¶ Resets the environment.
- Returns
- The first observation conforming to
observation_space.
- dict: The episode-level information.
Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).
- Return type
numpy.ndarray
-
step
(self, action)¶ Steps the environment.
action map:
0: left
1: down
2: right
3: up
- Parameters
action (int) – an int encoding the action
- Returns
The environment step resulting from the action.
- Return type
- Raises
RuntimeError – if step() is called after the environment has been constructed and reset() has not been called.
NotImplementedError – if a next step in self._desc does not match known state type.
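The documented action map and state legend can be sketched as a minimal, self-contained grid step. This is a simplified illustration of the semantics described above, not GridWorldEnv's actual implementation:

```python
# Sketch of GridWorldEnv step semantics: 'S' start, 'F' free, 'W' wall,
# 'H' hole (terminates), 'G' goal; actions 0 left, 1 down, 2 right, 3 up.
DESC_4X4 = ["SFFF",
            "FHFH",
            "FFFH",
            "HFFG"]

# action -> (row delta, col delta)
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def grid_step(desc, pos, action):
    """Return (new_pos, reward, done) for one step on the grid."""
    dr, dc = MOVES[action]
    r, c = pos[0] + dr, pos[1] + dc
    # Stay in place when moving off the grid or into a wall.
    if not (0 <= r < len(desc) and 0 <= c < len(desc[0])) or desc[r][c] == 'W':
        r, c = pos
    cell = desc[r][c]
    if cell == 'H':          # hole: episode terminates with no reward
        return (r, c), 0.0, True
    if cell == 'G':          # goal: episode terminates with reward
        return (r, c), 1.0, True
    return (r, c), 0.0, False


pos, reward, done = grid_step(DESC_4X4, (0, 0), 2)  # move right from 'S'
```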
-
render
(self, mode)¶ Renders the environment.
- Parameters
mode (str) – the mode to render with. The string must be present in Environment.render_modes.
-
visualize
(self)¶ Creates a visualization of the environment.
-
close
(self)¶ Close the env.
-
class
GymEnv
(env, is_image=False, max_episode_length=None)¶ Bases:
garage.Environment
Returns an abstract Garage wrapper class for gym.Env.
In order to provide pickling (serialization) and parameterization for gym.Env instances, they must be wrapped with GymEnv. This ensures compatibility with existing samplers and checkpointing when the envs are passed around internally in garage. Furthermore, classes inheriting from GymEnv should silently convert action_space and observation_space from gym.Space to akro.Space.
GymEnv handles all environments created by make(). It returns a different wrapper class instance if the input environment requires special handling. Currently supported wrapper classes are:
garage.envs.bullet.BulletEnv for Bullet-based gym environments.
See __new__() for details.
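The silent space conversion described above can be illustrated with properties that convert on access. This is a hypothetical sketch of the pattern only: `to_akro` and the fake env below are stand-ins, not the real gym/akro types or GymEnv's code (in garage the conversion is handled by the akro library):

```python
# Sketch of the lazy space-conversion pattern described above.
# `to_akro` is a hypothetical stand-in for gym.Space -> akro.Space.
def to_akro(space):
    """Stand-in converter: tag the space as converted."""
    return ('akro', space)


class SpaceConvertingWrapper:
    """Expose the inner env's spaces, converted on access."""

    def __init__(self, env):
        self._env = env

    @property
    def action_space(self):
        return to_akro(self._env.action_space)

    @property
    def observation_space(self):
        return to_akro(self._env.observation_space)


class FakeGymEnv:
    action_space = 'Discrete(4)'
    observation_space = 'Box(2,)'


wrapped = SpaceConvertingWrapper(FakeGymEnv())
```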
-
property
action_space
(self)¶ akro.Space: The action space specification.
-
property
observation_space
(self)¶ akro.Space: The observation space specification.
-
property
spec
(self)¶ garage.envs.env_spec.EnvSpec: The environment specification.
-
property
render_modes
(self)¶ list: A list of strings representing the supported render modes.
-
reset
(self)¶ Call reset on wrapped env.
- Returns
- The first observation conforming to
observation_space.
- dict: The episode-level information.
Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).
- Return type
numpy.ndarray
-
step
(self, action)¶ Call step on wrapped env.
- Parameters
action (np.ndarray) – An action provided by the agent.
- Returns
The environment step resulting from the action.
- Return type
- Raises
RuntimeError – if step() is called after the environment has been constructed and reset() has not been called.
-
render
(self, mode)¶ Renders the environment.
-
visualize
(self)¶ Creates a visualization of the environment.
-
close
(self)¶ Close the wrapped env.
-
class
MetaWorldSetTaskEnv
(benchmark=None, kind=None, wrapper=None, add_env_onehot=False)¶ Bases:
garage._environment.Environment
Environment form of a MetaWorld benchmark.
This class is generally less efficient than using a TaskSampler, if that can be used instead, since each instance of this class internally caches a copy of each environment in the benchmark.
In order to sample tasks from this environment, a benchmark must be passed at construction time.
- Parameters
benchmark (metaworld.Benchmark or None) – The benchmark to wrap.
wrapper (Callable[garage.Env, garage.Env] or None) – Wrapper to apply to env instances.
add_env_onehot (bool) – If true, a one-hot representing the current environment name will be added to the environments. Should only be used with multi-task benchmarks.
- Raises
ValueError – If kind is not ‘train’, ‘test’, or None. Also raised if add_env_onehot is used on a metaworld meta-learning (not multi-task) benchmark.
-
property
num_tasks
(self)¶ int: Returns number of tasks.
Part of the set_task environment protocol.
-
sample_tasks
(self, n_tasks)¶ Samples n_tasks tasks.
Part of the set_task environment protocol. To call this method, a benchmark must have been passed in at environment construction.
-
set_task
(self, task)¶ Set the task.
Part of the set_task environment protocol.
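The set_task environment protocol referenced by these three methods can be sketched with a toy class. This is a hypothetical minimal illustration of the protocol's shape (num_tasks, sample_tasks, set_task), not MetaWorldSetTaskEnv itself:

```python
# Toy illustration of the set_task environment protocol described above.
import random


class ToySetTaskEnv:
    def __init__(self, goals):
        self._goals = list(goals)
        self._goal = self._goals[0]

    @property
    def num_tasks(self):
        """int: Number of available tasks."""
        return len(self._goals)

    def sample_tasks(self, n_tasks):
        """Sample n_tasks tasks (here, with replacement)."""
        return [random.choice(self._goals) for _ in range(n_tasks)]

    def set_task(self, task):
        """Make `task` the active task for subsequent episodes."""
        self._goal = task


env = ToySetTaskEnv(goals=[(0, 1), (1, 0), (1, 1)])
tasks = env.sample_tasks(2)
env.set_task(tasks[0])
```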
-
property
action_space
(self)¶ akro.Space: The action space specification.
-
property
observation_space
(self)¶ akro.Space: The observation space specification.
-
property
spec
(self)¶ EnvSpec: The environment specification.
-
property
render_modes
(self)¶ list: A list of strings representing the supported render modes.
-
step
(self, action)¶ Step the wrapped env.
- Parameters
action (np.ndarray) – An action provided by the agent.
- Returns
The environment step resulting from the action.
- Return type
-
reset
(self)¶ Reset the wrapped env.
- Returns
- The first observation conforming to
observation_space.
- dict: The episode-level information.
Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).
- Return type
numpy.ndarray
-
render
(self, mode)¶ Render the wrapped environment.
-
visualize
(self)¶ Creates a visualization of the wrapped environment.
-
close
(self)¶ Close the wrapped env.
-
class
MultiEnvWrapper
(envs, sample_strategy=uniform_random_strategy, mode='add-onehot', env_names=None)¶ Bases:
garage.Wrapper
A wrapper class to handle multiple environments.
This wrapper adds an integer ‘task_id’ to env_info every timestep.
- Parameters
envs (list(Environment)) – A list of objects implementing Environment.
sample_strategy (function(int, int)) – Sample strategy to be used when sampling a new task.
mode (str) – A string from ‘vanilla’, ‘add-onehot’ and ‘del-onehot’. The type of observation to use.
‘vanilla’ provides the observation as it is. Use case: metaworld environments with MT* algorithms, gym environments with Task Embedding.
‘add-onehot’ will append a one-hot task id to the observation. Use case: gym environments with MT* algorithms.
‘del-onehot’ assumes a one-hot task id is appended to the observation, and removes it. Use case: metaworld environments with Task Embedding.
env_names (list(str)) – The names of the environments corresponding to envs. The index of an env_name must correspond to the index of the corresponding env in envs. An env_name in env_names must be unique.
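The three observation modes can be sketched with NumPy. This is a simplified illustration of the documented behavior, assuming the one-hot is appended at the end of the observation; it is not MultiEnvWrapper's actual code:

```python
import numpy as np


def apply_mode(obs, task_id, n_tasks, mode):
    """Transform a raw observation per the documented modes."""
    if mode == 'vanilla':
        return obs                        # observation as-is
    if mode == 'add-onehot':
        onehot = np.zeros(n_tasks)
        onehot[task_id] = 1.0
        return np.concatenate([obs, onehot])
    if mode == 'del-onehot':
        return obs[:-n_tasks]             # strip the trailing one-hot
    raise ValueError(mode)


obs = np.array([0.5, -0.2])
augmented = apply_mode(obs, task_id=1, n_tasks=3, mode='add-onehot')
```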
-
property
observation_space
(self)¶ Observation space.
- Returns
Observation space.
- Return type
akro.Box
-
property
spec
(self)¶ Describes the action and observation spaces of the wrapped envs.
- Returns
- the action and observation spaces of the
wrapped environments.
- Return type
-
property
task_space
(self)¶ Task Space.
- Returns
Task space.
- Return type
akro.Box
-
property
active_task_index
(self)¶ Index of active task env.
- Returns
Index of active task.
- Return type
-
reset
(self)¶ Sample new task and call reset on new task env.
- Returns
- The first observation conforming to
observation_space.
- dict: The episode-level information.
Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).
- Return type
numpy.ndarray
-
step
(self, action)¶ Step the active task env.
-
close
(self)¶ Close all task envs.
-
property
action_space
(self)¶ akro.Space: The action space specification.
-
property
render_modes
(self)¶ list: A list of strings representing the supported render modes.
-
render
(self, mode)¶ Render the wrapped environment.
-
visualize
(self)¶ Creates a visualization of the wrapped environment.
-
property
unwrapped
(self)¶ garage.Environment: The inner environment.
-
normalize
¶
-
class
PointEnv
(goal=np.array((1.0, 1.0), dtype=np.float32), arena_size=5.0, done_bonus=0.0, never_done=False, max_episode_length=math.inf)¶ Bases:
garage.Environment
A simple 2D point environment.
- Parameters
goal (np.ndarray) – A 2D array representing the goal position
arena_size (float) – The size of arena where the point is constrained within (-arena_size, arena_size) in each dimension
done_bonus (float) – A numerical bonus added to the reward once the point has reached the goal
never_done (bool) – Never send a done signal, even if the agent achieves the goal
max_episode_length (int) – The maximum steps allowed for an episode.
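A plausible sketch of the step dynamics these parameters control is given below. This is an assumption about a typical point-environment implementation (distance-based reward, clipping to the arena, bonus at the goal), not PointEnv's actual code:

```python
import numpy as np


def point_step(pos, action, goal, arena_size=5.0, done_bonus=0.0,
               never_done=False):
    """One step of a simple 2D point env: move, reward by distance."""
    # Constrain the point within (-arena_size, arena_size) per dimension.
    new_pos = np.clip(pos + action, -arena_size, arena_size)
    dist = np.linalg.norm(new_pos - goal)
    reward = -dist                       # closer to the goal is better
    at_goal = dist < 1e-2
    if at_goal:
        reward += done_bonus             # bonus once the goal is reached
    done = at_goal and not never_done    # never_done suppresses done
    return new_pos, reward, done


goal = np.array([1.0, 1.0])
new_pos, reward, done = point_step(np.zeros(2), np.array([1.0, 1.0]),
                                   goal, done_bonus=10.0)
```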
-
property
action_space
(self)¶ akro.Space: The action space specification.
-
property
observation_space
(self)¶ akro.Space: The observation space specification.
-
property
spec
(self)¶ EnvSpec: The environment specification.
-
property
render_modes
(self)¶ list: A list of strings representing the supported render modes.
-
reset
(self)¶ Reset the environment.
- Returns
- The first observation conforming to
observation_space.
- dict: The episode-level information.
Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).
- Return type
numpy.ndarray
-
step
(self, action)¶ Step the environment.
- Parameters
action (np.ndarray) – An action provided by the agent.
- Returns
The environment step resulting from the action.
- Return type
- Raises
RuntimeError – if step() is called after the environment has been constructed and reset() has not been called.
-
render
(self, mode)¶ Renders the environment.
-
visualize
(self)¶ Creates a visualization of the environment.
-
close
(self)¶ Close the env.
-
sample_tasks
(self, num_tasks)¶ Sample a list of num_tasks tasks.
-
class
TaskNameWrapper
(env, *, task_name=None, task_id=None)¶ Bases:
garage.Wrapper
Add task_name or task_id to env infos.
- Parameters
-
step
(self, action)¶ gym.Env step for the active task env.
- Parameters
action (np.ndarray) – Action performed by the agent in the environment.
- Returns
np.ndarray: Agent’s observation of the current environment.
float: Amount of reward yielded by the previous action.
bool: True iff the episode has ended.
dict[str, np.ndarray]: Contains auxiliary diagnostic information about this time-step.
- Return type
-
property
action_space
(self)¶ akro.Space: The action space specification.
-
property
observation_space
(self)¶ akro.Space: The observation space specification.
-
property
spec
(self)¶ EnvSpec: The environment specification.
-
property
render_modes
(self)¶ list: A list of strings representing the supported render modes.
-
reset
(self)¶ Reset the wrapped env.
- Returns
- The first observation conforming to
observation_space.
- dict: The episode-level information.
Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).
- Return type
numpy.ndarray
-
render
(self, mode)¶ Render the wrapped environment.
-
visualize
(self)¶ Creates a visualization of the wrapped environment.
-
close
(self)¶ Close the wrapped env.
-
property
unwrapped
(self)¶ garage.Environment: The inner environment.
-
class
TaskOnehotWrapper
(env, task_index, n_total_tasks)¶ Bases:
garage.Wrapper
Append a one-hot task representation to an environment.
See TaskOnehotWrapper.wrap_env_list for the recommended way of creating this class.
- Parameters
env (Environment) – The environment to wrap.
task_index (int) – The index of this task among the tasks.
n_total_tasks (int) – The number of total tasks.
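The core observation transformation this wrapper performs can be sketched in NumPy. This is a simplified illustration consistent with the parameters above, not the wrapper's actual code:

```python
import numpy as np


def append_task_onehot(obs, task_index, n_total_tasks):
    """Append a one-hot encoding of task_index to an observation."""
    onehot = np.zeros(n_total_tasks)
    onehot[task_index] = 1.0
    return np.concatenate([obs, onehot])


obs = np.array([0.1, 0.2])
wrapped_obs = append_task_onehot(obs, task_index=2, n_total_tasks=4)
```

Because the encoding is determined by task_index among n_total_tasks, wrapping the same environments in a different order would yield different (inconsistent) encodings, which is why wrap_env_list fixes the indices from the list order.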
-
property
observation_space
(self)¶ akro.Space: The observation space specification.
-
property
spec
(self)¶ Return the environment specification.
- Returns
The environment specification.
- Return type
-
reset
(self)¶ Sample new task and call reset on new task env.
- Returns
- The first observation conforming to
observation_space.
- dict: The episode-level information.
Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).
- Return type
numpy.ndarray
-
step
(self, action)¶ Environment step for the active task env.
- Parameters
action (np.ndarray) – Action performed by the agent in the environment.
- Returns
The environment step resulting from the action.
- Return type
-
classmethod
wrap_env_list
(cls, envs)¶ Wrap a list of environments, giving each environment a one-hot.
This is the primary way of constructing instances of this class. It’s mostly useful when training multi-task algorithms using a multi-task aware sampler.
For example:
envs = get_mt10_envs()
wrapped = TaskOnehotWrapper.wrap_env_list(envs)
sampler = trainer.make_sampler(LocalSampler, env=wrapped)
- Parameters
envs (list[Environment]) – List of environments to wrap. Note that the order these environments are passed in determines the value of their one-hot encoding. It is essential that this list is always in the same order, or the resulting encodings will be inconsistent.
- Returns
The wrapped environments.
- Return type
-
classmethod
wrap_env_cons_list
(cls, env_cons)¶ Wrap a list of environment constructors, giving each a one-hot.
This function is useful if you want to avoid constructing any environments in the main experiment process, and are using a multi-task aware remote sampler (i.e. ~RaySampler).
For example:
env_constructors = get_mt10_env_cons()
wrapped = TaskOnehotWrapper.wrap_env_cons_list(env_constructors)
env_updates = [NewEnvUpdate(wrapped_con) for wrapped_con in wrapped]
sampler = trainer.make_sampler(RaySampler, env=env_updates)
- Parameters
env_cons (list[Callable[Environment]]) – List of environment constructors to wrap. Note that the order these constructors are passed in determines the value of their one-hot encoding. It is essential that this list is always in the same order, or the resulting encodings will be inconsistent.
- Returns
The wrapped environments.
- Return type
list[Callable[TaskOnehotWrapper]]
-
property
action_space
(self)¶ akro.Space: The action space specification.
-
property
render_modes
(self)¶ list: A list of strings representing the supported render modes.
-
render
(self, mode)¶ Render the wrapped environment.
-
visualize
(self)¶ Creates a visualization of the wrapped environment.
-
close
(self)¶ Close the wrapped env.
-
property
unwrapped
(self)¶ garage.Environment: The inner environment.