garage.envs.task_onehot_wrapper

Wrapper for appending one-hot task encodings to individual task envs.

See ~TaskOnehotWrapper.wrap_env_list for the main way of using this module.

class TaskOnehotWrapper(env, task_index, n_total_tasks)

Bases: garage.Wrapper

Inheritance diagram of garage.envs.task_onehot_wrapper.TaskOnehotWrapper

Append a one-hot task representation to an environment.

See TaskOnehotWrapper.wrap_env_list for the recommended way of creating this class.

Parameters
  • env (Environment) – The environment to wrap.

  • task_index (int) – The index of this task among the tasks.

  • n_total_tasks (int) – The number of total tasks.

property observation_space(self)

akro.Space: The observation space specification.

property spec(self)

Return the environment specification.

Returns

The envionrment specification.

Return type

EnvSpec

reset(self)

Sample new task and call reset on new task env.

Returns

The first observation conforming to

observation_space.

dict: The episode-level information.

Note that this is not part of env_info provided in step(). It contains information of he entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL.)

Return type

numpy.ndarray

step(self, action)

Environment step for the active task env.

Parameters

action (np.ndarray) – Action performed by the agent in the environment.

Returns

The environment step resulting from the action.

Return type

EnvStep

classmethod wrap_env_list(cls, envs)

Wrap a list of environments, giving each environment a one-hot.

This is the primary way of constructing instances of this class. It’s mostly useful when training multi-task algorithms using a multi-task aware sampler.

For example: ‘’’ .. code-block:: python

envs = get_mt10_envs() wrapped = TaskOnehotWrapper.wrap_env_list(envs) sampler = trainer.make_sampler(LocalSampler, env=wrapped)

‘’’

Parameters
  • envs (list[Environment]) – List of environments to wrap. Note

  • the (that) – order these environments are passed in determines the value of their one-hot encoding. It is essential that this list is always in the same order, or the resulting encodings will be inconsistent.

Returns

The wrapped environments.

Return type

list[TaskOnehotWrapper]

classmethod wrap_env_cons_list(cls, env_cons)

Wrap a list of environment constructors, giving each a one-hot.

This function is useful if you want to avoid constructing any environments in the main experiment process, and are using a multi-task aware remote sampler (i.e. ~RaySampler).

For example: ‘’’ .. code-block:: python

env_constructors = get_mt10_env_cons() wrapped = TaskOnehotWrapper.wrap_env_cons_list(env_constructors) env_updates = [NewEnvUpdate(wrapped_con)

for wrapped_con in wrapped]

sampler = trainer.make_sampler(RaySampler, env=env_updates)

‘’’

Parameters
  • env_cons (list[Callable[Environment]]) – List of environment

  • constructor – to wrap. Note that the order these constructors are passed in determines the value of their one-hot encoding. It is essential that this list is always in the same order, or the resulting encodings will be inconsistent.

Returns

The wrapped environments.

Return type

list[Callable[TaskOnehotWrapper]]

property action_space(self)

akro.Space: The action space specification.

property render_modes(self)

list: A list of string representing the supported render modes.

render(self, mode)

Render the wrapped environment.

Parameters

mode (str) – the mode to render with. The string must be present in self.render_modes.

Returns

the return value for render, depending on each env.

Return type

object

visualize(self)

Creates a visualization of the wrapped environment.

close(self)

Close the wrapped env.

property unwrapped(self)

garage.Environment: The inner environment.