garage.envs.task_onehot_wrapper module

Wrapper for appending one-hot task encodings to individual task envs.

See TaskOnehotWrapper.wrap_env_list for the main way of using this module.

class TaskOnehotWrapper(env, task_index, n_total_tasks)

Bases: gym.core.Wrapper

Append a one-hot task representation to an environment.

See TaskOnehotWrapper.wrap_env_list for the recommended way of creating this class.

Parameters:
    * env (gym.Env) – The environment to wrap.
    * task_index (int) – The index of this task among the tasks.
    * n_total_tasks (int) – The total number of tasks.
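To make the role of task_index and n_total_tasks concrete, here is a minimal sketch of a one-hot encoding being joined to an observation. It is not garage's implementation: the helper names are hypothetical, plain Python lists stand in for numpy arrays, and placing the encoding after the observation is an assumption.

```python
def one_hot(task_index, n_total_tasks):
    """Return a list of length n_total_tasks with 1.0 at task_index."""
    if not 0 <= task_index < n_total_tasks:
        raise ValueError('task_index must be in [0, n_total_tasks)')
    encoding = [0.0] * n_total_tasks
    encoding[task_index] = 1.0
    return encoding


def append_one_hot(observation, task_index, n_total_tasks):
    """Join an observation with its task's one-hot encoding.

    Assumption: the encoding is placed after the observation.
    """
    return list(observation) + one_hot(task_index, n_total_tasks)


# A 2-dimensional observation from task 2 of 4 becomes 6-dimensional.
augmented = append_one_hot([0.5, -1.2], task_index=2, n_total_tasks=4)
print(augmented)
```

The augmented observation always has n_total_tasks extra entries, so a policy trained on several wrapped environments sees the same observation size for every task.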

reset(**kwargs)

Sample new task and call reset on new task env.

Parameters:	kwargs (dict) – Keyword arguments to be passed to env.reset
Returns:	active task one-hot representation + observation
Return type:	numpy.ndarray

spec

Return the environment specification.

Returns:	The environment specification.
Return type:	garage.envs.env_spec.EnvSpec

step(action)

gym.Env step for the active task env.

Parameters:	action (np.ndarray) – Action performed by the agent in the environment.
Returns:
    * np.ndarray: Agent’s observation of the current environment.
    * float: Amount of reward yielded by previous action.
    * bool: True iff the episode has ended.
    * dict[str, np.ndarray]: Contains auxiliary diagnostic information about this time-step.
Return type:	tuple
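The reset and step behaviour described above can be sketched with a toy wrapper. This is an illustration under stated assumptions, not the garage source: ToyEnv and ToyOnehotWrapper are hypothetical stand-ins that do not depend on gym, and the one-hot is assumed to follow the observation.

```python
class ToyEnv:
    """Hypothetical stand-in for a gym.Env with a 2-dimensional observation."""

    def reset(self):
        self._t = 0
        return [0.0, 0.0]

    def step(self, action):
        self._t += 1
        observation = [float(self._t), float(action)]
        reward = 1.0
        done = self._t >= 3  # Toy episodes last three steps.
        return observation, reward, done, {}


class ToyOnehotWrapper:
    """Illustrative wrapper; not garage's implementation."""

    def __init__(self, env, task_index, n_total_tasks):
        self._env = env
        self._one_hot = [0.0] * n_total_tasks
        self._one_hot[task_index] = 1.0

    def _augment(self, observation):
        # Assumption: the one-hot is placed after the observation.
        return list(observation) + self._one_hot

    def reset(self, **kwargs):
        return self._augment(self._env.reset(**kwargs))

    def step(self, action):
        # Only the observation changes; reward, done and info pass through.
        observation, reward, done, info = self._env.step(action)
        return self._augment(observation), reward, done, info


env = ToyOnehotWrapper(ToyEnv(), task_index=1, n_total_tasks=3)
first_obs = env.reset()
obs, reward, done, info = env.step(0.5)
```

Both reset and step return an observation with the same fixed encoding, since the task index is set once when the wrapper is created.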

classmethod wrap_env_cons_list(env_cons)

Wrap a list of environment constructors, giving each a one-hot.

This function is useful if you want to avoid constructing any environments in the main experiment process and are using a multi-task aware remote sampler (e.g. RaySampler).

For example:

.. code-block:: python

    env_constructors = get_mt10_env_cons()
    wrapped = TaskOnehotWrapper.wrap_env_cons_list(env_constructors)
    env_updates = [NewEnvUpdate(wrapped_con)
                   for wrapped_con in wrapped]
    sampler = runner.make_sampler(RaySampler, env=env_updates)

Parameters:	env_cons (list[Callable[gym.Env]]) – List of environment constructors to wrap. Note that the order these constructors are passed in determines the value of their one-hot encoding. It is essential that this list is always in the same order, or the resulting encodings will be inconsistent.
Returns:	The wrapped environments.
Return type:	list[Callable[TaskOnehotWrapper]]
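The idea of wrapping constructors rather than environments can be sketched as follows. This is a hedged illustration, not garage's code: make_toy_env and wrap_with_index are hypothetical, and dictionaries stand in for real environments. It shows that nothing is built until a wrapped constructor is called, and that each constructor's position in the list fixes its one-hot index.

```python
from functools import partial

constructed = []  # Records which toy environments have actually been built.


def make_toy_env(name):
    """Hypothetical constructor; building the env is the expensive step."""
    constructed.append(name)
    return {'name': name}


def wrap_with_index(env_con, task_index, n_total_tasks):
    """Build the env on demand, then attach its one-hot (toy wrapper)."""
    env = env_con()
    env['one_hot'] = [1.0 if i == task_index else 0.0
                      for i in range(n_total_tasks)]
    return env


env_cons = [partial(make_toy_env, name) for name in ('reach', 'push', 'pick')]

# The position of each constructor in env_cons becomes its task index.
wrapped_cons = [
    partial(wrap_with_index, con, index, len(env_cons))
    for index, con in enumerate(env_cons)
]

built_before = list(constructed)  # Still empty: nothing has been built.
push_env = wrapped_cons[1]()      # For instance, called inside a worker.
```

functools.partial binds each index when the list is created, which avoids the late-binding pitfall of defining lambdas inside a loop.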

classmethod wrap_env_list(envs)

Wrap a list of environments, giving each environment a one-hot.

This is the primary way of constructing instances of this class. It’s mostly useful when training multi-task algorithms using a multi-task aware sampler.

For example:

.. code-block:: python

    envs = get_mt10_envs()
    wrapped = TaskOnehotWrapper.wrap_env_list(envs)
    sampler = runner.make_sampler(LocalSampler, env=wrapped)

Parameters:	envs (list[gym.Env]) – List of environments to wrap. Note that the order these environments are passed in determines the value of their one-hot encoding. It is essential that this list is always in the same order, or the resulting encodings will be inconsistent.
Returns:	The wrapped environments.
Return type:	list[TaskOnehotWrapper]
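Because list order determines each encoding, it can help to derive the order deterministically before wrapping. The sketch below shows one possible approach, sorting by task name. The helper and the task names are hypothetical, and this is not something garage is known to do for you.

```python
def task_indices(task_names):
    """Map each task name to a one-hot index using a sorted, fixed order."""
    return {name: index for index, name in enumerate(sorted(task_names))}


# Two runs that happen to list the same tasks in different orders.
run_a = task_indices(['push', 'reach', 'pick'])
run_b = task_indices(['pick', 'push', 'reach'])

print(run_a == run_b)
```

With a fixed order like this, a policy saved in one run sees the same encoding for each task when it is loaded in another.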