garage.envs.multi_env_wrapper

A wrapper env that handles multiple tasks from different envs.

Useful while training multi-task reinforcement learning algorithms. It provides observations augmented with a one-hot representation of the active task.

round_robin_strategy(num_tasks, last_task=None)

A function for sampling tasks in round robin fashion.

Parameters
  • num_tasks (int) – Total number of tasks.

  • last_task (int) – Previously sampled task.

Returns

task id.

Return type

int
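For intuition, the round-robin rule is just an increment modulo the number of tasks. A minimal sketch with equivalent behavior (not necessarily the library source):

    def round_robin_strategy(num_tasks, last_task=None):
        """Return the next task id, cycling 0, 1, ..., num_tasks - 1."""
        if last_task is None:
            return 0  # nothing sampled yet; start with task 0
        return (last_task + 1) % num_tasks

    # round_robin_strategy(3) -> 0; then 0 -> 1, 1 -> 2, 2 -> 0, ...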

uniform_random_strategy(num_tasks, _)

A function for sampling tasks uniformly at random.

Parameters
  • num_tasks (int) – Total number of tasks.

  • _ (object) – Ignored by this sampling strategy.

Returns

task id.

Return type

int
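A matching sketch for the uniform strategy; the unused second parameter exists only so both strategies share the signature function(num_tasks, last_task):

    import random

    def uniform_random_strategy(num_tasks, _):
        """Return a task id drawn uniformly from 0 .. num_tasks - 1."""
        return random.randint(0, num_tasks - 1)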

class MultiEnvWrapper(envs, sample_strategy=uniform_random_strategy, mode='add-onehot', env_names=None)

Bases: garage.Wrapper


A wrapper class to handle multiple environments.

This wrapper adds an integer ‘task_id’ to env_info every timestep.

Parameters
  • envs (list(Environment)) – A list of objects implementing Environment.

  • sample_strategy (function(int, int)) – Sample strategy to be used when sampling a new task.

  • mode (str) – A string from ‘vanilla’, ‘add-onehot’ and ‘del-onehot’; the type of observation to use.

    • ‘vanilla’ provides the observation as-is. Use case: metaworld environments with MT* algorithms, gym environments with Task Embedding.

    • ‘add-onehot’ appends a one-hot task id to the observation. Use case: gym environments with MT* algorithms.

    • ‘del-onehot’ assumes a one-hot task id is appended to the observation, and removes it. Use case: metaworld environments with Task Embedding.

  • env_names (list(str)) – The names of the environments corresponding to envs. The index of an env_name must correspond to the index of the corresponding env in envs. An env_name in env_names must be unique.
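A hedged construction sketch. GymEnv and CartPole-v1 are stand-ins for any list of garage Environments; with mode='add-onehot', each observation is extended by a one-hot task id of length len(envs):

    from garage.envs import GymEnv
    from garage.envs.multi_env_wrapper import (MultiEnvWrapper,
                                               round_robin_strategy)

    envs = [GymEnv('CartPole-v1'), GymEnv('CartPole-v1')]
    mtenv = MultiEnvWrapper(envs,
                            sample_strategy=round_robin_strategy,
                            mode='add-onehot',
                            env_names=['cartpole-a', 'cartpole-b'])  # names must be unique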

property observation_space

Observation space.

Returns

Observation space.

Return type

akro.Box
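Continuing the sketch above: in ‘add-onehot’ mode the flattened observation space gains one dimension per task (an illustration of the documented behavior, assuming a flat base observation):

    base_dim = envs[0].observation_space.flat_dim   # 4 for CartPole-v1
    assert mtenv.observation_space.flat_dim == base_dim + len(envs)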

property spec

Describes the action and observation spaces of the wrapped envs.

Returns

the action and observation spaces of the wrapped environments.

Return type

EnvSpec

property num_tasks

Total number of tasks.

Returns

number of tasks.

Return type

int

property task_space

Task space.

Returns

Task space.

Return type

akro.Box

property active_task_index

Index of active task env.

Returns

Index of active task.

Return type

int

property action_space

The action space specification.

Type

akro.Space

property render_modes

A list of strings representing the supported render modes.

Type

list

property unwrapped

The inner environment.

Type

garage.Environment

reset()

Sample a new task and call reset() on the new task's env.

Returns

numpy.ndarray: The first observation conforming to observation_space.

dict: The episode-level information. Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).

Return type

tuple(numpy.ndarray, dict)

step(action)

Step the active task env.

Parameters

action (object) – object to be passed in Environment.step(action)

Returns

The environment step resulting from the action.

Return type

EnvStep
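Putting reset() and step() together, a hedged rollout sketch continuing the example above; note the ‘task_id’ key the wrapper adds to env_info at every step:

    obs, episode_info = mtenv.reset()        # samples a task, resets its env
    for _ in range(200):
        es = mtenv.step(mtenv.action_space.sample())
        task_id = es.env_info['task_id']     # integer index of the active task
        if es.last:                          # episode ended (terminal or timeout)
            obs, _ = mtenv.reset()           # strategy picks the next task
        else:
            obs = es.observation
    mtenv.close()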

close()

Close all task envs.

render(mode)

Render the wrapped environment.

Parameters

mode (str) – the mode to render with. The string must be present in self.render_modes.

Returns

the return value from render, depending on each env.

Return type

object

visualize()

Create a visualization of the wrapped environment.