garage.envs.multi_env_wrapper module¶
A wrapper env that handles multiple tasks from different envs.
Useful while training multi-task reinforcement learning algorithms. It provides observations augmented with one-hot representation of tasks.
class MultiEnvWrapper(envs, sample_strategy=<function uniform_random_strategy>, mode='add-onehot', env_names=None)[source]¶
Bases: gym.core.Wrapper
A wrapper class to handle multiple environments.
This wrapper adds an integer ‘task_id’ to env_info every timestep.
Parameters:
- envs (list(gym.Env)) – A list of objects implementing gym.Env.
- sample_strategy (function(int, int)) – Sampling strategy to use when sampling a new task.
- mode (str) – One of 'vanilla', 'add-onehot', or 'del-onehot'; the type of observation to use.
  - 'vanilla' provides the observation as-is. Use cases: metaworld environments with MT* algorithms, gym environments with Task Embedding.
  - 'add-onehot' appends a one-hot task id to the observation. Use case: gym environments with MT* algorithms.
  - 'del-onehot' assumes a one-hot task id is appended to the observation, and removes it. Use case: metaworld environments with Task Embedding.
- env_names (list(str)) – The names of the environments corresponding to envs. The index of an env_name must match the index of the corresponding env in envs. Each env_name in env_names must be unique.
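The three observation modes can be illustrated with a small NumPy sketch. This is a hypothetical stand-in for the wrapper's internal behavior, not the actual garage implementation; `augment_observation` and its arguments are illustrative names.

```python
import numpy as np

def augment_observation(obs, task_id, num_tasks, mode='add-onehot'):
    """Illustrative sketch of the wrapper's three observation modes."""
    one_hot = np.zeros(num_tasks)
    one_hot[task_id] = 1.0
    if mode == 'vanilla':
        # Pass the observation through unchanged.
        return obs
    elif mode == 'add-onehot':
        # Append the one-hot task id to the observation.
        return np.concatenate([obs, one_hot])
    elif mode == 'del-onehot':
        # Assume the last num_tasks entries are a one-hot id and drop them.
        return obs[:-num_tasks]
    raise ValueError('Invalid mode: {}'.format(mode))

obs = np.array([0.5, -0.2, 1.0])
print(augment_observation(obs, task_id=1, num_tasks=3))
# [ 0.5 -0.2  1.   0.   1.   0. ]
```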
observation_space¶
Observation space.
Returns: Observation space.
Return type: akro.Box
reset(**kwargs)[source]¶
Sample a new task and call reset on the new task's env.
Parameters: kwargs (dict) – Keyword arguments to be passed to gym.Env.reset.
Returns: Active task's one-hot representation + observation.
Return type: numpy.ndarray
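The reset flow can be sketched as: sample a task index with the sample strategy, then build the one-hot for the active task. The `uniform_random_strategy` body and its `(num_tasks, last_task)` signature here are assumptions mirroring the default argument shown in the class signature, not the library's actual code.

```python
import numpy as np

def uniform_random_strategy(num_tasks, _last_task):
    # Assumed behavior: pick any task index uniformly at random.
    return np.random.randint(0, num_tasks)

# On reset, sample a new active task and build its one-hot id, which the
# wrapper prepends/appends to the observation depending on `mode`.
num_tasks = 4
task_id = uniform_random_strategy(num_tasks, None)
one_hot = np.zeros(num_tasks)
one_hot[task_id] = 1.0
print(task_id, one_hot)
```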
spec¶
Describes the action and observation spaces of the wrapped envs.
Returns: The action and observation spaces of the wrapped environments.
Return type: garage.envs.EnvSpec
step(action)[source]¶
gym.Env step for the active task env.
Parameters: action (object) – Action to be passed to gym.Env.step.
Returns:
- object: agent’s observation of the current environment
- float: amount of reward returned after previous action
- bool: whether the episode has ended
- dict: auxiliary diagnostic information, including the integer ‘task_id’
Return type: object
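The step contract described above (forward the action to the active task's env, then inject an integer 'task_id' into the info dict every timestep) can be sketched as follows. `wrapped_step` and `dummy_step` are illustrative names, not part of the garage API.

```python
def wrapped_step(env_step_fn, action, active_task_id):
    # Delegate to the active env, then tag the info dict with the task id.
    obs, reward, done, info = env_step_fn(action)
    info['task_id'] = active_task_id
    return obs, reward, done, info

def dummy_step(action):
    # Stub env step returning a fixed (obs, reward, done, info) tuple.
    return [0.0], 1.0, False, {}

obs, reward, done, info = wrapped_step(dummy_step, action=0, active_task_id=2)
print(info)  # {'task_id': 2}
```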
task_space¶
Task space.
Returns: Task space.
Return type: akro.Box