garage.envs

Garage wrappers for gym environments.

class GridWorldEnv(desc='4x4', max_episode_length=None)

Bases: garage.Environment

Inheritance diagram of garage.envs.GridWorldEnv

A simple 2D grid environment.

‘S’ : starting point
‘F’ or ‘.’: free space
‘W’ or ‘x’: wall
‘H’ or ‘o’: hole (terminates episode)
‘G’ : goal
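A minimal usage sketch (illustrative, not taken from the garage examples): it builds the bundled ‘4x4’ map, resets, and takes random actions until the episode ends. The EnvStep attribute names used below (reward, last) are assumptions about garage’s EnvStep.

    from garage.envs import GridWorldEnv

    env = GridWorldEnv(desc='4x4', max_episode_length=100)
    first_obs, episode_info = env.reset()      # reset() must be called before step()

    done = False
    while not done:
        # Action map: 0 = left, 1 = down, 2 = right, 3 = up
        es = env.step(env.action_space.sample())
        done = es.last                          # assumed: True at a hole ('H'), the goal, or timeout
    env.close()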
property action_space(self)

akro.Space: The action space specification.

property observation_space(self)

akro.Space: The observation space specification.

property spec(self)

EnvSpec: The environment specification.

property render_modes(self)

list: A list of strings representing the supported render modes.

reset(self)

Resets the environment.

Returns

numpy.ndarray: The first observation conforming to observation_space.

dict: The episode-level information. Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).

Return type

tuple

step(self, action)

Steps the environment.

Action map: 0 = left, 1 = down, 2 = right, 3 = up.

Parameters

action (int) – an int encoding the action

Returns

The environment step resulting from the action.

Return type

EnvStep

Raises
  • RuntimeError – if step() is called after the environment has been constructed and reset() has not been called.

  • NotImplementedError – if a next step in self._desc does not match known state type.

render(self, mode)

Renders the environment.

Parameters

mode (str) – the mode to render with. The string must be present in Environment.render_modes.

visualize(self)

Creates a visualization of the environment.

close(self)

Close the env.

class GymEnv(env, is_image=False, max_episode_length=None)

Bases: garage.Environment

Inheritance diagram of garage.envs.GymEnv

Returns an abstract Garage wrapper class for gym.Env.

In order to provide pickling (serialization) and parameterization for gym.Env instances, they must be wrapped with GymEnv. This ensures compatibility with existing samplers and checkpointing when the envs are passed internally around garage.

Furthermore, classes inheriting from GymEnv should silently convert action_space and observation_space from gym.Space to akro.Space.

GymEnv handles all environments created by gym.make().

It returns a different wrapper class instance if the input environment requires special handling. Current supported wrapper classes are:

garage.envs.bullet.BulletEnv for Bullet-based gym environments.

See __new__() for details.
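A short usage sketch (the environment id and horizon below are placeholders, not part of the garage docs):

    from garage.envs import GymEnv

    # Wraps gym.make('CartPole-v1') and exposes akro spaces
    env = GymEnv('CartPole-v1', max_episode_length=200)
    print(env.action_space)        # akro.Space converted from gym.Space
    print(env.observation_space)
    env.close()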

property action_space(self)

akro.Space: The action space specification.

property observation_space(self)

akro.Space: The observation space specification.

property spec(self)

garage.envs.env_spec.EnvSpec: The environment specification.

property render_modes(self)

list: A list of strings representing the supported render modes.

reset(self)

Call reset on wrapped env.

Returns

numpy.ndarray: The first observation conforming to observation_space.

dict: The episode-level information. Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).

Return type

tuple

step(self, action)

Call step on wrapped env.

Parameters

action (np.ndarray) – An action provided by the agent.

Returns

The environment step resulting from the action.

Return type

EnvStep

Raises

RuntimeError – if step() is called after the environment has been constructed and reset() has not been called.

render(self, mode)

Renders the environment.

Parameters

mode (str) – the mode to render with. The string must be present in self.render_modes.

Returns

the return value for render, depending on each env.

Return type

object

visualize(self)

Creates a visualization of the environment.

close(self)

Close the wrapped env.

class MetaWorldSetTaskEnv(benchmark=None, kind=None, wrapper=None, add_env_onehot=False)

Bases: garage._environment.Environment

Inheritance diagram of garage.envs.MetaWorldSetTaskEnv

Environment form of a MetaWorld benchmark.

This class is generally less efficient than using a TaskSampler, if that can be used instead, since each instance of this class internally caches a copy of each environment in the benchmark.

In order to sample tasks from this environment, a benchmark must be passed at construction time.

Parameters
  • benchmark (metaworld.Benchmark or None) – The benchmark to wrap.

  • kind (str or None) – Whether to use test or train tasks.

  • wrapper (Callable[garage.Env, garage.Env] or None) – Wrapper to apply to env instances.

  • add_env_onehot (bool) – If true, a one-hot representing the current environment name will be added to the environments. Should only be used with multi-task benchmarks.

Raises

ValueError – If kind is not ‘train’, ‘test’, or None. Also raised if add_env_onehot is used on a metaworld meta-learning (not multi-task) benchmark.
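An illustrative sketch, assuming the optional metaworld package is installed; the benchmark name and task count are examples only:

    import metaworld

    from garage.envs import MetaWorldSetTaskEnv

    benchmark = metaworld.ML1('push-v1')             # example meta-learning benchmark name
    env = MetaWorldSetTaskEnv(benchmark=benchmark, kind='train')
    tasks = env.sample_tasks(5)                      # set_task protocol: sample, then set
    env.set_task(tasks[0])
    obs, episode_info = env.reset()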

property num_tasks(self)

int: Returns number of tasks.

Part of the set_task environment protocol.

sample_tasks(self, n_tasks)

Samples n_tasks tasks.

Part of the set_task environment protocol. To call this method, a benchmark must have been passed in at environment construction.

Parameters

n_tasks (int) – Number of tasks to sample.

Returns

A list of task objects to pass back to set_task.

Return type

list[dict[str,object]]

set_task(self, task)

Set the task.

Part of the set_task environment protocol.

Parameters

task (dict[str,object]) – Task object from sample_tasks.

property action_space(self)

akro.Space: The action space specification.

property observation_space(self)

akro.Space: The observation space specification.

property spec(self)

EnvSpec: The envionrment specification.

property render_modes(self)

list: A list of strings representing the supported render modes.

step(self, action)

Step the wrapped env.

Parameters

action (np.ndarray) – An action provided by the agent.

Returns

The environment step resulting from the action.

Return type

EnvStep

reset(self)

Reset the wrapped env.

Returns

numpy.ndarray: The first observation conforming to observation_space.

dict: The episode-level information. Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).

Return type

tuple

render(self, mode)

Render the wrapped environment.

Parameters

mode (str) – the mode to render with. The string must be present in self.render_modes.

Returns

the return value for render, depending on each env.

Return type

object

visualize(self)

Creates a visualization of the wrapped environment.

close(self)

Close the wrapped env.

class MultiEnvWrapper(envs, sample_strategy=uniform_random_strategy, mode='add-onehot', env_names=None)

Bases: garage.Wrapper

Inheritance diagram of garage.envs.MultiEnvWrapper

A wrapper class to handle multiple environments.

This wrapper adds an integer ‘task_id’ to env_info every timestep.

Parameters
  • envs (list(Environment)) – A list of objects implementing Environment.

  • sample_strategy (function(int, int)) – Sample strategy to be used when sampling a new task.

  • mode (str) – A string from ‘vanilla’, ‘add-onehot’ and ‘del-onehot’; the type of observation to use.

    • ‘vanilla’ provides the observation as it is. Use case: metaworld environments with MT* algorithms, gym environments with Task Embedding.

    • ‘add-onehot’ appends a one-hot task id to the observation. Use case: gym environments with MT* algorithms.

    • ‘del-onehot’ assumes a one-hot task id is appended to the observation and removes it. Use case: metaworld environments with Task Embedding.

  • env_names (list(str)) – The names of the environments corresponding to envs. The index of an env_name must correspond to the index of the corresponding env in envs. An env_name in env_names must be unique.
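An illustrative sketch of wrapping two environments (environment ids and names are placeholders; attribute access on the returned EnvStep is an assumption):

    from garage.envs import GymEnv, MultiEnvWrapper

    envs = [GymEnv('CartPole-v0'), GymEnv('CartPole-v1')]
    env = MultiEnvWrapper(envs,
                          mode='add-onehot',
                          env_names=['cartpole-v0', 'cartpole-v1'])
    obs, episode_info = env.reset()              # samples a task, then resets that env
    es = env.step(env.action_space.sample())
    print(es.env_info['task_id'])                # integer task id added every timestep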

property observation_space(self)

Observation space.

Returns

Observation space.

Return type

akro.Box

property spec(self)

Describes the action and observation spaces of the wrapped envs.

Returns

the action and observation spaces of the wrapped environments.

Return type

EnvSpec

property num_tasks(self)

Total number of tasks.

Returns

number of tasks.

Return type

int

property task_space(self)

Task Space.

Returns

Task space.

Return type

akro.Box

property active_task_index(self)

Index of active task env.

Returns

Index of active task.

Return type

int

reset(self)

Sample new task and call reset on new task env.

Returns

numpy.ndarray: The first observation conforming to observation_space.

dict: The episode-level information. Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).

Return type

tuple

step(self, action)

Step the active task env.

Parameters

action (object) – an action to be passed to Environment.step(action) of the active task env

Returns

The environment step resulting from the action.

Return type

EnvStep

close(self)

Close all task envs.

property action_space(self)

akro.Space: The action space specification.

property render_modes(self)

list: A list of strings representing the supported render modes.

render(self, mode)

Render the wrapped environment.

Parameters

mode (str) – the mode to render with. The string must be present in self.render_modes.

Returns

the return value for render, depending on each env.

Return type

object

visualize(self)

Creates a visualization of the wrapped environment.

property unwrapped(self)

garage.Environment: The inner environment.

normalize
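A minimal sketch of typical normalize usage, assuming it is garage’s observation/reward-normalizing environment wrapper and that the keyword arguments shown exist; the environment id is a placeholder:

    from garage.envs import GymEnv, normalize

    # Assumed keyword arguments: normalize_obs, normalize_reward
    env = normalize(GymEnv('HalfCheetah-v2'),
                    normalize_obs=True,
                    normalize_reward=True)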
class PointEnv(goal=np.array((1.0, 1.0), dtype=np.float32), arena_size=5.0, done_bonus=0.0, never_done=False, max_episode_length=math.inf)

Bases: garage.Environment

Inheritance diagram of garage.envs.PointEnv

A simple 2D point environment.

Parameters
  • goal (np.ndarray) – A 2D array representing the goal position

  • arena_size (float) – The size of the arena; the point is constrained within (-arena_size, arena_size) in each dimension.

  • done_bonus (float) – A numerical bonus added to the reward once the point has reached the goal.

  • never_done (bool) – Never send a done signal, even if the agent achieves the goal

  • max_episode_length (int) – The maximum steps allowed for an episode.
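A short usage sketch; the EnvStep attribute names (reward, last) are assumptions:

    import numpy as np

    from garage.envs import PointEnv

    env = PointEnv(goal=np.array((1.0, 1.0), dtype=np.float32),
                   arena_size=5.0,
                   max_episode_length=200)
    obs, episode_info = env.reset()
    es = env.step(np.array([0.1, 0.1], dtype=np.float32))  # small move toward the goal
    print(es.reward, es.last)
    env.close()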

property action_space(self)

akro.Space: The action space specification.

property observation_space(self)

akro.Space: The observation space specification.

property spec(self)

EnvSpec: The environment specification.

property render_modes(self)

list: A list of strings representing the supported render modes.

reset(self)

Reset the environment.

Returns

numpy.ndarray: The first observation conforming to observation_space.

dict: The episode-level information. Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).

Return type

tuple

step(self, action)

Step the environment.

Parameters

action (np.ndarray) – An action provided by the agent.

Returns

The environment step resulting from the action.

Return type

EnvStep

Raises
RuntimeError – if step() is called after the environment has been constructed and reset() has not been called.

render(self, mode)

Renders the environment.

Parameters

mode (str) – the mode to render with. The string must be present in self.render_modes.

Returns

the point and goal of the environment.

Return type

str

visualize(self)

Creates a visualization of the environment.

close(self)

Close the env.

sample_tasks(self, num_tasks)

Sample a list of num_tasks tasks.

Parameters

num_tasks (int) – Number of tasks to sample.

Returns

A list of “tasks”, where each task is a dictionary containing a single key, “goal”, mapping to a point in 2D space.

Return type

list[dict[str, np.ndarray]]

set_task(self, task)

Reset with a task.

Parameters

task (dict[str, np.ndarray]) – A task (a dictionary containing a single key, “goal”, which should be a point in 2D space).
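A sketch of the sample_tasks/set_task protocol for PointEnv:

    from garage.envs import PointEnv

    env = PointEnv()
    tasks = env.sample_tasks(num_tasks=3)   # e.g. [{'goal': array([x, y])}, ...]
    for task in tasks:
        env.set_task(task)                  # moves the goal; the next reset() starts the new task
        obs, episode_info = env.reset()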

class TaskNameWrapper(env, *, task_name=None, task_id=None)

Bases: garage.Wrapper

Inheritance diagram of garage.envs.TaskNameWrapper

Add task_name or task_id to env infos.

Parameters
  • env (gym.Env) – The environment to wrap.

  • task_name (str or None) – Task name to be added, if any.

  • task_id (int or None) – Task ID to be added, if any.
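An illustrative sketch (the environment id, task name, and task id are placeholders):

    from garage.envs import GymEnv, TaskNameWrapper

    env = TaskNameWrapper(GymEnv('CartPole-v1'), task_name='cartpole', task_id=0)
    obs, episode_info = env.reset()
    step_result = env.step(env.action_space.sample())
    # the env_info for this step now carries the 'task_name' / 'task_id' entries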

step(self, action)

gym.Env step for the active task env.

Parameters

action (np.ndarray) – Action performed by the agent in the environment.

Returns

np.ndarray: Agent’s observation of the current environment.

float: Amount of reward yielded by the previous action.

bool: True iff the episode has ended.

dict[str, np.ndarray]: Contains auxiliary diagnostic information about this time-step.

Return type

tuple

property action_space(self)

akro.Space: The action space specification.

property observation_space(self)

akro.Space: The observation space specification.

property spec(self)

EnvSpec: The environment specification.

property render_modes(self)

list: A list of strings representing the supported render modes.

reset(self)

Reset the wrapped env.

Returns

numpy.ndarray: The first observation conforming to observation_space.

dict: The episode-level information. Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).

Return type

tuple

render(self, mode)

Render the wrapped environment.

Parameters

mode (str) – the mode to render with. The string must be present in self.render_modes.

Returns

the return value for render, depending on each env.

Return type

object

visualize(self)

Creates a visualization of the wrapped environment.

close(self)

Close the wrapped env.

property unwrapped(self)

garage.Environment: The inner environment.

class TaskOnehotWrapper(env, task_index, n_total_tasks)

Bases: garage.Wrapper

Inheritance diagram of garage.envs.TaskOnehotWrapper

Append a one-hot task representation to an environment.

See TaskOnehotWrapper.wrap_env_list for the recommended way of creating this class.

Parameters
  • env (Environment) – The environment to wrap.

  • task_index (int) – The index of this task among the tasks.

  • n_total_tasks (int) – The number of total tasks.

property observation_space(self)

akro.Space: The observation space specification.

property spec(self)

Return the environment specification.

Returns

The environment specification.

Return type

EnvSpec

reset(self)

Sample new task and call reset on new task env.

Returns

numpy.ndarray: The first observation conforming to observation_space.

dict: The episode-level information. Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).

Return type

tuple

step(self, action)

Environment step for the active task env.

Parameters

action (np.ndarray) – Action performed by the agent in the environment.

Returns

The environment step resulting from the action.

Return type

EnvStep

classmethod wrap_env_list(cls, envs)

Wrap a list of environments, giving each environment a one-hot.

This is the primary way of constructing instances of this class. It’s mostly useful when training multi-task algorithms using a multi-task aware sampler.

For example:

    envs = get_mt10_envs()
    wrapped = TaskOnehotWrapper.wrap_env_list(envs)
    sampler = trainer.make_sampler(LocalSampler, env=wrapped)

Parameters
envs (list[Environment]) – List of environments to wrap. Note that the order these environments are passed in determines the value of their one-hot encoding. It is essential that this list is always in the same order, or the resulting encodings will be inconsistent.

Returns

The wrapped environments.

Return type

list[TaskOnehotWrapper]

classmethod wrap_env_cons_list(cls, env_cons)

Wrap a list of environment constructors, giving each a one-hot.

This function is useful if you want to avoid constructing any environments in the main experiment process, and are using a multi-task aware remote sampler (i.e. RaySampler).

For example:

    env_constructors = get_mt10_env_cons()
    wrapped = TaskOnehotWrapper.wrap_env_cons_list(env_constructors)
    env_updates = [NewEnvUpdate(wrapped_con) for wrapped_con in wrapped]
    sampler = trainer.make_sampler(RaySampler, env=env_updates)

Parameters
env_cons (list[Callable[Environment]]) – List of environment constructors to wrap. Note that the order these constructors are passed in determines the value of their one-hot encoding. It is essential that this list is always in the same order, or the resulting encodings will be inconsistent.

Returns

The wrapped environments.

Return type

list[Callable[TaskOnehotWrapper]]

property action_space(self)

akro.Space: The action space specification.

property render_modes(self)

list: A list of strings representing the supported render modes.

render(self, mode)

Render the wrapped environment.

Parameters

mode (str) – the mode to render with. The string must be present in self.render_modes.

Returns

the return value for render, depending on each env.

Return type

object

visualize(self)

Creates a visualization of the wrapped environment.

close(self)

Close the wrapped env.

property unwrapped(self)

garage.Environment: The inner environment.