garage.envs

Garage wrappers for gym environments.

class GridWorldEnv(desc='4x4', max_episode_length=None)

Bases: garage.Environment

Inheritance diagram of garage.envs.GridWorldEnv

A simple 2D grid environment.

‘S’ : starting point
‘F’ or ‘.’: free space
‘W’ or ‘x’: wall
‘H’ or ‘o’: hole (terminates episode)
‘G’ : goal
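
A minimal usage sketch (hedged: it assumes the reset()/step() interface documented below; '4x4' is one of the built-in map descriptors):

    from garage.envs import GridWorldEnv

    env = GridWorldEnv(desc='4x4')
    first_obs, episode_info = env.reset()   # observation plus episode-level info
    env_step = env.step(1)                  # 1 == down (see the action map in step())
    print(env_step.observation, env_step.reward)
    env.close()
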
action_space

The action space specification.

Type:akro.Space
observation_space

The observation space specification.

Type:akro.Space
spec

The environment specification.

Type:EnvSpec
render_modes

A list of strings representing the supported render modes.

Type:list
reset(self)

Resets the environment.

Returns:
numpy.ndarray: The first observation conforming to observation_space.
dict: The episode-level information. Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).
step(self, action)

Steps the environment.

Action map: 0: left, 1: down, 2: right, 3: up.

Parameters:

action (int) – an int encoding the action

Returns:

The environment step resulting from the action.

Return type:

EnvStep

Raises:
  • RuntimeError – if step() is called after the environment has been constructed and reset() has not been called.
  • NotImplementedError – if a next step in self._desc does not match known state type.
render(self, mode)

Renders the environment.

Parameters:mode (str) – the mode to render with. The string must be present in Environment.render_modes.
visualize(self)

Creates a visualization of the environment.

close(self)

Close the env.

class GymEnv(env, is_image=False, max_episode_length=None)

Bases: garage.Environment

Inheritance diagram of garage.envs.GymEnv

Returns an abstract Garage wrapper class for gym.Env.

In order to provide pickling (serialization) and parameterization for gym.Env instances, they must be wrapped with GymEnv. This ensures compatibility with existing samplers and checkpointing when envs are passed around internally within garage.

Furthermore, classes inheriting from GymEnv should silently convert action_space and observation_space from gym.Space to akro.Space.

GymEnv handles all environments created by gym.make().

It returns a different wrapper class instance if the input environment requires special handling. Current supported wrapper classes are:

garage.envs.bullet.BulletEnv for Bullet-based gym environments.

See __new__() for details.
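
As an illustration, a sketch of two common ways to construct a GymEnv ('CartPole-v1' is a standard Gym id used only as an example; passing an id string instead of an instance is an assumption based on recent garage versions):

    import gym
    from garage.envs import GymEnv

    # Wrap an already-constructed gym.Env...
    env = GymEnv(gym.make('CartPole-v1'), max_episode_length=500)

    # ...or pass an environment id and let GymEnv construct it.
    env = GymEnv('CartPole-v1', max_episode_length=500)

    first_obs, episode_info = env.reset()
    env_step = env.step(env.action_space.sample())
    env.close()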

action_space

The action space specification.

Type:akro.Space
observation_space

The observation space specification.

Type:akro.Space
spec

The environment specification.

Type:garage.envs.env_spec.EnvSpec
render_modes

A list of strings representing the supported render modes.

Type:list
reset(self)

Call reset on wrapped env.

Returns:
numpy.ndarray: The first observation conforming to observation_space.
dict: The episode-level information. Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).
step(self, action)

Call step on wrapped env.

Parameters:action (np.ndarray) – An action provided by the agent.
Returns:The environment step resulting from the action.
Return type:EnvStep
Raises:RuntimeError – if step() is called after the environment has been constructed and reset() has not been called.
render(self, mode)

Renders the environment.

Parameters:mode (str) – the mode to render with. The string must be present in self.render_modes.
Returns:the return value for render, depending on each env.
Return type:object
visualize(self)

Creates a visualization of the environment.

close(self)

Close the wrapped env.

class MultiEnvWrapper(envs, sample_strategy=uniform_random_strategy, mode='add-onehot', env_names=None)

Bases: garage.Wrapper

Inheritance diagram of garage.envs.MultiEnvWrapper

A wrapper class to handle multiple environments.

This wrapper adds an integer ‘task_id’ to env_info every timestep.

Parameters:
  • envs (list(Environment)) – A list of objects implementing Environment.
  • sample_strategy (function(int, int)) – Sample strategy to be used when sampling a new task.
  • mode (str) – A string from 'vanilla', 'add-onehot' and 'del-onehot'; the type of observation to use.
    • 'vanilla' provides the observation as-is. Use case: metaworld environments with MT* algorithms, gym environments with Task Embedding.
    • 'add-onehot' appends a one-hot task id to the observation. Use case: gym environments with MT* algorithms.
    • 'del-onehot' assumes a one-hot task id is appended to the observation, and excludes it. Use case: metaworld environments with Task Embedding.
  • env_names (list(str)) – The names of the environments corresponding to envs. The index of an env_name must correspond to the index of the corresponding env in envs. An env_name in env_names must be unique.
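
A sketch of typical usage (two copies of the same task are wrapped purely for illustration; in practice the envs, e.g. a MetaWorld MT10 benchmark, must share observation and action spaces):

    from garage.envs import GymEnv, MultiEnvWrapper

    envs = [GymEnv('CartPole-v1'), GymEnv('CartPole-v1')]
    mt_env = MultiEnvWrapper(envs,
                             mode='add-onehot',
                             env_names=['cartpole-a', 'cartpole-b'])

    obs, _ = mt_env.reset()                  # one-hot task id appended to obs
    env_step = mt_env.step(mt_env.action_space.sample())
    print(env_step.env_info['task_id'])      # integer id of the active task
    mt_env.close()
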
observation_space

Observation space.

Returns:Observation space.
Return type:akro.Box
spec

Describes the action and observation spaces of the wrapped envs.

Returns:the action and observation spaces of the wrapped environments.
Return type:EnvSpec
num_tasks

Total number of tasks.

Returns:number of tasks.
Return type:int
task_space

Task Space.

Returns:Task space.
Return type:akro.Box
active_task_index

Index of active task env.

Returns:Index of active task.
Return type:int
action_space

The action space specification.

Type:akro.Space
render_modes

A list of strings representing the supported render modes.

Type:list
reset(self)

Sample new task and call reset on new task env.

Returns:
numpy.ndarray: The first observation conforming to observation_space.
dict: The episode-level information. Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).
step(self, action)

Step the active task env.

Parameters:action (object) – object to be passed in Environment.step(action)
Returns:The environment step resulting from the action.
Return type:EnvStep
close(self)

Close all task envs.

render(self, mode)

Render the wrapped environment.

Parameters:mode (str) – the mode to render with. The string must be present in self.render_modes.
Returns:the return value for render, depending on each env.
Return type:object
visualize(self)

Creates a visualization of the wrapped environment.

normalize
class PointEnv(goal=np.array((1.0, 1.0), dtype=np.float32), arena_size=5.0, done_bonus=0.0, never_done=False, max_episode_length=math.inf)

Bases: garage.Environment

Inheritance diagram of garage.envs.PointEnv

A simple 2D point environment.

Parameters:
  • goal (np.ndarray) – A 2D array representing the goal position
  • arena_size (float) – The size of arena where the point is constrained within (-arena_size, arena_size) in each dimension
  • done_bonus (float) – A numerical bonus added to the reward once the point has reached the goal
  • never_done (bool) – Never send a done signal, even if the agent achieves the goal
  • max_episode_length (int) – The maximum steps allowed for an episode.
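
A minimal rollout sketch (parameter values here are arbitrary examples):

    import numpy as np
    from garage.envs import PointEnv

    env = PointEnv(goal=np.array([1.0, 1.0], dtype=np.float32),
                   max_episode_length=200)
    obs, _ = env.reset()
    env_step = env.step(env.action_space.sample())  # 2D displacement action
    print(env_step.observation, env_step.reward)
    env.close()
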
action_space

The action space specification.

Type:akro.Space
observation_space

The observation space specification.

Type:akro.Space
spec

The environment specification.

Type:EnvSpec
render_modes

A list of strings representing the supported render modes.

Type:list
reset(self)

Reset the environment.

Returns:
numpy.ndarray: The first observation conforming to observation_space.
dict: The episode-level information. Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).
step(self, action)

Step the environment.

Parameters:

action (np.ndarray) – An action provided by the agent.

Returns:

The environment step resulting from the action.

Return type:

EnvStep

Raises:RuntimeError – if step() is called after the environment has been constructed and reset() has not been called.
render(self, mode)

Renders the environment.

Parameters:mode (str) – the mode to render with. The string must be present in self.render_modes.
Returns:the point and goal of the environment.
Return type:str
visualize(self)

Creates a visualization of the environment.

close(self)

Close the env.

sample_tasks(self, num_tasks)

Sample a list of num_tasks tasks.

Parameters:num_tasks (int) – Number of tasks to sample.
Returns:A list of “tasks”, where each task is a dictionary containing a single key, “goal”, mapping to a point in 2D space.
Return type:list[dict[str, np.ndarray]]
set_task(self, task)

Reset with a task.

Parameters:task (dict[str, np.ndarray]) – A task (a dictionary containing a single key, “goal”, which should be a point in 2D space).
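
Together, sample_tasks() and set_task() support meta- and multi-task training loops; a sketch, assuming the PointEnv constructed above:

    tasks = env.sample_tasks(num_tasks=5)
    for task in tasks:
        env.set_task(task)       # moves the goal to task['goal']
        obs, _ = env.reset()
        # ... collect an episode for this task ...
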
class TaskOnehotWrapper(env, task_index, n_total_tasks)

Bases: garage.Wrapper

Inheritance diagram of garage.envs.TaskOnehotWrapper

Append a one-hot task representation to an environment.

See TaskOnehotWrapper.wrap_env_list for the recommended way of creating this class.

Parameters:
  • env (Environment) – The environment to wrap.
  • task_index (int) – The index of this task among the tasks.
  • n_total_tasks (int) – The number of total tasks.
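
Direct construction looks like the following sketch (wrap_env_list, below, is the recommended path; 'CartPole-v1' is just a placeholder task):

    from garage.envs import GymEnv, TaskOnehotWrapper

    base = GymEnv('CartPole-v1')
    wrapped = TaskOnehotWrapper(base, task_index=0, n_total_tasks=10)
    obs, _ = wrapped.reset()     # a one-hot of length 10 is appended
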
observation_space

The observation space specification.

Type:akro.Space
spec

Return the environment specification.

Returns:The environment specification.
Return type:EnvSpec
action_space

The action space specification.

Type:akro.Space
render_modes

A list of strings representing the supported render modes.

Type:list
reset(self)

Reset the wrapped env, appending the one-hot task representation to the first observation.

Returns:
numpy.ndarray: The first observation conforming to observation_space.
dict: The episode-level information. Note that this is not part of env_info provided in step(). It contains information of the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or MTRL).
step(self, action)

Environment step for the active task env.

Parameters:action (np.ndarray) – Action performed by the agent in the environment.
Returns:The environment step resulting from the action.
Return type:EnvStep
classmethod wrap_env_list(cls, envs)

Wrap a list of environments, giving each environment a one-hot.

This is the primary way of constructing instances of this class. It’s mostly useful when training multi-task algorithms using a multi-task aware sampler.

For example:

    envs = get_mt10_envs()
    wrapped = TaskOnehotWrapper.wrap_env_list(envs)
    sampler = runner.make_sampler(LocalSampler, env=wrapped)

Parameters:envs (list[Environment]) – List of environments to wrap. Note that the order these environments are passed in determines the value of their one-hot encoding. It is essential that this list is always in the same order, or the resulting encodings will be inconsistent.
Returns:

The wrapped environments.

Return type:

list[TaskOnehotWrapper]

classmethod wrap_env_cons_list(cls, env_cons)

Wrap a list of environment constructors, giving each a one-hot.

This function is useful if you want to avoid constructing any environments in the main experiment process, and are using a multi-task aware remote sampler (i.e. RaySampler).

For example:

    env_constructors = get_mt10_env_cons()
    wrapped = TaskOnehotWrapper.wrap_env_cons_list(env_constructors)
    env_updates = [NewEnvUpdate(wrapped_con)
                   for wrapped_con in wrapped]
    sampler = runner.make_sampler(RaySampler, env=env_updates)

Parameters:env_cons (list[Callable[Environment]]) – List of environment constructors to wrap. Note that the order these constructors are passed in determines the value of their one-hot encoding. It is essential that this list is always in the same order, or the resulting encodings will be inconsistent.
Returns:

The wrapped environments.

Return type:

list[Callable[TaskOnehotWrapper]]

render(self, mode)

Render the wrapped environment.

Parameters:mode (str) – the mode to render with. The string must be present in self.render_modes.
Returns:the return value for render, depending on each env.
Return type:object
visualize(self)

Creates a visualization of the wrapped environment.

close(self)

Close the wrapped env.