garage.envs package

Garage wrappers for gym environments.

class GarageEnv(env=None, env_name='', is_image=False)[source]

Bases: gym.core.Wrapper

An abstract Garage wrapper class for gym.Env.

In order to provide pickling (serialization) and parameterization for gym.Envs, they must be wrapped with a GarageEnv. This ensures compatibility with existing samplers and checkpointing when the envs are passed internally around garage.

Furthermore, classes inheriting from GarageEnv should silently convert action_space and observation_space from gym.Spaces to akro.spaces.

Parameters:
  • env (gym.Env) – An env that will be wrapped
  • env_name (str) – If env_name is specified, a gym environment with that name will be created. If such an environment does not exist, a gym.error is raised.
  • is_image (bool) – True if observations contain pixel values, False otherwise. Setting this to True converts a gym.spaces.Box observation space to an akro.Image and normalizes pixel values.
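
For example, a minimal usage sketch (assuming the 'CartPole-v1' environment id is registered with gym; any registered id works the same way):

import gym
from garage.envs import GarageEnv

# Wrap an already-constructed gym.Env
env = GarageEnv(gym.make('CartPole-v1'))

# Or let GarageEnv build the environment from its name
env = GarageEnv(env_name='CartPole-v1')

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()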
close()[source]

Close the wrapped env.

reset(**kwargs)[source]

Call reset on wrapped env.

This method is necessary to suppress a deprecation warning raised by gym.Wrapper.

Parameters:kwargs – Keyword args
Returns:The initial observation.
Return type:object
spec

Return the environment specification.

This property needs to exist, since it’s defined as a property in gym.Wrapper in a way that makes it difficult to overwrite.

Returns:The environment specification.
Return type:garage.envs.env_spec.EnvSpec
step(action)[source]

Call step on wrapped env.

This method is necessary to suppress a deprecation warning raised by gym.Wrapper.

Parameters:action (object) – An action provided by the agent.
Returns:
  • object – Agent’s observation of the current environment.
  • float – Amount of reward returned after the previous action.
  • bool – Whether the episode has ended, in which case further step() calls will return undefined results.
  • dict – Contains auxiliary diagnostic information (helpful for debugging, and sometimes learning).
Return type:tuple
Step(observation, reward, done, **kwargs)[source]

Create a namedtuple from the results of environment.step(action).

Provides the option to put extra diagnostic info in the kwargs (if it exists) without demanding an explicit positional argument.

Parameters:
  • observation (object) – Agent’s observation of the current environment
  • reward (float) – Amount of reward returned after previous action
  • done (bool) – Whether the episode has ended, in which case further step() calls will return undefined results
  • kwargs – Keyword args
Returns:A named tuple of the arguments.
Return type:collections.namedtuple
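
For example, a minimal sketch of packaging a step result (that extra keyword arguments are collected into an info-style field is an assumption about the tuple’s layout):

from garage.envs import Step

step = Step(observation=[0.0, 0.0], reward=1.0, done=False, goal_reached=False)
step.observation, step.reward, step.done   # fields are accessible by name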

class EnvSpec(observation_space, action_space)[source]

Bases: garage._dtypes.InOutSpec

Describes the action and observation spaces of an environment.

Parameters:
  • observation_space (akro.Space) – The observation space of the env.
  • action_space (akro.Space) – The action space of the env.
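
For example, a sketch of constructing a specification directly from akro spaces (akro.Box and akro.Discrete mirror the corresponding gym.spaces classes):

import akro
from garage.envs import EnvSpec

spec = EnvSpec(observation_space=akro.Box(low=-1.0, high=1.0, shape=(4,)),
               action_space=akro.Discrete(2))
spec.observation_space   # akro.Box
spec.action_space        # akro.Discrete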
action_space

Get action space.

Returns:Action space of the env.
Return type:akro.Space
observation_space

Get observation space of the env.

Returns:Observation space.
Return type:akro.Space
class GridWorldEnv(desc='4x4')[source]

Bases: gym.core.Env

‘S’ : starting point
‘F’ or ‘.’: free space
‘W’ or ‘x’: wall
‘H’ or ‘o’: hole (terminates episode)
‘G’ : goal
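
For example, a minimal construction sketch (the direction name 'down' is an assumption based on the action map listed under step()):

from garage.envs import GridWorldEnv

env = GridWorldEnv(desc='4x4')    # one of the built-in map layouts
obs = env.reset()
action = GridWorldEnv.action_from_direction('down')  # debugging/testing helper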
static action_from_direction(d)[source]

Return the action corresponding to the given direction.

This is a helper method for debugging and testing purposes.

Returns:The action index corresponding to the given direction.

action_space
get_possible_next_states(state, action)[source]

Given the state and action, return a list of possible next states and their probabilities. Only next states with nonzero probabilities will be returned.

Parameters:
  • state – Start state.
  • action – Action.
Returns:A list of pairs (s’, p(s’|s, a)).

log_diagnostics(paths)[source]
observation_space
render(mode='human')[source]

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

  • human: render to the current display or terminal and return nothing. Usually for human consumption.
  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note

Make sure that your class’s metadata ‘render.modes’ key includes
the list of supported modes. It’s recommended to call super() in implementations to use the functionality of this method.
Parameters:mode (str) – the mode to render with

Example:

class MyEnv(Env):
    metadata = {'render.modes': ['human', 'rgb_array']}

    def render(self, mode='human'):
        if mode == 'rgb_array':
            return np.array(...)  # return RGB frame suitable for video
        elif mode == 'human':
            ...  # pop up a window and render
        else:
            super(MyEnv, self).render(mode=mode)  # just raise an exception
reset()[source]

Resets the state of the environment and returns an initial observation.

Returns:the initial observation.
Return type:observation (object)
step(action)[source]

Action map: 0: left, 1: down, 2: right, 3: up.

Parameters:action – Should be a one-hot vector encoding the action.

class MultiEnvWrapper(envs, sample_strategy=<function uniform_random_strategy>, mode='add-onehot', env_names=None)[source]

Bases: gym.core.Wrapper

A wrapper class to handle multiple environments.

This wrapper adds an integer ‘task_id’ to env_info every timestep.

Parameters:
  • envs (list(gym.Env)) – A list of objects implementing gym.Env.
  • sample_strategy (function(int, int)) – Sample strategy to be used when sampling a new task.
  • mode (str) – A string from ‘vanilla’, ‘add-onehot’ and ‘del-onehot’. The type of observation to use.
    • ‘vanilla’ provides the observation as it is. Use case: metaworld environments with MT* algorithms, gym environments with Task Embedding.
    • ‘add-onehot’ appends a one-hot task id to the observation. Use case: gym environments with MT* algorithms.
    • ‘del-onehot’ assumes a one-hot task id is appended to the observation, and removes it. Use case: metaworld environments with Task Embedding.
  • env_names (list(str)) – The names of the environments corresponding to envs. The index of an env_name must correspond to the index of the corresponding env in envs. An env_name in env_names must be unique.
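
For example, a sketch combining two PointEnv tasks into one multi-task env (the goals and env_names below are illustrative):

import numpy as np
from garage.envs import MultiEnvWrapper, PointEnv

envs = [PointEnv(goal=np.array([1.0, 1.0], dtype=np.float32)),
        PointEnv(goal=np.array([-1.0, 1.0], dtype=np.float32))]
multi_env = MultiEnvWrapper(envs, mode='add-onehot',
                            env_names=['point_right', 'point_left'])
obs = multi_env.reset()   # one-hot task id appended to the observation
obs, reward, done, info = multi_env.step(multi_env.action_space.sample())
info['task_id']           # integer index of the active task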
active_task_index

Index of active task env.

Returns:Index of active task.
Return type:int
close()[source]

Close all task envs.

num_tasks

Total number of tasks.

Returns:number of tasks.
Return type:int
observation_space

Observation space.

Returns:Observation space.
Return type:akro.Box
reset(**kwargs)[source]

Sample new task and call reset on new task env.

Parameters:kwargs (dict) – Keyword arguments to be passed to gym.Env.reset
Returns:active task one-hot representation + observation
Return type:numpy.ndarray
spec

Describes the action and observation spaces of the wrapped envs.

Returns:The action and observation spaces of the wrapped environments.
Return type:garage.envs.EnvSpec
step(action)[source]

gym.Env step for the active task env.

Parameters:action (object) – Action to be passed to gym.Env.step(action).
Returns:
  • object – Agent’s observation of the current environment.
  • float – Amount of reward returned after the previous action.
  • bool – Whether the episode has ended.
  • dict – Contains auxiliary diagnostic information.
Return type:tuple
task_space

Task Space.

Returns:Task space.
Return type:akro.Box
normalize

alias of garage.envs.normalized_env.NormalizedEnv
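
For example, a sketch of the alias in use (the exact keyword arguments accepted by NormalizedEnv may vary between garage versions, so only the bare form is shown):

from garage.envs import normalize, PointEnv

env = normalize(PointEnv())   # wraps the env in garage.envs.normalized_env.NormalizedEnv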

class PointEnv(goal=array([1., 1.], dtype=float32), arena_size=5.0, done_bonus=0.0, never_done=False)[source]

Bases: gym.core.Env

A simple 2D point environment.

observation_space

The observation space

Type:gym.spaces.Box
action_space

The action space

Type:gym.spaces.Box
Parameters:
  • goal (np.ndarray) – A 2D array representing the goal position
  • arena_size (float) – The size of arena where the point is constrained within (-arena_size, arena_size) in each dimension
  • done_bonus (float) – A numerical bonus added to the reward once the point has reached the goal
  • never_done (bool) – Never send a done signal, even if the agent achieves the goal
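
For example, a minimal sketch exercising the documented methods (the goal and action values are illustrative):

import numpy as np
from garage.envs import PointEnv

env = PointEnv(goal=np.array([1.0, 1.0], dtype=np.float32))
obs = env.reset()
obs, reward, done, info = env.step(np.array([0.1, 0.1]))
tasks = env.sample_tasks(5)   # list of {'goal': np.ndarray} dicts
env.set_task(tasks[0])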
action_space

The action space.

Type:gym.spaces.Box
observation_space

The observation space.

Type:gym.spaces.Box
render(mode='human')[source]

Draw the environment.

Not implemented.

Parameters:mode (str) – Ignored.
reset()[source]

Reset the environment.

Returns:Observation of the environment.
Return type:np.ndarray
sample_tasks(num_tasks)[source]

Sample a list of num_tasks tasks.

Parameters:num_tasks (int) – Number of tasks to sample.
Returns:A list of “tasks”, where each task is a dictionary containing a single key, “goal”, mapping to a point in 2D space.
Return type:list[dict[str, np.ndarray]]
set_task(task)[source]

Reset with a task.

Parameters:task (dict[str, np.ndarray]) – A task (a dictionary containing a single key, “goal”, which should be a point in 2D space).
step(action)[source]

Step the environment state.

Parameters:action (np.ndarray) – The action to take in the environment.
Returns:
  • np.ndarray – Observation of the environment.
  • float – Reward acquired at this time step.
  • bool – Whether the environment was completed at this time step. Always False for this environment.
Return type:tuple
class TaskOnehotWrapper(env, task_index, n_total_tasks)[source]

Bases: gym.core.Wrapper

Append a one-hot task representation to an environment.

See TaskOnehotWrapper.wrap_env_list for the recommended way of creating this class.

Parameters:
  • env (gym.Env) – The environment to wrap.
  • task_index (int) – The index of this task among the tasks.
  • n_total_tasks (int) – The number of total tasks.
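
For example, a sketch of direct construction (wrap_env_list below is the recommended way to build a consistent set of these wrappers):

from garage.envs import PointEnv, TaskOnehotWrapper

env = TaskOnehotWrapper(PointEnv(), task_index=0, n_total_tasks=3)
obs = env.reset()   # observation with the one-hot task id appended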
reset(**kwargs)[source]

Sample new task and call reset on new task env.

Parameters:kwargs (dict) – Keyword arguments to be passed to env.reset
Returns:active task one-hot representation + observation
Return type:numpy.ndarray
spec

Return the environment specification.

Returns:The environment specification.
Return type:garage.envs.env_spec.EnvSpec
step(action)[source]

gym.Env step for the active task env.

Parameters:action (np.ndarray) – Action performed by the agent in the environment.
Returns:
  • np.ndarray – Agent’s observation of the current environment.
  • float – Amount of reward yielded by the previous action.
  • bool – True iff the episode has ended.
  • dict[str, np.ndarray] – Auxiliary diagnostic information about this time step.
Return type:tuple
classmethod wrap_env_cons_list(env_cons)[source]

Wrap a list of environment constructors, giving each a one-hot.

This function is useful if you want to avoid constructing any environments in the main experiment process, and are using a multi-task aware remote sampler (i.e. RaySampler).

For example:

env_constructors = get_mt10_env_cons()
wrapped = TaskOnehotWrapper.wrap_env_cons_list(env_constructors)
env_updates = [NewEnvUpdate(wrapped_con) for wrapped_con in wrapped]
sampler = runner.make_sampler(RaySampler, env=env_updates)

Parameters:env_cons (list[Callable[gym.Env]]) – List of environment constructors to wrap. Note that the order these constructors are passed in determines the value of their one-hot encoding. It is essential that this list is always in the same order, or the resulting encodings will be inconsistent.
Returns:The wrapped environments.
Return type:list[Callable[TaskOnehotWrapper]]
classmethod wrap_env_list(envs)[source]

Wrap a list of environments, giving each environment a one-hot.

This is the primary way of constructing instances of this class. It’s mostly useful when training multi-task algorithms using a multi-task aware sampler.

For example:

envs = get_mt10_envs()
wrapped = TaskOnehotWrapper.wrap_env_list(envs)
sampler = runner.make_sampler(LocalSampler, env=wrapped)

Parameters:envs (list[gym.Env]) – List of environments to wrap. Note that the order these environments are passed in determines the value of their one-hot encoding. It is essential that this list is always in the same order, or the resulting encodings will be inconsistent.
Returns:The wrapped environments.
Return type:list[TaskOnehotWrapper]