garage.envs package

Garage wrappers for gym environments.

class GarageEnv(env=None, env_name='', is_image=False)[source]

Bases: gym.core.Wrapper

An abstract Garage wrapper class for gym.Env.

In order to provide pickling (serialization) and parameterization for gym.Envs, they must be wrapped with a GarageEnv. This ensures compatibility with existing samplers and checkpointing when the envs are passed internally around garage.

Furthermore, classes inheriting from GarageEnv should silently convert action_space and observation_space from gym.Spaces to akro.spaces.

Parameters:
  • env (gym.Env) – An env that will be wrapped
  • env_name (str) – If env_name is specified, a gym environment with that name will be created. If such an environment does not exist, a gym.error is raised.
  • is_image (bool) – True if observations contain pixel values, False otherwise. Setting this to True converts a gym.spaces.Box observation space to an akro.Image and normalizes pixel values.
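
For example, a minimal usage sketch (assuming the 'CartPole-v1' environment id is registered with gym; any registered id works the same way):

import gym
from garage.envs import GarageEnv

# Wrap an already-constructed gym.Env
env = GarageEnv(gym.make('CartPole-v1'))

# Or let GarageEnv build the environment from its name
env = GarageEnv(env_name='CartPole-v1')

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()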
close()[source]

Close the wrapped env.

reset(**kwargs)[source]

Call reset on wrapped env.

This method is necessary to suppress a deprecation warning raised by gym.Wrapper.

Parameters:kwargs – Keyword args
Returns:The initial observation.
Return type:object
spec

Return the environment specification.

This property needs to exist, since it’s defined as a property in gym.Wrapper in a way that makes it difficult to overwrite.

Returns:The environment specification.
Return type:garage.envs.env_spec.EnvSpec
step(action)[source]

Call step on wrapped env.

This method is necessary to suppress a deprecation warning raised by gym.Wrapper.

Parameters:action (object) – An action provided by the agent.
Returns:
  • object – Agent’s observation of the current environment.
  • float – Amount of reward returned after the previous action.
  • bool – Whether the episode has ended, in which case further step() calls will return undefined results.
  • dict – Contains auxiliary diagnostic information (helpful for debugging, and sometimes learning).
Return type:tuple
Step(observation, reward, done, **kwargs)[source]

Create a namedtuple from the results of environment.step(action).

Provides the option to put extra diagnostic info in the kwargs (if it exists) without demanding an explicit positional argument.

Parameters:
  • observation (object) – Agent’s observation of the current environment
  • reward (float) – Amount of reward returned after previous action
  • done (bool) – Whether the episode has ended, in which case further step() calls will return undefined results
  • kwargs – Keyword args
Returns:A named tuple of the arguments.
Return type:collections.namedtuple
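
For example, a minimal sketch of packaging a step result (that extra keyword arguments are collected into an info-style field is an assumption about the tuple’s layout):

from garage.envs import Step

step = Step(observation=[0.0, 0.0], reward=1.0, done=False, goal_reached=False)
step.observation, step.reward, step.done   # fields are accessible by name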

class EnvSpec(observation_space, action_space)[source]

Bases: garage._dtypes.InOutSpec

Describes the action and observation spaces of an environment.

Parameters:
  • observation_space (akro.Space) – The observation space of the env.
  • action_space (akro.Space) – The action space of the env.
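
For example, a sketch of constructing a specification directly from akro spaces (akro.Box and akro.Discrete mirror the corresponding gym.spaces classes):

import akro
from garage.envs import EnvSpec

spec = EnvSpec(observation_space=akro.Box(low=-1.0, high=1.0, shape=(4,)),
               action_space=akro.Discrete(2))
spec.observation_space   # akro.Box
spec.action_space        # akro.Discrete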
action_space

Get action space.

Returns:Action space of the env.
Return type:akro.Space
observation_space

Get observation space of the env.

Returns:Observation space.
Return type:akro.Space
class GridWorldEnv(desc='4x4')[source]

Bases: gym.core.Env

‘S’ : starting point
‘F’ or ‘.’: free space
‘W’ or ‘x’: wall
‘H’ or ‘o’: hole (terminates episode)
‘G’ : goal
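
For example, a minimal construction sketch (the direction name 'down' is an assumption based on the action map listed under step()):

from garage.envs import GridWorldEnv

env = GridWorldEnv(desc='4x4')    # one of the built-in map layouts
obs = env.reset()
action = GridWorldEnv.action_from_direction('down')  # debugging/testing helper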
static action_from_direction(d)[source]

Return the action corresponding to the given direction.

This is a helper method for debugging and testing purposes.

Returns:The action index corresponding to the given direction.

action_space
get_possible_next_states(state, action)[source]

Given the state and action, return a list of possible next states and their probabilities. Only next states with nonzero probabilities will be returned.

Parameters:
  • state – Start state.
  • action – Action.
Returns:A list of pairs (s’, p(s’|s, a)).

log_diagnostics(paths)[source]
observation_space
render(mode='human')[source]

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

  • human: render to the current display or terminal and return nothing. Usually for human consumption.
  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note

Make sure that your class’s metadata ‘render.modes’ key includes
the list of supported modes. It’s recommended to call super() in implementations to use the functionality of this method.
Parameters:mode (str) – the mode to render with

Example:

class MyEnv(Env):
    metadata = {'render.modes': ['human', 'rgb_array']}

    def render(self, mode='human'):
        if mode == 'rgb_array':
            return np.array(...)  # return RGB frame suitable for video
        elif mode == 'human':
            ...  # pop up a window and render
        else:
            super(MyEnv, self).render(mode=mode)  # just raise an exception
reset()[source]

Resets the state of the environment and returns an initial observation.

Returns:the initial observation.
Return type:observation (object)
step(action)[source]

Action map: 0: left, 1: down, 2: right, 3: up.

Parameters:action – Should be a one-hot vector encoding the action.

class MultiEnvWrapper(envs, sample_strategy=<function uniform_random_strategy>, mode='add-onehot', env_names=None)[source]

Bases: gym.core.Wrapper

A wrapper class to handle multiple environments.

This wrapper adds an integer ‘task_id’ to env_info every timestep.

Parameters:
  • envs (list(gym.Env)) – A list of objects implementing gym.Env.
  • sample_strategy (function(int, int)) – Sample strategy to be used when sampling a new task.
  • mode (str) – A string from ‘vanilla’, ‘add-onehot’ and ‘del-onehot’. The type of observation to use.
    • ‘vanilla’ provides the observation as it is. Use case: metaworld environments with MT* algorithms, gym environments with Task Embedding.
    • ‘add-onehot’ appends a one-hot task id to the observation. Use case: gym environments with MT* algorithms.
    • ‘del-onehot’ assumes a one-hot task id is appended to the observation, and removes it. Use case: metaworld environments with Task Embedding.
  • env_names (list(str)) – The names of the environments corresponding to envs. The index of an env_name must correspond to the index of the corresponding env in envs. An env_name in env_names must be unique.
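
For example, a sketch combining two PointEnv tasks into one multi-task env (the goals and env_names below are illustrative):

import numpy as np
from garage.envs import MultiEnvWrapper, PointEnv

envs = [PointEnv(goal=np.array([1.0, 1.0], dtype=np.float32)),
        PointEnv(goal=np.array([-1.0, 1.0], dtype=np.float32))]
multi_env = MultiEnvWrapper(envs, mode='add-onehot',
                            env_names=['point_right', 'point_left'])
obs = multi_env.reset()   # one-hot task id appended to the observation
obs, reward, done, info = multi_env.step(multi_env.action_space.sample())
info['task_id']           # integer index of the active task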
active_task_index

Index of active task env.

Returns:Index of active task.
Return type:int
close()[source]

Close all task envs.

num_tasks

Total number of tasks.

Returns:number of tasks.
Return type:int
observation_space

Observation space.

Returns:Observation space.
Return type:akro.Box
reset(**kwargs)[source]

Sample new task and call reset on new task env.

Parameters:kwargs (dict) – Keyword arguments to be passed to gym.Env.reset
Returns:active task one-hot representation + observation
Return type:numpy.ndarray
spec

Describes the action and observation spaces of the wrapped envs.

Returns:The action and observation spaces of the wrapped environments.
Return type:garage.envs.EnvSpec
step(action)[source]

gym.Env step for the active task env.

Parameters:action (object) – Action to be passed to gym.Env.step(action).
Returns:
  • object – Agent’s observation of the current environment.
  • float – Amount of reward returned after the previous action.
  • bool – Whether the episode has ended.
  • dict – Contains auxiliary diagnostic information.
Return type:tuple
task_space

Task Space.

Returns:Task space.
Return type:akro.Box
normalize

alias of garage.envs.normalized_env.NormalizedEnv
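
For example, a sketch of the alias in use (the exact keyword arguments accepted by NormalizedEnv may vary between garage versions, so only the bare form is shown):

from garage.envs import normalize, PointEnv

env = normalize(PointEnv())   # wraps the env in garage.envs.normalized_env.NormalizedEnv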

class PointEnv(goal=array([1., 1.], dtype=float32), arena_size=5.0, done_bonus=0.0, never_done=False)[source]

Bases: gym.core.Env

A simple 2D point environment.

observation_space

The observation space

Type:gym.spaces.Box
action_space

The action space

Type:gym.spaces.Box
Parameters:
  • goal (np.ndarray) – A 2D array representing the goal position
  • arena_size (float) – The size of arena where the point is constrained within (-arena_size, arena_size) in each dimension
  • done_bonus (float) – A numerical bonus added to the reward once the point has reached the goal
  • never_done (bool) – Never send a done signal, even if the agent achieves the goal
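
For example, a minimal sketch exercising the documented methods (the goal and action values are illustrative):

import numpy as np
from garage.envs import PointEnv

env = PointEnv(goal=np.array([1.0, 1.0], dtype=np.float32))
obs = env.reset()
obs, reward, done, info = env.step(np.array([0.1, 0.1]))
tasks = env.sample_tasks(5)   # list of {'goal': np.ndarray} dicts
env.set_task(tasks[0])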
action_space

The action space.

Type:gym.spaces.Box
observation_space

The observation space.

Type:gym.spaces.Box
render(mode='human')[source]

Draw the environment.

Not implemented.

Parameters:mode (str) – Ignored.
reset()[source]

Reset the environment.

Returns:Observation of the environment.
Return type:np.ndarray
sample_tasks(num_tasks)[source]

Sample a list of num_tasks tasks.

Parameters:num_tasks (int) – Number of tasks to sample.
Returns:A list of “tasks”, where each task is a dictionary containing a single key, “goal”, mapping to a point in 2D space.
Return type:list[dict[str, np.ndarray]]
set_task(task)[source]

Reset with a task.

Parameters:task (dict[str, np.ndarray]) – A task (a dictionary containing a single key, “goal”, which should be a point in 2D space).
step(action)[source]

Step the environment state.

Parameters:action (np.ndarray) – The action to take in the environment.
Returns:
  • np.ndarray – Observation of the environment.
  • float – Reward acquired at this time step.
  • bool – Whether the environment was completed at this time step. Always False for this environment.
Return type:tuple
class TaskOnehotWrapper(env, task_index, n_total_tasks)[source]

Bases: gym.core.Wrapper

Append a one-hot task representation to an environment.

See TaskOnehotWrapper.wrap_env_list for the recommended way of creating this class.

Parameters:
  • env (gym.Env) – The environment to wrap.
  • task_index (int) – The index of this task among the tasks.
  • n_total_tasks (int) – The number of total tasks.
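
For example, a sketch of direct construction (wrap_env_list below is the recommended way to build a consistent set of these wrappers):

from garage.envs import PointEnv, TaskOnehotWrapper

env = TaskOnehotWrapper(PointEnv(), task_index=0, n_total_tasks=3)
obs = env.reset()   # observation with the one-hot task id appended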
reset(**kwargs)[source]

Sample new task and call reset on new task env.

Parameters:kwargs (dict) – Keyword arguments to be passed to env.reset
Returns:active task one-hot representation + observation
Return type:numpy.ndarray
spec

Return the environment specification.

Returns:The environment specification.
Return type:garage.envs.env_spec.EnvSpec
step(action)[source]

gym.Env step for the active task env.

Parameters:action (np.ndarray) – Action performed by the agent in the environment.
Returns:
  • np.ndarray – Agent’s observation of the current environment.
  • float – Amount of reward yielded by the previous action.
  • bool – True iff the episode has ended.
  • dict[str, np.ndarray] – Auxiliary diagnostic information about this time step.
Return type:tuple
classmethod wrap_env_cons_list(env_cons)[source]

Wrap a list of environment constructors, giving each a one-hot.

This function is useful if you want to avoid constructing any environments in the main experiment process, and are using a multi-task aware remote sampler (i.e. RaySampler).

For example:

env_constructors = get_mt10_env_cons()
wrapped = TaskOnehotWrapper.wrap_env_cons_list(env_constructors)
env_updates = [NewEnvUpdate(wrapped_con) for wrapped_con in wrapped]
sampler = runner.make_sampler(RaySampler, env=env_updates)

Parameters:env_cons (list[Callable[gym.Env]]) – List of environment constructors to wrap. Note that the order these constructors are passed in determines the value of their one-hot encoding. It is essential that this list is always in the same order, or the resulting encodings will be inconsistent.
Returns:The wrapped environments.
Return type:list[Callable[TaskOnehotWrapper]]
classmethod wrap_env_list(envs)[source]

Wrap a list of environments, giving each environment a one-hot.

This is the primary way of constructing instances of this class. It’s mostly useful when training multi-task algorithms using a multi-task aware sampler.

For example:

envs = get_mt10_envs()
wrapped = TaskOnehotWrapper.wrap_env_list(envs)
sampler = runner.make_sampler(LocalSampler, env=wrapped)

Parameters:envs (list[gym.Env]) – List of environments to wrap. Note that the order these environments are passed in determines the value of their one-hot encoding. It is essential that this list is always in the same order, or the resulting encodings will be inconsistent.
Returns:The wrapped environments.
Return type:list[TaskOnehotWrapper]