garage.envs package¶
Garage wrappers for gym environments.
class GarageEnv(env=None, env_name='', is_image=False)[source]¶
Bases: gym.core.Wrapper
An abstract Garage wrapper class for gym.Env.
In order to provide pickling (serialization) and parameterization for gym.Envs, they must be wrapped with a GarageEnv. This ensures compatibility with existing samplers and checkpointing when the envs are passed internally around garage.
Furthermore, classes inheriting from GarageEnv should silently convert action_space and observation_space from gym.Spaces to akro.spaces.
Parameters: - env (gym.Env) – An env that will be wrapped
- env_name (str) – If env_name is specified, a gym environment with that name will be created. If such an environment does not exist, a gym.error is thrown.
- is_image (bool) – True if observations contain pixel values, false otherwise. Setting this to true converts a gym.Spaces.Box obs space to an akro.Image and normalizes pixel values.
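To illustrate why wrapping enables pickling, the sketch below shows the general pattern: drop the live env when serializing and rebuild it from its name when deserializing. `FakeEnv` and `PicklableWrapper` are hypothetical stand-ins for illustration only, not garage's actual implementation.

```python
import pickle

class FakeEnv:
    """Hypothetical stand-in for a gym.Env holding unpicklable state
    (e.g. a simulator handle)."""
    def __init__(self, name):
        self.name = name
        self._sim = lambda: None  # lambdas cannot be pickled

class PicklableWrapper:
    """Sketch of the wrapping idea: serialize only the env name and
    rebuild the env on unpickling."""
    def __init__(self, env=None, env_name=''):
        self._env_name = env_name
        self.env = env if env is not None else FakeEnv(env_name)

    def __getstate__(self):
        # Drop the live env; keep just enough to reconstruct it.
        return {'_env_name': self._env_name}

    def __setstate__(self, state):
        self.__init__(env_name=state['_env_name'])

wrapped = PicklableWrapper(env_name='CartPole-v1')
restored = pickle.loads(pickle.dumps(wrapped))  # works despite the lambda
```

This is why envs wrapped in a GarageEnv can be passed to samplers and checkpointed even when the underlying env holds unpicklable resources.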
reset(**kwargs)[source]¶
Call reset on the wrapped env.
This method is necessary to suppress a deprecated warning thrown by gym.Wrapper.
Parameters: kwargs – Keyword arguments.
Returns: The initial observation.
Return type: object
spec¶
Return the environment specification.
This property needs to exist, since it's defined as a property in gym.Wrapper in a way that makes it difficult to overwrite.
Returns: The environment specification.
Return type: garage.envs.env_spec.EnvSpec
step(action)[source]¶
Call step on the wrapped env.
This method is necessary to suppress a deprecated warning thrown by gym.Wrapper.
Parameters: action (object) – An action provided by the agent.
Returns:
- object: Agent's observation of the current environment.
- float: Amount of reward returned after the previous action.
- bool: Whether the episode has ended, in which case further step() calls will return undefined results.
- dict: Contains auxiliary diagnostic information (helpful for debugging, and sometimes learning).
Return type: object
Step(observation, reward, done, **kwargs)[source]¶
Create a namedtuple from the results of environment.step(action).
Provides the option to put extra diagnostic info in the kwargs (if it exists) without demanding an explicit positional argument.
Parameters: observation, reward, done, and any extra diagnostic keyword arguments.
Returns: A named tuple of the arguments.
Return type: collections.namedtuple
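A minimal sketch of what a Step-style helper produces. The field names below are an assumption based on the signature above, not necessarily garage's exact field set:

```python
import collections

# Hypothetical Step-style namedtuple; extra keyword arguments are
# collected into the `info` field.
_Step = collections.namedtuple('Step',
                               ['observation', 'reward', 'done', 'info'])

def make_step(observation, reward, done, **kwargs):
    # kwargs carries optional diagnostic info without requiring an
    # explicit positional argument.
    return _Step(observation, reward, done, kwargs)

s = make_step([0.1, 0.2], 1.0, False, grasped=True)
```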
class EnvSpec(observation_space, action_space)[source]¶
Bases: garage._dtypes.InOutSpec
Describes the action and observation spaces of an environment.
Parameters: - observation_space (akro.Space) – The observation space of the env.
- action_space (akro.Space) – The action space of the env.
action_space¶
Get action space.
Returns: Action space of the env.
Return type: akro.Space
observation_space¶
Get observation space of the env.
Returns: Observation space.
Return type: akro.Space
class GridWorldEnv(desc='4x4')[source]¶
Bases: gym.core.Env
Map legend:
- 'S': starting point
- 'F' or '.': free space
- 'W' or 'x': wall
- 'H' or 'o': hole (terminates episode)
- 'G': goal
static action_from_direction(d)[source]¶
Return the action corresponding to the given direction. This is a helper method for debugging and testing purposes.
Returns: The action index corresponding to the given direction.
action_space¶
get_possible_next_states(state, action)[source]¶
Given the state and action, return a list of possible next states and their probabilities. Only next states with nonzero probabilities will be returned.
Parameters: state – Start state. action – Action.
Returns: A list of pairs (s', p(s'|s,a)).
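A sketch of this lookup on a hypothetical 4x4 map using the legend above, assuming deterministic dynamics (so each action yields exactly one next state with probability 1). The map, the direction-to-action mapping, and the off-map/wall behavior are illustrative assumptions, not garage's exact transition model:

```python
# Hypothetical 4x4 map using the legend above
# ('S' start, 'F' free, 'W' wall, 'H' hole, 'G' goal).
DESC = ["SFFF",
        "FWFH",
        "FFFW",
        "HFFG"]
N_COLS = len(DESC[0])

# Assumed direction-to-action mapping and per-action row/col deltas.
ACTIONS = {'left': 0, 'down': 1, 'right': 2, 'up': 3}
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}

def get_possible_next_states(state, action):
    """Deterministic dynamics: one next state with probability 1.
    Moves into walls or off the map leave the state unchanged."""
    row, col = divmod(state, N_COLS)
    drow, dcol = MOVES[action]
    new_row, new_col = row + drow, col + dcol
    if not (0 <= new_row < len(DESC) and 0 <= new_col < N_COLS):
        return [(state, 1.0)]   # off the map: stay put
    if DESC[new_row][new_col] == 'W':
        return [(state, 1.0)]   # walls block movement
    return [(new_row * N_COLS + new_col, 1.0)]
```

States are flattened row-major, so state = row * n_cols + col.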
observation_space¶
render(mode='human')[source]¶
Renders the environment.
The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:
- human: render to the current display or terminal and return nothing. Usually for human consumption.
- rgb_array: Return an numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
- ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
Note: Make sure that your class's metadata 'render.modes' key includes the list of supported modes. It's recommended to call super() in implementations to use the functionality of this method.
Parameters: mode (str) – the mode to render with
Example:

    class MyEnv(Env):
        metadata = {'render.modes': ['human', 'rgb_array']}

        def render(self, mode='human'):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame suitable for video
            elif mode == 'human':
                ...  # pop up a window and render
            else:
                super(MyEnv, self).render(mode=mode)  # just raise an exception
class MultiEnvWrapper(envs, sample_strategy=<function uniform_random_strategy>, mode='add-onehot', env_names=None)[source]¶
Bases: gym.core.Wrapper
A wrapper class to handle multiple environments.
This wrapper adds an integer ‘task_id’ to env_info every timestep.
Parameters: - envs (list(gym.Env)) – A list of objects implementing gym.Env.
- sample_strategy (function(int, int)) – Sample strategy to be used when sampling a new task.
- mode (str) – A string from 'vanilla', 'add-onehot' and 'del-onehot'. The type of observation to use.
  - 'vanilla' provides the observation as-is. Use case: metaworld environments with MT* algorithms, gym environments with Task Embedding.
  - 'add-onehot' appends a one-hot task id to the observation. Use case: gym environments with MT* algorithms.
  - 'del-onehot' assumes a one-hot task id is appended to the observation, and removes it. Use case: metaworld environments with Task Embedding.
- env_names (list(str)) – The names of the environments corresponding to envs. The index of an env_name must correspond to the index of the corresponding env in envs. An env_name in env_names must be unique.
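The three modes can be sketched in plain Python (lists stand in for numpy arrays; `apply_mode` is a hypothetical helper, not garage's implementation):

```python
def apply_mode(obs, task_id, n_tasks, mode='add-onehot'):
    """Sketch of MultiEnvWrapper's three observation modes."""
    one_hot = [1.0 if i == task_id else 0.0 for i in range(n_tasks)]
    if mode == 'vanilla':
        return list(obs)              # observation unchanged
    if mode == 'add-onehot':
        return list(obs) + one_hot    # append the task one-hot
    if mode == 'del-onehot':
        return list(obs[:-n_tasks])   # strip a trailing one-hot
    raise ValueError('unknown mode: ' + mode)

obs = [0.5, -0.5]
augmented = apply_mode(obs, 1, 3, 'add-onehot')
```

Note that 'add-onehot' and 'del-onehot' are inverses: deleting the one-hot from an augmented observation recovers the original.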
observation_space¶
Observation space.
Returns: Observation space. Return type: akro.Box
reset(**kwargs)[source]¶
Sample a new task and call reset on the new task's env.
Parameters: kwargs (dict) – Keyword arguments to be passed to gym.Env.reset.
Returns: Active task one-hot representation + observation.
Return type: numpy.ndarray
spec¶
Describes the action and observation spaces of the wrapped envs.
Returns: The action and observation spaces of the wrapped environments.
Return type: garage.envs.EnvSpec
step(action)[source]¶
gym.Env step for the active task env.
Parameters: action (object) – Action to be passed to gym.Env.step.
Returns:
- object: Agent's observation of the current environment.
- float: Amount of reward returned after the previous action.
- bool: Whether the episode has ended.
- dict: Contains auxiliary diagnostic information.
Return type: object
task_space¶
Task space.
Returns: Task space.
Return type: akro.Box
normalize¶
class PointEnv(goal=array([1., 1.], dtype=float32), arena_size=5.0, done_bonus=0.0, never_done=False)[source]¶
Bases: gym.core.Env
A simple 2D point environment.
observation_space¶
The observation space.
Type: gym.spaces.Box
action_space¶
The action space.
Type: gym.spaces.Box
Parameters: - goal (np.ndarray) – A 2D array representing the goal position.
- arena_size (float) – The size of the arena; the point is constrained within (-arena_size, arena_size) in each dimension.
- done_bonus (float) – A numerical bonus added to the reward once the point has reached the goal.
- never_done (bool) – Never send a done signal, even if the agent achieves the goal.
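A rough sketch of the dynamics these parameters suggest: clip the point into the arena, reward progress toward the goal, and add done_bonus on arrival. The exact reward shaping and goal tolerance here are assumptions for illustration; see the garage source for the real definition.

```python
import math

# Assumed constants mirroring the defaults above.
GOAL = (1.0, 1.0)
ARENA_SIZE = 5.0
DONE_BONUS = 0.0

def step(point, action):
    """Hypothetical PointEnv-style transition."""
    # Clip the new position into the arena.
    x = min(max(point[0] + action[0], -ARENA_SIZE), ARENA_SIZE)
    y = min(max(point[1] + action[1], -ARENA_SIZE), ARENA_SIZE)
    dist = math.hypot(x - GOAL[0], y - GOAL[1])
    done = dist < 1e-2                       # assumed goal tolerance
    reward = -dist + (DONE_BONUS if done else 0.0)
    return (x, y), reward, done

pos, reward, done = step((0.0, 0.0), (1.0, 1.0))
```

With never_done=True, the real env keeps returning done=False even once the goal is reached.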
render(mode='human')[source]¶
Draw the environment.
Not implemented.
Parameters: mode (str) – Ignored.
reset()[source]¶
Reset the environment.
Returns: Observation of the environment.
Return type: np.ndarray
sample_tasks(num_tasks)[source]¶
Sample a list of num_tasks tasks.
Parameters: num_tasks (int) – Number of tasks to sample.
Returns: A list of "tasks", where each task is a dictionary containing a single key, "goal", mapping to a point in 2D space.
Return type: list[dict[str, np.ndarray]]
set_task(task)[source]¶
Reset with a task.
Parameters: task (dict[str, np.ndarray]) – A task (a dictionary containing a single key, "goal", which should be a point in 2D space).
step(action)[source]¶
Step the environment state.
Parameters: action (np.ndarray) – The action to take in the environment.
Returns:
- np.ndarray: Observation. The observation of the environment.
- float: Reward. The reward acquired at this time step.
- boolean: Done. Whether the environment was completed at this time step. Always False for this environment.
Return type: np.ndarray
class TaskOnehotWrapper(env, task_index, n_total_tasks)[source]¶
Bases: gym.core.Wrapper
Append a one-hot task representation to an environment.
See TaskOnehotWrapper.wrap_env_list for the recommended way of creating this class.
Parameters: - env (gym.Env) – The environment to wrap.
- task_index (int) – The index of this task.
- n_total_tasks (int) – The total number of tasks.
reset(**kwargs)[source]¶
Sample a new task and call reset on the new task's env.
Parameters: kwargs (dict) – Keyword arguments to be passed to env.reset Returns: active task one-hot representation + observation Return type: numpy.ndarray
spec¶
Return the environment specification.
Returns: The environment specification.
Return type: garage.envs.env_spec.EnvSpec
step(action)[source]¶
gym.Env step for the active task env.
Parameters: action (np.ndarray) – Action performed by the agent in the environment.
Returns:
- np.ndarray: Agent's observation of the current environment.
- float: Amount of reward yielded by the previous action.
- bool: True iff the episode has ended.
- dict[str, np.ndarray]: Contains auxiliary diagnostic information about this time step.
Return type: tuple
classmethod wrap_env_cons_list(env_cons)[source]¶
Wrap a list of environment constructors, giving each a one-hot.
This function is useful if you want to avoid constructing any environments in the main experiment process, and are using a multi-task aware remote sampler (i.e. ~RaySampler).
For example:

    env_constructors = get_mt10_env_cons()
    wrapped = TaskOnehotWrapper.wrap_env_cons_list(env_constructors)
    env_updates = [NewEnvUpdate(wrapped_con)
                   for wrapped_con in wrapped]
    sampler = runner.make_sampler(RaySampler, env=env_updates)

Parameters: env_cons (list[Callable[gym.Env]]) – List of environment constructors to wrap. Note that the order these constructors are passed in determines the value of their one-hot encoding. It is essential that this list is always in the same order, or the resulting encodings will be inconsistent.
Returns: The wrapped environments.
Return type: list[Callable[TaskOnehotWrapper]]
classmethod wrap_env_list(envs)[source]¶
Wrap a list of environments, giving each environment a one-hot.
This is the primary way of constructing instances of this class. It's mostly useful when training multi-task algorithms using a multi-task aware sampler.
For example:

    envs = get_mt10_envs()
    wrapped = TaskOnehotWrapper.wrap_env_list(envs)
    sampler = runner.make_sampler(LocalSampler, env=wrapped)

Parameters: envs (list[gym.Env]) – List of environments to wrap. Note that the order these environments are passed in determines the value of their one-hot encoding. It is essential that this list is always in the same order, or the resulting encodings will be inconsistent.
Returns: The wrapped environments.
Return type: list[TaskOnehotWrapper]
Subpackages¶
- garage.envs.dm_control package
- garage.envs.mujoco package
- garage.envs.wrappers package
- Submodules
- garage.envs.wrappers.atari_env module
- garage.envs.wrappers.clip_reward module
- garage.envs.wrappers.episodic_life module
- garage.envs.wrappers.fire_reset module
- garage.envs.wrappers.grayscale module
- garage.envs.wrappers.max_and_skip module
- garage.envs.wrappers.noop module
- garage.envs.wrappers.resize module
- garage.envs.wrappers.stack_frames module