garage._environment

Base Garage Environment API.

class InOutSpec

Describes the input and output spaces of a primitive or module.

input_space :akro.Space
output_space :akro.Space

class EnvSpec(observation_space, action_space, max_episode_length=None)

Bases: InOutSpec

Describes the observations, actions, and time horizon of an MDP.

Parameters
  • observation_space (akro.Space) – The observation space of the env.

  • action_space (akro.Space) – The action space of the env.

  • max_episode_length (int or None) – The maximum number of steps allowed in an episode. Defaults to None.

max_episode_length :int or None
input_space :akro.Space
output_space :akro.Space

property action_space(self)

Get action space.

Returns

Action space of the env.

Return type

akro.Space

property observation_space(self)

Get observation space of the env.

Returns

Observation space.

Return type

akro.Space

class EnvStep

A tuple representing a single step returned by the environment.

env_spec

Specification for the environment from which this data was sampled.

Type

EnvSpec

action

A numpy array of shape \((A^*)\) containing the action for this time step. It must conform to env_spec.action_space. None if step_type is StepType.FIRST, i.e. at the start of a sequence.

Type

numpy.ndarray

reward

A float representing the reward for taking the action given the observation, at this time step. None if step_type is StepType.FIRST, i.e. at the start of a sequence.

Type

float

observation

A numpy array of shape \((O^*)\) containing the observation for this time step in the environment, i.e. the observation after applying the action. It must conform to env_spec.observation_space.

Type

numpy.ndarray

env_info

A dict containing environment state information.

Type

dict

step_type

A StepType enum value. One of StepType.FIRST, StepType.MID, StepType.TERMINAL, or StepType.TIMEOUT.

Type

StepType

env_spec :EnvSpec
action :numpy.ndarray
reward :float
observation :numpy.ndarray
env_info :Dict[str, np.ndarray or dict]
step_type :garage._dtypes.StepType

property first(self)

bool: Whether this EnvStep is the first of a sequence.

property mid(self)

bool: Whether this EnvStep is in the middle of a sequence.

property terminal(self)

bool: Whether this EnvStep records a termination condition.

property timeout(self)

bool: Whether this EnvStep records a time out condition.

property last(self)

bool: Whether this EnvStep is the last of a sequence, i.e. its step_type is TERMINAL or TIMEOUT.
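
The relationship between step_type and the boolean properties above can be sketched as follows. This is a minimal stand-in, not garage's implementation: the StepType enum is redefined locally with illustrative values, and SketchStep is a hypothetical, reduced version of EnvStep that carries only three fields.

```python
import enum
from typing import NamedTuple, Optional


class StepType(enum.Enum):
    """Local stand-in for garage's StepType (member values are illustrative)."""
    FIRST = 0
    MID = 1
    TERMINAL = 2
    TIMEOUT = 3


class SketchStep(NamedTuple):
    """Hypothetical, reduced stand-in for EnvStep."""
    observation: tuple
    reward: Optional[float]
    step_type: StepType

    @property
    def first(self):
        return self.step_type == StepType.FIRST

    @property
    def mid(self):
        return self.step_type == StepType.MID

    @property
    def terminal(self):
        return self.step_type == StepType.TERMINAL

    @property
    def timeout(self):
        return self.step_type == StepType.TIMEOUT

    @property
    def last(self):
        # A sequence ends on either a termination or a time out.
        return self.step_type in (StepType.TERMINAL, StepType.TIMEOUT)


step = SketchStep(observation=(0.0,), reward=1.0, step_type=StepType.TIMEOUT)
print(step.timeout, step.terminal, step.last)
```

Because these are properties, they are read without parentheses (step.last, not step.last()).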

class Environment

Bases: abc.ABC

The main API for garage environments.

The public API methods are:

  • reset()

  • step()

  • render()

  • visualize()

  • close()

Set the following properties:

  • action_space – The action space specification

  • observation_space – The observation space specification

  • spec – The environment specification

  • render_modes – The list of supported render modes

Example of a simple rollout loop:

env = MyEnv()
policy = MyPolicy()
first_observation, episode_info = env.reset()
env.visualize()  # visualization window opened

episode = []
# Determine the first action
first_action = policy.get_action(first_observation, episode_info)
episode.append(env.step(first_action))

# last is a property of EnvStep, so it is read without parentheses
while not episode[-1].last:
    action = policy.get_action(episode[-1].observation)
    episode.append(env.step(action))

env.close()  # visualization window closed

Make sure your environment is pickle-able:

Garage pickles the environment via the cloudpickle module to save snapshots of the experiment. However, some environments may contain attributes that are not pickle-able (e.g. a client-server connection). In such cases, override __setstate__() and __getstate__() to add your custom pickle logic.

You might want to refer to the EzPickle module (https://github.com/openai/gym/blob/master/gym/utils/ezpickle.py) for a lightweight way to pickle and unpickle an object via its constructor arguments.
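
The custom pickle logic described above might look like the following sketch. ConnectedEnv and its address argument are hypothetical, a threading.Lock stands in for a live client-server connection (it cannot be pickled either), and the standard pickle module is used instead of cloudpickle so that the example has no dependencies.

```python
import pickle
import threading


class ConnectedEnv:
    """Hypothetical environment that holds a non-picklable resource."""

    def __init__(self, address):
        self.address = address
        # Stand-in for a live connection; locks cannot be pickled.
        self._connection = threading.Lock()

    def __getstate__(self):
        # Copy the instance state and drop the non-picklable attribute.
        state = self.__dict__.copy()
        del state['_connection']
        return state

    def __setstate__(self, state):
        # Restore the picklable state, then rebuild the resource.
        self.__dict__.update(state)
        self._connection = threading.Lock()


restored = pickle.loads(pickle.dumps(ConnectedEnv('localhost:5000')))
print(restored.address)
```

Without the two overrides, pickle.dumps() would raise a TypeError on the lock attribute.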

property action_space(self)

akro.Space: The action space specification.

property observation_space(self)

akro.Space: The observation space specification.

property spec(self)

EnvSpec: The environment specification.

property render_modes(self)

list: A list of strings representing the supported render modes.

See render() for a list of modes.

abstract reset(self)

Resets the environment.

Returns

numpy.ndarray: The first observation conforming to observation_space.

dict: The episode-level information.

Note that this is not part of env_info provided in step(). It contains information about the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or multi-task RL (MTRL)).

Return type

tuple of (numpy.ndarray, dict)

abstract step(self, action)

Steps the environment with the action and returns an EnvStep.

If the environment returned the last EnvStep of a sequence (either of type TERMINAL or TIMEOUT) at the previous step, this call to step() will start a new sequence and action will be ignored.

If spec.max_episode_length is reached after applying the action and the environment has not terminated the episode, step() should return an EnvStep with step_type==StepType.TIMEOUT.

If possible, update the visualization display as well.

Parameters

action (object) – A NumPy array, or a nested dict, list or tuple of arrays conforming to action_space.

Returns

The environment step resulting from the action.

Return type

EnvStep

Raises

RuntimeError – if step() is called after the environment has been constructed and reset() has not been called.
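
The time out and error rules above can be illustrated with a toy environment. This is a sketch under stated assumptions, not garage code: CountdownEnv is hypothetical, StepType is redefined locally, and step() returns a plain (state, step_type) tuple instead of an EnvStep. What happens when step() is called again after the last step is left out.

```python
import enum


class StepType(enum.Enum):
    """Local stand-in for garage's StepType (member values are illustrative)."""
    FIRST = 0
    MID = 1
    TERMINAL = 2
    TIMEOUT = 3


class CountdownEnv:
    """Hypothetical toy env: the state counts down and terminates at 0."""

    def __init__(self, start, max_episode_length):
        self._start = start
        self._max_episode_length = max_episode_length
        self._state = None
        self._step_cnt = None  # None means reset() has not been called

    def reset(self):
        self._state = self._start
        self._step_cnt = 0
        return self._state, {}

    def step(self, action):
        if self._step_cnt is None:
            raise RuntimeError('reset() must be called before step()!')
        self._state -= action
        self._step_cnt += 1
        if self._state <= 0:
            # The environment itself ended the episode.
            step_type = StepType.TERMINAL
        elif self._step_cnt >= self._max_episode_length:
            # Horizon reached without termination.
            step_type = StepType.TIMEOUT
        else:
            step_type = StepType.MID
        return self._state, step_type


env = CountdownEnv(start=10, max_episode_length=3)
env.reset()
step_types = [env.step(1)[1] for _ in range(3)]
print(step_types)
```

Termination takes precedence over the time out check here, so a step that both reaches the horizon and ends the episode is reported as TERMINAL, matching the condition "the environment has not terminated the episode".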

abstract render(self, mode)

Renders the environment.

The set of supported modes varies per environment. By convention, if mode is:

  • rgb_array: Return a numpy.ndarray with shape (x, y, 3) and type uint8, representing RGB values for an x-by-y pixel image, suitable for turning into a video.

  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Make sure that your class’s render_modes includes the list of supported modes.

For example:

import numpy as np

from garage import Environment


class MyEnv(Environment):
    @property
    def render_modes(self):
        # render_modes is a property, so it must be declared as one
        return ['rgb_array', 'ansi']

    def render(self, mode):
        if mode == 'rgb_array':
            return np.array(...)  # return RGB frame for video
        elif mode == 'ansi':
            ...  # return text output
        else:
            raise ValueError('Supported render modes are {}, but '
                             'got render mode {} instead.'.format(
                                 self.render_modes, mode))

Parameters

mode (str) – the mode to render with. The string must be present in self.render_modes.

abstract visualize(self)

Creates a visualization of the environment.

This function should be called only once after reset() to set up the visualization display. The visualization should be updated whenever the environment changes (i.e. when step() is called).

Calling close() will deallocate any resources and close any windows created by visualize(). If close() is not explicitly called, the visualization will be closed when the environment is destructed (i.e. garbage collected).

abstract close(self)

Closes the environment.

This method should close all windows invoked by visualize().

Override this function in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when they are garbage collected or when the program exits.

class Wrapper(env)

Bases: Environment

A wrapper for an environment that implements the Environment API.

property action_space(self)

akro.Space: The action space specification.

property observation_space(self)

akro.Space: The observation space specification.

property spec(self)

EnvSpec: The environment specification.

property render_modes(self)

list: A list of strings representing the supported render modes.

step(self, action)

Step the wrapped env.

Parameters

action (np.ndarray) – An action provided by the agent.

Returns

The environment step resulting from the action.

Return type

EnvStep

reset(self)

Reset the wrapped env.

Returns

numpy.ndarray: The first observation conforming to observation_space.

dict: The episode-level information.

Note that this is not part of env_info provided in step(). It contains information about the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or multi-task RL (MTRL)).

Return type

tuple of (numpy.ndarray, dict)

render(self, mode)

Render the wrapped environment.

Parameters

mode (str) – the mode to render with. The string must be present in self.render_modes.

Returns

The return value of render(), which depends on the wrapped env.

Return type

object

visualize(self)

Creates a visualization of the wrapped environment.

close(self)

Close the wrapped env.

property unwrapped(self)

garage.Environment: The inner environment.
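
The delegation pattern behind a wrapper can be sketched as below. All names here (InnerEnv, SketchWrapper, DoubleAction) are hypothetical stand-ins rather than garage classes, and the sketch assumes that unwrapped resolves through nested wrappers to the innermost environment; the description above only says it returns the inner environment.

```python
class InnerEnv:
    """Hypothetical minimal environment used only for this sketch."""

    render_modes = ['ansi']

    def __init__(self):
        self.closed = False

    def reset(self):
        return 0, {}

    def step(self, action):
        # Echo the action back so the effect of wrappers is visible.
        return action

    def close(self):
        self.closed = True


class SketchWrapper:
    """Forwards every call to the wrapped environment."""

    def __init__(self, env):
        self._env = env

    @property
    def render_modes(self):
        return self._env.render_modes

    def reset(self):
        return self._env.reset()

    def step(self, action):
        return self._env.step(action)

    def close(self):
        self._env.close()

    @property
    def unwrapped(self):
        # Assumption: follow nested wrappers down to the innermost env.
        return getattr(self._env, 'unwrapped', self._env)


class DoubleAction(SketchWrapper):
    """Example subclass that overrides only step()."""

    def step(self, action):
        return super().step(2 * action)


inner = InnerEnv()
env = DoubleAction(DoubleAction(inner))
print(env.step(3), env.unwrapped is inner)
```

A subclass overrides only the methods whose behavior it changes; everything else falls through to the wrapped environment.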