garage._environment
¶
Base Garage Environment API.
-
class
EnvSpec
(observation_space, action_space, max_episode_length=None)[source]¶ Bases:
garage.InOutSpec
Describes the action and observation spaces of an environment.
- Parameters
observation_space (akro.Space) – The observation space of the env.
action_space (akro.Space) – The action space of the env.
max_episode_length (int) – The maximum number of steps allowed in an episode.
-
property
action_space
(self)¶ Get action space.
- Returns
Action space of the env.
- Return type
akro.Space
-
property
observation_space
(self)¶ Get observation space of the env.
- Returns
Observation space.
- Return type
akro.Space
-
property
max_episode_length
(self)¶ Get max episode steps.
- Returns
The maximum number of steps allowed in an episode.
- Return type
int
-
property
input_space
(self)¶ Get input space of the module.
- Returns
Input space of the module.
- Return type
akro.Space
-
property
output_space
(self)¶ Get output space of the module.
- Returns
Output space of the module.
- Return type
akro.Space
-
class
EnvStep
[source]¶ Bases:
collections.namedtuple()
A tuple representing a single step returned by the environment.
-
action
¶ A numpy array of shape \((A^*)\) containing the action for this time step. It must conform to EnvStep.action_space. None if step_type is StepType.FIRST, i.e. at the start of a sequence.
- Type
numpy.ndarray
-
reward
¶ A float representing the reward for taking the action given the observation at this time step. None if step_type is StepType.FIRST, i.e. at the start of a sequence.
- Type
float
-
observation
¶ A numpy array of shape \((O^*)\) containing the observation for this time step in the environment, i.e. the observation after applying the action. It must conform to EnvStep.observation_space.
- Type
numpy.ndarray
-
step_type
¶ A StepType enum value: one of StepType.FIRST, StepType.MID, StepType.TERMINAL, or StepType.TIMEOUT.
- Type
StepType
-
property
first
(self)¶ bool: Whether this EnvStep is the first of a sequence.
-
property
mid
(self)¶ bool: Whether this EnvStep is in the middle of a sequence.
-
property
terminal
(self)¶ bool: Whether this EnvStep records a termination condition.
-
property
timeout
(self)¶ bool: Whether this EnvStep records a timeout condition.
-
property
last
(self)¶ bool: Whether this EnvStep is the last of a sequence.
-
count
()¶ Return number of occurrences of value.
-
index
()¶ Return first index of value.
Raises ValueError if the value is not present.
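The relationship between step_type and the boolean properties can be sketched with the standard library alone. This is a minimal illustration, not garage's implementation: ToyEnvStep and the local StepType enum are hypothetical stand-ins, the numeric enum values are assumptions, and only the four fields documented above are modeled.

```python
import enum
from collections import namedtuple


class StepType(enum.Enum):
    """Stand-in for garage's StepType; the numeric values are assumptions."""
    FIRST = 0
    MID = 1
    TERMINAL = 2
    TIMEOUT = 3


class ToyEnvStep(namedtuple('ToyEnvStep',
                            ['action', 'reward', 'observation', 'step_type'])):
    """Hypothetical stand-in for EnvStep with only the documented fields."""

    @property
    def terminal(self):
        return self.step_type is StepType.TERMINAL

    @property
    def timeout(self):
        return self.step_type is StepType.TIMEOUT

    @property
    def last(self):
        # A sequence ends on either a termination or a timeout.
        return self.terminal or self.timeout


step = ToyEnvStep(action=[0.1], reward=1.0, observation=[0.0, 0.5],
                  step_type=StepType.TIMEOUT)
```

Because the class derives from a namedtuple, count() and index() are the ordinary tuple methods and operate on the field values.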
-
-
class
Environment
[source]¶ Bases:
abc.ABC
The main API for garage environments.
The public API methods are:
Functions
- reset()
- step()
- render()
- visualize()
- close()
Set the following properties:
Properties: Description
- action_space: The action space specification
- observation_space: The observation space specification
- spec: The environment specifications
- render_modes: The list of supported render modes
Example of a simple rollout loop:
    env = MyEnv()
    policy = MyPolicy()
    first_observation, episode_info = env.reset()
    env.visualize()  # visualization window opened

    episode = []
    # Determine the first action
    first_action = policy.get_action(first_observation, episode_info)
    episode.append(env.step(first_action))

    while not episode[-1].last:  # last is a property, not a method
        action = policy.get_action(episode[-1].observation)
        episode.append(env.step(action))

    env.close()  # visualization window closed
- Make sure your environment is pickle-able:
Garage pickles the environment via the cloudpickle module to save snapshots of the experiment. However, some environments may contain attributes that are not pickle-able (e.g. a client-server connection). In such cases, override __setstate__() and __getstate__() to add your custom pickle logic.
You might want to refer to the EzPickle module: https://github.com/openai/gym/blob/master/gym/utils/ezpickle.py for a lightweight way to pickle and unpickle an object via its constructor arguments.
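A minimal sketch of the __getstate__() / __setstate__() override described above. ToyConnectionEnv and its attributes are hypothetical, a threading.Lock stands in for a live client-server connection, and the standard pickle module is used here instead of cloudpickle on the assumption that both honor the same two hooks.

```python
import pickle
import threading


class ToyConnectionEnv:
    """Hypothetical environment holding an attribute that cannot be pickled."""

    def __init__(self, host='localhost'):
        self.host = host
        # threading.Lock stands in for a live client-server connection.
        self._connection = threading.Lock()

    def __getstate__(self):
        # Copy the instance dict and drop the un-picklable attribute.
        state = self.__dict__.copy()
        del state['_connection']
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then rebuild the connection.
        self.__dict__.update(state)
        self._connection = threading.Lock()


restored = pickle.loads(pickle.dumps(ToyConnectionEnv(host='sim-server')))
```

Only the picklable configuration (host) is serialized; the connection is recreated on load rather than copied.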
-
property
action_space
(self)¶ akro.Space: The action space specification.
-
property
observation_space
(self)¶ akro.Space: The observation space specification.
-
property
spec
(self)¶ EnvSpec: The environment specification.
-
property
render_modes
(self)¶ list: A list of strings representing the supported render modes.
See render() for a list of modes.
-
abstract
reset
(self)[source]¶ Resets the environment.
- Returns
- numpy.ndarray: The first observation conforming to observation_space.
- dict: The episode-level information. Note that this is not part of env_info provided in step(). It contains information about the entire episode, which could be needed to determine the first action (e.g. in the goal-conditioned or multi-task RL (MTRL) setting).
- Return type
tuple of (numpy.ndarray, dict)
-
abstract
step
(self, action)[source]¶ Steps the environment with the action and returns an EnvStep.
If the environment returned the last EnvStep of a sequence (either of type TERMINAL or TIMEOUT) at the previous step, this call to step() will start a new sequence and action will be ignored.
If spec.max_episode_length is reached after applying the action and the environment has not terminated the episode, step() should return an EnvStep with step_type==StepType.TIMEOUT.
If possible, update the visualization display as well.
- Parameters
action (object) – A NumPy array, or a nested dict, list or tuple of arrays conforming to action_space.
- Returns
The environment step resulting from the action.
- Return type
EnvStep
- Raises
RuntimeError – if step() is called after the environment has been constructed and reset() has not been called.
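The step() contract above can be illustrated with a toy environment. This is one possible reading of the rules, not garage's implementation: CounterEnv, ToyStep, the goal of reaching 3, and the local StepType enum are all hypothetical, and the sketch does not subclass Environment.

```python
import enum
from collections import namedtuple


class StepType(enum.Enum):
    """Stand-in for garage's StepType; the numeric values are assumptions."""
    FIRST = 0
    MID = 1
    TERMINAL = 2
    TIMEOUT = 3


ToyStep = namedtuple('ToyStep', ['observation', 'reward', 'step_type'])


class CounterEnv:
    """Hypothetical env: the state is a counter and reaching 3 terminates."""

    def __init__(self, max_episode_length=5):
        self._max_episode_length = max_episode_length
        self._state = None
        self._step_cnt = None  # None until reset() has been called
        self._episode_over = False

    def reset(self):
        self._state = 0
        self._step_cnt = 0
        self._episode_over = False
        return self._state, {}

    def step(self, action):
        if self._step_cnt is None:
            raise RuntimeError('reset() must be called before step().')
        if self._episode_over:
            # The previous step was the last of its sequence: start a new
            # sequence and ignore the action.
            observation, _ = self.reset()
            return ToyStep(observation, None, StepType.FIRST)
        self._state += action
        self._step_cnt += 1
        if self._state >= 3:
            step_type = StepType.TERMINAL
        elif self._step_cnt >= self._max_episode_length:
            # Length limit reached without the env terminating the episode.
            step_type = StepType.TIMEOUT
        else:
            step_type = StepType.MID
        self._episode_over = step_type in (StepType.TERMINAL,
                                           StepType.TIMEOUT)
        return ToyStep(self._state, float(action), step_type)


env = CounterEnv(max_episode_length=2)
env.reset()
types = [env.step(0).step_type for _ in range(3)]
```

With max_episode_length=2 and an action that never reaches the goal, the second step times out and the third call begins a new sequence.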
-
abstract
render
(self, mode)[source]¶ Renders the environment.
The set of supported modes varies per environment. By convention, if mode is:
- rgb_array: Return a numpy.ndarray with shape (x, y, 3) and type uint8, representing RGB values for an x-by-y pixel image, suitable for turning into a video.
- ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
Make sure that your class’s render_modes includes the list of supported modes.
For example:
    import numpy as np

    class MyEnv(Environment):

        @property
        def render_modes(self):
            return ['rgb_array', 'ansi']

        def render(self, mode):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame for video
            elif mode == 'ansi':
                ...  # return text output
            else:
                raise ValueError('Supported render modes are {}, but '
                                 'got render mode {} instead.'.format(
                                     self.render_modes, mode))
- Parameters
mode (str) – the mode to render with. The string must be present in self.render_modes.
-
abstract
visualize
(self)[source]¶ Creates a visualization of the environment.
This function should be called only once after reset() to set up the visualization display. The visualization should be updated whenever the environment changes (i.e. when step() is called).
Calling close() will deallocate any resources and close any windows created by visualize(). If close() is not explicitly called, the visualization will be closed when the environment is destructed (i.e. garbage collected).
-
class
Wrapper
(env)[source]¶ Bases:
garage._environment.Environment
A wrapper for an environment that implements the Environment API.
-
property
action_space
(self)¶ akro.Space: The action space specification.
-
property
observation_space
(self)¶ akro.Space: The observation space specification.
-
property
spec
(self)¶ EnvSpec: The environment specification.
-
property
render_modes
(self)¶ list: A list of strings representing the supported render modes.
-
step
(self, action)[source]¶ Step the wrapped env.
- Parameters
action (np.ndarray) – An action provided by the agent.
- Returns
The environment step resulting from the action.
- Return type
EnvStep
-
reset
(self)[source]¶ Reset the wrapped env.
- Returns
- numpy.ndarray: The first observation conforming to observation_space.
- dict: The episode-level information. Note that this is not part of env_info provided in step(). It contains information about the entire episode, which could be needed to determine the first action (e.g. in the goal-conditioned or multi-task RL (MTRL) setting).
- Return type
tuple of (numpy.ndarray, dict)
-
property
unwrapped
(self)¶ garage.Environment: The inner environment.
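The delegation pattern behind a wrapper can be sketched as follows. None of these classes come from garage: ToyEnv, ToyWrapper, and ScaleReward are hypothetical, the dict returned by step() is a simplification of EnvStep, and resolving unwrapped through nested wrappers down to the innermost environment is an assumption about the intended behavior.

```python
class ToyEnv:
    """Hypothetical inner environment with a two-method API."""

    def reset(self):
        return 0, {}

    def step(self, action):
        return {'observation': action, 'reward': 1.0}


class ToyWrapper:
    """Stand-in for Wrapper: forwards every call to the wrapped env."""

    def __init__(self, env):
        self._env = env

    def reset(self):
        return self._env.reset()

    def step(self, action):
        return self._env.step(action)

    @property
    def unwrapped(self):
        # Assumption: nested wrappers resolve to the innermost environment.
        return getattr(self._env, 'unwrapped', self._env)


class ScaleReward(ToyWrapper):
    """Example wrapper that overrides step() to rescale the reward."""

    def __init__(self, env, scale):
        super().__init__(env)
        self._scale = scale

    def step(self, action):
        result = self._env.step(action)
        result['reward'] *= self._scale
        return result


inner = ToyEnv()
env = ScaleReward(ScaleReward(inner, 2.0), 5.0)
result = env.step(7)
```

A subclass overrides only the methods whose behavior it changes; everything else falls through to the wrapped environment.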
-
property