garage
Garage Base.
- class EpisodeBatch(env_spec, episode_infos, observations, last_observations, actions, rewards, env_infos, agent_infos, step_types, lengths)¶
Bases:
TimeStepBatch
A tuple representing a batch of whole episodes.
Data type for on-policy algorithms.
An
EpisodeBatch
represents a batch of whole episodes, produced when one or more agents interact with one or more environments.
Symbols used below:
- \(N\): Episode batch dimension
- \([T]\): Variable-length time dimension of each episode
- \(S^*\): Single-step shape of a time-series tensor
- \(N \bullet [T]\): A dimension computed by flattening a variable-length time dimension \([T]\) into a single batch dimension with length \(\sum_{i \in N} [T]_i\)
- episode_infos¶
A dict of numpy arrays containing the episode-level information of each episode. Each value of this dict should be a numpy array of shape \((N, S^*)\). For example, in goal-conditioned reinforcement learning this could contain the goal state for each episode.
- observations¶
A numpy array of shape \((N \bullet [T], O^*)\) containing the (possibly multi-dimensional) observations for all time steps in this batch. These must conform to
EnvStep.observation_space
.
- Type
numpy.ndarray
- last_observations¶
A numpy array of shape \((N, O^*)\) containing the last observation of each episode. This is necessary since there is one more observation than action in every episode.
- Type
numpy.ndarray
- actions¶
A numpy array of shape \((N \bullet [T], A^*)\) containing the (possibly multi-dimensional) actions for all time steps in this batch. These must conform to
EnvStep.action_space
.
- Type
numpy.ndarray
- rewards¶
A numpy array of shape \((N \bullet [T])\) containing the rewards for all time steps in this batch.
- Type
numpy.ndarray
- env_infos¶
A dict of numpy arrays containing arbitrary environment state information. Each value of this dict should be a numpy array of shape \((N \bullet [T])\) or \((N \bullet [T], S^*)\).
- agent_infos¶
A dict of numpy arrays containing arbitrary agent state information. Each value of this dict should be a numpy array of shape \((N \bullet [T])\) or \((N \bullet [T], S^*)\). For example, this may contain the hidden states from an RNN policy.
- step_types¶
A numpy array of StepType with shape \((N \bullet [T])\) containing the time step types for all transitions in this batch.
- Type
numpy.ndarray
- lengths¶
An integer numpy array of shape \((N,)\) containing the length of each episode in this batch. This may be used to reconstruct the individual episodes.
- Type
numpy.ndarray
- Raises
ValueError – If any of the above attributes do not conform to their prescribed types and shapes.
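To make the shapes above concrete, here is a small numpy sketch for a hypothetical batch of two episodes; all sizes and the 'goal' key are made up for illustration and are not part of the API:
    import numpy as np

    # Two hypothetical episodes of lengths 3 and 2, with a 4-dimensional
    # observation space and a 2-dimensional action space (all sizes made up).
    lengths = np.array([3, 2], dtype=np.int64)       # (N,)
    observations = np.zeros((lengths.sum(), 4))      # (N • [T], O^*) = (5, 4)
    last_observations = np.zeros((2, 4))             # (N, O^*)
    actions = np.zeros((lengths.sum(), 2))           # (N • [T], A^*) = (5, 2)
    rewards = np.zeros(lengths.sum())                # (N • [T],)
    episode_infos = {'goal': np.zeros((2, 4))}       # each value has shape (N, S^*)

    # N • [T] is just sum(lengths): each episode's variable-length time
    # dimension is flattened into one batch dimension of length 3 + 2 = 5.
    assert observations.shape[0] == lengths.sum() == 5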
- property next_observations¶
Get the observations seen after actions are performed.
In an
EpisodeBatch
, next_observations don’t need to be stored explicitly, since the next observation is already stored in the batch.
- Returns
- The “next_observations” with shape
\((N \bullet [T], O^*)\)
- Return type
np.ndarray
- property episode_infos¶
Get the episode_infos.
In an
EpisodeBatch
, episode_infos only need to be stored once per episode. However, the episode_infos field of
TimeStepBatch
has shape \((N \bullet [T])\). This method expands episode_infos_by_episode (which have shape \((N)\)) to \((N \bullet [T])\).
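The expansion described above can be reproduced with np.repeat; a rough sketch, with made-up lengths and a hypothetical per-episode goal array:
    import numpy as np

    lengths = np.array([3, 2])                   # episode lengths, shape (N,)
    goal_by_episode = np.array([[0.0], [1.0]])   # shape (N, S^*) = (2, 1)

    # Repeat each episode's value once per time step: shape becomes (N • [T], S^*).
    goal_per_step = np.repeat(goal_by_episode, lengths, axis=0)
    assert goal_per_step.shape == (5, 1)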
- property padded_observations¶
Padded observations.
- Returns
- Padded observations with shape of
\((N, max_episode_length, O^*)\).
- Return type
np.ndarray
- property padded_actions¶
Padded actions.
- Returns
- Padded actions with shape of
\((N, max_episode_length, A^*)\).
- Return type
np.ndarray
- property observations_list¶
Split observations into a list.
- Returns
Split list of observations.
- Return type
list[np.ndarray]
- property actions_list¶
Split actions into a list.
- Returns
Split list of actions.
- Return type
list[np.ndarray]
- property padded_rewards¶
Padded rewards.
- Returns
- Padded rewards with shape of
\((N, max_episode_length)\).
- Return type
np.ndarray
- property valids¶
An array indicating valid steps in a padded tensor.
- Returns
the shape is \((N, max_episode_length)\).
- Return type
np.ndarray
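The padded views and valids are meant to be used together; a hedged sketch that averages per-episode returns while ignoring padding (batch is assumed to be an existing EpisodeBatch):
    def mean_undiscounted_return(batch):
        """Average per-episode return, ignoring padding (illustrative only)."""
        # padded_rewards: (N, max_episode_length); valids: same shape, with 1.0
        # marking real steps and 0.0 marking padding.
        episode_returns = (batch.padded_rewards * batch.valids).sum(axis=1)
        return episode_returns.mean()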
- property padded_next_observations¶
Padded next_observations array.
- Returns
Array of shape \((N, max_episode_length, O^*)\)
- Return type
np.ndarray
- property padded_step_types¶
Padded step_type array.
- Returns
Array of shape \((N, max_episode_length)\)
- Return type
np.ndarray
- property padded_agent_infos¶
Padded agent infos.
- property padded_env_infos¶
Padded env infos.
- property terminals¶
Get an array of booleans indicating terminal information.
- episode_infos_by_episode :numpy.ndarray¶
- last_observations :numpy.ndarray¶
- lengths :numpy.ndarray¶
- env_spec :garage.EnvSpec¶
- observations :numpy.ndarray¶
- actions :numpy.ndarray¶
- rewards :numpy.ndarray¶
- agent_infos :Dict[str, np.ndarray or dict]¶
- env_infos :Dict[str, np.ndarray or dict]¶
- step_types :numpy.ndarray¶
- classmethod concatenate(*batches)¶
Create a EpisodeBatch by concatenating EpisodeBatches.
- Parameters
batches (list[EpisodeBatch]) – Batches to concatenate.
- Returns
The concatenation of the batches.
- Return type
EpisodeBatch
- split()¶
Split an EpisodeBatch into a list of EpisodeBatches.
The opposite of concatenate.
- Returns
- A list of EpisodeBatches, with one
episode per batch.
- Return type
list[EpisodeBatch]
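A short usage sketch of concatenate() and split(); batch_a and batch_b are assumed to be existing EpisodeBatch instances:
    from garage import EpisodeBatch

    def merge_and_split(batch_a, batch_b):
        """Concatenate two EpisodeBatches, then recover single episodes."""
        merged = EpisodeBatch.concatenate(batch_a, batch_b)
        episodes = merged.split()  # one single-episode EpisodeBatch per entry
        assert len(episodes) == len(merged.lengths)
        return episodes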
- to_list()¶
Convert the batch into a list of dictionaries.
- Returns
- Keys:
- observations (np.ndarray): Non-flattened array of
observations. Has shape (T, S^*) (the unflattened state space of the current environment). observations[i] was used by the agent to choose actions[i].
- next_observations (np.ndarray): Non-flattened array of
observations. Has shape (T, S^*). next_observations[i] was observed by the agent after taking actions[i].
- actions (np.ndarray): Non-flattened array of actions. Must
have shape (T, S^*) (the unflattened action space of the current environment).
- rewards (np.ndarray): Array of rewards of shape (T,) (1D
array of length timesteps).
- agent_infos (dict[str, np.ndarray]): Dictionary of stacked,
non-flattened agent_info arrays.
- env_infos (dict[str, np.ndarray]): Dictionary of stacked,
non-flattened env_info arrays.
- step_types (numpy.ndarray): A numpy array of StepType with
shape (T,) containing the time step types for all transitions in this batch.
- episode_infos (dict[str, np.ndarray]): Dictionary of stacked,
non-flattened episode_info arrays.
- Return type
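For example, per-episode undiscounted returns can be computed from the dictionaries returned by to_list(); a minimal sketch:
    def undiscounted_returns(batch):
        """Per-episode reward sums computed from to_list() (sketch only)."""
        # Each entry corresponds to one episode; 'rewards' has shape (T,).
        return [path['rewards'].sum() for path in batch.to_list()]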
- classmethod from_list(env_spec, paths)¶
Create a EpisodeBatch from a list of episodes.
- Parameters
env_spec (EnvSpec) – Specification for the environment from which this data was sampled.
paths (list[dict[str, np.ndarray or dict[str, np.ndarray]]]) –
Keys: * episode_infos (dict[str, np.ndarray]): Dictionary of stacked,
non-flattened episode_info arrays, each of shape (S^*).
- observations (np.ndarray): Non-flattened array of
observations. Typically has shape (T, S^*) (the unflattened state space of the current environment). observations[i] was used by the agent to choose actions[i]. observations may instead have shape (T + 1, S^*).
- next_observations (np.ndarray): Non-flattened array of
observations. Has shape (T, S^*). next_observations[i] was observed by the agent after taking actions[i]. Optional. Note that to ensure all information from the environment was preserved, observations[i] must have shape (T + 1, S^*), or this key must be set. However, this method is lenient and will “duplicate” the last observation if the original last observation has been lost.
- actions (np.ndarray): Non-flattened array of actions. Must
have shape (T, S^*) (the unflattened action space of the current environment).
- rewards (np.ndarray): Array of rewards of shape (T,) (1D
array of length timesteps).
- agent_infos (dict[str, np.ndarray]): Dictionary of stacked,
non-flattened agent_info arrays.
- env_infos (dict[str, np.ndarray]): Dictionary of stacked,
non-flattened env_info arrays.
- step_types (numpy.ndarray): A numpy array of StepType with
shape (T,) containing the time step types for all transitions in this batch.
- to_time_step_list() List[Dict[str, numpy.ndarray]] ¶
Convert the batch into a list of dictionaries.
Breaks the
TimeStepBatch
into a list of single time step sample dictionaries. len(rewards) (i.e. the number of discrete time steps) dictionaries are returned.
- Returns
- Keys:
- episode_infos (dict[str, np.ndarray]): A dict of numpy arrays
containing the episode-level information of each episode. Each value of this dict must be a numpy array of shape \((S^*,)\). For example, in goal-conditioned reinforcement learning this could contain the goal state for each episode.
- observations (numpy.ndarray): Non-flattened array of
observations. Typically has shape (batch_size, S^*) (the unflattened state space of the current environment).
- actions (numpy.ndarray): Non-flattened array of actions. Must
have shape (batch_size, S^*) (the unflattened action space of the current environment).
- rewards (numpy.ndarray): Array of rewards of shape (
batch_size,) (1D array of length batch_size).
- next_observation (numpy.ndarray): Non-flattened array of next
observations. Has shape (batch_size, S^*). next_observations[i] was observed by the agent after taking actions[i].
- env_infos (dict): A dict of arbitrary environment state
information.
- agent_infos (dict): A dict of arbitrary agent state
information. For example, this may contain the hidden states from an RNN policy.
- step_types (numpy.ndarray): A numpy array of StepType with
shape (batch_size,) containing the time step types for all transitions in this batch.
- Return type
- classmethod from_time_step_list(env_spec, ts_samples)¶
Create a
TimeStepBatch
from a list of time step dictionaries.
- Parameters
env_spec (EnvSpec) – Specification for the environment from which this data was sampled.
ts_samples (list[dict[str, np.ndarray or dict[str, np.ndarray]]]) –
keys: * episode_infos (dict[str, np.ndarray]): A dict of numpy arrays
containing the episode-level information of each episode. Each value of this dict must be a numpy array of shape \((N, S^*)\). For example, in goal-conditioned reinforcement learning this could contain the goal state for each episode.
- observations (numpy.ndarray): Non-flattened array of
observations. Typically has shape (batch_size, S^*) (the unflattened state space of the current environment).
- actions (numpy.ndarray): Non-flattened array of actions.
Must have shape (batch_size, S^*) (the unflattened action space of the current environment).
- rewards (numpy.ndarray): Array of rewards of shape (
batch_size,) (1D array of length batch_size).
- next_observation (numpy.ndarray): Non-flattened array of next
observations. Has shape (batch_size, S^*). next_observations[i] was observed by the agent after taking actions[i].
- env_infos (dict): A dict of arbitrary environment state
information.
- agent_infos (dict): A dict of arbitrary agent
state information. For example, this may contain the hidden states from an RNN policy.
- step_types (numpy.ndarray): A numpy array of StepType with shape (batch_size,) containing the time step types for all transitions in this batch.
- Returns
The concatenation of samples.
- Return type
- Raises
ValueError – If no dicts are provided.
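A hedged round-trip sketch using the two methods above; note that, as documented, from_time_step_list rebuilds a TimeStepBatch rather than an EpisodeBatch:
    from garage import TimeStepBatch

    def flatten_to_time_steps(episode_batch):
        """Round-trip an EpisodeBatch through per-step dictionaries (sketch)."""
        samples = episode_batch.to_time_step_list()
        # As documented above, from_time_step_list produces a TimeStepBatch.
        return TimeStepBatch.from_time_step_list(episode_batch.env_spec, samples)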
- class TimeStep¶
A single TimeStep in an environment.
- A
TimeStep
represents a single sample when an agent interacts with an environment. It describes a SARS (state, action, reward, next state) tuple that characterizes the evolution of an MDP.
- episode_info¶
A dict of numpy arrays of shape \((S^*,)\) containing episode-level information of each episode. For example, in goal-conditioned reinforcement learning this could contain the goal state for each episode.
- observation¶
A numpy array of shape \((O^*)\) containing the observation for this time step in the environment. These must conform to
EnvStep.observation_space
. The observation before applying the action. None if step_type is StepType.FIRST, i.e. at the start of a sequence.
- Type
numpy.ndarray
- action¶
A numpy array of shape \((A^*)\) containing the action for this time step. These must conform to
EnvStep.action_space
. None if step_type is StepType.FIRST, i.e. at the start of a sequence.
- Type
numpy.ndarray
- reward¶
A float representing the reward for taking the action given the observation, at this time step. None if step_type is StepType.FIRST, i.e. at the start of a sequence.
- Type
- next_observation¶
A numpy array of shape \((O^*)\) containing the observation for this time step in the environment. These must conform to
EnvStep.observation_space
. The observation after applying the action.
- Type
numpy.ndarray
- agent_info¶
A dict of arbitrary agent state information. For example, this may contain the hidden states from an RNN policy.
- Type
- step_type¶
a
StepType
enum value. Can be one of StepType.FIRST, StepType.MID, StepType.TERMINAL, or StepType.TIMEOUT.
- Type
- env_spec :garage.EnvSpec¶
- episode_info :Dict[str, numpy.ndarray]¶
- observation :numpy.ndarray¶
- action :numpy.ndarray¶
- reward :float¶
- next_observation :numpy.ndarray¶
- env_info :Dict[str, numpy.ndarray]¶
- agent_info :Dict[str, numpy.ndarray]¶
- step_type :StepType¶
- classmethod from_env_step(env_step, last_observation, agent_info, episode_info)¶
Create a TimeStep from a EnvStep.
- Parameters
env_step (EnvStep) – the env step returned by the environment.
last_observation (numpy.ndarray) – A numpy array of shape \((O^*)\) containing the observation for this time step in the environment. These must conform to
EnvStep.observation_space
. The observation before applying the action.
agent_info (dict) – A dict of arbitrary agent state information.
episode_info (dict) – A dict of arbitrary information associated with the whole episode.
- Returns
The TimeStep with all information of EnvStep plus the agent info.
- Return type
TimeStep
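A rough sketch of how from_env_step might be used to collect a single TimeStep; env and policy are placeholders, and the assumption that policy.get_action returns (action, agent_info) is illustrative rather than guaranteed by this page:
    from garage import TimeStep

    def first_time_step(env, policy):
        """Collect one TimeStep from a single environment step (sketch)."""
        last_obs, episode_info = env.reset()
        action, agent_info = policy.get_action(last_obs)
        env_step = env.step(action)
        # Pair the environment step with the observation the action was based on.
        return TimeStep.from_env_step(
            env_step=env_step,
            last_observation=last_obs,
            agent_info=agent_info,
            episode_info=episode_info)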
- class TimeStepBatch¶
A tuple representing a batch of TimeSteps.
Data type for off-policy algorithms, imitation learning and batch-RL.
- episode_infos¶
A dict of numpy arrays containing the episode-level information of each episode. Each value of this dict should be a numpy array of shape \((N, S^*)\). For example, in goal-conditioned reinforcement learning this could contain the goal state for each episode.
- observations¶
Non-flattened array of observations. Typically has shape (batch_size, S^*) (the unflattened state space of the current environment).
- Type
numpy.ndarray
- actions¶
Non-flattened array of actions. Must have shape (batch_size, S^*) (the unflattened action space of the current environment).
- Type
numpy.ndarray
- rewards¶
Array of rewards of shape (batch_size, 1).
- Type
numpy.ndarray
- next_observations¶
Non-flattened array of next observations. Has shape (batch_size, S^*). next_observations[i] was observed by the agent after taking actions[i].
- Type
numpy.ndarray
- agent_infos¶
A dict of arbitrary agent state information. For example, this may contain the hidden states from an RNN policy.
- Type
- step_types¶
A numpy array of StepType with shape (batch_size,) containing the time step types for all transitions in this batch.
- Type
numpy.ndarray
- Raises
ValueError – If any of the above attributes do not conform to their prescribed types and shapes.
- property terminals¶
Get an array of booleans indicating terminal information.
- env_spec :garage.EnvSpec¶
- episode_infos :Dict[str, np.ndarray or dict]¶
- observations :numpy.ndarray¶
- actions :numpy.ndarray¶
- rewards :numpy.ndarray¶
- next_observations :numpy.ndarray¶
- agent_infos :Dict[str, np.ndarray or dict]¶
- env_infos :Dict[str, np.ndarray or dict]¶
- step_types :numpy.ndarray¶
- classmethod concatenate(*batches)¶
Concatenate two or more TimeStepBatches.
- Parameters
batches (list[TimeStepBatch]) – Batches to concatenate.
- Returns
The concatenation of the batches.
- Return type
TimeStepBatch
- Raises
ValueError – If no TimeStepBatches are provided.
- split() List[TimeStepBatch] ¶
Split a
TimeStepBatch
into a list of TimeStepBatches.
The opposite of concatenate.
- Returns
- A list of TimeStepBatches, with one TimeStep per TimeStepBatch.
- Return type
- to_time_step_list() List[Dict[str, numpy.ndarray]] ¶
Convert the batch into a list of dictionaries.
Breaks the
TimeStepBatch
into a list of single time step sample dictionaries. len(rewards) (i.e. the number of discrete time steps) dictionaries are returned.
- Returns
- Keys:
- episode_infos (dict[str, np.ndarray]): A dict of numpy arrays
containing the episode-level information of each episode. Each value of this dict must be a numpy array of shape \((S^*,)\). For example, in goal-conditioned reinforcement learning this could contain the goal state for each episode.
- observations (numpy.ndarray): Non-flattened array of
observations. Typically has shape (batch_size, S^*) (the unflattened state space of the current environment).
- actions (numpy.ndarray): Non-flattened array of actions. Must
have shape (batch_size, S^*) (the unflattened action space of the current environment).
- rewards (numpy.ndarray): Array of rewards of shape (
batch_size,) (1D array of length batch_size).
- next_observation (numpy.ndarray): Non-flattened array of next
observations. Has shape (batch_size, S^*). next_observations[i] was observed by the agent after taking actions[i].
- env_infos (dict): A dict of arbitrary environment state
information.
- agent_infos (dict): A dict of arbitrary agent state
information. For example, this may contain the hidden states from an RNN policy.
- step_types (numpy.ndarray): A numpy array of StepType with
shape (batch_size,) containing the time step types for all transitions in this batch.
- Return type
- classmethod from_time_step_list(env_spec, ts_samples)¶
Create a
TimeStepBatch
from a list of time step dictionaries.
- Parameters
env_spec (EnvSpec) – Specification for the environment from which this data was sampled.
ts_samples (list[dict[str, np.ndarray or dict[str, np.ndarray]]]) –
keys: * episode_infos (dict[str, np.ndarray]): A dict of numpy arrays
containing the episode-level information of each episode. Each value of this dict must be a numpy array of shape \((N, S^*)\). For example, in goal-conditioned reinforcement learning this could contain the goal state for each episode.
- observations (numpy.ndarray): Non-flattened array of
observations. Typically has shape (batch_size, S^*) (the unflattened state space of the current environment).
- actions (numpy.ndarray): Non-flattened array of actions.
Must have shape (batch_size, S^*) (the unflattened action space of the current environment).
- rewards (numpy.ndarray): Array of rewards of shape (
batch_size,) (1D array of length batch_size).
- next_observation (numpy.ndarray): Non-flattened array of next
observations. Has shape (batch_size, S^*). next_observations[i] was observed by the agent after taking actions[i].
- env_infos (dict): A dict of arbitrary environment state
information.
- agent_infos (dict): A dict of arbitrary agent
state information. For example, this may contain the hidden states from an RNN policy.
- step_types (numpy.ndarray): A numpy array of StepType with shape (batch_size,) containing the time step types for all transitions in this batch.
- Returns
The concatenation of samples.
- Return type
- Raises
ValueError – If no dicts are provided.
- class Environment¶
Bases:
abc.ABC
The main API for garage environments.
The public API methods are:
Functions
reset()
step()
render()
visualize()
close()
Set the following properties:
- action_space: The action space specification
- observation_space: The observation space specification
- spec: The environment specifications
- render_modes: The list of supported render modes
Example of a simple rollout loop:
    env = MyEnv()
    policy = MyPolicy()
    first_observation, episode_info = env.reset()
    env.visualize()  # visualization window opened

    episode = []
    # Determine the first action
    first_action = policy.get_action(first_observation, episode_info)
    episode.append(env.step(first_action))

    while not episode[-1].last():
        action = policy.get_action(episode[-1].observation)
        episode.append(env.step(action))

    env.close()  # visualization window closed
- Make sure your environment is pickle-able:
Garage pickles the environment via the cloudpickle module to save snapshots of the experiment. However, some environments may contain attributes that are not pickle-able (e.g. a client-server connection). In such cases, override __setstate__() and __getstate__() to add your custom pickle logic.
You might want to refer to the EzPickle module: https://github.com/openai/gym/blob/master/gym/utils/ezpickle.py for a lightweight way to pickle and unpickle via constructor arguments.
- abstract property action_space¶
The action space specification.
- Type
akro.Space
- abstract property observation_space¶
The observation space specification.
- Type
akro.Space
- abstract property render_modes¶
A list of strings representing the supported render modes.
See render() for a list of modes.
- Type
- abstract reset()¶
Resets the environment.
- Returns
- The first observation conforming to
observation_space.
- dict: The episode-level information.
Note that this is not part of env_info provided in step(). It contains information about the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or multi-task RL).
- Return type
numpy.ndarray
- abstract step(action)¶
Steps the environment with the action and returns a EnvStep.
If the environment returned the last EnvStep of a sequence (either of type TERMINAL or TIMEOUT) at the previous step, this call to step() will start a new sequence and action will be ignored.
If spec.max_episode_length is reached after applying the action and the environment has not terminated the episode, step() should return a EnvStep with step_type==StepType.TIMEOUT.
If possible, update the visualization display as well.
- Parameters
action (object) – A NumPy array, or a nested dict, list or tuple of arrays conforming to action_space.
- Returns
The environment step resulting from the action.
- Return type
- Raises
RuntimeError – if step() is called after the environment has been constructed and reset() has not been called.
- abstract render(mode)¶
Renders the environment.
The set of supported modes varies per environment. By convention, if mode is:
- rgb_array: Return a numpy.ndarray with shape (x, y, 3) and type
uint8, representing RGB values for an x-by-y pixel image, suitable for turning into a video.
- ansi: Return a string (str) or StringIO.StringIO containing a
terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
Make sure that your class’s render_modes includes the list of supported modes.
For example:
    class MyEnv(Environment):
        def render_modes(self):
            return ['rgb_array', 'ansi']

        def render(self, mode):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame for video
            elif mode == 'ansi':
                ...  # return text output
            else:
                raise ValueError('Supported render modes are {}, but '
                                 'got render mode {} instead.'.format(
                                     self.render_modes, mode))
- Parameters
mode (str) – the mode to render with. The string must be present in self.render_modes.
- abstract visualize()¶
Creates a visualization of the environment.
This function should be called only once after reset() to set up the visualization display. The visualization should be updated when the environment is changed (i.e. when step() is called.)
Calling close() will deallocate any resources and close any windows created by visualize(). If close() is not explicitly called, the visualization will be closed when the environment is destructed (i.e. garbage collected).
- abstract close()¶
Closes the environment.
This method should close all windows opened by visualize().
Override this function in your subclass to perform any necessary cleanup.
Environments will automatically close() themselves when they are garbage collected or when the program exits.
- class EnvSpec(observation_space, action_space, max_episode_length=None)¶
Bases:
InOutSpec
Describes the observations, actions, and time horizon of an MDP.
- Parameters
observation_space (akro.Space) – The observation space of the env.
action_space (akro.Space) – The action space of the env.
max_episode_length (int) – The maximum number of steps allowed in an episode.
- property action_space¶
Get action space.
- Returns
Action space of the env.
- Return type
akro.Space
- property observation_space¶
Get observation space of the env.
- Returns
Observation space.
- Return type
akro.Space
- max_episode_length :int or None¶
- input_space :akro.Space¶
- output_space :akro.Space¶
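A minimal construction sketch using akro spaces; the space shapes and bounds are arbitrary:
    import akro
    import numpy as np
    from garage import EnvSpec

    env_spec = EnvSpec(
        observation_space=akro.Box(low=-np.inf, high=np.inf,
                                   shape=(4,), dtype=np.float32),
        action_space=akro.Box(low=-1.0, high=1.0,
                              shape=(2,), dtype=np.float32),
        max_episode_length=100)

    assert env_spec.max_episode_length == 100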
- class EnvStep¶
A tuple representing a single step returned by the environment.
- action¶
A numpy array of shape \((A^*)\) containing the action for this time step. These must conform to
EnvStep.action_space
. None if step_type is StepType.FIRST, i.e. at the start of a sequence.
- Type
numpy.ndarray
- reward¶
A float representing the reward for taking the action given the observation, at this time step. None if step_type is StepType.FIRST, i.e. at the start of a sequence.
- Type
- observation¶
A numpy array of shape \((O^*)\) containing the observation for this time step in the environment. These must conform to
EnvStep.observation_space
. The observation after applying the action.
- Type
numpy.ndarray
- step_type¶
A StepType enum value. Can be StepType.FIRST, StepType.MID, StepType.TERMINAL, or StepType.TIMEOUT.
- Type
- env_spec :EnvSpec¶
- action :numpy.ndarray¶
- reward :float¶
- observation :numpy.ndarray¶
- env_info :Dict[str, np.ndarray or dict]¶
- step_type :garage._dtypes.StepType¶
- class InOutSpec¶
Describes the input and output spaces of a primitive or module.
- input_space :akro.Space¶
- output_space :akro.Space¶
- class StepType¶
Bases:
enum.IntEnum
Defines the status of a
TimeStep
within a sequence.
Note that the last
TimeStep
in a sequence can either be StepType.TERMINAL or StepType.TIMEOUT.
Suppose max_episode_length = 5:
- A successful sequence terminated at step 4 will look like:
FIRST, MID, MID, TERMINAL
- A successful sequence terminated at step 5 will look like:
FIRST, MID, MID, MID, TERMINAL
- An unsuccessful sequence truncated by time limit will look like:
FIRST, MID, MID, MID, TIMEOUT
- denominator¶
The denominator of a rational number in lowest terms (inherited from int).
- imag¶
The imaginary part of a complex number (inherited from int).
- numerator¶
The numerator of a rational number in lowest terms (inherited from int).
- real¶
The real part of a complex number (inherited from int).
- FIRST = 0¶
- MID = 1¶
- TERMINAL = 2¶
- TIMEOUT = 3¶
- classmethod get_step_type(step_cnt, max_episode_length, done)¶
Determines the step type based on the step count and done signal.
- Parameters
- Returns
the step type.
- Return type
StepType
- Raises
ValueError – if step_cnt is < 1. In this case the environment's reset() has likely not been called yet, or step_cnt is None.
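A usage sketch of get_step_type; the surrounding variable names (step_cnt, done) are placeholders for the caller's stepping loop:
    from garage import StepType

    def classify_step(step_cnt, max_episode_length, done):
        """Label one transition using get_step_type (usage sketch)."""
        step_type = StepType.get_step_type(
            step_cnt=step_cnt,                  # 1-based count of steps taken
            max_episode_length=max_episode_length,
            done=done)
        # TERMINAL and TIMEOUT both mark the end of a sequence.
        is_last = step_type in (StepType.TERMINAL, StepType.TIMEOUT)
        return step_type, is_last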
- bit_length()¶
Number of bits necessary to represent self in binary.
>>> bin(37)
'0b100101'
>>> (37).bit_length()
6
- conjugate()¶
Returns self, the complex conjugate of any int.
- to_bytes()¶
Return an array of bytes representing an integer.
- length
Length of bytes object to use. An OverflowError is raised if the integer is not representable with the given number of bytes.
- byteorder
The byte order used to represent the integer. If byteorder is 'big', the most significant byte is at the beginning of the byte array. If byteorder is 'little', the most significant byte is at the end of the byte array. To request the native byte order of the host system, use sys.byteorder as the byte order value.
- signed
Determines whether two’s complement is used to represent the integer. If signed is False and a negative integer is given, an OverflowError is raised.
- name()¶
The name of the Enum member.
- value()¶
The value of the Enum member.
- class Wrapper(env)¶
Bases:
Environment
A wrapper for an environment that implements the Environment API.
- property action_space¶
The action space specification.
- Type
akro.Space
- property observation_space¶
The observation space specification.
- Type
akro.Space
- property unwrapped¶
The inner environment.
- Type
- step(action)¶
Step the wrapped env.
- Parameters
action (np.ndarray) – An action provided by the agent.
- Returns
The environment step resulting from the action.
- Return type
- reset()¶
Reset the wrapped env.
- Returns
- The first observation conforming to
observation_space.
- dict: The episode-level information.
Note that this is not part of env_info provided in step(). It contains information about the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or multi-task RL).
- Return type
numpy.ndarray
- render(mode)¶
Render the wrapped environment.
- visualize()¶
Creates a visualization of the wrapped environment.
- close()¶
Close the wrapped env.
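A hedged sketch of a small wrapper built on this API; it counts finished episodes and otherwise defers to the wrapped environment. The counter attribute is illustrative and not part of the documented API:
    from garage import StepType, Wrapper

    class EpisodeCounter(Wrapper):
        """Count finished episodes; otherwise defer to the wrapped env (sketch)."""

        def __init__(self, env):
            super().__init__(env)
            self.episodes_finished = 0

        def step(self, action):
            env_step = super().step(action)
            # step_type is a documented field of EnvStep.
            if env_step.step_type in (StepType.TERMINAL, StepType.TIMEOUT):
                self.episodes_finished += 1
            return env_step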
- wrap_experiment(function=None, *, log_dir=None, prefix='experiment', name=None, snapshot_mode='last', snapshot_gap=1, archive_launch_repo=True, name_parameters=None, use_existing_dir=False, x_axis='TotalEnvSteps')¶
Decorate a function to turn it into an ExperimentTemplate.
When invoked, the wrapped function will receive an ExperimentContext, which will contain the log directory into which the experiment should log information.
This decorator can be invoked in two different ways.
Without arguments, like this:
    @wrap_experiment
    def my_experiment(ctxt, seed, lr=0.5):
        ...
Or with arguments:
    @wrap_experiment(snapshot_mode='all')
    def my_experiment(ctxt, seed, lr=0.5):
        ...
All arguments must be keyword arguments.
- Parameters
function (callable or None) – The experiment function to wrap.
log_dir (str or None) – The full log directory to log to. Will be computed from name if omitted.
name (str or None) – The name of this experiment template. Will be filled from the wrapped function’s name if omitted.
prefix (str) – Directory under data/local in which to place the experiment directory.
snapshot_mode (str) – Policy for which snapshots to keep (or make at all). Can be either “all” (all iterations will be saved), “last” (only the last iteration will be saved), “gap” (every snapshot_gap iterations are saved), or “none” (do not save snapshots).
snapshot_gap (int) – Gap between snapshot iterations. Waits this number of iterations before taking another snapshot.
archive_launch_repo (bool) – Whether to save an archive of the repository containing the launcher script. This is a potentially expensive operation which is useful for ensuring reproducibility.
name_parameters (str or None) – Parameters to insert into the experiment name. Should be either None (the default), ‘all’ (all parameters will be used), or ‘passed’ (only passed parameters will be used). The used parameters will be inserted in the order they appear in the function definition.
use_existing_dir (bool) – If true, (re)use the directory for this experiment, even if it already contains data.
x_axis (str) – Key to use for x axis of plots.
- Returns
The wrapped function.
- Return type
callable
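A slightly fuller sketch of how a wrapped experiment is typically defined and launched; treating ctxt as an ExperimentContext with a snapshot_dir attribute is an assumption for illustration:
    from garage import wrap_experiment

    @wrap_experiment(snapshot_mode='last', name_parameters='passed')
    def my_experiment(ctxt, seed=1):
        # ctxt is the injected ExperimentContext; snapshot_dir is assumed to
        # hold the log directory chosen for this run.
        print('Logging to', ctxt.snapshot_dir)

    # The decorated function is launched without ctxt; it is filled in
    # automatically, and passed parameters become part of the run name.
    my_experiment(seed=2)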
- class TFTrainer(snapshot_config, sess=None)¶
Bases:
Trainer
This class implements a trainer for TensorFlow algorithms.
A trainer provides a default TensorFlow session using a Python context manager. This is useful for those experiment components (e.g. a policy) that require a TensorFlow session during construction.
Use trainer.setup(algo, env) to set up the algorithm and environment for the trainer, and trainer.train() to start training.
- Parameters
snapshot_config (garage.experiment.SnapshotConfig) – The snapshot configuration used by Trainer to create the snapshotter. If None, it will create one with default settings.
sess (tf.Session) – An optional TensorFlow session. A new session will be created immediately if not provided.
Note
When resuming via the command line, new snapshots will be saved into the SAME directory if not specified.
When resuming programmatically, the snapshot directory should be specified manually or through the @wrap_experiment interface.
Examples
    # to train
    with TFTrainer() as trainer:
        env = gym.make('CartPole-v1')
        policy = CategoricalMLPPolicy(
            env_spec=env.spec,
            hidden_sizes=(32, 32))
        algo = TRPO(
            env=env,
            policy=policy,
            baseline=baseline,
            max_episode_length=100,
            discount=0.99,
            max_kl_step=0.01)
        trainer.setup(algo, env)
        trainer.train(n_epochs=100, batch_size=4000)

    # to resume immediately.
    with TFTrainer() as trainer:
        trainer.restore(resume_from_dir)
        trainer.resume()

    # to resume with modified training arguments.
    with TFTrainer() as trainer:
        trainer.restore(resume_from_dir)
        trainer.resume(n_epochs=20)
- property total_env_steps¶
Total environment steps collected.
- Returns
Total environment steps collected.
- Return type
- setup(algo, env)¶
Set up trainer and sessions for algorithm and environment.
This method saves algo and env within trainer and creates a sampler, and initializes all uninitialized variables in session.
Note
After setup() is called all variables in session should have been initialized. setup() respects existing values in session so policy weights can be loaded before setup().
- Parameters
algo (RLAlgorithm) – An algorithm instance.
env (Environment) – An environment instance.
- initialize_tf_vars()¶
Initialize all uninitialized variables in session.
- obtain_episodes(itr, batch_size=None, agent_update=None, env_update=None)¶
Obtain one batch of episodes.
- Parameters
itr (int) – Index of iteration (epoch).
batch_size (int) – Number of steps in batch. This is a hint that the sampler may or may not respect.
agent_update (object) – Value which will be passed into the agent_update_fn before sampling episodes. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
env_update (object) – Value which will be passed into the env_update_fn before sampling episodes. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
- Raises
ValueError – If the trainer was initialized without a sampler, or batch_size wasn’t provided here or to train.
- Returns
Batch of episodes.
- Return type
- obtain_samples(itr, batch_size=None, agent_update=None, env_update=None)¶
Obtain one batch of samples.
- Parameters
itr (int) – Index of iteration (epoch).
batch_size (int) – Number of steps in batch. This is a hint that the sampler may or may not respect.
agent_update (object) – Value which will be passed into the agent_update_fn before sampling episodes. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
env_update (object) – Value which will be passed into the env_update_fn before sampling episodes. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
- Raises
ValueError – Raised if the trainer was initialized without a sampler, or batch_size wasn’t provided here or to train.
- Returns
One batch of samples.
- Return type
- save(epoch)¶
Save snapshot of current batch.
- Parameters
epoch (int) – Epoch.
- Raises
NotSetupError – if save() is called before the trainer is set up.
- restore(from_dir, from_epoch='last')¶
Restore experiment from snapshot.
- log_diagnostics(pause_for_plot=False)¶
Log diagnostics.
- Parameters
pause_for_plot (bool) – Pause for plot.
- train(n_epochs, batch_size=None, plot=False, store_episodes=False, pause_for_plot=False)¶
Start training.
- Parameters
- Raises
NotSetupError – If train() is called before setup().
- Returns
The average return in last epoch cycle.
- Return type
- step_epochs()¶
Step through each epoch.
This function returns a magic generator. When iterated through, this generator automatically performs services such as snapshotting and log management. It is used inside train() in each algorithm.
The generator initializes two variables: self.step_itr and self.step_episode. To use the generator, these two have to be updated manually in each epoch, as the example shows below.
- Yields
int – The next training epoch.
Examples
    for epoch in trainer.step_epochs():
        trainer.step_episode = trainer.obtain_samples(...)
        self.train_once(...)
        trainer.step_itr += 1
- resume(n_epochs=None, batch_size=None, plot=None, store_episodes=None, pause_for_plot=None)¶
Resume from restored experiment.
This method provides the same interface as train().
If not specified, an argument will default to the saved arguments from the last call to train().
- Parameters
- Raises
NotSetupError – If resume() is called before restore().
- Returns
The average return in last epoch cycle.
- Return type
- get_env_copy()¶
Get a copy of the environment.
- Returns
An environment instance.
- Return type
- class Trainer(snapshot_config)¶
Base class of trainer.
Use trainer.setup(algo, env) to set up the algorithm and environment for the trainer, and trainer.train() to start training.
- Parameters
snapshot_config (garage.experiment.SnapshotConfig) – The snapshot configuration used by Trainer to create the snapshotter. If None, it will create one with default settings.
Note
For the use of any TensorFlow environments, policies and algorithms, please use TFTrainer().
Examples
    # to train
    trainer = Trainer()
    env = Env(...)
    policy = Policy(...)
    algo = Algo(
        env=env,
        policy=policy,
        ...)
    trainer.setup(algo, env)
    trainer.train(n_epochs=100, batch_size=4000)

    # to resume immediately.
    trainer = Trainer()
    trainer.restore(resume_from_dir)
    trainer.resume()

    # to resume with modified training arguments.
    trainer = Trainer()
    trainer.restore(resume_from_dir)
    trainer.resume(n_epochs=20)
- property total_env_steps¶
Total environment steps collected.
- Returns
Total environment steps collected.
- Return type
- setup(algo, env)¶
Set up trainer for algorithm and environment.
This method saves algo and env within trainer and creates a sampler.
Note
After setup() is called all variables in session should have been initialized. setup() respects existing values in session so policy weights can be loaded before setup().
- Parameters
algo (RLAlgorithm) – An algorithm instance. If this algo wants to use samplers, it should have a _sampler field.
env (Environment) – An environment instance.
- obtain_episodes(itr, batch_size=None, agent_update=None, env_update=None)¶
Obtain one batch of episodes.
- Parameters
itr (int) – Index of iteration (epoch).
batch_size (int) – Number of steps in batch. This is a hint that the sampler may or may not respect.
agent_update (object) – Value which will be passed into the agent_update_fn before sampling episodes. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
env_update (object) – Value which will be passed into the env_update_fn before sampling episodes. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
- Raises
ValueError – If the trainer was initialized without a sampler, or batch_size wasn’t provided here or to train.
- Returns
Batch of episodes.
- Return type
- obtain_samples(itr, batch_size=None, agent_update=None, env_update=None)¶
Obtain one batch of samples.
- Parameters
itr (int) – Index of iteration (epoch).
batch_size (int) – Number of steps in batch. This is a hint that the sampler may or may not respect.
agent_update (object) – Value which will be passed into the agent_update_fn before sampling episodes. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
env_update (object) – Value which will be passed into the env_update_fn before sampling episodes. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
- Raises
ValueError – Raised if the trainer was initialized without a sampler, or batch_size wasn’t provided here or to train.
- Returns
One batch of samples.
- Return type
- save(epoch)¶
Save snapshot of current batch.
- Parameters
epoch (int) – Epoch.
- Raises
NotSetupError – if save() is called before the trainer is set up.
- restore(from_dir, from_epoch='last')¶
Restore experiment from snapshot.
- log_diagnostics(pause_for_plot=False)¶
Log diagnostics.
- Parameters
pause_for_plot (bool) – Pause for plot.
- train(n_epochs, batch_size=None, plot=False, store_episodes=False, pause_for_plot=False)¶
Start training.
- Parameters
- Raises
NotSetupError – If train() is called before setup().
- Returns
The average return in last epoch cycle.
- Return type
- step_epochs()¶
Step through each epoch.
This function returns a magic generator. When iterated through, this generator automatically performs services such as snapshotting and log management. It is used inside train() in each algorithm.
The generator initializes two variables: self.step_itr and self.step_episode. To use the generator, these two have to be updated manually in each epoch, as the example shows below.
- Yields
int – The next training epoch.
Examples
    for epoch in trainer.step_epochs():
        trainer.step_episode = trainer.obtain_samples(...)
        self.train_once(...)
        trainer.step_itr += 1
- resume(n_epochs=None, batch_size=None, plot=None, store_episodes=None, pause_for_plot=None)¶
Resume from restored experiment.
This method provides the same interface as train().
If not specified, an argument will default to the saved arguments from the last call to train().
- Parameters
- Raises
NotSetupError – If resume() is called before restore().
- Returns
The average return in last epoch cycle.
- Return type
- get_env_copy()¶
Get a copy of the environment.
- Returns
An environment instance.
- Return type