garage package

Garage Base.

make_optimizer(optimizer_type, module=None, **kwargs)[source]

Create an optimizer for PyTorch and TensorFlow algorithms.

Parameters:
  • optimizer_type (Union[type, tuple[type, dict]]) – Type of optimizer. This can be an optimizer type such as torch.optim.Adam, or a tuple of a type and a dictionary, where the dictionary contains arguments used to initialize the optimizer, e.g. (torch.optim.Adam, {'lr': 1e-3}).
  • module (optional) – If the optimizer type is a torch optimizer, the torch.nn.Module whose parameters need to be optimized must be specified.
  • kwargs (dict) – Other keyword arguments used to initialize the optimizer. These are not used when optimizer_type is a tuple.
Returns:

Constructed optimizer.

Return type:

torch.optim.Optimizer

Raises:

ValueError – Raised when optimizer_type is a tuple and a non-default argument is passed in kwargs.

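For example, a minimal sketch of both calling conventions; the policy module below is a placeholder used only for illustration:

    import torch
    from garage import make_optimizer

    policy = torch.nn.Linear(4, 2)  # placeholder module whose parameters are optimized

    # Type plus keyword arguments; kwargs are forwarded to the optimizer.
    opt_a = make_optimizer(torch.optim.Adam, module=policy, lr=1e-3)

    # Equivalent (type, kwargs-dict) tuple form; extra kwargs must be left at
    # their defaults here, otherwise ValueError is raised.
    opt_b = make_optimizer((torch.optim.Adam, {'lr': 1e-3}), module=policy)
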
wrap_experiment(function=None, *, log_dir=None, prefix='experiment', name=None, snapshot_mode='last', snapshot_gap=1, archive_launch_repo=True, name_parameters=None, use_existing_dir=False)[source]

Decorate a function to turn it into an ExperimentTemplate.

When invoked, the wrapped function will receive an ExperimentContext, which will contain the log directory into which the experiment should log information.

This decorator can be invoked in two different ways.

Without arguments, like this:

@wrap_experiment
def my_experiment(ctxt, seed, lr=0.5):

Or with arguments:

@wrap_experiment(snapshot_mode='all')
def my_experiment(ctxt, seed, lr=0.5):

All arguments must be keyword arguments.

Parameters:
  • function (callable or None) – The experiment function to wrap.
  • log_dir (str or None) – The full log directory to log to. Will be computed from name if omitted.
  • name (str or None) – The name of this experiment template. Will be filled from the wrapped function’s name if omitted.
  • prefix (str) – Directory under data/local in which to place the experiment directory.
  • snapshot_mode (str) – Policy for which snapshots to keep (or make at all). Can be either “all” (all iterations will be saved), “last” (only the last iteration will be saved), “gap” (every snapshot_gap iterations are saved), or “none” (do not save snapshots).
  • snapshot_gap (int) – Gap between snapshot iterations. Waits this number of iterations before taking another snapshot.
  • archive_launch_repo (bool) – Whether to save an archive of the repository containing the launcher script. This is a potentially expensive operation which is useful for ensuring reproducibility.
  • name_parameters (str or None) – Parameters to insert into the experiment name. Should be either None (the default), ‘all’ (all parameters will be used), or ‘passed’ (only passed parameters will be used). The used parameters will be inserted in the order they appear in the function definition.
  • use_existing_dir (bool) – If true, (re)use the directory for this experiment, even if it already contains data.
Returns:

The wrapped function.

Return type:

callable

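Putting the pieces together, a minimal sketch; the experiment body and the snapshot_dir attribute access below are illustrative assumptions:

    from garage import wrap_experiment

    @wrap_experiment(snapshot_mode='last', name_parameters='passed')
    def my_experiment(ctxt, seed, lr=0.5):
        # `ctxt` is the ExperimentContext described above; snapshot_dir is
        # assumed here to hold the log directory created for this run.
        print('Logging to', ctxt.snapshot_dir, 'with seed', seed, 'and lr', lr)

    # Invoking the wrapped function launches the experiment; arguments are
    # passed by keyword.
    my_experiment(seed=1, lr=0.01)
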
class TimeStep[source]

Bases: garage._dtypes.TimeStep

A tuple representing a single TimeStep.

A TimeStep represents a single sample when an agent interacts with
an environment.
env_spec

Specification for the environment from which this data was sampled.

Type:garage.envs.EnvSpec
observation

A numpy array of shape \((O^*)\) containing the observation for this time step in the environment. It must conform to env_spec.observation_space.

Type:numpy.ndarray
action

A numpy array of shape \((A^*)\) containing the action for this time step. It must conform to env_spec.action_space.

Type:numpy.ndarray
reward

A float representing the reward for taking the action given the observation, at this time step.

Type:float
terminals

The termination signal for this time step.

Type:bool
env_info

A dict of arbitrary environment state information.

Type:dict
agent_info

A dict of arbitrary agent state information. For example, this may contain the hidden states from an RNN policy.

Type:dict
Raises:ValueError – If any of the above attributes do not conform to their prescribed types and shapes.
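
As a brief sketch, the fields read like any named tuple; `step` below is a hypothetical TimeStep obtained from a sampler:

    # `step` is a hypothetical garage.TimeStep.
    obs = step.observation        # conforms to step.env_spec.observation_space
    act = step.action             # conforms to step.env_spec.action_space
    if step.terminals:            # termination signal for this time step
        print('episode finished with reward', step.reward)
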
class TrajectoryBatch[source]

Bases: garage._dtypes.TrajectoryBatch

A tuple representing a batch of whole trajectories.

Data type for on-policy algorithms.

A TrajectoryBatch represents a batch of whole trajectories produced when one or more agents interacts with one or more environments.

Symbol              Description
\(N\)               Trajectory index dimension
\([T]\)             Variable-length time dimension of each trajectory
\(S^*\)             Single-step shape of a time-series tensor
\(N \bullet [T]\)   A dimension computed by flattening a variable-length time dimension \([T]\) into a single batch dimension with length \(\sum_{i \in N} [T]_i\)
env_spec

Specification for the environment from which this data was sampled.

Type:garage.envs.EnvSpec
observations

A numpy array of shape \((N \bullet [T], O^*)\) containing the (possibly multi-dimensional) observations for all time steps in this batch. These must conform to env_spec.observation_space.

Type:numpy.ndarray
last_observations

A numpy array of shape \((N, O^*)\) containing the last observation of each trajectory. This is necessary since each trajectory contains one more observation than it does actions.

Type:numpy.ndarray
actions

A numpy array of shape \((N \bullet [T], A^*)\) containing the (possibly multi-dimensional) actions for all time steps in this batch. These must conform to env_spec.action_space.

Type:numpy.ndarray
rewards

A numpy array of shape \((N \bullet [T])\) containing the rewards for all time steps in this batch.

Type:numpy.ndarray
terminals

A boolean numpy array of shape \((N \bullet [T])\) containing the termination signals for all time steps in this batch.

Type:numpy.ndarray
env_infos

A dict of numpy arrays containing arbitrary environment state information. Each value of this dict should be a numpy array of shape \((N \bullet [T])\) or \((N \bullet [T], S^*)\).

Type:dict
agent_infos

A dict of numpy arrays containing arbitrary agent state information. Each value of this dict should be a numpy array of shape \((N \bullet [T])\) or \((N \bullet [T], S^*)\). For example, this may contain the hidden states from an RNN policy.

Type:dict
lengths

An integer numpy array of shape \((N,)\) containing the length of each trajectory in this batch. This may be used to reconstruct the individual trajectories.

Type:numpy.ndarray
Raises:ValueError – If any of the above attributes do not conform to their prescribed types and shapes.
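
These shape conventions imply a simple consistency check, sketched here for a hypothetical batch:

    import numpy as np

    # `batch` is a hypothetical garage.TrajectoryBatch.
    total_steps = int(np.sum(batch.lengths))      # sum over [T]_i, i.e. N • [T]
    assert batch.observations.shape[0] == total_steps
    assert batch.rewards.shape == (total_steps,)
    assert batch.last_observations.shape[0] == len(batch.lengths)  # one per trajectory
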
classmethod concatenate(*batches)[source]

Create a TrajectoryBatch by concatenating TrajectoryBatches.

Parameters:batches (list[TrajectoryBatch]) – Batches to concatenate.
Returns:The concatenation of the batches.
Return type:TrajectoryBatch
classmethod from_trajectory_list(env_spec, paths)[source]

Create a TrajectoryBatch from a list of trajectories.

Parameters:
  • env_spec (garage.envs.EnvSpec) – Specification for the environment from which this data was sampled.
  • paths (list[dict[str, np.ndarray or dict[str, np.ndarray]]]) –

    Keys:
    • observations (np.ndarray): Non-flattened array of
      observations. Typically has shape (T, S^*) (the unflattened state space of the current environment). observations[i] was used by the agent to choose actions[i]. observations may instead have shape (T + 1, S^*).
    • next_observations (np.ndarray): Non-flattened array of
      observations. Has shape (T, S^*). next_observations[i] was observed by the agent after taking actions[i]. Optional. Note that to ensure all information from the environment was preserved, observations[i] should have shape (T + 1, S^*), or this key should be set. However, this method is lenient and will “duplicate” the last observation if the original last observation has been lost.
    • actions (np.ndarray): Non-flattened array of actions. Should
      have shape (T, S^*) (the unflattened action space of the current environment).
    • rewards (np.ndarray): Array of rewards of shape (T,) (1D
      array of length timesteps).
    • dones (np.ndarray): Array of dones of shape (T,) (1D array
      of length timesteps).
    • agent_infos (dict[str, np.ndarray]): Dictionary of stacked,
      non-flattened agent_info arrays.
    • env_infos (dict[str, np.ndarray]): Dictionary of stacked,
      non-flattened env_info arrays.
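
A hedged sketch of building a batch from two collected paths; `env.spec` stands in for the garage.envs.EnvSpec of whatever environment produced the data:

    import numpy as np
    from garage import TrajectoryBatch

    paths = [
        dict(observations=np.zeros((4, 3)),   # T=4 steps, 3-dim observations
             actions=np.zeros((4, 2)),
             rewards=np.zeros(4),
             dones=np.array([False, False, False, True]),
             env_infos={},
             agent_infos={}),
        dict(observations=np.zeros((2, 3)),   # a shorter, T=2 path
             actions=np.zeros((2, 2)),
             rewards=np.zeros(2),
             dones=np.array([False, True]),
             env_infos={},
             agent_infos={}),
    ]
    # No next_observations key, so the last observation of each path is
    # duplicated, as described above.
    batch = TrajectoryBatch.from_trajectory_list(env.spec, paths)
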
split()[source]

Split a TrajectoryBatch into a list of TrajectoryBatches.

The opposite of concatenate.

Returns:A list of TrajectoryBatches, with one trajectory per batch.
Return type:list[TrajectoryBatch]
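
Sketch of the round trip between concatenate and split; `batch_a` and `batch_b` are hypothetical TrajectoryBatches over the same env_spec:

    from garage import TrajectoryBatch

    merged = TrajectoryBatch.concatenate(batch_a, batch_b)
    singles = merged.split()                     # one single-trajectory batch each
    assert len(singles) == len(merged.lengths)   # N batches come back out
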
to_trajectory_list()[source]

Convert the batch into a list of dictionaries.

Returns:
Keys:
  • observations (np.ndarray): Non-flattened array of
    observations. Has shape (T, S^*) (the unflattened state space of the current environment). observations[i] was used by the agent to choose actions[i].
  • next_observations (np.ndarray): Non-flattened array of
    observations. Has shape (T, S^*). next_observations[i] was observed by the agent after taking actions[i].
  • actions (np.ndarray): Non-flattened array of actions. Should
    have shape (T, S^*) (the unflattened action space of the current environment).
  • rewards (np.ndarray): Array of rewards of shape (T,) (1D
    array of length timesteps).
  • dones (np.ndarray): Array of dones of shape (T,) (1D array
    of length timesteps).
  • agent_infos (dict[str, np.ndarray]): Dictionary of stacked,
    non-flattened agent_info arrays.
  • env_infos (dict[str, np.ndarray]): Dictionary of stacked,
    non-flattened env_info arrays.
Return type:list[dict[str, np.ndarray or dict[str, np.ndarray]]]
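
For example, per-trajectory returns can be computed from the converted list (a brief sketch; `batch` is a hypothetical TrajectoryBatch):

    for path in batch.to_trajectory_list():
        # Each `path` is one trajectory with the keys listed above.
        print(len(path['rewards']), 'steps, undiscounted return', path['rewards'].sum())
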
log_multitask_performance(itr, batch, discount, name_map=None)[source]

Log performance of trajectories from multiple tasks.

Parameters:
  • itr (int) – Iteration number to be logged.
  • batch (garage.TrajectoryBatch) – Batch of trajectories. The trajectories should have either the “task_name” or “task_id” env_infos. If “task_name” is not present, then name_map is required, and should map from task ids to task names.
  • discount (float) – Discount used in computing returns.
  • name_map (dict[int, str] or None) – Mapping from task ids to task names. Optional if the “task_name” environment info is present. Note that if provided, all tasks listed in this map will be logged, even if there are no trajectories present for them.
Returns:

Undiscounted returns averaged across all tasks. Has shape \((N \bullet [T])\).

Return type:

numpy.ndarray

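A sketch of typical use; it assumes a dowel logger is already set up by the surrounding experiment, that `batch` carries “task_id” env_infos, and that `epoch` is the current iteration:

    from garage import log_multitask_performance

    # Hypothetical task-id-to-name mapping, needed because this batch has no
    # 'task_name' env_info.
    name_map = {0: 'reach', 1: 'push'}
    average_returns = log_multitask_performance(epoch, batch, discount=0.99,
                                                name_map=name_map)
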
log_performance(itr, batch, discount, prefix='Evaluation')[source]

Evaluate the performance of an algorithm on a batch of trajectories.

Parameters:
  • itr (int) – Iteration number.
  • batch (TrajectoryBatch) – The trajectories to evaluate with.
  • discount (float) – Discount value, from algorithm’s property.
  • prefix (str) – Prefix to add to all logged keys.
Returns:

Undiscounted returns.

Return type:

numpy.ndarray

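In an evaluation step this is typically a single call (a sketch; `epoch` and `eval_batch` are placeholders for the current iteration and a TrajectoryBatch of evaluation rollouts):

    from garage import log_performance

    undiscounted_returns = log_performance(epoch, eval_batch,
                                           discount=0.99,
                                           prefix='Evaluation')
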
class InOutSpec(input_space, output_space)[source]

Bases: object

Describes the input and output spaces of a primitive or module.

Parameters:
  • input_space (akro.Space) – Input space of a module.
  • output_space (akro.Space) – Output space of a module.
input_space

Get input space of the module.

Returns:Input space of the module.
Return type:akro.Space
output_space

Get output space of the module.

Returns:Output space of the module.
Return type:akro.Space
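
For instance, a minimal sketch using akro spaces for a hypothetical module with 4-dimensional inputs and 2-dimensional outputs:

    import akro
    from garage import InOutSpec

    spec = InOutSpec(input_space=akro.Box(low=-1.0, high=1.0, shape=(4,)),
                     output_space=akro.Box(low=-1.0, high=1.0, shape=(2,)))
    assert spec.input_space.shape == (4,)
    assert spec.output_space.shape == (2,)
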
class TimeStepBatch[source]

Bases: garage._dtypes.TimeStepBatch

A tuple representing a batch of TimeSteps.

Data type for off-policy algorithms, imitation learning and batch-RL.

env_spec

Specification for the environment from which this data was sampled.

Type:garage.envs.EnvSpec
observations

Non-flattened array of observations. Typically has shape (batch_size, S^*) (the unflattened state space of the current environment).

Type:numpy.ndarray
actions

Non-flattened array of actions. Should have shape (batch_size, S^*) (the unflattened action space of the current environment).

Type:numpy.ndarray
rewards

Array of rewards of shape (batch_size,) (1D array of length batch_size).

Type:numpy.ndarray
next_observation

Non-flattened array of next observations. Has shape (batch_size, S^*). next_observations[i] was observed by the agent after taking actions[i].

Type:numpy.ndarray
terminals

A boolean numpy array of shape (batch_size,) containing the termination signals for all transitions in this batch.

Type:numpy.ndarray
env_infos

A dict of arbitrary environment state information.

Type:dict
agent_infos

A dict of arbitrary agent state information. For example, this may contain the hidden states from an RNN policy.

Type:dict
Raises:ValueError – If any of the above attributes do not conform to their prescribed types and shapes.
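
Because the fields line up index by index, a one-step bootstrapped target can be sketched directly from them; `batch` and `next_values` are hypothetical placeholders, with `next_values` standing in for value estimates of the observations that follow each action:

    import numpy as np

    # `batch` is a hypothetical garage.TimeStepBatch.
    not_done = 1.0 - batch.terminals.astype(np.float64)
    targets = batch.rewards + 0.99 * not_done * next_values  # discount assumed 0.99
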
classmethod concatenate(*batches)[source]

Create a TimeStepBatch by concatenating TimeStepBatches.

Parameters:batches (list[TimeStepBatch]) – Batches to concatenate.
Returns:The concatenation of the batches.
Return type:TimeStepBatch
Raises:ValueError – If no TimeStepBatches are provided.
classmethod from_time_step_list(env_spec, ts_samples)[source]

Create a TimeStepBatch from a list of time step dictionaries.

Parameters:
  • env_spec (garage.envs.EnvSpec) – Specification for the environment from which this data was sampled.
  • ts_samples (list[dict[str, np.ndarray or dict[str, np.ndarray]]]) –

    Keys:
    • observations (numpy.ndarray): Non-flattened array of
      observations. Typically has shape (batch_size, S^*) (the unflattened state space of the current environment).
    • actions (numpy.ndarray): Non-flattened array of actions.
      Should have shape (batch_size, S^*) (the unflattened action space of the current environment).
    • rewards (numpy.ndarray): Array of rewards of shape
      (batch_size,) (1D array of length batch_size).
    • next_observation (numpy.ndarray): Non-flattened array of next
      observations. Has shape (batch_size, S^*). next_observations[i] was observed by the agent after taking actions[i].
    • terminals (numpy.ndarray): A boolean numpy array of shape
      (batch_size,) containing the termination signals for all transitions in this batch.
    • env_infos (dict): A dict of arbitrary environment state
      information.
    • agent_infos (dict): A dict of arbitrary agent
      state information. For example, this may contain the hidden states from an RNN policy.
Returns:

The concatenation of samples.

Return type:

TimeStepBatch

Raises:

ValueError – If no dicts are provided.

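A sketch mirroring from_trajectory_list above, using the keys listed here; `env.spec` again stands in for a garage.envs.EnvSpec:

    import numpy as np
    from garage import TimeStepBatch

    samples = [
        dict(observations=np.zeros((1, 3)),
             actions=np.zeros((1, 2)),
             rewards=np.zeros(1),
             next_observation=np.zeros((1, 3)),
             terminals=np.array([False]),
             env_infos={},
             agent_infos={}),
    ]
    batch = TimeStepBatch.from_time_step_list(env.spec, samples)
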
split()[source]

Split a TimeStepBatch into a list of TimeStepBatches.

The opposite of concatenate.

Returns:A list of TimeStepBatches, with one TimeStep per TimeStepBatch.
Return type:list[TimeStepBatch]
to_time_step_list()[source]

Convert the batch into a list of dictionaries.

This breaks the TimeStepBatch object into a list of single time step sample dictionaries. len(terminals) (i.e., the number of discrete time steps) dictionaries are returned.

Returns:
Keys:
  • observations (numpy.ndarray): Non-flattened array of
    observations. Typically has shape (batch_size, S^*) (the unflattened state space of the current environment).
  • actions (numpy.ndarray): Non-flattened array of actions. Should
    have shape (batch_size, S^*) (the unflattened action space of the current environment).
  • rewards (numpy.ndarray): Array of rewards of shape
    (batch_size,) (1D array of length batch_size).
  • next_observation (numpy.ndarray): Non-flattened array of next
    observations. Has shape (batch_size, S^*). next_observations[i] was observed by the agent after taking actions[i].
  • terminals (numpy.ndarray): A boolean numpy array of shape
    (batch_size,) containing the termination signals for all transitions in this batch.
  • env_infos (dict): A dict of arbitrary environment state
    information.
  • agent_infos (dict): A dict of arbitrary agent state
    information. For example, this may contain the hidden states from an RNN policy.
Return type:list[dict[str, np.ndarray or dict[str, np.ndarray]]]
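
For example, iterating over the converted samples (a sketch; `batch` is a hypothetical TimeStepBatch):

    for sample in batch.to_time_step_list():
        # Each `sample` is a dict holding a single time step, so its arrays
        # all have a leading dimension of 1.
        assert len(sample['rewards']) == 1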

Subpackages