garage package

Garage Base.

make_optimizer(optimizer_type, module=None, **kwargs)[source]

Create an optimizer for PyTorch and TensorFlow algorithms.

Parameters:
  • optimizer_type (Union[type, tuple[type, dict]]) – Type of optimizer. This can be an optimizer type such as torch.optim.Adam, or a tuple of a type and a dictionary, where the dictionary contains arguments used to initialize the optimizer, e.g. (torch.optim.Adam, {'lr': 1e-3}).
  • module (optional) – If the optimizer type is a torch optimizer, the torch.nn.Module whose parameters need to be optimized must be specified.
  • kwargs (dict) – Other keyword arguments used to initialize the optimizer. These are not used when optimizer_type is a tuple.
Returns:

Constructed optimizer.

Return type:

torch.optim.Optimizer

Raises:

ValueError – Raised when optimizer_type is a tuple and a non-default argument is passed in kwargs.

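For example, a minimal sketch of both calling conventions; the policy module below is a placeholder used only for illustration:

    import torch
    from garage import make_optimizer

    policy = torch.nn.Linear(4, 2)  # placeholder module whose parameters are optimized

    # Type plus keyword arguments; kwargs are forwarded to the optimizer.
    opt_a = make_optimizer(torch.optim.Adam, module=policy, lr=1e-3)

    # Equivalent (type, kwargs-dict) tuple form; extra kwargs must be left at
    # their defaults here, otherwise ValueError is raised.
    opt_b = make_optimizer((torch.optim.Adam, {'lr': 1e-3}), module=policy)
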
wrap_experiment(function=None, *, log_dir=None, prefix='experiment', name=None, snapshot_mode='last', snapshot_gap=1, archive_launch_repo=True, name_parameters=None, use_existing_dir=False)[source]

Decorate a function to turn it into an ExperimentTemplate.

When invoked, the wrapped function will receive an ExperimentContext, which will contain the log directory into which the experiment should log information.

This decorator can be invoked in two different ways.

Without arguments, like this:

@wrap_experiment
def my_experiment(ctxt, seed, lr=0.5):

Or with arguments:

@wrap_experiment(snapshot_mode='all')
def my_experiment(ctxt, seed, lr=0.5):

All arguments must be keyword arguments.

Parameters:
  • function (callable or None) – The experiment function to wrap.
  • log_dir (str or None) – The full log directory to log to. Will be computed from name if omitted.
  • name (str or None) – The name of this experiment template. Will be filled from the wrapped function’s name if omitted.
  • prefix (str) – Directory under data/local in which to place the experiment directory.
  • snapshot_mode (str) – Policy for which snapshots to keep (or make at all). Can be either “all” (all iterations will be saved), “last” (only the last iteration will be saved), “gap” (every snapshot_gap iterations are saved), or “none” (do not save snapshots).
  • snapshot_gap (int) – Gap between snapshot iterations. Waits this number of iterations before taking another snapshot.
  • archive_launch_repo (bool) – Whether to save an archive of the repository containing the launcher script. This is a potentially expensive operation which is useful for ensuring reproducibility.
  • name_parameters (str or None) – Parameters to insert into the experiment name. Should be either None (the default), ‘all’ (all parameters will be used), or ‘passed’ (only passed parameters will be used). The used parameters will be inserted in the order they appear in the function definition.
  • use_existing_dir (bool) – If true, (re)use the directory for this experiment, even if it already contains data.
Returns:

The wrapped function.

Return type:

callable

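Putting the pieces together, a minimal sketch; the experiment body and the snapshot_dir attribute access below are illustrative assumptions:

    from garage import wrap_experiment

    @wrap_experiment(snapshot_mode='last', name_parameters='passed')
    def my_experiment(ctxt, seed, lr=0.5):
        # `ctxt` is the ExperimentContext described above; snapshot_dir is
        # assumed here to hold the log directory created for this run.
        print('Logging to', ctxt.snapshot_dir, 'with seed', seed, 'and lr', lr)

    # Invoking the wrapped function launches the experiment; arguments are
    # passed by keyword.
    my_experiment(seed=1, lr=0.01)
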
class TimeStep[source]

Bases: garage._dtypes.TimeStep

A tuple representing a single TimeStep.

A TimeStep represents a single sample when an agent interacts with
an environment.
env_spec

Specification for the environment from which this data was sampled.

Type:garage.envs.EnvSpec
observation

A numpy array of shape \((O^*)\) containing the observation for this time step in the environment. It must conform to env_spec.observation_space.

Type:numpy.ndarray
action

A numpy array of shape \((A^*)\) containing the action for this time step. It must conform to env_spec.action_space.

Type:numpy.ndarray
reward

A float representing the reward for taking the action given the observation, at this time step.

Type:float
terminals

The termination signal for this time step.

Type:bool
env_info

A dict of arbitrary environment state information.

Type:dict
agent_info

A dict of arbitrary agent state information. For example, this may contain the hidden states from an RNN policy.

Type:dict
Raises:ValueError – If any of the above attributes do not conform to their prescribed types and shapes.
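
As a brief sketch, the fields read like any named tuple; `step` below is a hypothetical TimeStep obtained from a sampler:

    # `step` is a hypothetical garage.TimeStep.
    obs = step.observation        # conforms to step.env_spec.observation_space
    act = step.action             # conforms to step.env_spec.action_space
    if step.terminals:            # termination signal for this time step
        print('episode finished with reward', step.reward)
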
class TrajectoryBatch[source]

Bases: garage._dtypes.TrajectoryBatch

A tuple representing a batch of whole trajectories.

Data type for on-policy algorithms.

A TrajectoryBatch represents a batch of whole trajectories produced when one or more agents interacts with one or more environments.

Symbol              Description
\(N\)               Trajectory index dimension
\([T]\)             Variable-length time dimension of each trajectory
\(S^*\)             Single-step shape of a time-series tensor
\(N \bullet [T]\)   A dimension computed by flattening a variable-length time dimension \([T]\) into a single batch dimension with length \(\sum_{i \in N} [T]_i\)
env_spec

Specification for the environment from which this data was sampled.

Type:garage.envs.EnvSpec
observations

A numpy array of shape \((N \bullet [T], O^*)\) containing the (possibly multi-dimensional) observations for all time steps in this batch. These must conform to env_spec.observation_space.

Type:numpy.ndarray
last_observations

A numpy array of shape \((N, O^*)\) containing the last observation of each trajectory. This is necessary since each trajectory contains one more observation than it does actions.

Type:numpy.ndarray
actions

A numpy array of shape \((N \bullet [T], A^*)\) containing the (possibly multi-dimensional) actions for all time steps in this batch. These must conform to env_spec.action_space.

Type:numpy.ndarray
rewards

A numpy array of shape \((N \bullet [T])\) containing the rewards for all time steps in this batch.

Type:numpy.ndarray
terminals

A boolean numpy array of shape \((N \bullet [T])\) containing the termination signals for all time steps in this batch.

Type:numpy.ndarray
env_infos

A dict of numpy arrays containing arbitrary environment state information. Each value of this dict should be a numpy array of shape \((N \bullet [T])\) or \((N \bullet [T], S^*)\).

Type:dict
agent_infos

A dict of numpy arrays containing arbitrary agent state information. Each value of this dict should be a numpy array of shape \((N \bullet [T])\) or \((N \bullet [T], S^*)\). For example, this may contain the hidden states from an RNN policy.

Type:dict
lengths

An integer numpy array of shape \((N,)\) containing the length of each trajectory in this batch. This may be used to reconstruct the individual trajectories.

Type:numpy.ndarray
Raises:ValueError – If any of the above attributes do not conform to their prescribed types and shapes.
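
These shape conventions imply a simple consistency check, sketched here for a hypothetical batch:

    import numpy as np

    # `batch` is a hypothetical garage.TrajectoryBatch.
    total_steps = int(np.sum(batch.lengths))      # sum over [T]_i, i.e. N • [T]
    assert batch.observations.shape[0] == total_steps
    assert batch.rewards.shape == (total_steps,)
    assert batch.last_observations.shape[0] == len(batch.lengths)  # one per trajectory
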
classmethod concatenate(*batches)[source]

Create a TrajectoryBatch by concatenating TrajectoryBatches.

Parameters:batches (list[TrajectoryBatch]) – Batches to concatenate.
Returns:The concatenation of the batches.
Return type:TrajectoryBatch
classmethod from_trajectory_list(env_spec, paths)[source]

Create a TrajectoryBatch from a list of trajectories.

Parameters:
  • env_spec (garage.envs.EnvSpec) – Specification for the environment from which this data was sampled.
  • paths (list[dict[str, np.ndarray or dict[str, np.ndarray]]]) –

    Keys:
    • observations (np.ndarray): Non-flattened array of
      observations. Typically has shape (T, S^*) (the unflattened state space of the current environment). observations[i] was used by the agent to choose actions[i]. observations may instead have shape (T + 1, S^*).
    • next_observations (np.ndarray): Non-flattened array of
      observations. Has shape (T, S^*). next_observations[i] was observed by the agent after taking actions[i]. Optional. Note that to ensure all information from the environment was preserved, observations[i] should have shape (T + 1, S^*), or this key should be set. However, this method is lenient and will “duplicate” the last observation if the original last observation has been lost.
    • actions (np.ndarray): Non-flattened array of actions. Should
      have shape (T, S^*) (the unflattened action space of the current environment).
    • rewards (np.ndarray): Array of rewards of shape (T,) (1D
      array of length timesteps).
    • dones (np.ndarray): Array of dones of shape (T,) (1D array
      of length timesteps).
    • agent_infos (dict[str, np.ndarray]): Dictionary of stacked,
      non-flattened agent_info arrays.
    • env_infos (dict[str, np.ndarray]): Dictionary of stacked,
      non-flattened env_info arrays.
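
A hedged sketch of building a batch from two collected paths; `env.spec` stands in for the garage.envs.EnvSpec of whatever environment produced the data:

    import numpy as np
    from garage import TrajectoryBatch

    paths = [
        dict(observations=np.zeros((4, 3)),   # T=4 steps, 3-dim observations
             actions=np.zeros((4, 2)),
             rewards=np.zeros(4),
             dones=np.array([False, False, False, True]),
             env_infos={},
             agent_infos={}),
        dict(observations=np.zeros((2, 3)),   # a shorter, T=2 path
             actions=np.zeros((2, 2)),
             rewards=np.zeros(2),
             dones=np.array([False, True]),
             env_infos={},
             agent_infos={}),
    ]
    # No next_observations key, so the last observation of each path is
    # duplicated, as described above.
    batch = TrajectoryBatch.from_trajectory_list(env.spec, paths)
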
split()[source]

Split a TrajectoryBatch into a list of TrajectoryBatches.

The opposite of concatenate.

Returns:A list of TrajectoryBatches, with one trajectory per batch.
Return type:list[TrajectoryBatch]
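
Sketch of the round trip between concatenate and split; `batch_a` and `batch_b` are hypothetical TrajectoryBatches over the same env_spec:

    from garage import TrajectoryBatch

    merged = TrajectoryBatch.concatenate(batch_a, batch_b)
    singles = merged.split()                     # one single-trajectory batch each
    assert len(singles) == len(merged.lengths)   # N batches come back out
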
to_trajectory_list()[source]

Convert the batch into a list of dictionaries.

Returns:
Keys:
  • observations (np.ndarray): Non-flattened array of
    observations. Has shape (T, S^*) (the unflattened state space of the current environment). observations[i] was used by the agent to choose actions[i].
  • next_observations (np.ndarray): Non-flattened array of
    observations. Has shape (T, S^*). next_observations[i] was observed by the agent after taking actions[i].
  • actions (np.ndarray): Non-flattened array of actions. Should
    have shape (T, S^*) (the unflattened action space of the current environment).
  • rewards (np.ndarray): Array of rewards of shape (T,) (1D
    array of length timesteps).
  • dones (np.ndarray): Array of dones of shape (T,) (1D array
    of length timesteps).
  • agent_infos (dict[str, np.ndarray]): Dictionary of stacked,
    non-flattened agent_info arrays.
  • env_infos (dict[str, np.ndarray]): Dictionary of stacked,
    non-flattened env_info arrays.
Return type:list[dict[str, np.ndarray or dict[str, np.ndarray]]]
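
For example, per-trajectory returns can be computed from the converted list (a brief sketch; `batch` is a hypothetical TrajectoryBatch):

    for path in batch.to_trajectory_list():
        # Each `path` is one trajectory with the keys listed above.
        print(len(path['rewards']), 'steps, undiscounted return', path['rewards'].sum())
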
log_multitask_performance(itr, batch, discount, name_map=None)[source]

Log performance of trajectories from multiple tasks.

Parameters:
  • itr (int) – Iteration number to be logged.
  • batch (garage.TrajectoryBatch) – Batch of trajectories. The trajectories should have either the “task_name” or “task_id” env_infos. If “task_name” is not present, then name_map is required, and should map from task ids to task names.
  • discount (float) – Discount used in computing returns.
  • name_map (dict[int, str] or None) – Mapping from task ids to task names. Optional if the “task_name” environment info is present. Note that if provided, all tasks listed in this map will be logged, even if there are no trajectories present for them.
Returns:

Undiscounted returns averaged across all tasks. Has shape \((N \bullet [T])\).

Return type:

numpy.ndarray

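A sketch of typical use; it assumes a dowel logger is already set up by the surrounding experiment, that `batch` carries “task_id” env_infos, and that `epoch` is the current iteration:

    from garage import log_multitask_performance

    # Hypothetical task-id-to-name mapping, needed because this batch has no
    # 'task_name' env_info.
    name_map = {0: 'reach', 1: 'push'}
    average_returns = log_multitask_performance(epoch, batch, discount=0.99,
                                                name_map=name_map)
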
log_performance(itr, batch, discount, prefix='Evaluation')[source]

Evaluate the performance of an algorithm on a batch of trajectories.

Parameters:
  • itr (int) – Iteration number.
  • batch (TrajectoryBatch) – The trajectories to evaluate with.
  • discount (float) – Discount value, from algorithm’s property.
  • prefix (str) – Prefix to add to all logged keys.
Returns:

Undiscounted returns.

Return type:

numpy.ndarray

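In an evaluation step this is typically a single call (a sketch; `epoch` and `eval_batch` are placeholders for the current iteration and a TrajectoryBatch of evaluation rollouts):

    from garage import log_performance

    undiscounted_returns = log_performance(epoch, eval_batch,
                                           discount=0.99,
                                           prefix='Evaluation')
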
class InOutSpec(input_space, output_space)[source]

Bases: object

Describes the input and output spaces of a primitive or module.

Parameters:
  • input_space (akro.Space) – Input space of a module.
  • output_space (akro.Space) – Output space of a module.
input_space

Get input space of the module.

Returns:Input space of the module.
Return type:akro.Space
output_space

Get output space of the module.

Returns:Output space of the module.
Return type:akro.Space
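
For instance, a minimal sketch using akro spaces for a hypothetical module with 4-dimensional inputs and 2-dimensional outputs:

    import akro
    from garage import InOutSpec

    spec = InOutSpec(input_space=akro.Box(low=-1.0, high=1.0, shape=(4,)),
                     output_space=akro.Box(low=-1.0, high=1.0, shape=(2,)))
    assert spec.input_space.shape == (4,)
    assert spec.output_space.shape == (2,)
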
class TimeStepBatch[source]

Bases: garage._dtypes.TimeStepBatch

A tuple representing a batch of TimeSteps.

Data type for off-policy algorithms, imitation learning and batch-RL.

env_spec

Specification for the environment from which this data was sampled.

Type:garage.envs.EnvSpec
observations

Non-flattened array of observations. Typically has shape (batch_size, S^*) (the unflattened state space of the current environment).

Type:numpy.ndarray
actions

Non-flattened array of actions. Should have shape (batch_size, S^*) (the unflattened action space of the current environment).

Type:numpy.ndarray
rewards

Array of rewards of shape (batch_size,) (1D array of length batch_size).

Type:numpy.ndarray
next_observation

Non-flattened array of next observations. Has shape (batch_size, S^*). next_observations[i] was observed by the agent after taking actions[i].

Type:numpy.ndarray
terminals

A boolean numpy array of shape (batch_size,) containing the termination signals for all transitions in this batch.

Type:numpy.ndarray
env_infos

A dict of arbitrary environment state information.

Type:dict
agent_infos

A dict of arbitrary agent state information. For example, this may contain the hidden states from an RNN policy.

Type:dict
Raises:ValueError – If any of the above attributes do not conform to their prescribed types and shapes.
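
Because the fields line up index by index, a one-step bootstrapped target can be sketched directly from them; `batch` and `next_values` are hypothetical placeholders, with `next_values` standing in for value estimates of the observations that follow each action:

    import numpy as np

    # `batch` is a hypothetical garage.TimeStepBatch.
    not_done = 1.0 - batch.terminals.astype(np.float64)
    targets = batch.rewards + 0.99 * not_done * next_values  # discount assumed 0.99
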
classmethod concatenate(*batches)[source]

Create a TimeStepBatch by concatenating TimeStepBatches.

Parameters:batches (list[TimeStepBatch]) – Batches to concatenate.
Returns:The concatenation of the batches.
Return type:TimeStepBatch
Raises:ValueError – If no TimeStepBatches are provided.
classmethod from_time_step_list(env_spec, ts_samples)[source]

Create a TimeStepBatch from a list of time step dictionaries.

Parameters:
  • env_spec (garage.envs.EnvSpec) – Specification for the environment from which this data was sampled.
  • ts_samples (list[dict[str, np.ndarray or dict[str, np.ndarray]]]) –

    Keys:
    • observations (numpy.ndarray): Non-flattened array of
      observations. Typically has shape (batch_size, S^*) (the unflattened state space of the current environment).
    • actions (numpy.ndarray): Non-flattened array of actions.
      Should have shape (batch_size, S^*) (the unflattened action space of the current environment).
    • rewards (numpy.ndarray): Array of rewards of shape
      (batch_size,) (1D array of length batch_size).
    • next_observation (numpy.ndarray): Non-flattened array of next
      observations. Has shape (batch_size, S^*). next_observations[i] was observed by the agent after taking actions[i].
    • terminals (numpy.ndarray): A boolean numpy array of shape
      (batch_size,) containing the termination signals for all transitions in this batch.
    • env_infos (dict): A dict of arbitrary environment state
      information.
    • agent_infos (dict): A dict of arbitrary agent
      state information. For example, this may contain the hidden states from an RNN policy.
Returns:

The concatenation of samples.

Return type:

TimeStepBatch

Raises:

ValueError – If no dicts are provided.

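A sketch mirroring from_trajectory_list above, using the keys listed here; `env.spec` again stands in for a garage.envs.EnvSpec:

    import numpy as np
    from garage import TimeStepBatch

    samples = [
        dict(observations=np.zeros((1, 3)),
             actions=np.zeros((1, 2)),
             rewards=np.zeros(1),
             next_observation=np.zeros((1, 3)),
             terminals=np.array([False]),
             env_infos={},
             agent_infos={}),
    ]
    batch = TimeStepBatch.from_time_step_list(env.spec, samples)
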
split()[source]

Split a TimeStepBatch into a list of TimeStepBatches.

The opposite of concatenate.

Returns:A list of TimeStepBatches, with one TimeStep per TimeStepBatch.
Return type:list[TimeStepBatch]
to_time_step_list()[source]

Convert the batch into a list of dictionaries.

This breaks the TimeStepBatch object into a list of single time step sample dictionaries. len(terminals) (i.e., the number of discrete time steps) dictionaries are returned.

Returns:
Keys:
  • observations (numpy.ndarray): Non-flattened array of
    observations. Typically has shape (batch_size, S^*) (the unflattened state space of the current environment).
  • actions (numpy.ndarray): Non-flattened array of actions. Should
    have shape (batch_size, S^*) (the unflattened action space of the current environment).
  • rewards (numpy.ndarray): Array of rewards of shape
    (batch_size,) (1D array of length batch_size).
  • next_observation (numpy.ndarray): Non-flattened array of next
    observations. Has shape (batch_size, S^*). next_observations[i] was observed by the agent after taking actions[i].
  • terminals (numpy.ndarray): A boolean numpy array of shape
    (batch_size,) containing the termination signals for all transitions in this batch.
  • env_infos (dict): A dict of arbitrary environment state
    information.
  • agent_infos (dict): A dict of arbitrary agent state
    information. For example, this may contain the hidden states from an RNN policy.
Return type:list[dict[str, np.ndarray or dict[str, np.ndarray]]]
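
For example, iterating over the converted samples (a sketch; `batch` is a hypothetical TimeStepBatch):

    for sample in batch.to_time_step_list():
        # Each `sample` is a dict holding a single time step, so its arrays
        # all have a leading dimension of 1.
        assert len(sample['rewards']) == 1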

Subpackages