garage package
Garage Base.
make_optimizer(optimizer_type, module=None, **kwargs)
Create an optimizer for PyTorch and TensorFlow algorithms.
Parameters: - optimizer_type (Union[type, tuple[type, dict]]) – Type of optimizer. This can be an optimizer type such as torch.optim.Adam, or a tuple of a type and a dictionary, where the dictionary contains arguments to initialize the optimizer, e.g. (torch.optim.Adam, {'lr': 1e-3}).
- module (optional) – If the optimizer type is a torch.optim.Optimizer, the torch.nn.Module whose parameters are to be optimized must be specified.
- kwargs (dict) – Other keyword arguments to initialize the optimizer. These are not used when optimizer_type is a tuple.
Returns: Constructed optimizer.
Return type: torch.optim.Optimizer
Raises: ValueError – If optimizer_type is a tuple and a non-default argument is passed in kwargs.
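A minimal usage sketch, assuming a small PyTorch module standing in for a policy (the module itself is hypothetical):

    import torch
    from garage import make_optimizer

    policy = torch.nn.Linear(4, 2)  # hypothetical module whose parameters we optimize

    # Pass the optimizer type with keyword arguments...
    optimizer = make_optimizer(torch.optim.Adam, module=policy, lr=1e-3)

    # ...or bundle the type and its arguments into a tuple.
    optimizer = make_optimizer((torch.optim.Adam, {'lr': 1e-3}), module=policy)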
wrap_experiment(function=None, *, log_dir=None, prefix='experiment', name=None, snapshot_mode='last', snapshot_gap=1, archive_launch_repo=True, name_parameters=None, use_existing_dir=False)
Decorate a function to turn it into an ExperimentTemplate.
When invoked, the wrapped function will receive an ExperimentContext, which will contain the log directory into which the experiment should log information.
This decorator can be invoked in two different ways.
Without arguments, like this:

    @wrap_experiment
    def my_experiment(ctxt, seed, lr=0.5):
        ...

Or with arguments:

    @wrap_experiment(snapshot_mode='all')
    def my_experiment(ctxt, seed, lr=0.5):
        ...

All arguments must be keyword arguments.
Parameters: - function (callable or None) – The experiment function to wrap.
- log_dir (str or None) – The full log directory to log to. Will be computed from name if omitted.
- name (str or None) – The name of this experiment template. Will be filled from the wrapped function’s name if omitted.
- prefix (str) – Directory under data/local in which to place the experiment directory.
- snapshot_mode (str) – Policy for which snapshots to keep (or make at all). Can be either “all” (all iterations will be saved), “last” (only the last iteration will be saved), “gap” (every snapshot_gap iterations are saved), or “none” (do not save snapshots).
- snapshot_gap (int) – Gap between snapshot iterations. Waits this number of iterations before taking another snapshot.
- archive_launch_repo (bool) – Whether to save an archive of the repository containing the launcher script. This is a potentially expensive operation which is useful for ensuring reproducibility.
- name_parameters (str or None) – Parameters to insert into the experiment name. Should be either None (the default), ‘all’ (all parameters will be used), or ‘passed’ (only passed parameters will be used). The used parameters will be inserted in the order they appear in the function definition.
- use_existing_dir (bool) – If true, (re)use the directory for this experiment, even if it already contains data.
Returns: The wrapped function.
Return type: callable
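As a usage note: when the wrapped function is invoked, the ExperimentContext is supplied automatically as its first argument, so callers pass only the remaining keyword arguments. A minimal sketch reusing my_experiment from above (the argument values are illustrative):

    # The ctxt argument is injected by the wrapper; pass everything else by keyword.
    my_experiment(seed=1, lr=0.25)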
class TimeStep
Bases: garage._dtypes.TimeStep
A tuple representing a single TimeStep.
A TimeStep represents a single sample when an agent interacts with an environment.
env_spec
Specification for the environment from which this data was sampled.
Type: garage.envs.EnvSpec
observation
A numpy array of shape \((O^*)\) containing the observation for this time step in the environment. These must conform to env_spec.observation_space.
Type: numpy.ndarray
action
A numpy array of shape \((A^*)\) containing the action for this time step. These must conform to env_spec.action_space.
Type: numpy.ndarray
reward
A float representing the reward for taking the action given the observation, at this time step.
Type: float
agent_info
A dict of arbitrary agent state information. For example, this may contain the hidden states from an RNN policy.
Type: dict
Raises: ValueError – If any of the above attributes do not conform to their prescribed types and shapes.
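A short, hedged sketch of reading the documented fields from a TimeStep (the step variable is assumed to come from a sampler):

    # `step` is assumed to be a garage.TimeStep collected during a rollout.
    print(step.observation.shape)    # conforms to step.env_spec.observation_space
    print(step.action, step.reward)  # the action taken and the resulting reward
    print(step.agent_info.keys())    # e.g. RNN hidden states, if any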
class TrajectoryBatch
Bases: garage._dtypes.TrajectoryBatch
A tuple representing a batch of whole trajectories.
Data type for on-policy algorithms.
A TrajectoryBatch represents a batch of whole trajectories produced when one or more agents interact with one or more environments.

Symbol | Description
\(N\) | Trajectory index dimension
\([T]\) | Variable-length time dimension of each trajectory
\(S^*\) | Single-step shape of a time-series tensor
\(N \bullet [T]\) | A dimension computed by flattening a variable-length time dimension \([T]\) into a single batch dimension with length \(\sum_{i \in N} [T]_i\)
env_spec
Specification for the environment from which this data was sampled.
Type: garage.envs.EnvSpec
observations
A numpy array of shape \((N \bullet [T], O^*)\) containing the (possibly multi-dimensional) observations for all time steps in this batch. These must conform to env_spec.observation_space.
Type: numpy.ndarray
last_observations
A numpy array of shape \((N, O^*)\) containing the last observation of each trajectory. This is necessary since there is one more observation than there are actions in every trajectory.
Type: numpy.ndarray
actions
A numpy array of shape \((N \bullet [T], A^*)\) containing the (possibly multi-dimensional) actions for all time steps in this batch. These must conform to env_spec.action_space.
Type: numpy.ndarray
rewards
A numpy array of shape \((N \bullet [T])\) containing the rewards for all time steps in this batch.
Type: numpy.ndarray
terminals
A boolean numpy array of shape \((N \bullet [T])\) containing the termination signals for all time steps in this batch.
Type: numpy.ndarray
env_infos
A dict of numpy arrays containing arbitrary environment state information. Each value of this dict should be a numpy array of shape \((N \bullet [T])\) or \((N \bullet [T], S^*)\).
Type: dict
agent_infos
A dict of numpy arrays containing arbitrary agent state information. Each value of this dict should be a numpy array of shape \((N \bullet [T])\) or \((N \bullet [T], S^*)\). For example, this may contain the hidden states from an RNN policy.
Type: dict
lengths
An integer numpy array of shape \((N,)\) containing the length of each trajectory in this batch. This may be used to reconstruct the individual trajectories.
Type: numpy.ndarray
Raises: ValueError – If any of the above attributes do not conform to their prescribed types and shapes.
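A hedged sketch of how the flattened \(N \bullet [T]\) layout relates to lengths (the batch variable is assumed to be a TrajectoryBatch):

    import numpy as np

    # The flat time dimension concatenates all trajectories end to end;
    # `lengths` marks the boundaries, so per-trajectory rewards can be recovered:
    per_traj_rewards = np.split(batch.rewards, np.cumsum(batch.lengths)[:-1])
    assert all(len(r) == l for r, l in zip(per_traj_rewards, batch.lengths))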
classmethod concatenate(*batches)
Create a TrajectoryBatch by concatenating TrajectoryBatches.
Parameters: batches (list[TrajectoryBatch]) – Batches to concatenate.
Returns: The concatenation of the batches.
Return type: TrajectoryBatch
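A minimal sketch, assuming batch_a and batch_b are TrajectoryBatches sampled from the same environment:

    from garage import TrajectoryBatch

    merged = TrajectoryBatch.concatenate(batch_a, batch_b)
    # The trajectory index dimension grows; per-trajectory lengths are preserved.
    assert len(merged.lengths) == len(batch_a.lengths) + len(batch_b.lengths)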
classmethod from_trajectory_list(env_spec, paths)
Create a TrajectoryBatch from a list of trajectories.
Parameters: - env_spec (garage.envs.EnvSpec) – Specification for the environment from which this data was sampled.
- paths (list[dict[str, np.ndarray or dict[str, np.ndarray]]]) – Keys (see the construction sketch after this list):
  - observations (np.ndarray): Non-flattened array of observations. Typically has shape (T, S^*) (the unflattened state space of the current environment). observations[i] was used by the agent to choose actions[i]. observations may instead have shape (T + 1, S^*).
  - next_observations (np.ndarray): Non-flattened array of observations. Has shape (T, S^*). next_observations[i] was observed by the agent after taking actions[i]. Optional. Note that to ensure all information from the environment was preserved, observations should have shape (T + 1, S^*), or this key should be set. However, this method is lenient and will “duplicate” the last observation if the original last observation has been lost.
  - actions (np.ndarray): Non-flattened array of actions. Should have shape (T, S^*) (the unflattened action space of the current environment).
  - rewards (np.ndarray): Array of rewards of shape (T,) (1D array of length timesteps).
  - dones (np.ndarray): Array of dones of shape (T,) (1D array of length timesteps).
  - agent_infos (dict[str, np.ndarray]): Dictionary of stacked, non-flattened agent_info arrays.
  - env_infos (dict[str, np.ndarray]): Dictionary of stacked, non-flattened env_info arrays.
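A minimal construction sketch; the env_spec variable, the 4-D observation / 2-D action shapes, and the 3-step trajectory are all illustrative assumptions:

    import numpy as np
    from garage import TrajectoryBatch

    T = 3  # hypothetical trajectory length
    path = {
        'observations': np.zeros((T + 1, 4)),  # includes the final observation
        'actions': np.zeros((T, 2)),
        'rewards': np.zeros(T),
        'dones': np.array([False, False, True]),
        'agent_infos': {},
        'env_infos': {},
    }
    batch = TrajectoryBatch.from_trajectory_list(env_spec, [path])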
split()
Split a TrajectoryBatch into a list of TrajectoryBatches.
The opposite of concatenate.
Returns: A list of TrajectoryBatches, with one trajectory per batch.
Return type: list[TrajectoryBatch]
to_trajectory_list()
Convert the batch into a list of dictionaries.
Returns: Keys:
  - observations (np.ndarray): Non-flattened array of observations. Has shape (T, S^*) (the unflattened state space of the current environment). observations[i] was used by the agent to choose actions[i].
  - next_observations (np.ndarray): Non-flattened array of observations. Has shape (T, S^*). next_observations[i] was observed by the agent after taking actions[i].
  - actions (np.ndarray): Non-flattened array of actions. Should have shape (T, S^*) (the unflattened action space of the current environment).
  - rewards (np.ndarray): Array of rewards of shape (T,) (1D array of length timesteps).
  - dones (np.ndarray): Array of dones of shape (T,) (1D array of length timesteps).
  - agent_infos (dict[str, np.ndarray]): Dictionary of stacked, non-flattened agent_info arrays.
  - env_infos (dict[str, np.ndarray]): Dictionary of stacked, non-flattened env_info arrays.
Return type: list[dict[str, np.ndarray or dict[str, np.ndarray]]]
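A sketch of the roundtrip between the batch and list representations (batch is assumed to be a TrajectoryBatch):

    from garage import TrajectoryBatch

    # split() inverts concatenate(); to_trajectory_list() inverts from_trajectory_list().
    singles = batch.split()             # one TrajectoryBatch per trajectory
    paths = batch.to_trajectory_list()  # list of per-trajectory dicts
    rebuilt = TrajectoryBatch.from_trajectory_list(batch.env_spec, paths)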
log_multitask_performance(itr, batch, discount, name_map=None)
Log performance of trajectories from multiple tasks.
Parameters: - itr (int) – Iteration number to be logged.
- batch (garage.TrajectoryBatch) – Batch of trajectories. The trajectories should have either the “task_name” or “task_id” env_infos. If “task_name” is not present, then name_map is required, and should map from task ids to task names.
- discount (float) – Discount used in computing returns.
- name_map (dict[int, str] or None) – Mapping from task ids to task names. Optional if the “task_name” environment info is present. Note that if provided, all tasks listed in this map will be logged, even if there are no trajectories present for them.
Returns: Undiscounted returns averaged across all tasks. Has shape \((N \bullet [T])\).
Return type: numpy.ndarray
log_performance(itr, batch, discount, prefix='Evaluation')
Evaluate the performance of an algorithm on a batch of trajectories.
Parameters: - itr (int) – Iteration number.
- batch (TrajectoryBatch) – The trajectories to evaluate with.
- discount (float) – Discount value, from algorithm’s property.
- prefix (str) – Prefix to add to all logged keys.
Returns: Undiscounted returns.
Return type: numpy.ndarray
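A hedged usage sketch inside an evaluation step; it assumes a dowel logger has already been configured, and that itr and batch come from the surrounding training loop:

    from garage import log_performance

    # Log per-iteration evaluation statistics and keep the per-trajectory
    # undiscounted returns for further analysis.
    undiscounted_returns = log_performance(itr, batch, discount=0.99,
                                           prefix='Evaluation')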
class InOutSpec(input_space, output_space)
Bases: object
Describes the input and output spaces of a primitive or module.
Parameters: - input_space (akro.Space) – Input space of a module.
- output_space (akro.Space) – Output space of a module.
input_space
Get input space of the module.
Returns: Input space of the module.
Return type: akro.Space
output_space
Get output space of the module.
Returns: Output space of the module.
Return type: akro.Space
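A minimal sketch using akro spaces; the 4-D input and 2-D output shapes are illustrative assumptions:

    import akro
    from garage import InOutSpec

    # Describe a module that maps 4-D inputs to 2-D outputs.
    spec = InOutSpec(input_space=akro.Box(low=-1.0, high=1.0, shape=(4,)),
                     output_space=akro.Box(low=-1.0, high=1.0, shape=(2,)))
    print(spec.input_space.shape)   # (4,)
    print(spec.output_space.shape)  # (2,)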
class TimeStepBatch
Bases: garage._dtypes.TimeStepBatch
A tuple representing a batch of TimeSteps.
Data type for off-policy algorithms, imitation learning and batch-RL.
env_spec
Specification for the environment from which this data was sampled.
Type: garage.envs.EnvSpec
observations
Non-flattened array of observations. Typically has shape (batch_size, S^*) (the unflattened state space of the current environment).
Type: numpy.ndarray
actions
Non-flattened array of actions. Should have shape (batch_size, S^*) (the unflattened action space of the current environment).
Type: numpy.ndarray
rewards
Array of rewards of shape (batch_size,) (1D array of length batch_size).
Type: numpy.ndarray
next_observation
Non-flattened array of next observations. Has shape (batch_size, S^*). next_observations[i] was observed by the agent after taking actions[i].
Type: numpy.ndarray
terminals
A boolean numpy array of shape (batch_size,) containing the termination signals for all transitions in this batch.
Type: numpy.ndarray
agent_infos
A dict of arbitrary agent state information. For example, this may contain the hidden states from an RNN policy.
Type: dict
Raises: ValueError – If any of the above attributes do not conform to their prescribed types and shapes.
classmethod concatenate(*batches)
Create a TimeStepBatch by concatenating TimeStepBatches.
Parameters: batches (list[TimeStepBatch]) – Batches to concatenate.
Returns: The concatenation of the batches.
Return type: TimeStepBatch
Raises: ValueError – If no TimeStepBatches are provided.
classmethod from_time_step_list(env_spec, ts_samples)
Create a TimeStepBatch from a list of time step dictionaries.
Parameters: - env_spec (garage.envs.EnvSpec) – Specification for the environment from which this data was sampled.
- ts_samples (list[dict[str, np.ndarray or dict[str, np.ndarray]]]) – Keys (see the construction sketch below):
  - observations (numpy.ndarray): Non-flattened array of observations. Typically has shape (batch_size, S^*) (the unflattened state space of the current environment).
  - actions (numpy.ndarray): Non-flattened array of actions. Should have shape (batch_size, S^*) (the unflattened action space of the current environment).
  - rewards (numpy.ndarray): Array of rewards of shape (batch_size,) (1D array of length batch_size).
  - next_observation (numpy.ndarray): Non-flattened array of next observations. Has shape (batch_size, S^*). next_observations[i] was observed by the agent after taking actions[i].
  - terminals (numpy.ndarray): A boolean numpy array of shape (batch_size,) containing the termination signals for all transitions in this batch.
  - env_infos (dict): A dict of arbitrary environment state information.
  - agent_infos (dict): A dict of arbitrary agent state information. For example, this may contain the hidden states from an RNN policy.
Returns: The concatenation of samples.
Return type: TimeStepBatch
Raises: ValueError – If no dicts are provided.
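A minimal construction sketch; the env_spec variable and the array shapes are illustrative assumptions, and the keys follow the list above:

    import numpy as np
    from garage import TimeStepBatch

    sample = {
        'observations': np.zeros((1, 4)),
        'actions': np.zeros((1, 2)),
        'rewards': np.zeros(1),
        'next_observation': np.zeros((1, 4)),
        'terminals': np.array([False]),
        'env_infos': {},
        'agent_infos': {},
    }
    batch = TimeStepBatch.from_time_step_list(env_spec, [sample])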
split()
Split a TimeStepBatch into a list of TimeStepBatches.
The opposite of concatenate.
Returns: A list of TimeStepBatches, with one TimeStep per TimeStepBatch.
Return type: list[TimeStepBatch]
to_time_step_list()
Convert the batch into a list of dictionaries.
This breaks the TimeStepBatch object into a list of single-time-step sample dictionaries. len(terminals) dictionaries (one per discrete time step) are returned.
Returns: Keys:
  - observations (numpy.ndarray): Non-flattened array of observations. Typically has shape (batch_size, S^*) (the unflattened state space of the current environment).
  - actions (numpy.ndarray): Non-flattened array of actions. Should have shape (batch_size, S^*) (the unflattened action space of the current environment).
  - rewards (numpy.ndarray): Array of rewards of shape (batch_size,) (1D array of length batch_size).
  - next_observation (numpy.ndarray): Non-flattened array of next observations. Has shape (batch_size, S^*). next_observations[i] was observed by the agent after taking actions[i].
  - terminals (numpy.ndarray): A boolean numpy array of shape (batch_size,) containing the termination signals for all transitions in this batch.
  - env_infos (dict): A dict of arbitrary environment state information.
  - agent_infos (dict): A dict of arbitrary agent state information. For example, this may contain the hidden states from an RNN policy.
Return type: list[dict[str, np.ndarray or dict[str, np.ndarray]]]
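A sketch of the inverse operations (batch is assumed to be a TimeStepBatch):

    from garage import TimeStepBatch

    samples = batch.to_time_step_list()  # len(batch.terminals) dicts, one per step
    rebuilt = TimeStepBatch.from_time_step_list(batch.env_spec, samples)
    singles = batch.split()              # one TimeStepBatch per time step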
Subpackages
- garage.envs package
- Subpackages
- garage.envs.dm_control package
- garage.envs.mujoco package
- garage.envs.wrappers package
- Submodules
- garage.envs.wrappers.atari_env module
- garage.envs.wrappers.clip_reward module
- garage.envs.wrappers.episodic_life module
- garage.envs.wrappers.fire_reset module
- garage.envs.wrappers.grayscale module
- garage.envs.wrappers.max_and_skip module
- garage.envs.wrappers.noop module
- garage.envs.wrappers.resize module
- garage.envs.wrappers.stack_frames module
- Submodules
- garage.experiment package
- Submodules
- garage.experiment.deterministic module
- garage.experiment.experiment module
- garage.experiment.experiment_wrapper module
- garage.experiment.local_runner module
- garage.experiment.local_tf_runner module
- garage.experiment.meta_evaluator module
- garage.experiment.snapshotter module
- garage.experiment.task_sampler module
- garage.misc package
- garage.np package
- garage.plotter package
- garage.replay_buffer package
- garage.sampler package
- Submodules
- garage.sampler.batch_sampler module
- garage.sampler.default_worker module
- garage.sampler.env_update module
- garage.sampler.is_sampler module
- garage.sampler.local_sampler module
- garage.sampler.multiprocessing_sampler module
- garage.sampler.off_policy_vectorized_sampler module
- garage.sampler.on_policy_vectorized_sampler module
- garage.sampler.parallel_sampler module
- garage.sampler.parallel_vec_env_executor module
- garage.sampler.ray_sampler module
- garage.sampler.sampler module
- garage.sampler.sampler_deprecated module
- garage.sampler.stateful_pool module
- garage.sampler.utils module
- garage.sampler.vec_env_executor module
- garage.sampler.vec_worker module
- garage.sampler.worker module
- garage.sampler.worker_factory module
- garage.tf package
- Subpackages
- garage.tf.algos package
- Submodules
- garage.tf.algos.ddpg module
- garage.tf.algos.dqn module
- garage.tf.algos.erwr module
- garage.tf.algos.npo module
- garage.tf.algos.ppo module
- garage.tf.algos.reps module
- garage.tf.algos.rl2 module
- garage.tf.algos.rl2ppo module
- garage.tf.algos.rl2trpo module
- garage.tf.algos.td3 module
- garage.tf.algos.te module
- garage.tf.algos.te_npo module
- garage.tf.algos.te_ppo module
- garage.tf.algos.tnpg module
- garage.tf.algos.trpo module
- garage.tf.algos.vpg module
- garage.tf.baselines package
- garage.tf.distributions package
- garage.tf.embeddings package
- garage.tf.misc package
- garage.tf.models package
- Submodules
- garage.tf.models.categorical_cnn_model module
- garage.tf.models.categorical_gru_model module
- garage.tf.models.categorical_lstm_model module
- garage.tf.models.categorical_mlp_model module
- garage.tf.models.cnn module
- garage.tf.models.cnn_mlp_merge_model module
- garage.tf.models.cnn_model module
- garage.tf.models.cnn_model_max_pooling module
- garage.tf.models.gaussian_cnn_model module
- garage.tf.models.gaussian_gru_model module
- garage.tf.models.gaussian_lstm_model module
- garage.tf.models.gaussian_mlp_model module
- garage.tf.models.gru module
- garage.tf.models.gru_model module
- garage.tf.models.lstm module
- garage.tf.models.lstm_model module
- garage.tf.models.mlp module
- garage.tf.models.mlp_dueling_model module
- garage.tf.models.mlp_merge_model module
- garage.tf.models.mlp_model module
- garage.tf.models.model module
- garage.tf.models.module module
- garage.tf.models.normalized_input_mlp_model module
- garage.tf.models.parameter module
- garage.tf.models.sequential module
- garage.tf.optimizers package
- garage.tf.plotter package
- garage.tf.policies package
- Submodules
- garage.tf.policies.categorical_cnn_policy module
- garage.tf.policies.categorical_gru_policy module
- garage.tf.policies.categorical_lstm_policy module
- garage.tf.policies.categorical_mlp_policy module
- garage.tf.policies.continuous_mlp_policy module
- garage.tf.policies.discrete_qf_derived_policy module
- garage.tf.policies.gaussian_gru_policy module
- garage.tf.policies.gaussian_lstm_policy module
- garage.tf.policies.gaussian_mlp_policy module
- garage.tf.policies.gaussian_mlp_task_embedding_policy module
- garage.tf.policies.policy module
- garage.tf.policies.task_embedding_policy module
- garage.tf.policies.uniform_control_policy module
- garage.tf.q_functions package
- garage.tf.regressors package
- Submodules
- garage.tf.regressors.bernoulli_mlp_regressor module
- garage.tf.regressors.categorical_mlp_regressor module
- garage.tf.regressors.categorical_mlp_regressor_model module
- garage.tf.regressors.continuous_mlp_regressor module
- garage.tf.regressors.gaussian_cnn_regressor module
- garage.tf.regressors.gaussian_cnn_regressor_model module
- garage.tf.regressors.gaussian_mlp_regressor module
- garage.tf.regressors.gaussian_mlp_regressor_model module
- garage.tf.regressors.regressor module
- garage.tf.samplers package
- garage.torch package
- Subpackages
- garage.torch.algos package
- Submodules
- garage.torch.algos.ddpg module
- garage.torch.algos.maml module
- garage.torch.algos.maml_ppo module
- garage.torch.algos.maml_trpo module
- garage.torch.algos.maml_vpg module
- garage.torch.algos.mtsac module
- garage.torch.algos.pearl module
- garage.torch.algos.ppo module
- garage.torch.algos.sac module
- garage.torch.algos.trpo module
- garage.torch.algos.vpg module
- garage.torch.distributions package
- garage.torch.embeddings package
- garage.torch.modules package
- garage.torch.optimizers package
- garage.torch.policies package
- garage.torch.q_functions package
- garage.torch.value_functions package