garage._functions
Functions exposed directly in the garage namespace.
make_optimizer(optimizer_type, module=None, **kwargs)
Create an optimizer for PyTorch and TensorFlow algorithms.
- Parameters
optimizer_type (Union[type, tuple[type, dict]]) – Type of optimizer. This can be an optimizer type such as torch.optim.Adam, or a tuple of a type and a dictionary, where the dictionary contains arguments used to initialize the optimizer, e.g. (torch.optim.Adam, {'lr': 1e-3}).
module (optional) – If optimizer_type is a torch optimizer, the torch.nn.Module whose parameters are to be optimized must be specified.
kwargs (dict) – Other keyword arguments used to initialize the optimizer. These are not used when optimizer_type is a tuple.
- Returns
Constructed optimizer.
- Return type
torch.optim.Optimizer
- Raises
ValueError – Raised when optimizer_type is a tuple and a non-default argument is passed in kwargs.
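A minimal usage sketch of both calling conventions; the Linear module here is a hypothetical stand-in for a policy network:

import torch

from garage import make_optimizer

# Stand-in for a policy's torch.nn.Module (any module works).
policy = torch.nn.Linear(4, 2)

# Plain type: extra kwargs are forwarded to the optimizer constructor.
optimizer = make_optimizer(torch.optim.Adam, module=policy, lr=1e-3)

# Tuple form: the dict carries the constructor arguments instead of kwargs.
optimizer = make_optimizer((torch.optim.Adam, {'lr': 1e-3}), module=policy)

# Mixing the tuple form with non-default kwargs raises ValueError.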
rollout(env, agent, *, max_episode_length=np.inf, animated=False, pause_per_frame=None, deterministic=False)
Sample a single episode of the agent in the environment.
- Parameters
agent (Policy) – Policy used to select actions.
env (Environment) – Environment to perform actions in.
max_episode_length (int) – If the episode reaches this many timesteps, it is truncated.
animated (bool) – If True, render the environment after each step.
pause_per_frame (float) – Time to sleep between steps. Only relevant if animated == True.
deterministic (bool) – If True, use the mean action returned by the stochastic policy instead of sampling from the returned action distribution.
- Returns
  Dictionary, with keys:
  - observations (np.array): Flattened array of observations. There should be one more of these than actions. Note that observations[i] (for i < len(observations) - 1) was used by the agent to choose actions[i]. Should have shape \((T + 1, S^*)\), i.e. the unflattened observation space of the current environment.
  - actions (np.array): Non-flattened array of actions. Should have shape \((T, S^*)\), i.e. the unflattened action space of the current environment.
  - rewards (np.array): Array of rewards of shape \((T,)\), i.e. a 1D array of length timesteps.
  - agent_infos (Dict[str, np.array]): Dictionary of stacked, non-flattened agent_info arrays.
  - env_infos (Dict[str, np.array]): Dictionary of stacked, non-flattened env_info arrays.
  - dones (np.array): Array of termination signals.
- Return type
dict
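A minimal sketch of collecting one episode; env (an Environment) and policy (a Policy) are assumed to exist in the surrounding code:

from garage import rollout

episode = rollout(env, policy, max_episode_length=200, deterministic=True)

print(episode['rewards'].sum())       # undiscounted return of the episode
print(episode['observations'].shape)  # (T + 1, S^*): one more than actions
print(episode['actions'].shape)       # (T, S^*)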
obtain_evaluation_episodes(policy, env, max_episode_length=1000, num_eps=100, deterministic=True)
Sample the policy for num_eps evaluation episodes.
- Parameters
policy (Policy) – Policy to use as the actor when gathering samples.
env (Environment) – The environment used to obtain episodes.
max_episode_length (int) – Maximum episode length. The episode will be truncated when its length reaches max_episode_length.
num_eps (int) – Number of episodes.
deterministic (bool) – Whether a deterministic approach is used in rollout.
- Returns
  Evaluation episodes, representing the best current performance of the algorithm.
- Return type
EpisodeBatch
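A minimal sketch of an evaluation pass; policy and env are assumed to come from the surrounding training code:

from garage import obtain_evaluation_episodes

eval_episodes = obtain_evaluation_episodes(policy, env,
                                           max_episode_length=500,
                                           num_eps=10)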
log_multitask_performance(itr, batch, discount, name_map=None)
Log performance of episodes from multiple tasks.
- Parameters
itr (int) – Iteration number to be logged.
batch (EpisodeBatch) – Batch of episodes. The episodes should have either the "task_name" or "task_id" env_infos. If "task_name" is not present, then name_map is required and should map from task ids to task names.
discount (float) – Discount used in computing returns.
name_map (dict[int, str] or None) – Mapping from task ids to task names. Optional if the "task_name" environment info is present. Note that if provided, all tasks listed in this map will be logged, even if there are no episodes present for them.
- Returns
Undiscounted returns averaged across all tasks. Has shape \((N \bullet [T])\).
- Return type
numpy.ndarray
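A minimal sketch of logging multi-task evaluation results; eval_episodes is assumed to be an EpisodeBatch whose env_infos include "task_name" (otherwise pass name_map), and itr is the current iteration:

from garage import log_multitask_performance

avg_returns = log_multitask_performance(itr, eval_episodes, discount=0.99)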
log_performance(itr, batch, discount, prefix='Evaluation')
Evaluate the performance of an algorithm on a batch of episodes.
- Parameters
itr (int) – Iteration number.
batch (EpisodeBatch) – The episodes to evaluate with.
discount (float) – Discount value, from the algorithm's property.
prefix (str) – Prefix to add to all logged keys.
- Returns
Undiscounted returns.
- Return type
numpy.ndarray
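A minimal sketch tying evaluation and logging together; itr, policy, and env are assumed to come from the surrounding training loop:

from garage import log_performance, obtain_evaluation_episodes

eval_episodes = obtain_evaluation_episodes(policy, env, num_eps=10)
returns = log_performance(itr, eval_episodes, discount=0.99,
                          prefix='Evaluation')
print(returns.mean())  # average undiscounted return across the episodes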