Functions exposed directly in the garage namespace.

make_optimizer(optimizer_type, module=None, **kwargs)[source]

Create an optimizer for PyTorch and TensorFlow algorithms.

  • optimizer_type (Union[type, tuple[type, dict]]) – Type of optimizer. This can be an optimizer type such as ‘torch.optim.Adam’ or a tuple of type and dictionary, where dictionary contains arguments to initialize the optimizer e.g. (torch.optim.Adam, {‘lr’ : 1e-3})

  • module (optional) – The torch.nn.Module whose parameters need to be optimized. Must be specified if the optimizer type is a torch.optim optimizer.

  • kwargs (dict) – Other keyword arguments to initialize the optimizer. These are not used when optimizer_type is a tuple.


Constructed optimizer.

Return type



ValueError – Raised when optimizer_type is a tuple and a non-default argument is passed in kwargs.
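The two accepted forms of optimizer_type can be illustrated with a minimal sketch. This is not garage's implementation: ToyModule and ToySGD are hypothetical stand-ins for torch.nn.Module and an optimizer class such as torch.optim.Adam, and the sketch simplifies the error rule by rejecting any keyword argument when a tuple is given, whereas the description above only rejects non-default ones.

```python
class ToyModule:
    """Hypothetical stand-in for torch.nn.Module."""

    def __init__(self, params):
        self._params = list(params)

    def parameters(self):
        return iter(self._params)


class ToySGD:
    """Hypothetical stand-in for an optimizer class such as torch.optim.Adam."""

    def __init__(self, params, lr=0.01):
        self.params = list(params)
        self.lr = lr


def make_optimizer_sketch(optimizer_type, module=None, **kwargs):
    """Build an optimizer from a type or from a (type, dict) tuple."""
    if isinstance(optimizer_type, tuple):
        # Tuple form: the dictionary carries the constructor arguments.
        opt_type, opt_args = optimizer_type
        if kwargs:
            # Simplifying assumption: any extra kwarg is treated as a conflict.
            raise ValueError(
                'Do not pass {} together with explicit optimizer '
                'arguments'.format(sorted(kwargs)))
    else:
        # Plain type form: kwargs are the constructor arguments.
        opt_type, opt_args = optimizer_type, kwargs
    if module is not None:
        # Torch-style optimizers take the parameters to optimize first.
        return opt_type(module.parameters(), **opt_args)
    return opt_type(**opt_args)


module = ToyModule([1.0, 2.0])
from_tuple = make_optimizer_sketch((ToySGD, {'lr': 1e-3}), module=module)
from_type = make_optimizer_sketch(ToySGD, module=module, lr=0.5)
```

Both calls produce a ToySGD over the same parameters; they differ only in where the learning rate comes from.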

rollout(env, agent, *, max_episode_length=np.inf, animated=False, pause_per_frame=None, deterministic=False)[source]

Sample a single episode of the agent in the environment.

  • agent (Policy) – Policy used to select actions.

  • env (Environment) – Environment to perform actions in.

  • max_episode_length (int) – If the episode reaches this many timesteps, it is truncated.

  • animated (bool) – If true, render the environment after each step.

  • pause_per_frame (float) – Time to sleep between steps. Only relevant if animated is True.

  • deterministic (bool) – If true, use the mean action returned by the stochastic policy instead of sampling from the returned action distribution.


Dictionary, with keys:
  • observations(np.array): Flattened array of observations. There should be one more of these than actions. Note that observations[i] (for i < len(observations) - 1) was used by the agent to choose actions[i]. Should have shape \((T + 1, S^*)\), i.e. the unflattened observation space of the current environment.

  • actions(np.array): Non-flattened array of actions. Should have shape \((T, S^*)\), i.e. the unflattened action space of the current environment.

  • rewards(np.array): Array of rewards of shape \((T,)\), i.e. a 1D array of length timesteps.

  • agent_infos(Dict[str, np.array]): Dictionary of stacked, non-flattened agent_info arrays.

  • env_infos(Dict[str, np.array]): Dictionary of stacked, non-flattened env_info arrays.

  • dones(np.array): Array of termination signals.

Return type

dict[str, np.ndarray or dict]
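The relationship between the returned arrays (one more observation than actions, truncation at max_episode_length) can be sketched with a toy loop. This is an illustration only, not garage's rollout: CountingEnv and ParityAgent are hypothetical, the environment's step() returns a plain (observation, reward, done) tuple, and plain lists stand in for NumPy arrays.

```python
class CountingEnv:
    """Hypothetical toy environment whose observation is a step counter."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = float(action)
        done = self.t >= self.horizon
        return self.t, reward, done


class ParityAgent:
    """Hypothetical agent that returns the parity of the observation."""

    def get_action(self, observation):
        return observation % 2


def rollout_sketch(env, agent, max_episode_length=float('inf')):
    """Collect one episode; observations[i] is used to choose actions[i]."""
    observations = [env.reset()]
    actions, rewards, dones = [], [], []
    while len(actions) < max_episode_length:
        action = agent.get_action(observations[-1])
        next_obs, reward, done = env.step(action)
        actions.append(action)
        rewards.append(reward)
        dones.append(done)
        # The final observation is stored too, hence T + 1 observations.
        observations.append(next_obs)
        if done:
            break
    return dict(observations=observations, actions=actions,
                rewards=rewards, dones=dones)


episode = rollout_sketch(CountingEnv(horizon=5), ParityAgent())
truncated = rollout_sketch(CountingEnv(horizon=5), ParityAgent(),
                           max_episode_length=3)
```

In the truncated episode the loop stops because of the length limit, so the last entry of dones is still False.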

obtain_evaluation_episodes(policy, env, max_episode_length=1000, num_eps=100, deterministic=True)[source]

Sample the policy for num_eps episodes and return average values.

  • policy (Policy) – Policy to use as the actor when gathering samples.

  • env (Environment) – The environment used to obtain episodes.

  • max_episode_length (int) – Maximum episode length. The episode is truncated when its length reaches max_episode_length.

  • num_eps (int) – Number of episodes.

  • deterministic (bool) – Whether a deterministic approach is used in the rollout.


Evaluation episodes, representing the best current performance of the algorithm.

Return type


log_multitask_performance(itr, batch, discount, name_map=None)[source]

Log performance of episodes from multiple tasks.

  • itr (int) – Iteration number to be logged.

  • batch (EpisodeBatch) – Batch of episodes. The episodes should have either the “task_name” or “task_id” env_infos. If “task_name” is not present, then name_map is required and should map from task IDs to task names.

  • discount (float) – Discount used in computing returns.

  • name_map (dict[int, str] or None) – Mapping from task IDs to task names. Optional if the “task_name” environment info is present. Note that if provided, all tasks listed in this map will be logged, even if there are no episodes present for them.


Undiscounted returns averaged across all tasks. Has shape \((N \bullet [T])\).

Return type
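The task-name resolution described for batch and name_map can be sketched as a grouping step. This is a simplified illustration, not garage's code: episodes are represented as hypothetical plain dictionaries instead of an EpisodeBatch, and raising ValueError when no name can be resolved is an assumption of the sketch.

```python
def group_returns_by_task(episodes, name_map=None):
    """Group undiscounted returns by task name.

    Each episode is a dict with 'rewards' and either 'task_name' or
    'task_id' (a hypothetical layout chosen for this sketch).
    """
    # Every task listed in name_map gets an entry, even with no episodes.
    grouped = {name: [] for name in (name_map or {}).values()}
    for episode in episodes:
        if 'task_name' in episode:
            name = episode['task_name']
        elif name_map is not None:
            name = name_map[episode['task_id']]
        else:
            # Assumption of this sketch: fail loudly if no name is available.
            raise ValueError('name_map is required without task_name')
        grouped.setdefault(name, []).append(sum(episode['rewards']))
    return grouped


episodes = [
    {'task_id': 0, 'rewards': [1.0, 2.0]},
    {'task_id': 0, 'rewards': [0.5]},
]
grouped = group_returns_by_task(episodes, name_map={0: 'push', 1: 'reach'})
```

Here the 'reach' task appears in the result with an empty list, mirroring the note that tasks in name_map are logged even when no episodes are present for them.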


log_performance(itr, batch, discount, prefix='Evaluation')[source]

Evaluate the performance of an algorithm on a batch of episodes.

  • itr (int) – Iteration number.

  • batch (EpisodeBatch) – The episodes to evaluate with.

  • discount (float) – Discount value, from algorithm’s property.

  • prefix (str) – Prefix to add to all logged keys.


Undiscounted returns.

Return type
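The difference between the undiscounted returns that are returned and the discounted returns that the discount parameter is used for can be sketched as follows. This only illustrates the arithmetic under the assumption that each episode is a plain list of rewards; which keys are actually logged is not shown here.

```python
def discounted_return(rewards, discount):
    """Compute sum_t discount**t * rewards[t] by accumulating backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + discount * total
    return total


def returns_sketch(episode_rewards, discount):
    """Return (undiscounted, discounted) returns, one entry per episode."""
    undiscounted = [sum(rewards) for rewards in episode_rewards]
    discounted = [discounted_return(rewards, discount)
                  for rewards in episode_rewards]
    return undiscounted, discounted


undiscounted, discounted = returns_sketch(
    [[1.0, 1.0, 1.0], [0.0, 2.0]], discount=0.5)
# undiscounted == [3.0, 2.0], discounted == [1.75, 1.0]
```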