garage.np package

Reinforcement learning algorithms that use NumPy as a numerical backend.

obtain_evaluation_samples(policy, env, max_path_length=1000, num_trajs=100)[source]

Sample the policy for num_trajs trajectories and return them as an evaluation batch.

Parameters:
  • policy (garage.Policy) – Policy to use as the actor when gathering samples.
  • env (garage.envs.GarageEnv) – The environment used to obtain trajectories.
  • max_path_length (int) – Maximum path length. The episode terminates when the trajectory length reaches max_path_length.
  • num_trajs (int) – Number of trajectories to sample.
Returns:

Evaluation trajectories, representing the best current performance of the algorithm.

Return type:

TrajectoryBatch
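
The following is a minimal usage sketch, assuming a trained policy and a GarageEnv instance already exist; the variable names and the TrajectoryBatch attributes accessed at the end are illustrative assumptions, not part of the documented signature above.

    from garage.np import obtain_evaluation_samples

    # `policy` and `env` are assumed to come from an existing training setup.
    eval_trajs = obtain_evaluation_samples(policy, env,
                                           max_path_length=500,
                                           num_trajs=10)

    # The result is a TrajectoryBatch; assuming it exposes per-step `rewards`
    # and per-trajectory `lengths`, a rough average undiscounted return is:
    avg_return = eval_trajs.rewards.sum() / len(eval_trajs.lengths)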

paths_to_tensors(paths, max_path_length, baseline_predictions, discount)[source]

Return processed sample data based on the collected paths.

Parameters:
  • paths (list[dict]) – A list of collected paths.
  • max_path_length (int) – Maximum length of a single rollout.
  • baseline_predictions (numpy.ndarray) – Predicted values from the GAE (Generalized Advantage Estimation) baseline.
  • discount (float) – Environment reward discount.
Returns:

Processed sample data, with keys
  • observations (numpy.ndarray): Padded array of the observations of
    the environment
  • actions (numpy.ndarray): Padded array of the actions fed to the
    environment
  • rewards (numpy.ndarray): Padded array of the acquired rewards
  • agent_infos (dict): a dictionary of {stacked tensors or
    dictionary of stacked tensors}
  • env_infos (dict): a dictionary of {stacked tensors or
    dictionary of stacked tensors}
  • valids (numpy.ndarray): Padded array of the validity information

Return type:

dict
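
A hedged sketch of how this might be called from a sampling loop; the `paths` and `baselines` variables are assumed to already exist, and their exact structure is only illustrative.

    from garage.np import paths_to_tensors

    # `paths` is assumed to be a list of dicts produced by a sampler, each
    # holding per-step 'observations', 'actions', 'rewards', 'agent_infos'
    # and 'env_infos' for one rollout; `baselines` is assumed to hold the
    # baseline's value predictions for those rollouts.
    samples_data = paths_to_tensors(paths,
                                    max_path_length=500,
                                    baseline_predictions=baselines,
                                    discount=0.99)

    obs = samples_data['observations']  # padded observations
    rew = samples_data['rewards']       # padded rewards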

samples_to_tensors(paths)[source]

Return processed sample data based on the collected paths.

Parameters:
  • paths (list[dict]) – A list of collected paths.
Returns:

Processed sample data, with keys
  • undiscounted_returns (list[float])
  • success_history (list[float])
  • complete (list[bool])

Return type:

dict
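
A short, illustrative sketch of reading the returned statistics; `paths` is assumed to be a list of collected paths as described above.

    from garage.np import samples_to_tensors

    stats = samples_to_tensors(paths)

    # Per-path summary statistics, keyed as listed above.
    returns = stats['undiscounted_returns']  # one float per path
    successes = stats['success_history']     # one success indicator per path
    completed = stats['complete']            # whether each path completed

    avg_return = sum(returns) / max(len(returns), 1)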