garage.np

Reinforcement Learning Algorithms which use NumPy as a numerical backend.

obtain_evaluation_episodes(policy, env, max_episode_length=1000, num_eps=100)

Sample the policy for num_eps episodes and return them as a batch for evaluation.

Parameters:
  • policy (Policy) – Policy to use as the actor when gathering samples.
  • env (Environment) – The environment used to obtain episodes.
  • max_episode_length (int) – Maximum episode length. An episode is truncated when its length reaches max_episode_length.
  • num_eps (int) – Number of episodes.
Returns:

Evaluation episodes, representing the best current performance of the algorithm.

Return type:

EpisodeBatch
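
A minimal usage sketch follows; policy and env are placeholders for an already-constructed garage Policy and Environment, and the per-episode split assumes EpisodeBatch stores rewards flattened across episodes with the per-episode lengths in its lengths attribute::

  import numpy as np

  from garage.np import obtain_evaluation_episodes

  # `policy` and `env` are assumed to exist already (placeholders here).
  eval_eps = obtain_evaluation_episodes(policy, env,
                                        max_episode_length=500,
                                        num_eps=20)

  # EpisodeBatch concatenates rewards across episodes; `lengths` recovers
  # the per-episode boundaries so we can compute an average return.
  per_episode = np.split(eval_eps.rewards, np.cumsum(eval_eps.lengths)[:-1])
  avg_return = np.mean([ep.sum() for ep in per_episode])
  print(f'Average undiscounted return over {len(per_episode)} episodes: '
        f'{avg_return:.2f}')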

paths_to_tensors(paths, max_episode_length, baseline_predictions, discount)

Return processed sample data based on the collected paths.

Parameters:
  • paths (list[dict]) – A list of collected paths.
  • max_episode_length (int) – Maximum length of a single episode.
  • baseline_predictions (numpy.ndarray) – Predicted values from the GAE (Generalized Advantage Estimation) baseline.
  • discount (float) – Environment reward discount.
Returns:

Processed sample data, with keys
  • observations (numpy.ndarray): Padded array of the observations of
    the environment
  • actions (numpy.ndarray): Padded array of the actions fed to the
    environment
  • rewards (numpy.ndarray): Padded array of the acquired rewards
  • agent_infos (dict): A dictionary of stacked tensors, or of
    dictionaries of stacked tensors
  • env_infos (dict): A dictionary of stacked tensors, or of
    dictionaries of stacked tensors
  • valids (numpy.ndarray): Padded array of the validity information

Return type:

dict
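
To make "padded array" and "validity information" concrete, the following standalone NumPy sketch (illustrative only, not garage's internal code) pads variable-length reward sequences to max_episode_length, builds the matching validity mask, and computes the discounted returns that the discount parameter controls::

  import numpy as np

  max_episode_length = 4
  discount = 0.99

  # Toy rewards from two paths of different lengths (stand-ins for
  # path['rewards'] in the collected paths).
  rewards_per_path = [np.array([1., 1., 1.]), np.array([2., 3.])]

  def pad(arr, length):
      """Zero-pad a 1-D array on the right up to `length`."""
      out = np.zeros(length)
      out[:len(arr)] = arr
      return out

  def discount_cumsum(x, discount):
      """returns[t] = sum_k discount**k * x[t + k]."""
      out = np.zeros_like(x)
      running = 0.
      for t in reversed(range(len(x))):
          running = x[t] + discount * running
          out[t] = running
      return out

  rewards = np.stack([pad(r, max_episode_length) for r in rewards_per_path])
  # Validity mask: 1 for real time steps, 0 in the padding.
  valids = np.stack([pad(np.ones_like(r), max_episode_length)
                     for r in rewards_per_path])
  returns = np.stack([pad(discount_cumsum(r, discount), max_episode_length)
                      for r in rewards_per_path])

  print(rewards)  # [[1. 1. 1. 0.]
                  #  [2. 3. 0. 0.]]
  print(valids)   # [[1. 1. 1. 0.]
                  #  [1. 1. 0. 0.]]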

samples_to_tensors(paths)

Return processed sample data based on the collected paths.

Parameters:
  • paths (list[dict]) – A list of collected paths.
Returns:

Processed sample data, with keys
  • undiscounted_returns (list[float]): The undiscounted sum of rewards along each path
  • complete (list[bool]): Whether each path terminated, as opposed to being truncated

Return type:

dict
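
A standalone sketch of the two returned quantities; the path layout below, with per-path rewards and dones arrays, is an assumption for illustration, and the real collected paths may carry more keys::

  import numpy as np

  # Two toy paths: the first ends in a terminal step, the second was
  # truncated before reaching a terminal state.
  paths = [
      dict(rewards=np.array([1., 0., 2.]), dones=np.array([0, 0, 1])),
      dict(rewards=np.array([1., 1.]), dones=np.array([0, 0])),
  ]

  # undiscounted_returns: the plain sum of rewards along each path.
  undiscounted_returns = [float(p['rewards'].sum()) for p in paths]
  # complete: whether the path finished with a terminal step rather than
  # being cut off at max_episode_length.
  complete = [bool(p['dones'][-1]) for p in paths]

  print(undiscounted_returns)  # [3.0, 2.0]
  print(complete)              # [True, False]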