garage.np

Reinforcement Learning algorithms which use NumPy as a numerical backend.
obtain_evaluation_episodes(policy, env, max_episode_length=1000, num_eps=100)

Sample the policy for num_eps episodes and return average values.
Parameters:
- policy (Policy) – Policy to use as the actor when gathering samples.
- env (Environment) – The environment used to obtain episodes.
- max_episode_length (int) – Maximum episode length. An episode is truncated when its length reaches max_episode_length.
- num_eps (int) – Number of episodes to sample.
Returns:
- Evaluation episodes, representing the best current performance of the algorithm.
Return type:
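The evaluation loop described above can be sketched as follows. This is a minimal, self-contained illustration of the contract (sample num_eps episodes, truncate each at max_episode_length), not the real garage implementation; ConstantPolicy and ToyEnv are hypothetical stand-ins, and garage's actual Policy and Environment interfaces differ.

```python
import numpy as np


class ConstantPolicy:
    """Hypothetical policy stub that always emits the same action."""

    def get_action(self, observation):
        return 0.0


class ToyEnv:
    """Hypothetical environment stub: reward 1.0 per step, done after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self._t = 0

    def reset(self):
        self._t = 0
        return 0.0

    def step(self, action):
        self._t += 1
        done = self._t >= self.horizon
        return 0.0, 1.0, done  # observation, reward, done


def obtain_evaluation_episodes(policy, env, max_episode_length=1000, num_eps=100):
    """Collect num_eps episodes, truncating each at max_episode_length."""
    episodes = []
    for _ in range(num_eps):
        obs = env.reset()
        rewards = []
        for _ in range(max_episode_length):
            obs, reward, done = env.step(policy.get_action(obs))
            rewards.append(reward)
            if done:
                break
        episodes.append(np.asarray(rewards))
    return episodes


eps = obtain_evaluation_episodes(ConstantPolicy(), ToyEnv(), num_eps=3)
```

Note that truncation and termination interact: an episode ends either when the environment signals done or when max_episode_length steps have elapsed, whichever comes first.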
paths_to_tensors(paths, max_episode_length, baseline_predictions, discount)

Return processed sample data based on the collected paths.
Parameters:
Returns:
- Processed sample data, with keys
  - observations (numpy.ndarray): Padded array of the observations of the environment
  - actions (numpy.ndarray): Padded array of the actions fed to the environment
  - rewards (numpy.ndarray): Padded array of the acquired rewards
  - agent_infos (dict): a dictionary of {stacked tensors or dictionary of stacked tensors}
  - env_infos (dict): a dictionary of {stacked tensors or dictionary of stacked tensors}
  - valids (numpy.ndarray): Padded array of the validity information
Return type:
- dict
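The padding and validity masking behind these keys can be sketched with plain NumPy. The helper name pad_to_length is hypothetical, and this only illustrates how variable-length episodes become padded arrays plus a valids mask; the real function also applies the discount and baseline_predictions, which this sketch omits.

```python
import numpy as np


def pad_to_length(arrays, max_episode_length):
    """Stack variable-length per-episode arrays into one zero-padded
    2-D array, plus a `valids` mask that is 1.0 on real entries and
    0.0 on padding."""
    n = len(arrays)
    padded = np.zeros((n, max_episode_length))
    valids = np.zeros((n, max_episode_length))
    for i, a in enumerate(arrays):
        padded[i, :len(a)] = a
        valids[i, :len(a)] = 1.0
    return padded, valids


# Two episodes of different lengths, padded to a common horizon of 4.
padded, valids = pad_to_length(
    [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])], 4)
```

Downstream code multiplies per-step quantities by valids so that padded positions contribute nothing to losses or averages.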