garage.tf
TensorFlow branch.
paths_to_tensors(paths, max_episode_length, baseline_predictions, discount, gae_lambda)
Return processed sample data based on the collected paths.
Parameters: - paths (list[dict]) – A list of collected paths.
- max_episode_length (int) – Maximum length of a single episode.
- baseline_predictions (numpy.ndarray) – Predicted values of the GAE (Generalized Advantage Estimation) baseline.
- discount (float) – Environment reward discount.
- gae_lambda (float) – Lambda used for generalized advantage estimation.
Returns: - Processed sample data, with the following keys:
- observations: (numpy.ndarray)
- actions: (numpy.ndarray)
- rewards: (numpy.ndarray)
- baselines: (numpy.ndarray)
- returns: (numpy.ndarray)
- valids: (numpy.ndarray)
- agent_infos: (dict)
- env_infos: (dict)
- paths: (list[dict])
Return type: dict
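
As a rough illustration of the kind of processing described above, the following NumPy-only sketch zero-pads each path to max_episode_length, builds a valids mask, and computes discounted returns and GAE advantages. It is not garage's implementation. The names sketch_paths_to_tensors, discount_cumsum, and pad_to are hypothetical, and the sketch assumes that baseline_predictions holds one 1-D array per path, that each path dict has "observations", "actions", and "rewards" entries, and that no path is longer than max_episode_length. Storing the advantages on each path under an "advantages" key is also an assumption, and agent_infos and env_infos are left out for brevity.

```python
import numpy as np


def discount_cumsum(x, discount):
    """Return y with y[t] = sum over k of discount**k * x[t + k]."""
    out = np.zeros(len(x), dtype=float)
    running = 0.0
    for t in reversed(range(len(x))):
        running = x[t] + discount * running
        out[t] = running
    return out


def pad_to(x, max_len):
    """Zero-pad an array along axis 0 up to max_len entries."""
    x = np.asarray(x, dtype=float)
    pad = [(0, max_len - len(x))] + [(0, 0)] * (x.ndim - 1)
    return np.pad(x, pad)


def sketch_paths_to_tensors(paths, max_episode_length, baseline_predictions,
                            discount, gae_lambda):
    """Hypothetical sketch; assumes one baseline array per path."""
    out_paths = []
    for path, baseline in zip(paths, baseline_predictions):
        rewards = np.asarray(path["rewards"], dtype=float)
        baseline = np.asarray(baseline, dtype=float)
        # Value of the next step; assumed to be 0 after the final step.
        next_values = np.append(baseline[1:], 0.0)
        # TD residuals: r[t] + discount * V[t + 1] - V[t].
        deltas = rewards + discount * next_values - baseline
        new_path = dict(path)
        new_path["baselines"] = baseline
        # GAE: discounted sum of residuals with factor discount * gae_lambda.
        new_path["advantages"] = discount_cumsum(deltas, discount * gae_lambda)
        new_path["returns"] = discount_cumsum(rewards, discount)
        out_paths.append(new_path)

    def stack(key):
        # Pad every path to the same length, then stack along a new batch axis.
        return np.stack([pad_to(p[key], max_episode_length) for p in out_paths])

    # 1 marks a real time step, 0 marks padding.
    valids = np.stack([
        pad_to(np.ones(len(p["rewards"])), max_episode_length)
        for p in out_paths
    ])

    return dict(
        observations=stack("observations"),
        actions=stack("actions"),
        rewards=stack("rewards"),
        baselines=stack("baselines"),
        returns=stack("returns"),
        valids=valids,
        paths=out_paths,
    )
```

Each stacked array has shape (number of paths, max_episode_length, ...), so a consumer can multiply per-step quantities by valids to ignore the padded steps.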