garage.torch package
PyTorch-backed modules and algorithms.
-
compute_advantages(discount, gae_lambda, max_path_length, baselines, rewards)
Calculate advantages.
Advantages are a discounted cumulative sum, calculated using a baseline according to Generalized Advantage Estimation (GAE).
The discounted cumulative sum can be computed using conv2d with the filter
[1, (discount * gae_lambda), (discount * gae_lambda) ^ 2, …]
whose length is the same as max_path_length.
baselines and rewards also have the same shape:
baselines: [ [b_11, b_12, b_13, … b_1n],
             [b_21, b_22, b_23, … b_2n],
             …
             [b_m1, b_m2, b_m3, … b_mn] ]
rewards:   [ [r_11, r_12, r_13, … r_1n],
             [r_21, r_22, r_23, … r_2n],
             …
             [r_m1, r_m2, r_m3, … r_mn] ]
Parameters: - discount (float) – RL discount factor (i.e. gamma).
- gae_lambda (float) – Lambda, as used for Generalized Advantage Estimation (GAE).
- max_path_length (int) – Maximum length of a single rollout.
- baselines (torch.Tensor) – A 2D vector of value function estimates with shape (N, T), where N is the batch dimension (number of episodes) and T is the maximum path length experienced by the agent. If an episode terminates in fewer than T time steps, the remaining elements in that episode should be set to 0.
- rewards (torch.Tensor) – A 2D vector of per-step rewards with shape (N, T), where N is the batch dimension (number of episodes) and T is the maximum path length experienced by the agent. If an episode terminates in fewer than T time steps, the remaining elements in that episode should be set to 0.
Returns: A 2D vector of calculated advantage values with shape (N, T), where N is the batch dimension (number of episodes) and T is the maximum path length experienced by the agent. If an episode terminates in fewer than T time steps, the remaining values in that episode should be set to 0.
Return type: torch.Tensor
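The conv2d approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's source: the name gae_advantages_sketch is hypothetical, and the residual r_t + discount * b_(t+1) - b_t is the standard GAE temporal-difference term, with the baseline after the last step taken as 0.

```python
import torch
import torch.nn.functional as F


def gae_advantages_sketch(discount, gae_lambda, max_path_length, baselines, rewards):
    """Hypothetical re-implementation of the conv2d approach (not garage's code)."""
    # Filter [1, (discount * gae_lambda), (discount * gae_lambda)^2, ...].
    exponents = torch.arange(max_path_length, dtype=rewards.dtype)
    adv_filter = torch.pow(discount * gae_lambda, exponents).reshape(1, 1, 1, -1)

    # Residuals r_t + discount * b_{t+1} - b_t, with b after the last step set to 0.
    next_baselines = F.pad(baselines, (0, 1))[:, 1:]
    deltas = rewards + discount * next_baselines - baselines

    # Right-pad so every time step sees a full-length window, then add the
    # batch and channel dimensions that conv2d expects.
    deltas = F.pad(deltas, (0, max_path_length - 1)).unsqueeze(0).unsqueeze(0)

    # conv2d is a cross-correlation: out[n, t] = sum_k filter[k] * deltas[n, t + k].
    return F.conv2d(deltas, adv_filter).reshape(rewards.shape)


rewards = torch.tensor([[1.0, 1.0, 1.0]])
baselines = torch.zeros_like(rewards)
advantages = gae_advantages_sketch(0.5, 1.0, 3, baselines, rewards)
```

With a zero baseline and discount * gae_lambda = 0.5, each advantage is just the discounted sum of the remaining rewards, which makes the result easy to check by hand.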
-
dict_np_to_torch(array_dict)
Convert a dict whose values are numpy arrays to PyTorch tensors.
Modifies array_dict in place.
Parameters: array_dict (dict) – Dictionary of data in numpy arrays.
Returns: Dictionary of data in PyTorch tensors.
Return type: dict
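The in-place conversion can be illustrated with a short sketch. The helper name dict_np_to_torch_sketch is hypothetical; any dtype casting or device placement the library may perform is left out here.

```python
import numpy as np
import torch


def dict_np_to_torch_sketch(array_dict):
    """Hypothetical helper: replace each numpy value with a tensor, in place."""
    for key, value in array_dict.items():
        # Reassigning an existing key does not change the dict's size,
        # so it is safe while iterating.
        array_dict[key] = torch.from_numpy(value)
    return array_dict


batch = {
    "observations": np.zeros((2, 3), dtype=np.float32),
    "rewards": np.ones(2, dtype=np.float32),
}
converted = dict_np_to_torch_sketch(batch)
```

Because the dict is modified in place, converted and batch refer to the same object after the call.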
-
filter_valids(tensor, valids)
Filter a tensor using valids (the last indices of the valid tensors).
valids contains the last index of each row.
Parameters:
Returns: Filtered tensor.
Return type: torch.Tensor
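One plausible reading of this filtering is sketched below. Everything here is an assumption: the name filter_valids_sketch is hypothetical, valids[i] is taken to be the number of valid leading entries in row i, and the result is returned as a Python list of per-row tensors because the rows may end up with different lengths.

```python
import torch


def filter_valids_sketch(tensor, valids):
    """Hypothetical helper: keep the first valids[i] entries of row i."""
    return [tensor[i][:valid] for i, valid in enumerate(valids)]


# Two zero-padded rows with 2 and 1 valid entries respectively.
padded = torch.tensor([[1.0, 2.0, 0.0], [3.0, 0.0, 0.0]])
rows = filter_valids_sketch(padded, [2, 1])
```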
-
flatten_batch(tensor)
Flatten a batch of observations.
Reshape a tensor of size (X, Y, Z) into (X*Y, Z).
Parameters: tensor (torch.Tensor) – Tensor to flatten.
Returns: Flattened tensor.
Return type: torch.Tensor
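The reshape from (X, Y, Z) to (X*Y, Z) can be sketched as below. The helper name flatten_batch_sketch is hypothetical, and this is only one way to express the reshape.

```python
import torch


def flatten_batch_sketch(tensor):
    """Hypothetical helper: merge the first two dimensions of a tensor."""
    # -1 lets PyTorch infer X*Y; the trailing dimensions are kept unchanged.
    return tensor.reshape((-1,) + tuple(tensor.shape[2:]))


batch = torch.zeros(4, 5, 3)  # e.g. 4 episodes, 5 steps, 3 features
flat = flatten_batch_sketch(batch)
```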
-
global_device()
Returns the global device that torch.Tensors should be placed on.
Note: The global device is set by calling the function garage.torch._functions.set_gpu_mode. If this function is never called, garage.torch._functions.device() returns None.
Returns: The global device that newly created torch.Tensors should be placed on.
Return type: torch.device
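The behaviour in the note (None until a setter is called) follows a common module-level state pattern, sketched here. The names _STATE, set_device_sketch, and global_device_sketch are hypothetical and do not reproduce the library's functions or their signatures.

```python
import torch

# Hypothetical module-level state; no device is configured initially.
_STATE = {"device": None}


def set_device_sketch(name):
    """Record the device that new tensors should be placed on."""
    _STATE["device"] = torch.device(name)


def global_device_sketch():
    """Return the configured device, or None if none was set."""
    return _STATE["device"]


before = global_device_sketch()
set_device_sketch("cpu")
after = global_device_sketch()

# Newly created tensors can then be placed on the shared device.
x = torch.zeros(2, device=after)
```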
-
pad_to_last(nums, total_length, axis=-1, val=0)
Pad nums with val at the end of the given axis.
The length of the result along the given axis should be total_length.
Raises: IndexError – If the input axis value is out of range of the nums array.
Parameters:
Returns: Padded array.
Return type: torch.Tensor
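Padding at the end of one axis can be sketched with torch.nn.functional.pad. The helper name pad_to_last_sketch is hypothetical, and the conversion of the input to a float32 tensor is an assumption made for this illustration.

```python
import torch
import torch.nn.functional as F


def pad_to_last_sketch(nums, total_length, axis=-1, val=0):
    """Hypothetical helper: right-pad nums along axis up to total_length."""
    tensor = torch.as_tensor(nums, dtype=torch.float32)
    ndim = tensor.dim()
    if not -ndim <= axis < ndim:
        raise IndexError(
            f"axis {axis} is out of range for shape {tuple(tensor.shape)}")
    axis = axis % ndim  # normalize negative axes

    # F.pad takes (left, right) pairs starting from the LAST dimension,
    # so the right-hand slot for `axis` sits at index 2 * (ndim - 1 - axis) + 1.
    padding = [0, 0] * ndim
    padding[2 * (ndim - 1 - axis) + 1] = max(total_length - tensor.shape[axis], 0)
    return F.pad(tensor, padding, value=val)


padded = pad_to_last_sketch([1.0, 2.0, 3.0], total_length=5)
padded_rows = pad_to_last_sketch(torch.ones(2, 3), total_length=4, axis=0)
```

In this sketch an input that is already at least total_length long along the axis is returned without padding.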
-
product_of_gaussians(mus, sigmas_squared)
Compute mu, sigma of product of gaussians.
Parameters: - mus (torch.Tensor) – Means, with shape (N, M). M is the number of mean values.
- sigmas_squared (torch.Tensor) – Variances, with shape (N, V). V is the number of variance values.
Returns: - torch.Tensor: Mu of product of gaussians, with shape (N, 1).
- torch.Tensor: Sigma of product of gaussians, with shape (N, 1).
Return type: torch.Tensor
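For reference, multiplying Gaussian densities adds their precisions (inverse variances), and the combined mean is the precision-weighted average of the individual means. The sketch below applies that identity; the name product_of_gaussians_sketch is hypothetical, and reducing over the last dimension (to match the documented (N, 1) output shape) and returning the variance as the second value are assumptions.

```python
import torch


def product_of_gaussians_sketch(mus, sigmas_squared):
    """Hypothetical helper: combine Gaussian factors along the last dimension."""
    # Precisions add when Gaussian densities are multiplied.
    precision = torch.sum(1.0 / sigmas_squared, dim=-1, keepdim=True)
    sigma_squared = 1.0 / precision
    # The combined mean is the precision-weighted sum of means, rescaled.
    mu = sigma_squared * torch.sum(mus / sigmas_squared, dim=-1, keepdim=True)
    return mu, sigma_squared


# Two unit-variance factors centred at 0 and 2.
mu, sigma_squared = product_of_gaussians_sketch(
    torch.tensor([[0.0, 2.0]]), torch.tensor([[1.0, 1.0]]))
```

For two unit-variance factors the product is centred halfway between the means and has half the variance.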
-
torch_to_np(tensors)
Convert PyTorch tensors to numpy arrays.
Parameters: tensors (tuple) – Tuple of data in PyTorch tensors.
Returns: Tuple of data in numpy arrays.
Return type: tuple[numpy.ndarray]
Note: This method is deprecated and now replaced by garage.torch._functions.to_numpy.
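The tuple conversion can be sketched as follows. The helper name torch_to_np_sketch is hypothetical; detaching and moving to the CPU first are precautions added in this sketch so that tensors requiring gradients or living on a GPU also convert.

```python
import numpy as np
import torch


def torch_to_np_sketch(tensors):
    """Hypothetical helper: convert a tuple of tensors to numpy arrays."""
    return tuple(t.detach().cpu().numpy() for t in tensors)


arrays = torch_to_np_sketch((torch.ones(2), torch.zeros(3, 1)))
```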
-
update_module_params(module, new_params)
Load parameters to a module.
This function acts like torch.nn.Module._load_from_state_dict(), but it replaces the tensors in module with those in new_params, whereas _load_from_state_dict() loads only the values. Use this function so that the grad and grad_fn of new_params can be restored.
Parameters: - module (torch.nn.Module) – A torch module.
- new_params (dict) – A dict of torch tensors used as the new parameters of this module. This dict should be generated by torch.nn.Module.named_parameters().
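The difference between copying values and replacing tensor objects can be illustrated with a small sketch. The name replace_params_sketch is hypothetical, and the sketch relies on the private _parameters attribute of torch.nn.Module, which is an assumption about PyTorch internals rather than a documented interface.

```python
import torch
from torch import nn


def replace_params_sketch(module, new_params):
    """Hypothetical helper: swap in tensors from new_params, keeping grad_fn."""
    submodules = dict(module.named_modules())  # the root module has the key ''
    for name, tensor in new_params.items():
        owner_name, _, leaf = name.rpartition(".")
        # Writing to the private _parameters dict stores the tensor object
        # itself instead of copying its values into the existing parameter.
        submodules[owner_name]._parameters[leaf] = tensor


linear = nn.Linear(2, 1)
original = dict(linear.named_parameters())

# Non-leaf tensors that depend on the original parameters.
updated = {name: param * 2.0 for name, param in original.items()}
replace_params_sketch(linear, updated)

# Gradients flow through the replaced tensors back to the originals.
linear(torch.ones(1, 2)).sum().backward()
```

Because the replaced tensors keep their grad_fn, the backward pass reaches the original parameters. This is the property that meta-learning style updates depend on.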
Subpackages
- garage.torch.algos package
- Submodules
- garage.torch.algos.ddpg module
- garage.torch.algos.maml module
- garage.torch.algos.maml_ppo module
- garage.torch.algos.maml_trpo module
- garage.torch.algos.maml_vpg module
- garage.torch.algos.mtsac module
- garage.torch.algos.pearl module
- garage.torch.algos.ppo module
- garage.torch.algos.sac module
- garage.torch.algos.trpo module
- garage.torch.algos.vpg module
- Submodules
- garage.torch.distributions package
- garage.torch.embeddings package
- garage.torch.modules package
- garage.torch.optimizers package
- garage.torch.policies package
- garage.torch.q_functions package
- garage.torch.value_functions package