garage.torch package

PyTorch-backed modules and algorithms.

compute_advantages(discount, gae_lambda, max_path_length, baselines, rewards)

Calculate advantages.

Advantages are computed from a baseline according to Generalized Advantage Estimation (GAE), as a discounted cumulative sum.

The discounted cumulative sum can be computed using conv2d with the filter

[1, (discount * gae_lambda), (discount * gae_lambda) ^ 2, …]

whose length is equal to max_path_length. baselines and rewards must have the same shape:

baselines: [ [b_11, b_12, b_13, … b_1n],
             [b_21, b_22, b_23, … b_2n],
             …
             [b_m1, b_m2, b_m3, … b_mn] ]

rewards: [ [r_11, r_12, r_13, … r_1n],
           [r_21, r_22, r_23, … r_2n],
           …
           [r_m1, r_m2, r_m3, … r_mn] ]
Parameters:
  • discount (float) – RL discount factor (i.e. gamma).
  • gae_lambda (float) – Lambda, as used for Generalized Advantage Estimation (GAE).
  • max_path_length (int) – Maximum length of a single rollout.
  • baselines (torch.Tensor) – A 2D tensor of value function estimates with shape (N, T), where N is the batch dimension (number of episodes) and T is the maximum path length experienced by the agent. If an episode terminates in fewer than T time steps, the remaining elements in that episode should be set to 0.
  • rewards (torch.Tensor) – A 2D tensor of per-step rewards with shape (N, T), where N is the batch dimension (number of episodes) and T is the maximum path length experienced by the agent. If an episode terminates in fewer than T time steps, the remaining elements in that episode should be set to 0.
Returns:

A 2D tensor of calculated advantage values with shape (N, T), where N is the batch dimension (number of episodes) and T is the maximum path length experienced by the agent. If an episode terminates in fewer than T time steps, the remaining values in that episode are set to 0.

Return type:

torch.Tensor
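The filter description above can be checked against a small plain-Python reference. This is only a sketch of the arithmetic, not the library's conv2d implementation: the helper name `gae_advantages` is hypothetical, it works on nested lists rather than tensors, and it assumes the value estimate after the final step is 0.

```python
def gae_advantages(discount, gae_lambda, max_path_length, baselines, rewards):
    """Reference GAE on zero-padded (N, T) nested lists (illustrative only)."""
    # Filter weights: [1, (discount * gae_lambda), (discount * gae_lambda) ^ 2, ...]
    weights = [(discount * gae_lambda) ** k for k in range(max_path_length)]
    advantages = []
    for values, rews in zip(baselines, rewards):
        # Assumption: the value estimate after the last step is 0.
        next_values = list(values[1:]) + [0.0]
        # One-step residuals: r_t + discount * V(s_{t+1}) - V(s_t)
        deltas = [r + discount * nv - v
                  for r, nv, v in zip(rews, next_values, values)]
        # Each advantage is the filter applied to the residuals from t onward.
        advantages.append([
            sum(w * d for w, d in zip(weights, deltas[t:]))
            for t in range(len(deltas))
        ])
    return advantages


# One episode of length 2, zero-padded to max_path_length = 3.
example = gae_advantages(0.5, 0.5, 3,
                         baselines=[[1.0, 1.0, 0.0]],
                         rewards=[[1.0, 2.0, 0.0]])
```

In this example the padded final step has a zero residual, so its advantage is also 0, which matches the padding convention described for the return value.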

dict_np_to_torch(array_dict)

Convert a dict whose values are numpy arrays to PyTorch tensors.

Modifies array_dict in place.

Parameters: array_dict (dict) – Dictionary of data in numpy arrays.
Returns: Dictionary of data in PyTorch tensors.
Return type: dict
filter_valids(tensor, valids)

Filter out the padded part of tensor using valids.

valids contains the valid length of each row, i.e. the index just past the last valid value.

Parameters:
  • tensor (torch.Tensor) – The tensor to filter
  • valids (list[int]) – The number of valid values in each row.
Returns:

Filtered Tensor

Return type:

torch.Tensor
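One way to read this description is as trimming each zero-padded row back to its valid length. The sketch below is an assumption about that behavior, not the library's code: the helper name `keep_valid_prefixes` is hypothetical, and it returns a list of per-row tensors because rows may end up with different lengths.

```python
import torch


def keep_valid_prefixes(tensor, valids):
    """Trim each row of a padded 2D tensor to its valid length (illustrative)."""
    # Iterating over a 2D tensor yields its rows one at a time.
    return [row[:length] for row, length in zip(tensor, valids)]


padded = torch.tensor([[1.0, 2.0, 0.0],
                       [3.0, 4.0, 5.0]])
trimmed = keep_valid_prefixes(padded, [2, 3])
```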

flatten_batch(tensor)

Flatten a batch of observations.

Reshape a tensor of size (X, Y, Z) into (X*Y, Z)

Parameters: tensor (torch.Tensor) – Tensor to flatten.
Returns: Flattened tensor.
Return type: torch.Tensor
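The (X, Y, Z) to (X*Y, Z) reshape can be illustrated with plain PyTorch. This is a minimal sketch with a hypothetical helper name, `flatten_first_two_dims`, and is not taken from the library source.

```python
import torch


def flatten_first_two_dims(tensor):
    """Merge the first two dimensions and keep the rest (illustrative)."""
    return tensor.reshape((-1,) + tuple(tensor.shape[2:]))


batch = torch.arange(24).reshape(2, 3, 4)  # (X, Y, Z) = (2, 3, 4)
flat = flatten_first_two_dims(batch)       # shape (6, 4)
```

Row order is preserved: row 3 of the flattened tensor is element [1, 0] of the original batch.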
global_device()

Returns the global device that torch.Tensors should be placed on.

Note: The global device is set by calling garage.torch._functions.set_gpu_mode. If that function is never called, garage.torch._functions.device() returns None.
Returns: The global device that newly created torch.Tensors should be placed on.
Return type: torch.device
pad_to_last(nums, total_length, axis=-1, val=0)

Pad nums with val at the end of the given axis.

The length of the result along the given axis is total_length.

Raises:

IndexError – If the input axis value is out of range of the nums array

Parameters:
  • nums (numpy.ndarray) – The array to pad.
  • total_length (int) – The final width of the array.
  • axis (int) – Axis along which to pad.
  • val (int) – The value used for padding.
Returns:

Padded array

Return type:

torch.Tensor
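The documented behavior (pad the end of one axis with val until it reaches total_length) can be sketched with torch.nn.functional.pad. The helper name `pad_axis_to_length` is hypothetical, and the code follows the description above rather than the library source.

```python
import torch
import torch.nn.functional as F


def pad_axis_to_length(nums, total_length, axis=-1, val=0):
    """Pad the end of `axis` with `val` up to `total_length` (illustrative)."""
    tensor = torch.as_tensor(nums)
    ndim = tensor.dim()
    if not -ndim <= axis < ndim:
        raise IndexError(f'axis {axis} is out of range for {ndim} dimensions')
    axis = axis % ndim
    # F.pad expects (left, right) pairs starting from the LAST dimension.
    pad = [0, 0] * ndim
    pad[2 * (ndim - 1 - axis) + 1] = max(total_length - tensor.shape[axis], 0)
    return F.pad(tensor, pad, value=val)


data = [[1.0, 2.0], [3.0, 4.0]]
padded_cols = pad_axis_to_length(data, 4)                   # pad last axis with 0
padded_rows = pad_axis_to_length(data, 3, axis=0, val=-1)   # pad first axis with -1
```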

product_of_gaussians(mus, sigmas_squared)

Compute the mu and sigma of a product of Gaussians.

Parameters:
  • mus (torch.Tensor) – Means, with shape \((N, M)\). M is the number of mean values.
  • sigmas_squared (torch.Tensor) – Variances, with shape \((N, V)\). V is the number of variance values.
Returns:

  • torch.Tensor – Mu of product of gaussians, with shape \((N, 1)\).
  • torch.Tensor – Sigma of product of gaussians, with shape \((N, 1)\).

Return type:

torch.Tensor
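For reference, the standard closed form for a product of one-dimensional Gaussian densities adds the precisions (inverse variances) and takes a precision-weighted mean. The sketch below works on plain Python lists for a single set of factors. The helper name `product_of_gaussians_1d` is hypothetical, and it returns the variance; whether the library returns sigma or sigma squared is not stated here.

```python
def product_of_gaussians_1d(mus, sigmas_squared):
    """Mean and variance of a product of 1D Gaussians (illustrative)."""
    # Precisions (inverse variances) add under multiplication of densities.
    precision = sum(1.0 / s for s in sigmas_squared)
    sigma_squared = 1.0 / precision
    # The mean is the precision-weighted average of the input means.
    mu = sigma_squared * sum(m / s for m, s in zip(mus, sigmas_squared))
    return mu, sigma_squared


mu, sigma_squared = product_of_gaussians_1d([0.0, 2.0], [1.0, 1.0])
```

With two unit-variance factors centered at 0 and 2, the product is centered halfway between them and has half the variance.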

set_gpu_mode(mode, gpu_id=0)

Set GPU mode and device ID.

Parameters:
  • mode (bool) – Whether or not to use GPU
  • gpu_id (int) – GPU ID
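The relationship between setting the GPU mode and reading the global device can be pictured as a module-level variable that stays None until it is configured. The following is a hedged sketch of that pattern with hypothetical names (`_DEVICE`, `set_device_mode`, `current_device`); it is not the library's implementation.

```python
import torch

_DEVICE = None  # hypothetical module-level state; None until configured


def set_device_mode(use_gpu, gpu_id=0):
    """Record which device new tensors should be placed on (illustrative)."""
    global _DEVICE
    _DEVICE = torch.device(f'cuda:{gpu_id}' if use_gpu else 'cpu')


def current_device():
    """Return the configured device, or None if it was never set."""
    return _DEVICE


before = current_device()          # never configured yet
set_device_mode(False)
cpu_device = current_device()
on_cpu = torch.zeros(2, device=cpu_device)
set_device_mode(True, gpu_id=1)    # only builds a device object, no allocation
gpu_device = current_device()
```

Callers would typically pass the returned device to tensor constructors or to `.to(...)`, and should handle the None case if the mode was never set.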
torch_to_np(tensors)

Convert PyTorch tensors to numpy arrays.

Parameters: tensors (tuple) – Tuple of data in PyTorch tensors.
Returns: Tuple of data in numpy arrays.
Return type: tuple[numpy.ndarray]
Note: This function is deprecated and has been replaced by garage.torch._functions.to_numpy.
update_module_params(module, new_params)

Load parameters to a module.

This function acts like torch.nn.Module._load_from_state_dict(), but it replaces the tensors in module with those in new_params, whereas _load_from_state_dict() only copies their values. Use this function so that the grad and grad_fn of new_params are preserved.

Parameters:
  • module (torch.nn.Module) – A torch module.
  • new_params (dict) – A dict of torch tensors used as the new parameters of this module. This dict should be generated by torch.nn.Module.named_parameters().
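The difference between copying values and replacing tensors can be seen in a small experiment. This is a sketch under stated assumptions, not the library's implementation: it swaps tensors by writing to the module's private `_parameters` dict, and it uses the public load_state_dict() for the value-copy side.

```python
import torch
from torch import nn

layer = nn.Linear(2, 1)
original = dict(layer.named_parameters())

# New parameters computed differentiably from the originals (non-leaf tensors).
new_params = {name: 2.0 * param for name, param in original.items()}

# Copying values only: the target keeps its own leaf parameters, so grad_fn is lost.
value_copy = nn.Linear(2, 1)
value_copy.load_state_dict({k: v.detach() for k, v in new_params.items()})

# Replacing the tensors keeps the autograd link back to the originals.
# Assumption: writing to the private `_parameters` dict, for illustration only.
for name, tensor in new_params.items():
    layer._parameters[name] = tensor

loss = layer(torch.ones(1, 2)).sum()
loss.backward()  # gradients flow through new_params into the original tensors
```

After the backward pass, the original weight receives a gradient of 2 per entry, because the replaced weight is 2.0 times the original and the input is all ones. This is the kind of gradient flow that a value-only copy cannot provide.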

Subpackages