garage.torch
PyTorch-backed modules and algorithms.
-
compute_advantages
(discount, gae_lambda, max_episode_length, baselines, rewards)
Calculate advantages.
Advantages are a discounted cumulative sum of temporal-difference residuals.
Calculate advantages using a baseline according to Generalized Advantage Estimation (GAE). For step t, the residual is delta_t = r_t + discount * b_(t+1) - b_t.
The discounted cumulative sum can be computed using conv2d with the filter
[1, (discount * gae_lambda), (discount * gae_lambda) ^ 2, …], whose length is max_episode_length.
- baselines and rewards have the same shape, with m = N episodes and n = T time steps:
baselines: [[b_11, b_12, b_13, …, b_1n],
            [b_21, b_22, b_23, …, b_2n],
            …,
            [b_m1, b_m2, b_m3, …, b_mn]]
rewards:   [[r_11, r_12, r_13, …, r_1n],
            [r_21, r_22, r_23, …, r_2n],
            …,
            [r_m1, r_m2, r_m3, …, r_mn]]
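A minimal NumPy sketch of the computation described above, assuming zero-padded (N, T) inputs and a baseline of 0 after the last time step. Instead of conv2d it takes, for each step, a dot product of the remaining residuals with the same [1, (discount * gae_lambda), ...] filter; gae_advantages_sketch is a hypothetical name, not part of garage.

```python
import numpy as np


def gae_advantages_sketch(discount, gae_lambda, baselines, rewards):
    """Hypothetical GAE sketch for zero-padded (N, T) arrays."""
    baselines = np.asarray(baselines, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    n, t = rewards.shape
    # b_(t+1): shift baselines left by one step; assume 0 past the last step.
    next_baselines = np.concatenate([baselines[:, 1:], np.zeros((n, 1))], axis=1)
    # TD residuals: delta_t = r_t + discount * b_(t+1) - b_t
    deltas = rewards + discount * next_baselines - baselines
    # Filter [1, (discount * gae_lambda), (discount * gae_lambda) ** 2, ...]
    adv_filter = (discount * gae_lambda) ** np.arange(t)
    advantages = np.empty_like(deltas)
    for i in range(t):
        # A_i = sum_k filter[k] * delta_(i + k) over the remaining steps
        advantages[:, i] = deltas[:, i:] @ adv_filter[: t - i]
    return advantages


# One episode of length 3 with zero baselines
adv = gae_advantages_sketch(0.5, 1.0, [[0.0, 0.0, 0.0]], [[1.0, 1.0, 1.0]])
print(adv.tolist())  # [[1.75, 1.5, 1.0]]
```

The per-step loop does the same O(T^2) work as the convolution; for long episodes the equivalent backward recursion A_t = delta_t + discount * gae_lambda * A_(t+1) is the usual cheaper alternative.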
- Parameters
discount (float) – RL discount factor (i.e. gamma).
gae_lambda (float) – Lambda, as used for Generalized Advantage Estimation (GAE).
max_episode_length (int) – Maximum length of a single episode.
baselines (torch.Tensor) – A 2D vector of value function estimates with shape (N, T), where N is the batch dimension (number of episodes) and T is the maximum episode length experienced by the agent. If an episode terminates in fewer than T time steps, the remaining elements in that episode should be set to 0.
rewards (torch.Tensor) – A 2D vector of per-step rewards with shape (N, T), where N is the batch dimension (number of episodes) and T is the maximum episode length experienced by the agent. If an episode terminates in fewer than T time steps, the remaining elements in that episode should be set to 0.
- Returns
- A 2D vector of calculated advantage values with shape (N, T), where N is the batch dimension (number of episodes) and T is the maximum episode length experienced by the agent. If an episode terminates in fewer than T time steps, the remaining values in that episode should be set to 0.
- Return type
torch.Tensor
-
dict_np_to_torch
(array_dict)
Convert a dict whose values are numpy arrays to PyTorch tensors.
Modifies array_dict in place.
-
filter_valids
(tensor, valids)
Filter out tensor using valids (last index of valid tensors).
valids contains the last index of each row.
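A minimal NumPy sketch of per-row filtering. "Last index" is ambiguous, so this sketch assumes each entry of valids is an exclusive slice end (the number of valid elements in that row); if it is inclusive, slice with valid + 1 instead. filter_valids_sketch is a hypothetical name.

```python
import numpy as np


def filter_valids_sketch(tensor, valids):
    """Hypothetical sketch: keep tensor[i][:valids[i]] for each row i."""
    # Rows can end up with different lengths, so a list is returned.
    return [tensor[i][:valid] for i, valid in enumerate(valids)]


batch = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
rows = filter_valids_sketch(batch, [2, 1])
print([row.tolist() for row in rows])  # [[0, 1], [3]]
```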
-
flatten_batch
(tensor)
Flatten a batch of observations.
Reshape a tensor of size (X, Y, Z) into (X*Y, Z).
- Parameters
tensor (torch.Tensor) – Tensor to flatten.
- Returns
Flattened tensor.
- Return type
torch.Tensor
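A minimal sketch of the (X, Y, Z) to (X*Y, Z) reshape, shown on a NumPy array; the same call pattern is expected to work on a torch.Tensor, but that is an assumption here. flatten_batch_sketch is a hypothetical name.

```python
import numpy as np


def flatten_batch_sketch(array):
    """Hypothetical sketch: reshape (X, Y, Z) into (X * Y, Z)."""
    # Merge the first two axes; -1 lets NumPy infer X * Y.
    return array.reshape((-1,) + array.shape[2:])


obs = np.zeros((4, 5, 3))  # e.g. 4 episodes x 5 time steps x 3 features
print(flatten_batch_sketch(obs).shape)  # (20, 3)
```

Rows keep their order: row y of batch entry x ends up at index x * Y + y.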
-
flatten_to_single_vector
(tensor)
Collapse the C x H x W values per representation into a single long vector.
Reshape a tensor of size (N, C, H, W) into (N, C * H * W).
- Parameters
tensor (torch.Tensor) – Batch of data.
- Returns
Reshaped view of that data (analogous to numpy.reshape).
- Return type
torch.Tensor
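A minimal NumPy sketch of collapsing (N, C, H, W) into (N, C * H * W) while keeping the batch axis. For a contiguous array the result is a view, not a copy. flatten_to_single_vector_sketch is a hypothetical name.

```python
import numpy as np


def flatten_to_single_vector_sketch(array):
    """Hypothetical sketch: reshape (N, C, H, W) into (N, C * H * W)."""
    # Keep the batch axis and let NumPy infer the product of the rest.
    return array.reshape(array.shape[0], -1)


images = np.zeros((8, 3, 32, 32))
print(flatten_to_single_vector_sketch(images).shape)  # (8, 3072)
```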
-
global_device
()
Returns the global device that torch.Tensors should be placed on.
- Note: The global device is set by using the function garage.torch._functions.set_gpu_mode. If this function is never called, garage.torch._functions.device() returns None.
- Returns
- The global device that newly created torch.Tensors should be placed on.
- Return type
torch.device
-
class
NonLinearity
(non_linear)
Bases: torch.nn.Module
Wrapper class for a non-linear function or module.
- Parameters
non_linear (callable or type) – Non-linear function or type to be wrapped.
-
forward
(self, input_value)
Forward method.
- Parameters
input_value (torch.Tensor) – Input values.
- Returns
Output value
- Return type
torch.Tensor
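A plain-Python sketch of one way a wrapper can accept either a callable or a type: a type is instantiated once, a plain function is used as-is. It is not garage's class (no torch.nn.Module, no tensors), and NonLinearitySketch and Square are hypothetical names; in the real wrapper, input_value would be a torch.Tensor.

```python
import math


class NonLinearitySketch:
    """Hypothetical stand-in for the wrapper's dispatch, without torch."""

    def __init__(self, non_linear):
        if isinstance(non_linear, type):
            # A type (class), e.g. torch.nn.ReLU, is instantiated once.
            self.module = non_linear()
        elif callable(non_linear):
            # A plain function, e.g. torch.tanh, is called directly.
            self.module = non_linear
        else:
            raise ValueError(f"{non_linear!r} is neither callable nor a type")

    def forward(self, input_value):
        # Delegate to the wrapped function or module instance.
        return self.module(input_value)


class Square:
    def __call__(self, x):
        return x * x


print(NonLinearitySketch(Square).forward(3))  # 9
print(NonLinearitySketch(math.tanh).forward(0.0))  # 0.0
```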
-
np_to_torch
(array)
Convert numpy arrays to PyTorch tensors.
- Parameters
array (np.ndarray) – Data in a numpy array.
- Returns
Float tensor on the global device.
- Return type
torch.Tensor
-
pad_to_last
(nums, total_length, axis=-1, val=0)
Pad val to the end of nums along the given axis.
The length of the result along the given axis is total_length.
- Raises
IndexError – If the input axis value is out of range of the nums array.
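- Parameters
nums – Array to pad.
total_length (int) – Length of the result along axis.
axis (int) – Axis along which to pad.
val – Value used for padding.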
- Returns
Padded array
- Return type
torch.Tensor
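A minimal NumPy sketch of end-padding along one axis with numpy.pad, raising IndexError for an out-of-range axis as described above. It returns a NumPy array rather than a torch.Tensor, and if total_length is smaller than the current length, numpy.pad rejects the negative width with a ValueError. pad_to_last_sketch is a hypothetical name.

```python
import numpy as np


def pad_to_last_sketch(nums, total_length, axis=-1, val=0):
    """Hypothetical sketch: pad val after nums along axis up to total_length."""
    arr = np.asarray(nums)
    if not -arr.ndim <= axis < arr.ndim:
        raise IndexError(f"axis {axis} is out of range for a {arr.ndim}-D array")
    axis %= arr.ndim
    # Pad nothing before, and (total_length - current length) after, on axis only.
    pad_width = [(0, 0)] * arr.ndim
    pad_width[axis] = (0, total_length - arr.shape[axis])
    return np.pad(arr, pad_width, mode="constant", constant_values=val)


print(pad_to_last_sketch([1, 2, 3], total_length=5).tolist())  # [1, 2, 3, 0, 0]
print(pad_to_last_sketch([[1], [2]], total_length=3, axis=0, val=-1).tolist())
# [[1], [2], [-1]]
```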
-
product_of_gaussians
(mus, sigmas_squared)
Compute mu and sigma of the product of Gaussians.
- Parameters
mus (torch.Tensor) – Means, with shape (N, M). M is the number of mean values.
sigmas_squared (torch.Tensor) – Variances, with shape (N, V). V is the number of variance values.
- Returns
torch.Tensor: Mu of the product of Gaussians, with shape (N, 1).
torch.Tensor: Sigma of the product of Gaussians, with shape (N, 1).
- Return type
tuple[torch.Tensor, torch.Tensor]
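A minimal NumPy sketch of the standard precision-weighted product of Gaussians: sigma^2 = 1 / sum_i(1 / sigma_i^2) and mu = sigma^2 * sum_i(mu_i / sigma_i^2). It reduces along axis 1 with keepdims to match the (N, 1) shapes stated above, assumes strictly positive variances, and returns the variance; whether "Sigma" above means variance or standard deviation is not stated, so treat that as an assumption. product_of_gaussians_sketch is a hypothetical name.

```python
import numpy as np


def product_of_gaussians_sketch(mus, sigmas_squared, axis=1):
    """Hypothetical sketch of the precision-weighted product of Gaussians."""
    mus = np.asarray(mus, dtype=float)
    sigmas_squared = np.asarray(sigmas_squared, dtype=float)
    # Precisions (1 / variance) add when Gaussian densities are multiplied.
    precision = np.sum(1.0 / sigmas_squared, axis=axis, keepdims=True)
    sigma_squared = 1.0 / precision
    # The product's mean is the precision-weighted combination of input means.
    mu = sigma_squared * np.sum(mus / sigmas_squared, axis=axis, keepdims=True)
    return mu, sigma_squared


mu, var = product_of_gaussians_sketch([[0.0, 2.0]], [[1.0, 1.0]])
print(mu.tolist(), var.tolist())  # [[1.0]] [[0.5]]
```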
-
set_gpu_mode
(mode, gpu_id=0)
Set GPU mode and device ID.
-
torch_to_np
(tensors)
Convert PyTorch tensors to numpy arrays.
- Parameters
tensors (tuple) – Tuple of data in PyTorch tensors.
- Returns
Tuple of data in numpy arrays.
- Return type
tuple[numpy.ndarray]
- Note: This method is deprecated and now replaced by garage.torch._functions.to_numpy.
-
class
TransposeImage
(env)
Bases: garage.Wrapper
Transpose the observation space for image observations in PyTorch.
- Transpose the input observation shape from (H, W, C) into (C, H, W), the PyTorch format.
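A minimal NumPy sketch of the per-observation (H, W, C) to (C, H, W) transform. It covers only the array axes; the wrapper's observation_space would need the same axis permutation, which is not shown. transpose_image_sketch is a hypothetical name.

```python
import numpy as np


def transpose_image_sketch(observation):
    """Hypothetical sketch: move channels first, (H, W, C) -> (C, H, W)."""
    # Axis 2 (channels) becomes axis 0; height and width keep their order.
    return np.transpose(observation, (2, 0, 1))


frame = np.zeros((84, 84, 3), dtype=np.uint8)  # H x W x C, e.g. an RGB frame
print(transpose_image_sketch(frame).shape)  # (3, 84, 84)
```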
-
property
observation_space
(self)
akro.Space: The observation space specification.
-
property
spec
(self)
EnvSpec: The environment specification.
-
step
(self, action)
Step the wrapped env.
- Parameters
action (np.ndarray) – An action provided by the agent.
- Returns
The environment step resulting from the action.
- Return type
EnvStep
-
property
action_space
(self)
akro.Space: The action space specification.
-
property
render_modes
(self)
list: A list of strings representing the supported render modes.
-
reset
(self)
Reset the wrapped env.
- Returns
- numpy.ndarray: The first observation conforming to observation_space.
- dict: The episode-level information.
Note that this is not part of env_info provided in step(). It contains information about the entire episode, which could be needed to determine the first action (e.g. in the case of goal-conditioned or multi-task RL).
- Return type
tuple[numpy.ndarray, dict]
-
render
(self, mode)
Render the wrapped environment.
-
visualize
(self)
Creates a visualization of the wrapped environment.
-
close
(self)
Close the wrapped env.
-
property
unwrapped
(self)
garage.Environment: The inner environment.
-
update_module_params
(module, new_params)
Load parameters into a module.
This function acts like torch.nn.Module._load_from_state_dict(), but it replaces the tensors in module with those in new_params, while _load_from_state_dict() loads only the values. Use this function so that the grad and grad_fn of new_params can be restored.
- Parameters
module (torch.nn.Module) – A torch module.
new_params (dict) – A dict of torch tensors used as the new parameters of this module. This parameters dict should be generated by torch.nn.Module.named_parameters().