garage.tf._functions
Utility functions for tf-based reinforcement learning algorithms.

compile_function(inputs, outputs)
Compiles a TensorFlow function using the current session.

get_target_ops(variables, target_variables, tau=None)
Get the update operations for the target variables.
In RL algorithms we often update the target network every n steps. This function returns the tf.Operation for updating the target variables (denoted by target_var) from the variables (denoted by var) with fraction tau. In other words, each update sets target_var to tau * var + (1 - tau) * target_var, keeping a fraction tau of var and a fraction (1 - tau) of the old target_var.
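The soft update rule can be checked numerically. The sketch below is a NumPy analogue, not the garage code: plain arrays stand in for the TensorFlow variables, the helper soft_update is hypothetical, and it returns the new target values instead of building tf.Operation objects.

```python
import numpy as np


def soft_update(variables, target_variables, tau):
    """Hypothetical NumPy analogue of the soft target update.

    Each new target is tau * var + (1 - tau) * target_var.
    """
    assert len(variables) == len(target_variables)
    return [tau * var + (1.0 - tau) * target_var
            for var, target_var in zip(variables, target_variables)]


online = [np.array([1.0, 2.0]), np.array([[3.0]])]
target = [np.zeros(2), np.zeros((1, 1))]

# With tau = 0.1, each target moves 10% of the way toward the online weights.
target = soft_update(online, target, tau=0.1)
print(target[0])  # [0.1 0.2]
```

With tau = 1 the update copies var into target_var outright (a hard update); small values such as 0.01 make the target network trail the online network slowly.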

flatten_batch(t, name='flatten_batch')
Flatten a batch of observations.
Reshape a tensor of shape (X, Y, Z) into (X*Y, Z).
 Parameters
t (tf.Tensor) – Tensor to flatten.
name (string) – Name of the operation.
 Returns
Flattened tensor.
 Return type
tf.Tensor

flatten_batch_dict(d, name='flatten_batch_dict')
Flatten a batch of observations represented as a dict.
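The reshape merges the first two axes and leaves the rest alone. A NumPy sketch of both helpers, assuming trailing dimensions are preserved and that the dict variant simply applies flatten_batch to every value (np.reshape stands in for tf.reshape, and the TensorFlow name argument is dropped):

```python
import numpy as np


def flatten_batch(t):
    # Merge the first two axes: (X, Y, Z, ...) -> (X*Y, Z, ...)
    return np.reshape(t, (-1,) + t.shape[2:])


def flatten_batch_dict(d):
    # Apply flatten_batch to every value in the dict
    return {k: flatten_batch(v) for k, v in d.items()}


obs = np.arange(24).reshape(2, 3, 4)  # e.g. 2 episodes, 3 steps, 4 features
flat = flatten_batch(obs)
print(flat.shape)  # (6, 4)
```

Row i * Y + j of the result is step j of episode i, so flat[3] is the first step of the second episode.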

filter_valids(t, valid, name='filter_valids')
Filter a tensor using a valid array, keeping only the entries marked valid.

filter_valids_dict(d, valid, name='filter_valids_dict')
Filter the valid values of each tensor in a dict.
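A typical use is dropping the zero padding that follows the end of each episode. The sketch below assumes valid is a 0/1 (or boolean) array with the same leading shape as the tensor and that entries flagged 1 are kept; NumPy boolean indexing stands in for the TensorFlow op (tf.boolean_mask would be one TensorFlow analogue).

```python
import numpy as np


def filter_valids(t, valid):
    # Keep only the entries whose valid flag is nonzero
    return t[np.asarray(valid).astype(bool)]


def filter_valids_dict(d, valid):
    # Apply the same mask to every value in the dict
    return {k: filter_valids(v, valid) for k, v in d.items()}


rewards = np.array([[1.0, 2.0, 0.0], [3.0, 0.0, 0.0]])
valid = np.array([[1, 1, 0], [1, 0, 0]])  # padding marked with 0
print(filter_valids(rewards, valid))  # [1. 2. 3.]
```

Masking a 2D array with a 2D mask returns a 1D result in row-major order, which matches the flattened (X*Y, ...) layout used by flatten_batch.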

graph_inputs(name, **kwargs)
Creates a namedtuple of the given keys and values.
 Parameters
name (string) – Name of the tuple.
kwargs (tf.Tensor) – One or more tensor(s) to add to the namedtuple’s values. The parameter names are used as keys in the namedtuple, for example obs1=tensor1, obs2=tensor2.
 Returns
namedtuple containing the collection of variables passed.
 Return type
namedtuple

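Because graph_inputs builds a namedtuple whose field names come from the keyword arguments, the idea can be shown with collections.namedtuple alone. This is a sketch under that assumption, with NumPy arrays standing in for tf.Tensor values; the type name PolicyInputs is made up for the example.

```python
from collections import namedtuple

import numpy as np


def graph_inputs(name, **kwargs):
    # The keyword names become the namedtuple's fields, in call order
    Inputs = namedtuple(name, kwargs.keys())
    return Inputs(**kwargs)


inputs = graph_inputs('PolicyInputs', obs=np.zeros((2, 3)), actions=np.zeros(2))
print(inputs._fields)    # ('obs', 'actions')
print(inputs.obs.shape)  # (2, 3)
```

Fields are then read by attribute (inputs.obs) rather than by string key, and the tuple cannot be reassigned after creation.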
flatten_inputs(deep)
Flattens a collections.abc.Iterable recursively.
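A recursive generator is one way to flatten arbitrarily nested iterables. This sketch treats str and bytes as leaves and descends into every other collections.abc.Iterable; that leaf rule is an assumption, and the garage function may also treat tensors or arrays as leaves. Note that NumPy arrays are iterable, so this version would split them into rows.

```python
from collections.abc import Iterable


def flatten_inputs(deep):
    """Recursively flatten nested iterables into a flat list."""
    def _flatten(items):
        for item in items:
            # Assumption: strings and bytes are leaves, not containers
            if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
                yield from _flatten(item)
            else:
                yield item
    return list(_flatten(deep))


print(flatten_inputs([1, [2, (3, 4)], [[5]]]))  # [1, 2, 3, 4, 5]
```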

flatten_tensor_variables(ts)
Flattens a list of tensors into a single, 1-dimensional tensor.
 Parameters
ts (collections.abc.Iterable) – Iterable containing either tf.Tensors or arrays.
 Returns
Flattened Tensor.
 Return type
tf.Tensor
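The operation amounts to reshaping each tensor to one dimension and concatenating the results. A NumPy sketch of that idea follows; a TensorFlow version would use tf.reshape and tf.concat in place of the NumPy calls.

```python
import numpy as np


def flatten_tensor_variables(ts):
    # Flatten each tensor to 1-D, then join them end to end
    return np.concatenate([np.reshape(t, [-1]) for t in ts])


weights = [np.ones((2, 2)), np.zeros(3)]
flat = flatten_tensor_variables(weights)
print(flat.shape)  # (7,)
```

Keeping the original shapes around lets the flat vector be split back into per-variable pieces later, which is useful when an optimizer works on one parameter vector.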

new_tensor(name, ndim, dtype)
Creates a placeholder tf.Tensor with the given name, number of dimensions (ndim), and dtype.

new_tensor_like(name, arr_like)
Creates a new placeholder tf.Tensor similar to arr_like. The new tf.Tensor has the same number of dimensions and dtype as arr_like.
 Parameters
name (string) – Name of the new tf.Tensor.
arr_like (tf.Tensor) – Tensor to copy attributes from.
 Returns
New placeholder tensor.
 Return type
tf.Tensor

concat_tensor_list(tensor_list)
Concatenates a list of tensors into one tensor.
 Parameters
tensor_list (list[ndarray]) – List of tensors.
 Returns
Concatenated tensor.
 Return type
ndarray

concat_tensor_dict_list(tensor_dict_list)
Concatenates a dict of tensor lists.
Each list of tensors is concatenated into one tensor.
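A NumPy sketch of both concatenation helpers. It assumes tensors are joined along the first axis and, following the parameter name, that tensor_dict_list is a list of dicts sharing the same keys, so the "list of tensors" for each key is gathered across those dicts; recursion into nested dicts is also an assumption. Unlike stacking, concatenation adds no new axis, so batches of different lengths can be merged.

```python
import numpy as np


def concat_tensor_list(tensor_list):
    # Join arrays along the first axis
    return np.concatenate(tensor_list, axis=0)


def concat_tensor_dict_list(tensor_dict_list):
    # Assumption: a list of dicts with identical keys; nested dicts recurse
    ret = {}
    for k, example in tensor_dict_list[0].items():
        values = [d[k] for d in tensor_dict_list]
        if isinstance(example, dict):
            ret[k] = concat_tensor_dict_list(values)
        else:
            ret[k] = concat_tensor_list(values)
    return ret


batch1 = {'obs': np.zeros((2, 3)), 'info': {'step': np.array([0, 1])}}
batch2 = {'obs': np.ones((4, 3)), 'info': {'step': np.array([0, 1, 2, 3])}}
merged = concat_tensor_dict_list([batch1, batch2])
print(merged['obs'].shape)     # (6, 3)
print(merged['info']['step'])  # [0 1 0 1 2 3]
```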

stack_tensor_dict_list(tensor_dict_list)
Stack a list of dictionaries of {tensors or dictionary of tensors}.

split_tensor_dict_list(tensor_dict)
Split a dictionary of {tensors or dictionary of tensors} into a list of dictionaries.
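Read together, the two helpers look like inverses: stacking turns a list of per-step dicts into one dict of arrays with a new leading axis, and splitting (which takes a single tensor_dict) breaks such a dict back into a list. The NumPy sketch below assumes exactly that behavior, including recursion into nested dicts; the keys obs and agent_info are illustrative.

```python
import numpy as np


def stack_tensor_dict_list(tensor_dict_list):
    # List of dicts -> dict of arrays with a new leading axis
    ret = {}
    for k, example in tensor_dict_list[0].items():
        values = [d[k] for d in tensor_dict_list]
        if isinstance(example, dict):
            ret[k] = stack_tensor_dict_list(values)
        else:
            ret[k] = np.array(values)
    return ret


def split_tensor_dict_list(tensor_dict):
    # Dict of stacked arrays -> list of dicts, one per leading index
    ret = None
    for k, vals in tensor_dict.items():
        if isinstance(vals, dict):
            vals = split_tensor_dict_list(vals)
        if ret is None:
            ret = [{k: v} for v in vals]
        else:
            for v, cur in zip(vals, ret):
                cur[k] = v
    return ret


steps = [{'obs': np.array([0.0, 1.0]), 'agent_info': {'prob': 0.3}},
         {'obs': np.array([2.0, 3.0]), 'agent_info': {'prob': 0.7}}]
stacked = stack_tensor_dict_list(steps)
print(stacked['obs'].shape)  # (2, 2)
restored = split_tensor_dict_list(stacked)
```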

pad_tensor(x, max_len)
Pad tensors with zeros.
 Parameters
x (numpy.ndarray) – Tensors to be padded.
max_len (int) – Maximum length.
 Returns
Padded tensor.
 Return type
numpy.ndarray

pad_tensor_n(xs, max_len)
Pad an array of tensors.
 Parameters
xs (numpy.ndarray) – Tensors to be padded.
max_len (int) – Maximum length.
 Returns
Padded tensor.
 Return type
numpy.ndarray
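Both padding helpers can be sketched in NumPy, assuming zeros are appended along the first (time) axis and that every input is at most max_len long; with a longer input, np.zeros receives a negative dimension and raises ValueError. Padding to a common length is what lets variable-length episodes be stored as one (N, T, ...) array.

```python
import numpy as np


def pad_tensor(x, max_len):
    # Append zero rows along the first axis until len(x) == max_len
    padding = np.zeros((max_len - len(x),) + x.shape[1:], dtype=x.dtype)
    return np.concatenate([x, padding])


def pad_tensor_n(xs, max_len):
    # Pad each sequence and stack them into shape (len(xs), max_len, ...)
    ret = np.zeros((len(xs), max_len) + xs[0].shape[1:], dtype=xs[0].dtype)
    for i, x in enumerate(xs):
        ret[i, :len(x)] = x
    return ret


print(pad_tensor(np.array([1, 2]), 4))  # [1 2 0 0]
padded = pad_tensor_n([np.array([1.0]), np.array([2.0, 3.0, 4.0])], max_len=3)
```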

pad_tensor_dict(tensor_dict, max_len)
Pad a dictionary of tensors with zeros.
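The dict variant presumably applies the same zero padding to each value. The sketch below assumes that, recurses into nested dicts, and repeats a NumPy pad_tensor so it runs on its own; the keys obs and agent_infos are illustrative.

```python
import numpy as np


def pad_tensor(x, max_len):
    # Append zero rows along the first axis up to max_len
    padding = np.zeros((max_len - len(x),) + x.shape[1:], dtype=x.dtype)
    return np.concatenate([x, padding])


def pad_tensor_dict(tensor_dict, max_len):
    # Pad every array, recursing into nested dicts
    return {
        k: pad_tensor_dict(v, max_len) if isinstance(v, dict) else pad_tensor(v, max_len)
        for k, v in tensor_dict.items()
    }


episode = {'obs': np.ones((2, 3)), 'agent_infos': {'prob': np.array([0.4, 0.6])}}
padded = pad_tensor_dict(episode, max_len=4)
print(padded['obs'].shape)  # (4, 3)
```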

compute_advantages(discount, gae_lambda, max_len, baselines, rewards, name='compute_advantages')
Calculate advantages.
Advantages are a discounted cumulative sum.
The discounted cumulative sum can be represented as an IIR filter on the reversed input vectors, i.e.
\(y[t] - \mathrm{discount} \cdot y[t+1] = x[t]\), or equivalently \(\mathrm{rev}(y)[t] - \mathrm{discount} \cdot \mathrm{rev}(y)[t-1] = \mathrm{rev}(x)[t]\).
Given the time-domain impulse response of the IIR filter, we can calculate the filter response to our signal by convolving the signal with that impulse response. The impulse response is calculated below as discount_filter:
\(\mathrm{discount\_filter} = [1, \mathrm{discount}, \mathrm{discount}^2, \ldots, \mathrm{discount}^{T-1}]\), where \(T\) is the maximum episode length (max_len).
We convolve discount_filter with the reversed time-domain signal deltas to calculate the reversed advantages:
\(\mathrm{rev}(\mathrm{advantages}) = \mathrm{discount\_filter} \ast \mathrm{rev}(\mathrm{deltas})\)
TensorFlow’s tf.nn.conv1d op is not a true convolution but a cross-correlation, so its input and output are already implicitly reversed for us:
advantages = discount_filter (tf.nn.conv1d) deltas
 Parameters
discount (float) – Discount factor.
gae_lambda (float) – Lambda, as used for Generalized Advantage Estimation (GAE).
max_len (int) – Maximum length of a single episode.
baselines (tf.Tensor) – A 2D vector of value function estimates with shape \((N, T)\), where \(N\) is the batch dimension (number of episodes) and \(T\) is the maximum episode length experienced by the agent.
rewards (tf.Tensor) – A 2D vector of per-step rewards with shape \((N, T)\), where \(N\) is the batch dimension (number of episodes) and \(T\) is the maximum episode length experienced by the agent.
name (string) – Name of the operation.
 Returns
A 2D vector of calculated advantage values with shape \((N, T)\), where \(N\) is the batch dimension (number of episodes) and \(T\) is the maximum episode length experienced by the agent.
 Return type
tf.Tensor

center_advs(advs, axes, eps, offset=0, scale=1, name='center_adv')
Normalize the advs tensor.
This calculates the mean and variance along the specified axes and normalizes the tensor using those values.
 Parameters
advs (tf.Tensor) – Tensor to normalize.
axes (array[int]) – Axes along which to compute the mean and variance.
eps (float) – Small number to avoid dividing by zero.
offset (tf.Tensor) – Offset added to the normalized tensor. This is zero by default.
scale (tf.Tensor) – Scale to apply to the normalized tensor. This is 1 by default but can also be None.
name (string) – Name of the operation. Defaults to 'center_adv'.
 Returns
Normalized, scaled and offset tensor.
 Return type
tf.Tensor
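A NumPy sketch of the normalization. It assumes the standard batch-normalization formula (advs - mean) * scale / sqrt(var + eps) + offset, the same formula tf.nn.batch_normalization applies, and it treats scale=None as no scaling; both are assumptions about this function rather than confirmed details.

```python
import numpy as np


def center_advs(advs, axes, eps, offset=0.0, scale=1.0):
    axes = tuple(axes)  # NumPy expects an int or a tuple of ints
    mean = np.mean(advs, axis=axes, keepdims=True)
    var = np.var(advs, axis=axes, keepdims=True)
    # Standardize; eps keeps the denominator away from zero
    normalized = (advs - mean) / np.sqrt(var + eps)
    # scale=None is treated as "no scaling"
    if scale is not None:
        normalized = normalized * scale
    return normalized + offset


advs = np.array([1.0, 2.0, 3.0, 4.0])
centered = center_advs(advs, axes=[0], eps=1e-8)
```

After centering, the advantages have (almost exactly) zero mean and unit standard deviation, so the policy-gradient step size no longer depends on the raw reward scale.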

positive_advs(advs, eps, name='positive_adv')
Make all the values in the advs tensor positive.
Shifts all values in advs by subtracting the minimum value in the tensor, then adds an epsilon value so that no value is zero, avoiding division by zero.
 Parameters
advs (tf.Tensor) – The tensor to offset.
eps (tf.float32) – A small value to avoid division by zero.
name (string) – Name of the operation.
 Returns
Tensor with modified (positive) values.
 Return type
tf.Tensor
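The shift can be sketched in NumPy: subtracting the minimum makes the smallest entry exactly 0, and adding eps lifts it strictly above 0. This is an illustration of the described arithmetic, not the TensorFlow implementation.

```python
import numpy as np


def positive_advs(advs, eps):
    # Smallest entry becomes eps; the relative order of values is unchanged
    return (advs - np.min(advs)) + eps


shifted = positive_advs(np.array([-2.0, 0.0, 3.0]), eps=1e-8)
```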

discounted_returns(discount, max_len, rewards, name='discounted_returns')
Calculate discounted returns.
 Parameters
discount (float) – Discount factor.
max_len (int) – Maximum length of a single episode.
rewards (tf.Tensor) – A 2D vector of per-step rewards with shape \((N, T)\), where \(N\) is the batch dimension (number of episodes) and \(T\) is the maximum episode length experienced by the agent.
name (string) – Name of the operation. Defaults to 'discounted_returns'.
 Returns
Tensor of discounted returns.
 Return type
tf.Tensor
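No formula is given here, so the sketch below assumes the standard definition G[t] = r[t] + discount * G[t+1], computed backwards over each row, with the row length T standing in for max_len. Zero-padded steps after an episode ends contribute nothing to the earlier returns.

```python
import numpy as np


def discounted_returns(discount, rewards):
    """Backward recursion over an (N, T) reward array."""
    returns = np.zeros_like(rewards, dtype=float)
    running = np.zeros(rewards.shape[0])
    for t in reversed(range(rewards.shape[1])):
        # G[t] = r[t] + discount * G[t + 1]
        running = rewards[:, t] + discount * running
        returns[:, t] = running
    return returns


rewards = np.array([[1.0, 1.0, 1.0], [2.0, 0.0, 0.0]])
returns = discounted_returns(0.5, rewards)
```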