garage.tf

TensorFlow Branch.

center_advs(advs, axes, eps, offset=0, scale=1, name='center_adv')

Normalize the advs tensor.

This calculates the mean and variance using the axes specified and normalizes the tensor using those values.

Parameters
  • advs (tf.Tensor) – Tensor to normalize.

  • axes (array[int]) – Axes along which to compute the mean and variance.

  • eps (float) – Small number to avoid dividing by zero.

  • offset (tf.Tensor) – Offset added to the normalized tensor. This is zero by default.

  • scale (tf.Tensor) – Scale to apply to the normalized tensor. This is 1 by default but can also be None.

  • name (string) – Name of the operation. Defaults to 'center_adv'.

Returns

Normalized, scaled and offset tensor.

Return type

tf.Tensor
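
As an illustration of the arithmetic described above, here is a minimal plain-Python sketch for a 1-D list. It assumes the normalization is (x - mean) / sqrt(variance + eps), followed by scale and offset. The helper name center_advs_sketch is hypothetical, and this is not garage's TensorFlow implementation.

```python
import math


def center_advs_sketch(advs, eps, offset=0.0, scale=1.0):
    """Normalize a 1-D list of advantages (illustrative only)."""
    mean = sum(advs) / len(advs)
    # Population variance: the mean of the squared deviations.
    var = sum((a - mean) ** 2 for a in advs) / len(advs)
    # Assumption: eps is added under the square root to avoid dividing by zero.
    return [(a - mean) / math.sqrt(var + eps) * scale + offset for a in advs]


centered = center_advs_sketch([1.0, 2.0, 3.0], eps=1e-8)
```

The centered values have (approximately) zero mean and unit variance before scale and offset are applied.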

compile_function(inputs, outputs)

Compiles a tensorflow function using the current session.

Parameters
  • inputs (list[tf.Tensor]) – Inputs to the function. Can be a list of inputs or just one.

  • outputs (list[tf.Tensor]) – Outputs of the function. Can be a list of outputs or just one.

Returns

Compiled TensorFlow function.

Return type

Callable

compute_advantages(discount, gae_lambda, max_len, baselines, rewards, name='compute_advantages')

Calculate advantages.

Advantages are a discounted cumulative sum.

The discounted cumulative sum can be represented as an IIR filter on the reversed input vectors, i.e.

y[t] - discount*y[t+1] = x[t], or rev(y)[t] - discount*rev(y)[t-1] = rev(x)[t]

Given the time-domain impulse response of the IIR filter, we can calculate the filter's response to our signal by convolving the signal with that impulse response. The impulse response is calculated below as discount_filter:

discount_filter = [1, discount, discount^2, …, discount^(N-1)], where the episode length is N.

We convolve discount_filter with the reversed time-domain signal deltas to calculate the reversed advantages:

rev(advantages) = discount_filter (X) rev(deltas)

TensorFlow’s tf.nn.conv1d op is not a true convolution, but actually a cross-correlation, so its input and output are already implicitly reversed for us.

advantages = discount_filter (tf.nn.conv1d) deltas

Parameters
  • discount (float) – Discount factor.

  • gae_lambda (float) – Lambda, as used for Generalized Advantage Estimation (GAE).

  • max_len (int) – Maximum length of a single episode.

  • baselines (tf.Tensor) – A 2D vector of value function estimates with shape \((N, T)\), where \(N\) is the batch dimension (number of episodes) and \(T\) is the maximum episode length experienced by the agent.

  • rewards (tf.Tensor) – A 2D vector of per-step rewards with shape \((N, T)\), where \(N\) is the batch dimension (number of episodes) and \(T\) is the maximum episode length experienced by the agent.

  • name (string) – Name of the operation.

Returns

A 2D vector of calculated advantage values with shape \((N, T)\), where \(N\) is the batch dimension (number of episodes) and \(T\) is the maximum episode length experienced by the agent.

Return type

tf.Tensor
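
The equivalence described above can be checked with a small plain-Python sketch. It computes the discounted cumulative sum of a sequence twice: once with the backward recursion y[t] = x[t] + discount * y[t+1], and once by cross-correlating the sequence with discount_filter. The function names and the sample deltas are hypothetical, and the sketch does not reproduce how garage builds deltas from rewards and baselines.

```python
def discount_cumsum_recursive(x, discount):
    """Solve y[t] - discount * y[t+1] = x[t] by stepping backwards in time."""
    y = [0.0] * len(x)
    running = 0.0
    for t in reversed(range(len(x))):
        running = x[t] + discount * running
        y[t] = running
    return y


def discount_cumsum_filter(x, discount):
    """Cross-correlate x with [1, discount, ..., discount^(N-1)]."""
    n = len(x)
    discount_filter = [discount ** k for k in range(n)]
    # y[t] = sum_k discount_filter[k] * x[t + k]; terms past the end are zero.
    return [
        sum(discount_filter[k] * x[t + k] for k in range(n - t))
        for t in range(n)
    ]


deltas = [1.0, 2.0, 3.0]
by_recursion = discount_cumsum_recursive(deltas, discount=0.5)
by_filter = discount_cumsum_filter(deltas, discount=0.5)
```

Both lists hold the same values, which is why a single cross-correlation can stand in for the explicit reverse, filter, reverse sequence.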

concat_tensor_dict_list(tensor_dict_list)

Concatenates a dict of tensor lists.

Each list of tensors gets concatenated into one tensor.

Parameters

tensor_dict_list (dict[list[ndarray]]) – Dict with lists of tensors.

Returns

A dict with the concatenated tensors.

Return type

dict[ndarray]

concat_tensor_list(tensor_list)

Concatenates a list of tensors into one tensor.

Parameters

tensor_list (list[ndarray]) – list of tensors.

Returns

Concatenated tensor.

Return type

ndarray

discounted_returns(discount, max_len, rewards, name='discounted_returns')

Calculate discounted returns.

Parameters
  • discount (float) – Discount factor.

  • max_len (int) – Maximum length of a single episode.

  • rewards (tf.Tensor) – A 2D vector of per-step rewards with shape \((N, T)\), where \(N\) is the batch dimension (number of episodes) and \(T\) is the maximum episode length experienced by the agent.

  • name (string) – Name of the operation. Defaults to 'discounted_returns'.

Returns

Tensor of discounted returns.

Return type

tf.Tensor

filter_valids(t, valid, name='filter_valids')

Filter a tensor using the valid array.

Parameters
  • t (tf.Tensor) – The tensor to filter.

  • valid (list[float]) – Array marking the valid values (each element is either 0 or 1).

  • name (string) – Name of the operation.

Returns

Filtered Tensor.

Return type

tf.Tensor
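
A minimal plain-Python sketch of the masking idea, assuming that entries whose valid flag is 1 are kept and the rest are dropped. The helper name filter_valids_sketch and the sample data are hypothetical; the real function operates on tf.Tensor objects.

```python
def filter_valids_sketch(t, valid):
    """Keep the entries of t whose matching flag in valid is 1."""
    return [value for value, keep in zip(t, valid) if keep == 1]


# Two real steps followed by two steps of zero padding.
rewards = [0.5, 1.0, 0.0, 0.0]
valid = [1.0, 1.0, 0.0, 0.0]
filtered = filter_valids_sketch(rewards, valid)
```

This kind of mask is typically used to discard the zero padding added to episodes shorter than the maximum length.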

filter_valids_dict(d, valid, name='filter_valids_dict')

Filter valid values on a dict.

Parameters
  • d (dict[tf.Tensor]) – Dict of tensors to be filtered.

  • valid (list[float]) – Array of length of the valid values (elements can be either 0 or 1).

  • name (string) – Name of the operation. Defaults to 'filter_valids_dict'.

Returns

Dict with filtered tensors.

Return type

dict[tf.Tensor]

flatten_batch(t, name='flatten_batch')

Flatten a batch of observations.

Reshape a tensor of size (X, Y, Z) into (X*Y, Z)

Parameters
  • t (tf.Tensor) – Tensor to flatten.

  • name (string) – Name of the operation.

Returns

Flattened tensor.

Return type

tf.Tensor
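
The reshape from (X, Y, Z) to (X*Y, Z) can be pictured with nested lists. This is only a sketch: the helper name flatten_batch_sketch is hypothetical, and the real function works on a tf.Tensor.

```python
def flatten_batch_sketch(t):
    """Merge the first two dimensions of a nested list of shape (X, Y, Z)."""
    return [step for episode in t for step in episode]


# Shape (2, 3, 2): 2 episodes, 3 steps each, 2 features per step.
batch = [
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8], [9, 10], [11, 12]],
]
flat = flatten_batch_sketch(batch)  # shape (6, 2)
```

The steps of the first episode come first, followed by the steps of the second, so the per-step feature vectors are left intact.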

flatten_batch_dict(d, name='flatten_batch_dict')

Flatten a batch of observations represented as a dict.

Parameters
  • d (dict[tf.Tensor]) – A dict of Tensors to flatten.

  • name (string) – The name of the operation. Defaults to 'flatten_batch_dict'.

Returns

A dict with flattened tensors.

Return type

dict[tf.Tensor]

flatten_inputs(deep)

Flattens a collections.abc.Iterable recursively.

Parameters

deep (Iterable) – An Iterable to flatten.

Returns

The flattened result.

Return type

list
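
A rough sketch of recursive flattening in plain Python. It assumes that strings and bytes are treated as leaves rather than iterated character by character; the helper name flatten_inputs_sketch is hypothetical, and the real function may treat additional types (such as tensors) as leaves.

```python
from collections.abc import Iterable


def flatten_inputs_sketch(deep):
    """Recursively flatten nested iterables into a single list."""

    def _flatten(items):
        for item in items:
            # Assumption: str and bytes are leaves, although they are iterable.
            if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
                yield from _flatten(item)
            else:
                yield item

    return list(_flatten(deep))


flat = flatten_inputs_sketch([1, [2, (3, 4)], [[5], 'obs']])
```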

flatten_tensor_variables(ts)

Flattens a list of tensors into a single, 1-dimensional tensor.

Parameters

ts (collections.abc.Iterable) – Iterable containing either tf.Tensors or arrays.

Returns

Flattened Tensor.

Return type

tf.Tensor

get_target_ops(variables, target_variables, tau=None)

Get target variables update operations.

In RL algorithms we often update the target network every n steps. This function returns the tf.Operation for updating target variables (denoted by target_var) from variables (denoted by var) with fraction tau. In other words, on each update we take tau of var, add (1 - tau) of target_var, and store the result in target_var.

Parameters
  • variables (list[tf.Variable]) – Source variables for update.

  • target_variables (list[tf.Variable]) – Target variables to be updated.

  • tau (float) – Fraction to update. Set it to be None for hard-update.

Returns

Operation for updating the target variables.

Return type

tf.Operation
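
The update rule can be sketched on plain floats. This assumes the soft update is target_var = tau * var + (1 - tau) * target_var, and that tau=None means a hard update that copies var directly. The helper name update_targets_sketch is hypothetical, and it returns new values rather than TensorFlow operations.

```python
def update_targets_sketch(variables, target_variables, tau=None):
    """Return the new target values after one update (illustrative only)."""
    if tau is None:
        # Hard update: the targets become copies of the source values.
        return list(variables)
    # Soft update: blend tau of the source with (1 - tau) of the old target.
    return [
        tau * var + (1.0 - tau) * target_var
        for var, target_var in zip(variables, target_variables)
    ]


soft = update_targets_sketch([1.0, 2.0], [0.0, 0.0], tau=0.1)
hard = update_targets_sketch([1.0, 2.0], [0.0, 0.0])
```

With a small tau the targets move only slightly toward the source values on each update, which is the usual reason for choosing a soft update.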

graph_inputs(name, **kwargs)

Creates a namedtuple of the given keys and values.

Parameters
  • name (string) – Name of the tuple.

  • kwargs (tf.Tensor) – One or more tensor(s) to add to the namedtuple’s values. The parameter names are used as keys in the namedtuple. Ex. obs1=tensor1, obs2=tensor2.

Returns

namedtuple containing the collection of variables passed.

Return type

tuple
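
The idea can be sketched with collections.namedtuple from the standard library. The helper name graph_inputs_sketch is hypothetical, and strings stand in for tensors so that the example runs without TensorFlow.

```python
from collections import namedtuple


def graph_inputs_sketch(name, **kwargs):
    """Build a namedtuple whose fields are the keyword names."""
    Inputs = namedtuple(name, list(kwargs.keys()))
    return Inputs(**kwargs)


# Placeholder strings are used here instead of tf.Tensor objects.
inputs = graph_inputs_sketch('PolicyInputs', obs1='tensor1', obs2='tensor2')
```

Fields can then be read by name, for example inputs.obs1, or unpacked positionally like a regular tuple.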

new_tensor(name, ndim, dtype)

Creates a placeholder tf.Tensor with the specified arguments.

Parameters
  • name (string) – Name of the tf.Tensor.

  • ndim (int) – Number of dimensions of the tf.Tensor.

  • dtype (type) – Data type of the tf.Tensor’s contents.

Returns

Placeholder tensor.

Return type

tf.Tensor

new_tensor_like(name, arr_like)

Creates a new placeholder tf.Tensor similar to arr_like.

The new tf.Tensor has the same number of dimensions and dtype as arr_like.

Parameters
  • name (string) – Name of the new tf.Tensor.

  • arr_like (tf.Tensor) – Tensor to copy attributes from.

Returns

New placeholder tensor.

Return type

tf.Tensor

pad_tensor(x, max_len)

Pad tensors with zeros.

Parameters
  • x (numpy.ndarray) – Tensors to be padded.

  • max_len (int) – Maximum length.

Returns

Padded tensor.

Return type

numpy.ndarray
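
A minimal sketch of zero padding on a plain list, assuming the zeros are appended at the end of the first (time) axis until the length reaches max_len. The helper name pad_tensor_sketch is hypothetical, and the real function operates on numpy.ndarray inputs.

```python
def pad_tensor_sketch(x, max_len):
    """Right-pad a list of per-step values with zeros up to max_len."""
    return list(x) + [0] * (max_len - len(x))


padded = pad_tensor_sketch([1, 2, 3], max_len=5)
```

Padding every episode to the same length is what allows a batch of episodes to be stored with shape (N, T).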

pad_tensor_dict(tensor_dict, max_len)

Pad dictionary of tensors with zeros.

Parameters
  • tensor_dict (dict[numpy.ndarray]) – Tensors to be padded.

  • max_len (int) – Maximum length.

Returns

Dict of padded tensors.

Return type

dict[numpy.ndarray]

pad_tensor_n(xs, max_len)

Pad array of tensors.

Parameters
  • xs (numpy.ndarray) – Tensors to be padded.

  • max_len (int) – Maximum length.

Returns

Padded tensor.

Return type

numpy.ndarray

paths_to_tensors(paths, max_episode_length, baseline_predictions, discount, gae_lambda)

Return processed sample data based on the collected paths.

Parameters
  • paths (list[dict]) – A list of collected paths.

  • max_episode_length (int) – Maximum length of a single episode.

  • baseline_predictions (numpy.ndarray) – Predicted value of GAE (Generalized Advantage Estimation) Baseline.

  • discount (float) – Environment reward discount.

  • gae_lambda (float) – Lambda used for generalized advantage estimation.

Returns

Processed sample data, with keys:
  • observations: (numpy.ndarray)

  • actions: (numpy.ndarray)

  • rewards: (numpy.ndarray)

  • baselines: (numpy.ndarray)

  • returns: (numpy.ndarray)

  • valids: (numpy.ndarray)

  • agent_infos: (dict)

  • env_infos: (dict)

  • paths: (list[dict])

Return type

dict

positive_advs(advs, eps, name='positive_adv')

Make all the values in the advs tensor positive.

Offsets all values in advs by the minimum value in the tensor, plus an epsilon value to avoid dividing by zero.

Parameters
  • advs (tf.Tensor) – The tensor to offset.

  • eps (tf.float32) – A small value to avoid division by zero.

  • name (string) – Name of the operation.

Returns

Tensor with modified (positive) values.

Return type

tf.Tensor
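
The offset can be sketched on a plain list, assuming it is computed as advs - min(advs) + eps. The helper name positive_advs_sketch is hypothetical, and this is not garage's TensorFlow implementation.

```python
def positive_advs_sketch(advs, eps):
    """Shift all values so that the smallest one equals eps."""
    lowest = min(advs)
    return [a - lowest + eps for a in advs]


shifted = positive_advs_sketch([-2.0, 0.0, 3.0], eps=1e-8)
```

The differences between values are preserved; only the origin moves, so the smallest advantage ends up at eps instead of zero.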

split_tensor_dict_list(tensor_dict)

Split a list of dictionaries of {tensors or dictionary of tensors}.

Parameters
  • tensor_dict (dict) – a list of dictionaries of {tensors or dictionary of tensors}.

Returns

a dictionary of {split tensors or dictionary of split tensors}.

Return type

dict

stack_tensor_dict_list(tensor_dict_list)

Stack a list of dictionaries of {tensors or dictionary of tensors}.

Parameters

tensor_dict_list (list[dict]) – a list of dictionaries of {tensors or dictionary of tensors}.

Returns

a dictionary of {stacked tensors or dictionary of stacked tensors}.

Return type

dict
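
The stacking behaviour can be sketched with plain Python containers. This assumes every dictionary in the list shares the same keys and that nested dictionaries are stacked recursively. The helper name stack_tensor_dict_list_sketch is hypothetical, and lists stand in for stacked arrays.

```python
def stack_tensor_dict_list_sketch(tensor_dict_list):
    """Turn a list of dicts into a dict of lists, recursing into sub-dicts."""
    stacked = {}
    for key, first in tensor_dict_list[0].items():
        values = [d[key] for d in tensor_dict_list]
        if isinstance(first, dict):
            # Nested dictionaries are stacked key by key.
            stacked[key] = stack_tensor_dict_list_sketch(values)
        else:
            # A plain list stands in for a stacked array here.
            stacked[key] = values
    return stacked


steps = [
    {'reward': 1.0, 'info': {'x': 0}},
    {'reward': 0.5, 'info': {'x': 1}},
]
stacked = stack_tensor_dict_list_sketch(steps)
```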