# garage.np

Reinforcement Learning Algorithms which use NumPy as a numerical backend.

concat_tensor_dict_list(tensor_dict_list)

Concatenate a list of dictionaries of tensors.

Parameters

tensor_dict_list (list[dict]) – A list of dictionaries of {tensors or dictionary of tensors}.

Returns

A dictionary of {concatenated tensors or dictionary of concatenated tensors}.

Return type

dict
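
The behaviour described above can be illustrated with a minimal sketch. It is not garage's implementation: the helper name `concat_dicts_sketch` is hypothetical, and it assumes every dictionary in the list shares the same keys and that arrays are joined along their first axis.

```python
import numpy as np


def concat_dicts_sketch(tensor_dict_list):
    """Concatenate a list of (possibly nested) dicts of arrays along axis 0.

    Hypothetical helper for illustration only.
    """
    result = {}
    for key in tensor_dict_list[0].keys():
        values = [d[key] for d in tensor_dict_list]
        if isinstance(values[0], dict):
            # Nested dictionaries are handled recursively.
            result[key] = concat_dicts_sketch(values)
        else:
            result[key] = np.concatenate(values, axis=0)
    return result


batches = [
    {"obs": np.zeros((2, 3)), "info": {"t": np.array([0, 1])}},
    {"obs": np.ones((1, 3)), "info": {"t": np.array([2])}},
]
merged = concat_dicts_sketch(batches)
print(merged["obs"].shape)            # (3, 3)
print(merged["info"]["t"].tolist())   # [0, 1, 2]
```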

discount_cumsum(x, discount)

Discounted cumulative sum.

See https://docs.scipy.org/doc/scipy/reference/tutorial/signal.html#difference-equation-filtering for background on difference-equation filtering.

Here, we have y[t] - discount*y[t+1] = x[t], or equivalently rev(y)[t] - discount*rev(y)[t-1] = rev(x)[t], where rev reverses the sequence.

Parameters
• x (np.ndarray) – Input.

• discount (float) – Discount factor.

Returns

Discounted cumulative sum.

Return type

np.ndarray
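
The recurrence y[t] = x[t] + discount*y[t+1] can be evaluated directly by scanning the input backwards. The sketch below does this with a plain loop instead of a SciPy filter; the name `discount_cumsum_sketch` is hypothetical and the code is only meant to make the recurrence concrete.

```python
import numpy as np


def discount_cumsum_sketch(x, discount):
    """Compute y[t] = x[t] + discount * y[t + 1] by scanning backwards.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    running = 0.0
    for t in reversed(range(len(x))):
        running = x[t] + discount * running
        y[t] = running
    return y


rewards = np.array([1.0, 1.0, 1.0])
returns = discount_cumsum_sketch(rewards, 0.5)
print(returns.tolist())  # [1.75, 1.5, 1.0]
```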

explained_variance_1d(ypred, y, valids=None)

Explained variation for 1D inputs.

It is the proportion of the variance in one variable that is explained or predicted from another variable.

Parameters
• ypred (np.ndarray) – Sample data from the first variable. Shape: $$(N, max_episode_length)$$.

• y (np.ndarray) – Sample data from the second variable. Shape: $$(N, max_episode_length)$$.

• valids (np.ndarray) – Optional. Array indicating valid indices. If None, the entire input array is assumed to be valid. Shape: $$(N, max_episode_length)$$.

Returns

The explained variance.

Return type

float
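
As a rough illustration, the sketch below uses the common definition 1 - Var(y - ypred) / Var(y) and applies the optional mask first. The name `explained_variance_sketch` is hypothetical, and returning NaN when Var(y) is zero is an assumption of this sketch; the library may treat that case differently.

```python
import numpy as np


def explained_variance_sketch(ypred, y, valids=None):
    """Return 1 - Var(y - ypred) / Var(y) over the valid entries.

    Hypothetical helper for illustration only.
    """
    ypred = np.asarray(ypred, dtype=float)
    y = np.asarray(y, dtype=float)
    if valids is not None:
        # Keep only the entries flagged as valid.
        mask = np.asarray(valids).astype(bool)
        ypred, y = ypred[mask], y[mask]
    var_y = np.var(y)
    if np.isclose(var_y, 0.0):
        # Assumption of this sketch: the ratio is undefined, so return NaN.
        return float("nan")
    return float(1.0 - np.var(y - ypred) / var_y)


y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = explained_variance_sketch(y, y)
constant = explained_variance_sketch(np.full(4, 2.5), y)
masked = explained_variance_sketch([1.0, 2.0, 9.0, 9.0], y, valids=[1, 1, 0, 0])
print(perfect, constant, masked)  # 1.0 0.0 1.0
```

A perfect prediction scores 1, while predicting a constant explains none of the variance and scores 0.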

flatten_tensors(tensors)

Flatten a list of tensors.

Parameters

tensors (list[numpy.ndarray]) – List of tensors to be flattened.

Returns

Flattened tensors.

Return type

numpy.ndarray

Example:

>>> flatten_tensors([np.ndarray([1]), np.ndarray([1])])
array(...)
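
A more concrete illustration follows. It is a sketch that assumes flattening means reshaping each array to one dimension and concatenating the results in list order; the name `flatten_tensors_sketch` is hypothetical.

```python
import numpy as np


def flatten_tensors_sketch(tensors):
    """Reshape each array to 1-D and concatenate them in order.

    Hypothetical helper for illustration only.
    """
    if len(tensors) == 0:
        return np.asarray([])
    return np.concatenate([np.reshape(t, [-1]) for t in tensors])


weights = np.arange(6).reshape(2, 3)
bias = np.array([10, 20])
flat = flatten_tensors_sketch([weights, bias])
print(flat.tolist())  # [0, 1, 2, 3, 4, 5, 10, 20]
```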

pad_tensor(x, max_len, mode='zero')

Parameters
• x (numpy.ndarray) – Tensors to be padded.

• max_len (int) – Maximum length.

• mode (str) – If ‘last’, pad with the last element, otherwise pad with 0.

Returns

Return type

numpy.ndarray
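
The two padding modes can be sketched as follows. This is an illustration under stated assumptions, not the library code: the name `pad_tensor_sketch` is hypothetical, padding is assumed to happen along the first axis, and inputs already at least max_len long are returned unchanged.

```python
import numpy as np


def pad_tensor_sketch(x, max_len, mode="zero"):
    """Pad x along axis 0 up to max_len entries.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x)
    pad_rows = max_len - len(x)
    if pad_rows <= 0:
        # Assumption of this sketch: nothing to pad, return the input as is.
        return x
    if mode == "last":
        filler = x[-1:]                   # repeat the last element
    else:
        filler = np.zeros_like(x[:1])     # pad with zeros
    padding = np.repeat(filler, pad_rows, axis=0)
    return np.concatenate([x, padding], axis=0)


x = np.array([1, 2, 3])
zero_padded = pad_tensor_sketch(x, 5)
last_padded = pad_tensor_sketch(x, 5, mode="last")
print(zero_padded.tolist())  # [1, 2, 3, 0, 0]
print(last_padded.tolist())  # [1, 2, 3, 3, 3]
```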

pad_tensor_dict(tensor_dict, max_len, mode='zero')

Parameters
• tensor_dict (dict[numpy.ndarray]) – Tensors to be padded.

• max_len (int) – Maximum length.

• mode (str) – If ‘last’, pad with the last element, otherwise pad with 0.

Returns

Return type

dict[numpy.ndarray]

pad_tensor_n(xs, max_len)

Parameters
• xs (numpy.ndarray) – Tensors to be padded.

• max_len (int) – Maximum length.

Returns

Return type

numpy.ndarray

paths_to_tensors(paths, max_episode_length, baseline_predictions, discount)

Return processed sample data based on the collected paths.

Parameters
• paths (list[dict]) – A list of collected paths.

• max_episode_length (int) – Maximum length of a single episode.

• baseline_predictions (numpy.ndarray) – Predicted value of GAE (Generalized Advantage Estimation) Baseline.

• discount (float) – Environment reward discount.

Returns

Processed sample data, with keys:

• observations (numpy.ndarray): Padded array of the observations of the environment.

• actions (numpy.ndarray): Padded array of the actions fed to the environment.

• rewards (numpy.ndarray): Padded array of the acquired rewards.

• agent_infos (dict): A dictionary of {stacked tensors or dictionary of stacked tensors}.

• env_infos (dict): A dictionary of {stacked tensors or dictionary of stacked tensors}.

• valids (numpy.ndarray): Padded array of the validity information.

Return type

dict

rrse(actual, predicted)

Root Relative Squared Error.

Parameters
• actual (np.ndarray) – The actual value.

• predicted (np.ndarray) – The predicted value.

Returns

The root relative squared error between the actual and the predicted value.

Return type

float
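
For reference, root relative squared error is usually defined as the square root of the summed squared prediction error divided by the summed squared deviation of the actual values from their mean. The sketch below follows that textbook definition; the name `rrse_sketch` is hypothetical and the code is not taken from the library.

```python
import numpy as np


def rrse_sketch(actual, predicted):
    """Return sqrt(sum((actual - predicted)^2) / sum((actual - mean(actual))^2)).

    Hypothetical helper for illustration only.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    squared_error = np.sum(np.square(actual - predicted))
    baseline_error = np.sum(np.square(actual - np.mean(actual)))
    return float(np.sqrt(squared_error / baseline_error))


actual = np.array([1.0, 2.0, 3.0])
exact = rrse_sketch(actual, actual)
mean_only = rrse_sketch(actual, np.full(3, 2.0))
print(exact, mean_only)  # 0.0 1.0
```

A value of 1 means the prediction is no better than always predicting the mean of the actual values.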

slice_nested_dict(dict_or_array, start, stop)

Slice a dictionary containing arrays (or dictionaries).

This function is primarily intended for un-batching env_infos and action_infos.

Parameters
• dict_or_array (dict[str, dict or np.ndarray] or np.ndarray) – A nested dictionary should only contain dictionaries and numpy arrays (recursively).

• start (int) – First index to be included in the slice.

• stop (int) – First index to be excluded from the slice. In other words, these are typical python slice indices.

Returns

The input, but sliced.

Return type

dict or np.ndarray
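
The recursive slicing described above can be sketched in a few lines. The name `slice_nested_sketch` and the example keys are hypothetical; the sketch assumes, as stated for the parameter, that the input contains only dictionaries and arrays.

```python
import numpy as np


def slice_nested_sketch(dict_or_array, start, stop):
    """Apply [start:stop] to every array in a nested dictionary.

    Hypothetical helper for illustration only.
    """
    if isinstance(dict_or_array, dict):
        return {
            key: slice_nested_sketch(value, start, stop)
            for key, value in dict_or_array.items()
        }
    # Leaf case: an array, sliced with ordinary Python slice semantics.
    return dict_or_array[start:stop]


env_infos = {
    "success": np.array([0, 0, 1, 1]),
    "pose": {"x": np.array([0.1, 0.2, 0.3, 0.4])},
}
episode = slice_nested_sketch(env_infos, 1, 3)
print(episode["success"].tolist())    # [0, 1]
print(episode["pose"]["x"].tolist())  # [0.2, 0.3]
```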

sliding_window(t, window, smear=False)

Create a sliding window over a tensor.

Parameters
• t (np.ndarray) – A tensor to create the sliding windows from, with shape $$(N, D)$$, where N is the length of a trajectory and D is the dimension of each step in the trajectory.

• window (int) – Window size, must be less than N.

• smear (bool) – If true, copy the last window so that N windows are generated.

Returns

All windows generated over t, with shape $$(M, W, D)$$, where W is the window size. If smear is False, M is $$N-W+1$$, otherwise M is N.

Return type

np.ndarray
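
The shapes above can be checked with a small sketch. It is illustrative only: the name `sliding_window_sketch` is hypothetical, a step size of 1 is assumed, and raising ValueError for an oversized window is a choice of this sketch, not a documented behaviour.

```python
import numpy as np


def sliding_window_sketch(t, window, smear=False):
    """Return all length-`window` slices of t, optionally smeared to N windows.

    Hypothetical helper for illustration only.
    """
    t = np.asarray(t)
    n = t.shape[0]
    if window > n:
        # Assumption of this sketch: reject windows longer than the input.
        raise ValueError("window must not exceed the trajectory length")
    windows = np.stack([t[i:i + window] for i in range(n - window + 1)])
    if smear:
        # Repeat the last window W - 1 times so that M equals N.
        extra = np.repeat(windows[-1:], window - 1, axis=0)
        windows = np.concatenate([windows, extra], axis=0)
    return windows


trajectory = np.arange(8).reshape(4, 2)   # N = 4, D = 2
plain = sliding_window_sketch(trajectory, 2)
smeared = sliding_window_sketch(trajectory, 2, smear=True)
print(plain.shape)    # (3, 2, 2)
print(smeared.shape)  # (4, 2, 2)
```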

Raises

stack_and_pad_tensor_dict_list(tensor_dict_list, max_len)

Stack and pad a list of dictionaries of tensors.

Input paths are a list of N dicts, each with values of shape $$(D, S^*)$$. This function pads the values stored under each key to max_len and stacks them, so the output has shape $$(N, D, S^*)$$.

Parameters
• tensor_dict_list (list[dict]) – List of dicts to be stacked and padded. Each value in a dict has shape $$(D, S^*)$$.

• max_len (int) – Maximum length for padding.

Returns

A dictionary of {stacked tensors or dictionary of stacked tensors}. Shape: $$(N, D, S^*)$$, where N is the length of the input paths.

Return type

dict

stack_tensor_dict_list(tensor_dict_list)

Stack a list of dictionaries of {tensors or dictionary of tensors}.

Parameters

tensor_dict_list (list[dict]) – A list of dictionaries of {tensors or dictionary of tensors}.

Returns

A dictionary of {stacked tensors or dictionary of stacked tensors}.

Return type

dict
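
Stacking differs from concatenation in that it adds a new leading axis of length N, one entry per input dictionary. The sketch below illustrates this under the assumption that all dictionaries share the same keys and that matching arrays have equal shapes; `stack_dicts_sketch` and the example keys are hypothetical.

```python
import numpy as np


def stack_dicts_sketch(tensor_dict_list):
    """Stack a list of (possibly nested) dicts of arrays along a new axis 0.

    Hypothetical helper for illustration only.
    """
    result = {}
    for key in tensor_dict_list[0].keys():
        values = [d[key] for d in tensor_dict_list]
        if isinstance(values[0], dict):
            # Nested dictionaries are handled recursively.
            result[key] = stack_dicts_sketch(values)
        else:
            result[key] = np.stack(values)
    return result


paths = [
    {"rewards": np.array([1.0, 0.0]), "env_infos": {"success": np.array([0, 1])}},
    {"rewards": np.array([0.5, 0.5]), "env_infos": {"success": np.array([1, 1])}},
]
stacked = stack_dicts_sketch(paths)
print(stacked["rewards"].shape)                   # (2, 2)
print(stacked["env_infos"]["success"].tolist())   # [[0, 1], [1, 1]]
```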

truncate_tensor_dict(tensor_dict, truncated_len)

Truncate a dictionary of tensors.

Parameters
• tensor_dict (dict[numpy.ndarray]) – A dictionary of {tensors or dictionary of tensors}.

• truncated_len (int) – Length to truncate.

Returns

A dictionary of {truncated tensors or dictionary of truncated tensors}.

Return type

dict

unflatten_tensors(flattened, tensor_shapes)

Unflatten a flattened tensor into a list of tensors.

Parameters
• flattened (numpy.ndarray) – Flattened tensors.

• tensor_shapes (tuple) – Tensor shapes.

Returns

Unflattened list of tensors.

Return type

list[numpy.ndarray]
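
One way to picture unflattening is to split the 1-D array at the cumulative sizes implied by the shapes and reshape each piece. The sketch below does exactly that; the name `unflatten_tensors_sketch` is hypothetical, and the code assumes the flattened length equals the total number of elements across all shapes.

```python
import numpy as np


def unflatten_tensors_sketch(flattened, tensor_shapes):
    """Split a 1-D array into pieces and reshape each to its target shape.

    Hypothetical helper for illustration only.
    """
    sizes = [int(np.prod(shape)) for shape in tensor_shapes]
    # Split points are the cumulative sizes, excluding the final total.
    split_points = np.cumsum(sizes)[:-1]
    chunks = np.split(flattened, split_points)
    return [np.reshape(chunk, shape)
            for chunk, shape in zip(chunks, tensor_shapes)]


flat = np.arange(8)
weights, bias = unflatten_tensors_sketch(flat, [(2, 3), (2,)])
print(weights.tolist())  # [[0, 1, 2], [3, 4, 5]]
print(bias.tolist())     # [6, 7]
```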