garage.tf.policies.categorical_lstm_policy

Categorical LSTM Policy.

A policy represented by a Categorical distribution that is parameterized by a long short-term memory (LSTM) network.

class CategoricalLSTMPolicy(env_spec, name='CategoricalLSTMPolicy', hidden_dim=32, hidden_nonlinearity=tf.nn.tanh, hidden_w_init=tf.initializers.glorot_uniform(seed=deterministic.get_tf_seed_stream()), hidden_b_init=tf.zeros_initializer(), recurrent_nonlinearity=tf.nn.sigmoid, recurrent_w_init=tf.initializers.glorot_uniform(seed=deterministic.get_tf_seed_stream()), output_nonlinearity=tf.nn.softmax, output_w_init=tf.initializers.glorot_uniform(seed=deterministic.get_tf_seed_stream()), output_b_init=tf.zeros_initializer(), hidden_state_init=tf.zeros_initializer(), hidden_state_init_trainable=False, cell_state_init=tf.zeros_initializer(), cell_state_init_trainable=False, state_include_action=True, forget_bias=True, layer_normalization=False)

Bases: garage.tf.models.CategoricalLSTMModel, garage.tf.policies.policy.Policy


Categorical LSTM Policy.

A policy represented by a Categorical distribution that is parameterized by a long short-term memory (LSTM) network.

It only works with an akro.Discrete action space.

Parameters
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.

  • name (str) – Policy name, also the variable scope.

  • hidden_dim (int) – Hidden dimension for LSTM cell.

  • hidden_nonlinearity (callable) – Activation function for intermediate dense layer(s). It should return a tf.Tensor. Set it to None to maintain a linear activation.

  • hidden_w_init (callable) – Initializer function for the weight of intermediate dense layer(s). The function should return a tf.Tensor.

  • hidden_b_init (callable) – Initializer function for the bias of intermediate dense layer(s). The function should return a tf.Tensor.

  • recurrent_nonlinearity (callable) – Activation function for recurrent layers. It should return a tf.Tensor. Set it to None to maintain a linear activation.

  • recurrent_w_init (callable) – Initializer function for the weight of recurrent layer(s). The function should return a tf.Tensor.

  • output_nonlinearity (callable) – Activation function for output dense layer. It should return a tf.Tensor. Set it to None to maintain a linear activation.

  • output_w_init (callable) – Initializer function for the weight of output dense layer(s). The function should return a tf.Tensor.

  • output_b_init (callable) – Initializer function for the bias of output dense layer(s). The function should return a tf.Tensor.

  • hidden_state_init (callable) – Initializer function for the initial hidden state. The function should return a tf.Tensor.

  • hidden_state_init_trainable (bool) – Bool for whether the initial hidden state is trainable.

  • cell_state_init (callable) – Initializer function for the initial cell state. The function should return a tf.Tensor.

  • cell_state_init_trainable (bool) – Bool for whether the initial cell state is trainable.

  • state_include_action (bool) – Whether the state includes action. If True, input dimension will be (observation dimension + action dimension).

  • forget_bias (bool) – If True, add 1 to the bias of the forget gate at initialization. It’s used to reduce the scale of forgetting at the beginning of the training.

  • layer_normalization (bool) – Bool for using layer normalization or not.
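
A minimal construction sketch (hedged: GymEnv is the Gym wrapper in recent garage releases, while older releases use GarageEnv; the policy is normally built and trained by an on-policy algorithm such as TRPO or PPO inside a TFTrainer/LocalTFRunner context, which also manages the TensorFlow session):

    from garage.envs import GymEnv
    from garage.tf.policies import CategoricalLSTMPolicy

    # Any environment with an akro.Discrete action space works; CartPole is
    # used here purely for illustration.
    env = GymEnv('CartPole-v1')

    policy = CategoricalLSTMPolicy(env_spec=env.spec,
                                   name='CategoricalLSTMPolicy',
                                   hidden_dim=32,
                                   state_include_action=True)

The policy itself does not own a TensorFlow session; sampling with get_action typically only works after the network has been built and a default session is active.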

build(self, state_input, name=None)

Build policy.

Parameters
  • state_input (tf.Tensor) – State input.

  • name (str) – Name of the policy, which is also the name scope.

Returns

  • tfp.distributions.OneHotCategorical – Policy distribution.

  • tf.Tensor – Step output, with shape \((N, S^*)\).

  • tf.Tensor – Step hidden state, with shape \((N, S^*)\).

  • tf.Tensor – Step cell state, with shape \((N, S^*)\).

  • tf.Tensor – Initial hidden state, used to reset the hidden state when the policy resets. Shape: \((S^*)\).

  • tf.Tensor – Initial cell state, used to reset the cell state when the policy resets. Shape: \((S^*)\).

Return type

tfp.distributions.OneHotCategorical

property input_dim(self)

int: Dimension of the policy input.

reset(self, do_resets=None)

Reset the policy.

Note

If do_resets is None, it defaults to np.array([True]), which implies the policy is not “vectorized”, i.e. the number of parallel environments used for sampling training data is 1.

Parameters

do_resets (numpy.ndarray) – Bool that indicates terminal state(s).
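
A hedged sketch of how do_resets is typically used with vectorized sampling (the flags and the policy object from the construction example above are illustrative; one boolean per parallel environment):

    import numpy as np

    # Single environment: equivalent to the default np.array([True]).
    policy.reset()

    # Eight parallel environments: reset every recurrent state at the start
    # of a sampling batch.
    policy.reset(do_resets=np.array([True] * 8))

    # Later, only environment 1 reached a terminal state, so only its hidden
    # and cell states (and previous action, if state_include_action=True)
    # are re-initialized.
    policy.reset(do_resets=np.array(
        [False, True, False, False, False, False, False, False]))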

get_action(self, observation)

Return a single action.

Parameters

observation (numpy.ndarray) – Observations.

Returns

Action given the input observation, along with a dict(numpy.ndarray) of distribution parameters.

Return type

int

get_actions(self, observations)

Return multiple actions.

Parameters

observations (numpy.ndarray) – Observations.

Returns

Actions given the input observations, along with a dict(numpy.ndarray) of distribution parameters.

Return type

list[int]
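
A hedged sampling sketch, reusing env and policy from the construction example above. It assumes the policy has already been built (normally done when the algorithm is set up under the trainer) and that a default TensorFlow session is active:

    import numpy as np

    # `obs` stands in for an observation from the environment; a dummy array
    # with the right flat dimension is used purely for illustration.
    obs = np.zeros(env.spec.observation_space.flat_dim)

    policy.reset()

    # One observation in, one sampled action out, plus the distribution
    # parameters (and the previous action when state_include_action=True).
    action, agent_info = policy.get_action(obs)

    # Batched variant for vectorized sampling: one action per observation.
    actions, agent_infos = policy.get_actions(np.stack([obs, obs]))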

property state_info_specs(self)

State info specification.

Returns

Keys and shapes for the information related to the policy’s state when taking an action.

Return type

List[str]

property env_spec(self)

Policy environment specification.

Returns

Environment specification.

Return type

garage.EnvSpec

clone(self, name)

Return a clone of the policy.

It copies the configuration of the primitive and also the parameters.

Parameters

name (str) – Name of the newly created policy. It has to be different from source policy if cloned under the same computational graph.

Returns

Newly cloned policy.

Return type

garage.tf.policies.CategoricalLSTMPolicy
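
A short hedged sketch of cloning (the clone's name is illustrative; it must differ from the source policy's name when both live in the same graph):

    # Same configuration and the same parameter values, under a new
    # variable scope.
    policy_copy = policy.clone(name='CategoricalLSTMPolicyCopy')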

network_output_spec(self)

Network output spec.

Returns

Name of the model outputs, in order.

Return type

list[str]

network_input_spec(self)

Network input spec.

Returns

List of key(str) for the network inputs.

Return type

list[str]

property parameters(self)

Parameters of the model.

Returns

Parameters

Return type

np.ndarray

property name(self)

Name (str) of the model.

This is also the variable scope of the model.

Returns

Name of the model.

Return type

str

property input(self)

Default input of the model.

When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the input of the network.

Returns

Default input of the model.

Return type

tf.Tensor

property output(self)

Default output of the model.

When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the output of the network.

Returns

Default output of the model.

Return type

tf.Tensor

property inputs(self)

Default inputs of the model.

When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the inputs of the network.

Returns

Default inputs of the model.

Return type

list[tf.Tensor]

property outputs(self)

Default outputs of the model.

When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the outputs of the network.

Returns

Default outputs of the model.

Return type

list[tf.Tensor]

property state_info_keys(self)

State info keys.

Returns

Keys for the information related to the module’s state when taking an input.

Return type

List[str]

terminate(self)

Clean up operation.

get_trainable_vars(self)

Get trainable variables.

Returns

A list of trainable variables in the current variable scope.

Return type

List[tf.Variable]

get_global_vars(self)

Get global variables.

Returns

A list of global variables in the current variable scope.

Return type

List[tf.Variable]

get_regularizable_vars(self)

Get all network weight variables in the current scope.

Returns

A list of network weight variables in the current variable scope.

Return type

List[tf.Variable]

get_params(self)

Get the trainable variables.

Returns

A list of trainable variables in the current variable scope.

Return type

List[tf.Variable]

get_param_shapes(self)

Get parameter shapes.

Returns

A list of variable shapes.

Return type

List[tuple]

get_param_values(self)

Get param values.

Returns

Values of the parameters evaluated in the current session.

Return type

np.ndarray

set_param_values(self, param_values)

Set param values.

Parameters

param_values (np.ndarray) – A numpy array of parameter values.

flat_to_params(self, flattened_params)

Unflatten tensors according to their respective shapes.

Parameters

flattened_params (np.ndarray) – A numpy array of flattened params.

Returns

A list of parameters reshaped to the shapes specified.

Return type

List[np.ndarray]
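
A hedged sketch of moving parameters around via the flattened representation, for example to re-synchronize the clone from the cloning example above with the learner policy:

    # Flattened parameter vector evaluated in the current session.
    values = policy.get_param_values()

    # Write the same values back into another policy with identical shapes.
    policy_copy.set_param_values(values)

    # The flat vector can also be unflattened into per-variable arrays
    # matching get_param_shapes().
    arrays = policy.flat_to_params(values)
    assert len(arrays) == len(policy.get_param_shapes())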

property observation_space(self)

Observation space.

Returns

The observation space of the environment.

Return type

akro.Space

property action_space(self)

Action space.

Returns

The action space of the environment.

Return type

akro.Space