garage.tf.policies package

Policies for TensorFlow-based algorithms.

class Policy(name, env_spec)[source]

Bases: abc.ABC

Base class for Policies.

Parameters:
action_space

The action space for the environment.

Type:akro.Space
env_spec

Policy environment specification.

Type:garage.EnvSpec
flat_to_params(flattened_params, **tags)[source]

Unflatten tensors according to their respective shapes.

Parameters:
  • flattened_params (np.ndarray) – A numpy array of flattened params.
  • tags (dict) – A map specifying the parameters and their shapes.
Returns: A list of parameters reshaped to the shapes specified.
Return type: List[np.ndarray]
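
A minimal sketch of the unflattening behaviour described above, using plain NumPy. The shapes list below is purely illustrative; the real method derives the target shapes from the policy's parameters rather than from a hand-written list.

    import numpy as np

    # Hypothetical target shapes for two parameter tensors (illustration only).
    shapes = [(2, 3), (3,)]
    flattened_params = np.arange(9, dtype=np.float64)  # 2*3 + 3 = 9 values

    # Mimic flat_to_params: split the flat vector into consecutive chunks
    # and reshape each chunk to its target shape.
    tensors, offset = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        tensors.append(flattened_params[offset:offset + size].reshape(shape))
        offset += size

    assert [t.shape for t in tensors] == shapes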

get_action(observation)[source]

Get action sampled from the policy.

Parameters:observation (np.ndarray) – Observation from the environment.
Returns:Action sampled from the policy.
Return type:(np.ndarray)
get_actions(observations)[source]

Get actions sampled from the policy.

Parameters:observations (list[np.ndarray]) – Observations from the environment.
Returns:Actions sampled from the policy.
Return type:(np.ndarray)
get_global_vars()[source]

Get global variables.

Returns:A list of global variables in the current variable scope.
Return type:List[tf.Variable]
get_param_shapes(**tags)[source]

Get parameter shapes.

get_param_values(**tags)[source]

Get param values.

Parameters:tags (dict) – A map of parameters for which the values are required.
Returns:Values of the parameters evaluated in the current session
Return type:param_values (np.ndarray)
get_params(trainable=True)[source]

Get the trainable variables.

Returns:A list of trainable variables in the current variable scope.
Return type:List[tf.Variable]
get_trainable_vars()[source]

Get trainable variables.

Returns:A list of trainable variables in the current variable scope.
Return type:List[tf.Variable]
log_diagnostics(paths)[source]

Log extra information per iteration based on the collected paths.

name

Name of the policy model and the variable scope.

Type:str
observation_space

The observation space of the environment.

Type:akro.Space
recurrent

Indicates whether the policy is recurrent.

Type:bool
reset(dones=None)[source]

Reset the policy.

If dones is None, it will be by default np.array([True]) which implies the policy will not be “vectorized”, i.e. number of parallel environments for training data sampling = 1.

Parameters:dones (numpy.ndarray) – Bool that indicates terminal state(s).
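
A self-contained toy illustration of the reset contract described above. _ToyRecurrentPolicy is not a garage class; it only mirrors how a recurrent policy keeps one hidden state per parallel environment and clears it selectively.

    import numpy as np

    class _ToyRecurrentPolicy:
        """Stand-in that mirrors the documented reset(dones) contract.

        This is NOT a garage class; it only illustrates keeping one hidden
        state per parallel environment and clearing it selectively.
        """

        def __init__(self, hidden_dim=8):
            self._hidden_dim = hidden_dim
            self._prev_hiddens = None  # one hidden vector per parallel env

        def reset(self, dones=None):
            if dones is None:
                dones = np.array([True])  # non-vectorized: a single environment
            if self._prev_hiddens is None or len(dones) != len(self._prev_hiddens):
                self._prev_hiddens = np.zeros((len(dones), self._hidden_dim))
            # Clear state only for environments whose episode just ended.
            self._prev_hiddens[dones] = 0.0

    policy = _ToyRecurrentPolicy()
    policy.reset(np.array([True] * 4))                   # start of vectorized sampling
    policy.reset(np.array([False, True, False, False]))  # only env 1 finished
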
set_param_values(param_values, name=None, **tags)[source]

Set param values.

Parameters:
  • param_values (np.ndarray) – A numpy array of parameter values.
  • tags (dict) – A map of parameters for which the values should be loaded.
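
The getter/setter pair above is commonly used to snapshot and restore a policy's weights. A minimal sketch, assuming policy is any constructed garage.tf policy and a default TF session with initialized variables is active (see the construction examples further down this page):

    # Snapshot the current weights as a single flat numpy array.
    saved_params = policy.get_param_values()

    # ... update, perturb or fine-tune the policy here ...

    # Restore the exact weights that were saved.
    policy.set_param_values(saved_params)
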
state_info_keys

State info keys.

Returns:keys for the information related to the policy’s state when taking an action.
Return type:List[str]
state_info_specs

State info specification.

Returns:keys and shapes for the information related to the policy’s state when taking an action.
Return type:List[str]
terminate()[source]

Clean up operation.

vectorized

Boolean for vectorized.

Returns:Indicates whether the policy is vectorized. If True, it should implement get_actions(), and support resetting with multiple simultaneous states.
Return type:bool
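
A sketch of how a sampler might branch on this flag; policy and observations are placeholders rather than a specific garage sampler implementation.

    def sample_actions(policy, observations):
        """Pick the batched or per-observation API based on policy.vectorized."""
        if policy.vectorized:
            # One call handles every parallel environment at once.
            return policy.get_actions(observations)
        # Fall back to querying each observation individually.
        return [policy.get_action(obs) for obs in observations]
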
class StochasticPolicy(name, env_spec)[source]

Bases: garage.tf.policies.base.Policy

StochasticPolicy.

dist_info(obs, state_infos)[source]

Distribution info.

Return the distribution information about the actions.

Parameters:
  • obs (tf.Tensor) – observation values
  • state_infos (dict) – a dictionary whose values should contain information about the state of the policy at the time it received the observation
dist_info_sym(obs_var, state_info_vars, name='dist_info_sym')[source]

Symbolic graph of the distribution.

Return the symbolic distribution information about the actions.

Parameters:
  • obs_var (tf.Tensor) – Symbolic variable for observations.
  • state_info_vars (dict) – A dictionary whose values should contain information about the state of the policy at the time it received the observation.
  • name (str) – Name of the symbolic graph.
distribution

Distribution.

class CategoricalCNNPolicy(env_spec, conv_filters, conv_filter_sizes, conv_strides, conv_pad, name='CategoricalCNNPolicy', hidden_sizes=[], hidden_nonlinearity=<function relu>, hidden_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, hidden_b_init=<tensorflow.python.ops.init_ops.Zeros object>, output_nonlinearity=<function softmax>, output_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, output_b_init=<tensorflow.python.ops.init_ops.Zeros object>, layer_normalization=False)[source]

Bases: garage.tf.policies.base.StochasticPolicy

A policy that contains a CNN and an MLP to make predictions based on a categorical distribution.

It only works with akro.Discrete action space.

Parameters:
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
  • conv_filter_sizes (tuple[int]) – Dimension of the filters. For example, (3, 5) means there are two convolutional layers. The filter for first layer is of dimension (3 x 3) and the second one is of dimension (5 x 5).
  • conv_filters (tuple[int]) – Number of filters per convolutional layer. For example, (3, 32) means there are two convolutional layers; the first layer has 3 filters and the second one has 32 (see the construction sketch after this parameter list).
  • conv_strides (tuple[int]) – The stride of the sliding window. For example, (1, 2) means there are two convolutional layers. The stride of the filter for first layer is 1 and that of the second layer is 2.
  • conv_pad (str) – The type of padding algorithm to use, either ‘SAME’ or ‘VALID’.
  • name (str) – Policy name, also the variable scope of the policy.
  • hidden_sizes (list[int]) – Output dimension of dense layer(s). For example, (32, 32) means the MLP of this policy consists of two hidden layers, each with 32 hidden units.
  • hidden_nonlinearity (callable) – Activation function for intermediate dense layer(s). It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • hidden_w_init (callable) – Initializer function for the weight of intermediate dense layer(s). The function should return a tf.Tensor.
  • hidden_b_init (callable) – Initializer function for the bias of intermediate dense layer(s). The function should return a tf.Tensor.
  • output_nonlinearity (callable) – Activation function for output dense layer. It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • output_w_init (callable) – Initializer function for the weight of output dense layer(s). The function should return a tf.Tensor.
  • output_b_init (callable) – Initializer function for the bias of output dense layer(s). The function should return a tf.Tensor.
  • layer_normalization (bool) – Bool for using layer normalization or not.
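
Putting the convolution arguments together, a hedged construction sketch for an image-observation, discrete-action environment. The TfEnv import path and the choice of environment are assumptions and may differ across garage versions; sampling actions additionally requires a default TF session with initialized variables.

    import gym

    from garage.tf.envs import TfEnv  # assumed wrapper path; adjust to your version
    from garage.tf.policies import CategoricalCNNPolicy

    env = TfEnv(gym.make('Pong-v0'))  # any image-observation, discrete-action env

    # Two conv layers: 32 filters of size 8x8 with stride 4, then 64 filters of
    # size 4x4 with stride 2, followed by a single 256-unit dense layer.
    policy = CategoricalCNNPolicy(env_spec=env.spec,
                                  conv_filters=(32, 64),
                                  conv_filter_sizes=(8, 4),
                                  conv_strides=(4, 2),
                                  conv_pad='VALID',
                                  hidden_sizes=(256,))
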
dist_info(obs, state_infos=None)[source]

Distribution info.

dist_info_sym(obs_var, state_info_vars=None, name=None)[source]

Symbolic graph of the distribution.

distribution

Policy distribution.

get_action(observation)[source]

Return a single action.

get_actions(observations)[source]

Return multiple actions.

vectorized

Vectorized or not.

class CategoricalGRUPolicy(env_spec, name='CategoricalGRUPolicy', hidden_dim=32, hidden_nonlinearity=<function tanh>, hidden_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, hidden_b_init=<tensorflow.python.ops.init_ops.Zeros object>, recurrent_nonlinearity=<function sigmoid>, recurrent_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, output_nonlinearity=<function softmax>, output_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, output_b_init=<tensorflow.python.ops.init_ops.Zeros object>, hidden_state_init=<tensorflow.python.ops.init_ops.Zeros object>, hidden_state_init_trainable=False, state_include_action=True, layer_normalization=False)[source]

Bases: garage.tf.policies.base.StochasticPolicy

A policy that contains a GRU to make predictions based on a categorical distribution.

It only works with akro.Discrete action space.

Parameters:
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
  • name (str) – Policy name, also the variable scope.
  • hidden_dim (int) – Hidden dimension for the GRU cell.
  • hidden_nonlinearity (callable) – Activation function for intermediate dense layer(s). It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • hidden_w_init (callable) – Initializer function for the weight of intermediate dense layer(s). The function should return a tf.Tensor.
  • hidden_b_init (callable) – Initializer function for the bias of intermediate dense layer(s). The function should return a tf.Tensor.
  • recurrent_nonlinearity (callable) – Activation function for recurrent layers. It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • recurrent_w_init (callable) – Initializer function for the weight of recurrent layer(s). The function should return a tf.Tensor.
  • output_nonlinearity (callable) – Activation function for output dense layer. It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • output_w_init (callable) – Initializer function for the weight of output dense layer(s). The function should return a tf.Tensor.
  • output_b_init (callable) – Initializer function for the bias of output dense layer(s). The function should return a tf.Tensor.
  • hidden_state_init (callable) – Initializer function for the initial hidden state. The function should return a tf.Tensor.
  • hidden_state_init_trainable (bool) – Bool for whether the initial hidden state is trainable.
  • state_include_action (bool) – Whether the state includes action. If True, input dimension will be (observation dimension + action dimension).
  • layer_normalization (bool) – Bool for using layer normalization or not.
dist_info_sym(obs_var, state_info_vars, name=None)[source]

Symbolic graph of the distribution.

distribution

Policy distribution.

get_action(observation)[source]

Return a single action.

get_actions(observations)[source]

Return multiple actions.

recurrent

Recurrent or not.

reset(dones=None)[source]

Reset the policy.

state_info_specs

State info specification.

vectorized

Vectorized or not.

class CategoricalLSTMPolicy(env_spec, name='CategoricalLSTMPolicy', hidden_dim=32, hidden_nonlinearity=<function tanh>, hidden_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, hidden_b_init=<tensorflow.python.ops.init_ops.Zeros object>, recurrent_nonlinearity=<function sigmoid>, recurrent_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, output_nonlinearity=<function softmax>, output_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, output_b_init=<tensorflow.python.ops.init_ops.Zeros object>, hidden_state_init=<tensorflow.python.ops.init_ops.Zeros object>, hidden_state_init_trainable=False, cell_state_init=<tensorflow.python.ops.init_ops.Zeros object>, cell_state_init_trainable=False, state_include_action=True, forget_bias=True, layer_normalization=False)[source]

Bases: garage.tf.policies.base.StochasticPolicy

A policy that contains an LSTM to make predictions based on a categorical distribution.

It only works with akro.Discrete action space.

Parameters:
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
  • name (str) – Policy name, also the variable scope.
  • hidden_dim (int) – Hidden dimension for LSTM cell.
  • hidden_nonlinearity (callable) – Activation function for intermediate dense layer(s). It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • hidden_w_init (callable) – Initializer function for the weight of intermediate dense layer(s). The function should return a tf.Tensor.
  • hidden_b_init (callable) – Initializer function for the bias of intermediate dense layer(s). The function should return a tf.Tensor.
  • recurrent_nonlinearity (callable) – Activation function for recurrent layers. It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • recurrent_w_init (callable) – Initializer function for the weight of recurrent layer(s). The function should return a tf.Tensor.
  • output_nonlinearity (callable) – Activation function for output dense layer. It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • output_w_init (callable) – Initializer function for the weight of output dense layer(s). The function should return a tf.Tensor.
  • output_b_init (callable) – Initializer function for the bias of output dense layer(s). The function should return a tf.Tensor.
  • hidden_state_init (callable) – Initializer function for the initial hidden state. The function should return a tf.Tensor.
  • hidden_state_init_trainable (bool) – Bool for whether the initial hidden state is trainable.
  • cell_state_init (callable) – Initializer function for the initial cell state. The function should return a tf.Tensor.
  • cell_state_init_trainable (bool) – Bool for whether the initial cell state is trainable.
  • state_include_action (bool) – Whether the state includes action. If True, input dimension will be (observation dimension + action dimension).
  • forget_bias (bool) – If True, add 1 to the bias of the forget gate at initialization. It’s used to reduce the scale of forgetting at the beginning of the training.
  • layer_normalization (bool) – Bool for using layer normalization or not.
dist_info_sym(obs_var, state_info_vars, name=None)[source]

Symbolic graph of the distribution.

distribution

Policy distribution.

get_action(observation)[source]

Return a single action.

get_actions(observations)[source]

Return multiple actions.

recurrent

Recurrent or not.

reset(dones=None)[source]

Reset the policy.

state_info_specs

State info specification.

vectorized

Vectorized or not.

class CategoricalMLPPolicy(env_spec, name='CategoricalMLPPolicy', hidden_sizes=(32, 32), hidden_nonlinearity=<function tanh>, hidden_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, hidden_b_init=<tensorflow.python.ops.init_ops.Zeros object>, output_nonlinearity=<function softmax>, output_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, output_b_init=<tensorflow.python.ops.init_ops.Zeros object>, layer_normalization=False)[source]

Bases: garage.tf.policies.base.StochasticPolicy

A policy that contains an MLP to make predictions based on a categorical distribution.

It only works with akro.Discrete action space.

Parameters:
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
  • name (str) – Policy name, also the variable scope.
  • hidden_sizes (list[int]) – Output dimension of dense layer(s). For example, (32, 32) means the MLP of this policy consists of two hidden layers, each with 32 hidden units.
  • hidden_nonlinearity (callable) – Activation function for intermediate dense layer(s). It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • hidden_w_init (callable) – Initializer function for the weight of intermediate dense layer(s). The function should return a tf.Tensor.
  • hidden_b_init (callable) – Initializer function for the bias of intermediate dense layer(s). The function should return a tf.Tensor.
  • output_nonlinearity (callable) – Activation function for output dense layer. It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • output_w_init (callable) – Initializer function for the weight of output dense layer(s). The function should return a tf.Tensor.
  • output_b_init (callable) – Initializer function for the bias of output dense layer(s). The function should return a tf.Tensor.
  • layer_normalization (bool) – Bool for using layer normalization or not.
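
A hedged end-to-end sketch: construct the policy for a discrete-action task and sample one action. The TfEnv import path and the manual session/initializer handling are assumptions about the garage/TF setup (a garage runner would normally manage the session for you).

    import gym
    import tensorflow as tf

    from garage.tf.envs import TfEnv  # assumed wrapper path; adjust to your version
    from garage.tf.policies import CategoricalMLPPolicy

    with tf.compat.v1.Session() as sess:
        env = TfEnv(gym.make('CartPole-v1'))
        policy = CategoricalMLPPolicy(env_spec=env.spec, hidden_sizes=(32, 32))

        # The policy's network lives in the current TF graph; initialize its
        # variables before sampling (normally done by garage's runner).
        sess.run(tf.compat.v1.global_variables_initializer())

        obs = env.reset()
        action, agent_info = policy.get_action(obs)  # action index + dist info
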
dist_info(obs, state_infos=None)[source]

Distribution info.

dist_info_sym(obs_var, state_info_vars=None, name=None)[source]

Symbolic graph of the distribution.

distribution

Policy distribution.

get_action(observation)[source]

Return a single action.

get_actions(observations)[source]

Return multiple actions.

get_regularizable_vars()[source]

Get regularizable weight variables under the Policy scope.

vectorized

Vectorized or not.

class ContinuousMLPPolicy(env_spec, name='ContinuousMLPPolicy', hidden_sizes=(64, 64), hidden_nonlinearity=<function relu>, hidden_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, hidden_b_init=<tensorflow.python.ops.init_ops.Zeros object>, output_nonlinearity=<function tanh>, output_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, output_b_init=<tensorflow.python.ops.init_ops.Zeros object>, input_include_goal=False, layer_normalization=False)[source]

Bases: garage.tf.policies.base.Policy

Continuous MLP Policy Network.

The policy network selects actions based on the state of the environment. It uses a neural network to fit the function pi(s).

Parameters:
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
  • name (str) – Policy name, also the variable scope.
  • hidden_sizes (list[int]) – Output dimension of dense layer(s). For example, (32, 32) means the MLP of this policy consists of two hidden layers, each with 32 hidden units.
  • hidden_nonlinearity (callable) – Activation function for intermediate dense layer(s). It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • hidden_w_init (callable) – Initializer function for the weight of intermediate dense layer(s). The function should return a tf.Tensor.
  • hidden_b_init (callable) – Initializer function for the bias of intermediate dense layer(s). The function should return a tf.Tensor.
  • output_nonlinearity (callable) – Activation function for output dense layer. It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • output_w_init (callable) – Initializer function for the weight of output dense layer(s). The function should return a tf.Tensor.
  • output_b_init (callable) – Initializer function for the bias of output dense layer(s). The function should return a tf.Tensor.
  • input_include_goal (bool) – Include goal in the observation or not.
  • layer_normalization (bool) – Bool for using layer normalization or not.
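
A construction sketch for a continuous-action task; the default tanh output nonlinearity keeps the deterministic action bounded. The TfEnv import path and the environment name are assumptions.

    import gym

    from garage.tf.envs import TfEnv  # assumed wrapper path; adjust to your version
    from garage.tf.policies import ContinuousMLPPolicy

    env = TfEnv(gym.make('Pendulum-v0'))  # any continuous-action env

    # Deterministic actor: two 64-unit ReLU hidden layers, tanh output layer.
    policy = ContinuousMLPPolicy(env_spec=env.spec, hidden_sizes=(64, 64))

    # Sampling with policy.get_action(obs) additionally requires a default TF
    # session with initialized variables, as in the CategoricalMLPPolicy example.
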
clone(name)[source]

Return a clone of the policy.

It only copies the configuration of the policy, not its parameters.

Parameters:name (str) – Name of the newly created policy.
Returns:Clone of this object
Return type:garage.tf.policies.ContinuousMLPPolicy
get_action(observation)[source]

Get single action from this policy for the input observation.

Parameters: observation (numpy.ndarray) – Observation from environment.
Returns: Predicted action (numpy.ndarray), and an empty dict, since this policy does not model a distribution.
Return type: numpy.ndarray, dict
get_action_sym(obs_var, name=None)[source]

Symbolic graph of the action.

Parameters:
  • obs_var (tf.Tensor) – Tensor input for symbolic graph.
  • name (str) – Name for symbolic graph.
Returns: Symbolic graph of the action.
Return type: tf.Tensor

get_actions(observations)[source]

Get multiple actions from this policy for the input observations.

Parameters: observations (numpy.ndarray) – Observations from environment.
Returns: Predicted actions (numpy.ndarray), and an empty dict, since this policy does not model a distribution.
Return type: numpy.ndarray, dict
get_regularizable_vars()[source]

Get regularizable weight variables under the Policy scope.

Returns:List of regularizable variables.
Return type:list(tf.Variable)
vectorized

Vectorized or not.

Returns:vectorized or not.
Return type:bool
class DiscreteQfDerivedPolicy(env_spec, qf, name='DiscreteQfDerivedPolicy')[source]

Bases: garage.tf.policies.base.Policy

DiscreteQfDerived policy.

Parameters:
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
  • qf (garage.q_functions.QFunction) – The q-function used.
  • name (str) – Name of the policy.
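
A hedged sketch of pairing this policy with a discrete Q-function, as in DQN-style setups. The TfEnv and DiscreteMLPQFunction import paths are assumptions about the garage version in use.

    import gym

    from garage.tf.envs import TfEnv  # assumed wrapper path; adjust to your version
    from garage.tf.policies import DiscreteQfDerivedPolicy
    from garage.tf.q_functions import DiscreteMLPQFunction  # assumed path

    env = TfEnv(gym.make('CartPole-v1'))

    # The policy simply returns argmax_a Q(s, a) for the supplied Q-function.
    qf = DiscreteMLPQFunction(env_spec=env.spec, hidden_sizes=(64, 64))
    policy = DiscreteQfDerivedPolicy(env_spec=env.spec, qf=qf)
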
get_action(observation)[source]

Get action from this policy for the input observation.

Parameters:observation (numpy.ndarray) – Observation from environment.
Returns:Single optimal action from this policy.
get_actions(observations)[source]

Get actions from this policy for the input observations.

Parameters:observations (numpy.ndarray) – Observations from environment.
Returns:Optimal actions from this policy.
vectorized

Vectorized or not.

class GaussianGRUPolicy(env_spec, hidden_dim=32, name='GaussianGRUPolicy', hidden_nonlinearity=<function tanh>, hidden_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, hidden_b_init=<tensorflow.python.ops.init_ops.Zeros object>, recurrent_nonlinearity=<function sigmoid>, recurrent_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, output_nonlinearity=None, output_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, output_b_init=<tensorflow.python.ops.init_ops.Zeros object>, hidden_state_init=<tensorflow.python.ops.init_ops.Zeros object>, hidden_state_init_trainable=False, learn_std=True, std_share_network=False, init_std=1.0, layer_normalization=False, state_include_action=True)[source]

Bases: garage.tf.policies.base.StochasticPolicy

Models the action distribution using a Gaussian parameterized by a GRU.

Parameters:
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
  • name (str) – Model name, also the variable scope.
  • hidden_dim (int) – Hidden dimension for GRU cell for mean.
  • hidden_nonlinearity (Callable) – Activation function for intermediate dense layer(s). It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • hidden_w_init (Callable) – Initializer function for the weight of intermediate dense layer(s). The function should return a tf.Tensor.
  • hidden_b_init (Callable) – Initializer function for the bias of intermediate dense layer(s). The function should return a tf.Tensor.
  • recurrent_nonlinearity (Callable) – Activation function for recurrent layers. It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • recurrent_w_init (Callable) – Initializer function for the weight of recurrent layer(s). The function should return a tf.Tensor.
  • output_nonlinearity (Callable) – Activation function for output dense layer. It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • output_w_init (Callable) – Initializer function for the weight of output dense layer(s). The function should return a tf.Tensor.
  • output_b_init (Callable) – Initializer function for the bias of output dense layer(s). The function should return a tf.Tensor.
  • hidden_state_init (Callable) – Initializer function for the initial hidden state. The function should return a tf.Tensor.
  • hidden_state_init_trainable (bool) – Bool for whether the initial hidden state is trainable.
  • learn_std (bool) – Is std trainable.
  • std_share_network (bool) – Boolean for whether mean and std share the same network.
  • init_std (float) – Initial value for std.
  • layer_normalization (bool) – Bool for using layer normalization or not.
  • state_include_action (bool) – Whether the state includes action. If True, input dimension will be (observation dimension + action dimension).
dist_info_sym(obs_var, state_info_vars, name=None)[source]

Build a symbolic graph of the distribution parameters.

Parameters:
  • obs_var (tf.Tensor) – Tensor input for symbolic graph.
  • state_info_vars (dict) – Extra state information, e.g. previous action.
  • name (str) – Name for symbolic graph.
Returns: Outputs of the symbolic graph of the distribution parameters.
Return type: dict[tf.Tensor]

distribution

Policy distribution.

Type:garage.tf.distributions.DiagonalGaussian
get_action(observation)[source]

Get a single action from this policy for the input observation.

Parameters: observation (numpy.ndarray) – Observation from environment.
Returns: Predicted action and agent info.
  • action (numpy.ndarray) – Predicted action.
  • agent_info (dict) – Distribution information obtained after observing the given observation, with keys mean (numpy.ndarray), log_std (numpy.ndarray), and prev_action (numpy.ndarray, only present if state_include_action is True).
Return type: tuple[numpy.ndarray, dict]
get_actions(observations)[source]

Get multiple actions from this policy for the input observations.

Parameters: observations (numpy.ndarray) – Observations from environment.
Returns: Predicted actions and agent infos.
  • actions (numpy.ndarray) – Predicted actions.
  • agent_infos (dict) – Distribution information obtained after observing the given observations, with keys mean (numpy.ndarray), log_std (numpy.ndarray), and prev_action (numpy.ndarray, only present if state_include_action is True).
Return type: tuple[numpy.ndarray, dict]
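
A sketch of how the returned actions and agent_infos are typically consumed during vectorized sampling with a recurrent policy; policy is assumed to be a constructed GaussianGRUPolicy, observations a (n_envs, obs_dim) array, and dones a boolean array marking environments whose episodes just ended.

    def step_policy(policy, observations, dones):
        """Sample one batch of actions and maintain per-environment state."""
        actions, agent_infos = policy.get_actions(observations)
        means = agent_infos['mean']        # per-env Gaussian mean
        log_stds = agent_infos['log_std']  # per-env log standard deviation
        # prev_action is only present when state_include_action=True.
        prev_actions = agent_infos.get('prev_action')
        # Clear hidden state only for environments whose episode just ended.
        policy.reset(dones)
        return actions, means, log_stds, prev_actions
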
recurrent

Whether this policy is recurrent or not.

Type:bool
reset(dones=None)[source]

Reset the policy.

Note

If dones is None, it will be by default np.array([True]) which implies the policy will not be “vectorized”, i.e. number of parallel environments for training data sampling = 1.

Parameters:dones (numpy.ndarray) – Bool that indicates terminal state(s).
state_info_specs

State info specification.

Type:list
vectorized

Whether the policy is vectorized or not.

Type:bool
class GaussianLSTMPolicy(env_spec, hidden_dim=32, name='GaussianLSTMPolicy', hidden_nonlinearity=<function tanh>, hidden_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, hidden_b_init=<tensorflow.python.ops.init_ops.Zeros object>, recurrent_nonlinearity=<function sigmoid>, recurrent_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, output_nonlinearity=None, output_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, output_b_init=<tensorflow.python.ops.init_ops.Zeros object>, hidden_state_init=<tensorflow.python.ops.init_ops.Zeros object>, hidden_state_init_trainable=False, cell_state_init=<tensorflow.python.ops.init_ops.Zeros object>, cell_state_init_trainable=False, forget_bias=True, learn_std=True, std_share_network=False, init_std=1.0, layer_normalization=False, state_include_action=True)[source]

Bases: garage.tf.policies.base.StochasticPolicy

A policy which models actions with a Gaussian parameterized by an LSTM.

Parameters:
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
  • name (str) – Model name, also the variable scope.
  • hidden_dim (int) – Hidden dimension for LSTM cell for mean.
  • hidden_nonlinearity (Callable) – Activation function for intermediate dense layer(s). It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • hidden_w_init (Callable) – Initializer function for the weight of intermediate dense layer(s). The function should return a tf.Tensor.
  • hidden_b_init (Callable) – Initializer function for the bias of intermediate dense layer(s). The function should return a tf.Tensor.
  • recurrent_nonlinearity (Callable) – Activation function for recurrent layers. It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • recurrent_w_init (Callable) – Initializer function for the weight of recurrent layer(s). The function should return a tf.Tensor.
  • output_nonlinearity (Callable) – Activation function for output dense layer. It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • output_w_init (Callable) – Initializer function for the weight of output dense layer(s). The function should return a tf.Tensor.
  • output_b_init (Callable) – Initializer function for the bias of output dense layer(s). The function should return a tf.Tensor.
  • hidden_state_init (Callable) – Initializer function for the initial hidden state. The function should return a tf.Tensor.
  • hidden_state_init_trainable (bool) – Bool for whether the initial hidden state is trainable.
  • cell_state_init (Callable) – Initializer function for the initial cell state. The function should return a tf.Tensor.
  • cell_state_init_trainable (bool) – Bool for whether the initial cell state is trainable.
  • forget_bias (bool) – If True, add 1 to the bias of the forget gate at initialization. It’s used to reduce the scale of forgetting at the beginning of the training.
  • learn_std (bool) – Is std trainable.
  • std_share_network (bool) – Boolean for whether mean and std share the same network.
  • init_std (float) – Initial value for std.
  • layer_normalization (bool) – Bool for using layer normalization or not.
  • state_include_action (bool) – Whether the state includes action. If True, input dimension will be (observation dimension + action dimension).
dist_info_sym(obs_var, state_info_vars, name=None)[source]

Build a symbolic graph of the action distribution parameters.

Parameters:
  • obs_var (tf.Tensor) – Tensor input for symbolic graph.
  • state_info_vars (dict) – Extra state information, e.g. previous action.
  • name (str) – Name for symbolic graph.
Returns: Output of the symbolic graph of the action distribution parameters.
Return type: dict[tf.Tensor]

distribution

Policy distribution.

Type:garage.tf.distributions.DiagonalGaussian
get_action(observation)[source]

Get single action from this policy for the input observation.

Parameters: observation (numpy.ndarray) – Observation from environment.
Returns: Predicted action and agent information.
  • action (numpy.ndarray) – Predicted action.
  • agent_info (dict) – Distribution information obtained after observing the given observation, with keys mean (numpy.ndarray), log_std (numpy.ndarray), and prev_action (numpy.ndarray, only present if state_include_action is True).
Return type: tuple[numpy.ndarray, dict]
get_actions(observations)[source]

Get multiple actions from this policy for the input observations.

Parameters: observations (numpy.ndarray) – Observations from environment.
Returns: Predicted actions and agent information.
  • actions (numpy.ndarray) – Predicted actions.
  • agent_infos (dict) – Distribution information obtained after observing the given observations, with keys mean (numpy.ndarray), log_std (numpy.ndarray), and prev_action (numpy.ndarray, only present if state_include_action is True).
Return type: tuple[numpy.ndarray, dict]
recurrent

Whether this policy is recurrent or not.

Type:bool
reset(dones=None)[source]

Reset the policy.

Note

If dones is None, it will be by default np.array([True]), which implies the policy will not be “vectorized”, i.e. the number of parallel environments for training data sampling = 1.

Parameters:dones (numpy.ndarray) – Bool that indicates terminal state(s).
state_info_specs

State info specification.

Type:list
vectorized

Whether this policy is vectorized.

Type:bool
class GaussianMLPPolicy(env_spec, name='GaussianMLPPolicy', hidden_sizes=(32, 32), hidden_nonlinearity=<function tanh>, hidden_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, hidden_b_init=<tensorflow.python.ops.init_ops.Zeros object>, output_nonlinearity=None, output_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, output_b_init=<tensorflow.python.ops.init_ops.Zeros object>, learn_std=True, adaptive_std=False, std_share_network=False, init_std=1.0, min_std=1e-06, max_std=None, std_hidden_sizes=(32, 32), std_hidden_nonlinearity=<function tanh>, std_output_nonlinearity=None, std_parameterization='exp', layer_normalization=False)[source]

Bases: garage.tf.policies.base.StochasticPolicy

GaussianMLPPolicy with GaussianMLPModel.

A policy that contains an MLP to make predictions based on a Gaussian distribution.

Parameters:
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
  • name (str) – Model name, also the variable scope.
  • hidden_sizes (list[int]) – Output dimension of dense layer(s) for the MLP for mean. For example, (32, 32) means the MLP consists of two hidden layers, each with 32 hidden units.
  • hidden_nonlinearity (callable) – Activation function for intermediate dense layer(s). It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • hidden_w_init (callable) – Initializer function for the weight of intermediate dense layer(s). The function should return a tf.Tensor.
  • hidden_b_init (callable) – Initializer function for the bias of intermediate dense layer(s). The function should return a tf.Tensor.
  • output_nonlinearity (callable) – Activation function for output dense layer. It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • output_w_init (callable) – Initializer function for the weight of output dense layer(s). The function should return a tf.Tensor.
  • output_b_init (callable) – Initializer function for the bias of output dense layer(s). The function should return a tf.Tensor.
  • learn_std (bool) – Is std trainable.
  • adaptive_std (bool) – Is std a neural network. If False, it will be a parameter.
  • std_share_network (bool) – Boolean for whether mean and std share the same network.
  • init_std (float) – Initial value for std.
  • std_hidden_sizes (list[int]) – Output dimension of dense layer(s) for the MLP for std. For example, (32, 32) means the MLP consists of two hidden layers, each with 32 hidden units.
  • min_std (float) – If not None, the std is at least the value of min_std, to avoid numerical issues.
  • max_std (float) – If not None, the std is at most the value of max_std, to avoid numerical issues.
  • std_hidden_nonlinearity – Nonlinearity for each hidden layer in the std network.
  • std_output_nonlinearity – Nonlinearity for output layer in the std network.
  • std_parameterization (str) – How the std should be parameterized. There are two options (see the sketch after this parameter list):
    • exp – the logarithm of the std is stored, and an exponential transformation is applied to recover the std
    • softplus – the std is computed as log(1 + exp(x))
  • layer_normalization (bool) – Bool for using layer normalization or not.
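
A small NumPy sketch of the two std parameterizations listed above, showing how a stored parameter x maps to the standard deviation; this mirrors the description only and is not garage code.

    import numpy as np

    x = np.array([-2.0, 0.0, 2.0])  # stored std parameter (illustrative values)

    # 'exp': the parameter is the log of the std.
    std_exp = np.exp(x)

    # 'softplus': the std is log(1 + exp(x)), also strictly positive.
    std_softplus = np.log1p(np.exp(x))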

dist_info_sym(obs_var, state_info_vars=None, name='default')[source]

Symbolic graph of the distribution.

distribution

Policy distribution.

get_action(observation)[source]

Get action from the policy.

get_actions(observations)[source]

Get actions from the policy.

get_params(trainable=True)[source]

Get the trainable variables.

vectorized

Vectorized or not.