garage.tf.policies.continuous_mlp_policy module

This module creates a continuous MLP policy network.

A continuous MLP network can be used as the policy in different RL algorithms. It accepts an observation of the environment and predicts a continuous action.

class ContinuousMLPPolicy(env_spec, name='ContinuousMLPPolicy', hidden_sizes=(64, 64), hidden_nonlinearity=<function relu>, hidden_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, hidden_b_init=<tensorflow.python.ops.init_ops.Zeros object>, output_nonlinearity=<function tanh>, output_w_init=<tensorflow.python.ops.init_ops.GlorotUniform object>, output_b_init=<tensorflow.python.ops.init_ops.Zeros object>, input_include_goal=False, layer_normalization=False)[source]

Bases: garage.tf.policies.base.Policy

Continuous MLP Policy Network.

The policy network selects an action based on the state of the environment. It uses a neural network to approximate the function pi(s); a minimal construction sketch follows the parameter list below.

Parameters:
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
  • name (str) – Policy name, also the variable scope.
  • hidden_sizes (list[int]) – Output dimension of dense layer(s). For example, (32, 32) means the MLP of this policy consists of two hidden layers, each with 32 hidden units.
  • hidden_nonlinearity (callable) – Activation function for intermediate dense layer(s). It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • hidden_w_init (callable) – Initializer function for the weight of intermediate dense layer(s). The function should return a tf.Tensor.
  • hidden_b_init (callable) – Initializer function for the bias of intermediate dense layer(s). The function should return a tf.Tensor.
  • output_nonlinearity (callable) – Activation function for output dense layer. It should return a tf.Tensor. Set it to None to maintain a linear activation.
  • output_w_init (callable) – Initializer function for the weight of output dense layer(s). The function should return a tf.Tensor.
  • output_b_init (callable) – Initializer function for the bias of output dense layer(s). The function should return a tf.Tensor.
  • input_include_goal (bool) – Whether to include the goal in the observation.
  • layer_normalization (bool) – Whether to use layer normalization.
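
A minimal construction sketch is shown below. It assumes an environment wrapped so that env.spec is a garage EnvSpec; the TfEnv wrapper and the Pendulum-v0 environment are illustrative assumptions, so adapt them to your garage version.

    import gym
    import tensorflow as tf

    from garage.tf.envs import TfEnv
    from garage.tf.policies import ContinuousMLPPolicy

    # Wrap a continuous-control environment so env.spec is a garage EnvSpec.
    env = TfEnv(gym.make('Pendulum-v0'))

    # Two hidden layers of 64 ReLU units; the tanh output nonlinearity keeps
    # the raw network output bounded in [-1, 1], matching the defaults above.
    policy = ContinuousMLPPolicy(env_spec=env.spec,
                                 hidden_sizes=(64, 64),
                                 hidden_nonlinearity=tf.nn.relu,
                                 output_nonlinearity=tf.nn.tanh)
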
clone(name)[source]

Return a clone of the policy.

It only copies the configuration of the policy, not the parameters.

Parameters:name (str) – Name of the newly created policy.
Returns:Clone of this object
Return type:garage.tf.policies.ContinuousMLPPolicy
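
For example, off-policy algorithms such as DDPG can use clone() to build a target policy with the same architecture (a hedged usage sketch; the variable names are illustrative).

    # Same network configuration as `policy`; parameters are created fresh,
    # not copied from the original.
    target_policy = policy.clone(name='TargetPolicy')
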
get_action(observation)[source]

Get a single action from this policy for the input observation.

Parameters:observation (numpy.ndarray) – Observation from environment.
Returns:
  • numpy.ndarray – Predicted action.
  • dict – Empty dict, since this policy does not model a distribution.
Return type:tuple(numpy.ndarray, dict)
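
A hedged usage sketch, continuing the construction example above (garage.tf uses TensorFlow 1.x graph mode, so variables must be initialized inside a session first):

    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        obs = env.reset()
        action, agent_info = policy.get_action(obs)
        # action is a numpy.ndarray shaped like the action space;
        # agent_info is an empty dict for this policy.
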
get_action_sym(obs_var, name=None)[source]

Symbolic graph of the action.

Parameters:
  • obs_var (tf.Tensor) – Tensor input for symbolic graph.
  • name (str) – Name for symbolic graph.
Returns:Symbolic graph of the action.
Return type:tf.Tensor
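
A sketch of typical graph-mode usage, assuming TensorFlow 1.x-style placeholders as used elsewhere in garage.tf (the placeholder and graph names are illustrative):

    obs_ph = tf.compat.v1.placeholder(
        tf.float32,
        shape=(None,) + env.spec.observation_space.shape,
        name='obs')
    # Action tensor produced by the policy network for the given observations.
    action_var = policy.get_action_sym(obs_ph, name='policy_action')
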

get_actions(observations)[source]

Get multiple actions from this policy for the input observations.

Parameters:observations (numpy.ndarray) – Observations from environment.
Returns:
  • numpy.ndarray – Predicted actions.
  • dict – Empty dict, since this policy does not model a distribution.
Return type:tuple(numpy.ndarray, dict)
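
A sketch of batched usage (the sampled observations are purely illustrative; run inside an initialized tf.compat.v1.Session, as in the get_action sketch):

    import numpy as np

    # A batch of 8 observations, sampled here only for illustration.
    observations = np.stack(
        [env.observation_space.sample() for _ in range(8)])
    actions, agent_infos = policy.get_actions(observations)
    # actions has one row per observation; agent_infos is an empty dict.
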
get_regularizable_vars()[source]

Get regularizable weight variables under the Policy scope.

Returns:List of regularizable variables.
Return type:list(tf.Variable)
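
For instance, a weight-decay term could be built from these variables (a hedged sketch; the coefficient is arbitrary):

    # L2 penalty over the policy's regularizable weights.
    reg_loss = 1e-4 * tf.add_n(
        [tf.nn.l2_loss(v) for v in policy.get_regularizable_vars()])
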
vectorized

Whether the policy is vectorized, i.e. whether get_actions() can compute actions for a batch of observations at once.

Returns:True if the policy is vectorized, False otherwise.
Return type:bool