garage.tf.q_functions

Q-Functions for TensorFlow-based algorithms.

class ContinuousCNNQFunction(env_spec, filters, strides, hidden_sizes=(256,), action_merge_layer=-2, name=None, padding='SAME', max_pooling=False, pool_strides=(2, 2), pool_shapes=(2, 2), cnn_hidden_nonlinearity=tf.nn.relu, hidden_nonlinearity=tf.nn.relu, hidden_w_init=tf.initializers.glorot_uniform(seed=deterministic.get_tf_seed_stream()), hidden_b_init=tf.zeros_initializer(), output_nonlinearity=None, output_w_init=tf.initializers.glorot_uniform(seed=deterministic.get_tf_seed_stream()), output_b_init=tf.zeros_initializer(), layer_normalization=False)

Bases: garage.tf.models.CNNMLPMergeModel

Q function based on a CNN-MLP structure for continuous action space.

This class implements a Q-value network that predicts Q from the input state and action. It uses a CNN and an MLP to fit the function Q(s, a).

Parameters
  • env_spec (EnvSpec) – Environment specification.

  • filters (Tuple[Tuple[int, Tuple[int, int]], ...]) – Number and dimension of filters. For example, ((3, (3, 5)), (32, (3, 3))) means there are two convolutional layers. The filter for the first layer has 3 channels and its shape is (3 x 5), while the filter for the second layer has 32 channels and its shape is (3 x 3).

  • strides (tuple[int]) – The stride of the sliding window. For example, (1, 2) means there are two convolutional layers. The stride of the filter for the first layer is 1 and that of the second layer is 2.

  • hidden_sizes (tuple[int]) – Output dimension of dense layer(s). For example, (32, 32) means the MLP of this q-function consists of two hidden layers, each with 32 hidden units.

  • action_merge_layer (int) – The index of the layer at which to concatenate action inputs with the network. The indexing works like standard Python list indexing. An index of 0 refers to the input layer (observation input), while an index of -1 points to the last hidden layer. The default, -2, points to the second layer from the end.

  • name (str) – Variable scope of the cnn.

  • padding (str) – The type of padding algorithm to use, either ‘SAME’ or ‘VALID’.

  • max_pooling (bool) – Boolean for using max pooling layer or not.

  • pool_shapes (tuple[int]) – Dimension of the pooling layer(s). For example, (2, 2) means that all the pooling layers have shape (2, 2).

  • pool_strides (tuple[int]) – The strides of the pooling layer(s). For example, (2, 2) means that all the pooling layers have strides (2, 2).

  • cnn_hidden_nonlinearity (callable) – Activation function for intermediate dense layer(s) in the CNN. It should return a tf.Tensor. Set it to None to maintain a linear activation.

  • hidden_nonlinearity (callable) – Activation function for intermediate dense layer(s) in the MLP. It should return a tf.Tensor. Set it to None to maintain a linear activation.

  • hidden_w_init (callable) – Initializer function for the weight of intermediate dense layer(s) in the MLP. The function should return a tf.Tensor.

  • hidden_b_init (callable) – Initializer function for the bias of intermediate dense layer(s) in the MLP. The function should return a tf.Tensor.

  • output_nonlinearity (callable) – Activation function for output dense layer in the MLP. It should return a tf.Tensor. Set it to None to maintain a linear activation.

  • output_w_init (callable) – Initializer function for the weight of output dense layer(s) in the MLP. The function should return a tf.Tensor.

  • output_b_init (callable) – Initializer function for the bias of output dense layer(s) in the MLP. The function should return a tf.Tensor.

  • layer_normalization (bool) – Bool for using layer normalization or not.
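
As a rough illustration of how filters, strides, and padding interact, the following plain-Python sketch computes the spatial output size of each convolutional layer using TensorFlow's documented 'SAME' and 'VALID' size rules. The helper name conv_output_shapes, the 32 x 32 input, and the reading of a filter shape as (height, width) are assumptions made for this example; the helper is not part of garage, and pooling is ignored.

```python
import math


def conv_output_shapes(input_hw, filters, strides, padding="SAME"):
    """Return (height, width, channels) after each convolutional layer.

    Hypothetical helper for illustration only; not part of garage.
    Assumes each filter shape is given as (height, width).
    """
    if len(filters) != len(strides):
        raise ValueError("filters and strides must describe the same layers")
    height, width = input_hw
    shapes = []
    for (channels, (k_h, k_w)), stride in zip(filters, strides):
        if padding == "SAME":
            # 'SAME' pads the input, so only the stride shrinks it.
            height = math.ceil(height / stride)
            width = math.ceil(width / stride)
        elif padding == "VALID":
            # 'VALID' uses no padding, so the kernel size also matters.
            height = math.ceil((height - k_h + 1) / stride)
            width = math.ceil((width - k_w + 1) / stride)
        else:
            raise ValueError("padding must be 'SAME' or 'VALID'")
        shapes.append((height, width, channels))
    return shapes


filters = ((3, (3, 5)), (32, (3, 3)))
strides = (1, 2)
same_shapes = conv_output_shapes((32, 32), filters, strides, padding="SAME")
valid_shapes = conv_output_shapes((32, 32), filters, strides, padding="VALID")
print(same_shapes)
print(valid_shapes)
```

With 'SAME' padding only the stride of 2 in the second layer halves the spatial size, whereas 'VALID' also trims the borders by the kernel size.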

property inputs

The observation and action input tensors.

The returned tuple contains two tensors. The first is the observation tensor with shape \((N, O*)\), and the second is the action tensor with shape \((N, A*)\).

Type

tuple[tf.Tensor]

property parameters

Parameters of the model.

Returns

Parameters

Return type

np.ndarray

property name

Name (str) of the model.

This is also the variable scope of the model.

Returns

Name of the model.

Return type

str

property input

Default input of the model.

When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the input of the network.

Returns

Default input of the model.

Return type

tf.Tensor

property output

Default output of the model.

When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the output of the network.

Returns

Default output of the model.

Return type

tf.Tensor

property outputs

Default outputs of the model.

When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the outputs of the network.

Returns

Default outputs of the model.

Return type

list[tf.Tensor]

property state_info_specs

State info specification.

Returns

Keys and shapes for the information related to the module’s state when taking an action.

Return type

List[str]

property state_info_keys

State info keys.

Returns

Keys for the information related to the module’s state when taking an input.

Return type

List[str]

get_qval(observation, action)

Q Value of the network.

Parameters
  • observation (np.ndarray) – Observation input of shape \((N, O*)\).

  • action (np.ndarray) – Action input of shape \((N, A*)\).

Returns

Array of shape \((N, )\) containing Q values corresponding to each (obs, act) pair.

Return type

np.ndarray

build(state_input, action_input, name)

Build the symbolic graph for q-network.

Parameters
  • state_input (tf.Tensor) – The state input tf.Tensor of shape \((N, O*)\).

  • action_input (tf.Tensor) – The action input tf.Tensor of shape \((N, A*)\).

  • name (str) – Network variable scope.

Returns

The output Q value tensor of shape \((N, )\).

Return type

tf.Tensor

clone(name)

Return a clone of the Q-function.

It copies the configuration of the primitive and also the parameters.

Parameters

name (str) – Name of the newly created q-function.

Returns

Cloned Q function.

Return type

ContinuousCNNQFunction

network_input_spec()

Network input spec.

Returns

List of key(str) for the network inputs.

Return type

list[str]

network_output_spec()

Network output spec.

Returns

List of key(str) for the network outputs.

Return type

list[str]

reset(do_resets=None)

Reset the module.

This only affects recurrent modules, and do_resets only affects vectorized modules.

For a vectorized module, do_resets is an array of booleans indicating which internal states are to be reset. The length of do_resets should equal the length of inputs.

Parameters

do_resets (numpy.ndarray) – Bool array indicating which states to be reset.
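
To illustrate what do_resets means for a vectorized recurrent module, here is a minimal plain-Python sketch. The class ToyRecurrentModule and its state layout are hypothetical and are not garage code; per the note above, a non-recurrent module would not be affected by reset.

```python
class ToyRecurrentModule:
    """Hypothetical vectorized module keeping one hidden state per environment."""

    def __init__(self, n_envs, state_dim):
        self._state_dim = state_dim
        self.states = [[0.0] * state_dim for _ in range(n_envs)]

    def reset(self, do_resets=None):
        # With no mask, reset every internal state.
        if do_resets is None:
            do_resets = [True] * len(self.states)
        if len(do_resets) != len(self.states):
            raise ValueError("do_resets must have one entry per state")
        for i, flag in enumerate(do_resets):
            if flag:
                # Only the states flagged True are zeroed.
                self.states[i] = [0.0] * self._state_dim


module = ToyRecurrentModule(n_envs=3, state_dim=2)
module.states = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
module.reset(do_resets=[True, False, True])
print(module.states)
```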

terminate()

Clean up operation.

get_trainable_vars()

Get trainable variables.

Returns

A list of trainable variables in the current variable scope.

Return type

List[tf.Variable]

get_global_vars()

Get global variables.

Returns

A list of global variables in the current variable scope.

Return type

List[tf.Variable]

get_regularizable_vars()

Get all network weight variables in the current scope.

Returns

A list of network weight variables in the current variable scope.

Return type

List[tf.Variable]

get_params()

Get the trainable variables.

Returns

A list of trainable variables in the current variable scope.

Return type

List[tf.Variable]

get_param_shapes()

Get parameter shapes.

Returns

A list of variable shapes.

Return type

List[tuple]

get_param_values()

Get param values.

Returns

Values of the parameters evaluated in the current session.

Return type

np.ndarray

set_param_values(param_values)

Set param values.

Parameters

param_values (np.ndarray) – A numpy array of parameter values.

flat_to_params(flattened_params)

Unflatten tensors according to their respective shapes.

Parameters

flattened_params (np.ndarray) – A numpy array of flattened params.

Returns

A list of parameters reshaped to the shapes specified.

Return type

List[np.ndarray]
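
The unflattening step can be sketched in plain Python as follows. This is only an illustration of the idea, assuming the flat vector stores each parameter in row-major order, one after another, in the same order as the shapes (such as those returned by get_param_shapes). The names reshape and flat_to_params_sketch are hypothetical, and nested lists stand in for np.ndarray.

```python
from math import prod


def reshape(flat, shape):
    """Reshape a flat list into nested lists of the given shape (row-major)."""
    if len(shape) == 0:
        return flat[0]
    if len(shape) == 1:
        return list(flat)
    step = prod(shape[1:])
    return [
        reshape(flat[i * step:(i + 1) * step], shape[1:])
        for i in range(shape[0])
    ]


def flat_to_params_sketch(flattened_params, param_shapes):
    """Split a flat parameter vector into one nested list per shape."""
    params = []
    offset = 0
    for shape in param_shapes:
        size = prod(shape)  # number of scalars this parameter holds
        params.append(reshape(flattened_params[offset:offset + size], shape))
        offset += size
    if offset != len(flattened_params):
        raise ValueError("flattened_params does not match the given shapes")
    return params


# A (2, 3) weight matrix followed by a (3,) bias vector: 9 values in total.
shapes = [(2, 3), (3,)]
params = flat_to_params_sketch(list(range(9)), shapes)
print(params)
```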

class ContinuousMLPQFunction(env_spec, name='ContinuousMLPQFunction', hidden_sizes=(32, 32), action_merge_layer=-2, hidden_nonlinearity=tf.nn.relu, hidden_w_init=tf.initializers.glorot_uniform(seed=deterministic.get_tf_seed_stream()), hidden_b_init=tf.zeros_initializer(), output_nonlinearity=None, output_w_init=tf.initializers.glorot_uniform(seed=deterministic.get_tf_seed_stream()), output_b_init=tf.zeros_initializer(), layer_normalization=False)

Bases: garage.tf.models.MLPMergeModel

Continuous MLP QFunction.

This class implements a Q-value network that predicts Q from the input state and action. It uses an MLP to fit the function Q(s, a).

Parameters
  • env_spec (EnvSpec) – Environment specification.

  • name (str) – Name of the q-function, also serves as the variable scope.

  • hidden_sizes (list[int]) – Output dimension of dense layer(s). For example, (32, 32) means the MLP of this q-function consists of two hidden layers, each with 32 hidden units.

  • action_merge_layer (int) – The index of the layer at which to concatenate action inputs with the network. The indexing works like standard Python list indexing. An index of 0 refers to the input layer (observation input), while an index of -1 points to the last hidden layer. The default, -2, points to the second layer from the end.

  • hidden_nonlinearity (callable) – Activation function for intermediate dense layer(s). It should return a tf.Tensor. Set it to None to maintain a linear activation.

  • hidden_w_init (callable) – Initializer function for the weight of intermediate dense layer(s). The function should return a tf.Tensor.

  • hidden_b_init (callable) – Initializer function for the bias of intermediate dense layer(s). The function should return a tf.Tensor.

  • output_nonlinearity (callable) – Activation function for output dense layer. It should return a tf.Tensor. Set it to None to maintain a linear activation.

  • output_w_init (callable) – Initializer function for the weight of output dense layer(s). The function should return a tf.Tensor.

  • output_b_init (callable) – Initializer function for the bias of output dense layer(s). The function should return a tf.Tensor.

  • layer_normalization (bool) – Bool for using layer normalization.
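
To make the action_merge_layer indexing concrete, the sketch below computes the input width of every dense layer for a given merge index. It assumes that there are len(hidden_sizes) + 1 merge positions (the input of each hidden layer plus the input of the output layer) and that negative indices wrap around as in a Python list; this is an interpretation of the description above, not a statement of garage's internals. The function name dense_input_dims and the dimensions used are hypothetical.

```python
def dense_input_dims(obs_dim, act_dim, hidden_sizes, action_merge_layer=-2):
    """Return the input width of each dense layer, output layer included.

    Hypothetical helper: assumes merge positions are indexed like a Python
    list of length len(hidden_sizes) + 1.
    """
    n_positions = len(hidden_sizes) + 1
    merge_at = action_merge_layer % n_positions  # resolve negative indices
    dims = []
    prev = obs_dim
    for i, size in enumerate(hidden_sizes):
        # The action is concatenated to this layer's input if indices match.
        dims.append(prev + (act_dim if i == merge_at else 0))
        prev = size
    # Position len(hidden_sizes) is the input of the output layer.
    dims.append(prev + (act_dim if merge_at == len(hidden_sizes) else 0))
    return dims


default_dims = dense_input_dims(4, 2, (32, 32))        # merge index -2
input_dims = dense_input_dims(4, 2, (32, 32), 0)       # merge at the input
last_dims = dense_input_dims(4, 2, (32, 32), -1)       # merge after last hidden
print(default_dims, input_dims, last_dims)
```

Under these assumptions, the default of -2 with two hidden layers joins the action to the output of the first hidden layer, so the second hidden layer sees 32 + 2 inputs.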

property inputs

Return the input tensor.

Returns

The input tensors of the model.

Return type

tf.Tensor

property parameters

Parameters of the model.

Returns

Parameters

Return type

np.ndarray

property name

Name (str) of the model.

This is also the variable scope of the model.

Returns

Name of the model.

Return type

str

property input

Default input of the model.

When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the input of the network.

Returns

Default input of the model.

Return type

tf.Tensor

property output

Default output of the model.

When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the output of the network.

Returns

Default output of the model.

Return type

tf.Tensor

property outputs

Default outputs of the model.

When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the outputs of the network.

Returns

Default outputs of the model.

Return type

list[tf.Tensor]

property state_info_specs

State info specification.

Returns

Keys and shapes for the information related to the module’s state when taking an action.

Return type

List[str]

property state_info_keys

State info keys.

Returns

Keys for the information related to the module’s state when taking an input.

Return type

List[str]

get_qval(observation, action)

Q Value of the network.

Parameters
  • observation (np.ndarray) – Observation input.

  • action (np.ndarray) – Action input.

Returns

Q values.

Return type

np.ndarray

build(state_input, action_input, name)

Build the symbolic graph for q-network.

Parameters
  • state_input (tf.Tensor) – The state input tf.Tensor to the network.

  • action_input (tf.Tensor) – The action input tf.Tensor to the network.

  • name (str) – Network variable scope.

Returns

The output of Continuous MLP QFunction.

Return type

tf.Tensor

clone(name)

Return a clone of the Q-function.

It copies the configuration of the primitive and also the parameters.

Parameters

name (str) – Name of the newly created q-function.

Returns

A new instance with same arguments.

Return type

ContinuousMLPQFunction

network_input_spec()

Network input spec.

Returns

List of key(str) for the network inputs.

Return type

list[str]

network_output_spec()

Network output spec.

Returns

List of key(str) for the network outputs.

Return type

list[str]

reset(do_resets=None)

Reset the module.

This only affects recurrent modules, and do_resets only affects vectorized modules.

For a vectorized module, do_resets is an array of booleans indicating which internal states are to be reset. The length of do_resets should equal the length of inputs.

Parameters

do_resets (numpy.ndarray) – Bool array indicating which states to be reset.

terminate()

Clean up operation.

get_trainable_vars()

Get trainable variables.

Returns

A list of trainable variables in the current variable scope.

Return type

List[tf.Variable]

get_global_vars()

Get global variables.

Returns

A list of global variables in the current variable scope.

Return type

List[tf.Variable]

get_regularizable_vars()

Get all network weight variables in the current scope.

Returns

A list of network weight variables in the current variable scope.

Return type

List[tf.Variable]

get_params()

Get the trainable variables.

Returns

A list of trainable variables in the current variable scope.

Return type

List[tf.Variable]

get_param_shapes()

Get parameter shapes.

Returns

A list of variable shapes.

Return type

List[tuple]

get_param_values()

Get param values.

Returns

Values of the parameters evaluated in the current session.

Return type

np.ndarray

set_param_values(param_values)

Set param values.

Parameters

param_values (np.ndarray) – A numpy array of parameter values.

flat_to_params(flattened_params)

Unflatten tensors according to their respective shapes.

Parameters

flattened_params (np.ndarray) – A numpy array of flattened params.

Returns

A list of parameters reshaped to the shapes specified.

Return type

List[np.ndarray]

class DiscreteCNNQFunction(env_spec, filters, strides, hidden_sizes=(256,), name=None, padding='SAME', max_pooling=False, pool_strides=(2, 2), pool_shapes=(2, 2), cnn_hidden_nonlinearity=tf.nn.relu, hidden_nonlinearity=tf.nn.relu, hidden_w_init=tf.initializers.glorot_uniform(seed=deterministic.get_tf_seed_stream()), hidden_b_init=tf.zeros_initializer(), output_nonlinearity=None, output_w_init=tf.initializers.glorot_uniform(seed=deterministic.get_tf_seed_stream()), output_b_init=tf.zeros_initializer(), dueling=False, layer_normalization=False)

Bases: garage.tf.models.Sequential

Q function based on a CNN-MLP structure for discrete action space.

This class implements a Q-value network that predicts Q from the input state and action. It uses a CNN and an MLP to fit the function Q(s, a).

Parameters
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.

  • filters (Tuple[Tuple[int, Tuple[int, int]], ...]) – Number and dimension of filters. For example, ((3, (3, 5)), (32, (3, 3))) means there are two convolutional layers. The filter for the first layer has 3 channels and its shape is (3 x 5), while the filter for the second layer has 32 channels and its shape is (3 x 3).

  • strides (tuple[int]) – The stride of the sliding window. For example, (1, 2) means there are two convolutional layers. The stride of the filter for the first layer is 1 and that of the second layer is 2.

  • hidden_sizes (list[int]) – Output dimension of dense layer(s). For example, (32, 32) means the MLP of this q-function consists of two hidden layers, each with 32 hidden units.

  • name (str) – Variable scope of the cnn.

  • padding (str) – The type of padding algorithm to use, either ‘SAME’ or ‘VALID’.

  • max_pooling (bool) – Boolean for using max pooling layer or not.

  • pool_shapes (tuple[int]) – Dimension of the pooling layer(s). For example, (2, 2) means that all the pooling layers have shape (2, 2).

  • pool_strides (tuple[int]) – The strides of the pooling layer(s). For example, (2, 2) means that all the pooling layers have strides (2, 2).

  • cnn_hidden_nonlinearity (callable) – Activation function for intermediate dense layer(s) in the CNN. It should return a tf.Tensor. Set it to None to maintain a linear activation.

  • hidden_nonlinearity (callable) – Activation function for intermediate dense layer(s) in the MLP. It should return a tf.Tensor. Set it to None to maintain a linear activation.

  • hidden_w_init (callable) – Initializer function for the weight of intermediate dense layer(s) in the MLP. The function should return a tf.Tensor.

  • hidden_b_init (callable) – Initializer function for the bias of intermediate dense layer(s) in the MLP. The function should return a tf.Tensor.

  • output_nonlinearity (callable) – Activation function for output dense layer in the MLP. It should return a tf.Tensor. Set it to None to maintain a linear activation.

  • output_w_init (callable) – Initializer function for the weight of output dense layer(s) in the MLP. The function should return a tf.Tensor.

  • output_b_init (callable) – Initializer function for the bias of output dense layer(s) in the MLP. The function should return a tf.Tensor.

  • dueling (bool) – Bool for using dueling network or not.

  • layer_normalization (bool) – Bool for using layer normalization or not.

property q_vals

Return the Q values, the output of the network.

Returns

Q values.

Return type

list[tf.Tensor]

property input

Get input.

Returns

QFunction Input.

Return type

tf.Tensor

property output

output of the model by default.

Type

tf.Tensor

property inputs

inputs of the model by default.

Type

tf.Tensor

property outputs

outputs of the model by default.

Type

tf.Tensor

property parameters

Parameters of the model.

Returns

Parameters

Return type

np.ndarray

property name

Name (str) of the model.

This is also the variable scope of the model.

Returns

Name of the model.

Return type

str

property state_info_specs

State info specification.

Returns

Keys and shapes for the information related to the module’s state when taking an action.

Return type

List[str]

property state_info_keys

State info keys.

Returns

Keys for the information related to the module’s state when taking an input.

Return type

List[str]

build(state_input, name)

Build the symbolic graph for q-network.

Parameters
  • state_input (tf.Tensor) – The state input tf.Tensor to the network.

  • name (str) – Network variable scope.

Returns

The tf.Tensor output of Discrete CNN QFunction.

Return type

tf.Tensor

clone(name)

Return a clone of the Q-function.

It copies the configuration of the primitive and also the parameters.

Parameters

name (str) – Name of the newly created q-function.

Returns

Clone of this object

Return type

garage.tf.q_functions.DiscreteCNNQFunction

network_input_spec()

Network input spec.

Returns

List of key(str) for the network inputs.

Return type

list[str]

network_output_spec()

Network output spec.

Returns

List of key(str) for the network outputs.

Return type

list[str]

reset(do_resets=None)

Reset the module.

This only affects recurrent modules, and do_resets only affects vectorized modules.

For a vectorized module, do_resets is an array of booleans indicating which internal states are to be reset. The length of do_resets should equal the length of inputs.

Parameters

do_resets (numpy.ndarray) – Bool array indicating which states to be reset.

terminate()

Clean up operation.

get_trainable_vars()

Get trainable variables.

Returns

A list of trainable variables in the current variable scope.

Return type

List[tf.Variable]

get_global_vars()

Get global variables.

Returns

A list of global variables in the current variable scope.

Return type

List[tf.Variable]

get_regularizable_vars()

Get all network weight variables in the current scope.

Returns

A list of network weight variables in the current variable scope.

Return type

List[tf.Variable]

get_params()

Get the trainable variables.

Returns

A list of trainable variables in the current variable scope.

Return type

List[tf.Variable]

get_param_shapes()

Get parameter shapes.

Returns

A list of variable shapes.

Return type

List[tuple]

get_param_values()

Get param values.

Returns

Values of the parameters evaluated in the current session.

Return type

np.ndarray

set_param_values(param_values)

Set param values.

Parameters

param_values (np.ndarray) – A numpy array of parameter values.

flat_to_params(flattened_params)

Unflatten tensors according to their respective shapes.

Parameters

flattened_params (np.ndarray) – A numpy array of flattened params.

Returns

A list of parameters reshaped to the shapes specified.

Return type

List[np.ndarray]

class DiscreteMLPQFunction(env_spec, name=None, hidden_sizes=(32, 32), hidden_nonlinearity=tf.nn.relu, hidden_w_init=tf.initializers.glorot_uniform(seed=deterministic.get_tf_seed_stream()), hidden_b_init=tf.zeros_initializer(), output_nonlinearity=None, output_w_init=tf.initializers.glorot_uniform(seed=deterministic.get_tf_seed_stream()), output_b_init=tf.zeros_initializer(), layer_normalization=False)

Bases: garage.tf.models.MLPModel

Discrete MLP Q Function.

This class implements a Q-value network. It predicts Q-value based on the input state and action. It uses an MLP to fit the function Q(s, a).

Parameters
  • env_spec (EnvSpec) – Environment specification.

  • name (str) – Name of the q-function, also serves as the variable scope.

  • hidden_sizes (list[int]) – Output dimension of dense layer(s). For example, (32, 32) means the MLP of this q-function consists of two hidden layers, each with 32 hidden units.

  • hidden_nonlinearity (callable) – Activation function for intermediate dense layer(s). It should return a tf.Tensor. Set it to None to maintain a linear activation.

  • hidden_w_init (callable) – Initializer function for the weight of intermediate dense layer(s). The function should return a tf.Tensor.

  • hidden_b_init (callable) – Initializer function for the bias of intermediate dense layer(s). The function should return a tf.Tensor.

  • output_nonlinearity (callable) – Activation function for output dense layer. It should return a tf.Tensor. Set it to None to maintain a linear activation.

  • output_w_init (callable) – Initializer function for the weight of output dense layer(s). The function should return a tf.Tensor.

  • output_b_init (callable) – Initializer function for the bias of output dense layer(s). The function should return a tf.Tensor.

  • layer_normalization (bool) – Bool for using layer normalization.
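
A discrete Q-function typically produces one Q value per action for each observation, and a greedy policy picks the action with the largest value. The sketch below shows that selection on a plain nested list standing in for evaluated Q values; the function name greedy_actions and the numbers are made up for illustration and are not part of garage.

```python
def greedy_actions(q_values):
    """Return the index of the highest Q value in each row.

    q_values is a batch of rows, one row per observation and one
    entry per discrete action. Ties resolve to the lowest index.
    """
    actions = []
    for row in q_values:
        best = max(range(len(row)), key=row.__getitem__)
        actions.append(best)
    return actions


# Two observations, three discrete actions each (illustrative numbers).
batch_q = [[0.1, 0.7, 0.2], [1.5, -0.3, 0.9]]
actions = greedy_actions(batch_q)
print(actions)
```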

property q_vals

Return the Q values, the output of the network.

Returns

Q values.

Return type

list[tf.Tensor]

property input

Get input.

Returns

QFunction Input.

Return type

tf.Tensor

property parameters

Parameters of the model.

Returns

Parameters

Return type

np.ndarray

property name

Name (str) of the model.

This is also the variable scope of the model.

Returns

Name of the model.

Return type

str

property output

Default output of the model.

When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the output of the network.

Returns

Default output of the model.

Return type

tf.Tensor

property inputs

Default inputs of the model.

When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the inputs of the network.

Returns

Default inputs of the model.

Return type

list[tf.Tensor]

property outputs

Default outputs of the model.

When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the outputs of the network.

Returns

Default outputs of the model.

Return type

list[tf.Tensor]

property state_info_specs

State info specification.

Returns

Keys and shapes for the information related to the module’s state when taking an action.

Return type

List[str]

property state_info_keys

State info keys.

Returns

Keys for the information related to the module’s state when taking an input.

Return type

List[str]

build(state_input, name)

Build the symbolic graph for q-network.

Parameters
  • state_input (tf.Tensor) – The state input tf.Tensor to the network.

  • name (str) – Network variable scope.

Returns

The tf.Tensor output of Discrete MLP QFunction.

Return type

tf.Tensor

clone(name)

Return a clone of the Q-function.

It copies the configuration of the primitive and also the parameters.

Parameters

name (str) – Name of the newly created q-function.

Returns

Clone of this object

Return type

garage.tf.q_functions.DiscreteMLPQFunction

network_input_spec()

Network input spec.

Returns

List of key(str) for the network inputs.

Return type

list[str]

network_output_spec()

Network output spec.

Returns

List of key(str) for the network outputs.

Return type

list[str]

reset(do_resets=None)

Reset the module.

This only affects recurrent modules, and do_resets only affects vectorized modules.

For a vectorized module, do_resets is an array of booleans indicating which internal states are to be reset. The length of do_resets should equal the length of inputs.

Parameters

do_resets (numpy.ndarray) – Bool array indicating which states to be reset.

terminate()

Clean up operation.

get_trainable_vars()

Get trainable variables.

Returns

A list of trainable variables in the current variable scope.

Return type

List[tf.Variable]

get_global_vars()

Get global variables.

Returns

A list of global variables in the current variable scope.

Return type

List[tf.Variable]

get_regularizable_vars()

Get all network weight variables in the current scope.

Returns

A list of network weight variables in the current variable scope.

Return type

List[tf.Variable]

get_params()

Get the trainable variables.

Returns

A list of trainable variables in the current variable scope.

Return type

List[tf.Variable]

get_param_shapes()

Get parameter shapes.

Returns

A list of variable shapes.

Return type

List[tuple]

get_param_values()

Get param values.

Returns

Values of the parameters evaluated in the current session.

Return type

np.ndarray

set_param_values(param_values)

Set param values.

Parameters

param_values (np.ndarray) – A numpy array of parameter values.

flat_to_params(flattened_params)

Unflatten tensors according to their respective shapes.

Parameters

flattened_params (np.ndarray) – A numpy array of flattened params.

Returns

A list of parameters reshaped to the shapes specified.

Return type

List[np.ndarray]

class DiscreteMLPDuelingQFunction(env_spec, name=None, hidden_sizes=(32, 32), hidden_nonlinearity=tf.nn.relu, hidden_w_init=tf.initializers.glorot_uniform(), hidden_b_init=tf.zeros_initializer(), output_nonlinearity=None, output_w_init=tf.initializers.glorot_uniform(), output_b_init=tf.zeros_initializer(), layer_normalization=False)

Bases: garage.tf.models.MLPDuelingModel

Discrete Q Function with dueling MLP network.

This class implements a Q-value network. It predicts Q-value based on the input state and action. It uses an MLP to fit the function Q(s, a).

Parameters
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.

  • name (str) – Name of the q-function, also serves as the variable scope.

  • hidden_sizes (list[int]) – Output dimension of dense layer(s). For example, (32, 32) means the MLP of this q-function consists of two hidden layers, each with 32 hidden units.

  • hidden_nonlinearity (callable) – Activation function for intermediate dense layer(s). It should return a tf.Tensor. Set it to None to maintain a linear activation.

  • hidden_w_init (callable) – Initializer function for the weight of intermediate dense layer(s). The function should return a tf.Tensor.

  • hidden_b_init (callable) – Initializer function for the bias of intermediate dense layer(s). The function should return a tf.Tensor.

  • output_nonlinearity (callable) – Activation function for output dense layer. It should return a tf.Tensor. Set it to None to maintain a linear activation.

  • output_w_init (callable) – Initializer function for the weight of output dense layer(s). The function should return a tf.Tensor.

  • output_b_init (callable) – Initializer function for the bias of output dense layer(s). The function should return a tf.Tensor.

  • layer_normalization (bool) – Bool for using layer normalization.
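
A dueling network splits the estimate into a state value and per-action advantages and then recombines them. The sketch below assumes the common mean-subtracted aggregation Q(s, a) = V(s) + A(s, a) - mean(A(s, ·)); whether MLPDuelingModel uses exactly this form is not stated in this reference, so treat it as an illustration. The function name dueling_q_values and the numbers are hypothetical.

```python
def dueling_q_values(state_value, advantages):
    """Combine a scalar state value with per-action advantages.

    Assumes the mean-subtracted dueling aggregation:
    Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, .)
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + (adv - mean_advantage) for adv in advantages]


# One state with value 2.0 and three actions (illustrative numbers).
q_values = dueling_q_values(2.0, [1.0, 0.0, -1.0])
print(q_values)
```

Subtracting the mean advantage keeps the average of the Q values equal to the state value, which makes the value and advantage streams identifiable.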

property q_vals

Return the Q values, the output of the network.

Returns

Q values.

Return type

list[tf.Tensor]

property input

Get input.

Returns

QFunction Input.

Return type

tf.Tensor

property parameters

Parameters of the model.

Returns

Parameters

Return type

np.ndarray

property name

Name (str) of the model.

This is also the variable scope of the model.

Returns

Name of the model.

Return type

str

property output

Default output of the model.

When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the output of the network.

Returns

Default output of the model.

Return type

tf.Tensor

property inputs

Default inputs of the model.

When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the inputs of the network.

Returns

Default inputs of the model.

Return type

list[tf.Tensor]

property outputs

Default outputs of the model.

When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the outputs of the network.

Returns

Default outputs of the model.

Return type

list[tf.Tensor]

property state_info_specs

State info specification.

Returns

Keys and shapes for the information related to the module’s state when taking an action.

Return type

List[str]

property state_info_keys

State info keys.

Returns

Keys for the information related to the module’s state when taking an input.

Return type

List[str]

build(state_input, name)

Build the symbolic graph for q-network.

Parameters
  • state_input (tf.Tensor) – The state input tf.Tensor to the network.

  • name (str) – Network variable scope.

Returns

The tf.Tensor output of Discrete MLP QFunction.

Return type

tf.Tensor

clone(name)

Return a clone of the Q-function.

It copies the configuration of the primitive and also the parameters.

Parameters

name (str) – Name of the newly created q-function.

Returns

Clone of this object

Return type

garage.tf.q_functions.DiscreteMLPDuelingQFunction

network_input_spec()

Network input spec.

Returns

List of key(str) for the network inputs.

Return type

list[str]

network_output_spec()

Network output spec.

Returns

List of key(str) for the network outputs.

Return type

list[str]

reset(do_resets=None)

Reset the module.

This only affects recurrent modules, and do_resets only affects vectorized modules.

For a vectorized module, do_resets is an array of booleans indicating which internal states are to be reset. The length of do_resets should equal the length of inputs.

Parameters

do_resets (numpy.ndarray) – Bool array indicating which states to be reset.

terminate()

Clean up operation.

get_trainable_vars()

Get trainable variables.

Returns

A list of trainable variables in the current variable scope.

Return type

List[tf.Variable]

get_global_vars()

Get global variables.

Returns

A list of global variables in the current variable scope.

Return type

List[tf.Variable]

get_regularizable_vars()

Get all network weight variables in the current scope.

Returns

A list of network weight variables in the current variable scope.

Return type

List[tf.Variable]

get_params()

Get the trainable variables.

Returns

A list of trainable variables in the current variable scope.

Return type

List[tf.Variable]

get_param_shapes()

Get parameter shapes.

Returns

A list of variable shapes.

Return type

List[tuple]

get_param_values()

Get param values.

Returns

Values of the parameters evaluated in the current session.

Return type

np.ndarray

set_param_values(param_values)

Set param values.

Parameters

param_values (np.ndarray) – A numpy array of parameter values.

flat_to_params(flattened_params)

Unflatten tensors according to their respective shapes.

Parameters

flattened_params (np.ndarray) – A numpy array of flattened params.

Returns

A list of parameters reshaped to the shapes specified.

Return type

List[np.ndarray]