garage.tf.baselines.continuous_mlp_baseline
¶
A value function (baseline) based on an MLP model.
-
class
ContinuousMLPBaseline
(env_spec, num_seq_inputs=1, name='ContinuousMLPBaseline', hidden_sizes=(32, 32), hidden_nonlinearity=tf.nn.tanh, hidden_w_init=tf.initializers.glorot_uniform(seed=deterministic.get_tf_seed_stream()), hidden_b_init=tf.zeros_initializer(), output_nonlinearity=None, output_w_init=tf.initializers.glorot_uniform(seed=deterministic.get_tf_seed_stream()), output_b_init=tf.zeros_initializer(), optimizer=None, optimizer_args=None, normalize_inputs=True)¶ Bases:
garage.tf.models.NormalizedInputMLPModel, garage.np.baselines.Baseline
A value function using an MLP network.
It fits the input data to the target outputs by regression; with the default output_nonlinearity=None, the output layer is linear.
Parameters: - env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
- num_seq_inputs (float) – Number of sequences per input. The default is 1, which means a single sequence.
- name (str) – Name of baseline.
- hidden_sizes (list[int]) – Output dimensions of the dense layer(s) of the MLP. For example, (32, 32) means the MLP consists of two hidden layers, each with 32 hidden units.
- hidden_nonlinearity (Callable) – Activation function for intermediate dense layer(s). It should return a tf.Tensor. Set it to None to maintain a linear activation.
- hidden_w_init (Callable) – Initializer function for the weight of intermediate dense layer(s). The function should return a tf.Tensor.
- hidden_b_init (Callable) – Initializer function for the bias of intermediate dense layer(s). The function should return a tf.Tensor.
- output_nonlinearity (Callable) – Activation function for output dense layer. It should return a tf.Tensor. Set it to None to maintain a linear activation.
- output_w_init (Callable) – Initializer function for the weight of output dense layer(s). The function should return a tf.Tensor.
- output_b_init (Callable) – Initializer function for the bias of output dense layer(s). The function should return a tf.Tensor.
- optimizer (garage.tf.Optimizer) – Optimizer for minimizing the negative log-likelihood.
- optimizer_args (dict) – Arguments for the optimizer. Default is None, which means no arguments.
- normalize_inputs (bool) – Whether to normalize the inputs.
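To make the effect of hidden_sizes concrete, the following is a minimal sketch that lists the weight and bias shapes of a plain MLP. The helper `mlp_layer_shapes`, the input dimension of 4, and the single scalar output are assumptions made for illustration; they are not part of garage.

```python
def mlp_layer_shapes(input_dim, hidden_sizes=(32, 32), output_dim=1):
    """Return (weight_shape, bias_shape) for each dense layer of a plain MLP.

    Hypothetical helper for illustration only; it is not a garage function.
    """
    shapes = []
    fan_in = input_dim
    # Hidden layers first, then the output layer.
    for width in tuple(hidden_sizes) + (output_dim,):
        shapes.append(((fan_in, width), (width,)))
        fan_in = width
    return shapes


# Assumed example: a 4-dimensional observation and a scalar value output.
shapes = mlp_layer_shapes(input_dim=4)
n_params = sum(w[0] * w[1] + b[0] for w, b in shapes)
```

With the default hidden_sizes=(32, 32), this sketch yields two hidden layers followed by one output layer.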
-
env_spec
¶ Policy environment specification.
Returns: Environment specification. Return type: garage.EnvSpec
-
parameters
¶ Parameters of the model.
Returns: Parameters Return type: np.ndarray
-
name
¶ Name (str) of the model.
This is also the variable scope of the model.
Returns: Name of the model. Return type: str
-
input
¶ Default input of the model.
When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the input of the network.
Returns: Default input of the model. Return type: tf.Tensor
-
output
¶ Default output of the model.
When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the output of the network.
Returns: Default output of the model. Return type: tf.Tensor
-
inputs
¶ Default inputs of the model.
When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the inputs of the network.
Returns: Default inputs of the model. Return type: list[tf.Tensor]
-
outputs
¶ Default outputs of the model.
When the model is built the first time, by default it creates the ‘default’ network. This property creates a reference to the outputs of the network.
Returns: Default outputs of the model. Return type: list[tf.Tensor]
-
state_info_specs
¶ State info specification.
Returns: Keys and shapes for the information related to the module’s state when taking an action.
Return type: List[str]
-
state_info_keys
¶ State info keys.
Returns: Keys for the information related to the module’s state when taking an input.
Return type: List[str]
-
fit
(self, paths)¶ Fit regressor based on paths.
Parameters: paths (dict[numpy.ndarray]) – Sample paths.
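The layout of paths is not spelled out here. As a rough sketch, assume each path is a dict of arrays with the hypothetical keys `observations` and `returns`, and that several paths are passed together as a list. Under those assumptions, the regression inputs and targets could be assembled like this (`paths_to_regression_data` is an illustrative helper, not a garage function):

```python
import numpy as np


def paths_to_regression_data(paths):
    """Stack per-path arrays into regression inputs and column-vector targets.

    Assumes each path has 'observations' of shape (T, obs_dim) and
    'returns' of shape (T,). Both key names are assumptions.
    """
    xs = np.concatenate([p["observations"] for p in paths])
    ys = np.concatenate([p["returns"] for p in paths]).reshape((-1, 1))
    return xs, ys


# Two made-up paths with 3 and 2 time steps and 4-dimensional observations.
paths = [
    {"observations": np.zeros((3, 4)), "returns": np.array([3.0, 2.0, 1.0])},
    {"observations": np.ones((2, 4)), "returns": np.array([2.0, 1.0])},
]
xs, ys = paths_to_regression_data(paths)
```

The baseline would then be fitted so that its prediction for each row of `xs` approximates the corresponding row of `ys`.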
-
predict
(self, paths)¶ Predict value based on paths.
Parameters: paths (dict[numpy.ndarray]) – Sample paths. Returns: Predicted value. Return type: numpy.ndarray
-
network_output_spec
(self)¶ Network output spec.
Returns: List of key(str) for the network outputs. Return type: list[str]
-
build
(self, *inputs, name=None)¶ Build a Network with the given input(s).
Note: Do not call tf.global_variables_initializer() after building a model, as it will reassign random weights to the model. The parameters inside a model are initialized when build() is called.
All Networks use the same fixed variable scope to ensure parameter sharing. Each Network must have a unique name.
Raises: ValueError – When a Network with the same name is already built.
Returns: Output tensors of the model with the given inputs.
Return type: list[tf.Tensor]
-
network_input_spec
(self)¶ Network input spec.
Returns: List of key(str) for the network inputs. Return type: list[str]
-
reset
(self, do_resets=None)¶ Reset the module.
This only affects recurrent modules. do_resets only affects vectorized modules.
For a vectorized module, do_resets is an array of booleans indicating which internal states should be reset. The length of do_resets should be equal to the length of inputs.
Parameters: do_resets (numpy.ndarray) – Bool array indicating which states to be reset.
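The masking semantics of do_resets can be sketched with NumPy for a hypothetical vectorized recurrent module that keeps one state row per input. The helper `reset_states`, the zero initial state, and the choice to reset every row when do_resets is None are assumptions made for this illustration.

```python
import numpy as np


def reset_states(states, do_resets=None, init_value=0.0):
    """Return a copy of states with the rows selected by do_resets reset.

    Hypothetical helper; resetting everything when do_resets is None
    is an assumption of this sketch.
    """
    if do_resets is None:
        do_resets = np.ones(len(states), dtype=bool)
    do_resets = np.asarray(do_resets, dtype=bool)
    if len(do_resets) != len(states):
        raise ValueError("do_resets must have one entry per input")
    new_states = states.copy()
    new_states[do_resets] = init_value  # boolean mask selects rows to reset
    return new_states


states = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
after = reset_states(states, do_resets=[True, False, True])
```

Only the first and third rows are reset here; the second row keeps its state.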
-
terminate
(self)¶ Clean up operation.
-
get_trainable_vars
(self)¶ Get trainable variables.
Returns: A list of trainable variables in the current variable scope.
Return type: List[tf.Variable]
-
get_global_vars
(self)¶ Get global variables.
Returns: A list of global variables in the current variable scope.
Return type: List[tf.Variable]
-
get_regularizable_vars
(self)¶ Get all network weight variables in the current scope.
Returns: A list of network weight variables in the current variable scope.
Return type: List[tf.Variable]
-
get_params
(self)¶ Get the trainable variables.
Returns: A list of trainable variables in the current variable scope.
Return type: List[tf.Variable]
-
get_param_shapes
(self)¶ Get parameter shapes.
Returns: A list of variable shapes. Return type: List[tuple]
-
get_param_values
(self)¶ Get param values.
Returns: Values of the parameters evaluated in the current session.
Return type: np.ndarray
-
set_param_values
(self, param_values)¶ Set param values.
Parameters: param_values (np.ndarray) – A numpy array of parameter values.
-
flat_to_params
(self, flattened_params)¶ Unflatten tensors according to their respective shapes.
Parameters: flattened_params (np.ndarray) – A numpy array of flattened params. Returns: A list of parameters reshaped to the shapes specified.
Return type: List[np.ndarray]
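As a rough illustration of how a flat parameter vector relates to a list of shaped arrays, here is a NumPy-only sketch. The helpers `flatten_params` and `unflatten_params` and the example shapes are hypothetical; they show the general technique and are not garage's implementation.

```python
import numpy as np


def flatten_params(params):
    """Concatenate a list of arrays into one flat vector."""
    return np.concatenate([np.asarray(p).ravel() for p in params])


def unflatten_params(flattened_params, shapes):
    """Split a flat vector and reshape each chunk to the given shapes."""
    sizes = [int(np.prod(shape)) for shape in shapes]
    split_points = np.cumsum(sizes)[:-1]  # boundaries between chunks
    chunks = np.split(flattened_params, split_points)
    return [chunk.reshape(shape) for chunk, shape in zip(chunks, shapes)]


# Made-up shapes: two dense layers with their biases (12 + 3 + 3 + 1 values).
shapes = [(4, 3), (3,), (3, 1), (1,)]
flat = np.arange(19, dtype=float)
params = unflatten_params(flat, shapes)
round_trip = flatten_params(params)
```

Flattening the result again recovers the original vector, which is the property that makes it possible to move between a single array of parameter values and per-variable arrays.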