garage.tf.policies.base module¶
Base class for Policies.
class Policy(name, env_spec)[source]¶
Bases: abc.ABC

Base class for Policies.

Parameters:
- name (str) – Policy name, also the variable scope.
- env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
action_space¶
The action space for the environment.

Type: akro.Space
env_spec¶
Policy environment specification.

Type: garage.EnvSpec
flat_to_params(flattened_params, **tags)[source]¶
Unflatten tensors according to their respective shapes.

Parameters:
- flattened_params (np.ndarray) – A numpy array of flattened params.
- tags (dict) – A map specifying the parameters and their shapes.

Returns: A list of parameters reshaped to the shapes specified.
Return type: tensors (List[np.ndarray])
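As a rough illustration of what unflattening does, the sketch below splits a flat parameter vector back into shaped arrays. It is not the real implementation: the actual method takes a `**tags` map, while this hypothetical helper takes an explicit list of shapes.

```python
import numpy as np

def flat_to_params(flattened_params, shapes):
    # Walk the flat vector, carving out one chunk per shape and
    # reshaping it back to the original tensor layout.
    params, offset = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        params.append(flattened_params[offset:offset + size].reshape(shape))
        offset += size
    return params

# Round trip: flatten two parameter tensors, then recover them.
w = np.arange(6.0).reshape(2, 3)
b = np.array([1.0, 2.0, 3.0])
flat = np.concatenate([w.ravel(), b.ravel()])
w2, b2 = flat_to_params(flat, [(2, 3), (3,)])
```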
get_action(observation)[source]¶
Get action sampled from the policy.

Parameters: observation (np.ndarray) – Observation from the environment.
Returns: Action sampled from the policy.
Return type: np.ndarray
get_actions(observations)[source]¶
Get actions sampled from the policy.

Parameters: observations (list[np.ndarray]) – Observations from the environment.
Returns: Actions sampled from the policy.
Return type: np.ndarray
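To make the single/batch contract concrete, here is a toy stand-in (not the real garage Policy class) that samples uniform random actions from a box-shaped action space; `get_actions` simply maps `get_action` over a list of observations.

```python
import numpy as np

# Toy illustration only: a "policy" that ignores observations and
# samples uniform random actions from a box-shaped action space.
class RandomPolicy:
    def __init__(self, low, high):
        self.low = np.asarray(low, dtype=np.float64)
        self.high = np.asarray(high, dtype=np.float64)

    def get_action(self, observation):
        # One action for one observation.
        return np.random.uniform(self.low, self.high)

    def get_actions(self, observations):
        # One action per observation, stacked into a single array.
        return np.stack([self.get_action(obs) for obs in observations])

policy = RandomPolicy(low=[-1.0, -1.0], high=[1.0, 1.0])
action = policy.get_action(np.zeros(4))
actions = policy.get_actions([np.zeros(4) for _ in range(3)])
```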
get_global_vars()[source]¶
Get global variables.

Returns: A list of global variables in the current variable scope.
Return type: List[tf.Variable]
get_param_values(**tags)[source]¶
Get param values.

Parameters: tags (dict) – A map of parameters for which the values are required.
Returns: Values of the parameters evaluated in the current session.
Return type: param_values (np.ndarray)
get_params(trainable=True)[source]¶
Get the trainable variables.

Parameters: trainable (bool) – Whether to restrict the result to trainable variables. Defaults to True.
Returns: A list of trainable variables in the current variable scope.
Return type: List[tf.Variable]
get_trainable_vars()[source]¶
Get trainable variables.

Returns: A list of trainable variables in the current variable scope.
Return type: List[tf.Variable]
observation_space¶
The observation space of the environment.

Type: akro.Space
reset(dones=None)[source]¶
Reset the policy.

If dones is None, it defaults to np.array([True]), which implies the policy is not “vectorized”, i.e. the number of parallel environments used for sampling training data is 1.

Parameters: dones (numpy.ndarray) – Bools that indicate which environments reached a terminal state.
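The vectorized-reset semantics can be sketched as follows for a hypothetical recurrent policy that keeps one hidden state per parallel environment: only the rows flagged in `dones` are cleared, so environments that are still running keep their state. The class and attribute names here are illustrative, not part of the garage API.

```python
import numpy as np

# Sketch of vectorized reset: clear hidden state only for environments
# whose `dones` flag is True (hypothetical recurrent-policy state).
class VectorizedStateSketch:
    def __init__(self, n_envs, hidden_dim):
        # Nonzero values so a reset is observable.
        self.hidden = np.ones((n_envs, hidden_dim))

    def reset(self, dones=None):
        if dones is None:
            # Default: a single, non-vectorized environment.
            dones = np.array([True])
        # Boolean mask selects the rows (environments) to clear.
        self.hidden[np.asarray(dones)] = 0.0

sketch = VectorizedStateSketch(n_envs=3, hidden_dim=2)
sketch.reset(dones=np.array([True, False, True]))
```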
set_param_values(param_values, name=None, **tags)[source]¶
Set param values.

Parameters:
- param_values (np.ndarray) – A numpy array of parameter values.
- tags (dict) – A map of parameters for which the values should be loaded.
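Together, get_param_values and set_param_values form a flat-vector round trip. The in-memory sketch below shows that contract; the real methods read and write TF variables in the current session, so the class and its storage here are stand-ins.

```python
import numpy as np

# In-memory sketch of the get/set round trip (the real methods operate
# on TF variables in the current session; these names are stand-ins).
class ParamStoreSketch:
    def __init__(self, params):
        self._params = [np.array(p, dtype=np.float64) for p in params]

    def get_param_values(self):
        # Flatten every parameter tensor into one 1-D vector.
        return np.concatenate([p.ravel() for p in self._params])

    def set_param_values(self, param_values):
        # Write the flat vector back into the stored tensors, in order.
        offset = 0
        for p in self._params:
            p[...] = param_values[offset:offset + p.size].reshape(p.shape)
            offset += p.size

store = ParamStoreSketch([np.zeros((2, 2)), np.zeros(2)])
store.set_param_values(np.arange(6.0))
values = store.get_param_values()
```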
state_info_keys¶
State info keys.

Returns: Keys for the information related to the policy’s state when taking an action.
Return type: List[str]
class StochasticPolicy(name, env_spec)[source]¶
Bases: garage.tf.policies.base.Policy

Base class for stochastic policies.
dist_info(obs, state_infos)[source]¶
Distribution info.

Return the distribution information about the actions.

Parameters:
- obs (tf.Tensor) – Observation values.
- state_infos (dict) – A dictionary whose values should contain information about the state of the policy at the time it received the observation.
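For intuition, a stochastic policy with a Gaussian action distribution would return the distribution's parameters (e.g. mean and log standard deviation) as its distribution info. The linear mean head and all names below are assumptions for illustration, not the garage implementation.

```python
import numpy as np

# Hypothetical Gaussian policy head: "distribution info" here is the
# per-observation mean and log-std of the action distribution.
def dist_info(obs, weights, log_std):
    mean = obs @ weights  # assumed linear mean head: (batch, action_dim)
    return dict(mean=mean, log_std=np.broadcast_to(log_std, mean.shape))

obs = np.ones((4, 3))           # batch of 4 observations, obs_dim 3
weights = np.full((3, 2), 0.5)  # obs_dim x action_dim
info = dist_info(obs, weights, log_std=np.zeros(2))
```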
dist_info_sym(obs_var, state_info_vars, name='dist_info_sym')[source]¶
Symbolic graph of the distribution.

Return the symbolic distribution information about the actions.

Parameters:
- obs_var (tf.Tensor) – Symbolic variable for observations.
- state_info_vars (dict) – A dictionary whose values should contain information about the state of the policy at the time it received the observation.
- name (str) – Name of the symbolic graph.
distribution¶
The distribution of the policy’s actions.