garage.np.policies.policy module¶
Base class for policies based on numpy.

class Policy(env_spec)[source]¶
Bases: abc.ABC
Base class for policies based on numpy.
Parameters: env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.

action_space¶
The action space for the environment.
Type: akro.Space

get_action(observation)[source]¶
Get action sampled from the policy.
Parameters: observation (np.ndarray) – Observation from the environment.
Returns: Action sampled from the policy.
Return type: np.ndarray
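
For illustration, a minimal concrete subclass might look like the following sketch. UniformRandomPolicy is a hypothetical name, not part of garage; the sketch assumes the base class stores env_spec and exposes the action_space property documented above, and that the akro space provides a gym-like sample() method.

    from garage.np.policies.policy import Policy


    class UniformRandomPolicy(Policy):
        """Hypothetical policy that ignores observations and acts randomly."""

        def get_action(self, observation):
            # Sample uniformly from the action space; the observation is
            # deliberately ignored in this sketch.
            return self.action_space.sample()

In a sampler, such a policy would typically be constructed from an environment's spec and queried once per step via get_action(observation).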

log_diagnostics(paths)[source]¶
Log extra information per iteration based on the collected paths.
Parameters: paths (list[dict]) – A list of collected paths.
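
As a hedged illustration only, a subclass might override log_diagnostics along these lines. The 'rewards' key and the print-based reporting are assumptions for illustration, not garage's logging API, and UniformRandomPolicy is the hypothetical subclass sketched above.

    import numpy as np


    class DiagnosticRandomPolicy(UniformRandomPolicy):
        """Hypothetical subclass that reports the average undiscounted return."""

        def log_diagnostics(self, paths):
            # Each collected path is assumed to be a dict of per-step arrays,
            # with a 'rewards' array used here purely for illustration.
            returns = [path['rewards'].sum() for path in paths]
            print('AverageReturn', np.mean(returns))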

observation_space¶
The observation space of the environment.
Type: akro.Space

reset(dones=None)[source]¶
Reset the policy.
If dones is None, it defaults to np.array([True]), which means the policy is not “vectorized”, i.e. the number of parallel environments used for training data sampling is 1.
Parameters: dones (numpy.ndarray) – Bools that indicate terminal state(s).
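
A usage sketch for vectorized sampling follows; the policy class is the hypothetical UniformRandomPolicy from above, and env_spec stands in for an EnvSpec obtained from an environment.

    import numpy as np

    n_envs = 4
    policy = UniformRandomPolicy(env_spec)  # env_spec: an EnvSpec, assumed available

    # Reset the policy for every parallel environment before sampling starts.
    policy.reset(dones=np.array([True] * n_envs))

    # Later, reset only the environments whose episodes just terminated.
    policy.reset(dones=np.array([False, True, False, False]))

    # With no argument, dones defaults to np.array([True]): a single,
    # non-vectorized environment.
    policy.reset()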

class StochasticPolicy(env_spec)[source]¶
Bases: garage.np.policies.policy.Policy
Base class for stochastic policies implemented in numpy.

dist_info(obs, state_infos)[source]¶
Return the distribution information about the actions.
Parameters:
- obs (np.ndarray) – observation values
- state_infos (dict) – a dictionary whose values should contain information about the state of the policy at the time it received the observation

distribution¶
Get the distribution of the policy.
Returns: The distribution of the policy.
Return type: garage.tf.distribution
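
To make the interface concrete, here is a hedged sketch of a StochasticPolicy subclass that uses a fixed, state-independent diagonal Gaussian over actions. The class name, the dist_info keys ('mean', 'log_std'), and the action_space.flat_dim attribute are assumptions for illustration; a complete subclass would also provide the distribution property documented above.

    import numpy as np

    from garage.np.policies.policy import StochasticPolicy


    class FixedGaussianPolicy(StochasticPolicy):
        """Hypothetical policy: a state-independent diagonal Gaussian."""

        def __init__(self, env_spec, std=1.0):
            super().__init__(env_spec)
            # flat_dim is assumed to give the flattened action dimensionality.
            self._dim = env_spec.action_space.flat_dim
            self._log_std = np.full(self._dim, np.log(std))

        def get_action(self, observation):
            # Sample an action from N(0, std^2) independently per dimension.
            return np.exp(self._log_std) * np.random.randn(self._dim)

        def dist_info(self, obs, state_infos):
            # Return the distribution parameters for a batch of observations;
            # the key names follow a common convention and are assumptions here.
            n = len(obs)
            return dict(mean=np.zeros((n, self._dim)),
                        log_std=np.tile(self._log_std, (n, 1)))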