garage.np.policies.base module

Base class for policies based on numpy.

class Policy(env_spec)[source]

Bases: abc.ABC

Base class for policies based on numpy.

Parameters: env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
action_space

The action space for the environment.

Type: akro.Space
get_action(observation)[source]

Get action sampled from the policy.

Parameters: observation (np.ndarray) – Observation from the environment.
Returns: Action sampled from the policy.
Return type: np.ndarray
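
Example (a minimal sketch, not part of garage): the hypothetical subclass below implements get_action by sampling uniformly from the environment's action space. It assumes that overriding get_action is enough for a minimal subclass, and that the akro action space provides a sample() method, as gym spaces do.

    import numpy as np

    from garage.np.policies.base import Policy


    class UniformRandomPolicy(Policy):
        """Hypothetical policy that ignores observations and acts randomly."""

        def get_action(self, observation):
            # Sample uniformly from the environment's action space
            # (self.action_space is the documented akro.Space attribute).
            return np.asarray(self.action_space.sample())

Constructed as UniformRandomPolicy(env_spec) with the environment's EnvSpec, such a policy can serve as a simple exploration or debugging baseline.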
log_diagnostics(paths)[source]

Log extra information per iteration based on the collected paths.

Parameters: paths (list[dict]) – A list of collected paths.
observation_space

The observation space of the environment.

Type: akro.Space
recurrent

Indicate whether the policy is recurrent.

Returns: True if the policy is recurrent, False otherwise.
Return type: bool
reset(dones=None)[source]

Reset the policy.

If dones is None, it defaults to np.array([True]), which implies the policy is not “vectorized”, i.e. the number of parallel environments used for training data sampling is 1.

Parameters: dones (numpy.ndarray) – Boolean array that indicates terminal state(s).
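
As a hedged illustration of the vectorized semantics (hypothetical subclass, not part of garage), a policy with internal state might keep one hidden vector per parallel environment and clear only the entries flagged in dones:

    import numpy as np

    from garage.np.policies.base import Policy


    class StatefulPolicy(Policy):
        """Hypothetical policy with one hidden state per parallel environment."""

        def __init__(self, env_spec, hidden_dim=8):
            super().__init__(env_spec)
            self._hidden_dim = hidden_dim
            self._hidden = np.zeros((1, hidden_dim))

        def reset(self, dones=None):
            # Mirror the documented default: a single, non-vectorized
            # environment.
            if dones is None:
                dones = np.array([True])
            dones = np.asarray(dones, dtype=bool)
            # Resize the per-environment state if the number of parallel
            # environments changed, then zero only the states whose
            # environments just terminated.
            if self._hidden.shape[0] != len(dones):
                self._hidden = np.zeros((len(dones), self._hidden_dim))
            self._hidden[dones] = 0.0

        def get_action(self, observation):
            # Placeholder action rule; a real policy would use self._hidden.
            return self.action_space.sample()

Zeroing only the flagged entries lets a vectorized sampler reset individual environments without disturbing the hidden state kept for the others.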
state_info_keys

Get keys describing policy’s state.

Returns: Keys for the information related to the policy’s state when taking an action.
Return type: List[str]
terminate()[source]

Clean up operation.

class StochasticPolicy(env_spec)[source]

Bases: garage.np.policies.base.Policy

Base class for stochastic policies implemented in numpy.

dist_info(obs, state_infos)[source]

Return the distribution information about the actions.

Parameters:
  • obs (np.ndarray) – Observation values.
  • state_infos (dict) – A dictionary whose values should contain information about the state of the policy at the time it received the observation.
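
As a hedged sketch of this interface (not part of garage; the class and the returned keys are assumptions, following the common Gaussian convention of mean and log_std parameters), a linear Gaussian policy might implement dist_info like this:

    import numpy as np

    from garage.np.policies.base import StochasticPolicy


    class LinearGaussianPolicy(StochasticPolicy):
        """Hypothetical Gaussian policy with a fixed linear mean."""

        def __init__(self, env_spec, weights, log_std):
            super().__init__(env_spec)
            self._weights = weights    # shape: (obs_dim, action_dim)
            self._log_std = log_std    # shape: (action_dim,)

        def get_action(self, observation):
            # Sample an action from N(mean, exp(log_std)^2).
            mean = observation @ self._weights
            noise = np.exp(self._log_std) * np.random.randn(*mean.shape)
            return mean + noise

        def dist_info(self, obs, state_infos):
            # Assumed convention: return the distribution parameters for each
            # observation as a dict; the exact keys depend on the distribution
            # the policy uses.
            means = obs @ self._weights
            log_stds = np.broadcast_to(self._log_std, means.shape)
            return dict(mean=means, log_std=log_stds)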
distribution

Get the distribution of the policy.

Returns: The distribution of the policy.
Return type: garage.tf.distribution