garage.np.policies.base module

Base class for policies based on numpy.

class Policy(env_spec)[source]

Bases: abc.ABC

Base class for policies based on numpy.

Parameters: env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
action_space

The action space for the environment.

Type: akro.Space
get_action(observation)[source]

Get action sampled from the policy.

Parameters: observation (np.ndarray) – Observation from the environment.
Returns: Action sampled from the policy.
Return type: np.ndarray
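
Example (a minimal sketch, not part of garage): the hypothetical subclass below implements get_action by sampling uniformly from the environment's action space. It assumes that overriding get_action is enough for a minimal subclass, and that the akro action space provides a sample() method, as gym spaces do.

    import numpy as np

    from garage.np.policies.base import Policy


    class UniformRandomPolicy(Policy):
        """Hypothetical policy that ignores observations and acts randomly."""

        def get_action(self, observation):
            # Sample uniformly from the environment's action space
            # (self.action_space is the documented akro.Space attribute).
            return np.asarray(self.action_space.sample())

Constructed as UniformRandomPolicy(env_spec) with the environment's EnvSpec, such a policy can serve as a simple exploration or debugging baseline.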
log_diagnostics(paths)[source]

Log extra information per iteration based on the collected paths.

Parameters: paths (list[dict]) – A list of collected paths.
observation_space

The observation space of the environment.

Type: akro.Space
recurrent

Indicate whether the policy is recurrent.

Returns: True if the policy is recurrent, False otherwise.
Return type: bool
reset(dones=None)[source]

Reset the policy.

If dones is None, it defaults to np.array([True]), which implies the policy is not “vectorized”, i.e. the number of parallel environments used for training data sampling is 1.

Parameters: dones (numpy.ndarray) – Boolean array that indicates terminal state(s).
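
As a hedged illustration of the vectorized semantics (hypothetical subclass, not part of garage), a policy with internal state might keep one hidden vector per parallel environment and clear only the entries flagged in dones:

    import numpy as np

    from garage.np.policies.base import Policy


    class StatefulPolicy(Policy):
        """Hypothetical policy with one hidden state per parallel environment."""

        def __init__(self, env_spec, hidden_dim=8):
            super().__init__(env_spec)
            self._hidden_dim = hidden_dim
            self._hidden = np.zeros((1, hidden_dim))

        def reset(self, dones=None):
            # Mirror the documented default: a single, non-vectorized
            # environment.
            if dones is None:
                dones = np.array([True])
            dones = np.asarray(dones, dtype=bool)
            # Resize the per-environment state if the number of parallel
            # environments changed, then zero only the states whose
            # environments just terminated.
            if self._hidden.shape[0] != len(dones):
                self._hidden = np.zeros((len(dones), self._hidden_dim))
            self._hidden[dones] = 0.0

        def get_action(self, observation):
            # Placeholder action rule; a real policy would use self._hidden.
            return self.action_space.sample()

Zeroing only the flagged entries lets a vectorized sampler reset individual environments without disturbing the hidden state kept for the others.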
state_info_keys

Get keys describing policy’s state.

Returns: Keys for the information related to the policy’s state when taking an action.
Return type: List[str]
terminate()[source]

Clean up operation.

class StochasticPolicy(env_spec)[source]

Bases: garage.np.policies.base.Policy

Base class for stochastic policies implemented in numpy.

dist_info(obs, state_infos)[source]

Return the distribution information about the actions.

Parameters:
  • obs (np.ndarray) – Observation values.
  • state_infos (dict) – A dictionary whose values should contain information about the state of the policy at the time it received the observation.
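
As a hedged sketch of this interface (not part of garage; the class and the returned keys are assumptions, following the common Gaussian convention of mean and log_std parameters), a linear Gaussian policy might implement dist_info like this:

    import numpy as np

    from garage.np.policies.base import StochasticPolicy


    class LinearGaussianPolicy(StochasticPolicy):
        """Hypothetical Gaussian policy with a fixed linear mean."""

        def __init__(self, env_spec, weights, log_std):
            super().__init__(env_spec)
            self._weights = weights    # shape: (obs_dim, action_dim)
            self._log_std = log_std    # shape: (action_dim,)

        def get_action(self, observation):
            # Sample an action from N(mean, exp(log_std)^2).
            mean = observation @ self._weights
            noise = np.exp(self._log_std) * np.random.randn(*mean.shape)
            return mean + noise

        def dist_info(self, obs, state_infos):
            # Assumed convention: return the distribution parameters for each
            # observation as a dict; the exact keys depend on the distribution
            # the policy uses.
            means = obs @ self._weights
            log_stds = np.broadcast_to(self._log_std, means.shape)
            return dict(mean=means, log_std=log_stds)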
distribution

Get the distribution of the policy.

Returns: The distribution of the policy.
Return type: garage.tf.distribution