garage.np.policies package

Policies which use NumPy as a numerical backend.

class FixedPolicy(env_spec, scripted_actions, agent_infos=None)[source]

Bases: garage.np.policies.policy.Policy

Policy that performs a fixed sequence of actions.

Parameters:
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
  • scripted_actions (list[np.ndarray] or np.ndarray) – Sequence of actions to perform.
  • agent_infos (list[dict[str, np.ndarray]] or None) – Sequence of agent_infos to produce.
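A minimal usage sketch follows; the Box spaces below are placeholders rather than part of this API, so substitute the spec of your own environment:

    import akro
    import numpy as np
    from garage.envs.env_spec import EnvSpec
    from garage.np.policies import FixedPolicy

    # Placeholder spaces for illustration only.
    env_spec = EnvSpec(observation_space=akro.Box(low=-1, high=1, shape=(2,)),
                       action_space=akro.Box(low=-1, high=1, shape=(1,)))

    # Replays the scripted actions in order, one per get_action() call.
    policy = FixedPolicy(env_spec,
                         scripted_actions=[np.array([0.1]), np.array([0.2])])
    policy.reset()
    action, agent_info = policy.get_action(np.zeros(2))  # observation is ignored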
get_action(observation)[source]

Get next action.

Parameters:observation (np.ndarray) – Ignored.
Raises:ValueError – If policy is currently vectorized (reset was called with more than one done value).
Returns:The action and agent_info for this time step.
Return type:tuple[np.ndarray, dict[str, np.ndarray]]
get_actions(observations)[source]

Get next actions.

Parameters:observations (np.ndarray) – Ignored.
Raises:ValueError – If observations has length greater than 1.
Returns:The action and agent_info for this time step.
Return type:tuple[np.ndarray, dict[str, np.ndarray]]
get_param_values()[source]

Return policy params (there are none).

Returns:Empty tuple.
Return type:tuple
reset(dones=None)[source]

Reset policy.

Parameters:dones (None or list[bool]) – Vectorized policy states to reset.
Raises:ValueError – If dones has length greater than 1.
set_param_values(params)[source]

Set param values of policy.

Parameters:params (object) – Ignored.
class Policy(env_spec)[source]

Bases: abc.ABC

Base class for policies based on NumPy.

Parameters:env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
action_space

The action space for the environment.

Type:akro.Space
get_action(observation)[source]

Get action sampled from the policy.

Parameters:observation (np.ndarray) – Observation from the environment.
Returns:Action sampled from the policy.
Return type:np.ndarray
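Policy is abstract and is used by subclassing. A minimal sketch (ZeroPolicy is hypothetical, and assumes get_action is the only method that must be overridden):

    import numpy as np
    from garage.np.policies import Policy

    class ZeroPolicy(Policy):
        """Hypothetical policy that always emits the zero action."""

        def get_action(self, observation):
            # Zero vector shaped like the action space, plus an empty agent_info.
            return np.zeros(self.action_space.shape), dict()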
log_diagnostics(paths)[source]

Log extra information per iteration based on the collected paths.

Parameters:paths (list[dict]) – A list of collected paths.
observation_space

The observation space of the environment.

Type:akro.Space
reset(dones=None)[source]

Reset the policy.

If dones is None, it defaults to np.array([True]), which means the policy is not “vectorized”, i.e. the number of parallel environments used for sampling training data is 1.

Parameters:dones (numpy.ndarray) – Booleans indicating which states are terminal.
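For example, continuing the hypothetical ZeroPolicy sketch above:

    policy = ZeroPolicy(env_spec)  # env_spec as constructed earlier
    policy.reset()  # single environment; same as dones=np.array([True])
    policy.reset(dones=np.array([True, False, True]))  # vectorized over 3 parallel envs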
state_info_keys

Get keys describing policy’s state.

Returns:Keys for the information related to the policy’s state when taking an action.
Return type:List[str]
terminate()[source]

Clean up operation.

class StochasticPolicy(env_spec)[source]

Bases: garage.np.policies.policy.Policy

Base class for stochastic policies implemented in numpy.

dist_info(obs, state_infos)[source]

Return the distribution information about the actions.

Parameters:
  • obs (np.ndarray) – Observation values.
  • state_infos (dict) – A dictionary whose values should contain information about the state of the policy at the time it received the observation.
distribution

Get the distribution of the policy.

Returns:The distribution of the policy.
Return type:garage.tf.distribution
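A sketch of a concrete subclass; GaussianFixedStdPolicy is hypothetical, the mean/log_std keys are illustrative rather than a contract from these docs, and the distribution property is omitted:

    import numpy as np
    from garage.np.policies import StochasticPolicy

    class GaussianFixedStdPolicy(StochasticPolicy):
        """Hypothetical Gaussian policy with a fixed standard deviation."""

        def __init__(self, env_spec, std=0.5):
            super().__init__(env_spec)
            self._std = std

        def get_action(self, observation):
            # A real policy would compute the mean from the observation.
            mean = np.zeros(self.action_space.shape)
            action = mean + self._std * np.random.randn(*self.action_space.shape)
            return action, dict(mean=mean, log_std=np.log(self._std))

        def dist_info(self, obs, state_infos):
            # Echo back the distribution parameters recorded at sampling time.
            return dict(mean=state_infos['mean'], log_std=state_infos['log_std'])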
class ScriptedPolicy(scripted_actions, agent_env_infos=None)[source]

Bases: object

Simulates a garage policy object.

Parameters:
  • scripted_actions – Data structure indexed by observation; returns the corresponding action.
  • agent_env_infos – Data structure indexed by observation; returns the corresponding agent_env_info.
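A minimal sketch; the observation keys 's0' and 's1' are placeholders, and the tuple return follows the get_action convention used elsewhere in this package:

    from garage.np.policies import ScriptedPolicy

    policy = ScriptedPolicy(scripted_actions={'s0': 0, 's1': 1},
                            agent_env_infos={'s0': {}, 's1': {}})
    action, agent_info = policy.get_action('s0')  # action scripted for observation 's0'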
get_action(obs)[source]

Return action sampled from the policy.

get_actions(obses)[source]

Return actions sampled from the policy.

get_param_values()[source]

Return policy params as a list.

reset(dones=None)[source]

Reset policy to initial state.

set_param_values(params)[source]

Set param values of policy.