garage.np.policies package

Policies which use NumPy as a numerical backend.

class FixedPolicy(env_spec, scripted_actions, agent_infos=None)[source]

Bases: garage.np.policies.policy.Policy

Policy that performs a fixed sequence of actions.

Parameters:
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
  • scripted_actions (list[np.ndarray] or np.ndarray) – Sequence of actions to perform.
  • agent_infos (list[dict[str, np.ndarray]] or None) – Sequence of agent_infos to produce.
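A minimal usage sketch follows; the Box spaces below are placeholders rather than part of this API, so substitute the spec of your own environment:

    import akro
    import numpy as np
    from garage.envs.env_spec import EnvSpec
    from garage.np.policies import FixedPolicy

    # Placeholder spaces for illustration only.
    env_spec = EnvSpec(observation_space=akro.Box(low=-1, high=1, shape=(2,)),
                       action_space=akro.Box(low=-1, high=1, shape=(1,)))

    # Replays the scripted actions in order, one per get_action() call.
    policy = FixedPolicy(env_spec,
                         scripted_actions=[np.array([0.1]), np.array([0.2])])
    policy.reset()
    action, agent_info = policy.get_action(np.zeros(2))  # observation is ignored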
get_action(observation)[source]

Get next action.

Parameters:observation (np.ndarray) – Ignored.
Raises:ValueError – If policy is currently vectorized (reset was called with more than one done value).
Returns:The action and agent_info for this time step.
Return type:tuple[np.ndarray, dict[str, np.ndarray]]
get_actions(observations)[source]

Get next actions.

Parameters:observations (np.ndarray) – Ignored.
Raises:ValueError – If observations has length greater than 1.
Returns:The action and agent_info for this time step.
Return type:tuple[np.ndarray, dict[str, np.ndarray]]
get_param_values()[source]

Return policy params (there are none).

Returns:Empty tuple.
Return type:tuple
reset(dones=None)[source]

Reset policy.

Parameters:dones (None or list[bool]) – Vectorized policy states to reset.
Raises:ValueError – If dones has length greater than 1.
set_param_values(params)[source]

Set param values of policy.

Parameters:params (object) – Ignored.
class Policy(env_spec)[source]

Bases: abc.ABC

Base class for policies based on NumPy.

Parameters:env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
action_space

The action space for the environment.

Type:akro.Space
get_action(observation)[source]

Get action sampled from the policy.

Parameters:observation (np.ndarray) – Observation from the environment.
Returns:Action sampled from the policy.
Return type:np.ndarray
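Policy is abstract and is used by subclassing. A minimal sketch (ZeroPolicy is hypothetical, and assumes get_action is the only method that must be overridden):

    import numpy as np
    from garage.np.policies import Policy

    class ZeroPolicy(Policy):
        """Hypothetical policy that always emits the zero action."""

        def get_action(self, observation):
            # Zero vector shaped like the action space, plus an empty agent_info.
            return np.zeros(self.action_space.shape), dict()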
log_diagnostics(paths)[source]

Log extra information per iteration based on the collected paths.

Parameters:paths (list[dict]) – A list of collected paths.
observation_space

The observation space of the environment.

Type:akro.Space
reset(dones=None)[source]

Reset the policy.

If dones is None, it defaults to np.array([True]), which means the policy is not “vectorized”, i.e. the number of parallel environments used for sampling training data is 1.

Parameters:dones (numpy.ndarray) – Booleans indicating which states are terminal.
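For example, continuing the hypothetical ZeroPolicy sketch above:

    policy = ZeroPolicy(env_spec)  # env_spec as constructed earlier
    policy.reset()  # single environment; same as dones=np.array([True])
    policy.reset(dones=np.array([True, False, True]))  # vectorized over 3 parallel envs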
state_info_keys

Get keys describing policy’s state.

Returns:Keys for the information related to the policy’s state when taking an action.
Return type:List[str]
terminate()[source]

Clean up operation.

class StochasticPolicy(env_spec)[source]

Bases: garage.np.policies.policy.Policy

Base class for stochastic policies implemented in numpy.

dist_info(obs, state_infos)[source]

Return the distribution information about the actions.

Parameters:
  • obs (np.ndarray) – Observation values.
  • state_infos (dict) – A dictionary whose values should contain information about the state of the policy at the time it received the observation.
distribution

Get the distribution of the policy.

Returns:The distribution of the policy.
Return type:garage.tf.distribution
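A sketch of a concrete subclass; GaussianFixedStdPolicy is hypothetical, the mean/log_std keys are illustrative rather than a contract from these docs, and the distribution property is omitted:

    import numpy as np
    from garage.np.policies import StochasticPolicy

    class GaussianFixedStdPolicy(StochasticPolicy):
        """Hypothetical Gaussian policy with a fixed standard deviation."""

        def __init__(self, env_spec, std=0.5):
            super().__init__(env_spec)
            self._std = std

        def get_action(self, observation):
            # A real policy would compute the mean from the observation.
            mean = np.zeros(self.action_space.shape)
            action = mean + self._std * np.random.randn(*self.action_space.shape)
            return action, dict(mean=mean, log_std=np.log(self._std))

        def dist_info(self, obs, state_infos):
            # Echo back the distribution parameters recorded at sampling time.
            return dict(mean=state_infos['mean'], log_std=state_infos['log_std'])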
class ScriptedPolicy(scripted_actions, agent_env_infos=None)[source]

Bases: object

Simulates a garage policy object.

Parameters:
  • scripted_actions – Data structure indexed by observation; returns the corresponding action.
  • agent_env_infos – Data structure indexed by observation; returns the corresponding agent_env_info.
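A minimal sketch; the observation keys 's0' and 's1' are placeholders, and the tuple return follows the get_action convention used elsewhere in this package:

    from garage.np.policies import ScriptedPolicy

    policy = ScriptedPolicy(scripted_actions={'s0': 0, 's1': 1},
                            agent_env_infos={'s0': {}, 's1': {}})
    action, agent_info = policy.get_action('s0')  # action scripted for observation 's0'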
get_action(obs)[source]

Return action sampled from the policy.

get_actions(obses)[source]

Return actions sampled from the policy.

get_param_values()[source]

Return policy params as a list.

reset(dones=None)[source]

Reset policy to initial state.

set_param_values(params)[source]

Set param values of policy.