garage.tf.policies.discrete_qf_derived_policy module

A Discrete QFunction-derived policy.

This policy chooses the action that yields the largest Q-value.
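As a minimal sketch of this selection rule, written in plain NumPy rather than taken from garage's implementation (the array `q_values` and the helper `greedy_action` are hypothetical names used only for illustration):

```python
import numpy as np


def greedy_action(q_values):
    """Return the index of the largest Q-value (hypothetical helper)."""
    # np.argmax returns the first matching index when several values tie.
    return int(np.argmax(q_values))


# Hypothetical Q-values for a discrete action space with three actions.
q_values = np.array([0.1, 0.7, 0.2])
action = greedy_action(q_values)
```

Because the choice is a pure argmax, the policy has no parameters of its own; its behaviour changes only when the underlying Q-function changes.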

class DiscreteQfDerivedPolicy(env_spec, qf, name='DiscreteQfDerivedPolicy')

Bases: garage.tf.policies.policy.Policy

DiscreteQfDerived policy.

Parameters:
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
  • qf (garage.q_functions.QFunction) – The q-function used.
  • name (str) – Name of the policy.
get_action(observation)

Get action from this policy for the input observation.

Parameters:observation (numpy.ndarray) – Observation from environment.
Returns:Two values: the single optimal action from this policy (numpy.ndarray), and a dict of predicted action and agent information. The dict is empty since there is no parameterization.
Return type:numpy.ndarray, dict
get_actions(observations)

Get actions from this policy for the input observations.

Parameters:observations (numpy.ndarray) – Observations from environment.
Returns:Two values: the optimal actions from this policy (numpy.ndarray), one per observation, and a dict of predicted action and agent information. The dict is empty since there is no parameterization.
Return type:numpy.ndarray, dict
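The single and batched methods differ only in how many observations they take. The sketch below imitates the documented return convention (an action or array of actions, paired with an empty dict) using a stand-in class. `ToyGreedyPolicy`, its `q_fn` callable, and the linear `weights` are assumptions made for illustration; they are not part of garage and do not reproduce its TensorFlow-based internals.

```python
import numpy as np


class ToyGreedyPolicy:
    """Stand-in for a Q-function-derived policy (hypothetical, not garage code)."""

    def __init__(self, q_fn):
        # q_fn maps observations of shape (N, obs_dim) to Q-values (N, n_actions).
        self._q_fn = q_fn

    def get_action(self, observation):
        # Wrap the single observation in a batch of size one.
        q_vals = self._q_fn(np.asarray([observation]))
        return int(np.argmax(q_vals[0])), dict()

    def get_actions(self, observations):
        q_vals = self._q_fn(np.asarray(observations))
        # argmax along the action axis gives one action per observation.
        return np.argmax(q_vals, axis=1), dict()


# Hypothetical linear Q-function: 2-dimensional observations, 3 actions.
weights = np.array([[1.0, 0.0, -1.0],
                    [0.0, 2.0, 0.0]])
policy = ToyGreedyPolicy(lambda obs: obs @ weights)

action, info = policy.get_action(np.array([1.0, 0.0]))
actions, infos = policy.get_actions(np.array([[1.0, 0.0], [0.0, 1.0]]))
```

In this toy setup the first observation scores highest on action 0 and the second on action 1, and both calls return an empty dict as the second value.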
vectorized

Vectorized or not.

Returns:True if the policy supports vectorized operations.
Return type:bool