garage.torch.policies.discrete_qf_argmax_policy
A discrete Q-function-derived policy.
This policy chooses the action that yields the largest Q-value.

class
DiscreteQFArgmaxPolicy
(qf, env_spec, name='DiscreteQFArgmaxPolicy') Bases:
garage.torch.policies.policy.Policy
Policy that derives its actions from a learned Q function.
The action returned is the one that yields the highest Q value for a given state, as determined by the supplied Q function.
 Parameters

forward
(self, observations) Get actions corresponding to a batch of observations.
 Parameters
observations (torch.Tensor) – Batch of observations of shape \((N, O)\). Observations should be flattened, even if they are images, because the underlying Q network handles unflattening.
 Returns
Batch of actions of shape \((N, A)\)
 Return type
torch.Tensor

get_action
(self, observation) Get a single action given an observation.
 Parameters
observation (np.ndarray) – Observation with shape \((O, )\).
 Returns
Predicted action with shape \((A, )\), followed by an empty dict, since this policy does not produce a distribution.
 Return type
torch.Tensor

get_actions
(self, observations) Get actions given observations.
 Parameters
observations (np.ndarray) – Batch of observations with shape \((N, O)\).
 Returns
Predicted actions as a tensor of shape \((N, A)\), followed by an empty dict, since this policy does not produce a distribution.
 Return type
torch.Tensor
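
As a rough illustration of the greedy rule behind get_actions, the sketch below uses NumPy only. The linear toy_qf and the helper greedy_actions are hypothetical stand-ins and are not part of garage; the real policy delegates to the supplied Q network. For simplicity the sketch returns one integer action index per observation, which is an assumption of the sketch and not a statement about the layout of the real return value.

```python
import numpy as np


def toy_qf(observations, weights):
    """Hypothetical Q function: one Q-value per (observation, action) pair."""
    # observations: (N, O), weights: (O, A) -> Q-values: (N, A)
    return observations @ weights


def greedy_actions(observations, weights):
    """Return, for each observation, the index of the largest Q-value."""
    q_values = toy_qf(observations, weights)
    return np.argmax(q_values, axis=1)


# Two observations (N=2, O=2) and three discrete actions (A=3).
weights = np.array([[1.0, 0.0, -1.0],
                    [0.0, 2.0, 0.5]])
observations = np.array([[1.0, 0.0],
                         [0.0, 1.0]])
actions = greedy_actions(observations, weights)
```

A single observation can be handled the same way by adding a leading batch dimension (for example with observation[None, :]) and taking the first result.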

get_param_values
(self) Get the parameters of the policy.
This method is included to ensure consistency with TF policies.
 Returns
The parameters (in the form of the state dictionary).
 Return type
dict

set_param_values
(self, state_dict) Set the parameters of the policy.
This method is included to ensure consistency with TF policies.
 Parameters
state_dict (dict) – State dictionary.
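
The get/set pair can be pictured as a state-dictionary round trip. The sketch below assumes that the state dictionary is an ordinary PyTorch state_dict (an assumption, not something the text above confirms) and uses two plain nn.Linear modules as hypothetical stand-ins for a policy's Q network.

```python
import torch
from torch import nn

# Two independently initialised modules with identical architecture.
source = nn.Linear(4, 2)
target = nn.Linear(4, 2)

# Analogous to get_param_values(): export the parameters as a dict.
state = source.state_dict()

# Analogous to set_param_values(state): load them into another module.
target.load_state_dict(state)

# After loading, every parameter tensor matches the exported one.
same = all(torch.equal(state[key], target.state_dict()[key]) for key in state)
```

The same pattern is commonly used to copy parameters between two policy instances, provided both wrap networks of the same architecture.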

reset
(self, do_resets=None) Reset the policy.
This has an effect only for recurrent policies.
do_resets is an array of booleans indicating which internal states should be reset. The length of do_resets should equal the length of the inputs, i.e. the batch size.
 Parameters
do_resets (numpy.ndarray) – Boolean array indicating which states should be reset.
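
To make the do_resets mask concrete, here is a minimal sketch of how a recurrent policy might apply it. ToyRecurrentState is hypothetical and not part of garage, and treating a missing mask as "reset everything" is an assumption of the sketch.

```python
import numpy as np


class ToyRecurrentState:
    """Hypothetical holder for one hidden-state row per batch entry."""

    def __init__(self, batch_size, hidden_dim):
        self.hidden = np.ones((batch_size, hidden_dim))

    def reset(self, do_resets=None):
        if do_resets is None:
            # Assumption of this sketch: no mask means reset every row.
            do_resets = np.ones(len(self.hidden), dtype=bool)
        do_resets = np.asarray(do_resets, dtype=bool)
        # Zero only the rows whose mask entry is True.
        self.hidden[do_resets] = 0.0


state = ToyRecurrentState(batch_size=3, hidden_dim=2)
state.reset(np.array([True, False, True]))
```

Only the first and third rows are zeroed here; the second keeps its previous hidden state.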

property
env_spec
(self) Policy environment specification.
 Returns
Environment specification.
 Return type
garage.EnvSpec

property
observation_space
(self) Observation space.
 Returns
The observation space of the environment.
 Return type
akro.Space

property
action_space
(self) Action space.
 Returns
The action space of the environment.
 Return type
akro.Space