`garage.np.exploration_policies.epsilon_greedy_policy`

ϵ-greedy exploration strategy.

Explores by taking random actions with a probability set by epsilon.

class EpsilonGreedyPolicy(env_spec, policy, *, total_timesteps, max_epsilon=1.0, min_epsilon=0.02, decay_ratio=0.1)

ϵ-greedy exploration strategy.

Selects an action based on the current value of ϵ. ϵ decreases from max_epsilon to min_epsilon within the first decay_ratio * total_timesteps steps.
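The text does not state the shape of the decay curve. The sketch below assumes a linear decay that then holds at min_epsilon; that shape is an assumption, and `epsilon_at` is a hypothetical helper, not part of garage.

```python
def epsilon_at(step, total_timesteps, max_epsilon=1.0, min_epsilon=0.02,
               decay_ratio=0.1):
    """Epsilon after `step` environment steps, assuming a linear decay."""
    decay_steps = int(total_timesteps * decay_ratio)
    if step >= decay_steps:
        # Decay finished: hold at the terminal value.
        return min_epsilon
    # Interpolate linearly from max_epsilon toward min_epsilon.
    fraction = step / decay_steps
    return max_epsilon - fraction * (max_epsilon - min_epsilon)


# total_timesteps=10_000 and decay_ratio=0.1 give a 1_000-step decay window.
print(epsilon_at(0, 10_000))      # 1.0
print(epsilon_at(500, 10_000))    # approximately 0.51
print(epsilon_at(5_000, 10_000))  # 0.02
```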

At state s:
with probability 1 − ϵ, select the greedy action a = argmax_a Q(s, a);
with probability ϵ, select a random action from a uniform distribution.
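A minimal sketch of this selection rule for a discrete action space, assuming the Q-values for state s are already available as a NumPy array. `epsilon_greedy_action` is a hypothetical helper for illustration, not garage's API.

```python
import numpy as np


def epsilon_greedy_action(q_values, epsilon, rng):
    """Pick an action index from a 1-D array of Q-values for one state."""
    if rng.random() < epsilon:
        # Explore: draw uniformly from the discrete action set.
        return int(rng.integers(len(q_values)))
    # Exploit: the greedy action argmax_a Q(s, a).
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
q = np.array([0.1, 0.7, 0.2])
print(epsilon_greedy_action(q, epsilon=0.0, rng=rng))  # 1 (always greedy)
```

Because the uniform draw can also land on the greedy action, the greedy action is actually chosen with probability 1 − ϵ + ϵ/|A|, where |A| is the number of actions.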

Parameters:

env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
policy (garage.Policy) – Policy to wrap.
total_timesteps (int) – Total number of environment steps in training, equivalent to max_episode_length * n_epochs.
max_epsilon (float) – The maximum (starting) value of epsilon.
min_epsilon (float) – The minimum (terminal) value of epsilon.
decay_ratio (float) – Fraction of total_timesteps over which epsilon decays.
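As a worked example of how these parameters interact, the snippet below uses the relation for total_timesteps stated above with illustrative values (not taken from any garage example). The per-step decrement assumes a linear decay.

```python
# Hypothetical training configuration, chosen only for illustration.
max_episode_length = 200
n_epochs = 500
decay_ratio = 0.1
max_epsilon, min_epsilon = 1.0, 0.02

# total_timesteps as described above: max_episode_length * n_epochs.
total_timesteps = max_episode_length * n_epochs
# Number of steps over which epsilon falls from max_epsilon to min_epsilon.
decay_steps = int(decay_ratio * total_timesteps)
# Per-step decrement under the assumed linear schedule (about 9.8e-05).
decrement = (max_epsilon - min_epsilon) / decay_steps

print(total_timesteps, decay_steps)  # 100000 10000
```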

get_action(self, observation)

Get action from this policy for the input observation.

Parameters:	observation (numpy.ndarray) – Observation from the environment.
Returns:	A tuple of the action with noise (np.ndarray) and arbitrary policy state information (dict, agent_info).
Return type:	tuple[np.ndarray, dict]
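The contract above (query the wrapped policy, possibly replace its action, return the action together with an agent_info dict) can be sketched with hypothetical stand-ins. `ArgmaxPolicy` and `EpsilonGreedySketch` are illustrative classes, not garage's implementation; the linear schedule, the discrete action space, and the use of an integer state index as the observation are all assumptions.

```python
import numpy as np


class ArgmaxPolicy:
    """Hypothetical wrapped policy: greedy over a tabular Q-function."""

    def __init__(self, q_table):
        self.q_table = np.asarray(q_table)  # shape: (n_states, n_actions)

    def get_action(self, observation):
        # Observation is assumed to be an integer state index here.
        return int(np.argmax(self.q_table[observation])), {}


class EpsilonGreedySketch:
    """Hypothetical wrapper mirroring the documented get_action contract."""

    def __init__(self, policy, n_actions, total_timesteps, max_epsilon=1.0,
                 min_epsilon=0.02, decay_ratio=0.1, seed=None):
        self.policy = policy
        self.n_actions = n_actions
        self.max_epsilon = max_epsilon
        self.min_epsilon = min_epsilon
        self.decay_steps = int(total_timesteps * decay_ratio)
        self.steps = 0
        self.rng = np.random.default_rng(seed)

    def epsilon(self):
        # Assumed linear decay, then hold at min_epsilon.
        if self.steps >= self.decay_steps:
            return self.min_epsilon
        frac = self.steps / self.decay_steps
        return self.max_epsilon - frac * (self.max_epsilon - self.min_epsilon)

    def get_action(self, observation):
        action, _ = self.policy.get_action(observation)
        if self.rng.random() < self.epsilon():
            action = int(self.rng.integers(self.n_actions))  # explore
        self.steps += 1  # advance the decay schedule by one env step
        return action, {}  # (action, agent_info)
```

For example, with total_timesteps=100 and the default decay_ratio, epsilon reaches min_epsilon after 10 calls to get_action.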

get_actions(self, observations)

Get actions from this policy for the input observations.

Parameters:	observations (numpy.ndarray) – Observations from the environment.
Returns:	A tuple of the actions with noise (np.ndarray) and arbitrary policy state information (List[dict], agent_info).
Return type:	tuple[np.ndarray, List[dict]]
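For a batch of observations, the ϵ-greedy choice can be made independently per environment. The sketch below assumes the greedy actions for the batch have already been computed and the action space is discrete; `epsilon_greedy_batch` is a hypothetical helper, and returning one empty dict per observation simply mirrors the List[dict] type listed above.

```python
import numpy as np


def epsilon_greedy_batch(greedy_actions, epsilon, n_actions, rng):
    """Apply epsilon-greedy noise independently to each action in a batch."""
    actions = np.array(greedy_actions, copy=True)  # leave the input untouched
    # One uniform draw per environment decides which entries explore.
    explore = rng.random(len(actions)) < epsilon
    actions[explore] = rng.integers(n_actions, size=int(explore.sum()))
    agent_infos = [{} for _ in range(len(actions))]  # one dict per observation
    return actions, agent_infos
```

Drawing the exploration mask with a single vectorized call avoids a Python-level loop over the batch.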

reset(self, dones=None)

Reset the state of the exploration.

Parameters:	dones (List[bool] or numpy.ndarray or None) – Flags indicating which of the vectorized environments' states to reset.
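The documentation does not say what per-environment state this policy holds. The sketch below is a hypothetical illustration of the dones-mask convention only: `PerEnvState` is an invented class, and treating None as "reset every environment" is an assumption.

```python
import numpy as np


class PerEnvState:
    """Hypothetical per-environment state, e.g. one vector per environment."""

    def __init__(self, n_envs, dim):
        self.state = np.zeros((n_envs, dim))

    def reset(self, dones=None):
        if dones is None:
            # No mask given: reset every environment (assumed convention).
            self.state[:] = 0.0
        else:
            # Reset only the environments flagged True in the mask.
            self.state[np.asarray(dones, dtype=bool)] = 0.0
```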

get_param_values(self)

Get parameter values.

Returns:	Values of each parameter.
Return type:	list or dict

set_param_values(self, params)

Set parameter values.

Parameters:	params (np.ndarray) – A numpy array of parameter values.
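The source does not say which values these methods expose. One plausible use, sketched here under that assumption, is checkpointing exploration progress so an assumed step-based epsilon schedule resumes where it left off. `ExplorationState` and the `total_env_steps` key are hypothetical; the sketch uses a dict, one of the return types listed for get_param_values.

```python
class ExplorationState:
    """Hypothetical holder for exploration progress (not garage's class)."""

    def __init__(self):
        self.total_env_steps = 0

    def get_param_values(self):
        # Snapshot what is needed to resume the epsilon schedule.
        return {'total_env_steps': self.total_env_steps}

    def set_param_values(self, params):
        self.total_env_steps = params['total_env_steps']


saved = ExplorationState()
saved.total_env_steps = 1234
restored = ExplorationState()
restored.set_param_values(saved.get_param_values())
print(restored.total_env_steps)  # 1234
```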
