garage.np.exploration_strategies.epsilon_greedy_strategy module

ϵ-greedy exploration strategy.

Random exploration according to the value of epsilon.

class EpsilonGreedyStrategy(env_spec, total_timesteps, max_epsilon=1.0, min_epsilon=0.02, decay_ratio=0.1)[source]

Bases: garage.np.exploration_strategies.base.ExplorationStrategy

ϵ-greedy exploration strategy.

Select action based on the value of ϵ. ϵ will decrease from max_epsilon to min_epsilon within decay_ratio * total_timesteps.

At state s, with probability 1 − ϵ, select the greedy action argmax_a Q(s, a); with probability ϵ, select a random action drawn uniformly from the action space.

Parameters:
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
  • total_timesteps (int) – Total number of timesteps in training, equivalent to max_path_length * n_epochs.
  • max_epsilon (float) – The maximum (starting) value of epsilon.
  • min_epsilon (float) – The minimum (terminal) value of epsilon.
  • decay_ratio (float) – Fraction of total steps for epsilon decay.
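
The linear decay schedule described above can be sketched in a few lines. epsilon_at below is a hypothetical helper written only for illustration (it is not part of garage); it assumes epsilon falls linearly from max_epsilon to min_epsilon over the first decay_ratio * total_timesteps steps and stays at min_epsilon afterwards:

    # Hypothetical helper illustrating the decay schedule; the strategy's
    # internal update may differ in detail, but follows the same schedule.
    def epsilon_at(t, total_timesteps, max_epsilon=1.0, min_epsilon=0.02,
                   decay_ratio=0.1):
        decay_period = decay_ratio * total_timesteps
        fraction = min(t / decay_period, 1.0)
        return max_epsilon - fraction * (max_epsilon - min_epsilon)

    # With the defaults and total_timesteps=100000, epsilon starts at 1.0,
    # reaches 0.02 at t=10000, and stays there for the rest of training.
    assert abs(epsilon_at(0, 100000) - 1.0) < 1e-9
    assert abs(epsilon_at(10000, 100000) - 0.02) < 1e-9
    assert abs(epsilon_at(50000, 100000) - 0.02) < 1e-9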
get_action(t, observation, policy, **kwargs)[source]

Get action from this policy for the input observation.

Parameters:
  • t – Iteration.
  • observation – Observation from the environment.
  • policy – Policy network to predict action based on the observation.
Returns:

optimal action from this policy.

Return type:

opt_action
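
A hedged call sketch for get_action follows; it is not runnable as-is. Here env, spec, and qf_policy are hypothetical placeholders for a gym-style environment, its garage.envs.env_spec.EnvSpec, and a trained Q-function-backed policy object:

    from garage.np.exploration_strategies.epsilon_greedy_strategy import (
        EpsilonGreedyStrategy)

    # spec and qf_policy are hypothetical placeholders (see the note above).
    strategy = EpsilonGreedyStrategy(env_spec=spec,
                                     total_timesteps=10000,
                                     max_epsilon=1.0,
                                     min_epsilon=0.02,
                                     decay_ratio=0.1)

    obs = env.reset()
    for t in range(200):
        # With probability 1 - epsilon the policy's greedy action is kept;
        # with probability epsilon a random action is sampled instead.
        action = strategy.get_action(t, obs, policy=qf_policy)
        obs, reward, done, _ = env.step(action)
        if done:
            obs = env.reset()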

get_actions(t, observations, policy, **kwargs)[source]

Get actions from this policy for the input observations.

Parameters:
  • t – Iteration.
  • observations – Observations from the environment.
  • policy – Policy network to predict actions based on the observations.
Returns:

optimal actions from this policy.

Return type:

opt_actions
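
Similarly, a hedged batched sketch for get_actions; strategy and qf_policy are the placeholders from the previous sketch, and parallel_envs is a hypothetical list of environments providing one observation each:

    # One action is returned per input observation; epsilon-greedy
    # exploration applies as described in the class docstring above.
    observations = [env.reset() for env in parallel_envs]
    actions = strategy.get_actions(t, observations, policy=qf_policy)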