garage.np.exploration_policies.epsilon_greedy_policy
ϵ-greedy exploration strategy.
Explores randomly with a probability given by epsilon.
-
class EpsilonGreedyPolicy(env_spec, policy, *, total_timesteps, max_epsilon=1.0, min_epsilon=0.02, decay_ratio=0.1)
Bases: garage.np.exploration_policies.exploration_policy.ExplorationPolicy
ϵ-greedy exploration strategy.
Selects an action based on the current value of ϵ. ϵ decreases from max_epsilon to min_epsilon over the first decay_ratio * total_timesteps steps and then stays at min_epsilon.
At state s:
- with probability 1 − ϵ, select the greedy action argmax_a Q(s, a);
- with probability ϵ, select a random action from a uniform distribution over the action space.
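The selection rule can be illustrated for a discrete action space whose Q-values are available as an array. This is a minimal sketch, not garage's implementation: the helper name `epsilon_greedy_action` and the use of a NumPy `Generator` are assumptions made for illustration.

```python
import numpy as np


def epsilon_greedy_action(q_values, epsilon, rng):
    """Pick an action index from Q(s, .) with the epsilon-greedy rule.

    Hypothetical helper for illustration; not part of garage.
    """
    q_values = np.asarray(q_values)
    if rng.random() < epsilon:
        # Explore (probability epsilon): uniform random action index.
        return int(rng.integers(len(q_values)))
    # Exploit (probability 1 - epsilon): greedy action argmax_a Q(s, a).
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
action = epsilon_greedy_action([0.1, 0.7, 0.2], epsilon=0.1, rng=rng)
```

Because the random branch samples uniformly over all actions, it can still pick the greedy action, so the greedy action is chosen with total probability 1 − ϵ + ϵ/|A| for a discrete space of |A| actions.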
Parameters: - env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
- policy (garage.Policy) – Policy to wrap.
- total_timesteps (int) – Total steps in the training, equivalent to max_episode_length * n_epochs.
- max_epsilon (float) – The maximum (starting) value of epsilon.
- min_epsilon (float) – The minimum (terminal) value of epsilon.
- decay_ratio (float) – Fraction of total_timesteps over which epsilon decays from max_epsilon to min_epsilon.
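The decay controlled by these parameters can be sketched as a schedule over the first decay_ratio * total_timesteps steps, clamped at min_epsilon afterwards. The linear shape and the helper name `epsilon_at` are assumptions for this illustration; the description above only states the start value, end value, and decay window.

```python
def epsilon_at(step, total_timesteps, max_epsilon=1.0, min_epsilon=0.02,
               decay_ratio=0.1):
    """Return epsilon after `step` environment steps (hypothetical helper).

    Assumes a linear decay from max_epsilon to min_epsilon over
    decay_ratio * total_timesteps steps, then a constant min_epsilon.
    """
    decay_period = int(total_timesteps * decay_ratio)
    if decay_period <= 0 or step >= decay_period:
        # Decay window is over (or empty): hold at the terminal value.
        return min_epsilon
    decrement = (max_epsilon - min_epsilon) / decay_period
    return max_epsilon - decrement * step
```

For example, with total_timesteps=1000 and the default arguments, the decay period is 100 steps: ϵ starts at 1.0, is about 0.51 at step 50, and stays at 0.02 from step 100 onward.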
-
get_action(self, observation)
Get action from this policy for the input observation.
Parameters: observation (numpy.ndarray) – Observation from the environment.
Returns: np.ndarray: The action, which is replaced by a uniformly random action with probability ϵ. dict: Arbitrary policy state information (agent_info).
Return type: tuple[np.ndarray, dict]
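One way a wrapper of this kind might combine the wrapped policy with an ϵ schedule is sketched below. `ConstantPolicy`, `ToyEpsilonGreedy`, the `sample_action` and `epsilon_fn` callables, and the per-call step counter are all hypothetical stand-ins rather than garage's code; the only behavior taken from this page is that the wrapped policy's get_action returns an (action, agent_info) pair.

```python
import numpy as np


class ConstantPolicy:
    """Hypothetical wrapped policy: always proposes the same action."""

    def __init__(self, action):
        self._action = action

    def get_action(self, observation):
        # Mirrors the (action, agent_info) pair described on this page.
        return self._action, {}


class ToyEpsilonGreedy:
    """Illustrative epsilon-greedy wrapper; not garage's implementation."""

    def __init__(self, policy, sample_action, epsilon_fn, seed=None):
        self._policy = policy
        self._sample_action = sample_action  # draws a uniform random action
        self._epsilon_fn = epsilon_fn  # maps a step count to epsilon
        self._steps = 0
        self._rng = np.random.default_rng(seed)

    def get_action(self, observation):
        action, agent_info = self._policy.get_action(observation)
        # Explore with the probability given by the schedule at this step.
        if self._rng.random() < self._epsilon_fn(self._steps):
            action = self._sample_action(self._rng)
        self._steps += 1  # advance the schedule once per call
        return action, agent_info


policy = ToyEpsilonGreedy(
    ConstantPolicy(0),
    sample_action=lambda rng: int(rng.integers(4)),
    epsilon_fn=lambda step: max(0.02, 1.0 - step / 100),
    seed=0,
)
action, info = policy.get_action(np.zeros(3))
```

Counting steps inside the wrapper keeps the wrapped policy unaware of exploration; it only has to supply its preferred action.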
-
get_actions(self, observations)
Get actions from this policy for the input observations.
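Parameters: observations (numpy.ndarray) – Observations from the environment.
Returns: np.ndarray: Actions, each of which may be replaced by a uniformly random action with probability ϵ. List[dict]: Arbitrary policy state information (agent_info).
Return type: tuple[np.ndarray, List[dict]]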
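For a batch of observations, the same rule can be applied to each element with its own random draw. The sketch below assumes a discrete action space and uses a hypothetical helper, `epsilon_greedy_batch`, whose `greedy_actions` argument stands in for the actions the wrapped policy returns; it illustrates the idea and is not garage's vectorized code.

```python
import numpy as np


def epsilon_greedy_batch(greedy_actions, epsilon, n_actions, rng):
    """Apply epsilon-greedy independently to a batch of discrete actions.

    Hypothetical helper: `greedy_actions` stands in for the actions the
    wrapped policy proposes for a batch of observations.
    """
    greedy_actions = np.asarray(greedy_actions)
    batch_size = len(greedy_actions)
    # One independent Bernoulli(epsilon) draw per observation.
    explore = rng.random(batch_size) < epsilon
    random_actions = rng.integers(n_actions, size=batch_size)
    # Keep the greedy action unless that element was selected for exploration.
    return np.where(explore, random_actions, greedy_actions)


rng = np.random.default_rng(0)
actions = epsilon_greedy_batch([2, 0, 1, 3], epsilon=0.5, n_actions=4, rng=rng)
```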
-
reset(self, dones=None)
Reset the state of the exploration.
Parameters: dones (List[bool] or numpy.ndarray or None) – Which vectorization states to reset.
-
get_param_values(self)
Get parameter values.
Returns: Values of each parameter. Return type: list or dict
-
set_param_values(self, params)
Set parameter values.
Parameters: params (np.ndarray) – A numpy array of parameter values.