garage.np.exploration_strategies package
Exploration strategies which use NumPy as a numerical backend.
-
class EpsilonGreedyStrategy(env_spec, total_timesteps, max_epsilon=1.0, min_epsilon=0.02, decay_ratio=0.1)
Bases: garage.np.exploration_strategies.base.ExplorationStrategy
ϵ-greedy exploration strategy.
Selects an action based on the value of ϵ, which decreases from max_epsilon to min_epsilon over the first decay_ratio * total_timesteps steps.
At state s:
- with probability 1 − ϵ, select action = argmax_a Q(s, a);
- with probability ϵ, select a random action from a uniform distribution.
Parameters: - env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
- total_timesteps (int) – Total steps in the training, equivalent to max_path_length * n_epochs.
- max_epsilon (float) – The maximum (starting) value of epsilon.
- min_epsilon (float) – The minimum (terminal) value of epsilon.
- decay_ratio (float) – Fraction of total steps for epsilon decay.
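The schedule and selection rule described above can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes the decay is linear (the description above does not state the shape of the decay), and the names linear_epsilon and epsilon_greedy_action are hypothetical helpers introduced only for this example.

```python
import numpy as np


def linear_epsilon(step, total_timesteps, max_epsilon=1.0,
                   min_epsilon=0.02, decay_ratio=0.1):
    """Hypothetical helper: epsilon after `step` steps, ASSUMING linear decay."""
    # Number of steps over which epsilon moves from max to min.
    decay_period = max(1, int(decay_ratio * total_timesteps))
    # Fraction of the decay completed, capped at 1 once the period is over.
    fraction = min(step / decay_period, 1.0)
    return max_epsilon - fraction * (max_epsilon - min_epsilon)


def epsilon_greedy_action(q_values, epsilon, rng):
    """Hypothetical helper: random action with prob. epsilon, else argmax Q."""
    if rng.random() < epsilon:
        # Explore: draw an action index uniformly at random.
        return int(rng.integers(len(q_values)))
    # Exploit: pick the action with the highest estimated value.
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.7, 0.2])  # made-up Q(s, a) for three actions
greedy = epsilon_greedy_action(q_values, epsilon=0.0, rng=rng)
explore = epsilon_greedy_action(q_values, epsilon=1.0, rng=rng)
```

With the default arguments and total_timesteps=1000, epsilon under this assumed schedule reaches min_epsilon after 100 steps and stays there for the rest of training.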
-
get_action(t, observation, policy, **kwargs)
Get an action from this policy for the input observation.
Parameters: - t – Iteration.
- observation – Observation from the environment.
- policy – Policy network to predict action based on the observation.
Returns: optimal action from this policy.
Return type: opt_action
-
get_actions(t, observations, policy, **kwargs)
Get actions from this policy for the input observations.
Parameters: - t – Iteration.
- observations – Observations from the environment.
- policy – Policy network to predict actions based on the observations.
Returns: optimal actions from this policy.
Return type: opt_action
-
class OUStrategy(env_spec, mu=0, sigma=0.3, theta=0.15, dt=0.01, x0=None)
Bases: garage.np.exploration_strategies.base.ExplorationStrategy
An Ornstein-Uhlenbeck (OU) exploration strategy that adds noise to environment actions.
Parameters: - env_spec – Environment for OUStrategy to explore.
- mu – Long-run mean that the simulated process reverts to.
- sigma – Scale (volatility) of the random noise in the process.
- theta – Rate at which the process reverts to mu.
- dt – Time step used to simulate the process.
- x0 – Initial state.
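To show how these parameters interact, here is a minimal sketch of a discretized OU process using the standard Euler-Maruyama update. It is an illustration under stated assumptions, not the class's actual code: simulate_ou and noisy_action are hypothetical helpers, starting at mu when x0 is None is an assumption, and clipping the noisy action to the action bounds is likewise assumed.

```python
import numpy as np


def simulate_ou(n_steps, dim, mu=0.0, sigma=0.3, theta=0.15, dt=0.01,
                x0=None, seed=0):
    """Hypothetical helper: simulate `n_steps` of a `dim`-dimensional OU process."""
    rng = np.random.default_rng(seed)
    # ASSUMPTION: start at the long-run mean when no initial state is given.
    x = np.full(dim, mu, dtype=float) if x0 is None else np.array(x0, dtype=float)
    path = np.empty((n_steps, dim))
    for i in range(n_steps):
        # Drift pulls x toward mu; diffusion adds Gaussian noise scaled by sqrt(dt).
        dx = theta * (mu - x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(dim)
        x = x + dx
        path[i] = x
    return path


def noisy_action(action, noise, low, high):
    """Hypothetical helper: add noise to an action and clip to assumed bounds."""
    return np.clip(action + noise, low, high)


path = simulate_ou(n_steps=5, dim=2)
action = noisy_action(np.array([0.9, -0.2]), path[-1], low=-1.0, high=1.0)
```

Because each noise sample depends on the previous one, consecutive perturbations are correlated in time, unlike independent Gaussian noise drawn fresh at every step.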
Example
$ python garage/tf/exploration_strategies/ou_strategy.py