garage.np.exploration_strategies package

Exploration strategies which use NumPy as a numerical backend.

class EpsilonGreedyStrategy(env_spec, total_timesteps, max_epsilon=1.0, min_epsilon=0.02, decay_ratio=0.1)[source]

Bases: garage.np.exploration_strategies.base.ExplorationStrategy

ϵ-greedy exploration strategy.

Select action based on the value of ϵ. ϵ will decrease from max_epsilon to min_epsilon within decay_ratio * total_timesteps.

At state s, with probability 1 − ϵ, select action = argmax Q(s, a); with probability ϵ, select a random action from a uniform distribution over the action space.

Parameters:
  • env_spec (garage.envs.env_spec.EnvSpec) – Environment specification.
  • total_timesteps (int) – Total steps in the training, equivalent to max_path_length * n_epochs.
  • max_epsilon (float) – The maximum (starting) value of epsilon.
  • min_epsilon (float) – The minimum (terminal) value of epsilon.
  • decay_ratio (float) – Fraction of total steps for epsilon decay.
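
The schedule implied by these parameters interpolates from max_epsilon down to min_epsilon over the first decay_ratio * total_timesteps steps, after which epsilon stays at min_epsilon. A minimal sketch of such a linear schedule (illustrative only, not the library's exact code):

    def epsilon_at(t, total_timesteps, max_epsilon=1.0,
                   min_epsilon=0.02, decay_ratio=0.1):
        """Linear epsilon decay implied by the parameters above."""
        decay_period = decay_ratio * total_timesteps
        fraction = min(t / decay_period, 1.0)
        return max_epsilon + fraction * (min_epsilon - max_epsilon)

    # epsilon_at(0, 10000) == 1.0; epsilon_at(1000, 10000) == 0.02
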
get_action(t, observation, policy, **kwargs)[source]

Get action from this policy for the input observation.

Parameters:
  • t – Iteration.
  • observation – Observation from the environment.
  • policy – Policy network to predict action based on the observation.
Returns:

optimal action from this policy.

Return type:

opt_action
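
A minimal usage sketch for a rollout loop; env, policy, and exploration_strategy are placeholders for any environment matching env_spec, a discrete Q-derived policy, and an EpsilonGreedyStrategy instance:

    # Illustrative only: epsilon-greedy action selection during a rollout.
    obs = env.reset()
    for t in range(total_timesteps):
        action = exploration_strategy.get_action(t, obs, policy)
        obs, reward, done, _ = env.step(action)
        if done:
            obs = env.reset()
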

get_actions(t, observations, policy, **kwargs)[source]

Get actions from this policy for the input observations.

Parameters:
  • t – Iteration.
  • observations – Observations from the environment.
  • policy – Policy network to predict action based on the observation.
Returns:

optimal actions from this policy.

Return type:

opt_actions

class ExplorationStrategy[source]

Bases: object

get_action(t, observation, policy, **kwargs)[source]
get_actions(t, observations, policy, **kwargs)[source]
reset()[source]
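
A sketch of how a custom strategy could implement this interface; the Gaussian-noise strategy below is illustrative and assumes the policy's get_action/get_actions return (action, agent_info) pairs:

    import numpy as np

    class GaussianNoiseStrategy(ExplorationStrategy):
        """Illustrative subclass: add fixed Gaussian noise to actions."""

        def __init__(self, sigma=0.1):
            self._sigma = sigma

        def get_action(self, t, observation, policy, **kwargs):
            action, _ = policy.get_action(observation)
            return action + np.random.normal(0., self._sigma, action.shape)

        def get_actions(self, t, observations, policy, **kwargs):
            actions, _ = policy.get_actions(observations)
            return actions + np.random.normal(0., self._sigma, actions.shape)

        def reset(self):
            pass
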
class OUStrategy(env_spec, mu=0, sigma=0.3, theta=0.15, dt=0.01, x0=None)[source]

Bases: garage.np.exploration_strategies.base.ExplorationStrategy

An Ornstein-Uhlenbeck (OU) exploration strategy that adds temporally correlated noise to environment actions.

Parameters:
  • env_spec – Environment specification for OUStrategy to explore.
  • mu – Mean that the process reverts to (the long-run average of the noise).
  • sigma – Standard deviation (scale) of the noise term.
  • theta – Rate at which the process reverts toward mu.
  • dt – Time-step size used to discretize the process.
  • x0 – Initial state of the process.

Example

$ python garage/np/exploration_strategies/ou_strategy.py
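
A minimal construction/usage sketch; env, policy, and max_path_length are placeholders for a continuous-control environment, its policy, and the rollout length:

    # Illustrative only: add OU noise to a continuous policy's actions.
    ou = OUStrategy(env.spec, sigma=0.2)
    ou.reset()
    obs = env.reset()
    for t in range(max_path_length):
        action = ou.get_action(t, obs, policy)
        obs, reward, done, _ = env.step(action)
        if done:
            break
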

get_action(t, observation, policy, **kwargs)[source]

Return an action with noise.

Parameters:
  • t – Iteration.
  • observation – Observation from the environment.
  • policy – Policy network to predict action based on the observation.
Returns:

An action with OU noise added by this strategy.

get_actions(t, observations, policy, **kwargs)[source]
reset()[source]

Reset the state of the exploration.

simulate()[source]

Compute the next state of the exploration.

Returns:

Next state of the exploration.

Return type:

self.state
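
simulate() advances the noise state by one discretized Ornstein-Uhlenbeck step. An illustrative NumPy version of that update (not the library's exact code):

    import numpy as np

    # x_{t+1} = x_t + theta * (mu - x_t) * dt + sigma * sqrt(dt) * N(0, 1)
    def ou_step(x, mu, theta, sigma, dt):
        noise = np.random.normal(size=np.shape(x))
        return x + theta * (mu - x) * dt + sigma * np.sqrt(dt) * noise
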