garage.np.exploration_strategies.ou_strategy module

This module implements an Ornstein-Uhlenbeck (OU) exploration strategy.

The strategy draws its exploration noise from the Ornstein-Uhlenbeck process. It is often used with the DDPG algorithm because continuous control tasks benefit from temporally correlated exploration, which produces smoother transitions between consecutive actions. An OU process is relatively smooth in time, so successive noise samples stay close to each other instead of jumping independently.
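The difference between temporally correlated and independent noise can be checked numerically. The sketch below is not part of this module: `ou_path` and `lag1_autocorr` are hypothetical helpers, and the update rule assumes a standard Euler-Maruyama discretization of the OU process using the default parameter values from the class signature.

```python
import numpy as np


def ou_path(n_steps, mu=0.0, sigma=0.3, theta=0.15, dt=0.01, seed=0):
    """Simulate a scalar OU path (hypothetical helper, Euler-Maruyama step)."""
    rng = np.random.default_rng(seed)
    path = np.empty(n_steps)
    state = mu
    for i in range(n_steps):
        # Drift pulls the state toward mu; diffusion adds scaled Gaussian noise.
        state += theta * (mu - state) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        path[i] = state
    return path


def lag1_autocorr(x):
    """Sample correlation between consecutive values x[t] and x[t + 1]."""
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])


ou = ou_path(5000)
# Independent Gaussian noise with the same sigma, for comparison.
white = 0.3 * np.random.default_rng(1).standard_normal(5000)

ou_corr = lag1_autocorr(ou)
white_corr = lag1_autocorr(white)
print(f"OU lag-1 autocorrelation:    {ou_corr:.3f}")
print(f"white lag-1 autocorrelation: {white_corr:.3f}")
```

Under these assumptions the OU samples should be strongly correlated from one step to the next, while the independent samples should show a correlation close to zero.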

class OUStrategy(env_spec, mu=0, sigma=0.3, theta=0.15, dt=0.01, x0=None)[source]

Bases: garage.np.exploration_strategies.base.ExplorationStrategy

An OU exploration strategy to add noise to environment actions.

Parameters:
  • env_spec – Specification of the environment that OUStrategy explores.
  • mu – Long-run mean that the OU process reverts to.
  • sigma – Scale of the random noise added at each step.
  • theta – Rate at which the process reverts toward mu.
  • dt – Time step used to simulate the process.
  • x0 – Initial state of the process.
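For reference, the roles of these parameters can be read off the standard form of the OU process. The equations below are the textbook definition and its Euler-Maruyama discretization; that the module uses exactly this discretization is an assumption, not something stated on this page.

```latex
% Continuous-time OU process (W_t is a Wiener process)
dx_t = \theta\,(\mu - x_t)\,dt + \sigma\,dW_t

% Euler-Maruyama step of size \Delta t (corresponds to the dt parameter)
x_{t+\Delta t} = x_t + \theta\,(\mu - x_t)\,\Delta t
               + \sigma\,\sqrt{\Delta t}\;\varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}(0, 1)
```

A larger theta pulls the state back to mu faster, a larger sigma makes each step noisier, and x0 sets the value of the state at the first step.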

Example

$ python garage/tf/exploration_strategies/ou_strategy.py

get_action(t, observation, policy, **kwargs)[source]

Return an action with noise.

Parameters:
  • t – Iteration.
  • observation – Observation from the environment.
  • policy – Policy used to predict an action from the observation.
Returns: An action with OU exploration noise added.
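One plausible way to combine a policy action with the current OU state is shown below. This is a sketch rather than the module's code: `noisy_action`, `low`, and `high` are hypothetical names, and clipping the result to the action bounds is an assumption that this page does not confirm.

```python
import numpy as np


def noisy_action(policy_action, ou_state, low, high):
    """Add OU noise to a policy action and clip to the bounds (hypothetical helper)."""
    perturbed = np.asarray(policy_action, dtype=float) + np.asarray(ou_state, dtype=float)
    # Assumption: the environment rejects or saturates out-of-range actions,
    # so the perturbed action is clipped back into [low, high].
    return np.clip(perturbed, low, high)


action = np.array([0.9, -0.2])    # action predicted by the policy
ou_state = np.array([0.3, 0.05])  # current state of the OU noise process
low = np.array([-1.0, -1.0])      # hypothetical action-space lower bounds
high = np.array([1.0, 1.0])       # hypothetical action-space upper bounds

explored = noisy_action(action, ou_state, low, high)
print(explored)
```

In this example the first component would exceed the upper bound after noise is added, so it is clipped, while the second component is only shifted by the noise.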

get_actions(t, observations, policy, **kwargs)[source]
reset()[source]

Reset the state of the exploration.

simulate()[source]

Compute the next state of the exploration.

Returns: Next state of the exploration.
Return type: self.state
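How reset() and simulate() might interact with the stored state can be sketched as follows. `OUNoiseSketch` is an illustrative stand-in, not the garage class: it takes a hypothetical `action_dim` instead of `env_spec`, assumes an Euler-Maruyama update, and assumes that reset() starts from x0 when it is given and from mu otherwise.

```python
import numpy as np


class OUNoiseSketch:
    """Illustrative stand-in for an OU noise process (not the garage class)."""

    def __init__(self, action_dim, mu=0.0, sigma=0.3, theta=0.15, dt=0.01,
                 x0=None, seed=None):
        self.action_dim = action_dim  # hypothetical replacement for env_spec
        self.mu, self.sigma, self.theta, self.dt = mu, sigma, theta, dt
        self.x0 = x0
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        """Reset the state; assumed to start at x0 if given, else at mu."""
        start = self.mu if self.x0 is None else self.x0
        self.state = np.full(self.action_dim, start, dtype=float)

    def simulate(self):
        """Advance the process by one step of size dt and return the new state."""
        drift = self.theta * (self.mu - self.state) * self.dt
        diffusion = (self.sigma * np.sqrt(self.dt)
                     * self.rng.standard_normal(self.action_dim))
        self.state = self.state + drift + diffusion
        return self.state


noise = OUNoiseSketch(action_dim=2, seed=0)
first = noise.simulate().copy()
second = noise.simulate().copy()
noise.reset()  # the state returns to its starting value
```

Calling reset() at the start of each episode would keep noise accumulated in one episode from carrying over into the next, which is the usual reason an exploration strategy exposes such a method.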