garage.np.algos.cma_es module

Covariance Matrix Adaptation Evolution Strategy.

class CMAES(env_spec, policy, baseline, n_samples, discount=0.99, max_path_length=500, sigma0=1.0)[source]

Bases: garage.np.algos.rl_algorithm.RLAlgorithm

Covariance Matrix Adaptation Evolution Strategy.

Note

The CMA-ES method can hardly learn a successful policy even for simple tasks. It is kept here only for consistency with the original rllab paper.

Parameters:
  • env_spec (garage.envs.EnvSpec) – Environment specification.
  • policy (garage.np.policies.Policy) – Action policy.
  • baseline (garage.np.baselines.Baseline) – Baseline for GAE (Generalized Advantage Estimation).
  • n_samples (int) – Number of policies sampled in one epoch.
  • discount (float) – Environment reward discount.
  • max_path_length (int) – Maximum length of a single rollout.
  • sigma0 (float) – Initial standard deviation of the parameter distribution.
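
A minimal usage sketch follows. It assumes the TensorFlow runner workflow from garage's example launchers (run_experiment, LocalTFRunner, TfEnv, CategoricalMLPPolicy, LinearFeatureBaseline); exact import paths, runner names, and train() keyword arguments may differ between garage versions, so treat this as illustrative rather than canonical.

    from garage.experiment import run_experiment
    from garage.np.algos import CMAES
    from garage.np.baselines import LinearFeatureBaseline
    from garage.tf.envs import TfEnv
    from garage.tf.experiment import LocalTFRunner
    from garage.tf.policies import CategoricalMLPPolicy


    def run_task(snapshot_config, *_):
        """Illustrative sketch: train CMA-ES on CartPole-v1."""
        with LocalTFRunner(snapshot_config=snapshot_config) as runner:
            env = TfEnv(env_name='CartPole-v1')
            policy = CategoricalMLPPolicy(name='policy',
                                          env_spec=env.spec,
                                          hidden_sizes=(32, 32))
            baseline = LinearFeatureBaseline(env_spec=env.spec)

            n_samples = 20  # policies sampled per epoch
            algo = CMAES(env_spec=env.spec,
                         policy=policy,
                         baseline=baseline,
                         n_samples=n_samples,
                         max_path_length=100,
                         sigma0=1.0)

            runner.setup(algo, env)
            # batch_size: environment steps collected per sampled policy;
            # one epoch cycle per sampled policy (if the runner supports it).
            runner.train(n_epochs=100, batch_size=1000,
                         n_epoch_cycles=n_samples)


    run_experiment(run_task, snapshot_mode='last', seed=1)
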
train(runner)[source]

Initialize variables and start training.

Parameters: runner (LocalRunner) – LocalRunner is passed to give the algorithm access to runner.step_epochs(), which provides services such as snapshotting and sampler control.
Returns: The average return in the last epoch cycle.
Return type: float
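
The epoch loop driven by runner.step_epochs() follows the standard ask/evaluate/tell pattern of an evolution strategy. The sketch below is illustrative only: it uses the standalone cma package (cma.CMAEvolutionStrategy) with a toy stand-in objective to show the shape of that loop, not the class's actual implementation or its integration with the sampler.

    import cma
    import numpy as np

    # Hypothetical stand-ins for what CMAES gets from garage: a flat
    # initial policy parameter vector and a rollout-based evaluator.
    init_params = np.zeros(16)

    def average_return(params):
        # Toy stand-in objective; in garage this would run rollouts
        # with the parameterized policy and average the returns.
        return -np.sum((params - 0.5) ** 2)

    # sigma0=1.0 and popsize=20 mirror the sigma0 and n_samples arguments.
    es = cma.CMAEvolutionStrategy(init_params, 1.0, {'popsize': 20})

    for epoch in range(10):
        candidates = es.ask()                        # sample n_samples parameter vectors
        returns = [average_return(x) for x in candidates]
        es.tell(candidates, [-r for r in returns])   # cma minimizes, so negate returns
        best_so_far = max(returns)
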
train_once(itr, paths)[source]

Perform one step of policy optimization given one batch of samples.

Parameters:
  • itr (int) – Iteration number.
  • paths (list[dict]) – A list of collected paths.
Returns: The average return in the last epoch cycle.
Return type: float
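
For reference, the average return reported by train_once can be thought of as the mean of per-path returns over the batch. The snippet below is a hedged illustration of that statistic, assuming each path dict in paths carries a 'rewards' array; it does not reproduce the class's exact bookkeeping (e.g. how epoch cycles are averaged).

    import numpy as np

    def average_return(paths, discount=0.99):
        """Illustrative: mean discounted return over a batch of paths."""
        returns = []
        for path in paths:
            rewards = np.asarray(path['rewards'])
            discounts = discount ** np.arange(len(rewards))
            returns.append(np.sum(discounts * rewards))
        return float(np.mean(returns))
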