garage.np.algos.cma_es

Covariance Matrix Adaptation Evolution Strategy.

class CMAES(env_spec, policy, baseline, n_samples, discount=0.99, sigma0=1.0)

Bases: garage.np.algos.rl_algorithm.RLAlgorithm


Covariance Matrix Adaptation Evolution Strategy.

Note

The CMA-ES method rarely learns a successful policy, even on simple tasks. It is maintained here only for consistency with the original rllab paper.

Parameters
  • env_spec (EnvSpec) – Environment specification.

  • policy (garage.np.policies.Policy) – Action policy.

  • baseline (garage.np.baselines.Baseline) – Baseline for GAE (Generalized Advantage Estimation).

  • n_samples (int) – Number of policies sampled in one epoch.

  • discount (float) – Environment reward discount.

  • sigma0 (float) – Initial standard deviation of the parameter distribution.
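
A minimal usage sketch, modeled on garage's example launchers. The policy class, trainer, environment, and hyperparameter values below are illustrative assumptions, and the exact sampler wiring differs between garage releases:

    from garage import wrap_experiment
    from garage.envs import GymEnv
    from garage.np.algos import CMAES
    from garage.np.baselines import LinearFeatureBaseline
    from garage.tf.policies import CategoricalMLPPolicy
    from garage.trainer import TFTrainer


    @wrap_experiment
    def cma_es_cartpole(ctxt=None):
        """Illustrative CMA-ES setup; values are not tuned."""
        with TFTrainer(ctxt) as trainer:
            env = GymEnv('CartPole-v1')
            policy = CategoricalMLPPolicy(env_spec=env.spec,
                                          hidden_sizes=(32, 32))
            baseline = LinearFeatureBaseline(env_spec=env.spec)
            algo = CMAES(env_spec=env.spec,
                         policy=policy,
                         baseline=baseline,
                         n_samples=20)  # evaluate 20 parameter vectors per epoch
            trainer.setup(algo, env)
            trainer.train(n_epochs=100, batch_size=1000)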

train(self, trainer)

Initialize variables and start training.

Parameters

trainer (Trainer) – The trainer, passed to give the algorithm access to trainer.step_epochs(), which provides services such as snapshotting and sampler control.

Returns

The average return of the last epoch cycle.

Return type

float
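
For orientation, the epoch loop that train() drives has roughly the following shape. This is a conceptual sketch, not the actual garage source; trainer.obtain_samples(), trainer.step_itr, and self._n_samples are assumed names:

    # Conceptual sketch of train(); not the actual implementation.
    def train(self, trainer):
        last_return = None
        for _ in trainer.step_epochs():       # snapshotting, logging, timing
            for _ in range(self._n_samples):  # one rollout per candidate
                paths = trainer.obtain_samples(trainer.step_itr)
                last_return = self.train_once(trainer.step_itr, paths)
                trainer.step_itr += 1
        return last_return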

train_once(self, itr, paths)

Perform one step of policy optimization given one batch of samples.

Parameters
  • itr (int) – Iteration number.

  • paths (list[dict]) – A list of collected paths.

Returns

The average return of the last epoch cycle.

Return type

float
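
train_once() feeds each candidate's return into the standard CMA-ES ask/tell cycle (garage's implementation builds on the cma package). The following self-contained sketch shows that cycle on a toy objective; evaluate() and all values here are placeholders for a policy rollout, not garage code:

    import cma
    import numpy as np

    def evaluate(params):
        """Toy stand-in for a policy rollout's average return."""
        return -float(np.sum((params - 0.5) ** 2))

    # Ask for a population, evaluate it, and tell CMA-ES the results.
    # cma minimizes, so returns are negated before the tell() call.
    es = cma.CMAEvolutionStrategy(np.zeros(8), 1.0, {'popsize': 20})
    while not es.stop():
        candidates = es.ask()  # sample popsize parameter vectors
        returns = [evaluate(c) for c in candidates]
        es.tell(candidates, [-r for r in returns])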