garage.np.algos.cma_es

Covariance Matrix Adaptation Evolution Strategy.
class CMAES(env_spec, policy, baseline, n_samples, discount=0.99, max_episode_length=500, sigma0=1.0)

Bases: garage.np.algos.rl_algorithm.RLAlgorithm

Covariance Matrix Adaptation Evolution Strategy.
Note
CMA-ES can rarely learn a successful policy, even for simple tasks. It is maintained here only for consistency with the original rllab paper.
Parameters: - env_spec (EnvSpec) – Environment specification.
- policy (garage.np.policies.Policy) – Action policy.
- baseline (garage.np.baselines.Baseline) – Baseline for GAE (Generalized Advantage Estimation).
- n_samples (int) – Number of policies sampled in one epoch.
- discount (float) – Environment reward discount.
- max_episode_length (int) – Maximum length of a single episode.
- sigma0 (float) – Initial standard deviation of the parameter distribution.
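The algorithm treats the policy's parameter vector as a point to optimize: each epoch it samples n_samples parameter vectors from a multivariate Gaussian, evaluates each resulting policy's return, and updates the Gaussian's mean and covariance toward the best samples. The loop below is a minimal self-contained sketch of that idea (rank-mu covariance update only, fixed step size) on a toy objective; it is not garage's implementation, and `simplified_cmaes` is a hypothetical name.

```python
import numpy as np


def simplified_cmaes(objective, x0, sigma0=1.0, n_samples=10, n_epochs=60, seed=0):
    """Illustrative CMA-ES-style loop: sample, rank, update mean and covariance.

    A sketch only: real CMA-ES also adapts the step size and mixes in a
    rank-one covariance term.  Here ``objective`` is minimized.
    """
    rng = np.random.default_rng(seed)
    dim = len(x0)
    mean = np.array(x0, dtype=float)
    cov = np.eye(dim)
    sigma = sigma0
    mu = n_samples // 2                                  # number of elite samples
    weights = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    weights /= weights.sum()                             # positive, sum to 1
    for _ in range(n_epochs):
        # Sample candidate parameter vectors from N(mean, sigma^2 * cov).
        samples = rng.multivariate_normal(mean, sigma**2 * cov, size=n_samples)
        # Rank candidates by objective value (lower is better here).
        order = np.argsort([objective(s) for s in samples])
        elite = samples[order[:mu]]
        # Move the mean toward the weighted elite samples.
        old_mean = mean
        mean = weights @ elite
        # Rank-mu covariance estimate from the elite deviations.
        devs = (elite - old_mean) / sigma
        cov = sum(w * np.outer(d, d) for w, d in zip(weights, devs))
        cov += 1e-8 * np.eye(dim)                        # keep covariance well-conditioned
    return mean


# Toy check: minimize a shifted quadratic, optimum at (3, 3).
best = simplified_cmaes(lambda x: np.sum((x - 3.0) ** 2), x0=[0.0, 0.0])
```

In garage the "objective" is the average discounted episode return of the sampled policy parameters (maximized rather than minimized), and sigma0 above plays the same role as the constructor's sigma0.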
train(self, runner)

Initialize variables and start training.

Parameters: runner (LocalRunner) – LocalRunner is passed to give the algorithm access to runner.step_epochs(), which provides services such as snapshotting and sampler control.

Returns: The average return in the last epoch cycle.

Return type: float