garage.np.algos

Reinforcement learning algorithms which use NumPy as a numerical backend.

class CEM(env_spec, policy, sampler, n_samples, discount=0.99, init_std=1, best_frac=0.05, extra_std=1.0, extra_decay_time=100)

Bases: garage.np.algos.rl_algorithm.RLAlgorithm

Cross Entropy Method.

CEM works by iteratively optimizing a Gaussian distribution over policy parameters.

In each epoch, CEM does the following (see the sketch after this list):

  1. Sample n_samples policies from a Gaussian distribution with mean cur_mean and std cur_std.

  2. Collect episodes for each sampled policy.

  3. Update cur_mean and cur_std by doing Maximum Likelihood Estimation over the n_best top policies in terms of return.
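
For intuition, here is a minimal NumPy sketch of one such epoch over a flat policy parameter vector. It is illustrative only: evaluate is a hypothetical stand-in for collecting episodes and averaging their returns, and the decaying extra_std term that the real implementation mixes into the sampling distribution is omitted.

    import numpy as np

    def cem_epoch(cur_mean, cur_std, evaluate, n_samples=20, best_frac=0.05):
        # 1. Sample candidate parameter vectors from N(cur_mean, cur_std**2).
        candidates = cur_mean + cur_std * np.random.randn(n_samples,
                                                          cur_mean.size)
        # 2. "Collect episodes" for each candidate, reduced here to one
        #    scalar return per candidate via the hypothetical evaluate().
        returns = np.array([evaluate(params) for params in candidates])
        # 3. MLE fit of mean and std to the best fraction of candidates,
        #    ranked by return.
        n_best = max(1, int(n_samples * best_frac))
        elite = candidates[np.argsort(-returns)[:n_best]]
        return elite.mean(axis=0), elite.std(axis=0)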

Parameters
  • env_spec (EnvSpec) – Environment specification.

  • policy (garage.np.policies.Policy) – Action policy.

  • sampler (garage.sampler.Sampler) – Sampler.

  • n_samples (int) – Number of policies sampled in one epoch.

  • discount (float) – Environment reward discount.

  • best_frac (float) – Fraction of the sampled policies, ranked by return, used to re-estimate the distribution each epoch.

  • init_std (float) – Initial std for policy param distribution.

  • extra_std (float) – Decaying std added to param distribution.

  • extra_decay_time (float) – Epochs that it takes to decay extra std.

train(self, trainer)

Initialize variables and start training.

Parameters

trainer (Trainer) – Experiment trainer, which provides services such as snapshotting and sampler control.

Returns

The average return in the last epoch cycle.

Return type

float
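
For context, a minimal end-to-end sketch of wiring CEM into a trainer, modeled on garage's cem_cartpole example. Exact import paths and the policy class vary across garage versions, so treat the names below as assumptions rather than a pinned recipe.

    from garage import wrap_experiment
    from garage.envs import GymEnv
    from garage.np.algos import CEM
    from garage.sampler import LocalSampler
    from garage.tf.policies import CategoricalMLPPolicy
    from garage.trainer import TFTrainer

    @wrap_experiment
    def cem_cartpole(ctxt=None):
        with TFTrainer(ctxt) as trainer:
            env = GymEnv('CartPole-v1')
            policy = CategoricalMLPPolicy(env_spec=env.spec,
                                          hidden_sizes=(32, 32))
            sampler = LocalSampler(
                agents=policy,
                envs=env,
                max_episode_length=env.spec.max_episode_length)
            algo = CEM(env_spec=env.spec,
                       policy=policy,
                       sampler=sampler,
                       n_samples=20,
                       best_frac=0.05)
            trainer.setup(algo, env)
            trainer.train(n_epochs=100, batch_size=1000)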

class CMAES(env_spec, policy, sampler, n_samples, discount=0.99, sigma0=1.0)

Bases: garage.np.algos.rl_algorithm.RLAlgorithm

Covariance Matrix Adaptation Evolution Strategy.

Note

The CMA-ES method can hardly learn a successful policy even for simple tasks. It is maintained here only for consistency with the original rllab paper.

Parameters
  • env_spec (EnvSpec) – Environment specification.

  • policy (garage.np.policies.Policy) – Action policy.

  • sampler (garage.sampler.Sampler) – Sampler.

  • n_samples (int) – Number of policies sampled in one epoch.

  • discount (float) – Environment reward discount.

  • sigma0 (float) – Initial std for param distribution.

train(self, trainer)

Initialize variables and start training.

Parameters

trainer (Trainer) – Trainer is passed to give the algorithm access to trainer.step_epochs(), which provides services such as snapshotting and sampler control.

Returns

The average return in the last epoch cycle.

Return type

float
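
For intuition about sigma0, here is a sketch of the underlying ask/tell evolution-strategy loop, using the standalone cma package on a toy objective. The objective and its target vector are hypothetical, and this illustrates the method rather than garage's internal wiring.

    import cma
    import numpy as np

    # Hypothetical objective: distance to a fixed target vector
    # (cma minimizes, so smaller is better).
    target = np.ones(8)
    def cost(params):
        return float(np.linalg.norm(params - target))

    # sigma0 sets the initial spread of the search distribution,
    # matching the sigma0 argument of CMAES above.
    es = cma.CMAEvolutionStrategy(np.zeros(8), 1.0, {'popsize': 20})
    while not es.stop():
        candidates = es.ask()      # sample a population of solutions
        es.tell(candidates, [cost(c) for c in candidates])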

class MetaRLAlgorithm

Bases: garage.np.algos.rl_algorithm.RLAlgorithm, abc.ABC

Base class for Meta-RL Algorithms.

abstract get_exploration_policy(self)

Return a policy used before adaptation to a specific task.

Each time it is retrieved, this policy should only be evaluated in one task.

Returns

The policy used to obtain samples, which are later used for meta-RL adaptation.

Return type

Policy

abstract adapt_policy(self, exploration_policy, exploration_episodes)

Produce a policy adapted for a task.

Parameters
  • exploration_policy (Policy) – A policy which was returned from get_exploration_policy(), and which generated exploration_episodes by interacting with an environment. The caller may not use this object after passing it into this method.

  • exploration_episodes (EpisodeBatch) – Episodes with which to adapt. These are generated by exploration_policy while exploring the environment.

Returns

A policy adapted to the task represented by the exploration_episodes.

Return type

Policy
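
Taken together, get_exploration_policy() and adapt_policy() define a per-task adaptation protocol. A minimal sketch, assuming algo is a concrete MetaRLAlgorithm and collect_episodes is a hypothetical helper that rolls a policy out in a single task and returns an EpisodeBatch:

    def adapt_to_task(algo, task_env, collect_episodes):
        # Fresh exploration policy; by contract it is evaluated in this
        # one task only.
        exploration_policy = algo.get_exploration_policy()
        # Gather exploration episodes in the target task.
        exploration_episodes = collect_episodes(exploration_policy, task_env)
        # adapt_policy() consumes both (the caller may not reuse
        # exploration_policy afterwards) and returns a task-specific policy.
        return algo.adapt_policy(exploration_policy, exploration_episodes)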

abstract train(self, trainer)

Obtain samplers and start actual training for each epoch.

Parameters

trainer (Trainer) – Trainer is passed to give the algorithm access to trainer.step_epochs(), which provides services such as snapshotting and sampler control.

class NOP

Bases: garage.np.algos.rl_algorithm.RLAlgorithm

NOP (no optimization performed) policy search algorithm.

init_opt(self)

Initialize the optimization procedure.

optimize_policy(self, paths)

Optimize the policy using the samples.

Parameters

paths (list[dict]) – A list of collected paths.

train(self, trainer)

Obtain samplers and start actual training for each epoch.

Parameters

trainer (Trainer) – Trainer is passed to give the algorithm access to trainer.step_epochs(), which provides services such as snapshotting and sampler control.

class RLAlgorithm

Bases: abc.ABC

Base class for all the algorithms.

Note

If the field sampler_cls exists, it will be used by Trainer.setup to initialize a sampler.

abstract train(self, trainer)

Obtain samplers and start actual training for each epoch.

Parameters

trainer (Trainer) – Trainer is passed to give the algorithm access to trainer.step_epochs(), which provides services such as snapshotting and sampler control.
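
To illustrate the contract, a minimal sketch of a subclass whose train() drives trainer.step_epochs(). The class is hypothetical, and obtain_samples() is assumed to return the same list of path dicts described for NOP.optimize_policy above.

    import numpy as np

    from garage.np.algos.rl_algorithm import RLAlgorithm

    class LogOnly(RLAlgorithm):
        """Toy algorithm: collects paths and tracks mean return, no learning."""

        def __init__(self, env_spec, policy, sampler):
            self._env_spec = env_spec
            self.policy = policy
            self._sampler = sampler

        def train(self, trainer):
            last_return = None
            # step_epochs() wraps each iteration with the snapshotting and
            # sampler-control services noted above.
            for epoch in trainer.step_epochs():
                paths = trainer.obtain_samples(epoch)
                last_return = np.mean([sum(p['rewards']) for p in paths])
                trainer.step_itr += 1
            return last_return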