garage.torch.algos.maml

Model-Agnostic Meta-Learning (MAML) algorithm implementation for RL.

class MAML(inner_algo, env, policy, sampler, task_sampler, meta_optimizer, meta_batch_size=40, inner_lr=0.1, outer_lr=0.001, num_grad_updates=1, meta_evaluator=None, evaluate_every_n_epochs=1)

Model-Agnostic Meta-Learning (MAML).

Parameters
  • inner_algo (garage.torch.algos.VPG) – The inner algorithm used for computing loss.

  • env (Environment) – An environment.

  • policy (garage.torch.policies.Policy) – Policy.

  • sampler (garage.sampler.Sampler) – Sampler.

  • task_sampler (garage.experiment.TaskSampler) – Task sampler.

  • meta_optimizer (Union[torch.optim.Optimizer, tuple]) – Type of optimizer. This can be an optimizer type such as torch.optim.Adam, or a tuple of an optimizer type and a dictionary of keyword arguments used to initialize the optimizer, e.g. (torch.optim.Adam, {'lr': 1e-3}).

  • meta_batch_size (int) – Number of tasks sampled per batch.

  • inner_lr (float) – Adaptation learning rate.

  • outer_lr (float) – Meta policy learning rate.

  • num_grad_updates (int) – Number of adaptation gradient steps.

  • meta_evaluator (MetaEvaluator) – A meta evaluator for meta-testing. If None, don’t do meta-testing.

  • evaluate_every_n_epochs (int) – Do meta-testing every this many epochs.
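The roles of meta_batch_size, inner_lr, outer_lr, and num_grad_updates can be illustrated with a minimal, self-contained sketch of the MAML update structure. This is a hypothetical toy (each "task" is fitting a scalar theta to a target, with a first-order meta-gradient), not the garage implementation:

```python
inner_lr = 0.1        # adaptation learning rate (inner_lr above)
outer_lr = 0.001      # meta policy learning rate (outer_lr above)
num_grad_updates = 1  # adaptation gradient steps (num_grad_updates above)

def grad(theta, target):
    """d/dtheta of the toy task loss (theta - target)^2."""
    return 2.0 * (theta - target)

def adapt(theta, target):
    """Inner loop: take num_grad_updates plain gradient steps on one task."""
    for _ in range(num_grad_updates):
        theta = theta - inner_lr * grad(theta, target)
    return theta

def meta_step(theta, targets):
    """Outer loop: average post-adaptation gradients across the meta-batch
    (a first-order approximation) and update the meta-parameters."""
    meta_grad = sum(grad(adapt(theta, t), t) for t in targets) / len(targets)
    return theta - outer_lr * meta_grad

theta = 0.0
targets = [1.0, -1.0, 2.0, -2.0]  # meta_batch_size = 4 toy tasks
for _ in range(100):              # one meta_step per "epoch"
    theta = meta_step(theta, targets)
```

In the real algorithm, adapt corresponds to the inner_algo loss minimization on sampled episodes, and meta_step to the meta_optimizer update over meta_batch_size tasks.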

train(self, trainer)

Obtain samples and start training for each epoch.

Parameters

trainer (Trainer) – Gives the algorithm access to :meth:`~Trainer.step_epochs()`, which provides services such as snapshotting and sampler control.

Returns

The average return in the last epoch cycle.

Return type

float

property policy(self)

Current policy of the inner algorithm.

Returns

Current policy of the inner algorithm.

Return type

garage.torch.policies.Policy

get_exploration_policy(self)

Return a policy used before adaptation to a specific task.

Each time it is retrieved, this policy should only be evaluated in one task.

Returns

The policy used to obtain samples that are later used for meta-RL adaptation.

Return type

Policy

adapt_policy(self, exploration_policy, exploration_episodes)

Adapt the policy by one or more gradient steps for a task.

Parameters
  • exploration_policy (Policy) – A policy which was returned from get_exploration_policy(), and which generated exploration_episodes by interacting with an environment. The caller may not use this object after passing it into this method.

  • exploration_episodes (EpisodeBatch) – Episodes with which to adapt, generated by exploration_policy exploring the environment.

Returns

A policy adapted to the task represented by the exploration_episodes.

Return type

Policy
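The get_exploration_policy / adapt_policy contract can be sketched with a self-contained toy. ToyPolicy stands in for a garage Policy and a plain dict stands in for an EpisodeBatch; the names and the one-step quadratic "loss" are illustrative assumptions, not the garage implementation:

```python
import copy

class ToyPolicy:
    """Stand-in for a garage Policy: just one scalar parameter."""
    def __init__(self, theta):
        self.theta = theta

def get_exploration_policy(meta_theta):
    # Fresh copy of the pre-adaptation (meta-learned) parameters;
    # each copy should only be evaluated in one task.
    return ToyPolicy(copy.deepcopy(meta_theta))

def adapt_policy(exploration_policy, exploration_episodes, inner_lr=0.1):
    # One gradient step on a task loss estimated from the exploration
    # episodes (here: squared distance to a per-task target).
    target = exploration_episodes["task_target"]
    grad = 2.0 * (exploration_policy.theta - target)
    exploration_policy.theta -= inner_lr * grad
    return exploration_policy  # the caller uses the adapted policy from here on

meta_theta = 0.0
explorer = get_exploration_policy(meta_theta)
episodes = {"task_target": 1.0}          # stand-in for an EpisodeBatch
adapted = adapt_policy(explorer, episodes)
```

Note that adaptation mutates only the exploration copy; the meta-learned parameters are untouched, which is why the caller must not reuse the exploration policy after passing it in.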