garage.torch.algos.maml

Model-Agnostic Meta-Learning (MAML) algorithm implementation for RL.
class MAML(inner_algo, env, policy, task_sampler, meta_optimizer, meta_batch_size=40, inner_lr=0.1, outer_lr=0.001, num_grad_updates=1, meta_evaluator=None, evaluate_every_n_epochs=1)

Model-Agnostic Meta-Learning (MAML).
- Parameters
inner_algo (garage.torch.algos.VPG) – The inner algorithm used for computing loss.
env (Environment) – An environment.
policy (garage.torch.policies.Policy) – Policy.
task_sampler (garage.experiment.TaskSampler) – Task sampler.
meta_optimizer (Union[torch.optim.Optimizer, tuple]) – Type of optimizer. This can be an optimizer type such as torch.optim.Adam, or a tuple of an optimizer type and a dictionary of arguments used to initialize the optimizer, e.g. (torch.optim.Adam, {'lr': 1e-3}).
meta_batch_size (int) – Number of tasks sampled per batch.
inner_lr (float) – Adaptation learning rate.
outer_lr (float) – Meta policy learning rate.
num_grad_updates (int) – Number of adaptation gradient steps.
meta_evaluator (MetaEvaluator) – A meta evaluator for meta-testing. If None, don’t do meta-testing.
evaluate_every_n_epochs (int) – Interval, in epochs, between meta-test evaluations.
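The roles of inner_lr, outer_lr, num_grad_updates, and meta_batch_size can be illustrated with a toy first-order sketch of the MAML update. This is illustrative only, not garage's implementation (which computes the meta-gradient through the inner_algo loss and applies it with meta_optimizer); here each "task" is just a scalar quadratic loss.

```python
# Toy first-order MAML sketch (NOT garage code). Each task c has loss
# L_c(theta) = (theta - c)^2; the "policy" is a single scalar parameter.

def grad(theta, c):
    """Gradient of (theta - c)^2 with respect to theta."""
    return 2.0 * (theta - c)

def maml_step(theta, tasks, inner_lr=0.1, outer_lr=0.001, num_grad_updates=1):
    """One meta-update over a batch of tasks (first-order approximation)."""
    meta_grad = 0.0
    for c in tasks:                          # meta_batch_size tasks per step
        adapted = theta
        for _ in range(num_grad_updates):    # inner adaptation steps (inner_lr)
            adapted = adapted - inner_lr * grad(adapted, c)
        meta_grad += grad(adapted, c)        # outer loss evaluated post-adaptation
    meta_grad /= len(tasks)
    return theta - outer_lr * meta_grad      # outer (meta) update (outer_lr)

theta = 0.0
tasks = [1.0, -1.0, 3.0]
for _ in range(200):
    theta = maml_step(theta, tasks, inner_lr=0.1, outer_lr=0.05)
# theta converges toward the point with the best average post-adaptation loss
```

For these symmetric quadratics that point is the task mean (1.0); in RL the same structure holds, with policy-gradient estimates replacing the analytic gradients.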
train(self, trainer)

Obtain samples and start training for each epoch.
- Parameters
trainer (Trainer) – Gives the algorithm access to Trainer.step_epochs(), which provides services such as snapshotting and sampler control.
- Returns
The average return in the last epoch cycle.
- Return type
float
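The shape of the epoch loop that train() runs over trainer.step_epochs() can be sketched with a hypothetical stand-in (FakeTrainer is not a garage class; it only mimics the generator behavior of Trainer.step_epochs(), which in garage additionally handles snapshotting and sampler control):

```python
# Hypothetical sketch of MAML.train()'s epoch loop; FakeTrainer is a
# stand-in for garage's Trainer, not the real API.
class FakeTrainer:
    def __init__(self, n_epochs):
        self.n_epochs = n_epochs

    def step_epochs(self):
        # garage's Trainer.step_epochs() is likewise a generator over epochs.
        for epoch in range(self.n_epochs):
            yield epoch

def train(trainer):
    average_return = 0.0
    for epoch in trainer.step_epochs():
        # Real MAML: sample meta_batch_size tasks, adapt on each with
        # num_grad_updates inner steps, then take one meta-optimizer step.
        average_return = float(epoch + 1)  # placeholder for the epoch's return
    return average_return

result = train(FakeTrainer(n_epochs=3))
```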
property policy(self)

Current policy of the inner algorithm.
- Returns
Current policy of the inner algorithm.
- Return type
garage.torch.policies.Policy
get_exploration_policy(self)

Return a policy used before adaptation to a specific task.
Each time it is retrieved, this policy should only be evaluated in one task.
- Returns
The policy used to obtain samples that are later used for meta-RL adaptation.
- Return type
Policy
adapt_policy(self, exploration_policy, exploration_episodes)

Adapt the policy by one or more gradient steps for a task.
- Parameters
exploration_policy (Policy) – A policy which was returned from get_exploration_policy(), and which generated exploration_episodes by interacting with an environment. The caller may not use this object after passing it into this method.
exploration_episodes (EpisodeBatch) – Episodes with which to adapt, generated by exploration_policy exploring the environment.
- Returns
A policy adapted to the task represented by the exploration_episodes.
- Return type
Policy
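The caller protocol around get_exploration_policy() and adapt_policy() can be sketched as follows (FakeMAML and the episode list are hypothetical stand-ins, not garage APIs; a real caller would collect EpisodeBatch data with a sampler):

```python
# Sketch of the get_exploration_policy / adapt_policy calling protocol.
class FakeMAML:
    def get_exploration_policy(self):
        return {"adapted": False}          # stand-in for a Policy object

    def adapt_policy(self, exploration_policy, exploration_episodes):
        # Adaptation consumes the exploration policy and its episodes.
        return {"adapted": True, "n_episodes": len(exploration_episodes)}

algo = FakeMAML()
policy = algo.get_exploration_policy()     # evaluate in exactly one task
episodes = ["ep1", "ep2"]                  # episodes gathered with `policy`
adapted = algo.adapt_policy(policy, episodes)
# Per the contract above, `policy` must not be used after this call.
```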