pearl_metaworld_ml45

PEARL ML45 example.

pearl_metaworld_ml45(ctxt=None, seed=1, num_epochs=1000, num_train_tasks=45, latent_size=7, encoder_hidden_size=200, net_size=300, meta_batch_size=16, num_steps_per_epoch=4000, num_initial_steps=4000, num_tasks_sample=15, num_steps_prior=750, num_extra_rl_steps_posterior=750, batch_size=256, embedding_batch_size=64, embedding_mini_batch_size=64, reward_scale=10.0, use_gpu=False)

Train PEARL with ML45 environments.

Parameters
  • ctxt (garage.experiment.ExperimentContext) – The experiment configuration used by Trainer to create the snapshotter.

  • seed (int) – Seed for the random number generator, used to make runs reproducible.

  • num_epochs (int) – Number of training epochs.

  • num_train_tasks (int) – Number of tasks for training.

  • latent_size (int) – Size of latent context vector.

  • encoder_hidden_size (int) – Output dimension of a dense layer of the context encoder.

  • net_size (int) – Output dimension of a dense layer of Q-function and value function.

  • meta_batch_size (int) – Meta batch size, i.e. the number of tasks sampled per training step.

  • num_steps_per_epoch (int) – Number of iterations per epoch.

  • num_initial_steps (int) – Number of transitions obtained per task before training.

  • num_tasks_sample (int) – Number of randomly sampled tasks from which to collect data at each iteration.

  • num_steps_prior (int) – Number of transitions to obtain per task with z ~ prior.

  • num_extra_rl_steps_posterior (int) – Number of additional transitions to obtain per task with z ~ posterior that are only used to train the policy and NOT the encoder.

  • batch_size (int) – Number of transitions in RL batch.

  • embedding_batch_size (int) – Number of transitions in context batch.

  • embedding_mini_batch_size (int) – Number of transitions in a mini context batch; should be the same as embedding_batch_size for a non-recurrent encoder.

  • reward_scale (float) – Scaling factor applied to rewards.

  • use_gpu (bool) – Whether or not to use GPU for training.
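
The parameters above interact in a few ways that are easy to get wrong when overriding defaults, so a small sketch may help. The snippet below copies the documented defaults into a dictionary, merges user overrides, and checks the stated constraint between embedding_mini_batch_size and embedding_batch_size. The helpers make_config and collection_budget are hypothetical names introduced for illustration and are not part of garage. The budget arithmetic is one reading of the parameter descriptions (it assumes a non-recurrent encoder and that both the prior and posterior steps are collected for every sampled task), not a statement of how the library schedules sampling internally.

```python
# Documented defaults, copied from the signature above (ctxt is omitted).
DEFAULTS = dict(
    seed=1, num_epochs=1000, num_train_tasks=45, latent_size=7,
    encoder_hidden_size=200, net_size=300, meta_batch_size=16,
    num_steps_per_epoch=4000, num_initial_steps=4000, num_tasks_sample=15,
    num_steps_prior=750, num_extra_rl_steps_posterior=750, batch_size=256,
    embedding_batch_size=64, embedding_mini_batch_size=64,
    reward_scale=10.0, use_gpu=False,
)


def make_config(**overrides):
    """Merge overrides into the documented defaults (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unknown parameter(s): {sorted(unknown)}")
    config = {**DEFAULTS, **overrides}
    # Documented constraint, assuming a non-recurrent context encoder.
    if config["embedding_mini_batch_size"] != config["embedding_batch_size"]:
        raise ValueError(
            "embedding_mini_batch_size should equal embedding_batch_size "
            "for a non-recurrent encoder"
        )
    return config


def collection_budget(config):
    """Transition counts implied by the parameter descriptions (assumed reading)."""
    # Transitions gathered before training: one batch per training task.
    initial = config["num_train_tasks"] * config["num_initial_steps"]
    # Transitions gathered per sampled task: prior plus extra posterior steps.
    per_task = (config["num_steps_prior"]
                + config["num_extra_rl_steps_posterior"])
    per_iteration = config["num_tasks_sample"] * per_task
    return initial, per_iteration


config = make_config(num_epochs=10, seed=42)
initial, per_iteration = collection_budget(config)
print(initial, per_iteration)  # 180000 22500

# Assumed call; requires garage and the example module to be importable:
# pearl_metaworld_ml45(**config)
```

With the defaults, this reading gives 45 × 4000 = 180000 initial transitions and 15 × (750 + 750) = 22500 transitions per data-collection iteration. Rejecting unknown keys up front catches misspelled parameter names before a long training run starts.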