pearl_half_cheetah_vel

PEARL HalfCheetahVel example.

pearl_half_cheetah_vel(ctxt=None, seed=1, num_epochs=500, num_train_tasks=100, num_test_tasks=100, latent_size=5, encoder_hidden_size=200, net_size=300, meta_batch_size=16, num_steps_per_epoch=2000, num_initial_steps=2000, num_tasks_sample=5, num_steps_prior=400, num_extra_rl_steps_posterior=600, batch_size=256, embedding_batch_size=100, embedding_mini_batch_size=100, max_episode_length=200, reward_scale=5.0, use_gpu=False)

Train PEARL with HalfCheetahVel environment.

Parameters
  • ctxt (garage.experiment.ExperimentContext) – The experiment configuration used by Trainer to create the snapshotter.

  • seed (int) – Seed for the random number generator, used to make runs deterministic.

  • num_epochs (int) – Number of training epochs.

  • num_train_tasks (int) – Number of tasks for training.

  • num_test_tasks (int) – Number of tasks to use for testing.

  • latent_size (int) – Size of latent context vector.

  • encoder_hidden_size (int) – Output dimension of the dense layers of the context encoder.

  • net_size (int) – Output dimension of a dense layer of the Q-function and value function.

  • meta_batch_size (int) – Meta batch size.

  • num_steps_per_epoch (int) – Number of iterations per epoch.

  • num_initial_steps (int) – Number of transitions obtained per task before training.

  • num_tasks_sample (int) – Number of randomly sampled tasks for which to obtain data each iteration.

  • num_steps_prior (int) – Number of transitions to obtain per task with z ~ prior.

  • num_extra_rl_steps_posterior (int) – Number of additional transitions to obtain per task with z ~ posterior that are only used to train the policy and NOT the encoder.

  • batch_size (int) – Number of transitions in RL batch.

  • embedding_batch_size (int) – Number of transitions in context batch.

  • embedding_mini_batch_size (int) – Number of transitions in a context mini-batch; should be the same as embedding_batch_size for a non-recurrent encoder.

  • max_episode_length (int) – Maximum episode length.

  • reward_scale (float) – Reward scale.

  • use_gpu (bool) – Whether or not to use GPU for training.
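Taken together, num_tasks_sample, num_steps_prior, and num_extra_rl_steps_posterior determine how many environment transitions are gathered per iteration. The arithmetic implied by the default values above can be sketched as follows; this is an illustrative calculation based on the parameter descriptions, not a call into garage itself, and the exact collection schedule is determined by the PEARL implementation:

```python
# Illustrative sampling-budget calculation from the default hyperparameters
# documented above (not an invocation of garage).
num_tasks_sample = 5
num_steps_prior = 400               # transitions per task with z ~ prior
num_extra_rl_steps_posterior = 600  # transitions per task with z ~ posterior
                                    # (used to train the policy, not the encoder)

# Transitions gathered across the sampled tasks in one iteration:
transitions_per_iteration = num_tasks_sample * (
    num_steps_prior + num_extra_rl_steps_posterior)
print(transitions_per_iteration)  # 5 * (400 + 600) = 5000
```

With max_episode_length=200, those 1000 transitions per task correspond to 5 full-length episodes per sampled task.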