Implementation of Behavioral Cloning in PyTorch.

class BC(env_spec, learner, *, batch_size, source=None, sampler=None, policy_optimizer=torch.optim.Adam, policy_lr=_Default(0.001), loss='log_prob', minibatches_per_epoch=16, name='BC')



Behavioral Cloning.

Based on Model-Free Imitation Learning with Policy Optimization.

Parameters:

  • env_spec (EnvSpec) – Specification of environment.

  • learner (garage.torch.Policy) – Policy to train.

  • batch_size (int) – Size of optimization batch.

  • source (Policy or Generator[TimeStepBatch]) – Expert to clone. If a policy is passed, BC sets .policy to source and uses the trainer to sample from the policy.

  • sampler (garage.sampler.Sampler) – Sampler. Required if source is a policy, since the policy must then be sampled.

  • policy_optimizer (torch.optim.Optimizer) – Optimizer to be used to optimize the policy.

  • policy_lr (float) – Learning rate of the policy optimizer.

  • loss (str) – Which loss function to use. Must be either ‘log_prob’ or ‘mse’. If set to ‘log_prob’ (the default), learner must be a garage.torch.StochasticPolicy.

  • minibatches_per_epoch (int) – Number of minibatches per epoch.

  • name (str) – Name to use for logging.
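
To make the two loss options concrete, here is a minimal standard-library sketch. It is not garage's implementation: it assumes a diagonal Gaussian policy for the ‘log_prob’ case, and the helper names mse_loss and neg_log_prob_loss are hypothetical.

```python
import math


def mse_loss(predicted, expert):
    """Mean squared error between a predicted and an expert action vector."""
    assert len(predicted) == len(expert)
    return sum((p - e) ** 2 for p, e in zip(predicted, expert)) / len(expert)


def neg_log_prob_loss(mean, std, expert):
    """Negative log-density of the expert action under a diagonal Gaussian.

    Assumption: the stochastic policy outputs a mean and a standard
    deviation per action dimension.
    """
    log_prob = 0.0
    for m, s, a in zip(mean, std, expert):
        z = (a - m) / s
        log_prob += -0.5 * z ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
    return -log_prob


expert_action = [0.5, -1.0]
print(mse_loss([0.4, -0.8], expert_action))
print(neg_log_prob_loss([0.4, -0.8], [1.0, 1.0], expert_action))
```

The ‘mse’ variant only needs a point prediction, whereas the ‘log_prob’ variant needs a full action distribution, which is why the latter requires a stochastic learner.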


Raises:

  • ValueError – If learner is not a garage.torch.StochasticPolicy and loss is ‘log_prob’.
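
The check described above can be sketched as follows. The Policy and StochasticPolicy classes here are hypothetical stand-ins for the garage classes, and check_loss is an illustrative helper rather than part of the library.

```python
class Policy:
    """Hypothetical stand-in for a deterministic policy base class."""


class StochasticPolicy(Policy):
    """Hypothetical stand-in for garage.torch.StochasticPolicy."""


def check_loss(learner, loss):
    """Raise ValueError when 'log_prob' is paired with a non-stochastic learner."""
    if loss == 'log_prob' and not isinstance(learner, StochasticPolicy):
        raise ValueError("loss='log_prob' requires a StochasticPolicy learner")


check_loss(StochasticPolicy(), 'log_prob')  # accepted
check_loss(Policy(), 'mse')                 # accepted

try:
    check_loss(Policy(), 'log_prob')
except ValueError as err:
    print('rejected:', err)
```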


train(trainer)

Obtain samplers and start actual training for each epoch.

Parameters:

  • trainer (Trainer) – Experiment trainer, for services such as snapshotting and sampler control.
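
As a rough illustration of how batch_size and minibatches_per_epoch might interact during an epoch, the sketch below shuffles the indices of one batch and splits them into nearly equal minibatches. This is an assumption about the training loop, not a statement of what garage does internally; split_into_minibatches and its seed argument are hypothetical.

```python
import random


def split_into_minibatches(batch_size, minibatches_per_epoch, seed=0):
    """Shuffle sample indices and split them into nearly equal minibatches."""
    indices = list(range(batch_size))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(batch_size, minibatches_per_epoch)
    minibatches, start = [], 0
    for i in range(minibatches_per_epoch):
        # The first `extra` minibatches take one additional sample each.
        size = base + (1 if i < extra else 0)
        minibatches.append(indices[start:start + size])
        start += size
    return minibatches


chunks = split_into_minibatches(batch_size=100, minibatches_per_epoch=16)
print(len(chunks), [len(c) for c in chunks])
```

Under this assumption, each minibatch would drive one optimizer step, so an epoch performs minibatches_per_epoch updates over batch_size samples.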