garage.torch.algos.bc
Implementation of Behavioral Cloning in PyTorch.
class BC(env_spec, learner, *, batch_size, source=None, max_episode_length=None, policy_optimizer=torch.optim.Adam, policy_lr=_Default(0.001), loss='log_prob', minibatches_per_epoch=16, name='BC')
Bases: garage.np.algos.rl_algorithm.RLAlgorithm
Behavioral Cloning.
Based on Model-Free Imitation Learning with Policy Optimization: https://arxiv.org/abs/1605.08478
Parameters:
- env_spec (EnvSpec) – Specification of environment.
- learner (garage.torch.Policy) – Policy to train.
- batch_size (int) – Size of optimization batch.
- source (Policy or Generator[TimeStepBatch]) – Expert to clone. If a policy is passed, the algorithm sets .policy to source and uses the runner to sample from that policy.
- max_episode_length (int or None) – Required if a policy is passed as source.
- policy_optimizer (torch.optim.Optimizer) – Optimizer to be used to optimize the policy.
- policy_lr (float) – Learning rate of the policy optimizer.
- loss (str) – Which loss function to use. Must be either ‘log_prob’ or ‘mse’. If set to ‘log_prob’ (the default), learner must be a garage.torch.StochasticPolicy.
- minibatches_per_epoch (int) – Number of minibatches per epoch.
- name (str) – Name to use for logging.
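The two values of loss can be illustrated with a minimal, dependency-free sketch. It assumes a diagonal Gaussian learner for the 'log_prob' case and is not garage's implementation; the helper names (gaussian_log_prob, log_prob_loss, mse_loss) are hypothetical and exist only for this example.

```python
import math


def gaussian_log_prob(action, mean, std):
    """Log-density of a diagonal Gaussian, summed over action dimensions."""
    return sum(
        -0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        for a, m, s in zip(action, mean, std)
    )


def log_prob_loss(expert_actions, means, stds):
    """Negative mean log-likelihood of the expert actions under the learner.

    Assumption: the learner is stochastic and predicts a mean and a standard
    deviation per action dimension for every observation.
    """
    log_probs = [
        gaussian_log_prob(a, m, s)
        for a, m, s in zip(expert_actions, means, stds)
    ]
    return -sum(log_probs) / len(log_probs)


def mse_loss(expert_actions, learner_actions):
    """Mean squared error over all action components.

    Only needs the learner's predicted actions, so it also applies to a
    deterministic learner.
    """
    squared = [
        (a - p) ** 2
        for expert, learner in zip(expert_actions, learner_actions)
        for a, p in zip(expert, learner)
    ]
    return sum(squared) / len(squared)


# One expert action with two dimensions; the learner predicts (1.0, 3.0).
print(mse_loss([[0.0, 0.0]], [[1.0, 3.0]]))  # (1 + 9) / 2 = 5.0
```

The sketch shows why 'log_prob' needs a stochastic learner: it evaluates a probability density at the expert's action, whereas 'mse' only compares predicted and expert actions.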
Raises: ValueError – If source is a garage.Policy and max_episode_length is not passed, or if loss is ‘log_prob’ and learner is not a garage.torch.StochasticPolicy.
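The two documented error conditions can be sketched as a standalone check. This is an illustration under stated assumptions, not the library's code: Policy and StochasticPolicy below are hypothetical stand-ins for garage.Policy and garage.torch.StochasticPolicy, and check_bc_arguments is a made-up helper name.

```python
class Policy:
    """Hypothetical stand-in for garage.Policy."""


class StochasticPolicy(Policy):
    """Hypothetical stand-in for garage.torch.StochasticPolicy."""


def check_bc_arguments(learner, source, max_episode_length, loss):
    """Raise ValueError for the two argument combinations described above."""
    # A policy source has to be sampled, so an episode length limit is needed.
    if isinstance(source, Policy) and max_episode_length is None:
        raise ValueError(
            'max_episode_length must be passed if source is a policy')
    # The log-probability loss needs a learner that defines a distribution.
    if loss == 'log_prob' and not isinstance(learner, StochasticPolicy):
        raise ValueError(
            'learner must be a StochasticPolicy when loss is "log_prob"')


# A stochastic learner cloning a policy expert with an episode limit passes.
check_bc_arguments(StochasticPolicy(), Policy(), 200, 'log_prob')

# A deterministic learner is acceptable with the 'mse' loss and a generator
# source, for which max_episode_length is not required.
check_bc_arguments(Policy(), iter([]), None, 'mse')
```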
train(self, runner)
Obtain samplers and start actual training for each epoch.
Parameters: runner (LocalRunner) – Experiment runner, for services such as snapshotting and sampler control.
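As a rough picture of how minibatches_per_epoch could shape one epoch, the sketch below shuffles the sample indices of a batch and splits them into that many nearly equal minibatches. It is an assumption about the general technique, not a description of what train does internally; split_minibatches and its seed argument are hypothetical.

```python
import random


def split_minibatches(num_samples, minibatches_per_epoch, seed=0):
    """Shuffle sample indices and split them into nearly equal minibatches."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # The first `extra` minibatches receive one additional sample each.
    base, extra = divmod(num_samples, minibatches_per_epoch)
    minibatches, start = [], 0
    for i in range(minibatches_per_epoch):
        size = base + (1 if i < extra else 0)
        minibatches.append(indices[start:start + size])
        start += size
    return minibatches


# With batch_size=1000 and the default minibatches_per_epoch=16, every sample
# lands in exactly one minibatch of 62 or 63 samples.
minibatches = split_minibatches(1000, 16)
print(len(minibatches), sorted({len(m) for m in minibatches}))
```

Under this assumption, one epoch would take one optimizer step per minibatch, so a larger minibatches_per_epoch means more, smaller gradient steps per batch of expert data.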