garage.experiment.local_runner module

Provides algorithms with access to most of garage’s features.

class LocalRunner(snapshot_config, max_cpus=1)

Bases: object

Base class of local runner.

Use Runner.setup(algo, env) to set up the algorithm and environment for the runner, and Runner.train() to start training.

Parameters:
  • snapshot_config (garage.experiment.SnapshotConfig) – The snapshot configuration used by LocalRunner to create the snapshotter. If None, it will create one with default settings.
  • max_cpus (int) – The maximum number of parallel sampler workers.

Note

To use any TensorFlow environments, policies, or algorithms, use LocalTFRunner() instead.

Examples

# To train. snapshot_config is a required argument (see the signature above).
runner = LocalRunner(snapshot_config)
env = Env(…)
policy = Policy(…)
algo = Algo(
    env=env,
    policy=policy,
    …)
runner.setup(algo, env)
runner.train(n_epochs=100, batch_size=4000)

# To resume immediately.
runner = LocalRunner(snapshot_config)
runner.restore(resume_from_dir)
runner.resume()

# To resume with modified training arguments.
runner = LocalRunner(snapshot_config)
runner.restore(resume_from_dir)
runner.resume(n_epochs=20)

log_diagnostics(pause_for_plot=False)

Log diagnostics.

Parameters: pause_for_plot (bool) – Pause for plot.

obtain_samples(itr, batch_size=None)

Obtain one batch of samples.

Parameters:
  • itr (int) – Index of iteration (epoch).
  • batch_size (int) – Number of steps in batch. This is a hint that the sampler may or may not respect.
Returns:

One batch of samples.

restore(from_dir, from_epoch='last')

Restore experiment from snapshot.

Parameters:
  • from_dir (str) – Directory of the pickle file to resume experiment from.
  • from_epoch (str or int) – The epoch to restore from. Can be 'first', 'last' or a number. Not applicable when snapshot_mode='last'.
Returns:

A SimpleNamespace for train()’s arguments.
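
As a rough illustration of how the from_epoch argument could be interpreted, the sketch below maps 'first', 'last', or a number onto a list of saved epoch indices. The helper resolve_epoch and the saved_epochs list are hypothetical and not part of garage; the actual snapshotter may resolve epochs differently.

```python
def resolve_epoch(saved_epochs, from_epoch='last'):
    """Pick one epoch index from the saved ones (hypothetical helper)."""
    if not saved_epochs:
        raise ValueError('No snapshots available to restore from.')
    ordered = sorted(saved_epochs)
    if from_epoch == 'first':
        return ordered[0]
    if from_epoch == 'last':
        return ordered[-1]
    # Anything else is treated as an explicit epoch number.
    epoch = int(from_epoch)
    if epoch not in ordered:
        raise ValueError('No snapshot saved for epoch {}.'.format(epoch))
    return epoch
```

Under this sketch, a run that kept only its latest snapshot would have a single entry in saved_epochs, which is one way to see why from_epoch has no effect in that mode.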

resume(n_epochs=None, batch_size=None, n_epoch_cycles=None, plot=None, store_paths=None, pause_for_plot=None)

Resume from restored experiment.

This method provides the same interface as train().

If not specified, an argument will default to the saved arguments from the last call to train().

Returns: The average return in the last epoch cycle.
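
The defaulting behaviour described above can be pictured as a merge between the saved train() arguments and any overrides passed to resume(). The following is a minimal sketch of that idea only; merge_train_args is a hypothetical helper, and the saved values are made up for illustration rather than taken from garage.

```python
from types import SimpleNamespace


def merge_train_args(saved, **overrides):
    """Return saved arguments with non-None overrides applied (hypothetical)."""
    merged = vars(saved).copy()
    for name, value in overrides.items():
        if name not in merged:
            raise TypeError('Unknown training argument: {}'.format(name))
        # None means "not specified", so the saved value is kept.
        if value is not None:
            merged[name] = value
    return SimpleNamespace(**merged)


# Illustrative saved arguments, shaped like the parameters of train().
saved = SimpleNamespace(n_epochs=100, batch_size=4000, n_epoch_cycles=1,
                        plot=False, store_paths=False, pause_for_plot=False)
resumed = merge_train_args(saved, n_epochs=20, batch_size=None)
```

Here resumed.n_epochs is overridden while resumed.batch_size keeps its saved value, and the saved namespace itself is left unchanged.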

save(epoch, paths=None)

Save snapshot of current batch.

Parameters:
  • epoch (int) – Index of the epoch (iteration).
  • paths (dict) – Batch of samples after preprocessing. If None, no paths will be logged to the snapshot.

setup(algo, env, sampler_cls=None, sampler_args=None)

Set up runner for algorithm and environment.

This method saves algo and env within runner and creates a sampler.

Note

After setup() is called, all variables in the session should have been initialized. setup() respects existing values in the session, so policy weights can be loaded before calling setup().

Parameters:
  • algo (garage.np.algos.RLAlgorithm) – An algorithm instance.
  • env (garage.envs.GarageEnv) – An environment instance.
  • sampler_cls (garage.sampler.Sampler) – A sampler class.
  • sampler_args (dict) – Arguments to be passed to sampler constructor.

step_epochs()

Step through each epoch.

This function returns a generator that, when iterated, automatically performs services such as snapshotting and log management. It is used inside train() in each algorithm.

The generator initializes two variables: self.step_itr and self.step_path. To use the generator, these two have to be updated manually in each epoch, as the example below shows.

Yields: int – The next training epoch.

Examples

for epoch in runner.step_epochs():
    runner.step_path = runner.obtain_samples(…)
    self.train_once(…)
    runner.step_itr += 1
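
To make the generator pattern more concrete, here is a self-contained sketch of how bookkeeping can run between epochs when code is placed after a yield. ToyRunner, its saved list, and the fake sample batch are all hypothetical stand-ins; this is not garage's implementation, only an illustration of why step_itr and step_path must be updated by the caller inside the loop.

```python
class ToyRunner:
    """Hypothetical stand-in for a runner; not garage's LocalRunner."""

    def __init__(self, n_epochs):
        self.n_epochs = n_epochs
        self.step_itr = None
        self.step_path = None
        self.saved = []  # records (epoch, paths) pairs as fake snapshots

    def obtain_samples(self, itr):
        # Placeholder for real sampling: returns a fake batch.
        return {'itr': itr, 'rewards': [1.0, 2.0, 3.0]}

    def step_epochs(self):
        # The generator initializes the two variables the caller updates.
        self.step_itr = 0
        self.step_path = None
        for epoch in range(self.n_epochs):
            yield epoch
            # Runs when the caller requests the next epoch, i.e. after
            # the loop body for `epoch` has finished.
            self.saved.append((epoch, self.step_path))


def train(runner):
    """Toy algorithm loop that follows the usage pattern shown above."""
    last_return = None
    for epoch in runner.step_epochs():
        runner.step_path = runner.obtain_samples(runner.step_itr)
        last_return = sum(runner.step_path['rewards'])
        runner.step_itr += 1
    return last_return


runner = ToyRunner(n_epochs=3)
result = train(runner)
```

Because the bookkeeping sits after the yield, it sees whatever the loop body stored in step_path for that epoch. If the caller forgot to update step_path, the toy snapshot would record stale data.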

train(n_epochs, batch_size, n_epoch_cycles=1, plot=False, store_paths=False, pause_for_plot=False)

Start training.

Parameters:
  • n_epochs (int) – Number of epochs.
  • batch_size (int) – Number of environment steps in one batch.
  • n_epoch_cycles (int) – Number of batches of samples in each epoch. This is only useful for off-policy algorithms. For on-policy algorithms, this value should always be 1.
  • plot (bool) – Visualize policy by doing rollout after each epoch.
  • store_paths (bool) – Save paths in snapshot.
  • pause_for_plot (bool) – Pause for plot.
Returns:

The average return in the last epoch cycle.
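
The relationship between n_epochs, n_epoch_cycles, and batch_size can be sketched as a nested loop in which each cycle requests one batch. The training_schedule function below is a hypothetical illustration, not part of garage, and the step counts are only what is requested: as noted for obtain_samples(), the sampler may treat batch_size as a hint.

```python
def training_schedule(n_epochs, batch_size, n_epoch_cycles=1):
    """Yield (epoch, cycle, requested_steps) for each batch (illustrative)."""
    for epoch in range(n_epochs):
        for cycle in range(n_epoch_cycles):
            yield epoch, cycle, batch_size


schedule = list(training_schedule(n_epochs=2, batch_size=4000,
                                  n_epoch_cycles=3))
# Total number of environment steps requested across all batches.
requested = sum(steps for _, _, steps in schedule)
```

With n_epoch_cycles left at 1, as recommended for on-policy algorithms, the schedule reduces to one batch per epoch.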