garage.experiment.local_runner module

Provides algorithms with access to most of garage’s features.

class LocalRunner(snapshot_config, max_cpus=1)

Bases: object

Base class of local runner.

Use Runner.setup(algo, env) to set up the algorithm and environment for the runner, and Runner.train() to start training.

Parameters:

snapshot_config (garage.experiment.SnapshotConfig) – The snapshot configuration used by LocalRunner to create the snapshotter. If None, it will create one with default settings.
max_cpus (int) – The maximum number of parallel sampler workers.

Note

For the use of any TensorFlow environments, policies and algorithms, please use LocalTFRunner().

Examples

# to train
runner = LocalRunner()
env = Env(…)
policy = Policy(…)
algo = Algo(
    env=env,
    policy=policy,
    …)
runner.setup(algo, env)
runner.train(n_epochs=100, batch_size=4000)

# to resume immediately.
runner = LocalRunner()
runner.restore(resume_from_dir)
runner.resume()

# to resume with modified training arguments.
runner = LocalRunner()
runner.restore(resume_from_dir)
runner.resume(n_epochs=20)

log_diagnostics(pause_for_plot=False)

Log diagnostics.

Parameters:	pause_for_plot (bool) – Pause for plot.

obtain_samples(itr, batch_size=None)

Obtain one batch of samples.

Parameters:

itr (int) – Index of iteration (epoch).
batch_size (int) – Number of steps in batch. This is a hint that the sampler may or may not respect.

Returns:

One batch of samples.

restore(from_dir, from_epoch='last')

Restore experiment from snapshot.

Parameters:

from_dir (str) – Directory containing the pickle file to resume the experiment from.
from_epoch (str or int) – The epoch to restore from. Can be 'first', 'last' or a number. Not applicable when snapshot_mode='last'.

Returns:

A SimpleNamespace holding the arguments for train().
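
The from_epoch options can be illustrated with a small standalone sketch. It does not use garage and is not the library's implementation: resolve_epoch and available_epochs are hypothetical names, and the sketch simply assumes the saved epochs are known as a list of integers.

```python
def resolve_epoch(available_epochs, from_epoch="last"):
    """Pick which saved epoch to restore (hypothetical helper)."""
    if not available_epochs:
        raise ValueError("no snapshots available")
    if from_epoch == "first":
        return min(available_epochs)
    if from_epoch == "last":
        return max(available_epochs)
    # Anything else is treated as an explicit epoch number.
    epoch = int(from_epoch)
    if epoch not in available_epochs:
        raise ValueError(f"no snapshot saved for epoch {epoch}")
    return epoch


# Assumed set of epochs for which a snapshot exists.
available = [0, 5, 10]
first = resolve_epoch(available, "first")
last = resolve_epoch(available)
chosen = resolve_epoch(available, 5)
print(first, last, chosen)
```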

resume(n_epochs=None, batch_size=None, n_epoch_cycles=None, plot=None, store_paths=None, pause_for_plot=None)

Resume from restored experiment.

This method provides the same interface as train().

If not specified, an argument will default to the saved arguments from the last call to train().

Returns:	The average return in the last epoch cycle.
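
The fallback rule for unspecified arguments can be sketched in plain Python. This is an illustration only and does not use garage: saved_args and merge_train_args are hypothetical names standing in for the arguments stored by the last train() call and for the merging logic, which the actual implementation may handle differently.

```python
from types import SimpleNamespace


def merge_train_args(saved_args, **overrides):
    """Return training arguments, preferring explicit overrides.

    Hypothetical helper: any override left as None falls back to the
    value stored in saved_args.
    """
    merged = dict(vars(saved_args))
    for name, value in overrides.items():
        if value is not None:
            merged[name] = value
    return SimpleNamespace(**merged)


# Arguments as they might have been saved by an earlier train() call.
saved_args = SimpleNamespace(n_epochs=100, batch_size=4000, plot=False)

# Only n_epochs is overridden; batch_size and plot keep their saved values.
args = merge_train_args(saved_args, n_epochs=20, batch_size=None)
print(args.n_epochs, args.batch_size, args.plot)
```

Using None as the "not specified" marker is why every keyword of resume() defaults to None rather than to the defaults of train().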

save(epoch, paths=None)

Save snapshot of current batch.

Parameters:

epoch (int) – Index of iteration (epoch).
paths (dict) – Batch of samples after preprocessing. If None, no paths will be logged to the snapshot.

setup(algo, env, sampler_cls=None, sampler_args=None)

Set up runner for algorithm and environment.

This method saves algo and env within runner and creates a sampler.

Note

After setup() is called, all variables in the session should have been initialized. setup() respects existing values in the session, so policy weights can be loaded before setup().

Parameters:

algo (garage.np.algos.RLAlgorithm) – An algorithm instance.
env (garage.envs.GarageEnv) – An environment instance.
sampler_cls (garage.sampler.Sampler) – A sampler class.
sampler_args (dict) – Arguments to be passed to the sampler constructor.

step_epochs()

Step through each epoch.

This function returns a magic generator. When iterated through, this generator automatically performs services such as snapshotting and log management. It is used inside train() in each algorithm.

The generator initializes two variables: self.step_itr and self.step_path. To use the generator, these two have to be updated manually in each epoch, as the example shows below.

Yields:	int – The next training epoch.

Examples

for epoch in runner.step_epochs():
    runner.step_path = runner.obtain_samples(…)
    self.train_once(…)
    runner.step_itr += 1
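
How a generator can run bookkeeping between loop iterations may be easier to see in a toy version. The sketch below does not use garage: ToyRunner and its saved list are hypothetical stand-ins, and the "snapshot" is just a recorded tuple. It only illustrates the general pattern of code after a yield running once the caller's loop body has finished.

```python
class ToyRunner:
    """Hypothetical stand-in for a runner; not garage's LocalRunner."""

    def __init__(self, n_epochs):
        self.n_epochs = n_epochs
        self.step_itr = 0
        self.step_path = None
        self.saved = []  # records (epoch, step_itr) at each "snapshot"

    def step_epochs(self):
        """Yield epoch indices and do bookkeeping after each one."""
        for epoch in range(self.n_epochs):
            yield epoch
            # Runs when the caller asks for the next epoch, i.e. after
            # the loop body has updated step_path and step_itr.
            self.saved.append((epoch, self.step_itr))
            self.step_path = None


runner = ToyRunner(n_epochs=3)
for epoch in runner.step_epochs():
    runner.step_path = {"rewards": [1.0, 2.0]}  # placeholder samples
    runner.step_itr += 1

print(runner.saved)
```

Because the bookkeeping reads step_itr and step_path after the loop body has run, the caller has to update both in every epoch, as the description above requires.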

train(n_epochs, batch_size, n_epoch_cycles=1, plot=False, store_paths=False, pause_for_plot=False)

Start training.

Parameters:

n_epochs (int) – Number of epochs.
batch_size (int) – Number of environment steps in one batch.
n_epoch_cycles (int) – Number of batches of samples in each epoch. This is only useful for off-policy algorithms. For on-policy algorithms, this value should always be 1.
plot (bool) – Visualize the policy by performing a rollout after each epoch.
store_paths (bool) – Save paths in snapshot.
pause_for_plot (bool) – Pause for plot.

Returns:

The average return in the last epoch cycle.
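
As a rough guide to how n_epochs, batch_size and n_epoch_cycles combine, the nominal number of environment steps requested over a run can be estimated as below. This is a back-of-the-envelope sketch based only on the parameter descriptions above. total_env_steps is a hypothetical helper, not part of garage, and the real count may differ because batch_size is only a hint to the sampler.

```python
def total_env_steps(n_epochs, batch_size, n_epoch_cycles=1):
    """Nominal number of environment steps requested over a full run."""
    return n_epochs * n_epoch_cycles * batch_size


# On-policy style run: one batch per epoch.
on_policy = total_env_steps(n_epochs=100, batch_size=4000)

# Off-policy style run: several batches per epoch (20 is an arbitrary choice).
off_policy = total_env_steps(n_epochs=100, batch_size=4000, n_epoch_cycles=20)

print(on_policy, off_policy)
```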