garage.experiment.local_runner module

Provides algorithms with access to most of garage’s features.

class ExperimentStats(total_epoch, total_itr, total_env_steps, last_path)

Bases: object

Statistics of an experiment.

Parameters:

total_epoch (int) – Total epochs.
total_itr (int) – Total iterations.
total_env_steps (int) – Total environment steps collected.
last_path (list[dict]) – Last sampled paths.

class LocalRunner(snapshot_config, max_cpus=1)

Bases: object

Base class of local runner.

Use LocalRunner.setup(algo, env) to set up the algorithm and environment for the runner, and LocalRunner.train() to start training.

Parameters:

snapshot_config (garage.experiment.SnapshotConfig) – The snapshot configuration used by LocalRunner to create the snapshotter. If None, it will create one with default settings.
max_cpus (int) – The maximum number of parallel sampler workers.

Note

To use TensorFlow environments, policies, or algorithms, use LocalTFRunner() instead.

Examples

# To train:
runner = LocalRunner(snapshot_config)
env = Env(…)
policy = Policy(…)
algo = Algo(
    env=env,
    policy=policy,
    …)
runner.setup(algo, env)
runner.train(n_epochs=100, batch_size=4000)

# To resume immediately:
runner = LocalRunner(snapshot_config)
runner.restore(resume_from_dir)
runner.resume()

# To resume with modified training arguments:
runner = LocalRunner(snapshot_config)
runner.restore(resume_from_dir)
runner.resume(n_epochs=20)

get_env_copy()

Get a copy of the environment.

Returns:	An environment instance.
Return type:	garage.envs.GarageEnv

log_diagnostics(pause_for_plot=False)

Log diagnostics.

Parameters:	pause_for_plot (bool) – Pause for plot.

make_sampler(sampler_cls, *, seed=None, n_workers=1, max_path_length=None, worker_class=<class 'garage.sampler.default_worker.DefaultWorker'>, sampler_args=None, worker_args=None)

Construct a Sampler from a Sampler class.

Parameters:

sampler_cls (type) – The type of sampler to construct.
seed (int) – Seed to use in sampler workers.
max_path_length (int) – Maximum path length to be sampled by the sampler. Paths longer than this will be truncated.
n_workers (int) – The number of workers the sampler should use.
worker_class (type) – Type of worker the sampler should use.
sampler_args (dict or None) – Additional arguments that should be passed to the sampler.
worker_args (dict or None) – Additional arguments that should be passed to the worker.

Raises:

ValueError – If max_path_length isn’t passed and the algorithm doesn’t contain a max_path_length field, or if the algorithm doesn’t have a policy field.
Returns:	An instance of the sampler class.
Return type:	sampler_cls
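
The ValueError conditions above amount to a fallback rule: an explicit max_path_length wins, otherwise the value is read from the algorithm. The following is a minimal sketch of that rule only. The helper resolve_max_path_length is hypothetical, and SimpleNamespace objects stand in for real algorithms; none of this is garage's actual implementation.

```python
from types import SimpleNamespace


def resolve_max_path_length(algo, max_path_length=None):
    """Pick the path length limit, mirroring the documented error cases.

    Hypothetical helper for illustration; not part of garage.
    """
    if not hasattr(algo, 'policy'):
        raise ValueError('algo must have a policy field')
    if max_path_length is None:
        # No explicit value was passed, so fall back to the algorithm.
        if not hasattr(algo, 'max_path_length'):
            raise ValueError(
                'pass max_path_length or give algo a max_path_length field')
        max_path_length = algo.max_path_length
    return max_path_length


# Stand-in algorithm objects (assumptions, not garage classes).
algo_with_limit = SimpleNamespace(policy=object(), max_path_length=200)
algo_without_limit = SimpleNamespace(policy=object())

from_algo = resolve_max_path_length(algo_with_limit)
explicit = resolve_max_path_length(algo_without_limit, max_path_length=50)
```

Calling resolve_max_path_length(algo_without_limit) with no explicit value raises ValueError in this sketch, which corresponds to the first documented error case.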

obtain_samples(itr, batch_size=None, agent_update=None, env_update=None)

Obtain one batch of samples.

Parameters:

itr (int) – Index of iteration (epoch).
batch_size (int) – Number of steps in batch. This is a hint that the sampler may or may not respect.
agent_update (object) – Value which will be passed into the agent_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
env_update (object) – Value which will be passed into the env_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.

Raises:

ValueError – If the runner was initialized without a sampler, or if batch_size wasn’t provided here or to train().
Returns:	One batch of samples.
Return type:	list[dict]
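
The list-length rule for agent_update and env_update can be illustrated with a small stand-alone sketch. The helper spread_update is hypothetical and not garage API. The source only specifies the behavior for lists; copying a non-list value to every worker is an assumption made for this example.

```python
def spread_update(update, n_workers):
    """Return one update value per worker.

    Hypothetical helper, not garage API. A list must have exactly
    n_workers entries; broadcasting any other value to every worker is
    an assumption made for this sketch.
    """
    if isinstance(update, list):
        if len(update) != n_workers:
            raise ValueError(
                f'expected {n_workers} updates, got {len(update)}')
        return list(update)
    return [update] * n_workers


# One entry per worker: each worker receives its own value.
per_worker = spread_update(['task_a', 'task_b'], n_workers=2)

# A single value: every worker receives the same value (assumed).
shared = spread_update('task_a', n_workers=3)
```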

restore(from_dir, from_epoch='last')

Restore experiment from snapshot.

Parameters:

from_dir (str) – Directory of the pickle file to resume the experiment from.
from_epoch (str or int) – The epoch to restore from. Can be 'first', 'last' or a number. Not applicable when snapshot_mode='last'.
Returns:	Arguments for train().
Return type:	TrainArgs
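
As a rough illustration of the three accepted forms of from_epoch, the sketch below maps 'first', 'last', or a number to one of a set of saved epochs. Both resolve_epoch and the available_epochs list are hypothetical; how garage actually locates snapshot files is not described here and may differ.

```python
def resolve_epoch(from_epoch, available_epochs):
    """Map 'first', 'last', or a number to a saved epoch.

    Hypothetical helper; available_epochs is an assumed list of epoch
    numbers for which snapshots exist.
    """
    if not available_epochs:
        raise ValueError('no snapshots available')
    if from_epoch == 'first':
        return min(available_epochs)
    if from_epoch == 'last':
        return max(available_epochs)
    epoch = int(from_epoch)
    if epoch not in available_epochs:
        raise ValueError(f'no snapshot saved for epoch {epoch}')
    return epoch


# Assumed example: snapshots were written at epochs 0, 10 and 20.
saved_epochs = [0, 10, 20]
first_epoch = resolve_epoch('first', saved_epochs)
last_epoch = resolve_epoch('last', saved_epochs)
chosen_epoch = resolve_epoch(10, saved_epochs)
```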

resume(n_epochs=None, batch_size=None, plot=None, store_paths=None, pause_for_plot=None)

Resume from restored experiment.

This method provides the same interface as train().

If not specified, an argument will default to the saved arguments from the last call to train().

Parameters:

n_epochs (int) – Number of epochs.
batch_size (int) – Number of environment steps in one batch.
plot (bool) – Visualize policy by doing rollout after each epoch.
store_paths (bool) – Save paths in snapshot.
pause_for_plot (bool) – Pause for plot.

Raises:

NotSetupError – If resume() is called before restore().

Returns:	The average return in the last epoch cycle.
Return type:	float
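
The defaulting rule for resume() (an argument left as None falls back to the value saved by the last train() call) can be sketched with a plain dataclass. SavedTrainArgs and merge_resume_args are hypothetical names for this illustration and are not garage's TrainArgs or its internal logic.

```python
from dataclasses import dataclass, replace


@dataclass
class SavedTrainArgs:
    """Hypothetical stand-in holding the arguments of the last train()."""
    n_epochs: int
    batch_size: int
    plot: bool
    store_paths: bool
    pause_for_plot: bool
    start_epoch: int


def merge_resume_args(saved, **overrides):
    """Return a copy of saved with every non-None override applied."""
    given = {k: v for k, v in overrides.items() if v is not None}
    return replace(saved, **given)


saved = SavedTrainArgs(n_epochs=100, batch_size=4000, plot=False,
                       store_paths=False, pause_for_plot=False,
                       start_epoch=37)

# Only n_epochs is overridden; batch_size=None keeps the saved 4000.
merged = merge_resume_args(saved, n_epochs=20, batch_size=None)
```

Treating None as "not specified" matches the signature above, where every parameter defaults to None.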

save(epoch)

Save snapshot of current batch.

Parameters:	epoch (int) – Epoch.
Raises:	NotSetupError – If save() is called before the runner is set up.

setup(algo, env, sampler_cls=None, sampler_args=None, n_workers=1, worker_class=<class 'garage.sampler.default_worker.DefaultWorker'>, worker_args=None)

Set up runner for algorithm and environment.

This method saves algo and env within runner and creates a sampler.

Note

After setup() is called, all variables in the session should have been initialized. setup() respects existing values in the session, so policy weights can be loaded before setup().

Parameters:

algo (garage.np.algos.RLAlgorithm) – An algorithm instance.
env (garage.envs.GarageEnv) – An environment instance.
sampler_cls (garage.sampler.Sampler) – A sampler class.
sampler_args (dict) – Arguments to be passed to sampler constructor.
n_workers (int) – The number of workers the sampler should use.
worker_class (type) – Type of worker the sampler should use.
worker_args (dict or None) – Additional arguments that should be passed to the worker.

Raises:

ValueError – If sampler_cls is passed and the algorithm doesn’t contain a max_path_length field.

step_epochs()

Step through each epoch.

This function returns a magic generator. When iterated through, this generator automatically performs services such as snapshotting and log management. It is used inside train() in each algorithm.

The generator initializes two variables: self.step_itr and self.step_path. To use the generator, these two have to be updated manually in each epoch, as the example shows below.

Yields:	int – The next training epoch.

Examples

for epoch in runner.step_epochs():
    runner.step_path = runner.obtain_samples(…)
    self.train_once(…)
    runner.step_itr += 1
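
To make the generator pattern concrete, here is a minimal stand-alone sketch of a runner whose step_epochs() does its bookkeeping after each loop body finishes. RunnerSketch and its saved list are hypothetical stand-ins; the real generator also handles snapshotting and log management, which this example only imitates by recording a tuple.

```python
class RunnerSketch:
    """Hypothetical runner illustrating the step_epochs() pattern."""

    def __init__(self, n_epochs):
        self.n_epochs = n_epochs
        self.step_itr = 0
        self.step_path = None
        self.saved = []  # stands in for snapshots and logs

    def step_epochs(self):
        for epoch in range(self.n_epochs):
            yield epoch
            # Execution resumes here once the caller's loop body is done,
            # so step_itr and step_path reflect that epoch's updates.
            self.saved.append((epoch, self.step_itr, self.step_path))


runner = RunnerSketch(n_epochs=3)
for epoch in runner.step_epochs():
    runner.step_path = [{'epoch': epoch}]  # placeholder for sampled paths
    runner.step_itr += 1
```

Because the bookkeeping runs after the yield, forgetting to update step_itr or step_path in the loop body would cause stale values to be recorded in this sketch.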

total_env_steps

Total environment steps collected.

Returns:	Total environment steps collected.
Return type:	int

train(n_epochs, batch_size=None, plot=False, store_paths=False, pause_for_plot=False)

Start training.

Parameters:

n_epochs (int) – Number of epochs.
batch_size (int or None) – Number of environment steps in one batch.
plot (bool) – Visualize policy by doing rollout after each epoch.
store_paths (bool) – Save paths in snapshot.
pause_for_plot (bool) – Pause for plot.

Raises:

NotSetupError – If train() is called before setup().

Returns:	The average return in the last epoch cycle.
Return type:	float

exception NotSetupError

Bases: Exception

Raised when an experiment is about to run without setup.
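
The guard this exception represents can be sketched as follows. NotSetupErrorSketch and GuardedRunner are hypothetical stand-ins written for this example; they only imitate the documented behavior of raising when train() is called before setup().

```python
class NotSetupErrorSketch(Exception):
    """Stand-in for NotSetupError (hypothetical, for illustration)."""


class GuardedRunner:
    """Hypothetical runner that refuses to train before setup()."""

    def __init__(self):
        self._has_setup = False
        self._algo = None
        self._env = None

    def setup(self, algo, env):
        self._algo = algo
        self._env = env
        self._has_setup = True

    def train(self, n_epochs):
        if not self._has_setup:
            raise NotSetupErrorSketch(
                'Use setup() to set up the runner before training.')
        return f'training for {n_epochs} epochs'


runner = GuardedRunner()
try:
    runner.train(n_epochs=1)
    outcome = 'trained'
except NotSetupErrorSketch:
    outcome = 'not set up'
```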

class SetupArgs(sampler_cls, sampler_args, seed)

Bases: object

Arguments to set up a runner.

Parameters:

sampler_cls (garage.sampler.Sampler) – A sampler class.
sampler_args (dict) – Arguments to be passed to sampler constructor.
seed (int) – Random seed.

class TrainArgs(n_epochs, batch_size, plot, store_paths, pause_for_plot, start_epoch)

Bases: object

Arguments to call train() or resume().

Parameters:

n_epochs (int) – Number of epochs.
batch_size (int) – Number of environment steps in one batch.
plot (bool) – Visualize policy by doing rollout after each epoch.
store_paths (bool) – Save paths in snapshot.
pause_for_plot (bool) – Pause for plot.
start_epoch (int) – The starting epoch. Used for resume().