garage.experiment.local_runner module

Provides algorithms with access to most of garage’s features.

class ExperimentStats(total_epoch, total_itr, total_env_steps, last_path)[source]

Bases: object

Statistics of an experiment.

Parameters:
  • total_epoch (int) – Total epochs.
  • total_itr (int) – Total iterations.
  • total_env_steps (int) – Total environment steps collected.
  • last_path (list[dict]) – Last sampled paths.
class LocalRunner(snapshot_config, max_cpus=1)[source]

Bases: object

Base class of local runner.

Use Runner.setup(algo, env) to set up the algorithm and environment for the runner, and Runner.train() to start training.

Parameters:
  • snapshot_config (garage.experiment.SnapshotConfig) – The snapshot configuration used by LocalRunner to create the snapshotter. If None, it will create one with default settings.
  • max_cpus (int) – The maximum number of parallel sampler workers.

Note

For the use of any TensorFlow environments, policies and algorithms, please use LocalTFRunner().

Examples

# To train.
runner = LocalRunner(snapshot_config)
env = Env(…)
policy = Policy(…)
algo = Algo(
    env=env,
    policy=policy,
    …)
runner.setup(algo, env)
runner.train(n_epochs=100, batch_size=4000)

# To resume immediately.
runner = LocalRunner(snapshot_config)
runner.restore(resume_from_dir)
runner.resume()

# To resume with modified training arguments.
runner = LocalRunner(snapshot_config)
runner.restore(resume_from_dir)
runner.resume(n_epochs=20)
get_env_copy()[source]

Get a copy of the environment.

Returns:An environment instance.
Return type:garage.envs.GarageEnv
log_diagnostics(pause_for_plot=False)[source]

Log diagnostics.

Parameters:pause_for_plot (bool) – Pause for plot.
make_sampler(sampler_cls, *, seed=None, n_workers=1, max_path_length=None, worker_class=<class 'garage.sampler.default_worker.DefaultWorker'>, sampler_args=None, worker_args=None)[source]

Construct a Sampler from a Sampler class.

Parameters:
  • sampler_cls (type) – The type of sampler to construct.
  • seed (int) – Seed to use in sampler workers.
  • max_path_length (int) – Maximum path length to be sampled by the sampler. Paths longer than this will be truncated.
  • n_workers (int) – The number of workers the sampler should use.
  • worker_class (type) – Type of worker the Sampler should use.
  • sampler_args (dict or None) – Additional arguments that should be passed to the sampler.
  • worker_args (dict or None) – Additional arguments that should be passed to the worker.
Raises:

ValueError – If max_path_length isn’t passed and the algorithm doesn’t contain a max_path_length field, or if the algorithm doesn’t have a policy field.

Returns:

An instance of the sampler class.

Return type:

sampler_cls
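
The ValueError conditions above can be sketched as a small stand-alone check. This is an illustrative sketch only, not garage's implementation: resolve_max_path_length is a hypothetical helper, the order of the two checks is an assumption, and SimpleNamespace stands in for a real algorithm object.

```python
from types import SimpleNamespace


def resolve_max_path_length(algo, max_path_length=None):
    """Pick the path-length limit, preferring the explicit argument.

    Hypothetical helper for illustration; it is not part of garage.
    """
    # The sampler needs a policy to roll out, so the algorithm must expose one.
    if not hasattr(algo, 'policy'):
        raise ValueError('algo does not have a policy field')
    if max_path_length is None:
        # Fall back to the algorithm's own limit when none was passed.
        if not hasattr(algo, 'max_path_length'):
            raise ValueError('max_path_length was not passed and algo '
                             'does not have a max_path_length field')
        max_path_length = algo.max_path_length
    return max_path_length


# Stand-in for an algorithm instance (assumption: only these two fields matter).
algo = SimpleNamespace(policy='some-policy', max_path_length=200)
explicit = resolve_max_path_length(algo, max_path_length=50)
fallback = resolve_max_path_length(algo)
```

Here explicit takes the value passed by the caller, while fallback takes the value stored on the algorithm.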

obtain_samples(itr, batch_size=None, agent_update=None, env_update=None)[source]

Obtain one batch of samples.

Parameters:
  • itr (int) – Index of iteration (epoch).
  • batch_size (int) – Number of steps in batch. This is a hint that the sampler may or may not respect.
  • agent_update (object) – Value which will be passed into the agent_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
  • env_update (object) – Value which will be passed into the env_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
Raises:

ValueError – Raised if the runner was initialized without a sampler, or batch_size wasn’t provided here or to train.

Returns:

One batch of samples.

Return type:

list[dict]
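
The list rule for agent_update and env_update can be illustrated with a short sketch. It is not taken from garage: spread_update is a hypothetical helper, and sending a non-list value to every worker is an assumption about the single-value case.

```python
def spread_update(update, n_workers):
    """Return one update value per worker.

    Hypothetical helper for illustration; it is not part of garage.
    """
    if isinstance(update, list):
        # A list must provide exactly one entry per worker.
        if len(update) != n_workers:
            raise ValueError(
                'expected {} updates, got {}'.format(n_workers, len(update)))
        return list(update)
    # Assumption: a single value is sent unchanged to every worker.
    return [update] * n_workers


shared = spread_update('policy-params', n_workers=3)
per_worker = spread_update(['a', 'b', 'c'], n_workers=3)
```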

restore(from_dir, from_epoch='last')[source]

Restore experiment from snapshot.

Parameters:
  • from_dir (str) – Directory of the pickle file to resume experiment from.
  • from_epoch (str or int) – The epoch to restore from. Can be ‘first’, ‘last’ or a number. Not applicable when snapshot_mode=’last’.
Returns:

Arguments for train().

Return type:

TrainArgs
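
The accepted values of from_epoch can be sketched as follows. This is a hedged illustration rather than the snapshotter's actual logic: resolve_epoch and the saved_epochs list are hypothetical, and how garage discovers saved epochs on disk is deliberately left out.

```python
def resolve_epoch(from_epoch, saved_epochs):
    """Map 'first', 'last' or a number onto one of the saved epochs.

    Hypothetical helper for illustration; it is not part of garage.
    """
    if not saved_epochs:
        raise ValueError('no snapshots available')
    if from_epoch == 'first':
        return min(saved_epochs)
    if from_epoch == 'last':
        return max(saved_epochs)
    epoch = int(from_epoch)
    if epoch not in saved_epochs:
        raise ValueError('no snapshot saved for epoch {}'.format(epoch))
    return epoch


saved_epochs = [0, 5, 10]  # made-up example data
first = resolve_epoch('first', saved_epochs)
last = resolve_epoch('last', saved_epochs)
middle = resolve_epoch(5, saved_epochs)
```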

resume(n_epochs=None, batch_size=None, plot=None, store_paths=None, pause_for_plot=None)[source]

Resume from restored experiment.

This method provides the same interface as train().

If not specified, an argument will default to the saved arguments from the last call to train().

Parameters:
  • n_epochs (int) – Number of epochs.
  • batch_size (int) – Number of environment steps in one batch.
  • plot (bool) – Visualize policy by doing rollout after each epoch.
  • store_paths (bool) – Save paths in snapshot.
  • pause_for_plot (bool) – Pause for plot.
Raises:

NotSetupError – If resume() is called before restore().

Returns:

The average return in the last epoch cycle.

Return type:

float
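
The defaulting rule for resume() (an unspecified argument falls back to the saved one) can be sketched as below. SavedTrainArgs and merge_resume_args are hypothetical names used only for this illustration; the real runner stores its arguments in TrainArgs and may merge them differently.

```python
from dataclasses import dataclass, replace


@dataclass
class SavedTrainArgs:
    """Stand-in for the arguments saved by the last call to train()."""
    n_epochs: int
    batch_size: int
    plot: bool
    store_paths: bool
    pause_for_plot: bool


def merge_resume_args(saved, **overrides):
    """Keep saved values unless the caller passed something other than None."""
    given = {k: v for k, v in overrides.items() if v is not None}
    return replace(saved, **given)


saved = SavedTrainArgs(n_epochs=100, batch_size=4000, plot=False,
                       store_paths=False, pause_for_plot=False)
# Only n_epochs is overridden; batch_size=None means "use the saved value".
merged = merge_resume_args(saved, n_epochs=20, batch_size=None)
```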

save(epoch)[source]

Save snapshot of current batch.

Parameters:epoch (int) – Epoch.
Raises:NotSetupError – If save() is called before the runner is set up.
setup(algo, env, sampler_cls=None, sampler_args=None, n_workers=1, worker_class=None, worker_args=None)[source]

Set up runner for algorithm and environment.

This method saves algo and env within runner and creates a sampler.

Note

After setup() is called, all variables in the session should have been initialized. setup() respects existing values in the session, so policy weights can be loaded before setup().

Parameters:
  • algo (garage.np.algos.RLAlgorithm) – An algorithm instance.
  • env (garage.envs.GarageEnv) – An environment instance.
  • sampler_cls (garage.sampler.Sampler) – A sampler class.
  • sampler_args (dict) – Arguments to be passed to sampler constructor.
  • n_workers (int) – The number of workers the sampler should use.
  • worker_class (type) – Type of worker the sampler should use.
  • worker_args (dict or None) – Additional arguments that should be passed to the worker.
Raises:

ValueError – If sampler_cls is passed and the algorithm doesn’t contain a max_path_length field.

step_epochs()[source]

Step through each epoch.

This function returns a magic generator. When iterated through, this generator automatically performs services such as snapshotting and log management. It is used inside train() in each algorithm.

The generator initializes two variables: self.step_itr and self.step_path. To use the generator, these two have to be updated manually in each epoch, as the example shows below.

Yields:int – The next training epoch.

Examples

for epoch in runner.step_epochs():
    runner.step_path = runner.obtain_samples(…)
    self.train_once(…)
    runner.step_itr += 1
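
How a generator can run bookkeeping around each epoch may be easier to see in a toy version. The sketch below is not garage's code: ToyRunner is hypothetical, and appending to a list stands in for the real snapshotting and log management.

```python
class ToyRunner:
    """Minimal stand-in showing the generator pattern, not garage's runner."""

    def __init__(self, n_epochs):
        self.n_epochs = n_epochs
        self.step_itr = None
        self.step_path = None
        self.saved = []  # stands in for snapshots written to disk

    def step_epochs(self):
        # The generator initializes the two variables the caller must update.
        self.step_itr = 0
        self.step_path = None
        for epoch in range(self.n_epochs):
            yield epoch
            # Execution resumes here once the caller's loop body has finished,
            # so the values the caller set during this epoch are visible.
            self.saved.append((epoch, self.step_itr, self.step_path))


runner = ToyRunner(n_epochs=3)
for epoch in runner.step_epochs():
    runner.step_path = ['path-from-epoch-{}'.format(epoch)]
    runner.step_itr += 1
```

Because the code after yield runs only when the next epoch is requested, the per-epoch services see the step_itr and step_path values set inside the loop body.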
total_env_steps

Total environment steps collected.

Returns:Total environment steps collected.
Return type:int
train(n_epochs, batch_size=None, plot=False, store_paths=False, pause_for_plot=False)[source]

Start training.

Parameters:
  • n_epochs (int) – Number of epochs.
  • batch_size (int or None) – Number of environment steps in one batch.
  • plot (bool) – Visualize policy by doing rollout after each epoch.
  • store_paths (bool) – Save paths in snapshot.
  • pause_for_plot (bool) – Pause for plot.
Raises:

NotSetupError – If train() is called before setup().

Returns:

The average return in the last epoch cycle.

Return type:

float

exception NotSetupError[source]

Bases: Exception

Raised when an experiment is about to run without setup.
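
A guard of this kind can be sketched in a few lines. Everything below is a stand-in written for illustration: the local NotSetupError class, GuardedRunner, and the _has_setup flag are assumptions, not garage's actual attributes.

```python
class NotSetupError(Exception):
    """Local stand-in for the exception documented above."""


class GuardedRunner:
    """Toy runner that refuses to train before setup() has been called."""

    def __init__(self):
        self._has_setup = False  # hypothetical flag name

    def setup(self, algo, env):
        self._algo = algo
        self._env = env
        self._has_setup = True

    def train(self, n_epochs):
        if not self._has_setup:
            raise NotSetupError('Call setup() before train().')
        return n_epochs  # placeholder for the real training loop


runner = GuardedRunner()
try:
    runner.train(n_epochs=1)
    failed_early = False
except NotSetupError:
    failed_early = True

runner.setup(algo='algo', env='env')
epochs_run = runner.train(n_epochs=1)
```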

class SetupArgs(sampler_cls, sampler_args, seed)[source]

Bases: object

Arguments to set up a runner.

Parameters:
  • sampler_cls (garage.sampler.Sampler) – A sampler class.
  • sampler_args (dict) – Arguments to be passed to sampler constructor.
  • seed (int) – Random seed.
class TrainArgs(n_epochs, batch_size, plot, store_paths, pause_for_plot, start_epoch)[source]

Bases: object

Arguments to call train() or resume().

Parameters:
  • n_epochs (int) – Number of epochs.
  • batch_size (int) – Number of environment steps in one batch.
  • plot (bool) – Visualize policy by doing rollout after each epoch.
  • store_paths (bool) – Save paths in snapshot.
  • pause_for_plot (bool) – Pause for plot.
  • start_epoch (int) – The starting epoch. Used for resume().