garage.experiment

Experiment functions.

class LocalRunner(snapshot_config)

Base class of local runner.

Use Runner.setup(algo, env) to set up the algorithm and environment for the runner, and Runner.train() to start training.

Parameters:snapshot_config (garage.experiment.SnapshotConfig) – The snapshot configuration used by LocalRunner to create the snapshotter. If None, it will create one with default settings.

Note

When using TensorFlow environments, policies, or algorithms, please use LocalTFRunner() instead.

Examples

# to train
runner = LocalRunner()
env = Env(…)
policy = Policy(…)
algo = Algo(
    env=env,
    policy=policy,
    …)
runner.setup(algo, env)
runner.train(n_epochs=100, batch_size=4000)

# to resume immediately
runner = LocalRunner()
runner.restore(resume_from_dir)
runner.resume()

# to resume with modified training arguments
runner = LocalRunner()
runner.restore(resume_from_dir)
runner.resume(n_epochs=20)
total_env_steps

Total environment steps collected.

Returns:Total environment steps collected.
Return type:int
make_sampler(self, sampler_cls, *, seed=None, n_workers=psutil.cpu_count(logical=False), max_episode_length=None, worker_class=None, sampler_args=None, worker_args=None)

Construct a Sampler from a Sampler class.

Parameters:
  • sampler_cls (type) – The type of sampler to construct.
  • seed (int) – Seed to use in sampler workers.
  • max_episode_length (int) – Maximum episode length to be sampled by the sampler. Episodes longer than this will be truncated.
  • n_workers (int) – The number of workers the sampler should use.
  • worker_class (type) – Type of worker the Sampler should use.
  • sampler_args (dict or None) – Additional arguments that should be passed to the sampler.
  • worker_args (dict or None) – Additional arguments that should be passed to the worker.
Raises:

ValueError – If max_episode_length isn’t passed and the algorithm doesn’t contain a max_episode_length field, or if the algorithm doesn’t have a policy field.

Returns:

An instance of the sampler class.

Return type:

sampler_cls
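
For example, a sampler can be constructed from garage's LocalSampler class as in the minimal sketch below; the seed, worker count, and episode length are illustrative values, and runner is assumed to have already been set up:

from garage.sampler import LocalSampler

# Build a sampler with two parallel workers that truncates
# episodes at 500 steps.
sampler = runner.make_sampler(LocalSampler,
                              seed=1,
                              n_workers=2,
                              max_episode_length=500)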

setup(self, algo, env, sampler_cls=None, sampler_args=None, n_workers=psutil.cpu_count(logical=False), worker_class=DefaultWorker, worker_args=None)

Set up runner for algorithm and environment.

This method saves algo and env within runner and creates a sampler.

Note

After setup() is called, all variables in the session should have been initialized. setup() respects existing values in the session, so policy weights can be loaded before setup() is called.

Parameters:
  • algo (RLAlgorithm) – An algorithm instance.
  • env (Environment) – An environment instance.
  • sampler_cls (type) – A class which implements Sampler.
  • sampler_args (dict) – Arguments to be passed to sampler constructor.
  • n_workers (int) – The number of workers the sampler should use.
  • worker_class (type) – Type of worker the sampler should use.
  • worker_args (dict or None) – Additional arguments that should be passed to the worker.
Raises:

ValueError – If sampler_cls is passed and the algorithm doesn’t contain a max_episode_length field.
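
A minimal sketch of setup() with an explicit sampler class follows; LocalSampler and the worker count are illustrative choices, and algo and env are assumed to be constructed as in the class-level example above:

from garage.sampler import LocalSampler

runner = LocalRunner()
runner.setup(algo,
             env,
             sampler_cls=LocalSampler,
             n_workers=4)
runner.train(n_epochs=100, batch_size=4000)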

obtain_episodes(self, itr, batch_size=None, agent_update=None, env_update=None)

Obtain one batch of episodes.

Parameters:
  • itr (int) – Index of iteration (epoch).
  • batch_size (int) – Number of steps in batch. This is a hint that the sampler may or may not respect.
  • agent_update (object) – Value which will be passed into the agent_update_fn before sampling episodes. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
  • env_update (object) – Value which will be passed into the env_update_fn before sampling episodes. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
Raises:

ValueError – If the runner was initialized without a sampler, or batch_size wasn’t provided here or to train.

Returns:

Batch of episodes.

Return type:

EpisodeBatch
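
As a sketch, an algorithm's train() might collect one EpisodeBatch per epoch as shown below; the batch size is an illustrative value:

for epoch in runner.step_epochs():
    # Collect roughly 4000 environment steps of episodes.
    episodes = runner.obtain_episodes(runner.step_itr, batch_size=4000)
    runner.step_itr += 1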

obtain_samples(self, itr, batch_size=None, agent_update=None, env_update=None)

Obtain one batch of samples.

Parameters:
  • itr (int) – Index of iteration (epoch).
  • batch_size (int) – Number of steps in batch. This is a hint that the sampler may or may not respect.
  • agent_update (object) – Value which will be passed into the agent_update_fn before sampling episodes. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
  • env_update (object) – Value which will be passed into the env_update_fn before sampling episodes. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
Raises:

ValueError – Raised if the runner was initialized without a sampler, or batch_size wasn’t provided here or to train.

Returns:

One batch of samples.

Return type:

list[dict]

save(self, epoch)

Save snapshot of current batch.

Parameters:epoch (int) – Epoch.
Raises:NotSetupError – If save() is called before the runner is set up.
restore(self, from_dir, from_epoch='last')

Restore experiment from snapshot.

Parameters:
  • from_dir (str) – Directory of the pickle file to resume experiment from.
  • from_epoch (str or int) – The epoch to restore from. Can be ‘first’, ‘last’ or a number. Not applicable when snapshot_mode=’last’.
Returns:

Arguments for train().

Return type:

TrainArgs

log_diagnostics(self, pause_for_plot=False)

Log diagnostics.

Parameters:pause_for_plot (bool) – Pause for plot.
train(self, n_epochs, batch_size=None, plot=False, store_episodes=False, pause_for_plot=False)

Start training.

Parameters:
  • n_epochs (int) – Number of epochs.
  • batch_size (int or None) – Number of environment steps in one batch.
  • plot (bool) – Visualize an episode from the policy after each epoch.
  • store_episodes (bool) – Save episodes in snapshot.
  • pause_for_plot (bool) – Pause for plot.
Raises:

NotSetupError – If train() is called before setup().

Returns:

The average return in last epoch cycle.

Return type:

float

step_epochs(self)

Step through each epoch.

This function returns a magic generator. When iterated through, this generator automatically performs services such as snapshotting and log management. It is used inside train() in each algorithm.

The generator initializes two variables: self.step_itr and self.step_episode. To use the generator, these two have to be updated manually in each epoch, as the example shows below.

Yields:int – The next training epoch.

Examples

for epoch in runner.step_epochs():
    runner.step_episode = runner.obtain_samples(…)
    self.train_once(…)
    runner.step_itr += 1
resume(self, n_epochs=None, batch_size=None, plot=None, store_episodes=None, pause_for_plot=None)

Resume from restored experiment.

This method provides the same interface as train().

If not specified, an argument will default to the saved arguments from the last call to train().

Parameters:
  • n_epochs (int) – Number of epochs.
  • batch_size (int) – Number of environment steps in one batch.
  • plot (bool) – Visualize an episode from the policy after each epoch.
  • store_episodes (bool) – Save episodes in snapshot.
  • pause_for_plot (bool) – Pause for plot.
Raises:

NotSetupError – If resume() is called before restore().

Returns:

The average return in last epoch cycle.

Return type:

float

get_env_copy(self)

Get a copy of the environment.

Returns:An environment instance.
Return type:Environment
LocalTFRunner
class MetaEvaluator(*, test_task_sampler, max_episode_length, n_exploration_eps=10, n_test_tasks=None, n_test_episodes=1, prefix='MetaTest', test_task_names=None, worker_class=DefaultWorker, worker_args=None)

Evaluates Meta-RL algorithms on test environments.

Parameters:
  • test_task_sampler (TaskSampler) – Sampler for test tasks. To demonstrate the effectiveness of a meta-learning method, these should be different from the training tasks.
  • max_episode_length (int) – Maximum length of evaluation episodes.
  • n_test_tasks (int or None) – Number of test tasks to sample each time evaluation is performed. Note that tasks are sampled “without replacement”. If None, this is set to test_task_sampler.n_tasks.
  • n_exploration_eps (int) – Number of episodes to gather from the exploration policy before requesting the meta algorithm to produce an adapted policy.
  • n_test_episodes (int) – Number of episodes to use for each adapted policy. The adapted policy should forget previous episodes when .reset() is called.
  • prefix (str) – Prefix to use when logging. Defaults to MetaTest. For example, this results in logging the key ‘MetaTest/SuccessRate’. If not set to MetaTest, it should probably be set to MetaTrain.
  • test_task_names (list[str]) – List of task names to test. Should be in an order consistent with the task_id env_info, if that is present.
  • worker_class (type) – Type of worker the Sampler should use.
  • worker_args (dict or None) – Additional arguments that should be passed to the worker.
evaluate(self, algo, test_episodes_per_task=None)

Evaluate the Meta-RL algorithm on the test tasks.

Parameters:
  • algo (MetaRLAlgorithm) – The algorithm to evaluate.
  • test_episodes_per_task (int or None) – Number of episodes per task.
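
A minimal sketch of meta-test evaluation, assuming a SetTaskSampler over HalfCheetahVelEnv (both described below); the import path and argument values are illustrative and may vary between garage versions:

from garage.envs.mujoco import HalfCheetahVelEnv  # import path may vary
from garage.experiment import MetaEvaluator, SetTaskSampler

meta_evaluator = MetaEvaluator(
    test_task_sampler=SetTaskSampler(HalfCheetahVelEnv),
    max_episode_length=200,
    n_test_tasks=10)
# Logs keys such as 'MetaTest/SuccessRate' for the adapted policies.
meta_evaluator.evaluate(algo)
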
SnapshotConfig
class Snapshotter(snapshot_dir=os.path.join(os.getcwd(), 'data/local/experiment'), snapshot_mode='last', snapshot_gap=1)

Snapshotter snapshots training data.

When training, it saves data to binary files. When resuming, it loads from saved data.

Parameters:
  • snapshot_dir (str) – Path to save the log and iteration snapshot.
  • snapshot_mode (str) – Mode to save the snapshot. Can be either “all” (all iterations will be saved), “last” (only the last iteration will be saved), “gap” (every snapshot_gap iterations are saved), or “none” (do not save snapshots).
  • snapshot_gap (int) – Gap between snapshot iterations. Wait this number of iterations before taking another snapshot.
snapshot_dir

Return the snapshot directory.

Returns:The snapshot directory.
Return type:str
snapshot_mode

Return the type of snapshot.

Returns:The type of snapshot. Can be “all”, “last”, “gap” or “none”.
Return type:str
snapshot_gap

Return the gap between snapshots.

Returns:The number of iterations between snapshots.
Return type:int
save_itr_params(self, itr, params)

Save the parameters if at the right iteration.

Parameters:
  • itr (int) – Number of iterations. Used as the index of snapshot.
  • params (obj) – Content of snapshot to be saved.
Raises:

ValueError – If snapshot_mode is not one of “all”, “last”, “gap” or “none”.

load(self, load_dir, itr='last')

Load one snapshot of parameters from disk.

Parameters:
  • load_dir (str) – Directory of the cloudpickle file to resume experiment from.
  • itr (int or string) – Iteration to load. Can be an integer, ‘last’ or ‘first’.
Returns:

Loaded snapshot.

Return type:

dict

Raises:
  • ValueError – If itr is neither an integer nor one of (“last”, “first”).
  • FileNotFoundError – If the snapshot file is not found in load_dir.
  • NotAFileError – If the snapshot exists but is not a file.
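
A minimal sketch of restoring the last snapshot from an experiment directory; the 'algo' and 'env' keys are assumptions that depend on what the experiment actually saved:

snapshotter = Snapshotter()
snapshot = snapshotter.load('data/local/experiment', itr='last')
algo = snapshot['algo']  # assumed key; depends on the saved params
env = snapshot['env']    # assumed key; depends on the saved params
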
class ConstructEnvsSampler(env_constructors)

Bases: garage.experiment.task_sampler.TaskSampler

Inheritance diagram of garage.experiment.ConstructEnvsSampler

TaskSampler where each task has its own constructor.

Generally, this is used when the different tasks are completely different environments.

Parameters:env_constructors (list[Callable[Environment]]) – Callables that produce environments (for example, environment types).
n_tasks

The number of tasks.

Type:int
sample(self, n_tasks, with_replacement=False)

Sample a list of environment updates.

Parameters:
  • n_tasks (int) – Number of updates to sample.
  • with_replacement (bool) – Whether tasks can repeat when sampled. Note that if more tasks are sampled than exist, then tasks may repeat, but only after every environment has been included at least once in this batch. Ignored for continuous task spaces.
Returns:

Batch of sampled environment updates, which, when invoked on environments, will configure them with new tasks. See EnvUpdate for more information.

Return type:

list[EnvUpdate]
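
A minimal sketch with two hypothetical environment types, EnvA and EnvB:

# EnvA and EnvB are hypothetical Environment classes; any zero-argument
# callable that returns an Environment works as a constructor.
task_sampler = ConstructEnvsSampler([EnvA, lambda: EnvB(difficulty=2)])
env_updates = task_sampler.sample(2)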

class EnvPoolSampler(envs)

Bases: garage.experiment.task_sampler.TaskSampler

Inheritance diagram of garage.experiment.EnvPoolSampler

TaskSampler that samples from a finite pool of environments.

This can be used with any environments, but is generally best when using in-process samplers with environments that are expensive to construct.

Parameters:envs (list[Environment]) – List of environments to use as a pool.
n_tasks

The number of tasks.

Type:int
sample(self, n_tasks, with_replacement=False)

Sample a list of environment updates.

Parameters:
  • n_tasks (int) – Number of updates to sample.
  • with_replacement (bool) – Whether tasks can repeat when sampled. Since this cannot be easily implemented for an object pool, setting this to True results in ValueError.
Raises:

ValueError – If the number of requested tasks is larger than the pool, or with_replacement is set.

Returns:

Batch of sampled environment updates, which, when invoked on environments, will configure them with new tasks. See EnvUpdate for more information.

Return type:

list[EnvUpdate]

grow_pool(self, new_size)

Increase the size of the pool by copying random tasks in it.

Note that this only copies the tasks already in the pool, and cannot create new original tasks in any way.

Parameters:new_size (int) – Size the pool should be after growing.
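
A minimal sketch, assuming envs is a pre-constructed list of Environment instances:

pool_sampler = EnvPoolSampler(envs)
# Sample every task in the pool exactly once.
env_updates = pool_sampler.sample(pool_sampler.n_tasks)
# Duplicate existing tasks so the pool can serve more samplers.
pool_sampler.grow_pool(2 * pool_sampler.n_tasks)
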
class SetTaskSampler(env_constructor)

Bases: garage.experiment.task_sampler.TaskSampler

Inheritance diagram of garage.experiment.SetTaskSampler

TaskSampler where the environment can sample “task objects”.

This is used for environments that implement sample_tasks and set_task. For example, HalfCheetahVelEnv, as implemented in Garage.

Parameters:env_constructor (Callable[Environment]) – Callable that produces an environment (for example, an environment type).
n_tasks

The number of tasks if known and finite.

Type:int or None
sample(self, n_tasks, with_replacement=False)

Sample a list of environment updates.

Parameters:
  • n_tasks (int) – Number of updates to sample.
  • with_replacement (bool) – Whether tasks can repeat when sampled. Note that if more tasks are sampled than exist, then tasks may repeat, but only after every environment has been included at least once in this batch. Ignored for continuous task spaces.
Returns:

Batch of sampled environment updates, which, when invoked on environments, will configure them with new tasks. See EnvUpdate for more information.

Return type:

list[EnvUpdate]
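
A minimal sketch using HalfCheetahVelEnv, which implements sample_tasks and set_task; the import path may vary between garage versions:

from garage.envs.mujoco import HalfCheetahVelEnv  # import path may vary

task_sampler = SetTaskSampler(HalfCheetahVelEnv)
env_updates = task_sampler.sample(5)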

class TaskSampler

Bases: abc.ABC

Inheritance diagram of garage.experiment.TaskSampler

Class for sampling batches of tasks, represented as EnvUpdate objects.

n_tasks

The number of tasks if known and finite.

Type:int or None
sample(self, n_tasks, with_replacement=False)

Sample a list of environment updates.

Parameters:
  • n_tasks (int) – Number of updates to sample.
  • with_replacement (bool) – Whether tasks can repeat when sampled. Note that if more tasks are sampled than exist, then tasks may repeat, but only after every environment has been included at least once in this batch. Ignored for continuous task spaces.
Returns:

Batch of sampled environment updates, which, when invoked on environments, will configure them with new tasks. See EnvUpdate for more information.

Return type:

list[EnvUpdate]
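
As a sketch of how sampled updates are consumed, each EnvUpdate is assumed to be callable on an existing environment, returning the environment configured with the new task (this is how samplers typically apply updates):

# task_sampler is any TaskSampler; envs is a list of existing environments.
env_updates = task_sampler.sample(len(envs))
envs = [update(env) for update, env in zip(env_updates, envs)]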