garage.sampler package

Samplers which run agents in environments.

class BatchSampler(algo, env)[source]

Bases: garage.sampler.sampler_deprecated.BaseSampler

Class with batch-based sampling.

Parameters:
  • algo (garage.np.algos.RLAlgorithm) – An algorithm instance.
  • env (garage.envs.GarageEnv) – An environment instance.
obtain_samples(itr, batch_size=None, whole_paths=True)[source]

Sample the policy for new trajectories.

Parameters:
  • itr (int) – Iteration number.
  • batch_size (int) – Number of environment steps in one batch.
  • whole_paths (bool) – Whether to use whole path or truncated.
Returns:

A list of paths.

Return type:

list[dict]

shutdown_worker()[source]

Shutdown workers.

start_worker()[source]

Start workers.
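
The deprecated batch-sampler API follows a start / obtain / shutdown lifecycle. A minimal sketch, assuming algo and env are already-constructed algorithm and environment instances (placeholder names here):

    from garage.sampler import BatchSampler

    # algo and env are assumed to exist already (placeholders for this sketch).
    sampler = BatchSampler(algo, env)
    sampler.start_worker()
    for itr in range(10):
        # Collect roughly 4000 environment steps of whole paths per iteration.
        paths = sampler.obtain_samples(itr, batch_size=4000, whole_paths=True)
    sampler.shutdown_worker()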

class ISSampler(algo, env, n_backtrack=None, n_is_pretrain=0, init_is=0, skip_is_itrs=False, hist_variance_penalty=0.0, max_is_ratio=0, ess_threshold=0, randomize_draw=False)[source]

Bases: garage.sampler.batch_sampler.BatchSampler

Importance sampling sampler.

Sampler which alternates between live sampling iterations using BatchSampler and importance sampling iterations.

Parameters:
  • algo (garage.np.algos.RLAlgorithm) – An algorithm instance.
  • env (garage.envs.GarageEnv) – An environment instance.
  • n_backtrack (int) – Number of past policies to update from. If None, it uses all past policies.
  • n_is_pretrain (int) – Number of importance sampling iterations to perform at the beginning of training.
  • init_is (bool) – Whether to make the initial iteration (after pretraining) an importance sampling iteration.
  • skip_is_itrs (bool) – Whether to skip all importance sampling iterations (after pretraining).
  • hist_variance_penalty (float) – Penalty on the variance of the historical policy.
  • max_is_ratio (int) – Maximum allowed importance sampling ratio.
  • ess_threshold (int) – Minimum effective sample size required.
  • randomize_draw (bool) – Whether to randomize the importance samples drawn.
add_history(policy_distribution, paths)[source]

Store policy distribution and paths in history.

Parameters:
  • policy_distribution – Policy distribution to store.
  • paths (list[dict]) – Paths to store alongside the distribution.
get_history_list(n_past=None)[source]

Get list of (distribution, data) tuples from history.

Parameters:n_past (int) – Number of past policies to update from. If None, it uses all past policies.
Returns:A list of (distribution, data) tuples from the history.
Return type:list
history

History of policies.

History of policies that have interacted with the environment and the data from interaction episode(s).

Type:list
obtain_samples(itr, batch_size=None, whole_paths=True)[source]

Collect samples for the given iteration number.

Parameters:
  • itr (int) – Iteration number.
  • batch_size (int) – Number of environment steps in one batch.
  • whole_paths (bool) – Whether to use whole path or truncated.
Returns:

A list of paths.

Return type:

list[dict]

class Sampler(algo, env)[source]

Bases: abc.ABC

Abstract base class of all samplers.

Implementations of this class should override construct, obtain_samples, and shutdown_worker. construct takes a WorkerFactory, which implements most of the RL-specific functionality a Sampler needs. Specifically, it specifies how to construct `Worker`s, which know how to perform rollouts and update both agents and environments.

Currently, __init__ is also part of the interface, but calling it is deprecated. start_worker is also deprecated, and does not need to be implemented.
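
A minimal skeleton of a Sampler implementation is sketched below. It is illustrative only: the class name is hypothetical, the rollout logic is elided, and it assumes the worker factory can be called with a worker number to construct a worker, which is how the built-in samplers use it.

    from garage.sampler import Sampler

    class MySampler(Sampler):
        """Sketch of a custom Sampler (not part of the library)."""

        def __init__(self, worker_factory, agents, envs):
            self._factory = worker_factory
            # prepare_worker_messages canonicalizes single objects into
            # per-worker lists (see WorkerFactory below).
            self._agents = worker_factory.prepare_worker_messages(agents)
            self._envs = worker_factory.prepare_worker_messages(envs)
            # Assumption: calling the factory with a worker number builds a worker.
            self._workers = [worker_factory(i)
                             for i in range(worker_factory.n_workers)]

        @classmethod
        def from_worker_factory(cls, worker_factory, agents, envs):
            return cls(worker_factory, agents, envs)

        def obtain_samples(self, itr, num_samples, agent_update, env_update=None):
            # Drive the workers until num_samples transitions are gathered,
            # then return a garage.TrajectoryBatch (elided in this sketch).
            raise NotImplementedError

        def shutdown_worker(self):
            for worker in self._workers:
                worker.shutdown()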

classmethod from_worker_factory(worker_factory, agents, envs)[source]

Construct this sampler.

Parameters:
  • worker_factory (WorkerFactory) – Pickleable factory for creating workers. Should be transmitted to other processes / nodes where work needs to be done, then workers should be constructed there.
  • agents (Agent or List[Agent]) – Agent(s) to use to perform rollouts. If a list is passed in, it must have length exactly worker_factory.n_workers, and will be spread across the workers.
  • envs (gym.Env or List[gym.Env]) – Environment rollouts are performed in. If a list is passed in, it must have length exactly worker_factory.n_workers, and will be spread across the workers.
Returns:

An instance of cls.

Return type:

Sampler

obtain_samples(itr, num_samples, agent_update, env_update=None)[source]

Collect at least a given number of transitions (timesteps).

Parameters:
  • itr (int) – The current iteration number. Using this argument is deprecated.
  • num_samples (int) – Minimum number of transitions / timesteps to sample.
  • agent_update (object) – Value which will be passed into the agent_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
  • env_update (object) – Value which will be passed into the env_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
Returns:

The batch of collected trajectories.

Return type:

garage.TrajectoryBatch

shutdown_worker()[source]

Terminate workers if necessary.

Because Python object destruction can be somewhat unpredictable, this method isn’t deprecated.

start_worker()[source]

Initialize the sampler.

i.e. launching parallel workers if necessary.

This method is deprecated, please launch workers in construct instead.

class LocalSampler(worker_factory, agents, envs)[source]

Bases: garage.sampler.sampler.Sampler

Sampler that runs workers in the main process.

This is probably the simplest possible sampler. It’s called the “Local” sampler because it runs everything in the same process and thread it was called from.

Parameters:
  • worker_factory (WorkerFactory) – Pickleable factory for creating workers. Should be transmitted to other processes / nodes where work needs to be done, then workers should be constructed there.
  • agents (Agent or List[Agent]) – Agent(s) to use to perform rollouts. If a list is passed in, it must have length exactly worker_factory.n_workers, and will be spread across the workers.
  • envs (gym.Env or List[gym.Env]) – Environment rollouts are performed in. If a list is passed in, it must have length exactly worker_factory.n_workers, and will be spread across the workers.
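
A minimal usage sketch; policy and env are hypothetical placeholders for a garage policy and a gym environment:

    from garage.sampler import LocalSampler, WorkerFactory

    # policy and env are assumed to exist already (placeholders for this sketch).
    factory = WorkerFactory(seed=1, max_path_length=200, n_workers=4)
    sampler = LocalSampler.from_worker_factory(factory, agents=policy, envs=env)

    # Collect at least 4000 timesteps, sending the current policy to the workers.
    trajs = sampler.obtain_samples(itr=0, num_samples=4000, agent_update=policy)
    sampler.shutdown_worker()
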
classmethod from_worker_factory(worker_factory, agents, envs)[source]

Construct this sampler.

Parameters:
  • worker_factory (WorkerFactory) – Pickleable factory for creating workers. Should be transmitted to other processes / nodes where work needs to be done, then workers should be constructed there.
  • agents (Agent or List[Agent]) – Agent(s) to use to perform rollouts. If a list is passed in, it must have length exactly worker_factory.n_workers, and will be spread across the workers.
  • envs (gym.Env or List[gym.Env]) – Environment rollouts are performed in. If a list is passed in, it must have length exactly worker_factory.n_workers, and will be spread across the workers.
Returns:

An instance of cls.

Return type:

Sampler

obtain_exact_trajectories(n_traj_per_worker, agent_update, env_update=None)[source]

Sample an exact number of trajectories per worker.

Parameters:
  • n_traj_per_worker (int) – Exact number of trajectories to gather for each worker.
  • agent_update (object) – Value which will be passed into the agent_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
  • env_update (object) – Value which will be passed into the env_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
Returns:

Batch of gathered trajectories, always in worker order. In other words, first all trajectories from worker 0, then all trajectories from worker 1, and so on.

Return type:

TrajectoryBatch

obtain_samples(itr, num_samples, agent_update, env_update=None)[source]

Collect at least a given number of transitions (timesteps).

Parameters:
  • itr (int) – The current iteration number. Using this argument is deprecated.
  • num_samples (int) – Minimum number of transitions / timesteps to sample.
  • agent_update (object) – Value which will be passed into the agent_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
  • env_update (object) – Value which will be passed into the env_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
Returns:

The batch of collected trajectories.

Return type:

garage.TrajectoryBatch

shutdown_worker()[source]

Shutdown the workers.

class RaySampler(worker_factory, agents, envs)[source]

Bases: garage.sampler.sampler.Sampler

Collects policy rollouts in a data-parallel fashion using Ray.

Parameters:
  • worker_factory (garage.sampler.WorkerFactory) – Used for worker behavior.
  • agents (list[garage.Policy]) – Agents to distribute across workers.
  • envs (list[gym.Env]) – Environments to distribute across workers.
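
Construction mirrors the other samplers; the sketch below additionally assumes Ray has been initialized (ray.init is part of the Ray library, not of garage), and policy and env are hypothetical placeholders:

    import ray
    from garage.sampler import RaySampler, WorkerFactory

    ray.init()  # Ray must be running before remote workers can be created.

    factory = WorkerFactory(seed=1, max_path_length=200, n_workers=4)
    sampler = RaySampler.from_worker_factory(factory, agents=policy, envs=env)
    trajs = sampler.obtain_samples(itr=0, num_samples=4000, agent_update=policy)
    sampler.shutdown_worker()
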
classmethod from_worker_factory(worker_factory, agents, envs)[source]

Construct this sampler.

Parameters:
  • worker_factory (WorkerFactory) – Pickleable factory for creating workers. Should be transmitted to other processes / nodes where work needs to be done, then workers should be constructed there.
  • agents (Agent or List[Agent]) – Agent(s) to use to perform rollouts. If a list is passed in, it must have length exactly worker_factory.n_workers, and will be spread across the workers.
  • envs (gym.Env or List[gym.Env]) – Environment rollouts are performed in. If a list is passed in, it must have length exactly worker_factory.n_workers, and will be spread across the workers.
Returns:

An instance of cls.

Return type:

Sampler

obtain_exact_trajectories(n_traj_per_worker, agent_update, env_update=None)[source]

Sample an exact number of trajectories per worker.

Parameters:
  • n_traj_per_worker (int) – Exact number of trajectories to gather for each worker.
  • agent_update (object) – Value which will be passed into the agent_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
  • env_update (object) – Value which will be passed into the env_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
Returns:

Batch of gathered trajectories, always in worker order. In other words, first all trajectories from worker 0, then all trajectories from worker 1, and so on.

Return type:

TrajectoryBatch

obtain_samples(itr, num_samples, agent_update, env_update=None)[source]

Sample the policy for new trajectories.

Parameters:
  • itr (int) – Iteration number.
  • num_samples (int) – Number of steps the sampler should collect.
  • agent_update (object) – Value which will be passed into the agent_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
  • env_update (object) – Value which will be passed into the env_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
Returns:

Batch of gathered trajectories.

Return type:

TrajectoryBatch

shutdown_worker()[source]

Shuts down the worker.

start_worker()[source]

Initialize a new ray worker.

class MultiprocessingSampler(worker_factory, agents, envs)[source]

Bases: garage.sampler.sampler.Sampler

Sampler that uses multiprocessing to distribute workers.

Parameters:
  • worker_factory (WorkerFactory) – Pickleable factory for creating workers. Should be transmitted to other processes / nodes where work needs to be done, then workers should be constructed there.
  • agents (Agent or List[Agent]) – Agent(s) to use to perform rollouts. If a list is passed in, it must have length exactly worker_factory.n_workers, and will be spread across the workers.
  • envs (gym.Env or List[gym.Env]) – Environment rollouts are performed in. If a list is passed in, it must have length exactly worker_factory.n_workers, and will be spread across the workers.
classmethod from_worker_factory(worker_factory, agents, envs)[source]

Construct this sampler.

Parameters:
  • worker_factory (WorkerFactory) – Pickleable factory for creating workers. Should be transmitted to other processes / nodes where work needs to be done, then workers should be constructed there.
  • agents (Agent or List[Agent]) – Agent(s) to use to perform rollouts. If a list is passed in, it must have length exactly worker_factory.n_workers, and will be spread across the workers.
  • envs (gym.Env or List[gym.Env]) – Environment rollouts are performed in. If a list is passed in, it must have length exactly worker_factory.n_workers, and will be spread across the workers.
Returns:

An instance of cls.

Return type:

Sampler

obtain_exact_trajectories(n_traj_per_worker, agent_update, env_update=None)[source]

Sample an exact number of trajectories per worker.

Parameters:
  • n_traj_per_worker (int) – Exact number of trajectories to gather for each worker.
  • agent_update (object) – Value which will be passed into the agent_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
  • env_update (object) – Value which will be passed into the env_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
Returns:

Batch of gathered trajectories, always in worker order. In other words, first all trajectories from worker 0, then all trajectories from worker 1, and so on.

Return type:

TrajectoryBatch

Raises:

AssertionError – On internal errors.

obtain_samples(itr, num_samples, agent_update, env_update=None)[source]

Collect at least a given number of transitions (timesteps).

Parameters:
  • itr (int) – The current iteration number. Using this argument is deprecated.
  • num_samples (int) – Minimum number of transitions / timesteps to sample.
  • agent_update (object) – Value which will be passed into the agent_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
  • env_update (object) – Value which will be passed into the env_update_fn before doing rollouts. If a list is passed in, it must have length exactly factory.n_workers, and will be spread across the workers.
Returns:

The batch of collected trajectories.

Return type:

garage.TrajectoryBatch

Raises:

AssertionError – On internal errors.

shutdown_worker()[source]

Shutdown the workers.

class ParallelVecEnvExecutor(env, n, max_path_length, scope=None)[source]

Bases: object

Environment wrapper that runs multiple environments in parallel.

action_space

Read / write the action space.

close()[source]

Close all environments.

num_envs

Read / write the number of environments.

observation_space

Read / write the observation space.

reset()[source]

Reset all environments.

step(action_n)[source]

Step all environments using the provided actions.

class VecEnvExecutor(envs, max_path_length)[source]

Bases: object

Environment wrapper that runs multiple environments.

Parameters:
  • envs (list[gym.Env]) – List of environments to batch together.
  • max_path_length (int) – Maximum length of any path.
action_space

Read the action space.

Returns:The action space.
Return type:gym.Space
close()[source]

Close all environments.

num_envs

Read the number of environments.

Returns:Number of environments
Return type:int
observation_space

Read the observation space.

Returns:The observation space.
Return type:gym.Space
reset()[source]

Reset all environments.

Returns:Observations of shape (K, O^*)
Return type:np.ndarray
step(action_n)[source]

Step all environments using the provided actions.

Inserts an environment info ‘vec_env_executor.complete’ containing the episode-end signal (time limit reached or done signal from the environment).

Parameters:action_n (np.ndarray) – Array of actions.
Returns:
Tuple containing:
  • observations (np.ndarray)
  • rewards (np.ndarray)
  • dones (np.ndarray): The done signal from the environment.
  • env_infos (dict[str, np.ndarray])
  • completes (np.ndarray): whether or not the path is complete.
    A path is complete at some time-step N if the done signal has been received at or before N, or if N >= max_path_length.
Return type:tuple
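
A short sketch of the batched stepping loop; the environment id is illustrative only, and the step() return value is left unpacked since its exact layout is described above:

    import gym
    import numpy as np
    from garage.sampler import VecEnvExecutor

    envs = [gym.make('CartPole-v1') for _ in range(4)]
    vec_env = VecEnvExecutor(envs=envs, max_path_length=200)

    obs = vec_env.reset()  # observations of shape (K, O^*)
    actions = np.array([e.action_space.sample() for e in envs])
    result = vec_env.step(actions)  # tuple documented above
    vec_env.close()
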
class VecWorker(*, seed, max_path_length, worker_number, n_envs=8)[source]

Bases: garage.sampler.default_worker.DefaultWorker

Worker with a single policy and multiple environments.

Alternates between taking a single step in all environments and asking the policy for an action for every environment. This allows computing a batch of actions, which is generally much more efficient than computing a single action when using neural networks.

Parameters:
  • seed (int) – The seed to use to initialize random number generators.
  • max_path_length (int or float) – The maximum length of paths which will be sampled. Can be (floating point) infinity.
  • worker_number (int) – The number of the worker this update is occurring in. This argument is used to set a different seed for each worker.
  • n_envs (int) – Number of environment copies to use.
DEFAULT_N_ENVS = 8
collect_rollout()[source]

Collect all completed rollouts.

Returns:A batch of the trajectories completed since the last call to collect_rollout().
Return type:garage.TrajectoryBatch
shutdown()[source]

Close the worker’s environments.

start_rollout()[source]

Begin a new rollout.

step_rollout()[source]

Take a single time-step in the current rollout.

Returns:True iff at least one of the paths was completed.
Return type:bool
update_agent(agent_update)[source]

Update an agent, assuming it implements garage.Policy.

Parameters:agent_update (np.ndarray or dict or garage.Policy) – If a tuple, dict, or np.ndarray, these should be parameters to agent, which should have been generated by calling policy.get_param_values. Alternatively, a policy itself. Note that other implementations of Worker may take different types for this parameter.
update_env(env_update)[source]

Use any non-None env_update as a new environment.

A simple env update function. If env_update is not None, it should be the complete new environment.

This allows changing environments by passing the new environment as env_update into obtain_samples.

Parameters:

env_update (gym.Env or EnvUpdate or None) – The environment to replace the existing env with. Note that other implementations of Worker may take different types for this parameter.

Raises:
  • TypeError – If env_update is not one of the documented types.
  • ValueError – If the wrong number of updates is passed.
class OffPolicyVectorizedSampler(algo, env, n_envs=None, no_reset=True)[source]

Bases: garage.sampler.batch_sampler.BatchSampler

Vectorized sampler for off-policy algorithms.

Parameters:
  • algo (garage.np.RLAlgorithm) – Algorithm.
  • env (garage.envs.GarageEnv) – Environment.
  • n_envs (int) – Number of parallel environments managed by sampler.
  • no_reset (bool) – If True, environments are not reset between sample collections.
obtain_samples(itr, batch_size=None, whole_paths=True)[source]

Collect samples for the given iteration number.

Parameters:
  • itr (int) – Iteration number.
  • batch_size (int) – Number of environment interactions in one batch.
  • whole_paths (bool) – Not used; kept only to comply with the base class interface.
Raises:

ValueError – If the algorithm doesn’t have an exploration_policy field.

Returns:

A list of paths.

Return type:

list

shutdown_worker()[source]

Terminate workers if necessary.

start_worker()[source]

Initialize the sampler.

class OnPolicyVectorizedSampler(algo, env, n_envs=None)[source]

Bases: garage.sampler.batch_sampler.BatchSampler

BatchSampler which uses VecEnvExecutor to run multiple environments.

Parameters:
  • algo (garage.np.algos.RLAlgorithm) – An algorithm instance.
  • env (garage.envs.GarageEnv) – An environment instance.
  • n_envs (int) – Number of parallel environments managed by the sampler.
obtain_samples(itr, batch_size=None, whole_paths=True)[source]

Sample the policy for new trajectories.

Parameters:
  • itr (int) – Iteration number.
  • batch_size (int) – Number of samples to be collected. If None, it defaults to algo.max_path_length * n_envs.
  • whole_paths (bool) – Whether to return the paths whole. True by default. The total number of collected samples may exceed batch_size; if this flag is False, the paths are truncated so the total matches batch_size.
Returns:

Sample paths.

Return type:

list[dict]

Note

Each path is a dictionary with the following keys and values (see the sketch after this list):
  • observations: numpy.ndarray with shape [Batch, *obs_dims]
  • actions: numpy.ndarray with shape [Batch, *act_dims]
  • rewards: numpy.ndarray with shape [Batch, ]
  • env_infos: A dictionary with each key representing one environment info, value being a numpy.ndarray with shape [Batch, ?]. One example is “ale.lives” for Atari environments.
  • agent_infos: A dictionary with each key representing one agent info, value being a numpy.ndarray with shape [Batch, ?]. One example is “prev_action”, which recurrent policies use as the previous-action input, merged with the observation input as the state input.
  • dones: numpy.ndarray with shape [Batch, ]
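
A short sketch showing how the keys listed above are typically read from a returned path; paths is assumed to be the list returned by obtain_samples:

    # paths is assumed to come from obtain_samples (hypothetical here).
    for path in paths:
        observations = path['observations']  # shape [Batch, *obs_dims]
        actions = path['actions']            # shape [Batch, *act_dims]
        rewards = path['rewards']            # shape [Batch]
        dones = path['dones']                # shape [Batch]
        undiscounted_return = rewards.sum()
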
shutdown_worker()[source]

Shutdown workers.

start_worker()[source]

Start workers.

class WorkerFactory(*, seed, max_path_length, n_workers=1, worker_class=<class 'garage.sampler.default_worker.DefaultWorker'>, worker_args=None)[source]

Bases: object

Constructs workers for Samplers.

The intent is that this object should be sufficient to avoid subclassing the sampler. Instead of subclassing the sampler for e.g. a specific backend, implement a specialized WorkerFactory (or specify appropriate functions to this one). Note that this object must be picklable, since it may be passed to workers. However, its fields individually need not be.

All arguments to this type must be passed by keyword.

Parameters:
  • seed (int) – The seed to use to initialize random number generators.
  • n_workers (int) – The number of workers to use.
  • max_path_length (int) – The maximum length of paths which will be sampled.
  • worker_class (type) – Class of the workers. Instances should implement the Worker interface.
  • worker_args (dict or None) – Additional arguments that should be passed to the worker.
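
A sketch of constructing a factory that builds VecWorker instances; it assumes worker_args is forwarded to each worker as extra keyword arguments, per the description above, and policy and env are hypothetical placeholders:

    from garage.sampler import LocalSampler, VecWorker, WorkerFactory

    # All arguments are keyword-only; worker_args is passed on to each worker.
    factory = WorkerFactory(seed=1,
                            max_path_length=500,
                            n_workers=2,
                            worker_class=VecWorker,
                            worker_args=dict(n_envs=8))

    sampler = LocalSampler.from_worker_factory(factory, agents=policy, envs=env)
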
prepare_worker_messages(objs, preprocess=<function identity_function>)[source]

Take an argument and canonicalize it into a list for all workers.

This helper function is used to handle arguments in the sampler API which may (optionally) be lists. Specifically, these are agent, env, agent_update, and env_update. Checks that the number of parameters is correct.

Parameters:
  • objs (object or list) – Must be either a single object or a list of length n_workers.
  • preprocess (function) – Function to call on each single object before creating the list.
Raises:

ValueError – If a list of a length other than n_workers is passed.

Returns:

A list of length self.n_workers.

Return type:

List[object]
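
A small sketch of the canonicalization behaviour described above: a single object is expanded into a list for every worker, while a list must already have length n_workers:

    from garage.sampler import WorkerFactory

    factory = WorkerFactory(seed=1, max_path_length=100, n_workers=3)

    factory.prepare_worker_messages('cfg')            # -> ['cfg', 'cfg', 'cfg']
    factory.prepare_worker_messages(['a', 'b', 'c'])  # -> ['a', 'b', 'c']
    factory.prepare_worker_messages(['a', 'b'])       # raises ValueError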

class Worker(*, seed, max_path_length, worker_number)[source]

Bases: abc.ABC

Worker class used in all Samplers.

collect_rollout()[source]

Collect the current rollout, clearing the internal buffer.

Returns:Batch of sampled trajectories. May be truncated if the rollouts haven’t completed yet.
Return type:garage.TrajectoryBatch
rollout()[source]

Sample a single rollout of the agent in the environment.

Returns:Batch of sampled trajectories. May be truncated if max_path_length is set.
Return type:garage.TrajectoryBatch
shutdown()[source]

Shutdown the worker.

start_rollout()[source]

Begin a new rollout.

step_rollout()[source]

Take a single time-step in the current rollout.

Returns:True iff the path is done, either due to the environment indicating termination or due to reaching max_path_length.
Return type:bool
update_agent(agent_update)[source]

Update the worker’s agent, using agent_update.

Parameters:agent_update (object) – An agent update. The exact type of this argument depends on the Worker implementation.
update_env(env_update)[source]

Update the worker’s env, using env_update.

Parameters:env_update (object) – An environment update. The exact type of this argument depends on the Worker implementation.
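
The methods above are typically driven by a sampler in the order sketched below; worker, policy, and env are hypothetical placeholders. Worker.rollout() bundles the same start/step/collect sequence for a single path.

    # Sketch of how a sampler drives a Worker (names are placeholders).
    worker.update_agent(policy)       # install the current policy
    worker.update_env(env)            # install (or keep) the environment
    worker.start_rollout()
    while not worker.step_rollout():
        pass                          # step until done or max_path_length is hit
    batch = worker.collect_rollout()  # garage.TrajectoryBatch of finished rollout(s)
    worker.shutdown()
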
class DefaultWorker(*, seed, max_path_length, worker_number)[source]

Bases: garage.sampler.worker.Worker

Initialize a worker.

Parameters:
  • seed (int) – The seed to use to initialize random number generators.
  • max_path_length (int or float) – The maximum length of paths which will be sampled. Can be (floating point) infinity.
  • worker_number (int) – The number of the worker where this update is occurring. This argument is used to set a different seed for each worker.
agent

The worker’s agent.

Type:Policy or None
env

The worker’s environment.

Type:gym.Env or None
collect_rollout()[source]

Collect the current rollout, clearing the internal buffer.

Returns:A batch of the trajectories completed since the last call to collect_rollout().
Return type:garage.TrajectoryBatch
rollout()[source]

Sample a single rollout of the agent in the environment.

Returns:The collected trajectory.
Return type:garage.TrajectoryBatch
shutdown()[source]

Close the worker’s environment.

start_rollout()[source]

Begin a new rollout.

step_rollout()[source]

Take a single time-step in the current rollout.

Returns:True iff the path is done, either due to the environment indicating termination or due to reaching max_path_length.
Return type:bool
update_agent(agent_update)[source]

Update an agent, assuming it implements garage.Policy.

Parameters:agent_update (np.ndarray or dict or garage.Policy) – If a tuple, dict, or np.ndarray, these should be parameters to agent, which should have been generated by calling policy.get_param_values. Alternatively, a policy itself. Note that other implementations of Worker may take different types for this parameter.
update_env(env_update)[source]

Use any non-None env_update as a new environment.

A simple env update function. If env_update is not None, it should be the complete new environment.

This allows changing environments by passing the new environment as env_update into obtain_samples.

Parameters:env_update (gym.Env or EnvUpdate or None) – The environment to replace the existing env with. Note that other implementations of Worker may take different types for this parameter.
Raises:TypeError – If env_update is not one of the documented types.
worker_init()[source]

Initialize a worker.