garage.sampler.on_policy_vectorized_sampler module

BatchSampler which uses VecEnvExecutor to run multiple environments.

class OnPolicyVectorizedSampler(algo, env, n_envs=None)

BatchSampler which uses VecEnvExecutor to run multiple environments.

Parameters:	algo (garage.np.algos.RLAlgorithm) – An algorithm instance.
	env (garage.envs.GarageEnv) – An environment instance.
	n_envs (int) – Number of environment instances to set up. This parameter affects sampling performance.
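As an illustration of what running `n_envs` environments together means, here is a minimal, self-contained sketch of lockstep vectorized stepping in the spirit of `VecEnvExecutor`. It is not garage's implementation: `ToyEnv` and `LockstepVecEnv` are hypothetical names, and resetting an environment as soon as its episode ends is an assumption made for the example.

```python
import numpy as np


class ToyEnv:
    """Hypothetical 1-D environment whose episode ends after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(1)

    def step(self, action):
        self.t += 1
        obs = np.array([float(self.t)])
        reward = float(action)
        done = self.t >= self.horizon
        return obs, reward, done, {}


class LockstepVecEnv:
    """Hypothetical stand-in for an executor over n_envs environment copies."""

    def __init__(self, envs):
        self.envs = envs

    @property
    def num_envs(self):
        return len(self.envs)

    def reset(self):
        # One observation per environment, stacked along a leading axis.
        return np.stack([env.reset() for env in self.envs])

    def step(self, actions):
        # One action per environment; every environment advances one step.
        obs, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d, info = env.step(action)
            if d:
                # Assumption: a finished environment is reset immediately,
                # so the returned observation starts its next episode.
                o = env.reset()
            obs.append(o)
            rewards.append(r)
            dones.append(d)
            infos.append(info)
        return np.stack(obs), np.asarray(rewards), np.asarray(dones), infos


n_envs = 4
vec_env = LockstepVecEnv([ToyEnv(horizon=3) for _ in range(n_envs)])
obses = vec_env.reset()
obses, rewards, dones, _ = vec_env.step(np.ones(n_envs))
print(obses.shape, rewards.shape, dones.shape)  # (4, 1) (4,) (4,)
```

In this sketch every call to `step` produces `n_envs` samples at once, which suggests why the number of environments influences how quickly a batch fills up.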

obtain_samples(itr, batch_size=None, whole_paths=True)

Sample the policy for new trajectories.

Parameters:	itr (int) – Iteration number.
	batch_size (int) – Number of samples to collect. If None, it defaults to algo.max_path_length * n_envs.
	whole_paths (bool) – Whether to return all collected paths whole. True by default. The collected paths may contain more samples in total than batch_size; when this flag is False, they are truncated to batch_size samples.
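The interaction between `batch_size` and `whole_paths` can be sketched as below. This is a standalone illustration, not garage's code: `resolve_batch_size`, `truncate_paths_sketch` and `finalize` are hypothetical helpers, and it assumes, as the flag's name suggests, that truncation applies only when `whole_paths` is False and that it keeps paths in order while cutting the last one so the total equals `batch_size`.

```python
import numpy as np


def resolve_batch_size(batch_size, max_path_length, n_envs):
    """Hypothetical helper: fall back to max_path_length * n_envs when unset."""
    if batch_size is None:
        return max_path_length * n_envs
    return batch_size


def _cut(value, n):
    # Slice arrays along the time axis; recurse into info dictionaries.
    if isinstance(value, dict):
        return {k: _cut(v, n) for k, v in value.items()}
    return value[:n]


def truncate_paths_sketch(paths, batch_size):
    """Keep paths in order, cutting the last one at batch_size total samples."""
    truncated, total = [], 0
    for path in paths:
        if total >= batch_size:
            break
        keep = min(len(path['rewards']), batch_size - total)
        truncated.append({k: _cut(v, keep) for k, v in path.items()})
        total += keep
    return truncated


def finalize(paths, batch_size, whole_paths=True):
    # Whole paths are returned as collected; otherwise trim to batch_size.
    return paths if whole_paths else truncate_paths_sketch(paths, batch_size)


paths = [{'rewards': np.ones(5), 'env_infos': {}} for _ in range(3)]  # 15 samples
batch = resolve_batch_size(None, max_path_length=4, n_envs=3)  # 12
print(sum(len(p['rewards']) for p in finalize(paths, batch)))  # 15
print(sum(len(p['rewards']) for p in finalize(paths, batch, whole_paths=False)))  # 12
```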
Returns:	Sample paths.
Return type:	list[dict]

Note

Each path is a dictionary with the following keys and values:

observations: numpy.ndarray with shape [Batch, *obs_dims]
actions: numpy.ndarray with shape [Batch, *act_dims]
rewards: numpy.ndarray with shape [Batch, ]
env_infos: A dictionary in which each key names one environment info and each value is a numpy.ndarray with shape [Batch, ...], where the trailing dimensions depend on the info. One example is “ale.lives” for Atari environments.
agent_infos: A dictionary in which each key names one agent info and each value is a numpy.ndarray with shape [Batch, ...]. One example is “prev_action”, which recurrent policies use as the previous-action input, combined with the observation to form the state input.
dones: numpy.ndarray with shape [Batch, ]
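A path in this format can be assembled from per-step records by stacking each field along a new leading (time) axis. The sketch below is illustrative only: `stack_path` is a hypothetical helper rather than a garage function, it assumes every step reports the same `env_infos` and `agent_infos` keys, and treating `prev_action` as zeros at the first step is an assumption made for the example.

```python
import numpy as np


def stack_path(observations, actions, rewards, dones, env_infos, agent_infos):
    """Hypothetical helper: turn per-step lists into one path dictionary.

    env_infos and agent_infos are lists of per-step dicts with identical keys.
    """

    def stack_infos(infos):
        keys = infos[0].keys() if infos else []
        return {k: np.stack([np.asarray(info[k]) for info in infos]) for k in keys}

    return {
        'observations': np.stack(observations),  # [Batch, *obs_dims]
        'actions': np.stack(actions),  # [Batch, *act_dims]
        'rewards': np.asarray(rewards, dtype=np.float64),  # [Batch, ]
        'env_infos': stack_infos(env_infos),  # each value: [Batch, ...]
        'agent_infos': stack_infos(agent_infos),  # each value: [Batch, ...]
        'dones': np.asarray(dones, dtype=bool),  # [Batch, ]
    }


T, obs_dim, act_dim = 3, 4, 2
acts = [np.full(act_dim, float(t)) for t in range(T)]
path = stack_path(
    observations=[np.zeros(obs_dim) for _ in range(T)],
    actions=acts,
    rewards=[1.0, 0.5, 0.0],
    dones=[False, False, True],
    env_infos=[{'ale.lives': 3} for _ in range(T)],
    # Assumption: prev_action at step t is the action from step t - 1,
    # with zeros at the first step.
    agent_infos=[{'prev_action': acts[t - 1] if t > 0 else np.zeros(act_dim)}
                 for t in range(T)],
)
for key in ('observations', 'actions', 'rewards', 'dones'):
    print(key, path[key].shape)
```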