garage.sampler.on_policy_vectorized_sampler module
BatchSampler which uses VecEnvExecutor to run multiple environments.
class OnPolicyVectorizedSampler(algo, env, n_envs=None)
Bases: garage.sampler.batch_sampler.BatchSampler
BatchSampler which uses VecEnvExecutor to run multiple environments.
Parameters: - algo (garage.np.algos.RLAlgorithm) – An algorithm instance.
- env (garage.envs.GarageEnv) – An environment instance.
- n_envs (int) – Number of environment instances to set up. This parameter affects sampling performance.
obtain_samples(itr, batch_size=None, whole_paths=True)
Sample the policy for new trajectories.
Parameters: - itr (int) – Iteration number.
- batch_size (int) – Number of samples to collect. If None, it defaults to algo.max_path_length * n_envs.
- whole_paths (bool) – Whether to return all the paths. True by default. The paths may contain more samples in total than batch_size; they are truncated to batch_size if this flag is False.
Returns: Sample paths.
Return type: list[dict]
Note
- Each path is a dictionary with the following keys and values:
- observations: numpy.ndarray with shape [Batch, *obs_dims]
- actions: numpy.ndarray with shape [Batch, *act_dims]
- rewards: numpy.ndarray with shape [Batch, ]
- env_infos: A dictionary in which each key names one environment info and each value is a numpy.ndarray with shape [Batch, ?]. One example is “ale.lives” for Atari environments.
- agent_infos: A dictionary in which each key names one agent info and each value is a numpy.ndarray with shape [Batch, ?]. One example is “prev_action”, which a recurrent policy takes as its previous-action input and merges with the observation input to form the state input.
- dones: numpy.ndarray with shape [Batch, ]
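
The path layout described in the note can be sketched with plain NumPy. The example below is a minimal illustration, not garage's implementation: make_path and truncate_to_batch_size are hypothetical helpers, the observation and action dimensions are arbitrary, and max_path_length and n_envs stand in for algo.max_path_length and the sampler's n_envs. It builds dummy paths with the documented keys, computes the default batch size, and shows one way paths could be truncated to batch_size.

```python
import numpy as np


def make_path(length, obs_dim=4, act_dim=2):
    """Build a dummy path with the documented keys (hypothetical helper)."""
    return dict(
        observations=np.zeros((length, obs_dim)),  # [Batch, *obs_dims]
        actions=np.zeros((length, act_dim)),       # [Batch, *act_dims]
        rewards=np.zeros(length),                  # [Batch, ]
        env_infos={},                              # no environment infos here
        agent_infos={'prev_action': np.zeros((length, act_dim))},
        dones=np.zeros(length, dtype=bool),        # [Batch, ]
    )


def truncate_to_batch_size(paths, batch_size):
    """Trim paths so the total sample count does not exceed batch_size.

    Hypothetical helper for illustration only; garage may truncate differently.
    """
    kept, remaining = [], batch_size
    for path in paths:
        if remaining <= 0:
            break
        n = min(len(path['rewards']), remaining)
        trimmed = {}
        for key, value in path.items():
            if isinstance(value, dict):
                # Info dictionaries hold one array per key; trim each of them.
                trimmed[key] = {k: v[:n] for k, v in value.items()}
            else:
                trimmed[key] = value[:n]
        kept.append(trimmed)
        remaining -= n
    return kept


# Assumed values standing in for algo.max_path_length and n_envs.
max_path_length, n_envs = 100, 4
batch_size = max_path_length * n_envs  # documented default when batch_size is None

paths = [make_path(max_path_length) for _ in range(5)]  # more samples than batch_size
truncated = truncate_to_batch_size(paths, batch_size)
total_samples = sum(len(p['rewards']) for p in truncated)
```

Every array in a path shares the same leading Batch dimension, so slicing each value along axis 0 keeps observations, actions, rewards, infos, and dones aligned.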