garage.sampler.is_sampler module

Importance sampling sampler.

class ISSampler(algo, env, n_backtrack=None, n_is_pretrain=0, init_is=0, skip_is_itrs=False, hist_variance_penalty=0.0, max_is_ratio=0, ess_threshold=0, randomize_draw=False)[source]

Bases: garage.sampler.batch_sampler.BatchSampler

Importance sampling sampler.

Sampler which alternates between live sampling iterations using BatchSampler and importance sampling iterations.

Parameters:
  • algo (garage.np.algos.RLAlgorithm) – An algorithm instance.
  • env (garage.envs.GarageEnv) – An environment instance.
  • n_backtrack (int) – Number of past policies to update from. If None, it uses all past policies.
  • n_is_pretrain (int) – Number of importance sampling iterations to perform at the beginning of training.
  • init_is (bool) – Whether the first iteration after pretraining is an importance sampling iteration.
  • skip_is_itrs (bool) – If True, skip all importance sampling iterations after pretraining.
  • hist_variance_penalty (float) – Penalty applied to the variance of historical policies.
  • max_is_ratio (int) – Maximum allowed importance sampling ratio.
  • ess_threshold (int) – Minimum effective sample size required.
  • randomize_draw (bool) – Whether to randomize the draw of importance samples.
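
To make max_is_ratio and ess_threshold more concrete, the following is a minimal sketch of the quantities they plausibly refer to: per-sample importance ratios with an upper clip, and the Kish effective sample size. It is not garage's implementation. The helper names importance_ratios and effective_sample_size are hypothetical, and treating a non-positive max_ratio as "no clipping" is an assumption based on the default of 0.

```python
import math


def importance_ratios(logp_target, logp_behavior, max_ratio=0.0):
    """Return pi_target / pi_behavior for each sample, optionally clipped.

    Assumption: max_ratio <= 0 means "do not clip".
    """
    ratios = [math.exp(t - b) for t, b in zip(logp_target, logp_behavior)]
    if max_ratio > 0:
        ratios = [min(r, max_ratio) for r in ratios]
    return ratios


def effective_sample_size(weights):
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq if total_sq > 0 else 0.0


# Log-probabilities of three actions under the current and a past policy.
logp_new = [math.log(0.5), math.log(0.2), math.log(0.9)]
logp_old = [math.log(0.25), math.log(0.4), math.log(0.1)]

ratios = importance_ratios(logp_new, logp_old, max_ratio=5.0)
ess = effective_sample_size(ratios)
print(ratios, ess)
```

A sampler could compare a value like ess against ess_threshold to decide whether the reweighted historical data is informative enough to use. Whether ISSampler does exactly this is not stated on this page.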
add_history(policy_distribution, paths)[source]

Store policy distribution and paths in history.

Parameters:
  • policy_distribution – Policy distribution to store.
  • paths – Paths sampled under that policy.
get_history_list(n_past=None)[source]

Get list of (distribution, data) tuples from history.

Parameters: n_past (int) – Number of past policies to update from. If None, it uses all past policies.
Returns: A list of paths.
Return type: list
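
The interplay between add_history and get_history_list can be sketched with a toy container. This is a stand-in under stated assumptions, not garage's code: the History class and its method names are hypothetical, and returning an empty list for n_past <= 0 is a choice made here.

```python
class History:
    """Toy history of (policy_distribution, paths) tuples."""

    def __init__(self):
        self._entries = []

    def add(self, policy_distribution, paths):
        """Append one (distribution, paths) record, oldest first."""
        self._entries.append((policy_distribution, paths))

    def last(self, n_past=None):
        """Return the n_past most recent records, or all if n_past is None."""
        if n_past is None:
            return list(self._entries)
        if n_past <= 0:
            # Guard: a slice of [-0:] would return the whole list.
            return []
        return self._entries[-n_past:]


history = History()
for i in range(4):
    history.add(f"dist_{i}", [f"path_{i}"])

recent = history.last(2)
print(recent)
```

Here n_past=2 keeps only the two most recent policies, which mirrors how n_backtrack limits how far back the sampler looks.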
history

History of policies.

History of policies that have interacted with the environment and the data from interaction episode(s).

Type: list
obtain_samples(itr, batch_size=None, whole_paths=True)[source]

Collect samples for the given iteration number.

Parameters:
  • itr (int) – Iteration number.
  • batch_size (int) – Number of environment steps in one batch.
  • whole_paths (bool) – Whether to return whole paths or truncate them.
Returns: A list of paths.
Return type: list[dict]
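
One plausible reading of how batch_size and whole_paths interact is sketched below. It is an illustration only: select_paths is a hypothetical helper, and the assumption that each path is a dict of equal-length lists (here just a "rewards" key) goes beyond what this page specifies.

```python
def select_paths(paths, batch_size, whole_paths=True):
    """Keep paths until batch_size environment steps are covered.

    With whole_paths=True the last path is kept intact, so the total may
    exceed batch_size. Otherwise the last path is cut to fit exactly.
    Assumes every value in a path dict is a list of the same length.
    """
    selected, total = [], 0
    for path in paths:
        if total >= batch_size:
            break
        length = len(path["rewards"])
        remaining = batch_size - total
        if not whole_paths and length > remaining:
            # Truncate every per-step field to the remaining budget.
            path = {key: value[:remaining] for key, value in path.items()}
            length = remaining
        selected.append(path)
        total += length
    return selected


paths = [{"rewards": [1.0, 1.0, 1.0]} for _ in range(3)]

whole = select_paths(paths, batch_size=4, whole_paths=True)
cut = select_paths(paths, batch_size=4, whole_paths=False)
print(sum(len(p["rewards"]) for p in whole))
print(sum(len(p["rewards"]) for p in cut))
```

With whole paths the batch overshoots the requested step count, while truncation hits it exactly at the cost of an incomplete final episode.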