Quick Start with garage¶
What is garage?¶
garage is a reinforcement learning (RL) toolkit for developing and evaluating algorithms. The garage library also provides a collection of state-of-the-art implementations of RL algorithms.
The toolkit provides a wide range of modular tools for implementing RL algorithms, including:
Composable neural network models
Replay buffers
High-performance samplers
An expressive experiment definition interface
Tools for reproducibility (e.g. set a global random seed which all components respect; see the sketch after this list)
Logging to many outputs, including TensorBoard
Reliable experiment checkpointing and resuming
Environment interfaces for many popular benchmark suites
Support for running garage in diverse environments, including always up-to-date Docker containers
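For example, the reproducibility tooling mentioned in the list above largely comes down to a single call that seeds every component garage controls. A minimal sketch (the seed value is an arbitrary placeholder):
from garage.experiment import deterministic

# Seed Python's random module, numpy, and the active deep learning
# framework so repeated runs of the same experiment behave identically.
deterministic.set_seed(1)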
Why garage?¶
garage aims to provide both researchers and developers with:
a flexible and structured tool for developing algorithms to solve a variety of RL problems,
a standardized and reproducible environment for experimenting and evaluating RL algorithms,
a collection of benchmarks and examples of RL algorithms.
Kick Start garage¶
This quickstart shows how to get up and running with garage in 5 minutes.
import garage
Algorithms¶
An array of algorithms is available in garage:
Algorithm               | Framework(s)
------------------------|---------------------
CEM                     | numpy
CMA-ES                  | numpy
REINFORCE (a.k.a. VPG)  | PyTorch, TensorFlow
DDPG                    | PyTorch, TensorFlow
DQN                     | TensorFlow
DDQN                    | TensorFlow
ERWR                    | TensorFlow
NPO                     | TensorFlow
PPO                     | PyTorch, TensorFlow
REPS                    | TensorFlow
TD3                     | TensorFlow
TNPG                    | TensorFlow
TRPO                    | PyTorch, TensorFlow
MAML                    | PyTorch
RL2                     | TensorFlow
PEARL                   | PyTorch
SAC                     | PyTorch
MTSAC                   | PyTorch
MTPPO                   | PyTorch, TensorFlow
MTTRPO                  | PyTorch, TensorFlow
Task Embedding          | TensorFlow
Behavioral Cloning      | PyTorch
They are organized in the GitHub repository as follows:
└── garage
├── envs
├── experiment
├── misc
├── np
├── plotter
├── replay_buffer
├── sampler
├── tf
└── torch
Note: each framework listed in the table corresponds to the subpackage (tf, torch, or np) that contains that algorithm's implementation.
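In practice, this layout means an algorithm can have one implementation per framework, each living in the corresponding subpackage. A small illustrative sketch (assuming both the TensorFlow and PyTorch extras are installed):
# TRPO has both a TensorFlow and a PyTorch implementation.
from garage.tf.algos import TRPO as TF_TRPO
from garage.torch.algos import TRPO as PyTorch_TRPO

# numpy-only algorithms such as CEM live under garage.np.
from garage.np.algos import CEM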
A simple PyTorch example that imports the TRPO algorithm, along with the GaussianMLPPolicy policy, the GaussianMLPValueFunction value function, and the LocalSampler sampler, is shown below:
import gym
import torch

from garage.envs import GarageEnv, normalize
from garage.sampler import LocalSampler
from garage.torch.algos import TRPO as PyTorch_TRPO
from garage.torch.policies import GaussianMLPPolicy as PyTorch_GMP
from garage.torch.value_functions import GaussianMLPValueFunction


def trpo_garage_pytorch(env_id):
    # Wrap the Gym environment given by env_id and normalize it.
    env = GarageEnv(normalize(gym.make(env_id)))

    # Gaussian MLP policy with two hidden layers of 32 tanh units.
    policy = PyTorch_GMP(env.spec,
                         hidden_sizes=[32, 32],
                         hidden_nonlinearity=torch.tanh,
                         output_nonlinearity=None)

    # Gaussian MLP value function used as the baseline for TRPO.
    value_function = GaussianMLPValueFunction(env_spec=env.spec,
                                              hidden_sizes=(32, 32),
                                              hidden_nonlinearity=torch.tanh,
                                              output_nonlinearity=None)

    # Sampler that collects episodes in the local process.
    sampler = LocalSampler(agents=policy,
                           envs=env,
                           max_episode_length=env.spec.max_episode_length)

    algo = PyTorch_TRPO(env_spec=env.spec,
                        policy=policy,
                        value_function=value_function,
                        sampler=sampler,
                        discount=0.99,
                        gae_lambda=0.97)
Note that this snippet only constructs the algorithm and its components; training is driven by the experiment launcher described in Running Experiments below. The full code can be found here.
To learn more about implementing new algorithms, see this guide.
Running Examples¶
Garage ships with example files to help you get started. To get a list of examples, run:
garage examples
This prints a list of examples along with their fully qualified names, such as:
tf/dqn_cartpole.py (garage.examples.tf.dqn_cartpole.py)
To get the source of an example, run:
garage examples tf/dqn_cartpole.py
This will print the source on your console, which you can write to a file as follows:
garage examples tf/dqn_cartpole.py > tf_dqn_cartpole.py
You can also run an example directly by passing its fully qualified module name to python -m, as follows:
python -m garage.examples.tf.dqn_cartpole
You can also access the examples for a specific version on GitHub by visiting the tag corresponding to that version and then navigating to src/garage/examples.
Running Experiments¶
In garage, experiments are run using the “experiment launcher” wrap_experiment, a Python decorator that can be imported directly from the garage package.
from garage import wrap_experiment
In addition, objects such as the trainer, environment, policy, and sampler are commonly used when constructing experiments in garage, as in the example below.
"""A regression test for automatic benchmarking garage-PyTorch-TRPO."""
import torch
from garage import wrap_experiment
from garage.envs import GymEnv, normalize
from garage.experiment import deterministic
from garage.sampler import LocalSampler
from garage.torch.algos import TRPO as PyTorch_TRPO
from garage.torch.policies import GaussianMLPPolicy as PyTorch_GMP
from garage.torch.value_functions import GaussianMLPValueFunction
from garage.trainer import Trainer
hyper_parameters = {
'hidden_sizes': [32, 32],
'max_kl': 0.01,
'gae_lambda': 0.97,
'discount': 0.99,
'n_epochs': 999,
'batch_size': 1024,
}
@wrap_experiment
def trpo_garage_pytorch(ctxt, env_id, seed):
"""Create garage PyTorch TRPO model and training.
Args:
ctxt (garage.experiment.ExperimentContext): The experiment
configuration used by Trainer to create the
snapshotter.
env_id (str): Environment id of the task.
seed (int): Random positive integer for the trial.
"""
deterministic.set_seed(seed)
trainer = Trainer(ctxt)
env = normalize(GymEnv(env_id))
policy = PyTorch_GMP(env.spec,
hidden_sizes=hyper_parameters['hidden_sizes'],
hidden_nonlinearity=torch.tanh,
output_nonlinearity=None)
value_function = GaussianMLPValueFunction(env_spec=env.spec,
hidden_sizes=(32, 32),
hidden_nonlinearity=torch.tanh,
output_nonlinearity=None)
sampler = LocalSampler(agents=policy,
envs=env,
max_episode_length=env.spec.max_episode_length)
algo = PyTorch_TRPO(env_spec=env.spec,
policy=policy,
value_function=value_function,
sampler=sampler,
discount=hyper_parameters['discount'],
gae_lambda=hyper_parameters['gae_lambda'])
trainer.setup(algo, env)
trainer.train(n_epochs=hyper_parameters['n_epochs'],
batch_size=hyper_parameters['batch_size'])
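Because wrap_experiment supplies the experiment context, the decorated launcher is called like an ordinary function, with the remaining arguments passed as keywords while ctxt is filled in automatically. A minimal invocation sketch (the environment id and seed below are placeholder choices, not values prescribed by garage):
if __name__ == '__main__':
    # ctxt is injected by wrap_experiment; pass the other arguments
    # as keywords. The environment id and seed are placeholders.
    trpo_garage_pytorch(env_id='InvertedDoublePendulum-v2', seed=1)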
This page will give you more insight into running experiments.
Plotting results¶
In garage, we use TensorBoard for plotting experiment results.
This guide provides details on how to set up TensorBoard when running experiments in garage.
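For instance, once an experiment has written its TensorBoard event files, they can be viewed by pointing TensorBoard at the output directory (the path below assumes the default data/local/experiment location described in the next section):
tensorboard --logdir data/local/experiment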
Experiment outputs¶
The Trainer (called LocalRunner in older releases) is the state manager for experiments in garage. It creates, saves, and restores experiment state, known as a snapshot, during an experiment. A snapshot includes the hyperparameter configuration, training progress, pickled algorithm and environment objects, the TensorBoard event file, and so on.
By default, experiment results are written to data/local/experiment, relative to the directory containing the garage package. The output directory is generally organized as follows:
└── data
└── local
└── experiment
└── your_experiment_name
├── progress.csv
├── debug.log
├── variant.json
├── metadata.json
├── launch_archive.tar.xz
└── events.out.tfevents.xxx
wrap_experiment can be invoked with arguments to modify the default output directory, change the snapshot mode, control the snapshot gap, and so on. For example, to modify the default output directory and change the snapshot mode from last (only the last iteration is saved) to all (every iteration is saved), we can do this:
@wrap_experiment(log_dir='./your_log_dir', snapshot_mode='all')
def my_experiment(ctxt, seed, lr=0.5):
    ...
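Snapshots are also what make resuming an experiment possible. The sketch below shows one way to restore and continue a previous run with the Trainer; the directory name is a placeholder for the output directory created by an earlier experiment:
from garage import wrap_experiment
from garage.trainer import Trainer


@wrap_experiment
def resume_my_experiment(ctxt):
    # Restore algorithm, environment, and training progress from the
    # snapshot directory of an earlier run (placeholder path below).
    trainer = Trainer(ctxt)
    trainer.restore('data/local/experiment/your_experiment_name')
    trainer.resume()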
During an experiment, garage makes extensive use of the logger from dowel to log output to StdOutput, TextOutput, and/or CsvOutput. For details, you can check this.
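garage's logging is built on dowel's global logger and tabular objects, so the same primitives can be used in your own launchers. A minimal sketch of dowel usage (the metric names and values are placeholders, not ones garage records for you):
import dowel
from dowel import logger, tabular

# Send log output to stdout; CsvOutput, TextOutput, and
# TensorBoardOutput can be added the same way.
logger.add_output(dowel.StdOutput())

logger.log('Starting a toy logging loop')
for epoch in range(3):
    tabular.record('Epoch', epoch)
    tabular.record('AverageReturn', 0.0)  # placeholder metric
    logger.log(tabular)
    logger.dump_all()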
Open Source Support¶
garage has been active in the open-source community since October 2018, contributing to RL research and development. Contributions from the community are more than welcome.
Resources¶
If you are interested in a more in-depth look at specific capabilities of garage, you can find many other guides on this website.
This page was authored by Iris Liu (@irisliucy).