Quick Start with garage

What is garage?

garage is a reinforcement learning (RL) toolkit for developing and evaluating algorithms. The garage library also provides a collection of state-of-the-art implementations of RL algorithms.

The toolkit provides a wide range of modular tools for implementing RL algorithms, including:

  • Composable neural network models

  • Replay buffers

  • High-performance samplers

  • An expressive experiment definition interface

  • Tools for reproducibility (e.g. setting a global random seed that all components respect)

  • Logging to many outputs, including TensorBoard

  • Reliable experiment checkpointing and resuming

  • Environment interfaces for many popular benchmark suites

  • Support for running garage in diverse environments, including always up-to-date Docker containers

Why garage?

garage aims to provide both researchers and developers with:

  • a flexible and structured tool for developing algorithms to solve a variety of RL problems,

  • a standardized and reproducible environment for experimenting and evaluating RL algorithms,

  • a collection of benchmarks and examples of RL algorithms.

Kick Start garage

This quickstart shows how to get started with garage in about 5 minutes.

import garage

Algorithms

A wide array of algorithms is available in garage:

Algorithm               Framework(s)
----------------------  -------------------
CEM                     Numpy
CMA-ES                  Numpy
REINFORCE (a.k.a. VPG)  PyTorch, TensorFlow
DDPG                    PyTorch, TensorFlow
DQN                     TensorFlow
DDQN                    TensorFlow
ERWR                    TensorFlow
NPO                     TensorFlow
PPO                     PyTorch, TensorFlow
REPS                    TensorFlow
TD3                     TensorFlow
TNPG                    TensorFlow
TRPO                    PyTorch, TensorFlow
MAML                    PyTorch
RL2                     TensorFlow
PEARL                   PyTorch
SAC                     PyTorch
MTSAC                   PyTorch
MTPPO                   PyTorch, TensorFlow
MTTRPO                  PyTorch, TensorFlow
Task Embedding          TensorFlow
Behavioral Cloning      PyTorch

They are organized in the GitHub repository as follows:

└── garage
    ├── envs
    ├── experiment
    ├── misc
    ├── np
    ├── plotter
    ├── replay_buffer
    ├── sampler
    ├── tf
    └── torch

Note: the np, tf, and torch directories contain the algorithm implementations for each framework; in the rendered documentation these directory names are clickable links.
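
Because the packages mirror this layout, the framework listed for an algorithm in the table above also tells you where to import it from. A minimal sketch (these are the standard garage subpackages, but verify the exact class names against your installed version):

# NumPy-based implementations live under garage.np
from garage.np.algos import CEM, CMAES

# TensorFlow implementations live under garage.tf
from garage.tf.algos import PPO, TRPO

# PyTorch implementations live under garage.torch
from garage.torch.algos import SAC, TRPO as PyTorchTRPO  # alias avoids clashing with the TF TRPO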

A simple PyTorch example that imports the TRPO algorithm, along with the policy GaussianMLPPolicy and the value function GaussianMLPValueFunction, is shown below:

import gym
import torch

from garage.envs import GarageEnv, normalize
from garage.torch.algos import TRPO as PyTorch_TRPO
from garage.torch.policies import GaussianMLPPolicy as PyTorch_GMP
from garage.torch.value_functions import GaussianMLPValueFunction

def trpo_garage_pytorch(env_id):
    # env_id names a Gym environment; GaussianMLPPolicy expects a
    # continuous action space (e.g. a classic-control or MuJoCo task).
    env = GarageEnv(normalize(gym.make(env_id)))

    policy = PyTorch_GMP(env.spec,
                         hidden_sizes=[32, 32],
                         hidden_nonlinearity=torch.tanh,
                         output_nonlinearity=None)

    value_function = GaussianMLPValueFunction(env_spec=env.spec,
                                              hidden_sizes=(32, 32),
                                              hidden_nonlinearity=torch.tanh,
                                              output_nonlinearity=None)

    algo = PyTorch_TRPO(
        env_spec=env.spec,
        policy=policy,
        value_function=value_function,
        max_episode_length=100,
        discount=0.99,
        gae_lambda=0.97)

The full code can be found here.

To learn more about implementing new algorithms, see this guide.

Running Experiments

In garage, experiments are run using the experiment launcher wrap_experiment, a Python decorator which can be imported directly from the garage package.

from garage import wrap_experiment

In addition, objects such as the runner, environment, and policy are commonly used when constructing experiments in garage, as in the example below.

import gym
import torch

from garage import wrap_experiment
from garage.envs import GarageEnv, normalize
from garage.experiment import deterministic, LocalRunner
from garage.torch.algos import TRPO as PyTorch_TRPO
from garage.torch.policies import GaussianMLPPolicy as PyTorch_GMP
from garage.torch.value_functions import GaussianMLPValueFunction

@wrap_experiment
def trpo_garage_pytorch(ctxt, env_id, seed):
    # Set the global random seed so the run is reproducible.
    deterministic.set_seed(seed)

    # The runner manages the training loop and experiment snapshots;
    # ctxt carries the snapshot configuration supplied by wrap_experiment.
    runner = LocalRunner(ctxt)

    env = GarageEnv(normalize(gym.make(env_id)))

    policy = PyTorch_GMP(env.spec,
                         hidden_sizes=[32, 32],
                         hidden_nonlinearity=torch.tanh,
                         output_nonlinearity=None)

    value_function = GaussianMLPValueFunction(env_spec=env.spec,
                                              hidden_sizes=(32, 32),
                                              hidden_nonlinearity=torch.tanh,
                                              output_nonlinearity=None)

    algo = PyTorch_TRPO(
        env_spec=env.spec,
        policy=policy,
        value_function=value_function,
        max_episode_length=100,
        discount=0.99,
        gae_lambda=0.97)

    # Wire the algorithm and environment together, then train.
    runner.setup(algo, env)
    runner.train(n_epochs=999,
                 batch_size=1024)
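
A decorated experiment function like the one above is launched simply by calling it with its own arguments; wrap_experiment fills in ctxt with the snapshot configuration. A minimal sketch (the environment name and seed below are illustrative choices, not part of the example above):

# Run the experiment defined above. Any Gym environment with a
# continuous action space works with GaussianMLPPolicy; Pendulum-v0
# is just an illustrative choice.
trpo_garage_pytorch(env_id='Pendulum-v0', seed=1)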

This page will give you more insight into running experiments.

Plotting results

In garage, we use TensorBoard for plotting experiment results.

This guide provides details on how to set up TensorBoard when running experiments in garage.
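
Assuming TensorBoard is installed in your environment, pointing it at the experiment output directory (for example, tensorboard --logdir data/local/experiment) and opening the printed URL in a browser will display the logged metrics for each run.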

Experiment outputs

LocalRunner manages the state of an experiment in garage. It creates, saves, and restores experiment state, known as a snapshot, during a run. A snapshot includes the hyperparameter configuration, training progress, pickled algorithm and environment objects, the TensorBoard event file, and so on.

By default, experiment results are written to the relative directory data/local/experiment next to the garage package. The output directory is generally organized as follows:

└── data
    └── local
        └── experiment
            └── your_experiment_name
                ├── progress.csv
                ├── debug.log
                ├── variant.json
                ├── metadata.json
                ├── launch_archive.tar.xz
                └── events.out.tfevents.xxx
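
For a quick look at training progress without TensorBoard, progress.csv can be read with any CSV tool. A minimal sketch using only the Python standard library (the experiment path below is illustrative):

import csv

# Path to an experiment's progress file; adjust to your experiment name.
path = 'data/local/experiment/your_experiment_name/progress.csv'

with open(path) as f:
    rows = list(csv.DictReader(f))

# Print the metric names logged by the experiment and the last epoch's values.
print(sorted(rows[-1].keys()))
print(rows[-1])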

wrap_experiment can be invoked with arguments to modify the default output directory, change the snapshot mode, control the snapshot gap, and so on. For example, to modify the default output directory and change the snapshot mode from last (only the last iteration is saved) to all (every iteration is saved), we can do this:

@wrap_experiment(log_dir='./your_log_dir', snapshot_mode='all')
def my_experiment(ctxt, seed, lr=0.5):
    ...
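
Snapshots also make it possible to resume an interrupted experiment. A rough sketch using LocalRunner's restore and resume methods (treat the exact call signatures as an assumption to verify against your garage version; the directory below is illustrative):

from garage import wrap_experiment
from garage.experiment import LocalRunner

@wrap_experiment
def resume_my_experiment(ctxt, saved_dir='./your_log_dir'):
    runner = LocalRunner(ctxt)
    # Rebuild the algorithm, environment, and training state from the
    # latest snapshot in saved_dir, then continue training.
    runner.restore(saved_dir)
    runner.resume()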

During an experiment, garage makes extensive use of the logger from Dowel to send output to StdOutput, TextOutput, and/or CsvOutput. For details, you can check this.
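
As an illustration of how Dowel's logger works on its own (a minimal sketch, independent of garage; the keys and values below are made up):

from dowel import logger, tabular, StdOutput, CsvOutput

# Send log output to the console and to a CSV file.
logger.add_output(StdOutput())
logger.add_output(CsvOutput('progress.csv'))

logger.log('Starting a dummy training loop')
for epoch in range(3):
    tabular.record('Epoch', epoch)
    tabular.record('Loss', 1.0 / (epoch + 1))
    logger.log(tabular)
    logger.dump_all()  # flush the queued output to all registered outputs

logger.remove_all()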

Open Source Support

garage has been active in the open-source community since October 2018, contributing to RL research and development. Contributions from the community are more than welcome.

Resources

If you are interested in more in-depth coverage of garage's specific capabilities, you can find many other guides on this website.


This page was authored by Iris Liu (@irisliucy).