Maximize Resource Usage¶
To accelerate running an experiment, we want to maximize the usage of resources (CPUs or GPUs).
Maximize CPU utilization¶
The most compute-intensive part of training a policy is sampling. Garage uses
Worker to abstract the unit that performs rollouts and Sampler to manage
workers. To maximize CPU utilization, we only need to choose a proper Sampler
and Worker when setting up the Trainer.
In the following tests, we will use examples/torch/trpo_pendulum.py as the
example experiment file.
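For reference, the overall structure of that launcher looks roughly like the following. This is a simplified sketch, not the exact file contents; the concrete environment name, network sizes, and hyperparameters may differ from the actual example.

from garage import wrap_experiment
from garage.envs import GymEnv
from garage.experiment.deterministic import set_seed
from garage.sampler import LocalSampler
from garage.torch.algos import TRPO
from garage.torch.policies import GaussianMLPPolicy
from garage.torch.value_functions import GaussianMLPValueFunction
from garage.trainer import Trainer

@wrap_experiment
def trpo_pendulum(ctxt=None, seed=1):
    set_seed(seed)
    env = GymEnv('InvertedDoublePendulum-v2')  # assumed environment; check the actual file
    trainer = Trainer(ctxt)
    policy = GaussianMLPPolicy(env.spec, hidden_sizes=(32, 32))
    value_function = GaussianMLPValueFunction(env_spec=env.spec,
                                              hidden_sizes=(32, 32))
    # The sampler configurations discussed below are constructed here.
    sampler = LocalSampler(agents=policy,
                           envs=env,
                           max_episode_length=env.spec.max_episode_length)
    algo = TRPO(env_spec=env.spec,
                policy=policy,
                value_function=value_function,
                sampler=sampler,
                discount=0.99)
    trainer.setup(algo, env)
    trainer.train(n_epochs=100, batch_size=1024)

trpo_pendulum(seed=1)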
LocalSampler and DefaultWorker¶
First, we use the most basic configuration, LocalSampler and DefaultWorker. This
sampler runs its workers in the same process.
from garage.sampler import LocalSampler, DefaultWorker
...
    sampler = LocalSampler(agents=policy,
                           envs=env,
                           # below are the parameters for constructing the worker factory
                           max_episode_length=env.spec.max_episode_length,
                           worker_class=DefaultWorker)
    algo = TRPO(...
                sampler=sampler,
                ...)
...
With the top command, in my environment (4 cores), we can see that the CPU usage
is about 300%.
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14610 ruofu 20 0 4555092 558680 268692 R 301.0 6.9 6:06.37 python examples
...
Use VecWorker¶
To reduce the overhead in sampling, we use VecWorker to run multiple
environments in one step. To set the level of vectorization (i.e. the number of
environments simulated in one step), just set n_envs in worker_args.
from garage.sampler import LocalSampler, VecWorker
...
    sampler = LocalSampler(agents=policy,
                           envs=env,
                           # below are the parameters for constructing the worker factory
                           max_episode_length=env.spec.max_episode_length,
                           worker_class=VecWorker,
                           worker_args=dict(n_envs=12))
...
And the CPU usage is about 350%.
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11902 ruofu 20 0 4696076 709944 268424 R 351.2 8.7 2:56.95 python examples
...
Use RaySampler¶
Though the CPU usage has increased, sampling still runs in a single process. Using
RaySampler, we can not only parallelize sampling across CPUs, but also deploy
it on clusters (AWS/Azure/GCP, k8s, etc.). Here we show the example running
on the local machine.
from garage.sampler import RaySampler, VecWorker
...
    sampler = RaySampler(agents=policy,
                         envs=env,
                         # below are the parameters for constructing the worker factory
                         max_episode_length=env.spec.max_episode_length,
                         worker_class=VecWorker,
                         worker_args=dict(n_envs=12))
...
From the top command, we get:
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12652 ruofu 20 0 7471988 570096 273596 S 380.7 7.0 7:12.44 python examples
12714 ruofu 20 0 6303456 554708 264884 S 24.3 6.8 0:30.03 ray::SamplerWor
12720 ruofu 20 0 6303516 553376 265004 S 23.9 6.8 0:29.79 ray::SamplerWor
12715 ruofu 20 0 6303456 555516 265112 S 22.9 6.8 0:29.34 ray::SamplerWor
12721 ruofu 20 0 6303516 556660 264656 S 22.9 6.8 0:29.52 ray::SamplerWor
...
We can see that there are 4 ray::SamplerWorker processes running, which are the
parallel workers used for sampling.
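If you want sampling to run on a Ray cluster rather than on the local machine, one approach is to initialize Ray yourself before constructing the sampler. The sketch below is hedged: it assumes that RaySampler reuses an already-initialized Ray instance and that it accepts an n_workers argument; verify both against the garage version you are running.

import ray

# Assumption: RaySampler reuses an existing Ray instance if one is already
# initialized, so connecting to the cluster here routes sampling to it.
ray.init(address='auto')  # join a cluster started with `ray start --head`

    sampler = RaySampler(agents=policy,
                         envs=env,
                         max_episode_length=env.spec.max_episode_length,
                         n_workers=8,  # assumed parameter: number of parallel sampler workers
                         worker_class=VecWorker,
                         worker_args=dict(n_envs=12))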
Use GPU¶
For algorithms implemented with PyTorch or TensorFlow, we can use a GPU to train policies.
PyTorch¶
In Garage, PyTorch uses CPU mode by default. To enable GPU mode, call the
set_gpu_mode() function after declaring the algorithm in the experiment
function. For example:
import torch
from garage.torch import set_gpu_mode

@wrap_experiment
def trpo_pendulum(ctxt=None, seed=1):
    ...
    algo = TRPO(env_spec=env.spec,
                policy=policy,
                value_function=value_function,
                sampler=sampler,
                discount=0.99,
                center_adv=False)
    # enable GPU
    if torch.cuda.is_available():
        set_gpu_mode(True)
    else:
        set_gpu_mode(False)
    algo.to()
    trainer.setup(algo, env)
    trainer.train(n_epochs=1000, batch_size=1024)
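To double-check which device training actually runs on, you can inspect the policy parameters after algo.to(). The global_device helper below is an assumption about the garage.torch API and may differ between garage versions:

from garage.torch import global_device

    # Assumed helper: global_device() returns the device selected by set_gpu_mode().
    print('global device:', global_device())
    # The policy is a torch.nn.Module, so its parameters report their device directly.
    print('policy parameters on:', next(policy.parameters()).device)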
TensorFlow¶
Unlike PyTorch, TensorFlow will use a GPU by default if one is available. To disable it, you can execute the following command before running the experiment launcher.
export CUDA_VISIBLE_DEVICES=-1 # CPU only
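Equivalently, the variable can be set from inside the launcher script, as long as this happens before TensorFlow initializes the GPU (a minimal sketch):

import os

# Must be set before TensorFlow first touches the GPU (in practice, before
# the first TensorFlow import) to have any effect.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # CPU only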
You can also use tf.compat.v1.ConfigProto to configure the CPUs and GPUs used for
training. The following example shows how to use the CPU only. More configuration
options are documented in the TensorFlow docs.
@wrap_experiment
def trpo_cartpole(ctxt=None):
    sess_config = tf.compat.v1.ConfigProto(
        # the maximum number of GPUs to use is 0 (i.e. use CPU only)
        device_count={'GPU': 0})
    sess = tf.compat.v1.Session(config=sess_config)
    with TFTrainer(ctxt, sess=sess) as trainer:
        ...
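Conversely, when you do want the GPU, a common session configuration is to let TensorFlow allocate GPU memory on demand instead of reserving it all at startup. This sketch uses standard tf.compat.v1 options rather than anything garage-specific:

@wrap_experiment
def trpo_cartpole(ctxt=None):
    sess_config = tf.compat.v1.ConfigProto(
        # Allocate GPU memory as needed instead of claiming it all up front.
        gpu_options=tf.compat.v1.GPUOptions(allow_growth=True))
    sess = tf.compat.v1.Session(config=sess_config)
    with TFTrainer(ctxt, sess=sess) as trainer:
        ...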
You can find more information about using GPUs on the experiments page.
This page was authored by Ruofu Wang (@yeukfu).