garage.tf.policies.task_embedding_policy

Policy class for Task Embedding envs.

class TaskEmbeddingPolicy

Bases: garage.tf.policies.policy.Policy


Base class for Task Embedding policies in TensorFlow.

This policy needs a task id in addition to observation to sample an action.
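The augmented observation is simply the environment observation concatenated with the one-hot task id. A minimal numpy sketch of assembling such an input (the dimensions O and N below are placeholder values, not part of this API):

    import numpy as np

    O, N = 6, 3                        # placeholder observation dim and task count
    obs = np.random.random(O)          # environment observation, shape (O, )
    task_onehot = np.eye(N)[1]         # one-hot id for task 1, shape (N, )
    aug_obs = np.concatenate([obs, task_onehot])  # augmented input, shape (O+N, )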

encoder

Encoder.

Type: garage.tf.embeddings.encoder.Encoder
latent_space

Space of the latent embedding.

Type: akro.Box
task_space

One-hot space of task id.

Type: akro.Box
augmented_observation_space

Concatenated observation space and one-hot task id.

Type: akro.Box
encoder_distribution

Encoder distribution.

Type: tfp.distributions.MultivariateNormalDiag
state_info_specs

State info specification.

Returns: keys and shapes for the information related to the module's state when taking an action.
Return type: List[str]
state_info_keys

State info keys.

Returns: keys for the information related to the module's state when taking an input.
Return type: List[str]
name

Name of policy.

Returns: Name of the policy.
Return type: str
env_spec

Policy environment specification.

Returns: Environment specification.
Return type: garage.EnvSpec
observation_space

Observation space.

Returns: The observation space of the environment.
Return type: akro.Space
action_space

Action space.

Returns: The action space of the environment.
Return type: akro.Space
get_latent(self, task_id)

Get embedded task id in latent space.

Parameters: task_id (np.ndarray) – One-hot task id, with shape \((N, )\). N is the number of tasks.
Returns:
An embedding sampled from the embedding distribution, with shape \((Z, )\). Z is the dimension of the latent embedding.

dict: Embedding distribution information.

Return type: np.ndarray
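A hedged usage sketch, assuming `policy` is an already-constructed instance of a concrete subclass (e.g. garage.tf.policies.GaussianMLPTaskEmbeddingPolicy) whose task space has N = 3 tasks:

    import numpy as np

    task_id = np.eye(3)[0]             # one-hot id for the first of 3 tasks
    # `policy` is assumed constructed elsewhere; this is not a complete program.
    latent, latent_info = policy.get_latent(task_id)   # latent has shape (Z, )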
get_action(self, observation)

Get action sampled from the policy.

Parameters: observation (np.ndarray) – Augmented observation from the environment, with shape \((O+N, )\). O is the dimension of observation, N is the number of tasks.
Returns:
Action sampled from the policy, with shape \((A, )\). A is the dimension of action.

dict: Action distribution information.

Return type: np.ndarray
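For illustration, a sketch of a single sampling step, again assuming a constructed `policy` and placeholder dimensions O = 6, N = 3:

    import numpy as np

    aug_obs = np.concatenate([np.random.random(6), np.eye(3)[2]])  # shape (O+N, )
    # `policy` is assumed constructed elsewhere.
    action, agent_info = policy.get_action(aug_obs)    # action has shape (A, )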
get_actions(self, observations)

Get actions sampled from the policy.

Parameters: observations (np.ndarray) – Augmented observations from the environment, with shape \((T, O+N)\). T is the number of environment steps, O is the dimension of observation, N is the number of tasks.
Returns:
Actions sampled from the policy, with shape \((T, A)\). T is the number of environment steps, A is the dimension of action.

dict: Action distribution information.

Return type: np.ndarray
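A batched variant of the sketch above, with T = 4 environment steps and a (possibly different) one-hot task id per step; `policy` is again assumed constructed elsewhere:

    import numpy as np

    T, O, N = 4, 6, 3                  # placeholder batch, obs, and task dims
    obs = np.random.random((T, O))
    task_onehots = np.eye(N)[np.random.randint(N, size=T)]  # one one-hot row per step
    actions, agent_infos = policy.get_actions(np.hstack([obs, task_onehots]))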
get_action_given_task(self, observation, task_id)

Sample an action given observation and task id.

Parameters:
  • observation (np.ndarray) – Observation from the environment, with shape \((O, )\). O is the dimension of the observation.
  • task_id (np.ndarray) – One-hot task id, with shape \((N, )\). N is the number of tasks.
Returns:
Action sampled from the policy, with shape \((A, )\). A is the dimension of action.

dict: Action distribution information.

Return type: np.ndarray
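The same sampling step without manual concatenation, passing the raw observation and one-hot task id separately (a sketch under the same assumptions as above):

    import numpy as np

    obs = np.random.random(6)          # raw, un-augmented observation, shape (O, )
    task_id = np.eye(3)[0]             # one-hot task id, shape (N, )
    # `policy` is assumed constructed elsewhere.
    action, agent_info = policy.get_action_given_task(obs, task_id)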

get_actions_given_tasks(self, observations, task_ids)

Sample a batch of actions given observations and task ids.

Parameters:
  • observations (np.ndarray) – Observations from the environment, with shape \((T, O)\). T is the number of environment steps, O is the dimension of observation.
  • task_ids (np.ndarray) – One-hot task ids, with shape \((T, N)\). T is the number of environment steps, N is the number of tasks.
Returns:
Actions sampled from the policy, with shape \((T, A)\). T is the number of environment steps, A is the dimension of action.

dict: Action distribution information.

Return type: np.ndarray
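A batched sketch under the same assumptions:

    import numpy as np

    T, O, N = 4, 6, 3                  # placeholder dims
    obs = np.random.random((T, O))
    task_ids = np.eye(N)[np.random.randint(N, size=T)]   # shape (T, N)
    # `policy` is assumed constructed elsewhere.
    actions, agent_infos = policy.get_actions_given_tasks(obs, task_ids)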

get_action_given_latent(self, observation, latent)

Sample an action given observation and latent.

Parameters:
  • observation (np.ndarray) – Observation from the environment, with shape \((O, )\). O is the dimension of observation.
  • latent (np.ndarray) – Latent, with shape \((Z, )\). Z is the dimension of latent embedding.
Returns:
Action sampled from the policy, with shape \((A, )\). A is the dimension of action.

dict: Action distribution information.

Return type: np.ndarray
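A sketch that first embeds the task id and then conditions on the resulting latent, under the same assumptions:

    import numpy as np

    obs = np.random.random(6)          # shape (O, )
    # `policy` is assumed constructed elsewhere.
    latent, _ = policy.get_latent(np.eye(3)[1])   # embed the task id, shape (Z, )
    action, agent_info = policy.get_action_given_latent(obs, latent)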

get_actions_given_latents(self, observations, latents)

Sample a batch of actions given observations and latents.

Parameters:
  • observations (np.ndarray) – Observations from the environment, with shape \((T, O)\). T is the number of environment steps, O is the dimension of observation.
  • latents (np.ndarray) – Latents, with shape \((T, Z)\). T is the number of environment steps, Z is the dimension of latent embedding.
Returns:
Actions sampled from the policy, with shape \((T, A)\). T is the number of environment steps, A is the dimension of action.

dict: Action distribution information.

Return type: np.ndarray
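A batched sketch; the latents here are random placeholders, whereas in practice they would be sampled from the encoder distribution:

    import numpy as np

    T, O, Z = 4, 6, 2                  # placeholder batch, obs, and latent dims
    obs = np.random.random((T, O))
    latents = np.random.random((T, Z)) # placeholder; normally produced by the encoder
    # `policy` is assumed constructed elsewhere.
    actions, agent_infos = policy.get_actions_given_latents(obs, latents)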

split_augmented_observation(self, collated)

Split an augmented observation into the environment observation and the one-hot task id.

Parameters: collated (np.ndarray) – Environment observation concatenated with task one-hot, with shape \((O+N, )\). O is the dimension of observation, N is the number of tasks.
Returns:
Vanilla environment observation, with shape \((O, )\). O is the dimension of observation.

np.ndarray: Task one-hot, with shape \((N, )\). N is the number of tasks.

Return type: np.ndarray
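A sketch of the inverse of the concatenation shown at the top of this page, under the same assumptions:

    import numpy as np

    collated = np.concatenate([np.random.random(6), np.eye(3)[0]])  # shape (O+N, )
    # `policy` is assumed constructed elsewhere.
    obs, task_onehot = policy.split_augmented_observation(collated)
    # obs has shape (O, ); task_onehot has shape (N, )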
reset(self, do_resets=None)

Reset the policy.

This is effective only for recurrent policies.

do_resets is an array of booleans indicating which internal states should be reset. The length of do_resets should be equal to the number of parallel inputs, i.e. the batch size.

Parameters: do_resets (numpy.ndarray) – Boolean array indicating which states should be reset.
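A sketch for a vectorized setting with two parallel inputs; for non-recurrent policies the call has no effect:

    import numpy as np

    # Reset internal state (if any) for the first of two parallel inputs;
    # len(do_resets) must equal the batch size. `policy` is assumed constructed.
    policy.reset(do_resets=np.array([True, False]))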