State Representation Learning

What is S-RL

State representation learning (SRL; Lesort et al., 2018) aims at learning compact representations from raw observations (e.g., learning a position (x, y) directly from raw pixels) without explicit supervision.

Most of these algorithms are designed to learn abstract features that characterize data. The goal is to use that representation to solve a task with RL.

The idea is that a low-dimensional representation should keep only the useful information and reduce the search space, thus helping to address two main challenges of RL:

  1. Sample inefficiency
  2. Instability

Moreover, a state representation learned for a particular task may be transferred to related tasks and therefore speed up learning in multiple task settings.

Using RL notation, SRL corresponds to learning a transformation $\phi$ (in practice, the learned transformation is a neural network) from the observation space $O$ to the state space $S$. Then, a policy $\pi$, which takes a state $s_t \in S$ as input and outputs an action $a_t$, is learned to solve the task:

$$ o_t \xrightarrow[SRL]{\phi} s_t \xrightarrow[RL]{\pi} a_t $$
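A minimal sketch of this two-stage pipeline in PyTorch (the encoder and policy architectures below are illustrative placeholders, not the toolbox's actual models):

import torch
import torch.nn as nn

# Hypothetical SRL encoder phi: raw pixels -> low-dimensional state.
# In practice it is trained beforehand with an SRL loss and then frozen.
class Encoder(nn.Module):
    def __init__(self, state_dim=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.LazyLinear(state_dim)  # e.g. an (x, y) position

    def forward(self, obs):
        return self.fc(self.conv(obs))

# Hypothetical policy pi: state -> action logits, trained with any RL algorithm.
class Policy(nn.Module):
    def __init__(self, state_dim=2, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, state):
        return self.net(state)

phi, pi = Encoder(), Policy()
obs = torch.rand(1, 3, 224, 224)      # o_t: raw observation
with torch.no_grad():
    state = phi(obs)                  # s_t = phi(o_t), the SRL step
action = pi(state).argmax(dim=-1)     # a_t = pi(s_t), the RL step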

Environmental Details

The simulated environments run at 250 FPS on an 8-core machine, which makes it possible to train an RL agent for 1 million steps in only 1 h (or to generate 20k samples in less than 2 min).

A ground truth state is defined in each scenario:

  • The absolute robot position in static scenarios
  • The relative position (w.r.t. the target) in moving goal scenarios

Images are 224x224 pixels; the navigation datasets use 4 discrete actions (right, left, forward, backward), and the robot arm environments use one additional (down) action.

All CNN policies normalize the input image by dividing it by 255.

Observations are not stacked.

When learning from an SRL model, the states are normalized using a running mean and standard deviation.
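A sketch of these two preprocessing steps (variable and class names are illustrative, not the toolbox's internals):

import numpy as np

# Pixel observations: scale to [0, 1] before feeding the CNN policy
def preprocess_image(obs_uint8):
    return obs_uint8.astype(np.float32) / 255.0

# SRL states: normalize with a running mean and standard deviation
class RunningNorm:
    def __init__(self, dim, eps=1e-8):
        self.mean, self.var, self.count = np.zeros(dim), np.ones(dim), eps

    def update(self, x):
        # Combine the batch statistics with the running statistics
        batch_mean, batch_var, n = x.mean(axis=0), x.var(axis=0), x.shape[0]
        delta, total = batch_mean - self.mean, self.count + n
        self.mean = self.mean + delta * n / total
        self.var = (self.var * self.count + batch_var * n
                    + delta ** 2 * self.count * n / total) / total
        self.count = total

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)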

The reported reinforcement learning metric is the mean reward over 5 policies, each independently trained with the same RL algorithm but a different random seed.
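As a small illustration of how this score can be aggregated (dummy numbers; not the toolbox's logging code):

import numpy as np

# Episode returns of 5 policies trained with the same algorithm but different seeds
# (dummy data for illustration only)
returns_per_seed = [np.random.uniform(0, 250, size=100) for _ in range(5)]
mean_per_seed = [r.mean() for r in returns_per_seed]
print("reward: %.1f +/- %.1f" % (np.mean(mean_per_seed), np.std(mean_per_seed)))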

Evaluation of Learned State Representations

  1. Qualitative Evaluation: the perceived utility of the state representation, assessed using visualization tools
  2. Metrics:
    1. KNN-MSE: A low KNN-MSE means that a neighbor in the ground truth is still a neighbor in the learned representation, and thus local coherence is preserved (see the sketch after this list)
    2. Correlation:
      1. GTC (Ground Truth Correlation): the correlation between the learned state dimensions and the ground-truth state dimensions
      2. GTC mean: the mean of the GTC values, used as a single summary score
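A minimal sketch of a KNN-MSE-style computation, assuming neighbors are found in the learned state space and errors are measured in the ground-truth space (treat the value of k and the scikit-learn usage as assumptions, not the toolbox's exact implementation):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_mse(states, gt, k=5):
    # states: (n, d_learned) learned states, gt: (n, d_gt) ground-truth states
    # k + 1 neighbors because each point is its own nearest neighbor
    nn = NearestNeighbors(n_neighbors=k + 1).fit(states)
    _, idx = nn.kneighbors(states)
    neighbors_gt = gt[idx[:, 1:]]                      # (n, k, d_gt)
    errors = ((neighbors_gt - gt[:, None, :]) ** 2).sum(axis=-1)
    return errors.mean()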

RL and ES Algorithms

  • A2C: A synchronous, deterministic variant of Asynchronous Advantage Actor Critic (A3C).
  • ACER: Sample Efficient Actor-Critic with Experience Replay
  • ACKTR: Actor Critic using Kronecker-Factored Trust Region
  • ARS: Augmented Random Search
  • CMA-ES: Covariance Matrix Adaptation Evolution Strategy
  • DDPG: Deep Deterministic Policy Gradients
  • DeepQ: DQN and variants (Double, Dueling, prioritized experience replay)
  • PPO1: Proximal Policy Optimization (MPI Implementation)
  • PPO2: Proximal Policy Optimization (GPU Implementation)
  • SAC: Soft Actor Critic
  • TRPO: Trust Region Policy Optimization (MPI Implementation)

Training

Before starting an RL experiment, make sure a visdom server is running, unless you deactivate visualization by adding --no-vis.

Launch visdom server:

python -m visdom.server

Train an agent:

python -m rl_baselines.train --algo rl_algo --env env1 --log-dir logs/ --srl-model raw_pixels --num-timesteps 10000

Continuous Actions

Continuous actions have been implemented for DDPG, PPO2, ARS, CMA-ES, SAC and the random agent.

To use continuous actions in the position space:

python -m rl_baselines.train --algo ppo2 --log-dir logs/ -c

To use continuous actions in the joint space:

python -m rl_baselines.train --algo ppo2 --log-dir logs/ -c -joints

Multiple Trainings

Train an agent multiple times on multiple environments, using different methods:

python -m rl_baselines.pipeline --algo ppo2 --log-dir logs/ --env env1 env2 [...] --srl-model model1 model2 [...]

python -m rl_baselines.pipeline --algo ppo2 --log-dir logs/ --srl-model vae ground_truth --random-target --num-cpu 4 --num-iteration 15

Enjoy the Trained Agent

To load a trained agent and see the result:

python -m replay.enjoy_baselines --log-dir path/to/trained/agent/ --render

Add Your Own RL Algorithm

  1. Create a class that inherits from rl_baselines.base_classes.BaseRLObject and implements your algorithm
  2. Add your class to the registered_rl dictionary in rl_baselines/registry.py, using the format NAME: (CLASS, ALGO_TYPE, [ACTION_TYPE]) (see the sketch after this list)
  3. Now you can call your algorithm using --algo NAME with train.py or pipeline.py
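For instance, the skeleton and registry entry might look like this (the AlgoType and ActionType constants are written as assumptions about the registry; check rl_baselines/registry.py for the exact names):

from rl_baselines.base_classes import BaseRLObject

class MyAlgo(BaseRLObject):
    """Your algorithm, implementing the interface defined by BaseRLObject."""
    pass  # fill in the methods required by BaseRLObject

# In rl_baselines/registry.py, following the documented format
# NAME: (CLASS, ALGO_TYPE, [ACTION_TYPE]):
registered_rl["MY_ALGO"] = (MyAlgo, AlgoType.REINFORCEMENT_LEARNING, [ActionType.DISCRETE])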

The S-RL Toolbox supports hyperparameter search for the implemented RL algorithms, using Hyperband or Hyperopt:

python -m rl_baselines.hyperparam_search --optimizer hyperband --algo ppo2 --env MobileRobotGymEnv-v0 --srl-model ground_truth

Available Environments

  • Kuka arm
  • Mobile robot
  • Racing car
  • Baxter
  • Robobo
  • Omnidirectional Robot

To test the environment:

python -m environments.dataset_generator --no-record-data --display

To record data (i.e., generate a dataset) from the environment for training an SRL model, using random actions:

python -m environments.dataset_generator --num-cpu 4 --name folder_name

Add a Custom Environment

  1. Create a class that inherits from environments.srl_env.SRLGymEnv and implements your environment
  2. Add the code shown after this list to the same file as the class declaration
  3. Add your class to the registered_env dictionary in environments/registry.py, using this format NAME: (CLASS, SUPER_CLASS, PLOT_TYPE, THREAD_TYPE)
  4. Add the name of the environment to config/srl_models.yaml, with the location of the saved model for each SRL model (can point to a dummy location, but must be defined)
  5. Now you can call your environment using --env NAME with train.py, pipeline.py or dataset_generator.py
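The code referred to in step 2 is a small module-level helper that lets the registry retrieve your class by name; a minimal sketch (the exact helper follows the upstream convention and should be checked against the existing environments):

def getGlobals():
    """
    :return: (dict) the module's global symbols, used by the environment registry
    """
    return globals()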

S-RL Models

  1. Look at the SRL Repo to learn how to train a state representation model
  2. Then edit config/srl_models.yaml
  3. Set the right path to use the learned state representations

To train a reinforcement learning agent on a specific SRL model:

python -m rl_baselines.train --algo ppo2 --log-dir logs/ --srl-model model_name

Available SRL models

The available state representation models are:

  • ground_truth: Hand engineered features (e.g., robot position + target position for mobile robot env)
  • raw_pixels: Learning a policy in an end-to-end manner, directly from pixels to actions.
  • supervised: A model trained with Ground Truth states as targets in a supervised setting.
  • autoencoder: an autoencoder from the raw pixels
  • vae: a variational autoencoder from the raw pixels
  • inverse: an inverse dynamics model
  • forward: a forward dynamics model
  • srl_combination: a model combining several losses (e.g. vae + forward + inverse…) for SRL
  • pca: pca applied to the raw pixels
  • robotic_priors: a robotic priors model (see “Learning State Representations with Robotic Priors”)
  • multi_view_srl: an SRL model using views from multiple cameras as input, with any of the above losses (e.g., triplet loss)
  • joints: the arm’s joint angles (Kuka environments only)
  • joints_position: the arm’s x, y, z position and joint angles (Kuka environments only)

Add a Custom SRL Model

If your SRL model is a characteristic of the environment (position, angles, …):

  1. Add the name of the model to the registered_srl dictionary in state_representation/registry.py
  2. Modify getSRLState(self, observation) in the environments so that it returns the data you want for this model (see the sketch after this list).
  3. Now you can call your SRL model using --srl-model NAME with train.py or pipeline.py
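A minimal sketch of step 2, assuming a mobile-robot style environment that stores its own position and the target position (the attribute names are illustrative):

import numpy as np

def getSRLState(self, observation):
    # Hypothetical example: return the robot position relative to the target
    # (self.robot_pos and self.target_pos are illustrative attribute names)
    return np.array(self.robot_pos) - np.array(self.target_pos)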

Otherwise, for SRL models that are external to the environment (supervised, autoencoder, …):

  1. Add your SRL model, which inherits from SRLBaseClass, to the function state_representation.models.loadSRLModel (see the skeleton after this list)
  2. Add the name of the model to the registered_srl dictionary in state_representation/registry.py
  3. Add the name of the model to config/srl_models.yaml, with the location of the saved model for each environment (it can point to a dummy location, but it must be defined)
  4. Now you can call your SRL model using --srl-model NAME with train.py or pipeline.py
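A heavily hedged skeleton for step 1 (the import path and the load/getState hooks are assumptions about the SRLBaseClass interface; check state_representation/models.py for the actual one):

import torch
from state_representation.models import SRLBaseClass  # assumed location of SRLBaseClass

class MyExternalSRLModel(SRLBaseClass):
    """Wraps a pretrained network that maps raw observations to learned states."""

    def load(self, path):
        # Assumed hook: load the weights saved at the path from config/srl_models.yaml
        self.model = torch.load(path)
        self.model.eval()

    def getState(self, observation):
        # Assumed hook: map one raw observation to its state representation
        with torch.no_grad():
            return self.model(observation)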

SRL Zoo

A collection of State Representation Learning (SRL) methods for Reinforcement Learning, written using PyTorch.

Available Methods

  • SRL with Robotic Priors + extensions (stereo-vision, additional priors)
  • Denoising Autoencoder (DAE)
  • Variational Autoencoder (VAE) and beta-VAE
  • PCA
  • Supervised Learning
  • Forward, Inverse Models
  • Triplet Network (for stereo-vision only)
  • Reward loss
  • Combination and stacking of methods
  • Random Features

Learning a State Representation

To learn a state representation, you need to enforce constraints on the representation using one or more losses (a generic sketch of the inverse and forward losses is given after the list below).

All losses are defined in losses/losses.py. The available losses are:

  • autoencoder: reconstruction loss, using current and next observation
  • denoising autoencoder (dae): same as the autoencoder, except that the model reconstructs the input from a noisy observation containing a random zero-pixel mask
  • vae: (beta-)VAE loss (reconstruction + Kullback-Leibler divergence loss)
  • inverse: predict the action given current and next state
  • forward: predict the next state given current state and taken action
  • reward: predict the reward (positive or not) given current and next state
  • priors: robotic priors losses (see “Learning State Representations with Robotic Priors”)
  • triplet: triplet loss for multi-cam setting (see Multiple Cameras section)
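As an illustration of how such constraints translate into code, here is a generic PyTorch sketch of the inverse and forward losses (placeholder networks and sizes; not the repository's exact implementation):

import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, n_actions = 3, 4  # placeholder sizes

# Inverse model: predict the (discrete) action from (s_t, s_{t+1})
inverse_net = nn.Linear(2 * state_dim, n_actions)
# Forward model: predict s_{t+1} from s_t and a one-hot action
forward_net = nn.Linear(state_dim + n_actions, state_dim)

def inverse_loss(s_t, s_next, action):
    logits = inverse_net(torch.cat([s_t, s_next], dim=-1))
    return F.cross_entropy(logits, action)

def forward_loss(s_t, s_next, action):
    a_onehot = F.one_hot(action, n_actions).float()
    pred_next = forward_net(torch.cat([s_t, a_onehot], dim=-1))
    return F.mse_loss(pred_next, s_next)

# Example usage with random data (the states would come from the SRL encoder phi)
s_t, s_next = torch.randn(8, state_dim), torch.randn(8, state_dim)
action = torch.randint(0, n_actions, (8,))
total_loss = inverse_loss(s_t, s_next, action) + forward_loss(s_t, s_next, action)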

References

  1. Raffin, A., et al. (2018). S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning.
  2. Lesort, T., et al. (2018). State Representation Learning for Control: An Overview.
