State Representation Learning

What is S-RL

State representation learning (SRL; Lesort et al., 2018) aims at learning compact representations from raw observations (e.g., learning a position (x, y) directly from raw pixels) without explicit supervision.

Most of these algorithms are designed to learn abstract features that characterize data. The goal is to use that representation to solve a task with RL.

The idea is that a low-dimensional representation should keep only the useful information and reduce the search space, thus helping to address two main challenges of RL:

  1. Sample inefficiency
  2. Instability

Moreover, a state representation learned for a particular task may be transferred to related tasks and therefore speed up learning in multiple task settings.

Using RL notation, SRL corresponds to learning a transformation $\phi$ (in practice, the learned transformation is a neural network) from the observation space $O$ to the state space $S$. Then, a policy $\pi$, which takes a state $s_t \in S$ as input and outputs an action $a_t$, is learned to solve the task:

$$ o_t \xrightarrow[SRL]{\phi} s_t \xrightarrow[RL]{\pi} a_t $$
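A minimal sketch of this two-stage pipeline in PyTorch (the encoder and policy architectures below are illustrative placeholders, not the toolbox's actual models):

import torch
import torch.nn as nn

# Hypothetical SRL encoder phi: raw pixels -> low-dimensional state.
# In practice it is trained beforehand with an SRL loss and then frozen.
class Encoder(nn.Module):
    def __init__(self, state_dim=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.LazyLinear(state_dim)  # e.g. an (x, y) position

    def forward(self, obs):
        return self.fc(self.conv(obs))

# Hypothetical policy pi: state -> action logits, trained with any RL algorithm.
class Policy(nn.Module):
    def __init__(self, state_dim=2, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, state):
        return self.net(state)

phi, pi = Encoder(), Policy()
obs = torch.rand(1, 3, 224, 224)      # o_t: raw observation
with torch.no_grad():
    state = phi(obs)                  # s_t = phi(o_t), the SRL step
action = pi(state).argmax(dim=-1)     # a_t = pi(s_t), the RL step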

Environmental Details

The simulated environments run at 250 FPS on an 8-core machine, which makes it possible to train an RL agent for 1 million steps in only 1 h (or to generate 20k samples in less than 2 min).

A ground truth state is defined in each scenario:

  • The absolute robot position in static scenarios
  • The relative position (w.r.t. the target) in moving goal scenarios

Images are 224x224 pixels; the navigation datasets use 4 discrete actions (right, left, forward, backward), and the robot arm environments use one additional (down) action.

All CNN policies normalize the input image by dividing it by 255.

Observations are not stacked.

When learning from an SRL model, the states are normalized using a running mean and standard deviation.
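A sketch of these two preprocessing steps (variable and class names are illustrative, not the toolbox's internals):

import numpy as np

# Pixel observations: scale to [0, 1] before feeding the CNN policy
def preprocess_image(obs_uint8):
    return obs_uint8.astype(np.float32) / 255.0

# SRL states: normalize with a running mean and standard deviation
class RunningNorm:
    def __init__(self, dim, eps=1e-8):
        self.mean, self.var, self.count = np.zeros(dim), np.ones(dim), eps

    def update(self, x):
        # Combine the batch statistics with the running statistics
        batch_mean, batch_var, n = x.mean(axis=0), x.var(axis=0), x.shape[0]
        delta, total = batch_mean - self.mean, self.count + n
        self.mean = self.mean + delta * n / total
        self.var = (self.var * self.count + batch_var * n
                    + delta ** 2 * self.count * n / total) / total
        self.count = total

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)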

The reported reinforcement learning metric is the mean reward over 5 policies, each independently trained with the same RL algorithm but a different random seed.
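As a small illustration of how this score can be aggregated (dummy numbers; not the toolbox's logging code):

import numpy as np

# Episode returns of 5 policies trained with the same algorithm but different seeds
# (dummy data for illustration only)
returns_per_seed = [np.random.uniform(0, 250, size=100) for _ in range(5)]
mean_per_seed = [r.mean() for r in returns_per_seed]
print("reward: %.1f +/- %.1f" % (np.mean(mean_per_seed), np.std(mean_per_seed)))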

Evaluation of Learned State Representations

  1. Qualitative Evaluation: the perceived utility of the state representation, assessed using visualization tools
  2. Metrics:
    1. KNN-MSE: A low KNN-MSE means that a neighbor in the ground truth is still a neighbor in the learned representation, and thus local coherence is preserved (see the sketch after this list)
    2. Correlation:
      1. GTC (Ground Truth Correlation): the correlation between the learned state dimensions and the ground-truth state dimensions
      2. GTC mean: the mean of the GTC values, used as a single summary score
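A minimal sketch of a KNN-MSE-style computation, assuming neighbors are found in the learned state space and errors are measured in the ground-truth space (treat the value of k and the scikit-learn usage as assumptions, not the toolbox's exact implementation):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_mse(states, gt, k=5):
    # states: (n, d_learned) learned states, gt: (n, d_gt) ground-truth states
    # k + 1 neighbors because each point is its own nearest neighbor
    nn = NearestNeighbors(n_neighbors=k + 1).fit(states)
    _, idx = nn.kneighbors(states)
    neighbors_gt = gt[idx[:, 1:]]                      # (n, k, d_gt)
    errors = ((neighbors_gt - gt[:, None, :]) ** 2).sum(axis=-1)
    return errors.mean()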

RL and ES Algorithms

  • A2C: A synchronous, deterministic variant of Asynchronous Advantage Actor Critic (A3C).
  • ACER: Sample Efficient Actor-Critic with Experience Replay
  • ACKTR: Actor Critic using Kronecker-Factored Trust Region
  • ARS: Augmented Random Search
  • CMA-ES: Covariance Matrix Adaptation Evolution Strategy
  • DDPG: Deep Deterministic Policy Gradients
  • DeepQ: DQN and variants (Double, Dueling, prioritized experience replay)
  • PPO1: Proximal Policy Optimization (MPI Implementation)
  • PPO2: Proximal Policy Optimization (GPU Implementation)
  • SAC: Soft Actor Critic
  • TRPO: Trust Region Policy Optimization (MPI Implementation)

Training

Before starting an RL experiment, make sure a visdom server is running, unless you deactivate visualization by adding --no-vis.

Launch visdom server:

python -m visdom.server

Train an agent:

python -m rl_baselines.train --algo rl_algo --env env1 --log-dir logs/ --srl-model raw_pixels --num-timesteps 10000

Continuous Actions

Continuous actions have been implemented for DDPG, PPO2, ARS, CMA-ES, SAC and the random agent.

To use continuous actions in the position space:

python -m rl_baselines.train --algo ppo2 --log-dir logs/ -c

To use continuous actions in the joint space:

python -m rl_baselines.train --algo ppo2 --log-dir logs/ -c -joints

Multiple Trainings

Train an agent multiple times on multiple environments, using different methods:

python -m rl_baselines.pipeline --algo ppo2 --log-dir logs/ --env env1 env2 [...] --srl-model model1 model2 [...]

python -m rl_baselines.pipeline --algo ppo2 --log-dir logs/ --srl-model vae ground_truth --random-target --num-cpu 4 --num-iteration 15

Enjoy the Trained Agent

To load a trained agent and see the result:

python -m replay.enjoy_baselines --log-dir path/to/trained/agent/ --render

Add Your Own RL Algorithm

  1. Create a class that inherits from rl_baselines.base_classes.BaseRLObject and implements your algorithm
  2. Add your class to the registered_rl dictionary in rl_baselines/registry.py, using the format NAME: (CLASS, ALGO_TYPE, [ACTION_TYPE]) (see the sketch after this list)
  3. Now you can call your algorithm using --algo NAME with train.py or pipeline.py
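For instance, the skeleton and registry entry might look like this (the AlgoType and ActionType constants are written as assumptions about the registry; check rl_baselines/registry.py for the exact names):

from rl_baselines.base_classes import BaseRLObject

class MyAlgo(BaseRLObject):
    """Your algorithm, implementing the interface defined by BaseRLObject."""
    pass  # fill in the methods required by BaseRLObject

# In rl_baselines/registry.py, following the documented format
# NAME: (CLASS, ALGO_TYPE, [ACTION_TYPE]):
registered_rl["MY_ALGO"] = (MyAlgo, AlgoType.REINFORCEMENT_LEARNING, [ActionType.DISCRETE])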

The S-RL Toolbox supports hyperparameter search for the implemented RL algorithms, using Hyperband or Hyperopt:

python -m rl_baselines.hyperparam_search --optimizer hyperband --algo ppo2 --env MobileRobotGymEnv-v0 --srl-model ground_truth

Available Environments

  • Kuka arm
  • Mobile robot
  • Racing car
  • Baxter
  • Robobo
  • Omnidirectional Robot

To test the environment:

python -m environments.dataset_generator --no-record-data --display

To record data (i.e., generate a dataset) from the environment for training an SRL model, using random actions:

python -m environments.dataset_generator --num-cpu 4 --name folder_name

Add a Custom Environment

  1. Create a class that inherits from environments.srl_env.SRLGymEnv and implements your environment
  2. Add the code shown after this list to the same file as the class declaration
  3. Add your class to the registered_env dictionary in environments/registry.py, using this format NAME: (CLASS, SUPER_CLASS, PLOT_TYPE, THREAD_TYPE)
  4. Add the name of the environment to config/srl_models.yaml, with the location of the saved model for each SRL model (can point to a dummy location, but must be defined)
  5. Now you can call your environment using --env NAME with train.py, pipeline.py or dataset_generator.py
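The code referred to in step 2 is a small module-level helper that lets the registry retrieve your class by name; a minimal sketch (the exact helper follows the upstream convention and should be checked against the existing environments):

def getGlobals():
    """
    :return: (dict) the module's global symbols, used by the environment registry
    """
    return globals()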

S-RL Models

  1. Look at the SRL Repo to learn how to train a state representation model
  2. Then edit config/srl_models.yaml
  3. Set the right path to use the learned state representations

To train a reinforcement learning agent on a specific SRL model:

python -m rl_baselines.train --algo ppo2 --log-dir logs/ --srl-model model_name

Available SRL models

The available state representation models are:

  • ground_truth: Hand engineered features (e.g., robot position + target position for mobile robot env)
  • raw_pixels: Learning a policy in an end-to-end manner, directly from pixels to actions.
  • supervised: A model trained with Ground Truth states as targets in a supervised setting.
  • autoencoder: an autoencoder from the raw pixels
  • vae: a variational autoencoder from the raw pixels
  • inverse: an inverse dynamics model
  • forward: a forward dynamics model
  • srl_combination: a model combining several losses (e.g. vae + forward + inverse…) for SRL
  • pca: pca applied to the raw pixels
  • robotic_priors: a robotic priors model (see “Learning State Representations with Robotic Priors”)
  • multi_view_srl: an SRL model using views from multiple cameras as input, with any of the above losses (e.g., triplet loss)
  • joints: the arm’s joint angles (Kuka environments only)
  • joints_position: the arm’s x, y, z position and joint angles (Kuka environments only)

Add a Custom SRL Model

If your SRL model is a characteristic of the environment (position, angles, …):

  1. Add the name of the model to the registered_srl dictionary in state_representation/registry.py
  2. Modify getSRLState(self, observation) in the environments so that it returns the data you want for this model (see the sketch after this list).
  3. Now you can call your SRL model using --srl-model NAME with train.py or pipeline.py
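A minimal sketch of step 2, assuming a mobile-robot style environment that stores its own position and the target position (the attribute names are illustrative):

import numpy as np

def getSRLState(self, observation):
    # Hypothetical example: return the robot position relative to the target
    # (self.robot_pos and self.target_pos are illustrative attribute names)
    return np.array(self.robot_pos) - np.array(self.target_pos)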

Otherwise, for SRL models that are external to the environment (supervised, autoencoder, …):

  1. Add your SRL model, which inherits from SRLBaseClass, to the function state_representation.models.loadSRLModel (see the skeleton after this list)
  2. Add the name of the model to the registered_srl dictionary in state_representation/registry.py
  3. Add the name of the model to config/srl_models.yaml, with the location of the saved model for each environment (it can point to a dummy location, but it must be defined)
  4. Now you can call your SRL model using --srl-model NAME with train.py or pipeline.py
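A heavily hedged skeleton for step 1 (the import path and the load/getState hooks are assumptions about the SRLBaseClass interface; check state_representation/models.py for the actual one):

import torch
from state_representation.models import SRLBaseClass  # assumed location of SRLBaseClass

class MyExternalSRLModel(SRLBaseClass):
    """Wraps a pretrained network that maps raw observations to learned states."""

    def load(self, path):
        # Assumed hook: load the weights saved at the path from config/srl_models.yaml
        self.model = torch.load(path)
        self.model.eval()

    def getState(self, observation):
        # Assumed hook: map one raw observation to its state representation
        with torch.no_grad():
            return self.model(observation)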

SRL Zoo

A collection of State Representation Learning (SRL) methods for Reinforcement Learning, written using PyTorch.

Available Methods

  • SRL with Robotic Priors + extensions (stereo-vision, additional priors)
  • Denoising Autoencoder (DAE)
  • Variational Autoencoder (VAE) and beta-VAE
  • PCA
  • Supervised Learning
  • Forward, Inverse Models
  • Triplet Network (for stereo-vision only)
  • Reward loss
  • Combination and stacking of methods
  • Random Features

Learning a State Representation

To learn a state representation, you need to enforce constraints on the representation using one or more losses (a generic sketch of the inverse and forward losses is given after the list below).

All losses are defined in losses/losses.py. The available losses are:

  • autoencoder: reconstruction loss, using current and next observation
  • denoising autoencoder (dae): same as the autoencoder, except that the model reconstructs the input from a noisy observation containing a random zero-pixel mask
  • vae: (beta-)VAE loss (reconstruction + Kullback-Leibler divergence loss)
  • inverse: predict the action given current and next state
  • forward: predict the next state given current state and taken action
  • reward: predict the reward (positive or not) given current and next state
  • priors: robotic priors losses (see “Learning State Representations with Robotic Priors”)
  • triplet: triplet loss for multi-cam setting (see Multiple Cameras section)
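As an illustration of how such constraints translate into code, here is a generic PyTorch sketch of the inverse and forward losses (placeholder networks and sizes; not the repository's exact implementation):

import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, n_actions = 3, 4  # placeholder sizes

# Inverse model: predict the (discrete) action from (s_t, s_{t+1})
inverse_net = nn.Linear(2 * state_dim, n_actions)
# Forward model: predict s_{t+1} from s_t and a one-hot action
forward_net = nn.Linear(state_dim + n_actions, state_dim)

def inverse_loss(s_t, s_next, action):
    logits = inverse_net(torch.cat([s_t, s_next], dim=-1))
    return F.cross_entropy(logits, action)

def forward_loss(s_t, s_next, action):
    a_onehot = F.one_hot(action, n_actions).float()
    pred_next = forward_net(torch.cat([s_t, a_onehot], dim=-1))
    return F.mse_loss(pred_next, s_next)

# Example usage with random data (the states would come from the SRL encoder phi)
s_t, s_next = torch.randn(8, state_dim), torch.randn(8, state_dim)
action = torch.randint(0, n_actions, (8,))
total_loss = inverse_loss(s_t, s_next, action) + forward_loss(s_t, s_next, action)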

References

  1. Raffin, A., et al. (2018). S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning.
  2. Lesort, T., et al. (2018). State Representation Learning for Control: An Overview.
