Supervised Learning of Behaviors | Super Agents of AI

Supervised Learning of Behaviors

Dec 7, 2019
One minute read

Terminology & Notation

Imitation Learning

DAgger: Dataset Aggregation

DAgger Needs humans to label the data.

DAgger addresses the problem of distributional “drift”.

What if our model is so good that it doesn’t drift?

Need to mimic expert behavior very accurately
But don’t overfit!

Imitation learning: what’s the problem

Humans need to provide data, which is typically finite
- Deep learning works best when data is plentiful
Humans are not good at providing some kinds of actions
Humans can learn autonomously; can our machines do the same?
- Unlimited data from own experience
- Continuous self-improvement

Cost Function

The goal is to:

$\min_\theta E_{s_1:T,a_1:T} \left [ \sum_t c(s_t, a_t) \right ]$

Goal-Conditioned Behavioral Cloning

See more from: Learning Latent Plans from Play

Cost/reward Functions in Theory and Practice

Note: Cover Picture

Previous

Introduction and Course Overview

Next

Introduction to Reinforcement Learning

- CATALOG -

Terminology & Notation

Imitation Learning

DAgger: Dataset Aggregation

Imitation learning: what’s the problem

Cost Function

Goal-Conditioned Behavioral Cloning

Cost/reward Functions in Theory and Practice