Frontiers and Open Challenges

Data Distributions of Tasks for Meta-Training

If the data distribution used for meta-training is not constructed properly, the model can lose the ability to adapt to and complete specific tasks.

For example, in a pose-estimation setting where each object appears in only one meta-training task, the network can memorize each object's pose directly in $\theta$ and ignore $D_i^{tr}$ entirely.

Meta-Regularization (MR)

Minimize meta-training loss + information in $\theta$:

$$ L(\theta, D_{\text{meta-train}}) + \beta\, D_{KL}\!\left(q(\theta;\theta_\mu,\theta_\sigma) \,\middle\|\, p(\theta)\right) $$

This places precedence on using information from $D_i^{tr}$ over information memorized in $\theta$.

MR can be combined with your favorite meta-learning algorithm.
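As a rough illustration, here is a minimal sketch of this objective, assuming a diagonal-Gaussian $q(\theta;\theta_\mu,\theta_\sigma)$ over a flattened weight vector and a standard-normal prior $p(\theta)$; the `meta_train_loss` callable is a hypothetical stand-in for the loss of whichever meta-learning algorithm MR is combined with.

```python
import torch

def kl_diag_gaussian_vs_std_normal(mu, log_sigma):
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over dimensions.
    return 0.5 * torch.sum(torch.exp(2 * log_sigma) + mu ** 2 - 1.0 - 2 * log_sigma)

def meta_regularized_loss(theta_mu, theta_log_sigma, meta_train_loss, beta=1e-3):
    # Reparameterized sample theta ~ q(theta; theta_mu, theta_sigma).
    theta = theta_mu + torch.exp(theta_log_sigma) * torch.randn_like(theta_mu)
    # Meta-training loss under the sampled weights, plus the information penalty on theta.
    return meta_train_loss(theta) + beta * kl_diag_gaussian_vs_std_normal(theta_mu, theta_log_sigma)
```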

Unsupervised Meta-RL

General recipe: automatically propose tasks from reward-free interaction with the environment, then run meta-RL on the proposed tasks so the agent can adapt quickly once real task rewards appear.

Random Task Proposals

Diversity-Driven Proposals

See *Diversity is All You Need* for more details.

Relatively simple mechanisms for proposing tasks work surprisingly well.
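To make the diversity-driven idea concrete, below is a hedged DIAYN-style sketch: a discriminator is trained to guess which latent skill produced a state, and each skill's intrinsic reward defines a proposed task. The `discriminator` callable and the function name are illustrative assumptions, not the paper's reference code.

```python
import math
import torch
import torch.nn.functional as F

def diversity_reward(discriminator, state, skill_id, num_skills):
    """DIAYN-style intrinsic reward: log q_phi(z | s) - log p(z)."""
    # The discriminator (trained with cross-entropy to predict the active skill
    # from visited states) outputs one logit per skill.
    log_q_z_given_s = F.log_softmax(discriminator(state), dim=-1)[..., skill_id]
    # Uniform prior over skills: log p(z) = -log(num_skills).
    return log_q_z_given_s - math.log(1.0 / num_skills)
```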

Section Summary

We can learn priors for few-shot adaptation:

  1. From non-mutually exclusive tasks, through meta-regularization
  2. From unsegmented time series, via end-to-end change-point detection
  3. From unlabeled data and experience, using clustering

Can We Meta-Learn Across Task Families?

Key idea:

  1. Watch one task demonstration
  2. Try task in new situation
  3. Learn from demo & trial to solve task

How can we train for this in a scalable way?

  1. Collect a few demonstrations for many different tasks
  2. Train a one-shot imitation learning policy
  3. Collect trials for each task by running the one-shot imitation policy (batch off-policy collection)
  4. Train a “re-trial” policy through an imitation objective, with $D_{train}$ = demo + trial(s) (sketched below)
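A minimal sketch of how steps 2 and 4 can share one architecture, under hypothetical module names: the same context-conditioned policy is trained by imitation, where the context is an embedding of the demo for the one-shot policy, and an embedding of the demo plus trial(s) for the re-trial policy.

```python
import torch
import torch.nn as nn

class ConditionedPolicy(nn.Module):
    """Policy conditioned on a context embedding (demo, or demo + trial)."""
    def __init__(self, obs_dim, ctx_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + ctx_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, context):
        # `context` embeds the demo (trial policy) or the demo + trial(s) (re-trial policy).
        return self.net(torch.cat([obs, context], dim=-1))

def imitation_loss(policy, obs, expert_actions, context):
    # Plain behavior cloning: regress the expert actions given the context.
    return ((policy(obs, context) - expert_actions) ** 2).mean()
```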

Performance:

Meta-World Benchmark

Meta-learning and multi-task RL algorithms seem to struggle on it, even though:

  1. All tasks are individually solvable
  2. All methods are given a budget with plenty of samples
  3. All methods have plenty of model capacity

Training a separate model for each task independently performs the best.

See the Meta-World benchmark for more details.

Since samples and capacity are not the bottleneck, this must be an optimization challenge.

Solution:

Gradient Surgery for Multi-Task Learning (PCGrad).
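Below is a simplified sketch of the projection step described in that paper, operating on flattened per-task gradients; the function name and bookkeeping here are illustrative rather than the authors' reference implementation.

```python
import random
import torch

def pcgrad_combine(task_grads):
    """task_grads: list of flattened (1-D) per-task gradient tensors."""
    projected = [g.clone() for g in task_grads]
    for i, g_i in enumerate(projected):
        order = [j for j in range(len(task_grads)) if j != i]
        random.shuffle(order)
        for j in order:
            g_j = task_grads[j]
            dot = torch.dot(g_i, g_j)
            if dot < 0:  # conflicting gradients: project g_i onto g_j's normal plane
                g_i -= dot / (g_j.norm() ** 2 + 1e-12) * g_j
    # Sum the surgically altered gradients into a single update direction.
    return torch.stack(projected).sum(dim=0)
```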

What Does It Take to Run Multi-Task & Meta-RL Across Distinct Tasks?

Scaling to broad task distributions is hard and can't be taken for granted; it helps to:

  • Convey task information beyond reward (e.g. a demo)
  • Train on broad, dense task distributions like Meta-World
  • Avoid conflicting gradients

Open Challenges in Multi-Task and Meta Learning

Machines are specialists, but humans are generalists.

Things Covered in CS330

  • Learn multiple tasks (multi-task learning)
  • Leverage prior experience when learning new things (meta-learning)
  • Learn general-purpose models (model-based RL)
  • Prepare for tasks before you know what they are (exploration, unsupervised learning)
  • Perform tasks in sequence (hierarchical RL)
  • Learn continuously (lifelong learning)
