Optimization-based Meta-Learning

Meta-Learning

Definition of meta-learning:

The probabilistic view of meta-learning:

Choose a form of $p(\phi_i | D_i^{tr}, \theta)$
Choose how to optimize $\theta$ w.r.t. the maximum-likelihood objective using $D_{\text{meta-train}}$

Pro:

Con:

Key idea: Acquire $\phi_i$ through optimization.

$\max_{\phi_i} \log p(D_i^{tr} | \phi_i) + \log p(\phi_i | \theta)$

This assumes the meta-parameters $\theta$ serve as a prior. One successful form of prior knowledge is an initialization for fine-tuning.
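
As one concrete reading of the MAP objective, suppose (purely as an illustrative assumption, not something stated in the lecture) that the prior is an isotropic Gaussian centered at $\theta$ with a hypothetical variance $\sigma^2$:

$p(\phi_i | \theta) = \mathcal{N}(\phi_i; \theta, \sigma^2 I) \;\Rightarrow\; \log p(\phi_i | \theta) = -\frac{1}{2\sigma^2} \|\phi_i - \theta\|^2 + \text{const}$

Under that assumption the objective becomes

$\max_{\phi_i} \log p(D_i^{tr} | \phi_i) - \frac{1}{2\sigma^2} \|\phi_i - \theta\|^2$

so the prior acts as a penalty that keeps the task parameters $\phi_i$ close to $\theta$. Using $\theta$ as an initialization for a few gradient steps is a different, implicit way of keeping $\phi_i$ near $\theta$.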

Fine-tuning is less effective with very small datasets.

Key idea: Over many tasks, learn parameter vector $\theta$ that transfers via fine-tuning.
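
The key idea can be sketched on a toy problem. The code below is a minimal illustration and not the authors' implementation: it assumes a scalar model $f(x) = w x$, a squared-error loss, one inner gradient step, and a hypothetical task family of lines with slopes near 3. The names `sample_task`, `alpha` (inner step size), and `beta` (meta step size) are illustrative choices. Because the loss is quadratic, the second-order term of the meta-gradient is written out by hand instead of using automatic differentiation.

```python
import random


def loss(w, data):
    """Mean squared error of the scalar model f(x) = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def grad(w, data):
    """Derivative of the loss with respect to w."""
    return sum(2.0 * x * (w * x - y) for x, y in data) / len(data)


def hessian(data):
    """Second derivative of the loss (independent of w for this quadratic loss)."""
    return sum(2.0 * x * x for x, _ in data) / len(data)


def inner_step(theta, d_tr, alpha):
    """Fine-tune from the initialization: phi_i = theta - alpha * dL_tr/dtheta."""
    return theta - alpha * grad(theta, d_tr)


def meta_grad(theta, d_tr, d_ts, alpha):
    """Chain rule through the inner step: dL_ts(phi_i)/dtheta."""
    phi = inner_step(theta, d_tr, alpha)
    # d(phi)/d(theta) = 1 - alpha * (second derivative of the training loss)
    return grad(phi, d_ts) * (1.0 - alpha * hessian(d_tr))


def sample_task(rng, k=5):
    """Hypothetical task family: y = a * x with slope a drawn near 3."""
    a = rng.gauss(3.0, 0.5)

    def draw():
        return [(x, a * x) for x in (rng.uniform(-1, 1) for _ in range(k))]

    return draw(), draw()  # (D_i^tr, D_i^ts)


def meta_train(steps=2000, alpha=0.1, beta=0.05, seed=0):
    """Outer loop: stochastic gradient descent on theta over sampled tasks."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        d_tr, d_ts = sample_task(rng)
        theta -= beta * meta_grad(theta, d_tr, d_ts, alpha)
    return theta
```

In this toy setting, `meta_train()` should drift toward an initialization near the typical slope of the assumed task family, which is the value from which one gradient step adapts well on average.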

From: Finn, Abbeel, Levine. Model-Agnostic Meta-Learning. ICML 2017

For a sufficiently deep $f$, the MAML function can approximate any function of $D_i^{tr}$ and $x^{ts}$.

How do we choose an architecture that is effective for the inner gradient step?

Idea: Progressive neural architecture search + MAML (Kim et al. Auto-Meta):

Bi-level optimization can exhibit instabilities.

Back-propagating through many inner gradient steps is compute- and memory-intensive.

Idea: Derive the meta-gradient using the implicit function theorem. From: Rajeswaran, Finn, Kakade, Levine. Implicit MAML, 2019
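
To make the implicit-gradient idea concrete, the sketch below assumes the inner problem is solved to convergence with a proximal penalty, $\phi_i^* = \arg\min_{\phi} L_{tr}(\phi) + \frac{\lambda}{2} (\phi - \theta)^2$, which is how the cited paper is usually summarized. That penalty, the scalar model, and the names `solve_inner` and `lam` are illustrative assumptions. Differentiating the stationarity condition with respect to $\theta$ gives $d\phi_i^*/d\theta = (1 + L_{tr}''(\phi_i^*)/\lambda)^{-1}$ in the scalar case, so the meta-gradient needs only the solution $\phi_i^*$ and not the optimization path that produced it.

```python
def loss(w, data):
    """Mean squared error of the scalar model f(x) = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def grad(w, data):
    """Derivative of the loss with respect to w."""
    return sum(2.0 * x * (w * x - y) for x, y in data) / len(data)


def hessian(data):
    """Second derivative of the loss (constant for this quadratic loss)."""
    return sum(2.0 * x * x for x, _ in data) / len(data)


def solve_inner(theta, d_tr, lam, steps=500, lr=0.1):
    """Minimize L_tr(phi) + lam/2 * (phi - theta)^2 by plain gradient descent."""
    phi = theta
    for _ in range(steps):
        phi -= lr * (grad(phi, d_tr) + lam * (phi - theta))
    return phi


def implicit_meta_grad(theta, d_tr, d_ts, lam):
    """Meta-gradient from the implicit function theorem (scalar case).

    No back-propagation through the inner loop: only phi* and the
    curvature of the training loss at phi* are used.
    """
    phi_star = solve_inner(theta, d_tr, lam)
    return grad(phi_star, d_ts) / (1.0 + hessian(d_tr) / lam)
```

The memory cost here does not grow with the number of inner steps, since nothing from the 500-step inner loop is stored. In higher dimensions the division becomes a linear solve against $I + \frac{1}{\lambda} \nabla^2 L_{tr}(\phi_i^*)$, which would typically be approximated iteratively; that part is not shown.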