Meta Learning Backpropagation And Improving It (VSML)
This is the research code for the NeurIPS 2021 publication by Kirsch & Schmidhuber (2021).
Many concepts have been proposed for meta learning with neural networks (NNs), e.g., NNs that learn to reprogram fast weights, Hebbian plasticity, learned learning rules, and meta recurrent NNs. Our Variable Shared Meta Learning (VSML) unifies the above and demonstrates that simple weight-sharing and sparsity in an NN is sufficient to express powerful learning algorithms (LAs) in a reusable fashion. A simple implementation of VSML where the weights of a neural network are replaced by tiny LSTMs allows for implementing the backpropagation LA solely by running in forward-mode. It can even meta learn new LAs that differ from online backpropagation and generalize to datasets outside of the meta training distribution without explicit gradient calculation. Introspection reveals that our meta learned LAs learn through fast association in a way that is qualitatively different from gradient descent.
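The core construction can be sketched in a few lines of NumPy: every weight of a dense layer is replaced by a tiny LSTM whose parameters are shared across all connections while its state is not, and forward messages are produced by a learned readout of these per-connection states. The sketch below is a conceptual illustration only; class names, state sizes, and the message scheme are assumptions and do not reflect the implementation in this repository.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class VSMLLayerSketch:
    """Dense layer whose 'weights' are per-connection LSTM states (hypothetical sketch)."""
    def __init__(self, n_in, n_out, state_size=8, seed=0):
        rng = np.random.default_rng(seed)
        s = state_size
        # One set of LSTM parameters, shared by all n_in * n_out connections.
        self.W = rng.normal(0.0, 0.1, (4 * s, 1 + s))  # gates computed from [message, hidden]
        self.b = np.zeros(4 * s)
        self.readout = rng.normal(0.0, 0.1, s)          # shared state-to-message readout
        # Per-connection state (not shared): hidden and cell, shape (n_in, n_out, s).
        self.h = np.zeros((n_in, n_out, s))
        self.c = np.zeros((n_in, n_out, s))

    def forward(self, x):
        # Broadcast the incoming message x_i to every connection (i, j).
        m = np.broadcast_to(x[:, None, None], self.h.shape[:2] + (1,))
        z = np.concatenate([m, self.h], axis=-1) @ self.W.T + self.b
        i, f, o, g = np.split(z, 4, axis=-1)
        self.c = sigmoid(f) * self.c + sigmoid(i) * np.tanh(g)
        self.h = sigmoid(o) * np.tanh(self.c)
        # Outgoing message to unit j: sum a learned readout of the states over i.
        return (self.h @ self.readout).sum(axis=0)

layer = VSMLLayerSketch(n_in=4, n_out=3)
print(layer.forward(np.ones(4)).shape)  # (3,)

Because only the small set of shared LSTM parameters is meta optimized while the per-connection states play the role of the weights, the same tiny network can encode a reusable learning algorithm, which is the property the abstract above refers to.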
Installation
Create a virtual env
python3 -m venv venv
. venv/bin/activate
Install pip dependencies
pip3 install --upgrade pip wheel setuptools
pip3 install -r requirements.txt
Initialize Weights & Biases (wandb)
wandb init
Inspect your results at https://wandb.ai/.
Run instructions
Non distributed
For any algorithm that does not require multiple workers.
python3 launch.py --config_files CONFIG_FILES --config arg1=val1 arg2=val2
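For example, using a config file from this repository and an inline override (both appear in the example training runs listed below):

python3 launch.py --config_files configs/rand_proj.yaml --config training.population_size=128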
Distributed
For any algorithm that does require multiple workers.
GPU_COUNT=4 mpirun -n NUM_WORKERS python3 assign_gpu.py python3 launch.py
where NUM_WORKERS is the number of workers to run. The assign_gpu Python script distributes the MPI workers evenly over the specified GPUs. Alternatively, specify the CUDA_VISIBLE_DEVICES environment variable instead of GPU_COUNT:
CUDA_VISIBLE_DEVICES=0,2,3 mpirun -n NUM_WORKERS python3 assign_gpu.py python3 launch.py
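A minimal sketch of the rank-to-GPU assignment that assign_gpu performs is shown below. It is an illustration only: the environment variable names and the round-robin policy are assumptions, and the actual assign_gpu.py in this repository may differ.

import os
import sys

# Local MPI rank as exported by Open MPI; other MPI implementations use other variable names.
rank = int(os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", 0))

# Either an explicit GPU list or a GPU count determines the candidate devices.
visible = os.environ.get("CUDA_VISIBLE_DEVICES")
if visible:
    gpu_ids = visible.split(",")
else:
    gpu_ids = [str(i) for i in range(int(os.environ.get("GPU_COUNT", "1")))]

# Pin this worker to one GPU (round robin) and exec the wrapped command, e.g. python3 launch.py.
os.environ["CUDA_VISIBLE_DEVICES"] = gpu_ids[rank % len(gpu_ids)]
os.execvp(sys.argv[1], sys.argv[1:])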
Slurm-based cluster
Modify slurm/schedule.sh
and slurm/job.sh
to suit your environment.
bash slurm/schedule.sh --nodes=7 --ntasks-per-node=12 -- python3 launch.py --config_files CONFIG_FILES
If only a single worker is required (non-distributed), set --nodes=1 and --ntasks-per-node=1.
Remote (via ssh)
Modify ssh/schedule.sh to suit your environment. Requires gpustat in .local/bin/gpustat, installed via pip3 install --user gpustat. Also install tmux and mpirun.
bash ssh/schedule.sh --host HOST_NAME --nodes=7 --ntasks-per-node=12 -- python3 launch.py --config_files CONFIG_FILES
Example training runs
Section 4.2 Figure 6
VSML
slurm/schedule.py --nodes=128 --time 04:00:00 -- python3 launch.py --config_files configs/rand_proj.yaml
You can also try fewer nodes and use --config training.population_size=128. Alternatively, use backpropagation-based meta optimization with --config_files configs/{rand_proj,backprop}.yaml.
Section 4.4 Figure 8
VSML
slurm/schedule.py --array=1-11 --nodes=128 --time 04:00:00 -- python3 launch.py --array configs/array/datasets.yaml
Meta RNN (Hochreiter 2001)
slurm/schedule.py --array=1-11 --nodes=32 --time 04:00:00 -- python3 launch.py --array configs/array/datasets.yaml --config_files configs/{metarnn,pad}.yaml --tags metarnn
Fast weight memory
slurm/schedule.py --array=1-11 --nodes=32 --time 04:00:00 -- python3 launch.py --array configs/array/datasets.yaml --config_files configs/{fwmemory,pad}.yaml --tags fwmemory
SGD
slurm/schedule.py --array=1-4 --nodes=2 --time 00:15:00 -- python3 launch.py --array configs/array/sgd.yaml --config_files configs/sgd.yaml --tags sgd
Hebbian
slurm/schedule.py --array=1-11 --nodes=32 --time 04:00:00 -- python3 launch.py --array configs/array/datasets.yaml --config_files configs/{hebbian,pad}.yaml --tags hebbian