[NeurIPS'20] Multiscale Deep Equilibrium Models

CMU Locus Lab

Last update: Dec 26, 2022

Overview

Multiscale Deep Equilibrium Models

💥 💥 💥 💥

This repo is deprecated and we will soon stop actively maintaining it, as a more up-to-date (and simpler & more efficient) implementation of MDEQ with the same set of tasks as here is now available in the DEQ repo.

We STRONGLY recommend using the MDEQ-Vision code in the DEQ repo (which also supports Jacobian-related analysis).

💥 💥 💥 💥

This repository contains the code for the multiscale deep equilibrium (MDEQ) model proposed in the paper Multiscale Deep Equilibrium Models by Shaojie Bai, Vladlen Koltun and J. Zico Kolter.

Is implicit deep learning relevant for general, large-scale pattern recognition tasks? We propose the multiscale deep equilibrium (MDEQ) model, which expands upon the DEQ formulation substantially to introduce simultaneous equilibrium modeling of multiple signal resolutions. Specifically, MDEQ solves for and backpropagates through synchronized equilibria of multiple feature representation streams. Such structure rectifies one of the major drawbacks of DEQ, and provides natural hierarchical interfaces for auxiliary losses and compound training procedures (e.g., pretraining and finetuning). Our experiments demonstrate for the first time that "shallow" implicit models can scale to and achieve near-SOTA results on practical computer vision tasks (e.g., megapixel images on Cityscapes segmentation).

We provide in this repo the implementation and the links to the pretrained classification & segmentation MDEQ models.

If you find this repository useful for your research, please consider citing our work:

@inproceedings{bai2020multiscale,
    author    = {Shaojie Bai and Vladlen Koltun and J. Zico Kolter},
    title     = {Multiscale Deep Equilibrium Models},
    booktitle   = {Advances in Neural Information Processing Systems (NeurIPS)},
    year      = {2020},
}

Overview

The structure of a multiscale deep equilibrium model (MDEQ) is shown below. All components of the model are shown in this figure (in practice, we use n=4).

Examples

Some examples of MDEQ segmentation results on the Cityscapes dataset.

Requirements

PyTorch >= 1.4.0, torchvision >= 0.4.0

Datasets

CIFAR-10: We download the CIFAR-10 dataset using PyTorch's torchvision package (included in this repo).
ImageNet: We follow the implementation from the PyTorch ImageNet Training repo.
Cityscapes: We download the Cityscapes dataset from its official website and process it according to this repo. The Cityscapes dataset additionally requires a list folder that aligns each original image with its corresponding labeled segmented image. This list folder can be downloaded here.

All datasets should be downloaded, processed and put in the respective data/[DATASET_NAME] directory. The data/ directory should look like the following:

data/
  cityscapes/
  imagenet/
  ...          (other datasets)
  list/        (see above)
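Before launching a long training run, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repo: the helper name `missing_dataset_dirs` is hypothetical, and the default folder names are taken from the tree shown above.

```python
from pathlib import Path


def missing_dataset_dirs(data_root, expected=("cityscapes", "imagenet", "list")):
    """Return the expected sub-directories that are absent under data_root."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # "data" is the directory name used in this README.
    missing = missing_dataset_dirs("data")
    if missing:
        print("Missing under data/:", ", ".join(missing))
    else:
        print("data/ layout looks complete.")
```

Pass a different `expected` tuple if you only work with a subset of the datasets.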

Usage

All experiment settings are provided in the .yaml files under the experiments/ folder.

To train an MDEQ classification model on ImageNet/CIFAR-10, do

python tools/cls_train.py --cfg experiments/[DATASET_NAME]/[CONFIG_FILE_NAME].yaml

To train an MDEQ segmentation model on Cityscapes, do

python -m torch.distributed.launch --nproc_per_node=4 tools/seg_train.py --cfg experiments/[DATASET_NAME]/[CONFIG_FILE_NAME].yaml

where you should provide the pretrained ImageNet model path in the corresponding configuration (.yaml) file. We provide a sample pretrained model extractor in pretrained_models/, but you can also write your own script.

Similarly, to test the model and generate segmentation results on Cityscapes, do

python tools/seg_test.py --cfg experiments/[DATASET_NAME]/[CONFIG_FILE_NAME].yaml

You can (and probably should) initialize the Cityscapes training with an ImageNet-pretrained MDEQ. You need to extract the state dict from the ImageNet checkpointed model, and set the MODEL.PRETRAINED entry in the Cityscapes yaml file to the path of this state dict on disk.
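The extraction step can be sketched as follows. This is not the sample extractor shipped in pretrained_models/; it is an illustrative version under stated assumptions: the checkpoint is assumed to wrap its weights under a "state_dict" key, and the prefix "classifier." for the layers to discard is hypothetical, so check the actual key names in your checkpoint first. The function works on any dict-like mapping, which keeps the example runnable without PyTorch.

```python
def extract_backbone_state(checkpoint, drop_prefixes=("classifier.",)):
    """Pull the weights out of a checkpoint and drop task-specific layers.

    `checkpoint` is either a raw state dict or a dict wrapping one under
    the key "state_dict" (assumed convention, verify for your files).
    """
    state = checkpoint.get("state_dict", checkpoint)
    cleaned = {}
    for name, tensor in state.items():
        # Checkpoints saved from nn.DataParallel prefix every key with "module.".
        if name.startswith("module."):
            name = name[len("module."):]
        # Skip layers that should not transfer (prefix names are hypothetical).
        if name.startswith(tuple(drop_prefixes)):
            continue
        cleaned[name] = tensor
    return cleaned


# Demo with a plain dict standing in for a real checkpoint.
demo = {"state_dict": {"module.stem.weight": 1, "module.classifier.weight": 2}}
print(extract_backbone_state(demo))

# With PyTorch installed, usage might look like this (paths are placeholders):
#   import torch
#   ckpt = torch.load("path/to/imagenet_checkpoint", map_location="cpu")
#   torch.save(extract_backbone_state(ckpt), "pretrained_models/[FILENAME]")
```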

The model implementation and MDEQ's algorithmic components (e.g., L-Broyden's method) can be found in lib/.
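For readers new to the solver, here is a small sketch of the idea behind Broyden's method applied to a fixed-point problem. It is a dense, textbook "good Broyden" variant in pure Python, not the limited-memory implementation in lib/. The function names and the 2-d toy map are hypothetical stand-ins for the MDEQ layer. The inverse-Jacobian estimate starts at -I, so the first step is one plain application of f.

```python
import math


def broyden_solve(f, z0, tol=1e-8, max_iter=50):
    """Find z with f(z) = z via Broyden's method on g(z) = f(z) - z.

    Returns (z, residual_norm, iterations_used).
    """
    n = len(z0)
    z = list(z0)
    g = [fi - zi for fi, zi in zip(f(z), z)]
    # B approximates the inverse Jacobian of g; start from -I.
    B = [[-1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(max_iter):
        res = math.sqrt(sum(gi * gi for gi in g))
        if res < tol:
            return z, res, k
        # Quasi-Newton step: dz = -B g
        dz = [-sum(B[i][j] * g[j] for j in range(n)) for i in range(n)]
        z_new = [zi + di for zi, di in zip(z, dz)]
        g_new = [fi - zi for fi, zi in zip(f(z_new), z_new)]
        dg = [a - b for a, b in zip(g_new, g)]
        # Rank-one update of the inverse (Sherman-Morrison form):
        # B += (dz - B dg) (dz^T B) / (dz^T B dg)
        B_dg = [sum(B[i][j] * dg[j] for j in range(n)) for i in range(n)]
        dzT_B = [sum(dz[i] * B[i][j] for i in range(n)) for j in range(n)]
        denom = sum(dz[i] * B_dg[i] for i in range(n))
        if abs(denom) > 1e-20:  # skip the update if it would be ill-defined
            u = [(dz[i] - B_dg[i]) / denom for i in range(n)]
            B = [[B[i][j] + u[i] * dzT_B[j] for j in range(n)] for i in range(n)]
        z, g = z_new, g_new
    return z, math.sqrt(sum(gi * gi for gi in g)), max_iter


def toy_layer(z):
    # Hypothetical 2-d contractive map standing in for the MDEQ layer.
    return [
        math.tanh(0.3 * z[0] - 0.2 * z[1] + 0.5),
        math.tanh(0.1 * z[0] + 0.4 * z[1] - 0.3),
    ]


z_star, residual, iters = broyden_solve(toy_layer, [0.0, 0.0])
print(f"residual {residual:.2e} after {iters} iterations")
```

The dense matrix B is only practical for tiny problems; a limited-memory variant stores the rank-one update vectors instead of forming B explicitly.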

Pre-trained Models

We provide some reasonably good pre-trained weights here so that one can quickly play with DEQs without training from scratch.

Description	Task	Dataset	Model
MDEQ-XL	ImageNet Classification	ImageNet	download (.pkl)
MDEQ-XL	Cityscapes(val) Segmentation	Cityscapes	download (.pkl)
MDEQ-Small	ImageNet Classification	ImageNet	download (.pkl)
MDEQ-Small	Cityscapes(val) Segmentation	Cityscapes	download (.pkl)

I. Example of how to evaluate the pretrained ImageNet model:

Download the pretrained ImageNet .pkl file. (I recommend using the gdown command!)
Put the model under pretrained_models/ folder with some file name [FILENAME].
Run the MDEQ classification validation command:

python tools/cls_valid.py --testModel pretrained_models/[FILENAME] --cfg experiments/imagenet/cls_mdeq_[SIZE].yaml

For example, for MDEQ-Small, you should get >75% top-1 accuracy.

II. Example of how to use the pretrained ImageNet model to train on Cityscapes:

Download the pretrained ImageNet .pkl file.
Put the model under pretrained_models/ folder with some file name [FILENAME].
In the corresponding experiments/cityscapes/seg_MDEQ_[SIZE].yaml (where SIZE is typically SMALL, LARGE or XL), set MODEL.PRETRAINED to "pretrained_models/[FILENAME]".
Run the MDEQ segmentation training command (see the "Usage" section above):

python -m torch.distributed.launch --nproc_per_node=[N_GPUS] tools/seg_train.py --cfg experiments/cityscapes/seg_MDEQ_[SIZE].yaml

III. Example of how to use the pretrained Cityscapes model for inference:

Download the pretrained Cityscapes .pkl file.
Put the model under pretrained_models/ folder with some file name [FILENAME].
In the corresponding experiments/cityscapes/seg_MDEQ_[SIZE].yaml (where SIZE is typically SMALL, LARGE or XL), set TEST.MODEL_FILE to "pretrained_models/[FILENAME]".
Run the MDEQ segmentation testing command (see the "Usage" section above):

python tools/seg_test.py --cfg experiments/cityscapes/seg_MDEQ_[SIZE].yaml

Tips:

To load the Cityscapes pretrained model, download the .pkl file and specify the path in config.[TRAIN/TEST].MODEL_FILE (which is '' by default) in the .yaml files. This is different from setting MODEL.PRETRAINED, see the point below.
The difference between [TRAIN/TEST].MODEL_FILE and MODEL.PRETRAINED arguments in the yaml files: the former is used to load all of the model parameters; the latter is for compound training (e.g., when transferring from ImageNet to Cityscapes, we want to discard the final classifier FC layers).
The repo supports checkpointing of models at each epoch. One can resume from a previously saved checkpoint by turning on the TRAIN.RESUME argument in the yaml files.
Just like DEQs, the MDEQ models can be slower than explicit deep networks, and even more so as the image size increases (because larger images typically require more Broyden iterations to converge well; see Figure 5 in the paper). But one can play with the forward and backward thresholds to adjust the runtime.
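The runtime trade-off in the last tip can be illustrated with a toy example. This sketch assumes that the thresholds act as caps on the number of solver iterations (check the .yaml files for how they are actually defined), and it swaps in plain fixed-point iteration for Broyden's method to keep the code short. The names `solve_with_cap` and `toy_layer` are hypothetical.

```python
import math


def solve_with_cap(f, z0, max_iter):
    """Iterate z <- f(z) at most max_iter times; return (z, |f(z) - z|)."""
    z = z0
    for _ in range(max_iter):
        z = f(z)
    return z, abs(f(z) - z)


def toy_layer(z):
    # Hypothetical scalar contraction standing in for the MDEQ layer.
    return 0.5 * math.cos(z)


# A larger cap costs more function evaluations but leaves a smaller residual.
for cap in (1, 5, 30):
    _, residual = solve_with_cap(toy_layer, 0.0, cap)
    print(f"cap={cap:2d}  residual={residual:.2e}")
```

A lower cap shortens each forward or backward solve at the cost of a less accurate equilibrium, so it is worth checking task accuracy after changing it.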

Acknowledgement

Some utility code (e.g., model summary and yaml processing) in this repo was modified from the HRNet repo and the DEQ repo.

Comments

out of memory at validation stage
Hi,

Thanks for your code.

I encountered RuntimeError: CUDA out of memory. Tried to allocate 25.49 GiB (GPU 0; 23.70 GiB total capacity; 57.77 MiB already allocated; 22.24 GiB free; 84.00 MiB reserved in total by PyTorch) at eval stage with 24G memory.

python tools/cls_train.py --cfg experiments/cifar/cls_mdeq_TINY.yaml
opened by ShoufaChen 8
Question about the initial update of the Broyden method
Hi @jerrybai1995 ,

As you know, I have been interested in the Broyden method you use in this code. I have an implementation question regarding the initial update of the Broyden method that can be found here.

Indeed, we have the following:

each iteration for a given update direction is given in the code as more or less: z = z + update (see here, where s=1 because no line search is performed)

this means that if update is -gx, then we have the first update being z = z - gx.

I think this is in contradiction with the suggested update in the paper, where we should have z = z - B gx = z - (-I) gx = z + gx.

Of course this is very minor since it's just the initialization but I wanted to know your opinion on this given I am not sure how to resolve what I see as a contradiction.
opened by zaccharieramzi 6
The config.yaml settings in experiments/ are quite different from the settings in the paper

@jerrybai1995 @vkoltun @zkolter

Thanks for the nice work! I am quite interested in the MDEQ model and tried to train it on the CIFAR-10 dataset. However, I can't reproduce the performance reported in the paper. Specifically, the accuracy of MDEQ-small (ours) without data augmentation in Table 1 is 87.1%, while I only get 80.3% after removing the data augmentation code in tools/cls_train.py.

I checked experiments\cifar\cls_mdeq_TINY.yaml and experiments\cifar\cls_mdeq_LARGE.yaml carefully and found that the settings are quite different from those in the paper (i.e., Table 4 in Appendix A), including the dropout rate, the forward-backward thresholds, the number of groups in GroupNormalization, and so on. I adjusted the settings to match Table 4, but the performance did not improve. I don't know whether LR_STEP or other settings in the .yaml file that cannot be found in the paper harm the training process.

Is there any suggestion for reproducing the performance (87.1% ± 0.4%) in Table 1 of the paper? In addition, I would appreciate it if you could share the YAML files used in your paper experiments, since the config.yaml settings for CIFAR-10 and ImageNet in experiments/ are quite different from the settings in Table 4 in Appendix A of the original paper.

opened by jianjieluo 6
Setup for CIFAR-10 LARGE

Hi,

I am currently trying to replicate the results of MDEQ for the CIFAR-10 dataset using the LARGE configuration. I noticed that there were discrepancies between the supplementary material of the paper and the file in the code.

I wanted to know which configs allowed you to achieve 93.8 top1 acc. Discrepancies I noted: batch size, weight decay, # channels, # epochs, thresholds, dropout.

Currently, with the content of the file in the code unchanged (except paths, # epochs and resume), this is my learning curve (I did it in 2 steps because I originally had set 50 epochs):

opened by zaccharieramzi 4
How to use MDEQ for my own dataset

Hi! I'm trying to use MDEQ to train on my own dataset. The dataset is for cell nucleus segmentation, where we only want to distinguish the foreground from the background. The structure of the data directory is already consistent with the one described in the repo, and the data now loads correctly. What else should I modify to run the MDEQ model on my dataset? I look forward to your response.

(Translated from Chinese; translation by @jerrybai1995.)

opened by qzsrh 3
Linear fixed-point solver in backward

Hi @jerrybai1995,

Thanks a lot for sharing the deq and mdeq repositories!

After reading the Deep Implicit Layers tutorial I had two small questions regarding the implementation of the linear fixed-point equation solving happening in the backward pass.

In both MDEQ and DEQ a function g(x) is defined to solve the linear fixed-point equations. For MDEQ it looks like this:

https://github.com/locuslab/mdeq/blob/eb3d85fa01404fdaaa4913fc220653ffee354078/lib/modules/deq2d.py#L151-L155

Now broyden solves for g(x)=0 (like done in the forward pass at equilibrium). But to agree with the fixed-point equations, wouldn't there need to be an additional term -x added to res? It looks like it's now solving for a vector that makes the VJP equal to -grad. I got confused because g(x) looks very similar to the lambda-function defined in the NeurIPS tutorial using an Anderson fixed-point solver but that solver solves for f(x)=x instead of g(x)=f(x)-x=0.

My second question is about the lack of .clone().detach() in the snippet above. Is there a reason why .clone().detach() appears in the DEQ implementation of g(x) but not in the MDEQ implementation?

Thank you!

opened by mcbal 2
Slightly higher mIoU for cityscapes

Hi,

I get a slightly higher val mIoU than what is reported in the paper when validating the small MDEQ model on the Cityscapes dataset (76.5, whereas 75.1 is reported in the paper).

I understand that higher is better for mIoU, but I would like to understand whether there are sources of variability in the results. I didn't change the parameters for the Broyden method (or at least I don't think so), so I think it's not coming from there. I just used the seg_test.py module and downloaded your trained small MDEQ, as well as the Cityscapes dataset.

opened by zaccharieramzi 2
Downsampling for segmentation

Hi @jerrybai1995 ,

I am coming here after seeing your oral at NeurIPS and talking with you at the poster session. I was looking at your model architecture for segmentation, and noticed that you downsample the full-resolution image 4 times before feeding it into the implicit layer. Would it be too time-consuming to train on the full-resolution image? Did you try anyway?

opened by zaccharieramzi 2
