Deep Multimodal Neural Architecture Search

Overview

MMNas: Deep Multimodal Neural Architecture Search

This repository contains the PyTorch implementation of MMNas for the visual question answering (VQA), visual grounding (VGD), and image-text matching (ITM) tasks.


Prerequisites

Software and Hardware Requirements

You may need a machine with at least 4 GPUs (each with >= 8GB memory), 50GB of RAM for VQA and VGD (150GB for ITM), and 50GB of free disk space. We strongly recommend using an SSD drive to guarantee high-speed I/O.

You should first install some necessary packages.

  1. Install Python >= 3.6

  2. Install CUDA >= 9.0 and cuDNN

  3. Install PyTorch >= 0.4.1 with CUDA (PyTorch 1.x is also supported)

  4. Install spaCy and initialize the GloVe vectors as follows:

    $ pip install -r requirements.txt
    $ wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
    $ pip install en_vectors_web_lg-2.1.0.tar.gz
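
As a quick sanity check (a minimal sketch, assuming the en_vectors_web_lg package installed above), you can confirm that the 300-dimensional GloVe vectors load:

    import spacy

    # Load the GloVe vectors installed above (spaCy 2.x).
    nlp = spacy.load('en_vectors_web_lg')
    print(nlp('multimodal')[0].vector.shape)  # expected: (300,)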

Dataset Preparations

Please follow the instructions in dataset_setup.md to download the datasets and features.

Search

To search an optimal architecture for a specific task, run

$ python3 search_[vqa|vgd|itm].py

At the end of each searching epoch, the script outputs the optimal architecture (choosing the operator with the largest architecture weight for every block) according to the current architecture weights. When the optimal architecture does not change for several consecutive epochs, you can kill the searching process manually.
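
Deriving the optimal architecture amounts to a per-block argmax over the architecture weights. A minimal sketch of that selection (the names derive_arch, alpha_weights, and op_names are illustrative, not this repo's API):

    import torch

    def derive_arch(alpha_weights, op_names):
        # alpha_weights: one tensor of shape [n_ops] per block.
        # For each block, keep the operator with the largest softmaxed weight.
        arch = []
        for block_alpha in alpha_weights:
            probs = torch.softmax(block_alpha, dim=-1)
            arch.append(op_names[int(probs.argmax())])
        return arch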

Training

The following script starts training a network with the optimal architecture searched by MMNas:

$ python3 train_[vqa|vgd|itm].py --RUN='train' --ARCH_PATH='./arch/train_vqa.json'

Optional arguments (a combined example follows this list):

  1. --VERSION=str, e.g., --VERSION='mmnas_vqa' to assign a name to your model.

  2. --GPU=str, e.g., --GPU='0, 1, 2, 3' to train the model on the specified GPU devices.

  3. --NW=int, e.g., --NW=8 to set the number of data-loading workers and accelerate I/O.

  4. --RESUME to resume training from saved checkpoint parameters.

  5. --ARCH_PATH to specify a different searched architecture.
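
For example, combining the flags above (the values shown are illustrative):

$ python3 train_vqa.py --RUN='train' --VERSION='mmnas_vqa' --GPU='0, 1, 2, 3' --NW=8 --ARCH_PATH='./arch/train_vqa.json'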

If you want to evaluate an architecture obtained from the searching stage, e.g., the output architecture at the 50th searching epoch for the VQA model, you can run

$ python3 train_vqa.py --RUN='train' --ARCH_PATH='[PATH_TO_YOUR_SEARCHING_LOG]' --ARCH_EPOCH=50

Validation and Testing

Offline Evaluation

To run the val or test split, it is convenient to modify the following args: --RUN={'val', 'test'} and --CKPT_PATH=[Your Model Path].

Example:

$ python3 train_vqa.py --RUN='test' --CKPT_PATH=[Your Model Path] --ARCH_PATH=[Searched Architecture Path]

Online Evaluation (ONLY FOR VQA)

Test result files will be stored at ./logs/ckpts/result_test/result_train_[Your Version].json

You can upload the obtained result file to Eval AI to evaluate the scores on test-dev and test-std splits.
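
For reference, EvalAI expects the standard VQA v2 submission format: a JSON list of question_id/answer records. A hedged illustration (the ids and answers below are made up; this repo's exact output is not shown here):

    import json

    # Standard VQA v2 submission format: one record per test question.
    result = [
        {"question_id": 458752000, "answer": "yes"},
        {"question_id": 458752001, "answer": "2"},
    ]
    with open('result_train_mmnas_vqa.json', 'w') as f:
        json.dump(result, f)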

Pretrained Models

We provide the pretrained models in pretrained_models.md to reproduce the experimental results in our paper.

Citation

If this repository is helpful for your research, we'd really appreciate it if you could cite the following paper:

@article{yu2020mmnas,
  title={Deep Multimodal Neural Architecture Search},
  author={Yu, Zhou and Cui, Yuhao and Yu, Jun and Wang, Meng and Tao, Dacheng and Tian, Qi},
  journal={Proceedings of the 28th ACM International Conference on Multimedia},
  pages={3743--3752},
  year={2020}
}
Comments
  • Searching ITM is stuck at epoch 10

    Running search_itm.py gets stuck at epoch 10. No errors occur and the program does not terminate itself. The last output is as follows:

    evaluate percent 45.2755905511811
    evaluate percent 47.24409448818898
    evaluate percent 49.21259842519685
    evaluate percent 51.181102362204726
    evaluate percent 53.14960629921261
    evaluate percent 55.118110236220474
    evaluate percent 57.08661417322835
    evaluate percent 59.055118110236215
    evaluate percent 61.023622047244096
    evaluate percent 62.99212598425197
    evaluate percent 64.96062992125984
    evaluate percent 66.92913385826772
    evaluate percent 68.89763779527559
    evaluate percent 70.86614173228347
    evaluate percent 72.83464566929135
    evaluate percent 74.80314960629921
    evaluate percent 76.77165354330708
    evaluate percent 78.74015748031496
    evaluate percent 80.70866141732283
    evaluate percent 82.67716535433071
    evaluate percent 84.64566929133859
    evaluate percent 86.61417322834646
    evaluate percent 88.58267716535433
    evaluate percent 90.5511811023622
    evaluate percent 92.51968503937007
    evaluate percent 94.48818897637796
    evaluate percent 96.45669291338582
    evaluate percent 98.4251968503937
    (1014, 5070)
    i2t stat num: 1014
    i2t results: 14.89 37.48 50.79 10.00 34.80
    
    t2i stat num: 5070
    t2i results: 12.31 36.31 51.50 10.00 29.36
    
    reset negative captions ...
    reset negative captions ...
    reset negative captions ...
    reset negative captions ...
    
    

    And the output of nvidia-smi has stayed as follows ever since the program got stuck.

    Sat Feb 27 18:48:18 2021
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla K40c          On   | 00000000:02:00.0 Off |                    0 |
    | 23%   37C    P0    63W / 235W |   5573MiB / 11441MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla K40c          On   | 00000000:03:00.0 Off |                    0 |
    | 23%   43C    P0    69W / 235W |   9840MiB / 11441MiB |    100%      Default |
    +-------------------------------+----------------------+----------------------+
    |   2  Tesla K40m          On   | 00000000:82:00.0 Off |                    0 |
    | N/A   33C    P0    62W / 235W |   5573MiB / 11441MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   3  Tesla K40m          On   | 00000000:83:00.0 Off |                    0 |
    | N/A   34C    P0    68W / 235W |   9840MiB / 11441MiB |    100%      Default |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0     24425      C   ...naconda3/envs/py36-t101-cu90/bin/python  5562MiB |
    |    1     24426      C   ...naconda3/envs/py36-t101-cu90/bin/python  9827MiB |
    |    2     24427      C   ...naconda3/envs/py36-t101-cu90/bin/python  5562MiB |
    |    3     24428      C   ...naconda3/envs/py36-t101-cu90/bin/python  9827MiB |
    +-----------------------------------------------------------------------------+
    

    I have noticed that epoch 10 is the NEG_START_EPOCH, but I have no idea what is going wrong there.

    opened by ghost 6
  • Implementation details of hyper-parameters in CfgSearch and Cfg

    I am following your work and running the code on VGD, with the pretrained features from dataset_setup.md, but I failed to get the results mentioned in the paper: my test accuracy was often 2%~5% lower. Could you possibly provide more experimental details about the hyper-parameters such as CfgSearch and Cfg (e.g., ALPHA_START, ALPHA_EVERY, ALPHA_WEIGHT_DECAY, NET_OPTIM_WARMUP, NET_LR_DECAY_R), and other potentially helpful tricks? Thanks for your excellent work and help.

    opened by JackyWang2001 2
  • Why add 0 loss to the original loss?

    https://github.com/MILVLG/mmnas/blob/552e29e666625819799ca22de324df2be50626cc/search_vqa.py#L285-L288 What is this part of the code aimed at? I'd appreciate it if anyone could explain.
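
    A common reason for this pattern in multi-GPU training code (a hedged guess, not confirmed by the authors): DistributedDataParallel expects every parameter to participate in each backward pass, so a zero-valued term that touches all parameters keeps the unsampled candidate operators from stalling gradient synchronization. A minimal sketch of the trick:

        # Hedged sketch (not the repo's exact code): the extra term adds
        # nothing to the loss value, but routes every parameter through
        # autograd so DistributedDataParallel does not hang on unused ones.
        dummy = sum(p.sum() for p in net.parameters()) * 0.0
        loss = loss + dummy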

    opened by ghost 2
  • Errors during searching VQA

    The error is as follows:

     ========== Answer token vocab size (occur more than 8 times): 3129
     ========== Answer token vocab size (occur more than 8 times): 3129
     ========== Answer token vocab size (occur more than 8 times): 3129
     ========== Answer token vocab size (occur more than 8 times): 3129
    Traceback (most recent call last):
      File "search_vqa.py", line 615, in <module>
        join=True
      File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
        while not spawn_context.join():
      File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 118, in join
        raise Exception(msg)
    Exception:
    
    -- Process 3 terminated with the following error:
    Traceback (most recent call last):
      File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
        fn(i, *args)
      File "/home/zhouxx/gprojects/mmnas/search_vqa.py", line 607, in mp_entrance
        exec.run()
      File "/home/zhouxx/gprojects/mmnas/search_vqa.py", line 585, in run
        self.search(train_loader, eval_loader)
      File "/home/zhouxx/gprojects/mmnas/search_vqa.py", line 268, in search
        for step, step_load in enumerate(train_loader):
      File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
        return self._process_data(data)
      File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
        data.reraise()
      File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/_utils.py", line 369, in reraise
        raise self.exc_type(msg)
    KeyError: Caught KeyError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
        data = fetcher.fetch(index)
      File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/zhouxx/gprojects/mmnas/mmnas/loader/load_data_vqa.py", line 224, in __getitem__
        frcn_feat = np.load(self.iid_to_frcn_feat_path[iid])
    KeyError: '463620'
    
    
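    The KeyError suggests that the Faster R-CNN feature file for image id 463620 is missing from the loader's mapping, i.e., the downloaded features are likely incomplete. A hedged diagnostic sketch (iid_to_frcn_feat_path comes from the traceback; all_iids stands for whatever id list the loader iterates over and is an assumed name):

        # List image ids that have no corresponding feature file on disk.
        missing = [iid for iid in all_iids
                   if iid not in dataset.iid_to_frcn_feat_path]
        print(len(missing), missing[:10])
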
    opened by ghost 1
  • Why does the warning occur? And is it necessary to fix it?

    The warning is as follows:

    lib/python3.6/site-packages/spacy/util.py:275: UserWarning: [W031] Model 'en_vectors_web_lg' (2.1.0) requires spaCy v2.1 and is incompatible with the current spaCy version (2.3.5). This may lead to unexpected results or runtime errors. To resolve this, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
      warnings.warn(warn_msg)
    

    Why does the warning occur? And is it necessary to fix it?

    opened by ghost 1
  • Some questions about how to calculate the gradient of 'alpha_prob'

    Why calculate the gradient of alpha_probs like this?

    probs = self.probs_over_ops.data
    for i in range(self.n_choices):
        for j in range(self.n_choices):
            self.alpha_prob.grad.data[i] += binary_grads[j] * probs[j] * (self.delta_ij(i, j) - probs[i])
    

    The code is in MixedOp.set_arch_param_grad().
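
    For context, a hedged reading of that loop: with $p = \mathrm{softmax}(\alpha)$ and binary_grads[j] holding $\partial L / \partial g_j$ for the binary gates $g$, the loop applies the chain rule through the softmax, whose Jacobian is $\partial p_j / \partial \alpha_i = p_j(\delta_{ij} - p_i)$:

        \frac{\partial L}{\partial \alpha_i} = \sum_j \frac{\partial L}{\partial g_j}\, p_j\, (\delta_{ij} - p_i)

    This matches the binary-gate gradient estimator used in ProxylessNAS-style search (an inference from the code, not the authors' statement).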

    opened by Jay1Zhang 0
  • Potential bugs in `train_itm.py` when generating negative samples

    As the following code shows, https://github.com/MILVLG/mmnas/blob/552e29e666625819799ca22de324df2be50626cc/train_itm.py#L335-L353

    And here is what confuses me: https://github.com/MILVLG/mmnas/blob/552e29e666625819799ca22de324df2be50626cc/train_itm.py#L336-L338 Why use negative caption indices to get the corresponding image features? I think these three lines of code should be removed.

    opened by ghost 0
  • Some questions about the performance when searching with `MODE='full'` or `MODE='two'`

    I have tried searching VQA with both MODE='full' and MODE='two'. Searching with MODE='two' uses less GPU memory, which is expected, but it takes more time to search with MODE='two' than with MODE='full', which confuses me. Could anyone offer an explanation? I'd appreciate it!

    opened by ghost 0