# CLIORA
This is the official codebase for the ICLR 2022 oral paper *Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling*.
We introduce the new task of unsupervised vision-language grammar induction and devise a model, the Contrastive Language-Image Inside-Outside Recursive Autoencoder (CLIORA), to solve it. Please read our paper for more details: https://openreview.net/forum?id=N0n_QyQ5lBF.
This code follows the implementation architecture of DIORA.
Please cite our paper as follows:
```
@inproceedings{wan2022cliora,
  title={Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling},
  author={Wan, Bo and Han, Wenjuan and Zheng, Zilong and Tuytelaars, Tinne},
  booktitle={The International Conference on Learning Representations (ICLR)},
  year={2022},
}
```
## Environment and Data
Install dependencies (using Conda as a virtual environment):

```
conda create -n cliora python=3.8
conda activate cliora
pip install -r requirements.txt
```
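Before training, you can optionally confirm that PyTorch sees your GPUs. This is a minimal sanity-check sketch that assumes nothing beyond PyTorch being installed via requirements.txt:

```python
# check_env.py -- quick, illustrative sanity check of the installed environment.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU count:", torch.cuda.device_count())
```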
Download flickr_data and outputs, and arrange the files in the following structure:
```
cliora
├───cliora
│     ├─...
│
├───flickr_data
│     ├─flickr_feat_maf
│
├───outputs
      ├─flickr
```
We use the same object features as MAF. Download train_features_compress.hdf5, val_features_compress.hdf5, and test_features_compress.hdf5 to flickr_data/flickr_feat_maf.
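To spot-check the downloaded features, you can open one of the HDF5 files with h5py. The sketch below is illustrative only; the per-image key layout it prints is an assumption about the MAF files, not something documented here:

```python
# inspect_feats.py -- peek inside an MAF feature file (illustrative; key layout is an assumption).
import h5py

with h5py.File("flickr_data/flickr_feat_maf/val_features_compress.hdf5", "r") as f:
    keys = list(f.keys())
    print("number of entries:", len(keys))
    first = f[keys[0]]
    # each entry is assumed to hold a per-image array of object features
    print("first key:", keys[0], "shape:", getattr(first, "shape", None))
```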
## Running CLIORA
```
export PYTHONPATH=$(pwd):$PYTHONPATH

## Train DIORA
sh train_diora.sh

## Test DIORA
sh test_diora.sh

## Train CLIORA based on DIORA
sh train_cliora.sh

## Test CLIORA
sh test_cliora.sh
```
## Multi-GPU Training
Single-GPU training:

```
export CUDA_VISIBLE_DEVICES=0
python -m cliora.scripts.train \
    --cuda \
    ... # other args
```
Multi-GPU training:

```
export CUDA_VISIBLE_DEVICES=0,1,2,3
export NGPUS=4
python -m torch.distributed.launch --nproc_per_node=$NGPUS cliora/scripts/train.py \
    --cuda \
    --multigpu \
    ... # other args
```
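For context, torch.distributed.launch spawns one process per GPU, passes each process a --local_rank argument, and sets the rendezvous environment variables. The sketch below shows the generic PyTorch DDP pattern such a launched script follows; it is not CLIORA's exact training code:

```python
# ddp_pattern.py -- generic pattern expected by torch.distributed.launch (illustrative).
import argparse
import torch

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)  # injected by the launcher
args = parser.parse_args()

# Bind this process to its GPU, then join the process group; init_process_group
# reads MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE from the launcher's environment.
torch.cuda.set_device(args.local_rank)
torch.distributed.init_process_group(backend="nccl")

# Wrap the model so gradients are all-reduced across the NGPUS processes.
model = torch.nn.Linear(10, 10).cuda()
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.local_rank])
```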
## Visualization
Download the Flickr30K Entities Dataset and put the image folder flickr_images under flickr_data/. Add --visualize when running test_cliora.sh:
```
# test_cliora.sh
python cliora/scripts/parse.py \
    --cuda \
    --visualize \
    --obj_feats \
    ... # other args
```
## Word Embedding
We provide randomly initialized word embeddings, skip-thoughts embeddings, and ELMo embeddings. If you use ELMo embeddings and specify --elmo_cache_dir, the context-insensitive ELMo vectors will be cached, making them much faster to load after the initial run.
Example Usage:

```
word_emb=elmo  # choose one of: none / skip / elmo
python cliora/scripts/train.py \
    --emb $word_emb \
    ... # other args
```
## License
Copyright 2018, University of Massachusetts Amherst
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.