Code for the paper "Multitask-Finetuning of Zero-shot Vision-Language Models"

Overview

Downloading our datasets

Dataset structure

  • Each dataset may have several subdatasets (most of them only have one)
    |dataset/
        -|subdataset_1/
        -|subdataset_2/
        ...
        -|pickled/
            -|tensor_dict.pt
  • The pickle file tensor_dict.pt has the following format (see the loading sketch after this list):
{
    'subdataset_1':{
        'label_1':{
            'image_tensors':np.array((N,3,224,224)), # N: image number
            'input_ids':np.array(S), # S: token length of the filled template text
            'attention_masks':np.array(S),
            'template_input_ids':np.array(S_), # S_: token length of the un-filled template text
            'template_attention_masks':np.array(S_),
        },
        'label_2':{
            ...
        }
    },
    ...
}
  • The ABO dataset contains an additional label_to_text.json file, which provides the text template for each subdataset and label.
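For example, here is a minimal sketch for loading and inspecting a tensor_dict.pt file (the dataset path and label key are illustrative, and the file is assumed to be loadable with torch.load):

    import torch

    # Load the pickled tensor dictionary (assumed to have been saved with torch.save).
    tensor_dict = torch.load("data/ClevrCounting/pickled/tensor_dict.pt")

    # Inspect one label of one subdataset; the key names here are illustrative.
    label_data = tensor_dict["counting"]["label_1"]
    print(label_data["image_tensors"].shape)  # (N, 3, 224, 224)
    print(label_data["input_ids"].shape)      # (S,)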

A list of available datasets and subdatasets

Dataset                         dataset name (-i)   subdataset name (-d)
Clevr Counting                  ClevrCounting       counting
Amazon Berkeley Objects (ABO)   ABO                 material, color
Caltech-UCSD Birds 200 (CUB)    CUB                 classification
Fungi                           Fungi               classification
Mini-ImageNet                   mini                classification

Training with provided datasets

run.sh provides example code for performing training and meta-testing on our datasets.

Output format

Each model checkpoint directory contains two files:

  • step1.ckpt: the model checkpoint after the training phase
  • dev_test_results.json: scores for each task configuration on the dev and test sets during meta-testing (see the reading sketch below)
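A minimal sketch for reading the scores (the checkpoint directory is a placeholder, and the loop assumes the JSON is a flat mapping from task configuration to scores):

    import json

    # Read the meta-testing results; replace the path with your checkpoint directory.
    with open("checkpoints/run_1/dev_test_results.json") as f:
        results = json.load(f)

    # Assumed layout: {task_configuration: scores}.
    for task_config, scores in results.items():
        print(task_config, scores)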

Loading checkpoint

  • Here is an example snippet for loading step1.ckpt from multitask-finetuning/classical-finetuning/zeroshot models:

    model = MultitaskFinetuneCLIP()
    model = model.load_from_checkpoint(checkpoint_path="<path_to_checkpoint_dir>/step1.ckpt")
  • Here is an example snippet for loading step1.ckpt from fomaml models:

    import torch
    import learn2learn as l2l

    model = LightningCLIP()
    model = l2l.algorithms.MAML(model, lr=1e-5, first_order=True)
    model.load_state_dict(torch.load("<path_to_checkpoint_dir>/step1.ckpt"))

Training with custom datasets

preprocess dataset

  • Put your new dataset, in the same format as the provided datasets, into data/
  • Specify template_function or the path to a label_to_text json file (an example can be found in /data/ABO/label_to_text.json) at lines 350 and 355 in data.py; see the sketch after this list
  • preprocess.sh provides an example of running data.py to create the pickle file for your new dataset
  • Add your dataset to construct_dataset(): line 77 in train.py and line 80 in train_MAML.py
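For illustration, a minimal sketch of a template function (the name, signature, and prompt wording are assumptions; match them to what data.py expects):

    # Hypothetical template function: maps a class label to the filled prompt text.
    def template_function(label: str) -> str:
        return f"a photo of a {label}"

    # e.g. template_function("sparrow") -> "a photo of a sparrow"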

train

  • Modify run.sh to train and meta-test on your own dataset
  • Refer to train.py and train_MAML.py for the default and tunable hyperparameters of each algorithm

Citation
