Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

Overview

This repository contains code for the following two papers:

The VisualBERT model has also been integrated into several libraries, such as Hugging Face Transformers (many thanks to Gunjan Chhablani, who made it work) and Facebook MMF.

Thanks~

Comments
  • Extracting image features for VQA


    https://github.com/uclanlp/visualbert#extracting-image-features

    Could you go into more detail? Should we install the custom pytorch into a new virtual environment, so it doesn't break the pytorch used in training the model? What command do we run with detectron to extract features?
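Whichever extractor is used, a detector typically returns a variable number of region feature vectors per image, while a batched model expects a fixed-size input plus a mask marking real regions. The sketch below shows that padding step in plain Python. The helper name `pad_region_features` is hypothetical, and the toy 4-dimensional features stand in for real detector outputs; neither reflects this repository's actual feature format.

```python
from typing import List, Tuple

Feature = List[float]

def pad_region_features(
    regions: List[Feature], max_boxes: int, feat_dim: int
) -> Tuple[List[Feature], List[int]]:
    """Truncate or zero-pad per-image region features to exactly max_boxes rows.

    Returns the padded feature matrix and a visual attention mask
    (1 = real region, 0 = padding). Hypothetical helper for illustration.
    """
    kept = regions[:max_boxes]
    for row in kept:
        if len(row) != feat_dim:
            raise ValueError(f"expected {feat_dim}-dim features, got {len(row)}")
    # Zero rows fill the remaining slots so every image has the same shape.
    padding = [[0.0] * feat_dim for _ in range(max_boxes - len(kept))]
    mask = [1] * len(kept) + [0] * len(padding)
    return kept + padding, mask

# Example: an image with 3 detected regions of 4-dim features, padded to 5 boxes.
feats, mask = pad_region_features([[0.1] * 4, [0.2] * 4, [0.3] * 4], max_boxes=5, feat_dim=4)
print(len(feats), mask)  # 5 [1, 1, 1, 0, 0]
```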

    opened by johntiger1 6
  • The ROIAlign module used in VCR experiments crashes when forwarding


    Hi, I am running the experiments on VCR following the instructions in the README. I have installed the custom torchvision and detectron modules. However, when the COCO pre-training process begins, the program is terminated by a segmentation fault while tensors are being forwarded through the ROIAlign module. Could you help me figure out this issue? Thank you very much!

    opened by yangapku 4
  • COCO features


    Hi! Thank you for your excellent work. I noticed that we downloaded COCO features separately for NLVR, VQA and VCR. What is the difference between the features? Are they from different models of detectron2? By the way, could you please provide the script for generating Flickr30k features?

    opened by qinzzz 2
  • Using visualBERT for generation


    Hi, great work with this; it's very clearly explained and I'm enjoying tinkering around with it. I wanted to try using it for text generation, for example captioning images. Could you give some guidance on how I could proceed? I think it will require adding a decoder stack on top of the encoder, and it could be trained on COCO (which has captions) itself in the same way, i.e. MLM plus fine-tuning on COCO, right? https://arxiv.org/pdf/2003.01473.pdf: these authors have done this, and their approach is slightly different in that they use two BERT encoders in parallel to encode images and text separately. Do you think generation like that would be possible with VisualBERT, and how do you think I can proceed to try it out? Since you say your version of BERT is from Hugging Face, maybe I can use a decoder stack from them? Alternatively, Hugging Face itself has an EncoderDecoder class; this may work once trained, right, if I preprocess image features the same way you have here?
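Whatever decoder is attached, captioning ultimately decodes text one token at a time. As a framework-agnostic illustration, the sketch below implements plain greedy decoding around a hypothetical `next_token_scores(prefix)` callable, which stands in for a decoder conditioned on VisualBERT's encoder outputs. The BERT-style `[CLS]`/`[SEP]` start and end tokens, the toy vocabulary, and the scoring function are all assumptions made for the example.

```python
from typing import Callable, Dict, List

def greedy_decode(
    next_token_scores: Callable[[List[str]], Dict[str, float]],
    bos: str = "[CLS]",
    eos: str = "[SEP]",
    max_len: int = 20,
) -> List[str]:
    """Repeatedly append the highest-scoring token until EOS or max_len tokens."""
    prefix = [bos]
    for _ in range(max_len):
        scores = next_token_scores(prefix)
        best = max(scores, key=scores.get)
        if best == eos:
            break
        prefix.append(best)
    return prefix[1:]  # drop the start token

# Toy scorer (hypothetical): always prefers the next word of a fixed caption.
CAPTION = ["a", "dog", "on", "grass"]

def toy_scores(prefix: List[str]) -> Dict[str, float]:
    step = len(prefix) - 1
    target = CAPTION[step] if step < len(CAPTION) else "[SEP]"
    vocab = CAPTION + ["[SEP]"]
    return {tok: (1.0 if tok == target else 0.0) for tok in vocab}

print(greedy_decode(toy_scores))  # ['a', 'dog', 'on', 'grass']
```

Greedy search is the simplest choice; beam search or sampling would slot into the same loop by changing how the next token is picked.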

    opened by nishanthcgit 2
  • Mask Probability for Task-specific Pre-training


    Hi, in your paper you mention that task-specific pre-training is also using masked-language modelling similar to task-agnostic pre-training. However, I cannot find any mask probability for the task-specific pre-training .json files. Why is no probability specified & did you use the same 15% probability as for task agnostic pre-training?

    Sorry perhaps I'm missing something here -- Thanks for the help!
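For reference, the standard BERT masking recipe selects each token for prediction with probability 15% and, of the selected tokens, replaces 80% with `[MASK]`, 10% with a random token, and leaves 10% unchanged. Whether the task-specific configs here use the same rate is exactly the open question above; the sketch only illustrates the recipe. The function name `mask_tokens` and the `-100` "ignore" label (a common PyTorch convention) are assumptions, and the example ids follow the `bert-base-uncased` vocabulary.

```python
import random
from typing import List, Optional, Tuple

IGNORE = -100  # label value meaning "no MLM loss at this position"

def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """BERT-style masking: returns (corrupted input ids, MLM labels)."""
    rng = rng or random.Random()
    inputs, labels = list(token_ids), [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token
    return inputs, labels

ids, labels = mask_tokens([101, 2023, 2003, 1037, 3899, 102], mask_id=103,
                          vocab_size=30522, rng=random.Random(0))
```

A full implementation would also exclude special tokens such as `[CLS]` and `[SEP]` from selection; that step is omitted to keep the sketch short.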

    opened by Muennighoff 2
  • Pre-training on other BERT models


    Thanks for the great repo and your efforts! Two quick questions:

    Is there anything that speaks against pre-training VisualBERT with ALBERT instead of BERT on COCO and then fine-tuning it for downstream tasks? Also, I haven't found exact details on the resources needed for pre-training, except that it took less than a day on COCO according to your paper. How many hours did it take, and what GPUs did you use?

    opened by Muennighoff 2
  • Flickr30k Entities fine-tuning clarification


    Hi,

    Thanks for the open-source repository.

    I was wondering: how did you implement fine-tuning for the Flickr30k Entities dataset? From the ACL short paper:

    [Screenshot of the relevant passage from the ACL short paper]

    What is the loss between the predicted alignment and the ground-truth alignment? As noted in your preprint on arXiv, this can be a bit complicated because the ground-truth alignment can have multiple boxes.
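One common way to handle several ground-truth boxes for a single phrase, offered here only as an illustrative option and not as the method used in the paper, is to maximize the total softmax probability assigned to all gold boxes (a marginal log-likelihood). With a single gold box it reduces to ordinary cross-entropy. The function name `multi_gold_nll` and its inputs are hypothetical.

```python
import math
from typing import List, Sequence

def _logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def multi_gold_nll(scores: Sequence[float], gold: List[int]) -> float:
    """-log sum_{b in gold} softmax(scores)[b], computed via log-sum-exp."""
    if not gold:
        raise ValueError("need at least one gold box")
    return _logsumexp(scores) - _logsumexp([scores[b] for b in gold])

# Alignment scores for 4 candidate boxes; boxes 1 and 2 both match the phrase.
loss = multi_gold_nll([0.5, 2.0, 1.5, -1.0], gold=[1, 2])
```

Because the loss only asks that probability mass land somewhere among the gold boxes, the model is not penalized for how it splits that mass between equally valid boxes.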

    opened by tonyduan 2
  • config_vcr is nowhere to be found


    I am running the following command:

    python train.py --folder log --config ../configs/vcr/fine-tune-qa.json
    
    Traceback (most recent call last):
      File "train.py", line 26, in <module>
        from visualbert.dataloaders.vcr import VCR, VCRLoader
      File "/auto/nlg-05/chengham/third-party/visualbert/dataloaders/vcr.py", line 20, in <module>
        from dataloaders.box_utils import load_image, resize_image, to_tensor_and_normalize
      File "/auto/nlg-05/chengham/third-party/visualbert/dataloaders/box_utils.py", line 8, in <module>
        from config_vcr import USE_IMAGENET_PRETRAINED
    ModuleNotFoundError: No module named 'config_vcr'
    

    I searched the whole repo and cannot find the file.

    opened by ChenghaoMou 2
  • Features vqacoco-pre-train


    Hi,

    Thank you for this repo! I would like to know which visual features were used for the checkpoint visualbert/configs/vqa/coco-pre-train.json.

    Which model were the image features extracted from? Do you have the checkpoint (for example, Detectron e2e_mask_rcnn_R-101-FPN_2x, model_id: 35861858) so that I can use the model with different images?

    opened by RitaRamo 1
  • About evaluation



    Can I use the models on the Hugging Face model hub to run evaluation without fine-tuning and get the performance reported in your paper?

    1. "uclanlp/visualbert-vqa" to evaluate VQA
    2. "uclanlp/visualbert-nlvr2" to evaluate NLVR2
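When comparing Hub checkpoints against reported numbers, predictions need to be scored with the same metric. VQA is usually evaluated with a soft accuracy in which an answer gets full credit if at least 3 of the 10 human annotators gave it. The sketch below uses that commonly cited simplified form, min(#matches / 3, 1); it omits the official evaluator's answer normalization and its averaging over annotator subsets, so its numbers may differ slightly from official scores.

```python
from typing import List

def vqa_soft_accuracy(prediction: str, human_answers: List[str]) -> float:
    """Simplified VQA accuracy: min(#annotators agreeing / 3, 1)."""
    matches = sum(1 for a in human_answers if a == prediction)
    return min(matches / 3.0, 1.0)

answers = ["yes"] * 2 + ["no"] * 8
print(vqa_soft_accuracy("yes", answers))  # 0.6666666666666666
print(vqa_soft_accuracy("no", answers))   # 1.0
```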
    opened by renmada 1
  • seq_relationship_score logits order


    I'm testing this model on the image-sentence-alignment task and I'm observing weird results.

    By running the pretrained COCO model in eval mode on COCO17, I get results below random chance (using basically the same setting as for pre-training).

    The 'seq_relationship_score' returns two logits, and according to the documentation:

    • index 0 is "next sentence is the continuation"
    • index 1 is "next sentence is random"

    Following the documentation, I get results that would make much more sense if the meaning of the logits were flipped.

    Moreover, that part of the code seems to have been borrowed from the transformers library, and recently a similar issue has been found in another BERT-based model: https://github.com/huggingface/transformers/issues/9212

    We are conducting experiments with your model and it would be convenient for us just to ignore the documentation and to report the results flipped.

    It would be great if you could clarify this point!

    Thank you in advance!
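The question comes down to which of the two `seq_relationship_score` logits means "the sentence matches the image". Rather than hard-coding the documented order, one option is to make that index an explicit parameter and check it empirically on a few known matching and mismatching pairs. The helper `match_probability` below is a hypothetical sketch that operates on plain Python floats rather than model tensors.

```python
import math
from typing import Sequence

def match_probability(logits: Sequence[float], match_index: int = 0) -> float:
    """Softmax over the two logits; return the probability at match_index.

    match_index=0 follows the documented convention ("is the continuation");
    pass 1 if experiments show the order is flipped.
    """
    if len(logits) != 2 or match_index not in (0, 1):
        raise ValueError("expected two logits and match_index in {0, 1}")
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[match_index] / sum(exps)

logits = [2.0, -1.0]
p0 = match_probability(logits, match_index=0)
p1 = match_probability(logits, match_index=1)
print(round(p0 + p1, 6))  # 1.0
```

Averaging `match_probability` over gold image-caption pairs and over randomly paired captions, under both index choices, shows which convention yields above-chance alignment accuracy.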

    opened by michelecafagna26 1
  • CVE-2007-4559 Patch


    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.
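The kind of check the patch describes can be sketched with the standard library alone: resolve each member's destination and refuse to extract if any would land outside the target directory. This is a minimal illustration, not the code from the pull request. It only checks member names, so symlink and hard-link members need separate handling; where the running Python supports extraction filters (`tarfile.data_filter` exists), the sketch also passes `filter="data"`, which rejects such links.

```python
import io
import os
import tarfile
import tempfile

def safe_extractall(tar: tarfile.TarFile, path: str = ".") -> None:
    """Extract only if every member name resolves inside `path` (CVE-2007-4559)."""
    base = os.path.realpath(path)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(base, member.name))
        if os.path.commonpath([base, target]) != base:
            raise ValueError(f"blocked path traversal in tar member: {member.name!r}")
    if hasattr(tarfile, "data_filter"):
        tar.extractall(path, filter="data")  # newer Pythons: also vet links/permissions
    else:
        tar.extractall(path)

def demo_blocks_traversal() -> bool:
    """Build an archive containing '../evil.txt' and confirm extraction is refused."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"pwned"
        info = tarfile.TarInfo(name="../evil.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    with tempfile.TemporaryDirectory() as dest, tarfile.open(fileobj=buf) as tar:
        try:
            safe_extractall(tar, dest)
        except ValueError:
            return True
    return False

print(demo_blocks_traversal())  # True
```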

    opened by TrellixVulnTeam 0
  • VisualBERT with Detectron2


    Hi,

    I was wondering whether VisualBERT can be used out of the box (from Hugging Face) with Detectron2? I followed this nice tutorial (also linked in the same Hugging Face page) for extracting embeddings with Detectron2, but the VisualBERT paper states that it was trained with Detectron rather than Detectron2. Do I have to do my own pretraining then in order to use Detectron2 embeddings?

    Thanks in advance!

    opened by smfsamir 0
  • How to use VisualBERT for visual grounding (entity grounding)?


    Papers with Code (https://paperswithcode.com/sota/phrase-grounding-on-flickr30k-entities-test?metric=R%4010) shows that VisualBERT can be used for entity grounding. Can you please tell me how to achieve this?
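The leaderboard linked above reports Recall@K for Flickr30k Entities phrase grounding: a phrase counts as correctly grounded if any of the model's top-K ranked boxes overlaps a ground-truth box with an IoU of at least 0.5. The sketch below computes that metric from already-ranked boxes. It assumes `(x1, y1, x2, y2)` box coordinates and does not show how VisualBERT itself would produce the ranking.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def recall_at_k(ranked: Sequence[Sequence[Box]], gold: Sequence[Sequence[Box]],
                k: int, thresh: float = 0.5) -> float:
    """Fraction of phrases whose top-k boxes hit any gold box at IoU >= thresh."""
    hits = 0
    for preds, golds in zip(ranked, gold):
        if any(iou(p, g) >= thresh for p in preds[:k] for g in golds):
            hits += 1
    return hits / len(ranked) if ranked else 0.0

# Two phrases, each with two ranked candidate boxes and one gold box.
ranked = [[(0, 0, 10, 10), (50, 50, 60, 60)], [(0, 0, 5, 5), (20, 20, 30, 30)]]
gold = [[(0, 0, 10, 10)], [(20, 20, 30, 30)]]
print(recall_at_k(ranked, gold, k=1))  # 0.5
print(recall_at_k(ranked, gold, k=2))  # 1.0
```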

    opened by gagaein 0
  • "VisualBERTDetector not in acceptable choices for type"

    I encountered this error when pre-training on VCR. How can I solve it? allennlp.common.checks.ConfigurationError: "VisualBERTDetector not in acceptable choices for type: ['bcn', 'constituency_parser', 'biaffine_parser', 'coref', 'crf_tagger', 'decomposable_attention', 'event2mind', 'simple_seq2seq', 'bidaf', 'bidaf-ensemble', 'dialog_qa', 'nlvr_coverage_parser', 'nlvr_direct_parser', 'quarel_parser', 'wikitables_mml_parser', 'wikitables_erm_parser', 'atis_parser', 'text2sql_parser', 'srl', 'simple_tagger', 'esim', 'bimpm', 'graph_parser', 'bidirectional-language-model']"
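AllenNLP resolves the `type` field of a model config through a registry that is filled in when a module defining a model class is imported (the class is decorated with `@Model.register(...)`). The error above lists only AllenNLP's built-in models, which suggests that the module registering `VisualBERTDetector` had not been imported when the config was parsed. The toy registry below is written in plain Python, not with AllenNLP, and only reproduces that failure mode; the class names are illustrative.

```python
from typing import Callable, Dict

class Registry:
    """Minimal name -> class registry, loosely mimicking a Registrable pattern."""
    _classes: Dict[str, type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        def decorator(klass: type) -> type:
            cls._classes[name] = klass  # registration happens when the class is defined
            return klass
        return decorator

    @classmethod
    def by_name(cls, name: str) -> type:
        if name not in cls._classes:
            raise ValueError(f"{name} not in acceptable choices: {sorted(cls._classes)}")
        return cls._classes[name]

@Registry.register("simple_tagger")
class SimpleTagger:  # registered because this definition has already run
    pass

def lookup(name: str) -> str:
    try:
        return Registry.by_name(name).__name__
    except ValueError as err:
        return f"error: {err}"

print(lookup("simple_tagger"))       # SimpleTagger
print(lookup("VisualBERTDetector"))  # error: VisualBERTDetector not in acceptable choices: ['simple_tagger']

# Defining (i.e. importing) the class registers it; the same lookup now succeeds.
@Registry.register("VisualBERTDetector")
class VisualBERTDetector:
    pass

print(lookup("VisualBERTDetector"))  # VisualBERTDetector
```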

    opened by menggehe 1
Owner
Natural Language Processing @UCLA
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

T5: Text-To-Text Transfer Transformer The t5 library serves primarily as code for reproducing the experiments in Exploring the Limits of Transfer Lear

Google Research 4.6k Jan 1, 2023
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

T5: Text-To-Text Transfer Transformer The t5 library serves primarily as code for reproducing the experiments in Exploring the Limits of Transfer Lear

Google Research 3.2k Feb 17, 2021
Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper

Data Augmentation using Pre-trained Transformer Models Code associated with the Data Augmentation using Pre-trained Transformer Models paper Code cont

null 44 Dec 31, 2022
Code for CVPR 2021 paper: Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning

Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning This is the PyTorch companion code for the paper: A

Amazon 69 Jan 3, 2023
This repository will contain the code for the CVPR 2021 paper "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields"

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields Project Page | Paper | Supplementary | Video | Slides | Blog | Talk If

null 1.1k Dec 27, 2022
Code for ACL 2021 main conference paper "Conversations are not Flat: Modeling the Intrinsic Information Flow between Dialogue Utterances".

Conversations are not Flat: Modeling the Intrinsic Information Flow between Dialogue Utterances This repository contains the code and pre-trained mode

ICTNLP 90 Dec 27, 2022
Code from the paper "High-Performance Brain-to-Text Communication via Handwriting"

Code from the paper "High-Performance Brain-to-Text Communication via Handwriting"

Francis R. Willett 305 Dec 22, 2022
source code for paper: WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach.

WhiteningBERT Source code and data for paper WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach. Preparation git clone https://github.com

null 49 Dec 17, 2022
Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation This repository is the pytorch implementation of our paper: Hierarchical Cr

null 44 Jan 6, 2023
Code for our paper "Mask-Align: Self-Supervised Neural Word Alignment" in ACL 2021

Mask-Align: Self-Supervised Neural Word Alignment This is the implementation of our work Mask-Align: Self-Supervised Neural Word Alignment. @inproceed

THUNLP-MT 46 Dec 15, 2022
Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

ConSERT Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer Requirements torch==1.6.0

Yan Yuanmeng 478 Dec 25, 2022
Code for our ACL 2021 (Findings) Paper - Fingerprinting Fine-tuned Language Models in the wild .

Fingerprinting Fine-tuned Language Models in the wild This is the code and dataset for our ACL 2021 (Findings) Paper - Fingerprinting Fine-tuned La

LCS2-IIITDelhi 5 Sep 13, 2022
Code for our paper "Transfer Learning for Sequence Generation: from Single-source to Multi-source" in ACL 2021.

TRICE: a task-agnostic transferring framework for multi-source sequence generation This is the source code of our work Transfer Learning for Sequence

THUNLP-MT 9 Jun 27, 2022
Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification"

PTR Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification" If you use the code, please cite the following paper: @art

THUNLP 118 Dec 30, 2022
null 189 Jan 2, 2023
Code for paper "Which Training Methods for GANs do actually Converge? (ICML 2018)"

GAN stability This repository contains the experiments in the supplementary material for the paper Which Training Methods for GANs do actually Converg

Lars Mescheder 884 Nov 11, 2022
This is the code for the EMNLP 2021 paper AEDA: An Easier Data Augmentation Technique for Text Classification

The baseline code is for EDA: Easy Data Augmentation techniques for boosting performance on text classification tasks

Akbar Karimi 81 Dec 9, 2022
This repository contains the code for EMNLP-2021 paper "Word-Level Coreference Resolution"

Word-Level Coreference Resolution This is a repository with the code to reproduce the experiments described in the paper of the same name, which was a

null 79 Dec 27, 2022