Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

OFA

[Paper] [Blog] [Colab] [Spaces]

Overview

OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, and language) and tasks (e.g., image generation, visual grounding, image captioning, image classification, text generation, etc.) within a simple sequence-to-sequence learning framework. For more information, please refer to our paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework.
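
Concretely, every task is expressed as an (instruction + input) to target-sequence pair, so one encoder-decoder handles them all. A toy illustration of the idea (our own sketch, not repo code; prompt wording and location-token names are only indicative):

from typing import Dict, List

# Sketch: heterogeneous tasks unified as sequence-to-sequence pairs.
examples: List[Dict[str, str]] = [
    # Image captioning: instruction + image -> caption text.
    {"image": "example1.jpg",
     "source": "what does the image describe?",
     "target": "a dog running on the beach"},
    # Visual grounding: the target region becomes quantized location tokens.
    {"image": "example2.jpg",
     "source": 'which region does the text "red umbrella" describe?',
     "target": "<bin_132> <bin_87> <bin_411> <bin_296>"},
    # VQA: question in, answer text out.
    {"image": "example3.jpg",
     "source": "how many birds are in the picture?",
     "target": "two"},
]

Because inputs and outputs are all token sequences drawn from one vocabulary (text, image codes, and location bins), a single encoder-decoder Transformer can be trained on all of these tasks at once.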

News

  • 2022.2.13: Released the demo of image captioning on Hugging Face Spaces. Have fun!
  • 2022.2.11: Released the Colab notebook for image captioning. Enjoy!
  • 2022.2.11: Released the pretrained checkpoint of OFA-Large and the complete (2-stage) finetuning code for image captioning.
  • 2022.2.10: Released the inference code & finetuned checkpoint for image captioning, which can reproduce the results on the COCO Karpathy test split (149.6 CIDEr).

TODO

  • To release finetuning and inference code for multimodal downstream tasks soon, including image captioning, VQA, text-to-image generation, SNLI-VE, referring expression comprehension, etc.
  • To release pretraining code soon.

Approach

approach

Requirements

  • Python 3.7.4
  • PyTorch 1.8.1
  • torchvision 0.9.1
  • Java 1.8 (for COCO evaluation)

Installation

git clone https://github.com/OFA-Sys/OFA
cd OFA
pip install -r requirements.txt

Datasets and Checkpoints

See datasets.md and checkpoints.md.

Pretraining

To be released soon :)

Finetuning & Inference

Below we provide instructions for finetuning and inference on different downstream tasks.

Caption

  1. Download data and files and put them in the correct directory.
  2. Train:
cd run_scripts/caption
nohup sh train_caption_stage1.sh &  # stage 1: train with cross-entropy loss
nohup sh train_caption_stage2.sh &  # stage 2: load the best stage-1 checkpoint and train with CIDEr optimization
  3. Inference:
cd run_scripts/caption ; sh evaluate_caption.sh  # inference & evaluation
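
The checkpoints can also be loaded programmatically. A minimal sketch using fairseq utilities (our own sketch; the checkpoint filename is a placeholder, and the BPE files are assumed to sit under utils/BPE in the repo):

from fairseq import checkpoint_utils, utils

# Load a finetuned caption checkpoint together with its task and generation config.
overrides = {"bpe_dir": "utils/BPE"}  # assumed location of the BPE files
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    utils.split_paths("checkpoints/caption_best.pt"),  # placeholder checkpoint path
    arg_overrides=overrides,
)
model = models[0].eval()
generator = task.build_generator(models, cfg.generation)  # beam-search generator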

Gallery

Below we provide examples of OFA on text-to-image generation and open-ended VQA. We also demonstrate its performance on an unseen task (grounded QA) and an unseen domain (visual grounding on images from unseen domains).

Text-to-Image Generation (normal query)

t2i_normal

Text-to-Image Generation (counterfactual query)

t2i_counterfactual

Open-Ended VQA

open_vqa

Grounded QA (unseen task)

grounded_qa

Visual Grounding (unseen domain)

vg

Citation

Please cite our paper if you find it helpful :)

@article{wang2022OFA,
  title={Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework},
  author={Wang, Peng and Yang, An and Men, Rui and Lin, Junyang and Bai, Shuai and Li, Zhikang and Ma, Jianxin and Zhou, Chang and Zhou, Jingren and Yang, Hongxia},
  journal={arXiv e-prints},
  pages={arXiv--2202},
  year={2022}
}

License

Apache-2.0

Comments
  • Inference model using Huggingface library

First of all, thanks for your amazing work. 👍 I'm very surprised at the results you've achieved, but I have a question. Is it possible to use this model with the transformers library, loading it from the model checkpoint? You made inference possible in Hugging Face Spaces, so are you planning to upload a checkpoint to the transformers library and use that library for inference? The Colab you posted only shows how to use fairseq. I'll be waiting for the reply. Once again, thank you for the amazing results!

    enhancement 
    opened by fightnyy 20
  • ConfigAttributeError when load the checkpoint

Hi, thanks for the great work! I ran into a problem when loading the pre-trained checkpoint (refcocog_large_best.pt). I load the model with:

    from fairseq import checkpoint_utils, utils

    overrides = {"bpe_dir": "utils/BPE"}
    models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
        utils.split_paths('checkpoints/refcocog.pt'),
        arg_overrides=overrides
    )
    

This error occurs:

    Traceback (most recent call last):
      File "eval_refcoco.py", line 22, in <module>
        arg_overrides=overrides
      File "/home/tiger/.local/lib/python3.7/site-packages/fairseq-1.0.0a0+4095baa-py3.7-linux-x86_64.egg/fairseq/checkpoint_utils.py", line 457, in load_model_ensemble_and_task
        model = task.build_model(cfg.model)
      File "/opt/tiger/OFA_offical/tasks/mm_tasks/refcoco.py", line 79, in build_model
        if self.cfg.scst:
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 305, in __getattr__
        self._format_and_raise(key=key, value=None, cause=e)
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/base.py", line 101, in _format_and_raise
        type_override=type_override,
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/_utils.py", line 629, in format_and_raise
        _raise(ex, cause)
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/_utils.py", line 610, in _raise
        raise ex  # set end OC_CAUSE=1 for full backtrace
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 303, in __getattr__
        return self._get_impl(key=key, default_value=DEFAULT_VALUE_MARKER)
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 361, in _get_impl
        node = self._get_node(key=key)
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 383, in _get_node
        self._validate_get(key)
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 136, in _validate_get
        key=key, value=value, cause=ConfigAttributeError(msg)
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/base.py", line 101, in _format_and_raise
        type_override=type_override,
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/_utils.py", line 694, in format_and_raise
        _raise(ex, cause)
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/_utils.py", line 610, in _raise
        raise ex  # set end OC_CAUSE=1 for full backtrace
    omegaconf.errors.ConfigAttributeError: Key 'scst' not in 'RefcocoConfig'
            full_key: scst
            reference_type=Optional[RefcocoConfig]
            object_type=RefcocoConfig
    

    I would appreciate your help!
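
    A workaround that should tolerate checkpoints whose saved config predates the scst field is to guard the attribute access in tasks/mm_tasks/refcoco.py; a sketch, not an official fix:

    # Instead of the direct access that raises on older checkpoints:
    #     if self.cfg.scst:
    # fall back to a default when the key is absent from the saved config:
    if getattr(self.cfg, "scst", False):
        ...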

    opened by zd11024 12
  • Problems with finetuned model for VQAv2 (ms coco)

I am running inference on the VQA val set manually to get all answers using your demo Colab notebook. I followed the tutorial with the pre-trained checkpoint (OFA-Large), and the quality was around 68% accuracy on the val set. Then I switched to the finetuned checkpoint for VQAv2. It works with the same code; however, its behaviour is strange and inference is very slow: the pretrained model produced 214k answers in 10 hours, but now I got only 30k answers in 15 hours on the same Tesla V100. The quality is also worse for some reason, around 60% accuracy now. Some answers are strange, some are completely correct, and some are, for example, "bedroom bedroom bedroom bedroom bedroom ...", "no no no no no no no...", etc. For some reason the model gives very long answers and doesn't stop generating the sequence of words.

I am a bit confused about why this is happening; maybe I am doing something wrong. I run the model as in this code: https://colab.research.google.com/drive/1lsMsF-Vum3MVyXwSVF5E-Y23rHFvj_3y?usp=sharing

I only change the path to the finetuned model in this part:

    parser = options.get_generation_parser()
    input_args = ["", "--task=vqa_gen", "--beam=100", "--unnormalized", "--path=checkpoints/vqa_large_best.pt", "--bpe-dir=utils/BPE"]
    args = options.parse_args_and_arch(parser, input_args)
    cfg = convert_namespace_to_omegaconf(args)
    
    opened by 25icecreamflavors 11
  • How to train VQA on my custom data?

Hello! I am trying to finetune OFA-Large on VQA using a custom dataset, following the finetuning instructions in the repo. I have checked my .tsv and .pkl files several times and they match your provided sample. But after the command "bash train_vqa_distributed.sh", the terminal just prints:

    total_num_updates 40000 warmup_updates 1000 lr 5e-5 patch_image_size 480

The GPU usage rises to a certain value, then suddenly returns to zero, and the program ends. I train on a single server with 2 GPUs. Looking forward to your reply; thanks for sharing your work!

    opened by xiaoqiang-lu 11
  • The code of OFA-base is inconsistent with the pre-trained checkpoint

Thanks for your awesome work. Something has been bothering me recently: when I continued to train OFA-base (I tried to collect all the pre-training data of OFA), I found that even a few training steps (10 steps) made the performance of OFA-base worse. I compared the config in the checkpoint with the config in pretrain_ofa_base.sh and found many differences. What might affect the results?

In addition, I found a dimension inconsistency in the network: decoder.image_position_idx is a torch.Tensor of size torch.Size([1026]) in the code but torch.Size([1025]) in the checkpoint. Is this the reason for the decline in performance?

    opened by zzhanghub 10
  • Huggingface transformers inference: ModuleNotFoundError: No module named 'generate'

    When running the imports listed in transformers.md:

    from PIL import Image
    from torchvision import transforms
    from transformers import OFATokenizer, OFAModel
    from generate import sequence_generator
    

    I get ModuleNotFoundError: No module named 'generate'

Where is generate supposed to come from? The implementations of sequence_generator.SequenceGenerator that I see elsewhere (e.g. in fairseq) don't have the same signature either, so it's not clear how to proceed.
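
    For what it's worth, generate does not appear to be a transformers module but a package shipped inside the OFA repository itself, alongside the transformers-compatible code. Assuming a local clone of the repo, a likely resolution is to put the repo directory on the Python path before importing:

    import sys
    sys.path.append("/path/to/OFA")  # directory containing the generate/ package (placeholder path)

    from generate import sequence_generator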

    opened by steve-marmalade 10
  • subprocess.CalledProcessError

    Hello

I tried finetuning the large model for image captioning, but I keep getting subprocess.CalledProcessError. I've tried various port numbers, but it did not work out. What could be the possible reason for this error? (It seems like a GPU distribution issue...) Thank you so much for your help.

    export MASTER_PORT=1052
    
    log_dir=./stage2_logs
    save_dir=./stage2_checkpoints
    mkdir -p $log_dir $save_dir
    
    bpe_dir=../../utils/BPE
    user_dir=../../ofa_module
    
    data_dir=../../dataset/caption_data
    data=${data_dir}/caption_train_stage2_new.tsv,${data_dir}/caption_val_ct.tsv
    restore_file=../../checkpoints/caption_stage1_best.pt
    selected_cols=1,4,2
    
    task=caption
    arch=ofa_large
    criterion=scst_reward_criterion
    label_smoothing=0.1
    lr=1e-5
    max_epoch=5
    warmup_ratio=0.06
    batch_size=1
    update_freq=4
    resnet_drop_path_rate=0.0
    encoder_drop_path_rate=0.0
    decoder_drop_path_rate=0.0
    dropout=0.0
    attention_dropout=0.0
    max_src_length=80
    max_tgt_length=20
    num_bins=1000
    patch_image_size=480
    eval_cider_cached=${data_dir}/cider_cached_tokens/coco-valid-words.p
    scst_cider_cached=${data_dir}/cider_cached_tokens/coco-train-words.p
    
    for lr in 5e-6,; do
      echo "lr "${lr}
      for max_epoch in 5; do
        echo "max_epoch "${max_epoch}
    
        log_file=${log_dir}/${lr}"_"${max_epoch}".log"
        save_path=${save_dir}/${lr}"_"${max_epoch}
        mkdir -p $save_path
    
        CUDA_VISIBLE_DEVICES=1,2 python3 -m torch.distributed.launch --nproc_per_node=2 --master_port=${MASTER_PORT} ../../train.py \
            $data \
            --selected-cols=${selected_cols} \
            --bpe-dir=${bpe_dir} \
            --user-dir=${user_dir} \
            --restore-file=${restore_file} \
            --reset-optimizer --reset-dataloader --reset-meters \
            --save-dir=${save_path} \
            --task=${task} \
            --arch=${arch} \
            --criterion=${criterion} \
            --batch-size=${batch_size} \
            --update-freq=${update_freq} \
            --encoder-normalize-before \
            --decoder-normalize-before \
            --share-decoder-input-output-embed \
            --share-all-embeddings \
            --layernorm-embedding \
            --patch-layernorm-embedding \
            --code-layernorm-embedding \
            --resnet-drop-path-rate=${resnet_drop_path_rate} \
            --encoder-drop-path-rate=${encoder_drop_path_rate} \
            --decoder-drop-path-rate=${decoder_drop_path_rate} \
            --dropout=${dropout} \
            --attention-dropout=${attention_dropout} \
            --weight-decay=0.01 --optimizer=adam --adam-betas="(0.9,0.999)" --adam-eps=1e-08 --clip-norm=1.0 \
            --lr-scheduler=polynomial_decay --lr=${lr} --end-learning-rate=2e-7 \
            --max-epoch=${max_epoch} --warmup-ratio=${warmup_ratio} \
            --log-format=simple --log-interval=10 \
            --fixed-validation-seed=7 \
            --no-epoch-checkpoints --keep-best-checkpoints=1 \
            --save-interval=1 --validate-interval=1 \
            --save-interval-updates=500 --validate-interval-updates=500 \
            --eval-cider \
            --eval-cider-cached-tokens=${eval_cider_cached} \
            --eval-args='{"beam":5,"max_len_b":16,"no_repeat_ngram_size":3}' \
            --best-checkpoint-metric=cider --maximize-best-checkpoint-metric \
            --max-src-length=${max_src_length} \
            --max-tgt-length=${max_tgt_length} \
            --find-unused-parameters \
            --freeze-encoder-embedding \
            --freeze-decoder-embedding \
            --freeze-resnet \
            --add-type-embedding \
            --scale-attn \
            --scale-fc \
            --scale-heads \
            --disable-entangle \
            --num-bins=${num_bins} \
            --patch-image-size=${patch_image_size} \
            --scst \
            --scst-cider-cached-tokens=${scst_cider_cached} \
            --scst-args='{"beam":5,"max_len_b":16,"no_repeat_ngram_size":3}' \
            --memory-efficient-fp16 \
            --fp16-scale-window=512 \
            --num-workers=0 > ${log_file} 2>&1
      done
    done
    
    opened by Jihyun0510 10
  • How to train OFA for VQA in open-ended?

Dear authors: thanks for the great work! In VQA validation, I want the model to predict the most likely next token (i.e., generate one answer token) from the output logits, then append this token to the input and repeat until the model predicts ⟨EOS⟩. What could I do to achieve this? Thanks a lot!
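
    For illustration, the loop described above is ordinary greedy decoding for an encoder-decoder model. A minimal sketch, using a generic Hugging Face seq2seq model as a stand-in (the same token-by-token logic would apply to OFA's decoder):

    import torch
    from transformers import BartForConditionalGeneration, BartTokenizer

    tok = BartTokenizer.from_pretrained("facebook/bart-base")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-base").eval()

    src = tok("what is the color of the car?", return_tensors="pt")
    decoded = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        for _ in range(20):  # cap the answer length
            logits = model(**src, decoder_input_ids=decoded).logits
            next_token = logits[0, -1].argmax().view(1, 1)  # most likely next token
            decoded = torch.cat([decoded, next_token], dim=1)
            if next_token.item() == tok.eos_token_id:  # stop at <EOS>
                break
    print(tok.decode(decoded[0], skip_special_tokens=True))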

    opened by qyc-98 10
  • Additional issues trying to finetune on custom (VQA-like) dataset (VizWiz)

    Hello, first I'd like to thank you for your amazing work and especially all the detailed answers to issues.

    I've been following the different issues on the finetuning on a custom dataset (VizWiz) and produced the .tsv files according to your format. You stated in issue #76 that the trainval_ans2label.pkl file is not used when using beam-search evaluation - is this correct?

I've skipped its creation, and training does run for the first epoch. However, upon validation on the valid subset, I get an assertion error in sequence_generator.py. I've tracked down the error and can "fix" it by removing the one extra step that accounts for the EOS marker, but my understanding of how to properly fix it is limited.

To give some more information about how the .tsv files look, I have attached an image for the train and val subsets.

    Thank you very much for any kind of input in advance! image

    opened by Velcorn 10
  • How can I handle this in a modified model?

Hi, I added another layer to the model, but a problem occurs after several steps.

    2022-03-21 23:16:50 - progress_bar.py[line:272] - INFO: epoch 001:     41 / 24544 loss=1.825, loss_v1=0, loss_v2=0, nll_loss=1.825, ntokens=16, nsentences=16, sample_size=16, sample_size_v1=0, sample_size_v2=0, ppl=3.54, wps=11.3, ups=0.7, wpb=16, bsz=16, num_updates=41, lr=5.56838e-07, gnorm=32.218, clip=100, loss_scale=16, train_wall=1, gb_free=14.5, wall=67
    2022-03-21 23:16:51 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 8.0
    2022-03-21 23:16:53 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 4.0
    2022-03-21 23:16:54 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 2.0
    2022-03-21 23:16:55 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 1.0
    2022-03-21 23:16:56 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.5
    2022-03-21 23:16:57 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.25
    2022-03-21 23:16:58 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.125
    2022-03-21 23:16:59 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.0625
    2022-03-21 23:17:01 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.03125
    2022-03-21 23:17:02 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.015625
    2022-03-21 23:17:02 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.0078125
    2022-03-21 23:17:03 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.00390625
    2022-03-21 23:17:04 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.001953125
    2022-03-21 23:17:05 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.0009765625
    2022-03-21 23:17:06 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.00048828125
    2022-03-21 23:17:07 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.000244140625
    2022-03-21 23:17:08 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.0001220703125
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:787: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior.
  warnings.warn("Using a non-full backward hook when the forward contains multiple autograd Nodes "
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:752: UserWarning: Using non-full backward hooks on a Module that does not return a single Tensor or a tuple of Tensors is deprecated and will be removed in future versions. This hook will be missing some of the grad_output. Please use register_full_backward_hook to get the documented behavior.
  warnings.warn("Using non-full backward hooks on a Module that does not return a "
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:762: UserWarning: Using non-full backward hooks on a Module that does not take as input a single Tensor or a tuple of Tensors is deprecated and will be removed in future versions. This hook will be missing some of the grad_input. Please use register_full_backward_hook to get the documented behavior.
  warnings.warn("Using non-full backward hooks on a Module that does not take as input a "
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:777: UserWarning: Using a non-full backward hook when outputs are generated by different autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_output. Please use register_full_backward_hook to get the documented behavior.
  warnings.warn("Using a non-full backward hook when outputs are generated by different autograd Nodes "
[each of the four warnings above repeats several times in the original log]
    2022-03-21 23:17:09 - nan_detector.py[line:89] - WARNING: NaN detected in output of encoder.layers.2.moe.moe_layer, shape: torch.Size([60, 1, 768]), forward input max: 3.67578125, input min: -7.75
    Traceback (most recent call last):
      File "/workspace/OFA/trainer.py", line 871, in train_step
        grad_norm = self.clip_grad_norm(self.cfg.optimization.clip_norm)
      File "/workspace/OFA/trainer.py", line 1208, in clip_grad_norm
        return self.optimizer.clip_grad_norm(
      File "/workspace/OFA/fairseq/fairseq/optim/fp16_optimizer.py", line 200, in clip_grad_norm
        self.scaler.check_overflow(grad_norm)
      File "/workspace/OFA/fairseq/fairseq/optim/dynamic_loss_scaler.py", line 61, in check_overflow
        raise FloatingPointError(
    FloatingPointError: Minimum loss scale reached (0.0001). Your loss is probably exploding. Try lowering the learning rate, using gradient clipping or increasing the batch size.
    

Then the training broke down. How can I fix this problem? Hyperparameter tuning? Or is there something else I need to pay attention to? I would really appreciate it if you can help me!
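
    For context, the log above is fairseq's dynamic loss scaling at work: every detected overflow halves the scale, and training aborts once a minimum is reached. Roughly (a simplified sketch, not the fairseq source):

    # Simplified view of a dynamic loss scaler's overflow handling.
    scale, min_scale = 16.0, 0.0001
    def on_overflow():
        global scale
        scale /= 2.0  # halve the loss scale after every inf/nan gradient
        if scale < min_scale:
            raise FloatingPointError(
                "Minimum loss scale reached (0.0001). Your loss is probably exploding."
            )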

    opened by dannyxiaocn 10
  • "ignoring input and redirecting stderr to stdout" when fine tuning image captioning

I'm trying to follow the fine-tuning steps for captioning as listed in readme.md. However, my output is just blank, and once I hit enter, it exits. Pretraining worked fine; it's fine-tuning that's not working at all. Any idea what might be causing this issue? My GPU has 8 GB of VRAM.

    image

    opened by teenaxta 9
  • RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!

    Hello Authors,

I am trying to run the model for the visual grounding task. During inference I am evaluating the model on the RefCOCO dataset on Google Colab with a GPU, but I am getting the error below:

Traceback (most recent call last):
  File "../../evaluate.py", line 160, in <module>
    cli_main()
  File "../../evaluate.py", line 155, in cli_main
    cfg, main, ema_eval=args.ema_eval, beam_search_vqa_eval=args.beam_search_vqa_eval, zero_shot=args.zero_shot
  File "/usr/local/lib/python3.7/dist-packages/fairseq/distributed/utils.py", line 354, in call_main
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/fairseq/distributed/utils.py", line 322, in distributed_main
    cfg.distributed_training.distributed_rank = distributed_init(cfg)
  File "/usr/local/lib/python3.7/dist-packages/fairseq/distributed/utils.py", line 261, in distributed_init
    rank=cfg.distributed_training.distributed_rank,
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 610, in init_process_group
    timeout=timeout,
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 738, in _new_process_group_helper
    pg = ProcessGroupNCCL(prefix_store, rank, world_size, pg_options)
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
2022-11-29 21:02:26 | INFO | fairseq.distributed.utils | distributed init (rank 1): env://

    opened by sjc6752 0
  • How to control data samples in a batch

    Hi!

Thank you for your amazing work. I have a question about how to control the number of data samples in a batch. According to your paper:

We mix all the pretraining data within each batch, which contains 2,048 vision&language samples, 256 object detection samples, 256 image-only samples and 512 text-only samples.

    However, I didn't find where to control the data samples in the code. Could you provide some hints?

Besides, could you explain the iteration counts in the logs? For example, I fine-tuned on a caption dataset (~4,000 training samples and ~1,000 validation samples) with a batch size of 8, but the log shows only about 130 iterations per epoch instead of 500. Does that mean the fine-tuning is running on the validation dataset rather than the training dataset?

    I would appreciate your help. Thanks.

    opened by taokz 0
  • suggestions for evaluation on NLP tasks?

    Hi!

I would like to evaluate GLUE-like datasets, but I didn't find an evaluation script for them, so I wrote one for the NLI task as follows:

    user_dir=../../ofa_module
    bpe_dir=../../utils/BPE

    data=/data/processed_data/tune/nli/nli_test.tsv
    path=/data/uned_checkpoints/nli/20_7e-5_2/checkpoint_best.pt
    result_path=../../results/nli
    selected_cols=0,1,2
    split='test'

    python3 -m torch.distributed.launch --nproc_per_node=${GPUS_PER_NODE} --master_port=${MASTER_PORT} ../../evaluate.py \
        ${data} \
        --path=${path} \
        --user-dir=${user_dir} \
        --task=mnli \
        --batch-size=16 \
        --log-format=simple --log-interval=10 \
        --seed=7 \
        --gen-subset=${split} \
        --results-path=${result_path} \
        --fp16 \
        --num-workers=0 \
        --model-overrides="{\"data\":\"${data}\",\"bpe_dir\":\"${bpe_dir}\",\"selected_cols\":\"${selected_cols}\"}"

but I got this error:

    Traceback (most recent call last):
      File "../../evaluate.py", line 156, in <module>
        cli_main()
      File "../../evaluate.py", line 151, in cli_main
        cfg, main, ema_eval=args.ema_eval, beam_search_vqa_eval=args.beam_search_vqa_eval, zero_shot=args.zero_shot
      File "/home/user/ofa/fairseq/fairseq/distributed/utils.py", line 374, in call_main
        distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
      File "/home/user/ofa/fairseq/fairseq/distributed/utils.py", line 348, in distributed_main
        main(cfg, **kwargs)
      File "../../evaluate.py", line 134, in main
        result, scores = eval_step(task, generator, models, sample, **kwargs)
      File "/home/user/ofa/utils/eval_utils.py", line 314, in eval_step
        return eval_glue(task, generator, models, sample, **kwargs)
      File "/home/user/ofa/utils/eval_utils.py", line 240, in eval_glue
        results = [{"hyp": hyp, "ref": ref_dict.keys()[0]} for hyp, ref_dict in zip(hyps, sample['ref_dict'])]
      File "/home/user/ofa/utils/eval_utils.py", line 240, in <listcomp>
        results = [{"hyp": hyp, "ref": ref_dict.keys()[0]} for hyp, ref_dict in zip(hyps, sample['ref_dict'])]
    TypeError: 'dict_keys' object is not subscriptable
    
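    The immediate TypeError is a plain Python issue: dict_keys is a view and cannot be indexed. A local fix in utils/eval_utils.py would be to materialize the view first (a sketch):

    # list(...) makes the keys view indexable.
    results = [{"hyp": hyp, "ref": list(ref_dict.keys())[0]}
               for hyp, ref_dict in zip(hyps, sample['ref_dict'])]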

    Could you provide some suggestions? I would appreciate your help.

    opened by taokz 0
  • Effect of VQGAN code randomness

I understand from #258 that there is randomness in the generated VQGAN code sequences because of the Gumbel softmax, but the different sequences nevertheless reconstruct to similar-looking images. However, since training is done by predicting the sequence tokens rather than by comparing the reconstructed images themselves, I am wondering if and how having different token sequences affects pretraining and downstream performance. Was this investigated to check for consistency in performance across different variations of the generated code sequences?

    opened by varadgunjal 2
  • ofa-large image caption performance

Hi! Thanks for your great work. I have run into a question: when I directly use your train_caption_stage1.sh to finetune OFA-Large on captioning without CIDEr optimization, the result I got was not as good as the README says. I got about 139.3, while the README says it should be around 139.5. Also, in the test stage the CIDEr is 141.2, while the paper says it should be 142.2. I wonder if I finetuned with the wrong hyperparameters. Can you share the hyperparameters you used to finetune OFA-Large in caption stage 1? Thanks a lot!

    opened by hills-code 8