Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

OFA

[Paper] [Blog] [Colab] [Spaces]

Overview

OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image generation, visual grounding, image captioning, image classification, text generation, etc.) into a simple sequence-to-sequence learning framework. For more information, please refer to our paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework.
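
As a rough illustration of this framing, each task reduces to an instruction-to-output text pair handled by the same encoder-decoder. The prompt wording below is simplified and should be checked against the task code and the Colab notebook:

# Illustrative only: simplified instruction/target pairs showing how OFA casts
# different tasks as sequence-to-sequence problems. Exact prompt templates are
# an assumption; see the task implementations for the real ones.
examples = [
    {"task": "image captioning",
     "instruction": "what does the image describe?",
     "target": "a man riding a bike down a street"},
    {"task": "visual question answering",
     "instruction": "what color is the car in the picture?",
     "target": "red"},
    {"task": "visual grounding",
     "instruction": "which region does the text 'a man in a red shirt' describe?",
     "target": "<loc301> <loc495> <loc501> <loc596>"},  # quantized box coordinates
]
for ex in examples:
    print(f"[{ex['task']}] {ex['instruction']} -> {ex['target']}")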

News

  • 2022.2.13: Released the image captioning demo on Hugging Face Spaces. Have fun!
  • 2022.2.11: Released the Colab notebook for image captioning. Enjoy!
  • 2022.2.11: Released the pretrained checkpoint of OFA-Large and the complete (2-stage) finetuning code for image captioning.
  • 2022.2.10: Released the inference code & finetuned checkpoint for image captioning, which can reproduce the results on the COCO Karpathy test split (149.6 CIDEr).

TODO

  • To release finetuning and inference code for multimodal downstream tasks soon, including image captioning, VQA, text-to-image generation, SNLI-VE, referring expression comprehension, etc.
  • To release codes for pretraining soon.

Approach

approach

Requirements

  • python 3.7.4
  • pytorch 1.8.1
  • torchvision 0.9.1
  • JAVA 1.8 (for COCO evaluation)

Installation

git clone https://github.com/OFA-Sys/OFA
cd OFA
pip install -r requirements.txt
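
A quick sanity check that the installed environment matches the pinned versions above (a sketch, not part of the repo):

# Verify the environment against the Requirements section.
import torch
import torchvision

print(torch.__version__)          # expected: 1.8.1
print(torchvision.__version__)    # expected: 0.9.1
print(torch.cuda.is_available())  # the finetuning scripts assume a GPU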

Datasets and Checkpoints

See datasets.md and checkpoints.md.

Pretraining

To be released soon :)

Finetuning & Inference

Below we provide instructions for finetuning and inference on different downstream tasks.

Caption

  1. Download data and files and put them in the correct directory
  2. Train
cd run_scripts/caption
nohup sh train_caption_stage1.sh &  # stage1, train with cross-entropy loss
nohup sh train_caption_stage2.sh &  # stage2, load the best ckpt of stage1 and train with CIDEr optimization 
  3. Inference
cd run_scripts/caption ; sh evaluate_caption.sh  # inference & evaluate
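
To run captioning from Python instead of the shell scripts, a minimal loading sketch looks like the following. The checkpoint path and the generation call details are assumptions based on the repo layout; the Colab notebook shows the full preprocessing and generation pipeline.

# A sketch of loading a finetuned caption checkpoint via fairseq's utilities.
from fairseq import checkpoint_utils, utils

overrides = {"bpe_dir": "utils/BPE"}
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    utils.split_paths("checkpoints/caption_best.pt"),  # hypothetical path
    arg_overrides=overrides,
)
model = models[0].eval()
generator = task.build_generator(models, cfg.generation)
# Building the input sample (resized patch image plus the BPE-encoded prompt,
# e.g. "what does the image describe?") is task-specific; see the Colab.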

Gallery

Below we provide examples of OFA on text-to-image generation and open-ended VQA. We also demonstrate its performance on an unseen task (grounded QA) and an unseen domain (visual grounding on images from unseen domains).

Text-to-Image Generation (normal query)

t2i_normal

Text-to-Image Generation (counterfactual query)

t2i_counterfactual

Open-Ended VQA

open_vqa

Grounded QA (unseen task)

grounded_qa

Visual Grounding (unseen domain)

vg

Citation

Please cite our paper if you find it helpful :)

@article{wang2022OFA,
  title={Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework},
  author={Wang, Peng and Yang, An and Men, Rui and Lin, Junyang and Bai, Shuai and Li, Zhikang and Ma, Jianxin and Zhou, Chang and Zhou, Jingren and Yang, Hongxia},
  journal={arXiv preprint arXiv:2202.03052},
  year={2022}
}

Related Codebase

License

Apache-2.0

Issues
  • ConfigAttributeError when loading the checkpoint

    ConfigAttributeError when loading the checkpoint

    Hi, thanks for the great work! I run into problems when loading the pre-trained checkpoint (refcocog_large_best.pt). I load the model with

    from fairseq import checkpoint_utils, utils

    overrides = {"bpe_dir": "utils/BPE"}
    models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
        utils.split_paths('checkpoints/refcocog.pt'),
        arg_overrides=overrides,
    )

    The following error occurs:

    Traceback (most recent call last):
      File "eval_refcoco.py", line 22, in <module>
        arg_overrides=overrides
      File "/home/tiger/.local/lib/python3.7/site-packages/fairseq-1.0.0a0+4095baa-py3.7-linux-x86_64.egg/fairseq/checkpoint_utils.py", line 457, in load_model_ensemble_and_task
        model = task.build_model(cfg.model)
      File "/opt/tiger/OFA_offical/tasks/mm_tasks/refcoco.py", line 79, in build_model
        if self.cfg.scst:
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 305, in __getattr__
        self._format_and_raise(key=key, value=None, cause=e)
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/base.py", line 101, in _format_and_raise
        type_override=type_override,
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/_utils.py", line 629, in format_and_raise
        _raise(ex, cause)
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/_utils.py", line 610, in _raise
        raise ex  # set end OC_CAUSE=1 for full backtrace
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 303, in __getattr__
        return self._get_impl(key=key, default_value=DEFAULT_VALUE_MARKER)
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 361, in _get_impl
        node = self._get_node(key=key)
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 383, in _get_node
        self._validate_get(key)
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 136, in _validate_get
        key=key, value=value, cause=ConfigAttributeError(msg)
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/base.py", line 101, in _format_and_raise
        type_override=type_override,
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/_utils.py", line 694, in format_and_raise
        _raise(ex, cause)
      File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/_utils.py", line 610, in _raise
        raise ex  # set end OC_CAUSE=1 for full backtrace
    omegaconf.errors.ConfigAttributeError: Key 'scst' not in 'RefcocoConfig'
            full_key: scst
            reference_type=Optional[RefcocoConfig]
            object_type=RefcocoConfig
    

    I would appreciate your help!
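
    For reference, one possible local workaround (an assumption, not an official fix) is to make the attribute access in tasks/mm_tasks/refcoco.py tolerant of older checkpoints whose saved config has no scst field:

    # Hypothetical patch inside tasks/mm_tasks/refcoco.py, build_model():
    # omegaconf's ConfigAttributeError subclasses AttributeError, so getattr
    # with a default avoids the crash when the saved RefcocoConfig lacks 'scst'.
    use_scst = getattr(self.cfg, "scst", False)
    # ...then branch on use_scst instead of self.cfg.scst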

    opened by zd11024 12
  • subprocess.CalledProcessError

    subprocess.CalledProcessError

    Hello

    I tried finetuning the large model for image captioning, but I keep getting subprocess.CalledProcessError. I've tried various port numbers, but it did not work out. What could be the possible reason for this error? (It seems like a GPU distribution issue...) Thank you so much for your help.

    export MASTER_PORT=1052
    
    log_dir=./stage2_logs
    save_dir=./stage2_checkpoints
    mkdir -p $log_dir $save_dir
    
    bpe_dir=../../utils/BPE
    user_dir=../../ofa_module
    
    data_dir=../../dataset/caption_data
    data=${data_dir}/caption_train_stage2_new.tsv,${data_dir}/caption_val_ct.tsv
    restore_file=../../checkpoints/caption_stage1_best.pt
    selected_cols=1,4,2
    
    task=caption
    arch=ofa_large
    criterion=scst_reward_criterion
    label_smoothing=0.1
    lr=1e-5
    max_epoch=5
    warmup_ratio=0.06
    batch_size=1
    update_freq=4
    resnet_drop_path_rate=0.0
    encoder_drop_path_rate=0.0
    decoder_drop_path_rate=0.0
    dropout=0.0
    attention_dropout=0.0
    max_src_length=80
    max_tgt_length=20
    num_bins=1000
    patch_image_size=480
    eval_cider_cached=${data_dir}/cider_cached_tokens/coco-valid-words.p
    scst_cider_cached=${data_dir}/cider_cached_tokens/coco-train-words.p
    
    for lr in 5e-6,; do
      echo "lr "${lr}
      for max_epoch in 5; do
        echo "max_epoch "${max_epoch}
    
        log_file=${log_dir}/${lr}"_"${max_epoch}".log"
        save_path=${save_dir}/${lr}"_"${max_epoch}
        mkdir -p $save_path
    
        CUDA_VISIBLE_DEVICES=1,2 python3 -m torch.distributed.launch --nproc_per_node=2 --master_port=${MASTER_PORT} ../../train.py \
            $data \
            --selected-cols=${selected_cols} \
            --bpe-dir=${bpe_dir} \
            --user-dir=${user_dir} \
            --restore-file=${restore_file} \
            --reset-optimizer --reset-dataloader --reset-meters \
            --save-dir=${save_path} \
            --task=${task} \
            --arch=${arch} \
            --criterion=${criterion} \
            --batch-size=${batch_size} \
            --update-freq=${update_freq} \
            --encoder-normalize-before \
            --decoder-normalize-before \
            --share-decoder-input-output-embed \
            --share-all-embeddings \
            --layernorm-embedding \
            --patch-layernorm-embedding \
            --code-layernorm-embedding \
            --resnet-drop-path-rate=${resnet_drop_path_rate} \
            --encoder-drop-path-rate=${encoder_drop_path_rate} \
            --decoder-drop-path-rate=${decoder_drop_path_rate} \
            --dropout=${dropout} \
            --attention-dropout=${attention_dropout} \
            --weight-decay=0.01 --optimizer=adam --adam-betas="(0.9,0.999)" --adam-eps=1e-08 --clip-norm=1.0 \
            --lr-scheduler=polynomial_decay --lr=${lr} --end-learning-rate=2e-7 \
            --max-epoch=${max_epoch} --warmup-ratio=${warmup_ratio} \
            --log-format=simple --log-interval=10 \
            --fixed-validation-seed=7 \
            --no-epoch-checkpoints --keep-best-checkpoints=1 \
            --save-interval=1 --validate-interval=1 \
            --save-interval-updates=500 --validate-interval-updates=500 \
            --eval-cider \
            --eval-cider-cached-tokens=${eval_cider_cached} \
            --eval-args='{"beam":5,"max_len_b":16,"no_repeat_ngram_size":3}' \
            --best-checkpoint-metric=cider --maximize-best-checkpoint-metric \
            --max-src-length=${max_src_length} \
            --max-tgt-length=${max_tgt_length} \
            --find-unused-parameters \
            --freeze-encoder-embedding \
            --freeze-decoder-embedding \
            --freeze-resnet \
            --add-type-embedding \
            --scale-attn \
            --scale-fc \
            --scale-heads \
            --disable-entangle \
            --num-bins=${num_bins} \
            --patch-image-size=${patch_image_size} \
            --scst \
            --scst-cider-cached-tokens=${scst_cider_cached} \
            --scst-args='{"beam":5,"max_len_b":16,"no_repeat_ngram_size":3}' \
            --memory-efficient-fp16 \
            --fp16-scale-window=512 \
            --num-workers=0 > ${log_file} 2>&1
      done
    done
    
    opened by Jihyun0510 10
  • How to train VQA on my custom data?

    How to train VQA on my custom data?

    Hello! I am trying to finetune OFA-Large on VQA using a custom dataset, following the finetuning instructions in the repo. I have checked my .tsv and .pkl files several times, and they match your provided sample. But after running "bash train_vqa_distributed.sh", the terminal just prints:

    total_num_updates 40000 warmup_updates 1000 lr 5e-5 patch_image_size 480

    GPU usage rises to a certain value, suddenly drops back to zero, and then the program ends. I train on a single server with 2 GPUs. Looking forward to your reply; thanks for sharing your work!

    opened by xiaoqiang-lu 10
  • How can I handle this in a modified model?

    How can I handle this in a modified model?

    Hi, I added another layer to the model, but a problem occurs after several steps.

    2022-03-21 23:16:50 - progress_bar.py[line:272] - INFO: epoch 001:     41 / 24544 loss=1.825, loss_v1=0, loss_v2=0, nll_loss=1.825, ntokens=16, nsentences=16, sample_size=16, sample_size_v1=0, sample_size_v2=0, ppl=3.54, wps=11.3, ups=0.7, wpb=16, bsz=16, num_updates=41, lr=5.56838e-07, gnorm=32.218, clip=100, loss_scale=16, train_wall=1, gb_free=14.5, wall=67
    2022-03-21 23:16:51 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 8.0
    2022-03-21 23:16:53 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 4.0
    2022-03-21 23:16:54 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 2.0
    2022-03-21 23:16:55 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 1.0
    2022-03-21 23:16:56 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.5
    2022-03-21 23:16:57 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.25
    2022-03-21 23:16:58 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.125
    2022-03-21 23:16:59 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.0625
    2022-03-21 23:17:01 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.03125
    2022-03-21 23:17:02 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.015625
    2022-03-21 23:17:02 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.0078125
    2022-03-21 23:17:03 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.00390625
    2022-03-21 23:17:04 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.001953125
    2022-03-21 23:17:05 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.0009765625
    2022-03-21 23:17:06 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.00048828125
    2022-03-21 23:17:07 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.000244140625
    2022-03-21 23:17:08 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.0001220703125
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:787: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior.
      warnings.warn("Using a non-full backward hook when the forward contains multiple autograd Nodes "
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:787: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior.
      warnings.warn("Using a non-full backward hook when the forward contains multiple autograd Nodes "
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:787: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior.
      warnings.warn("Using a non-full backward hook when the forward contains multiple autograd Nodes "
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:752: UserWarning: Using non-full backward hooks on a Module that does not return a single Tensor or a tuple of Tensors is deprecated and will be removed in future versions. This hook will be missing some of the grad_output. Please use register_full_backward_hook to get the documented behavior.
      warnings.warn("Using non-full backward hooks on a Module that does not return a "
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:752: UserWarning: Using non-full backward hooks on a Module that does not return a single Tensor or a tuple of Tensors is deprecated and will be removed in future versions. This hook will be missing some of the grad_output. Please use register_full_backward_hook to get the documented behavior.
      warnings.warn("Using non-full backward hooks on a Module that does not return a "
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:787: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior.
      warnings.warn("Using a non-full backward hook when the forward contains multiple autograd Nodes "
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:752: UserWarning: Using non-full backward hooks on a Module that does not return a single Tensor or a tuple of Tensors is deprecated and will be removed in future versions. This hook will be missing some of the grad_output. Please use register_full_backward_hook to get the documented behavior.
      warnings.warn("Using non-full backward hooks on a Module that does not return a "
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:752: UserWarning: Using non-full backward hooks on a Module that does not return a single Tensor or a tuple of Tensors is deprecated and will be removed in future versions. This hook will be missing some of the grad_output. Please use register_full_backward_hook to get the documented behavior.
      warnings.warn("Using non-full backward hooks on a Module that does not return a "
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:762: UserWarning: Using non-full backward hooks on a Module that does not take as input a single Tensor or a tuple of Tensors is deprecated and will be removed in future versions. This hook will be missing some of the grad_input. Please use register_full_backward_hook to get the documented behavior.
      warnings.warn("Using non-full backward hooks on a Module that does not take as input a "
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:762: UserWarning: Using non-full backward hooks on a Module that does not take as input a single Tensor or a tuple of Tensors is deprecated and will be removed in future versions. This hook will be missing some of the grad_input. Please use register_full_backward_hook to get the documented behavior.
      warnings.warn("Using non-full backward hooks on a Module that does not take as input a "
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:762: UserWarning: Using non-full backward hooks on a Module that does not take as input a single Tensor or a tuple of Tensors is deprecated and will be removed in future versions. This hook will be missing some of the grad_input. Please use register_full_backward_hook to get the documented behavior.
      warnings.warn("Using non-full backward hooks on a Module that does not take as input a "
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:762: UserWarning: Using non-full backward hooks on a Module that does not take as input a single Tensor or a tuple of Tensors is deprecated and will be removed in future versions. This hook will be missing some of the grad_input. Please use register_full_backward_hook to get the documented behavior.
      warnings.warn("Using non-full backward hooks on a Module that does not take as input a "
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:777: UserWarning: Using a non-full backward hook when outputs are generated by different autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_output. Please use register_full_backward_hook to get the documented behavior.
      warnings.warn("Using a non-full backward hook when outputs are generated by different autograd Nodes "
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:777: UserWarning: Using a non-full backward hook when outputs are generated by different autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_output. Please use register_full_backward_hook to get the documented behavior.
      warnings.warn("Using a non-full backward hook when outputs are generated by different autograd Nodes "
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:777: UserWarning: Using a non-full backward hook when outputs are generated by different autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_output. Please use register_full_backward_hook to get the documented behavior.
      warnings.warn("Using a non-full backward hook when outputs are generated by different autograd Nodes "
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:777: UserWarning: Using a non-full backward hook when outputs are generated by different autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_output. Please use register_full_backward_hook to get the documented behavior.
      warnings.warn("Using a non-full backward hook when outputs are generated by different autograd Nodes "
    2022-03-21 23:17:09 - nan_detector.py[line:89] - WARNING: NaN detected in output of encoder.layers.2.moe.moe_layer, shape: torch.Size([60, 1, 768]), forward input max: 3.67578125, input min: -7.75
    Traceback (most recent call last):
      File "/workspace/OFA/trainer.py", line 871, in train_step
        grad_norm = self.clip_grad_norm(self.cfg.optimization.clip_norm)
      File "/workspace/OFA/trainer.py", line 1208, in clip_grad_norm
        return self.optimizer.clip_grad_norm(
      File "/workspace/OFA/fairseq/fairseq/optim/fp16_optimizer.py", line 200, in clip_grad_norm
        self.scaler.check_overflow(grad_norm)
      File "/workspace/OFA/fairseq/fairseq/optim/dynamic_loss_scaler.py", line 61, in check_overflow
        raise FloatingPointError(
    FloatingPointError: Minimum loss scale reached (0.0001). Your loss is probably exploding. Try lowering the learning rate, using gradient clipping or increasing the batch size.
    

    Then the training broke down. How can I fix this problem? Hyperparameter tuning? Or is there something else I need to pay attention to? I would really appreciate your help!

    opened by dannyxiaocn 10
  • model initialize part

    model initialize part

    Where is the OFAModel class initialized?

    I want to use the model like

    "model = OFAModel(args, encoder, decoder)"

    but I can't find the initialization code for args, encoder, and decoder.

    Where can I find that initialization code?

    opened by SangMyeongWoh 9
  • Custom vis-lan dataset - RuntimeError: stack expects each tensor to be equal size, but got [3, 256, 256] at entry 0 and [3, 320, 390] at entry 2

    Custom vis-lan dataset - RuntimeError: stack expects each tensor to be equal size, but got [3, 256, 256] at entry 0 and [3, 320, 390] at entry 2

    Thank you for your amazing work.

    When I use your code to pretrain the model, I get a runtime error as follows:

    Traceback (most recent call last):
      File "../../train.py", line 528, in <module>
        cli_main()
      File "../../train.py", line 521, in cli_main
        distributed_utils.call_main(cfg, main)
      File "/home/test/ofa/fairseq/fairseq/distributed/utils.py", line 374, in call_main
        distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
      File "/home/test/ofa/fairseq/fairseq/distributed/utils.py", line 348, in distributed_main
        main(cfg, **kwargs)
      File "../../train.py", line 161, in main
        disable_iterator_cache=True,
      File "/home/test/ofa/utils/checkpoint_utils.py", line 288, in load_checkpoint
        epoch=1, load_dataset=True, **passthrough_args
      File "/home/test/ofa/trainer.py", line 666, in get_train_iterator
        self.reset_dummy_batch(batch_iterator.first_batch)
      File "/home/test/ofa/fairseq/fairseq/data/iterators.py", line 322, in first_batch
        return self.collate_fn([self.dataset[i] for i in self.frozen_batches[0]])
      File "/home/test/ofa/data/pretrain_data/unify_dataset.py", line 636, in collater
        res_v2 = collate(samples_v2, pad_idx=self.src_dict.pad(), eos_idx=self.eos)
      File "/home/test/ofa/data/pretrain_data/unify_dataset.py", line 70, in collate
        patch_images = torch.stack([sample['patch_image'] for sample in samples], dim=0)
    RuntimeError: stack expects each tensor to be equal size, but got [3, 256, 256] at entry 0 and [3, 320, 390] at entry 2

    The mismatched size is sometimes [3, 320, 320] as well. I tried different datasets but got the same error, and it only happens when I use my customized visualization_language.tsv. I followed the readme to create the .tsv file as follows:

    | unique_id | image (base64 string) | caption | question | answer | ground_truth objects | dataset name | task type |
    | -- | -- | -- | -- | -- | -- | -- | -- |

    The base64 string is generated by your provided code, as shown in other issues.

    Since the sizes of the images vary, should I resize the images before generating the base64 strings? However, according to issue 106, you state that

    The resizing from the raw size to the specified resolution is done on the fly during training and inference in the getitem method of pytorch dataset.

    I take this to mean that I do not need to resize the images and your code will do it for me. Could you clarify this? I appreciate your help, thank you!
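
    For reference, a minimal sketch (an assumption, not the repo's exact conversion helper) of producing the base64 image string for the .tsv; resizing beforehand is optional given the on-the-fly resizing quoted above:

    # Encode an image as the base64 string stored in the tsv (sketch only;
    # whether the repo expects standard or URL-safe base64 should be checked
    # against the conversion snippet referenced in the other issues).
    import base64
    from io import BytesIO
    from PIL import Image

    img = Image.open("example.jpg").convert("RGB")
    buf = BytesIO()
    img.save(buf, format="JPEG")
    img_b64 = base64.b64encode(buf.getvalue()).decode("utf-8")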

    opened by taokz 8
  • fine-tune ofa base model for object detection on custom dataset

    fine-tune ofa base model for object detection on custom dataset

    Hi,

    Thanks for sharing the awesome work! Could you please point me to how to finetune the pretrained OFA model for an object detection task? Thank you very much!

    help wanted 
    opened by ilovecv 8
  • Where is the test_predict.json file when doing inference and evaluation?

    Where is the test_predict.json file when doing inference and evaluation?

    When I run the inference and evaluation job, the command always raises a FileNotFoundError: it cannot find test_predict.json in ../../results/caption/. Where does this test_predict.json file come from?

    opened by MrLianSYSU 8
  • Unsuccessful Image Caption step 2 fine-tuning

    Unsuccessful Image Caption step 2 fine-tuning

    Hi, when running train_caption_stage2.sh, it triggers an AssertionError like:

    Traceback (most recent call last):
      File "../../train.py", line 527, in <module>
        cli_main()
      File "../../train.py", line 520, in cli_main
        distributed_utils.call_main(cfg, main)
      File "/workspace/OFA/fairseq/fairseq/distributed/utils.py", line 374, in call_main
        distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
      File "/workspace/OFA/fairseq/fairseq/distributed/utils.py", line 348, in distributed_main
        main(cfg, **kwargs)
      File "../../train.py", line 189, in main
        valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
      File "/opt/conda/lib/python3.8/contextlib.py", line 75, in inner
        return func(*args, **kwds)
      File "../../train.py", line 300, in train
        log_output = trainer.train_step(samples)
      File "/opt/conda/lib/python3.8/contextlib.py", line 75, in inner
        return func(*args, **kwds)
      File "/workspace/OFA/trainer.py", line 773, in train_step
        loss, sample_size_i, logging_output = self.task.train_step(
      File "/workspace/OFA/tasks/ofa_task.py", line 319, in train_step
        loss, sample_size, logging_output = criterion(model, sample, update_num=update_num)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 881, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/workspace/OFA/criterions/scst_loss.py", line 88, in forward
        loss, score, ntokens, nsentences = self.compute_loss(model, sample, reduce=reduce)
      File "/workspace/OFA/criterions/scst_loss.py", line 239, in compute_loss
        gen_target, gen_res, gt_res = self.get_generator_out(model, sample)
      File "/workspace/OFA/criterions/scst_loss.py", line 149, in get_generator_out
        gen_out = self.task.scst_generator.generate([model], sample)
      File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/workspace/OFA/models/sequence_generator.py", line 207, in generate
        return self._generate(models, sample, **kwargs)
      File "/workspace/OFA/models/sequence_generator.py", line 480, in _generate
        assert step < max_len, f"{step} < {max_len}"
    AssertionError: 16 < 16
    

    How can I fix this problem? Thx!

    opened by dannyxiaocn 8
  • Objects Coordinate input

    Objects Coordinate input

    Hi,

    Congratulations on the ICML acceptance!

    I would like to feed the model several sets of coordinates along with the input image and ask questions about the objects specified by those coordinates, for example: what are person1 (corresponding to coord1) and person2 (corresponding to coord2) doing? Is it possible for OFA to attend to the objects using the coordinate information? If so, what would be the best input format for this?

    Having read the paper, I think the grounded captioning pre-training task might be most relevant, but I don't see such examples in pretrain_data_examples, so it is still not clear what the best practice is for feeding the model multiple coordinates in one example. Also, I fail to replicate the results shown in Figure 10 of the Appendix (grounded question answering); which model was used for these? And is the input exactly in the format shown under the images, e.g. what color is the car in the region? region: <loc301> <loc495> <loc501> <loc596>? I assume 301, 495, 501, 596 are the x1, y1, x2, y2 coordinates? I tried asking questions about regions this way on customised images, but the model does not seem to focus on the region provided.

    Thanks!
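
    For reference, a sketch of how pixel coordinates could be mapped to <locXXX> tokens. This is an assumption based on the num_bins=1000 setting used in the finetuning scripts in this thread, not the repo's exact quantization code:

    # Quantize a box (x1, y1, x2, y2) into location tokens over num_bins bins.
    def coords_to_loc_tokens(x1, y1, x2, y2, width, height, num_bins=1000):
        def quantize(value, size):
            return int(round(value / size * (num_bins - 1)))
        bins = [quantize(x1, width), quantize(y1, height),
                quantize(x2, width), quantize(y2, height)]
        return " ".join(f"<loc{b}>" for b in bins)

    # e.g. a box on a 480x480 input
    print(coords_to_loc_tokens(120, 200, 300, 380, 480, 480))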

    opened by chenxwh 7
  • Text-to-Image Generation Code Example

    Text-to-Image Generation Code Example

    Thank you for your awesome project. I looked at your repository but could not find any example for text-to-image generation (I mean like what you have provided in Google Colab for other tasks).

    Could you please provide a code example that just generates an image from the input text?

    opened by AI-EnabledSoftwareEngineering-AISE 7
  • Running train_caption_stage1.sh I get the error AssertionError: 16 < 16

    Running train_caption_stage1.sh I get the error AssertionError: 16 < 16

    This is the error log:

    2022-06-22 15:55:56 - train.py[line:436] - INFO: begin validation on "valid" subset
    slice_id 0 seek offset 0
    slice_id 0 seek offset 0
    Traceback (most recent call last):
      File "../../train.py", line 528, in <module>
        cli_main()
      File "../../train.py", line 521, in cli_main
        distributed_utils.call_main(cfg, main)
      File "/mnt/hdd3/lcl/imagecaption/ofa/fairseq/fairseq/distributed/utils.py", line 389, in call_main
        main(cfg, **kwargs)
      File "../../train.py", line 190, in main
        valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
      File "/mnt/hdd3/lcl/anaconda3/envs/torch1.8/lib/python3.6/contextlib.py", line 52, in inner
        return func(*args, **kwds)
      File "../../train.py", line 316, in train
        cfg, trainer, task, epoch_itr, valid_subsets, end_of_epoch
      File "../../train.py", line 402, in validate_and_save
        valid_losses = validate(cfg, trainer, task, epoch_itr, valid_subsets)
      File "../../train.py", line 472, in validate
        trainer.valid_step(sample)
      File "/mnt/hdd3/lcl/anaconda3/envs/torch1.8/lib/python3.6/contextlib.py", line 52, in inner
        return func(*args, **kwds)
      File "/mnt/hdd3/lcl/imagecaption/ofa/trainer.py", line 1059, in valid_step
        sample, self.model, self.criterion, **extra_kwargs
      File "/mnt/hdd3/lcl/imagecaption/ofa/tasks/mm_tasks/caption.py", line 139, in valid_step
        hyps, refs = self._inference(self.sequence_generator, sample, model)
      File "/mnt/hdd3/lcl/imagecaption/ofa/tasks/mm_tasks/caption.py", line 230, in _inference
        gen_out = self.inference_step(generator, [model], sample)
      File "/mnt/hdd3/lcl/imagecaption/ofa/fairseq/fairseq/tasks/fairseq_task.py", line 518, in inference_step
        models, sample, prefix_tokens=prefix_tokens, constraints=constraints
      File "/mnt/hdd3/lcl/anaconda3/envs/torch1.8/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/mnt/hdd3/lcl/imagecaption/ofa/models/sequence_generator.py", line 207, in generate
        return self._generate(models, sample, **kwargs)
      File "/mnt/hdd3/lcl/imagecaption/ofa/models/sequence_generator.py", line 480, in _generate
        assert step < max_len, f"{step} < 90,208"
    AssertionError: 16 < 16

    Below is my train_caption_stage1.sh:

    #!/usr/bin/env

    # The port for communication. Note that if you want to run multiple tasks on the same machine,
    # you need to specify different port numbers.
    export MASTER_PORT=1060

    log_dir=./stage1_logs
    save_dir=./stage1_checkpoints
    mkdir -p $log_dir $save_dir

    bpe_dir=../../utils/BPE
    user_dir=../../ofa_module

    data_dir=../../dataset/caption_data
    data=${data_dir}/caption_stage1_train.tsv,${data_dir}/caption_val.tsv
    restore_file=../../checkpoints/ofa_large.pt
    selected_cols=0,4,2

    task=caption
    arch=ofa_large
    criterion=adjust_label_smoothed_cross_entropy
    label_smoothing=0.1
    lr=1e-5
    max_epoch=5
    warmup_ratio=0.06
    batch_size=1
    update_freq=4
    resnet_drop_path_rate=0.0
    encoder_drop_path_rate=0.1
    decoder_drop_path_rate=0.1
    dropout=0.1
    attention_dropout=0.0
    max_src_length=80
    max_tgt_length=20
    num_bins=1000
    patch_image_size=480
    eval_cider_cached=${data_dir}/cider_cached_tokens/coco-valid-words.p
    drop_worst_ratio=0.2

    log_file="2_0.06.log"
    save_path="/mnt/hdd3/lcl/imagecaption/ofa/run_scripts/caption/stage1_logs/"

    CUDA_VISIBLE_DEVICES=0 python3 ../../train.py \
        $data \
        --selected-cols=${selected_cols} \
        --bpe-dir=${bpe_dir} \
        --user-dir=${user_dir} \
        --restore-file=${restore_file} \
        --reset-optimizer --reset-dataloader --reset-meters \
        --save-dir=${save_path} \
        --task=${task} \
        --arch=${arch} \
        --criterion=${criterion} \
        --label-smoothing=${label_smoothing} \
        --batch-size=${batch_size} \
        --update-freq=${update_freq} \
        --encoder-normalize-before \
        --decoder-normalize-before \
        --share-decoder-input-output-embed \
        --share-all-embeddings \
        --layernorm-embedding \
        --patch-layernorm-embedding \
        --code-layernorm-embedding \
        --resnet-drop-path-rate=${resnet_drop_path_rate} \
        --encoder-drop-path-rate=${encoder_drop_path_rate} \
        --decoder-drop-path-rate=${decoder_drop_path_rate} \
        --dropout=${dropout} \
        --attention-dropout=${attention_dropout} \
        --weight-decay=0.01 --optimizer=adam --adam-betas="(0.9,0.999)" --adam-eps=1e-08 --clip-norm=1.0 \
        --lr-scheduler=polynomial_decay --lr=${lr} \
        --max-epoch=2 --warmup-ratio=0.06 \
        --log-format=simple --log-interval=10 \
        --fixed-validation-seed=7 \
        --no-epoch-checkpoints --keep-best-checkpoints=1 \
        --save-interval=1 --validate-interval=1 \
        --save-interval-updates=500 --validate-interval-updates=500 \
        --eval-cider \
        --eval-cider-cached-tokens=${eval_cider_cached} \
        --eval-args='{"beam":5,"max_len_b":16,"no_repeat_ngram_size":3}' \
        --best-checkpoint-metric=cider --maximize-best-checkpoint-metric \
        --max-src-length=${max_src_length} \
        --max-tgt-length=${max_tgt_length} \
        --find-unused-parameters \
        --freeze-encoder-embedding \
        --freeze-decoder-embedding \
        --add-type-embedding \
        --scale-attn \
        --scale-fc \
        --scale-heads \
        --disable-entangle \
        --num-bins=${num_bins} \
        --patch-image-size=${patch_image_size} \
        --drop-worst-ratio=${drop_worst_ratio} \
        --drop-worst-after=2500 \
        --fp16 \
        --fp16-scale-window=512 \
        --num-workers=0

    opened by chunleiml 1
  • OFA-Explainability

    OFA-Explainability

    Thank you for your great work. I retrained your model on captioning tasks and the results are very good. To justify the results of my research, I'd like to add a layer of explainability to OFA's decisions. To do so, I started with this project because it already adds explainability to CLIP. This Colab contains their code. I need to load the OFA model and receive two callable objects. The first should be the model, which accepts an image and text and returns logits per image and logits per text. The second should be a preprocess object: a torchvision transform that converts a PIL image into a tensor that the returned model can take as input.

    Pseudocode should be like this:

    # model : torch.nn.Module, The OFA model
    # preprocess : Callable[[PIL.Image], torch.Tensor]
    
    
    model, preprocess = OFA.load("path/to/ofa_large.pt", device=device)
    
    img_path = "glasses.png"
    img = preprocess(Image.open(img_path)).unsqueeze(0).to(device)
    texts = ["a man with eyeglasses"]
    
    text = OFA.tokenize(texts).to(device)
    
    R_text, R_image = interpret(model=model, image=img, texts=text, device=device)
    ...
    

    Interpret function:

    def interpret(image, texts, model, device):
        batch_size = texts.shape[0]
        images = image.repeat(batch_size, 1, 1, 1)
        logits_per_image, logits_per_text = model(images, texts)
    

    I went through your code but could not find how I should create such objects. Would you please help me with this problem?

    opened by AI-EnabledSoftwareEngineering-AISE 1
  • Pretraining(reproducing) model on translated dataset

    Pretraining(reproducing) model on translated dataset

    First of all, thank you for releasing the code.

    Now I would like to reproduce the model with the same vision dataset but translated into a different language; as the text dataset, I would like to use filtered mC4 + OSCAR (Common Crawl).

    Do you have any advice or tips I should be aware of? :) (I assume the text processing or encode/decode treatment will be different?)

    Also, AFAIK the paper does not mention the hardware requirements for pretraining from scratch. Can you share them? Thank you in advance.

    opened by acul3 1
  • Error in fine-tuning on custom dataset

    Error in fine-tuning on custom dataset

    I followed the previous issues #76, #105, #56, and more to generate the tsv files for the VizWiz dataset. I ensured each row of the tsv has 6 values in the order required for training. If I try running multi-GPU training:

    GPUS_PER_NODE=4
    WORKER_CNT=1
    export MASTER_ADDR=127.0.0.1
    export MASTER_PORT=8214
    export RANK=0

    I get two errors:

    File "/fs/cml-scratch/kseelman/VQA/OFA/data/file_dataset.py", line 115, in column_l = [dtype(column_l[col_id]) for col_id, dtype in zip(self.selected_col_ids, self.dtypes)] IndexError: list index out of range

    AND

    RuntimeError: Output 0 of _DDPSinkBackward is a view and is being modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.
    ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3778252) of binary: /fs/cml-scratch/kseelman/VQA/OFA/env/bin/python3

    The weird part is that when I run with a single GPU (GPUS_PER_NODE=1), the list index out of range problem does not occur and training works. So single-GPU works, but I want to use multi-GPU since finetuning takes a long time.
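
    For reference, a quick check (a sketch; the expected field count of 6 follows the description above) that no .tsv row is missing tab-separated fields, which is one way selected_col_ids can index past the end of a row:

    # Report rows whose tab-separated field count differs from the expected one.
    expected_fields = 6
    path = "dataset/vqa_data/vizwiz_train.tsv"  # hypothetical path
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            n = len(line.rstrip("\n").split("\t"))
            if n != expected_fields:
                print(f"row {i}: expected {expected_fields} fields, got {n}")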

    opened by kyleseelman 0
  • Key 'use_ema_weights_to_init_param' is not in struct

    Key 'use_ema_weights_to_init_param' is not in struct

    Hi! Thanks for your amazing work. I'm trying to run train_vqa_distributed.sh with the checkpoint vqa_base_best.pt downloaded from your URL, but I get this error. How can I fix it? (I'm using the fairseq repo provided by you.)

    opened by L1-M1ng 8