Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

OFA Sys

Last update: Jan 8, 2023

Related tags

Deep Learning image-captioning visual-question-answering multimodal text-to-image-synthesis pretraining referring-expression-comprehension vision-and-language-pre-training

Overview

OFA

[Paper] [Blog] [Colab] [Spaces]

OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image generation, visual grounding, image captioning, image classification, text generation, etc.) to a simple sequence-to-sequence learning framework. For more information, please refer to our paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework.

News

2022.2.13: Released the demo of image captioning. Have fun!
2022.2.11: Released the Colab notebook for image captioning . Enjoy!
2022.2.11: Released the pretrained checkpoint of OFA-Large and the complete (2-staged) finetuning code for image captioning.
2022.2.10: Released the inference code & finetuned checkpoint for image captioning, which can reproduce the results on COCO Karparthy test split (149.6 CIDEr)

TODO

To release finetuning and inference codes for multimodal downstream tasks soon, including image captioning, VQA, text-to-image generation, SNLI-VE, Referring expression, comprehension, etc.
To release codes for pretraining soon.

Approach

Requirements

python 3.7.4
pytorch 1.8.1
torchvision 0.9.1
JAVA 1.8 (for COCO evaluation)

Installation

git clone https://github.com/OFA-Sys/OFA
pip install -r requirements.txt

Datasets and Checkpoints

See datasets.md and checkpoints.md.

Pretraining

To release soon:)

Finetuning & Inference

Below we provide methods for fintuning and inference on different downstream tasks.

Caption

Download data and files and put them in the correct directory
Train

cd run_scripts/caption
nohup sh train_caption_stage1.sh &  # stage1, train with cross-entropy loss
nohup sh train_caption_stage2.sh &  # stage2, load the best ckpt of stage1 and train with CIDEr optimization

Inference

cd run_scripts/caption ; sh evaluate_caption.sh  # inference & evaluate

Gallery

Below we provide examples of OFA in text-to-image generation and open-ended VQA. Also, we demonstrate its performance in unseen task (Grounded QA) as well as unseen domain (Visual Grounding on images from unseen domains).

Text-to-Image Generation (normal query)

Text-to-Image Generation (counterfactual query)

Open-Ended VQA

Grounded QA (unseen task)

Viusal Grounding (unseen domain)

Citation

Please cite our paper if you find it helpful :)

@article{wang2022OFA,
  title={Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework},
  author={Wang, Peng and Yang, An and Men, Rui and Lin, Junyang and Bai, Shuai and Li, Zhikang and Ma, Jianxin and Zhou, Chang and Zhou, Jingren and Yang, Hongxia},
  journal={arXiv e-prints},
  pages={arXiv--2202},
  year={2022}
}

Related Codebase

Fairseq

License

Apache-2.0

Comments

Inference model using Huggingface library

First of all, Thanks for your amazing work.👍 I'm very surprised at the results you've made. But I have a question. Is it possible to use this model using the transformers library using the checkpoint of the model? You made it possible to infer to the model in the spaces of the transformers library, so are you planning to upload a checkpoint in the transformers library and use that library for the inference? When I saw the colab you posted, it said how to use only fairseq I'll be waiting for the reply. Once again, thank you for the amazing results!
enhancement

opened by fightnyy 20
ofa-large image caption performance

Hi! Thanks for your great work. I have met a question that when I directly use your train_caption_stage1.sh to finetune OFA-large in caption without CIDER optimization, the result I got was not as good as README said. I got about 139.3 and the README says it may be around 139.5. Also, in the test stage, the CIDER is 141.2 and the paper says it may be 142.2. So I wonder if I finetune it with the wrong hyperparamters. Can you share the hyperparameters you use to finetune OFA-large in caption stage-1? Thanks a lot!

opened by hills-code 17

How to evaluate prompt tuning model?

Hi, OFA team,

I have used prompt tuning method to train a vqa-gen task, and evaluate the model via run_scripts/vqa/evaluate_vqa_beam.sh directly, but got error below:

Traceback (most recent call last):
  File "../../evaluate.py", line 160, in <module>
    cli_main()
  File "../../evaluate.py", line 154, in cli_main
    distributed_utils.call_main(
  File "/workspace/project/OFA/fairseq/fairseq/distributed/utils.py", line 376, in call_main
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "/workspace/project/OFA/fairseq/fairseq/distributed/utils.py", line 350, in distributed_main
    main(cfg, **kwargs)
  File "../../evaluate.py", line 138, in main
    result, scores = eval_step(task, generator, models, sample, **kwargs)
  File "/workspace/project/OFA/utils/eval_utils.py", line 306, in eval_step
    return eval_vqa_gen(task, generator, models, sample, **kwargs)
  File "/workspace/project/OFA/utils/eval_utils.py", line 47, in eval_vqa_gen
    hypos = task.inference_step(generator, models, sample, prefix_tokens=sample['prefix_tokens'])
  File "/workspace/project/OFA/fairseq/fairseq/tasks/fairseq_task.py", line 517, in inference_step
    return generator.generate(
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/project/OFA/models/sequence_generator.py", line 209, in generate
    return self._generate(models, sample, **kwargs)
  File "/workspace/project/OFA/models/sequence_generator.py", line 354, in _generate
    lprobs, avg_attn_scores = model.forward_decoder(
  File "/workspace/project/OFA/models/sequence_generator.py", line 824, in forward_decoder
    decoder_out = model.decoder.forward(
  File "/workspace/project/OFA/models/ofa/unify_transformer.py", line 1343, in forward
    x, extra = self.extract_features(
  File "/workspace/project/OFA/models/ofa/unify_transformer.py", line 1367, in extract_features
    return self.extract_features_scriptable(
  File "/workspace/project/OFA/models/ofa/unify_transformer.py", line 1532, in extract_features_scriptable
    x, layer_attn, _ = layer(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/project/OFA/models/ofa/unify_transformer_layer.py", line 500, in forward
    x, attn = self.self_attn(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/project/OFA/models/ofa/unify_multihead_attention.py", line 342, in forward
    assert key_padding_mask.size(1) == k.size(1), "{} vs {}".format(
AssertionError: 101 vs 102

Could you please provide an example for evaluating prompt tuning models, thanks?

opened by flymark2010 14

ConfigAttributeError when load the checkpoint

Hi, Thanks for the great work! I meet problems when I load the pre-trained checkpoint (refcocog_large_best.pt). I load the model by

overrides={"bpe_dir":"utils/BPE"}
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
        utils.split_paths('checkpoints/refcocog.pt'),
        arg_overrides=overrides
    )

The error occurs

Traceback (most recent call last):
  File "eval_refcoco.py", line 22, in <module>
    arg_overrides=overrides
  File "/home/tiger/.local/lib/python3.7/site-packages/fairseq-1.0.0a0+4095baa-py3.7-linux-x86_64.egg/fairseq/checkpoint_utils.py", line 457, in load_model_ensemble_and_task
    model = task.build_model(cfg.model)
  File "/opt/tiger/OFA_offical/tasks/mm_tasks/refcoco.py", line 79, in build_model
    if self.cfg.scst:
  File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 305, in __getattr__
    self._format_and_raise(key=key, value=None, cause=e)
  File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/base.py", line 101, in _format_and_raise
    type_override=type_override,
  File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/_utils.py", line 629, in format_and_raise
    _raise(ex, cause)
  File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
  File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 303, in __getattr__
    return self._get_impl(key=key, default_value=DEFAULT_VALUE_MARKER)
  File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 361, in _get_impl
    node = self._get_node(key=key)
  File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 383, in _get_node
    self._validate_get(key)
  File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 136, in _validate_get
    key=key, value=value, cause=ConfigAttributeError(msg)
  File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/base.py", line 101, in _format_and_raise
    type_override=type_override,
  File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/_utils.py", line 694, in format_and_raise
    _raise(ex, cause)
  File "/home/tiger/.local/lib/python3.7/site-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ConfigAttributeError: Key 'scst' not in 'RefcocoConfig'
        full_key: scst
        reference_type=Optional[RefcocoConfig]
        object_type=RefcocoConfig

I would appreciate your help!

opened by zd11024 12

Problems with finetuned model for VQAv2 (ms coco)
I am doing inference for the VQA val set manually to get all answers using your demo colab notebook. I used to do everything like you wrote there and so I was using Pre-trained checkpoint (OFA-Large) as it was in tutorial, the quality was around 68% accuracy on a val set. Then I decided to change the model to the Finetuned checkpoint for VQAv2. It works with the same code, however, it's behaviour is strange, inference is very slow: for pretrained model it was 214k answers by 10 hours and now I got only 30k answers by 15 hours on the same Tesla V100. Also the quality is worse for some reason, it's around 60% accuracy now. Some answers are strange, some are completely correct and some are, for example "bedroom bedroom bedroom bedroom bedroom ...", "no no no no no no no..." etc. For some reason model gives very long answers and doesn't stop generating sequence of words.

I am a bit confused, why is it happening, maybe I am doing something wrong. I run model as in this code: https://colab.research.google.com/drive/1lsMsF-Vum3MVyXwSVF5E-Y23rHFvj_3y?usp=sharing

I only change path to finetuned model in this part:

parser = options.get_generation_parser() input_args = ["", "--task=vqa_gen", "--beam=100", "--unnormalized", "--path=checkpoints/vqa_large_best.pt", "--bpe-dir=utils/BPE"] args = options.parse_args_and_arch(parser, input_args) cfg = convert_namespace_to_omegaconf(args)
opened by 25icecreamflavors 11
How to train VQA on my custom data?

Hello! I am trying to finetune OFA-large on VQA using custom dataset, using the finetuning instruction in the repo. I have checked my .tsv and .pkl file several times and they are correct as your provided sample. But after command "bash train_vqa_distributed.sh", the terminal just prints:

total_num_updates 40000 warmup_updates 1000 lr 5e-5 patch_image_size 480

The GPU usage will rise to a certain value and then suddenly return to zero, and then the program will end. I train on single server with 2 GPU. Looking forward to reply, thanks for your sharing work!

opened by xiaoqiang-lu 11
The code of OFA-base is inconsistent with the pre-trained checkpoint

Thanks for your awesome work. Something has bothered me recently. When I continued to train OFA-base (I tried to collect all the pre-training data of OFA), I found that a few training steps (10 steps) would make the performance of OFA-base worse. I checked the config in checkpoint and the config in pretrain_ofa_base.sh, and found many differences. What might affect the results?

In addition, I found that there is a dimension inconsistency in the network. decoder.image_position_idx": "<class 'torch.Tensor'> torch.Size([1026]) in code and decoder.image_position_idx": "<class 'torch.Tensor'> torch.Size([1025]) in ckpt. Is this the reason for the decline of performance?

opened by zzhanghub 10
Huggingface transformers inference: ModuleNotFoundError: No module named 'generate'
When running the imports listed in transformers.md:

from PIL import Image from torchvision import transforms from transformers import OFATokenizer, OFAModel from generate import sequence_generator

I get ModuleNotFoundError: No module named 'generate'

Where is generate supposed to come from? The implementations of sequence_generator.SequenceGenerator that I see in e.g. fairseq also don't have the same signature, so it's not clear how to proceed.
opened by steve-marmalade 10

subprocess.CalledProcessError

Hello

I tried finetuning large model for image captioning but I keep getting subprocess.CalledProcessError. I've tried various numbers for Port number but it did not work out. What could be the possible reason for this error? (though it seems like a gpu distribution issue...) Thank you so much for your help

export MASTER_PORT=1052

log_dir=./stage2_logs
save_dir=./stage2_checkpoints
mkdir -p $log_dir $save_dir

bpe_dir=../../utils/BPE
user_dir=../../ofa_module

data_dir=../../dataset/caption_data
data=${data_dir}/caption_train_stage2_new.tsv,${data_dir}/caption_val_ct.tsv
restore_file=../../checkpoints/caption_stage1_best.pt
selected_cols=1,4,2

task=caption
arch=ofa_large
criterion=scst_reward_criterion
label_smoothing=0.1
lr=1e-5
max_epoch=5
warmup_ratio=0.06
batch_size=1
update_freq=4
resnet_drop_path_rate=0.0
encoder_drop_path_rate=0.0
decoder_drop_path_rate=0.0
dropout=0.0
attention_dropout=0.0
max_src_length=80
max_tgt_length=20
num_bins=1000
patch_image_size=480
eval_cider_cached=${data_dir}/cider_cached_tokens/coco-valid-words.p
scst_cider_cached=${data_dir}/cider_cached_tokens/coco-train-words.p

for lr in 5e-6,; do
  echo "lr "${lr}
  for max_epoch in 5; do
    echo "max_epoch "${max_epoch}

    log_file=${log_dir}/${lr}"_"${max_epoch}".log"
    save_path=${save_dir}/${lr}"_"${max_epoch}
    mkdir -p $save_path

    CUDA_VISIBLE_DEVICES=1,2 python3 -m torch.distributed.launch --nproc_per_node=2 --master_port=${MASTER_PORT} ../../train.py \
        $data \
        --selected-cols=${selected_cols} \
        --bpe-dir=${bpe_dir} \
        --user-dir=${user_dir} \
        --restore-file=${restore_file} \
        --reset-optimizer --reset-dataloader --reset-meters \
        --save-dir=${save_path} \
        --task=${task} \
        --arch=${arch} \
        --criterion=${criterion} \
        --batch-size=${batch_size} \
        --update-freq=${update_freq} \
        --encoder-normalize-before \
        --decoder-normalize-before \
        --share-decoder-input-output-embed \
        --share-all-embeddings \
        --layernorm-embedding \
        --patch-layernorm-embedding \
        --code-layernorm-embedding \
        --resnet-drop-path-rate=${resnet_drop_path_rate} \
        --encoder-drop-path-rate=${encoder_drop_path_rate} \
        --decoder-drop-path-rate=${decoder_drop_path_rate} \
        --dropout=${dropout} \
        --attention-dropout=${attention_dropout} \
        --weight-decay=0.01 --optimizer=adam --adam-betas="(0.9,0.999)" --adam-eps=1e-08 --clip-norm=1.0 \
        --lr-scheduler=polynomial_decay --lr=${lr} --end-learning-rate=2e-7 \
        --max-epoch=${max_epoch} --warmup-ratio=${warmup_ratio} \
        --log-format=simple --log-interval=10 \
        --fixed-validation-seed=7 \
        --no-epoch-checkpoints --keep-best-checkpoints=1 \
        --save-interval=1 --validate-interval=1 \
        --save-interval-updates=500 --validate-interval-updates=500 \
        --eval-cider \
        --eval-cider-cached-tokens=${eval_cider_cached} \
        --eval-args='{"beam":5,"max_len_b":16,"no_repeat_ngram_size":3}' \
        --best-checkpoint-metric=cider --maximize-best-checkpoint-metric \
        --max-src-length=${max_src_length} \
        --max-tgt-length=${max_tgt_length} \
        --find-unused-parameters \
        --freeze-encoder-embedding \
        --freeze-decoder-embedding \
        --freeze-resnet \
        --add-type-embedding \
        --scale-attn \
        --scale-fc \
        --scale-heads \
        --disable-entangle \
        --num-bins=${num_bins} \
        --patch-image-size=${patch_image_size} \
        --scst \
        --scst-cider-cached-tokens=${scst_cider_cached} \
        --scst-args='{"beam":5,"max_len_b":16,"no_repeat_ngram_size":3}' \
        --memory-efficient-fp16 \
        --fp16-scale-window=512 \
        --num-workers=0 > ${log_file} 2>&1
  done
done

opened by Jihyun0510 10

How to train OFA for VQA in open-ended?

Dear authors: Thanks for the great work! In VQA validation, If I want the model to predict the most likely next token (i.e. generating a token in the answer) from the output logits. And then I append this token to the input and repeat this step until the model predicts ⟨EOS⟩. What could I do to achieve it? Thanks a lot!

opened by qyc-98 10
Additional issues trying to finetune on custom (VQA-like) dataset (VizWiz)

Hello, first I'd like to thank you for your amazing work and especially all the detailed answers to issues.

I've been following the different issues on the finetuning on a custom dataset (VizWiz) and produced the .tsv files according to your format. You stated in issue #76 that the trainval_ans2label.pkl file is not used when using beam-search evaluation - is this correct?

I've skipped its creation and training does run for the first epoch. However, upon validation on the valid subset, I get an assertion error in the sequence_generator.py - I've tracked down the error and I can "fix" it by removing the one extra step that is for the EOS marker, but my understanding of how to properly fix that error is limited.

To give some more information of how the .tsv files look, I have attached an image for the train and val subset.

Thank you very much for any kind of input in advance!

opened by Velcorn 10
Fix NoneType Error

I found a TypeError: 'NoneType' object is not subscriptable when doing evaluation and execute line 159 in evaluate.py. I am not sure if it is the right way to fix it.

opened by xcvil 0
Evaluation: TypeError: 'NoneType' object is not subscriptable
In evaluate.py line 159:

if isinstance(scores[0], tuple):

I would suggest considering the NoneType scenario, e.g., in OFA/utils/eval_utils.py line 55.

So it will be bug-free if you modify evaluate.py line 159 as below:

if scores and isinstance(scores[0], tuple):
opened by xcvil 1

Captioning training does not start in Kaggle

I tried P100 kaggle session with:

!python -m pip install pip==21.2.4
!pip uninstall fairseq
!git clone https://github.com/OFA-Sys/OFA
!cd OFA && pip install -r requirements.txt
!cd OFA/run_scripts/caption && sh evaluate_caption.sh # ok

!python OFA/train.py --task "caption" --user-dir "OFA/ofa_module" # failed

Error: ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /opt/conda/lib/python3.7/site-packages/pyarrow/../../../libarrow.so.500)

Traceback (most recent call last):
  File "OFA/train.py", line 537, in <module>
    cli_main()
  File "OFA/train.py", line 516, in cli_main
    parser = options.get_training_parser()
  File "/kaggle/working/OFA/fairseq/fairseq/options.py", line 38, in get_training_parser
    parser = get_parser("Trainer", default_task)
  File "/kaggle/working/OFA/fairseq/fairseq/options.py", line 227, in get_parser
    utils.import_user_module(usr_args)
  File "/kaggle/working/OFA/fairseq/fairseq/utils.py", line 489, in import_user_module
    importlib.import_module(module_name)
  File "/opt/conda/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/kaggle/working/OFA/ofa_module/__init__.py", line 3, in <module>
    import tasks
  File "/kaggle/working/OFA/tasks/__init__.py", line 3, in <module>
    from .nlg_tasks import *
  File "/kaggle/working/OFA/tasks/nlg_tasks/__init__.py", line 1, in <module>
    from .gigaword import GigawordTask
  File "/kaggle/working/OFA/tasks/nlg_tasks/gigaword.py", line 19, in <module>
    from datasets import load_metric
  File "/opt/conda/lib/python3.7/site-packages/datasets/__init__.py", line 22, in <module>
    import pyarrow
  File "/opt/conda/lib/python3.7/site-packages/pyarrow/__init__.py", line 63, in <module>
    import pyarrow.lib as _lib
ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /opt/conda/lib/python3.7/site-packages/pyarrow/../../../libarrow.so.500)

!apt-get install --only-upgrade libstdc++6 ->

Reading package lists... Done
Building dependency tree       
Reading state information... Done
libstdc++6 is already the newest version (10.3.0-1ubuntu1~20.04).
0 upgraded, 0 newly installed, 0 to remove and 114 not upgraded.

But it didn't solve my problem

opened by Lednik7 0

Question about the Visual Grounding inference result

Hi OFA team,

Thanks for this amazing work! I'm deeply impressed by it. I just have a question about the Visual Grounding result.

When the input text description of an object can be found in the image, the model usually gives the correct bounding box. For example, text "a blue turtle-like pokemon with round head" will give perfect result:

However, when I ask the model to find an object which is not in the image, the model will still give a wrong bounding box. For example, text "a black bird" will output:

In my use case, I hope the model can only answer the image related queries and reject unrelated queries. Do you have any suggestions or ideas for this? Would it help to finetune the model on a new curated dataset containing answerable and unanswerable questions? Or is it possible to fix this issue by doing postprocessing based on the current model?

The model I'm using is ofa_visual-grounding_refcoco_large_en. Looking forward to your reply. Thanks again for your time!

opened by Owen-Qin 1
How can I output one caption using image sequence?

I want to generate a video caption, but how can I use the image sequence? I could not find out which part to modify to use more than one image for the caption.

opened by funykatebird 1

training number of ofa-cn-muge

    It is fine to just use the pretrained model as it is pretrained on many image-text pairs. See [https://github.com/OFA-Sys/OFA/blob/main/checkpoints_cn.md](https://github.com/OFA-Sys/OFA/blob/main/checkpoints_cn.md). To achieve a better effect, finetuning on domain-specific data is recommended. Now we only provide one caption model finetuned on MUGE caption data, which are collected from the e-commerce.

Originally posted by @JustinLin610 in https://github.com/OFA-Sys/OFA/issues/227#issuecomment-1236575608

How many training data do you use to finetune OFA-CN-MUGE？50000 images in the ECommerce-IC.zip from https://tianchi.aliyun.com/dataset/107332 ?

opened by yangjianxin1 1

Owner

OFA Sys

GitHub

AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations

AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations. Each modality’s augmentations are contained within its own sub-library. These sub-libraries include both function-based and class-based transforms, composition operators, and have the option to provide metadata about the transform applied, including its intensity.

4.6k Jan 9, 2023

[ICCV'21] UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction

UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction Project Page | Paper | Supplementary | Video This reposit

331 Dec 28, 2022

Official implementation of the paper ``Unifying Nonlocal Blocks for Neural Networks'' (ICCV'21)

Spectral Nonlocal Block Overview Official implementation of the paper: Unifying Nonlocal Blocks for Neural Networks (ICCV'21) Spectral View of Nonloca

91 Dec 14, 2022

Official Implementation of "LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks"

LUNAR Official Implementation of "LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks" Adam Goodge, Bryan Hooi, Ng See Kiong and

25 Dec 28, 2022

Deep Learning: Architectures & Methods Project: Deep Learning for Audio Super-Resolution

Deep Learning: Architectures & Methods Project: Deep Learning for Audio Super-Resolution Figure: Example visualization of the method and baseline as a

16 Dec 23, 2022

Code image classification of MNIST dataset using different architectures: simple linear NN, autoencoder, and highway network

Deep Learning for image classification pip install -r http://webia.lip6.fr/~baskiotisn/requirements-amal.txt Train an autoencoder python3 train_auto

0 Mar 30, 2022

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021) Citation Please cite as: @inproceedings{liu2020understan

22 Nov 25, 2022

Sequence-to-Sequence learning using PyTorch

Seq2Seq in PyTorch This is a complete suite for training sequence-to-sequence models in PyTorch. It consists of several models and code to both train

514 Nov 17, 2022

Open source implementation of AceNAS: Learning to Rank Ace Neural Architectures with Weak Supervision of Weight Sharing

AceNAS This repo is the experiment code of AceNAS, and is not considered as an official release. We are working on integrating AceNAS as a built-in st

6 Sep 7, 2022

Keras like implementation of Deep Learning architectures from scratch using numpy.

Mini-Keras Keras like implementation of Deep Learning architectures from scratch using numpy. How to contribute? The project contains implementations

5 Oct 10, 2021

Learning Versatile Neural Architectures by Propagating Network Codes

Learning Versatile Neural Architectures by Propagating Network Codes Mingyu Ding, Yuqi Huo, Haoyu Lu, Linjie Yang, Zhe Wang, Zhiwu Lu, Jingdong Wang,

36 Dec 6, 2022

A python-image-classification web application project, written in Python and served through the Flask Microframework. This Project implements the VGG16 covolutional neural network, through Keras and Tensorflow wrappers, to make predictions on uploaded images.

Image Classification in Python Implementing image classification in Flask using Keras. The VGG16 is a convolution neural network model architecture th

19 Dec 12, 2022

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Segmentation Transformer Implementation of Segmentation Transformer in PyTorch, a new model to achieve SOTA in semantic segmentation while using trans

161 Dec 8, 2022

Implementation of SETR model, Original paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.

SETR - Pytorch Since the original paper (Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.) has no official

112 Dec 16, 2022

Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Related tags

Overview

OFA

News

TODO

Approach

Requirements

Installation

Datasets and Checkpoints

Pretraining

Finetuning & Inference

Caption

Gallery

Text-to-Image Generation (normal query)

Text-to-Image Generation (counterfactual query)

Open-Ended VQA

Grounded QA (unseen task)

Viusal Grounding (unseen domain)

Citation

Related Codebase

License

Comments

Owner

OFA Sys

AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations

[ICCV'21] UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction

Official implementation of the paper ``Unifying Nonlocal Blocks for Neural Networks'' (ICCV'21)

Official Implementation of "LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks"

Deep Learning: Architectures & Methods Project: Deep Learning for Audio Super-Resolution

Code image classification of MNIST dataset using different architectures: simple linear NN, autoencoder, and highway network

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

Sequence-to-Sequence learning using PyTorch

Open source implementation of AceNAS: Learning to Rank Ace Neural Architectures with Weak Supervision of Weight Sharing

Keras like implementation of Deep Learning architectures from scratch using numpy.

Learning Versatile Neural Architectures by Propagating Network Codes

A python-image-classification web application project, written in Python and served through the Flask Microframework. This Project implements the VGG16 covolutional neural network, through Keras and Tensorflow wrappers, to make predictions on uploaded images.

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Implementation of SETR model, Original paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.

[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Sequence to Sequence Models with PyTorch

Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

An implementation of a sequence to sequence neural network using an encoder-decoder

Sequence lineage information extracted from RKI sequence data repo