An unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, among other captioning methods.

Overview

An Image Captioning codebase

This is a codebase for image captioning research.

It supports:

  • Self-critical sequence training (and the newer new_self_critical variant)
  • Bottom-up features
  • Test-time model ensembling
  • Multi-GPU training
  • Transformer-based captioning models
  • Structure loss

A simple demo Colab notebook is available here.

Requirements

  • Python 3
  • PyTorch 1.3+ (along with torchvision)
  • cider (already added as a submodule)
  • coco-caption (already added as a submodule; remember to follow the initialization steps in coco-caption/README.md)
  • yacs
  • lmdbdict

Install

If you have difficulty running the training scripts in tools/, you can try installing this repo as a Python package:

python -m pip install -e .

Pretrained models

Check out MODEL_ZOO.md.

If you only want to run evaluation, you can skip to the "Generate image captions" section below after downloading the pretrained models (and also the pretrained resnet101 or the precomputed bottom-up features; see data/README.md).

Train your own network on COCO/Flickr30k

Prepare data.

We now support both flickr30k and COCO. See details in data/README.md. (Note: the later sections assume the COCO dataset; switching to flickr30k should be trivial.)
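
As a rough sketch of what data preparation looks like, label and feature preprocessing are typically run as below (the exact script flags here are assumptions; data/README.md is authoritative):

$ python scripts/prepro_labels.py --input_json data/dataset_coco.json --output_json data/cocotalk.json --output_h5 data/cocotalk
$ python scripts/prepro_feats.py --input_json data/dataset_coco.json --output_dir data/cocotalk --images_root $IMAGE_ROOT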

Start training

$ python tools/train.py --id fc --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_fc --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 30

or

$ python tools/train.py --cfg configs/fc.yml --id fc

The train script dumps checkpoints into the folder specified by --checkpoint_path (default: log_$id/). By default, only the best-performing checkpoint on validation and the latest checkpoint are kept, to save disk space. You can also set --save_history_ckpt to 1 to save every checkpoint.

To resume training, set the --start_from option to the path containing infos.pkl and model.pth (usually you can just set --start_from and --checkpoint_path to the same directory).
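
For example, to resume the fc run above (a sketch assuming its checkpoints were written to log_fc):

$ python tools/train.py --cfg configs/fc.yml --id fc --start_from log_fc --checkpoint_path log_fc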

To check the training and validation curves, you can use tensorboard. The loss histories are automatically dumped into --checkpoint_path.
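
For example, assuming tensorboard is installed and the checkpoint path is log_fc:

$ tensorboard --logdir log_fc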

The command above uses scheduled sampling; you can set --scheduled_sampling_start to -1 to turn scheduled sampling off.

If you'd like to evaluate BLEU/METEOR/CIDEr scores during training in addition to the validation cross-entropy loss, use the --language_eval 1 option, but don't forget to pull the coco-caption submodule.
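
If you haven't pulled the submodules yet, the standard git command is:

$ git submodule update --init --recursive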

All arguments can also be specified in a yaml file and loaded with --cfg. If there are conflicts, the configurations given on the command line overwrite those in the cfg file.
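
For example, the following keeps everything from configs/fc.yml but overrides the batch size from the command line (the value 16 is just for illustration):

$ python tools/train.py --cfg configs/fc.yml --id fc --batch_size 16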

For more options, see opts.py.

Train using self-critical

First, preprocess the dataset to build the n-gram cache used for computing CIDEr scores:

$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train

Then copy the model pretrained with cross entropy. (Copying is not mandatory; it just keeps a back-up of the cross-entropy weights.)

$ bash scripts/copy_model.sh fc fc_rl

Then run:

$ python tools/train.py --id fc_rl --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-5 --start_from log_fc_rl --checkpoint_path log_fc_rl --save_checkpoint_every 6000 --language_eval 1 --val_images_use 5000 --self_critical_after 30 --cached_tokens coco-train-idxs --max_epochs 50 --train_sample_n 5

or

$ python tools/train.py --cfg configs/fc_rl.yml --id fc_rl

You will see a huge boost in CIDEr score :).

A few notes on training: when self-critical training starts after 30 epochs, the CIDEr score goes up to 1.05 after 600k iterations (including the 30 epochs of cross-entropy pretraining).

Generate image captions

Evaluate on raw images

Note: this doesn't work for models trained with bottom-up features. Place all the images of interest into a folder, e.g. blah, and run the eval script:

$ python tools/eval.py --model model.pth --infos_path infos.pkl --image_folder blah --num_images 10

This tells the eval script to caption up to 10 images from the given folder. If you have a big GPU you can speed up the evaluation by increasing batch_size. Use --num_images -1 to process all images. The eval script will create a vis.json file inside the vis folder, which can then be visualized with the provided HTML interface:

$ cd vis
$ python -m http.server

Now visit localhost:8000 in your browser and you should see your predicted captions.

Evaluate on Karpathy's test split

$ python tools/eval.py --dump_images 0 --num_images 5000 --model model.pth --infos_path infos.pkl --language_eval 1 

The default split to evaluate is test. The default inference method is greedy decoding (--sample_method greedy); to sample from the posterior instead, set --sample_method sample.

Beam Search. Beam search can improve performance over greedy decoding by roughly 5%, though it is a little more expensive. To turn on beam search, use --beam_size N, where N is greater than 1.
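
For example, re-running the Karpathy-split evaluation above with a beam size of 5 (the beam size here is just for illustration):

$ python tools/eval.py --dump_images 0 --num_images 5000 --model model.pth --infos_path infos.pkl --language_eval 1 --beam_size 5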

Evaluate on COCO test set

$ python tools/eval.py --input_json cocotest.json --input_fc_dir data/cocotest_bu_fc --input_att_dir data/cocotest_bu_att --input_label_h5 none --num_images -1 --model model.pth --infos_path infos.pkl --language_eval 0

You can download the preprocessed file cocotest.json, cocotest_bu_att and cocotest_bu_fc from link.

Miscellanea

Using CPU. The code currently uses the GPU by default; there is not even an option for switching. If someone really needs a CPU model, please open an issue; I can potentially create a CPU checkpoint and modify eval.py to run the model on CPU. However, there's no point in using CPUs to train the model.

Train on other datasets. Porting should be trivial if you can create a file like dataset_coco.json for your own dataset.

Live demo. Not supported for now. Pull requests are welcome.

For more advanced features:

Check out ADVANCED.md.

Reference

If you find this repo useful, please consider citing (no obligation at all):

@article{luo2018discriminability,
  title={Discriminability objective for training descriptive captions},
  author={Luo, Ruotian and Price, Brian and Cohen, Scott and Shakhnarovich, Gregory},
  journal={arXiv preprint arXiv:1803.04376},
  year={2018}
}

Of course, please cite the original papers of the models you are using (you can find the references in the model files).

Acknowledgements

Thanks to the original neuraltalk2 and the awesome PyTorch team.

Comments
  • Question about the bottom-up and top-down model

    Hi, the Bottom-Up and Top-Down paper says that 60k iterations (about 9 hours of training) reach a CIDEr of 120.1. I changed your model's hyperparameters to those in the paper and changed the input of the attention LSTM to the mean of the per-bounding-box image features, but I still cannot reach the results in the paper. In your experience, what could be the reason?

    opened by JimLee4530 26
  • The CIDEr score gets smaller than the pretrained model

    Hi, I am trying to reproduce your code in TensorFlow. I wrote my code almost exactly as yours, but I find the CIDEr score keeps getting smaller. At the same time, the sampled sentences also keep getting shorter. Can you give me some advice?

    opened by wjb123 23
  • Structure loss

    Hi,

    I have a question regarding the structure loss in su+bu+structure. Is the structure-loss weight similar to the lambda in your discriminability paper, i.e. does it scale the rewards against the cross-entropy? Furthermore, if that is the case, could each of the rewards also have a different weight?

    Thank you.

    opened by mememimis 20
  • About multi-GPU training

    Hi, when I tried to train the fc model on 2 GPUs by setting the environment variable CUDA_VISIBLE_DEVICES=0,1, I ran into some errors at train.py#125, but there were no errors when I ran your code with only one GPU.

    Does this repository support multi-GPU training?

    opened by xuyan1115 16
  • evaluation error using pre-trained model

    Hi Ruotian,

    I tried to run the following script: python eval.py --model resnet50.pth --infos_path infos.pkl --image_folder ./image/val2014_coco/ --num_images 1

    The model was resnet50 (and resnet101) downloaded from your google drive. But I got the error:

        Traceback (most recent call last):
          File "eval.py", line 102, in <module>
            model.load_state_dict(torch.load(opt.model))
          File "/home/wentong/miniconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 522, in load_state_dict
            .format(name))
        KeyError: 'unexpected key "conv1.weight" in state_dict'

    I have searched online but there was little information about that. I guess you have used multiple gpus.

    Any advice? Thank you for your implementation.

    opened by Wentong-DST 15
  • CUDA out of memory in self-critical training but not in xe

    Hello. I am using GTX2080 Ti with 11GB memory. I trained the model using xe and it works fine. But then when I run self-critical, I get CUDA out of memory. How come the model can fit in xe training but not in rl training? It is the same model with the same number of parameters. Any advice would help

    opened by homelifes 12
  • Issue when training

    I am trying to train the model as the per the instructions given on the repo. I am getting below error:

    [gsrivastava@default-home-gsrivastava-weietgweso self-critical.pytorch]$ python tools/train.py --id fc --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_fc --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 30
    Hugginface transformers not installed; please visit https://github.com/huggingface/transformers
    meshed-memory-transformer not installed; please run pip install git+https://github.com/ruotianluo/meshed-memory-transformer.git
    DataLoader loading json file: data/cocotalk.json
    vocab size is 9487
    DataLoader loading h5 file: data/cocotalk_fc data/cocotalk_att data/cocotalk_box data/cocotalk_label.h5
    max sequence length in data is 16
    read 123287 image features
    assigned 113287 images to split train
    assigned 5000 images to split val
    assigned 5000 images to split test
    /home/default/ephemeral_drive/work/image_captioning/self-critical.pytorch/captioning/data/dataloader.py:291: RuntimeWarning: Mean of empty slice.
      fc_feat = att_feat.mean(0)
    (the two lines above are repeated a dozen times)
    2020-07-27 22:49:15.536672: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
    Read data: 0.0002117156982421875
    Traceback (most recent call last):
      File "tools/train.py", line 289, in <module>
        train(opt)
      File "tools/train.py", line 182, in train
        model_out = dp_lw_model(fc_feats, att_feats, labels, masks, att_masks, data['gts'], torch.arange(0, len(data['gts'])), sc_flag, struc_flag)
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
        outputs = self.parallel_apply(replicas, inputs, kwargs)
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
        output.reraise()
      File "/usr/local/lib64/python3.6/site-packages/torch/_utils.py", line 395, in reraise
        raise self.exc_type(msg)
    StopIteration: Caught StopIteration in replica 0 on device 0.
    Original Traceback (most recent call last):
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
        output = module(*input, **kwargs)
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/default/ephemeral_drive/work/image_captioning/self-critical.pytorch/captioning/modules/loss_wrapper.py", line 45, in forward
        loss = self.crit(self.model(fc_feats, att_feats, labels[..., :-1], att_masks), labels[..., 1:], masks[..., 1:])
      File "/home/default/ephemeral_drive/work/image_captioning/self-critical.pytorch/captioning/models/CaptionModel.py", line 33, in forward
        return getattr(self, '_'+mode)(*args, **kwargs)
      File "/home/default/ephemeral_drive/work/image_captioning/self-critical.pytorch/captioning/models/AttModel.py", line 128, in _forward
        state = self.init_hidden(batch_size*seq_per_img)
      File "/home/default/ephemeral_drive/work/image_captioning/self-critical.pytorch/captioning/models/AttModel.py", line 99, in init_hidden
        weight = next(self.parameters())
    StopIteration

    PyTorch version: '1.5.0+cu101'

    I saw a PyTorch-related bug report, https://github.com/huggingface/transformers/issues/3936, which describes a similar issue. Not sure if it is related.

    opened by gsrivas4 11
  • Transformer performance

    Hello, training with transformer.yml I can only reach Bleu_1: 0.749, Bleu_2: 0.584, Bleu_3: 0.443, Bleu_4: 0.336, METEOR: 0.271, ROUGE_L: 0.553, CIDEr: 1.092, which is far below the scores you mention. Could you tell me how you set the training parameters?

    opened by hasky123 10
  • Type error while training

    @ruotianluo Sorry again, after fixing the file problem, I got an error: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)

    run tools/train.py --cfg configs/fc_rl.yml --id fc_rl
    DataLoader loading json file: data/cocotalk.json
    vocab size is 9487
    DataLoader loading h5 file: data/cocotalk_fc data/cocotalk_att data/cocotalk_box data/cocotalk_label.h5
    max sequence length in data is 16
    read 123287 image features
    assigned 113287 images to split train
    assigned 5000 images to split val
    assigned 5000 images to split test
    Read data: 0.003994464874267578
    Save ckpt on exception ...
    model saved to ./log_fc_rl\model.pth
    Save ckpt done.
    Traceback (most recent call last):
      File "D:\Stephan\Final project\self-critical.pytorch-master\tools\train.py", line 183, in train
        model_out = dp_lw_model(fc_feats, att_feats, labels, masks, att_masks, data['gts'], torch.arange(0, len(data['gts'])), sc_flag, struc_flag)
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 150, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\modules\loss_wrapper.py", line 45, in forward
        loss = self.crit(self.model(fc_feats, att_feats, labels[..., :-1], att_masks), labels[..., 1:], masks[..., 1:])
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\models\CaptionModel.py", line 33, in forward
        return getattr(self, '_'+mode)(*args, **kwargs)
      File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\models\AttModel.py", line 160, in _forward
        output, state = self.get_logprobs_state(it, p_fc_feats, p_att_feats, pp_att_feats, p_att_masks, state)
      File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\models\AttModel.py", line 167, in get_logprobs_state
        xt = self.embed(it)
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\sparse.py", line 114, in forward
        self.norm_type, self.scale_grad_by_freq, self.sparse)
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\functional.py", line 1484, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)

    opened by stephancheng 10
  • Question regarding self-critical reward

    Hi Ruotian, I'd like to ask about the nature of the reward when training with self-critical. Is it normal to start at negative or zero for the first few epochs? I am getting the following for the first epoch with self-critical (after training with XE for 13 epochs):

    (screenshot of the reward values during the first self-critical epoch)

    Is this normal? And what is the maximum reward you achieved after training with self-critical?

    Also, I'm using the py3 branch. I saw there are a lot of differences between the py3 and py2 branches. So is the py3 branch reliable to use?

    Looking forward to your answer!

    opened by fawazsammani 9
  • ZeroDivisionError: division by zero

    Hi Dr. Luo, I am trying to evaluate a model, but I run into an error.

    error info:

    python eval.py --model data/pretrain/fc/model-best.pth  --infos_path data/pretrain/fc/infos_fc-best.pkl --image_folder blah --num_images -1
    loading annotations into memory...
    0:00:00.498650
    creating index...
    index created!
    Traceback (most recent call last):
      File "eval.py", line 71, in <module>
        lang_stats = eval_utils.language_eval(opt.input_json, predictions, n_predictions, vars(opt), opt.split)
      File "/home/andrewcao95/workspace/pycharm_ws/self-critical.pytorch/eval_utils.py", line 83, in language_eval
        mean_perplexity = sum([_['perplexity'] for _ in preds_filt]) / len(preds_filt)
    ZeroDivisionError: division by zero
    
    

    source code:

    # filter results to only those in MSCOCO validation set
        preds_filt = [p for p in preds if p['image_id'] in valids]
        mean_perplexity = sum([_['perplexity'] for _ in preds_filt]) / len(preds_filt)
        mean_entropy = sum([_['entropy'] for _ in preds_filt]) / len(preds_filt)
        print('using %d/%d predictions' % (len(preds_filt), len(preds)))
        json.dump(preds_filt, open(cache_path, 'w')) # serialize to temporary json file. Sigh, COCO API...
    
        cocoRes = coco.loadRes(cache_path)
        cocoEval = COCOEvalCap(coco, cocoRes)
        cocoEval.params['image_id'] = cocoRes.getImgIds()
        cocoEval.evaluate()
    
        for metric, score in cocoEval.eval.items():
            out[metric] = score
        # Add mean perplexity
        out['perplexity'] = mean_perplexity
        out['entropy'] = mean_entropy
    
        imgToEval = cocoEval.imgToEval
        for k in list(imgToEval.values())[0]['SPICE'].keys():
            if k != 'All':
                out['SPICE_'+k] = np.array([v['SPICE'][k]['f'] for v in imgToEval.values()])
                out['SPICE_'+k] = (out['SPICE_'+k][out['SPICE_'+k]==out['SPICE_'+k]]).mean()
        for p in preds_filt:
            image_id, caption = p['image_id'], p['caption']
            imgToEval[image_id]['caption'] = caption
    
    opened by andrewcao95 8
  • How can I get my dataset's image features with scripts/prepro_feats.py?

    Thanks for your work! While preparing the data I found that I had not pre-extracted the image features for my dataset. How can I get these image features? Could you point me to some code for this?

    opened by ShanZard 0
  • How to set up the model ensemble?

    Hello, I see the model ensembling method in the paper, which helps improve model performance. I would like to know how to use the model ensembling implemented here. Thanks very much.

    opened by fjqfjqfjqfjqfjqfjqfjq 0
  • transformer_nsc

    Hi, when I run transformer_nsc.yml I run into the following messages; how can I solve this?
    Hugginface transformers not installed; please visit https://github.com/huggingface/transformers
    meshed-memory-transformer not installed; please run pip install git+https://github.com/ruotianluo/meshed-memory-transformer.git
    Warning: coco-caption not available

    opened by bai-24 11
  • Running Inference using CPU only.

    I am not able to run inference on the model provided in the demo using CPU only. I am getting cuda device not available error. I tried removing all gpu and cuda references in the code and replacing it with CPU but it still does not work. It would be really helpful if you could help understand how to run the demo code without GPU.

    Thanks.

    opened by harindercnvrg 0
Releases(3.2)
  • 3.2(May 29, 2020)

    1. Faster beam search
    2. Support h5 feature files
    3. Allow beam search + SCST (doesn't work as well though)
    4. Add a few models, BertCapModel and m2transformer (usefulness still in question)
    5. Add projects.
  • v3.1(Jan 10, 2020)

    1. Since it's 2020, py3 is officially supported. Open an issue if there is still something wrong.
    2. Finally, there is a model zoo which is relatively complete. Feel free to try the provided models.
  • 3(Dec 31, 2019)

    1. Add structure loss, inspired by Classical Structured Prediction Losses for Sequence to Sequence Learning.
    2. Add a function to sample n captions, supporting the methods described in https://www.dropbox.com/s/tdqr9efrjdkeicz/iccv.pdf?dl=0.
    3. A more pytorchic dataloader design. The dataloader no longer repeats image features according to seq_per_img; the repeating is now done in the model's forward function.
    4. Add multi-sentence sampling evaluation metrics such as mBleu and Self-CIDEr (those described in https://www.dropbox.com/s/tdqr9efrjdkeicz/iccv.pdf?dl=0).
    5. Use detectron-style configs to set up experiments.
    6. A better self-critical objective (now named new_self_critical). Use config ymls ending with nsc to test it. A technical report will be out soon. Basically, it performs better than the original SCST on all metrics (by a small margin) and is also slightly faster.
  • 2.2(Jun 25, 2019)

    1. Refactor the code a little bit.
    2. Add BPE (didn't seem to make much difference).
    3. Add nucleus sampling, top-k sampling and gumbel softmax sampling.
    4. Make AttEnsemble compatible with the transformer.
    5. Add "remove bad ending" from Improving Reinforcement Learning Based Image Captioning with Natural Language Prior.

  • 2.1(Jun 25, 2019)

  • 2.0.0(Apr 29, 2018)

    1. Add support for bleu4 optimization or combination of bleu4 and cider
    2. Add bottom-up feature support
    3. Add ensemble during evaluation.
    4. Add multi-gpu support.
    5. Add miscellaneous things. (box features; experimental models etc.)
  • 1.0(Apr 28, 2018)

Owner
Ruotian (RT) Luo, PhD student at TTIC
I have decided to sync up this repo and self-critical.pytorch. (The old master is kept in the old master branch for archive.)
