I have decided to sync this repo up with self-critical.pytorch. (The old master is archived in the old master branch.)

Overview

An Image Captioning codebase

This is a codebase for image captioning research.

It supports self-critical sequence training, among other features (see ADVANCED.md for more).

A simple demo colab notebook is available here

Requirements

  • Python 3
  • PyTorch 1.3+ (along with torchvision)
  • cider (already added as a submodule)
  • coco-caption (already added as a submodule; remember to follow the initialization steps in coco-caption/README.md, and see the clone commands after this list)
  • yacs
  • lmdbdict
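
Both submodules ship with the repo. Assuming a standard git checkout, a typical way to fetch them is:

$ git clone https://github.com/ruotianluo/ImageCaptioning.pytorch.git
$ cd ImageCaptioning.pytorch
$ git submodule update --init --recursive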

Install

If you have difficulty running the training scripts in tools, you can try installing this repo as a python package:

python -m pip install -e .
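
After an editable install, you can sanity-check that the package is importable (the package name captioning is taken from the tracebacks quoted in the comments below; adjust if your checkout differs):

$ python -c "import captioning; print(captioning.__file__)"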

Pretrained models

Check out MODEL_ZOO.md.

If you want to do evaluation only, you can follow this section after downloading the pretrained models (and also the pretrained resnet101 or the precomputed bottom-up features; see data/README.md).

Train your own network on COCO/Flickr30k

Prepare data.

We now support both flickr30k and COCO. See details in data/README.md. (Note: the later sections assume the COCO dataset; it should be trivial to switch to flickr30k.)

Start training

$ python tools/train.py --id fc --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_fc --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 30

or

$ python tools/train.py --cfg configs/fc.yml --id fc

The train script will dump checkpoints into the folder specified by --checkpoint_path (default = log_$id/). By default, only the best-performing checkpoint on validation and the latest checkpoint are saved, to conserve disk space. You can also set --save_history_ckpt to 1 to save every checkpoint.

To resume training, specify the --start_from option as the path containing infos.pkl and model.pth (usually you can just set --start_from and --checkpoint_path to the same directory).
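
For example, to resume the fc run from the commands above (same flags as before; only --start_from is added on top of the cfg-based command):

$ python tools/train.py --cfg configs/fc.yml --id fc --start_from log_fc --checkpoint_path log_fc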

To inspect the training and validation curves, you can use tensorboard. The loss histories are automatically dumped into --checkpoint_path.

The command above uses scheduled sampling; you can set --scheduled_sampling_start to -1 to turn scheduled sampling off.

If you'd like to evaluate BLEU/METEOR/CIDEr scores during training in addition to the validation cross-entropy loss, use the --language_eval 1 option, but don't forget to pull the coco-caption submodule.

All arguments can also be specified in a yaml file and loaded with --cfg. When there are conflicts, command-line arguments override the cfg file.
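
As a rough illustration, a yml config mirroring the command-line flags of the first training command might look like the sketch below. This is only an example of the format, not the actual contents of configs/fc.yml; the key names are assumed to match the flag names accepted by opts.py.

    caption_model: newfc
    input_json: data/cocotalk.json
    input_fc_dir: data/cocotalk_fc
    input_att_dir: data/cocotalk_att
    input_label_h5: data/cocotalk_label.h5
    batch_size: 10
    learning_rate: 5e-4
    learning_rate_decay_start: 0
    scheduled_sampling_start: 0
    save_checkpoint_every: 6000
    val_images_use: 5000
    max_epochs: 30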

For more options, see opts.py.

Train using self critical

First, preprocess the dataset and build the cache used for calculating CIDEr scores:

$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train

Then, copy the model from the pretrained cross-entropy model. (Copying the model is not mandatory; it is just a back-up.)

$ bash scripts/copy_model.sh fc fc_rl

Then run:

$ python tools/train.py --id fc_rl --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-5 --start_from log_fc_rl --checkpoint_path log_fc_rl --save_checkpoint_every 6000 --language_eval 1 --val_images_use 5000 --self_critical_after 30 --cached_tokens coco-train-idxs --max_epochs 50 --train_sample_n 5

or

$ python tools/train.py --cfg configs/fc_rl.yml --id fc_rl

You will see a large boost in CIDEr score. :)

A few notes on training. Starting self-critical training after 30 epochs, the CIDEr score goes up to 1.05 after 600k iterations (including the 30 epochs of pretraining).
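
For reference, the core idea of self-critical training (from the SCST paper) is to use the model's own greedy-decoded caption as the reward baseline for REINFORCE. The snippet below is a minimal, repo-agnostic sketch of that loss, not the implementation used in this codebase; all tensor names are hypothetical.

    import torch

    def self_critical_loss(sample_logprobs, sample_cider, greedy_cider, mask):
        """SCST-style policy-gradient loss (illustrative sketch only).

        sample_logprobs: (batch, seq_len) log-probs of the sampled captions
        sample_cider:    (batch,) CIDEr of the sampled captions
        greedy_cider:    (batch,) CIDEr of the greedy-decoded captions (baseline)
        mask:            (batch, seq_len) 1 for real tokens, 0 for padding
        """
        advantage = (sample_cider - greedy_cider).unsqueeze(1)  # (batch, 1)
        # Push up sampled captions that beat the greedy baseline, push down the rest.
        loss = -(advantage * sample_logprobs * mask).sum() / mask.sum()
        return loss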

Generate image captions

Evaluate on raw images

Note: this doesn't work for models trained with bottom-up features. Place all your images of interest into a folder, e.g. blah, and run the eval script:

$ python tools/eval.py --model model.pth --infos_path infos.pkl --image_folder blah --num_images 10

This tells the eval script to run on up to 10 images from the given folder. If you have a big GPU you can speed up the evaluation by increasing batch_size. Use --num_images -1 to process all images. The eval script will create a vis.json file inside the vis folder, which can then be visualized with the provided HTML interface:

$ cd vis
$ python -m http.server  # use `python -m SimpleHTTPServer` on Python 2

Now visit localhost:8000 in your browser and you should see your predicted captions.
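
If you prefer to inspect the predictions without the HTML interface, vis/vis.json can also be read directly. Assuming it holds a list of per-image prediction entries (the exact field names may differ, so the sketch below just prints whatever each entry contains):

    import json

    with open('vis/vis.json') as f:
        predictions = json.load(f)

    # Print the first few entries to see the schema (e.g. image id and caption).
    for entry in predictions[:10]:
        print(entry)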

Evaluate on Karpathy's test split

$ python tools/eval.py --dump_images 0 --num_images 5000 --model model.pth --infos_path infos.pkl --language_eval 1 

The default split to evaluate is test. The default inference method is greedy decoding (--sample_method greedy); to sample from the posterior instead, set --sample_method sample.

Beam Search. Beam search can improve performance over greedy decoding by around 5%, at a somewhat higher computational cost. To turn on beam search, use --beam_size N, where N is greater than 1.
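
For example, re-running the Karpathy-split evaluation above with a beam of 5 (same flags as before, plus --beam_size):

$ python tools/eval.py --dump_images 0 --num_images 5000 --model model.pth --infos_path infos.pkl --language_eval 1 --beam_size 5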

Evaluate on COCO test set

$ python tools/eval.py --input_json cocotest.json --input_fc_dir data/cocotest_bu_fc --input_att_dir data/cocotest_bu_att --input_label_h5 none --num_images -1 --model model.pth --infos_path infos.pkl --language_eval 0

You can download the preprocessed files cocotest.json, cocotest_bu_att and cocotest_bu_fc from link.

Miscellanea

Using CPU. The code currently runs on GPU by default; there is no option to switch. If someone really needs a CPU model, please open an issue; I can potentially create a CPU checkpoint and modify eval.py to run the model on CPU. However, there's no point in using CPUs to train the model.

Train on other datasets. Porting should be trivial if you can create a file like dataset_coco.json for your own dataset.
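
As a rough guide, dataset_coco.json follows the Karpathy-split format: a top-level "images" list in which each entry carries a file path, a split tag, and tokenized reference sentences. The sketch below shows one such entry with the fields that usually matter; treat the exact field set as an assumption and compare against the real dataset_coco.json before building your own.

    {
      "dataset": "coco",
      "images": [
        {
          "filepath": "val2014",
          "filename": "COCO_val2014_000000391895.jpg",
          "cocoid": 391895,
          "split": "train",
          "sentences": [
            {
              "raw": "A man riding a motorcycle.",
              "tokens": ["a", "man", "riding", "a", "motorcycle"]
            }
          ]
        }
      ]
    }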

Live demo. Not supported for now. Pull requests are welcome.

For more advanced features:

Check out ADVANCED.md.

Reference

If you find this repo useful, please consider citing (no obligation at all):

@article{luo2018discriminability,
  title={Discriminability objective for training descriptive captions},
  author={Luo, Ruotian and Price, Brian and Cohen, Scott and Shakhnarovich, Gregory},
  journal={arXiv preprint arXiv:1803.04376},
  year={2018}
}

Of course, please also cite the original papers of the models you are using (you can find the references in the model files).

Acknowledgements

Thanks to the original neuraltalk2 and the awesome PyTorch team.

Comments
  • some code missing?

    python scripts/prepro_labels.py --input_json .../dataset_coco.json --output_json data/cocotalk.json --output_h5 data/cocotalk failed. Here are the errors:

    Traceback (most recent call last):
      File "scripts/prepro_labels.py", line 192, in <module>
        main(params)
      File "scripts/prepro_labels.py", line 138, in main
        imgs = imgs['images']
    TypeError: list indices must be integers, not str
    

    It seems that some code is missing.

    opened by cswhjiang 37
  • Eval runs ok the first time but then throws a "ZeroDivisionError: division by zero"

    hi

    I got the eval code to run ok the first time on an image, but then when I try to run it again on the same image or on any other image I get the following error message. Is there some buffer that I need to clean up somewhere between each inference run?

    ImageCaptioning.pytorch$ sudo python3 eval.py --model models/fc/model-best.pth --infos_path models/fc/infos_fc-best.pkl --image_folder images/ --num_images 1
    Hugginface transformers not installed; please visit https://github.com/huggingface/transformers
    meshed-memory-transformer not installed; please run `pip install git+https://github.com/ruotianluo/meshed-memory-transformer.git`
    Warning: coco-caption not available
    loading annotations into memory...
    Done (t=0.78s)
    creating index...
    index created!
    Traceback (most recent call last):
      File "eval.py", line 72, in <module>
        lang_stats = eval_utils.language_eval(opt.input_json, predictions, n_predictions, vars(opt), opt.split)
      File "/media/tetsfr/SSD/ImageCaptioning.pytorch/captioning/utils/eval_utils.py", line 79, in language_eval
        mean_perplexity = sum([_['perplexity'] for _ in preds_filt]) / len(preds_filt)
    ZeroDivisionError: division by zero
    

    Same pattern if I change the pre-trained model used for evaluation: it works the first time but then throws this division-by-zero error every time after that. Thanks for your help.

    opened by Tetsujinfr 23
  • some bugs found while using the code

    eval.py:
      line 79: `opt.input_fc_h5 = infos['opt'].input_fc_h5` needs to be changed to `opt.input_fc_dir = infos['opt'].input_fc_dir`
      line 80: `opt.input_att_h5 = infos['opt'].input_att_h5` needs to be changed to `opt.input_att_dir = infos['opt'].input_att_dir`

    dataloaderraw.py:
      line 104: `img = img.concatenate((img, img, img), axis=2)` needs to be changed to `img = np.concatenate((img, img, img), axis=2)`

    opened by hitluobin 22
  • questions about initializing the lstm hidden states

    Here: https://github.com/ruotianluo/neuraltalk2.pytorch/blob/master/models/OldModel.py#L49 you seem to directly init the hidden states from the fc_feats with a linear layer. So I want to ask: if I want to implement an attention model where the lstm takes fc_feats as input at step 0 and the start token as input at step 1, like the figure below, how should I init the hidden states of the lstm? [image]

    opened by brisker 16
  • Evaluation: AttributeError: 'Namespace' object has no attribute 'caption_model'

    When running eval.py on python2.7, I get this error:

    File "eval.py", line 99, in <module> model = models.setup(opt) File "/path/to/neuraltalk2.pytorch/models.py", line 16, in setup if opt.caption_model == 'show_tell': AttributeError: 'Namespace' object has no attribute 'caption_model'

    It looks like the "caption_model" argument is missing from the Argument Parser in eval.py, causing an error to be thrown when model.py attempts to access it.

    I see that the model settings are in "opts.py". Are we somehow meant to import these?

    opened by danjcosg 14
  • saving to and loading from single h5

    @nouiz This PR creates a single h5 file instead of many numpy files or a single numpy. Follow up to https://github.com/ruotianluo/ImageCaptioning.pytorch/pull/23.

    opened by ReyhaneAskari 12
  • I met an error during training

    And when I tried to train on GPU, I met a multiprocessing error as follows:

    Traceback (most recent call last):
      File "train.py", line 287, in <module>
        train(opt)
      File "train.py", line 231, in train
        infos['loader_state_dict'] = loader.state_dict()
      File "/home1/hu/Pytorch/ImageCaptioning.pytorch-master/captioning/data/dataloader.py", line 363, in state_dict
        for split, loader in self.loaders.items()}
      File "/home1/hu/Pytorch/ImageCaptioning.pytorch-master/captioning/data/dataloader.py", line 363, in <dictcomp>
        for split, loader in self.loaders.items()}
      File "/home1/hu/Pytorch/ImageCaptioning.pytorch-master/captioning/data/dataloader.py", line 359, in get_prefetch_num
        return (self.iters[split]._send_idx - self.iters[split]._rcvd_idx) * self.batch_size
    AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute '_send_idx'

    How should I fix it?

    opened by HN123-123 11
  • How to achieve the high performance?

    I have trained the fc_model for 30 epochs, with the hyper-parameters set to the defaults. But when testing, I only achieve Bleu-4 = 0.25 on my 1000 COCO validation images, whereas with your provided pre-trained model I achieved Bleu-4 = 0.324.

    Bleu_1  0.6833226219296441
    Bleu_2  0.5063064959709507
    Bleu_3  0.3629853091429978
    Bleu_4  0.25786956526235716
    METEOR  0.221778680264552
    ROUGE_L 0.494435436025
    CIDEr   0.785944241965

    How can I reproduce the performance you achieved?

    opened by ivy94419 10
  • Evaluating the raw image

    Evaluate on raw images
    Note: this doesn't work for models trained with bottom-up features. Now place all your images of interest into a folder, e.g. blah, and run the eval script:

    $ python tools/eval.py --model model.pth --infos_path infos.pkl --image_folder blah --num_images 10

    The image captioning performance is not high. Also, after running it once, the next run throws errors such as the ones in the picture below. [image]

    opened by alice-cool 9
  • colab demo to local

    Hi, thank you for sharing your nice work. I think the performance of the colab demo is better than that of eval.py. I tried to port the colab demo to local code, but I am struggling with it. Is there any way to run the colab demo locally?

    opened by sbkim052 9
  • Upgrade code to PyTorch 0.4

    Hi,

    I was wondering if you've planned to update this repo to ensure compatibility with PyTorch 0.4+ sometime soon? This would be really helpful, since I have other modules in my repo that require PyTorch 0.4+.

    opened by metro-smiles 9
  • I want to do fine-tuning on a non-COCO dataset

    Hello. I would like to do fine-tuning with a trained model. However, when I try few-shot learning with the provided trained model on another dataset, I get an error because it refers to the COCO ID. Is it not possible to do fine-tuning with this implementation?

    opened by tnb1021 3
  • I want to evaluate trained models using CIDEr, etc.

    I want to evaluate trained models using CIDEr, but I can't. When language_eval=0 it can be evaluated, but when language_eval=1 an error occurs. I want to use language_eval. I tried:

    python tools/eval.py --model log_fc/model-best.pth --infos_path log_fc/infos_fc-best.pkl --image_folder ../repurpose-gan/test_images/ --num_images -1 --force 1 --language_eval 1
    

    then:

    Traceback (most recent call last):
      File "/home/tanabe/ImageCaptioning.pytorch/tools/eval.py", line 79, in <module>
        lang_stats = eval_utils.language_eval(opt.input_json, predictions, n_predictions, vars(opt), opt.split)
      File "/home/tanabe/ImageCaptioning.pytorch/captioning/utils/eval_utils.py", line 84, in language_eval
        mean_perplexity = sum([_['perplexity'] for _ in preds_filt]) / len(preds_filt)
    ZeroDivisionError: division by zero
    

    This error has been reported in other issues, but they all seem to be resolved by setting --force=1 and language_eval=0. I can eliminate the error with those settings too, but what can I do to run with language_eval=1?

    opened by tnb1021 4
  • Training speed for transformer using SCST

    Hi, I'm wondering about the training speed of the transformer using new-self-critical or SCST. During training the model has to run inference, and the inference speed of transformers is much slower than their training speed. For RNNs this is not a problem, but I think training a transformer this way would be much slower (I implemented a version myself and training with RL was about 20x slower). I'm curious about the training speed in your experiments. Do you have any suggestions?

    opened by liuaohanjsj 1
  • RuntimeError: input must have 3 dimensions, got 4

    I encountered a new problem: running the model gives an input-dimension error, and I couldn't find anyone else reporting it, so I don't know where the problem is. If you can help, thank you very much!

    Read data: 0.0002810955047607422

    /home/mm/anaconda3/envs/subgc/lib/python3.6/site-packages/torch/nn/modules/rnn.py:582: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters(). (Triggered internally at /pytorch/aten/src/ATen/native/cudnn/RNN.cpp:775.)
      self.dropout, self.training, self.bidirectional, self.batch_first)

    Save ckpt on exception ...
    model saved to ./log_/model.pth
    Save ckpt done.

    Traceback (most recent call last):
      File "/data/data-2T/AI/ImageCaptioning/train.py", line 185, in train
        model_out = dp_lw_model(fc_feats, att_feats, labels, masks, att_masks, data['gts'], torch.arange(0, len(data['gts'])), sc_flag, struc_flag, drop_worst_flag)
      File "/home/mm/anaconda3/envs/subgc/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/mm/anaconda3/envs/subgc/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
        outputs = self.parallel_apply(replicas, inputs, kwargs)
      File "/home/mm/anaconda3/envs/subgc/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
      File "/home/mm/anaconda3/envs/subgc/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
        output.reraise()
      File "/home/mm/anaconda3/envs/subgc/lib/python3.6/site-packages/torch/_utils.py", line 428, in reraise
        raise self.exc_type(msg)
    RuntimeError: Caught RuntimeError in replica 0 on device 0.
    Original Traceback (most recent call last):
      File "/home/mm/anaconda3/envs/subgc/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
        output = module(*input, **kwargs)
      File "/home/mm/anaconda3/envs/subgc/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/data/data-2T/AI/ImageCaptioning/captioning/modules/loss_wrapper.py", line 47, in forward
        loss = self.crit(self.model(fc_feats, att_feats, labels[..., :-1], att_masks), labels[..., 1:], masks[..., 1:], reduction=reduction)
      File "/home/mm/anaconda3/envs/subgc/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/data/data-2T/AI/ImageCaptioning/captioning/models/CaptionModel.py", line 33, in forward
        return getattr(self, '_'+mode)(*args, **kwargs)
      File "/data/data-2T/AI/ImageCaptioning/captioning/models/ShowTellModel.py", line 81, in _forward
        output, state = self.core(xt.unsqueeze(0), state)
      File "/home/mm/anaconda3/envs/subgc/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/mm/anaconda3/envs/subgc/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 579, in forward
        self.check_forward_args(input, hx, batch_sizes)
      File "/home/mm/anaconda3/envs/subgc/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 530, in check_forward_args
        self.check_input(input, batch_sizes)
      File "/home/mm/anaconda3/envs/subgc/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 176, in check_input
        expected_input_dim, input.dim()))
    RuntimeError: input must have 3 dimensions, got 4

    opened by Waiting-TT 0
  • RuntimeError: unexpected EOF, expected 4233720 more bytes. The file might be corrupted.

    Got the above error when running:

      python3 tools/eval.py --model model-best.pth --infos_path infos_fc_nsc-best.pkl --image_folder ../drive/MyDrive/IMAG --num_images 1

    What do I do?

    opened by atul1234anand 0