Code for paper Adaptively Aligned Image Captioning via Adaptive Attention Time

Overview

Adaptively Aligned Image Captioning via Adaptive Attention Time

This repository includes the implementation for Adaptively Aligned Image Captioning via Adaptive Attention Time.

Requirements

Training AAT

Prepare data (with python2)

See details in data/README.md.

(notes: Set word_count_threshold in scripts/prepro_labels.py to 4 to generate a vocabulary of size 10,369.)

You should also preprocess the dataset and get the cache for calculating cider score for SCST:

$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train

Training

$ sh train-aat.sh

See opts.py for the options.

Evaluation

$ CUDA_VISIBLE_DEVICES=0 python eval.py --model log/log_aat_rl/model.pth --infos_path log/log_aat_rl/infos_aat.pkl  --dump_images 0 --dump_json 1 --num_images -1 --language_eval 1 --beam_size 2 --batch_size 100 --split test

Reference

If you find this repo helpful, please consider citing:

@inproceedings{huang2019adaptively,
  title = {Adaptively Aligned Image Captioning via Adaptive Attention Time},
  author = {Huang, Lun and Wang, Wenmin and Xia, Yaxian and Chen, Jie},
  booktitle = {Advances in Neural Information Processing Systems 32},
  year={2019}
}

Acknowledgements

This repository is based on Ruotian Luo's self-critical.pytorch.

You might also like...
Multi-Scale Aligned Distillation for Low-Resolution Detection (CVPR2021)
Multi-Scale Aligned Distillation for Low-Resolution Detection (CVPR2021)

MSAD Multi-Scale Aligned Distillation for Low-Resolution Detection Lu Qi*, Jason Kuen*, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya J

Multi-Scale Aligned Distillation for Low-Resolution Detection (CVPR2021)
Multi-Scale Aligned Distillation for Low-Resolution Detection (CVPR2021)

MSAD Multi-Scale Aligned Distillation for Low-Resolution Detection Lu Qi*, Jason Kuen*, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya J

PyTorch implementation for Partially View-aligned Representation Learning with Noise-robust Contrastive Loss (CVPR 2021)
PyTorch implementation for Partially View-aligned Representation Learning with Noise-robust Contrastive Loss (CVPR 2021)

2021-CVPR-MvCLN This repo contains the code and data of the following paper accepted by CVPR 2021 Partially View-aligned Representation Learning with

Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision (ICCV 2021)
Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision (ICCV 2021)

Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision (ICCV 2021) PyTorch implementation of Learning RAW-to-sRGB Mappings with Inaccurat

TOOD: Task-aligned One-stage Object Detection, ICCV2021 Oral
TOOD: Task-aligned One-stage Object Detection, ICCV2021 Oral

One-stage object detection is commonly implemented by optimizing two sub-tasks: object classification and localization, using heads with two parallel branches, which might lead to a certain level of spatial misalignment in predictions between the two tasks.

[TIP 2021] SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction
[TIP 2021] SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction

SADRNet Paper link: SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction Requirements python

Graph-based community clustering approach to extract protein domains from a predicted aligned error matrix
Graph-based community clustering approach to extract protein domains from a predicted aligned error matrix

Using a predicted aligned error matrix corresponding to an AlphaFold2 model , returns a series of lists of residue indices, where each list corresponds to a set of residues clustering together into a pseudo-rigid domain.

Diverse Image Captioning with Context-Object Split Latent Spaces (NeurIPS 2020)
Diverse Image Captioning with Context-Object Split Latent Spaces (NeurIPS 2020)

Diverse Image Captioning with Context-Object Split Latent Spaces This repository is the PyTorch implementation of the paper: Diverse Image Captioning

Semi-Autoregressive Transformer for Image Captioning

Semi-Autoregressive Transformer for Image Captioning Requirements Python 3.6 Pytorch 1.6 Prepare data Please use git clone --recurse-submodules to clo

Comments
  • 计算SPICE score报错Failed to write core dump. Core dumps have been disabled. To enable core dumping, try

    计算SPICE score报错Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before startin

    你好! 我在AAT里面运行AOA模型的代码,运用下面的代码: python3.6 train.py --id aoa
    --batch_size 10
    --beam_size 1
    --max_epochs 25
    --caption_model aoa
    --refine 1
    --refine_aoa 1
    --use_ff 0
    --decoder_type AoA
    --use_multi_head 2
    --num_heads 8
    --multi_head_scale 1
    --mean_feats 1
    --ctx_drop 1
    --dropout_aoa 0.3
    --label_smoothing 0.2
    --input_json data/cocotalk.json
    --input_label_h5 data/cocotalk_label.h5
    --input_fc_dir data/cocobu_fc
    --input_att_dir data/cocobu_att
    --input_box_dir data/cocobu_box
    --seq_per_img 5
    --learning_rate 2e-4
    --num_layers 2
    --input_encoding_size 1024
    --rnn_size 1024
    --learning_rate_decay_start 0
    --scheduled_sampling_start 0
    --checkpoint_path log_aoa/log_aoa
    --save_checkpoint_every 6000
    --language_eval 1
    --val_images_use -1
    --scheduled_sampling_increase_every 5
    --scheduled_sampling_max_prob 0.5
    --learning_rate_decay_every 3

    在epoch=0的过程中,computing SPICE 时,代码报错如下:

    `computing SPICE score... Parsing reference captions Initiating Stanford parsing pipeline [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize [main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer. [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse [main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.3 sec]. [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.0 sec]. Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.4 sec]. Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.5 sec]. Threads( StanfordCoreNLP ) #

    Threads( StanfordCoreNLP ) # A fatal error has been detected by the Java Runtime Environment:

    SIGSEGV (0xb) at pc=0x00007f0aa67f4e10, pid=12537, tid=0x00007f0a7d4b4700

    JRE version: OpenJDK Runtime Environment (8.0_265-b01) (build 1.8.0_265-8u265-b01-0ubuntu2~16.04-b01) Java VM: OpenJDK 64-Bit Server VM (25.265-b01 mixed mode linux-amd64 compressed oops) Problematic frame: V [libjvm.so+0x408e10]

    Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

    An error report file with more information is saved as: /home/muli/myExpe--caption/AAT/coco-caption/pycocoevalcap/spice/hs_err_pid12537.log

    [error occurred during error reporting , id 0xb]

    If you would like to submit a bug report, please visit: http://bugreport.java.com/bugreport/crash.jsp

    Traceback (most recent call last): File "train.py", line 300, in train(opt) File "train.py", line 244, in train val_loss, predictions, lang_stats = eval_utils.eval_split(dp_model, lw_model.crit, loader, eval_kwargs) File "/home/muli/myExpe--caption/AAT/eval_utils.py", line 173, in eval_split lang_stats = language_eval(dataset, predictions, eval_kwargs['id'], split) File "/home/muli/myExpe--caption/AAT/eval_utils.py", line 55, in language_eval cocoEval.evaluate() File "coco-caption/pycocoevalcap/eval.py", line 61, in evaluate score, scores = scorer.compute_score(gts, res) File "coco-caption/pycocoevalcap/spice/spice.py", line 79, in compute_score cwd=os.path.dirname(os.path.abspath(file))) File "/usr/lib/python3.6/subprocess.py", line 311, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['java', '-jar', '-Xmx8G', 'spice-1.0.jar', '/home/muli/myExpe--caption/AAT/coco-caption/pycocoevalcap/spice/tmp/tmpyfgzkyc4', '-cache', '/home/muli/myExpe--caption/AAT/coco-caption/pycocoevalcap/spice/cache/1601606281.2002816', '-out', '/home/muli/myExpe--caption/AAT/coco-caption/pycocoevalcap/spice/tmp/tmp_de8maf_', '-subset', '-silent']' died with <Signals.SIGABRT: 6>. Terminating BlobFetcher`

    请问一下,这个怎么解决呢? 我在运行AAT 模型时,没有报错

    opened by mymuli 0
  • Reg Training time

    Reg Training time

    Hi,

    Thanks for sharing your code here.

    Can you please tell on what type of GPUs you training your model, how much time it took to complete one epoch and the number of epochs till you run your model?

    Regards Deepak Mittal

    opened by deepak242424 1
Owner
Lun Huang
PhD student at Duke
Lun Huang
Implementation for our ICCV 2021 paper: Dual-Camera Super-Resolution with Aligned Attention Modules

DCSR: Dual Camera Super-Resolution Implementation for our ICCV 2021 oral paper: Dual-Camera Super-Resolution with Aligned Attention Modules paper | pr

Tengfei Wang 110 Dec 20, 2022
Image Captioning using CNN ,LSTM and Attention

Image Captioning using CNN ,LSTM and Attention This is a deeplearning model which tries to summarize an image into a text . Installation Install this

ASUTOSH GHANTO 1 Dec 16, 2021
Code repository for paper `Skeleton Merger: an Unsupervised Aligned Keypoint Detector`.

Skeleton Merger Skeleton Merger, an Unsupervised Aligned Keypoint Detector. The paper is available at https://arxiv.org/abs/2103.10814. A map of the r

北海若 48 Nov 14, 2022
This repository contains the code for the paper "PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization"

PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization News: [2020/05/04] Added EGL rendering option for training data g

Shunsuke Saito 1.5k Jan 3, 2023
PyTorch code for our paper "Attention in Attention Network for Image Super-Resolution"

Under construction... Attention in Attention Network for Image Super-Resolution (A2N) This repository is an PyTorch implementation of the paper "Atten

Haoyu Chen 71 Dec 30, 2022
[ICCV 2021] FaPN: Feature-aligned Pyramid Network for Dense Image Prediction

FaPN: Feature-aligned Pyramid Network for Dense Image Prediction [arXiv] [Project Page] @inproceedings{ huang2021fapn, title={{FaPN}: Feature-alig

Shihua Huang 23 Jul 22, 2022
PyTorch code for our ECCV 2020 paper "Single Image Super-Resolution via a Holistic Attention Network"

HAN PyTorch code for our ECCV 2020 paper "Single Image Super-Resolution via a Holistic Attention Network" This repository is for HAN introduced in the

五维空间 140 Nov 23, 2022
This is the official code for the paper "Ad2Attack: Adaptive Adversarial Attack for Real-Time UAV Tracking".

Ad^2Attack:Adaptive Adversarial Attack on Real-Time UAV Tracking Demo video ?? Our video on bilibili demonstrates the test results of Ad^2Attack on se

Intelligent Vision for Robotics in Complex Environment 10 Nov 7, 2022
Optimized code based on M2 for faster image captioning training

Transformer Captioning This repository contains the code for Transformer-based image captioning. Based on meshed-memory-transformer, we further optimi

lyricpoem 16 Dec 16, 2022
Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification

STAM - Pytorch Implementation of STAM (Space Time Attention Model), yet another pure and simple SOTA attention model that bests all previous models in

Phil Wang 109 Dec 28, 2022