Adaptively Aligned Image Captioning via Adaptive Attention Time

This repository contains the implementation of Adaptively Aligned Image Captioning via Adaptive Attention Time.

Requirements

Python 3.6
Java 1.8.0
PyTorch 1.0
cider
coco-caption
tensorboardX

Training AAT

Prepare data (with python2)

See details in data/README.md.

(Note: set word_count_threshold in scripts/prepro_labels.py to 4 to generate a vocabulary of size 10,369.)
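As a rough illustration of what a word-count threshold does, the sketch below builds a vocabulary from tokenized captions. It assumes that words are kept only when they occur strictly more often than the threshold and that all other words are mapped to a single UNK token; `build_vocab` is a hypothetical helper, not the code in scripts/prepro_labels.py.

```python
from collections import Counter


def build_vocab(captions, word_count_threshold):
    """Return the words kept in the vocabulary (hypothetical helper).

    Assumption: a word is kept when its count is strictly greater than
    the threshold; rarer words are represented by one "UNK" entry.
    """
    counts = Counter(word for caption in captions for word in caption)
    vocab = sorted(w for w, n in counts.items() if n > word_count_threshold)
    # Add UNK only when at least one word fell at or below the threshold.
    if any(n <= word_count_threshold for n in counts.values()):
        vocab.append("UNK")
    return vocab


captions = [["a", "dog", "runs"], ["a", "cat", "sits"], ["a", "dog", "sits"]]
vocab = build_vocab(captions, word_count_threshold=1)
print(vocab)
```

Raising the threshold shrinks the vocabulary, which is why a specific value is tied to a specific vocabulary size.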

You should also preprocess the dataset to build the n-gram cache used to compute the CIDEr score for SCST (self-critical sequence training):

$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
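To give an idea of what such a cache contains, here is a minimal sketch of the n-gram document-frequency table that CIDEr relies on. It assumes each image contributes at most one count per n-gram, however many of its reference captions contain it. The function names are hypothetical and the output format of scripts/prepro_ngrams.py is not reproduced here.

```python
from collections import Counter


def ngrams(tokens, max_n=4):
    """Count all n-grams of length 1..max_n in one tokenized caption."""
    out = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out[tuple(tokens[i:i + n])] += 1
    return out


def document_frequency(refs_per_image, max_n=4):
    """Number of images whose references contain each n-gram."""
    df = Counter()
    for refs in refs_per_image:
        seen = set()
        for ref in refs:
            seen.update(ngrams(ref, max_n))  # keys only, counts ignored
        df.update(seen)  # each n-gram counted once per image
    return df


refs_per_image = [
    [["a", "dog"], ["a", "cat"]],  # image 1: two reference captions
    [["a", "bird"]],               # image 2: one reference caption
]
df = document_frequency(refs_per_image)
print(df[("a",)], df[("a", "dog")])
```

Precomputing this table once on the training split avoids recounting n-grams every time a reward is evaluated during SCST.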

Training

$ sh train-aat.sh

See opts.py for the options.

Evaluation

$ CUDA_VISIBLE_DEVICES=0 python eval.py --model log/log_aat_rl/model.pth --infos_path log/log_aat_rl/infos_aat.pkl --dump_images 0 --dump_json 1 --num_images -1 --language_eval 1 --beam_size 2 --batch_size 100 --split test

Reference

If you find this repo helpful, please consider citing:

@inproceedings{huang2019adaptively,
  title = {Adaptively Aligned Image Captioning via Adaptive Attention Time},
  author = {Huang, Lun and Wang, Wenmin and Xia, Yaxian and Chen, Jie},
  booktitle = {Advances in Neural Information Processing Systems 32},
  year={2019}
}

Acknowledgements

This repository is based on Ruotian Luo's self-critical.pytorch.

Hello! I am running the AoA model code inside AAT with the following command:

$ python3.6 train.py --id aoa \
    --batch_size 10 \
    --beam_size 1 \
    --max_epochs 25 \
    --caption_model aoa \
    --refine 1 \
    --refine_aoa 1 \
    --use_ff 0 \
    --decoder_type AoA \
    --use_multi_head 2 \
    --num_heads 8 \
    --multi_head_scale 1 \
    --mean_feats 1 \
    --ctx_drop 1 \
    --dropout_aoa 0.3 \
    --label_smoothing 0.2 \
    --input_json data/cocotalk.json \
    --input_label_h5 data/cocotalk_label.h5 \
    --input_fc_dir data/cocobu_fc \
    --input_att_dir data/cocobu_att \
    --input_box_dir data/cocobu_box \
    --seq_per_img 5 \
    --learning_rate 2e-4 \
    --num_layers 2 \
    --input_encoding_size 1024 \
    --rnn_size 1024 \
    --learning_rate_decay_start 0 \
    --scheduled_sampling_start 0 \
    --checkpoint_path log_aoa/log_aoa \
    --save_checkpoint_every 6000 \
    --language_eval 1 \
    --val_images_use -1 \
    --scheduled_sampling_increase_every 5 \
    --scheduled_sampling_max_prob 0.5 \
    --learning_rate_decay_every 3
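
Several of the flags above describe per-epoch schedules. The sketch below shows one plausible reading of them: a step decay of the learning rate and a stepwise increase of the scheduled-sampling probability. The function names are hypothetical, and `decay_rate=0.8` and `increase_prob=0.05` are assumed values that do not appear in the command; see opts.py for the actual options and defaults.

```python
def step_decayed_lr(epoch, base_lr, decay_start, decay_every, decay_rate):
    """Learning rate after step decay (assumed semantics)."""
    if decay_start < 0 or epoch < decay_start:
        return base_lr
    steps = (epoch - decay_start) // decay_every
    return base_lr * decay_rate ** steps


def scheduled_sampling_prob(epoch, start, increase_every, increase_prob, max_prob):
    """Scheduled-sampling probability, capped at max_prob (assumed semantics)."""
    if start < 0 or epoch < start:
        return 0.0
    steps = (epoch - start) // increase_every
    return min(increase_prob * steps, max_prob)


# Values taken from the command: learning_rate 2e-4, decay start 0, decay every 3,
# sampling start 0, increase every 5, max prob 0.5.
# decay_rate and increase_prob are assumptions made for this illustration.
for epoch in (0, 3, 5, 25):
    lr = step_decayed_lr(epoch, 2e-4, 0, 3, decay_rate=0.8)
    ss = scheduled_sampling_prob(epoch, 0, 5, increase_prob=0.05, max_prob=0.5)
    print(epoch, lr, ss)
```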

During epoch 0, while computing SPICE, the code fails with the following error:

`computing SPICE score...
Parsing reference captions
Initiating Stanford parsing pipeline
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.3 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.4 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.5 sec].
Threads( StanfordCoreNLP ) #

Threads( StanfordCoreNLP ) # A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00007f0aa67f4e10, pid=12537, tid=0x00007f0a7d4b4700

JRE version: OpenJDK Runtime Environment (8.0_265-b01) (build 1.8.0_265-8u265-b01-0ubuntu2~16.04-b01)
Java VM: OpenJDK 64-Bit Server VM (25.265-b01 mixed mode linux-amd64 compressed oops)
Problematic frame: V [libjvm.so+0x408e10]

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

An error report file with more information is saved as: /home/muli/myExpe--caption/AAT/coco-caption/pycocoevalcap/spice/hs_err_pid12537.log

[error occurred during error reporting , id 0xb]

If you would like to submit a bug report, please visit: http://bugreport.java.com/bugreport/crash.jsp

Traceback (most recent call last):
  File "train.py", line 300, in <module>
    train(opt)
  File "train.py", line 244, in train
    val_loss, predictions, lang_stats = eval_utils.eval_split(dp_model, lw_model.crit, loader, eval_kwargs)
  File "/home/muli/myExpe--caption/AAT/eval_utils.py", line 173, in eval_split
    lang_stats = language_eval(dataset, predictions, eval_kwargs['id'], split)
  File "/home/muli/myExpe--caption/AAT/eval_utils.py", line 55, in language_eval
    cocoEval.evaluate()
  File "coco-caption/pycocoevalcap/eval.py", line 61, in evaluate
    score, scores = scorer.compute_score(gts, res)
  File "coco-caption/pycocoevalcap/spice/spice.py", line 79, in compute_score
    cwd=os.path.dirname(os.path.abspath(__file__)))
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['java', '-jar', '-Xmx8G', 'spice-1.0.jar', '/home/muli/myExpe--caption/AAT/coco-caption/pycocoevalcap/spice/tmp/tmpyfgzkyc4', '-cache', '/home/muli/myExpe--caption/AAT/coco-caption/pycocoevalcap/spice/cache/1601606281.2002816', '-out', '/home/muli/myExpe--caption/AAT/coco-caption/pycocoevalcap/spice/tmp/tmp_de8maf_', '-subset', '-silent']' died with <Signals.SIGABRT: 6>.
Terminating BlobFetcher`

How can I resolve this? I did not get any error when running the AAT model.

opened by mymuli 0
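
For the SPICE crash reported above, the traceback shows training aborting because `subprocess.check_call` raises `CalledProcessError` when the Java process dies. One possible workaround, sketched below, is to catch that exception so a failing external scorer is skipped instead of stopping the run. `run_scorer` is a hypothetical wrapper, not part of coco-caption, and it does not address the underlying JVM crash.

```python
import subprocess
import sys


def run_scorer(cmd, cwd=None):
    """Run an external scorer command (hypothetical wrapper).

    Returns True if the command exits with status 0, False if it fails
    or is killed by a signal, so the caller can skip that metric.
    """
    try:
        subprocess.check_call(cmd, cwd=cwd)
        return True
    except subprocess.CalledProcessError as err:
        print("scorer failed with return code", err.returncode, file=sys.stderr)
        return False


# Stand-in commands: a process that succeeds and one that fails.
ok = run_scorer([sys.executable, "-c", "import sys; sys.exit(0)"])
failed = run_scorer([sys.executable, "-c", "import sys; sys.exit(3)"])
print(ok, failed)
```

With a wrapper like this, the caller could report the remaining metrics and leave SPICE out when the Java process fails.
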
Reg Training time

Hi,

Thanks for sharing your code here.

Could you please tell me what type of GPUs you trained your model on, how long one epoch took, and how many epochs you trained for?

Regards, Deepak Mittal

opened by deepak242424 1

Code for paper Adaptively Aligned Image Captioning via Adaptive Attention Time

Owner

Lun Huang
