LaBERT - A length-controllable and non-autoregressive image captioning model.

Overview

Length-Controllable Image Captioning (ECCV2020)

This repo provides the implementation of the paper Length-Controllable Image Captioning.

Install

conda create --name labert python=3.7
conda activate labert

conda install pytorch=1.3.1 torchvision cudatoolkit=10.1 -c pytorch
pip install h5py tqdm transformers==2.1.1
pip install git+https://github.com/salaniz/pycocoevalcap

Data & Pre-trained Models

  • Prepare the MSCOCO data by following the link.
  • Download pretrained Bert and Faster-RCNN from Baidu Cloud Disk [code: 0j9f] or Google Drive.
    • It is a unified checkpoint file containing a pretrained Bert-base and the fc6 layer of the Faster-RCNN.
  • Download our pretrained LaBERT model from Baidu Cloud Disk [code: fpke] or Google Drive.
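
To illustrate what a unified checkpoint means in practice, the sketch below groups the keys of a state-dict-like mapping by their top-level prefix. The key names used here (`bert.…`, `fc6.…`) are hypothetical and chosen only for illustration; the real mapping would come from loading the downloaded file (for example with `torch.load(path, map_location="cpu")`), and its actual layout may differ.

```python
from collections import Counter


def summarize_prefixes(state_dict):
    """Count entries per top-level prefix (the text before the first dot)."""
    return Counter(key.split(".", 1)[0] for key in state_dict)


# Hypothetical keys for illustration only; not the actual checkpoint layout.
example = {
    "bert.embeddings.word_embeddings.weight": None,
    "bert.encoder.layer.0.attention.self.query.weight": None,
    "fc6.weight": None,
    "fc6.bias": None,
}

summary = summarize_prefixes(example)
print(dict(summary))
```

A summary like this makes it easy to confirm that both components are present in the file before training starts.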

Scripts

Train

python -m torch.distributed.launch \
  --nproc_per_node=$NUM_GPUS \
  --master_port=4396 train.py \
  save_dir $PATH_TO_TRAIN_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU
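
Note that the options after train.py are given as alternating key/value pairs rather than `--flag` arguments. The sketch below shows one way such pairs could be read into a dictionary. It only illustrates the calling convention and is not taken from train.py, whose actual config handling may differ (for example, it may cast values to typed config fields). The function name `parse_overrides` and the example values are assumptions.

```python
def parse_overrides(opts):
    """Turn ['key', 'value', 'key', 'value', ...] into a dict of string values."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in key/value pairs")
    return dict(zip(opts[0::2], opts[1::2]))


# Hypothetical values standing in for $PATH_TO_TRAIN_OUTPUT and $NUM_SAMPLES_PER_GPU.
overrides = parse_overrides(["save_dir", "output/train", "samples_per_gpu", "8"])
print(overrides)
```

In this sketch the values stay as strings; a real config system would typically convert them to the type of the field being overridden.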

Continue training

python -m torch.distributed.launch \
  --nproc_per_node=$NUM_GPUS \
  --master_port=4396 train.py \
  save_dir $PATH_TO_TRAIN_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU \
  model_path $PATH_TO_MODEL

Inference

python inference.py \
  model_path $PATH_TO_MODEL \
  save_dir $PATH_TO_TEST_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU

Evaluate

python evaluate.py \
  --gt_caption data/id2captions_test.json \
  --pd_caption $PATH_TO_TEST_OUTPUT/caption_results.json \
  --save_dir $PATH_TO_TEST_OUTPUT
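
Before running the evaluation, it can help to check that the prediction file covers the same image ids as the ground-truth file. The sketch below assumes that both JSON files map image ids to captions; that structure is an assumption and not confirmed by this README, and the helper names `load_captions` and `coverage_report` are hypothetical.

```python
import json


def load_captions(path):
    """Load a JSON file that is assumed to map image ids to captions."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def coverage_report(gt, pred):
    """Return the ids that appear in only one of the two caption mappings."""
    gt_ids, pred_ids = set(gt), set(pred)
    return {
        "missing_predictions": sorted(gt_ids - pred_ids),
        "unexpected_predictions": sorted(pred_ids - gt_ids),
    }


# Toy data for illustration; real ids and captions would come from
# load_captions("data/id2captions_test.json") and the inference output.
gt = {"1": ["a dog on a couch"], "2": ["a man riding a bike"]}
pred = {"1": ["a dog sitting on a couch"], "3": ["a plate of food"]}

report = coverage_report(gt, pred)
print(report)
```

Two empty lists in the report indicate that the files cover the same images.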

Cite

Please consider citing our paper in your publications if the project helps your research.

@article{deng2020length,
  title={Length-Controllable Image Captioning},
  author={Deng, Chaorui and Ding, Ning and Tan, Mingkui and Wu, Qi},
  journal={arXiv preprint arXiv:2007.09580},
  year={2020}
}
Comments
  • Dataset format

    Hi! Thank you for your last reply.

    I have a question about the dataset. I downloaded parts 1 and 2 of the COCO data from this link and renamed the files as follows:

    • coco_detection_vg_100dets_gvd_checkpoint_trainval_cls(0to999).h5 -> cls(0to999).h5
    • coco_detection_vg_100dets_gvd_checkpoint_trainval_feat(0to999).h5 -> feat(0to999).h5
    • coco_detection_vg_thresh0.2_feat_gvd_checkpoint_trainvaltest.h5 -> region_bbox.h5

    However, an error occurred when I ran the following command in the terminal: python -m torch.distributed.launch --nproc_per_node=1 --master_port=4396 train.py save_dir result/ samples_per_gpu 1

    I added print statements at line 90 of train.py and line 86 of dataset.py.

    train.py

    • print(pred_scores, gt_token_ids)
      loss = criterion(pred_scores, gt_token_ids)

    dataset.py

    • print(name)
      with h5py.File(osp.join(self.root, f'feat{name[-3:]}.h5'), 'r') as features, \
           h5py.File(osp.join(self.root, f'cls{name[-3:]}.h5'), 'r') as classes, \
           h5py.File(osp.join(self.root, f'region_bbox.h5'), 'r') as bboxes:

    Is it right to set up the dataset like this? If not, could you explain the dataset guidelines in more detail? Thank you for providing a good model. :)

    opened by dXDb 4
  • can't find the region_bbox.h5

    Thanks for your amazing work. I downloaded the dataset from the Baidu Netdisk link you provided, but when I run the code, line 88 of dataset.py cannot find region_bbox.h5. Is it included in the dataset you provide?

    opened by State226 3
  • Missing key(s) in state_dict

    Hi,

    Thanks for sharing this excellent work. I encountered an issue when testing the inference step. I downloaded the pretrained generator.pth and bert.pth from Google Drive. The error message is attached below. Could you give me a hint on how to solve the problem? Thanks in advance.

    Traceback (most recent call last):
      File "inference.py", line 140, in <module>
        g_checkpointer.load(config.model_path, True)
      File "/data-2/home/xingzheng.xz/Research/LaBERT/utils/checkpointer.py", line 47, in load
        self.model.load_state_dict(checkpoint.pop("model"))
      File "/home/xingzheng.xz/miniconda3/envs/omninet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for Generator:
        Missing key(s) in state_dict: "classifier.decoder.bias", "embedding_layer.position_ids".

    opened by zheng-xing 3
  • id2captions_test.json file

    Hi!

    The evaluation uses the files data/id2captions_test.json and data/caption_results.json, but I could not find them. Can you share these files with us?

    Thanks,

    opened by saitarslanboun 3
  • Refinement Steps

    Hi, great work, and thanks for sharing the code.

    I could not find the number of refinement steps T in the configuration or the code. Does this mean that the code is for autoregressive generation only (Table 1 rather than Table 2)? If I'm mistaken, could you point me to the parameter that corresponds to T?

    Thanks!

    opened by Eladhi 1
  • MS COCO dataset download issue from the link provided [VLP github page]

    Hey! I'm really interested in your work and would love to delve into it more, but I'm having problems downloading the dataset. Can you please guide me as to where and how I can get the MS COCO dataset of 123k images, split into 113,287 images for training, 5,000 for validation, and 5,000 for offline evaluation?

    The link provided for the dataset leads to the VLP GitHub page. The dataset there is a combination of COCO and VQA, at 95GB and 72GB, and I'm unable to use it. Can you please suggest a way to download the MS COCO dataset only?

    opened by SupragyaSonkar 1
  • Data Download Problem

    Hi, I am interested in your great work and wish to build on it in my own experiments, but I have a problem downloading the MSCOCO data. The OneDrive link provided in VLP cannot be reached from China without a VPN, so it is difficult for me to prepare the data on my Ubuntu machine. Do you have any advice on how to solve this? Or could you please provide another download link for the MSCOCO data that is easily accessible from China? Thank you!

    opened by tjuwyh 1
  • Question about number of region_spatial features

    Hi @bearcatt, thanks for sharing your code. I have a question about region_spatial here: what do the 5th and 6th features (region_spatial[:, :, [5, 6]]) represent? Moreover, according to the implementation details section of your paper, the localization feature has five components (four for the location coordinates and one for the relative area), but in your code here the last dimension has six features. Can you help me understand this part?

    opened by czy-orange 0
  • Inference on raw images

    Hi, thanks for sharing your interesting work on image captioning. I would like to run the pretrained model on a few of my own images as a test. Could you confirm whether it is this or this that I need to use to create the bounding boxes and features for my images? Thanks.

    opened by rachs 1
  • codes of VLP

    Hi! Thank you for the amazing work on LaBERT! I was wondering whether you would release the code for Length-Controllable VLP. If I try my own implementation, would adding a length-level embedding to the input word embedding suffice? BTW, where can I find the weights for length-aware VLP or AoANet? Thanks!

    opened by EddieKro 2
  • codes for AoA

    Hi~ Your code for LaBERT is amazing. I wonder whether you would release the code implementing non-autoregressive length-controllable image captioning on AoA, as you announced in the paper. Thanks!

    opened by RubickH 3
Owner
bearcatt