Unsupervised captioning - Code for Unsupervised Image Captioning

Overview

Unsupervised Image Captioning

by Yang Feng, Lin Ma, Wei Liu, and Jiebo Luo

Introduction

Most image captioning models are trained using paired image-sentence data, which are expensive to collect. We propose unsupervised image captioning to relax the reliance on paired data. For more details, please refer to our paper.

Citation

@InProceedings{feng2019unsupervised,
  author = {Feng, Yang and Ma, Lin and Liu, Wei and Luo, Jiebo},
  title = {Unsupervised Image Captioning},
  booktitle = {CVPR},
  year = {2019}
}

Requirements

mkdir ~/workspace
cd ~/workspace
git clone https://github.com/tensorflow/models.git tf_models
git clone https://github.com/tylin/coco-caption.git
touch tf_models/research/im2txt/im2txt/__init__.py
touch tf_models/research/im2txt/im2txt/data/__init__.py
touch tf_models/research/im2txt/im2txt/inference_utils/__init__.py
wget http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz
mkdir ckpt
tar zxvf inception_v4_2016_09_09.tar.gz -C ckpt
git clone https://github.com/fengyang0317/unsupervised_captioning.git
cd unsupervised_captioning
pip install -r requirements.txt
export PYTHONPATH=$PYTHONPATH:`pwd`
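
After these commands, the layout under ~/workspace should look roughly like this (a sketch for orientation, not an exhaustive listing):

    ~/workspace
    ├── tf_models/                      # TensorFlow models repo (im2txt, object_detection)
    ├── coco-caption/                   # COCO caption evaluation toolkit
    ├── ckpt/
    │   └── inception_v4.ckpt           # extracted from inception_v4_2016_09_09.tar.gz
    ├── inception_v4_2016_09_09.tar.gz
    └── unsupervised_captioning/        # this repository (current working directory)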

Dataset (Optional. The files generated below can be found at Gdrive).

In case you do not have access to Google, the files are also available on OneDrive.

  1. Crawl image descriptions. The descriptions used in the experiments of the paper are available at link. You may download them from the link and extract the files to data/coco.

    pip3 install absl-py
    python3 preprocessing/crawl_descriptions.py
    
  2. Extract the descriptions. NLTK changes over time, so the number of descriptions obtained may differ slightly.

    python -c "import nltk; nltk.download('punkt')"
    python preprocessing/extract_descriptions.py
    
  3. Preprocess the descriptions. You may need to change vocab_size, start_id, and end_id in config.py if you generate a new dictionary (a sketch of these entries appears at the end of this list).

    python preprocessing/process_descriptions.py --word_counts_output_file\
      data/word_counts.txt --new_dict
    
  4. Download the MSCOCO images from link and put all the images into ~/dataset/mscoco/all_images.

  5. Run object detection on the training images. First download the detection model from here and extract it under tf_models/research/object_detection.

    python preprocessing/detect_objects.py --image_path\
      ~/dataset/mscoco/all_images --num_proc 2 --num_gpus 1
    
  6. Generate tfrecord files for images.

    python preprocessing/process_images.py --image_path\
      ~/dataset/mscoco/all_images
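
For step 3, the dictionary-related entries in config.py have to match the newly generated data/word_counts.txt. A minimal sketch of what that means, assuming config.py keeps them as plain constants (the numbers below are placeholders, not real values; take them from your own run of process_descriptions.py):

    # config.py (sketch with placeholder numbers only)
    vocab_size = 18000  # number of entries in data/word_counts.txt
    start_id = 17998    # id assigned to the sentence-start token in the new dictionary
    end_id = 17999      # id assigned to the sentence-end token in the new dictionary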
    

Training

  1. Train the model without the initialization pipeline.

    python im_caption_full.py --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt\
      --multi_gpu --batch_size 512 --save_checkpoint_steps 1000\
      --gen_lr 0.001 --dis_lr 0.001
    
  2. Evaluate the model. The last element in the b34.json file is the best checkpoint (see the sketch at the end of this list).

    CUDA_VISIBLE_DEVICES='0,1' python eval_all.py\
      --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt\
      --data_dir ~/dataset/mscoco/all_images
    js-beautify saving/b34.json
    
  3. Evaluate the model on the test set. Suppose the best validation checkpoint is 20000.

    python test_model.py --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt\
      --data_dir ~/dataset/mscoco/all_images --job_dir saving/model.ckpt-20000
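
Step 2 above states that the last element of saving/b34.json is the best checkpoint. A small hedged helper for reading it out might look like the following; the exact structure of b34.json is an assumption here (only that it is a JSON list), so inspect the beautified file and adjust the indexing if needed:

    import json

    # Print the last entry of saving/b34.json, which the README identifies
    # as the best validation checkpoint.
    with open('saving/b34.json') as f:
        results = json.load(f)
    print('Best checkpoint entry:', results[-1])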
    

Initialization (Optional. The files can be found here).

  1. Train an object-to-sentence model, which is used to generate the pseudo-captions.

    python initialization/obj2sen.py
    
  2. Find the best obj2sen model.

    python initialization/eval_obj2sen.py --threads 8
    
  3. Generate pseudo-captions. Suppose the best validation checkpoint is 35000.

    python initialization/gen_obj2sen_caption.py --num_proc 8\
      --job_dir obj2sen/model.ckpt-35000
    
  4. Train a captioning model using the pseudo-pairs.

    python initialization/im_caption.py --o2s_ckpt obj2sen/model.ckpt-35000\
      --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt
    
  5. Evaluate the model.

    CUDA_VISIBLE_DEVICES='0,1' python eval_all.py\
      --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt\
      --data_dir ~/dataset/mscoco/all_images --job_dir saving_imcap
    js-beautify saving_imcap/b34.json
    
  6. Train a sentence auto-encoder, which is used to initialize the sentence GAN.

    python initialization/sentence_ae.py
    
  7. Train the sentence GAN.

    python initialization/sentence_gan.py
    
  8. Train the full model with initialization. Suppose the best imcap validation checkpoint is 18000.

    python im_caption_full.py --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt\
      --imcap_ckpt saving_imcap/model.ckpt-18000\
      --sae_ckpt sen_gan/model.ckpt-30000 --multi_gpu --batch_size 512\
      --save_checkpoint_steps 1000 --gen_lr 0.001 --dis_lr 0.001
    

Credits

Part of the code is from coco-caption, im2txt, tfgan, resnet, the TensorFlow Object Detection API, and maskgan.

Xinpeng suggested the self-critic idea, which is crucial to training.

Comments
  • crawl_description.py

    I've been trying to run crawl_description.py but I am getting an error.

    Traceback (most recent call last):
      File "crawl_descriptions1.py", line 92, in <module>
        app.run(main)
      File "C:\Users\asus\Anaconda3\envs\tensorflow_gpu\lib\site-packages\absl\app.py", line 300, in run
        _run_main(main, args)
      File "C:\Users\asus\Anaconda3\envs\tensorflow_gpu\lib\site-packages\absl\app.py", line 251, in _run_main
        sys.exit(main(argv))
      File "crawl_descriptions1.py", line 88, in main
        download(FLAGS.data_dir, FLAGS.num_pages, i, c)
      File "crawl_descriptions1.py", line 68, in download
        all_pages = get_num_pages(label)
      File "crawl_descriptions1.py", line 60, in get_num_pages
        num_pages = int(obj.group(1))
    AttributeError: 'NoneType' object has no attribute 'group'

    obj = re.search('data-max="(\d*)"', page)

    This is most likely because obj is None: re.search cannot find a match for 'data-max="(\d*)"' in the page source. The Shutterstock page source might have changed.

    Can anyone help, or update the Python files?

    opened by prajwalkkr 6
  • Processing image in batch in testing/evaluation

    As mentioned in https://github.com/fengyang0317/unsupervised_captioning/issues/4, testing/evaluation is slow. One reason is that it does not support multi-GPU for a single model. The crucial reason, I think, is that images are iterated one by one instead of being processed in batches. I notice that you use different data loaders for training and testing: the tfrecord format is used in training and a placeholder in testing. I wonder why testing/evaluation does not use the same data loader and a similar pipeline as training, so that it can also process data in batches. The parameter batch_size is defined in caption_infer.py, but a size larger than one seems to cause errors. https://github.com/fengyang0317/unsupervised_captioning/blob/ae17dc7edf556689eb943c8e51581a229ad41742/caption_infer.py#L29 Could you please kindly provide a batch version? Thanks!

    opened by HYPJUDY 5
  • caption_infer

    After training, I try to generate captions for some images, but why are the captions generated for different images almost the same? (I didn't change your code.)

    opened by wangjiangnan 4
  • Chinese image caption, In the result, multiple words of the same type appear

    Hello, I am using the COCO dataset with a two-layer LSTM model: one layer for top-down attention and one layer for the language model.

    After segmenting words with jieba, I used all the words in the image descriptions that occur more than 3 times as the dictionary file, 14,226 words in total:

        words = [w for w in word_freq.keys() if word_freq[w] > 3]

    After training the model, when using it, multiple words of the same type appear in the result, such as:

    Note notebook laptop computer on bed A little girl little girl girl standing together

    How can I solve this problem?

    opened by cylvzj 4
  • Confusion about the code

    In process_descriptions.py, the 'key' is the intersection between the sentence and the category names. But in input_pipeline.py, the 'key' and 'sentence' do not have that relationship. In process_descriptions.py:

    >sentence
    [1, 0, 58, 595, 10, 349, 12, 782, 0, 579, 3, 2]
    >key
    [595]
    

    In input_pipeline.py:

    >sentence[0]
    [   0,   65,   19, 1130,   37,  882,   10,  124,    5,   48,    5,
            345,    1,    0,    0,    0,    0,    0,    0,    0,    0,    0,
              0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
              0,    0,    0,    0,    0,    0,    0,    0,    0]
    >key[0]
    [8265, 2390,  878, 4930,   10,  436,    5,    7,  118, 2433,    8,
            388,  558,    5,  139,    6,    1,    0,    0,    0,    0,    0,
              0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
              0,    0,    0,    0,    0,    0]
    

    It seems that the 'sentence' is completely unrelated to the 'key'. Is that reasonable? And why is the 'key' different in the two files above?

    opened by songpipi 4
  • ValueError: No variables provided.

    I followed the steps strictly with TensorFlow 1.13.1 and Python 3.7, but when I run im_caption_full.py I get the error below. How can I solve it?

    The error appears in model_fn, at the call to tfgan.gan_train_ops:

    train_ops = tfgan.gan_train_ops(
        gan_model, gan_loss, generator_optimizer=gen_opt,
        discriminator_optimizer=dis_opt, transform_grads_fn=transform_grads_fn,
        summarize_gradients=is_chief, check_for_unused_update_ops=not FLAGS.use_pool,
        aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N)

    (...)
      "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
    Traceback (most recent call last):
      File "im_caption_full.py", line 453, in <module>
        tf.app.run(main1())
      File "im_caption_full.py", line 449, in main1
        estimator.train(train_input_fn, max_steps=FLAGS.max_steps)
      File "/data/songpp/anaconda2/envs/tensorflow-gpu/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
        loss = self._train_model(input_fn, hooks, saving_listeners)
      File "/data/songpp/anaconda2/envs/tensorflow-gpu/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
        return self._train_model_default(input_fn, hooks, saving_listeners)
      File "/data/songpp/anaconda2/envs/tensorflow-gpu/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1154, in _train_model_default
        features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
      File "/data/songpp/anaconda2/envs/tensorflow-gpu/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1112, in _call_model_fn
        model_fn_results = self._model_fn(features=features, **kwargs)
      File "im_caption_full.py", line 372, in model_fn
        aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N)
      File "/data/songpp/anaconda2/envs/tensorflow-gpu/lib/python3.7/site-packages/tensorflow/contrib/gan/python/train.py", line 1012, in gan_train_ops
        **kwargs)
      File "/data/songpp/anaconda2/envs/tensorflow-gpu/lib/python3.7/site-packages/tensorflow/contrib/training/python/training/training.py", line 458, in create_train_op
        grad_updates = optimizer.apply_gradients(grads, global_step=global_step)
      File "/data/songpp/anaconda2/envs/tensorflow-gpu/lib/python3.7/site-packages/tensorflow/python/training/optimizer.py", line 572, in apply_gradients
        raise ValueError("No variables provided.")
    ValueError: No variables provided.
    
    
    opened by songpipi 4
  • Suggestions

    Some corrections and suggestions about the repo.

    1. The installation instructions are confusing as to whether workspace is a directory inside unsupervised_captioning or the other way around. "First we refer to requirements.txt and then we clone the repo" seems wrong.
    2. In preprocessing/crawl_descriptions.py, obj = re.search('data-max="(\d*)"', page) should be obj = re.search('max="(\d*)"', page). I went through the HTML file and this is what I found (as of 8 April 2019; the format could change later). See the sketch after this list.
    3. preprocessing/extract_descriptions.py imports config.py, which is located in the root dir of the repo. Shouldn't this file be inside the preprocessing dir? The workspace dir may also need to be modified accordingly. Same with misc_fn.py.
    4. An error may occur while running preprocessing/extract_descriptions.py. This is a solution I found on GitHub -> answer
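
    For reference, the one-line change proposed in point 2 would look like this in preprocessing/crawl_descriptions.py (taken from the issue text; the Shutterstock markup may have changed again since 2019, so treat the pattern as a starting point rather than a definitive fix):

        # in get_num_pages(): the old pattern reportedly no longer matches the page source
        # obj = re.search('data-max="(\d*)"', page)
        obj = re.search('max="(\d*)"', page)
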
    opened by gautamsreekumar 4
  • preprocessing/extract_descriptions.py

    https://github.com/fengyang0317/unsupervised_captioning/blob/0e75b6aff4cc9e94249bc272fc5490e566ef5d7c/preprocessing/extract_descriptions.py#L16 ModuleNotFoundError: No module named 'data'

    Sorry to trouble you. I can't find a module called 'data', and there is no 'build_mscoco_data' module in the 'data' directory either. How can I solve this?

    opened by ironmanx1 3
  • some questions

    $ python preprocessing/crawl_descriptions.py
    person 1000 0 []
    bicycle 1000 0 []
    car 1000 0 []
    motorbike 1000 0 []
    aeroplane 1000 0 []
    ...

    Is this result normal? How can I solve it? Looking forward to your reply.

    Sorry to bother you. I also get the same results at this step, but I don't know which part of the crawl_descriptions.py file to change. Can you help me? Thank you~

    opened by ironmanx1 2
  • The passed save_path is not a valid checkpoint

    Hi, thank you for sharing your code. However, I encountered a problem when I tried to run one of the commands from the GitHub page. When I run

        python test_model.py --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt\
          --data_dir ~/dataset/mscoco/all_images --job_dir saving/model.ckpt-20000

    it returns the following error: ValueError: The passed save_path is not a valid checkpoint: saving/model.ckpt-20000. Moreover, I used your saving files from Gdrive, but there is no model.ckpt-2000; there are only files like model.ckpt-2000.index and model.ckpt-2000.meta. How can I run the test model using the pretrained model?

    opened by atg93 2
  • AttributeError: 'NoneType' object has no attribute 'group'

    (py36) XXX@lthpc:~/XSpace/Games/ICP1_unsupervised_captioning$ python preprocessing/crawl_descriptions.py
    Traceback (most recent call last):
      File "preprocessing/crawl_descriptions.py", line 100, in <module>
        app.run(main)
      File "/home/XXX/miniconda3/envs/py36/lib/python3.6/site-packages/absl/app.py", line 300, in run
        _run_main(main, args)
      File "/home/XXX/miniconda3/envs/py36/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
        sys.exit(main(argv))
      File "preprocessing/crawl_descriptions.py", line 96, in main
        download(FLAGS.data_dir, FLAGS.num_pages, i, c)
      File "preprocessing/crawl_descriptions.py", line 76, in download
        all_pages = get_num_pages(label)
      File "preprocessing/crawl_descriptions.py", line 60, in get_num_pages
        num_pages = int(obj.group(1))
    AttributeError: 'NoneType' object has no attribute 'group'

    opened by hello-lx 2
  • Bump tensorflow-gpu from 1.13.1 to 2.9.3

    Bumps tensorflow-gpu from 1.13.1 to 2.9.3. According to the release notes and changelog, releases 2.9.2, 2.9.3, and 2.8.4 introduce several vulnerability fixes. This is an automated Dependabot PR; Dependabot will resolve any conflicts as long as the PR is not altered manually.

    dependencies 
    opened by dependabot[bot] 0
  • Failed to run im_caption_full.py

    I am using the data you provided on Baidu Netdisk. When I run im_caption_full.py, the following error occurs:

    Traceback (most recent call last):
      File "/home/wangtao/miniconda3/envs/tensorflower1.4.0-py3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
        return fn(*args)
      File "/home/wangtao/miniconda3/envs/tensorflower1.4.0-py3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
        options, feed_dict, fetch_list, target_list, run_metadata)
      File "/home/wangtao/miniconda3/envs/tensorflower1.4.0-py3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
        run_metadata)
    tensorflow.python.framework.errors_impl.DataLossError: 3 root error(s) found.
      (0) Data loss: corrupted record at 2137231365
        [[{{node IteratorGetNext}}]]
        [[InceptionV4/Conv2d_1a_3x3/BatchNorm/moving_mean/read/_481]]
      (1) Data loss: corrupted record at 2137231365
        [[{{node IteratorGetNext}}]]
        [[InceptionV4/Mixed_7b/Branch_3/Conv2d_0b_1x1/BatchNorm/beta/read/_5235]]
      (2) Data loss: corrupted record at 2137231365
        [[{{node IteratorGetNext}}]]
    0 successful operations. 2 derived errors ignored.

    The system reports that the data is corrupted. I don't know whether you have encountered the same problem, and whether there is a corresponding solution.

    opened by ttx213 0
  • Data.build_mscoco_data not found

    Hey Yang, I encountered the error below when I ran the "python preprocessing/extract_descriptions.py" command. Can you resolve this issue? Thank you in advance.

    Traceback (most recent call last):
      File "preprocessing/extract_descriptions.py", line 16, in <module>
        from data.build_mscoco_data import _process_caption
    ModuleNotFoundError: No module named 'data.build_mscoco_data'

    opened by Anirudh-crypto 0
  • Question about dataset and inference

    Dear Yang, I have encountered some problems when running your code. First, the dataset link given in the first step is out of date, and the Baidu Cloud link given in the Q&A is out of date too. Would you please provide a new link or update the link on GitHub if possible? Secondly, maybe I don't fully understand the project, but I do not know how to use your code to run inference. To understand your method better, I would like to see how it works. Thanks a lot! Looking forward to your reply!

    opened by ShawnRBT 0
  • eval_all.py

    https://github.com/fengyang0317/unsupervised_captioning/blob/0e75b6aff4cc9e94249bc272fc5490e566ef5d7c/eval_all.py#L4

    https://github.com/fengyang0317/unsupervised_captioning/blob/0e75b6aff4cc9e94249bc272fc5490e566ef5d7c/eval_all.py#L92 https://github.com/fengyang0317/unsupervised_captioning/blob/0e75b6aff4cc9e94249bc272fc5490e566ef5d7c/eval_all.py#L93

    multiprocessing.pool.MaybeEncodingError: Error sending result: 'NotFoundError()'. Reason: 'PicklingError("Can't pickle <type 'module'>: attribute lookup builtin.module failed",)'

    Does this happen because of the multiprocessing module? Have you had this problem?

    opened by ironmanx1 0
Owner
Yang Feng
SWE @ Google