Unsupervised Image Captioning
by Yang Feng, Lin Ma, Wei Liu, and Jiebo Luo
Introduction
Most image captioning models are trained using paired image-sentence data, which are expensive to collect. We propose unsupervised image captioning to relax the reliance on paired data. For more details, please refer to our paper.
Citation
@InProceedings{feng2019unsupervised,
author = {Feng, Yang and Ma, Lin and Liu, Wei and Luo, Jiebo},
title = {Unsupervised Image Captioning},
booktitle = {CVPR},
year = {2019}
}
Requirements
mkdir ~/workspace
cd ~/workspace
git clone https://github.com/tensorflow/models.git tf_models
git clone https://github.com/tylin/coco-caption.git
touch tf_models/research/im2txt/im2txt/__init__.py
touch tf_models/research/im2txt/im2txt/data/__init__.py
touch tf_models/research/im2txt/im2txt/inference_utils/__init__.py
wget http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz
mkdir ckpt
tar zxvf inception_v4_2016_09_09.tar.gz -C ckpt
git clone https://github.com/fengyang0317/unsupervised_captioning.git
cd unsupervised_captioning
pip install -r requirements.txt
export PYTHONPATH=$PYTHONPATH:`pwd`
Dataset (Optional. The files generated below can be found at Gdrive. In case you do not have access to Google, the files are also available at One Drive.)
- Crawl image descriptions. The descriptions used when conducting the experiments in the paper are available at link. You may download the descriptions from the link and extract the files to data/coco.
pip3 install absl-py
python3 preprocessing/crawl_descriptions.py
- Extract the descriptions. NLTK's sentence tokenizer changes across versions, so the number of descriptions obtained may differ (see the example after the commands below).
python -c "import nltk; nltk.download('punkt')"
python preprocessing/extract_descriptions.py
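For reference, the extraction relies on NLTK's punkt sentence tokenizer, which is where the version drift comes from. A minimal illustration (the sample text is only an example, not part of the dataset):

import nltk

nltk.download('punkt')  # one-time download of the punkt tokenizer models

# Sentence splitting is version-dependent, which is why the number of
# extracted descriptions can vary between NLTK releases.
text = "A man rides a horse. The horse is brown."
print(nltk.sent_tokenize(text))  # ['A man rides a horse.', 'The horse is brown.']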
- Preprocess the descriptions. You may need to change the vocab_size, start_id, and end_id in config.py if you generate a new dictionary (a sketch for deriving those values follows the command below).
python preprocessing/process_descriptions.py \
  --word_counts_output_file data/word_counts.txt --new_dict
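If you build a new dictionary, the values in config.py must match it. Below is a minimal sketch of deriving them from data/word_counts.txt, assuming the im2txt-style format of one "word count" pair per line and <S>/</S> as the start/end tokens; verify both assumptions against the generated file before relying on this.

# Sketch: derive vocab_size, start_id, and end_id for config.py.
# Assumes one "word count" pair per line and im2txt-style <S>/</S> tokens.
with open('data/word_counts.txt') as f:
    words = [line.split()[0] for line in f if line.strip()]

vocab_size = len(words)
start_id = words.index('<S>')
end_id = words.index('</S>')
print(vocab_size, start_id, end_id)  # copy these values into config.py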
- Download the MSCOCO images from link and put all the images into ~/dataset/mscoco/all_images.
- Run object detection on the training images. You need to first download the detection model from here and then extract the model under tf_models/research/object_detection.
python preprocessing/detect_objects.py --image_path ~/dataset/mscoco/all_images \
  --num_proc 2 --num_gpus 1
- Generate TFRecord files for the images (a sanity-check sketch follows the command).
python preprocessing/process_images.py --image_path ~/dataset/mscoco/all_images
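To sanity-check the generated files, you can count the records with TensorFlow 1.x's record iterator. The 'data/*.tfrec' glob below is an assumption; point it at whatever path and extension process_images.py actually writes.

import glob
import tensorflow as tf

# Count the records in each generated file; a zero count or a parse error
# indicates the preprocessing step did not complete cleanly.
for path in glob.glob('data/*.tfrec'):
    num_records = sum(1 for _ in tf.python_io.tf_record_iterator(path))
    print(path, num_records)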
Training
- Train the model without the initialization pipeline.
python im_caption_full.py --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt \
  --multi_gpu --batch_size 512 --save_checkpoint_steps 1000 \
  --gen_lr 0.001 --dis_lr 0.001
- Evaluate the model. The last element in the b34.json file is the best checkpoint (see the helper after the commands below).
CUDA_VISIBLE_DEVICES='0,1' python eval_all.py \
  --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt \
  --data_dir ~/dataset/mscoco/all_images
js-beautify saving/b34.json
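A small helper for reading off the best checkpoint, assuming b34.json is a JSON array whose last element records the best checkpoint (as noted above):

import json

# eval_all.py writes its results to b34.json; per the note above, the
# last element corresponds to the best checkpoint.
with open('saving/b34.json') as f:
    results = json.load(f)
print(results[-1])  # the best checkpoint entry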
- Evaluate the model on the test set. Suppose the best validation checkpoint is 20000.
python test_model.py --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt \
  --data_dir ~/dataset/mscoco/all_images --job_dir saving/model.ckpt-20000
Initialization (Optional. The files can be found here.)
- Train an object-to-sentence model, which is used to generate the pseudo-captions.
python initialization/obj2sen.py
- Find the best obj2sen model.
python initialization/eval_obj2sen.py --threads 8
- Generate pseudo-captions. Suppose the best validation checkpoint is 35000.
python initialization/gen_obj2sen_caption.py --num_proc 8 \
  --job_dir obj2sen/model.ckpt-35000
- Train a captioning model using the pseudo-pairs.
python initialization/im_caption.py --o2s_ckpt obj2sen/model.ckpt-35000 \
  --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt
- Evaluate the model.
CUDA_VISIBLE_DEVICES='0,1' python eval_all.py \
  --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt \
  --data_dir ~/dataset/mscoco/all_images --job_dir saving_imcap
js-beautify saving_imcap/b34.json
- Train a sentence auto-encoder, which is used to initialize the sentence GAN.
python initialization/sentence_ae.py
- Train the sentence GAN.
python initialization/sentence_gan.py
- Train the full model with initialization. Suppose the best imcap validation checkpoint is 18000.
python im_caption_full.py --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt \
  --imcap_ckpt saving_imcap/model.ckpt-18000 \
  --sae_ckpt sen_gan/model.ckpt-30000 --multi_gpu --batch_size 512 \
  --save_checkpoint_steps 1000 --gen_lr 0.001 --dis_lr 0.001
Credits
Part of the code is from coco-caption, im2txt, tfgan, resnet, the TensorFlow Object Detection API, and maskgan.
Xinpeng told me the idea of self-critical training, which is crucial to training.
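For context, "self-critical" presumably refers to self-critical sequence training (Rennie et al., 2017), where the greedy decode serves as the reward baseline. A schematic sketch, with score_fn standing in for a sentence-level reward (it is a placeholder, not this repo's API):

def self_critical_reward(score_fn, sampled_caption, greedy_caption):
    """Advantage of a sampled caption over the greedy-decoded baseline."""
    return score_fn(sampled_caption) - score_fn(greedy_caption)

# Toy usage: a dummy reward counting distinct words. Samples that beat the
# greedy decode get a positive advantage, which scales the log-probabilities
# of the sampled tokens in the policy-gradient update.
score = lambda caption: float(len(set(caption.split())))
print(self_critical_reward(score, "a dog runs on grass", "a dog runs"))  # 2.0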