Trying to generate captions for my own set of images, I run:
python3 artemis/scripts/sample_speaker.py \
-speaker-saved-args vanilla_sat_speaker/config.json.txt \
-speaker-checkpoint vanilla_sat_speaker/checkpoints/best_model.pt \
-img-dir /home/ricardokleinlein/Desktop/captioning/ARTEMIS/imgs/ \
-out-file ./OUTPUT_CAPTIONS \
--custom-data-csv /home/ricardokleinlein/Desktop/captioning/ARTEMIS/imgs/custom.csv
But I get stuck on this error:
Traceback (most recent call last):
File "artemis/scripts/sample_speaker.py", line 86, in <module>
captions_predicted, attn_weights = versatile_caption_sampler(speaker, annotate_loader, device, **config)
File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/artemis/captioning/sample_captions.py", line 35, in versatile_caption_sampler
drop_bigrams=drop_bigrams)
File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/PYTHON/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/artemis/neural_models/attentive_decoder.py", line 593, in sample_captions_beam_search
seqs = torch.cat([seqs[prev_word_inds], next_word_inds.unsqueeze(1)], dim=1) # (s, step+1)
IndexError: tensors used as indices must be long, byte or bool tensors
I've checked the actual values of the variables involved, and this is what I find:
prev_word_inds = tensor([0.0006, 0.0014, 0.0023, 0.0030, 0.0135], device='cuda:0')
next_word_inds = tensor([ 9, 20, 34, 44, 196], device='cuda:0')
So the indexing seqs[prev_word_inds] fails because prev_word_inds is a float tensor rather than an integer one. How should I proceed?
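For reference, the float values above are consistent with a true division by the vocabulary size (9 / 14469 ≈ 0.0006, 20 / 14469 ≈ 0.0014, and the log below reports a vocabulary of 14469), so prev_word_inds looks like the result of something along the lines of top_k_words / vocab_size, which in recent PyTorch versions returns a float tensor. Below is a minimal sketch of what I believe is happening, with stand-in values, plus the cast I'm considering as a workaround (the floor-division fix is my assumption, not something taken from the repo):

import torch

# Stand-ins for the beam-search state in attentive_decoder.py (shapes/values only).
vocab_size = 14469
top_k_words = torch.tensor([9, 20, 34, 44, 196])
seqs = torch.zeros(5, 1, dtype=torch.long)

prev_word_inds = top_k_words / vocab_size   # true division -> float tensor, e.g. 0.0006, 0.0014, ...
next_word_inds = top_k_words % vocab_size   # stays an integer tensor

try:
    seqs = torch.cat([seqs[prev_word_inds], next_word_inds.unsqueeze(1)], dim=1)
except IndexError as err:
    print(err)  # tensors used as indices must be long, byte or bool tensors

# Possible workaround (my assumption): keep the beam indices integral via floor division.
prev_word_inds = torch.div(top_k_words, vocab_size, rounding_mode='floor')
seqs = torch.cat([seqs[prev_word_inds], next_word_inds.unsqueeze(1)], dim=1)  # works, shape (5, 2)

If that is indeed the cause, would changing the division in sample_captions_beam_search to a floor division (torch.div(..., rounding_mode='floor')) be the right fix, or is a specific PyTorch version required?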
Full log
Some config args are not set because I'm just trying to make it work for now.
Parameters Specified:
{'compute_nll': False,
'custom_data_csv': '/home/ricardokleinlein/Desktop/captioning/ARTEMIS/imgs/custom.csv',
'drop_bigrams': True,
'drop_unk': True,
'gpu': '0',
'img2emo_checkpoint': None,
'img_dir': '/home/ricardokleinlein/Desktop/captioning/ARTEMIS/imgs/',
'max_utterance_len': None,
'n_workers': None,
'out_file': './OUTPUT_CAPTIONS',
'random_seed': 2021,
'sampling_config_file': '/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/artemis/data/speaker_sampling_configs/selected_hyper_params.json.txt',
'speaker_checkpoint': 'vanilla_sat_speaker/checkpoints/best_model.pt',
'speaker_saved_args': 'vanilla_sat_speaker/config.json.txt',
'split': 'test',
'subsample_data': -1}
Loading saved speaker trained with parameters:
{'atn_cover_img_alpha': 1,
'atn_spatial_img_size': None,
'attention_dim': 512,
'batch_size': 128,
'data_dir': '/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/PREPROCESS_OUT',
'dataset': 'artemis',
'debug': False,
'decoder_lr': 0.0005,
'dropout_rate': 0.1,
'emo_grounding_dims': [9, 9],
'encoder_lr': 0.0001,
'fine_tune_data': None,
'gpu': '1',
'img_dim': 256,
'img_dir': '---YOUR----TOP-DIR-WITH-WIKI-ART-OR-TO-BE-ANNOTATED-IMAGES',
'lanczos': True,
'log_dir': '----YOUR---DIR-WHERE-YOU-UNZIPED-THIS-DL-ZIPPED-FOLDER-ENDING-WITH-THE-DATE-STAMP',
'lr_patience': 2,
'max_train_epochs': 50,
'num_workers': 1,
'random_seed': 2021,
'resume_path': None,
'rnn_hidden_dim': 512,
'save_each_epoch': False,
'teacher_forcing_ratio': 1,
'train_patience': 5,
'use_emo_grounding': False,
'use_timestamp': True,
'vis_encoder': 'resnet34',
'word_embedding_dim': 128}
Using a vocabulary of size 14469
Loading speaker model at epoch 7.
Loaded 429431 utterances
/home/ricardokleinlein/Desktop/captioning/ARTEMIS/PYTHON/lib/python3.7/site-packages/torchvision/transforms/transforms.py:288: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
"Argument interpolation should be of type InterpolationMode instead of int. "
Loaded 1 sampling configurations to try.
Sampling with configuration: {'sampling_rule': 'beam', 'temperature': 0.3, 'beam_size': 5, 'max_utterance_len': 30, 'drop_unk': True, 'drop_bigrams': True}
Traceback (most recent call last):
File "artemis/scripts/sample_speaker.py", line 86, in <module>
captions_predicted, attn_weights = versatile_caption_sampler(speaker, annotate_loader, device, **config)
File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/artemis/captioning/sample_captions.py", line 35, in versatile_caption_sampler
drop_bigrams=drop_bigrams)
File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/PYTHON/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/artemis/neural_models/attentive_decoder.py", line 593, in sample_captions_beam_search
seqs = torch.cat([seqs[prev_word_inds], next_word_inds.unsqueeze(1)], dim=1) # (s, step+1)
IndexError: tensors used as indices must be long, byte or bool tensors