ArtEmis: Affective Language for Art

Comments
  • Can't generate captions for images of my own

    Trying to generate captions for my own set of images, I run:

    python3 artemis/scripts/sample_speaker.py \
    -speaker-saved-args vanilla_sat_speaker/config.json.txt \
    -speaker-checkpoint vanilla_sat_speaker/checkpoints/best_model.pt \
    -img-dir /home/ricardokleinlein/Desktop/captioning/ARTEMIS/imgs/ \
    -out-file ./OUTPUT_CAPTIONS \
    --custom-data-csv /home/ricardokleinlein/Desktop/captioning/ARTEMIS/imgs/custom.csv
    

    But I get stuck on this error:

    Traceback (most recent call last):
      File "artemis/scripts/sample_speaker.py", line 86, in <module>
        captions_predicted, attn_weights = versatile_caption_sampler(speaker, annotate_loader, device, **config)
      File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/artemis/captioning/sample_captions.py", line 35, in versatile_caption_sampler
        drop_bigrams=drop_bigrams)
      File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/PYTHON/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
        return func(*args, **kwargs)
      File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/artemis/neural_models/attentive_decoder.py", line 593, in sample_captions_beam_search
        seqs = torch.cat([seqs[prev_word_inds], next_word_inds.unsqueeze(1)], dim=1)  # (s, step+1)
    IndexError: tensors used as indices must be long, byte or bool tensors
    

    I've checked the actual values of the variables involved, and this is what I find:

    prev_word_inds = tensor([0.0006, 0.0014, 0.0023, 0.0030, 0.0135], device='cuda:0')
    next_word_inds = tensor([  9,  20,  34,  44, 196], device='cuda:0')
    

    So indexing seqs[prev_word_inds] fails because prev_word_inds is a float tensor. How should I proceed?
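
    A plausible cause (an assumption, not a confirmed diagnosis): since PyTorch 1.5, true division on integer tensors returns floats, so a beam-search index computed with / comes out fractional (note that 9 / 14469 ≈ 0.0006, matching the dump above). A minimal sketch of the kind of fix that usually resolves this; top_k_words and vocab_size are assumed names for the surrounding variables:

    # Hypothetical patch around the beam search in attentive_decoder.py,
    # assuming prev_word_inds was computed as top_k_words / vocab_size.
    # Floor-divide instead (rounding_mode requires PyTorch >= 1.8), which
    # keeps the result a long tensor usable as an index:
    prev_word_inds = torch.div(top_k_words, vocab_size, rounding_mode='floor')
    next_word_inds = top_k_words % vocab_size
    seqs = torch.cat([seqs[prev_word_inds], next_word_inds.unsqueeze(1)], dim=1)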

    Full log

    Some config args are not set because I'm just trying to make it work for now.

    Parameters Specified:
    {'compute_nll': False,
     'custom_data_csv': '/home/ricardokleinlein/Desktop/captioning/ARTEMIS/imgs/custom.csv',
     'drop_bigrams': True,
     'drop_unk': True,
     'gpu': '0',
     'img2emo_checkpoint': None,
     'img_dir': '/home/ricardokleinlein/Desktop/captioning/ARTEMIS/imgs/',
     'max_utterance_len': None,
     'n_workers': None,
     'out_file': './OUTPUT_CAPTIONS',
     'random_seed': 2021,
     'sampling_config_file': '/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/artemis/data/speaker_sampling_configs/selected_hyper_params.json.txt',
     'speaker_checkpoint': 'vanilla_sat_speaker/checkpoints/best_model.pt',
     'speaker_saved_args': 'vanilla_sat_speaker/config.json.txt',
     'split': 'test',
     'subsample_data': -1}
    
    
    Loading saved speaker trained with parameters:
    {'atn_cover_img_alpha': 1,
     'atn_spatial_img_size': None,
     'attention_dim': 512,
     'batch_size': 128,
     'data_dir': '/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/PREPROCESS_OUT',
     'dataset': 'artemis',
     'debug': False,
     'decoder_lr': 0.0005,
     'dropout_rate': 0.1,
     'emo_grounding_dims': [9, 9],
     'encoder_lr': 0.0001,
     'fine_tune_data': None,
     'gpu': '1',
     'img_dim': 256,
     'img_dir': '---YOUR----TOP-DIR-WITH-WIKI-ART-OR-TO-BE-ANNOTATED-IMAGES',
     'lanczos': True,
     'log_dir': '----YOUR---DIR-WHERE-YOU-UNZIPED-THIS-DL-ZIPPED-FOLDER-ENDING-WITH-THE-DATE-STAMP',
     'lr_patience': 2,
     'max_train_epochs': 50,
     'num_workers': 1,
     'random_seed': 2021,
     'resume_path': None,
     'rnn_hidden_dim': 512,
     'save_each_epoch': False,
     'teacher_forcing_ratio': 1,
     'train_patience': 5,
     'use_emo_grounding': False,
     'use_timestamp': True,
     'vis_encoder': 'resnet34',
     'word_embedding_dim': 128}
    Using a vocabulary of size 14469
    Loading speaker model at epoch 7.
    Loaded 429431 utterances
    /home/ricardokleinlein/Desktop/captioning/ARTEMIS/PYTHON/lib/python3.7/site-packages/torchvision/transforms/transforms.py:288: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
      "Argument interpolation should be of type InterpolationMode instead of int. "
    Loaded 1 sampling configurations to try.
    Sampling with configuration:  {'sampling_rule': 'beam', 'temperature': 0.3, 'beam_size': 5, 'max_utterance_len': 30, 'drop_unk': True, 'drop_bigrams': True}
    Traceback (most recent call last):
      File "artemis/scripts/sample_speaker.py", line 86, in <module>
        captions_predicted, attn_weights = versatile_caption_sampler(speaker, annotate_loader, device, **config)
      File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/artemis/captioning/sample_captions.py", line 35, in versatile_caption_sampler
        drop_bigrams=drop_bigrams)
      File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/PYTHON/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
        return func(*args, **kwargs)
      File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/artemis/neural_models/attentive_decoder.py", line 593, in sample_captions_beam_search
        seqs = torch.cat([seqs[prev_word_inds], next_word_inds.unsqueeze(1)], dim=1)  # (s, step+1)
    IndexError: tensors used as indices must be long, byte or bool tensors
    
    opened by ricardokleinklein 5
  • Data zip is not available

    After agreeing to the terms and conditions, I was led to a link for a ZIP file, but I cannot access it: the download page is stuck on "redirecting...."

    opened by metaphorz 2
  • Can't find artemis_preprocessed.csv and rescaled_max_size_to_600px_same_aspect_ratio on repo?

    Hi, the notebook for the basic linguistic, emotion & art-oriented analysis of the ArtEmis dataset requires the data at the file paths below, but I can't seem to find these files anywhere in the repo. Can you advise here, please? Thanks.

    artemis_preprocessed_csv = '/home/optas/DATA/OUT/artemis/preprocessed_data/for_analysis/artemis_preprocessed.csv'
    wikiart_img_dir = '/home/optas/DATA/Images/Wiki-Art/rescaled_max_size_to_600px_same_aspect_ratio'
    
    opened by texturejc 2
  • RuntimeError encountered when training a text2emotion lstm classifier

    I'm trying to train an LSTM-based txt2emo classifier using the code provided in the notebook, but unfortunately I'm stuck on the following error and can't tell what's wrong.

    Traceback (most recent call last):
      File "/ibex/scratch/sunp/project/artemis/artemis/notebooks/deep_nets/emotions/train_text2emo.py", line 170, in <module>
        single_epoch_train(model, dataloaders['train'], args.use_imgs, criterion, optimizer, device)
      File "/ibex/scratch/sunp/project/artemis/artemis/neural_models/text_emotional_clf.py", line 55, in single_epoch_train
        loss.backward()
      File "/home/sunp/ibex-miniconda-install/ENTER/envs/artemislic/lib/python3.6/site-packages/torch/_tensor.py", line 307, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/home/sunp/ibex-miniconda-install/ENTER/envs/artemislic/lib/python3.6/site-packages/torch/autograd/__init__.py", line 156, in backward
        allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1024, 31, 256]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

    Is this caused by the Python version or the network architecture? It would be great if you could help check it. Thanks in advance :)
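
    For reference, the usual culprit behind this message (an assumption based on the ReluBackward0 hint, not a confirmed diagnosis) is an in-place operation on a tensor autograd still needs, e.g. nn.ReLU(inplace=True) or an in-place += in forward(). A minimal sketch of the standard workarounds:

    import torch
    import torch.nn as nn

    # Locate the offending op first (slow; for debugging only):
    torch.autograd.set_detect_anomaly(True)

    # Workaround 1: avoid in-place activations in the model definition.
    act = nn.ReLU(inplace=False)  # instead of nn.ReLU(inplace=True)

    # Workaround 2: replace in-place tensor updates with out-of-place ones:
    # x += residual   ->   x = x + residual
    # x.relu_()       ->   x = torch.relu(x)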

    opened by pengzhansun 1
  • How can I use a pretrained model to generate a prediction on an image?

    Thanks for making this code available. Apologies if I'm merely ignorant about PyTorch here, but I have a question about the pretrained models that are available for download. I'd like to use them to generate text for unseen images. To do this, I downloaded the SAT-Speaker-with-emotion-grounding (431 MB) model from the repo. However, I don't seem to be able to load it: when I download the model and run the script below, I get a dictionary, not the model.

    Loading the model:

    model_emo = torch_load_model('best_model.pt', map_location=torch.device('cpu'))

    Running the model:

    model_emo(image)

    The error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-12-e80ccbe8b6ed> in <module>
    ----> 1 model_emo(image)
    
    TypeError: 'dict' object is not callable
    

    Now, the PyTorch docs say that I should instantiate the model class and then load the checkpoint data. However, I don't know what model class this belongs to, and the README doesn't say. Do you have any advice on how to proceed with this issue? Thanks.
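
    For reference, the generic pattern the PyTorch docs describe looks like the sketch below; SpeakerModel and the 'model_state_dict' key are hypothetical placeholders, since the repo's actual model class isn't documented in the README:

    import torch

    # torch.load returns whatever was saved -- often a plain state_dict or a
    # dict with extra metadata, not a callable model.
    checkpoint = torch.load('best_model.pt', map_location=torch.device('cpu'))
    if isinstance(checkpoint, dict):
        print(checkpoint.keys())  # inspect what the checkpoint actually holds

    # Generic pattern: instantiate the model class first, then load the weights.
    # model = SpeakerModel(**saved_args)              # placeholder class
    # model.load_state_dict(checkpoint['model_state_dict'])
    # model.eval()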

    opened by texturejc 1
  • Trying to preprocess the artemis data

    C:\Users\safin\Miniconda3\envs\artemis\python.exe C:/Users/safin/artemis/artemis/scripts/preprocess_artemis_data.py
    {'automatic_spell_check': True,
     'group_gt_anno': True,
     'min_word_freq': 0,
     'n_train_examples': None,
     'preprocess_for_deep_nets': False,
     'random_seed': 2021,
     'raw_artemis_data_csv': './ola_dataset_release_v0.csv',
     'save_out_dir': './',
     'split_loads': [0.85, 0.05, 0.1],
     'too_high_repetition': -1,
     'too_long_utter_prc': 100,
     'too_short_len': 0}
    5000 annotations were loaded
    Using a 0.85,0.05,0.1 for train/val/test purposes
    SymSpell spell-checker loaded: True
    Loading glove word embeddings.
    Traceback (most recent call last):
      File "C:/Users/safin/artemis/artemis/scripts/preprocess_artemis_data.py", line 246, in <module>
        df, vocab, missed_tokens = preprocess(args)
      File "C:/Users/safin/artemis/artemis/scripts/preprocess_artemis_data.py", line 182, in preprocess
        missed_tokens = tokenize_and_spell(df, glove_file, freq_file, nltk.word_tokenize, spell_check=args.automatic_spell_check)
      File "c:\users\safin\artemis\artemis\language\basics.py", line 66, in tokenize_and_spell
        golden_vocabulary = load_glove_pretrained_embedding(glove_file, only_words=True, verbose=True)
      File "c:\users\safin\artemis\artemis\neural_models\word_embeddings.py", line 74, in load_glove_pretrained_embedding
        for line in f_in:
      File "C:\Users\safin\Miniconda3\envs\artemis\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 3515: character maps to <undefined>

    OS: Windows 10, Python 3.6. I have the ArtEmis dataset in the directory, ran everything with the default settings, and ran into this UnicodeDecodeError.
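
    A likely cause (an assumption based on the cp1252 frame in the traceback): on Windows, open() defaults to the locale encoding, while GloVe embedding files are UTF-8. A minimal sketch of the usual fix, applied to the file handle opened in load_glove_pretrained_embedding:

    # In artemis/neural_models/word_embeddings.py, pass an explicit encoding
    # instead of relying on the platform default (cp1252 on Windows):
    with open(glove_file, encoding='utf-8') as f_in:
        for line in f_in:
            ...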

    opened by hlsafin 1
  • Can't get a vocabulary with 14996 tokens, so I can't use pretrained models.

    I set "--preprocess-for-deep-nets True",but I just get a vocabulary with 14117 tokens,What should I do? {'automatic_spell_check': True, 'group_gt_anno': True, 'min_word_freq': 5, 'n_train_examples': None, 'preprocess_for_deep_nets': True, 'random_seed': 2021, 'raw_artemis_data_csv': 'D:/ArtEmis/artemis-master/DataSet/ArtEmis/artemis_official_data/official_data/artemis_dataset_release_v0.csv', 'save_out_dir': 'step1_processed_data', 'split_loads': [0.85, 0.05, 0.1], 'too_high_repetition': 41, 'too_long_utter_prc': 95, 'too_short_len': 5} 454684 annotations were loaded Using a 0.85,0.05,0.1 for train/val/test purposes SymSpell spell-checker loaded: True Loading glove word embeddings. Done. 400000 words loaded. Updating Glove vocabulary with valid ArtEmis words that are missing from it. 3057 annotations will be dropped as they contain less than 5 tokens Too-long token length at 95-percentile is 30.0. 22196 annotations will be dropped Using a vocabulary with 14117 tokens n-utterances kept: 429431 vocab size: 14117 tokens not in Glove/Manual vocabulary: 1148 Done. Check saved results in provided save-out-dir: step1_processed_data

    opened by LT156 0
  • IndexError:

    Traceback (most recent call last):
      File "scripts/sample_speaker.py", line 88, in <module>
        captions_predicted, attn_weights = versatile_caption_sampler(speaker, annotate_loader, device, **config)
      File "../artemis/captioning/sample_captions.py", line 35, in versatile_caption_sampler
        drop_bigrams=drop_bigrams)
      File "/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "../artemis/neural_models/attentive_decoder.py", line 586, in sample_captions_beam_search
        seqs = torch.cat([seqs[prev_word_inds], next_word_inds.unsqueeze(1)], dim=1)  # (s, step+1)
    IndexError: tensors used as indices must be long, byte or bool tensors

    When I run sample_speaker.py, I hit this error. Can you advise what I'm doing wrong here? Thanks!

    opened by feixiangqiqi 0
  • Where can I find the version of the WikiArt dataset used here?

    Hi,

    Thanks for releasing this code! I downloaded the WikiArt dataset from https://archive.org/details/wikiart-dataset, but a) it doesn't have the 600px resized subfolder, and b) while training I got an error: no such file 'Baroque/rembrandt_woman-standing-with-raised-hands.jpg'.

    Any tips for where I can download the version of the WikiArt dataset that's used here? It would be much appreciated.

    Thanks, Nish

    opened by NishantTharani 3
  • Fix Notebook?

    Can anyone help me put this into a streamlined notebook? I just want to input photos and output descriptions based on the pre-trained models. Here's my notebook: https://colab.research.google.com/drive/13IfMWEj1bEqCsyQK64qKnPyloB5lFfZ_?usp=sharing. I'm getting several errors. Thanks.

    An assertion error in step 2. If I use the official data release instead of the preprocessed data, I don't get this error. Maybe due to the 14468 vs 14469 discrepancy?

    # assert each image has at least 5 (human) votes!
    x = image_distibutions.apply(sum)
    assert all(x.values >= 5)

    A split error in step 3 (prepare data):

    data_loaders, datasets = image_emotion_distribution_df_to_pytorch_dataset(artemis_data, args)

    An attribute error with sample speaker:

    !python '/content/artemis/artemis/scripts/sample_speaker.py' \
    -speaker-saved-args '/content/config.json.txt' \
    -speaker-checkpoint '/content/best_model.pt' \
    -img-dir '/content/Input' \
    -out-file '/content/Output' \
    --custom-data-csv '/content/artemis_preprocessed.csv'

    And an interpolation warning from the nearest-neighbor "Extract features" step:

    device = torch.device("cuda:" + gpu_id)
    train_feats = extract_visual_features(train_images, img_dim, method=method, device=device)
    test_feats = extract_visual_features(test_images, img_dim, method=method, device=device)

    UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
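
    The warning itself points at the fix (a sketch; the exact transform used in the notebook is an assumption):

    from torchvision import transforms
    from torchvision.transforms import InterpolationMode

    img_dim = 256  # per the speaker config above

    # Pass the enum instead of a raw int to silence the UserWarning:
    resize = transforms.Resize((img_dim, img_dim), interpolation=InterpolationMode.LANCZOS)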

    opened by RED3480 1
  • 'DataFrame' object has no attribute 'tokens' in analyzing_artemis task

    I tried running the analyzing_artemis notebook to analyze the first examples, but something goes wrong. When I do this I get:

    File "Downloads/code/artemis_code/analyzing.py", line 28, in <module>
      df.tokens = df.tokens.apply(literal_eval)  # to make them a python list.
    File "artemis/lib/python3.8/site-packages/pandas/core/generic.py", line 5465, in __getattr__
      return object.__getattribute__(self, name)
    AttributeError: 'DataFrame' object has no attribute 'tokens'

    I think maybe something is wrong with the df loaded from artemis_preprocessed_csv, but I am not sure.
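
    A quick hedged check (assuming the loaded CSV simply lacks a 'tokens' column, e.g. because it is the raw release rather than the preprocessed file; the path below is a placeholder):

    import pandas as pd
    from ast import literal_eval

    artemis_preprocessed_csv = 'artemis_preprocessed.csv'  # path is an assumption
    df = pd.read_csv(artemis_preprocessed_csv)
    print(df.columns.tolist())  # 'tokens' should appear; if not, re-run preprocessing

    if 'tokens' in df.columns:
        df['tokens'] = df['tokens'].apply(literal_eval)  # stringified lists -> python lists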

    opened by lr19960813 1
  • Can't seem to get sample_speaker.py to generate text for new images

    I wish to generate caption text for images that I'll be providing, and my understanding is that sample_speaker.py will do this. However, when I run it I get an error. Here's what I run in the terminal, with the relevant parts of config.json.txt changed.

    python sample_speaker.py -speaker-saved-args config.json.txt -speaker-checkpoint best_model.pt -img-dir image_folder -out-file /Outputs/results.pkl

    When I do this, I get:

    RuntimeError: Error(s) in loading state_dict for ModuleDict:
    	size mismatch for decoder.word_embedding.weight: copying a param with shape torch.Size([14469, 128]) from checkpoint, the shape in current model is torch.Size([35466, 128]).
    	size mismatch for decoder.next_word.weight: copying a param with shape torch.Size([14469, 512]) from checkpoint, the shape in current model is torch.Size([35466, 512]).
    	size mismatch for decoder.next_word.bias: copying a param with shape torch.Size([14469]) from checkpoint, the shape in current model is torch.Size([35466]).
    
    

    Can you advise what I'm doing wrong here? I can't quite get to the bottom of it. Thanks!
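
    For context, a hedged reading of the error (an interpretation, not a confirmed diagnosis): the checkpoint was trained with a 14469-word vocabulary, while the script built a 35466-word vocabulary locally, so the embedding and output layers cannot accept the saved weights. A sketch of a sanity check, with key names taken from the error message:

    import torch

    # Inspect the checkpoint's embedding shape; expect torch.Size([14469, 128]).
    state = torch.load('best_model.pt', map_location='cpu')
    # If the checkpoint nests a state_dict, unwrap it (the key name is a guess):
    sd = state.get('model_state_dict', state) if isinstance(state, dict) else state
    print(sd['decoder.word_embedding.weight'].shape)
    # If the vocabulary your config builds isn't 14469 words, point
    # -speaker-saved-args at the config/vocabulary shipped with this checkpoint.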

    opened by texturejc 8