ArtEmis: Affective Language for Art

Comments
  • Can't generate captions for images of my own

    Trying to generate captions for my own set of images, I run:

    python3 artemis/scripts/sample_speaker.py \
    -speaker-saved-args vanilla_sat_speaker/config.json.txt \
    -speaker-checkpoint vanilla_sat_speaker/checkpoints/best_model.pt \
    -img-dir /home/ricardokleinlein/Desktop/captioning/ARTEMIS/imgs/ \
    -out-file ./OUTPUT_CAPTIONS \
    --custom-data-csv /home/ricardokleinlein/Desktop/captioning/ARTEMIS/imgs/custom.csv
    

    But I get stuck on this error:

    Traceback (most recent call last):
      File "artemis/scripts/sample_speaker.py", line 86, in <module>
        captions_predicted, attn_weights = versatile_caption_sampler(speaker, annotate_loader, device, **config)
      File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/artemis/captioning/sample_captions.py", line 35, in versatile_caption_sampler
        drop_bigrams=drop_bigrams)
      File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/PYTHON/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
        return func(*args, **kwargs)
      File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/artemis/neural_models/attentive_decoder.py", line 593, in sample_captions_beam_search
        seqs = torch.cat([seqs[prev_word_inds], next_word_inds.unsqueeze(1)], dim=1)  # (s, step+1)
    IndexError: tensors used as indices must be long, byte or bool tensors
    

    I've checked the actual values of the variables involved, and this is what I find:

    prev_word_inds = tensor([0.0006, 0.0014, 0.0023, 0.0030, 0.0135], device='cuda:0')
    next_word_inds = tensor([  9,  20,  34,  44, 196], device='cuda:0')
    

    So indexing seqs[prev_word_inds] fails because prev_word_inds is a float tensor. How should I proceed?
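
    A plausible cause (an assumption, not a confirmed diagnosis): since PyTorch 1.5, true division on integer tensors returns floats, so a beam-search index computed with / comes out fractional (note that 9 / 14469 ≈ 0.0006, matching the dump above). A minimal sketch of the kind of fix that usually resolves this; top_k_words and vocab_size are assumed names for the surrounding variables:

    # Hypothetical patch around the beam search in attentive_decoder.py,
    # assuming prev_word_inds was computed as top_k_words / vocab_size.
    # Floor-divide instead (rounding_mode requires PyTorch >= 1.8), which
    # keeps the result a long tensor usable as an index:
    prev_word_inds = torch.div(top_k_words, vocab_size, rounding_mode='floor')
    next_word_inds = top_k_words % vocab_size
    seqs = torch.cat([seqs[prev_word_inds], next_word_inds.unsqueeze(1)], dim=1)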

    Full log

    Some config args are not set because I'm just trying to make it work for now.

    Parameters Specified:
    {'compute_nll': False,
     'custom_data_csv': '/home/ricardokleinlein/Desktop/captioning/ARTEMIS/imgs/custom.csv',
     'drop_bigrams': True,
     'drop_unk': True,
     'gpu': '0',
     'img2emo_checkpoint': None,
     'img_dir': '/home/ricardokleinlein/Desktop/captioning/ARTEMIS/imgs/',
     'max_utterance_len': None,
     'n_workers': None,
     'out_file': './OUTPUT_CAPTIONS',
     'random_seed': 2021,
     'sampling_config_file': '/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/artemis/data/speaker_sampling_configs/selected_hyper_params.json.txt',
     'speaker_checkpoint': 'vanilla_sat_speaker/checkpoints/best_model.pt',
     'speaker_saved_args': 'vanilla_sat_speaker/config.json.txt',
     'split': 'test',
     'subsample_data': -1}
    
    
    Loading saved speaker trained with parameters:
    {'atn_cover_img_alpha': 1,
     'atn_spatial_img_size': None,
     'attention_dim': 512,
     'batch_size': 128,
     'data_dir': '/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/PREPROCESS_OUT',
     'dataset': 'artemis',
     'debug': False,
     'decoder_lr': 0.0005,
     'dropout_rate': 0.1,
     'emo_grounding_dims': [9, 9],
     'encoder_lr': 0.0001,
     'fine_tune_data': None,
     'gpu': '1',
     'img_dim': 256,
     'img_dir': '---YOUR----TOP-DIR-WITH-WIKI-ART-OR-TO-BE-ANNOTATED-IMAGES',
     'lanczos': True,
     'log_dir': '----YOUR---DIR-WHERE-YOU-UNZIPED-THIS-DL-ZIPPED-FOLDER-ENDING-WITH-THE-DATE-STAMP',
     'lr_patience': 2,
     'max_train_epochs': 50,
     'num_workers': 1,
     'random_seed': 2021,
     'resume_path': None,
     'rnn_hidden_dim': 512,
     'save_each_epoch': False,
     'teacher_forcing_ratio': 1,
     'train_patience': 5,
     'use_emo_grounding': False,
     'use_timestamp': True,
     'vis_encoder': 'resnet34',
     'word_embedding_dim': 128}
    Using a vocabulary of size 14469
    Loading speaker model at epoch 7.
    Loaded 429431 utterances
    /home/ricardokleinlein/Desktop/captioning/ARTEMIS/PYTHON/lib/python3.7/site-packages/torchvision/transforms/transforms.py:288: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
      "Argument interpolation should be of type InterpolationMode instead of int. "
    Loaded 1 sampling configurations to try.
    Sampling with configuration:  {'sampling_rule': 'beam', 'temperature': 0.3, 'beam_size': 5, 'max_utterance_len': 30, 'drop_unk': True, 'drop_bigrams': True}
    Traceback (most recent call last):
      File "artemis/scripts/sample_speaker.py", line 86, in <module>
        captions_predicted, attn_weights = versatile_caption_sampler(speaker, annotate_loader, device, **config)
      File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/artemis/captioning/sample_captions.py", line 35, in versatile_caption_sampler
        drop_bigrams=drop_bigrams)
      File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/PYTHON/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
        return func(*args, **kwargs)
      File "/home/ricardokleinlein/Desktop/captioning/ARTEMIS/artemis/artemis/neural_models/attentive_decoder.py", line 593, in sample_captions_beam_search
        seqs = torch.cat([seqs[prev_word_inds], next_word_inds.unsqueeze(1)], dim=1)  # (s, step+1)
    IndexError: tensors used as indices must be long, byte or bool tensors
    
    opened by ricardokleinklein 5
  • Data zip is not available

    After agreeing to the terms and conditions, I was led to a link for a ZIP file, but I cannot access it: the download page is stuck on "redirecting...."

    opened by metaphorz 2
  • Can't find artemis_preprocessed.csv and rescaled_max_size_to_600px_same_aspect_ratio on repo?

    Hi, the notebook for the basic linguistic, emotion & art-oriented analysis of the ArtEmis dataset requires the data at the file paths below, but I can't seem to find these files anywhere in the repo. Can you advise here, please? Thanks.

    artemis_preprocessed_csv = '/home/optas/DATA/OUT/artemis/preprocessed_data/for_analysis/artemis_preprocessed.csv'
    wikiart_img_dir = '/home/optas/DATA/Images/Wiki-Art/rescaled_max_size_to_600px_same_aspect_ratio'
    
    opened by texturejc 2
  • RuntimeError encountered when training a text2emotion lstm classifier

    I'm trying to train an LSTM-based txt2emo classifier using the code provided in the notebook, but unfortunately I'm stuck on the following error and can't tell what's wrong.

    Traceback (most recent call last):
      File "/ibex/scratch/sunp/project/artemis/artemis/notebooks/deep_nets/emotions/train_text2emo.py", line 170, in <module>
        single_epoch_train(model, dataloaders['train'], args.use_imgs, criterion, optimizer, device)
      File "/ibex/scratch/sunp/project/artemis/artemis/neural_models/text_emotional_clf.py", line 55, in single_epoch_train
        loss.backward()
      File "/home/sunp/ibex-miniconda-install/ENTER/envs/artemislic/lib/python3.6/site-packages/torch/_tensor.py", line 307, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/home/sunp/ibex-miniconda-install/ENTER/envs/artemislic/lib/python3.6/site-packages/torch/autograd/__init__.py", line 156, in backward
        allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1024, 31, 256]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

    Is this caused by the Python version or the network architecture? It would be great if you could help check it. Thanks in advance :)
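
    For reference, the usual culprit behind this message (an assumption based on the ReluBackward0 hint, not a confirmed diagnosis) is an in-place operation on a tensor autograd still needs, e.g. nn.ReLU(inplace=True) or an in-place += in forward(). A minimal sketch of the standard workarounds:

    import torch
    import torch.nn as nn

    # Locate the offending op first (slow; for debugging only):
    torch.autograd.set_detect_anomaly(True)

    # Workaround 1: avoid in-place activations in the model definition.
    act = nn.ReLU(inplace=False)  # instead of nn.ReLU(inplace=True)

    # Workaround 2: replace in-place tensor updates with out-of-place ones:
    # x += residual   ->   x = x + residual
    # x.relu_()       ->   x = torch.relu(x)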

    opened by pengzhansun 1
  • How can I use a pretrained model to generate a prediction on an image?

    Thanks for making this code available. Apologies if I'm merely ignorant about PyTorch here, but I have a question about the pretrained models that are available for download. I'd like to use them to generate text for unseen images. To do this, I downloaded the SAT-Speaker-with-emotion-grounding (431 MB) model from the repo. However, I don't seem to be able to load it: when I download the model and run the script below, I get a dictionary, not the model.

    Loading the model:

    model_emo = torch_load_model('best_model.pt', map_location=torch.device('cpu'))

    Running the model:

    model_emo(image)

    The error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-12-e80ccbe8b6ed> in <module>
    ----> 1 model_emo(image)
    
    TypeError: 'dict' object is not callable
    

    Now, the PyTorch docs say that I should instantiate the model class and then load the checkpoint data. However, I don't know what model class this belongs to, and the README doesn't say. Do you have any advice on how to proceed with this issue? Thanks.
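
    For reference, the generic pattern the PyTorch docs describe looks like the sketch below; SpeakerModel and the 'model_state_dict' key are hypothetical placeholders, since the repo's actual model class isn't documented in the README:

    import torch

    # torch.load returns whatever was saved -- often a plain state_dict or a
    # dict with extra metadata, not a callable model.
    checkpoint = torch.load('best_model.pt', map_location=torch.device('cpu'))
    if isinstance(checkpoint, dict):
        print(checkpoint.keys())  # inspect what the checkpoint actually holds

    # Generic pattern: instantiate the model class first, then load the weights.
    # model = SpeakerModel(**saved_args)              # placeholder class
    # model.load_state_dict(checkpoint['model_state_dict'])
    # model.eval()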

    opened by texturejc 1
  • Trying to preprocess the artemis data

    C:\Users\safin\Miniconda3\envs\artemis\python.exe C:/Users/safin/artemis/artemis/scripts/preprocess_artemis_data.py
    {'automatic_spell_check': True,
     'group_gt_anno': True,
     'min_word_freq': 0,
     'n_train_examples': None,
     'preprocess_for_deep_nets': False,
     'random_seed': 2021,
     'raw_artemis_data_csv': './ola_dataset_release_v0.csv',
     'save_out_dir': './',
     'split_loads': [0.85, 0.05, 0.1],
     'too_high_repetition': -1,
     'too_long_utter_prc': 100,
     'too_short_len': 0}
    5000 annotations were loaded
    Using a 0.85,0.05,0.1 for train/val/test purposes
    SymSpell spell-checker loaded: True
    Loading glove word embeddings.
    Traceback (most recent call last):
      File "C:/Users/safin/artemis/artemis/scripts/preprocess_artemis_data.py", line 246, in <module>
        df, vocab, missed_tokens = preprocess(args)
      File "C:/Users/safin/artemis/artemis/scripts/preprocess_artemis_data.py", line 182, in preprocess
        missed_tokens = tokenize_and_spell(df, glove_file, freq_file, nltk.word_tokenize, spell_check=args.automatic_spell_check)
      File "c:\users\safin\artemis\artemis\language\basics.py", line 66, in tokenize_and_spell
        golden_vocabulary = load_glove_pretrained_embedding(glove_file, only_words=True, verbose=True)
      File "c:\users\safin\artemis\artemis\neural_models\word_embeddings.py", line 74, in load_glove_pretrained_embedding
        for line in f_in:
      File "C:\Users\safin\Miniconda3\envs\artemis\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 3515: character maps to <undefined>

    OS: Windows 10, Python 3.6. I have the ArtEmis dataset in the directory, ran everything with the default settings, and ran into this UnicodeDecodeError.
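
    A likely cause (an assumption based on the cp1252 frame in the traceback): on Windows, open() defaults to the locale encoding, while GloVe embedding files are UTF-8. A minimal sketch of the usual fix, applied to the file handle opened in load_glove_pretrained_embedding:

    # In artemis/neural_models/word_embeddings.py, pass an explicit encoding
    # instead of relying on the platform default (cp1252 on Windows):
    with open(glove_file, encoding='utf-8') as f_in:
        for line in f_in:
            ...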

    opened by hlsafin 1
  • Can't get a vocabulary with 14996 tokens, so I can't use pretrained models.

    I set "--preprocess-for-deep-nets True",but I just get a vocabulary with 14117 tokens,What should I do? {'automatic_spell_check': True, 'group_gt_anno': True, 'min_word_freq': 5, 'n_train_examples': None, 'preprocess_for_deep_nets': True, 'random_seed': 2021, 'raw_artemis_data_csv': 'D:/ArtEmis/artemis-master/DataSet/ArtEmis/artemis_official_data/official_data/artemis_dataset_release_v0.csv', 'save_out_dir': 'step1_processed_data', 'split_loads': [0.85, 0.05, 0.1], 'too_high_repetition': 41, 'too_long_utter_prc': 95, 'too_short_len': 5} 454684 annotations were loaded Using a 0.85,0.05,0.1 for train/val/test purposes SymSpell spell-checker loaded: True Loading glove word embeddings. Done. 400000 words loaded. Updating Glove vocabulary with valid ArtEmis words that are missing from it. 3057 annotations will be dropped as they contain less than 5 tokens Too-long token length at 95-percentile is 30.0. 22196 annotations will be dropped Using a vocabulary with 14117 tokens n-utterances kept: 429431 vocab size: 14117 tokens not in Glove/Manual vocabulary: 1148 Done. Check saved results in provided save-out-dir: step1_processed_data

    opened by LT156 0
  • IndexError:

    Traceback (most recent call last):
      File "scripts/sample_speaker.py", line 88, in <module>
        captions_predicted, attn_weights = versatile_caption_sampler(speaker, annotate_loader, device, **config)
      File "../artemis/captioning/sample_captions.py", line 35, in versatile_caption_sampler
        drop_bigrams=drop_bigrams)
      File "/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "../artemis/neural_models/attentive_decoder.py", line 586, in sample_captions_beam_search
        seqs = torch.cat([seqs[prev_word_inds], next_word_inds.unsqueeze(1)], dim=1)  # (s, step+1)
    IndexError: tensors used as indices must be long, byte or bool tensors

    When I run sample_speaker.py, I hit this error. Can you advise what I'm doing wrong here? Thanks!

    opened by feixiangqiqi 0
  • Where can I find the version of the WikiArt dataset used here?

    Hi,

    Thanks for releasing this code! I downloaded the WikiArt dataset from https://archive.org/details/wikiart-dataset, but a) it doesn't have the 600px resized subfolder, and b) while training I got an error: no such file 'Baroque/rembrandt_woman-standing-with-raised-hands.jpg'.

    Any tips for where I can download the version of the WikiArt dataset that's used here? It would be much appreciated.

    Thanks, Nish

    opened by NishantTharani 3
  • Fix Notebook?

    Can anyone help me put this into a streamlined notebook? I just want to input photos and output descriptions based on the pre-trained models. Here's my notebook: https://colab.research.google.com/drive/13IfMWEj1bEqCsyQK64qKnPyloB5lFfZ_?usp=sharing. I'm getting several errors. Thanks.

    An assertion error in step 2. If I use the official data release instead of the preprocessed data, I don't get this error. Maybe due to the 14468 vs 14469 discrepancy?

    # assert each image has at least 5 (human) votes!
    x = image_distibutions.apply(sum)
    assert all(x.values >= 5)

    A split error in step 3 (prepare data):

    data_loaders, datasets = image_emotion_distribution_df_to_pytorch_dataset(artemis_data, args)

    An attribute error with sample speaker:

    !python '/content/artemis/artemis/scripts/sample_speaker.py' \
    -speaker-saved-args '/content/config.json.txt' \
    -speaker-checkpoint '/content/best_model.pt' \
    -img-dir '/content/Input' \
    -out-file '/content/Output' \
    --custom-data-csv '/content/artemis_preprocessed.csv'

    And an interpolation warning from the nearest-neighbor "Extract features" step:

    device = torch.device("cuda:" + gpu_id)
    train_feats = extract_visual_features(train_images, img_dim, method=method, device=device)
    test_feats = extract_visual_features(test_images, img_dim, method=method, device=device)

    UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
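
    The warning itself points at the fix (a sketch; the exact transform used in the notebook is an assumption):

    from torchvision import transforms
    from torchvision.transforms import InterpolationMode

    img_dim = 256  # per the speaker config above

    # Pass the enum instead of a raw int to silence the UserWarning:
    resize = transforms.Resize((img_dim, img_dim), interpolation=InterpolationMode.LANCZOS)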

    opened by RED3480 1
  • 'DataFrame' object has no attribute 'tokens' in analyzing_artemis task

    I tried running the analyzing_artemis notebook to analyze the first examples, but something goes wrong. When I do this I get:

    File "Downloads/code/artemis_code/analyzing.py", line 28, in <module>
      df.tokens = df.tokens.apply(literal_eval)  # to make them a python list.
    File "artemis/lib/python3.8/site-packages/pandas/core/generic.py", line 5465, in __getattr__
      return object.__getattribute__(self, name)
    AttributeError: 'DataFrame' object has no attribute 'tokens'

    I think maybe something is wrong with the df loaded from artemis_preprocessed_csv, but I am not sure.
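
    A quick hedged check (assuming the loaded CSV simply lacks a 'tokens' column, e.g. because it is the raw release rather than the preprocessed file; the path below is a placeholder):

    import pandas as pd
    from ast import literal_eval

    artemis_preprocessed_csv = 'artemis_preprocessed.csv'  # path is an assumption
    df = pd.read_csv(artemis_preprocessed_csv)
    print(df.columns.tolist())  # 'tokens' should appear; if not, re-run preprocessing

    if 'tokens' in df.columns:
        df['tokens'] = df['tokens'].apply(literal_eval)  # stringified lists -> python lists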

    opened by lr19960813 1
  • Can't seem to get sample_speaker.py to generate text for new images

    I wish to generate caption text for images that I'll be providing, and my understanding is that sample_speaker.py will do this. However, when I run it I get an error. Here's what I run in the terminal, with the relevant parts of config.json.txt changed.

    python sample_speaker.py -speaker-saved-args config.json.txt -speaker-checkpoint best_model.pt -img-dir image_folder -out-file /Outputs/results.pkl

    When I do this, I get:

    RuntimeError: Error(s) in loading state_dict for ModuleDict:
    	size mismatch for decoder.word_embedding.weight: copying a param with shape torch.Size([14469, 128]) from checkpoint, the shape in current model is torch.Size([35466, 128]).
    	size mismatch for decoder.next_word.weight: copying a param with shape torch.Size([14469, 512]) from checkpoint, the shape in current model is torch.Size([35466, 512]).
    	size mismatch for decoder.next_word.bias: copying a param with shape torch.Size([14469]) from checkpoint, the shape in current model is torch.Size([35466]).
    
    

    Can you advise what I'm doing wrong here? I can't quite get to the bottom of it. Thanks!
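
    For context, a hedged reading of the error (an interpretation, not a confirmed diagnosis): the checkpoint was trained with a 14469-word vocabulary, while the script built a 35466-word vocabulary locally, so the embedding and output layers cannot accept the saved weights. A sketch of a sanity check, with key names taken from the error message:

    import torch

    # Inspect the checkpoint's embedding shape; expect torch.Size([14469, 128]).
    state = torch.load('best_model.pt', map_location='cpu')
    # If the checkpoint nests a state_dict, unwrap it (the key name is a guess):
    sd = state.get('model_state_dict', state) if isinstance(state, dict) else state
    print(sd['decoder.word_embedding.weight'].shape)
    # If the vocabulary your config builds isn't 14469 words, point
    # -speaker-saved-args at the config/vocabulary shipped with this checkpoint.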

    opened by texturejc 8