CoSMo.pytorch

Official Implementation of CoSMo: Content-Style Modulation for Image Retrieval with Text Feedback, by Seungmin Lee*, Dongwan Kim*, and Bohyung Han (* denotes equal contribution).

Presented at CVPR 2021

Paper | Poster | 5 min Video


βš™οΈ Setup

Python 3.7

πŸ“¦ Install required packages

Install torch and torchvision with the following command (CUDA 10):

pip install torch==1.2.0 torchvision==0.4.0 -f https://download.pytorch.org/whl/torch_stable.html

Install other packages

pip install -r requirements.txt

πŸ“‚ Dataset

Download the FashionIQ dataset by following the instructions on this link.

We have set the default path for FashionIQ datasets in data/fashionIQ.py as _DEFAULT_FASHION_IQ_DATASET_ROOT = '/data/image_retrieval/fashionIQ'. You can change this path to wherever you plan on storing the dataset.
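For example, to point the code at a custom location, change that constant (the path below is illustrative):

# data/fashionIQ.py
_DEFAULT_FASHION_IQ_DATASET_ROOT = '/path/to/your/fashionIQ'  # your local dataset root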

πŸ“š Vocabulary file

Open a Python console and run the following lines to download the NLTK punkt tokenizer:

import nltk
nltk.download('punkt')

Then, open up a Jupyter notebook and run jupyter_files/how_to_create_fashion_iq_vocab.ipynb. As with the dataset, the default path is set in data/fashionIQ.py.

We have provided a vocab file in jupyter_files/fashion_iq_vocab.pkl.
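To sanity-check the provided vocab file, here is a minimal sketch (assuming the pickle simply stores the vocabulary object built by the notebook; if it stores a custom class, run this from the repo root so the class can be imported):

import pickle

with open('jupyter_files/fashion_iq_vocab.pkl', 'rb') as f:
    vocab = pickle.load(f)  # vocabulary object created by the notebook

print(type(vocab))  # inspect what was stored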

πŸ“ˆ Weights & Biases

We use Weights and Biases to log our experiments.

If you already have a Weights & Biases account, head over to configs/FashionIQ_trans_g2_res50_config.json and fill in your wandb_account_name. You can also change the default in options/command_line.py.
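For reference, the relevant field in the config file looks like this (the value is a placeholder for your own account name):

"wandb_account_name": "your_wandb_username"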

If you do not have a Weights & Biases account, you can either create one or change the code and logging functions to your liking.

πŸƒβ€β™‚οΈ Run

You can run the code with the following command:

python main.py --config_path=configs/FashionIQ_trans_g2_res50_config.json --experiment_description=test_cosmo_fashionIQDress --device_idx=0,1,2,3

Note that you do not need to assign --device_idx if you have already specified CUDA_VISIBLE_DEVICES=0,1,2,3 in your terminal.
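In other words, the following shell invocation is equivalent to passing --device_idx:

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config_path=configs/FashionIQ_trans_g2_res50_config.json --experiment_description=test_cosmo_fashionIQDress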

We run on four 12GB GPUs, and the main GPU (gpu:0) uses around 4GB of VRAM.

⚠️ Notes on Evaluation

In our paper, we mentioned that we use a slightly different evaluation method than the original FashionIQ dataset. This was done to match the evaluation method used by VAL.

By default, this code uses the proper evaluation method (as intended by the creators of the dataset). The results for this are shown in our supplementary material. If you'd like to use the same evaluation method as our main paper (and VAL), head over to data/fashionIQ.py and uncomment the commented section.

πŸ“œ Citation

If you use our code, please cite our work:

@InProceedings{CoSMo2021_CVPR,
    author    = {Lee, Seungmin and Kim, Dongwan and Han, Bohyung},
    title     = {CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {802-812}
}
Comments
  • Separately trained on FashionIQ subsets?

    Hi, @numpee . Another question please.

    I wonder whether you trained three models on the FashionIQ dress/toptee/shirt subsets separately.

    In other words, are the results shown in Table 1 from one model or three models?

    opened by BrandonHanx 9
  • Doubts about the results of TIRG in the paper

    Hi @numpee ,

    For the results of TIRG on FashionIQ mentioned in the main paper's Table 1 and the supplementary material, did you run several experiments and get similar results?

    I understand you just copied the results reported in VAL. However, I believe these results are wrong.

    According to my experiments, TIRG (with ResNet-50 and Bi-GRU; no GloVe or BERT embeddings) achieves the following performance on the original split:

    | Shirt R@10 | Shirt R@50 | Dress R@10 | Dress R@50 | Toptee R@10 | Toptee R@50 |
    | ---------- | ---------- | ---------- | ---------- | ----------- | ----------- |
    | 18.50      | 43.03      | 21.81      | 46.26      | 24.02       | 51.10       |

    This performance is much better than both VAL and my reproduced CoSMo, and is very close to the reported CoSMo numbers.

    Also, this paper reaches the same conclusion as mine (although our settings differ, the comparison between VAL and TIRG is fair; see their Table 1 for details): TIRG is much better than VAL, and the results reported in VAL are wrong.

    If our observations are correct, how can the performance benefit of CoSMo be demonstrated (although I totally agree with CoSMo's insight)?

    Please point it out if I am wrong. Thanks in advance.

    opened by BrandonHanx 6
  • Different results from provided notebook.

    Hi, @postBG . Thanks for your great work.

    I am preparing the vocab according to the instructions in the README. However, I get different outputs from jupyter_files/how_to_create_fashion_iq_vocab.ipynb.

    My third code block's result is:

    is solid black with no sleeves and is black with straps
    B005X4PL1G
    

    And the result of the sixth block is: 2957

    I guess you intended to load the test split to build the vocab, but the val split is loaded instead. I am not sure, though. Please give me some hints.

    opened by BrandonHanx 4
  • LSTM hidden size

    In text_encoders/lstm.py, the LSTM hidden size is defined as: lstm_hidden_size = kwargs.get('lstm_hidden_size', 512)

    But in text_encoders/__init__.py, the desired LSTM hidden size (from the config) is not used.

    Can you look into this? Thanks.
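    A minimal sketch of the kind of fix being asked for, assuming the encoder factory in text_encoders/__init__.py receives the parsed config as a dict (the function and class names below are hypothetical, not the repo's actual code):

    # text_encoders/__init__.py (hypothetical sketch)
    def build_text_encoder(config, **kwargs):
        # forward the configured value so lstm.py's kwargs.get('lstm_hidden_size', 512) sees it
        kwargs['lstm_hidden_size'] = config.get('lstm_hidden_size', 512)
        return LSTMEncoder(**kwargs)  # LSTMEncoder stands in for the repo's LSTM class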

    opened by haoyudong-97 3
  • About the other two datasets, Shoes and Fashion200K

    Hi, @numpee. Could you please release the data-processing code for the other two datasets? I would like to know how to train the model on Shoes and Fashion200K. Thank you.

    opened by Wangld5 3
  • About the result

    I tested the model on the dress dataset and got the result above. The results do not match the paper, even though I ran it with the command you provided. Do you know why?

    opened by Wangld5 2
  • About FashionIQ

    Hi, I am very interested in this task. However, when I tried to download the dress data of the FashionIQ dataset, I found that about 905 image URLs are missing; I ended up with 18182 dress images. I would like to know whether you have a complete dataset and how many images it contains. Also, could you send FashionIQ to my Gmail: [email protected]? Thanks.

    opened by Wangld5 2
  • Actual differences between FashionIQ evaluation method and VAL evaluation method

    Hi! The README file points out that the evaluation method reported in the paper is slightly different from the evaluation method of the original FashionIQ dataset (in order to match the method used by VAL).

    However, I can't quite figure out what the actual differences between the two methods are. Can someone explain them to me in detail? Thanks, Alberto

    opened by ABaldrati 2
  • Concatenate two captions?

    Hi, @numpee

    I also found a slightly different experimental setting between CoSMo and other methods.

    As shown in the official FashionIQ evaluation codebase, the two captions of each triplet are concatenated into one sentence.

    Following this setting, many other methods concatenate the two captions, while VAL doesn't.

    I guess you don't concatenate the two captions in order to make a fair comparison with VAL.

    However, I guess you might get higher performance by following this setting.
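    For reference, a minimal sketch of what caption concatenation looks like (the captions are taken from the notebook output above; the exact joining string used by the official evaluation code may differ):

    # two relative captions from one FashionIQ triplet (illustrative values)
    captions = ['is solid black with no sleeves', 'is black with straps']
    concatenated = ' and '.join(captions)  # one possible joining scheme
    print(concatenated)  # is solid black with no sleeves and is black with straps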

    opened by BrandonHanx 2
  • Why are images in PNG format?

    https://github.com/postBG/CosMo.pytorch/blob/768b7b8bdb2b8b1812ca04e7de852cd1803ca929/data/fashionIQ.py#L24-L25

    Hi, @postBG . Thanks for your great work.

    I guess the original images in the FashionIQ dataset are in JPG format, while your code reads images as PNG, which leads to an error.

    Do you have any pre-processing?
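    If the loader indeed expects PNG files, one possible workaround is to convert the downloaded JPGs once with Pillow (a sketch, not the authors' pre-processing; the path is illustrative):

    from PIL import Image
    import glob, os

    for jpg_path in glob.glob('/path/to/fashionIQ/images/*.jpg'):
        png_path = os.path.splitext(jpg_path)[0] + '.png'
        Image.open(jpg_path).convert('RGB').save(png_path)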

    opened by BrandonHanx 2
  • Hyperparameter settings

    Hi, I ran the example code (FashionIQ dataset) with the default configuration provided in this project, but couldn't achieve the results reported in the paper. I wonder if there is a problem with my hyperparameter settings (I adjusted random seeds, learning rate, etc.); the best I can achieve is an R@50 of 44% on the toptee subset (the paper reports about 57%). If I want to reproduce the experimental results in the paper, could you please give me some suggestions for my experiments?

    opened by 1124146862 1
  • Fashion200k result

    Hello, I have some misunderstandings with the fashion200k dataset.

    My reproduced result is 7-8% lower than the reported one on Fashion200k, so I have several questions I hope you can answer.

    1. Does the gallery (database) contain all the images from data/labels/xxx_test_detect_all.txt?
    2. Are the query images from data/test_queries.txt?
    3. In your paper and GitHub page, you said that the modifier should be "Change A to B". However, we found that it is actually "Replace A with B" in the code link you provided. Does this have a negative impact?
    4. The target image is not unique when testing Fashion200k (which is different from Shoes and FashionIQ). Is it correct to write the Fashion200k dataset code following the FashionIQ code? Are additional modifications needed?
    5. Could you please release the code for these three datasets?

    These questions have been sent to you by email. I'm afraid you may be too busy to notice my email, so I raised this issue. I hope you can reply to it at your convenience. Thanks a lot.

    opened by clickmouse 3