Which Apple Keeps Which Doctor Away? Colorful Word Representations with Visual Oracles

Overview

AppleLM is the implementation of "Which Apple Keeps Which Doctor Away? Colorful Word Representations with Visual Oracles" (IEEE/ACM TASLP 2022).

Setup

This implementation is based on HuggingFace Transformers.

Preparation

  1. Download GLUE datasets

    The datasets can be downloaded automatically with the script from https://github.com/nyu-mll/GLUE-baselines:

    git clone https://github.com/nyu-mll/GLUE-baselines.git
    python download_glue_data.py --data_dir glue_data --tasks all
    

    It is recommended to put the glue_data folder under data/. The directory structure looks like:

    AppleLM
    └───data
    │   └───glue_data
    │       │   CoLA/
    │       │   MRPC/
    │       │   ...
    
  2. Visual Features

    Pre-extracted visual features (borrowed from the Multi30K repo) can be downloaded from Google Drive.

    The features are loaded into the image embedding layer and looked up by index. Extract train-resnet50-avgpool.npy and put it in the data/ folder; a minimal loading sketch follows below.
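
    A minimal loading sketch, under the assumption that the .npy file holds one ResNet-50 average-pooled vector per Multi30K training caption, indexed by row:

    import numpy as np
    import torch
    import torch.nn as nn

    # Assumed layout: one 2048-d ResNet-50 avg-pool vector per Multi30K training caption.
    features = np.load("data/train-resnet50-avgpool.npy")        # shape: (num_images, 2048)

    # Wrap the pre-extracted features as a frozen lookup table so the model
    # can fetch the visual vector of an image by its row id.
    image_embedding = nn.Embedding.from_pretrained(
        torch.from_numpy(features).float(), freeze=True
    )

    img_vecs = image_embedding(torch.tensor([0, 1, 2]))          # shape: (3, 2048)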

Training & Evaluation

export GLUE_DIR=data/glue_data/
export CUDA_VISIBLE_DEVICES="0"
export TASK_NAME=CoLA
python ./examples/run_glue_visual-tfidf_att.py \
    --model_type bert \
    --model_name_or_path bert-large-uncased-whole-word-masking \
    --task_name $TASK_NAME \
    --do_train \
    --do_eval \
    --do_lower_case \
    --data_dir $GLUE_DIR/$TASK_NAME \
    --max_seq_length 128 \
    --per_gpu_eval_batch_size=32   \
    --per_gpu_train_batch_size=16   \
    --learning_rate 1e-5 \
    --eval_all_checkpoints \
    --save_steps 500 \
    --max_steps 5336 \
    --warmup_steps 320 \
    --image_dir data/train.lc.norm.tok.en \
    --image_embedding_file data/train-resnet50-avgpool.npy \
    --num_img 3 \
    --tfidf 5 \
    --image_merge att-gate \
    --stopwords_dir data/stopwords-en.txt \
    --output_dir experiments/CoLA_bert_wwm
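
The word-image pairing itself is implemented inside run_glue_visual-tfidf_att.py. As a rough, hypothetical illustration of what --tfidf and --stopwords_dir control (the function and variable names below are illustrative, not the script's API), the top TF-IDF-scored content words of a sentence could be selected like this before images are attached:

from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative only: pick the k highest-scoring TF-IDF words per sentence after
# filtering stopwords; the real selection logic lives in run_glue_visual-tfidf_att.py.
def top_tfidf_words(sentences, stopwords, k=5):
    vectorizer = TfidfVectorizer(stop_words=sorted(stopwords))
    scores = vectorizer.fit_transform(sentences)
    vocab = vectorizer.get_feature_names_out()
    picked = []
    for row in scores:                                  # one sparse row per sentence
        dense = row.toarray().ravel()
        top = dense.argsort()[::-1][:k]
        picked.append([vocab[i] for i in top if dense[i] > 0])
    return picked

stopwords = set(open("data/stopwords-en.txt").read().split())
print(top_tfidf_words(["an apple a day keeps the doctor away"], stopwords, k=5))

In the actual run, each selected word is then associated with up to --num_img image vectors and merged into the text representation according to --image_merge (att-gate here).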

Reference

Please cite this paper in your publications if it helps your research:

@ARTICLE{zhang2022which,
  author={Zhang, Zhuosheng and Yu, Haojie and Zhao, Hai and Utiyama, Masao},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
  title={Which Apple Keeps Which Doctor Away? Colorful Word Representations With Visual Oracles}, 
  year={2022},
  volume={30},
  number={},
  pages={49-59},
  doi={10.1109/TASLP.2021.3130972}
}