Human Attention for Text Classification
Re-implementation of the paper "Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words?" (ACL 2020).
Install requirements
$ poetry install
Download and Split Yelp dataset
Download from Yelp.com
Split the dataset
- The Yelp dataset is large, so it is split into subsets in advance.
- The split writes tng.jsonl, val.jsonl, and tst.jsonl to the data directory.
$ allennlp split-dataset \
--input-file data/yelp_academic_dataset_review.json \
--output-dir data/ \
--tng-ratio 0.8 \
--val-ratio 0.1 \
--tst_ratio 0.1
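The command above relies on a custom split-dataset subcommand whose implementation is not shown here. As a rough sketch of what such a split could look like, the Python below streams a JSON Lines file and assigns each record to a subset at random. The function name split_jsonl, its parameters, and the seed are assumptions made for illustration; they are not taken from the repository. Because the assignment is per line, the resulting ratios are approximate rather than exact.

```python
import json
import random
from pathlib import Path


def split_jsonl(input_file, output_dir, tng_ratio=0.8, val_ratio=0.1, seed=0):
    """Stream a JSON Lines file into tng/val/tst subsets (hypothetical helper).

    Each line is assigned independently, so the file is never held in
    memory; the remaining probability mass (1 - tng - val) goes to tst.
    """
    rng = random.Random(seed)
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    counts = {"tng": 0, "val": 0, "tst": 0}

    with open(input_file, encoding="utf-8") as src, \
            open(out / "tng.jsonl", "w", encoding="utf-8") as tng, \
            open(out / "val.jsonl", "w", encoding="utf-8") as val, \
            open(out / "tst.jsonl", "w", encoding="utf-8") as tst:
        sinks = {"tng": tng, "val": val, "tst": tst}
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            json.loads(line)  # fail early on a malformed record
            r = rng.random()
            if r < tng_ratio:
                name = "tng"
            elif r < tng_ratio + val_ratio:
                name = "val"
            else:
                name = "tst"
            sinks[name].write(line if line.endswith("\n") else line + "\n")
            counts[name] += 1

    return counts
```

Calling split_jsonl("data/yelp_academic_dataset_review.json", "data/") would mirror the paths used in the command above and return the number of records written to each subset.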
Preprocess HAM dataset
$ allennlp preprocess-ham-dataset \
--ham-dataset-dir data/ham-dataset/raw_data/ \
--output-dir data/
Train RNN model
$ CUDA_VISIBLE_DEVICES=0 allennlp train config/base.jsonnet -s outputs -o '{"trainer": {"cuda_device": 0}}'
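The -o flag passes a JSON structure that overrides values in the configuration file, so only trainer.cuda_device changes and the rest of the trainer section is kept. The sketch below illustrates that kind of nested merge in plain Python. It is not AllenNLP's implementation, apply_overrides is a hypothetical helper, and the base dictionary is an invented stand-in for whatever config/base.jsonnet actually contains.

```python
import json


def apply_overrides(config, overrides):
    """Recursively merge `overrides` into a copy of `config` (illustrative only)."""
    merged = dict(config)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections so sibling keys are preserved.
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical stand-in for the trainer section of config/base.jsonnet.
base = {"trainer": {"num_epochs": 10, "cuda_device": -1}}

# The same JSON string that is passed to -o in the command above.
overrides = json.loads('{"trainer": {"cuda_device": 0}}')

merged = apply_overrides(base, overrides)
print(merged)  # {'trainer': {'num_epochs': 10, 'cuda_device': 0}}
```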
Reference
- Sen, Cansu, et al. "Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words?" Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020.