SCAPT-ABSA
Code for the EMNLP 2021 paper: "Learning Implicit Sentiment in Aspect-based Sentiment Analysis with Supervised Contrastive Pre-Training"
Overview
In this repository, we provide code for Supervised ContrAstive Pre-Training (SCAPT) and aspect-aware fine-tuning, the retrieved sentiment corpora from YELP/Amazon reviews, and the SemEval2014 Restaurant/Laptop datasets with additional `implicit_sentiment` labeling.
SCAPT aims to tackle implicit sentiment expression in aspect-based sentiment analysis (ABSA). In our work, we define implicit sentiment as a sentiment expression that contains no polarity markers but still conveys clear, human-aware sentiment polarity.
As an example of explicit versus implicit sentiment in ABSA, consider the annotated sentence shown in the fine-tuning section below: in "The food is uniformly exceptional, with a very capable kitchen ...", the aspects food and kitchen carry explicit sentiment signalled by the opinion words "exceptional" and "capable", while the aspect menu is annotated as an implicit sentiment expression with no opinion word attached to it.
SCAPT
SCAPT produces aligned representations of sentiment expressions that share the same sentiment label. It consists of three objectives (a minimal sketch of the contrastive term is given after the list):
- Supervised Contrastive Learning (SCL)
- Review Reconstruction (RR)
- Masked Aspect Prediction (MAP)
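The exact pre-training code lives in this repository; purely as an illustration of the SCL term, here is a minimal PyTorch sketch of a supervised contrastive loss over a batch of review representations. The function name, tensor shapes, and temperature value are assumptions made for the example, not the repository's actual implementation or hyper-parameters.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Pull together representations that share a sentiment label and push
    apart the rest (simplified sketch, not the repository's exact code)."""
    features = F.normalize(features, dim=-1)             # (batch, dim)
    sim = features @ features.t() / temperature          # pairwise similarities
    batch = labels.size(0)
    self_mask = torch.eye(batch, dtype=torch.bool, device=features.device)
    sim = sim.masked_fill(self_mask, float('-inf'))      # ignore self-pairs
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # average log-likelihood of same-label pairs for each anchor
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    per_anchor = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    return per_anchor[pos_mask.any(dim=1)].mean()        # skip anchors without positives
```

During pre-training, this contrastive term is combined with the review reconstruction and masked aspect prediction objectives.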
Aspect-aware Fine-tuning
In aspect-aware fine-tuning, both the sentiment representation and the aspect-based representation are taken into account for sentiment prediction.
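As a rough sketch only, the idea can be pictured as a classification head that looks at both representations; the module name, the concatenation-based fusion, and the hidden size below are illustrative assumptions rather than the exact architecture used in this repository.

```python
import torch
import torch.nn as nn

class AspectAwarePolarityHead(nn.Module):
    """Illustrative head: fuses a review-level sentiment representation with
    an aspect representation before predicting the polarity of that aspect."""

    def __init__(self, hidden_size=768, num_polarities=3):
        super().__init__()
        self.classifier = nn.Linear(2 * hidden_size, num_polarities)

    def forward(self, sentiment_repr, aspect_repr):
        # sentiment_repr: (batch, hidden) summary vector of the whole review
        # aspect_repr:    (batch, hidden) pooled states of the aspect-term tokens
        fused = torch.cat([sentiment_repr, aspect_repr], dim=-1)
        return self.classifier(fused)                    # (batch, num_polarities)
```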
Requirements
- cuda 11.0
- python 3.7.9
- lxml 4.6.2
- numpy 1.19.2
- pytorch 1.8.0
- pyyaml 5.3.1
- tqdm 4.55.0
- transformers 4.2.2
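Assuming a standard pip workflow (the exact install commands are not part of this repository's instructions), the Python dependencies above can be installed with something like `pip install lxml==4.6.2 numpy==1.19.2 pyyaml==5.3.1 tqdm==4.55.0 transformers==4.2.2`, plus a CUDA 11.0 build of PyTorch 1.8.0 installed from the official PyTorch channels.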
Data Preparation & Preprocessing
For Pre-training
The retrieved sentiment corpora contain millions of reviews, so we provide download links for both the original corpora and the preprocessed data. Download them if you want to run pre-training or use the corpora further:
File | Google Drive Link | Baidu Wangpan Link | Baidu Wangpan Code |
---|---|---|---|
scapt_yelp_json.zip | link | link | q7fs |
scapt_amazon_json.zip | link | link | i1da |
scapt_yelp_pkl.zip | link | link | j9ce |
scapt_amazon_pkl.zip | link | link | 3b8t |
These pickle files can also be generated from the JSON files with the preprocessing script:

`python preprocess.py --pretrain`
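If you want to sanity-check a downloaded or freshly generated dump, a plain `pickle.load` is enough. The path below is taken from the `data` layout shown further down, and the snippet makes no assumption about the internal structure of the dump:

```python
import pickle

# Adjust the path to wherever you placed the preprocessed dump.
with open('data/YELP/yelp_restaurants_preprocess_pretrain.pkl', 'rb') as f:
    data = pickle.load(f)

print(type(data))
print(len(data) if hasattr(data, '__len__') else 'object has no length')
```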
For Fine-tuning
We have already merged the opinion term labeling into the original SemEval2014 datasets. For example:
<sentence id="1634">
    <text>The food is uniformly exceptional, with a very capable kitchen which will proudly whip up whatever you feel like eating, whether it's on the menu or not.</text>
    <aspectTerms>
        <aspectTerm term="food" polarity="positive" from="4" to="8" implicit_sentiment="False" opinion_words="exceptional"/>
        <aspectTerm term="kitchen" polarity="positive" from="55" to="62" implicit_sentiment="False" opinion_words="capable"/>
        <aspectTerm term="menu" polarity="neutral" from="141" to="145" implicit_sentiment="True"/>
    </aspectTerms>
    <aspectCategories>
        <aspectCategory category="food" polarity="positive"/>
    </aspectCategories>
</sentence>
`implicit_sentiment` indicates whether an aspect is an implicit sentiment expression; when the sentiment is explicit, the annotation also provides the corresponding `opinion_words`. The `opinion_words` labeling is credited to TOWE.
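Since `lxml` is already listed in the requirements, here is a hedged sketch of how one might read these annotations; the file path matches the `data` layout below and the element/attribute names follow the example above, but this is not a script shipped with the repository:

```python
from lxml import etree

# Parse the implicit-sentiment-labeled SemEval2014 restaurant training set.
tree = etree.parse('data/restaurants/Restaurants_Train_v2_Implicit_Labeled.xml')

for sentence in tree.iterfind('.//sentence'):
    text = sentence.findtext('text')
    for term in sentence.iterfind('.//aspectTerm'):
        implicit = term.get('implicit_sentiment') == 'True'
        print(term.get('term'),
              term.get('polarity'),
              'implicit' if implicit else term.get('opinion_words'))
```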
Both the original and the extended fine-tuning data, together with their preprocessed dumps, are included in this repository.
Consequently, the structure of your `data` directory should be:
├── Amazon
│ ├── amazon_laptops.json
│ └── amazon_laptops_preprocess_pretrain.pkl
├── laptops
│ ├── Laptops_Test_Gold_Implicit_Labeled_preprocess_finetune.pkl
│ ├── Laptops_Test_Gold_Implicit_Labeled.xml
│ ├── Laptops_Test_Gold.xml
│ ├── Laptops_Train_v2_Implicit_Labeled_preprocess_finetune.pkl
│ ├── Laptops_Train_v2_Implicit_Labeled.xml
│ └── Laptops_Train_v2.xml
├── MAMS
│ ├── test_preprocess_finetune.pkl
│ ├── test.xml
│ ├── train_preprocess_finetune.pkl
│ ├── train.xml
│ ├── val_preprocess_finetune.pkl
│ └── val.xml
├── restaurants
│ ├── Restaurants_Test_Gold_Implicit_Labeled_preprocess_finetune.pkl
│ ├── Restaurants_Test_Gold_Implicit_Labeled.xml
│ ├── Restaurants_Test_Gold.xml
│ ├── Restaurants_Train_v2_Implicit_Labeled_preprocess_finetune.pkl
│ ├── Restaurants_Train_v2_Implicit_Labeled.xml
│ └── Restaurants_Train_v2.xml
└── YELP
├── yelp_restaurants.json
└── yelp_restaurants_preprocess_pretrain.pkl
Pre-training
The pre-training is conducted on multiple GPUs.
- Pre-train [TransEnc|BERT] on [YELP|Amazon]:

  `python -m torch.distributed.launch --nproc_per_node=${THE_CARD_NUM_YOU_HAVE} multi_card_train.py --config config/[yelp|amazon]_[TransEnc|BERT]_pretrain.yml`
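For example, assuming 4 GPUs are available, pre-training BERT on the YELP corpus would be `python -m torch.distributed.launch --nproc_per_node=4 multi_card_train.py --config config/yelp_BERT_pretrain.yml`.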
Model checkpoints are saved in `results`.
Fine-tuning
- Directly train [TransEnc|BERT] on [Restaurants|Laptops|MAMS] as [TransEncAsp|BERTAsp]:

  `python train.py --config config/[restaurants|laptops|mams]_[TransEnc|BERT]_finetune.yml`

- Fine-tune the pre-trained [TransEnc|BERT] on [Restaurants|Laptops|MAMS] as [TransEncAsp+SCAPT|BERTAsp+SCAPT]:

  `python train.py --config config/[restaurants|laptops|mams]_[TransEnc|BERT]_finetune.yml --checkpoint PATH/TO/MODEL_CHECKPOINT`
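For example, fine-tuning a pre-trained BERT on the Restaurants dataset (i.e. BERTAsp+SCAPT) would be `python train.py --config config/restaurants_BERT_finetune.yml --checkpoint PATH/TO/MODEL_CHECKPOINT`, where the checkpoint path points to a pre-trained model saved under `results`.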
Model checkpoints are saved in `results`.
Evaluation
- Evaluate a [TransEnc|BERT]-based model on the [Restaurants|Laptops|MAMS] dataset:

  `python evaluate.py --config config/[restaurants|laptops|mams]_[TransEnc|BERT]_finetune.yml --checkpoint PATH/TO/MODEL_CHECKPOINT`
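For example, evaluating a BERT-based checkpoint on MAMS would be `python evaluate.py --config config/mams_BERT_finetune.yml --checkpoint PATH/TO/MODEL_CHECKPOINT`.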
Our trained model parameters are available for download:
Model | Dataset | File | Google Drive Link | Baidu Wangpan Link | Baidu Wangpan Code |
---|---|---|---|---|---|
TransEncAsp+SCAPT | SemEval2014 Restaurant | TransEnc_restaurants.zip | link | link | 5e5c |
TransEncAsp+SCAPT | SemEval2014 Laptop | TransEnc_laptops.zip | link | link | 8amq |
TransEncAsp+SCAPT | MAMS | TransEnc_MAMS.zip | link | link | bf2x |
BERTAsp+SCAPT | SemEval2014 Restaurant | BERT_restaurants.zip | link | link | 1w2e |
BERTAsp+SCAPT | SemEval2014 Laptop | BERT_laptops.zip | link | link | zhte |
BERTAsp+SCAPT | MAMS | BERT_MAMS.zip | link | link | 1iva |
Citation
If you find this repository useful, please cite our paper:
@inproceedings{li-etal-2021-learning-implicit,
title = "Learning Implicit Sentiment in Aspect-based Sentiment Analysis with Supervised Contrastive Pre-Training",
author = "Li, Zhengyan and
Zou, Yicheng and
Zhang, Chong and
Zhang, Qi and
Wei, Zhongyu",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2021",
address = "Online and Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.emnlp-main.22",
pages = "246--256",
abstract = "Aspect-based sentiment analysis aims to identify the sentiment polarity of a specific aspect in product reviews. We notice that about 30{\%} of reviews do not contain obvious opinion words, but still convey clear human-aware sentiment orientation, which is known as implicit sentiment. However, recent neural network-based approaches paid little attention to implicit sentiment entailed in the reviews. To overcome this issue, we adopt Supervised Contrastive Pre-training on large-scale sentiment-annotated corpora retrieved from in-domain language resources. By aligning the representation of implicit sentiment expressions to those with the same sentiment label, the pre-training process leads to better capture of both implicit and explicit sentiment orientation towards aspects in reviews. Experimental results show that our method achieves state-of-the-art performance on SemEval2014 benchmarks, and comprehensive analysis validates its effectiveness on learning implicit sentiment.",
}