Code and dataset for ACL2018 paper "Exploiting Document Knowledge for Aspect-level Sentiment Classification"

Ruidan He

Last update: Nov 29, 2022

Overview

Aspect-level Sentiment Classification

Code and dataset for the ACL 2018 [paper] "Exploiting Document Knowledge for Aspect-level Sentiment Classification".

Data

The preprocessed aspect-level datasets can be downloaded at [Download], and the document-level datasets can be downloaded at [Download]. The zip files should be decompressed and put in the main folder.

The pre-trained GloVe vectors (trained on 840B tokens) are used to initialise word embeddings. You can download the extracted subset of GloVe vectors for each dataset at [Download]; it is much smaller than the full file. The zip file should be decompressed and put in the main folder.
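As an illustration of how such a subset file might be used, the sketch below reads GloVe-format text lines (a token followed by its vector components) and builds a lookup table for a vocabulary. The function name load_glove_subset, the random fallback range for words missing from the file, and the toy 3-dimensional vectors are assumptions made for this example, not the repository's actual loading code.

```python
import io
import random


def load_glove_subset(lines, vocab, dim, seed=0):
    """Build a word -> vector table for `vocab` from GloVe-format lines.

    Each line is expected to hold a token followed by `dim` floats,
    separated by single spaces. Words not found in the file receive
    small random vectors (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    wanted = set(vocab)
    found = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines or tokens containing spaces
        word = parts[0]
        if word in wanted:
            found[word] = [float(x) for x in parts[1:]]

    table = {}
    for word in vocab:
        if word in found:
            table[word] = found[word]
        else:
            table[word] = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
    return table, len(found)


# Toy input standing in for a decompressed GloVe subset file.
sample = io.StringIO(u"food 0.1 0.2 0.3\nservice 0.4 0.5 0.6\n")
table, n_found = load_glove_subset(sample, ["food", "battery"], dim=3)
print("words found in file:", n_found)
```

With a real file, the first argument would be an open file handle and dim would be 300 for the 840B-token GloVe vectors.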

Training and evaluation

Pretraining on document-level dataset

The pretrained weights from document-level examples used in our experiments are provided at pretrained_weights/. You can use them directly for initialising aspect-level models.

If you want to retrain on the document-level datasets instead, execute the command below under code_pretrain/:

CUDA_VISIBLE_DEVICES="0" python pre_train.py \
--domain $domain

where $domain in ['yelp_large', 'electronics_large'] denotes the corresponding document-level domain. The trained model parameters will be saved under pretrained_weights/. You can find more arguments defined in pre_train.py with default values used in our experiments.

Training and evaluation on aspect-level dataset

To train the aspect-level sentiment classifier, execute the command below under code/:

CUDA_VISIBLE_DEVICES="0" python train.py \
--domain $domain \
--alpha 0.1 \
--is-pretrain 1

where $domain in ['res', 'lt', 'res_15', 'res_16'] denotes the corresponding aspect-level domain. --alpha denotes the weight of the document-level training objective (\lambda in the paper). --is-pretrain is set to either 0 or 1, denoting whether to use pretrained weights from document-level examples for initialisation. You can find more arguments defined in train.py with default values used in our experiments. At the end of each epoch, results on the training, validation and test sets are printed.
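The role of --alpha can be sketched as a weighted sum of two losses. This is a minimal illustration assuming both objectives are cross-entropy losses over predicted class probabilities; the function names and toy probabilities are hypothetical, and the actual training code in train.py may combine the objectives differently.

```python
import math


def cross_entropy(probs, gold_index):
    """Negative log-likelihood of the gold class for one example."""
    return -math.log(probs[gold_index])


def joint_loss(aspect_probs, aspect_gold, doc_probs, doc_gold, alpha=0.1):
    """Aspect-level loss plus alpha times the document-level loss.

    alpha plays the role of the --alpha flag: 0 ignores the
    document-level objective, larger values give it more weight.
    """
    aspect_loss = cross_entropy(aspect_probs, aspect_gold)
    doc_loss = cross_entropy(doc_probs, doc_gold)
    return aspect_loss + alpha * doc_loss


# Toy predictions: an aspect-level example and a document-level example.
loss = joint_loss([0.5, 0.5], 0, [0.25, 0.75], 1, alpha=0.1)
print("joint loss:", loss)
```

Setting alpha to 0 in this sketch reduces the objective to the aspect-level loss alone.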

Dependencies

Python 2.7
Keras 2.1.2
tensorflow 1.4.1
numpy 1.13.3

Cite

If you use the code, please cite the following paper:

@InProceedings{he-EtAl:2018,
  author    = {He, Ruidan  and  Lee, Wee Sun  and  Ng, Hwee Tou  and  Dahlmeier, Daniel},
  title     = {Exploiting Document Knowledge for Aspect-level Sentiment Classification},
  booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics},
  publisher = {Association for Computational Linguistics}
}

Comments

Missing some training data?

Thanks for sharing. From the preprocessed data, I realized the counts of examples (from my script) are not the same as reported in the paper.

For example, the training data of SemEval 2014 looks like this:
lt Counter({'positive': 987, 'negative': 866, 'neutral': 460})
res Counter({'positive': 2164, 'negative': 805, 'neutral': 633})

Did I make any mistake?

opened by howardhsu 2
ValueError: All input arrays (x) should have the same number of samples. Got array shapes: [(2, 74), (2, 18), (32, 572)]

Hello, I have run your program, and the error shown in the title appears when train.py reaches the second iteration of the args.epochs loop. I don't understand why the number of samples in the input arrays (x) changes. I have been looking at this for a long time but can't find the mistake. Could you help me figure out why this happens? Thank you.

opened by yu-cherish 1
ValueError: All input arrays (x) should have the same number of samples. Got array shapes: [(2, 74), (2, 18), (32, 572)]

Hello, I have run your program and I get the error shown in the title when running the second loop in train.py. I have been looking at this for a long time but can't find the mistake. Could you help me figure out why this happens? Thank you.

opened by yu-cherish 0
Error when running the code

Hi,

I got this error when trying to run the train.py file

TypeError: add_weight() got multiple values for keyword argument 'name'

Any ideas, please?

Thank you.

opened by yassmine-lam 0
Predicting output for new sentences with the new model

Hi, I tried retraining the model. Training ran and printed the loss at every epoch, but at the end it did not save anything (model or word-vector file) that I could use for further predictions. Also, is there a way to use the model to predict the sentiment of an input sentence?

opened by Arjunsankarlal 2
About the precision

Hello, I modified the code to run with Python 3 and it runs, but I can't reproduce the precision reported in the paper: it was only 50%. When I adjusted the learning rate, the precision improved to 67%, but there is still a gap with your experiments. I used your preprocessed_data. Do you have any idea how to improve the precision? Thank you!

opened by milkWangzai 7
