Open & Efficient Framework for Aspect-based Sentiment Analysis

YangHeng

Last update: Jan 7, 2023

Overview

PyABSA - Open & Efficient Framework for Aspect-based Sentiment Analysis

Fast, low-memory, enhanced implementation of Local Context Focus (LCF).

Built from LC-ABSA / LCF-ABSA / LCF-BERT and LCF-ATEPC.

Provides tutorials on training and using ATE and APC models.

PyTorch implementations (CPU and CUDA supported).

Tips

PyABSA uses FindFile to locate target files, so you can specify a dataset or checkpoint by keywords instead of an absolute path, e.g.:

dataset = 'laptop' # instead of './SemEval/LAPTOP'. keyword case doesn't matter
checkpoint = 'lcfs' # any checkpoint whose absolute path contains lcfs
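
The keyword lookup described above can be illustrated with a small sketch. This is not PyABSA's FindFile implementation: the function name `find_by_keywords` is hypothetical, and matching against the path relative to the search root (rather than the absolute path) is an assumption made to keep the example deterministic.

```python
import os
import tempfile


def find_by_keywords(root, keywords):
    """Return sorted file paths under `root` whose relative path contains
    every keyword, ignoring case (hypothetical helper, not PyABSA's API)."""
    if isinstance(keywords, str):
        keywords = [keywords]
    wanted = [keyword.lower() for keyword in keywords]
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Compare against the lower-cased path relative to the root.
            relative = os.path.relpath(path, root).lower()
            if all(keyword in relative for keyword in wanted):
                matches.append(path)
    return sorted(matches)
```

Calling `find_by_keywords('.', 'laptop')` would return every file below the current directory whose relative path contains "laptop" in any letter case.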

PyABSA uses AutoCUDA to assign a CUDA device automatically, but you can still set a preferred device:

auto_device=True  # to auto assign a cuda device for training / inference
auto_device=False  # to use cpu
auto_device='cuda:1'  # to specify a preferred device
auto_device='cpu'  # to specify a preferred device
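
To make the four settings above concrete, here is a minimal sketch of how such a value could be resolved to a device string. It is an illustration only, not PyABSA's AutoCUDA logic: `resolve_device` is a hypothetical name, `cuda_available` stands in for a runtime check such as `torch.cuda.is_available()`, and falling back to the CPU when CUDA is requested but unavailable is an assumption.

```python
def resolve_device(auto_device, cuda_available):
    """Map an auto_device-style setting to a device string (hypothetical helper)."""
    if auto_device is True:
        # Let the framework pick: prefer CUDA when it is available.
        return "cuda" if cuda_available else "cpu"
    if auto_device is False:
        return "cpu"
    if isinstance(auto_device, str):
        # Assumption: a CUDA request silently falls back to CPU without CUDA.
        if auto_device.startswith("cuda") and not cuda_available:
            return "cpu"
        return auto_device
    raise TypeError("auto_device must be a bool or a device string")
```

In real code the second argument would come from the deep learning framework instead of being passed by hand.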

PyABSA supports automatic label fixing, which means you can set the labels to any token (except -999), e.g., sentiment labels = {-9. 2, negative, positive}.
Make sure the version and datasets of a checkpoint are compatible with your current PyABSA. The PyABSA version information is also printed with the training args when a checkpoint is loaded.
You can train a model on multiple datasets that share the same sentiment labels, and you can even define and contribute a combination of datasets here!
Other features are waiting to be discovered.
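
The label fixing tip above can be sketched as a mapping from arbitrary label tokens to contiguous class indices. This is a hedged illustration rather than PyABSA's code: `build_label_map` and `encode_labels` are hypothetical names, and treating -999 as a reserved value that is passed through unchanged is an assumption based on the exception mentioned in the tip.

```python
def build_label_map(labels, ignore_label=-999):
    """Map arbitrary label tokens to ids 0..n-1, skipping the reserved label."""
    unique = []
    for label in labels:
        if str(label) == str(ignore_label):
            continue  # the reserved value never becomes a class
        if label not in unique:
            unique.append(label)
    # Sort by string form so the mapping does not depend on input order.
    ordered = sorted(unique, key=str)
    return {label: index for index, label in enumerate(ordered)}


def encode_labels(labels, label_map, ignore_label=-999):
    """Replace each label by its id, leaving the reserved label untouched."""
    return [
        ignore_label if str(label) == str(ignore_label) else label_map[label]
        for label in labels
    ]
```

With a mapping like this, a dataset can use strings, integers, or a mix of both as sentiment labels without manual renumbering.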

Instruction

If you would like to support the PyABSA project, please star this repository.

Installation
Package Overview
Quick-Start
- Aspect Term Extraction and Polarity Classification (ATEPC)
- Aspect Polarity Classification (APC)
Model Support
Dataset Support
Make Contributions
All Examples
Notice for LCF-BERT & LCF-ATEPC

Installation

To avoid installing a test version, please do not install a version that has no corresponding release note.

install via pip

To use PyABSA, install the latest version from pip or source code:

pip install -U pyabsa

install via source

git clone https://github.com/yangheng95/PyABSA --depth=1
cd PyABSA 
python setup.py install

Package Overview

pyabsa	package root (including all interfaces)
pyabsa.functional	recommended interface entry
pyabsa.functional.checkpoint	checkpoint manager entry, inference model entry
pyabsa.functional.dataset	datasets entry
pyabsa.functional.config	predefined config manager
pyabsa.functional.trainer	training module; every trainer returns an inference model

Quick Start

See the demos

Aspect Polarity Classification (APC)

1. Import necessary entries

from pyabsa.functional import Trainer
from pyabsa.functional import APCConfigManager
from pyabsa.functional import APCModelList  # needed in step 3
from pyabsa.functional import ABSADatasetList

2. Choose a base param config

# Choose the base config for BERT-based APC models
apc_config_english = APCConfigManager.get_apc_config_english()

3. Specify an APC model and alter some hyper-parameters (if necessary)

# Specify a Bert-based APC model
apc_config_english.model = APCModelList.SLIDE_LCFS_BERT

4. Configure runtime settings and run training

dataset_path = ABSADatasetList.SemEval  # or set your local dataset
sent_classifier = Trainer(config=apc_config_english,
                          dataset=dataset_path,  # train set and test set will be automatically detected
                          checkpoint_save_mode=1,  # set to None to avoid saving the model
                          auto_device=True  # automatically choose CUDA or CPU
                          ).load_trained_model()

5. Sentiment inference

# batch_infer returns the results; pass save_result=True to also save them
inference_dataset = ABSADatasetList.SemEval  # or set your local dataset
results = sent_classifier.batch_infer(target_file=inference_dataset,
                                      print_result=True,
                                      save_result=True,
                                      ignore_error=True,
                                      )

See the APC Demos directory for detailed usage.

Aspect Term Extraction and Polarity Classification (ATEPC)

1. Import necessary entries

from pyabsa.functional import ATEPCModelList
from pyabsa.functional import Trainer, ATEPCTrainer
from pyabsa.functional import ABSADatasetList
from pyabsa.functional import ATEPCConfigManager

2. Choose a base param config

config = ATEPCConfigManager.get_atepc_config_english()

3. Specify an ATEPC model and alter some hyper-parameters (if necessary)

atepc_config_english = ATEPCConfigManager.get_atepc_config_english()
atepc_config_english.model = ATEPCModelList.LCF_ATEPC

4. Configure runtime settings and run training

laptop14 = ABSADatasetList.Laptop14

aspect_extractor = ATEPCTrainer(config=atepc_config_english, 
                                dataset=laptop14
                                ).load_trained_model()

5. Aspect term extraction & sentiment inference

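from pyabsa import ATEPCCheckpointManager

# Chinese example sentences for the 'chinese' checkpoint (English glosses in the comments)
examples = ['相比较原系列锐度高了不少这一点好与不好大家有争议',  # "Sharpness is much higher than in the original series; whether that is good or bad is debated"
            '这款手机的大小真的很薄，但是颜色不太好看， 总体上我很满意啦。'  # "This phone is really thin, but the color is not very nice; overall I am very satisfied."
            ]
aspect_extractor = ATEPCCheckpointManager.get_aspect_extractor(checkpoint='chinese',
                                                               auto_device=True  # False means load model on CPU
                                                               )

inference_source = ABSADatasetList.SemEval  # ABSADatasetList was imported in step 1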
atepc_result = aspect_extractor.extract_aspect(inference_source=inference_source, 
                                               save_result=True,
                                               print_result=True,  # print the result
                                               pred_sentiment=True,  # Predict the sentiment of extracted aspect terms
                                               )

See the ATE Demos directory for detailed usage.

Checkpoint

How to get available checkpoints from Google Drive

PyABSA checks for the latest available checkpoints first and then loads the latest checkpoint from Google Drive. To view the available checkpoints, use the following code, then load a checkpoint by name:

from pyabsa import available_checkpoints

checkpoint_map = available_checkpoints()

If you cannot access Google Drive, you can download our pretrained checkpoints and load the unzipped checkpoint manually. Extraction code for the model download link: ABSA

How to share checkpoints (e.g., checkpoints trained on your custom dataset) with the community

How to use checkpoints

1. Sentiment inference

1.1 Import necessary entries

import os
from pyabsa import APCCheckpointManager, ABSADatasetList
os.environ['PYTHONIOENCODING'] = 'UTF8'

1.2 Specify the checkpoint and load the sentiment classifier

sent_classifier = APCCheckpointManager.get_sentiment_classifier(checkpoint='dlcf-dca-bert1', #or set your local checkpoint
                                                                auto_device='cuda',  # Use CUDA if available
                                                                )

1.3 Configure inference settings

# batch_infer returns the results; pass save_result=True to also save them
inference_datasets = ABSADatasetList.Laptop14  # or set your local dataset
results = sent_classifier.batch_infer(target_file=inference_datasets,
                                      print_result=True,
                                      save_result=True,
                                      ignore_error=True,
                                      )

2. Aspect term extraction & sentiment inference

2.1 Import necessary entries

import os
from pyabsa import ABSADatasetList
from pyabsa import ATEPCCheckpointManager
os.environ['PYTHONIOENCODING'] = 'UTF8'

2.2 Specify the checkpoint and load the aspect extractor

aspect_extractor = ATEPCCheckpointManager.get_aspect_extractor(checkpoint='Laptop14', # or your local checkpoint
                                                               auto_device=True  # False means load model on CPU
                                                               )

2.3 Configure extraction and inference settings

inference_dataset = ABSADatasetList.SemEval  # or set your local dataset
atepc_result = aspect_extractor.extract_aspect(inference_source=inference_dataset,
                                               save_result=True,
                                               print_result=True,  # print the result
                                               pred_sentiment=True,  # Predict the sentiment of extracted aspect terms
                                               )
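
Extraction results usually need flattening before further analysis. The sketch below assumes each result is a dict with a 'sentence' string and parallel 'aspect', 'position' and 'sentiment' lists, which is the shape users report in the comments on this page. The helper name `flatten_atepc_results` is hypothetical, and other PyABSA versions may return a different structure.

```python
def flatten_atepc_results(results):
    """Turn per-sentence result dicts into one row per extracted aspect.

    Assumes 'aspect', 'position' and 'sentiment' are parallel lists;
    sentences with no extracted aspect produce no rows.
    """
    rows = []
    for result in results:
        aspects = result.get("aspect", [])
        positions = result.get("position", [])
        sentiments = result.get("sentiment", [])
        for aspect, position, sentiment in zip(aspects, positions, sentiments):
            rows.append({
                "sentence": result.get("sentence", ""),
                "aspect": aspect,
                "position": position,
                "sentiment": sentiment,
            })
    return rows
```

The resulting list of flat dicts can be passed to a table library or written to CSV for comparison against gold annotations.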

3. Train based on checkpoint

3.1 Import necessary entries

from pyabsa.functional import APCCheckpointManager
from pyabsa.functional import Trainer
from pyabsa.functional import APCConfigManager
from pyabsa.functional import ABSADatasetList
from pyabsa.functional import APCModelList

3.2 Choose a base param_dict

apc_config_english = APCConfigManager.get_apc_config_english()

3.3 Specify an APC model and alter some hyper-parameters (if necessary)

apc_config_english.model = APCModelList.SLIDE_LCF_BERT

3.4 Configure checkpoint

checkpoint_path = APCCheckpointManager.get_checkpoint('slide-lcf-bert')

3.5 Configure runtime settings and run training

dataset_path = ABSADatasetList.SemEval #or set your local dataset
sent_classifier = Trainer(config=apc_config_english,
                          dataset=dataset_path,
                          from_checkpoint=checkpoint_path,
                          checkpoint_save_mode=1,
                          auto_device=True
                          ).load_trained_model()

Datasets

More datasets are available at ABSADatasets.

Twitter
Laptop14
Restaurant14
Restaurant15
Restaurant16
Phone
Car
Camera
Notebook
MAMS
TShirt
Television
MOOC
Shampoo
Multilingual (The sum of all datasets.)

You don't have to download the datasets, as the datasets will be downloaded automatically.

Model Support

In addition to the following models, we provide a template model involving the LCF vector; you can develop your own model based on the LCF-APC model template or the LCF-ATEPC model template.

ATEPC

APC

Bert-based APC models

SLIDE-LCF-BERT (Faster & Performs Better than LCF/LCFS-BERT)
SLIDE-LCFS-BERT (Faster & Performs Better than LCF/LCFS-BERT)
LCF-BERT (Reimplemented & Enhanced)
LCFS-BERT (Reimplemented & Enhanced)
FAST-LCF-BERT (Faster with a slight performance loss)
FAST-LCFS-BERT (Faster with a slight performance loss)
LCF-DUAL-BERT (Dual BERT)
LCFS-DUAL-BERT (Dual BERT)
BERT-BASE
BERT-SPC
LCA-Net
DLCF-DCA-BERT *

Bert-based APC baseline models

GloVe-based APC baseline models

Contribution

We hope you can help us improve this project; your contributions are welcome. You can contribute in many ways, including:

Share your custom dataset in PyABSA and ABSADatasets
Integrate your models into PyABSA. (You can share your models whether or not they are based on PyABSA. If you are interested, we will help you.)
Report bugs you find while using PyABSA or reviewing the code (PyABSA is an individual project driven by enthusiasm, so your help is needed)
Give us advice on feature design or refactoring (you can suggest improvements to any feature)
Correct or rewrite error messages and code comments (the comments were not written by a native English speaker, so you can help us improve the documentation)
Create an example script for a particular situation (such as specifying a SpaCy model, a pretrained BERT type, or some hyperparameters)
Star this repository to keep it active

Notice

LCF is a simple and adaptive mechanism proposed for ABSA. Many models based on LCF have been proposed and have achieved SOTA performance. Developing your models based on LCF can significantly improve your ABSA models. If you are looking for the original proposal of local context focus, please refer to the introduction of LCF. If you are looking for the original code of the LCF-related papers, please refer to LC-ABSA / LCF-ABSA or LCF-ATEPC.
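
As a rough illustration of the local context focus idea, the sketch below builds Context features Dynamic Mask (CDM) and Context features Dynamic Weighted (CDW) vectors from each token's distance to the aspect span. It follows the general description in the LCF papers rather than PyABSA's actual code: the function names, the `srd_threshold` parameter and the simplified distance (measured to the nearest aspect token) are assumptions made for illustration.

```python
def relative_distances(seq_len, aspect_start, aspect_len):
    """Distance of every token position to the aspect span (0 inside the span)."""
    aspect_end = aspect_start + aspect_len - 1
    distances = []
    for position in range(seq_len):
        if position < aspect_start:
            distances.append(aspect_start - position)
        elif position > aspect_end:
            distances.append(position - aspect_end)
        else:
            distances.append(0)
    return distances


def cdm_vector(distances, srd_threshold):
    """CDM: keep tokens within the threshold, mask out everything else."""
    return [1.0 if d <= srd_threshold else 0.0 for d in distances]


def cdw_vector(distances, srd_threshold):
    """CDW: keep near tokens at full weight, decay far tokens with distance."""
    seq_len = len(distances)
    return [
        1.0 if d <= srd_threshold else (seq_len - (d - srd_threshold)) / seq_len
        for d in distances
    ]
```

In an LCF-style model, a vector like these would be multiplied element-wise with the token features so that context close to the aspect dominates the sentiment prediction.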

Acknowledgement

This work builds on LC-ABSA/LCF-ABSA and LCF-ATEPC, and on other impressive works such as PyTorch-ABSA and LCFS-BERT.

License

MIT

Contributors ✨

Thanks goes to these wonderful people (emoji key):

_Ryan

This project follows the all-contributors specification. Contributions of any kind welcome!

Comments

IndexError: list index out of range | ATEPC English training on Tshirt dataset
Out-of-range error while training ATEPC model - english on T-shirt dataset.

...
config.model = ATEPCModelList.LCFS_ATEPC
config.evaluate_begin = 5
config.num_epoch = 6
config.log_step = 100
tshirt = ABSADatasetList.TShirt

aspect_extractor = Trainer(config=config,
                           dataset=tshirt,
                           checkpoint_save_mode=1,
                           auto_device=True
                           )

Traceback:

TShirt dataset is not found locally, search at https://github.com/yangheng95/ABSADatasets Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight']

This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).

This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). Using bos_token, but it is not set yet. Using eos_token, but it is not set yet. 59%|█████▊ | 1098/1870 [00:10<00:07, 100.65it/s, convert examples to features]

IndexError Traceback (most recent call last) in () 3 # from_checkpoint=checkpoint_path, 4 checkpoint_save_mode=1, ----> 5 auto_device=True 6 )

7 frames /usr/local/lib/python3.7/dist-packages/pyabsa/functional/trainer/trainer.py in init(self, config, dataset, from_checkpoint, checkpoint_save_mode, auto_device) 92 config.model_path_to_save = None 93 ---> 94 self.train() 95 96 def train(self):

/usr/local/lib/python3.7/dist-packages/pyabsa/functional/trainer/trainer.py in train(self) 103 self.config.seed = s 104 if self.checkpoint_save_mode: --> 105 model_path.append(self.train_func(self.config, self.from_checkpoint, self.logger)) 106 else: 107 # always return the last trained model if dont save trained model

/usr/local/lib/python3.7/dist-packages/pyabsa/core/atepc/training/atepc_trainer.py in train4atepc(opt, from_checkpoint_path, logger) 352 while not trainer: 353 try: --> 354 trainer = Instructor(opt, logger) 355 if from_checkpoint_path: 356 model_path = find_files(from_checkpoint_path, '.model')

/usr/local/lib/python3.7/dist-packages/pyabsa/core/atepc/training/atepc_trainer.py in init(self, opt, logger) 70 len(self.train_examples) / self.opt.batch_size / self.opt.gradient_accumulation_steps) * self.opt.num_epoch 71 train_features = convert_examples_to_features(self.train_examples, self.label_list, self.opt.max_seq_len, ---> 72 self.tokenizer, self.opt) 73 all_spc_input_ids = torch.tensor([f.input_ids_spc for f in train_features], dtype=torch.long) 74 all_input_mask = torch.tensor([f.input_mask for f in train_features], dtype=torch.long)

/usr/local/lib/python3.7/dist-packages/pyabsa/core/atepc/dataset_utils/data_utils_for_training.py in convert_examples_to_features(examples, label_list, max_seq_len, tokenizer, opt) 188 text_right = '' 189 aspect = '' --> 190 prepared_inputs = prepare_input_for_atepc(opt, tokenizer, text_left, text_right, aspect) 191 lcf_cdm_vec = prepared_inputs['lcf_cdm_vec'] 192 lcf_cdw_vec = prepared_inputs['lcf_cdw_vec']

/usr/local/lib/python3.7/dist-packages/pyabsa/core/atepc/dataset_utils/atepc_utils.py in prepare_input_for_atepc(opt, tokenizer, text_left, text_right, aspect) 60 61 if 'lcfs' in opt.model_name or opt.use_syntax_based_SRD: ---> 62 syntactical_dist, _ = get_syntax_distance(text_raw, aspect, tokenizer, opt) 63 else: 64 syntactical_dist = None

/usr/local/lib/python3.7/dist-packages/pyabsa/core/apc/dataset_utils/apc_utils.py in get_syntax_distance(text_raw, aspect, tokenizer, opt) 240 # the following two functions are both designed to calculate syntax-based distances 241 if opt.srd_alignment: --> 242 syntactical_dist = syntax_distance_alignment(raw_tokens, dist, opt.max_seq_len, tokenizer) 243 else: 244 syntactical_dist = pad_syntax_based_srd(raw_tokens, dist, tokenizer, opt)[1]

/usr/local/lib/python3.7/dist-packages/pyabsa/core/apc/dataset_utils/apc_utils.py in syntax_distance_alignment(tokens, dist, max_seq_len, tokenizer) 38 if bert_tokens != text: 39 while text or bert_tokens: ---> 40 if text[0] == ' ' or text[0] == '\xa0': # bad case handle 41 text = text[1:] 42 dep_dist = dep_dist[1:]

IndexError: list index out of range
bug
opened by hitz02 52
If a review contains numbers or emojis, no entities are generated

I am applying the PyABSA package to Amazon mobile phone reviews, and it does not generate attributes when the review contains numbers or emojis.

For example : iPhone 12. Best phone 😍 Genuine product thanks a lot amazon I purchase this divice 20 jan 2022 almost work fine. Best one

For the above review and similar ones, it does not generate entities with sentiment. I would really appreciate it if this issue could be resolved.

opened by ImSanjayChintha 17
[Question] Why are all sentiment predictions Positive?

Question Hi, great work on this project.

I used this project to train on a custom dataset of around 2000 examples. The label counts are slightly imbalanced. I trained a model for 100 epochs and reached an apc_acc of around 90, but the predicted result is always Positive for every aspect.

Do you have any advice? Thanks very much.

opened by brightgems 17
Question about inference

Hi, thanks for the nice work. Recently I tried to use the multilingual pretrained model for inference. I found that if the model predicts two consecutive words as B-ASP, an 'empty separator' error occurs during inference. Is there any advice for avoiding this situation? Thanks again!

bug

opened by leohsuofnthu 16
Question about the version of the package used by the framework
Hello, excuse me.

Would it be convenient for you to write a document listing the versions of each package used by the framework?

One more question: will the packages used by the framework be kept up to date? For example, if torch is upgraded to 1.11.0, will the framework be updated in a timely manner?
opened by yaoysyao 15
Some texts return no results when analyzed with ATEPC

Hello, and sorry to bother you. Thank you for your hard work maintaining this project. I ran into the following problem while using it. Version: 1.16.5. The text is as follows:

Let me begin by saying that there are two kinds of people, those who will give the Tokyo Hotel 5 stars and rave about it to everyone they know, or... people who can't get past the broken phone, blood stains, beeping fire alarms, peg-legged receptionist, lack of water pressure, cracked walls, strange smells, questionable elevator, televisions left to die after the digital conversion, and the possibility that the air conditioner may fall out the window at any moment. That being said, I whole-heartedly give the Tokyo Hotel 5 stars. This is not a place to quietly slip in and out of with nothing to show but a faint memory of the imitation Thomas Kinkade painting bolted to the wall above your bed. And, there is no continental breakfast or coffee in the lobby. There are a few vending machines, but I'm pretty sure they wont take change minted after 1970. Here your senses will be assaulted, and after you leave you will have enough memories to compete with a 1,000 mile road-trip. I beg anyone who is even mildly considering staying here to give it a chance. The location is prime. We were able to walk down Michigan Ave and the river-walk in the middle of the night, all without straying too far from the hotel. There is a grocery store a block away and parking (which may cost more that your hotel room) across the street. Besides, this place is cheap. Super-cheap for downtown Chicago. The closest price we found in the area was four times as expensive. But, be sure to grab some cash. They don't accept credit cards. Some rules though: - Say hello to Clifton Jackson, the homeless guy by Jewel-Osco. - Buy him a drink, some chicken and look him up on Facebook. - Stay on the 17 floor. All the way at the top. - Go out the fire escape (be sure to prop the door open or you'll have a looong walk down) - Be very very careful. - Explore. (Yes, that ladder will hold your weight) - Be very very careful. - Don't be alarmed by any weird noises you hear. - Spend the night on the roof. 17 stories up, in the heart of Chicago. - Write your own Yelp review. I want to see that others are getting the Tokyo Hotel Experience. - Check out is at noon. Be sure to drink lots of water. - Spend the next day hung over. And... Please be careful on the roof.

Pretrained model used: fast_lcf_atepc_Multilingual_cdw_apcacc_88.96_apcf1_81.58_atef1_81.92

Result obtained: 'aspect': [], 'position': [], 'sentiment': [], 'probs': [], 'confidence': []

As the result shows, no fine-grained sentiment could be extracted from the text. Is this caused by the text or by the model? Regarding the pretrained models, I saw on Hugging Face that you have updated some checkpoints; can those models be loaded and used directly?

opened by yaoysyao 12
[Question] atepc prediction result is array, but its length is not equal with inputs

Environment pyabsa: v1.1.22

Question atepc prediction result is array, but its length is not equal with inputs. For example: inputs examples = ['我就想问，这个真的用清水可以清洗的干净的吗？洗完之后油的吹不太干……难不成我昨晚发膜还要拿洗发水再洗一遍？那请问意义何在了……实在是很尴尬']*20

outputs [{'sentence': '我就想问，这个真的用清水可以清洗的干净的吗？洗完之后油的吹不太干 & hellip ; & hellip ; 难不成我昨晚发膜还要拿洗发水再洗一遍？那请问意义何在了 & hellip ; & hellip ; 实在是很尴尬', 'IOB': ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'I-ASP', 'I-ASP', 'I-ASP', 'I-ASP', 'I-ASP', 'I-ASP', 'I-ASP', 'I-ASP', 'I-ASP', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', '[SEP]', 'O', 'O', 'O', 'O', 'O', 'O'], 'tokens': ['我', '就', '想', '问', '，', '这', '个', '真', '的', '用', '清', '水', '可', '以', '清', '洗', '的', '干', '净', '的', '吗', '？', '洗', '完', '之', '后', '油', '的', '吹', '不', '太', '干', '&', 'hellip', ';', '&', 'hellip', ';', '难', '不', '成', '我', '昨', '晚', '发', '膜', '还', '要', '拿', '洗', '发', '水', '再', '洗', '一', '遍', '？', '那', '请', '问', '意', '义', '何', '在', '了', '&', 'hellip', ';', '&', 'hellip', ';', '实', '在', '是', '很', '尴', '尬'], 'aspect': ['完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干', '完之后油的吹不太干'], 'position': [[23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 
24, 25, 26, 27, 28, 29, 30, 31]], 'sentiment': ['Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative']}]

opened by brightgems 12
Using the deploy demo, the sentiment prediction is always Positive

checkpoint = 'model_garden/V0.8.8.0/Chinese/ATEPC/fast_lcf_atepc_Chinese_cdw_apcacc_96.69_apcf1_96.25_atef1_92.26' checkpoint = 'model_garden/V0.8.8.0/Chinese/ATEPC/fast_lcf_atepc_Multilingual_cdw_apcacc_79.61_apcf1_76.24_atef1_63.29.zip' I tried both of these models, and the sentiment is always Positive, even for very negative expressions.

opened by jkkl 11
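Question about low APC metrics when using ATEPC with my own dataset

I annotated my own dataset. Running the APC task on its own (with your APC models) gives normal metrics: APC acc is 91 and F1 is 91. But when I run the multi-task setup with your ATEPC model, the ATE metrics are normal while the APC metric is very low: F1 is 37.37 (max: 37.37). Why is that? If I want to do multi-task learning, can I only run aspect term extraction (ATE) first and then sentiment polarity classification?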

opened by zhujinqiu 11
IndexError: list index out of range

Hi, yangheng! The project used to work fine on my computer, but after installing the latest version of pyabsa, an IndexError occurs as below:

yelp = "C:/Users/Li Wei/integrated_datasets/apc_datasets/SemEval/yelprestaurant"
aspect_extractor = Trainer(config=config,
                           dataset=yelp,
                           checkpoint_save_mode=1,
                           auto_device=True
                           ).load_trained_model()

and indexerror related to above code is: `IndexError Traceback (most recent call last) in 1 yelp = "C:/Users/Li Wei/integrated_datasets/apc_datasets/SemEval/yelprestaurant" ----> 2 aspect_extractor = Trainer(config=config, 3 dataset=yelp, 4 checkpoint_save_mode=1, 5 auto_device=True

D:\Anaconda\lib\site-packages\pyabsa\functional\trainer\trainer.py in init(self, config, dataset, from_checkpoint, checkpoint_save_mode, auto_device) 71 72 """ ---> 73 config.ABSADatasetsVersion = query_local_version() 74 if isinstance(config, APCConfigManager): 75 self.train_func = train4apc

D:\Anaconda\lib\site-packages\pyabsa\utils\file_utils.py in query_local_version() 293 def query_local_version(): 294 fin = open(find_cwd_file(['init.py', 'integrated_datasets'])) --> 295 local_version = fin.read().split(''')[-2] 296 fin.close() 297 return local_version

IndexError: list index out of range`

opened by WeiLi9811 11
Torch not compiled with CUDA enabled

I ran https://github.com/yangheng95/PyABSA/blob/release/examples/aspect_term_extraction/extract_aspects_chinese.py on a CPU device with auto_device=False, but received the error message "Torch not compiled with CUDA enabled". I have checked the AspectExtractor class and the LCF_ATEPC model class, but found no mistake.

opened by zhihao-chen 11
Question on ATEPC performance metrics and loss.
Hi author @yangheng95 ,

I'm using the FAST-LCF-ATEPC model on my custom dataset and I have 4 questions on the ATEPC performance metrics and loss:

1. What's the difference between these 2 Metric Visualizer (MV) tables? Is the validation set used to calculate these metrics?

2. As I understand from atepc_trainer.py, there are 3 types of losses: loss_ate, loss_apc, and the combined loss that uses the formula loss = loss_ate + ate_loss_weight * loss_apc. Could you explain in simple terms how each of the losses is calculated from the expected output and the actual output?

3. In continuation to question 2, I want to check whether the model overfits my dataset, and to do that I need to plot the training loss and validation loss. So does the `losses` list refer to the training loss? (see below) https://github.com/yangheng95/PyABSA/blob/964d7862da13ef8cc38cb56fe0e65086b343a9cd/pyabsa/core/atepc/training/atepc_trainer.py#L204

4. How can I retrieve the validation loss for ATE and APC separately so that I can plot them in a graph?

Kind regards, kerolzeeq
opened by kerolzeeq 4
Performance measures test data FAST_LCF checkpoint model
Dear @yangheng95,

Thanks for making and maintaining this repo, it's great!

I have some trouble to get the accuracy and F1 scores for the Restaurant Test data Gold. (Ideally I want to make a confusion matrix). What is the easiest way to get F1 scores for APC & ATE after running a checkpoint model on test data? Does the model store these metrics somewhere?

Alternatively, how do you compare your predictions to the TRUE test data (Restaurant Test data Gold annotated)? I can easily transform the models' predictions ('atepc_inference.result_json') to a pandas dataframe. But it is very hard to transform the test data stored in integrated datasets (from ABSAdatasets) (it is in IOB format) to that exact same format (pandas dataframe) in order to test performance. Do you have a script for that, or a certain function? I was not able to find it.

Btw: I used the multilingual checkpoint model (FAST-LCF-ATEPC) on the Restaurant14 Test data Gold (But, ultimately I want to use this model on Dutch data. That is why I want to know how to test performance).

Thanks a lot,

Karsten

Code:

import pyabsa as pyabsa
from pyabsa import available_checkpoints

# The results of available_checkpoints() depend on the PyABSA version
checkpoint_map = available_checkpoints()  # show available checkpoints of PyABSA of current version

from pyabsa.functional import ABSADatasetList
from pyabsa.functional import ATEPCCheckpointManager

inference_source = ABSADatasetList.Restaurant14
aspect_extractor = ATEPCCheckpointManager.get_aspect_extractor(checkpoint='multilingual')
atepc_result = aspect_extractor.extract_aspect(inference_source=inference_source,
                                               save_result=True,
                                               print_result=True,  # print the result
                                               pred_sentiment=True,  # Predict the sentiment of extracted aspect terms
                                               )

import pandas as pd
df_restaurant_EN_test_pred = pd.read_json('atepc_inference.result_EN.json')
opened by KarstenLasse 3
update ATEPC for ATE and ACD

Hello, can we update LCF-ATEPC to do aspect term extraction and aspect category detection on the SemEval dataset (instead of aspect polarity classification), replacing the sentiment polarities (positive, negative, neutral) with aspect categories (food, service, ...)? Thanks in advance.

opened by Astudnew 1

Releases(v2.0.11)

v2.0.11(Nov 28, 2022)

See the update details for v2 in https://github.com/yangheng95/PyABSA/blob/v2/release-note.json

Regards,

Heng
v1.16.25(Nov 4, 2022)

Bug-fix version
v1.16.14(Aug 30, 2022)

v.1.16.5(Jul 17, 2022)

General update; see release-note.json for update details
v1.16.1(Jul 7, 2022)

Hi everyone! Sorry, I introduced an evaluation bug in previous versions up to v1.16.0 that made the ATEPC evaluation scores higher than they should have been. I have now fixed this critical problem and improved inference performance in non-Latin languages such as Chinese, Japanese, and Korean. You can browse the new demo to see the change in non-Latin language inference.

Cheers! Thanks for using PyABSA.

Heng
v1.15.6(Jun 30, 2022)

Please see the release notes
v1.14.8(Jun 4, 2022)

Last stable version of v1.14.x
v1.14.0(May 3, 2022)

This is a stable version and solves many bugs
v1.13.1(Apr 30, 2022)

This is a stable version of PyABSA, which improves the quality of aspect-term extraction and sentiment classification pipeline
v1.10.4(Apr 21, 2022)

v1.8.39(Mar 20, 2022)

V1.8.5(Jan 10, 2022)


Owner

YangHeng

PhD, University of Exeter

GitHub

codes for paper Combining Dynamic Local Context Focus and Dependency Cluster Attention for Aspect-level sentiment classification

DLCF-DCA codes for paper Combining Dynamic Local Context Focus and Dependency Cluster Attention for Aspect-level sentiment classification. submitted t

15 Aug 30, 2022

Code and dataset for ACL2018 paper "Exploiting Document Knowledge for Aspect-level Sentiment Classification"

Aspect-level Sentiment Classification Code and dataset for ACL2018 [paper] ‘‘Exploiting Document Knowledge for Aspect-level Sentiment Classification’’

146 Nov 29, 2022

Code for our paper Aspect Sentiment Quad Prediction as Paraphrase Generation in EMNLP 2021.

Aspect Sentiment Quad Prediction (ASQP) This repo contains the annotated data and code for our paper Aspect Sentiment Quad Prediction as Paraphrase Ge

39 Dec 11, 2022

A sample PyTorch implementation of the ACL 2021 research paper "Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction".

Span-ASTE-Pytorch This repository is a pytorch version that implements Ali's ACL 2021 research paper Learning Span-Level Interactions for Aspect Senti

10 Dec 6, 2022

MemStream: Memory-Based Anomaly Detection in Multi-Aspect Streams with Concept Drift

MemStream Implementation of MemStream: Memory-Based Anomaly Detection in Multi-Aspect Streams with Concept Drift . Siddharth Bhatia, Arjit Jain, Shivi

61 Dec 2, 2022

Semi-supervised Learning for Sentiment Analysis

Neural-Semi-supervised-Learning-for-Text-Classification-Under-Large-Scale-Pretraining Code, models and Datasets for《Neural Semi-supervised Learning fo

47 Jan 1, 2023

This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.

Multimodal Deep Learning Announcing the multimodal deep learning repository that contains implementation of various deep learning-based model

Deep Cognition and Language Research (DeCLaRe) Lab

398 Dec 30, 2022

This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.

MultiModal-InfoMax This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Informa

89 Dec 26, 2022

A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks

pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks A Transformer-based library for SocialNLP classification tasks. Currently

298 Jan 7, 2023

This is our ARTS test set, an enriched test set to probe Aspect Robustness of ABSA.

This is the repository for our 2020 paper "Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis". Data We provide

35 Nov 16, 2022

Code for the paper "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021)

MASTER-PyTorch PyTorch reimplementation of "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021). This projec

255 Dec 29, 2022

This project is a re-implementation of MASTER: Multi-Aspect Non-local Network for Scene Text Recognition by MMOCR

This project is a re-implementation of MASTER: Multi-Aspect Non-local Network for Scene Text Recognition by MMOCR, which is an open-source toolbox based on PyTorch. The overall architecture will be shown below.

82 Nov 17, 2022

Fuzzing JavaScript Engines with Aspect-preserving Mutation

DIE Repository for "Fuzzing JavaScript Engines with Aspect-preserving Mutation" (in S&P'20). You can check the paper for technical details. Environmen

190 Dec 11, 2022

Source Code for our paper: Understand me, if you refer to Aspect Knowledge: Knowledge-aware Gated Recurrent Memory Network

KaGRMN-DSG_ABSA This repository contains the PyTorch source Code for our paper: Understand me, if you refer to Aspect Knowledge: Knowledge-aware Gated

4 May 20, 2022

OpenDelta - An Open-Source Framework for Parameter-Efficient Tuning.

OpenDelta is a toolkit for parameter-efficient methods (which we dub delta tuning), with which users can flexibly assign (or add) a small number of parameters to update while keeping most parameters frozen. With OpenDelta, users can easily implement prefix-tuning, adapters, LoRA, or any other type of delta tuning with their preferred PTMs.

386 Dec 26, 2022

⚡ Fast • 🪶 Lightweight • 0️⃣ Dependency • 🔌 Pluggable • 😈 TLS interception • 🔒 DNS-over-HTTPS • 🔥 Poor Man's VPN • ⏪ Reverse & ⏩ Forward • 👮🏿 "Proxy Server" framework • 🌐 "Web Server" framework • ➵ ➶ ➷ ➠ "PubSub" framework • 👷 "Work" acceptor & executor framework

Table of Contents Features Install Using PIP Stable version Development version Using Docker Stable version Development version Using HomeBrew Stable

2.2k Jan 8, 2023

Efficient electromagnetic solver based on rigorous coupled-wave analysis for 3D and 2D multi-layered structures with in-plane periodicity

Efficient electromagnetic solver based on rigorous coupled-wave analysis for 3D and 2D multi-layered structures with in-plane periodicity, such as gratings, photonic-crystal slabs, metasurfaces, surface-emitting lasers, nano-antennas, and more.

17 Dec 19, 2022

Efficient-GlobalPointer - Pytorch Efficient GlobalPointer

Introduction Thanks to Su Shen for the model; original article: https://spaces.ac.cn/archives/8877 How to run the corresponding model EfficientGlobalPoi

40 Dec 14, 2022

Code for SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics (ACL'2020).

SentiBERT Code for SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics (ACL'2020). https://arxiv.org/abs/20

66 Aug 13, 2022
