PyABSA - Open & Efficient Framework for Aspect-based Sentiment Analysis
Fast, low-memory, enhanced implementation of Local Context Focus.
Built from LC-ABSA / LCF-ABSA / LCF-BERT and LCF-ATEPC.
Provides tutorials on training and using ATE and APC models.
PyTorch implementation (CPU & CUDA supported).
Tips
- PyABSA uses FindFile to locate the target file, so you can specify a dataset or checkpoint by keyword instead of an absolute path, e.g.,
dataset = 'laptop'  # instead of './SemEval/LAPTOP'; keywords are case-insensitive
checkpoint = 'lcfs' # any checkpoint whose absolute path contains lcfs
- PyABSA uses AutoCUDA to assign a CUDA device automatically, but you can still set a preferred device.
auto_device=True  # automatically assign a CUDA device for training / inference
auto_device=False  # use the CPU
auto_device='cuda:1'  # specify a preferred device
auto_device='cpu'  # specify a preferred device
- PyABSA supports automatic label fixing, so you can use any token as a label (except -999), e.g., sentiment labels = {-9, 2, negative, positive}
- Make sure the version and datasets of a checkpoint are compatible with your current PyABSA. The PyABSA version is also printed with the training args when a checkpoint is loaded.
- You can train a model on multiple datasets that share the same sentiment labels, and you can even contribute and define a combination of datasets here!
- More features are available for you to explore.
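The keyword-based lookup from the first tip can be illustrated with a minimal sketch. This is not PyABSA's FindFile implementation: the function name `find_by_keyword` is hypothetical, and matching on the path relative to the search root is a simplifying assumption made to keep the example deterministic.

```python
import os
import tempfile


def find_by_keyword(root, keyword):
    """Return sorted paths under `root` whose relative path contains `keyword`.

    Hypothetical helper: matching is case-insensitive and is done on the
    path relative to `root` (an assumption, not PyABSA's actual behavior).
    """
    needle = keyword.lower()
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if needle in os.path.relpath(path, root).lower():
                matches.append(os.path.abspath(path))
    return sorted(matches)


if __name__ == "__main__":
    # Build a throwaway directory tree that mimics './SemEval/LAPTOP'.
    demo_root = tempfile.mkdtemp()
    os.makedirs(os.path.join(demo_root, "SemEval", "LAPTOP"))
    os.makedirs(os.path.join(demo_root, "SemEval", "RESTAURANT"))
    print(find_by_keyword(demo_root, "laptop"))
```

With a lookup like this, 'laptop' and 'LAPTOP' resolve to the same directory, which is why keyword case does not matter.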
Instruction
If you would like to support the PyABSA project, please star this repository.
- Installation
- Package Overview
- Quick-Start
- Model Support
- Dataset Support
- Make Contributions
- All Examples
- Notice for LCF-BERT & LCF-ATEPC
Installation
Please do not install a version that has no corresponding release note; it may be a test version.
install via pip
To use PyABSA, install the latest version from pip or source code:
pip install -U pyabsa
install via source
git clone https://github.com/yangheng95/PyABSA --depth=1
cd PyABSA
python setup.py install
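After installing, it can help to confirm which version is present so that you can compare it against the release notes. The sketch below uses only the Python standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical and not part of PyABSA.

```python
from importlib import metadata


def installed_version(package="pyabsa"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("pyabsa is not installed")
    else:
        print(f"pyabsa {version} is installed")
```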
Package Overview
pyabsa | package root (including all interfaces) |
pyabsa.functional | recommended interface entry |
pyabsa.functional.checkpoint | checkpoint manager entry, inference model entry |
pyabsa.functional.dataset | datasets entry |
pyabsa.functional.config | predefined config manager |
pyabsa.functional.trainer | training module; every trainer returns an inference model |
Quick Start
See the demos
Aspect Polarity Classification (APC)
1. Import necessary entries
from pyabsa.functional import Trainer
from pyabsa.functional import APCConfigManager
from pyabsa.functional import APCModelList  # needed in step 3
from pyabsa.functional import ABSADatasetList
2. Choose a base param config
# Choose the base parameter config for BERT-based APC models
apc_config_english = APCConfigManager.get_apc_config_english()
3. Specify an APC model and alter some hyper-parameters (if necessary)
# Specify a Bert-based APC model
apc_config_english.model = APCModelList.SLIDE_LCFS_BERT
4. Configure runtime settings and run training
dataset_path = ABSADatasetList.SemEval  # or set your local dataset
sent_classifier = Trainer(config=apc_config_english,
                          dataset=dataset_path,  # train set and test set will be automatically detected
                          checkpoint_save_mode=1,  # set to None to avoid saving the model
                          auto_device=True  # automatically choose CUDA or CPU
                          ).load_trained_model()
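The values accepted by the auto_device argument (True, False, or a device string, as listed in the Tips) can be summarized with a small sketch. This is not PyABSA's AutoCUDA logic: `resolve_device` is a hypothetical helper, CUDA availability is passed in as a plain flag to keep the example dependency-free, and returning the generic name 'cuda' for auto_device=True is an assumption, since the device actually chosen is decided by PyABSA.

```python
def resolve_device(auto_device, cuda_available):
    """Map an auto_device setting to a device name (illustrative only).

    auto_device: True, False, or a device string such as 'cuda:1' or 'cpu'.
    cuda_available: whether a CUDA device is present on this machine.
    """
    if isinstance(auto_device, str):
        # An explicit device string is used as the preferred device.
        return auto_device
    if auto_device and cuda_available:
        # Assumption: the concrete CUDA index is selected elsewhere.
        return "cuda"
    # auto_device=False, or no CUDA device available.
    return "cpu"


if __name__ == "__main__":
    for setting in (True, False, "cuda:1", "cpu"):
        print(repr(setting), "->", resolve_device(setting, cuda_available=False))
```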
5. Sentiment inference
# batch_infer returns the results; set save_result=True to save them
inference_dataset = ABSADatasetList.SemEval  # or set your local dataset
results = sent_classifier.batch_infer(target_file=inference_dataset,
                                      print_result=True,
                                      save_result=True,
                                      ignore_error=True,
                                      )
See the APC Demos directory for detailed usage.
Aspect Term Extraction and Polarity Classification (ATEPC)
1. Import necessary entries
from pyabsa.functional import ATEPCModelList
from pyabsa.functional import Trainer, ATEPCTrainer
from pyabsa.functional import ABSADatasetList
from pyabsa.functional import ATEPCConfigManager
2. Choose a base param config
atepc_config_english = ATEPCConfigManager.get_atepc_config_english()
3. Specify an ATEPC model and alter some hyper-parameters (if necessary)
atepc_config_english.model = ATEPCModelList.LCF_ATEPC
4. Configure runtime settings and run training
laptop14 = ABSADatasetList.Laptop14
aspect_extractor = ATEPCTrainer(config=atepc_config_english,
                                dataset=laptop14
                                ).load_trained_model()
5. Aspect term extraction & sentiment inference
from pyabsa import ATEPCCheckpointManager, ABSADatasetList
# Example Chinese sentences. In English:
# "Compared with the original series, the sharpness is much higher; whether this is good or bad is debated."
# "This phone is really thin, but the color is not very nice. Overall I am very satisfied."
examples = ['相比较原系列锐度高了不少这一点好与不好大家有争议',
            '这款手机的大小真的很薄,但是颜色不太好看, 总体上我很满意啦。'
            ]
aspect_extractor = ATEPCCheckpointManager.get_aspect_extractor(checkpoint='chinese',
                                                               auto_device=True  # False means load model on CPU
                                                               )
inference_source = ABSADatasetList.SemEval
atepc_result = aspect_extractor.extract_aspect(inference_source=inference_source,
                                               save_result=True,
                                               print_result=True,  # print the result
                                               pred_sentiment=True,  # predict the sentiment of extracted aspect terms
                                               )
See the ATE Demos directory for detailed usage.
Checkpoint
How to get available checkpoints from Google Drive
PyABSA checks for the latest available checkpoints and loads the latest one from Google Drive. To view the available checkpoints, use the following code, then load a checkpoint by name:
from pyabsa import available_checkpoints
checkpoint_map = available_checkpoints()
If you cannot access Google Drive, you can download our checkpoints and load the unzipped checkpoint manually. Model download link (extraction code: ABSA)
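For the manual route, a downloaded checkpoint archive has to be unzipped before it can be loaded. The sketch below shows one way to do that with the standard library, assuming the download is a regular zip file; `unzip_checkpoint` and the file names in the demo are hypothetical.

```python
import os
import tempfile
import zipfile


def unzip_checkpoint(zip_path, target_dir):
    """Extract a downloaded checkpoint archive and return the target directory."""
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return os.path.abspath(target_dir)


if __name__ == "__main__":
    # Create a tiny stand-in archive so the demo runs without any download.
    workdir = tempfile.mkdtemp()
    demo_zip = os.path.join(workdir, "demo_checkpoint.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("demo_checkpoint/placeholder.txt", "not a real model")
    print(unzip_checkpoint(demo_zip, os.path.join(workdir, "checkpoints")))
```

The returned directory is the kind of local path that the checkpoint argument in the examples in this README is described as accepting ("or set your local checkpoint").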
How to share checkpoints (e.g., checkpoints trained on your custom dataset) with community
How to use checkpoints
1. Sentiment inference
1.1 Import necessary entries
import os
from pyabsa import APCCheckpointManager, ABSADatasetList
os.environ['PYTHONIOENCODING'] = 'UTF8'
1.2 Load the sentiment classifier from a checkpoint
sent_classifier = APCCheckpointManager.get_sentiment_classifier(checkpoint='dlcf-dca-bert1',  # or set your local checkpoint
                                                                auto_device='cuda',  # use CUDA if available
                                                                )
1.3 Configure inference settings
# batch_infer returns the results; set save_result=True to save them
inference_datasets = ABSADatasetList.Laptop14  # or set your local dataset
results = sent_classifier.batch_infer(target_file=inference_datasets,
                                      print_result=True,
                                      save_result=True,
                                      ignore_error=True,
                                      )
2. Aspect term extraction & sentiment inference
2.1 Import necessary entries
import os
from pyabsa import ABSADatasetList
from pyabsa import ATEPCCheckpointManager
os.environ['PYTHONIOENCODING'] = 'UTF8'
2.2 Load the aspect extractor from a checkpoint
aspect_extractor = ATEPCCheckpointManager.get_aspect_extractor(checkpoint='Laptop14',  # or your local checkpoint
                                                               auto_device=True  # False means load model on CPU
                                                               )
2.3 Configure extraction and inference settings
inference_dataset = ABSADatasetList.SemEval  # or set your local dataset
atepc_result = aspect_extractor.extract_aspect(inference_source=inference_dataset,
                                               save_result=True,
                                               print_result=True,  # print the result
                                               pred_sentiment=True,  # predict the sentiment of extracted aspect terms
                                               )
3. Train based on checkpoint
3.1 Import necessary entries
from pyabsa.functional import APCCheckpointManager
from pyabsa.functional import Trainer
from pyabsa.functional import APCConfigManager
from pyabsa.functional import ABSADatasetList
from pyabsa.functional import APCModelList
3.2 Choose a base param_dict
apc_config_english = APCConfigManager.get_apc_config_english()
3.3 Specify an APC model and alter some hyper-parameters (if necessary)
apc_config_english.model = APCModelList.SLIDE_LCF_BERT
3.4 Configure checkpoint
checkpoint_path = APCCheckpointManager.get_checkpoint('slide-lcf-bert')
3.5 Configure runtime settings and run training
dataset_path = ABSADatasetList.SemEval  # or set your local dataset
sent_classifier = Trainer(config=apc_config_english,
                          dataset=dataset_path,
                          from_checkpoint=checkpoint_path,
                          checkpoint_save_mode=1,
                          auto_device=True
                          ).load_trained_model()
Datasets
More datasets are available at ABSADatasets.
- Laptop14
- Restaurant14
- Restaurant15
- Restaurant16
- Phone
- Car
- Camera
- Notebook
- MAMS
- TShirt
- Television
- MOOC
- Shampoo
- Multilingual (the combination of all datasets)
You don't have to download the datasets manually; they are downloaded automatically.
Model Support
In addition to the following models, we provide a template model involving the LCF vector. You can develop your own model based on the LCF-APC model template or the LCF-ATEPC model template.
ATEPC
- LCF-ATEPC
- LCF-ATEPC-LARGE (Dual BERT)
- FAST-LCF-ATEPC
- LCFS-ATEPC
- LCFS-ATEPC-LARGE (Dual BERT)
- FAST-LCFS-ATEPC
- BERT-BASE
APC
Bert-based APC models
- SLIDE-LCF-BERT (Faster & Performs Better than LCF/LCFS-BERT)
- SLIDE-LCFS-BERT (Faster & Performs Better than LCF/LCFS-BERT)
- LCF-BERT (Reimplemented & Enhanced)
- LCFS-BERT (Reimplemented & Enhanced)
- FAST-LCF-BERT (Faster with a slight performance loss)
- FAST-LCFS-BERT (Faster with a slight performance loss)
- LCF-DUAL-BERT (Dual BERT)
- LCFS-DUAL-BERT (Dual BERT)
- BERT-BASE
- BERT-SPC
- LCA-Net
- DLCF-DCA-BERT *
Bert-based APC baseline models
- AOA_BERT
- ASGCN_BERT
- ATAE_LSTM_BERT
- Cabasc_BERT
- IAN_BERT
- LSTM_BERT
- MemNet_BERT
- MGAN_BERT
- RAM_BERT
- TD_LSTM_BERT
- TC_LSTM_BERT
- TNet_LF_BERT
GloVe-based APC baseline models
Contribution
We hope you can help us improve this project; your contributions are welcome. You can contribute in many ways, including:
- Share your custom dataset in PyABSA and ABSADatasets
- Integrate your models into PyABSA. (You can share your models whether or not they are based on PyABSA. If you are interested, we will help you.)
- Report bugs you find while using PyABSA or reviewing the code (PyABSA is an individual project driven by enthusiasm, so your help is needed)
- Give us advice on feature design or refactoring (you can suggest improvements to any feature)
- Correct or rewrite error messages and code comments (the comments are not written by a native English speaker, so you can help us improve the documentation)
- Create an example script for a particular situation (such as specifying a SpaCy model, a pretrained BERT type, or some hyperparameters)
- Star this repository to keep it active
Notice
LCF is a simple and adaptive mechanism proposed for ABSA. Many models based on LCF have been proposed and have achieved SOTA performance. Developing your models based on LCF can significantly improve your ABSA models. If you are looking for the original proposal of local context focus, please refer to the introduction of LCF. If you are looking for the original code of the LCF-related papers, please refer to LC-ABSA / LCF-ABSA or LCF-ATEPC.
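To give a rough intuition for local context focus, the sketch below weights tokens by their distance to the aspect term: tokens near the aspect are kept, and distant tokens are either masked out or down-weighted. This is a simplified illustration, not PyABSA's implementation and not the exact formulation in the LCF papers; the function names, the nearest-aspect-token distance, and the linear decay are all assumptions made for the example.

```python
def token_distances(num_tokens, aspect_start, aspect_len):
    """Distance of each token to the nearest aspect token (0 inside the aspect)."""
    aspect_end = aspect_start + aspect_len - 1
    distances = []
    for i in range(num_tokens):
        if i < aspect_start:
            distances.append(aspect_start - i)
        elif i > aspect_end:
            distances.append(i - aspect_end)
        else:
            distances.append(0)
    return distances


def mask_weights(distances, threshold):
    """Hard masking: keep tokens within the threshold (1.0), drop the rest (0.0)."""
    return [1.0 if d <= threshold else 0.0 for d in distances]


def decay_weights(distances, threshold):
    """Soft weighting: tokens beyond the threshold decay linearly with distance."""
    n = len(distances)
    return [1.0 if d <= threshold else 1.0 - (d - threshold) / n for d in distances]


if __name__ == "__main__":
    # 8 tokens, with a 2-token aspect term at positions 3 and 4.
    dist = token_distances(num_tokens=8, aspect_start=3, aspect_len=2)
    print(dist)
    print(mask_weights(dist, threshold=1))
    print(decay_weights(dist, threshold=1))
```

In a real model such weights would be multiplied into the token representations before classification, so that the prediction focuses on the words around the aspect.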
Acknowledgement
This work is built on LC-ABSA/LCF-ABSA and LCF-ATEPC, and on other impressive works such as PyTorch-ABSA and LCFS-BERT.
License
MIT
Contributors ✨
Thanks go to these wonderful people (emoji key):
XuMayi |
YangHeng |
brtgpy |
Ryan |
This project follows the all-contributors specification. Contributions of any kind welcome!