TSLM-DISCOURSE-MARKERS
Scope
This repository contains:
(1) Code to extract discourse markers from Wikipedia (TSA).
(2) Code to extract significant discourse markers from predictions over a sample.
Usage
Evaluation code:
Installation
Using pip:
pip install git+ssh://git@github.com/IBM/tslm-discourse-markers.git#egg=tslm-discourse-markers
Alternatively, you can first clone the code, and install the requirements:
1. git clone git@github.com:IBM/tslm-discourse-markers.git
2. cd tslm-discourse-markers
3. pip install -r requirements.txt
You also need to download the fastText model:
curl https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin -o ~/Downloads/lid.176.bin
and the spaCy English model:
python -m spacy download en_core_web_sm
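Before running anything, it can help to confirm that both downloads are in place. The following is a minimal sketch, not part of the repository: `check_prerequisites` and `report` are hypothetical helper names, and the default path simply assumes the fastText model was saved where the curl command above puts it.

```python
import importlib.util
from pathlib import Path

# Assumption: the fastText model was saved to the path used by the curl command.
DEFAULT_FASTTEXT_MODEL = Path.home() / "Downloads" / "lid.176.bin"


def check_prerequisites(fasttext_model=DEFAULT_FASTTEXT_MODEL):
    """Return a dict mapping each prerequisite to True if it appears to be available."""
    return {
        # The language identification model is a plain file on disk.
        "fasttext model file": Path(fasttext_model).is_file(),
        # spaCy itself must be importable.
        "spacy package": importlib.util.find_spec("spacy") is not None,
        # `spacy download` installs the English model as an importable package.
        "en_core_web_sm model": importlib.util.find_spec("en_core_web_sm") is not None,
    }


def report(status):
    """Print one line per prerequisite, marking missing ones."""
    for name, ok in status.items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

Calling `report(check_prerequisites())` prints one line per prerequisite. If the model was saved elsewhere, pass that path as the argument.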
Running
Citing tslm-discourse-markers
If you are using tslm-discourse-markers in a publication, please cite the following paper:
Liat Ein-Dor, Ilya Shnayderman, Artem Spector, Lena Dankin, Ranit Aharonov and Noam Slonim. 2022. Fortunately, Discourse Markers Can Enhance Language Models for Sentiment Analysis. AAAI-2022.
Model
The SenDM model can be found at: https://huggingface.co/ibm/tslm-discourse-markers
Loading dataset
import datasets
directory = 'dataset/WIKI_ENGLISH'
dataset = datasets.load_dataset('csv', data_files={folder: [f'{directory}/{folder}/{folder}_*.csv.gz'] for folder in ['train', 'dev', 'test']})
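The `data_files` argument maps each split name to a glob pattern over that split's compressed CSV shards. The sketch below separates that mapping into its own function so it can be inspected before loading. `build_data_files` and `load_wiki_english` are hypothetical helper names, not part of the repository, and the directory layout is assumed to match the snippet above.

```python
def build_data_files(directory, splits=("train", "dev", "test")):
    """Map each split name to the glob pattern matching its compressed CSV shards."""
    return {split: [f"{directory}/{split}/{split}_*.csv.gz"] for split in splits}


def load_wiki_english(directory="dataset/WIKI_ENGLISH"):
    """Load all splits with the Hugging Face `datasets` CSV loader."""
    # Imported lazily so build_data_files works without the `datasets` package.
    import datasets

    return datasets.load_dataset("csv", data_files=build_data_files(directory))
```

For example, `build_data_files('dataset/WIKI_ENGLISH')['train']` is `['dataset/WIKI_ENGLISH/train/train_*.csv.gz']`. `load_wiki_english()` should then return a `DatasetDict` keyed by `train`, `dev`, and `test`.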
Contributing
This project welcomes external contributions. If you would like to contribute, please see further instructions here.
Pull requests are very welcome! Make sure your patches are well tested. Ideally create a topic branch for every separate change you make. For example:
- Fork the repo
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Added some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request
Changelog
Major changes are documented here.
Notes
If you have any questions or issues you can create a new issue here.
License
This code is distributed under Apache License 2.0. If you would like to see the detailed LICENSE click here.
Authors
The YASO dataset was collected by Liat Ein-Dor, Ilya Shnayderman, Artem Spector, Lena Dankin, Ranit Aharonov and Noam Slonim.
The code was written by Ilya Shnayderman.