LexSubGen
Lexical Substitution Framework
This repository contains the code to reproduce the results from the paper:
Arefyev Nikolay, Sheludko Boris, Podolskiy Alexander, Panchenko Alexander, "Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution", Proceedings of the 28th International Conference on Computational Linguistics, 2020
Installation
Clone the LexSubGen repository from github.com.
git clone https://github.com/Samsung/LexSubGen
cd LexSubGen
Set up the Anaconda environment
- Download and install conda
- Create new conda environment
conda create -n lexsubgen python=3.7.4
- Activate conda environment
conda activate lexsubgen
- Install requirements
pip install -r requirements.txt
- Download spaCy resources and install context2vec and word_forms from their GitHub repositories
./init.sh
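Optionally, you can sanity-check the environment after init.sh finishes. The commands below only import the installed packages; the word_forms call assumes its get_word_forms helper, as documented in that package's README:
python -c "import spacy; print(spacy.__version__)"
python -c "from word_forms.word_forms import get_word_forms; print(get_word_forms('substitute'))"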
Set up the Web Application
If you do not plan to use the Web Application, skip this section and go to the next!
- Download and install NodeJS and npm (a quick version check is shown after the script below).
- Run the script to install dependencies and create the build files.
bash web_app_setup.sh
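If the script fails, first confirm that NodeJS and npm are available on your PATH:
node --version
npm --version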
Install lexsubgen library
python setup.py install
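As a minimal smoke test of the installation, check that the package imports (no specific API is assumed):
python -c "import lexsubgen"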
Results
Results of the lexical substitution task are presented in the following table. To reproduce them, follow the instructions above to install the correct dependencies.
Model | SemEval GAP | SemEval P@1 | SemEval P@3 | SemEval R@10 | CoInCo GAP | CoInCo P@1 | CoInCo P@3 | CoInCo R@10 |
---|---|---|---|---|---|---|---|---|
OOC | 44.65 | 16.82 | 12.83 | 18.36 | 46.3 | 19.58 | 15.03 | 12.99 |
C2V | 55.82 | 7.79 | 5.92 | 11.03 | 48.32 | 8.01 | 6.63 | 7.54 |
C2V+embs | 53.39 | 28.01 | 21.72 | 33.52 | 50.73 | 29.64 | 24.0 | 21.97 |
ELMo | 53.66 | 11.58 | 8.55 | 13.88 | 49.47 | 13.58 | 10.86 | 11.35 |
ELMo+embs | 54.16 | 32.0 | 22.2 | 31.82 | 52.22 | 35.96 | 26.62 | 23.8 |
BERT | 54.42 | 38.39 | 27.73 | 39.57 | 50.5 | 42.56 | 32.64 | 28.73 |
BERT+embs | 53.87 | 41.64 | 30.59 | 43.88 | 50.85 | 46.05 | 35.63 | 31.67 |
RoBERTa | 56.74 | 32.25 | 24.26 | 36.65 | 50.82 | 35.12 | 27.35 | 25.41 |
RoBERTa+embs | 58.74 | 43.19 | 31.19 | 44.61 | 54.6 | 46.54 | 36.17 | 32.1 |
XLNet | 59.12 | 31.75 | 22.83 | 34.95 | 53.39 | 38.16 | 28.58 | 26.47 |
XLNet+embs | 59.62 | 49.53 | 34.9 | 47.51 | 55.63 | 51.5 | 39.92 | 35.12 |
Results reproduction
Below are the XLNet reproduction commands that correspond to the results in the table above. Reproduction commands for all models can be found in scripts/lexsub-all-models.sh.
Besides being saved to the run directory (--run-dir), all results are logged with mlflow. To inspect them, run mlflow ui in the LexSubGen directory and then open the web page in a browser.
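For example (mlflow ui serves on port 5000 by default; a different port is chosen here to avoid clashing with the Web application):
cd LexSubGen
mlflow ui --port 5001
# then open http://localhost:5001 in your browser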
You can also use pytest to check reproducibility, but this may take a long time:
pytest tests/results_reproduction
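If you only want to verify a subset of the models, pytest's -k keyword filter can narrow the run; the keyword below is an assumption about how the reproduction tests are named:
# hypothetical: run only the XLNet-related reproduction tests
pytest tests/results_reproduction -k xlnet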
- XLNet:
XLNet SemEval07:
python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet.jsonnet --dataset-config-path configs/dataset_readers/lexsub/semeval_all.jsonnet --run-dir='debug/lexsub-all-models/semeval_all_xlnet' --force --experiment-name='lexsub-all-models' --run-name='semeval_all_xlnet'
XLNet CoInCo:
python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet.jsonnet --dataset-config-path configs/dataset_readers/lexsub/coinco.jsonnet --run-dir='debug/lexsub-all-models/coinco_xlnet' --force --experiment-name='lexsub-all-models' --run-name='coinco_xlnet'
XLNet with embeddings similarity SemEval07:
python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet_embs.jsonnet --dataset-config-path configs/dataset_readers/lexsub/semeval_all.jsonnet --run-dir='debug/lexsub-all-models/semeval_all_xlnet_embs' --force --experiment-name='lexsub-all-models' --run-name='semeval_all_xlnet_embs'
XLNet with embeddings similarity CoInCo:
python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet_embs.jsonnet --dataset-config-path configs/dataset_readers/lexsub/coinco.jsonnet --run-dir='debug/lexsub-all-models/coinco_xlnet_embs' --force --experiment-name='lexsub-all-models' --run-name='coinco_xlnet_embs'
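The other models from the results table follow the same pattern; only the substitute-generator config changes. As a sketch, assuming a bert.jsonnet config exists alongside xlnet.jsonnet (see scripts/lexsub-all-models.sh for the authoritative file names):
python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/bert.jsonnet --dataset-config-path configs/dataset_readers/lexsub/semeval_all.jsonnet --run-dir='debug/lexsub-all-models/semeval_all_bert' --force --experiment-name='lexsub-all-models' --run-name='semeval_all_bert'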
Word Sense Induction Results
Model | SemEval 2013 (AVG) | SemEval 2010 (AVG) |
---|---|---|
XLNet | 33.4 | 52.1 |
XLNet+embs | 37.3 | 54.1 |
To reproduce these results, use transformers version 2.3.0 and the following command:
bash scripts/wsi.sh
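If your environment currently has a different transformers version, one way to pin it for this run (this reinstalls the package inside the active conda environment):
pip install transformers==2.3.0
bash scripts/wsi.sh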
Web application
You can use the command-line interface to run the Web application.
# Run main server
lexsubgen-app run --host HOST
--port PORT
[--model-configs CONFIGS]
[--start-ids START-IDS]
[--start-all]
[--restore-session]
Example:
# Run the server and serve the BERT and XLNet models.
# For BERT, both the model server and the substitute generator are created immediately (resources are loaded into memory).
# For XLNet, only the model server is created.
lexsubgen-app run --host '0.0.0.0' \
                  --port 5000 \
                  --model-configs '["my_cool_configs/bert.jsonnet", "my_awesome_configs/xlnet.jsonnet"]' \
                  --start-ids '[0]'
# After the server shuts down, a JSON file with the session is dumped to '~/.cache/lexsubgen/app_session.json'.
# The content of this file looks like:
# [
# 'my_cool_configs/bert.jsonnet',
# 'my_awesome_configs/xlnet.jsonnet',
# ]
# You can restore the session with the '--restore-session' flag:
lexsubgen-app run --host '0.0.0.0' \
                  --port 5000 \
                  --restore-session
# BERT and XLNet are now restored
Arguments:
Argument | Default | Description |
---|---|---|
--help | | Show this help message and exit |
--host | | IP address of the running server host |
--port | 5000 | Port for starting the server |
--model-configs | [] | List of file paths to the model configs |
--start-ids | [] | Zero-based indices of the served models for which substitute generators will be created |
--start-all | False | Whether to create substitute generators for all served models |
--restore-session | False | Whether to restore the session from the previous Web application run |
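For example, reusing the placeholder config paths from the example above, the following serves both models and creates substitute generators for all of them at startup instead of listing indices in --start-ids:
lexsubgen-app run --host '0.0.0.0' \
                  --port 5000 \
                  --model-configs '["my_cool_configs/bert.jsonnet", "my_awesome_configs/xlnet.jsonnet"]' \
                  --start-all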
FAQ
- How do I use a GPU? Set the CUDA_VISIBLE_DEVICES environment variable to select the GPU used for inference:
export CUDA_VISIBLE_DEVICES='1'
or prepend CUDA_VISIBLE_DEVICES='1' to your command (see the example after this list).
- How do I run the tests? You can use pytest:
pytest tests
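For example, a GPU-pinned variant of the XLNet SemEval07 reproduction command from above:
CUDA_VISIBLE_DEVICES='1' python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet.jsonnet --dataset-config-path configs/dataset_readers/lexsub/semeval_all.jsonnet --run-dir='debug/lexsub-all-models/semeval_all_xlnet' --force --experiment-name='lexsub-all-models' --run-name='semeval_all_xlnet'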