Part of Speech Tagging using Hidden Markov Model (HMM) POS Tagger and Brill Tagger
In this project, our aim is to tune, compare, and contrast the performance of the Hidden Markov Model (HMM) POS tagger and the Brill POS tagger. To perform this task, we will train these two taggers using data from a specific domain and test their accuracy in predicting tag sequences from data belonging to the same domain and data from a different domain.
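As a rough illustration of what training the two taggers can look like, here is a minimal sketch. It assumes the project builds on NLTK's `HiddenMarkovModelTrainer` and `BrillTaggerTrainer`; the source lists NLTK as a dependency but does not show its code. The toy corpus, the function names `train_hmm` and `train_brill`, the unigram baseline, the `brill24` template set, and `max_rules=10` are all illustrative assumptions, not the project's actual settings. HMM tagging in NLTK relies on NumPy, which the scikit-learn install pulls in.

```python
from nltk.tag import DefaultTagger, UnigramTagger
from nltk.tag.brill import brill24
from nltk.tag.brill_trainer import BrillTaggerTrainer
from nltk.tag.hmm import HiddenMarkovModelTrainer

# Toy tagged corpus (hypothetical); the real training data is in data/train.txt.
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]


def train_hmm(sents):
    """Supervised HMM training with NLTK's default maximum-likelihood estimator."""
    return HiddenMarkovModelTrainer().train_supervised(sents)


def train_brill(sents, max_rules=10):
    """Brill tagger that learns correction rules on top of a unigram baseline."""
    # Assumption: a unigram tagger backed by a constant "NN" guess is the baseline.
    baseline = UnigramTagger(sents, backoff=DefaultTagger("NN"))
    trainer = BrillTaggerTrainer(baseline, brill24(), trace=0)
    return trainer.train(sents, max_rules=max_rules)


hmm_tagger = train_hmm(train_sents)
brill_tagger = train_brill(train_sents)

# Both taggers expose the same tag() interface: a list of words in,
# a list of (word, tag) pairs out.
sentence = ["the", "dog", "sleeps"]
print(hmm_tagger.tag(sentence))
print(brill_tagger.tag(sentence))
```

On a corpus this small the unigram baseline already tags the training data correctly, so the Brill trainer has no errors to learn rules from. Tuning the estimator for the HMM and the templates or rule limits for the Brill tagger only becomes meaningful on the real training data.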
How to Execute?
To run this project:
- Download the repository as a zip file.
- Extract the zip to get the project folder.
- Open a terminal in the directory where you extracted the project folder.
- Change directory to the project folder using:
  cd part-of-speech-taggers-main
- Install the required libraries, NLTK and scikit-learn, using the following commands:
  pip3 install nltk
  pip3 install -U scikit-learn
- To execute the code, use either of the following commands from the project folder:
  HMM Tagger Predictions: python3 src/main.py --tagger hmm --train data/train.txt --test data/test.txt --output output/test_hmm.txt
  Brill Tagger Predictions: python3 src/main.py --tagger brill --train data/train.txt --test data/test.txt --output output/test_brill.txt
Description of the execution command
Our program, src/main.py, takes four command-line options: --tagger selects the tagger type, --train gives the path to the training corpus, --test gives the path to the test corpus, and --output gives the path of the output file.
The two possible values for the --tagger option are:
- hmm for the Hidden Markov Model POS Tagger
- brill for the Brill POS Tagger
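The four options described above could be parsed with Python's standard `argparse` module, roughly as follows. This is a sketch of one way to do it, not the actual contents of src/main.py. The function name `build_parser`, the help strings, and the choice to make every option required are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the four options described above (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Train a POS tagger and write its predictions to a file."
    )
    # Restricting choices makes argparse reject any tagger other than hmm or brill.
    parser.add_argument("--tagger", required=True, choices=["hmm", "brill"],
                        help="tagger type: hmm or brill")
    parser.add_argument("--train", required=True, help="path to the training corpus")
    parser.add_argument("--test", required=True, help="path to the test corpus")
    parser.add_argument("--output", required=True, help="path of the output file")
    return parser


# Parse an explicit argument list here; a real script would call parse_args()
# with no arguments so that the values come from the command line.
args = build_parser().parse_args([
    "--tagger", "hmm",
    "--train", "data/train.txt",
    "--test", "data/test.txt",
    "--output", "output/test_hmm.txt",
])
print(args.tagger, args.output)  # prints: hmm output/test_hmm.txt
```

With `choices` set, passing an unsupported value such as `--tagger crf` makes argparse print a usage message and exit instead of reaching the tagging code.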
The training data can be found in data/train.txt, the in-domain test data can be found in data/test.txt, and the out-of-domain test data can be found in data/test_ood.txt.
The output file must be generated in the output/ directory.
With these paths, one possible execution command is:
python3 src/main.py --tagger hmm --train data/train.txt --test data/test.txt --output output/test_hmm.txt
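Comparing a tagger's in-domain run (data/test.txt) with its out-of-domain run (data/test_ood.txt) comes down to scoring predicted tags against gold tags. The sketch below computes token-level accuracy on in-memory tag sequences. The function name `token_accuracy` and the sample tags are hypothetical. The file format is not described here, so no parsing is shown, and the project itself may compute its scores differently, for example with scikit-learn.

```python
def token_accuracy(gold_sents, pred_sents):
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold_sents) != len(pred_sents):
        raise ValueError("gold and predicted data have different sentence counts")
    correct = 0
    total = 0
    for gold, pred in zip(gold_sents, pred_sents):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted sentences differ in length")
        correct += sum(g == p for g, p in zip(gold, pred))
        total += len(gold)
    # An empty test set has no tokens to score; report 0.0 rather than divide by zero.
    return correct / total if total else 0.0


# Hypothetical gold and predicted tag sequences for two sentences.
gold = [["DT", "NN", "VBZ"], ["DT", "NN"]]
pred = [["DT", "NN", "NN"], ["DT", "NN"]]
print(token_accuracy(gold, pred))  # prints: 0.8
```

Running the same function on the predictions for both test sets gives two directly comparable numbers per tagger, which is the in-domain versus out-of-domain contrast the project is after.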