Language Identifier
What is this?
The goal of this project is to build a model that predicts the language of a given sentence. The pipeline tokenizes the text, represents each sentence as a vector, and trains a classifier based on recurrent architectures (RNN, LSTM, and GRU) that detects one of 17 languages.
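As a rough illustration of the tokenizing and vector-representation step, here is a minimal pure-Python sketch. The helper names (`tokenize`, `build_vocab`, `encode`), the vocabulary size, and the padding length are assumptions made for this example and are not taken from the project code, which may use a library tokenizer instead.

```python
import re
from collections import Counter

def tokenize(sentence):
    # Lowercase and split on runs of word characters (Unicode-aware in Python 3).
    return re.findall(r"\w+", sentence.lower())

def build_vocab(sentences, max_words=10000):
    # Hypothetical helper: id 0 is reserved for padding, id 1 for unknown tokens.
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    vocab = {"<pad>": 0, "<oov>": 1}
    for word, _ in counts.most_common(max_words):
        vocab[word] = len(vocab)
    return vocab

def encode(sentence, vocab, max_len=8):
    # Map tokens to integer ids, truncate to max_len, then right-pad with zeros.
    ids = [vocab.get(tok, 1) for tok in tokenize(sentence)][:max_len]
    return ids + [0] * (max_len - len(ids))

# Toy corpus used only for illustration (not from the dataset).
corpus = ["Hello world", "Bonjour le monde", "Hola mundo"]
vocab = build_vocab(corpus)
encoded = [encode(s, vocab, max_len=4) for s in corpus]
```

Each sentence becomes a fixed-length list of integer ids, which is the kind of input an embedding layer followed by an RNN, LSTM, or GRU layer expects.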
Dataset
Language Detection: a small language detection dataset containing text samples in 17 different languages.
Results
- All models achieved high accuracy, even when a single convolution layer was used instead of LSTM or GRU. GRU achieved the highest accuracy: 99% training accuracy and 94% validation accuracy.
- Using a convolution layer also achieved high accuracy, about 95% validation accuracy.
- Using fewer embedding dimensions lets the model reach high accuracy faster, but in the Embedding Projector a lot of words end up grouped with words from other languages.
32 Embedding dimensions examples
3 Embedding dimensions examples
GRU Accuracy and Loss
GRU Confusion matrix
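The model variants compared in the results above could be sketched in Keras roughly as follows. This is not the project's actual architecture: the vocabulary size, sequence length, layer widths, and the `build_model` helper are assumptions chosen for illustration. Only the 17 output classes and the 32-dimensional embedding come from this README.

```python
import numpy as np
import tensorflow as tf

NUM_LANGUAGES = 17   # from the project description
EMBED_DIM = 32       # one of the embedding sizes mentioned in the results
VOCAB_SIZE = 10000   # assumed, not from the project
MAX_LEN = 50         # assumed padded sentence length

def build_model(kind="gru"):
    # Hypothetical helper: embedding -> sequence layer -> softmax over languages.
    layers = [tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)]
    if kind == "gru":
        layers.append(tf.keras.layers.GRU(64))
    elif kind == "lstm":
        layers.append(tf.keras.layers.LSTM(64))
    elif kind == "conv":
        # A single 1D convolution, pooled over time to a fixed-size vector.
        layers.append(tf.keras.layers.Conv1D(64, 5, activation="relu"))
        layers.append(tf.keras.layers.GlobalMaxPooling1D())
    else:
        raise ValueError(f"unknown model kind: {kind}")
    layers.append(tf.keras.layers.Dense(NUM_LANGUAGES, activation="softmax"))
    model = tf.keras.Sequential(layers)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer language labels
        metrics=["accuracy"],
    )
    return model

# Run a dummy batch of two padded sentences through the untrained GRU variant.
dummy_batch = np.zeros((2, MAX_LEN), dtype="int32")
probs = build_model("gru")(dummy_batch).numpy()
```

Swapping `kind` between `"gru"`, `"lstm"`, and `"conv"` changes only the sequence layer, which makes it easy to compare the variants under otherwise identical settings.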
Libraries
- TensorFlow
- Scikit-learn
- NumPy
- Pandas
- Matplotlib