Multilingual text (NLP) processing toolkit

Overview

polyglot

Polyglot is a natural language pipeline that supports massive multilingual applications.

Features

  • Tokenization (165 Languages)
  • Language detection (196 Languages)
  • Named Entity Recognition (40 Languages)
  • Part of Speech Tagging (16 Languages)
  • Sentiment Analysis (136 Languages)
  • Word Embeddings (137 Languages)
  • Morphological analysis (135 Languages)
  • Transliteration (69 Languages)

Developer

  • Rami Al-Rfou @ rmyeid gmail com

Quick Tutorial

import polyglot
from polyglot.text import Text, Word

Language Detection

text = Text("Bonjour, Mesdames.")
print("Language Detected: Code={}, Name={}\n".format(text.language.code, text.language.name))
Language Detected: Code=fr, Name=French

Tokenization

zen = Text("Beautiful is better than ugly. "
           "Explicit is better than implicit. "
           "Simple is better than complex.")
print(zen.words)
[u'Beautiful', u'is', u'better', u'than', u'ugly', u'.', u'Explicit', u'is', u'better', u'than', u'implicit', u'.', u'Simple', u'is', u'better', u'than', u'complex', u'.']
print(zen.sentences)
[Sentence("Beautiful is better than ugly."), Sentence("Explicit is better than implicit."), Sentence("Simple is better than complex.")]

Part of Speech Tagging

text = Text(u"O primeiro uso de desobediência civil em massa ocorreu em setembro de 1906.")

print("{:<16}{}".format("Word", "POS Tag")+"\n"+"-"*30)
for word, tag in text.pos_tags:
    print(u"{:<16}{:>2}".format(word, tag))
Word            POS Tag
------------------------------
O               DET
primeiro        ADJ
uso             NOUN
de              ADP
desobediência   NOUN
civil           ADJ
em              ADP
massa           NOUN
ocorreu         ADJ
em              ADP
setembro        NOUN
de              ADP
1906            NUM
.               PUNCT

Named Entity Recognition

text = Text(u"In Großbritannien war Gandhi mit dem westlichen Lebensstil vertraut geworden")
print(text.entities)
[I-LOC([u'Gro\xdfbritannien']), I-PER([u'Gandhi'])]

Polarity

print("{:<16}{}".format("Word", "Polarity")+"\n"+"-"*30)
for w in zen.words[:6]:
    print("{:<16}{:>2}".format(w, w.polarity))
Word            Polarity
------------------------------
Beautiful        0
is               0
better           1
than             0
ugly            -1
.                0

Embeddings

word = Word("Obama", language="en")
print("Neighbors (Synonyms) of {}".format(word)+"\n"+"-"*30)
for w in word.neighbors:
    print("{:<16}".format(w))
print("\n\nThe first 10 dimensions out of the {} dimensions\n".format(word.vector.shape[0]))
print(word.vector[:10])
Neighbors (Synonyms) of Obama
------------------------------
Bush
Reagan
Clinton
Ahmadinejad
Nixon
Karzai
McCain
Biden
Huckabee
Lula


The first 10 dimensions out of the 256 dimensions

[-2.57382345  1.52175975  0.51070285  1.08678675 -0.74386948 -1.18616164
  2.92784619 -0.25694436 -1.40958667 -2.39675403]
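For intuition, the neighbors list above can be read as the nearest vectors in the embedding space. The following toy sketch ranks words by cosine similarity; it is not polyglot's implementation, and the three-dimensional vectors are made up (polyglot's actual vectors shown above have 256 dimensions):

```python
import numpy as np

# Made-up toy "embeddings"; real vectors are much higher-dimensional.
emb = {
    "Obama": np.array([1.0, 0.9, 0.1]),
    "Bush": np.array([0.9, 1.0, 0.2]),
    "banana": np.array([0.0, 0.1, 1.0]),
}

def neighbors(word, k=2):
    """Return the k words whose vectors are most similar to `word`."""
    q = emb[word]
    def cos(v):
        # cosine similarity between v and the query vector
        return float(v @ q) / (np.linalg.norm(v) * np.linalg.norm(q))
    ranked = sorted((w for w in emb if w != word),
                    key=lambda w: cos(emb[w]), reverse=True)
    return ranked[:k]

print(neighbors("Obama"))  # most similar word first
```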

Morphology

word = Text("Preprocessing is an essential step.").words[0]
print(word.morphemes)
[u'Pre', u'process', u'ing']

Transliteration

from polyglot.transliteration import Transliterator
transliterator = Transliterator(source_lang="en", target_lang="ru")
print(transliterator.transliterate(u"preprocessing"))
препрокессинг

Comments
  • ImportError: No module named 'icu'

python3.4: pip install polyglot

from polyglot.text import Text, Word

---> 11 from icu import Locale
     12 import pycld2 as cld2
     13

    ImportError: No module named 'icu'

It's not listed as a module dependency, nor is it mentioned in the README.

    opened by Fiedzia 35
  • polyglot_data on windows.

Hi, I have installed polyglot on Windows with Python 3.4. After solving some library problems, I started getting this error:

downloader.download()

Polyglot Downloader

    ---------------------------------------------------------------------------

    d) Download l) List u) Update c) Config h) Help q) Quit

    ---------------------------------------------------------------------------

    Downloader> l

Collections:

Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python34\lib\site-packages\polyglot-15.5.2-py3.4.egg\polyglot\downloader.py", line 649, in download
    self._interactive_download()
  File "C:\Python34\lib\site-packages\polyglot-15.5.2-py3.4.egg\polyglot\downloader.py", line 1068, in _interactive_download
    DownloaderShell(self).run()
  File "C:\Python34\lib\site-packages\polyglot-15.5.2-py3.4.egg\polyglot\downloader.py", line 1096, in run
    more_prompt=True)
  File "C:\Python34\lib\site-packages\polyglot-15.5.2-py3.4.egg\polyglot\downloader.py", line 459, in list
    for info in sorted(getattr(self, category)(), key=str):
  File "C:\Python34\lib\site-packages\polyglot-15.5.2-py3.4.egg\polyglot\downloader.py", line 495, in collections
    self._update_index()
  File "C:\Python34\lib\site-packages\polyglot-15.5.2-py3.4.egg\polyglot\downloader.py", line 832, in _update_index
    P = Package.fromcsobj(p)
  File "C:\Python34\lib\site-packages\polyglot-15.5.2-py3.4.egg\polyglot\downloader.py", line 232, in fromcsobj
    language = subdir.split(path.sep)[1]
IndexError: list index out of range

After some analysis (and a few neurons lost...) I found the problem: on Windows, "path.sep" is, as expected, "\" instead of "/". Since the packages are named (ID'ed) with "/", using path.sep makes no sense for Windows users. Or am I missing something I should have installed?

Replacing path.sep with "/" solves the problem and allows me to list and download any data I want into my polyglot installation.
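The workaround described above can be sketched as follows (a hypothetical illustration; the package ID "embeddings2/en" is an example, not taken from the actual index):

```python
# Package IDs in the polyglot data index always use "/" as a separator,
# regardless of the operating system.
subdir = "embeddings2/en"  # hypothetical example ID

# The original line in downloader.py splits on the OS path separator:
#     language = subdir.split(path.sep)[1]
# On Windows, path.sep is "\\", so the split returns a single element
# and indexing [1] raises IndexError.

# The workaround: split on "/" explicitly, which works on every OS.
language = subdir.split("/")[1]
print(language)
```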

    opened by xTomax 20
  • Where to get all models as one archive?

I'm trying to download models from http://whoisbigger.com/polyglot. But unfortunately it shows 0 bps after some time. Could you give me a link to an alternative download?

    opened by hodzanassredin 15
  • ImportError: ~/anaconda3/lib/python3.5/site-packages/_icu.cpython-35m-x86_64-linux-gnu.so: undefined symbol: _ZTIN6icu_5714LEFontInstanceE

    Can't run polyglot project, please help

    ImportError: ~/anaconda3/lib/python3.5/site-packages/_icu.cpython-35m-x86_64-linux-gnu.so: undefined symbol: _ZTIN6icu_5714LEFontInstanceE

    opened by bilalbayasut 14
  • Trouble Installing

    This is using "pip install polyglot".

    I've located some useful arguments that can help here, but I'm not sure how to add them to the cc command.

Complete output from command /usr/bin/python -c "import setuptools, tokenize;file='/private/var/folders/k1/6_4k217j1ng5qnm8_vrpx1b80000gp/T/pip-build-uOkJfF/PyICU/setup.py';exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" install --record /var/folders/k1/6_4k217j1ng5qnm8_vrpx1b80000gp/T/pip-EdbjO8-record/install-record.txt --single-version-externally-managed --compile:

running install
running build
running build_py
creating build
creating build/lib.macosx-10.10-intel-2.7
copying icu.py -> build/lib.macosx-10.10-intel-2.7
copying PyICU.py -> build/lib.macosx-10.10-intel-2.7
copying docs.py -> build/lib.macosx-10.10-intel-2.7
running build_ext
building '_icu' extension
creating build/temp.macosx-10.10-intel-2.7
cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/usr/local/include -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c _icu.cpp -o build/temp.macosx-10.10-intel-2.7/_icu.o -DPYICU_VER="1.9.2"
In file included from _icu.cpp:27:
./common.h:86:10: fatal error: 'unicode/utypes.h' file not found
#include <unicode/utypes.h>
1 error generated.
error: command 'cc' failed with exit status 1

    opened by iamtrask 13
  • polyglot windows installation

I can't install polyglot on Windows 7 64-bit. I have tried Python 3.4, 3.5, and 3.6 with various versions of the PyICU, numpy, and PyCld2 modules from http://www.lfd.uci.edu/~gohlke/pythonlibs/, but still without success. If somebody was successful with a polyglot installation on Windows, could you please publish the working combination of versions for Python, PyICU, numpy, and PyCld2, and the type of installation for each (wheel, GitHub, pip, etc.)? Maybe some other tips?

    I will really appreciate your help. Thank you.

    Paul

    opened by netgateseznamcz 10
installation failed

I want to use your tool polyglot NER, but I get this error. I have tried repeatedly to fix it but could not, and I really need this tool. Can you help me, please? Thank you very much.

    opened by zainjaradat 8
  • Not able to install polyglot for Windows 10, Python version 3.6.5

    Hello. I have been trying to install Polyglot on my Windows 10 machine but to no avail. I tried to solve this error through the various issues posted here, but none of them work for me.

    It seems the issue arises when trying to install PyICU which is part of Polyglot's installation. I git cloned the repo and used python setup.py install to do so (since pip install polyglot gives an encoding error from cp1252 even though my Python's default encoding is UTF-8).

    Searching for PyICU>=1.8
    Reading https://pypi.python.org/simple/PyICU/
    Downloading https://pypi.python.org/packages/bb/ef/3a7fcbba81bfd213e479131ae21445a2ddd14b46d70ef0109640b580bc5d/PyICU-2.0.3.tar.gz#md5=f2e696a3680be895170282297e036f40
    Best match: PyICU 2.0.3
    Processing PyICU-2.0.3.tar.gz
    Writing C:\Users\me\AppData\Local\Temp\easy_install-188nt5fk\PyICU-2.0.3\setup.cfg
    Running PyICU-2.0.3\setup.py -q bdist_egg --dist-dir C:\Users\me\AppData\Local\Temp\easy_install-188nt5fk\PyICU-2.0.3\egg-dist-tmp-d_y2eb25
    
    Building PyICU 2.0.3 for ICU 2.0.3
    
    _icu.cpp
    c:\users\me\appdata\local\temp\easy_install-188nt5fk\pyicu-2.0.3\common.h(105): fatal error C1083: Cannot open include file: 'unicode/utypes.h': No such file or directory
    error: Setup script exited with error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Enterprise\\VC\\Tools\\MSVC\\14.13.26128\\bin\\HostX86\\x86\\cl.exe' failed with exit status 2
    

    Please help me out; I am totally stumped and have been trying for too long to fix this!

    P.S: This is my first ever proper issue posted, so please go easy on me and let me know what else is considered helpful in such a forum.

    opened by danyal-s 7
  • outdated model for 15.04.19?

    Hi,

    I just upgraded to polyglot 15.04.19 and it seems the model needs to be updated too.

    In [1]: from polyglot.downloader import downloader
    
    In [2]: downloader.download("embeddings2.en")
    [polyglot_data] Downloading package embeddings2.en to
    [polyglot_data]     /home/ubuntu/polyglot_data...
    Out[2]: True
    
    In [3]: downloader.download("pos2.en")
    [polyglot_data] Downloading package pos2.en to
    [polyglot_data]     /home/ubuntu/polyglot_data...
    Out[3]: True
    
    In [4]: blob = """We will meet at eight o'clock on Thursday morning."""
    
    In [5]: from polyglot.text import Text
    
    In [6]: text = Text(blob)
    
    In [7]: text.pos_tags
    Out[7]:
    [(u'We', u'INTJ'),
     (u'will', u'NOUN'),
     (u'meet', u'NOUN'),
     (u'at', u'ADP'),
     (u'eight', u'DET'),
     (u"o'clock", u'PART'),
     (u'on', u'ADP'),
     (u'Thursday', u'PART'),
     (u'morning', u'PART'),
     (u'.', u'ADV')]
    

You might want to update this too.

    opened by geovedi 7
  • Named Entity Extraction does not seem to work

    I would like to use the Named Entity Extraction of Polyglot, so I'm following the documentation at http://polyglot.readthedocs.org/en/latest/NamedEntityRecognition.html, however when I execute

    print(downloader.supported_languages_table("ner2", 3)) 
    

    I get the following error:

    Traceback (most recent call last):
      File "C:/Users/text_analyzer_polyglot.py", line 22, in <module>
        main()
      File "C:/Users/text_analyzer_polyglot.py", line 18, in main
        print(downloader.supported_languages_table("ner2", 3))
      File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 963, in supported_languages_table
        languages = self.supported_languages(task)
      File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 955, in supported_languages
        collection = self.get_collection(task=task)
      File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 934, in get_collection
        if task: raise TaskNotSupported("Task {} is not supported".format(id))
    polyglot.downloader.TaskNotSupported: Task TASK:ner2 is not supported
    

    In addition, if I try to execute:

blob = """The Israeli Prime Minister Benjamin Netanyahu has warned that Iran poses a "threat to the entire world"."""
text = Text(blob)
print(text.entities)
    

    I get the following error:

    Traceback (most recent call last):
      File "C:/Users/text_analyzer_polyglot.py", line 23, in <module>
        main()
      File "C:/Users/text_analyzer_polyglot.py", line 20, in main
        print (text.entities)
      File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 20, in __get__
        value = obj.__dict__[self.func.__name__] = self.func(obj)
      File "C:\Python27\lib\site-packages\polyglot\text.py", line 124, in entities
        for i, (w, tag) in enumerate(self.ne_chunker.annotate(self.words)):
      File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 20, in __get__
        value = obj.__dict__[self.func.__name__] = self.func(obj)
      File "C:\Python27\lib\site-packages\polyglot\text.py", line 96, in ne_chunker
        return get_ner_tagger(lang=self.language.code)
      File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 30, in memoizer
        cache[key] = obj(*args, **kwargs)
      File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 152, in get_ner_tagger
        return NEChunker(lang=lang)
      File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 99, in __init__
        super(NEChunker, self).__init__(lang=lang)
      File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 40, in __init__
        self.predictor = self._load_network()
      File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 104, in _load_network
        self.embeddings = load_embeddings(self.lang, type='cw')
      File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 30, in memoizer
        cache[key] = obj(*args, **kwargs)
      File "C:\Python27\lib\site-packages\polyglot\load.py", line 64, in load_embeddings
        p = locate_resource(src_dir, lang)
      File "C:\Python27\lib\site-packages\polyglot\load.py", line 47, in locate_resource
        if downloader.status(package_id) != downloader.INSTALLED:
      File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 730, in status
        info = self._info_or_id(info_or_id)
      File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 500, in _info_or_id
        return self.info(info_or_id)
      File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 918, in info
        raise ValueError('Package %r not found in index' % id)
    ValueError: Package u'embeddings2.en' not found in index
    

    Am I missing something in the documentation? Could you tell me how to successfully run the Named Entity Extraction?

    opened by valeriocos 6
  • The downloads server seems currently down

    Hello! When I issued this command:

    polyglot download embeddings2.en ner2.en 
    

    I received the following answer:

    [polyglot_data] Error loading embeddings2.en: HTTP Error 503: Service
    [polyglot_data]     Unavailable
    Error installing package. Retry? [n/y/e]
    

    This has been happening for about 3 days (as far as I know) and in all sorts of circumstances. I think your downloads server is down. Any thoughts?

    opened by georgiana-b 5
  • Licensing issue for polyglot

We are planning to use this library in our application. The library is licensed under the GNU General Public License v3.0, which poses a licensing risk for us. We are not making any modifications to the library's source code. Can our application be made proprietary while using GPL code? Can you please give more insight into this?

    Thanks in advance.

    opened by ShanmukhaSridhar 0
  • polyglot download failing

After installing polyglot from source with pip install -U git+https://github.com/aboSamoor/polyglot.git@master, I can't download models via the CLI or the Python library:

    >>> polyglot download
    Polyglot Downloader
    ---------------------------------------------------------------------------
      d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
    ---------------------------------------------------------------------------
    Downloader> l
    
    Collections:
    Error reading from server: HTTP Error 404: Not Found
    

    Any other alternative to access the models?

    opened by jspablo 2
  • [1]    7092 segmentation fault  python - Error is coming

After a lot of struggle I finally installed the polyglot dependencies such as PyICU. Everything installed correctly, but an error occurs when I simply execute the example from the documentation:

import polyglot
from polyglot.text import Text, Word
text = Text("Bonjour, Mesdames.")
print("Language Detected: Code={}, Name={}\n".format(text.language.code, text.language.name))

As soon as I run the print statement, I get the error [1] 7092 segmentation fault python and I am thrown out of Python.

I am using a MacBook Pro M1 with macOS Monterey 12.6 and Python 3.9.10.

I have no idea what this issue is or how to resolve it.

    opened by asifkhan69 0
  • Underscores make sentences detected as English?

This sentence is detected as French with 98% probability:

    Celles qui n'encouragent guère, emprises de jalousie.

Changing one character to an underscore:

    Celles qui n'encouragent gu_re, emprises de jalousie

gives English with 98% probability. Clearly some bug. Any ideas?

    opened by ndvbd 0
  • English to Japanese Transliteration

    Hi @aboSamoor,

Thanks for this amazing library! I have a question regarding the transliteration of English to Japanese. As you might know, Japanese contains three different types of tokens, namely Hiragana, Katakana, and Kanji. I wanted to know which type of token the transliteration from En to Ja produces here.

    Thanks.

    opened by tejassp2002 0
pip install polyglot error: subprocess exited with error

C:\Users\r>pip install polyglot
Collecting polyglot
  Downloading polyglot-16.7.4.tar.gz (126 kB)
     ---------------------------------------- 126.3/126.3 kB 1.2 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [8 lines of output]
    Traceback (most recent call last):
      File "", line 2, in
      File "", line 34, in
      File "C:\Users\r\AppData\Local\Temp\pip-install-jiuhzdjt\polyglot_bd6a0716ccdf4fd7ae7fad12136682fa\setup.py", line 15, in
        readme = readme_file.read()
      File "C:\Users\r\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 4941: character maps to
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
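The traceback above fails inside setup.py while reading the README with cp1252, the Windows default codec. A hypothetical workaround is to make that open() call request UTF-8 explicitly. The snippet below demonstrates the difference with a throwaway file ("demo_readme.txt" is an example name, not part of polyglot):

```python
# Write a UTF-8 file containing a character that cp1252 cannot decode,
# then read it back with an explicit encoding rather than relying on
# the platform default (cp1252 on Windows), which is what crashes above.
with open("demo_readme.txt", "w", encoding="utf-8") as f:
    f.write("polyglot → UTF-8")

with open("demo_readme.txt", encoding="utf-8") as readme_file:
    readme = readme_file.read()

print(readme)
```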

    opened by ryan-seitz 0
License

Free software: GPLv3 license.