pyMorfologik - Python binding for Morfologik.

Overview

pyMorfologik is a Python binding for Morfologik.

Morfologik is a Polish morphological analyzer. For more information, see http://github.com/morfologik/morfologik-stemming/ and http://www.morfologik.blogspot.com/

Requirements

This binding works with Python 2 and Python 3.

Installation

Install it with pip:

pip install pyMorfologik

or clone it directly from GitHub:

git clone https://github.com/dmirecki/pyMorfologik.git

Usage

Currently, only simple stemming is supported:

>>> from pymorfologik import Morfologik
>>> from pymorfologik.parsing import ListParser
>>>
>>> parser = ListParser()
>>> stemmer = Morfologik()
>>> stemmer.stem(['Ala ma kota'], parser)
[(u'Ala',
  {u'Al': [u'subst:sg:acc:m1+subst:sg:gen:m1'],
   u'Ala': [u'subst:sg:nom:f'],
   u'Alo': [u'subst:sg:acc:m1+subst:sg:gen:m1']}),
 (u'ma',
  {u'mieć': [u'verb:fin:sg:ter:imperf:refl.nonrefl'],
   u'mój': [u'adj:sg:nom.voc:f:pos']}),
 (u'kota', {u'kot': [u'subst:sg:acc:m1'], u'kota': [u'subst:sg:nom:f']})]
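As a rough illustration of how the returned structure can be post-processed, the sketch below works on the literal output shown above instead of calling Morfologik (so it needs neither Java nor pyMorfologik installed). The helper names `lemmas_per_token` and `split_tags` are hypothetical and not part of pyMorfologik, and the reading of the tag strings (`+` joining alternative tags, `:` separating grammatical categories) is an assumption based on the sample output.

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers (not part of pyMorfologik). They operate on the data
# structure shown in the stem() example: a list of
# (token, {lemma: [tag_string, ...]}) tuples.


def lemmas_per_token(result):
    """Return [(token, [lemma, ...]), ...] with lemmas sorted alphabetically."""
    return [(token, sorted(analyses)) for token, analyses in result]


def split_tags(tag_string):
    """Split a tag string such as 'subst:sg:acc:m1+subst:sg:gen:m1'.

    Assumption: '+' separates alternative tags and ':' separates the
    grammatical categories inside a single tag.
    """
    return [tag.split(':') for tag in tag_string.split('+')]


# Output copied from the stem() example above.
result = [
    (u'Ala', {u'Al': [u'subst:sg:acc:m1+subst:sg:gen:m1'],
              u'Ala': [u'subst:sg:nom:f'],
              u'Alo': [u'subst:sg:acc:m1+subst:sg:gen:m1']}),
    (u'ma', {u'mieć': [u'verb:fin:sg:ter:imperf:refl.nonrefl'],
             u'mój': [u'adj:sg:nom.voc:f:pos']}),
    (u'kota', {u'kot': [u'subst:sg:acc:m1'], u'kota': [u'subst:sg:nom:f']}),
]

print(lemmas_per_token(result))
print(split_tags(u'subst:sg:acc:m1+subst:sg:gen:m1'))
```

Sorting the lemmas only makes the output deterministic; drop `sorted` to keep the dictionary's own iteration order.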

Acknowledgements

This repo is based on Morfologik, a great contribution by Marcin Miłkowski (http://marcinmilkowski.pl) and Dawid Weiss (http://www.dawidweiss.com).

Contributors

Damian Mirecki

Adrian Bohdanowicz


Comments
  • Changes proposal

    Hi,

    Your module caught my interest. I would like to propose a couple of updates/changes, the main ones being:

    1. make it Python 2 compliant;
    2. change the way the output is generated: instead of a dictionary, return a list of tuples. For example, Morfologik().get_simple_stem(['Ala ma kota']) would return: [('Ala', ['Ala', 'Al', 'Alo']), ('ma', ['mieć', 'mój']), ('kota', ['kota', 'kot', 'kot', 'kot'])]
    3. update the description accordingly.

    In my opinion this should be more in line with the output of the original Morfologik. One could easily convert it to the way your original proposal defined it.

    I have forked your repo and would make the necessary changes there. Let me know if you are okay with the proposal. If so, I will open a pull request; otherwise, I'll continue development in a separate repo. In any case, thanks for your work! ;)

    BR, Adrian

    opened by adibo
  • Parsing class introduced

    A parsing class is introduced to allow customized parsing of the Morfologik output, along with small adjustments following the discussion in https://github.com/dmirecki/pyMorfologik/issues/1

    opened by adibo
  • Problem on Debian

    Hi. When I install pyMorfologik on a Debian server with Python 3.6, I get these errors:

    root@whites:/var/www/dev/var/stemming# pip3 install pyMorfologik
    Exception:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/pip/basecommand.py", line 215, in main
        status = self.run(options, args)
      File "/usr/local/lib/python3.6/site-packages/pip/commands/install.py", line 272, in run
        with self._build_session(options) as session:
      File "/usr/local/lib/python3.6/site-packages/pip/basecommand.py", line 72, in _build_session
        insecure_hosts=options.trusted_hosts,
      File "/usr/local/lib/python3.6/site-packages/pip/download.py", line 329, in __init__
        self.headers["User-Agent"] = user_agent()
      File "/usr/local/lib/python3.6/site-packages/pip/download.py", line 93, in user_agent
        from pip._vendor import distro
      File "/usr/local/lib/python3.6/site-packages/pip/_vendor/distro.py", line 1050, in <module>
        _distro = LinuxDistribution()
      File "/usr/local/lib/python3.6/site-packages/pip/_vendor/distro.py", line 594, in __init__
        if include_lsb else {}
      File "/usr/local/lib/python3.6/site-packages/pip/_vendor/distro.py", line 931, in _get_lsb_release_info
        raise subprocess.CalledProcessError(code, cmd, stdout, stderr)
    subprocess.CalledProcessError: Command 'lsb_release -a' returned non-zero exit status 126.

    Traceback (most recent call last):
      File "/usr/local/bin/pip3", line 11, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.6/site-packages/pip/__init__.py", line 248, in main
        return command.main(cmd_args)
      File "/usr/local/lib/python3.6/site-packages/pip/basecommand.py", line 251, in main
        timeout=min(5, options.timeout)) as session:
      File "/usr/local/lib/python3.6/site-packages/pip/basecommand.py", line 72, in _build_session
        insecure_hosts=options.trusted_hosts,
      File "/usr/local/lib/python3.6/site-packages/pip/download.py", line 329, in __init__
        self.headers["User-Agent"] = user_agent()
      File "/usr/local/lib/python3.6/site-packages/pip/download.py", line 93, in user_agent
        from pip._vendor import distro
      File "/usr/local/lib/python3.6/site-packages/pip/_vendor/distro.py", line 1050, in <module>
        _distro = LinuxDistribution()
      File "/usr/local/lib/python3.6/site-packages/pip/_vendor/distro.py", line 594, in __init__
        if include_lsb else {}
      File "/usr/local/lib/python3.6/site-packages/pip/_vendor/distro.py", line 931, in _get_lsb_release_info
        raise subprocess.CalledProcessError(code, cmd, stdout, stderr)
    subprocess.CalledProcessError: Command 'lsb_release -a' returned non-zero exit status 126.
    root@whites:/var/www/dev/var/stemming#

    Do you know what it is?

    opened by michalmolenda