Japanese synonym library

Overview

chikkarpy

PyPi version test

chikkarpyはchikkarのPython版です。 chikkarpy is a Python version of chikkar.

chikkarpy は Sudachi 同義語辞書を利用し、SudachiPyの出力に同義語展開を追加するために開発されたライブラリです。

単体でも同義語辞書の検索ツールとして利用できます。

利用方法 Usage

TL;DR

$ pip install chikkarpy

$ echo "閉店" | chikkarpy
閉店    クローズ,close,店仕舞い

Step 1. chikkarpyのインストール

$ pip install chikkarpy

Step 2. 使用方法

コマンドライン

$ echo "閉店" | chikkarpy
閉店    クローズ,close,店仕舞い

chikkarpyは入力された単語を見て一致する同義語のリストを返します。 同義語辞書内の曖昧性フラグが1の見出し語をトリガーにすることはできません。 出力はクエリ\t同義語リストの形式です。

$ chikkarpy search -h
usage: chikkarpy search [-h] [-d [file [file ...]]] [-ev] [-o file] [-v]
                        [file [file ...]]

Search synonyms

positional arguments:
  file                  text written in utf-8

optional arguments:
  -h, --help            show this help message and exit
  -d [file [file ...]]  synonym dictionary (default: system synonym
                        dictionary)
  -ev                   Enable verb and adjective synonyms.
  -o file               the output file
  -v, --version         print chikkarpy version

自分で用意したユーザー辞書を使いたい場合は-dで読み込むバイナリ辞書を指定できます。 (バイナリ辞書のビルドは辞書の作成を参照してください。) 複数辞書を読み込む場合は順番に注意してください。 以下の場合,user2 > user > system の順で同義語を検索して見つかった時点で検索結果を返します。

chikkarpy -d system.dic user.dic user2.dic

また、出力はデフォルトで体言のみです。 用言も出力したい場合は-evを有効にしてください。

$ echo "開放" | chikkarpy
開放	オープン,open
$ echo "開放" | chikkarpy -ev
開放	開け放す,開く,オープン,open

python ライブラリ

使用例

from chikkarpy import Chikkar
from chikkarpy.dictionarylib import Dictionary

chikkar = Chikkar()

system_dic = Dictionary("system.dic", False)
chikkar.add_dictionary(system_dic)

print(chikkar.find("閉店"))
# => ['クローズ', 'close', '店仕舞い']

print(chikkar.find("閉店", group_ids=[5])) # グループIDによる検索
# => ['クローズ', 'close', '店仕舞い']

print(chikkar.find("開放"))
# => ['オープン', 'open']

chikkar.enable_verb() # 用言の出力制御(デフォルトは体言のみ出力)
print(chikkar.find("開放"))
# => ['開け放す', '開く', 'オープン', 'open']

chikkar.add_dictionary()で複数の辞書を読み込ませる場合は順番に注意してください。 最後に読み込んだ辞書を優先して検索します。

辞書の作成 Build a dictionary

新しく辞書を追加する場合は、利用前にバイナリ形式辞書の作成が必要です。 Before using new dictionary, you need to create a binary format dictionary.

$ chikkarpy build -i synonym_dict.csv -o system.dic 
$ chikkarpy build -h
usage: chikkarpy build [-h] -i file [-o file] [-d string]

Build Synonym Dictionary

optional arguments:
  -h, --help  show this help message and exit
  -i file     dictionary file (csv)
  -o file     output file (default: synonym.dic)
  -d string   description comment to be embedded on dictionary

開発者向け

Code Format

scripts/lint.sh を実行して、コードが正しいフォーマットかを確認してください。

flake8 flake8-import-order flake8-builtins が必要です。

Test

scripts/test.sh を実行してテストしてください。

Contact

chikkarpyはWAP Tokushima Laboratory of AI and NLPによって開発されています。

開発者やユーザーの方々が質問したり議論するためのSlackワークスペースを用意しています。

You might also like...
Script to download some free japanese lessons in portuguse from NHK
Script to download some free japanese lessons in portuguse from NHK

Nihongo_nhk This is a script to download some free japanese lessons in portuguese from NHK. It can be executed by installing the packages with: pip in

An open collection of annotated voices in Japanese language

声庭 (Koniwa): オープンな日本語音声とアノテーションのコレクション Koniwa (声庭): An open collection of annotated voices in Japanese language 概要 Koniwa(声庭)は利用・修正・再配布が自由でオープンな音声とアノテ

Japanese Long-Unit-Word Tokenizer with RemBertTokenizerFast of Transformers

Japanese-LUW-Tokenizer Japanese Long-Unit-Word (国語研長単位) Tokenizer for Transformers based on 青空文庫 Basic Usage from transformers import RemBertToken

PyJPBoatRace: Python-based Japanese boatrace tools 🚤

pyjpboatrace :speedboat: provides you with useful tools for data analysis and auto-betting for boatrace.

aMLP Transformer Model for Japanese

aMLP-japanese Japanese aMLP Pretrained Model aMLPとは、Liu, Daiらが提案する、Transformerモデルです。 ざっくりというと、BERTの代わりに使えて、より性能の良いモデルです。 詳しい解説は、こちらの記事などを参考にしてください。 この

A Japanese tokenizer based on recurrent neural networks
A Japanese tokenizer based on recurrent neural networks

Nagisa is a python module for Japanese word segmentation/POS-tagging. It is designed to be a simple and easy-to-use tool. This tool has the following

This repository has a implementations of data augmentation for NLP for Japanese.

daaja This repository has a implementations of data augmentation for NLP for Japanese: EDA: Easy Data Augmentation Techniques for Boosting Performance

Visual Automata is a Python 3 library built as a wrapper for Caleb Evans' Automata library to add more visualization features.
Visual Automata is a Python 3 library built as a wrapper for Caleb Evans' Automata library to add more visualization features.

Visual Automata Copyright 2021 Lewi Lie Uberg Released under the MIT license Visual Automata is a Python 3 library built as a wrapper for Caleb Evans'

Text to speech is a process to convert any text into voice. Text to speech project takes words on digital devices and convert them into audio. Here I have used Google-text-to-speech library popularly known as gTTS library to convert text file to .mp3 file. Hope you like my project!
Comments
  • pip install does not work under SudachiPy 0.6.x environment / SudachiPy 0.6.x の環境下で pip install が通らない

    pip install does not work under SudachiPy 0.6.x environment / SudachiPy 0.6.x の環境下で pip install が通らない

    temporary solution / 暫定的な解決方法

    Install SudachiPy 0.5.4, then chikkarpy, then reinstall the latest version of SudachiPy. SudachiPy 0.5.4 をインストールしてから、chikkarpy をインストールし、その後 SudachiPy 最新版を再インストールする。

    pip install sudachipy==0.5.4 --upgrade
    pip install sudachidict_core
    pip install chikkarpy
    pip install sudachipy --upgrade
    
    opened by Nishihara-Daiki 1
  • chikkarpy has no attribute 'dictionarylib' in certain cases

    chikkarpy has no attribute 'dictionarylib' in certain cases

    case 1: raised ERROR if call chikkarpy.dictionarylib

    $ pip install chikkarpy
    $ python
    >>> import chikkarpy
    >>> chikkarpy.dictionarylib
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: module 'chikkarpy' has no attribute 'dictionarylib'
    

    case 2: pass if use from chikkarpy import dictionarylib

    $ pip install chikkarpy
    $ python
    >>> from chikkarpy import dictionarylib
    >>> dictionarylib
    <module 'chikkarpy.dictionarylib' from '/usr/local/lib/python3.7/dist-packages/chikkarpy/dictionarylib/__init__.py'>
    

    case 3: pass if call chikkarpy.dictionarylib AFTER from chikkarpy import dictionarylib

    $ pip install chikkarpy
    $ python
    >>> import chikkarpy
    >>> from chikkarpy import dictionarylib
    >>> chikkarpy.dictionarylib
    <module 'chikkarpy.dictionarylib' from '/usr/local/lib/python3.7/dist-packages/chikkarpy/dictionarylib/__init__.py'>
    
    opened by Nishihara-Daiki 0
Releases(v0.1.1)
Owner
Works Applications
Works Applications
Analyse japanese ebooks using MeCab to determine the difficulty level for japanese learners

japanese-ebook-analysis This aim of this project is to make analysing the contents of a japanese ebook easy and streamline the process for non-technic

Christoffer Aakre 14 Jul 23, 2022
Yomichad - a Japanese pop-up dictionary that can display readings and English definitions of Japanese words

Yomichad is a Japanese pop-up dictionary that can display readings and English definitions of Japanese words, kanji, and optionally named entities. It is similar to yomichan, 10ten, and rikaikun in spirit, but targets qutebrowser.

Jonas Belouadi 7 Nov 7, 2022
The Sudachi synonym dictionary in Solar format.

solr-sudachi-synonyms The Sudachi synonym dictionary in Solar format. Summary Run a script that checks for updates to the Sudachi dictionary every hou

Karibash 3 Aug 19, 2022
Code for producing Japanese GPT-2 provided by rinna Co., Ltd.

japanese-gpt2 This repository provides the code for training Japanese GPT-2 models. This code has been used for producing japanese-gpt2-medium release

rinna Co.,Ltd. 491 Jan 7, 2023
Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器)

Bunkai Bunkai is a sentence boundary (SB) disambiguation tool for Japanese texts. Quick Start $ pip install bunkai $ echo -e '宿を予約しました♪!まだ2ヶ月も先だけど。早すぎ

Megagon Labs 160 Dec 23, 2022
A fast Text-to-Speech (TTS) model. Work well for English, Mandarin/Chinese, Japanese, Korean, Russian and Tibetan (so far). 快速语音合成模型,适用于英语、普通话/中文、日语、韩语、俄语和藏语(当前已测试)。

简体中文 | English 并行语音合成 [TOC] 新进展 2021/04/20 合并 wavegan 分支到 main 主分支,删除 wavegan 分支! 2021/04/13 创建 encoder 分支用于开发语音风格迁移模块! 2021/04/13 softdtw 分支 支持使用 Sof

Atomicoo 161 Dec 19, 2022
AllenNLP integration for Shiba: Japanese CANINE model

Allennlp Integration for Shiba allennlp-shiab-model is a Python library that provides AllenNLP integration for shiba-model. SHIBA is an approximate re

Shunsuke KITADA 12 Feb 16, 2022
Codes to pre-train Japanese T5 models

t5-japanese Codes to pre-train a T5 (Text-to-Text Transfer Transformer) model pre-trained on Japanese web texts. The model is available at https://hug

Megagon Labs 37 Dec 25, 2022
Auto translate textbox from Japanese to English or Indonesia

priconne-auto-translate Auto translate textbox from Japanese to English or Indonesia How to use Install python first, Anaconda is recommended Install

Aji Priyo Wibowo 5 Aug 25, 2022
Code for evaluating Japanese pretrained models provided by NTT Ltd.

japanese-dialog-transformers 日本語の説明文はこちら This repository provides the information necessary to evaluate the Japanese Transformer Encoder-decoder dialo

NTT Communication Science Laboratories 216 Dec 22, 2022