Text language identification using Wikipedia data

Vsevolod Dyomkin

Last update: Jul 9, 2022

The aim of this project is to provide high-quality language detection across all the languages of the web. Wikipedia serves as the proxy for those languages: currently, 156 languages that have their own Wikipedia are supported.

Usage

The main function is text-langs, which returns two values:

an alist mapping each language to its probability (languages are represented by their ISO 639-1 codes)
a vector of tokens with their inferred languages

WILD> (text-langs "це тест")
((:UK . 0.5000003) (:RU . 0.4999998))
#(<це - UK:1.00> <тест - RU:1.00>)

Running as a service

Installation

Install SBCL
Get Quicklisp
Clone the project with Git
$ cd wiki-lang-detect; sbcl --load run.lisp

Running in Docker

docker build -t wiki-lang-detect:latest .
docker run -it -p 5000:5000 wiki-lang-detect:latest

curl -X POST -H "Content-Type: application/json" -d "{'text': 'Несе Галя'}"  http://localhost:5000/detect | jq '.'

Alternatively, you can use a prebuilt Docker image maintained outside of this repository:

docker run -it -p 5000:5000 chaliy/wiki-lang-detect:latest
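
The curl call above writes the request body by hand, which becomes fragile once the text contains quotes. The following is a minimal sketch of a wrapper, not part of the project: json_payload, detect, and the WILD_URL variable are hypothetical names, and the sketch assumes the service accepts a standard double-quoted JSON body with a single "text" field on /detect at the port mapped above.

```shell
#!/bin/sh
# Hypothetical helper: wrap a text in a {"text": ...} JSON body.
# Only backslashes and double quotes are escaped; newlines and other
# control characters are not handled, so pass single-line input.
json_payload() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text": "%s"}' "$escaped"
}

# Hypothetical helper: POST the text to the /detect endpoint.
# WILD_URL is an assumed variable name; it defaults to the local
# port mapping used in the docker run commands above.
detect() {
  json_payload "$1" |
    curl -s -X POST -H "Content-Type: application/json" \
      --data-binary @- "${WILD_URL:-http://localhost:5000}/detect"
}
```

With a container running, detect 'Несе Галя' | jq '.' would send the same text as the curl example. For arbitrary input, building the body with a JSON-aware tool such as jq is the more robust choice.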

API

See the Swagger definition.

Comments

License

Currently, this project is listed on https://github.com/CodyReichert/awesome-cl, but with "no license specified".

Please consider adding a file called LICENSE or COPYING containing the license text. I'd also suggest mentioning the license in the README, and maybe linking to the license file there.

opened by contrapunctus-1

The underlying stack and libraries seem outdated and broken

The underlying stack and libraries seem outdated and broken. I was also curious to know whether Japanese is supported.

The build-essential install step fails; the package sources seem outdated and broken:

W: http://220.152.42.162:80/data/05e9f1e394c802a0/archive.ubuntu.com/ubuntu/pool/main/m/manpages/manpages_4.15-1_all.deb: Automatically disabled Acquire::http::Pipeline-Depth due to incorrect response from server/proxy. (man 5 apt.conf)
E: Failed to fetch http://220.152.42.162:80/data/05e9f1e394c802a0/archive.ubuntu.com/ubuntu/pool/main/m/manpages/manpages_4.15-1_all.deb File has unexpected size (1233600 != 83832). Mirror sync in progress? [IP: 220.152.42.162 80]
Hashes of expected file:
Fetched 49.0 MB in 39s (1269 kB/s)
 - SHA256:0c142fd5c44cae76e2e1ad5c62d3dc8f36f1a80eaaadf9a5e60325fcfffe16ed
 - SHA1:dd3996229d03fb17f3da4002d5df030a2039fef9 [weak]
 - MD5Sum:5861e417cd275a039bbd0583c5bdbcc8 [weak]
 - Filesize:83832 [weak]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

opened by ghost
