300 Repositories
Python Amazon-Textract-OCR-Extracao-Dados-DynamoDB Libraries
Extrator de dados do jupiterweb
Extrator de dados do jupiterweb O programa é composto de dois arquivos: Um constando apenas de classes complementares que representam as unidades e as
A project to make Amazon Echo respond to sign language using your webcam
Making Alexa respond to Sign Language using Tensorflow.js Try the live demo Read the Blog Post on Tensorflow's Blog Coming Soon Watch the video This p
A voice recognition assistant similar to amazon alexa, siri and google assistant.
kenyan-Siri Build an Artificial Assistant Full tutorial (video) To watch the tutorial, click on the image below Installation For windows users (run th
Google, Facebook, Amazon, Microsoft, Netflix tech interview questions
Algorithm and Data Structures Interview Questions HackerRank | Practice, Tutorials & Interview Preparation Solutions This repository consists of solut
A simple URL shortener app using Python AWS Chalice, AWS Lambda and AWS Dynamodb.
url-shortener-chalice A simple URL shortener app using AWS Chalice. Please make sure you configure your AWS credentials using AWS CLI before starting
FOTS Pytorch Implementation
News!!! Recognition branch now is added into model. The whole project has beed optimized and refactored. ICDAR Dataset SynthText 800K Dataset detectio
Indonesian ID Card OCR using tesseract OCR
KTP OCR Indonesian ID Card OCR using tesseract OCR KTP OCR is python-flask with tesseract web application to convert Indonesian ID Card to text / JSON
The open source extract transaction infomation by using OCR.
Transaction OCR Mã nguồn trích xuất thông tin transaction từ file scaned pdf, ở đây tôi lựa chọn tài liệu sao kê công khai của Thuy Tien. Mã nguồn có
Empresas do Brasil (CNPJs)
Biblioteca em Python que coleta informações cadastrais de empresas do Brasil (CNPJ) obtidas de fontes oficiais (Receita Federal) e exporta para um formato legível por humanos (CSV ou JSON).
Easy installer for running Amazon AVS Device SDK on Raspberry Pi
avs-device-sdk-pi Scripts to enable Alexa voice activation using Picovoice Porcupine If you like the work, find it useful and if you would like to get
OpenMMLab Text Detection, Recognition and Understanding Toolbox
Introduction English | 简体中文 MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the correspondi
Bancos de Dados Relacionais (SQL) na AWS com Amazon RDS
Bancos de Dados Relacionais (SQL) na AWS com Amazon RDS Repositório para o Live Coding DIO do dia 24/11/2021 Serviços utilizados Amazon RDS AWS Lambda
Script de monitoramento das teclas do teclado, salvando todos os dados digitados em um arquivo de log juntamente com os dados de rede.
listenerPython Script de monitoramento das teclas do teclado, salvando todos os dados digitados em um arquivo de log juntamente com os dados de rede.
Script que realiza a identificação de todos os logins e senhas dos wifis conectados em uma máquina e envia os dados para um e-mail especificado.
getWIFIConnection Script que realiza a identificação de todos os logins e senhas dos wifis conectados em uma máquina e envia os dados para um e-mail e
OCR Streamlit App is used to extract text from images using python's easyocr, pytorch and streamlit packages
OCR-Streamlit-App OCR Streamlit App is used to extract text from images using python's easyocr, pytorch and streamlit packages OCR app gets an image a
SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker.
SageMaker Python SDK SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker. With the S
Apply different text recognition services to images of handwritten documents.
Handprint The Handwritten Page Recognition Test is a command-line program that invokes HTR (handwritten text recognition) services on images of docume
Fully functioning price detector built with selenium and python
Fully functioning price detector built with selenium and python
MVP monorepo to rapidly develop scalable, reliable, high-quality components for Amazon Linux instance configuration management
Ansible Amazon Base Repository Ansible Amazon Base Repository About Setting Up Ansible Environment Configuring Python VENV and Ansible Editor Configur
Script em python para carregar os arquivos de cnpj dos dados públicos da Receita Federal em MYSQL.
cnpj-mysql Script em python para carregar os arquivos de cnpj dos dados públicos da Receita Federal em MYSQL. Dados públicos de cnpj no site da Receit
OCR of Chicago 1909 Renumbering Plan
Requirements: Python 3 (probably at least 3.4) pipenv (pip3 install pipenv) tesseract (brew install tesseract, at least if you have a mac and homebrew
This is a GUI for scrapping PDFs with the help of optical character recognition making easier than ever to scrape PDFs.
pdf-scraper-with-ocr With this tool I am aiming to facilitate the work of those who need to scrape PDFs either by hand or using tools that doesn't imp
Biblioteca Python que extrai dados de mercado do Bacen (Séries Temporais)
Pybacen This library was developed for economic analysis in the Brazilian scenario (Investments, micro and macroeconomic indicators) Installation Inst
Top 50 best selling books on amazon
It's a dashboard that shows the detailed information about each book in the top 50 best selling books on amazon over the last ten years
OCR Post Correction for Endangered Language Texts
📌 Coming soon: an update to the software including features from our paper on semi-supervised OCR post-correction, to be published in the Transaction
Python-based tools for document analysis and OCR
ocropy OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do so
Tool which allow you to detect and translate text.
Text detection and recognition This repository contains tool which allow to detect region with text and translate it one by one. Description Two pretr
Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.
hocr-tools About About the code Installation System-wide with pip System-wide from source virtualenv Available Programs hocr-check -- check the hOCR f
Rotates Amazon Personalize filters on a schedule based on dynamic templates
Amazon Personalize Filter Rotation This project contains the source code and supporting files for deploying a serverless application that provides aut
Let's learn how to build, release and operate your containerized applications to Amazon ECS and AWS Fargate using AWS Copilot.
🚀 Welcome to AWS Copilot Workshop In this workshop, you'll learn how to build, release and operate your containerised applications to Amazon ECS and
Classical OCR DCNN reproduction based on PaddlePaddle framework.
Paddle-SVHN Classical OCR DCNN reproduction based on PaddlePaddle framework. This project reproduces Multi-digit Number Recognition from Street View I
This is a implementation of CRAFT OCR method
This is a implementation of CRAFT OCR method
An open-source CLI tool for backing up RDS(PostgreSQL) Locally or to Amazon S3 bucket
An open-source CLI tool for backing up RDS(PostgreSQL) Locally or to Amazon S3 bucket
This is a GUI for scrapping PDFs with the help of optical character recognition making easier than ever to scrape PDFs.
pdf-scraper-with-ocr With this tool I am aiming to facilitate the work of those who need to scrape PDFs either by hand or using tools that doesn't imp
Dirty, ugly, and hopefully useful OCR of Facebook Papers docs released by Gizmodo
Quick and Dirty OCR of Facebook Papers Gizmodo has been working through the Facebook Papers and releasing the docs that they process and review. As lu
Generate custom detailed survey paper with topic clustered sections and proper citations, from just a single query in just under 30 mins !!
Auto-Research A no-code utility to generate a detailed well-cited survey with topic clustered sections (draft paper format) and other interesting arti
Fetch tracking numbers of Amazon orders, for the ease of the logistics.
Amazon-Tracking-Number Fetch tracking numbers of Amazon orders, for the ease of the logistics. Read Me First (How to use this code): Get Amazon "Items
A Vietnamese personal card OCR website built with Django.
Django VietCardOCR Installation Creation of virtual environments is done by executing the command venv: python -m venv venv That will create a new fol
Convolutional Recurrent Neural Network (CRNN) for image-based sequence recognition.
Convolutional Recurrent Neural Network This software implements the Convolutional Recurrent Neural Network (CRNN), a combination of CNN, RNN and CTC l
Key information extraction from invoice document with Graph Convolution Network
Key Information Extraction from Scanned Invoices Key information extraction from invoice document with Graph Convolution Network Related blog post fro
Script to get a notification when a product, on Amazon Warehouse, is available within a target price
Amazon_Warehouse_Scraping This script aims to scrape Amazon Warehouse and send an email back if there are products whose price matches with the target
Amazon Multilingual Counterfactual Dataset (AMCD)
Amazon Multilingual Counterfactual Dataset (AMCD)
Machine Leaning applied to denoise images to improve OCR Accuracy
Machine Learning to Denoise Images for Better OCR Accuracy This project is an adaptation of this tutorial and used only for learning purposes: https:/
A bot that extract text from images using the Tesseract OCR.
Text from image (OCR) @ocr_text_bot A simple bot to extract text from images. Usage What do I need? A AWS key configured locally, see here. NodeJS. I
Use Youdao OCR API to covert your clipboard image to text.
Alfred Clipboard OCR 注:本仓库基于 oott123/alfred-clipboard-ocr 的逻辑用 Python 重写,换用了有道 AI 的 API,准确率更高,有效防止百度导致隐私泄露等问题,并且有道 AI 初始提供的 50 元体验金对于其资费而言个人用户基本可以永久使用
Amazon Forest Computer Vision: Satellite Image tagging code using PyTorch / Keras with lots of PyTorch tricks
Amazon Forest Computer Vision Satellite Image tagging code using PyTorch / Keras Here is a sample of images we had to work with Source: https://www.ka
A chatbot that helps you set price alerts for your amazon products.
Amazon Price Alert Bot Description A Telegram chatbot that helps you set price alerts for amazon products. The bot checks the price of your watchliste
A Python tool to generate and refresh Amazon access tokens.
amazon_auth A Python tool to generate and refresh Amazon access tokens. Description This tool generates and outputs Amazon access and refresh tokens f
Usando o Amazon Textract como OCR para Extração de Dados no DynamoDB
dio-live-textract2 Repositório de código para o live coding do dia 05/10/2021 sobre extração de dados estruturados e gravação em banco de dados a part
Ground truth data for the Optical Character Recognition of Historical Classical Commentaries.
OCR Ground Truth for Historical Commentaries The dataset OCR ground truth for historical commentaries (GT4HistComment) was created from the public dom
a micro OCR network with 0.07mb params.
MicroOCR a micro OCR network with 0.07mb params. Layer (type) Output Shape Param # Conv2d-1 [-1, 64, 8,
Programa que viabiliza a OCR (Optical Character Reading - leitura óptica de caracteres) de um PDF.
Este programa tem o intuito de ser um modificador de arquivos PDF. Os arquivos PDFs podem ser 3: PDFs verdadeiros - em que podem ser selecionados o ti
It is a image ocr tool using the Tesseract-OCR engine with the pytesseract package and has a GUI.
OCR-Tool It is a image ocr tool made in Python using the Tesseract-OCR engine with the pytesseract package and has a GUI. This is my second ever pytho
METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL)
Nautilus-OCR The National Library of Luxembourg (BnL) started its first initiative in digitizing newspapers, with layout recognition and OCR on articl
Open-Source Python CLI package for copying DynamoDB tables and items in parallel batch processing + query natural & Global Secondary Indexes (GSIs)
Python Command-Line Interface Package to copy Dynamodb data in parallel batch processing + query natural & Global Secondary Indexes (GSIs).
user friendly python script who is able to catch fish in the game New World
new-world-fishing-bot release 1.1.1 click img for demonstration Download guide Click at latest release: Download and extract bot.zip: When you run fil
Criando Lambda Functions para Ingerir Dados de APIs com AWS CDK
LIVE001 - AWS Lambda para Ingerir Dados de APIs Fazer o deploy de uma função lambda com infraestrutura como código Lambda vai numa API externa e extra
docTR by Mindee (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
docTR by Mindee (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
Project repository of Apache Airflow, deployed on Docker in Amazon EC2 via GitLab.
Airflow on Docker in EC2 + GitLab's CI/CD Personal project for simple data pipeline using Airflow. Airflow will be installed inside Docker container,
S3-plugin is a high performance PyTorch dataset library to efficiently access datasets stored in S3 buckets.
S3-plugin is a high performance PyTorch dataset library to efficiently access datasets stored in S3 buckets.
Django email backends and webhooks for Amazon SES, Mailgun, Mailjet, Postmark, SendGrid, Sendinblue, SparkPost and more
Django email backends and webhooks for Amazon SES, Mailgun, Mailjet, Postmark, SendGrid, Sendinblue, SparkPost and more
A Django email backend for Amazon's Simple Email Service
Django-SES Info: A Django email backend for Amazon's Simple Email Service Author: Harry Marr (http://github.com/hmarr, http://twitter.com/harrymarr) C
A django package which act as a gateway to send and receive email with amazon SES.
django-email-gateway: Introduction: A Simple Django app to easily send emails, receive inbound emails from users with different email vendors like AWS
Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.
Learning Opinion Summarizers by Selecting Informative Reviews This repository contains the codebase and the dataset for the corresponding EMNLP 2021
A Screen Translator/OCR Translator made by using Python and Tesseract, the user interface are made using Tkinter. All code written in python.
About An OCR translator tool. Made by me by utilizing Tesseract, compiled to .exe using pyinstaller. I made this program to learn more about python. I
Esse script procura qualquer, dados que você queira na wikipedia! Em breve traremos um com dados em toda a internet.
Buscador de dados simples Dependências necessárias Para você poder começar a utilizar esta ferramenta, você vai precisar da dependência "wikipedia", p
Generate text line images for training deep learning OCR model (e.g. CRNN)
Generate text line images for training deep learning OCR model (e.g. CRNN)
Patch PL to disable LK verification. Patch LK to disable boot/recovery verification.
Simple Python(3) script to disable LK verification in Amazon Preloader images and boot/recovery image verification in Amazon LK ("Little Kernel") images.
一个多语言支持、易使用的 OCR 项目。An easy-to-use OCR project with multilingual support.
AgentOCR 简介 AgentOCR 是一个基于 PaddleOCR 和 ONNXRuntime 项目开发的一个使用简单、调用方便的 OCR 项目 本项目目前包含 Python Package 【AgentOCR】 和 OCR 标注软件 【AgentOCRLabeling】 使用指南 Pytho
pix2tex: Using a ViT to convert images of equations into LaTeX code.
The goal of this project is to create a learning based system that takes an image of a math formula and returns corresponding LaTeX code.
Automated endpoint management for Amazon Aurora Global Database
This sample code can be used to manage Aurora global database endpoints. After failover the global database writer endpoints swap from one region to the other. This solution automates creation and management of Route 53 based endpoints, so the applications don't have to change the connections strings.
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.
Amazon Scraper: A command-line tool for scraping Amazon product data
Amazon Product Scraper: 2021 Description A command-line tool for scraping Amazon product data to CSV or JSON format(s). Requirements Python 3 pip3 Ins
HTTP Calls to Amazon Web Services Rest API for IoT Core Shadow Actions 💻🌐💡
aws-iot-shadow-rest-api HTTP Calls to Amazon Web Services Rest API for IoT Core Shadow Actions 💻 🌐 💡 This simple script implements the following aw
Official implementation of SynthTIGER (Synthetic Text Image GEneratoR) ICDAR 2021
🐯 SynthTIGER: Synthetic Text Image GEneratoR Official implementation of SynthTIGER | Paper | Datasets Moonbin Yim1, Yoonsik Kim1, Han-cheol Cho1, Sun
Amazon Forest Computer Vision: Satellite Image tagging code using PyTorch / Keras with lots of PyTorch tricks
Amazon Forest Computer Vision Satellite Image tagging code using PyTorch / Keras Here is a sample of images we had to work with Source: https://www.ka
Integrating Amazon API Gateway private endpoints with on-premises networks
Integrating Amazon API Gateway private endpoints with on-premises networks Read the blog about this application: Integrating Amazon API Gateway privat
A dataset for online Arabic calligraphy
Calliar Calliar is a dataset for Arabic calligraphy. The dataset consists of 2500 json files that contain strokes manually annotated for Arabic callig
Some Boring Research About Products Recognition 、Duplicate Img Detection、Img Stitch、OCR
Products Recognition 介绍 商品识别,围绕在复杂的商场零售场景中,识别出货架图像中的商品信息。主要组成部分: 重复图像检测。【更新进度 4/10】 图像拼接。【更新进度 0/10】 目标检测。【更新进度 0/10】 商品识别。【更新进度 1/10】 OCR。【更新进度 1/10】
A cross platform OCR Library based on PaddleOCR & OnnxRuntime
A cross platform OCR Library based on PaddleOCR & OnnxRuntime
Dados Públicos de CNPJ disponibilizados pela Receita Federal do Brasil
Dados Públicos CNPJ Fonte oficial da Receita Federal do Brasil, aqui. Layout dos arquivos, aqui. A Receita Federal do Brasil disponibiliza bases com o
PyTorch code of my ICDAR 2021 paper Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)
Vision Transformer for Fast and Efficient Scene Text Recognition (ICDAR 2021) ViTSTR is a simple single-stage model that uses a pre-trained Vision Tra
Universal Command Line Interface for Amazon Web Services
This package provides a unified command line interface to Amazon Web Services.
OpenMMLab Text Detection, Recognition and Understanding Toolbox
Introduction English | 简体中文 MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the correspondi
Tool which allow you to detect and translate text.
Text detection and recognition This repository contains tool which allow to detect region with text and translate it one by one. Description Two pretr
Code for the paper "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021)
MASTER-PyTorch PyTorch reimplementation of "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021). This projec
Layout Parser is a deep learning based tool for document image layout analysis tasks.
A Python Library for Document Layout Understanding
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy-pasted. ocrmypdf # it's a scriptable c
Convolutional Recurrent Neural Networks(CRNN) for Scene Text Recognition
CRNN_Tensorflow This is a TensorFlow implementation of a Deep Neural Network for scene text recognition. It is mainly based on the paper "An End-to-En
An expandable and scalable OCR pipeline
Overview Nidaba is the central controller for the entire OGL OCR pipeline. It oversees and automates the process of converting raw images into citable
A simple OCR API server, seriously easy to be deployed by Docker, on Heroku as well
ocrserver Simple OCR server, as a small working sample for gosseract. Try now here https://ocr-example.herokuapp.com/, and deploy your own now. Deploy
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
Open Semantic Search https://opensemanticsearch.org Integrated search server, ETL framework for document processing (crawling, text extraction, text a
Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.
doc2text doc2text extracts higher quality text by fixing common scan errors Developing text corpora can be a massive pain in the butt. Much of the tex
An OCR evaluation tool
dinglehopper dinglehopper is an OCR evaluation tool and reads ALTO, PAGE and text files. It compares a ground truth (GT) document page with a OCR resu
Text recognition (optical character recognition) with deep learning methods.
What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis | paper | training and evaluation data | failure cases and cle
TedEval: A Fair Evaluation Metric for Scene Text Detectors
TedEval: A Fair Evaluation Metric for Scene Text Detectors Official Python 3 implementation of TedEval | paper | slides Chae Young Lee, Youngmin Baek,
The CIS OCR PostCorrectionTool
The CIS OCR Post Correction Tool PoCoTo Source code for the Java-based PoCoTo client enabling fast interactive batch corrections of complete OCR error
Toolbox for OCR post-correction
Ochre Ochre is a toolbox for OCR post-correction. Please note that this software is experimental and very much a work in progress! Overview of OCR pos
Binarize document images
Binarization Binarization for document images Examples Introduction This tool performs document image binarization (i.e. transform colour/grayscale to
Pre-Recognize Library - library with algorithms for improving OCR quality.
PRLib - Pre-Recognition Library. The main aim of the library - prepare image for recogntion. Image processing can really help to improve recognition q