267 Python Pdf-document Libraries

[ACL 2022] LinkBERT: A Knowledgeable Language Model 😎 Pretrained with Document Links

LinkBERT: A Knowledgeable Language Model Pretrained with Document Links This repo provides the model, code & data of our paper: LinkBERT: Pretraining

264 Jan 1, 2023

Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:

Multi-Type-TD-TSR Check it out on Source Code of our Paper: Multi-Type-TD-TSR Extracting Tables from Document Images using a Multi-stage Pipeline for

178 Dec 27, 2022

Incomplete easy-to-use math solver and PDF generator.

Math Expert Let me do your work Preview preview.mp4 Introduction Math Expert is our (@salastro, @younis-tarek, @marawn-mogeb) math high school graduat

22 Jul 11, 2022

Let's create a tool to convert Thailand budget from PDF to CSV.

thailand-budget-pdf2csv Let's create a tool to convert Thailand Government Budgeting from PDF to CSV! รวมพลัง Dev แปลงงบ จาก PDF สู่ Machine-readable

88 Dec 19, 2022

This is a pytorch implementation for the BST model from Alibaba https://arxiv.org/pdf/1905.06874.pdf

Behavior-Sequence-Transformer-Pytorch This is a pytorch implementation for the BST model from Alibaba https://arxiv.org/pdf/1905.06874.pdf This model

83 Jan 5, 2023

Source code for "A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction" @ NAACL 2022

TSAR Source code for NAACL 2022 paper: A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction. 🔥 Introduction We focus on extra

21 Sep 24, 2022

A modern pure-Python library for reading PDF files

pdf A modern pure-Python library for reading PDF files. The goal is to have a modern interface to handle PDF files which is consistent with itself and

6 Apr 6, 2022

Text classification is one of the popular tasks in NLP that allows a program to classify free-text documents based on pre-defined classes.

Deep-Learning-for-Text-Document-Classification Text classification is one of the popular tasks in NLP that allows a program to classify free-text docu

2 Mar 17, 2022

Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

BiDR Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval. Requirements torch==

11 Oct 20, 2022

Pydf: A modular Telegram Bot which provides Pdf Tools using PyPdf2

pyDF-Bot 🌍 Pydf - Pyrogram Document File Bot, a modular Telegram Bot which prov

2 Feb 18, 2022

Pgn2tex - Scripts to convert pgn files to latex document. Useful to build books or pdf from pgn studies

Pgn2Latex (WIP) A simple script to make pdf from pgn files and studies. It's sti

12 Jul 23, 2022

Wats2PDF - Convert whatsapp exported chat(without media) into a readable pdf format

Wats2PDF convert whatsApp exported chat into a readable pdf format. convert with

5 Apr 26, 2022

Searches a document for hash tags. Support multiple natural languages. Works in various contexts.

ht-getter Searches a document for hash tags. Supports multiple natural languages. Works in various contexts. This package uses a non-regex approach an

1 Mar 1, 2022

Convert PDF to AudioBook and Audio Speech to PDF

In this Python project, we will build a GUI-based PDF to Audio and Audio to PDF converter using the Tkinter, OS, path, pyttsx3, SpeechRecognition, PyPDF4, and Pydub libraries and the messagebox module of the Tkinter library.

1 Feb 13, 2022

An interactive document scanner built in Python using OpenCV

The scanner takes a poorly scanned image, finds the corners of the document, applies the perspective transformation to get a top-down view of the document, sharpens the image, and applies an adaptive color threshold to clean up the image.

1 Feb 12, 2022

File-based TF-IDF: Calculates keywords in a document, using a word corpus.

File-based TF-IDF Calculates keywords in a document, using a word corpus. Why? Because I found myself with hundreds of plain text files, with no way t

1 Feb 11, 2022

Split given PDF document into 4 page groups and convert them to booklet format

PUTO: PDF to Booklet converter Split given PDF document into 4 page groups and convert them to booklet format. It creates a PDF like shown below: Fir

3 Mar 12, 2022

Convert MD files to PDF automatically (with CSS) 📄🚀

MD2PDF Action Convert MD files to PDF automatically (with CSS)! Converts a pattern described set of markdown files and converts them to pdf whilst app

1 Feb 9, 2022

DietPDF aims at reducing PDF file size while not degrading quality nor losing metadata

6 Jul 27, 2022

JoplinPdf2Images - Converts a PDF to images in Joplin and adds it to the specified note as a printout

joplinPdf2Images Converts a PDF to images in Joplin and adds it to the specified

2 Apr 20, 2022

Svg2pdfgen - Svg To PDF gen with python

3 May 30, 2022

Word document generator with python

In this study, real world data is anonymized. The content is completely different, but the structure is the same. It was a script I prepared for the backend of a work using UiPath.

3 Jan 30, 2022

Compare-pdf - A Flask driven restful API for comparing two PDF files

COMPARE-PDF A Flask driven restful API for comparing two PDF files. Description

3 Mar 13, 2022

DocEnTr: An end-to-end document image enhancement transformer

DocEnTR Description Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. This model is implemented on to

19 Jan 29, 2022

This repository serves as a place to document a toy attempt on how to create a generative text model in Catalan, based on GPT-2

GPT-2 Catalan playground and scripts to train a GPT-2 model either from scrath or from another pretrained model.

1 Jan 28, 2022

A deep learning framework for historical document image analysis

DIVA-DAF Description A deep learning framework for historical document image analysis. How to run Install dependencies # clone project git clone https

9 Aug 4, 2022

Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer.

DocEnTR Description Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. This model is implemented on to

74 Jan 7, 2023

This pyhton script converts a pdf to Image then using tesseract as OCR engine converts Image to Text

Script_Convertir_PDF_IMG_TXT Este script de pyhton convierte un pdf en Imagen luego utilizando tesseract como motor OCR convierte la Imagen a Texto. p

1 Jan 27, 2022

A simple document management REST based API for collaboratively interacting with documents

documan_api A simple document management REST based API for collaboratively interacting with documents.

1 Jan 22, 2022

The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.

Good news! Our new work exhibits state-of-the-art performances on DocUNet benchmark dataset: DocScanner: Robust Document Image Rectification with Prog

231 Dec 26, 2022

RedisJSON - a JSON data type for Redis

RedisJSON is a Redis module that implements ECMA-404 The JSON Data Interchange Standard as a native data type. It allows storing, updating and fetching JSON values from Redis keys (documents).

3.4k Dec 29, 2022

A backend for mdbook in Python for generating PDF based on Chrome DevTools Protocol.

mdbook-pdf A backend for mdbook written in Python for generating PDF based on Chrome DevTools Protocol. Python library dependency Usage Put mdbook-pdf

49 Dec 27, 2022

Document manipulation detection with python

image manipulation detection task: -- tianchi function image segmentation salie

3 Aug 22, 2022

A very simple document database

DockieDb A simple in-memory document database. Installation Build the Wheel Fork or clone this repository and run python setup.py bdist_wheel in the r

1 Jan 16, 2022

A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population

DeepKE is a knowledge extraction toolkit supporting low-resource and document-level scenarios for entity, relation and attribute extraction. We provide comprehensive documents, Google Colab tutorials, and online demo for beginners.

1.6k Jan 5, 2023

PyTorch code for JEREX: Joint Entity-Level Relation Extractor

JEREX: "Joint Entity-Level Relation Extractor" PyTorch code for JEREX: "Joint Entity-Level Relation Extractor". For a description of the model and exp

50 Dec 1, 2022

Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with predefined topics from an unlabeled document corpus.

Lbl2Vec Lbl2Vec is an algorithm for unsupervised document classification and unsupervised document retrieval. It automatically generates jointly embed

61 Dec 20, 2022

Shelf DB is a tiny document database for Python to stores documents or JSON-like data

Shelf DB Introduction Shelf DB is a tiny document database for Python to stores documents or JSON-like data. Get it $ pip install shelfdb shelfquery S

35 Nov 3, 2022

This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' published at ECIR'22.

Paragraph Aggregation Retrieval Model (PARM) for Dense Document-to-Document Retrieval This repository contains the code for the paper PARM: A Paragrap

33 Aug 26, 2022

This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents".

Introduction This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents". If

0 Jan 11, 2022

Import entity definition document into SQLie3. Manage the entity. Also, create a "Create Table SQL file".

EntityDocumentMaker Version 1.00 After importing the entity definition (Excel file), store the data in sqlite3. エンティティ定義（Excelファイル）をインポートした後、データをsqlit

1 Jan 9, 2022

A bot for PDF for doing Many Things....

Telegram PDF Bot A Telegram bot that can: Compress, crop, decrypt, encrypt, merge, preview, rename, rotate, scale and split PDF files Compare text dif

60 Dec 27, 2022

PDFSanitizer - Renders possibly unsafe PDF files and outputs harmless PDF files

PDFSanitizer Renders possibly malicious PDF files and outputs harmless PDF files

9 Jan 30, 2022

Python utility library for compositing PDF documents with reportlab.

pdfdoc-py Python utility library for compositing PDF documents with reportlab. Installation The pdfdoc-py package can be installed directly from the s

1 Jan 6, 2022

Htmdf - html to pdf with support for variables using fastApi.

htmdf Converts html to pdf with support for variables using fastApi. Installation Clone this repository. git clone https://github.com/ShreehariVaasish

1 Jan 30, 2022

Image Compression GUI APP Python: PyQt5

Image Compression GUI APP Image Compression GUI APP Python: PyQt5 Use : f5 or debug or simply run it on your ids(vscode , pycham, anaconda etc.) socia

1 May 21, 2022

Awesome-AI-books - Some awesome AI related books and pdfs for learning and downloading

Awesome AI books Some awesome AI related books and pdfs for downloading and learning. Preface This repo only used for learning, do not use in business

1k Jan 1, 2023

A supercharged version of paperless: scan, index and archive all your physical documents

Paperless-ng Paperless (click me) is an application by Daniel Quinn and contributors that indexes your scanned documents and allows you to easily sear

5.3k Jan 9, 2023

Mipdfcompressor - 💕A simple pdf size compressing telegram robot

Pdf Compressor Telegram Bot A simple pdf size compressing telegram robot. Useful for digital documentation. Mandatory Variables API_HASH - Your A

1 Feb 14, 2022

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy-pasted. ocrmypdf

8k Jan 8, 2023

A bulk pdf generator. This application can generate PDFs in bulk by using just one click.

A bulk html pdf generator. This application can generate PDFs in bulk by using just one click. Screenshots Requirements 🧱 Your system must have the f

3 Apr 23, 2022

Produce pdf in python backend from simple bootstrap vue frontend and download to browser

vollmacht produce pdf in python backend from simple bootstrap vue frontend and download to browser Frontend in one file with bootstrap-vue (allthough

1 Nov 8, 2020

ElasticSearch ODM (Object Document Mapper) for Python - pip install esengine

esengine - The Elasticsearch Object Document Mapper esengine is an ODM (Object Document Mapper) it maps Python classes in to Elasticsearch index/doc_t

109 Nov 22, 2022

Zen-Knit is a formal (PDF), informal (HTML) report generator for data analyst and data scientist who wants to use python.

About Zen-Knit: Zen-Knit is a formal (PDF), informal (HTML) report generator for data analyst and data scientist who wants to use python. Inspired fro

27 Jul 13, 2022

Essential Document Generator

Essential Document Generator Dead Simple Document Generation Whether it's testing database performance or a new web interface, we've all needed a dead

59 Nov 11, 2022

A document format conversion service based on Pandoc.

reformed Document format conversion service based on Pandoc. Usage The API specification for the Reformed server is as follows: GET /api/v1/formats: L

3 Jul 18, 2022

Python document object mapper (load python object from JSON and vice-versa)

lupin is a Python JSON object mapper lupin is meant to help in serializing python objects to JSON and unserializing JSON data to python objects. Insta

24 Nov 9, 2022

Software that extracts spreadsheets from various .pdf files to .csv

Extração de planilhas de diversos arquivos .pdf para .csv O código inteiro foi desenvolvido em Python. Foi utilizado o pacote "tabula" e a biblioteca

2 Jan 9, 2022

Simple pdf editor while preserving structure and format.

SIMPdf Simple pdf editor while preserving structure and format.

242 Jan 4, 2023

minipdf is a package for creating simple, single-page PDF documents.

minipdf minipdf is a package for creating simple, single-page PDF documents. Installation You can install the development version from GitHub with: #

41 Dec 19, 2022

Useful PDF-related productivity tool.

Luftmensch 1.4.7 (Español) | 1.4.3 (English) Version 1.4.7 (Español) released in October 2021. Version 1.4.3 (English) released in September 2021. 🏮

8 Dec 29, 2022

Scans pdfs for links written in plaintext and checks if they are active or returns an error code.

Scans pdfs for links written in plaintext and checks if they are active or returns an error code. It then generates a report of its findings. Extract references (pdf, url, doi, arxiv) and metadata from a PDF.

22 Nov 21, 2022

Convert given source code into .pdf with syntax highlighting and more features

Code2pdf 📠 Convert given source code into .pdf with syntax highlighting and more features Build Status Version Downloads Python Demo Installation Bui

343 Jan 5, 2023

Automate the case review on legal case documents and find the most critical cases using network analysis

Automation on Legal Court Cases Review This project is to automate the case review on legal case documents and find the most critical cases using netw

7 Dec 28, 2022

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Hiring We are hiring at all levels (including FTE researchers and interns)! If you are interested in working with us on NLP and large-scale pre-traine

7.8k Jan 9, 2023

Longformer: The Long-Document Transformer

Longformer Longformer and LongformerEncoderDecoder (LED) are pretrained transformer models for long documents. ***** New December 1st, 2020: Longforme

1.6k Dec 29, 2022

A toolkit for document-level event extraction, containing some SOTA model implementations

❤️ A Toolkit for Document-level Event Extraction with & without Triggers Hi, there 👋 . Thanks for your stay in this repo. This project aims at buildi

159 Dec 22, 2022

[AAAI 2022] Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

Sparse Structure Learning via Graph Neural Networks for inductive document classification Make graph dataset create co-occurrence graph for datasets.

16 Dec 22, 2022

A toolkit for document-level event extraction, containing some SOTA model implementations

Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker Source code for ACL-IJCNLP 2021 Long paper: Document-le

84 Dec 15, 2022

A voice assistant which can be used to interact with your computer and controls your pc operations

Introduction 👨‍💻 It is a voice assistant which can be used to interact with your computer and also you have been seeing it in Iron man movies, but t

84 Dec 22, 2022

Telegram bot that can do a lot of things related to PDF files.

Telegram PDF Bot A Telegram bot that can: Compress, crop, decrypt, encrypt, merge, preview, rename, rotate, scale and split PDF files Compare text dif

130 Dec 26, 2022

Generate modern Python clients from OpenAPI

openapi-python-client Generate modern Python clients from OpenAPI 3.x documents. This generator does not support OpenAPI 2.x FKA Swagger. If you need

555 Jan 2, 2023

🦎 A NeoVim plugin for highlighting visual selections like in a normal document editor!

🦎 HighStr.nvim A NeoVim plugin for highlighting visual selections like in a normal document editor! Demo TL;DR HighStr.nvim is a NeoVim plugin writte

222 Jan 3, 2023

Excalibur: A web interface to extract tabular data from PDFs

Excalibur: A web interface to extract tabular data from PDFs Excalibur is a web interface to extract tabular data from PDFs, written in Python 3! It i

1.2k Jan 4, 2023

Python bindings for MuPDF's rendering library.

PyMuPDF 1.19.3 Release date: December 15, 2021 On PyPI since August 2016: Author Jorj X. McKie, based on original code by Ruikai Liu. Introduction PyM

0 Nov 3, 2022

Bnagla hand written document digiiztion

Bnagla hand written document digiiztion This repo addresses the problem of digiizing hand written documents in Bangla. Documents have definite fields

1 Dec 10, 2021

This tool crawls a list of websites and download all PDF and office documents

This tool crawls a list of websites and download all PDF and office documents. Then it analyses the PDF documents and tries to detect accessibility issues.

7 Sep 30, 2022

The repo for reproducing Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study

ECIR Reproducibility Paper: Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study This code corresponds to the reproducibility

3 Mar 31, 2022

Convert PDF/Image to TXT using EasyOcr - the best OCR engine available!

PDFImage2TXT - DOWNLOAD INSTALLER HERE What can you do with it? Convert scanned PDFs to TXT. Convert scanned Documents to TXT. No coding required!! In

2 Feb 22, 2022

TensorFlow implementation of the paper "Hierarchical Attention Networks for Document Classification"

Hierarchical Attention Networks for Document Classification This is an implementation of the paper Hierarchical Attention Networks for Document Classi

83 Dec 5, 2022

DUE: End-to-End Document Understanding Benchmark

This is the repository that provide tools to download data, reproduce the baseline results and evaluation. What can you achieve with this guide Based

21 Dec 29, 2022

pubmex.py - a script to get a fancy paper title based on given DOI or PMID

pubmex.py is a script to get a fancy paper title based on given DOI or PMID (can be also combined with macOS Finder)

13 Nov 20, 2022

A tool for certificate PDF generation.

certificate-pdf-generator 获奖证书PDF批量生成工具 | a Tool for certificate PDF generation. ⚠️ 下载前请注意本项目使用了LFS来存储PDF等大文件。在克隆或下载本仓库前，请先使用apt等包管理器安装git-lfs包。如果已经克

4 Nov 28, 2022

Web and PDF Scraper Refactoring

Web and PDF Scraper Refactoring This repository contains the example code of the Web and PDF scraper code roast. Here are the links to the videos: Par

18 Dec 31, 2022

A python script that fetches the grades of a student from a WAEC result in pdf format.

About waec-result-analyzer A python script that fetches the grades of a student from a WAEC result in pdf format. Built for federal government college

2 Dec 4, 2021

Pytorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Perspective"

Graph Neural Topic Model (GNTM) This is the pytorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Persp

8 Sep 14, 2022

A leetcode scraper to compile all questions in leetcode free tier to text file. pdf also available.

A leetcode scraper to compile all questions in leetcode free tier to text file, pdf also available. if new questions get added, run again to get new questions.

3 Dec 7, 2021

Trata PDF para torná-lo compatível com PDF/X e com impressoras em escala de cinza.

tratapdf Trata PDF para torná-lo compatível com PDF/X e com impressoras em escala de cinza. dependências icc-profiles ghostscript visualizador de PDF

1 Nov 30, 2021

Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API

Dominate Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API. It allows you to write HTML pages in pure

1.5k Jan 9, 2023

A lightweight and fast-to-use Markdown document generator based on Python

1 Jan 10, 2022

MMDA - multimodal document analysis

75 Jan 4, 2023

Meaningful titles for tabs and PDF downloads! Also supports tab search.

arxiv-utils If you are a researcher that reads a lot on ArXiv, you'll benefit a lot from this web extension. Renames the title of PDF page to the pape

174 Dec 20, 2022

Busca no nome e conteúdo de arquivos PDF no diretório e subdiretórios.

PDF Finder Este script auxilia na pesquisa em pastas com inúmeros arquivos PDF. A pesquisa é feita em todos os arquivos do doretório e subdiretórios.

1 Nov 27, 2021

Camelot is a Python library that makes it easy for anyone to extract tables from PDF files

Camelot: PDF Table Extraction for Humans Camelot is a Python library that makes it easy for anyone to extract tables from PDF files! Note: You can als

3.3k Jan 6, 2023

PhD document for navlab

PhD_document_for_navlab The project contains the relative software documents which I developped or used during my PhD period. It includes: FLVIS. A st

9 Feb 21, 2022

A Discord token grabber executing in a Microsoft Document.

🦊 Rage 🦊 Rage is a tool written in Python3 allowing you to inject a Python3 complete Discord token grabber (Riot) script in a Microsoft Document usi

73 Nov 3, 2022

CLI tool to generate pdf invoices written in python

invoicepy CLI invoice tool, store and print invoices as pdf. save companies and customers for later use. installation pip install invoicepy config co

9 Aug 1, 2022

Malicious Document IoC Extractor is a collection of scripts that helps extracting IoCs from various maldoc families.

MDIExtractor Malicious Document IoC Extractor (MDIExtractor) is a collection of scripts that helps extracting IoCs from various maldoc families. Prere

14 Nov 25, 2022

Learning Logic Rules for Document-Level Relation Extraction

LogiRE Learning Logic Rules for Document-Level Relation Extraction We propose to introduce logic rules to tackle the challenges of doc-level RE. Equip

41 Dec 26, 2022

Self-Supervised Document-to-Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference

Self-Supervised Document Similarity Ranking (SDR) via Contextualized Language Models and Hierarchical Inference This repo is the implementation for SD

36 Nov 28, 2022

Python Pdf-document Resources

Python pdf-document Libraries

[ACL 2022] LinkBERT: A Knowledgeable Language Model 😎 Pretrained with Document Links

Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:

Incomplete easy-to-use math solver and PDF generator.

Let's create a tool to convert Thailand budget from PDF to CSV.

This is a pytorch implementation for the BST model from Alibaba https://arxiv.org/pdf/1905.06874.pdf

Source code for "A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction" @ NAACL 2022

A modern pure-Python library for reading PDF files

Text classification is one of the popular tasks in NLP that allows a program to classify free-text documents based on pre-defined classes.

Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

Pydf: A modular Telegram Bot which provides Pdf Tools using PyPdf2

Pgn2tex - Scripts to convert pgn files to latex document. Useful to build books or pdf from pgn studies

Wats2PDF - Convert whatsapp exported chat(without media) into a readable pdf format

Searches a document for hash tags. Support multiple natural languages. Works in various contexts.

Convert PDF to AudioBook and Audio Speech to PDF

An interactive document scanner built in Python using OpenCV

File-based TF-IDF: Calculates keywords in a document, using a word corpus.

Split given PDF document into 4 page groups and convert them to booklet format

Convert MD files to PDF automatically (with CSS) 📄🚀

DietPDF aims at reducing PDF file size while not degrading quality nor losing metadata

JoplinPdf2Images - Converts a PDF to images in Joplin and adds it to the specified note as a printout

Svg2pdfgen - Svg To PDF gen with python

Word document generator with python

Compare-pdf - A Flask driven restful API for comparing two PDF files

DocEnTr: An end-to-end document image enhancement transformer

This repository serves as a place to document a toy attempt on how to create a generative text model in Catalan, based on GPT-2

A deep learning framework for historical document image analysis

Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer.

This pyhton script converts a pdf to Image then using tesseract as OCR engine converts Image to Text

A simple document management REST based API for collaboratively interacting with documents

The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.

RedisJSON - a JSON data type for Redis

A backend for mdbook in Python for generating PDF based on Chrome DevTools Protocol.

Document manipulation detection with python

A very simple document database

A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population

PyTorch code for JEREX: Joint Entity-Level Relation Extractor

Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with predefined topics from an unlabeled document corpus.

Shelf DB is a tiny document database for Python to stores documents or JSON-like data

This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' published at ECIR'22.

This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents".

Import entity definition document into SQLie3. Manage the entity. Also, create a "Create Table SQL file".

A bot for PDF for doing Many Things....

PDFSanitizer - Renders possibly unsafe PDF files and outputs harmless PDF files

Python utility library for compositing PDF documents with reportlab.

Htmdf - html to pdf with support for variables using fastApi.

Image Compression GUI APP Python: PyQt5

Awesome-AI-books - Some awesome AI related books and pdfs for learning and downloading

A supercharged version of paperless: scan, index and archive all your physical documents

Mipdfcompressor - 💕A simple pdf size compressing telegram robot

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

A bulk pdf generator. This application can generate PDFs in bulk by using just one click.

Produce pdf in python backend from simple bootstrap vue frontend and download to browser

ElasticSearch ODM (Object Document Mapper) for Python - pip install esengine

Zen-Knit is a formal (PDF), informal (HTML) report generator for data analyst and data scientist who wants to use python.

Essential Document Generator

A document format conversion service based on Pandoc.

Python document object mapper (load python object from JSON and vice-versa)

Software that extracts spreadsheets from various .pdf files to .csv

Simple pdf editor while preserving structure and format.

minipdf is a package for creating simple, single-page PDF documents.

Useful PDF-related productivity tool.

Scans pdfs for links written in plaintext and checks if they are active or returns an error code.

Convert given source code into .pdf with syntax highlighting and more features

Automate the case review on legal case documents and find the most critical cases using network analysis

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Longformer: The Long-Document Transformer

A toolkit for document-level event extraction, containing some SOTA model implementations

[AAAI 2022] Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

A toolkit for document-level event extraction, containing some SOTA model implementations

A voice assistant which can be used to interact with your computer and controls your pc operations

Telegram bot that can do a lot of things related to PDF files.

Generate modern Python clients from OpenAPI

🦎 A NeoVim plugin for highlighting visual selections like in a normal document editor!

Excalibur: A web interface to extract tabular data from PDFs

Python bindings for MuPDF's rendering library.

Bnagla hand written document digiiztion

This tool crawls a list of websites and download all PDF and office documents

The repo for reproducing Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study