267 Python Pdf-document Libraries

DivNoising is an unsupervised denoising method to generate diverse denoised samples for any noisy input image. This repository contains the code to reproduce the results reported in the paper https://openreview.net/pdf?id=agHLCOBM5jP

DivNoising: Diversity Denoising with Fully Convolutional Variational Autoencoders Mangal Prakash1, Alexander Krull1,2, Florian Jug2 1Authors contribut

33 Dec 30, 2022

Detectron2 for Document Layout Analysis

Detectron2 trained on PubLayNet dataset This repo contains the training configurations, code and trained models trained on PubLayNet dataset using Det

163 Nov 21, 2022

Sheet Data Image/PDF-to-CSV Converter

5 Nov 22, 2021

AI-powered literature discovery and review engine for medical/scientific papers

AI-powered literature discovery and review engine for medical/scientific papers paperai is an AI-powered literature discovery and review engine for me

819 Dec 30, 2022

The best way to convert files on your computer, be it .pdf to .png, .pdf to .docx, .png to .ico, or anything you can imagine.

2 Nov 20, 2021

rst2pdf: Use a text editor. Make a PDF.

487 Jan 6, 2023

Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)

DocFormer - PyTorch Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for t

171 Jan 6, 2023

This is PDF Merger Application Developed using Just Python

2 Nov 18, 2021

Chilean Digital Vaccination Pass Parser (CDVPP) parses digital vaccination passes from PDF files

cdvpp Chilean Digital Vaccination Pass Parser (CDVPP) parses digital vaccination passes from PDF files Reads a Digital Vaccination Pass PDF file as in

1 Nov 17, 2021

TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name (Meta info), MP4 URL, and utterance id

TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name (Meta info), MP4 URL

3 Dec 26, 2022

A simple Python script to convert multiple images (well technically also a single image) into a pdf.

PythonImage2PDF A simple Python script to convert multiple images into a single PDF-document. Created basically for only my own needs for converting m

1 Jun 28, 2022

A simple Telegram bot can convert web docs, Telegraph links, etc. to Pdf !

A Telegram Bot to convert http Links to PDF

10 Oct 28, 2022

The goal of this project is for anyone with an old printer to be able to double-sided printing.

Welcome to PDF-double-side! Hi! I'm 15. I have a old printer so I can't print double-sided outs. The goal of this project is for anyone with an old pr

4 Dec 28, 2021

Simple python tool for the purpose of swapping latinic letters with cirilic ones and vice versa in txt, docx and pdf files in Serbian language

Alpha Swap English This is a simple python tool for the purpose of swapping latinic letters with cirylic ones and vice versa, in txt, docx and pdf fil

3 May 31, 2022

Extract the table in the PDF，outputs the data similar to the json format

extract the table in the PDF，outputs the data similar to the json format

3 Nov 25, 2021

Reads Data from given Excel File and exports Single PDFs and a complete PDF grouped by Gateway

E-Shelter Excel2QR Reads Data from given Excel File and exports Single PDFs and a complete PDF grouped by Gateway Features Reads Excel 2021 Export Sin

1 Nov 13, 2021

This project uses word frequency and Term Frequency-Inverse Document Frequency to summarize a text.

Text Summarizer This project uses word frequency and Term Frequency-Inverse Document Frequency to summarize a text. Team Members This mini-project was

1 Nov 16, 2021

A Certificate renaming tool made for IEEE CS SBC, SJCE.

PDF Batch Renamer Made for IEEE CS SBC, SJCE How to use? Before using the python script, ensure that pytesseract, pdf2image, opencv and other supporti

2 Nov 14, 2021

Searching keywords in PDF file folders

keyword_searching Steps to use this Python scripts： (1)Paste this script into the file folder containing the PDF files you need to search from; (2)Thi

1 Nov 8, 2021

This is a file deletion program that asks you for an extension of a file (.mp3, .pdf, .docx, etc.) to delete all of the files in a dir that have that extension.

FileBulk This is a file deletion program that asks you for an extension of a file (.mp3, .pdf, .docx, etc.) to delete all of the files in a dir that h

1 Jun 26, 2022

pikepdf is a Python library for reading and writing PDF files.

A Python library for reading and writing PDF, powered by qpdf

1.6k Jan 3, 2023

Document blur detection based on Laplacian operator and text detection.

Document Blur Detection For general blurred image, using the variance of Laplacian operator is a good solution. But as for the blur detection of docum

5 Oct 20, 2022

This is a GUI for scrapping PDFs with the help of optical character recognition making easier than ever to scrape PDFs.

pdf-scraper-with-ocr With this tool I am aiming to facilitate the work of those who need to scrape PDFs either by hand or using tools that doesn't imp

75 Oct 21, 2022

Small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface

1.8k Dec 29, 2022

Python-based tools for document analysis and OCR

ocropy OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do so

3.2k Dec 31, 2022

Generate a preview image for a PDF.

PDF ➡️ Preview A simple tool to save me time on Illustrator. Generates a preview image for a PDF file. Useful for sneak peeks to academic publications

51 Sep 22, 2022

Program that locks/unlocks pdf files🐍

🐍 📄 PDFtools 📄 🐍 Programa que bloqueia/desbloqueia arquivos pdf Requisitos • Como usar • Capturas de Tela 🚨 Aviso 🚨 Altere os caminhos referente

1 Nov 4, 2021

Converting Html files to pdf using python script, pdfkit module and wkhtmltopdf.

Html-to-pdf-pdfkit-wkhtml- This repository has code for converting local html files and online html resources into pdf. It is an python script which u

1 Nov 9, 2021

this is simple program, that converts pdf file to png

author: a5892731 last update:2021-11-01 version: 1.1 resources: -https://pypi.org/project/pdf2image/ -https://github.com/oschwartz10612/poppler-window

1 Nov 1, 2021

Convert markdown to HTML using the GitHub API and some additional tweaks with Python.

Convert markdown to HTML using the GitHub API and some additional tweaks with Python. Comes with full formula support and image compression.

70 Dec 23, 2022

A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV.

DcoumentScanner A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV. Directly install the .exe file to inst

1 Oct 29, 2021

a simple ehentai downloader with jpg 2 pdf

Simple_Ehentai_DownLoader a simple ehentai downloader with jpg 2 pdf 中文介绍 Environment python3.8 How to use before you start,there are some tips. the q

6 Dec 11, 2022

This is a GUI for scrapping PDFs with the help of optical character recognition making easier than ever to scrape PDFs.

pdf-scraper-with-ocr With this tool I am aiming to facilitate the work of those who need to scrape PDFs either by hand or using tools that doesn't imp

75 Oct 21, 2022

Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation

Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation Official Code Repository for the paper "Unsupervised Documen

2 Oct 26, 2021

Generate custom detailed survey paper with topic clustered sections and proper citations, from just a single query in just under 30 mins !!

Auto-Research A no-code utility to generate a detailed well-cited survey with topic clustered sections (draft paper format) and other interesting arti

20 Dec 14, 2022

Table automatically extraction from PDF Document

PDF Table Extractor Table automatically extraction from PDF Document Our Icon 📌 Name : PDF Table Extractor 📌 Authors : Minku Koo Jiyong Park 📌 Deve

1 Jan 10, 2022

Key information extraction from invoice document with Graph Convolution Network

Key Information Extraction from Scanned Invoices Key information extraction from invoice document with Graph Convolution Network Related blog post fro

39 Dec 16, 2022

Convert lecture videos to slides in one line. Takes an input of a directory containing your lecture videos and outputs a directory containing .PDF files containing the slides of each lecture.

12 Sep 10, 2022

PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)

498 Dec 24, 2022

The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

PRIMER The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization. PRIMER is a pre-trained model for mu

114 Jan 6, 2023

The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

PRIMER The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization. PRIMER is a pre-trained model for mu

111 Dec 18, 2022

One implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing".

Introduction One implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing". Users

18 Dec 11, 2022

Styled Handwritten Text Generation with Transformers (ICCV 21)

⚡ Handwriting Transformers [PDF] Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan & Mubarak Shah Abstract: We

85 Dec 22, 2022

Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

18 Nov 28, 2022

A simple pdf size compressing telegram robot witten in python.

Pdf Compressor Telegram Bot ##About : A simple pdf size compressing telegram robot witten in python. Mostly useful for digital documentation. Deploy t

22 Oct 28, 2022

Arxiv2Kindle is a simple script written in python that converts LaTeX source downloaded from Arxiv and recompiles it to better fit a Kindle or other similar reading devices.

Arxiv2Kindle is a simple script written in python that converts LaTeX source downloaded from Arxiv and recompiles it to better fit a read

8 Jul 9, 2022

Convert Lecture Videos to PDF

Convert Lecture Videos to PDF Description Want to go through lecture videos faster without missing any information? Wish you can read the lecture vide

20 Nov 25, 2022

A cross-document event and entity coreference resolution system, trained and evaluated on the ECB+ corpus.

A Comprehensive Comparison of Word Embeddings in Event & Entity Coreference Resolution. Introduction This repo contains experimental code derived from

2 May 9, 2022

Programa que viabiliza a OCR (Optical Character Reading - leitura óptica de caracteres) de um PDF.

Este programa tem o intuito de ser um modificador de arquivos PDF. Os arquivos PDFs podem ser 3: PDFs verdadeiros - em que podem ser selecionados o ti

2 Oct 11, 2021

Simple python tool created for downloading PDF.

PDFdownloader Usage Open PDF in full-screen mode Run scan.exe Enter how many pages you want to scan Focus PDF After scanning is done, run merge.exe En

5 Oct 27, 2021

This book will take you on an exploratory journey through the PDF format, and the borb Python library.

281 Jan 1, 2023

Docbarcodes extracts 1D and 2D barcodes from scanned PDF documents or images. It can be used to automate extraction and processing of all kind of documents.

Intro Barcodes are being used in many documents or forms to enable machine reading capabilities and reduce manual processing effort. Simple 1D barcode

3 Jun 18, 2022

Document Web APIs made with Django Rest Framework

DRF Docs Document Web APIs made with Django Rest Framework. View Demo Contributors Wanted: Do you like this project? Using it? Let's make it better! S

626 Nov 20, 2022

Website desenvolvido em Django para gerenciamento e upload de arquivos (.pdf).

Website para Gerenciamento de Arquivos Features Esta é uma aplicação full stack web construída para desenvolver habilidades com o framework Django. O

8 Sep 22, 2022

distfit - Probability density fitting

Python package for probability density function fitting of univariate distributions of non-censored data

187 Dec 30, 2022

An application which enables the users to perform simple yet intriguing PDF operations

AstutePDF A repository containing the GUI for an application which enables the users to perform simple yet intriguing PDF operations. These include, M

5 Jan 22, 2022

Mayan EDMS is a document management system.

Mayan EDMS is a document management system. Its main purpose is to store, introspect, and categorize files, with a strong emphasis on preserving the contextual and business information of documents. It can also OCR, preview, label, sign, send, and receive thoses files.

3 Oct 2, 2021

I can help you convert your images to pdf file.

IMAGE TO PDF CONVERTER BOT Configs TOKEN - Get bot token from @BotFather API_ID - From my.telegram.org API_HASH - From my.telegram.org Deploy to Herok

10 Dec 14, 2022

Parser manager for parsing DOC, DOCX, PDF or HTML files

Parser manager Description Parser gets PDF, DOC, DOCX or HTML file via API and saves parsed data to the database. Implemented in Ruby 3.0.1 using Acti

4 Dec 4, 2021

docTR by Mindee (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

1.5k Jan 1, 2023

Convert emails without attachments to pdf and send as email

Email to PDF to email This script will check an imap folder for unread emails. Any unread email that does not have an attachment will be converted to

21 Nov 22, 2022

Gera um PDF, logo depois de você responder um questionário simples, e envia para o e-mail que você informar.

PDF generator and send it for your email Criador: Francisco Robson de O. Dutra Filho Repositório criado no dia 18/09/2021 Instagram: @robsondutra_ Sob

8 Nov 22, 2021

The project is investigating methods to extract human-marked data from document forms such as surveys and tests.

The project is investigating methods to extract human-marked data from document forms such as surveys and tests. They can read questions, multiple-choice exam papers, and grade.

5 Mar 27, 2022

x-ray is a Python library for finding bad redactions in PDF documents.

A tool to detect whether a PDF has a bad redaction

73 Dec 19, 2022

CDLA: A Chinese document layout analysis (CDLA) dataset

CDLA: A Chinese document layout analysis (CDLA) dataset 介绍 CDLA是一个中文文档版面分析数据集，面向中文文献类（论文）场景。包含以下10个label：正文标题图片图片标题表格表格标题页眉页脚注释公式 Text Title

84 Dec 28, 2022

NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking

pretrain4ir_tutorial NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking 用作NLPIR实验室, Pre-training

12 Apr 7, 2022

UniLM AI - Large-scale Self-supervised Pre-training across Tasks, Languages, and Modalities

Pre-trained (foundation) models across tasks (understanding, generation and translation), languages (100+ languages), and modalities (language, image, audio, vision + language, audio + language, etc.)

7.6k Jan 1, 2023

This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Combating Embedding Barrier in Multilingual Models for Low-Resource Language Understanding".

BanglaBERT This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced i

197 Dec 25, 2022

🍊 PAUSE (Positive and Annealed Unlabeled Sentence Embedding), accepted by EMNLP'2021 🌴

PAUSE: Positive and Annealed Unlabeled Sentence Embedding Sentence embedding refers to a set of effective and versatile techniques for converting raw

21 Dec 15, 2022

Document processing using transformers

Doc Transformers Document processing using transformers. This is still in developmental phase, currently supports only extraction of form data i.e (ke

13 Dec 21, 2022

txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications.

3.1k Dec 31, 2022

Python lib for Simple PDF text extraction

651 Jan 1, 2023

WeasyPrint is a smart solution helping web developers to create PDF documents.

WeasyPrint is a smart solution helping web developers to create PDF documents. It turns simple HTML pages into gorgeous statistical reports, invoices, tickets…

5.4k Jan 8, 2023

borb is a library for reading, creating and manipulating PDF files in python.

2.9k Jan 1, 2023

Fetch McDonald invoices from mailbox and merge them to one PDF file.

concatenate Fetch McDonald invoices from mailbox and merge them to one PDF file. Description This script will fetch all McDonald invoice pdfs from a p

3 Oct 6, 2022

Python script that split PDF files.

Automatic PDF Splitter This script can create new single-page PDFs files from multipaged PDFs. Requirements Python 3.0+ # Debian distros sudo apt-get

5 Apr 2, 2022

Merge multiple PDF files into one.

PDF Merger Merge multiple PDF files into one. Usage % python pdf_merger.py -h usage: pdf_merger.py [-h] [-o OUTPUT] [-f [FILES ...]] optional argumen

6 Oct 3, 2022

Images to PDF Telegram Bot

ilovepdf Convert Images to PDF Bot This bot will helps you to create pdf's from your images [without leaving telegram] 😉 By Default: your pdf fil

116 Dec 29, 2022

Generate a bunch of malicious pdf files with phone-home functionality. Can be used with Burp Collaborator

Malicious PDF Generator ☠️ Generate ten different malicious pdf files with phone-home functionality. Can be used with Burp Collaborator. Used for pene

1.9k Jan 1, 2023

Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.

235 Dec 22, 2022

This app converts an pdf file into the audio file.

PDF-to-Audio This app takes an pdf as an input and convert it into audio, and the library text-to-speech starts speaking the preffered page given in t

3 Aug 4, 2021

Dewarping Document Image By Displacement Flow Estimation with Fully Convolutional Network

39 Aug 2, 2021

Hierarchical Metadata-Aware Document Categorization under Weak Supervision (WSDM'21)

Hierarchical Metadata-Aware Document Categorization under Weak Supervision This project provides a weakly supervised framework for hierarchical metada

53 Sep 17, 2022

Extract Thailand COVID-19 Cluster data from daily briefing pdf.

Thailand COVID-19 Cluster Data Extraction About Extract Clusters from Thailand Daily COVID-19 briefing PDF Download latest data Here. Data will be upd

5 Sep 27, 2021

organize - The file management automation tool

1.5k Jan 1, 2023

Performing the following operations using python on PDF.

Python PDF Handling Tutorial Python is a highly versatile language with a huge set of libraries. It is a high level language with simple syntax. Pytho

131 Dec 16, 2022

A Python tool to generate a static HTML file that represents the internal structure of a PDF file

PDFSyntax A Python tool to generate a static HTML file that represents the internal structure of a PDF file At some point the low-level functions deve

394 Dec 30, 2022

Command line program to download documents from web portals.

command line document download made easy Highlights list available documents in json format or download them filter documents using string matching re

16 Dec 26, 2022

Dewarping Document Image By Displacement Flow Estimation with Fully Convolutional Network.

Dewarping Document Image By Displacement Flow Estimation with Fully Convolutional Network

111 Dec 27, 2022

Code and dataset for ACL2018 paper "Exploiting Document Knowledge for Aspect-level Sentiment Classification"

Aspect-level Sentiment Classification Code and dataset for ACL2018 [paper] ‘‘Exploiting Document Knowledge for Aspect-level Sentiment Classification’’

146 Nov 29, 2022

pystitcher stitches your PDF files together, generating nice customizable bookmarks for you using a declarative markdown file as input

pystitcher pystitcher stitches your PDF files together, generating nice customizable bookmarks for you using a declarative input in the form of a mark

387 Dec 10, 2022

A python library for extracting text from PDFs without losing the formatting of the PDF content.

Multilingual PDF to Text Install Package from Pypi Install it using pip. pip install multilingual-pdf2text The library uses Tesseract which can be ins

49 Nov 7, 2022

SDL: Synthetic Document Layout dataset

SDL is the project that synthesizes document images. It facilitates multiple-level labeling on document images and can generate in multiple languages.

0 Oct 7, 2021

document organizer with tags and full-text-search, in a simple and clean sqlite3 schema

152 Oct 29, 2022

Implementation for our AAAI2021 paper (Entity Structure Within and Throughout: Modeling Mention Dependencies for Document-Level Relation Extraction).

SSAN Introduction This is the pytorch implementation of the SSAN model (see our AAAI2021 paper: Entity Structure Within and Throughout: Modeling Menti

69 Nov 15, 2022

Cross-Document Coreference Resolution

Cross-Document Coreference Resolution This repository contains code and models for end-to-end cross-document coreference resolution, as decribed in ou

29 Nov 28, 2022

A embed able annotation tool for end to end cross document co-reference

CoRefi CoRefi is an emebedable web component and stand alone suite for exaughstive Within Document and Cross Document Coreference Anntoation. For a de

39 Dec 12, 2022

Source code for paper "Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling", AAAI 2021

ATLOP Code for AAAI 2021 paper Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling. If you make use of this co

146 Nov 29, 2022

PyMuPDF is a Python binding with support for MuPDF

PyMuPDF is a Python binding with support for MuPDF (current version 1.18.*), a lightweight PDF, XPS, and E-book viewer, renderer, and toolkit, which is maintained and developed by Artifex Software, Inc.

1.9k Jan 3, 2023

Code for NAACL 2021 full paper "Efficient Attentions for Long Document Summarization"

LongDocSum Code for NAACL 2021 paper "Efficient Attentions for Long Document Summarization" This repository contains data and models needed to reprodu

56 Jan 2, 2023

Python Pdf-document Resources

Python pdf-document Libraries

DivNoising is an unsupervised denoising method to generate diverse denoised samples for any noisy input image. This repository contains the code to reproduce the results reported in the paper https://openreview.net/pdf?id=agHLCOBM5jP

Detectron2 for Document Layout Analysis

Sheet Data Image/PDF-to-CSV Converter

AI-powered literature discovery and review engine for medical/scientific papers

The best way to convert files on your computer, be it .pdf to .png, .pdf to .docx, .png to .ico, or anything you can imagine.

rst2pdf: Use a text editor. Make a PDF.

Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)

This is PDF Merger Application Developed using Just Python

Chilean Digital Vaccination Pass Parser (CDVPP) parses digital vaccination passes from PDF files

TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name (Meta info), MP4 URL, and utterance id

A simple Python script to convert multiple images (well technically also a single image) into a pdf.

A simple Telegram bot can convert web docs, Telegraph links, etc. to Pdf !

The goal of this project is for anyone with an old printer to be able to double-sided printing.

Simple python tool for the purpose of swapping latinic letters with cirilic ones and vice versa in txt, docx and pdf files in Serbian language

Extract the table in the PDF，outputs the data similar to the json format

Reads Data from given Excel File and exports Single PDFs and a complete PDF grouped by Gateway

This project uses word frequency and Term Frequency-Inverse Document Frequency to summarize a text.

A Certificate renaming tool made for IEEE CS SBC, SJCE.

Searching keywords in PDF file folders

This is a file deletion program that asks you for an extension of a file (.mp3, .pdf, .docx, etc.) to delete all of the files in a dir that have that extension.

pikepdf is a Python library for reading and writing PDF files.

Document blur detection based on Laplacian operator and text detection.

This is a GUI for scrapping PDFs with the help of optical character recognition making easier than ever to scrape PDFs.

Small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface

Python-based tools for document analysis and OCR

Generate a preview image for a PDF.

Program that locks/unlocks pdf files🐍

Converting Html files to pdf using python script, pdfkit module and wkhtmltopdf.

this is simple program, that converts pdf file to png

Convert markdown to HTML using the GitHub API and some additional tweaks with Python.

A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV.

a simple ehentai downloader with jpg 2 pdf

This is a GUI for scrapping PDFs with the help of optical character recognition making easier than ever to scrape PDFs.

Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation

Generate custom detailed survey paper with topic clustered sections and proper citations, from just a single query in just under 30 mins !!

Table automatically extraction from PDF Document

Key information extraction from invoice document with Graph Convolution Network

Convert lecture videos to slides in one line. Takes an input of a directory containing your lecture videos and outputs a directory containing .PDF files containing the slides of each lecture.

PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks

The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

One implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing".

Styled Handwritten Text Generation with Transformers (ICCV 21)

Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

A simple pdf size compressing telegram robot witten in python.

Arxiv2Kindle is a simple script written in python that converts LaTeX source downloaded from Arxiv and recompiles it to better fit a Kindle or other similar reading devices.

Convert Lecture Videos to PDF

A cross-document event and entity coreference resolution system, trained and evaluated on the ECB+ corpus.

Programa que viabiliza a OCR (Optical Character Reading - leitura óptica de caracteres) de um PDF.

Simple python tool created for downloading PDF.

This book will take you on an exploratory journey through the PDF format, and the borb Python library.

Docbarcodes extracts 1D and 2D barcodes from scanned PDF documents or images. It can be used to automate extraction and processing of all kind of documents.

Document Web APIs made with Django Rest Framework

Website desenvolvido em Django para gerenciamento e upload de arquivos (.pdf).

distfit - Probability density fitting

An application which enables the users to perform simple yet intriguing PDF operations

Mayan EDMS is a document management system.

I can help you convert your images to pdf file.

Parser manager for parsing DOC, DOCX, PDF or HTML files

docTR by Mindee (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Convert emails without attachments to pdf and send as email

Gera um PDF, logo depois de você responder um questionário simples, e envia para o e-mail que você informar.

The project is investigating methods to extract human-marked data from document forms such as surveys and tests.

x-ray is a Python library for finding bad redactions in PDF documents.

CDLA: A Chinese document layout analysis (CDLA) dataset

NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking

UniLM AI - Large-scale Self-supervised Pre-training across Tasks, Languages, and Modalities

This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Combating Embedding Barrier in Multilingual Models for Low-Resource Language Understanding".

🍊 PAUSE (Positive and Annealed Unlabeled Sentence Embedding), accepted by EMNLP'2021 🌴

Document processing using transformers

txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications.

Python lib for Simple PDF text extraction

WeasyPrint is a smart solution helping web developers to create PDF documents.

borb is a library for reading, creating and manipulating PDF files in python.

Fetch McDonald invoices from mailbox and merge them to one PDF file.

Python script that split PDF files.

Merge multiple PDF files into one.

Images to PDF Telegram Bot