182 Python Document-layout Libraries

[ACL 2022] LinkBERT: A Knowledgeable Language Model 😎 Pretrained with Document Links

LinkBERT: A Knowledgeable Language Model Pretrained with Document Links This repo provides the model, code & data of our paper: LinkBERT: Pretraining

264 Jan 1, 2023

Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:

Multi-Type-TD-TSR Check it out on Source Code of our Paper: Multi-Type-TD-TSR Extracting Tables from Document Images using a Multi-stage Pipeline for

178 Dec 27, 2022

Source code for "A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction" @ NAACL 2022

TSAR Source code for NAACL 2022 paper: A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction. 🔥 Introduction We focus on extra

21 Sep 24, 2022

Text classification is one of the popular tasks in NLP that allows a program to classify free-text documents based on pre-defined classes.

Deep-Learning-for-Text-Document-Classification Text classification is one of the popular tasks in NLP that allows a program to classify free-text docu

2 Mar 17, 2022

Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

BiDR Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval. Requirements torch==

11 Oct 20, 2022

Pgn2tex - Scripts to convert pgn files to latex document. Useful to build books or pdf from pgn studies

Pgn2Latex (WIP) A simple script to make pdf from pgn files and studies. It's sti

12 Jul 23, 2022

I3-master-layout - Simple master and stack layout script

Simple master and stack layout script | ------ | ----- | | | | | Ma

18 Dec 5, 2022

Searches a document for hash tags. Support multiple natural languages. Works in various contexts.

ht-getter Searches a document for hash tags. Supports multiple natural languages. Works in various contexts. This package uses a non-regex approach an

1 Mar 1, 2022

An interactive document scanner built in Python using OpenCV

The scanner takes a poorly scanned image, finds the corners of the document, applies the perspective transformation to get a top-down view of the document, sharpens the image, and applies an adaptive color threshold to clean up the image.

1 Feb 12, 2022

BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents

BROS (BERT Relying On Spatiality) is a pre-trained language model focusing on text and layout for better key information extraction from documents. Given the OCR results of the document image, which are text and bounding box pairs, it can perform various key information extraction tasks, such as extracting an ordered item list from receipts

94 Dec 30, 2022

File-based TF-IDF: Calculates keywords in a document, using a word corpus.

File-based TF-IDF Calculates keywords in a document, using a word corpus. Why? Because I found myself with hundreds of plain text files, with no way t

1 Feb 11, 2022

Split given PDF document into 4 page groups and convert them to booklet format

PUTO: PDF to Booklet converter Split given PDF document into 4 page groups and convert them to booklet format. It creates a PDF like shown below: Fir

3 Mar 12, 2022

Harmonious Textual Layout Generation over Natural Images via Deep Aesthetics Learning

Harmonious Textual Layout Generation over Natural Images via Deep Aesthetics Learning Code for the paper Harmonious Textual Layout Generation over Nat

7 Aug 9, 2022

split-manga-pages: a command line utility written in Python that converts your double-page layout manga to single-page layout.

split-manga-pages split-manga-pages is a command line utility written in Python that converts your double-page layout manga (or any images in double p

3 May 24, 2022

Word document generator with python

In this study, real world data is anonymized. The content is completely different, but the structure is the same. It was a script I prepared for the backend of a work using UiPath.

3 Jan 30, 2022

DocEnTr: An end-to-end document image enhancement transformer

DocEnTR Description Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. This model is implemented on to

19 Jan 29, 2022

This repository serves as a place to document a toy attempt on how to create a generative text model in Catalan, based on GPT-2

GPT-2 Catalan playground and scripts to train a GPT-2 model either from scrath or from another pretrained model.

1 Jan 28, 2022

A deep learning framework for historical document image analysis

DIVA-DAF Description A deep learning framework for historical document image analysis. How to run Install dependencies # clone project git clone https

9 Aug 4, 2022

Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer.

DocEnTR Description Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. This model is implemented on to

74 Jan 7, 2023

A simple document management REST based API for collaboratively interacting with documents

documan_api A simple document management REST based API for collaboratively interacting with documents.

1 Jan 22, 2022

The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.

Good news! Our new work exhibits state-of-the-art performances on DocUNet benchmark dataset: DocScanner: Robust Document Image Rectification with Prog

231 Dec 26, 2022

RedisJSON - a JSON data type for Redis

RedisJSON is a Redis module that implements ECMA-404 The JSON Data Interchange Standard as a native data type. It allows storing, updating and fetching JSON values from Redis keys (documents).

3.4k Dec 29, 2022

Document manipulation detection with python

image manipulation detection task: -- tianchi function image segmentation salie

3 Aug 22, 2022

A very simple document database

DockieDb A simple in-memory document database. Installation Build the Wheel Fork or clone this repository and run python setup.py bdist_wheel in the r

1 Jan 16, 2022

A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population

DeepKE is a knowledge extraction toolkit supporting low-resource and document-level scenarios for entity, relation and attribute extraction. We provide comprehensive documents, Google Colab tutorials, and online demo for beginners.

1.6k Jan 5, 2023

PyTorch code for JEREX: Joint Entity-Level Relation Extractor

JEREX: "Joint Entity-Level Relation Extractor" PyTorch code for JEREX: "Joint Entity-Level Relation Extractor". For a description of the model and exp

50 Dec 1, 2022

Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with predefined topics from an unlabeled document corpus.

Lbl2Vec Lbl2Vec is an algorithm for unsupervised document classification and unsupervised document retrieval. It automatically generates jointly embed

61 Dec 20, 2022

Shelf DB is a tiny document database for Python to stores documents or JSON-like data

Shelf DB Introduction Shelf DB is a tiny document database for Python to stores documents or JSON-like data. Get it $ pip install shelfdb shelfquery S

35 Nov 3, 2022

This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' published at ECIR'22.

Paragraph Aggregation Retrieval Model (PARM) for Dense Document-to-Document Retrieval This repository contains the code for the paper PARM: A Paragrap

33 Aug 26, 2022

This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents".

Introduction This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents". If

0 Jan 11, 2022

🔤 Measure edit distance based on keyboard layout

clavier Measure edit distance based on keyboard layout. Table of contents Table of contents Introduction Installation User guide Keyboard layouts Dist

42 Dec 18, 2022

Import entity definition document into SQLie3. Manage the entity. Also, create a "Create Table SQL file".

EntityDocumentMaker Version 1.00 After importing the entity definition (Excel file), store the data in sqlite3. エンティティ定義（Excelファイル）をインポートした後、データをsqlit

1 Jan 9, 2022

Python utility library for compositing PDF documents with reportlab.

pdfdoc-py Python utility library for compositing PDF documents with reportlab. Installation The pdfdoc-py package can be installed directly from the s

1 Jan 6, 2022

Wallc - Calculate the layout on the wall to hang up pictures

wallc Calculate the layout on the wall to hang up pictures. Installation pip install git+https://github.com/trbznk/wallc.git Getting Started Currently

68 Sep 9, 2022

A supercharged version of paperless: scan, index and archive all your physical documents

Paperless-ng Paperless (click me) is an application by Daniel Quinn and contributors that indexes your scanned documents and allows you to easily sear

5.3k Jan 9, 2023

ElasticSearch ODM (Object Document Mapper) for Python - pip install esengine

esengine - The Elasticsearch Object Document Mapper esengine is an ODM (Object Document Mapper) it maps Python classes in to Elasticsearch index/doc_t

109 Nov 22, 2022

Essential Document Generator

Essential Document Generator Dead Simple Document Generation Whether it's testing database performance or a new web interface, we've all needed a dead

59 Nov 11, 2022

A document format conversion service based on Pandoc.

reformed Document format conversion service based on Pandoc. Usage The API specification for the Reformed server is as follows: GET /api/v1/formats: L

3 Jul 18, 2022

Python document object mapper (load python object from JSON and vice-versa)

lupin is a Python JSON object mapper lupin is meant to help in serializing python objects to JSON and unserializing JSON data to python objects. Insta

24 Nov 9, 2022

Fastest Gephi's ForceAtlas2 graph layout algorithm implemented for Python and NetworkX

ForceAtlas2 for Python A port of Gephi's Force Atlas 2 layout algorithm to Python 2 and Python 3 (with a wrapper for NetworkX and igraph). This is the

227 Jan 5, 2023

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Hiring We are hiring at all levels (including FTE researchers and interns)! If you are interested in working with us on NLP and large-scale pre-traine

7.8k Jan 9, 2023

Longformer: The Long-Document Transformer

Longformer Longformer and LongformerEncoderDecoder (LED) are pretrained transformer models for long documents. ***** New December 1st, 2020: Longforme

1.6k Dec 29, 2022

A toolkit for document-level event extraction, containing some SOTA model implementations

❤️ A Toolkit for Document-level Event Extraction with & without Triggers Hi, there 👋 . Thanks for your stay in this repo. This project aims at buildi

159 Dec 22, 2022

[AAAI 2022] Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

Sparse Structure Learning via Graph Neural Networks for inductive document classification Make graph dataset create co-occurrence graph for datasets.

16 Dec 22, 2022

A toolkit for document-level event extraction, containing some SOTA model implementations

Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker Source code for ACL-IJCNLP 2021 Long paper: Document-le

84 Dec 15, 2022

Generate modern Python clients from OpenAPI

openapi-python-client Generate modern Python clients from OpenAPI 3.x documents. This generator does not support OpenAPI 2.x FKA Swagger. If you need

555 Jan 2, 2023

🦎 A NeoVim plugin for highlighting visual selections like in a normal document editor!

🦎 HighStr.nvim A NeoVim plugin for highlighting visual selections like in a normal document editor! Demo TL;DR HighStr.nvim is a NeoVim plugin writte

222 Jan 3, 2023

Bnagla hand written document digiiztion

Bnagla hand written document digiiztion This repo addresses the problem of digiizing hand written documents in Bangla. Documents have definite fields

1 Dec 10, 2021

The repo for reproducing Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study

ECIR Reproducibility Paper: Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study This code corresponds to the reproducibility

3 Mar 31, 2022

Use Seaborn to visualize interpret the byte layout of Solana account types

solana-account-vis Use Seaborn to visually interpret the byte layout of Solana account types Usage from account_visualization import generate_account_

15 Aug 25, 2022

TensorFlow implementation of the paper "Hierarchical Attention Networks for Document Classification"

Hierarchical Attention Networks for Document Classification This is an implementation of the paper Hierarchical Attention Networks for Document Classi

83 Dec 5, 2022

DUE: End-to-End Document Understanding Benchmark

This is the repository that provide tools to download data, reproduce the baseline results and evaluation. What can you achieve with this guide Based

21 Dec 29, 2022

A WYSIWYG layout editor for Jupyter widgets

Based on the React library FlexLayout, ipyflex allows you to compose the sophisticated dashboard layouts from existing Jupyter widgets without coding. It supports multiple tabs, resizable cards, drag-and-drop layout, save dashboard template to disk, and many more.

93 Nov 24, 2022

Pytorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Perspective"

Graph Neural Topic Model (GNTM) This is the pytorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Persp

8 Sep 14, 2022

Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API

Dominate Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API. It allows you to write HTML pages in pure

1.5k Jan 9, 2023

A lightweight and fast-to-use Markdown document generator based on Python

1 Jan 10, 2022

MMDA - multimodal document analysis

75 Jan 4, 2023

A framework to create reusable Dash layout.

dash_component_template A framework to create reusable Dash layout.

4 Aug 4, 2022

PhD document for navlab

PhD_document_for_navlab The project contains the relative software documents which I developped or used during my PhD period. It includes: FLVIS. A st

9 Feb 21, 2022

A Discord token grabber executing in a Microsoft Document.

🦊 Rage 🦊 Rage is a tool written in Python3 allowing you to inject a Python3 complete Discord token grabber (Riot) script in a Microsoft Document usi

73 Nov 3, 2022

Malicious Document IoC Extractor is a collection of scripts that helps extracting IoCs from various maldoc families.

MDIExtractor Malicious Document IoC Extractor (MDIExtractor) is a collection of scripts that helps extracting IoCs from various maldoc families. Prere

14 Nov 25, 2022

Learning Logic Rules for Document-Level Relation Extraction

LogiRE Learning Logic Rules for Document-Level Relation Extraction We propose to introduce logic rules to tackle the challenges of doc-level RE. Equip

41 Dec 26, 2022

Self-Supervised Document-to-Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference

Self-Supervised Document Similarity Ranking (SDR) via Contextualized Language Models and Hierarchical Inference This repo is the implementation for SD

36 Nov 28, 2022

Detectron2 for Document Layout Analysis

Detectron2 trained on PubLayNet dataset This repo contains the training configurations, code and trained models trained on PubLayNet dataset using Det

163 Nov 21, 2022

A python program for visualizing MIDI files, and displaying them in a spiral layout

SpiralMusic_python A python program for visualizing MIDI files, and displaying them in a spiral layout For a hardware version using Teensy & LED displ

6 Nov 23, 2022

AI-powered literature discovery and review engine for medical/scientific papers

AI-powered literature discovery and review engine for medical/scientific papers paperai is an AI-powered literature discovery and review engine for me

819 Dec 30, 2022

Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)

DocFormer - PyTorch Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for t

171 Jan 6, 2023

TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name (Meta info), MP4 URL, and utterance id

TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name (Meta info), MP4 URL

3 Dec 26, 2022

This project uses word frequency and Term Frequency-Inverse Document Frequency to summarize a text.

Text Summarizer This project uses word frequency and Term Frequency-Inverse Document Frequency to summarize a text. Team Members This mini-project was

1 Nov 16, 2021

gdsfactory is an EDA (electronics design automation) tool to Layout Integrated Circuits.

gdsfactory 3.5.5 gdsfactory is an EDA (electronics design automation) tool to Layout Integrated Circuits. It is build on top of phidl gdspy and klayou

147 Jan 4, 2023

Keyboard Layout Change - Extension for Ulauncher

4 Aug 26, 2022

Document blur detection based on Laplacian operator and text detection.

Document Blur Detection For general blurred image, using the variance of Laplacian operator is a good solution. But as for the blur detection of docum

5 Oct 20, 2022

Python-based tools for document analysis and OCR

ocropy OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do so

3.2k Dec 31, 2022

A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV.

DcoumentScanner A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV. Directly install the .exe file to inst

1 Oct 29, 2021

Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation

Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation Official Code Repository for the paper "Unsupervised Documen

2 Oct 26, 2021

Generate custom detailed survey paper with topic clustered sections and proper citations, from just a single query in just under 30 mins !!

Auto-Research A no-code utility to generate a detailed well-cited survey with topic clustered sections (draft paper format) and other interesting arti

20 Dec 14, 2022

Table automatically extraction from PDF Document

PDF Table Extractor Table automatically extraction from PDF Document Our Icon 📌 Name : PDF Table Extractor 📌 Authors : Minku Koo Jiyong Park 📌 Deve

1 Jan 10, 2022

Alternative layout visualizer for ZSA Moonlander keyboard

General info This is a keyboard layout visualizer for ZSA Moonlander keyboard (because I didn't find their Oryx or their training tool particularly us

10 Jul 19, 2022

Key information extraction from invoice document with Graph Convolution Network

Key Information Extraction from Scanned Invoices Key information extraction from invoice document with Graph Convolution Network Related blog post fro

39 Dec 16, 2022

PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)

498 Dec 24, 2022

A basic layout of atm working of my local database

Software for working Banking service 😄 This project was developed for Banking service. mysql server is required To have mysql server on your system u

1 Oct 21, 2021

The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

PRIMER The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization. PRIMER is a pre-trained model for mu

114 Jan 6, 2023

A basic layout of atm working of my local database

Software for working Banking service 😄 This project was developed for Banking service. mysql server is required To have mysql server on your system u

1 Oct 21, 2021

The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

PRIMER The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization. PRIMER is a pre-trained model for mu

111 Dec 18, 2022

One implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing".

Introduction One implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing". Users

18 Dec 11, 2022

Styled Handwritten Text Generation with Transformers (ICCV 21)

⚡ Handwriting Transformers [PDF] Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan & Mubarak Shah Abstract: We

85 Dec 22, 2022

Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

18 Nov 28, 2022

This repository contains the code for "SBEVNet: End-to-End Deep Stereo Layout Estimation" paper by Divam Gupta, Wei Pu, Trenton Tabor, Jeff Schneider

SBEVNet: End-to-End Deep Stereo Layout Estimation This repository contains the code for "SBEVNet: End-to-End Deep Stereo Layout Estimation" paper by D

19 Dec 17, 2022

PyTorch implementation of "LayoutTransformer: Layout Generation and Completion with Self-attention"

PyTorch implementation of "LayoutTransformer: Layout Generation and Completion with Self-attention" to appear in ICCV 2021

75 Dec 23, 2022

A cross-document event and entity coreference resolution system, trained and evaluated on the ECB+ corpus.

A Comprehensive Comparison of Word Embeddings in Event & Entity Coreference Resolution. Introduction This repo contains experimental code derived from

2 May 9, 2022

Docbarcodes extracts 1D and 2D barcodes from scanned PDF documents or images. It can be used to automate extraction and processing of all kind of documents.

Intro Barcodes are being used in many documents or forms to enable machine reading capabilities and reduce manual processing effort. Simple 1D barcode

3 Jun 18, 2022

Document Web APIs made with Django Rest Framework

DRF Docs Document Web APIs made with Django Rest Framework. View Demo Contributors Wanted: Do you like this project? Using it? Let's make it better! S

626 Nov 20, 2022

[CVPR'21] Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation

Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation Weixiang Yang, Qi Li, Wenxi Liu, Yuanlong Yu, Y

118 Dec 26, 2022

Mayan EDMS is a document management system.

Mayan EDMS is a document management system. Its main purpose is to store, introspect, and categorize files, with a strong emphasis on preserving the contextual and business information of documents. It can also OCR, preview, label, sign, send, and receive thoses files.

3 Oct 2, 2021

Project template layout for Django 3.0+

Django 3.0+ project template This is a simple Django 3.0+ project template with my preferred setup. Most Django project templates make way too many as

649 Dec 30, 2022

docTR by Mindee (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

1.5k Jan 1, 2023

Python Document-layout Resources

Python document-layout Libraries

[ACL 2022] LinkBERT: A Knowledgeable Language Model 😎 Pretrained with Document Links

Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:

Source code for "A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction" @ NAACL 2022

Text classification is one of the popular tasks in NLP that allows a program to classify free-text documents based on pre-defined classes.

Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

Pgn2tex - Scripts to convert pgn files to latex document. Useful to build books or pdf from pgn studies

I3-master-layout - Simple master and stack layout script

Searches a document for hash tags. Support multiple natural languages. Works in various contexts.

An interactive document scanner built in Python using OpenCV

BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents

File-based TF-IDF: Calculates keywords in a document, using a word corpus.

Split given PDF document into 4 page groups and convert them to booklet format

Harmonious Textual Layout Generation over Natural Images via Deep Aesthetics Learning

split-manga-pages: a command line utility written in Python that converts your double-page layout manga to single-page layout.

Word document generator with python

DocEnTr: An end-to-end document image enhancement transformer

This repository serves as a place to document a toy attempt on how to create a generative text model in Catalan, based on GPT-2

A deep learning framework for historical document image analysis

Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer.

A simple document management REST based API for collaboratively interacting with documents

The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.

RedisJSON - a JSON data type for Redis

Document manipulation detection with python

A very simple document database

A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population

PyTorch code for JEREX: Joint Entity-Level Relation Extractor

Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with predefined topics from an unlabeled document corpus.

Shelf DB is a tiny document database for Python to stores documents or JSON-like data

This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' published at ECIR'22.

This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents".

🔤 Measure edit distance based on keyboard layout

Import entity definition document into SQLie3. Manage the entity. Also, create a "Create Table SQL file".

Python utility library for compositing PDF documents with reportlab.

Wallc - Calculate the layout on the wall to hang up pictures

A supercharged version of paperless: scan, index and archive all your physical documents

ElasticSearch ODM (Object Document Mapper) for Python - pip install esengine

Essential Document Generator

A document format conversion service based on Pandoc.

Python document object mapper (load python object from JSON and vice-versa)

Fastest Gephi's ForceAtlas2 graph layout algorithm implemented for Python and NetworkX

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Longformer: The Long-Document Transformer

A toolkit for document-level event extraction, containing some SOTA model implementations

[AAAI 2022] Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

A toolkit for document-level event extraction, containing some SOTA model implementations

Generate modern Python clients from OpenAPI

🦎 A NeoVim plugin for highlighting visual selections like in a normal document editor!

Bnagla hand written document digiiztion

The repo for reproducing Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study

Use Seaborn to visualize interpret the byte layout of Solana account types

TensorFlow implementation of the paper "Hierarchical Attention Networks for Document Classification"

DUE: End-to-End Document Understanding Benchmark

A WYSIWYG layout editor for Jupyter widgets

Pytorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Perspective"

Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API

A lightweight and fast-to-use Markdown document generator based on Python

MMDA - multimodal document analysis

A framework to create reusable Dash layout.

PhD document for navlab

A Discord token grabber executing in a Microsoft Document.

Malicious Document IoC Extractor is a collection of scripts that helps extracting IoCs from various maldoc families.

Learning Logic Rules for Document-Level Relation Extraction

Self-Supervised Document-to-Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference

Detectron2 for Document Layout Analysis

A python program for visualizing MIDI files, and displaying them in a spiral layout

AI-powered literature discovery and review engine for medical/scientific papers

Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)

TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name (Meta info), MP4 URL, and utterance id

This project uses word frequency and Term Frequency-Inverse Document Frequency to summarize a text.

gdsfactory is an EDA (electronics design automation) tool to Layout Integrated Circuits.

Keyboard Layout Change - Extension for Ulauncher

Document blur detection based on Laplacian operator and text detection.

Python-based tools for document analysis and OCR

A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV.

Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation

Generate custom detailed survey paper with topic clustered sections and proper citations, from just a single query in just under 30 mins !!

Table automatically extraction from PDF Document

Alternative layout visualizer for ZSA Moonlander keyboard