220 Repositories
Python icloud-document-storage Libraries
[ACL 2022] LinkBERT: A Knowledgeable Language Model 😎 Pretrained with Document Links
LinkBERT: A Knowledgeable Language Model Pretrained with Document Links This repo provides the model, code & data of our paper: LinkBERT: Pretraining
Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:
Multi-Type-TD-TSR Check it out on Source Code of our Paper: Multi-Type-TD-TSR Extracting Tables from Document Images using a Multi-stage Pipeline for
Stephen's Obsessive Note-Storage Engine.
Latest Release · PyPi Package · Issues · Changelog · License # Get Sonse and tell it where your notes are... $ pip install sonse $ export SONSE="$HOME
Source code for "A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction" @ NAACL 2022
TSAR Source code for NAACL 2022 paper: A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction. 🔥 Introduction We focus on extra
Text classification is one of the popular tasks in NLP that allows a program to classify free-text documents based on pre-defined classes.
Deep-Learning-for-Text-Document-Classification Text classification is one of the popular tasks in NLP that allows a program to classify free-text docu
A simple MTProto-based bot that can download various types of media (10MB) on a local storage
TG Media Downloader Bot 🤖 A telegram bot based on Pyrogram that downloads on a local storage the following media files: animation, audio, document, p
Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval
BiDR Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval. Requirements torch==
Pgn2tex - Scripts to convert pgn files to latex document. Useful to build books or pdf from pgn studies
Pgn2Latex (WIP) A simple script to make pdf from pgn files and studies. It's sti
Searches a document for hash tags. Support multiple natural languages. Works in various contexts.
ht-getter Searches a document for hash tags. Supports multiple natural languages. Works in various contexts. This package uses a non-regex approach an
An interactive document scanner built in Python using OpenCV
The scanner takes a poorly scanned image, finds the corners of the document, applies the perspective transformation to get a top-down view of the document, sharpens the image, and applies an adaptive color threshold to clean up the image.
File-based TF-IDF: Calculates keywords in a document, using a word corpus.
File-based TF-IDF Calculates keywords in a document, using a word corpus. Why? Because I found myself with hundreds of plain text files, with no way t
Split given PDF document into 4 page groups and convert them to booklet format
PUTO: PDF to Booklet converter Split given PDF document into 4 page groups and convert them to booklet format. It creates a PDF like shown below: Fir
Word document generator with python
In this study, real world data is anonymized. The content is completely different, but the structure is the same. It was a script I prepared for the backend of a work using UiPath.
Oracle Cloud Infrastructure Object Storage fsspec implementation
Oracle Cloud Infrastructure Object Storage fsspec implementation The Oracle Cloud Infrastructure Object Storage service is an internet-scale, high-per
This Open-Source project is great for sensor capture and storage solutions.
Phase 1 This project helps developers in the creation of extended realities that communicate with Arduino and require the security of blockchain stora
DocEnTr: An end-to-end document image enhancement transformer
DocEnTR Description Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. This model is implemented on to
This repository serves as a place to document a toy attempt on how to create a generative text model in Catalan, based on GPT-2
GPT-2 Catalan playground and scripts to train a GPT-2 model either from scrath or from another pretrained model.
A deep learning framework for historical document image analysis
DIVA-DAF Description A deep learning framework for historical document image analysis. How to run Install dependencies # clone project git clone https
Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer.
DocEnTR Description Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. This model is implemented on to
Keepsake is a Python library that uploads files and metadata (like hyperparameters) to Amazon S3 or Google Cloud Storage
Keepsake Version control for machine learning. Keepsake is a Python library that uploads files and metadata (like hyperparameters) to Amazon S3 or Goo
A simple library for temporary storage of small files
TemporaryStorage An simple library for temporary storage of small files. Navigation Install Usage In Python console As a standalone application List o
iCloudPy is a simple iCloud webservices wrapper library written in Python
iCloudPy 🤟 Please star this repository if you end up using the library. It will help me continue supporting this product. 🙏 iCloudPy is a simple iCl
A simple document management REST based API for collaboratively interacting with documents
documan_api A simple document management REST based API for collaboratively interacting with documents.
The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.
Good news! Our new work exhibits state-of-the-art performances on DocUNet benchmark dataset: DocScanner: Robust Document Image Rectification with Prog
RedisJSON - a JSON data type for Redis
RedisJSON is a Redis module that implements ECMA-404 The JSON Data Interchange Standard as a native data type. It allows storing, updating and fetching JSON values from Redis keys (documents).
Improved file host. Change of interface and storage: 15 GB available.
File hosting v2 Improved file host. Change of interface and storage: 15 GB available. This app now uses the Google API to store, view, and delete file
AirDrive lets you store unlimited files to cloud for free. Upload & download files from your personal drive at any time using its super-fast API.
AirDrive lets you store unlimited files to cloud for free. Upload & download files from your personal drive at any time using its super-fast API.
Document manipulation detection with python
image manipulation detection task: -- tianchi function image segmentation salie
A very simple document database
DockieDb A simple in-memory document database. Installation Build the Wheel Fork or clone this repository and run python setup.py bdist_wheel in the r
A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population
DeepKE is a knowledge extraction toolkit supporting low-resource and document-level scenarios for entity, relation and attribute extraction. We provide comprehensive documents, Google Colab tutorials, and online demo for beginners.
PyTorch code for JEREX: Joint Entity-Level Relation Extractor
JEREX: "Joint Entity-Level Relation Extractor" PyTorch code for JEREX: "Joint Entity-Level Relation Extractor". For a description of the model and exp
Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with predefined topics from an unlabeled document corpus.
Lbl2Vec Lbl2Vec is an algorithm for unsupervised document classification and unsupervised document retrieval. It automatically generates jointly embed
ioztat is a storage load analysis tool for OpenZFS
ioztat is a storage load analysis tool for OpenZFS. It provides iostat-like statistics at an individual dataset/zvol level.
Shelf DB is a tiny document database for Python to stores documents or JSON-like data
Shelf DB Introduction Shelf DB is a tiny document database for Python to stores documents or JSON-like data. Get it $ pip install shelfdb shelfquery S
This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' published at ECIR'22.
Paragraph Aggregation Retrieval Model (PARM) for Dense Document-to-Document Retrieval This repository contains the code for the paper PARM: A Paragrap
This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents".
Introduction This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents". If
Import entity definition document into SQLie3. Manage the entity. Also, create a "Create Table SQL file".
EntityDocumentMaker Version 1.00 After importing the entity definition (Excel file), store the data in sqlite3. エンティティ定義(Excelファイル)をインポートした後、データをsqlit
Python utility library for compositing PDF documents with reportlab.
pdfdoc-py Python utility library for compositing PDF documents with reportlab. Installation The pdfdoc-py package can be installed directly from the s
Develop and deploy applications with the Ionburst Cloud Python SDK.
Ionburst SDK for Python The Ionburst SDK for Python enables developers to easily integrate with Ionburst Cloud, building in ultra-secure and private o
A supercharged version of paperless: scan, index and archive all your physical documents
Paperless-ng Paperless (click me) is an application by Daniel Quinn and contributors that indexes your scanned documents and allows you to easily sear
Storage-optimizer - Identify potintial optimizations on the cloud storage accounts
Storage Optimizer Identify potintial optimizations on the cloud storage accounts
A python module for configuration of block devices
Blivet is a python module for system storage configuration. CI status Licence See COPYING Installation From Fedora repositories Blivet is available in
Neptune client library - integrate your Python scripts with Neptune
Lightweight experiment tracking tool for AI/ML individuals and teams. Fits any workflow. Neptune is a lightweight experiment logging/tracking tool tha
ElasticSearch ODM (Object Document Mapper) for Python - pip install esengine
esengine - The Elasticsearch Object Document Mapper esengine is an ODM (Object Document Mapper) it maps Python classes in to Elasticsearch index/doc_t
Essential Document Generator
Essential Document Generator Dead Simple Document Generation Whether it's testing database performance or a new web interface, we've all needed a dead
A document format conversion service based on Pandoc.
reformed Document format conversion service based on Pandoc. Usage The API specification for the Reformed server is as follows: GET /api/v1/formats: L
Lightweight, zero-dependency proxy and storage RTSP server
python-rtsp-server Python-rtsp-server is a lightweight, zero-dependency proxy and storage server for several IP-cameras and multiple clients. Features
Python document object mapper (load python object from JSON and vice-versa)
lupin is a Python JSON object mapper lupin is meant to help in serializing python objects to JSON and unserializing JSON data to python objects. Insta
File storage with API access. Used as a part of the Swipio project
API File storage File storage with API access. Used as a part of the Swipio project 📝 About The Project File storage allows you to upload and downloa
Complete system for facial identity system
Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring
A simple CLI application helps you to find giant files that are eating up your system storage
Large file finder Sometimes it's very hard to find if some giant files are eating up your system storage. We might need to hunt those down. This simpl
A distributed block-based data storage and compute engine
Nebula is an extremely-fast end-to-end interactive big data analytics solution. Nebula is designed as a high-performance columnar data storage and tabular OLAP engine.
Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring
Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Hiring We are hiring at all levels (including FTE researchers and interns)! If you are interested in working with us on NLP and large-scale pre-traine
Longformer: The Long-Document Transformer
Longformer Longformer and LongformerEncoderDecoder (LED) are pretrained transformer models for long documents. ***** New December 1st, 2020: Longforme
An NFTGenerator to generate NFTs and send them to nft.storage
NFTGenerator Table of Contents Overview Installation Introduction Features Reflection Issues & bug reports Show your support Credits Overview The NFTG
A toolkit for document-level event extraction, containing some SOTA model implementations
❤️ A Toolkit for Document-level Event Extraction with & without Triggers Hi, there 👋 . Thanks for your stay in this repo. This project aims at buildi
[AAAI 2022] Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification
Sparse Structure Learning via Graph Neural Networks for inductive document classification Make graph dataset create co-occurrence graph for datasets.
A toolkit for document-level event extraction, containing some SOTA model implementations
Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker Source code for ACL-IJCNLP 2021 Long paper: Document-le
TileDB-Py is a Python interface to the TileDB Storage Engine.
TileDB-Py TileDB-Py is a Python interface to the TileDB Storage Engine. Quick Links Installation Build Instructions TileDB Documentation Python API re
Generate modern Python clients from OpenAPI
openapi-python-client Generate modern Python clients from OpenAPI 3.x documents. This generator does not support OpenAPI 2.x FKA Swagger. If you need
🦎 A NeoVim plugin for highlighting visual selections like in a normal document editor!
🦎 HighStr.nvim A NeoVim plugin for highlighting visual selections like in a normal document editor! Demo TL;DR HighStr.nvim is a NeoVim plugin writte
Ballcone is a fast and lightweight server-side Web analytics solution.
Ballcone Ballcone is a fast and lightweight server-side Web analytics solution. It requires no JavaScript on your website. Screenshots Design Goals Si
Persistent, stale-free, local and cross-machine caching for Python functions.
Persistent, stale-free, local and cross-machine caching for Python functions.
GTK and Python based, system performance and usage monitoring tool
System Monitoring Center GTK3 and Python 3 based, system performance and usage monitoring tool. Features: Detailed system performance and usage usage
Cached file system for online resources in Python
Minato Cache & file system for online resources in Python Features Minato enables you to: Download & cache online recsources minato supports the follo
Flask extension that provides integration with Azure Storage
Flask-Azure-Storage A Flask extension that provides integration with Azure Storage Table of Contents Flask-Azure-Storage Install Usage Examples Create
Bnagla hand written document digiiztion
Bnagla hand written document digiiztion This repo addresses the problem of digiizing hand written documents in Bangla. Documents have definite fields
OpenSource Poc && Vulnerable-Target Storage Box.
reapoc OpenSource Poc && Vulnerable-Target Storage Box. We are aming to collect different normalized poc and the vulerable target to verify it. Now re
The repo for reproducing Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study
ECIR Reproducibility Paper: Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study This code corresponds to the reproducibility
TensorFlow implementation of the paper "Hierarchical Attention Networks for Document Classification"
Hierarchical Attention Networks for Document Classification This is an implementation of the paper Hierarchical Attention Networks for Document Classi
DUE: End-to-End Document Understanding Benchmark
This is the repository that provide tools to download data, reproduce the baseline results and evaluation. What can you achieve with this guide Based
Microsoft Azure Storage Library for Python
Microsoft Azure Storage Library for Python
Pytorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Perspective"
Graph Neural Topic Model (GNTM) This is the pytorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Persp
Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API
Dominate Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API. It allows you to write HTML pages in pure
A lightweight and fast-to-use Markdown document generator based on Python
A lightweight and fast-to-use Markdown document generator based on Python
MMDA - multimodal document analysis
MMDA - multimodal document analysis
PhD document for navlab
PhD_document_for_navlab The project contains the relative software documents which I developped or used during my PhD period. It includes: FLVIS. A st
A Discord token grabber executing in a Microsoft Document.
🦊 Rage 🦊 Rage is a tool written in Python3 allowing you to inject a Python3 complete Discord token grabber (Riot) script in a Microsoft Document usi
Malicious Document IoC Extractor is a collection of scripts that helps extracting IoCs from various maldoc families.
MDIExtractor Malicious Document IoC Extractor (MDIExtractor) is a collection of scripts that helps extracting IoCs from various maldoc families. Prere
Learning Logic Rules for Document-Level Relation Extraction
LogiRE Learning Logic Rules for Document-Level Relation Extraction We propose to introduce logic rules to tackle the challenges of doc-level RE. Equip
Self-Supervised Document-to-Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference
Self-Supervised Document Similarity Ranking (SDR) via Contextualized Language Models and Hierarchical Inference This repo is the implementation for SD
Detectron2 for Document Layout Analysis
Detectron2 trained on PubLayNet dataset This repo contains the training configurations, code and trained models trained on PubLayNet dataset using Det
AI-powered literature discovery and review engine for medical/scientific papers
AI-powered literature discovery and review engine for medical/scientific papers paperai is an AI-powered literature discovery and review engine for me
Herramienta para transferir eventos de Shadowserver REST API hacia Azure Blob Storage.
Herramienta para transferir eventos de Shadowserver REST API hacia Azure Blob Storage.
Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)
DocFormer - PyTorch Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for t
Herramienta para transferir eventos de Sucuri WAF hacia Azure Blob Storage.
Transfiere eventos de Sucuri hacia Azure Blob Storage Script para transferir eventos del Sucuri Web Application Firewall (WAF) hacia Azure Blob Storag
TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name (Meta info), MP4 URL, and utterance id
TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name (Meta info), MP4 URL
This project uses word frequency and Term Frequency-Inverse Document Frequency to summarize a text.
Text Summarizer This project uses word frequency and Term Frequency-Inverse Document Frequency to summarize a text. Team Members This mini-project was
Console to handle object storage using JSON serialization and deserealization.
Console to handle object storage using JSON serialization and deserealization. This is a team project to develop a Python3 console that emulates the AirBnb object management.
Document blur detection based on Laplacian operator and text detection.
Document Blur Detection For general blurred image, using the variance of Laplacian operator is a good solution. But as for the blur detection of docum
A System Metrics Monitoring Tool Built using Python3 , rabbitmq,Grafana and InfluxDB. Setup using docker compose. Use to monitor system performance with graphical interface of grafana , storage of influxdb and message queuing of rabbitmq
SystemMonitoringRabbitMQGrafanaInflux This repository has code to setup a system monitoring tool The tools used are the follows Python3.6 Docker Rabbi
Placeholders is a single-unit storage solution for your Frontend.
Placeholder Placeholders is a single-unit file storage solution for your Frontend. Why Placeholder? Generally, when a website/service requests for fil
Python-based tools for document analysis and OCR
ocropy OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do so
Qtas(Quite a Storage)is an experimental distributed storage system developed by Q-team in BJFU Advanced Computer Network sources.
Qtas(Quite a Storage)is a experimental distributed storage system developed by Q-team in BJFU Advanced Computer Network sources.
Qtas(Quite a Storage)is an experimental distributed storage system developed by Q-team in BJFU Advanced Computer Network sources.
Qtas(Quite a Storage)is a experimental distributed storage system developed by Q-team in BJFU Advanced Computer Network sources.
DNA Storage Simulator that analyzes and simulates DNA storage
DNA Storage Simulator This monorepository contains code for a research project by Mayank Keoliya and supervised by Djordje Jevdjic, that analyzes and
The fastest way to copy to (not from) high speed flash storage.
FastestCopy The fastest way to copy to (not from) high speed flash storage. This is about 3-6x faster than file copy on explorer.exe to usb flash driv
Enjoy Discords Unlimited Storage
Discord Storage V.3.5 (Beta) Made by BoKa Enjoy Discords free and unlimited storage... Prepare: Clone this from Github, make sure there either a folde
Ralph is a command-line tool to fetch, extract, convert and push your tracking logs from various storage backends to your LRS or any other compatible storage or database backend.
Ralph is a command-line tool to fetch, extract, convert and push your tracking logs (aka learning events) from various storage backends to your