166 Repositories
Python scientific-documents Libraries
Docbarcodes extracts 1D and 2D barcodes from scanned PDF documents or images. It can be used to automate extraction and processing of all kind of documents.
Intro Barcodes are being used in many documents or forms to enable machine reading capabilities and reduce manual processing effort. Simple 1D barcode
Scientific Computation Methods in C and Python (Open for Hacktoberfest 2021)
Sci - cpy README is a stub. Do expand it. Objective This repository is meant to be a ready reference for scientific computation methods. Do ⭐ it if yo
SCICO is a Python package for solving the inverse problems that arise in scientific imaging applications.
Scientific Computational Imaging COde (SCICO) SCICO is a Python package for solving the inverse problems that arise in scientific imaging applications
x-ray is a Python library for finding bad redactions in PDF documents.
A tool to detect whether a PDF has a bad redaction
Download YOUR files, documents from vk.
vk-documents-downloader Кароч эта симпл херня качает все ВАШИ документы с вк. Или я еблан, но в гх и тмб гугле я подобного не нашел. py main.py Login:
Scitizen - Help scientific research for the benefit of mankind and humanity 🔬
Scitizen - Help scientific research for the benefit of mankind and humanity 🔬 Scitizen has been built from the ground up to give everyone the possibi
WeasyPrint is a smart solution helping web developers to create PDF documents.
WeasyPrint is a smart solution helping web developers to create PDF documents. It turns simple HTML pages into gorgeous statistical reports, invoices, tickets…
Generate YARA rules for OOXML documents using ZIP local header metadata.
apooxml Generate YARA rules for OOXML documents using ZIP local header metadata. To learn more about this tool and the methodology behind it, check ou
Official Pytorch implementation of ICLR 2018 paper Deep Learning for Physical Processes: Integrating Prior Scientific Knowledge.
Deep Learning for Physical Processes: Integrating Prior Scientific Knowledge: Official Pytorch implementation of ICLR 2018 paper Deep Learning for Phy
2nd solution of ICDAR 2021 Competition on Scientific Literature Parsing, Task B.
TableMASTER-mmocr Contents About The Project Method Description Dependency Getting Started Prerequisites Installation Usage Data preprocess Train Infe
IDRLnet, a Python toolbox for modeling and solving problems through Physics-Informed Neural Network (PINN) systematically.
IDRLnet IDRLnet is a machine learning library on top of PyTorch. Use IDRLnet if you need a machine learning library that solves both forward and inver
Command line program to download documents from web portals.
command line document download made easy Highlights list available documents in json format or download them filter documents using string matching re
DRIFT is a tool for Diachronic Analysis of Scientific Literature.
About DRIFT is a tool for Diachronic Analysis of Scientific Literature. The application offers user-friendly and customizable utilities for two modes:
document organizer with tags and full-text-search, in a simple and clean sqlite3 schema
document organizer with tags and full-text-search, in a simple and clean sqlite3 schema
Extracting Summary Knowledge Graphs from Long Documents
GraphSum This repo contains the data and code for the G2G model in the paper: Extracting Summary Knowledge Graphs from Long Documents. The other basel
Scientific color maps and standardization tools
Scicomap is a package that provides scientific color maps and tools to standardize your favourite color maps if you don't like the built-in ones. Scicomap currently provides sequential, bi-sequential, diverging, circular, qualitative and miscellaneous color maps. You can easily draw examples, compare the rendering, see how colorblind people will perceive the color maps. I will illustrate the scicomap capabilities below.
Markup is an online annotation tool that can be used to transform unstructured documents into structured formats for NLP and ML tasks, such as named-entity recognition. Markup learns as you annotate in order to predict and suggest complex annotations. Markup also provides integrated access to existing and custom ontologies, enabling the prediction and suggestion of ontology mappings based on the text you're annotating.
Markup is an online annotation tool that can be used to transform unstructured documents into structured formats for NLP and ML tasks, such as named-entity recognition. Markup learns as you annotate in order to predict and suggest complex annotations. Markup also provides integrated access to existing and custom ontologies, enabling the prediction and suggestion of ontology mappings based on the text you're annotating.
PyMuPDF is a Python binding with support for MuPDF
PyMuPDF is a Python binding with support for MuPDF (current version 1.18.*), a lightweight PDF, XPS, and E-book viewer, renderer, and toolkit, which is maintained and developed by Artifex Software, Inc.
A scientific and useful toolbox, which contains practical and effective long-tail related tricks with extensive experimental results
Bag of tricks for long-tailed visual recognition with deep convolutional neural networks This repository is the official PyTorch implementation of AAA
This is a script to forward forward large number of documents to another telegram channel.
ChannelForward 😇 This is a Script to Forward Large Number of Documents to Another Telegram Channel. If You Try to Forward Very Large Number of Files
Telegram Bot to store Posts and Documents and it can Access by Special Links.
File-sharing-Bot Telegram Bot to store Posts and Documents and it can Access by Special Links. I Guess This Will Be Usefull For Many People..... 😇 .
Python with the scientific stack, compiled to WebAssembly.
Pyodide may be used in any context where you want to run Python inside a web browser.
SkipGNN: Predicting Molecular Interactions with Skip-Graph Networks (Scientific Reports)
SkipGNN: Predicting Molecular Interactions with Skip-Graph Networks Molecular interaction networks are powerful resources for the discovery. While dee
xitorch: differentiable scientific computing library
xitorch is a PyTorch-based library of differentiable functions and functionals that can be widely used in scientific computing applications as well as deep learning.
monolish: MONOlithic Liner equation Solvers for Highly-parallel architecture
monolish is a linear equation solver library that monolithically fuses variable data type, matrix structures, matrix data format, vendor specific data transfer APIs, and vendor specific numerical algebra libraries.
ArrayFire: a general purpose GPU library.
ArrayFire is a general-purpose library that simplifies the process of developing software that targets parallel and massively-parallel architectures i
CUDA integration for Python, plus shiny features
PyCUDA lets you access Nvidia's CUDA parallel computation API from Python. Several wrappers of the CUDA API already exist-so what's so special about P
ScanTailor Advanced is the version that merges the features of the ScanTailor Featured and ScanTailor Enhanced versions, brings new ones and fixes.
ScanTailor Advanced The ScanTailor version that merges the features of the ScanTailor Featured and ScanTailor Enhanced versions, brings new ones and f
A machine learning software for extracting information from scholarly documents
GROBID GROBID documentation Visit the GROBID documentation for more detailed information. Summary GROBID (or Grobid, but not GroBid nor GroBiD) means
A tool for extracting text from scanned documents (via OCR), with user-defined post-processing.
The project is based on older versions of tesseract and other tools, and is now superseded by another project which allows for more granular control o
Deskew is a command line tool for deskewing scanned text documents. It uses Hough transform to detect "text lines" in the image. As an output, you get an image rotated so that the lines are horizontal.
Deskew by Marek Mauder https://galfar.vevb.net/deskew https://github.com/galfar/deskew v1.30 2019-06-07 Overview Deskew is a command line tool for des
An application of high resolution GANs to dewarp images of perturbed documents
Docuwarp This project is focused on dewarping document images through the usage of pix2pixHD, a GAN that is useful for general image to image translat
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
Spack Spack is a multi-platform package manager that builds and installs multiple versions and configurations of software. It works on Linux, macOS, a
Render reMarkable documents to PDF
rmrl: reMarkable Rendering Library rmrl is a Python library for rendering reMarkable documents to PDF files. It takes the original PDF document and th
Sequential model-based optimization with a `scipy.optimize` interface
Scikit-Optimize Scikit-Optimize, or skopt, is a simple and efficient library to minimize (very) expensive and noisy black-box functions. It implements
[HELP REQUESTED] Generalized Additive Models in Python
pyGAM Generalized Additive Models in Python. Documentation Official pyGAM Documentation: Read the Docs Building interpretable models with Generalized
Veusz scientific plotting application
Veusz 3.3.1 Veusz is a scientific plotting package. It is designed to produce publication-ready PDF or SVG output. Graphs are built-up by combining pl
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
Spack Spack is a multi-platform package manager that builds and installs multiple versions and configurations of software. It works on Linux, macOS, a
ReproZip is a tool that simplifies the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science.
ReproZip ReproZip is a tool aimed at simplifying the process of creating reproducible experiments from command-line executions, a frequently-used comm
3D visualization of scientific data in Python
Mayavi: 3D visualization of scientific data in Python Mayavi docs: http://docs.enthought.com/mayavi/mayavi/ TVTK docs: http://docs.enthought.com/mayav
An interactive explorer for single-cell transcriptomics data
an interactive explorer for single-cell transcriptomics data cellxgene (pronounced "cell-by-gene") is an interactive data explorer for single-cell tra
Scan, index, and archive all of your paper documents
[ en | de | el ] Important news about the future of this project It's been more than 5 years since I started this project on a whim as an effort to tr
A full spaCy pipeline and models for scientific/biomedical documents.
This repository contains custom pipes and models related to using spaCy for scientific documents. In particular, there is a custom tokenizer that adds
Python implementation of TextRank for phrase extraction and summarization of text documents
PyTextRank PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, used to: extract the top-ranked phrases from text document
Module for automatic summarization of text documents and HTML pages.
Automatic text summarizer Simple library and command line utility for extracting summary from HTML pages or plain texts. The package also contains sim
3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK)
PyVista Deployment Build Status Metrics Citation License Community 3D plotting and mesh analysis through a streamlined interface for the Visualization
Fast data visualization and GUI tools for scientific / engineering applications
PyQtGraph A pure-Python graphics library for PyQt5/PyQt6/PySide2/PySide6 Copyright 2020 Luke Campagnola, University of North Carolina at Chapel Hill h
A fast MDCT implementation using SciPy and FFTs
MDCT A fast MDCT implementation using SciPy and FFTs Installation As usual pip install mdct Dependencies NumPy SciPy STFT Usage import mdct spectrum
A full spaCy pipeline and models for scientific/biomedical documents.
This repository contains custom pipes and models related to using spaCy for scientific documents. In particular, there is a custom tokenizer that adds
Python implementation of TextRank for phrase extraction and summarization of text documents
PyTextRank PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, used to: extract the top-ranked phrases from text document
Module for automatic summarization of text documents and HTML pages.
Automatic text summarizer Simple library and command line utility for extracting summary from HTML pages or plain texts. The package also contains sim
3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK)
PyVista Deployment Build Status Metrics Citation License Community 3D plotting and mesh analysis through a streamlined interface for the Visualization
Fast data visualization and GUI tools for scientific / engineering applications
PyQtGraph A pure-Python graphics library for PyQt5/PyQt6/PySide2/PySide6 Copyright 2020 Luke Campagnola, University of North Carolina at Chapel Hill h
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
Parsel Parsel is a BSD-licensed Python library to extract and remove data from HTML and XML using XPath and CSS selectors, optionally combined with re
Provides syntax for Python-Markdown which allows for the inclusion of the contents of other Markdown documents.
Markdown-Include This is an extension to Python-Markdown which provides an "include" function, similar to that found in LaTeX (and also the C pre-proc
Spyder - The Scientific Python Development Environment
Spyder is a powerful scientific environment written in Python, for Python, and designed by and for scientists, engineers and data analysts. It offers a unique combination of the advanced editing, analysis, debugging, and profiling functionality of a comprehensive development tool with the data exploration, interactive execution, deep inspection, and beautiful visualization capabilities of a scientific package.
On Generating Extended Summaries of Long Documents
ExtendedSumm This repository contains the implementation details and datasets used in On Generating Extended Summaries of Long Documents paper at the
texlive expressions for documents
tex2nix Generate Texlive environment containing all dependencies for your document rather than downloading gigabytes of texlive packages. Installation
Search for documents in a domain through Google. The objective is to extract metadata
MetaFinder - Metadata search through Google _____ __ ___________ .__ .___ / \
Module for automatic summarization of text documents and HTML pages.
Automatic text summarizer Simple library and command line utility for extracting summary from HTML pages or plain texts. The package also contains sim
Create Open XML PowerPoint documents in Python
python-pptx is a Python library for creating and updating PowerPoint (.pptx) files. A typical use would be generating a customized PowerPoint presenta
Fast data visualization and GUI tools for scientific / engineering applications
PyQtGraph A pure-Python graphics library for PyQt5/PyQt6/PySide2/PySide6 Copyright 2020 Luke Campagnola, University of North Carolina at Chapel Hill h
Crab is a flexible, fast recommender engine for Python that integrates classic information filtering recommendation algorithms in the world of scientific Python packages (numpy, scipy, matplotlib).
Crab - A Recommendation Engine library for Python Crab is a flexible, fast recommender engine for Python that integrates classic information filtering r
Standards-compliant library for parsing and serializing HTML documents and fragments in Python
html5lib html5lib is a pure-python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as is implemented by all majo
pyglet is a cross-platform windowing and multimedia library for Python, for developing games and other visually rich applications.
pyglet pyglet is a cross-platform windowing and multimedia library for Python, intended for developing games and other visually rich applications. It
Official repository for Spyder - The Scientific Python Development Environment
Copyright © 2009–2021 Spyder Project Contributors Some source files and icons may be under other authorship/licenses; see NOTICE.txt. Project status B