166 Repositories
Python scientific-documents Libraries
A community-supported supercharged version of paperless: scan, index and archive all your physical documents
Paperless-ngx Paperless-ngx is a document management system that transforms your physical documents into a searchable online archive so you can keep,
Plugin to manage site, circuit and device diagrams and documents in Netbox
Netbox Documents Plugin A plugin designed to faciliate the storage of site, circuit and device specific documents within NetBox Note: Netbox v3.2+ is
Text classification is one of the popular tasks in NLP that allows a program to classify free-text documents based on pre-defined classes.
Deep-Learning-for-Text-Document-Classification Text classification is one of the popular tasks in NLP that allows a program to classify free-text docu
This repository is used to simplify the process of cloning the SSM documents across the AWS regions.
SSM Cloner Introduction This module is created in order to simplify the process of copying the SSM documents from one region to another regions. As an
Scientific Programming: A Crash Course
Scientific Programming: A Crash Course Welcome to the Scientific Programming course. My name is Jon Carr and I am a postdoc in Davide Crepaldi's lab.
Citation Intent Classification in scientific papers using the Scicite dataset an Pytorch
Citation Intent Classification Table of Contents About the Project Built With Installation Usage Acknowledgments About The Project Citation Intent Cla
BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents
BROS (BERT Relying On Spatiality) is a pre-trained language model focusing on text and layout for better key information extraction from documents. Given the OCR results of the document image, which are text and bounding box pairs, it can perform various key information extraction tasks, such as extracting an ordered item list from receipts
GT4SD, an open-source library to accelerate hypothesis generation in the scientific discovery process.
The GT4SD (Generative Toolkit for Scientific Discovery) is an open-source platform to accelerate hypothesis generation in the scientific discovery process. It provides a library for making state-of-the-art generative AI models easier to use.
Arxiv harvester - Poor man's simple harvester for arXiv resources
Poor man's simple harvester for arXiv resources This modest Python script takes
This repository contains the scripts for downloading and validating scripts for the documents
HC4: HLTCOE CLIR Common-Crawl Collection This repository contains the scripts for downloading and validating scripts for the documents. Document ids,
This repository compare a selfie with images from identity documents and response if the selfie match.
aws-rekognition-facecompare This repository compare a selfie with images from identity documents and response if the selfie match. This code was made
A simple document management REST based API for collaboratively interacting with documents
documan_api A simple document management REST based API for collaboratively interacting with documents.
Pytorch Implementation of Value Retrieval with Arbitrary Queries for Form-like Documents.
Value Retrieval with Arbitrary Queries for Form-like Documents Introduction Pytorch Implementation of Value Retrieval with Arbitrary Queries for Form-
Auto-researching tool generating word documents.
About ResearchTE automates researching by generating document with answers to given questions. Supports getting results from: Google DuckDuckGo (with
FileGenerator - File Generator for sites that accepts documents
File Generator for sites that accepts documents This code generates files as per
Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with predefined topics from an unlabeled document corpus.
Lbl2Vec Lbl2Vec is an algorithm for unsupervised document classification and unsupervised document retrieval. It automatically generates jointly embed
Toolchain for project structure and documents optimisation
ritocco Toolchain for project structure and documents optimisation
Shelf DB is a tiny document database for Python to stores documents or JSON-like data
Shelf DB Introduction Shelf DB is a tiny document database for Python to stores documents or JSON-like data. Get it $ pip install shelfdb shelfquery S
This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents".
Introduction This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents". If
An executor that wraps 3D mesh models and encodes 3D content documents to d-dimension vector.
3D Mesh Encoder An Executor that receives Documents containing point sets data in its blob attribute, with shape (N, 3) and encodes it to embeddings o
FURY - A software library for scientific visualization in Python
Free Unified Rendering in Python A software library for scientific visualization in Python. General Information • Key Features • Installation • How to
ruptures: change point detection in Python
Welcome to ruptures ruptures is a Python library for off-line change point detection. This package provides methods for the analysis and segmentation
Import Python modules from dicts and JSON formatted documents.
Paker Paker is module for importing Python packages/modules from dictionaries and JSON formatted documents. It was inspired by httpimporter. Important
Finite Element Analysis
FElupe - Finite Element Analysis FElupe is a Python 3.6+ finite element analysis package focussing on the formulation and numerical solution of nonlin
Python utility library for compositing PDF documents with reportlab.
pdfdoc-py Python utility library for compositing PDF documents with reportlab. Installation The pdfdoc-py package can be installed directly from the s
Quantifiers and Negations in RE Documents
Quantifiers-and-Negations-in-RE-Documents This project was part of my work for a
A supercharged version of paperless: scan, index and archive all your physical documents
Paperless-ng Paperless (click me) is an application by Daniel Quinn and contributors that indexes your scanned documents and allows you to easily sear
Memory-efficient optimum einsum using opt_einsum planning and PyTorch kernels.
opt-einsum-torch There have been many implementations of Einstein's summation. numpy's numpy.einsum is the least efficient one as it only runs in sing
Scientific measurement library for instruments, experiments, and live-plotting
PyMeasure scientific package PyMeasure makes scientific measurements easy to set up and run. The package contains a repository of instrument classes a
A refresher for PowerBI Desktop documents
PowerBI_Refresher-NPP Informació Per executar el programa s'ha de tenir instalat el python versio 3 o mes. Requeriments a requirements.txt. El fitxer
An open-source systems and controls toolbox for Python3
harold A control systems package for Python=3.6. Introduction This package is written with the ambition of providing a full-fledged control systems s
An executor that loads ONNX models and embeds documents using the ONNX runtime.
ONNXEncoder An executor that loads ONNX models and embeds documents using the ONNX runtime. Usage via Docker image (recommended) from jina import Flow
OpenQuake's Engine for Seismic Hazard and Risk Analysis
OpenQuake Engine The OpenQuake Engine is an open source application that allows users to compute seismic hazard and seismic risk of earthquakes on a g
Telegram Bot to store Posts and Documents and it can Access by Special Links.
Telegram Bot to store Posts and Documents and it can Access by Special Links. I Guess This Will Be Usefull For Many People..... 😇 . Features Fully cu
Python package for the analysis and visualisation of finite-difference fields.
discretisedfield Marijan Beg1,2, Martin Lang2, Samuel Holt3, Ryan A. Pepper4, Hans Fangohr2,5,6 1 Department of Earth Science and Engineering, Imperia
A Boilerplate repo for Scientific Python Open Science projects
A Boilerplate repo for Scientific Python Open Science projects Installation Clone this repo If you need a fresh python environment, run $ conda env cr
SIR model parameter estimation using a novel algorithm for differentiated uniformization.
TenSIR Parameter estimation on epidemic data under the SIR model using a novel algorithm for differentiated uniformization of Markov transition rate m
SciPy library main repository
SciPy SciPy (pronounced "Sigh Pie") is an open-source software for mathematics, science, and engineering. It includes modules for statistics, optimiza
freeCodeCamp Scientific Computing with Python Project for Certification.
Time_Calculator_freeCodeCamp freeCodeCamp Scientific Computing with Python Project for Certification. Write a function named add_time that takes in tw
freeCodeCamp Scientific Computing with Python Project for Certification.
Polygon_Area_Calculator freeCodeCamp Python Project freeCodeCamp Scientific Computing with Python Project for Certification. In this project you will
minipdf is a package for creating simple, single-page PDF documents.
minipdf minipdf is a package for creating simple, single-page PDF documents. Installation You can install the development version from GitHub with: #
Program to extract signatures from documents.
Extracting Signatures from Bank Checks Introduction Ahmed et al. [1] suggest a connected components-based method for segmenting signatures in document
Automate the case review on legal case documents and find the most critical cases using network analysis
Automation on Legal Court Cases Review This project is to automate the case review on legal case documents and find the most critical cases using netw
Make scripted visualizations in blender
Scripted visualizations in blender The goal of this project is to script 3D scientific visualizations using blender. To achieve this, we aim to bring
Efficient and Scalable Physics-Informed Deep Learning and Scientific Machine Learning on top of Tensorflow for multi-worker distributed computing
Notice: Support for Python 3.6 will be dropped in v.0.2.1, please plan accordingly! Efficient and Scalable Physics-Informed Deep Learning Collocation-
Physics-Informed Neural Networks (PINN) and Deep BSDE Solvers of Differential Equations for Scientific Machine Learning (SciML) accelerated simulation
NeuralPDE NeuralPDE.jl is a solver package which consists of neural network solvers for partial differential equations using scientific machine learni
Simulation and Parameter Estimation in Geophysics
Simulation and Parameter Estimation in Geophysics - A python package for simulation and gradient based parameter estimation in the context of geophysical applications.
yt is an open-source, permissively-licensed Python library for analyzing and visualizing volumetric data.
The yt Project yt is an open-source, permissively-licensed Python library for analyzing and visualizing volumetric data. yt supports structured, varia
A Python tool that parses JSON documents using JsonPath
A Python tool that parses JSON documents using JsonPath
Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
Have you always wished Jupyter notebooks were plain text documents? Wished you could edit them in your favorite IDE? And get clear and meaningful diff
The fundamental package for scientific computing with Python.
NumPy is the fundamental package needed for scientific computing with Python. Website: https://www.numpy.org Documentation: https://numpy.org/doc Mail
This tool crawls a list of websites and download all PDF and office documents
This tool crawls a list of websites and download all PDF and office documents. Then it analyses the PDF documents and tries to detect accessibility issues.
Plot-configurations for scientific publications, purely based on matplotlib
TUEplots Plot-configurations for scientific publications, purely based on matplotlib. Usage Please have a look at the examples in the example/ directo
A script to check for common mistakes in LaTeX source files of scientific papers.
LaTeX Paper Linter This script checks for common mistakes in LaTeX source files of scientific papers. Usage python3 paperlint.py file.tex [-i/x inc
Library - Recent and favorite documents
Thingy Thingy is used to quickly access recent and favorite documents. It's an XApp so it can work in any distribution and many desktop environments (
A fast, pure python implementation of the MuyGPs Gaussian process realization and training algorithm.
Fast implementation of the MuyGPs Gaussian process hyperparameter estimation algorithm MuyGPs is a GP estimation method that affords fast hyperparamet
A deep learning based semantic search platform that computes similarity scores between provided query and documents
semanticsearch This is a deep learning based semantic search platform that computes similarity scores between provided query and documents. Documents
Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API
Dominate Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API. It allows you to write HTML pages in pure
File-sharing-Bot: Telegram Bot to store Posts and Documents and it can Access by Special Links.
File-sharing-Bot Telegram Bot to store Posts and Documents and it can Access by Special Links. I Guess This Will Be Usefull For Many People..... 😇 .
A naive Bayes model for cancer classification using a set of documents
Naivebayes text classifcation model for cancer and noncancer documents Author: Alex King Purpose Requirements/files included How to use 1. Purpose The
Data imputations library to preprocess datasets with missing data
Impyute is a library of missing data imputation algorithms. This library was designed to be super lightweight, here's a sneak peak at what impyute can do.
Fast EMD for Python: a wrapper for Pele and Werman's C++ implementation of the Earth Mover's Distance metric
PyEMD: Fast EMD for Python PyEMD is a Python wrapper for Ofir Pele and Michael Werman's implementation of the Earth Mover's Distance that allows it to
Python+Numpy+OpenGL: fast, scalable and beautiful scientific visualization
Python+Numpy+OpenGL: fast, scalable and beautiful scientific visualization
A set of useful perceptually uniform colormaps for plotting scientific data
Colorcet: Collection of perceptually uniform colormaps Build Status Coverage Latest dev release Latest release Docs What is it? Colorcet is a collecti
Scikit-Garden or skgarden is a garden for Scikit-Learn compatible decision trees and forests.
Scikit-Garden or skgarden (pronounced as skarden) is a garden for Scikit-Learn compatible decision trees and forests.
A Python package for performing pore network modeling of porous media
Overview of OpenPNM OpenPNM is a comprehensive framework for performing pore network simulations of porous materials. More Information For more detail
Apply different text recognition services to images of handwritten documents.
Handprint The Handwritten Page Recognition Test is a command-line program that invokes HTR (handwritten text recognition) services on images of docume
AI-powered literature discovery and review engine for medical/scientific papers
AI-powered literature discovery and review engine for medical/scientific papers paperai is an AI-powered literature discovery and review engine for me
Python Tool to Easily Generate Multiple Documents
Python Tool to Easily Generate Multiple Documents Running the script doesn't require internet Max Generation is set to 10k to avoid lagging/crashing R
Code for DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents
DeepXML Code for DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents Architectures and algorithms DeepXML supports
This repository is maintained for the scientific paper tittled " Study of keyword extraction techniques for Electric Double Layer Capacitor domain using text similarity indexes: An experimental analysis "
kwd-extraction-study This repository is maintained for the scientific paper tittled " Study of keyword extraction techniques for Electric Double Layer
Scientific Visualization: Python + Matplotlib
An open access book on scientific visualization using python and matplotlib
Converts a grading Excel sheet into Markdown documents.
GradeDocs Turns Excel worksheets into grade/score documents. Example Given such an Excel Worksheet (see examples/example.xlsx): The following commands
Python bindings for JIGSAW: a Delaunay-based unstructured mesh generator.
JIGSAW: An unstructured mesh generator JIGSAW is an unstructured mesh generator and tessellation library; designed to generate high-quality triangulat
An Indexer that works out-of-the-box when you have less than 100K stored Documents
U100KIndexer An Indexer that works out-of-the-box when you have less than 100K stored Documents. U100K means under 100K. At 100K stored Documents with
MetPy is a collection of tools in Python for reading, visualizing and performing calculations with weather data.
MetPy MetPy is a collection of tools in Python for reading, visualizing and performing calculations with weather data. MetPy follows semantic versioni
Powerful, efficient particle trajectory analysis in scientific Python.
freud Overview The freud Python library provides a simple, flexible, powerful set of tools for analyzing trajectories obtained from molecular dynamics
JTEX is a command line tool (CLI) for rendering LaTeX documents from jinja-style templates.
JTEX JTEX is a command line tool (CLI) for rendering LaTeX documents from jinja-style templates. This package uses Jinja2 as the template engine with
Document blur detection based on Laplacian operator and text detection.
Document Blur Detection For general blurred image, using the variance of Laplacian operator is a good solution. But as for the blur detection of docum
Small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface
Small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface
Scrape plants scientific name information from Agroforestry Species Switchboard 2.0.
Agroforestry Species Switchboard 2.0 Scraper Scrape plants scientific name information from Species Switchboard 2.0. Requirements python = 3.10 (you
A python module for scientific analysis of 3D objects based on VTK and Numpy
A lightweight and powerful python module for scientific analysis and visualization of 3d objects.
DocumentPy is a Python application that runs in a command-line interface environment, made for creating HTML documents.
DocumentPy DocumentPy is a Python application that runs in a command-line interface environment, made for creating HTML documents. Usage DocumentPy, a
Freecodecamp Scientific Computing with Python Certification; Solution for Challenge 2: Time Calculator
Assignment Write a function named add_time that takes in two required parameters and one optional parameter: a start time in the 12-hour clock format
Can we visualize a large scientific data set with a surrogate model? We're building a GAN for the Earth's Mantle Convection data set to see if we can!
EarthGAN - Earth Mantle Surrogate Modeling Can a surrogate model of the Earth’s Mantle Convection data set be built such that it can be readily run in
Generate custom detailed survey paper with topic clustered sections and proper citations, from just a single query in just under 30 mins !!
Auto-Research A no-code utility to generate a detailed well-cited survey with topic clustered sections (draft paper format) and other interesting arti
This python module allows to extract data from the RAW-file-format produces by devices from Thermo Fisher Scientific.
fisher_py This Python module allows access to Thermo Orbitrap raw mass spectrometer files. Using this library makes it possible to automate the analys
EMNLP 2021 Findings' paper, SCICAP: Generating Captions for Scientific Figures
SCICAP: Scientific Figures Dataset This is the Github repo of the EMNLP 2021 Findings' paper, SCICAP: Generating Captions for Scientific Figures (Hsu
Implementation of TF-IDF algorithm to find documents similarity with cosine similarity
NLP learning Trying to learn NLP to use in my projects! Table of Contents About The Project Built With Getting Started Requirements Run Usage License
This is REST-API for Indonesian Text Summarization using Non-Negative Matrix Factorization for the algorithm to summarize documents and FastAPI for the framework.
Indonesian Text Summarization Using FastAPI This is REST-API for Indonesian Text Summarization using Non-Negative Matrix Factorization for the algorit
Differentiable scientific computing library
xitorch: differentiable scientific computing library xitorch is a PyTorch-based library of differentiable functions and functionals that can be widely
Deep learning library for solving differential equations and more
DeepXDE Voting on whether we should have a Slack channel for discussion. DeepXDE is a library for scientific machine learning. Use DeepXDE if you need
PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks
Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python
AICSImageIO Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Pure Python Features Supports reading metadata and imaging
Glue is a python project to link visualizations of scientific datasets across many files.
Glue Glue is a python project to link visualizations of scientific datasets across many files. Click on the image for a quick demo: Features Interacti
SciBERT is a BERT model trained on scientific text.
SciBERT is a BERT model trained on scientific text.
EasyBuild is a software build and installation framework that allows you to manage (scientific) software on High Performance Computing (HPC) systems in an efficient way.
EasyBuild is a software build and installation framework that allows you to manage (scientific) software on High Performance Computing (HPC) systems in an efficient way.
Open Data Cube analyses continental scale Earth Observation data through time
Open Data Cube Core Overview The Open Data Cube Core provides an integrated gridded data analysis environment for decades of analysis ready earth obse
Toolchest provides APIs for scientific and bioinformatic data analysis.
Toolchest Python Client Toolchest provides APIs for scientific and bioinformatic data analysis. It allows you to abstract away the costliness of runni
A Telegram bot to all media and documents files to web link .
FileStreamBot A Telegram bot to all media and documents files to web link . Report a Bug | Request Feature 🍁 About This Bot : This bot will give you