321 Repositories
Python gpu-memory Libraries
Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch
Memorizing Transformers - Pytorch Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memori
Code for CVPR 2022 paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"
Bailando Code for CVPR 2022 (oral) paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory" [Paper] | [Project Page] | [Vi
Elevation Mapping on GPU.
Elevation Mapping cupy Overview This is a ros package of elevation mapping on GPU. Code are written in python and uses cupy for GPU calculation. * pla
Accelerated NLP pipelines for fast inference on CPU and GPU. Built with Transformers, Optimum and ONNX Runtime.
Optimum Transformers Accelerated NLP pipelines for fast inference 🚀 on CPU and GPU. Built with 🤗 Transformers, Optimum and ONNX runtime. Installatio
Sionna: An Open-Source Library for Next-Generation Physical Layer Research
Sionna: An Open-Source Library for Next-Generation Physical Layer Research Sionna™ is an open-source Python library for link-level simulations of digi
GPU-accelerated Image Processing library using OpenCL
pyclesperanto pyclesperanto is a python package for clEsperanto - a multi-language framework for GPU-accelerated image processing. clEsperanto uses Op
(Personalized) Page-Rank computation using PyTorch
torch-ppr This package allows calculating page-rank and personalized page-rank via power iteration with PyTorch, which also supports calculation on GP
A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.
Poisson Image Editing - A Parallel Implementation Jiayi Weng (jiayiwen), Zixu Chen (zixuc) Poisson Image Editing is a technique that can fuse two imag
Use the state-of-the-art m2m100 to translate large data on CPU/GPU/TPU. Super Easy!
Easy-Translate is a script for translating large text files in your machine using the M2M100 models from Facebook/Meta AI. We also privide a script fo
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
English | 简体中文 Easy Parallel Library Overview Easy Parallel Library (EPL) is a general and efficient library for distributed model training. Usability
GPU Programming with Julia - course at the Swiss National Supercomputing Centre (CSCS), ETH Zurich
Course Description The programming language Julia is being more and more adopted in High Performance Computing (HPC) due to its unique way to combine
Lowest memory consumption and second shortest runtime in NTIRE 2022 challenge on Efficient Super-Resolution
FMEN Lowest memory consumption and second shortest runtime in NTIRE 2022 on Efficient Super-Resolution. Our paper: Fast and Memory-Efficient Network T
Implementation of MeMOT - Multi-Object Tracking with Memory - in Pytorch
MeMOT - Pytorch (wip) Implementation of MeMOT - Multi-Object Tracking with Memory - in Pytorch. This paper is just one in a line of work, but importan
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
Memory Efficient Attention Pytorch Implementation of a memory efficient multi-head attention as proposed in the paper, Self-attention Does Not Need O(
K2HASH Python library - NoSQL Key Value Store(KVS) library
k2hash_python Overview k2hash_python is an official python driver for k2hash. Install Firstly you must install the k2hash shared library: curl -o- htt
Episodic-memory - Ego4D Episodic Memory Benchmark
Ego4D Episodic Memory Benchmark EGO4D is the world's largest egocentric (first p
CVE-2022-22536 - SAP memory pipes(MPI) desynchronization vulnerability CVE-2022-22536
CVE-2022-22536 SAP memory pipes desynchronization vulnerability(MPI) CVE-2022-22
Memory Defense: More Robust Classificationvia a Memory-Masking Autoencoder
Memory Defense: More Robust Classificationvia a Memory-Masking Autoencoder Authors: - Eashan Adhikarla - Dan Luo - Dr. Brian D. Davison Abstract Many
Numerical differential equation solvers in JAX. Autodifferentiable and GPU-capable.
Diffrax Numerical differential equation solvers in JAX. Autodifferentiable and GPU-capable. Diffrax is a JAX-based library providing numerical differe
A framework for GPU based high-performance medical image processing and visualization
FAST is an open-source cross-platform framework with the main goal of making it easier to do high-performance processing and visualization of medical images on heterogeneous systems utilizing both multi-core CPUs and GPUs. To achieve this, FAST use modern C++, OpenCL and OpenGL.
A Python function for Slurm, to monitor the GPU information
Gpu-Monitor A Python function for Slurm, where I couldn't use nvidia-smi to monitor the GPU information. whole repo is not finish Installation TODO Mo
Easily benchmark PyTorch model FLOPs, latency, throughput, max allocated memory and energy consumption
⏱ pytorch-benchmark Easily benchmark model inference FLOPs, latency, throughput, max allocated memory and energy consumption Install pip install pytor
FewBit — a library for memory efficient training of large neural networks
FewBit FewBit — a library for memory efficient training of large neural networks. Its efficiency originates from storage optimizations applied to back
This speeds up PyCharm's package index processes and avoids CPU & memory overloading
This speeds up PyCharm's package index processes and avoids CPU & memory overloading
The unified machine learning framework, enabling framework-agnostic functions, layers and libraries.
The unified machine learning framework, enabling framework-agnostic functions, layers and libraries. Contents Overview In a Nutshell Where Next? Overv
Decorators for maximizing memory utilization with PyTorch & CUDA
torch-max-mem This package provides decorators for memory utilization maximization with PyTorch and CUDA by starting with a maximum parameter size and
Pytorch Performace Tuning, WandB, AMP, Multi-GPU, TensorRT, Triton
Plant Pathology 2020 FGVC7 Introduction A deep learning model pipeline for training, experimentaiton and deployment for the Kaggle Competition, Plant
Rip Raw - a small tool to analyse the memory of compromised Linux systems
Rip Raw Rip Raw is a small tool to analyse the memory of compromised Linux systems. It is similar in purpose to Bulk Extractor, but particularly focus
The Dual Memory is build from a simple CNN for the deep memory and Linear Regression fro the fast Memory
Simple-DMA a simple Dual Memory Architecture for classifications. based on the paper Dual-Memory Deep Learning Architectures for Lifelong Learning of
Generating Radiology Reports via Memory-driven Transformer
R2Gen This is the implementation of Generating Radiology Reports via Memory-driven Transformer at EMNLP-2020. Citations If you use or extend our work,
PyTorchMemTracer - Depict GPU memory footprint during DNN training of PyTorch
A Memory Tracer For PyTorch OOM is a nightmare for PyTorch users. However, most
This project generates news headlines using a Long Short-Term Memory (LSTM) neural network.
News Headlines Generator bunnysaini/Generate-Headlines Goal This project aims to generate news headlines using a Long Short-Term Memory (LSTM) neural
GPU implementation of $k$-Nearest Neighbors and Shared-Nearest Neighbors
GPU implementation of kNN and SNN GPU implementation of $k$-Nearest Neighbors and Shared-Nearest Neighbors Supported by numba cuda and faiss library E
TResNet: High Performance GPU-Dedicated Architecture
TResNet: High Performance GPU-Dedicated Architecture paperV2 | pretrained models Official PyTorch Implementation Tal Ridnik, Hussam Lawen, Asaf Noy, I
[内测中]前向式Python环境快捷封装工具,快速将Python打包为EXE并添加CUDA、NoAVX等支持。
QPT - Quick packaging tool 快捷封装工具 GitHub主页 | Gitee主页 QPT是一款可以“模拟”开发环境的多功能封装工具,最短只需一行命令即可将普通的Python脚本打包成EXE可执行程序,并选择性添加CUDA和NoAVX的支持,尽可能兼容更多的用户环境。 感觉还可
Visual Python and C++ nanosecond profiler, logger, tests enabler
Look into Palanteer and get an omniscient view of your program Palanteer is a set of lean and efficient tools to improve the quality of software, for
Colab notebook for openai/glide-text2im.
GLIDE text2im on Colab This repository provides a Colab notebook to produce images conditioned on text prompts with GLIDE [1]. Usage Run text2im.ipynb
Cross-modal Retrieval using Transformer Encoder Reasoning Networks (TERN). With use of Metric Learning and FAISS for fast similarity search on GPU
Cross-modal Retrieval using Transformer Encoder Reasoning Networks This project reimplements the idea from "Transformer Reasoning Network for Image-Te
Holographic Declarative Memory for Python ACT-R
HDM This is the repository for the Holographic Declarative Memory (HDM) module for Python ACT-R. This repository contains: documentation: a paper, con
Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences"
Memory Compressed Attention Implementation of the Self-Attention layer of the proposed Memory-Compressed Attention, in Pytorch. This repository offers
The accompanying code for the paper "GMAT: Global Memory Augmentation for Transformers" (Ankit Gupta and Jonathan Berant).
GMAT: Global Memory Augmentation for Transformers This repository contains the accompanying code for the paper: "GMAT: Global Memory Augmentation for
Implementation of Memformer, a Memory-augmented Transformer, in Pytorch
Memformer - Pytorch Implementation of Memformer, a Memory-augmented Transformer, in Pytorch. It includes memory slots, which are updated with attentio
Collection of machine learning related notebooks to share.
ML_Notebooks Collection of machine learning related notebooks to share. Notebooks GAN_distributed_training.ipynb In this Notebook, TensorFlow's tutori
GrabGpu_py: a scripts for grab gpu when gpu is free
GrabGpu_py a scripts for grab gpu when gpu is free. WaitCondition: gpu_memory
GLNet for Memory-Efficient Segmentation of Ultra-High Resolution Images
GLNet for Memory-Efficient Segmentation of Ultra-High Resolution Images Collaborative Global-Local Networks for Memory-Efficient Segmentation of Ultra-
Semi-Supervised Semantic Segmentation with Pixel-Level Contrastive Learning from a Class-wise Memory Bank
This repository provides the official code for replicating experiments from the paper: Semi-Supervised Semantic Segmentation with Pixel-Level Contrast
FDTD simulator that generates s-parameters from OFF geometry files using a GPU
Emport Overview This repo provides a FDTD (Finite Differences Time Domain) simulator called emport for solving RF circuits. Emport outputs its simulat
GPOEO is a micro-intrusive GPU online energy optimization framework for iterative applications
GPOEO GPOEO is a micro-intrusive GPU online energy optimization framework for iterative applications. We also implement ODPP [1] as a comparison. [1]
A wrapper around ffmpeg to make it work in a concurrent and memory-buffered fashion.
Media Fixer Have you ever had a film or TV show that your TV wasn't able to play its audio? Well this program is for you. Media Fixer is a program whi
PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders: A PyTorch Implementation This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners: @
PyTorch GPU implementation of the ES-RNN model for time series forecasting
Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm A GPU-enabled version of the hybrid ES-RNN model by Slawek et al that won the M4 time-series
Python binding for Khiva library.
Khiva-Python Build Documentation Build Linux and Mac OS Build Windows Code Coverage README This is the Khiva Python binding, it allows the usage of Kh
Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform sign language recognition.
Sign Language Recognition Service This is a Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform s
AXI Combat is a networked multiplayer game built on the AXI Visualizer 3D engine.
AXI_Combat AXI Combat is a networked multiplayer game built on the AXI Visualizer 3D engine. https://axi.x10.mx/Combat AXI Combat is released under th
Import Python modules from dicts and JSON formatted documents.
Paker Paker is module for importing Python packages/modules from dictionaries and JSON formatted documents. It was inspired by httpimporter. Important
Module for remote in-memory Python package/module loading through HTTP/S
httpimport Python's missing feature! The feature has been suggested in Python Mailing List Remote, in-memory Python package/module importing through H
A Demo server serving Bert through ONNX with GPU written in Rust with 3
Demo BERT ONNX server written in rust This demo showcase the use of onnxruntime-rs on BERT with a GPU on CUDA 11 served by actix-web and tokenized wit
Pythonic particle-based (super-droplet) warm-rain/aqueous-chemistry cloud microphysics package with box, parcel & 1D/2D prescribed-flow examples in Python, Julia and Matlab
PySDM PySDM is a package for simulating the dynamics of population of particles. It is intended to serve as a building block for simulation systems mo
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python
Scalene: a high-performance CPU, GPU and memory profiler for Python by Emery Berger, Sam Stern, and Juan Altmayer Pizzorno. Scalene community Slack Ab
It is a simple library to speed up CLIP inference up to 3x (K80 GPU)
CLIP-ONNX It is a simple library to speed up CLIP inference up to 3x (K80 GPU) Usage Install clip-onnx module and requirements first. Use this trick !
Meshed-Memory Transformer for Image Captioning. CVPR 2020
M²: Meshed-Memory Transformer This repository contains the reference code for the paper Meshed-Memory Transformer for Image Captioning (CVPR 2020). Pl
PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning PyTorch code for our ACL 2020 paper "MART: Memory-Augmented Recur
Tf alloc - Simplication of GPU allocation for Tensorflow2
tf_alloc Simpliying GPU allocation for Tensorflow Developer: korkite (Junseo Ko)
Predictive Maintenance LSTM
Predictive-Maintenance-LSTM - Predictive maintenance study for Complex case study, we've obtained failure causes by operational error and more deeply by design mistakes.
Hyperopt for solving CIFAR-100 with a convolutional neural network (CNN) built with Keras and TensorFlow, GPU backend
Hyperopt for solving CIFAR-100 with a convolutional neural network (CNN) built with Keras and TensorFlow, GPU backend This project acts as both a tuto
Python function to construct an ODS spreadsheet on the fly - without having to store the entire file in memory or disk
stream-write-ods Python function to construct an ODS (OpenDocument Spreadsheet) on the fly - without having to store the entire file in memory or disk
Memory efficient transducer loss computation
Introduction This project implements the optimization techniques proposed in Improving RNN Transducer Modeling for End-to-End Speech Recognition to re
Memory-efficient optimum einsum using opt_einsum planning and PyTorch kernels.
opt-einsum-torch There have been many implementations of Einstein's summation. numpy's numpy.einsum is the least efficient one as it only runs in sing
A simplistic and efficient pure-python neural network library from Phys Whiz with CPU and GPU support.
A simplistic and efficient pure-python neural network library from Phys Whiz with CPU and GPU support.
Scalable and Elastic Deep Reinforcement Learning Using PyTorch. Please star. 🔥
ElegantRL “小雅”: Scalable and Elastic Deep Reinforcement Learning ElegantRL is developed for researchers and practitioners with the following advantage
Python function to construct a ZIP archive with on the fly - without having to store the entire ZIP in memory or disk
Python function to construct a ZIP archive with on the fly - without having to store the entire ZIP in memory or disk
Robust, highly tunable and easy-to-integrate in-memory cache solution written in pure Python, with no dependencies.
Omoide Cache Caching doesn't need to be hard anymore. With just a few lines of code Omoide Cache will instantly bring your Python services to the next
Implementation of Memory-Efficient Neural Networks with Multi-Level Generation, ICCV 2021
Memory-Efficient Multi-Level In-Situ Generation (MLG) By Jiaqi Gu, Hanqing Zhu, Chenghao Feng, Mingjie Liu, Zixuan Jiang, Ray T. Chen and David Z. Pan
Attention for PyTorch with Linear Memory Footprint
Attention for PyTorch with Linear Memory Footprint Unofficially implements https://arxiv.org/abs/2112.05682 to get Linear Memory Cost on Attention (+
This script allows you to retrieve all functions / variables names of a Python code, and the variables values.
Memory Extractor This script allows you to retrieve all functions / variables names of a Python code, and the variables values. How to use it ? The si
A human-readable PyTorch implementation of "Self-attention Does Not Need O(n^2) Memory"
memory_efficient_attention.pytorch A human-readable PyTorch implementation of "Self-attention Does Not Need O(n^2) Memory" (Rabe&Staats'21). def effic
Python Implementation of Scalable In-Memory Updatable Bitmap Indexing
PyUpBit CS490 Large Scale Data Analytics — Implementation of Updatable Compressed Bitmap Indexing Paper Table of Contents About The Project Usage Cont
Simple embedded in memory json database
dbj dbj is a simple embedded in memory json database. It is easy to use, fast and has a simple query language. The code is fully documented, tested an
A low-impact profiler to figure out how much memory each task in Dask is using
dask-memusage If you're using Dask with tasks that use a lot of memory, RAM is your bottleneck for parallelism. That means you want to know how much m
[NeurIPS-2020] Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID.
Self-paced Contrastive Learning (SpCL) The official repository for Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID
Memory game in Python
Concentration - Memory Game Concentration is a memory game written in Python, inspired by memory-game. Description As stated in the introduction of th
Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Claims.
MTM This is the official repository of the paper: Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Cla
Library for Memory Trace Statistics in Python
Memory Search Library for Memory Trace Statistics in Python The library uses tracemalloc as a core module, which is why it is only available for Pytho
A universal memory dumper using Frida
Fridump Fridump (v0.1) is an open source memory dumping tool, primarily aimed to penetration testers and developers. Fridump is using the Frida framew
Stochastic Tensor Optimization for Robot Motion - A GPU Robot Motion Toolkit
STORM Stochastic Tensor Optimization for Robot Motion - A GPU Robot Motion Toolkit [Install Instructions] [Paper] [Website] This package contains code
Real-Time and Accurate Full-Body Multi-Person Pose Estimation&Tracking System
News! Aug 2020: v0.4.0 version of AlphaPose is released! Stronger tracking! Include whole body(face,hand,foot) keypoints! Colab now available. Dec 201
Memory Efficient Attention (O(sqrt(n)) for Jax and PyTorch
Memory Efficient Attention This is unofficial implementation of Self-attention Does Not Need O(n^2) Memory for Jax and PyTorch. Implementation is almo
Robotics with GPU computing
Robotics with GPU computing Cupoch is a library that implements rapid 3D data processing for robotics using CUDA. The goal of this library is to imple
BLEND: A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches
BLEND is a mechanism that can efficiently find fuzzy seed matches between sequences to significantly improve the performance and accuracy while reducing the memory space usage of two important applications: 1) finding overlapping reads and 2) read mapping. Described by Firtina et al.
A Python dictionary implementation designed to act as an in-memory cache for FaaS environments
faas-cache-dict A Python dictionary implementation designed to act as an in-memory cache for FaaS environments. Formally you would describe this a mem
Efficient and Scalable Physics-Informed Deep Learning and Scientific Machine Learning on top of Tensorflow for multi-worker distributed computing
Notice: Support for Python 3.6 will be dropped in v.0.2.1, please plan accordingly! Efficient and Scalable Physics-Informed Deep Learning Collocation-
The versatile ocean simulator, in pure Python, powered by JAX.
Veros is the versatile ocean simulator -- it aims to be a powerful tool that makes high-performance ocean modeling approachable and fun. Because Veros
Memory-Augmented Model Predictive Control
Memory-Augmented Model Predictive Control This repository hosts the source code for the journal article "Composing MPC with LQR and Neural Networks fo
A high-performance distributed deep learning system targeting large-scale and automated distributed training.
HETU Documentation | Examples Hetu is a high-performance distributed deep learning system targeting trillions of parameters DL model training, develop
A python package simulating the quasi-2D pseudospin-1/2 Gross-Pitaevskii equation with NVIDIA GPU acceleration.
A python package simulating the quasi-2D pseudospin-1/2 Gross-Pitaevskii equation with NVIDIA GPU acceleration. Introduction spinor-gpe is high-level,
Asyncio cache manager for redis, memcached and memory
aiocache Asyncio cache supporting multiple backends (memory, redis and memcached). This library aims for simplicity over specialization. All caches co
In-memory Graph Database and Knowledge Graph with Natural Language Interface, compatible with Pandas
CogniPy for Pandas - In-memory Graph Database and Knowledge Graph with Natural Language Interface Whats in the box Reasoning, exploration of RDF/OWL,
GTK and Python based, system performance and usage monitoring tool
System Monitoring Center GTK3 and Python 3 based, system performance and usage monitoring tool. Features: Detailed system performance and usage usage
Hardware-accelerated DNN model inference ROS2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU
Isaac ROS DNN Inference Overview This repository provides two NVIDIA GPU-accelerated ROS2 nodes that perform deep learning inference using custom mode
Official PyTorch implementation of RIO
Image-Level or Object-Level? A Tale of Two Resampling Strategies for Long-Tailed Detection Figure 1: Our proposed Resampling at image-level and obect-