1356 Repositories
Python vision-transformers Libraries
Exadel CompreFace is a free and open-source face recognition GitHub project
Exadel CompreFace is a leading free and open-source face recognition system Exadel CompreFace is a free and open-source face recognition service that
ChatBot-Pytorch - A GPT-2 ChatBot implemented using Pytorch and Huggingface-transformers
ChatBot-Pytorch A GPT-2 ChatBot implemented using Pytorch and Huggingface-transf
TetrisAI - Tetris AI Bot using computer vision to play game automatically
Tetris AI Tetris AI Bot using computer vision to play game automatically bot.py
Pytorch implementation of ICASSP 2022 paper Attention Probe: Vision Transformer Distillation in the Wild
Attention Probe: Vision Transformer Distillation in the Wild Jiahao Wang, Mingdeng Cao, Shuwei Shi, Baoyuan Wu, Yujiu Yang In ICASSP 2022 This code is
Attention Probe: Vision Transformer Distillation in the Wild
Attention Probe: Vision Transformer Distillation in the Wild Jiahao Wang, Mingdeng Cao, Shuwei Shi, Baoyuan Wu, Yujiu Yang In ICASSP 2022 This code is
Use AutoModelForSeq2SeqLM in Huggingface Transformers to train COMET
Training COMET using seq2seq setting Use AutoModelForSeq2SeqLM in Huggingface Transformers to train COMET. The codes are modified from run_summarizati
Are you obsessed with playing the increasingly-popular word game Wordle?
WORDLE-VISION Up your Wordle game! Are you obsessed with playing the increasingly-popular word game Wordle? Ever wondered what the optimal first word
OMNIVORE is a single vision model for many different visual modalities
Omnivore: A Single Model for Many Visual Modalities [paper][website] OMNIVORE is a single vision model for many different visual modalities. It learns
Cereal box identification in store shelves using computer vision and a single train image per model.
Product Recognition on Store Shelves Description You can read the task description here. Report You can read and download our report here. Step A - Mu
Fast Differentiable Matrix Square Root
Official Pytorch implementation of ICLR 22 paper Fast Differentiable Matrix Square Root
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents [Project Page] [Paper] [Video] Wenlong Huang1, Pieter Abbee
Deep Learning Topics with Computer Vision & NLP
Deep learning Udacity Course Deep Learning Topics with Computer Vision & NLP for the AWS Machine Learning Engineer Nanodegree Program Tasks are mostly
FFCV: Fast Forward Computer Vision (and other ML workloads!)
Fast Forward Computer Vision: train models at a fraction of the cost with accele
GNES enables large-scale index and semantic search for text-to-text, image-to-image, video-to-video and any-to-any content form
GNES is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural network.
Collect some papers about transformer with vision. Awesome Transformer with Computer Vision (CV)
Awesome Visual-Transformer Collect some Transformer with Computer-Vision (CV) papers. If you find some overlooked papers, please open issues or pull r
Transformer in Vision
Transformer-in-Vision Recent Transformer-based CV and related works. Welcome to comment/contribute! Keep updated. Resource SCENIC: A JAX Library for C
Transformer in Computer Vision
Transformer-in-Vision A paper list of some recent Transformer-based CV works. If you find some ignored papers, please open issues or pull requests. **
A general python framework for visual object tracking and video object segmentation, based on PyTorch
PyTracking A general python framework for visual object tracking and video object segmentation, based on PyTorch. 📣 Two tracking/VOS papers accepted
Dynamic Token Normalization Improves Vision Transformers
Dynamic Token Normalization Improves Vision Transformers This is the PyTorch implementation of the paper Dynamic Token Normalization Improves Vision T
Meta Self-learning for Multi-Source Domain Adaptation: A Benchmark
Meta Self-Learning for Multi-Source Domain Adaptation: A Benchmark Project | Arxiv | YouTube | | Abstract In recent years, deep learning-based methods
PyTorch implementation of image classification models for CIFAR-10/CIFAR-100/MNIST/FashionMNIST/Kuzushiji-MNIST/ImageNet
PyTorch Image Classification Following papers are implemented using PyTorch. ResNet (1512.03385) ResNet-preact (1603.05027) WRN (1605.07146) DenseNet
X-VLM: Multi-Grained Vision Language Pre-Training
X-VLM: learning multi-grained vision language alignments Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts. Yan Zeng, Xi
Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation
Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation Official PyTorch implementation for the paper Look
Code and scripts for "Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning"
Visually Grounded Bert Language Model This repository is the official implementation of Explainable Semantic Space by Grounding Language to Vision wit
TiP-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
TiP-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling This is the official code release for the paper 'TiP-Adapter: Training-fre
TransMVSNet: Global Context-aware Multi-view Stereo Network with Transformers.
TransMVSNet This repository contains the official implementation of the paper: "TransMVSNet: Global Context-aware Multi-view Stereo Network with Trans
A PyTorch implementation of VIOLET
VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling A PyTorch implementation of VIOLET Overview VIOLET is an implementati
3D-RETR: End-to-End Single and Multi-View 3D Reconstruction with Transformers
3D-RETR: End-to-End Single and Multi-View 3D Reconstruction with Transformers (BMVC 2021) Zai Shi*, Zhao Meng*, Yiran Xing, Yunpu Ma, Roger Wattenhofe
[BMVC2021] "TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation"
TransFusion-Pose TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation Haoyu Ma, Liangjian Chen, Deying Kong, Zhe Wang, Xingwei
Official Code Release for "CLIP-Adapter: Better Vision-Language Models with Feature Adapters"
Official Code Release for "CLIP-Adapter: Better Vision-Language Models with Feature Adapters" Pipeline of CLIP-Adapter CLIP-Adapter is a drop-in modul
Pose Transformers: Human Motion Prediction with Non-Autoregressive Transformers
Pose Transformers: Human Motion Prediction with Non-Autoregressive Transformers This is the repo used for human motion prediction with non-autoregress
Python Computer Vision from Scratch
This repository explores the variety of techniques commonly used to analyze and interpret images. It also describes challenging real-world applications where vision is being successfully used, both for specialized applications such as medical imaging, and for fun, consumer-level tasks such as image editing and stitching, which students can apply to their own personal photos and videos.
Duke Machine Learning Winter School: Computer Vision 2022
mlwscv2002 Welcome to the Duke Machine Learning Winter School: Computer Vision 2022! The MLWS-CV includes 3 hands-on training sessions on implementing
In this project, we compared Spanish BERT and Multilingual BERT in the Sentiment Analysis task.
Applying BERT Fine Tuning to Sentiment Classification on Amazon Reviews Abstract Sentiment analysis has made great progress in recent years, due to th
This repository contains demos I made with the Transformers library by HuggingFace.
Transformers-Tutorials Hi there! This repository contains demos I made with the Transformers library by 🤗 HuggingFace. Currently, all of them are imp
This repository provides the code for MedViLL(Medical Vision Language Learner).
MedViLL This repository provides the code for MedViLL(Medical Vision Language Learner). Our proposed architecture MedViLL is a single BERT-based model
PyGRANSO: A PyTorch-enabled port of GRANSO with auto-differentiation
PyGRANSO PyGRANSO: A PyTorch-enabled port of GRANSO with auto-differentiation Please check https://ncvx.org/PyGRANSO for detailed instructions (introd
Predicting 10 different clothing types using Xception pre-trained model.
Predicting-Clothing-Types Predicting 10 different clothing types using Xception pre-trained model from Keras library. It is a reimplemented version from
No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency
This repository contains the implementation for the paper: No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consiste
HAT: Hierarchical Aggregation Transformers for Person Re-identification
HAT: Hierarchical Aggregation Transformers for Person Re-identification
"Exploring Vision Transformers for Fine-grained Classification" at CVPRW FGVC8
FGVC8 Exploring Vision Transformers for Fine-grained Classification paper presented at the CVPR 2021, The Eighth Workshop on Fine-Grained Visual Catego
Official implementation of "Refiner: Refining Self-attention for Vision Transformers".
RefinerViT This repo is the official implementation of "Refiner: Refining Self-attention for Vision Transformers". The repo is built on top of timm an
Visual dialog agents with pre-trained vision-and-language encoders.
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation Or READ-UP: Referring Expression Agent Dialog with Unified Pretr
RETRO-pytorch - Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch
RETRO - Pytorch (wip) Implementation of RETRO, Deepmind's Retrieval based Attent
LegoDNN: a block-grained scaling tool for mobile vision systems
Table of contents 1 Introduction 1.1 Major features 1.2 Architecture 2 Code and Installation 2.1 Code 2.2 Installation 3 Repository of DNNs in vision
A Unified Framework and Analysis for Structured Knowledge Grounding
UnifiedSKG 📚 : Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models Code for paper UnifiedSKG: Unifying and Mu
Scaling Vision with Sparse Mixture of Experts
Scaling Vision with Sparse Mixture of Experts This repository contains the code for training and fine-tuning Sparse MoE models for vision (V-MoE) on I
Instant neural graphics primitives: lightning fast NeRF and more
Instant Neural Graphics Primitives Ever wanted to train a NeRF model of a fox in under 5 seconds? Or fly around a scene captured from photos of a fact
Bringing Computer Vision and Flutter together, to build an awesome app!!
Bringing Computer Vision and Flutter together, to build an awesome app!! Explore the Directories Flutter · Machine Learning Table of Contents About
Face Mask Detector by live camera using tensorflow-keras, openCV and Python
Face Mask Detector 😷 by Live Camera Detecting masked or unmasked faces by live camera with percentage of mask occupation About Project: This an Arti
Step by Step on how to create a vision recognition model using LOBE.ai, export the model and run the model in an Azure Function
Step by Step on how to create a vision recognition model using LOBE.ai, export the model and run the model in an Azure Function
[CVPR 2021] "Multimodal Motion Prediction with Stacked Transformers": official code implementation and project page.
mmTransformer Introduction This repo is official implementation for mmTransformer in pytorch. Currently, the core code of mmTransformer is implemented
NeWT: Natural World Tasks
NeWT: Natural World Tasks This repository contains resources for working with the NeWT dataset. ❗ At this time the binary tasks are not publicly avail
Official PyTorch Implementation of "AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting".
AgentFormer This repo contains the official implementation of our paper: AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecast
"Investigating the Limitations of Transformers with Simple Arithmetic Tasks", 2021
transformers-arithmetic This repository contains the code to reproduce the experiments from the paper: Nogueira, Jiang, Lin "Investigating the Limitat
An implementation of the efficient attention module.
Efficient Attention An implementation of the efficient attention module. Description Efficient attention is an attention mechanism that substantially
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
GCNet for Object Detection By Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu. This repo is an official implementation of "GCNet: Non-local Networ
Pytorch implementation of Compressive Transformers, from Deepmind
Compressive Transformer in Pytorch Pytorch implementation of Compressive Transformers, a variant of Transformer-XL with compressed memory for long-ran
The accompanying code for the paper "GMAT: Global Memory Augmentation for Transformers" (Ankit Gupta and Jonathan Berant).
GMAT: Global Memory Augmentation for Transformers This repository contains the accompanying code for the paper: "GMAT: Global Memory Augmentation for
Implementation of Memformer, a Memory-augmented Transformer, in Pytorch
Memformer - Pytorch Implementation of Memformer, a Memory-augmented Transformer, in Pytorch. It includes memory slots, which are updated with attentio
Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers.
Linear Transformers Are Secretly Fast Weight Programmers This repository contains the code accompanying the paper Linear Transformers Are Secretly Fas
Collects a variety of multi-modal transformer architectures, including image transformer, video transformer, image-language transformer, video-language transformer and related datasets
The repository collects a variety of multi-modal transformer architectures, including image transformer, video transformer, image-language transformer, video-language transformer and related datasets. It also collects useful tutorials and tools in these related domains.
Repository for 2021 Computer Vision Class @ Chulalongkorn University
2110443 - Computer Vision (2021/2) Computer Vision @ Chulalongkorn University Anaconda Download Link https://www.anaconda.com/download/ Miniconda and
📚 A collection of all the Deep Learning Metrics that I came across which are not accuracy/loss.
📚 A collection of all the Deep Learning Metrics that I came across which are not accuracy/loss.
Unofficial JAX implementations of Deep Learning models
JAX Models Table of Contents About The Project Getting Started Prerequisites Installation Usage Contributing License Contact About The Project The JAX
This is the official implementation of our proposed SwinMR
SwinMR This is the official implementation of our proposed SwinMR: Swin Transformer for Fast MRI Please cite: @article{huang2022swin, title={Swi
VisionKG: Vision Knowledge Graph
VisionKG: Vision Knowledge Graph Official Repository of VisionKG by Anh Le-Tuan, Trung-Kien Tran, Manh Nguyen-Duc, Jicheng Yuan, Manfred Hauswirth and
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
applied-ml Curated papers, articles, and blogs on data science & machine learning in production. ⚙️ Figuring out how to implement your ML project? Lea
Repo for the paper Extrapolating from a Single Image to a Thousand Classes using Distillation
Extrapolating from a Single Image to a Thousand Classes using Distillation by Yuki M. Asano* and Aaqib Saeed* (*Equal Contribution) Extrapolating from
LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation
LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation Table of Contents: Introduction Project Structure Installation Datas
The official implementation of ELSA: Enhanced Local Self-Attention for Vision Transformer
ELSA: Enhanced Local Self-Attention for Vision Transformer By Jingkai Zhou, Pich
A simple python module to generate anchor (aka default/prior) boxes for object detection tasks.
PyBx WIP A simple python module to generate anchor (aka default/prior) boxes for object detection tasks. Calculated anchor boxes are returned as ndarr
MPViT: Multi-Path Vision Transformer for Dense Prediction
MPViT : Multi-Path Vision Transformer for Dense Prediction This repository inlcu
Generic Foreground Segmentation in Images
Pixel Objectness The following repository contains pretrained model for pixel objectness. Please visit our project page for the paper and visual resul
Food recognition model using convolutional neural network & computer vision
Food recognition model using convolutional neural network & computer vision. The goal is to match or beat the DeepFood Research Paper
Object recognition using Azure Custom Vision AI and Azure Functions
Step by Step on how to create an object recognition model using Custom Vision, export the model and run the model in an Azure Function
ELSED: Enhanced Line SEgment Drawing
ELSED: Enhanced Line SEgment Drawing This repository contains the source code of ELSED: Enhanced Line SEgment Drawing the fastest line segment detecto
Repository of Vision Transformer with Deformable Attention
Vision Transformer with Deformable Attention This repository contains the code for the paper Vision Transformer with Deformable Attention [arXiv]. Int
Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"
merlot_reserve Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound" MERLOT Reserve (in submission) is a mo
Implementation of Hire-MLP: Vision MLP via Hierarchical Rearrangement and An Image Patch is a Wave: Phase-Aware Vision MLP.
Hire-Wave-MLP.pytorch Implementation of Hire-MLP: Vision MLP via Hierarchical Rearrangement and An Image Patch is a Wave: Phase-Aware Vision MLP Resul
This repository collects 100 papers related to negative sampling methods.
Negative-Sampling-Paper This repository collects 100 papers related to negative sampling methods, covering multiple research fields such as Recommenda
A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval
CLIP4CMR A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval The original data and pre-calculate
This repository provides the official implementation of 'Learning to ignore: rethinking attention in CNNs' accepted in BMVC 2021.
inverse_attention This repository provides the official implementation of 'Learning to ignore: rethinking attention in CNNs' accepted in BMVC 2021. Le
Practical Machine Learning with Python
Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system.
Smart computer vision application
Smart-computer-vision-application Backend : opencv and python Library required:
A Transformer Implementation that is easy to understand and customizable.
Simple Transformer I've written a series of articles on the transformer architecture and language models on Medium. This repository contains an implem
A computer vision pipeline to identify the "icons" in Christian paintings
Christian-Iconography A computer vision pipeline to identify the "icons" in Christian paintings. A bit about iconography. Iconography is related to id
Unofficial PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution
PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution [arXiv 2021].
Alignment Attention Fusion framework for Few-Shot Object Detection
AAF framework Framework generalities This repository contains the code of the AAF framework proposed in this paper. The main idea behind this work is
A Light in the Dark: Deep Learning Practices for Industrial Computer Vision
A Light in the Dark: Deep Learning Practices for Industrial Computer Vision This is the repository for our Paper/Contribution to the WI2022 in Nürnber
Source code for the plant extraction workflow introduced in the paper “Agricultural Plant Cataloging and Establishment of a Data Framework from UAV-based Crop Images by Computer Vision”
Plant extraction workflow Source code for the plant extraction workflow introduced in the paper "Agricultural Plant Cataloging and Establishment of a
This is a computer vision based implementation of the popular childhood game 'Hand Cricket/Odd or Even' in python
Hand Cricket Table of Content Overview Installation Game rules Project Details Future scope Overview This is a computer vision based implementation of
Computer Vision and Pattern Recognition, NUS CS4243, 2022
CS4243_2022 Computer Vision and Pattern Recognition, NUS CS4243, 2022 Cloud Machine #1 : Google Colab (Free GPU) Follow this Notebook installation : h
PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders: A PyTorch Implementation This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners: @
Paddle pit - Rethinking Spatial Dimensions of Vision Transformers
PiT implemented with Paddle: Rethinking Spatial Dimensions of Vision Transformers, arXiv. Official original
Pytorch reimplementation of the Mixer (MLP-Mixer: An all-MLP Architecture for Vision)
MLP-Mixer Pytorch reimplementation of Google's repository for the MLP-Mixer (Not yet updated on the master branch) that was released with the paper ML
Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs
Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs ArXiv Abstract Convolutional Neural Networks (CNNs) have become the de f
Transformers based fully on MLPs
Awesome MLP-based Transformers papers An up-to-date list of Transformers based fully on MLPs without attention! Why this repo? After transformers and
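PASSL includes contrastive-learning-based image self-supervised algorithms such as SimCLR, MoCo, BYOL, and CLIP, as well as vision Transformer algorithms such as Vision-Transformer, Swin-Transformer, BEiT, CVT, T2T, and MLP_Mixer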
PASSL Introduction PASSL is a Paddle based vision library for state-of-the-art Self-Supervised Learning research with PaddlePaddle. PASSL aims to acce