940 Repositories
Python vision Libraries
PoolFormer: MetaFormer is Actually What You Need for Vision
PoolFormer: MetaFormer is Actually What You Need for Vision (arXiv) This is a PyTorch implementation of PoolFormer proposed by our paper "MetaFormer i
FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks
FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks Image Classification Dataset: Google Landmark, COCO, ImageNet Model: Efficient
Unofficial keras(tensorflow) implementation of MAE model from Masked Autoencoders Are Scalable Vision Learners
MAE-keras Unofficial keras(tensorflow) implementation of MAE model described in 'Masked Autoencoders Are Scalable Vision Learners'. This work has been
Tensorflow 2.x implementation of Vision-Transformer model
Vision Transformer Unofficial Tensorflow 2.x implementation of the Transformer based Image Classification model proposed by the paper AN IMAGE IS WORT
PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition, CVPR 2018
PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place
COLMAP - Structure-from-Motion and Multi-View Stereo
COLMAP About COLMAP is a general-purpose Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline with a graphical and command-line interface.
Current state of supervised and unsupervised depth completion methods
Awesome Depth Completion Table of Contents About Sparse-to-Dense Depth Completion Current State of Depth Completion Unsupervised VOID Benchmark Superv
Multi-modal Vision Transformers Excel at Class-agnostic Object Detection
Multi-modal Vision Transformers Excel at Class-agnostic Object Detection
Detectron2 for Document Layout Analysis
Detectron2 trained on PubLayNet dataset This repo contains the training configurations, code and trained models trained on PubLayNet dataset using Det
Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research
Welcome to AirSim AirSim is a simulator for drones, cars and more, built on Unreal Engine (we now also have an experimental Unity release). It is open
TransMorph: Transformer for Medical Image Registration
TransMorph: Transformer for Medical Image Registration keywords: Vision Transformer, Swin Transformer, convolutional neural networks, image registrati
Deep Learning with PyTorch made easy 🚀 !
Deep Learning with PyTorch made easy 🚀 ! Carefree? carefree-learn aims to provide CAREFREE usages for both users and developers. It also provides a c
TensorFlow implementation of Barlow Twins (Barlow Twins: Self-Supervised Learning via Redundancy Reduction)
Barlow-Twins-TF This repository implements Barlow Twins (Barlow Twins: Self-Supervised Learning via Redundancy Reduction) in TensorFlow and demonstrat
Lacmus is a cross-platform application that helps to find people who are lost in the forest using computer vision and neural networks.
lacmus The program for searching through photos from the air of lost people in the forest using Retina Net neural nwtwork. The project is being develo
PyTorch Connectomics: segmentation toolbox for EM connectomics
Introduction The field of connectomics aims to reconstruct the wiring diagram of the brain by mapping the neural connections at the level of individua
How to use TensorLayer
How to use TensorLayer While research in Deep Learning continues to improve the world, we use a bunch of tricks to implement algorithms with TensorLay
Source code and notebooks to reproduce experiments and benchmarks on Bias Faces in the Wild (BFW).
Face Recognition: Too Bias, or Not Too Bias? Robinson, Joseph P., Gennady Livitz, Yann Henon, Can Qin, Yun Fu, and Samson Timoner. "Face recognition:
Lab Materials for MIT 6.S191: Introduction to Deep Learning
This repository contains all of the code and software labs for MIT 6.S191: Introduction to Deep Learning! All lecture slides and videos are available
Apply different text recognition services to images of handwritten documents.
Handprint The Handwritten Page Recognition Test is a command-line program that invokes HTR (handwritten text recognition) services on images of docume
Implementation EfficientDet: Scalable and Efficient Object Detection in PyTorch
Implementation EfficientDet: Scalable and Efficient Object Detection in PyTorch
kyle's vision of how datadog's python client should look
kyle's datadog python vision/proposal not for production use See examples/comprehensive.py for a mostly working example of the proposed API. 📈 🐶 ❤️
This repository includes the official project for the paper: TransMix: Attend to Mix for Vision Transformers.
TransMix: Attend to Mix for Vision Transformers This repository includes the official project for the paper: TransMix: Attend to Mix for Vision Transf
A deep learning object detector framework written in Python for supporting Land Search and Rescue Missions.
AIR: Aerial Inspection RetinaNet for supporting Land Search and Rescue Missions AIR is a deep learning based object detection solution to automate the
This repository includes the official project for the paper: TransMix: Attend to Mix for Vision Transformers.
TransMix: Attend to Mix for Vision Transformers This repository includes the official project for the paper: TransMix: Attend to Mix for Vision Transf
A very simple baseline to estimate 2D & 3D SMPL-compatible keypoints from a single color image.
Minimal Body A very simple baseline to estimate 2D & 3D SMPL-compatible keypoints from a single color image. The model file is only 51.2 MB and runs a
An pytorch implementation of Masked Autoencoders Are Scalable Vision Learners
An pytorch implementation of Masked Autoencoders Are Scalable Vision Learners This is a coarse version for MAE, only make the pretrain model, the fine
Official PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.
DD3D: "Is Pseudo-Lidar needed for Monocular 3D Object detection?" Install // Datasets // Experiments // Models // License // Reference Full video Offi
PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners for self-supervised ViT.
MAE for Self-supervised ViT Introduction This is an unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners for self-sup
AgeGuesser: deep learning based age estimation system. Powered by EfficientNet and Yolov5
AgeGuesser AgeGuesser is an end-to-end, deep-learning based Age Estimation system, presented at the CAIP 2021 conference. You can find the related pap
Generating Videos with Scene Dynamics
Generating Videos with Scene Dynamics This repository contains an implementation of Generating Videos with Scene Dynamics by Carl Vondrick, Hamed Pirs
Stacked Generative Adversarial Networks
Stacked Generative Adversarial Networks This repository contains code for the paper "Stacked Generative Adversarial Networks", CVPR 2017. Part of the
MoCoGAN: Decomposing Motion and Content for Video Generation
MoCoGAN: Decomposing Motion and Content for Video Generation This repository contains an implementation and further details of MoCoGAN: Decomposing Mo
🔥3D-RecGAN in Tensorflow (ICCV Workshops 2017)
3D Object Reconstruction from a Single Depth View with Adversarial Learning Bo Yang, Hongkai Wen, Sen Wang, Ronald Clark, Andrew Markham, Niki Trigoni
Official Chainer implementation of GP-GAN: Towards Realistic High-Resolution Image Blending (ACMMM 2019, oral)
GP-GAN: Towards Realistic High-Resolution Image Blending (ACMMM 2019, oral) [Project] [Paper] [Demo] [Related Work: A2RL (for Auto Image Cropping)] [C
[CVPR 2016] Unsupervised Feature Learning by Image Inpainting using GANs
Context Encoders: Feature Learning by Inpainting CVPR 2016 [Project Website] [Imagenet Results] Sample results on held-out images: This is the trainin
Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.
CycleGAN PyTorch | project page | paper Torch implementation for learning an image-to-image translation (i.e. pix2pix) without input-output pairs, for
Image-to-image translation with conditional adversarial nets
pix2pix Project | Arxiv | PyTorch Torch implementation for learning a mapping from input images to output images, for example: Image-to-Image Translat
A simple interface for editing natural photos with generative neural networks.
Neural Photo Editor A simple interface for editing natural photos with generative neural networks. This repository contains code for the paper "Neural
Interactive Image Generation via Generative Adversarial Networks
iGAN: Interactive Image Generation via Generative Adversarial Networks Project | Youtube | Paper Recent projects: [pix2pix]: Torch implementation for
FCOS: Fully Convolutional One-Stage Object Detection (ICCV'19)
FCOS: Fully Convolutional One-Stage Object Detection This project hosts the code for implementing the FCOS algorithm for object detection, as presente
Powerful and efficient Computer Vision Annotation Tool (CVAT)
Computer Vision Annotation Tool (CVAT) CVAT is free, online, interactive video and image annotation tool for computer vision. It is being used by our
🔎 Super-scale your images and run experiments with Residual Dense and Adversarial Networks.
Image Super-Resolution (ISR) The goal of this project is to upscale and improve the quality of low resolution images. This project contains Keras impl
Gesture Volume Control Using OpenCV and MediaPipe
This Project Uses OpenCV and MediaPipe Hand solutions to identify hands and Change system volume by taking thumb and index finger positions
Official implementation for "Image Quality Assessment using Contrastive Learning"
Image Quality Assessment using Contrastive Learning Pavan C. Madhusudana, Neil Birkbeck, Yilin Wang, Balu Adsumilli and Alan C. Bovik This is the offi
Implementation of the master's thesis "Temporal copying and local hallucination for video inpainting".
Temporal copying and local hallucination for video inpainting This repository contains the implementation of my master's thesis "Temporal copying and
Pytorch implementation of forward and inverse Haar Wavelets 2D
Pytorch implementation of forward and inverse Haar Wavelets 2D
A clean and extensible PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners
A clean and extensible PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners A PyTorch re-implementation of Mask Autoencoder trai
PyTorch implementation of MulMON
MulMON This repository contains a PyTorch implementation of the paper: Learning Object-Centric Representations of Multi-object Scenes from Multiple Vi
Summary of related papers on visual attention
This repo is built for paper: Attention Mechanisms in Computer Vision: A Survey paper Vision-Attention-Papers Channel attention Spatial attention Temp
Open source hardware and software platform to build a small scale self driving car.
Donkeycar is minimalist and modular self driving library for Python. It is developed for hobbyists and students with a focus on allowing fast experimentation and easy community contributions.
🚪✊Knock Knock: Get notified when your training ends with only two additional lines of code
Knock Knock A small library to get a notification when your training is complete or when it crashes during the process with two additional lines of co
Gated-Shape CNN for Semantic Segmentation (ICCV 2019)
GSCNN This is the official code for: Gated-SCNN: Gated Shape CNNs for Semantic Segmentation Towaki Takikawa, David Acuna, Varun Jampani, Sanja Fidler
UPSNet: A Unified Panoptic Segmentation Network
UPSNet: A Unified Panoptic Segmentation Network Introduction UPSNet is initially described in a CVPR 2019 oral paper. Disclaimer This repository is te
Learning to Adapt Structured Output Space for Semantic Segmentation, CVPR 2018 (spotlight)
Learning to Adapt Structured Output Space for Semantic Segmentation Pytorch implementation of our method for adapting semantic segmentation from the s
A Kitti Road Segmentation model implemented in tensorflow.
KittiSeg KittiSeg performs segmentation of roads by utilizing an FCN based model. The model achieved first place on the Kitti Road Detection Benchmark
Real-time Joint Semantic Reasoning for Autonomous Driving
MultiNet MultiNet is able to jointly perform road segmentation, car detection and street classification. The model achieves real-time speed and state-
Keras implementation of Real-Time Semantic Segmentation on High-Resolution Images
Keras-ICNet [paper] Keras implementation of Real-Time Semantic Segmentation on High-Resolution Images. Training in progress! Requisites Python 3.6.3 K
TensorFlow implementation of ENet
TensorFlow-ENet TensorFlow implementation of ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation. This model was tested on th
TensorFlow implementation of ENet, trained on the Cityscapes dataset.
segmentation TensorFlow implementation of ENet (https://arxiv.org/pdf/1606.02147.pdf) based on the official Torch implementation (https://github.com/e
Fully convolutional networks for semantic segmentation
FCN-semantic-segmentation Simple end-to-end semantic segmentation using fully convolutional networks [1]. Takes a pretrained 34-layer ResNet [2], remo
Chainer Implementation of Fully Convolutional Networks. (Training code to reproduce the original result is available.)
fcn - Fully Convolutional Networks Chainer implementation of Fully Convolutional Networks. Installation pip install fcn Inference Inference is done as
TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision
TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision @misc{you2019torchcv, author = {Ansheng You and Xiangtai Li and Zhen Zhu a
SegNet-like Autoencoders in TensorFlow
SegNet SegNet is a TensorFlow implementation of the segmentation network proposed by Kendall et al., with cool features like strided deconvolution, a
Semantic segmentation models, datasets and losses implemented in PyTorch.
Semantic Segmentation in PyTorch Semantic Segmentation in PyTorch Requirements Main Features Models Datasets Losses Learning rate schedulers Data augm
Anime Face Detector using mmdet and mmpose
Anime Face Detector This is an anime face detector using mmdetection and mmpose. (To avoid copyright issues, I use generated images by the TADNE model
History Aware Multimodal Transformer for Vision-and-Language Navigation
History Aware Multimodal Transformer for Vision-and-Language Navigation This repository is the official implementation of History Aware Multimodal Tra
Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners
Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners This repository is built upon BEiT, thanks very much! Now, we on
Coursework project for DIP class. The goal is to use vision to guide the Dashgo robot through two traffic cones in bright color.
Coursework project for DIP class. The goal is to use vision to guide the Dashgo robot through two traffic cones in bright color.
ML for NLP and Computer Vision.
Sparrow is our open-source ML product. It runs on Skipper MLOps infrastructure.
History Aware Multimodal Transformer for Vision-and-Language Navigation
History Aware Multimodal Transformer for Vision-and-Language Navigation This repository is the official implementation of History Aware Multimodal Tra
A program that uses computer vision to detect hand gestures, used for controlling movie players.
HandGestureDetection This program uses a Haar Cascade algorithm to detect the presence of your hand, and then passes it on to a self-created and self-
A free, multiplatform SDK for real-time facial motion capture using blendshapes, and rigid head pose in 3D space from any RGB camera, photo, or video.
mocap4face by Facemoji mocap4face by Facemoji is a free, multiplatform SDK for real-time facial motion capture based on Facial Action Coding System or
This repo. is an implementation of ACFFNet, which is accepted for in Image and Vision Computing.
Attention-Guided-Contextual-Feature-Fusion-Network-for-Salient-Object-Detection This repo. is an implementation of ACFFNet, which is accepted for in I
Investigating automatic navigation towards standard US views integrating MARL with the virtual US environment developed in CT2US simulation
AutomaticUSnavigation Investigating automatic navigation towards standard US views integrating MARL with the virtual US environment developed in CT2US
Image Segmentation using U-Net, U-Net with skip connections and M-Net architectures
Brain-Image-Segmentation Segmentation of brain tissues in MRI image has a number of applications in diagnosis, surgical planning, and treatment of bra
Official Code Release for "TIP-Adapter: Training-free clIP-Adapter for Better Vision-Language Modeling"
Official Code Release for "TIP-Adapter: Training-free clIP-Adapter for Better Vision-Language Modeling" Pipeline of Tip-Adapter Tip-Adapter can provid
Laser device for neutralizing - mosquitoes, weeds and pests
Laser device for neutralizing - mosquitoes, weeds and pests (in progress) Here I will post information for creating a laser device. A warning!! How It
A set of tools to pre-calibrate and calibrate (multi-focus) plenoptic cameras (e.g., a Raytrix R12) based on the libpleno.
COMPOTE: Calibration Of Multi-focus PlenOpTic camEra. COMPOTE is a set of tools to pre-calibrate and calibrate (multifocus) plenoptic cameras (e.g., a
Certified Patch Robustness via Smoothed Vision Transformers
Certified Patch Robustness via Smoothed Vision Transformers This repository contains the code for replicating the results of our paper: Certified Patc
[ICCV 2021 Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Just Ask: Learning to Answer Questions from Millions of Narrated Videos Webpage • Demo • Paper This repository provides the code for our paper, includ
[ICCV'21] Pri3D: Can 3D Priors Help 2D Representation Learning?
Pri3D: Can 3D Priors Help 2D Representation Learning? [ICCV 2021] Pri3D leverages 3D priors for downstream 2D image understanding tasks: during pre-tr
VLGrammar: Grounded Grammar Induction of Vision and Language
VLGrammar: Grounded Grammar Induction of Vision and Language
Sketch Your Own GAN: Customizing a GAN model with hand-drawn sketches.
Sketch Your Own GAN Project | Paper | Youtube | Slides Our method takes in one or a few hand-drawn sketches and customizes an off-the-shelf GAN to mat
ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, ICCV 2021 Update: 2021/03/11: update our new results. Now our T2T-ViT-14 w
This is a collection of our NAS and Vision Transformer work.
AutoML - Neural Architecture Search This is a collection of our AutoML-NAS work iRPE (NEW): Rethinking and Improving Relative Position Encoding for Vi
ICCV2021 Papers with Code
ICCV2021 Papers with Code
RSC-Net: 3D Human Pose, Shape and Texture from Low-Resolution Images and Videos
RSC-Net: 3D Human Pose, Shape and Texture from Low-Resolution Images and Videos Implementation for "3D Human Pose, Shape and Texture from Low-Resoluti
Neural Scene Flow Prior (NeurIPS 2021 spotlight)
Neural Scene Flow Prior Xueqian Li, Jhony Kaesemodel Pontes, Simon Lucey Will appear on Thirty-fifth Conference on Neural Information Processing Syste
A curated list of the latest breakthroughs in AI by release date with a clear video explanation, link to a more in-depth article, and code.
A curated list of the latest breakthroughs in AI by release date with a clear video explanation, link to a more in-depth article, and code
AWS Blog post code for running feature-extraction on images using AWS Batch and Cloud Development Kit (CDK).
Batch processing with AWS Batch and CDK Welcome This repository demostrates provisioning the necessary infrastructure for running a job on AWS Batch u
Real-Time High-Resolution Background Matting
Real-Time High-Resolution Background Matting Official repository for the paper Real-Time High-Resolution Background Matting. Our model requires captur
Official implementation of "Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention" (BMVC 2021).
Multi-Glimpse Network Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention arXiv Require
This repository summarized computer vision theories.
This repository summarized computer vision theories.
The code for the NeurIPS 2021 paper "A Unified View of cGANs with and without Classifiers".
Energy-based Conditional Generative Adversarial Network (ECGAN) This is the code for the NeurIPS 2021 paper "A Unified View of cGANs with and without
DiscoNet: Learning Distilled Collaboration Graph for Multi-Agent Perception [NeurIPS 2021]
DiscoNet: Learning Distilled Collaboration Graph for Multi-Agent Perception [NeurIPS 2021] Yiming Li, Shunli Ren, Pengxiang Wu, Siheng Chen, Chen Feng
Code for "Discovering Non-monotonic Autoregressive Orderings with Variational Inference" (paper and code updated from ICLR 2021)
Discovering Non-monotonic Autoregressive Orderings with Variational Inference Description This package contains the source code implementation of the
Implementation for HFGI: High-Fidelity GAN Inversion for Image Attribute Editing
HFGI: High-Fidelity GAN Inversion for Image Attribute Editing High-Fidelity GAN Inversion for Image Attribute Editing Update: We released the inferenc
Regularized Frank-Wolfe for Dense CRFs: Generalizing Mean Field and Beyond
CRF - Conditional Random Fields A library for dense conditional random fields (CRFs). This is the official accompanying code for the paper Regularized
Face Mask Detection System built with OpenCV, TensorFlow using Computer Vision concepts
Face mask detection Face Mask Detection System built with OpenCV, TensorFlow using Computer Vision concepts in order to detect face masks in static im
Concept Modeling: Topic Modeling on Images and Text
Concept is a technique that leverages CLIP and BERTopic-based techniques to perform Concept Modeling on images.