552 Repositories
Python Awesome-Visual-Captioning Libraries
Official Implementation of SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations
Official Implementation of SimIPU SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations Since
Visual Adversarial Imitation Learning using Variational Models (VMAIL)
Visual Adversarial Imitation Learning using Variational Models (VMAIL) This is the official implementation of the NeurIPS 2021 paper. Project website
A concise grammar of interactive graphics, built on Vega.
Vega-Lite Vega-Lite provides a higher-level grammar for visual analysis that generates complete Vega specifications. You can find more details, docume
Tensorflow 2 implementations of the C-SimCLR and C-BYOL self-supervised visual representation methods from "Compressive Visual Representations" (NeurIPS 2021)
Compressive Visual Representations This repository contains the source code for our paper, Compressive Visual Representations. We developed informatio
Class-Balanced Loss Based on Effective Number of Samples. CVPR 2019
Class-Balanced Loss Based on Effective Number of Samples Tensorflow code for the paper: Class-Balanced Loss Based on Effective Number of Samples Yin C
[CVPR 2020] Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective
Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective [Arxiv] This is PyTorch implementation of th
[NeurIPS 2020] Code for the paper "Balanced Meta-Softmax for Long-Tailed Visual Recognition"
Balanced Meta-Softmax Code for the paper Balanced Meta-Softmax for Long-Tailed Visual Recognition Jiawei Ren, Cunjun Yu, Shunan Sheng, Xiao Ma, Haiyu
[CVPR 2021] MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition
MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition (CVPR 2021) arXiv Prerequisite PyTorch = 1.2.0 Python3 torchvision PIL argpar
This repository contains code for the paper "Disentangling Label Distribution for Long-tailed Visual Recognition", published at CVPR' 2021
Disentangling Label Distribution for Long-tailed Visual Recognition (CVPR 2021) Arxiv link Blog post This codebase is built on Causal Norm. Install co
Official PyTorch Implementation of GAN-Supervised Dense Visual Alignment
GAN-Supervised Dense Visual Alignment — Official PyTorch Implementation Paper | Project Page | Video This repo contains training, evaluation and visua
[AAAI2022] Source code for our paper《Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning》
SSVC The source code for paper [Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning] samples of the
HIVE: Evaluating the Human Interpretability of Visual Explanations
HIVE: Evaluating the Human Interpretability of Visual Explanations Project Page | Paper This repo provides the code for HIVE, a human evaluation frame
A curated list of awesome deep long-tailed learning resources.
A curated list of awesome deep long-tailed learning resources.
Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks
Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks - Official Project Page This repository contains the code develope
Useful guides, tutorials, and FAQs related to LEGO Universe and Darkflame Universe.
Awesome Lego Universe A curated list of awesome things related to LEGO Universe. LEGO Universe was a kid-friendly massively-multiplayer online role pl
Code for the TPAMI paper: "Syntax Customized Video Captioning by Imitating Exemplar Sentences"
Syntax-Customized-Video-Captioning Code for the TPAMI paper: "Syntax Customized Video Captioning by Imitating Exemplar Sentences". This is my second w
Code for the paper "Controllable Video Captioning with an Exemplar Sentence"
SMCG Code for the paper "Controllable Video Captioning with an Exemplar Sentence" Introduction We investigate a novel and challenging task, namely con
A visual dataflow programming language for sklearn
Persimmon What is it? Persimmon is a visual dataflow language for creating sklearn pipelines. It represents functions as blocks, inputs and outputs ar
A curated list of awesome things related to Textual
Awesome Textual | A curated list of awesome things related to Textual. Textual is a TUI (Text User Interface) framework for Python inspired by modern
OpenCodeBlocks an open-source tool for modular visual programing in python
OpenCodeBlocks OpenCodeBlocks is an open-source tool for modular visual programing in python ! Although for now the tool is in Beta and features are c
Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
VL-BERT By Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai. This repository is an official implementation of the paper VL-BERT:
Vision-Language Pre-training for Image Captioning and Question Answering
VLP This repo hosts the source code for our AAAI2020 work Vision-Language Pre-training (VLP). We have released the pre-trained model on Conceptual Cap
Oscar and VinVL
Oscar: Object-Semantics Aligned Pre-training for Vision-and-Language Tasks VinVL: Revisiting Visual Representations in Vision-Language Models Updates
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part
VILLA: Vision-and-Language Adversarial Training This is the official repository of VILLA (NeurIPS 2020 Spotlight). This repository currently supports
[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
VirTex: Learning Visual Representations from Textual Annotations Karan Desai and Justin Johnson University of Michigan CVPR 2021 arxiv.org/abs/2006.06
A Fast Knowledge Distillation Framework for Visual Recognition
FKD: A Fast Knowledge Distillation Framework for Visual Recognition Official PyTorch implementation of paper A Fast Knowledge Distillation Framework f
Code for the paper "Controllable Video Captioning with an Exemplar Sentence"
SMCG Code for the paper "Controllable Video Captioning with an Exemplar Sentence" Introduction We investigate a novel and challenging task, namely con
PyTorch implementation of paper A Fast Knowledge Distillation Framework for Visual Recognition.
FKD: A Fast Knowledge Distillation Framework for Visual Recognition Official PyTorch implementation of paper A Fast Knowledge Distillation Framework f
A unified 3D Transformer Pipeline for visual synthesis
Overview This is the official repo for the paper: NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion. NÜWA is a unified multimodal p
Predicting Event Memorability from Contextual Visual Semantics
Predicting Event Memorability from Contextual Visual Semantics
PyTorch implementation of "Debiased Visual Question Answering from Feature and Sample Perspectives" (NeurIPS 2021)
D-VQA We provide the PyTorch implementation for Debiased Visual Question Answering from Feature and Sample Perspectives (NeurIPS 2021). Dependencies P
This is a short program that takes the input from your microphone and uses OpenGL to draw a live colourful pattern
Visual-Music This is a short program that takes the input from your microphone and uses OpenGL to draw a live colourful pattern Installation and Setup
✨✨✨An awesome open source toolbox for stereo matching.
OpenStereo This is an awesome open source toolbox for stereo matching. Supported Methods: BM SGM(T-PAMI'07) GCNet(ICCV'17) PSMNet(CVPR'18) StereoNet(E
Awesome Django Blog App
Awesome-Django-Blog-App Made with love django as the backend and Bootstrap as the frontend ! i hope that can help !! Project Title Django provides mul
Fluency ENhanced Sentence-bert Evaluation (FENSE), metric for audio caption evaluation. And Benchmark dataset AudioCaps-Eval, Clotho-Eval.
FENSE The metric, Fluency ENhanced Sentence-bert Evaluation (FENSE), for audio caption evaluation, proposed in the paper "Can Audio Captions Be Evalua
Toward a Visual Concept Vocabulary for GAN Latent Space, ICCV 2021
Toward a Visual Concept Vocabulary for GAN Latent Space Code and data from the ICCV 2021 paper Sarah Schwettmann, Evan Hernandez, David Bau, Samuel Kl
Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic [Paper] [Colab is coming soon] Approach Example Usage To r
Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic [Paper] [Colab is coming soon] Approach Example Usage To r
PyTorch code of paper "LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering"
LiVLR-VideoQA We propose a Lightweight Visual-Linguistic Reasoning framework (LiVLR) for VideoQA. The overview of LiVLR: Evaluation on MSRVTT-QA Datas
A fast, dataset-agnostic, deep visual search engine for digital art history
imgs.ai imgs.ai is a fast, dataset-agnostic, deep visual search engine for digital art history based on neural network embeddings. It utilizes modern
A dual benchmarking study of visual forgery and visual forensics techniques
A dual benchmarking study of facial forgery and facial forensics In recent years, visual forgery has reached a level of sophistication that humans can
This repository contains all code and data for the Inside Out Visual Place Recognition task
Inside Out Visual Place Recognition This repository contains code and instructions to reproduce the results for the Inside Out Visual Place Recognitio
Image Captioning using CNN ,LSTM and Attention
Image Captioning using CNN ,LSTM and Attention This is a deeplearning model which tries to summarize an image into a text . Installation Install this
An index of algorithms for learning causality with data
awesome-causality-algorithms An index of algorithms for learning causality with data. Please cite our survey paper if this index is helpful. @article{
A Discord token grabber written in Python3, with awesome obfuscation and anti-debug protection.
☣️ Plague ☣️ Plague is a Discord token grabber written in Python3, obfuscated with Kramer, protected from traffic analysers with Scarecrow and using t
HybVIO visual-inertial odometry and SLAM system
HybVIO A visual-inertial odometry system with an optional SLAM module. This is a research-oriented codebase, which has been published for the purposes
Learning Convolutional Neural Networks with Interactive Visualization.
CNN Explainer An interactive visualization system designed to help non-experts learn about Convolutional Neural Networks (CNNs) For more information,
A curated list of awesome papers for Semantic Retrieval (TOIS Accepted: Semantic Models for the First-stage Retrieval: A Comprehensive Review).
A curated list of awesome papers for Semantic Retrieval (TOIS Accepted: Semantic Models for the First-stage Retrieval: A Comprehensive Review).
MobileNetV1-V2,MobileNeXt,GhostNet,AdderNet,ShuffleNetV1-V2,Mobile+ViT etc.
MobileNetV1-V2,MobileNeXt,GhostNet,AdderNet,ShuffleNetV1-V2,Mobile+ViT etc. ⭐⭐⭐⭐⭐
Code for database and frontend of webpage for Neural Fields in Visual Computing and Beyond.
Neural Fields in Visual Computing—Complementary Webpage This is based on the amazing MiniConf project from Hendrik Strobelt and Sasha Rush—thank you!
Revisiting Video Saliency: A Large-scale Benchmark and a New Model (CVPR18, PAMI19)
DHF1K =========================================================================== Wenguan Wang, J. Shen, M.-M Cheng and A. Borji, Revisiting Video Sal
Awesome Deep Graph Clustering is a collection of SOTA, novel deep graph clustering methods
ADGC: Awesome Deep Graph Clustering ADGC is a collection of state-of-the-art (SOTA), novel deep graph clustering methods (papers, codes and datasets).
A unified 3D Transformer Pipeline for visual synthesis
Overview This is the official repo for the paper: "NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion". NÜWA is a unified multimodal
PyTorch implementation for the visual prior component (i.e. perception module) of the Visually Grounded Physics Learner [Li et al., 2020].
VGPL-Visual-Prior PyTorch implementation for the visual prior component (i.e. perception module) of the Visually Grounded Physics Learner (VGPL). Give
Repo for our ICML21 paper Unsupervised Learning of Visual 3D Keypoints for Control
Unsupervised Learning of Visual 3D Keypoints for Control [Project Website] [Paper] Boyuan Chen1, Pieter Abbeel1, Deepak Pathak2 1UC Berkeley 2Carnegie
A curated list of awesome projects and resources related fastai
A curated list of awesome projects and resources related fastai
Creating multimodal multitask models
Fusion Brain Challenge The English version of the document can be found here. Обновления 01.11 Мы выкладываем пример данных, аналогичных private test
Weakly Supervised Dense Event Captioning in Videos, i.e. generating multiple sentence descriptions for a video in a weakly-supervised manner.
WSDEC This is the official repo for our NeurIPS paper Weakly Supervised Dense Event Captioning in Videos. Description Repo directories ./: global conf
Current state of supervised and unsupervised depth completion methods
Awesome Depth Completion Table of Contents About Sparse-to-Dense Depth Completion Current State of Depth Completion Unsupervised VOID Benchmark Superv
Falcon: Interactive Visual Analysis for Big Data
Falcon: Interactive Visual Analysis for Big Data Crossfilter millions of records without latencies. This project is work in progress and not documente
The purpose of this project is to share knowledge on how awesome Streamlit is and can be
Awesome Streamlit The fastest way to build Awesome Tools and Apps! Powered by Python! The purpose of this project is to share knowledge on how Awesome
A curated list of awesome Dash (plotly) resources
Awesome Dash A curated list of awesome Dash (plotly) resources Dash is a productive Python framework for building web applications. Written on top of
[NeurIPS 2021 Spotlight] Code for Learning to Compose Visual Relations
Learning to Compose Visual Relations This is the pytorch codebase for the NeurIPS 2021 Spotlight paper Learning to Compose Visual Relations. Demo Imag
Automatic Video Captioning Evaluation Metric --- EMScore
Automatic Video Captioning Evaluation Metric --- EMScore Overview For an illustration, EMScore can be computed as: Installation modify the encode_text
Simple image captioning model
CLIP prefix captioning. Inference Notebook: 🥳 New: 🥳 Our technical papar is finally out! Official implementation for the paper "ClipCap: CLIP Prefix
Visual Memorability for Robotic Interestingness via Unsupervised Online Learning (ECCV 2020 Oral and TRO)
Visual Interestingness Refer to the project description for more details. This code based on the following paper. Chen Wang, Yuheng Qiu, Wenshan Wang,
Blind visual quality assessment on 360° Video based on progressive learning
Blind visual quality assessment on omnidirectional or 360 video (ProVQA) Blind VQA for 360° Video via Progressively Learning from Pixels, Frames and V
Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)
DocFormer - PyTorch Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for t
Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective
Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective Zhengzhuo Xu, Zenghao Chai, Chun Yuan This is the PyTorch implement
A transformer-based method for Healthcare Image Captioning in Vietnamese
vieCap4H Challenge 2021: A transformer-based method for Healthcare Image Captioning in Vietnamese This repo GitHub contains our solution for vieCap4H
SalGAN: Visual Saliency Prediction with Generative Adversarial Networks
SalGAN: Visual Saliency Prediction with Adversarial Networks Junting Pan Cristian Canton Ferrer Kevin McGuinness Noel O'Connor Jordi Torres Elisa Sayr
Curated list of awesome GAN applications and demo
gans-awesome-applications Curated list of awesome GAN applications and demonstrations. Note: General GAN papers targeting simple image generation such
PyTorch implementation of MulMON
MulMON This repository contains a PyTorch implementation of the paper: Learning Object-Centric Representations of Multi-object Scenes from Multiple Vi
LIMEcraft: Handcrafted superpixel selectionand inspection for Visual eXplanations
LIMEcraft LIMEcraft: Handcrafted superpixel selectionand inspection for Visual eXplanations The LIMEcraft algorithm is an explanatory method based on
Summary of related papers on visual attention
This repo is built for paper: Attention Mechanisms in Computer Vision: A Survey paper Vision-Attention-Papers Channel attention Spatial attention Temp
RRxIO - Robust Radar Visual/Thermal Inertial Odometry: Robust and accurate state estimation even in challenging visual conditions.
RRxIO - Robust Radar Visual/Thermal Inertial Odometry RRxIO offers robust and accurate state estimation even in challenging visual conditions. RRxIO c
A "finish the lyrics" game using Spotify, YouTube Transcript, and YouTube Search APIs, coupled with visual machine learning
Singify Introducing Singify, the party game! Challenge your friend to who knows songs better. Play random songs from your very own Spotify playlist an
Visual Interaction with Code - A portable visual debugger for python
VIC Visual Interaction with Code A simple tool for debugging and interacting with running python code. This tool is designed to make it easy to inspec
PyTorch implementation for paper "Full-Body Visual Self-Modeling of Robot Morphologies".
Full-Body Visual Self-Modeling of Robot Morphologies Boyuan Chen, Robert Kwiatkowskig, Carl Vondrick, Hod Lipson Columbia University Project Website |
Generative Art Using Neural Visual Grammars and Dual Encoders
Generative Art Using Neural Visual Grammars and Dual Encoders Arnheim 1 The original algorithm from the paper Generative Art Using Neural Visual Gramm
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021.
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning This repository contains the code for our ICCV 202
SAT: 2D Semantics Assisted Training for 3D Visual Grounding, ICCV 2021 (Oral)
SAT: 2D Semantics Assisted Training for 3D Visual Grounding SAT: 2D Semantics Assisted Training for 3D Visual Grounding by Zhengyuan Yang, Songyang Zh
[AAAI-2021] Visual Boundary Knowledge Translation for Foreground Segmentation
Trans-Net Code for (Visual Boundary Knowledge Translation for Foreground Segmentation, AAAI2021). [https://ojs.aaai.org/index.php/AAAI/article/view/16
Python Indent - Correct python indentation in Visual Studio Code.
Python Indent Correct python indentation in Visual Studio Code. See the extension on the VSCode Marketplace and its source code on GitHub. Please cons
C++ Environment InitiatorVisual Studio Code C / C++ Environment Initiator
Visual Studio Code C / C++ Environment Initiator Latest Version : v 1.0.1(2021/11/08) .exe link here About : Visual Studio Code에서 C/C++환경을 MinGW GCC/G
Machine-in-the-Loop Rewriting for Creative Image Captioning
Machine-in-the-Loop Rewriting for Creative Image Captioning Data Annotated sources of data used in the paper: Data Source URL Mohammed et al. Link Gor
A Differentiable Recipe for Learning Visual Non-Prehensile Planar Manipulation
A Differentiable Recipe for Learning Visual Non-Prehensile Planar Manipulation This repository contains the source code of the paper A Differentiable
Leaderboard, taxonomy, and curated list of few-shot object detection papers.
Leaderboard, taxonomy, and curated list of few-shot object detection papers.
[ICCV 2021 Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Just Ask: Learning to Answer Questions from Millions of Narrated Videos Webpage • Demo • Paper This repository provides the code for our paper, includ
A collection of differentiable SVD methods and also the official implementation of the ICCV21 paper "Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?"
Differentiable SVD Introduction This repository contains: The official Pytorch implementation of ICCV21 paper Why Approximate Matrix Square Root Outpe
Code for Greedy Gradient Ensemble for Visual Question Answering (ICCV 2021, Oral)
Greedy Gradient Ensemble for De-biased VQA Code release for "Greedy Gradient Ensemble for Robust Visual Question Answering" (ICCV 2021, Oral). GGE can
This repo holds codes of the ICCV21 paper: Visual Alignment Constraint for Continuous Sign Language Recognition.
VAC_CSLR This repo holds codes of the paper: Visual Alignment Constraint for Continuous Sign Language Recognition.(ICCV 2021) [paper] Prerequisites Th
An open source recipe book from the awesome staff of Clinical Genomics
meatballs An open source recipe book from the awesome staff of Clinical Genomics.
An awesome tool to save articles from RSS feed to Pocket automatically.
RSS2Pocket An awesome tool to save articles from RSS feed to Pocket automatically. About the Project I used to use IFTTT to save articles from RSS fee
Image captioning service for healthcare domains in Vietnamese using VLP
Image captioning service for healthcare domains in Vietnamese using VLP This service is a web service that provides image captioning services for heal
Python with braces. Because Python is awesome, but whitespace is awful.
Bython Python with braces. Because Python is awesome, but whitespace is awful. Bython is a Python preprosessor which translates curly brackets into in
Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation (CoRL 2021)
Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation [Project website] [Paper] This project is a PyTorch i
SW components and demos for visual kinship recognition. An emphasis is put on the FIW dataset-- data loaders, benchmarks, results in summary.
FIW Data Development Kit Table of Contents Introduction Families In the Wild Database Publications Organization To Do License Getting Involved Introdu
A treasure chest for visual recognition powered by PaddlePaddle
简体中文 | English PaddleClas 简介 飞桨图像识别套件PaddleClas是飞桨为工业界和学术界所准备的一个图像识别任务的工具集,助力使用者训练出更好的视觉模型和应用落地。 近期更新 2021.11.1 发布PP-ShiTu技术报告,新增饮料识别demo 2021.10.23 发