ICCV2021-Papers-with-Code
A collection of ICCV 2021 papers and open-source projects (papers with code)!
1617 papers accepted - 25.9% acceptance rate
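ICCV 2021 accepted paper IDs: https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRfaTmsNweuaA0Gjyu58H_Cx56pGwFhcTYII0u1pg0U7MbhlgY0R6Y-BbK3xFhAiwGZ26u3TAtN5MnS/pubhtml
Note 1: Everyone is welcome to submit an issue to share ICCV 2021 papers and open-source projects!
Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
[ICCV 2021 Papers and Open-Source Projects: Table of Contents]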
- Backbone
- Transformer
- Performance Boosters
- GAN
- NAS
- NeRF
- Loss
- Zero-Shot Learning
- Few-Shot Learning
- Long-tailed
- Vision and Language
- Unsupervised/Self-Supervised
- Multi-Label Image Recognition
- 2D Object Detection
- Semantic Segmentation
- Instance Segmentation
- Medical Image Segmentation
- Video Object Segmentation
- Few-shot Segmentation
- Human Motion Segmentation
- Object Tracking
- 3D Point Cloud
- 3D Object Detection
- 3D Semantic Segmentation
- 3D Instance Segmentation
- 3D Multi-Object Tracking
- Point Cloud Denoising
- Point Cloud Registration
- Point Cloud Completion
- Radar Semantic Segmentation
- Image Restoration
- Super-Resolution
- Denoising
- Medical Image Denoising
- Deblurring
- Shadow Removal
- Video Frame Interpolation
- Video Inpainting
- Person Re-identification
- Person Search
- 2D/3D Human Pose Estimation
- 6D Object Pose Estimation
- 3D Head Reconstruction
- Face Recognition
- Facial Expression Recognition
- Action Recognition
- Temporal Action Localization
- Action Detection
- Group Activity Recognition
- Sign Language Recognition
- Text Detection
- Text Recognition
- Text Replacement
- Visual Question Answering (VQA)
- Adversarial Attack
- Depth Estimation
- Gaze Estimation
- Crowd Counting
- Lane Detection
- Trajectory Prediction
- Anomaly Detection
- Scene Graph Generation
- Image Editing
- Image Synthesis
- Image Retrieval
- 3D Reconstruction
- Video Stabilization
- Fine-Grained Recognition
- Style Transfer
- Neural Painting
- Feature Matching
- Semantic Correspondence
- Edge Detection
- Camera Calibration
- Image Quality Assessment
- Metric Learning
- Unsupervised Domain Adaptation
- Video Rescaling
- Hand-Object Interaction
- Vision-and-Language Navigation
- Datasets
- Others
Backbone
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
- Paper(Oral): https://arxiv.org/abs/2102.12122
- Code: https://github.com/whai362/PVT
AutoFormer: Searching Transformers for Visual Recognition
Bias Loss for Mobile Neural Networks
- Paper: https://arxiv.org/abs/2107.11170
- Code: None
Vision Transformer with Progressive Sampling
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Rethinking Spatial Dimensions of Vision Transformers
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Conformer: Local Features Coupling Global Representations for Visual Recognition
MicroNet: Improving Image Recognition with Extremely Low FLOPs
Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition
Visual Transformer
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
An Empirical Study of Training Self-Supervised Vision Transformers
- Paper(Oral): https://arxiv.org/abs/2104.02057
- MoCo v3 Code: None
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
- Paper(Oral): https://arxiv.org/abs/2102.12122
- Code: https://github.com/whai362/PVT
Group-Free 3D Object Detection via Transformers
- Paper: https://arxiv.org/abs/2104.00678
- Code: None
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
- Paper: https://arxiv.org/abs/2107.12309
- Code: None
Rethinking and Improving Relative Position Encoding for Vision Transformer
Emerging Properties in Self-Supervised Vision Transformers
Learning Spatio-Temporal Transformer for Visual Tracking
Fast Convergence of DETR with Spatially Modulated Co-Attention
Vision Transformer with Progressive Sampling
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Rethinking Spatial Dimensions of Vision Transformers
The Right to Talk: An Audio-Visual Transformer Approach
- Paper: https://arxiv.org/abs/2108.03256
- Code: None
Joint Inductive and Transductive Learning for Video Object Segmentation
Conformer: Local Features Coupling Global Representations for Visual Recognition
Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction
Conditional DETR for Fast Training Convergence
MUSIQ: Multi-scale Image Quality Transformer
- Paper: https://arxiv.org/abs/2108.05997
- Code: https://github.com/google-research/google-research/tree/master/musiq
SOTR: Segmenting Objects with Transformers
PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers
- Paper(Oral): https://arxiv.org/abs/2108.08839
- Code: https://github.com/yuxumin/PoinTr
SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
Improving 3D Object Detection with Channel-wise Transformer
TransFER: Learning Relation-aware Facial Expression Representations with Transformers
- Paper: https://arxiv.org/abs/2108.11116
- Code: None
GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer
Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction
- Paper: https://arxiv.org/abs/2109.00512
- Code: https://github.com/facebookresearch/co3d
- Dataset: https://github.com/facebookresearch/co3d
Voxel Transformer for 3D Object Detection
- Paper: https://arxiv.org/abs/2109.02497
- Code: None
3D Human Texture Estimation from a Single Image with Transformers
- Homepage: https://www.mmlab-ntu.com/project/texformer/
- Paper(Oral): https://arxiv.org/abs/2109.02563
- Code: None
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
CTRL-C: Camera calibration TRansformer with Line-Classification
An End-to-End Transformer Model for 3D Object Detection
- Homepage: https://facebookresearch.github.io/3detr/
- Paper: https://arxiv.org/abs/2109.08141
- Code: https://github.com/facebookresearch/3detr
Eformer: Edge Enhancement based Transformer for Medical Image Denoising
- Paper: https://arxiv.org/abs/2109.08044
- Code: None
PnP-DETR: Towards Efficient Visual Analysis with Transformers
Transformer-based Dual Relation Graph for Multi-label Image Recognition
- Paper: https://arxiv.org/abs/2110.04722
- Code: None
Performance Boosters
FaPN: Feature-aligned Pyramid Network for Dense Image Prediction
Unifying Nonlocal Blocks for Neural Networks
Towards Learning Spatially Discriminative Feature Representations
- Paper: https://arxiv.org/abs/2109.01359
- Code: None
GAN
Labels4Free: Unsupervised Segmentation using StyleGAN
GNeRF: GAN-based Neural Radiance Field without Posed Camera
- Paper(Oral): https://arxiv.org/abs/2103.15606
EigenGAN: Layer-Wise Eigen-Learning for GANs
From Continuity to Editability: Inverting GANs with Consecutive Images
- Paper: https://arxiv.org/abs/2107.13812
- Code: https://github.com/Qingyang-Xu/InvertingGANs_with_ConsecutiveImgs
Sketch Your Own GAN
- Homepage: https://peterwang512.github.io/GANSketching/
- Paper: https://arxiv.org/abs/2108.02774
- Code: https://github.com/peterwang512/GANSketching
Manifold Matching via Deep Metric Learning for Generative Modeling
Dual Projection Generative Adversarial Networks for Conditional Image Generation
- Paper: https://arxiv.org/abs/2108.09016
- Code: None
GAN Inversion for Out-of-Range Images with Geometric Transformations
- Paper: https://arxiv.org/abs/2108.08998
- Code: None
ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement
- Homepage: https://yuval-alaluf.github.io/restyle-encoder/
- Paper: https://arxiv.org/abs/2104.02699
- Code: https://github.com/yuval-alaluf/restyle-encoder
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
- Paper(Oral): https://arxiv.org/abs/2103.17249
- Code: https://github.com/orpatashnik/StyleCLIP
Image Synthesis via Semantic Composition
- Homepage: https://shepnerd.github.io/scg/
- Paper: https://arxiv.org/abs/2109.07053
- Code: https://github.com/dvlab-research/SCGAN
NAS
AutoFormer: Searching Transformers for Visual Recognition
BN-NAS: Neural Architecture Search with Batch Normalization
Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition
NeRF
GNeRF: GAN-based Neural Radiance Field without Posed Camera
- Paper(Oral): https://arxiv.org/abs/2103.15606
KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
In-Place Scene Labelling and Understanding with Implicit Scene Representation
- Homepage: https://shuaifengzhi.com/Semantic-NeRF/
- Paper(Oral): https://arxiv.org/abs/2103.15875
Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
- Homepage: https://ajayj.com/dietnerf
- Paper(DietNeRF): https://arxiv.org/abs/2104.00677
BARF: Bundle-Adjusting Neural Radiance Fields
- Homepage: https://chenhsuanlin.bitbucket.io/bundle-adjusting-NeRF/
- Paper(Oral): https://arxiv.org/abs/2104.06405
- Code: https://github.com/chenhsuanlin/bundle-adjusting-NeRF
Self-Calibrating Neural Radiance Fields
Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction
- Paper: https://arxiv.org/abs/2109.00512
- Code: https://github.com/facebookresearch/co3d
- Dataset: https://github.com/facebookresearch/co3d
Neural Articulated Radiance Field
NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo
- Paper(Oral): https://arxiv.org/abs/2109.01129
- Code: https://github.com/weiyithu/NerfingMVS
SNARF: Differentiable Forward Skinning for Animating Non-rigid Neural Implicit Shapes
- Homepage: https://xuchen-ethz.github.io/snarf
- Paper: https://arxiv.org/abs/2104.03953
- Code: https://github.com/xuchen-ethz/snarf
CodeNeRF: Disentangled Neural Radiance Fields for Object Categories
PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering
- Paper: https://openaccess.thecvf.com/content/ICCV2021/html/Ren_PIRenderer_Controllable_Portrait_Image_Generation_via_Semantic_Neural_Rendering_ICCV_2021_paper.html
- Code: https://github.com/RenYurui/PIRender
Loss
Rank & Sort Loss for Object Detection and Instance Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
Bias Loss for Mobile Neural Networks
- Paper: https://arxiv.org/abs/2107.11170
- Code: None
A Robust Loss for Point Cloud Registration
- Paper: https://arxiv.org/abs/2108.11682
- Code: None
Reconcile Prediction Consistency for Balanced Object Detection
- Paper: https://arxiv.org/abs/2108.10809
- Code: None
Influence-Balanced Loss for Imbalanced Visual Classification
Zero-Shot Learning
FREE: Feature Refinement for Generalized Zero-Shot Learning
Discriminative Region-based Multi-Label Zero-Shot Learning
Semantics Disentangling for Generalized Zero-Shot Learning
Few-Shot Learning
Relational Embedding for Few-Shot Classification
Few-Shot and Continual Learning with Attentive Independent Mechanisms
Few Shot Visual Relationship Co-Localization
- Homepage: https://vl2g.github.io/projects/vrc/
Long-tailed
Parametric Contrastive Learning
- Paper: https://arxiv.org/abs/2107.12028
- Code: https://github.com/jiequancui/Parametric-Contrastive-Learning
Influence-Balanced Loss for Imbalanced Visual Classification
Vision and Language
VLGrammar: Grounded Grammar Induction of Vision and Language
Unsupervised/Self-Supervised
An Empirical Study of Training Self-Supervised Vision Transformers
- Paper(Oral): https://arxiv.org/abs/2104.02057
- MoCo v3 Code: None
DetCo: Unsupervised Contrastive Learning for Object Detection
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
- Paper: https://arxiv.org/abs/2108.02183
- Code: None
Improving Contrastive Learning by Visualizing Feature Transformation
- Paper(Oral): https://arxiv.org/abs/2108.02982
- Code: https://github.com/DTennant/CL-Visualizing-Feature-Transformation
Self-Supervised Visual Representations Learning by Contrastive Mask Prediction
- Paper: https://arxiv.org/abs/2108.08012
- Code: None
Temporal Knowledge Consistency for Unsupervised Visual Representation Learning
- Paper: https://arxiv.org/abs/2108.10668
- Code: None
MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving
Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds
- Homepage: https://siyuanhuang.com/STRL/
- Paper: https://arxiv.org/abs/2109.00179
- Code: https://github.com/yichen928/STRL
Self-supervised Product Quantization for Deep Unsupervised Image Retrieval
Self-Supervised Representation Learning from Flow Equivariance
- Paper: https://arxiv.org/abs/2101.06553
- Code: None
Multi-Label Image Recognition
Residual Attention: A Simple but Effective Method for Multi-Label Recognition
2D Object Detection
DetCo: Unsupervised Contrastive Learning for Object Detection
Detecting Invisible People
Active Learning for Deep Object Detection via Probabilistic Modeling
- Paper: https://arxiv.org/abs/2103.16130
- Code: None
Conditional Variational Capsule Network for Open Set Recognition
MDETR: Modulated Detection for End-to-End Multi-Modal Understanding
- Homepage: https://ashkamath.github.io/mdetr_page/
- Paper(Oral): https://arxiv.org/abs/2104.12763
- Code: https://github.com/ashkamath/mdetr
Rank & Sort Loss for Object Detection and Instance Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
SimROD: A Simple Adaptation Method for Robust Object Detection
- Paper(Oral): https://arxiv.org/abs/2107.13389
- Code: None
GraphFPN: Graph Feature Pyramid Network for Object Detection
- Paper: https://arxiv.org/abs/2108.00580
- Code: None
Fast Convergence of DETR with Spatially Modulated Co-Attention
Conditional DETR for Fast Training Convergence
TOOD: Task-aligned One-stage Object Detection
- Paper(Oral): https://arxiv.org/abs/2108.07755
- Code: https://github.com/fcjian/TOOD
Reconcile Prediction Consistency for Balanced Object Detection
- Code: None
Mutual Supervision for Dense Object Detection
PnP-DETR: Towards Efficient Visual Analysis with Transformers
Deep Structured Instance Graph for Distilling Object Detectors
Semi-Supervised Object Detection
End-to-End Semi-Supervised Object Detection with Soft Teacher
- Paper: https://arxiv.org/abs/2106.09018
- Code: None
Oriented Object Detection
Oriented R-CNN for Object Detection
Few-Shot Object Detection
DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection
Semantic Segmentation
Personalized Image Semantic Segmentation
- Paper: https://arxiv.org/abs/2107.13978
- Code: https://github.com/zhangyuygss/PIS
- Dataset: https://github.com/zhangyuygss/PIS
Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.11264
- Code: None
Enhanced Boundary Learning for Glass-like Object Segmentation
Self-Regulation for Semantic Segmentation
Mining Contextual Information Beyond Image for Semantic Segmentation
ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation
Scaling up instance annotation via label propagation
- Homepage: http://scaling-anno.csail.mit.edu/
- Paper: https://arxiv.org/abs/2110.02277
- Code: None
Unsupervised Domain Adaptation Semantic Segmentation
Multi-Anchor Active Domain Adaptation for Semantic Segmentation
- Paper(Oral): https://arxiv.org/abs/2108.08012
- Code: https://github.com/munanning/MADA
Generalize then Adapt: Source-Free Domain Adaptive Semantic Segmentation
- Homepage: https://sites.google.com/view/sfdaseg
- Paper: https://arxiv.org/abs/2108.11249
Few-Shot Semantic Segmentation
Learning Meta-class Memory for Few-Shot Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.02958
- Code: None
Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer
Semi-supervised Semantic Segmentation
Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation
- Paper: https://arxiv.org/abs/2107.11787
- Code: None
Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation
- Paper(Oral): https://arxiv.org/abs/2107.11279
- Code: https://github.com/CVMI-Lab/DARS
Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.09025
- Code: None
Weakly Supervised Semantic Segmentation
Complementary Patch for Weakly Supervised Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.03852
- Code: None
Unsupervised Segmentation
Labels4Free: Unsupervised Segmentation using StyleGAN
Instance Segmentation
Instances as Queries
Crossover Learning for Fast Online Video Instance Segmentation
Rank & Sort Loss for Object Detection and Instance Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
SOTR: Segmenting Objects with Transformers
Scaling up instance annotation via label propagation
- Homepage: http://scaling-anno.csail.mit.edu/
- Paper: https://arxiv.org/abs/2110.02277
- Code: None
Medical Image Segmentation
Recurrent Mask Refinement for Few-Shot Medical Image Segmentation
Video Object Segmentation
Hierarchical Memory Matching Network for Video Object Segmentation
Full-Duplex Strategy for Video Object Segmentation
- Homepage: http://dpfan.net/FSNet/
- Paper: https://arxiv.org/abs/2108.03151
- Code: https://github.com/GewelsJI/FSNet
Joint Inductive and Transductive Learning for Video Object Segmentation
Few-shot Segmentation
Mining Latent Classes for Few-shot Segmentation
- Paper(Oral): https://arxiv.org/abs/2103.15402
- Code: https://github.com/LiheYoung/MiningFSS
Human Motion Segmentation
Graph Constrained Data Representation Learning for Human Motion Segmentation
- Paper: https://arxiv.org/abs/2107.13362
- Code: None
Object Tracking
Learning to Track Objects from Unlabeled Videos
Learning Spatio-Temporal Transformer for Visual Tracking
Learning to Adversarially Blur Visual Object Tracking
HiFT: Hierarchical Feature Transformer for Aerial Tracking
Learn to Match: Automatic Matching Network Design for Visual Tracking
Saliency-Associated Object Tracking
RGBD Object Tracking
DepthTrack: Unveiling the Power of RGBD Tracking
- Paper: https://arxiv.org/abs/2108.13962
- Code: https://github.com/xiaozai/DeT
- Dataset: https://github.com/xiaozai/DeT
3D Point Cloud
Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds
- Homepage: https://siyuanhuang.com/STRL/
Unsupervised Point Cloud Pre-Training via View-Point Occlusion, Completion
- Homepage: https://hansen7.github.io/OcCo/
- Paper: https://arxiv.org/abs/2010.01089
- Code: https://github.com/hansen7/OcCo
DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation
- Paper: https://arxiv.org/abs/2108.04023
- Code: None
Adaptive Graph Convolution for Point Cloud Analysis
3D Object Detection
Group-Free 3D Object Detection via Transformers
- Paper: https://arxiv.org/abs/2104.00678
- Code: None
Improving 3D Object Detection with Channel-wise Transformer
AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection
4D-Net for Learned Multi-Modal Alignment
- Paper: https://arxiv.org/abs/2109.01066
- Code: None
Voxel Transformer for 3D Object Detection
- Paper: https://arxiv.org/abs/2109.02497
- Code: None
Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection
- Paper: https://arxiv.org/abs/2109.02499
- Code: None
An End-to-End Transformer Model for 3D Object Detection
- Homepage: https://facebookresearch.github.io/3detr/
- Paper: https://arxiv.org/abs/2109.08141
- Code: https://github.com/facebookresearch/3detr
RangeDet: In Defense of Range View for LiDAR-based 3D Object Detection
Geometry-based Distance Decomposition for Monocular 3D Object Detection
3D Semantic Segmentation
ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation
- Paper: https://arxiv.org/abs/2107.11769
- Code: None
Learning with Noisy Labels for Robust Point Cloud Segmentation
- Homepage: https://shuquanye.com/PNAL_website/
- Paper(Oral): https://arxiv.org/abs/2107.14230
VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.13824
- Code: https://github.com/hzykent/VMNet
Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation
DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation
- Paper: https://arxiv.org/abs/2108.04023
- Code: None
Adaptive Graph Convolution for Point Cloud Analysis
Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation
3D Instance Segmentation
Hierarchical Aggregation for 3D Instance Segmentation
Instance Segmentation in 3D Scenes Using Semantic Superpoint Tree Networks
3D Multi-Object Tracking
Exploring Simple 3D Multi-Object Tracking for Autonomous Driving
Point Cloud Denoising
Score-Based Point Cloud Denoising
- Paper: https://arxiv.org/abs/2107.10981
- Code: None
Point Cloud Registration
HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration
- Homepage: https://ispc-group.github.io/hregnet
- Paper: https://arxiv.org/abs/2107.11992
- Code: https://github.com/ispc-lab/HRegNet
A Robust Loss for Point Cloud Registration
- Paper: https://arxiv.org/abs/2108.11682
- Code: None
Point Cloud Completion
PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers
- Paper(Oral): https://arxiv.org/abs/2108.08839
- Code: https://github.com/yuxumin/PoinTr
SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
Radar Semantic Segmentation
Multi-View Radar Semantic Segmentation
Image Restoration
Dynamic Attentive Graph Learning for Image Restoration
Super-Resolution
Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks
Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution
Deep Reparametrization of Multi-Frame Super-Resolution and Denoising
- Paper(Oral): https://arxiv.org/abs/2108.08286
- Code: None
Dual-Camera Super-Resolution with Aligned Attention Modules
- Homepage: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
- Paper: https://arxiv.org/abs/2109.01349
- Code: https://github.com/Tengfei-Wang/DualCameraSR
- Dataset: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme
- Paper: https://www4.comp.polyu.edu.hk/~cslzhang/paper/ICCV21_RealVSR.pdf
- Code: https://github.com/IanYeung/RealVSR
- Dataset: https://github.com/IanYeung/RealVSR
Denoising
Deep Reparametrization of Multi-Frame Super-Resolution and Denoising
- Paper(Oral): https://arxiv.org/abs/2108.08286
- Code: None
Rethinking Deep Image Prior for Denoising
Medical Image Denoising
Eformer: Edge Enhancement based Transformer for Medical Image Denoising
- Paper: https://arxiv.org/abs/2109.08044
- Code: None
Deblurring
Rethinking Coarse-to-Fine Approach in Single Image Deblurring
Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions
- Paper: https://arxiv.org/abs/2108.09108
- Code: None
Shadow Removal
CANet: A Context-Aware Network for Shadow Removal
Video Frame Interpolation
XVFI: eXtreme Video Frame Interpolation
- Paper(Oral): https://arxiv.org/abs/2103.16206
- Code: https://github.com/JihyongOh/XVFI
- Dataset: https://github.com/JihyongOh/XVFI
Asymmetric Bilateral Motion Estimation for Video Frame Interpolation
Video Inpainting
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
Person Re-identification
TransReID: Transformer-based Object Re-Identification
IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID
- Paper(Oral): https://arxiv.org/abs/2108.02413
- Code: https://github.com/SikaStar/IDM
Person Search
Weakly Supervised Person Search with Region Siamese Networks
- Paper: https://arxiv.org/abs/2109.06109
- Code: None
2D/3D Human Pose Estimation
2D Human Pose Estimation
Human Pose Regression with Residual Log-likelihood Estimation
- Paper(Oral): https://arxiv.org/abs/2107.11291
- Code(RLE): https://github.com/Jeff-sjtu/res-loglikelihood-regression
Online Knowledge Distillation for Efficient Pose Estimation
- Paper: https://arxiv.org/abs/2108.02092
- Code: None
3D Human Pose Estimation
Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows
- Paper: https://arxiv.org/abs/2107.13788
- Code: https://github.com/twehrbein/Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows
Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images
- Paper: https://arxiv.org/abs/2109.05885
- Code: None
6D Object Pose Estimation
StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation
- Paper: https://arxiv.org/abs/2109.10115
- Code: None
- Dataset: None
3D Head Reconstruction
H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction
- Homepage: https://crisalixsa.github.io/h3d-net/
Face Recognition
SynFace: Face Recognition with Synthetic Data
- Paper: https://arxiv.org/abs/2108.07960
- Code: None
Facial Expression Recognition
TransFER: Learning Relation-aware Facial Expression Representations with Transformers
- Paper: https://arxiv.org/abs/2108.11116
- Code: None
Action Recognition
MGSampler: An Explainable Sampling Strategy for Video Action Recognition
- Paper: https://arxiv.org/abs/2104.09952
- Code: None
Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
- Paper: https://arxiv.org/abs/2108.02183
- Code: None
Dynamic Network Quantization for Efficient Video Inference
- Homepage: https://cs-people.bu.edu/sunxm/VideoIQ/project.html
- Paper: https://arxiv.org/abs/2108.10394
- Code: https://github.com/sunxm2357/VideoIQ
Temporal Action Localization
Enriching Local and Global Contexts for Temporal Action Localization
- Paper: https://arxiv.org/abs/2107.12960
- Code: None
Action Detection
Class Semantics-based Attention for Action Detection
- Paper: https://arxiv.org/abs/2109.02613
- Code: None
Group Activity Recognition
GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer
Sign Language Recognition
Visual Alignment Constraint for Continuous Sign Language Recognition
Text Detection
Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection
Text Recognition
Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
- Paper: https://arxiv.org/abs/2107.12090
- Code: None
Text Replacement
STRIVE: Scene Text Replacement In Videos
Visual Question Answering (VQA)
Greedy Gradient Ensemble for Robust Visual Question Answering
Adversarial Attack
Feature Importance-aware Transferable Adversarial Attacks
AdvDrop: Adversarial Attack to DNNs by Dropping Information
Depth Estimation
Augmenting Depth Estimation with Geospatial Context
- Paper: https://arxiv.org/abs/2109.09879
- Code: None
NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo
- Paper(Oral): https://arxiv.org/abs/2109.01129
- Code: https://github.com/weiyithu/NerfingMVS
Monocular Depth Estimation
MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments
- Paper: https://arxiv.org/abs/2107.12429
- Code: None
Towards Interpretable Deep Networks for Monocular Depth Estimation
Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark
Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation
StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation
Gaze Estimation
Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation
Crowd Counting
Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework
- Paper(Oral): https://arxiv.org/abs/2107.12746
- Code(P2PNet): https://github.com/TencentYoutuResearch/CrowdCounting-P2PNet
Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting
- Paper: https://arxiv.org/abs/2107.12619
- Code: https://github.com/TencentYoutuResearch/CrowdCounting-UEPNet
Lane Detection
VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection
- Dataset: https://github.com/yujun0-0/MMA-Net
Trajectory Prediction
Human Trajectory Prediction via Counterfactual Analysis
Personalized Trajectory Prediction via Distribution Discrimination
MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction
Social NCE: Contrastive Learning of Socially-aware Motion Representations
Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving
- Paper: https://arxiv.org/abs/2109.01510
- Code: https://github.com/xrenaa/Safety-Aware-Motion-Prediction
Where are you heading? Dynamic Trajectory Prediction with Expert Goal Examples
- Paper: https://openaccess.thecvf.com/content/ICCV2021/papers/Zhao_Where_Are_You_Heading_Dynamic_Trajectory_Prediction_With_Expert_Goal_ICCV_2021_paper.pdf
- Code: https://github.com/JoeHEZHAO/expert_traj
Anomaly Detection
Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning
Scene Graph Generation
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
- Paper: https://arxiv.org/abs/2107.12309
- Code: None
Image Editing
Sketch Your Own GAN
- Homepage: https://peterwang512.github.io/GANSketching/
- Paper: https://arxiv.org/abs/2108.02774
- Code: https://github.com/peterwang512/GANSketching
Image Synthesis
Image Synthesis via Semantic Composition
- Homepage: https://shepnerd.github.io/scg/
- Paper: https://arxiv.org/abs/2109.07053
- Code: https://github.com/dvlab-research/SCGAN
Image Retrieval
Self-supervised Product Quantization for Deep Unsupervised Image Retrieval
3D Reconstruction
Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction
Video Stabilization
Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization
Fine-Grained Recognition
Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach
- Paper: https://arxiv.org/abs/2108.02399
- Code: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
- Dataset: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
Style Transfer
AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer
- Paddle Code: https://github.com/PaddlePaddle/PaddleGAN
- PyTorch Code: https://github.com/Huage001/AdaAttN
Neural Painting
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction
Feature Matching
Learning to Match Features with Seeded Graph Matching Network
Semantic Correspondence
Multi-scale Matching Networks for Semantic Correspondence
Edge Detection
Pixel Difference Networks for Efficient Edge Detection
RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth
- Paper: https://arxiv.org/abs/2108.00616
- Code: https://github.com/MengyangPu/RINDNet
- Dataset: https://github.com/MengyangPu/RINDNet
Camera Calibration
CTRL-C: Camera calibration TRansformer with Line-Classification
Image Quality Assessment
MUSIQ: Multi-scale Image Quality Transformer
- Paper: https://arxiv.org/abs/2108.05997
- Code: https://github.com/google-research/google-research/tree/master/musiq
Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment
Metric Learning
Deep Relational Metric Learning
Towards Interpretable Deep Metric Learning with Structural Matching
Unsupervised Domain Adaptation
Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation
- Paper(Oral): https://arxiv.org/abs/2107.13467
- Code: None
Video Rescaling
Self-Conditioned Probabilistic Learning of Video Rescaling
- Code: None
Hand-Object Interaction
Learning a Contact Potential Field to Model the Hand-Object Interaction
Vision-and-Language Navigation
Airbert: In-domain Pretraining for Vision-and-Language Navigation
- Paper: https://arxiv.org/abs/2108.09105
- Code: https://airbert-vln.github.io/
- Dataset: https://airbert-vln.github.io/
Datasets
Beyond Road Extraction: A Dataset for Map Update using Aerial Images
- Homepage: https://favyen.com/muno21/
StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation
- Paper: https://arxiv.org/abs/2109.10115
- Code: None
- Dataset: None
RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth
- Paper: https://arxiv.org/abs/2108.00616
- Code: https://github.com/MengyangPu/RINDNet
- Dataset: https://github.com/MengyangPu/RINDNet
Panoptic Narrative Grounding
- Homepage: https://bcv-uniandes.github.io/panoptic-narrative-grounding/
- Paper(Oral): https://arxiv.org/abs/2109.04988
- Code: https://github.com/BCV-Uniandes/PNG
- Dataset: https://github.com/BCV-Uniandes/PNG
STRIVE: Scene Text Replacement In Videos
Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme
- Paper: https://www4.comp.polyu.edu.hk/~cslzhang/paper/ICCV21_RealVSR.pdf
- Code: https://github.com/IanYeung/RealVSR
- Dataset: https://github.com/IanYeung/RealVSR
Matching in the Dark: A Dataset for Matching Image Pairs of Low-light Scenes
- Code: None
Dual-Camera Super-Resolution with Aligned Attention Modules
- Homepage: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
- Paper: https://arxiv.org/abs/2109.01349
- Code: https://github.com/Tengfei-Wang/DualCameraSR
- Dataset: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
DepthTrack: Unveiling the Power of RGBD Tracking
- Paper: https://arxiv.org/abs/2108.13962
- Code: https://github.com/xiaozai/DeT
- Dataset: https://github.com/xiaozai/DeT
Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction
BioFors: A Large Biomedical Image Forensics Dataset
- Paper: https://arxiv.org/abs/2108.12961
- Code: None
- Dataset: None
Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach
- Paper: https://arxiv.org/abs/2108.02399
- Code: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
- Dataset: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
Airbert: In-domain Pretraining for Vision-and-Language Navigation
- Paper: https://arxiv.org/abs/2108.09105
- Code: https://airbert-vln.github.io/
- Dataset: https://airbert-vln.github.io/
Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation
- Paper: http://arxiv.org/abs/2108.08202
- Code: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
- Dataset: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection
- Dataset: https://github.com/yujun0-0/MMA-Net
XVFI: eXtreme Video Frame Interpolation
- Paper(Oral): https://arxiv.org/abs/2103.16206
- Code: https://github.com/JihyongOh/XVFI
- Dataset: https://github.com/JihyongOh/XVFI
Personalized Image Semantic Segmentation
- Paper: https://arxiv.org/abs/2107.13978
- Code: https://github.com/zhangyuygss/PIS
- Dataset: https://github.com/zhangyuygss/PIS
H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction
- Homepage: https://crisalixsa.github.io/h3d-net/
Others
Photon-Starved Scene Inference using Single Photon Cameras
Towards Flexible Blind JPEG Artifacts Removal
Generating Attribution Maps with Disentangled Masked Backpropagation
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations
- Paper: https://arxiv.org/abs/2109.14910
- Code: None
ReconfigISP: Reconfigurable Camera Image Processing Pipeline
- Paper: https://arxiv.org/abs/2109.04760
- Code: None
Panoptic Narrative Grounding
- Homepage: https://bcv-uniandes.github.io/panoptic-narrative-grounding/
- Paper(Oral): https://arxiv.org/abs/2109.04988
- Code: https://github.com/BCV-Uniandes/PNG
- Dataset: https://github.com/BCV-Uniandes/PNG
NEAT: Neural Attention Fields for End-to-End Autonomous Driving
Keep CALM and Improve Visual Feature Attribution
YouRefIt: Embodied Reference Understanding with Language and Gesture
- Paper: https://arxiv.org/abs/2109.03413
- Code: None
Pri3D: Can 3D Priors Help 2D Representation Learning?
Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain
Continual Learning for Image-Based Camera Localization
- Paper: https://arxiv.org/abs/2108.09112
- Code: None
Multi-Task Self-Training for Learning General Representations
- Paper: https://arxiv.org/abs/2108.11353
- Code: None
A Unified Objective for Novel Class Discovery
- Homepage: https://ncd-uno.github.io/
- Paper(Oral): https://arxiv.org/abs/2108.08536
- Code: https://github.com/DonkeyShot21/UNO
Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs
Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation
- Paper: http://arxiv.org/abs/2108.08202
- Code: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
- Dataset: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
Impact of Aliasing on Generalization in Deep Convolutional Networks
- Paper: https://arxiv.org/abs/2108.03489
- Code: None
Out-of-Core Surface Reconstruction via Global TGV Minimization
- Paper: https://arxiv.org/abs/2107.14790
- Code: None
Progressive Correspondence Pruning by Consensus Learning
- Homepage: https://sailor-z.github.io/projects/CLNet.html
- Paper: https://arxiv.org/abs/2101.00591
- Code: https://github.com/sailor-z/CLNet
Energy-Based Open-World Uncertainty Modeling for Confidence Calibration
- Paper: https://arxiv.org/abs/2107.12628
- Code: None
Generalized Shuffled Linear Regression
- Paper: https://drive.google.com/file/d/1Qu21VK5qhCW8WVjiRnnBjehrYVmQrDNh/view?usp=sharing
- Code: https://github.com/SILI1994/Generalized-Shuffled-Linear-Regression
Discovering 3D Parts from Image Collections
- Homepage: https://chhankyao.github.io/lpd/
Semi-Supervised Active Learning with Temporal Output Discrepancy
Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?
- Paper: https://arxiv.org/abs/2105.02498
- Code: https://github.com/KingJamesSong/DifferentiableSVD
Hand-Object Contact Consistency Reasoning for Human Grasps Generation
- Homepage: https://hwjiang1510.github.io/GraspTTA/
- Paper(Oral): https://arxiv.org/abs/2104.03304
- Code: None
Equivariant Imaging: Learning Beyond the Range Space
- Paper(Oral): https://arxiv.org/abs/2103.14756
- Code: https://github.com/edongdongchen/EI
Just Ask: Learning to Answer Questions from Millions of Narrated Videos
- Paper(Oral): https://arxiv.org/abs/2012.00451
- Code: https://github.com/antoyang/just-ask