"3D Human Texture Estimation from a Single Image with Transformers", ICCV 2021

Overview

Texformer: 3D Human Texture Estimation from a Single Image with Transformers

This is the official implementation of "3D Human Texture Estimation from a Single Image with Transformers", ICCV 2021 (Oral)

Highlights

  • Texformer: a novel structure combining Transformer and CNN
  • Low-Rank Attention layer (LoRA) with linear complexity (see the sketch after this list)
  • Combination of RGB UV map and texture flow
  • Part-style loss
  • Face-structure loss
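
As a rough illustration of why attention can be made linear in the number of tokens, the sketch below implements a generic linear-attention computation in PyTorch (an illustrative stand-in for low-rank attention in general, not necessarily the exact formulation of the paper's LoRA layer):

import torch

def linear_attention(Q, K, V):
    # Generic linear-complexity attention (efficient-attention style):
    # aggregate K and V into a small (d x e) context matrix first, so the
    # cost is O(n * d * e) instead of the O(n^2) of a full attention map.
    K = K.softmax(dim=1)                           # normalize over the n tokens
    context = torch.einsum('bnd,bne->bde', K, V)   # (b, d, e), independent of n
    return torch.einsum('bnd,bde->bne', Q.softmax(dim=-1), context)

Q = torch.randn(2, 4096, 64)   # (batch, tokens, dim), e.g. a flattened 64x64 UV query map
K = torch.randn(2, 4096, 64)
V = torch.randn(2, 4096, 64)
out = linear_attention(Q, K, V)  # shape (2, 4096, 64)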

BibTeX

@inproceedings{xu2021texformer,
  title={{3D} Human Texture Estimation from a Single Image with Transformers},
  author={Xu, Xiangyu and Loy, Chen Change},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2021}
}

Abstract

We propose a Transformer-based framework for 3D human texture estimation from a single image. The proposed Transformer is able to effectively exploit the global information of the input image, overcoming the limitations of existing methods that are solely based on convolutional neural networks. In addition, we propose a mask-fusion strategy to combine the advantages of the RGB-based and texture-flow-based models. We further introduce a part-style loss to help reconstruct high-fidelity colors without introducing unpleasant artifacts. Extensive experiments demonstrate the effectiveness of the proposed method against state-of-the-art 3D human texture estimation approaches both quantitatively and qualitatively.

Overview

Overview of Texformer

The Query is a pre-computed color encoding of the UV space, obtained by mapping the 3D coordinates of a standard human body mesh to the UV space. The Key is the concatenation of the input image and its 2D part-segmentation map. The Value is the concatenation of the input image and its 2D coordinates. We first feed the Query, Key, and Value into three CNNs that transform them into feature space. The multi-scale features are then sent to the Transformer units, which generate the Output features. The multi-scale Output features are processed and fused by another CNN, which produces the RGB UV map T, the texture flow F, and the fusion mask M. The final UV map is generated by combining T with the textures sampled via F, weighted by the fusion mask M. Note that there are skip connections between the same-resolution layers of the CNNs, similar to [1]; these are omitted from the figure for brevity.
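
In code, this final fusion step can be sketched as follows (a minimal PyTorch sketch; the tensor layouts and the flow-sampling convention in the released code may differ):

import torch
import torch.nn.functional as F_nn

def fuse_uv_map(T, flow, mask, img):
    # Combine the RGB UV map T with textures sampled from the input image
    # via the texture flow, weighted by the fusion mask M (sketch only).
    #   T:    (b, 3, H, W) RGB UV map predicted in UV space
    #   flow: (b, H, W, 2) sampling coordinates in [-1, 1] over the image
    #   mask: (b, 1, H, W) fusion mask in [0, 1]
    #   img:  (b, 3, h, w) input image
    sampled = F_nn.grid_sample(img, flow, align_corners=False)  # texture-flow sampling
    return mask * T + (1 - mask) * sampled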

Visual Results

For each example, the image on the left is the input, and the image on the right is the rendered 3D human, where the texture is predicted by the proposed Texformer and the geometry is predicted by RSC-Net [2].


Install

  • Manage the environment with Anaconda:

conda create -n texformer anaconda
conda activate texformer

  • Install PyTorch 1.4 with CUDA 9.2:

conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=9.2 -c pytorch

  • Install the PyTorch version of neural_renderer following its official installation instructions.
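
After installation, a quick sanity check can confirm the environment (a minimal sketch; it assumes the renderer installs as the neural_renderer package):

python -c "import torch, torchvision, neural_renderer; print(torch.__version__, torch.cuda.is_available())"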

Download

  • Download the meta data and put it in "./meta/".

  • Download the pretrained model and put it in "./pretrained/".

  • We propose an enhanced Market-1501 dataset, termed SMPLMarket, by equipping the original Market-1501 data with SMPL estimates from RSC-Net [2] and body-part segmentations from EANet. Please download the SMPLMarket dataset and put it in "./datasets/".

  • Other datasets: PRW, SURREAL, CUHK-SYSU. Please put these datasets in "./datasets/" as well.

  • All the paths are set in "config.py".
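
For reference, the path section of "config.py" might look like the sketch below (hypothetical variable names for illustration; check the shipped file for the real ones):

# hypothetical path configuration (the real names in config.py may differ)
meta_dir = './meta/'
pretrained_dir = './pretrained/'
dataset_root = './datasets/'
smplmarket_dir = dataset_root + 'SMPLMarket/'
prw_dir = dataset_root + 'PRW/'
surreal_dir = dataset_root + 'surreal/'
cuhk_sysu_dir = dataset_root + 'CUHK-SYSU/'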

Demo

Run Texformer with a human part segmentation from an off-the-shelf model:

python demo.py --img_path demo_imgs/img.png --seg_path demo_imgs/seg.png

If you don't want to run an external model for human part segmentation, you can instead use the part segmentation produced by RSC-Net (note that this may affect the results, as the RSC-Net segmentation is not very accurate due to the limitations of SMPL [3]):

python demo.py --img_path demo_imgs/img.png
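
Programmatically, the demo roughly follows the steps sketched below (based on the method names visible in demo.py, such as run_demo, get_smpl, and get_texture; the class name, constructor, and exact signatures here are assumptions, so treat demo.py itself as the reference):

import numpy as np
import torch
from PIL import Image

# load the input and force 3 channels (RGBA inputs break the normalization)
img = Image.open('demo_imgs/img.png').convert('RGB')
img_tensor = torch.from_numpy(np.array(img)).permute(2, 0, 1)  # HWC -> CHW
img_tensor = img_tensor.float().unsqueeze(0) / 255.0           # add batch dim

# demo = Demo(args)                             # hypothetical construction
# demo.get_smpl(img_tensor)                     # 1) SMPL geometry via RSC-Net
# uvmap = demo.get_texture(img_tensor, parts)   # 2) UV texture from Texformer
# 3) render the textured mesh with neural_renderer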

Train

Run the training code with default settings:

python trainer.py --exp_name texformer

Evaluation

Run the evaluation on the SMPLMarket dataset:

python eval.py --checkpoint_path ./pretrained/texformer_ep500.pt

References

[1] "3D Human Pose, Shape and Texture from Low-Resolution Images and Videos", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.

[2] "3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning", ECCV, 2020

[3] "SMPL: A Skinned Multi-Person Linear Model", SIGGRAPH Asia, 2015

[4] "Learning Spatial and Spatio-Temporal Pixel Aggregations for Image and Video Denoising", IEEE Transactions on Image Processing, 2020.

[5] "Learning Factorized Weight Matrix for Joint Filtering", ICML, 2020

Comments
  • How to extract only the UV map in Figure 3(e)?

    Thanks for sharing this great repo!

    I wonder how to get a UV map like the one in Fig. 3(e).

    In demo.py: uvmap = self.get_texture(img_tensor, parts)

    When I extract the UV map with this code, its colors look strange. Could you help me solve this? Really appreciated, and have a great day!

    opened by taewhankim 2
  • How to obtain rendered images like Figure 6?

    Hello,

    I was trying to render images from the front and the back while all joints are straight, just like Fig. 6.

    I thought I could generate them by playing with the parameters of the render function, like cam_t, but I could not find which values to set for these parameters.

    Can you help me with that?

    opened by saidaltindis 2
  • I get an error

    Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /root/.cache/torch/checkpoints/resnet50-19c8e357.pth
    100% 97.8M/97.8M [00:00<00:00, 125MB/s]
    Traceback (most recent call last):
      File "demo.py", line 162, in <module>
        demo.run_demo(args)
      File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
        return func(*args, **kwargs)
      File "demo.py", line 122, in run_demo
        self.get_smpl(img_tensor)
      File "demo.py", line 44, in get_smpl
        pred_vertices, pred_cam_t, pred_rotmat, pred_betas = self.rsc_runner.get_3D_batch(img_tensor, scale, pre_process=True)
      File "/content/f7b7c7758a46da49f84bc68b47997d69/Texformer/RSC_net/ra_test.py", line 34, in get_3D_batch
        input_batch = self.pre_process_batch(input_batch)
      File "/content/f7b7c7758a46da49f84bc68b47997d69/Texformer/RSC_net/ra_test.py", line 30, in pre_process_batch
        return (batch - torch.tensor(self.IMG_NORM_MEAN).view(1, -1, 1, 1)) / torch.tensor(self.IMG_NORM_STD).view(1, -1, 1, 1)
    RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1

    opened by maylad31 5