Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-Pixel Part Segmentation [3DV 2021 Oral]

Overview


Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-Pixel Part Segmentation,
Zicong Fan, Adrian Spurr, Muhammed Kocabas, Siyu Tang, Michael J. Black, Otmar Hilliges. International Conference on 3D Vision (3DV), 2021.


Features

DIGIT estimates the 3D poses of two interacting hands from a single RGB image. This repo provides the training, evaluation, and demo code for the project in PyTorch Lightning.

Updates

  • November 25, 2021: Initial repo with training and evaluation on PyTorch Lightning 0.9.

Setting up environment

DIGIT has been implemented and tested on Ubuntu 18.04 with Python >= 3.7, PyTorch Lightning 0.9, and PyTorch 1.6.

Clone the repo:

git clone https://github.com/zc-alexfan/digit-interacting

Create the folders needed:

make folders

Install conda environment:

conda create -n digit python=3.7
conda deactivate
conda activate digit
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
pip install -r requirements.txt
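
If you want to verify the environment before moving on, a quick version check (a minimal sketch; the expected version strings are the ones this repo was tested with):

import torch
import torchvision
import pytorch_lightning as pl

# Expect 1.6.0 / 0.7.0 / 0.9.x, matching the tested configuration above.
print(torch.__version__, torchvision.__version__, pl.__version__)
print(torch.cuda.is_available())  # should be True with cudatoolkit 10.1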

Downloading InterHand2.6M

  • Download the 5fps.v1 version of InterHand2.6M, following the instructions here.
  • Place annotations, images, and rootnet_output from InterHand2.6M under ./data/InterHand as follows:
./data/InterHand
|-- annotations
|-- images
|   |-- test
|   |-- train
|   `-- val
`-- rootnet_output
    |-- rootnet_interhand2.6m_output_test.json
    `-- rootnet_interhand2.6m_output_val.json
  • The folder ./data/InterHand/annotations should look like this:
./data/InterHand/annotations
|-- skeleton.txt
|-- subject.txt
|-- test
|   |-- InterHand2.6M_test_MANO_NeuralAnnot.json
|   |-- InterHand2.6M_test_camera.json
|   |-- InterHand2.6M_test_data.json
|   `-- InterHand2.6M_test_joint_3d.json
|-- train
|   |-- InterHand2.6M_train_MANO_NeuralAnnot.json
|   |-- InterHand2.6M_train_camera.json
|   |-- InterHand2.6M_train_data.json
|   `-- InterHand2.6M_train_joint_3d.json
`-- val
    |-- InterHand2.6M_val_MANO_NeuralAnnot.json
    |-- InterHand2.6M_val_camera.json
    |-- InterHand2.6M_val_data.json
    `-- InterHand2.6M_val_joint_3d.json

Preparing data and backbone for training

Download the ImageNet-pretrained backbone from here and place it under:

./saved_models/pytorch/imagenet/hrnet_w32-36af842e.pt

Package images into lmdb:

cd scripts
python package_images_lmdb.py
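
For reference, packing images into LMDB generally looks like the sketch below (illustrative, not the repo's exact script; the glob pattern and the fixed per-image byte budget are assumptions):

import glob
import lmdb

fnames = sorted(glob.glob("../data/InterHand/images/**/*.jpg", recursive=True))
# map_size must upper-bound the total database size, so reserve a
# generous fixed byte budget per image.
env = lmdb.open("../data/InterHand/images.lmdb", map_size=len(fnames) * 5130240)
with env.begin(write=True) as txn:
    for fname in fnames:
        with open(fname, "rb") as f:
            txn.put(fname.encode("utf-8"), f.read())  # key: file path, value: raw bytes
env.close()

At training time, reading then amounts to a txn.get(key) lookup inside the dataset's __getitem__, which avoids the filesystem overhead of millions of small files.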

Preprocess annotations:

python preprocess_annot.py

Render part segmentation masks:

  • Follow the README.md of render_mano_ih to prepare an LMDB of part segmentation masks. For questions about preparing the segmentation masks, please open issues in that repo.

Place the image LMDB, the segmentation-mask LMDB, and the meta_dict_*.pkl files under ./data/InterHand so that the folder matches the structure below. The cache files meta_dict_*.pkl are by-products of the step above.

|-- annotations
|   |-- skeleton.txt
|   |-- subject.txt
|   |-- test
|   |   |-- InterHand2.6M_test_MANO_NeuralAnnot.json
|   |   |-- InterHand2.6M_test_camera.json
|   |   |-- InterHand2.6M_test_data.json
|   |   |-- InterHand2.6M_test_data.pkl
|   |   `-- InterHand2.6M_test_joint_3d.json
|   |-- train
|   |   |-- InterHand2.6M_train_MANO_NeuralAnnot.json
|   |   |-- InterHand2.6M_train_camera.json
|   |   |-- InterHand2.6M_train_data.json
|   |   |-- InterHand2.6M_train_data.pkl
|   |   `-- InterHand2.6M_train_joint_3d.json
|   `-- val
|       |-- InterHand2.6M_val_MANO_NeuralAnnot.json
|       |-- InterHand2.6M_val_camera.json
|       |-- InterHand2.6M_val_data.json
|       |-- InterHand2.6M_val_data.pkl
|       `-- InterHand2.6M_val_joint_3d.json
|-- cache
|   |-- meta_dict_test.pkl
|   |-- meta_dict_train.pkl
|   `-- meta_dict_val.pkl
|-- images
|   |-- test
|   |-- train
|   `-- val
|-- rootnet_output
|   |-- rootnet_interhand2.6m_output_test.json
|   `-- rootnet_interhand2.6m_output_val.json
`-- segm_32.lmdb
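
A quick way to sanity-check this layout before training (a minimal sketch; the paths mirror the tree above):

from pathlib import Path

root = Path("./data/InterHand")
expected = [
    "annotations/skeleton.txt",
    "annotations/train/InterHand2.6M_train_data.pkl",
    "cache/meta_dict_train.pkl",
    "images/train",
    "rootnet_output/rootnet_interhand2.6m_output_val.json",
    "segm_32.lmdb",
]
for rel in expected:
    assert (root / rel).exists(), f"missing: {root / rel}"
print("data layout OK")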

Training and evaluating

To train DIGIT, run the command below. The script trains with an effective batch size of 64 via gradient accumulation, where each forward/backward pass processes a mini-batch of 32:

python train.py --iter_batch 32 --batch_size 64 --gpu_ids 0 --trainsplit train --precision 16 --eval_every_epoch 2 --lr_dec_epoch 40 --max_epoch 50 --min_epoch 50

Or, if you just want to do a sanity check, you can run:

python train.py --iter_batch 32 --batch_size 64 --gpu_ids 0 --trainsplit minitrain --valsplit minival --precision 16 --eval_every_epoch 1 --max_epoch 50 --min_epoch 50
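
For intuition, here is what the accumulated-gradient scheme above boils down to (a self-contained sketch with a dummy model and random data standing in for DIGIT; the actual repo delegates this to PyTorch Lightning):

import torch
from torch import nn

# Dummy stand-ins for the real model and dataloader.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loader = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(4)]

accum_steps = 64 // 32  # two mini-batches of 32 per effective batch of 64
optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    loss = nn.functional.mse_loss(model(x), y) / accum_steps  # average over the window
    loss.backward()  # gradients accumulate in .grad across iterations
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

Dividing the loss by the number of accumulation steps keeps the gradient magnitude equal to that of a true batch of 64.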

Each time you run train.py, it creates a new experiment under logs, and each experiment is assigned a key.

Suppose your experiment key is 2e8c5136b. You can then evaluate the last epoch of the model on the test set with:

python test.py --eval_on minitest --load_ckpt logs/2e8c5136b/model_dump/last.ckpt

OR

python test.py --eval_on test --load_ckpt logs/2e8c5136b/model_dump/last.ckpt

The former evaluates on only 1000 images as a sanity check.

Similarly, you can evaluate on the validation set:

python test.py --eval_on val --load_ckpt logs/2e8c5136b/model_dump/last.ckpt

Visualizing and evaluating pre-trained DIGIT

Here we provide instructions to show qualitative results of DIGIT.

Download pre-trained DIGIT:

wget https://dataset.ait.ethz.ch/downloads/dE6qPPePCV/db7cba8c1.pt
mv db7cba8c1.pt saved_models

Visualize results:

CUDA_VISIBLE_DEVICES=0 python demo.py --eval_on minival --load_from saved_models/db7cba8c1.pt  --num_workers 0

Evaluate the pre-trained DIGIT model:

CUDA_VISIBLE_DEVICES=0 python test.py --eval_on test --load_from saved_models/db7cba8c1.pt --precision 16
CUDA_VISIBLE_DEVICES=0 python test.py --eval_on val --load_from saved_models/db7cba8c1.pt --precision 16

You should obtain the same results as reported here.

The visualizations will be dumped to ./visualization.

Citation

@inProceedings{fan2021digit,
  title={Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-pixel Part Segmentation},
  author={Fan, Zicong and Spurr, Adrian and Kocabas, Muhammed and Tang, Siyu and Black, Michael and Hilliges, Otmar},
  booktitle={International Conference on 3D Vision (3DV)},
  year={2021}
}

License

Since our code is developed based on InterHand2.6M, which is licensed under CC-BY-NC 4.0, the same license applies to DIGIT.

DIGIT is CC-BY-NC 4.0 licensed, as found in the LICENSE file.

References

Some code in our repo uses snippets from the following repos. Please consider citing them if you find our code useful:

@inproceedings{Moon_2020_ECCV_InterHand2.6M,
  author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
  title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2020}
}

@inproceedings{sun2019deep,
  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
  author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
  booktitle={CVPR},
  year={2019}
}

@inproceedings{xiao2018simple,
  author = {Xiao, Bin and Wu, Haiping and Wei, Yichen},
  title = {Simple Baselines for Human Pose Estimation and Tracking},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2018}
}

@misc{Charles2013,
  author = {milesial},
  title = {Pytorch-UNet},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/milesial/Pytorch-UNet}}
}

Contact

For any questions, you can contact [email protected].

Comments
  • Wrong segmentation

    Hi, in outputs/segms/train/Capture9/0129_indextip/cam410062/image32412.png, the segmentation mask has a value of 33, which is not expected. Do you have any ideas?
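
    A quick way to list which label values a mask actually contains (a minimal sketch, assuming the masks are single-channel PNGs storing one integer part id per pixel):

    import numpy as np
    from PIL import Image

    # Load the mask and report the distinct part ids it contains.
    mask = np.array(Image.open(
        "outputs/segms/train/Capture9/0129_indextip/cam410062/image32412.png"))
    print(np.unique(mask))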

    opened by ZhengdiYu 5
  • Demo on a single image

    Hi,

    I read the paper and it looks interesting. I was wondering how you would run this model on a single image. The provided demo code seems to only run on the validation set.

    So if I have an image for which I have the bounding box/segmentation data of the hands, then how should I preprocess this for the model?

    Given that this code is based on InterHand2.6M I tried to do something similar to what they have in their demo code:

    import cv2
    import numpy as np
    import torch
    import pytorch_lightning as pl
    from torchvision import transforms

    # fetch_pl_model is from this repo; process_bbox, generate_patch_image
    # and cfg are taken from InterHand2.6M, as described below.

    def demo_model(args):
        pl.seed_everything(args.seed)
        model = fetch_pl_model(args, args.experiment)

        model.cuda()
        model.freeze()
        model.eval()

        # prepare input image
        transform = transforms.ToTensor()
        img_path = '1.jpg'
        original_img = cv2.imread(img_path)
        original_img_height, original_img_width = original_img.shape[:2]

        # prepare bbox
        bbox = [723, 354, 127, 74]  # xmin, ymin, width, height
        bbox = process_bbox(bbox, (original_img_height, original_img_width, original_img_height))
        img, trans, inv_trans = generate_patch_image(original_img, bbox, False, 1.0, 0.0, cfg.input_img_shape)
        img = transform(img.astype(np.float32)) / 255
        img = img.cuda()[None, :, :, :]

        # forward
        inputs = {'img': img}
        targets = {}
        meta_info = {}

        with torch.no_grad():
            vis_dict = model(inputs, targets, meta_info, 'vis')
            print("vis_dict", vis_dict)

    Here process_bbox and generate_patch_image are from InterHand2.6M; however, it didn't work out for me, and the model always returned None. My guess is that the bbox needs to be processed differently. Looking at the code, I think it has something to do with the "segm_256" key in the "targets" dict, but I'm not sure what the easiest way would be.

    I would appreciate some help or directions on what would be the simplest way to write a script that runs the model on a single image.

    Best regards, Kevin

    opened by kevinhartyanyi 5
  • Unexpected bus error encountered

    Hi, I ran into a problem while following your README to run the code. I wonder whether it's because I ran it on CUDA 11.3, or whether I ran the render_mano_ih step incorrectly and there is a problem with the resulting lmdb file.

    When I train or test DIGIT, I hit the following problem in the dataloader:

    Number of annotations in single hand sequences: 58463
    Number of annotations in interacting hand sequences: 36568
    Total number of annotations: 95031

    Defrost ALL
    GPU available: True, used: True
    TPU available: False, using: 0 TPU cores
    CUDA_VISIBLE_DEVICES: [0]
    Using native 16bit precision.

    | Name  | Type  | Params
    0 | model | Model | 40 M

    Validation sanity check: 0it [00:00, ?it/s]
    ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm). [repeated once per worker]
    Traceback (most recent call last):
      File "/home/zhenghanmo/anaconda3/envs/digit/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 872, in _try_get_data
        data = self._data_queue.get(timeout=timeout)
      File "/home/zhenghanmo/anaconda3/envs/digit/lib/python3.7/queue.py", line 179, in get
        self.not_empty.wait(remaining)
      File "/home/zhenghanmo/anaconda3/envs/digit/lib/python3.7/threading.py", line 300, in wait
        gotit = waiter.acquire(True, timeout)
      File "/home/zhenghanmo/anaconda3/envs/digit/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
        _error_if_any_worker_fails()
    RuntimeError: DataLoader worker (pid 4844) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "train.py", line 56, in <module>
        run_exp(cfg)
      File "train.py", line 51, in run_exp
        trainer.fit(model, train_loader, [val_loader])
      File "/home/zhenghanmo/anaconda3/envs/digit/lib/python3.7/site-packages/pytorch_lightning/trainer/states.py", line 48, in wrapped_fn
        result = fn(self, *args, **kwargs)
      File "/home/zhenghanmo/anaconda3/envs/digit/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1073, in fit
        results = self.accelerator_backend.train(model)
      File "/home/zhenghanmo/anaconda3/envs/digit/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_backend.py", line 51, in train
        results = self.trainer.run_pretrain_routine(model)
      File "/home/zhenghanmo/anaconda3/envs/digit/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1224, in run_pretrain_routine
        self._run_sanity_check(ref_model, model)
      File "/home/zhenghanmo/anaconda3/envs/digit/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1257, in _run_sanity_check
        eval_results = self._evaluate(model, self.val_dataloaders, max_batches, False)
      File "/home/zhenghanmo/anaconda3/envs/digit/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 305, in _evaluate
        for batch_idx, batch in enumerate(dataloader):
      File "/home/zhenghanmo/anaconda3/envs/digit/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
        data = self._next_data()
      File "/home/zhenghanmo/anaconda3/envs/digit/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1068, in _next_data
        idx, data = self._get_data()
      File "/home/zhenghanmo/anaconda3/envs/digit/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1024, in _get_data
        success, data = self._try_get_data()
      File "/home/zhenghanmo/anaconda3/envs/digit/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 885, in _try_get_data
        raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
    RuntimeError: DataLoader worker (pid(s) 4844) exited unexpectedly.

    However, when I change num_workers to 0, I run into another problem:

    Using native 16bit precision.

    | Name  | Type  | Params
    0 | model | Model | 40 M

    /home/zhenghanmo/anaconda3/envs/digit/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py:37: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 36 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
      warnings.warn(*args, **kwargs)
    Validation sanity check: 0it [00:00, ?it/s]
    Bus error (core dumped)
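
    For reference, a quick way to check how much shared memory is available to the workers (a minimal sketch; on Linux, DataLoader workers with num_workers > 0 exchange tensors through the /dev/shm mount):

    import shutil

    # Report free space on the shared-memory mount used by DataLoader workers.
    total, used, free = shutil.disk_usage("/dev/shm")
    print(f"/dev/shm free: {free / 2**30:.1f} GiB")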

    opened by zhm1211 4
  • segmentation probability distribution

    Hi,

    Q1. I was trying to understand this part and Fig. 6c of your paper. What is the segmentation probability distribution? Do you mean the (B, class_num, H, W) tensor before the argmax that produces (B, 1, H, W)?

    You mentioned you trained two different pipelines using Fig. 6c, but I couldn't find the correspondence in Fig. 6c. Do you mean that you replace Segm. (S) with (B, class_num, H, W) and (B, 1, H, W) separately?

    Q2. What is the class order of the part-segmentation labels? Is it the same as the original MANO joint order without tips, or InterHand2.6M's order? (i.e., is the wrist 1 or 16?)

    Q3. https://github.com/zc-alexfan/digit-interacting/blob/36e5110ac91b0e6ac2151d43d5b1cc5712037080/src/dataset/dataset_utils.py#L133-L149 What is img_segm_swapped[1], img_segm_swapped[2] = img_segm_swapped[2].clone(), img_segm_swapped[1].clone() used for? I think this operation is conducted on the channel dimension, but I'm not sure what it is for; the 3 channels should be the same.

    Q4. Why don't you ignore the background label when training the segmentation?

    opened by ZhengdiYu 3
  • the meaning of 'package_images_lmdb.py'

    Thank you for your excellent research.

    I wonder about the purpose of the script package_images_lmdb.py in scripts.

    1. At line 12, why is fnames redefined as fnames[:1000]?
    2. At line 14, why is len(fnames) multiplied by 5130240?

    I'll be waiting for your reply. Thank you.

    opened by redorangeyellowy 1
  • Evaluation data split?

    Hi,

    Thanks for your great work. I just want to confirm which data split you use in Table 1 and Table 2, respectively, because they report different results: your MPJPE is 16.55 in Table 2 but 14.27 in Table 1.

    opened by ZhengdiYu 1
Owner
Zicong Fan
A Ph.D. student at ETH Zurich.