[ICCV-2021] An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation

rongchangxie

Last update: Jan 4, 2023


Introduction

This is the official PyTorch implementation of An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation. [ICCV 2021] PDF

Abstract

Most semi-supervised learning models are consistency-based, which leverage unlabeled images by maximizing the similarity between different augmentations of an image. But when we apply them to human pose estimation that has extremely imbalanced class distribution, they often collapse and predict every pixel in unlabeled images as background. We find this is because the decision boundary passes the high-density areas of the minor class so more and more pixels are gradually mis-classified as background.

In this work, we present a surprisingly simple approach to drive the model. For each image, it composes a pair of easy-hard augmentations and uses the more accurate predictions on the easy image to teach the network to learn pose information of the hard one. The accuracy superiority of teaching signals allows the network to be “monotonically” improved which effectively avoids collapsing. We apply our method to the state-of-the-art pose estimators and it further improves their performance on three public datasets.
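The easy-hard idea can be illustrated with a toy example. The sketch below is not the repository's code: it uses plain Python lists instead of tensors, an integer shift as a stand-in for the affine warp, and a mean squared error as the consistency loss (all of these are assumptions made for illustration). It shows that when the teacher's prediction on the easy view is accurate, a collapsed all-background student is penalized while a student that localizes the joint in the hard view is not.

```python
import math

def gaussian_heatmap(width, height, cx, cy, sigma=1.0):
    """Return a height x width heatmap with a Gaussian peak at (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def shift(heatmap, dx, dy):
    """Translate a heatmap by integer offsets, filling uncovered pixels with zeros.

    Hypothetical stand-in for the warp that turns the easy view into the hard one.
    """
    height, width = len(heatmap), len(heatmap[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_x, src_y = x - dx, y - dy
            if 0 <= src_x < width and 0 <= src_y < height:
                out[y][x] = heatmap[src_y][src_x]
    return out

def mse(a, b):
    """Mean squared error between two heatmaps of equal size."""
    total = sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))

# Teacher prediction on the easy view: joint at (3, 4).
teacher = gaussian_heatmap(8, 8, cx=3, cy=4)
# The hard view is the easy view shifted by (2, 1), so the target is warped identically.
target = shift(teacher, dx=2, dy=1)

good_student = gaussian_heatmap(8, 8, cx=5, cy=5)  # localizes the joint in the hard view
collapsed_student = [[0.0] * 8 for _ in range(8)]   # predicts background everywhere

loss_good = mse(good_student, target)
loss_collapsed = mse(collapsed_student, target)
print(loss_good < loss_collapsed)
```

In a real training loop the teacher output would be computed without gradients, so only the prediction on the hard view is updated.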

Main Results

1. Semi-Supervised Setting

Results on COCO Val2017

| Method | Augmentation | 1K Labels | 5K Labels | 10K Labels |
| --- | --- | --- | --- | --- |
| Supervised | Affine | 31.5 | 46.4 | 51.1 |
| PoseCons (Single) | Affine | 38.5 | 50.5 | 55.4 |
| PoseCons (Single) | Affine + Joint Cutout | 42.1 | 52.3 | 57.3 |
| PoseDual (Dual) | Affine | 41.5 | 54.8 | 58.7 |
| PoseDual (Dual) | Affine + RandAug | 43.7 | 55.4 | 59.3 |
| PoseDual (Dual) | Affine + Joint Cutout | 44.6 | 55.6 | 59.6 |

We use the COCO Subset (1K, 5K and 10K) and TRAIN as the labeled and unlabeled datasets, respectively.

Note:

Ground-truth person boxes are used.
No flip testing is used.

2. Full labels Setting

Results on COCO Val2017

| Method | Network | AP | AP.5 | AR |
| --- | --- | --- | --- | --- |
| Supervised | ResNet50 | 70.9 | 91.4 | 74.2 |
| PoseDual | ResNet50 | 73.9 (↑3.0) | 92.5 | 77.0 |
| Supervised | HRNetW48 | 77.2 | 93.5 | 79.9 |
| PoseDual | HRNetW48 | 79.2 (↑2.0) | 94.6 | 81.7 |

We use COCO TRAIN and WILD as the labeled and unlabeled datasets, respectively.

Pretrained Models

Download links: Google Drive

Environment

The code was developed using Python 3.7 on Ubuntu 16.04. NVIDIA GPUs are required.

Quick start

Installation

Install PyTorch >= v1.2.0 following the official instructions.
Clone this repo; we refer to the cloned directory as ${POSE_ROOT}.
Install dependencies:
```
pip install -r requirements.txt
```
Make libs:
```
cd ${POSE_ROOT}/lib
make
```
Create the output (training model output) and log directories:
```
mkdir output
mkdir log
```
Download the PyTorch ImageNet pretrained models from Google Drive. PoseDual (ResNet18) should load resnet18_5c_gluon_posedual as its pretrained weights for training.

Download our pretrained models from Google Drive and arrange them as follows:

${POSE_ROOT}
 `-- models
     `-- pytorch
         |-- imagenet
         |   |-- resnet18_5c_f3_posedual.pth
         |   |-- resnet18-5c106cde.pth
         |   |-- resnet50-19c8e357.pth
         |   |-- resnet101-5d3b4d8f.pth
         |   |-- resnet152-b121ed2d.pth
         |   |-- ......
         |-- pose_dual
             |-- COCO_subset
             |   |-- COCO1K_PoseDual.pth.tar
             |   |-- COCO5K_PoseDual.pth.tar
             |   |-- COCO10K_PoseDual.pth.tar
             |   |-- ......
             |-- COCO_COCOwild
             |-- ......
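Before launching training, it can help to confirm that the downloaded weights sit where the tree above places them. The following is a minimal sketch, not part of the repository: the helper name `missing_files` and the choice of which files to check are assumptions, and the relative paths are copied from the tree.

```python
from pathlib import Path

# Relative paths copied from the tree above; extend or trim this list as needed.
EXPECTED_FILES = [
    "models/pytorch/imagenet/resnet18-5c106cde.pth",
    "models/pytorch/imagenet/resnet50-19c8e357.pth",
    "models/pytorch/pose_dual/COCO_subset/COCO1K_PoseDual.pth.tar",
]

def missing_files(pose_root, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under pose_root."""
    root = Path(pose_root)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_files("/path/to/POSE_ROOT")` returns an empty list when every listed file is present.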

Data preparation

For the COCO and MPII datasets, please refer to Simple Baseline to prepare them.
Download the person detection boxes and images for the COCO WILD (unlabeled) set. The structure looks like this:

${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   |-- person_keypoints_val2017.json
        |   `-- image_info_unlabeled2017.json
        |-- person_detection_results
        |   |-- COCO_val2017_detections_AP_H_56_person.json
        |   |-- COCO_test-dev2017_detections_AP_H_609_person.json
        |   `-- COCO_unlabeled2017_detections_person_faster_rcnn.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- ... 
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- ...
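The files under `person_detection_results` can be inspected with a few lines of Python. This sketch assumes they follow the standard COCO detection results format (a JSON list of records with `image_id`, `category_id`, `bbox` and `score`), which is not confirmed here for the unlabeled-set file; the function names and the `min_score` threshold are hypothetical.

```python
import json
from collections import Counter

PERSON_CATEGORY_ID = 1  # "person" in the COCO category list

def load_person_boxes(path, min_score=0.0):
    """Load detection results and keep person boxes scoring at least min_score.

    Assumes the COCO results format: a JSON list of dicts with
    image_id, category_id, bbox ([x, y, w, h]) and score.
    """
    with open(path) as f:
        detections = json.load(f)
    return [
        det for det in detections
        if det.get("category_id") == PERSON_CATEGORY_ID
        and det.get("score", 0.0) >= min_score
    ]

def boxes_per_image(detections):
    """Count how many boxes each image_id contributes."""
    return Counter(det["image_id"] for det in detections)
```

For example, `boxes_per_image(load_person_boxes(path, min_score=0.5))` gives a quick view of how many candidate person boxes each image contributes at that threshold.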

For the AIC data, please download it from AI Challenger 2017; the 2017 Train/Val sets are needed for keypoint training and validation. Please download the annotation files from AIC Annotations. The structure looks like this:

${POSE_ROOT}
|-- data
`-- |-- ai_challenger
    `-- |-- train
        |   |-- images
        |   `-- keypoint_train_annotation.json
        `-- validation
            |-- images
            |   |-- 0a00c0b5493774b3de2cf439c84702dd839af9a2.jpg
            |   |-- 0a0c466577b9d87e0a0ed84fc8f95ccc1197f4b0.jpg
            |   `-- ...
            |-- gt_valid.mat
            `-- keypoint_validation_annotation.json

Run

Training

1. Training Dual Networks (PoseDual) on COCO 1K labels

python pose_estimation/train.py \
    --cfg experiments/mix_coco_coco/res18/256x192_COCO1K_PoseDual.yaml

2. Training Dual Networks on COCO 1K labels with Joint Cutout

python pose_estimation/train.py \
    --cfg experiments/mix_coco_coco/res18/256x192_COCO1K_PoseDual_JointCutout.yaml

3. Training Dual Networks on COCO 1K labels with Distributed Data Parallel

python -m torch.distributed.launch --nproc_per_node=4  pose_estimation/train.py \
    --distributed --cfg experiments/mix_coco_coco/res18/256x192_COCO1K_PoseDual.yaml

4. Training Single Networks (PoseCons) on COCO 1K labels

python pose_estimation/train.py \
    --cfg experiments/mix_coco_coco/res18/256x192_COCO1K_PoseCons.yaml

5. Training Dual Networks (PoseDual) with ResNet50 on COCO TRAIN + WILD

python pose_estimation/train.py \
    --cfg experiments/mix_coco_coco/res50/256x192_COCO_COCOunlabel_PoseDual_JointCut.yaml

Testing

6. Testing Dual Networks (PoseDual+COCO1K) on COCO VAL

python pose_estimation/valid.py \
    --cfg experiments/mix_coco_coco/res18/256x192_COCO1K_PoseDual.yaml

Citation

If you use our code or models in your research, please cite with:

@inproceedings{semipose,
  title={An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation},
  author={Xie, Rongchang and Wang, Chunyu and Zeng, Wenjun and Wang, Yizhou},
  booktitle={ICCV},
  year={2021}
}

Acknowledgement

The code is mainly based on Simple Baseline and HRNet. Some code comes from DarkPose. Thanks to the authors for their work.

Comments

Easy and hard augmentation?

Hi,

Thanks for sharing this exciting work! I have some practical questions regarding the implementation of the augmentation in your method:

In the pos_dual.py file, the augmentations were annotated here:

        # Teacher
        # Easy Augmentation
        with torch.no_grad():
            unsup_fea1, unsup_ht1 = self.resnet(unsup_x)
            unsup_fea2, unsup_ht2 = self.resnet2(unsup_x)

and

        # Student
        # Hard Augmentation
        _, cons_ht1 = self.resnet(unsup_x_trans)
        _, cons_ht2 = self.resnet2(unsup_x_trans_2)

but they are essentially just images being passed through ResNets?

For this part, which from my understanding is where the augmentations actually happen:

        # Transform
        # Apply Affine Transformation again for hard augmentation
        if self.cfg.UNSUP_TRANSFORM:
            with torch.no_grad():
                theta = self.get_batch_affine_transform(batch_size)
                grid = F.affine_grid(theta, sup_x.size()).float()

                unsup_x_trans = F.grid_sample(unsup_x_trans, grid)
                unsup_x_trans_2 = F.grid_sample(unsup_x_trans_2, grid)

                ht_grid = F.affine_grid(theta, unsup_ht1.size()).float()
                unsup_ht_trans1 = F.grid_sample(unsup_ht1.detach(), ht_grid)
                unsup_ht_trans2 = F.grid_sample(unsup_ht2.detach(), ht_grid)

These augmentations seem to share the same set of parameters, which means the augmentation should be on the same level, instead of having a difference in the magnitude. Would you please clarify these parts?

Thanks a lot in advance for your time.

opened by wangkaihong 1
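One detail of the quoted snippet can be illustrated independently of the easy/hard question: the same `theta` is passed to `F.affine_grid` for both the input size and the heatmap size. Because `affine_grid` works in normalized [-1, 1] coordinates, one matrix moves a point consistently at either resolution, which keeps the warped teacher heatmaps aligned with the warped inputs. The sketch below is a plain-Python illustration of that property only, not the authors' answer; it uses a simplified normalization, ignores `align_corners` and the sampling direction, and the matrix values and the 4x stride are hypothetical.

```python
def apply_affine(theta, point):
    """Apply a 2x3 affine matrix to a point given in normalized coordinates."""
    (a, b, tx), (c, d, ty) = theta
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)

def normalize(px, py, width, height):
    """Map pixel coordinates to [-1, 1] (simplified convention)."""
    return (2.0 * px / width - 1.0, 2.0 * py / height - 1.0)

def denormalize(nx, ny, width, height):
    """Map normalized coordinates back to pixel coordinates."""
    return ((nx + 1.0) * width / 2.0, (ny + 1.0) * height / 2.0)

def warp_pixel(theta, px, py, width, height):
    """Move a pixel location with theta, working in normalized space."""
    nx, ny = apply_affine(theta, normalize(px, py, width, height))
    return denormalize(nx, ny, width, height)

# Hypothetical small rotation/scale/translation, shared by image and heatmap.
theta = ((0.9, -0.1, 0.05), (0.1, 0.9, -0.02))

# The same joint at image resolution (192x256) and heatmap resolution (48x64).
image_xy = warp_pixel(theta, 144, 64, width=192, height=256)
heatmap_xy = warp_pixel(theta, 36, 16, width=48, height=64)

# The warped heatmap location is the warped image location divided by the stride.
print(image_xy, tuple(4 * v for v in heatmap_xy))
```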
