Code for "Human Pose Regression with Residual Log-likelihood Estimation", ICCV 2021 Oral

Last update: Dec 24, 2022

Overview

[Paper] [arXiv] [Project Page]

Human Pose Regression with Residual Log-likelihood Estimation
Jiefeng Li, Siyuan Bian, Ailing Zeng, Can Wang, Bo Pang, Wentao Liu, Cewu Lu
ICCV 2021 Oral

TODO

Provide minimal implementation of RLE loss.
Provide implementation on Human3.6M dataset.
Provide implementation on COCO dataset.

Installation

Install PyTorch >= 1.1.0 following the official instructions.
Install rlepose:

pip install cython
python setup.py develop

Install COCOAPI.

pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

Init data directory:

mkdir data

Download COCO data:

|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ... 
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ...
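
Before training, it can help to confirm that the directory tree above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_coco_entries` and the `EXPECTED` list are hypothetical and simply mirror the paths shown in the tree.

```python
from pathlib import Path

# Expected entries relative to the data root, copied from the tree above.
EXPECTED = [
    "coco/annotations/person_keypoints_train2017.json",
    "coco/annotations/person_keypoints_val2017.json",
    "coco/images/train2017",
    "coco/images/val2017",
]


def missing_coco_entries(data_root="data"):
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_coco_entries()
    print("missing:", missing if missing else "nothing")
```

An empty list means the layout matches; the check does not validate the contents of the annotation files or count the images.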

Train from scratch

./scripts/train.sh ./configs/256x192_res50_regress-flow.yaml train_rle

Evaluation

Download the pretrained model from Google Drive.

./scripts/validate.sh ./configs/256x192_res50_regress-flow.yaml ./coco-laplace-rle.pth

Citing

If our code helps your research, please consider citing the following paper:

@inproceedings{li2021human,
    title={Human Pose Regression with Residual Log-likelihood Estimation},
    author={Li, Jiefeng and Bian, Siyuan and Zeng, Ailing and Wang, Can and Pang, Bo and Liu, Wentao and Lu, Cewu},
    booktitle={ICCV},
    year={2021}
}

Comments

Question about the norm in linear layer

Hi, I am curious about the normalization applied in fc_coord. What is the meaning of this line? Why should the output be divided by the norm of the input? https://github.com/Jeff-sjtu/res-loglikelihood-regression/blob/203dc3195ee5a11ed6f47c066ffdb83247511359/rlepose/models/regression_nf.py#L33
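
The linked line is not reproduced on this page, so the following is only a sketch of what dividing a linear layer's output by the input norm does in general. It uses NumPy rather than PyTorch; the function name `normed_linear` and the `eps` guard are assumptions for illustration, not the repository's code.

```python
import numpy as np


def normed_linear(x, weight, bias=None, eps=1e-12):
    """Compute (x @ W^T) / ||x||_2, optionally plus a bias.

    x: (B, C_in) features, weight: (C_out, C_in).
    eps is a hypothetical guard against division by zero.
    """
    y = x @ weight.T
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    y = y / (x_norm + eps)
    if bias is not None:
        y = y + bias
    return y


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w = rng.normal(size=(3, 8))

# Rescaling the input features leaves the output unchanged.
print(np.allclose(normed_linear(x, w), normed_linear(3.0 * x, w)))
```

With this normalization the output depends only on the direction of the feature vector and not on its magnitude. Whether that is the authors' motivation is not stated on this page.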

opened by sicxu 8

How does multiple-point detection work?

I am confused about how multiple keypoint detection works. The one-point case is easy: just compare the prediction with the ground truth. The two-point case is not, because the model could output (A, B) or (B, A) while the ground truth is (a, b). So with multiple points there are several permutations to compare.

Another issue: the model output may have shape (B, N, 2) while the number of points to detect is smaller than N. How do you then determine the exact number of points dynamically?

What if the model outputs repeated points? Is NMS needed?
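
To make the permutation concern concrete: in human pose datasets such as COCO, each of the N output slots normally corresponds to a fixed, named joint, so predictions and labels are compared slot by slot and no permutation arises. For genuinely unordered points, one option (an illustration, not something this repository is known to do) is to take the minimum error over all assignments. The function names below are hypothetical.

```python
from itertools import permutations


def l1(p, q):
    """L1 distance between two 2D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def fixed_order_error(pred, gt):
    """Slot-by-slot error: output i is compared with label i."""
    return sum(l1(p, g) for p, g in zip(pred, gt))


def best_permutation_error(pred, gt):
    """Minimum total error over all assignments of outputs to labels."""
    return min(
        sum(l1(pred[i], g) for i, g in zip(perm, gt))
        for perm in permutations(range(len(pred)))
    )


pred = [(10, 10), (0, 0)]  # model outputs (B, A)
gt = [(0, 0), (10, 10)]    # labels (a, b)
print(fixed_order_error(pred, gt), best_permutation_error(pred, gt))
```

The brute-force search costs N! evaluations, so it is only practical for a handful of points; an assignment solver such as the Hungarian algorithm would be the usual replacement for larger N.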

opened by nviwch 7

Some related or extended questions

I implemented a TensorFlow Keras version for 2D single-point regression with some success. The mu is more accurate (sometimes better than the ground truth, due to human labeling error) and sigma does have a meaningful interpretation.

But I have some questions.

I would like to know whether sigma is bounded to (0, 1) solely to make missing points easier to classify, or whether there is another reason. I tried using softplus for sigma so that it is > 0; the result is also very good and sigma is no longer squeezed into (0, 1): a blurry/noisy image can have a very high sigma (> 4).

Since a keypoint may be missing from the image, I previously used a simple classification output, i.e. one more sigmoid value representing whether the point exists. Now I have sigma, but the sigma of a few normal cases overlaps with that of missing-point cases around ~0.5 (using softplus). Previously I could feed missing-point images during training, but now I can't? Just asking for some advice: how do I improve missing-point detection?
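
The difference between the two activations discussed here is easy to see numerically. This is a small standalone sketch using only the standard library; the raw values are arbitrary and chosen for illustration.

```python
import math


def sigmoid(x):
    """Maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softplus(x):
    """log(1 + exp(x)), written to stay stable for large |x|; range (0, inf)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


for raw in (-4.0, 0.0, 4.0, 8.0):
    print(f"raw={raw:+.1f}  sigmoid={sigmoid(raw):.4f}  softplus={softplus(raw):.4f}")
```

A sigmoid output saturates just below 1 however large the raw value gets, while softplus grows roughly linearly, which is consistent with the observation above that sigma can exceed 4 on blurry images.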

opened by nviwch 6

Implementation of logQ

https://github.com/Jeff-sjtu/res-loglikelihood-regression/blob/203dc3195ee5a11ed6f47c066ffdb83247511359/rlepose/models/criterion.py#L37 Sorry for bothering you. Am I right in thinking that the above line implements $\log Q(\mu)$ rather than the $\log Q(\bar\mu)$ from the paper?

opened by Indigo6 5

Can't reproduce the results presented in the paper

Hi @Jeff-sjtu, I used the command ./scripts/train.sh ./configs/256x192_res50_regress-flow.yaml train_rle_coco to reproduce the results presented in the paper, but only got ##### Epoch 255 | gt mAP: 0.7123198166036238 | det mAP: 0.6962231312113063 #####, which is lower than the released model you provide (gt box: 0.7218652898214926 mAP | det box: 0.7127219006071578 mAP). Could you give the complete training pipeline to reproduce the results? Thanks.

opened by zimenglan-sysu-512 5

The order of height and width of the heatmap in the metrics is different.

In this implementation, PCK@50 is used to evaluate the inference results.

https://github.com/Jeff-sjtu/res-loglikelihood-regression/blob/71a7e8fe0e39719000722b7d515f7d1899573618/rlepose/utils/metrics.py#L115-L150

However, I am wondering if the order of the height and width of the heatmap used for normalizing is reversed.

norm = np.ones((preds.shape[0], 2)) * np.array([hm_w, hm_h]) / 10

Other awesome methods (SimpleBaseline, HRNet, DarkPose) implement it as follows.

norm = np.ones((pred.shape[0], 2)) * np.array([h, w]) / 10
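
The effect of the ordering can be checked on a non-square heatmap. The sketch below is not the repository's metric code: `normalized_dist` is a hypothetical helper, the heatmap size and coordinates are illustrative values, and points are assumed to be stored as (x, y).

```python
import numpy as np


def normalized_dist(pred, target, norm):
    """Per-sample distance after dividing each coordinate by its normaliser."""
    return np.linalg.norm((pred - target) / norm, axis=-1)


hm_h, hm_w = 64, 48                # illustrative non-square heatmap
pred = np.array([[10.0, 20.0]])    # (x, y)
target = np.array([[13.0, 20.0]])  # 3 px error along x only

norm_wh = np.ones((pred.shape[0], 2)) * np.array([hm_w, hm_h]) / 10
norm_hw = np.ones((pred.shape[0], 2)) * np.array([hm_h, hm_w]) / 10

d_wh = normalized_dist(pred, target, norm_wh)  # x is divided by hm_w / 10
d_hw = normalized_dist(pred, target, norm_hw)  # x is divided by hm_h / 10
print(d_wh, d_hw)
```

Here the same 3 px error gives 0.625 with the [hm_w, hm_h] order and 0.46875 with [h, w], so with a 0.5 threshold one ordering counts a miss and the other a hit. On square heatmaps the two orderings agree.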

opened by katsura-jp 4

Error When Installing

Hi, I was installing this repo, but when I run python setup.py develop I get some errors in Post-processing (stage 2), and I don't know what they mean. Could you please help me? Thanks a lot!

Post-processing (stage 2)...
Building modules...
        Building module "mvn"...
                Constructing wrapper function "mvnun"...
                  value,inform = mvnun(lower,upper,means,covar,[maxpts,abseps,releps])
                Constructing wrapper function "mvndst"...
                  error,value,inform = mvndst(lower,upper,infin,correl,[maxpts,abseps,releps])
                Constructing COMMON block support for "dkblck"...
                  ivls
        Wrote C/API module "mvn" to file "build/src.linux-x86_64-3.8/scipy/stats/mvnmodule.c"
        Fortran 77 wrappers are saved to "build/src.linux-x86_64-3.8/scipy/stats/mvn-f2pywrappers.f"
no previously-included directories found matching 'benchmarks/env'
no previously-included directories found matching 'benchmarks/results'
no previously-included directories found matching 'benchmarks/html'
no previously-included directories found matching 'benchmarks/scipy'
no previously-included directories found matching 'scipy/special/tests/data/boost'
no previously-included directories found matching 'scipy/special/tests/data/gsl'
no previously-included directories found matching 'scipy/special/tests/data/local'
no previously-included directories found matching 'doc/build'
no previously-included directories found matching 'doc/source/generated'
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.bak' found anywhere in distribution
warning: no previously-included files matching '*.swp' found anywhere in distribution
warning: no previously-included files matching '*.pyo' found anywhere in distribution
CCompilerOpt.generate_dispatch_header[2281] : dispatch header dir build/src.linux-x86_64-3.8/numpy/distutils/include does not exist, creating it

opened by HeegerGao 3

performance gap between 2d and 3d data

Hi, I'm training on a mix of 2D and 3D data, but the 3D prediction performance is much worse than the 2D performance. The training log looks like this: [image: POPO20210817-103512]

The pink line is an Integral Pose model and the green line is RLE. As you can see, the 2D performance is competitive (RLE is in fact better), but the 3D performance is worse. Because I'm doing mixed-data training, I calculate my RLE loss as follows:

        is_2d = ~is_3d  # masks of 2d and 3d
        num_3d = is_3d.sum()
        num_2d = is_2d.sum()

        t_cnt = 0
        ### joint loss ###
        jc_loss = 0.
        if num_3d > 0:
            jc_loss += self.joint_loss(joint_coord3d[is_3d], nf_loss[is_3d], sigma[is_3d], target_coord[is_3d])
            t_cnt += 1.
        if num_2d > 0:
            jc_loss += self.joint_loss(joint_coord3d[is_2d, :, :2], nf_loss[is_2d, :, :2], sigma[is_2d, :, :2], target_coord[is_2d, :, :2])
            t_cnt += 1.
        jc_loss /= t_cnt

Do you have any suggestion?

opened by Tau-J 3

cannot reproduce the results on human3.6m

Hi, I have tried to train the model on human3.6m dataset, but cannot get the same performance as the provided pretrained model 'h36m-laplace-rle.pth'. I was using the annotation files and codes from this repository, and the environment is based on pytorch1.8.

The training behaves normally at the beginning, but after some epochs the loss increases and MPJPE reaches 200.

Train-9 epoch | loss:-179.81700464 | acc:0.5870
##### Epoch 9 | gt results: 78.48804327541252/78.48804327541252 #####
############# Starting Epoch 10 | LR: 0.001 #############
Train-10 epoch | loss:-178.24089413 | acc:0.5794
############# Starting Epoch 11 | LR: 0.001 #############
Train-11 epoch | loss:-105.90890308 | acc:0.2606
##### Epoch 11 | gt results: 206.2949851515231/78.48804327541252 #####
############# Starting Epoch 12 | LR: 0.001 #############
Train-12 epoch | loss:-99.68143688 | acc:0.2141
############# Starting Epoch 13 | LR: 0.001 #############
Train-13 epoch | loss:-134.96754921 | acc:0.3854

Before lr step, the model would collapse.

############# Starting Epoch 60 | LR: 0.001 #############
Train-60 epoch | loss:-99.61992546 | acc:0.2145
##### Epoch 60 | gt results: 183.75314692309206/71.80444962169011 #####
############# Starting Epoch 61 | LR: 0.001 #############
Train-61 epoch | loss:16064.89554501 | acc:0.0533
##### Epoch 61 | gt results: 973.8534021229914/71.80444962169011 #####
############# Starting Epoch 62 | LR: 0.001 #############
Train-62 epoch | loss:-18.63251462 | acc:0.0071

I suspected that the lr was too large, so I experimented with an initial lr of 1e-4. The loss value was lower, but it also increased after some epochs.

##### Epoch 64 | gt results: 83.14857669377162/64.38682879287299 #####
############# Starting Epoch 65 | LR: 0.0001 #############
Train-65 epoch | loss:-208.61684037 | acc:0.6639
##### Epoch 65 | gt results: 74.92741776420186/64.38682879287299 #####
############# Starting Epoch 66 | LR: 0.0001 #############
Train-66 epoch | loss:-207.51670638 | acc:0.6620
##### Epoch 66 | gt results: 77.90103486437523/64.38682879287299 #####
############# Starting Epoch 67 | LR: 0.0001 #############
Train-67 epoch | loss:-176.32472946 | acc:0.5414
##### Epoch 67 | gt results: 204.0743792529304/64.38682879287299 #####
############# Starting Epoch 68 | LR: 0.0001 #############
Train-68 epoch | loss:-126.63842224 | acc:0.3175
##### Epoch 68 | gt results: 124.20229629684249/64.38682879287299 #####

If the lr is dropped to 1e-5 before the , I got a final result of 66 MPJPE, compared to the 38 MPJPE by pretrained model.

Additionally, I tried downgrading PyTorch to 1.5, but observed similar behavior.

Do you have any suggestions? Thanks a lot for your help.
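
When comparing runs like the ones above, it can be handy to pull the evaluation lines out of the log automatically. The sketch below is a hypothetical helper, not part of the repository. It assumes the two numbers after "gt results:" are the current error and the best error so far, which is how the excerpts above read but is not confirmed here; the `factor` threshold is arbitrary.

```python
import re

# Matches lines such as "##### Epoch 11 | gt results: 206.29/78.48 #####".
EPOCH_RE = re.compile(r"Epoch (\d+) \| gt results: ([\d.]+)/([\d.]+)")


def find_collapses(log_text, factor=2.0):
    """Return (epoch, current, best) wherever current exceeds factor * best."""
    hits = []
    for m in EPOCH_RE.finditer(log_text):
        epoch = int(m.group(1))
        current, best = float(m.group(2)), float(m.group(3))
        if current > factor * best:
            hits.append((epoch, current, best))
    return hits


sample = """\
##### Epoch 9 | gt results: 78.48804327541252/78.48804327541252 #####
############# Starting Epoch 10 | LR: 0.001 #############
Train-10 epoch | loss:-178.24089413 | acc:0.5794
##### Epoch 11 | gt results: 206.2949851515231/78.48804327541252 #####
"""
print(find_collapses(sample))
```

Lines that start a new epoch or report the training loss are ignored because they do not contain the "gt results" field.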

opened by a2394797795 3

loss gets nan problem

Hi Jeff, I still have trouble training with RLE in my project. The loss decreases correctly at the beginning, but after some iterations it increases abruptly and finally becomes NaN. I'm using Adam and a cosine scheduler with a warm-up strategy. [images: POPO20210809-143648, POPO20210809-143638]

I implemented the regression module as follows (this is a hand-pose project; I added two fc heads to predict hand validity and hand type, i.e. left/right hand):

class RegressFlow3D(nn.Module):
    def __init__(self, cfg, in_dim):
        super(RegressFlow3D, self).__init__()
        self.num_joints = cfg.joint_num
        self.root_idx = cfg.wrist_joint_idx

        self.avg_pool = nn.AdaptiveAvgPool2d(1)

        self.hand_type_fc = make_linear_layers([in_dim, 128, 1], relu_final=False)
        self.hand_valid_fc = make_linear_layers([in_dim, 64, 1], relu_final=False)

        self.fc_coord = Linear(in_dim, self.num_joints * 3)
        self.fc_sigma = Linear(in_dim, self.num_joints * 3)

        # self.fc_layers = [self.fc_coord, self.fc_sigma]

        prior = distributions.MultivariateNormal(torch.zeros(2), torch.eye(2), validate_args=False)
        masks = torch.from_numpy(np.array([[0, 1], [1, 0]] * 3).astype(np.float32))
        prior3d = distributions.MultivariateNormal(torch.zeros(3), torch.eye(3), validate_args=False)
        masks3d = torch.from_numpy(np.array([[0, 0, 1], [1, 1, 0]] * 3).astype(np.float32))

        self.flow2d = RealNVP(nets, nett, masks, prior)
        self.flow3d = RealNVP(nets3d, nett3d, masks3d, prior3d)

    # def _initialize(self):
    #     for m in self.fc_layers:
    #         if isinstance(m, nn.Linear):
    #             nn.init.xavier_uniform_(m.weight, gain=0.01)

    def forward(self, feat, labels=None):
        BATCH_SIZE = feat.shape[0]

        feat = self.avg_pool(feat).reshape(BATCH_SIZE, -1)

        hand_type = self.hand_type_fc(feat)
        hand_valid = self.hand_valid_fc(feat)

        out_coord = self.fc_coord(feat).reshape(BATCH_SIZE, self.num_joints, 3)
        # (B, N, 3)
        pred_jts = out_coord.reshape(BATCH_SIZE, self.num_joints, 3)
        pred_jts[:, :, 2] = pred_jts[:, :, 2] - pred_jts[:, self.root_idx:self.root_idx + 1, 2]

        if labels is not None:
            gt_uvd = labels['target_coord'].reshape(pred_jts.shape)
            gt_3d_mask = labels['mask3d']

            out_sigma = self.fc_sigma(feat).reshape(BATCH_SIZE, self.num_joints, -1)
            sigma = out_sigma.reshape(BATCH_SIZE, self.num_joints, -1).sigmoid() + 1e-9
            scores = 1 - sigma
            scores = torch.mean(scores, dim=2, keepdim=True)
            bar_mu = (pred_jts - gt_uvd) / sigma
            bar_mu = bar_mu.reshape(-1, 3)

            bar_mu_3d = bar_mu[gt_3d_mask > 0]
            bar_mu_2d = bar_mu[gt_3d_mask < 1][:, :2]

            log_phi = torch.zeros_like(bar_mu[:, 0])
            # (B, K, 3)
            num_3d = bar_mu_3d.shape[0]
            num_2d = bar_mu_2d.shape[0]
            if num_3d:
                log_phi_3d = self.flow3d.log_prob(bar_mu_3d)
                log_phi[gt_3d_mask > 0] = log_phi_3d
            if num_2d:
                log_phi_2d = self.flow2d.log_prob(bar_mu_2d)
                log_phi[gt_3d_mask < 1] = log_phi_2d

            log_phi = log_phi.reshape(BATCH_SIZE, self.num_joints, 1)
            nf_loss = torch.log(sigma) - log_phi

            return pred_jts, scores, nf_loss, sigma, hand_type, hand_valid
        else:
            return pred_jts, hand_type, hand_valid

and loss as follow:

class RLELoss3D(nn.Module):
    ''' RLE Regression Loss 3D
    '''

    def __init__(self, OUTPUT_3D=False, size_average=True):
        super(RLELoss3D, self).__init__()
        self.size_average = size_average
        self.amp = 1 / math.sqrt(2 * math.pi)

    def logQ(self, gt_uv, pred_jts, sigma):
        return torch.log(sigma / self.amp) + torch.abs(gt_uv - pred_jts) / (math.sqrt(2) * sigma + 1e-9)

    def forward(self, pred_jts, nf_loss, sigma, target):
        gt_uv = target.reshape(pred_jts.shape)
        Q_logprob = self.logQ(gt_uv, pred_jts, sigma)
        loss = nf_loss + Q_logprob

        if self.size_average:
            return loss.sum() / len(loss)
        else:
            return loss.sum()

[image: POPO20210809-175054]

Could you provide any suggestions about debugging?

opened by Tau-J 3

LogQ in criterion.py

def logQ(self, gt_uv, pred_jts, sigma):
    return torch.log(sigma / self.amp) + torch.abs(gt_uv - pred_jts) / (math.sqrt(2) * sigma + 1e-9)

From the function definition, it looks like Q is a Laplace distribution. But Q = exp(-abs(x - mu) / b) / (2b), so after taking the log it is log(1 / (2b)) - abs(x - mu) / b.

Looking at loss = nf_loss + Q_logprob with nf_loss = log(sigma) - log_phi, and comparing with the paper, which says loss = -log(Q(bar_mu)) + log(sigma) - log_phi, I guess Q_logprob already has the negative sign inside. But then why is it not -log(1 / (2b)) + abs(x - mu) / b? I.e. the first term torch.log(sigma / self.amp) should have a minus sign.

On the other hand, how do you choose the value of b? In the distribution sense, b = sqrt(variance / 2).
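
The sign question can be checked numerically. Since -log(1 / (2b)) = log(2b), the negative log-density of a Laplace distribution is log(2b) + abs(x - mu) / b, with a positive log term. The sketch below compares the quoted expression with that negative log-density under the assumption b = sqrt(2) * sigma. It is a scalar re-implementation for illustration, and the function names are hypothetical.

```python
import math

AMP = 1 / math.sqrt(2 * math.pi)


def code_logq(gt, pred, sigma):
    """Scalar version of the expression quoted above."""
    return math.log(sigma / AMP) + abs(gt - pred) / (math.sqrt(2) * sigma + 1e-9)


def laplace_nll(x, mu, b):
    """Negative log-density of Laplace(mu, b): log(2b) + |x - mu| / b."""
    return math.log(2 * b) + abs(x - mu) / b


def gap(gt, pred, sigma):
    """Difference between the quoted expression and the Laplace NLL, b = sqrt(2) * sigma."""
    return code_logq(gt, pred, sigma) - laplace_nll(gt, pred, math.sqrt(2) * sigma)


print(gap(0.3, 0.1, 0.2), gap(2.0, -1.0, 0.9))
```

Under that assumption the two expressions differ only by a constant, log(sqrt(2 * pi)) - log(2 * sqrt(2)), which does not depend on sigma or on the residual, so the gradients are the same up to the small 1e-9 stabilizer. Whether b = sqrt(2) * sigma is the authors' intended parameterization is not stated on this page.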

opened by nviwch 3

Some experiment outputs of coco

Thanks to the authors. The COCO results in the paper report AP, AP.5, AP.75, ..., but your code only outputs mAP. How can I get the more detailed metrics such as AP.5?
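
One way to get the full breakdown is to run the COCO keypoint evaluation from pycocotools directly on a results file. The sketch below is not taken from this repository: `evaluate_keypoints`, `name_stats`, and both file-path arguments are hypothetical, and it assumes the predictions have already been written in the standard COCO keypoint results JSON format.

```python
# Order of the 10 values in COCOeval.stats for the "keypoints" task.
KEYPOINT_STAT_NAMES = [
    "AP", "AP .5", "AP .75", "AP (M)", "AP (L)",
    "AR", "AR .5", "AR .75", "AR (M)", "AR (L)",
]


def name_stats(stats):
    """Pair the 10 keypoint summary values with their names."""
    return dict(zip(KEYPOINT_STAT_NAMES, (float(s) for s in stats)))


def evaluate_keypoints(ann_file, res_file):
    """Run COCO keypoint evaluation; both arguments are file paths."""
    # Imported lazily so the helpers above work without pycocotools installed.
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO(ann_file)
    coco_dt = coco_gt.loadRes(res_file)
    coco_eval = COCOeval(coco_gt, coco_dt, "keypoints")
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()  # prints the full AP/AR table
    return name_stats(coco_eval.stats)
```

For example, `evaluate_keypoints("data/coco/annotations/person_keypoints_val2017.json", "results.json")` would print the table and return a dictionary keyed by metric name, where `results.json` stands for whatever file the validation run produced.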

opened by Aruisir 0

bar_mu computation is different from the paper in Eq (5)

Hi there,

It seems the bar_mu computation is different: it should be multiplied by -1 (below Eq. (5), bar_mu = (gt - mu_pred) / sigma).

As shown here:

https://github.com/Jeff-sjtu/res-loglikelihood-regression/blob/203dc3195ee5a11ed6f47c066ffdb83247511359/rlepose/models/regression_nf.py#L134

This does not affect the computation of log_Q, which basically uses the absolute value of this term. How about the flow model? I am not sure whether this makes any difference to the learning of the RealNVP flow model, or did I miss something here?

Thanks.

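opened by superaha 2

Dataset replacement problems and experiment result output

Hello, I am an undergraduate student and very interested in your work, but I ran into problems when trying to swap in the loss. 1. I cannot directly replace the loss in other projects (such as the official HRNet code); I also tried replacing the dataset definition file, which failed as well. 2. The project you provide does not seem to include visualization or detailed AP/AR result output. 3. If I want to change the backbone, e.g. to HourglassNet, do I need to convert it into a classification network before adapting it to the NVP? What happens if no classification head is added? Thank you.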

opened by Aruisir 0

out_coord.shape[2] == 2

Hello, could you tell me why the third dimension must be 2? " out_coord = self.fc_coord(x).reshape(BATCH_SIZE, self.num_joints, 2) assert out_coord.shape[2] == 2"

opened by flomok 0

About the implementation of RLE on two-stage 3D HPE methods

Hi, I'm interested in RLE; it is nice work. I noticed that you embed RLE into two-stage 3D HPE methods such as SRnet, but I didn't see that in this repo. Could you describe the implementation details? Is there anything else that needs attention?

Thanks very much!

opened by ChenyangWang95 0

Owner

JeffLi

jeff.lee.sjtu[at]gmail[dot]com
