Code for the Lovász-Softmax loss (CVPR 2018)

Overview

The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks

Maxim Berman, Amal Rannen Triki, Matthew B. Blaschko

ESAT-PSI, KU Leuven, Belgium.

Published in CVPR 2018. See the project page, the arXiv paper, and the paper on CVF open access.

PyTorch implementation of the loss layer (pytorch folder)

Files included:

  • lovasz_losses.py: Standalone PyTorch implementation of the Lovász hinge and Lovász-Softmax for the Jaccard index
  • demo_binary.ipynb: Jupyter notebook showcasing binary training of a linear model, with the Lovász Hinge and with the Lovász-Sigmoid.
  • demo_multiclass.ipynb: Jupyter notebook showcasing multiclass training of a linear model with the Lovász-Softmax

The binary lovasz_hinge expects real-valued scores (positive scores correspond to foreground pixels).

The multiclass lovasz_softmax expects class probabilities (the maximum-scoring category is predicted). Apply a Softmax layer to the unnormalized scores first.
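
A minimal usage sketch (not part of the repository; the shapes, the 21-class example, and the {0, 1} binary masks are illustrative assumptions):

    import torch
    import torch.nn.functional as F
    from lovasz_losses import lovasz_hinge, lovasz_softmax

    # Binary case: raw real-valued scores, positive = foreground; binary masks assumed in {0, 1}
    logits = torch.randn(4, 256, 256)                  # [B, H, W]
    fg_masks = torch.randint(0, 2, (4, 256, 256))      # [B, H, W]
    loss_bin = lovasz_hinge(logits, fg_masks)

    # Multiclass case: normalize the scores with a Softmax first, then pass class probabilities
    scores = torch.randn(4, 21, 256, 256)              # [B, C, H, W] unnormalized scores
    labels = torch.randint(0, 21, (4, 256, 256))       # [B, H, W] integer class ids
    loss_multi = lovasz_softmax(F.softmax(scores, dim=1), labels)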

TensorFlow implementation of the loss layer (tensorflow folder)

Files included:

  • lovasz_losses_tf.py: Standalone TensorFlow implementation of the Lovász hinge and Lovász-Softmax for the Jaccard index
  • demo_binary_tf.ipynb: Jupyter notebook showcasing binary training of a linear model, with the Lovász Hinge and with the Lovász-Sigmoid.
  • demo_multiclass_tf.ipynb: Jupyter notebook showcasing the application of the multiclass loss with the Lovász-Softmax

Warning: the loss values and gradients have been verified to match the PyTorch implementation (see notebooks); however, we have not used the TF implementation in a training setting.

Usage

See the demos for simple proofs of principle.

FAQ

  • How should I use the Lovász-Softmax loss?

The loss can be optimized on its own, but the optimal optimization hyperparameters (learning rates, momentum) might differ from the best ones for cross-entropy. As discussed in the paper, optimizing the dataset-mIoU (the Pascal VOC measure) depends on the batch size and the number of classes. Therefore, you may get the best results by optimizing with cross-entropy first and fine-tuning with our loss, or by combining the two losses.

See for example how the work Land Cover Classification From Satellite Imagery With U-Net and Lovasz-Softmax Loss by Alexander Rakhlin et al. used our loss in the CVPR 18 DeepGlobe challenge.
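
As a rough illustration of the "combine the two losses" option, a minimal sketch (PyTorch; the default 1:1 weighting and the ignore index 255 are placeholder assumptions, not recommendations from the paper):

    import torch.nn.functional as F
    from lovasz_losses import lovasz_softmax

    def ce_plus_lovasz(logits, labels, lovasz_weight=1.0):
        # logits: [B, C, H, W] unnormalized scores; labels: [B, H, W] integer class ids
        ce = F.cross_entropy(logits, labels, ignore_index=255)
        ls = lovasz_softmax(F.softmax(logits, dim=1), labels, ignore=255)
        return ce + lovasz_weight * ls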

  • Inference in TensorFlow is very slow...

Compiling TensorFlow from master (or using a future release that includes commit tensorflow/tensorflow@73e3215) should solve this problem; see issue #6.

Citation

Please cite

@inproceedings{berman2018lovasz,
  title={The Lov{\'a}sz-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks},
  author={Berman, Maxim and Rannen Triki, Amal and Blaschko, Matthew B},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={4413--4421},
  year={2018}
}
Comments
  • Fail to improve the performance if I train the model from scratch.

    Really interesting work. I have a baseline with softmax loss on DeepLabv3 and achieve mIoU=76.7 on Cityscapes.

    I simply replaced the cross-entropy loss with your proposed loss and trained the model with the same learning rate and weight decay, but I only achieve mIoU=64.7.

    Could you give me a hint?

    I also notice that you do not train ENet from scratch either; you just fine-tune the models.

    Besides, I also conducted a small experiment training the model with both the cross-entropy loss and your proposed loss, which achieves good performance: mIoU=78.4.

    It would be great if you could share your advice!

    question answered 
    opened by PkuRainBow 12
  • plug'n'play implementation for tensorflow/keras

    I'm trying to use your tensorflow implementation for a U-Net using Keras. The problem I face is that I cannot simply plug the lovasz_softmax loss function into the U-Net model, since the loss function takes the labels as image batches (dim(labels) = (batchsize, width, height, 1)) and the probas in one-hot notation, as it should be for such a problem (dim(probas) = (batchsize, width, height, n_classes)).

    Simply taking the argmax of the probas does not work, because Keras raises an exception when trying to calculate the gradient of argmax during training. If I understood your paper correctly, this is exactly the problem that the Lovász-Softmax is meant to avoid.

    model.compile(optimizer=SGD(), loss=lovasz_softmax, metrics=["accuracy"])
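
    A possible workaround (a sketch, not an official API; it assumes lovasz_losses_tf.py is importable and keeps the standard Keras loss signature): wrap the loss so it takes (y_true, y_pred) and squeeze the trailing label channel, so no argmax is ever taken on the predictions.

    import tensorflow as tf
    from lovasz_losses_tf import lovasz_softmax

    def keras_lovasz_softmax(y_true, y_pred):
        # y_true: (batch, width, height, 1) integer labels; y_pred: (batch, width, height, n_classes) probabilities
        labels = tf.squeeze(tf.cast(y_true, tf.int32), axis=-1)
        return lovasz_softmax(y_pred, labels)

    model.compile(optimizer=SGD(), loss=keras_lovasz_softmax, metrics=["accuracy"])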

    opened by rhoef 9
  • Got a very low MIoU after simply swapping out the cross entropy loss for "lovasz_softmax"

    Hello, nice to read this paper. I have encountered a problem: I get a very low mIoU (0.003) from DeepLabv3+ with lovasz_softmax, while it normally achieves mIoU=76% using cross-entropy loss. Environment: PyTorch 1.0, Ubuntu 16.04, batch size 10, dataset Pascal VOC 2012 (aug), loaded ImageNet-pretrained ResNet-101 weights.

    And here is the code of Lovasz softmax:

    class LovaszSoftmax(nn.Module):
        def __init__(self, per_image=False):
            super(LovaszSoftmax, self).__init__()
            self.lovasz_softmax = lovasz_softmax
            self.per_image = per_image
    
        def forward(self, pred, label):
            pred = F.softmax(pred, dim=1)
            return self.lovasz_softmax(pred, label, per_image=self.per_image, ignore=255)
    

    Thanks!

    opened by Tensorfengsheng1926 5
  • Labels should be only {-1,1} in case of binary segmentation?

    Hi there,

    Thanks for sharing the code for this fantastic work. Congratulations on your CVPR paper! I have a question about the labels in the ground truths. The GT labels should be {-1, 1} (-1: background, 1: foreground), and for instance {0, 1} (0: background, 1: foreground) wouldn't work properly, right?

    opened by SorourMo 5
  • Lovasz softmax with 1 class and small batch does not learn

    I have an image segmentation task with small batch size (4-8) and some samples that have only the background (negative) class.

    I have implemented lovasz softmax as below:

    loss2 = lovasz_softmax(probs, labels, classes=[1], per_image=False)
    

    where probs has shape (B, H, W, C) and labels has shape (B, H, W) with values in {0, 1}

    However, the network does not learn at all -- the output feature maps look random, and tuning the learning rate does not improve the issue.

    The same network works fine with dice, Tversky, focal, or BCE loss.

    I think it is due to the presence of background-only samples -- I know that classes='present' solves this for multi-class problems. Is there a way to do the same for a binary Lovász-Softmax?

    opened by JohnMBrandt 4
  • "name 'ifilterfalse' is not defined" in Python3

    Hi, I've tried to run the PyTorch implementation in Python 3.6 and got this error:

        226     l = iter(l)
        227     if ignore_nan:
    --> 228         l = ifilterfalse(isnan, l)
        229     try:
        230         n = 1
    
    NameError: name 'ifilterfalse' is not defined
    

    I think the function needs to be aliased on import in Python 3, like this:

    from itertools import filterfalse as ifilterfalse
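
    A version-tolerant variant of that import (a sketch, in case the module should also keep working under Python 2):

    try:
        from itertools import ifilterfalse                   # Python 2
    except ImportError:
        from itertools import filterfalse as ifilterfalse    # Python 3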
    
    opened by mxwell 3
  • Fix the case that labels are all ignored

    An error occurs at line 177 (C = probas.size(1)) when the input labels are all ignored, because labels becomes empty. This PR deals with such cases.

    One thing I am concerned about is the difference between the lines I added and the lines below:

            if len(labels) == 0:
                # only void pixels, the gradients should be 0
                return logits.sum() * 0.
    

    The above is part of the original implementation of lovasz_flat. It doesn't exclude void labels when calculating the loss. Is this expected behavior?

    opened by lyakaap 3
  • About the slow speed on tensorflow

    Hello, I love your work on Lovász-Softmax and implemented it on a modified version of DeepLabv3+ in TensorFlow. However, I experienced a significant speed drop: the time per step increased from 0.4 s (using cross-entropy) to almost 3.8 s. Is this normal, or did I do something wrong? Thank you!

    opened by MarkYangjiayi 3
  • possible bug

    Hi, Thank you for your work.

    I believe there is an error here:

    https://github.com/bermanmaxim/LovaszSoftmax/blob/6309c68a2276ada25ebf04692575bce937460f1a/lovasz_losses.py#L26

    It should be jaccard[1:p] = jaccard[1:p] - jaccard[:p-1].

    opened by alexander-rakhlin 3
  • mIOU decreasing as the Lovasz Hinge Loss decreases

    Hi Maxim Berman! Great work. I am using the lovasz_hinge loss and iou_binary as my metric. My labels are binary masks with foreground represented as 1 and background as 0. I am currently overfitting a single example just to see how my model (a form of hyper-network) works, but as the Lovász hinge loss decreases, the output of iou_binary also decreases. Thanks a lot in advance for helping.

    wontfix 
    opened by m-hamza-mughal 1
  • Some TensorFlow implementation problems.

    hi, Maxim

    You present very interesting and solid work! But I met an implementation error while using your Lovász loss in my DeepLabv3+. My initial loss is tf.losses.softmax_cross_entropy, and I prepared:

    onehot_labels: [batch_size, num_classes] target one-hot-encoded labels; logits: [batch_size, num_classes] logit outputs of the network

    as its input, but this doesn't fit your loss directly. Could you please give some advice on how to convert these original parameters into your parameters, probas and labels? Thank you!
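
    A hedged sketch of what that conversion could look like for a segmentation head (hypothetical shapes; it assumes logits and onehot_labels are arranged as [batch, height, width, num_classes] and that lovasz_losses_tf.py is importable):

    import tensorflow as tf
    from lovasz_losses_tf import lovasz_softmax

    probas = tf.nn.softmax(logits, axis=-1)        # [B, H, W, C] class probabilities
    labels = tf.argmax(onehot_labels, axis=-1)     # [B, H, W] integer class ids
    loss = lovasz_softmax(probas, labels)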

    opened by ZanePenn 1
  • How to understand the lovasz_grad when gt_sorted class number>1?

    Hi @bermanmaxim, jaccard_loss = 1 - IoU, so why compute jaccard[1:] - jaccard[0:-1]?

    def lovasz_grad(gt_sorted):
        """
        Computes gradient of the Lovasz extension w.r.t sorted errors
        See Alg. 1 in paper
        """
        p = len(gt_sorted)
        gts = gt_sorted.sum()
        intersection = gts - gt_sorted.float().cumsum(0)
        union = gts + (1 - gt_sorted).float().cumsum(0)
        jaccard = 1. - intersection / union
        if p > 1: # cover 1-pixel case
            jaccard[1:p] = jaccard[1:p] - jaccard[0:-1]
        return jaccard
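
    (For context, a sketch of the reasoning rather than an authoritative reply: Alg. 1 evaluates the Jaccard loss 1 - IoU on growing prefixes of the ground truth sorted by decreasing error; the gradient of the Lovász extension is then the vector of successive differences of those prefix losses, i.e. g_1 = J({pi_1}) and g_i = J({pi_1, ..., pi_i}) - J({pi_1, ..., pi_(i-1)}) for i > 1. The line jaccard[1:p] = jaccard[1:p] - jaccard[0:-1] computes exactly those differences, so each entry measures how much including the i-th sorted error increases the Jaccard loss.)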
    
    opened by LeopoldACC 0
  • ModuleNotFoundError: No module named 'lovasz'

    pip install lovasz

    ERROR: Could not find a version that satisfies the requirement lovasz (from versions: none)
    ERROR: No matching distribution found for lovasz
    Note: you may need to restart the kernel to use updated packages.

    Please let me know the solution to this.
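
    The loss is not distributed as a pip package; a working approach (an assumption based on the "standalone" files described above, not an official install route) is to copy lovasz_losses.py (or lovasz_losses_tf.py) into your project and import from it directly:

    from lovasz_losses import lovasz_softmax, lovasz_hinge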

    opened by Taitai6521 0
  • how to combine lovasz hinge and bce in a binary segmentation task appropriately?

    Dear Berman Maxim, thanks for your great work; it has helped me a lot! I am confused about how to combine the Lovász hinge and BCE appropriately in a binary segmentation task. As we all know, the Lovász hinge expects logits (without sigmoid), but BCE needs the result after sigmoid. What confuses me is whether these two different types of losses (with/without sigmoid) can get along well. Other combo losses, e.g. BCE + Lovász-Softmax or BCE + Dice, all need sigmoid, so in my mind there is no problem there. Could you give me some advice about this? In addition, does per_image=False bring faster convergence when the batch size is big? Thanks.
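
    A minimal sketch of one way to combine them without a sigmoid mismatch (the 1:1 weighting is an arbitrary placeholder): nn.BCEWithLogitsLoss and lovasz_hinge both consume raw logits, so the sigmoid only happens inside the BCE term.

    import torch.nn as nn
    from lovasz_losses import lovasz_hinge

    bce = nn.BCEWithLogitsLoss()

    def bce_plus_lovasz_hinge(logits, masks):
        # logits, masks: [B, H, W]; masks contain {0, 1}
        return bce(logits, masks.float()) + lovasz_hinge(logits, masks)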

    opened by yu-Mas 1
  • multi classes with lovasz_hinge

    Hi, thanks for your great work, but I have a question. For a multi-class semantic segmentation task, I can convert the labels to one-hot format, apply sigmoid to the network output, and then use nn.BCELoss() on the labels and outputs (of course, one-hot + raw outputs + nn.BCEWithLogitsLoss is also fine). At inference I just apply torch.sigmoid to the outputs and set the threshold to 0.5, and I get correct segmentation results. So may I do the same thing with lovasz_hinge(): one-hot + raw (no sigmoid) outputs + lovasz_hinge? Does that work? And is the inference process the same as above?

    opened by hwh-hit 0
  • weird results

    Hi, thanks for your work. I have added the TensorFlow version of Lovász-Softmax to my task: I trained the model with cross-entropy loss first, then fine-tuned it with cross-entropy + Lovász-Softmax loss (weighted 1:1), and the mIoU improved by about 2%. But when tested on videos, the model without the Lovász-Softmax loss seems to perform better, especially on recall. Do you have any idea about this? Thank you.

    My task is 2-class lane segmentation.

    opened by phoenares 0