Official PyTorch code for WACV 2022 paper "CFLOW-AD: Real-Time Unsupervised Anomaly Detection with Localization via Conditional Normalizing Flows"

Overview

CFLOW-AD: Real-Time Unsupervised Anomaly Detection with Localization via Conditional Normalizing Flows

WACV 2022 preprint: https://arxiv.org/abs/2107.12571

Abstract

Unsupervised anomaly detection with localization has many practical applications when labeling is infeasible and, moreover, when anomaly examples are completely missing in the train data. While recently proposed models for such data setup achieve high accuracy metrics, their complexity is a limiting factor for real-time processing. In this paper, we propose a real-time model and analytically derive its relationship to prior methods. Our CFLOW-AD model is based on a conditional normalizing flow framework adopted for anomaly detection with localization. In particular, CFLOW-AD consists of a discriminatively pretrained encoder followed by multi-scale generative decoders, where the latter explicitly estimate the likelihood of the encoded features. Our approach results in a computationally and memory-efficient model: CFLOW-AD is faster and smaller by a factor of 10x than prior state-of-the-art with the same input setting. Our experiments on the MVTec dataset show that CFLOW-AD outperforms previous methods by 0.36% AUROC in the detection task, and by 1.12% AUROC and 2.5% AUPRO in the localization task. We open-source our code with fully reproducible experiments.

BibTeX Citation

If you like our paper or code, please cite its WACV 2022 preprint using the following BibTeX:

@article{cflow_ad,
  title={CFLOW-AD: Real-Time Unsupervised Anomaly Detection with Localization via Conditional Normalizing Flows},
  author={Gudovskiy, Denis and Ishizaka, Shun and Kozuka, Kazuki},
  journal={arXiv:2107.12571},
  year={2021}
}

Installation

Install all packages with this command:

$ python3 -m pip install -U -r requirements.txt

Datasets

We support the MVTec AD dataset for anomaly localization in a factory setting and the Shanghai Tech Campus (STC) dataset with surveillance-camera videos. Please download the datasets from their URLs and either extract them to the data folder, make a symlink to that folder, or change the default data path in main.py.

Code Organization

  • ./custom_datasets - contains dataloaders for MVTec and STC
  • ./custom_models - contains pretrained feature extractors

Training Models

  • Run the code by selecting the class name, feature extractor, input size, flow model, etc.
  • The commands below should reproduce our reference MVTec results using the WideResNet-50 extractor:
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name bottle
python3 main.py --gpu 0 --pro -inp 256 --dataset mvtec --class-name cable
python3 main.py --gpu 0 --pro -inp 256 --dataset mvtec --class-name capsule
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name carpet
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name grid
python3 main.py --gpu 0 --pro -inp 256 --dataset mvtec --class-name hazelnut
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name leather
python3 main.py --gpu 0 --pro -inp 256 --dataset mvtec --class-name metal_nut
python3 main.py --gpu 0 --pro -inp 256 --dataset mvtec --class-name pill
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name screw
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name tile
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name toothbrush
python3 main.py --gpu 0 --pro -inp 128 --dataset mvtec --class-name transistor
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name wood
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name zipper

Testing Pretrained Models

  • Download pretrained weights from Google Drive
  • The command below should reproduce MVTec results using the lightweight MobileNetV3L extractor (AUROC, AUPRO) = (98.38%, 94.72%):
python3 main.py --gpu 0 --pro -enc mobilenet_v3_large --dataset mvtec --action-type norm-test -inp INPUT --class-name CLASS --checkpoint PATH/FILE.PT
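
For example, to test a single class with a downloaded checkpoint (the input size, class name, and checkpoint path below are hypothetical placeholders; match -inp to the size the checkpoint was trained with):

python3 main.py --gpu 0 --pro -enc mobilenet_v3_large --dataset mvtec --action-type norm-test -inp 256 --class-name cable --checkpoint weights/mvtec_cable.pt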

CFLOW-AD Architecture

[Figure: CFLOW-AD architecture diagram]

Reference CFLOW-AD Results for MVTec

[Figure: reference CFLOW-AD results on MVTec]

Comments
  • How to run inference on a single image?

    Great job! I am new to anomaly detection. What I want to know is: in a real scenario, after I have trained a model on the bottle class in MVTec and I give it another bottle image, should the model output the class (normal or abnormal) and localize the abnormal area like the GT mask? Could you give some demo code for that (load a model and predict a single image)?
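
    There is no single-image demo in the repo, but a minimal sketch can be assembled from the test pipeline quoted in the issues below. It assumes encoder, decoders, pool_layers, the activation hook dict, and the config c are already set up as in this repo's train.py, and that positionalencoding2d and get_logp are the repo's own helpers; score_single_image is a hypothetical name. The returned map localizes the anomaly, and thresholding the image-level score (calibrated on normal validation images) gives the normal/abnormal decision:

    import torch
    import torch.nn.functional as F
    from PIL import Image
    from torchvision import transforms

    def score_single_image(path, encoder, decoders, pool_layers, activation,
                           c, P=128):  # P: positional-encoding channels
        # preprocessing consistent with ImageNet-pretrained extractors
        tfm = transforms.Compose([
            transforms.Resize(c.crp_size),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])])
        x = tfm(Image.open(path).convert('RGB')).unsqueeze(0).to(c.device)
        test_map = []
        with torch.no_grad():
            _ = encoder(x)  # forward hooks fill `activation` per pool layer
            for l, layer in enumerate(pool_layers):
                e = activation[layer]  # BxCxHxW features
                B, C, H, W = e.size()
                S, E = H * W, B * H * W
                # positionalencoding2d and get_logp are this repo's helpers
                p = positionalencoding2d(P, H, W).to(c.device).unsqueeze(0)
                c_r = p.reshape(B, P, S).transpose(1, 2).reshape(E, P)
                e_r = e.reshape(B, C, S).transpose(1, 2).reshape(E, C)
                z, log_jac_det = decoders[l](e_r, [c_r, ])
                log_prob = get_logp(C, z, log_jac_det) / C  # per-dim likelihood
                prob = torch.exp(log_prob - log_prob.max())  # shift into (0, 1]
                test_map.append(F.interpolate(
                    prob.reshape(-1, H, W).unsqueeze(1), size=c.crp_size,
                    mode='bilinear', align_corners=True).squeeze().cpu())
        score_map = sum(test_map)                     # high = likely normal
        anomaly_map = score_map.max() - score_map     # high = likely anomalous
        return anomaly_map, anomaly_map.max().item()  # pixel map, image score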

    opened by gewenpulan 6
  • Heat map visualization

    First of all, thank you for presenting your work as open source. When I tested the bottle class with the

    python3 main.py --gpu 0 --pro -enc mobilenet_v3_large --dataset mvtec --action-type norm-test -inp INPUT --class-name CLASS --checkpoint PATH/FILE.PT

    command, I got these results: DET_AUROC last: 100.00 max: 100.00 epoch_max: 0 | SEG_AUROC last: 98.93 max: 98.93 epoch_max: 0 | SEG_AUPRO last: 96.48 max: 96.48 epoch_max: 0.

    My question is: how can I display the heatmap on the images?
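
    One way to do this, outside the repo's code, is a simple matplotlib overlay of the anomaly map (super_mask in train.py) on the input image (a sketch; show_heatmap is a hypothetical helper):

    import matplotlib.pyplot as plt
    import numpy as np

    def show_heatmap(image, anomaly_map, alpha=0.5):
        # image: HxWx3 array; anomaly_map: HxW float array (e.g. super_mask[i])
        amap = (anomaly_map - anomaly_map.min()) / (np.ptp(anomaly_map) + 1e-8)
        plt.imshow(image)
        plt.imshow(amap, cmap='jet', alpha=alpha)  # semi-transparent overlay
        plt.axis('off')
        plt.show()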

    opened by OzanErcan6 5
  • Large test loss, and thresholds

    First, it's a great job; thanks for sharing. When I test bottle, the train loss is very small: 0.06 after 10 epochs. But the test loss grows quickly, to larger than 27000. At the 25th epoch, the test loss is 6636421 while the train loss is 0.05. Is this normal? I use the default settings.

    Then, a problem that has confused me for a long time: without GTs, how can I choose segmentation thresholds? Both the AUROC scores and F1 scores require gt_mask, but there is no GT information for new data with only normal samples. Could you give some ideas? Thanks!
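
    One common heuristic, not something this repo prescribes: calibrate the threshold on a held-out set of normal images only, e.g. by taking a high quantile of their pixel scores (a sketch; threshold_from_normals is a hypothetical helper):

    import numpy as np

    def threshold_from_normals(normal_maps, quantile=0.999):
        # normal_maps: anomaly maps computed on defect-free validation images
        scores = np.concatenate([m.ravel() for m in normal_maps])
        return np.quantile(scores, quantile)  # pixels above this are flagged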

    opened by letmejoin 5
  • RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient

    Hi! I hit RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient when tracing the encoder (-enc mobilenet_v3_large) from the CFLOW-AD network with the torch.jit.trace() function. I fed a simplified version of the test_meta_epoch function and a single image (a tensor of torch.Size([1, 3, 512, 512])) into the trace function. The script fails on the line _ = c.encoder(image).

    Could you explain what I should change in the code to resolve this?

    Traceback (most recent call last):
      File "convert_cflow-ad.py", line 187, in <module>
        traced_model = torch.jit.trace(test_model, image)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/jit/_trace.py", line 780, in trace
        traced = torch._C._create_function_from_trace(
      File "convert_cflow-ad.py", line 72, in test_model
        _ = c.encoder(image)  # BxCxHxW
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1039, in _slow_forward
        result = self.forward(*input, **kwargs)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
        input = module(input)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1039, in _slow_forward
        result = self.forward(*input, **kwargs)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
        input = module(input)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1039, in _slow_forward
        result = self.forward(*input, **kwargs)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 443, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 439, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient
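
    # Not an official fix, but this error usually means a traced tensor still
    # requires grad; freezing the encoder before tracing is a common
    # workaround (a sketch, assuming the encoder and sample image above):
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad_(False)
    traced = torch.jit.trace(encoder, image)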
    
    opened by volkov-maxim 2
  • Increase hyperparameter N

    Hi,

    When I increase hyperparameter N to 8192, for test_meta_epoch only, I get a 300% FPS increase and sum scores identical to N=256. I do not fully understand the code yet, so could you tell me whether this is wise to do?

    opened by rvermeire 2
  • Question about image resizing and evaluation methods in localization task

    I'm a student studying anomaly detection. Thank you for providing an interesting paper and well-designed code. I would like to ask about image resizing and evaluation methods in the localization task.

    The input image is resized during training in your code, and the resized size is specified when main.py is executed, e.g. python3 main.py --gpu 0 --pro -inp 128 --dataset mvtec --class-name transistor. The original size of the "transistor" class in the MVTec dataset is 1024, so I guess the image is resized from 1024 to 128 here.

    My question: are the predicted mask and the GT mask also resized at test time (like 1024 -> 128 for the "transistor" class)? If so, wouldn't reducing the resolution of the images used for testing artificially improve the evaluation values?

    I'd appreciate it if you could answer my questions.

    opened by shot1107 2
  • Why are data loading and training so slow?

    Hello, I am running the command below.

    python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name bottle
    

    Can you tell me why it takes more time to train than existing anomaly detection algorithms, and why the data loaders take so long? Thank you for sharing the code.

    opened by ingbeeedd 2
  • Calculate detection AUROC from anomaly map

    Is it possible to get the detection performance from the anomaly map (segmentation)? I mean, for example, taking the top-k highest anomaly scores in the anomaly map and using their mean as the image-level score for deciding anomalous or not.

    Is there any misunderstanding, or something I am missing, about calculating detection performance this way instead of computing it from the (1, 0) labels?

    I observe that this model performs better in segmentation than in detection; maybe because it is flow-based (generative), it behaves better at the pixel level?
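
    For reference, the pipeline quoted later on this page already derives the image-level score this way: score_label = np.max(super_mask). A top-k mean is a small variant (a sketch; anomaly_maps and gt_labels are hypothetical names for the per-image maps and binary labels):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    k = 100  # number of top pixels per image; a free choice
    image_scores = [np.sort(m.ravel())[-k:].mean() for m in anomaly_maps]
    det_auroc = roc_auc_score(gt_labels, image_scores)  # gt_labels in {0, 1}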

    opened by Howeng98 1
  • Inference time increases over iterations

    Hello @gudovskiy .

    I was measuring the processing time of this work on my GPU and noticed that the processing time per iteration increases. I wonder if there is some accumulation formula that could decrease the speed. You can see below:

    [Figure: per-iteration processing time increasing over iterations]

    I am sharing the code I am using below. I already saw that you have a similar method, but I want to measure the whole pipeline until the scores are computed.

    input_data = torch.rand_like(next(iter(loader))[0], requires_grad=False, device=torch.device('cuda'))
    
    #Warmup phase 
    ....
    # end of warmup
    
        torch.cuda.synchronize()
        print("Start timing...")
        timings = []
        with torch.no_grad():
            for i in tqdm(range(500), 'Measuring inference speed...'):
                start_time = time.time()
                _ = encoder(input_data)  # BxCxHxW
                # test decoder
                for l, layer in enumerate(pool_layers):
                    e = activation[layer]  # BxCxHxW
                    B, C, H, W = e.size()
                    S = H * W
                    E = B * S
                    if i == 0:  # get stats
                        height.append(H)
                        width.append(W)
                    p = positionalencoding2d(P, H, W).to(c.device).unsqueeze(0).repeat(B, 1, 1, 1)
                    c_r = p.reshape(B, P, S).transpose(1, 2).reshape(E, P)  # BHWxP
                    e_r = e.reshape(B, C, S).transpose(1, 2).reshape(E, C)  # BHWxC
                    decoder = decoders[l]
                    FIB = E // N + int(E % N > 0)  # number of fiber batches
                    for f in range(FIB):
                        if f < (FIB - 1):
                            idx = torch.arange(f * N, (f + 1) * N)
                        else:
                            idx = torch.arange(f * N, E)
                        c_p = c_r[idx]  # NxP
                        e_p = e_r[idx]  # NxC
                        z, log_jac_det = decoder(e_p, [c_p, ])
                        decoder_log_prob = get_logp(C, z, log_jac_det)
                        log_prob = decoder_log_prob / C  # likelihood per dim
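                # note: test_dist[l] is never reset inside the timing loop,
                # so this list grows across all 500 iterations and the tensor
                # built from it below gets larger each pass, which likely
                # explains the increasing per-iteration time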
                        test_dist[l] = test_dist[l] + log_prob.detach().cpu().tolist()
    
                test_map = [list() for p in pool_layers]
                for l, p in enumerate(pool_layers):
                    test_norm = torch.tensor(test_dist[l], dtype=torch.double)  # EHWx1
                    test_norm -= torch.max(test_norm)  # normalize likelihoods to (-Inf:0] by subtracting a constant
                    test_prob = torch.exp(test_norm)  # convert to probs in range [0:1]
                    test_mask = test_prob.reshape(-1, height[l], width[l])
                    test_map[l] = F.interpolate(test_mask.unsqueeze(1), size=c.crp_size, mode='bilinear', align_corners=True).squeeze().numpy()
    
                score_map = np.zeros_like(test_map[0])
                for l, p in enumerate(pool_layers):
                    score_map += test_map[l]
                score_mask = score_map
                super_mask = score_mask.max() - score_mask
                score_label = np.max(super_mask)
    
                torch.cuda.synchronize()
                end_time = time.time()
                timings.append(end_time - start_time)
                if i % 100 == 0:
                    print("It: {} average time: {}".format(i, np.mean(timings) * 1000))
    
        print("Input shape {}".format(input_data.shape))
        print("Average time {}".format(np.mean(timings) * 1000))
    

    Thanks

    opened by mjack3 1
  • Calculate detection AUROC

    On line 334 of the train file, the detection AUROC is calculated using the truth label and the score label. Why is the truth label boolean while the score label is a float? I'm trying to replicate this using the leather class, and the score label values are between [0.97, 2.7].
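
    That pairing is expected for ROC metrics: sklearn's roc_auc_score takes binary ground-truth labels plus continuous scores and sweeps the decision threshold internally, so the scores never need to be binarized or normalized to [0, 1]. A minimal sketch:

    from sklearn.metrics import roc_auc_score

    # gt_label: {0, 1} (or boolean) per image; score_label: arbitrary floats
    det_auroc = roc_auc_score(gt_label, score_label)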

    opened by jpmrs1313 1
  • Why is the number of channels divided here?

    According to your paper, what we should use is log(p_Z) in Equation 8. Why is log_prob divided by the number of channels here?

    https://github.com/gudovskiy/cflow-ad/blob/0d39848c988ad1c6b3c9b7f4bad40d52c1a75ceb/train.py#L77

    opened by mysephi 1
  • About the difficulties of exporting ONNX

    Hello Denis, thanks a lot for your proposed CFlow method! Do you have any plans to export the model to ONNX format? In my attempts to export, I had to use torch.jit.script as an intermediate step due to the presence of the loop in the forward code. But this seems to be difficult, and the FrEIA library always fails to export. Sincerely looking forward to your reply!

    opened by nullhd2 8
  • Tune parameters to get best results.

    Hello Denis, I have been studying and adapting this great repository to fit my needs in defect detection. I have trained several models successfully (using the mobilenetv3_large backbone) and achieved good results. However, I removed some parts of your code, including the snippets that calculate the seg_threshold parameter from the ground truth. Therefore, I am choosing it by hand (through trial and error), and although the results are OK, I think they can be further improved.

    My questions are:

    1. How can I choose a value for seg_threshold reliably in my case?
    2. What parameters do you recommend fine-tuning when training a new model (knowing that I have a good and balanced dataset of the same product but with different colours)?
    3. My last question is about exporting the model to ONNX format: do you have any comments on how to achieve that? Do you plan on adding that capability?
    4. When should I stop training?

    Thank you in advance, your work is truly inspiring.

    opened by Tekno-H 2
  • Detection AUROC and segmentation AUROC behavior

    Hello!

    I wonder if the detection AUROC and segmentation AUROC have the same behavior. In other words, when the model achieves its best seg_auroc, is that also when it has its best det_auroc? Or can the best det_auroc and the best seg_auroc occur in different epochs?

    P.S.: congrats on this work.

    opened by mjack3 1
  • Improper normalization of the scores?

    In train.py, you normalize the scores according to:

    test_map = [list() for p in pool_layers]
    for l, p in enumerate(pool_layers):
        test_norm = torch.tensor(test_dist[l], dtype=torch.double)  # EHWx1
    test_norm -= torch.max(test_norm)  # normalize likelihoods to (-Inf:0] by subtracting a constant
    test_prob = torch.exp(test_norm)  # convert to probs in range [0:1]
    test_mask = test_prob.reshape(-1, height[l], width[l])
        # upsample
        test_map[l] = F.interpolate(test_mask.unsqueeze(1),
            size=c.crp_size, mode='bilinear', align_corners=True).squeeze().numpy()
    # score aggregation
    score_map = np.zeros_like(test_map[0])
    for l, p in enumerate(pool_layers):
        score_map += test_map[l]
    

    This normalization is fine as long as it is done for only one map, since the normalization function is monotonically increasing. But when adding up the maps from the different layers, this makes no sense to me, because the relative weighting of the score maps in the aggregation (last line) depends on the test set, or more precisely on the maxima of the individual maps over the test set. Am I missing something here, or is this normalization improper?
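
    A tiny numeric illustration of this concern (hypothetical values): subtracting each layer's own test-set maximum rescales that layer's probabilities by exp(-max_l), so the layers' relative weights in the summed score map shift whenever the test set changes:

    import torch

    logp = torch.tensor([-2.0, -2.0])   # same pixel log-likelihood in two layers
    max_a = torch.tensor([-1.0, -0.5])  # per-layer maxima over test set A
    max_b = torch.tensor([-1.0, -1.5])  # per-layer maxima over test set B
    print(torch.exp(logp - max_a))      # tensor([0.3679, 0.2231])
    print(torch.exp(logp - max_b))      # tensor([0.3679, 0.6065]): weighting changed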

    opened by marco-rudolph 9