Official PyTorch code for WACV 2022 paper "CFLOW-AD: Real-Time Unsupervised Anomaly Detection with Localization via Conditional Normalizing Flows"

Overview

CFLOW-AD: Real-Time Unsupervised Anomaly Detection with Localization via Conditional Normalizing Flows

WACV 2022 preprint: https://arxiv.org/abs/2107.12571

Abstract

Unsupervised anomaly detection with localization has many practical applications when labeling is infeasible and, moreover, when anomaly examples are completely missing in the train data. While recently proposed models for such data setup achieve high accuracy metrics, their complexity is a limiting factor for real-time processing. In this paper, we propose a real-time model and analytically derive its relationship to prior methods. Our CFLOW-AD model is based on a conditional normalizing flow framework adopted for anomaly detection with localization. In particular, CFLOW-AD consists of a discriminatively pretrained encoder followed by multi-scale generative decoders, where the latter explicitly estimate the likelihood of the encoded features. Our approach results in a computationally and memory-efficient model: CFLOW-AD is faster and smaller by a factor of 10x than prior state-of-the-art with the same input setting. Our experiments on the MVTec dataset show that CFLOW-AD outperforms previous methods by 0.36% AUROC in the detection task, and by 1.12% AUROC and 2.5% AUPRO in the localization task. We open-source our code with fully reproducible experiments.

BibTeX Citation

If you like our paper or code, please cite its WACV 2022 preprint using the following BibTeX:

@article{cflow_ad,
  title={CFLOW-AD: Real-Time Unsupervised Anomaly Detection with Localization via Conditional Normalizing Flows},
  author={Gudovskiy, Denis and Ishizaka, Shun and Kozuka, Kazuki},
  journal={arXiv:2107.12571},
  year={2021}
}

Installation

Install all packages with this command:

$ python3 -m pip install -U -r requirements.txt

Datasets

We support the MVTec AD dataset for anomaly localization in a factory setting and the Shanghai Tech Campus (STC) dataset with surveillance-camera videos. Please download the datasets from their URLs and either extract them to the data folder, make a symlink to that folder, or change the default data path in main.py.

Code Organization

  • ./custom_datasets - contains dataloaders for MVTec and STC
  • ./custom_models - contains pretrained feature extractors

Training Models

  • Run the code by selecting the class name, feature extractor, input size, flow model, etc.
  • The commands below should reproduce our reference MVTec results using the WideResNet-50 extractor:
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name bottle
python3 main.py --gpu 0 --pro -inp 256 --dataset mvtec --class-name cable
python3 main.py --gpu 0 --pro -inp 256 --dataset mvtec --class-name capsule
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name carpet
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name grid
python3 main.py --gpu 0 --pro -inp 256 --dataset mvtec --class-name hazelnut
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name leather
python3 main.py --gpu 0 --pro -inp 256 --dataset mvtec --class-name metal_nut
python3 main.py --gpu 0 --pro -inp 256 --dataset mvtec --class-name pill
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name screw
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name tile
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name toothbrush
python3 main.py --gpu 0 --pro -inp 128 --dataset mvtec --class-name transistor
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name wood
python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name zipper

Testing Pretrained Models

  • Download pretrained weights from Google Drive
  • The command below should reproduce MVTec results using the lightweight MobileNetV3L extractor (AUROC, AUPRO) = (98.38%, 94.72%):
python3 main.py --gpu 0 --pro -enc mobilenet_v3_large --dataset mvtec --action-type norm-test -inp INPUT --class-name CLASS --checkpoint PATH/FILE.PT
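
For example, to test a single class with a downloaded checkpoint (the input size, class name, and checkpoint path below are hypothetical placeholders; match -inp to the size the checkpoint was trained with):

python3 main.py --gpu 0 --pro -enc mobilenet_v3_large --dataset mvtec --action-type norm-test -inp 256 --class-name cable --checkpoint weights/mvtec_cable.pt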

CFLOW-AD Architecture

[Figure: CFLOW-AD architecture diagram]

Reference CFLOW-AD Results for MVTec

[Figure: reference CFLOW-AD results on MVTec]

Comments
  • How to run inference on a single image?

    Great job! I am new to anomaly detection. What I want to know is: in a real scenario, after I have trained a model on the bottle class in MVTec and I give it another bottle image, should the model output the class (normal or abnormal) and localize the abnormal area like the GT mask? Could you give some demo code for that (load a model and predict a single image)?
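
    There is no single-image demo in the repo, but a minimal sketch can be assembled from the test pipeline quoted in the issues below. It assumes encoder, decoders, pool_layers, the activation hook dict, and the config c are already set up as in this repo's train.py, and that positionalencoding2d and get_logp are the repo's own helpers; score_single_image is a hypothetical name. The returned map localizes the anomaly, and thresholding the image-level score (calibrated on normal validation images) gives the normal/abnormal decision:

    import torch
    import torch.nn.functional as F
    from PIL import Image
    from torchvision import transforms

    def score_single_image(path, encoder, decoders, pool_layers, activation,
                           c, P=128):  # P: positional-encoding channels
        # preprocessing consistent with ImageNet-pretrained extractors
        tfm = transforms.Compose([
            transforms.Resize(c.crp_size),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])])
        x = tfm(Image.open(path).convert('RGB')).unsqueeze(0).to(c.device)
        test_map = []
        with torch.no_grad():
            _ = encoder(x)  # forward hooks fill `activation` per pool layer
            for l, layer in enumerate(pool_layers):
                e = activation[layer]  # BxCxHxW features
                B, C, H, W = e.size()
                S, E = H * W, B * H * W
                # positionalencoding2d and get_logp are this repo's helpers
                p = positionalencoding2d(P, H, W).to(c.device).unsqueeze(0)
                c_r = p.reshape(B, P, S).transpose(1, 2).reshape(E, P)
                e_r = e.reshape(B, C, S).transpose(1, 2).reshape(E, C)
                z, log_jac_det = decoders[l](e_r, [c_r, ])
                log_prob = get_logp(C, z, log_jac_det) / C  # per-dim likelihood
                prob = torch.exp(log_prob - log_prob.max())  # shift into (0, 1]
                test_map.append(F.interpolate(
                    prob.reshape(-1, H, W).unsqueeze(1), size=c.crp_size,
                    mode='bilinear', align_corners=True).squeeze().cpu())
        score_map = sum(test_map)                     # high = likely normal
        anomaly_map = score_map.max() - score_map     # high = likely anomalous
        return anomaly_map, anomaly_map.max().item()  # pixel map, image score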

    opened by gewenpulan 6
  • Heat map visualization

    First of all, thank you for presenting your work as open source. When I tested the bottle class with the

    python3 main.py --gpu 0 --pro -enc mobilenet_v3_large --dataset mvtec --action-type norm-test -inp INPUT --class-name CLASS --checkpoint PATH/FILE.PT

    command, I got these results: DET_AUROC last: 100.00 max: 100.00 epoch_max: 0 | SEG_AUROC last: 98.93 max: 98.93 epoch_max: 0 | SEG_AUPRO last: 96.48 max: 96.48 epoch_max: 0.

    My question is: how can I display the heatmap on the images?
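
    One way to do this, outside the repo's code, is a simple matplotlib overlay of the anomaly map (super_mask in train.py) on the input image (a sketch; show_heatmap is a hypothetical helper):

    import matplotlib.pyplot as plt
    import numpy as np

    def show_heatmap(image, anomaly_map, alpha=0.5):
        # image: HxWx3 array; anomaly_map: HxW float array (e.g. super_mask[i])
        amap = (anomaly_map - anomaly_map.min()) / (np.ptp(anomaly_map) + 1e-8)
        plt.imshow(image)
        plt.imshow(amap, cmap='jet', alpha=alpha)  # semi-transparent overlay
        plt.axis('off')
        plt.show()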

    opened by OzanErcan6 5
  • Large test loss, and thresholds

    First, it's a great job; thanks for sharing. When I test bottle, the train loss is very small: 0.06 after 10 epochs. But the test loss grows quickly, to larger than 27000. At the 25th epoch, the test loss is 6636421 while the train loss is 0.05. Is this normal? I use the default settings.

    Then, a problem that has confused me for a long time: without GTs, how can I choose segmentation thresholds? Both the AUROC scores and F1 scores require gt_mask, but there is no GT information for new data with only normal samples. Could you give some ideas? Thanks!
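
    One common heuristic, not something this repo prescribes: calibrate the threshold on a held-out set of normal images only, e.g. by taking a high quantile of their pixel scores (a sketch; threshold_from_normals is a hypothetical helper):

    import numpy as np

    def threshold_from_normals(normal_maps, quantile=0.999):
        # normal_maps: anomaly maps computed on defect-free validation images
        scores = np.concatenate([m.ravel() for m in normal_maps])
        return np.quantile(scores, quantile)  # pixels above this are flagged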

    opened by letmejoin 5
  • RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient

    Hi! I hit RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient when tracing the encoder (-enc mobilenet_v3_large) from the CFLOW-AD network with the torch.jit.trace() function. I fed a simplified version of the test_meta_epoch function and a single image (a tensor of torch.Size([1, 3, 512, 512])) into the trace function. The script fails on the line _ = c.encoder(image).

    Could you explain what I should change in the code to resolve this?

    Traceback (most recent call last):
      File "convert_cflow-ad.py", line 187, in <module>
        traced_model = torch.jit.trace(test_model, image)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/jit/_trace.py", line 780, in trace
        traced = torch._C._create_function_from_trace(
      File "convert_cflow-ad.py", line 72, in test_model
        _ = c.encoder(image)  # BxCxHxW
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1039, in _slow_forward
        result = self.forward(*input, **kwargs)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
        input = module(input)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1039, in _slow_forward
        result = self.forward(*input, **kwargs)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
        input = module(input)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1039, in _slow_forward
        result = self.forward(*input, **kwargs)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 443, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 439, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient
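
    # Not an official fix, but this error usually means a traced tensor still
    # requires grad; freezing the encoder before tracing is a common
    # workaround (a sketch, assuming the encoder and sample image above):
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad_(False)
    traced = torch.jit.trace(encoder, image)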
    
    opened by volkov-maxim 2
  • Increase hyperparameter N

    Hi,

    When I increase hyperparameter N to 8192, for test_meta_epoch only, I get a 300% FPS increase and sum scores identical to N=256. I do not fully understand the code yet, so could you tell me whether this is wise to do?

    opened by rvermeire 2
  • Question about image resizing and evaluation methods in localization task

    I'm a student studying anomaly detection. Thank you for providing an interesting paper and well-designed code. I would like to ask about image resizing and evaluation methods in the localization task.

    The input image is resized during training in your code, and the resized size is specified when main.py is executed, e.g. python3 main.py --gpu 0 --pro -inp 128 --dataset mvtec --class-name transistor. The original size of the "transistor" class in the MVTec dataset is 1024, so I guess the image is resized from 1024 to 128 here.

    My question: are the predicted mask and the GT mask also resized at test time (like 1024 -> 128 for the "transistor" class)? If so, wouldn't reducing the resolution of the images used for testing artificially improve the evaluation values?

    I'd appreciate it if you could answer my questions.

    opened by shot1107 2
  • Why are data loading and training so slow?

    Hello, I am running the command below.

    python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name bottle
    

    Can you tell me why it takes more time to train than existing anomaly detection algorithms, and why the data loaders take so long? Thank you for sharing the code.

    opened by ingbeeedd 2
  • Calculate detection AUROC from anomaly map

    Is it possible to get the detection performance from the anomaly map (segmentation)? I mean, for example, taking the top-k highest anomaly scores in the anomaly map and using their mean as the image-level score for deciding anomalous or not.

    Is there any misunderstanding, or something I am missing, about calculating detection performance this way instead of computing it from the (1, 0) labels?

    I observe that this model performs better in segmentation than in detection; maybe because it is flow-based (generative), it behaves better at the pixel level?
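
    For reference, the pipeline quoted later on this page already derives the image-level score this way: score_label = np.max(super_mask). A top-k mean is a small variant (a sketch; anomaly_maps and gt_labels are hypothetical names for the per-image maps and binary labels):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    k = 100  # number of top pixels per image; a free choice
    image_scores = [np.sort(m.ravel())[-k:].mean() for m in anomaly_maps]
    det_auroc = roc_auc_score(gt_labels, image_scores)  # gt_labels in {0, 1}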

    opened by Howeng98 1
  • Inference time increases over iterations

    Hello @gudovskiy .

    I was measuring the processing time of this work on my GPU and noticed that the processing time per iteration increases. I wonder if there is some accumulation formula that could decrease the speed. You can see below:

    [Figure: per-iteration processing time increasing over iterations]

    I am sharing the code I am using below. I already saw that you have a similar method, but I want to measure the whole pipeline until the scores are computed.

    input_data = torch.rand_like(next(iter(loader))[0], requires_grad=False, device=torch.device('cuda'))
    
    #Warmup phase 
    ....
    # end of warmup
    
        torch.cuda.synchronize()
        print("Start timing...")
        timings = []
        with torch.no_grad():
            for i in tqdm(range(500), 'Measuring inference speed...'):
                start_time = time.time()
                _ = encoder(input_data)  # BxCxHxW
                # test decoder
                for l, layer in enumerate(pool_layers):
                    e = activation[layer]  # BxCxHxW
                    B, C, H, W = e.size()
                    S = H * W
                    E = B * S
                    if i == 0:  # get stats
                        height.append(H)
                        width.append(W)
                    p = positionalencoding2d(P, H, W).to(c.device).unsqueeze(0).repeat(B, 1, 1, 1)
                    c_r = p.reshape(B, P, S).transpose(1, 2).reshape(E, P)  # BHWxP
                    e_r = e.reshape(B, C, S).transpose(1, 2).reshape(E, C)  # BHWxC
                    decoder = decoders[l]
                    FIB = E // N + int(E % N > 0)  # number of fiber batches
                    for f in range(FIB):
                        if f < (FIB - 1):
                            idx = torch.arange(f * N, (f + 1) * N)
                        else:
                            idx = torch.arange(f * N, E)
                        c_p = c_r[idx]  # NxP
                        e_p = e_r[idx]  # NxC
                        z, log_jac_det = decoder(e_p, [c_p, ])
                        decoder_log_prob = get_logp(C, z, log_jac_det)
                        log_prob = decoder_log_prob / C  # likelihood per dim
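                # note: test_dist[l] is never reset inside the timing loop,
                # so this list grows across all 500 iterations and the tensor
                # built from it below gets larger each pass, which likely
                # explains the increasing per-iteration time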
                        test_dist[l] = test_dist[l] + log_prob.detach().cpu().tolist()
    
                test_map = [list() for p in pool_layers]
                for l, p in enumerate(pool_layers):
                    test_norm = torch.tensor(test_dist[l], dtype=torch.double)  # EHWx1
                    test_norm -= torch.max(test_norm)  # normalize likelihoods to (-Inf:0] by subtracting a constant
                    test_prob = torch.exp(test_norm)  # convert to probs in range [0:1]
                    test_mask = test_prob.reshape(-1, height[l], width[l])
                    test_map[l] = F.interpolate(test_mask.unsqueeze(1), size=c.crp_size, mode='bilinear', align_corners=True).squeeze().numpy()
    
                score_map = np.zeros_like(test_map[0])
                for l, p in enumerate(pool_layers):
                    score_map += test_map[l]
                score_mask = score_map
                super_mask = score_mask.max() - score_mask
                score_label = np.max(super_mask)
    
                torch.cuda.synchronize()
                end_time = time.time()
                timings.append(end_time - start_time)
                if i % 100 == 0:
                    print("It: {} average time: {}".format(i, np.mean(timings) * 1000))
    
        print("Input shape {}".format(input_data.shape))
        print("Average time {}".format(np.mean(timings) * 1000))
    

    Thanks

    opened by mjack3 1
  • Calculate detection AUROC

    On line 334 of the train file, the detection AUROC is calculated using the truth label and the score label. Why is the truth label boolean while the score label is a float? I'm trying to replicate this using the leather class, and the score label values are between [0.97, 2.7].
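
    That pairing is expected for ROC metrics: sklearn's roc_auc_score takes binary ground-truth labels plus continuous scores and sweeps the decision threshold internally, so the scores never need to be binarized or normalized to [0, 1]. A minimal sketch:

    from sklearn.metrics import roc_auc_score

    # gt_label: {0, 1} (or boolean) per image; score_label: arbitrary floats
    det_auroc = roc_auc_score(gt_label, score_label)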

    opened by jpmrs1313 1
  • Why is the number of channels divided here?

    According to your paper, what we should use is log(p_Z) in Equation 8. Why is log_prob divided by the number of channels here?

    https://github.com/gudovskiy/cflow-ad/blob/0d39848c988ad1c6b3c9b7f4bad40d52c1a75ceb/train.py#L77

    opened by mysephi 1
  • About the difficulties of exporting ONNX

    Hello Denis, thanks a lot for your proposed CFlow method! Do you have any plans to export the model to ONNX format? In my attempts to export, I had to use torch.jit.script as an intermediate step due to the presence of the loop in the forward code. But this seems to be difficult, and the FrEIA library always fails to export. Sincerely looking forward to your reply!

    opened by nullhd2 8
  • Tune parameters to get best results.

    Hello Denis, I have been studying and adapting this great repository to fit my needs in defect detection. I have trained several models successfully (using the mobilenetv3_large backbone) and achieved good results. However, I removed some parts of your code, including the snippets that calculate the seg_threshold parameter from the ground truth. Therefore, I am choosing it by hand (through trial and error), and although the results are OK, I think they can be further improved.

    My questions are:

    1. How can I choose a value for seg_threshold reliably in my case?
    2. What parameters do you recommend fine-tuning when training a new model (knowing that I have a good and balanced dataset of the same product but with different colours)?
    3. My last question is about exporting the model to ONNX format: do you have any comments on how to achieve that? Do you plan on adding that capability?
    4. When should I stop training?

    Thank you in advance, your work is truly inspiring.

    opened by Tekno-H 2
  • Detection AUROC and segmentation AUROC behavior

    Hello!

    I wonder if the detection AUROC and segmentation AUROC have the same behavior. In other words, when the model achieves its best seg_auroc, is that also when it has its best det_auroc? Or can the best det_auroc and the best seg_auroc occur in different epochs?

    P.S.: congrats on this work.

    opened by mjack3 1
  • Improper normalization of the scores?

    In train.py, you normalize the scores according to:

    test_map = [list() for p in pool_layers]
    for l, p in enumerate(pool_layers):
        test_norm = torch.tensor(test_dist[l], dtype=torch.double)  # EHWx1
    test_norm -= torch.max(test_norm)  # normalize likelihoods to (-Inf:0] by subtracting a constant
    test_prob = torch.exp(test_norm)  # convert to probs in range [0:1]
    test_mask = test_prob.reshape(-1, height[l], width[l])
        # upsample
        test_map[l] = F.interpolate(test_mask.unsqueeze(1),
            size=c.crp_size, mode='bilinear', align_corners=True).squeeze().numpy()
    # score aggregation
    score_map = np.zeros_like(test_map[0])
    for l, p in enumerate(pool_layers):
        score_map += test_map[l]
    

    This normalization is fine as long as it is done for only one map, since the normalization function is monotonically increasing. But when adding up the maps from the different layers, this makes no sense to me, because the relative weighting of the score maps in the aggregation (last line) depends on the test set, or more precisely on the maxima of the individual maps over the test set. Am I missing something here, or is this normalization improper?
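
    A tiny numeric illustration of this concern (hypothetical values): subtracting each layer's own test-set maximum rescales that layer's probabilities by exp(-max_l), so the layers' relative weights in the summed score map shift whenever the test set changes:

    import torch

    logp = torch.tensor([-2.0, -2.0])   # same pixel log-likelihood in two layers
    max_a = torch.tensor([-1.0, -0.5])  # per-layer maxima over test set A
    max_b = torch.tensor([-1.0, -1.5])  # per-layer maxima over test set B
    print(torch.exp(logp - max_a))      # tensor([0.3679, 0.2231])
    print(torch.exp(logp - max_b))      # tensor([0.3679, 0.6065]): weighting changed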

    opened by marco-rudolph 9