Pytorch implementation of convolutional neural network visualization techniques

Utku Ozbulak

Last update: Jan 3, 2023

Related tags

Deep Learning Model Explanation grad-cam pytorch segmentation gradient cam saliency deep-dream guided-backpropagation guided-grad-cam gradient-visualization cnn-visualization smooth-grad

Overview

Convolutional Neural Network Visualizations

This repository contains a number of convolutional neural network visualization techniques implemented in PyTorch.

Note: I removed cv2 dependencies and moved the repository towards PIL. A few things might be broken (although I tested all methods), I would appreciate if you could create an issue if something does not work.

Note: The code in this repository was tested with torch version 0.4.1 and some of the functions may not work as intended in later versions. Although it shouldn't be too much of an effort to make it work, I have no plans at the moment to make the code in this repository compatible with the latest version because I'm still using 0.4.1.

Implemented Techniques

Gradient visualization with vanilla backpropagation
Gradient visualization with guided backpropagation [1]
Gradient visualization with saliency maps [4]
Gradient-weighted class activation mapping [3] (Generalization of [2])
Guided, gradient-weighted class activation mapping [3]
Score-weighted class activation mapping [15] (Gradient-free generalization of [2])
Element-wise gradient-weighted class activation mapping [16]
Smooth grad [8]
CNN filter visualization [9]
Inverted image representations [5]
Deep dream [10]
Class specific image generation [4] [14]
Grad times image [12]
Integrated gradients [13]

General Information

Depending on the technique, the code uses pretrained AlexNet or VGG from the model zoo. Some of the code also assumes that the layers in the model are separated into two sections; features, which contains the convolutional layers and classifier, that contains the fully connected layer (after flatting out convolutions). If you want to port this code to use it on your model that does not have such separation, you just need to do some editing on parts where it calls model.features and model.classifier.

Every technique has its own python file (e.g. gradcam.py) which I hope will make things easier to understand. misc_functions.py contains functions like image processing and image recreation which is shared by the implemented techniques.

All images are pre-processed with mean and std of the ImageNet dataset before being fed to the model. None of the code uses GPU as these operations are quite fast for a single image (except for deep dream because of the example image that is used for it is huge). You can make use of gpu with very little effort. The example pictures below include numbers in the brackets after the description, like Mastiff (243), this number represents the class id in the ImageNet dataset.

I tried to comment on the code as much as possible, if you have any issues understanding it or porting it, don't hesitate to send an email or create an issue.

Below, are some sample results for each operation.

Gradient Visualization

	Target class: King Snake (56)	Target class: Mastiff (243)	Target class: Spider (72)
Original Image
Colored Vanilla Backpropagation
Vanilla Backpropagation Saliency
Colored Guided Backpropagation (GB)
Guided Backpropagation Saliency (GB)
Guided Backpropagation Negative Saliency (GB)
Guided Backpropagation Positive Saliency (GB)
Gradient-weighted Class Activation Map (Grad-CAM)
Gradient-weighted Class Activation Heatmap (Grad-CAM)
Gradient-weighted Class Activation Heatmap on Image (Grad-CAM)
Score-weighted Class Activation Map (Score-CAM)
Score-weighted Class Activation Heatmap (Score-CAM)
Score-weighted Class Activation Heatmap on Image (Score-CAM)
Colored Guided Gradient-weighted Class Activation Map (Guided-Grad-CAM)
Guided Gradient-weighted Class Activation Map Saliency (Guided-Grad-CAM)
Integrated Gradients (without image multiplication)

Hierarchical Gradient Visualization

LayerCAM [16] is a simple modification of Grad-CAM [3], which can generate reliable class activation maps from different layers. For the examples provided below, a pre-trained VGG16 was used.

	Class Activation Map	Class Activation HeatMap	Class Activation HeatMap on Image
LayerCAM (Layer 9)
LayerCAM (Layer 16)
LayerCAM (Layer 23)
LayerCAM (Layer 30)

Grad Times Image

Another technique that is proposed is simply multiplying the gradients with the image itself. Results obtained with the usage of multiple gradient techniques are below.

Vanilla Grad X Image
Guided Grad X Image
Integrated Grad X Image

Smooth Grad

Smooth grad is adding some Gaussian noise to the original image and calculating gradients multiple times and averaging the results [8]. There are two examples at the bottom which use vanilla and guided backpropagation to calculate the gradients. Number of images (n) to average over is selected as 50. σ is shown at the bottom of the images.

	Vanilla Backprop

	Guided Backprop

Convolutional Neural Network Filter Visualization

CNN filters can be visualized when we optimize the input image with respect to output of the specific convolution operation. For this example I used a pre-trained VGG16. Visualizations of layers start with basic color and direction filters at lower levels. As we approach towards the final layer the complexity of the filters also increase. If you employ external techniques like blurring, gradient clipping etc. you will probably produce better images.

Layer 2 (Conv 1-2)
Layer 10 (Conv 2-1)
Layer 17 (Conv 3-1)
Layer 24 (Conv 4-1)

Another way to visualize CNN layers is to to visualize activations for a specific input on a specific layer and filter. This was done in [1] Figure 3. Below example is obtained from layers/filters of VGG16 for the first image using guided backpropagation. The code for this opeations is in layer_activation_with_guided_backprop.py. The method is quite similar to guided backpropagation but instead of guiding the signal from the last layer and a specific target, it guides the signal from a specific layer and filter.

Input Image	Layer Vis. (Filter=0)	Filter Vis. (Layer=29)

Inverted Image Representations

I think this technique is the most complex technique in this repository in terms of understanding what the code does. It is mainly because of complex regularization. If you truly want to understand how this is implemented I suggest you read the second and third page of the paper [5], specifically, the regularization part. Here, the aim is to generate original image after nth layer. The further we go into the model, the harder it becomes. The results in the paper are incredibly good (see Figure 6) but here, the result quickly becomes messy as we iterate through the layers. This is because the authors of the paper tuned the parameters for each layer individually. You can tune the parameters just like the to ones that are given in the paper to optimize results for each layer. The inverted examples from several layers of AlexNet with the previous Snake picture are below.

Layer 0: Conv2d	Layer 2: MaxPool2d	Layer 4: ReLU

Layer 7: ReLU	Layer 9: ReLU	Layer 12: MaxPool2d

Deep Dream

Deep dream is technically the same operation as layer visualization the only difference is that you don't start with a random image but use a real picture. The samples below were created with VGG19, the produced result is entirely up to the filter so it is kind of hit or miss. The more complex models produce mode high level features. If you replace VGG19 with an Inception variant you will get more noticable shapes when you target higher conv layers. Like layer visualization, if you employ additional techniques like gradient clipping, blurring etc. you might get better visualizations.

Original Image
VGG19 Layer: 34 (Final Conv. Layer) Filter: 94
VGG19 Layer: 34 (Final Conv. Layer) Filter: 103

Class Specific Image Generation

This operation produces different outputs based on the model and the applied regularization method. Below, are some samples produced with VGG19 incorporated with Gaussian blur every other iteration (see [14] for details). The quality of generated images also depend on the model, AlexNet generally has green(ish) artifacts but VGGs produce (kind of) better images. Note that these images are generated with regular CNNs with optimizing the input and not with GANs.

Target class: Worm Snake (52) - (VGG19)	Target class: Spider (72) - (VGG19)

The samples below show the produced image with no regularization, l1 and l2 regularizations on target class: flamingo (130) to show the differences between regularization methods. These images are generated with a pretrained AlexNet.

No Regularization	L1 Regularization	L2 Regularization

Produced samples can further be optimized to resemble the desired target class, some of the operations you can incorporate to improve quality are; blurring, clipping gradients that are below a certain treshold, random color swaps on some parts, random cropping the image, forcing generated image to follow a path to force continuity.

Some of these techniques are implemented in generate_regularized_class_specific_samples.py (courtesy of alexstoken).

Requirements:

torch == 0.4.1
torchvision >= 0.1.9
numpy >= 1.13.0
matplotlib >= 1.5
PIL >= 1.1.7

Citation

If you find the code in this repository useful for your research consider citing it.

@misc{uozbulak_pytorch_vis_2021,
  author = {Utku Ozbulak},
  title = {PyTorch CNN Visualizations},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/utkuozbulak/pytorch-cnn-visualizations}},
  commit = {53561b601c895f7d7d5bcf5fbc935a87ff08979a}
}

References:

[1] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller. Striving for Simplicity: The All Convolutional Net, https://arxiv.org/abs/1412.6806

[2] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, A. Torralba. Learning Deep Features for Discriminative Localization, https://arxiv.org/abs/1512.04150

[3] R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, and D. Batra. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization, https://arxiv.org/abs/1610.02391

[4] K. Simonyan, A. Vedaldi, A. Zisserman. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, https://arxiv.org/abs/1312.6034

[5] A. Mahendran, A. Vedaldi. Understanding Deep Image Representations by Inverting Them, https://arxiv.org/abs/1412.0035

[6] H. Noh, S. Hong, B. Han, Learning Deconvolution Network for Semantic Segmentation https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Noh_Learning_Deconvolution_Network_ICCV_2015_paper.pdf

[7] A. Nguyen, J. Yosinski, J. Clune. Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images https://arxiv.org/abs/1412.1897

[8] D. Smilkov, N. Thorat, N. Kim, F. Viégas, M. Wattenberg. SmoothGrad: removing noise by adding noise https://arxiv.org/abs/1706.03825

[9] D. Erhan, Y. Bengio, A. Courville, P. Vincent. Visualizing Higher-Layer Features of a Deep Network https://www.researchgate.net/publication/265022827_Visualizing_Higher-Layer_Features_of_a_Deep_Network

[10] A. Mordvintsev, C. Olah, M. Tyka. Inceptionism: Going Deeper into Neural Networks https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html

[11] I. J. Goodfellow, J. Shlens, C. Szegedy. Explaining and Harnessing Adversarial Examples https://arxiv.org/abs/1412.6572

[12] A. Shrikumar, P. Greenside, A. Shcherbina, A. Kundaje. Not Just a Black Box: Learning Important Features Through Propagating Activation Differences https://arxiv.org/abs/1605.01713

[13] M. Sundararajan, A. Taly, Q. Yan. Axiomatic Attribution for Deep Networks https://arxiv.org/abs/1703.01365

[14] J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, Hod Lipson, Understanding Neural Networks Through Deep Visualization https://arxiv.org/abs/1506.06579

[15] H. Wang, Z. Wang, M. Du, F. Yang, Z. Zhang, S. Ding, P. Mardziel, X. Hu. Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks https://arxiv.org/abs/1910.01279

[16] P. Jiang, C. Zhang, Q. Hou, M. Cheng, Y. Wei. LayerCAM: Exploring Hierarchical Class Activation Maps for Localization http://mmcheng.net/mftp/Papers/21TIP_LayerCAM.pdf

Comments

Class Specific Image Generation (L1 and L2 Regularization)

Thank you so much for making this work open source! I really appreciate it.

I am trying to reproduce the results you've displayed. The code that is provided for this does very well for the no regularization case, but I cannot replicate the L1 and L2 regularization examples. How were these generated? Could the code for this also be kindly provided?

Once again, thank you so much for your work!

opened by arjung128 11
Sign error in CNN Layer Visualization

There is only a very minor sign error in the code. In the hook version you correctly minimize the negative mean, hence, maximizing the mean.

In the unhooked version however, the minus sign is missing, computing the input than activates the feature the least (line 102):

loss = torch.mean(self.conv_output)

Additionally, the code has to be modified to run the latest pytorch 0.4, where zero-dimensional tensor can no longer be indexed (line 101 and 63).

opened by McLawrence 11
Add blur, gradient clipping, and regularization to ClassSpecificImageGenerator

Thanks for a great visualization suite. I was having trouble getting a quality image from the class specific image generator so I implemented some of the techniques you suggested in the README and that are originally from Understanding Neural Networks Through Deep Visualization. These techniques greatly improved my visualizations.

Tested with torchvision 0.5.0 with model and image on cpu. To use GPU, will likely have to add logic to transfer image to cpu for PIL and then back to GPU for optimization, but I haven't looked in to this much.

opened by alexstoken 9
Class correctness in gradcam.py

There is a line in gradcam.py which says that if target_class = None then the target_class takes the argmax of the ouptut. Is it possible that the actual class might be different from the expected class?

opened by AND2797 8
too many indices for array

Traceback (most recent call last): File "X:/pytorch-cnn-visualizations-master/src/cnn_layer_visualization.py", line 130, in layer_vis.visualise_layer_without_hooks() File "X:/pytorch-cnn-visualizations-master/src/cnn_layer_visualization.py", line 105, in visualise_layer_without_hooks print('Iteration:', str(i), 'Loss:', "{0:.2f}".format(loss.data.numpy()[0])) IndexError: too many indices for array same Error in deep_dreem.py in exactly the same line but lint no 62, same in inverted_representation.py line 106 and generate_claass_specificsamples.py line 43

opened by shashanka300 8

Model does not have 'feature' attribute

Hi thanks for the code, I'm trying to visualise a trained ResNet18 model with the GradCam function. But ResNet models do not have the feature attribute as in VGGs. Should we use model.named_children() instead?

from gradcam import GradCam
grad_cam = GradCam(model, target_layer=7)

Error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-16-0a85f7f4c865> in <module>
      8 grad_cam = GradCam(model, target_layer=0)
      9 # Generate cam mask
---> 10 cam = grad_cam.generate_cam(x, target_class=None)
     11 # Save mask
     12 save_class_activation_images(original_image, cam, file_name_to_export)

/mnt/sdh/adam/visualisation/gradcam.py in generate_cam(self, input_image, target_class)
     56         # conv_output is the output of convolutions at specified layer
     57         # model_output is the final output of the model (1, 1000)
---> 58         conv_output, model_output = self.extractor.forward_pass(input_image)
     59         if target_class is None:
     60             target_class = np.argmax(model_output.data.numpy())

/mnt/sdh/adam/visualisation/gradcam.py in forward_pass(self, x)
     35         """
     36         # Forward pass on the convolutions
---> 37         conv_output, x = self.forward_pass_on_convolutions(x)
     38         x = x.view(x.size(0), -1)  # Flatten
     39         # Forward pass on the classifier

/mnt/sdh/adam/visualisation/gradcam.py in forward_pass_on_convolutions(self, x)
     23         """
     24         conv_output = None
---> 25         for module_pos, module in self.model.features._modules.items():
     26             x = module(x)  # Forward
     27             if int(module_pos) == self.target_layer:

/mnt/sdh/adam/adam_env/lib/python3.6/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
    589                 return modules[name]
    590         raise AttributeError("'{}' object has no attribute '{}'".format(
--> 591             type(self).__name__, name))
    592 
    593     def __setattr__(self, name, value):

AttributeError: 'ResNet' object has no attribute 'features'

opened by adamxyang 7

visualize model that not in the model zoo
HI! as you said:

If you want to port this code to use it on your model that does not have such separation, you just need to do some editing on parts where it calls model.features and model.classifier.

I try to modify the model.features or .classifier part , but i get confused how to do it. Below is part of my script , hope that you can give some details about how to visualize own trained model.

model =torch.load(model_save_path) for index,(layer,_) in enumerate(model.items()): # model.items() return the weight of this layer, eg, model.items()[0]=model.0.1.weight x = layer(x) ......

but i get the TypeError that the object is not callable. i wonder how to read the layer from the trained model that didn't have features attribute. thanks~
opened by visonpon 6
maximize instead of minimize output for filter visualization?

In https://github.com/utkuozbulak/pytorch-cnn-visualizations/blob/master/src/cnn_layer_visualization.py#L104, I see that the loss of your filter optimization is the mean value of a feature map. Isn't that the loss should be the negative of this, as we want to make the target filter with a higher output? Thanks.
bug

opened by zym1010 6
Reference for image clipping in image recreation step?
Hi, first of all, thanks for providing this repository! I really like it and compare it to my own implementation, trying to understand the algorithms. I am very new to this, so sorry if this is a stupid question, but could you/anyone point me to a reference of why you clip instead of normalizing the images during the recreation phase? The specific code lines are the following in the mis_functions.py:

recreated_im[recreated_im > 1] = 1 recreated_im[recreated_im < 0] = 0

Why shouldn't we rescale them to a range between 0 and 1?

Thanks :)
opened by kai-tub 5
question about guided backprop
Hi, thank you for this great repo. I'm confused when I'm reading guided_backprop.py. On line 70-73, the gradient of output is [0, 0, ..., 1, 0].

one_hot_output = torch.FloatTensor(1, model_output.size()[-1]).zero_() one_hot_output[0][target_class] = 1 # Backward pass model_output.backward(gradient=one_hot_output)

I changed target class while draw cat_dog_Guided_BP_color map. But I find no matter what class is set, it always looks the same.

For example: cat_dog_Guided_BP_color_target_class_10.jpg

cat_dog_Guided_BP_color_target_class_100.jpg

cat_dog_Guided_BP_color_target_class_243.jpg(ground truth)

cat_dog_Guided_BP_color_target_class_500.jpg

cat_dog_Guided_BP_color_target_class_890.jpg
opened by narrowsnap 5
guided_backprop register backward/forward hook seems to consume a lot of GPU memory

Hi, really a great job! I was trying to get a guided-gradcam visualization of a video. But I found as more images have been read, very soon my server shows CUDA out of memory error. Then I discovered that it is because register_backward/forward_hook consumes a lot of GPU memory. Is there any way I could fix this issue? Thanks in advance!

opened by Terahezi 5
Adding capability to choose device other than cpu and fixing/generalize
Reference Issues/PRs

None

What does this implementation fix?

These changes mainly focus on augmenting the code to make it more usable and generalize.

Run on GPU: Modern architectures are so deep and computation extensive, that running it only on CPU may always result in out of memory error. In this implementation, I have changed grad-cam, guided-backprop, integrated-gradient, layer-activation-with-guided-backprop, score-cam, and vanilla-backprop to be able to initialize with desired device ids as per our will. And then the model forward function and backward gradient calculations can be done on GPU devices. If nothing is provided at the time of initialization, the code simply follows the existing.

Introduction of forward hook: When we make models in Pytorch it is not essential that model class and the forward function contain the same layers. For example, often designers don't keep Flatten and Concat functions out of the class, And only perform them in forward section. So in Cam extractors, forward_pass_on_convolutions function looping over the layer of the model until the desired layer is reached may not depict the correct functionality of the forward function of the model. Hence we just added forward hook of the desired conv layer and let the model complete its forward pass without intervention. This way we can get the conv layer output as well as the correct output of the entire model in way more general way without the dependency of the forward function.

Other comments

No

enhancement
opened by arnabdas2019ovgu 1
image size problem

Hi，I find a bug in gradcam.py The Image package use w,h mode. However, the precessed image is a tensor with 1,c,h,w. So，the order of w and h need to be switched when using Image.resize.

Line 88，change cam = np.uint8(Image.fromarray(cam).resize((input_image.shape[2], input_image.shape[3]), Image.ANTIALIAS)) to cam = np.uint8(Image.fromarray(cam).resize((input_image.shape[3], input_image.shape[2]), Image.ANTIALIAS))

opened by zhaoxin111 6