Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2020

Overview

UvA Deep Learning Tutorials

Note: To look at the notebooks in a nicer format, visit our RTD website: https://uvadlc-notebooks.readthedocs.io/en/latest/

Course website: https://uvadlc.github.io/
Course edition: Fall 2020 (Oct. 26 - Dec. 14)
Recordings: YouTube Playlist
Author: Phillip Lippe

For this year's course edition, we created a series of Jupyter notebooks that are designed to help you understand the "theory" from the lectures by seeing corresponding implementations. We will cover various topics such as optimization techniques, graph neural networks, adversarial attacks and normalizing flows (for a full list, see below). The notebooks are there to help you understand the material and teach you details of the PyTorch framework, including PyTorch Lightning.

The notebooks are presented in the second hour of each lecture slot. During the tutorial sessions, we will present the content and explain the implementation of the notebooks. You can decide for yourself whether you just want to look at the filled notebook, want to try it yourself, or code along during the practical session. There are no mandatory assignments on which you would be graded. However, we encourage you to get familiar with the notebooks and experiment with or extend them yourself.

How to run the notebooks

On this website, you will find the notebooks exported into an HTML format so that you can read them from whatever device you prefer. However, we suggest that you also give them a try and run them yourself. There are three main ways of running the notebooks that we recommend:

  • Locally on CPU: All notebooks are stored in the GitHub repository that also builds this website. You can find them here: https://github.com/phlippe/uvadlc_notebooks/tree/master/docs/tutorial_notebooks. The notebooks are designed so that you can execute them on common laptops without the need for a GPU. We provide pretrained models that are automatically downloaded when running the notebooks, or can be downloaded manually from this Google Drive. The required disk space for the pretrained models and datasets is less than 1GB. To ensure that you have all the right Python packages installed, we provide a conda environment in the same repository.

  • Google Colab: If you prefer to run the notebooks on a different platform than your own computer, or want to experiment with GPU support, we recommend using Google Colab. Each notebook on this documentation website has a badge with a link to open it on Google Colab. Remember to enable GPU support before running the notebook (Runtime -> Change runtime type). Each notebook can be executed independently and doesn't require you to connect your Google Drive. However, when closing the session, changes might be lost if you don't save the notebook to your local computer or copy it to your Google Drive beforehand.

  • Lisa cluster: If you want to train your own (larger) neural networks based on the notebooks, you can make use of the Lisa cluster. However, this is only suggested if you really want to train a new model; use the other two options to go through the discussion and analysis of the models. With a student account, Lisa might not allow you to run Jupyter notebooks directly on the gpu_shared partition. Instead, you can first convert a notebook to a script using jupyter nbconvert --to script ...ipynb, and then start a job on Lisa to run the script. A few pieces of advice for running on Lisa (see the sketch after this list):

    • Disable the tqdm statements in the notebook. Otherwise, your slurm output file might overflow and grow to several MB. In PyTorch Lightning, you can do this by setting progress_bar_refresh_rate=0 in the trainer.
    • Comment out the matplotlib plotting statements, or change plt.show() to plt.savefig(...).
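
    A minimal sketch of both adjustments (assuming PyTorch Lightning 1.x, where progress_bar_refresh_rate is a Trainer argument; the model and file names are hypothetical):

        import matplotlib
        matplotlib.use("Agg")  # headless backend, since cluster jobs have no display
        import matplotlib.pyplot as plt
        import pytorch_lightning as pl

        # Disable the tqdm progress bar so the slurm output file stays small
        trainer = pl.Trainer(progress_bar_refresh_rate=0, max_epochs=10)
        # trainer.fit(model)  # hypothetical LightningModule

        # Save figures to disk instead of opening an interactive window
        fig, ax = plt.subplots()
        ax.plot([0, 1], [1, 0])
        fig.savefig("loss_curve.png")  # instead of plt.show()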

Tutorial-Lecture alignment

We will discuss 12 tutorials in total, each focusing on a different aspect of Deep Learning. The tutorials are spread across lectures, and we tried to cover something from every area. You can align the tutorials with the lectures as follows:

  • Lecture 1: Introduction to Deep Learning

    • Guide 1: Working with the Lisa cluster
    • Tutorial 2: Introduction to PyTorch
  • Lecture 2: Modular Learning

    • Tutorial 3: Activation functions
  • Lecture 3: Deep Learning Optimizations

    • Tutorial 4: Optimization and Initialization
  • Lecture 4: Convolutional Neural Networks

  • Lecture 5: Modern ConvNets

    • Tutorial 5: Inception, ResNet and DenseNet
  • Lecture 6: Recurrent Neural Networks

    • Tutorial 6: Transformers and Multi-Head Attention
  • Lecture 7: Graph Neural Networks

    • Tutorial 7: Graph Neural Networks
  • Lecture 8: Deep Generative Models

    • Tutorial 8: Deep Energy Models
  • Lecture 9: Deep Variational Inference

    • Tutorial 9: Deep Autoencoders
  • Lecture 10: Generative Adversarial Networks

    • Tutorial 10: Adversarial Attacks
  • Lecture 11: Advanced Generative Models

    • Tutorial 11: Normalizing Flows
    • Tutorial 12: Autoregressive Image Modeling
  • Lecture 12: Deep Stochastic Models

  • Lecture 13: Bayesian Deep Learning

  • Lecture 14: Deep Dynamics

Feedback, Questions or Contributions

This is the first time we present these tutorials during the Deep Learning course. As with any other project, small bugs and issues are expected. We appreciate any feedback from students, whether it is about a spelling mistake, an implementation bug, or suggestions for improvements/additions to the notebooks. Please use the following link to submit feedback, or feel free to reach out to me directly by mail (p dot lippe at uva dot nl), or grab me during any TA session.

Comments
  • Doubt in GNN tutorial

    Tutorial 7 - Hey, thanks for these wonderful notebooks. I was going through the GNN code, especially the GAT section, and just to be sure I understood everything correctly, I replicated everything for your special case.

        a_input = torch.cat([torch.index_select(input=node_feats_flat, index=edge_indices_row, dim=0),
                             torch.index_select(input=node_feats_flat, index=edge_indices_col, dim=0)],
                            dim=-1)
    
        # Calculate attention MLP output (independent for each head)
        attn_logits = torch.einsum('bhc,hc->bh', a_input, self.a)
        attn_logits = self.leakyrelu(attn_logits)
    

    In this line you stack the features of the nodes in each edge, so say we have two nodes i, j: we get a 2x2 matrix corresponding to them, represented as [[a, b], [c, d]], and we have the attention weights [[w, x], [y, z]]. When we do the einsum operation, we get a 2x1 matrix [[aw+bx, cy+dz]]. This is the first doubt: shouldn't it be [aw+bx+cy+dz] according to the equation, as for each i, j we have one value? If we have two heads, then the a matrix should have shape 2x2*d, where d=2 for our case.
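
    A quick way to settle what that einsum computes is a self-contained check with made-up numbers (none of these values come from the tutorial); 'bhc,hc->bh' gives each head its own dot product, i.e. one logit per edge per head:

        import torch

        # Made-up example: 1 edge, 2 heads, 2 concatenated feature dims per head.
        a_input = torch.tensor([[[1., 2.],      # head 0 sees [a, b]
                                 [3., 4.]]])    # head 1 sees [c, d]
        a = torch.tensor([[0.5, 0.5],           # head 0 weights [w, x]
                          [1.0, -1.0]])         # head 1 weights [y, z]
        print(torch.einsum('bhc,hc->bh', a_input, a))  # tensor([[ 1.5000, -1.0000]])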

    But going further down, keeping the same calculations as above, after the attention probabilities:

        node_feats = torch.einsum('bijh,bjhc->bihc', attn_probs, node_feats)

    which can be expanded into the following, where ap is the attention probabilities and feats is the node features after the linear projection:

        for i in range(4):
            p1 = ap[i, :, 0]
            p2 = ap[i, :, 1]

            f1 = feats[:, 0, :].squeeze()  # dimension 1
            f2 = feats[:, 1, :].squeeze()  # dimension 2
            # shape check: p1.shape, f1.shape, f2.shape

            print(torch.tensor([(torch.dot(p1, f1), torch.dot(p2, f2))]))
    

    we see that the result is obtained by taking the two different probabilities from different heads and taking the dot product with two different dimensions of the feature matrix, but each head should operate on both dimensions of the node features, or at least I hope it should. I checked the output at each intermediate stage to be sure that it matches the results from the notebook you provide. Am I missing something?
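
    For reference, the broadcasting of that line can be checked in isolation (shapes are made up; this only demonstrates the PyTorch semantics, not whether they match the GAT equations):

        import torch

        # 'bijh,bjhc->bihc' sums over neighbors j, so head h weighs head h's own
        # c_out-dimensional feature slice -- each head has its own feature space.
        batch, nodes, heads, c_out = 1, 4, 2, 2
        attn_probs = torch.softmax(torch.randn(batch, nodes, nodes, heads), dim=2)
        node_feats = torch.randn(batch, nodes, heads, c_out)
        out = torch.einsum('bijh,bjhc->bihc', attn_probs, node_feats)
        print(out.shape)  # torch.Size([1, 4, 2, 2])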

    question 
    opened by anorak94 7
  • Specifying the mask in Tutorial 6 (MHA)

    Tutorial: 6

    Describe the bug This is more of a clarification question than a bug. First of all, thanks for the excellent tutorial documentation. It's been very clear overall.

    The reason I'm reaching out is to ask if a little more explanation could be provided on how and where to insert and apply the key padding mask to the attention_weights. Specifically, I have a Tensor of the form [True True True False False] for every sequence in the batch ([Batch, SeqLen]), with False marking padding tokens.

    However, scaled_dot_product shown below wants the mask to have the following dimensions: [Batch, Head, SeqLen, SeqLen]. To this end, I have simply expanded the key padding mask in the row dimension (using key_padding_mask.view(bsz, 1, 1, seqlen).expand(-1, num_heads, seqlen, -1)), yielding the following square [SeqLen, SeqLen] mask for a sequence:

    [[True True True False False], [True True True False False], [True True True False False], [True True True False False], [True True True False False]]

    I do this somewhere upstream, in the forward definition of TransformerPredictor. Next, the same mask is fed all the way down to scaled_dot_product where it is then used to mask out False tokens, rendering the attn_logits -9e15 where there used to be a False. However, in contrast to a previous attempt using length-normalized sequences, the model does not manage to learn. This makes me wonder whether the above implementation is not how it was meant to be designed. Am I missing anything important here?

        def scaled_dot_product(q, k, v, mask=None):
            d_k = q.size()[-1]
            attn_logits = torch.matmul(q, k.transpose(-2, -1))
            attn_logits = attn_logits / math.sqrt(d_k)
            if mask is not None:
                attn_logits = attn_logits.masked_fill(mask == 0, -9e15)
            attention = F.softmax(attn_logits, dim=-1)
            values = torch.matmul(attention, v)
            return values, attention
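
    For concreteness, the expansion described in the question can be written as a standalone sketch (sizes are made up; True marks real tokens, False marks padding):

        import torch

        # Broadcast a [Batch, SeqLen] key padding mask to the
        # [Batch, Head, SeqLen, SeqLen] shape scaled_dot_product expects.
        bsz, num_heads, seqlen = 2, 4, 5
        key_padding_mask = torch.tensor([[True, True, True, False, False],
                                         [True, True, False, False, False]])
        mask = key_padding_mask.view(bsz, 1, 1, seqlen).expand(-1, num_heads, seqlen, -1)
        print(mask.shape)  # torch.Size([2, 4, 5, 5]): every query row gets the same key mask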

    question 
    opened by StolkArjen 5
  • Tutorial 11 : Dequantization and quantization process

    Thank you for your great tutorials!

    I'm trying tutorial 11 and have two questions on the dequantization and the quantization process (code in the 6th - 8th cells).

    • You mentioned, between the 7th and 8th cells, that the test fails because of numerical inaccuracy. Is that really correct?

    I found 3 ldj updates for the dequantization process and 2 for the quantization process. I guess this means the quantization process is not theoretically the inverse of the dequantization process. This is because of the scaling that keeps values away from the boundaries 0 and 1, in the sigmoid function of the dequantization process in the 6th cell:

    z = z * (1 - self.alpha) + 0.5 * self.alpha

    I added code to the sigmoid function for the quantization process:

        ldj -= np.log(1 - self.alpha) * np.prod(z.shape[1:])
        z = (z - 0.5 * self.alpha) / (1 - self.alpha)

    With this code, the test succeeded.
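
    A self-contained check of this pairing (tensor sizes are made up; alpha and the scaling line follow the issue) shows that the extra ldj term makes the quantization pass exactly invert the dequantization scaling:

        import numpy as np
        import torch

        alpha = 1e-5
        z = torch.rand(2, 1, 4, 4)
        ldj = torch.zeros(z.shape[0])

        # dequantization direction: scale away from the boundaries 0 and 1
        ldj += np.log(1 - alpha) * np.prod(z.shape[1:])
        z_deq = z * (1 - alpha) + 0.5 * alpha

        # quantization direction: undo the scaling and subtract its ldj contribution
        ldj -= np.log(1 - alpha) * np.prod(z.shape[1:])
        z_inv = (z_deq - 0.5 * alpha) / (1 - alpha)

        print(torch.allclose(z, z_inv), float(ldj.abs().max()))  # True 0.0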

    Smaller values (z < self.alpha) can also be shifted to z = self.alpha, I guess. This does not require an ldj update. And, of course, because the test failure is not serious and the ldj update is very small, we can ignore this.

    • The figure, the output of the 8th cell, shows the probability distribution after dequantization. Is the figure correct?

    The area of -0.5 < z < 0.5 is larger than 1.5, I guess. That means the total area is much larger than 1. And I found that the plotted "prob" array in the 8th cell is [1, 1, ..., 1]. In the cell, the prior array is assumed to be 1 for each value, meaning a uniform distribution. So, the "prob" array should be normalized when multiplied by the prior:

    prob = prob * prior[out] / quants

    Theoretically, the figure should show prob = e^{-z}/(1+e^{-z})^2. The output of the modification looks like e^{-z}/(1+e^{-z})^2.

    Thank you.

    bug question 
    opened by sy-eng 4
  • Tutorial 12: Vertical and horizontal convolution stacks

    Hi, thanks for sharing such a great notebook! I have a question about the vertical and horizontal convolution stacks in tutorial 12. Based on your explanation:

    The vertical convolution is not allowed to work on features from the horizontal convolution. In the feature map of the horizontal convolutions, a pixel contains information about all of the "true" pixels on the left. If we apply a vertical convolution which also uses features from the right, we effectively expand our receptive field to the true input which we want to prevent. Thus, the feature maps can only be merged for the horizontal convolution.

    I'm still confused about why, for the horizontal convolution, we need to add horiz_conv(horiz_img) + vert_img, but for the vertical convolution, we only need vert_conv(vert_img).

    I would appreciate it if you could explain more about this!

    question 
    opened by WNZhao1988 4
  • Loss should be real - fake in Tutorial 8

    Tutorial: 8

    Describe the bug The loss function used in the implementation of 'DeepEnergyModel()' in tutorial 8 (cdiv_loss) does not match the loss described earlier in the algorithm. It should be: cdiv_loss = real_out.mean() - fake_out.mean(), not the other way around.

    To Reproduce (if any steps necessary) Steps to reproduce the behavior:

    1. Go to Tutorial 8
    2. Scroll down to Training Algorithm
    3. See error in Algorithm 2 vs code in cell 6.

    Expected behavior cdiv_loss = real_out.mean() - fake_out.mean()
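
    Worth noting that the sign hinges on whether the network output is read as the energy E(x) or as a score -E(x); a minimal sketch with hypothetical outputs showing the two equivalent conventions:

        import torch

        # Hypothetical model outputs on real and sampled ("fake") batches.
        real_out = torch.randn(8)
        fake_out = torch.randn(8)

        # If the network outputs f(x) = -E(x), minimizing this pushes real scores up:
        cdiv_loss_score = fake_out.mean() - real_out.mean()
        # If the network outputs f(x) = E(x), the same objective reads the other way:
        cdiv_loss_energy = real_out.mean() - fake_out.mean()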

    question 
    opened by najwalb 3
  • Train vs Test dataset reconstructions for Autoencoder

    In Tutorial 9, when comparing latent dimensionality, the plots show reconstruction results on the train dataset. If the model overfits (for example, if someone decides to make the model bigger), train image reconstruction quality might become misleading. I recommend using test dataset images for that experiment.

    bug 
    opened by michaelklachko 3
  • Dataset unavailable

    Hi,

    the dataset (https://surfdrive.surf.nl/files/index.php/s/6YWMO1eiVXI4EkB/download) that is used in https://github.com/phlippe/uvadlc_notebooks/blob/master/docs/tutorial_notebooks/DL2/sampling/graphs.ipynb seems to be unavailable. Is there any way to still download (or generate) it?

    Best and thanks for your help, Gerrit

    bug 
    opened by gerritgr 3
  • Multihead Attention

    It seems like the implementation of MultiheadAttention is not consistent with the "Multi-Head Attention" figure. In particular, the projection self.qkv_proj = nn.Dense(3*self.embed_dim, ...) should actually be self.qkv_proj = nn.Dense(3*self.embed_dim*self.num_heads, ...). Am I missing something?

    [This would also require changing the line values = values.reshape(batch_size, seq_length, self.embed_dim) to values = values.reshape(batch_size, seq_length, -1).] Thanks.
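
    One way to check the convention (made-up sizes; PyTorch used here in place of Flax for a self-contained sketch): if embed_dim is the total width across all heads, with head_dim = embed_dim // num_heads, then a single 3*embed_dim projection already yields q, k, v for every head:

        import torch
        import torch.nn as nn

        batch_size, seq_length, embed_dim, num_heads = 2, 5, 12, 3
        head_dim = embed_dim // num_heads
        qkv_proj = nn.Linear(embed_dim, 3 * embed_dim)

        x = torch.randn(batch_size, seq_length, embed_dim)
        qkv = qkv_proj(x).reshape(batch_size, seq_length, num_heads, 3 * head_dim)
        q, k, v = qkv.chunk(3, dim=-1)  # each: [Batch, SeqLen, Head, head_dim]
        print(q.shape)  # torch.Size([2, 5, 3, 4])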

    opened by ofermeshi 2
  • Question regarding image transforms in SimCLR Tutorial

    In the Tutorial 17 SimCLR implementation, you mentioned that you didn't use color distortion in the train image transforms, because it changes the color distribution, which is an important feature for classification. But you used RandomGrayscale(p=0.2) in the train image transforms. Converting an RGB image to a grayscale image changes the color distribution, right?

    Also, can you point to the resource which says that color distribution is an important feature?

    bug 
    opened by BalajiAI 2
  • Tutorial 11 : Runtime error in train_flow function

    Thank you for your great tutorials!

    I tried tutorial 11 on my laptop (it has no GPU) and got a runtime error in the train_flow function. The error message said that map_location in torch.load should be set, so I modified ckpt = torch.load(pretrained_filename) to ckpt = torch.load(pretrained_filename, map_location=device).

    I guess this modification is good for PCs without a GPU.
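
    A minimal, runnable illustration of that fix (the checkpoint file name is hypothetical):

        import torch

        # Save a toy checkpoint, then load it with storages remapped to the
        # current device -- what map_location does for GPU-trained checkpoints.
        torch.save({"w": torch.randn(3)}, "checkpoint.ckpt")
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        ckpt = torch.load("checkpoint.ckpt", map_location=device)
        print(ckpt["w"].device)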

    Thank you.

    opened by sy-eng 2
  • Tutorial 6: error in the `MultiheadAttention.forward` method

    Tutorial: 6

    Describe the bug In the MultiheadAttention.forward method, the line:

            values = values.reshape(batch_size, seq_length, embed_dim)
    

    should read:

            values = values.reshape(batch_size, seq_length, self.embed_dim)
    

    The embed_dim should not come from the input tensor, i.e. instead of:

            batch_size, seq_length, embed_dim = x.size()
    

    we should probably have something like:

            batch_size, seq_length, _ = x.size()
    

    or

            batch_size, seq_length, input_dim = x.size()
    

    To Reproduce (if any steps necessary) Steps to reproduce the behavior:

    1. Go to the In [5]: cell, the one containing class MultiheadAttention(nn.Module):
    2. Run it
    3. Insert a cell under it
    4. Run the following:
    batch_size = 3
    seq_len = 11
    input_dim = 13
    num_heads = 19
    embed_dim = 17 * num_heads
    
    mha = MultiheadAttention(input_dim, embed_dim, num_heads)
    
    input_tensor = torch.rand((batch_size, seq_len, input_dim))
    values = mha(input_tensor)
    
    values.shape
    

    which yields the following error:

    RuntimeError                              Traceback (most recent call last)
    <ipython-input-50-38c850c37259> in <module>
          8 
          9 input_tensor = torch.rand((batch_size, seq_len, input_dim))
    ---> 10 values = mha(input_tensor)
         11 
         12 values.shape
    
    1 frames
    <ipython-input-49-45be71448f04> in forward(self, x, mask, return_attention)
         36         values = values.permute(0, 2, 1, 3) # [Batch, SeqLen, Head, Dims]
         37         # values = values.reshape(batch_size, seq_length, embed_dim)
    ---> 38         values = values.reshape(batch_size, seq_length, embed_dim)
         39         o = self.o_proj(values)
         40 
    
    RuntimeError: shape '[3, 11, 13]' is invalid for input of size 10659
    

    Expected behavior After making the suggested change, the output is:

    torch.Size([3, 11, 323])
    

    which is what I was expecting to get.

    Runtime environment (please complete the following information): Google Colab, both CPU and GPU.

    bug 
    opened by MTDzi 2