Tutorials and implementations for "Self-normalizing networks"

Overview

Self-Normalizing Networks

Tutorials and implementations for "Self-normalizing networks" (SNNs) as suggested by Klambauer et al. (arXiv pre-print).

Versions

  • See the environment file for the full list of prerequisites. The tutorial implementations use TensorFlow >= 2.0 (Keras) or PyTorch, but versions for TensorFlow 1.x users, based on the deprecated tf.contrib module (with a separate environment file), are also available.

Note for TensorFlow >= 1.4 users

TensorFlow >= 1.4 already provides tf.nn.selu and tf.contrib.nn.alpha_dropout, which implement the SELU activation function and the suggested dropout version.

Note for TensorFlow >= 2.0 users

TensorFlow 2.3 already provides the SELU activation function in the high-level Keras API as tf.keras.activations.selu. It must be combined with the initializer tf.keras.initializers.LecunNormal; the corresponding dropout variant is tf.keras.layers.AlphaDropout.
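
For illustration, a minimal Keras sketch (not taken from the repository's notebooks; the layer sizes and dropout rate are arbitrary placeholders):

    import tensorflow as tf

    # Each dense layer uses the SELU activation and LeCun-normal initialization;
    # AlphaDropout replaces standard dropout to preserve mean and variance.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation=tf.keras.activations.selu,
                              kernel_initializer=tf.keras.initializers.LecunNormal()),
        tf.keras.layers.AlphaDropout(0.05),
        tf.keras.layers.Dense(256, activation=tf.keras.activations.selu,
                              kernel_initializer=tf.keras.initializers.LecunNormal()),
        tf.keras.layers.AlphaDropout(0.05),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])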

Note for PyTorch users

PyTorch versions >= 0.2 feature torch.nn.SELU and torch.nn.AlphaDropout. They must be combined with the correct initializer, namely torch.nn.init.kaiming_normal_(parameter, mode='fan_in', nonlinearity='linear'), as this is identical to LeCun initialization (mode='fan_in') with a gain of 1 (nonlinearity='linear').
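
A minimal PyTorch sketch of how these pieces could fit together (our own illustration, not code from the tutorials; layer sizes, the dropout rate, and the re-initialization loop are placeholders):

    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(784, 256),
        torch.nn.SELU(),
        torch.nn.AlphaDropout(p=0.05),
        torch.nn.Linear(256, 256),
        torch.nn.SELU(),
        torch.nn.AlphaDropout(p=0.05),
        torch.nn.Linear(256, 10),
    )

    # Re-initialize all linear weights with LeCun normal (fan_in, gain 1), as noted above.
    for layer in model:
        if isinstance(layer, torch.nn.Linear):
            torch.nn.init.kaiming_normal_(layer.weight, mode='fan_in', nonlinearity='linear')
            torch.nn.init.zeros_(layer.bias)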

Tutorials

TensorFlow 1.x

  • Multilayer Perceptron on MNIST (notebook)
  • Convolutional Neural Network on MNIST (notebook)
  • Convolutional Neural Network on CIFAR10 (notebook)

TensorFlow 2.x (Keras)

PyTorch

  • Multilayer Perceptron on MNIST (notebook)
  • Convolutional Neural Network on MNIST (notebook)
  • Convolutional Neural Network on CIFAR10 (notebook)

Further material

Design novel SELU functions (TensorFlow 1.x)

  • How to obtain the SELU parameters alpha and lambda for arbitrary fixed points (notebook)
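
As a rough sketch of the underlying idea (not the notebook's code; the helper functions and the assumption of zero-mean, unit-second-moment weights, i.e. omega = 0 and tau = 1, are ours), the parameters can be found numerically by solving the fixed-point equations for the post-activation mean and variance:

    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import fsolve
    from scipy.stats import norm

    def selu(z, alpha, lam):
        return np.where(z > 0.0, lam * z, lam * alpha * (np.exp(z) - 1.0))

    def output_moments(alpha, lam, mu, nu, omega=0.0, tau=1.0):
        # Mean and variance of selu(z) for Gaussian pre-activations z ~ N(mu*omega, nu*tau).
        mean_pre, sd_pre = mu * omega, np.sqrt(nu * tau)
        pdf = lambda z: norm.pdf(z, loc=mean_pre, scale=sd_pre)
        m1 = quad(lambda z: selu(z, alpha, lam) * pdf(z), -np.inf, np.inf)[0]
        m2 = quad(lambda z: selu(z, alpha, lam) ** 2 * pdf(z), -np.inf, np.inf)[0]
        return m1, m2 - m1 ** 2

    def residual(params, mu, nu):
        # Fixed-point condition: the activation must map (mu, nu) back onto (mu, nu).
        alpha, lam = params
        m, v = output_moments(alpha, lam, mu, nu)
        return [m - mu, v - nu]

    alpha, lam = fsolve(residual, x0=[1.5, 1.0], args=(0.0, 1.0))
    print(alpha, lam)  # approx. 1.6733 and 1.0507 for the standard (0, 1) fixed point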

Basic Python functions to implement SNNs (TensorFlow 1.x)

are provided as code chunks here: selu.py
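
For orientation, a rough NumPy rendition of the two building blocks (the repository's actual implementation is the TensorFlow code in selu.py; the constants below are the standard alpha/lambda values for the (0, 1) fixed point):

    import numpy as np

    ALPHA = 1.6732632423543772
    LAMBDA = 1.0507009873554805

    def selu(x):
        # Scaled exponential linear unit.
        return LAMBDA * np.where(x > 0.0, x, ALPHA * (np.exp(x) - 1.0))

    def alpha_dropout(x, rate, rng=None):
        # "Alpha dropout": dropped units are set to the SELU negative saturation value,
        # then an affine correction keeps mean and variance at (0, 1) in expectation.
        rng = np.random.default_rng() if rng is None else rng
        keep = 1.0 - rate
        alpha_prime = -LAMBDA * ALPHA
        mask = rng.random(x.shape) < keep
        dropped = np.where(mask, x, alpha_prime)
        a = (keep + alpha_prime ** 2 * keep * (1.0 - keep)) ** -0.5
        b = -a * alpha_prime * (1.0 - keep)
        return a * dropped + b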

Notebooks and code to produce Figure 1 (TensorFlow 1.x)

are provided here: Figure1, which builds on top of the biutils package.

Calculations and numeric checks of the theorems (Mathematica)

are provided as Mathematica notebooks here:

UCI, Tox21 and HTRU2 data sets

Comments
  • Is SeLU alone having positive impact on accuracy?

    Hi,

    In the MNIST and CIFAR-10 tutorials, SELU is used together with alpha dropout, and the experimental result is that the SNN outperforms ReLU- and ELU-based models. MNIST-based models (LeNet) can reach quite good accuracy without dropout or batch norm, so my question is whether, according to your observations, SELU alone (no dropout and no batch norm) increases accuracy. What I mean is: I have a basic CNN working on MNIST (convolutions, ReLU, fully connected layers, softmax); assuming the weight initialization and input normalization are done correctly, can I expect increased accuracy?

    opened by jczaja 6
  • Can someone help me with creating the csv from sdf with exactly the same number of features?

    I used the skchem pipeline to extract meaningful features from the SDF train file mentioned on the official Tox21 challenge website, but I am not able to get the 801 features that are in the zipped CSV file. Can someone help me with that Python code? My aim is to experiment with the architecture and the SELU technique, not to get into domain-specific feature extraction details.

    opened by pdcoded 5
  • How should we handle skip connections properly?

    https://github.com/bioinf-jku/SNNs/commit/1a366963a18b328c4e5d13f95fb9993fe990e4fb#diff-c7c21fc90a9f9340db7b45c70ef0e393R13

    Could you please give me some hints?

    Thank you

    opened by qbx2 3
  • Categorical and continuous variables preprocessing

    With the UCI data, how did you preprocess the categorical and continuous variables?

    Did you enforce a min/max or did you just standardize the continuous variables? And for the categorical variables, did you use one-hot/dummy coding or standardize them?

    Edit: Also, what batch size did you use? Did it depend on the sample size?

    Thanks!

    opened by AlexiaJM 3
  • SELU values for a truncated normal distribution

    https://github.com/bioinf-jku/SNNs/blob/f992b229795712a54c67266995d8ea522cd10770/selu.py#L31 and many other implementations (e.g. Keras) do an additional trick where samples are resampled if they're not within two standard deviations of the mean. I'm curious how much of an effect this truncation has on the fixed-point derivation. Are the fixed points analytically identical for a normal distribution and a truncated normal distribution?

    I read in the paper that "Uniform and truncated Gaussian distributions with these moments led to networks with similar behavior." but this feels unsatisfactory to me. Maybe a small discrepancy becomes really problematic for deeper networks? This aligns with my experience that it's still beneficial to have batchnorm/layernorm with SELU.

    opened by carlthome 2
  • cnn_graph

    Thank you for sharing the code. I have successfully applied them (SELU and alpha_dropout) to build a purely fully connected network (7 layers) for a regression problem (the R² between the predicted and the observed variable is greater than 0.99!).

    Right now I'm trying to replace the ReLU in cnn_graph with SELU. Unlike a standard CNN, cnn_graph performs the convolution on graph-Fourier-transformed inputs (a recursive process involving multiple matrix multiplications between the layer-specific graph Laplacian and the layer inputs). The originally normalized input is shifted to some unknown distribution by the graph Fourier transform, so I don't know how to apply SELU even to the first cnn_graph layer. Could you give me some suggestions on this?

    Besides, can I use some normalization on the output of cnn_graph before feeding it into a fully connected network that uses SELU as the activation function?

    Thanks!

    opened by maosi-chen 2
  • batch normalization

    @bioinf-jku, thank you for your nice work! I am new to deep learning and have a simple question. Since the net in your test code is not very deep, it makes no big difference to add batch normalization layers after each convolution layer, but if the net is very deep, is it necessary to add a batch normalization layer after each convolution layer? Or is there no need to do so, since the SELU activation function has the ability to normalize the layer inputs itself? Thank you in advance!

    opened by zhly0 2
  • Questions on the self-normalizing property

    I think the proposed SELU is a powerful non-linearity for MLPs. The self-normalizing property comes from the derivation of the forward propagation. This property can be confirmed by the following code.

    import torch
    f = torch.nn.functional.selu
    x = torch.randn(1024, 1024) * 456 + 123
    lin = torch.nn.Linear(1024, 1024, bias=False)
    _ = torch.nn.init.kaiming_normal_(lin.weight, nonlinearity="linear")
    with torch.no_grad():
        for i in range(100): x = f(lin(x))
    print(f"mean = {x.mean()}") # 0.00253
    print(f"var = {(x ** 2).mean()}") # 1.05135
    

    However, the self-normalizing property only holds for the forward pass; it does not hold for the backward pass (is that right?). Noisy gradients will definitely be harmful to the learning process. The proposed SELU is based on ELU, which is based on a selective preference. I wonder if there could exist a more general non-linearity that has self-normalizing properties for both the forward and backward propagations. If it is possible, how could we find it?

    opened by Karbo123 1
  • Effect of bias in linear layers

    I've been experimenting with SELUs and found that they provide an improvement in training computation time with respect to batch norm. Thank you for your work.

    I just have a question regarding the effect of bias in linear layers. As I understand it, every neuron should have mean zero in order to stay in the self-normalizing zone, but bias precisely shifts that mean. In my experiments, however, I didn't see much of an effect from either removing or adding biases. I see that bias is used in the tutorial notebook, and I wonder whether you've considered the issue.

    opened by ptrcarta 1
  • information about step (1) in selu.py

    Hi, thank you for the paper and the code. Could you please tell us how you scale the inputs to zero mean and unit variance in step (1) of selu.py?

    Thank you

    opened by AzizCode92 1
  • Alpha dropouts

    Hi,

    Just a question: when applying alpha dropout, do you scale by p at prediction time (not during training), where p is the probability of being kept? https://stats.stackexchange.com/questions/205932/dropout-scaling-the-activation-versus-inverting-the-dropout

    opened by edubois 1
Owner
Institute of Bioinformatics, Johannes Kepler University Linz
Software of the Institute of Bioinformatics, JKU Linz. Updated repo at: https://github.com/ml-jku