Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

Overview

Code for “Efficient Sharpness-aware Minimization for Improved Training of Neural Networks” (ESAM).

Requirements

This code is implemented in PyTorch, and we have tested it under the following environment settings:

  • python = 3.8.8
  • torch = 1.8.0
  • torchvision = 0.9.0

What is in this repository

Code for training with ESAM on the CIFAR-10 and CIFAR-100 datasets.

How to use it

import torch
from utils.layer_dp_sam import ESAM

base_optimizer = torch.optim.SGD(model.parameters(), lr=args.learning_rate, momentum=0.9, weight_decay=args.weight_decay)
# ESAM wraps the base optimizer; pass the model parameters as the first argument.
optimizer = ESAM(model.parameters(), base_optimizer, rho=args.rho, weight_dropout=args.weight_dropout, adaptive=args.isASAM, nograd_cutoff=args.nograd_cutoff, opt_dropout=args.opt_dropout, temperature=args.temperature)

--beta: the SWP (Stochastic Weight Perturbation) hyperparameter

--gamma: the SDS (Sharpness-sensitive Data Selection) hyperparameter
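
For example, a run with both strategies enabled might be launched as follows (a hypothetical invocation: the script name and values are placeholders, and the actual commands are given in run.sh):

python train.py --beta 0.6 --gamma 0.6  # hypothetical; see run.sh for the real commands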

During training, loss_fct should be constructed with reduction="none" so that it returns instance-wise losses. defined_backward is the backward function used for DDP and mixed-precision training.

loss_fct = torch.nn.CrossEntropyLoss(reduction="none")

def defined_backward(loss):
    # Backward pass; with fp16, scale the loss via apex amp before backprop.
    if args.fp16:
        with amp.scale_loss(loss, optimizer0) as scaled_loss:
            scaled_loss.backward()
    else:
        loss.backward()

# Hand the batch, loss function, model, and backward routine to the optimizer;
# ESAM runs the forward/backward passes internally during step().
paras = [inputs, targets, loss_fct, model, defined_backward]
optimizer.paras = paras
optimizer.step()
predictions_logits, loss = optimizer.returnthings
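
Putting the pieces together, one training step looks roughly like the sketch below (assembled from the snippets above; train_loader and device are assumed to be defined elsewhere, and scheduler/logging details are omitted):

model.train()
for inputs, targets in train_loader:
    inputs, targets = inputs.to(device), targets.to(device)

    # Hand the batch, loss function, model and backward routine to ESAM;
    # it performs the two-step sharpness-aware update inside step().
    optimizer.paras = [inputs, targets, loss_fct, model, defined_backward]
    optimizer.step()

    predictions_logits, loss = optimizer.returnthings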

Example

bash run.sh

Reference Code

[1] SAM

Comments
  • the mask in your implementation is inconsistent with your paper

    Based on your implementation Esam.py, some parameters are randomly masked at the descent step (i.e., the second step, shown below). But in your paper (the paragraph above Eq. (5)), the mask is applied to the perturbation $\epsilon$. As these are quite different, could you please provide some explanation?

    @torch.no_grad()
    def second_step(self, zero_grad=False):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None or not self.state[p]: continue
                p.sub_(self.state[p]["e_w"])  # get back to "w" from "w + e(w)"
                self.state[p]["e_w"] = 0

                if random.random() > self.beta:
                    p.requires_grad = False

        self.base_optimizer.step()  # do the actual "sharpness-aware" update

        if zero_grad: self.zero_grad()
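
    For contrast, the variant described in the paper (masking the perturbation $\epsilon$ itself) could look roughly like the sketch below. This is illustrative only, not code from the repository; it assumes a SAM-style _grad_norm helper and the rho/beta settings used above:

    @torch.no_grad()
    def first_step_masked_eps(self, zero_grad=False):
        grad_norm = self._grad_norm()
        for group in self.param_groups:
            scale = group["rho"] / (grad_norm + 1e-12)
            for p in group["params"]:
                if p.grad is None: continue
                e_w = p.grad * scale.to(p)
                # mask the perturbation e_w instead of freezing parameters in second_step
                if random.random() > self.beta:
                    e_w.zero_()
                p.add_(e_w)  # climb to "w + e(w)"
                self.state[p]["e_w"] = e_w
        if zero_grad: self.zero_grad()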
    
    opened by jxs129 3
  • Support for multiple loss functions and GradScaler

    I came across your work and you did a fantastic job in improving the performance of SAM.

    It seems that the current implementation supports only a single loss function. While the code example does include the case for fp16, there is no mention of gradient scaling, which is commonly used together with AMP.

    Are there any plans to support multiple loss functions and GradScaler?
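
    For reference, a defined_backward based on torch.cuda.amp.GradScaler might look like the sketch below (illustrative only, not part of the current repository; scaler.step() and scaler.update() would also need to be called around the actual optimizer update):

    scaler = torch.cuda.amp.GradScaler()

    def defined_backward(loss):
        if args.fp16:
            # scale the loss to avoid fp16 gradient underflow before backprop
            scaler.scale(loss).backward()
        else:
            loss.backward()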

    opened by kenmbkr 1