Official implementation for "Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation" (CVPR 2021)

Related tags

Deep Learning, FRSKD
Overview

FRSKD

Official implementation for Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation (CVPR 2021)

Requirements

  • Python3
  • PyTorch (> 1.4.0)
  • torchvision
  • numpy
  • Pillow
  • tqdm

Training

This code reproduces the classification results reported in the paper. All datasets are publicly available and easy to download. The example commands below train ResNet-18 on CIFAR-100; detailed hyperparameter settings for the other experiments are enumerated in the paper. A hedged sketch of how --alpha and --beta enter the training objective follows the command list.

  • Training with FRSKD
python main.py --data_dir PATH_TO_DATASET \
--data CIFAR100 --batch_size 128 --alpha 2 --beta 100 \
--aux none --aux_lamb 0 --aug none --aug_a 0
  • Training with FRSKD + SLA
python main.py --data_dir PATH_TO_DATASET \
--data CIFAR100 --batch_size 128 --alpha 2 --beta 100 \
--aux sla --aux_lamb 1 --aug none --aug_a 0
  • Training with FRSKD + Mixup
python main.py --data_dir PATH_TO_DATASET \
--data CIFAR100 --batch_size 128 --alpha 2 --beta 100 \
--aux none --aux_lamb 0 --aug mixup --aug_a 0.2
  • Training with FRSKD + CutMix
python main.py --data_dir PATH_TO_DATASET \
--data CIFAR100 --batch_size 128 --alpha 2 --beta 100 \
--aux none --aux_lamb 0 --aug cutmix --aug_a 1.0
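
For orientation, the sketch below shows one way the two weights could combine a soft-label distillation term and a feature-matching term in a self-distillation objective of this kind. It is a paraphrase under stated assumptions, not the repository's exact loss: the function name, the temperature T, and the use of plain MSE for feature matching are illustrative choices.

import torch.nn.functional as F

def frskd_style_loss(student_logits, teacher_logits,
                     student_feats, refined_feats,
                     targets, alpha=2.0, beta=100.0, T=4.0):
    # cross-entropy on both the backbone classifier and the self-teacher branch
    ce = F.cross_entropy(student_logits, targets) + \
         F.cross_entropy(teacher_logits, targets)
    # soft-label distillation from the self-teacher, weighted by alpha
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits.detach() / T, dim=1),
                  reduction='batchmean') * (T * T)
    # feature matching between backbone features and refined features, weighted by beta
    feat = sum(F.mse_loss(s, t.detach())
               for s, t in zip(student_feats, refined_feats))
    return ce + alpha * kd + beta * feat
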
Comments
  • Can your model / bifpn be saved in TensorBoard?

    To visualize the network architecture more clearly, I intend to save the architecture, including the model and the bifpn, to TensorBoard.

    I added the following code to classification/main.py:

    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter('./tensorboard/' + args_path)
    images, labels = next(iter(train_loader))
    images = images.cuda()
    writer.add_graph(model, images)
    writer.add_graph(bifpn, images)

    However, it raises many errors.

    Could you help me visualize the network architecture with TensorBoard?
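
    A possible workaround, offered as a sketch rather than a confirmed fix: add_graph traces the module with torch.jit, which can fail when forward returns a list of intermediate feature maps, so wrapping the model in a module that returns a single tensor often traces cleanly. Note also that bifpn likely expects the backbone's feature maps rather than raw images, so add_graph(bifpn, images) may fail regardless. GraphWrapper and the assumption that the logits are the last output are illustrative, not taken from the repository.

    import torch.nn as nn

    class GraphWrapper(nn.Module):
        # hypothetical wrapper: return only the final logits so that
        # add_graph's jit trace sees a single tensor output
        def __init__(self, model):
            super().__init__()
            self.model = model

        def forward(self, x):
            outputs = self.model(x)
            # assumption: the last element of the model's output is the logits
            return outputs[-1]

    # usage sketch, reusing the names from the snippet above:
    # writer.add_graph(GraphWrapper(model), images)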

    opened by XueZ-phd 0
  • The question about num_features

    When training with FRSKD, the training code (utils.py) has already removed the first feature from feat_lst:

    https://github.com/MingiJi/FRSKD/blob/bed26010383cbf4b8ae3a5d38c414d052c68a6ed/classification/utils.py#L60-L61

    But in the bifpn forward, the first feature is deleted again:

    https://github.com/MingiJi/FRSKD/blob/bed26010383cbf4b8ae3a5d38c414d052c68a6ed/classification/models/bifpn.py#L27-L28

    My guess, based on network_channels, is that only one delete operation is required.

    https://github.com/MingiJi/FRSKD/blob/bed26010383cbf4b8ae3a5d38c414d052c68a6ed/classification/main.py#L75-L76

    So the following line should also be deleted:

    https://github.com/MingiJi/FRSKD/blob/bed26010383cbf4b8ae3a5d38c414d052c68a6ed/classification/models/bifpn.py#L28
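
    A minimal illustration of the concern, using placeholder feature names rather than the actual tensors: if both removals run, the second feature map is silently dropped as well.

    feats = ['f0', 'f1', 'f2', 'f3']   # features collected from the backbone
    feats = feats[1:]                  # first removal (in the training code)
    feats = feats[1:]                  # second removal (in the BiFPN forward)
    print(feats)                       # ['f2', 'f3'] -- 'f1' is lost too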

    opened by wnma3mz 0
  • Can you tell me how to visualize the attention maps at epoch 50?

    'We select the attention maps at the 50-th epoch to observe the distillation behaviors in the learning process.' As you said in your paper, but I can't visualize the attention maps like you did. Can you tell me how to code it? Thanks.
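
    A minimal sketch of one common way to turn a feature map into a spatial attention map (channel-wise mean of squared activations, normalized and upsampled for display). The layer choice and the 32x32 target size are assumptions; this is not necessarily the authors' exact visualization procedure.

    import torch.nn.functional as F
    import matplotlib.pyplot as plt

    def attention_map(feat):
        # feat: (B, C, H, W) feature map -> (B, H, W) attention, normalized to [0, 1]
        att = feat.pow(2).mean(dim=1)
        att = att - att.amin(dim=(1, 2), keepdim=True)
        return att / (att.amax(dim=(1, 2), keepdim=True) + 1e-8)

    # usage sketch: 'feats' is assumed to be the list of feature maps at epoch 50
    # att = F.interpolate(attention_map(feats[-1]).unsqueeze(1), size=(32, 32),
    #                     mode='bilinear', align_corners=False)[0, 0]
    # plt.imshow(att.detach().cpu().numpy(), cmap='jet'); plt.savefig('attention.png')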

    opened by sheyuwei 1
  • The loss value is NaN when I try to use ResNet-50 as the backbone network

    Hi,

    Thanks for your contributions! When I tried to run your code with ResNet-50 and ResNet-101, the loss value is NaN. I only modified your code like this:

    def cifarresnet50(pretrained=False, **kwargs):
        return CIFAR_ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)

    Then, when I trained this network, the loss value was always NaN.

    (GPU: RTX 3090)

    How to fix this error? Thanks!
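
    A small debugging sketch for locating where the NaN first appears; it uses generic PyTorch tooling and is not a confirmed fix. Lowering the learning rate or clipping gradients for the deeper Bottleneck backbones is a common remedy, and the names in the commented loop are placeholders.

    import torch

    # report the first backward operation that produces NaN/Inf
    torch.autograd.set_detect_anomaly(True)

    # inside the training loop (placeholder names):
    # loss.backward()
    # torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
    # optimizer.step()
    # if not torch.isfinite(loss):
    #     raise RuntimeError(f'non-finite loss: {loss.item()}')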

    opened by zhengli97 0
  • The value of mIoU is abnormally small. Why?

    I'm a little confused about whether it's a data problem or a code problem. Here is the log information:

    ,epoch,train_loss,mIOU,time
    0,1,0.7561731152236462,0.03367172123101207,172.40587258338928
    1,2,0.748881900539765,0.03683467247028115,174.69739818572998
    2,3,0.7231125767127826,0.03413524813319142,176.4995617866516
    3,4,0.7103231128018636,0.034718840754135,175.68166756629944
    4,5,0.6948323705448554,0.03985984205367289,176.37116837501526
    5,6,0.6963401340091457,0.034812881275243984,175.4205687046051
    6,7,0.670259090176282,0.039593741565003365,177.59505224227905
    7,8,0.657163177903455,0.04566971232266198,177.0455219745636
    8,9,0.6466049935955268,0.04648490031142609,175.9458737373352
    9,10,0.6441954008948344,0.040296921257545505,175.44736456871033
    10,11,0.6518893415251603,0.040108131647718384,173.6164116859436
    11,12,0.6342953191353724,0.04960548057842536,174.7338092327118
    12,13,0.627325470559299,0.04964727835046585,176.74344968795776
    13,14,0.6132798489326468,0.05281939904056167,175.50420832633972
    14,15,0.6002895892239534,0.055861818521504104,176.07024669647217
    15,16,0.6025141052042062,0.05466266379947758,177.30515456199646
    16,17,0.6029521900300796,0.04238770927785353,175.8492751121521
    17,18,0.5934671195080647,0.07225978056218663,173.75626182556152
    18,19,0.5773926787078381,0.048255481604236414,174.74913358688354
    19,20,0.582425432673727,0.059308266458931364,176.88284063339233
    20,21,0.5682540397661237,0.05038267338739297,174.98791122436523
    21,22,0.5746843272533554,0.05538608518532103,173.91722106933594
    22,23,0.5666266437619925,0.08541678590371297,174.55052709579468
    23,24,0.5560441032911723,0.06600933025918862,175.00977063179016
    24,25,0.5517762127833871,0.06934330535863947,178.28503155708313
    25,26,0.5430047279940202,0.07609816247099357,173.69276547431946
    26,27,0.5501473788888409,0.08342366132038433,177.11464619636536
    27,28,0.5378253686313446,0.09859108165162175,176.39698958396912
    28,29,0.5331599443721083,0.0678868360658615,176.34525656700134
    29,30,0.5324760234126678,0.10438792503626895,176.21589493751526
    30,31,0.5238752497646672,0.1128183428160334,175.78917503356934
    31,32,0.5251926730315273,0.12039694674205649,176.04383778572083
    32,33,0.5220876589345818,0.09766672767178268,174.95863962173462
    33,34,0.5175259400588962,0.10621496236219143,176.0151607990265
    34,35,0.5067871576175094,0.12684976035995407,177.03462433815002
    35,36,0.49546105708353794,0.11654276929771637,176.7458143234253
    36,37,0.494680989605303,0.124827549423532,176.56689929962158
    37,38,0.506546473918626,0.13644115255903136,176.26826405525208
    38,39,0.49985619039776236,0.13591350946832806,175.53716468811035
    39,40,0.4863511062083909,0.1471462457362383,172.9881386756897
    40,41,0.47111362118560535,0.16850882157957298,178.14349603652954
    41,42,0.4635956700389775,0.17315480519723986,174.8266270160675
    42,43,0.47008128091692924,0.1753281237951893,177.32808256149292
    43,44,0.46045722468541217,0.16734526537605668,181.26892566680908
    44,45,0.47051489163333404,0.1779139305812485,176.50252866744995
    45,46,0.4693494217040447,0.1737308142689212,173.95223832130432
    46,47,0.4636622975007273,0.17580288567626196,174.41174721717834
    47,48,0.47545362876441616,0.18437971651423307,175.14617395401
    48,49,0.48231911250891596,0.17996943729784323,175.4971604347229
    49,50,0.4861245102678927,0.1646887044495716,172.72737431526184
    50,51,0.4998336468751614,0.1784439646198613,178.50492978096008
    51,52,0.5165176399481984,0.17607720511222485,177.06681728363037
    52,53,0.5195794134902266,0.17282210617745608,177.14761924743652
    53,54,0.5199347996654419,0.17748172026042575,173.87622380256653
    54,55,0.527843453706457,0.17620730130932943,174.56411957740784
    55,56,0.5464535670784804,0.1823876403320441,175.65909218788147
    56,57,0.5455715203514466,0.17574022854965982,175.3632197380066
    57,58,0.5364228865275016,0.18073436845138688,173.09280729293823
    58,59,0.536486465913745,0.17925631794081523,181.71167612075806
    59,60,0.5356380167202308,0.17350742104516761,174.03977036476135

    opened by zcc720 2
TF2 implementation of knowledge distillation using the "function matching" hypothesis from the paper Knowledge distillation: A good teacher is patient and consistent by Beyer et al.

FunMatch-Distillation TF2 implementation of knowledge distillation using the "function matching" hypothesis from the paper Knowledge distillation: A g

Sayak Paul 67 Dec 20, 2022
Official implementation for (Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching, AAAI-2021)

Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching Official pytorch implementation of "Show, Attend and Distill: Kn

Clova AI Research 80 Dec 16, 2022
This is the official PyTorch implementation of Student Helping Teacher: Teacher Evolution via Self-Knowledge Distillation (TESKD)

Student Helping Teacher: Teacher Evolution via Self-Knowledge Distillation (TESKD) By Zheng Li[1,4], Xiang Li[2], Lingfeng Yang[2,4], Jian Yang[2], Zh

Zheng Li 9 Sep 26, 2022
The official implementation of CVPR 2021 Paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation This repository is the official implementation of CVPR 2021 paper:

null 9 Nov 14, 2022
Official implementation of the paper "Lightweight Deep CNN for Natural Image Matting via Similarity Preserving Knowledge Distillation"

Lightweight-Deep-CNN-for-Natural-Image-Matting-via-Similarity-Preserving-Knowledge-Distillation Introduction Accepted at IEEE Signal Processing Letter

DongGeun-Yoon 19 Jun 7, 2022
Block-wisely Supervised Neural Architecture Search with Knowledge Distillation (CVPR 2020)

DNA This repository provides the code of our paper: Blockwisely Supervised Neural Architecture Search with Knowledge Distillation. Illustration of DNA

Changlin Li 215 Dec 19, 2022
Codes for SIGIR'22 Paper 'On-Device Next-Item Recommendation with Self-Supervised Knowledge Distillation'

OD-Rec Codes for SIGIR'22 Paper 'On-Device Next-Item Recommendation with Self-Supervised Knowledge Distillation' Paper, saved teacher models and Andro

Xin Xia 11 Nov 22, 2022
Official Implementation for "ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement" https://arxiv.org/abs/2104.02699

ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement Recently, the power of unconditional image synthesis has significantly advanced th

null 967 Jan 4, 2023
Weak-supervised Visual Geo-localization via Attention-based Knowledge Distillation

Weak-supervised Visual Geo-localization via Attention-based Knowledge Distillation Introduction WAKD is a PyTorch implementation for our ICPR-2022 pap

null 2 Oct 20, 2022
The official repository for 'CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos', SIGIR 2022

CharacterBERT-DR The official repository for CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos, Sh

ielab 11 Nov 15, 2022
[CVPR 2021] Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach

Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach This is the repo to host the dataset TextSeg and code for TexRNe

SHI Lab 174 Dec 19, 2022
[NeurIPS-2021] Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

MosaicKD Code for NeurIPS-21 paper "Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data" 1. Motivation Natural images share common l

ZJU-VIPA 37 Nov 10, 2022
Implementation of the paper "Self-Promoted Prototype Refinement for Few-Shot Class-Incremental Learning"

Self-Promoted Prototype Refinement for Few-Shot Class-Incremental Learning This is the implementation of the paper "Self-Promoted Prototype Refinement

Kai Zhu 78 Dec 2, 2022
Code implementation of Data Efficient Stagewise Knowledge Distillation paper.

Data Efficient Stagewise Knowledge Distillation Table of Contents Data Efficient Stagewise Knowledge Distillation Table of Contents Requirements Image

IvLabs 112 Dec 2, 2022
Pytorch implementation for Patient Knowledge Distillation for BERT Model Compression

Patient Knowledge Distillation for BERT Model Compression Knowledge distillation for BERT model Installation Run command below to install the environm

Siqi 180 Dec 19, 2022
PyTorch implementation of paper A Fast Knowledge Distillation Framework for Visual Recognition.

FKD: A Fast Knowledge Distillation Framework for Visual Recognition Official PyTorch implementation of paper A Fast Knowledge Distillation Framework f

Zhiqiang Shen 129 Dec 24, 2022
Official code for our CVPR '22 paper "Dataset Distillation by Matching Training Trajectories"

Dataset Distillation by Matching Training Trajectories Project Page | Paper This repo contains code for training expert trajectories and distilling sy

George Cazenavette 256 Jan 5, 2023
Unofficial implementation of Image Super-Resolution via Iterative Refinement in PyTorch

Image Super-Resolution via Iterative Refinement Paper | Project Brief This is an unofficial implementation of Image Super-Resolution via Iterative Re

LiangWei Jiang 2.5k Jan 2, 2023
Code for the CVPR 2021 paper: Understanding Failures of Deep Networks via Robust Feature Extraction

Welcome to Barlow Barlow is a tool for identifying the failure modes for a given neural network. To achieve this, Barlow first creates a group of imag

Sahil Singla 33 Dec 5, 2022