PyTorch implementation of Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

Teddy Koker

Last update: Sep 29, 2022

Related tags

Overview

Simple PyTorch Implementation of "Grokking"

Implementation of Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

Usage

Running train.py with default arguments will run my best (yet) attempt to reproduce the "Grokking" behavior on modular division as seen in Figure 1 of the paper.

python train.py

The results seem highly sensitive to optimizer hyperparameter selection, and I have not yet tried all of the configurations outlined in the paper.

Citations

@inproceedings{power2021grokking,
  title={Grokking: Generalization beyond overfitting on small algorithmic datasets},
  author={Power, Alethea and Burda, Yuri and Edwards, Harri and Babuschkin, Igor and Misra, Vedant},
  booktitle={ICLR MATH-AI Workshop},
  year={2021}
}

You might also like...

Official PyTorch implementation of the Fishr regularization for out-of-distribution generalization

Fishr: Invariant Gradient Variances for Out-of-distribution Generalization Official PyTorch implementation of the Fishr regularization for out-of-dist

62 Dec 22, 2022

Official pytorch implementation of "Feature Stylization and Domain-aware Contrastive Loss for Domain Generalization" ACMMM 2021 (Oral)

Feature Stylization and Domain-aware Contrastive Loss for Domain Generalization This is an official implementation of "Feature Stylization and Domain-

22 Sep 22, 2022

Unofficial Pytorch Lightning implementation of Contrastive Syn-to-Real Generalization (ICLR, 2021)

17 Sep 23, 2021

[CVPR 2022] Pytorch implementation of "Templates for 3D Object Pose Estimation Revisited: Generalization to New objects and Robustness to Occlusions" paper

template-pose Pytorch implementation of "Templates for 3D Object Pose Estimation Revisited: Generalization to New objects and Robustness to Occlusions

92 Dec 28, 2022

This project is a loose implementation of paper "Algorithmic Financial Trading with Deep Convolutional Neural Networks: Time Series to Image Conversion Approach"

Stock Market Buy/Sell/Hold prediction Using convolutional Neural Network This repo is an attempt to implement the research paper titled "Algorithmic F

136 Dec 28, 2022

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context Code in both PyTorch and TensorFlow

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context This repository contains the code in both PyTorch and TensorFlow for our paper

3.3k Jan 6, 2023

Efficient Training of Visual Transformers with Small Datasets

Official codes for "Efficient Training of Visual Transformers with Small Datasets", NerIPS 2021.

112 Dec 25, 2022

Implementation for the IJCAI2021 work "Beyond the Spectrum: Detecting Deepfakes via Re-synthesis"

Beyond the Spectrum Implementation for the IJCAI2021 work "Beyond the Spectrum: Detecting Deepfakes via Re-synthesis" by Yang He, Ning Yu, Margret Keu

27 Jan 7, 2023

Deep Learning Datasets Maker is a QGIS plugin to make datasets creation easier for raster and vector data.

Deep Learning Dataset Maker Deep Learning Datasets Maker is a QGIS plugin to make datasets creation easier for raster and vector data. How to use Down

25 Dec 15, 2022

Owner

Teddy Koker

Machine Learning @mit-ll

GitHub

The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We significantly improve the systematic generalization of transformer models on a variety of datasets using simple tricks and careful considerations.

Codebase for training transformers on systematic generalization datasets. The official repository for our EMNLP 2021 paper The Devil is in the Detail:

57 Nov 21, 2022

Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm

Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetu

3 Dec 5, 2022

A PyTorch implementation of the paper Mixup: Beyond Empirical Risk Minimization in PyTorch

Mixup: Beyond Empirical Risk Minimization in PyTorch This is an unofficial PyTorch implementation of mixup: Beyond Empirical Risk Minimization. The co

121 Dec 17, 2022

Minimal But Practical Image Classifier Pipline Using Pytorch, Finetune on ResNet18, Got 99% Accuracy on Own Small Datasets.

PyTorch Image Classifier Updates As for many users request, I released a new version of standared pytorch immage classification example at here: http:

106 Nov 6, 2022

Official Pytorch implementation of "Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video", CVPR 2021

TCMR: Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video Qualtitative result Paper teaser video Introduction This r

215 Jan 6, 2023

PyTorch implementation of Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

Related tags

Overview

Simple PyTorch Implementation of "Grokking"

Usage

Citations

You might also like...

Official PyTorch implementation of the Fishr regularization for out-of-distribution generalization

Official pytorch implementation of "Feature Stylization and Domain-aware Contrastive Loss for Domain Generalization" ACMMM 2021 (Oral)

Unofficial Pytorch Lightning implementation of Contrastive Syn-to-Real Generalization (ICLR, 2021)

[CVPR 2022] Pytorch implementation of "Templates for 3D Object Pose Estimation Revisited: Generalization to New objects and Robustness to Occlusions" paper

This project is a loose implementation of paper "Algorithmic Financial Trading with Deep Convolutional Neural Networks: Time Series to Image Conversion Approach"

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context Code in both PyTorch and TensorFlow

Efficient Training of Visual Transformers with Small Datasets

Implementation for the IJCAI2021 work "Beyond the Spectrum: Detecting Deepfakes via Re-synthesis"

Deep Learning Datasets Maker is a QGIS plugin to make datasets creation easier for raster and vector data.

Owner

Teddy Koker

The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We significantly improve the systematic generalization of transformer models on a variety of datasets using simple tricks and careful considerations.

Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm

A PyTorch implementation of the paper Mixup: Beyond Empirical Risk Minimization in PyTorch

Minimal But Practical Image Classifier Pipline Using Pytorch, Finetune on ResNet18, Got 99% Accuracy on Own Small Datasets.

Official Pytorch implementation of "Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video", CVPR 2021

An easy way to build PyTorch datasets. Modularly build datasets and automatically cache processed results

Cl datasets - PyTorch image dataloaders and utility functions to load datasets for supervised continual learning

Implementation of "Debiasing Item-to-Item Recommendations With Small Annotated Datasets" (RecSys '20)

SPT_LSA_ViT - Implementation for Visual Transformer for Small-size Datasets

A PyTorch implementation of Sharpness-Aware Minimization for Efficiently Improving Generalization