TransMix: Attend to Mix for Vision Transformers
This repository hosts the official project for the paper TransMix: Attend to Mix for Vision Transformers.
Code and pretrained models will be released soon.
Hey, I have a question. I used this code to train on Tiny ImageNet with all configs kept at their defaults, but after a few dozen epochs the top-1 eval accuracy reaches 100%. Is something wrong? I can't find the problem. I also notice that the test accuracy is higher than the EMA test accuracy in the early epochs, so why do I need to use EMA? Thanks.
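For context on the EMA point above, here is a minimal sketch (not the repository's code; the function name and decay value are illustrative) of a timm-style model-EMA update. Because the EMA weights move only slowly toward the raw weights, their eval accuracy is expected to be lower in early epochs and typically catches up, and often overtakes, the raw weights only later in training.

```python
import torch

@torch.no_grad()
def update_ema(ema_model, model, decay=0.9999):
    # EMA weights are a slow exponential moving average of the raw weights,
    # so early in training they lag behind and usually score lower on eval.
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)
```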
Thanks for sharing your amazing work. I was trying to train DeiT-B/16 from scratch on ImageNet-1k using the hyperparameters reported in your paper. I'm pretty sure I'm missing something, but I'm unable to reach 82.4%; with the hyperparameters I use, I get around 78.6%, which is even worse than DeiT-S/16.
Could you please share the training command line or the config file for DeiT-B/16? Thanks a lot.
I carefully checked the definition of lam in your paper, but I couldn't find a description of "lam = (lam0 + lam1) / 2 # ()+(b,) ratio=0.5". So are you using half of the original lam and half of the attention-based lam for the final lam? Thanks.
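For what it's worth, here is a minimal sketch of what averaging an area-based CutMix ratio with an attention-derived ratio at a 0.5/0.5 split could look like. The names lam_area, attn_cls, and mask are hypothetical, not the repository's; the shape note only mirrors the "()+(b,)" comment in the snippet above.

```python
import torch

def mixed_lambda(lam_area, attn_cls, mask):
    # lam_area: scalar CutMix ratio given by the pasted box area, shape ()
    # attn_cls: (B, N) attention from the class token to the N patch tokens
    # mask:     (N,) binary mask, 1 where a patch comes from the pasted image
    attn_cls = attn_cls / attn_cls.sum(dim=1, keepdim=True)  # normalise per sample
    lam_attn = (attn_cls * mask).sum(dim=1)                  # (B,) attention mass on the pasted region
    # () + (B,) broadcasts to (B,): half area-based, half attention-based
    return (lam_area + lam_attn) / 2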
Hello, authors! I have recently been interested in Swin Transformer, but the heat maps I get look unreasonable. Since you have worked with Swin Transformer, I would like to know how you draw its heat maps or attention maps. I would be grateful if you could give me an answer.
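As a general reference (not the authors' procedure), below is the usual recipe for turning a plain ViT's class-token attention into a heat map. Swin uses windowed attention and has no class token, so it needs extra handling (e.g., aggregating window attentions or Grad-CAM-style methods); the helper name and default sizes here are illustrative only.

```python
import torch
import torch.nn.functional as F

def cls_attention_heatmap(attn, image_size=224, patch_size=16):
    # attn: (num_heads, 1 + N, 1 + N) attention weights of the final block,
    # where token 0 is the class token and the remaining N are patch tokens.
    h = w = image_size // patch_size
    cls_to_patches = attn[:, 0, 1:].mean(dim=0)               # average heads -> (N,)
    heat = cls_to_patches.reshape(1, 1, h, w)
    heat = F.interpolate(heat, size=(image_size, image_size),
                         mode="bilinear", align_corners=False)
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)
    return heat[0, 0]                                          # (H, W) map in [0, 1]
```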
Hey, I have a question. For non-classification tasks, the paper proposes that a backbone pretrained with TransMix performs better, which requires a pretrained model. If I directly add this augmentation module to a current transformer-based segmentation task, will it work? Thanks!
I just checked the configuration file, and it seems that some of the training strategies are quite different from the original DeiT training recipe (e.g., batch size, learning-rate scheduler, model EMA, ...). So I'm wondering what the baseline result for this configuration would be.
Hi, we have read Section 4.5 of the paper about the Swin Transformer part. Would you open-source the CA-Swin model and explain how to use TransMix with it? Thanks for your attention.