You Only 👀 One Sequence

Hust Visual Learning Team

Last update: Jan 3, 2023

Related tags

Overview

You Only 👀 One Sequence

TL;DR: We study the transferability of the vanilla ViT pre-trained on mid-sized ImageNet-1k to the more challenging COCO object detection benchmark.
This project is under active development.

You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection

by Yuxin Fang¹ *, Bencheng Liao¹ *, Xinggang Wang^{1

✉️}, Jiemin Fang^{2, 1}, Jiyang Qi¹, Rui Wu³, Jianwei Niu³, Wenyu Liu¹.

¹ School of EIC, HUST, ² Institute of AI, HUST, ³ Horizon Robotics.

(*) equal contribution, (^✉️) corresponding author.

arXiv technical report (arXiv 2106.00666)

You Only Look at One Sequence (YOLOS)

The Illustration of YOLOS

Highlights

Directly inherited from ViT (DeiT), YOLOS is not designed to be yet another high-performance object detector, but to unveil the versatility and transferability of Transformer from image recognition to object detection. Concretely, our main contributions are summarized as follows:

We use the mid-sized ImageNet-1k as the sole pre-training dataset, and show that a vanilla ViT (DeiT) can be successfully transferred to perform the challenging object detection task and produce competitive COCO results with the fewest possible modifications, i.e., by only looking at one sequence (YOLOS).
We demonstrate that 2D object detection can be accomplished in a pure sequence-to-sequence manner by taking a sequence of fixed-sized non-overlapping image patches as input. Among existing object detectors, YOLOS utilizes minimal 2D inductive biases. Moreover, it is feasible for YOLOS to perform object detection in any dimensional space unaware the exact spatial structure or geometry.
For ViT (DeiT), we find the object detection results are quite sensitive to the pre-train scheme and the detection performance is far from saturating. Therefore the proposed YOLOS can be used as a challenging benchmark task to evaluate different pre-training strategies for ViT (DeiT).
We also discuss the impacts as wel as the limitations of prevalent pre-train schemes and model scaling strategies for Transformer in vision through transferring to object detection.

Results

Model	Pre-train Epochs	ViT (DeiT) Weight / Log	Fine-tune Epochs	Eval Size	YOLOS Checkpoint / Log	AP @ COCO val
`YOLOS-Ti`	300	FB	300	512	Baidu Drive, Google Drive / Log	28.7
`YOLOS-S`	200	Baidu Drive, Google Drive / Log	150	800	Baidu Drive, Google Drive / Log	36.1
`YOLOS-S`	300	FB	150	800	Baidu Drive, Google Drive / Log	36.1
`YOLOS-S (dWr)`	300	Baidu Drive, Google Drive / Log	150	800	Baidu Drive, Google Drive / Log	37.6
`YOLOS-B`	1000	FB	150	800	Baidu Drive, Google Drive / Log	42.0

Notes:

The access code for Baidu Drive is yolo.
The FB stands for model weights provided by DeiT (paper, code). Thanks for their wonderful works.
We will update other models in the future, please stay tuned :)

Requirement

This codebase has been developed with python version 3.6, PyTorch 1.5+ and torchvision 0.6+:

conda install -c pytorch pytorch torchvision

Install pycocotools (for evaluation on COCO) and scipy (for training):

conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

Data preparation

Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images

Training

Before finetuning on COCO, you need download the ImageNet pretrained model to the /path/to/YOLOS/ directory

To train the YOLOS-Ti model in the paper, run this command:


python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --coco_path /path/to/coco
    --batch_size 2 \
    --lr 5e-5 \
    --epochs 300 \
    --backbone_name tiny \
    --pre_trained /path/to/deit-tiny.pth\
    --eval_size 512 \
    --init_pe_size 800 1333 \
    --output_dir /output/path/box_model

To train the YOLOS-S model with 200 epoch pretrained Deit-S in the paper, run this command:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco --batch_size 1 --lr 2.5e-5 --epochs 150 --backbone_name small --pre_trained /path/to/deit-small-200epoch.pth --eval_size 800 --init_pe_size 512 864 --mid_pe_size 512 864 --output_dir /output/path/box_model

To train the YOLOS-S model with 300 epoch pretrained Deit-S in the paper, run this command:

python -m torch.distributed.launch \ --nproc_per_node=8 \ --use_env main.py \ --coco_path /path/to/coco --batch_size 1 \ --lr 2.5e-5 \ --epochs 150 \ --backbone_name small \ --pre_trained /path/to/deit-small-300epoch.pth\ --eval_size 800 \ --init_pe_size 512 864 \ --mid_pe_size 512 864 \ --output_dir /output/path/box_model

To train the YOLOS-S (dWr) model in the paper, run this command:


python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --coco_path /path/to/coco
    --batch_size 1 \
    --lr 2.5e-5 \
    --epochs 150 \
    --backbone_name small_dWr \
    --pre_trained /path/to/deit-small-dWr-scale.pth\
    --eval_size 800 \
    --init_pe_size 512 864 \
    --mid_pe_size 512 864 \
    --output_dir /output/path/box_model

To train the YOLOS-B model in the paper, run this command:


python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --coco_path /path/to/coco
    --batch_size 1 \
    --lr 2.5e-5 \
    --epochs 150 \
    --backbone_name base \
    --pre_trained /path/to/deit-base.pth\
    --eval_size 800 \
    --init_pe_size 800 1344 \
    --mid_pe_size 800 1344 \
    --output_dir /output/path/box_model

Evaluation

To evaluate YOLOS-Ti model on COCO, run:

python main.py --coco_path /path/to/coco --batch_size 2 --backbone_name tiny --eval --eval_size 512 --init_pe_size 800 1333 --resume /path/to/YOLOS-Ti

To evaluate YOLOS-S model on COCO, run:

python main.py --coco_path /path/to/coco --batch_size 1 --backbone_name small --eval --eval_size 800 --init_pe_size 512 864 --mid_pe_size 512 864 --resume /path/to/YOLOS-S

To evaluate YOLOS-S (dWr) model on COCO, run:

python main.py --coco_path /path/to/coco --batch_size 1 --backbone_name small_dWr --eval --eval_size 800 --init_pe_size 512 864 --mid_pe_size 512 864 --resume /path/to/YOLOS-S(dWr)

To evaluate YOLOS-B model on COCO, run:

python main.py --coco_path /path/to/coco --batch_size 1 --backbone_name small --eval --eval_size 800 --init_pe_size 800 1344 --mid_pe_size 800 1344 --resume /path/to/YOLOS-B

Visualization

We have observed some intriguing properties of YOLOS, and we are working on a notebook to better demonstrate them, please stay tuned :)

Visualize box prediction and object categories distribution：

To Get visualization in the paper, you need the finetuned YOLOS models on COCO, run following command to get 100 Det-Toks prediction on COCO val split, then it will generate /path/to/YOLOS/visualization/modelname-eval-800-eval-pred.json

python cocoval_predjson_generation.py --coco_path /path/to/coco --batch_size 1 --backbone_name small --eval --eval_size 800 --init_pe_size 512 864 --mid_pe_size 512 864 --resume /path/to/yolos-s-model.pth --output_dir ./visualization

To get all ground truth object categories on all images from COCO val split, run following command to generate /path/to/YOLOS/visualization/coco-valsplit-cls-dist.json

python cocoval_gtclsjson_generation.py --coco_path /path/to/coco --batch_size 1 --output_dir ./visualization

To visualize the distribution of Det-Toks' bboxs and categories, run following command to generate .png files in /path/to/YOLOS/visualization/

 python visualize_dettoken_dist.py --visjson /path/to/YOLOS/visualization/modelname-eval-800-eval-pred.json --cococlsjson /path/to/YOLOS/visualization/coco-valsplit-cls-dist.json

Visualize self-attention of the [DetTok] token on the different heads of the last layer：

we are working on a notebook to better demonstrate them, please stay tuned :)

Acknowledgement ❤️

This project is based on DETR (paper, code), DeiT (paper, code) and timm. Thanks for their wonderful works.

Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝 :

@article{YOLOS,
  title={You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection},
  author={Fang, Yuxin and Liao, Bencheng and Wang, Xinggang and Fang, Jiemin and Qi, Jiyang and Wu, Rui and Niu, Jianwei and Liu, Wenyu},
  journal={arXiv preprint arXiv:2106.00666},
  year={2021}
}

Comments

Input size can not be dynamic?

I tried something like this:

 python demo.py --resume weights/yolos_s_dWr.pth --data_file ../yolov7/images/COCO_val2014_000000001856.jpg --mid_pe_size 800 864 --init_pe_size 800 864
Not using distributed mode
Namespace(backbone_name='small_dWr', batch_size=2, bbox_loss_coef=5, clip_max_norm=0.1, coco_panoptic_path=None, coco_path=None, data_file='../yolo/images/COCO_val2014_000000001856.jpg', dataset_file='coco', decay_rate=0.1, det_token_num=100, device='cuda', dice_loss_coef=1, dist_url='env://', distributed=False, eos_coef=0.1, epochs=150, eval=False, eval_size=800, giou_loss_coef=2, init_pe_size=[800, 864], lr=0.0001, lr_backbone=1e-05, lr_drop=100, mid_pe_size=[800, 864], min_lr=1e-07, num_workers=2, output_dir='', pre_trained='', remove_difficult=False, resume='weights/yolos_s_dWr.pth', sched='warmupcos', seed=42, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, start_epoch=0, use_checkpoint=False, warmup_epochs=0, warmup_lr=1e-06, weight_decay=0.0001, world_size=1)

Got:

torch1.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Detector:
	size mismatch for backbone.pos_embed: copying a param with shape torch.Size([1, 1829, 330]) from checkpoint, the shape in current model is torch.Size([1, 2801, 330]).
	size mismatch for backbone.mid_pos_embed: copying a param with shape torch.Size([13, 1, 1829, 330]) from checkpoint, the shape in current model is torch.Size([13, 1, 2801, 330]).

bug

opened by jinfagang 6

How is the performance on Pascal VOC？

❔How is the performance on Pascal VOC？

Hi, I test YOLOS on pascal voc 2007 with default parameters, I can't get a satisfactory result, here is my result：

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.276 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.497 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.274 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.006 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.085 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.390 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.297 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.433 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.490 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.053 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.289 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.628

I wonder is there anything go wrong, could you give me some advice?

Additional context
question

opened by MingXiangL 4
About Learning Rate Scheduler

❔Question

Why the step of learning rate scheduler after each epoch instead of each batch in main.py?

Won't the change rate of lr be too slow? (and unstable for various dataset sizes)
question

opened by 1049451037 2
AMP Support?

Thanks for your great work and releasing the code!

I find that in engine.py, AMP-related code is commented out. And I am wondering that if I can use AMP in this project. Would it speed up the training and would it hurt the performance?
question

opened by impiga 2

[URGENT] Eval results are much lower than what's reported

Hi, thanks for the excellent work!

I follow the instructions in README to evaluate the models provided in your repo. However, the AP I got for yolos_ti .pth, yolos_s_200_pre.pth, yolos_s_300_pre.pth, yolos_s_dWr.pth, and yolos_base.pth are 28.7, 12.5, 12.7, 13.2, and 13.8, respectively. While yolos_ti.pth matches the performance in your paper and log, other four models are significantly lower than what's expected. Any idea why this would happen? Thanks in advance!

For example, when evaluating the base model, I ran

python  -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path ../data/coco --batch_size 2 --backbone_name base --eval --eval_size 800 --init_pe_size 800 1344 --mid_pe_size 800 1344 --resume ../trained_weights/yolos/yolos_base.pth

and was expected to obtain a 42.0 AP performance, as shown in your paper and log. However, the result is only 13.8 AP.

The complete evaluation output is shown below.

*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
| distributed init (rank 0): env://
| distributed init (rank 2): env://
| distributed init (rank 3): env://
| distributed init (rank 1): env://
| distributed init (rank 6): env://
| distributed init (rank 5): env://
| distributed init (rank 7): env://
| distributed init (rank 4): env://
Namespace(backbone_name='base', batch_size=2, bbox_loss_coef=5, clip_max_norm=0.1, coco_panoptic_path=None, coco_path='../data/coco', dataset_file='coco', decay_rate=0.1, det_token_num=100, device='cuda', dice_loss_coef=1, dist_backend='nccl', dist_url='env://', distributed=True, eos_coef=0.1, epochs=150, eval=True, eval_size=800, giou_loss_coef=2, gpu=0, init_pe_size=[800, 1344], lr=0.0001, lr_backbone=1e-05, lr_drop=100, mid_pe_size=[800, 1344], min_lr=1e-07, num_workers=2, output_dir='', pre_trained='', rank=0, remove_difficult=False, resume='../trained_weights/yolos/yolos_base.pth', sched='warmupcos', seed=42, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, start_epoch=0, use_checkpoint=False, warmup_epochs=0, warmup_lr=1e-06, weight_decay=0.0001, world_size=8)
Has mid pe
number of params: 127798368
loading annotations into memory...
Done (t=23.52s)
creating index...
index created!
800
loading annotations into memory...
Done (t=3.00s)
creating index...
index created!
Test:  [  0/313]  eta: 0:39:39  class_error: 29.21  loss: 2.1542 (2.1542)  loss_bbox: 0.4245 (0.4245)  loss_ce: 0.7761 (0.7761)  loss_giou: 0.9535 (0.9535)  cardinality_error_unscaled: 5.3750 (5.3750)  class_error_unscaled: 29.2100 (29.2100)  loss_bbox_unscaled: 0.0849 (0.0849)  loss_ce_unscaled: 0.7761 (0.7761)  loss_giou_unscaled: 0.4768 (0.4768)  time: 7.6030  data: 0.5298  max mem: 3963
Test:  [256/313]  eta: 0:00:26  class_error: 17.22  loss: 2.5668 (2.6435)  loss_bbox: 0.5639 (0.5792)  loss_ce: 0.8598 (0.8386)  loss_giou: 1.1904 (1.2257)  cardinality_error_unscaled: 3.8750 (4.2398)  class_error_unscaled: 28.7817 (28.6160)  loss_bbox_unscaled: 0.1128 (0.1158)  loss_ce_unscaled: 0.8598 (0.8386)  loss_giou_unscaled: 0.5952 (0.6129)  time: 0.4406  data: 0.0137  max mem: 10417
Test:  [312/313]  eta: 0:00:00  class_error: 16.29  loss: 2.8745 (2.6626)  loss_bbox: 0.5974 (0.5833)  loss_ce: 0.8791 (0.8461)  loss_giou: 1.3012 (1.2332)  cardinality_error_unscaled: 3.8750 (4.2370)  class_error_unscaled: 26.2946 (28.7748)  loss_bbox_unscaled: 0.1195 (0.1167)  loss_ce_unscaled: 0.8791 (0.8461)  loss_giou_unscaled: 0.6506 (0.6166)  time: 0.4251  data: 0.0134  max mem: 10417
Test: Total time: 0:02:25 (0.4663 s / it)
Averaged stats: class_error: 16.29  loss: 2.8745 (2.6626)  loss_bbox: 0.5974 (0.5833)  loss_ce: 0.8791 (0.8461)  loss_giou: 1.3012 (1.2332)  cardinality_error_unscaled: 3.8750 (4.2370)  class_error_unscaled: 26.2946 (28.7748)  loss_bbox_unscaled: 0.1195 (0.1167)  loss_ce_unscaled: 0.8791 (0.8461)  loss_giou_unscaled: 0.6506 (0.6166)
Accumulating evaluation results...
DONE (t=15.78s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.13810
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.26766
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.11832
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.05146
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.13066
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.23324
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.18115
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.29001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.31740
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.12520
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.31154
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.49446

bug good first issue

opened by encounter1997 2

About learning rate scheduler

❔Question

From your code main.py, I only see the learning rate updated after every epoch: https://github.com/hustvl/YOLOS/blob/2e10dc45b5effc015bd64d597b546caf47fb3c0e/main.py#L217

Looking at your logs, it also seems to confirm that. Did you use a warm-up learning rate scheduler for the first few iterations?
question

opened by davidnvq 2
Small learning rate value

❔Question

Thank you for your great work to examine transformers in OD. My question is that why do we start with a very small learning rate 2.5 * 10e-5 as there is no clue in your paper? My first guess is that you inherited the settings from the DETR framework.

Have you tried with larger learning rates? To speed up the training procedure with more GPUs, any rule to scale up the learning rate for YOLOS as you experimented without losing the performance?

Many thanks.
question

opened by davidnvq 2
Object Detection LB

❔Question

Congratulation for publishing a good work. How is performance wrt to YOLO5 and other YOlo series and also its standing on Object detection LB.

Additional context
question

opened by jaideep11061982 2
Where are the pre-trained models?

In your paper, you say that your pre-trained models are under this repo, but I can't find them! I had searched it everywhere, but I can't find them at any other place. I have no time to train them by myself, so I need your help! If you can give them to me privately, you can send them to [email protected]. Thanks!
enhancement

opened by HeySUPERMELON 1
Anyone else getting memory issues?

❔Question

Hello! I wonder if anyone else is getting GPU memory errors even with the small model (yolos_small) ?

Additional context

I am on a 4 GPUs node with Geforce Gtx 1080 ti with 11gb memory each. I use batch size 1 as recommended. Both distributed and non-distributed versions throw the same error.

Tiny model trains smoothly without a trouble.

If there are any tips to reduce memory usage that would be awesome as well!
question

opened by kilickaya 1
Where're the pre-trained models?

In your paper, you say that your pre-trained models are under this repo, but I can't find them! I had found it everywhere, but I can't find them at any other place since I have no time to train them by myself. If you can give them to me privately, you can send them to [email protected]. Thanks!
enhancement

opened by HeySUPERMELON 0
ONNX Export

❔Question

Can we export YOLOS models to ONNX format?

Additional context

Because I want to deploy YOLOS model on onnxruntime to save deployment cost and run it via docker on NVIDIA Jetson series
question

opened by zogojogo 0
CUDA Out of Memory Errors w Batch Size of 1 on 16GB V100

Using the default FeatureExtractor settings for the HuggingFace port of YOLOS, I am consistently running into CUDA OOM errors on a 16GB V100 (even with a training batch size of 1).

I would like to train YOLOS on publaynet and ideally use 4-8 V100s.

Is there a way to lower the CUDA memory usage while training YOLOS besides batch size (whilst preserving the accuracy and leveraging the pertained models)?

I see that other models (e.g. DiT) use image sizes of 244x244. However, is it fair to assume that such a small image size would not be appropriate for object detection as too much information is lost? In the DiT case document image classification was the objective.
question

opened by jordanparker6 4
Train problem with VOC

❔Question

I convert PASCAL VOC dataset to COCO format, but when I trained yolos-tiny with 150 epochs and pre-trained weights , the results is so bad. I get no ideas.

Additional context

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.039 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.084 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.032 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.005 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.059 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.124 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.234 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.283 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.004 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.066 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.402
question

opened by chenyi-real 0
Hi，what is the magnitude of model parameters ?

❔Question

Hi，what is the magnitude of YOLOS model parameters ?

Additional context

PS :Do you have the official implementation of article 'Benchmarking Detection Transfer Learning with Vision Transformers' over there ？I use your MIMdet model, but my GPU is RTX3070 with 8G and can't run it.
question

opened by ross-Hr 0
Adding YOLOS to HuggingFace Transformers

Hi YOLOS team :)

I've implemented YOLOS as a fork of 🤗 HuggingFace Transformers, and I'm going to add it soon to the library (see https://github.com/huggingface/transformers/pull/16848). Here's a notebook that illustrates inference with it: https://colab.research.google.com/drive/18ti9HrRoVE6d0vGBtnaeq93Tau3EYqOK?usp=sharing

The reason I'm adding YOLOS is because I really like the simplicity of it, compared to very complex frameworks such as Faster R-CNN and Mask R-CNN. I've added DETR previously also because it simplifies the task of object detection a lot.

As you may or may not know, any model on the HuggingFace hub has its own Github repository. E.g. the YOLOS-small checkpoint can be found here: https://huggingface.co/nielsr/yolos-s. If you check the "files and versions" tab, it includes the weights. The model hub uses git-LFS (large file storage) to use Git with large files such as model weights. This means that any model has its own Git commit history!

A model card can also be added to the repo, which is just a README.

Are you interested in creating an organization on the hub, such that we can store all model checkpoints there (rather than under my user name)?

Let me know!

Kind regards,

Niels ML Engineer @ HuggingFace

opened by NielsRogge 6
Error of the size mismatch for pos_embed

We load our pretrained model of vit-base trained with mae method, and we meet the size mismatch for pos_embed. Is there any solution to this problem please?

RuntimeError: Error(s) in loading state_dict for VisionTransformer: size mismatch for pos_embed: copying a param with shape torch.Size([1, 785, 768]) from checkpoint, the shape in current model is torch.Size([1, 578, 768]).
question

opened by lxn96 1

Owner

Hust Visual Learning Team

Hust Visual Learning Team belongs to the Artificial Intelligence Research Institute in the School of EIC in HUST

GitHub

Implementation of SETR model, Original paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.

SETR - Pytorch Since the original paper (Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.) has no official

112 Dec 16, 2022

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021) Citation Please cite as: @inproceedings{liu2020understan

22 Nov 25, 2022

[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

897 Jan 5, 2023

Sequence to Sequence Models with PyTorch

Sequence to Sequence models with PyTorch This repository contains implementations of Sequence to Sequence (Seq2Seq) models in PyTorch At present it ha

708 Dec 19, 2022

Sequence-to-Sequence learning using PyTorch

Seq2Seq in PyTorch This is a complete suite for training sequence-to-sequence models in PyTorch. It consists of several models and code to both train

514 Nov 17, 2022

Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

This is a fork of Fairseq(-py) with implementations of the following models: Pervasive Attention - 2D Convolutional Neural Networks for Sequence-to-Se

490 Dec 15, 2022

An implementation of a sequence to sequence neural network using an encoder-decoder

Keras implementation of a sequence to sequence model for time series prediction using an encoder-decoder architecture. I created this post to share a

195 Dec 17, 2022

Sequence lineage information extracted from RKI sequence data repo

Pango lineage information for German SARS-CoV-2 sequences This repository contains a join of the metadata and pango lineage tables of all German SARS-

24 Oct 26, 2022

Official repository of OFA. Paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Paper | Blog OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image gene

1.4k Jan 8, 2023

You Only Look One-level Feature (YOLOF), CVPR2021, Detectron2

You Only Look One-level Feature (YOLOF), CVPR2021 A simple, fast, and efficient object detector without FPN. This repo provides a neat implementation

273 Jan 3, 2023

implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks

YOLOR implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks To reproduce the results in the paper, please us

1.8k Jan 4, 2023

Code for the paper "Reinforcement Learning as One Big Sequence Modeling Problem"

Trajectory Transformer Code release for Reinforcement Learning as One Big Sequence Modeling Problem. Installation All python dependencies are in envir

269 Jan 5, 2023

Code for the paper "Offline Reinforcement Learning as One Big Sequence Modeling Problem"

Trajectory Transformer Code release for Offline Reinforcement Learning as One Big Sequence Modeling Problem. Installation All python dependencies are

266 Dec 27, 2022

Code for the paper One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation, CVPR 2021.

One Thing One Click One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation (CVPR2021) Code for the paper One Thi

44 Dec 12, 2022

People log into different sites every day to get information and browse through these sites one by one

HyperLink People log into different sites every day to get information and browse through these sites one by one. And they are exposed to advertisemen

0 Feb 17, 2022

naked is a Python tool which allows you to strip a model and only keep what matters for making predictions.

naked is a Python tool which allows you to strip a model and only keep what matters for making predictions. The result is a pure Python function with no third-party dependencies that you can simply copy/paste wherever you wish.

24 Dec 20, 2022

The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient.

You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient (paper) @misc{zhang2021compress,

46 Dec 7, 2022

PyTorch implementation of the YOLO (You Only Look Once) v2

PyTorch implementation of the YOLO (You Only Look Once) v2 The YOLOv2 is one of the most popular one-stage object detector. This project adopts PyTorc

433 Nov 24, 2022

You Only Hypothesize Once: Point Cloud Registration with Rotation-equivariant Descriptors

You Only Hypothesize Once: Point Cloud Registration with Rotation-equivariant Descriptors In this paper, we propose a novel local descriptor-based fra

80 Dec 15, 2022

You Only 👀 One Sequence

Related tags

Overview

You Only 👀 One Sequence

You Only Look at One Sequence (YOLOS)

The Illustration of YOLOS

Highlights

Results

Requirement

Data preparation

Training

Evaluation

Visualization

Acknowledgement ❤️

Citation

Comments

❔How is the performance on Pascal VOC？

Additional context

❔Question

❔Question

❔Question

❔Question

Additional context

❔Question

Additional context

❔Question

Additional context

❔Question

Additional context

❔Question

Additional context

Owner

Hust Visual Learning Team

Implementation of SETR model, Original paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Sequence to Sequence Models with PyTorch

Sequence-to-Sequence learning using PyTorch

Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

An implementation of a sequence to sequence neural network using an encoder-decoder

Sequence lineage information extracted from RKI sequence data repo

Official repository of OFA. Paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

You Only Look One-level Feature (YOLOF), CVPR2021, Detectron2

implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks

Code for the paper "Reinforcement Learning as One Big Sequence Modeling Problem"

Code for the paper "Offline Reinforcement Learning as One Big Sequence Modeling Problem"

Code for the paper One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation, CVPR 2021.

People log into different sites every day to get information and browse through these sites one by one

naked is a Python tool which allows you to strip a model and only keep what matters for making predictions.

The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient.

PyTorch implementation of the YOLO (You Only Look Once) v2

You Only Hypothesize Once: Point Cloud Registration with Rotation-equivariant Descriptors