Image BERT Pre-Training with iBOT
Official PyTorch implementation and pretrained models for the paper iBOT: Image BERT Pre-Training with Online Tokenizer.
iBOT is a novel self-supervised pre-training framework that performs masked image modeling with self-distillation. The iBOT pre-trained model learns local semantic features, which helps it transfer well to downstream tasks at both the global and the local scale. For example, iBOT achieves strong performance on COCO object detection (51.4 box AP and 44.2 mask AP) and ADE20K semantic segmentation (50.0 mIoU) with a vanilla ViT-B/16. iBOT can also extract semantically meaningful local parts, like a dog's ear.
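As a rough illustration of how these two objectives can be combined, below is a minimal, hypothetical PyTorch sketch of an iBOT-style loss: a DINO-like self-distillation term on the [CLS] token plus a masked-image-modeling term in which the student, fed a masked view, predicts the patch-token distributions produced by the online tokenizer (the teacher) on the intact image. All names, shapes, and temperatures are illustrative assumptions, not the repository's actual API; in the actual method the [CLS] term is computed across two augmented crops.

# Illustrative sketch only: names, shapes, and temperatures are assumptions, not the repo's API.
import torch
import torch.nn.functional as F

def ibot_style_loss(student_cls, student_patch, teacher_cls, teacher_patch, mask,
                    student_temp=0.1, teacher_temp=0.07):
    # student_cls / teacher_cls: [B, K] projection-head logits for the [CLS] token
    # student_patch / teacher_patch: [B, N, K] logits for the N patch tokens
    # mask: [B, N] boolean, True where the student's input patches were masked
    mask = mask.float()

    # [CLS] self-distillation (DINO-style): the student matches the teacher's sharpened distribution.
    t_cls = F.softmax(teacher_cls / teacher_temp, dim=-1).detach()
    loss_cls = (-t_cls * F.log_softmax(student_cls / student_temp, dim=-1)).sum(-1).mean()

    # Masked image modeling: at masked positions only, the student predicts the online
    # tokenizer's (teacher's) patch-token distribution computed from the unmasked image.
    t_patch = F.softmax(teacher_patch / teacher_temp, dim=-1).detach()
    loss_patch = (-t_patch * F.log_softmax(student_patch / student_temp, dim=-1)).sum(-1)
    loss_mim = (loss_patch * mask).sum() / mask.sum().clamp(min=1.0)

    return loss_cls + loss_mim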
Update 🎉
- December 2021 - Release the code and pre-trained models.
- November 2021 - Release the pre-print on arXiv.
Installation
See installation instructions for details.
Training
For a glimpse at the full documentation of iBOT pre-training, please run:
python main_ibot.py --help
iBOT Pre-Training with ViTs
To start iBOT pre-training with Vision Transformers (ViT), simply run the following command. JOB_NAME is a customized argument to distinguish different experiments, and checkpoints will automatically be saved into separate folders; the trailing number in the command specifies the total number of GPUs used.
./run.sh imagenet_pretrain $JOB_NAME vit_{small,base,large} teacher {16,24,64}
The exact arguments to reproduce the models presented in our paper can be found in the args column of the pre-trained models table below. We also provide the pre-training logs to help with reproducibility.
For example, run iBOT with the ViT-S/16 network on two nodes with 8 GPUs each for 800 epochs using the following command. The resulting checkpoint should reach 75.2% k-NN accuracy, 77.9% linear probing accuracy, and 82.3% fine-tuning accuracy.
./run.sh imagenet_pretrain $JOB_NAME vit_small teacher 16 \
--teacher_temp 0.07 \
--warmup_teacher_temp_epochs 30 \
--norm_last_layer false \
--epochs 800 \
--batch_size_per_gpu 64 \
--shared_head true \
--out_dim 8192 \
--local_crops_number 10 \
--global_crops_scale 0.25 1 \
--local_crops_scale 0.05 0.25 \
--pred_ratio 0 0.3 \
--pred_ratio_var 0 0.2
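The --pred_ratio and --pred_ratio_var arguments control what fraction of patch tokens is masked for the MIM objective. A plausible reading of passing two pairs, as above, is that each image first picks one (ratio, variance) pair and then jitters the ratio uniformly within that variance; the sketch below only illustrates that interpretation and is not code lifted from the repository.

# Hypothetical interpretation of --pred_ratio / --pred_ratio_var; the exact sampling
# logic lives in the repository's data loader and may differ from this sketch.
import random

def sample_pred_ratio(pred_ratio=(0.0, 0.3), pred_ratio_var=(0.0, 0.2)):
    i = random.randrange(len(pred_ratio))
    ratio, var = pred_ratio[i], pred_ratio_var[i]
    if var > 0:
        ratio = random.uniform(ratio - var, ratio + var)
    return max(ratio, 0.0)

# Under this reading, roughly half the samples keep all patches visible (ratio 0),
# while the rest mask between 10% and 50% of the patches (0.3 +/- 0.2).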
iBOT Pre-Training with Swins
This code also works for training iBOT on Swin Transformers (Swin). In the paper, we only conduct experiments on Swin-T with different window sizes:
./run.sh imagenet_pretrain $JOB_NAME swin_tiny teacher {16,40} \
--patch_size 4 \
--window_size {7,14}
For example, run iBOT with the Swin-T/14 network on five nodes with 8 GPUs each for 300 epochs using the following command. The resulting checkpoint should reach 76.2% k-NN accuracy and 79.3% linear probing accuracy.
./run.sh imagenet_pretrain $JOB_NAME swin_tiny teacher 40 \
--teacher_temp 0.07 \
--warmup_teacher_temp_epochs 30 \
--norm_last_layer false \
--epochs 300 \
--batch_size_per_gpu 26 \
--shared_head true \
--out_dim 8192 \
--local_crops_number 10 \
--global_crops_scale 0.25 1 \
--local_crops_scale 0.05 0.25 \
--pred_ratio 0 0.3 \
--pred_ratio_var 0 0.2 \
--pred_start_epoch 50 \
--patch_size 4 \
--window_size 14
Pre-Trained Models
You can choose to download only the weights of the pretrained backbone used for downstream tasks, or the full ckpt which contains backbone and projection head weights for both student and teacher networks. For the backbone, s denotes that the student network is selected, while t denotes that the teacher network is selected.
| Arch. | Par. | k-NN | Lin. | Fin. | download | | | |
|---|---|---|---|---|---|---|---|---|
| ViT-S/16 | 21M | 74.5% | 77.0% | 82.3% | backbone (t) | full ckpt | args | logs |
| Swin-T/7 | 28M | 75.3% | 78.6% | \ | backbone (t) | full ckpt | args | logs |
| Swin-T/14 | 28M | 76.2% | 79.3% | \ | backbone (t) | full ckpt | args | logs |
| ViT-B/16 | 85M | 77.1% | 79.5% | 83.8% | backbone (t) | full ckpt | args | logs |
We also provide ViT-{B,L}/16 models pre-trained on the ImageNet-22K dataset.
| Arch. | Par. | k-NN | Lin. | Fin. | download | | | |
|---|---|---|---|---|---|---|---|---|
| ViT-B/16 | 85M | 71.1% | 79.0% | 84.4% | backbone (s) | full ckpt | args | logs |
| ViT-L/16 | 307M | 70.6% | 81.7% | 86.3% | backbone (s) | full ckpt | args | logs |
To extract the backbone from the full checkpoint by yourself, please run the following command, where KEY is either student or teacher.
WEIGHT_FILE=$OUTPUT_DIR/checkpoint_$KEY.pth
python extract_backbone_weights.py \
--checkpoint_key $KEY \
$PRETRAINED \
$WEIGHT_FILE
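After extraction (or after downloading a backbone checkpoint directly), the weights can be loaded for feature extraction roughly as follows. This is a minimal sketch assuming the repository's models package exposes a vit_small constructor and that the forward pass returns a global feature; match the architecture and patch size to the checkpoint you use.

# Minimal loading sketch; module layout, constructor name, and output shape are assumptions.
import torch
import models  # the repository's model definitions, assumed to expose vit_small et al.

backbone = models.vit_small(patch_size=16)         # must match the checkpoint's architecture
state_dict = torch.load("checkpoint_teacher.pth", map_location="cpu")
msg = backbone.load_state_dict(state_dict, strict=False)
print(msg)                                         # check for missing / unexpected keys

backbone.eval()
with torch.no_grad():
    feats = backbone(torch.randn(1, 3, 224, 224))  # global image feature (shape depends on the model)
print(feats.shape)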
Downstream Evaluation
See Evaluating iBOT on Downstream Tasks for details.
Property Analysis
See Analyzing iBOT's Properties for details on the robustness test, visualizing the self-attention map, or extracting sparse correspondence pairs between two images.
Extracting Semantic Patterns
We extract the top-k local classes based on patch tokens, together with their corresponding patches and contexts, by running the following command. We identify very diverse behaviour, such as shared low-level textures and high-level semantics.
python3 -m torch.distributed.launch --nproc_per_node=8 \
--master_port=${MASTER_PORT:-29500} \
analysis/extract_pattern/extract_topk_cluster.py \
--pretrained_path $PRETRAINED \
--checkpoint {student,teacher} \
--type patch \
--topk 36 \
--patch_window 5 \
--show_pics 20 \
--arch vit_small \
--save_path memory_bank_patch.pth \
--data_path data/imagenet/val
The script also supports extracting the pattern layout on the [CLS] token, which effectively performs clustering or unsupervised classification. This property is not induced by the MIM objective, since we also observe this behaviour in DINO.
python3 -m torch.distributed.launch --nproc_per_node=8 \
--master_port=${MASTER_PORT:-29500} \
analysis/extract_pattern/extract_topk_cluster.py \
--pretrained_path $PRETRAINED \
--checkpoint {student,teacher} \
--type cls \
--topk 36 \
--show_pics 20 \
--arch vit_small \
--save_path memory_bank_cls.pth \
--data_path data/imagenet/val
Acknowledgement
This repository is built using the DINO repository and the BEiT repository.
License
This repository is released under the Apache 2.0 license as found in the LICENSE file.
Citing iBOT
If you find this repository useful, please consider giving a star and a citation:
@article{zhou2021ibot,
title={iBOT: Image BERT Pre-Training with Online Tokenizer},
author={Zhou, Jinghao and Wei, Chen and Wang, Huiyu and Shen, Wei and Xie, Cihang and Yuille, Alan and Kong, Tao},
journal={arXiv preprint arXiv:2111.07832},
year={2021}
}