iBOT: Image BERT Pre-Training with Online Tokenizer

Image BERT Pre-Training with iBOT


Official PyTorch implementation and pre-trained models for the paper iBOT: Image BERT Pre-Training with Online Tokenizer.

[arXiv] [BibTex]

iBOT framework

iBOT is a novel self-supervised pre-training framework that performs masked image modeling with self-distillation. Models pre-trained with iBOT exhibit local semantic features, which helps them transfer well to downstream tasks at both the global and the local scale. For example, iBOT achieves strong performance on COCO object detection (51.4 box AP and 44.2 mask AP) and ADE20K semantic segmentation (50.0 mIoU) with a vanilla ViT-B/16. iBOT can also extract semantically meaningful local parts, such as a dog's ear 🐶.

Update 🎉

  • December 2021 - Released the code and pre-trained models.
  • November 2021 - Released the pre-print on arXiv.


See installation instructions for details.


For a glimpse at the full documentation of iBOT pre-training, please run:

python main_ibot.py --help

iBOT Pre-Training with ViTs

To start iBOT pre-training with a Vision Transformer (ViT), simply run the following command. JOB_NAME is a user-chosen name that distinguishes experiments; checkpoints are automatically saved into a separate folder for each job.

./run.sh imagenet_pretrain $JOB_NAME vit_{small,base,large} teacher {16,24,64}

The exact arguments to reproduce the models presented in our paper can be found in the args column of the pre-trained models. We also provide the logs for pre-training to help reproducibility.

For example, run iBOT with a ViT-S/16 network on two nodes with 8 GPUs each for 800 epochs using the following command. The resulting checkpoint should reach 75.2% k-NN accuracy, 77.9% linear probing accuracy, and 82.3% fine-tuning accuracy.

./run.sh imagenet_pretrain $JOB_NAME vit_small teacher 16 \
  --teacher_temp 0.07 \
  --warmup_teacher_temp_epochs 30 \
  --norm_last_layer false \
  --epochs 800 \
  --batch_size_per_gpu 64 \
  --shared_head true \
  --out_dim 8192 \
  --local_crops_number 10 \
  --global_crops_scale 0.25 1 \
  --local_crops_scale 0.05 0.25 \
  --pred_ratio 0 0.3 \
  --pred_ratio_var 0 0.2
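The `--teacher_temp` and `--warmup_teacher_temp_epochs` flags above describe a temperature warm-up for the teacher. The sketch below illustrates one way such a schedule can be computed. It assumes a linear ramp in the style of DINO (on which this repository is built) and takes the starting value 0.04 from the `warmup_teacher_temp` entry of the config dump quoted further down this page. The function name `teacher_temp_schedule` is hypothetical and is not taken from the iBOT code.

```python
def teacher_temp_schedule(warmup_temp, final_temp, warmup_epochs, total_epochs):
    """Per-epoch teacher temperature: linear ramp, then constant (assumed shape)."""
    schedule = []
    for epoch in range(total_epochs):
        if epoch < warmup_epochs:
            # Fraction of the ramp completed; the last warm-up epoch hits final_temp.
            frac = epoch / max(warmup_epochs - 1, 1)
            schedule.append(warmup_temp + frac * (final_temp - warmup_temp))
        else:
            schedule.append(final_temp)
    return schedule


# Values from the ViT-S/16 command: 0.07 after 30 warm-up epochs, 800 epochs total.
temps = teacher_temp_schedule(0.04, 0.07, warmup_epochs=30, total_epochs=800)
print(temps[0], temps[15], temps[-1])
```

Under these assumptions the temperature starts at 0.04, rises over the first 30 epochs, and then stays at 0.07 for the rest of training.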

iBOT Pre-Training with Swins

This code also works for training iBOT on Swin Transformer (Swin). In the paper, we only conduct experiments on Swin-T with different window sizes:

./run.sh imagenet_pretrain $JOB_NAME swin_tiny teacher {16,40} \
  --patch_size 4 \
  --window_size {7,14}

For example, run iBOT with a Swin-T/14 network on five nodes with 8 GPUs each for 300 epochs using the following command. The resulting checkpoint should reach 76.2% k-NN accuracy and 79.3% linear probing accuracy.

./run.sh imagenet_pretrain $JOB_NAME swin_tiny teacher 40 \
  --teacher_temp 0.07 \
  --warmup_teacher_temp_epochs 30 \
  --norm_last_layer false \
  --epochs 300 \
  --batch_size_per_gpu 26 \
  --shared_head true \
  --out_dim 8192 \
  --local_crops_number 10 \
  --global_crops_scale 0.25 1 \
  --local_crops_scale 0.05 0.25 \
  --pred_ratio 0 0.3 \
  --pred_ratio_var 0 0.2 \
  --pred_start_epoch 50 \
  --patch_size 4 \
  --window_size 14 
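To make the scale of the two example commands easier to compare, the sketch below computes their global batch sizes. It assumes that the last positional argument of `run.sh` (16 and 40) is the total number of GPUs, which matches "two nodes with 8 GPUs" and "five nodes with 8 GPUs" if the 8 GPUs are per node. The helper name `global_batch_size` is hypothetical.

```python
def global_batch_size(total_gpus, batch_size_per_gpu):
    """Number of images processed per optimizer step across all GPUs."""
    return total_gpus * batch_size_per_gpu


# ViT-S/16 example: 16 GPUs in total, --batch_size_per_gpu 64
vit_s = global_batch_size(total_gpus=16, batch_size_per_gpu=64)
# Swin-T/14 example: 40 GPUs in total, --batch_size_per_gpu 26
swin_t = global_batch_size(total_gpus=40, batch_size_per_gpu=26)

print(vit_s, swin_t)  # 1024 1040
```

If you train on fewer GPUs, keeping this product roughly constant is a common starting point, though the source does not state how sensitive the results are to it.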

Pre-Trained Models

You can download either only the weights of the pre-trained backbone used for downstream tasks, or the full checkpoint, which contains the backbone and projection head weights for both the student and teacher networks. For the backbone, s denotes that the student network is selected, while t denotes that the teacher network is selected.

Arch.      Par.  k-NN   Lin.   Fin.   download
ViT-S/16   21M   74.5%  77.0%  82.3%  backbone (t)  full ckpt  args  logs
Swin-T/7   28M   75.3%  78.6%  \      backbone (t)  full ckpt  args  logs
Swin-T/14  28M   76.2%  79.3%  \      backbone (t)  full ckpt  args  logs
ViT-B/16   85M   77.1%  79.5%  83.8%  backbone (t)  full ckpt  args  logs

We also provide ViT-{B,L}/16 models pre-trained on the ImageNet-22K dataset.

Arch.     Par.  k-NN   Lin.   Fin.   download
ViT-B/16  85M   71.1%  79.0%  84.4%  backbone (s)  full ckpt  args  logs
ViT-L/16  307M  70.6%  81.7%  86.3%  backbone (s)  full ckpt  args  logs

To extract the backbone from the full checkpoint yourself, please run the following command, where KEY is either student or teacher.


python extract_backbone_weights.py \
  --checkpoint_key $KEY \
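As a rough illustration of what extracting a backbone from a full checkpoint involves, the sketch below filters one network's weights out of a nested dictionary. It is not the repository's `extract_backbone_weights.py`. The key prefixes `module.` and `backbone.` and the helper name `extract_backbone` are assumptions based on common DINO-style checkpoints, and plain integers stand in for tensors so that the example runs without PyTorch.

```python
def extract_backbone(full_ckpt, key):
    """Return backbone-only weights for `key` ('student' or 'teacher')."""
    if key not in ("student", "teacher"):
        raise ValueError("key must be 'student' or 'teacher'")
    backbone = {}
    for name, value in full_ckpt[key].items():
        # Drop the wrapper prefix that DistributedDataParallel typically adds.
        if name.startswith("module."):
            name = name[len("module."):]
        # Keep only backbone parameters and discard projection-head weights.
        if name.startswith("backbone."):
            backbone[name[len("backbone."):]] = value
    return backbone


# Toy checkpoint; in practice this dict would come from torch.load(path, map_location="cpu").
full_ckpt = {
    "student": {"module.backbone.cls_token": 1, "module.head.mlp.0.weight": 2},
    "teacher": {"backbone.cls_token": 3, "head.mlp.0.weight": 4},
}
print(extract_backbone(full_ckpt, "teacher"))  # {'cls_token': 3}
```

Inspect the keys of your own checkpoint before relying on these prefixes, since the actual layout may differ.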

Downstream Evaluation

See Evaluating iBOT on Downstream Tasks for details.

Property Analysis

See Analyzing iBOT's Properties for robustness tests and for visualizing self-attention maps:

iBOT Global Pattern Layout

or extracting sparse correspondence pairs between two images:

iBOT Global Pattern Layout

Extracting Semantic Patterns

We extract the top-k numbered local classes based on patch tokens, along with their corresponding patches and contexts, by running the following command. We identify very diverse behaviour, such as shared low-level textures and high-level semantics.

python3 -m torch.distributed.launch --nproc_per_node=8 \
    --master_port=${MASTER_PORT:-29500} \
    analysis/extract_pattern/extract_topk_cluster.py \
    --pretrained_path $PRETRAINED \
    --checkpoint {student,teacher} \
    --type patch \
    --topk 36 \
    --patch_window 5 \
    --show_pics 20 \
    --arch vit_small \
    --save_path memory_bank_patch.pth \
    --data_path data/imagenet/val
iBOT Local Part-Level Pattern Layout

The script also supports extracting the pattern layout on the [CLS] token, which amounts to clustering or unsupervised classification. This property is not induced by the MIM objective, since we also observe it in DINO.

python3 -m torch.distributed.launch --nproc_per_node=8 \
    --master_port=${MASTER_PORT:-29500} \
    analysis/extract_pattern/extract_topk_cluster.py \
    --pretrained_path $PRETRAINED \
    --checkpoint {student,teacher} \
    --type cls \
    --topk 36 \
    --show_pics 20 \
    --arch vit_small \
    --save_path memory_bank_cls.pth \
    --data_path data/imagenet/val
iBOT Global Pattern Layout
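To illustrate what "clustering or unsupervised classification" on the [CLS] token can mean, the sketch below treats each output dimension as a pseudo-class, assigns every image to its highest-scoring dimension, and keeps the best-scoring images per cluster. This is a simplified reading and not the logic of `extract_topk_cluster.py`. The function name `top_images_per_cluster`, its `per_cluster` parameter, and the toy scores are all hypothetical and do not mirror the script's flags.

```python
from collections import defaultdict


def top_images_per_cluster(scores, per_cluster):
    """scores: one list of pseudo-class scores per image.

    Returns {cluster_id: [image indices, highest score first]}.
    """
    members = defaultdict(list)
    for img_idx, row in enumerate(scores):
        cluster = max(range(len(row)), key=row.__getitem__)  # argmax over pseudo-classes
        members[cluster].append((row[cluster], img_idx))
    return {
        cluster: [idx for _, idx in sorted(items, reverse=True)[:per_cluster]]
        for cluster, items in members.items()
    }


# Four images scored against three pseudo-classes (made-up numbers).
scores = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
]
print(top_images_per_cluster(scores, per_cluster=1))  # {0: [0], 1: [3]}
```

No labels are used anywhere in this procedure, which is why the grouping can be read as unsupervised classification.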


This repository is built using the DINO repository and the BEiT repository.


This repository is released under the Apache 2.0 license as found in the LICENSE file.

Citing iBOT

If you find this repository useful, please consider giving it a star and citing the paper:

  title={iBOT: Image BERT Pre-Training with Online Tokenizer},
  author={Zhou, Jinghao and Wei, Chen and Wang, Huiyu and Shen, Wei and Xie, Cihang and Yuille, Alan and Kong, Tao},
  journal={arXiv preprint arXiv:2111.07832},
  • Reproducing the 100-epoch results

    Hello, first of all, thanks for your great work. I want to reproduce the 100-epoch results, i.e., the 71.5 k-NN accuracy in Figure 8. Can you tell me the corresponding args?

    opened by lzyhha 10
  • Semantic Segmentation on ADE20K

    Thank you for your outstanding work. Have you tested semantic segmentation on ADE20K with this code? I have encountered many problems (such as mmcv version issues and model init problems), and I want to confirm that this code can be used to test normally on ADE20K.

    opened by Trent-tangtao 5
  • Loss goes to NaN after several epochs

    First of all, well done and thank you for this great work.

    I am trying to launch iBOT experiments but struggle with the loss going to NaN after a few epochs. The training loss increases (see below), which seems weird. Have you faced similar issues during your experiments and, if so, how did you solve them? I have limited resources to launch my experiments, so I can't play too much with the parameters, and any tip would be greatly appreciated.

    I know that using mixed precision, as is the case here, can lead to stability issues. Indeed, I do not get any NaN loss when setting use_fp16 to False, but I'd rather keep use_fp16 set to True to maintain a reasonable training time.

    I tried increasing eps of AdamW to 1e-6 and of the batch norm layers to 6.1e-5 (as proposed here), but this did not work. I see in #17 that you propose decreasing beta opt 2 of AdamW; can you comment a bit more on this, and on other techniques if you know any?

    Below is how I launched the training (I'm on a Slurm cluster, so I use run_with_submitit.py, which calls main_ibot.py). The dataset size is ~4M images, hence around 3 times larger than the ImageNet-1K training set. This is why I divided the number of epochs for warmup_teacher_temp_epochs and warmup_epochs by 3. Not sure this makes sense, though.

    python run_with_submitit.py --arch vit_small \
        --ngpus 4 \
        --nodes 4 \
        --num_workers 8 \
        --teacher_temp 0.07 \
        --warmup_teacher_temp_epochs 10 \
        --warmup_epochs 3 \
        --norm_last_layer false \
        --epochs 100 \
        --batch_size_per_gpu 112 \
        --shared_head true \
        --out_dim 8192 \
        --local_crops_number 10 \
        --global_crops_scale 0.25 1 \
        --local_crops_scale 0.05 0.25 \
        --pred_ratio 0 0.3 \
        --pred_ratio_var 0 0.2 \
        --timeout 1200 \
        --partition gpu_xxx \
        --data_path "xxx" \
        --saveckp_freq 1

    Below, you can find the training metrics recorded :

    {"train_loss": 6.99587931743066, "train_cls": 4.331486056964688, "train_patch": 2.6643932607626066, "train_lr": 0.0005831685964416835, "train_wd": 0.040029588545113536, "train_acc": 0.6155056208539762, "train_nmi": 0.1298466117645908, "train_ari": 0.006587329895082833, "train_fscore": 0.043441455577130375, "train_adjacc": -1, "epoch": 0}
    {"train_loss": 10.466352438147872, "train_cls": 6.67679833335828, "train_patch": 3.789554104414066, "train_lr": 0.0017500000000000005, "train_wd": 0.040207159993842605, "train_acc": 0.408415272718953, "train_nmi": 0.14113477636477684, "train_ari": 0.0071149456743873594, "train_fscore": 0.030568697062104223, "train_adjacc": -1, "epoch": 1}
    {"train_loss": 12.008997473409751, "train_cls": 8.066221890750954, "train_patch": 3.9427755813775858, "train_lr": 0.002916831403558317, "train_wd": 0.0405621652690049, "train_acc": 0.3335961760343232, "train_nmi": 0.13128954990810676, "train_ari": 0.004251988918558582, "train_fscore": 0.03003438171217746, "train_adjacc": -1, "epoch": 2}

    Below you can find more information about the config used for the experiment :

    act_in_head: gelu
    arch: vit_small
    batch_size_per_gpu: 112
    clip_grad: 3.0
    constraint: ""
    data_path: ""
    dist_url: env://
    drop_path: 0.1
    epochs: 100
    freeze_last_layer: 1
    global_crops_number: 2
    global_crops_scale: [0.25, 1.0]
    gpu: 0
    lambda1: 1.0
    lambda2: 1.0
    local_crops_number: 10
    local_crops_scale: [0.05, 0.25]
    local_rank: 0
    lr: 0.0005
    min_lr: 1e-06
    momentum_teacher: 0.996
    ngpus: 4
    nodes: 4
    norm_in_head: None
    norm_last_layer: False
    num_workers: 8
    optimizer: adamw
    out_dim: 8192
    output_dir: ""
    partition: gpu_xxx
    patch_out_dim: 8192
    patch_size: 16
    pred_ratio: [0.0, 0.3]
    pred_ratio_var: [0.0, 0.2]
    pred_shape: block
    pred_start_epoch: 0
    qos: qos_xxx
    rank: 0
    saveckp_freq: 1
    seed: 0
    shared_head: True
    shared_head_teacher: True
    teacher_patch_temp: 0.07
    teacher_temp: 0.07
    timeout: 1200
    use_fp16: True
    use_masked_im_modeling: True
    warmup_epochs: 3
    warmup_teacher_patch_temp: 0.04
    warmup_teacher_temp: 0.04
    warmup_teacher_temp_epochs: 10
    weight_decay: 0.04
    weight_decay_end: 0.4
    window_size: 7
    world_size: 16

    Best regards

    opened by CharlieCheckpt 4
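When reading the metrics in the issue above, it can help to know which peak learning rate the run actually used. The sketch below assumes the DINO-style linear scaling rule (peak lr = lr × global batch size / 256) and a linear warm-up from zero. Neither is stated in the issue, and the helper name `mean_lr_per_epoch` is hypothetical. Under these assumptions the predicted per-epoch means land close to the logged `train_lr` values.

```python
def mean_lr_per_epoch(base_lr, batch_size_per_gpu, world_size, warmup_epochs):
    """Mean learning rate of each warm-up epoch for a linear ramp from 0."""
    # Assumed linear scaling rule: the base lr is defined for a batch of 256.
    peak_lr = base_lr * batch_size_per_gpu * world_size / 256
    # The mean of a linear ramp over epoch e is its value at the epoch midpoint.
    return [peak_lr * (e + 0.5) / warmup_epochs for e in range(warmup_epochs)]


# Values from the config dump: lr 0.0005, 112 images per GPU, 16 GPUs, 3 warm-up epochs.
predicted = mean_lr_per_epoch(0.0005, 112, 16, warmup_epochs=3)
logged = [0.0005831685964416835, 0.0017500000000000005, 0.002916831403558317]

for epoch, (p, l) in enumerate(zip(predicted, logged)):
    print(epoch, round(p, 6), round(l, 6))
```

If the assumptions hold, the effective peak learning rate in this run is 0.0005 × 1792 / 256 = 0.0035, seven times the configured `lr`, which may be worth keeping in mind when diagnosing the rising loss.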
  • checkpoint not saved by master

    As the code at https://github.com/bytedance/ibot/blob/3302b63fc7e287afc68601cb1dc2f0c311af8e3b/main_ibot.py#L358 shows, in DDP training every process (GPU) saves a checkpoint to disk. This behaviour may cause duplicate writes, and the saved checkpoint may then fail to load with torch.load.

    opened by luuuyi 3
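If every rank does write the same file, as the issue above describes, a common remedy is to let only rank 0 write and to make the write atomic. The sketch below shows that pattern with the standard library only: `pickle.dump` stands in for `torch.save`, and the function name `save_on_master` and its `rank` argument are assumptions rather than the repository's API.

```python
import os
import pickle
import tempfile


def save_on_master(state, path, rank):
    """Write `state` to `path` from rank 0 only; return True if a file was written."""
    if rank != 0:
        return False
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so readers never see a half-written checkpoint.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)  # stand-in for torch.save(state, f)
    os.replace(tmp_path, path)  # atomic when source and target share a filesystem
    return True


# Demo: launchers commonly expose the process rank through the RANK variable.
rank = int(os.environ.get("RANK", "0"))
demo_path = os.path.join(tempfile.mkdtemp(), "checkpoint_demo.pkl")
print(save_on_master({"epoch": 1}, demo_path, rank))
```

With this pattern, non-zero ranks skip the write entirely, so concurrent writes to one path cannot corrupt the file.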
  • Question about ImageNet 1% logistic regression

    Hello, I use eval_logistic_regression.py (lambd=0.1) to evaluate the provided ViT-Small model, but I only get 58.0 val top-1 accuracy with 1% of the data, versus 65.9 in your paper. Thanks for your help!

    Start the logistic regression.
    Matrix X, n=12811, p=384
    Switching to regular solver, problem is well conditioned
    Catalyst Accelerator
    MISO Solver
    Incremental Solver with uniform sampling
    Lipschitz constant: 0.25
    Multiclass logistic Loss is used
    L2 regularization
    Epoch: 10, primal objective: 6.90561, time: 211.912
    Best relative duality gap: 0.000430669
    Time elapsed : 212.632
    Logistic regression result: Acc: 0.58044

    opened by haohang96 2
  • About license

    Thanks for the great job. As you said, this repository is released under the Apache 2.0 license. I want to know whether that means the pre-trained models are also under the Apache 2.0 license? Thanks!

    opened by WangWenhao0716 2
  • Linear segmentation evaluation on ADE20k

    I am trying to reproduce the linear segmentation results obtained with the ViT-B iBOT pre-trained model, which reaches 38.3 mIoU according to the paper.

    With this model, and the config file provided in:


    I only reach ~18 mIoU on ADE20K. I saw that the command in the README changes the learning rate and normalizes the output, so I tried with:

    model.backbone.out_with_norm=true  optimizer.lr=8e-4

    and I got ~20mIoU.

    The only difference is that I am not using apex and the custom distributed optimizer, so I basically comment out:

    runner = dict(type='IterBasedRunnerAmp')
    fp16 = None
    optimizer_config = dict(

    in the config file.

    I run my experiment on a single node with 8 GPUs. I was wondering if the performance gap could come from the fact that I am not using DistOptimizerHook and apex, or if there is something else I am missing.

    Thanks for your help.

    opened by Adrien987k 2
  • 100 or 300 epoch training

    Hello, have you trained iBOT with a shorter schedule, e.g. 100 or 300 epochs? Could you share the hyper-parameters? Following DINO, I set:

    python -m torch.distributed.launch --nproc_per_node=8 --master_port=29500 \
        main_ibot.py \
        --arch vit_small \
        --output_dir ibot_100epoch \
        --data_path imagenet/train \
        --batch_size_per_gpu 64 \
        --local_crops_number 8 \
        --saveckp_freq 10 \
        --shared_head true \
        --epochs 100 \
        --out_dim 8192

    I don't know if this is reasonable.

    opened by Trent-tangtao 2
  • Linear semantic segmentation with ViT-L models

    I was wondering whether you had evaluated the ViT-L models pre-trained on ImageNet-1K and on ImageNet-22K on the linear semantic segmentation benchmark on ADE20K, as in column 3 of the right-hand table of Table 6 in the paper. If so, could you share the results and the corresponding log files?


    opened by Adrien987k 1
  • Description of DINO in [preliminaries section] is not accurate

    "The parameters of the student network θ are Exponentially Moving Averaged (EMA) to the parameters of teacher network θ'",

    should be the other way around.

    opened by mega-optimus 1
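The direction that the issue above asks for can be written out explicitly: the teacher parameters are an exponential moving average of the student parameters, and the student itself is left unchanged by the update. The sketch below is a minimal illustration with plain floats in place of tensors. The function name `ema_update` is hypothetical, and the momentum 0.996 is the `momentum_teacher` value from the config dump quoted earlier on this page.

```python
def ema_update(teacher, student, momentum):
    """In-place EMA step: teacher <- m * teacher + (1 - m) * student."""
    for name, student_value in student.items():
        teacher[name] = momentum * teacher[name] + (1.0 - momentum) * student_value
    return teacher


teacher = {"w": 1.0}
student = {"w": 0.0}
ema_update(teacher, student, momentum=0.996)

print(teacher["w"])  # 0.996
print(student["w"])  # 0.0 (the student is only read, never written)
```

With a momentum this close to 1, the teacher changes slowly and acts as a smoothed copy of the student.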
  • Semantic Segmentation Error on ADE20K

    Thank you for your outstanding work. When I try to train ViT-S/16 with UperNet as the task layer, I get the error: KeyError: "EncoderDecoder: 'VisionTransformer is not in the backbone registry'". I found the issue "Semantic Segmentation on ADE20K", which suggests:

    Solution --> Starting a new terminal window after the installation resolved the issue. This issue could also appear due to GPU - cuda version mismatch.

    But it didn't work. I also checked the description of mmsegmentation v.12.0, and the VisionTransformer backbone is not yet supported. I hope you can provide some help.

    opened by bittxtcc 1
  • RuntimeError: Expected to mark a variable ready only once.

    Hi, I'm new to ibot and mmcv, sorry to disturb. I'm trying to reproduce the object detection task in evaluation phase. I set the job name to "first_try" and my command is shown below:

    ./run.sh ade20k_seg first_try vit_small teacher 4   data.samples_per_gpu=4   model.backbone.out_with_norm=true   optimizer.lr=3e-5

    and an error occurred before training:

    2022-11-10 17:52:14,699 - mmseg - INFO - workflow: [('train', 1)], max: 160000 iters
    Traceback (most recent call last):
      File "/data/data0/username/spaceevo_segmentation/ibot/evaluation/semantic_segmentation/train.py", line 176, in <module>
        main()
      File "/data/data0/username/spaceevo_segmentation/ibot/evaluation/semantic_segmentation/train.py", line 172, in main
        meta=meta)
      File "/data/data0/username/spaceevo_segmentation/ibot/evaluation/semantic_segmentation/mmcv_custom/train_api.py", line 187, in train_segmentor
        runner.run(data_loaders, cfg.workflow)
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 133, in run
        iter_runner(iter_loaders[i], **kwargs)
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 66, in train
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
        getattr(hook, fn_name)(self)
      File "/data/data0/username/spaceevo_segmentation/ibot/evaluation/semantic_segmentation/mmcv_custom/apex_runner/optimizer.py", line 37, in after_train_iter
        scaled_loss.backward()
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
        allow_unreachable=True)  # allow_unreachable flag
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/torch/autograd/function.py", line 89, in apply
        return self._forward_cls.backward(self, *args)  # type: ignore
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/torch/utils/checkpoint.py", line 99, in backward
        torch.autograd.backward(outputs, args)
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
        allow_unreachable=True)  # allow_unreachable flag
    RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases yet.
    Traceback (most recent call last):
      File "/home/username/anaconda3/envs/py37/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/username/anaconda3/envs/py37/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    subprocess.CalledProcessError: Command '['/home/username/anaconda3/envs/py37/bin/python3', '-u', '/data/data0/username/spaceevo_segmentation/ibot/evaluation/semantic_segmentation/train.py', '--local_rank=3', '/data/data0/username/spaceevo_segmentation/ibot/evaluation/semantic_segmentation/configs/upernet/vit_small_512_ade20k_160k.py', '--launcher', 'pytorch', '--work-dir', '/data/data0/username/spaceevo_segmentation/ibot/work_dirs/first_try/seg', '--deterministic', '--options', 'model.backbone.use_checkpoint=True', 'model.pretrained=/data/data0/username/spaceevo_segmentation/ibot/work_dirs/first_try/checkpoint_teacher.pth', 'data.samples_per_gpu=4', 'model.backbone.out_with_norm=true', 'optimizer.lr=3e-5']' returned non-zero exit status 1.

    I also tried the linear head for segmentation, and there is no such error. Have you ever encountered such a problem? Thanks a lot!

    opened by dejiesmile 0
  • Unsatisfying performance on COCO using Swin-T

    I compared iBOT Swin-T and supervised Swin-T as pre-trained models for COCO, getting the following results:

    Supervised Swin-T: mAP 0.432
    iBOT Swin-T: mAP 0.428

    The detection framework is Mask R-CNN 1x with multi-scale training. Do you have any ideas on that?

    opened by Joker316701882 1
  • some debug about use torch.utils.checkpoint.checkpoint

    When I use torch.utils.checkpoint.checkpoint as shown below and train the model with apex, I find that the loss is as small as 0.4, whereas the normal loss is 2.x.

    Do you have any idea what causes this?

    for blk in self.blocks:
        # x = blk(x)
        x = torch.utils.checkpoint.checkpoint(blk, x)

    opened by ShiYaya 0