iBOT: Image BERT Pre-Training with Online Tokenizer

Related tags: Text Data & NLP, ibot
Overview

Image BERT Pre-Training with iBOT


Official PyTorch implementation and pretrained models for the paper iBOT: Image BERT Pre-Training with Online Tokenizer.

[arXiv] [BibTex]

iBOT framework

iBOT is a novel self-supervised pre-training framework that performs masked image modeling with self-distillation. The iBOT pre-trained model captures local semantic features, which helps it transfer well to downstream tasks at both the global and the local scale. For example, iBOT achieves strong performance on COCO object detection (51.4 box AP and 44.2 mask AP) and ADE20K semantic segmentation (50.0 mIoU) with a vanilla ViT-B/16. iBOT can also extract semantically meaningful local parts, like a dog's ear 🐶.
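
To make the objective concrete, below is a minimal, self-contained sketch of the two self-distillation terms iBOT optimizes: a DINO-style term on the [CLS] token across two augmented views, and a masked-image-modeling (MIM) term on masked patch tokens, with the teacher kept as an exponential-moving-average (EMA) copy of the student. This is an illustration only, not the repository's training loop; the toy "backbone" and all shapes are assumptions.

# Conceptual sketch of iBOT's losses (illustrative, not the actual implementation).
import torch
import torch.nn.functional as F

N, P, D, K = 4, 196, 384, 8192           # batch, patches, token dim, head output dim

student_head = torch.nn.Linear(D, K)
teacher_head = torch.nn.Linear(D, K)
teacher_head.load_state_dict(student_head.state_dict())
for p in teacher_head.parameters():
    p.requires_grad_(False)

def backbone(tokens):                     # stand-in for a ViT: returns [CLS] and patch tokens
    return tokens[:, 0], tokens[:, 1:]

view1 = torch.randn(N, P + 1, D)          # two augmented views of the same images
view2 = torch.randn(N, P + 1, D)
mask = torch.rand(N, P) < 0.3             # patches of view1 that the student sees masked

s_cls, s_patch = backbone(view1)          # student forward on the (masked) view
with torch.no_grad():
    t_cls2, _ = backbone(view2)           # teacher [CLS] from the other view
    _, t_patch1 = backbone(view1)         # teacher patch tokens from the same, unmasked view

def distill(student_logits, teacher_logits, t_temp=0.07, s_temp=0.1):
    """Cross-entropy between the sharpened teacher and the student distributions."""
    t = F.softmax(teacher_logits / t_temp, dim=-1)
    return -(t * F.log_softmax(student_logits / s_temp, dim=-1)).sum(-1)

loss_cls = distill(student_head(s_cls), teacher_head(t_cls2)).mean()    # cross-view [CLS] term
loss_mim = (distill(student_head(s_patch), teacher_head(t_patch1)) * mask).sum() / mask.sum()
(loss_cls + loss_mim).backward()

# The teacher (the "online tokenizer") tracks the student by EMA.
momentum = 0.996
with torch.no_grad():
    for ps, pt in zip(student_head.parameters(), teacher_head.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1 - momentum)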

Update 🎉

  • December 2021 - Release the code and pre-trained models.
  • November 2021 - Release the pre-print on arXiv.

Installation

See the installation instructions for details.

Training

For a glimpse at the full documentation of iBOT pre-training, please run:

python main_ibot.py --help

iBOT Pre-Training with ViTs

To start iBOT pre-training with a Vision Transformer (ViT), simply run the following command. JOB_NAME is a custom argument that distinguishes different experiments; checkpoints are automatically saved into a separate folder for each job.

./run.sh imagenet_pretrain $JOB_NAME vit_{small,base,large} teacher {16,24,64}

The exact arguments used to reproduce the models presented in our paper can be found in the args column of the pre-trained models table below. We also provide the pre-training logs to help with reproducibility.

For example, to run iBOT with the ViT-S/16 network on two nodes with 8 GPUs each for 800 epochs, use the following command. The resulting checkpoint should reach 75.2% k-NN accuracy, 77.9% linear probing accuracy, and 82.3% fine-tuning accuracy.

./run.sh imagenet_pretrain $JOB_NAME vit_small teacher 16 \
  --teacher_temp 0.07 \
  --warmup_teacher_temp_epochs 30 \
  --norm_last_layer false \
  --epochs 800 \
  --batch_size_per_gpu 64 \
  --shared_head true \
  --out_dim 8192 \
  --local_crops_number 10 \
  --global_crops_scale 0.25 1 \
  --local_crops_scale 0.05 0.25 \
  --pred_ratio 0 0.3 \
  --pred_ratio_var 0 0.2
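
As a rough intuition for the masking flags above (this is an interpretation, not taken from the code; the masking logic in the repository's data loader is authoritative): --pred_ratio and --pred_ratio_var can be read as (base ratio, jitter) pairs, with each image drawing its prediction ratio from one of them, e.g.:

# Hedged sketch of per-image mask-ratio sampling for --pred_ratio 0 0.3 --pred_ratio_var 0 0.2.
import random

pred_ratio, pred_ratio_var = [0.0, 0.3], [0.0, 0.2]

def sample_pred_ratio():
    i = random.randrange(len(pred_ratio))               # pick one (base, var) pair
    base, var = pred_ratio[i], pred_ratio_var[i]
    return base if var == 0 else random.uniform(base - var, base + var)

print([round(sample_pred_ratio(), 2) for _ in range(8)])
# e.g. a mix of 0.0 (image left unmasked) and ratios in roughly [0.1, 0.5]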

iBOT Pre-Training with Swins

This code also works for training iBOT on the Swin Transformer (Swin). In the paper, we only conduct experiments on Swin-T with different window sizes:

./run.sh imagenet_pretrain $JOB_NAME swin_tiny teacher {16,40} \
  --patch_size 4 \
  --window_size {7,14}

For example, to run iBOT with the Swin-T/14 network on five nodes with 8 GPUs each for 300 epochs, use the following command. The resulting checkpoint should reach 76.2% k-NN accuracy and 79.3% linear probing accuracy.

./run.sh imagenet_pretrain $JOB_NAME swin_tiny teacher 40 \
  --teacher_temp 0.07 \
  --warmup_teacher_temp_epochs 30 \
  --norm_last_layer false \
  --epochs 300 \
  --batch_size_per_gpu 26 \
  --shared_head true \
  --out_dim 8192 \
  --local_crops_number 10 \
  --global_crops_scale 0.25 1 \
  --local_crops_scale 0.05 0.25 \
  --pred_ratio 0 0.3 \
  --pred_ratio_var 0 0.2 \
  --pred_start_epoch 50 \
  --patch_size 4 \
  --window_size 14 

Pre-Trained Models

You can choose to download either only the weights of the pre-trained backbone used for downstream tasks, or the full checkpoint, which contains backbone and projection head weights for both the student and the teacher networks. For the backbone, s denotes that the student network is selected, while t denotes that the teacher network is selected.

| Arch.     | Par. | k-NN  | Lin.  | Fin.  | Download                            |
|-----------|------|-------|-------|-------|-------------------------------------|
| ViT-S/16  | 21M  | 74.5% | 77.0% | 82.3% | backbone (t), full ckpt, args, logs |
| Swin-T/7  | 28M  | 75.3% | 78.6% | \     | backbone (t), full ckpt, args, logs |
| Swin-T/14 | 28M  | 76.2% | 79.3% | \     | backbone (t), full ckpt, args, logs |
| ViT-B/16  | 85M  | 77.1% | 79.5% | 83.8% | backbone (t), full ckpt, args, logs |

We also provide ViT-{B,L}/16 models pre-trained on the ImageNet-22K dataset.

| Arch.    | Par. | k-NN  | Lin.  | Fin.  | Download                            |
|----------|------|-------|-------|-------|-------------------------------------|
| ViT-B/16 | 85M  | 71.1% | 79.0% | 84.4% | backbone (s), full ckpt, args, logs |
| ViT-L/16 | 307M | 70.6% | 81.7% | 86.3% | backbone (s), full ckpt, args, logs |
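
As a usage sketch, a downloaded backbone checkpoint can typically be loaded for feature extraction along the following lines. The file name, the timm architecture choice, and the key handling are assumptions, and keys may need remapping depending on the checkpoint.

# Hedged example: load an extracted ViT-S/16 backbone checkpoint for inference.
import torch
import timm  # assumes timm's vit_small_patch16_224 matches the pre-trained ViT-S/16

state_dict = torch.load("ibot_vits16_backbone.pth", map_location="cpu")  # hypothetical file name
state_dict = {k.replace("module.", ""): v for k, v in state_dict.items()}

model = timm.create_model("vit_small_patch16_224", num_classes=0)        # no classifier head
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"missing: {len(missing)}, unexpected: {len(unexpected)}")         # sanity-check the key mapping

model.eval()
with torch.no_grad():
    feats = model(torch.randn(1, 3, 224, 224))                           # global image features
print(feats.shape)                                                       # e.g. torch.Size([1, 384]) for ViT-S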

To extract the backbone from the full checkpoint yourself, run the following command, where KEY is either student or teacher.

WEIGHT_FILE=$OUTPUT_DIR/checkpoint_$KEY.pth

python extract_backbone_weights.py \
  --checkpoint_key $KEY \
  $PRETRAINED \
  $WEIGHT_FILE
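
Conceptually, the extraction boils down to selecting the student or teacher entry from the full checkpoint, stripping wrapper prefixes, dropping the projection head, and saving the backbone weights. The snippet below is a rough sketch of that idea; extract_backbone_weights.py in the repository is the authoritative script, and the exact key names are assumptions.

# Rough sketch of backbone extraction from a full iBOT checkpoint (illustrative only).
import torch

KEY = "teacher"                                            # or "student"
ckpt = torch.load("checkpoint.pth", map_location="cpu")    # hypothetical full-checkpoint path
state_dict = ckpt[KEY]                                     # assumes one state dict per network
backbone = {
    k.replace("module.", "").replace("backbone.", ""): v
    for k, v in state_dict.items()
    if "head" not in k                                     # drop projection-head weights
}
torch.save(backbone, f"checkpoint_{KEY}.pth")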

Downstream Evaluation

See Evaluating iBOT on Downstream Tasks for details.

Property Analysis

See Analyzing iBOT's Properties for details on the robustness tests and on visualizing self-attention maps:

iBOT Global Pattern Layout

or extracting sparse correspondence pairs between two images:

iBOT Global Pattern Layout

Extracting Semantic Patterns

We extract the top-k local classes based on patch tokens, together with their corresponding patches and contexts, by running the following command. We identify very diverse behaviors, such as shared low-level textures and high-level semantics.

python3 -m torch.distributed.launch --nproc_per_node=8 \
    --master_port=${MASTER_PORT:-29500} \
    analysis/extract_pattern/extract_topk_cluster.py \
    --pretrained_path $PRETRAINED \
    --checkpoint {student,teacher} \
    --type patch \
    --topk 36 \
    --patch_window 5 \
    --show_pics 20 \
    --arch vit_small \
    --save_path memory_bank_patch.pth \
    --data_path data/imagenet/val
iBOT Local Part-Level Pattern Layout

The script also supports extracting the pattern layout on the [CLS] token, which effectively performs clustering or unsupervised classification (see the sketch after the command below). This property is not induced by the MIM objective, since we also observe this behavior with DINO.

python3 -m torch.distributed.launch --nproc_per_node=8 \
    --master_port=${MASTER_PORT:-29500} \
    analysis/extract_pattern/extract_topk_cluster.py \
    --pretrained_path $PRETRAINED \
    --checkpoint {student,teacher} \
    --type cls \
    --topk 36 \
    --show_pics 20 \
    --arch vit_small \
    --save_path memory_bank_cls.pth \
    --data_path data/imagenet/val
iBOT Global Pattern Layout
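
The clustering view of the [CLS] pattern layout can be illustrated with a tiny k-means over per-image [CLS] features. The snippet below uses random stand-in features and a hand-rolled k-means; it is only meant to convey the idea, not to reproduce the analysis script.

# Toy k-means over [CLS]-token features: images assigned to the same cluster
# form one "pattern" (an unsupervised pseudo-class). Features are random stand-ins.
import torch
import torch.nn.functional as F

def kmeans(x, k=36, iters=20):
    centers = x[torch.randperm(x.size(0))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(x, centers).argmin(dim=1)     # nearest center per sample
        for c in range(k):
            members = x[assign == c]
            if members.numel() > 0:
                centers[c] = members.mean(dim=0)
    return assign, centers

cls_feats = F.normalize(torch.randn(1000, 384), dim=-1)    # stand-in for ViT-S [CLS] embeddings
assign, _ = kmeans(cls_feats)
print(torch.bincount(assign, minlength=36))                # cluster sizes (pseudo-class populations)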

Acknowledgement

This repository is built using the DINO repository and the BEiT repository.

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Citing iBOT

If you find this repository useful, please consider giving a star and citation:

@article{zhou2021ibot,
  title={iBOT: Image BERT Pre-Training with Online Tokenizer},
  author={Zhou, Jinghao and Wei, Chen and Wang, Huiyu and Shen, Wei and Xie, Cihang and Yuille, Alan and Kong, Tao},
  journal={arXiv preprint arXiv:2111.07832},
  year={2021}
}
Comments
  • The reproduce of 100 epochs

    Hello, first of all, thanks for your great work. Now I want to reproduce the results of 100 epochs, i.e., the 71.5 k-NN result in Figure 8. Can you tell me the corresponding args?

    opened by lzyhha 10
  • Semantic Segmentation on ADE20K

    Thank you for your outstanding work. Have you tested semantic segmentation on ADE20K with this code? I have encountered many problems (such as mmcv version issues and model init problems), so I want to confirm that you can run this code normally on ADE20K.

    opened by Trent-tangtao 5
  • Loss goes to NaN after several epochs

    Hello,

    First of all, well done and thank you for this great work.

    I am trying to launch iBOT experiments but struggle with the loss going to NaN after a few epochs. The training loss increases (see below), which seems weird. Have you faced similar issues during your experiments, and if so, how did you solve them? I have limited resources for my experiments, so I can't play with the parameters too much, and any tip would be greatly appreciated.

    I know that using mixed precision, as is the case here, can lead to stability issues. Indeed, I do not get any NaN loss when setting use_fp16 to False, but I'd rather keep use_fp16 set to True to maintain a reasonable training time.

    I tried increasing the eps of AdamW to 1e-6 and of the batch norm layers to 6.1e-5 (as proposed here), but this did not work. I see in #17 that you propose decreasing beta2 of AdamW; can you comment a bit more on this, and on other techniques if you know any?
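
    For reference, the two mitigations mentioned above (a larger AdamW eps and a lower beta2) look roughly as follows. The beta2 value here is an illustrative guess, not a number from the maintainers; lr and weight_decay match the config further below.

    # Illustrative AdamW settings for fp16 stability (beta2 value is a guess).
    import torch

    params = [torch.nn.Parameter(torch.randn(8, 8))]   # stand-in for model parameters
    optimizer = torch.optim.AdamW(
        params,
        lr=5e-4,
        betas=(0.9, 0.98),   # beta2 lowered from the default 0.999, as suggested in #17
        eps=1e-6,            # eps raised from the default 1e-8
        weight_decay=0.04,
    )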

    Below is how I launched the training (I'm on a Slurm cluster, so I use run_with_submitit.py, which calls main_ibot.py). The dataset size is ~4M images, hence around 3 times larger than the ImageNet10K training set. This is why I divided warmup_teacher_temp_epochs and warmup_epochs by 3. Not sure this makes sense, though.

    python run_with_submitit.py --arch vit_small \
        --ngpus 4 \
        --nodes 4 \
        --num_workers 8 \
        --teacher_temp 0.07 \
        --warmup_teacher_temp_epochs 10 \
        --warmup_epochs 3 \
        --norm_last_layer false \
        --epochs 100 \
        --batch_size_per_gpu 112 \
        --shared_head true \
        --out_dim 8192 \
        --local_crops_number 10 \
        --global_crops_scale 0.25 1 \
        --local_crops_scale 0.05 0.25 \
        --pred_ratio 0 0.3 \
        --pred_ratio_var 0 0.2 \
        --timeout 1200 \
        --partition gpu_xxx \
        --data_path "xxx" \
        --saveckp_freq 1
    

    Below, you can find the recorded training metrics:

    {"train_loss": 6.99587931743066, "train_cls": 4.331486056964688, "train_patch": 2.6643932607626066, "train_lr": 0.0005831685964416835, "train_wd": 0.040029588545113536, "train_acc": 0.6155056208539762, "train_nmi": 0.1298466117645908, "train_ari": 0.006587329895082833, "train_fscore": 0.043441455577130375, "train_adjacc": -1, "epoch": 0}
    {"train_loss": 10.466352438147872, "train_cls": 6.67679833335828, "train_patch": 3.789554104414066, "train_lr": 0.0017500000000000005, "train_wd": 0.040207159993842605, "train_acc": 0.408415272718953, "train_nmi": 0.14113477636477684, "train_ari": 0.0071149456743873594, "train_fscore": 0.030568697062104223, "train_adjacc": -1, "epoch": 1}
    {"train_loss": 12.008997473409751, "train_cls": 8.066221890750954, "train_patch": 3.9427755813775858, "train_lr": 0.002916831403558317, "train_wd": 0.0405621652690049, "train_acc": 0.3335961760343232, "train_nmi": 0.13128954990810676, "train_ari": 0.004251988918558582, "train_fscore": 0.03003438171217746, "train_adjacc": -1, "epoch": 2}
    

    Below, you can find more information about the config used for the experiment:

    act_in_head: gelu
    arch: vit_small
    batch_size_per_gpu: 112
    clip_grad: 3.0
    comment:
    constraint: ""
    data_path: ""
    dist_url: env://
    drop_path: 0.1
    epochs: 100
    freeze_last_layer: 1
    global_crops_number: 2
    global_crops_scale: [0.25, 1.0]
    gpu: 0
    lambda1: 1.0
    lambda2: 1.0
    local_crops_number: 10
    local_crops_scale: [0.05, 0.25]
    local_rank: 0
    lr: 0.0005
    min_lr: 1e-06
    momentum_teacher: 0.996
    ngpus: 4
    nodes: 4
    norm_in_head: None
    norm_last_layer: False
    num_workers: 8
    optimizer: adamw
    out_dim: 8192
    output_dir: ""
    partition: gpu_xxx
    patch_out_dim: 8192
    patch_size: 16
    pred_ratio: [0.0, 0.3]
    pred_ratio_var: [0.0, 0.2]
    pred_shape: block
    pred_start_epoch: 0
    qos: qos_xxx
    rank: 0
    saveckp_freq: 1
    seed: 0
    shared_head: True
    shared_head_teacher: True
    teacher_patch_temp: 0.07
    teacher_temp: 0.07
    timeout: 1200
    use_fp16: True
    use_masked_im_modeling: True
    warmup_epochs: 3
    warmup_teacher_patch_temp: 0.04
    warmup_teacher_temp: 0.04
    warmup_teacher_temp_epochs: 10
    weight_decay: 0.04
    weight_decay_end: 0.4
    window_size: 7
    world_size: 16
    

    Best regards

    opened by CharlieCheckpt 4
  • checkpoint not saved by master

    As your code describes (https://github.com/bytedance/ibot/blob/3302b63fc7e287afc68601cb1dc2f0c311af8e3b/main_ibot.py#L358), in DDP training every process (GPU) saves a checkpoint to disk. This behavior may cause a duplicate-write problem, and the saved checkpoint may then fail to load with torch.load.
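
    A common fix for this pattern (a sketch assuming a standard torch.distributed setup, not a patch from the maintainers) is to write checkpoints from rank 0 only:

    # Save only on the master process to avoid concurrent writes to the same file.
    import torch
    import torch.distributed as dist

    def save_on_master(state, path):
        rank = dist.get_rank() if dist.is_available() and dist.is_initialized() else 0
        if rank == 0:
            torch.save(state, path)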

    opened by luuuyi 3
  • question about imagenet 1% logist regression

    Hello, I used eval_logistic_regression.py (lambd=0.1) to evaluate the provided ViT-small model but can only get 58.0 val top-1 accuracy with 1% of the data, versus 65.9 in your paper. Thanks for your help!

    Start the logistic regression.
    Matrix X, n=12811, p=384
    Switching to regular solver, problem is well conditioned
    Catalyst Accelerator
    MISO Solver
    Incremental Solver with uniform sampling
    Lipschitz constant: 0.25
    Multiclass logistic Loss is used
    L2 regularization
    Epoch: 10, primal objective: 6.90561, time: 211.912
    Best relative duality gap: 0.000430669
    Time elapsed : 212.632
    Logistic regression result: Acc: 0.58044

    opened by haohang96 2
  • About license

    Thanks for the great job. As you said, this repository is released under the Apache 2.0 license. I want to know whether that means the pre-trained models are also under the Apache 2.0 license? Thanks!

    opened by WangWenhao0716 2
  • Linear segmentation evaluation on ADE20k

    Hi,

    I am trying to reproduce the linear segmentation results obtained with the ViT-B iBOT pre-trained model, which reaches 38.3 mIoU according to the paper.

    With this model, and the config file provided in:

    ibot/evaluation/semantic_segmentation/configs/linear/vit_base_512_ade20k_160k.py
    

    I only reach ~18 mIoU on ADE20K. I saw that the command in the README changes the learning rate and normalizes the output, so I tried with:

    model.backbone.out_with_norm=true  optimizer.lr=8e-4
    

    and I got ~20 mIoU.

    The only difference is that I am not using apex and the custom distributed optimizer, so I basically comment:

    runner = dict(type='IterBasedRunnerAmp')
    fp16 = None
    optimizer_config = dict(
        type="DistOptimizerHook",
        update_interval=1,
        grad_clip=None,
        coalesce=True,
        bucket_size_mb=-1,
        use_fp16=True,
    )
    

    In the config file.

    I run my experiment on a single node with 8 GPUs. I was wondering if the performance gap could come from the fact that I am not using DistOptimizerHook and apex, or if there is something else I am missing.

    Thanks for your help.

    opened by Adrien987k 2
  • 100 or 300 epoch training

    Hello, have you trained iBOT with a shorter schedule, e.g. 100 or 300 epochs? Can you share the corresponding hyper-parameters? Following DINO, I set:

    python -m torch.distributed.launch --nproc_per_node=8 --master_port=29500 \
        main_ibot.py \
        --arch vit_small \
        --output_dir ibot_100epoch \
        --data_path imagenet/train \
        --batch_size_per_gpu 64 \
        --local_crops_number 8 \
        --saveckp_freq 10 \
        --shared_head true \
        --epochs 100 \
        --out_dim 8192

    I don't know if this is reasonable.

    opened by Trent-tangtao 2
  • Linear semantic segmentation with ViT-L models

    Hi,

    I was wondering if you had evaluated the ViT-L pre-trained on ImageNet-1K and the ViT-L pre-trained on ImageNet-22K on the linear semantic segmentation benchmark on ADE20K, similar to column 3 of the right table of Table 6 in the paper? If so, can you share the results and the corresponding log files?

    Thanks!

    opened by Adrien987k 1
  • Description of DINO in [preliminaries section] is not accurate

    "The parameters of the student network θ are Exponentially Moving Averaged (EMA) to the parameters of teacher network θ'",

    should be the other way around.

    opened by mega-optimus 1
  • Semantic Segmentation Error on ADE20K

    Thank you for your outstanding work. When I try to train ViT-S/16 with UperNet as the task layer, I get the error KeyError: "EncoderDecoder: 'VisionTransformer is not in the backbone registry'". I found the issue Semantic Segmentation on ADE20K.

    Solution --> Starting a new terminal window after the installation resolved the issue. This issue could also appear due to a GPU/CUDA version mismatch.

    But it didn't work. Also, I checked the description of mmsegmentation v.12.0, and the VisionTransformer backbone is not yet supported. I hope you can provide some help.

    opened by bittxtcc 1
  • RuntimeError: Expected to mark a variable ready only once.

    Hi, I'm new to iBOT and mmcv, sorry to disturb you. I'm trying to reproduce the object detection task in the evaluation phase. I set the job name to "first_try" and my command is shown below:

    ./run.sh ade20k_seg first_try vit_small teacher 4   data.samples_per_gpu=4   model.backbone.out_with_norm=true   optimizer.lr=3e-5
    

    and an error occurred before training:

    2022-11-10 17:52:14,699 - mmseg - INFO - workflow: [('train', 1)], max: 160000 iters
    Traceback (most recent call last):
      File "/data/data0/username/spaceevo_segmentation/ibot/evaluation/semantic_segmentation/train.py", line 176, in <module>
        main()
      File "/data/data0/username/spaceevo_segmentation/ibot/evaluation/semantic_segmentation/train.py", line 172, in main
        meta=meta)
      File "/data/data0/username/spaceevo_segmentation/ibot/evaluation/semantic_segmentation/mmcv_custom/train_api.py", line 187, in train_segmentor
        runner.run(data_loaders, cfg.workflow)
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 133, in run
        iter_runner(iter_loaders[i], **kwargs)
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 66, in train
        self.call_hook('after_train_iter')
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
        getattr(hook, fn_name)(self)
      File "/data/data0/username/spaceevo_segmentation/ibot/evaluation/semantic_segmentation/mmcv_custom/apex_runner/optimizer.py", line 37, in after_train_iter
        scaled_loss.backward()
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
        allow_unreachable=True)  # allow_unreachable flag
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/torch/autograd/function.py", line 89, in apply
        return self._forward_cls.backward(self, *args)  # type: ignore
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/torch/utils/checkpoint.py", line 99, in backward
        torch.autograd.backward(outputs, args)
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
        allow_unreachable=True)  # allow_unreachable flag
    RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases yet.
    Traceback (most recent call last):
      File "/home/username/anaconda3/envs/py37/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/username/anaconda3/envs/py37/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
        main()
      File "/home/username/anaconda3/envs/py37/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
        cmd=cmd)
    subprocess.CalledProcessError: Command '['/home/username/anaconda3/envs/py37/bin/python3', '-u', '/data/data0/username/spaceevo_segmentation/ibot/evaluation/semantic_segmentation/train.py', '--local_rank=3', '/data/data0/username/spaceevo_segmentation/ibot/evaluation/semantic_segmentation/configs/upernet/vit_small_512_ade20k_160k.py', '--launcher', 'pytorch', '--work-dir', '/data/data0/username/spaceevo_segmentation/ibot/work_dirs/first_try/seg', '--deterministic', '--options', 'model.backbone.use_checkpoint=True', 'model.pretrained=/data/data0/username/spaceevo_segmentation/ibot/work_dirs/first_try/checkpoint_teacher.pth', 'data.samples_per_gpu=4', 'model.backbone.out_with_norm=true', 'optimizer.lr=3e-5']' returned non-zero exit status 1.
    

    I also tried the linear head for segmentation, and there is no such error. Have you ever encountered such a problem? Thanks a lot!

    opened by dejiesmile 0
  • Unsatisfying performance on COCO using Swin-T

    Hi.

    I compared iBOT Swin-T and supervised Swin-T as pre-trained models for COCO, getting the following results:

    Supervised Swin-T: mAP 0.432
    iBOT Swin-T: mAP 0.428

    The detection framework is Mask R-CNN 1x with multi-scale training. Do you have any ideas on that?

    opened by Joker316701882 1
  • some debug about use torch.utils.checkpoint.checkpoint

    When I try to use torch.utils.checkpoint.checkpoint as follows, and use apex to train the model, I find that the loss is as small as 0.4, whereas the normal loss is around 2.x.

    Do you have any idea about this issue?

            for blk in self.blocks:
                # x = blk(x)
                x = torch.utils.checkpoint.checkpoint(blk, x)
    
    opened by ShiYaya 0
Owner
Bytedance Inc.
Championship solution for the Tianchi Traditional Chinese Medicine instruction manual entity recognition challenge; Chinese named entity recognition; NER; BERT-CRF & BERT-SPAN & BERT-MRC; PyTorch

zxx飞翔的鱼 751 Dec 30, 2022
Pre-training BERT masked language models with custom vocabulary

Pre-training BERT Masked Language Models (MLM) This repository contains the method to pre-train a BERT model using custom vocabulary. It was used to p

Stella Douka 14 Nov 2, 2022
TaCL: Improve BERT Pre-training with Token-aware Contrastive Learning

TaCL: Improve BERT Pre-training with Token-aware Contrastive Learning

Yixuan Su 26 Oct 17, 2022
Unsupervised text tokenizer for Neural Network-based text generation.

SentencePiece SentencePiece is an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabu

Google 6.4k Jan 1, 2023
Unsupervised text tokenizer focused on computational efficiency

YouTokenToMe YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE)

VK.com 847 Dec 19, 2022
Japanese Long-Unit-Word Tokenizer with RemBertTokenizerFast of Transformers

Japanese-LUW-Tokenizer Japanese Long-Unit-Word (国語研長単位) Tokenizer for Transformers based on 青空文庫 Basic Usage >>> from transformers import RemBertToken

Koichi Yasuoka 3 Dec 22, 2021
A Japanese tokenizer based on recurrent neural networks

Nagisa is a python module for Japanese word segmentation/POS-tagging. It is designed to be a simple and easy-to-use tool. This tool has the following

null 325 Jan 5, 2023
Train BPE with fastBPE, and load to Huggingface Tokenizer.

BPEer Train BPE with fastBPE, and load to Huggingface Tokenizer. Description The BPETrainer of Huggingface consumes a lot of memory when I am training

Lizhuo 1 Dec 23, 2021
Tokenizer - Module python d'analyse syntaxique et de grammaire, tokenization

Tokenizer Le Tokenizer est un analyseur lexicale, il permet, comme Flex and Yacc par exemple, de tokenizer du code, c'est à dire transformer du code e

Manolo 1 Aug 15, 2022
LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)

LV-BERT Introduction In this repo, we introduce LV-BERT by exploiting layer variety for BERT. For detailed description and experimental results, pleas

Weihao Yu 14 Aug 24, 2022
VD-BERT: A Unified Vision and Dialog Transformer with BERT

VD-BERT: A Unified Vision and Dialog Transformer with BERT PyTorch Code for the following paper at EMNLP2020: Title: VD-BERT: A Unified Vision and Dia

Salesforce 44 Nov 1, 2022
Pytorch-version BERT-flow: One can apply BERT-flow to any PLM within Pytorch framework.

Pytorch-version BERT-flow: One can apply BERT-flow to any PLM within Pytorch framework.

Ubiquitous Knowledge Processing Lab 59 Dec 1, 2022
Universal End2End Training Platform, including pre-training, classification tasks, machine translation, and etc.

Background · Installation · Quick start · (1) Pre-trained models · (2) Machine translation · (3) Text classification · TenTrans advanced: 1. Multilingual machine translation 2. Cross-lingual pre-training. Background: TenTrans is a unified end-to-end multilingual, multi-task pre-training platform that supports multiple pre-training approaches as well as sequence generation and natural language understanding tasks. Installation: git clone git

Tencent Minority-Mandarin Translation Team 42 Dec 20, 2022
TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset.

TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset. TunBERT was applied to three NLP downstream tasks: Sentiment Analysis (SA), Tunisian Dialect Identification (TDI) and Reading Comprehension Question-Answering (RCQA)

InstaDeep Ltd 72 Dec 9, 2022