[ICCV2021] Learning to Track Objects from Unlabeled Videos

Overview

Unsupervised Single Object Tracking (USOT)

🌿 Learning to Track Objects from Unlabeled Videos

Jilai Zheng, Chao Ma, Houwen Peng and Xiaokang Yang

2021 IEEE/CVF International Conference on Computer Vision (ICCV)

Introduction

This repository implements the unsupervised deep tracker USOT, which learns to track objects from unlabeled videos.

The main ideas of USOT are listed as follows (a conceptual training sketch follows the list).

  • Coarsely discovering moving objects from videos, with pseudo boxes precise enough for bbox regression.
  • Training a naive Siamese tracker on single-frame pairs, then gradually extending it to longer temporal spans.
  • Following a cycle memory training paradigm, which enables the unsupervised tracker to update online.
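
The sketch below gives a rough, conceptual view of this two-stage recipe. It is pseudocode for intuition only: the helper callables (sample_pair_fn, sample_clip_fn, pair_loss_fn, cycle_loss_fn) are hypothetical placeholders and do not correspond to this repository's actual modules.

# Conceptual sketch of the two-stage USOT training recipe (illustrative only,
# not the repository's actual code). All helper callables are hypothetical.
def train_usot(model, dataset, optimizer, num_epochs, memory_epoch,
               sample_pair_fn, sample_clip_fn, pair_loss_fn, cycle_loss_fn):
    for epoch in range(num_epochs):
        for video in dataset:
            if epoch < memory_epoch:
                # Stage 1: naive Siamese training on single-frame pairs,
                # supervised only by the flow-based pseudo box.
                template, search, pseudo_box = sample_pair_fn(video)
                cls_score, bbox_pred = model(template, search)
                loss = pair_loss_fn(cls_score, bbox_pred, pseudo_box)
            else:
                # Stage 2: cycle memory training over a longer temporal span.
                # The tracker propagates through intermediate memory frames and
                # must still localize the original pseudo box in the search frame,
                # which teaches it to update online at test time.
                template, memory_frames, search, pseudo_box = sample_clip_fn(video)
                outputs = model(template, search, memory_frames)
                loss = cycle_loss_fn(outputs, pseudo_box)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()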

Results

Results of USOT and USOT* on recent tracking benchmarks.

Model   VOT2016 EAO   VOT2018 EAO   VOT2020 EAO   LaSOT AUC (%)   TrackingNet AUC (%)   OTB100 AUC (%)
USOT    0.351         0.290         0.222         33.7            59.9                  58.9
USOT*   0.402         0.344         0.219         35.8            61.5                  57.4

Raw result files can be found in the result folder on Google Drive.

Tutorial

Environments

The environment we utilize is listed as follows.

  • Preprocessing: Pytorch 1.1.0 + CUDA-9.0 / 10.0 (following ARFlow)
  • Train / Test / Eval: Pytorch 1.7.1 + CUDA-10.0 / 10.2 / 11.1

If you have problems with preprocessing, you can skip it entirely by downloading the off-the-shelf preprocessed materials.

Preparations

Assume the project root path is $USOT_PATH. You can build an environment for development with the provided script, where $CONDA_PATH denotes your anaconda path.

cd $USOT_PATH
bash ./preprocessing/install_model.sh $CONDA_PATH USOT
source activate USOT && export PYTHONPATH=$(pwd)

You can revise the CUDA toolkit version for PyTorch in install_model.sh (10.0 by default).
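
After activating the environment, you can confirm the installed PyTorch / CUDA pairing with the short generic check below (standard PyTorch attributes, not a repository script).

# Quick sanity check of the installed PyTorch / CUDA combination.
import torch
print(torch.__version__)          # e.g. 1.7.1
print(torch.version.cuda)         # CUDA toolkit version PyTorch was built with
print(torch.cuda.is_available())  # True if a usable GPU is detected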

Test and Eval

First, we provide both models used in our paper (USOT.pth and USOT_star.pth). You can download them from the snapshot folder on Google Drive and place them in $USOT_PATH/var/snapshot.

Next, link your desired benchmark dataset (e.g. VOT2018) to $USOT_PATH/datasets_test as follows. The ground-truth json files for some benchmarks (e.g. VOT2018.json) can be downloaded from the test folder on Google Drive and should also be placed in $USOT_PATH/datasets_test.

cd $USOT_PATH && mkdir datasets_test
ln -s $your_benchmark_path ./datasets_test/VOT2018

After that, you can test the tracker on these benchmarks (e.g. VOT2018) as follows. The raw results will be placed in $USOT_PATH/var/result/VOT2018/USOT.

cd $USOT_PATH
python -u ./scripts/test_usot.py --dataset VOT2018 --resume ./var/snapshot/USOT_star.pth

The inference results can be evaluated with pysot-toolkit. Install pysot-toolkit before evaluation.

cd $USOT_PATH/lib/eval_toolkit/pysot/utils
python setup.py build_ext --inplace

Then the evaluation can be conducted as follows.

cd $USOT_PATH
python ./lib/eval_toolkit/bin/eval.py --dataset_dir datasets_test \
        --dataset VOT2018 --tracker_result_dir var/result/VOT2018 --trackers USOT

Train

First, download the pretrained backbones from the pretrain folder on Google Drive into $USOT_PATH/pretrain. Note that USOT* and USOT are trained from imagenet_pretrain.model and moco_v2_800.model, respectively.
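
If you want to confirm which pretrained backbone you downloaded before training, a generic PyTorch snippet like the one below can list the checkpoint keys. This is only an optional check under the assumption that the file is a standard torch-serialized state dict; it is not a utility shipped with this repository.

# Optional: inspect a downloaded backbone checkpoint (generic PyTorch, not a repo utility).
import torch

state = torch.load("pretrain/imagenet_pretrain.model", map_location="cpu")
# Some checkpoints wrap the weights, e.g. under a "state_dict" key.
if isinstance(state, dict) and "state_dict" in state:
    state = state["state_dict"]
print(len(state), "tensors; first keys:", list(state)[:5])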

Second, preprocess the raw datasets following the DP + Flow paradigm. Refer to $USOT_PATH/preprocessing/datasets_train for details.
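
For intuition only, the sketch below shows what a very simplified flow-based pseudo-box miner could look like: threshold the optical-flow magnitude and take the largest moving connected component as a coarse box. This is an assumption-laden illustration using OpenCV, not the repository's actual DP + Flow procedure (see the preprocessing folder for that).

# Simplified illustration of flow-based pseudo-box mining (NOT the repo's actual pipeline).
# Assumes OpenCV and NumPy; the magnitude threshold is an arbitrary choice for the sketch.
import cv2
import numpy as np

def pseudo_box_from_flow(frame_prev, frame_next, mag_thresh=2.0):
    """Return a coarse (x, y, w, h) box around the dominant moving region, or None."""
    gray_prev = cv2.cvtColor(frame_prev, cv2.COLOR_BGR2GRAY)
    gray_next = cv2.cvtColor(frame_next, cv2.COLOR_BGR2GRAY)
    # Dense optical flow between two consecutive frames.
    flow = cv2.calcOpticalFlowFarneback(gray_prev, gray_next, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    mask = (mag > mag_thresh).astype(np.uint8)   # binary mask of moving pixels
    num, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    if num <= 1:
        return None                              # no moving region found
    # Largest foreground component (label 0 is the background).
    idx = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    x, y, w, h = stats[idx, :4]
    return int(x), int(y), int(w), int(h)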

In fact, we have provided two shortcuts for skipping this preprocessing procedure.

  • You can directly download the generated pseudo-box files (e.g. got10k_flow.json) from the train/box_sample_result folder on Google Drive and place them in the corresponding dataset preprocessing path (e.g. $USOT_PATH/preprocessing/datasets_train/got10k), in order to skip the box generation procedure.
  • You can directly download the whole cropped training dataset (e.g. got10k_flow.tar) from the dataset folder on Google Drive (coming soon) (e.g. train/GOT-10k), which lets you skip all preprocessing procedures.

Third, revise the training config file $USOT_PATH/experiments/train/USOT.yaml. The most important options are listed as follows (a small sanity-check sketch follows the list).

  • GPUS: the gpus for training, e.g. '0,1,2,3'
  • TRAIN/PRETRAIN: the pretrained backbone, e.g. 'imagenet_pretrain.model'
  • DATASET: the folder for your cropped training instances and their pseudo annotation files, e.g. PATH: '/data/got10k_flow/crop511/', ANNOTATION: '/data/got10k_flow/train.json'
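
The sanity-check sketch below loads USOT.yaml and verifies that every dataset PATH / ANNOTATION entry exists on disk before you launch training. It assumes PyYAML is installed and that the config follows the USOT -> DATASET -> <NAME> -> {PATH, ANNOTATION} layout; it is an optional helper, not part of the repository.

# Optional sanity check for dataset paths in USOT.yaml (not a repository script).
import os
import yaml  # assumes PyYAML is installed

def check_dataset_paths(cfg_file="experiments/train/USOT.yaml"):
    with open(cfg_file) as f:
        cfg = yaml.safe_load(f)
    for name, entry in cfg["USOT"]["DATASET"].items():
        if not isinstance(entry, dict) or "PATH" not in entry:
            continue  # skip scalar options such as SCALE / SHIFT
        for key in ("PATH", "ANNOTATION"):
            path = entry.get(key)
            status = "ok" if path and os.path.exists(path) else "MISSING"
            print(f"{name}.{key}: {path} [{status}]")

if __name__ == "__main__":
    check_dataset_paths()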

Finally, you can start the training phase with the following script. The training checkpoints will be placed automatically in $USOT_PATH/var/snapshot.

cd $USOT_PATH
python -u ./scripts/train_usot.py --cfg experiments/train/USOT.yaml --gpus 0,1,2,3 --workers 32

We also provide a one-key script for training, testing and evaluation.

cd $USOT_PATH
python ./scripts/onekey_usot.py --cfg experiments/train/USOT.yaml

Citation

If any part of our paper or code is helpful to your work, please consider citing:

@inproceedings{zheng-iccv2021-usot,
   title={Learning to Track Objects from Unlabeled Videos},
   author={Jilai Zheng and Chao Ma and Houwen Peng and Xiaokang Yang},
   booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
   year={2021}
}

Reference

We refer to the following repositories when implementing our unsupervised tracker. Thanks for their great work.

Contact

Feel free to contact me if you have any questions.

Comments
  • about the score_map

    When I only use the offline Siamese module to train this model for a while, I find that the cls_score stops updating and the visualized score map looks like the image below. Do you know how I can fix this? Thank you very much!

    opened by LiShenglana 5
  • I found it difficult to reproduce

    My environment is PyTorch 1.8.1, CUDA 11.1, Python 3.8.0, with batch size 12. I cannot reproduce the result using the full data (VID, GOT10k, LaSOT, YTVOS). The A / R / EAO I reproduce is 0.574 / 0.393 / 0.300. Can it be reproduced using PyTorch 1.7?

    opened by chenrxi 5
  • "Ninja is required to load C++ extensions"

    Using /home/cscv/.cache/torch_extensions as PyTorch extensions root...
    Creating extension directory /home/cscv/.cache/torch_extensions/_prroi_pooling...
    Traceback (most recent call last):
      File "/home/cscv/Documents/lsl/USOT/scripts/train_usot.py", line 421, in <module>
        main()
      File "/home/cscv/Documents/lsl/USOT/scripts/train_usot.py", line 411, in main
        curLR, config, writer_dict, logger, device=device)
      File "/home/cscv/Documents/lsl/USOT/scripts/train_usot.py", line 247, in usot_train
        search_memory_ir=search_memory_ir, search_bbox=search_bbox, cls_ratio=cls_ratio)
      File "/home/cscv/anaconda3/envs/usot-model2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/cscv/anaconda3/envs/usot-model2/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "/home/cscv/anaconda3/envs/usot-model2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/cscv/Documents/lsl/USOT/lib/models/models.py", line 287, in forward
        _, zf = self.neck(zf, crop=True, pr_pool=self.pr_pool, bbox=template_bbox)
      File "/home/cscv/anaconda3/envs/usot-model2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/cscv/Documents/lsl/USOT/lib/models/connect.py", line 349, in forward
        xf_pr = self.prpooling(x_ori, bbox)
      File "/home/cscv/anaconda3/envs/usot-model2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/cscv/Documents/lsl/USOT/lib/models/prroi_pool/prroi_pool.py", line 28, in forward
        return prroi_pool2d(features, rois, self.pooled_height, self.pooled_width, self.spatial_scale)
      File "/home/cscv/Documents/lsl/USOT/lib/models/prroi_pool/functional.py", line 44, in forward
        _prroi_pooling = _import_prroi_pooling()
      File "/home/cscv/Documents/lsl/USOT/lib/models/prroi_pool/functional.py", line 33, in _import_prroi_pooling
        verbose=True
      File "/home/cscv/anaconda3/envs/usot-model2/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 997, in load
        keep_intermediates=keep_intermediates)
      File "/home/cscv/anaconda3/envs/usot-model2/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1202, in _jit_compile
        with_cuda=with_cuda)
      File "/home/cscv/anaconda3/envs/usot-model2/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1268, in _write_ninja_file_and_build_library
        verify_ninja_availability()
      File "/home/cscv/anaconda3/envs/usot-model2/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1323, in verify_ninja_availability
        raise RuntimeError("Ninja is required to load C++ extensions")
    RuntimeError: Ninja is required to load C++ extensions

    I encountered this problem. It seems Ninja is required, but I have already installed the package:

    (usot-model2) cscv@cscv-202206:~/Documents/lsl/USOT/scripts$ which ninja
    /home/cscv/anaconda3/envs/usot-model2/bin/ninja

    How can I solve this? Thank you very much!

    opened by LiShenglana 4
  • VOT2020 test

    Hello, how did you complete the VOT2020 test evaluation? I have seen your comment that it is done using the test_vot2020.py file and the VOT toolkit. Could you please tell me more about it? Thank you again for your excellent work.

    opened by zrz1018 4
  • Problem when increasing batch size

    When increasing the batch size from the default 12 to 64, this error occurs:

    File "/USOT/lib/models/models.py", line 276, in forward
      bbox_pred_to_img = self.pred_offset_to_image_bbox(off_forward_bbox, batch)
    File "/USOT/lib/models/models.py", line 145, in pred_offset_to_image_bbox
      pred_x1 = self.grid_to_search_x - bbox_pred[:, 0, ...].unsqueeze(1)
    RuntimeError: The size of tensor a (64) must match the size of tensor b (256) at non-singleton dimension 0

    The sizes of the relevant variables for batch size 12: self.grid_to_search_x / y = [48, 1, 25, 25], bbox_pred = [48, 4, 25, 25];

    and for batch size 64: self.grid_to_search_x / y = [64, 1, 25, 25], bbox_pred = [256, 4, 25, 25].

    opened by wr0112358 2
  • error while trying to train

    Trying to train on a custom dataset. I named it got10k to change as little as possible in the code and config, ran the preprocessing described in the README, and received the following flow json: got10k_flow.txt.

    When I try to run the training using

    python -u ./scripts/train_usot.py --cfg experiments/train/USOT.yaml --gpus 0 --workers 8
    

    I get the following error,

    /content/drive/MyDrive/USOT/USOT-main
    => creating var/log/USOT
    => creating var/log/USOT/USOT_2022-08-12-13-57
    Namespace(cfg='experiments/train/USOT.yaml', gpus='0', workers=8)
    {'CHECKPOINT_DIR': 'var/snapshot',
     'GPUS': '0',
     'OUTPUT_DIR': 'var/log',
     'PRINT_FREQ': 10,
     'USOT': {'DATASET': {'FAR_SAMPLE': 3,
                          'GOT10K': {'ANNOTATION': '/content/drive/MyDrive/USOT/USOT-main/preprocessing/datasets_train/got10k/got10k_flow.json',
                                     'PATH': '/content/drive/MyDrive/USOT/USOT-main/dataset',
                                     'USE': 19000},
                          'LASOT': {'ANNOTATION': '/home/jlzheng/dataset/lasot_flow/train.json',
                                    'PATH': '/home/jlzheng/dataset/lasot_flow/crop511/',
                                    'USE': 13000},
                          'SCALE': 0.05,
                          'SCALEm': 0.18,
                          'SCALEs': 0.18,
                          'SHIFT': 4,
                          'SHIFTm': 64,
                          'SHIFTs': 64,
                          'VID': {'ANNOTATION': '/home/jlzheng/dataset/VID_flow/train.json',
                                  'PATH': '/home/jlzheng/dataset/VID_flow/crop511/',
                                  'USE': 14000},
                          'VIDEO_QUALITY': 0.4,
                          'YTVOS': {'ANNOTATION': '/home/jlzheng/dataset/ytvos_flow/train.json',
                                    'PATH': '/home/jlzheng/dataset/ytvos_flow/crop511/',
                                    'USE': 4000}},
              'TEST': {'DATA': 'GOT10K',
                       'END_EPOCH': 30,
                       'ISTRUE': False,
                       'MODEL': 'USOT',
                       'START_EPOCH': 10,
                       'THREADS': 11},
              'TRAIN': {'BASE_LR': 0.005,
                        'BATCH': 12,
                        'BATCH_STAGE_2': 12,
                        'CLS_RATIOS': [0.6, 0.5, 0.4],
                        'CLS_RATIO_SHIFT_EPOCHS': [0, 7, 10],
                        'END_EPOCH': 30,
                        'ISTRUE': True,
                        'LAMBDA_1_LIST': [0.3, 0.275, 0.25],
                        'LAMBDA_1_NAIVE': 0.2,
                        'LAMBDA_SHIFT_EPOCHS': [0, 7, 9],
                        'LAMBDA_TOTAL': 0.9,
                        'LAYERS_LR': 0.1,
                        'LR': {'KWARGS': {'end_lr': 2e-05, 'start_lr': 0.005},
                               'TYPE': 'log'},
                        'MEMORY_EPOCH': 6,
                        'MEMORY_NUM': 4,
                        'MODEL': 'USOT',
                        'MOMENTUM': 0.9,
                        'PRETRAIN': 'imagenet_pretrain.model',
                        'RESUME': False,
                        'SEARCH_SIZE': 255,
                        'START_EPOCH': 1,
                        'STRIDE': 8,
                        'TEMPLATE_SIZE': 127,
                        'TRAINABLE_LAYER': ['layer1', 'layer2', 'layer3'],
                        'UNFIX_EPOCH': 10,
                        'UNFIX_POLICY': 'log',
                        'WARMUP': {'EPOCH': 5,
                                   'IFNOT': True,
                                   'KWARGS': {'end_lr': 0.005,
                                              'start_lr': 0.0025,
                                              'step': 1},
                                   'TYPE': 'step'},
                        'WARM_POLICY': 'step',
                        'WEIGHT_DECAY': 0.0001,
                        'WHICH_USE': ['GOT10K']}},
     'WORKERS': 8}
    USOT(
      (criterion): BCEWithLogitsLoss()
      (features): ResNet50(
        (features): ResNet_plus2(
          (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), bias=False)
          (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace)
          (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
          (layer1): Sequential(
            (0): Bottleneck(
              (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (relu): ReLU(inplace)
              (downsample): Sequential(
                (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              )
            )
            (1): Bottleneck(
              (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (relu): ReLU(inplace)
            )
            (2): Bottleneck(
              (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (relu): ReLU(inplace)
            )
          )
          (layer2): Sequential(
            (0): Bottleneck(
              (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), bias=False)
              (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (relu): ReLU(inplace)
              (downsample): Sequential(
                (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), bias=False)
                (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              )
            )
            (1): Bottleneck(
              (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (relu): ReLU(inplace)
            )
            (2): Bottleneck(
              (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (relu): ReLU(inplace)
            )
            (3): Bottleneck(
              (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (relu): ReLU(inplace)
            )
          )
          (layer3): Sequential(
            (0): Bottleneck(
              (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (relu): ReLU(inplace)
              (downsample): Sequential(
                (0): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              )
            )
            (1): Bottleneck(
              (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
              (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (relu): ReLU(inplace)
            )
            (2): Bottleneck(
              (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
              (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (relu): ReLU(inplace)
            )
            (3): Bottleneck(
              (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
              (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (relu): ReLU(inplace)
            )
            (4): Bottleneck(
              (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
              (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (relu): ReLU(inplace)
            )
            (5): Bottleneck(
              (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
              (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (relu): ReLU(inplace)
            )
          )
        )
      )
      (neck): AdjustLayer(
        (downsample): Sequential(
          (0): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (prpooling): PrRoIPool2D(kernel_size=(7, 7), spatial_scale=1.0)
      )
      (connect_model): box_tower_reg(
        (cls_encode): matrix(
          (matrix11_k): Sequential(
            (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), bias=False)
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU(inplace)
          )
          (matrix11_s): Sequential(
            (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), bias=False)
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU(inplace)
          )
          (matrix12_k): Sequential(
            (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), dilation=(2, 1), bias=False)
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU(inplace)
          )
          (matrix12_s): Sequential(
            (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), dilation=(2, 1), bias=False)
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU(inplace)
          )
          (matrix21_k): Sequential(
            (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), dilation=(1, 2), bias=False)
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU(inplace)
          )
          (matrix21_s): Sequential(
            (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), dilation=(1, 2), bias=False)
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU(inplace)
          )
        )
        (reg_encode): matrix(
          (matrix11_k): Sequential(
            (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), bias=False)
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU(inplace)
          )
          (matrix11_s): Sequential(
            (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), bias=False)
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU(inplace)
          )
          (matrix12_k): Sequential(
            (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), dilation=(2, 1), bias=False)
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU(inplace)
          )
          (matrix12_s): Sequential(
            (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), dilation=(2, 1), bias=False)
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU(inplace)
          )
          (matrix21_k): Sequential(
            (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), dilation=(1, 2), bias=False)
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU(inplace)
          )
          (matrix21_s): Sequential(
            (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), dilation=(1, 2), bias=False)
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU(inplace)
          )
        )
        (cls_dw): GroupDW()
        (reg_dw): GroupDW()
        (conf_fusion): Conf_Fusion(
          (conf_gen): Sequential(
            (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU(inplace)
          )
          (value_gen): Sequential(
            (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU(inplace)
          )
        )
        (bbox_tower): Sequential(
          (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
          (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (5): ReLU()
          (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (7): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (8): ReLU()
          (9): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (10): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (11): ReLU()
        )
        (cls_tower): Sequential(
          (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
          (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (5): ReLU()
          (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (7): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (8): ReLU()
          (9): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (10): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (11): ReLU()
        )
        (cls_memory_tower): Sequential(
          (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
          (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (5): ReLU()
          (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (7): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (8): ReLU()
          (9): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (10): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (11): ReLU()
        )
        (bbox_pred): Conv2d(256, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (cls_pred): Conv2d(256, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (cls_memory_pred): Conv2d(256, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
    load pretrained model from ./pretrain/imagenet_pretrain.model
    remove prefix 'module.'
    remove prefix 'feature_extractor.'
    missing keys:['connect_model.reg_encode.matrix11_k.1.bias', 'connect_model.cls_encode.matrix21_k.1.running_var', 'connect_model.bbox_tower.10.bias', 'connect_model.cls_encode.matrix12_s.0.weight', 'connect_model.bbox_tower.1.running_var', 'connect_model.bbox_tower.7.running_var', 'connect_model.cls_memory_pred.bias', 'connect_model.cls_tower.7.weight', 'connect_model.bbox_tower.3.bias', 'connect_model.bias', 'connect_model.cls_memory_tower.6.weight', 'connect_model.cls_memory_tower.7.running_mean', 'connect_model.cls_encode.matrix11_s.0.weight', 'connect_model.cls_encode.matrix11_s.1.bias', 'connect_model.bbox_tower.4.bias', 'connect_model.bbox_tower.9.bias', 'connect_model.bbox_tower.4.running_var', 'connect_model.conf_fusion.value_gen.1.weight', 'connect_model.cls_encode.matrix21_k.0.weight', 'connect_model.conf_fusion.conf_gen.0.weight', 'connect_model.bbox_tower.7.running_mean', 'connect_model.bbox_tower.0.bias', 'connect_model.reg_encode.matrix21_k.0.weight', 'connect_model.cls_tower.1.running_mean', 'connect_model.reg_encode.matrix11_s.1.running_var', 'connect_model.reg_encode.matrix12_k.1.weight', 'connect_model.reg_encode.matrix11_s.1.bias', 'connect_model.bbox_tower.9.weight', 'connect_model.cls_tower.1.bias', 'connect_model.bbox_tower.6.weight', 'connect_model.cls_encode.matrix12_k.1.running_var', 'neck.downsample.1.running_var', 'connect_model.cls_encode.matrix21_k.1.bias', 'connect_model.reg_encode.matrix21_s.1.weight', 'connect_model.conf_fusion.value_gen.1.running_mean', 'connect_model.cls_encode.matrix21_s.1.bias', 'connect_model.cls_encode.matrix12_k.1.running_mean', 'connect_model.cls_tower.10.running_var', 'connect_model.cls_memory_tower.3.weight', 'connect_model.cls_memory_tower.4.weight', 'connect_model.conf_fusion.value_gen.0.weight', 'connect_model.bbox_tower.0.weight', 'connect_model.cls_memory_tower.4.running_var', 'connect_model.cls_encode.matrix12_k.1.weight', 'neck.downsample.1.bias', 'connect_model.cls_encode.matrix12_s.1.running_mean', 'connect_model.conf_fusion.conf_gen.1.running_mean', 'connect_model.cls_encode.matrix11_k.0.weight', 'connect_model.reg_encode.matrix12_k.1.running_var', 'connect_model.cls_tower.0.weight', 'connect_model.reg_dw.weight', 'connect_model.cls_encode.matrix12_k.0.weight', 'connect_model.reg_encode.matrix11_k.1.weight', 'connect_model.cls_encode.matrix21_s.1.running_var', 'connect_model.conf_fusion.value_gen.1.bias', 'connect_model.cls_tower.9.weight', 'connect_model.reg_encode.matrix12_s.1.running_mean', 'connect_model.reg_encode.matrix21_k.1.weight', 'connect_model.reg_encode.matrix11_s.1.running_mean', 'connect_model.cls_encode.matrix11_s.1.running_var', 'connect_model.cls_tower.9.bias', 'connect_model.bbox_pred.weight', 'connect_model.cls_encode.matrix11_k.1.bias', 'connect_model.cls_tower.3.bias', 'neck.downsample.0.weight', 'connect_model.bbox_tower.1.running_mean', 'connect_model.reg_encode.matrix12_k.1.bias', 'connect_model.reg_encode.matrix12_s.0.weight', 'connect_model.bbox_tower.1.weight', 'connect_model.bbox_tower.4.weight', 'connect_model.cls_tower.7.running_var', 'connect_model.conf_fusion.conf_gen.0.bias', 'connect_model.cls_encode.matrix21_s.1.weight', 'connect_model.cls_pred.bias', 'connect_model.cls_dw.weight', 'connect_model.adjust', 'connect_model.reg_encode.matrix21_k.1.bias', 'connect_model.conf_fusion.value_gen.1.running_var', 'connect_model.cls_memory_tower.1.bias', 'connect_model.reg_encode.matrix21_s.1.running_mean', 'connect_model.cls_encode.matrix21_s.0.weight', 
'connect_model.conf_fusion.conf_gen.1.running_var', 'connect_model.cls_memory_tower.0.bias', 'connect_model.cls_tower.10.bias', 'connect_model.bbox_tower.10.running_var', 'connect_model.cls_memory_tower.9.bias', 'connect_model.cls_encode.matrix11_s.1.weight', 'connect_model.reg_encode.matrix12_s.1.weight', 'connect_model.reg_encode.matrix12_s.1.running_var', 'connect_model.cls_encode.matrix12_s.1.bias', 'connect_model.cls_pred.weight', 'connect_model.cls_memory_tower.7.weight', 'connect_model.cls_tower.4.weight', 'connect_model.cls_memory_tower.6.bias', 'connect_model.reg_encode.matrix11_s.0.weight', 'connect_model.bbox_tower.6.bias', 'connect_model.cls_memory_tower.4.bias', 'connect_model.cls_tower.6.bias', 'connect_model.cls_memory_tower.0.weight', 'connect_model.cls_tower.4.running_mean', 'connect_model.cls_encode.matrix21_k.1.running_mean', 'connect_model.cls_encode.matrix12_s.1.running_var', 'connect_model.cls_tower.6.weight', 'connect_model.cls_tower.4.running_var', 'connect_model.cls_memory_tower.7.bias', 'connect_model.cls_tower.1.running_var', 'connect_model.cls_tower.7.running_mean', 'neck.downsample.1.running_mean', 'connect_model.cls_tower.0.bias', 'connect_model.cls_memory_pred.weight', 'connect_model.cls_encode.matrix21_s.1.running_mean', 'connect_model.bbox_tower.4.running_mean', 'connect_model.cls_tower.10.running_mean', 'connect_model.cls_tower.10.weight', 'connect_model.conf_fusion.conf_gen.1.weight', 'connect_model.cls_tower.3.weight', 'connect_model.cls_memory_tower.10.weight', 'connect_model.cls_encode.matrix11_k.1.weight', 'connect_model.cls_memory_tower.1.running_mean', 'connect_model.bbox_tower.10.running_mean', 'connect_model.bbox_pred.bias', 'connect_model.reg_encode.matrix11_k.1.running_mean', 'connect_model.bbox_tower.7.bias', 'connect_model.cls_memory_tower.3.bias', 'connect_model.bbox_tower.1.bias', 'connect_model.reg_encode.matrix21_k.1.running_var', 'connect_model.cls_memory_tower.9.weight', 'connect_model.reg_encode.matrix12_k.1.running_mean', 'connect_model.conf_fusion.value_gen.0.bias', 'connect_model.reg_encode.matrix11_k.0.weight', 'connect_model.bbox_tower.7.weight', 'connect_model.reg_encode.matrix12_s.1.bias', 'connect_model.cls_memory_tower.10.bias', 'connect_model.reg_encode.matrix12_k.0.weight', 'connect_model.cls_encode.matrix11_k.1.running_var', 'connect_model.cls_tower.4.bias', 'connect_model.cls_encode.matrix11_k.1.running_mean', 'connect_model.reg_encode.matrix21_s.1.running_var', 'connect_model.cls_tower.1.weight', 'connect_model.cls_encode.matrix12_k.1.bias', 'neck.downsample.1.weight', 'connect_model.cls_encode.matrix11_s.1.running_mean', 'connect_model.reg_encode.matrix11_s.1.weight', 'connect_model.conf_fusion.conf_gen.1.bias', 'connect_model.reg_encode.matrix21_s.0.weight', 'connect_model.cls_memory_tower.10.running_var', 'connect_model.cls_memory_tower.1.weight', 'connect_model.reg_encode.matrix21_s.1.bias', 'connect_model.reg_encode.matrix11_k.1.running_var', 'connect_model.reg_encode.matrix21_k.1.running_mean', 'connect_model.cls_memory_tower.4.running_mean', 'connect_model.cls_memory_tower.10.running_mean', 'connect_model.cls_encode.matrix21_k.1.weight', 'connect_model.cls_tower.7.bias', 'connect_model.cls_memory_tower.1.running_var', 'connect_model.bbox_tower.3.weight', 'connect_model.bbox_tower.10.weight', 'connect_model.cls_memory_tower.7.running_var', 'connect_model.cls_encode.matrix12_s.1.weight']
    unused checkpoint keys:['features.features.layer4.0.bn2.bias', 'features.features.layer4.0.bn2.weight', 'features.features.layer4.1.bn1.bias', 'features.features.layer4.2.bn2.weight', 'features.features.layer4.0.conv1.weight', 'features.features.layer4.0.bn2.running_mean', 'features.features.layer4.2.conv2.weight', 'features.features.layer4.2.bn1.running_mean', 'features.features.layer4.0.bn1.weight', 'features.features.layer4.1.bn1.weight', 'features.features.layer4.2.bn3.bias', 'features.features.layer4.2.bn3.running_mean', 'features.features.layer4.1.bn2.running_var', 'features.features.layer4.0.downsample.1.running_var', 'features.features.layer4.0.bn3.running_mean', 'features.features.layer4.0.bn3.bias', 'features.features.layer4.2.bn3.weight', 'features.features.layer4.2.bn1.running_var', 'features.features.layer4.1.conv3.weight', 'features.features.layer4.1.bn1.running_var', 'features.features.layer4.1.conv2.weight', 'features.features.layer4.1.bn2.weight', 'features.features.layer4.1.bn1.running_mean', 'features.features.layer4.1.bn3.running_var', 'features.features.layer4.0.downsample.1.bias', 'features.features.layer4.2.conv3.weight', 'features.features.layer4.2.bn2.running_var', 'features.features.layer4.0.bn3.weight', 'features.features.layer4.0.bn2.running_var', 'features.features.layer4.2.bn2.running_mean', 'features.features.layer4.1.bn2.running_mean', 'features.features.layer4.1.bn3.weight', 'features.features.layer4.0.downsample.0.weight', 'features.features.layer4.0.bn3.running_var', 'features.features.layer4.1.bn3.bias', 'features.features.layer4.1.bn2.bias', 'features.features.layer4.2.bn1.bias', 'features.features.layer4.0.conv2.weight', 'features.features.layer4.2.bn2.bias', 'features.features.layer4.0.bn1.running_mean', 'features.features.layer4.0.bn1.bias', 'features.features.layer4.0.bn1.running_var', 'features.features.layer4.2.bn1.weight', 'features.features.layer4.2.conv1.weight', 'features.features.layer4.0.conv3.weight', 'features.features.layer4.0.downsample.1.running_mean', 'features.features.layer4.1.conv1.weight', 'features.features.layer4.0.downsample.1.weight', 'features.features.layer4.2.bn3.running_var', 'features.features.layer4.1.bn3.running_mean']
    ==========first check trainable==========
    {'params': <filter object at 0x7fab08023050>, 'lr': 0.0005}
    {'params': <generator object Module.parameters at 0x7fab080517d0>, 'lr': 0.005}
    {'params': <generator object Module.parameters at 0x7fab080519d0>, 'lr': 0.005}
    ==========double check trainable==========
    trainable params:
    neck.downsample.0.weight
    neck.downsample.1.weight
    neck.downsample.1.bias
    connect_model.adjust
    connect_model.bias
    connect_model.cls_encode.matrix11_k.0.weight
    connect_model.cls_encode.matrix11_k.1.weight
    connect_model.cls_encode.matrix11_k.1.bias
    connect_model.cls_encode.matrix11_s.0.weight
    connect_model.cls_encode.matrix11_s.1.weight
    connect_model.cls_encode.matrix11_s.1.bias
    connect_model.cls_encode.matrix12_k.0.weight
    connect_model.cls_encode.matrix12_k.1.weight
    connect_model.cls_encode.matrix12_k.1.bias
    connect_model.cls_encode.matrix12_s.0.weight
    connect_model.cls_encode.matrix12_s.1.weight
    connect_model.cls_encode.matrix12_s.1.bias
    connect_model.cls_encode.matrix21_k.0.weight
    connect_model.cls_encode.matrix21_k.1.weight
    connect_model.cls_encode.matrix21_k.1.bias
    connect_model.cls_encode.matrix21_s.0.weight
    connect_model.cls_encode.matrix21_s.1.weight
    connect_model.cls_encode.matrix21_s.1.bias
    connect_model.reg_encode.matrix11_k.0.weight
    connect_model.reg_encode.matrix11_k.1.weight
    connect_model.reg_encode.matrix11_k.1.bias
    connect_model.reg_encode.matrix11_s.0.weight
    connect_model.reg_encode.matrix11_s.1.weight
    connect_model.reg_encode.matrix11_s.1.bias
    connect_model.reg_encode.matrix12_k.0.weight
    connect_model.reg_encode.matrix12_k.1.weight
    connect_model.reg_encode.matrix12_k.1.bias
    connect_model.reg_encode.matrix12_s.0.weight
    connect_model.reg_encode.matrix12_s.1.weight
    connect_model.reg_encode.matrix12_s.1.bias
    connect_model.reg_encode.matrix21_k.0.weight
    connect_model.reg_encode.matrix21_k.1.weight
    connect_model.reg_encode.matrix21_k.1.bias
    connect_model.reg_encode.matrix21_s.0.weight
    connect_model.reg_encode.matrix21_s.1.weight
    connect_model.reg_encode.matrix21_s.1.bias
    connect_model.cls_dw.weight
    connect_model.reg_dw.weight
    connect_model.conf_fusion.conf_gen.0.weight
    connect_model.conf_fusion.conf_gen.0.bias
    connect_model.conf_fusion.conf_gen.1.weight
    connect_model.conf_fusion.conf_gen.1.bias
    connect_model.conf_fusion.value_gen.0.weight
    connect_model.conf_fusion.value_gen.0.bias
    connect_model.conf_fusion.value_gen.1.weight
    connect_model.conf_fusion.value_gen.1.bias
    connect_model.bbox_tower.0.weight
    connect_model.bbox_tower.0.bias
    connect_model.bbox_tower.1.weight
    connect_model.bbox_tower.1.bias
    connect_model.bbox_tower.3.weight
    connect_model.bbox_tower.3.bias
    connect_model.bbox_tower.4.weight
    connect_model.bbox_tower.4.bias
    connect_model.bbox_tower.6.weight
    connect_model.bbox_tower.6.bias
    connect_model.bbox_tower.7.weight
    connect_model.bbox_tower.7.bias
    connect_model.bbox_tower.9.weight
    connect_model.bbox_tower.9.bias
    connect_model.bbox_tower.10.weight
    connect_model.bbox_tower.10.bias
    connect_model.cls_tower.0.weight
    connect_model.cls_tower.0.bias
    connect_model.cls_tower.1.weight
    connect_model.cls_tower.1.bias
    connect_model.cls_tower.3.weight
    connect_model.cls_tower.3.bias
    connect_model.cls_tower.4.weight
    connect_model.cls_tower.4.bias
    connect_model.cls_tower.6.weight
    connect_model.cls_tower.6.bias
    connect_model.cls_tower.7.weight
    connect_model.cls_tower.7.bias
    connect_model.cls_tower.9.weight
    connect_model.cls_tower.9.bias
    connect_model.cls_tower.10.weight
    connect_model.cls_tower.10.bias
    connect_model.cls_memory_tower.0.weight
    connect_model.cls_memory_tower.0.bias
    connect_model.cls_memory_tower.1.weight
    connect_model.cls_memory_tower.1.bias
    connect_model.cls_memory_tower.3.weight
    connect_model.cls_memory_tower.3.bias
    connect_model.cls_memory_tower.4.weight
    connect_model.cls_memory_tower.4.bias
    connect_model.cls_memory_tower.6.weight
    connect_model.cls_memory_tower.6.bias
    connect_model.cls_memory_tower.7.weight
    connect_model.cls_memory_tower.7.bias
    connect_model.cls_memory_tower.9.weight
    connect_model.cls_memory_tower.9.bias
    connect_model.cls_memory_tower.10.weight
    connect_model.cls_memory_tower.10.bias
    connect_model.bbox_pred.weight
    connect_model.bbox_pred.bias
    connect_model.cls_pred.weight
    connect_model.cls_pred.bias
    connect_model.cls_memory_pred.weight
    connect_model.cls_memory_pred.bias
    GPU NUM:  1
    (WarmUPScheduler) lr spaces: 
    [2.50000000e-03 2.87174589e-03 3.29876978e-03 3.78929142e-03
     4.35275282e-03 5.00000000e-03 3.97242620e-03 3.15603398e-03
     2.50742241e-03 1.99211009e-03 1.58270207e-03 1.25743343e-03
     9.99012299e-04 7.93700526e-04 6.30583352e-04 5.00989166e-04
     3.98028497e-04 3.16227766e-04 2.51238292e-04 1.99605115e-04
     1.58583318e-04 1.25992105e-04 1.00098868e-04 7.95270729e-05
     6.31830855e-05 5.01980288e-05 3.98815930e-05 3.16853369e-05
     2.51735325e-05 2.00000000e-05]
    model prepare done
    train datas: ['GOT10K']
    ############################################################
    <class 'list'>
    ############################################################
    Traceback (most recent call last):
      File "./scripts/train_usot.py", line 367, in <module>
        main()
      File "./scripts/train_usot.py", line 328, in main
        train_set = USOTDataset(config)
      File "/content/drive/MyDrive/USOT/USOT-main/lib/dataset_loader/datasets_usot.py", line 105, in __init__
        self.video_quality, self.far_sample)
      File "/content/drive/MyDrive/USOT/USOT-main/lib/dataset_loader/datasets_usot.py", line 472, in __init__
        self._clean()
      File "/content/drive/MyDrive/USOT/USOT-main/lib/dataset_loader/datasets_usot.py", line 498, in _clean
        frames = self.labels[video]
    TypeError: list indices must be integers or slices, not list
    

    Using Google Colab.

    opened by nadavmisgav 1
  • Data set related issues

    The got10k_flow.json file I downloaded directly from Google Drive only reaches 9334 in the end, but the GOT10k training dataset has 9335 files. May I ask what the situation is?

    opened by zrz1018 1
  • why detach(stop-gradient) the boxes for PrPooling

    The contribution of Precise RoI Pooling can be summarized as: pooling more precise ROI features while also being able to backpropagate gradients to the box coordinates.

    We find that you detach the boxes when pooling the features in the training pipeline at https://github.com/VISION-SJTU/USOT/blob/main/lib/models/models.py#L273, which is somewhat confusing for a model trained with cycle consistency. If these boxes were not detached, could the model acquire more information from video-level training and be trained better? With this part detached, the model looks like a conventional pair-wise trained Siamese tracker. So may we ask: why is this part detached, and what would happen if it were not?

    opened by FlorinShum 1