ISTR: End-to-End Instance Segmentation with Transformers (https://arxiv.org/abs/2105.00637)

Overview

This is the project page for the paper:

ISTR: End-to-End Instance Segmentation via Transformers,
Jie Hu, Liujuan Cao, Yao Lu, ShengChuan Zhang, Yan Wang, Ke Li, Feiyue Huang, Ling Shao, Rongrong Ji,
arXiv 2105.00637

Highlights:

  • GPU Friendly: Four 1080Ti/2080Ti GPUs are enough to train ISTR with R50 and R101 backbones.
  • High Performance: On COCO test-dev, ISTR-R50-3x gets 46.8/38.6 box/mask AP, and ISTR-R101-3x gets 48.1/39.9 box/mask AP.

Updates

  • (2021.05.03) The project page for ISTR is available.

Models

Method        inf. speed  box AP  mask AP  download
ISTR-R50-3x   17.8 FPS    46.8    38.6     model | log
ISTR-R101-3x  13.9 FPS    48.1    39.9     model | log
  • Inference speed is measured on a single 2080Ti GPU.
  • We use backbones pre-trained on ImageNet with torchvision. The ImageNet pre-trained ResNet-101 backbone is obtained from SparseR-CNN.
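
The FPS figures above can be read as per-image latency. The sketch below does that conversion, assuming one image is processed at a time so that latency is simply the reciprocal of throughput; the function name is illustrative and not part of the ISTR code.

```python
def fps_to_latency_ms(fps: float) -> float:
    """Convert throughput in frames per second to per-image latency in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# FPS values copied from the Models table (single 2080Ti GPU).
reported_fps = {"ISTR-R50-3x": 17.8, "ISTR-R101-3x": 13.9}

for name, fps in reported_fps.items():
    print(f"{name}: {fps_to_latency_ms(fps):.1f} ms per image")
# ISTR-R50-3x: 56.2 ms per image
# ISTR-R101-3x: 71.9 ms per image
```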

Installation

The code is built on top of Detectron2, SparseR-CNN, and AdelaiDet.

Requirements

  • Python=3.8
  • PyTorch=1.6.0, torchvision=0.7.0, cudatoolkit=10.1
  • OpenCV for visualization
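
Before building the repository, it can help to confirm that the active environment matches the pinned versions. The following is a minimal sketch using only the standard library; the distribution names "torch" and "torchvision" are assumed to be the names under which the packages are installed, and the helper names are hypothetical.

```python
from importlib import metadata

# Pins copied from the requirements list above. The distribution names
# are an assumption about how the packages are registered.
REQUIRED = {"torch": "1.6.0", "torchvision": "0.7.0"}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_matches(installed, required):
    """Compare release numbers only, ignoring local tags such as '+cu101'."""
    if installed is None:
        return False
    return installed.split("+")[0] == required


def check_environment(required=REQUIRED, lookup=installed_version):
    """Map each package name to (installed_version, matches_pin)."""
    report = {}
    for name, pin in required.items():
        found = lookup(name)
        report[name] = (found, version_matches(found, pin))
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_environment().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: required {REQUIRED[name]}, found {found} -> {status}")
```

The lookup function is passed in as a parameter so the comparison logic can be exercised without PyTorch installed.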

Steps

  1. Install the repository (we recommend using Anaconda for installation).
conda create -n ISTR python=3.8 -y
conda activate ISTR
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
pip install opencv-python
pip install scipy
pip install shapely
git clone https://github.com/hujiecpp/ISTR.git
cd ISTR
python setup.py build develop
  2. Link the COCO dataset path.
ln -s /coco_dataset_path/coco ./datasets
  3. Train ISTR (e.g., with a ResNet50 backbone).
python projects/ISTR/train_net.py --num-gpus 4 --config-file projects/ISTR/configs/ISTR-R50-3x.yaml
  4. Evaluate ISTR (e.g., with a ResNet50 backbone).
python projects/ISTR/train_net.py --num-gpus 4 --config-file projects/ISTR/configs/ISTR-R50-3x.yaml --eval-only MODEL.WEIGHTS ./output/model_final.pth
  5. Visualize the detection and segmentation results (e.g., with a ResNet50 backbone).
python demo/demo.py --config-file projects/ISTR/configs/ISTR-R50-3x.yaml --input input1.jpg --output ./output --confidence-threshold 0.4 --opts MODEL.WEIGHTS ./output/model_final.pth
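
The training and evaluation commands above differ only in the trailing `--eval-only MODEL.WEIGHTS <path>` arguments. The sketch below assembles both from a backbone name. The helper names are hypothetical, and it assumes other backbones follow the same `ISTR-<backbone>-3x.yaml` naming as the R50 config shown in the steps.

```python
import shlex

TRAIN_SCRIPT = "projects/ISTR/train_net.py"


def config_path(backbone):
    """Config path following the naming used in the steps, e.g. ISTR-R50-3x.yaml.

    Assumption: configs for other backbones use the same pattern.
    """
    return f"projects/ISTR/configs/ISTR-{backbone}-3x.yaml"


def build_command(backbone="R50", num_gpus=4, eval_weights=None):
    """Return the argument list for training, or for evaluation if eval_weights is set."""
    cmd = [
        "python", TRAIN_SCRIPT,
        "--num-gpus", str(num_gpus),
        "--config-file", config_path(backbone),
    ]
    if eval_weights is not None:
        # Trailing KEY VALUE pairs override config entries, as in the evaluation step.
        cmd += ["--eval-only", "MODEL.WEIGHTS", eval_weights]
    return cmd


print(shlex.join(build_command()))
print(shlex.join(build_command(eval_weights="./output/model_final.pth")))
```

The returned list can be passed directly to `subprocess.run` if the commands should be launched from a script rather than typed by hand.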

Citation

If our paper helps your research, please cite it in your publications:

@article{hu2021ISTR,
  title={ISTR: End-to-End Instance Segmentation via Transformers},
  author={Hu, Jie and Cao, Liujuan and Lu, Yao and Zhang, ShengChuan and Wang, Yan and Li, Ke and Huang, Feiyue and Shao, Ling and Ji, Rongrong},
  journal={arXiv preprint arXiv:2105.00637},
  year={2021}
}
Comments
  •  pred_mask visualization cannot match the object

    I trained the model on my own dataset. When I ran 'demo.py' for testing, I got the following result: (image: 1-01-5-pred). It seems that the prediction itself is right but the visualization code has a problem. Or is something else wrong?

    opened by GuQingtao 5
  • PATH_COMPONENTS: "./projects/AE/checkpoints/AE_112_256.t7" not found

    I tried using the config file projects/ISTR/configs/ISTR-AE-swinL-3x.yaml, but the PATH_COMPONENTS file given in that config is missing from GitHub. Is there a link to this file?

    opened by pd2871 2
  • Why the box ap is higher than Sparse R-CNN?

    @hujiecpp Hi! Thanks for open-sourcing your code. It is very impressive work. I do not understand why the box AP is higher than that of Sparse R-CNN. From the code, I see that you use global image features for query feature learning. However, when I add this to Sparse R-CNN (mmdet version), I do not see any improvement.

    opened by lxtGH 2
  • What is PATH_COMPONENTS ?

    Hi, thanks for your awesome work.

    I tried to retrain the network and found that cfg.MODEL.ISTR.PATH_COMPONENTS = "./datasets/coco/components/coco_2017_train_class_agnosticTrue_whitenTrue_sigmoidTrue_60.npz". However, I can't find this file in the repo. How can I get it?

    Thanks

    opened by Len-Li 2
  • NMS module?

    Hi @hujiecpp, thank you for making this project publicly available. I tried to apply this framework to an instance segmentation task on medical images. The predictions look good, but some objects seem to be detected as multiple overlapping objects. In Mask R-CNN this problem is solved by NMS. I checked the inference code but could not find a way to overcome it. Is there any counterpart (like NMS) in ISTR to filter out extra predictions? I probably missed it in the manuscript, sorry about that. Thank you in advance.

    opened by cao13jf 1
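
For readers with the same question, the sketch below shows generic greedy box NMS applied as a post-processing step to a list of predictions. It is not part of ISTR and is not the authors' answer; the function names and the 0.5 threshold are illustrative assumptions. Raising the `--confidence-threshold` used in the demo command is another simple filter to try first.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, visiting boxes from highest to lowest score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(box_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed; the third box is disjoint and survives.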
  • Mask Embeddings in DETR

    Hi, thanks for your awesome work.

    Have you tried the mask embeddings in DETR? I am very interested in the performance of this method on DETR.

    Thank you.

    opened by mxin262 0
  • Error in installation step

    Hello,

    Thank you very much for sharing the code for the model,

    I'm trying to follow the installation guide. However, when I run the command python setup.py build develop, I get the error below:

    Traceback (most recent call last):
      File "setup.py", line 174, in get_model_zoo_configs
        os.symlink(source_configs_dir, destination)
    FileExistsError: [Errno 17] File exists: '/home/thi/ISTR/configs' -> '/home/thi/ISTR/detectron2/model_zoo/configs'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "setup.py", line 198, in <module>
        package_data={"detectron2.model_zoo": get_model_zoo_configs()},
      File "setup.py", line 177, in get_model_zoo_configs
        shutil.copytree(source_configs_dir, destination)
      File "/home/thi/anaconda3/envs/ISTR/lib/python3.8/shutil.py", line 555, in copytree
        with os.scandir(src) as itr:
    FileNotFoundError: [Errno 2] No such file or directory: '/home/thi/ISTR/configs'
    
    

    Can you help me fix this, please? Thank you very much!

    opened by thiphan94 2
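
One reading of the traceback above that fits both errors: the destination path already exists as a directory entry (hence FileExistsError on os.symlink), while the source directory does not exist (hence FileNotFoundError in shutil.copytree). A symlink whose target is missing produces exactly this combination. The sketch below is a hypothetical diagnostic, not part of the repository's setup.py; it classifies a path and reproduces the state in a temporary directory.

```python
import os
import tempfile


def describe(path):
    """Classify a path as symlink, dangling symlink, directory, file, or missing."""
    if os.path.islink(path):
        # os.path.exists follows the link, so it is False when the target is gone.
        return "symlink" if os.path.exists(path) else "dangling symlink"
    if os.path.isdir(path):
        return "directory"
    if os.path.exists(path):
        return "file"
    return "missing"


with tempfile.TemporaryDirectory() as root:
    source = os.path.join(root, "configs")            # never created
    destination = os.path.join(root, "model_zoo_configs")
    os.symlink(source, destination)                   # link to a missing target
    print(describe(source), "/", describe(destination))
# missing / dangling symlink
```

Running `describe` on the two paths named in the error message shows which of them is in the unexpected state before anything is deleted or recreated.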
  • RuntimeError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 1; 10.76 GiB total capacity; 9.75 GiB already allocated; 25.56 MiB free; 9.89 GiB reserved in total by PyTorch)

    How can I solve this? My GPU is an RTX 2080 Ti (11 GB memory).

    opened by guangxuwang 0
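
A factor worth checking in reports like this is how many images each GPU receives. In Detectron2, SOLVER.IMS_PER_BATCH is the total batch size across all GPUs, and the training command above uses 4 GPUs. The config dump in the next report shows SOLVER.IMS_PER_BATCH: 16 together with num_gpus=1, which would put all 16 images on one card. The sketch below does this arithmetic; the function names are illustrative, and the reference setup of 16 images on 4 GPUs is an assumption pieced together from this page, not a statement from the authors.

```python
def images_per_gpu(ims_per_batch: int, num_gpus: int) -> int:
    """Images each GPU processes per iteration when the batch is split evenly."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if ims_per_batch % num_gpus != 0:
        raise ValueError("ims_per_batch must be divisible by num_gpus")
    return ims_per_batch // num_gpus


def batch_for_same_load(reference_batch: int, reference_gpus: int, num_gpus: int) -> int:
    """Total batch size that keeps the per-GPU image count of the reference setup."""
    return images_per_gpu(reference_batch, reference_gpus) * num_gpus


print(images_per_gpu(16, 4))          # 4  (assumed reference: 16 images on 4 GPUs)
print(images_per_gpu(16, 1))          # 16 (same total batch on a single GPU)
print(batch_for_same_load(16, 4, 1))  # 4  (total batch matching the reference load)
```

A smaller total batch can be passed as a trailing override, for example `SOLVER.IMS_PER_BATCH 4`, in the same way `MODEL.WEIGHTS` is passed in the evaluation command. Changing the batch size usually calls for revisiting the learning rate and schedule as well.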
  • RuntimeError: CUDA out of memory. Tried to allocate 672.00 MiB (GPU 0; 15.78 GiB total capacity; 13.42 GiB already allocated; 50.75 MiB free; 14.41 GiB reserved in total by PyTorch)

    dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
    [08/25 11:15:41 detectron2]: Contents of args.config_file=projects/ISTR/configs/ISTR-AE-R50-3x.yaml:
    _BASE_: "Base-ISTR.yaml"
    MODEL:
      WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
      RESNETS:
        DEPTH: 50
        STRIDE_IN_1X1: False
      ISTR:
        NUM_PROPOSALS: 300
        NUM_CLASSES: 5
        MASK_ENCODING_METHOD: "AE"
        PATH_COMPONENTS: "/content/drive/MyDrive/imenselmi/ISTR_TRAIN/ISTR/projects/AE/checkpoints/AE_112_256.t7"
    DATASETS:
      TRAIN: ("train",)
      TEST: ("val",)
    SOLVER:
      STEPS: (210000, 250000)
      MAX_ITER: 270000
    INPUT:
      FORMAT: "RGB"

    [08/25 11:15:41 detectron2]: Running with full config:
    CUDNN_BENCHMARK: true
    DATALOADER:
      ASPECT_RATIO_GROUPING: true
      FILTER_EMPTY_ANNOTATIONS: true
      NUM_WORKERS: 4
      REPEAT_THRESHOLD: 0.0
      SAMPLER_TRAIN: TrainingSampler
    DATASETS:
      PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
      PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
      PROPOSAL_FILES_TEST: []
      PROPOSAL_FILES_TRAIN: []
      TEST:
      - val
      TRAIN:
      - train
    GLOBAL:
      HACK: 1.0
    INPUT:
      CROP:
        ENABLED: true
        SIZE:
        - 384
        - 600
        TYPE: absolute_range
      FORMAT: RGB
      MASK_FORMAT: polygon
      MAX_SIZE_TEST: 1333
      MAX_SIZE_TRAIN: 1333
      MIN_SIZE_TEST: 800
      MIN_SIZE_TRAIN:
      - 416
      - 448
      - 480
      - 512
      - 544
      - 576
      - 608
      - 640
      - 672
      - 704
      - 736
      - 768
      - 800
      - 832
      - 864
      - 896
      - 928
      - 960
      - 992
      - 1024
      - 1056
      - 1088
      MIN_SIZE_TRAIN_SAMPLING: choice
      RANDOM_FLIP: horizontal
    LSJ_AUG: false
    MODEL:
      ANCHOR_GENERATOR:
        ANGLES:
        - - -90
          - 0
          - 90
        ASPECT_RATIOS:
        - - 0.5
          - 1.0
          - 2.0
        NAME: DefaultAnchorGenerator
        OFFSET: 0.0
        SIZES:
        - - 32
          - 64
          - 128
          - 256
          - 512
      BACKBONE:
        FREEZE_AT: -1
        NAME: build_resnet_fpn_backbone
      DEVICE: cuda
      FPN:
        FUSE_TYPE: sum
        IN_FEATURES:
        - res2
        - res3
        - res4
        - res5
        NORM: ''
        OUT_CHANNELS: 256
      ISTR:
        ALPHA: 0.25
        CLASS_WEIGHT: 2.0
        DEEP_SUPERVISION: true
        DIM_DYNAMIC: 64
        DIM_FEEDFORWARD: 2048
        DROPOUT: 0.0
        FEAT_WEIGHT: 1.0
        GAMMA: 2.0
        GIOU_WEIGHT: 2.0
        HIDDEN_DIM: 256
        IOU_LABELS:
        - 0
        - 1
        IOU_THRESHOLDS:
        - 0.5
        L1_WEIGHT: 5.0
        MASK_ENCODING_METHOD: AE
        MASK_FEAT_DIM: 256
        MASK_SIZE: 112
        MASK_WEIGHT: 5.0
        NHEADS: 8
        NO_OBJECT_WEIGHT: 0.1
        NUM_CLASSES: 5
        NUM_CLS: 3
        NUM_DYNAMIC: 2
        NUM_HEADS: 6
        NUM_MASK: 3
        NUM_PROPOSALS: 300
        NUM_REG: 3
        PATH_COMPONENTS: /content/drive/MyDrive/imenselmi/ISTR_TRAIN/ISTR/projects/AE/checkpoints/AE_112_256.t7
        PRIOR_PROB: 0.01
      KEYPOINT_ON: false
      LOAD_PROPOSALS: false
      MASK_ON: true
      META_ARCHITECTURE: ISTR
      PANOPTIC_FPN:
        COMBINE:
          ENABLED: true
          INSTANCES_CONFIDENCE_THRESH: 0.5
          OVERLAP_THRESH: 0.5
          STUFF_AREA_LIMIT: 4096
        INSTANCE_LOSS_WEIGHT: 1.0
      PIXEL_MEAN:
      - 123.675
      - 116.28
      - 103.53
      PIXEL_STD:
      - 58.395
      - 57.12
      - 57.375
      PROPOSAL_GENERATOR:
        MIN_SIZE: 0
        NAME: RPN
      RESNETS:
        DEFORM_MODULATED: false
        DEFORM_NUM_GROUPS: 1
        DEFORM_ON_PER_STAGE:
        - false
        - false
        - false
        - false
        DEPTH: 50
        NORM: FrozenBN
        NUM_GROUPS: 1
        OUT_FEATURES:
        - res2
        - res3
        - res4
        - res5
        RES2_OUT_CHANNELS: 256
        RES5_DILATION: 1
        STEM_OUT_CHANNELS: 64
        STRIDE_IN_1X1: false
        WIDTH_PER_GROUP: 64
      RETINANET:
        BBOX_REG_LOSS_TYPE: smooth_l1
        BBOX_REG_WEIGHTS: &id001
        - 1.0
        - 1.0
        - 1.0
        - 1.0
        FOCAL_LOSS_ALPHA: 0.25
        FOCAL_LOSS_GAMMA: 2.0
        IN_FEATURES:
        - p3
        - p4
        - p5
        - p6
        - p7
        IOU_LABELS:
        - 0
        - -1
        - 1
        IOU_THRESHOLDS:
        - 0.4
        - 0.5
        NMS_THRESH_TEST: 0.5
        NORM: ''
        NUM_CLASSES: 80
        NUM_CONVS: 4
        PRIOR_PROB: 0.01
        SCORE_THRESH_TEST: 0.05
        SMOOTH_L1_LOSS_BETA: 0.1
        TOPK_CANDIDATES_TEST: 1000
      ROI_BOX_CASCADE_HEAD:
        BBOX_REG_WEIGHTS:
        - - 10.0
          - 10.0
          - 5.0
          - 5.0
        - - 20.0
          - 20.0
          - 10.0
          - 10.0
        - - 30.0
          - 30.0
          - 15.0
          - 15.0
        IOUS:
        - 0.5
        - 0.6
        - 0.7
      ROI_BOX_HEAD:
        BBOX_REG_LOSS_TYPE: smooth_l1
        BBOX_REG_LOSS_WEIGHT: 1.0
        BBOX_REG_WEIGHTS:
        - 10.0
        - 10.0
        - 5.0
        - 5.0
        CLS_AGNOSTIC_BBOX_REG: false
        CONV_DIM: 256
        FC_DIM: 1024
        NAME: ''
        NORM: ''
        NUM_CONV: 0
        NUM_FC: 0
        POOLER_RESOLUTION: 7
        POOLER_SAMPLING_RATIO: 2
        POOLER_TYPE: ROIAlignV2
        SMOOTH_L1_BETA: 0.0
        TRAIN_ON_PRED_BOXES: false
      ROI_HEADS:
        BATCH_SIZE_PER_IMAGE: 512
        IN_FEATURES:
        - p2
        - p3
        - p4
        - p5
        IOU_LABELS:
        - 0
        - 1
        IOU_THRESHOLDS:
        - 0.5
        NAME: Res5ROIHeads
        NMS_THRESH_TEST: 0.5
        NUM_CLASSES: 80
        POSITIVE_FRACTION: 0.25
        PROPOSAL_APPEND_GT: true
        SCORE_THRESH_TEST: 0.05
      ROI_KEYPOINT_HEAD:
        CONV_DIMS:
        - 512
        - 512
        - 512
        - 512
        - 512
        - 512
        - 512
        - 512
        LOSS_WEIGHT: 1.0
        MIN_KEYPOINTS_PER_IMAGE: 1
        NAME: KRCNNConvDeconvUpsampleHead
        NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
        NUM_KEYPOINTS: 17
        POOLER_RESOLUTION: 14
        POOLER_SAMPLING_RATIO: 0
        POOLER_TYPE: ROIAlignV2
      ROI_MASK_HEAD:
        CLS_AGNOSTIC_MASK: false
        CONV_DIM: 256
        NAME: MaskRCNNConvUpsampleHead
        NORM: ''
        NUM_CONV: 0
        POOLER_RESOLUTION: 14
        POOLER_SAMPLING_RATIO: 0
        POOLER_TYPE: ROIAlignV2
      RPN:
        BATCH_SIZE_PER_IMAGE: 256
        BBOX_REG_LOSS_TYPE: smooth_l1
        BBOX_REG_LOSS_WEIGHT: 1.0
        BBOX_REG_WEIGHTS: *id001
        BOUNDARY_THRESH: -1
        CONV_DIMS:
        - -1
        HEAD_NAME: StandardRPNHead
        IN_FEATURES:
        - res4
        IOU_LABELS:
        - 0
        - -1
        - 1
        IOU_THRESHOLDS:
        - 0.3
        - 0.7
        LOSS_WEIGHT: 1.0
        NMS_THRESH: 0.7
        POSITIVE_FRACTION: 0.5
        POST_NMS_TOPK_TEST: 1000
        POST_NMS_TOPK_TRAIN: 2000
        PRE_NMS_TOPK_TEST: 6000
        PRE_NMS_TOPK_TRAIN: 12000
        SMOOTH_L1_BETA: 0.0
      SEM_SEG_HEAD:
        COMMON_STRIDE: 4
        CONVS_DIM: 128
        IGNORE_VALUE: 255
        IN_FEATURES:
        - p2
        - p3
        - p4
        - p5
        LOSS_WEIGHT: 1.0
        NAME: SemSegFPNHead
        NORM: GN
        NUM_CLASSES: 54
      SWINT:
        APE: false
        DEPTHS:
        - 2
        - 2
        - 6
        - 2
        DROP_PATH_RATE: 0.2
        EMBED_DIM: 96
        MLP_RATIO: 4
        NUM_HEADS:
        - 3
        - 6
        - 12
        - 24
        OUT_FEATURES:
        - stage2
        - stage3
        - stage4
        - stage5
        WINDOW_SIZE: 7
      WEIGHTS: detectron2://ImageNetPretrained/torchvision/R-50.pkl
    OUTPUT_DIR: ./output
    SEED: 2333333
    SOLVER:
      AMP:
        ENABLED: false
      BACKBONE_MULTIPLIER: 1.0
      BASE_LR: 2.5e-05
      BIAS_LR_FACTOR: 1.0
      CHECKPOINT_PERIOD: 5000
      CLIP_GRADIENTS:
        CLIP_TYPE: full_model
        CLIP_VALUE: 1.0
        ENABLED: true
        NORM_TYPE: 2.0
      GAMMA: 0.1
      IMS_PER_BATCH: 16
      LR_SCHEDULER_NAME: WarmupMultiStepLR
      MAX_ITER: 270000
      MOMENTUM: 0.9
      NESTEROV: false
      OPTIMIZER: ADAMW
      REFERENCE_WORLD_SIZE: 0
      STEPS:
      - 210000
      - 250000
      WARMUP_FACTOR: 0.01
      WARMUP_ITERS: 1000
      WARMUP_METHOD: linear
      WEIGHT_DECAY: 0.0001
      WEIGHT_DECAY_BIAS: null
      WEIGHT_DECAY_NORM: 0.0
    TEST:
      AUG:
        ENABLED: false
        FLIP: true
        MAX_SIZE: 4000
        MIN_SIZES:
        - 400
        - 500
        - 600
        - 700
        - 800
        - 900
        - 1000
        - 1100
        - 1200
      DETECTIONS_PER_IMAGE: 100
      EVAL_PERIOD: 7330
      EXPECTED_RESULTS: []
      KEYPOINT_OKS_SIGMAS: []
      PRECISE_BN:
        ENABLED: false
        NUM_ITER: 200
    VERSION: 2
    VIS_PERIOD: 0

    [08/25 11:15:41 detectron2]: Full config saved to ./output/config.yaml [08/25 11:15:47 d2.engine.defaults]: Model: ISTR( (backbone): FPN( (fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (top_block): LastLevelMaxPool() (bottom_up): ResNet( (stem): BasicStem( (conv1): Conv2d( 3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) ) (res2): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv1): Conv2d( 64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): 
Conv2d( 256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) ) (res3): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv1): Conv2d( 256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (3): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): 
Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) ) (res4): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) (conv1): Conv2d( 512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (3): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): 
FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (4): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (5): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) ) (res5): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) (conv1): Conv2d( 1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 
1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) ) ) ) (pos_embeddings): Embedding(300, 256) (init_proposal_boxes): Embedding(300, 4) (IFE): ImgFeatExtractor() (mask_E): Encoder( (encoder): Sequential( (0): Conv2d(1, 16, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=True) (3): Conv2d(16, 32, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (8): ELU(alpha=True) (9): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (10): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (11): ELU(alpha=True) (12): Conv2d(128, 256, kernel_size=(7, 7), stride=(1, 1)) (13): View() ) ) (mask_D): Decoder( (decoder): Sequential( (0): View() (1): ConvTranspose2d(256, 128, kernel_size=(7, 7), stride=(1, 1)) (2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (3): ELU(alpha=1.0, inplace=True) (4): up_conv( (up): Sequential( (0): Upsample(scale_factor=2.0, mode=bilinear) (1): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (3): ELU(alpha=1.0, inplace=True) ) ) (5): up_conv( (up): Sequential( (0): 
Upsample(scale_factor=2.0, mode=bilinear) (1): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (3): ELU(alpha=1.0, inplace=True) ) ) (6): up_conv( (up): Sequential( (0): Upsample(scale_factor=2.0, mode=bilinear) (1): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (3): ELU(alpha=1.0, inplace=True) ) ) (7): up_conv( (up): Sequential( (0): Upsample(scale_factor=2.0, mode=bilinear) (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (3): ELU(alpha=1.0, inplace=True) ) ) (8): Conv2d(16, 1, kernel_size=(1, 1), stride=(1, 1)) (9): Sigmoid() (10): View() ) ) (head): DynamicHead( (box_pooler): ROIPooler( (level_poolers): ModuleList( (0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=2, aligned=True) (1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=2, aligned=True) (2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=2, aligned=True) (3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=2, aligned=True) ) ) (mask_pooler): ROIPooler( (level_poolers): ModuleList( (0): ROIAlign(output_size=(28, 28), spatial_scale=0.25, sampling_ratio=2, aligned=True) (1): ROIAlign(output_size=(28, 28), spatial_scale=0.125, sampling_ratio=2, aligned=True) (2): ROIAlign(output_size=(28, 28), spatial_scale=0.0625, sampling_ratio=2, aligned=True) (3): ROIAlign(output_size=(28, 28), spatial_scale=0.03125, sampling_ratio=2, aligned=True) ) ) (head_series): ModuleList( (0): RCNNHead( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (inst_interact): DynamicConv( (dynamic_layer): Linear(in_features=256, out_features=32768, bias=True) (norm1): 
LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=True) 
(3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) (1): RCNNHead( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (inst_interact): DynamicConv( (dynamic_layer): Linear(in_features=256, out_features=32768, bias=True) (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): LayerNorm((256,), eps=1e-05, 
elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=True) (3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(64, 64, 
kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) (2): RCNNHead( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (inst_interact): DynamicConv( (dynamic_layer): Linear(in_features=256, out_features=32768, bias=True) (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): 
Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=True) (3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, 
eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) (3): RCNNHead( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (inst_interact): DynamicConv( (dynamic_layer): Linear(in_features=256, out_features=32768, bias=True) (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): 
Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=True) (3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) (4): RCNNHead( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (inst_interact): DynamicConv( (dynamic_layer): Linear(in_features=256, out_features=32768, bias=True) 
(norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): 
ELU(alpha=True) (3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) (5): RCNNHead( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (inst_interact): DynamicConv( (dynamic_layer): Linear(in_features=256, out_features=32768, bias=True) (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): 
LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=True) (3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, 
inplace=True) (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) ) ) (criterion): SetCriterion( (matcher): HungarianMatcher() ) )
[08/25 11:15:53 d2.data.datasets.coco]: Loading /content/drive/MyDrive/imenselmi/ISTR_TRAIN/data/result/train.json takes 6.61 seconds.
[08/25 11:15:54 d2.data.datasets.coco]: Loaded 43480 images in COCO format from /content/drive/MyDrive/imenselmi/ISTR_TRAIN/data/result/train.json
[08/25 11:15:57 d2.data.build]: Removed 0 images with no usable annotations. 43480 images left.
[08/25 11:15:59 d2.data.build]: Distribution of instances among all 5 categories:
|   category    | #instances   |   category    | #instances   |   category    | #instances   |
|:-------------:|:-------------|:-------------:|:-------------|:-------------:|:-------------|
| short_sleev.. | 18359        | long_sleeve.. | 14566        | long_sleeve.. | 10492        |
|    shorts     | 12123        |   trousers    | 18227        |               |              |
|     total     | 73767        |               |              |               |              |

    pos_embeddings.weight
    WARNING [08/25 11:16:04 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model:
      stem.fc.{bias, weight}
    [08/25 11:16:04 d2.engine.train_loop]: Starting training from iteration 0
    /usr/local/lib/python3.7/dist-packages/fvcore/transforms/transform.py:724: ShapelyDeprecationWarning: Iteration over multi-part geometries is deprecated and will be removed in Shapely 2.0. Use the geoms property to access the constituent parts of a multi-part geometry.
      for poly in cropped:
    /usr/local/lib/python3.7/dist-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
      return torch.floor_divide(self, other)
    /usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
      return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
    ERROR [08/25 11:16:05 d2.engine.train_loop]: Exception during training:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 149, in train
        self.run_step()
      File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/defaults.py", line 494, in run_step
        self._trainer.run_step()
      File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 273, in run_step
        loss_dict = self.model(data)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/content/drive/.shortcut-targets-by-id/190HFmYfsGdKfNWeUiqnpTgh7X3m3GFmF/ISTR_TRAIN/ISTR/projects/ISTR/istr/inseg.py", line 162, in forward
        src = self.backbone(images.tensor)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward
        bottom_up_features = self.bottom_up(x)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 449, in forward
        x = stage(x)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
        input = module(input)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 201, in forward
        out = self.conv3(out)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/wrappers.py", line 110, in forward
        x = self.norm(x)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/batch_norm.py", line 53, in forward
        return x * scale.to(out_dtype) + bias.to(out_dtype)
    RuntimeError: CUDA out of memory. Tried to allocate 672.00 MiB (GPU 0; 15.78 GiB total capacity; 13.42 GiB already allocated; 50.75 MiB free; 14.41 GiB reserved in total by PyTorch)
    [08/25 11:16:05 d2.engine.hooks]: Total training time: 0:00:01 (0:00:00 on hooks)
    [08/25 11:16:05 d2.utils.events]: iter: 0 lr: N/A max_mem: 14075M
    Traceback (most recent call last):
      File "projects/ISTR/train_net.py", line 136, in <module>
        args=(args,),
      File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/launch.py", line 82, in launch
        main_func(*args)
      File "projects/ISTR/train_net.py", line 124, in main
        return trainer.train()
      File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/defaults.py", line 484, in train
        super().train(self.start_iter, self.max_iter)
      File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 149, in train
        self.run_step()
      File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/defaults.py", line 494, in run_step
        self._trainer.run_step()
      File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 273, in run_step
        loss_dict = self.model(data)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/content/drive/.shortcut-targets-by-id/190HFmYfsGdKfNWeUiqnpTgh7X3m3GFmF/ISTR_TRAIN/ISTR/projects/ISTR/istr/inseg.py", line 162, in forward
        src = self.backbone(images.tensor)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward
        bottom_up_features = self.bottom_up(x)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 449, in forward
        x = stage(x)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
        input = module(input)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 201, in forward
        out = self.conv3(out)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/wrappers.py", line 110, in forward
        x = self.norm(x)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/batch_norm.py", line 53, in forward
        return x * scale.to(out_dtype) + bias.to(out_dtype)
    RuntimeError: CUDA out of memory. Tried to allocate 672.00 MiB (GPU 0; 15.78 GiB total capacity; 13.42 GiB already allocated; 50.75 MiB free; 14.41 GiB reserved in total by PyTorch)
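The final `RuntimeError` line carries the numbers needed to judge how far over budget the run is. As a rough sketch, the message can be parsed to compare the requested allocation with the free memory. The helper `parse_oom` is hypothetical (it is not part of ISTR, Detectron2, or PyTorch), and the pattern assumes the message wording shown in this log, which may differ in other PyTorch versions.

```python
import re

# Hypothetical helper for illustration; the regex mirrors the message format
# seen in the log above and is an assumption for other PyTorch versions.
_OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<requested>[\d.]+) (?P<requested_unit>[MG]iB) "
    r"\(GPU (?P<gpu>\d+); "
    r"(?P<total>[\d.]+) (?P<total_unit>[MG]iB) total capacity; "
    r"(?P<allocated>[\d.]+) (?P<allocated_unit>[MG]iB) already allocated; "
    r"(?P<free>[\d.]+) (?P<free_unit>[MG]iB) free; "
    r"(?P<reserved>[\d.]+) (?P<reserved_unit>[MG]iB) reserved in total by PyTorch\)"
)


def _to_mib(value, unit):
    """Convert a size string with a MiB or GiB unit to a float in MiB."""
    return float(value) * (1024.0 if unit == "GiB" else 1.0)


def parse_oom(message):
    """Return the sizes from a CUDA OOM message in MiB, or None if no match."""
    match = _OOM_PATTERN.search(message)
    if match is None:
        return None
    names = ("requested", "total", "allocated", "free", "reserved")
    info = {name: _to_mib(match[name], match[name + "_unit"]) for name in names}
    info["gpu"] = int(match["gpu"])
    # How much more memory the failed allocation needed than was free.
    info["shortfall"] = info["requested"] - info["free"]
    return info


message = (
    "RuntimeError: CUDA out of memory. Tried to allocate 672.00 MiB "
    "(GPU 0; 15.78 GiB total capacity; 13.42 GiB already allocated; "
    "50.75 MiB free; 14.41 GiB reserved in total by PyTorch)"
)
info = parse_oom(message)
print(f"GPU {info['gpu']}: short by {info['shortfall']:.2f} MiB")
```

For the log above, the request of 672.00 MiB exceeds the 50.75 MiB that were free by 621.25 MiB, and the failure happens inside the backbone on the very first iteration.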

    opened by aymennturki 3
  • RuntimeError: CUDA out of memory.

    RuntimeError: CUDA out of memory.

    return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
    RuntimeError: CUDA out of memory. Tried to allocate 180.00 MiB (GPU 0; 10.92 GiB total capacity; 9.74 GiB already allocated; 57.56 MiB free; 9.90 GiB reserved in total by PyTorch)

    How can I solve this issue? Which file should I change?

    opened by maamouunn 1
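One common way to lower GPU memory use with Detectron2-based projects is to override config values on the command line rather than editing a file, in the same way the evaluation step passes `MODEL.WEIGHTS`. The sketch below only assembles such a command. `build_train_command` is a hypothetical helper, `SOLVER.IMS_PER_BATCH` and `INPUT.MAX_SIZE_TRAIN` are standard Detectron2 config keys, and whether ISTR's data pipeline honors them and which values fit a given GPU are assumptions to verify.

```python
# Hypothetical helper for illustration; it only builds an argument list and
# does not start training. The default values are placeholders, not
# recommendations from the ISTR authors.
def build_train_command(config_file, num_gpus=1, ims_per_batch=2,
                        max_size_train=None):
    """Assemble a train_net.py command with memory-related config overrides."""
    # Detectron2 requires the total batch size to be divisible by the GPU count.
    if ims_per_batch % num_gpus != 0:
        raise ValueError("ims_per_batch must be divisible by num_gpus")
    command = [
        "python", "projects/ISTR/train_net.py",
        "--num-gpus", str(num_gpus),
        "--config-file", config_file,
    ]
    # Trailing KEY VALUE pairs are merged into the config by Detectron2.
    overrides = ["SOLVER.IMS_PER_BATCH", str(ims_per_batch)]
    if max_size_train is not None:
        overrides += ["INPUT.MAX_SIZE_TRAIN", str(max_size_train)]
    return command + overrides


command = build_train_command(
    "projects/ISTR/configs/ISTR-R50-3x.yaml",
    num_gpus=1,
    ims_per_batch=2,
    max_size_train=1000,
)
print(" ".join(command))
```

A smaller batch usually calls for a matching learning-rate adjustment (for example via `SOLVER.BASE_LR`), and reported accuracy may not be reproduced under changed settings.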
Owner
Jie Hu
PhD student, Xiamen University.
This repository contains the code used for Predicting Patient Outcomes with Graph Representation Learning (https://arxiv.org/abs/2101.03940).

Predicting Patient Outcomes with Graph Representation Learning This repository contains the code used for Predicting Patient Outcomes with Graph Repre

Emma Rocheteau 76 Dec 22, 2022
Official implementation of the paper Image Generators with Conditionally-Independent Pixel Synthesis https://arxiv.org/abs/2011.13775

CIPS -- Official Pytorch Implementation of the paper Image Generators with Conditionally-Independent Pixel Synthesis Requirements pip install -r requi

Multimodal Lab @ Samsung AI Center Moscow 201 Dec 21, 2022
Pytorch implementation of Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization https://arxiv.org/abs/2008.11646

[TCSVT] Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization LPN [Paper] NEWs Prerequisites Python 3.6 GPU Memory >= 8G Numpy > 1.

null 46 Dec 14, 2022
https://arxiv.org/abs/2102.11005

LogME LogME: Practical Assessment of Pre-trained Models for Transfer Learning How to use Just feed the features f and labels y to the function, and yo

THUML: Machine Learning Group @ THSS 149 Dec 19, 2022
Supplementary code for the paper "Meta-Solver for Neural Ordinary Differential Equations" https://arxiv.org/abs/2103.08561

Meta-Solver for Neural Ordinary Differential Equations Towards robust neural ODEs using parametrized solvers. Main idea Each Runge-Kutta (RK) solver w

Julia Gusak 25 Aug 12, 2021
Code for paper "A Critical Assessment of State-of-the-Art in Entity Alignment" (https://arxiv.org/abs/2010.16314)

A Critical Assessment of State-of-the-Art in Entity Alignment This repository contains the source code for the paper A Critical Assessment of State-of

Max Berrendorf 16 Oct 14, 2022
[PyTorch] Official implementation of CVPR2021 paper "PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency". https://arxiv.org/abs/2103.05465

PointDSC repository PyTorch implementation of PointDSC for CVPR'2021 paper "PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency",

null 153 Dec 14, 2022
Official Implementation for "ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement" https://arxiv.org/abs/2104.02699

ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement Recently, the power of unconditional image synthesis has significantly advanced th

null 967 Jan 4, 2023
This is an official implementation of our CVPR 2021 paper "Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression" (https://arxiv.org/abs/2104.02300)

Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression Introduction In this paper, we are interested in the bottom-up paradigm of estima

HRNet 367 Dec 27, 2022
Code for the paper: Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization (https://arxiv.org/abs/2002.11798)

Representation Robustness Evaluations Our implementation is based on code from MadryLab's robustness package and Devon Hjelm's Deep InfoMax. For all t

Sicheng 19 Dec 7, 2022
Non-Official Pytorch implementation of "Face Identity Disentanglement via Latent Space Mapping" https://arxiv.org/abs/2005.07728 Using StyleGAN2 instead of StyleGAN

Face Identity Disentanglement via Latent Space Mapping - Implement in pytorch with StyleGAN 2 Description Pytorch implementation of the paper Face Ide

Daniel Roich 58 Dec 24, 2022
Minimal implementation of PAWS (https://arxiv.org/abs/2104.13963) in TensorFlow.

PAWS-TF Implementation of Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples (PAWS)

Sayak Paul 43 Jan 8, 2023
A PyTorch implementation of EventProp [https://arxiv.org/abs/2009.08378], a method to train Spiking Neural Networks

Spiking Neural Network training with EventProp This is an unofficial PyTorch implementation of EventProp, a method to compute exact gradients for Spiki

Pedro Savarese 35 Jul 29, 2022
Unofficial implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" (https://arxiv.org/abs/2103.14030)

Swin-Transformer-Tensorflow A direct translation of the official PyTorch implementation of "Swin Transformer: Hierarchical Vision Transformer using Sh

null 52 Dec 29, 2022
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)

AudioCLIP Extending CLIP to Image, Text and Audio This repository contains implementation of the models described in the paper arXiv:2106.13043. This

null 458 Jan 2, 2023
Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) in PyTorch

alias-free-gan-pytorch Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) This implementation

Kim Seonghyeon 502 Jan 3, 2023
source code for https://arxiv.org/abs/2005.11248 "Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics"

Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics This work will be published in Nature Biomedical

International Business Machines 71 Nov 15, 2022
Official repository with code and data accompanying the NAACL 2021 paper "Hurdles to Progress in Long-form Question Answering" (https://arxiv.org/abs/2103.06332).

Hurdles to Progress in Long-form Question Answering This repository contains the official scripts and datasets accompanying our NAACL 2021 paper, "Hur

Kalpesh Krishna 41 Nov 8, 2022