ISTR: End-to-End Instance Segmentation with Transformers (https://arxiv.org/abs/2105.00637)

Method	inference speed	box AP	mask AP	download
ISTR-R50-3x	17.8 FPS	46.8	38.6	model | log
ISTR-R101-3x	13.9 FPS	48.1	39.9	model | log

dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[08/25 11:15:41 detectron2]: Contents of args.config_file=projects/ISTR/configs/ISTR-AE-R50-3x.yaml:
_BASE_: "Base-ISTR.yaml"
MODEL:
  WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
  RESNETS:
    DEPTH: 50
    STRIDE_IN_1X1: False
  ISTR:
    NUM_PROPOSALS: 300
    NUM_CLASSES: 5
    MASK_ENCODING_METHOD: "AE"
    PATH_COMPONENTS: "/content/drive/MyDrive/imenselmi/ISTR_TRAIN/ISTR/projects/AE/checkpoints/AE_112_256.t7"
DATASETS:
  TRAIN: ("train",)
  TEST: ("val",)
SOLVER:
  STEPS: (210000, 250000)
  MAX_ITER: 270000
INPUT:
  FORMAT: "RGB"
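
The file printed above only lists overrides; everything else is inherited from Base-ISTR.yaml. As a rough illustration of that inheritance (a simplified sketch, not detectron2's own config code), the snippet below applies a nested override dictionary to a base dictionary. The `base_cfg` values are hypothetical stand-ins, since the contents of Base-ISTR.yaml are not shown in this log, and `merge_config` is a helper name made up for this example.

```python
from copy import deepcopy


def merge_config(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` applied recursively, key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: descend and merge their keys.
            merged[key] = merge_config(merged[key], value)
        else:
            # Leaf value or new section: the override wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical stand-in for Base-ISTR.yaml (its real values are not in the log).
base_cfg = {
    "MODEL": {"META_ARCHITECTURE": "ISTR", "ISTR": {"NUM_CLASSES": 80, "NUM_HEADS": 6}},
    "SOLVER": {"IMS_PER_BATCH": 16},
}

# Subset of the overrides from ISTR-AE-R50-3x.yaml as printed in the log.
override_cfg = {
    "MODEL": {"RESNETS": {"DEPTH": 50}, "ISTR": {"NUM_PROPOSALS": 300, "NUM_CLASSES": 5}},
    "DATASETS": {"TRAIN": ("train",), "TEST": ("val",)},
    "SOLVER": {"STEPS": (210000, 250000), "MAX_ITER": 270000},
}

merged_cfg = merge_config(base_cfg, override_cfg)
print(merged_cfg["MODEL"]["ISTR"])
```

Keys that the override does not mention (here `NUM_HEADS` and `IMS_PER_BATCH`) keep their base values, which is why the full config dump is much longer than the file itself.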

[08/25 11:15:41 detectron2]: Running with full config:
CUDNN_BENCHMARK: true
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: true
  NUM_WORKERS: 4
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
  - val
  TRAIN:
  - train
GLOBAL:
  HACK: 1.0
INPUT:
  CROP:
    ENABLED: true
    SIZE:
    - 384
    - 600
    TYPE: absolute_range
  FORMAT: RGB
  MASK_FORMAT: polygon
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN:
  - 416
  - 448
  - 480
  - 512
  - 544
  - 576
  - 608
  - 640
  - 672
  - 704
  - 736
  - 768
  - 800
  - 832
  - 864
  - 896
  - 928
  - 960
  - 992
  - 1024
  - 1056
  - 1088
  MIN_SIZE_TRAIN_SAMPLING: choice
  RANDOM_FLIP: horizontal
LSJ_AUG: false
MODEL:
  ANCHOR_GENERATOR:
    ANGLES:
    - - -90
      - 0
      - 90
    ASPECT_RATIOS:
    - - 0.5
      - 1.0
      - 2.0
    NAME: DefaultAnchorGenerator
    OFFSET: 0.0
    SIZES:
    - - 32
      - 64
      - 128
      - 256
      - 512
  BACKBONE:
    FREEZE_AT: -1
    NAME: build_resnet_fpn_backbone
  DEVICE: cuda
  FPN:
    FUSE_TYPE: sum
    IN_FEATURES:
    - res2
    - res3
    - res4
    - res5
    NORM: ''
    OUT_CHANNELS: 256
  ISTR:
    ALPHA: 0.25
    CLASS_WEIGHT: 2.0
    DEEP_SUPERVISION: true
    DIM_DYNAMIC: 64
    DIM_FEEDFORWARD: 2048
    DROPOUT: 0.0
    FEAT_WEIGHT: 1.0
    GAMMA: 2.0
    GIOU_WEIGHT: 2.0
    HIDDEN_DIM: 256
    IOU_LABELS:
    - 0
    - 1
    IOU_THRESHOLDS:
    - 0.5
    L1_WEIGHT: 5.0
    MASK_ENCODING_METHOD: AE
    MASK_FEAT_DIM: 256
    MASK_SIZE: 112
    MASK_WEIGHT: 5.0
    NHEADS: 8
    NO_OBJECT_WEIGHT: 0.1
    NUM_CLASSES: 5
    NUM_CLS: 3
    NUM_DYNAMIC: 2
    NUM_HEADS: 6
    NUM_MASK: 3
    NUM_PROPOSALS: 300
    NUM_REG: 3
    PATH_COMPONENTS: /content/drive/MyDrive/imenselmi/ISTR_TRAIN/ISTR/projects/AE/checkpoints/AE_112_256.t7
    PRIOR_PROB: 0.01
  KEYPOINT_ON: false
  LOAD_PROPOSALS: false
  MASK_ON: true
  META_ARCHITECTURE: ISTR
  PANOPTIC_FPN:
    COMBINE:
      ENABLED: true
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
    INSTANCE_LOSS_WEIGHT: 1.0
  PIXEL_MEAN:
  - 123.675
  - 116.28
  - 103.53
  PIXEL_STD:
  - 58.395
  - 57.12
  - 57.375
  PROPOSAL_GENERATOR:
    MIN_SIZE: 0
    NAME: RPN
  RESNETS:
    DEFORM_MODULATED: false
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE:
    - false
    - false
    - false
    - false
    DEPTH: 50
    NORM: FrozenBN
    NUM_GROUPS: 1
    OUT_FEATURES:
    - res2
    - res3
    - res4
    - res5
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: false
    WIDTH_PER_GROUP: 64
  RETINANET:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_WEIGHTS: &id001
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    FOCAL_LOSS_ALPHA: 0.25
    FOCAL_LOSS_GAMMA: 2.0
    IN_FEATURES:
    - p3
    - p4
    - p5
    - p6
    - p7
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.4
    - 0.5
    NMS_THRESH_TEST: 0.5
    NORM: ''
    NUM_CLASSES: 80
    NUM_CONVS: 4
    PRIOR_PROB: 0.01
    SCORE_THRESH_TEST: 0.05
    SMOOTH_L1_LOSS_BETA: 0.1
    TOPK_CANDIDATES_TEST: 1000
  ROI_BOX_CASCADE_HEAD:
    BBOX_REG_WEIGHTS:
    - - 10.0
      - 10.0
      - 5.0
      - 5.0
    - - 20.0
      - 20.0
      - 10.0
      - 10.0
    - - 30.0
      - 30.0
      - 15.0
      - 15.0
    IOUS:
    - 0.5
    - 0.6
    - 0.7
  ROI_BOX_HEAD:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS:
    - 10.0
    - 10.0
    - 5.0
    - 5.0
    CLS_AGNOSTIC_BBOX_REG: false
    CONV_DIM: 256
    FC_DIM: 1024
    NAME: ''
    NORM: ''
    NUM_CONV: 0
    NUM_FC: 0
    POOLER_RESOLUTION: 7
    POOLER_SAMPLING_RATIO: 2
    POOLER_TYPE: ROIAlignV2
    SMOOTH_L1_BETA: 0.0
    TRAIN_ON_PRED_BOXES: false
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    IOU_LABELS:
    - 0
    - 1
    IOU_THRESHOLDS:
    - 0.5
    NAME: Res5ROIHeads
    NMS_THRESH_TEST: 0.5
    NUM_CLASSES: 80
    POSITIVE_FRACTION: 0.25
    PROPOSAL_APPEND_GT: true
    SCORE_THRESH_TEST: 0.05
  ROI_KEYPOINT_HEAD:
    CONV_DIMS:
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    LOSS_WEIGHT: 1.0
    MIN_KEYPOINTS_PER_IMAGE: 1
    NAME: KRCNNConvDeconvUpsampleHead
    NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
    NUM_KEYPOINTS: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  ROI_MASK_HEAD:
    CLS_AGNOSTIC_MASK: false
    CONV_DIM: 256
    NAME: MaskRCNNConvUpsampleHead
    NORM: ''
    NUM_CONV: 0
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  RPN:
    BATCH_SIZE_PER_IMAGE: 256
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS: *id001
    BOUNDARY_THRESH: -1
    CONV_DIMS:
    - -1
    HEAD_NAME: StandardRPNHead
    IN_FEATURES:
    - res4
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.3
    - 0.7
    LOSS_WEIGHT: 1.0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOPK_TEST: 1000
    POST_NMS_TOPK_TRAIN: 2000
    PRE_NMS_TOPK_TEST: 6000
    PRE_NMS_TOPK_TRAIN: 12000
    SMOOTH_L1_BETA: 0.0
  SEM_SEG_HEAD:
    COMMON_STRIDE: 4
    CONVS_DIM: 128
    IGNORE_VALUE: 255
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    LOSS_WEIGHT: 1.0
    NAME: SemSegFPNHead
    NORM: GN
    NUM_CLASSES: 54
  SWINT:
    APE: false
    DEPTHS:
    - 2
    - 2
    - 6
    - 2
    DROP_PATH_RATE: 0.2
    EMBED_DIM: 96
    MLP_RATIO: 4
    NUM_HEADS:
    - 3
    - 6
    - 12
    - 24
    OUT_FEATURES:
    - stage2
    - stage3
    - stage4
    - stage5
    WINDOW_SIZE: 7
  WEIGHTS: detectron2://ImageNetPretrained/torchvision/R-50.pkl
OUTPUT_DIR: ./output
SEED: 2333333
SOLVER:
  AMP:
    ENABLED: false
  BACKBONE_MULTIPLIER: 1.0
  BASE_LR: 2.5e-05
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 5000
  CLIP_GRADIENTS:
    CLIP_TYPE: full_model
    CLIP_VALUE: 1.0
    ENABLED: true
    NORM_TYPE: 2.0
  GAMMA: 0.1
  IMS_PER_BATCH: 16
  LR_SCHEDULER_NAME: WarmupMultiStepLR
  MAX_ITER: 270000
  MOMENTUM: 0.9
  NESTEROV: false
  OPTIMIZER: ADAMW
  REFERENCE_WORLD_SIZE: 0
  STEPS:
  - 210000
  - 250000
  WARMUP_FACTOR: 0.01
  WARMUP_ITERS: 1000
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0001
  WEIGHT_DECAY_BIAS: null
  WEIGHT_DECAY_NORM: 0.0
TEST:
  AUG:
    ENABLED: false
    FLIP: true
    MAX_SIZE: 4000
    MIN_SIZES:
    - 400
    - 500
    - 600
    - 700
    - 800
    - 900
    - 1000
    - 1100
    - 1200
  DETECTIONS_PER_IMAGE: 100
  EVAL_PERIOD: 7330
  EXPECTED_RESULTS: []
  KEYPOINT_OKS_SIGMAS: []
  PRECISE_BN:
    ENABLED: false
    NUM_ITER: 200
VERSION: 2
VIS_PERIOD: 0
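
To see what the SOLVER section of this config means for the learning rate, here is a small sketch of a linear-warmup, multi-step schedule using the values from the dump (BASE_LR, GAMMA, STEPS, WARMUP_FACTOR, WARMUP_ITERS). It assumes the usual behaviour of such a schedule: the rate ramps linearly from WARMUP_FACTOR * BASE_LR to BASE_LR over the warmup iterations and is then multiplied by GAMMA at each step. The function name `learning_rate` is made up for this example and is not a detectron2 API.

```python
from bisect import bisect_right

# Values copied from the SOLVER section of the config dump.
BASE_LR = 2.5e-05
GAMMA = 0.1
STEPS = (210000, 250000)
WARMUP_FACTOR = 0.01
WARMUP_ITERS = 1000
MAX_ITER = 270000


def learning_rate(iteration: int) -> float:
    """Learning rate at a given iteration under linear warmup plus step decay."""
    if iteration < WARMUP_ITERS:
        # Linear interpolation from WARMUP_FACTOR up to 1.0.
        alpha = iteration / WARMUP_ITERS
        warmup = WARMUP_FACTOR * (1 - alpha) + alpha
    else:
        warmup = 1.0
    # Number of milestones already reached decides how often GAMMA is applied.
    decay = GAMMA ** bisect_right(STEPS, iteration)
    return BASE_LR * warmup * decay


for it in (0, 500, 1000, 209999, 210000, 250000, MAX_ITER - 1):
    print(f"iter {it:>6}: lr = {learning_rate(it):.3e}")
```

Under these assumptions the rate stays at BASE_LR for most of training and only drops in the last 60000 iterations, so the early loss curve depends mainly on BASE_LR and the gradient clipping settings.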

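Several layer sizes in the model printout that follows can be cross-checked against the ISTR values in the config. The sketch below does that arithmetic with the standard convolution output-size formula. The interpretation of each product (for example that `dynamic_layer` produces NUM_DYNAMIC weight matrices of HIDDEN_DIM x DIM_DYNAMIC) is an assumption that happens to match the printed sizes, not something the log states, and `conv_out` is a helper written for this example.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution without dilation."""
    return (size + 2 * padding - kernel) // stride + 1


# Values taken from MODEL.ISTR and MODEL.ROI_BOX_HEAD in the config dump.
HIDDEN_DIM, DIM_DYNAMIC, NUM_DYNAMIC = 256, 64, 2
MASK_SIZE, MASK_FEAT_DIM = 112, 256
POOLER_RESOLUTION = 7

# DynamicConv.dynamic_layer: Linear(256 -> 32768) in the printout.
dynamic_out_features = NUM_DYNAMIC * HIDDEN_DIM * DIM_DYNAMIC

# DynamicConv.out_layer: Linear(12544 -> 256) in the printout.
out_layer_in_features = HIDDEN_DIM * POOLER_RESOLUTION ** 2

# mask_E encoder: four 4x4 stride-2 pad-1 convs, then one 7x7 conv.
encoder_sizes = [MASK_SIZE]
for _ in range(4):
    encoder_sizes.append(conv_out(encoder_sizes[-1], kernel=4, stride=2, padding=1))
encoder_sizes.append(conv_out(encoder_sizes[-1], kernel=7))

# ROIAlign spatial_scale values are the reciprocals of the feature-map strides.
spatial_scales = [1 / stride for stride in (4, 8, 16, 32)]

print(dynamic_out_features, out_layer_in_features)
print(encoder_sizes)
print(spatial_scales)
```

If the encoder really reduces a 112 x 112 mask to a single spatial position with 256 channels, the encoded mask vector has MASK_FEAT_DIM entries, which is consistent with the AE_112_256 checkpoint name in PATH_COMPONENTS.
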
[08/25 11:15:41 detectron2]: Full config saved to ./output/config.yaml [08/25 11:15:47 d2.engine.defaults]: Model: ISTR( (backbone): FPN( (fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (top_block): LastLevelMaxPool() (bottom_up): ResNet( (stem): BasicStem( (conv1): Conv2d( 3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) ) (res2): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv1): Conv2d( 64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 
256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) ) (res3): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv1): Conv2d( 256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (3): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) ) (res4): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) (conv1): Conv2d( 512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (3): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): 
FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (4): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (5): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) ) (res5): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) (conv1): Conv2d( 1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 
1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) ) ) ) (pos_embeddings): Embedding(300, 256) (init_proposal_boxes): Embedding(300, 4) (IFE): ImgFeatExtractor() (mask_E): Encoder( (encoder): Sequential( (0): Conv2d(1, 16, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=True) (3): Conv2d(16, 32, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (8): ELU(alpha=True) (9): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (10): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (11): ELU(alpha=True) (12): Conv2d(128, 256, kernel_size=(7, 7), stride=(1, 1)) (13): View() ) ) (mask_D): Decoder( (decoder): Sequential( (0): View() (1): ConvTranspose2d(256, 128, kernel_size=(7, 7), stride=(1, 1)) (2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (3): ELU(alpha=1.0, inplace=True) (4): up_conv( (up): Sequential( (0): Upsample(scale_factor=2.0, mode=bilinear) (1): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (3): ELU(alpha=1.0, inplace=True) ) ) (5): up_conv( (up): Sequential( (0): 
Upsample(scale_factor=2.0, mode=bilinear) (1): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (3): ELU(alpha=1.0, inplace=True) ) ) (6): up_conv( (up): Sequential( (0): Upsample(scale_factor=2.0, mode=bilinear) (1): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (3): ELU(alpha=1.0, inplace=True) ) ) (7): up_conv( (up): Sequential( (0): Upsample(scale_factor=2.0, mode=bilinear) (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (3): ELU(alpha=1.0, inplace=True) ) ) (8): Conv2d(16, 1, kernel_size=(1, 1), stride=(1, 1)) (9): Sigmoid() (10): View() ) ) (head): DynamicHead( (box_pooler): ROIPooler( (level_poolers): ModuleList( (0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=2, aligned=True) (1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=2, aligned=True) (2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=2, aligned=True) (3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=2, aligned=True) ) ) (mask_pooler): ROIPooler( (level_poolers): ModuleList( (0): ROIAlign(output_size=(28, 28), spatial_scale=0.25, sampling_ratio=2, aligned=True) (1): ROIAlign(output_size=(28, 28), spatial_scale=0.125, sampling_ratio=2, aligned=True) (2): ROIAlign(output_size=(28, 28), spatial_scale=0.0625, sampling_ratio=2, aligned=True) (3): ROIAlign(output_size=(28, 28), spatial_scale=0.03125, sampling_ratio=2, aligned=True) ) ) (head_series): ModuleList( (0): RCNNHead( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (inst_interact): DynamicConv( (dynamic_layer): Linear(in_features=256, out_features=32768, bias=True) (norm1): 
LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=True) 
(3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) (1): RCNNHead( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (inst_interact): DynamicConv( (dynamic_layer): Linear(in_features=256, out_features=32768, bias=True) (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): LayerNorm((256,), eps=1e-05, 
elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=True) (3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(64, 64, 
kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) (2): RCNNHead( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (inst_interact): DynamicConv( (dynamic_layer): Linear(in_features=256, out_features=32768, bias=True) (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): 
Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=True) (3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, 
eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) (3): RCNNHead( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (inst_interact): DynamicConv( (dynamic_layer): Linear(in_features=256, out_features=32768, bias=True) (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): 
Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=True) (3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) (4): RCNNHead( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (inst_interact): DynamicConv( (dynamic_layer): Linear(in_features=256, out_features=32768, bias=True) 
(norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): 
ELU(alpha=True) (3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) (5): RCNNHead( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (inst_interact): DynamicConv( (dynamic_layer): Linear(in_features=256, out_features=32768, bias=True) (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): 
LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=True) (3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, 
inplace=True) (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) ) ) (criterion): SetCriterion( (matcher): HungarianMatcher() ) )
[08/25 11:15:53 d2.data.datasets.coco]: Loading /content/drive/MyDrive/imenselmi/ISTR_TRAIN/data/result/train.json takes 6.61 seconds.
[08/25 11:15:54 d2.data.datasets.coco]: Loaded 43480 images in COCO format from /content/drive/MyDrive/imenselmi/ISTR_TRAIN/data/result/train.json
[08/25 11:15:57 d2.data.build]: Removed 0 images with no usable annotations. 43480 images left.
[08/25 11:15:59 d2.data.build]: Distribution of instances among all 5 categories:

| category | #instances | category | #instances | category | #instances |
|:-------------:|:-------------|:-------------:|:-------------|:-------------:|:-------------|
| short_sleev.. | 18359 | long_sleeve.. | 14566 | long_sleeve.. | 10492 |
| shorts | 12123 | trousers | 18227 | | |
| total | 73767 | | | | |

pos_embeddings.weight
WARNING [08/25 11:16:04 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model: stem.fc.{bias, weight}
[08/25 11:16:04 d2.engine.train_loop]: Starting training from iteration 0
/usr/local/lib/python3.7/dist-packages/fvcore/transforms/transform.py:724: ShapelyDeprecationWarning: Iteration over multi-part geometries is deprecated and will be removed in Shapely 2.0. Use the geoms property to access the constituent parts of a multi-part geometry.
  for poly in cropped:
/usr/local/lib/python3.7/dist-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
ERROR [08/25 11:16:05 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 273, in run_step
    loss_dict = self.model(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/drive/.shortcut-targets-by-id/190HFmYfsGdKfNWeUiqnpTgh7X3m3GFmF/ISTR_TRAIN/ISTR/projects/ISTR/istr/inseg.py", line 162, in forward
    src = self.backbone(images.tensor)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward
    bottom_up_features = self.bottom_up(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 449, in forward
    x = stage(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 201, in forward
    out = self.conv3(out)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/wrappers.py", line 110, in forward
    x = self.norm(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/batch_norm.py", line 53, in forward
    return x * scale.to(out_dtype) + bias.to(out_dtype)
RuntimeError: CUDA out of memory. Tried to allocate 672.00 MiB (GPU 0; 15.78 GiB total capacity; 13.42 GiB already allocated; 50.75 MiB free; 14.41 GiB reserved in total by PyTorch)
[08/25 11:16:05 d2.engine.hooks]: Total training time: 0:00:01 (0:00:00 on hooks)
[08/25 11:16:05 d2.utils.events]: iter: 0 lr: N/A max_mem: 14075M
Traceback (most recent call last):
  File "projects/ISTR/train_net.py", line 136, in <module>
    args=(args,),
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "projects/ISTR/train_net.py", line 124, in main
    return trainer.train()
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/defaults.py", line 484, in train
    super().train(self.start_iter, self.max_iter)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 273, in run_step
    loss_dict = self.model(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/drive/.shortcut-targets-by-id/190HFmYfsGdKfNWeUiqnpTgh7X3m3GFmF/ISTR_TRAIN/ISTR/projects/ISTR/istr/inseg.py", line 162, in forward
    src = self.backbone(images.tensor)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward
    bottom_up_features = self.bottom_up(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 449, in forward
    x = stage(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 201, in forward
    out = self.conv3(out)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/wrappers.py", line 110, in forward
    x = self.norm(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/batch_norm.py", line 53, in forward
    return x * scale.to(out_dtype) + bias.to(out_dtype)
RuntimeError: CUDA out of memory. Tried to allocate 672.00 MiB (GPU 0; 15.78 GiB total capacity; 13.42 GiB already allocated; 50.75 MiB free; 14.41 GiB reserved in total by PyTorch)
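
The logged config shows SOLVER.IMS_PER_BATCH: 16 together with num_gpus=1, so the whole batch of 16 images lands on the single 15.78 GiB GPU. Below is a minimal sketch of how one might work out the per-GPU load and compose config overrides. It assumes the training script forwards trailing KEY VALUE pairs to the config, which the `opts=[]` field in the logged arguments suggests but this thread does not confirm. The helper names `images_per_gpu` and `build_opts` and the override values are hypothetical, not settings recommended by the ISTR authors.

```python
def images_per_gpu(ims_per_batch: int, num_gpus: int) -> int:
    """IMS_PER_BATCH is the total batch size; it is split evenly across GPUs."""
    if num_gpus <= 0 or ims_per_batch % num_gpus != 0:
        raise ValueError("IMS_PER_BATCH must be a positive multiple of num_gpus")
    return ims_per_batch // num_gpus


def build_opts(overrides: dict) -> list:
    """Flatten {KEY: value} into a KEY VALUE list (hypothetical helper)."""
    opts = []
    for key, value in overrides.items():
        opts += [key, str(value)]
    return opts


# Values taken from the log above: 16 images per batch on 1 GPU.
print(images_per_gpu(16, 1))

# Hypothetical overrides that reduce the per-step memory footprint.
opts = build_opts({"SOLVER.IMS_PER_BATCH": 2, "INPUT.MAX_SIZE_TRAIN": 800})
print(" ".join(opts))
```

The resulting string would be appended to the training command after the config file argument. Lowering the batch size usually also calls for revisiting the learning rate and schedule, which this sketch does not attempt.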

pred_mask visualization cannot match the object

I trained the model on my own dataset. When I run 'demo.py' for testing, I get the following result: It seems that the prediction result is right, but the visualization code has some problem, or is it something else?

opened by GuQingtao 5
PATH_COMPONENTS: "./projects/AE/checkpoints/AE_112_256.t7" not found

I tried using the config file projects/ISTR/configs/ISTR-AE-swinL-3x.yaml, but the file that PATH_COMPONENTS points to in that config is missing from GitHub. Is there a link to this file?

opened by pd2871 2
Why is the box AP higher than Sparse R-CNN?

@hujiecpp Hi! Thanks for open-sourcing your code. It is very amazing work. I do not understand why the box AP is higher than that of Sparse R-CNN. From the code, I see that you use the global image features for query feature learning. However, when I add this to Sparse R-CNN (mmdet version), I do not see any improvement.

opened by lxtGH 2
What is PATH_COMPONENTS ?

Hi, thanks for your awesome work.

I tried to retrain the network and found that cfg.MODEL.ISTR.PATH_COMPONENTS = "./datasets/coco/components/coco_2017_train_class_agnosticTrue_whitenTrue_sigmoidTrue_60.npz". However, I can't find this file in the repo. How can I get it?

Thanks

opened by Len-Li 2
NMS module?

Hi @hujiecpp, thank you for making this project publicly available. I tried to apply this framework to an instance segmentation task on medical images. The prediction results look good, but some objects seem to be detected as multiple overlapping objects. In Mask R-CNN this problem is solved by NMS. I checked the inference code but could not find a way to overcome this problem. Is there any counterpart (like NMS) in ISTR to filter out extra predictions? I probably missed it in the manuscript, sorry about that. Thank you in advance.
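
For readers with the same question, here is a minimal sketch of the greedy box NMS that the post refers to, written in plain Python. It is a generic post-processing step and not ISTR's method. The model printout in the training log shows a SetCriterion with a HungarianMatcher, and this thread does not say whether the inference code applies any additional filtering. The function names, the (x1, y1, x2, y2) box format, and the 0.5 threshold are assumptions made for illustration.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(box_iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # → [0, 2]
```

The second box overlaps the first with an IoU of 81/119, about 0.68, so it is suppressed. For instance segmentation one could compute the overlap on masks instead of boxes. That variant is not shown here.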

opened by cao13jf 1
Mask Embeddings in DETR

Hi, thanks for your awesome work.

Have you tried the mask embeddings in DETR? I am very interested in the performance of this method on DETR.

Thank you.

opened by mxin262 0

Error in installation step

Hello,

Thank you very much for sharing the code for the model.

I'm trying to follow the installation guide. However, when I run the command python setup.py build develop, I get the error below:

Traceback (most recent call last):
  File "setup.py", line 174, in get_model_zoo_configs
    os.symlink(source_configs_dir, destination)
FileExistsError: [Errno 17] File exists: '/home/thi/ISTR/configs' -> '/home/thi/ISTR/detectron2/model_zoo/configs'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "setup.py", line 198, in <module>
    package_data={"detectron2.model_zoo": get_model_zoo_configs()},
  File "setup.py", line 177, in get_model_zoo_configs
    shutil.copytree(source_configs_dir, destination)
  File "/home/thi/anaconda3/envs/ISTR/lib/python3.8/shutil.py", line 555, in copytree
    with os.scandir(src) as itr:
FileNotFoundError: [Errno 2] No such file or directory: '/home/thi/ISTR/configs'

Can you help me fix this, please? Thank you very much!
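
The two exceptions state that the destination '/home/thi/ISTR/detectron2/model_zoo/configs' already exists while the source '/home/thi/ISTR/configs' does not. One reading that fits both, offered as an assumption because the thread does not confirm it, is that the destination is a leftover symlink whose target directory is gone. The sketch below reproduces that state in a temporary directory and shows how one might detect it. `describe_path` is a hypothetical helper, not part of setup.py.

```python
import os
import tempfile


def describe_path(path):
    """Classify a path; a dangling symlink exists as a link but not as a target."""
    if os.path.islink(path):
        return "symlink" if os.path.exists(path) else "dangling symlink"
    if os.path.isdir(path):
        return "directory"
    if os.path.exists(path):
        return "file"
    return "missing"


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "configs")
    destination = os.path.join(tmp, "model_zoo_configs")
    os.mkdir(source)
    os.symlink(source, destination)
    os.rmdir(source)  # the link now points at nothing

    print(describe_path(source))       # → missing
    print(describe_path(destination))  # → dangling symlink

    try:
        os.symlink(source, destination)
    except FileExistsError:
        # Same first error as in the report: the link name is already taken.
        os.unlink(destination)
    print(describe_path(destination))  # → missing
```

If that is the situation, removing the stale link addresses only the first error. The source configs directory still has to exist for the copy or symlink step to succeed.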

opened by thiphan94 2

RuntimeError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 1; 10.76 GiB total capacity; 9.75 GiB already allocated; 25.56 MiB free; 9.89 GiB reserved in total by PyTorch)

RuntimeError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 1; 10.76 GiB total capacity; 9.75 GiB already allocated; 25.56 MiB free; 9.89 GiB reserved in total by PyTorch)

How do you solve this? My GPU is an RTX 2080 Ti (11 GB memory).
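
To read the numbers in such a message, the following sketch pulls them out with a regular expression and compares the requested size with what is free. It assumes the message wording shown in this thread; other PyTorch versions may phrase it differently, in which case `parse_oom` (a hypothetical helper) returns None.

```python
import re

_SIZE = r"([\d.]+) (MiB|GiB)"
OOM_RE = re.compile(
    rf"Tried to allocate {_SIZE} \(GPU (\d+); {_SIZE} total capacity; "
    rf"{_SIZE} already allocated; {_SIZE} free; {_SIZE} reserved"
)


def to_mib(value, unit):
    """Convert a size string with its unit to MiB."""
    return float(value) * (1024.0 if unit == "GiB" else 1.0)


def parse_oom(message):
    """Return the sizes from a CUDA OOM message in MiB, or None if no match."""
    match = OOM_RE.search(message)
    if match is None:
        return None
    g = match.groups()
    return {
        "gpu": int(g[2]),
        "request_mib": to_mib(g[0], g[1]),
        "total_mib": to_mib(g[3], g[4]),
        "allocated_mib": to_mib(g[5], g[6]),
        "free_mib": to_mib(g[7], g[8]),
        "reserved_mib": to_mib(g[9], g[10]),
    }


msg = (
    "RuntimeError: CUDA out of memory. Tried to allocate 58.00 MiB "
    "(GPU 1; 10.76 GiB total capacity; 9.75 GiB already allocated; "
    "25.56 MiB free; 9.89 GiB reserved in total by PyTorch)"
)
info = parse_oom(msg)
print(info["gpu"], info["request_mib"] > info["free_mib"])  # → 1 True
```

Here the 58 MiB request exceeds the 25.56 MiB still free, and almost all of the card is already allocated, so the model and batch as configured do not fit on this GPU.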

opened by guangxuwang 0
RuntimeError: CUDA out of memory. Tried to allocate 672.00 MiB (GPU 0; 15.78 GiB total capacity; 13.42 GiB already allocated; 50.75 MiB free; 14.41 GiB reserved in total by PyTorch)
LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=True) 
(3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) (1): RCNNHead( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (inst_interact): DynamicConv( (dynamic_layer): Linear(in_features=256, out_features=32768, bias=True) (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): LayerNorm((256,), eps=1e-05, 
elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=True) (3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(64, 64, 
kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) (2): RCNNHead( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (inst_interact): DynamicConv( (dynamic_layer): Linear(in_features=256, out_features=32768, bias=True) (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): 
Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=True) (3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, 
eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) (3): RCNNHead( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (inst_interact): DynamicConv( (dynamic_layer): Linear(in_features=256, out_features=32768, bias=True) (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): 
Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=True) (3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) (4): RCNNHead( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (inst_interact): DynamicConv( (dynamic_layer): Linear(in_features=256, out_features=32768, bias=True) 
(norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): 
ELU(alpha=True) (3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) (5): RCNNHead( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (inst_interact): DynamicConv( (dynamic_layer): Linear(in_features=256, out_features=32768, bias=True) (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (activation): ELU(alpha=1.0, inplace=True) (out_layer): Linear(in_features=12544, out_features=256, bias=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm1): 
LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout1): Dropout(p=0.0, inplace=False) (dropout2): Dropout(p=0.0, inplace=False) (dropout3): Dropout(p=0.0, inplace=False) (activation): ELU(alpha=1.0, inplace=True) (cls_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (reg_module): ModuleList( (0): Linear(in_features=256, out_features=256, bias=False) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ELU(alpha=1.0, inplace=True) (3): Linear(in_features=256, out_features=256, bias=False) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ELU(alpha=1.0, inplace=True) (6): Linear(in_features=256, out_features=256, bias=False) (7): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (8): ELU(alpha=1.0, inplace=True) ) (mask_module): Sequential( (0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=True) (3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=True) (6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1)) ) (ret_roi_layer_1): conv_block( (conv): Sequential( (0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, 
inplace=True) (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (ret_roi_layer_2): conv_block( (conv): Sequential( (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ELU(alpha=1.0, inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ELU(alpha=1.0, inplace=True) ) ) (class_logits): Linear(in_features=256, out_features=5, bias=True) (bboxes_delta): Linear(in_features=256, out_features=4, bias=True) ) ) ) (criterion): SetCriterion( (matcher): HungarianMatcher() ) )
[08/25 11:15:53 d2.data.datasets.coco]: Loading /content/drive/MyDrive/imenselmi/ISTR_TRAIN/data/result/train.json takes 6.61 seconds.
[08/25 11:15:54 d2.data.datasets.coco]: Loaded 43480 images in COCO format from /content/drive/MyDrive/imenselmi/ISTR_TRAIN/data/result/train.json
[08/25 11:15:57 d2.data.build]: Removed 0 images with no usable annotations. 43480 images left.
[08/25 11:15:59 d2.data.build]: Distribution of instances among all 5 categories:
|   category    | #instances   |   category    | #instances   |   category    | #instances   |
|:-------------:|:-------------|:-------------:|:-------------|:-------------:|:-------------|
| short_sleev.. | 18359        | long_sleeve.. | 14566        | long_sleeve.. | 10492        |
|    shorts     | 12123        |   trousers    | 18227        |               |              |
|     total     | 73767        |               |              |               |              |

pos_embeddings.weight WARNING [08/25 11:16:04 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model: stem.fc.{bias, weight} [08/25 11:16:04 d2.engine.train_loop]: Starting training from iteration 0 /usr/local/lib/python3.7/dist-packages/fvcore/transforms/transform.py:724: ShapelyDeprecationWarning: Iteration over multi-part geometries is deprecated and will be removed in Shapely 2.0. Use the geoms property to access the constituent parts of a multi-part geometry. for poly in cropped: /usr/local/lib/python3.7/dist-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.) return torch.floor_divide(self, other) /usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.) 
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode) ERROR [08/25 11:16:05 d2.engine.train_loop]: Exception during training: Traceback (most recent call last): File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 149, in train self.run_step() File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/defaults.py", line 494, in run_step self._trainer.run_step() File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 273, in run_step loss_dict = self.model(data) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/content/drive/.shortcut-targets-by-id/190HFmYfsGdKfNWeUiqnpTgh7X3m3GFmF/ISTR_TRAIN/ISTR/projects/ISTR/istr/inseg.py", line 162, in forward src = self.backbone(images.tensor) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward bottom_up_features = self.bottom_up(x) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 449, in forward x = stage(x) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward input = module(input) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 201, in forward out = self.conv3(out) File 
"/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/wrappers.py", line 110, in forward x = self.norm(x) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/batch_norm.py", line 53, in forward return x * scale.to(out_dtype) + bias.to(out_dtype) RuntimeError: CUDA out of memory. Tried to allocate 672.00 MiB (GPU 0; 15.78 GiB total capacity; 13.42 GiB already allocated; 50.75 MiB free; 14.41 GiB reserved in total by PyTorch) [08/25 11:16:05 d2.engine.hooks]: Total training time: 0:00:01 (0:00:00 on hooks) [08/25 11:16:05 d2.utils.events]: iter: 0 lr: N/A max_mem: 14075M Traceback (most recent call last): File "projects/ISTR/train_net.py", line 136, in args=(args,), File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/launch.py", line 82, in launch main_func(*args) File "projects/ISTR/train_net.py", line 124, in main return trainer.train() File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/defaults.py", line 484, in train super().train(self.start_iter, self.max_iter) File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 149, in train self.run_step() File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/defaults.py", line 494, in run_step self._trainer.run_step() File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 273, in run_step loss_dict = self.model(data) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/content/drive/.shortcut-targets-by-id/190HFmYfsGdKfNWeUiqnpTgh7X3m3GFmF/ISTR_TRAIN/ISTR/projects/ISTR/istr/inseg.py", line 162, in forward src = self.backbone(images.tensor) File 
"/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward bottom_up_features = self.bottom_up(x) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 449, in forward x = stage(x) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward input = module(input) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 201, in forward out = self.conv3(out) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/wrappers.py", line 110, in forward x = self.norm(x) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/batch_norm.py", line 53, in forward return x * scale.to(out_dtype) + bias.to(out_dtype) RuntimeError: CUDA out of memory. Tried to allocate 672.00 MiB (GPU 0; 15.78 GiB total capacity; 13.42 GiB already allocated; 50.75 MiB free; 14.41 GiB reserved in total by PyTorch)
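The figures in the final `RuntimeError` line can be read off mechanically to see how far the request exceeds the free memory. The following is only a minimal sketch: `parse_cuda_oom` and `OOM_PATTERN` are hypothetical helper names (not part of PyTorch or detectron2), and the regular expression assumes the exact message wording shown in the log above, which may differ in other PyTorch versions.

```python
import re

# Hypothetical helper: matches the OOM wording seen in the log above.
# Other PyTorch versions may phrase the message differently.
OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<request>[\d.]+) (?P<request_unit>[KMG]iB) "
    r"\(GPU (?P<gpu>\d+); "
    r"(?P<total>[\d.]+) (?P<total_unit>[KMG]iB) total capacity; "
    r"(?P<allocated>[\d.]+) (?P<allocated_unit>[KMG]iB) already allocated; "
    r"(?P<free>[\d.]+) (?P<free_unit>[KMG]iB) free; "
    r"(?P<reserved>[\d.]+) (?P<reserved_unit>[KMG]iB) reserved in total by PyTorch\)"
)

# Conversion factors from each unit to MiB.
TO_MIB = {"KiB": 1.0 / 1024.0, "MiB": 1.0, "GiB": 1024.0}


def parse_cuda_oom(message):
    """Return the memory figures of a CUDA OOM message in MiB, or None if no match."""
    match = OOM_PATTERN.search(message)
    if match is None:
        return None
    fields = {"gpu": int(match.group("gpu"))}
    for name in ("request", "total", "allocated", "free", "reserved"):
        value = float(match.group(name))
        fields[name + "_mib"] = value * TO_MIB[match.group(name + "_unit")]
    # How much more memory the failed allocation needed than was free.
    fields["shortfall_mib"] = fields["request_mib"] - fields["free_mib"]
    return fields


if __name__ == "__main__":
    log_line = (
        "RuntimeError: CUDA out of memory. Tried to allocate 672.00 MiB "
        "(GPU 0; 15.78 GiB total capacity; 13.42 GiB already allocated; "
        "50.75 MiB free; 14.41 GiB reserved in total by PyTorch)"
    )
    info = parse_cuda_oom(log_line)
    print(f"requested {info['request_mib']:.2f} MiB, free {info['free_mib']:.2f} MiB, "
          f"shortfall {info['shortfall_mib']:.2f} MiB")
```

For the log above, the request of 672.00 MiB against 50.75 MiB free leaves a shortfall of 621.25 MiB, and the failure already happens inside the backbone forward pass of the first iteration.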
opened by aymennturki 3
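
The numbers in the error message above are easier to compare once they share a unit. The following is a minimal sketch that parses the message text quoted in the traceback and reports how far the failed request exceeds the free memory. The helper name `parse_oom` and the regular expressions are illustrative assumptions, not part of PyTorch or detectron2, and the message layout may differ in other PyTorch versions.

```python
import re

# Error text copied from the traceback above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 672.00 MiB (GPU 0; 15.78 GiB total "
    "capacity; 13.42 GiB already allocated; 50.75 MiB free; 14.41 GiB reserved "
    "in total by PyTorch)"
)

_UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Return the figures from a CUDA OOM message, all converted to MiB.

    Hypothetical helper for illustration; it assumes the message layout
    shown in OOM_MESSAGE.
    """
    size = r"([\d.]+) (MiB|GiB)"

    def grab(pattern):
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"pattern not found: {pattern}")
        value, unit = match.groups()
        return float(value) * _UNIT_TO_MIB[unit]

    return {
        "requested": grab(rf"Tried to allocate {size}"),
        "total": grab(rf"{size} total capacity"),
        "allocated": grab(rf"{size} already allocated"),
        "free": grab(rf"{size} free"),
        "reserved": grab(rf"{size} reserved"),
    }


stats = parse_oom(OOM_MESSAGE)
# Rough gap between the failed request and what the device reported as free.
shortfall = stats["requested"] - stats["free"]
print(f"requested {stats['requested']:.2f} MiB, free {stats['free']:.2f} MiB")
print(f"shortfall {shortfall:.2f} MiB")
```

For this message the request (672.00 MiB) is far larger than the reported free memory (50.75 MiB), which suggests the run needs a substantially smaller memory footprint rather than a marginal adjustment.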
RuntimeError: CUDA out of memory.

return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors) RuntimeError: CUDA out of memory. Tried to allocate 180.00 MiB (GPU 0; 10.92 GiB total capacity; 9.74 GiB already allocated; 57.56 MiB free; 9.90 GiB reserved in total by PyTorch)

How can I solve this issue? Which file should I change?

opened by maamouunn 1
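
A common way to approach out-of-memory errors like the two reported above is to lower the memory demand through config overrides instead of editing source files. The sketch below builds a training command with trailing `KEY VALUE` overrides, the form accepted by the `opts` argument that appears (empty) in the logged arguments. It assumes that ISTR's `train_net.py` merges `opts` into the config the way detectron2's stock training scripts do. The values are illustrative only, and the helper name `to_opts` is hypothetical.

```python
import shlex

# Illustrative values only; tune them for the GPU at hand.
# The keys are standard detectron2 config keys.
MEMORY_OVERRIDES = {
    "SOLVER.IMS_PER_BATCH": 2,                      # fewer images per step
    "INPUT.MIN_SIZE_TRAIN": (416, 448, 480, 512),   # drop the largest scales
    "INPUT.MAX_SIZE_TRAIN": 800,                    # cap the longer image side
}


def to_opts(overrides):
    """Flatten a dict into the KEY VALUE list expected by the trailing opts."""
    opts = []
    for key, value in overrides.items():
        opts.extend([key, str(value)])
    return opts


opts = to_opts(MEMORY_OVERRIDES)
command = [
    "python", "projects/ISTR/train_net.py",
    "--num-gpus", "1",
    "--config-file", "projects/ISTR/configs/ISTR-AE-R50-3x.yaml",
] + opts

# Quote each part so the tuple value survives the shell intact.
command_line = " ".join(shlex.quote(part) for part in command)
print(command_line)
```

The same keys can instead be set in the YAML config file passed via `--config-file`, which is probably the closest answer to "which file": the config rather than the model code. Whether these particular values fit in memory depends on the dataset and GPU.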


Owner

Jie Hu

We have implemented shaDow-GNN as a general and powerful pipeline for graph representation learning. For more details, please see our paper, Deep Graph Neural Networks with Shallow Subgraph Samplers, available on arXiv (https://arxiv.org/abs/2012.01380).

This repository contains the code used for Predicting Patient Outcomes with Graph Representation Learning (https://arxiv.org/abs/2101.03940).

Official implementation of the paper Image Generators with Conditionally-Independent Pixel Synthesis https://arxiv.org/abs/2011.13775

Pytorch implementation of Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization https://arxiv.org/abs/2008.11646

https://arxiv.org/abs/2102.11005

Supplementary code for the paper "Meta-Solver for Neural Ordinary Differential Equations" https://arxiv.org/abs/2103.08561

Code for paper "A Critical Assessment of State-of-the-Art in Entity Alignment" (https://arxiv.org/abs/2010.16314)

[PyTorch] Official implementation of CVPR2021 paper "PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency". https://arxiv.org/abs/2103.05465

Official Implementation for "ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement" https://arxiv.org/abs/2104.02699

This is an official implementation of our CVPR 2021 paper "Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression" (https://arxiv.org/abs/2104.02300)

Code for the paper: Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization (https://arxiv.org/abs/2002.11798)

Non-Official Pytorch implementation of "Face Identity Disentanglement via Latent Space Mapping" https://arxiv.org/abs/2005.07728 Using StyleGAN2 instead of StyleGAN

Minimal implementation of PAWS (https://arxiv.org/abs/2104.13963) in TensorFlow.

A PyTorch implementation of EventProp [https://arxiv.org/abs/2009.08378], a method to train Spiking Neural Networks

Unofficial implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" (https://arxiv.org/abs/2103.14030)

Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)

Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) in PyTorch

Source code for "Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics" (https://arxiv.org/abs/2005.11248)

Official repository with code and data accompanying the NAACL 2021 paper "Hurdles to Progress in Long-form Question Answering" (https://arxiv.org/abs/2103.06332).