HybridNets: End-to-End Perception Network

Thanh Dat Vu

Last update: Dec 29, 2022

Related tags

Deep Learning detection segmentation autonomous-driving multitask-learning bifpn end2end-network hybridnets

Overview

HybridNets: End2End Perception Network

HybridNets Network Architecture.

by Dat Vu, Bao Ngo, Hung Phan^📧 FPT University

(^📧) corresponding author.

arXiv technical report (arXiv 2203.09035)

Table of Contents

About The Project
- Project Structure
Getting Started
- Installation
- Demo
Usage
- Data Preparation
- Training
Training Tips
Results
License
Acknowledgements
Citation

About The Project

HybridNets is an end-to-end perception network for multi-task learning. Our work focuses on traffic object detection, drivable area segmentation and lane detection. HybridNets can run in real time on embedded systems and obtains state-of-the-art object detection and lane detection results on the BDD100K dataset.

Project Structure

HybridNets
│   backbone.py                   # Model configuration
│   hubconf.py                    # Pytorch Hub entrypoint
│   hybridnets_test.py            # Image inference
│   hybridnets_test_videos.py     # Video inference
│   train.py                      # Train script
│   val.py                        # Validate script
│
├───encoders                      # https://github.com/qubvel/segmentation_models.pytorch/tree/master/segmentation_models_pytorch/encoders
│       ...
│
├───hybridnets
│       autoanchor.py             # Generate new anchors by k-means
│       dataset.py                # BDD100K dataset
│       loss.py                   # Focal, tversky (dice)
│       model.py                  # Model blocks
│
├───projects
│       bdd100k.yml               # Project configuration
│
└───utils
    │   plot.py                   # Draw bounding box
    │   smp_metrics.py            # https://github.com/qubvel/segmentation_models.pytorch/blob/master/segmentation_models_pytorch/metrics/functional.py
    │   utils.py                  # Various helper functions (preprocess, postprocess, eval...)
    │
    └───sync_batchnorm            # https://github.com/vacancy/Synchronized-BatchNorm-PyTorch/tree/master/sync_batchnorm 
            ...

Getting Started

Installation

The project was developed with Python>=3.7 and Pytorch>=1.10.

git clone https://github.com/datvuthanh/HybridNets
cd HybridNets
pip install -r requirements.txt

Demo

# Download end-to-end weights
mkdir weights
curl -L -o weights/hybridnets.pth https://github.com/datvuthanh/HybridNets/releases/download/v1.0/hybridnets.pth

# Image inference
python hybridnets_test.py -w weights/hybridnets.pth --source demo/image --output demo_result --imshow False --imwrite True

# Video inference
python hybridnets_test_videos.py -w weights/hybridnets.pth --source demo/video --output demo_result

# Result is saved in a new folder called demo_result

Usage

Data Preparation

Recommended dataset structure:

HybridNets
└───datasets
    ├───imgs
    │   ├───train
    │   └───val
    ├───det_annot
    │   ├───train
    │   └───val
    ├───da_seg_annot
    │   ├───train
    │   └───val
    └───ll_seg_annot
        ├───train
        └───val

Update your dataset paths in projects/your_project_name.yml.

For BDD100K: imgs, det_annot, da_seg_annot, ll_seg_annot
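
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch that assumes the recommended tree; the helper name missing_dataset_dirs is hypothetical and not part of the repository.

```python
from pathlib import Path

# Sub-directories taken from the recommended structure above.
ANNOTATION_DIRS = ("imgs", "det_annot", "da_seg_annot", "ll_seg_annot")
SPLITS = ("train", "val")


def missing_dataset_dirs(root):
    """Return the expected <root>/<kind>/<split> directories that do not exist."""
    root = Path(root)
    return [
        root / kind / split
        for kind in ANNOTATION_DIRS
        for split in SPLITS
        if not (root / kind / split).is_dir()
    ]


if __name__ == "__main__":
    # "datasets" is the folder name used in the tree above.
    for path in missing_dataset_dirs("datasets"):
        print(f"missing: {path}")
```

An empty result means all eight folders exist; it says nothing about whether the files inside them match each other.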

Training

1) Edit or create a new project configuration, using bdd100k.yml as a template

# mean and std of dataset in RGB order
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]

# bdd100k anchors
anchors_scales: '[2**0, 2**0.70, 2**1.32]'
anchors_ratios: '[(0.62, 1.58), (1.0, 1.0), (1.58, 0.62)]'

# BDD100K officially supports 10 classes
# obj_list: ['person', 'rider', 'car', 'truck', 'bus', 'train', 'motorcycle', 'bicycle', 'traffic light', 'traffic sign']
obj_list: ['car']
obj_combine: ['car', 'bus', 'truck', 'train']  # if single class, combine these classes into 1 single class in obj_list
                                               # leave as empty list ([]) to not combine classes

seg_list: ['road',
          'lane']

dataset:
  color_rgb: false
  dataroot: path/to/imgs
  labelroot: path/to/det_annot
  laneroot: path/to/ll_seg_annot
  maskroot: path/to/da_seg_annot
...
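
The obj_combine comment above describes merging several raw categories into the single class in obj_list. The sketch below illustrates that idea only; combine_category is a hypothetical helper, and dropping categories that appear in neither list is an assumption, not necessarily what dataset.py does.

```python
def combine_category(category, obj_list, obj_combine):
    """Map a raw category to the class name used for training, or None to skip it."""
    # Single-class mode: every category listed in obj_combine becomes obj_list[0].
    if len(obj_list) == 1 and category in obj_combine:
        return obj_list[0]
    # Otherwise keep the category only if it is an explicit training class.
    if category in obj_list:
        return category
    return None  # assumption: anything else is ignored


obj_list = ["car"]
obj_combine = ["car", "bus", "truck", "train"]
for raw in ["bus", "car", "person"]:
    print(raw, "->", combine_category(raw, obj_list, obj_combine))
```

With obj_combine left as an empty list, only categories named in obj_list are kept and nothing is merged.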

2) Train

python train.py -p bdd100k        # your_project_name
                -c 3              # coefficient of effnet backbone, result from paper is 3
                -n 4              # num_workers
                -b 8              # batch_size per gpu
                -w path/to/weight # use 'last' to resume training from previous session
                --freeze_det      # freeze detection head, others: --freeze_backbone, --freeze_seg
                --lr 1e-5         # learning rate
                --optim adamw     # adamw | sgd
                --num_epochs 200

Run python train.py --help to see all available arguments.

3) Evaluate

python val.py -p bdd100k -c 3 -w checkpoints/weight.pth

Training Tips

Anchors ⚓

If your dataset is intrinsically different from COCO or BDD100K, or the metrics of detection after training are not as high as expected, you could try enabling autoanchor in project.yml:

...
model:
  image_size:
  - 640
  - 384
need_autoanchor: true  # set to true to run autoanchor
pin_memory: false
...

This automatically finds the best combination of anchor scales and anchor ratios for your dataset. You can then manually edit them in project.yml and disable autoanchor.
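
To give a feel for what clustering box sizes means, here is a plain k-means sketch over (width, height) pairs. It is an illustration under simplifying assumptions (Euclidean distance, random initial centers, a hypothetical function name) and is not the algorithm in autoanchor.py, which may use a different metric and post-processing.

```python
import random


def kmeans_box_sizes(boxes, k, iterations=50, seed=0):
    """Cluster (width, height) pairs into k representative sizes with plain k-means."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)
    for _ in range(iterations):
        # Assignment step: attach every box to its nearest center.
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            nearest = min(
                range(k),
                key=lambda i: (w - centers[i][0]) ** 2 + (h - centers[i][1]) ** 2,
            )
            clusters[nearest].append((w, h))
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                new_centers.append((mean_w, mean_h))
            else:
                new_centers.append(center)  # an empty cluster keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


# Made-up box sizes in pixels: three small boxes and three large ones.
boxes = [(10, 10), (12, 12), (11, 9), (100, 50), (104, 54), (96, 46)]
print(kmeans_box_sizes(boxes, k=2))
```

The resulting centers summarize typical box shapes; turning them into anchor scales and ratios is a separate step that this sketch does not cover.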

If you're feeling lucky, maybe mess around with base_anchor_scale in backbone.py:

class HybridNetsBackbone(nn.Module):
  ...
  self.pyramid_levels = [5, 5, 5, 5, 5, 5, 5, 5, 6]
  self.anchor_scale = [1.25,1.25,1.25,1.25,1.25,1.25,1.25,1.25,1.25,]
  self.aspect_ratios = kwargs.get('ratios', [(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)])
  ...

and model.py:

class Anchors(nn.Module):
  ...
  for scale, ratio in itertools.product(self.scales, self.ratios):
    base_anchor_size = self.anchor_scale * stride * scale
    anchor_size_x_2 = base_anchor_size * ratio[0] / 2.0
    anchor_size_y_2 = base_anchor_size * ratio[1] / 2.0
  ...

to get a grasp on how anchor boxes work.

And because a picture is worth a thousand words, you can visualize your anchor boxes in Anchor Computation Tool.
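
The anchor arithmetic in the model.py snippet above can be tried out in isolation. This sketch reuses the scales and ratios from bdd100k.yml and the anchor_scale value 1.25 from backbone.py; the stride of 8 and the function name anchor_sizes are assumptions chosen for illustration.

```python
import itertools

# Values copied from the bdd100k config and the backbone snippet above.
scales = [2 ** 0, 2 ** 0.70, 2 ** 1.32]
ratios = [(0.62, 1.58), (1.0, 1.0), (1.58, 0.62)]
anchor_scale = 1.25


def anchor_sizes(stride, anchor_scale, scales, ratios):
    """Return the (width, height) of every anchor placed at one feature-map stride."""
    sizes = []
    for scale, ratio in itertools.product(scales, ratios):
        base_anchor_size = anchor_scale * stride * scale
        # ratio[0] stretches the x extent and ratio[1] the y extent.
        sizes.append((base_anchor_size * ratio[0], base_anchor_size * ratio[1]))
    return sizes


# stride=8 is an assumed example value, not taken from the repository.
for width, height in anchor_sizes(8, anchor_scale, scales, ratios):
    print(f"{width:6.1f} x {height:6.1f}")
```

Each stride yields len(scales) * len(ratios) = 9 anchor shapes, so changing either list changes both the number and the shape of the boxes the detection head has to regress.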

Training stages

We experimented with training stages and found that these settings achieved the best results:

--freeze_seg True ~ 100 epochs
--freeze_backbone True --freeze_det True ~ 50 epochs
Train end-to-end ~ 50 epochs

The reason is that the detection head is harder to converge early on, so we freeze the segmentation head at first to focus on detection.
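
The three stages can be written down as command lines. The sketch below only assembles strings from the flags and epoch counts listed above; the checkpoint path is a placeholder, stage_commands is a hypothetical helper, and whether --num_epochs counts from zero or from the loaded checkpoint is not stated here, so treat the epoch values as approximate.

```python
# Freeze flags and epoch counts come from the stage list above.
STAGES = [
    ("detection first", ["--freeze_seg", "True"], 100),
    ("segmentation only", ["--freeze_backbone", "True", "--freeze_det", "True"], 50),
    ("end-to-end", [], 50),
]


def stage_commands(project="bdd100k", coefficient=3, weights="path/to/previous_stage.pth"):
    """Build one train.py command line per training stage."""
    commands = []
    for index, (_, freeze_flags, epochs) in enumerate(STAGES):
        argv = ["python", "train.py", "-p", project, "-c", str(coefficient)]
        argv += freeze_flags + ["--num_epochs", str(epochs)]
        if index > 0:
            # Later stages start from the previous stage's checkpoint (placeholder path).
            argv += ["-w", weights]
        commands.append(" ".join(argv))
    return commands


for command in stage_commands():
    print(command)
```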

Results

Traffic Object Detection

Result Visualization

Model	Recall (%)	mAP@0.5 (%)
`MultiNet`	81.3	60.2
`DLT-Net`	89.4	68.4
`Faster R-CNN`	77.2	55.6
`YOLOv5s`	86.8	77.2
`YOLOP`	89.2	76.5
`HybridNets`	92.8	77.3

Drivable Area Segmentation

Result Visualization

Model	Drivable mIoU (%)
`MultiNet`	71.6
`DLT-Net`	71.3
`PSPNet`	89.6
`YOLOP`	91.5
`HybridNets`	90.5

Lane Line Detection

Result Visualization

Model	Accuracy (%)	Lane Line IoU (%)
`Enet`	34.12	14.64
`SCNN`	35.79	15.84
`Enet-SAD`	36.56	16.02
`YOLOP`	70.5	26.2
`HybridNets`	85.4	31.6

Original footage courtesy of Hanoi Life

License

Distributed under the MIT License. See LICENSE for more information.

Acknowledgements

Our work would not be complete without the wonderful work of the following authors:

Citation

If you find our paper and code useful for your research, please consider giving a star ⭐ and citation 📝 :

@misc{vu2022hybridnets,
      title={HybridNets: End-to-End Perception Network}, 
      author={Dat Vu and Bao Ngo and Hung Phan},
      year={2022},
      eprint={2203.09035},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Comments

Lane color and Lane type Segmentation

In the traffic light and traffic sign detection issue, you've mentioned that we just have to change the obj_list in project.yml by adding the classes needed. Does that apply to seg_list as well?

If I want to detect lane color and lane type, can I change seg_list as follows

seg_list: ['road', 'double white', 'double yellow', 'single white', 'single yellow', 'solid', 'dashed']

Actually, I need classes like double solid yellow, double solid white, single solid yellow, single solid white, single dashed yellow, single dashed white, double dashed yellow, double dashed white, but BDD100K already has labeled double white, double yellow, single white, single yellow classes under Lane Categories and solid, dashed classes under Lane Styles

Will the change to seg_list shown above work? If not, how should I do it?
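
For reference, the eight combined classes named in the question can be derived from a lane category and a lane style. This is only a naming sketch with hypothetical helpers; it does not address whether seg_list or the mask generation supports such classes.

```python
import itertools


def combined_lane_class(category, style):
    """Merge a lane category such as 'double white' with a style such as 'solid'."""
    count, color = category.split()
    return f"{count} {style} {color}"


categories = ["single white", "single yellow", "double white", "double yellow"]
styles = ["solid", "dashed"]

lane_classes = [
    combined_lane_class(category, style)
    for category, style in itertools.product(categories, styles)
]
print(lane_classes)
```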

opened by nannapaneni4 9
dataset link in colab doesn't work

Hi,

This link requires access permission. Could you help with it? https://drive.google.com/drive/folders/1iW9Darrars4xc9uHq2ZnRsWoJolCcyzg?usp=sharing

opened by zehranrgi 4
Issue with FPS mistake in the article

Hello. First of all, thank you for this work.

I noticed a mistake in the code that computes inf_time and fps.

So I think the inference time in the article may have been calculated incorrectly. Your article shows that YOLOP has an inference time of 52 ms per frame (batch size 1), which means about 20 fps (although the YOLOP article reports 41 fps).

Sadly, when I measured HybridNets' inference time in hybridnets_test.py, I got 0.06 s (model(x) only), which means 17-20 fps on a Tesla V100, whereas YOLOP takes 0.021 s (model(x) only), which means 48 fps on a Tesla V100.

Sadly, it may not be faster than YOLOP and may not reach real time.
question

opened by hankplease 3

Did anyone successfully export onnx?

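My code is as follows, but nothing is exported:

# Import paths are assumed from the project structure above
import os
from pathlib import Path

import torch

from backbone import HybridNetsBackbone
from utils.utils import Params

weight_path = 'weights/hybridnets.pth'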
device = 'cuda' if torch.cuda.is_available() else 'cpu'
params = Params(os.path.join(Path(__file__).resolve().parent, "projects/bdd100k.yml"))
model = HybridNetsBackbone(num_classes=len(params.obj_list), compound_coef=3,
                           ratios=eval(params.anchors_ratios), scales=eval(params.anchors_scales),
                           seg_classes=len(params.seg_list), backbone_name=None)
model.load_state_dict(torch.load(weight_path, map_location=device))
model.eval()
inputs = torch.randn(1, 3, 384, 640)
print("begin to convert onnx")
torch.onnx.export(model, inputs, 'HybridNetsBackbone.onnx',
                  verbose=False, opset_version=12, input_names=['images'])
print("done")

shell log:

HybridNets/utils/utils.py:673: TracerWarning: torch.from_numpy results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  anchor_boxes = torch.from_numpy(anchor_boxes.astype(dtype)).to(image.device)
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.

...
ONNX export failed: Couldn't export Python operator SwishImplementation

question

opened by aimuch 3

Evaluation results not accurate?

Hi, I've been trying to recreate your results from the Hybridnets paper. I've run the eval code on the 100k dataset but the results I'm getting are nowhere close to the actual results you presented in the paper. I'm sure I'm missing something here, could you please tell me what could possibly be going wrong here. I think the root locations that I'm giving are inaccurate.

- data root: raw images from the 100k dataset
- label root: I saved a separate json file for each image from the "bdd100k_labels_images_val.json" that I downloaded from the bdd100k dataset. So in total I saved 10,000 separate json files (one for each image) into another folder named "val".
- Road labels in Seg_list: I used the drivable masks
- Lane labels in Seg_list: I used the lane masks

The output that I got after evaluation of 100 images:

The iou values are inconsistent with the ones mentioned in the paper, they are nowhere near them. I was also wondering why the precision is so low, is there a specific reason as to why there are so many false positives. I hope you can help me out, thanks.
question

opened by luceeleven 3

Problem in Training stage

Hello, I tried to follow your suggestion to train the model. Accordingly, I first froze the segmentation head and trained for some epochs.

python train.py -p bdd100k -c 3 -n 4 -b 8 --freeze_seg True --lr 1e-5 --optim adamw --num_epochs 75 --val_interval 1 --log_path D:\HybridNets\rgb-clean --saved_path D:\HybridNets\rgb-clean --save_interval 500 --verbose True --num_gpus 1 --plots True

After that I am freezing the backbone and detection head.

python train.py -p bdd100k -c 3 -n 4 -b 8 --freeze_backbone True --freeze_det True --lr 1e-5 --optim adamw --num_epochs 12 --val_interval 1 --log_path D:\HybridNets\rgb_clean --saved_path D:\HybridNets\rgb_clean --save_interval 500 --verbose True --num_gpus 1 --plots True -w D:\HybridNets\rgb-clean\bdd100k\hybridnets-d3_74_129225_best.pth

But I am getting the error

Can you please suggest how to solve this issue?

Thank you in advance

question

opened by dreamer-1996 2

AssertionError BUG

If I want to segment just one class, e.g. seg_list contains only 'road', and then run train.py, I get an AssertionError in loss.py line 538, in soft_tversky_score: assert output.size() == target.size()

Then I debugged the code and found output.size() = torch.Size([2, 1, 245760]) and target.size() = torch.Size([2, 1, 491520]).

How can I fix that?

opened by harrylee999 2
IndexError: boolean index did not match indexed array along dimension 0

Hello, when I test with an image of size 1920*1080, I get the following error. Can you please help me resolve this issue? Thank you! "IndexError: boolean index did not match indexed array along dimension 0; dimension is 1080 but corresponding boolean dimension is 720"

opened by liuliaocheng 2
[Discussion] Gradient flow
Back when we were toying with mosaic, we removed the segmentation head completely from the model and dataloader. Now that we try to add mosaic augmentation officially, we have to make a decision of not using it for segmentation training.

hybridnets/dataset.py

if self.use_mosaic:
    # honestly, mosaic is not for road and lane segmentation anyway
    # you cant expect road and lane to be split up in 4 separate corners in an image, do you?
    # only use mosaic with freeze_seg :)
    img, labels, seg_label, lane_label, (h0, w0), (h, w), path = self.load_mosaic(idx)

Only the images and object annotations are mosaicked, while the segmentation annotations are kept intact. This produces an incorrect segmentation loss, but we assumed that did not matter because the segmentation head was frozen anyway, thinking that requires_grad=False removes the segmentation head from the backprop graph. That assumption is wrong: the backbone is still affected by the segmentation loss.

Check this colab for interactive stuffs.

So we've been planning to just straight ahead set the losses to 0 when you --freeze_head like this:

cls_loss, reg_loss, seg_loss, regression, classification, anchors, segmentation = model(imgs, annot, seg_annot, obj_list=params.obj_list)
cls_loss = cls_loss.mean() if not opt.freeze_det else 0
reg_loss = reg_loss.mean() if not opt.freeze_det else 0
seg_loss = seg_loss.mean() if not opt.freeze_seg else 0
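
The difference between freezing a head and zeroing its loss can be seen in a framework-free toy model with one backbone weight and one head weight. This is a hand-derived chain-rule sketch, not the project's training loop; all names and numbers are hypothetical.

```python
def seg_loss_gradients(w_backbone, w_head, x, target):
    """Gradients of loss = (w_head * w_backbone * x - target)**2 via the chain rule."""
    feature = w_backbone * x        # shared backbone output
    prediction = w_head * feature   # segmentation head output
    error = prediction - target
    grad_head = 2 * error * feature
    grad_backbone = 2 * error * w_head * x  # passes through the head's weight
    return grad_backbone, grad_head


def training_step(w_backbone, w_head, x, target, lr=0.1,
                  freeze_head=True, zero_seg_loss=False):
    """One gradient-descent step; returns the updated (w_backbone, w_head)."""
    grad_backbone, grad_head = seg_loss_gradients(w_backbone, w_head, x, target)
    if zero_seg_loss:
        grad_backbone = grad_head = 0.0  # loss replaced by 0: nothing flows back
    if freeze_head:
        grad_head = 0.0                  # frozen head: its own weight stays fixed
    return w_backbone - lr * grad_backbone, w_head - lr * grad_head


# Frozen head only: the head weight stays at 2.0 but the backbone weight still moves.
print(training_step(1.0, 2.0, x=1.0, target=0.0))
# Frozen head and zeroed loss: neither weight changes.
print(training_step(1.0, 2.0, x=1.0, target=0.0, zero_seg_loss=True))
```

In this toy setting, freezing the head alone still moves the backbone weight, while zeroing the loss leaves both weights untouched, which is the behaviour the snippet above aims for.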

Is this approach too naive? Are there any recommendation regarding this matter? Or should we also mosaic the segmentation labels?
help wanted question
opened by xoiga123 2
Could not use Pytorch quantization for model

import copy

import torch
from torch.quantization import quantize_fx

model_to_quantize = copy.deepcopy(model)

qconfig_dict = {"": torch.quantization.get_default_qconfig('qnnpack')}

model_to_quantize.eval()

# prepare
model_prepared = quantize_fx.prepare_fx(model_to_quantize, qconfig_dict)

# calibrate (not shown)

# quantize
model_quantized = quantize_fx.convert_fx(model_prepared)

When using Pytorch quantization example for your model, I get this following error:

~/Documents/DL_course_project/HybridNets/backbone.py in forward(self, inputs)
    100
    101         # p1, p2, p3, p4, p5 = self.backbone_net(inputs)
--> 102         p2, p3, p4, p5 = self.encoder(inputs)[-4:]  # self.backbone_net(inputs)
    103
    104         features = (p3, p4, p5)

NameError: module is not installed as a submodule

How can I avoid this error?
help wanted

opened by tiendatAI 2
Drivable and Lane Type training
@datvuthanh thanks for sharing the code base. I have the following queries:

Since bdd100k has different types of lanes, e.g. double yellow, single white, dashed, can we use the current source code to train for different lane types? If so, what modifications need to be made in the code base?

Can we similarly train the current source code with the drivable area and alternative drivable area labels? If so, what changes need to be made?

Please share your thoughts. Thanks in advance.
opened by abhigoku10 2
How to generate drivable area and lane masks?

I know you shared a drive link for bdd100k drivable area and lane masks for the dataloader but I want to replicate it for understanding what to do for my custom dataset. I looked for bdd repo and there is "to_mask.py" which has some scripts to do it. It passes its own tests but I cannot get similar results as you shared. Can you please explain how to generate those masks? Thanks in advance.

opened by eren-aydemir 0
Issue with FPS calculation code.

Hello. First of all, great work.

While running hybridnets_test_videos.py I found an issue with the FPS calculation. In the script, the FPS is calculated as: (t2-t1)/frame_count

But this gives the inference time per frame, not the FPS. Most probably it should be: 1/((t2-t1)/frame_count). That is, we need to take the reciprocal.
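
The relationship between the two quantities can be checked with a few lines. This is a standalone sketch with hypothetical function names, not a patch to hybridnets_test_videos.py.

```python
def seconds_per_frame(elapsed_seconds, frame_count):
    """Average processing time per frame: (t2 - t1) / frame_count."""
    if frame_count <= 0:
        raise ValueError("frame_count must be positive")
    return elapsed_seconds / frame_count


def frames_per_second(elapsed_seconds, frame_count):
    """FPS is frames divided by elapsed time, the reciprocal of seconds per frame."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return frame_count / elapsed_seconds


# Made-up timing: 120 frames processed in 6 seconds.
print(seconds_per_frame(6.0, 120))   # time per frame in seconds
print(frames_per_second(6.0, 120))   # frames per second
```

Computing frame_count / (t2 - t1) directly avoids the extra division and makes the intent obvious.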

Please let me know if any updates happen on this front.

opened by sovit-123 1
The loss doesn't converge when training segmentation head only.

I changed the backbone to EfficientNet-b0 and reduced the number of BiFPN layers from 6 to 1 in order to cut down the inference time. After training for 200 epochs with the segmentation head frozen, I tried training with the backbone and detection head frozen. But I found that the training loss of the segmentation head does not seem to converge. The validation loss and mIoU are also decreasing at the same time, which doesn't make sense. Apart from that, I found that when the segmentation head is frozen, the segmentation loss is not set to 0 in the code, which can affect the weight updates in the backbone, I suppose.
bug help wanted

opened by bigsquirrel18 13

Releases(v1.0)

v1.0(Mar 13, 2022)

This is used in our paper.
Source code(tar.gz)
Source code(zip)
hybridnets.pth(52.19 MB)

Owner

Thanh Dat Vu

GitHub

