PyTorch_YOLOF
A PyTorch version of the You Only Look One-level Feature (YOLOF) object detector.
Each input image must be resized so that its shorter side is 800 and its longer side is less than or equal to 1333.
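A minimal sketch of how the target size could be computed from this rule. The function name `compute_resize_shape` is hypothetical and not part of this repo. It assumes the common convention that, when scaling the shorter side to 800 would push the longer side past 1333, the image is scaled by the longer side instead (so the shorter side ends up below 800).

```python
def compute_resize_shape(height, width, min_size=800, max_size=1333):
    """Return (new_height, new_width) for the shorter-side / longer-side rule."""
    short_side = min(height, width)
    long_side = max(height, width)
    # First try to bring the shorter side to min_size.
    scale = min_size / short_side
    # Assumption: if that makes the longer side exceed max_size,
    # fall back to scaling the longer side to max_size.
    if long_side * scale > max_size:
        scale = max_size / long_side
    return int(round(height * scale)), int(round(width * scale))


print(compute_resize_shape(480, 640))   # (800, 1067)
print(compute_resize_shape(500, 1500))  # (444, 1333)
```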
While reproducing YOLOF, I found that it uses many tricks that the baseline RetinaNet doesn't. For example, YOLOF takes advantage of RandomShift, CTR_CLAMP, a large learning rate, a big batch size (like 64), and a negative prediction threshold. Is it really fair for YOLOF to use these tricks when it is compared with RetinaNet?
In other words, can YOLOF still work without those tricks?
Requirements
- We recommend using Anaconda to create a conda environment:
conda create -n yolof python=3.6
- Then, activate the environment:
conda activate yolof
- Requirements:
pip install -r requirements.txt
PyTorch >= 1.1.0 and Torchvision >= 0.3.0
Visualize positive sample
You can run the following command to visualize positive samples:
python train.py \
-d voc \
--batch_size 2 \
--root path/to/your/dataset \
--vis_targets
My Ablation Studies
image mask
- Backbone: ResNet-50
- image size: shorter side = 800, longer side <= 1333
- Batch size: 16
- lr: 0.01
- lr of backbone: 0.01
- SGD with momentum 0.9 and weight decay 1e-4
- Matcher: IoU Top4 (Different from the official matcher that uses top4 of L1 distance.)
- epoch: 12 (1x schedule)
- lr decay: 8, 11
- augmentation: RandomFlip
We ignore the loss of samples that do not lie inside the image.
Method | AP | AP50 | AP75 | APs | APm | APl |
---|---|---|---|---|---|---|
w/o mask | 28.3 | 46.7 | 28.9 | 13.4 | 33.4 | 39.9 |
w/ mask | 28.4 | 46.9 | 29.1 | 13.5 | 33.5 | 39.1 |
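A minimal sketch of what the image mask in this study could look like. It is not the code used in this repo. It assumes that "not in image" refers to feature-map cells that fall in the zero-padded border added when images are batched, that the feature stride is 32, and that a cell counts as valid when its centre lies inside the real image. The helper names are hypothetical.

```python
def build_feature_mask(img_h, img_w, pad_h, pad_w, stride=32):
    """1 marks cells whose centre is inside the real image, 0 marks padding."""
    rows, cols = pad_h // stride, pad_w // stride
    mask = []
    for r in range(rows):
        cy = (r + 0.5) * stride
        mask.append([1 if (cy < img_h and (c + 0.5) * stride < img_w) else 0
                     for c in range(cols)])
    return mask


def masked_mean_loss(loss_map, mask):
    """Average the per-cell losses over valid (mask == 1) cells only."""
    total = sum(loss * m
                for loss_row, mask_row in zip(loss_map, mask)
                for loss, m in zip(loss_row, mask_row))
    count = sum(m for mask_row in mask for m in mask_row)
    return total / max(count, 1)


# A 64x96 image padded to 128x128: only a 2x3 block of cells is valid.
mask = build_feature_mask(64, 96, 128, 128)
# Give the padded cells a large loss to show that they are ignored.
loss_map = [[1.0 if m else 5.0 for m in row] for row in mask]
print(mask[0], masked_mean_loss(loss_map, mask))  # [1, 1, 1, 0] 1.0
```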
L1 Top4
- Backbone: ResNet-50
- image size: shorter side = 800, longer side <= 1333
- Batch size: 16
- lr: 0.01
- lr of backbone: 0.01
- SGD with momentum 0.9 and weight decay 1e-4
- epoch: 12 (1x schedule)
- lr decay: 8, 11
- augmentation: RandomFlip
- with image mask
IoU topk: For each label, we choose the K anchor boxes with the highest IoU as the positive samples.
L1 topk: For each label, we choose the K anchor boxes with the smallest L1 distance to it as the positive samples.
Method | AP | AP50 | AP75 | APs | APm | APl |
---|---|---|---|---|---|---|
IoU Top4 | 28.4 | 46.9 | 29.1 | 13.5 | 33.5 | 39.1 |
L1 Top4 | 28.6 | 46.9 | 29.4 | 13.8 | 34.0 | 39.0 |
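The two matchers compared above can be sketched as follows. This is an illustration, not the matcher in this repo: it handles a single label, uses boxes in (x1, y1, x2, y2) form with positive area, and computes the L1 distance directly on those four coordinates, all of which are assumptions. The function names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with positive area."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def l1_distance(a, b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(p - q) for p, q in zip(a, b))


def topk_match(anchors, gt_box, k=4, metric="iou"):
    """Return the indices of the k anchors chosen as positives for one label."""
    if metric == "iou":
        # Larger IoU is better, so sort in descending order.
        scores = [box_iou(anc, gt_box) for anc in anchors]
        order = sorted(range(len(anchors)), key=lambda i: -scores[i])
    else:
        # Smaller L1 distance is better, so sort in ascending order.
        scores = [l1_distance(anc, gt_box) for anc in anchors]
        order = sorted(range(len(anchors)), key=lambda i: scores[i])
    return order[:k]


anchors = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30), (2, 2, 12, 12)]
gt_box = (1, 1, 12, 12)
print(topk_match(anchors, gt_box, k=2, metric="iou"))  # [3, 0]
print(topk_match(anchors, gt_box, k=2, metric="l1"))   # [3, 0]
```

Note that the L1 variant always returns k anchors, even when none of them overlaps the label, whereas IoU ranks all non-overlapping anchors equally at zero.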
RandomShift Augmentation
- Backbone: ResNet-50
- image size: shorter side = 800, longer side <= 1333
- Batch size: 16
- lr: 0.01
- lr of backbone: 0.01
- SGD with momentum 0.9 and weight decay 1e-4
- Matcher: L1 Top4
- epoch: 12 (1x schedule)
- lr decay: 8, 11
- augmentation: RandomFlip
- with image mask
YOLOF takes advantage of RandomShift augmentation, which is not used in RetinaNet.
Method | AP | AP50 | AP75 | APs | APm | APl |
---|---|---|---|---|---|---|
w/o RandomShift | 28.6 | 46.9 | 29.4 | 13.8 | 34.0 | 39.0 |
w/ RandomShift | 29.0 | 47.3 | 29.8 | 14.2 | 34.2 | 38.9 |
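A minimal sketch of a RandomShift-style augmentation, written with plain Python lists so it runs without extra dependencies. It is not the implementation in this repo. The maximum shift of 32 pixels, the zero fill for uncovered pixels, and the rule for dropping boxes that leave the image are all assumptions made for illustration.

```python
import random


def random_shift(image, boxes, max_shift=32, rng=random):
    """Shift a 2-D image (list of rows) and its boxes by a random offset.

    Pixels shifted in from outside the image are filled with 0. Boxes are
    (x1, y1, x2, y2); they are translated, clipped to the image, and dropped
    if nothing of them remains visible.
    """
    h, w = len(image), len(image[0])
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)

    shifted = [[0] * w for _ in range(h)]
    for y in range(h):
        src_y = y - dy
        if not 0 <= src_y < h:
            continue  # this row comes from outside the image, keep zeros
        for x in range(w):
            src_x = x - dx
            if 0 <= src_x < w:
                shifted[y][x] = image[src_y][src_x]

    new_boxes = []
    for x1, y1, x2, y2 in boxes:
        x1, x2 = min(max(x1 + dx, 0), w), min(max(x2 + dx, 0), w)
        y1, y2 = min(max(y1 + dy, 0), h), min(max(y2 + dy, 0), h)
        if x2 > x1 and y2 > y1:
            new_boxes.append((x1, y1, x2, y2))
    return shifted, new_boxes, (dx, dy)
```

Passing `rng=random.Random(seed)` makes the shift reproducible, which helps when checking the augmentation together with the `--vis_targets` visualization.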
Fix a bug in dataloader
- Backbone: ResNet-50
- image size: shorter side = 800, longer side <= 1333
- Batch size: 16
- lr: 0.01
- lr of backbone: 0.01
- SGD with momentum 0.9 and weight decay 1e-4
- Matcher: L1 Top4
- epoch: 12 (1x schedule)
- lr decay: 8, 11
- augmentation: RandomFlip + RandomShift
- with image mask
I fixed a bug in the dataloader. Specifically, I set the shuffle in the dataloader to False
...
Method | AP | AP50 | AP75 | APs | APm | APl |
---|---|---|---|---|---|---|
bug | 29.0 | 47.3 | 29.8 | 14.2 | 34.2 | 38.9 |
no bug | 30.1 | 49.0 | 31.0 | 15.2 | 36.3 | 39.8 |
Ignore samples
- Backbone: ResNet-50
- image size: shorter side = 800, longer side <= 1333
- Batch size: 16
- lr: 0.01
- lr of backbone: 0.01
- SGD with momentum 0.9 and weight decay 1e-4
- Matcher: L1 Top4
- epoch: 12 (1x schedule)
- lr decay: 8, 11
- augmentation: RandomFlip + RandomShift
- with image mask
We ignore those negative samples whose IoU with the labels is higher than the ignore threshold (igt).
Method | AP | AP50 | AP75 | APs | APm | APl |
---|---|---|---|---|---|---|
no igt | 30.1 | 49.0 | 31.0 | 15.2 | 36.3 | 39.8 |
igt=0.7 | | | | | | |
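The ignore rule described above can be sketched as a labelling step. This is an illustration under assumptions, not the code in this repo: the function name, the -1 marker for ignored anchors, and the convention that the matcher's positives are passed in as a set of indices are all hypothetical.

```python
def assign_labels(max_ious, positive_idx, ignore_thresh=0.7):
    """Label each anchor: 1 = positive, 0 = negative, -1 = ignored.

    max_ious[i] is the highest IoU between anchor i and any label.
    positive_idx holds the anchors already chosen by the matcher.
    """
    labels = []
    for i, iou in enumerate(max_ious):
        if i in positive_idx:
            labels.append(1)
        elif iou > ignore_thresh:
            # Not a positive, but too close to an object to be a clean negative.
            labels.append(-1)
        else:
            labels.append(0)
    return labels


labels = assign_labels([0.9, 0.75, 0.3, 0.0], positive_idx={0})
print(labels)  # [1, -1, 0, 0]
```

Anchors labelled -1 would then be left out of the classification loss, so they count neither as positives nor as negatives.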
Decode boxes
- Backbone: ResNet-50
- image size: shorter side = 800, longer side <= 1333
- Batch size: 16
- lr: 0.01
- lr of backbone: 0.01
- SGD with momentum 0.9 and weight decay 1e-4
- Matcher: L1 Top4
- epoch: 12 (1x schedule)
- lr decay: 8, 11
- augmentation: RandomFlip + RandomShift
- with image mask
Method-1: ctr_x = x_anchor + t_x, ctr_y = y_anchor + t_y
Method-2: ctr_x = x_anchor + t_x * w_anchor, ctr_y = y_anchor + t_y * h_anchor
Method-2 follows the operation used in YOLOF.
Method | AP | AP50 | AP75 | APs | APm | APl |
---|---|---|---|---|---|---|
Method-1 | | | | | | |
Method-2 | | | | | | |
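The two centre-decoding formulas above, written out as a small sketch. The function names and the (x_anchor, y_anchor, w_anchor, h_anchor) tuple layout are assumptions for illustration; only the centre is decoded here because the formulas above do not cover width and height.

```python
def decode_center_method1(anchor, t_x, t_y):
    """Method-1: the predicted offsets are added to the anchor centre directly."""
    x_anchor, y_anchor, _, _ = anchor
    return x_anchor + t_x, y_anchor + t_y


def decode_center_method2(anchor, t_x, t_y):
    """Method-2: the predicted offsets are scaled by the anchor width and height."""
    x_anchor, y_anchor, w_anchor, h_anchor = anchor
    return x_anchor + t_x * w_anchor, y_anchor + t_y * h_anchor


anchor = (100.0, 60.0, 32.0, 64.0)  # (x_anchor, y_anchor, w_anchor, h_anchor)
print(decode_center_method1(anchor, 0.5, -0.25))  # (100.5, 59.75)
print(decode_center_method2(anchor, 0.5, -0.25))  # (116.0, 44.0)
```

With the same raw prediction, Method-2 moves the centre further for larger anchors, so the regression target is relative to the anchor size rather than absolute.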
Train
sh train.sh
You can change the configurations in train.sh.
If you just want to check which anchor box is assigned to the positive sample, you can run:
python train.py --cuda -d voc --batch_size 8 --vis_targets
You can adjust the above commands to suit your own setup.
Test
python test.py -d [select a dataset: voc or coco] \
--cuda \
-v [select a model] \
--weight [ Please input the path to model dir. ] \
--img_size 800 \
--root path/to/dataset/ \
--show
You can run the above command to visualize the detection results on the dataset.