I am trying to load the VOC YOLOv2 weights from the YOLO website. I am working in the Jupyter notebook provided in this repository. Here are the parameters I modified:
LABELS = ['Person', 'Car', 'Bicycle', 'Bus', 'Motorbike', 'Train', 'Aeroplane', 'Chair', 'Bottle', 'Dining Table', 'Potted Plant', 'TV/Monitor', 'Sofa', 'Bird', 'Cat', 'Cow', 'Dog', 'Horse', 'Sheep']
IMAGE_H, IMAGE_W = 416, 416
GRID_H, GRID_W = 13 , 13
BOX = 5
CLASS = len(LABELS)
CLASS_WEIGHTS = np.ones(CLASS, dtype='float32')
OBJ_THRESHOLD = 0.3#0.5
NMS_THRESHOLD = 0.3#0.45
ANCHORS = [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828]
NO_OBJECT_SCALE = 1.0
OBJECT_SCALE = 5.0
COORD_SCALE = 1.0
CLASS_SCALE = 1.0
BATCH_SIZE = 16
WARM_UP_BATCHES = 0
TRUE_BOX_BUFFER = 50
Here is the model summary:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_5 (InputLayer) (None, 416, 416, 3) 0
__________________________________________________________________________________________________
conv_0 (Conv2D) (None, 416, 416, 32) 864 input_5[0][0]
__________________________________________________________________________________________________
batch_norm_0 (BatchNormalizatio (None, 416, 416, 32) 128 conv_0[0][0]
__________________________________________________________________________________________________
leaky_re_lu_45 (LeakyReLU) (None, 416, 416, 32) 0 batch_norm_0[0][0]
__________________________________________________________________________________________________
max_pooling2d_11 (MaxPooling2D) (None, 208, 208, 32) 0 leaky_re_lu_45[0][0]
__________________________________________________________________________________________________
conv_1 (Conv2D) (None, 208, 208, 64) 18432 max_pooling2d_11[0][0]
__________________________________________________________________________________________________
batch_norm_1 (BatchNormalizatio (None, 208, 208, 64) 256 conv_1[0][0]
__________________________________________________________________________________________________
leaky_re_lu_46 (LeakyReLU) (None, 208, 208, 64) 0 batch_norm_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_12 (MaxPooling2D) (None, 104, 104, 64) 0 leaky_re_lu_46[0][0]
__________________________________________________________________________________________________
conv_2 (Conv2D) (None, 104, 104, 128 73728 max_pooling2d_12[0][0]
__________________________________________________________________________________________________
batch_norm_2 (BatchNormalizatio (None, 104, 104, 128 512 conv_2[0][0]
__________________________________________________________________________________________________
leaky_re_lu_47 (LeakyReLU) (None, 104, 104, 128 0 batch_norm_2[0][0]
__________________________________________________________________________________________________
conv_3 (Conv2D) (None, 104, 104, 64) 8192 leaky_re_lu_47[0][0]
__________________________________________________________________________________________________
batch_norm_3 (BatchNormalizatio (None, 104, 104, 64) 256 conv_3[0][0]
__________________________________________________________________________________________________
leaky_re_lu_48 (LeakyReLU) (None, 104, 104, 64) 0 batch_norm_3[0][0]
__________________________________________________________________________________________________
conv_4 (Conv2D) (None, 104, 104, 128 73728 leaky_re_lu_48[0][0]
__________________________________________________________________________________________________
batch_norm_4 (BatchNormalizatio (None, 104, 104, 128 512 conv_4[0][0]
__________________________________________________________________________________________________
leaky_re_lu_49 (LeakyReLU) (None, 104, 104, 128 0 batch_norm_4[0][0]
__________________________________________________________________________________________________
max_pooling2d_13 (MaxPooling2D) (None, 52, 52, 128) 0 leaky_re_lu_49[0][0]
__________________________________________________________________________________________________
conv_5 (Conv2D) (None, 52, 52, 256) 294912 max_pooling2d_13[0][0]
__________________________________________________________________________________________________
batch_norm_5 (BatchNormalizatio (None, 52, 52, 256) 1024 conv_5[0][0]
__________________________________________________________________________________________________
leaky_re_lu_50 (LeakyReLU) (None, 52, 52, 256) 0 batch_norm_5[0][0]
__________________________________________________________________________________________________
conv_6 (Conv2D) (None, 52, 52, 128) 32768 leaky_re_lu_50[0][0]
__________________________________________________________________________________________________
batch_norm_6 (BatchNormalizatio (None, 52, 52, 128) 512 conv_6[0][0]
__________________________________________________________________________________________________
leaky_re_lu_51 (LeakyReLU) (None, 52, 52, 128) 0 batch_norm_6[0][0]
__________________________________________________________________________________________________
conv_7 (Conv2D) (None, 52, 52, 256) 294912 leaky_re_lu_51[0][0]
__________________________________________________________________________________________________
batch_norm_7 (BatchNormalizatio (None, 52, 52, 256) 1024 conv_7[0][0]
__________________________________________________________________________________________________
leaky_re_lu_52 (LeakyReLU) (None, 52, 52, 256) 0 batch_norm_7[0][0]
__________________________________________________________________________________________________
max_pooling2d_14 (MaxPooling2D) (None, 26, 26, 256) 0 leaky_re_lu_52[0][0]
__________________________________________________________________________________________________
conv_8 (Conv2D) (None, 26, 26, 512) 1179648 max_pooling2d_14[0][0]
__________________________________________________________________________________________________
batch_norm_8 (BatchNormalizatio (None, 26, 26, 512) 2048 conv_8[0][0]
__________________________________________________________________________________________________
leaky_re_lu_53 (LeakyReLU) (None, 26, 26, 512) 0 batch_norm_8[0][0]
__________________________________________________________________________________________________
conv_9 (Conv2D) (None, 26, 26, 256) 131072 leaky_re_lu_53[0][0]
__________________________________________________________________________________________________
batch_norm_9 (BatchNormalizatio (None, 26, 26, 256) 1024 conv_9[0][0]
__________________________________________________________________________________________________
leaky_re_lu_54 (LeakyReLU) (None, 26, 26, 256) 0 batch_norm_9[0][0]
__________________________________________________________________________________________________
conv_10 (Conv2D) (None, 26, 26, 512) 1179648 leaky_re_lu_54[0][0]
__________________________________________________________________________________________________
batch_norm_10 (BatchNormalizati (None, 26, 26, 512) 2048 conv_10[0][0]
__________________________________________________________________________________________________
leaky_re_lu_55 (LeakyReLU) (None, 26, 26, 512) 0 batch_norm_10[0][0]
__________________________________________________________________________________________________
conv_11 (Conv2D) (None, 26, 26, 256) 131072 leaky_re_lu_55[0][0]
__________________________________________________________________________________________________
batch_norm_11 (BatchNormalizati (None, 26, 26, 256) 1024 conv_11[0][0]
__________________________________________________________________________________________________
leaky_re_lu_56 (LeakyReLU) (None, 26, 26, 256) 0 batch_norm_11[0][0]
__________________________________________________________________________________________________
conv_12 (Conv2D) (None, 26, 26, 512) 1179648 leaky_re_lu_56[0][0]
__________________________________________________________________________________________________
batch_norm_12 (BatchNormalizati (None, 26, 26, 512) 2048 conv_12[0][0]
__________________________________________________________________________________________________
leaky_re_lu_57 (LeakyReLU) (None, 26, 26, 512) 0 batch_norm_12[0][0]
__________________________________________________________________________________________________
max_pooling2d_15 (MaxPooling2D) (None, 13, 13, 512) 0 leaky_re_lu_57[0][0]
__________________________________________________________________________________________________
conv_13 (Conv2D) (None, 13, 13, 1024) 4718592 max_pooling2d_15[0][0]
__________________________________________________________________________________________________
batch_norm_13 (BatchNormalizati (None, 13, 13, 1024) 4096 conv_13[0][0]
__________________________________________________________________________________________________
leaky_re_lu_58 (LeakyReLU) (None, 13, 13, 1024) 0 batch_norm_13[0][0]
__________________________________________________________________________________________________
conv_14 (Conv2D) (None, 13, 13, 512) 524288 leaky_re_lu_58[0][0]
__________________________________________________________________________________________________
batch_norm_14 (BatchNormalizati (None, 13, 13, 512) 2048 conv_14[0][0]
__________________________________________________________________________________________________
leaky_re_lu_59 (LeakyReLU) (None, 13, 13, 512) 0 batch_norm_14[0][0]
__________________________________________________________________________________________________
conv_15 (Conv2D) (None, 13, 13, 1024) 4718592 leaky_re_lu_59[0][0]
__________________________________________________________________________________________________
batch_norm_15 (BatchNormalizati (None, 13, 13, 1024) 4096 conv_15[0][0]
__________________________________________________________________________________________________
leaky_re_lu_60 (LeakyReLU) (None, 13, 13, 1024) 0 batch_norm_15[0][0]
__________________________________________________________________________________________________
conv_16 (Conv2D) (None, 13, 13, 512) 524288 leaky_re_lu_60[0][0]
__________________________________________________________________________________________________
batch_norm_16 (BatchNormalizati (None, 13, 13, 512) 2048 conv_16[0][0]
__________________________________________________________________________________________________
leaky_re_lu_61 (LeakyReLU) (None, 13, 13, 512) 0 batch_norm_16[0][0]
__________________________________________________________________________________________________
conv_17 (Conv2D) (None, 13, 13, 1024) 4718592 leaky_re_lu_61[0][0]
__________________________________________________________________________________________________
batch_norm_17 (BatchNormalizati (None, 13, 13, 1024) 4096 conv_17[0][0]
__________________________________________________________________________________________________
leaky_re_lu_62 (LeakyReLU) (None, 13, 13, 1024) 0 batch_norm_17[0][0]
__________________________________________________________________________________________________
conv_18 (Conv2D) (None, 13, 13, 1024) 9437184 leaky_re_lu_62[0][0]
__________________________________________________________________________________________________
batch_norm_18 (BatchNormalizati (None, 13, 13, 1024) 4096 conv_18[0][0]
__________________________________________________________________________________________________
conv_20 (Conv2D) (None, 26, 26, 64) 32768 leaky_re_lu_57[0][0]
__________________________________________________________________________________________________
leaky_re_lu_63 (LeakyReLU) (None, 13, 13, 1024) 0 batch_norm_18[0][0]
__________________________________________________________________________________________________
batch_norm_20 (BatchNormalizati (None, 26, 26, 64) 256 conv_20[0][0]
__________________________________________________________________________________________________
conv_19 (Conv2D) (None, 13, 13, 1024) 9437184 leaky_re_lu_63[0][0]
__________________________________________________________________________________________________
leaky_re_lu_65 (LeakyReLU) (None, 26, 26, 64) 0 batch_norm_20[0][0]
__________________________________________________________________________________________________
batch_norm_19 (BatchNormalizati (None, 13, 13, 1024) 4096 conv_19[0][0]
__________________________________________________________________________________________________
lambda_4 (Lambda) (None, 13, 13, 256) 0 leaky_re_lu_65[0][0]
__________________________________________________________________________________________________
leaky_re_lu_64 (LeakyReLU) (None, 13, 13, 1024) 0 batch_norm_19[0][0]
__________________________________________________________________________________________________
concatenate_3 (Concatenate) (None, 13, 13, 1280) 0 lambda_4[0][0]
leaky_re_lu_64[0][0]
__________________________________________________________________________________________________
conv_21 (Conv2D) (None, 13, 13, 1024) 11796480 concatenate_3[0][0]
__________________________________________________________________________________________________
batch_norm_21 (BatchNormalizati (None, 13, 13, 1024) 4096 conv_21[0][0]
__________________________________________________________________________________________________
leaky_re_lu_66 (LeakyReLU) (None, 13, 13, 1024) 0 batch_norm_21[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 13, 13, 120) 123000 leaky_re_lu_66[0][0]
__________________________________________________________________________________________________
reshape_3 (Reshape) (None, 13, 13, 5, 24 0 conv2d_3[0][0]
__________________________________________________________________________________________________
input_4 (InputLayer) (None, 1, 1, 1, 50, 0
__________________________________________________________________________________________________
lambda_5 (Lambda) (None, 13, 13, 5, 24 0 reshape_3[0][0]
input_4[0][0]
==================================================================================================
Total params: 50,670,936
Trainable params: 50,650,264
Non-trainable params: 20,672
__________________________________________________________________________________________________
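As a sanity check on the last layers of the summary, here is a minimal sketch of how the output depth follows from the parameters above. It assumes the usual YOLOv2 head layout of 4 box coordinates, 1 objectness score and one score per class for each anchor box; the helper name output_depth is hypothetical and not part of the notebook.

```python
# Sketch: output depth of a YOLOv2-style detection head.
# Assumption: each anchor box predicts 4 coordinates + 1 objectness + one score per class.
BOX = 5
NUM_LABELS = 19  # len(LABELS) for the list given above


def output_depth(num_boxes, num_classes):
    """Number of channels the final 1x1 convolution must produce."""
    return num_boxes * (4 + 1 + num_classes)


depth_19 = output_depth(BOX, NUM_LABELS)
depth_20 = output_depth(BOX, 20)

print(depth_19)  # 120, the channel count of conv2d_3 in the summary
print(depth_20)  # 125, the depth for a 20-class label list

# Parameter count of a 1x1 convolution from 1024 channels, with bias.
print(1024 * depth_19 + depth_19)  # 123000, as reported for conv2d_3
```

So the summary is internally consistent with my 19 labels. Pascal VOC defines 20 classes, though, so if the pretrained file was trained on all of them its final layer would presumably have 125 channels rather than 120, which may be worth checking.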
I "successfully" read the weights (successfully as in no errors):
weight_reader = WeightReader('yolov2-voc.weights')
for index in range(conv_count):
    conv_layer = model.get_layer('conv_%i' % index)
    norm_layer = model.get_layer('batch_norm_%i' % index)
    size = np.prod(norm_layer.get_weights()[0].shape)  # get product of shape (total values)
    # read sizes
    beta = weight_reader.read(size)
    gamma = weight_reader.read(size)
    mean = weight_reader.read(size)
    var = weight_reader.read(size)
    norm_layer.set_weights([gamma, beta, mean, var])
    if len(conv_layer.get_weights()) > 1:
        bias = weight_reader.read(np.prod(conv_layer.get_weights()[1].shape))
        kernel = weight_reader.read(np.prod(conv_layer.get_weights()[0].shape))
        kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
        kernel = kernel.transpose([2, 3, 1, 0])
        conv_layer.set_weights([kernel, bias])
    else:
        kernel = weight_reader.read(np.prod(conv_layer.get_weights()[0].shape))
        kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
        kernel = kernel.transpose([2, 3, 1, 0])
        conv_layer.set_weights([kernel])
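To make the reshape and transpose step above concrete, here is a small standalone sketch using a made-up layer size. It assumes the file stores each kernel in Darknet order (out_channels, in_channels, height, width), flattened row-major, and uses a plain NumPy array as a stand-in for what weight_reader.read returns. The layer dimensions are hypothetical.

```python
import numpy as np

# Hypothetical small layer: 3x3 kernel, 2 input channels, 4 filters.
keras_shape = (3, 3, 2, 4)  # Keras order: (height, width, in_channels, out_channels)

# Stand-in for weight_reader.read(...): a flat float32 array in Darknet order,
# i.e. (out_channels, in_channels, height, width) flattened row-major.
flat = np.arange(np.prod(keras_shape), dtype='float32')

# Reversing the Keras shape gives (4, 2, 3, 3); for square kernels this
# matches the Darknet shape (out_channels, in_channels, height, width).
kernel = flat.reshape(list(reversed(keras_shape)))

# Move the axes into Keras order: (height, width, in_channels, out_channels).
kernel = kernel.transpose([2, 3, 1, 0])

print(kernel.shape)  # (3, 3, 2, 4)

# Spot check: filter o=3, input channel i=1, row y=1, column x=2
# sits at flat index ((o * 2 + i) * 3 + y) * 3 + x in Darknet order.
o, i, y, x = 3, 1, 1, 2
flat_index = ((o * 2 + i) * 3 + y) * 3 + x
print(kernel[y, x, i, o] == flat[flat_index])  # True
```

Note that reversing the shape only lines up with the Darknet layout because the kernels are square; for a non-square kernel the height and width axes would be swapped.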
When I try to run the following code, no boxes are created:
img = cv2.imread("dog-cycle-car.png")
img = cv2.resize(img, (416, 416)) # resize to the input dimension
img = img / 255
img = img[..., ::-1]  # BGR -> RGB
img_input = np.array([img])
dummy_array = np.zeros((1, 1, 1, 1, TRUE_BOX_BUFFER, 4))
test_prediction = model.predict([img_input, dummy_array])
boxes = decode_netout(test_prediction[0],
                      obj_threshold=OBJ_THRESHOLD,
                      nms_threshold=NMS_THRESHOLD,
                      anchors=ANCHORS,
                      nb_class=CLASS)
img = draw_boxes(img, boxes, labels=LABELS)
img.shape
plt.imshow(img)
boxes
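To tell whether the threshold is simply filtering everything out or the raw network output is off, one thing I can check is the highest score before thresholding. This is a minimal sketch, assuming the output layout is (x, y, w, h, objectness, class scores) per anchor and that decoding applies a sigmoid to the objectness and a softmax to the class scores, as YOLOv2 does. The helper max_class_score is hypothetical and not part of the notebook.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def max_class_score(netout):
    """Highest objectness * class probability over all cells and anchors.

    Assumes netout has shape (GRID_H, GRID_W, BOX, 4 + 1 + CLASS) and holds
    raw (pre-activation) values laid out as x, y, w, h, objectness, classes.
    """
    objectness = sigmoid(netout[..., 4])
    class_probs = softmax(netout[..., 5:])
    scores = objectness[..., np.newaxis] * class_probs
    return float(scores.max())


# Synthetic stand-in for test_prediction[0] with the shape from the summary.
rng = np.random.default_rng(0)
fake_netout = rng.normal(size=(13, 13, 5, 24)).astype('float32')
best = max_class_score(fake_netout)
print(best)
```

With the real prediction this would be called as max_class_score(test_prediction[0]) and compared against OBJ_THRESHOLD. A value far below 0.3 would suggest the problem is in the loaded weights rather than in the thresholds.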
I assume there must be an issue with how I am loading the weights, because the model summary looks the same and the weight loading is the only thing I have modified in the notebook. I would be very grateful if someone could shed some light on why this is happening and what I am doing wrong.
Edit
If it would be helpful to see my Jupyter notebook, just ask for a link.