Hi again, I was playing around a bit and discovered that the model fails for non-square images, i.e. where height != width. Maybe I missed something, but I haven't found this mentioned anywhere, and the docs kind of suggest that any height and width should work. The same goes for the description of the layers (e.g. s1). In the other issue, you mentioned that
One thing you may want to add to this transformer pipeline is a transforms.Resize followed by a transforms.CenterCrop to ensure all images end up having the same height and width
but didn't mention why. Why does it not work for non-square images? Is there a workaround if one doesn't want to crop? Maybe padding, like in this post*?
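For reference, this is how I understood the suggested preprocessing (just my reading of it; the 250 px target size is an arbitrary choice of mine, not from the repo docs):

# My reading of the suggestion: resize the shorter side, then center-crop to a
# fixed square size, so every image ends up with height == width.
from torchvision import transforms

square_transform = transforms.Compose([
    transforms.Resize(250),       # shorter side -> 250, aspect ratio preserved
    transforms.CenterCrop(250),   # crop to 250 x 250
    transforms.Grayscale(),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x * 255),  # same scaling as in the example below
])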
To demonstrate the issue:
import os
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import pickle
import hmax

path_hmax = './'

# Initialize the model with the universal patch set
print('Constructing model')
model = hmax.HMAX(os.path.join(path_hmax, 'universal_patch_set.mat'))

# A folder with example images
example_images = datasets.ImageFolder(
    os.path.join(path_hmax, 'example_images'),
    transform=transforms.Compose([
        transforms.Resize((400, 500)),
        transforms.CenterCrop((400, 500)),
        transforms.Grayscale(),
        transforms.ToTensor(),
        transforms.Lambda(lambda x: x * 255),
    ])
)

# A dataloader that will run through all example images in one batch
dataloader = DataLoader(example_images, batch_size=10)

# Determine whether there is a compatible GPU available
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Run the model on the example images
print('Running model on', device)
model = model.to(device)
for X, y in dataloader:
    s1, c1, s2, c2 = model.get_all_layers(X.to(device))

print('[done]')
will give an error in the forward function:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-a6bab15d9571> in <module>()
     33 model = model.to(device)
     34 for X, y in dataloader:
---> 35     s1, c1, s2, c2 = model.get_all_layers(X.to(device))
     36
     37 # print('Saving output of all layers to: output.pkl')

4 frames

/gdrive/MyDrive/Colab Notebooks/data_HMAX/pytorch_hmax/hmax.py in forward(self, c1_outputs)
    285         conv_output = conv_output.view(
    286             -1, self.num_orientations, self.num_patches, conv_output_size,
--> 287             conv_output_size)
    288
    289         # Pool over orientations

RuntimeError: shape '[-1, 4, 400, 126, 126]' is invalid for input of size 203616000
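If I read the numbers right, the view in forward assumes a square convolution output (conv_output_size is used for both spatial dimensions), but for 400x500 inputs the actual output doesn't seem to be square. A quick back-of-the-envelope check (the 126 x 101 split is my inference from the numbers, not taken from hmax.py):

# Sanity check on the numbers from the traceback.
total = 203_616_000                          # actual element count
per_image_square = 4 * 400 * 126 * 126       # what view(-1, 4, 400, 126, 126) needs per image
print(total % per_image_square)              # != 0, hence the RuntimeError
print(10 * 4 * 400 * 126 * 101 == total)     # True: consistent with a 126 x 101 (non-square) map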
*Code for that:
import torchvision.transforms.functional as F
from torchvision import transforms

class SquarePad:
    def __call__(self, image):
        # image.size is (width, height) for a PIL image
        max_wh = max(image.size)
        p_left, p_top = [(max_wh - s) // 2 for s in image.size]
        p_right, p_bottom = [max_wh - (s + pad) for s, pad in zip(image.size, [p_left, p_top])]
        padding = (p_left, p_top, p_right, p_bottom)
        return F.pad(image, padding, 0, 'constant')

target_image_size = (224, 224)  # as an example

# now use it as the replacement of transforms.Pad class
transform = transforms.Compose([
    SquarePad(),
    transforms.Resize(target_image_size),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
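In case it helps, this is roughly how I would plug SquarePad into the example pipeline above (just a sketch reusing the imports and variables from my demo code; whether zero-padding to a square is actually appropriate for HMAX is exactly my question):

# Sketch only: pad with zeros to a square, then resize, keeping the rest of
# the preprocessing from the example above.
example_images = datasets.ImageFolder(
    os.path.join(path_hmax, 'example_images'),
    transform=transforms.Compose([
        SquarePad(),
        transforms.Resize((400, 400)),
        transforms.Grayscale(),
        transforms.ToTensor(),
        transforms.Lambda(lambda x: x * 255),
    ])
)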