pytorch-caffe-models
This repo contains the original weights of some Caffe models, ported to PyTorch. Currently there are:
GoogLeNet (Going Deeper with Convolutions):
- BVLC GoogLeNet, trained on ImageNet.
Unlike these ported weights, the GoogLeNet model in torchvision was trained from scratch by the PyTorch team with very different data preprocessing, and it has very differently scaled internal activations. This can matter when using the model as a feature extractor.
There is also a tool, dump_caffe_model.py, that dumps Caffe model weights to a more portable format (pickles of NumPy arrays); it requires Caffe and its Python 3 bindings to be installed. A script to compute validation loss and accuracy, validate.py, is also included (the ImageNet validation set can be obtained from Academic Torrents).
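The exact layout of the dumped pickles is not described here, so the sketch below assumes a hypothetical one: a dict mapping each Caffe layer name to a list of NumPy arrays in Caffe blob order (weights first, then bias). Under that assumption, loading the arrays into PyTorch is a plain conversion; Caffe convolution weights are laid out as (out_channels, in_channels, kernel_h, kernel_w) and InnerProduct weights as (out_features, in_features), which already match nn.Conv2d and nn.Linear, so no transposition is needed. The helper names are hypothetical.

```python
import pickle

import numpy as np
import torch


def caffe_params_to_tensors(params):
    """Convert {layer_name: [weight, bias, ...]} NumPy blobs to PyTorch tensors.

    The dict layout is an assumption about the dumped pickle, not a documented format.
    """
    return {
        # ascontiguousarray guards against non-contiguous arrays; from_numpy shares memory.
        name: [torch.from_numpy(np.ascontiguousarray(blob)) for blob in blobs]
        for name, blobs in params.items()
    }


def load_caffe_pickle(path):
    """Hypothetical helper: unpickle a dumped file and convert its arrays to tensors."""
    with open(path, "rb") as f:
        params = pickle.load(f)
    return caffe_params_to_tensors(params)
```

The converted tensors could then be copied into a matching PyTorch module, for example with `conv.weight.data.copy_(tensors[name][0])`, once the Caffe layer names have been mapped to module names.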
Usage
Basic usage
This outputs logits for 1000 ImageNet classes for a black (zero) input image:
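```python
import torch
import pytorch_caffe_models

# Returns the ported model and its matching preprocessing transform.
model, transform = pytorch_caffe_models.googlenet_bvlc()
# A batch of one all-black RGB image, 224x224, with values in the range 0-1.
model(transform(torch.zeros([1, 3, 224, 224])))
```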
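The 1000 logits can be turned into a loss and top-1/top-5 accuracy with standard PyTorch ops. The sketch below is an assumption about what such a validation computation looks like; it is not taken from validate.py, and the function name is hypothetical.

```python
import torch
import torch.nn.functional as F


def loss_and_topk_accuracy(logits, targets, ks=(1, 5)):
    """Mean cross-entropy loss and top-k accuracies for a batch of logits.

    logits: (batch, num_classes) raw scores; targets: (batch,) integer class indices.
    """
    loss = F.cross_entropy(logits, targets)
    max_k = max(ks)
    # Indices of the max_k highest-scoring classes per example, best first.
    top = logits.topk(max_k, dim=1).indices
    # hits[i, j] is True when the j-th best guess for example i is the target class.
    hits = top == targets.unsqueeze(1)
    accs = {k: hits[:, :k].any(dim=1).float().mean().item() for k in ks}
    return loss.item(), accs
```

With a hypothetical batch `images` (RGB, range 0-1) and integer `labels`, usage would look like `loss, accs = loss_and_topk_accuracy(model.eval()(transform(images)), labels)`.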
The original models were trained on BGR input in the range 0-255, from which the mean had been subtracted (giving zero mean) without dividing by the standard deviation (so not unit standard deviation). The model-specific transform returned by the pretrained model creation function expects RGB input in the range 0-1; it differentiably rescales the input and converts it from RGB to BGR.
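The preprocessing described above can be sketched as a small module built only from differentiable tensor ops, so gradients flow back to the RGB input. This is a hypothetical re-creation, not the package's transform: the per-channel BGR mean (104, 117, 123) is an assumption (a commonly used Caffe ImageNet mean), and the class name is made up for illustration.

```python
import torch
from torch import nn


class CaffeBGRTransform(nn.Module):
    """Hypothetical Caffe-style preprocessing: RGB in [0, 1] -> mean-subtracted BGR in [0, 255]."""

    def __init__(self, bgr_mean=(104.0, 117.0, 123.0)):  # assumed mean values
        super().__init__()
        # Shape (1, 3, 1, 1) so the mean broadcasts over batch, height, and width.
        self.register_buffer("bgr_mean", torch.tensor(bgr_mean).view(1, 3, 1, 1))

    def forward(self, rgb):
        # Reverse the channel axis: RGB -> BGR.
        bgr = rgb.flip(1)
        # Rescale from [0, 1] to [0, 255], then subtract the mean (no division by std).
        return bgr * 255 - self.bgr_mean
```

Because every step is an ordinary tensor operation, the gradient of any output with respect to the RGB input is just the channel-flipped upstream gradient times 255, which is what makes this kind of transform usable for input optimization.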
Feature extraction
Using the torchvision feature extraction utility, first list the available node names:
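```python
from torchvision.models import feature_extraction

# get_graph_node_names returns (train-mode names, eval-mode names); take the eval-mode list.
layer_names = feature_extraction.get_graph_node_names(model)[1]
```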
Then pick a layer from that list; this example uses inception_4c.conv_5x5:
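```python
import torch

# Put the model in eval mode and stop tracking gradients for its parameters.
model.eval().requires_grad_(False)
# Build a model that returns the chosen layer's output under the key 'out'.
extractor = feature_extraction.create_feature_extractor(model, {'inception_4c.conv_5x5': 'out'})
# Start from a mid-gray image with a little noise, and track gradients on it.
input_image = torch.randn([1, 3, 224, 224]) / 50 + 0.5
input_image.requires_grad_()
features = extractor(transform(input_image))['out']
# Minimizing this loss maximizes the squared activations of the chosen layer.
loss = -torch.sum(features**2) / 2
loss.backward()
```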
input_image now has its .grad attribute populated, and you can normalize this gradient and take a gradient descent step on the image for DeepDream or other feature visualization methods. (The BVLC GoogLeNet model was the most popular model used for DeepDream.)
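A single normalized gradient descent step on the image can be sketched as follows. Normalizing by the mean absolute gradient is one common choice in DeepDream-style code, not necessarily what any particular implementation uses; the function name, step size, and clamping to the 0-1 range are assumptions for illustration.

```python
import torch


def normalized_gradient_step(image, loss_fn, step_size=0.01, eps=1e-8):
    """Hypothetical helper: one descent step on `image` with a scale-normalized gradient."""
    # Work on a detached copy so the caller's tensor and graph are left alone.
    image = image.detach().requires_grad_()
    loss = loss_fn(image)
    loss.backward()
    grad = image.grad
    # Divide by the mean absolute gradient so the step size does not depend on gradient scale.
    grad = grad / (grad.abs().mean() + eps)
    with torch.no_grad():
        # Descend the loss; with loss = -sum(features**2) / 2 this increases the activations.
        new_image = image - step_size * grad
        # Keep pixels in the 0-1 RGB range the transform expects.
        new_image.clamp_(0, 1)
    return new_image
```

With the extractor and transform from the example above, a loop could repeatedly call `input_image = normalized_gradient_step(input_image, lambda x: -torch.sum(extractor(transform(x))['out'] ** 2) / 2)`.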