Datasets, Transforms and Models specific to Computer Vision

Overview

torchvision


The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.

Installation

We recommend Anaconda as a Python package management system. Please refer to pytorch.org for details on installing PyTorch (torch). The following table shows the corresponding torchvision versions and supported Python versions.

| torch            | torchvision      | python               |
| ---------------- | ---------------- | -------------------- |
| master / nightly | master / nightly | >=3.6                |
| 1.7.1            | 0.8.2            | >=3.6                |
| 1.7.0            | 0.8.1            | >=3.6                |
| 1.7.0            | 0.8.0            | >=3.6                |
| 1.6.0            | 0.7.0            | >=3.6                |
| 1.5.1            | 0.6.1            | >=3.5                |
| 1.5.0            | 0.6.0            | >=3.5                |
| 1.4.0            | 0.5.0            | ==2.7, >=3.5, <=3.8  |
| 1.3.1            | 0.4.2            | ==2.7, >=3.5, <=3.7  |
| 1.3.0            | 0.4.1            | ==2.7, >=3.5, <=3.7  |
| 1.2.0            | 0.4.0            | ==2.7, >=3.5, <=3.7  |
| 1.1.0            | 0.3.0            | ==2.7, >=3.5, <=3.7  |
| <=1.0.1          | 0.2.2            | ==2.7, >=3.5, <=3.7  |

Anaconda:

conda install torchvision -c pytorch

pip:

pip install torchvision

From source:

python setup.py install
# or, for OSX
# MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install

In case building TorchVision from source fails, install the nightly version of PyTorch following the linked guide on the contributing page and retry the install.

By default, GPU support is built if CUDA is found and torch.cuda.is_available() is true. It's possible to force building GPU support by setting the FORCE_CUDA=1 environment variable, which is useful when building a Docker image.

Image Backend

Torchvision currently supports the following image backends:

  • Pillow (default)
  • Pillow-SIMD - a much faster drop-in replacement for Pillow with SIMD. If installed, it will be used as the default.
  • accimage - if installed, it can be activated by calling torchvision.set_image_backend('accimage') (see the example below)
  • libpng - can be installed via conda (conda install libpng) or any of the package managers for Debian-based and RHEL-based Linux distributions.
  • libjpeg - can be installed via conda (conda install jpeg) or any of the package managers for Debian-based and RHEL-based Linux distributions. libjpeg-turbo can be used as well.
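
For example, you can check and switch the active backend from Python (a small sketch; the accimage package must be installed for the accimage backend to be selectable):

import torchvision

print(torchvision.get_image_backend())     # 'PIL' by default
torchvision.set_image_backend('accimage')  # requires accimage to be installed
print(torchvision.get_image_backend())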

Notes: libpng and libjpeg must be available at compile time in order to be used. Make sure they are available in the standard library locations; otherwise, add the include and library paths to the TORCHVISION_INCLUDE and TORCHVISION_LIBRARY environment variables, respectively.

C++ API

TorchVision also offers a C++ API that contains C++ equivalents of the Python models.

Installation from source:

mkdir build
cd build
# Add -DWITH_CUDA=on for CUDA support if needed
cmake ..
make
make install

Once installed, the library can be accessed in cmake (after properly configuring CMAKE_PREFIX_PATH) via the TorchVision::TorchVision target:

find_package(TorchVision REQUIRED)
target_link_libraries(my-target PUBLIC TorchVision::TorchVision)

The TorchVision package will also automatically look for the Torch package and add it as a dependency to my-target, so make sure that it is also available to cmake via the CMAKE_PREFIX_PATH.

For an example setup, take a look at examples/cpp/hello_world.

TorchVision Operators

In order to get the torchvision operators registered with torch (e.g. for the JIT), all you need to do is ensure that you #include <torchvision/vision.h> in your project.
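
For reference, the Python-side analogue is that simply importing the torchvision package registers the custom operators with torch, which is what makes them reachable from TorchScript; a minimal sketch:

import torch
import torchvision  # importing torchvision registers the torchvision::* operators with torch

boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0]])
scores = torch.tensor([0.9, 0.8])

# The registered operator is reachable through the generic torch.ops namespace,
# which is also how scripted models and the C++ API call into it.
keep = torch.ops.torchvision.nms(boxes, scores, 0.5)
print(keep)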

Documentation

You can find the API documentation on the pytorch website: https://pytorch.org/docs/stable/torchvision/index.html

Contributing

See the CONTRIBUTING file for how to help out.

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.

If you're a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!

Comments
  • RetinaNet object detection.

    RetinaNet object detection.

    As briefly discussed in https://github.com/pytorch/vision/issues/1151 , this PR intends to serve as a discussion platform for the implementation of a RetinaNet network in torchvision.

    The code is more like a skeleton implementation of what needs to be done, showcasing the design choices made so far. Before I start working on the todo's, I would like to discuss the current design in case it is not in line with torchvision.

    The current list of todo's (also commented in the code) :

    • [x] Implement focal loss (is it already somewhere in pytorch? Couldn't find it in the documentation page).
    • [x] Use Smooth L1 loss for bbox regression, or use L1 like in Faster RCNN.
    • [x] Move some functionality for anchor matching out of rpn.RegionProposalNetwork since we can share the code.
    • [x] Implement functionality to decode bbox regression, similarly as with anchor matching, the goal is to share as much code as possible with rpn.RegionProposalNetwork.
    • [x] Train resnet50_fpn on COCO.
    • [x] Make sure it works with torchscript.
    • [x] Test with a custom additional head.

    Some design choices that might be worth discussing:

    • I decided not to inherit from GeneralizedRCNN for two reasons: it is a trivial implementation and it doesn't match with how RetinaNet works.
    • I put the compute_loss methods in the heads (RetinaNetClassificationHead / RetinaNetRegressionHead) as they are tightly correlated.
    • I made a single nn.Module to represent the RetinaNet heads, so that you should be able to add different heads by making a (sub)class like RetinaNetHead. This can be useful if you want to train other things than just classification and bbox regression. I think there is some more work required to allow a variable number of heads (with filtering the detections mainly), but I don't want to worry about that for now :). Since this was an easy thing to do I already implemented the head with that concept in mind.
    • I left num_classes to include the background; however, the RetinaNet paper says they predict without a background class (so using sigmoid instead of softmax). This shouldn't be an issue I suppose, but it is worth noting. I left it like this because it is in line with the other implementations in torchvision. Personally I prefer to classify without a background class and use sigmoid, mainly because it allows multi-label classification, which softmax does not (see the sketch after this list).
    • Currently rpn.RegionProposalNetwork is not usable in RetinaNet, and I think we shouldn't modify it to fit the use-case of RetinaNet either, but it does share a lot of the required functionality. I am thinking about how I can take some of the functionality out of rpn.RegionProposalNetwork, place it somewhere else so that both rpn.RegionProposalNetwork and RetinaNet can make use of it. The functionality is mainly the matching of predictions to ground truth and the decoding of regression values and anchors to bboxes.
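
    A small illustration of the sigmoid-vs-softmax point above (generic PyTorch, not code from this PR): per-class sigmoids give independent probabilities, so one anchor can be positive for several classes, whereas softmax forces the classes to compete:

    import torch
    import torch.nn.functional as F

    logits = torch.tensor([[3.0, 2.5, -4.0]])  # one anchor, three foreground classes

    print(F.softmax(logits, dim=1))  # scores compete and sum to 1 (single-label)
    print(torch.sigmoid(logits))     # independent per-class probabilities (multi-label friendly)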

    @fmassa I would love to hear your opinion.

    opened by hgaiser 78
  • C++ Models

    C++ Models

    Hi @fmassa I have implemented C++ models as discussed here. But I ran into some problems:

    • There is no kaiming initialization in C++ API of Pytorch. There is a PR for it but there hasn't been progress in the last 20 days. I have used xavier initialization right now
    • Modules in DenseNet are associated with names and are inside OrderedDict. I didn't know what to do with these so I have implemented them without names
    • I think that there is an error in inception implementation which I have reported here
    • I've implemented scipy.stats.truncnorm myself with the help of this
    • My knowledge of CMake is limited so I'm going to need a little bit of help there
    • I have partially tested the models by running random data through them and checking the number of parameters to see if they match the Python models, but I didn't find tests for the Python models. Shouldn't there be unit tests for these?

    If you can help me solve these issues I will be able to finish the code.

    opened by ShahriarRezghi 73
  • New Pillow version (7.0.0) breaks torchvision (ImportError: cannot import name 'PILLOW_VERSION' from 'PIL')

    New Pillow version (7.0.0) breaks torchvision (ImportError: cannot import name 'PILLOW_VERSION' from 'PIL')

    Hi, it looks like Pillow released version 7.0.0 so I can no longer import torchvision without getting this error: ImportError: cannot import name 'PILLOW_VERSION' from 'PIL'

    Pinning to Pillow 6.2.1 fixes the issue.

    I see that you fixed this for a future torchvision release (https://github.com/pytorch/vision/pull/1501). Do you know when this will be released? If it will be awhile, could the version of Pillow be pinned to be less than 7.0.0 in the meantime?

    Thanks.

    Versions: torch: 1.3.1 torchvision: 0.4.2 Pillow: 7.0.0

    bug module: transforms 
    opened by parsing-science 62
  • Non-Maximum Suppression on the GPU

    Non-Maximum Suppression on the GPU

    Is there any interest in an NMS layer that runs on the GPU for torchvision? I have one implemented; it gives a 1-2 order of magnitude speedup over a naive version composed from pytorch ops. Would be happy to contribute it if anyone's interested.
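
    For context, a naive NMS composed from plain PyTorch ops (a rough sketch for illustration, not the proposed CUDA kernel) looks roughly like this:

    import torch
    from torchvision.ops import box_iou

    def naive_nms(boxes, scores, iou_threshold):
        # Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much, repeat.
        order = scores.argsort(descending=True)
        keep = []
        while order.numel() > 0:
            i = order[0]
            keep.append(i)
            if order.numel() == 1:
                break
            ious = box_iou(boxes[i].unsqueeze(0), boxes[order[1:]]).squeeze(0)
            order = order[1:][ious <= iou_threshold]
        return torch.stack(keep)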

    enhancement help wanted 
    opened by ghost 61
  • [JIT] Not supported for maskrcnn_resnet50_fpn

    [JIT] Not supported for maskrcnn_resnet50_fpn

    I am trying to accelerate the maskrcnn_resnet50_fpn pretrained model using JIT tracing provided by pytorch. It appears that some operations present in this model are not supported by pytorch JIT.

    Are these models supposed to have JIT support officially? If not, would you be able to provide advice for a workaround?

    To replicate, running:

    import torch
    import torchvision
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    model.eval()
    traced_net = torch.jit.trace(model, torch.rand(1, 3,800, 800))
    

    produces

    RuntimeError: log2_vml_cpu not implemented for 'Long'

    Thank you.

    enhancement module: models topic: object detection 
    opened by rbrigden 59
  • Make maskrcnn scriptable

    Make maskrcnn scriptable

    Make Maskrcnn scriptable.

    Working off of https://github.com/pytorch/vision/pull/1333.

    A couple issues:

    • IntermediateLayerGetter expects a Dict[str, str] for layers but resnet_fpn_backbone has layers specified as Dict[str, int]: return_layers = {'layer1': 0, 'layer2': 1, 'layer3': 2, 'layer4': 3}. When I tried changing it, something broke.

    • a few cleanups left to do. However, this almost scripts and doesn't break the expect tests, so I think it's close...

    opened by eellison 54
  • Downloading MNIST dataset with torchvision gives HTTP Error 403

    Downloading MNIST dataset with torchvision gives HTTP Error 403

    🐛 Bug

    I'm getting a 403 error when I try to download MNIST dataset with torchvision 0.4.2.

    To Reproduce

    ../.local/lib/python3.6/site-packages/torchvision/datasets/mnist.py:68: in __init__
        self.download()
    ../.local/lib/python3.6/site-packages/torchvision/datasets/mnist.py:135: in download
        download_and_extract_archive(url, download_root=self.raw_folder, filename=filename)
    ../.local/lib/python3.6/site-packages/torchvision/datasets/utils.py:248: in download_and_extract_archive
        download_url(url, download_root, filename, md5)
    ../.local/lib/python3.6/site-packages/torchvision/datasets/utils.py:96: in download_url
        raise e
    ../.local/lib/python3.6/site-packages/torchvision/datasets/utils.py:84: in download_url
        reporthook=gen_bar_updater()
    /usr/local/lib/python3.6/urllib/request.py:248: in urlretrieve
        with contextlib.closing(urlopen(url, data)) as fp:
    /usr/local/lib/python3.6/urllib/request.py:223: in urlopen
        return opener.open(url, data, timeout)
    /usr/local/lib/python3.6/urllib/request.py:532: in open
        response = meth(req, response)
    /usr/local/lib/python3.6/urllib/request.py:642: in http_response
        'http', request, response, code, msg, hdrs)
    /usr/local/lib/python3.6/urllib/request.py:570: in error
        return self._call_chain(*args)
    /usr/local/lib/python3.6/urllib/request.py:504: in _call_chain
        result = func(*args)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    self = <urllib.request.HTTPDefaultErrorHandler object at 0x7efbf9edaac8>
    req = <urllib.request.Request object at 0x7efbf9eda8d0>
    fp = <http.client.HTTPResponse object at 0x7efbf9edaf98>, code = 403
    msg = 'Forbidden', hdrs = <http.client.HTTPMessage object at 0x7efbf9ea22b0>
    
        def http_error_default(self, req, fp, code, msg, hdrs):
    >       raise HTTPError(req.full_url, code, msg, hdrs, fp)
    E       urllib.error.HTTPError: HTTP Error 403: Forbidden
    

    Environment

    • torch==1.3.1
    • torchvision==0.4.2

    Additional context

    https://app.circleci.com/jobs/github/PyTorchLightning/pytorch-lightning/6877

    bug help wanted module: datasets 
    opened by Borda 47
  • [feature request] ROI Pooling layers

    [feature request] ROI Pooling layers

    It would be great to have support for various ROI Pooling operations as easy to add layers to facilitate research in object detection and semantic/instance segmentation.

    Here is a live checklist:

    • [x] ROI Pooling #592 #632
    • [ ] Position Specific ROI Pooling
    • [x] ROI Align #630

    General PRs: #626

    opened by varunagrawal 45
  • Problems training Faster-RCNN from pretrained backbone

    Problems training Faster-RCNN from pretrained backbone

    Is there any recommendation to train Faster-RCNN starting from the pretrained backbone? I'm using VOC 2007 dataset and I'm able to do transfer learning starting from:

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes=21)
    

    Using the COCO-pretrained 'fasterrcnn_resnet50_fpn' I'm able to obtain an mAP of 79% on the VOC 2007 test set. Problems arise when I try to train from scratch using only the pretrained backbone:

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes=21)
    

    I have been trying to train this model for weeks, but the highest mAP I got was 63% (again on the test set).

    Now, I know that training from scratch is harder, but I would really like to know how to set the training parameters to obtain decent accuracy. In the future I may want to change the backbone, and chances are that I will not be able to find a pretrained Faster R-CNN on which I can do transfer learning.

    question module: reference scripts topic: object detection 
    opened by lpuglia 44
  • Add MobileNet V2

    Add MobileNet V2

    This PR adds support for MobileNetV2.

    It's been heavily based on the implementation from https://github.com/pytorch/vision/issues/625 by @tonylins

    I'm currently training the model from scratch with a custom script, and I'll be uploading the weights (together with the training hyperparameters) once training finishes and I match reported accuracies.

    opened by fmassa 44
  • ops.deform_conv2d causes CUDA illegal memory access

    ops.deform_conv2d causes CUDA illegal memory access

    🐛 Bug

    I tried to test the speed of deformable conv2d, but I always encountered a memory error.

    To Reproduce

    $ ipython
    Python 3.8.5 (default, Jul 27 2020, 08:42:51) 
    Type 'copyright', 'credits' or 'license' for more information
    IPython 7.17.0 -- An enhanced Interactive Python. Type '?' for help.
    
    In [1]: import torch
       ...: import torchvision as tv
       ...: weight = torch.randn(9,9,3,3).cuda()
       ...: weight.requires_grad = True
       ...: img = torch.randn(8, 9, 1000, 110).cuda()
       ...: def test():
       ...:     offset = torch.randn(8,18,1000,110).cuda()
       ...:     out = tv.ops.deform_conv2d(img, offset, weight, padding=1)
       ...:     out.mean().backward()
       ...: 
    
    In [2]: import os
    
    In [3]: os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    
    In [4]: timeit test()
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    <ipython-input-4-a1086d7a4706> in <module>
    ----> 1 get_ipython().run_line_magic('timeit', 'test()')
    
    /usr/lib/python3.8/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line, _stack_depth)
       2324                 kwargs['local_ns'] = self.get_local_scope(stack_depth)
       2325             with self.builtin_trap:
    -> 2326                 result = fn(*args, **kwargs)
       2327             return result
       2328 
    
    <decorator-gen-60> in timeit(self, line, cell, local_ns)
    
    /usr/lib/python3.8/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
        185     # but it's overkill for just that one bit of state.
        186     def magic_deco(arg):
    --> 187         call = lambda f, *a, **k: f(*a, **k)
        188 
        189         if callable(arg):
    
    /usr/lib/python3.8/site-packages/IPython/core/magics/execution.py in timeit(self, line, cell, local_ns)
       1171                     break
       1172 
    -> 1173         all_runs = timer.repeat(repeat, number)
       1174         best = min(all_runs) / number
       1175         worst = max(all_runs) / number
    
    /usr/lib/python3.8/timeit.py in repeat(self, repeat, number)
        203         r = []
        204         for i in range(repeat):
    --> 205             t = self.timeit(number)
        206             r.append(t)
        207         return r
    
    /usr/lib/python3.8/site-packages/IPython/core/magics/execution.py in timeit(self, number)
        167         gc.disable()
        168         try:
    --> 169             timing = self.inner(it, self.timer)
        170         finally:
        171             if gcold:
    
    <magic-timeit> in inner(_it, _timer)
    
    <ipython-input-1-a97200bb984a> in test()
          5 img = torch.randn(8, 9, 1000, 110).cuda()
          6 def test():
    ----> 7     offset = torch.randn(8,18,1000,110).cuda()
          8     out = tv.ops.deform_conv2d(img, offset, weight, padding=1)
          9     out.mean().backward()
    
    RuntimeError: CUDA error: an illegal memory access was encountered
    
    In [5]: 
    

    Environment

    PyTorch version: 1.6.0 Is debug build: False CUDA used to build PyTorch: 11.0

    OS: Arch Linux (x86_64) GCC version: (GCC) 10.1.0 Clang version: 10.0.1 CMake version: version 3.18.1

    Python version: 3.8 (64-bit runtime) Is CUDA available: True CUDA runtime version: 11.0.2 GPU models and configuration: GPU 0: GeForce GTX 1050 Ti Nvidia driver version: 450.57 cuDNN version: Probably one of the following: /usr/lib/libcudnn.so.8.0.2 /usr/lib/libcudnn_adv_infer.so.8.0.2 /usr/lib/libcudnn_adv_train.so.8.0.2 /usr/lib/libcudnn_cnn_infer.so.8.0.2 /usr/lib/libcudnn_cnn_train.so.8.0.2 /usr/lib/libcudnn_ops_infer.so.8.0.2 /usr/lib/libcudnn_ops_train.so.8.0.2

    Versions of relevant libraries: [pip3] numpy==1.19.1 [pip3] torch==1.6.0 [pip3] torch-cluster==1.4.5 [pip3] torch-geometric==1.3.2 [pip3] torch-scatter==1.4.0 [pip3] torch-sparse==0.4.3 [pip3] torchvision==0.7.0a0 [conda] Could not collect

    module: ops triage review high priority 
    opened by godspeed1989 40
  • [ONNX] Fix dtype for NonMaxSuppression; misc improvements

    [ONNX] Fix dtype for NonMaxSuppression; misc improvements

    Explicitly cast inputs to NonMaxSuppression to float32 to accommodate float64 inputs because NonMaxSuppression only supports float32 coordinates. This is necessary to unblock https://github.com/pytorch/pytorch/issues/78442 and https://github.com/pytorch/pytorch/pull/86146.

    Other misc improvements include

    • Removed the use of deprecated _cast_Long.
    • Updated constant naming to upper case.
    • Moved symbolic functions to the module scope and cleaned up imports for simplicity.

    cc @BowenBao @mruberry

    cla signed 
    opened by justinchuby 1
  • Torchvision nightly version unavailable for Python 3.8 Linux from December 23

    Torchvision nightly version unavailable for Python 3.8 Linux from December 23

    🐛 Describe the bug

    Torchvision nightly 0.15.0 had a Python 3.8 build for Linux released daily until 12/22, but it has not been released since then. This can be checked on the PyTorch nightly index (https://download.pytorch.org/whl/nightly/cu116/torch_nightly.html). Is there any plan to make this package available for Python 3.8 on Linux?

    To Reproduce:

    1. Go to https://download.pytorch.org/whl/nightly/cu116/torch_nightly.html
    2. Search for torchvision-0.15.0.dev20221226
    3. You will not find linux_x86_64.whl files for Python 3.8 (only Python 3.10 is available for Linux right now).

    Expected behavior

    These files should be available: torchvision-0.15.0.dev20221226%2Bcu116-cp38-cp38-linux_x86_64.whl

    Versions

    Output from a slightly old image (20221213) for building nightly torch from source

    Collecting environment information... PyTorch version: 1.14.0.dev20221213+cu116 Is debug build: False CUDA used to build PyTorch: 11.6 ROCM used to build PyTorch: N/A

    OS: Ubuntu 20.04.5 LTS (x86_64) GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 Clang version: Could not collect CMake version: version 3.23.0 Libc version: glibc-2.31

    Python version: 3.8.15 (default, Nov 24 2022, 15:19:38) [GCC 11.2.0] (64-bit runtime) Python platform: Linux-5.4.0-135-generic-x86_64-with-glibc2.17 Is CUDA available: True CUDA runtime version: 11.6.124 CUDA_MODULE_LOADING set to: LAZY GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1070 GPU 1: NVIDIA GeForce GTX 1070

    Nvidia driver version: 510.108.03 cuDNN version: Probably one of the following: /usr/lib/x86_64-linux-gnu/libcudnn.so.8.4.0 /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.4.0 /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.4.0 /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.4.0 /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.4.0 /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.4.0 /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.4.0 HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: True

    Versions of relevant libraries: [pip3] numpy==1.22.2 [pip3] pytorch-lightning==1.6.3 [pip3] torch==1.14.0.dev20221213+cu116 [pip3] torch-nebula==0.15.9 [pip3] torch-ort==1.14.0.dev20221213 [pip3] torchmetrics==0.7.1 [pip3] torchvision==0.15.0.dev20221213+cu116 [conda] magma-cuda116 2.6.1 1 pytorch [conda] mkl 2021.4.0 pypi_0 pypi [conda] mkl-include 2021.4.0 pypi_0 pypi [conda] numpy 1.22.2 pypi_0 pypi [conda] pytorch-lightning 1.6.3 pypi_0 pypi [conda] torch 1.14.0.dev20221213+cu116 pypi_0 pypi [conda] torch-nebula 0.15.9 pypi_0 pypi [conda] torch-ort 1.14.0.dev20221213 pypi_0 pypi [conda] torchmetrics 0.7.1 pypi_0 pypi [conda] torchvision 0.15.0.dev20221213+cu116 pypi_0 pypi

    opened by ajindal1 0
  • Be able to write tensors as video to a buffer

    Be able to write tensors as video to a buffer

    🚀 The feature

    Right now, you can only use write_video to write a video to disk. There could be an option to return a bytes object instead.
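
    For reference, the current disk-only usage looks like this (a minimal sketch; PyAV needs to be installed):

    import torch
    from torchvision.io import write_video

    frames = torch.randint(0, 255, (16, 64, 64, 3), dtype=torch.uint8)  # (T, H, W, C)
    write_video("clip.mp4", frames, fps=8)  # only a file path on disk is accepted today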

    Motivation, pitch

    When doing data processing, you often want to augment some data and then store it back in a .parquet file (for example).

    Alternatives

    No response

    Additional context

    No response

    opened by vedantroy 0
  • [NOMRG] experiment for 7050

    [NOMRG] experiment for 7050

    Experiment PR for #7050. Here we will modify the .circleci and .github scripts in order to filter out most tests and only run the tests we are interested in for debugging. This way, we can iterate much faster on the testing.

    cla signed 
    opened by YosuaMichael 0
  • request: CUDA 11.8 builds

    request: CUDA 11.8 builds

    🚀 The feature

    CUDA 11.8 adds a number of new features, including support for the new Ada and Hopper architectures. Much like torch has done, torchvision should provide binaries built with CUDA 11.8 to enable use of these new architectures.

    Motivation, pitch

    Given the recent addition of CUDA 11.8 workflows to PyTorch (https://github.com/pytorch/pytorch/pull/90826), it would be nice to have matching binary releases for torchvision.

    Alternatives

    Building from source?

    Additional context

    I do not know the project goals, timelines, or the effort involved in adding a workflow to generate CUDA 11.8 builds, so please feel free to close as appropriate!

    Thank you all for your work :)

    opened by ConnorBaker 1
  • [WIP] Test using real image few flaky models

    [WIP] Test using real image few flaky models

    Recently on the GHA GPU machine, we found that some model tests get precision errors (see logs). Note that this is on main, checked in this PR: https://github.com/pytorch/vision/pull/5009, and only the GPU tests are failing while the CPU tests are okay (this suggests there is now some different randomness or precision on the GPU machine).

    This PR tries to fix that by reducing the sources of randomness and using a real image for the affected models.

    cla signed 
    opened by YosuaMichael 0
Releases
  • v0.14.1(Dec 16, 2022)

  • v0.14.0(Oct 28, 2022)

    Highlights

    [BETA] New Model Registration API

    Following up on the multi-weight support API that was released on the previous version, we have added a new model registration API to help users retrieve models and weights. There are now 4 new methods under the torchvision.models module: get_model, get_model_weights, get_weight, and list_models. Here are examples of how we can use them:

    import torchvision
    from torchvision.models import get_model, get_model_weights, list_models
    
    
    max_params = 5000000
    
    tiny_models = []
    for model_name in list_models(module=torchvision.models):
        weights_enum = get_model_weights(model_name)
        if len([w for w in weights_enum if w.meta["num_params"] <= max_params]) > 0:
            tiny_models.append(model_name)
    
    print(tiny_models)
    # ['mnasnet0_5', 'mnasnet0_75', 'mnasnet1_0', 'mobilenet_v2', ...]
    
    model = get_model(tiny_models[0], weights="DEFAULT")
    print(sum(x.numel() for x in model.state_dict().values()))
    # 2239188
    
    

    As of now, this API is still beta and there might be changes in the future in order to improve its usability based on your feedback.
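
    The fourth method, get_weight, resolves a weights enum entry from its fully qualified name; a small sketch:

    from torchvision.models import get_weight

    weights = get_weight("ResNet50_Weights.IMAGENET1K_V2")
    print(weights.meta["num_params"])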

    New Architecture and Model Variants

    Classification Models

    We’ve added the Swin Transformer V2 architecture along with pre-trained weights for its tiny/small/base variants. In addition, we have added support for the MaxViT transformer. Here is an example on how to use the models:

    import torch
    from torchvision.models import *
    
    image = torch.rand(1, 3, 224, 224)
    model = swin_v2_t(weights="DEFAULT").eval()
    # model = maxvit_t(weights="DEFAULT").eval()
    prediction = model(image)
    

    Here is the table showing the accuracy of the models tested on ImageNet1K dataset.

    | Model | Acc@1 | Acc@1 change over V1 | Acc@5 | Acc@5 change over V1 |
    | --- | --- | --- | --- | --- |
    | swin_v2_t | 82.072 | +0.598 | 96.132 | +0.356 |
    | swin_v2_s | 83.712 | +0.516 | 96.816 | +0.456 |
    | swin_v2_b | 84.112 | +0.530 | 96.864 | +0.224 |
    | maxvit_t | 83.700 | - | 96.722 | - |

    We would like to thank Ren Pang and Teodor Poncu for contributing the 2 models to torchvision.

    [BETA] Video Classification Model

    We added two new video classification models, MViT and S3D. MViT is a state of the art video classification transformer model which has 80.757% accuracy on Kinetics400 dataset, while S3D is a relatively small model with good accuracy for its size. These models can be used as follows:

    import torch
    from torchvision.models.video import *
    
    video = torch.rand(1, 3, 16, 224, 224)  # (batch, channels, frames, height, width)
    model = mvit_v2_s(weights="DEFAULT")
    # model = s3d(weights="DEFAULT")
    model.eval()
    prediction = model(video)
    

    Here is the table showing the accuracy of the new video classification models tested in the Kinetics400 dataset.

    | Model | Acc@1 | Acc@5 |
    | --- | --- | --- |
    | mvit_v1_b | 81.474 | 95.776 |
    | mvit_v2_s | 83.196 | 96.36 |
    | s3d | 83.582 | 96.64 |

    We would like to thank Haoqi Fan, Yanghao Li, Christoph Feichtenhofer and Wan-Yen Lo for their work on PyTorchVideo and their support during the development of the MViT model. We would like to thank Sophia Zhi for her contribution implementing the S3D model in torchvision.

    New Primitives & Augmentations

    In this release we’ve added the SimpleCopyPaste augmentation in our reference scripts and we up-streamed the PolynomialLR scheduler to PyTorch Core. We would like to thank Lezwon Castelino and Federico Pozzi for their contributions. We are continuing our efforts to modernize TorchVision by adding more SoTA primitives, Augmentations and architectures with the help of our community. If you are interested in contributing, have a look at the following issue.
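
    For example, the up-streamed scheduler can now be used directly from PyTorch core (a minimal sketch, assuming a PyTorch version that ships torch.optim.lr_scheduler.PolynomialLR):

    import torch

    model = torch.nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    scheduler = torch.optim.lr_scheduler.PolynomialLR(optimizer, total_iters=30, power=1.0)

    for epoch in range(30):
        # ... one training epoch would go here ...
        optimizer.step()
        scheduler.step()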

    Upcoming Prototype APIs

    We are currently working on extending our existing Transforms and Functional API to provide native support for Video, Object Detection, and Semantic and Instance Segmentation. This will enable us to offer better support for the existing Computer Vision tasks and make SoTA augmentations such as MixUp, CutMix, Large Scale Jitter and SimpleCopyPaste importable from the TorchVision binary. The API is still under development and thus was not included in the release, but you can read more about it on our blogpost and provide your feedback on the dedicated Github issue.

    Backward Incompatible Changes

    We’ve removed some APIs that have been deprecated since version 0.12 (or before). Here is the list of things that we removed and their replacement:

    • The Kinetics400 class has been removed. Users must now use the newer Kinetics class which is a direct replacement.
    • The class _DeprecatedConvBNAct, ConvBNReLU, and ConvBNActivation were removed from torchvision.models.mobilenetv2 and are replaced with the more generic Conv2dNormActivation class.
    • The torchvision.models.mobilenetv3.SqueezeExcitation has been removed in favor of torchvision.ops.SqueezeExcitation.
    • The class methods convert_to_roi_format, infer_scale, setup_scales from torchvision.ops.MultiScaleRoiAlign have been removed.
    • We have removed the resample and fillcolor parameters from the Transforms API. They have been replaced with interpolation and fill respectively.
    • We’ve removed the range parameter from torchvision.utils.make_grid as it was replaced by the value_range parameter to avoid shadowing the Python built-in (see the migration sketch after this list).
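
    A small migration sketch for the last two items (hypothetical values, shown only to illustrate the renamed parameters):

    import torch
    import torchvision.transforms as T
    from torchvision.utils import make_grid

    imgs = torch.rand(4, 3, 32, 32)

    # Before (removed): make_grid(imgs, normalize=True, range=(0, 1))
    grid = make_grid(imgs, normalize=True, value_range=(0, 1))

    # Before (removed): T.RandomAffine(degrees=10, resample=..., fillcolor=0)
    affine = T.RandomAffine(degrees=10, interpolation=T.InterpolationMode.BILINEAR, fill=0)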

    Detailed Changes (PRs)

    Deprecations

    [models] Remove cpp model in v0.14 due to deprecation (#6632) [utils, ops, transforms, models, datasets] Remove deprecated APIs for 0.14 (#6258)

    New Features

    [datasets] Add various Stereo Matching datasets (#6345, #6346, #6311, #6347, #6349, #6348, #6350, #6351) [models] Add the S3D architecture to TorchVision (#6412, #6537) [models] add crestereo implementation (#6310, #6629) [models] MaxVit model (#6342) [models] Make get_model_builder public (#6560) [models] Add registration mechanism for models (#6333, #6369) [models] Add MViT architecture in TorchVision for both V1 and V2 (#6198, #6373) [models] Add SwinV2 mode variant (#6246, #6266) [reference scripts] Add stereo matching reference scripts (#6549, #6554, #6605) [transforms] Added elastic transform in torchvision.transforms (#4938) [build] Add M1 binary builds (#5948, #6135, #6140, #6110, #6132, #6324, #6122, #6409)

    Improvements

    [build] Various torchvision binary build improvements (#6396, #6201, #6230, #6199) [build] Install NVJPEG on Windows for 11.6 and 11.7 CUDA (#6578) [models] Change weights return type to Mapping in models api (#6097) [models] Vectorize box decoding and encoding in FCOS (#6203, #6278) [ci] Add CUDA 11.7 builds (#6425) [ci] Various CI improvements (#6590, #6290, #6170, #6218) [documentation] Various documentations improvements (#6276, #6163, #6450, #6294, #6572, #6176, #6340, #6314, #6427, #6536, #6215, #6150) [documentation] Add new .. betastatus:: directive and document Beta APIs (#6115) [hub] Expose on Hub the public methods of the registration API (#6364) [io, documentation] DOC: add limitation of decode_jpeg in the function docstring (#6637) [models] Make the assert message more verbose in vision transformer (#6583) [ops] Generalize ConvNormActivation function to accept tuple for some parameters (#6251) [reference scripts] Update the dataset cache to factor input parameters (#6234) [reference scripts] Adding video level accuracy for video_classification reference script (#6241) [reference scripts] refactor: replace LambdaLR with PolynomialLR in segmentation training script (#6405, #6436) [reference scripts, documentation] Introduce resize params, fix lr estimation, update docs. (#6444) [reference scripts, transforms] Add SimpleCopyPaste augmentation (#5825) [rocm, ci] Update to rocm5.2 wheels (#6571) [tests] Various tests improvements (#6601, #6380, #6497, #6248, #6660, #6027, #6226, #6594, #6747, #6272) [tests] Skip big models on CI tests (#6539, #6197, #6573) [transforms] Added antialias arg to resized crop transform and op (#6193) [transforms] Refactored and modified private api for resize functional op (#6191) [utils] Throw ValueError in draw bounding boxes for invalid boxes (#6123) [utils] Extend _log_api_usage_once to work for overwritten classes (#6237) [video] Add more logging information for decoder (#6108) [video] [FBcode->GH] Handle images with AV_PIX_FMT_PAL8 pixel format in decoder callback (#6359) [io] Add an option to skip packets with empty data (#6442) [datasets] Put back CelebA download (#6147) [datasets, tests] Update link to download SBU dataset. Enable the test again (#6638)

    Bug Fixes

    [build] Set MACOSX_DEPLOYMENT_TARGET=10.9 for binary jobs (#6298) [ci] Fixing issue with setup_env.sh in docker (#6106) [datasets] swap MD5 checksums of PCAM val and test split (#6644) [documentation] fix example galleries in documentation (#6701) [hub] Add missing resnext101_64x4d to hubconf.py (#6228) [io] Fix out-of-bounds read in decode_png (#6456) [models] Fix swapped width and height in DefaultBoxGenerator (#6551) [models] Fix the error message of _ovewrite_value_param (#6585) [models] Add missing handle_legacy_interface() calls (#6565) [models] Fix resnet model by checking if norm_layer weight is None before init (#6082) [models] Adding _log_api_usage_once to Swin's reusable components. (#6174) [models] Move out the pad operation from PatchMerging in swin transformer to make it fx compatible (#6252) [models] Add missing _version to the MLPBlock (#6113) [ops] Fix d/c IoU for different batch sizes (#6338) [ops] update roipool to make it torch fx traceable (#6501) [ops] Fix typing jit issue on RoIPool and RoIAlign (#6397) [reference scripts] Fix copypaste collate pickle issues (#6181) [reference scripts] Remove the unused/buggy --train-center-crop flag from Classification preset (#6642) [tests] Add .float() before .mean() on test_backbone_utils.py because .mean() dont accept integer dtype (#6090) [transforms] Update pil_constants.py (#6154) [transforms] Fixed issue with F.crop when cropping outside the input image (#6615) [transforms] Bugfix for accimage test on functional_pil.resize image (#6208) [transforms] Fixed error condition in RandomCrop (#6548) [video] [FBcode->GH] Move func calls outside of *CHECK* in io decoder (#6357) [video] [bugfix] Fix the output format for VideoClips.subset (#6700) (#6706) [video] fix bug in output format for pyav (#6672) (#6703) [ci] Fix for cygpath windows issue (#6513) [ops] Replaced CHECK_ by TORCH_CHECK_ (#6322) [build] Fix typo in GHA nightly build condition (#6158)

    Code Quality

    [ci, test] Improvements on CI and test code quality (#6413, #6303, #6652, #6360, #6493, #6146, #6593, #6297, #6678, #6389) [ci] Upgrade usort to 1.0.2 and black to 22.3.0 (#5106) [reference scripts] [FBcode->GH] Rename asset files to remove spaces. (#6666) [build] fix submodule imports by importing functions directly (#6188) [datasets] Simplify _check_integrity for cifar and stl10 (#6335) [datasets] Moved pfm file reading into dataset utils (#6270) [documentation] Docs: build with Sphinx 5 (#5121) [models] Typo fix in comment in mvit.py (#6618) [models] cleanup for box encoding and decoding in FCOS (#6277) [ops] Remove AffineQuantizer.h from qnms_kernel.cpp (#6141) [reference scripts] Type fix in transformers.py (#6376) [transforms] Fix typo in error message (#6291) [transforms] Update typehint for fill arg in rotate (#6594) [io] Free avPacket on EAGAIN decoder error (#6432) (#6443) [android] [pytorch] Bump SoLoader version to 0.10.4 (#81946) (#6327) [transforms, reference script] port FixedSizeCrop from detection references to prototype transforms (#6417) [models, transforms] Update the expected removal date for several deprecated API for release v0.14 (#6654) [tests] Replace torch.utils.data.graph.traverse with traverse_dps (#6657) [build] Replacing cudatoolkit by cuda for 11.6 (#5996) [ops] [FBcode->GH] [quant][core][better-engineering] Rename files in quantized directory… (#6133) [build] [BE] Unify version computation (#6117) [models] Refactor swin transfomer so later we can reuse component for 3d version (#6088) [models] [FBcode->GH] Fix vit model assert message to be compatible with torchmultimodal test (#6592)

    Contributors

    We're grateful for our community, which helps us improving torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

    Abhijit Deo, Adam J. Stewart, Aditya Oke, Alexander Jipa, Ambuj Pawar, Andrey Talman, dzdang, Edward Wang (EcoF), Eli Uriegas, Erjia Guan, Federico Pozzi, inisis, Jithun Nair, Joao Gomes, Karan Desai, Kevin Tse, Lenz, Lezwon Castelino, Mayanand, Nicolas Granger, Nicolas Hug, Nikita Shulga, Oleksandr Voietsa, Philip Meier, Ponku, ptrblck, Sergii Dymchenko, Sergiy Bilobrov, Shantanu, Sim Sun, Sophia Zhi, Tinson Lai, Vasilis Vryniotis, vcarpani, vcwai, vfdev-5, Yakhyokhuja Valikhujaev, Yosua Michael Maranatha, Zachariah Carmichael, キツネさん

  • v0.13.1(Aug 5, 2022)

    This minor release bumps the pinned PyTorch version to v1.12.1 and contains some minor bug fixes.

    Highlights

    Bug Fixes

    • Small Patch SwinTransformer for FX compatibility https://github.com/pytorch/vision/pull/6252
    • Indicate strings can be used to specify weights parameter https://github.com/pytorch/vision/pull/6314
    • Fix d/c IoU for different batch sizes https://github.com/pytorch/vision/pull/6338
  • v0.13.0(Jun 28, 2022)

    Highlights

    Models

    Multi-weight support API

    TorchVision v0.13 offers a new Multi-weight support API for loading different weights to the existing model builder methods:

    from torchvision.models import *
    
    # Old weights with accuracy 76.130%
    resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
    
    # New weights with accuracy 80.858%
    resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
    
    # Best available weights (currently alias for IMAGENET1K_V2)
    # Note that these weights may change across versions
    resnet50(weights=ResNet50_Weights.DEFAULT)
    
    # Strings are also supported
    resnet50(weights="IMAGENET1K_V2")
    
    # No weights - random initialization
    resnet50(weights=None)
    

    The new API bundles along with the weights important details such as the preprocessing transforms and meta-data such as labels. Here is how to make the most out of it:

    from torchvision.io import read_image
    from torchvision.models import resnet50, ResNet50_Weights
    
    img = read_image("test/assets/encode_jpeg/grace_hopper_517x606.jpg")
    
    # Step 1: Initialize model with the best available weights
    weights = ResNet50_Weights.DEFAULT
    model = resnet50(weights=weights)
    model.eval()
    
    # Step 2: Initialize the inference transforms
    preprocess = weights.transforms()
    
    # Step 3: Apply inference preprocessing transforms
    batch = preprocess(img).unsqueeze(0)
    
    # Step 4: Use the model and print the predicted category
    prediction = model(batch).squeeze(0).softmax(0)
    class_id = prediction.argmax().item()
    score = prediction[class_id].item()
    category_name = weights.meta["categories"][class_id]
    print(f"{category_name}: {100 * score:.1f}%")
    

    You can read more about the new API in the docs. To provide your feedback, please use this dedicated Github issue.

    New architectures and model variants

    Classification

    The Swin Transformer and EfficientNetV2 are two popular classification models which are often used for downstream vision tasks. This release includes 6 pre-trained weights for their classification variants. Here is how to use the new models:

    import torch
    from torchvision.models import *
    
    image = torch.rand(1, 3, 224, 224)
    model = swin_t(weights="DEFAULT").eval()
    prediction = model(image)
    
    image = torch.rand(1, 3, 384, 384)
    model = efficientnet_v2_s(weights="DEFAULT").eval()
    prediction = model(image)
    

    In addition to the above, we also provide new variants for existing architectures such as ShuffleNetV2, ResNeXt and MNASNet. The accuracies of all the new pre-trained models obtained on ImageNet-1K are seen below:

    | Model | Acc@1 | Acc@5 |
    | --- | --- | --- |
    | swin_t | 81.474 | 95.776 |
    | swin_s | 83.196 | 96.36 |
    | swin_b | 83.582 | 96.64 |
    | efficientnet_v2_s | 84.228 | 96.878 |
    | efficientnet_v2_m | 85.112 | 97.156 |
    | efficientnet_v2_l | 85.808 | 97.788 |
    | resnext101_64x4d | 83.246 | 96.454 |
    | resnext101_64x4d (quantized) | 82.898 | 96.326 |
    | shufflenet_v2_x1_5 | 72.996 | 91.086 |
    | shufflenet_v2_x1_5 (quantized) | 72.052 | 90.700 |
    | shufflenet_v2_x2_0 | 76.230 | 93.006 |
    | shufflenet_v2_x2_0 (quantized) | 75.354 | 92.488 |
    | mnasnet0_75 | 71.180 | 90.496 |
    | mnasnet1_3 | 76.506 | 93.522 |

    We would like to thank Hu Ye for contributing to TorchVision the Swin Transformer implementation.

    [BETA] Object Detection and Instance Segmentation

    We have introduced 3 new model variants for RetinaNet, FasterRCNN and MaskRCNN that include several post-paper architectural optimizations and improved training recipes. All models can be used similarly:

    import torch
    from torchvision.models.detection import *
    
    images = [torch.rand(3, 800, 600)]
    model = retinanet_resnet50_fpn_v2(weights="DEFAULT")
    # model = fasterrcnn_resnet50_fpn_v2(weights="DEFAULT")
    # model = maskrcnn_resnet50_fpn_v2(weights="DEFAULT")
    model.eval()
    prediction = model(images)
    

    Below we present the metrics of the new variants on COCO val2017. In parenthesis we denote the improvement over the old variants:

    | Model | Box mAP | Mask mAP |
    | --- | --- | --- |
    | retinanet_resnet50_fpn_v2 | 41.5 (+5.1) | - |
    | fasterrcnn_resnet50_fpn_v2 | 46.7 (+9.7) | - |
    | maskrcnn_resnet50_fpn_v2 | 47.4 (+9.5) | 41.8 (+7.2) |

    We would like to thank Ross Girshick, Piotr Dollar, Vaibhav Aggarwal, Francisco Massa and Hu Ye for their past research and contributions to this work.

    New pre-trained weights

    SWAG weights

    The ViT and RegNet model variants offer new pre-trained SWAG (Supervised Weakly from hashtAGs) weights. One of the biggest of these models achieves a whopping 88.6% accuracy on ImageNet-1K. We currently offer two versions of the weights: 1) fine-tuned end-to-end weights on ImageNet-1K (highest accuracy) and 2) frozen trunk weights with a linear classifier fit on ImageNet-1K (great for transfer learning). Below we see the detailed accuracies of each model variant:

    | Model Weights | Acc@1 | Acc@5 |
    | --- | --- | --- |
    | RegNet_Y_16GF_Weights.IMAGENET1K_SWAG_E2E_V1 | 86.012 | 98.054 |
    | RegNet_Y_16GF_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 83.976 | 97.244 |
    | RegNet_Y_32GF_Weights.IMAGENET1K_SWAG_E2E_V1 | 86.838 | 98.362 |
    | RegNet_Y_32GF_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 84.622 | 97.48 |
    | RegNet_Y_128GF_Weights.IMAGENET1K_SWAG_E2E_V1 | 88.228 | 98.682 |
    | RegNet_Y_128GF_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 86.068 | 97.844 |
    | ViT_B_16_Weights.IMAGENET1K_SWAG_E2E_V1 | 85.304 | 97.65 |
    | ViT_B_16_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 81.886 | 96.18 |
    | ViT_L_16_Weights.IMAGENET1K_SWAG_E2E_V1 | 88.064 | 98.512 |
    | ViT_L_16_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 85.146 | 97.422 |
    | ViT_H_14_Weights.IMAGENET1K_SWAG_E2E_V1 | 88.552 | 98.694 |
    | ViT_H_14_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 85.708 | 97.73 |

    The weights can be loaded normally as follows:

    from torchvision.models import *
    
    model1 = vit_h_14(weights="IMAGENET1K_SWAG_E2E_V1")
    model2 = vit_h_14(weights="IMAGENET1K_SWAG_LINEAR_V1")
    

    The SWAG weights are released under the Attribution-NonCommercial 4.0 International license. We would like to thank Laura Gustafson, Mannat Singh and Aaron Adcock for their work and support in making the weights available to TorchVision.

    Model Refresh

    The release of the Multi-weight support API enabled us to refresh the most popular models and offer more accurate weights. We improved on average each model by ~3 points. The new recipe used was learned on top of ResNet50 and its details were covered on a previous blogpost.

    | Model | Old weights | New weights |
    | --- | --- | --- |
    | efficientnet_b1 | 78.642 | 79.838 |
    | mobilenet_v2 | 71.878 | 72.154 |
    | mobilenet_v3_large | 74.042 | 75.274 |
    | regnet_y_400mf | 74.046 | 75.804 |
    | regnet_y_800mf | 76.42 | 78.828 |
    | regnet_y_1_6gf | 77.95 | 80.876 |
    | regnet_y_3_2gf | 78.948 | 81.982 |
    | regnet_y_8gf | 80.032 | 82.828 |
    | regnet_y_16gf | 80.424 | 82.886 |
    | regnet_y_32gf | 80.878 | 83.368 |
    | regnet_x_400mf | 72.834 | 74.864 |
    | regnet_x_800mf | 75.212 | 77.522 |
    | regnet_x_1_6gf | 77.04 | 79.668 |
    | regnet_x_3_2gf | 78.364 | 81.196 |
    | regnet_x_8gf | 79.344 | 81.682 |
    | regnet_x_16gf | 80.058 | 82.716 |
    | regnet_x_32gf | 80.622 | 83.014 |
    | resnet50 | 76.13 | 80.858 |
    | resnet50 (quantized) | 75.92 | 80.282 |
    | resnet101 | 77.374 | 81.886 |
    | resnet152 | 78.312 | 82.284 |
    | resnext50_32x4d | 77.618 | 81.198 |
    | resnext101_32x8d | 79.312 | 82.834 |
    | resnext101_32x8d (quantized) | 78.986 | 82.574 |
    | wide_resnet50_2 | 78.468 | 81.602 |
    | wide_resnet101_2 | 78.848 | 82.51 |

    We would like to thank Piotr Dollar, Mannat Singh and Hugo Touvron for their past research and contributions to this work.

    Ops and Transforms

    New Augmentations, Layers and Losses

    This release brings a bunch of new primitives which can be used to produce SOTA models. Some highlights include the addition of AugMix data-augmentation method, the DropBlock layer, the cIoU/dIoU loss and many more. We would like to thank Aditya Oke, Abhijit Deo, Yassine Alouini and Hu Ye for contributing to the project and for helping us maintain TorchVision relevant and fresh.
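
    For instance, two of the new primitives can be used as follows (a small sketch; exact defaults may differ):

    import torch
    import torchvision.transforms as T
    from torchvision.ops import drop_block2d

    # AugMix as a regular transform on uint8 image tensors (or PIL images)
    augmix = T.AugMix()
    img = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)
    augmented = augmix(img)

    # DropBlock regularization applied to a feature map
    features = torch.rand(2, 16, 32, 32)
    regularized = drop_block2d(features, p=0.3, block_size=5, training=True)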

    Documentation

    We completely revamped our models documentation to make them easier to browse, and added various key information such as supported image sizes, or image pre-processing steps of pre-trained weights. We now have a main model page with various summary tables of available weights, and each model has a dedicated page. Each model builder is also documented in their own page, with more details about the available weights, including accuracy, minimal image size, link to training recipes, and other valuable info. For comparison, our previous models docs are here. To provide feedback on the new documentation, please use the dedicated Github issue.

    Backward-incompatible changes

    The new Multi-weight support API replaced the legacy “pretrained” parameter of model builders. Both solutions are currently supported to maintain backwards compatibility but our intention is to remove the deprecated API in 2 versions. Migrating to the new API is very straightforward. The following method calls between the 2 APIs are all equivalent:

    from torchvision.models import resnet50, ResNet50_Weights
    
    # Using pretrained weights:
    resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
    resnet50(weights="IMAGENET1K_V1")
    resnet50(pretrained=True)  # deprecated
    resnet50(True)  # deprecated
    
    # Using no weights:
    resnet50(weights=None)
    resnet50()
    resnet50(pretrained=False)  # deprecated
    resnet50(False)  # deprecated
    

    Deprecations

    [models, models.quantization] Reinstate and deprecate model_urls and quant_model_urls (#5992) [transforms] Deprecate int as interpolation argument type (#5974)

    New Features

    [models] New Multi-weight API support (#5618, #5859, #6047, #6026, #5848) [models] Adding Swin Transformer architecture (#5491) [models] Adding EfficientNetV2 architecture (#5450) [models] Adding detection model improved weights: RetinaNet, MaskRCNN, FasterRCNN (#5756, #5773, #5763) [models] Adding classification model weight: resnext101 64x4d, mnasnet0_75, mnasnet1_3 (#5935, #6019) [models] Add SWAG model pretrained weights (#5714, #5722, #5732, #5793, #5721) [ops] AddingIoU loss function variants: DIoU, CIoU (#5786, #5776) [ops] Adding various ops and test for ops (#6053, #5416, #5792, #5783) [transforms] Adding AugMix transforms implementation (#5411) [reference scripts] Support custom weight decay setting in classification reference script (#5671) [transforms, reference scripts] Improve detection reference script: Scale Jitter, RandomShortestSize, FixedSizeCrop (#5435, #5610, #5607) [ci] Add M1 support : (#6167) [ci] Add Python-3.10 (build and test) (#5420)

    Improvements

    [documentation] Complete new revamp of models documentation (#5821, #5876, #5899, #6025, #5885, #5884, #5886, #5891, #6023, #6009, #5852, #5831, #5832, #6003, #6013, #5856, #6004, #6005, #5878, #6012, #5894, #6002, #5854, #5864, #5920, #5869, #5871, #6021, #6006, #6016, #5905, #6028, #5915, #5924, #5977, #5918, #5921, #5934, #5936, #5937, #5933, #5949, #5988, #5962, #5963, #5975, #5900, #5917, #5895, #5901, #6033, #6032, #6030, #5904, #5661, #6035, #6049, #6036, #5908, #5907, #6044, #6039, #5874, #6151) [documentation] Various documentation improvements (#5695, #5930, #5814, #5799, #5827, #5796, #5923, #5599, #5554, #5995, #5457, #6163, #6031, #6000, #5847, #6024)) [documentation] Add warnings in docs to document Beta APIs (#6115) [datasets] improve GDrive downloads (#5704, #5645) [datasets] indicate md5 checksum is not used for security (#5717) [models] Add shufflenetv2 1.5 and 2.0 weights (#5906) [models] Reduce unnecessary cuda sync in anchor_utils.py (#5515) [models] Adding improved MobileNetV2 weights (#5560) [models] Remove (N, T, H, W, C) => (N, T, C, H, W) from presets (#6058) [models] add swin_s and swin_b variants and improved swin_t (#6048) [models] Update ShuffleNetV2 annotations for x1_5 and x2_0 variants (#6022) [models] Better error message in ViT (#5820) [models, ops] Add private support for ciou and diou (#5984, #5685, #5690) [models, reference scripts] Various improvements to detection recipe and models (#5715, #5444) [transforms, tests] add functional vertical flip tests on segmentation mask (#5860) [transforms] make _max_value jit-scriptable (#5623) [transforms] Make ScaleJitter proportional (#5559) [transforms] add tensor kernels for normalize and erase (#5462) [transforms] Update transforms following PIL deprecation (#5898) [transforms, models, datasets…] Replace asserts with exceptions (#5587, #5659) [utils] add warning if font is not set in draw_bounding_boxes (#5785) [utils] Throw warning for empty masks or box tensors on draw_segmentation_masks and draw_bounding_boxes (#5857) [video] Add output_format do video datasets and readers (#6061) [video, io] Better compatibility with FFMPEG 5.0 (#5644) [video, io] Allow cuda device to be passed without the index for GPU decoding (#5505) [reference scripts] Simplify EMA to use Pytorch's update_parameters (#5469) [reference scripts] Reduce variance of evaluation in reference (#5819) [reference scripts] Various improvements to RAFT training reference (#5590) [tests] Speed up Model tests by 20% (#5574) [tests] Make test suite fail on unexpected test success (#5556) [tests] Skip big model in test to reduce memory usage in CI (#5903, #5902) [tests] Improve test of backbone utils (#5552) [tests] Validate against expected files on videos (#6077) [ci] Support for CUDA 11.6 (#5803, 5862) [ci] pre-download model weights in CI docs build (#5625)

    Bug Fixes

    [transforms] remove option to pass fill as str in transforms (#5632) [transforms] Better handling for Pad's fill argument (#5596) [transforms] [FBcode->GH] Fix accimage tests (#5545) [transforms] Update _pil_constants.py (#6154) (#6156) [transforms] Fix resize transform when size == small_edge_size and max_size isn't None (#5409) [transforms] Fixed rotate transform with expand inconsistency (#5677) [transforms] Fixed upstream issue with padding (#5875) [transforms] Fix functional.adjust_gamma (#5427) [models] Respect strict=False when loading detection models (#5841) [models] Fix resnet norm initialization (#6082) (#6085) [models] Use frozen BN only if pre-trained for detection models. (#5443) [models] fix fcos gtarea calculation (#5816) [models, onnx] Add topk min function for trace and onnx (#5310) [models, tests] fix mobilnet norm layer test (#5643) [reference scripts] Fix regression on Detection training script (#5985) [datasets] do not re-download from GDrive if file is already present (#5805) [datasets] Fix datasets: kinetics, Flowers102, VOC_2009, INaturalist 2021_train, caltech (#5578, #5775, #5425, #5844, #5789) [documentation] Fixes device mismatch issue while building docs (#5428) [documentation] Fix Accuracy meta-data on shufflenetv2 (#5896) [documentation] fix typo in docstrings of some transforms (#5609) [video, documentation] Fix append of audio_pts (#5488) [io, tests] More robust check in tests for 16 bits images (#5652) [video, io] Fix shape mismatch error in video reader (#5489) [io] Address nvjpeg leak on CUDA < 11.6 issue (#5713, #5482) [ci] Fixing issue with setup_env.sh in docker: resolve "unsafe directory" error (#6106) (#6109) [ci] fix documentation version problems when new release is tagged (#5583) [ci] Replace jcenter and fix version for android (#6046) [tests] Add .float() before .mean() on test_backbone_utils.py because .mean() dont accept integer dtype (#6090) (#6091) [tests] Fix keypointrcnn_resnet50_fpn flaky test (#5911) [tests] Disable test_encode|write_jpeg_reference tests (#5910) [mobile] Bump up LibTorchvision version number for Podspec to release Cocoapods (#5624) [feature extraction] Add default tracer args for model feature extraction function (#5637) [build] Fix libtorchvision.so not able to encode images by adding *_FOUND macros to CMakeLists.txt (#5547)

    Code Quality

    [dataset, models] Better deprecation message for voc2007 and SqueezeExcitation (#5391) [datasets, reference scripts] Use Kinetics instead of Kinetics400 in references (#5787) (#5952) [models] CleanUp DenseNet code (#5966) [models] Minor Swin Transformer fixes (#6054) [models, onnx] Use onnx function only in tracing mode (#5468) [models] Refactor swin transfomer so later we can reuse component for 3d version (#6088) (#6100) [models, tests] Fix minor issues with model tests. (#5576) [transforms] Remove to_tensor() and ToTensor() usages (#5553) [transforms] Refactor Augmentation Space calls to speed up. (#5402) [transforms] Recoded _max_value method using a dictionary (#5566) [transforms] Replace get_image_size/num_channels with get_dimensions (#5487) [ops] Replace usages of atomicAdd with gpuAtomicAdd (#5823) [ops] Fix unused variable warning in ps_roi_align_kernel.cu (#5408) [ops] Remove custom ops interpolation with antialiasing (#5329) [ops] Move Permute layer to ops. (#6055) [ops] Remove assertions for generalized_box_iou (#5691) [utils] Moving sequence_to_str to torchvision._utils (#5604) [utils] Clarify TypeError message in make_grid (#5997) [video, io] replace distutils.spawn with shutil.which per PEP632 in setup script (#5849) [video, io] Move VideoReader out of init (#5495) [video, io] Remove unnecessary initialisation in GPUDecoder (#5507) [video, io] Remove unused member variable and argument in GPUDecoder (#5499) [video, io] Improve test_video_reader (#5498) [video, io] Update private attribute name for readability (#5484) [video, tests] Improve test_videoapi (#5497) [reference scripts] Minor updates to optical flow ref for consistency (#5654) [reference scripts] Add barrier() after init_process_group() (#5475) [ci] Delete stale packaging scripts (#5433) [ci] remove explicit install of Pillow throughout CI (#5950) [ci, test] remove unnecessary pytest install (#5739) [ci, tests] Remove unnecessary PYTORCH_TEST_WITH_SLOW env (#5631) [ci] Add .git-blame-ignore-revs to ignore specific commits in git blame (#5696) [ci] Remove CUDA 11.1 support (#5477, #5470, #5451, #5978) [ci] Minor linting improvement (#5880) [ci] Remove Bandit and CodeQL jobs (#5734) [ci] Various type annotation fixes / issues (#5598, #5970, #5943)

    Contributors

    We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

    Abhijit Deo, Aditya Oke, Andrey Talman, Anton Thomma, Behrooz, Bruno Korbar, Daniel Angelov, Dbhasin1, Drishti Bhasin, F-G Fernandez, Federico Pozzi, FG Fernandez, Georg Grab, Gouvernathor, Hu Ye, Jeffery (Zeyu) Zhao, Joao Gomes, kaijieshi, Kazuki Adachi, KyleCZH, kylematoba, LEGRAND Matthieu, Lezwon Castelino, Luming Tang, Matti Picus, Nicolas Hug, Nikita, Nikita Shulga, oxabz, Philip Meier, Prabhat Roy, puhuk, Richard Barnes, Sahil Goyal, satojkovic, Shijie, Shubham Bhokare, talregev, tcmyxc, Vasilis Vryniotis, vfdev, WuZhe, XiaobingZhang, Xu Zhao, Yassine Alouini, Yonghye Kwon, YosuaMichael, Yulv-git, Zhiqiang Wang

    Source code(tar.gz)
    Source code(zip)
  • v0.12.0(Mar 10, 2022)

    Highlights

    New Models

    Four new model families have been released in the latest version along with pre-trained weights for their variants: FCOS, RAFT, Vision Transformer (ViT) and ConvNeXt.

    Object Detection

    FCOS is a popular, fully convolutional, anchor-free model for object detection. In this release we include a community-contributed model implementation as well as pre-trained weights. The model was trained on COCO train2017 and can be used as follows:

    import torch
    from torchvision import models
    
    x = [torch.rand(3, 224, 224)]
    fcos = models.detection.fcos_resnet50_fpn(pretrained=True).eval()
    predictions = fcos(x)
    

    The box AP of the pre-trained model on COCO val2017 is 39.2 (see #4961 for more details).

    We would like to thank Hu Ye and Zhiqiang Wang for contributing to the model implementation and initial training. This was the first community-contributed model in a long while, and given its success, we decided to use the learnings from this process and create new model contribution guidelines.

    Optical Flow support and RAFT model

    Torchvision now supports optical flow! Optical flow models try to predict movement in a video: given two consecutive frames, the model predicts where each pixel of the first frame ends up in the second frame. Check out our new tutorial on Optical Flow!

    We implemented a torchscript-compatible RAFT model with pre-trained weights (both normal and “small” versions), and added support for training and evaluating optical flow models. Our training scripts support distributed training across processes and nodes, leading to much faster training time than the original implementation. We also added 5 new optical flow datasets: Flying Chairs, Flying Things, Sintel, Kitti, and HD1K.
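
    A minimal sketch of running the pre-trained RAFT model on a pair of frames (the random inputs and the normalization to [-1, 1] are illustrative assumptions; RAFT returns one flow estimate per refinement iteration):

    import torch
    from torchvision.models.optical_flow import raft_large
    
    # two consecutive frames, batched and normalized to [-1, 1] (illustrative random data)
    frame1 = torch.rand(1, 3, 224, 224) * 2 - 1
    frame2 = torch.rand(1, 3, 224, 224) * 2 - 1
    
    model = raft_large(pretrained=True).eval()
    with torch.no_grad():
        flow_predictions = model(frame1, frame2)  # list of flow estimates, one per iteration
    predicted_flow = flow_predictions[-1]  # final (N, 2, H, W) flow field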

    raft

    Image Classification

    Vision Transformer (ViT) and ConvNeXt are two popular architectures which can be used as image classifiers or as backbones for downstream vision tasks. In this release we include 8 pre-trained weights for their classification variants. The models were trained on ImageNet and can be used as follows:

    import torch
    from torchvision import models
    
    x = torch.rand(1, 3, 224, 224)
    vit = models.vit_b_16(pretrained=True).eval()
    convnext = models.convnext_tiny(pretrained=True).eval()
    predictions1 = vit(x)
    predictions2 = convnext(x)
    

    The accuracies of the pre-trained models obtained on ImageNet val are seen below:

    | Model          | Acc@1  | Acc@5  |
    | ---            | ---    | ---    |
    | vit_b_16       | 81.072 | 95.318 |
    | vit_b_32       | 75.912 | 92.466 |
    | vit_l_16       | 79.662 | 94.638 |
    | vit_l_32       | 76.972 | 93.07  |
    | convnext_tiny  | 82.52  | 96.146 |
    | convnext_small | 83.616 | 96.65  |
    | convnext_base  | 84.062 | 96.87  |
    | convnext_large | 84.414 | 96.976 |

    The above models have been trained using an adjusted version of our new training recipe, which allows us to offer models with accuracies significantly higher than those reported in the original papers.

    GPU Video Decoding

    In this release, we add support for GPU video decoding in the video reading API. To use hardware-accelerated decoding, we just need to pass a cuda device to the video reading API as shown below:

    import torchvision
    
    reader = torchvision.io.VideoReader(file_name, device='cuda:0')
    for frame in reader:
        print(frame)
    

    We also support seeking to any frame or a keyframe in the video before reading, as shown below:

    reader.seek(seek_time)
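    # A rough sketch of reading a frame after seeking; the keyframes_only flag
    # is an assumption based on the 0.12-era VideoReader.seek() signature.
    reader.seek(seek_time, keyframes_only=True)
    frame = next(reader)
    print(frame["data"].shape, frame["pts"])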
    

    New Datasets

    We have implemented 14 new classification datasets: CLEVR, GTSRB, FER2013, SUN397, Country211, Flowers102, FGVC Aircraft, OxfordIIITPet, DTD, Food 101, Rendered SST2, Stanford cars, PCAM, and EuroSAT.
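
    As a rough sketch, the new datasets follow the usual torchvision constructor conventions (root / split / transform / download); the exact arguments below are illustrative assumptions:

    import torchvision
    
    # hypothetical local data directory
    gtsrb_train = torchvision.datasets.GTSRB(root="data", split="train", download=True)
    flowers_test = torchvision.datasets.Flowers102(root="data", split="test", download=True)
    
    img, label = gtsrb_train[0]  # PIL image and integer class label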

    As part of our work on Optical Flow support (see above for more details), we also added 5 new optical flow datasets: Flying Chairs, Flying Things, Sintel, Kitti, and HD1K.

    Documentation

    New documentation layout

    We have updated our documentation pages to be more compact and easier to browse. Each function / class is now documented in a separate page, clearing up some space in the per-module pages, and easing the discovery of the proposed APIs. Compare e.g. our previous docs vs the new ones. Please let us know if you have any feedback!

    Model contribution guidelines

    New model contribution guidelines have been published following the success of the FCOS model which was contributed by the community. These guidelines aim to be an overview of the model contribution process for anyone who would like to suggest, implement and train a new model.

    Upcoming Prototype APIs

    We are currently working on a prototype API which adds Multi-weight support on all of our model builder methods. This will enable us to offer multiple pre-trained weights, associated with their meta-data and inference transforms. The API is still under review and thus was not included in the release but you can read more about it on our blogpost and provide your feedback on the dedicated Github issue.

    Changes in our deprecation policy

    Up until now, torchvision would almost never remove deprecated APIs. In order to be more aligned and consistent with pytorch core, we are updating our deprecation policy. We are now following a 2-release deprecation cycle: deprecated APIs will raise a warning for 2 versions, and will be removed after that. To reflect these changes and to smooth the transition, we have decided to:

    • Remove all APIs that had been deprecated before or on v0.8, released 1.5 years ago.
    • Update the removal timeline of all other deprecated APIs to v0.14, to reflect the new 2-cycle policy starting now in v0.12.

    Backward-incompatible changes

    [models.quantization] Removed the Quantized shufflenet_v2_x1_5 and shufflenet_v2_x2_0 model builders which had no associated weights, rendering them useless. Additionally, we added pre-trained weights for the shufflenet_v2_x0_5 quantized variant. (#4854) [ops] Change to stable sort in nms implementations - this change can lead to different behavior in rare cases, therefore it has been flagged as backwards-incompatible (#4767) [transforms] Changed the center and the parametrization of shear X/Y in Auto Augment transforms to align with the original papers (#5285) (#5384)

    Deprecations

    Note: in order to be more aligned with pytorch core, we are updating our deprecation policy. Please read more above in the “Highlights” section.

    [ops] The ops.poolers.MultiScaleRoIAlign public methods setup_scales, convert_to_roi_format, and infer_scale have been deprecated and will be removed in 0.14 (#4951) (#4810)

    New Features

    [datasets] New optical flow datasets added: FlyingChairs, Kitti, Sintel, FlyingThings3D, and HD1K (#4860) (#4845) (#4858) (#4890) (#5004) (#4889) (#4888) (#4870) [datasets] New classification datasets support for FLAVA: CLEVR, GTSRB, FER2013, SUN397, Country211, Flowers102, fvgc_aircraft, OxfordIIITPet, DTD, Food 101, Rendered SST2, Stanford cars, PCAM, and EuroSAT (#5120) (#5130) (#5117) (#5132) (#5138) (#5177) (#5178) (#5116) (#5115) (#5119) (#5220) (#5166) (#5203) (#5114) (#5164) (#5280) [models] Add VisionTransformer model (#5173) (#5210) (#5172) (#5085) (#5226) (#5025) (#5086) (#5159) [models] Add ConvNeXt model (#5330) (#5253) [models] Add RAFT models and support for optical flow model training (#5022) (#5070) (#5174) (#5381) (#5078) (#5076) (#5081) (#5079) (#5026) (#5027) (#5082) (#5060) (#4868) (#4657) (#4732) [models] Add FCOS model (#4961) (#5267) [utils] Add utility to convert optical flow to an image (#5134) (#5308) [utils] Add utility to draw keypoints (#4216) [video] Add video GPU decoder (#5019) (#5191) (#5215) (#5256) (#4474) (#3179) (#4878) (#5328) (#5327) (#5183) (#4947) (#5192)

    Improvements

    [datasets] Migrate mnist dataset from np.frombuffer (#4598) [io, tests] Switch from np.frombuffer to torch.frombuffer (#4578) [models] Update ResNet-50 accuracy with Repeated Augmentation (#5201) [models] Add regnet_y_128gf factory function, and several regnet model weights (#5176) (#4530) [models] Adding min_size to classification and video models (#5223) [models] Remove in-place mutation in DefaultBoxGenerator (#5279) [models] Added Dropout parameter to Models Constructors (#4580) [models] Allow to use custom norm_layer (#4621) [models] Add IntermediateLayerGetter on segmentation (#5298) [models] Use FX feature extractor for segm model (#4563) [models, ops, io] Add model, ops and io usage logging (#4956) (#4735) (#4736) (#4737) (#5044) (#4799) (#5095) (#5038) [models.quantization] Implement is_qat in TorchVision (#5299) [models.quantization] Cleanup Quantized ShuffleNet (#4854) [models.quantization] Adding new Quantized models (#4969) [ops] [FBcode->GH] Fix missing kernel guards (#4620) (#4743) [ops] Expose misc ops at package level (#4812) [ops] Fix giou naming bug (#5270) [ops] Change batched NMS threshold to choose for-loop version (#4990) [ops] Add bias parameter to ConvNormActivation (#5012) [ops] Feature extraction default arguments - ops (#4810) [ops] Change to stable sort in nms implementations (#4767) [reference scripts] Support amp training (#4923) (#4933) (#4994) (#4547) (#4570) [reference scripts] Add types and improve descriptions to ArgumentParser parameters (#4724) [reference scripts] Replaced all 'no_grad()' instances with 'inference_mode()' (#4629) [reference scripts] Adding Repeated Augment Sampler (#5051) [reference scripts] Reduce variance of classification references evaluation (#4609) [reference scripts] Avoid inplace modification of target boxes in detection references (#5289) [reference scripts] Allow variable number of repetitions for RA (#5084) [reference scripts, classification] Adding gradient clipping (#4824) [reference scripts, models.quantization] Add --prototype flag to quantization scripts. 
(#5334) [reference scripts, ops] Additional SOTA ingredients on Classification Recipe (#4493) [transforms] Added center arg to F.affine and RandomAffine ops (#5208) [transforms] Explicitly copying array in pil_to_tensor (#4566) [transforms] Update functional_tensor.py (#4852) [transforms] Add api usage log to transforms (#5007) [utils] Support random colors by default for draw_bounding_boxes (#5127) [utils] Add API usage calls to utils (#5077) Various documentation improvements (#4913) (#4892) (#5305) (#5273) (#5089) (#4653) (#5302) (#4647) (#4922) (#5124) (#4972) (#5165) (#4843) (#5238) (#4846) (#4823) (#5316) (#5195) (#5153) (#4783) (#4798) (#4797) (#5368) (#5037) (#4830) (#4681) (#4579) (#4520) (#4586) (#4536) (#4574)) (#4565) (#4822) (#5315) (#4546) (#4522) (#5312) (#5372) (#4833) [tests] Set seed on several tests to reduce flakiness (#4911) (#4764) (#4762) (#4759) (#4766) (#4763) (#4758) (#4761) [tests]Other tests improvements (#4756) (#4775) (#4867) (#4929) (#4632) (#5029) (#4597) Added script to sync fbcode changes with main branch (#4769) [ci] Various CI improvements (#4662) (#4669) (#4791) (#4626) (#5021) (#4739) (#3973)(#4618) (#4788) (#4946) (#5112) (#5099) (#5288) (#5152) (#4696) (#5122) (#4793) (#4998) (#4498) [build] Various build improvements (#5261) (#5190) (#4945) (#4920) (#5024) (#4571) (#4742) (#4944) (#4989) (#5179) (#4516) (#4661) (#4695) (#4939) (#4954) [io] decode_* returns contiguous tensors (#4898) [io] Revert "decode_* returns contiguous tensors (#4898)" (#4901)

    Bug Fixes

    [datasets] fix Caltech datasets (#4556) [datasets] fix UCF101 on Windows (#5129) [datasets] remove extracted archive if flag was set (#5055) [datasets] Reverted folder.py back to using complete path to file for make_dataset and is_valid_file rather than just the filename (#4885) [datasets] fix fromfile on windows (#4980) [datasets] fix WIDERFace download links (#4649) [datasets] fix target_type selection for Caltech101 (#4637) [io] Skip jpeg comparison tests with PIL (#5169) [io] [Windows] Workaround for loading bundled DLLs (#4893) [models] Adding missing named param check on ViT (#5196) [models] Modifying keypoint_rcnn.py for keypoint_predictor issue (#5180) [models] Fixing bug on SSD backbone freezing (#4590) [models] [FBcode->GH] Removed type annotations from rcnn (#4883) [models.quantization] Amend the weights only if quantize=True (#4966) [models.quantization] fix mobilenetv3 quantization state dict loading (#4997) [ops] Adding masks_to_boxes to all in ops (#4779) [ops] Update the error message on DeformConv2d (#4908) [ops, onnx] RoiAlign aligned=True (#4692) [reference scripts] Fix reduce_across_processes inconsistent return type (#4733) [reference scripts] Fix bug on EMA n_averaged estimation (#4544) [reference scripts] support random seed for RA sampler (#5053) [reference scripts] fix bug in training model by amp (#4874) [reference scripts, transforms] Fix a bug on RandomZoomOut (#5278) [tests] Skip expected checks for quantized resnet50 due to flakiness (#4686) [transforms] Fix bug on autocontrast when min==max (#4999) [transforms] Fix augmentation space to be uint8 compatible (#4806) [utils] Fix draw_bounding_boxes and draw_keypointsfor tensors on GPU (#5101) (#5102) [build] fix formatting CIRCLECI_TAG when building docs (#4693) [build] Fix nvjpeg packaging into the wheel (#4752) [build] Switch Android app to pytorch_android stable (#4926) [ci] Add libtinfo5 dependency (#4931) [ci] Revert vit_h_14 as it breaks our CI (#5259) [ci] Remove pager on git diff (#4800) [ci] Fix failing CI job for android (#4912) [ci] Add numpy as explicit dependency to build_cmake.sh (#4987)

    Code Quality

    Various typing improvements (#4603) (#4172) (#4173) (#4631) (#4619) (#4583) (#4602) (#5182) Add ufmt (usort + black) as code formatter (#4384) Fix formatting issues (#4535) (#4747) Add pre-commit hook to fix line endings (#5021) Various imports cleanups/improvements (#4533) (#4879) Use f-strings almost everywhere, and other cleanups by applying pyupgrade (#4585) Update code to Python 3.7 compliance and remove Python 3.6 references (#5125) (#5161) Consolidate repr methods throughout the repo (#5392) Set allow_redefinition = True for mypy (#4531) Use is to compare type of objects (#4605) Various typos fixed (#5031) (#5092) Fix annotations for Python >= 3.8 (#5301) Revamp log api usage method (#5072) [deprecation] Update deprecation messages stating APIs will be removed in 0.14 and remove APIs that were deprecated before 0.8 (#5387) (#5386) [build] Updated setup.py to use TorchVersion object for version comparison (#4307) [ops] remove debugging asserts (#5332) [c++frontend] Fix missing Torch includes (#5118) [ci] Cleanup and removing unnecessary references and parameters (#4983) (#4930) (#5042) [datasets] [FBcode->GH] remove unused requests functionality (#5014) [datasets] allow single extension as str in make_dataset (#5229) [datasets] use helper function to extract archive in CelebA (#4557) [datasets] simplify QMNIST download logic (#4562) [documentation] fix make html-noplot docs build command (#5389) [models] Move all weight initializations from private methods to constructors (#5331) [models] simplify model builders (#5001) [models] Replace asserts with ValueErrors (#5275) [models] Use enumerate to get index of ModuleList (#4534) [models] Simplify efficientnet code by removing _efficientnet_conf (#4690) [models] Refactor Segmentation models (#4646) [models] Pass indexing param to meshgrid to avoid warning in detection models (#4645) [models] Refactor the backbone builders of detection (#4656) [models.quantization] Switch torch.quantization to torch.ao.quantization (#5296) (#4554) [ops] Fixed unused variables in ops (#4666) [ops] Refactor poolers (#4951) [reference scripts] Simplify the gradient clipping code (#4896) [reference scripts] only set random generator if shuffle=true (#5135) [tests] Refactor BoxOps tests to use parameterize (#5380) [tests] rename TestWeights to appease pytest (#5054) [tests] fix and add test for sequence_to_str (#5213) [tests] remove get_bool_env_var (#5222) [models, tests] remove custom code for model output comparison (#4971) [utils, documentation] Fix annotation of draw_segmentation_masks (#4527) [video] Fix error message in demuxer (#5293)

    Contributors

    We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

    Abhijit Deo, Aditya Oke, Alexander Soare, Alexander Unnervik, Allen Goodman, Andrey Talman, Brian Johnson, Bruno Korbar, buckage, Carlosbogo, Chungman Lee, Daniel Falbel, David Fan, Dmytro, Eli Uriegas, Ethan White, Eugene Yurtsev, F-G Fernandez, Fedor, Francisco Massa, Guo, Harish Kulkarni, HeungwooLee, Hu Ye, Jane (Yuan) Xu, Jirka Borovec, Jithun Nair, Joao Gomes, Jopo, Kai Zhang, kbozas, Kevin Tse, Khushi Agrawal, Konstantinos Bozas, Kotchin, Kushashwa Ravi Shrimali, KyleCZH, Mark Harfouche, Marko Kohtala, Masahiro Masuda, Matti Picus, Mengwei Liu, Mohammad (Moe) Rezaalipour, Mriganka Nath, Muhammed Abdullah, Nicolas Granger, Nicolas Hug, Nikita Shulga, peterbell10, Philip Meier, Piyush Singh, Prabhat Roy, ProGamerGov, puhuk, Richard Barnes, rvandeghen, Sai Krishna, Santiago Castro, Saswat Das, Sepehr Sameni, Sergii Khomenko, Stephen Matthews, Sumanth Ratna, Sumukh Aithal, Tal Ben-Nun, Vasilis Vryniotis, vfdev, Xiaolin Wang, Yi Zhang, Yiwen Song, Yoshitomo Matsubara, Yuchen Huang, Yuxin Wu, zhiqiang, and Zhiqiang Wang.

    Source code(tar.gz)
    Source code(zip)
    raft.png(189.34 KB)
  • v0.11.3(Jan 27, 2022)

  • v0.11.2(Dec 16, 2021)

    This minor release bumps the pinned PyTorch version to v1.10.1 and contains some minor bug fixes.

    Highlights

    Bug Fixes

    • [CI] Fix clang_format issue (#5061)
    • [CI, MOBILE] Fix binary_libtorchvision_ops_android job (#5062)
    • [CI] Add numpy as explicit dependency to build_cmake.sh (#5065)
    • [MODELS] Amend the weights only if quantize=True. (#5066)
    • [TRANSFORMS] Fix augmentation space to be uint8 compatible (#5067)
    • [DATASETS] Fix WIDERFace download links (#5068)
    • [BUILD, WINDOWS] Workaround for loading bundled DLLs (#5094)
    Source code(tar.gz)
    Source code(zip)
  • v0.11.1(Oct 21, 2021)

    Users were reporting issues installing torchvision on PyPI; this release contains an update to the wheel dependencies to point directly to torch==1.10.0.

    Source code(tar.gz)
    Source code(zip)
  • v0.11.0(Oct 21, 2021)

    This release introduces the RegNet and EfficientNet architectures, a new FX-based utility to perform Feature Extraction, new data augmentation techniques such as RandAugment and TrivialAugment, updated training recipes that support EMA, Label Smoothing, Learning-Rate Warmup, Mixup and Cutmix, and many more.

    Highlights

    New Models

    RegNet and EfficientNet are two popular architectures that can be scaled to different computational budgets. In this release we include 22 pre-trained weights for their classification variants. The models were trained on ImageNet and can be used as follows:

    import torch
    from torchvision import models
    
    x = torch.rand(1, 3, 224, 224)
    
    regnet = models.regnet_y_400mf(pretrained=True)
    regnet.eval()
    predictions = regnet(x)
    
    efficientnet = models.efficientnet_b0(pretrained=True)
    efficientnet.eval()
    predictions = efficientnet(x)
    

    The accuracies of the pre-trained models obtained on ImageNet val are seen below (see #4403, #4530 and #4293 for more details)

    | Model           | Acc@1  | Acc@5  |
    | ---             | ---    | ---    |
    | regnet_x_400mf  | 72.834 | 90.95  |
    | regnet_x_800mf  | 75.212 | 92.348 |
    | regnet_x_1_6gf  | 77.04  | 93.44  |
    | regnet_x_3_2gf  | 78.364 | 93.992 |
    | regnet_x_8gf    | 79.344 | 94.686 |
    | regnet_x_16gf   | 80.058 | 94.944 |
    | regnet_x_32gf   | 80.622 | 95.248 |
    | regnet_y_400mf  | 74.046 | 91.716 |
    | regnet_y_800mf  | 76.42  | 93.136 |
    | regnet_y_1_6gf  | 77.95  | 93.966 |
    | regnet_y_3_2gf  | 78.948 | 94.576 |
    | regnet_y_8gf    | 80.032 | 95.048 |
    | regnet_y_16gf   | 80.424 | 95.24  |
    | regnet_y_32gf   | 80.878 | 95.34  |
    | EfficientNet-B0 | 77.692 | 93.532 |
    | EfficientNet-B1 | 78.642 | 94.186 |
    | EfficientNet-B2 | 80.608 | 95.31  |
    | EfficientNet-B3 | 82.008 | 96.054 |
    | EfficientNet-B4 | 83.384 | 96.594 |
    | EfficientNet-B5 | 83.444 | 96.628 |
    | EfficientNet-B6 | 84.008 | 96.916 |
    | EfficientNet-B7 | 84.122 | 96.908 |

    We would like to thank Ross Wightman and Luke Melas-Kyriazi for contributing the weights of the EfficientNet variants.

    FX-based Feature Extraction

    A new Feature Extraction method has been added to our utilities. It uses PyTorch FX and enables us to retrieve the outputs of intermediate layers of a network which is useful for feature extraction and visualization. Here is an example of how to use the new utility:

    import torch
    from torchvision.models import resnet50
    from torchvision.models.feature_extraction import create_feature_extractor
    
    
    x = torch.rand(1, 3, 224, 224)
    
    model = resnet50()
    
    return_nodes = {
        "layer4.2.relu_2": "layer4"
    }
    model2 = create_feature_extractor(model, return_nodes=return_nodes)
    intermediate_outputs = model2(x)
    
    print(intermediate_outputs['layer4'].shape)
    
    

    We would like to thank Alexander Soare for developing this utility.

    New Data Augmentations

    Two new Automatic Augmentation techniques were added: Rand Augment and Trivial Augment. Both methods can be used as drop-in replacements for the AutoAugment technique, as seen below:

    from torchvision import transforms
    
    t = transforms.RandAugment()
    # t = transforms.TrivialAugmentWide()
    transformed = t(image)
    
    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.RandAugment(),  # transforms.TrivialAugmentWide()
        transforms.ToTensor()])
    

    We would like to thank Samuel G. Müller for contributing Trivial Augment and for his help on refactoring the AA package.

    Updated Training Recipes

    We have updated our training reference scripts to add support for Exponential Moving Average, Label Smoothing, Learning-Rate Warmup, Mixup, Cutmix and other SOTA primitives. The above enabled us to improve the classification Acc@1 of some pre-trained models by over 4 points. A major update of the existing pre-trained weights is expected in the next release.
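
    The recipe itself lives in the reference scripts, but most of these ingredients map onto standard PyTorch APIs; a rough sketch with illustrative hyper-parameters (not the exact recipe's values):

    import torch
    from torch import nn, optim
    
    model = nn.Linear(2048, 1000)  # stand-in for a classification model
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # label smoothing
    
    optimizer = optim.SGD(model.parameters(), lr=0.5, momentum=0.9, weight_decay=2e-5)
    # learning-rate warmup followed by cosine annealing
    warmup = optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01, total_iters=5)
    cosine = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=595)
    scheduler = optim.lr_scheduler.SequentialLR(optimizer, schedulers=[warmup, cosine], milestones=[5])
    
    # Exponential Moving Average of the model weights via torch.optim.swa_utils
    ema_model = optim.swa_utils.AveragedModel(
        model, avg_fn=lambda avg, new, num: 0.999 * avg + 0.001 * new
    )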

    Backward-incompatible changes

    [models] Use torch instead of scipy for random initialization of inception and googlenet weights (#4256)

    Deprecations

    [models] Deprecate the C++ vision::models namespace (#4375)

    New Features

    [datasets] Add iNaturalist dataset (#4123) [datasets] Download and Kinetics 400/600/700 Datasets (#3680) [datasets] Added LFW Dataset (#4255) [models] Add FX feature extraction as an alternative to intermediate_layer_getter (#4302) (#4418) [models] Add RegNet Architecture in TorchVision (#4403) (#4530) (#4550) [ops] Add new masks_to_boxes op (#4290) (#4469) [ops] Add StochasticDepth implementation (#4301) [reference scripts] Adding Mixup and Cutmix (#4379) [transforms] Integration of TrivialAugment with the current AutoAugment Code (#4221) [transforms] Adding RandAugment implementation (#4348) [models] Add EfficientNet Architecture in TorchVision (#4293)

    Improvements

    Various documentation improvements (#4239) (#4251) (#4275) (#4342) (#3894) (#4159) (#4133) (#4138) (#4089) (#3944) (#4349) (#3754) (#4308) (#4352) (#4318) (#4244) (#4362) (#3863) (#4382) (#4484) (#4503) (#4376) (#4457) (#4505) (#4363) (#4361) (#4337) (#4546) (#4553) (#4565) (#4567) (#4574) (#4575) (#4383) (#4390) (#3409) (#4451) (#4340) (#3967) (#4072) (#4028) (#4132) [build] Add CUDA-11.3 builds to torchvision (#4248) [ci, tests] Skip some CPU-only tests on CircleCI machines with GPU (#4002) (#4025) (#4062) [ci] New issue templates (#4299) [ci] Various CI improvements, in particular putting back GPU testing on windows (#4421) (#4014) (#4053) (#4482) (#4475) (#3998) (#4388) (#4179) (#4394) (#4162) (#4065) (#3928) (#4081) (#4203) (#4011) (#4055) (#4074) (#4419) (#4067) (#4201) (#4200) (#4202) (#4496) (#3925) [ci] ping maintainers in case a PR was not properly labeled (#3993) (#4012) (#4021) (#4501) [datasets] Add bzip2 file compression support to datasets (#4097) [datasets] Faster dataset indexing (#3939) [datasets] Enable logging of internal dataset instanciations. (#4319) (#4090) [datasets] Removed copy=False in torch.from_numpy in MNIST to avoid warning (#4184) [io] Add warning for files with corrupt containers (#3961) [models, tests] Add test to check that classification models are FX-compatible (#3662) [tests] Speedup various tests (#3929) (#3933) (#3936) [models] Allow custom activation in SqueezeExcitation of EfficientNet (#4448) [models] Allow gradient backpropagation through GeneralizedRCNNTransform to inputs (#4327) [ops, tests] Add JIT tests (#4472) [ops] Make StochasticDepth FX-compatible (#4373) [ops] Added backward pass on CPU and CUDA for interpolation with anti-alias option (#4208) (#4211) [ops] Small refactoring to support opt mode for torchvision ops (fb internal specific) (#4080) (#4095) [reference scripts] Added Exponential Moving Average support to classification reference script (#4381) (#4406) (#4407) [reference scripts] Adding label smoothing on classification reference (#4335) [reference scripts] Further enhance Classification Reference (#4444) [reference scripts] Replaced to_tensor() with pil_to_tensor() + convert_image_dtype() (#4452) [reference scripts] Update the metrics output on reference scripts (#4408) [reference scripts] Warmup schedulers in References (#4411) [tests] Add check for fx compatibility on segmentation and video models (#4131) [tests] Mock redirection logic for tests (#4197) [tests] Replace set_deterministic with non-deprecated spelling (#4212) [tests] Skip building torchvision with ffmpeg when python==3.9 (#4417) [tests] [jit] Make operation call accept Stack& instead Stack* (#63414) (#4380) [tests] make tests that involve GDrive more robust (#4454) [tests] remove dependency for dtype getters (#4291) [transforms] Replaced example usage of ToTensor() by PILToTensor() + ConvertImageDtype() (#4494) [transforms] Explicitly copying array in pil_to_tensor (#4566) (#4573) [transforms] Make get_image_size and get_image_num_channels public. 
(#4321) [transforms] adding gray images support for adjust_contrast and adjust_saturation (#4477) (#4480) [utils] Support single color in utils.draw_bounding_boxes (#4075) [video, documentation] Port the video_api.ipynb notebook to the example gallery (#4241) [video, io, tests] Added check for invalid input file (#3932) [video, io] remove deprecated function call (#3861) (#3989) [video, tests] Removed test_audio_video_sync as it doesn't work as expected (#4050) [video] Build torchvision with ffmpeg only on Linux and ignore ffmpeg on other platforms (#4413, #4410, #4041)

    Bug Fixes

    [build] Conda: Add numpy dependency (#4442) [build] Explicitly exclude PIL 8.3.0 from compatible dependencies (#4148) [build] More robust version check (#4285) [ci] Fix broken clang format test. (#4320) [ci] Remove mentions of conda-forge (#4082) [ci] fixup '' -> '/./' for CI filter (#4059) [datasets] Fix download from google drive which was downloading empty files in some cases (#4109) [datasets] Fix splitting CelebA dataset (#4377) [datasets] Add support for files with periods in name (#4099) [io, tests] Don't check transparency channel for pil >= 8.3 in test_decode_png (#4167) [io] Fix size_t issues across JPEG versions and platforms (#4439) [io] Raise proper error when decoding 16-bits jpegs (#4101) [io] Unpinned the libjpeg version and fixed jpeg_mem_dest's size type Wind… (#4288) [io] deinterlacing PNG images with read_image (#4268) [io] More robust ffmpeg version query in setup.py (#4254) [io] Fixed read_image bug (#3948) [models] Don't download backbone weights if pretrained=True (#4283) [onnx, tests] Do not disable profiling executor in ONNX tests (#4324) [ops, tests] Fix DeformConvTester::test_backward_cuda by setting threads per block to 512 (#3942) [ops] Fix typing issue to make DeformConv2d scriptable (#4079) [ops] Fixes deform_conv issue with large input/output (#4351) [ops] Resolving tracing problem on StochasticDepth iterator. (#4372) [ops] Port quantize_val and dequantize_val into torchvision to avoid at::native and android xplat incompatibility (#4311) [reference scripts] Fix bug on EMA n_averaged estimation. (#4544) (#4545) [tests] Avoid cmyk in nvjpeg tests (#4246) [tests] Catch ValueError due to recent change to torch.testing.assert_close (#4165) [tests] Fix failing tests by catching the proper exception from torch.testing (#4121) [tests] Skip test if connection issues on fate (#4284) [transforms] Fix RandAugment and TrivialAugment bugs (#4370) [transforms] [FBcode->GH] [JIT] Add reference semantics to TorchScript classes (#44324) (#4166) [utils] Handle grayscale images on draw_bounding_boxes (#4043) (#4049) [video, io] Fixed missing audio with video_reader and pyav backend (#3934, #4064)

    Code Quality

    Various typing improvements (#4369) (#4168) (#4169) (#4170) (#4171) (#4224) (#4227) (#4395) (#4409) (#4232) (#4234 (#4236) (#4226) (#4416) Renamed the “master” branch into “main” (#4306) (#4365) [ci] (fb-internal only) Allow all torchvision test rules to run with RE (#4073) [ci] add pre-commit hooks for convenient formatting checks (#4387) [ci] Import hipify_python only when needed (#4031) [io] Fixed a couple of typos and removed unnecessary bracket (#4345) [io] use from_blob to avoid memcpy (#4118) [models, ops] Moving common layers to ops (#4504) [models, ops] Replace MobileNetV3's SqueezeExcitation with EfficientNet's one (#4487) [models] Explicitely store a distance value that is reused (#4341) [models] Use torch instead of scipy for random initialization of inception and googlenet weights (#4256) [onnx, tests] Use test images from repo rather than internet for ONNX tests (#4176) [onnx] Import ONNX utils from symbolic_opset11 module (#4230) [ops] Fix clang formatting in deform_conv2d_kernel.cu (#3943) [ops] Update gpu atomics include path (#4478) (reverted) [reference scripts] Cleaned-up coco evaluation code (#4453) [reference scripts] remove unused package in coco_eval.py (#4404) [tests] Ported all tests to pytest (#3962) (#3996) (#3950) (#3964) (#3957) (#3959) (#3981) (#3952) (#3977) (#3974) (#3976) (#3983) (#3971) (#3988) (#3990) (#3985) (#3984) (#4030) (#3955)r (#4008) (#4010) (#4023) (#3954) (#4026) (#3953) (#4047) (#4185) (#3947) (#4045) (#4036) (#4034) (#3978) (#4046) (#3991) (#3930) (#4038) (#4037) (#4215) (#3972) (#3966) (#4114) (#4177) (#4280) (#3946) (#4233) (#4258) (#4035) (#4040) (#4000) (#4196) (#3922) (#4032) [tests] Prevent tests from leaking their respective RNG (#4497) (#3926) (#4250) [tests] Remove TestCase dependency for test_models_detection_anchor_utils.py (#4207) [tests] Removed tests executing deprecated F_t.center/five/ten_crop methods (#4479) [tests] Replace set_deterministic with non-deprecated spelling (#4212) [tests] Remove torchvision/test/fakedata_generation.py (#4130) [transforms, reference scripts] Added PILToTensor and ConvertImageDtype classes in reference scripts and used them to replace ToTensor(#4495, #4481) [transforms] Refactor AutoAugment to support more augmentations. (#4338) [transforms] Replace deprecated torch.lstsq with torch.linalg.lstsq (#3918) [video] Drop virtual from private member functions of Decoder class (#4027) [video] Fixed comparison warnings in audio_stream and video_stream (#4007) [video] Fixed some ffmpeg deprecation warnings in decoder (#4003)

    Contributors

    We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

    ABD-01, Adam J. Stewart, Aditya Oke, Alex Lin, Alexander Grund, Alexander Soare, Allen Goodman, Amani Kiruga, Anirudh, Beat Buesser, beet, Bert Maher, Bruno Korbar, Camilo De La Torre, cyy, D. Khuê Lê-Huu, David Fan, DevPranjal, dgenzel, dgenzel2, Dmitriy Genzel, Drishti Bhasin, Edward Z. Yang, Eli Uriegas, F-G Fernandez, Francisco Massa, Gary Miguel, Gaurav7888, IgorSusmelj, Ishan Kumar, Ivan Kobzarev, Jiawei Liu, Jithun Nair, Joao Gomes, Joe Early, Julien RIPOCHE, julienripoche, Kai Zhang, kingyiusuen, Loi Ly, Matti Picus, Meghan Lele, Muhammed Abdullah, Nicolas Hug, Nikita Shulga, ORippler, peterbell10, Philip Meier, Prabhat Roy, puhuk, Rajat Jaiswal, S Harish, Sahil Goyal, Samuel Gabriel, Santiago Castro, Saswat Das, Sepehr Sameni, Shengwei An, Shrill Shrestha, Shruti Pulstya, Sugato Ray, tanvimoharir, Vasilis Vryniotis, Vassilis C. Nicodemou, Vassilis Nicodemou, vfdev-5, Vincent Moens, Vivek Kumar, Yi Zhang, Yiwen Song, Yonghye Kwon, Yuchen Huang, Zhengxu Chen, Zhiqiang Wang, Zhongkai Zhu, zzk1st

    Source code(tar.gz)
    Source code(zip)
  • v0.10.1(Sep 27, 2021)

  • v0.10.0(Jun 15, 2021)

    This release improves support for mobile, with new mobile-friendly detection models based on SSD and SSDlite, CPU kernels for quantized NMS and quantized RoIAlign, pre-compiled binaries for iOS available in cocoapods and an iOS demo app. It also improves image IO by providing JPEG decoding on the GPU, and much more.

    Highlights

    [BETA] New models for detection

    SSD and SSDlite are two popular object detection architectures which are efficient in terms of speed and provide good results for low resolution pictures. In this release, we provide implementations for the original SSD model with VGG16 backbone and for its mobile-friendly variant SSDlite with MobileNetV3-Large backbone. The models were pre-trained on COCO train2017 and can be used as follows:

    import torch
    import torchvision
    
    # Original SSD variant
    x = [torch.rand(3, 300, 300), torch.rand(3, 500, 400)]
    m_detector = torchvision.models.detection.ssd300_vgg16(pretrained=True)
    m_detector.eval()
    predictions = m_detector(x)
    
    # Mobile-friendly SSDlite variant
    x = [torch.rand(3, 320, 320), torch.rand(3, 500, 400)]
    m_detector = torchvision.models.detection.ssdlite320_mobilenet_v3_large(pretrained=True)
    m_detector.eval()
    predictions = m_detector(x)
    

    The following accuracies can be obtained on COCO val2017 (full results available in #3403 and #3757):

    | Model                        | mAP  | mAP@50 | mAP@75 |
    | ---                          | ---  | ---    | ---    |
    | SSD300 VGG16                 | 25.1 | 41.5   | 26.2   |
    | SSDlite320 MobileNetV3-Large | 21.3 | 34.3   | 22.1   |

    [STABLE] Quantized kernels for object detection

    The forward pass of the nms and roi_align operators now supports tensors with a quantized dtype, which can help lower the memory footprint of object detection models, particularly in mobile environments.
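
    A minimal sketch of calling nms on quantized inputs (the scales and zero points below are arbitrary, illustrative choices):

    import torch
    from torchvision.ops import nms
    
    boxes = torch.tensor([[0., 0., 100., 100.], [5., 5., 105., 105.], [200., 200., 300., 300.]])
    scores = torch.tensor([0.9, 0.8, 0.7])
    
    # quantize the inputs; the forward pass of nms now also accepts quantized dtypes
    q_boxes = torch.quantize_per_tensor(boxes, scale=2.0, zero_point=0, dtype=torch.quint8)
    q_scores = torch.quantize_per_tensor(scores, scale=0.01, zero_point=0, dtype=torch.quint8)
    
    keep = nms(q_boxes, q_scores, iou_threshold=0.5)  # indices of the boxes kept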

    [BETA] JPEG decoding on the GPU

    Decoding jpegs is now possible on GPUs with the use of nvjpeg, which should be readily available in your CUDA setup. Decoding a single image should be about 2 to 3 times faster than with libjpeg on the CPU. While the resulting tensor will be stored on the GPU device, the input raw tensor still needs to reside on the host (CPU), because the first stages of the decoding process take place on the host:

    from torchvision.io.image import read_file, decode_jpeg
    
    data = read_file('path_to_image.jpg')  # raw data is on CPU
    img = decode_jpeg(data, device='cuda')  # decoded image is on the GPU
    

    [BETA] iOS support

    TorchVision 0.10 now provides pre-compiled iOS binaries for its C++ operators, which means you can run Faster R-CNN and Mask R-CNN on iOS. An example app showing how to build a program leveraging those ops can be found here.

    [STABLE] Speed optimizations for Tensor transforms

    The resize and flip transforms have been optimized and their runtime improved by up to 5x on the CPU. The corresponding PRs were sent to PyTorch in https://github.com/pytorch/pytorch/pull/51653, https://github.com/pytorch/pytorch/pull/54500 and https://github.com/pytorch/pytorch/pull/56713
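
    These optimizations apply transparently when the functional transforms are called on (possibly batched) tensor inputs; a minimal sketch:

    import torch
    from torchvision.transforms import functional as F
    
    # a batch of uint8 tensor images (illustrative random data)
    imgs = torch.randint(0, 256, (4, 3, 480, 640), dtype=torch.uint8)
    
    resized = F.resize(imgs, [256, 256])  # optimized tensor resize
    flipped = F.hflip(resized)            # optimized horizontal flip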

    [STABLE] Documentation improvements

    Significant improvements were made to the documentation. In particular, a new gallery of examples is available: see here for the latest version (the stable version is not released at the time of writing). These examples visually illustrate how each transform acts on an image, and also properly document and illustrate the output of the segmentation models.

    The example gallery will be extended in the future to provide more comprehensive examples and serve as a reference for common torchvision tasks.

    Backwards Incompatible Changes

    • [transforms] Ensure input type of normalize is float. (#3621)
    • [models] Use PyTorch smooth_l1_loss and remove private custom implementation (#3539)

    New Features

    • Added iOS binaries and test app (#3582)(#3629) (#3806)
    • [datasets] Added KITTI dataset (#3640)
    • [utils] Added utility to draw segmentation masks (#3330, #3824)
    • [models] Added the SSD & SSDlite object detection models (#3403, #3757, #3766, #3855, #3896, #3818, #3799)
    • [transforms] Added antialias option to transforms.functional.resize (#3761, #3810, #3842)
    • [transforms] Add new max_size parameter to Resize (#3494)
    • [io] Support for decoding jpegs on GPU with nvjpeg (#3792)
    • [ci, rocm] Add ROCm to builds (#3840) (#3604) (#3575)
    • [ops, models.quantization] Add quantized version of NMS (#3601)
    • [ops, models.quantization] Add quantized version of RoIAlign (#3624, #3904)

    Improvement

    • [build] Various build improvements: (#3618) (#3622) (#3399) (#3794) (#3561)
    • [ci] Various CI improvements (#3647) (#3609) (#3635) (#3599) (#3778) (#3636) (#3809) (#3625) (#3764) (#3679) (#3869) (#3871) (#3444) (#3445) (#3480) (#3768) (#3919) (#3641)(#3900)
    • [datasets] Improve error handling in make_dataset (#3496)
    • [datasets] Remove caching from MNIST and variants (#3420)
    • [datasets] Make DatasetFolder.find_classes public (#3628)
    • [datasets] Separate extraction and decompression logic in datasets.utils.extract_archive (#3443)
    • [datasets, tests] Improve dataset test coverage and infrastructure (#3450) (#3457) (#3454) (#3447) (#3489) (#3661) (#3458 (#3705) (#3411) (#3461) (#3465) (#3543) (#3550) (#3665) (#3464) (#3595) (#3466) (#3468) (#3467) (#3486) (#3736) (#3730) (#3731) (#3477) (#3589) (#3503) (#3423) (#3492)(#3578) (#3605) (#3448) (#3864) (#3544)
    • [datasets, tests] Fix lazy importing for dataset tests (#3481)
    • [datasets, tests] Fix test_extract(zip|tar|tar_xz|gzip) on windows (#3542)
    • [datasets, tests] Fix kwargs forwarding in fake data utility functions (#3459)
    • [datasets, tests] Properly fix dataset test that passes by accident (#3434)
    • [documentation] Improve the documentation infrastructure (#3868) (#3724) (#3834) (#3689) (#3700) (#3513) (#3671) (#3490) (#3660) (#3594)
    • [documentation] Various documentation improvements (#3793) (#3715) (#3727) (#3838) (#3701) (#3923) (#3643) (#3537) (#3691) (#3453) (#3437) (#3732) (#3683) (#3853) (#3684) (#3576) (#3739) (#3530) (#3586) (#3744) (#3645) (#3694) (#3584) (#3615) (#3693) (#3706) (#3646) (#3780) (#3704) (#3774) (#3634)(#3591)(#3807)(#3663)
    • [documentation, ci] Improve the CI infrastructure for documentation (#3734) (#3837) (#3796) (#3711)
    • [io] remove deprecated function calls (#3859) (#3858)
    • [documentation, io] Improve IO docs and expose ImageReadMode in torchvision.io (#3812)
    • [onnx, models] Replace reshape with flatten in MobileNetV2 (#3462)
    • [ops, tests] Added test for aligned=True (#3540)
    • [ops, tests] Add onnx test for batched_nms (#3483)
    • [tests] Various test improvements (#3548) (#3422) (#3435) (#3860) (#3479) (#3721) (#3872) (#3908) (#2916) (#3917) (#3920) (#3579)
    • [transforms] add __repr__ for transforms.RandomErasing (#3491)
    • [transforms, documentation] Adds Documentation for AutoAugmentation (#3529)
    • [transforms, documentation] Add illustrations of transforms with sphinx-gallery (#3652)
    • [datasets] Remove pandas dependency for CelebA dataset (#3656, #3698)
    • [documentation] Add docs for missing datasets (#3536)
    • [referencescripts] Make reference scripts compatible with submitit (#3785)
    • [referencescripts] Updated all_gather() to make use of all_gather_object() from PyTorch (#3857)
    • [datasets] Added dataset download support in fbcode (#3823) (#3826)

    Code quality

    • Remove inconsistent FB copyright headers (#3741)
    • Keep consistency in classes ConvBNActivation (#3750)
    • Removed unused imports (#3738, #3740, #3639)
    • Fixed floor_divide deprecation warnings seen in pytest output (#3672)
    • Unify onnx and JIT resize implementations (#3654)
    • Cleaned-up imports in test files related to datasets (#3720)
    • [documentation] Remove old css file (#3839)
    • [ci] Fix inconsistent version pinning across yaml files (#3790)
    • [datasets] Remove redundant path.join in Places365 (#3545)
    • [datasets] Remove imprecise error handling in PhotoTour dataset (#3488)
    • [datasets, tests] Remove obsolete test_datasets_transforms.py (#3867)
    • [models] Making protected params of MobileNetV3 public (#3828)
    • [models] Make target argument in transform.py truly optional (#3866)
    • [models] Adding some references on MobileNetV3 implementation. (#3850)
    • [models] Refactored set_cell_anchors() in AnchorGenerator (#3755)
    • [ops] Minor cleanup of roi_align_forward_kernel_impl (#3619)
    • [ops] Replace deprecated AutoNonVariableTypeMode with AutoDispatchBelowADInplaceOrView. (#3786, #3897)
    • [tests] Port tests to use pytest (#3852, #3845, #3697, #3907, #3749)
    • [ops, tests] simplify get_script_fn (#3541)
    • [tests] Use torch.testing.assert_close in our test suite (#3886) (#3885) (#3883) (#3882) (#3881) (#3887) (#3880) (#3878) (#3877) (#3875) (#3888) (#3874) (#3884) (#3876) (#3879) (#3873)
    • [tests] Clean up test accept behaviour (#3759)
    • [tests] Remove unused masks variable in test_image.py (#3910)
    • [transforms] use ternary if in resize (#3533)
    • [transforms] replaced deprecated call to ByteTensor with from_numpy (#3813)
    • [transforms] Remove unnecessary casting in adjust_gamma (#3472)

    Bugfixes

    • [ci] set empty cxx flags as default (#3474)
    • [android][test_app] Cleanup duplicate dependency (#3428)
    • Remove leftover exception (#3717)
    • Corrected spelling in a TypeError (#3659)
    • Add missing device info. (#3651)
    • Moving tensors to the right device (#3870)
    • Proper error message (#3725)
    • [ci, io] Pin JPEG version to resolve the size_t issue on windows (#3787)
    • [datasets] Make LSUN OS agnostic (#3455)
    • [datasets] Update squeezenet urls (#3581)
    • [datasets] Add .item() to the target variable in fakedataset.py (#3587)
    • [datasets] Fix VOC datasets for 2007 (#3572)
    • [datasets] Add custom user agent for download_url (#3498)
    • [datasets] Fix LSUN dataset tests flakyness (#3703)
    • [datasets] Fix (Fashion|K)MNIST download and MNIST download test (#3557)
    • [datasets] fix check for exceeded quota on Google Drive (#3710)
    • [datasets] Fix redirect behavior of datasets.utils.download_url (#3564)
    • [datasets] Update EMNIST url (#3567)
    • [datasets] Redirect datasets to correct urls (#3574)
    • [datasets] Prevent potential bug in DatasetFolder.make_dataset (#3733)
    • [datasets, tests] Fix redirection in download tests (#3568)
    • [documentation] Correct the size of returned tensor in comments of ps_roi_pool.py and ps_roi_align.py (#3849)
    • [io] Fix ternary operator to decide to store an image in Grayscale or RGB (#3553)
    • [io] Fixed audio-video synchronisation problem in read_video() when using pts as unit (#3791)
    • [models] Fix bug on detection backbones when trainable_layers == 0 (#3906)
    • [models] Removed caching of anchors from AnchorGenerator (#3745)
    • [models] Update weights of classification models with new serialization format to allow proper unpickling (#3620, #3851)
    • [onnx, ops] Fix roi_align ONNX export (#3355)
    • [referencescripts] Only sync cuda if cuda available (#3674)
    • [referencescripts] Add checkpoints used for preemption. (#3789)
    • [transforms] Fix to_tensor for accimage backend (#3439)
    • [transforms] Make crop work the same for PIL and Tensor (#3770)
    • [transforms, models, tests] Fix some tests in fbcode (#3686)
    • [transforms, tests] Fix test_random_autocontrast flakyness (#3699)
    • [utils] Fix the spacing of labels on draw_bounding_boxes (#3895)
    • [utils, tests] Fix test_draw_boxes (#3631)

    Deprecation

    • [transforms] Deprecate _transforms_video and _functional_video in favor of transforms (#3441)

    Performance

    • [ops] Improve performance of batched_nms when number of boxes is large (#3426)
    • [transforms] Speed up equalize transform by using bincount instead of histc (#3493)

    Contributors

    We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

    Aditya Oke, Akshay Kumar, Alessandro Melis, Avijit Dasgupta, Bruno Korbar, Caroline Chen, chengjuzhou, Edgar Andrés Margffoy Tuay, Eli Uriegas, Francisco Massa, Guillem Orellana Trullols, harishsdev, Ivan Kobzarev, Jaesun Park, James Thewlis, Jeff Daily, Jeff Yang, Jithendra Paruchuri, Jon Janzen, KAI ZHAO, Ksenija Stanojevic, Lewis Patten, Matti Picus, moto, Mustafa Bal, Nicolas Hug, Nikhil Kumar, Nikita Shulga, Philip Meier, Prabhat Roy, Sanket Thakur, scott-vsi, Sofiane Abbar, t-rutten, urmi22, Vasilis Vryniotis, vfdev, Yuchen Huang, Zhengyang Feng, Zhiqiang Wang

    Thank you!

    Source code(tar.gz)
    Source code(zip)
  • v0.9.1(Mar 25, 2021)

    Highlights

    This minor release bumps the pinned PyTorch version to v1.8.1, and brings a few bugfixes for datasets, including MNIST download not being available.

    Bugfixes

    • fix VOC datasets for 2007 (#3572)
    • Update EMNIST url (#3567)
    • Fix redirect behavior of datasets.utils.download_url (#3564)
    • Fix MNIST download for minor release (#3559)
    Source code(tar.gz)
    Source code(zip)
  • v0.9.0(Mar 4, 2021)

    This release introduces improved support for mobile, with new mobile-friendly models, pre-compiled binaries for Android available in maven and an android demo app. It also improves image IO and provides new data augmentations including AutoAugment.

    Highlights

    Better mobile support

    torchvision 0.9 adds support for the MobileNetV3 architecture with pre-trained weights for Classification, Object Detection and Segmentation tasks. It also improves C++ operators so that they can be compiled and run on Android, and we are providing pre-compiled torchvision artifacts published to jcenter. An example application showing how to use the torchvision ops in an Android app can be found here.

    Classification

    We provide MobileNetV3 variants (including a quantized version) pre-trained on ImageNet 2012.

    import torch
    import torchvision
    
    # Classification
    x = torch.rand(1, 3, 224, 224)
    m_classifier = torchvision.models.mobilenet_v3_large(pretrained=True)
    # m_classifier = torchvision.models.mobilenet_v3_small(pretrained=True)
    m_classifier.eval()
    predictions = m_classifier(x)
    
    # Quantized Classification
    x = torch.rand(1, 3, 224, 224)
    m_classifier = torchvision.models.quantization.mobilenet_v3_large(pretrained=True)
    m_classifier.eval()
    predictions = m_classifier(x)
    

    The pre-trained models have the following accuracies on ImageNet 2012 val:

    | Model                         | Top-1 Acc | Top-5 Acc |
    | ---                           | ---       | ---       |
    | MobileNetV3 Large             | 74.042    | 91.340    |
    | MobileNetV3 Large (Quantized) | 73.004    | 90.858    |
    | MobileNetV3 Small             | 67.620    | 87.404    |

    Object Detection

    We provide two variants of Faster R-CNN with MobileNetV3 backbone pre-trained on COCO train2017. They can be obtained as follows

    import torch
    import torchvision
    
    # Fast Low Resolution Model
    x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
    m_detector = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(pretrained=True)
    m_detector.eval()
    predictions = m_detector(x)
    
    # Highly Accurate High Resolution Model
    x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
    m_detector = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)
    m_detector.eval()
    predictions = m_detector(x)
    

    These models yield the following accuracies on COCO val2017 (full results available in #3265):

    | Model                                  | mAP  | mAP@50 | mAP@75 |
    | ---                                    | ---  | ---    | ---    |
    | Faster R-CNN MobileNetV3-Large 320 FPN | 22.8 | 38.0   | 23.2   |
    | Faster R-CNN MobileNetV3-Large FPN     | 32.8 | 52.5   | 34.3   |

    Semantic Segmentation

    We also provide pre-trained models for semantic segmentation. The models have been trained on a subset of COCO train2017, which contains the same 20 categories as those from Pascal VOC.

    import torch
    import torchvision
    
    # Fast Mobile Model
    x = torch.rand(1, 3, 520, 520)
    m_segmenter = torchvision.models.segmentation.lraspp_mobilenet_v3_large(pretrained=True)
    m_segmenter.eval()
    predictions = m_segmenter(x)
    
    # Highly Accurate Mobile Model
    x = torch.rand(1, 3, 520, 520)
    m_segmenter = torchvision.models.segmentation.deeplabv3_mobilenet_v3_large(pretrained=True)
    m_segmenter.eval()
    predictions = m_segmenter(x)
    

    The pre-trained models give the following results on the subset of COCO val2017 which contains the same 20 categories as those present in Pascal VOC (full results in #3276):

    | Model                                               | mean IoU | global pixelwise accuracy |
    | ---                                                 | ---      | ---                       |
    | Lite R-ASPP with Dilated MobileNetV3 Large Backbone | 57.9     | 91.2                      |
    | DeepLabV3 with Dilated MobileNetV3 Large Backbone   | 60.3     | 91.2                      |

    Addition of the AutoAugment method

    AutoAugment is a common Data Augmentation technique that can improve the accuracy of Scene Classification models. Though the data augmentation policies are directly linked to the dataset they were trained on, empirical studies show that ImageNet policies provide significant improvements when applied to other datasets.

    In TorchVision we implemented 3 policies learned on the following datasets: ImageNet, CIFAR10 and SVHN. The new transform can be used standalone or mixed-and-matched with existing transforms:

    from torchvision import transforms
    
    t = transforms.AutoAugment()
    transformed = t(image)
    
    transform=transforms.Compose([
        transforms.Resize(256),
        transforms.AutoAugment(),
        transforms.ToTensor()])
    

    Improved Image IO and on-the-fly image type conversions

    All the read and decode methods of the io.image package have been updated to:

    • Add support for Palette, Grayscale Alpha and RBG Alpha image types during PNG decoding.
    • Allow the on-the-fly conversion of images from one type to another during read.
    from torchvision.io.image import read_image, ImageReadMode
    
    # keeps original type, channels unchanged
    x1 = read_image("image.png")
    
    # converts to grayscale, channels = 1
    x2 = read_image("image.png", mode=ImageReadMode.GRAY)
    
    # converts to grayscale with alpha transparency, channels = 2
    x3 = read_image("image.png", mode=ImageReadMode.GRAY_ALPHA)
    
    # converts to RGB, channels = 3
    x4 = read_image("image.png", mode=ImageReadMode.RGB)
    
    # converts to RGB with alpha transparency, channels = 4
    x5 = read_image("image.png", mode=ImageReadMode.RGB_ALPHA)
    

    Python 3.9 and CUDA 11.1

    This release adds official support for Python 3.9 and CUDA 11.1 (#3341, #3418)

    Backwards Incompatible Changes

    • [Ops] Change default eps value of FrozenBN to better align with nn.BatchNorm (#2933)
    • [Ops] Remove deprecated _new_empty_tensor. (#3156)
    • [Transforms] ColorJitter gets its random params by calling get_params() (#3001)
    • [Transforms] Change rounding of transforms on integer tensors (#2964)
    • [Utils] Remove normalize from save_image (#3324)

    New Features

    • [Datasets] Add WiderFace dataset (#2883)
    • [Models] Add MobileNetV3 architecture:
      • Classification Models: (#3354, #3252, #3182, #3242, #3177)
      • Object Detection Models: (#3265, #3253, #3223, #3243, #3244, #3248)
      • Segmentation Models: (#3276)
      • Quantized Models: (#3366, #3323)
    • [Models] Improve speed/accuracy of FasterRCNN by introducing a score threshold on RPN (#3205)
    • [Mobile] Add Android gradle project with demo test app (#2897)
    • [Transforms] Implemented AutoAugment, along with required new transforms + Policies (#3123)
    • [Ops] Added support of Autocast in all Operators: #2938, #2926, #2922, #2928, #2905, #2906, #2907, #2898
    • [Ops] Add modulation input for DeformConv2D (#2791)
    • [IO] Improved io.image with on-the-fly image type conversions: (#3193, #3069, #3024, #2988, #2984)
    • [IO] Add option to write audio to video file (#2304)
    • [Utils] Added a utility to draw bounding boxes (#2785, #3296, #3075)

    Improvements

    Datasets

    • Concatenate small tensors in video datasets to reduce the use of shared file descriptor (#1795)
    • Improve testing for datasets (#3336, #3337, #3402, #3412, #3413, #3415, #3416, #3345, #3376, #3346, #3338)
    • Check if dataset file is located on Google Drive before downloading it (#3245)
    • Improve Coco implementation (#3417)
    • Make download_url follow redirects (#3236)
    • make_dataset as staticmethod of DatasetFolder (#3215)
    • Add a warning if any clip can't be obtained from a video in VideoClips. (#2513)

    Models

    • Improve error message in AnchorGenerator (#2960)
    • Disable pretrained backbone downloading if pretrained is True in segmentation models (#3325)
    • Support for image with no annotations in RetinaNet (#3032)
    • Change RoIHeads reshape to support empty batches. (#3031)
    • Fixed typing exception throwing issues with JIT (#3029)
    • Replace deprecated functional.sigmoid with torch.sigmoid in RetinaNet (#3307)
    • Assert that inputs are floating point in Faster R-CNN normalize method (#3266)
    • Speedup RetinaNet's postprocessing (#2828)

    Ops

    • Added eps in the __repr__ of FrozenBN (#2852)
    • Added __repr__ to MultiScaleRoIAlign (#2840)
    • Exposing LevelMapper params in MultiScaleRoIAlign (#3151)
    • Enable autocast for all operators and let them use the dispatcher (#2926, #2922, #2928, #2898)

    Transforms

    • adjust_hue now accepts tensors with one channel (#3222)
    • Add fill color support for tensor affine transforms (#2904)
    • Remove torchscript workaround for center_crop (#3118)
    • Improved error message for RandomCrop (#2816)

    IO

    • Enable importing read_file and the other methods from torchvision.io (#2918)
    • accept python bytes in _read_video_from_memory() (#3347)
    • Enable rtmp timeout in decoder (#3076)
    • Specify tls cert file to decoder through config (#3289, #3374)
    • Add UUID in LOG() in decoder (#3080)

    References

    • Add weight averaging and storing methods in references utils (#3352)
    • Adding Preset Transforms in reference scripts (#3317)
    • Load variables when --resume /path/to/checkpoint --test-only (#3285)
    • Updated video classification ref example with new transforms (#2935)

    Misc

    • Various documentation improvements (#3039, #3271, #2820, #2808, #3131, #3062, #3061, #3000, #3299, #3400, #2899, #2901, #2908, #2851, #2909, #3005, #2821, #2957, #3360, #3019, #3124, #3217, #2879, #3234, #3180, #3425, #2979, #2935, #3298, #3268, #3203, #3290, #3295, #3200, #2663, #3153, #3147, #3232)
    • The documentation infrastructure was improved, in particular the docs are now built on every PR and uploaded to CircleCI (#3259, #3378, #3408, #3373, #3290)
    • Avoid some deprecation warnings from PyTorch (#3348)
    • Ensure operators are added in C++ (#2798, #3091, #3391)
    • Fixed compilation warnings on C++ codebase (#3390)
    • CI Improvements (#3401, #3329, #2990, #2978, #3189, #3230, #3254, #2844, #2872, #2825, #3144, #3137, #2827, #2848, #2914, #3419, #2895, #2837)
    • Installation improvements (#3302, #2969, #3113, #3202)
    • CMake improvements (#2801, #2805, #3212, #3381)

    Mobile

    • Add Torch Selective macros in all C++ Ops for better support on mobile (#3218)

    Code Quality, testing

    • [BC-breaking] Modernized C++ codebase & made it mobile-friendly (25% faster to compile): #2885, #2891, #2892, #2893, #2905, #2906, #2907, #2938, #2944, #2945, #3011, #3020, #3097, #3105, #3134, #3135, #3143, #3146, #3154, #3156, #3163, #3218, #3308, #3311, #3312, #3326, #3350, #3390
    • Cleaned up Python codebase & made it more Pythonic: #3263, #3239, #3059, #3055, #3045, #3382, #3159, #3171
    • Improve type annotations (#3288, #3045, #2862, #2858, #2857, #2863, #2865, #2856, #2860, #2864, #2875, #2859, #2854, #2861, #3174, #3059)
    • Code refactoring and static analysis improvements (#3379, #3335, #3229, #3204, #3095)
    • Miscellaneous test improvements (#2966, #2965, #3018, #3035, #2961, #2806, #2812, #2815, #2834, #2874, #3099, #3092, #3160, #3103, #2971, #3023, #2803, #3136, #3319, #3310, #3287, #3033, #2983, #3386, #3369, #3116, #2985, #3320)

    Bug Fixes

    • [DATASETS] Fixes EMNIST split and label issues (#2673)
    • [DATASETS] Fix overflow in STL10 fold reading (#3353)
    • [MODELS] Fix incorrectly frozen BN on ResNet FPN backbone (#3396)
    • [MODELS] Fix scriptability support in Inception V3 (#2976)
    • [MODELS] Changed default value of eps in FrozenBatchNorm to match BatchNorm: #2940 #2933
    • [MODELS] Fixed warning in models.detection.transforms.resize_image_and_masks. (#3237)
    • [MODELS] Fix trainable_layers on RetinaNet (#3234)
    • [MODELS] Fix ShuffleNetV2 ONNX model export issue. (#3158)
    • [UTILS] Fixes no grad and range bugs in utils. (#3269)
    • [UTILS] make_grid uses a more correct normalization (#2967)
    • [OPS] fix GET_THREADS() for ROCm with DeformConv (#2997)
    • [OPS] Fix NMS and IoU overflows for fp16 (#3383, #3382)
    • [OPS] Fix ops registration on windows (#3380)
    • [OPS] Fix initialisation bug on FeaturePyramidNetwork (#2954)
    • [IO] Replace hardcoded error code with ENODATA (#3277)
    • [REFERENCES] Fix repeated UserWarning and add more flexibility to reference code for segmentation tasks (#2886)
    • [TRANSFORMS] Fix default fill value in RandomRotation (#3303)
    • [TRANSFORMS] Correct aspect ratio sampling in transforms.RandomErasing (#3344)
    • [TRANSFORMS] Fix CenterCrop for Tensors when the crop size is greater than the image size (#3333)
    • [TRANSFORMS] Functional to_tensor returns float tensor of default dtype (#3398)
    • [TRANSFORMS] Add explicit check for number of channels (#3013)
    • [TRANSFORMS] pil_to_tensor with accimage backend now returns uint8 (#3109)
    • [TRANSFORMS] Fix potential overflow in convert_image_dtype (#3107)
    • [TRANSFORMS] Check num of channels on adjust*_ transformations (#3069)

    Deprecations

    • [TRANSFORMS] Introduced InterpolationModes and deprecated arguments: resample and fillcolor (#2952, #3055)
  • v0.8.2(Dec 10, 2020)

    This minor release bumps the pinned PyTorch version to v1.7.1, and contains some minor improvements.

    Highlights

    Python 3.9 support

    This release adds native binaries for Python 3.9 (#3063)

    Bugfixes

    • Make read_file and write_file accept unicode strings on Windows #2949
    • Replaced tuple creation by one acceptable by majority of compilers #2937
    • Add docs for focal_loss #2979
  • v0.8.1(Oct 27, 2020)

  • v0.8.0(Oct 27, 2020)

    This release brings new additions to torchvision that improves support for model deployment. Most notably, transforms in torchvision are now torchscript-compatible, and can thus be serialized together with your model for simpler deployment. Additionally, we provide native image IO with torchscript support, and a new video reading API (released as Beta) which is more flexible than torchvision.io.read_video.

    Highlights

    Transforms now support Tensor, batch computation, GPU and TorchScript

    torchvision transforms now inherit from nn.Module and can be torchscripted and applied on torch Tensor inputs as well as on PIL images. They also support Tensors with a batch dimension and work seamlessly on CPU/GPU devices:

    import torch
    import torchvision.transforms as T
    
    # to fix random seed, use torch.manual_seed
    # instead of random.seed
    torch.manual_seed(12)
    
    transforms = torch.nn.Sequential(
        T.RandomCrop(224),
        T.RandomHorizontalFlip(p=0.3),
        T.ConvertImageDtype(torch.float),
        T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    )
    scripted_transforms = torch.jit.script(transforms)
    # Note: we can similarly use T.Compose to define transforms
    # transforms = T.Compose([...]) and 
    # scripted_transforms = torch.jit.script(torch.nn.Sequential(*transforms.transforms))
    
    tensor_image = torch.randint(0, 256, size=(3, 256, 256), dtype=torch.uint8)
    # works directly on Tensors
    out_image1 = transforms(tensor_image)
    # on the GPU
    out_image1_cuda = transforms(tensor_image.cuda())
    # with batches
    batched_image = torch.randint(0, 256, size=(4, 3, 256, 256), dtype=torch.uint8)
    out_image_batched = transforms(batched_image)
    # and has torchscript support
    out_image2 = scripted_transforms(tensor_image)
    

    These improvements enable the following new features:

    • support for GPU acceleration
    • batched transformations e.g. as needed for videos
    • transform multi-band torch tensor images (with more than 3-4 channels)
    • torchscript transforms together with your model for deployment

    Note: Exceptions for TorchScript support include Compose, RandomChoice, RandomOrder, Lambda and those applied on PIL images, such as ToPILImage.

    Native image IO for JPEG and PNG formats

    torchvision 0.8.0 introduces native image reading and writing operations for JPEG and PNG formats. Those operators support TorchScript and return CxHxW tensors in uint8 format, and can thus now be part of your model for deployment in C++ environments.

    from torchvision.io import read_image
    
    # tensor_image is a CxHxW uint8 Tensor
    tensor_image = read_image('path_to_image.jpeg')
    
    # or equivalently
    from torchvision.io.image import read_file, decode_image
    # raw_data is a 1d uint8 Tensor with the raw bytes
    raw_data = read_file('path_to_image.jpeg')
    tensor_image = decode_image(raw_data)
    
    # all operators are torchscriptable and can be
    # serialized together with your model torchscript code
    scripted_read_image = torch.jit.script(read_image)
    

    New detection model

    This release adds a pretrained model for RetinaNet with a ResNet50 backbone from Focal Loss for Dense Object Detection, with the following accuracies on COCO val2017:

    IoU metric: bbox
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.364
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.558
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.383
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.193
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.400
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.490
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.315
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.506
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.558
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.386
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.595
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.699
    
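    As an illustration, here is a minimal sketch of loading the pretrained RetinaNet and running inference; the input size below is arbitrary:

    import torch
    import torchvision

    model = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True)
    model.eval()

    # a list of 3xHxW images in the 0-1 range; sizes may differ between images
    images = [torch.rand(3, 512, 512)]
    with torch.no_grad():
        predictions = model(images)
    # each prediction is a dict with "boxes", "scores" and "labels"
    print(predictions[0]["boxes"].shape)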

    [BETA] New Video Reader API

    This release introduces a new video reading abstraction, which gives more fine-grained control over how to iterate over videos. It supports video and audio, and implements an iterator interface so that it can be combined with the rest of the python ecosystem, such as itertools.

    from torchvision.io import VideoReader
    
    # stream indicates if reading from audio or video
    reader = VideoReader('path_to_video.mp4', stream='video')
    # can change the stream after construction
    # via reader.set_current_stream
    
    # to read all frames in a video starting at 2 seconds
    for frame in reader.seek(2):
        # frame is a dict with "data" and "pts" metadata
        print(frame["data"], frame["pts"])
    
    # because reader is an iterator you can combine it with
    # itertools
    from itertools import takewhile, islice
    # read 10 frames starting from 2 seconds
    for frame in islice(reader.seek(2), 10):
        pass
        
    # or to return all frames between 2 and 5 seconds
    for frame in takewhile(lambda x: x["pts"] < 5, reader.seek(2)):
        pass
    

    Note: In order to use the Video Reader API, you need to compile torchvision from source and make sure that you have ffmpeg installed on your system. Note: the VideoReader API is currently released as beta and its API can change following user feedback.

    Backwards Incompatible Changes

    • [Transforms] Random seed now should be set with torch.manual_seed instead of random.seed (#2292)
    • [Transforms] RandomErasing.get_params function’s argument was previously value=0 and is now value=None which is interpreted as Gaussian random noise (#2386)
    • [Transforms] RandomPerspective and F.perspective changed the default value of interpolation to be BILINEAR instead of BICUBIC (#2558, #2561)
    • [Transforms] Fixes incoherence in affine transformation when center is defined as half image size + 0.5 (#2468)

    New Features

    • [Ops] Added focal loss (#2784)
    • [Ops] Added bounding boxes conversion function (#2710, #2737)
    • [Ops] Added Generalized IOU (#2642)
    • [Models] Added RetinaNet object detection model (#2784)
    • [Datasets] Added Places365 dataset (#2610, #2625)
    • [Transforms] Added GaussianBlur transform (#2658)
    • [Transforms] Added torchscript, batch and GPU and tensor support for transforms (#2769, #2767, #2749, #2755, #2485, #2721, #2645, #2694, #2584, #2661, #2566, #2345, #2342, #2356, #2368, #2373, #2496, #2553, #2495, #2561, #2518, #2478, #2459, #2444, #2396, #2401, #2394, #2586, #2371, #2477, #2456, #2628, #2569, #2639, #2620, #2595, #2456, #2403, #2729)
    • [Transforms] Added example notebook for tensor transforms (#2730)
    • [IO] Added JPEG/PNG encoding / decoding ops
      • JPEG (#2388, #2471, #2696, #2725)
      • PNG (#2382, #2726, #2398, #2457, #2735)
      • decode_image (#2680, #2695, #2718, #2764, #2766)
    • [IO] Added file reading / writing ops (#2728, #2765, #2768)
    • [IO] [BETA] Added new VideoReader API (#2683, #2781, #2778, #2802, #2596, #2612, #2734, #2770)

    Improvements

    Datasets

    • Added error message if Google Drive download quota is exceeded (#2321)
    • Optimized LSUN initialization time by only pulling keys from db (#2544)
    • Use more precise return type for gzip.open() (#2792)
    • Added UCF101 dataset tests (#2548)
    • Added download tests on a schedule (#2665, #2675, #2699, #2706, #2747, #2731)
    • Added typehints for datasets (#2487, #2521, #2522, #2523, #2524, #2526, #2528, #2529, #2525, #2527, #2530, #2533, #2534, #2535, #2536, #2532, #2538, #2537, #2539, #2531, #2540, #2667)

    Models

    • Removed hard coded value in DeepLabV3 (#2793)
    • Changed the anchor generator default argument to an equivalent one (#2722)
    • Moved model construction location in resnet_fpn_backbone into after docstring (#2482)
    • Partially enabled type hints for models (#2668)

    Ops

    • Moved RoIs shape check to C++ (#2794)
    • Use autocast built-in cast-helper functions (#2646)
    • Added type annotations for torchvision.ops (#2331, #2462)

    References

    • [References] Removed redundant target send to device in detection evaluation (#2503)
    • [References] Removed obsolete import in segmentation. (#2399)

    Misc

    • [Transforms] Added support for negative padding in pad (#2744)
    • [IO] Added type hints for torchvision.io (#2543)
    • [ONNX] Export ROIAlign with aligned=True (#2613)

    Internal

    • [Binaries] Added CUDA 11 binary builds (#2671)
    • [Binaries] Added DEBUG=1 option to build torchvision (#2603)
    • [Binaries] Unpin ninja version (#2358)
    • Warn if torchvision imported from repo root (#2759)
    • Added compatibility checks for C++ extensions (#2467)
    • Added probot (#2448)
    • Added ipynb to git attributes file (#2772)
    • CI improvements (#2328, #2346, #2374, #2437, #2465, #2579, #2577, #2633, #2640, #2727, #2754, #2674, #2678)
    • CMakeList improvements (#2739, #2684, #2626, #2585, #2587)
    • Documentation improvements (#2659, #2615, #2614, #2542, #2685, #2507, #2760, #2550, #2656, #2723, #2601, #2654, #2757, #2592, #2606)

    Bug Fixes

    • [Ops] Fixed crash in deformable convolutions (#2604)
    • [Ops] Added empty batch support for DeformConv2d (#2782)
    • [Transforms] Enforced contiguous output in to_tensor (#2483)
    • [Transforms] Fixed fill parameter for PIL pad (#2515)
    • [Models] Fixed deprecation warning in nonzero for R-CNN models (#2705)
    • [IO] Explicitly cast to size_t in video decoder (#2389)
    • [ONNX] Fixed dynamic resize in Mask R-CNN (#2488)
    • [C++ API] Fixed function signatures for torch::nn::Functional (#2463)

    Deprecations

    • [Transforms] Deprecated the dedicated functional_tensor implementations F_t.center_crop, F_t.five_crop and F_t.ten_crop, as they can be implemented as a function of crop (#2568)
    • [Transforms] Deprecated explicit usage of F_pil and F_t functions, users should instead use the general functional API (#2664)
  • v0.7.0(Jul 28, 2020)

    Highlights

    Mixed precision support for all models

    torchvision models now support mixed-precision training via the new torch.cuda.amp package. Using mixed precision support is easy: just wrap the model and the loss inside a torch.cuda.amp.autocast context manager. Here is an example with Faster R-CNN:

    import torch, torchvision
    
    device = torch.device('cuda')
    
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn()
    model.to(device)
    
    input = [torch.rand(3, 300, 400, device=device)]
    boxes = torch.rand((5, 4), dtype=torch.float32, device=device)
    boxes[:, 2:] += boxes[:, :2]
    target = [{"boxes": boxes,
              "labels": torch.zeros(5, dtype=torch.int64, device=device),
              "image_id": 4,
              "area": torch.zeros(5, dtype=torch.float32, device=device),
              "iscrowd": torch.zeros((5,), dtype=torch.int64, device=device)}]
    
    # use automatic mixed precision
    with torch.cuda.amp.autocast():
        loss_dict = model(input, target)
    losses = sum(loss for loss in loss_dict.values())
    # perform backward outside of autocast context manager
    losses.backward()
    

    New pre-trained segmentation models

    This release adds pre-trained weights for the ResNet50 variants of Fully-Convolutional Networks (FCN) and DeepLabV3. They are available under torchvision.models.segmentation, and can be obtained as follows:

    torchvision.models.segmentation.fcn_resnet50(pretrained=True)
    torchvision.models.segmentation.deeplabv3_resnet50(pretrained=True)
    

    They obtain the following accuracies:

    Network | mean IoU | global pixelwise acc
    -- | -- | --
    FCN ResNet50 | 60.5 | 91.4
    DeepLabV3 ResNet50 | 66.4 | 92.4

    Improved ONNX support for Faster / Mask / Keypoint R-CNN

    This release restores ONNX support for the R-CNN family of models that had been temporarily dropped in the 0.6.0 release, and additionally fixes a number of corner cases in the ONNX export for these models. Notable improvements include support for dynamic input shape export, including images with no detections.
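
    A rough sketch of such an export, mirroring the Mask R-CNN ONNX example shown later in these notes (opset 11 is required for the R-CNN family; the file name below is arbitrary):

    import torch
    import torchvision

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    # dummy input used to trace the model during export
    inputs = [torch.rand(3, 300, 400)]
    torch.onnx.export(model, inputs, "fasterrcnn.onnx",
                      do_constant_folding=True,
                      opset_version=11)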

    Backwards Incompatible Changes

    • [Transforms] Fix for integer fill value in constant padding (#2284)
    • [Models] Replace L1 loss with smooth L1 loss in Faster R-CNN for better performance (#2113)
    • [Transforms] Use torch.rand instead of random.random() for random transforms (#2520)

    New Features

    • [Models] Add mixed-precision support (#2366, #2384)
    • [Models] Add fcn_resnet50 and deeplabv3_resnet50 pretrained models. (#2086, #2091)
    • [Ops] Added eps attribute to FrozenBatchNorm2d (#2190)
    • [Transforms] Add convert_image_dtype to functionals (#2078)
    • [Transforms] Add pil_to_tensor to functionals (#2092)

    Bug Fixes

    • [JIT] Fix virtualenv and torchhub support by removing eager scripting calls (#2248)
    • [IO] Fix write_video when floating point FPS is passed (#2334)
    • [IO] Fix missing compilation files for video-reader (#2183)
    • [IO] Fix missing include for OSX in video decoder (#2224)
    • [IO] Fix overflow error for large buffers. (#2303)
    • [Ops] Fix wrong clamping in RoIAlign with aligned=True (#2438)
    • [Ops] Fix corner case in interpolate (#2146)
    • [Ops] Fix the use of contiguous() in C++ kernels (#2131)
    • [Ops] Restore support of tuple of Tensors for region pooling ops (#2199)
    • [Datasets] Fix bug related with trailing slash on UCF-101 dataset (#2186)
    • [Models] Make copy of targets in GeneralizedRCNNTransform (#2227)
    • [Models] Fix DenseNet issue with gradient checkpoints (#2236)
    • [ONNX] Fix ONNX implementation of heatmaps_to_keypoints in KeypointRCNN (#2312)
    • [ONNX] Fix export of images with no detection for Faster / Mask / Keypoint R-CNN (#2126, #2215, #2272)

    Deprecations

    • [Ops] Deprecate Conv2d, ConvTranspose2d and BatchNorm2d (#2244)
    • [Ops] Deprecate interpolate in favor of PyTorch's implementation (#2252)

    Improvements

    Datasets

    • Fix DatasetFolder error message (#2143)
    • Change range(len) to enumerate in DatasetFolder (#2153)
    • [DOC] Fix link URL to Flickr8k (#2178)
    • [DOC] Add CelebA to docs (#2107)
    • [DOC] Improve documentation of DatasetFolder and ImageFolder (#2112)

    TorchHub

    • Fix torchhub tests due to numerical changes in torch.sum (#2361)
    • Add all the latest models to hubconf (#2189)

    Transforms

    • Add fill argument to __repr__ of RandomRotation (#2340)
    • Add tensor support for adjust_hue (#2300, #2355)
    • Make ColorJitter torchscriptable (#2298)
    • Make RandomHorizontalFlip and RandomVerticalFlip torchscriptable (#2282)
    • [DOC] Use consistent symbols in the doc of Normalize to avoid confusion (#2181)
    • [DOC] Fix typo in hflip in functional.py (#2177)
    • [DOC] Fix spelling errors in functional.py (#2333)

    IO

    • Refactor video.py to improve clarity (#2335)
    • Save memory by not storing full frames in read_video_timestamps (#2202, #2268)
    • Improve warning when video_reader backend is not available (#2225)
    • Set should_buffer to True by default in _read_from_stream (#2201)
    • [Test] Temporarily disable one PyAV test (#2150)

    Models

    • Improve target checks in GeneralizedRCNN (#2207, #2258)
    • Use Module objects instead of functions for some layers of Inception3 (#2287)
    • Add support for other normalizations in MobileNetV2 (#2267)
    • Expose layer freezing option to detection models (#2160, #2242)
    • Make ASPP-Layer in DeepLab more generic (#2174)
    • Faster initialization for Inception family of models (#2170, #2211)
    • Make norm_layer a parameter in models/detection/backbone_utils.py (#2081)
    • Updates integer division to use floor division operator (#2234, #2243)
    • [JIT] Clean up no longer needed workarounds for torchscript support (#2249, #2261, #2210)
    • [DOC] Add docs to clarify aspect ratio definition in RPN. (#2185)
    • [DOC] Fix roi_heads argument name in docstring of GeneralizedRCNN (#2093)
    • [DOC] Fix type annotation in RPN docstring (#2149)
    • [DOC] add clarifications to Object detection reference documentation (#2241)
    • [Test] Add tests for negative samples for Mask R-CNN and Keypoint R-CNN (#2069)

    Reference scripts

    • Add support for SyncBatchNorm in QAT reference script (#2230, #2280)
    • Fix training resuming in references/segmentation (#2142)
    • Rename image to images in references/detection/engine.py (#2187)

    ONNX

    • Add support for dynamic input shape export in R-CNN models (#2087)

    Ops

    • Added number of features in FrozenBatchNorm2d __repr__ (#2168)
    • improve consistency among box IoU CPU / GPU calculations (#2072)
    • Avoid using in header files (#2257)
    • Make ceil_div __host__ __device__ (#2217)
    • Don't include CUDAApplyUtils.cuh (#2127)
    • Add namespace to avoid conflict with ATen version of channel_shuffle() (#2206)
    • [DOC] Update the statement of supporting torchscript ops (#2343)
    • [DOC] Update torchvision ops in doc (#2341)
    • [DOC] Improve documentation for NMS (#2159)
    • [Test] Add more tests to NMS (#2279)

    Misc

    • Add PyTorch version compatibility table to README (#2260)
    • Fix lint (#2182, #2226, #2070)
    • Update version to 0.6.0 in CMake (#2140)
    • Remove mock (#2096)
    • Remove warning about deprecated (#2064)
    • Cleanup unused import (#2067)
    • Type annotations for torchvision/utils.py (#2034)

    CI

    • Add version suffix to build version
    • Add backslash to escape
    • Add workflows to run on tag
    • Bump version to 0.7.0, pin PyTorch to 1.6.0
    • Update link for cudnn 10.2 (#2277)
    • Fix binary builds with CUDA 9.2 on Windows (#2273)
    • Remove Python 3.5 from CI (#2158)
    • Improvements to CI infra (#2075, #2071, #2058, #2073, #2099, #2137, #2204, #2264, #2274, #2319)
    • Master version bump 0.6 -> 0.7 (#2102)
    • Add test channels for pytorch version functions (#2208)
    • Add static type check with mypy (#2195, #1696, #2247)
  • v0.6.1(Jun 22, 2020)

  • v0.6.0(Apr 21, 2020)

    This release is the first one that officially drops support for Python 2. It contains a number of improvements and bugfixes.

    Highlights

    Faster/Mask/Keypoint RCNN supports negative samples

    It is now possible to feed training images to Faster / Mask / Keypoint R-CNN that do not contain any positive annotations. This enables increasing the number of negative samples during training. For those images, the annotations expect a tensor with 0 in the number of objects dimension, as follows:

    target = {"boxes": torch.zeros((0, 4), dtype=torch.float32),
              "labels": torch.zeros(0, dtype=torch.int64),
              "image_id": 4,
              "area": torch.zeros(0, dtype=torch.float32),
              "masks": torch.zeros((0, image_height, image_width), dtype=torch.uint8),
              "keypoints": torch.zeros((17, 0, 3), dtype=torch.float32),
              "iscrowd": torch.zeros((0,), dtype=torch.int64)}
    
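    As a rough illustration, such a target can be passed to a detection model during training alongside regular positive samples; only the keys required by the chosen model need to be present:

    import torch
    import torchvision

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)
    images = [torch.rand(3, 300, 400)]
    # a purely negative image: zero boxes and zero labels
    targets = [{"boxes": torch.zeros((0, 4), dtype=torch.float32),
                "labels": torch.zeros(0, dtype=torch.int64)}]
    loss_dict = model(images, targets)  # returns the usual dict of losses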

    Aligned flag for RoIAlign

    RoIAlign now supports the aligned flag, which more precisely aligns two neighboring pixel indices.
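
    A minimal sketch of the flag in action (the feature map and box values are made up for illustration):

    import torch
    from torchvision.ops import roi_align

    features = torch.rand(1, 16, 32, 32)
    # boxes as a Tensor[K, 5]: (batch_index, x1, y1, x2, y2)
    boxes = torch.tensor([[0.0, 4.0, 4.0, 20.0, 20.0]])

    # aligned=True shifts the box coordinates by -0.5 for a more precise alignment
    pooled = roi_align(features, boxes, output_size=(7, 7),
                       spatial_scale=1.0, sampling_ratio=2, aligned=True)
    print(pooled.shape)  # torch.Size([1, 16, 7, 7])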

    Refactored abstractions for C++ video decoder

    This change is transparent to Python users, but the whole C++ backend for video reading (which needs torchvision to be compiled from source for it to be enabled for now) has been refactored into more modular abstractions. The core abstractions are in https://github.com/pytorch/vision/tree/master/torchvision/csrc/cpu/decoder, and the video reader functions exposed to Python, by leveraging those abstractions, can be written in a much more concise way.

    Backwards Incompatible Changes

    • Dropping Python2 support (#1761, #1792, #1984, #1976, #2037, #2033, #2017)
    • [Models] Fix inception quantized pre-trained model (#1954, #1969, #1975)
    • ONNX support for Mask R-CNN and Keypoint R-CNN has been temporarily dropped, but will be fixed in next releases

    New Features

    • [Transforms] Add Perspective fill option (#1973)
    • [Ops] aligned flag in ROIAlign (#1908)
    • [IO] Update video reader to use new decoder (#1978)
    • [IO] torchscriptable functions for video io (#1653, #1794)
    • [Models] Support negative samples in Faster R-CNN, Mask R-CNN and Keypoint R-CNN (#1911, #2069)

    Improvements

    Datasets

    • STL10: don't check integrity twice when download=True (#1787)
    • Improve code readability and docstring of video datasets (#2020)
    • [DOC] Fixed typo in Cityscapes docs (#1851)

    Transforms

    • Allow passing list to the input argument 'scale' of RandomResizedCrop (#1997) (#2008)
    • F.normalize unsqueeze mean & std only for 1-d arrays (#2002)
    • Improved error messages for transforms.functional.normalize(). (#1915)
    • generalize number of bands calculation in to_tensor (#1781)
    • Replace 2 transpose ops with 1 permute in ToTensor (#2018)
    • Fixed Pillow version check for Pillow >= 10 (#2039)
    • [DOC]: Improve transforms.Normalize docs (#1784, #1858)
    • [DOC] Fixed missing new line in transforms.Crop docstring (#1922)

    Ops

    • Check boxes shape in RoIPool / Align (#1968)
    • [ONNX] Export new_empty_tensor (#1733)
    • Fix Tensor::data<> deprecation. (#2028)
    • Fix deprecation warnings (#2055)

    Models

    • Add warning and note docs for scipy (#1842) (#1966)
    • Added repr attribute to GeneralizedRCNNTransform (#1834)
    • Replace mean on dimensions 2,3 by adaptive_avg_pooling2d in mobilenet (#1838)
    • Add init_weights keyword argument to Inception3 (#1832)
    • Add device to torch.tensor. (#1979)
    • ONNX export for variable input sizes in Faster R-CNN (#1840)
    • [JIT] Cleanup torchscript constant annotations (#1721, #1923, #1907, #1727)
    • [JIT] use // now that it is supported (#1658)
    • [JIT] add @torch.jit.script to ImageList (#1919)
    • [DOC] Improved docs for Faster R-CNN (#1886, #1868, #1768, #1763)
    • [DOC] add comments for the modified implementation of ResNet (#1983)
    • [DOC] Add comments to AnchorGenerator (#1941)
    • [DOC] Add comment in GoogleNet (#1932)

    Documentation

    • Document int8 quantization model (#1951)
    • Update Doc with ONNX support (#1752)
    • Update README to reflect strict dependency on torch==1.4.0 (#1767)
    • Update sphinx theme (#2031)
    • Document origin of preprocessing mean / std (#1965)
    • Fix docstring formatting issues (#2049)

    Reference scripts

    • Add return statement in evaluate function of detection reference script (#2029)
    • [DOC]Add default training parameters to classification reference README (#1998)
    • [DOC] Add README to references/segmentation (#1864)

    Tests

    • Improve stability of test_nms_cuda (#2044)
    • [ONNX] Disable model tests since export of interpolate script module is broken (#1989)
    • Skip inception v3 in test/test_quantized_models (#1885)
    • [LINT] Small indentation fix (#1831)

    Misc

    • Remove unintentional -O0 option in setup.py (#1770)
    • Create CODE_OF_CONDUCT.md
    • Update issue templates (#1913, #1914)
    • master version bump 0.5 → 0.6
    • replace torch 1.5.0 items flagged with deprecation warnings (fix #1906) (#1918)
    • CUDA_SUFFIX → PYTORCH_VERSION_SUFFIX

    CI

    • Remove av from the binary requirements (#2006)
    • ci: Add cu102 to CI and packaging, remove cu100 (#1980)
    • .circleci: Switch to use token for conda uploads (#1960)
    • Improvements to CI infra (#2051, #2032, #2046, #1735, #2048, #1789, #1731, #1961)
    • typing only needed for python 3.5 and previous (#1778)
    • Move C++ and Python linter to CircleCI (#2056, #2057)

    Bug Fixes

    Datasets

    • bug fix on downloading voc2007 test dataset (#1991)
    • fix lsun docstring example (#1935)
    • Fix wrong EMNIST classes attribute (#1716, #1736)
    • Force object annotation to be a list in VOC (#1790)

    Models

    • Fix for AnchorGenerator when a device switch happens (#1745)
    • [JIT] fix len error (#1981)
    • [JIT] fix googlenet no aux logits (#1949)
    • [JIT] Fix quantized googlenet (#1974)

    Transforms

    • Fix for rotate fill with Images of type F (#1828)
    • Fix fill in rotate (#1760)

    Ops

    • Fix bug in DeformConv2d for batch sizes > 32 (#2027, #2040)
    • Fix for roi_align ONNX export (#1988)
    • Fix torchscript issue in ConvTranspose2d (#1917)
    • Fix interpolate when no scale_factor is passed (#1785)
    • Fix Windows build by renaming Python init functions (#1779)
    • fix for loading models with num_batches_tracked in frozen bn (#1728)

    Deprecations

    • The pts_unit value 'pts' of read_video and read_video_timestamps is deprecated, and will be replaced by 'sec' in upcoming releases.
  • v0.5.0(Jan 15, 2020)

    This release brings several new additions to torchvision that improves support for deployment. Most notably, all models in torchvision are torchscript-compatible, and can be exported to ONNX. Additionally, a few classification models have quantized weights.

    Note: this is the last version of torchvision that officially supports Python 2.

    Breaking changes

    Updated KeypointRCNN pre-trained weights

    The pre-trained weights for keypointrcnn_resnet50_fpn have been updated and now correspond to the results reported in the documentation. The previous weights corresponded to an intermediate training checkpoint. (#1609)

    Corrected the implementation for MNASNet

    The previous implementation contained a bug which affects all MNASNet variants other than mnasnet1_0. The bug was that the first few layers needed to also be scaled in terms of width multiplier, along with all the rest. We now provide a new checkpoint for mnasnet0_5, which gives 32.17 top1 error. (#1224)

    Highlights

    TorchScript support for all models

    All models in torchvision have native support for torchscript, for both training and testing. This includes complex models such as DeepLabV3, Mask R-CNN and Keypoint R-CNN. Using torchscript with torchvision models is easy:

    # get a pre-trained model
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    
    # convert to torchscript
    model_script = torch.jit.script(model)
    model_script.eval()
    
    # compute predictions
    predictions = model_script([torch.rand(3, 300, 300)])
    

    Warning: the return type for the scripted version of Faster R-CNN, Mask R-CNN and Keypoint R-CNN is different from its eager counterpart, and it always returns a tuple of losses, detections. This discrepancy will be addressed in a future release.

    ONNX

    All models in torchvision can now be exported to ONNX for deployment. This includes models such as Mask R-CNN.

    # get a pre-trained model
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    model.eval()
    inputs = [torch.rand(3, 300, 300)]
    predictions = model(inputs)
    
    # convert to ONNX
    torch.onnx.export(model, inputs, "model.onnx",
                      do_constant_folding=True,
                      opset_version=11  # opset_version 11 required for Mask R-CNN
                      )
    

    Warning: for Faster R-CNN / Mask R-CNN / Keypoint R-CNN, the current exported model is dependent on the input shape during export. As such, make sure that once the model has been exported to ONNX that all images that are fed to it have the same shape as the shape used to export the model to ONNX. This behavior will be made more general in a future release.

    Quantized models

    torchvision now provides quantized models for ResNet, ResNext, MobileNetV2, GoogleNet, InceptionV3 and ShuffleNetV2, as well as reference scripts for quantizing your own model in references/classification/train_quantization.py (https://github.com/pytorch/vision/blob/master/references/classification/train_quantization.py). A pre-trained quantized model can be obtained with a few lines of code:

    model = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=True)
    model.eval()
    
    # run the model with quantized inputs and weights
    out = model(torch.rand(1, 3, 224, 224))
    

    We provide pre-trained quantized weights for the following models:

    Model | Acc@1 | Acc@5
    --- | --- | ---
    MobileNet V2 | 71.658 | 90.150
    ShuffleNet V2 | 68.360 | 87.582
    ResNet 18 | 69.494 | 88.882
    ResNet 50 | 75.920 | 92.814
    ResNext 101 32x8d | 78.986 | 94.480
    Inception V3 | 77.084 | 93.398
    GoogleNet | 69.826 | 89.404

    Torchscript support for torchvision.ops

    torchvision ops are now natively supported by torchscript. This includes operators such as nms, roi_align and roi_pool, and for the ops that support backpropagation, both eager and torchscript modes are supported in autograd.
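
    For example, a scripted function can call these ops directly; a minimal sketch using nms (the boxes and scores below are random):

    import torch
    from torchvision.ops import nms

    @torch.jit.script
    def keep_after_nms(boxes: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
        # returns the indices of the boxes kept after non-maximum suppression
        return nms(boxes, scores, iou_threshold=0.5)

    boxes = torch.rand(10, 4) * 100
    boxes[:, 2:] += boxes[:, :2]
    scores = torch.rand(10)
    print(keep_after_nms(boxes, scores))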

    New operators

    Deformable Convolution (#1586) (#1660) (#1637)

    As described in Deformable Convolutional Networks (https://arxiv.org/abs/1703.06211), torchvision now supports deformable convolutions. The module expects both the input and the offsets as arguments, and can be used as follows:

    from torchvision import ops
    
    module = ops.DeformConv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)
    x = torch.rand(1, 1, 10, 10)
    
    # the number of channels for the offset should be a multiple of
    # 2 * module.weight.size(2) * module.weight.size(3), which corresponds
    # to the kernel_size
    offset = torch.rand(1, 2 * 3 * 3, 10, 10)
    
    # the output requires both the input and the offsets
    out = module(x, offset)
    

    If needed, the user can create their own wrapper module that imposes constraints on the offset. Here is an example, using a single convolution layer to compute the offset:

    import torch.nn as nn
    from torchvision import ops

    class BasicDeformConv2d(nn.Module):
        def __init__(self, in_channels, out_channels, kernel_size=1, stride=1,
                     dilation=1, groups=1, offset_groups=1):
            super().__init__()
            offset_channels = 2 * kernel_size * kernel_size
            self.conv2d_offset = nn.Conv2d(
                in_channels,
                offset_channels * offset_groups,
                kernel_size=3,
                stride=stride,
                padding=dilation,
                dilation=dilation,
            )
            self.conv2d = ops.DeformConv2d(
                in_channels,
                out_channels,
                kernel_size=kernel_size,
                stride=stride,
                padding=dilation,
                dilation=dilation,
                groups=groups,
                bias=False
            )
        
        def forward(self, x):
            offset = self.conv2d_offset(x)
            return self.conv2d(x, offset)
    

    Position-sensitive RoI Pool / Align (#1410)

    This adds the Position-Sensitive Region of Interest (RoI) Align operator mentioned in Light-Head R-CNN (https://arxiv.org/abs/1711.07264). It is available under ops.ps_roi_align, ps_roi_pool and the module equivalents ops.PSRoIAlign and ops.PSRoIPool, which have the same interface as RoIAlign / RoIPool.
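
    A minimal sketch of ps_roi_align with made-up values; note that the number of input channels must be divisible by the number of output bins:

    import torch
    from torchvision import ops

    # 9 input channels for a 3x3 output: each bin gets its own group of channels
    features = torch.rand(1, 9, 32, 32)
    # boxes as a Tensor[K, 5]: (batch_index, x1, y1, x2, y2)
    boxes = torch.tensor([[0.0, 1.0, 1.0, 20.0, 20.0]])

    out = ops.ps_roi_align(features, boxes, output_size=(3, 3), spatial_scale=1.0)
    print(out.shape)  # torch.Size([1, 1, 3, 3])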

    New Features

    TorchScript support

    • Bugfix in BalancedPositiveNegativeSampler introduced during torchscript support (#1670)
    • Make R-CNN models less verbose in script mode (#1671)
    • Minor torchscript fixes for Mask R-CNN (#1639)
    • remove BC-breaking changes (#1560)
    • Make maskrcnn scriptable (#1407)
    • Add Script Support for Video Resnet Models (#1393)
    • fix ASPPPooling (#1575)
    • Test that torchhub models are scriptable (#1242)
    • Make Googlnet & InceptionNet scriptable (#1349)
    • Make fcn_resnet Scriptable (#1352)
    • Make Densenet Scriptable (#1342)
    • make resnext scriptable (#1343)
    • make shufflenet and resnet scriptable (#1270)

    ONNX

    • Enable KeypointRCNN test (#1673)
    • enable mask rcnn test (#1613)
    • Changes to Enable KeypointRCNN ONNX Export (#1593)
    • Disable Profiling in Failing Test (#1585)
    • Enable ONNX Test for FasterRcnn (#1555)
    • Support Exporting Mask Rcnn to ONNX (#1461)
    • Lahaidar/export faster rcnn (#1401)
    • Support Exporting RPN to ONNX (#1329)
    • Support Exporting MultiScaleRoiAlign to ONNX (#1324)
    • Support Exporting GeneralizedRCNNTransform to ONNX (#1325)

    Quantization

    • Update quantized shufflenet weights (#1715)
    • Add commands to run quantized model with pretrained weights (#1547)
    • Quantizable googlenet, inceptionv3 and shufflenetv2 models (#1503)
    • Quantizable resnet and mobilenet models (#1471)
    • Remove model download from test_quantized_models (#1526)

    Improvements

    Bugfixes

    • Bugfix on GroupedBatchSampler for corner case where there are not enough examples in a category to form a batch (#1677)
    • Fix rpn memory leak and dataType errors. (#1657)
    • Fix torchvision install due to zipped egg (#1536)

    Transforms

    • Make shear operation area preserving (#1529)
    • PILLOW_VERSION deprecation updates (#1501)
    • Adds optional fill colour to rotate (#1280)

    Ops

    • Add Deformable Convolution operation. (#1586) (#1660) (#1637)
    • Fix inconsistent NMS implementation between CPU and CUDA (#1556)
    • Speed up nms_cuda (#1704)
    • Implementation for Position-sensitive ROI Pool/Align (#1410)
    • Remove cpp extensions in favor of torch ops (#1348)
    • Make custom ops differentiable (#1314)
    • Fix Windows build in Torchvision Custom op Registration (#1320)
    • Revert "Register Torchvision Ops as Cutom Ops (#1267)" (#1316)
    • Register Torchvision Ops as Cutom Ops (#1267)
    • Use Tensor.data_ptr instead of .data (#1262)
    • Fix header includes for cpu (#1644)

    Datasets

    • fixed test for windows by closing the created temporary files (#1662)
    • VideoClips windows fixes (#1661)
    • Fix VOC on Windows (#1641)
    • update dead LSUN link (#1626)
    • DatasetFolder should follow links when searching for data (#1580)
    • add .tgz support to extract_archive (#1650)
    • expose audio_channels as a parameter to kinetics dataset (#1559)
    • Implemented integrity check (md5 hash) after dataset download (#1456)
    • Move VideoClips dummy dataset to top level for pickling (#1649)
    • Remove download for ImageNet (#1457)
    • add tar.xz archive handler (#1361)
    • Fix DeprecationWarning for collections.Iterable import in LSUN (#1417)
    • Support empty target_type for CelebA dataset (#1351)
    • VOC2007 support test set (#1340)
    • Fix EMNIST download URL (#1297) (#1318)
    • Refactored clip_sampler (#1562)

    Documentation

    • Fix documentation for NMS (#1614)
    • More examples of functional transforms (#1402)
    • Fixed doc of crop functionals (#1388)
    • Added Training Sample code for fasterrcnn_resnet50_fpn (#1695)
    • Fix rpn.py typo (#1276)
    • Update README with minimum required version of PyTorch (#1272)
    • fix alignment of README (#1396)
    • fixed typo in DatasetFolder and ImageFolder (#1284)

    Models

    • Bugfix for MNASNet (#1224)
    • Fix anchor dtype in AnchorGenerator (#1341)

    Utils

    • Adding File object option to utils.save_image (#1301)
    • Fix make_grid: support any number of channels in tensor (#1300)
    • Fix bug of changing input tensor in utils.save_image (#1244)

    Reference scripts

    • add a README for training object detection models (#1612)
    • Adding args for names of train and val directories (#1544)
    • Fix broken bitwise operation in Similarity Reference loss (#1604)
    • Fixing issue #1530 by starting ann_id to 1 in convert_to_coco_api (#1531)
    • Add commands for model training (#1203)
    • adding documentation for automatic mixed precision training (#1533)
    • Fix reference training script for Mask R-CNN for PyTorch 1.2 (during evaluation after epoch, mask datatype became bool, pycocotools expects uint8) (#1413)
    • fix a little bug about resume (#1628)
    • Better explain lr and batch size in references/detection/train.py (#1233)
    • update default parameters in references/detection (#1611)
    • Removed code redundancy / refactored in video_classification (#1549)
    • Fix comment in default arguments in references/detection (#1243)

    Tests

    • Correctness test implemented with old test architecture (#1511)
    • Simplify and organize test_ops. (#1551)
    • Replace asserts with assertEqual (#1488)(#1499)(#1497)(#1496)(#1498)(#1494)(#1487)(#1495)
    • Add expected result tests (#1377)
    • Add TorchHub tests to torchvision (#1319)
    • Scriptability checks for Tensor Transforms (#1690)
    • Add tests for results in script vs eager mode (#1430)
    • Test for checking non mutating behaviour of tensor transforms (#1656)
    • Disable download tests for Python2 (#1269)
    • Fix randomresized params flaky (#1282)

    CI

    • Disable C++ models from being compiled without explicit request (#1535)
    • Fix discrepancy in regenerate.py (#1583)
    • soumith -> pytorch for docker images (#1577)
    • [wip] try vs2019 toolchain (#1509)
    • Make CI use PyTorch nightly (#1492)
    • Try enabling Windows CUDA CI (#1486)
    • Fix CUDA builds on Windows (#1485)
    • Try fix Windows CircleCI (#1433)
    • Fix CUDA CI (#1464)
    • Change approach for rebase to master (#1427)
    • Temporary fix for CI (#1411)
    • Use PyTorch 1.3 for CI (#1467)
    • Use links from S3 to install CUDA (#1472)
    • Enable CUDA 9.2 builds for Windows (#1381)
    • Fix nightly builds (#1374)
    • Fix Windows CI after #1301 (#1368)
    • Retry anaconda login for Windows builds (#1366)
    • Fix nightly wheels builds for Windows (#1358)
    • Fix CI for py2.7 cu100 wheels (#1354)
    • Fix Windows CI (#1347)
    • Windows build scripts (#1241)
    • Make CircleCI checkout merge commit (#1344)
    • use native python code generation logic (#1321)
    • Add CircleCI (v2) (#1298)
  • v0.4.2(Nov 7, 2019)

    This minor release introduces an optimized video_reader backend for torchvision. It is implemented in C++, and uses FFmpeg internally.

    The new video_reader backend can be up to 6 times faster compared to the pyav backend.

    • When decoding all video/audio frames in the video, the new video_reader is 1.2x - 6x faster depending on the codec and video length.
    • When decoding a fixed number of video frames (e.g. [4, 8, 16, 32, 64, 128]), video_reader runs equally fast for small values (i.e. [4, 8, 16]) and runs up to 3x faster for large values (e.g. [32, 64, 128]).

    Using the optimized video backend

    Switching to the new backend can be done via the torchvision.set_video_backend('video_reader') function. By default, we use a backend based on top of PyAV.

    Due to packaging issues with FFmpeg, in order to use the video_reader backend one needs to first have ffmpeg available on the system, and then compile torchvision from source using the instructions from https://github.com/pytorch/vision#installation
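
    A minimal sketch of switching backends and reading a video, assuming torchvision was compiled from source with ffmpeg available (the file path is a placeholder):

    import torchvision
    from torchvision.io import read_video

    torchvision.set_video_backend('video_reader')

    # video is a [T, H, W, C] uint8 tensor, audio a [K, L] tensor, info a dict of metadata
    video, audio, info = read_video('path_to_video.mp4', pts_unit='sec')
    print(video.shape, info)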

    Deprecations

    In torchvision 0.4.0, the read_video and read_video_timestamps functions used pts relative to the video stream. This could lead to unaligned video-audio being returned in some cases.

    torchvision now allows specifying a pts_unit argument in those functions. The default value is 'pts' (with the same behavior as before), and the user can now specify pts_unit='sec', which produces consistently aligned results for both video and audio. The 'pts' value is deprecated for now, and kept for backwards-compatibility.

    In the next release, the default value of pts_unit will change to 'sec', so that calling read_video without specifying pts_unit returns consistently aligned audio-video results. This will require users to update their VideoClips checkpoints, which used to store the information in pts by default.

    Changelog

    • [video reader] inception commit (#1303) 31fad34
    • Expose frame-rate and cache to video datasets (#1356) 85ffd93
    • Expose num_workers in VideoClips (#1359) 02a8c0a
    • Fix randomresized params flaky (#1282) 7c9bbf5
    • Video transforms (#1353) 64917bc
    • add _backend argument to init() of class VideoClips (#1363) 7874374
    • Video clips workers (#1369) 0982395
    • modified code of io.read_video and io.read_video_timestamps to interpret pts values in seconds (#1331) 17e355f
    • add metadata to video dataset classes. bug fix. more robustness (#1376) 49b01e3
    • move sampler into TV core. Update UniformClipSampler (#1408) f0d3daa
    • remove hardcoded video extension in kinetics400 dataset (#1418) 929c81d
    • Fix hmdb51 and ucf101 typo (#1420) b13931a
    • fix a bug related to audio_end_pts (#1431) 1258bb7
    • expose more io api (#1423) e48b958
    • Make video transforms private (#1429) 79daca1
    • extend video reader to support fast video probing (#1437) ed5b2dc
    • Better handle corrupted videos (#1463) da89dad
    • Temporary fix to remove ffmpeg from build time (#1475) ed04dee
    • fix a bug when video decoding fails and empty frames are returned (#1506) 2804c12
    • extend DistributedSampler to support group_size (#1512) 355e9d2
    • Unify video backend (#1514) 97b53f9
    • Unify video metadata in VideoClips (#1527) 7d509c5
    • Fixed compute_clips docstring (#1543) b438d32
  • v0.4.1(Oct 30, 2019)

    This minor release provides binaries compatible with PyTorch 1.3.

    Compared to version 0.4.0, it contains a single bugfix for HMDB51 and UCF101 datasets, fixed in https://github.com/pytorch/vision/pull/1240

  • v0.4.0(Aug 8, 2019)

    This release adds support for video models and datasets, and brings several improvements.

    Note: torchvision 0.4 requires PyTorch 1.2 or newer

    Highlights

    Video and IO

    Video is now a first-class citizen in torchvision. The 0.4 release includes:

    • efficient IO primitives for reading and writing video files
    • Kinetics-400, HMDB51 and UCF101 datasets for action recognition, which are compatible with torch.utils.data.DataLoader
    • Pre-trained models for action recognition, trained on Kinetics-400
    • Training and evaluation scripts for reproducing the training results.

    Writing your own video dataset is easy. We provide a utility class VideoClips that simplifies the task of enumerating all possible clips of fixed size in a list of video files, by creating an index of all clips in a set of videos. It additionally allows specifying a fixed frame-rate for the videos.

    from torchvision.datasets.video_utils import VideoClips
    
    class MyVideoDataset(object):
        def __init__(self, video_paths):
            self.video_clips = VideoClips(video_paths,
                                          clip_length_in_frames=16,
                                          frames_between_clips=1,
                                          frame_rate=15)
    
        def __getitem__(self, idx):
            video, audio, info, video_idx = self.video_clips.get_clip(idx)
            return video, audio
        
        def __len__(self):
            return self.video_clips.num_clips()
    

    We provide pre-trained models for action recognition, trained on Kinetics-400, which reproduce the results of the original papers where they were first introduced, as well as the corresponding training scripts.

    model | clip @ 1
    --- | ---
    r3d_18 | 52.748
    mc3_18 | 53.898
    r2plus1d_18 | 57.498
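
    A minimal sketch of loading one of the pretrained video models and classifying a dummy clip (the clip contents below are random):

    import torch
    import torchvision

    model = torchvision.models.video.r3d_18(pretrained=True)
    model.eval()

    # a batch of one clip: [batch, channels, frames, height, width]
    clip = torch.rand(1, 3, 16, 112, 112)
    with torch.no_grad():
        scores = model(clip)
    print(scores.shape)  # torch.Size([1, 400]), one score per Kinetics-400 class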

    Bugfixes

    • change aspect ratio calculation formula in references/detection (#1194)
    • bug fixes in ImageNet (#1149)
    • fix save_image when height or width equals 1 (#1059)
    • Fix STL10 __repr__ (#969)
    • Fix wrong behavior of GeneralizedRCNNTransform in Python2. (#960)

    Datasets

    New

    • Add USPS dataset (#961)(#1117)
    • Added support for the QMNIST dataset (#995)
    • Add HMDB51 and UCF101 datasets (#1156)
    • Add Kinetics400 dataset (#1077)

    Improvements

    • Miscellaneous dataset fixes (#1174)
    • Standardize str argument verification in datasets (#1167)
    • Always pass transform and target_transform to abstract dataset (#1126)
    • Remove duplicate transform assignment in FakeDataset (#1125)
    • Automatic extraction for Cityscapes Dataset (#1066) (#1068)
    • Use joint transform in Cityscapes (#1024)(#1045)
    • CelebA: track attr names, support split="all", code cleanup (#1008)
    • Add folds option to STL10 (#914)

    Models

    New

    • Add pretrained Wide ResNet (#912)
    • Memory efficient densenet (#1003) (#1090)
    • Implementation of the MNASNet family of models (#829)(#1043)(#1092)
    • Add VideoModelZoo models (#1130)

    Improvements

    • Fix resnet fpn backbone for resnet18 and resnet34 (#1147)
    • Add checks to roi_heads in detection module (#1091)
    • Make shallow copy of input list in GeneralizedRCNNTransform (#1085)(#1111)(#1084)
    • Make MobileNetV2 number of channel divisible by 8 (#1005)
    • typo fix: ouput -> output in Inception and GoogleNet (#1034)
    • Remove empty proposals from the RPN (#1026)
    • Remove empty boxes before NMS (#1019)
    • Reduce code duplication in segmentation models (#1009)
    • allow user to define residual settings in MobileNetV2 (#965)
    • Use flatten instead of view (#1134)

    Documentation

    • Consistency in detection box format (#1110)
    • Fix Mask R-CNN docs (#1089)
    • Add paper references to VGG and Resnet variants (#1088)
    • Doc, Test Fixes in Normalize (#1063)
    • Add transforms doc to more datasets (#1038)
    • Corrected typo: 5 to 0.5 (#1041)
    • Update doc for torchvision.transforms.functional.perspective (#1017)
    • Improve documentation for fillcolor option in RandomAffine (#994)
    • Fix COCO_INSTANCE_CATEGORY_NAMES (#991)
    • Added models information to documentation. (#985)
    • Add missing import in faster_rcnn.py documentation (#979)
    • Improve make_grid docs (#964)

    Tests

    • Add test for SVHN (#1086)
    • Add tests for Cityscapes Dataset (#1079)
    • Update CI to Python 3.6 (#1044)
    • Make test_save_image more robust (#1037)
    • Add a generic test for the datasets (#1015)
    • moved fakedata generation to separate module (#1014)
    • Create imagenet fakedata on-the-fly (#1012)
    • Minor test refactorings (#1011)
    • Add test for CIFAR10(0) (#1010)
    • Mock MNIST download for less flaky tests (#1004)
    • Add test for ImageNet (#976)(#1006)
    • Add tests for datasets (#966)

    Transforms

    New

    • Add Random Erasing for image augmentation (#909) (#1060) (#1087) (#1095)

    Improvements

    • Allowing 'F' mode for 1 channel FloatTensor in ToPILImage (#1100)
    • Add shear parallel to y-axis (#1070)
    • fix error message in to_tensor (#1000)
    • Fix TypeError in RandomResizedCrop.get_params (#1036)
    • Fix normalize for different dtype than float32 (#1021)

    Ops

    • Renamed vision.h files to vision_cpu.h and vision_cuda.h (#1051)(#1052)
    • Optimize nms_cuda by avoiding extra torch.cat call (#945)

    Reference scripts

    • Expose data-path in the detection reference scripts (#1109)
    • Make utils.py work with pytorch-cpu (#1023)
    • Add mixed precision training with Apex (#972)(#1124)
    • Add reference code for similarity learning (#1101)

    Build

    • Add windows build steps and wheel build scripts (#998)
    • add packaging scripts (#996)
    • Allow forcing GPU build with FORCE_CUDA=1 (#927)

    Misc

    • Misc lint fixes (#1020)
    • Reraise error on failed downloading (#1013)
    • add more hub models (#974)
    • make C extension lazy-import (#971)
  • v0.3.0(May 22, 2019)

    This release brings several new features to torchvision, including models for semantic segmentation, object detection, instance segmentation and person keypoint detection, and custom C++ / CUDA ops specific to computer vision.

    Note: torchvision 0.3 requires PyTorch 1.1 or newer

    Highlights

    Reference training / evaluation scripts

    We now provide under the references/ folder scripts for training and evaluation of the following tasks: classification, semantic segmentation, object detection, instance segmentation and person keypoint detection. Their purpose is twofold:

    • serve as a log of how to train a specific model.
    • provide baseline training and evaluation scripts to bootstrap research

    They all have an entry-point train.py which performs both training and evaluation for a particular task. Other helper files, specific to each training script, are also present in the folder, and they might get integrated into the torchvision library in the future.

    We expect that users will copy-paste and modify those reference scripts for their own needs.

    TorchVision Ops

    TorchVision now contains custom C++ / CUDA operators in torchvision.ops. Those operators are specific to computer vision, and make it easier to build object detection models. Those operators currently do not support PyTorch script mode, but support for it is planned for future releases.

    List of supported ops

    • roi_pool (and the module version RoIPool)
    • roi_align (and the module version RoIAlign)
    • nms, for non-maximum suppression of bounding boxes
    • box_iou, for computing the intersection over union metric between two sets of bounding boxes

    All the other ops present in torchvision.ops and its subfolders are experimental, in particular:

    • FeaturePyramidNetwork is a module that adds a FPN on top of a module that returns a set of feature maps.
    • MultiScaleRoIAlign is a wrapper around roi_align that works with multiple feature map scales

    Here are a few examples on using torchvision ops:

    import torch
    import torchvision
    
    # create 10 random boxes
    boxes = torch.rand(10, 4) * 100
    # they need to be in [x0, y0, x1, y1] format
    boxes[:, 2:] += boxes[:, :2]
    # create a random image
    image = torch.rand(1, 3, 200, 200)
    # extract regions in `image` defined in `boxes`, rescaling
    # them to have a size of 3x3
    pooled_regions = torchvision.ops.roi_align(image, [boxes], output_size=(3, 3))
    # check the size
    print(pooled_regions.shape)
    # torch.Size([10, 3, 3, 3])
    
    # or compute the intersection over union between
    # all pairs of boxes
    print(torchvision.ops.box_iou(boxes, boxes).shape)
    # torch.Size([10, 10])
    

    Models for more tasks

    The 0.3 release of torchvision includes pre-trained models for other tasks than image classification on ImageNet. We include two new categories of models: region-based models, like Faster R-CNN, and dense pixelwise prediction models, like DeepLabV3.

    Object Detection, Instance Segmentation and Person Keypoint Detection models

    Warning: The API is currently experimental and might change in future versions of torchvision

    The 0.3 release contains pre-trained models for Faster R-CNN, Mask R-CNN and Keypoint R-CNN, all of them using ResNet-50 backbone with FPN. They have been trained on COCO train2017 following the reference scripts in references/, and give the following results on COCO val2017

    Network | box AP | mask AP | keypoint AP
    -- | -- | -- | --
    Faster R-CNN ResNet-50 FPN | 37.0 | |
    Mask R-CNN ResNet-50 FPN | 37.9 | 34.6 |
    Keypoint R-CNN ResNet-50 FPN | 54.6 | | 65.0

    The implementations of the models for object detection, instance segmentation and keypoint detection are fast, especially during training.

    In the following table, we use 8 V100 GPUs, with CUDA 10.0 and CUDNN 7.4 to report the results. During training, we use a batch size of 2 per GPU, and during testing a batch size of 1 is used.

    For test time, we report the time for the model evaluation and post-processing (including mask pasting in image), but not the time for computing the precision-recall.

    Network | train time (s / it) | test time (s / it) | memory (GB)
    -- | -- | -- | --
    Faster R-CNN ResNet-50 FPN | 0.2288 | 0.0590 | 5.2
    Mask R-CNN ResNet-50 FPN | 0.2728 | 0.0903 | 5.4
    Keypoint R-CNN ResNet-50 FPN | 0.3789 | 0.1242 | 6.8

    You can load and use pre-trained detection and segmentation models with a few lines of code:

    import torchvision
    import PIL.Image
    
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    # set it to evaluation mode, as the model behaves differently
    # during training and during evaluation
    model.eval()
    
    image = PIL.Image.open('/path/to/an/image.jpg')
    image_tensor = torchvision.transforms.functional.to_tensor(image)
    
    # pass a list of (potentially different sized) tensors
    # to the model, in 0-1 range. The model will take care of
    # batching them together and normalizing
    output = model([image_tensor])
    # output is a list of dict, containing the postprocessed predictions
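
    Each dict in the output pairs the detected boxes with their labels and scores (and masks, for Mask R-CNN). Continuing the snippet above, here is a rough sketch of inspecting the predictions for the first image; the 0.8 score threshold is arbitrary:

    prediction = output[0]
    boxes = prediction['boxes']    # Tensor[N, 4] in [x0, y0, x1, y1] format
    labels = prediction['labels']  # Tensor[N] with COCO category ids
    scores = prediction['scores']  # Tensor[N] with confidence scores
    # keep only the confident detections
    confident_boxes = boxes[scores > 0.8]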
    

    Pixelwise Semantic Segmentation models

    Warning: The API is currently experimental and might change in future versions of torchvision

    The 0.3 release also contains models for dense pixelwise prediction on images. It adds FCN and DeepLabV3 segmentation models, using ResNet50 and ResNet101 backbones. Pre-trained weights for the ResNet101 backbone are available, and have been trained on a subset of COCO train2017 which contains the same 20 categories as those from Pascal VOC.

    The pre-trained models give the following results on the subset of COCO val2017 which contains the same 20 categories as those present in Pascal VOC:

    Network | mean IoU | global pixelwise acc
    -- | -- | --
    FCN ResNet101 | 63.7 | 91.9
    DeepLabV3 ResNet101 | 67.4 | 92.4
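
    Loading a pre-trained segmentation model follows the same pattern as the detection models above; here is a minimal sketch (the image path is a placeholder, and the input is normalized with the usual ImageNet statistics used by the reference training scripts):

    import torch
    import torchvision
    import PIL.Image
    
    # DeepLabV3 with a ResNet-101 backbone, pre-trained on the COCO subset
    model = torchvision.models.segmentation.deeplabv3_resnet101(pretrained=True)
    model.eval()
    
    image = PIL.Image.open('/path/to/an/image.jpg')
    preprocess = torchvision.transforms.Compose([
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                         std=[0.229, 0.224, 0.225]),
    ])
    batch = preprocess(image).unsqueeze(0)
    
    with torch.no_grad():
        # the model returns a dict whose 'out' entry holds the per-pixel class
        # scores with shape [batch, num_classes, height, width]
        scores = model(batch)['out']
    # per-pixel predicted class indices
    predictions = scores.argmax(dim=1)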

    New Datasets

    • Add Caltech101, Caltech256, and CelebA (#775)
    • ImageNet dataset (#764) (#858) (#870)
    • Added Semantic Boundaries Dataset (#808) (#865)
    • Add VisionDataset as a base class for all datasets (#749) (#859) (#838) (#876) (#878)

    New Models

    Classification

    • Add GoogLeNet (Inception v1) (#678) (#821) (#828) (#816)
    • Add MobileNet V2 (#818) (#917)
    • Add ShuffleNet v2 (#849) (#886) (#889) (#892) (#916)
    • Add ResNeXt-50 32x4d and ResNeXt-101 32x8d (#822) (#852) (#917)

    Segmentation

    • Fully-Convolutional Network (FCN) with ResNet 101 backbone
    • DeepLabV3 with ResNet 101 backbone

    Detection

    • Faster R-CNN R-50 FPN trained on COCO train2017 (#898) (#921)
    • Mask R-CNN R-50 FPN trained on COCO train2017 (#898) (#921)
    • Keypoint R-CNN R-50 FPN trained on COCO train2017 (#898) (#921) (#922)

    Breaking changes

    • Make CocoDataset ids deterministically ordered (#868)

    New Transforms

    • Add bias vector to LinearTransformation (#793) (#843) (#881)
    • Add Random Perspective transform (#781) (#879)

    Bugfixes

    • Fix user warning when applying normalize (#810)
    • Fix logic error in check_integrity (#871)

    Improvements

    • Fixing mutation of 2d tensors in to_pil_image (#762)
    • Replace tensor.view with tensor.unsqueeze(0) in make_grid (#765)
    • Change usage of view to reshape in resnet to enable running with mkldnn (#890)
    • Improve normalize to work with tensors located on any device (#787)
    • Raise an IndexError for FakeData.__getitem__() if the index would be out of range (#780)
    • Aspect ratio is now sampled from a logarithmic distribution in RandomResizedCrop. (#799)
    • Modernize inception v3 weight initialization code (#824)
    • Remove duplicate code from densenet load_state_dict (#827)
    • Replace endswith calls in a loop with a single endswith call in DatasetFolder (#832)
    • Added missing dot in webp image extensions (#836)
    • fix inconsistent behavior for ~ expression (#850)
    • Minor Compressions in statements in folder.py (#874)
    • Minor fix to evaluation formula of PILLOW_VERSION in transforms.functional.affine (#895)
    • added is_valid_file parameter to DatasetFolder (#867)
    • Add support for joint transformations in VisionDataset (#872)
    • Auto calculating return dimension of squeezenet forward method (#884)
    • Added progress flag to model getters (#875) (#910)
    • Add support for other normalizations (i.e., GroupNorm) in ResNet (#813)
    • Add dilation option to ResNet (#866)

    Testing

    • Add basic model testing. (#811)
    • Add test for num_class in test_model.py (#815)
    • Added test for normalize functionality in make_grid function. (#840)
    • Added downloaded directory not empty check in test_datasets_utils (#844)
    • Added test for save_image in utils (#847)
    • Added tests for check_md5 and check_integrity (#873)

    Misc

    • Remove shebang in setup.py (#773)
    • configurable version and package names (#842)
    • More hub models (#851)
    • Update travis to use more recent GCC (#891)

    Documentation

    • Add comments regarding downsampling layers of resnet (#794)
    • Remove unnecessary bullet point in InceptionV3 doc (#814)
    • Fix crop and resized_crop docs in functional.py (#817)
    • Added dimensions in the comments of googlenet (#788)
    • Update transform doc with random offset of padding due to pad_if_needed (#791)
    • Added the argument transform_input in docs of InceptionV3 (#789)
    • Update documentation for MNIST datasets (#778)
    • Fixed typo in normalize() function. (#823)
    • Fix typo in squeezenet (#841)
    • Fix typo in DenseNet comment (#857)
    • Typo and syntax fixes to transform docstrings (#887)
  • v0.2.2(Feb 26, 2019)

    This version introduces several improvements and fixes.

    Support for arbitrary input sizes for models

    It is now possible to feed images larger than 224x224 into the models in torchvision. We added an adaptive pooling layer just before the classifier, which adapts the size of the feature maps before the last layer, allowing for larger input images. Relevant PRs: #744 #747 #746 #672 #643
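
    As a quick sketch of what this enables (the 500x500 size is arbitrary), a classification model now accepts inputs larger than 224x224:

    import torch
    import torchvision
    
    model = torchvision.models.resnet18()
    model.eval()
    
    # inputs larger than 224x224 are accepted thanks to the adaptive
    # pooling inserted before the classifier
    with torch.no_grad():
        out = model(torch.rand(1, 3, 500, 500))
    print(out.shape)  # torch.Size([1, 1000])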

    Bugfixes

    • Fix invalid argument error when using lsun method in windows (#508)
    • Fix FashionMNIST loading MNIST (#640)
    • Fix inception v3 input transform for trace & onnx (#621)

    Datasets

    • Add support for webp and tiff images in ImageFolder #736 #724
    • Add K-MNIST dataset #687
    • Add Cityscapes dataset #695 #725 #739 #700
    • Add Flickr 8k and 30k datasets #674
    • Add VOCDetection and VOCSegmentation datasets #663
    • Add SBU Captioned Photo Dataset (#665)
    • Updated URLs for EMNIST #726
    • MNIST and FashionMNIST now have their own 'raw' and 'processed' folders #601
    • Add metadata to some datasets (#501)

    Improvements

    • Allow RandomCrop to crop in the padded region #564
    • ColorJitter now supports min/max values #548
    • Generalize resnet to use block.extension #487
    • Move area calculation out of for loop in RandomResizedCrop #641
    • Add option to zero-init the residual branch in resnet (#498)
    • Improve error messages in to_pil_image #673
    • Added support in to_tensor for converting 2-dimensional numpy arrays (#686)
    • Optimize _find_classes in DatasetFolder via scandir in Python3 (#559)
    • Add padding_mode to RandomCrop (#489 #512)
    • Make DatasetFolder more generic (#527)
    • Add in-place option to normalize (#699)
    • Add Hamming and Box interpolations to transforms.py (#693)
    • Added support for 2-channel image modes such as 'LA', and added a mode for 4-channel images (#688)
    • Improve support for 'P' image mode in pad (#683)
    • Make torchvision depend on pillow-simd if already installed (#522)
    • Make tests run faster (#745)
    • Add support for non-square crops in RandomResizedCrop (#715)

    Breaking changes

    • save_image now rounds to the nearest integer #754

    Misc

    • Added code coverage to travis #703
    • Add downloads and docs badge to README (#702)
    • Add progress to download_url #497 #524 #535
    • Replace 'residual' with 'identity' in resnet.py (#679)
    • Consistency changes in the models
    • Refactored MNIST and CIFAR to have data and target fields #578 #594
    • Update torchvision to newer versions of PyTorch
    • Relax assertion in transforms.Lambda.__init__ (#637)
    • Cast MNIST target to int (#605)
    • Change default target type of FakeData to long (#581)
    • Improve docs of functional transforms (#602)
    • Docstring improvements
    • Add is_image_file to folder_dataset (#507)
    • Add deprecation warning in MNIST train[test]_labels[data] (#742)
    • Mention TORCH_MODEL_ZOO in models documentation. (#624)
    • Add scipy as a dependency to setup.py (#675)
    • Added size information for inception v3 (#719)
  • v0.2.1(Apr 24, 2018)

    This version introduces several fixes and improvements to the previous version.

    Better printing of Datasets and Transforms

    • Add descriptions to Transform objects.
    # Now T.Compose([T.RandomHorizontalFlip(), T.RandomCrop(224), T.ToTensor()]) prints
    Compose(
        RandomHorizontalFlip(p=0.5)
        RandomCrop(size=(224, 224), padding=0)
        ToTensor()
    )
    
    • Add descriptions to Datasets
    # now torchvision.datasets.MNIST('~') prints
    Dataset MNIST
        Number of datapoints: 60000
        Split: train
        Root Location: /private/home/fmassa
        Transforms (if any): None
        Target Transforms (if any): None
    

    New transforms

    • Add RandomApply, RandomChoice, RandomOrder transformations #402 (a short sketch follows this list)

      • RandomApply: applies a list of transformations with a given probability
      • RandomChoice: randomly picks a single transformation from a list
      • RandomOrder: applies the transformations in a random order
    • Add random affine transformation #411

    • Add reflect, symmetric and edge padding to transforms.pad #460
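
    A minimal sketch of composing these new transforms; the individual transforms chosen here are arbitrary:

    from torchvision import transforms
    
    # apply color jitter with probability 0.3
    maybe_jitter = transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4)], p=0.3)
    # pick exactly one of the two crops at random
    one_crop = transforms.RandomChoice([transforms.RandomCrop(224),
                                        transforms.CenterCrop(224)])
    # run the flip and the rotation in a random order
    shuffled = transforms.RandomOrder([transforms.RandomHorizontalFlip(),
                                       transforms.RandomRotation(10)])
    
    pipeline = transforms.Compose([maybe_jitter, one_crop, shuffled, transforms.ToTensor()])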

    Performance improvements

    • Speed up MNIST preprocessing by a factor of 1000
    • Make weight initialization optional to speed up VGG construction; this makes loading pre-trained VGG models much faster
    • Accelerate transforms.adjust_gamma by using PIL's point function instead of custom numpy-based implementation

    New Datasets

    • EMNIST - an extension of MNIST for hand-written letters
    • OMNIGLOT - a dataset for one-shot learning, with 1623 different handwritten characters from 50 different alphabets
    • Add a DatasetFolder class - generalization of ImageFolder

    Miscellaneous improvements

    • FakeData accepts a seed argument, so having multiple different FakeData instances is now possible
    • Use consistent datatypes in Dataset targets. Now all datasets that return labels will have them as int
    • Add probability parameter in RandomHorizontalFlip and RandomVerticalFlip
    • Replace np.random by random in transforms - improves reproducibility in multi-threaded environments with default arguments
    • Detect tif images in ImageFolder
    • Add pad_if_needed to RandomCrop, so that if the crop size is larger than the image, the image is automatically padded
    • Add support in transforms.ToTensor for PIL Images with mode '1'

    Bugfixes

    • Fix passing list of tensors to utils.save_image
    • Single images passed to make_grid are now also normalized
    • Fix PIL img close warnings
    • Added missing weight initializations to densenet
    • Avoid division by zero in make_grid when the image is constant
    • Fix ToTensor when PIL Image has mode F
    • Fix bug with to_tensor when the input is a numpy array of type np.float32.
  • v0.2.0(Nov 27, 2017)

    This version introduced a functional interface to the transforms, allowing for joint random transformation of inputs and targets. We also introduced a few breaking changes to some datasets and transforms (see below for more details).

    Transforms

    We have introduced a functional interface for the torchvision transforms, available under torchvision.transforms.functional. This now makes it possible to do joint random transformations on inputs and targets, which is especially useful in tasks like object detection, segmentation and super resolution. For example, you can now do the following:

    from torchvision import transforms
    import torchvision.transforms.functional as F
    import random
    
    def my_segmentation_transform(input, target):
    	i, j, h, w = transforms.RandomCrop.get_params(input, (100, 100))
    	input = F.crop(input, i, j, h, w)
    	target = F.crop(target, i, j, h, w)
    	if random.random() > 0.5:
    		input = F.hflip(input)
    		target = F.hflip(target)
    	input, target = F.to_tensor(input), F.to_tensor(target)
    	return input, target
    

    The following transforms have also been added:

    • F.vflip and RandomVerticalFlip
    • FiveCrop and TenCrop (see the sketch after this list)
    • Various color transformations:
      • ColorJitter
      • F.adjust_brightness
      • F.adjust_contrast
      • F.adjust_saturation
      • F.adjust_hue
    • LinearTransformation for applications such as whitening
    • Grayscale and RandomGrayscale
    • Rotate and RandomRotation
    • ToPILImage now supports RGBA images
    • ToPILImage now accepts a mode argument so you can specify which colorspace the image should be in
    • RandomResizedCrop now accepts scale and ratio ranges as input parameters
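
    FiveCrop and TenCrop return a tuple of images rather than a single image, so they are typically combined with a Lambda that stacks the crops into one tensor; a minimal sketch:

    import torch
    from torchvision import transforms
    
    # FiveCrop yields a 5-tuple of PIL images (four corners + center);
    # stack them into a single 4D tensor of shape [5, C, 224, 224]
    transform = transforms.Compose([
        transforms.FiveCrop(224),
        transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(c) for c in crops])),
    ])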

    Documentation

    Documentation is now auto-generated and published to pytorch.org

    Datasets:

    • SEMEION dataset of handwritten digits added
    • Phototour dataset: patches computed via multi-scale Harris corners are now available by setting name equal to notredame_harris, yosemite_harris or liberty_harris in the Phototour dataset
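
    A minimal sketch of picking one of the Harris variants (the root path is a placeholder):

    import torchvision
    
    # download the Notre Dame patches computed with multi-scale Harris corners
    dataset = torchvision.datasets.PhotoTour(root='/path/to/phototour',
                                             name='notredame_harris',
                                             download=True)
    print(len(dataset))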

    Bug fixes:

    • Pre-trained densenet models are now CPU compatible #251

    Breaking changes:

    This version also introduced some breaking changes:

    • The SVHN dataset has now been made consistent with other datasets by making the label for the digit 0 be 0, instead of 10 (as it was previously) (see #194 for more details)
    • the labels for the unlabelled STL10 dataset are now an array filled with -1
    • the order of the input args to the deprecated Scale transform has changed from (width, height) to (height, width) to be consistent with other transforms
  • v0.1.9(Sep 11, 2017)

    • Ability to switch image backends between PIL and accimage
    • Added more tests
    • Various bug fixes and doc improvements

    Models

    • Fix for inception v3 input transform bug https://github.com/pytorch/vision/pull/144
    • Added pretrained VGG models with batch norm

    Datasets

    • Fix indexing bug in LSUN dataset (https://github.com/pytorch/vision/pull/177)
    • enable ~ to be used in dataset paths
    • ImageFolder now returns the same (sorted) file order on different machines (https://github.com/pytorch/vision/pull/193)

    Transforms

    • transforms.Scale now accepts a tuple as the new size, or a single integer

    Utils

    • can now pass a pad value to make_grid and save_image
  • v0.1.8(Apr 3, 2017)

    New Features

    Models

    • SqueezeNet 1.0 and 1.1 models added, along with pre-trained weights
    • Add pre-trained weights for VGG models
      • Fix location of dropout in VGG
    • torchvision.models now expose num_classes as a constructor argument
    • Add InceptionV3 model and pre-trained weights
    • Add DenseNet models and pre-trained weights

    Datasets

    • Add STL10 dataset
    • Add SVHN dataset
    • Add PhotoTour dataset

    Transforms and Utilities

    • transforms.Pad now allows fill colors of either number tuples, or named colors like "white"
    • add normalization options to make_grid and save_image
    • ToTensor now supports more input types

    Performance Improvements

    Bug Fixes

    • ToPILImage now supports a single image
    • Python3 compatibility bug fixes
    • ToTensor now copes with all PIL Image types, not just RGB images
    • ImageFolder now only scans subdirectories.
      • Having files like .DS_Store is no longer a blocking hindrance
      • Check for non-zero number of images in ImageFolder
      • Subdirectories of classes have recursive scans for images
    • LSUN test set loads now
  • v0.1.7(Apr 3, 2017)
