A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.

Comments
  • RuntimeError: Found 0 images in subfolders of: ./data

    Hi, @smartkiwi @ezyang @Smerity @dnouri @bartolsthoorn

    I hit this error during DCGAN training in folder mode: $ python3 main.py --dataset folder --dataroot './data'

    File "/usr/local/lib/python3.5/dist-packages/torchvision/datasets/folder.py", line 97, in init "Supported image extensions are: " + ",".join(IMG_EXTENSIONS))) RuntimeError: Found 0 images in subfolders of: ./data Supported image extensions are: .jpg,.JPG,.jpeg,.JPEG,.png,.PNG,.ppm,.PPM,.bmp,.BMP

    But there are many jpg images in the ../dcgan/data folder. What am I doing wrong?

    Thanks in advance.

    Ubuntu 16.04 LTS 1080ti
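
    For context, torchvision's ImageFolder treats each immediate subdirectory of the root as one class, so images placed directly in ./data are never found. A minimal sketch of a layout and loader that avoids the error (the class-folder name "celeba" is just an illustrative placeholder, and older torchvision versions spell Resize as Scale):

      import torchvision.datasets as dset
      import torchvision.transforms as transforms

      # Expected layout: one subfolder per class under the root, e.g.
      #   ./data/celeba/img0001.jpg
      #   ./data/celeba/img0002.jpg
      dataset = dset.ImageFolder(root='./data',
                                 transform=transforms.Compose([
                                     transforms.Resize(64),
                                     transforms.CenterCrop(64),
                                     transforms.ToTensor(),
                                 ]))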

    opened by bemoregt 26
  • Add finetuning

    ... INCORPORATING SUGGESTIONS ... Finetuning for alexnet and resnet. It does not work for vgg yet (no pretrained weights). This will pick up the number of classes from the training directory and finetune the classification layer.

    This does 'hard' finetune by freezing feature layers. Another option is 'soft' finetune where small changes to higher layers are allowed. This PR only does the former.

    Used hints from https://discuss.pytorch.org/t/how-to-extract-features-of-an-image-from-a-trained-model/119/3 and @apaszke
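
    A minimal sketch of the 'hard' finetune described above, assuming a resnet18 backbone and a hypothetical num_classes (the PR derives it from the training directory):

      import torch
      import torch.nn as nn
      import torchvision.models as models

      num_classes = 10  # hypothetical; the PR reads this from the training directory
      model = models.resnet18(pretrained=True)
      for param in model.parameters():
          param.requires_grad = False  # 'hard' finetune: freeze all feature layers
      model.fc = nn.Linear(model.fc.in_features, num_classes)  # fresh, trainable head
      optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)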

    opened by sanealytics 18
  • Time Sequence Prediction broken...

    When I run train.py, it makes predictions correctly only until STEP 2; from STEP 3 onward the loss blows up. Here are the output and the graphs generated.

    [screenshot: generated prediction plots, 2017-09-11]

    STEP: 0

    loss: 1.09054476065 loss: 0.826498360191 loss: 0.742258943851 loss: 0.46692670559 loss: 0.331289575869 loss: 0.236026240101 loss: 0.141259279307 loss: 0.0851531255285 loss: 0.0541043081422 loss: 0.040030847713 loss: 0.0348454304112 loss: 0.0298759609175 loss: 0.0270200336839 loss: 0.0249405874131 loss: 0.0236001364739 loss: 0.0191041066229 loss: 0.0133801897465 loss: 0.011366287259 loss: 0.00983567643843 loss: 0.00893716252079 test loss: 0.00847954020704

    STEP: 1

    loss: 0.00857080080665 loss: 0.00803037675458 loss: 0.00710759490784 loss: 0.00924047828179 loss: 0.00576260438146 loss: 0.00500472024838 loss: 0.0213943098213 loss: 0.0043065051954 loss: 0.00388057702212 loss: 0.00434915494093 loss: 0.00298646983473 loss: 0.00266532448006 loss: 0.00259056949055 loss: 0.00245869588709 loss: 0.00244078378996 loss: 0.00239072969226 loss: 0.00233532586375 loss: 0.00224324638568 loss: 0.00193656497962 loss: 0.0203101775237 test loss: 0.00190941044782

    STEP: 2

    loss: 0.0018911771942 loss: 0.00187708194719 loss: 0.00184734283665 loss: 0.00182429561988 loss: 0.0017866811193 loss: 0.00175956313747 loss: 0.0017367104804 loss: 0.0017279887115 loss: 0.00170404503114 loss: 0.00164831568869 loss: 0.0015344472405 loss: 0.00146448434852 loss: 0.00290571241089 loss: 0.00141937563202 loss: 0.00140850693117 loss: 0.00138915262294 loss: 0.00138241696307 loss: 0.00138046501574 loss: 0.00137327773791 loss: 0.00136058323062 test loss: 0.00137266833634

    STEP: 3

    loss: 0.00133259520575 loss: 0.00126648679697 loss: 0.00142157202624 loss: 0.00132635502659 loss: 0.00136787285234 loss: 0.0011413003146 loss: 0.00110063048445 loss: 0.00103065559065 loss: 0.00102210567153 loss: 0.000941988786438 loss: 0.000857952932802 loss: 0.254636943994 loss: 10.8644616366 loss: 69.6423922962 loss: 66.7292987378 loss: 51.7855249351 loss: 45.1530865189 loss: 349.471334127 loss: 303.620815972 loss: 345.339689524 test loss: 5.94561623697

    STEP: 4

    loss: 220.910263699 loss: 179.504037927 loss: 162.525490947 loss: 0.500108508365 test loss: 0.502430217481

    STEP: 5

    loss: 0.500108508365 test loss: 0.502430217481

    STEP: 6

    loss: 0.500108508365 test loss: 0.502430217481

    STEP: 7

    loss: 0.500108508365 test loss: 0.502430217481

    STEP: 8

    loss: 0.500108508365 test loss: 0.502430217481

    STEP: 9

    loss: 0.500108508365 test loss: 0.502430217481

    STEP: 10

    loss: 0.500108508365 test loss: 0.502430217481

    STEP: 11

    loss: 0.500108508365 test loss: 0.502430217481

    STEP: 12

    loss: 0.500108508365 test loss: 0.502430217481

    STEP: 13

    loss: 0.500108508365 test loss: 0.502430217481

    STEP: 14

    loss: 0.500108508365 test loss: 0.502430217481
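
    The tail of the log is telling: after the blow-up in STEP 3, both losses settle at about 0.5, which is the MSE you get against a unit-amplitude sine target once the network collapses to predicting a constant (the mean of sin² is 0.5). One common mitigation to try — purely a sketch that reuses the example's existing names (seq, criterion, optimizer, input, target), not a verified fix — is clipping gradients inside the LBFGS closure:

      import torch

      def closure():
          optimizer.zero_grad()
          loss = criterion(seq(input), target)
          loss.backward()
          # clip before LBFGS consumes the gradient, so one bad line search
          # cannot blow up the weights
          torch.nn.utils.clip_grad_norm_(seq.parameters(), max_norm=1.0)
          return loss

      optimizer.step(closure)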

    opened by TRTL4LIFE 17
  • why is detach necessary

    Hi, I am wondering why detach is necessary in this line: https://github.com/pytorch/examples/blob/a60bd4e261afc091004ea3cf582d0ad3b2e01259/dcgan/main.py#L230

    I understand that we want to update the gradients of netD without changing the ones of netG. But if the optimizer is only using the parameters of netD, then only its weights will be updated. Am I missing something here? Thanks in advance!
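
    The short answer usually given: detach does not change which weights the optimizer updates, it changes what autograd computes. Without it, backward on the D loss would still build and traverse the graph through netG, wasting time and memory and accumulating gradients into netG's .grad buffers. A simplified sketch of the pattern in question:

      # simplified from the D-update in dcgan/main.py
      fake = netG(noise)
      output = netD(fake.detach())      # treat the fake image as a constant input
      errD_fake = criterion(output, label)
      errD_fake.backward()              # gradients reach netD only, never netG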

    opened by rogertrullo 17
  • Distributed example on C++ API (LibTorch)

    This PR adds an example of distributed training using MPI on the C++ frontend, similar to DistributedDataParallel in Python. This topic was raised in the forums. Right now, the code is CPU-only.

    Please let me know if this PR would be a worthwhile contribution.

    cc @yf225 @pietern

    cla signed 
    opened by soumyadipghosh 16
  • Triplet Network [WIP]

    This WIP PR implements a triplet network based on the TFeat shallow convolutional patch descriptor (https://github.com/vbalnt/tfeat).

    Additionally, it implements the TripletLoss from Chainer using the autograd tools.
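
    For reference, a minimal sketch of a Chainer-style margin-based triplet loss written with plain autograd ops (the margin value is an illustrative default, not necessarily what this PR uses):

      import torch
      import torch.nn.functional as F

      def triplet_loss(anchor, positive, negative, margin=1.0):
          # hinge on the gap between positive and negative Euclidean distances
          d_pos = F.pairwise_distance(anchor, positive)
          d_neg = F.pairwise_distance(anchor, negative)
          return torch.clamp(margin + d_pos - d_neg, min=0.0).mean()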

    TODO list:

    • [x] Replace MNIST dataset by PhotoTour
    • [x] Include testing step and evaluation metrics FPR95
    • [x] Serialization
    • [x] Improve documentation
    • [x] Integrate TripletLoss in functional API
    • [x] Integrate PhotoTour in `torchvision`
    cla signed 
    opened by edgarriba 16
  • RuntimeError: invalid argument 2: size '[-1 x 300]' for SNLI example

    I get the following error when I try to run the SNLI example on my machine.

    Traceback (most recent call last):
      File "train.py", line 35, in <module>
        inputs.vocab.load_vectors(wv_dir=args.data_cache, wv_type=args.word_vectors, wv_dim=args.d_embed)
      File "/usr/local/lib/python2.7/dist-packages/torchtext/vocab.py", line 162, in load_vectors
        wv_dict, wv_arr, self.wv_size = load_word_vectors(wv_dir, wv_type, wv_dim)
      File "/usr/local/lib/python2.7/dist-packages/torchtext/vocab.py", line 70, in load_word_vectors
        wv_arr = torch.Tensor(wv_arr).view(-1, wv_size)
    RuntimeError: invalid argument 2: size '[-1 x 300]' is invalid for input of with 544881656 elements at /pytorch/torch/lib/TH/THStorage.c:37
    

    Not sure if this is an error in the way the vectors are being loaded, or an error in torchtext itself. I'm using Python 2.7.12 on Ubuntu 16.04.
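
    One observation that may help triage: 544881656 is not divisible by 300, and that is exactly the condition under which view(-1, 300) must fail, so a truncated or corrupted word-vector download is a plausible cause. A trivial check:

      total_elements, wv_dim = 544881656, 300
      print(total_elements % wv_dim)  # 56 -- non-zero, so view(-1, 300) cannot succeed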

    bug 
    opened by mcneela 15
  • fast-neural-style example

    opened by abhiskk 11
  • Run CI daily

    xref pytorch/pytorch#32004

    Run GitHub Actions CI once a day (at 3:00) and, if it fails, open an issue. mattip/pytorch-examples#21 is a forced example of such an issue. A few things to note:

    • The main purpose of this action is to consistently test this repo against pytorch nightly builds, to see if some new pytorch "feature" breaks one of the examples.
    • Anyone who forks this repo will, by default, get this scheduled action. I added a note to the README. We could also add a step to emit an issue if {{ github.repository_owner != 'pytorch' }}, which would hopefully get people's attention :). By default forks do not have issues enabled, so I am not sure what would happen. Forking this repo does seem like a popular thing to do: there are 6.4k forks. Searching for a way to disable this via the yml led me to this discussion, which seems to indicate it cannot be done right now. Another option would be to move from GitHub Actions (which are available on all public repos) to Travis or CircleCI, which would require action on the part of the fork owner to enable it. I can pivot this PR to do that, but an admin would have to enable another CI service.
    • I hope someone is getting mails on all new issues in this repo, otherwise the issue will not get much attention.
    opened by mattip 10
  • The training result of the word_language_model's transformer model seems not good

    I have tested the word_language_model's transformer model, but the result is not good. Here is my training log; I don't know why. Could anyone help me?

    | epoch   6 |   200/ 2983 batches | lr 5.00 | ms/batch 32.47 | loss  4.95 | ppl   140.84
    | epoch   6 |   400/ 2983 batches | lr 5.00 | ms/batch 32.18 | loss  4.97 | ppl   143.44
    | epoch   6 |   600/ 2983 batches | lr 5.00 | ms/batch 32.16 | loss  4.79 | ppl   120.34
    | epoch   6 |   800/ 2983 batches | lr 5.00 | ms/batch 32.14 | loss  4.85 | ppl   127.16
    | epoch   6 |  1000/ 2983 batches | lr 5.00 | ms/batch 32.19 | loss  4.84 | ppl   126.60
    | epoch   6 |  1200/ 2983 batches | lr 5.00 | ms/batch 32.16 | loss  4.86 | ppl   128.54
    | epoch   6 |  1400/ 2983 batches | lr 5.00 | ms/batch 32.17 | loss  4.91 | ppl   135.49
    | epoch   6 |  1600/ 2983 batches | lr 5.00 | ms/batch 32.17 | loss  4.96 | ppl   142.61
    | epoch   6 |  1800/ 2983 batches | lr 5.00 | ms/batch 32.15 | loss  4.86 | ppl   129.50
    | epoch   6 |  2000/ 2983 batches | lr 5.00 | ms/batch 32.18 | loss  4.90 | ppl   133.89
    | epoch   6 |  2200/ 2983 batches | lr 5.00 | ms/batch 32.17 | loss  4.79 | ppl   120.56
    | epoch   6 |  2400/ 2983 batches | lr 5.00 | ms/batch 32.23 | loss  4.84 | ppl   126.53
    | epoch   6 |  2600/ 2983 batches | lr 5.00 | ms/batch 32.31 | loss  4.87 | ppl   129.85
    | epoch   6 |  2800/ 2983 batches | lr 5.00 | ms/batch 32.19 | loss  4.80 | ppl   121.97
    -----------------------------------------------------------------------------------------
    | end of epoch   6 | time: 99.66s | valid loss  5.36 | valid ppl   212.04
    -----------------------------------------------------------------------------------------
    =========================================================================================
    | End of training | test loss  5.27 | test ppl   194.93
    =========================================================================================
    

    Even when I increase the number of epochs to 40, the improvement is not noticeable:

    | end of epoch  40 | time: 89.19s | valid loss  4.73 | valid ppl   113.65
    | End of training | test loss  4.67 | test ppl   107.08
    

    I noticed that the transformer code in this example is similar to SEQUENCE-TO-SEQUENCE MODELING WITH NN.TRANSFORMER AND TORCHTEXT. I have tried that tutorial's code, and the result is nearly the same. But the tutorial's reported result is very good:

    | end of epoch 3 | time: 107.26s | valid loss 1.00 | valid ppl 2.72

    | End of training | test loss 0.99 | test ppl 2.68

    When I ran the tutorial code myself, my result was:

    | end of epoch   3 | time: 76.65s | valid loss  5.55 | valid ppl   257.81
    | End of training | test loss  5.46 | test ppl   234.59
    

    This truly puzzles me. Could anyone help me? Is there any problem in the code?

    Thanks to the authors of the example and the tutorial.

    opened by SpringRi 10
  • Script of training ImageNet is not correct

    Hi, I used your officially released script to train ResNet18 on ImageNet, but the result is far from the expected one (67.8% vs. 69.75% in your model zoo), even after adding color-augmentation tricks. For other models such as MobileNet, the result is also worse than the same model trained on other frameworks like Caffe. I don't know whether something is wrong with the script. Can you give a baseline of a model trained with your script?

    opened by XiongweiWu 10
  • Update requirements.txt for minGPT ddp example

    fsspec requires additional libraries for HTTP file access: aiohttp and requests

    (mingpt) ubuntu@ip-172-31-11-224:~/workspace/pytorch_examples/distributed/minGPT-ddp/mingpt$ MASTER_PORT=25000 MASTER_ADDR=$(hostname) RANK=0 WORLD_SIZE=1 python main.py
    [2022-12-28 17:48:09,024][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 0
    [2022-12-28 17:48:09,025][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
    Error executing job with overrides: []
    Traceback (most recent call last):
      File "/home/ubuntu/venv/mingpt/lib/python3.8/site-packages/fsspec/registry.py", line 243, in get_filesystem_class
        register_implementation(protocol, _import_class(bit["class"]))
      File "/home/ubuntu/venv/mingpt/lib/python3.8/site-packages/fsspec/registry.py", line 266, in _import_class
        mod = importlib.import_module(mod)
      File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 848, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/home/ubuntu/venv/mingpt/lib/python3.8/site-packages/fsspec/implementations/http.py", line 11, in <module>
        import aiohttp
    ModuleNotFoundError: No module named 'aiohttp'
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "main.py", line 33, in main
        model, optimizer, train_data, test_data = get_train_objs(gpt_cfg, opt_cfg, data_cfg)
      File "main.py", line 13, in get_train_objs
        dataset = CharDataset(data_cfg)
      File "/home/ubuntu/workspace/pytorch_examples/distributed/minGPT-ddp/mingpt/char_dataset.py", line 20, in __init__
        data = fsspec.open(data_cfg.path).open().read().decode('utf-8')
      File "/home/ubuntu/venv/mingpt/lib/python3.8/site-packages/fsspec/core.py", line 441, in open
        return open_files(
      File "/home/ubuntu/venv/mingpt/lib/python3.8/site-packages/fsspec/core.py", line 273, in open_files
        fs, fs_token, paths = get_fs_token_paths(
      File "/home/ubuntu/venv/mingpt/lib/python3.8/site-packages/fsspec/core.py", line 621, in get_fs_token_paths
        cls = get_filesystem_class(protocol)
      File "/home/ubuntu/venv/mingpt/lib/python3.8/site-packages/fsspec/registry.py", line 245, in get_filesystem_class
        raise ImportError(bit["err"]) from e
    ImportError: HTTPFileSystem requires "requests" and "aiohttp" to be installed
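
    A minimal sketch that reproduces the dependency in isolation (the URL is illustrative): fsspec resolves the "https" protocol to its HTTP filesystem, whose module imports aiohttp (and uses requests), so without those packages the open call raises the ImportError shown above.

      import fsspec

      # resolving the "https" protocol imports fsspec's HTTP filesystem,
      # which requires aiohttp and requests to be installed
      with fsspec.open("https://example.com/input.txt") as f:
          text = f.read().decode("utf-8")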
    
    
    cla signed 
    opened by kurman 1
  • MNIST on Apple Silicon

    Any help would be appreciated! I'm unable to run multiprocessing with the mps device.

    Context

    • Pytorch version: 2.0.0.dev20221220
    • Operating System and version: macOS 13.1

    Your Environment

    • Installed using source? [yes/no]: no
    • Are you planning to deploy it using docker container? [yes/no]: no
    • Is it a CPU or GPU environment?: Trying to use GPU
    • Which example are you using: MNIST Hogwild
    • Link to code or data to repro [if any]: https://github.com/pytorch/examples/tree/main/mnist_hogwild

    Expected Behavior

    Adding the --mps argument should result in training on the GPU

    Current Behavior

    RuntimeError: _share_filename_: only available on CPU

    Traceback (most recent call last):
      File "/Volumes/Main/pytorch/main.py", line 87, in <module>
        model.share_memory()  # gradients are allocated lazily, so they are not shared here
      File "/Users/jeffreythomas/opt/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2340, in share_memory
        return self._apply(lambda t: t.share_memory_())
      File "/Users/jeffreythomas/opt/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 784, in _apply
        module._apply(fn)
      File "/Users/jeffreythomas/opt/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 807, in _apply
        param_applied = fn(param)
      File "/Users/jeffreythomas/opt/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2340, in <lambda>
        return self._apply(lambda t: t.share_memory_())
      File "/Users/jeffreythomas/opt/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/_tensor.py", line 616, in share_memory_
        self._typed_storage()._share_memory_()
      File "/Users/jeffreythomas/opt/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/storage.py", line 701, in _share_memory_
        self._untyped_storage.share_memory_()
      File "/Users/jeffreythomas/opt/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/storage.py", line 209, in share_memory_
        self._share_filename_cpu_()
    RuntimeError: _share_filename_: only available on CPU
    

    Steps to Reproduce

    1. Clone repo
    2. Run with --mps on Apple M1 Ultra ...
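
    For context, a minimal sketch of the constraint involved (the Linear layer is a hypothetical stand-in for the example's net): share_memory_() is implemented only for CPU storage, so the Hogwild-style shared copy of the model has to stay on CPU; calling it after moving the model to mps raises exactly this error.

      import torch
      from torch import nn

      model = nn.Linear(784, 10)   # hypothetical stand-in for the MNIST net
      model.share_memory()         # fine here: parameters are CPU tensors
      # model.to("mps"); model.share_memory()  # raises the RuntimeError above
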
    opened by jeffreykthomas 1
  • Inconsistency b/w tutorial and the code

    📚 Documentation

    In the DDP Tutorial, there is an inconsistency between the code in the tutorial and the original code.

    For example, under the Running the distributed training job section, the Trainer object should take train_data as an argument, not dataset (the original code is correct).

    The ideal PR to fix this issue is to make the tutorial consistent with the original code.

    help wanted distributed 
    opened by BalajiAI 1
  • DDP training question

    opened by Henryplay 1
  • Query on loss calculation in word language model

    In main.py of the word language model, I find that in the evaluate function total_loss is multiplied by the length of the data: https://github.com/pytorch/examples/blob/ca1bd9167f7216e087532160fc5b98643d53f87e/word_language_model/main.py#L163

    However, in the train function, total_loss is not multiplied by the length of the data: https://github.com/pytorch/examples/blob/ca1bd9167f7216e087532160fc5b98643d53f87e/word_language_model/main.py#L194

    Is this intentional?
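
    For reference, a plausible reading of the two accumulators (the snippets below paraphrase main.py, with explanatory comments; not an authoritative answer): criterion returns the mean loss over the tokens in a batch, and the two sums serve different purposes.

      # evaluate(): eval batches are not all the same length (the last one is
      # shorter), so each batch mean is re-weighted by its token count and the
      # sum is later divided by the total token count -- an exact dataset mean:
      total_loss += len(data) * criterion(output, targets).item()

      # train(): the sum is only used to log an average over the last
      # log_interval batches, which (except at the data boundary) all share
      # the same bptt length, so plain averaging suffices:
      total_loss += loss.item()
      # ... cur_loss = total_loss / args.log_interval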

    help wanted 
    opened by AvisP 0
  • Running on Windows

    📚 Documentation

    I'm trying to get DCGAN running on my Windows machine. It appears that the code may not support Windows, but this is not mentioned in the README. Is there a procedure to get it running on Windows?

    opened by maxbonzulak 2
Simple examples to introduce PyTorch

This repository introduces the fundamental concepts of PyTorch through self-contained examples. At its core, PyTorch provides two main features: An n-…

Justin Johnson 4.4k Jan 7, 2023
Open source guides/codes for mastering deep learning to deploying deep learning in production in PyTorch, Python, C++ and more.

Deep Learning Materials by Deep Learning Wizard Start Learning Now Please head to www.deeplearningwizard.com to start learning! It is mobile/tablet fr…

Ritchie Ng 572 Dec 28, 2022
Deep Learning (with PyTorch)

Deep Learning (with PyTorch) This notebook repository now has a companion website, where all the course material can be found in video and textual for…

Alfredo Canziani 6.2k Jan 2, 2023
PyTorch Tutorial for Deep Learning Researchers

This repository provides tutorial code for deep learning researchers to learn PyTorch. In the tutorial, most of the models were implemented with less…

Yunjey Choi 25.4k Jan 5, 2023
PyTorch tutorials.

PyTorch Tutorials All the tutorials are now presented as sphinx style documentation at: https://pytorch.org/tutorials Contributing We use sphinx-galle…

null 6.6k Jan 2, 2023
C++ Implementation of PyTorch Tutorials for Everyone

C++ Implementation of PyTorch Tutorials for Everyone OS (Compiler)\LibTorch 1.9.0 macOS (clang 10.0, 11.0, 12.0) Linux (gcc 8, 9, 10, 11) Windows (msv…

Omkar Prabhu 1.5k Jan 4, 2023
Minimal tutorials for PyTorch

Minimal tutorials for PyTorch adapted from Alec Radford's Theano tutorials. Tensor multiplication Linear Regression Logistic Regression Neural Network…

Vinh Khuc 321 Oct 25, 2022
PyTorch Implementation of Fully Convolutional Networks. (Training code to reproduce the original result is available.)

pytorch-fcn PyTorch implementation of Fully Convolutional Networks. Requirements pytorch >= 0.2.0 torchvision >= 0.1.8 fcn >= 6.1.5 Pillow scipy tqdm…

Kentaro Wada 1.6k Jan 4, 2023
Simple PyTorch Tutorials Zero to ALL!

PyTorchZeroToAll Quick 3~4 day lecture materials for HKUST students. Video Lectures: (RNN TBA) Youtube Bilibili Slides Lecture Slides @GoogleDrive If…

Sung Kim 3.7k Dec 30, 2022
Pytorch implementations of various Deep NLP models in cs-224n(Stanford Univ)

DeepNLP-models-Pytorch Pytorch implementations of various Deep NLP models in cs-224n(Stanford Univ: NLP with Deep Learning) This is not for Pytorch be…

Kim SungDong 2.9k Dec 24, 2022
PyTorch tutorials and best practices.

Effective PyTorch Table of Contents Part I: PyTorch Fundamentals PyTorch basics Encapsulate your model with Modules Broadcasting the good and the ugly…

Vahid Kazemi 1.5k Jan 4, 2023
Some example scripts on pytorch

pytorch-practice Some example scripts on pytorch CONLL 2000 Chunking task Uses BiLSTM CRF loss with char CNN embeddings. To run use: cd data/conll2000…

Shubhanshu Mishra 180 Dec 22, 2022
Example of network fine-tuning in pytorch for the kaggle competition Dogs vs. Cats Redux: Kernels Edition

Example of network fine-tuning in pytorch for the kaggle competition Dogs vs. Cats Redux: Kernels Edition Currently…

bobby 70 Sep 22, 2022
ConvNet training using pytorch

Convolutional networks using PyTorch This is a complete training example for Deep Convolutional Networks on various datasets (ImageNet, Cifar10, Cifar…

Elad Hoffer 336 Dec 30, 2022
simple generative adversarial network (GAN) using PyTorch

Generative Adversarial Networks (GANs) in PyTorch Running Run the sample code by typing: ./gan_pytorch.py ...and you'll train two nets to battle it o…

vanguard_space 32 Jun 14, 2020
Torch Containers simplified in PyTorch

pytorch-containers This repository aims to help former Torchies more seamlessly transition to the "Containerless" world of PyTorch by providing a list…

Max deGroot 88 Apr 25, 2022
The Hitchiker's Guide to PyTorch

Kai Arulkumaran 1k Dec 20, 2022
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 200 universities.

D2L.ai: Interactive Deep Learning Book with Multi-Framework Code, Math, and Discussions Book website | STAT 157 Course at UC Berkeley | Latest version…

Dive into Deep Learning (D2L.ai) 16k Jan 3, 2023
A collection of various deep learning architectures, models, and tips

Deep Learning Models A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks. Traditiona…

Sebastian Raschka 15.5k Jan 7, 2023