PyTorch implementation of "Efficient Neural Architecture Search via Parameters Sharing"

Overview

Efficient Neural Architecture Search (ENAS) in PyTorch

PyTorch implementation of Efficient Neural Architecture Search via Parameter Sharing.

[figure: overview of ENAS discovering a recurrent cell]

ENAS reduces the computational requirement (GPU-hours) of Neural Architecture Search (NAS) by 1000x via parameter sharing between models that are subgraphs within a large computational graph. It achieves SOTA results on Penn Treebank language modeling.
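To make the idea concrete, here is a toy sketch (ours, not the repository's actual model code) in which every sampled child architecture is just a set of edges indexing into one shared bank of weights, so all child models train the same parameters:

import torch
import torch.nn as nn

class SharedBank(nn.Module):
    """Toy shared-weight bank: one linear map per possible node connection."""
    def __init__(self, num_nodes, hidden):
        super().__init__()
        # w[i][j] is reused by every sampled child model that wires i -> j.
        self.w = nn.ModuleList(
            nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(num_nodes))
            for _ in range(num_nodes))

    def forward(self, x, dag):
        # dag: list of (src, dst) edges sampled by the controller; only the
        # chosen subgraph's weights participate in this forward/backward pass.
        h = {0: x}
        for src, dst in dag:
            h[dst] = torch.tanh(self.w[src][dst](h[src]))
        return h[max(h)]

bank = SharedBank(num_nodes=4, hidden=8)
out = bank(torch.randn(2, 8), dag=[(0, 1), (1, 2), (2, 3)])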

**[Caveat] Use the official code from the authors: link**

Prerequisites

  • Python 3.6+
  • PyTorch==0.3.1
  • tqdm, scipy, imageio, graphviz, tensorboardX

Usage

Install prerequisites with:

conda install graphviz
pip install -r requirements.txt

To train ENAS to discover a recurrent cell:

python main.py --network_type rnn --dataset ptb --controller_optim adam --controller_lr 0.00035 \
               --shared_optim sgd --shared_lr 20.0 --entropy_coeff 0.0001

python main.py --network_type rnn --dataset wikitext

To train ENAS to discover a CNN architecture (in progress):

python main.py --network_type cnn --dataset cifar --controller_optim momentum --controller_lr_cosine=True \
               --controller_lr_max 0.05 --controller_lr_min 0.0001 --entropy_coeff 0.1

Alternatively, you can use your own dataset by laying out the files like this:

data
├── YOUR_TEXT_DATASET
│   ├── test.txt
│   ├── train.txt
│   └── valid.txt
├── YOUR_IMAGE_DATASET
│   ├── test
│   │   ├── xxx.jpg (name doesn't matter)
│   │   ├── yyy.jpg (name doesn't matter)
│   │   └── ...
│   ├── train
│   │   ├── xxx.jpg
│   │   └── ...
│   └── valid
│       ├── xxx.jpg
│       └── ...
├── image.py
└── text.py

To generate a GIF image of generated samples:

python generate_gif.py --model_name=ptb_2018-02-15_11-20-02 --output=sample.gif

More configurations can be found here.

Results

Efficient Neural Architecture Search (ENAS) is composed of two sets of learnable parameters: the controller LSTM parameters θ and the shared model parameters ω. These two sets are trained alternately, and only the trained controller is then used to derive novel architectures.
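Schematically, the alternating optimization looks like the following sketch (the function names mirror this repo's Trainer, but the bodies are illustrative stubs, not repository code):

# Hedged sketch of ENAS's alternating optimization (illustrative only).

def train_shared(omega, theta, data):
    # Sample architectures from the fixed controller and update the shared
    # weights omega by SGD on the training loss of each sampled child model.
    pass

def train_controller(theta, omega, data):
    # Sample architectures, score each with a validation-based reward, and
    # update theta with REINFORCE plus an entropy bonus; omega stays fixed.
    pass

omega, theta = {}, {}  # stand-ins for the two parameter sets
for epoch in range(150):
    train_shared(omega, theta, data="train")
    train_controller(theta, omega, data="valid")
# Finally, sample many cells from the trained controller and keep the best.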

1. Discovering Recurrent Cells

[figure: controller sampling a recurrent cell]

The controller LSTM decides 1) which activation function to use and 2) which previous node to connect to.
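Concretely, a sample from the controller is a token sequence of the form [func, (prev, func), (prev, func), ...]; the following toy decoding (activation names and the sample itself are assumptions, not repository output) turns one such sequence into a cell description:

# Illustrative decoding of one controller sample into a cell description;
# the token layout matches the num_tokens setup quoted in the
# Controller.encoder issue below.
activations = ['tanh', 'ReLU', 'identity', 'sigmoid']
tokens = [0, 0, 1, 1, 3, 0, 2]  # hypothetical sample for num_blocks = 3

cell = {0: ('input', activations[tokens[0]])}
for block in range(1, (len(tokens) + 1) // 2):
    prev, func = tokens[2 * block - 1], tokens[2 * block]
    cell[block] = (prev, activations[func])
print(cell)
# {0: ('input', 'tanh'), 1: (0, 'ReLU'), 2: (1, 'sigmoid'), 3: (0, 'identity')}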

The RNN cells ENAS discovered for the Penn Treebank and WikiText-2 datasets:

[figures: discovered cells for PTB and WikiText-2]

Best discovered ENAS cell for Penn Treebank at epoch 27:

[figure: best discovered PTB cell at epoch 27]

You can see the details of training (e.g. reward, entropy, loss) with:

tensorboard --logdir=logs --port=6006

2. Discovering Convolutional Neural Networks

[figure: controller sampling a convolutional network]

The controller LSTM samples 1) which computation operation to use and 2) which previous node to connect to.

The CNN network ENAS discovered for the CIFAR-10 dataset:

(in progress)

3. Designing Convolutional Cells

(in progress)

Author

Taehoon Kim / @carpedm20

Comments
  • Can't run the code, SyntaxError: invalid syntax

    I suppose I'm the only one with this problem, seeing as everyone else can apparently run the program.

    I tried both

    python3 main.py --network_type rnn --dataset ptb --controller_optim adam --controller_lr 0.00035 \
                   --shared_optim sgd --shared_lr 20.0 --entropy_coeff 0.0001
    
    python3 main.py --network_type rnn --dataset wikitext
    

    but this error comes out:

    File "main.py", line 28
        raise NotImplementedError(f"{args.dataset} is not supported")
                                                                   ^
    SyntaxError: invalid syntax
    

    Does anyone have any idea why? I'm using Ubuntu on Docker, with PyTorch installed (as well as the packages listed in requirements.txt, except pygraphviz, which failed to install; that shouldn't raise any errors until it's actually called in utils.py, which I commented out anyway).
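    For reference, the f-string on that line requires Python 3.6 or newer, so this SyntaxError usually means an older interpreter is picking up the script; a pre-3.6 equivalent of the offending line, purely for illustration:

    # f-strings (f"...") were added in Python 3.6; on older interpreters the
    # whole file fails to parse with exactly this SyntaxError.
    dataset = "ptb"  # stands in for args.dataset
    raise NotImplementedError("{} is not supported".format(dataset))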

    opened by harewei 4
  • Upgrade to PyTorch 0.4, add regularizations, clip hidden states

    Hi, I wanted to open a pull request and get some feedback about how you would like changes to get merged into your repository.

    I guess the major issue is that I had to make a number of changes to get the codebase to work with PyTorch v0.4.

    I also made a number of aesthetic changes, including adding comments, but also other changes such as reducing line length to 80. I just did these as a force of habit to make the code more readable to me as I got used to it. I realize this is probably annoying as it clutters the commits.

    So yeah, please let me know what is acceptable to merge. All of it? Or, if not, I can create another pull request incorporating maintainer feedback.

    Thank you for the repository by the way.

    opened by dukebw 4
  • RuntimeError: cuda runtime error (59) : device-side assert triggered

    Running with the default settings on 1 GPU leads to this error after running successfully for many epochs (both wikitext and ptb).

    train_shared| loss: 2.844:  11%|███████▏                                                           | 3500/32634 [00:40<05:19, 91.09it/s]
    2018-03-07 05:26:19,885:INFO::| epoch  83 | lr 1.20 | loss 2.97 | ppl    19.43
    train_shared| loss: 2.750:  16%|██████████▊                                                        | 5250/32634 [00:59<04:58, 91.77it/s]
    2018-03-07 05:26:39,208:INFO::| epoch  83 | lr 1.20 | loss 2.87 | ppl    17.70
    train_shared| loss: 2.619:  21%|██████████████▎                                                    | 7000/32634 [01:19<05:03, 84.33it/s]
    2018-03-07 05:26:58,999:INFO::| epoch  83 | lr 1.20 | loss 2.95 | ppl    19.18
    train_shared| loss: 2.627:  27%|█████████████████▉                                                 | 8750/32634 [01:38<04:20, 91.85it/s]
    2018-03-07 05:27:18,144:INFO::| epoch  83 | lr 1.20 | loss 3.26 | ppl    25.94
    train_shared| loss: 4.249:  32%|█████████████████████▏                                            | 10500/32634 [01:57<04:00, 91.96it/s]
    2018-03-07 05:27:37,222:INFO::| epoch  83 | lr 1.20 | loss 2.93 | ppl    18.67
    train_shared| loss: 2.546:  38%|████████████████████████▊                                         | 12250/32634 [02:17<04:00, 84.66it/s]
    2018-03-07 05:27:56,890:INFO::| epoch  83 | lr 1.20 | loss 2.96 | ppl    19.35
    train_shared| loss:   nan:  43%|████████████████████████████▎                                     | 14000/32634 [02:37<03:35, 86.57it/s]
    2018-03-07 05:28:17,371:INFO::| epoch  83 | lr 1.20 | loss nan | ppl      nan
    train_shared| loss:   nan:  43%|████████████████████████████▍                                     | 14035/32634 [02:37<03:34, 86.72it/s]
    THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorCopy.cu line=100 error=59 : device-side assert triggered
    Traceback (most recent call last):
      File "main.py", line 45, in <module>
        main(args)
      File "main.py", line 34, in main
        trainer.train()
      File "/home/karan/metalearning-project/ENAS-pytorch/trainer.py", line 94, in train
        self.train_controller()
      File "/home/karan/metalearning-project/ENAS-pytorch/trainer.py", line 225, in train_controller
        dags, log_probs, entropies = self.controller.sample(with_details=True)
      File "/home/karan/metalearning-project/ENAS-pytorch/models/controller.py", line 96, in sample
        is_embed=block_idx==0)
      File "/home/karan/metalearning-project/ENAS-pytorch/models/controller.py", line 67, in forward
        logits = self.decoders[block_idx](hx)
      File "/home/karan/anaconda2/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/karan/anaconda2/envs/torch/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 55, in forward
        return F.linear(input, self.weight, self.bias)
      File "/home/karan/anaconda2/envs/torch/lib/python3.6/site-packages/torch/nn/functional.py", line 835, in linear
        return torch.addmm(bias, input, weight.t())
    RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorCopy.cu:100
    /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorRandom.cuh:179: void sampleMultinomialOnce(long *, long, int, T *, T *) [with T = float, AccT = float]: block: [0,0,0], thread: [0,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
    /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorRandom.cuh:179: void sampleMultinomialOnce(long *, long, int, T *, T *) [with T = float, AccT = float]: block: [0,0,0], thread: [1,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
    /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorRandom.cuh:179: void sampleMultinomialOnce(long *, long, int, T *, T *) [with T = float, AccT = float]: block: [0,0,0], thread: [2,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
    /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorRandom.cuh:179: void sampleMultinomialOnce(long *, long, int, T *, T *) [with T = float, AccT = float]: block: [0,0,0], thread: [3,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
    
    opened by karandwivedi42 4
  • Controller.encoder seems much too large

    From the Controller constructor:

    class Controller(torch.nn.Module):
        def __init__(self, args):
            torch.nn.Module.__init__(self)
            self.args = args
            self.forward_evals = 0
            if self.args.network_type == 'rnn':
                # NOTE(brendan): `num_tokens` here is just the activation function
                # for every even step,
                self.num_tokens = [len(args.shared_rnn_activations)]
                for idx in range(self.args.num_blocks):
                    self.num_tokens += [idx + 1,
                                        len(args.shared_rnn_activations)]
                self.func_names = args.shared_rnn_activations
            elif self.args.network_type == 'cnn':
                self.num_tokens = [len(args.shared_cnn_types),
                                   self.args.num_blocks]
                self.func_names = args.shared_cnn_types
    
            num_total_tokens = sum(self.num_tokens) #why sum the tokens here?
            #Shouldn't this be: num_total_tokens = len(args.shared_rnn_activations)+self.args.num_blocks
            self.encoder = torch.nn.Embedding(num_total_tokens,
                                              args.controller_hid)
    
    

    It seems like num_total_tokens doesn't need to be the sum of self.num_tokens: in the case where self.args.num_blocks = 6, that number is 49. Yet from what I can tell, the largest index the embedding is ever looked up with in Controller.forward() is len(args.shared_rnn_activations) + self.args.num_blocks (in this case, 10).
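    For concreteness, assuming the usual 4 shared_rnn_activations, the two quantities differ exactly as described (quick check):

    num_activations = 4  # assumption: tanh, ReLU, identity, sigmoid
    num_blocks = 6

    num_tokens = [num_activations]
    for idx in range(num_blocks):
        num_tokens += [idx + 1, num_activations]

    print(sum(num_tokens))               # 49: rows allocated in the Embedding
    print(num_activations + num_blocks)  # 10: largest index ever looked up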

    opened by philtomson 2
  • Errors When running

    @dukebw, hi, thanks for your work. When I run this code I meet some problems.

    1. When I run it using run.sh with the defaults, I get:

    THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
    Traceback (most recent call last):
      File "main.py", line 48, in <module>
        main(args)
      File "main.py", line 30, in main
        trnr = trainer.Trainer(args, dataset)
      File "/home/axi/ENAS-pytorch-master-3/trainer.py", line 160, in __init__
        self.build_model()
      File "/home/axi/ENAS-pytorch-master-3/trainer.py", line 192, in build_model
        self.shared.cuda()
      File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 147, in cuda
        return self._apply(lambda t: t.cuda(device_id))
      File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 118, in _apply
        module._apply(fn)
      File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 118, in _apply
        module._apply(fn)
      File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 124, in _apply
        param.data = fn(param.data)
      File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 147, in <lambda>
        return self._apply(lambda t: t.cuda(device_id))
      File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/_utils.py", line 66, in _cuda
        return new_type(self.size()).copy_(self, async)
    RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/THC/generic/THCStorage.cu:66

    This is while I have 3 GPUs with 10 GB of memory.

    2. When I run it using:

    python main.py --network_type cnn --dataset cifar --controller_optim momentum --controller_lr_cosine=True --controller_lr_max 0.05 --controller_lr_min 0.0001 --entropy_coeff 0.1

    I get:

    2018-04-29 19:01:57,957:INFO::[*] Make directories : logs/cifar_2018-04-29_19-01-57
    Traceback (most recent call last):
      File "main.py", line 48, in <module>
        main(args)
      File "main.py", line 26, in main
        dataset = data.image.Image(args.data_path)
      File "/home/axi/ENAS-pytorch-master-2/data/image.py", line 8, in __init__
        if args.datset == 'cifar10':
    AttributeError: 'str' object has no attribute 'datset'

    and after I make some changes, I get other errors, such as:

    2018-04-29 18:49:24,745:INFO::[*] Make directories : logs/cifar10_2018-04-29_18-49-24
    Files already downloaded and verified
    2018-04-29 18:49:27,464:INFO::regularizing:
    Traceback (most recent call last):
      File "main.py", line 48, in <module>
        main(args)
      File "main.py", line 30, in main
        trnr = trainer.Trainer(args, dataset)
      File "/home/axi/ENAS-pytorch-master-1/trainer.py", line 139, in __init__
        self.cuda)
      File "/home/axi/ENAS-pytorch-master-1/utils.py", line 148, in batchify
        data = data.narrow(0, 0, nbatch * bsz)
    AttributeError: 'DataLoader' object has no attribute 'narrow'

    or:

    2018-04-29 18:22:50,192:INFO::[*] Make directories : logs/cifar10_2018-04-29_18-22-50
    Files already downloaded and verified
    2018-04-29 18:22:55,041:INFO::regularizing:
    Traceback (most recent call last):
      File "main.py", line 48, in <module>
        main(args)
      File "main.py", line 30, in main
        trnr = trainer.Trainer(args, dataset)
      File "/home/axi/ENAS-pytorch-master-1/trainer.py", line 139, in __init__
        self.cuda)
      File "/home/axi/ENAS-pytorch-master-1/utils.py", line 147, in batchify
        nbatch = data.size // bsz
    AttributeError: 'DataLoader' object has no attribute 'size'

    Could you please tell me what changes I should make before I run the code? Thanks for your response.

    opened by axiniu 2
  • The question about INF network parameters

    When I was running your code, I found that some network parameters turn into INF. Is there any suggestion to solve this problem? Thanks. By the way, I am curious about the test ppl after training.

    opened by NewGod 2
  • Adds a "single" mode that loads and trains a given dag

    When running with --mode single, it loads the dag given by --dag_path (which is a simple json dump) and trains this dag.

    In this mode, there is no "autoML" part. The controller is not trained, and the same dag is used during both the "train_shared" phase and the validate phase.

    I provided a dag.json file, which is the best dag found in the paper. This mode should allow us to "retrain from scratch" and eventually reach the result of the paper (working on it...).

    Major changes are in the Trainer class. I added a single=False optional parameter on the train() method and a dag=None optional parameter on the train_shared() method. This pull request does not break backward compatibility.
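    Illustratively, the described control flow has roughly this shape (a sketch of the method outline, not the actual diff):

    import json

    def train(self, single=False):
        # In single mode, the dag comes from --dag_path (a plain JSON dump)
        # and the controller phase is skipped; otherwise nothing changes.
        dag = json.load(open(self.args.dag_path)) if single else None
        for epoch in range(self.args.max_epoch):
            self.train_shared(dag=dag)  # fixed dag when single=True
            if not single:
                self.train_controller()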

    opened by nkcr 1
  • set default of `valid_idx` to 0 in `get_reward`

    The value of valid_idx would always be overwritten by the if block. I think the author intended to write:

    if not valid_idx:
        valid_idx = 0
    

    but forgot the `not` in the condition. In fact, instead of adding the `not`, we can get rid of the if block and give the `valid_idx` parameter a default value of 0.

    Before this update, omitting the optional `valid_idx` parameter would produce an exception.
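    A minimal sketch of the resulting signature (illustrative):

    def get_reward(self, dag, entropies, valid_idx=0):
        # valid_idx now defaults to 0, so the old `if valid_idx:` block
        # (which always overwrote the argument) can simply be deleted.
        ...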

    opened by nkcr 1
  • changes to run on PyTorch version >= 0.4.0

    The current code only runs on versions of PyTorch before 0.4.0; this pull request adds changes to allow running on PyTorch versions >= 0.4.0.

    see: https://github.com/carpedm20/ENAS-pytorch/issues/25
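    For context, the port involves changes of roughly this kind (illustrative examples of the 0.3.x-to-0.4.0 API shift, not the actual diff):

    import torch

    x = torch.ones(1, requires_grad=True)  # replaces Variable(torch.ones(1))
    loss = (x * 2).sum()
    loss.backward()
    print(loss.item())                     # replaces loss.data[0]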

    opened by philtomson 1
  • Fix the hidden state norm stabilization code for PyTorch v0.3.1, and reduce the noise created by logging clipped hidden states

    Sorry, this fixes a few bugs that were still present after my last commit (adding PyTorch v0.3.1 compatibility). I've tested now for v0.4 and v0.3.1 and training seems to be working for both.

    opened by dukebw 1
  • CUDA out of memory

    First off, thanks for making this, looks great!

    I downloaded the repo and I'm trying to run the examples before moving on. Unfortunately, almost immediately after training starts, CUDA runs out of memory. I'm running on a GTX 1050 with 4GB of RAM (about 3GB available to use for training), same as the 980 you mentioned you were running on? I was just wondering if you had any ideas about what could be causing this issue! Full error message below.

    python main.py --network_type rnn --dataset ptb --controller_optim adam --controller_lr 0.00035 --shared_optim sgd --shared_lr 20.0 --entropy_coeff 0.0001
    2018-02-16 22:22:54,351:INFO::[*] Make directories : logs/ptb_2018-02-16_22-22-54
    2018-02-16 22:22:59,204:INFO::# of parameters: 146,014,000
    2018-02-16 22:22:59,315:INFO::[*] MODEL dir: logs/ptb_2018-02-16_22-22-54
    2018-02-16 22:22:59,316:INFO::[*] PARAM path: logs/ptb_2018-02-16_22-22-54/params.json
    train_shared:   0%|   | 0/14524 [00:00<?, ?it/s]
    /home/mjhutchinson/Documents/Machine Learning/ENAS-pytorch/models/controller.py:96: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
      probs = F.softmax(logits)
    /home/mjhutchinson/Documents/Machine Learning/ENAS-pytorch/models/controller.py:97: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
      log_prob = F.log_softmax(logits)
    THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
    Traceback (most recent call last):
      File "main.py", line 45, in <module>
        main(args)
      File "main.py", line 34, in main
        trainer.train()
      File "/home/mjhutchinson/Documents/Machine Learning/ENAS-pytorch/trainer.py", line 87, in train
        self.train_shared()
      File "/home/mjhutchinson/Documents/Machine Learning/ENAS-pytorch/trainer.py", line 143, in train_shared
        loss.backward()
      File "/home/mjhutchinson/.conda/envs/pytorch/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
      File "/home/mjhutchinson/.conda/envs/pytorch/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
        variables, grad_variables, retain_graph)
    RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCStorage.cu:58
    

    If there's any other info that would be helpful, please let me know!

    opened by MJHutchinson 1
  • RuntimeError: grad can be implicitly created only for scalar outputs

    I encountered this strange error. Here is the output, thank you. Earlier, it was showing an error that tensors cannot be on CPU and GPU at the same time; after I added .cuda() to the loss, it started showing this error instead.

    Traceback (most recent call last): File "D:/xiangmu/ENAS-pytorch-master/main.py", line 56, in main(args) File "D:/xiangmu/ENAS-pytorch-master/main.py", line 35, in main trnr.train() File "D:\xiangmu\ENAS-pytorch-master\trainer.py", line 223, in train self.train_shared(dag=dag) File "D:\xiangmu\ENAS-pytorch-master\trainer.py", line 317, in train_shared loss.backward() File "C:\Users\sunhaonan.conda\envs\enas\lib\site-packages\torch_tensor.py", line 307, in backward torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs) File "C:\Users\sunhaonan.conda\envs\enas\lib\site-packages\torch\autograd_init_.py", line 150, in backward grad_tensors_ = make_grads(tensors, grad_tensors) File "C:\Users\sunhaonan.conda\envs\enas\lib\site-packages\torch\autograd_init_.py", line 51, in _make_grads raise RuntimeError("grad can be implicitly created only for scalar outputs") RuntimeError: grad can be implicitly created only for scalar outputs

    opened by Shn9909 0
  • RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

    I encountered this strange error. Here is the output:

    $ python main.py 
    2020-10-17 06:19:37,971:INFO::[*] Make directories : logs/ptb_2020-10-17_06-19-37
    2020-10-17 06:19:45,686:INFO::regularizing:
    2020-10-17 06:19:56,858:INFO::# of parameters: 146,014,000
    2020-10-17 06:19:57,208:INFO::[*] MODEL dir: logs/ptb_2020-10-17_06-19-37
    2020-10-17 06:19:57,208:INFO::[*] PARAM path: logs/ptb_2020-10-17_06-19-37/params.json
    /home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/nn/functional.py:1614: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
      warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
    /home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/nn/functional.py:1625: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
      warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
    2020-10-17 06:19:57,872:INFO::max hidden 3.5992980003356934
    2020-10-17 06:19:58,043:INFO::abs max grad 0
    /home/ubuntu/ENAS-pytorch/trainer.py:323: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
      self.args.shared_grad_clip)
    2020-10-17 06:19:58,879:INFO::abs max grad 0.05615033581852913
    2020-10-17 06:19:59,448:INFO::max hidden 9.425106048583984
    2020-10-17 06:19:59,774:INFO::abs max grad 0.0575626865029335
    2020-10-17 06:20:01,810:INFO::abs max grad 0.12187317758798599
    2020-10-17 06:20:03,771:INFO::abs max grad 0.5459710359573364
    2020-10-17 06:20:07,741:INFO::max hidden 15.914213180541992
    2020-10-17 06:20:17,945:INFO::abs max grad 0.8663018941879272
    2020-10-17 06:20:41,948:INFO::| epoch   0 | lr 20.00 | raw loss 8.39 | loss 8.39 | ppl  4402.23
    2020-10-17 06:21:21,796:INFO::| epoch   0 | lr 20.00 | raw loss 7.20 | loss 7.20 | ppl  1343.73
    2020-10-17 06:21:26,601:INFO::max hidden 20.534639358520508
    2020-10-17 06:22:06,855:INFO::| epoch   0 | lr 20.00 | raw loss 7.00 | loss 7.00 | ppl  1093.28
    2020-10-17 06:22:07,417:INFO::max hidden 22.71334457397461
    2020-10-17 06:22:19,596:INFO::clipped 1 hidden states in one forward pass. max clipped hidden state norm: 25.37160301208496
    Traceback (most recent call last):
      File "main.py", line 54, in <module>
        main(args)
      File "main.py", line 34, in main
        trnr.train()
      File "/home/ubuntu/ENAS-pytorch/trainer.py", line 222, in train
        self.train_shared(dag=dag)
      File "/home/ubuntu/ENAS-pytorch/trainer.py", line 313, in train_shared
        loss.backward()
      File "/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/tensor.py", line 185, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/autograd/__init__.py", line 127, in backward
        allow_unreachable=True)  # allow_unreachable flag
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 1000]], which is output 0 of AddBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
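    As the hint at the end of the traceback suggests, anomaly detection localizes the in-place operation; a self-contained illustration of the same failure mode:

    import torch

    torch.autograd.set_detect_anomaly(True)  # prints the forward-pass traceback on failure
    x = torch.randn(3, requires_grad=True)
    y = torch.tanh(x)
    y.add_(1.0)         # in-place edit of a tensor that tanh's backward needs
    y.sum().backward()  # raises, now pointing at the add_ above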
    
    opened by dangne 3
  • pygraphviz cannot be installed

    $ sudo pip3 install pygraphviz
    WARNING: The directory '/home/usr1/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
    Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
    Collecting pygraphviz
      Downloading https://pypi.tuna.tsinghua.edu.cn/packages/7e/b1/d6d849ddaf6f11036f9980d433f383d4c13d1ebcfc3cd09bc845bda7e433/pygraphviz-1.5.zip (117 kB)
         |████████████████████████████████| 117 kB 11.0 MB/s
    Installing collected packages: pygraphviz
        Running setup.py install for pygraphviz ... error
        ERROR: Command errored out with exit status 1:
         command: /usr/local/bin/python3.8 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-0z76ssje/pygraphviz/setup.py'"'"'; __file__='"'"'/tmp/pip-install-0z76ssje/pygraphviz/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-_abu592r/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.8/pygraphviz
             cwd: /tmp/pip-install-0z76ssje/pygraphviz/
        Complete output (34 lines):
        running install
        Trying dpkg
        dpkg-query: no path found matching pattern graphviz
        Could not run dpkg
        Trying pkg-config
        Package libcgraph was not found in the pkg-config search path.
        Perhaps you should add the directory containing `libcgraph.pc' to the PKG_CONFIG_PATH environment variable
        No package 'libcgraph' found
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/tmp/pip-install-0z76ssje/pygraphviz/setup.py", line 70, in <module>
            setup(
          File "/usr/local/lib/python3.8/site-packages/setuptools/__init__.py", line 145, in setup
            return distutils.core.setup(**attrs)
          File "/usr/local/lib/python3.8/distutils/core.py", line 148, in setup
            dist.run_commands()
          File "/usr/local/lib/python3.8/distutils/dist.py", line 966, in run_commands
            self.run_command(cmd)
          File "/usr/local/lib/python3.8/distutils/dist.py", line 985, in run_command
            cmd_obj.run()
          File "/tmp/pip-install-0z76ssje/pygraphviz/setup_commands.py", line 44, in modified_run
            self.include_path, self.library_path = get_graphviz_dirs()
          File "/tmp/pip-install-0z76ssje/pygraphviz/setup_extra.py", line 162, in get_graphviz_dirs
            include_dirs, library_dirs = _try_configure(include_dirs, library_dirs, _pkg_config)
          File "/tmp/pip-install-0z76ssje/pygraphviz/setup_extra.py", line 117, in _try_configure
            i, l = try_function()
          File "/tmp/pip-install-0z76ssje/pygraphviz/setup_extra.py", line 72, in _pkg_config
            output = S.check_output(['pkg-config', '--libs-only-L', 'libcgraph'])
          File "/usr/local/lib/python3.8/subprocess.py", line 411, in check_output
            return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
          File "/usr/local/lib/python3.8/subprocess.py", line 512, in run
            raise CalledProcessError(retcode, process.args,
        subprocess.CalledProcessError: Command '['pkg-config', '--libs-only-L', 'libcgraph']' returned non-zero exit status 1.
        ----------------------------------------
    ERROR: Command errored out with exit status 1: /usr/local/bin/python3.8 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-0z76ssje/pygraphviz/setup.py'"'"'; __file__='"'"'/tmp/pip-install-0z76ssje/pygraphviz/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-_abu592r/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.8/pygraphviz Check the logs for full command output.

    opened by Light-- 2
  • AttributeError: 'Namespace' object has no attribute 'num_workers'

    kukby@kukby-GI5KN54:~/ENAS-pytorch-master$ python3 main.py --network_type cnn --dataset cifar --controller_optim momentum --controller_lr_cosine=True --controller_lr_max 0.05 --controller_lr_min 0.0001 --entropy_coeff 0.1
    2020-03-03 20:59:42,792:INFO::[*] Make directories : logs/cifar_2020-03-03_20-59-42
    Files already downloaded and verified
    Traceback (most recent call last):
      File "main.py", line 54, in <module>
        main(args)
      File "main.py", line 26, in main
        dataset = data.image.Image(args)
      File "/home/kukby/ENAS-pytorch-master/data/image.py", line 30, in __init__
        num_workers=args.num_workers, pin_memory=True)
    AttributeError: 'Namespace' object has no attribute 'num_workers'

    opened by kukby 1
  • a bug related to CNN search?

    Hi, thanks for sharing the code. I am implementing the CNN part. I think the block_idx in the forward function should be taken mod 2 in the CNN case, since only two softmax decoders are used. Could you check it? Thanks.

    https://github.com/carpedm20/ENAS-pytorch/blob/25c4a89e17851d72e85566e16c81f0fd3749d58f/models/controller.py#L173-L177
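    For illustration, the suggested fix amounts to cyclic indexing over the two decoder heads (a toy sketch with hypothetical names):

    decoders = ['op_decoder', 'node_decoder']  # hypothetical: 2 heads in the CNN case
    for block_idx in range(6):
        print(block_idx, decoders[block_idx % 2])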

    opened by neouyghur 0