A TensorFlow implementation of DeepMind's WaveNet paper

Overview


This is a TensorFlow implementation of the WaveNet generative neural network architecture for audio generation.

The WaveNet neural network architecture directly generates a raw audio waveform, showing excellent results in text-to-speech and general audio generation (see the DeepMind blog post and paper for details).

The network models the conditional probability to generate the next sample in the audio waveform, given all previous samples and possibly additional parameters.

After an audio preprocessing step, the input waveform is quantized to a fixed integer range. The integer amplitudes are then one-hot encoded to produce a tensor of shape (num_samples, num_channels).
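
The following sketch shows roughly what this step looks like, using the µ-law companding scheme from the paper (names are illustrative; the repository's own mu_law_encode lives in the wavenet package):

    import tensorflow as tf

    waveform = tf.placeholder(tf.float32, [None])  # raw audio in [-1, 1]

    def mu_law_encode(audio, quantization_channels=256):
        # Quantize a float waveform in [-1, 1] to integers in
        # [0, quantization_channels).
        mu = tf.to_float(quantization_channels - 1)
        # Non-linear companding gives quiet samples more resolution.
        magnitude = tf.log1p(mu * tf.abs(audio)) / tf.log1p(mu)
        signal = tf.sign(audio) * magnitude
        # Shift from [-1, 1] to [0, mu] and round to integer amplitudes.
        return tf.to_int32((signal + 1) / 2 * mu + 0.5)

    # One-hot encode to a tensor of shape (num_samples, quantization_channels).
    encoded = tf.one_hot(mu_law_encode(waveform), depth=256)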

A convolutional layer that only accesses the current and previous inputs then reduces the channel dimension.

The core of the network is constructed as a stack of causal dilated layers, each of which is a dilated convolution (convolution with holes), which only accesses the current and past audio samples.
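
One way to realize such a layer is to left-pad the input so the convolution never sees future samples; the repository itself gets the same effect via time_to_batch/batch_to_time reshaping around tf.nn.conv1d (see causal_conv in the source). A minimal sketch, assuming TensorFlow 1.x and an input of shape (batch, time, channels):

    def causal_dilated_conv(x, filters, dilation):
        # filters: (filter_width, in_channels, out_channels).
        filter_width = int(filters.get_shape()[0])
        # Left-pad so each output depends only on current and past samples.
        pad = (filter_width - 1) * dilation
        x_padded = tf.pad(x, [[0, 0], [pad, 0], [0, 0]])
        return tf.nn.convolution(x_padded, filters,
                                 padding='VALID', dilation_rate=[dilation])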

The outputs of all layers are combined and extended back to the original number of channels by a series of dense postprocessing layers, followed by a softmax function to transform the outputs into a categorical distribution.
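
For reference, here is a sketch of one layer of the dilated stack with the paper's gated activation unit, assuming the causal_dilated_conv helper above and weight tensors created elsewhere (all names illustrative):

    def residual_block(x, filter_w, gate_w, dense_w, skip_w, dilation):
        # Gated activation unit: z = tanh(W_f * x) * sigmoid(W_g * x).
        conv_filter = causal_dilated_conv(x, filter_w, dilation)
        conv_gate = causal_dilated_conv(x, gate_w, dilation)
        out = tf.tanh(conv_filter) * tf.sigmoid(conv_gate)
        # Two 1x1 convolutions: one feeds the residual bus, one the skip path.
        transformed = tf.nn.conv1d(out, dense_w, stride=1, padding='SAME')
        skip = tf.nn.conv1d(out, skip_w, stride=1, padding='SAME')
        # The residual output must have the same channel count as x.
        return x + transformed, skip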

The loss function is the cross-entropy between the output for each timestep and the input at the next timestep.
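
Equivalently, the targets are the inputs shifted one timestep into the future. A minimal sketch of the shift (the repository's loss in model.py additionally supports optional L2 regularization):

    raw_output = tf.placeholder(tf.float32, [1, None, 256])  # network logits
    encoded = tf.placeholder(tf.float32, [1, None, 256])     # one-hot inputs

    # The target for the output at timestep t is the input at timestep t+1.
    logits = raw_output[:, :-1, :]
    targets = encoded[:, 1:, :]
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=targets, logits=logits))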

In this repository, the network implementation can be found in model.py.

Requirements

TensorFlow needs to be installed before running the training script. Code is tested on TensorFlow version 1.0.1 for Python 2.7 and Python 3.5.

In addition, librosa must be installed for reading and writing audio.

To install the required Python packages, run

pip install -r requirements.txt

For GPU support, use

pip install -r requirements_gpu.txt

Training the network

You can use any corpus containing .wav files. We've mainly used the VCTK corpus (around 10.4 GB) so far.

In order to train the network, execute

python train.py --data_dir=corpus

where corpus is a directory containing .wav files. The script will recursively collect all .wav files in the directory.

You can see documentation on each of the training settings by running

python train.py --help

You can find the configuration of the model parameters in wavenet_params.json. These need to stay the same between training and generation.
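
Both train.py and generate.py read that file, along these lines (the keys shown in the comment are illustrative; consult wavenet_params.json itself for the authoritative set):

    import json

    # Load the shared model hyperparameters used by training and generation.
    with open('wavenet_params.json') as f:
        wavenet_params = json.load(f)

    # Typical entries include the dilation schedule and channel sizes, e.g.
    # wavenet_params['dilations'], wavenet_params['quantization_channels'].
    print(wavenet_params)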

Global Conditioning

Global conditioning refers to modifying the model such that the id of a set of mutually exclusive categories is specified during training and generation of .wav files. In the case of VCTK, this id is the integer id of the speaker, of which there are over a hundred. This allows (indeed requires) a speaker id to be specified at generation time to select which of the speakers the model should mimic. For more details see the paper or the source code.

Training with Global Conditioning

The instructions above for training refer to training without global conditioning. To train with global conditioning, specify command-line arguments as follows:

python train.py --data_dir=corpus --gc_channels=32

The --gc_channels argument does two things:

  • It tells the train.py script that it should build a model that includes global conditioning.
  • It specifies the size of the embedding vector that is looked up based on the id of the speaker.
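
A minimal sketch of that lookup, assuming TensorFlow 1.x (names are illustrative; the repository builds the table in its _create_variables/_embed_gc code):

    import tensorflow as tf

    gc_cardinality = 377  # number of mutually exclusive categories (speakers)
    gc_channels = 32      # embedding size selected via --gc_channels

    # One row per speaker; row i is the conditioning vector for speaker id i.
    embedding_table = tf.get_variable(
        'gc_embedding', [gc_cardinality, gc_channels],
        initializer=tf.random_uniform_initializer())

    speaker_id = tf.placeholder(tf.int32, [None])  # one id per batch example
    global_condition = tf.nn.embedding_lookup(embedding_table, speaker_id)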

The global conditioning logic in train.py and audio_reader.py is currently "hard-wired" to the VCTK corpus, in that it expects to be able to determine the speaker id from the file-naming pattern used in VCTK, but it can easily be modified.

Generating audio

Example output generated by @jyegerlehner based on speaker 280 from the VCTK corpus.

You can use the generate.py script to generate audio using a previously trained model.

Generating without Global Conditioning

Run

python generate.py --samples 16000 logdir/train/2017-02-13T16-45-34/model.ckpt-80000

where logdir/train/2017-02-13T16-45-34/model.ckpt-80000 needs to be a path to a previously saved model (without extension). The --samples parameter specifies how many audio samples you would like to generate (16000 corresponds to 1 second by default).

The generated waveform can be played back using TensorBoard, or stored as a .wav file by using the --wav_out_path parameter:

python generate.py --wav_out_path=generated.wav --samples 16000 logdir/train/2017-02-13T16-45-34/model.ckpt-80000

Passing --save_every in addition to --wav_out_path will save the in-progress wav file every n samples.

python generate.py --wav_out_path=generated.wav --save_every 2000 --samples 16000 logdir/train/2017-02-13T16-45-34/model.ckpt-80000

Fast generation is enabled by default. It uses the implementation from the Fast Wavenet repository. You can follow the link for an explanation of how it works. This reduces the time needed to generate samples to a few minutes.
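
The key idea, sketched here in plain Python rather than the TensorFlow queues the implementation actually uses, is that each dilated layer caches its recent outputs, so producing one new sample costs one step per layer instead of recomputing the entire receptive field:

    import collections

    class LayerCache:
        # Per-layer buffer of past activations for a width-2 dilated convolution.
        def __init__(self, dilation, channels):
            self.queue = collections.deque(
                [[0.0] * channels] * dilation, maxlen=dilation)

        def step(self, current):
            past = self.queue[0]        # activation from `dilation` steps ago
            self.queue.append(current)  # cache the newest activation
            # Convolving just these two taps replaces re-running the dilated
            # convolution over the whole history.
            return past, current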

To disable fast generation:

python generate.py --samples 16000 logdir/train/2017-02-13T16-45-34/model.ckpt-80000 --fast_generation=false

Generating with Global Conditioning

Generate from a model incorporating global conditioning as follows:

python generate.py --samples 16000  --wav_out_path speaker311.wav --gc_channels=32 --gc_cardinality=377 --gc_id=311 logdir/train/2017-02-13T16-45-34/model.ckpt-80000

Where:

--gc_channels=32 specifies that the embedding vector has size 32, and must match what was specified when training.

--gc_cardinality=377 is required because 376 is the largest id of a speaker in the VCTK corpus. If some other corpus is used, this number should match the value that is automatically determined and printed out by the train.py script at training time.

--gc_id=311 specifies the id of the speaker, speaker 311, for which a sample is to be generated.

Running tests

Install the test requirements

pip install -r requirements_test.txt

Run the test suite

./ci/test.sh

Missing features

Currently there is no local conditioning on extra information which would allow context stacks or controlling what speech is generated.

Related projects

  • tensorflow-tex-wavenet, a WaveNet-based text generator built on this repository: https://github.com/Zeta36/tensorflow-tex-wavenet

Comments
  • Global conditioning


    There is a unit test in test_model.py.

    This code's AudioReader does not send back the speaker id, so it's not ready for use quite yet.

    There are other global conditioning implementations in flight. Related discussions here. Let's try to find the most expeditious way of getting something merged from the various implementations. We're past-due for it IMO.

    opened by jyegerlehner 30
  • Training error in main.py


    Getting the following error when I try to train the network - any idea what this is?

    I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
    I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
    I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
    I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
    I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
    I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
    name: GeForce GTX TITAN X
    major: 5 minor: 2 memoryClockRate (GHz) 1.076
    pciBusID 0000:01:00.0
    Total memory: 12.00GiB
    Free memory: 11.53GiB
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0)
    Traceback (most recent call last):
      File "main.py", line 129, in <module>
        main()
      File "main.py", line 83, in main
        loss = net.loss(audio_batch)
      File "/home/seth/Development/tensorflow-wavenet/wavenet.py", line 97, in loss
        raw_output = self._create_network(encoded)
      File "/home/seth/Development/tensorflow-wavenet/wavenet.py", line 67, in _create_network
        dilation=dilation)
      File "/home/seth/Development/tensorflow-wavenet/wavenet.py", line 23, in _create_dilation_layer
        name="conv_f")
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 168, in atrous_conv2d
        in_height = int(value_shape[1])
    TypeError: __int__ returned non-int (type NoneType)
    
    opened by polyrhythmatic 29
  • librosa.util.exceptions.ParameterError: Buffer is too short (n=1427) for frame_length=2048


    Exception in thread Thread-13:
    Traceback (most recent call last):
      File "//anaconda/envs/py35/lib/python3.5/threading.py", line 914, in _bootstrap_inner
        self.run()
      File "//anaconda/envs/py35/lib/python3.5/threading.py", line 862, in run
        self._target(*self._args, **self._kwargs)
      File "/Users/fs8b/Documents/tech/tensorflow-wavenet-master/wavenet/audio_reader.py", line 162, in thread_main
        audio = trim_silence(audio[:, 0], self.silence_threshold)
      File "/Users/fs8b/Documents/tech/tensorflow-wavenet-master/wavenet/audio_reader.py", line 66, in trim_silence
        energy = librosa.feature.rmse(audio)
      File "//anaconda/envs/py35/lib/python3.5/site-packages/librosa/feature/spectral.py", line 575, in rmse
        hop_length=hop_length)
      File "//anaconda/envs/py35/lib/python3.5/site-packages/librosa/util/utils.py", line 82, in frame
        ' for frame_length={:d}'.format(len(y), frame_length))
    librosa.util.exceptions.ParameterError: Buffer is too short (n=1427) for frame_length=2048

    After training on the entire VCTK corpus, this is the error that kills execution. Has anybody else encountered this?

    Still cannot even train successfully, let alone generate audio

    opened by Jovonni 23
  • Can't generate samples from checkpoint file


    When I try to run generate.py per the readme, I get this:

    I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
    Restoring model from model.ckpt-250
    Traceback (most recent call last):
      File "generate.py", line 86, in <module>
        main()
      File "generate.py", line 66, in main
        feed_dict={samples: window})
      File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 710, in run
        run_metadata_ptr)
      File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 908, in _run
        feed_dict_string, options, run_metadata)
      File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 958, in _do_run
        target_list, options, run_metadata)
      File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 978, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors.InvalidArgumentError: Output dimensions must be positive
             [[Node: wavenet/dilated_stack/layer1/conv_filter/BatchToSpace = BatchToSpace[T=DT_FLOAT, block_size=2, _device="/job:localhost/replica:0/task:0/gpu:0"](wavenet/dilated_stack/layer1/conv_filter, wavenet/dilated_stack/layer1/conv_filter/BatchToSpace/crops)]]
    Caused by op u'wavenet/dilated_stack/layer1/conv_filter/BatchToSpace', defined at:
      File "generate.py", line 86, in <module>
        main()
      File "generate.py", line 51, in main
        next_sample = net.predict_proba(samples)
      File "/home/ubuntu/jupyter_base/project/tensorflow-wavenet/wavenet.py", line 154, in predict_proba
        raw_output = self._create_network(encoded)
      File "/home/ubuntu/jupyter_base/project/tensorflow-wavenet/wavenet.py", line 112, in _create_network
        self.dilation_channels)
      File "/home/ubuntu/jupyter_base/project/tensorflow-wavenet/wavenet.py", line 51, in _create_dilation_layer
        name="conv_filter")
      File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 228, in atrous_conv2d
        block_size=rate)
      File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 308, in batch_to_space
        block_size=block_size, name=name)
      File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 703, in apply_op
        op_def=op_def)
      File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2317, in create_op
        original_op=self._default_original_op, op_def=op_def)
      File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1239, in __init__
        self._traceback = _extract_stack()
    
    bug 
    opened by maxhodak 22
  • My implementation of WaveNet for text generation (based in this repository)


    Hi, friends.

    As I don't have a good GPU to help you directly with this, I have used the work in this repository as a baseline to develop a WaveNet text generator (self-generator): https://github.com/Zeta36/tensorflow-tex-wavenet.

    In summary: I use the WaveNet model as a text generator. I feed the model raw text data (characters) instead of raw audio files, and once the network is trained, I use the conditional probability it has learned to generate samples (characters) in a self-generating process.

    Only printable ASCII characters (decimal 0 up to 255) are supported right now.

    Results

    And pretty interesting results are reached!! Feeding the network with enough text and training, the model is able to memorize the probability of character dispositions in a language, and later generate even very similar text!!

    For example, using the Penn Tree Bank (PTB) dataset, and only after 15000 steps of training (with a low parameter setting), this was the self-generated output (the final loss was around 1.1):

    "Prediction is: 300-servenns on the divide mushin attore and operations losers nis called him for investment it was with as pursicularly federal and sotheby d. reported firsts truckhe of the guarantees as paining at the available ransions i 'm new york for basicane as a facerement of its a set to the u.s. spected on install death about in the little there have a $ N million or N N bilot in closing is of a trading a congress of society or N cents for policy half feeling the does n't people of general and the crafted ended yesterday still also arjas trading an effectors that a can singaes about N bound who that mestituty was below for which unrecontimer 's have day simple d. frisons already earnings on the annual says had minority four-$ N sance for an advised in reclution by from $ N million morris selpiculations the not year break government these up why thief east down for his hobses weakness as equiped also plan amr. him loss appealle they operation after and the monthly spendings soa $ N million from cansident third-quarter loan was N pressure of new and the intended up he header because in luly of tept. N million crowd up lowers were to passed N while provision according to and canada said the 1980s defense reporters who west scheduled is a volume at broke also and national leader than N years on the sharing N million pro-m was our american piconmentalist profited himses but the measures from N in N N of social only announcistoner corp. say to average u.j. dey said he crew is vice phick-bar creating the drives will shares of customer with welm reporters involved in the continues after power good operationed retain medhay as the end consumer whitecs of the national inc. closed N million advanc"

    This is really wonderful!! We can see that the original WaveNet model has a great capacity to learn and save long codified text information inside its nodes (and not only audio or image information). This "text generator" WaveNet was able to learn how to write English words and phrases just by predicting characters one by one, and sometimes was able even to learn what word to use based on context.

    This output is far from perfect, but it was trained on a CPU-only machine (without GPU) using a low parameter configuration in just two hours!! I hope somebody with a better computer can explore the potential of this implementation.

    You can download the new development in here: https://github.com/Zeta36/tensorflow-tex-wavenet.

    Technically:

    1. I made a TextReader for feeding text data, replacing the AudioReader.
    2. I used the printable character ASCII decimal value (0-255) as the 8-bit sample (and I removed the mu_law function from everywhere).
    3. Removed all TensorBoard summaries (I have no memory to waste :P).
    4. Removed write_wav() and developed a write_text().
    5. Some other minor changes: I start the "waveform" always with a space (char 32) and not with a random int, changed some terminal arguments, etc.

    And that's all!!

    I hope this can help you in any way.

    Best regards, Samu.

    opened by Zeta36 21
  • [WIP] Compute loss for outputs only where receptive field is filled


    This is my attempt at a fix for issue 98, using nakosung's suggested solution.

    I've merged it into a branch I'm training on, and there is not an immediate drop to lower reported loss. Though it might be learning a bit faster. Hard to say. At least it doesn't appear to have broken anything.

    opened by jyegerlehner 20
  • Excessive memory consumption


    The network currently runs into out of memory issues at a low number of layers. This seems to be a problem with TensorFlow's atrous_conv2d operation. If I set the dilation factor to 1, which means atrous_conv2d simply calls conv2d, I can easily run with 10s of layers. It could just be the additional batch_to_space and space_to_batch operations, in which case I can write a single C++ op for atrous_conv2d.

    opened by ibab 20
  • The u-law encoding is badly mapping values from the [-1,1] range to [0,255]


    The u-law encoding was badly mapping values from the [-1,1] range to [0,255].

    The correct equation to do this is (tested): return tf.cast(((signal + 1) * mu) / 2, tf.int32)

    opened by Zeta36 19
  • Fast generation


    We added fast wavenet generation. Addresses issue #26.

    Verification

    • We compared the output of our fast generation with slow generation, and ensured it exactly matches.
    • We also did some speed tests, and verified it is substantially faster 😸

    Any comments on style/formatting are welcome.

    opened by tomlepaine 19
  • Added temperature flag to generation script


    It's nice to be able to specify sampling "temperature" when generating output, usually for aesthetic reasons, so I added some code to scale the sampling probabilities if a temperature other than 1.0 is provided.

    Demo: https://soundcloud.com/robinsloan/sets/tensorflow-wavenet-temperature-demo
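
    One common way to implement temperature scaling (a sketch, not necessarily the exact code in this PR):

        import numpy as np

        def sample_with_temperature(probs, temperature=1.0):
            # Rescale the distribution in log space: temperature < 1.0 sharpens
            # it, > 1.0 flattens it, and 1.0 leaves it unchanged.
            logits = np.log(probs) / temperature
            scaled = np.exp(logits - np.max(logits))  # subtract max for stability
            scaled /= scaled.sum()
            return np.random.choice(len(scaled), p=scaled)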

    opened by robinsloan 16
  • What should output wave file sound like?


    From my model trained for 1999 steps (which might be too few steps to sound normal), it sounds just like noise.

    It would be better to provide a well-trained example output for understanding the desired result.

    opened by chanil1218 16
  • Project dependencies may have API risk issues


    Hi. In tensorflow-wavenet, inappropriate dependency versioning constraints can cause risks.

    Below are the dependencies and version constraints that the project is using:

    librosa>=0.5
    tensorflow>=1.0.0
    

    The version constraint == introduces the risk of dependency conflicts because the scope of dependencies is too strict. The version constraints "no upper bound" and * introduce the risk of missing-API errors, because the latest version of a dependency may remove some APIs.

    After further analysis, in this project the version constraint of the librosa dependency can be changed to >=0.2.0,<=0.7.2.

    The above modification suggestions can reduce dependency conflicts as much as possible, and introduce the latest versions as much as possible without raising errors in the project.

    The invocation of the current project includes all the following methods.

    The calling methods from librosa:
    librosa.output.write_wav

    The calling methods from all methods:
    sum
    tf.trainable_variables
    tf.Variable
    tf.nn.embedding_lookup
    np.logaddexp.reduce
    coord.join
    enumerate
    np.random.randint
    self._generator_conv
    net.predict_proba
    self._generator_causal_layer
    self.coord.should_stop
    tf.RunOptions
    os.makedirs
    q.enqueue_many
    tf.train.AdamOptimizer
    audio_reader.trim_silence
    args.optimizer.optimizer_factory
    tf.summary.audio
    q.dequeue
    tf.train.MomentumOptimizer
    argparse.ArgumentTypeError
    create_variable
    self._one_hot
    tf.histogram_summary
    get_arguments
    tf.div
    tf.zeros
    tf.sigmoid
    sys.stdout.flush
    initializer
    tf.train.Saver
    time_to_batch
    writer.add_graph
    librosa.load
    tf.summary.merge_all
    reader.dequeue
    np.nonzero
    WaveNetModel.calculate_receptive_field
    get_default_logdir
    load_generic_audio
    tf.train.get_checkpoint_state
    np.seterr
    self._create_generator
    tf.size
    open
    librosa.output.write_wav
    var.append
    dict
    audio.reshape
    f.write
    create_bias_variable
    np.arange
    self._generator_dilation_layer
    find_files
    tf.pad
    os.path.join
    optimizer.minimize
    np.array
    tf.constant
    trim_silence
    write_wav
    net.loss
    tf.RunMetadata
    abs
    tf.constant_initializer
    saver.restore
    list
    self._embed_gc
    randomize_files
    tf.PaddingFIFOQueue
    id_reg_expression.findall
    self._create_variables
    waveform.append
    tf.global_variables
    np.pad
    parser.parse_args
    self._create_causal_layer
    tf.cond
    tf.shape
    tf.transpose
    np.testing.assert_allclose
    float
    batch_to_time
    np.reshape
    sess.run
    tf.placeholder
    tf.add_n
    self._create_dilation_layer
    int
    len
    tf.nn.conv1d
    WaveNetModel
    librosa.core.frames_to_samples
    tf.slice
    ckpt.model_checkpoint_path.split
    thread.start
    create_seed
    threading.Thread
    tf.nn.l2_loss
    fnmatch.filter
    ckpt.model_checkpoint_path.split.split
    self.threads.append
    tf.get_default_graph
    tf.Session
    tf.summary.FileWriter
    tf.name_scope
    tf.nn.softmax
    tf.to_float
    tf.nn.softmax_cross_entropy_with_logits
    not_all_have_id
    tf.cast
    tf.add
    datetime.now
    time_since_print.total_seconds
    mu_law_decode
    os.walk
    np.identity
    tf.one_hot
    parser.add_argument
    tf.train.RMSPropOptimizer
    self.queue.dequeue_many
    json.load
    format
    s.lower
    tf.ConfigProto
    tf.to_int32
    librosa.feature.rmse
    create_embedding_table
    causal_conv
    tf.global_variables_initializer
    datetime.now.str.replace
    main
    push_ops.append
    save
    global_condition.get_shape
    str
    self.gc_queue.dequeue_many
    tf.train.start_queue_runners
    load
    outputs.extend
    q.enqueue
    tf.sign
    tf.FIFOQueue
    mu_law_encode
    net.predict_proba_incremental
    tf.train.Coordinator
    id_reg_exp.findall
    time.time
    get_category_cardinality
    print
    tl.generate_chrome_trace_format
    np.exp
    argparse.ArgumentParser
    writer.add_run_metadata
    self._create_network
    seed.sess.run.tolist
    NotImplementedError
    ValueError
    tf.reshape
    AudioReader
    range
    files.append
    os.path.exists
    tf.tanh
    np.random.choice
    tf.summary.scalar
    saver.save
    re.compile
    tf.nn.relu
    tf.reduce_mean
    random.randint
    tf.minimum
    reader.dequeue_gc
    timeline.Timeline
    np.log
    tf.abs
    init_ops.append
    reader.start_threads
    optimizer_factory.keys
    self.queue.enqueue
    tf.variable_scope
    self.gc_queue.enqueue
    validate_directories
    tf.matmul
    outputs.append
    tf.log1p
    tf.contrib.layers.xavier_initializer_conv2d
    writer.add_summary
    coord.request_stop
    

    @developer Could you please help me check this issue? May I submit a pull request to fix it? Thank you very much.

    opened by PyDeps 0
  • about loading VCTK_Corpus dataset?


    When I used librosa to load audio files from VCTK-Corpus, the following error occurs. Has anyone encountered the same situation?

      File "/anaconda3/envs/tf/lib/python3.6/site-packages/librosa/core/audio.py", line 112, in load
        with audioread.audio_open(os.path.realpath(path)) as input_file:
      File "/anaconda3/envs/tf/lib/python3.6/site-packages/audioread/__init__.py", line 116, in audio_open
        raise NoBackendError()

    audioread.exceptions.NoBackendError

    opened by Joll123 0
  • ModuleNotFoundError: No module named 'tensorflow.contrib'


    Hi,

    I believe there isn't a contrib module in Tensorflow 2.0 - does this mean we need an earlier version of TF to run wavenet?

    import wavenet

    2021-10-25 12:32:15.774545: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
    2021-10-25 12:32:15.774629: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
        import wavenet
      File "C:\Users**\Environments\project1\lib\site-packages\wavenet\__init__.py", line 5, in <module>
        from .network import Conv, Model, Network
      File "C:\Users**\Environments\project1\lib\site-packages\wavenet\network.py", line 9, in <module>
        from .cell import ConvCell
      File "C:\Users**\Environments\project1\lib\site-packages\wavenet\cell.py", line 6, in <module>
        from tensorflow.contrib.rnn import RNNCell  # pylint: disable=E0611
    ModuleNotFoundError: No module named 'tensorflow.contrib'

    Cheers, Tristan

    opened by tristankleyn 0
  • Why is there no activation function applied to the 1x1 conv that produces the dense output?


    I have been trying to understand why there is no activation function applied to the 1x1 conv that is used between the residual connections. From what I understand having a linear layer with no activation function does not really add to the expressive power of the model. The skip connections eventually have a relu applied so that does make sense to me. However, the linear output of the residual connections has no activation applied as far as I can tell. It is just added to the residual bus and fed into the next layer. What is the point of having the 1x1 convolution in this case? Why not just skip the 1x1 convolution and add the filter * gate directly to the inputs to create the dense output?

    opened by chasep255 0
  • Module 'tensorflow' has no attribute 'placeholder'


    I'm using Anaconda3 and the latest version of this repository. I have manually installed librosa and TensorFlow (following the Anaconda tutorial). Environment: Anaconda Prompt (Windows 10), TensorFlow set up as "tf" using conda create -n tf tensorflow.

    See the attached picture for the error. The same happens when using tf-gpu.

    I'm sure I did something wrong but I don't know what it is.

    opened by UnforeseenOcean 8