Seq2seq - Sequence to Sequence Learning with Keras

Overview


Hi! You have just found Seq2Seq. Seq2Seq is a sequence-to-sequence learning add-on for the Python deep learning library Keras. Using Seq2Seq, you can build and train sequence-to-sequence neural network models in Keras. Such models are useful for machine translation, chatbots (see [4]), parsers, or whatever else comes to your mind.


Getting started

Seq2Seq contains modular and reusable layers that you can use to build your own seq2seq models, as well as built-in models that work out of the box. Seq2Seq models can be compiled as they are or added as layers to a bigger model. Every Seq2Seq model has two primary layers: the encoder and the decoder. Generally, the encoder encodes the input sequence into an internal representation called the 'context vector', which the decoder uses to generate the output sequence. The lengths of the input and output sequences can be different, as there is no explicit one-to-one relation between the input and output sequences. In addition to the encoder and decoder layers, a Seq2Seq model may also contain layers such as the left stack (stacked LSTMs on the encoder side), the right stack (stacked LSTMs on the decoder side), resizers (for shape compatibility between the encoder and the decoder), and dropout layers to avoid overfitting. The source code is heavily documented, so let's go straight to the examples:

A simple Seq2Seq model:

import seq2seq
from seq2seq.models import SimpleSeq2Seq

model = SimpleSeq2Seq(input_dim=5, hidden_dim=10, output_length=8, output_dim=8)
model.compile(loss='mse', optimizer='rmsprop')

That's it! You have successfully compiled a minimal Seq2Seq model! Next, let's build a 6-layer-deep Seq2Seq model (3 layers for encoding, 3 layers for decoding).

Deep Seq2Seq models:

import seq2seq
from seq2seq.models import SimpleSeq2Seq

model = SimpleSeq2Seq(input_dim=5, hidden_dim=10, output_length=8, output_dim=8, depth=3)
model.compile(loss='mse', optimizer='rmsprop')

Notice that we have specified the depth for both the encoder and the decoder as 3, so the model has a total depth of 3 + 3 = 6. You can also specify different depths for the encoder and the decoder. Example:

import seq2seq
from seq2seq.models import SimpleSeq2Seq

model = SimpleSeq2Seq(input_dim=5, hidden_dim=10, output_length=8, output_dim=20, depth=(4, 5))
model.compile(loss='mse', optimizer='rmsprop')

Notice that the depth is specified as a tuple, (4, 5), which means your encoder will be 4 layers deep and your decoder 5 layers deep, for a total depth of 4 + 5 = 9.

Advanced Seq2Seq models:

Until now, you have been using the SimpleSeq2Seq model, which is a very minimalistic model. In the actual Seq2Seq implementation described in [1], the hidden state of the encoder is transferred to the decoder. Also, the output of the decoder at each timestep becomes the input to the decoder at the next timestep. To make things more complicated, the hidden state is propagated throughout the LSTM stack. But there is no reason to worry, as we have a built-in model that does all of that out of the box. Example:

import seq2seq
from seq2seq.models import Seq2Seq

model = Seq2Seq(batch_input_shape=(16, 7, 5), hidden_dim=10, output_length=8, output_dim=20, depth=4)
model.compile(loss='mse', optimizer='rmsprop')

Note that we had to specify the complete input shape, including the samples dimension. This is because we need a static hidden state (similar to a stateful RNN) in order to transfer it across layers. (Update: the full input shape is no longer required in the latest version, since we switched to the RecurrentShop backend.) By the way, Seq2Seq models also support the stateful argument, in case you need it.
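
For example, a stateful variant could be built like this (a minimal sketch; the shapes and hyperparameters are arbitrary):

import seq2seq
from seq2seq.models import Seq2Seq

# Stateful sketch: with stateful=True the batch size is fixed,
# so the full batch_input_shape is given (arbitrary values here).
model = Seq2Seq(batch_input_shape=(16, 7, 5), hidden_dim=10, output_length=8,
                output_dim=20, depth=4, stateful=True)
model.compile(loss='mse', optimizer='rmsprop')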

You can also experiment with hidden state propagation turned off. Simply set the arguments broadcast_state and inner_broadcast_state to False.
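
For instance (a minimal sketch with the same arbitrary shapes as above):

import seq2seq
from seq2seq.models import Seq2Seq

# Same model, but without hidden state transfer between the encoder and decoder
# and without propagation through the stacked layers.
model = Seq2Seq(batch_input_shape=(16, 7, 5), hidden_dim=10, output_length=8, output_dim=20,
                depth=4, broadcast_state=False, inner_broadcast_state=False)
model.compile(loss='mse', optimizer='rmsprop')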

Peeky Seq2seq model:

Let's not stop there. Let's build a model similar to Cho et al. (2014), where the decoder gets a 'peek' at the context vector at every timestep.


To achieve this, simply add the argument peek=True:

import seq2seq
from seq2seq.models import Seq2Seq

model = Seq2Seq(batch_input_shape=(16, 7, 5), hidden_dim=10, output_length=8, output_dim=20, depth=4, peek=True)
model.compile(loss='mse', optimizer='rmsprop')

Seq2seq model with attention:


Let's not stop there either. In all the models described above, there is no alignment between the input sequence elements and the output sequence elements. But for machine translation, learning a soft alignment between the input and output sequences improves performance [3]. The Seq2Seq framework includes a ready-made attention model that does exactly that. Note that in the attention model there is no hidden state propagation, and a bidirectional LSTM encoder is used by default. Example:

import seq2seq
from seq2seq.models import AttentionSeq2Seq

model = AttentionSeq2Seq(input_dim=5, input_length=7, hidden_dim=10, output_length=8, output_dim=20, depth=4)
model.compile(loss='mse', optimizer='rmsprop')

As you can see, in the attention model you need not specify the samples dimension, as there are no static hidden states involved (but you do have to if you are building a stateful Seq2Seq model). Note: you can set the argument bidirectional=False if you do not wish to use a bidirectional encoder.
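
For example (a minimal sketch reusing the shapes from the example above):

import seq2seq
from seq2seq.models import AttentionSeq2Seq

# The same attention model, but with a unidirectional (plain LSTM) encoder.
model = AttentionSeq2Seq(input_dim=5, input_length=7, hidden_dim=10, output_length=8,
                         output_dim=20, depth=4, bidirectional=False)
model.compile(loss='mse', optimizer='rmsprop')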

Final Words

That's all for now. I hope you love this library. For any questions you might have, create an issue and I will get in touch. You can also contribute to this project by reporting bugs, or by adding new examples, datasets, or models.

Installation:

sudo pip install git+https://github.com/farizrahman4u/seq2seq.git

Requirements:

- Keras
- RecurrentShop (https://github.com/farizrahman4u/recurrentshop)

Working Example:
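
The repository's bundled example is not reproduced here; the snippet below is only a rough sketch of the end-to-end workflow (build, fit on random dummy data, predict) with arbitrary shapes. The fit argument name (epochs vs. nb_epoch) depends on your Keras version.

import numpy as np
import seq2seq
from seq2seq.models import Seq2Seq

# Arbitrary toy dimensions
samples, input_length, input_dim = 100, 7, 5
output_length, output_dim = 8, 20

model = Seq2Seq(input_dim=input_dim, input_length=input_length, hidden_dim=10,
                output_length=output_length, output_dim=output_dim, depth=2)
model.compile(loss='mse', optimizer='rmsprop')

# Random dummy data, just to exercise the model end to end
x = np.random.random((samples, input_length, input_dim))
y = np.random.random((samples, output_length, output_dim))

model.fit(x, y, epochs=2)        # older Keras versions: nb_epoch=2
preds = model.predict(x[:1])     # shape: (1, output_length, output_dim)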

Papers:

- [1] Sequence to Sequence Learning with Neural Networks (Sutskever et al., 2014)
- [2] Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (Cho et al., 2014)
- [3] Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2014)
- [4] A Neural Conversational Model (Vinyals and Le, 2015)

Comments
  • Using Seq2Seq models in modular way (nesting models) results in MissingInputError


    Hi! Sorry for crossposting this (I also opened this issue on the Keras main repo), but I figured maybe it's actually related to seq2seq or recurrentshop internals.

    I'm trying to use a Seq2Seq model as follows:

    
    input = Input(shape=(maxlen,))
    one_hot = Lambda(
        lambda x: K.one_hot(K.cast(x, dtype="int32"), nb_classes=num_inputs), output_shape=(maxlen, num_inputs)
    )(input)
    output_seq = Seq2Seq(
        input_shape=(maxlen, num_inputs),
        hidden_dim=hidden_dim,
        output_length=out_maxlen, output_dim=num_inputs,
        depth=2, peek=True
    )(one_hot)
    predicted = TimeDistributed(Activation("softmax"))(output_seq)
    model = Model(input, predicted)
    return model
    

    This compiles fine, but when I try to fit the model using my (num_samples, maxlen)-shaped matrix, Theano complains that input_2 was not provided, which, as it turns out, is the input layer of the Seq2Seq model. I was hoping this layer would be fed the output of my Lambda layer automatically, but apparently this does not work. Is what I am trying to do possible? I realize I could just copy and slightly alter the Seq2Seq code, but of course I'd prefer just using the library for more maintainable code.

    More precise exception output:

    theano.gof.fg.MissingInputError: ("An input of the graph, used to compute Reshape{2}(input_2, TensorConstant{[-1 53]}), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.", input_2)
    
    opened by phdowling 21
  • Support for teacher forcing?


    In decoders, we usually feed the previous timestep's output vector in as the next input, language-model style. With teacher forcing, however, the ground-truth output of the previous timestep is always fed in instead of the model's own prediction. I suppose a model could still train even without teacher forcing, but from what I read it often helps training quite a bit.

    I do not see a way of doing this without providing the teacher inputs as an explicit input to the Keras model, though I might be wrong. Since this library does not take such inputs, I'm guessing it does not implement teacher forcing; is this correct? If so, are there any plans to support it in the future?
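
    For reference, teacher forcing is usually wired up with the plain Keras 2 functional API roughly as in the sketch below. This is a generic example with made-up sizes, not an API of this library: the decoder receives the ground-truth target sequence, shifted right by one timestep, as a second explicit model input.

    from keras.models import Model
    from keras.layers import Input, LSTM, Dense

    num_tokens, latent_dim = 50, 64    # hypothetical sizes

    # Encoder: keep only the final states as the "context" handed to the decoder
    encoder_inputs = Input(shape=(None, num_tokens))
    _, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

    # Decoder: fed the ground-truth targets shifted right by one step (teacher forcing)
    decoder_inputs = Input(shape=(None, num_tokens))
    decoder_seq = LSTM(latent_dim, return_sequences=True)(decoder_inputs,
                                                          initial_state=[state_h, state_c])
    decoder_outputs = Dense(num_tokens, activation='softmax')(decoder_seq)

    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
    model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
    # model.fit([encoder_input_data, shifted_target_data], target_data, ...)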

    opened by phdowling 20
  • workable example needed


    Dear developers, thank you for your work! It would be of great help to use your seq2seq implementation; however, after a considerable amount of time and effort I still can't do it due to the lack of documentation and examples. The Keras docs don't tell much about seq2seq mapping either. So please add a simple workable code example that demonstrates the usage of your library: how to prepare the input data and how to implement the training and prediction procedures. It would be greatly appreciated.

    opened by nicolas-ivanov 20
  • Error on saving model


    bwlf02:~/data/seq2seq$ python
    Python 2.7.12 (default, Jun 28 2016, 17:49:40) 
    [GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import seq2seq
    Using Theano backend.
    >>> from seq2seq.models import AttentionSeq2Seq
    >>> 
    >>> model = AttentionSeq2Seq(input_dim=5, input_length=7, hidden_dim=10, output_length=8, output_dim=20, depth=4)
    >>> model.compile(loss='mse', optimizer='rmsprop')
    >>> model.save('1.h5', overwrite=True)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/is33/Envs/alexa_seq2seq/lib/python2.7/site-packages/keras/engine/topology.py", line 2423, in save
        save_model(self, filepath, overwrite)
      File "/home/is33/Envs/alexa_seq2seq/lib/python2.7/site-packages/keras/models.py", line 52, in save_model
        'config': model.get_config()
      File "/home/is33/Envs/alexa_seq2seq/lib/python2.7/site-packages/keras/engine/topology.py", line 2285, in get_config
        layer_config = layer.get_config()
      File "/home/is33/data/recurrentshop/recurrentshop/engine.py", line 497, in get_config
        config['model'] = self.model.get_config()
      File "/home/is33/Envs/alexa_seq2seq/lib/python2.7/site-packages/keras/models.py", line 967, in get_config
        'config': layer.get_config()})
      File "seq2seq/cells.py", line 110, in get_config
        base_config = super(LSTMDecoderCell, self).get_config()
    TypeError: super(type, obj): obj must be an instance or subtype of type
    
    opened by ishalyminov 11
  • Add readout_activation param to models

    Add readout_activation param to models

    This enables avoiding getting stuck in a NaN loss hole when training. This workaround lets the user fix #189

    Example usage:

    model = Seq2Seq(input_dim=in_dim, input_length=MAXLENGTH, hidden_dim=HIDDEN_SIZE, output_length=MAXLENGTH, output_dim=out_dim, depth=LAYERS, peek=True, readout_activation='softmax')
    

    I do not fully understand the implications of using softmax as the output activation layer, but in my own project (https://github.com/Automattic/wp-translate) setting the output to softmax using this code does seem to have gotten me past getting stuck with NaN during training.

    opened by gibrown 9
  • IndexError: index 3 is out of bounds for axis 0 with size 3


    Dear humans, either I'm missing something or there is a bug in Fariz's seq2seq implementation. What's your bet?

    Code:

    import numpy as np
    from keras.models import Sequential
    from seq2seq.seq2seq import Seq2seq
    
    
    vocab_size = 10 #number of words
    maxlen = 3 #length of input sequence and output sequence
    embedding_dim = 5 #word embedding size
    hidden_dim = 50 #memory size of seq2seq
    batch_size = 7
    
    seq2seq = Seq2seq(input_length=maxlen,
                      input_dim=embedding_dim,
                      hidden_dim=hidden_dim,
                      output_dim=vocab_size,
                      output_length=maxlen,
                      batch_size=batch_size,
                      depth=1)
    
    print 'Build model ...'
    model = Sequential()
    model.add(seq2seq)
    model.compile(loss='mse', optimizer='adam')
    
    print 'Generate dummy data ...'
    train_examples_num = batch_size
    X = np.zeros((train_examples_num, maxlen, embedding_dim))
    Y = np.zeros((train_examples_num, maxlen, vocab_size))
    
    for train_example_idx in xrange(train_examples_num):
        for word_idx in xrange(maxlen):
            w2v_vector = np.random.rand(1, embedding_dim)[0]
            X[train_example_idx][word_idx] = w2v_vector
    
            bool_vector = np.zeros(vocab_size)
            bool_vector[np.random.choice(vocab_size)] = 1
            Y[train_example_idx][word_idx] = bool_vector
    
    print X.shape, X
    print Y.shape, Y
    
    print 'Fit data ...'
    model.fit(X, Y)
    

    Log:

    Build model ...
    /usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_perform_ext.py:133: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
      from scan_perform.scan_perform import *
    Generate dummy data ...
    (7, 3, 5) [[[ 0.88256245  0.77128004  0.11195964  0.0918089   0.57508599]
      [ 0.23917097  0.69324662  0.5007371   0.08220433  0.25601655]
      [ 0.31997691  0.96215576  0.37516188  0.4564258   0.44645146]]
    
     [[ 0.1164474   0.68527686  0.48801347  0.06237132  0.64461641]
      [ 0.21418609  0.56414103  0.69280567  0.09577648  0.46501309]
      [ 0.59522824  0.82593701  0.8952664   0.61032139  0.60784708]]
    
     [[ 0.50277342  0.18204284  0.6920746   0.23992536  0.5031889 ]
      [ 0.24719549  0.39098328  0.84927183  0.93091596  0.93981078]
      [ 0.76817661  0.68241358  0.97509582  0.78777374  0.41076285]]
    
     [[ 0.83762506  0.76151013  0.06292322  0.71097064  0.77048028]
      [ 0.78948919  0.77401108  0.39082489  0.66905667  0.54795132]
      [ 0.74940861  0.26011439  0.23257989  0.87033028  0.88954607]]
    
     [[ 0.98032484  0.29076576  0.76085615  0.53828208  0.92028479]
      [ 0.81111357  0.52959467  0.41101679  0.39434533  0.47918241]
      [ 0.18741232  0.68735943  0.27534715  0.18796185  0.89010293]]
    
     [[ 0.00484476  0.38136868  0.55200039  0.36352682  0.65304447]
      [ 0.19502873  0.86442676  0.82170956  0.90937185  0.93152998]
      [ 0.71814645  0.47181875  0.99475651  0.24588243  0.13357496]]
    
     [[ 0.92716079  0.83195725  0.50047687  0.86742848  0.27778597]
      [ 0.90902709  0.60421839  0.17206286  0.53972434  0.9863197 ]
      [ 0.63227496  0.14045515  0.88635036  0.72415621  0.88298206]]]
    (7, 3, 10) [[[ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
      [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]
      [ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]]
    
     [[ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]
      [ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
      [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]]
    
     [[ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
      [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
      [ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]
    
     [[ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]
      [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]
      [ 0.  0.  0.  0.  0.  0.  0.  0.  1.  0.]]
    
     [[ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]
      [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]
      [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]]
    
     [[ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]
      [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
      [ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]]
    
     [[ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
      [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
      [ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]]]
    Fit data ...
    Epoch 1/100
    Traceback (most recent call last):
      File "/home/nicolas/Code/seq2seq/bin/try.py", line 43, in <module>
        model.fit(X, Y)
      File "build/bdist.linux-x86_64/egg/keras/models.py", line 495, in fit
      File "build/bdist.linux-x86_64/egg/keras/models.py", line 216, in _fit
      File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 606, in __call__
        storage_map=self.fn.storage_map)
      File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 595, in __call__
        outputs = self.fn()
      File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 768, in rval
        r = p(n, [x[0] for x in i], o)
      File "/usr/local/lib/python2.7/dist-packages/theano/tensor/subtensor.py", line 2088, in perform
        out[0] = inputs[0].__getitem__(inputs[1:])
    IndexError: index 3 is out of bounds for axis 0 with size 3
    Apply node that caused the error: AdvancedSubtensor(Subtensor{int64::}.0, Subtensor{int64}.0, Subtensor{int64}.0)
    Inputs types: [TensorType(float64, 3D), TensorType(int64, vector), TensorType(int64, vector)]
    Inputs shapes: [(3, 7, 10), (21,), (21,)]
    Inputs strides: [(560, 80, 8), (8,), (8,)]
    Inputs values: ['not shown', 'not shown', 'not shown']
    
    Backtrace when the node is created:
      File "build/bdist.linux-x86_64/egg/keras/models.py", line 75, in weighted
        filtered_y_pred = y_pred[weights.nonzero()[:-1]]
    

    Repo with the source for your convenience: https://github.com/nicolas-ivanov/seq2seq

    opened by nicolas-ivanov 9
  • Error when import seq2seq


    Hello,

    I got this error when I tried to import seq2seq:

    import seq2seq

    Using TensorFlow backend.
    Traceback (most recent call last):
      File "", line 1, in
      File "/usr/local/lib/python2.7/site-packages/seq2seq/__init__.py", line 1, in
        from .cells import *
      File "/usr/local/lib/python2.7/site-packages/seq2seq/cells.py", line 1, in
        from recurrentshop import RNNCell, LSTMCell, weight
      File "/usr/local/lib/python2.7/site-packages/recurrentshop/__init__.py", line 1, in
        from .engine import RNNCell, RecurrentContainer, weight
      File "/usr/local/lib/python2.7/site-packages/recurrentshop/engine.py", line 32, in
        _backend = getattr(K, K.backend() + '_backend')
    AttributeError: 'module' object has no attribute 'backend'

    What should I do?

    BTW, I just installed the latest versions of seq2seq and recurrentshop. The following are the versions of my Python and Keras.

    python: 2.7.11 keras: 1.0.5

    Thanks!

    opened by ay2456 8
  • Different input lengths


    I'm trying to train something like a character-level transliteration translator, but the input length varies from 1 to 20. I want to avoid the bucketing/padding proposed in TensorFlow. Is there a way to have inputs of different lengths? I think RNNs are capable of solving that kind of problem...

    opened by TigranGalstyan 8
  • RuntimeError: maximum recursion depth exceeded while calling a Python object


    Hi, thanks for sharing the great code. I ran into an error when using the AttentionSeq2seq model. The error output:

    Traceback (most recent call last):
      File "train.py", line 182, in
        model.compile(loss='mse', optimizer='adam')
      File "build/bdist.linux-x86_64/egg/keras/models.py", line 467, in compile
      File "build/bdist.linux-x86_64/egg/keras/optimizers.py", line 250, in get_updates
      File "build/bdist.linux-x86_64/egg/keras/optimizers.py", line 47, in get_gradients
      File "build/bdist.linux-x86_64/egg/keras/backend/theano_backend.py", line 402, in gradients
      File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 545, in grad
        grad_dict, wrt, cost_name)
      File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 1283, in _populate_grad_dict
        rval = [access_grad_cache(elem) for elem in wrt]
      File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 1241, in access_grad_cache
        term = access_term_cache(node)[idx]
      File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 951, in access_term_cache
        output_grads = [access_grad_cache(var) for var in node.outputs]
      File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 1241, in access_grad_cache
        term = access_term_cache(node)[idx]
      File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 951, in access_term_cache
        output_grads = [access_grad_cache(var) for var in node.outputs]
      ... ...
      File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 1030, in local_traverse
        rval += local_traverse(inp, x)
      File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 1030, in local_traverse
        rval += local_traverse(inp, x)
      File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 1030, in local_traverse
        rval += local_traverse(inp, x)
      File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 1030, in local_traverse
        rval += local_traverse(inp, x)
      File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 1023, in local_traverse
        if equal_computations([graph], [x]):
      File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 410, in equal_computations
        if x not in in_xs and x.type != y.type:
      File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gof/utils.py", line 127, in __ne__
        return not self == other
      File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/tensor/type.py", line 260, in __eq__
        return type(self) == type(other) and other.dtype == self.dtype
    RuntimeError: maximum recursion depth exceeded while calling a Python object

    When I use the Seq2seq layer, the model works well. However, when I change to the AttentionSeq2seq layer, I get this error. Here is my model with AttentionSeq2seq:

    model = Sequential()
    model.add(embeddings.Embedding(input_dim=len(chars)+1, output_dim=EMB_DIM, input_length=MAXLEN))
    seq2seq = AttentionSeq2seq(input_dim=EMB_DIM, input_length=MAXLEN, hidden_dim=HIDDEN_SIZE, output_length=MAXLEN, output_dim=len(chars), depth=1)
    model.add(seq2seq)
    model.compile(loss='mse', optimizer='adam')

    And I am using keras==0.3.1, numpy==1.10.4, seq2seq==0.0.2.

    Besides, I changed the parameter "masking=False" to "mask=None" in decoders.py due to an unexpected param error; I don't know if that is the cause of the error mentioned above.

    Thanks a lot

    opened by qfzhu 8
  • Cannot import seq2seq


    Hi,

    I can't import the seq2seq library. With Keras 1.1.0, I get cannot import name initializers


    With Keras 2.0.2, I get cannot import name weight


    I'm using recurrentshop-1.0.0. I tried with recurrentshop-0.0.1 as mentioned in #172 but still got an error.

    Does anyone know what I should do to make it work?

    Thanks :+1:

    opened by Blockost 7
  • Modified SimpleSeq2Seq in order to be able to run it in Keras 2.


    Maybe these changes shouldn't go into the repo, but I decided to share my modifications. With these changes I was able to run model training.

    For some reason, the first Dropout in the decoder failed to work, so I moved it to the final model.

    opened by nchervyakov 7
  • ValueError: An operation has `None` for gradient.


    I've run into a problem with gradients.

    The code is:

    from seq2seq import SimpleSeq2Seq, Seq2Seq, AttentionSeq2Seq
    import numpy as np

    input_length = 5
    input_dim = 3

    output_length = 3
    output_dim = 4

    samples = 100
    hidden_dim = 24

    x = np.random.random((samples, input_length, input_dim))
    y = np.random.random((samples, output_length, output_dim))

    model = SimpleSeq2Seq(input_shape=(5, 3), hidden_dim=10, output_length=3, output_dim=4, depth=(4, 5))

    model.compile(loss='mse', optimizer='sgd')
    model.fit(x, y, nb_epoch=10)

    And the error is:

    Traceback (most recent call last):
      File "", line 1, in
        model.fit(x, y, nb_epoch=10)
      File "E:\Anaconda\envs\tf2\lib\site-packages\keras\engine\training.py", line 1213, in fit
        self._make_train_function()
      File "E:\Anaconda\envs\tf2\lib\site-packages\keras\engine\training.py", line 316, in _make_train_function
        loss=self.total_loss)
      File "E:\Anaconda\envs\tf2\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
        return func(*args, **kwargs)
      File "E:\Anaconda\envs\tf2\lib\site-packages\keras\optimizers.py", line 259, in get_updates
        grads = self.get_gradients(loss, params)
      File "E:\Anaconda\envs\tf2\lib\site-packages\keras\optimizers.py", line 93, in get_gradients
        raise ValueError('An operation has `None` for gradient. '
    ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

    So, what should I do to solve it?

    Thanks for any help.

    opened by cui-xiaoang96 2
  • error:name 'K' is not defined


    After I ran sudo pip install git+https://github.com/farizrahman4u/seq2seq.git, I typed import seq2seq at the Python command line, but I hit this bug:

    Using TensorFlow backend.
    Traceback (most recent call last):
      File "", line 1, in
      File "", line 983, in _find_and_load
      File "", line 967, in _find_and_load_unlocked
      File "", line 668, in _load_unlocked
      File "", line 638, in _load_backward_compatible
      File "/Users/shawn/anaconda3/lib/python3.7/site-packages/seq2seq-1.0.0-py3.7.egg/seq2seq/__init__.py", line 1, in
      File "", line 983, in _find_and_load
      File "", line 967, in _find_and_load_unlocked
      File "", line 668, in _load_unlocked
      File "", line 638, in _load_backward_compatible
      File "/Users/shawn/anaconda3/lib/python3.7/site-packages/seq2seq-1.0.0-py3.7.egg/seq2seq/cells.py", line 1, in
      File "", line 983, in _find_and_load
      File "", line 967, in _find_and_load_unlocked
      File "", line 668, in _load_unlocked
      File "", line 638, in _load_backward_compatible
      File "/Users/shawn/anaconda3/lib/python3.7/site-packages/recurrentshop-1.0.0-py3.7.egg/recurrentshop/__init__.py", line 1, in
      File "", line 983, in _find_and_load
      File "", line 967, in _find_and_load_unlocked
      File "", line 668, in _load_unlocked
      File "", line 638, in _load_backward_compatible
      File "/Users/shawn/anaconda3/lib/python3.7/site-packages/recurrentshop-1.0.0-py3.7.egg/recurrentshop/engine.py", line 10, in
    NameError: name 'K' is not defined

    How can i solve this? thank u

    opened by fsdfsd123 3
  • Bad results of a simple translation task. Ask for help...:(


    Actually, I plan to do a machine translation project with this seq2seq module. Before that, I just did a simple test and got a very bad result. I don't know where it goes wrong. Please help me... Here's the process:

    #1. training set

    def generate_sequence(length, n_unique):
        return [randint(1, n_unique-1) for _ in range(length)]
    x = np.array(generate_sequence(100000,100)).reshape(10000,10)
    y = np.array(generate_sequence(50000,100)).reshape(10000,5)
    x_encoder_input_data = to_categorical(x)
    y_decoder_target_data = to_categorical(y)
    
    #x_encoder_input_data.shape = (10000, 10, 100)
    #10000 training data, x_input_length=10,x_input_dim=100
    #y_decoder_target_data.shape = (10000, 5, 100)
    
    

    #2. building & training model

    model = SimpleSeq2Seq(input_dim=100,
                 input_length=10, 
                output_dim=100, 
                hidden_dim=128,
                output_length=5)
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
    model.fit(x_encoder_input_data,  y_decoder_target_data, batch_size=128, epochs=500)
    

    #3. the partial losses

    Epoch 1/500
    10000/10000  - 44s 4ms/step - loss: 7.2350 - acc: 0.0108
    Epoch 2/500
    10000/10000  - 20s 2ms/step - loss: 6.9500 - acc: 0.0104
    Epoch 3/500
    10000/10000  - 22s 2ms/step - loss: 10.7297 - acc: 0.0103
    Epoch 4/500
    10000/10000  - 20s 2ms/step - loss: 8.6834 - acc: 0.0096
    Epoch 5/500
    10000/10000  - 21s 2ms/step - loss: 8.4943 - acc: 0.0099
    Epoch 6/500
    10000/10000  - 19s 2ms/step - loss: 8.4487 - acc: 0.0100
    Epoch 7/500
    10000/10000  - 19s 2ms/step - loss: 8.6318 - acc: 0.0099
    Epoch 8/500
    10000/10000  - 18s 2ms/step - loss: 8.5765 - acc: 0.0099
    Epoch 9/500
    10000/10000  - 20s 2ms/step - loss: 8.4753 - acc: 0.0099
    Epoch 10/500
    10000/10000  - 20s 2ms/step - loss: 8.3738 - acc: 0.0099
    Epoch 11/500
    10000/10000  - 20s 2ms/step - loss: 8.3999 - acc: 0.0098
    Epoch 12/500
    10000/10000  - 19s 2ms/step - loss: 8.3108 - acc: 0.0099
    Epoch 13/500
    10000/10000  - 19s 2ms/step - loss: 8.3457 - acc: 0.0099
    Epoch 14/500
    10000/10000  - 19s 2ms/step - loss: 8.4852 - acc: 0.0098
    Epoch 15/500
    10000/10000  - 18s 2ms/step - loss: 8.4749 - acc: 0.0099
    Epoch 16/500
    10000/10000  - 18s 2ms/step - loss: 8.5881 - acc: 0.0098
    Epoch 17/500
    10000/10000  - 19s 2ms/step - loss: 8.3868 - acc: 0.0099
    Epoch 18/500
    10000/10000   - 21s 2ms/step - loss: 8.2499 - acc: 0.0098
    Epoch 19/500
    10000/10000   - 20s 2ms/step - loss: 8.4659 - acc: 0.0099
    Epoch 20/500
    10000/10000  - 20s 2ms/step - loss: 7.8421 - acc: 0.0099
    Epoch 21/500
    10000/10000   - 21s 2ms/step - loss: 7.6197 - acc: 0.0099
    Epoch 22/500
    10000/10000  - 20s 2ms/step - loss: 7.6193 - acc: 0.0099
    Epoch 23/500
    10000/10000   - 19s 2ms/step - loss: 7.6193 - acc: 0.0099
    Epoch 24/500
    10000/10000   - 21s 2ms/step - loss: 7.6193 - acc: 0.0099
    Epoch 25/500
    10000/10000   - 19s 2ms/step - loss: 7.6193 - acc: 0.0099
    Epoch 26/500
    10000/10000  - 22s 2ms/step - loss: 7.6193 - acc: 0.0099
    Epoch 27/500
    10000/10000  - 22s 2ms/step - loss: 7.6193 - acc: 0.0099
    ··· ···
    

    #4. predicting

    for seq_index in range(6):
        predictions = model.predict(x_encoder_input_data[seq_index:seq_index+1])
        predicted_list=[]
    
        for prediction_vector in predictions:
            for pred in prediction_vector:
                next_token = np.argmax(pred)
                predicted_list.append(next_token)
                
        print('-')
        print('Input sentence:', X[seq_index])
        print('Decoded sentence:', predicted_list)
        print('Target sentence:', y[seq_index])
    

    #5. the predicting results:

    -
    Input sentence: [28, 2, 46, 12, 21, 6]      #  x
    Decoded sentence: [78, 78, 78, 78, 66] # y_predict
    Target sentence: [82 22 82 41 27]          # y
    -
    Input sentence: [12, 20, 45, 28, 18, 42]
    Decoded sentence: [78, 78, 66, 66, 66]
    Target sentence: [43 36 30 13 64]
    -
    Input sentence: [3, 43, 45, 4, 33, 27]
    Decoded sentence: [78, 78, 66, 66, 66]
    Target sentence: [90 20 56 23 32]
    -
    Input sentence: [34, 50, 21, 20, 11, 6]
    Decoded sentence: [78, 78, 78, 78, 66]
    Target sentence: [27 57 50 57 81]
    -
    Input sentence: [47, 42, 14, 2, 31, 6]
    Decoded sentence: [78, 78, 78, 78, 66]
    Target sentence: [77 94 47 26 67]
    -
    Input sentence: [20, 24, 34, 31, 37, 25]
    Decoded sentence: [78, 78, 66, 66, 66]
    Target sentence: [11 48 99 67 66]
    
    opened by JillinJia 0
  • How to use seq2seq for simple sequences


    Hi

    I have a simple categorical sequence data set like the one below; the target has 3 classes.

    X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 | X12 | Target
    -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
    0 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 1

    Will I be able to use seq2seq for this kind of problem? Can you give some sample syntax?

    Thank you.

    opened by rt3722 1
Owner: Fariz Rahman