Seq2seq - Sequence to Sequence Learning with Keras

Fariz Rahman

Last update: Dec 18, 2022

Related tags

Deep Learning seq2seq

Overview

Seq2seq

Sequence to Sequence Learning with Keras

Hi! You have just found Seq2Seq. Seq2Seq is a sequence-to-sequence learning add-on for the Python deep learning library Keras. Using Seq2Seq, you can build and train sequence-to-sequence neural network models in Keras. Such models are useful for machine translation, chatbots (see [4]), parsers, or whatever else comes to mind.

Getting started

Seq2Seq contains modular and reusable layers that you can use to build your own seq2seq models, as well as built-in models that work out of the box. Seq2Seq models can be compiled as they are or added as layers to a bigger model. Every Seq2Seq model has two primary layers: the encoder and the decoder. Generally, the encoder encodes the input sequence into an internal representation called the 'context vector', which the decoder uses to generate the output sequence. The lengths of the input and output sequences can differ, as there is no explicit one-to-one relation between them. In addition to the encoder and decoder layers, a Seq2Seq model may also contain layers such as the left-stack (stacked LSTMs on the encoder side), the right-stack (stacked LSTMs on the decoder side), resizers (for shape compatibility between the encoder and the decoder) and dropout layers to avoid overfitting. The source code is heavily documented, so let's go straight to the examples.
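Before the library examples, the encoder/decoder idea itself can be sketched in plain Python. This is only a toy illustration under stated assumptions: the functions `encode` and `decode` and their fixed arithmetic are hypothetical stand-ins, not the library's layers, and nothing here is trained. What it shows is that the input length (7) and the output length (8) are independent, because the only thing passed between the two halves is a fixed-size context vector.

```python
import math

def encode(inputs, hidden_dim):
    """Fold a sequence of vectors into one fixed-size context vector (toy recurrence)."""
    state = [0.0] * hidden_dim
    for x in inputs:
        signal = sum(x) / len(x)  # summarize the current timestep
        state = [math.tanh(0.5 * s + signal * (i + 1) / hidden_dim)
                 for i, s in enumerate(state)]
    return state

def decode(context, output_length, output_dim):
    """Unroll a toy decoder for a fixed number of steps, starting from the context."""
    state = list(context)
    outputs = []
    for _ in range(output_length):
        state = [math.tanh(0.9 * s + 0.1) for s in state]
        mean_state = sum(state) / len(state)
        # Toy projection from the hidden state to an output vector.
        outputs.append([mean_state * (j + 1) / output_dim for j in range(output_dim)])
    return outputs

# 7 timesteps of 5 features each, mirroring input_dim=5 in the examples.
input_seq = [[0.1 * (t + k) for k in range(5)] for t in range(7)]
context = encode(input_seq, hidden_dim=10)
output_seq = decode(context, output_length=8, output_dim=8)
print(len(input_seq), len(context), len(output_seq), len(output_seq[0]))
```

The context vector has `hidden_dim` entries regardless of how many timesteps went in, which is why the decoder can emit a sequence of a different length.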

A simple Seq2Seq model:

import seq2seq
from seq2seq.models import SimpleSeq2Seq

model = SimpleSeq2Seq(input_dim=5, hidden_dim=10, output_length=8, output_dim=8)
model.compile(loss='mse', optimizer='rmsprop')

That's it! You have successfully compiled a minimal Seq2Seq model! Next, let's build a 6 layer deep Seq2Seq model (3 layers for encoding, 3 layers for decoding).

Deep Seq2Seq models:

import seq2seq
from seq2seq.models import SimpleSeq2Seq

model = SimpleSeq2Seq(input_dim=5, hidden_dim=10, output_length=8, output_dim=8, depth=3)
model.compile(loss='mse', optimizer='rmsprop')

Notice that we have specified the depth for both encoder and decoder as 3, and your model has a total depth of 3 + 3 = 6. You can also specify different depths for the encoder and the decoder. Example:

import seq2seq
from seq2seq.models import SimpleSeq2Seq

model = SimpleSeq2Seq(input_dim=5, hidden_dim=10, output_length=8, output_dim=20, depth=(4, 5))
model.compile(loss='mse', optimizer='rmsprop')

Notice that the depth is specified as a tuple, (4, 5), which means your encoder will be 4 layers deep whereas your decoder will be 5 layers deep. Your model will have a total depth of 4 + 5 = 9.
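The two ways of passing depth can be summarized with a small helper. The function `split_depth` below is hypothetical and written only for illustration; it is a sketch of the convention described above (an integer applies to both sides, a tuple gives encoder and decoder depths separately), not code taken from the library.

```python
def split_depth(depth):
    """Return (encoder_depth, decoder_depth) for an int or a 2-tuple (hypothetical helper)."""
    if isinstance(depth, int):
        # A single integer is used for both the encoder and the decoder.
        return depth, depth
    encoder_depth, decoder_depth = depth
    return encoder_depth, decoder_depth

for depth in (3, (4, 5)):
    enc, dec = split_depth(depth)
    print(depth, "->", enc, "+", dec, "=", enc + dec)
```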

Advanced Seq2Seq models:

Until now, you have been using the SimpleSeq2Seq model, which is a very minimalistic model. In the actual Seq2Seq implementation described in [1], the hidden state of the encoder is transferred to the decoder. Also, the output of the decoder at each timestep becomes the input to the decoder at the next timestep. To make things more complicated, the hidden state is propagated throughout the LSTM stack. But you have no reason to worry, as we have a built-in model that does all that out of the box. Example:

import seq2seq
from seq2seq.models import Seq2Seq

model = Seq2Seq(batch_input_shape=(16, 7, 5), hidden_dim=10, output_length=8, output_dim=20, depth=4)
model.compile(loss='mse', optimizer='rmsprop')

Note that we had to specify the complete input shape, including the samples dimension. This is because we need a static hidden state (similar to a stateful RNN) to transfer it across layers. (Update: the full input shape is not required in the latest version, since we switched to the Recurrent Shop backend.) By the way, Seq2Seq models also support the stateful argument, in case you need it.

You can also experiment with the hidden state propagation turned off. Simply set the arguments broadcast_state and inner_broadcast_state to False.
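The two mechanisms described in this section (handing the encoder's hidden state to the decoder, and feeding each decoder output back in as the next input) can be sketched without any framework. The names `rnn_step`, `readout` and `decode_with_feedback` are hypothetical, and the arithmetic is a fixed toy recurrence rather than an LSTM; the all-zeros start input is also an assumption of this sketch.

```python
import math

def rnn_step(x, h):
    """One toy recurrent step: new hidden state from input x and old state h."""
    drive = sum(x) / len(x)
    return [math.tanh(0.5 * hi + 0.5 * drive) for hi in h]

def readout(h, output_dim):
    """Toy projection of the hidden state to an output vector."""
    mean_h = sum(h) / len(h)
    return [mean_h * (j + 1) / output_dim for j in range(output_dim)]

def decode_with_feedback(encoder_state, output_length, output_dim):
    h = list(encoder_state)   # hidden state handed over from the encoder
    y = [0.0] * output_dim    # assumed start input: all zeros
    outputs = []
    for _ in range(output_length):
        h = rnn_step(y, h)    # the previous output is the current input
        y = readout(h, output_dim)
        outputs.append(y)
    return outputs

outputs = decode_with_feedback([0.3] * 10, output_length=8, output_dim=20)
print(len(outputs), len(outputs[0]))
```

Because the decoder starts from the encoder's state, two different encoder states produce two different output sequences even though the start input is identical.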

Peeky Seq2seq model:

Let's not stop there. Let's build a model similar to Cho et al. (2014), where the decoder gets a 'peek' at the context vector at every timestep.

To achieve this, simply add the argument peek=True:

import seq2seq
from seq2seq.models import Seq2Seq

model = Seq2Seq(batch_input_shape=(16, 7, 5), hidden_dim=10, output_length=8, output_dim=20, depth=4, peek=True)
model.compile(loss='mse', optimizer='rmsprop')
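To make the effect of peeking concrete, here is a framework-free sketch. It is a toy under stated assumptions: `step` and `decode` are hypothetical functions, the recurrence is fixed arithmetic rather than a trained LSTM, and the non-peeky branch shows the context only at the first timestep purely for contrast, which is not a claim about the library's internals.

```python
import math

def step(x, h):
    """Toy recurrent step: blend the old state with the mean of the input."""
    drive = sum(x) / len(x)
    return [math.tanh(0.5 * hi + 0.5 * drive) for hi in h]

def decode(context, output_length, peek):
    h = [0.0] * len(context)
    y = [0.0]                        # scalar output kept in a list for concatenation
    blank = [0.0] * len(context)
    outputs = []
    for t in range(output_length):
        # With peek=True the context vector is part of the input at every timestep.
        visible = context if (peek or t == 0) else blank
        h = step(y + visible, h)
        y = [sum(h) / len(h)]
        outputs.append(y[0])
    return outputs

context = [0.2, 0.4, 0.6]
plain = decode(context, output_length=4, peek=False)
peeky = decode(context, output_length=4, peek=True)
print(plain)
print(peeky)
```

The first output is identical in both runs; from the second timestep on, the peeky decoder keeps receiving the context directly instead of relying on what its hidden state has retained.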

Seq2seq model with attention:

Let's not stop there either. In all the models described above, there is no alignment between the input sequence elements and the output sequence elements. But for machine translation, learning a soft alignment between the input and output sequences improves performance [3]. The Seq2seq framework includes a ready-made attention model that does this. Note that in the attention model there is no hidden state propagation, and a bidirectional LSTM encoder is used by default. Example:

import seq2seq
from seq2seq.models import AttentionSeq2Seq

model = AttentionSeq2Seq(input_dim=5, input_length=7, hidden_dim=10, output_length=8, output_dim=20, depth=4)
model.compile(loss='mse', optimizer='rmsprop')

As you can see, in the attention model you need not specify the samples dimension, as there are no static hidden states involved (but you have to if you are building a stateful Seq2seq model). Note: you can set the argument bidirectional=False if you do not wish to use a bidirectional encoder.
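A soft alignment is a set of weights over the encoder timesteps, recomputed at every decoder step. The sketch below shows one way to compute such weights; it uses plain dot-product scoring as a deliberate simplification, so it should not be read as the scoring function used by AttentionSeq2Seq or by [3]. The function names `softmax` and `attend` and the example vectors are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoder step using dot-product scores."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context is the weighted average of the encoder states.
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one vector per input timestep
weights, context = attend([2.0, 0.0], encoder_states)
print(weights)
print(context)
```

The weights sum to 1, and the input timesteps whose encoder state matches the decoder state best receive the largest share.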

Final Words

That's all for now. Hope you love this library. For any questions you might have, create an issue and I will get in touch. You can also contribute to this project by reporting bugs, adding new examples, datasets or models.

Installation:

sudo pip install git+https://github.com/farizrahman4u/seq2seq.git

Requirements:

Working Example:

Training Seq2seq with movie subtitles - Thanks to Nicolas Ivanov

Papers:

Comments

Using Seq2Seq models in modular way (nesting models) results in MissingInputError
Hi! Sorry for crossposting this (I also opened this issue on the Keras main repo), but I figured maybe it's actually related to seq2seq or recurrentshop internals.

I'm trying to use a Seq2Seq model as follows:

input = Input(shape=(maxlen,))
one_hot = Lambda(
    lambda x: K.one_hot(K.cast(x, dtype="int32"), nb_classes=num_inputs),
    output_shape=(maxlen, num_inputs)
)(input)
output_seq = Seq2Seq(
    input_shape=(maxlen, num_inputs),
    hidden_dim=hidden_dim,
    output_length=out_maxlen,
    output_dim=num_inputs,
    depth=2,
    peek=True
)(one_hot)
predicted = TimeDistributed(Activation("softmax"))(output_seq)
model = Model(input, predicted)
return model

Which compiles fine, but when I try to fit the model using my (num_samples, maxlen) shaped matrix, Theano complains that input_2 was not provided - which, as it turns out, is the input layer of the Seq2Seq model. I was hoping this layer would be fed the output of my Lambda layer automatically, but apparently this does not work. Is what I am trying to do possible? I realize I could just copy and slightly alter the Seq2Seq code, but of course I'd prefer just using the library for more maintainable code.

More precise exception output:

theano.gof.fg.MissingInputError: ("An input of the graph, used to compute Reshape{2}(input_2, TensorConstant{[-1 53]}), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.", input_2)
opened by phdowling 21
Support for teacher forcing?

In decoders, we usually feed the previous timestep's output vector on to the next input vector, language model style. However, in teacher forcing, the ground-truth predictions of the previous timestep are always used, rather than the actual prediction. I suppose a model could still train even without teacher forcing, but often it helps training quite a bit from what I read.

I do not see a way of doing this without providing the teacher inputs as an explicit input to the keras model, though I might be wrong. Now, since in this library we do not supply such inputs I'm guessing this library does not implement teacher forcing, is this correct? If so, are there any plans to perhaps support it in the future?
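The difference described above can be sketched in a few lines of plain Python. This is an illustration only: `run_decoder` and `toy_step` are hypothetical names, tokens are plain integers, and the "model" is a fixed function that always overshoots, chosen so that the effect of feeding back its own mistakes is visible.

```python
def run_decoder(step_fn, start_token, targets, teacher_forcing):
    """Unroll a decoder over len(targets) steps; step_fn(prev_token) -> prediction."""
    prev = start_token
    predictions = []
    for target in targets:
        pred = step_fn(prev)
        predictions.append(pred)
        # Teacher forcing feeds the ground truth; otherwise the model's own output.
        prev = target if teacher_forcing else pred
    return predictions

def toy_step(prev):
    """An imperfect toy model: the true sequence adds 1, this one adds 2."""
    return prev + 2

targets = [1, 2, 3, 4]
free_running = run_decoder(toy_step, 0, targets, teacher_forcing=False)
forced = run_decoder(toy_step, 0, targets, teacher_forcing=True)
print(free_running)  # [2, 4, 6, 8]
print(forced)        # [2, 3, 4, 5]
```

In the free-running case the error grows at every step because each mistake becomes the next input, whereas with teacher forcing the error stays constant. Note that the ground-truth sequence has to be passed in as an argument, which matches the observation that teacher inputs would need to be an explicit model input.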

opened by phdowling 20
workable example needed

Dear developers, thank you for your work! It would be of great help to use your seq2seq implementation; however, after a considerable amount of time and effort I still can't do it due to the lack of documentation and examples. The Keras docs don't say much about seq2seq mapping either. So please add a simple workable code example that demonstrates the usage of your library: how to prepare the input data and how to implement the training and predicting procedures. It would be greatly appreciated.
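As a partial illustration of the data-preparation step asked about here, the sketch below turns variable-length lists of token ids into a fixed-shape, one-hot encoded batch of shape (samples, length, vocab_size), which is the 3-D layout the README examples assume for model inputs. The helpers `one_hot` and `encode_batch` are hypothetical, and using id 0 for padding is an assumption of this sketch rather than a convention of the library.

```python
def one_hot(index, size):
    """Return a list of `size` zeros with a 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def encode_batch(sequences, length, vocab_size, pad_id=0):
    """Pad or truncate each token-id sequence to `length`, then one-hot encode it."""
    batch = []
    for seq in sequences:
        padded = (list(seq) + [pad_id] * length)[:length]
        batch.append([one_hot(tok, vocab_size) for tok in padded])
    return batch

X = encode_batch([[1, 2, 3], [4, 1]], length=4, vocab_size=5)
print(len(X), len(X[0]), len(X[0][0]))  # samples, timesteps, vocabulary size
```

The nested list can be converted with `numpy.array(X)` before being passed to a model's `fit` method.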

opened by nicolas-ivanov 20

Error on saving model

bwlf02:~/data/seq2seq$ python
Python 2.7.12 (default, Jun 28 2016, 17:49:40) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import seq2seq
Using Theano backend.
>>> from seq2seq.models import AttentionSeq2Seq
>>> 
>>> model = AttentionSeq2Seq(input_dim=5, input_length=7, hidden_dim=10, output_length=8, output_dim=20, depth=4)
>>> model.compile(loss='mse', optimizer='rmsprop')
>>> model.save('1.h5', overwrite=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/is33/Envs/alexa_seq2seq/lib/python2.7/site-packages/keras/engine/topology.py", line 2423, in save
    save_model(self, filepath, overwrite)
  File "/home/is33/Envs/alexa_seq2seq/lib/python2.7/site-packages/keras/models.py", line 52, in save_model
    'config': model.get_config()
  File "/home/is33/Envs/alexa_seq2seq/lib/python2.7/site-packages/keras/engine/topology.py", line 2285, in get_config
    layer_config = layer.get_config()
  File "/home/is33/data/recurrentshop/recurrentshop/engine.py", line 497, in get_config
    config['model'] = self.model.get_config()
  File "/home/is33/Envs/alexa_seq2seq/lib/python2.7/site-packages/keras/models.py", line 967, in get_config
    'config': layer.get_config()})
  File "seq2seq/cells.py", line 110, in get_config
    base_config = super(LSTMDecoderCell, self).get_config()
TypeError: super(type, obj): obj must be an instance or subtype of type

opened by ishalyminov 11

Add readout_activation param to models
This makes it possible to avoid getting stuck in a NaN loss hole when training. This workaround lets the user fix #189.

Example usage:

model = Seq2Seq(input_dim=in_dim, input_length=MAXLENGTH, hidden_dim=HIDDEN_SIZE, output_length=MAXLENGTH, output_dim=out_dim, depth=LAYERS, peek=True, readout_activation='softmax')

I do not fully understand the implications of using softmax as the output activation layer, but in my own project (https://github.com/Automattic/wp-translate) setting the output to softmax using this code does seem to have gotten me past getting stuck with NaN during training.
opened by gibrown 9

IndexError: index 3 is out of bounds for axis 0 with size 3

Dear humans, either I'm missing something or there is a bug in Fariz's seq2seq implementation. What's your bet?

Code:

import numpy as np
from keras.models import Sequential
from seq2seq.seq2seq import Seq2seq


vocab_size = 10 #number of words
maxlen = 3 #length of input sequence and output sequence
embedding_dim = 5 #word embedding size
hidden_dim = 50 #memory size of seq2seq
batch_size = 7

seq2seq = Seq2seq(input_length=maxlen,
                  input_dim=embedding_dim,
                  hidden_dim=hidden_dim,
                  output_dim=vocab_size,
                  output_length=maxlen,
                  batch_size=batch_size,
                  depth=1)

print 'Build model ...'
model = Sequential()
model.add(seq2seq)
model.compile(loss='mse', optimizer='adam')

print 'Generate dummy data ...'
train_examples_num = batch_size
X = np.zeros((train_examples_num, maxlen, embedding_dim))
Y = np.zeros((train_examples_num, maxlen, vocab_size))

for train_example_idx in xrange(train_examples_num):
    for word_idx in xrange(maxlen):
        w2v_vector = np.random.rand(1, embedding_dim)[0]
        X[train_example_idx][word_idx] = w2v_vector

        bool_vector = np.zeros(vocab_size)
        bool_vector[np.random.choice(vocab_size)] = 1
        Y[train_example_idx][word_idx] = bool_vector

print X.shape, X
print Y.shape, Y

print 'Fit data ...'
model.fit(X, Y)

Log:

Build model ...
/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_perform_ext.py:133: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
  from scan_perform.scan_perform import *
Generate dummy data ...
(7, 3, 5) [[[ 0.88256245  0.77128004  0.11195964  0.0918089   0.57508599]
  [ 0.23917097  0.69324662  0.5007371   0.08220433  0.25601655]
  [ 0.31997691  0.96215576  0.37516188  0.4564258   0.44645146]]

 [[ 0.1164474   0.68527686  0.48801347  0.06237132  0.64461641]
  [ 0.21418609  0.56414103  0.69280567  0.09577648  0.46501309]
  [ 0.59522824  0.82593701  0.8952664   0.61032139  0.60784708]]

 [[ 0.50277342  0.18204284  0.6920746   0.23992536  0.5031889 ]
  [ 0.24719549  0.39098328  0.84927183  0.93091596  0.93981078]
  [ 0.76817661  0.68241358  0.97509582  0.78777374  0.41076285]]

 [[ 0.83762506  0.76151013  0.06292322  0.71097064  0.77048028]
  [ 0.78948919  0.77401108  0.39082489  0.66905667  0.54795132]
  [ 0.74940861  0.26011439  0.23257989  0.87033028  0.88954607]]

 [[ 0.98032484  0.29076576  0.76085615  0.53828208  0.92028479]
  [ 0.81111357  0.52959467  0.41101679  0.39434533  0.47918241]
  [ 0.18741232  0.68735943  0.27534715  0.18796185  0.89010293]]

 [[ 0.00484476  0.38136868  0.55200039  0.36352682  0.65304447]
  [ 0.19502873  0.86442676  0.82170956  0.90937185  0.93152998]
  [ 0.71814645  0.47181875  0.99475651  0.24588243  0.13357496]]

 [[ 0.92716079  0.83195725  0.50047687  0.86742848  0.27778597]
  [ 0.90902709  0.60421839  0.17206286  0.53972434  0.9863197 ]
  [ 0.63227496  0.14045515  0.88635036  0.72415621  0.88298206]]]
(7, 3, 10) [[[ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]]

 [[ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]
  [ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
  [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]]

 [[ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
  [ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]

 [[ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.  0.  1.  0.]]

 [[ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]
  [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]]

 [[ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]
  [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]]

 [[ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
  [ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]]]
Fit data ...
Epoch 1/100
Traceback (most recent call last):
  File "/home/nicolas/Code/seq2seq/bin/try.py", line 43, in <module>
    model.fit(X, Y)
  File "build/bdist.linux-x86_64/egg/keras/models.py", line 495, in fit
  File "build/bdist.linux-x86_64/egg/keras/models.py", line 216, in _fit
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 606, in __call__
    storage_map=self.fn.storage_map)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 768, in rval
    r = p(n, [x[0] for x in i], o)
  File "/usr/local/lib/python2.7/dist-packages/theano/tensor/subtensor.py", line 2088, in perform
    out[0] = inputs[0].__getitem__(inputs[1:])
IndexError: index 3 is out of bounds for axis 0 with size 3
Apply node that caused the error: AdvancedSubtensor(Subtensor{int64::}.0, Subtensor{int64}.0, Subtensor{int64}.0)
Inputs types: [TensorType(float64, 3D), TensorType(int64, vector), TensorType(int64, vector)]
Inputs shapes: [(3, 7, 10), (21,), (21,)]
Inputs strides: [(560, 80, 8), (8,), (8,)]
Inputs values: ['not shown', 'not shown', 'not shown']

Backtrace when the node is created:
  File "build/bdist.linux-x86_64/egg/keras/models.py", line 75, in weighted
    filtered_y_pred = y_pred[weights.nonzero()[:-1]]

Repo with the source for your convenience: https://github.com/nicolas-ivanov/seq2seq

opened by nicolas-ivanov 9

Error when import seq2seq

Hello,

I got this error when I tried to import seq2seq:

import seq2seq
Using TensorFlow backend.
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib/python2.7/site-packages/seq2seq/__init__.py", line 1, in
    from .cells import *
  File "/usr/local/lib/python2.7/site-packages/seq2seq/cells.py", line 1, in
    from recurrentshop import RNNCell, LSTMCell, weight
  File "/usr/local/lib/python2.7/site-packages/recurrentshop/__init__.py", line 1, in
    from .engine import RNNCell, RecurrentContainer, weight
  File "/usr/local/lib/python2.7/site-packages/recurrentshop/engine.py", line 32, in
    _backend = getattr(K, K.backend() + '_backend')
AttributeError: 'module' object has no attribute 'backend'

What should I do?

BTW, I just installed the latest versions of seq2seq and recurrentshop. My Python and Keras versions are as follows.

python: 2.7.11
keras: 1.0.5

Thanks!

opened by ay2456 8
Different input lengths

I'm trying to train something like character level transliteration translator. But input length varies from 1 to 20. I want to avoid bucketing/padding proposed in tensorflow. Is there a way to have inputs of different lengths? I think that RNN-s are capable of solving that kind of problems...

opened by TigranGalstyan 8
RuntimeError: maximum recursion depth exceeded while calling a Python object

Hi, thanks for sharing the great code. I ran into an error when using the AttentionSeq2seq model. The error information:

Traceback (most recent call last):
  File "train.py", line 182, in
    model.compile(loss='mse', optimizer='adam')
  File "build/bdist.linux-x86_64/egg/keras/models.py", line 467, in compile
  File "build/bdist.linux-x86_64/egg/keras/optimizers.py", line 250, in get_updates
  File "build/bdist.linux-x86_64/egg/keras/optimizers.py", line 47, in get_gradients
  File "build/bdist.linux-x86_64/egg/keras/backend/theano_backend.py", line 402, in gradients
  File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 545, in grad
    grad_dict, wrt, cost_name)
  File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 1283, in _populate_grad_dict
    rval = [access_grad_cache(elem) for elem in wrt]
  File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 1241, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 951, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 1241, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 951, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  ... ...
  File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 1030, in local_traverse
    rval += local_traverse(inp, x)
  File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 1030, in local_traverse
    rval += local_traverse(inp, x)
  File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 1030, in local_traverse
    rval += local_traverse(inp, x)
  File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 1030, in local_traverse
    rval += local_traverse(inp, x)
  File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 1023, in local_traverse
    if equal_computations([graph], [x]):
  File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 410, in equal_computations
    if x not in in_xs and x.type != y.type:
  File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gof/utils.py", line 127, in __ne__
    return not self == other
  File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/tensor/type.py", line 260, in __eq__
    return type(self) == type(other) and other.dtype == self.dtype
RuntimeError: maximum recursion depth exceeded while calling a Python object

When I use the Seq2seq layer, the model works well. However, when I change to the AttentionSeq2seq layer, I get this error. Here is my model with AttentionSeq2seq:

model = Sequential()
model.add(embeddings.Embedding(input_dim=len(chars)+1, output_dim=EMB_DIM, input_length=MAXLEN))
seq2seq = AttentionSeq2seq(input_dim=EMB_DIM, input_length=MAXLEN, hidden_dim=HIDDEN_SIZE, output_length=MAXLEN, output_dim=len(chars), depth=1)
model.add(seq2seq)
model.compile(loss='mse', optimizer='adam')

And I am using the keras=0.3.1, numpy=1.10.4, seq2seq=0.0.2

Besides, I changed the parameter "masking=False" to "mask=None" in the decoders.py, due to an unexpected param error, I don't know if it is the reason of the error mentioned before.

Thanks a lot

opened by qfzhu 8
Cannot import seq2seq

Hi,

I can't import the seq2seq library. With Keras 1.1.0, I get cannot import name initializers

With Keras 2.0.2, I get cannot import name weight

I'm using recurrentshop-1.0.0. I tried with recurrentshop-0.0.1 as mentioned in #172 but still got an error.

Does anyone know what I should do to make it work ?

Thanks :+1:

opened by Blockost 7
Modified SimpleSeq2Seq in order to be able to run it in Keras 2.

Maybe these changes shouldn't go into the repo, but I decided to share my modifications. With these changes I was able to run model training.

For some reason the first Dropout in the decoder failed to work, so I moved it to the final model.

opened by nchervyakov 7
ValueError: An operation has `None` for gradient.

I meet a problem about gradient!!!

the code is :

from seq2seq import SimpleSeq2Seq, Seq2Seq, AttentionSeq2Seq
import numpy as np

input_length = 5
input_dim = 3

output_length = 3
output_dim = 4

samples = 100
hidden_dim = 24

x = np.random.random((samples, input_length, input_dim))
y = np.random.random((samples, output_length, output_dim))

model = SimpleSeq2Seq(input_shape=(5, 3), hidden_dim=10, output_length=3, output_dim=4, depth=(4, 5))

model.compile(loss='mse', optimizer='sgd')
model.fit(x, y, nb_epoch=10)

And the error is:

Traceback (most recent call last):
  File "", line 1, in
    model.fit(x, y, nb_epoch=10)
  File "E:\Anaconda\envs\tf2\lib\site-packages\keras\engine\training.py", line 1213, in fit
    self._make_train_function()
  File "E:\Anaconda\envs\tf2\lib\site-packages\keras\engine\training.py", line 316, in _make_train_function
    loss=self.total_loss)
  File "E:\Anaconda\envs\tf2\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "E:\Anaconda\envs\tf2\lib\site-packages\keras\optimizers.py", line 259, in get_updates
    grads = self.get_gradients(loss, params)
  File "E:\Anaconda\envs\tf2\lib\site-packages\keras\optimizers.py", line 93, in get_gradients
    raise ValueError('An operation has None for gradient. '
ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

So, what should I do to solve it.

Thanks for any help.

opened by cui-xiaoang96 2
error:name 'K' is not defined

After running sudo pip install git+https://github.com/farizrahman4u/seq2seq.git, I typed import seq2seq at the Python command line, but I hit this error:

Using TensorFlow backend.
Traceback (most recent call last):
  File "", line 1, in
  File "", line 983, in _find_and_load
  File "", line 967, in _find_and_load_unlocked
  File "", line 668, in _load_unlocked
  File "", line 638, in _load_backward_compatible
  File "/Users/shawn/anaconda3/lib/python3.7/site-packages/seq2seq-1.0.0-py3.7.egg/seq2seq/__init__.py", line 1, in
  File "", line 983, in _find_and_load
  File "", line 967, in _find_and_load_unlocked
  File "", line 668, in _load_unlocked
  File "", line 638, in _load_backward_compatible
  File "/Users/shawn/anaconda3/lib/python3.7/site-packages/seq2seq-1.0.0-py3.7.egg/seq2seq/cells.py", line 1, in
  File "", line 983, in _find_and_load
  File "", line 967, in _find_and_load_unlocked
  File "", line 668, in _load_unlocked
  File "", line 638, in _load_backward_compatible
  File "/Users/shawn/anaconda3/lib/python3.7/site-packages/recurrentshop-1.0.0-py3.7.egg/recurrentshop/__init__.py", line 1, in
  File "", line 983, in _find_and_load
  File "", line 967, in _find_and_load_unlocked
  File "", line 668, in _load_unlocked
  File "", line 638, in _load_backward_compatible
  File "/Users/shawn/anaconda3/lib/python3.7/site-packages/recurrentshop-1.0.0-py3.7.egg/recurrentshop/engine.py", line 10, in
NameError: name 'K' is not defined

How can I solve this? Thank you.

opened by fsdfsd123 3

Bad results of a simple translation task. Ask for help...:(

Actually, I plan to do a machine translation project with this seq2seq module. Before that, I ran a simple test and got a very bad result. I don't know what went wrong. Please help me. Here's the process:

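#1. training set

# Imports assumed for this snippet; the original post did not show them.
from random import randint  # assumed: the stdlib randint (inclusive bounds)
import numpy as np
from keras.utils import to_categorical
from seq2seq.models import SimpleSeq2Seq

def generate_sequence(length, n_unique):
    return [randint(1, n_unique-1) for _ in range(length)]
x = np.array(generate_sequence(100000,100)).reshape(10000,10)
y = np.array(generate_sequence(50000,100)).reshape(10000,5)
x_encoder_input_data = to_categorical(x)
y_decoder_target_data = to_categorical(y)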

#x_encoder_input_data.shape = (10000, 10, 100)
#10000 training data, x_input_length=10,x_input_dim=100
#y_decoder_target_data.shape = (10000, 5, 100)

#2. building&training model

model = SimpleSeq2Seq(input_dim=100,
             input_length=10, 
            output_dim=100, 
            hidden_dim=128,
            output_length=5)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
model.fit(x_encoder_input_data,  y_decoder_target_data, batch_size=128, epochs=500)

#3. the partial losses

Epoch 1/500
10000/10000  - 44s 4ms/step - loss: 7.2350 - acc: 0.0108
Epoch 2/500
10000/10000  - 20s 2ms/step - loss: 6.9500 - acc: 0.0104
Epoch 3/500
10000/10000  - 22s 2ms/step - loss: 10.7297 - acc: 0.0103
Epoch 4/500
10000/10000  - 20s 2ms/step - loss: 8.6834 - acc: 0.0096
Epoch 5/500
10000/10000  - 21s 2ms/step - loss: 8.4943 - acc: 0.0099
Epoch 6/500
10000/10000  - 19s 2ms/step - loss: 8.4487 - acc: 0.0100
Epoch 7/500
10000/10000  - 19s 2ms/step - loss: 8.6318 - acc: 0.0099
Epoch 8/500
10000/10000  - 18s 2ms/step - loss: 8.5765 - acc: 0.0099
Epoch 9/500
10000/10000  - 20s 2ms/step - loss: 8.4753 - acc: 0.0099
Epoch 10/500
10000/10000  - 20s 2ms/step - loss: 8.3738 - acc: 0.0099
Epoch 11/500
10000/10000  - 20s 2ms/step - loss: 8.3999 - acc: 0.0098
Epoch 12/500
10000/10000  - 19s 2ms/step - loss: 8.3108 - acc: 0.0099
Epoch 13/500
10000/10000  - 19s 2ms/step - loss: 8.3457 - acc: 0.0099
Epoch 14/500
10000/10000  - 19s 2ms/step - loss: 8.4852 - acc: 0.0098
Epoch 15/500
10000/10000  - 18s 2ms/step - loss: 8.4749 - acc: 0.0099
Epoch 16/500
10000/10000  - 18s 2ms/step - loss: 8.5881 - acc: 0.0098
Epoch 17/500
10000/10000  - 19s 2ms/step - loss: 8.3868 - acc: 0.0099
Epoch 18/500
10000/10000   - 21s 2ms/step - loss: 8.2499 - acc: 0.0098
Epoch 19/500
10000/10000   - 20s 2ms/step - loss: 8.4659 - acc: 0.0099
Epoch 20/500
10000/10000  - 20s 2ms/step - loss: 7.8421 - acc: 0.0099
Epoch 21/500
10000/10000   - 21s 2ms/step - loss: 7.6197 - acc: 0.0099
Epoch 22/500
10000/10000  - 20s 2ms/step - loss: 7.6193 - acc: 0.0099
Epoch 23/500
10000/10000   - 19s 2ms/step - loss: 7.6193 - acc: 0.0099
Epoch 24/500
10000/10000   - 21s 2ms/step - loss: 7.6193 - acc: 0.0099
Epoch 25/500
10000/10000   - 19s 2ms/step - loss: 7.6193 - acc: 0.0099
Epoch 26/500
10000/10000  - 22s 2ms/step - loss: 7.6193 - acc: 0.0099
Epoch 27/500
10000/10000  - 22s 2ms/step - loss: 7.6193 - acc: 0.0099
··· ···

#4. predicting

for seq_index in range(6):
    predictions = model.predict(x_encoder_input_data[seq_index:seq_index+1])
    predicted_list=[]

    for prediction_vector in predictions:
        for pred in prediction_vector:
            next_token = np.argmax(pred)
            predicted_list.append(next_token)
            
    print('-')
    print('Input sentence:', X[seq_index])
    print('Decoded sentence:', predicted_list)
    print('Target sentence:', y[seq_index])

#5. the predicting results:

-
Input sentence: [28, 2, 46, 12, 21, 6]      #  x
Decoded sentence: [78, 78, 78, 78, 66] # y_predict
Target sentence: [82 22 82 41 27]          # y
-
Input sentence: [12, 20, 45, 28, 18, 42]
Decoded sentence: [78, 78, 66, 66, 66]
Target sentence: [43 36 30 13 64]
-
Input sentence: [3, 43, 45, 4, 33, 27]
Decoded sentence: [78, 78, 66, 66, 66]
Target sentence: [90 20 56 23 32]
-
Input sentence: [34, 50, 21, 20, 11, 6]
Decoded sentence: [78, 78, 78, 78, 66]
Target sentence: [27 57 50 57 81]
-
Input sentence: [47, 42, 14, 2, 31, 6]
Decoded sentence: [78, 78, 78, 78, 66]
Target sentence: [77 94 47 26 67]
-
Input sentence: [20, 24, 34, 31, 37, 25]
Decoded sentence: [78, 78, 66, 66, 66]
Target sentence: [11 48 99 67 66]

opened by JillinJia 0

How to use seq2seq for simple sequences

Hi

I have a simple categorical sequence data set like the one below. Target has 3 classes.

X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 | X12 | Target
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
0 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 1

Can seq2seq be used for this kind of problem? Could you share some sample syntax?

Thank you.

opened by rt3722 1

Owner

Fariz Rahman

GitHub

