Hi there, I encountered an error when running the seq2seq_attn example with the iwslt14 dataset. The full error log is appended at the end of the post.
The error is raised at this line:
https://github.com/asyml/texar/blob/9c699e8143fd8ecb5d65a41ceef09c45832b9258/examples/seq2seq_attn/seq2seq_attn.py#L125
This indicates that the training stage is fine and that the bug occurs in the validation stage.
Here are two excerpts from the error log:
2018-09-11 16:04:10.567407: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at tensor_array_ops.cc:1122 : Invalid argument: TensorArray bidirectional_rnn_encoder_2/bidirectional_rnn/fw/fw/dynamic_rnn/input_0_32362: Could not write to TensorArray index 0 because the value shape is [23,256] which is incompatible with the TensorArray's inferred element shape: [32,256] (consider setting infer_shape=False).
2018-09-11 16:04:10.567928: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at tensor_array_ops.cc:1122 : Invalid argument: TensorArray bidirectional_rnn_encoder_2/bidirectional_rnn/bw/bw/dynamic_rnn/input_0_32364: Could not write to TensorArray index 0 because the value shape is [23,256] which is incompatible with the TensorArray's inferred element shape: [32,256] (consider setting infer_shape=False).
......
Caused by op 'attention_rnn_decoder_5/tile_batch_1/Reshape', defined at:
File "seq2seq_attn.py", line 161, in <module>
main()
File "seq2seq_attn.py", line 93, in main
train_op, infer_outputs = build_model(batch, train_data)
File "seq2seq_attn.py", line 77, in build_model
max_decoding_length=60)
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/texar/modules/decoders/beam_search_decode.py", line 193, in beam_search_decode
cell = decoder_or_cell._get_beam_search_cell(beam_width=beam_width)
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/texar/modules/decoders/rnn_decoders.py", line 545, in _get_beam_search_cell
memory_seq_length, beam_width)
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/beam_search_decoder.py", line 122, in tile_batch
return nest.map_structure(lambda t_: _tile_batch(t_, multiplier), t)
......
As there is no problem in the training stage, I guess there might be something wrong in the implementation of beam_search_decode.
The error can be described as follows: the tensor expects a dimension of 32, while we feed 23 instead.
I also found that the batch size is 32 and that there are 887 validation examples in valid.de, where 887 % 32 == 23.
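The numbers in the log can be reproduced with a few lines of arithmetic. The sketch below is only an illustration of the mismatch, not the Texar or TensorFlow code: `reshape_flat` is a hypothetical helper, and the beam width of 10 is an assumption inferred from the log (230 / 23 and 320 / 32).

```python
BATCH_SIZE = 32   # batch size from the report
NUM_VAL = 887     # number of validation examples in valid.de
BEAM_WIDTH = 10   # assumption: inferred from 230 / 23 and 320 / 32 in the log


def reshape_flat(values, shape):
    """Hypothetical reshape check: element count must match the requested shape."""
    requested = 1
    for dim in shape:
        requested *= dim
    if len(values) != requested:
        raise ValueError(
            "Input to reshape has %d values, but the requested shape has %d"
            % (len(values), requested))
    return values


# Size of the final, smaller validation batch.
last_batch = NUM_VAL % BATCH_SIZE

# Tiling the final batch beam_width times gives last_batch * BEAM_WIDTH values,
# while a shape built from the full batch size asks for BATCH_SIZE * BEAM_WIDTH.
tiled = [0] * (last_batch * BEAM_WIDTH)
try:
    reshape_flat(tiled, (BATCH_SIZE * BEAM_WIDTH,))
    error = None
except ValueError as exc:
    error = str(exc)

print(last_batch)  # 23
print(error)
```

If this reading is right, some shape in the beam search graph is tied to the full batch size of 32 instead of the actual size of the batch being fed.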
Also, adding 'allow_smaller_final_batch': False to the val and test items of config_iwslt14.py gets rid of the error.
But this "fix" is not what we really want. Theoretically, we are supposed to run validation and test on all the dev/test samples.
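To make the cost of this workaround concrete, here is a small sketch of how many validation examples are actually evaluated under each setting. It assumes that setting the option to False simply drops the incomplete final batch; `batch_sizes` is a hypothetical helper, not a Texar function.

```python
def batch_sizes(num_examples, batch_size, allow_smaller_final_batch):
    """Return the list of batch sizes produced for one pass over the data."""
    full, rest = divmod(num_examples, batch_size)
    sizes = [batch_size] * full
    # Assumption: the incomplete final batch is dropped when the flag is False.
    if rest and allow_smaller_final_batch:
        sizes.append(rest)
    return sizes


dropped = batch_sizes(887, 32, allow_smaller_final_batch=False)
complete = batch_sizes(887, 32, allow_smaller_final_batch=True)

print(len(dropped), sum(dropped))    # 27 864
print(len(complete), sum(complete))  # 28 887
```

Under that assumption, 23 of the 887 validation examples are never scored, so the reported BLEU would not cover the whole dev set.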
I am using TensorFlow 1.8.
Please let me know if I need to provide any other environment information.
The full logs:
$ python seq2seq_attn.py --config_model config_model --config_data config_iwslt14
2018-09-11 15:50:03.793617: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-11 15:50:04.038875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla P40 major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:02:00.0
totalMemory: 22.38GiB freeMemory: 22.21GiB
2018-09-11 15:50:04.038945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-09-11 15:50:04.384688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-11 15:50:04.384752: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-09-11 15:50:04.385091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-09-11 15:50:04.385631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21549 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0000:02:00.0, compute capability: 6.1)
step=0, loss=481.5847
step=500, loss=101.2404
step=1000, loss=75.9185
step=1500, loss=102.7388
step=2000, loss=81.9897
step=2500, loss=64.7623
step=3000, loss=76.1445
step=3500, loss=81.1186
step=4000, loss=48.0918
step=4500, loss=54.7355
step=5000, loss=74.8126
2018-09-11 16:04:10.567407: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at tensor_array_ops.cc:1122 : Invalid argument: TensorArray bidirectional_rnn_encoder_2/bidirectional_rnn/fw/fw/dynamic_rnn/input_0_32362: Could not write to TensorArray index 0 because the value shape is [23,256] which is incompatible with the TensorArray's inferred element shape: [32,256] (consider setting infer_shape=False).
2018-09-11 16:04:10.567928: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at tensor_array_ops.cc:1122 : Invalid argument: TensorArray bidirectional_rnn_encoder_2/bidirectional_rnn/bw/bw/dynamic_rnn/input_0_32364: Could not write to TensorArray index 0 because the value shape is [23,256] which is incompatible with the TensorArray's inferred element shape: [32,256] (consider setting infer_shape=False).
Traceback (most recent call last):
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 230 values, but the requested shape has 320
[[Node: attention_rnn_decoder_5/tile_batch_1/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](attention_rnn_decoder_5/tile_batch_1/Tile/_4547, attention_rnn_decoder_5/tile_batch_1/concat)]]
[[Node: attention_rnn_decoder_6/decoder/while/LoopCond/_4655 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_790_attention_rnn_decoder_6/decoder/while/LoopCond", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopattention_rnn_decoder_6/decoder/while/BeamSearchDecoderStep/next_beam_word_ids/y/_4501)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "seq2seq_attn.py", line 161, in <module>
main()
File "seq2seq_attn.py", line 149, in main
val_bleu = _eval_epoch(sess, 'val')
File "seq2seq_attn.py", line 125, in _eval_epoch
sess.run(fetches, feed_dict=feed_dict)
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 230 values, but the requested shape has 320
[[Node: attention_rnn_decoder_5/tile_batch_1/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](attention_rnn_decoder_5/tile_batch_1/Tile/_4547, attention_rnn_decoder_5/tile_batch_1/concat)]]
[[Node: attention_rnn_decoder_6/decoder/while/LoopCond/_4655 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_790_attention_rnn_decoder_6/decoder/while/LoopCond", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopattention_rnn_decoder_6/decoder/while/BeamSearchDecoderStep/next_beam_word_ids/y/_4501)]]
Caused by op 'attention_rnn_decoder_5/tile_batch_1/Reshape', defined at:
File "seq2seq_attn.py", line 161, in <module>
main()
File "seq2seq_attn.py", line 93, in main
train_op, infer_outputs = build_model(batch, train_data)
File "seq2seq_attn.py", line 77, in build_model
max_decoding_length=60)
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/texar/modules/decoders/beam_search_decode.py", line 193, in beam_search_decode
cell = decoder_or_cell._get_beam_search_cell(beam_width=beam_width)
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/texar/modules/decoders/rnn_decoders.py", line 545, in _get_beam_search_cell
memory_seq_length, beam_width)
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/beam_search_decoder.py", line 122, in tile_batch
return nest.map_structure(lambda t_: _tile_batch(t_, multiplier), t)
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/util/nest.py", line 375, in map_structure
structure[0], [func(*x) for x in entries])
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/util/nest.py", line 375, in <listcomp>
structure[0], [func(*x) for x in entries])
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/beam_search_decoder.py", line 122, in <lambda>
return nest.map_structure(lambda t_: _tile_batch(t_, multiplier), t)
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/beam_search_decoder.py", line 90, in _tile_batch
([shape_t[0] * multiplier], shape_t[1:]), 0))
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 6113, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 230 values, but the requested shape has 320
[[Node: attention_rnn_decoder_5/tile_batch_1/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](attention_rnn_decoder_5/tile_batch_1/Tile/_4547, attention_rnn_decoder_5/tile_batch_1/concat)]]
[[Node: attention_rnn_decoder_6/decoder/while/LoopCond/_4655 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_790_attention_rnn_decoder_6/decoder/while/LoopCond", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopattention_rnn_decoder_6/decoder/while/BeamSearchDecoderStep/next_beam_word_ids/y/_4501)]]