Hi,
I think the training progresses well, though there doesn't seem to be any further improvement after 10,000 iterations.
May I ask how to serialise and save the DNC at checkpoints? I set a checkpoint after 100 iterations, and got the following error:
```
Using CUDA.
Iteration 0/50000
Avg. Logistic Loss: 0.6931
Iteration 50/50000
Avg. Logistic Loss: 0.6674
Iteration 100/50000
Avg. Logistic Loss: 0.4560
Saving Checkpoint ...
Traceback (most recent call last):
  File "train.py", line 183, in <module>
    torch.save(ncomputer, f)
  File "/home/ajay/anaconda3/envs/rllab3/lib/python3.5/site-packages/torch/serialization.py", line 120, in save
    return _save(obj, f, pickle_module, pickle_protocol)
  File "/home/ajay/anaconda3/envs/rllab3/lib/python3.5/site-packages/torch/serialization.py", line 186, in _save
    pickler.dump(obj)
_pickle.PicklingError: Can't pickle <class 'memory.mem_tuple'>: attribute lookup mem_tuple on memory failed
```
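The error looks like `mem_tuple` is a namedtuple that pickle can't look up at module level, so pickling the whole `ncomputer` object fails. As a workaround I was thinking of saving only the parameters instead of the full object; a minimal sketch of what I mean, assuming `ncomputer` is (or wraps) a `torch.nn.Module` so `state_dict()`/`load_state_dict()` are available:

```python
import torch

# Save only the learnable parameters instead of pickling the whole DNC object.
# Assumes `ncomputer` exposes state_dict()/load_state_dict() like a torch.nn.Module.
def save_checkpoint(ncomputer, iteration, path):
    torch.save({
        'iteration': iteration,
        'model_state': ncomputer.state_dict(),
    }, path)

def load_checkpoint(ncomputer, path):
    checkpoint = torch.load(path)
    ncomputer.load_state_dict(checkpoint['model_state'])
    return checkpoint['iteration']

# Usage (names are just placeholders for whatever train.py uses):
# save_checkpoint(ncomputer, 100, 'ckpt_100.pth')
# start_iter = load_checkpoint(ncomputer, 'ckpt_100.pth')
```

Would that work here, or is there a supported way to serialise the DNC directly?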
PS - Sometimes I get `nan`s very early on and have to restart, and I also got `nan`s after 40,000 iterations. I've seen this very often with Neural Turing Machines, so I guess it's inherent to these kinds of models?
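The only mitigation I'm aware of is clipping the gradient norm before the optimiser step; a rough sketch of what I mean below, where the function and the max-norm value are just placeholders and not from this repo:

```python
import torch

def train_step(ncomputer, optimizer, loss, max_norm=10.0):
    """One optimisation step with gradient-norm clipping (placeholder names, not from train.py)."""
    optimizer.zero_grad()
    loss.backward()
    # Clip the global gradient norm to guard against exploding gradients / NaNs.
    # Older PyTorch versions call this clip_grad_norm (no trailing underscore).
    torch.nn.utils.clip_grad_norm_(ncomputer.parameters(), max_norm)
    optimizer.step()
```

Does clipping like this help in your experience, or are the `nan`s unavoidable?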