Hi,
I am doing some work on RL and am very interested in these two algorithms. I tried to train your models on both CPU and GPU, but both runs failed with an "out of memory" error, and memory usage kept growing throughout training.
It seems that the data and/or the model from earlier steps are not being released. My code is very similar to the example:
```python
action = agent.select_action(state, ounoise, param_noise)
next_state, reward, done, info = env.step(action.cpu().numpy()[0])
total_numsteps += 1
episode_reward += reward

action = torch.Tensor(action.cpu())
mask = torch.Tensor([not done])
next_state = torch.Tensor(next_state.cpu())
reward = torch.Tensor([reward])
# pdb.set_trace()
memory.push(state, action, mask, next_state, reward)

state = next_state

if len(memory) > args.batch_size:
    for _ in range(args.updates_per_step):
        transitions = memory.sample(args.batch_size)
        batch = Transition(*zip(*transitions))

        value_loss, policy_loss = agent.update_parameters(batch)

        writer.add_scalar('loss/value', value_loss, updates)
        writer.add_scalar('loss/policy', policy_loss, updates)
        updates += 1
```
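For what it's worth, my current suspicion is that some tensor pushed into the replay buffer (e.g. the action returned by `agent.select_action`) still carries autograd history, so every stored transition keeps its whole computation graph alive. Here is a minimal, self-contained sketch of what I mean, using a hypothetical `policy` network rather than your actual agent; if this is the cause, detaching (or computing the action under `torch.no_grad()`) before `memory.push` might be the fix:

```python
import torch

policy = torch.nn.Linear(4, 2)  # stand-in for the actual actor network
state = torch.randn(1, 4)

# Plain forward pass: the result carries grad history, so storing it
# in a replay buffer would keep the whole graph (and its activations) alive.
action_with_graph = policy(state)

# Computing the action under no_grad (or calling .detach() on the result)
# produces a tensor with no graph attached, which is safe to store.
with torch.no_grad():
    action_detached = policy(state)

print(action_with_graph.requires_grad)  # True
print(action_detached.requires_grad)    # False
```

Does that sound plausible, or is the leak somewhere else (e.g. in `update_parameters`)?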
Could you please help me figure out what is not being released? Thanks in advance.