Update: I wonder whether the repo could be updated to apply torch.amp (mixed precision).
Hello,
I'd like to ask whether, with my dataset and machine, it is normal to run out of memory, or whether I might have a programming issue.
RuntimeError: CUDA out of memory. Tried to allocate 260.00 MiB (GPU 0; 15.78 GiB total capacity; 14.02 GiB already allocated; 198.19 MiB free; 14.13 GiB reserved in total by PyTorch)
I am using an AWS p3.8xlarge (4 Tesla V100s) and trying to train a CycleGAN on 5055 images at 1024x1024 resolution.
I checked that the dataset resized to 512x512 works with batch size 4, but at 1024x1024 even batch size 1 doesn't work.
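As a rough sanity check on those two runs, here is a back-of-the-envelope sketch of how many input pixels the busiest GPU sees per step. It assumes both runs use `--gpu_ids 0,1,2,3` with `nn.DataParallel` (the log shows this for the failing run, the 512x512 run is my assumption) and that activation memory grows roughly in proportion to pixels per GPU. `pixels_per_gpu` is a helper name made up for illustration.

```python
import math

def pixels_per_gpu(batch_size, height, width, num_gpus):
    """Largest number of input pixels any single GPU processes per step.

    DataParallel splits the batch along dimension 0, so the busiest GPU
    receives ceil(batch_size / num_gpus) images.
    """
    images_on_busiest_gpu = math.ceil(batch_size / num_gpus)
    return images_on_busiest_gpu * height * width

small = pixels_per_gpu(batch_size=4, height=512, width=512, num_gpus=4)
large = pixels_per_gpu(batch_size=1, height=1024, width=1024, num_gpus=4)

print(small)          # 262144
print(large)          # 1048576
print(large / small)  # 4.0
```

Under these assumptions, batch size 1 at 1024x1024 puts about 4x the per-GPU load of batch size 4 at 512x512, and a single image cannot be split across the other three GPUs.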
I think we need a p4d.24xlarge for this project, but that instance is hard to get because of limited zone capacity.
Possible things to try:
- Reduce the size of the dataset (but I think 5055 images is already small for training). My colleague thinks the model loads the whole dataset at once, which is why we would need to reduce it.
- Find a memory leak?
Any comments or hints are appreciated.
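For the mixed precision idea in the update at the top, this is a minimal sketch of what one training step with automatic mixed precision might look like. It is not the repo's code: the tiny `net`, the L1 loss, and the 64x64 tensors are stand-ins I made up, and only the `autocast` / `GradScaler` pattern is the point. It uses the `torch.cuda.amp` spelling; newer PyTorch releases also expose these under `torch.amp`.

```python
import torch
import torch.nn as nn

# AMP is only enabled when a GPU is present; on CPU both helpers are no-ops.
use_amp = torch.cuda.is_available()
device = torch.device("cuda" if use_amp else "cpu")

# Stand-in network for illustration, NOT the repo's InceptionGenerator.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(8, 3, kernel_size=3, padding=1),
).to(device)
optimizer = torch.optim.Adam(net.parameters(), lr=0.0002, betas=(0.5, 0.999))
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

x = torch.randn(1, 3, 64, 64, device=device)
target = torch.randn(1, 3, 64, 64, device=device)

optimizer.zero_grad()
# Forward pass and loss run in reduced precision where autocast deems it safe.
with torch.cuda.amp.autocast(enabled=use_amp):
    out = net(x)
    loss = nn.functional.l1_loss(out, target)

# Scale the loss before backward so small float16 gradients do not underflow.
scaler.scale(loss).backward()
scaler.step(optimizer)  # unscales gradients, skips the step on inf/NaN
scaler.update()
loss_value = loss.item()
print(loss_value)
```

In a CycleGAN the generator and discriminator steps would each need the same treatment, and I have not checked how much memory this would actually save for this model.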
Below is the log for reference.
train.py --dataroot database/face2smile \
> --model cycle_gan \
> --log_dir logs/cycle_gan/face2smile/teacher_1080 \
> --netG inception_9blocks \
> --real_stat_A_path real_stat_1080/face2smile_A.npz \
> --real_stat_B_path real_stat_1080/face2smile_B.npz \
> --batch_size 1 \
> --num_threads 1 \
> --gpu_ids 0,1,2,3 \
> --norm_affine \
> --norm_affine_D \
> --channels_reduction_factor 6 \
> --kernel_sizes 1 3 5 \
> --save_latest_freq 10000 --save_epoch_freq 5 \
> --nepochs 1 --nepochs_decay 0 \
> --preprocess none
----------------- Options ---------------
active_fn: nn.ReLU
active_fn_D: nn.LeakyReLU
aspect_ratio: 1.0
batch_size: 4 [default: 1]
beta1: 0.5
channels: None
channels_reduction_factor: 6 [default: 1]
cityscapes_path: database/cityscapes-origin
crop_size: 256, 256
dataroot: database/face2smile [default: None]
dataset_mode: unaligned
direction: AtoB
display_winsize: 256
drn_path: drn-d-105_ms_cityscapes.pth
dropout_rate: 0
epoch_base: 1
eval_batch_size: 1
gan_mode: lsgan
gpu_ids: 0,1,2,3 [default: 0]
init_gain: 0.02
init_type: normal
input_nc: 3
isTrain: True [default: None]
iter_base: 1
kernel_sizes: [1, 3, 5] [default: [3, 5, 7]]
lambda_A: 10.0
lambda_B: 10.0
lambda_identity: 0.5
load_in_memory: False
load_size: 286
log_dir: logs/cycle_gan/face2smile/teacher_1080 [default: logs]
lr: 0.0002
lr_decay_iters: 50
lr_policy: linear
max_dataset_size: -1
model: cycle_gan [default: pix2pix]
moving_average_decay: 0.0
moving_average_decay_adjust: False
moving_average_decay_base_batch: 32
n_layers_D: 3
ndf: 64
nepochs: 1 [default: 100]
nepochs_decay: 0 [default: 100]
netD: n_layers
netG: inception_9blocks
ngf: 64
no_flip: False
norm: instance
norm_affine: True [default: False]
norm_affine_D: True [default: False]
norm_epsilon: 1e-05
norm_momentum: 0.1
norm_student: instance
norm_track_running_stats: False
num_threads: 32 [default: 4]
output_nc: 3
padding_type: reflect
phase: train
pool_size: 50
preprocess: none [default: resize_and_crop]
print_freq: 100
real_stat_A_path: real_stat_1080/face2smile_A.npz [default: None]
real_stat_B_path: real_stat_1080/face2smile_B.npz [default: None]
restore_D_A_path: None
restore_D_B_path: None
restore_G_A_path: None
restore_G_B_path: None
restore_O_path: None
save_epoch_freq: 5 [default: 20]
save_latest_freq: 10000 [default: 20000]
seed: 233
serial_batches: False
table_path: datasets/table.txt
tensorboard_dir: None
----------------- End -------------------
train.py --dataroot database/face2smile --model cycle_gan --log_dir logs/cycle_gan/face2smile/teacher_1080 --netG inception_9blocks --real_stat_A_path real_stat_1080/face2smile_A.npz --real_stat_B_path real_stat_1080/face2smile_B.npz --batch_size 4 --num_threads 32 --gpu_ids 0,1,2,3 --norm_affine --norm_affine_D --channels_reduction_factor 6 --kernel_sizes 1 3 5 --save_latest_freq 10000 --save_epoch_freq 5 --nepochs 1 --nepochs_decay 0 --preprocess none
dataset [UnalignedDataset] was created
The number of training images = 5055
data shape is: channel=3, height=1024, width=1024.
initialize network with normal
initialize network with normal
initialize network with normal
initialize network with normal
dataset [SingleDataset] was created
dataset [SingleDataset] was created
/home/ubuntu/.local/lib/python3.9/site-packages/torchvision/models/inception.py:80: FutureWarning: The default weight initialization of inception_v3 will be changed in future releases of torchvision. If you wish to keep the old behavior (which leads to long initialization times due to scipy/scipy#11299), please set init_weights=True.
warnings.warn('The default weight initialization of inception_v3 will be changed in future releases of '
model [CycleGANModel] was created
---------- Networks initialized -------------
DataParallel(
(module): InceptionGenerator(
(down_sampling): Sequential(
(0): ReflectionPad2d((3, 3, 3, 3))
(1): Conv2d(3, 64, kernel_size=(7, 7), stride=(1, 1))
(2): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(3): ReLU(inplace=True)
(4): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(5): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(6): ReLU(inplace=True)
(7): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(8): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(9): ReLU(inplace=True)
)
(features): Sequential(
(0): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(1): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(2): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(3): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(4): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(5): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(6): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(7): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(8): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
)
(up_sampling): Sequential(
(0): ConvTranspose2d(256, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
(1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(2): ReLU(inplace=True)
(3): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
(4): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(5): ReLU(inplace=True)
(6): ReflectionPad2d((3, 3, 3, 3))
(7): Conv2d(64, 3, kernel_size=(7, 7), stride=(1, 1))
(8): Tanh()
)
)
)
[Network G_A] Total number of parameters : 8.154 M
DataParallel(
(module): InceptionGenerator(
(down_sampling): Sequential(
(0): ReflectionPad2d((3, 3, 3, 3))
(1): Conv2d(3, 64, kernel_size=(7, 7), stride=(1, 1))
(2): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(3): ReLU(inplace=True)
(4): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(5): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(6): ReLU(inplace=True)
(7): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(8): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(9): ReLU(inplace=True)
)
(features): Sequential(
(0): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(1): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(2): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(3): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(4): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(5): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(6): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(7): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(8): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
)
(up_sampling): Sequential(
(0): ConvTranspose2d(256, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
(1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(2): ReLU(inplace=True)
(3): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
(4): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(5): ReLU(inplace=True)
(6): ReflectionPad2d((3, 3, 3, 3))
(7): Conv2d(64, 3, kernel_size=(7, 7), stride=(1, 1))
(8): Tanh()
)
)
)
[Network G_B] Total number of parameters : 8.154 M
DataParallel(
(module): NLayerDiscriminator(
(model): Sequential(
(0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(3): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(4): LeakyReLU(negative_slope=0.2, inplace=True)
(5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(7): LeakyReLU(negative_slope=0.2, inplace=True)
(8): Conv2d(256, 512, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
(9): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(10): LeakyReLU(negative_slope=0.2, inplace=True)
(11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
)
)
)
[Network D_A] Total number of parameters : 2.767 M
DataParallel(
(module): NLayerDiscriminator(
(model): Sequential(
(0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(3): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(4): LeakyReLU(negative_slope=0.2, inplace=True)
(5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(7): LeakyReLU(negative_slope=0.2, inplace=True)
(8): Conv2d(256, 512, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
(9): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(10): LeakyReLU(negative_slope=0.2, inplace=True)
(11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
)
)
)
[Network D_B] Total number of parameters : 2.767 M
-----------------------------------------------
start_epoch: 1
end_epoch: 1
total_iter: 1
current memory allocated: 265.4296875
max memory allocated: 265.4296875
cached memory: 276.0
will set input data
Traceback (most recent call last):
File "/data/CAT/train.py", line 14, in <module>
trainer.start()
File "/data/CAT/trainer.py", line 159, in start
model.optimize_parameters(total_iter)
File "/data/CAT/models/cycle_gan_model.py", line 295, in optimize_parameters
self.forward()
File "/data/CAT/models/cycle_gan_model.py", line 235, in forward
self.rec_A = self.netG_B(self.fake_B)
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/CAT/models/modules/inception_architecture/inception_generator.py", line 141, in forward
res = self.up_sampling(res)
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/container.py", line 119, in forward
input = module(input)
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/padding.py", line 173, in forward
return F.pad(input, self.padding, 'reflect')
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/functional.py", line 4014, in _pad
return torch._C._nn.reflection_pad2d(input, pad)
RuntimeError: CUDA out of memory. Tried to allocate 260.00 MiB (GPU 0; 15.78 GiB total capacity; 14.02 GiB already allocated; 198.19 MiB free; 14.13 GiB reserved in total by PyTorch)
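For what it's worth, the 260.00 MiB request in the message above looks consistent with a single float32 activation at this resolution. A rough sketch, assuming one 1024x1024 image on this GPU and the 64-channel input to the final `ReflectionPad2d((3, 3, 3, 3))` in `up_sampling` (both read off the log above). `tensor_mib` is a made-up helper name.

```python
def tensor_mib(batch, channels, height, width, bytes_per_element=4):
    """Size of a dense tensor in MiB (float32 by default)."""
    return batch * channels * height * width * bytes_per_element / 2**20

# Output of ReflectionPad2d((3, 3, 3, 3)) on a 1 x 64 x 1024 x 1024 tensor:
# 3 pixels are added on every side, so each spatial dimension becomes 1030.
padded = tensor_mib(batch=1, channels=64, height=1024 + 6, width=1024 + 6)
print(round(padded, 2))  # 259.01
```

As far as I know, PyTorch's caching allocator rounds large requests up to a multiple of 2 MiB, which would turn about 259 MiB into the reported 260.00 MiB. Together with `load_in_memory: False` and only about 265 MiB allocated before the first forward pass, this suggests the memory goes to activations for a single image rather than to the dataset.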