Hello, thanks for the script.
When I run the following command to visualize the locations of sound sources:
python sep_video.py ../data/translator.mp4 --model full --cam --out ../results/
I get an error:
Start time: 0.0
GPU = 0
Spectrogram samples: 128
2.145 2.135
100.0% complete, total time: 0:00:00. 0:00:00 per iteration. (01:57 PM Fri)
Struct(alg=sourcesep, augment_audio=False, augment_ims=True, augment_rms=False, base_lr=0.0001, batch_size=6, bn_last=True, bn_scale=True, both_videos_in_batch=True, cam=False, check_iters=1000, crop_im_dim=224, dilate=False, do_shift=False, dset_seed=None, fix_frame=False, fps=29.97, frame_length_ms=64, frame_sample_delta=74, frame_step_ms=16, freq_len=1024, full_im_dim=256, full_model=False, full_samples_len=105000, gamma=0.1, gan_weight=0.0, grad_clip=10.0, im_split=False, im_type=jpeg, init_path=../results/nets/shift/net.tf-650000, init_type=shift, input_rms=0.141421356237, l1_weight=1.0, log_spec=True, loss_types=['fg-bg'], model_path=../results/nets/sep/full/net.tf-160000, mono=False, multi_shift=False, net_style=full, normalize_rms=True, num_dbs=None, num_samples=44144, opt_method=adam, pad_stft=False, phase_type=pred, phase_weight=0.01, pit_weight=0.0, predict_bg=True, print_iters=10, profile_iters=None, resdir=/multisensory-master/results/nets/sep/full, samp_sr=21000.0, sample_len=None, sampled_frames=63, samples_per_frame=700.700700701, show_iters=None, show_videos=False, slow_check_iters=10000, spec_len=128, spec_max=80.0, spec_min=-100.0, step_size=120000, subsample_frames=None, summary_iters=10, test_batch=10, test_list=../data/celeb-tf-v6-full/test/tf, total_frames=149, train_iters=160000, train_list=../data/celeb-tf-v6-full/train/tf, use_3d=True, use_sound=True, use_wav_gan=False, val_list=../data/celeb-tf-v6-full/val/tf, variable_frame_count=False, vid_dur=2.135, weight_decay=1e-05)
ffmpeg -loglevel error -ss 0.0 -i "../data/translator.mp4" -safe 0 -t 2.185 -r 29.97 -vf scale=256:256 "/tmp/tmpVEitNC/small_%04d.png"
ffmpeg -loglevel error -ss 0.0 -i "../data/translator.mp4" -safe 0 -t 2.185 -r 29.97 -vf "scale=-2:'min(600,ih)'" "/tmp/tmpVEitNC/full_%04d.png"
ffmpeg -loglevel error -ss 0.0 -i "../data/translator.mp4" -safe 0 -t 2.185 -ar 21000.0 -ac 2 "/tmp/tmpVEitNC/sound.wav"
Running on: /gpu:0
2018-06-15 13:57:11.657961: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-06-15 13:57:12.523259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla K40m major: 3 minor: 5 memoryClockRate(GHz): 0.745
pciBusID: 0000:02:00.0
totalMemory: 11.92GiB freeMemory: 11.84GiB
2018-06-15 13:57:12.523316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K40m, pci bus id: 0000:02:00.0, compute capability: 3.5)
Raw spec length: [1, 128, 1025]
Truncated spec length: [1, 128, 1025]
bn scale: True
arg_scope train = False
sf/conv1_1 -> [1, 11036, 1, 64]
sf/conv2_1_short -> [1, 690, 1, 128]
sf/conv2_1_1 -> [1, 690, 1, 128]
sf/conv2_1_2 -> [1, 690, 1, 128]
sf/conv3_1_1 -> [1, 173, 1, 128]
sf/conv3_1_2 -> [1, 173, 1, 128]
sf/conv4_1_short -> [1, 44, 1, 256]
sf/conv4_1_1 -> [1, 44, 1, 256]
sf/conv4_1_2 -> [1, 44, 1, 256]
im/conv1 -> [1, 32, 112, 112, 64] before: [1, 63, 224, 224, 3]
pool -> [1, 32, 56, 56, 64]
im/conv2_1_1 -> [1, 32, 56, 56, 64] before: [1, 32, 56, 56, 64]
im/conv2_1_2 -> [1, 32, 56, 56, 64] before: [1, 32, 56, 56, 64]
pool -> [1, 16, 28, 28, 64]
im/conv2_2_1 -> [1, 16, 28, 28, 64] before: [1, 32, 56, 56, 64]
im/conv2_2_2 -> [1, 16, 28, 28, 64] before: [1, 16, 28, 28, 64]
frac: 2.6875
sf/conv5_1 -> [1, 16, 1, 128]
sf_net shape before merge: [1, 44, 1, 256], and after merge: [1, 16, 1, 256]
im/merge1 -> [1, 16, 28, 28, 512] before: [1, 16, 28, 28, 192]
im/merge2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 512]
im/conv3_1_1 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128]
im/conv3_1_2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128]
im/conv3_2_1 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128]
im/conv3_2_2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128]
im/conv4_1_short -> [1, 8, 14, 14, 256] before: [1, 16, 28, 28, 128]
im/conv4_1_1 -> [1, 8, 14, 14, 256] before: [1, 16, 28, 28, 128]
im/conv4_1_2 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256]
im/conv4_2_1 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256]
im/conv4_2_2 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256]
time_stride = 1
im/conv5_1_short -> [1, 8, 7, 7, 512] before: [1, 8, 14, 14, 256]
im/conv5_1_1 -> [1, 8, 7, 7, 512] before: [1, 8, 14, 14, 256]
im/conv5_1_2 -> [1, 8, 7, 7, 512] before: [1, 8, 7, 7, 512]
im/conv5_2_1 -> [1, 8, 7, 7, 512] before: [1, 8, 7, 7, 512]
im/conv5_2_2 -> [1, 8, 7, 7, 512] before: [1, 8, 7, 7, 512]
joint/logits -> [1, 1, 1, 1, 1] before: [1, 1, 1, 1, 512]
joint/logits -> [1, 8, 7, 7, 1] before: [1, 8, 7, 7, 512]
gen/conv1 [1, 128, 1024, 2] -> [1, 128, 512, 64]
gen/conv2 [1, 128, 512, 64] -> [1, 128, 256, 128]
gen/conv3 [1, 128, 256, 128] -> [1, 64, 128, 256]
Video net before merge: [1, 16, 1, 64] After: [1, 64, 1, 64]
gen/conv4 [1, 64, 128, 320] -> [1, 32, 64, 512]
Video net before merge: [1, 16, 1, 128] After: [1, 32, 1, 128]
gen/conv5 [1, 32, 64, 640] -> [1, 16, 32, 512]
Video net before merge: [1, 8, 1, 512] After: [1, 16, 1, 512]
gen/conv6 [1, 16, 32, 1024] -> [1, 8, 16, 512]
gen/conv7 [1, 8, 16, 512] -> [1, 4, 8, 512]
gen/conv8 [1, 4, 8, 512] -> [1, 2, 4, 512]
gen/conv9 [1, 2, 4, 512] -> [1, 1, 2, 512]
gen/deconv1 [1, 1, 2, 512] -> [1, 2, 4, 512]
gen/deconv2 [1, 2, 4, 1024] -> [1, 4, 8, 512]
gen/deconv3 [1, 4, 8, 1024] -> [1, 8, 16, 512]
gen/deconv4 [1, 8, 16, 1024] -> [1, 16, 32, 512]
gen/deconv5 [1, 16, 32, 1536] -> [1, 32, 64, 512]
gen/deconv6 [1, 32, 64, 1152] -> [1, 64, 128, 256]
gen/deconv7 [1, 64, 128, 576] -> [1, 128, 256, 128]
gen/deconv8 [1, 128, 256, 256] -> [1, 128, 512, 64]
gen/fg [1, 128, 512, 128] -> [1, 128, 1024, 2]
gen/bg [1, 128, 512, 128] -> [1, 128, 1024, 2]
Restoring from: ../results/nets/sep/full/net.tf-160000
predict
samples shape: (1, 44144, 2)
samples pred shape: (1, 44144, 2)
(128, 1025)
Running on: 0
2018-06-15 13:57:18.753499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K40m, pci bus id: 0000:02:00.0, compute capability: 3.5)
bn scale: False
arg_scope train = True
sf/conv1_1 -> [1, 11036, 1, 64]
sf/conv2_1_short -> [1, 690, 1, 128]
sf/conv2_1_1 -> [1, 690, 1, 128]
sf/conv2_1_2 -> [1, 690, 1, 128]
sf/conv3_1_1 -> [1, 173, 1, 128]
sf/conv3_1_2 -> [1, 173, 1, 128]
sf/conv4_1_short -> [1, 44, 1, 256]
sf/conv4_1_1 -> [1, 44, 1, 256]
sf/conv4_1_2 -> [1, 44, 1, 256]
im/conv1 -> [1, 32, 112, 112, 64] before: [1, 63, 224, 224, 3]
pool -> [1, 32, 56, 56, 64]
im/conv2_1_1 -> [1, 32, 56, 56, 64] before: [1, 32, 56, 56, 64]
im/conv2_1_2 -> [1, 32, 56, 56, 64] before: [1, 32, 56, 56, 64]
pool -> [1, 16, 28, 28, 64]
im/conv2_2_1 -> [1, 16, 28, 28, 64] before: [1, 32, 56, 56, 64]
im/conv2_2_2 -> [1, 16, 28, 28, 64] before: [1, 16, 28, 28, 64]
frac: 2.6875
sf/conv5_1 -> [1, 16, 1, 128]
sf_net shape before merge: [1, 44, 1, 256], and after merge: [1, 16, 1, 256]
im/merge1 -> [1, 16, 28, 28, 512] before: [1, 16, 28, 28, 192]
im/merge2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 512]
im/conv3_1_1 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128]
im/conv3_1_2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128]
im/conv3_2_1 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128]
im/conv3_2_2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128]
im/conv4_1_short -> [1, 8, 14, 14, 256] before: [1, 16, 28, 28, 128]
im/conv4_1_1 -> [1, 8, 14, 14, 256] before: [1, 16, 28, 28, 128]
im/conv4_1_2 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256]
im/conv4_2_1 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256]
im/conv4_2_2 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256]
time_stride = 1
im/conv5_1_short -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 256]
im/conv5_1_1 -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 256]
im/conv5_1_2 -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 512]
im/conv5_2_1 -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 512]
im/conv5_2_2 -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 512]
joint/logits -> [1, 1, 1, 1, 1] before: [1, 1, 1, 1, 512]
joint/logits -> [1, 8, 14, 14, 1] before: [1, 8, 14, 14, 512]
bn scale: False
arg_scope train = True
sf/conv1_1 -> [1, 11036, 1, 64]
sf/conv2_1_short -> [1, 690, 1, 128]
sf/conv2_1_1 -> [1, 690, 1, 128]
sf/conv2_1_2 -> [1, 690, 1, 128]
sf/conv3_1_1 -> [1, 173, 1, 128]
sf/conv3_1_2 -> [1, 173, 1, 128]
sf/conv4_1_short -> [1, 44, 1, 256]
sf/conv4_1_1 -> [1, 44, 1, 256]
sf/conv4_1_2 -> [1, 44, 1, 256]
im/conv1 -> [1, 32, 112, 112, 64] before: [1, 63, 224, 224, 3]
pool -> [1, 32, 56, 56, 64]
im/conv2_1_1 -> [1, 32, 56, 56, 64] before: [1, 32, 56, 56, 64]
im/conv2_1_2 -> [1, 32, 56, 56, 64] before: [1, 32, 56, 56, 64]
pool -> [1, 16, 28, 28, 64]
im/conv2_2_1 -> [1, 16, 28, 28, 64] before: [1, 32, 56, 56, 64]
im/conv2_2_2 -> [1, 16, 28, 28, 64] before: [1, 16, 28, 28, 64]
frac: 2.6875
sf/conv5_1 -> [1, 16, 1, 128]
sf_net shape before merge: [1, 44, 1, 256], and after merge: [1, 16, 1, 256]
im/merge1 -> [1, 16, 28, 28, 512] before: [1, 16, 28, 28, 192]
im/merge2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 512]
im/conv3_1_1 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128]
im/conv3_1_2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128]
im/conv3_2_1 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128]
im/conv3_2_2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128]
im/conv4_1_short -> [1, 8, 14, 14, 256] before: [1, 16, 28, 28, 128]
im/conv4_1_1 -> [1, 8, 14, 14, 256] before: [1, 16, 28, 28, 128]
im/conv4_1_2 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256]
im/conv4_2_1 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256]
im/conv4_2_2 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256]
time_stride = 1
im/conv5_1_short -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 256]
im/conv5_1_1 -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 256]
im/conv5_1_2 -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 512]
im/conv5_2_1 -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 512]
im/conv5_2_2 -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 512]
joint/logits -> [1, 1, 1, 1, 1] before: [1, 1, 1, 1, 512]
joint/logits -> [1, 8, 14, 14, 1] before: [1, 8, 14, 14, 512]
Writing to: ../results/
ffmpeg -i "/tmp/ao_M0QAze.wav" -r 29.970000 -loglevel warning -safe 0 -f concat -i "/tmp/ao_cnpblR.txt" -pix_fmt yuv420p -vcodec h264 -strict -2 -y -acodec aac "../results/fg_cam_translator.mp4"
Guessed Channel Layout for Input Stream #0.0 : mono
[concat @ 0x382d700] DTS -230584300921369 < 0 out of order
[h264_v4l2m2m @ 0x385f500] Could not find a valid device
[h264_v4l2m2m @ 0x385f500] can't configure encoder
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
Traceback (most recent call last):
File "sep_video.py", line 442, in <module>
ut.make_video(full_ims, pr.fps, pj(arg.out, 'fg%s.mp4' % name), snd(full_samples_fg))
File "/multisensory-master/src/aolib/util.py", line 3169, in make_video
% (sound_flags_in, fps, input_file, sound_flags_out, flags, out_fname))
File "/multisensory-master/src/aolib/util.py", line 915, in sys_check
fail('Command failed! %s' % cmd)
File "/multisensory-master/src/aolib/util.py", line 12, in fail
def fail(s = ''): raise RuntimeError(s)
RuntimeError: Command failed! ffmpeg -i "/tmp/ao_M0QAze.wav" -r 29.970000 -loglevel warning -safe 0 -f concat -i "/tmp/ao_cnpblR.txt" -pix_fmt yuv420p -vcodec h264 -strict -2 -y -acodec aac "../results/fg_cam_translator.mp4"
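One thing I noticed in the log: for `-vcodec h264`, ffmpeg tried the `h264_v4l2m2m` encoder and could not find a valid device. My unconfirmed guess is that my ffmpeg build has no software H.264 encoder (such as libx264), so it fell back to that one. Here is a minimal sketch for checking which H.264 encoders a build offers. It assumes Python 3, ffmpeg on the PATH, and that `ffmpeg -encoders` marks those rows with a trailing `(codec h264)`. The helper name `h264_encoders` is my own and not part of the repository.

```python
import shutil
import subprocess


def h264_encoders(listing):
    """Return encoder names from `ffmpeg -encoders` text that target H.264.

    Assumption: rows look like
    " V..... libx264   <description> (codec h264)".
    """
    names = []
    for line in listing.splitlines():
        parts = line.split()
        # Second column is the encoder name; the suffix names the codec.
        if len(parts) >= 2 and line.rstrip().endswith("(codec h264)"):
            names.append(parts[1])
    return names


if __name__ == "__main__":
    if shutil.which("ffmpeg") is None:
        print("ffmpeg not found on PATH")
    else:
        # Capture the encoder table; stderr is merged in case of warnings.
        out = subprocess.run(
            ["ffmpeg", "-hide_banner", "-encoders"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        ).stdout
        print("H.264 encoders in this build:", h264_encoders(out))
```

If the printed list contains only `h264_v4l2m2m`, that would be consistent with my guess, but I have not verified that this is the actual cause.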
I would like to know what went wrong and what I should do.
Any suggestions would be appreciated! Thanks.