Hi,
I am testing your great package. Yesterday I tried to run one epoch on my CPU with my own dataset. It took about 16 hours, so I figured the package works but that I need some GPU acceleration. :-)
I installed OpenCL for my GeForce GT 330M (please don't laugh; I am still testing the fundamentals before scaling up). Then I added a config file to the build folder specifying which OpenCL platform and device to use, and recompiled the package with the OpenCL option enabled in ccmake. When I trained the network again, I got this error:
ERR [ Tensor::MoveToGPU(407) ] FATAL: Error moving to GPU: -4
terminate called after throwing an instance of 'std::runtime_error'
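The `-4` in that message is a raw OpenCL status code. In the Khronos `cl.h` header, `-4` is `CL_MEM_OBJECT_ALLOCATION_FAILURE`, meaning the device failed to allocate memory for a buffer. A minimal lookup sketch for decoding such codes; the table is a hand-copied subset of `cl.h`, not an exhaustive list, and `describe_cl_error` is a hypothetical helper, not part of CN24:

```python
# Hand-copied subset of OpenCL status codes from the Khronos cl.h header
# (not exhaustive; unknown codes fall through to a generic message).
CL_STATUS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -61: "CL_INVALID_BUFFER_SIZE",
}


def describe_cl_error(code: int) -> str:
    """Return the symbolic name of an OpenCL status code, if it is in the table."""
    return CL_STATUS.get(code, f"unknown OpenCL status {code}")


if __name__ == "__main__":
    # The code reported by "Error moving to GPU: -4"
    print(describe_cl_error(-4))
```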
The full output:
ruud@computer:~/CN24/cn24/build$ ./trainNetwork application/config.set application/test.net
INF [ System::Init(68) ] CN24 version 06206bfbe88095f4dca46506a7fe06ee497d6e76 refs/heads/stable
INF [ System::Init(69) ] Copyright (C) 2015 Clemens-Alexander Brust
INF [ System::Init(70) ] For licensing information, see the LICENSE file included with this project.
DBG [ System::Init(75) ] Executable path: /home/ruud/CN24/cn24/build/
INF [ System::Init(89) ] Loading config file: /home/ruud/CN24/cn24/build/config
INF [ CLHelper::Init(200) ] Using OpenCL device: GeForce GT 330M
INF [ CLHelper::Init(201) ] Image support: Yes
INF [ CLHelper::Init(202) ] Max work group size: 3752810960
INF [ CLHelper::Init(213) ] Creating OpenCL context...
INF [ CLHelper::Init(224) ] Creating OpenCL command queue...
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/crossCorrelation.cl
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/biasedConvolution.cl
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/fullConvolution.cl
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/foldWeights.cl
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/biasedMatrixVector.cl
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/biasGradient.cl
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/matrixMatrix.cl
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/maximumPooling.cl
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/nonLinearFunctions.cl
DBG [ TensorViewer::TensorViewer(57) ] Instance created.
DBG [ ConfigurableFactory::ConfigurableFactory(57) ] Adding convolutional layer to receptive field (7,7)
DBG [ ConfigurableFactory::ConfigurableFactory(64) ] Convolutional layer
DBG [ ConfigurableFactory::ConfigurableFactory(66) ] Adding maxpooling layer to receptive field (2,2)
DBG [ ConfigurableFactory::ConfigurableFactory(57) ] Adding convolutional layer to receptive field (5,5)
DBG [ ConfigurableFactory::ConfigurableFactory(57) ] Adding convolutional layer to receptive field (5,5)
DBG [ main(78) ] Optimal settings: LR: 0.0001, GM: 0.003, EX: 0.75, SB: 10, PB: 2, L1: 0.001, L2: 0.0005, MM: 0.9
INF [ main(84) ] Using fully convolutional training
DBG [ TensorStreamDataset* Conv::TensorStreamDataset::CreateFromConfiguration(328) ] Loading dataset with 6 classes
DBG [ TensorStreamDataset* Conv::TensorStreamDataset::CreateFromConfiguration(329) ] Training tensor: /home/ruud/CN24/cn24/build/pepper/pepper_train.Tensor
DBG [ TensorStreamDataset* Conv::TensorStreamDataset::CreateFromConfiguration(330) ] Testing tensor: /home/ruud/CN24/cn24/build/pepper/pepper_test.Tensor
DBG [ TensorStreamDataset::TensorStreamDataset(32) ] Instance created.
DBG [ TensorStreamDataset::TensorStreamDataset(54) ] 1 training tensors
DBG [ TensorStreamDataset::TensorStreamDataset(73) ] 1 testing tensors
DBG [ DatasetInputLayer::DatasetInputLayer(31) ] Instance created.
DBG [ DatasetInputLayer::DatasetInputLayer(41) ] Using loss sampling probability: 0.25
DBG [ DatasetInputLayer::DatasetInputLayer(47) ] Total samples: 2
DBG [ Net::AddLayer(59) ] Layer 0 output 0: (2s@1000x752x3m)
DBG [ Net::AddLayer(59) ] Layer 0 output 1: (2s@1000x752x6m)
DBG [ Net::AddLayer(59) ] Layer 0 output 2: (2s@1000x752x2m)
DBG [ Net::AddLayer(59) ] Layer 0 output 3: (2s@1000x752x1m)
DBG [ Net::AddLayer(73) ] Layer 0 added.
DBG [ Net::AddLayer(77) ] Layer 0 is OpenCL aware
DBG [ Net::AddLayer(87) ] Layer 0 added as training layer.
DBG [ ResizeLayer::ResizeLayer(23) ] Instance created, border size: (22, 22)
DBG [ Net::AddLayer(37) ] Layer 1 input: layer 0, output 0
DBG [ Net::AddLayer(59) ] Layer 1 output 0: (2s@1022x774x3m)
DBG [ Net::AddLayer(73) ] Layer 1 added.
DBG [ Net::AddLayer(77) ] Layer 1 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: convolutional kernels=16 size=7x7
DBG [ ConfigurableFactory::AddLayers(157) ] Parsed dropout fraction: 0
DBG [ ConvolutionLayer::ConvolutionLayer(48) ] Instance created. 16 output maps with 7x7 kernels.
DBG [ ConvolutionLayer::ConvolutionLayer(50) ] Dropout fraction: 0
DBG [ ConfigurableFactory::AddLayers(162) ] LLR factor: 1, RFX: 24
DBG [ Layer::SetLocalLearningRate(76) ] Setting local learning rate to 1
DBG [ Net::AddLayer(37) ] Layer 2 input: layer 1, output 0
DBG [ Net::AddLayer(59) ] Layer 2 output 0: (2s@1016x768x16m)
DBG [ ConvolutionLayer::Connect(113) ] Local learning rate is now 1
DBG [ Net::AddLayer(73) ] Layer 2 added.
DBG [ Net::AddLayer(77) ] Layer 2 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: maxpooling size=2x2
DBG [ MaxPoolingLayer::MaxPoolingLayer(22) ] Instance created: 2x2 pooling.
DBG [ Net::AddLayer(37) ] Layer 3 input: layer 2, output 0
DBG [ Net::AddLayer(59) ] Layer 3 output 0: (2s@508x384x16m)
DBG [ Net::AddLayer(73) ] Layer 3 added.
DBG [ Net::AddLayer(77) ] Layer 3 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: tanh
DBG [ TanhLayer::TanhLayer(54) ] Instance created, nl: Tanh
DBG [ Net::AddLayer(37) ] Layer 4 input: layer 3, output 0
DBG [ Net::AddLayer(59) ] Layer 4 output 0: (2s@508x384x16m)
DBG [ Net::AddLayer(73) ] Layer 4 added.
DBG [ Net::AddLayer(77) ] Layer 4 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: convolutional size=5x5 kernels=12
DBG [ ConfigurableFactory::AddLayers(157) ] Parsed dropout fraction: 0
DBG [ ConvolutionLayer::ConvolutionLayer(48) ] Instance created. 12 output maps with 5x5 kernels.
DBG [ ConvolutionLayer::ConvolutionLayer(50) ] Dropout fraction: 0
DBG [ ConfigurableFactory::AddLayers(162) ] LLR factor: 1, RFX: 24
DBG [ Layer::SetLocalLearningRate(76) ] Setting local learning rate to 1
DBG [ Net::AddLayer(37) ] Layer 5 input: layer 4, output 0
DBG [ Net::AddLayer(59) ] Layer 5 output 0: (2s@504x380x12m)
DBG [ ConvolutionLayer::Connect(113) ] Local learning rate is now 1
DBG [ Net::AddLayer(73) ] Layer 5 added.
DBG [ Net::AddLayer(77) ] Layer 5 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: tanh
DBG [ TanhLayer::TanhLayer(54) ] Instance created, nl: Tanh
DBG [ Net::AddLayer(37) ] Layer 6 input: layer 5, output 0
DBG [ Net::AddLayer(59) ] Layer 6 output 0: (2s@504x380x12m)
DBG [ Net::AddLayer(73) ] Layer 6 added.
DBG [ Net::AddLayer(77) ] Layer 6 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: convolutional size=5x5 kernels=64
DBG [ ConfigurableFactory::AddLayers(157) ] Parsed dropout fraction: 0
DBG [ ConvolutionLayer::ConvolutionLayer(48) ] Instance created. 64 output maps with 5x5 kernels.
DBG [ ConvolutionLayer::ConvolutionLayer(50) ] Dropout fraction: 0
DBG [ ConfigurableFactory::AddLayers(162) ] LLR factor: 1, RFX: 24
DBG [ Layer::SetLocalLearningRate(76) ] Setting local learning rate to 1
DBG [ Net::AddLayer(37) ] Layer 7 input: layer 6, output 0
DBG [ Net::AddLayer(59) ] Layer 7 output 0: (2s@500x376x64m)
DBG [ ConvolutionLayer::Connect(113) ] Local learning rate is now 1
DBG [ Net::AddLayer(73) ] Layer 7 added.
DBG [ Net::AddLayer(77) ] Layer 7 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: tanh
DBG [ TanhLayer::TanhLayer(54) ] Instance created, nl: Tanh
DBG [ Net::AddLayer(37) ] Layer 8 input: layer 7, output 0
DBG [ Net::AddLayer(59) ] Layer 8 output 0: (2s@500x376x64m)
DBG [ Net::AddLayer(73) ] Layer 8 added.
DBG [ Net::AddLayer(77) ] Layer 8 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: convolutional size=1x1 kernels=192
DBG [ ConfigurableFactory::AddLayers(157) ] Parsed dropout fraction: 0
DBG [ ConvolutionLayer::ConvolutionLayer(48) ] Instance created. 192 output maps with 1x1 kernels.
DBG [ ConvolutionLayer::ConvolutionLayer(50) ] Dropout fraction: 0
DBG [ ConfigurableFactory::AddLayers(162) ] LLR factor: 1, RFX: 24
DBG [ Layer::SetLocalLearningRate(76) ] Setting local learning rate to 1
DBG [ Net::AddLayer(37) ] Layer 9 input: layer 8, output 0
DBG [ Net::AddLayer(59) ] Layer 9 output 0: (2s@500x376x192m)
DBG [ ConvolutionLayer::Connect(113) ] Local learning rate is now 1
DBG [ Net::AddLayer(73) ] Layer 9 added.
DBG [ Net::AddLayer(77) ] Layer 9 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: tanh
DBG [ TanhLayer::TanhLayer(54) ] Instance created, nl: Tanh
DBG [ Net::AddLayer(37) ] Layer 10 input: layer 9, output 0
DBG [ Net::AddLayer(59) ] Layer 10 output 0: (2s@500x376x192m)
DBG [ Net::AddLayer(73) ] Layer 10 added.
DBG [ Net::AddLayer(77) ] Layer 10 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: convolutional size=1x1 kernels=6
DBG [ ConfigurableFactory::AddLayers(157) ] Parsed dropout fraction: 0
DBG [ ConvolutionLayer::ConvolutionLayer(48) ] Instance created. 6 output maps with 1x1 kernels.
DBG [ ConvolutionLayer::ConvolutionLayer(50) ] Dropout fraction: 0
DBG [ ConfigurableFactory::AddLayers(162) ] LLR factor: 1, RFX: 24
DBG [ Layer::SetLocalLearningRate(76) ] Setting local learning rate to 1
DBG [ Net::AddLayer(37) ] Layer 11 input: layer 10, output 0
DBG [ Net::AddLayer(59) ] Layer 11 output 0: (2s@500x376x6m)
DBG [ ConvolutionLayer::Connect(113) ] Local learning rate is now 1
DBG [ Net::AddLayer(73) ] Layer 11 added.
DBG [ Net::AddLayer(77) ] Layer 11 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: sigm
DBG [ SigmoidLayer::SigmoidLayer(55) ] Instance created, nl: Sigmoid
DBG [ Net::AddLayer(37) ] Layer 12 input: layer 11, output 0
DBG [ Net::AddLayer(59) ] Layer 12 output 0: (2s@500x376x6m)
DBG [ Net::AddLayer(73) ] Layer 12 added.
DBG [ Net::AddLayer(77) ] Layer 12 is OpenCL aware
DBG [ UpscaleLayer::UpscaleLayer(18) ] Instance created: 2x2 upscaling.
DBG [ Net::AddLayer(37) ] Layer 13 input: layer 12, output 0
DBG [ Net::AddLayer(59) ] Layer 13 output 0: (2s@1000x752x6m)
DBG [ Net::AddLayer(73) ] Layer 13 added.
WRN [ Net::AddLayer(79) ] Layer 13 is NOT OpenCL aware
DBG [ ConfigurableFactory::AddLayers(236) ] Added upscaling layer for FCN
DBG [ main(127) ] Output layer id: 13
DBG [ ErrorLayer::ErrorLayer(17) ] Instance created.
DBG [ Net::AddLayer(37) ] Layer 14 input: layer 13, output 0
DBG [ Net::AddLayer(37) ] Layer 14 input: layer 0, output 1
DBG [ Net::AddLayer(37) ] Layer 14 input: layer 0, output 3
DBG [ Net::AddLayer(73) ] Layer 14 added.
WRN [ Net::AddLayer(79) ] Layer 14 is NOT OpenCL aware
DBG [ Net::AddLayer(123) ] Layer 14 added as loss function layer.
DBG [ ConfusionMatrixLayer::ConfusionMatrixLayer(17) ] Instance created, 6 classes.
DBG [ Net::AddLayer(37) ] Layer 15 input: layer 13, output 0
DBG [ Net::AddLayer(37) ] Layer 15 input: layer 0, output 1
DBG [ Net::AddLayer(37) ] Layer 15 input: layer 0, output 3
DBG [ Net::AddLayer(73) ] Layer 15 added.
WRN [ Net::AddLayer(79) ] Layer 15 is NOT OpenCL aware
DBG [ Net::AddLayer(111) ] Layer 15 added as confusion matrix layer.
DBG [ ConvolutionLayer::OnLayerConnect(695) ] Updating weights: 192 -> 0
DBG [ ConvolutionLayer::OnLayerConnect(695) ] Updating weights: 64 -> 192
DBG [ ConvolutionLayer::OnLayerConnect(695) ] Updating weights: 300 -> 64
DBG [ ConvolutionLayer::OnLayerConnect(695) ] Updating weights: 400 -> 300
DBG [ ConvolutionLayer::OnLayerConnect(695) ] Updating weights: 147 -> 100
DBG [ Trainer::Trainer(22) ] Instance created
DBG [ Trainer::Trainer(37) ] Optimizing 10 sets of parameters.
DBG [ Trainer::Trainer(57) ] Weights: 40082
INF [ Trainer::Trainer(60) ] Training settings: LR: 0.0001, GM: 0.003, EX: 0.75, SB: 10, PB: 2, L1: 0.001, L2: 0.0005, MM: 0.9
INF [ main(247) ] Enter "help" for information on how to use this program
train epochs=1
DBG [ Net::SetTestOnlyStatDisabled(174) ] Confusion matrix layer disabled: 0
DBG [ Trainer::Epoch(162) ] Epoch: 0, it: 100, bsize: 20, lr0: 0.0001
ERR [ Tensor::MoveToGPU(407) ] FATAL: Error moving to GPU: -4
terminate called after throwing an instance of 'std::runtime_error'
what(): See log for details.
Aborted (core dumped)
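Because `-4` corresponds to `CL_MEM_OBJECT_ALLOCATION_FAILURE` in the OpenCL headers, a rough estimate of how much memory the layer outputs in this log need may help. The sketch below sums the output shapes printed by `Net::AddLayer(59)` above. It assumes the notation `(2s@WxHxMm)` means samples @ width x height x maps and that every element is a 4-byte float; it ignores weights, gradient buffers, and whether CN24 actually keeps all of these tensors on the GPU at once, so treat the numbers as order-of-magnitude figures only. All names in it are hypothetical, for illustration:

```python
import re

# Output shapes copied from the "Net::AddLayer(59)" lines of the log above.
# Assumed notation: (<samples>s@<width>x<height>x<maps>m).
LAYER_OUTPUTS = [
    "(2s@1000x752x3m)", "(2s@1000x752x6m)",   # layer 0 (dataset input)
    "(2s@1000x752x2m)", "(2s@1000x752x1m)",   # layer 0 (dataset input)
    "(2s@1022x774x3m)",                       # layer 1 (resize)
    "(2s@1016x768x16m)",                      # layer 2 (7x7 convolution)
    "(2s@508x384x16m)", "(2s@508x384x16m)",   # layers 3-4 (max pooling, tanh)
    "(2s@504x380x12m)", "(2s@504x380x12m)",   # layers 5-6
    "(2s@500x376x64m)", "(2s@500x376x64m)",   # layers 7-8
    "(2s@500x376x192m)", "(2s@500x376x192m)", # layers 9-10
    "(2s@500x376x6m)", "(2s@500x376x6m)",     # layers 11-12
    "(2s@1000x752x6m)",                       # layer 13 (upscale)
]

BYTES_PER_ELEMENT = 4  # assumption: tensors hold 32-bit floats

SHAPE_RE = re.compile(r"\((\d+)s@(\d+)x(\d+)x(\d+)m\)")


def tensor_elements(shape: str) -> int:
    """Number of elements in a tensor described as (Ns@WxHxMm)."""
    match = SHAPE_RE.fullmatch(shape)
    if match is None:
        raise ValueError(f"unrecognised shape: {shape!r}")
    samples, width, height, maps = (int(g) for g in match.groups())
    return samples * width * height * maps


def to_mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return n_bytes / 2**20


total_bytes = BYTES_PER_ELEMENT * sum(tensor_elements(s) for s in LAYER_OUTPUTS)
largest_shape = max(LAYER_OUTPUTS, key=tensor_elements)
largest_bytes = BYTES_PER_ELEMENT * tensor_elements(largest_shape)

if __name__ == "__main__":
    print(f"All layer outputs: {to_mib(total_bytes):.1f} MiB")
    print(f"Largest single tensor {largest_shape}: {to_mib(largest_bytes):.1f} MiB")
```

The printed totals can be compared with the `CL_DEVICE_GLOBAL_MEM_SIZE` and `CL_DEVICE_MAX_MEM_ALLOC_SIZE` values the driver reports for the GT 330M (for example via the `clinfo` tool). A smaller input resolution or fewer kernels in the 192-map layer would shrink both figures.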