I am on Ubuntu 16.04 with CUDA 8.0 and cuDNN 6.0.21, on a GTX 980 Ti. I attach my Caffe2 cmake summary and my compile log for reference:
compile_pytorch.txt
cmake_pytorch.txt
I cannot figure out where the problem is. Help, please!
I pass the first caffe2 tests:
- python -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"
Success
- python2 -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'
1
Then I "fail" relu_op_test.py with a strange message. I have read that such errors may be only warnings, so I am not sure whether this is a real failure.
python caffe2/python/operator_test/relu_op_test.py
No handlers could be found for logger "caffe2.python.net_drawer"
net_drawer will not run correctly. Please install the correct dependencies.
/usr/local/lib/python2.7/dist-packages/caffe2/python/hypothesis_test_util.py:75: HypothesisDeprecationWarning:
The min_satisfying_examples setting has been deprecated and disabled, due to
overlap with the filter_too_much healthcheck and poor interaction with the
max_examples setting.
verbosity=hypothesis.Verbosity.verbose))
/usr/local/lib/python2.7/dist-packages/caffe2/python/hypothesis_test_util.py:84: HypothesisDeprecationWarning:
The min_satisfying_examples setting has been deprecated and disabled, due to
overlap with the filter_too_much healthcheck and poor interaction with the
max_examples setting.
verbosity=hypothesis.Verbosity.verbose))
/usr/local/lib/python2.7/dist-packages/caffe2/python/hypothesis_test_util.py:92: HypothesisDeprecationWarning:
The min_satisfying_examples setting has been deprecated and disabled, due to
overlap with the filter_too_much healthcheck and poor interaction with the
max_examples setting.
verbosity=hypothesis.Verbosity.verbose))
E0702 22:04:28.531708 3503 init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0702 22:04:28.531721 3503 init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0702 22:04:28.531724 3503 init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([0.], dtype=float32), gc=, dc=[, device_type: 1], engine=u'')
{u'X': }
{u'X': device_type: 1
}
Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([[[[-0.25521252],
[-0.25521252],
[-0.25521252],
[-0.25521252]],
[[-0.25521252],
[-0.25521252],
[-0.25521252],
[-0.25521252]],
[[-0.25521252],
[-0.25521252],
[-0.25521252],
[-0.25521252]]],
[[[-0.25521252],
[-0.25521252],
[ 0.37402993],
[-0.25521252]],
[[-0.25521252],
[-0.25521252],
[-0.25521252],
[-0.25521252]],
[[-0.25521252],
[-0.25521252],
[-0.25521252],
[-0.25521252]]],
[[[-0.25521252],
[-0.25521252],
[ 0.6883273 ],
[-0.25521252]],
[[-0.25521252],
[-0.25521252],
[-0.25521252],
[-0.25521252]],
[[-0.25521252],
[ 0.86089224],
[-0.25521252],
[-0.25521252]]]], dtype=float32), gc=device_type: 1, dc=[, device_type: 1], engine=u'')
{u'X': }
{u'X': device_type: 1
}
Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([[-0.793503 , -0.793503 , -0.793503 ],
[-0.49327314, -0.793503 , -0.793503 ],
[-0.793503 , -0.793503 , -0.793503 ],
[-0.29144055, -0.793503 , -0.793503 ],
[-0.793503 , 0.07806086, -0.5510364 ]], dtype=float32), gc=device_type: 1, dc=[, device_type: 1], engine=u'CUDNN')
{u'X': }
I0702 22:04:28.761914 3503 operator.cc:169] Engine CUDNN is not available for operator Relu.
{u'X': device_type: 1
}
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/hypothesis/core.py", line 671, in evaluate_test_data
result = self.execute(data, collect=True)
File "/usr/local/lib/python2.7/dist-packages/hypothesis/core.py", line 586, in execute
result = self.test_runner(data, run)
File "/usr/local/lib/python2.7/dist-packages/hypothesis/executors.py", line 58, in default_new_style_executor
return function(data)
File "/usr/local/lib/python2.7/dist-packages/hypothesis/core.py", line 582, in run
return test(*args, **kwargs)
File "caffe2/python/operator_test/relu_op_test.py", line 19, in test_relu
engine=st.sampled_from(["", "CUDNN"]),
File "/usr/local/lib/python2.7/dist-packages/hypothesis/core.py", line 524, in test
result = self.test(*args, **kwargs)
File "caffe2/python/operator_test/relu_op_test.py", line 26, in test_relu
self.assertDeviceChecks(dc, op, [X], [0])
File "/usr/local/lib/python2.7/dist-packages/caffe2/python/hypothesis_test_util.py", line 350, in assertDeviceChecks
dc.CheckSimple(op, inputs, outputs_to_check, input_device_options)
File "/usr/local/lib/python2.7/dist-packages/caffe2/python/device_checker.py", line 49, in CheckSimple
workspace.RunOperatorOnce(op)
File "/usr/local/lib/python2.7/dist-packages/caffe2/python/workspace.py", line 165, in RunOperatorOnce
return C.run_operator_once(StringifyProto(operator))
RuntimeError: [enforce fail at context_gpu.h:100] status == CUDNN_STATUS_SUCCESS. 4 vs 0. , Error at: /home/tets/pytorch/caffe2/core/context_gpu.h:100: CUDNN_STATUS_INTERNAL_ERROR Error from operator:
input: "X" output: "Y" name: "" type: "Relu" device_option { device_type: 1 } engine: "CUDNN"
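To read the failing line, it may help to pull the pieces out of the enforce message. The following is a minimal sketch, not part of Caffe2: `decode_enforce`, `ENFORCE_RE` and `STATUS_NAMES` are hypothetical names, the regular expression assumes the message layout shown in the traceback above, and the table holds only the two status codes that this log itself confirms (0 and 4).

```python
import re

# Only the two codes this log itself confirms; anything else is reported as unknown.
STATUS_NAMES = {0: "CUDNN_STATUS_SUCCESS", 4: "CUDNN_STATUS_INTERNAL_ERROR"}

# Assumed layout: "[enforce fail at <file>:<line>] <condition>. <got> vs <expected>."
ENFORCE_RE = re.compile(
    r"\[enforce fail at (?P<file>[\w./]+):(?P<line>\d+)\]\s+"
    r"(?P<cond>.+?)\.\s+(?P<got>-?\d+) vs (?P<want>-?\d+)\."
)


def decode_enforce(message):
    """Return the parts of an enforce-fail message, or None if it does not match."""
    m = ENFORCE_RE.search(message)
    if m is None:
        return None
    got, want = int(m.group("got")), int(m.group("want"))
    return {
        "file": m.group("file"),
        "line": int(m.group("line")),
        "condition": m.group("cond"),
        "got": STATUS_NAMES.get(got, "unknown ({})".format(got)),
        "expected": STATUS_NAMES.get(want, "unknown ({})".format(want)),
    }


# The message from the traceback above, truncated after the status name.
msg = ("RuntimeError: [enforce fail at context_gpu.h:100] "
       "status == CUDNN_STATUS_SUCCESS. 4 vs 0. , Error at: "
       "/home/tets/pytorch/caffe2/core/context_gpu.h:100: "
       "CUDNN_STATUS_INTERNAL_ERROR Error from operator:")
info = decode_enforce(msg)
print("{} returned where {} was expected".format(info["got"], info["expected"]))
```

Read this way, "4 vs 0" means the cuDNN call returned status 4 where status 0 was required, so this looks like a real failure of the CUDNN engine rather than a warning.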
I compiled DensePose with no problem and passed the Detectron checks:
python2 $DENSEPOSE/detectron/tests/test_spatial_narrow_as_op.py
No handlers could be found for logger "caffe2.python.net_drawer"
net_drawer will not run correctly. Please install the correct dependencies.
E0702 22:22:49.913838 3731 init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0702 22:22:49.913887 3731 init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0702 22:22:49.913918 3731 init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Found Detectron ops lib: /usr/local/lib/libcaffe2_detectron_ops_gpu.so
...
----------------------------------------------------------------------
Ran 3 tests in 0.932s
OK
python2 $DENSEPOSE/detectron/tests/test_zero_even_op.py
E0702 22:24:07.535202 3750 init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0702 22:24:07.535217 3750 init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0702 22:24:07.535226 3750 init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
............
----------------------------------------------------------------------
Ran 12 tests in 0.136s
OK
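Both Detectron tests print the same E-prefixed init_intrinsics_check lines and still end with OK, which suggests those lines are notices about CPU speed rather than test failures. A rough sketch for sorting the log lines into categories follows. The names `classify` and `GLOG_RE` are hypothetical, the regular expression assumes the usual glog prefix (severity letter, month and day, time, thread id, source file and line), and treating every init_intrinsics_check.cc line as a performance notice is my assumption based on the message text.

```python
import re

# Assumed glog prefix: <I|W|E|F><mmdd> <hh:mm:ss.uuuuuu> <thread id> <file>:<line>] <message>
GLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<tid>\d+) (?P<src>[^\s:]+):(?P<line>\d+)\] (?P<msg>.*)$"
)
SEVERITY = {"I": "info", "W": "warning", "E": "error", "F": "fatal"}


def classify(line):
    """Give a rough category for one line of the output pasted in this report."""
    if line.startswith("RuntimeError:"):
        return "exception"
    m = GLOG_RE.match(line)
    if m is None:
        return "other"  # not a glog line (Python logging, test runner output, ...)
    if m.group("src") == "init_intrinsics_check.cc":
        return "performance notice"  # assumption: affects speed only
    if "enforce fail" in m.group("msg"):
        return "enforce failure"
    return SEVERITY[m.group("sev")]


samples = [
    "E0702 22:24:07.535202 3750 init_intrinsics_check.cc:43] CPU feature avx is "
    "present on your machine, but the Caffe2 binary is not compiled with it.",
    "I0702 22:04:28.761914 3503 operator.cc:169] Engine CUDNN is not available "
    "for operator Relu.",
    "Ran 12 tests in 0.136s",
]
for s in samples:
    print("{:20s} {}".format(classify(s), s[:60]))
```

With this split, the AVX/AVX2/FMA lines fall into their own category and can be set aside when looking for the line that actually stops a run.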
But when running the infer_simple.py command, I get the cryptic error message below.
~/densepose$ python2 tools/infer_simple.py --cfg configs/DensePose_ResNet101_FPN_s1x-e2e.yaml --output-dir DensePoseData/infer_out/ --image-ext jpg --wts https://s3.amazonaws.com/densepose/DensePose_ResNet101_FPN_s1x-e2e.pkl DensePoseData/demo_data/20170716_101313.jpg
/home/tets/.local/lib/python2.7/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Found Detectron ops lib: /usr/local/lib/libcaffe2_detectron_ops_gpu.so
E0702 21:59:20.453374 3391 init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0702 21:59:20.453388 3391 init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0702 21:59:20.453392 3391 init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
WARNING cnn.py: 25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
INFO net.py: 51: Loading weights from: /tmp/detectron-download-cache/DensePose_ResNet101_FPN_s1x-e2e.pkl
I0702 21:59:21.176241 3391 net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 0.000106374 secs
I0702 21:59:21.185215 3391 net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 8.9347e-05 secs
I0702 21:59:21.186816 3391 net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 1.1866e-05 secs
INFO infer_simple.py: 103: Processing DensePoseData/demo_data/20170716_101313.jpg -> DensePoseData/infer_out/20170716_101313.jpg.pdf
I0702 21:59:21.589475 3391 net_async_base.cc:435] Using specified CPU pool size: 4; NUMA node id: -1
I0702 21:59:21.589490 3391 net_async_base.cc:440] Created new CPU pool, size: 4; NUMA node id: -1
E0702 21:59:23.285727 3413 net_async_base.cc:368] [enforce fail at context_gpu.h:100] status == CUDNN_STATUS_SUCCESS. 4 vs 0. , Error at: /home/tets/pytorch/caffe2/core/context_gpu.h:100: CUDNN_STATUS_INTERNAL_ERROR Error from operator:
input: "gpu_0/data" input: "gpu_0/conv1_w" output: "gpu_0/conv1" name: "" type: "Conv" arg { name: "kernel" i: 7 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 3 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 2 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN", op Conv
WARNING workspace.py: 185: Original python traceback for operator `0` in network `generalized_rcnn` in exception above (most recent call last):
WARNING workspace.py: 190: File "tools/infer_simple.py", line 140, in <module>
WARNING workspace.py: 190: File "tools/infer_simple.py", line 91, in main
WARNING workspace.py: 190: File "/home/tets/densepose/detectron/core/test_engine.py", line 334, in initialize_model_from_cfg
WARNING workspace.py: 190: File "/home/tets/densepose/detectron/modeling/model_builder.py", line 119, in create
WARNING workspace.py: 190: File "/home/tets/densepose/detectron/modeling/model_builder.py", line 84, in generalized_rcnn
WARNING workspace.py: 190: File "/home/tets/densepose/detectron/modeling/model_builder.py", line 233, in build_generic_detection_model
WARNING workspace.py: 190: File "/home/tets/densepose/detectron/modeling/optimizer.py", line 46, in build_data_parallel_model
WARNING workspace.py: 190: File "/home/tets/densepose/detectron/modeling/model_builder.py", line 165, in _single_gpu_build_func
WARNING workspace.py: 190: File "/home/tets/densepose/detectron/modeling/FPN.py", line 55, in add_fpn_ResNet101_conv5_body
WARNING workspace.py: 190: File "/home/tets/densepose/detectron/modeling/FPN.py", line 96, in add_fpn_onto_conv_body
WARNING workspace.py: 190: File "/home/tets/densepose/detectron/modeling/ResNet.py", line 40, in add_ResNet101_conv5_body
WARNING workspace.py: 190: File "/home/tets/densepose/detectron/modeling/ResNet.py", line 90, in add_ResNet_convX_body
WARNING workspace.py: 190: File "/home/tets/densepose/detectron/modeling/ResNet.py", line 243, in basic_bn_stem
WARNING workspace.py: 190: File "/usr/local/lib/python2.7/dist-packages/caffe2/python/cnn.py", line 97, in Conv
WARNING workspace.py: 190: File "/usr/local/lib/python2.7/dist-packages/caffe2/python/brew.py", line 107, in scope_wrapper
WARNING workspace.py: 190: File "/usr/local/lib/python2.7/dist-packages/caffe2/python/helpers/conv.py", line 186, in conv
WARNING workspace.py: 190: File "/usr/local/lib/python2.7/dist-packages/caffe2/python/helpers/conv.py", line 139, in _ConvBase
Traceback (most recent call last):
File "tools/infer_simple.py", line 140, in <module>
main(args)
File "tools/infer_simple.py", line 109, in main
model, im, None, timers=timers
File "/home/tets/densepose/detectron/core/test.py", line 58, in im_detect_all
model, im, cfg.TEST.SCALE, cfg.TEST.MAX_SIZE, boxes=box_proposals
File "/home/tets/densepose/detectron/core/test.py", line 158, in im_detect_bbox
workspace.RunNet(model.net.Proto().name)
File "/usr/local/lib/python2.7/dist-packages/caffe2/python/workspace.py", line 217, in RunNet
StringifyNetName(name), num_iter, allow_fail,
File "/usr/local/lib/python2.7/dist-packages/caffe2/python/workspace.py", line 178, in CallWithExceptionIntercept
return func(*args, **kwargs)
RuntimeError: [enforce fail at context_gpu.h:100] status == CUDNN_STATUS_SUCCESS. 4 vs 0. , Error at: /home/tets/pytorch/caffe2/core/context_gpu.h:100: CUDNN_STATUS_INTERNAL_ERROR Error from operator:
input: "gpu_0/data" input: "gpu_0/conv1_w" output: "gpu_0/conv1" name: "" type: "Conv" arg { name: "kernel" i: 7 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 3 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 2 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
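The two failures (the Relu test and this Conv) can be compared by pulling a few fields out of the operator dumps that Caffe2 appends to the error. This is only a sketch under assumptions: `op_summary` is a hypothetical helper, it uses plain regular expressions on the one-line dump rather than a protobuf parser, and the Conv dump is shortened to the fields it reads.

```python
import re


def op_summary(dump):
    """Extract type, engine and device_type from a one-line operator dump."""
    def field(pattern):
        m = re.search(pattern, dump)
        return m.group(1) if m else None
    return {
        # \b keeps 'type:' from matching inside 'device_type:'
        "type": field(r'\btype: "([^"]*)"'),
        "engine": field(r'\bengine: "([^"]*)"'),
        "device_type": field(r"\bdevice_type: (\d+)"),
    }


relu_dump = ('input: "X" output: "Y" name: "" type: "Relu" '
             'device_option { device_type: 1 } engine: "CUDNN"')
# Shortened copy of the Conv dump above (the arg { ... } entries are left out).
conv_dump = ('input: "gpu_0/data" input: "gpu_0/conv1_w" output: "gpu_0/conv1" '
             'name: "" type: "Conv" '
             'device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"')

relu, conv = op_summary(relu_dump), op_summary(conv_dump)
shared = {k: v for k, v in relu.items() if conv[k] == v}
print("shared fields: {}".format(shared))
```

Both operators differ in type but share engine "CUDNN" and device_type 1. That narrows the problem to the cuDNN path on the GPU, although it does not by itself say why cuDNN reports an internal error.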
My NVIDIA setup for reference:
Mon Jul 2 22:20:18 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980 Ti  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   50C    P0    61W / 250W |    243MiB /  6077MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1141      G   /usr/lib/xorg/Xorg                            17MiB |
|    0      1447      G   /usr/lib/xorg/Xorg                           101MiB |
|    0      1917      G   /usr/bin/gnome-shell                          93MiB |
+-----------------------------------------------------------------------------+