An AdderNet CUDA version


Training AdderNet accelerated by CUDA


cd adder_cuda
python install
cd ..


PyTorch 1.10.0, CUDA 11.3


| Version | Training time per batch (s) |
| --- | --- |
| raw | 1.61 |
| torch.cdist | 1.49 |
| cuda_unoptimized | 0.4508 |
| this work | 0.3158 |

The CUDA version of AdderNet achieves roughly a 5× speedup over the original version (1.61 s vs. 0.3158 s per batch). There seem to be some bugs in the cuda_unoptimized version that prevent the model from converging; its speed is still listed here for comparison. The experiments were run on an RTX 2080 Ti, training ResNet-20 on CIFAR-10.
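The speedup figures implied by the timing table can be checked with a few lines of Python. This is only a sketch of the arithmetic: the timings are copied from the table, while the `speedups` helper and the dictionary layout are illustrative names that do not come from the repository.

```python
# Per-batch training times in seconds, copied from the timing table.
TIMINGS = {
    "raw": 1.61,
    "torch.cdist": 1.49,
    "cuda_unoptimized": 0.4508,
    "this work": 0.3158,
}


def speedups(timings, baseline="raw"):
    """Return how many times faster each version is than the baseline."""
    base = timings[baseline]
    return {name: base / seconds for name, seconds in timings.items()}


if __name__ == "__main__":
    for name, factor in speedups(TIMINGS).items():
        print(f"{name:>18}: {factor:.2f}x faster than raw")
```

With these numbers, "this work" comes out at about 5.1× over the raw version and about 1.4× over cuda_unoptimized.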

| Time (%) | Time | Calls | Avg | Min | Max | Name |
| --- | --- | --- | --- | --- | --- | --- |
| 48.57 | 30.4752s | 3920 | 7.7743ms | 162.70us | 12.271ms | CONV_BACKWARD |
| 34.85 | 21.8686s | 19680 | 1.1112ms | 5.3770us | 11.827ms | _ZN2at6native27unrolled_elementwise_kernel... |
| 7.46 | 4.67901s | 5920 | 790.37us | 26.529us | 1.5841ms | CONV |
| 2.24 | 1.40372s | 3920 | 358.09us | 31.298us | 845.80us | col2im_kernel |
| 2.10 | 1.31882s | 36862 | 35.777us | 1.4720us | 276.24us | vectorized_elementwise_kernel |
| 1.43 | 900.03ms | 5920 | 152.03us | 7.9040us | 372.40us | im2col_kernel |

The profile above shows where the time goes when training one epoch; CONV_BACKWARD alone accounts for 48.57% of it. If you are interested, you are welcome to optimize the CUDA kernels further.
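To get a rough feel for how much further kernel optimization could pay off, Amdahl's law can be applied to the profile. The sketch below assumes that the listed percentages are representative of total training time and that only one kernel is accelerated while everything else stays unchanged. The function name `amdahl_speedup` is illustrative and not part of the repository.

```python
def amdahl_speedup(fraction, kernel_speedup):
    """Overall speedup when `fraction` of the runtime is made `kernel_speedup` times faster.

    Amdahl's law: S = 1 / ((1 - p) + p / s).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


if __name__ == "__main__":
    conv_backward_share = 0.4857  # CONV_BACKWARD row of the profile
    for s in (1.5, 2.0, 4.0):
        overall = amdahl_speedup(conv_backward_share, s)
        print(f"CONV_BACKWARD {s:.1f}x faster -> epoch {overall:.2f}x faster")
    # Upper bound if CONV_BACKWARD took no time at all.
    print(f"upper bound: {1.0 / (1.0 - conv_backward_share):.2f}x")
```

Under these assumptions, even an infinitely fast CONV_BACKWARD kernel would leave the epoch a little under 2× faster, so the elementwise kernels in the second row would also need attention for larger gains.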

  • illegal memory access was encountered

    This problem appears every time after 8 batches have run. Changing the batch size does not help; the error always occurs after 8 batches. The machine has 3 GPUs installed, and I set GPU_ID = 1.

        Files already downloaded and verified
        Files already downloaded and verified
        Train - Epoch 1, Batch: 0, Loss: 2.296886, Time 5.307902
        Train - Epoch 1, Batch: 1, Loss: 2.301040, Time 0.105161
        Train - Epoch 1, Batch: 2, Loss: 2.300776, Time 0.110913
        Train - Epoch 1, Batch: 3, Loss: 2.303986, Time 0.104652
        Train - Epoch 1, Batch: 4, Loss: 2.289750, Time 0.100140
        Train - Epoch 1, Batch: 5, Loss: 2.315252, Time 0.099318
        Train - Epoch 1, Batch: 6, Loss: 2.298506, Time 0.106323
        Train - Epoch 1, Batch: 7, Loss: 2.310294, Time 0.106855
        Traceback (most recent call last):
          File "/work/sunbiao/AdderNetCUDA-LingYeAI/", line 146, in main()
          File "/work/sunbiao/AdderNetCUDA-LingYeAI/", line 142, in main
            train_and_test(e)
          File "/work/sunbiao/AdderNetCUDA-LingYeAI/", line 135, in train_and_test
            train(epoch)
          File "/work/sunbiao/AdderNetCUDA-LingYeAI/", line 90, in train
            output = net(images)
          File "/home/nature/anaconda3/envs/addernet/lib/python3.9/site-packages/torch/nn/modules/", line 1102, in _call_impl
            return forward_call(*input, **kwargs)
          File "/work/sunbiao/AdderNetCUDA-LingYeAI/", line 83, in forward
            x = self.trans3(self.dense3(x))
          File "/home/nature/anaconda3/envs/addernet/lib/python3.9/site-packages/torch/nn/modules/", line 1102, in _call_impl
            return forward_call(*input, **kwargs)
          File "/home/nature/anaconda3/envs/addernet/lib/python3.9/site-packages/torch/nn/modules/", line 141, in forward
            input = module(input)
          File "/home/nature/anaconda3/envs/addernet/lib/python3.9/site-packages/torch/nn/modules/", line 1102, in _call_impl
            return forward_call(*input, **kwargs)
          File "/work/sunbiao/AdderNetCUDA-LingYeAI/", line 17, in forward
            y = self.conv1(func.relu(self.bn1(x)))
          File "/home/nature/anaconda3/envs/addernet/lib/python3.9/site-packages/torch/nn/modules/", line 1102, in _call_impl
            return forward_call(*input, **kwargs)
          File "/work/sunbiao/AdderNetCUDA-LingYeAI/", line 104, in forward
            output = adder2d_function(x, self.adder, self.stride, self.padding)
          File "/work/sunbiao/AdderNetCUDA-LingYeAI/", line 39, in adder2d_function
            out = out.permute(3, 0, 1, 2).contiguous()
        RuntimeError: CUDA error: an illegal memory access was encountered
        CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
        For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

    opened by tju-sun-lab 2
  •  Resnet20 based on adder_cuda seems to have difficulty converging

    I tried to train ResNet-20 for a classification task on the CIFAR-10 dataset, but with adder_cuda the network seems to have difficulty converging. I am therefore curious about the author's experimental results on CIFAR-10.

    opened by 154115081020 1


    Hello, I ran your code and got a CUDA error: an illegal memory access was encountered. The detailed information is:

        Traceback (most recent call last):
          File "/home/new/classification-CNN/AdderNetCUDA-main/", line 145, in main()
          File "/home/new/classification-CNN/AdderNetCUDA-main/", line 141, in main
            train_and_test(e)
          File "/home/new/classification-CNN/AdderNetCUDA-main/", line 134, in train_and_test
            train(epoch)
          File "/home/new/classification-CNN/AdderNetCUDA-main/", line 101, in train
            loss.backward()
          File "/home/new/anaconda3/envs/pytorch38/lib/python3.8/site-packages/torch/", line 245, in backward
            torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
          File "/home/new/anaconda3/envs/pytorch38/lib/python3.8/site-packages/torch/autograd/", line 145, in backward
            Variable._execution_engine.run_backward(
          File "/home/new/anaconda3/envs/pytorch38/lib/python3.8/site-packages/torch/autograd/", line 89, in apply
            return self._forward_cls.backward(self, *args) # type: ignore
          File "/home/new/classification-CNN/AdderNetCUDA-main/", line 78, in backward
            grad_W_col = grad_W_col/grad_W_col.norm(p=2).clamp(min=1e-12)*math.sqrt(W_col.size(1)*W_col.size(0))/5
          File "/home/new/anaconda3/envs/pytorch38/lib/python3.8/site-packages/torch/", line 401, in norm
            return torch.norm(self, p, dim, keepdim, dtype=dtype)
          File "/home/new/anaconda3/envs/pytorch38/lib/python3.8/site-packages/torch/", line 1376, in norm
            return _VF.norm(input, p, dim=_dim, keepdim=keepdim) # type: ignore
        RuntimeError: CUDA error: an illegal memory access was encountered

    Can you provide me with some solutions to this problem?

    opened by wangchangyi1160 1
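
Both illegal-memory-access reports involve asynchronously reported CUDA errors, so the line named in the traceback may not be the one that actually failed. The sketch below builds an environment for a debugging run. It sets `CUDA_LAUNCH_BLOCKING=1`, as the error message itself suggests, and restricts the process to one physical GPU with `CUDA_VISIBLE_DEVICES`. Both are standard CUDA environment variables, but the helper name `cuda_debug_env` is hypothetical, and whether pinning the device avoids the error on a multi-GPU machine is an untested assumption, not something the source confirms.

```python
import os


def cuda_debug_env(gpu_id, base_env=None):
    """Return a copy of the environment with CUDA debugging settings applied.

    The variables only take effect if they are set before the CUDA runtime
    is initialized, so use this for a fresh process or before importing torch.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Make kernel launches synchronous so errors surface at the failing call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    # Expose a single physical GPU; inside the process it appears as device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


if __name__ == "__main__":
    env = cuda_debug_env(gpu_id=1)
    print("CUDA_LAUNCH_BLOCKING =", env["CUDA_LAUNCH_BLOCKING"])
    print("CUDA_VISIBLE_DEVICES =", env["CUDA_VISIBLE_DEVICES"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching the training script. Because the selected GPU is then the only visible device, a script that previously used GPU_ID = 1 would need to address it as device 0.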
