All public open-source implementations of convnets benchmarks

Soumith Chintala

Last update: Dec 30, 2022

Overview

convnet-benchmarks

Easy benchmarking of all public open-source implementations of convnets. A summary is provided in the section below.

Machine: 6-core Intel Core i7-5930K CPU @ 3.50GHz + NVIDIA Titan X + Ubuntu 14.04 x86_64

Imagenet Winners Benchmarking

I picked some popular ImageNet models and clocked the time for a full forward + backward pass. The times are averaged over 10 runs. Dropout and softmax layers are ignored.
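The timing procedure described above can be sketched roughly as follows. This is only an illustration, not the benchmark's actual code: `time_passes`, `forward`, and `backward` are hypothetical names, and a real GPU benchmark would also have to synchronize the device before reading the clock.

```python
import time
from statistics import mean


def time_passes(forward, backward, runs=10):
    """Return the average (forward_ms, backward_ms, total_ms) over `runs` runs.

    `forward` and `backward` are hypothetical zero-argument callables that
    stand in for a model's forward and backward pass.
    """
    fwd_ms, bwd_ms = [], []
    for _ in range(runs):
        t0 = time.perf_counter()
        forward()
        t1 = time.perf_counter()
        backward()
        t2 = time.perf_counter()
        # Convert seconds to milliseconds, the unit used in the tables.
        fwd_ms.append((t1 - t0) * 1000.0)
        bwd_ms.append((t2 - t1) * 1000.0)
    avg_fwd, avg_bwd = mean(fwd_ms), mean(bwd_ms)
    return avg_fwd, avg_bwd, avg_fwd + avg_bwd


if __name__ == "__main__":
    # Dummy workloads, only to show the call shape.
    print(time_passes(lambda: sum(range(10000)), lambda: sum(range(20000))))
```

The total is reported as the sum of the two averages, which matches how the tables split "Time (ms)" into forward and backward columns.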

Notation

Input is described as {batch_size}x{num_filters}x{filter_width}x{filter_height}, where batch_size is the number of images used in a minibatch, num_filters is the number of channels in an image, filter_width is the width of the image, and filter_height is the height of the image.
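As a small illustration of this notation, the sketch below splits an input description such as 128x3x224x224 into named fields. The `InputShape` type, its field names, and `parse_input_shape` are hypothetical helpers introduced here, not part of the benchmark scripts.

```python
from typing import NamedTuple


class InputShape(NamedTuple):
    batch_size: int  # number of images in a minibatch
    channels: int    # called num_filters in the notation above
    width: int       # called filter_width in the notation above
    height: int      # called filter_height in the notation above


def parse_input_shape(spec: str) -> InputShape:
    """Parse a description like '128x3x224x224' into an InputShape."""
    parts = spec.lower().split("x")
    if len(parts) != 4:
        raise ValueError("expected four 'x'-separated integers, got %r" % spec)
    return InputShape(*(int(p) for p in parts))


if __name__ == "__main__":
    print(parse_input_shape("128x3x224x224"))
    # InputShape(batch_size=128, channels=3, width=224, height=224)
```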

One small note:

The CuDNN benchmarks are done using Torch bindings. One can also do the same via Caffe bindings or the bindings of any other library. This note is here to clarify that Caffe (native) and Torch (native) are the convolution kernels which are present as a default fallback. Some frameworks, such as TensorFlow and Chainer, are benchmarked with CuDNN even though this is not explicitly mentioned, so one might think that these frameworks as a whole are faster than, for example, Caffe, which might not be the case.

AlexNet (One Weird Trick paper) - Input 128x3x224x224

Library	Class	Time (ms)	forward (ms)	backward (ms)
CuDNN[R4]-fp16 (Torch)	cudnn.SpatialConvolution	71	25	46
Nervana-neon-fp16	ConvLayer	78	25	52
CuDNN[R4]-fp32 (Torch)	cudnn.SpatialConvolution	81	27	53
TensorFlow	conv2d	81	26	55
Nervana-neon-fp32	ConvLayer	87	28	58
fbfft (Torch)	fbnn.SpatialConvolution	104	31	72
Chainer	Convolution2D	177	40	136
cudaconvnet2*	ConvLayer	177	42	135
CuDNN[R2] *	cudnn.SpatialConvolution	231	70	161
Caffe (native)	ConvolutionLayer	324	121	203
Torch-7 (native)	SpatialConvolutionMM	342	132	210
CL-nn (Torch)	SpatialConvolutionMM	963	388	574
Caffe-CLGreenTea	ConvolutionLayer	1442	210	1232

Overfeat [fast] - Input 128x3x231x231

Library	Class	Time (ms)	forward (ms)	backward (ms)
Nervana-neon-fp16	ConvLayer	176	58	118
Nervana-neon-fp32	ConvLayer	211	69	141
CuDNN[R4]-fp16 (Torch)	cudnn.SpatialConvolution	242	86	156
CuDNN[R4]-fp32 (Torch)	cudnn.SpatialConvolution	268	94	174
TensorFlow	conv2d	279	90	189
fbfft (Torch)	SpatialConvolutionCuFFT	342	114	227
Chainer	Convolution2D	620	135	484
cudaconvnet2*	ConvLayer	723	176	547
CuDNN[R2] *	cudnn.SpatialConvolution	810	234	576
Caffe	ConvolutionLayer	823	355	468
Torch-7 (native)	SpatialConvolutionMM	878	379	499
CL-nn (Torch)	SpatialConvolutionMM	963	388	574
Caffe-CLGreenTea	ConvolutionLayer	2857	616	2240

OxfordNet [Model-A] - Input 64x3x224x224

Library	Class	Time (ms)	forward (ms)	backward (ms)
Nervana-neon-fp16	ConvLayer	254	82	171
Nervana-neon-fp32	ConvLayer	320	103	217
CuDNN[R4]-fp16 (Torch)	cudnn.SpatialConvolution	471	140	331
CuDNN[R4]-fp32 (Torch)	cudnn.SpatialConvolution	529	162	366
TensorFlow	conv2d	540	158	382
Chainer	Convolution2D	885	251	632
fbfft (Torch)	SpatialConvolutionCuFFT	1092	355	737
cudaconvnet2*	ConvLayer	1229	408	821
CuDNN[R2] *	cudnn.SpatialConvolution	1099	342	757
Caffe	ConvolutionLayer	1068	323	745
Torch-7 (native)	SpatialConvolutionMM	1105	350	755
CL-nn (Torch)	SpatialConvolutionMM	3437	875	2562
Caffe-CLGreenTea	ConvolutionLayer	5620	988	4632

GoogleNet V1 - Input 128x3x224x224

Library	Class	Time (ms)	forward (ms)	backward (ms)
Nervana-neon-fp16	ConvLayer	230	72	157
Nervana-neon-fp32	ConvLayer	270	84	186
TensorFlow	conv2d	445	135	310
CuDNN[R4]-fp16 (Torch)	cudnn.SpatialConvolution	462	112	349
CuDNN[R4]-fp32 (Torch)	cudnn.SpatialConvolution	470	130	340
Chainer	Convolution2D	687	189	497
Caffe	ConvolutionLayer	1935	786	1148
CL-nn (Torch)	SpatialConvolutionMM	7016	3027	3988
Caffe-CLGreenTea	ConvolutionLayer	9462	746	8716

Layer-wise Benchmarking (Last Updated April 2015)

Spatial Convolution layer (3D input 3D output, densely connected)

forward + backprop (wrt input and weights)

Original Library	Class/Function Benchmarked	Time (ms)	forward (ms)	backward (ms)
fbfft	SpatialConvolutionCuFFT	256	101	155
cuda-convnet2 *	ConvLayer	977	201	776
cuda-convnet**	pylearn2.cuda_convnet	1077	312	765
CuDNN R2 *	cudnn.SpatialConvolution	1019	269	750
Theano	CorrMM	1225	407	818
Caffe	ConvolutionLayer	1231	396	835
Torch-7	SpatialConvolutionMM	1265	418	877
DeepCL	ConvolutionLayer	6280	2648	3632
cherry-picking****	best per layer	235	79	155

This table is NOT UPDATED for TITAN-X. The numbers below were measured on a Titan Black and are here only for informational and legacy purposes.

Original Library	Class/Function Benchmarked	Time (ms)	forward (ms)	backward (ms)
Theano (experimental)***	conv2d_fft	1178	304	874
Torch-7	nn.SpatialConvolutionBHWD	1892	581	1311
ccv	ccv_convnet_layer	809+bw	809
Theano (legacy)	conv2d	70774	3833	66941

* indicates that the library was tested with Torch bindings of the specific kernels.
** indicates that the library was tested with Pylearn2 bindings.
*** This is an experimental module which uses FFT to calculate convolutions. It uses a lot of memory, according to @benanne.
**** The last row shows results obtainable when choosing the best-performing library for each layer.
L1 - Input: 128x128 Batch-size 128, Feature maps: 3->96, Kernel Size: 11x11, Stride: 1x1
L2 - Input: 64x64 Batch-size 128, Feature maps: 64->128, Kernel Size: 9x9, Stride: 1x1
L3 - Input: 32x32 Batch-size 128, Feature maps: 128->128, Kernel Size: 9x9, Stride: 1x1
L4 - Input: 16x16 Batch-size 128, Feature maps: 128->128, Kernel Size: 7x7, Stride: 1x1
L5 - Input: 13x13 Batch-size 128, Feature maps: 384->384, Kernel Size: 3x3, Stride: 1x1
The table is ranked according to the total time forward+backward calls for layers (L1 + L2 + L3 + L4 + L5)
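To make the five layer configurations easier to compare, here is a hedged sketch that records them and derives the output size and a multiply-accumulate count for each. The `ConvLayerSpec` class is hypothetical, and the output-size formula assumes no zero-padding, which the descriptions above do not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayerSpec:
    input_size: int   # square input, e.g. 128 for 128x128
    batch_size: int
    in_maps: int      # input feature maps
    out_maps: int     # output feature maps
    kernel: int       # square kernel, e.g. 11 for 11x11
    stride: int = 1

    def output_size(self) -> int:
        # ASSUMPTION: no zero-padding; the text does not specify padding.
        return (self.input_size - self.kernel) // self.stride + 1

    def macs(self) -> int:
        # Multiply-accumulates for one forward pass of a dense convolution.
        out = self.output_size()
        return (self.batch_size * out * out * self.out_maps
                * self.in_maps * self.kernel * self.kernel)


LAYERS = {
    "L1": ConvLayerSpec(128, 128, 3, 96, 11),
    "L2": ConvLayerSpec(64, 128, 64, 128, 9),
    "L3": ConvLayerSpec(32, 128, 128, 128, 9),
    "L4": ConvLayerSpec(16, 128, 128, 128, 7),
    "L5": ConvLayerSpec(13, 128, 384, 384, 3),
}

if __name__ == "__main__":
    for name, layer in LAYERS.items():
        print(name, layer.output_size(), layer.macs())
```

If the benchmarked layers do use padding, only `output_size` needs to change; the multiply-accumulate count follows from it.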

Breakdown

forward

Columns L1, L2, L3, L4, L5, Total are times in milliseconds

Original Library	Class/Function Benchmarked	L1	L2	L3	L4	L5	Total
fbfft	SpatialConvolutionCuFFT	57	27	6	2	9	101
cuda-convnet2 *	ConvLayer	36	113	40	4	8	201
cuda-convnet**	pylearn2.cuda_convnet	38	183	68	7	16	312
CuDNN R2	cudnn.SpatialConvolution	56	143	53	6	11	269
Theano	CorrMM	91	143	121	24	28	407
Caffe	ConvolutionLayer	93	136	116	24	27	396
Torch-7	nn.SpatialConvolutionMM	94	149	123	24	28	418
DeepCL	ConvolutionLayer	738	1241	518	47	104	2648
cherry-picking****	best per layer	36	27	6	2	8	79
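The cherry-picking row can be reproduced from the forward table by taking the fastest library for each layer. A minimal sketch, with the per-layer times copied from the table above (the `cherry_pick` helper is a name introduced here):

```python
# Forward times in ms for layers L1..L5, copied from the table above.
FORWARD_MS = {
    "fbfft": [57, 27, 6, 2, 9],
    "cuda-convnet2": [36, 113, 40, 4, 8],
    "cuda-convnet": [38, 183, 68, 7, 16],
    "CuDNN R2": [56, 143, 53, 6, 11],
    "Theano": [91, 143, 121, 24, 28],
    "Caffe": [93, 136, 116, 24, 27],
    "Torch-7": [94, 149, 123, 24, 28],
    "DeepCL": [738, 1241, 518, 47, 104],
}


def cherry_pick(times):
    """Return (best time per layer, sum of those best times)."""
    per_layer = zip(*times.values())  # regroup the rows by layer
    best = [min(layer_times) for layer_times in per_layer]
    return best, sum(best)


if __name__ == "__main__":
    best, total = cherry_pick(FORWARD_MS)
    print(best, total)  # [36, 27, 6, 2, 8] 79
```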

backward (gradInput + gradWeight)

Columns L1, L2, L3, L4, L5, Total are times in milliseconds

Original Library	Class/Function Benchmarked	L1	L2	L3	L4	L5	Total
fbfft	SpatialConvolutionCuFFT	76	45	12	4	18	155
cuda-convnet2 *	ConvLayer	103	467	162	15	29	776
cuda-convnet**	pylearn2.cuda_convnet	136	433	147	15	34	765
CuDNN R2	cudnn.SpatialConvolution	139	401	159	19	32	750
Theano	CorrMM	179	405	174	29	31	818
Caffe	ConvolutionLayer	200	405	172	28	30	835
Torch-7	nn.SpatialConvolutionMM	206	432	178	29	32	877
DeepCL	ConvolutionLayer	484	2144	747	59	198	3632
cherry-picking****	best per layer	76	45	12	4	18	155
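The ranking rule (sum the times over L1 to L5, then sort) can be sketched on the backward table alone. The numbers are copied from the table above; the `rank_by_total` helper is hypothetical, and ranking by backward time only is a simplification of the forward + backward ranking used in the summary table.

```python
# Backward times in ms for layers L1..L5, copied from the table above.
BACKWARD_MS = {
    "fbfft": [76, 45, 12, 4, 18],
    "cuda-convnet2": [103, 467, 162, 15, 29],
    "cuda-convnet": [136, 433, 147, 15, 34],
    "CuDNN R2": [139, 401, 159, 19, 32],
    "Theano": [179, 405, 174, 29, 31],
    "Caffe": [200, 405, 172, 28, 30],
    "Torch-7": [206, 432, 178, 29, 32],
    "DeepCL": [484, 2144, 747, 59, 198],
}


def rank_by_total(times):
    """Return (library, total_ms) pairs sorted from fastest to slowest."""
    totals = {lib: sum(layers) for lib, layers in times.items()}
    return sorted(totals.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for lib, total in rank_by_total(BACKWARD_MS):
        print("%-14s %5d" % (lib, total))
```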

Comments

Benchmark TensorFlow
Google's TensorFlow benchmarks are here!

I've run the benchmarks on the Imagenet Winners. When I saw issues with the numbers, memory etc., I emailed @Yangqing to confirm what I'm seeing, and that it is expected.

With that disclaimer out of the way, here's some things that you should know about TensorFlow (as of the pip version that I installed today):

in-place ReLU seems non-existent in practice.

Yangqing says: "right now there are little in-place operations in TensorFlow and we pretty much rely on the scheduler and the memory pool to allocate and deallocate memory"

Supports CuDNN R2. No R3 support yet; Yangqing says the next version they are going to support is likely R4.

Coming to the benchmarks:

Googlenet with batchsize 128 goes Out of Memory. The largest batch-size I could fit is 16 (tried 16, 32, 64, 128)

VGG with batchsize 64 goes Out of Memory (Edit: VGG memory issue was solved by using the BFC allocator updated by GOOG). ~~The largest batch-size I could fit is 32 (tried 32, 64).~~

I've also computed Torch7+CuDNN-R2 baselines for these batch-sizes.

AlexNet (One Weird Trick paper) - Input 128x3x224x224

| Library | Time (ms) | forward (ms) | backward (ms) |
| :-: | --: | --: | --: |
| CuDNN-R3 (Torch) | 96 | 32 | 64 |
| Nervana (Neon) | 101 | 32 | 69 |
| CuDNN-R2 (Torch) | 231 | 70 | 161 |
| TensorFlow | 326 | 96 | 230 |

Overfeat [fast] - Input 128x3x231x231

| Library | Time (ms) | forward (ms) | backward (ms) |
| :-: | --: | --: | --: |
| CuDNN-R3 (Torch) | 326 | 113 | 213 |
| fbfft (Torch) | 342 | 114 | 227 |
| CuDNN-R2 (Torch) | 810 | 234 | 576 |
| TensorFlow | 1084 | 316 | 768 |

OxfordNet [Model-A] - Input 64x3x224x224

| Library | Time (ms) | forward (ms) | backward (ms) |
| :-: | --: | --: | --: |
| Nervana | 590 | 180 | 410 |
| CuDNN-R3 (Torch) | 615 | 196 | 418 |
| CuDNN-R2 (Torch) | 1099 | 342 | 757 |
| TensorFlow | 1840 | 545 | 1295 |

GoogleNet V1 - Input 16x3x224x224

| Library | Time (ms) | forward (ms) | backward (ms) |
| :-: | --: | --: | --: |
| CuDNN-R2 (Torch) | 564 | 174 | 390 |
| TensorFlow | 590 | 54 | 536 |

Note that at batch size of 16, googlenet with CuDNN-R2 + Torch likely runs into dispatching overhead, so it's an exotic comparison, but not practically very interesting or encouraging.

There you go.

I'm assuming that the first release of TensorFlow is still quite unpolished, and that they will improve it over time with various memory and time optimizations baked in.
opened by soumith 112
[August 2015] Rejigging the marks...
With CuDNN R3 coming in, improvements to Nervana, faster Facebook kernels, and a new kid on the block called Chainer, I will be doing a minor re-run of the benchmarks to see how things have improved.

Target date: August 15th.

I am still thinking quite a lot on how to take the benchmarks forward, beyond ConvNets, beyond Images (into NLP, Video and Audio) and beyond single-GPU. If any domain experts have suggestions (especially for Audio and NLP), please do write to me.

The only thing that stopped me from doing multi-GPU benchmarks was the lack of enough frameworks to benchmark. This seems to have changed somewhat, and a decent number of frameworks now support multi-GPU, so I will plan on that.

More fun to come soon.

Checklist:

- [x] CuDNN R3
  - fp 16
  - fp 32
- [x] Nervana Neon
  - fp 16
  - fp 32
- [ ] Chainer
- [x] CL-Torch
- [x] CL-Caffe (greentea)
- [x] FB-CuNN
opened by soumith 56
Theano fft experimental version

This adds the benchmark of the experimental Theano FFT version. I also try to make it clearer that this is a work in progress and which conclusions can't be inferred.

opened by nouiz 29
Theano benchmark: Use pylearn2 only for cuda-convnet wrapper

This changes the Theano benchmark to use the Theano conv2d op directly instead of setting up an MLP in pylearn2. It fixes the problem of the experimental FFT convolution crashing for some of the backpropagation timings.

opened by f0k 16
Update Theano benchmark for latest version

Recently, the cuDNN- and gemm-based convolutions have been enabled by default in Theano. This PR updates the compile modes such that the correct versions are benchmarked again. (So there is no need to update the timings, this PR just ensures that you get the correct numbers when running with the latest Theano version.)

opened by f0k 12
Theano benchmark: Removed two lines resetting the compile mode

The Theano benchmark code had two forgotten lines resetting the compile mode halfway through the benchmark so everything run afterwards wouldn't use the GPU by default. Fixed by this PR.

opened by f0k 11
cuda-convnet2

added the relevant config files and a basic README.

looks like i have to add some low-level clocking and synchronization code inside to do layer-wise benchmarking (right now the entire network :forward is asynchronous), dropped an email to alex and he said that's the best approach as well.

opened by soumith 11
Backpropagation benchmarks for Theano

This adds benchmarks for the backward pass to the Theano benchmark suite. It pretends that whatever was in sharedY after the forward pass benchmark is already the gradient of the (not-actually-existing) cost wrt. the output, so it just does the backward step to compute the gradient wrt. the weights.

Thinking again, are you actually interested in computing the gradient wrt. the weights, or rather the gradient wrt. the input? Or both? Or should that be two separate benchmarks? Let me know and I'll update the pull request.

By the way, I've omitted the GFLOP formula as I guess you already figured it out for some of the other libraries and can just copy it over. Otherwise, FilterActs, WeightActs and ImageActs in pylearn2 also have a flops() method that should do the correct thing.

opened by f0k 10
add imagenet benchmarks for theano and lasagne

Hello,

I am interested in comparing the performance of Theano against TensorFlow and Torch. I have ported the TensorFlow implementation to Theano/Lasagne; it is mostly a line-by-line correspondence. I was slightly confused by the statistics calculated for the time measurements, so I replaced them with NumPy statistics.
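For readers who want to see what such per-batch statistics amount to, here is a minimal sketch. It is not the code from this pull request: it uses the standard-library `statistics` module instead of NumPy so that it runs without extra dependencies, and `summarize` and `report` are hypothetical names. `pstdev` is used because it computes the population standard deviation, as NumPy's `std` does by default.

```python
from statistics import mean, pstdev


def summarize(durations_s):
    """Return (mean, population standard deviation) of per-batch durations."""
    return mean(durations_s), pstdev(durations_s)


def report(label, durations_s):
    """Format a one-line summary of the timing of `label`."""
    avg, std = summarize(durations_s)
    return "%s across %d steps, %.3f +/- %.3f sec / batch" % (
        label, len(durations_s), avg, std)


if __name__ == "__main__":
    print(report("Forward", [0.101, 0.099, 0.100, 0.102]))
```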

opened by kshmelkov 8
Greentea / Caffe with OpenCL benchmarks

This adds benchmarking for https://github.com/naibaf7/caffe (see also: https://github.com/BVLC/caffe/pull/2610) It is called project Greentea and contains a complete OpenCL backend for Caffe.

Tested & found working on Fedora 22 & Ubuntu 14.04. The installation script is written so that it should work on Ubuntu 13.04 to 15.04. On Fedora 22, the packages need to be installed manually.

The biggest pitfall here is a faulty OpenCL installation and/or incompatible libraries. Make sure to install CUDA and the NVIDIA drivers (or FGLRX for AMD) correctly first.

Note that it will be quite a bit slower than Caffe with CUDA. The OpenCL backend is still a work-in-progress project. The standard compilation also uses ViennaCL-BLAS for simplicity which is often slower than AMD's clBLAS.

opened by naibaf7 8
CPU Convnet Benchmarks: Caffe vs. Torch Discrepancies (20x) on Jetson TX1 A57 CPU
Caffe is 20x faster than Torch when benchmarking the ARM Cortex A57 CPU on the NVIDIA Jetson TX1. I performed the same test on an Intel Xeon E5-2637 CPU using Caffe + openBLAS (CPU) vs. Torch + openBLAS (CPU) and the differences are fairly small (< 30% difference).

Does anyone have any tips/tricks to get the Torch CPU code to be on par with Caffe CPU code on the ARM A57?

Lua Benchmark:

Imports Caffe's bvlc_alexnet model to a nn specification in Lua using LoadCaffe (https://github.com/szagoruyko/loadcaffe).

Torch is installed using the standard installation method shown here http://torch.ch/docs/getting-started.html. OpenBLAS is detected, and I verify that there are 4 threads by looking at the number of luajit threads that are spawned whenever I call the benchmark.

'th benchmark.lua' will load the AlexNet model and measure the time it takes to perform model:forward(inputs) for some random inputs.

Test configuration as follows: model: bvlc_alexnet, batch_size = 100, input size = 3x227x227, iter = 1, threads = 4. The images per second (inference, forward pass only): 0.25 FPS (or 400,000 ms per batch of 100). My Lua benchmark code for the CPU can be downloaded here: http://homes.cs.washington.edu/~cdel/download/benchmark_A57.tgz

Caffe Benchmark:

I build Caffe with OpenBLAS, and I set OPENBLAS_NUM_THREADS = 4. GNU configure shows that OPENBLAS and Neon vector instructions are enabled on the ARM A57.

I run build/tools/caffe time --model=models/bvlc_alexnet/deploy.prototxt --iterations=1

Test configuration for Caffe on the ARM A57 CPU: bvlc_alexnet, batch_size = 100, input_size = 3x227x227, iter = 1, threads = 4. The resulting images per second (inference, forward pass only): 5.2 FPS (or 19036 ms per batch of 100).
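The relationship between the per-batch times and the FPS figures quoted in this report can be checked with a short calculation. The helper name `images_per_second` is hypothetical; the inputs are the batch size and batch times stated above.

```python
def images_per_second(batch_size: int, batch_time_ms: float) -> float:
    """Convert the time for one batch (in ms) into images per second."""
    return batch_size / (batch_time_ms / 1000.0)


torch_fps = images_per_second(100, 400_000)  # Torch on the A57: 0.25 FPS
caffe_fps = images_per_second(100, 19_036)   # Caffe on the A57: roughly 5.25 FPS
speedup = caffe_fps / torch_fps              # roughly 21x

if __name__ == "__main__":
    print("Torch: %.2f FPS, Caffe: %.2f FPS, ratio: %.1fx"
          % (torch_fps, caffe_fps, speedup))
```

The ratio of about 21x is consistent with the "20x" in the title of this report.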
opened by ghost 7
convnet-benchmark is not working with tensorflow 1.8 on AMD or Nvidia cards

If you build and install tensorflow 1.8 from https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/tree/r1.8-rocm and run the benchmark tests, you will get 0 values on both AMD and Nvidia cards with their latest public release drivers.

2018-07-09 20:30:59.599248: Forward across 100 steps, 0.000 +/- 0.000 sec / batch

Is this benchmark no longer applicable to the latest tensorflow 1.8? Is the developer going to update it to work with tensorflow 1.8 and later versions?

opened by pramenku 2

cltorch googlenet.lua: attempt to index global 'cudnn' (a nil value)

Operating System: Ubuntu 16.04.3 LTS, Linux kernel 4.13.0 GPU: AMD RX 580 ROCm backend.

~/convnet-benchmarks/cltorch$ th imagenet_winners/benchmark.lua 
libthclnn_searchpath    /storage/home/yige/torch-cl/install/lib/lua/5.1/libTHCLNN.so
Running on device: gfx803
Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: gfx803
ModelType: OverFeat[fast]       Kernels: clnn   Input shape: 128x3x231x231
clnn                                    :updateOutput():     673.86
clnn                                 :updateGradInput():     344.01
clnn                               :accGradParameters():     480.27
clnn                                           :Forward:     673.86
clnn                                          :Backward:     824.27
clnn                                             :TOTAL:    1498.13
ModelType: AlexNet      Kernels: clnn   Input shape: 128x3x224x224
clnn                                    :updateOutput():     311.44
clnn                                 :updateGradInput():     158.93
clnn                               :accGradParameters():     623.55
clnn                                           :Forward:     311.44
clnn                                          :Backward:     782.48
clnn                                             :TOTAL:    1093.92
ModelType: VGG Model-A  Kernels: clnn   Input shape: 64x3x224x224
clnn                                    :updateOutput():     671.76
clnn                                 :updateGradInput():     508.20
clnn                               :accGradParameters():    1174.33
clnn                                           :Forward:     671.76
clnn                                          :Backward:    1682.52
clnn                                             :TOTAL:    2354.29
/storage/home/yige/torch-cl/install/bin/luajit: ./imagenet_winners/googlenet.lua:33: attempt to index global 'cudnn' (a nil value)
stack traceback:
        ./imagenet_winners/googlenet.lua:33: in function <./imagenet_winners/googlenet.lua:30>
        imagenet_winners/benchmark.lua:34: in main chunk
        [C]: in function 'dofile'
        ...e/torch-cl/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00405e90

opened by yige-hu 0

worse chainer convnet-benchmarks performance on cupy-2.0.0 as compared to cupy-1.0.0.1

Hello, would you please help explain this issue? Thanks in advance. We found that convnet-benchmarks performance on cupy-2.0.0 is worse than on cupy-1.0.0.1. We don't know whether it is a problem with cupy or with the convnet-benchmarks scripts. We reported this issue in https://github.com/cupy/cupy/issues/753, but have received no response yet.

---------------------details--------------------------

Test Environment: P100

Test action:

1, install chainer
2, get convnet-benchmarks code: git clone https://github.com/mitmul/convnet-benchmarks
3, test cases

3.1: case "pip install cupy==1.0.0.1"

(py2-chainer-gpu) [sys_dltest@mlt-gpu200 chainer]$ python train_imagenet.py alexnet
('Chainer version:', '2.0.0b1')
('CuPy version:', '1.0.0.1')
('CUDA:', True)
('CUDA Version:', u'V8.0.61')
('cuDNN:', True)
('cuDNN Version:', 5110)
('Input data shape:', (128, 3, 224, 224))
('Average Forward: ', 16.15312328338623, ' ms')
('Average Backward: ', 35.27830085754395, ' ms')
('Average Total: ', 51.431424140930176, ' ms')

3.2: case "pip install cupy==2.0.0"

(py2-chainer-gpu) [sys_dltest@mlt-gpu200 chainer]$ python train_imagenet.py alexnet
('Chainer version:', '2.0.0b1')
('CuPy version:', '2.0.0')
('CUDA:', True)
('cuDNN:', True)
('cuDNN Version:', 5110)
('Input data shape:', (128, 3, 224, 224))
('Average Forward: ', 35.381299591064455, ' ms')
('Average Backward: ', 63.26389694213867, ' ms')
('Average Total: ', 98.64519653320312, ' ms')

3.3: case "pip install cupy==2.0.0rc1"

(py2-chainer-gpu) [sys_dltest@mlt-gpu200 chainer]$ python train_imagenet.py alexnet
('Chainer version:', '2.0.0b1')
('CuPy version:', '2.0.0rc1')
('CUDA:', True)
('cuDNN:', True)
('cuDNN Version:', 5110)
('Input data shape:', (128, 3, 224, 224))
('Average Forward: ', 35.5438117980957, ' ms')
('Average Backward: ', 63.336796569824216, ' ms')
('Average Total: ', 98.88060836791992, ' ms')

Note: when running the "cupy==2.0.0*" cases, you need to comment out the following lines in train_imagenet.py:

#if chainer.cuda.available:

cuda_v = cupy.cuda.compiler._get_nvcc_version().split()[-1].decode('utf-8')

print('CUDA Version:', cuda_v)

opened by mingxiaoh 2
Tensorflow benchmark files not updated after migration?

After trying to run benchmark_googlenet.py, I ran into "TypeError: Expected int32, got list containing Tensors of type '_Message' instead." After searching, I found some links suggesting that TensorFlow might no longer support the old usage of some functions, such as tf.concat.

I think the files use old functions from TensorFlow.

opened by hit1001 0
