A benchmark framework for TensorFlow

Last update: Dec 30, 2022

Related tags

Deep Learning benchmarks

Overview

TensorFlow benchmarks

This repository contains various TensorFlow benchmarks. Currently, it consists of two projects:

PerfZero: A benchmark framework for TensorFlow.
scripts/tf_cnn_benchmarks (no longer maintained): The TensorFlow CNN benchmarks contain TensorFlow 1 benchmarks for several convolutional neural networks.

If you want to run TensorFlow models and measure their performance, also consider the TensorFlow Official Models.

Comments

How can I start a benchmark with `distributed_all_reduce`?

My Env: TensorFlow: 1.3 CUDA: 8.0 cuDNN: 6.0

I noticed an update for distributed_all_reduce, so I want to give it a try. But I'm not sure what value controller_host should take. My args are:

--variable_update=distributed_all_reduce
--all_reduce_spec=pscpu:32k:xring

and I start 3 processes with these args:

FIRST:

--job_name=worker
--worker_hosts=127.0.0.1:50001,127.0.0.1:50002
--task_index=0

SECOND:

--job_name=worker
--worker_hosts=127.0.0.1:50001,127.0.0.1:50002
--task_index=1

THIRD:

--job_name=controller
--controller_host=??
--task_index=0

When I set controller_host to 127.0.0.1:50000 or 127.0.0.1:50001, I get:

TensorFlow:  1.3
Model:       resnet50
Mode:        training
SingleSess:  True
Batch size:  128 global
             64 per device
Devices:     ['job:worker/task0/gpu:0', 'job:worker/task1/gpu:0']
Data format: NCHW
Optimizer:   sgd
Variables:   distributed_all_reduce
AllReduce:   pscpu:32k:xring
Sync:        True
==========
Generating model
WARNING:tensorflow:From /home/zzy/workspace/benchmarks/scripts/tf_cnn_benchmarks/preprocessing.py:486: __init__ (from tensorflow.contrib.data.python.ops.readers) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.TFRecordDataset`.
WARNING:tensorflow:From /home/zzy/workspace/benchmarks/scripts/tf_cnn_benchmarks/preprocessing.py:487: range (from tensorflow.contrib.data.python.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.range()`.
WARNING:tensorflow:From /home/zzy/workspace/benchmarks/scripts/tf_cnn_benchmarks/preprocessing.py:489: zip (from tensorflow.contrib.data.python.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.zip()`.
2017-10-10 14:03:34.183287: E tensorflow/core/common_runtime/session.cc:69] Not found: No session factory registered for the given session options: {target: "127.0.0.1:50001" config: intra_op_parallelism_threads: 1 gpu_options { force_gpu_compatible: true } allow_soft_placement: true} Registered factories are {DIRECT_SESSION, GRPC_SESSION}.
Traceback (most recent call last):
  File "/home/zzy/workspace/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py", line 46, in <module>
    tf.app.run()
  File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/zzy/workspace/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py", line 42, in main
    bench.run()
  File "/home/zzy/workspace/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 892, in run
    return self._benchmark_cnn()
  File "/home/zzy/workspace/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1068, in _benchmark_cnn
    start_standard_services=start_standard_services) as sess:
  File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 953, in managed_session
    start_standard_services=start_standard_services)
  File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 708, in prepare_or_wait_for_session
    init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
  File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 273, in prepare_session
    config=config)
  File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 178, in _restore_checkpoint
    sess = session.Session(self._target, graph=self._graph, config=config)
  File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1482, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 622, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: No session factory registered for the given session options: {target: "127.0.0.1:50001" config: intra_op_parallelism_threads: 1 gpu_options { force_gpu_compatible: true } allow_soft_placement: true} Registered factories are {DIRECT_SESSION, GRPC_SESSION}.
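One detail in the error above may be worth checking: the session target is printed as "127.0.0.1:50001", with no scheme, while the only registered factories are DIRECT_SESSION and GRPC_SESSION. In TensorFlow 1.x a session that connects to a remote server is normally given a target of the form grpc://host:port. The sketch below is a hypothetical helper (`as_grpc_target` is not part of tf_cnn_benchmarks) that normalizes a host:port string into that form. Whether the missing scheme is the actual cause of this failure is an assumption.

```python
def as_grpc_target(address):
    """Return a gRPC session target for a "host:port" string.

    Hypothetical helper for illustration; not part of tf_cnn_benchmarks.
    """
    prefix = "grpc://"
    if address.startswith(prefix):
        return address  # already a gRPC target, leave it unchanged
    return prefix + address


# Addresses taken from the --worker_hosts flag in the post above.
for host in "127.0.0.1:50001,127.0.0.1:50002".split(","):
    print(as_grpc_target(host))
```

The helper is idempotent, so it can be applied to values that may or may not already carry the scheme.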

opened by sleepfin 27

tf_cnn_benchmarks.py does not support --data_dir with my imagenet1k tfrecords

I'm using the HEAD of both tensorflow and benchmarks. I can run the tf_cnn_benchmarks.py with synthetic data like this:

python3 tf_cnn_benchmarks.py --num_batches=100 --display_every=1 --device=cpu --data_format=NHWC --model=trivial --batch_size=64

But when I try to specify my own local data_dir of tfrecords for imagenet1k, it hangs sometime after printing "Running warm up":

python3 tf_cnn_benchmarks.py --num_batches=100 --display_every=1 --device=cpu --data_format=NHWC --model=trivial --batch_size=64 --data_dir=/n0/ryan/imagenet1k_tfrecord
TensorFlow:  1.8  
Model:       trivial
Dataset:     imagenet
Mode:        training
SingleSess:  False
Batch size:  64 global
             64.0 per device
Num batches: 100
Num epochs:  0.00 
Devices:     ['/cpu:0']
Data format: NHWC 
Layout optimizer: False
Optimizer:   sgd  
Variables:   parameter_server
==========
Generating model
W0530 13:48:44.750849 140466104280896 tf_logging.py:125] From /home/ryan/sandbox/rreece/onboarding-cerebras/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:1611: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-05-30 13:48:44.798403: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
I0530 13:48:44.929922 140466104280896 tf_logging.py:115] Running local_init_op.
I0530 13:48:50.095620 140466104280896 tf_logging.py:115] Done running local_init_op.
Running warm up

and then it hangs.

Any ideas how I can debug using my own local dataset?

I noticed these seemingly related closed issues: #150 and #176, but they do not seem to be hanging at the same place tf_cnn_benchmarks.py does for me.

Thanks!

opened by rreece 25

FP16 support in the benchmark

Hi @tfboyd, I saw that the benchmark has a --use_fp16 flag now. So do the benchmark and the latest TensorFlow support FP16 now? Can we run the test on Volta GPUs? Thanks.

opened by renganxu 24

Same build for training and validation

It would be very useful to read both training and validation data from the minibatch function. That would make it easy to evaluate on the validation set during training (for example, pausing training every 3 epochs to evaluate on validation). This is demanding, because the builds for training and validation would have to be unified.

opened by chrisrn 19

Large drop in performance with TF 1.6 and newer on ResNet50 (CPU-only)

I'm seeing a pretty significant performance regression when going from TF 1.4.1/1.5.0 to 1.6.0/1.7.0/1.8.0 using the ResNet50 benchmark included in this repository:

python tf_cnn_benchmarks.py --device cpu --batch_size=32 --model=resnet50 --variable_update=parameter_server --data_format NHWC

System details:

single node, 2x 10-core Intel Xeon E5-2660 v3 @ 2.60GHz (Intel Haswell architecture)

CentOS 7.4.1708

Python 3.6.4 (self-compiled)

TensorFlow built from source with Bazel and GCC 6.4.0 using -march=native (via EasyBuild)

Results using ResNet50 from 82dd0539c76afa8491e50d8f796e686b4d97b988 (values are reported total images/sec)

TF 1.4.1: 5.41

TF 1.5.0: 5.26

TF 1.6.0: 2.27 (2.4x slower than TF 1.4.1)

TF 1.7.0: 2.26

TF 1.8.0: 3.93 (1.4x slower than TF 1.4.1)

(I can reproduce a similar performance trend on other systems too)

While looking for a possible cause for this, I bumped into #137 which discusses performance regressions that were introduced exactly at 82dd0539c76afa8491e50d8f796e686b4d97b988 (what a coincidence?!), so I re-ran some of the tests with the commit right before that (f5d85aef2851881001130b28385795bc4c59fa38), but that pretty much shows the same trend:

TF 1.4.1: 5.14

TF 1.6.0: 2.23 (2.3x slower than TF 1.4.1)

TF 1.8.0: 3.89 (1.3x slower than TF 1.4.1)

That seems to suggest that #137 isn't relevant to what I'm seeing...

So, I tried again with current master of this repo (542d590bbd2a2740c19f196ea672451957170fc6), and then things got even weirder... It seems like current master only works with TF 1.8.0 (with 1.6.0 & older I get an ImportError on threadpool)

TF 1.8.0: 2.19 => 77% slower than with 82dd0539c76afa8491e50d8f796e686b4d97b988 or f5d85aef2851881001130b28385795bc4c59fa38

Note how the benchmark is now significantly slower with TF 1.8.0, which puts us even further away again from the 5.4 I saw with TF 1.4.1.

So, now I'm a bit confused... I see a couple of possible explanations:

The ResNet50 benchmark included in this repository is not well suited/stable enough for comparing performance across TF versions; if it's not, that's fine, but then maybe that should be clearly stated somewhere? If it's not, is there a better alternative?

Some serious performance regression was introduced in TF 1.6.0, which seems to be partially fixed in TF 1.8.0 (or not, based on results with master?)

Something is going wrong with installing TF 1.6.0 and newer from source. Given the complexity of a from-source installation of TF, this wouldn't surprise me, but I couldn't find anything that may explain the large performance differences...

The latter two don't explain the drop in performance with current master using TF 1.8.0...

Any ideas on what's going on here?
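The slowdown factors quoted above can be recomputed directly from the reported throughput. The snippet below is a small sketch that only uses the images/sec values from this post; the dictionary layout and the `slowdown` helper are illustrative choices, not part of the benchmark.

```python
# Throughput in images/sec, copied from the results above,
# keyed by the (abbreviated) benchmark commit and the TF version.
results = {
    "82dd0539": {"1.4.1": 5.41, "1.5.0": 5.26, "1.6.0": 2.27,
                 "1.7.0": 2.26, "1.8.0": 3.93},
    "f5d85aef": {"1.4.1": 5.14, "1.6.0": 2.23, "1.8.0": 3.89},
}


def slowdown(baseline, value):
    """How many times slower `value` is than `baseline` (both in images/sec)."""
    return baseline / value


for commit, by_version in results.items():
    base = by_version["1.4.1"]
    for version, images_per_sec in by_version.items():
        factor = slowdown(base, images_per_sec)
        print(f"{commit}  TF {version}: {images_per_sec:.2f} images/sec, "
              f"slowdown vs TF 1.4.1: {factor:.1f}x")
```

Dividing the baseline by the measured value keeps the factor above 1 whenever the newer version is slower, which matches how the figures are phrased in the post.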
opened by boegel 17

Add Horovod support

I propose to add Horovod support to TF benchmarks, so it's always up to date with the latest TensorFlow innovations.

Usage: --variable_update horovod [--horovod_device gpu/cpu]

opened by alsrgv 17

Running tf_cnn_benchmarks.py

Hello,

I have copied the benchmarks folder under the tensorflow directory.

(tensorflow) root@P50:/opt/DL/tensorflow# ls -all
total 28
drwxr-xr-x 6 root root 4096 oct 22 13:00 .
drwxr-xr-x 5 root root 4096 oct 22 16:53 ..
drwxr-xr-x 8 root root 4096 oct 22 13:00 benchmarks
drwxr-xr-x 2 root root 4096 oct 22 12:53 bin
drwxr-xr-x 2 root root 4096 oct 22 12:50 include
drwxr-xr-x 3 root root 4096 oct 22 12:50 lib
-rw-r--r-- 1 root root 60 oct 22 12:50 pip-selfcheck.json

When trying to run tf_cnn_benchmark I am getting this error:

(tensorflow) root@P50:/opt/DL/tensorflow/benchmarks/scripts# python3 tf_cnn_benchmarks.py --local_parameter_device=cpu --num_gpus=1 --batch_size=16 --model=inception3 --data_dir=/opt/DL/imagenet/datasets/ --variable_update=parameter_server --nodistortions
Traceback (most recent call last):
  File "tf_cnn_benchmarks.py", line 26, in <module>
    import benchmark_cnn
  File "/opt/DL/tensorflow/benchmarks/scripts/benchmark_cnn.py", line 41, in <module>
    import cnn_util
  File "/opt/DL/tensorflow/benchmarks/scripts/cnn_util.py", line 40
    print log
            ^
SyntaxError: Missing parentheses in call to 'print'
(tensorflow) root@P50:/opt/DL/tensorflow/benchmarks/scripts#

Do I need to do something else before running the benchmark?

Thank you, Florin
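The SyntaxError above is what Python 3 reports for a Python 2 style print statement: `print log` at cnn_util.py line 40 is Python 2 syntax, while the script was started with python3. The sketch below reproduces the difference without needing TensorFlow; `parses_under_this_python` is a hypothetical helper name used only for illustration.

```python
def parses_under_this_python(source):
    """Return True if `source` compiles under the running interpreter."""
    try:
        # compile() only parses the code; it does not execute it,
        # so the undefined name `log` is not a problem here.
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


print(parses_under_this_python("print log"))    # Python 2 print statement
print(parses_under_this_python("print(log)"))   # function-call form
```

Under Python 3 the first call prints False and the second prints True, so the two ways out are running that revision of the scripts with a Python 2 interpreter or using a revision of the scripts that is Python 3 compatible.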

opened by fmoo7 17

Distributed performance on better GPUs?

Thanks very much for publishing the code. With this benchmark I've seen very good GPU utilization with single-machine multi-GPU training, however I found that distributed training doesn't scale very well.

The published distributed benchmark results were only for K80s, so the communication overhead might be less of a problem there. However, a TitanX/M40 is about twice as fast as a K80, a P100 is about 4x as fast, and a V100 would be ..

In more detail:
TensorFlow version: commit d101472296f88, compiled manually (with -march=native)
Python 2.7, CUDA 8.0.44, cuDNN 5.1
GPU: 4 Tesla M40s per machine
Latency between the two machines: 0.06~0.08 ms, as given by ping
Bandwidth: 9.3 Gbit/s, as given by iperf

Speed numbers (all with resnet50, batch size 64 per GPU):
Single machine (variable_update=parameter_server): 1 GPU: 111 im/s -> 4 GPUs: 432 im/s
Two machines (variable_update=distributed_replicated): 2x4 = 8 GPUs: only 561 im/s

Hope to see some more improvements on it!
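To put the speed numbers above in terms of scaling efficiency, the throughput can be compared with ideal linear scaling from the single-GPU result. The snippet below is a sketch that uses only the figures from this post; `scaling_efficiency` is an illustrative helper, not something the benchmark reports.

```python
def scaling_efficiency(throughput, num_gpus, single_gpu_throughput):
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return throughput / (num_gpus * single_gpu_throughput)


single_gpu = 111.0  # im/s on 1 GPU, from the post

# (label, measured im/s, number of GPUs), all taken from the post.
runs = [
    ("1 machine, 4 GPUs", 432.0, 4),
    ("2 machines, 8 GPUs", 561.0, 8),
]

for label, throughput, num_gpus in runs:
    efficiency = scaling_efficiency(throughput, num_gpus, single_gpu)
    print(f"{label}: {efficiency:.0%} of linear scaling")
```

With these inputs the single-machine run reaches about 97% of linear scaling, while the two-machine run reaches about 63%.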

opened by ppwwyyxx 17

missing 1 required positional argument: 'output_types'

Hi all, I am running the TensorFlow benchmarks with the software/hardware below, but I get the error TypeError: function_buffering_resource() missing 1 required positional argument: 'output_types' when running the test with the ImageNet dataset.

Ubuntu 16.04.4 LTS (GNU/Linux 4.4.0-112-generic x86_64) nvidia-driver: 384.111
Platform: DGX-1

python3 tf_cnn_benchmarks.py --device=gpu --use_fp16=True --data_dir=/data/imagenet_tfrecord/train --data_name=imagenet --model=vgg16 --batch_size=32 --num_gpus=8

Output:

TensorFlow:  1.10
Model:       vgg16
Dataset:     imagenet
Mode:        training
SingleSess:  False
Batch size:  256 global
             32.0 per device
Num batches: 100
Num epochs:  0.02
Devices:     ['/gpu:0', '/gpu:1', '/gpu:2', '/gpu:3', '/gpu:4', '/gpu:5', '/gpu:6', '/gpu:7']
Data format: NCHW
Layout optimizer: False
Optimizer:   sgd
Variables:   parameter_server

Generating model
Traceback (most recent call last):
  File "tf_cnn_benchmarks.py", line 60, in <module>
    app.run(main)  # Raises error on invalid flags, unlike tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/absl/app.py", line 274, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.5/dist-packages/absl/app.py", line 238, in _run_main
    sys.exit(main(argv))
  File "tf_cnn_benchmarks.py", line 56, in main
    bench.run()
  File "/data/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1285, in run
    return self._benchmark_cnn()
  File "/data/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1404, in _benchmark_cnn
    self._build_model_with_dataset_prefetching())
  File "/data/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1806, in _build_model_with_dataset_prefetching
    self.cpu_device, self.params, self.devices, self.dataset)
  File "/data/benchmarks/scripts/tf_cnn_benchmarks/data_utils.py", line 58, in build_prefetch_image_processing
    shared_name=None)

TypeError: function_buffering_resource() missing 1 required positional argument: 'output_types'

Any tips?

opened by vilmara 14

collective_ops removed from tensorflow 1.7 / 1.7.1, but used in benchmarks?

I tried to run the benchmark, but there is no collective_ops in tensorflow-gpu, although it is used inside allreduce.py. What am I doing wrong? Has it indeed been removed?

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=densenet100_k24 --variable_update=parameter_server  1>densenet100_k24.txt

Traceback (most recent call last):
  File "tf_cnn_benchmarks.py", line 27, in <module>
    import benchmark_cnn
  File "XXX\scripts\tf_cnn_benchmarks\benchmark_cnn.py", line 51, in <module>
    import variable_mgr
  File "XXX\scripts\tf_cnn_benchmarks\variable_mgr.py", line 25, in <module>
    import allreduce
  File "XXX\scripts\tf_cnn_benchmarks\allreduce.py", line 28, in <module>
    from tensorflow.python.ops import collective_ops
ImportError: cannot import name 'collective_ops'
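A quick way to confirm whether the installed TensorFlow build ships the module is to try importing it by its full name, outside the benchmark. The sketch below uses only the standard library; `has_module` is a hypothetical helper, and the module path is the one from the traceback above.

```python
import importlib


def has_module(name):
    """Return True if the module `name` can be imported in this environment."""
    try:
        importlib.import_module(name)
    except ImportError:
        # Also covers ModuleNotFoundError, which subclasses ImportError.
        return False
    return True


# Module path taken from the traceback above.
print(has_module("tensorflow.python.ops.collective_ops"))
```

If this prints False for the installed TensorFlow, the benchmark revision and the TensorFlow version do not match, and one of the two would need to change.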

opened by eddr 13

Benchmark hangs for non-synthetic data

I tried to run

# VGG16 training ImageNet with 8 GPUs using arguments that optimize for
# Google Compute Engine.
python tf_cnn_benchmarks.py --local_parameter_device=cpu --num_gpus=1 \
--batch_size=32 --model=vgg16 --data_dir=/home/ubuntu/flowers \
--variable_update=parameter_server --nodistortions

And the data dir has the TF Records inside, generated with bazel as in the models/inception/data tutorial

-rw-rwx--- 1  40 May 11 11:43 labels.txt
drwxrwx--- 7 4096 May 12 11:45 train
-rw-rwx--- 1  102419300 May 11 11:43 train-00000-of-00002
-rw-rwx--- 1   99116804 May 11 11:43 train-00001-of-00002
drwxrwx--- 7  4096 May 12 11:45 validation
-rw-rwx--- 1  16058779 May 11 11:43 validation-00000-of-00002
-rw-rwx--- 1  15919237 May 11 11:43 validation-00001-of-00002

And it hangs like this:

TensorFlow:  1.1
Model:       vgg16
Mode:        training
Batch size:  32 global
             32.0 per device
Devices:     ['/gpu:0']
Data format: NCHW
Optimizer:   sgd
Variables:   parameter_server
==========
Generating model
2017-05-12 11:57:30.357629: I tensorflow/core/common_runtime/gpu/gpu_device.cc:900] Found device 0 with properties:
....
pciBusID 0002:01:00.0
Total memory: 15.89GiB
Free memory: 15.61GiB
2017-05-12 11:57:30.357680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:921] DMA: 0
2017-05-12 11:57:30.357690: I tensorflow/core/common_runtime/gpu/gpu_device.cc:931] 0:   Y
2017-05-12 11:57:30.357707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-SXM2-16GB, pci bus id: 0002:01:00.0)

But for synthetic data it works. Any idea how to fix this?
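Before debugging the input pipeline itself, it can help to confirm which record files a given directory and subset name resolve to. The sketch below is a standalone check, not the benchmark's own loading code: the `<subset>-*-of-*` pattern is an assumption inferred from the file names in the listing above (for example train-00000-of-00002), and `list_shards` is a hypothetical helper.

```python
import glob
import os


def list_shards(data_dir, subset="train"):
    """Return sorted files in data_dir named like '<subset>-00000-of-00002'.

    The naming pattern is an assumption based on the listing above.
    """
    pattern = os.path.join(data_dir, "%s-*-of-*" % subset)
    return sorted(path for path in glob.glob(pattern) if os.path.isfile(path))


# Directory taken from the --data_dir flag in the post.
shards = list_shards("/home/ubuntu/flowers")
print("found %d train shards" % len(shards))
for path in shards:
    print(path, os.path.getsize(path), "bytes")
```

An empty result, or shards of zero size, would point at the data directory rather than at the benchmark.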

opened by ghost 13

perfzero resnet benchmark is outdated

Due to refactoring in the tf models repo, the estimator benchmarks seem to have moved or no longer exist. For example, running the following command

$ python3 lib/benchmark.py \
--git_repos="https://github.com/tensorflow/models.git;benchmark" \
--python_path=models \
--benchmark_methods=official.r1.resnet.estimator_benchmark.Resnet50EstimatorBenchmarkSynth.benchmark_graph_1_gpu \
--gcloud_key_file_url=""

produces ModuleNotFoundError: No module named 'official.r1'. I've tried to change the path to something else -- using official.legacy.image_classification.resnet makes some progress -- but I haven't been able to figure it out.

opened by vladfi1 2

Alternative/current state of tf_cnn_benchmark

Hello community and devs,

a quick question from my side. I see that tf_cnn_benchmark is no longer actively maintained, and I understand that this reduces the amount of code that has to stay compatible with future TF versions. But I would like to understand whether this poses a serious problem for using the benchmark in the near future. Is the code known to be incompatible, or to fall short of the expected performance, when used with, for instance, TF 2.8?

In other words: is tf_cnn_benchmark still usable, with only the promise of continued development and maintenance missing? Or is it already outdated?

Also, the documentation points to the new TF2 models for benchmarking. Are you aware of an actual benchmark implementation based on those models that could serve as an alternative?

Would be happy to get a reply. Cheers Stefan

opened by kessel 3

How to evaluate worker performance independently on a distributed training

Hi

I'm trying to evaluate the performance of each worker independently in a cluster with multiple machines while training them using the same model. My goal is to record each worker training performance.

With every setup and config that I try, I always get the same time for all workers (probably because of synchronization issues). So, even if one of my workers is a machine that is 4x faster, it still records the same time as the slowest machine in the cluster.

Does anyone have any idea how I can do that?
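If training is synchronous, every worker waits for the slowest one at each step, so the wall-clock step time comes out the same on all of them. One way to separate the two is to time only the local computation, before the point where the worker waits for the others. The sketch below is generic Python, not tf_cnn_benchmarks code: `LocalStepTimer` and `fake_local_step` are hypothetical stand-ins, and where the synchronization point sits depends on the training setup.

```python
import time


class LocalStepTimer:
    """Records how long each call to a worker-local function takes."""

    def __init__(self):
        self.durations = []  # seconds per step, in call order

    def run(self, local_step, *args, **kwargs):
        start = time.perf_counter()
        result = local_step(*args, **kwargs)
        self.durations.append(time.perf_counter() - start)
        return result

    def mean(self):
        return sum(self.durations) / len(self.durations)


def fake_local_step(n):
    # Stand-in for the forward/backward pass on one worker.
    return sum(i * i for i in range(n))


timer = LocalStepTimer()
for _ in range(5):
    timer.run(fake_local_step, 10000)
print("mean local step time: %.6f s" % timer.mean())
```

Logging these per-worker durations next to the overall step time shows how much of each step a fast worker spends waiting rather than computing.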

opened by delucca 0

Perfzero support for Openshift on RHEL

I am trying to run perfzero on the Openshift/RHEL platform, and I get an error while building the docker image. Does perfzero support the Openshift platform? Can anyone help me with this?

opened by AkashSky5 0

resnet50 --use_fp16 error: cuDNN launch failure : input shape ([128,112,112,64])

GPU: 3090, with the following versions: Windows 10, Python 3.9.5, TensorFlow 2.5, CUDA 11.2.2 (path set), cuDNN 8.1

fp32 works:

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=resnet50 --variable_update=parameter_server

fp16 does not:

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=resnet50 --variable_update=parameter_server --use_fp16

error: Internal: cuDNN launch failure : input shape ([128,112,112,64])

opened by drnefischer 0

Accuracy is lower when the program is run with Horovod

When I run the program with

python tf_cnn_benchmarks.py --data_format=NCHW --batch_size=256 \
--model=resnet50 --optimizer=momentum --variable_update=replicated \
--nodistortions --gradient_repacking=8 --num_gpus=8 \
--num_epochs=90 --weight_decay=1e-4 --data_dir=${DATA_DIR} --use_fp16 \
--train_dir=${CKPT_DIR}

the final test accuracy is 75.96%. But when I run the program with

horovodrun -np 8 python tf_cnn_benchmarks.py --data_format=NCHW --batch_size=256 \
--model=resnet50 --optimizer=momentum --variable_update=horovod \
--nodistortions --gradient_repacking=8 --num_gpus=8 \
--num_epochs=90 --weight_decay=1e-4 --data_dir=${DATA_DIR} --use_fp16 \
--train_dir=${CKPT_DIR}

the final test accuracy is 74%. Is this a normal result, or am I running the program with Horovod incorrectly? Looking forward to your reply. Thank you.

opened by lljjgg 0

TensorFlow (v2.7.0) benchmark results on an M1 Macbook Air 2020 laptop (macOS Monterey v12.1).

ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge (ManiSkill Challenge), a large-scale learning-from-demonstrations benchmark for object manipulation.

ImageNet-CoG is a benchmark for concept generalization. It provides a full evaluation framework for pre-trained visual representations which measure how well they generalize to unseen concepts.

Deploy tensorflow graphs for fast evaluation and export to tensorflow-less environments running numpy.

TensorFlow Ranking is a library for Learning-to-Rank (LTR) techniques on the TensorFlow platform

Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

A bare-bones TensorFlow framework for Bayesian deep learning and Gaussian process approximation

KSAI Lite is a deep learning inference framework of kingsoft, based on tensorflow lite

Curvlearn, a Tensorflow based non-Euclidean deep learning framework.

Lingvo is a framework for building neural networks in Tensorflow, particularly sequence models.

A machine learning benchmark of in-the-wild distribution shifts, with data loaders, evaluators, and default models.

DeepMind Alchemy task environment: a meta-reinforcement learning benchmark

OpenMMLab Detection Toolbox and Benchmark

[ICLR 2021] HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark

Repository for Traffic Accident Benchmark for Causality Recognition (ECCV 2020)

Open-L2O: A Comprehensive and Reproducible Benchmark for Learning to Optimize Algorithms

A code repository associated with the paper A Benchmark for Rough Sketch Cleanup by Chuan Yan, David Vanderhaeghe, and Yotam Gingold from SIGGRAPH Asia 2020.