A benchmark framework for TensorFlow

Overview

TensorFlow benchmarks

This repository contains various TensorFlow benchmarks. Currently, it consists of two projects:

  1. PerfZero: A benchmark framework for TensorFlow.

  2. scripts/tf_cnn_benchmarks (no longer maintained): The TensorFlow CNN benchmarks contain TensorFlow 1 benchmarks for several convolutional neural networks.

If you want to run TensorFlow models and measure their performance, also consider the TensorFlow Official Models.

Comments
  • How can I start a benchmark with `distributed_all_reduce` ?

    How can I start a benchmark with `distributed_all_reduce` ?

    My environment: TensorFlow 1.3, CUDA 8.0, cuDNN 6.0

    I noticed an update for distributed_all_reduce, so I want to try it. But I'm not sure what value controller_host should take... My args are:

    --variable_update=distributed_all_reduce
    --all_reduce_spec=pscpu:32k:xring
    

    and I start three processes with the following args. FIRST:

    --job_name=worker
    --worker_hosts=127.0.0.1:50001,127.0.0.1:50002
    --task_index=0
    

    SECOND:

    --job_name=worker
    --worker_hosts=127.0.0.1:50001,127.0.0.1:50002
    --task_index=1
    

    THIRD:

    --job_name=controller
    --controller_host=??
    --task_index=0
    

    When I put 127.0.0.1:50000 or 127.0.0.1:50001 as controller_host, I get:

    TensorFlow:  1.3
    Model:       resnet50
    Mode:        training
    SingleSess:  True
    Batch size:  128 global
                 64 per device
    Devices:     ['job:worker/task0/gpu:0', 'job:worker/task1/gpu:0']
    Data format: NCHW
    Optimizer:   sgd
    Variables:   distributed_all_reduce
    AllReduce:   pscpu:32k:xring
    Sync:        True
    ==========
    Generating model
    WARNING:tensorflow:From /home/zzy/workspace/benchmarks/scripts/tf_cnn_benchmarks/preprocessing.py:486: __init__ (from tensorflow.contrib.data.python.ops.readers) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.data.TFRecordDataset`.
    WARNING:tensorflow:From /home/zzy/workspace/benchmarks/scripts/tf_cnn_benchmarks/preprocessing.py:487: range (from tensorflow.contrib.data.python.ops.dataset_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.data.Dataset.range()`.
    WARNING:tensorflow:From /home/zzy/workspace/benchmarks/scripts/tf_cnn_benchmarks/preprocessing.py:489: zip (from tensorflow.contrib.data.python.ops.dataset_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.data.Dataset.zip()`.
    2017-10-10 14:03:34.183287: E tensorflow/core/common_runtime/session.cc:69] Not found: No session factory registered for the given session options: {target: "127.0.0.1:50001" config: intra_op_parallelism_threads: 1 gpu_options { force_gpu_compatible: true } allow_soft_placement: true} Registered factories are {DIRECT_SESSION, GRPC_SESSION}.
    Traceback (most recent call last):
      File "/home/zzy/workspace/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py", line 46, in <module>
        tf.app.run()
      File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
        _sys.exit(main(_sys.argv[:1] + flags_passthrough))
      File "/home/zzy/workspace/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py", line 42, in main
        bench.run()
      File "/home/zzy/workspace/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 892, in run
        return self._benchmark_cnn()
      File "/home/zzy/workspace/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1068, in _benchmark_cnn
        start_standard_services=start_standard_services) as sess:
      File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/contextlib.py", line 17, in __enter__
        return self.gen.next()
      File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
        self.stop(close_summary_writer=close_summary_writer)
      File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
        stop_grace_period_secs=self._stop_grace_secs)
      File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
        six.reraise(*self._exc_info_to_raise)
      File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 953, in managed_session
        start_standard_services=start_standard_services)
      File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 708, in prepare_or_wait_for_session
        init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
      File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 273, in prepare_session
        config=config)
      File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 178, in _restore_checkpoint
        sess = session.Session(self._target, graph=self._graph, config=config)
      File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1482, in __init__
        super(Session, self).__init__(target, graph, config=config)
      File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 622, in __init__
        self._session = tf_session.TF_NewDeprecatedSession(opts, status)
      File "/home/zzy/anaconda2/envs/tf-1.3/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
        c_api.TF_GetCode(self.status.status))
    tensorflow.python.framework.errors_impl.NotFoundError: No session factory registered for the given session options: {target: "127.0.0.1:50001" config: intra_op_parallelism_threads: 1 gpu_options { force_gpu_compatible: true } allow_soft_placement: true} Registered factories are {DIRECT_SESSION, GRPC_SESSION}.
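
    For reference, a hedged sketch (not from the original report) of one possible launch layout, assuming the controller gets its own free port (50000 here) that is not listed in --worker_hosts, and that every process receives both --worker_hosts and --controller_host:

    # worker 0
    python tf_cnn_benchmarks.py --variable_update=distributed_all_reduce \
        --all_reduce_spec=pscpu:32k:xring --job_name=worker \
        --worker_hosts=127.0.0.1:50001,127.0.0.1:50002 \
        --controller_host=127.0.0.1:50000 --task_index=0
    # worker 1: identical flags except --task_index=1
    # controller
    python tf_cnn_benchmarks.py --variable_update=distributed_all_reduce \
        --all_reduce_spec=pscpu:32k:xring --job_name=controller \
        --worker_hosts=127.0.0.1:50001,127.0.0.1:50002 \
        --controller_host=127.0.0.1:50000 --task_index=0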
    
    
    opened by sleepfin 27
  • tf_cnn_benchmarks.py does not support --data_dir with my imagenet1k tfrecords

    tf_cnn_benchmarks.py does not support --data_dir with my imagenet1k tfrecords

    I'm using the HEAD of both tensorflow and benchmarks. I can run the tf_cnn_benchmarks.py with synthetic data like this:

    python3 tf_cnn_benchmarks.py --num_batches=100 --display_every=1 --device=cpu --data_format=NHWC --model=trivial --batch_size=64
    

    But when I try to specify my own local data_dir of tfrecords for imagenet1k, it hangs sometime after printing "Running warm up":

    python3 tf_cnn_benchmarks.py --num_batches=100 --display_every=1 --device=cpu --data_format=NHWC --model=trivial --batch_size=64 --data_dir=/n0/ryan/imagenet1k_tfrecord
    TensorFlow:  1.8  
    Model:       trivial
    Dataset:     imagenet
    Mode:        training
    SingleSess:  False
    Batch size:  64 global
                 64.0 per device
    Num batches: 100
    Num epochs:  0.00 
    Devices:     ['/cpu:0']
    Data format: NHWC 
    Layout optimizer: False
    Optimizer:   sgd  
    Variables:   parameter_server
    ==========
    Generating model
    W0530 13:48:44.750849 140466104280896 tf_logging.py:125] From /home/ryan/sandbox/rreece/onboarding-cerebras/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:1611: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please switch to tf.train.MonitoredTrainingSession
    2018-05-30 13:48:44.798403: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
    I0530 13:48:44.929922 140466104280896 tf_logging.py:115] Running local_init_op.
    I0530 13:48:50.095620 140466104280896 tf_logging.py:115] Done running local_init_op.
    Running warm up
    

    and then it hangs.

    Any ideas how I can debug using my own local dataset?

    I noticed these seemingly related closed issues: #150 and #176, but they do not seem to be hanging at the same place tf_cnn_benchmarks.py does for me.

    Thanks!
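
    As a hedged first debugging step (not part of the original report), one can check outside the benchmark that the TFRecord shards open and parse at all; the path and file pattern below are taken from the command above and may need adjusting:

    # Minimal sanity check for the ImageNet TFRecord shards (assumes TF 1.x APIs).
    import glob
    import tensorflow as tf

    files = glob.glob('/n0/ryan/imagenet1k_tfrecord/train-*')
    print('found %d shards' % len(files))

    dataset = tf.data.TFRecordDataset(files)
    iterator = dataset.make_one_shot_iterator()
    next_record = iterator.get_next()

    with tf.Session() as sess:
        # Parse a few records to confirm they are valid tf.train.Example protos.
        for _ in range(3):
            example = tf.train.Example.FromString(sess.run(next_record))
            print(sorted(example.features.feature.keys()))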

    opened by rreece 25
  • FP16 support in the benchmark

    FP16 support in the benchmark

    Hi @tfboyd, I saw the benchmark has a --use_fp16 flag now. So do the benchmark and the latest TensorFlow support FP16 now? Can we run the test on Volta GPUs? Thanks.
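
    For context, a hedged example of how the flag is passed (the model, batch size, and GPU count are illustrative, not from the original question):

    python tf_cnn_benchmarks.py --num_gpus=8 --batch_size=64 --model=resnet50 \
        --variable_update=replicated --use_fp16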

    opened by renganxu 24
  • Same build for training and validation

    Same build for training and validation

    It would be very useful to read both training and validation data from the minibatch function. It is much more flexible to evaluate on the validation set during training (for example, pause training every 3 epochs and evaluate on validation). This is demanding because the graph build for training and validation would need to be unified.

    opened by chrisrn 19
  • large drop in performance with TF 1.6 and newer on ResNet50 (CPU-only)

    large drop in performance with TF 1.6 and newer on ResNet50 (CPU-only)

    I'm seeing a pretty significant performance regression when going from TF 1.4.1/1.5.0 to 1.6.0/1.7.0/1.8.0 using the ResNet50 benchmark included in this repository:

    python tf_cnn_benchmarks.py --device cpu --batch_size=32 --model=resnet50 --variable_update=parameter_server --data_format NHWC
    

    System details:

    • single node, 2x 10-core Intel Xeon E5-2660 v3 @ 2.60GHz (Intel Haswell architecture)
    • CentOS 7.4.1708
    • Python 3.6.4 (self-compiled)
    • TensorFlow built from source with Bazel and GCC 6.4.0 using -march=native (via EasyBuild)

    Results using ResNet50 from 82dd0539c76afa8491e50d8f796e686b4d97b988 (values are reported total images/sec)

    • TF 1.4.1: 5.41
    • TF 1.5.0: 5.26
    • TF 1.6.0: 2.27 (2.4x slower than TF 1.4.1)
    • TF 1.7.0: 2.26
    • TF 1.8.0: 3.93 (1.4x slower than TF 1.4.1)

    (I can reproduce a similar performance trend on other systems too)

    While looking for a possible cause for this, I bumped into #137 which discusses performance regressions that were introduced exactly at 82dd0539c76afa8491e50d8f796e686b4d97b988 (what a coincidence?!), so I re-ran some of the tests with the commit right before that (f5d85aef2851881001130b28385795bc4c59fa38), but that pretty much shows the same trend:

    • TF 1.4.1: 5.14
    • TF 1.6.0: 2.23 (2.3x slower than TF 1.4.1)
    • TF 1.8.0: 3.89 (1.3x slower than TF 1.4.1)

    That seems to suggest that #137 isn't relevant to what I'm seeing...

    So, I tried again with current master of this repo (542d590bbd2a2740c19f196ea672451957170fc6), and then things got even weirder... It seems like current master only works with TF 1.8.0 (with 1.6.0 & older I get an ImportError on threadpool)

    • TF 1.8.0: 2.19 => 77s slower than with 82dd0539c76afa8491e50d8f796e686b4d97b988 or f5d85aef2851881001130b28385795bc4c59fa38

    Note how the benchmark is now significantly slower with TF 1.8.0, which puts us even further away again from the 5.4 I saw with TF 1.4.1.

    So, now I'm a bit confused... I see a couple of possible explanations:

    • The ResNet50 benchmark included in this repository is not well suited/stable enough for comparing performance across TF versions; if it's not, that's fine, but then maybe that should be clearly stated somewhere? If it's not, is there a better alternative?
    • Some serious performance regression was introduced in TF 1.6.0, which seems to be partially fixed in TF 1.8.0 (or not, based on results with master?)
    • Something is going wrong with installing TF 1.6.0 & newer from source. Given the complexity of a from-source installation of TF, this wouldn't surprise me, but I couldn't find anything that might explain the large performance differences...

    The latter two don't explain the drop in performance with current master using TF 1.8.0...

    Any ideas on what's going on here?

    opened by boegel 17
  • Add Horovod support

    Add Horovod support

    I propose to add Horovod support to TF benchmarks, so it's always up to date with the latest TensorFlow innovations.

    Usage: --variable_update horovod [--horovod_device gpu/cpu]
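
    A hedged example invocation (the model and batch size are illustrative; only --variable_update=horovod and --horovod_device come from the proposal):

    horovodrun -np 8 python tf_cnn_benchmarks.py --model=resnet50 --batch_size=64 \
        --variable_update=horovod --horovod_device=gpu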

    cla: yes 
    opened by alsrgv 17
  • Running tf_cnn_benchmarks.py

    Running tf_cnn_benchmarks.py

    Hello,

    I have copied the benchmarks folder under the tensorflow directory.

    (tensorflow) root@P50:/opt/DL/tensorflow# ls -all
    total 28
    drwxr-xr-x 6 root root 4096 oct 22 13:00 .
    drwxr-xr-x 5 root root 4096 oct 22 16:53 ..
    drwxr-xr-x 8 root root 4096 oct 22 13:00 benchmarks
    drwxr-xr-x 2 root root 4096 oct 22 12:53 bin
    drwxr-xr-x 2 root root 4096 oct 22 12:50 include
    drwxr-xr-x 3 root root 4096 oct 22 12:50 lib
    -rw-r--r-- 1 root root 60 oct 22 12:50 pip-selfcheck.json

    When trying to run tf_cnn_benchmarks I am getting this error:

    (tensorflow) root@P50:/opt/DL/tensorflow/benchmarks/scripts# python3 tf_cnn_benchmarks.py --local_parameter_device=cpu --num_gpus=1 --batch_size=16 --model=inception3 --data_dir=/opt/DL/imagenet/datasets/ --variable_update=parameter_server --nodistortions
    Traceback (most recent call last):
      File "tf_cnn_benchmarks.py", line 26, in <module>
        import benchmark_cnn
      File "/opt/DL/tensorflow/benchmarks/scripts/benchmark_cnn.py", line 41, in <module>
        import cnn_util
      File "/opt/DL/tensorflow/benchmarks/scripts/cnn_util.py", line 40
        print log
            ^
    SyntaxError: Missing parentheses in call to 'print'

    Do I need to do something else before running the benchmark?

    Thank you, Florin

    opened by fmoo7 17
  • Distributed performance on better GPUs?

    Distributed performance on better GPUs?

    Thanks very much for publishing the code. With this benchmark I've seen very good GPU utilization with single-machine multi-GPU training; however, I found that distributed training doesn't scale very well.

    The published distributed benchmark results were only on K80s, so the communication overhead might be less of a problem there. However, a TitanX/M40 is about twice as fast, a P100 is about 4x faster, and a V100 would be ..

    In more detail:

    • TensorFlow version: commit d101472296f88, compiled manually (with -march=native)
    • Python 2.7, CUDA 8.0.44, cuDNN 5.1
    • GPU: 4 Tesla M40s per machine
    • Latency between the two machines: 0.06~0.08 ms (measured by ping)
    • Bandwidth: 9.3 Gbit/s (measured by iperf)

    Speed numbers (all with resnet50, batch 64 per GPU):

    • Single machine (variable_update=parameter_server): 1 GPU: 111 im/s -> 4 GPUs: 432 im/s
    • Two machines (variable_update=distributed_replicated): 2x4 = 8 GPUs: only 561 im/s

    Hope to see some more improvements on it!

    opened by ppwwyyxx 17
  • missing 1 required positional argument: 'output_types'

    missing 1 required positional argument: 'output_types'

    Hi all, I am running the TensorFlow benchmarks with the software/hardware below, but I get the error TypeError: function_buffering_resource() missing 1 required positional argument: 'output_types' when running the test with the ImageNet dataset.

    • Ubuntu 16.04.4 LTS (GNU/Linux 4.4.0-112-generic x86_64)
    • NVIDIA driver: 384.111
    • Platform: DGX-1

    python3 tf_cnn_benchmarks.py --device=gpu --use_fp16=True --data_dir=/data/imagenet_tfrecord/train --data_name=imagenet --model=vgg16 --batch_size=32 --num_gpus=8

    Output:

    TensorFlow:  1.10
    Model:       vgg16
    Dataset:     imagenet
    Mode:        training
    SingleSess:  False
    Batch size:  256 global
                 32.0 per device
    Num batches: 100
    Num epochs:  0.02
    Devices:     ['/gpu:0', '/gpu:1', '/gpu:2', '/gpu:3', '/gpu:4', '/gpu:5', '/gpu:6', '/gpu:7']
    Data format: NCHW
    Layout optimizer: False
    Optimizer:   sgd
    Variables:   parameter_server

    Generating model
    Traceback (most recent call last):
      File "tf_cnn_benchmarks.py", line 60, in <module>
        app.run(main)  # Raises error on invalid flags, unlike tf.app.run()
      File "/usr/local/lib/python3.5/dist-packages/absl/app.py", line 274, in run
        _run_main(main, args)
      File "/usr/local/lib/python3.5/dist-packages/absl/app.py", line 238, in _run_main
        sys.exit(main(argv))
      File "tf_cnn_benchmarks.py", line 56, in main
        bench.run()
      File "/data/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1285, in run
        return self._benchmark_cnn()
      File "/data/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1404, in _benchmark_cnn
        self._build_model_with_dataset_prefetching())
      File "/data/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1806, in _build_model_with_dataset_prefetching
        self.cpu_device, self.params, self.devices, self.dataset)
      File "/data/benchmarks/scripts/tf_cnn_benchmarks/data_utils.py", line 58, in build_prefetch_image_processing
        shared_name=None)
    TypeError: function_buffering_resource() missing 1 required positional argument: 'output_types'

    Any tips?

    opened by vilmara 14
  • collective_ops removed from tensorflow 1.7 / 1.7.1, but used in benchmarks?

    collective_ops removed from tensorflow 1.7 / 1.7.1, but used in benchmarks?

    Hi

    Tried to run, but there is no collective_ops in tensorflow-gpu. It is used inside allreduce.py. What am I doing wrong? Is it removed indeed?

    python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=densenet100_k24 --variable_update=parameter_server  1>densenet100_k24.txt
    
    Traceback (most recent call last):
      File "tf_cnn_benchmarks.py", line 27, in <module>
        import benchmark_cnn
      File "XXX\scripts\tf_cnn_benchmarks\benchmark_cnn.py", line 51, in <module>
        import variable_mgr
      File "XXX\scripts\tf_cnn_benchmarks\variable_mgr.py", line 25, in <module>
        import allreduce
      File "XXX\scripts\tf_cnn_benchmarks\allreduce.py", line 28, in <module>
        from tensorflow.python.ops import collective_ops
    ImportError: cannot import name 'collective_ops'
    
    opened by eddr 13
  • Benchmark hangs for non-synthetic data

    Benchmark hangs for non-synthetic data

    I tried to run

    # VGG16 training ImageNet with 8 GPUs using arguments that optimize for
    # Google Compute Engine.
    python tf_cnn_benchmarks.py --local_parameter_device=cpu --num_gpus=1 \
    --batch_size=32 --model=vgg16 --data_dir=/home/ubuntu/flowers \
    --variable_update=parameter_server --nodistortions
    

    And the data dir has the TFRecords inside, generated with Bazel as in the models/inception/data tutorial:

    -rw-rwx--- 1  40 May 11 11:43 labels.txt
    drwxrwx--- 7 4096 May 12 11:45 train
    -rw-rwx--- 1  102419300 May 11 11:43 train-00000-of-00002
    -rw-rwx--- 1   99116804 May 11 11:43 train-00001-of-00002
    drwxrwx--- 7  4096 May 12 11:45 validation
    -rw-rwx--- 1  16058779 May 11 11:43 validation-00000-of-00002
    -rw-rwx--- 1  15919237 May 11 11:43 validation-00001-of-00002
    

    And it hangs like this:

    TensorFlow:  1.1
    Model:       vgg16
    Mode:        training
    Batch size:  32 global
                 32.0 per device
    Devices:     ['/gpu:0']
    Data format: NCHW
    Optimizer:   sgd
    Variables:   parameter_server
    ==========
    Generating model
    2017-05-12 11:57:30.357629: I tensorflow/core/common_runtime/gpu/gpu_device.cc:900] Found device 0 with properties:
    ....
    pciBusID 0002:01:00.0
    Total memory: 15.89GiB
    Free memory: 15.61GiB
    2017-05-12 11:57:30.357680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:921] DMA: 0
    2017-05-12 11:57:30.357690: I tensorflow/core/common_runtime/gpu/gpu_device.cc:931] 0:   Y
    2017-05-12 11:57:30.357707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-SXM2-16GB, pci bus id: 0002:01:00.0)
    

    But with synthetic data it works. Any idea how to fix this?

    opened by ghost 13
  • perfzero resnet benchmark is outdated

    perfzero resnet benchmark is outdated

    Due to refactoring in the tf models repo, the estimator benchmarks seem to have moved or no longer exist. For example, running the following command

    $ python3 lib/benchmark.py \
    --git_repos="https://github.com/tensorflow/models.git;benchmark" \
    --python_path=models \
    --benchmark_methods=official.r1.resnet.estimator_benchmark.Resnet50EstimatorBenchmarkSynth.benchmark_graph_1_gpu \
    --gcloud_key_file_url=""
    

    produces ModuleNotFoundError: No module named 'official.r1'. I've tried to change the path to something else -- using official.legacy.image_classification.resnet makes some progress -- but I haven't been able to figure it out.

    opened by vladfi1 2
  • Alternative/current state of tf_cnn_benchmark

    Alternative/current state of tf_cnn_benchmark

    Hello community and devs,

    A quick question from my side: I see that tf_cnn_benchmark is no longer actively maintained. This makes sense to reduce the code volume that must stay compatible with future TF versions. But I would like to understand whether this poses a severe issue for using the benchmark in the upcoming time. Is the code known to be incompatible, or to not achieve the expected performance, when using for instance TF 2.8?

    In other words: is tf_cnn_benchmark still in good shape, with only the promise to continue developing and maintaining the code missing? Or is it already outdated?

    Also, the documentation points towards the new TF2 models for benchmarking. Are you aware of an implementation of an actual benchmark based on those models that could serve as an alternative?

    Would be happy to get a reply. Cheers, Stefan

    opened by kessel 3
  • How to evaluate worker performance independently on a distributed training

    How to evaluate worker performance independently on a distributed training

    Hi

    I'm trying to evaluate the performance of each worker independently in a cluster with multiple machines while training them using the same model. My goal is to record each worker's training performance.

    With every setup and config that I try, I always get the same time for all workers (probably because of synchronization). So even if one of my workers is a machine that is 4x faster, it still records the same time as the slowest machine in the cluster.

    Does anyone have any idea how I can do that?

    opened by delucca 0
  • PerfZero support for OpenShift on RHEL

    PerfZero support for OpenShift on RHEL

    I am trying to run PerfZero on the OpenShift/RHEL platform. I am getting an error while building the Docker image. I need to know whether PerfZero supports the OpenShift platform. Can anyone help me with this?

    opened by AkashSky5 0
  • resnet50  --use_fp16 error: cuDNN launch failure : input shape ([128,112,112,64])

    resnet50 --use_fp16 error: cuDNN launch failure : input shape ([128,112,112,64])

    3090 and the following versions: Windows 10, Python 3.9.5, TensorFlow 2.5, CUDA 11.2.2 (path set), cuDNN 8.1

    FP32 works:

    python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=resnet50 --variable_update=parameter_server

    FP16 does not:

    python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=resnet50 --variable_update=parameter_server --use_fp16

    Error: Internal: cuDNN launch failure : input shape ([128,112,112,64])

    opened by drnefischer 0
  • The accuracy of the program running by horovod is low

    The accuracy of the program running by horovod is low

    When I run the program with

    python tf_cnn_benchmarks.py --data_format=NCHW --batch_size=256 \
        --model=resnet50 --optimizer=momentum --variable_update=replicated \
        --nodistortions --gradient_repacking=8 --num_gpus=8 \
        --num_epochs=90 --weight_decay=1e-4 --data_dir=${DATA_DIR} --use_fp16 \
        --train_dir=${CKPT_DIR}

    the final test accuracy is 75.96%. But when I run the program with

    horovodrun -np 8 python tf_cnn_benchmarks.py --data_format=NCHW --batch_size=256 \
        --model=resnet50 --optimizer=momentum --variable_update=horovod \
        --nodistortions --gradient_repacking=8 --num_gpus=8 \
        --num_epochs=90 --weight_decay=1e-4 --data_dir=${DATA_DIR} --use_fp16 \
        --train_dir=${CKPT_DIR}

    the final test accuracy is 74%. Is this a normal result, or is this an error in how I run the program with Horovod? Looking forward to your reply. Thank you.

    opened by lljjgg 0