ColossalAI-Benchmark - Performance benchmarking with ColossalAI

HPC-AI Tech

Last update: Oct 7, 2022

Benchmark for Tuning Accuracy and Efficiency

Overview

The benchmark collects our efforts to train different tasks with Colossal-AI and reach SOTA results. We care about both validation accuracy and training speed, and we prefer larger batch sizes to take advantage of more GPU devices. For example, we trained a vision transformer with batch size 512 on CIFAR10 and 4096 on ImageNet1k, batch sizes that are rarely used in existing work. Some of the benchmark results obtained with 8x A100 are shown below.

Task	Model	Training Time	Top-1 Accuracy
CIFAR10	ViT-Lite-7/4	~ 16 min	~ 90.5%
ImageNet1k	ViT-S/16	~ 16.5 h	~ 74.5%

The train.py script in each task runs training with a configuration script from configs/, one per parallelism. The parallelism is indicated by the end of the configuration file name: data parallel only (ends with vanilla), 1D (ends with 1d), 2D (ends with 2d), 2.5D (ends with 2p5d), 3D (ends with 3d).
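The naming convention above can be sketched as a small lookup. This is only an illustration: the helper name tensor_mode_from_config, the file name configs/vit_2p5d.py, and the mode strings on the right-hand side are assumptions, not taken from train.py.

```python
# Hypothetical helper: infer the parallelism from a config file name.
# The suffixes come from the text above; the mode strings are assumed values.
from pathlib import Path
from typing import Optional

SUFFIX_TO_MODE = {
    "vanilla": None,  # data parallel only
    "1d": "1d",
    "2d": "2d",
    "2p5d": "2.5d",
    "3d": "3d",
}


def tensor_mode_from_config(path: str) -> Optional[str]:
    """Return the tensor-parallel mode implied by the file-name suffix."""
    stem = Path(path).stem            # "vit_2p5d" for "configs/vit_2p5d.py"
    suffix = stem.rsplit("_", 1)[-1]  # text after the last underscore
    if suffix not in SUFFIX_TO_MODE:
        raise ValueError(f"unrecognised parallelism suffix: {suffix!r}")
    return SUFFIX_TO_MODE[suffix]
```

For instance, tensor_mode_from_config("configs/vit_vanilla.py") returns None, which matches the data-parallel-only setting.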

Each configuration script includes the following elements, taking the ImageNet1k task as an example:

from colossalai.amp import AMP_TYPE

TOTAL_BATCH_SIZE = 4096
LEARNING_RATE = 3e-3
WEIGHT_DECAY = 0.3

NUM_EPOCHS = 300
WARMUP_EPOCHS = 32

# data parallel only
TENSOR_PARALLEL_SIZE = 1    
TENSOR_PARALLEL_MODE = None

# parallelism setting
parallel = dict(
    pipeline=1,
    tensor=dict(mode=TENSOR_PARALLEL_MODE, size=TENSOR_PARALLEL_SIZE),
)

fp16 = dict(mode=AMP_TYPE.TORCH)  # amp setting

gradient_accumulation = 2 # accumulate 2 steps for gradient update

BATCH_SIZE = TOTAL_BATCH_SIZE // gradient_accumulation # actual batch size for dataloader

clip_grad_norm = 1.0 # clip gradient with norm 1.0

Upper-case elements are what train.py reads, and lower-case elements are what Colossal-AI needs to initialize training.
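As a rough illustration of how these numbers interact, the sketch below works through the batch-size arithmetic. The function name batch_sizes, the world_size argument, and the even split of each step's batch across data-parallel replicas are assumptions made for this example, not behavior confirmed by train.py.

```python
# Sketch of the batch-size arithmetic implied by the configuration.
# Assumption: the data-parallel size is the number of GPUs divided by the
# product of the tensor-parallel and pipeline sizes, and each step's batch
# is split evenly across data-parallel replicas.

def batch_sizes(total_batch_size, gradient_accumulation, world_size,
                tensor_parallel_size=1, pipeline_size=1):
    model_parallel = tensor_parallel_size * pipeline_size
    if world_size % model_parallel != 0:
        raise ValueError("world_size must be divisible by tensor * pipeline size")
    data_parallel_size = world_size // model_parallel
    # Samples consumed per forward/backward pass across all replicas.
    step_batch = total_batch_size // gradient_accumulation
    # Samples each data-parallel replica sees per pass.
    per_replica = step_batch // data_parallel_size
    return {
        "data_parallel_size": data_parallel_size,
        "step_batch": step_batch,
        "per_replica": per_replica,
    }


# ImageNet1k values from the configuration, on a hypothetical 8-GPU node.
print(batch_sizes(4096, 2, world_size=8))
# {'data_parallel_size': 8, 'step_batch': 2048, 'per_replica': 256}
```

With a tensor-parallel size of 4 on the same 8 GPUs, the data-parallel size drops to 2, so each replica would process 1024 samples per pass under the same assumptions.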

Usage

To start training, use the following command to run each worker:

$ DATA=/path/to/dataset python train.py --world_size=WORLD_SIZE \
                                        --rank=RANK \
                                        --local_rank=LOCAL_RANK \
                                        --host=MASTER_IP_ADDRESS \
                                        --port=MASTER_PORT \
                                        --config=CONFIG_FILE
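Because this command has to be issued once per worker, it can help to see what the placeholders look like when filled in. The sketch below only builds the command strings. The helper name worker_commands is hypothetical, and it assumes the common convention that the global rank is node_rank * gpus_per_node + local_rank. Only flags that appear in the command above are used.

```python
# Hypothetical helper that prints one launch command per worker.
# Assumption: global rank = node_rank * gpus_per_node + local_rank.
import shlex


def worker_commands(data_dir, config, nodes, gpus_per_node, host, port):
    world_size = nodes * gpus_per_node
    commands = []
    for node_rank in range(nodes):
        for local_rank in range(gpus_per_node):
            rank = node_rank * gpus_per_node + local_rank
            args = [
                "python", "train.py",
                f"--world_size={world_size}",
                f"--rank={rank}",
                f"--local_rank={local_rank}",
                f"--host={host}",
                f"--port={port}",
                f"--config={config}",
            ]
            quoted = " ".join(shlex.quote(a) for a in args)
            commands.append(f"DATA={shlex.quote(data_dir)} {quoted}")
    return commands


# Single machine with 2 GPUs; the worker on the master node uses 127.0.0.1.
for cmd in worker_commands("/data/cifar-10", "configs/vit_vanilla.py",
                           nodes=1, gpus_per_node=2,
                           host="127.0.0.1", port=29501):
    print(cmd)
```

Each printed line would still need to be started as its own process, either by hand or by an external launcher.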

It is also recommended to start training with torchrun as:

$ DATA=/path/to/dataset torchrun --nproc_per_node=NUM_GPUS_PER_NODE \
                                 --nnodes=NUM_NODES \
                                 --node_rank=NODE_RANK \
                                 --master_addr=MASTER_IP_ADDRESS \
                                 --master_port=MASTER_PORT \
                                 train.py --config=CONFIG_FILE
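With torchrun, the per-worker values do not have to be passed by hand: torchrun exports RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT as environment variables for every process it starts. The sketch below shows how a script could read them. The helper name read_torchrun_env is hypothetical, and this is not necessarily how train.py consumes them.

```python
# Minimal sketch: collect the distributed settings that torchrun exports.
# The function name is hypothetical; the variable names are torchrun's.
import os


def read_torchrun_env(env=None):
    env = os.environ if env is None else env
    return {
        "rank": int(env["RANK"]),              # global rank of this worker
        "local_rank": int(env["LOCAL_RANK"]),  # rank within this node
        "world_size": int(env["WORLD_SIZE"]),  # total number of workers
        "host": env["MASTER_ADDR"],
        "port": int(env["MASTER_PORT"]),
    }


# Example with a hand-written environment instead of a real torchrun launch.
fake_env = {
    "RANK": "3", "LOCAL_RANK": "1", "WORLD_SIZE": "8",
    "MASTER_ADDR": "127.0.0.1", "MASTER_PORT": "29500",
}
print(read_torchrun_env(fake_env))
```

Passing a plain dictionary, as in the example, makes the helper easy to check without launching any distributed job.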

Comments

install problem: installing in NUS HPC, but the GCC is old, and can not install the latest colossalai

🐛 Describe the bug

I am installing the latest colossalai with the python setup.py install command, and the NUS HPC reports errors like: subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1. #error "You're running a too old version of GCC. We need GCC 5 or later."

If I use pip install colossalai, another error happens. pip install needs Internet access, and the GPU nodes at NUS have none, so I have to use python setup.py install, but the GCC version is too old...

Luckily for me (maybe unluckily for other students using the server), I had saved an older version of colossalai...

Thank you!!!

Environment

No response

opened by Arsmart123 10

Add DeepSpeed ZeRO Init Context

Describe the feature

The ZeRO init context of DeepSpeed is not provided yet. Let's add this feature so that we can benchmark larger models, but we need to take care of the numel count.

opened by FrankLeeeee 1

Automate submodule commit update

The submodule in ColossalAI should have its commit ID updated whenever this repository changes. This could be done via a GitHub Action, and we should automate the process to save some trouble.

opened by FrankLeeeee 1

Hotfix/flops profiler & GPT dataset

added synthetic gpt dataset. To enable it, add "synthetic": true under "hyperparameter" in your json configuration file (please refer to torch_utils/gpt2_config.json)

disabled gpt tokenization caching

updated flops&params profiler with deepspeed version
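As a hedged illustration of the first item, the sketch below sets the flag in a JSON configuration held as text. The helper name enable_synthetic and the batch_size key in the example are hypothetical. Only the "hyperparameter" and "synthetic" keys come from the description above, and the real layout of torch_utils/gpt2_config.json may differ.

```python
# Sketch: switch on the synthetic GPT dataset in a JSON configuration.
# Only "hyperparameter" and "synthetic" are taken from the description;
# everything else in the example config is made up for illustration.
import json


def enable_synthetic(config_text: str) -> str:
    config = json.loads(config_text)
    # Create the "hyperparameter" section if the file does not have one yet.
    config.setdefault("hyperparameter", {})["synthetic"] = True
    return json.dumps(config, indent=2)


example = '{"hyperparameter": {"batch_size": 8}}'
print(enable_synthetic(example))
```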
opened by kurisusnowdeng 0

Where is model_zoo module?

🐛 Describe the bug

I try to train vision transformer following instructions in README.md. However, it throws an error in imagenet1k/train.py that there is no module named model_zoo. Corresponding code is from model_zoo.vit import vit_small_patch16_224. I tried to find this module in all repos in hpcaitech organization and required python wheels, but nothing was found.

My training script is DATA=../dataset/tfrecord torchrun --nproc_per_node=8 train.py --config=configs/vit_vanilla.py , which is executed in imagenet1k folder.

Environment

CUDA 11.3 Torch 1.12.1 Torchvision 0.13.1

opened by GhostScreaming 0

Prepared a demo dataset for GPT performance benchmarking

🐛 Describe the bug

As a place to show the best practice for users, I believe it is necessary to help users to skip the annoying dataset preparation stage.

Environment

No response

opened by feifeibear 0

README of Benchmark is not clear and misleading.

In the 'Usage' part, the first command needs a launcher, e.g. OpenMPI, but this is not mentioned. It can easily mislead newbies into wasting time and effort if they are running on their local machine.

Some parameters in the second command do not seem necessary, e.g. I can run the example with the following command: DATA=/data/cifar-10 torchrun --nproc_per_node=2 --master_port=29501 train.py --config=configs/vit_vanilla.py

In addition, it's hard for newbies to know what to provide for those parameters, e.g. how to find the RANK, IP_ADDRESS and PORT. It would be better if you could provide some explanation and an example.
documentation

opened by binmakeswell 0

Align benchmark with the others

Hello, thanks for the wonderful project. Did you consider aligning the results with some commonly used ones? https://github.com/mlcommons/training https://github.com/Oneflow-Inc/DLPerf

opened by feifeibear 0

Owner

HPC-AI Tech

We are a global team to help you train and deploy your AI models

