ColossalAI-Benchmark - Performance benchmarking with ColossalAI

Benchmark for Tuning Accuracy and Efficiency

Overview

The benchmark includes our efforts in using Colossal-AI to train on different tasks to achieve SOTA results. We are interested in both validation accuracy and training speed, and prefer larger batch sizes to take advantage of more GPU devices. For example, we trained a vision transformer with batch size 512 on CIFAR10 and 4096 on ImageNet1k, batch sizes that are rarely used in existing works. Some of the results in the benchmark, trained with 8x A100, are shown below.

Task       | Model        | Training Time | Top-1 Accuracy
-----------|--------------|---------------|---------------
CIFAR10    | ViT-Lite-7/4 | ~ 16 min      | ~ 90.5%
ImageNet1k | ViT-S/16     | ~ 16.5 h      | ~ 74.5%

The train.py script in each task directory runs training with a specific configuration script from configs/ for each form of parallelism. Supported parallelisms include data parallel only (config names end with vanilla), 1D tensor parallelism (1d), 2D (2d), 2.5D (2p5d), and 3D (3d).

Each configuration script basically includes the following elements, taking the ImageNet1k task as an example:

from colossalai.amp import AMP_TYPE  # needed for the fp16 setting below

TOTAL_BATCH_SIZE = 4096
LEARNING_RATE = 3e-3
WEIGHT_DECAY = 0.3

NUM_EPOCHS = 300
WARMUP_EPOCHS = 32

# data parallel only
TENSOR_PARALLEL_SIZE = 1    
TENSOR_PARALLEL_MODE = None

# parallelism setting
parallel = dict(
    pipeline=1,
    tensor=dict(mode=TENSOR_PARALLEL_MODE, size=TENSOR_PARALLEL_SIZE),
)

fp16 = dict(mode=AMP_TYPE.TORCH)  # AMP (mixed-precision) setting using PyTorch's native amp

gradient_accumulation = 2 # accumulate 2 steps for gradient update

BATCH_SIZE = TOTAL_BATCH_SIZE // gradient_accumulation # actual batch size per dataloader step (4096 // 2 = 2048)

clip_grad_norm = 1.0 # clip gradient with norm 1.0
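
To switch parallelism, only the two tensor-parallel fields change. As a rough sketch (illustrative values, not taken from the repository's configs), a 2D tensor-parallel run on 4 GPUs might look like:

# Hypothetical 2D tensor-parallel variant (illustrative, not from the repo's configs)
TENSOR_PARALLEL_SIZE = 4      # 2D mode expects a perfect-square size (here 2 x 2)
TENSOR_PARALLEL_MODE = '2d'

parallel = dict(
    pipeline=1,
    tensor=dict(mode=TENSOR_PARALLEL_MODE, size=TENSOR_PARALLEL_SIZE),
)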

Upper-case elements are what train.py reads directly, while lower-case elements are what Colossal-AI needs to initialize training.
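
As a rough illustration of how these elements are consumed (a minimal sketch assuming the Colossal-AI 0.1.x launch API; the real train.py is more involved):

import argparse

import colossalai
from colossalai.core import global_context as gpc

# Minimal sketch: parse the --config argument that train.py receives.
parser = argparse.ArgumentParser()
parser.add_argument('--config', type=str, required=True)
args = parser.parse_args()

# When started with torchrun, Colossal-AI reads rank/world size from the environment;
# the lower-case elements (parallel, fp16, gradient_accumulation, clip_grad_norm) are
# consumed here during initialization.
colossalai.launch_from_torch(config=args.config)

# The upper-case elements are read back by the training script through the global context.
batch_size = gpc.config.BATCH_SIZE
learning_rate = gpc.config.LEARNING_RATE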

Usage

To start training, use the following command to run each worker (one process per GPU). Note that this form requires an external launcher, e.g. OpenMPI, to spawn all the workers:

$ DATA=/path/to/dataset python train.py --world_size=WORLD_SIZE \
                                        --rank=RANK \
                                        --local_rank=LOCAL_RANK \
                                        --host=MASTER_IP_ADDRESS \
                                        --port=MASTER_PORT \
                                        --config=CONFIG_FILE

Alternatively, it is recommended to start training with torchrun:

$ DATA=/path/to/dataset torchrun --nproc_per_node=NUM_GPUS_PER_NODE \
                                 --nnodes=NUM_NODES \
                                 --node_rank=NODE_RANK \
                                 --master_addr=MASTER_IP_ADDRESS \
                                 --master_port=MASTER_PORT \
                                 train.py --config=CONFIG_FILE
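
For instance, a single-node run on 8 GPUs could look like the following (paths and the port are illustrative; adjust them to your setup):

$ DATA=/path/to/imagenet torchrun --nproc_per_node=8 \
                                  --nnodes=1 \
                                  --node_rank=0 \
                                  --master_addr=127.0.0.1 \
                                  --master_port=29500 \
                                  train.py --config=configs/vit_vanilla.py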
Comments
  • Install problem: installing on NUS HPC, but GCC is too old and cannot install the latest colossalai

    πŸ› Describe the bug

    I am installing the latest colossalai using the python setup.py install command, and the NUS HPC reports errors like: subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1 and #error "You're running a too old version of GCC. We need GCC 5 or later."

    If I use pip install colossalai, another error happens. pip install needs Internet access, and when I am using a GPU node at NUS no internet can be accessed, so I have to use python setup.py install, but the GCC version is too old...

    Luckily for me (maybe unluckily for other students using the server), I saved an old version of colossalai...

    Thank you!!!

    Environment

    No response

    opened by Arsmart123 10
  • Add DeepSpeed ZeRO Init Context

    Describe the feature

    DeepSpeed's ZeRO init context is not provided yet. Let's add this feature so that we can benchmark larger models, but we need to take care of the numel count.

    opened by FrankLeeeee 1
  • Automate submodule commit update

    The submodule in ColossalAI should have its commit ID updated whenever there is an update in this repository. This may be done via a GitHub Action, and we should definitely automate this process to save some trouble.

    opened by FrankLeeeee 1
  • Hotfix/flops profiler & GPT dataset

    • Added a synthetic GPT dataset. To enable it, add "synthetic": true under "hyperparameter" in your JSON configuration file (please refer to torch_utils/gpt2_config.json)
    • Disabled GPT tokenization caching
    • Updated the FLOPs & params profiler with the DeepSpeed version
    opened by kurisusnowdeng 0
  • Where is model_zoo module?

    πŸ› Describe the bug

    I tried to train a vision transformer following the instructions in README.md. However, imagenet1k/train.py throws an error that there is no module named model_zoo. The corresponding code is from model_zoo.vit import vit_small_patch16_224. I tried to find this module in all repos in the hpcaitech organization and in the required Python wheels, but nothing was found.

    My training command is DATA=../dataset/tfrecord torchrun --nproc_per_node=8 train.py --config=configs/vit_vanilla.py, which is executed in the imagenet1k folder.

    Environment

    CUDA 11.3, Torch 1.12.1, Torchvision 0.13.1

    opened by GhostScreaming 0
  • Prepared a demo dataset for GPT performance benchmarking

    πŸ› Describe the bug

    As a place to show best practices to users, I believe it is necessary to help them skip the annoying dataset preparation stage.

    Environment

    No response

    opened by feifeibear 0
  • README of Benchmark is not clear and misleading

    In the 'Usage' part, the first command needs a launcher, e.g. OpenMPI, but this is not mentioned. It can easily mislead newcomers into wasting time and effort if they are running on their local machine.

    Some parameters in the second command seem unnecessary, e.g. I can run the example with the following command: DATA=/data/cifar-10 torchrun --nproc_per_node=2 --master_port=29501 train.py --config=configs/vit_vanilla.py

    In addition, it is hard for newcomers to know what values they should provide for those parameters, e.g. how to determine RANK, IP_ADDRESS, and PORT. It would be better if you could provide some explanation and examples.

    documentation 
    opened by binmakeswell 0
  • Align benchmark with the others

    Hello, thanks for the wonderful project. Have you considered aligning the results with some commonly used benchmarks? https://github.com/mlcommons/training https://github.com/Oneflow-Inc/DLPerf

    opened by feifeibear 0
Owner
HPC-AI Tech: We are a global team to help you train and deploy your AI models