OSLO: Open Source framework for Large-scale transformer Optimization

TUNiB

Last update: Nov 24, 2022

Overview

O S L O

Open Source framework for Large-scale transformer Optimization

What's New:

December 21, 2021 Released OSLO 1.0.

What is OSLO about?

OSLO is a framework that provides various GPU-based optimization features for large-scale modeling. As of 2021, Hugging Face Transformers is considered the de facto standard, but it is not yet well suited to large-scale modeling. This is where OSLO comes in: it is designed to make it easier to train large models with Transformers. For example, you can fine-tune GPTJ from the Hugging Face Model Hub with little extra effort using OSLO. Currently, GPT2, GPTNeo, and GPTJ are supported, and we plan to support more models soon.

Installation

OSLO can be installed with the pip package manager. All dependencies, such as torch, transformers, dacite, ninja, and pybind11, should be installed automatically by the following command. Note that the PyPI project name includes 'core'.

pip install oslo-core

Some features rely on C++, so we provide an option, CPP_AVAILABLE, to control whether they are installed.

If C++ is available:

CPP_AVAILABLE=1 pip install oslo-core

If C++ is not available:

CPP_AVAILABLE=0 pip install oslo-core

Note that the default value of CPP_AVAILABLE is 0 on Windows and 1 on Linux.
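
The platform-dependent default described above could be resolved along the following lines. This is a minimal sketch, not OSLO's actual setup code: the function name resolve_cpp_available is hypothetical, and treating every non-Windows platform like Linux is an assumption, since the note only mentions Windows and Linux.

```python
import os
import platform


def resolve_cpp_available(env=None, system=None):
    """Return True if the C++ features should be installed (hypothetical helper)."""
    env = os.environ if env is None else env
    system = platform.system() if system is None else system
    # Default from the note above: 0 on Windows, 1 on Linux.
    # Assumption: any other platform is treated like Linux.
    default = "0" if system == "Windows" else "1"
    # An explicit CPP_AVAILABLE value always overrides the platform default.
    return env.get("CPP_AVAILABLE", default) == "1"
```

For example, resolve_cpp_available({"CPP_AVAILABLE": "1"}, "Windows") returns True, because the explicit setting wins over the Windows default.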

Key Features

import deepspeed 
from oslo import GPTJForCausalLM

# 1. 3D Parallelism
model = GPTJForCausalLM.from_pretrained_with_parallel(
    "EleutherAI/gpt-j-6B", tensor_parallel_size=2, pipeline_parallel_size=2,
)

# 2. Kernel Fusion
model = model.fuse()

# 3. DeepSpeed Support
engines = deepspeed.initialize(
    model=model.gpu_modules(), model_parameters=model.gpu_parameters(), ...,
)

# 4. Data Processing
from oslo import (
    DatasetPreprocessor, 
    DatasetBlender, 
    DatasetForCausalLM, 
    ...    
)

OSLO offers the following features.

3D Parallelism: The state-of-the-art technique for training a large-scale model with multiple GPUs.
Kernel Fusion: A GPU optimization method to increase training and inference speed.
DeepSpeed Support: We support DeepSpeed which provides ZeRO data parallelism.
Data Processing: Various utilities for efficient large-scale data processing.

See USAGE.md to learn how to use them.
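
To make the parallelism sizes concrete: in the usual 3D layout, the total number of GPUs is the product of the tensor, pipeline, and data parallel sizes, so the data parallel size follows from the other two. The helper below is an illustrative sketch with a hypothetical name. It is not part of OSLO's API.

```python
def data_parallel_size(world_size, tensor_parallel_size, pipeline_parallel_size):
    """Derive the data parallel size from the total GPU count (hypothetical helper)."""
    model_parallel_size = tensor_parallel_size * pipeline_parallel_size
    if world_size % model_parallel_size != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor x pipeline = {model_parallel_size}"
        )
    return world_size // model_parallel_size


# Using tensor_parallel_size=2 and pipeline_parallel_size=2 as in the example above:
replicas_on_8_gpus = data_parallel_size(8, 2, 2)  # two data-parallel replicas
replicas_on_4_gpus = data_parallel_size(4, 2, 2)  # a single replica
```

Under these assumptions, a 2 x 2 model-parallel layout needs at least 4 GPUs, and every additional group of 4 adds one data-parallel replica.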

Administrative Notes

Citing OSLO

If you find our work useful, please consider citing:

@misc{oslo,
  author       = {Ko, Hyunwoong and Kim, Soohwan and Park, Kyubyong},
  title        = {OSLO: Open Source framework for Large-scale transformer Optimization},
  howpublished = {\url{https://github.com/tunib-ai/oslo}},
  year         = {2021},
}

Licensing

The code of the OSLO project is licensed under the terms of the Apache License 2.0.

Acknowledgements

The OSLO project is built with GPU support from the AICA (Artificial Intelligence Industry Cluster Agency).

Comments

[WIP] Implement ZeRO Stage 3 (FSDP)
Title

Implement ZeRO Stage 3 (FullyShardedDataParallel)

Description

[x] Add reduce_scatter_bucketer.py

[x] Add test_reduce_scatter_bucketer.py

[x] Add flatten_params_wrapper.py

[x] Add test_flatten_params_wrapper.py

[x] Add containers.py

[x] Add test_containers.py

[x] Add parallel.py

[x] Add test_parallel.py

[x] Add fsdp_optim_utils.py

[x] Update fsdp.py

[x] Add auto_wrap.py

[x] Add test_wrap.py
opened by jinok2im 9
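
As background for the file names in this checklist: ZeRO Stage 3 (FSDP) style training typically flattens a module's parameters into one buffer and gives each data-parallel rank an equal shard of it, padding the buffer so it divides evenly. The toy sketch below illustrates that idea with plain Python lists. The function names are hypothetical and this is not the code from the PR.

```python
def flatten_params(params):
    """Concatenate per-parameter value lists into one flat list."""
    return [value for param in params for value in param]


def shard_flat_params(flat, world_size, pad_value=0.0):
    """Pad the flat buffer so it divides evenly, then split it into one shard per rank."""
    shard_len = -(-len(flat) // world_size)  # ceiling division
    padded = flat + [pad_value] * (shard_len * world_size - len(flat))
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


params = [[1.0, 2.0, 3.0], [4.0, 5.0]]  # two toy "parameters"
shards = shard_flat_params(flatten_params(params), world_size=2)
# rank 0 holds [1.0, 2.0, 3.0]; rank 1 holds [4.0, 5.0, 0.0] (last value is padding)
```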
FusedAdam & CPUAdam
Title

FusedAdam & CPUAdam

Description

Implement FusedAdam & CPUAdam

Tasks

[x] Implement FusedAdam

[x] implement CPUAdam

[x] Test FusedAdam

[x] Test CPUAdam

[x] Test FusedScaleMaskSoftmax (Name changed)
opened by cozytk 6
[WIP] Add data processing modules referring to the lassl
Title

add data processing modules referring to the lassl

Description

Brought in data processing functions suited to GPT2, with reference to lassl.

Linked Issues

None
opened by gimmaru 6
Implementation of Sequential Parallelism
SP with DP implementation

Implemented SP wrapper with DP

Description

SequenceDataParallel works like native torch DDP, with SP (sequence parallelism) support.

You can find details in the file oslo/tests/torch/nn/parallal/data_parallel/test_sp.py
opened by ohwi 5
Update data collators and Add models
Title

Update data collators and Add models

Description

Updated data collators to use sequence parallelism in the OSLO trainer.

Added models with reference to the transformers library.
opened by gimmaru 3
Implement Expert Parallel and Test for Initialization and Forward Pass
Title

Implement Expert Parallel and Test for Initialization and Forward Pass

Description

Implement Wrapper, Modules and Features for Expert Parallel

Implement mapping_utils._ParallelMappingForHuggingFace as super class of _TensorParallelMappingForHuggingFace and _ExpertParallelMappingForHuggingFace

Test initialization and forward pass for expert parallel
opened by scsc0511 3
Integrate Sequence Parallelism branches
Title

Sequence parallelism (feat. @reniew, @ohwi, @l-yohai)

Description

This PR integrates the current version of SP, but it still has bugs.

We will fix the bugs over the coming week and write test modules according to the SP design.

It does not include the contents of the branch used for the tests.
opened by l-yohai 3
implement tp-3d layers, wrapper, test codes and refactor all tp test codes and layers
implement tp-3d wrapper

fix rank transpose problem (tensor_3d_input_rank <-> tensor_3d_output_rank) by implementing a rank transpose function.

revise tp-3d layers for huggingface compatibility

implement tp-3d test codes

refactor all tp test codes

unify format across all tensor parallel modules.
opened by bzantium 2
Refactoring MultiheadAttention with todo anchors
Title

Refactoring MultiheadAttention with todo anchors

Description

Refactoring oslo/torch/nn/modules/functional/multi_head_attention_forward.py.

Remove unnecessary or unintended code and clean up annotations.

Unify return format and the variable name with native torch.

Additionally, I need to test attention_mask. However, it seems this part can only proceed after FusedScaleMaskSoftmax is integrated.

cc. @hyunwoongko @ohwi
opened by l-yohai 2
Add tp-1d layers testing
Add testing for tp-1d layers: col_linear, row_linear, vocab_embedding_1d

Replace hard-coded numbers with integer variables such as summa_dim and world_size. cc: @hyunwoongko
opened by bzantium 2
[WIP] add test code of sp training
Title

SP Model Test Code

Description

Writing test code to verify that the model's gradient and loss values are the same when sequence parallelism is applied.

WIP: merging @ohwi's test code, which compares the SP of ColossalAI with a simple training model.
opened by l-yohai 2

Releases(v2.0.2)

v2.0.2(Aug 25, 2022)
Revert oslo to 1.1.2.

v2.0.1(Feb 20, 2022)
Merge changes from functorch upstream.

Fix documents and tutorials

v2.0.0(Feb 14, 2022)
Official release of OSLO 2.0.0 🎉🎉

This version of OSLO provides the following features:

Tensor model parallelism

Efficient activation checkpointing

Kernel fusion

We plan to add pipeline model parallelism and ZeRO optimization in upcoming versions.

New feature: Kernel Fusion

{
  "kernel_fusion": {
    "enable": "bool",
    "memory_efficient_fusion": "bool",
    "custom_cuda_kernels": "list"
  }
}

For more information, please check the kernel fusion tutorial
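
A filled-in version of the kernel fusion schema above might look like the following. This is a sketch under stated assumptions: the chosen values are illustrative rather than recommended defaults, the empty custom_cuda_kernels list is a placeholder because the valid kernel names are not listed here, and the file name oslo-config.json is borrowed from the oslo.initialize example in the v2.0.0a0 notes.

```python
import json

# Illustrative values for the kernel fusion schema; not recommended defaults.
config = {
    "kernel_fusion": {
        "enable": True,
        "memory_efficient_fusion": False,
        "custom_cuda_kernels": [],  # placeholder: kernel names are not listed here
    }
}


def save_config(config, path):
    """Write the config dict to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)


save_config(config, "oslo-config.json")
```

The resulting file could then be passed as the config argument in place of an inline dict.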
v2.0.0a2(Feb 2, 2022)

Quick fix of cuda rng state tracker

v2.0.0a1(Feb 2, 2022)

Add activation checkpointing

You can enable efficient activation checkpointing in OSLO with the following configuration.

model = oslo.initialize(
    model,
    config={
        "model_parallelism": {
            "enable": True,
            "tensor_parallel_size": YOUR_TENSOR_PARALLEL_SIZE,
        },
        "activation_checkpointing": {
            "enable": True,
            "cpu_checkpointing": True,
            "partitioned_checkpointing": True,
            "contiguous_checkpointing": True,
        },
    },
)

Tutorial: https://tunib-ai.github.io/oslo/TUTORIALS/activation_checkpointing.html


v2.0.0a0(Jan 30, 2022)
New API

We paid homage to DeepSpeed. Now it's easier and simpler to use.

import oslo

model = oslo.initialize(model, config="oslo-config.json")

Add new models

Albert

Bert

Bart

T5

GPT2

GPTNeo

GPTJ

Electra

Roberta

Add document

https://tunib-ai.github.io/oslo

Remove old pipeline parallelism, kernel fusion code

We'll refurbish them using the latest methods

Kernel fusion: AOTAutograd

Pipeline parallelism: Sagemaker PP

v.1.1.2(Jan 15, 2022)
Updates

[#7] Selective Kernel Fusion

[#9] Fix argument bug

New Feature: Selective Kernel Fusion

Since version 1.1.2, you can fuse a subset of kernels rather than all of them. Currently, only the Attention and MLP classes are supported.

from oslo import GPT2MLP, GPT2Attention

# MLP only fusion
model.fuse([GPT2MLP])

# Attention only fusion
model.fuse([GPT2Attention])

# MLP + Attention fusion
model.fuse([GPT2MLP, GPT2Attention])
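
The model.fuse([...]) calls above choose what to fuse by class. One way such class-based selection can work is sketched below with stand-in classes. ToyMLP, ToyAttention, and select_for_fusion are hypothetical names for illustration only, and this is not OSLO's implementation.

```python
class ToyMLP:
    """Stand-in for an MLP block (hypothetical)."""


class ToyAttention:
    """Stand-in for an attention block (hypothetical)."""


def select_for_fusion(modules, target_classes):
    """Return only the modules that are instances of one of target_classes."""
    return [m for m in modules if isinstance(m, tuple(target_classes))]


layers = [ToyMLP(), ToyAttention(), ToyMLP()]
mlp_only = select_for_fusion(layers, [ToyMLP])
both = select_for_fusion(layers, [ToyMLP, ToyAttention])
```

Here mlp_only contains the two ToyMLP instances and both contains all three layers.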

v1.1(Dec 29, 2021)

[#3] Add deployment launcher of Parallelformers into OSLO.

from oslo import GPTNeoForCausalLM

model = GPTNeoForCausalLM.from_pretrained_with_parallel(
    "EleutherAI/gpt-neo-2.7B",
    tensor_parallel_size=2,
    pipeline_parallel_size=2,
    deployment=True  # <-- new feature !
)

You can use the deployment launcher by setting deployment=True. Please refer to USAGE.md for more details.


v1.0.1(Dec 22, 2021)
Quick Fix

Support Megatron-LM style (.jsonl) file preprocessing.
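
For context, a Megatron-LM style .jsonl file stores one JSON object per line, with the document text commonly under a "text" key. A minimal reader, independent of OSLO's own preprocessing code, could look like this. The function name and the default key are assumptions.

```python
import json


def read_jsonl_texts(path, key="text"):
    """Yield the value stored under `key` for each non-empty line of a .jsonl file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)[key]
```

Because the function is a generator, it reads one line at a time and does not load the whole corpus into memory.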

v1.0(Dec 21, 2021)
Copyright 2021 TUNiB Inc. http://www.tunib.ai All Rights Reserved.


Owner

TUNiB

TUNiB Inc.

