Dear developers,
I am trying to run the gpt2_3d example, but it fails with the assertion below. It looks as if the model is not receiving a batch size the 3D layers can split: by the time the input reaches the 3D embedding layer, its batch dimension has size 1 and cannot be divided evenly across a world size of 2. I hope to get some advice.
Thanks.
Error
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/parallel_3d/_operation.py", line 281, in split_tensor_3d
assert dim_size % world_size == 0, \
AssertionError: The dimension 0 to split, size (1) is not a multiple of world size (2), cannot split tensor evenly.
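For context, below is a minimal sketch of the check that appears to fail, reconstructed only from the assertion message above; it is not the actual Colossal-AI implementation. The cube depth of 2 (since tensor parallel size 8 = 2^3) and the batch dimension already being 1 when the split happens are my assumptions.

```python
# Minimal sketch of the failing check (my reconstruction, not the actual
# colossalai.nn.layer.parallel_3d._operation code).
# Assumption: tensor parallel size 8 gives a 3D cube of depth 2, so each
# split divides the batch dimension by 2.
import torch

def split_tensor_3d_sketch(tensor: torch.Tensor, dim: int, world_size: int) -> torch.Tensor:
    dim_size = tensor.size(dim)
    assert dim_size % world_size == 0, (
        f"The dimension {dim} to split, size ({dim_size}) is not a multiple of "
        f"world size ({world_size}), cannot split tensor evenly."
    )
    # Keep only this rank's chunk along the split dimension.
    return tensor.chunk(world_size, dim=dim)[0]

depth = 2                                            # hypothetical cube depth for tensor parallel size 8
input_ids = torch.zeros(1, 1024, dtype=torch.long)   # batch dim is already 1 when the assert fires
split_tensor_3d_sketch(input_ids, 0, depth)          # raises the AssertionError shown above
```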
Command
torchrun --standalone --nproc_per_node=8 train_gpt.py --config=gpt2_configs/gpt2_3d.py --from_torch
Environment
- colossalai 0.1.2
- nvcc 11.3.109
- python 3.8.13
- pytorch 1.11.0
- GPUs: 40G A100 * 8
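For reference, here is my reconstruction of the relevant config values, taken from the "Your Config" block echoed later in the error details. This is not the literal contents of gpt2_configs/gpt2_3d.py, just the parsed values I believe matter for the split (the gpt2_small and loss entries are omitted).

```python
# Reconstructed from the "========== Your Config ========" block in the log
# below; the actual gpt2_3d.py may define these slightly differently.
from colossalai.amp import AMP_TYPE

BATCH_SIZE = 4
NUM_EPOCHS = 60
SEQ_LEN = 1024
TENSOR_PARALLEL = 8

fp16 = dict(mode=AMP_TYPE.NAIVE)
optimizer = dict(lr=0.00015, weight_decay=0.01)
model = dict(checkpoint=True)
parallel = dict(pipeline=1, tensor=dict(mode='3d', size=TENSOR_PARALLEL))
```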
Error details
$ torchrun --standalone --nproc_per_node=8 ./train_gpt.py --config=./gpt2_configs/gpt2_3d.py --from_torch
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/apex-0.1-py3.8-linux-x86_64.egg/apex/pyprof/__init__.py:5: FutureWarning: pyprof will be removed by the end of June, 2022
warnings.warn("pyprof will be removed by the end of June, 2022", FutureWarning)
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/apex-0.1-py3.8-linux-x86_64.egg/apex/pyprof/__init__.py:5: FutureWarning: pyprof will be removed by the end of June, 2022
warnings.warn("pyprof will be removed by the end of June, 2022", FutureWarning)
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/apex-0.1-py3.8-linux-x86_64.egg/apex/pyprof/__init__.py:5: FutureWarning: pyprof will be removed by the end of June, 2022
warnings.warn("pyprof will be removed by the end of June, 2022", FutureWarning)
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/apex-0.1-py3.8-linux-x86_64.egg/apex/pyprof/__init__.py:5: FutureWarning: pyprof will be removed by the end of June, 2022
warnings.warn("pyprof will be removed by the end of June, 2022", FutureWarning)
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/apex-0.1-py3.8-linux-x86_64.egg/apex/pyprof/__init__.py:5: FutureWarning: pyprof will be removed by the end of June, 2022
warnings.warn("pyprof will be removed by the end of June, 2022", FutureWarning)
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/apex-0.1-py3.8-linux-x86_64.egg/apex/pyprof/__init__.py:5: FutureWarning: pyprof will be removed by the end of June, 2022
warnings.warn("pyprof will be removed by the end of June, 2022", FutureWarning)
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/apex-0.1-py3.8-linux-x86_64.egg/apex/pyprof/__init__.py:5: FutureWarning: pyprof will be removed by the end of June, 2022
warnings.warn("pyprof will be removed by the end of June, 2022", FutureWarning)
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/apex-0.1-py3.8-linux-x86_64.egg/apex/pyprof/__init__.py:5: FutureWarning: pyprof will be removed by the end of June, 2022
warnings.warn("pyprof will be removed by the end of June, 2022", FutureWarning)
[05/01/22 10:53:55] INFO colossalai - colossalai - INFO: /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/context/parallel_context.py:509 set_device
INFO colossalai - colossalai - INFO: process rank 2 is bound to device 2
INFO colossalai - colossalai - INFO: /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/context/parallel_context.py:545 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 2, numpy: 1024, python random: 1024,
ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1026,the default parallel seed is
ParallelMode.DATA.
[05/01/22 10:53:55] INFO colossalai - colossalai - INFO: /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/context/parallel_context.py:509 set_device
[05/01/22 10:53:55] INFO colossalai - colossalai - INFO: /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/context/parallel_context.py:509 set_device
[05/01/22 10:53:55] INFO colossalai - colossalai - INFO: /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/context/parallel_context.py:509 set_device
[05/01/22 10:53:55] INFO colossalai - colossalai - INFO: /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/context/parallel_context.py:509 set_device
[05/01/22 10:53:55] INFO colossalai - colossalai - INFO: /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/context/parallel_context.py:509 set_device
[05/01/22 10:53:55] INFO colossalai - colossalai - INFO: /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/context/parallel_context.py:509 set_device
INFO colossalai - colossalai - INFO: process rank 3 is bound to device 3
INFO colossalai - colossalai - INFO: process rank 7 is bound to device 7
INFO colossalai - colossalai - INFO: process rank 1 is bound to device 1
[05/01/22 10:53:55] INFO colossalai - colossalai - INFO: /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/context/parallel_context.py:509 set_device
INFO colossalai - colossalai - INFO: process rank 4 is bound to device 4
INFO colossalai - colossalai - INFO: process rank 5 is bound to device 5
INFO colossalai - colossalai - INFO: process rank 0 is bound to device 0
INFO colossalai - colossalai - INFO: /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/context/parallel_context.py:545 set_seed
INFO colossalai - colossalai - INFO: /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/context/parallel_context.py:545 set_seed
INFO colossalai - colossalai - INFO: process rank 6 is bound to device 6
INFO colossalai - colossalai - INFO: /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/context/parallel_context.py:545 set_seed
INFO colossalai - colossalai - INFO: /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/context/parallel_context.py:545 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 3, numpy: 1024, python random: 1024,
ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1027,the default parallel seed is
ParallelMode.DATA.
INFO colossalai - colossalai - INFO: initialized seed on rank 7, numpy: 1024, python random: 1024,
ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1031,the default parallel seed is
ParallelMode.DATA.
INFO colossalai - colossalai - INFO: /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/context/parallel_context.py:545 set_seed
INFO colossalai - colossalai - INFO: /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/context/parallel_context.py:545 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 1, numpy: 1024, python random: 1024,
ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1025,the default parallel seed is
ParallelMode.DATA.
INFO colossalai - colossalai - INFO: /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/context/parallel_context.py:545 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 4, numpy: 1024, python random: 1024,
ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1028,the default parallel seed is
ParallelMode.DATA.
INFO colossalai - colossalai - INFO: initialized seed on rank 0, numpy: 1024, python random: 1024,
ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1024,the default parallel seed is
ParallelMode.DATA.
INFO colossalai - colossalai - INFO: initialized seed on rank 5, numpy: 1024, python random: 1024,
ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1029,the default parallel seed is
ParallelMode.DATA.
INFO colossalai - colossalai - INFO: initialized seed on rank 6, numpy: 1024, python random: 1024,
ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1030,the default parallel seed is
ParallelMode.DATA.
INFO colossalai - colossalai - INFO:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/initialize.py:109 launch
INFO colossalai - colossalai - INFO: Distributed environment is initialized, data parallel size: 1, pipeline parallel size: 1, tensor parallel size: 8
INFO colossalai - colossalai - INFO: ./train_gpt_0.1.2.py:45 main
INFO colossalai - colossalai - INFO: Build data loader
INFO colossalai - colossalai - INFO: ./train_gpt_0.1.2.py:54 main
INFO colossalai - colossalai - INFO: Build model
[05/01/22 10:54:01] INFO colossalai - colossalai - INFO: ./train_gpt_0.1.2.py:84 main
INFO colossalai - colossalai - INFO: Build optimizer
[05/01/22 10:54:01] WARNING colossalai - colossalai - WARNING:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/initialize.py:281 initialize
INFO colossalai - colossalai - INFO:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/initialize.py:240 initialize
[05/01/22 10:54:01] WARNING colossalai - colossalai - WARNING:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/initialize.py:281 initialize
WARNING colossalai - colossalai - WARNING: Initializing an non ZeRO model with optimizer class
WARNING colossalai - colossalai - WARNING: Initializing an non ZeRO model with optimizer class
[05/01/22 10:54:01] WARNING colossalai - colossalai - WARNING:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/initialize.py:281 initialize
[05/01/22 10:54:01] WARNING colossalai - colossalai - WARNING:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/initialize.py:281 initialize
WARNING colossalai - colossalai - WARNING: Initializing an non ZeRO model with optimizer class
WARNING colossalai - colossalai - WARNING: Initializing an non ZeRO model with optimizer class
[05/01/22 10:54:01] WARNING colossalai - colossalai - WARNING:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/initialize.py:281 initialize
INFO colossalai - colossalai - INFO:
========== Your Config ========
{'BATCH_SIZE': 4,
'NUM_EPOCHS': 60,
'SEQ_LEN': 1024,
'TENSOR_PARALLEL': 8,
'fp16': {'mode': <AMP_TYPE.NAIVE: 'naive'>},
'gpt2_small': <function gpt2_small at 0x7f32a53354c0>,
'loss': {'type': <class 'model_zoo.gpt.gpt.GPTLMLoss'>},
'model': {'checkpoint': True},
'optimizer': {'lr': 0.00015, 'weight_decay': 0.01},
'parallel': {'pipeline': 1, 'tensor': {'mode': '3d', 'size': 8}}}
================================
INFO colossalai - colossalai - INFO:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/initialize.py:252 initialize
WARNING colossalai - colossalai - WARNING: Initializing an non ZeRO model with optimizer class
[05/01/22 10:54:01] WARNING colossalai - colossalai - WARNING:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/initialize.py:281 initialize
INFO colossalai - colossalai - INFO: cuDNN benchmark = True, deterministic = False
WARNING colossalai - colossalai - WARNING: Initializing an non ZeRO model with optimizer class
WARNING colossalai - colossalai - WARNING:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/initialize.py:281 initialize
WARNING colossalai - colossalai - WARNING: Initializing an non ZeRO model with optimizer class
[05/01/22 10:54:02] WARNING colossalai - colossalai - WARNING:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/initialize.py:281 initialize
WARNING colossalai - colossalai - WARNING: Initializing an non ZeRO model with optimizer class
[05/01/22 10:54:02] WARNING colossalai - colossalai - WARNING:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/initialize.py:409 initialize
WARNING colossalai - colossalai - WARNING: No PyTorch DDP or gradient handler is set up, please make
sure you do not need to all-reduce the gradients after a training step.
INFO colossalai - colossalai - INFO: ./train_gpt_0.1.2.py:98 main
INFO colossalai - colossalai - INFO: Init done, global batch size = 4
INFO colossalai - colossalai - INFO:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py:315 fit
INFO colossalai - colossalai - INFO: Using LossHook for training, priority = 0
INFO colossalai - colossalai - INFO:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py:315 fit
INFO colossalai - colossalai - INFO: Using LRSchedulerHook for training, priority = 1
INFO colossalai - colossalai - INFO:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py:315 fit
INFO colossalai - colossalai - INFO: Using LogMetricByEpochHook for training, priority = 10
INFO colossalai - colossalai - INFO:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py:315 fit
INFO colossalai - colossalai - INFO: Using ThroughputHook for training, priority = 10
INFO colossalai - colossalai - INFO:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py:315 fit
INFO colossalai - colossalai - INFO: Using LogMetricByStepHook for training, priority = 10
INFO colossalai - colossalai - INFO:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py:315 fit
INFO colossalai - colossalai - INFO: Using LogMemoryByEpochHook for training, priority = 10
INFO colossalai - colossalai - INFO:
/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py:319 fit
INFO colossalai - colossalai - INFO: Lower value means higher priority for calling hook function
INFO colossalai - colossalai - INFO: /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/utils/memory_utils/memory_monitor.py:63 report_memory_usage
INFO colossalai - colossalai - INFO: Before-train: GPU: allocated 91.75 MB, max allocated 92.3 MB,
cached: 96.0 MB, max cached: 96.0 MB
[Epoch 0 / Train]: 0%| | 0/5 [00:00<?, ?it/s]Traceback (most recent call last):
Traceback (most recent call last):
File "./train_gpt_0.1.2.py", line 132, in <module>
Traceback (most recent call last):
File "./train_gpt_0.1.2.py", line 132, in <module>
File "./train_gpt_0.1.2.py", line 132, in <module>
main()Traceback (most recent call last):
File "./train_gpt_0.1.2.py", line 132, in <module>
File "./train_gpt_0.1.2.py", line 120, in main
Traceback (most recent call last):
File "./train_gpt_0.1.2.py", line 132, in <module>
main()
File "./train_gpt_0.1.2.py", line 120, in main
main()
main()
File "./train_gpt_0.1.2.py", line 120, in main
File "./train_gpt_0.1.2.py", line 120, in main
trainer.fit(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 334, in fit
trainer.fit(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 334, in fit
trainer.fit(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 334, in fit
trainer.fit(
self._train_epoch(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 185, in _train_epoch
self._train_epoch(
logits, label, loss = self.engine.execute_schedule(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 185, in _train_epoch
main() File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 198, in execute_schedule
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 334, in fit
logits, label, loss = self.engine.execute_schedule(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 198, in execute_schedule
File "./train_gpt_0.1.2.py", line 120, in main
self._train_epoch(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 185, in _train_epoch
trainer.fit(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 334, in fit
Traceback (most recent call last):
self._train_epoch(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 185, in _train_epoch
File "./train_gpt_0.1.2.py", line 132, in <module>
output, label, loss = self._schedule.forward_backward_step(self, data_iter, **kwargs) output, label, loss = self._schedule.forward_backward_step(self, data_iter, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/schedule/_non_pipeline_schedule.py", line 49, in forward_backward_step
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/schedule/_non_pipeline_schedule.py", line 49, in forward_backward_step
logits, label, loss = self.engine.execute_schedule(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 198, in execute_schedule
self._train_epoch(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 185, in _train_epoch
logits, label, loss = self.engine.execute_schedule(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 198, in execute_schedule
output = self._call_engine(engine, data)output = self._call_engine(engine, data)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/schedule/_base_schedule.py", line 105, in _call_engine
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/schedule/_base_schedule.py", line 105, in _call_engine
main()
output, label, loss = self._schedule.forward_backward_step(self, data_iter, **kwargs)
File "./train_gpt_0.1.2.py", line 120, in main
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/schedule/_non_pipeline_schedule.py", line 49, in forward_backward_step
logits, label, loss = self.engine.execute_schedule(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 198, in execute_schedule
output, label, loss = self._schedule.forward_backward_step(self, data_iter, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/schedule/_non_pipeline_schedule.py", line 49, in forward_backward_step
return engine(**inputs)
return engine(**inputs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 183, in __call__
output = self._call_engine(engine, data)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/schedule/_base_schedule.py", line 105, in _call_engine
trainer.fit(
output = self._call_engine(engine, data)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/schedule/_base_schedule.py", line 105, in _call_engine
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 334, in fit
return self.model(*args, **kwargs)
return engine(**inputs) File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return engine(**inputs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 183, in __call__
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 183, in __call__
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 183, in __call__
output, label, loss = self._schedule.forward_backward_step(self, data_iter, **kwargs)
return self.model(*args, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/schedule/_non_pipeline_schedule.py", line 49, in forward_backward_step
self._train_epoch(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 185, in _train_epoch
return self.model(*args, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
output = self._call_engine(engine, data)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/schedule/_base_schedule.py", line 105, in _call_engine
return self.model(*args, **kwargs)logits, label, loss = self.engine.execute_schedule(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 198, in execute_schedule
return engine(**inputs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 183, in __call__
output, label, loss = self._schedule.forward_backward_step(self, data_iter, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/schedule/_non_pipeline_schedule.py", line 49, in forward_backward_step
return self.model(*args, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
output = self._call_engine(engine, data)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/schedule/_base_schedule.py", line 105, in _call_engine
return engine(**inputs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 183, in __call__
return self.model(*args, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/amp/naive_amp/naive_amp.py", line 145, in forward
return forward_call(*input, **kwargs)return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/amp/naive_amp/naive_amp.py", line 145, in forward
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/amp/naive_amp/naive_amp.py", line 145, in forward
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/amp/naive_amp/naive_amp.py", line 145, in forward
out = self.model(*args, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)out = self.model(*args, **kwargs)out = self.model(*args, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/amp/naive_amp/naive_amp.py", line 145, in forward
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/amp/naive_amp/naive_amp.py", line 145, in forward
out = self.model(*args, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
out = self.model(*args, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
out = self.model(*args, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/model_zoo/gpt/gpt.py", line 291, in forward
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/model_zoo/gpt/gpt.py", line 291, in forward
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/model_zoo/gpt/gpt.py", line 291, in forward
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/model_zoo/gpt/gpt.py", line 291, in forward
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/model_zoo/gpt/gpt.py", line 291, in forward
x = self.embed(input_ids)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
x = self.embed(input_ids)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
x = self.embed(input_ids)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/model_zoo/gpt/gpt.py", line 291, in forward
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
x = self.embed(input_ids)
return forward_call(*input, **kwargs) File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/model_zoo/gpt/gpt.py", line 50, in forward
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/model_zoo/gpt/gpt.py", line 50, in forward
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/model_zoo/gpt/gpt.py", line 50, in forward
x = self.embed(input_ids)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
x = self.word_embeddings(input_ids) + self.position_embeddings(position_ids)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1128, in _call_impl
x = self.word_embeddings(input_ids) + self.position_embeddings(position_ids)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1128, in _call_impl
x = self.word_embeddings(input_ids) + self.position_embeddings(position_ids)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1128, in _call_impl
[Epoch 0 / Train]: 0%| | 0/5 [00:00<?, ?it/s]Traceback (most recent call last):
File "./train_gpt_0.1.2.py", line 132, in <module>
result = forward_call(*input, **kwargs)result = forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/colossalai_layer/_utils.py", line 38, in forward
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/colossalai_layer/_utils.py", line 38, in forward
result = forward_call(*input, **kwargs)
main() File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/colossalai_layer/_utils.py", line 38, in forward
File "./train_gpt_0.1.2.py", line 120, in main
trainer.fit(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 334, in fit
x = self.embed(input_ids)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/model_zoo/gpt/gpt.py", line 50, in forward
self._train_epoch(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 185, in _train_epoch
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/model_zoo/gpt/gpt.py", line 50, in forward
logits, label, loss = self.engine.execute_schedule(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 198, in execute_schedule
x = self.word_embeddings(input_ids) + self.position_embeddings(position_ids)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1128, in _call_impl
return self._forward_func(*args)return self._forward_func(*args) output, label, loss = self._schedule.forward_backward_step(self, data_iter, **kwargs)
x = self.word_embeddings(input_ids) + self.position_embeddings(position_ids) File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/parallel_3d/layers.py", line 976, in forward
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/parallel_3d/layers.py", line 976, in forward
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/schedule/_non_pipeline_schedule.py", line 49, in forward_backward_step
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/model_zoo/gpt/gpt.py", line 50, in forward
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1128, in _call_impl
output = self._call_engine(engine, data)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/schedule/_base_schedule.py", line 105, in _call_engine
input_ = split_tensor_3d(input_, 0, self.weight_parallel_mode)input_ = split_tensor_3d(input_, 0, self.weight_parallel_mode)
x = self.word_embeddings(input_ids) + self.position_embeddings(position_ids)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/parallel_3d/_operation.py", line 281, in split_tensor_3d
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/parallel_3d/_operation.py", line 281, in split_tensor_3d
return self._forward_func(*args)
result = forward_call(*input, **kwargs) File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/parallel_3d/layers.py", line 976, in forward
return engine(**inputs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/colossalai_layer/_utils.py", line 38, in forward
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 183, in __call__
return self.model(*args, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return self._forward_func(*args)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/parallel_3d/layers.py", line 976, in forward
input_ = split_tensor_3d(input_, 0, self.weight_parallel_mode)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/parallel_3d/_operation.py", line 281, in split_tensor_3d
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1128, in _call_impl
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/amp/naive_amp/naive_amp.py", line 145, in forward
assert dim_size % world_size == 0, \assert dim_size % world_size == 0, \
result = forward_call(*input, **kwargs)AssertionErrorout = self.model(*args, **kwargs)
:
The dimension 0 to split, size (1) is not a multiple of world size (2), cannot split tensor evenly File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/colossalai_layer/_utils.py", line 38, in forward
AssertionError File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
: The dimension 0 to split, size (1) is not a multiple of world size (2), cannot split tensor evenly
return self._forward_func(*args)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/parallel_3d/layers.py", line 976, in forward
result = forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/colossalai_layer/_utils.py", line 38, in forward
return self._forward_func(*args)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/parallel_3d/layers.py", line 976, in forward
input_ = split_tensor_3d(input_, 0, self.weight_parallel_mode)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/parallel_3d/_operation.py", line 281, in split_tensor_3d
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/model_zoo/gpt/gpt.py", line 291, in forward
x = self.embed(input_ids)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
input_ = split_tensor_3d(input_, 0, self.weight_parallel_mode)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/parallel_3d/_operation.py", line 281, in split_tensor_3d
assert dim_size % world_size == 0, \
AssertionError: The dimension 0 to split, size (1) is not a multiple of world size (2), cannot split tensor evenly
assert dim_size % world_size == 0, \
AssertionError: The dimension 0 to split, size (1) is not a multiple of world size (2), cannot split tensor evenly
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/model_zoo/gpt/gpt.py", line 50, in forward
input_ = split_tensor_3d(input_, 0, self.weight_parallel_mode)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/parallel_3d/_operation.py", line 281, in split_tensor_3d
x = self.word_embeddings(input_ids) + self.position_embeddings(position_ids)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1128, in _call_impl
assert dim_size % world_size == 0, \
AssertionError: The dimension 0 to split, size (1) is not a multiple of world size (2), cannot split tensor evenly
result = forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/colossalai_layer/_utils.py", line 38, in forward
return self._forward_func(*args)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/parallel_3d/layers.py", line 976, in forward
input_ = split_tensor_3d(input_, 0, self.weight_parallel_mode)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/parallel_3d/_operation.py", line 281, in split_tensor_3d
assert dim_size % world_size == 0, \
AssertionError: The dimension 0 to split, size (1) is not a multiple of world size (2), cannot split tensor evenly
assert dim_size % world_size == 0, \
AssertionError: The dimension 0 to split, size (1) is not a multiple of world size (2), cannot split tensor evenly
Traceback (most recent call last):
File "./train_gpt_0.1.2.py", line 132, in <module>
main()
File "./train_gpt_0.1.2.py", line 120, in main
trainer.fit(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 334, in fit
self._train_epoch(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 185, in _train_epoch
logits, label, loss = self.engine.execute_schedule(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 198, in execute_schedule
output, label, loss = self._schedule.forward_backward_step(self, data_iter, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/schedule/_non_pipeline_schedule.py", line 49, in forward_backward_step
output = self._call_engine(engine, data)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/schedule/_base_schedule.py", line 105, in _call_engine
return engine(**inputs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 183, in __call__
return self.model(*args, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/amp/naive_amp/naive_amp.py", line 145, in forward
out = self.model(*args, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/model_zoo/gpt/gpt.py", line 291, in forward
x = self.embed(input_ids)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/model_zoo/gpt/gpt.py", line 50, in forward
x = self.word_embeddings(input_ids) + self.position_embeddings(position_ids)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1128, in _call_impl
result = forward_call(*input, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/colossalai_layer/_utils.py", line 38, in forward
return self._forward_func(*args)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/parallel_3d/layers.py", line 976, in forward
input_ = split_tensor_3d(input_, 0, self.weight_parallel_mode)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/colossalai/nn/layer/parallel_3d/_operation.py", line 281, in split_tensor_3d
assert dim_size % world_size == 0, \
AssertionError: The dimension 0 to split, size (1) is not a multiple of world size (2), cannot split tensor evenly
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: driver shutting down
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from query at /opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/cuda/CUDAEvent.h:95 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7f0bb282b1bd in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x11a (0x7f0bf06ba6ea in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x50 (0x7f0bf06bccd0 in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #3: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x145 (0x7f0bf06bdf65 in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #4: <unknown function> + 0xc9039 (0x7f0c48562039 in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/../../../../libstdc++.so.6)
frame #5: <unknown function> + 0x7ea5 (0x7f0c6ecd8ea5 in /lib64/libpthread.so.0)
frame #6: clone + 0x6d (0x7f0c6ea019fd in /lib64/libc.so.6)
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: driver shutting down
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from query at /opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/cuda/CUDAEvent.h:95 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7fe8efe431bd in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x11a (0x7fe92dcd26ea in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x50 (0x7fe92dcd4cd0 in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #3: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x145 (0x7fe92dcd5f65 in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #4: <unknown function> + 0xc9039 (0x7fe985b3a039 in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/../../../../libstdc++.so.6)
frame #5: <unknown function> + 0x7ea5 (0x7fe9ac2f0ea5 in /lib64/libpthread.so.0)
frame #6: clone + 0x6d (0x7fe9ac0199fd in /lib64/libc.so.6)
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: driver shutting down
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from query at /opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/cuda/CUDAEvent.h:95 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7fdfff31b1bd in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x11a (0x7fe03d1aa6ea in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x50 (0x7fe03d1accd0 in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #3: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x145 (0x7fe03d1adf65 in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #4: <unknown function> + 0xc9039 (0x7fe095012039 in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/../../../../libstdc++.so.6)
frame #5: <unknown function> + 0x7ea5 (0x7fe0bb7c8ea5 in /lib64/libpthread.so.0)
frame #6: clone + 0x6d (0x7fe0bb4f19fd in /lib64/libc.so.6)
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: driver shutting down
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from query at /opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/cuda/CUDAEvent.h:95 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7f835f9611bd in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x11a (0x7f839d7f06ea in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x50 (0x7f839d7f2cd0 in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #3: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x145 (0x7f839d7f3f65 in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #4: <unknown function> + 0xc9039 (0x7f83f5658039 in /home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/lib/../../../../libstdc++.so.6)
frame #5: <unknown function> + 0x7ea5 (0x7f841be0eea5 in /lib64/libpthread.so.0)
frame #6: clone + 0x6d (0x7f841bb379fd in /lib64/libc.so.6)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 184844) of binary: /home/asc/.conda/envs/nlp/bin/python
Traceback (most recent call last):
File "/home/asc/.conda/envs/nlp/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch==1.11.0', 'console_scripts', 'torchrun')())
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/distributed/run.py", line 724, in main
run(args)
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/distributed/run.py", line 715, in run
elastic_launch(
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/asc/.conda/envs/nlp/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
./train_gpt_0.1.2.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2022-05-01_10:54:05
host : localhost.localdomain
rank : 1 (local_rank: 1)
exitcode : -6 (pid: 184845)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 184845
[2]:
time : 2022-05-01_10:54:05
host : localhost.localdomain
rank : 2 (local_rank: 2)
exitcode : -6 (pid: 184846)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 184846
[3]:
time : 2022-05-01_10:54:05
host : localhost.localdomain
rank : 3 (local_rank: 3)
exitcode : -6 (pid: 184847)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 184847
[4]:
time : 2022-05-01_10:54:05
host : localhost.localdomain
rank : 4 (local_rank: 4)
exitcode : -6 (pid: 184848)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 184848
[5]:
time : 2022-05-01_10:54:05
host : localhost.localdomain
rank : 5 (local_rank: 5)
exitcode : 1 (pid: 184849)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]:
time : 2022-05-01_10:54:05
host : localhost.localdomain
rank : 6 (local_rank: 6)
exitcode : 1 (pid: 184850)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[7]:
time : 2022-05-01_10:54:05
host : localhost.localdomain
rank : 7 (local_rank: 7)
exitcode : 1 (pid: 184851)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2022-05-01_10:54:05
host : localhost.localdomain
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 184844)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================