Fine-Tune EleutherAI GPT-Neo to Generate Netflix Movie Descriptions in Only 47 Lines of Code Using Huggingface and DeepSpeed

Overview
Comments
  • DeepSpeed Loss Overflow

    DeepSpeed Loss Overflow

Hardware: RTX 3090, Ryzen 3600, 64 GB of RAM.

    I am trying to train the 1.3B parameter model on a custom dataset. Training this model takes more memory, seemingly because of the longer inputs (it definitely takes more memory), so I am trying to use DeepSpeed. I have changed nothing other than using the smaller 1.3B model and reducing the batch size to 4.

    An issue I am having is that the loss (I think it's the loss) is overflowing. I know this is due to using mixed/half precision to reduce memory usage. When training on the provided dataset this is not a problem: it initially hits the overflow too, but it is quickly resolved through the internal loss-scale adjustments. Is there some configuration change I can make so that this custom dataset will work without overflowing?

    Below are some logs; you can see that it is still overflowing even with the loss scale at 1. (A sketch of the fp16 settings I am asking about follows the logs.)

    python gpt_neo_xl_deepspeed.py Special tokens have been added in the vocabulary, make sure the associated word embedding are fine-tuned or trained. Max length: 384 [2021-06-08 12:02:15,302] [INFO] [distributed.py:47:init_distributed] Initializing torch distributed with backend: nccl [2021-06-08 12:02:15,601] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed info: version=0.3.13+12a53b4, git-hash=12a53b4, git-branch=HEAD [2021-06-08 12:02:15,622] [INFO] [engine.py:77:_initialize_parameter_parallel_groups] data_parallel_size: 1, parameter_parallel_size: 1 Adam Optimizer #0 is created with scalar arithmetic capability. Config: alpha=0.000050, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1 [2021-06-08 12:02:17,843] [INFO] [engine.py:602:_configure_optimizer] Using DeepSpeed Optimizer param name adamw as basic optimizer [2021-06-08 12:02:17,843] [INFO] [engine.py:606:_configure_optimizer] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=<class 'deepspeed.ops.adam.cpu_adam.DeepSpeedCPUAdam'> [2021-06-08 12:02:17,843] [INFO] [logging.py:60:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer Initializing ZeRO Stage 3 [2021-06-08 12:02:17,844] [WARNING] [stage3.py:35:] apex was installed without --cpp_ext. Falling back to Python flatten and unflatten. [2021-06-08 12:02:17,844] [INFO] [utils.py:555:see_memory_usage] Stage 3 intialize beginning /home/blake/anaconda3/envs/gpt/lib/python3.7/site-packages/torch/cuda/memory.py:346: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved FutureWarning) /home/blake/anaconda3/envs/gpt/lib/python3.7/site-packages/torch/cuda/memory.py:354: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved FutureWarning) [2021-06-08 12:02:17,845] [INFO] [utils.py:560:see_memory_usage] MA 2.5 GB Max_MA 5.14 GB CA 5.14 GB Max_CA 5 GB [2021-06-08 12:02:17,845] [INFO] [utils.py:565:see_memory_usage] CPU Virtual Memory: used = 24.87 GB, percent = 39.6% [2021-06-08 12:02:17,845] [INFO] [stage3.py:586:init] Reduce bucket size 500000000 [2021-06-08 12:02:17,845] [INFO] [stage3.py:587:init] Allgather bucket size 50000000 [2021-06-08 12:02:22,511] [INFO] [stage3.py:730:init] optimizer state initialized [2021-06-08 12:02:23,014] [INFO] [utils.py:555:see_memory_usage] After initializing ZeRO optimizer [2021-06-08 12:02:23,014] [INFO] [utils.py:560:see_memory_usage] MA 0.43 GB Max_MA 5.14 GB CA 5.53 GB Max_CA 6 GB [2021-06-08 12:02:23,014] [INFO] [utils.py:565:see_memory_usage] CPU Virtual Memory: used = 47.03 GB, percent = 74.9% [2021-06-08 12:02:23,014] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed Final Optimizer = adamw [2021-06-08 12:02:23,014] [INFO] [engine.py:439:_configure_lr_scheduler] DeepSpeed using configured LR scheduler = WarmupLR [2021-06-08 12:02:23,014] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed LR Scheduler = <deepspeed.runtime.lr_schedules.WarmupLR object at 0x7fb428040ed0> [2021-06-08 12:02:23,014] [INFO] [logging.py:60:log_dist] [Rank 0] step=0, skipped=0, lr=[5e-05], mom=[[0.9, 0.999]] [2021-06-08 12:02:23,014] [INFO] [config.py:737:print] DeepSpeedEngine configuration: [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] activation_checkpointing_config { "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "partition_activations": false, "profile": false, "synchronize_checkpoint_boundary": false } [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] 
allreduce_always_fp32 ........ False [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] amp_enabled .................. False [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] amp_params ................... False [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] checkpoint_tag_validation_enabled True [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] checkpoint_tag_validation_fail False [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] disable_allgather ............ False [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] dump_state ................... False [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] dynamic_loss_scale_args ...... {'init_scale': 4294967296, 'scale_window': 1000, 'delayed_shift': 2, 'min_scale': 1} [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] elasticity_enabled ........... False [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] flops_profiler_config ........ { "detailed": true, "enabled": false, "module_depth": -1, "profile_step": 1, "top_modules": 3 } [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] fp16_enabled ................. True [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] global_rank .................. 0 [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] gradient_accumulation_steps .. 1 [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] gradient_clipping ............ 1.0 [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] gradient_predivide_factor .... 1.0 [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] initial_dynamic_scale ........ 4294967296 [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] loss_scale ................... 0 [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] memory_breakdown ............. False [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] optimizer_legacy_fusion ...... False [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] optimizer_name ............... adamw [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] optimizer_params ............. {'lr': 5e-05, 'betas': [0.9, 0.999], 'eps': 1e-08} [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] pld_enabled .................. False [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] pld_params ................... False [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] prescale_gradients ........... False [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] scheduler_name ............... WarmupLR [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] scheduler_params ............. {'warmup_min_lr': 0, 'warmup_max_lr': 5e-05, 'warmup_num_steps': 100} [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] sparse_attention ............. None [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] sparse_gradients_enabled ..... False [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] steps_per_print .............. 10 [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] tensorboard_enabled .......... False [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] tensorboard_job_name ......... DeepSpeedJobName [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] tensorboard_output_path ...... [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] train_batch_size ............. 
4 [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] train_micro_batch_size_per_gpu 4 [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] wall_clock_breakdown ......... False [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] world_size ................... 1 [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] zero_allow_untested_optimizer False [2021-06-08 12:02:23,015] [INFO] [config.py:741:print] zero_config .................. { "allgather_bucket_size": 500000000, "allgather_partitions": true, "contiguous_gradients": true, "cpu_offload": true, "cpu_offload_params": true, "cpu_offload_use_pin_memory": false, "elastic_checkpoint": true, "load_from_fp32_weights": true, "max_live_parameters": 1000000000, "max_reuse_distance": 1000000000, "overlap_comm": true, "param_persistence_threshold": 100000, "prefetch_bucket_size": 50000000, "reduce_bucket_size": 500000000, "reduce_scatter": true, "stage": 3, "sub_group_size": 1000000000000 } [2021-06-08 12:02:23,016] [INFO] [config.py:741:print] zero_enabled ................. True [2021-06-08 12:02:23,016] [INFO] [config.py:741:print] zero_optimization_stage ...... 3 [2021-06-08 12:02:23,016] [INFO] [config.py:747:print] json = { "fp16":{ "enabled":true, "min_loss_scale":1, "opt_level":"O3" }, "gradient_accumulation_steps":1, "gradient_clipping":1.0, "optimizer":{ "params":{ "betas":[ 0.9, 0.999 ], "eps":1e-08, "lr":5e-05 }, "type":"AdamW" }, "scheduler":{ "params":{ "warmup_max_lr":5e-05, "warmup_min_lr":0, "warmup_num_steps":100 }, "type":"WarmupLR" }, "train_micro_batch_size_per_gpu":4, "zero_optimization":{ "contiguous_gradients":true, "cpu_offload":true, "cpu_offload_params":true, "overlap_comm":true, "stage":3 } } 0%| | 0/63645 [00:00<?, ?it/s][2021-06-08 12:02:25,515] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4294967296, reducing to 4294967296 0%| | 1/63645 [00:02<44:05:31, 2.49s/it][2021-06-08 12:02:27,976] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4294967296, reducing to 2147483648.0 0%| | 2/63645 [00:04<43:44:33, 2.47s/it][2021-06-08 12:02:30,098] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2147483648.0, reducing to 1073741824.0 0%| | 3/63645 [00:07<40:53:49, 2.31s/it][2021-06-08 12:02:32,216] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1073741824.0, reducing to 536870912.0 0%| | 4/63645 [00:09<39:31:55, 2.24s/it][2021-06-08 12:02:34,369] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 536870912.0, reducing to 268435456.0 0%| | 5/63645 [00:11<39:00:10, 2.21s/it][2021-06-08 12:02:36,493] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 268435456.0, reducing to 134217728.0 0%| | 6/63645 [00:13<38:30:25, 2.18s/it][2021-06-08 12:02:38,621] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 134217728.0, reducing to 67108864.0 0%| | 7/63645 [00:15<38:13:03, 2.16s/it][2021-06-08 12:02:40,756] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 67108864.0, reducing to 33554432.0 0%| | 8/63645 [00:17<38:03:59, 2.15s/it][2021-06-08 12:02:42,877] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 33554432.0, reducing to 16777216.0 0%| | 9/63645 [00:19<37:53:00, 2.14s/it][2021-06-08 12:02:45,001] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16777216.0, reducing to 8388608.0 [2021-06-08 12:02:45,002] [INFO] [timer.py:157:stop] 0/10, SamplesPerSec=1.8825260836384787 0%| | 10/63645 [00:21<37:46:42, 2.14s/it][2021-06-08 12:02:47,123] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8388608.0, reducing to 4194304.0 0%| | 11/63645 [00:24<37:41:45, 2.13s/it][2021-06-08 12:02:49,251] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4194304.0, reducing to 2097152.0 0%| | 12/63645 [00:26<37:40:18, 2.13s/it][2021-06-08 12:02:51,371] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2097152.0, reducing to 1048576.0 0%| | 13/63645 [00:28<37:36:38, 2.13s/it][2021-06-08 12:02:53,504] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1048576.0, reducing to 524288.0 0%| | 14/63645 [00:30<37:38:20, 2.13s/it][2021-06-08 12:02:55,643] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 524288.0, reducing to 262144.0 0%| | 15/63645 [00:32<37:41:12, 2.13s/it][2021-06-08 12:02:57,770] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144.0, reducing to 131072.0 0%| | 16/63645 [00:34<37:39:38, 2.13s/it][2021-06-08 12:02:59,892] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072.0, reducing to 65536.0 0%| | 17/63645 [00:36<37:37:35, 2.13s/it][2021-06-08 12:03:02,036] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536.0, reducing to 32768.0 0%| | 18/63645 [00:39<37:41:26, 2.13s/it][2021-06-08 12:03:04,169] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0 0%| | 19/63645 [00:41<37:41:35, 2.13s/it][2021-06-08 12:03:06,275] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0 [2021-06-08 12:03:06,275] [INFO] [timer.py:157:stop] 0/20, SamplesPerSec=1.8830545084715222 0%| | 20/63645 [00:43<37:32:58, 2.12s/it][2021-06-08 12:03:08,413] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0 0%| | 21/63645 [00:45<37:37:14, 2.13s/it][2021-06-08 12:03:10,537] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096.0, reducing to 2048.0 0%| | 22/63645 [00:47<37:35:54, 2.13s/it][2021-06-08 12:03:12,696] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048.0, reducing to 1024.0 0%| | 23/63645 [00:49<37:45:52, 2.14s/it][2021-06-08 12:03:14,855] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1024.0, reducing to 512.0 0%| | 24/63645 [00:51<37:52:39, 2.14s/it][2021-06-08 12:03:17,010] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 512.0, reducing to 256.0 0%| | 25/63645 [00:53<37:56:31, 2.15s/it][2021-06-08 12:03:19,143] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 256.0, reducing to 128.0 0%| | 26/63645 [00:56<37:51:51, 2.14s/it][2021-06-08 12:03:21,267] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 128.0, reducing to 64.0 0%| | 27/63645 [00:58<37:45:52, 2.14s/it][2021-06-08 12:03:23,390] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 64.0, reducing to 32.0 0%| | 28/63645 [01:00<37:41:21, 2.13s/it][2021-06-08 12:03:25,526] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32.0, reducing to 16.0 0%| | 29/63645 [01:02<37:42:30, 2.13s/it][2021-06-08 12:03:27,669] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16.0, reducing to 8.0 [2021-06-08 12:03:27,670] [INFO] [timer.py:157:stop] 0/30, SamplesPerSec=1.8793183327344205 0%| | 30/63645 [01:04<37:45:17, 2.14s/it][2021-06-08 12:03:29,803] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8.0, reducing to 4.0 0%| | 31/63645 [01:06<37:44:34, 2.14s/it][2021-06-08 12:03:31,936] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4.0, reducing to 2.0 0%| | 32/63645 [01:08<37:43:25, 2.13s/it][2021-06-08 12:03:34,071] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2.0, reducing to 1.0 0%| | 33/63645 [01:11<37:43:30, 2.13s/it][2021-06-08 12:03:36,202] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1.0, reducing to 1 0%| | 34/63645 [01:13<37:42:09, 2.13s/it][2021-06-08 12:03:38,344] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 0%| | 35/63645 [01:15<37:44:52, 2.14s/it][2021-06-08 12:03:40,449] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 0%| | 36/63645 [01:17<37:34:48, 2.13s/it][2021-06-08 12:03:42,570] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 0%| | 37/63645 [01:19<37:32:51, 2.13s/it][2021-06-08 12:03:44,722] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 0%| | 38/63645 [01:21<37:41:34, 2.13s/it][2021-06-08 12:03:46,892] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 0%| | 39/63645 [01:23<37:53:09, 2.14s/it][2021-06-08 12:03:49,014] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-06-08 12:03:49,015] [INFO] [timer.py:157:stop] 0/40, SamplesPerSec=1.8786745912560845 0%| | 40/63645 [01:25<37:46:00, 2.14s/it][2021-06-08 12:03:51,134] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 0%| | 41/63645 [01:28<37:40:15, 2.13s/it][2021-06-08 12:03:53,292] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 1, reducing to 1 0%| | 42/63645 [01:30<37:48:30, 2.14s/it][2021-06-08 12:03:55,423] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 0%| | 43/63645 [01:32<37:45:29, 2.14s/it][2021-06-08 12:03:57,549] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 0%| | 44/63645 [01:34<37:42:00, 2.13s/it][2021-06-08 12:03:59,675] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 0%| | 45/63645 [01:36<37:39:17, 2.13s/it][2021-06-08 12:04:01,811] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 0%| | 46/63645 [01:38<37:40:58, 2.13s/it][2021-06-08 12:04:03,961] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 0%| | 47/63645 [01:40<37:46:14, 2.14s/it][2021-06-08 12:04:06,086] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 0%| | 48/63645 [01:43<37:42:06, 2.13s/it]
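
    For reference, this is the kind of fp16 block I am experimenting with in the DeepSpeed config. The field names come from DeepSpeed's fp16 options; the particular values here are my own guesses, not settings taken from this repo:

    # Hypothetical fp16 section of the DeepSpeed config, written as a Python dict.
    # Lowering initial_scale_power makes the dynamic loss scaler start far below
    # 2**32, so fewer warm-up steps are skipped; "enabled": False would fall back
    # to full fp32 at the cost of much more memory.
    ds_fp16_config = {
        "fp16": {
            "enabled": True,
            "loss_scale": 0,            # 0 = dynamic loss scaling
            "initial_scale_power": 16,  # start at 2**16 instead of 2**32
            "loss_scale_window": 1000,
            "hysteresis": 2,
            "min_loss_scale": 1,
        }
    }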

    opened by mallorbc 6
  • 2.7B model hardware requirements

    2.7B model hardware requirements

    I have tried fine-tuning the 2.7B parameter model with my RTX 3090 and 64 GB of RAM. Looking at system resources, I am exhausting all of my RAM before the program is killed. My question is: what hardware was used for this repo? Specifically, how much RAM is required to train the 2.7B model?
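
    For what it's worth, here is the back-of-envelope estimate I am working from for CPU RAM under ZeRO-3 with CPU offload. The bytes-per-parameter figures are my own assumptions, not measurements from this repo:

    # Rough CPU-RAM estimate for ZeRO-3 + CPU offload of a 2.7B-parameter model
    # (assumed bytes per parameter, not measured values).
    params = 2.7e9
    fp32_master   = 4 * params   # fp32 copy of the weights held by the optimizer
    adam_momentum = 4 * params   # fp32 exp_avg
    adam_variance = 4 * params   # fp32 exp_avg_sq
    fp32_grads    = 4 * params   # gradients gathered on the CPU
    fp16_params   = 2 * params   # offloaded fp16 parameter partitions
    total_gb = (fp32_master + adam_momentum + adam_variance +
                fp32_grads + fp16_params) / 1024**3
    print(f"~{total_gb:.0f} GB of CPU RAM for model/optimizer state")  # ~45 GB

    On top of that come the Python process, activations, and the fp32 checkpoint held in memory while the model is being loaded, which is presumably why 64 GB ends up being tight.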

    opened by mallorbc 3
  • '<|startoftext|>' bug

    '<|startoftext|>' bug

    Hi! Thank you for your great project.
    I am following your example, but I found a small bug in the tokenizer.
    See below.

    >>> tokenizer = GPT2Tokenizer.from_pretrained(
    ...       "EleutherAI/gpt-neo-1.3B", 
    ...       bos_token="<|startoftext|>",
    ...       eos_token="<|endoftext|>",
    ...       pad_token="<|pad|>",
    ... )
    >>> tokenizer("<|startoftext|>")
     {'input_ids': [27, 91, 9688, 1659, 5239, 91, 29], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}
    >>> tokenizer("<|pad|>")
    {'input_ids': [50257], 'attention_mask': [1]}
    >>> tokenizer("<|endoftext|>")
    {'input_ids': [50256], 'attention_mask': [1]}
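
    As a workaround I register the tokens explicitly and resize the embeddings. This is just the pattern I am using, not necessarily the intended fix:

    # Possible workaround (my own sketch): register the tokens explicitly so each
    # maps to a single id, then resize the model's embedding matrix to match.
    from transformers import GPT2Tokenizer, GPTNeoForCausalLM

    tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
    num_added = tokenizer.add_special_tokens({
        "bos_token": "<|startoftext|>",
        "pad_token": "<|pad|>",
    })
    print(num_added, tokenizer("<|startoftext|>")["input_ids"])  # expect one id

    model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
    model.resize_token_embeddings(len(tokenizer))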
    
    opened by sooftware 3
  • GPU Memory Requirements

    GPU Memory Requirements

    Question

    Just a quick question: is anyone aware of how much GPU memory is required to train these models? I'm on Kaggle with a P100 (16 GB) and I can't seem to call .train() without running out of memory on any of the 3 models available.

    I've tried the 2.7B, 1.3B, and 125M param models and I get the same result with all three; surely a P100 can handle the 125M model. 🤔

    I might order some more RAM for my home server and try this again on a CPU. I saw from the other post here that it looks like I'll need at least 75 GB, which isn't bad at all; plus it gives me an excuse to upgrade it, haha.

    RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 15.90 GiB total capacity; 14.71 GiB already allocated; 77.75 MiB free; 14.95 GiB reserved in total by PyTorch)
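
    In case it helps anyone else, these are the knobs I am planning to try to squeeze the 125M model onto the 16 GB card. This is only a sketch of options, and whether each one applies depends on the transformers version:

    # Memory-saving settings to try (a sketch, not the repo's configuration).
    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="./results",
        per_device_train_batch_size=1,   # smallest possible micro-batch
        gradient_accumulation_steps=8,   # keep the effective batch size up
        fp16=True,                       # half-precision activations/gradients
        num_train_epochs=1,
    )
    # Gradient checkpointing (if the installed version supports it for GPT-Neo)
    # is another lever for trading compute for memory.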
    
    opened by AaronWatson2975 2
  • add a note to remove the torch.distributed emulation

    add a note to remove the torch.distributed emulation

    It looks like users try to run this example under torch.distributed with multiple GPUs, and of course it fails.

    So I'm proposing to at least add a note saying that the torch.distributed emulation hack must be removed when a multi-GPU setup is used (the kind of hack I mean is sketched below).

    Please feel free to edit the wording to your liking

    Thank you!
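
    For readers who have not opened the script, the single-process torch.distributed emulation being referred to typically looks like the sketch below. This is a reconstruction for illustration, not a quote from the repo:

    # Single-GPU "emulation" of a distributed launch via environment variables.
    # With multiple GPUs these lines must be removed and the script started with
    # the deepspeed launcher, which sets them per process.
    import os

    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "9994"
    os.environ["RANK"] = "0"
    os.environ["LOCAL_RANK"] = "0"
    os.environ["WORLD_SIZE"] = "1"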

    opened by stas00 1
  • Freezing at "Using /home/user/.cache/torch_extensions as PyTorch extensions root..."

    Freezing at "Using /home/user/.cache/torch_extensions as PyTorch extensions root..."

    Previously I was able to run the model, though I was getting loss overflows for any custom data or running out of RAM; it did seem to work for the Netflix dataset. I made a new conda environment and now I get to a point where some VRAM is allocated before it just freezes at the point shown below.

    python gpt_neo_xl_deepspeed.py Special tokens have been added in the vocabulary, make sure the associated word embedding are fine-tuned or trained. Max length: 384 [2021-06-08 10:46:33,763] [INFO] [distributed.py:47:init_distributed] Initializing torch distributed with backend: nccl [2021-06-08 10:46:33,899] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed info: version=0.3.13, git-hash=unknown, git-branch=unknown [2021-06-08 10:46:34,019] [INFO] [engine.py:77:_initialize_parameter_parallel_groups] data_parallel_size: 1, parameter_parallel_size: 1 Using /home/user/.cache/torch_extensions as PyTorch extensions root...

    I feel like this may be due to environment issues, perhaps the cudatoolkit version, since that is not specified. Which cudatoolkit was used? Does anyone have any idea what the issue might be? See below for my env; a quick version-check sketch follows the listing:

    name: gpt_neo_train
    channels:
      - nvidia
      - defaults
    dependencies:
      - _libgcc_mutex=0.1=main
      - ca-certificates=2021.5.25=h06a4308_1
      - certifi=2021.5.30=py37h06a4308_0
      - cudatoolkit=11.1.74=h6bb024c_0
      - ld_impl_linux-64=2.33.1=h53a641e_7
      - libffi=3.3=he6710b0_2
      - libgcc-ng=9.1.0=hdf63c60_0
      - libstdcxx-ng=9.1.0=hdf63c60_0
      - ncurses=6.2=he6710b0_1
      - openssl=1.1.1k=h27cfd23_0
      - pip=21.1.1=py37h06a4308_0
      - python=3.7.10=hdb3f193_0
      - readline=8.1=h27cfd23_0
      - setuptools=52.0.0=py37h06a4308_0
      - sqlite=3.35.4=hdfb4753_0
      - tk=8.6.10=hbc83047_0
      - wheel=0.36.2=pyhd3eb1b0_0
      - xz=5.2.5=h7b6447c_0
      - zlib=1.2.11=h7b6447c_3
      - pip:
        - chardet==4.0.0
        - click==8.0.1
        - deepspeed==0.3.13
        - filelock==3.0.12
        - idna==2.10
        - importlib-metadata==4.5.0
        - joblib==1.0.1
        - ninja==1.10.0.post2
        - numpy==1.17.3
        - packaging==20.9
        - pandas==1.2.2
        - pillow==8.2.0
        - protobuf==3.17.2
        - psutil==5.8.0
        - pyparsing==2.4.7
        - python-dateutil==2.8.1
        - pytz==2021.1
        - regex==2021.4.4
        - requests==2.25.1
        - sacremoses==0.0.45
        - six==1.16.0
        - tensorboardx==1.8
        - tokenizers==0.10.3
        - torch==1.8.1+cu111
        - torchsummary==1.5.1
        - torchvision==0.9.1+cu111
        - tqdm==4.61.0
        - transformers==4.5.0
        - typing-extensions==3.10.0.0
        - urllib3==1.26.5
        - zipp==3.4.1
    prefix: /home/user/anaconda3/envs/gpt_neo_train
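
    The quick sanity check mentioned above, verifying that the PyTorch CUDA build matches the toolkit in the environment (my own diagnostic sketch, not something from the repo):

    # Check which CUDA version PyTorch was built against and that a GPU is visible.
    import torch

    print(torch.__version__)          # e.g. 1.8.1+cu111
    print(torch.version.cuda)         # should line up with the cudatoolkit above
    print(torch.cuda.is_available())  # should be True

    If the JIT build of DeepSpeed's ops is what hangs, deleting ~/.cache/torch_extensions so the extensions rebuild from scratch is also worth a try.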
    opened by mallorbc 1
  • Training params?

    Training params?

    opened by aaronrmm 1
  • RuntimeError: Error building extension 'cpu_adam'

    RuntimeError: Error building extension 'cpu_adam'

    Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        data_collator=lambda data: {
            'input_ids': torch.stack([f[0] for f in data]),
            'attention_mask': torch.stack([f[1] for f in data]),
            'labels': torch.stack([f[0] for f in data]),
        },
    ).train()


    CalledProcessError Traceback (most recent call last) File /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/utils/cpp_extension.py:1900, in _run_ninja_build(build_directory, verbose, error_prefix) 1899 stdout_fileno = 1 -> 1900 subprocess.run( 1901 command, 1902 stdout=stdout_fileno if verbose else subprocess.PIPE, 1903 stderr=subprocess.STDOUT, 1904 cwd=build_directory, 1905 check=True, 1906 env=env) 1907 except subprocess.CalledProcessError as e: 1908 # Python 2 and 3 compatible way of getting the error object.

    File /opt/conda/envs/pytorch/lib/python3.9/subprocess.py:528, in run(input, capture_output, timeout, check, *popenargs, **kwargs) 527 if check and retcode: --> 528 raise CalledProcessError(retcode, process.args, 529 output=stdout, stderr=stderr) 530 return CompletedProcess(process.args, retcode, stdout, stderr)

    CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

    The above exception was the direct cause of the following exception:

    RuntimeError Traceback (most recent call last) Cell In [10], line 1 ----> 1 Trainer(model=model, args=training_args, train_dataset=train_dataset, 2 eval_dataset=val_dataset, data_collator=lambda data: {'input_ids': torch.stack([f[0] for f in data]), 3 'attention_mask': torch.stack([f[1] for f in data]), 4 'labels': torch.stack([f[0] for f in data])}).train()

    File /opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/trainer.py:1527, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs) 1522 self.model_wrapped = self.model 1524 inner_training_loop = find_executable_batch_size( 1525 self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size 1526 ) -> 1527 return inner_training_loop( 1528 args=args, 1529 resume_from_checkpoint=resume_from_checkpoint, 1530 trial=trial, 1531 ignore_keys_for_eval=ignore_keys_for_eval, 1532 )

    File /opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/trainer.py:1596, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval) 1589 delay_optimizer_creation = ( 1590 self.sharded_ddp is not None 1591 and self.sharded_ddp != ShardedDDPOption.SIMPLE 1592 or is_sagemaker_mp_enabled() 1593 or self.fsdp is not None 1594 ) 1595 if args.deepspeed: -> 1596 deepspeed_engine, optimizer, lr_scheduler = deepspeed_init( 1597 self, num_training_steps=max_steps, resume_from_checkpoint=resume_from_checkpoint 1598 ) 1599 self.model = deepspeed_engine.module 1600 self.model_wrapped = deepspeed_engine

    File /opt/conda/envs/pytorch/lib/python3.9/site-packages/transformers/deepspeed.py:344, in deepspeed_init(trainer, num_training_steps, resume_from_checkpoint, inference) 333 # keep for quick debug: 334 # from pprint import pprint; pprint(config) 336 kwargs = dict( 337 model=model, 338 model_parameters=model_parameters, (...) 341 lr_scheduler=lr_scheduler, 342 ) --> 344 deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs) 346 if resume_from_checkpoint is not None: 347 348 # it's possible that the user is trying to resume from model_path, which doesn't necessarily 349 # contain a deepspeed checkpoint. e.g. examples just check if the dir exists and assume it's 350 # a resume from a checkpoint and not just a local pretrained weight. So we check here if the 351 # path contains what looks like a deepspeed checkpoint 352 import glob

    File /opt/conda/envs/pytorch/lib/python3.9/site-packages/deepspeed/init.py:125, in initialize(args, model, optimizer, model_parameters, training_data, lr_scheduler, mpu, dist_init_required, collate_fn, config, config_params) 122 assert model is not None, "deepspeed.initialize requires a model" 124 if not isinstance(model, PipelineModule): --> 125 engine = DeepSpeedEngine(args=args, 126 model=model, 127 optimizer=optimizer, 128 model_parameters=model_parameters, 129 training_data=training_data, 130 lr_scheduler=lr_scheduler, 131 mpu=mpu, 132 dist_init_required=dist_init_required, 133 collate_fn=collate_fn, 134 config=config, 135 config_params=config_params) 136 else: 137 assert mpu is None, "mpu must be None with pipeline parallelism"

    File /opt/conda/envs/pytorch/lib/python3.9/site-packages/deepspeed/runtime/engine.py:330, in DeepSpeedEngine.init(self, args, model, optimizer, model_parameters, training_data, lr_scheduler, mpu, dist_init_required, collate_fn, config, config_params, dont_change_device) 327 model_parameters = self.module.parameters() 329 if has_optimizer: --> 330 self._configure_optimizer(optimizer, model_parameters) 331 self._configure_lr_scheduler(lr_scheduler) 332 self._report_progress(0)

    File /opt/conda/envs/pytorch/lib/python3.9/site-packages/deepspeed/runtime/engine.py:1195, in DeepSpeedEngine._configure_optimizer(self, client_optimizer, model_parameters) 1193 log_dist('Using client callable to create basic optimizer', ranks=[0]) 1194 else: -> 1195 basic_optimizer = self._configure_basic_optimizer(model_parameters) 1196 log_dist( 1197 f"Using DeepSpeed Optimizer param name {self.optimizer_name()} as basic optimizer", 1198 ranks=[0]) 1200 self._check_for_duplicates(basic_optimizer)

    File /opt/conda/envs/pytorch/lib/python3.9/site-packages/deepspeed/runtime/engine.py:1266, in DeepSpeedEngine._configure_basic_optimizer(self, model_parameters) 1264 else: 1265 from deepspeed.ops.adam import DeepSpeedCPUAdam -> 1266 optimizer = DeepSpeedCPUAdam(model_parameters, 1267 **optimizer_parameters, 1268 adamw_mode=effective_adam_w_mode) 1269 else: 1270 from deepspeed.ops.adam import FusedAdam

    File /opt/conda/envs/pytorch/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py:94, in DeepSpeedCPUAdam.init(self, model_params, lr, bias_correction, betas, eps, weight_decay, amsgrad, adamw_mode, fp32_optimizer_states) 92 self.adam_w_mode = adamw_mode 93 self.fp32_optimizer_states = fp32_optimizer_states ---> 94 self.ds_opt_adam = CPUAdamBuilder().load() 96 self.ds_opt_adam.create_adam(self.opt_id, 97 lr, 98 betas[0], (...) 102 adamw_mode, 103 should_log_le("info"))

    File /opt/conda/envs/pytorch/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py:460, in OpBuilder.load(self, verbose) 458 return importlib.import_module(self.absolute_name()) 459 else: --> 460 return self.jit_load(verbose)

    File /opt/conda/envs/pytorch/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py:495, in OpBuilder.jit_load(self, verbose) 492 torch_arch_list = os.environ.get("TORCH_CUDA_ARCH_LIST") 493 os.environ["TORCH_CUDA_ARCH_LIST"] = "" --> 495 op_module = load( 496 name=self.name, 497 sources=self.strip_empty_entries(sources), 498 extra_include_paths=self.strip_empty_entries(extra_include_paths), 499 extra_cflags=self.strip_empty_entries(self.cxx_args()), 500 extra_cuda_cflags=self.strip_empty_entries(self.nvcc_args()), 501 extra_ldflags=self.strip_empty_entries(self.extra_ldflags()), 502 verbose=verbose) 503 build_duration = time.time() - start_build 504 if verbose:

    File /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/utils/cpp_extension.py:1284, in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates) 1192 def load(name, 1193 sources: Union[str, List[str]], 1194 extra_cflags=None, (...) 1202 is_standalone=False, 1203 keep_intermediates=True): 1204 r''' 1205 Loads a PyTorch C++ extension just-in-time (JIT). 1206 (...) 1282 ... verbose=True) 1283 ''' -> 1284 return _jit_compile( 1285 name, 1286 [sources] if isinstance(sources, str) else sources, 1287 extra_cflags, 1288 extra_cuda_cflags, 1289 extra_ldflags, 1290 extra_include_paths, 1291 build_directory or _get_build_directory(name, verbose), 1292 verbose, 1293 with_cuda, 1294 is_python_module, 1295 is_standalone, 1296 keep_intermediates=keep_intermediates)

    File /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/utils/cpp_extension.py:1508, in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates) 1504 hipified_sources.add(hipify_result[s_abs]["hipified_path"] if s_abs in hipify_result else s_abs) 1506 sources = list(hipified_sources) -> 1508 _write_ninja_file_and_build_library( 1509 name=name, 1510 sources=sources, 1511 extra_cflags=extra_cflags or [], 1512 extra_cuda_cflags=extra_cuda_cflags or [], 1513 extra_ldflags=extra_ldflags or [], 1514 extra_include_paths=extra_include_paths or [], 1515 build_directory=build_directory, 1516 verbose=verbose, 1517 with_cuda=with_cuda, 1518 is_standalone=is_standalone) 1519 finally: 1520 baton.release()

    File /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/utils/cpp_extension.py:1623, in _write_ninja_file_and_build_library(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_standalone) 1621 if verbose: 1622 print(f'Building extension module {name}...', file=sys.stderr) -> 1623 _run_ninja_build( 1624 build_directory, 1625 verbose, 1626 error_prefix=f"Error building extension '{name}'")

    File /opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/utils/cpp_extension.py:1916, in _run_ninja_build(build_directory, verbose, error_prefix) 1914 if hasattr(error, 'output') and error.output: # type: ignore[union-attr] 1915 message += f": {error.output.decode(*SUBPROCESS_DECODE_ARGS)}" # type: ignore[union-attr] -> 1916 raise RuntimeError(message) from e

    RuntimeError: Error building extension 'cpu_adam'
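
    Since the cpu_adam extension is JIT-compiled the first time it is used, a missing or mismatched CUDA toolkit or compiler is the usual culprit. A sketch of what I would check first (my own guess, not a confirmed diagnosis):

    # Verify that a CUDA toolkit and nvcc are visible to the JIT build.
    import os
    import shutil
    import torch

    print(torch.version.cuda)           # CUDA version torch was built with
    print(os.environ.get("CUDA_HOME"))  # toolkit DeepSpeed compiles against
    print(shutil.which("nvcc"))         # nvcc must be on PATH for the build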

    opened by ivrschool 0
  • Deepspeed stuck

    Deepspeed stuck

    When replicating the code, DeepSpeed gets stuck at: [2021-06-29 14:29:44,757] [INFO] [utils.py:13:_initialize_parameter_parallel_groups] data_parallel_size: 1, parameter_parallel_size: 1

    Any ideas on how to fix this?

    opened by SamsTheGreatest 0
  • Saving and loading model / tokenizer issues

    Saving and loading model / tokenizer issues

    If I want to save and run generation on the model later on, I assume I do something like this:

    After training: tokenizer.save_pretrained('./results/')

    Later generation:

    weights = "./results/"
    tokenizer = GPT2Tokenizer.from_pretrained(weights, bos_token='<|startoftext|>', eos_token='<|endoftext|>', pad_token='<|pad|>')
    model = GPTNeoForCausalLM.from_pretrained('./results/checkpoint-90/').cuda()
    

    But I get an error about the size not being right. Any idea why?

    File "generate.py", line 12, in <module>
        model = GPTNeoForCausalLM.from_pretrained('./results/checkpoint-90').cuda()
      File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/transformers/modeling_utils.py", line 1183, in from_pretrained
        raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
    RuntimeError: Error(s) in loading state_dict for GPTNeoForCausalLM:
    	size mismatch for transformer.wte.weight: copying a param with shape torch.Size([1]) from checkpoint, the shape in current model is torch.Size([50258, 768]).
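
    For comparison, this is the save/load pairing I would expect to need: write the model and tokenizer to the same directory and load both from it. My reading of the torch.Size([1]) placeholders is that the checkpoint still holds ZeRO stage 3 partitioned parameters, so the weights may need to be consolidated into a regular state dict first. This is a sketch of the pattern, not a confirmed fix:

    # Assumes `trainer`, `model`, and `tokenizer` from the training step, and that
    # the weights have been consolidated out of the ZeRO-3 partitions.
    trainer.save_model("./results/final")         # pytorch_model.bin + config.json
    tokenizer.save_pretrained("./results/final")  # tokenizer files

    from transformers import GPT2Tokenizer, GPTNeoForCausalLM
    tokenizer = GPT2Tokenizer.from_pretrained("./results/final")
    model = GPTNeoForCausalLM.from_pretrained("./results/final").cuda()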
    
    
    opened by Shane-Neeley 0
Owner
Nikita
Team Lead Java/JVM/C++/Python/ML · 5 years in Fintech · 10+ years in JVM languages
Fine-tune pretrained Convolutional Neural Networks with PyTorch

Fine-tune pretrained Convolutional Neural Networks with PyTorch. Features Gives access to the most popular CNN architectures pretrained on ImageNet. A

Alex Parinov 694 Nov 23, 2022
GPT-Code-Clippy (GPT-CC) is an open source version of GitHub Copilot

GPT-Code-Clippy (GPT-CC) is an open source version of GitHub Copilot, a language model -- based on GPT-3, called GPT-Codex -- that is fine-tuned on publicly available code from GitHub.

null 2.3k Jan 9, 2023
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

Microsoft 8.4k Jan 1, 2023
GPT, but made only out of gMLPs

GPT - gMLP This repository will attempt to crack long context autoregressive language modeling (GPT) using variations of gMLPs. Specifically, it will

Phil Wang 80 Dec 1, 2022
A GPT, made only of MLPs, in Jax

MLP GPT - Jax (wip) A GPT, made only of MLPs, in Jax. The specific MLP to be used are gMLPs with the Spatial Gating Units. Working Pytorch implementat

Phil Wang 53 Sep 27, 2022
Saeed Lotfi 28 Dec 12, 2022
DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight or group of weights, in order to achieve a given trade-off between model size and accuracy.

Differentiable Model Compression via Pseudo Quantization Noise DiffQ performs differentiable quantization using pseudo quantization noise. It can auto

Facebook Research 145 Dec 30, 2022
Weakly Supervised Dense Event Captioning in Videos, i.e. generating multiple sentence descriptions for a video in a weakly-supervised manner.

WSDEC This is the official repo for our NeurIPS paper Weakly Supervised Dense Event Captioning in Videos. Description Repo directories ./: global conf

Melon(Xuguang Duan) 96 Nov 1, 2022
Train emoji embeddings based on emoji descriptions.

emoji2vec This is my attempt to train, visualize and evaluate emoji embeddings as presented by Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko

Miruna Pislar 17 Sep 3, 2022
Official PyTorch implementation of the paper "TEMOS: Generating diverse human motions from textual descriptions"

TEMOS: TExt to MOtionS Generating diverse human motions from textual descriptions Description Official PyTorch implementation of the paper "TEMOS: Gen

Mathis Petrovich 187 Dec 27, 2022
Generate vibrant and detailed images using only text.

CLIP Guided Diffusion From RiversHaveWings. Generate vibrant and detailed images using only text. See captions and more generations in the Gallery See

Clay M. 401 Dec 28, 2022
SparseML is a library for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models

SparseML is a toolkit that includes APIs, CLIs, scripts and libraries that apply state-of-the-art sparsification algorithms such as pruning and quantization to any neural network. General, recipe-driven approaches built around these algorithms enable the simplification of creating faster and smaller models for the ML performance community at large.

Neural Magic 1.5k Dec 30, 2022
sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code

sequitur sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code. It implements three differ

Jonathan Shobrook 305 Dec 21, 2022
Deploy a ML inference service on a budget in less than 10 lines of code.

BudgetML is perfect for practitioners who would like to quickly deploy their models to an endpoint, but not waste a lot of time, money, and effort trying to figure out how to do this end-to-end.

null 1.3k Dec 25, 2022
Train neural network for semantic segmentation (deep lab V3) with pytorch in less than 50 lines of code

Train neural network for semantic segmentation (deep lab V3) with pytorch in 50 lines of code Train net semantic segmentation net using Trans10K datas

null 17 Dec 19, 2022
Create Data & AI apps in 20 lines of code with Shimoku

Install with: pip install shimoku-api-python Start with: from os import getenv import shimoku_api_python.client as Shimoku

Shimoku 5 Nov 7, 2022
Python wrapper class for OpenVINO Model Server. User can submit inference request to OVMS with just a few lines of code

Python wrapper class for OpenVINO Model Server. User can submit inference request to OVMS with just a few lines of code.

Yasunori Shimura 7 Jul 27, 2022
Source code for the GPT-2 story generation models in the EMNLP 2020 paper "STORIUM: A Dataset and Evaluation Platform for Human-in-the-Loop Story Generation"

Storium GPT-2 Models This is the official repository for the GPT-2 models described in the EMNLP 2020 paper [STORIUM: A Dataset and Evaluation Platfor

Nader Akoury 27 Dec 20, 2022
ChatBot-Pytorch - A GPT-2 ChatBot implemented using Pytorch and Huggingface-transformers

ChatBot-Pytorch A GPT-2 ChatBot implemented using Pytorch and Huggingface-transf

ParZival 42 Dec 9, 2022