Hardware: RTX 3090, Ryzen 3600, 64 GB RAM.
I am trying to train the 1.3B parameter model on a custom dataset. Training this model takes more memory, apparently because of the longer inputs (it definitely uses more memory), so I am trying to use DeepSpeed. I have changed nothing other than switching to the smaller 1.3B model and reducing the batch size to 4.
The issue I am having is that the loss (I think it's the loss) is overflowing. I know this is due to using mixed/half precision to reduce memory usage. With the provided dataset this is not a problem: it also overflows initially, but the overflow is quickly resolved through internal adjustments. Is there some configuration change I can make so that this custom dataset will train without overflowing?
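As I understand it, the "internal adjustments" on the provided dataset are just DeepSpeed's dynamic loss scaler: it starts at a huge scale (2^32 here, per `initial_dynamic_scale` in the config dump) and halves it after every overflowed step, down to `min_scale`. A toy sketch of that behavior as I understand it (my own simplification, not DeepSpeed's actual code; it ignores the `delayed_shift=2` behavior, which is why the very first log line says "reducing to 4294967296"):

```python
def simulate_loss_scale(num_overflow_steps, init_scale=2**32, min_scale=1):
    """Replay the scale schedule visible in the log: halve on every
    overflowed step, but never drop below min_scale."""
    scale = init_scale
    history = []
    for _ in range(num_overflow_steps):
        history.append(scale)
        scale = max(scale / 2, min_scale)  # halve after an overflow step
    return history

# Matches the log: 2**32, 2**31, ..., down to 1, then stuck at 1.
print(simulate_loss_scale(40)[:3], simulate_loss_scale(40)[-1])
```

On the provided dataset this settles at a scale the gradients can live with; on my dataset it bottoms out at 1 and keeps overflowing, which is what the log below shows.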
Below are some logs; you can see it is still overflowing even with the loss scale at 1.
python gpt_neo_xl_deepspeed.py
Special tokens have been added in the vocabulary, make sure the associated word embedding are fine-tuned or trained.
Max length: 384
[2021-06-08 12:02:15,302] [INFO] [distributed.py:47:init_distributed] Initializing torch distributed with backend: nccl
[2021-06-08 12:02:15,601] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed info: version=0.3.13+12a53b4, git-hash=12a53b4, git-branch=HEAD
[2021-06-08 12:02:15,622] [INFO] [engine.py:77:_initialize_parameter_parallel_groups] data_parallel_size: 1, parameter_parallel_size: 1
Adam Optimizer #0 is created with scalar arithmetic capability.
Config: alpha=0.000050, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1
[2021-06-08 12:02:17,843] [INFO] [engine.py:602:_configure_optimizer] Using DeepSpeed Optimizer param name adamw as basic optimizer
[2021-06-08 12:02:17,843] [INFO] [engine.py:606:_configure_optimizer] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam
Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=<class 'deepspeed.ops.adam.cpu_adam.DeepSpeedCPUAdam'>
[2021-06-08 12:02:17,843] [INFO] [logging.py:60:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer
Initializing ZeRO Stage 3
[2021-06-08 12:02:17,844] [WARNING] [stage3.py:35:] apex was installed without --cpp_ext. Falling back to Python flatten and unflatten.
[2021-06-08 12:02:17,844] [INFO] [utils.py:555:see_memory_usage] Stage 3 intialize beginning
/home/blake/anaconda3/envs/gpt/lib/python3.7/site-packages/torch/cuda/memory.py:346: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
FutureWarning)
/home/blake/anaconda3/envs/gpt/lib/python3.7/site-packages/torch/cuda/memory.py:354: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
FutureWarning)
[2021-06-08 12:02:17,845] [INFO] [utils.py:560:see_memory_usage] MA 2.5 GB Max_MA 5.14 GB CA 5.14 GB Max_CA 5 GB
[2021-06-08 12:02:17,845] [INFO] [utils.py:565:see_memory_usage] CPU Virtual Memory: used = 24.87 GB, percent = 39.6%
[2021-06-08 12:02:17,845] [INFO] [stage3.py:586:init] Reduce bucket size 500000000
[2021-06-08 12:02:17,845] [INFO] [stage3.py:587:init] Allgather bucket size 50000000
[2021-06-08 12:02:22,511] [INFO] [stage3.py:730:init] optimizer state initialized
[2021-06-08 12:02:23,014] [INFO] [utils.py:555:see_memory_usage] After initializing ZeRO optimizer
[2021-06-08 12:02:23,014] [INFO] [utils.py:560:see_memory_usage] MA 0.43 GB Max_MA 5.14 GB CA 5.53 GB Max_CA 6 GB
[2021-06-08 12:02:23,014] [INFO] [utils.py:565:see_memory_usage] CPU Virtual Memory: used = 47.03 GB, percent = 74.9%
[2021-06-08 12:02:23,014] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed Final Optimizer = adamw
[2021-06-08 12:02:23,014] [INFO] [engine.py:439:_configure_lr_scheduler] DeepSpeed using configured LR scheduler = WarmupLR
[2021-06-08 12:02:23,014] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed LR Scheduler = <deepspeed.runtime.lr_schedules.WarmupLR object at 0x7fb428040ed0>
[2021-06-08 12:02:23,014] [INFO] [logging.py:60:log_dist] [Rank 0] step=0, skipped=0, lr=[5e-05], mom=[[0.9, 0.999]]
[2021-06-08 12:02:23,014] [INFO] [config.py:737:print] DeepSpeedEngine configuration:
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] activation_checkpointing_config {
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"partition_activations": false,
"profile": false,
"synchronize_checkpoint_boundary": false
}
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] allreduce_always_fp32 ........ False
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] amp_enabled .................. False
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] amp_params ................... False
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] checkpoint_tag_validation_enabled True
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] checkpoint_tag_validation_fail False
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] disable_allgather ............ False
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] dump_state ................... False
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] dynamic_loss_scale_args ...... {'init_scale': 4294967296, 'scale_window': 1000, 'delayed_shift': 2, 'min_scale': 1}
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] elasticity_enabled ........... False
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] flops_profiler_config ........ {
"detailed": true,
"enabled": false,
"module_depth": -1,
"profile_step": 1,
"top_modules": 3
}
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] fp16_enabled ................. True
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] global_rank .................. 0
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] gradient_accumulation_steps .. 1
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] gradient_clipping ............ 1.0
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] gradient_predivide_factor .... 1.0
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] initial_dynamic_scale ........ 4294967296
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] loss_scale ................... 0
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] memory_breakdown ............. False
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] optimizer_legacy_fusion ...... False
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] optimizer_name ............... adamw
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] optimizer_params ............. {'lr': 5e-05, 'betas': [0.9, 0.999], 'eps': 1e-08}
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] pld_enabled .................. False
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] pld_params ................... False
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] prescale_gradients ........... False
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] scheduler_name ............... WarmupLR
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] scheduler_params ............. {'warmup_min_lr': 0, 'warmup_max_lr': 5e-05, 'warmup_num_steps': 100}
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] sparse_attention ............. None
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] sparse_gradients_enabled ..... False
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] steps_per_print .............. 10
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] tensorboard_enabled .......... False
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] tensorboard_output_path ......
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] train_batch_size ............. 4
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] train_micro_batch_size_per_gpu 4
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] wall_clock_breakdown ......... False
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] world_size ................... 1
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] zero_allow_untested_optimizer False
[2021-06-08 12:02:23,015] [INFO] [config.py:741:print] zero_config .................. {
"allgather_bucket_size": 500000000,
"allgather_partitions": true,
"contiguous_gradients": true,
"cpu_offload": true,
"cpu_offload_params": true,
"cpu_offload_use_pin_memory": false,
"elastic_checkpoint": true,
"load_from_fp32_weights": true,
"max_live_parameters": 1000000000,
"max_reuse_distance": 1000000000,
"overlap_comm": true,
"param_persistence_threshold": 100000,
"prefetch_bucket_size": 50000000,
"reduce_bucket_size": 500000000,
"reduce_scatter": true,
"stage": 3,
"sub_group_size": 1000000000000
}
[2021-06-08 12:02:23,016] [INFO] [config.py:741:print] zero_enabled ................. True
[2021-06-08 12:02:23,016] [INFO] [config.py:741:print] zero_optimization_stage ...... 3
[2021-06-08 12:02:23,016] [INFO] [config.py:747:print] json = {
"fp16":{
"enabled":true,
"min_loss_scale":1,
"opt_level":"O3"
},
"gradient_accumulation_steps":1,
"gradient_clipping":1.0,
"optimizer":{
"params":{
"betas":[
0.9,
0.999
],
"eps":1e-08,
"lr":5e-05
},
"type":"AdamW"
},
"scheduler":{
"params":{
"warmup_max_lr":5e-05,
"warmup_min_lr":0,
"warmup_num_steps":100
},
"type":"WarmupLR"
},
"train_micro_batch_size_per_gpu":4,
"zero_optimization":{
"contiguous_gradients":true,
"cpu_offload":true,
"cpu_offload_params":true,
"overlap_comm":true,
"stage":3
}
}
0%| | 0/63645 [00:00<?, ?it/s][2021-06-08 12:02:25,515] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4294967296, reducing to 4294967296
0%| | 1/63645 [00:02<44:05:31, 2.49s/it][2021-06-08 12:02:27,976] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4294967296, reducing to 2147483648.0
0%| | 2/63645 [00:04<43:44:33, 2.47s/it][2021-06-08 12:02:30,098] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2147483648.0, reducing to 1073741824.0
0%| | 3/63645 [00:07<40:53:49, 2.31s/it][2021-06-08 12:02:32,216] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1073741824.0, reducing to 536870912.0
0%| | 4/63645 [00:09<39:31:55, 2.24s/it][2021-06-08 12:02:34,369] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 536870912.0, reducing to 268435456.0
0%| | 5/63645 [00:11<39:00:10, 2.21s/it][2021-06-08 12:02:36,493] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 268435456.0, reducing to 134217728.0
0%| | 6/63645 [00:13<38:30:25, 2.18s/it][2021-06-08 12:02:38,621] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 134217728.0, reducing to 67108864.0
0%| | 7/63645 [00:15<38:13:03, 2.16s/it][2021-06-08 12:02:40,756] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 67108864.0, reducing to 33554432.0
0%| | 8/63645 [00:17<38:03:59, 2.15s/it][2021-06-08 12:02:42,877] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 33554432.0, reducing to 16777216.0
0%| | 9/63645 [00:19<37:53:00, 2.14s/it][2021-06-08 12:02:45,001] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16777216.0, reducing to 8388608.0
[2021-06-08 12:02:45,002] [INFO] [timer.py:157:stop] 0/10, SamplesPerSec=1.8825260836384787
0%| | 10/63645 [00:21<37:46:42, 2.14s/it][2021-06-08 12:02:47,123] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8388608.0, reducing to 4194304.0
0%| | 11/63645 [00:24<37:41:45, 2.13s/it][2021-06-08 12:02:49,251] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4194304.0, reducing to 2097152.0
0%| | 12/63645 [00:26<37:40:18, 2.13s/it][2021-06-08 12:02:51,371] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2097152.0, reducing to 1048576.0
0%| | 13/63645 [00:28<37:36:38, 2.13s/it][2021-06-08 12:02:53,504] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1048576.0, reducing to 524288.0
0%| | 14/63645 [00:30<37:38:20, 2.13s/it][2021-06-08 12:02:55,643] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 524288.0, reducing to 262144.0
0%| | 15/63645 [00:32<37:41:12, 2.13s/it][2021-06-08 12:02:57,770] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144.0, reducing to 131072.0
0%| | 16/63645 [00:34<37:39:38, 2.13s/it][2021-06-08 12:02:59,892] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072.0, reducing to 65536.0
0%| | 17/63645 [00:36<37:37:35, 2.13s/it][2021-06-08 12:03:02,036] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536.0, reducing to 32768.0
0%| | 18/63645 [00:39<37:41:26, 2.13s/it][2021-06-08 12:03:04,169] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0
0%| | 19/63645 [00:41<37:41:35, 2.13s/it][2021-06-08 12:03:06,275] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0
[2021-06-08 12:03:06,275] [INFO] [timer.py:157:stop] 0/20, SamplesPerSec=1.8830545084715222
0%| | 20/63645 [00:43<37:32:58, 2.12s/it][2021-06-08 12:03:08,413] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0
0%| | 21/63645 [00:45<37:37:14, 2.13s/it][2021-06-08 12:03:10,537] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096.0, reducing to 2048.0
0%| | 22/63645 [00:47<37:35:54, 2.13s/it][2021-06-08 12:03:12,696] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048.0, reducing to 1024.0
0%| | 23/63645 [00:49<37:45:52, 2.14s/it][2021-06-08 12:03:14,855] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1024.0, reducing to 512.0
0%| | 24/63645 [00:51<37:52:39, 2.14s/it][2021-06-08 12:03:17,010] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 512.0, reducing to 256.0
0%| | 25/63645 [00:53<37:56:31, 2.15s/it][2021-06-08 12:03:19,143] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 256.0, reducing to 128.0
0%| | 26/63645 [00:56<37:51:51, 2.14s/it][2021-06-08 12:03:21,267] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 128.0, reducing to 64.0
0%| | 27/63645 [00:58<37:45:52, 2.14s/it][2021-06-08 12:03:23,390] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 64.0, reducing to 32.0
0%| | 28/63645 [01:00<37:41:21, 2.13s/it][2021-06-08 12:03:25,526] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32.0, reducing to 16.0
0%| | 29/63645 [01:02<37:42:30, 2.13s/it][2021-06-08 12:03:27,669] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16.0, reducing to 8.0
[2021-06-08 12:03:27,670] [INFO] [timer.py:157:stop] 0/30, SamplesPerSec=1.8793183327344205
0%| | 30/63645 [01:04<37:45:17, 2.14s/it][2021-06-08 12:03:29,803] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8.0, reducing to 4.0
0%| | 31/63645 [01:06<37:44:34, 2.14s/it][2021-06-08 12:03:31,936] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4.0, reducing to 2.0
0%| | 32/63645 [01:08<37:43:25, 2.13s/it][2021-06-08 12:03:34,071] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2.0, reducing to 1.0
0%| | 33/63645 [01:11<37:43:30, 2.13s/it][2021-06-08 12:03:36,202] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1.0, reducing to 1
0%| | 34/63645 [01:13<37:42:09, 2.13s/it][2021-06-08 12:03:38,344] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1
0%| | 35/63645 [01:15<37:44:52, 2.14s/it][2021-06-08 12:03:40,449] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1
0%| | 36/63645 [01:17<37:34:48, 2.13s/it][2021-06-08 12:03:42,570] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1
0%| | 37/63645 [01:19<37:32:51, 2.13s/it][2021-06-08 12:03:44,722] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1
0%| | 38/63645 [01:21<37:41:34, 2.13s/it][2021-06-08 12:03:46,892] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1
0%| | 39/63645 [01:23<37:53:09, 2.14s/it][2021-06-08 12:03:49,014] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1
[2021-06-08 12:03:49,015] [INFO] [timer.py:157:stop] 0/40, SamplesPerSec=1.8786745912560845
 0%| | 40/63645 [01:25<37:46:00, 2.14s/it][2021-06-08 12:03:51,134] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1
 0%| | 41/63645 [01:28<37:40:15, 2.13s/it][2021-06-08 12:03:53,292] [INFO] [stage3.py:2323:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1
(... every subsequent step overflows the same way, with the loss scale stuck at 1 ...)
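For reference, the only mitigations I have come up with so far (untested guesses on my part) are starting the dynamic scaler lower so it settles faster, or disabling fp16 entirely, via the `fp16` section of the DeepSpeed config, something like:

```json
{
  "fp16": {
    "enabled": true,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "min_loss_scale": 1
  }
}
```

Disabling fp16 (`"enabled": false`) would presumably avoid the overflow altogether but defeats the memory savings I need on a 24 GB card, so I am hoping there is a config-level fix that keeps half precision working.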