CoFiPruning: Structured Pruning Learns Compact and Accurate Models (ACL'22)

This repository contains the code and pruned models for our ACL'22 paper Structured Pruning Learns Compact and Accurate Models.

**************************** Updates ****************************

  • 05/09/2022: We release the pruned model checkpoints on RTE, MRPC and CoLA!
  • 04/01/2022: We released our paper along with pruned model checkpoints on SQuAD, SST-2, QNLI and MNLI. Check it out!


Overview

We propose CoFiPruning, a task-specific structured pruning approach (Coarse- and Fine-grained Pruning), and show that structured pruning can produce highly compact subnetworks with large speedups and accuracy competitive with distillation approaches, while requiring far less computation. Our key insight is to jointly prune coarse-grained units (e.g., entire self-attention or feed-forward layers) and fine-grained units (e.g., heads, hidden dimensions). Unlike existing work, our approach controls the pruning decision of every single parameter through multiple masks of different granularity. This is the key to large compression: it allows the greatest flexibility in pruned structures and eases optimization compared to pruning only small units. We also devise a layerwise distillation strategy to transfer knowledge from the unpruned to the pruned model during optimization.
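
To make the mask composition concrete, below is a minimal sketch of how masks of different granularity jointly gate the same activations in a self-attention sublayer. This is our illustration, not code from this repository, and all tensor names are made up:

import torch

# illustrative shapes for a BERT-base layer
num_heads, head_dim, hidden = 12, 64, 768

z_mha = torch.tensor(1.0)       # coarse mask: keep/drop the whole MHA sublayer
z_head = torch.ones(num_heads)  # fine mask: one gate per attention head
z_hidden = torch.ones(hidden)   # fine mask: one gate per hidden dimension

# per-head context vectors: (batch, heads, seq_len, head_dim)
context = torch.randn(2, num_heads, 16, head_dim)

# an activation survives only if every mask covering it is nonzero
gated = z_mha * z_head.view(1, -1, 1, 1) * context
output = gated.permute(0, 2, 1, 3).reshape(2, 16, hidden) * z_hidden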

Main Results

We show the main results of CoFiPruning along with results of popular pruning and distillation methods, including Block Pruning, DynaBERT, DistilBERT and TinyBERT. Please see our paper for more detailed results.

Model List

Our released models are listed as follows. You can download them with the links below. We use a batch size of 128 and 32GB V100 GPUs for speedup evaluation. We report F1 for SQuAD and accuracy for the GLUE datasets. s60 denotes that the model's sparsity is roughly 60%.

model name                    task   sparsity  speedup  score
princeton-nlp/CoFi-MNLI-s60   MNLI   60.2%     2.1×     85.3
princeton-nlp/CoFi-MNLI-s95   MNLI   94.3%     12.1×    80.6
princeton-nlp/CoFi-QNLI-s60   QNLI   60.3%     2.1×     91.8
princeton-nlp/CoFi-QNLI-s95   QNLI   94.5%     12.1×    86.1
princeton-nlp/CoFi-SST2-s60   SST-2  60.1%     2.1×     93.0
princeton-nlp/CoFi-SST2-s95   SST-2  94.5%     12.2×    90.4
princeton-nlp/CoFi-SQuAD-s60  SQuAD  59.8%     2.0×     89.1
princeton-nlp/CoFi-SQuAD-s93  SQuAD  92.4%     8.7×     82.6
princeton-nlp/CoFi-RTE-s60    RTE    60.2%     2.0×     72.6
princeton-nlp/CoFi-RTE-s96    RTE    96.2%     12.8×    66.1
princeton-nlp/CoFi-CoLA-s60   CoLA   60.4%     2.0×     60.4
princeton-nlp/CoFi-CoLA-s95   CoLA   95.1%     12.3×    38.9
princeton-nlp/CoFi-MRPC-s60   MRPC   61.5%     2.0×     86.8
princeton-nlp/CoFi-MRPC-s95   MRPC   94.9%     12.2×    83.6

You can use these models with the HuggingFace interface (the snippet below assumes the checkpoint ships with a standard BERT tokenizer):

from transformers import AutoTokenizer
from CoFiPruning.models import CoFiBertForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/CoFi-MNLI-s95")
model = CoFiBertForSequenceClassification.from_pretrained("princeton-nlp/CoFi-MNLI-s95")
inputs = tokenizer("A soccer game.", "Some men are playing a sport.", return_tensors="pt")
output = model(**inputs)

Train CoFiPruning

In the following section, we provide instructions on training CoFi with our code.

Requirements

Try running the following script to install the dependencies.

pip install -r requirements.txt

Training

Training scripts

We provide example training scripts for CoFiPruning with different combinations of pruning units and objectives in scripts/run_CoFi.sh. The script only supports single-GPU training; we explain the arguments below:

  • --task_name: we support sequence classification tasks and extractive question answering tasks. You can pass a GLUE task name, e.g., MNLI, or use the --train_file and --validation_file arguments for other tasks (supported by HuggingFace).
  • --ex_name_suffix: experiment name (for output dir)
  • --ex_cate: experiment category name (for output dir)
  • --pruning_type: we support all combinations of the following four types of pruning units. The default pruning type is structured_heads+structured_mlp+hidden+layer; setting it to None falls back to standard fine-tuning.
    • structured_heads: head pruning
    • structured_mlp: mlp intermediate dimension pruning
    • hidden: hidden states pruning
    • layer: layer pruning
  • --target_sparsity: target sparsity of the pruned model
  • --distillation_path: the directory of the teacher model
  • --distillation_layer_loss_alpha: weight of the layerwise distillation loss (see the sketch after this list)
  • --distillation_ce_loss_alpha: weight of the cross-entropy distillation loss
  • --layer_distill_version: we recommend version 4 for small datasets, as it imposes an explicit constraint on layer order; for relatively large datasets, versions 3 and 4 make little difference.
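
As referenced in the list above, the two alpha arguments weight a layerwise hidden-state loss and a cross-entropy (logit) loss. Below is a minimal sketch of how such a combined objective typically looks; it is our illustration, assuming temperature-scaled logit distillation and MSE layer matching, not the repository's exact implementation:

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      layer_alpha=0.9, ce_alpha=0.1, T=2.0):
    # logit distillation with temperature T
    ce = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    # layerwise distillation: match selected hidden states
    layer = sum(F.mse_loss(s, t) for s, t in zip(student_hidden, teacher_hidden))
    return layer_alpha * layer + ce_alpha * ce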

After pruning, the same script can be used to further fine-tune the pruned model with the following arguments:

  • --pretrained_pruned_model: the directory of the pruned model
  • --learning_rate: the learning rate of the fine-tuning stage

Note that during the fine-tuning stage, pruning_type should be set to None.

An example for training (pruning) is as follows:

TASK=MNLI
SUFFIX=sparsity0.95
EX_CATE=CoFi
PRUNING_TYPE=structured_heads+structured_mlp+hidden+layer
SPARSITY=0.95
DISTILL_LAYER_LOSS_ALPHA=0.9
DISTILL_CE_LOSS_ALPHA=0.1
LAYER_DISTILL_VERSION=4

bash scripts/run_CoFi.sh $TASK $SUFFIX $EX_CATE $PRUNING_TYPE $SPARSITY [DISTILLATION_PATH] $DISTILL_LAYER_LOSS_ALPHA $DISTILL_CE_LOSS_ALPHA $LAYER_DISTILL_VERSION

An example for fine-tuning after pruning is as follows:

PRUNED_MODEL_PATH=$proj_dir/$TASK/$EX_CATE/${TASK}_${SUFFIX}/best
PRUNING_TYPE=None # Setting the pruning type to be None for standard fine-tuning.
LEARNING_RATE=3e-5

bash scripts/run_CoFi.sh $TASK $SUFFIX $EX_CATE $PRUNING_TYPE $SPARSITY [DISTILLATION_PATH] $DISTILL_LAYER_LOSS_ALPHA $DISTILL_CE_LOSS_ALPHA $LAYER_DISTILL_VERSION [PRUNED_MODEL_PATH] $LEARNING_RATE

The training process will save the model with the best validation accuracy under $PRUNED_MODEL_PATH/best, and you can use the evaluation.py script for evaluation.

Evaluation

Our pruned models are hosted on the HuggingFace model hub. You can use the script evaluation.py to get the sparsity, inference time and development-set results of a pruned model.

python evaluation.py [TASK] [MODEL_NAME_OR_DIR]

An example of evaluating a sentence classification model is as follows:

python evaluation.py MNLI princeton-nlp/CoFi-MNLI-s95 

The expected output is as follows:

Task: MNLI
Model path: princeton-nlp/CoFi-MNLI-s95
Model size: 4920106
Sparsity: 0.943
mnli/acc: 0.8055
seconds/example: 0.010151
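
As a rough sanity check on these numbers: the reported model size counts the remaining encoder parameters, and bert-base has on the order of 85M prunable parameters (the 85M figure is our assumption, not a value printed by the script):

full_size = 85_000_000            # assumed prunable parameters in bert-base
model_size = 4_920_106            # "Model size" from the output above
print(1 - model_size / full_size) # ~0.942, consistent with "Sparsity: 0.943"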

Hyperparameters

We use the following hyperparameters for training CoFiPruning:

                               GLUE (small)      GLUE (large)      SQuAD
Batch size                     32                32                16
Pruning learning rate          2e-5              2e-5              3e-5
Fine-tuning learning rate      1e-5, 2e-5, 3e-5  1e-5, 2e-5, 3e-5  1e-5, 2e-5, 3e-5
Layer distill. alpha           0.9, 0.7, 0.5     0.9, 0.7, 0.5     0.9, 0.7, 0.5
Cross-entropy distill. alpha   0.1, 0.3, 0.5     0.1, 0.3, 0.5     0.1, 0.3, 0.5
Pruning epochs                 100               20                20
Pre-finetuning epochs          4                 1                 1
Sparsity warmup epochs         20                2                 2
Finetuning epochs              20                20                20

GLUE (small) denotes the smaller GLUE tasks: CoLA, STS-B, MRPC and RTE; GLUE (large) denotes the rest: SST-2, MNLI, QQP and QNLI. Note that hyperparameter search is essential for small datasets but less important for large ones.

Bugs or Questions?

If you have any questions related to the code or the paper, feel free to email Mengzhou ([email protected]) and Zexuan ([email protected]). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please describe the problem in detail so we can help you better and more quickly!

Citation

Please cite our paper if you use CoFiPruning in your work:

@inproceedings{xia2022structured,
   title={Structured Pruning Learns Compact and Accurate Models},
   author={Xia, Mengzhou and Zhong, Zexuan and Chen, Danqi},
   booktitle={Association for Computational Linguistics (ACL)},
   year={2022}
}
Comments
  • About the diag() and distillation in your paper

    Hi @xiamengzhou, many thanks for your contribution. I have some small questions about your paper. You say that

    "FFN pruning introduces a z_int"

    and the paper gives an equation for it, but what is diag? Why do we have to put z_int into a diagonal matrix? Is diag(z_int) of size d_f × d_f?

    [screenshot of the FFN equation from the paper]

    You also say:

    "Coarse-grained and Fine-grained units (§3.1) with a layerwise distillation objective transferring knowledge from unpruned to pruned models (§3.2)"

    "However, distilling intermediate layers during the pruning process is challenging as the model structure changes throughout training. (previous method)"

    So are we pruning a student model during distillation?

    Many thanks!!

    opened by CaffreyR 6
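
    For context, the equation being asked about has the form below (a reconstruction from the question's notation, with $W_U \in \mathbb{R}^{d \times d_f}$ and $W_D \in \mathbb{R}^{d_f \times d}$; so yes, $\mathrm{diag}(z_{\mathrm{int}})$ is a $d_f \times d_f$ diagonal matrix):

    $\mathrm{FFN}(X) = z_{\mathrm{FFN}} \cdot \mathrm{gelu}(X W_U) \cdot \mathrm{diag}(z_{\mathrm{int}}) \cdot W_D$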
  • More numbers on other sparsities

    CoFi is great work that may benefit research in related areas.

    However, I found that the task-performance numbers at other sparsities are not available. Could you please provide these numbers in detail?

    Besides, metrics other than accuracy on GLUE would also be appreciated.

    opened by GeneZC 6
  • The usage of L_c

    [screenshot of the loss equation] I do not understand how this loss works: since $\lambda_1$ and $\lambda_2$ are 0 by default, I find that the loss is sometimes negative.

    opened by Ther-nullptr 6
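
    For context, a minimal sketch of the penalty being discussed (our reconstruction, not the repository's code): $\lambda_1$ and $\lambda_2$ start at 0 but are updated adversarially during training, so $\lambda_1$ can become negative, and the overall term can then dip below zero.

    import torch

    lambda1 = torch.zeros(1, requires_grad=True)  # learned multiplier
    lambda2 = torch.zeros(1, requires_grad=True)  # learned multiplier

    def lagrangian_penalty(expected_sparsity, target_sparsity):
        # L_c = lambda1 * (s_hat - t) + lambda2 * (s_hat - t)^2
        gap = expected_sparsity - target_sparsity
        return lambda1 * gap + lambda2 * gap ** 2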
  • A Few issues with reproducing the code

    Hello,

    I am trying to run your codebase, but I am having some issues:

    1. The set of Python requirements cannot be installed due to incompatibilities. Are these requirements strict, or can they be relaxed?

    2. After relaxing them, evaluation runs fine, but training requires a --distillation_path. Could you provide an example of how to use this argument?

    3. To get around 2, I set additional_args.do_distill to False. This results in one epoch being trained, but it crashes at the end. The model loss successfully decreases, but reg loss and lag loss are 0. The error at the end is an assertion failure: assert "head" in self.types in the l0 module.

    Could you help me or provide pointers on resolving the above?

    Thank you

    opened by ctsan 6
  • layer-distillation: teacher layer sets selection?

    The original paper mentions: "Specifically, let T denote a set of teacher layers that we use to distill knowledge to the student model." The code in the trainer provides only "[2, 5, 8, 11]", which is just one of the settings in the Appendix. Any suggestions on selecting teacher layer sets for distillation? Four layers at most? Which four layers are proper? How do we specify task-aware settings? The student has 12 layers; why do we only select from the given 4 teacher layers? What about 5, 6, or 12 layers for T? I think this is critical for reproducing the results; so far I can barely match the reported scores.

    opened by zhangzhenyu13 4
  • training error about qnli

    Great job! However, when I reach the 3rd training epoch on the QNLI task, I encounter the following problem; the CoLA and SQuAD tasks do not hit it. Do you have any suggestions? I would be very grateful!

    [error screenshot] The error appears to come from the following block in https://github.com/princeton-nlp/CoFiPruning/blob/main/trainer/trainer.py, lines 680-685:

    lagrangian_loss = None
    if self.start_prune:
        lagrangian_loss, _, _ = \
            self.l0_module.lagrangian_regularization(
                self.global_step - self.prepruning_finetune_steps)
        loss += lagrangian_loss
    
    opened by iMountTai 4
  • Device incompatibility?

    Hello,

    In the following line: https://github.com/princeton-nlp/CoFiPruning/blob/022847ae88f49fa7b8fc58f9c0613492fd1230cc/trainer/trainer.py#L599

    the existing_layers tensor is on the CPU while the result of indexes < last_aligned_layer is on the GPU, which throws an error.

    Is this a bug? Maybe first move existing_layers to the GPU?

    opened by ctsan 4
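
    A one-line sketch of the workaround the issue suggests (variable names as in the issue):

    # move the CPU tensor onto the same device before the comparison is used
    existing_layers = existing_layers.to(indexes.device)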
  • Troubles reproducing the results

    Hello, thank you for providing the code. I have a question about how to reproduce the 95%-sparsity result on MNLI with the following commands:

    TASK=MNLI
    SUFFIX=sparsity0.95
    EX_CATE=CoFi
    PRUNING_TYPE=structured_heads+structured_mlp+hidden+layer
    SPARSITY=0.95
    DISTILL_LAYER_LOSS_ALPHA=0.9
    DISTILL_CE_LOSS_ALPHA=0.1
    LAYER_DISTILL_VERSION=4
    DISTILLATION_PATH=dynabert/MNLI
    CUDA_VISIBLE_DEVICES=1 bash scripts/run_CoFi.sh $TASK $SUFFIX $EX_CATE $PRUNING_TYPE $SPARSITY $DISTILLATION_PATH $DISTILL_LAYER_LOSS_ALPHA $DISTILL_CE_LOSS_ALPHA $LAYER_DISTILL_VERSION
    

    I get the following results, with accuracy 78.20 on MNLI:

    wandb: Run history:
    wandb:                   eval/loss ▃▁▂▂▂▃██▆▆▅▅▅▅▆▅▅▄▄▅▄▅▅▅▄▄▄▄▄▄▄▄▄▄▄▄▅▄▄▄
    wandb:              train/accuracy ▆█▇██▇▁▁▃▄▄▄▄▅▄▅▅▅▅▅▅▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆▆▆▆
    wandb:     train/expected_sparsity ▁▃▄▆████████████████████████████████████
    wandb:           train/global_step ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
    wandb:           train/hidden_dims █████▁▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃
    wandb:              train/lag_loss ▆▆▇▆▆█▁▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆
    wandb:         train/learning_rate █████▇▇▇▇▇▆▆▆▆▆▆▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▁▁▁
    wandb:                  train/loss ▂▁▆▂▂▇▃█▇▇▇▇▇▇▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▅▆▅▆▆▆
    wandb: train/pruned_model_sparsity ▁▃▄▆████████████████████████████████████
    wandb:         train/pruned_params ▁▃▄▆████████████████████████████████████
    wandb:              train/reg_loss ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
    wandb:      train/remaining_params █▆▅▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
    wandb:       train/target_sparsity ▁▂▄▆████████████████████████████████████
    wandb:
    wandb: Run summary:
    wandb:                   eval/loss 0.66644
    wandb:              train/accuracy 0.78197
    wandb:     train/expected_sparsity 0.94999
    wandb:           train/global_step 0
    wandb:           train/hidden_dims 764
    wandb:              train/lag_loss 1e-05
    wandb:         train/learning_rate 0.0
    wandb:                  train/loss 0.40625
    wandb: train/pruned_model_sparsity 0.95561
    wandb:         train/pruned_params 81243440
    wandb:              train/reg_loss 0.0
    wandb:      train/remaining_params 3774160
    wandb:       train/target_sparsity 0.95
    

    By the way, I found some issues while reproducing:

    1. In evaluation.py:77, datasets["validation"] should be datasets["validation_matched"] for MNLI.
    2. Label map: dynabert and princeton-nlp/CoFi-MNLI-s95 use a different label map for MNLI than datasets.load_dataset, so directly evaluating with python evaluation.py MNLI princeton-nlp/CoFi-MNLI-s95 gives wrong results.
    3. Pruning is not applied to trained models in evaluation.py. For example, the model is not pruned according to zs.pt with python evaluation.py MNLI ./out/MNLI/CoFi/MNLI_sparsity0.95.
    opened by kongds 4
  • Removing the already-pruned parts in the model may cause some changes in the outputs

    Hi! I am trying to apply CoFi pruning to my own model, and I noticed there may be edge cases where removing the already-pruned parts of the model changes the outputs. I think this happens when all the dims of the intermediate layer are removed.

    I found that when intermediate_zs are all zero, intermediate.dense in the pruned model is set to None https://github.com/princeton-nlp/CoFiPruning/blob/5423094e7b318806462f2a7bdca5384d078e5eed/utils/cofi_utils.py#L229-L231 , and the FFN part is then skipped https://github.com/princeton-nlp/CoFiPruning/blob/5423094e7b318806462f2a7bdca5384d078e5eed/models/modeling_bert.py#L364-L365

    But before pruning, intermediate.dense is not None, and these zero outputs still pass through CoFiBertOutput.dense, which adds a bias to the output https://github.com/princeton-nlp/CoFiPruning/blob/5423094e7b318806462f2a7bdca5384d078e5eed/models/modeling_bert.py#L562-L566 , so the FFN part is not skipped.

    Should I change some part of my code to skip the FFN parts when intermediate_zs are all zero during training?

    opened by backspacetg 3
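
    A minimal, self-contained sketch of the discrepancy described above (illustrative BERT-style code, not this repository's): with every intermediate gate at zero, the unpruned FFN still contributes the output projection's bias through the residual LayerNorm, so skipping the block entirely changes the result.

    import torch
    import torch.nn as nn

    hidden, d_f = 768, 3072
    dense_out = nn.Linear(d_f, hidden)    # plays the role of CoFiBertOutput.dense
    ln = nn.LayerNorm(hidden)
    x = torch.randn(1, 4, hidden)

    z_int = torch.zeros(d_f)              # every intermediate dim pruned
    inter = torch.relu(torch.randn(1, 4, d_f)) * z_int   # all zeros
    before = ln(dense_out(inter) + x)     # the bias still flows into the residual
    after = x                             # the pruned model skips the block
    print(torch.allclose(before, after))  # False: the outputs differ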
  • Fatal Logic Error found in trainer.py

    In the file https://github.com/princeton-nlp/CoFiPruning/blob/main/trainer/trainer.py, line 279 specifies the following statement:

    if self.start_prune:
        zs = self.l0_module.forward(training=True)
        self.fill_inputs_with_zs(zs, inputs)

    Only when this runs do we get gradients for the params in self.l0_optimizer, and it runs only once the condition below is satisfied (line 268):

    if self.prepruning_finetune_steps > 0 and self.global_step == self.prepruning_finetune_steps:
        self.start_prune = True

    However, line 301 just updates the params directly, without checking whether the grads are ready:

    if self.l0_module is not None and self.l0_optimizer is not None:
        self.l0_optimizer.step()
        self.lagrangian_optimizer.step()

    Therefore AdamW hits a bug: beta1/beta2 are referenced before definition in its step method. Since the grads of these params are all None, the AdamW implementation skips defining the hyperparameters via the self.group dict.

    opened by zhangzhenyu13 3
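
    A sketch of the guard this issue implies (our illustration of a possible fix, not a patch from the maintainers):

    # only step the mask/Lagrangian optimizers once pruning has started,
    # i.e., once the l0 parameters have actually received gradients
    if self.start_prune and self.l0_module is not None and self.l0_optimizer is not None:
        self.l0_optimizer.step()
        self.lagrangian_optimizer.step()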
  • Experimental results

    Hello @xiamengzhou! My result on the SQuAD task is 79.74, which is quite different from the result (82.6) in the paper. Could you share the detailed parameters? My teacher model's F1 is 88.43. I would be very grateful!

    opened by iMountTai 2
  • Generating predictions with CoFi models

    Hi, first of all, thanks a lot for open-sourcing your code and models!

    I've been trying to use your code to generate predictions with CoFi models (with --do_predict, e.g., on the test splits of GLUE tasks), but the prediction loop always fails with a CUDA OOM exception (even on an 80GB A100 GPU). Could you please try this and let me know if I did something wrong?

    opened by eldarkurtic 1
  • Error in fine-tuning with a pruned model: AttributeError: 'NoneType' object has no attribute 'forward'

    Hello @xiamengzhou! When I use your script to fine-tune the pruned model, there is an issue, but I have no idea what causes it. What's wrong with my setup?

    [error screenshot]

    TASK=MRPC
    SUFFIX=sparsity0.95
    EX_CATE=CoFi
    SPARSITY=0.95
    DISTILL_LAYER_LOSS_ALPHA=0.9
    DISTILL_CE_LOSS_ALPHA=0.1
    LAYER_DISTILL_VERSION=4
    SPARSITY_EPSILON=0.01
    DISTILLATION_PATH=/home/tt6232/KdQuant/teacher-model/bert-base-uncased/
    
    PRUNED_MODEL_PATH=./out/$TASK/$EX_CATE/${TASK}_${SUFFIX}/best
    PRUNING_TYPE=None # Setting the pruning type to be None for standard fine-tuning.
    LEARNING_RATE=3e-5
    
    bash scripts/run_CoFi.sh $TASK $SUFFIX $EX_CATE $PRUNING_TYPE $SPARSITY $DISTILLATION_PATH $DISTILL_LAYER_LOSS_ALPHA $DISTILL_CE_LOSS_ALPHA $LAYER_DISTILL_VERSION $PRUNED_MODEL_PATH $LEARNING_RATE &
    
    opened by zoetu 3
  • How to get the loss of `lagrangian_regularization`

    Hi! In your code you calculate L_c:

    https://github.com/princeton-nlp/CoFiPruning/blob/main/trainer/trainer.py#L682

    And you use expected_size to calculate expected_sparsity, but does it match the equation in your paper?

    https://github.com/princeton-nlp/CoFiPruning/blob/main/models/l0_module.py#L267

    You said that ŝ is the expected model sparsity calculated from z, but lagrangian_regularization() does not take inputs or z. Many thanks!

    opened by CaffreyR 5
  • How to prune a model from the very beginning?

    Hi @xiamengzhou, thanks for your contribution. In your code you use Model.from_pretrained to load the model architecture from files you have already provided. But if I want to prune my own, original model, for instance a T5 model, using the method in the paper, which code should I check? Many thanks :)

    opened by CaffreyR 4
  • potential bug loading a pruned model with no masks

    load_pruned_model in the cofi_utils file seems to take a model as its first argument, but load_model(...) calls load_pruned_model with a string. In that case the program crashes because, for example, the string doesn't have a config attribute.

    opened by ctsan 2