code associated with ACL 2021 DExperts paper



Hi! This repository contains code for the paper DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts, to appear at ACL 2021. If you have any questions, please feel free to create a GitHub issue or reach out to the first author at [email protected].

Create a conda environment called dexperts with

conda env create -f environment.yml


To generate continuations with DExperts and score them for toxicity using the PerspectiveAPI toxicity scorer, run the following command.


python -m scripts.run_toxicity_experiment \
    --use-dataset \
    --dataset-file $PROMPTS_DATASET \
    --model-type dexperts \
    --model gpt2-large \
    --nontoxic-model $MODEL_DIR/finetuned_gpt2_nontoxic \
    --toxic-model $MODEL_DIR/finetuned_gpt2_toxic \
    --perspective-rate-limit $API_RATE \
    --alpha 2.0 \
    --filter_p 0.9 \
    $OUTPUT_DIR

In general, model_type is one of gpt2 (the base model), dexperts (our method), and pplm. With an OpenAI API key for GPT-3 access, you can also try gpt3 and dexperts-gpt3. Different methods have different additional parameters to specify; to see the commands we used for each method in our paper, please look under scripts/our_scripts/toxicity. For experiments with GeDi, we directly used the original authors' codebase.

When model_type is dexperts, we can steer away from toxicity using only a toxic anti-expert. To do this, leave --nontoxic-model empty, and DExperts will re-use the base model as the expert. The hyperparameter alpha controls the strength of steering over the base model. Setting filter_p restricts generation to the nucleus (top-p tokens) of the base model, as described in Section 2.2 of our paper.
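The roles of alpha and filter_p can be sketched in a few lines. This is a minimal illustration in plain Python, not the repository's implementation: the function names (`top_p_filter`, `softmax`, `dexperts_probs`) are hypothetical, and logits are plain lists for a single decoding step. It assumes the rule from the paper, where the base logits are first truncated to their top-p nucleus and then shifted by alpha times the difference between expert and anti-expert logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax; logits equal to -inf get probability 0."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(logits, top_p):
    """Keep the smallest set of most likely tokens whose probability mass
    reaches top_p, and set every other logit to -inf."""
    probs = softmax(logits)
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return [z if i in kept else -math.inf for i, z in enumerate(logits)]


def dexperts_probs(base, expert, anti, alpha, filter_p=1.0):
    """Next-token distribution: softmax(base + alpha * (expert - anti)),
    with the base logits restricted to their nucleus first (assumed order)."""
    base = top_p_filter(base, filter_p)
    combined = [b + alpha * (e - a) for b, e, a in zip(base, expert, anti)]
    return softmax(combined)


# Toy vocabulary of four tokens (made-up numbers, for illustration only).
base = [2.0, 1.0, 0.0, -1.0]
expert = [2.0, 1.5, 0.0, -1.0]
anti = [2.0, 0.0, 0.0, 1.0]

steered = dexperts_probs(base, expert, anti, alpha=2.0, filter_p=0.9)
# Anti-expert only: the base logits stand in for the missing expert.
anti_only = dexperts_probs(base, base, anti, alpha=2.0, filter_p=0.9)
print(steered)
print(anti_only)
```

Passing the base logits as `expert` mirrors the anti-expert-only setting described above. With alpha set to 0 the sketch reduces to ordinary nucleus sampling from the base model.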

This script will create three files in OUTPUT_DIR: generations.jsonl with all of the generated continuations, perspective.jsonl with all the scores from Perspective API, and prompted_gens_[model_type].jsonl, which collates the previous two files.

To try a model's output on your own prompts, simply create your own prompts file! To see the format of the prompts file, see prompts/toy_prompt.jsonl.
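As a starting point, a prompts file can be written with a short script. The sketch below assumes each line of the file is a JSON object of the form `{"prompt": {"text": ...}}`. That layout is an assumption here, so check it against prompts/toy_prompt.jsonl before relying on it. The helper names `write_prompts` and `read_prompts` and the file name `my_prompts.jsonl` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_prompts(path, texts):
    """Write one JSON object per line (JSON Lines).
    The {"prompt": {"text": ...}} layout is an assumed format."""
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            f.write(json.dumps({"prompt": {"text": text}}) + "\n")


def read_prompts(path):
    """Read the prompt texts back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line)["prompt"]["text"] for line in f if line.strip()]


# Round-trip two toy prompts through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    prompts_path = Path(tmp) / "my_prompts.jsonl"
    write_prompts(prompts_path, ["The weather today is", "My neighbor always"])
    loaded = read_prompts(prompts_path)

print(loaded)
```

The resulting file would then be passed to the script through --dataset-file.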


To generate continuations with DExperts conditioned on sentiment prompts and score them for sentiment using HuggingFace's sentiment classifier, run the following command.


python -m scripts.run_sentiment_experiment \
    --use-dataset \
    --dataset-file $PROMPTS_DATASET \
    --model-type dexperts \
    --model gpt2-large \
    --pos-model $MODEL_DIR/finetuned_gpt2_positive \
    --neg-model $MODEL_DIR/finetuned_gpt2_negative \
    --alpha 3.2 \
    --filter_p 0.9 \
    $OUTPUT_DIR

The model_type can be any of the options from before, with the addition of ctrl. Again, the full commands used for each method can be found under scripts/our_scripts/sentiment.

When model_type is dexperts, we always interpret --pos-model as the expert and --neg-model as the anti-expert; for negative steering, use alpha < 0. If you leave one of --pos-model or --neg-model empty, DExperts will re-use the base model as the missing expert or anti-expert.
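The sign convention can be written out explicitly. Following the combination rule in the paper, let $z_t$ be the base model's logits at step $t$, $z_t^{+}$ the logits of --pos-model, and $z_t^{-}$ the logits of --neg-model (the symbols are notation chosen here for illustration):

```latex
\tilde{P}(X_t \mid x_{<t})
  = \operatorname{softmax}\!\bigl(z_t + \alpha\,(z_t^{+} - z_t^{-})\bigr),
\qquad
\alpha < 0 \;\Longrightarrow\;
z_t + \alpha\,(z_t^{+} - z_t^{-}) = z_t + |\alpha|\,(z_t^{-} - z_t^{+}).
```

A negative alpha therefore swaps the roles of the two models without swapping the flags. If --neg-model is left empty, $z_t^{-} = z_t$, and likewise $z_t^{+} = z_t$ when --pos-model is left empty.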


To evaluate generated output for fluency and diversity, run the following command. The GENERATIONS_FILE should have the format prompted_gens_[model_type].jsonl.

python -m scripts.evaluation.evaluate_generations \
    --generations_file $GENERATIONS_FILE
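To give a sense of what a diversity score measures, here is a minimal sketch of a distinct n-gram ratio: the number of unique n-grams across a set of generations, divided by the total number of tokens. It assumes whitespace tokenization and a hypothetical helper name `distinct_n`; the evaluation script may tokenize and aggregate differently.

```python
def distinct_n(generations, n):
    """Unique n-grams across all generations, normalized by total token count.
    Whitespace tokenization is an assumption made for this sketch."""
    total_tokens = 0
    unique = set()
    for text in generations:
        tokens = text.split()
        total_tokens += len(tokens)
        unique.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(unique) / total_tokens if total_tokens else 0.0


# Two toy continuations for the same prompt.
gens = ["the cat sat", "the cat ran"]
dist1 = distinct_n(gens, 1)  # 4 unique unigrams over 6 tokens
dist2 = distinct_n(gens, 2)  # 3 unique bigrams over 6 tokens
dist3 = distinct_n(gens, 3)  # 2 unique trigrams over 6 tokens
print(dist1, dist2, dist3)
```

Higher values indicate less repetition across the continuations of a prompt.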


Our Jupyter notebooks are in notebooks/. To obtain the same tables and plots that appear in the paper, look in sentiment_results.ipynb, toxicity_results.ipynb, and human_eval_results.ipynb. To create your own prompts dataset with a couple lines of code, you can get started with prompts_playground.ipynb. Sample and compare generations from each model with review_sentiment_generations.ipynb and review_toxicity_generations.ipynb.

Downloading the original data and models from our paper

To download the prompts we used for evaluation, generations output by each model, and finetuning datasets from our paper, ensure you have gdown installed, then run the following commands inside the dexperts/ root directory. Descriptions of the contents of each of these folders can be found within the folder.

# prompts
unzip && rm
# generations
unzip && rm
# datasets
unzip && rm

To download models from our paper,

mkdir models
cd models
# (anti-)expert models
unzip && rm
# DAPT models
unzip && rm
# PPLM classifiers
unzip && rm


    title = "{DExperts}: Decoding-Time Controlled Text Generation with Experts and Anti-Experts",
    author = "Alisa Liu and Maarten Sap and Ximing Lu and Swabha Swayamdipta and Chandra Bhagavatula and Noah A. Smith and Yejin Choi",
    booktitle = "Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP)",
    year = "2021",
    url = "",

This code was built on top of allenai/real-toxicity-prompts and with inspiration from yangkevin2/naacl-2021-fudge-controlled-generation.

  • OpenWebText non-toxic corpus

    Is it possible to provide the non-toxic subset of OpenWebText required for training DAPT or, even better, the finetuned model? Unfortunately, I do not know the exact steps required to build such a dataset.


    opened by gulleh 4
  • about datasets/jigsaw-unintended-bias-in-toxicity-classification/toxicity_gte0.5.txt files

    Hello, I'm very impressed by your research.

    I want to finetune gpt2 on the toxic dataset, but I cannot find the datasets/jigsaw-unintended-bias-in-toxicity-classification/toxicity_gte0.5.txt files referenced in your toxic_finetune scripts.

    Could you share those files, or the fine-tuned models directly?


    opened by minbeomkim 2
  • Problems when generating continuations

    Hi! I tried to generate continuations with DExperts using the following script.

    python -m scripts.run_toxicity_experiment \
        --use-dataset \
        --dataset-file $PROMPTS_DATASET \
        --model-type dexperts \
        --model gpt2-large \
        --perspective-rate-limit $API_RATE \
        --nontoxic-model $MODEL_DIR/finetuned_gpt2_nontoxic \
        --toxic-model $MODEL_DIR/finetuned_gpt2_toxic \
        --alpha 2.0 \
        --filter_p 0.9 \

    But there is a problem I don't understand. The following is the error message.

     0       For more information on what is currently avai...
    1       Instead, these women watched the man they say ...
    2       Because what really matters is that you are in...
    3       There's another controversial Hollywood racial...
    4       That's compared to the $20 million that San Fr...
    9995             Get The Start-Up Israel's Daily Start-Up
    9996    There was once a tiny rectangle of land, devoi...
    9997                     Michael Hanley, Bishop of Oregon
    9998                  Seriously, how in the world did the
    9999    The 21-year-old attacker, who was not immediat...
    Name: text, Length: 10000, dtype: object
    Process Process-1:
    Traceback (most recent call last):
      File "/data0/xp/anaconda3/envs/dexperts/lib/python3.8/multiprocessing/", line 315, in _bootstrap
      File "/data0/xp/anaconda3/envs/dexperts/lib/python3.8/multiprocessing/", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "/data0/xp/ctg/DExperts/utils/", line 168, in perspective_worker
        api = PerspectiveAPI(rate_limit=rate_limit)
      File "/data0/xp/ctg/DExperts/utils/", line 42, in __init__
        self.service = self._make_service(api_key)
      File "/data0/xp/ctg/DExperts/utils/", line 117, in _make_service
        return'commentanalyzer', 'v1alpha1', developerKey=api_key)
      File "/data0/xp/anaconda3/envs/dexperts/lib/python3.8/site-packages/googleapiclient/", line 131, in positional_wrapper
        return wrapped(*args, **kwargs)
      File "/data0/xp/anaconda3/envs/dexperts/lib/python3.8/site-packages/googleapiclient/", line 287, in build
        content = _retrieve_discovery_doc(
      File "/data0/xp/anaconda3/envs/dexperts/lib/python3.8/site-packages/googleapiclient/", line 404, in _retrieve_discovery_doc
        raise UnknownApiNameOrVersion("name: %s  version: %s" % (serviceName, version))
    googleapiclient.errors.UnknownApiNameOrVersion: name: commentanalyzer  version: v1alpha1
    Generation:   0%|             | 0/7813 [00:00<?, ?it/s, batch_size=32]/data0/xp/anaconda3/envs/dexperts/lib/python3.8/site-packages/transformers/ FutureWarning: The `pad_to_max_length` argument is deprecated and will be removed in a future version, use `padding=True` or `padding='longest'` to pad to the longest sequence in the batch, or use `padding='max_length'` to pad to a max length. In this case, you can give a specific length with `max_length` (e.g. `max_length=45`) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).
    Generation:   0%|             | 0/7813 [00:00<?, ?it/s, batch_size=32]
    Traceback (most recent call last):
      File "/data0/xp/anaconda3/envs/dexperts/lib/python3.8/", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/data0/xp/anaconda3/envs/dexperts/lib/python3.8/", line 87, in _run_code
        exec(code, run_globals)
      File "/data0/xp/ctg/DExperts/scripts/", line 187, in <module>
      File "/data0/xp/anaconda3/envs/dexperts/lib/python3.8/site-packages/click/", line 1128, in __call__
        return self.main(*args, **kwargs)
      File "/data0/xp/anaconda3/envs/dexperts/lib/python3.8/site-packages/click/", line 1053, in main
        rv = self.invoke(ctx)
      File "/data0/xp/anaconda3/envs/dexperts/lib/python3.8/site-packages/click/", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/data0/xp/anaconda3/envs/dexperts/lib/python3.8/site-packages/click/", line 754, in invoke
        return __callback(*args, **kwargs)
      File "/data0/xp/ctg/DExperts/scripts/", line 173, in main
        for i, gen in enumerate(generations_iter):
      File "/data0/xp/ctg/DExperts/generation/", line 202, in dexperts
        yield from _gpt2_helper(
      File "/data0/xp/ctg/DExperts/generation/", line 159, in _gpt2_helper
        batch = generator.generate(prompt, max_len, **generate_kwargs)
      File "/data0/xp/ctg/DExperts/generation/", line 96, in generate
        base_logits = top_k_top_p_filtering(base_logits, top_p=filter_p)
      File "/data0/xp/ctg/DExperts/utils/", line 29, in top_k_top_p_filtering
        sorted_logits, sorted_indices = torch.sort(logits, descending=True)
    TypeError: sort() received an invalid combination of arguments - got (str, descending=bool), but expected one of:
     * (Tensor input, *, bool stable, int dim, bool descending, tuple of Tensors out)
     * (Tensor input, int dim, bool descending, *, tuple of Tensors out)
     * (Tensor input, *, bool stable, name dim, bool descending, tuple of Tensors out)
     * (Tensor input, name dim, bool descending, *, tuple of Tensors out)

    In short, torch.sort expects a Tensor as input, but here it receives a string in place of the logits, and I don't know why. I look forward to your reply. Thank you.

    opened by Richard88888 2
  • Why is steering negative prompts to positive harder than steering positive prompts to negative?

    Hi, this work is excellent and has benefited me a lot. However, we found a strange phenomenon. The conversion rate of the positive prompts to the negative reported in the paper was around 36%, whereas it was around 65% in the setting of negative to positive. This is very counterintuitive, because in general the two should be symmetrical. How do you explain this phenomenon?

    opened by littlehacker26 1
  • How to use DExperts on BART?

    Hello, I want to ask how I can use DExperts with BART. I noticed that you have used it for stylistic rewriting, but I don't know how to set this up. Can you help me? Thank you.

    opened by 20174376 0
  • How to judge whether an expert or an anti-expert is good

    Hello! I have read your paper and have a question about how to judge whether an expert or anti-expert is good. Your paper says that for sentiment control you trained an expert and an anti-expert on the SST-5 dataset. What is the standard for evaluating the fine-tuned models? Is it perplexity or total likelihood loss on the development set? I would really appreciate it if you could answer my question!

    opened by R1047 0
Alisa Liu
