Source code for the GPT-2 story generation models in the EMNLP 2020 paper "STORIUM: A Dataset and Evaluation Platform for Human-in-the-Loop Story Generation"

Nader Akoury

Last update: Dec 20, 2022

Related tags

Overview

Storium GPT-2 Models

This is the official repository for the GPT-2 models described in the EMNLP 2020 paper [STORIUM: A Dataset and Evaluation Platform for Machine-in-the-Loop Story Generation]. It has all the code necessary to reproduce the models and analysis from the paper.

Overview

A high-level outline of our dataset and platform. In this example from a real STORIUM game, the character ADIRA MAKAROVA uses the strength card DEADLY AIM to DISRUPT THE GERMANS, a challenge card. Our model conditions on the natural language annotations in the scene intro, challenge card, strength card, and character, along with the text of the previous scene entry (not shown) to generate a suggested story continuation. Players may then edit the model output, by adding or deleting text, before publishing the entry. We collect these edits, using the matched text as the basis of our USER metric. New models can be added to the platform by simply implementing four methods: startup, shutdown, preprocess, and generate.

Deployment

This repository contains the code that makes our GPT-2 story generation models deployable on our evaluation platform, so it serves as a great template for how to structure your code. Please see the file figmentate.py for the simple API required for making your model deployable on our platform. You will also need to provide a json file with any properties needed to pass to your startup method. See for example the properties below:

{
  "scene_entry":
  {
    "properties": {
      "checkpoint_path": "/var/lib/figmentator/checkpoint",
      "sample": {
	"top_p": 0.9,
	"temperature": 0.9,
	"repetition_penalty": 1.2
      }
    },
    "requires": ["torch==1.3.0", "transformers==2.2.0", "kiwisolver==1.1.0"],
    "cls": "model=figmentate:GPT2Figmentator"
  }
}

The key scene_entry defines the type of model being created. Currently, we only support models that generate the text of a scene entry, though we might support other types of prediction models in the future, like suggesting cards or narrator actions.

The properties object will be passed to your startup method. It allows for defining any parameters needed for sampling from your model.

The requires list, is simply a list of python packages that need to be installed for your model to run. These will be automatically installed when your model is deployed. If you notice, we specify the deep learning package torch as a requirement. That's because our code is agnostic to the underlying deep learning framework being used by your model. That means it should support models using other frameworks like tensorflow or jax.

Finally, the cls string is the class that wraps your model. It is specified using Python's entry points syntax.

Cite

@inproceedings{akoury2020storium,
  Author = {Nader Akoury, Shufan Wang, Josh Whiting, Stephen Hood, Nanyun Peng and Mohit Iyyer},
  Booktitle = {Empirical Methods for Natural Language Processing},
  Year = "2020",
  Title = {{STORIUM}: {A} {D}ataset and {E}valuation {P}latform for {S}tory {G}eneration}
}

You might also like...

The code repository for EMNLP 2021 paper "Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization".

Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization [Paper] accepted at the EMNLP 2021: Vision Guided Genera

42 Jan 7, 2023

Comments

Can't download the dataset

Hi, your paper "STORIUM: A Dataset and Evaluation Platform for Machine-in-the-Loop Story Generation" is wonderful, I am very interested in the STORIUM dataset, and registered on storium.cs.umass.edu, but still can't download, when clicking the download button in the popup, it just redirects back to the home page.

opened by ghosthamlet 2

Source code for the GPT-2 story generation models in the EMNLP 2020 paper "STORIUM: A Dataset and Evaluation Platform for Human-in-the-Loop Story Generation"

Related tags

Overview

Storium GPT-2 Models

Overview

Deployment

Cite

You might also like...

The code repository for EMNLP 2021 paper "Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization".

《LXMERT: Learning Cross-Modality Encoder Representations from Transformers》(EMNLP 2020)

EMNLP 2020 - Summarizing Text on Any Aspects

TensorFlow code for the neural network presented in the paper: "Structural Language Models of Code" (ICML'2020)

Related resources for our EMNLP 2021 paper Plan-then-Generate: Controlled Data-to-Text Generation via Planning

Pytorch implementation of paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021)

EMNLP 2021 paper Models and Datasets for Cross-Lingual Summarisation.

Source code for CVPR 2020 paper "Learning to Forget for Meta-Learning"

The story of Chicken for Club Bing

Comments

Can't download the dataset

Owner

Nader Akoury

Code to reproduce the experiments in the paper "Transformer Based Multi-Source Domain Adaptation" (EMNLP 2020)

[EMNLP 2020] Keep CALM and Explore: Language Models for Action Generation in Text-based Games

This is the repo for our work "Towards Persona-Based Empathetic Conversational Models" (EMNLP 2020)

Codes for our paper "SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge" (EMNLP 2020)

Pytorch implementation for the EMNLP 2020 (Findings) paper: Connecting the Dots: A Knowledgeable Path Generator for Commonsense Question Answering

UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation

Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Coming soon!

Code for our EMNLP 2021 paper “Heterogeneous Graph Neural Networks for Keyphrase Generation”

Code for our paper Aspect Sentiment Quad Prediction as Paraphrase Generation in EMNLP 2021.

A novel method to tune language models. Codes and datasets for paper ``GPT understands, too''.