Beyond Paragraphs: NLP for Long Sequences
This NAACL 2021 tutorial will be held on Sunday, June 6, 2021.
Location & Time
- Location: Underline.io (Zoom link available upon registration)
- Time: 8am-12pm PDT / 11am-3pm EDT / 3pm-7pm GMT
- Schedule
PDT | EDT | GMT | Activity | Location |
---|---|---|---|---|
8-9:30 | 11-12:30 | 3-4:30 | Watch Parts 1, 2, and 3 | Prerecorded videos |
9:30-10 | 12:30-1 | 4:30-5 | Break + optional QnA | Zoom |
10-11 | 1-2 | 5-6 | Watch Parts 4 and 5 | Prerecorded videos |
11-12 | 2-3 | 6-7 | QnA | Zoom |
Speakers
- Iz Beltagy (AI2)
- Arman Cohan (AI2)
- Hannaneh Hajishirzi (UW, AI2)
- Sewon Min (UW)
- Matthew Peters (AI2)
Materials
- Part 1. Intro & Overview of tasks
- Part 2. Graph-based methods
- Part 3. Long sequence transformers
- Part 4. Pretraining and finetuning
- Part 5. Use cases
- Part 6. Future work & conclusion
Note: Parts 5 and 6 are presented in the 5th video on Underline.
Reading list
Part 1. Intro & Overview of tasks
- Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, Christopher Potts. 2011. Learning Word Vectors for Sentiment Analysis
- Johannes Kiesel, Maria Mestre, Rishabh Shukla, Emmanuel Vincent, Payam Adineh, David Corney, Benno Stein, Martin Potthast. 2019. SemEval-2019 Task 4: Hyperpartisan News Detection
- Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning. 2018. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
- Johannes Welbl, Pontus Stenetorp, Sebastian Riedel. 2018. Constructing Datasets for Multi-hop Reading Comprehension Across Documents
- Courtney Napoles, Matthew Gormley, Benjamin Van Durme. 2012. Annotated Gigaword
- Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, Nazli Goharian. 2018. A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents
Part 2. Graph-based methods
- Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, Eduard Hovy. 2016. Hierarchical Attention Networks for Document Classification
- Sarthak Jain, Madeleine van Zuylen, Hannaneh Hajishirzi, Iz Beltagy. 2020. SciREX: A Challenge Dataset for Document-Level Information Extraction
- Ming-Wei Chang, Kristina Toutanova, Kenton Lee, Jacob Devlin. 2019. Language Model Pre-training for Hierarchical Document Representation
- Xingxing Zhang, Furu Wei, Ming Zhou. 2019. HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
- Kenton Lee, Luheng He, Luke Zettlemoyer. 2018. Higher-order Coreference Resolution with Coarse-to-fine Inference
- David Wadden, Ulme Wennberg, Yi Luan, Hannaneh Hajishirzi. 2019. Entity, Relation, and Event Extraction with Contextualized Span Representations
- Linfeng Song, Zhiguo Wang, Mo Yu, Yue Zhang, Radu Florian, Daniel Gildea. 2018. Exploring Graph-structured Passage Representation for Multi-hop Reading Comprehension with Graph Neural Networks
- Yunxuan Xiao, Yanru Qu, Lin Qiu, Hao Zhou, Lei Li, Weinan Zhang, Yong Yu. 2019. Dynamically Fused Graph Network for Multi-hop Reasoning
- Yuwei Fang, Siqi Sun, Zhe Gan, Rohit Pillai, Shuohang Wang, Jingjing Liu. 2020. Hierarchical Graph Network for Multi-hop Question Answering
- Sewon Min, Danqi Chen, Luke Zettlemoyer, Hannaneh Hajishirzi. 2019. Knowledge-guided Text Retrieval and Reading for Open Domain Question Answering
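A recurring pattern in the Part 2 papers is to build a graph over passages, sentences, or entities and propagate information along its edges rather than attending over the whole document at once. The sketch below shows a generic mean-aggregation message-passing step in NumPy; the toy passage graph, feature sizes, and single ReLU layer are illustrative assumptions, not any specific paper's architecture.

```python
import numpy as np

def message_passing_step(node_feats, adj, weight):
    """One mean-aggregation GNN layer over a passage/entity graph.

    node_feats: (num_nodes, dim) node representations
    adj:        (num_nodes, num_nodes) 0/1 adjacency (with self-loops)
    weight:     (dim, dim) learned projection (random here)
    """
    deg = adj.sum(axis=1, keepdims=True)           # node degrees
    agg = (adj @ node_feats) / np.maximum(deg, 1)  # average neighbor features
    return np.maximum(agg @ weight, 0.0)           # linear projection + ReLU

# Toy graph: 4 passages, with edges between passages that share an entity.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))
adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]], dtype=float)
w = rng.standard_normal((8, 8))
print(message_passing_step(feats, adj, w).shape)  # (4, 8)
```

Stacking several such steps lets information hop across passages that are far apart in the raw text, which is what makes these graphs useful for multi-hop reading comprehension.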
Part 3. Long sequence transformers
- Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov. 2019. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
- Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap. 2019. Compressive Transformers for Long-Range Sequence Modelling
- Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier. 2020. Efficient Content-Based Sparse Attention with Routing Transformers
- Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan. 2020. Sparse Sinkhorn Attention
- Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya. 2020. Reformer: The Efficient Transformer
- Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever. 2019. Generating Long Sequences with Sparse Transformers
- Iz Beltagy, Matthew E. Peters, Arman Cohan. 2020. Longformer: The Long-Document Transformer
- Joshua Ainslie, Santiago Ontanon, Chris Alberti, Vaclav Cvicek, Zachary Fisher, Philip Pham, Anirudh Ravula, Sumit Sanghai, Qifan Wang, Li Yang. 2020. ETC: Encoding Long and Structured Inputs in Transformers
- Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. 2020. Big Bird: Transformers for Longer Sequences
- Tom B. Brown et al. 2020. Language Models are Few-Shot Learners
- Scott Gray, Alec Radford and Diederik P. Kingma. 2017. GPU Kernels for Block-Sparse Weights
- Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret. 2020. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
- Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller. 2020. Rethinking Attention with Performers
- Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong. 2021. Random Feature Attention
- Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma. 2020. Linformer: Self-Attention with Linear Complexity
- Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler. 2020. Long Range Arena: A Benchmark for Efficient Transformers
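The common thread in the Part 3 papers is replacing full O(n²) self-attention with a cheaper pattern, such as the sliding-window (local) attention used in Longformer, ETC, and Big Bird. Below is a minimal NumPy sketch of that idea, written for this reading list rather than taken from any of the papers; the explicit per-token loop is for clarity, whereas real implementations use banded or block-sparse matrix kernels (see the GPU kernels entry above).

```python
import numpy as np

def sliding_window_attention(q, k, v, window):
    """Naive O(n * window) local self-attention.

    Each position attends only to tokens within `window` positions on
    either side, instead of the full O(n^2) attention matrix.
    q, k, v: (seq_len, dim) arrays.
    """
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)   # (hi - lo,)
        weights = np.exp(scores - scores.max())   # stable softmax
        weights /= weights.sum()
        out[i] = weights @ v[lo:hi]
    return out

# Example: a 1,000-token sequence, 64-dim head, 128-token window.
rng = np.random.default_rng(0)
q = rng.standard_normal((1000, 64))
k = rng.standard_normal((1000, 64))
v = rng.standard_normal((1000, 64))
print(sliding_window_attention(q, k, v, window=128).shape)  # (1000, 64)
```

The linear-attention papers above (Performers, Random Feature Attention, Linformer) take a different route, approximating or factorizing the softmax so that cost grows linearly in sequence length without restricting which tokens can interact.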
Part 4. Pretraining and finetuning
- Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh. 2021. Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
- Ofir Press, Noah A. Smith, Mike Lewis. 2020. Shortformer: Better Language Modeling using Shorter Inputs
- Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew E. Peters, Arie Cattan, Ido Dagan. 2021. Cross-Document Language Modeling