Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)

David Bamman

Last update: Dec 6, 2022

Overview

anlp21

Course materials for "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley).

Syllabus: http://people.ischool.berkeley.edu/~dbamman/info256.html

Notebook	Description
1.words/EvaluateTokenizationForSentiment	The impact of tokenization choices on sentiment classification.
1.words/ExploreTokenization	Different methods for tokenizing texts (whitespace, NLTK, spacy, regex)
1.words/TokenizePrintedBooks	Design a better tokenizer for printed books
1.words/Text_Complexity	Implement type-token ratio and Flesch-Kincaid Grade Level scores for text
2.compare/ChiSquare, Mann-Whitney Tests	Explore two tests for finding distinctive terms
2.compare/Log-odds ratio with priors	Implement the log-odds ratio with an informative (and uninformative) Dirichlet prior
3.dictionaries/DictionaryTimeSeries	Plot sentiment over time using human-defined dictionaries
3.dictionaries/Empath	Explore using Empath dictionaries to characterize texts
4.embeddings/DistributionalSimilarity	Explore the distributional hypothesis to build high-dimensional, sparse representations for words
4.embeddings/WordEmbeddings	Explore word embeddings using Gensim
4.embeddings/Semaxis	Implement SemAxis for scoring terms along a user-defined axis (e.g., positive-negative, concrete-abstract, hot-cold)
4.embeddings/BERT	Explore the basics of token representations in BERT and use it to find token nearest neighbors
4.embeddings/SequenceEmbeddings	Use sequence embeddings to find TV episode summaries most similar to a short description
5.eda/WordSenseClustering	Infer distinct word senses using KMeans clustering over BERT representations
5.eda/Haiku KMeans	Explore text representation in clustering by trying to group haiku and non-haiku poems into two distinct clusters
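
As a rough illustration of the two measures named for 1.words/Text_Complexity, here is a minimal sketch of type-token ratio and Flesch-Kincaid Grade Level. It is not the notebook's code: the function names, the regex tokenizer, the sentence splitter, and the vowel-run syllable counter are all assumptions made for illustration. Only the grade-level formula itself (0.39 × words per sentence + 11.8 × syllables per word − 15.59) is the standard one.

```python
import re


def tokenize(text):
    """Lowercase and extract alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Distinct word types divided by total tokens; 0.0 for empty input."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def count_syllables(word):
    """Crude heuristic (assumption): one syllable per run of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level using the heuristic helpers above."""
    # Naive sentence split on terminal punctuation (assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = tokenize(text)
    if not sentences or not tokens:
        return 0.0
    syllables = sum(count_syllables(w) for w in tokens)
    words_per_sentence = len(tokens) / len(sentences)
    syllables_per_word = syllables / len(tokens)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    sample = "The cat sat on the mat."
    print(type_token_ratio(tokenize(sample)))
    print(flesch_kincaid_grade(sample))
```

Type-token ratio depends on text length, so compare it only across samples of similar size. A dictionary-based syllable counter would be more accurate than the vowel-run heuristic used here.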

