Deep learning for NLP crash course at ABBYY.

Dan Anastasyev

Last update: Dec 18, 2022

Related tags

Overview

Deep NLP Course at ABBYY

Deep learning for NLP crash course at ABBYY.

I'm gradually updating and translating the notebooks right now. Stay in touch.

Materials

Week 1: Introduction

Sentiment analysis on the IMDB movie review dataset: a short overview of classical machine learning for NLP + indecently brief intro to keras.

Russian version:

Updated English version:

Week 2: Word Embeddings: Part 1

Meet the Word Embeddings: an unsupervised method to capture some fun relationships between words.
Phrases similarity with word embeddings model + word based machine translation without parallel data (with MUSE word embeddings).

Russian version:

Updated English version:

Week 3: Word Embeddings: Part 2

Introduction to PyTorch. Implementation of pet linear regression on pure numpy and pytorch. Implementations of CBoW, skip-gram, negative sampling and structured Word2vec models.

Russian version:

Updated English version:

Week 4: Convolutional Neural Networks

Introduction to convolutional networks. Relations between convolutions and n-grams. Simple surname detector on character-level convolutions + fun visualizations.

Russian version:

Updated English version:

Week 5: RNNs: Part 1

RNNs for text classification. Simple RNN implementation + memorization test. Surname detector in multilingual setup: character-level LSTM classifier.

Russian version:

Updated English version:

Week 6: RNNs: Part 2

RNNs for sequence labelling. Part-of-speech tagger implementations based on word embeddings and character-level word embeddings.

Russian version:

Week 7: Language Models: Part 1

Character-level language model for Russian troll tweets generation: fixed-window model via convolutions and RNN model.
Simple conditional language model: surname generation given source language.
And Toxic Comment Classification Challenge - to apply your skills to a real-world problem.

Russian version:

Week 8: Language Models: Part 2

Word-level language model for poetry generation. Pet examples of transfer learning and multi-task learning applied to language models.

Russian version:

Week 9: Seq2seq

Seq2seq for machine translation and image captioning. Byte-pair encoding, beam search and other usefull stuff for machine translation.

Russian version:

Week 10: Seq2seq with Attention

Seq2seq with attention for machine translation and image captioning.

Russian version:

Week 11: Transformers & Text Summarization

Implementation of Transformer model for text summarization. Discussion of Pointer-Generator Networks for text summarization.

Russian version:

Week 12: Dialogue Systems: Part 1

Goal-orientied dialogue systems. Implemention of the multi-task model: intent classifier and token tagger for dialogue manager.

Russian version:

Week 13: Dialogue Systems: Part 2

General conversation dialogue systems and DSSMs. Implementation of question answering model on SQuAD dataset and chit-chat model on OpenSubtitles dataset.

Russian version:

Week 14: Pretrained Models

Pretrained models for various tasks: Universal Sentence Encoder for sentence similarity, ELMo for sequence tagging (with a bit of CRF), BERT for SWAG - reasoning about possible continuation.

Russian version:

Final Presentation

NLP Summary - summary of cool stuff that appeared and didn't in the course.

You might also like...

:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

(Framework for Adapting Representation Models) What is it? FARM makes Transfer Learning with BERT & Co simple, fast and enterprise-ready. It's built u

1.1k Feb 14, 2021

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Kashgari Overview | Performance | Installation | Documentation | Contributing 🎉 🎉 🎉 We released the 2.0.0 version with TF2 Support. 🎉 🎉 🎉 If you

2k Feb 9, 2021

Machine learning models from Singapore's NLP research community

SG-NLP Machine learning models from Singapore's natural language processing (NLP) research community. sgnlp is a Python package that allows you to eas

21 Dec 17, 2022

Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks

Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks, which modifies the input text with a textual template and directly uses PLMs to conduct pre-trained tasks. This library provides a standard, flexible and extensible framework to deploy the prompt-learning pipeline. OpenPrompt supports loading PLMs directly from huggingface transformers. In the future, we will also support PLMs implemented by other libraries.

2.3k Jan 8, 2023

NLP, Machine learning

Netflix-recommendation-system NLP, Machine learning About Recommendation algorithms are at the core of the Netflix product. It provides their members

6 Jan 12, 2022

NLP - Machine learning

Flipkart-product-reviews NLP - Machine learning About Product reviews is an essential part of an online store like Flipkart’s branding and marketing.

1 Oct 29, 2021

Python bindings to the dutch NLP tool Frog (pos tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser)

Frog for Python This is a Python binding to the Natural Language Processing suite Frog. Frog is intended for Dutch and performs part-of-speech tagging

46 Dec 14, 2022

Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` whi ch allows you to build, view, manipulate and query pattern models.

Colibri Core by Maarten van Gompel, [email protected], Radboud University Nijmegen Licensed under GPLv3 (See http://www.gnu.org/licenses/gpl-3.0.html

122 Nov 17, 2022

💫 Industrial-strength Natural Language Processing (NLP) in Python

spaCy: Industrial-strength NLP spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest researc

24.9k Jan 2, 2023

Comments

ImportError: cannot import name 'RandomizedLogisticRegression'
In Week 01 - Introduction.ipynb, even though there is no import for "RandomizedLogisticRegression" (which is now deprecated), on executing below elicode we see ImportError. Strange that for this from sklearn.linear_model import LogisticRegression, it shows a different import error. Please Help!! Is this because of eli5s version 0.8.1?

import eli5 eli5.show_weights(classifier, vec=vectorizer, top=40)
opened by shishir13 0
Картинка в 1ой неделе сломана

Если набрать больше 71 процента в первом задании с ключевыми словами, то выдаёт ошибку /usr/local/lib/python3.6/dist-packages/IPython/core/formatters.py:364: FormatterWarning: image/png formatter returned invalid type <class 'tuple'> (expected (<class 'bytes'>, <class 'str'>)) for object: <IPython.core.display.Image object> FormatterWarning

opened by AlexUmnov 0

Deep learning for NLP crash course at ABBYY.

Related tags

Overview

Deep NLP Course at ABBYY

Materials

Week 1: Introduction

Week 2: Word Embeddings: Part 1

Week 3: Word Embeddings: Part 2

Week 4: Convolutional Neural Networks

Week 5: RNNs: Part 1

Week 6: RNNs: Part 2

Week 7: Language Models: Part 1

Week 8: Language Models: Part 2

Week 9: Seq2seq

Week 10: Seq2seq with Attention

Week 11: Transformers & Text Summarization

Week 12: Dialogue Systems: Part 1

Week 13: Dialogue Systems: Part 2

Week 14: Pretrained Models

Final Presentation

You might also like...

:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Machine learning models from Singapore's NLP research community

Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks

NLP, Machine learning

NLP - Machine learning

Python bindings to the dutch NLP tool Frog (pos tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser)

💫 Industrial-strength Natural Language Processing (NLP) in Python

Comments

ImportError: cannot import name 'RandomizedLogisticRegression'

Картинка в 1ой неделе сломана

Owner

Dan Anastasyev

Course project of NLP@UCAS

Code for the project carried out fulfilling the course requirements for Fall 2021 NLP at NYU

Grading tools for Advanced NLP (11-711)Grading tools for Advanced NLP (11-711)

Text-Summarization-using-NLP - Text Summarization using NLP to fetch BBC News Article and summarize its text and also it includes custom article Summarization

This is my reading list for my PhD in AI, NLP, Deep Learning and more.

This repo is to provide a list of literature regarding Deep Learning on Graphs for NLP

A NLP program: tokenize method, PoS Tagging with deep learning

Machine Learning Course Project, IMDB movie review sentiment analysis by lstm, cnn, and transformer

:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.