How Effective is Incongruity? Implications for Code-mix Sarcasm Detection.

Last update: Jun 5, 2022

Related tags

Deep Learning codemix

Overview

This repo contains codes for the following paper:

How Effective is Incongruity? Implications for Code-mix Sarcasm Detection.
Aditya Shah, Chandresh Kumar Maurya, In Proceedings of the 18th International Conference on Natural Language Processing - (ACL 2021).

The presentation slides are available here

Requirements

Python 3.6 or higher
Pytorch >= 1.3.0
Pytorch_transformers (also known as transformers)
Pandas, Numpy, Pickle
Fasttext

Download the fasttext embed file:

The fasttext embedding file can be obtained here

Dataset

We release the benchmark sarcasm dataset for Hinglish language to facilitate further research on code-mix NLP.

We create a dataset using TweetScraper built on top of scrapy to extract code-mix hindi-english tweets. We pass search tags like #sarcasm, #humor, #bollywood, #cricket, etc., combined with most commonly used code-mix Hindi words as query. All the tweets with hashtags like #sarcasm, #sarcastic, #irony, #humor etc. are treated as positive. Non sarcastic tweets are extracted using general hashtags like #politics, #food, #movie, etc. The balanced dataset comprises of 166K tweets.

Finally, we preprocess and clean the data by removing urls, hashtags, mentions, and punctuation in the data. The respective files can be found here as train.csv, val.csv, and test.csv

Arguments:

--epochs:  number of total epochs to run, default=10

--batch-size: train batchsize, default=2

--lr: learning rate for the model, default=5.16e-05

--hidden_size_lstm: hidden size of lstm, default=1024

--hidden_size_linear: hidden size of linear layer, default=128

--seq_len: sequence lenght of input text, default=56

--clip: gradient clipping, default=0.218

--dropout: dropout value, default=0.198

--num_layers: number of lstm layers, default=1

--lstm_bidirectional: bidirectional lstm, default=False

--fasttext_embed_file: path to fasttext embedding file, default='new_hing_emb'

--train_dir: path to train file, default='train.csv'

--valid_dir: path to validation file, default='valid.csv'

--test_dir: path to test file, default='test.csv'

--checkpoint_dir: path to the saved, default='selfnet.pt'

--test: testing the model, default=False

Train

python main.py

Test

python main.py --test True

You might also like...

Official code of ICCV2021 paper "Residual Attention: A Simple but Effective Method for Multi-Label Recognition"

CSRA This is the official code of ICCV 2021 paper: Residual Attention: A Simple But Effective Method for Multi-Label Recoginition Demo, Train and Vali

163 Dec 22, 2022

Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.

WECHSEL Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. arXiv: https://arx

45 Dec 29, 2022

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

8.4k Jan 1, 2023

A scientific and useful toolbox, which contains practical and effective long-tail related tricks with extensive experimental results

Bag of tricks for long-tailed visual recognition with deep convolutional neural networks This repository is the official PyTorch implementation of AAA

181 Dec 28, 2022

Pytorch codes for "Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation"

Self-Supervised-MVS This repository is the official PyTorch implementation of our AAAI 2021 paper: "Self-supervised Multi-view Stereo via Effective Co

127 Jan 4, 2023

The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient.

You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient (paper) @misc{zhang2021compress,

46 Dec 7, 2022

How Effective is Incongruity? Implications for Code-mix Sarcasm Detection.

Related tags

Overview

Requirements

Download the fasttext embed file:

Dataset

Arguments:

Train

Test

You might also like...

Official code of ICCV2021 paper "Residual Attention: A Simple but Effective Method for Multi-Label Recognition"

Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

A scientific and useful toolbox, which contains practical and effective long-tail related tricks with extensive experimental results

Pytorch codes for "Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation"

The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient.

A Pytorch implementation of CVPR 2021 paper "RSG: A Simple but Effective Module for Learning Imbalanced Datasets"

We propose a new method for effective shadow removal by regarding it as an exposure fusion problem.

Run Effective Large Batch Contrastive Learning on Limited Memory GPU

Owner

Code for the paper: "On the Bottleneck of Graph Neural Networks and Its Practical Implications"

PyTorch implementation code for the paper MixCo: Mix-up Contrastive Learning for Visual Representation

Implementation of the Triangle Multiplicative module, used in Alphafold2 as an efficient way to mix rows or columns of a 2d feature map, as a standalone package for Pytorch

Codes for the paper Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing

Lightweight mmm - Lightweight (Bayesian) Media Mix Model

Propose a principled and practically effective framework for unsupervised accuracy estimation and error detection tasks with theoretical analysis and state-of-the-art performance.

Colar: Effective and Efficient Online Action Detection by Consulting Exemplars, CVPR 2022.

An efficient and effective learning to rank algorithm by mining information across ranking candidates. This repository contains the tensorflow implementation of SERank model. The code is developed based on TF-Ranking.

Official Code for ICML 2021 paper "Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline"

Code for the ICML 2021 paper "Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation", Haoxiang Wang, Han Zhao, Bo Li.