Adversarial Examples for Extreme Multilabel Text Classification

Last update: May 14, 2022

Overview

The code is adapted from the source code of BERT-ATTACK [1], APLC_XLNet [2], and AttentionXML [3].

Requirements

The code has been tested with the following packages:

python==3.6.13
boto3==1.17.70
ruamel.yaml==0.16.12
numpy==1.19.2
scipy==1.5.4
matplotlib==3.2.2
scikit-learn==0.24.2
transformers==2.9.0
torch==1.4.0
nltk==3.4
pandas==1.1.5
requests==2.25.1
tqdm==4.60.0
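
To see whether an existing environment matches the pinned versions above, a short check script can help. The following is a minimal sketch and is not part of the repository: the helper names (`parse_pins`, `installed_version`, `find_mismatches`) are hypothetical, the script only reports differences and installs nothing, and it does not check the Python version itself.

```python
"""Compare installed package versions against the pinned list (illustrative only)."""

# Package pins copied from the list above (the python==3.6.13 line is omitted
# because the interpreter is not a pip distribution).
PINNED = """
boto3==1.17.70
ruamel.yaml==0.16.12
numpy==1.19.2
scipy==1.5.4
matplotlib==3.2.2
scikit-learn==0.24.2
transformers==2.9.0
torch==1.4.0
nltk==3.4
pandas==1.1.5
requests==2.25.1
tqdm==4.60.0
"""


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        # Fallback for Python 3.6/3.7; pkg_resources ships with setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (expected, found)} for every package that differs."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    report = find_mismatches(parse_pins(PINNED))
    for name, (expected, found) in sorted(report.items()):
        print("{}: expected {}, found {}".format(name, expected, found or "not installed"))
```

The `lookup` argument is injectable so the comparison logic can be exercised without touching the real environment.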

A small experiment on Wikipedia-31K with only 10 samples per bin

Download the data and the APLC_XLNet model trained on this data as follows:

bash download_data_model.sh

To preprocess the data and run positive-targeted attacks with 10 samples per bin, run the following:

bash pos_attack.sh

To check the results of the attacks, run the following:

bash resutls.sh
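
The three steps above run in a fixed order, and each depends on the previous one having succeeded. As a convenience, they could be chained from Python so that a failure stops the sequence. This is a minimal sketch, not part of the repository: `run_steps` is a hypothetical helper, the script names are copied verbatim from the commands above, and it assumes the working directory is the repository root.

```python
import subprocess

# Script names copied verbatim from the steps above.
STEPS = ["download_data_model.sh", "pos_attack.sh", "resutls.sh"]


def run_steps(steps=STEPS, runner=subprocess.run):
    """Run each script with bash, in order, stopping at the first failure."""
    completed = []
    for script in steps:
        # With subprocess.run, check=True raises CalledProcessError
        # when the script exits with a non-zero status.
        runner(["bash", script], check=True)
        completed.append(script)
    return completed
```

Calling `run_steps()` from the repository root would then download the data and model, run the attacks, and print the results in one go; passing a different `runner` allows a dry run that only records the commands.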

References

[1] Li et al., BERT-ATTACK: Adversarial Attack Against BERT Using BERT, EMNLP 2020

[2] Ye et al., Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification, ICML 2020

[3] You et al., AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification, NeurIPS 2019
