CorNet Correlation Networks for Extreme Multi-label Text Classification

Overview

CorNet

Correlation Networks for Extreme Multi-label Text Classification

Prerequisites

  • python==3.6.3
  • pytorch==1.2.0
  • torchgpipe==0.0.5
  • click==7.0
  • ruamel.yaml==0.16.5
  • numpy==1.16.2
  • scipy==1.2.1
  • scikit-learn==0.20.3
  • gensim==3.7.2
  • nltk==3.2.4
  • tqdm==4.31.1
  • joblib==0.13.2
  • logzero==1.5.0

Datasets

Pretrained Word Embeddings in gensim format

Run

Preprocess (the EUR-Lex dataset is already tokenized in advance)

./scripts/preprocess_eurlex.sh

or (the other datasets need to be tokenized using NLTK)

./scripts/preprocess_others.sh

Train and evaluate

./scripts/run_models.sh

Baselines

The codes for the baseline models are adapted from the following repositories: XML-CNN, BERT, MeSHProbeNet, and AttentionXML.

You might also like...
Library for fast text representation and classification.

fastText fastText is a library for efficient learning of word representations and sentence classification. Table of contents Resources Models Suppleme

Text vectorization tool to outperform TFIDF for classification tasks
Text vectorization tool to outperform TFIDF for classification tasks

WHAT: Supervised text vectorization tool Textvec is a text vectorization tool, with the aim to implement all the "classic" text vectorization NLP meth

Library for fast text representation and classification.

fastText fastText is a library for efficient learning of word representations and sentence classification. Table of contents Resources Models Suppleme

Text vectorization tool to outperform TFIDF for classification tasks
Text vectorization tool to outperform TFIDF for classification tasks

WHAT: Supervised text vectorization tool Textvec is a text vectorization tool, with the aim to implement all the "classic" text vectorization NLP meth

Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification"

PTR Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification" If you use the code, please cite the following paper: @art

Active learning for text classification in Python
Active learning for text classification in Python

Active Learning allows you to efficiently label training data in a small-data scenario.

Pipeline for fast building text classification TF-IDF + LogReg baselines.

Text Classification Baseline Pipeline for fast building text classification TF-IDF + LogReg baselines. Usage Instead of writing custom code for specif

DomainWordsDict, Chinese words dict that contains more than 68 domains, which can be used as text classification、knowledge enhance task

DomainWordsDict, Chinese words dict that contains more than 68 domains, which can be used as text classification、knowledge enhance task。涵盖68个领域、共计916万词的专业词典知识库,可用于文本分类、知识增强、领域词汇库扩充等自然语言处理应用。

This is the code for the EMNLP 2021 paper AEDA: An Easier Data Augmentation Technique for Text Classification
This is the code for the EMNLP 2021 paper AEDA: An Easier Data Augmentation Technique for Text Classification

The baseline code is for EDA: Easy Data Augmentation techniques for boosting performance on text classification tasks

Comments
  • License for the codebase

    License for the codebase

    Hello! We are planning to re-use the code as baselines for our research. Is it possible for you to add a license (i.e. MIT or BSD) for the codebase? Thanks!

    opened by RifleZhang 1
  • 关于CorNet在其他模型上实现无效的疑惑

    关于CorNet在其他模型上实现无效的疑惑

    您好!在阅读您的文章以及相关代码后,我尝试把CorNet同样用于极多标签文本分类的一些模型上,例如近一两年的LightXML,但是我依你所指,在output后添加CorNet,但是发现运行出来的结果不升反降,这当中究竟是不是我在理解文章以及源码的过程中有什么忽略的地方所在呢?烦请您百忙中抽空解答谢谢!

    opened by Chanchiuhung 0
Owner
Guangxu Xun
Guangxu Xun
Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

Pytorch-NLU,一个中文文本分类、序列标注工具包,支持中文长文本、短文本的多类、多标签分类任务,支持中文命名实体识别、词性标注、分词等序列标注任务。 Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

null 186 Dec 24, 2022
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Kashgari Overview | Performance | Installation | Documentation | Contributing ?? ?? ?? We released the 2.0.0 version with TF2 Support. ?? ?? ?? If you

Eliyar Eziz 2.3k Dec 29, 2022
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Kashgari Overview | Performance | Installation | Documentation | Contributing ?? ?? ?? We released the 2.0.0 version with TF2 Support. ?? ?? ?? If you

Eliyar Eziz 2k Feb 9, 2021
Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"

Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"

LancoPKU 105 Jan 3, 2023
Mirco Ravanelli 2.3k Dec 27, 2022
Finding Label and Model Errors in Perception Data With Learned Observation Assertions

Finding Label and Model Errors in Perception Data With Learned Observation Assertions This is the project page for Finding Label and Model Errors in P

Stanford Future Data Systems 17 Oct 14, 2022
Label data using HuggingFace's transformers and automatically get a prediction service

Label Studio for Hugging Face's Transformers Website • Docs • Twitter • Join Slack Community Transfer learning for NLP models by annotating your textu

Heartex 135 Dec 29, 2022
A Python package implementing a new model for text classification with visualization tools for Explainable AI :octocat:

A Python package implementing a new model for text classification with visualization tools for Explainable AI ?? Online live demos: http://tworld.io/s

Sergio Burdisso 285 Jan 2, 2023