《LightXML: Transformer with dynamic negative sampling for High-Performance Extreme Multi-label Text Classification》(AAAI 2021) GitHub:

Last update: Dec 5, 2022

Related tags

Deep Learning LightXML

Overview

LightXML

LightXML: Transformer with dynamic negative sampling for High-Performance Extreme Multi-label Text Classification

Datasets

LightXML uses the same datasets as AttentionXML and X-Transformer.

Please download them from the following links.

Experiments

Train and evaluate

./run.sh [eurlex4k|wiki31k|amazon13k|amazon670k|wiki500k]

Comments

Question on dataset

Hi, I am new to XMC. After downloading all the datasets, I noticed that the labels in both the test and train sets are numbers or indices. Is there a file that stores the text label corresponding to each label index? Thank you so much.

opened by jil818 6

LightXML on other dataset

Hi! Do I need to do anything for label clustering when I train LightXML on my own dataset, other than [eurlex4k|wiki31k|amazon13k|amazon670k|wiki500k]? Why does LightXML only use label clustering on wiki500k and amazon670k?

opened by wangchichi1999 6

Training about Wiki10-31k

Thanks for your code!

After running your code, I found that you just did a simple binary classification for Wiki10-31k (without label recalling, label ranking, or dynamic negative sampling):

https://github.com/kongds/LightXML/blob/b9af9443004d3bce8b9116edfe038b702d1b295c/src/model.py#L100-L107

So I wonder whether that is all you did for Wiki10-31k, or whether I missed something important. Thanks a lot!

opened by yc1999 3
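
To make the terms in this question concrete, here is a minimal sketch of how label recalling can produce dynamic negatives. It is a simplified illustration of the general idea, not the repository's implementation: the function `build_candidates`, the two lookup dictionaries, and the toy scores are all hypothetical. The assumption is that labels are partitioned into groups, the model scores the groups, and only labels inside the selected groups are ranked.

```python
from typing import Dict, List, Sequence, Set


def build_candidates(
    group_scores: Sequence[float],
    positive_labels: Set[int],
    group_to_labels: Dict[int, List[int]],
    label_to_group: Dict[int, int],
    top_k: int,
    training: bool = True,
) -> List[int]:
    """Return the candidate label ids to score for one sample (hypothetical helper)."""
    # Label recalling: keep the top_k highest-scoring label groups.
    ranked = sorted(range(len(group_scores)), key=lambda g: group_scores[g], reverse=True)
    selected = set(ranked[:top_k])

    # During training, also include the groups of the true labels so that
    # every positive label is among the candidates.
    if training:
        selected |= {label_to_group[label] for label in positive_labels}

    # Candidates are all labels in the selected groups. Those that are not
    # positives act as negatives, and they change whenever the group scores change.
    candidates: List[int] = []
    for g in sorted(selected):
        candidates.extend(group_to_labels[g])
    return candidates


# Toy setup (assumed): 6 labels split into 3 groups of 2.
group_to_labels = {0: [0, 1], 1: [2, 3], 2: [4, 5]}
label_to_group = {label: g for g, labels in group_to_labels.items() for label in labels}
positives = {0}

candidates = build_candidates(
    [0.1, 0.9, 0.3], positives, group_to_labels, label_to_group, top_k=1
)
negatives = [label for label in candidates if label not in positives]

inference_candidates = build_candidates(
    [0.1, 0.9, 0.3], set(), group_to_labels, label_to_group, top_k=1, training=False
)
```

With few labels, as on Wiki10-31k, scoring every label directly is feasible, so a model can skip the grouping step entirely and train as plain per-label binary classification.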
Remove `apex` from requirements

The apex==0.9.10dev pinned in requirements.txt is not a valid version of the apex you wanted to use. I got the error below when running your code:

File "/usr/local/lib/python3.7/dist-packages/apex/__init__.py", line 13, in <module>
    from pyramid.session import UnencryptedCookieSessionFactoryConfig
ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)

According to this discussion, the best way to solve this issue is to:

remove apex==0.9.10dev from requirements.txt

add the instructions below to the README as a requirement installation step

git clone https://github.com/NVIDIA/apex
cd apex
python setup.py install

opened by sadrasabouri 3

Question related to the model training and usage of raw text

Hello. Thank you very much for the impressive work and for sharing your work. I have one question related to the training process. In the paper, I read that LightXML can use raw text data to provide end-to-end prediction, similar to usual deep learning based approaches which use raw text.

However, when I ran the LightXML code, I found that TF-IDF features are used in the train.txt file. Why are these TF-IDF features necessary? Do they affect the performance? Can we swap TF-IDF for another vectorizer (e.g., bag-of-words features, WordPiece, etc.)?

Thank you very much for the help!

opened by StefanusAgus 3

Only ensemble of 'bert-base', 'roberta' and 'xlnet' would get sota on dataset of Eurlex-4K, AmazonCat-13K

Only an ensemble of 'bert-base', 'roberta' and 'xlnet' reaches SOTA on the Eurlex-4K and AmazonCat-13K datasets. Is that really the case?

code from 'src/model.py':

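# Training mode is inferred from whether labels were passed in.
is_training = labels is not None
# [-1] keeps the last element of the encoder output, which is indexed
# per layer below (presumably the tuple of hidden states).
outs = self.bert(
    input_ids,
    attention_mask=attention_mask,
    token_type_ids=token_type_ids
)[-1]

# Concatenate the first-token vector from each of the last feature_layers layers.
out = torch.cat([outs[-i][:, 0] for i in range(1, self.feature_layers+1)], dim=-1)
out = self.drop_out(out)
group_logits = self.l0(out)
# Without label groups, the outputs of self.l0 are used directly as label logits.
if self.group_y is None:
    logits = group_logits
    if is_training:
        # Independent binary cross-entropy per label, applied to raw logits.
        loss_fn = torch.nn.BCEWithLogitsLoss()
        loss = loss_fn(logits, labels)
        return logits, loss
    else:
        return logits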

opened by yongzhuo 2
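
For readers unfamiliar with the loss in the excerpt above, the following is a small standard-library sketch of what a binary cross-entropy on logits computes with mean reduction. The function name `bce_with_logits` and the toy inputs are illustrative assumptions; it is not code from the repository, and it handles one flat list of logits rather than batched tensors.

```python
import math
from typing import Sequence


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy over independent per-label logits."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Numerically stable form of
        # -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))].
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# Two labels: the first is relevant (1.0), the second is not (0.0).
loss_uninformed = bce_with_logits([0.0, 0.0], [1.0, 0.0])   # logits carry no signal
loss_confident = bce_with_logits([5.0, -5.0], [1.0, 0.0])   # confident and correct
```

Each label contributes its own term, so the model treats every label as a separate yes/no decision. Zero logits give a loss of log 2 per label, and confident correct logits push the loss toward zero.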

Question of the library apex in the requirements

The "requirements.txt" file lists apex version 0.9.10dev. Do you mean the apex from PyPI (https://libraries.io/pypi/apex) or the NVIDIA apex (https://github.com/NVIDIA/apex)?

opened by 0ystersauce 2

version of Python&Bert pretrained model

Hi! Great work! :D Is the Python version used in your code 3.6? And does LightXML just use bert-base-uncased from Hugging Face, or is the model trained on a specific corpus for each task? Thanks!

opened by wangchichi1999 2

FileNotFoundError: [Errno 2] No such file or directory: './data/Eurlex-4K/train_raw_texts.txt'

Thanks for your excellent work! I downloaded EUR-Lex.tar but only found train.txt, train_labels.txt, train_texts.txt, test.txt, test_labels.txt and test_texts.txt. There is no Eurlex-4K/train_raw_texts.txt in the tar file. Could you please tell me where to find this file? Thanks!

opened by zzzmm1 1

CUDA version of the project.

Could you please tell me the CUDA version used for the project? I encountered the error "RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)". I think documenting the CUDA version would help.

opened by 0ystersauce 0

Code to cluster labels on new datasets?

Hi

Thanks for releasing the code for your paper. I was trying out your code on a new dataset. Things are clear to me except the clustering part: can you please release the code to cluster your labels for a new dataset or describe the exact steps?

opened by kunaldahiya 7
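
As a starting point for this kind of request, below is a hedged sketch of one common way to group labels: recursive balanced 2-means, the style of partitioning used by tree-based extreme classification methods such as Parabel. It is not the repository's clustering code. The names `cluster_labels` and `balanced_split`, the dense per-label feature vectors, and the deterministic centroid initialisation are all assumptions made for illustration; in practice each label vector might be built from the features of the documents tagged with that label.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _centroid(vectors: List[List[float]]) -> List[float]:
    dim = len(vectors[0])
    return _normalize([sum(v[d] for v in vectors) for d in range(dim)])


def balanced_split(
    ids: List[int], feats: Dict[int, List[float]], iters: int = 10
) -> Tuple[List[int], List[int]]:
    """Split ids into two halves of (almost) equal size."""
    left_c, right_c = feats[ids[0]], feats[ids[-1]]  # deterministic init (assumption)
    half = len(ids) // 2
    left: List[int] = []
    right: List[int] = list(ids)
    for _ in range(iters):
        # Rank labels by how much closer they are to the left centroid,
        # then cut the ranking in the middle to keep the halves balanced.
        ranked = sorted(
            ids,
            key=lambda i: _dot(feats[i], left_c) - _dot(feats[i], right_c),
            reverse=True,
        )
        new_left, new_right = ranked[:half], ranked[half:]
        converged = set(new_left) == set(left)
        left, right = new_left, new_right
        if converged:
            break
        left_c = _centroid([feats[i] for i in left])
        right_c = _centroid([feats[i] for i in right])
    return left, right


def cluster_labels(
    label_feats: Sequence[Sequence[float]], max_cluster_size: int
) -> List[List[int]]:
    """Recursively bisect labels until every cluster is small enough."""
    if max_cluster_size < 1:
        raise ValueError("max_cluster_size must be at least 1")
    feats = {i: _normalize(v) for i, v in enumerate(label_feats)}
    pending = [list(feats)]
    clusters: List[List[int]] = []
    while pending:
        ids = pending.pop()
        if len(ids) <= max_cluster_size:
            clusters.append(sorted(ids))
        else:
            pending.extend(balanced_split(ids, feats))
    return sorted(clusters)


# Toy example (assumed data): labels 0 and 1 point one way, labels 2 and 3 another.
toy_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters = cluster_labels(toy_feats, max_cluster_size=2)
```

The resulting cluster index of each label could then serve as its group id, with `max_cluster_size` controlling how many labels fall into each group.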


DECAF: Deep Extreme Classification with Label Features

DECAF DECAF: Deep Extreme Classification with Label Features @InProceedings{Mittal21, author = "Mittal, A. and Dahiya, K. and Agrawal, S. and Sain

46 Nov 6, 2022

Label Mask for Multi-label Classification

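LM-MLC: a multi-label classification algorithm based on cloze-style filling. 1 Preface: This article introduces LM-MLC, a cloze (template) based multi-label classification algorithm that I designed for Track 1 of the Global AI Technology Innovation Competition. The algorithm has strong fitting ability and can capture correlations between labels. Tests on several datasets show no significant difference between it and mainstream algorithms, and its dev results on the competition dataset are very good, but due to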

52 Nov 20, 2022

This is the implementation of our work Deep Extreme Cut (DEXTR), for object segmentation from extreme points.

828 Jan 5, 2023

Skipgram Negative Sampling in PyTorch

PyTorch SGNS Word2Vec's SkipGramNegativeSampling in Python. Yet another but quite general negative sampling loss implemented in PyTorch. It can be use

287 Dec 14, 2022

《Self-Attention Attribution: Interpreting Information Interactions Inside Transformer》(AAAI 2021) GitHub:

Self-Attention Attribution This repository contains the implementation for AAAI-2021 paper Self-Attention Attribution: Interpreting Information Intera

60 Dec 29, 2022

Official implementation of paper "Query2Label: A Simple Transformer Way to Multi-Label Classification".

Introduction This is the official implementation of the paper "Query2Label: A Simple Transformer Way to Multi-Label Classification". Abstract This pa

274 Dec 28, 2022

《Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking》(CVPR 2021) GitHub: 《Masksembles for Uncertainty Estimation》(CVPR 2021) GitHub:

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking Ning Wang, Wengang Zhou, Jie Wang, and Houqiang Li Accepted by CVPR

236 Dec 22, 2022

PyTorch implementation of Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network

hierarchical-multi-label-text-classification-pytorch Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network Approach This

17 Dec 13, 2022

Official Pytorch Implementation of: "Semantic Diversity Learning for Zero-Shot Multi-label Classification"(2021) paper

Semantic Diversity Learning for Zero-Shot Multi-label Classification Paper Official PyTorch Implementation Avi Ben-Cohen, Nadav Zamir, Emanuel Ben Bar

28 Aug 29, 2022

Official implementation of "Open-set Label Noise Can Improve Robustness Against Inherent Label Noise" (NeurIPS 2021)

Open-set Label Noise Can Improve Robustness Against Inherent Label Noise NeurIPS 2021: This repository is the official implementation of ODNL. Require

12 Dec 7, 2022

:hot_pepper: R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)

R²SQL The PyTorch implementation of paper Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing. (AAAI 2021) Requirement

60 Dec 31, 2022

《Investigating Loss Functions for Extreme Super-Resolution》(CVPR 2020) GitHub:

Investigating Loss Functions for Extreme Super-Resolution NTIRE 2020 Perceptual Extreme Super-Resolution Submission. Our method ranked first and secon

0 Oct 17, 2022

Parallel and High-Fidelity Text-to-Lip Generation; AAAI 2022 ; Official code

Parallel and High-Fidelity Text-to-Lip Generation This repository is the official PyTorch implementation of our AAAI-2022 paper, in which we propose P

77 Dec 21, 2022

General Multi-label Image Classification with Transformers

General Multi-label Image Classification with Transformers Jack Lanchantin, Tianlu Wang, Vicente Ordóñez Román, Yanjun Qi Conference on Computer Visio

154 Dec 21, 2022

Official implementation for the paper: "Multi-label Classification with Partial Annotations using Class-aware Selective Loss"

Multi-label Classification with Partial Annotations using Class-aware Selective Loss Paper | Pretrained models Official PyTorch Implementation Emanuel

99 Dec 27, 2022
