《LightXML: Transformer with dynamic negative sampling for High-Performance Extreme Multi-label Text Classification》(AAAI 2021) GitHub:

Overview

LightXML

LightXML: Transformer with dynamic negative sampling for High-Performance Extreme Multi-label Text Classification

Datasets

LightXML uses the same datasets as AttentionXML and X-Transformer.

Please download them from the following links.

Experiments

train and eval

./run.sh [eurlex4k|wiki31k|amazon13k|amazon670k|wiki500k]
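For scripted runs, the dataset argument can be checked before `run.sh` is invoked. The sketch below is only an illustration: the helper name `build_run_command` is hypothetical and not part of the repository, and it assumes `run.sh` takes exactly one of the five dataset keys listed in the usage line above.

```python
from typing import List

# Dataset keys accepted by run.sh, copied from the usage line above.
DATASETS = ("eurlex4k", "wiki31k", "amazon13k", "amazon670k", "wiki500k")


def build_run_command(dataset: str) -> List[str]:
    """Return the argv list for ./run.sh (hypothetical helper, not part of the repo)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {DATASETS}")
    return ["./run.sh", dataset]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run(...) to launch training.
    print(" ".join(build_run_command("eurlex4k")))
```

Returning a list rather than a single string keeps the command safe to hand to `subprocess.run` without shell quoting.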
Comments
  • Question on dataset

    Hi, I am new to XMC. After downloading all the datasets, I noticed that the labels in both the train and test sets are numeric indices. Is there a file that maps each label index to its label text? Thank you so much.

    opened by jil818 6
  • LightXML on other dataset

    Hi! Do I need to do anything for label clustering when I train LightXML on my own dataset, i.e. one other than [eurlex4k|wiki31k|amazon13k|amazon670k|wiki500k]? Why does LightXML only use label clustering on wiki500k and amazon670k?

    opened by wangchichi1999 6
  • Training on Wiki10-31k

    Thanks for your code !!!

    After running your code, I found that you only do simple binary classification for Wiki10-31k (without label recalling, label ranking, or dynamic negative sampling):

    https://github.com/kongds/LightXML/blob/b9af9443004d3bce8b9116edfe038b702d1b295c/src/model.py#L100-L107

    So I wonder whether that is all you did for Wiki10-31k, or whether I missed something important. Thanks a lot!

    opened by yc1999 3
  • Remove `apex` from requirements

    The apex==0.9.10dev pinned in the requirements.txt file is not the apex package you intended to use. I got the error below when running your code:

      File "/usr/local/lib/python3.7/dist-packages/apex/__init__.py", line 13, in <module>
        from pyramid.session import UnencryptedCookieSessionFactoryConfig
    ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)
    

    According to this discussion, the best way to solve this issue is to:

    1. remove apex==0.9.10dev from requirements.txt
    2. add the instructions below to the README as a requirements installation step
    git clone https://github.com/NVIDIA/apex
    cd apex
    python setup.py install
    
    opened by sadrasabouri 3
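After installing from source as suggested in the issue above, a quick import check can show whether the `apex` that Python finds is usable for mixed precision. This is a minimal sketch under one assumption, namely that the training code relies on the `apex.amp` submodule; the function name `apex_amp_available` and its injectable `import_module` parameter are hypothetical and exist only to make the check easy to test.

```python
import importlib
from typing import Any, Callable


def apex_amp_available(
    import_module: Callable[[str], Any] = importlib.import_module,
) -> bool:
    """Return True if `apex.amp` can be imported (assumed requirement, see lead-in)."""
    try:
        import_module("apex.amp")
    except Exception:
        # Covers a missing package as well as an unrelated `apex` distribution
        # that fails during import, as in the traceback quoted in the issue.
        return False
    return True


if __name__ == "__main__":
    if apex_amp_available():
        print("apex.amp can be imported")
    else:
        print("apex.amp is missing; consider installing NVIDIA apex from source")
```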
  • Question related to the model training and usage of raw text

    Hello. Thank you very much for the impressive work and for sharing your work. I have one question related to the training process. In the paper, I read that LightXML can use raw text data to provide end-to-end prediction, similar to usual deep learning based approaches which use raw text.

    However, when I ran the LightXML code, I found that TF-IDF features are used in the train.txt file. Why are these TF-IDF features necessary? Do they affect performance? Can we swap TF-IDF for another vectorizer (e.g., bag-of-words features, WordPiece, etc.)?

    Thank you very much for the help!

    opened by StefanusAgus 3
  • Only ensemble of 'bert-base', 'roberta' and 'xlnet' would get sota on dataset of Eurlex-4K, AmazonCat-13K

    Is it really true that only an ensemble of 'bert-base', 'roberta' and 'xlnet' reaches SOTA on Eurlex-4K and AmazonCat-13K?

    code from 'src/model.py':

    is_training = labels is not None
    outs = self.bert(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids
    )[-1]
    
    out = torch.cat([outs[-i][:, 0] for i in range(1, self.feature_layers+1)], dim=-1)
    out = self.drop_out(out)
    group_logits = self.l0(out)
    if self.group_y is None:
        logits = group_logits
        if is_training:
            loss_fn = torch.nn.BCEWithLogitsLoss()
            loss = loss_fn(logits, labels)
            return logits, loss
        else:
            return logits
    
    opened by yongzhuo 2
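When `group_y` is `None`, the snippet quoted in the issue above scores every label directly and trains with `torch.nn.BCEWithLogitsLoss`. As a rough illustration of what that loss computes with its default mean reduction, here is a dependency-free sketch; the function name `bce_with_logits` is made up for this example, and the code is not taken from the repository.

```python
import math
from typing import Sequence


def bce_with_logits(
    logits: Sequence[Sequence[float]],
    labels: Sequence[Sequence[float]],
) -> float:
    """Mean binary cross-entropy over every (sample, label) pair, from raw logits."""
    total, count = 0.0, 0
    for logit_row, label_row in zip(logits, labels):
        for x, y in zip(logit_row, label_row):
            # Numerically stable form of
            # -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))].
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            count += 1
    return total / count


if __name__ == "__main__":
    # One document, three labels; only the first label is relevant.
    print(bce_with_logits([[2.0, -1.0, 0.0]], [[1.0, 0.0, 0.0]]))
```

Each label contributes an independent binary term, which is why this branch amounts to plain multi-label binary classification with no label grouping or negative sampling.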
  • Question of the library apex in the requirements

    The "requirements.txt" file lists apex version 0.9.10dev. Do you mean the apex package from PyPI (https://libraries.io/pypi/apex) or NVIDIA apex (https://github.com/NVIDIA/apex)?

    opened by 0ystersauce 2
  • version of Python&Bert pretrained model

    Hi! Great work! :D Is Python 3.6 the version used in your code? And does LightXML just use Bert-base-uncased from Hugging Face, or is the model further trained on a specific corpus for each task? Thanks!

    opened by wangchichi1999 2
  • FileNotFoundError: [Errno 2] No such file or directory: './data/Eurlex-4K/train_raw_texts.txt'

    Thanks for your excellent work! I downloaded EUR-Lex.tar but only found train.txt, train_labels.txt, train_texts.txt, test.txt, test_labels.txt and test_texts.txt. There is no Eurlex-4K/train_raw_texts.txt in the tar file. Could you please tell me where to find this file? Thanks!

    opened by zzzmm1 1
  • CUDA version of the project.

    Could you please tell me which CUDA version the project uses? I encountered the error "RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)". I think documenting the CUDA version would help.

    opened by 0ystersauce 0
  • Code to cluster labels on new datasets?

    Hi

    Thanks for releasing the code for your paper. I was trying out your code on a new dataset. Things are clear to me except the clustering part: can you please release the code to cluster your labels for a new dataset or describe the exact steps?

    opened by kunaldahiya 7
DECAF: Deep Extreme Classification with Label Features

DECAF DECAF: Deep Extreme Classification with Label Features @InProceedings{Mittal21, author = "Mittal, A. and Dahiya, K. and Agrawal, S. and Sain

null 46 Nov 6, 2022
Label Mask for Multi-label Classification

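LM-MLC: a cloze-based multi-label classification algorithm. 1 Preface. This article introduces LM-MLC, a multi-label classification algorithm based on cloze (template) filling that I designed for Track 1 of the Global AI Technology Innovation Competition. The algorithm has strong fitting ability and can capture label correlations. Tests on several datasets show no significant difference from mainstream algorithms, and it performs well on the dev set of the competition data, but because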

null 52 Nov 20, 2022
This is the implementation of our work Deep Extreme Cut (DEXTR), for object segmentation from extreme points.

This is the implementation of our work Deep Extreme Cut (DEXTR), for object segmentation from extreme points.

Sergi Caelles 828 Jan 5, 2023
Skipgram Negative Sampling in PyTorch

PyTorch SGNS Word2Vec's SkipGramNegativeSampling in Python. Yet another but quite general negative sampling loss implemented in PyTorch. It can be use

Jamie J. Seol 287 Dec 14, 2022
《Self-Attention Attribution: Interpreting Information Interactions Inside Transformer》(AAAI 2021) GitHub:

Self-Attention Attribution This repository contains the implementation for AAAI-2021 paper Self-Attention Attribution: Interpreting Information Intera

null 60 Dec 29, 2022
Official implementation of paper "Query2Label: A Simple Transformer Way to Multi-Label Classification".

Introduction This is the official implementation of the paper "Query2Label: A Simple Transformer Way to Multi-Label Classification". Abstract This pa

Shilong Liu 274 Dec 28, 2022
《Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking》(CVPR 2021) GitHub: 《Masksembles for Uncertainty Estimation》(CVPR 2021) GitHub:

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking Ning Wang, Wengang Zhou, Jie Wang, and Houqiang Li Accepted by CVPR

NingWang 236 Dec 22, 2022
PyTorch implementation of Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network

hierarchical-multi-label-text-classification-pytorch Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network Approach This

Mingu Kang 17 Dec 13, 2022
Official Pytorch Implementation of: "Semantic Diversity Learning for Zero-Shot Multi-label Classification"(2021) paper

Semantic Diversity Learning for Zero-Shot Multi-label Classification Paper Official PyTorch Implementation Avi Ben-Cohen, Nadav Zamir, Emanuel Ben Bar

null 28 Aug 29, 2022
Official implementation of "Open-set Label Noise Can Improve Robustness Against Inherent Label Noise" (NeurIPS 2021)

Open-set Label Noise Can Improve Robustness Against Inherent Label Noise NeurIPS 2021: This repository is the official implementation of ODNL. Require

Hongxin Wei 12 Dec 7, 2022
:hot_pepper: R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)

R²SQL The PyTorch implementation of paper Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing. (AAAI 2021) Requirement

huybery 60 Dec 31, 2022
《Investigating Loss Functions for Extreme Super-Resolution》(CVPR 2020) GitHub:

Investigating Loss Functions for Extreme Super-Resolution NTIRE 2020 Perceptual Extreme Super-Resolution Submission. Our method ranked first and secon

Sejong Yang 0 Oct 17, 2022
Parallel and High-Fidelity Text-to-Lip Generation; AAAI 2022 ; Official code

Parallel and High-Fidelity Text-to-Lip Generation This repository is the official PyTorch implementation of our AAAI-2022 paper, in which we propose P

Zhying 77 Dec 21, 2022
General Multi-label Image Classification with Transformers

General Multi-label Image Classification with Transformers Jack Lanchantin, Tianlu Wang, Vicente Ordóñez Román, Yanjun Qi Conference on Computer Visio

QData 154 Dec 21, 2022
Official implementation for the paper: "Multi-label Classification with Partial Annotations using Class-aware Selective Loss"

Multi-label Classification with Partial Annotations using Class-aware Selective Loss Paper | Pretrained models Official PyTorch Implementation Emanuel

null 99 Dec 27, 2022
A benchmark dataset for mesh multi-label-classification based on cube engravings introduced in MeshCNN

Double Cube Engravings This script creates a dataset for multi-label mesh classification, with an intentionally difficult setup for point cloud classif

Yotam Erel 1 Nov 30, 2021
This project aims to create a multi-label classification annotation tool to boost annotation speed and make annotation easier.

This project aims to create a multi-label classification annotation tool to boost annotation speed and make annotation easier.

null 4 Aug 2, 2022
A PyTorch implementation of ICLR 2022 Oral paper PiCO: Contrastive Label Disambiguation for Partial Label Learning

PiCO: Contrastive Label Disambiguation for Partial Label Learning This is a PyTorch implementation of ICLR 2022 Oral paper PiCO; also see our Project

王皓波 83 May 11, 2022
Official implementation of the paper Vision Transformer with Progressive Sampling, ICCV 2021.

Vision Transformer with Progressive Sampling This is the official implementation of the paper Vision Transformer with Progressive Sampling, ICCV 2021.

yuexy 123 Jan 1, 2023