Implementation of the paper All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training

Last update: Nov 14, 2022

Overview

SemCo

The official PyTorch implementation of the paper All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training (appearing in CVPR 2021).

Install Dependencies

Create a new environment and install dependencies using pip install -r requirements.txt
Install apex to enable automatic mixed precision training (AMP).

git clone https://github.com/NVIDIA/apex
cd apex
python setup.py install --cpp_ext --cuda_ext

Note: Installing apex is optional. If you don't want to use AMP, simply pass the --no_amp command line argument to the launcher.

Dataset

We use a standard directory structure for all our datasets so that the code can run on any dataset without editing the dataloaders. The datasets directory follows the structure below (shown only for cifar100, but identical for all other datasets):

datasets
└───cifar100
   └───train
       │   <image1>
       │   <image2>
       │   ...
   └───test
       │   <image1-test>
       │   <image2-test>
       │   ...
   └───labels
       │   labels_train.feather
       │   labels_test.feather

An example of the above directory structure for cifar100 can be found here.

To preprocess a generic dataset into the above format, you can refer to utils/utils.py for several examples.
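Before launching a run, it can help to confirm that a dataset folder matches the layout shown above. The following is a minimal sketch, not part of the SemCo code: the helper name `missing_entries` is hypothetical, and it only checks that the sub-directories and label files named in the tree exist, without inspecting the contents of the feather files.

```python
import tempfile
from pathlib import Path

# Sub-directories and label files taken from the directory tree shown above.
REQUIRED_DIRS = ("train", "test", "labels")
REQUIRED_LABEL_FILES = ("labels_train.feather", "labels_test.feather")


def missing_entries(datasets_root, dataset_name):
    """Return the expected paths that do not exist for one dataset."""
    base = Path(datasets_root) / dataset_name
    expected = [base / d for d in REQUIRED_DIRS]
    expected += [base / "labels" / f for f in REQUIRED_LABEL_FILES]
    return [p for p in expected if not p.exists()]


# Demo on a throwaway directory: labels_test.feather is deliberately left out.
with tempfile.TemporaryDirectory() as root:
    base = Path(root) / "cifar100"
    for d in REQUIRED_DIRS:
        (base / d).mkdir(parents=True)
    (base / "labels" / "labels_train.feather").touch()
    for path in missing_entries(root, "cifar100"):
        print("missing:", path.relative_to(root))
```

An empty list means every expected directory and label file is present.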

To configure the datasets directory path, you can either set the environment variable SEMCO_DATA_PATH (e.g. export SEMCO_DATA_PATH=/home/data) or pass the command line argument --dataset-path to the launcher. Note that this path refers to the parent datasets directory, which contains the subdirectories for the individual datasets (e.g. cifar100, mini-imagenet, etc.).
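The two ways of supplying the path can be sketched as follows. This is an illustration only: the function `resolve_dataset_path` is hypothetical, and the precedence (command line value first, environment variable second) is an assumption, since the text does not say which one the launcher prefers when both are set.

```python
import argparse
import os


def resolve_dataset_path(argv=None, environ=None):
    """Pick the datasets root from --dataset-path, else SEMCO_DATA_PATH."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--dataset-path", default=None)
    # parse_known_args ignores the launcher's other flags in this sketch.
    args, _ = parser.parse_known_args(argv)
    # Assumption: an explicit command line value overrides the environment.
    path = args.dataset_path or environ.get("SEMCO_DATA_PATH")
    if path is None:
        raise ValueError("set SEMCO_DATA_PATH or pass --dataset-path")
    return path


print(resolve_dataset_path(["--dataset-path", "/home/data"], {}))
print(resolve_dataset_path([], {"SEMCO_DATA_PATH": "/home/data"}))
```

Both calls resolve to the parent datasets directory, not to an individual dataset folder such as cifar100.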

Label Semantics Embeddings

SemCo expects a prior representation of all class labels in the form of a semantic embedding for each class name. In our experiments, we use embeddings obtained from the ConceptNet knowledge graph, which contains a total of ~550K term embeddings. SemCo uses a matching criterion to find the best embedding for each class label. Alternatively, you can use class attributes as the prior (as we did for the CUB200 dataset) and build your own semantic dictionary.
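To make the idea of matching class names to term embeddings concrete, here is a small sketch. The two-step criterion below (exact lookup of the normalized name, then the mean of the word vectors that are found) is a hypothetical stand-in and not necessarily the criterion SemCo implements; the function names and the toy 2-D vectors are made up for the example.

```python
def normalize(name):
    """Lowercase and join words with underscores: 'Pickup Truck' -> 'pickup_truck'."""
    return "_".join(name.lower().replace("-", " ").replace("_", " ").split())


def match_embedding(class_name, embeddings):
    """Return a vector for class_name, or None when no term matches.

    Hypothetical criterion (an assumption, not SemCo's documented rule):
    1. exact lookup of the normalized name;
    2. otherwise, the mean of the vectors of the words that are found.
    """
    key = normalize(class_name)
    if key in embeddings:
        return list(embeddings[key])
    found = [embeddings[w] for w in key.split("_") if w in embeddings]
    if not found:
        return None
    return [sum(vals) / len(found) for vals in zip(*found)]


# Toy 2-D vectors, invented for illustration only.
toy = {"pickup_truck": [1.0, 0.0], "maple": [0.0, 2.0], "tree": [2.0, 0.0]}
print(match_embedding("Pickup Truck", toy))  # exact match: [1.0, 0.0]
print(match_embedding("maple_tree", toy))    # mean of "maple" and "tree": [1.0, 1.0]
print(match_embedding("zzz", toy))           # no match: None
```

Class names that return None are the ones for which a custom semantic dictionary entry would be needed.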

To run experiments, please download the semantic embedding file here and set the path to the downloaded file via either the SEMCO_WV_PATH environment variable or the --word-vec-path command line argument (e.g. export SEMCO_WV_PATH=/home/inas0003/data/numberbatch-en-19.08_128D.dict.pkl).

Defining the Splits

For each experiment, you can pass the following four command line arguments to the launcher (the last two are optional):

--dataset-name: denoting the dataset directory name (e.g. cifar100)
--train-split-pickle: path to pickle file with training split
--valid-split-pickle: (optional) path to pickle file with validation/test split (by default contains all the files in the test folder)
--classes-pickle: (optional) path to pickle file with list of class names

To obtain the three pickle files for any dataset, you can use the generate_tst_pkls.py script, specifying the dataset name, the number of instances per label, and optionally a random seed. For example:

python generate_tst_pkls.py --dataset-name cifar100 --instances-per-label 10 --random-seed 000 --output-path splits

The above will generate a train split with 10 images per class using a random seed of 000 together with the class names and the validation split containing all the files placed in the test folder. This can be tweaked by editing the python script.
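The core of such a split is a seeded draw of a fixed number of images per class. The sketch below illustrates that idea only; it is not the code of generate_tst_pkls.py. The function `sample_per_label` and the in-memory mapping from file name to class name are assumptions, and the real script's data structures and pickle contents may differ.

```python
import random
from collections import defaultdict


def sample_per_label(labels, instances_per_label, seed=0):
    """Pick `instances_per_label` file names per class, reproducibly.

    `labels` maps file name -> class name (a hypothetical in-memory form
    of the labels file, assumed for this sketch).
    """
    by_class = defaultdict(list)
    for filename, cls in labels.items():
        by_class[cls].append(filename)
    rng = random.Random(seed)
    split = {}
    for cls in sorted(by_class):
        files = sorted(by_class[cls])  # fixed order so the seed fully determines the draw
        if len(files) < instances_per_label:
            raise ValueError(f"class {cls!r} has only {len(files)} images")
        for filename in rng.sample(files, instances_per_label):
            split[filename] = cls
    return split


# Toy labels: 10 "cat" images and 10 "dog" images.
toy = {f"img{i}.png": ("cat" if i % 2 else "dog") for i in range(20)}
split = sample_per_label(toy, instances_per_label=3, seed=0)
print(len(split))  # 6: three file names for each of the two classes
```

Calling the function again with the same seed returns the same split, which is what makes a labelled subset reproducible across runs.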

Training the model

To train the model on cifar100 with 40 labeled samples, you can run the script:

    $ python launch_semco.py --dataset-name cifar100 --train-split-pickle splits/cifar100_labelled_data_40_seed123.pkl --model_backbone=wres --wres-k=2

or, without AMP:

    $ python launch_semco.py --dataset-name cifar100 --train-split-pickle splits/cifar100_labelled_data_40_seed123.pkl --model_backbone=wres --wres-k=2 --no_amp

Similarly, to train the model on mini_imagenet with 400 labeled samples, you can run the script:

    $ python launch_semco.py --dataset-name mini_imagenet --train-split-pickle testing/mini_imagenet_labelled_data_40_seed456.pkl --model_backbone=resnet18 --im-size=84 --cropsize=84

Comments

OSError: [Errno 22] Invalid argument:

When I run cifar100 with semco model, I face the problem:

OSError: [Errno 22] Invalid argument: '\Desktop\semco-main\data\cifar100\x2500\2022-02-16_17:10:56_2500_labelled_instances.log'

How can I fix this problem? Thanks!
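The log file name in the error contains colons in its timestamp (17:10:56), and Windows does not allow ":" in file names, so this is a likely cause if the code is run on Windows. A minimal sketch of a colon-free name follows; it assumes the name is built from a timestamp plus the number of labelled instances, and the function `log_filename` is hypothetical rather than taken from the SemCo code.

```python
from datetime import datetime


def log_filename(n_labelled, now=None):
    """Build a log file name whose timestamp contains no ':' characters."""
    now = datetime.now() if now is None else now
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")  # '-' instead of ':' in the time part
    return f"{stamp}_{n_labelled}_labelled_instances.log"


print(log_filename(2500, datetime(2022, 2, 16, 17, 10, 56)))
# 2022-02-16_17-10-56_2500_labelled_instances.log
```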

opened by aihcyllop 9

The result of Epoch 299. Top1

When I run the following command, I get this result:

INFO - train - Epoch 299. Top1: 65.9493. Top5: 88.9249

What other parameters can I adjust to achieve a better result? Or did I do something wrong? Thank you!

======= command

--word-vec-path C:/Users/Aihcyllop/Desktop/semco-main/numberbatch-en-19.08_128D.dict.pkl --dataset-name cifar100 --train-split-pickle splits/cifar100_labelled_data_25_seed123.pkl --model_backbone=wres --wres-k=2 --no_amp

======= result

2022-02-19 07:11:57,178 - INFO - train - Epoch 299. Top1: 65.9493. Top5: 88.9249. Top1_emb: 65.2811. Top5_emb: 82.3998. Top1_comb: 65.5660. best_metric: 66.1557 in epoch282
2022-02-19 07:11:57,180 - INFO - train - Break epoch is reached

opened by aihcyllop 1

Some questions about the reported results

Thanks a lot for releasing your code and for your excellent work. I have some questions about the results reported in your paper.

(1) Are the reported results the mean accuracy over the last several (e.g., 20) iterations/epochs of training, the best accuracy of a single iteration/epoch, or obtained with some other protocol?
(2) Are all reported results (including the re-implemented FixMatch and MixMatch) obtained with the final EMA models (which seem to perform better than the student models and than the results reported in the original FixMatch paper)?
(3) As your paper indicates, training runs for only 300 epochs in total. Did you try training for more epochs on CIFAR (e.g., 1024 epochs, as many other SOTA works do) for a fair comparison?

opened by QiushiYang 1
