A (PyTorch) imbalanced dataset sampler for oversampling low-frequency classes and undersampling high-frequency ones.

Ming

Last update: Jan 8, 2023

Overview

Imbalanced Dataset Sampler

Introduction

In many machine learning applications, we come across datasets where some types of data are seen far more often than others. Take the identification of rare diseases, for example: there are probably many more normal samples than disease ones. In these cases, we need to make sure that the trained model is not biased towards the class that has more data. As an example, consider a dataset with 5 disease images and 20 normal images. If the model predicts all images to be normal, its accuracy is 80%, and its F1-score for the ‘normal’ class is about 0.89, even though it never detects a single disease image. Such a model therefore has a strong tendency to be biased toward the ‘normal’ class.
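
The arithmetic behind the 5-versus-20 example can be checked with a few lines of plain Python. This is only an illustrative sketch: the helper name `precision_recall_f1` is made up for this example and is not part of the package.

```python
# Toy dataset from the paragraph above: 5 disease images, 20 normal images,
# and a model that labels every image "normal".
true_labels = ["disease"] * 5 + ["normal"] * 20
predictions = ["normal"] * 25


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


accuracy = sum(t == p for t, p in zip(true_labels, predictions)) / len(true_labels)
normal_f1 = precision_recall_f1(true_labels, predictions, "normal")[2]
disease_f1 = precision_recall_f1(true_labels, predictions, "disease")[2]

print(f"accuracy={accuracy:.2f} normal_f1={normal_f1:.2f} disease_f1={disease_f1:.2f}")
```

The headline metrics look good only because they are dominated by the majority class; the F1-score for the ‘disease’ class is zero.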

To solve this problem, a widely adopted technique is resampling. It consists of removing samples from the majority class (under-sampling) and/or adding more examples from the minority class (over-sampling). Despite the advantage of balancing classes, these techniques also have their weaknesses (there is no free lunch). The simplest implementation of over-sampling is to duplicate random records from the minority class, which can cause overfitting. In under-sampling, the simplest technique involves removing random records from the majority class, which can cause loss of information.
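
The two naive strategies described above can be sketched in plain Python as follows. The function name `random_resample` and its parameters are assumptions made for this illustration, not an API of this repo.

```python
import random
from collections import Counter, defaultdict


def random_resample(samples, labels, mode="over", seed=0):
    """Naively balance classes by random duplication ("over") or removal ("under")."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    sizes = [len(group) for group in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)

    balanced = []
    for label, group in by_class.items():
        if mode == "over":
            # Keep every record and duplicate random ones up to the target size.
            chosen = group + rng.choices(group, k=target - len(group))
        else:
            # Keep only a random subset of the target size.
            chosen = rng.sample(group, target)
        balanced.extend((sample, label) for sample in chosen)
    rng.shuffle(balanced)
    return balanced


samples = list(range(25))
labels = ["disease"] * 5 + ["normal"] * 20

over = random_resample(samples, labels, mode="over")
under = random_resample(samples, labels, mode="under")
over_counts = Counter(label for _, label in over)
under_counts = Counter(label for _, label in under)
print(over_counts, under_counts)
```

Over-sampling repeats the same 5 disease records many times (the overfitting risk), while under-sampling throws away 15 of the 20 normal records (the information loss).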

In this repo, we implement an easy-to-use PyTorch sampler ImbalancedDatasetSampler that is able to

rebalance the class distributions when sampling from the imbalanced dataset
estimate the sampling weights automatically
avoid creating a new balanced dataset
mitigate overfitting when it is used in conjunction with data augmentation techniques

Usage

For a simple start, install the package with pip:

pip install https://github.com/ufoym/imbalanced-dataset-sampler/archive/master.zip

Simply pass an ImbalancedDatasetSampler as the sampler argument when creating a DataLoader. For example:

import torch
from torchsampler import ImbalancedDatasetSampler

train_loader = torch.utils.data.DataLoader(
    train_dataset,
    sampler=ImbalancedDatasetSampler(train_dataset),
    batch_size=args.batch_size,
    **kwargs
)

Then in each epoch, the loader draws as many samples as the dataset contains (with replacement), weighting each sample inversely to the frequency of its class.
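
To make the weighting concrete, here is a minimal sketch of inverse-frequency sampling in plain Python. It is not the package's implementation: the sampler code quoted in the comments further down draws indices with torch.multinomial, and `random.choices` is used here only as a stand-in. The label list is hypothetical.

```python
import random
from collections import Counter

# Hypothetical imbalanced label list: 20 samples of class 0, 5 of class 1.
labels = [0] * 20 + [1] * 5

# Each sample is weighted by 1 / (number of samples in its class).
label_to_count = Counter(labels)
weights = [1.0 / label_to_count[label] for label in labels]

# The total weight of each class is the same, so each class is equally
# likely to be drawn regardless of its size.
class_weight = Counter()
for label, weight in zip(labels, weights):
    class_weight[label] += weight

# Draw len(labels) indices with replacement, as one "epoch" of sampling.
rng = random.Random(0)
indices = rng.choices(range(len(labels)), weights=weights, k=len(labels))
drawn = Counter(labels[i] for i in indices)
print(dict(class_weight), dict(drawn))
```

Because sampling is with replacement, minority samples are typically repeated within an epoch and some majority samples are not seen at all.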

Example: Imbalanced MNIST Dataset

Distribution of classes in the imbalanced dataset:

With Imbalanced Dataset Sampler:

(left: test acc in each epoch; right: confusion matrix)

Without Imbalanced Dataset Sampler:

(left: test acc in each epoch; right: confusion matrix)

Note that there are significant improvements for minority classes such as 2, 6, and 9, while the accuracy of the other classes is preserved.

Contributing

We appreciate all contributions. If you are planning to contribute bug fixes, please do so without any further discussion. If you plan to contribute new features, utility functions, or extensions, please first open an issue and discuss the feature with us.

Licensing

MIT licensed.

Comments

NotImplemented Error while running ImbalancedDatasetSampler

I followed the steps exactly according to the readme file. Yet I am getting a NotImplementedError, with no explanation of the error.

Here's my code:

from torchvision import transforms
from torchsampler import ImbalancedDatasetSampler

batch_size = 128
val_split = 0.2
shuffle_dataset = True
random_seed = 42

dataset_size = len(melanoma_dataset)
indices = list(range(dataset_size))
split = int(np.floor(val_split * dataset_size))
if shuffle_dataset:
    np.random.seed(random_seed)
    np.random.shuffle(indices)
train_indices, test_indices = indices[split:], indices[:split]

train_loader = torch.utils.data.DataLoader(melanoma_dataset, batch_size=batch_size, sampler=ImbalancedDatasetSampler(melanoma_dataset))
test_loader = torch.utils.data.DataLoader(melanoma_dataset, batch_size=batch_size, sampler=test_sampler)

opened by aryamansriram 8

'MyDataset' object has no attribute 'get_labels'

When I try to use my own Dataset class, I get the error 'MyDataset' object has no attribute 'get_labels' and cannot proceed.

The content of the Dataloader is as follows, and there is nothing strange about it. It processes the image data and label data in .npz format.

class MyDataset(data.Dataset):
    def __init__(self, images, labels, transform=None):
        self.images = images
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        label = self.labels[index]

        if self.transform is not None:
            image = self.transform(image=image)["image"]

        return image, label

train_dataset = MyDataset(train_imgs, train_labels, transform=transform)
train_dataloader = torch.utils.data.DataLoader(train_dataset,
                                               sampler=ImbalancedDatasetSampler(train_dataset),
                                               batch_size= batch_size,
                                               shuffle=True,
                                               num_workers=2)

Is there something wrong with the code? I don't think it's a typo.

How can I fix it so that it works correctly?

opened by kuri54 5
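
One possible way to address the error above, going by the sampler code quoted further down in these comments (which calls dataset.get_labels() for generic datasets), is to add a get_labels method to the custom dataset. The sketch below is an assumption-laden illustration: the class is written without a torch base class so that it runs on its own, whereas real code would subclass torch.utils.data.Dataset as in the issue.

```python
class MyDataset:  # in real code: class MyDataset(torch.utils.data.Dataset)
    def __init__(self, images, labels, transform=None):
        self.images = images
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        label = self.labels[index]
        if self.transform is not None:
            image = self.transform(image=image)["image"]
        return image, label

    def get_labels(self):
        # One label per sample, in dataset order, as a flat list of ints.
        return [int(label) for label in self.labels]


# Stand-in data; the issue loads images and labels from .npz files instead.
train_dataset = MyDataset(images=["img0", "img1", "img2"], labels=[0, 0, 1])
print(train_dataset.get_labels())
```

Separately, PyTorch's DataLoader does not accept shuffle=True together with a sampler, so the shuffle argument would need to be dropped from the DataLoader call in the issue.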

Reduce time for handling large imbalanced dataset

In the current code, calculating the weights and counting the samples per label takes a long time when __init__() is called. This can be a problem for large datasets, so I rewrote this part using pandas. With over 5 million samples, the sampler now initializes in under 1 second; with the previous code, it took 20 minutes.

opened by hwany-j 3
pip install error

I followed your method to install the package:

git clone https://github.com/ufoym/imbalanced-dataset-sampler.git
cd imbalanced-dataset-sampler
python setup.py install
pip install .

But when I ran "pip install .", I got the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/home/miniconda3/envs/pytorch/lib/python3.7/site-packages/torchsampler-0.1-py3.7.egg'

How can I resolve it?

opened by wly-ai-bj 3
Publish package to PyPi and GitHub Releases
Here at https://github.com/neuropoly/ we think your project is very useful and would like to build on it! Unfortunately if we write a setup.cfg for ourselves that depends on "torchsampler", it fails to install, because https://pypi.org/project/torchsampler/ is a 404:

This Workflow script makes publishing the conventional way easy and reliable, and will mean that projects like ours can build on your work!

To use it,

Create an account at https://pypi.org/

Go to https://pypi.org/manage/account/token/ and make a token

Then go to https://github.com/kousu/imbalanced-dataset-sampler/settings/secrets/actions and paste the token in there, with the name "PYPI_TOKEN"

Then every time you are ready to publish,

Go to https://github.com/ufoym/imbalanced-dataset-sampler/releases/new,

Fill in a new tag like "1.0.0"

Click Publish.

It will run the Action and then in a minute or so will show up on https://github.com/kousu/imbalanced-dataset-sampler/releases/ and https://pypi.org/project/torchsampler/#history

For your first few runs, while you get used to this script, I recommend using "rc" suffixes to post versions without committing to them. For example, I've used this script myself to produce

the ones with the yellow "pre-release" tags get ignored by pip by default, unless a user opts into them with pip install --pre.

Publishing to PyPI will avoid issues like

https://github.com/ufoym/imbalanced-dataset-sampler/issues/20

https://github.com/ufoym/imbalanced-dataset-sampler/issues/15

https://github.com/ufoym/imbalanced-dataset-sampler/issues/12

and again, let us build on your work. Without publishing to pypi, it means our install instructions need to include yours:

pip install https://github.com/ufoym/imbalanced-dataset-sampler/archive/master.zip neuropoly-seg-model

which is pretty awkward for us.

Thanks a lot for your work, it's really helping out our research prototyping.
opened by kousu 2
latest commit cannot get correct labels from ImageFolder dataset
ImageFolder.imgs returns a list of 2-element tuples, such as:

[ ('classA/img01.jpg', 0), ('classB/img01.jpg', 1), ... ('classN/img01.jpg', N-1) ]

If we use dataset.imgs[:][1], we get only the second tuple in the list, not the labels of all samples.

at this line: https://github.com/ufoym/imbalanced-dataset-sampler/blob/756b0b61ca48a026e9a5c216296a520a10faf7df/torchsampler/imbalanced.py#L46
opened by TriLoo 2
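
The indexing problem described above can be reproduced without torchvision. The list below is a hypothetical stand-in for ImageFolder.imgs.

```python
# Stand-in for ImageFolder.imgs: a list of (path, class_index) tuples.
imgs = [
    ("classA/img01.jpg", 0),
    ("classA/img02.jpg", 0),
    ("classB/img01.jpg", 1),
]

# imgs[:] is just a shallow copy of the list, so [1] selects the second tuple.
wrong = imgs[:][1]

# A comprehension collects the second field of every tuple instead.
labels = [label for _, label in imgs]

print(wrong)
print(labels)
```

Python lists have no column-style slicing, so the per-sample labels have to be gathered element by element.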
reduce the time when calling sampler

In the current code, calculating the weights and counting the samples per label takes a long time when __init__() is called. This can be a problem for large datasets, so I rewrote this part using pandas. With over 5 million samples, the sampler now initializes in under 1 second; with the previous code, it took 20 minutes.

opened by hwany-j 2
Use setuptools_scm
Follows up https://github.com/ufoym/imbalanced-dataset-sampler/pull/47

on pypi it's 0.1.1

on github it's 0.1.0

but the attached wheel and sdist are 0.1.1

setuptools-scm makes git the single-source of truth for the version of the package, and works well with the build script in #47 (which is triggered by tagging a new version in git).

The other thing setuptools-scm does is make git the single source of truth for your MANIFEST, so this drops check-manifest and also most of the contents of your MANIFEST.in -- except the prune lines, those are still doing something for the moment.
opened by kousu 1
Too much time cost

Training is many times slower than before when I use this sampler.

(self.indices[i] for i in torch.multinomial(self.weights, self.num_samples, replacement=True)) It seems that this expression takes too much time!

Does anyone have a solution?

opened by nothingeasy 1
AttributeError: 'ConcatDataset' object has no attribute 'img_norm_cfg'

When I run test.py, there is an error:

File "tools/test.py", line 211, in <module>
    main()
File "tools/test.py", line 181, in main
    outputs = single_gpu_test(model, data_loader, args.show, args.log_dir)
File "tools/test.py", line 39, in single_gpu_test
    model.module.show_result(data, result, dataset.img_norm_cfg, dataset='DOTA1_5')
AttributeError: 'ConcatDataset' object has no attribute 'img_norm_cfg'

How can I solve this problem?

opened by ZSX2018 0
I think this solves issue 32

https://github.com/ufoym/imbalanced-dataset-sampler/issues/32

In this issue, only one element is returned when the dataset is an ImageFolder, so I changed the return statement.

I hope this helps.

opened by LeeTaeHoon97 0
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series
Hi, I am using BERT for multi-label classification. The dataset is imbalanced, and I use ImbalancedDatasetSampler as the sampler.

The train data has been tokenized, has id, mask and label:

(tensor([ 101, 112, 872, 4761, 6887, 1914, 840, 1914, 7353, 6818, 3300, 784, 720, 1408, 136, 1506, 1506, 3300, 4788, 2357, 5456, 119, 119, 119, 4696, 4638, 741, 677, 1091, 4638, 872, 1420, 1521, 119, 119, 119, 872, 2157, 6929, 1779, 4788, 2357, 3221, 686, 4518, 677, 3297, 1920, 4638, 4788, 2357, 117, 1506, 1506, 117, 7745, 872, 4638, 1568, 2124, 3221, 6432, 2225, 1217, 2861, 4478, 4105, 2357, 3221, 686, 4518, 677, 3297, 1920, 4638, 4105, 2357, 1568, 119, 119, 119, 1506, 1506, 1506, 112, 112, 4268, 4268, 117, 1961, 4638, 1928, 1355, 5456, 106, 2769, 812, 1920, 2812, 7370, 3488, 2094, 6963, 6206, 5436, 677, 3341, 2769, 4692, 1168, 3312, 1928, 5361, 7027, 3300, 1928, 1355, 119, 119, 119, 671, 2137, 3221, 8584, 809, 1184, 1931, 1168, 4638, 117, 872, 6432, 3221, 679, 3221, 136, 138, 4495, 4567, 140, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor(0))

When using

from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

everything works fine:

batch_size = 3
dataloader_train_o = DataLoader(
    dataset_train,
    sampler=RandomSampler(dataset_train),
    batch_size=batch_size,
    # **kwargs
)

However, after replacing the sampler with ImbalancedDatasetSampler:

batch_size = 3
dataloader_train_o = DataLoader(
    dataset_train,
    sampler=ImbalancedDatasetSampler(dataset_train),
    batch_size=batch_size,
    # **kwargs
)

the following error is printed:

ValueError Traceback (most recent call last)
File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\pandas\core\frame.py:3892, in DataFrame._ensure_valid_index(self, value)
   3891 try:
-> 3892 value = Series(value)
   3893 except (ValueError, NotImplementedError, TypeError) as err:

File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\pandas\core\series.py:451, in Series.__init__(self, data, index, dtype, name, copy, fastpath)
    450 else:
--> 451 data = sanitize_array(data, index, dtype, copy)
    453 manager = get_option("mode.data_manager")

File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\pandas\core\construction.py:601, in sanitize_array(data, index, dtype, copy, raise_cast_failure, allow_2d)
    599 subarr = maybe_infer_to_datetimelike(subarr)
--> 601 subarr = _sanitize_ndim(subarr, data, dtype, index, allow_2d=allow_2d)
    603 if isinstance(subarr, np.ndarray):
    604 # at this point we should have dtype be None or subarr.dtype == dtype

File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\pandas\core\construction.py:652, in _sanitize_ndim(result, data, dtype, index, allow_2d)
    651 return result
--> 652 raise ValueError("Data must be 1-dimensional")
    653 if is_object_dtype(dtype) and isinstance(dtype, ExtensionDtype):
    654 # i.e. PandasDtype("O")

ValueError: Data must be 1-dimensional

The above exception was the direct cause of the following exception:

ValueError Traceback (most recent call last)
Input In [49], in <cell line: 5>()
      2 from torchsampler import ImbalancedDatasetSampler
      4 batch_size=3
      5 dataloader_train_o = DataLoader(
      6 dataset_train,
----> 7 sampler=ImbalancedDatasetSampler(dataset_train),
      8 batch_size=batch_size,
      9 # **kwargs
     10 )
     12 dataloader_validation_o = DataLoader(
     13 dataset_val,
     14 sampler=SequentialSampler(dataset_val),
     15 batch_size=batch_size,
     16 # **kwargs
     17 )

File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\torchsampler\imbalanced.py:37, in ImbalancedDatasetSampler.__init__(self, dataset, labels, indices, num_samples, callback_get_label)
     35 # distribution of classes in the dataset
     36 df = pd.DataFrame()
---> 37 df["label"] = self._get_labels(dataset) if labels is None else labels
     38 df.index = self.indices
     39 df = df.sort_index()

File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\pandas\core\frame.py:3655, in DataFrame.__setitem__(self, key, value)
   3652 self._setitem_array([key], value)
   3653 else:
   3654 # set column
-> 3655 self._set_item(key, value)

File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\pandas\core\frame.py:3832, in DataFrame._set_item(self, key, value)
   3822 def _set_item(self, key, value) -> None:
   3823 """
   3824 Add series to DataFrame in specified column.
   3825 (...)
   3830 ensure homogeneity.
   3831 """
-> 3832 value = self._sanitize_column(value)
   3834 if (
   3835 key in self.columns
   3836 and value.ndim == 1
   3837 and not is_extension_array_dtype(value)
   3838 ):
   3839 # broadcast across multiple columns if necessary
   3840 if not self.columns.is_unique or isinstance(self.columns, MultiIndex):

File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\pandas\core\frame.py:4528, in DataFrame._sanitize_column(self, value)
   4515 def _sanitize_column(self, value) -> ArrayLike:
   4516 """
   4517 Ensures new columns (which go into the BlockManager as new blocks) are
   4518 always copied and converted into an array.
   (...)
   4526 numpy.ndarray or ExtensionArray
   4527 """
-> 4528 self._ensure_valid_index(value)
   4530 # We should never get here with DataFrame value
   4531 if isinstance(value, Series):

File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\pandas\core\frame.py:3894, in DataFrame._ensure_valid_index(self, value)
   3892 value = Series(value)
   3893 except (ValueError, NotImplementedError, TypeError) as err:
-> 3894 raise ValueError(
   3895 "Cannot set a frame with no defined index "
   3896 "and a value that cannot be converted to a Series"
   3897 ) from err
   3899 # GH31368 preserve name of index
   3900 index_copy = value.index.copy()

ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series
opened by zhangjizby 1
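
The traceback above shows that the constructor accepts a labels argument and assigns it to a single DataFrame column, which requires 1-dimensional data. A possible workaround, offered only as an untested sketch, is to build a flat list with one scalar label per sample and pass it explicitly. The dataset below is a stand-in made of plain tuples, and the sketch assumes each sample carries exactly one scalar label in the third position, as in the sample printed in the issue.

```python
# Stand-in for the tokenized dataset: (input_ids, attention_mask, label).
dataset_train = [
    ([101, 112, 102], [1, 1, 1], 0),
    ([101, 872, 102], [1, 1, 1], 2),
    ([101, 4761, 102], [1, 1, 1], 0),
]

# Collect exactly one scalar label per sample. int() also unwraps a
# zero-dimensional tensor such as tensor(0).
labels = [int(sample[2]) for sample in dataset_train]
print(labels)

# Hypothetical usage, not executed here because it needs torchsampler:
# sampler = ImbalancedDatasetSampler(dataset_train, labels=labels)
```

If each sample really carried several labels at once, a single inverse-frequency weight per sample would no longer be well defined, and this sketch would not apply.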

Implementation for Pytorch-geometric dataset

I have added a few lines that allow the sampler to work with pytorch-geometric datasets. Since pytorch-geometric data is saved as a list before being loaded by a pytorch-geometric DataLoader, the modification is pretty simple. I hope this is helpful to someone.

Best,

Anna

from typing import Callable

import pandas as pd
import torch
import torch.utils.data
import torchvision


class ImbalancedDatasetSampler(torch.utils.data.sampler.Sampler):
    """Samples elements randomly from a given list of indices for imbalanced dataset

    Arguments:
        indices: a list of indices
        num_samples: number of samples to draw
        callback_get_label: a callback-like function which takes two arguments - dataset and index
    """

    def __init__(self, dataset, indices: list = None, num_samples: int = None, callback_get_label: Callable = None):
        # if indices is not provided, all elements in the dataset will be considered
        self.indices = list(range(len(dataset))) if indices is None else indices

        # define custom callback
        self.callback_get_label = callback_get_label

        # if num_samples is not provided, draw `len(indices)` samples in each iteration
        self.num_samples = len(self.indices) if num_samples is None else num_samples

        # distribution of classes in the dataset
        df = pd.DataFrame()
        df["label"] = self._get_labels(dataset)
        df.index = self.indices
        df = df.sort_index()

        label_to_count = df["label"].value_counts()

        weights = 1.0 / label_to_count[df["label"]]

        self.weights = torch.DoubleTensor(weights.to_list())

    def _get_labels(self, dataset):
        if self.callback_get_label:
            return self.callback_get_label(dataset)
        elif isinstance(dataset, torchvision.datasets.MNIST):
            return dataset.train_labels.tolist()
        elif isinstance(dataset, torchvision.datasets.ImageFolder):
            return [x[1] for x in dataset.imgs]
        elif isinstance(dataset, torchvision.datasets.DatasetFolder):
            return dataset.samples[:][1]
        elif isinstance(dataset, torch.utils.data.Subset):
            return dataset.dataset.imgs[:][1]
        elif isinstance(dataset, torch.utils.data.Dataset):
            return dataset.get_labels()
        elif isinstance(dataset, list):
            return [dataset[i].y.item() for i in range(len(dataset))]  # here the modification
        else:
            raise NotImplementedError

    def __iter__(self):
        return (self.indices[i] for i in torch.multinomial(self.weights, self.num_samples, replacement=True))

    def __len__(self):
        return self.num_samples

opened by avarbella 0

callback_get_label no longer works as described

callback_get_label: a callback-like function which takes two arguments - dataset and index

This no longer seems to be the case?

Please document what the new usage looks like, because the above commit breaks a lot of my previous code.

@hwany-j

opened by zimonitrome 3
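
Going by the sampler code quoted earlier in these comments, the callback is now called once with the whole dataset (self.callback_get_label(dataset)) and is expected to return all labels. Under that reading, an old two-argument callback could be wrapped as sketched below. The names `adapt_per_index_callback` and `old_get_label` are hypothetical, and the toy dataset is a plain list of (sample, label) pairs.

```python
def adapt_per_index_callback(per_index_callback):
    """Wrap an old-style callback(dataset, index) as a callback(dataset)."""

    def whole_dataset_callback(dataset):
        # Call the old callback once per index and collect the labels.
        return [per_index_callback(dataset, i) for i in range(len(dataset))]

    return whole_dataset_callback


def old_get_label(dataset, index):
    # Old-style callback: returns the label of a single sample.
    return dataset[index][1]


toy_dataset = [("a", 0), ("b", 1), ("c", 1)]
new_get_labels = adapt_per_index_callback(old_get_label)
print(new_get_labels(toy_dataset))

# Hypothetical usage, not executed here because it needs torchsampler:
# sampler = ImbalancedDatasetSampler(dataset, callback_get_label=new_get_labels)
```

The wrapper still loads every sample once, so for large datasets it may be faster to return labels from metadata the dataset already holds.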

Releases(v0.1.2)

v0.1.2(May 23, 2022)

Source code(tar.gz)
Source code(zip)
torchsampler-0.1.2-py3-none-any.whl(5.47 KB)
torchsampler-0.1.2.tar.gz(6.44 KB)
v0.1.0(May 18, 2022)

Source code(tar.gz)
Source code(zip)
torchsampler-0.1.1-py3-none-any.whl(5.29 KB)
torchsampler-0.1.1.tar.gz(5.63 KB)

Owner

Ming

PhD@SYSU -> Researcher@CVTE
