This repository holds the code for the paper "Deep Conditional Gaussian Mixture Model for Constrained Clustering".

Last update: Oct 30, 2022


Overview

Deep Conditional Gaussian Mixture Model for Constrained Clustering.


Motivation

Clustering with constraints has gained significant attention in the field of constrained machine learning as it can leverage partial prior information on a growing amount of unlabelled data. Following recent advances in deep generative models, we derive a novel probabilistic approach to constrained clustering that can be trained efficiently in the framework of stochastic gradient variational Bayes. In contrast to existing approaches, our model (DC-GMM) uncovers the underlying distribution of the data conditioned on prior clustering preferences, expressed as pairwise constraints. The inclusion of such constraints allows the user to drive the clustering process towards a desirable configuration by indicating which samples should or should not belong to the same class.

Data Download

To download Reuters data, run the following:

cd dataset/reuters

sh download_data.sh

Download STL data (Matlab files) from https://cs.stanford.edu/~acoates/stl10/. Save them in dataset/stl10/stl10_matlab. Then run the following:

cd dataset/stl10

python compute_stl_features.py

To download and configure the UTKFace dataset:

Download the cropped and aligned dataset archive from https://susanqq.github.io/UTKFace/
Extract the images from this archive to <code root>/dataset/utkface

Implementation

To run DC-GMM using the default setting on the MNIST dataset:

python main.py --pretrain True

To run DC-GMM without pairwise constraints using the default setting:

python main.py --pretrain True --num_constrains 0

To choose different configurations of the hyper-parameters:

python main.py --data ... --num_constrains ... --alpha ... --lr ...

Important hyper-parameters:

data: choose from MNIST, fMNIST, Reuters, har, utkface
num_constrains: number of pairwise constraints; the default is 6000 (note that the total number of possible pairwise constraints in a dataset of N samples is O(N*N))
alpha: measures the confidence in your labels (default is 10000)
pretrain: set to False if you want to use your own pretrained weights
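
As a rough illustration of how the flags listed above could be parsed, here is a minimal argparse sketch. It is not taken from main.py: only the flag names, the dataset choices, and the defaults for data, num_constrains, and alpha come from this README. The helper str2bool, the function build_parser, and the defaults for --lr and --pretrain are assumptions made for the example.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'True'/'False' style strings.

    argparse's type=bool would treat any non-empty string (even 'False') as True,
    so an explicit converter is used here. This helper is hypothetical.
    """
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names follow the README, the rest is assumed.
    parser = argparse.ArgumentParser(description="DC-GMM options (illustrative sketch)")
    parser.add_argument("--data", default="MNIST",
                        choices=["MNIST", "fMNIST", "Reuters", "har", "utkface"])
    parser.add_argument("--num_constrains", type=int, default=6000)
    parser.add_argument("--alpha", type=float, default=10000)
    # The README does not state a default learning rate, so none is set here.
    parser.add_argument("--lr", type=float, default=None)
    # Placeholder default; the README always passes --pretrain explicitly.
    parser.add_argument("--pretrain", type=str2bool, default=True)
    return parser


# Parse an explicit argument list, equivalent to:
#   python main.py --pretrain True --num_constrains 0
args = build_parser().parse_args(["--pretrain", "True", "--num_constrains", "0"])
print(vars(args))
```

Passing the boolean through a converter such as str2bool makes "--pretrain False" behave as expected; whether the repository handles this the same way is not stated here.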

Pairwise constraints

In the current implementation, the pairwise constraints are obtained from labels by randomly sampling two data points and assigning a must-link constraint (+1) if the two samples have the same label and a cannot-link constraint (-1) otherwise. The pairwise constraints are stored in a matrix W. See the file source/data.py.
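
The sampling procedure described above can be sketched in a few lines of plain Python. This is an illustration only, not the code in source/data.py: the function name sample_pairwise_constraints, the seed argument, the dictionary-based sparse representation of W, and the choice to skip repeated pairs are all assumptions made for the example. For N samples there are N(N-1)/2 distinct unordered pairs, which is the O(N*N) total mentioned earlier.

```python
import random


def sample_pairwise_constraints(labels, num_constraints, seed=0):
    """Return a sparse symmetric W as a dict {(i, j): +1 or -1}.

    +1 marks a must-link pair (same label), -1 a cannot-link pair.
    Hypothetical helper; the repository stores W as a matrix instead.
    """
    n = len(labels)
    max_pairs = n * (n - 1) // 2  # number of distinct unordered pairs
    if num_constraints > max_pairs:
        raise ValueError("more constraints requested than distinct pairs exist")

    rng = random.Random(seed)
    W = {}
    # Each distinct pair contributes two symmetric entries, (i, j) and (j, i).
    while len(W) < 2 * num_constraints:
        i, j = rng.sample(range(n), 2)  # two distinct indices
        value = 1 if labels[i] == labels[j] else -1
        W[(i, j)] = value
        W[(j, i)] = value
    return W


# Toy usage with five labelled points and four constraints.
labels = [0, 0, 1, 1, 2]
W = sample_pairwise_constraints(labels, num_constraints=4)
for (i, j), w in sorted(W.items()):
    if i < j:
        print(f"pair ({i}, {j}): {'must-link' if w == 1 else 'cannot-link'}")
```

A dictionary keeps the sketch small when only a few thousand of the N(N-1)/2 possible pairs are constrained; a dense or sparse matrix would serve the same purpose.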

Comments

About the problem that the results of the paper cannot be reproduced

I have done five experiments on MNIST, and the accuracy results on the test set are as follows: 0.895200, 0.893200, 0.959100, 0.953000, 0.961500. The mean is 0.9324 and the standard deviation is 0.03, which is different from the results in the paper. Is there a solution?

opened by LPH0 1
Some question about other datasets

Hello, Thank you for your code. It helps me a lot. Because I saw the cifar10.yml, I want to make sure if I use cifar10, may I have to extract features using ResNet-50 like STL-50 as pre-processing and what else should I notice. Thanks a lot!

opened by V1oletM 0
