A Python library for Deep Probabilistic Modeling

Overview

Abstract

DeeProb-kit is a Python library that implements deep probabilistic models such as various kinds of Sum-Product Networks, Normalizing Flows and their possible combinations for probabilistic inference. Some models are implemented using PyTorch for fast training and inference on GPUs.

Features

  • Inference algorithms for SPNs. [1, 4]
  • Structure learning algorithms for SPNs. [1, 2, 3, 4]
  • Chow-Liu Trees (CLTs) as SPN leaves (see the sketch after this list). [11, 12]
  • Batch Expectation-Maximization (EM) for SPNs with arbitrary leaves. [13, 14]
  • Structural marginalization and pruning algorithms for SPNs.
  • High-order moments computation for SPNs.
  • JSON I/O operations for SPNs and CLTs. [4]
  • Plotting operations based on NetworkX for SPNs and CLTs. [4]
  • Randomized And Tensorized SPNs (RAT-SPNs) using PyTorch. [5]
  • Masked Autoregressive Flows (MAFs) using PyTorch. [6]
  • Real Non-Volume-Preserving (RealNVP) and Non-linear Independent Component Estimation (NICE) flows. [7, 8]
  • Deep Generalized Convolutional SPNs (DGC-SPNs) using PyTorch. [10]
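
The Chow-Liu construction behind the CLT leaves boils down to estimating pairwise mutual information from data and extracting a maximum spanning tree. The following is a minimal conceptual sketch in NumPy/SciPy, independent of DeeProb-kit's actual API (all names are illustrative):

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def chow_liu_parents(data, alpha=0.01):
    """Return the parent vector of a Chow-Liu tree learned from binary data.

    data: (n_samples, n_vars) array with values in {0, 1}.
    alpha: Laplace smoothing factor for the probability estimates.
    """
    _, n_vars = data.shape
    mi = np.zeros((n_vars, n_vars))
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            # Smoothed joint distribution of the variable pair (i, j)
            joint = np.zeros((2, 2))
            for a in (0, 1):
                for b in (0, 1):
                    joint[a, b] = np.sum((data[:, i] == a) & (data[:, j] == b)) + alpha
            joint /= joint.sum()
            outer = np.outer(joint.sum(axis=1), joint.sum(axis=0))
            mi[i, j] = mi[j, i] = np.sum(joint * np.log(joint / outer))
    # A maximum spanning tree over mutual information is a minimum spanning
    # tree over the negated weights.
    mst = minimum_spanning_tree(-mi).toarray()
    adjacency = (mst != 0) | (mst.T != 0)
    # Root the tree at variable 0 and recover a parent vector by traversal.
    parents = np.full(n_vars, -1)
    stack, visited = [0], {0}
    while stack:
        u = stack.pop()
        for v in np.flatnonzero(adjacency[u]):
            if v not in visited:
                visited.add(v)
                parents[v] = u
                stack.append(v)
    return parents

Once the structure is fixed, the conditional probability tables are estimated from the same smoothed counts, which is what makes CLTs attractive as tractable multivariate SPN leaves.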

The collection of implemented models is summarized in the following table. The supported data dimensionality for each model is shown in the Input Dimensionality column, and the Supervised column indicates which models can also be used for supervised learning tasks, in addition to density estimation. A conceptual sketch of the coupling-layer idea shared by the flow models follows the table.

Model      | Description                                         | Input Dimensionality | Supervised
---------- | --------------------------------------------------- | -------------------- | ----------
Binary-CLT | Binary Chow-Liu Tree (CLT)                          | D                    |
SPN        | Vanilla Sum-Product Network, using LearnSPN         | D                    |
RAT-SPN    | Randomized and Tensorized Sum-Product Network       | D                    |
DGC-SPN    | Deep Generalized Convolutional Sum-Product Network  | (1, D, D); (3, D, D) |
MAF        | Masked Autoregressive Flow                          | D                    |
NICE       | Non-linear Independent Components Estimation Flow   | (1, H, W); (3, H, W) |
RealNVP    | Real-valued Non-Volume-Preserving Flow              | (1, H, W); (3, H, W) |
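
The normalizing-flow models in the table (NICE, RealNVP, MAF) are all built from invertible transformations with tractable Jacobians. As an illustration of the underlying idea only (not DeeProb-kit's actual classes), here is a minimal RealNVP-style affine coupling layer in PyTorch:

import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Minimal RealNVP-style coupling layer for 1D feature vectors."""

    def __init__(self, num_features, hidden=64):
        super().__init__()
        self.half = num_features // 2
        # Small conditioner network producing scale and translation parameters
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (num_features - self.half)),
        )

    def forward(self, x):
        # Split the input: x1 is left unchanged and conditions the transform of x2
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(x1).chunk(2, dim=1)
        log_s = torch.tanh(log_s)          # keep scales bounded for stability
        y2 = x2 * torch.exp(log_s) + t
        log_det = log_s.sum(dim=1)         # log |det J| of the transformation
        return torch.cat([x1, y2], dim=1), log_det

    def inverse(self, y):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        log_s, t = self.net(y1).chunk(2, dim=1)
        log_s = torch.tanh(log_s)
        x2 = (y2 - t) * torch.exp(-log_s)
        return torch.cat([y1, x2], dim=1)

Stacking several such layers, interleaved with permutations of the features, on top of a simple base distribution yields a density estimator trained by maximizing the exact log-likelihood.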

Installation & Documentation

The library can be installed either from the PyPI repository or from source.

# Install from the PyPI repository
pip install deeprob-kit
# Install from the `main` git branch
pip install -e git+https://github.com/deeprob-org/deeprob-kit.git@main#egg=deeprob-kit

The documentation is generated automatically by Sphinx (with the Read-the-Docs theme) and is hosted with GitHub Pages at deeprob-kit.

Datasets and Experiments

A collection of 29 binary datasets, most of which are used in the Probabilistic Circuits literature, can be found at UCLA-StarAI-Binary-Datasets.

Moreover, a collection of 5 continuous datasets, commonly used in works on Normalizing Flows, can be found at MAF-Continuous-Datasets.

After downloading them, store the datasets in the experiments/datasets directory in order to run the experiments (and the unit tests). The experiment scripts are available in the experiments directory and can be launched from the command line by specifying the dataset and the hyper-parameters.
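
For instance, once the datasets are in place, an experiment script can be launched from the repository root roughly as follows; the exact arguments depend on the script, so treat this invocation as illustrative only and consult each script's --help output:

# Illustrative only: check the script's --help for its actual arguments
python experiments/spn.py --help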

Code Examples

A collection of code examples can be found in the examples directory. The examples are not intended to produce state-of-the-art results, but only to showcase the library.

The following table describes each example and assigns it a complexity rating from one to three stars. The Complexity column roughly reflects how many features of the library are used, as well as the time required to run the script. A snippet showing how to launch an example follows the table.

Example              | Description                                                                      | Complexity
-------------------- | -------------------------------------------------------------------------------- | ----------
naive_model.py       | Learn, evaluate and print statistics about a naive factorized model.              |
spn_plot.py          | Instantiate, prune, marginalize and plot some SPNs.                               |
clt_plot.py          | Learn a Binary CLT and plot it.                                                   |
spn_moments.py       | Instantiate and compute moments statistics about the random variables.            |
sklearn_interface.py | Learn and evaluate a SPN using the scikit-learn interface.                        |
spn_custom_leaf.py   | Learn, evaluate and serialize a SPN with a user-defined leaf distribution.        |
clt_to_spn.py        | Learn a Binary CLT, convert it to a structured decomposable SPN and plot it.      |
spn_clt_em.py        | Instantiate a SPN with Binary CLTs, apply EM algorithm and sample some data.      |
clt_queries.py       | Learn a Binary CLT, plot it, run some queries and sample some data.               |
ratspn_mnist.py      | Train and evaluate a RAT-SPN on MNIST.                                            |
dgcspn_olivetti.py   | Train, evaluate and complete some images with DGC-SPN on Olivetti-Faces.          |
dgcspn_mnist.py      | Train and evaluate a DGC-SPN on MNIST.                                            |
nvp1d_moons.py       | Train and evaluate a 1D RealNVP on Moons dataset.                                 |
maf_cifar10.py       | Train and evaluate a MAF on CIFAR10.                                              |
nvp2d_mnist.py       | Train and evaluate a 2D RealNVP on MNIST.                                         |
nvp2d_cifar10.py     | Train and evaluate a 2D RealNVP on CIFAR10.                                       |
spn_latent_mnist.py  | Train and evaluate a SPN on MNIST using the features extracted by an autoencoder. |
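
Each example is a standalone script; after installing the library from source (so that the examples directory is available), it can be launched directly from the repository root, e.g.:

# Run one of the example scripts listed above
python examples/spn_plot.py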

Related Repositories

References

1. Peharz et al. On Theoretical Properties of Sum-Product Networks. AISTATS (2015).

2. Poon and Domingos. Sum-Product Networks: A New Deep Architecture. UAI (2011).

3. Molina, Vergari et al. Mixed Sum-Product Networks: A Deep Architecture for Hybrid Domains. AAAI (2018).

4. Molina, Vergari et al. SPFlow: An Easy and Extensible Library for Deep Probabilistic Learning using Sum-Product Networks. CoRR (2019).

5. Peharz et al. Probabilistic Deep Learning using Random Sum-Product Networks. UAI (2020).

6. Papamakarios et al. Masked Autoregressive Flow for Density Estimation. NeurIPS (2017).

7. Dinh et al. Density Estimation using RealNVP. ICLR (2017).

8. Dinh et al. NICE: Non-linear Independent Components Estimation. ICLR (2015).

9. Papamakarios, Nalisnick et al. Normalizing Flows for Probabilistic Modeling and Inference. JMLR (2021).

10. Van de Wolfshaar and Pronobis. Deep Generalized Convolutional Sum-Product Networks for Probabilistic Image Representations. PGM (2020).

11. Rahman et al. Cutset Networks: A Simple, Tractable, and Scalable Approach for Improving the Accuracy of Chow-Liu Trees. ECML-PKDD (2014).

12. Di Mauro, Gala et al. Random Probabilistic Circuits. UAI (2021).

13. Desana and Schnörr. Learning Arbitrary Sum-Product Network Leaves with Expectation-Maximization. CoRR (2016).

14. Peharz et al. Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits. ICML (2020).

Comments
  • Add implementations for cutset networks (CNets)

    Adding implementations for cutset networks (CNets), including:

    • A BinaryCNet class, which implements the classical CNet learning algorithm based on the information gain heuristic from decision tree literature.
    • The learning algorithm based on the Bayesian-Dirichlet equivalent uniform (BDeu) score.
    • The learning algorithm based on the Bayesian information criterion (BIC) score.
    • Functions to compute counts of RVs given some data sets.
    • Functions to estimate Bayesian posterior parameters.
    enhancement 
    opened by yangyang-tue 3
  • Feedback on the example running experience

    I ran all examples. They are a nice way of testing how the code runs on one's computer and show its capabilities. Below, I provide some points of feedback/suggestions that may improve the experience people have when running the examples. Some of that feedback may pertain or be relevant to other parts of the code base as well.

    1. Often, files are created as part of an example, such as the nice illustrative figures. It would be useful to alert the user of all files being created, so that they are aware of this even if they do not keep an eye on their working folder. Also, some files have unclear purpose (such as the pt files). Clarifying their use when alerting they are created would therefore be useful. (If they are temporary files, delete them at the end of running the example or use the tempfile module.)
    2. The console output provides useful information about the time it takes to run an example. If possible, generalize this to all examples that are not trivially short. (I think the first stage of spn_latent_mnist.py does not.)
    3. The console output numerical values often have a large number of digits displayed. There is little reason to believe that many are actually significant. Furthermore, it makes the output more difficult to read and digest. Ideally, output only significant digits, but if you do not know how many digits are significant, 4 digits in total is a good upper bound (like 57.63 %, 1234, 1.234e6).
    4. Many of the console output numbers have units (s, it/s, batch/s). The international standard is to always have a space between a number and its unit.
    5. Sometimes, JSON output is created either as console output or in files. Try to pretty-print it a bit, to make it easier to scan. If it is not meant to be read, perhaps consider omitting it.
    6. For many of the examples, you generate images, which is great. It would add value to have every example generate some image, even if it is not a sample. Namely, the examples can also provide users of the package inspiration of the type of images that they might generate.
    7. In one case, an image was generated in an interactive window (nvp1d_moons.py) and not in an image file. That is nice. Could it be generalized to all examples, with a fallback to image file generation?
    8. In two cases, the examples automatically downloaded some datasets. While convenient, some users might not expect this, may not like it, or may not have an internet connection. I think it would be more user-friendly to ask first or instruct first where to download the dataset. Furthermore, I saw that MNIST was downloaded from LeCun's original website, and he explicitly requests not to do that (“Please refrain from accessing these files from automated scripts with high frequency. Make copies!”); it would be polite to honor that request. In general, make sure to download from permanent repositories if possible instead of possibly non-permanent websites.
    9. The console output lists accuracy percentages. These generally are quite a bit closer to 100 % than to 0 %. Therefore, the initial digit (7, 8, 9) is often not very significant and therefore distracting. It is more user friendly to use error rate instead, so, e.g., [12.49, 8.66, 4.57] instead of [87.51, 91.34, 95.43].

    Obviously, these are mostly cosmetic suggestions, so I'd understand that you classify (parts of) this issue as ‘wontfix’.

    good first issue wontfix 
    opened by equaeghe 3
  • Unclear where experiments folder is

    I installed deeprob-kit using pip:

    $ pip install --user deeprob-kit
    

    Now, I want to try out the experiments to see if the code works on my system. However, I do not seem to have the experiments folder and therefore do not seem to be able to run them or put the datasets in place. Namely, what I have after installation is the following tree:

    ~/.local/lib/python3.9 $ tree -L 3
    .
    └── site-packages
        ├── deeprob
        │   ├── __init__.py
        │   ├── __pycache__
        │   ├── context.py
        │   ├── flows
        │   ├── spn
        │   ├── torch
        │   └── utils
        └── deeprob_kit-1.0.0.dist-info
            ├── INSTALLER
            ├── LICENSE
            ├── METADATA
            ├── RECORD
            ├── REQUESTED
            ├── WHEEL
            └── top_level.txt
    
    8 directories, 9 files
    

    My impression is that the bundle on PyPI only contains deeprob-kit itself, without any of the other materials. Perhaps putting the experiments folder (and others) under deeprob may provide a solution, but I guess you chose the current structure for a reason. Or perhaps I am looking in the wrong location.

    documentation question 
    opened by equaeghe 3
  • Issue#33

    1. Introduced rsample method to normalizing flows models for differentiable sampling.
    2. Backpropagating through autoregressive layers (MAFs) in forward mode is now possible (see the __backprop_apply_forward method).
    opened by loreloc 0
  • Fully differentiable MAFs

    The method "apply_forward" of the class "AutoregressiveLayer" is not differentiable due to in-place operations. This makes training in the flow's sampling direction impossible.
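
    For context, this is the generic failure mode (not code from the library): modifying a tensor in-place after it has been used invalidates the values autograd needs, while building the result out-of-place keeps the graph intact.

    import torch

    x = torch.randn(4, requires_grad=True)
    y = x.exp()          # backward of exp() reuses the values stored in y
    y[0] = 0.0           # in-place write invalidates them
    # y.sum().backward() # RuntimeError: one of the variables needed for gradient
    #                    # computation has been modified by an inplace operation

    # Out-of-place alternative: construct a new tensor instead of writing into y
    y_safe = torch.cat([torch.zeros(1), x[1:].exp()])
    y_safe.sum().backward()  # works; gradients flow to x[1:]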

    bug 
    opened by gengala 0
  • Refactor Unit Tests

    • Refactor the tests to use pytest instead of unittest
    • Add tests for shape checking
    • Introduce Continuous Integration (CI), e.g. a GitHub Action using Codecov on merges to main
    enhancement 
    opened by loreloc 0
  • Introduce multithreaded implementation of forward and backward evaluation of SPNs

    • The forward evaluation (used for EVI, MAR and MPE queries and sampling) can be parallelized by considering a layered topological ordering of the SPN graph. That is, every leaf node can be evaluated in parallel and, after that, every parent node can be computed in parallel as well, and so on.
    • The backward evaluation (used for MPE query and sampling) can be parallelized by considering a layered topological ordering of the SPN graph, as for forward evaluation.
    • Moreover, introduce unit tests ensuring the correctness of the implementation.

    A suitable multiprocessing library for this task is joblib, for which it is possible to specify 'threading' as a lightweight backend.
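
    A rough sketch of the proposed scheme (hypothetical node and layer structures, not the library's internals): group the nodes into topological layers and let joblib evaluate each layer's independent nodes in parallel with the lightweight 'threading' backend.

    from joblib import Parallel, delayed

    def evaluate_layered(layers, evaluate_node, n_jobs=4):
        """Forward-evaluate a graph given as a list of topological layers.

        layers: list of lists of nodes; layer 0 holds the leaves, and every node
                depends only on nodes from earlier layers.
        evaluate_node: computes one node's value from the already-cached inputs.
        """
        cache = {}
        with Parallel(n_jobs=n_jobs, backend="threading") as parallel:
            for layer in layers:
                # Nodes within a layer are independent, so they can run concurrently
                results = parallel(delayed(evaluate_node)(node, cache) for node in layer)
                cache.update(zip((node.id for node in layer), results))
        return cache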

    enhancement 
    opened by loreloc 0
  • FID score implementation

    This PR includes an implementation of the FID score based (by default) on Torchvision's InceptionV3 model pretrained on ImageNet; however, it is written to work with any model that extracts features. The core NumPy implementation is based on pytorch-fid v0.2.1.

    Closes #5.

    enhancement 
    opened by loreloc 0
  • Update README.md and fix implicit imports

    • Update the table of implemented models in README.md
    • Add NormalizingFlow abstract class import in flows/models/__init__.py
    • Add RatSpn abstract class import in spn/models/__init__.py
    • Fix the 'type' object is not subscriptable error when using Sphinx
    • Prepend MIT license information to every source file in deeprob/
    documentation 
    opened by loreloc 0
  • On flows, mean and standard deviation of default base distribution are not kept constant during training

    When training a normalizing flow with a standard Gaussian base distribution (i.e. using the default in_base=None), the mean and standard deviation are not kept constant during training. The expected behavior is that they remain constant.

    This is probably due to an incorrect initialization of the mean and standard deviation parameters: https://github.com/deeprob-org/deeprob-kit/blob/main/deeprob/flows/models/base.py#L52-L53.
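
    A common way to obtain the expected behavior in PyTorch (shown generically here, not as a patch to base.py) is to register the base distribution's mean and standard deviation as buffers rather than trainable parameters, so the optimizer never updates them:

    import torch
    import torch.nn as nn

    class FlowWithStandardGaussianBase(nn.Module):
        def __init__(self, num_features):
            super().__init__()
            # Buffers follow .to(device) like parameters do, but they are not
            # returned by parameters(), hence the optimizer never updates them.
            self.register_buffer("base_loc", torch.zeros(num_features))
            self.register_buffer("base_scale", torch.ones(num_features))

        def base_log_prob(self, z):
            base = torch.distributions.Normal(self.base_loc, self.base_scale)
            return base.log_prob(z).sum(dim=1)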

    bug 
    opened by loreloc 0
  • Make benchmark results more reliable

    • Disable automatic garbage collection in Python.
    • Execute garbage collection manually outside of code blocks that measure elapsed times.
    • Increase the number of repetitions from 20 to 50.
    • Add benchmarks rule to Makefile.
    bug 
    opened by loreloc 0
  • Introduce additional scripts regarding XPCs

    • Add an example about learning and using XPCs.
    • Add a script (similar to experiments/spn.py) to launch XPC experiments.
    • Add basic unit tests about XPC related modules.
    enhancement 
    opened by loreloc 0
  • Create TreeBN class

    Most of the code available in BinaryCLT actually works for any tree-shaped Bayesian Network. Therefore, it would be better to create a super-class called TreeBN and then make BinaryCLT a subclass of it.

    enhancement 
    opened by gengala 0