Python library for learning (structure and parameters) and inference (statistical and causal) in Bayesian Networks.

Overview

pgmpy is a Python library for working with Probabilistic Graphical Models.

Documentation and the list of supported algorithms are available at our official site: http://pgmpy.org/
Examples on using pgmpy: https://github.com/pgmpy/pgmpy/tree/dev/examples
Basic tutorial on Probabilistic Graphical models using pgmpy: https://github.com/pgmpy/pgmpy_notebook

Our mailing list is at https://groups.google.com/forum/#!forum/pgmpy.

Our community chat is on Gitter.
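
For a quick taste of the API, here is a minimal sketch that defines a tiny discrete network and queries it (the structure, names, and probabilities are invented for illustration; the imports follow the pgmpy 0.1.18-era API):

    from pgmpy.models import BayesianNetwork
    from pgmpy.factors.discrete import TabularCPD
    from pgmpy.inference import VariableElimination

    # A two-node network: Rain -> WetGrass.
    model = BayesianNetwork([('Rain', 'WetGrass')])

    cpd_rain = TabularCPD('Rain', 2, [[0.8], [0.2]])
    cpd_grass = TabularCPD(
        'WetGrass', 2,
        [[0.9, 0.1],   # P(WetGrass=0 | Rain=0), P(WetGrass=0 | Rain=1)
         [0.1, 0.9]],  # P(WetGrass=1 | Rain=0), P(WetGrass=1 | Rain=1)
        evidence=['Rain'], evidence_card=[2],
    )
    model.add_cpds(cpd_rain, cpd_grass)
    assert model.check_model()

    # Exact inference with Variable Elimination.
    infer = VariableElimination(model)
    print(infer.query(['Rain'], evidence={'WetGrass': 1}))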

Dependencies

pgmpy has the following required dependencies:

  • Python 3.6 or higher
  • networkx
  • scipy
  • numpy
  • pytorch

Some functionality additionally requires:

  • tqdm
  • pandas
  • pyparsing
  • statsmodels
  • joblib

Installation

pgmpy is available on both PyPI and Anaconda. To install through Anaconda, use:

$ conda install -c ankurankan pgmpy

To install through pip:

$ pip install -r requirements.txt  # only needed if you want to run the unit tests
$ pip install pgmpy

To install pgmpy from the source code:

$ git clone https://github.com/pgmpy/pgmpy 
$ cd pgmpy/
$ pip install -r requirements.txt
$ python setup.py install

If you face any problems during installation, let us know via issues, mail, or our Gitter channel.

Development

Code

Our latest codebase is available on the dev branch of the repository.

Contributing

Issues can be reported at our issues section.

Before opening a pull request, please have a look at our contributing guide.

The contributing guide contains some points that will make our lives easier when reviewing and merging your PR.

If you face any problems with your pull request, feel free to ask on the mailing list or Gitter.

If you want to implement any new features, please discuss them on the issue tracker or the mailing list before starting to work on them.

Testing

After installation, you can launch the test suite from the pgmpy source directory (you will need the pytest package installed):

$ pytest -v

To check the coverage of the existing code, use the following command:

$ pytest --cov-report html --cov=pgmpy

Documentation and usage

The documentation is hosted at: http://pgmpy.org/

We use Sphinx to build the documentation. To build the documentation on your local system, use:

$ cd /path/to/pgmpy/docs
$ make html

The generated docs will be in _build/html

Examples

We have a few example Jupyter notebooks here: https://github.com/pgmpy/pgmpy/tree/dev/examples

For more detailed Jupyter notebooks and basic tutorials on Graphical Models, check: https://github.com/pgmpy/pgmpy_notebook/

Citing

Please use the following BibTeX entry for citing pgmpy in your research:

@inproceedings{ankan2015pgmpy,
  title={pgmpy: Probabilistic graphical models using python},
  author={Ankan, Ankur and Panda, Abinash},
  booktitle={Proceedings of the 14th Python in Science Conference (SCIPY 2015)},
  year={2015},
  organization={Citeseer}
}

License

pgmpy is released under the MIT License. You can read our license here.

Issues
  • Adds base class for continuous node representation

This PR adds the basic continuous node representation feature. It comprises a base class for continuous node representation, along with various methods to discretize continuous variables into discrete factors. This covers the first three weeks of my GSoC project.
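
For a rough illustration of this kind of discretization (a generic sketch, not the PR's actual class or API), one can integrate a density over equal-width bins and renormalize:

    import numpy as np
    from scipy.stats import norm

    def discretize(dist, low, high, cardinality):
        # Integrate the pdf over equal-width bins via the cdf, then
        # renormalize the mass that falls inside [low, high].
        edges = np.linspace(low, high, cardinality + 1)
        probs = dist.cdf(edges[1:]) - dist.cdf(edges[:-1])
        return probs / probs.sum()

    # Example: a standard normal discretized into 6 states on [-3, 3].
    print(discretize(norm(0, 1), -3, 3, 6))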

    opened by yashu-seth 102
  • Hamiltonian Monte Carlo

This PR implements HMC with dual averaging. The implementation is still open for discussion; if you find anything ambiguous, please comment on the relevant line.
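
For orientation, a minimal leapfrog integrator of the kind HMC builds on can be sketched as follows (illustrative only, not this PR's code; grad_log_pdf stands in for the gradient of the target log density):

    import numpy as np

    def leapfrog(q, p, grad_log_pdf, step_size, n_steps):
        # Half-step for momentum, alternating full steps for position
        # and momentum, then a final half-step for momentum.
        q, p = q.copy(), p.copy()
        p = p + 0.5 * step_size * grad_log_pdf(q)
        for _ in range(n_steps - 1):
            q = q + step_size * p
            p = p + step_size * grad_log_pdf(q)
        q = q + step_size * p
        p = p + 0.5 * step_size * grad_log_pdf(q)
        return q, p

    # Example: dynamics under a standard normal target, where
    # grad log p(q) = -q.
    q, p = leapfrog(np.array([1.0]), np.array([0.5]), lambda q: -q, 0.1, 10)
    print(q, p)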

    opened by khalibartan 74
  • ContinuousFactor and Joint Gaussian Representation

    This PR deals with:

    • the creation of a base class ContinuousFactor for multivariate representations.
    • the creation of the class JointGaussianDistribution, a model to represent Gaussian random variables.
    opened by yashu-seth 55
  • Added BIF.py into readwrite

    Not all functions are implemented yet; only get_variable, get_states, and get_property are implemented.

    I am creating this pull request for easy review. I have tested these methods on munin2.bif and dog-problem.bif and they work fine. The functions are implemented in accordance with BIF v0.15 as given here. Excluding the time to import pgmpy and numpy, the average runtime was 0.06 s, including printing the complete variable_states for munin2.bif (1003 nodes). See issue #506.
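
For context, a hedged sketch of how such a reader might be used (the getter names follow this PR's description and may differ in the released pgmpy API; the local .bif file is an assumption):

    from pgmpy.readwrite import BIFReader

    # Assumes dog-problem.bif is available in the working directory.
    reader = BIFReader("dog-problem.bif")
    print(reader.get_variables())  # variable names declared in the file
    print(reader.get_states())     # mapping from each variable to its states
    print(reader.get_property())   # property strings attached to variables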

    opened by khalibartan 45
  • updates in check_model method

    @ankurankan I have removed cardinalities from the model attributes, but it seems they are used in other places as well. Should cardinalities be computed every time they are required? Is there a particular problem with keeping them as an attribute?

    opened by yashu-seth 32
  • Improving Variable Elimination (VE)

    Here we finish the implementation of VE by adding a few missing steps: computing good elimination orderings (with four heuristics: min neighbors, min fill, min weight, and weighted min fill) and safely removing irrelevant variables from the model (barren nodes and nodes independent given the evidence). These improvements required a few new methods and modifications to other members. In our preliminary experiments, queries that used to take up to 30 minutes with VE now take less than 2 minutes. Please help us test this new code for robustness, and suggest better ways of structuring the algorithms. Thanks.
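
As a rough illustration of one of the four heuristics (a generic sketch, not this PR's implementation), a greedy min-fill elimination ordering over an undirected graph can be written as:

    import networkx as nx

    def min_fill_order(graph):
        # Repeatedly eliminate the node whose elimination adds the
        # fewest fill-in edges among its neighbors.
        g = graph.copy()
        order = []

        def fill_in(n):
            nbrs = list(g.neighbors(n))
            return sum(1 for i, u in enumerate(nbrs) for v in nbrs[i + 1:]
                       if not g.has_edge(u, v))

        while g.nodes:
            node = min(g.nodes, key=fill_in)
            nbrs = list(g.neighbors(node))
            g.add_edges_from((u, v) for i, u in enumerate(nbrs)
                             for v in nbrs[i + 1:])
            g.remove_node(node)
            order.append(node)
        return order

    print(min_fill_order(nx.cycle_graph(5)))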

    opened by jhonatanoliveira 27
  • Replaced recursive call with `while` loop.

    • Replaces the recursive version of fun with an iterative one, which is much easier to read in my opinion. It should also be slightly faster, because recursive calls are expensive in Python (each one creates a new stack frame), and Python has a default recursion limit of 1000 (though hitting it is highly unlikely in our case). A generic sketch of the transformation follows this list.
    • I am not sure why the library is using Python 2 style super() calls, given that the dependencies include Python 3.3. In Python 3, thanks to the cell variable __class__, we can simply use super().method_name(...) (PEP 3135, New Super).
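
For illustration, the general shape of the transformation (a generic sketch; the actual function, fun, is in the PR diff):

    # Recursive form: follows parent links until reaching a fixed point.
    def find_root_recursive(node, parent):
        if parent[node] == node:
            return node
        return find_root_recursive(parent[node], parent)

    # Iterative form: same result, constant stack depth.
    def find_root_iterative(node, parent):
        while parent[node] != node:
            node = parent[node]
        return node

    parent = {0: 0, 1: 0, 2: 1}
    assert find_root_recursive(2, parent) == find_root_iterative(2, parent) == 0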
    opened by ashwch 25
  • Added model.predict_probability

    Added a new method that returns the probabilities of missing variables given prediction data (#794):

            B_0         B_1
        80  0.439178    0.560822
        81  0.581970    0.418030
        82  0.488275    0.511725
        83  0.581970    0.418030
        84  0.510794    0.489206
        85  0.439178    0.560822
        86  0.439178    0.560822
        87  0.417124    0.582876
        88  0.407978    0.592022
        89  0.429905    0.570095
        90  0.581970    0.418030
        91  0.407978    0.592022
        92  0.429905    0.570095
        93  0.429905    0.570095
        94  0.439178    0.560822
        95  0.407978    0.592022
        96  0.559904    0.440096
        97  0.417124    0.582876
        98  0.488275    0.511725
        99  0.407978    0.592022
    

    This PR also adds a new error test for predict that increases coverage.

    opened by raghavg7796 24
  • Strange Behavior HillClimbSearch

    Subject of the issue

    I want to reproduce the example here.

    Your environment

    • pgmpy version: 0.1.12
    • Python version: 3.6.9
    • Operating System: Ubuntu 18.04.5 LTS

    Steps to reproduce

    import pandas as pd
    import numpy as np
    from pgmpy.estimators import HillClimbSearch, BicScore
    data = pd.DataFrame(np.random.randint(0, 5, size=(5000, 9)), columns=list('ABCDEFGHI'))
    data['J'] = data['A'] * data['B']
    est = HillClimbSearch(data, scoring_method=BicScore(data))
    best_model = est.estimate()
    best_model.edges()
    

    Expected behaviour

    [('B', 'J'), ('A', 'J')]

    Actual behaviour

    [('A', 'B'), ('J', 'A'), ('J', 'B')]

    opened by ivanDonadello 23
  • Hamiltonian Monte Carlo & Hamiltonian Monte Carlo with dual averaging

    @ankurankan I have sent this PR to aid the discussion. I was experimenting with how to handle gradients (removing the gradient argument). I tested two approaches:

    • First, I tried handling the grad_log_pdf argument in place, depending on how the user passed it: if None was passed, I created a lambda function to call model.get_gradient_log_pdf; otherwise I created a lambda function to use the custom class. This turned out to be messy, since the parameter had to be handled in two places, once in the sampling class and once in the BaseSimulateHamiltonianDynamics class.
    • Second (what this PR implements): handle everything in model.get_gradient_log_pdf. This is less messy, because every call goes through model.get_gradient_log_pdf and the method handles the rest internally, so no changes are needed in different places.

    How do you suggest I should handle the gradients? You can look at the last commit to see the specific changes I made: https://github.com/pgmpy/pgmpy/pull/702/commits/748eb1fe13488bb8f0cf27a7064a67384ec3315e

    After the discussion, I'll close one of the PRs.

    opened by khalibartan 21
  • Efficient factor product

    A factor product following "Probabilistic Graphical Models" (Koller 09), page 359, Algorithm 10.A.1, "Efficient implementation of a factor product operation". Koller's algorithm was modified to fit the configuration used in pgmpy: in pgmpy the configurations of a Factor are ordered like (0,0,0), (0,0,1), (0,1,0), (1,0,0), and so on, instead of (0,0,0), (1,0,0), (0,1,0), (0,0,1) as expected by Koller's algorithm.
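
For comparison only, a factor product can also be sketched with NumPy broadcasting (a generic sketch over assumed (variables, values) inputs, not this PR's index-mapping implementation):

    import numpy as np

    def product(vars1, vals1, vars2, vals2):
        # Align both value arrays over the union of variables, then let
        # broadcasting perform the element-wise product.
        all_vars = list(vars1) + [v for v in vars2 if v not in vars1]

        def expand(vars_, vals):
            order = [vars_.index(v) for v in all_vars if v in vars_]
            shape = [vals.shape[vars_.index(v)] if v in vars_ else 1
                     for v in all_vars]
            return np.transpose(vals, order).reshape(shape)

        return all_vars, expand(list(vars1), vals1) * expand(list(vars2), vals2)

    vars_, vals = product(['x1', 'x2'], np.arange(4).reshape(2, 2),
                          ['x3', 'x4'], np.arange(4).reshape(2, 2))
    print(vars_, vals.shape)  # ['x1', 'x2', 'x3', 'x4'] (2, 2, 2, 2)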

    Koller's implementation is around 98% faster than the current one in pgmpy. The benchmark was done with a simple Python script:

    from pgmpy.factors import Factor
    from pgmpy.factors import factor_product
    from time import time
    
    phi = Factor(['x1', 'x2'], [2, 2], range(4))
    phi1 = Factor(['x3', 'x4'], [2, 2], range(4))
    t0 = time()
    prod = factor_product(phi, phi1)
    t1 = time()
    print(t1-t0)
    

    After running each implementation 6 times, here are the results:

    [Comparison chart of the benchmark timings]

    Unfortunately, we don't know how to use joblib, but we leave this as a TODO in the hope that parallel computation can improve this implementation even further.

    opened by jhonatanoliveira 21
  • BN of multi-sensor

    I have multi-sensor data with 8 channels, partitioned into 1000 samples of dimension 1024*8. I use the score-based method to learn the structure; I want to get 1000 BNs, each with 8 nodes. But the method doesn't seem to learn any structure: the adjacency matrices are all empty.
    My code follows:

        data_graph = pd.DataFrame(x.T, columns=['C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8'])
        hc = HillClimbSearch(data_graph)
        adj_tmp = hc.estimate(scoring_method="bicscore")
    

    x: 8*1024 original time-domain data; adj_tmp: a DAG with no edges

    opened by XMAHA 1
  • Smooth param

    Hi,

    Thank you for this amazing library. Do you think you could incorporate a smoothing parameter, in order to do a grid search over score and smoothing, like in the bnclassify package for R?

    Thanks! Pablo

    opened by PARODBE 1
  • How to access to the variable that gives the max probability

    From a situation like this:

    print(q['bronc'])
    +---------+--------------+
    | bronc   |   phi(bronc) |
    |---------+--------------|
    | bronc_0 |       0.6000 |
    | bronc_1 |       0.4000 |
    +---------+--------------+

    How do I access "bronc_0", which is the most likely label? I know that q.values[1] gives me 0.6.
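
For illustration, one way to recover the most likely state from the returned factor (a hedged sketch, assuming q is a pgmpy DiscreteFactor and that DiscreteFactor accepts a state_names mapping, as in recent pgmpy versions):

    import numpy as np
    from pgmpy.factors.discrete import DiscreteFactor

    # Stand-in for the factor printed above.
    q = DiscreteFactor(['bronc'], [2], [0.6, 0.4],
                       state_names={'bronc': ['bronc_0', 'bronc_1']})

    # Index of the largest probability, mapped back to its state label.
    best_idx = int(np.argmax(q.values))
    print(q.state_names['bronc'][best_idx])  # expected: 'bronc_0'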

    opened by DaniFra 1
  • sampling doesn't work when nodes are types other than 'str'

    When you try to draw a sample of any size in the Sampling module, an error is raised if the nodes of the network are of a type other than str. If you change this line:

    types = [(var_name, "int") for var_name in self.variables]

    to:

    types = [(str(var_name), "int") for var_name in self.variables]

    then it works regardless of the node (variable) type. Of course, the specific type must have the __str__ magic method defined.

    Your environment

    • pgmpy version 0.1.18
    • Python version 3.10
    • Operating System macOS Monterey

    Steps to reproduce

    Define a class for a Node:

    class Node:
        def __init__(self, name, card, probabilities):
            self.name = name
            self.card = card
            self.probabilities = probabilities
        def __str__(self):
            return self.name
    

    Create two nodes:

    a = Node("A", 2, [
        [.5], 
        [.5],
        ])
    b = Node("B", 2, [
        [.5, .5], 
        [.5, .5],
        ])
    

    Create a Bayesian Network and add cpds:

    model = BayesianNetwork([
        (a, b)
    ])
    cpds = [
        TabularCPD(
                node,
                node.card,
                node.probabilities,
                evidence=model.get_parents(node),
                evidence_card=[evidence.card for evidence in model.get_parents(node)],
            )
            for node in model.nodes()
    ]
    model.add_cpds(*cpds)
    

    Generate samples:

    gibbs_chain = GibbsSampling(model)
    gibbs_chain.sample(seed=1)
    

    Expected behaviour

       A  B
    0  1  1

    Actual behaviour

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    /Users/mz/VSCode/Funda/Exam2.ipynb Cell 23' in <module>
         26 model.add_cpds(*cpds)
         27 gibbs_chain = GibbsSampling(model)
    ---> 28 gibbs_chain.sample(seed=1)

    File /usr/local/lib/python3.10/site-packages/pgmpy/sampling/Sampling.py:528, in GibbsSampling.sample(self, start_state, size, seed, include_latents)
        525     np.random.seed(seed)
        527 types = [(var_name, "int") for var_name in self.variables]
    --> 528 sampled = np.zeros(size, dtype=types).view(np.recarray)
        529 sampled[0] = tuple(st for var, st in self.state)
        530 for i in tqdm(range(size - 1)):

    TypeError: First element of field tuple is neither a tuple nor str
    
    opened by mowhammadrezaa 1
  • KeyError in DAG.active_trail_nodes() when node is not str

    Subject of the issue

    When the graph has a node whose name is a list or tuple, invoking DAG's or BayesianNetwork's active_trail_nodes() or get_independencies() raises a KeyError. Similar to #1334.

    Your environment

    • pgmpy 0.1.18dev
    • Python 3.10.4
    • Operating System windows 11 x64

    Steps to reproduce

    from pgmpy.base import DAG
    student = DAG()
    student.add_nodes_from(['diff', 'intel', ('grades', 0)])
    student.add_edges_from([('diff', ('grades', 0)), ('intel', ('grades', 0))])
    print(student.active_trail_nodes('diff'))
    print(student.active_trail_nodes(('grades', 0)))
    

    Expected behaviour

    Should return the active trails of the node.

    Actual behaviour

    Traceback (most recent call last):
      File "~\AppData\Roaming\Python\Python310\site-packages\networkx\classes\digraph.py", line 835, in predecessors      
        return iter(self._pred[n])
    KeyError: 'grades'
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "A_PATH\crash.py", line 7, in <module>
        print(student.active_trail_nodes(('grades', 0)))
      File "B_PATH\pgmpy\base\DAG.py", line 743, in active_trail_nodes
        for parent in self.predecessors(node):
      File "~\AppData\Roaming\Python\Python310\site-packages\networkx\classes\digraph.py", line 837, in predecessors      
        raise NetworkXError(f"The node {n} is not in the digraph.") from err
    networkx.exception.NetworkXError: The node grades is not in the digraph.
    

    Possible solution

    This problem is caused by this line in active_trail_nodes():

    for start in variables if isinstance(variables, (list, tuple)) else [variables]:
    

    I'm not sure if other places with similar logic will also produce similar errors.

    opened by THUzxj 1
  • The result of each DAG (pattern) construction is different

    Subject of the issue

    I use est.skeleton_to_pdag and est.pdag_to_dag, and they generate a different result every time, so the final Bayesian network is also different. I want to know the reason for this. Can I fix the result with a random seed? The attached file contains the two result pictures I generated: result A and result B.

    Thanks!

    Your environment

    • pgmpy version 0.1.11
    • Python version 3.8
    • Operating System Windows

    Expected behaviour

    I wonder if this module can set random seeds to fix the result.

    opened by OccZa 2