A library for hidden semi-Markov models with explicit durations



hsmmlearn is a library for unsupervised learning of hidden semi-Markov models with explicit durations. It is a port of the hsmm package for R, and in fact wraps the same underlying C++ library.

hsmmlearn borrows its name and the design of its API from hmmlearn.
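In an HSMM with explicit durations, a hidden state, once entered, persists for a number of steps drawn from that state's own duration distribution. The chain then jumps to a different state through a transition matrix with a zero diagonal. The plain-Python sketch below samples from such a model to illustrate this generative process. The function and its argument names (startprob, tmat, durations, emit) are assumptions for illustration and are not hsmmlearn's API.

```python
import random


def sample_hsmm(startprob, tmat, durations, emit, n_samples, rng=None):
    """Sample (observations, states) from an explicit-duration HSMM (illustrative sketch).

    startprob[i]      -- probability of starting in state i
    tmat[i][j]        -- probability of jumping from state i to state j (tmat[i][i] == 0)
    durations[i][d-1] -- probability that a visit to state i lasts exactly d steps
    emit(i, rng)      -- draws one observation from state i's emission distribution
    """
    rng = rng or random.Random()
    n_states = len(startprob)
    states, observations = [], []
    state = rng.choices(range(n_states), weights=startprob)[0]
    while len(states) < n_samples:
        # Draw how long this visit lasts from the state's own duration distribution.
        d = rng.choices(range(1, len(durations[state]) + 1), weights=durations[state])[0]
        for _ in range(d):
            if len(states) == n_samples:
                break  # the final segment is cut off at n_samples
            states.append(state)
            observations.append(emit(state, rng))
        # Jump to a different state; the zero diagonal forbids self-transitions.
        state = rng.choices(range(n_states), weights=tmat[state])[0]
    return observations, states


# Example: three states with Gaussian emissions centred at 0, 5 and 10.
means = [0.0, 5.0, 10.0]
obs, states = sample_hsmm(
    startprob=[1 / 3] * 3,
    tmat=[[0.0, 0.5, 0.5], [0.3, 0.0, 0.7], [0.6, 0.4, 0.0]],
    durations=[[0.1, 0.1, 0.8, 0.0, 0.0], [0.1, 0.7, 0.2, 0.0, 0.0], [0.2] * 5],
    emit=lambda i, rng: rng.gauss(means[i], 1.0),
    n_samples=30,
    rng=random.Random(0),
)
```

Because self-transitions are excluded, each run of identical states in the output corresponds to exactly one visit. Its length is governed by the duration distribution rather than by a geometric self-loop, which is the defining difference from an ordinary HMM.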


hsmmlearn supports Python 2.7 and Python 3.4 and up. After cloning the repository, first install the requirements:

pip install -r requirements.txt

Then run either

python setup.py develop

or

python setup.py install

to install the package from source.

To run the unit tests, do

python -m unittest discover -v .

Building the documentation

The documentation for hsmmlearn is a work in progress. To build the docs, first install the doc requirements, then run Sphinx:

cd docs
pip install -r doc_requirements.txt
make html

If everything goes well, the documentation should be in docs/_build/html.

Some of the documentation comes as Jupyter notebooks, which can be found in the notebooks/ folder. Sphinx ingests these and produces rst documents from them. If you modify the notebooks, run make notebooks in the documentation folder and check in the output.


hsmmlearn incorporates a significant amount of code from R's hsmm package, and is therefore released under the GPL, version 3.0.

  • Model Fitting



    While reading your code, I wondered: is there a requirement on the emission distribution for model fitting? If I know the emission distribution is Gaussian, how does model fitting work? Will it fit that specific distribution or not? It seems to me that there's no such requirement.

    Just want to make sure.

    opened by 0x10cxR1 6
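The emission family appears to be chosen by the model class. The tracebacks below import both GaussianHSMM and MultinomialHSMM, each backed by its own emissions class. For Gaussian emissions, the usual EM re-estimation step sets each state's mean and scale to the posterior-weighted mean and standard deviation of the observations. The NumPy sketch below shows that textbook update. It assumes a posterior matrix gamma of shape (n_obs, n_states) produced by the E-step, and it illustrates the general technique rather than reproducing hsmmlearn's own code.

```python
import numpy as np


def gaussian_m_step(obs, gamma):
    """Re-estimate per-state Gaussian means and scales (standard EM update, sketch).

    obs   -- observations, shape (n_obs,)
    gamma -- posterior state probabilities from the E-step, shape (n_obs, n_states);
             gamma[t, i] = P(state at time t is i | all observations)
    """
    obs = np.asarray(obs, dtype=float)
    weights = gamma.sum(axis=0)  # expected number of time steps spent in each state
    means = (gamma * obs[:, None]).sum(axis=0) / weights
    variances = (gamma * (obs[:, None] - means) ** 2).sum(axis=0) / weights
    return means, np.sqrt(variances)
```

With hard (0/1) posteriors, this reduces to the per-state sample mean and the biased sample standard deviation.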
  • test failed



    After installing hsmmlearn, I ran "python3 -m unittest discover -v ."

    I got: ImportError: No module named 'hsmmlearn.base'.

    Any idea why? Thanks.

    opened by 0x10cxR1 4
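One plausible cause is an assumption here and is not confirmed in the thread. Running the tests from the source checkout makes Python import the local hsmmlearn/ directory, while setup.py install may have built the compiled extension (a file named like base.cpython-36m-x86_64-linux-gnu.so) only into site-packages. The usual setuptools remedy is to build the extension in place with python setup.py build_ext --inplace. The sketch below uses importlib.machinery.EXTENSION_SUFFIXES to list loadable builds of a module inside a package directory. The helper name find_compiled_extensions is hypothetical.

```python
import importlib.machinery
import os


def find_compiled_extensions(package_dir, module_name):
    """Return paths of compiled builds of module_name in package_dir that this
    interpreter could load.

    EXTENSION_SUFFIXES lists the filename suffixes the running interpreter accepts
    for C extensions, e.g. '.cpython-36m-x86_64-linux-gnu.so' or '.so' on Linux.
    """
    found = []
    for suffix in importlib.machinery.EXTENSION_SUFFIXES:
        path = os.path.join(package_dir, module_name + suffix)
        if os.path.isfile(path):
            found.append(path)
    return found
```

Run from the repository root, find_compiled_extensions("hsmmlearn", "base") returning an empty list would mean that no build of hsmmlearn.base matching this interpreter sits next to the sources.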
  • undefined symbol: __cxa_throw_bad_array_new_length


    I installed hsmmlearn as described and the installation went through. However, when I run the tests, I get

    ERROR: hsmmlearn.tests.test_base (unittest.loader._FailedTest)
    ImportError: Failed to import test module: hsmmlearn.tests.test_base
    Traceback (most recent call last):
      File "/home/ano/anaconda3/lib/python3.6/unittest/loader.py", line 428, in _find_test_path
        module = self._get_module_from_name(name)
      File "/home/ano/anaconda3/lib/python3.6/unittest/loader.py", line 369, in _get_module_from_name
      File "/home/ano/hsmmlearn/hsmmlearn/tests/test_base.py", line 7, in <module>
        from hsmmlearn.base import _viterbi_impl
    ImportError: /home/ano/hsmmlearn/hsmmlearn/base.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cxa_throw_bad_array_new_length

    The same ImportError occurs when I try to import classes such as GaussianHSMM:

    ImportError                               Traceback (most recent call last)
    <ipython-input-9-afdd3f146f00> in <module>()
    ----> 1 from hsmmlearn.hsmm import MultinomialHSMM
    ~/anaconda3/lib/python3.6/site-packages/hsmmlearn-0.1.0-py3.6-linux-x86_64.egg/hsmmlearn/hsmm.py in <module>()
          1 import numpy as np
    ----> 3 from .base import _viterbi_impl, _fb_impl
          4 from .emissions import GaussianEmissions, MultinomialEmissions
          5 from .properties import Durations, Emissions, TransitionMatrix
    ImportError: /home/ano/anaconda3/lib/python3.6/site-packages/hsmmlearn-0.1.0-py3.6-linux-x86_64.egg/hsmmlearn/base.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cxa_throw_bad_array_new_length

    I am running Python 3.6 and GCC 7.3.1-2 on Fedora 27. Any ideas what I have to change about my build configuration to make this work?

    Apart from that: Thanks for the great work!

    opened by lsha31 3
  • Could you tell me which paper you referred to to write this algorithm?


    Hello, could you tell me which paper you referred to when writing this algorithm? I have some questions about the EM algorithm.

    Given a sequence of observations, we use the EM algorithm to estimate the parameters of the HSMM. EM needs initial values for the parameters, which it then refines iteratively to obtain the optimal values.

    First step: with the following code, we set the true values of tmat, means, scales, durations and the other HSMM parameters, and use them to generate an observation sequence and the corresponding hidden state sequence.

    Second step: we use the EM algorithm to estimate the HSMM parameters. We first choose hypothetical initial values for new_tmat, new_means, new_scales and new_durations, then call hsmm.fit on the observation sequence to obtain the fitted parameters. Are the parameters obtained in this step close to the true values?

    opened by Wusir2018 1
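EM only converges to a local optimum, so its results depend on the initial values. Even when EM does recover the true parameters, the hidden states may come back in a different order (label switching). Before comparing new_means with the true means, it therefore helps to align the state labels. The minimal sketch below assumes one scalar parameter per state, such as the means, and picks the state permutation that minimises the squared error. align_states is a hypothetical helper, and brute force over permutations is only practical for a handful of states.

```python
from itertools import permutations


def align_states(true_params, est_params):
    """Return perm minimising sum_i (true_params[i] - est_params[perm[i]])**2.

    est_params[perm[i]] is the estimated state that best matches true state i.
    """
    n = len(true_params)
    best = min(
        permutations(range(n)),
        key=lambda perm: sum((true_params[i] - est_params[perm[i]]) ** 2 for i in range(n)),
    )
    return list(best)
```

For example, align_states([0.0, 5.0, 10.0], [9.8, 0.1, 5.2]) matches true state 0 to estimated state 1, and so on. Apply the same permutation to the estimated transition matrix (aligned[i][j] = est_tmat[perm[i]][perm[j]]) and to the rows of the estimated durations before comparing them with the true values.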
  • Multinomial model inference is faulty


    I used the following initial dummy parameters and handcrafted a very small toy dataset, then used the model to fit it. However, the posterior contains negative probability values.

    import numpy as np

    # assume three hidden states
    prior = np.array([[0.1,0.1,0.1,0.7],  # (remaining rows were cut off in the original report)
    durations = np.array([[0.1, 0.1, 0.8, 0.0, 0.0],
                          [0.1, 0.7, 0.2, 0.0, 0.0],
                          [0.2, 0.2, 0.2, 0.2, 0.2]])
    tmat = np.array([[0.0, 0.5, 0.5],
                     [0.3, 0.0, 0.7],
                     [0.6, 0.4, 0.0]])

    Toy data:

    D = 5
    samples = list()
    for i in range(3):  # simply repeat the pattern 3 times
        for s in range(4):  # 4 categories
            samples.append([s for _ in range(D)])  # each category always lasts D steps
    samples = np.array(samples).reshape(-1)


    new_hsmm = MultinomialHSMM(
        prior, durations, tmat,


    print(new_hsmm.emissions._probabilities)
    [[  5.00000000e-01  -3.08395409e-16  -3.08395358e-16   5.00000000e-01]
     [  2.50000000e-01   6.94826100e-17   5.00000000e-01   2.50000000e-01]
     [  7.22089011e-16   8.33333333e-01   1.66666667e-01   3.03954851e-14]]
    opened by kunrenzhilu 1
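The negative entries above are of order 1e-16, which is the scale of double-precision round-off (machine epsilon is about 2.2e-16). They are therefore most likely exact zeros perturbed by floating-point error rather than a modelling bug, although that is an inference from their magnitude alone. If downstream code needs strictly valid probabilities, a common workaround is to clip at zero and renormalise each row. The sketch below applies this to the matrix from the report. clean_probabilities is a hypothetical helper, not part of hsmmlearn.

```python
import numpy as np


def clean_probabilities(probs):
    """Clip tiny negative round-off to zero and renormalise each row to sum to 1."""
    probs = np.clip(np.asarray(probs, dtype=float), 0.0, None)
    return probs / probs.sum(axis=1, keepdims=True)


# Emission probabilities as printed in the report above.
reported = np.array([
    [5.00000000e-01, -3.08395409e-16, -3.08395358e-16, 5.00000000e-01],
    [2.50000000e-01, 6.94826100e-17, 5.00000000e-01, 2.50000000e-01],
    [7.22089011e-16, 8.33333333e-01, 1.66666667e-01, 3.03954851e-14],
])
cleaned = clean_probabilities(reported)
```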
  • Where to find API documentation reference?


    Hi, I am trying to find the API documentation reference. I googled it and found the following link:


    But the API reference is empty. Can you point me to the right link?



    opened by yanpanlau 1
  • changed: switch to poetry


    I wrote this to switch the build system to Poetry.

    The reason is that poetry add "git+https://github.com/jvkersch/hsmmlearn" fails: Poetry doesn't pick up the system-wide Cython.

    Poetry doesn`t use provided private repo when compiling dependencies from source · Issue #3744 · python-poetry/poetry

    2021-06-02T18:46:48 ✖ poetry add "git+https://github.com/jvkersch/hsmmlearn"
      Unable to determine package info for path: /tmp/pypoetry-git-hsmmlearn0klkxnrl
      Fallback egg_info generation failed.
      Command ['/tmp/tmpqwo321rq/.venv/bin/python', 'setup.py', 'egg_info'] errored with the following return code 1, and output:
      Traceback (most recent call last):
        File "/tmp/pypoetry-git-hsmmlearn0klkxnrl/setup.py", line 54, in <module>
        File "/tmp/pypoetry-git-hsmmlearn0klkxnrl/setup.py", line 23, in get_extension_modules
          from Cython.Build import cythonize
      ModuleNotFoundError: No module named 'Cython'
      at ~/.local/share/pypoetry/venv/lib/python3.9/site-packages/poetry/inspection/info.py:502 in _pep517_metadata
          498│                 try:
          499│                     venv.run("python", "setup.py", "egg_info")
          500│                     return cls.from_metadata(path)
          501│                 except EnvCommandError as fbe:
        → 502│                     raise PackageInfoError(
          503│                         path, "Fallback egg_info generation failed.", fbe
          504│                     )
          505│                 finally:
          506│                     os.chdir(cwd.as_posix())

    Even if the package were uploaded to PyPI, it is better to keep the information needed for the build within the project itself. So I let Poetry do the work and made global libraries as unnecessary as possible.

    This is still a rough draft: it retains data from the previous build system and doesn't yet properly carry over that system's information. If this project is still alive and accepting PRs, I will work on it more carefully.

    opened by ncaq 0
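The traceback shows setup.py importing Cython.Build.cythonize unconditionally inside get_extension_modules, so setup.py egg_info fails in any fresh build environment that lacks Cython. The standard remedy is to declare Cython as a build requirement (the PEP 518 build-system requires table in pyproject.toml). Another common pattern, sketched below, assumes that pre-generated C++ sources are shipped alongside the .pyx files and falls back to them when Cython is absent. The helper extension_sources and the path hsmmlearn/base.pyx are hypothetical.

```python
import importlib.util


def extension_sources(pyx_sources):
    """Choose the sources to build, depending on whether Cython is installed.

    Returns (sources, use_cython). With Cython available, the .pyx files are
    returned and the caller should pass the extensions through cythonize;
    otherwise the pre-generated .cpp files with the same base name are
    returned, and these must then be shipped in the source distribution.
    """
    if importlib.util.find_spec("Cython") is not None:
        return list(pyx_sources), True
    return [src[: -len(".pyx")] + ".cpp" for src in pyx_sources], False
```

In setup.py, the Cython import would then move behind the flag: only when use_cython is true would the script run from Cython.Build import cythonize and apply it to the extension modules.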
  • ModuleNotFoundError: No module named 'hsmmlearn.hsmm ?


    I have cloned the git repository and I think I have all the required files, but I can't import anything from hsmmlearn. Also, print(hsmmlearn.__file__) prints None. What should I do?

    opened by TamimAhmEd177 2
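Python 3.7 and later set __file__ to None for a namespace package, that is, a directory without an __init__.py found on sys.path. A likely cause here, though it is an assumption, is starting Python in the directory that contains the clone. The repository folder itself, which has no __init__.py at its top level, is then picked up as hsmmlearn instead of the real package one level down. The sketch below uses importlib.util.find_spec to tell the cases apart. package_kind is a hypothetical helper.

```python
import importlib.util


def package_kind(name):
    """Classify how `name` would be imported: 'missing', 'namespace', or 'regular'
    (a regular module or package)."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "missing"
    # Namespace packages have search locations but no concrete origin file
    # (origin is None on Python 3.7+, the string 'namespace' on older versions).
    if spec.submodule_search_locations is not None and spec.origin in (None, "namespace"):
        return "namespace"
    return "regular"
```

If package_kind("hsmmlearn") reports "namespace", starting Python from a different directory or installing the package should make the real hsmmlearn, including hsmmlearn.hsmm, importable.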
  • fitting model to high dimensional data


    Could you please give a hint on how to fit the model parameters to high-dimensional data? For example, suppose I have data of shape (n, p), where n is the number of observations and p is the dimension of each observation.

    opened by NimaMojtahedi 1
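The hidden-state machinery of an HSMM does not depend on the dimension of the observations; only the emission model does. One way to handle data of shape (n, p) is an emission model with a diagonal-covariance Gaussian per state. Its per-state likelihood matrix could then feed the same forward-backward and Viterbi recursions (_fb_impl, _viterbi_impl in the tracebacks above) that scalar emissions use. This is sketched here as an assumption, not as a feature hsmmlearn is known to provide, and the function name and argument layout are hypothetical.

```python
import numpy as np


def diag_gaussian_likelihood(obs, means, scales):
    """Per-state likelihood of each observation under a diagonal Gaussian.

    obs    -- shape (n, p): n observations of dimension p
    means  -- shape (n_states, p)
    scales -- shape (n_states, p), per-dimension standard deviations
    Returns an array of shape (n_states, n) with entry [i, t] = N(obs[t] | state i).
    """
    obs = np.asarray(obs, dtype=float)[None, :, :]        # (1, n, p)
    means = np.asarray(means, dtype=float)[:, None, :]    # (n_states, 1, p)
    scales = np.asarray(scales, dtype=float)[:, None, :]  # (n_states, 1, p)
    z = (obs - means) / scales
    # Independent dimensions: the density is the product of p univariate densities,
    # i.e. the sum of their logs.
    log_dens = -0.5 * z ** 2 - np.log(scales) - 0.5 * np.log(2 * np.pi)
    return np.exp(log_dens.sum(axis=-1))
```

Re-estimating means and scales in the M-step then works per dimension, exactly as in the scalar Gaussian case. For large p, it is safer to keep everything in log space.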
  • Fitting data to hsmmlearn model


    We are working on device failure detection using syslogs and are trying to use an HSMM for this. We have built one training set of error sequences and another of error-free sequences, and we want to build an HSMM for each using the hsmmlearn module. The example in the repository trains the model only on sample data drawn from a Gaussian distribution, but we need the HSMM to be trained on our error and error-free data. Could you help us understand how to fit an hsmmlearn model to our training data (error and error-free) and make predictions?

    opened by manasa001 1
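A standard way to use generative sequence models for a two-class problem like this is to fit one model per class (one on error sequences, one on error-free sequences). A new sequence is then labelled with the class whose model assigns it the higher log-likelihood, optionally adding log class priors. The sketch below shows only that decision rule. To keep it self-contained and runnable, each class model is a stand-in scorer (an i.i.d. Gaussian log-likelihood), which you would replace with the sequence log-likelihood of a fitted HSMM. All names here are hypothetical.

```python
import math


def gaussian_loglik(seq, mean, std):
    """Stand-in scorer: log-likelihood of seq under i.i.d. N(mean, std**2)."""
    return sum(
        -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)
        for x in seq
    )


def classify(seq, scorers, log_priors=None):
    """Return the label whose scorer (plus optional log prior) is highest for seq."""
    log_priors = log_priors or {}
    return max(scorers, key=lambda label: scorers[label](seq) + log_priors.get(label, 0.0))


# One scorer per class; with real data each would wrap a model fitted on that class.
scorers = {
    "error": lambda seq: gaussian_loglik(seq, mean=5.0, std=1.0),
    "error_free": lambda seq: gaussian_loglik(seq, mean=0.0, std=1.0),
}
```

Syslog events are categorical, so in practice a multinomial emission model would fit better than the Gaussian stand-in. Class priors matter when errors are rare.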
  • predict liklihood method for given observation ?


    I want to get the probability of a given observation sequence. I can see that the emissions module has a likelihood function, but it doesn't return a single value:

    new_hsmm.emissions.likelihood(np.array([1, 2, 3]))
    Output:
    array([[2.26e-01, 4.26e-02, 2.76e-03],
           [4.25e-04, 8.43e-03, 7.06e-02],
           [3.41e-32, 6.83e-26, 2.67e-20]])

    I want a single value. Is that possible?


    opened by amitkumarx86 5
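The emissions' likelihood method scores each observation under each state separately, apparently as an n_states-by-n_obs matrix, although with three states and three observations the orientation in the output above is ambiguous. The probability of the whole sequence instead requires summing over all state and duration paths, which is what the forward recursion for explicit-duration HSMMs does. Below is a minimal, unoptimised sketch. It assumes the last segment ends exactly at the final observation (no right-censoring) and works in plain probabilities, so long sequences would need rescaling or log space. The function is hypothetical and not part of hsmmlearn.

```python
def hsmm_sequence_likelihood(startprob, tmat, durations, lik):
    """P(x_1..x_T) for an explicit-duration HSMM via the forward recursion (sketch).

    startprob[j]      -- initial state probabilities
    tmat[i][j]        -- transition probabilities between segments (zero diagonal)
    durations[j][d-1] -- P(a visit to state j lasts exactly d steps)
    lik[j][t]         -- likelihood of observation t under state j (n_states x T)
    Assumes the final segment ends exactly at T (no right-censoring).
    """
    n_states, T = len(lik), len(lik[0])
    # alpha[t][j] = P(x_1..x_t, a segment in state j ends at time t); row 0 is unused.
    alpha = [[0.0] * n_states for _ in range(T + 1)]
    for t in range(1, T + 1):
        for j in range(n_states):
            total = 0.0
            emit = 1.0
            for d in range(1, min(len(durations[j]), t) + 1):
                emit *= lik[j][t - d]  # segment covers observations t-d .. t-1 (0-based)
                start = t - d
                if start == 0:
                    entry = startprob[j]  # the very first segment
                else:
                    # Some other segment ended at `start`, then the chain jumped to j.
                    entry = sum(alpha[start][i] * tmat[i][j] for i in range(n_states))
                total += entry * durations[j][d - 1] * emit
            alpha[t][j] = total
    return sum(alpha[T])
```

Taking the logarithm of the returned value gives the log-likelihood of the sequence as a single number.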
