Python library for multilinear algebra and tensor factorizations

Overview

scikit-tensor

scikit-tensor is a Python module for multilinear algebra and tensor factorizations. Currently, scikit-tensor supports basic tensor operations such as folding/unfolding, tensor-matrix and tensor-vector products as well as the following tensor factorizations:

  • Canonical / Parafac Decomposition
  • Tucker Decomposition
  • RESCAL
  • DEDICOM
  • INDSCAL

Moreover, all operations support dense and sparse tensors.
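To make the basic operations concrete, here is a minimal NumPy-only sketch of mode-n unfolding, folding, and the tensor-matrix product. The helper names `unfold`, `fold` and `mode_product` are illustrative and are not the scikit-tensor API; the column ordering of the unfolding follows one common convention and may differ from the one the library uses.

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` unfolding: rows index `mode`, columns index all other modes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of `unfold` for a tensor with the given full shape."""
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape([shape[mode]] + rest), 0, mode)

def mode_product(T, M, mode):
    """Tensor-matrix product: multiply every mode-`mode` fiber of T by M."""
    new_shape = T.shape[:mode] + (M.shape[0],) + T.shape[mode + 1:]
    return fold(M @ unfold(T, mode), mode, new_shape)

T = np.arange(24, dtype=float).reshape(2, 3, 4)
M = np.ones((5, 3))
Y = mode_product(T, M, 1)  # shape (2, 5, 4)
```

Under this convention the unfolding of the product equals `M @ unfold(T, mode)`, which is why the product can be written as an ordinary matrix multiplication followed by folding.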

Dependencies

The required dependencies to build the software are NumPy >= 1.3 and SciPy >= 0.7.

Usage

Example script to decompose sensory bread data (available from http://www.models.life.ku.dk/datasets) using CP-ALS:

import logging
from scipy.io.matlab import loadmat
from sktensor import dtensor, cp_als

# Set logging to DEBUG to see CP-ALS information
logging.basicConfig(level=logging.DEBUG)

# Load Matlab data and convert it to dense tensor format
mat = loadmat('../data/sensory-bread/brod.mat')
T = dtensor(mat['X'])

# Decompose tensor using CP-ALS
P, fit, itr, exectimes = cp_als(T, 3, init='random')
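For readers unfamiliar with the CP model, the following NumPy-only sketch shows what a rank-3 CP decomposition represents and how a relative fit value can be computed. The factor matrices, the helper names, and the definition of fit as one minus the relative residual norm are assumptions for illustration; they are not taken from the scikit-tensor source.

```python
import numpy as np

rng = np.random.default_rng(0)
rank = 3

# Hypothetical factor matrices for a 4 x 5 x 6 tensor, one matrix per mode
A = rng.standard_normal((4, rank))
B = rng.standard_normal((5, rank))
C = rng.standard_normal((6, rank))

def cp_reconstruct(A, B, C):
    """Sum of `rank` outer products a_r (outer) b_r (outer) c_r."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def relative_fit(X, X_hat):
    """1 - ||X - X_hat|| / ||X||, using the Frobenius norm (assumed definition)."""
    return 1.0 - np.linalg.norm(X - X_hat) / np.linalg.norm(X)

X = cp_reconstruct(A, B, C)                           # exactly rank 3 by construction
fit_exact = relative_fit(X, cp_reconstruct(A, B, C))  # perfect model
fit_zero = relative_fit(X, np.zeros_like(X))          # model that explains nothing
```

A fit close to 1 means the factors reproduce the data almost exactly, while a fit of 0 corresponds to a model that explains none of it.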

Install

This package uses distutils, which is the default way of installing Python modules. The use of virtual environments is recommended.

pip install scikit-tensor

To install in development mode

git clone git@github.com:mnick/scikit-tensor.git
pip install -e scikit-tensor/

Contributing & Development

scikit-tensor is still an extremely young project, and I'm happy for any contributions (patches, code, bugfixes, documentation, whatever) to get it to a stable and useful point. Feel free to get in touch with me via email (mnick AT mit DOT edu) or directly via GitHub.

Development is synchronized via git. To clone this repository, run

git clone git://github.com/mnick/scikit-tensor.git

Authors

Maximilian Nickel: Web, Email (mnick AT mit DOT edu), Twitter

License

scikit-tensor is licensed under the GPLv3.

Related Projects

  • Matlab Tensor Toolbox: A Matlab toolbox for tensor factorizations and tensor operations freely available for research and evaluation.
  • Matlab Tensorlab: A Matlab toolbox for tensor factorizations, complex optimization, and tensor optimization freely available for non-commercial academic research.

Comments
  • TypeError when result of sptensor.ttv(vectors) is a sptensor

    When applying ttv (tensor times vector) between a sparse tensor and a set of vectors, if the result is a sparse tensor, I get the following error:

    "TypeError: arange: scalar arguments expected instead of a tuple."

    The error can be reproduced with this test case:

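        import numpy as np
        from numpy import array, zeros, allclose
        from nose.tools import assert_equal, assert_true
        from sktensor import sptensor

        def test_ttv():
            # Six nonzeros of a 10 x 10 x 3 sparse tensor, one index array per mode
            subs = (
                array([0, 1, 0, 5, 7, 8]),
                array([2, 0, 4, 5, 3, 9]),
                array([0, 1, 2, 2, 1, 0])
            )
            vals = array([1, 1, 1, 1, 1, 1])
            S = sptensor(subs, vals, shape=[10, 10, 3])

            # Contract modes 0 and 1 with zero vectors; a mode-2 vector remains
            sttv = S.ttv((zeros(10), zeros(10)), modes=[0, 1])
            assert_equal(type(sttv), sptensor)
            assert_true((allclose(zeros(3), sttv.vals)))
            assert_true((allclose(np.arange(3), sttv.subs)))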
    
    opened by panisson 4
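As a reference for what this operation should produce, here is a NumPy-only sketch of contracting modes 0 and 1 of a 3-way tensor stored in coordinate form. The function name `coo_ttv_to_last_mode` is hypothetical and the code is not the library's `ttv` implementation; it returns a dense vector for simplicity, whereas the test above expects a sparse result.

```python
import numpy as np

def coo_ttv_to_last_mode(subs, vals, shape, v0, v1):
    """Contract modes 0 and 1 of a 3-way COO tensor with vectors v0 and v1."""
    out = np.zeros(shape[2])
    # Each nonzero at (i, j, k) contributes vals * v0[i] * v1[j] to entry k.
    # np.add.at accumulates correctly when the same k occurs more than once.
    np.add.at(out, subs[2], vals * v0[subs[0]] * v1[subs[1]])
    return out

subs = (
    np.array([0, 1, 0, 5, 7, 8]),
    np.array([2, 0, 4, 5, 3, 9]),
    np.array([0, 1, 2, 2, 1, 0]),
)
vals = np.ones(6)
shape = (10, 10, 3)

ones_result = coo_ttv_to_last_mode(subs, vals, shape, np.ones(10), np.ones(10))
zeros_result = coo_ttv_to_last_mode(subs, vals, shape, np.zeros(10), np.zeros(10))
```

With all-ones vectors each mode-2 index collects two nonzeros; with the zero vectors from the test case every entry of the result is zero.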
  • First push for docs. Example, API docs and installation

    This is a first effort to document this package. The docs can be created using

    python setup.py build_sphinx
    

    The result ends up in build/sphinx/html/index.html.

    All inline LaTeX will be rendered using pngmath (custom macros like \tens, \kr and \unfold are defined in docs/conf.py).

    setup.py defined some redundant build_sphinx command that was trying to build scipy first. I removed this as we do not need that and can directly build docs.

    I recommend creating the package on https://readthedocs.org/ so you will have automatically generated online docs.

    opened by nils-werner 1
  • Use setuptools install_requires instead of pkg_resources.require

    This PR enables scikit-tensor to install without user interaction. Instead of failing during install

    pip install scikit-tensor
    

    will now automatically install numpy, scipy and nose, if not already installed.

    opened by nils-werner 1
  • Error with Python 3.4 in cp_als: unsupported operand types for +:'range' and 'range'

    Reproduction script is the example in the README

    Python 3.4.1 |Anaconda 2.0.0 (64-bit)| (default, May 19 2014, 13:02:41) [GCC 4.1.2 20080704 (Red Hat 4.1.2-54)] on linux

    git commit ef063d0e2f7f3160caf05b2b988ef3479ea2e1c9

    Works fine in 2.7 (I switched to a 2.7 environment using Anaconda)

    Traceback (most recent call last):
      File "tmp.py", line 13, in <module>
        P, fit, itr, exectimes = cp_als(T, 3, init='random')
      File "/home/kkastner/src/scikit-tensor/sktensor/cp.py", line 143, in als
        Unew = X.uttkrp(U, n)
      File "/home/kkastner/src/scikit-tensor/sktensor/dtensor.py", line 162, in uttkrp
        order = range(n) + range(n + 1, self.ndim)
    TypeError: unsupported operand type(s) for +: 'range' and 'range'
    

    I will take a look and see if I can submit a PR

    opened by kastnerkyle 1
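The traceback above comes from concatenating two `range` objects, which are lists in Python 2 but lazy sequences in Python 3. One possible fix, shown here as a standalone sketch and not necessarily the change that was merged, is to convert both to lists first. The helper name `modes_without` is hypothetical.

```python
def modes_without(n, ndim):
    """Return all mode indices 0..ndim-1 except n, on Python 2 and 3 alike."""
    return list(range(n)) + list(range(n + 1, ndim))

order = modes_without(1, 4)  # modes of a 4-way tensor, skipping mode 1
```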
  • Sptensor fixes

    Some fixes in sptensor:

    • accum correction for cases with more than one dimension: accum function was not behaving as expected when the number of dimensions in subs was more than one.
    • Corrections in sptensor and test case to avoid test_unfold failure: with random initialization of test data, test_unfold was also failing randomly (only when cases with duplicated indexes are created). I added a seed initialization for a case where the test was failing and corrected the problems.
    • Optimized sptensor creation in ttv_compute when result is vector-like. Also, in this case, the sptensor should be initialized only with nonzero values (also corrected).
    opened by panisson 1
  • TypeError in sptensor.uttkrp when ttv returns a sptensor

    This error occurs when calling uttkrp and when ttv (inside uttkrp) returns a sptensor. It can be reproduced with this test case:

    def test_uttkrp():
        subs, vals, shape = mysetup()
        S = sptensor(subs, vals, shape)
        U = []
        for shp in (25, 11, 18, 7, 2):
            U.append(np.zeros((shp, 5)))
        SU = S.uttkrp(U, mode=0)
        assert_equal(SU.shape, (25, 5))
    

    The error is "TypeError: float() argument must be a string or a number", and it happens in sptensor.py, line 191, in uttkrp:

        V[:, r] = self.ttv(Z, mode, without=True)
    

    It seems that, when the result of ttv is a sptensor, it cannot be assigned to V using this operation.

    bug 
    opened by panisson 1
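For orientation, the quantity that `uttkrp` is meant to compute (the unfolded tensor times a Khatri-Rao product of factor matrices) can be sketched for a 3-way coordinate-format tensor in plain NumPy. The function name `coo_mttkrp_mode0` is hypothetical, and this is an illustration under the assumption of one index array per mode, not the library's code.

```python
import numpy as np

def coo_mttkrp_mode0(subs, vals, shape, B, C):
    """Mode-0 MTTKRP of a 3-way COO tensor with factor matrices B and C."""
    V = np.zeros((shape[0], B.shape[1]))
    # Nonzero x at (i, j, k) adds x * B[j, :] * C[k, :] to row i of V.
    np.add.at(V, subs[0], vals[:, None] * B[subs[1]] * C[subs[2]])
    return V

rng = np.random.default_rng(0)
shape = (4, 3, 2)
subs = (np.array([0, 1, 3, 3]), np.array([2, 0, 1, 1]), np.array([1, 0, 0, 1]))
vals = np.array([1.0, 2.0, 3.0, 4.0])
B = rng.standard_normal((3, 5))
C = rng.standard_normal((2, 5))

V = coo_mttkrp_mode0(subs, vals, shape, B, C)

# Dense reference: build the full tensor and contract modes 1 and 2
X = np.zeros(shape)
X[subs] = vals
V_dense = np.einsum('ijk,jr,kr->ir', X, B, C)
```

Because the result is always a dense matrix, computing all columns at once sidesteps the question of whether an intermediate per-column product is returned as a sparse or dense object.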
  • Pytest

    This PR replaces Nosetests with Pytest. Nosetests has been dead for a long time and Nose2 is nowhere near Pytest in terms of development activity, number of users and maturity.

    What this PR does:

    • Replace Nose with Pytest
    • Remove "fixtures" that are messing with sys.modules. This was a huge antipattern: you never want tests to mess with global variables whose state you cannot track. Pytest fixtures do a similar thing to what you wanted them to do, but in a nice way, by explicitly injecting them into tests.
    • Add a .travis.yml file so integration testing can be used (you only need to enable it).

    What this PR doesn't do:

    • Fix problems in sktensor/tests/test_tucker_hooi.py: You are importing sktensor.rotation.orthomax which doesn't exist and, judging from Git logs, never existed before. I don't think these tests ever passed in the past.
    opened by nils-werner 0
  • Add a Gitter chat badge to README.md

    mnick/scikit-tensor now has a Chat Room on Gitter

    @mnick has just created a chat room. You can visit it here: https://gitter.im/mnick/scikit-tensor.

    This pull-request adds this badge to your README.md:

    Gitter

    If my aim is a little off, please let me know.

    Happy chatting.

    opened by gitter-badger 0
  • In sparse tensor initialization, fix shape inference from the index arrays

    This pull request should fix the tensor shape inference when creating a sparse tensor. When inferring the shape of the tensor from the maximum values of the subs arrays, the shape values must be incremented by 1 in order to fit a 0-based index array.

    opened by panisson 0
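The off-by-one described above can be illustrated with a small standalone sketch. The helper name `infer_shape` is hypothetical; it assumes one 0-based index sequence per mode.

```python
def infer_shape(subs):
    """Infer the tensor shape from 0-based index sequences, one per mode."""
    # The largest index in a mode is size - 1, so add 1 to get the size.
    return tuple(int(max(idx)) + 1 for idx in subs)

subs = ([0, 1, 0, 5, 7, 8], [2, 0, 4, 5, 3, 9], [0, 1, 2, 2, 1, 0])
shape = infer_shape(subs)
```

Without the increment, the largest index in each mode would fall outside the inferred shape.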
  • Does not correspond to the formula in the paper

    https://github.com/mnick/scikit-tensor/blob/fe517e9661a08164b8d30d2dddf7c96aeeabcf36/sktensor/tucker.py#L103

    This calculation of U_tilde does not correspond to the formula in the paper!!

    opened by rafikg 0
  • Sparse tensor with Tucker

    I need to run the Tucker decomposition on a sparse tensor, but I get this error:

    TypeError: 'numpy.int32' object is not iterable

    when I run this code:

    import numpy as np
    from sktensor import tucker_hooi
    from sktensor import sptensor

    S = sptensor(([0,1,2], [3,2,0], [2,2,2]), [1,1,1], shape=(10, 20, 5), dtype=np.float)

    tucker_hooi(X=S, rank=[5, 5, 4], init='nvecs')

    Can you help me, please? Thanks.

    opened by simo1000k 0
  • Batch_size problem in using ttm in a custom layer in keras

    I have a custom layer in Keras whose input is a 2D tensor, but the layer receives a batch of those, so the shape it gets is (?, 61, 80). The '?' stands for the batch size. When I use the ttm function of sktensor, I get an error. Please help.

    The custom layer is

    def call(self, inputs):
        num, n, m = inputs.shape
        print(inputs.shape)
        (k1, k2) = self.output_dim
        input_tensor = inputs

        print(input_tensor)

        input_tensor = dtensor(input_tensor)
        kernel1 = np.array(self.W1)
        kernel2 = np.array(self.W2)

        feed_forward_product = input_tensor.ttm([kernel1, kernel2], mode=0, transp=False, without=True)

        feed_forward_product = np.array(feed_forward_product)
        result = K.tanh(feed_forward_product)

        return result

    Variables W1 and W2 are declared beforehand. The error I am getting is:

    x_train shape: (269, 61, 80)
    269 train samples
    70 test samples
    <tf.Variable 'neural_tensor_layer_2/W1:0' shape=(40, 61) dtype=float32_ref>
    <tf.Variable 'neural_tensor_layer_2/W2:0' shape=(40, 80) dtype=float32_ref>
    (?, 61, 80)
    Tensor("sequential_2_input:0", shape=(?, 61, 80), dtype=float32)
    Traceback (most recent call last):
      File "", line 1, in
        runfile('/home/hanumant/Documents/December_experiments/python_codes/TFNN_v2.py', wdir='/home/hanumant/Documents/December_experiments/python_codes')
      File "/home/hanumant/.conda/envs/myenv/lib/python3.5/site-packages/spyder_kernels/customize/spydercustomize.py", line 668, in runfile
        execfile(filename, namespace)
      File "/home/hanumant/.conda/envs/myenv/lib/python3.5/site-packages/spyder_kernels/customize/spydercustomize.py", line 108, in execfile
        exec(compile(f.read(), filename, 'exec'), namespace)
      File "/home/hanumant/Documents/December_experiments/python_codes/TFNN_v2.py", line 79, in
        validation_data=(np.array(x_test),np.array(y_test)))
      File "/home/hanumant/.conda/envs/myenv/lib/python3.5/site-packages/keras/engine/training.py", line 952, in fit
        batch_size=batch_size)
      File "/home/hanumant/.conda/envs/myenv/lib/python3.5/site-packages/keras/engine/training.py", line 677, in _standardize_user_data
        self._set_inputs(x)
      File "/home/hanumant/.conda/envs/myenv/lib/python3.5/site-packages/keras/engine/training.py", line 589, in _set_inputs
        self.build(input_shape=(None,) + inputs.shape[1:])
      File "/home/hanumant/.conda/envs/myenv/lib/python3.5/site-packages/keras/engine/sequential.py", line 221, in build
        x = layer(x)
      File "/home/hanumant/.conda/envs/myenv/lib/python3.5/site-packages/keras/engine/base_layer.py", line 457, in call
        output = self.call(inputs, **kwargs)
      File "/home/hanumant/Documents/December_experiments/python_codes/Neural_Tensor_layer.py", line 138, in call
        feed_forward_product=input_tensor.ttm([kernel1, kernel2],mode=0, transp=False, without=True)
      File "/home/hanumant/Downloads/scikit-tensor-master/scikit-tensor/sktensor/core.py", line 99, in ttm
        dims, vidx = check_multiplication_dims(mode, self.ndim, len(V), vidx=True, without=without)
      File "/home/hanumant/Downloads/scikit-tensor-master/scikit-tensor/sktensor/core.py", line 256, in check_multiplication_dims
        raise ValueError('More multiplicants than dimensions')
    ValueError: More multiplicants than dimensions

    please help

    opened by sandeeppandey456 0
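The computation this layer appears to intend, multiplying every sample in the batch by W1 along its first axis and by W2 along its second, can be sketched on concrete NumPy arrays as follows. The shapes are taken from the report above and the batch size of 7 is arbitrary. This is only an illustration of the arithmetic: inside a Keras layer the inputs are symbolic tensors with an unknown batch dimension, so NumPy-based code such as this (or sktensor) cannot be applied to them directly, and an equivalent backend operation would be needed.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((7, 61, 80))   # batch of 7 samples, each 61 x 80
W1 = rng.standard_normal((40, 61))     # acts on the axis of length 61
W2 = rng.standard_normal((40, 80))     # acts on the axis of length 80

# For every sample b: out[b] = tanh(W1 @ X[b] @ W2.T), shape (40, 40)
out = np.tanh(np.einsum('kn,bnm,lm->bkl', W1, X, W2))
```

The batch axis is carried through untouched, which corresponds to multiplying along every mode except mode 0.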
  • sktensor.tucker.hooi doesn't return fit, itr and exectimes

    These three metrics are returned by the CP-ALS function, but not by HOOI.

    I believe it is a simple addition, since fit, itr and exectimes are already calculated in the hooi() function.

    Maybe sktensor/tests/test_tucker_hooi.py also needs to be changed.

    opened by ddfabbro 0
  • Installation error: SyntaxError: Missing parentheses in call to 'print'. Did you mean print(mod.__version__)?

    I'm getting a fatal installation error when installing scikit-tensor on my Mac:

    ~$ pip3 --version
    pip 10.0.1 from /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip (python 3.6)
    ~$ pip3 install scikit-tensor
    Collecting scikit-tensor
      Using cached https://files.pythonhosted.org/packages/e9/5e/2ce76cc8f9da0517085e17cd70210ed996aeb8f972e7080d0bc89d82bbd9/scikit-tensor-0.1.tar.gz
        Complete output from command python setup.py egg_info:
        Traceback (most recent call last):
          File "", line 1, in
          File "/private/var/folders/4y/cwpdv5dd37q0djhsgnr558m0000b4c/T/pip-install-4m41b2gw/scikit-tensor/setup.py", line 79
            print mod.__version__
                                ^
        SyntaxError: Missing parentheses in call to 'print'. Did you mean print(mod.__version__)?

        ----------------------------------------
    Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/4y/cwpdv5dd37q0djhsgnr558m0000b4c/T/pip-install-4m41b2gw/scikit-tensor/
    ~$

    opened by BhuvaneshBhatt 1
  • Sptensor wrong check.

    In line 71 of sptensor.py, I believe the condition to check should be len(subs) == len(vals). The reason is that we want the number of provided subscripts to equal the number of values. Currently, the code checks that the dimension of the tensor equals the number of values, which probably isn't right. It only works in the provided example because the dimension, the number of subscripts, and the number of values all happen to be three. Correct me if I am wrong.

    opened by gaurush-hiranandani 0
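Assuming subscripts are passed as a tuple with one index sequence per mode, as in the test cases earlier on this page, the intended consistency check compares the length of each index sequence with the number of values. The following standalone sketch illustrates that idea; the function name `check_subs_vals` and the error messages are hypothetical and not taken from sptensor.py.

```python
def check_subs_vals(subs, vals):
    """Validate COO input: one index sequence per mode, each as long as vals."""
    if not isinstance(subs, tuple):
        raise ValueError('subs must be a tuple of index sequences')
    for mode, idx in enumerate(subs):
        if len(idx) != len(vals):
            raise ValueError(
                'mode %d has %d subscripts but there are %d values'
                % (mode, len(idx), len(vals))
            )
    # Number of modes and number of nonzeros are independent quantities
    return len(subs), len(vals)

# A 3-way tensor with four nonzeros: 3 modes, 4 values
ndim, nnz = check_subs_vals(([0, 1, 2, 3], [3, 2, 0, 1], [2, 2, 2, 0]), [1, 1, 1, 1])
```

Using an example where the number of modes differs from the number of nonzeros avoids the coincidence described in the report.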