A Python Package for Convex Regression and Frontier Estimation

Overview

pyStoNED is a Python package that provides functions for estimating multivariate convex regression, convex quantile regression, convex expectile regression, isotonic regression, stochastic nonparametric envelopment of data, and related methods. It also facilitates efficiency measurement using the conventional Data Envelopment Analysis (DEA) and Free Disposal Hull (FDH) approaches. The pyStoNED package allows practitioners to estimate these models in an open-access environment under the GPL-3.0 license.

Installation

The pyStoNED package is now available on PyPI, and the latest development version can be installed from the GitHub repository pyStoNED. Please feel free to download and test it. We welcome bug reports and feedback.

PyPI

pip install pystoned

GitHub

pip install -U git+https://github.com/ds2010/pyStoNED

Authors

  • Sheng Dai, Ph.D. candidate, Aalto University School of Business.
  • Yu-Hsueh Fang, Computer Engineer, Institute of Manufacturing Information and Systems, National Cheng Kung University.
  • Chia-Yen Lee, Professor, College of Management, National Taiwan University.
  • Timo Kuosmanen, Professor, Aalto University School of Business.

Citation

If you use pyStoNED in published work, please cite the following paper and other related works. We appreciate it.

Dai S, Fang YH, Lee CY, Kuosmanen T. (2021). pyStoNED: A Python Package for Convex Regression and Frontier Estimation. arXiv preprint arXiv:2109.12962.
Comments
  • StoNED and Plot2d/3d: can not plot the StoNED frontier

    Hi @JulianATA, I found that we cannot plot the StoNED frontier using the plot functions. It should work. Please check the following error.

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-14-cfd06442dd17> in <module>
          2 rd = StoNED.StoNED(model)
          3 model_new = rd.get_frontier(RED_MOM)
    ----> 4 plot2d(model_new, x_select=0, label_name="StoNED frontier", fig_name="stoned_2d")
    
    C:\Anaconda3\lib\site-packages\pystoned\plot.py in plot2d(model, x_select, label_name, fig_name)
         15         fig_name (String, optional): The name of figure to save. Defaults to None.
         16     """
    ---> 17     x = np.array(model.x).T[x_select]
         18     y = np.array(model.y).T
         19     if y.ndim != 1:
    
    AttributeError: 'numpy.ndarray' object has no attribute 'x'
    
    

    I have tried to add the following line to StoNED. https://github.com/ds2010/pyStoNED/blob/b673006ff8fe7152125f42702173f9ce49d1d83e/pystoned/StoNED.py#L17

    But it still does not work. Could you please help to fix it? Many thanks in advance!

    Sheng
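The traceback shows plot2d reading `model.x` and `model.y`, while the object passed in is a bare NumPy array. The sketch below reproduces that failure outside pystoned and shows one possible workaround. `extract_xy` is a hypothetical helper that copies only the two lines visible in the traceback, the `SimpleNamespace` wrapper is an assumption rather than the fix adopted in the package, and all values are made up.

```python
from types import SimpleNamespace

import numpy as np


def extract_xy(model, x_select=0):
    """Mimic the two plot2d lines shown in the traceback (hypothetical helper)."""
    x = np.array(model.x).T[x_select]
    y = np.array(model.y).T
    return x, y


# Stand-in for the bare array passed to plot2d in the failing call.
frontier = np.array([1.0, 2.0, 3.0])

try:
    extract_xy(frontier)
    message = ""
except AttributeError as err:
    # An ndarray has no .x attribute, which is the error reported above.
    message = str(err)

# Hypothetical workaround: bundle inputs and frontier values under .x and .y.
wrapped = SimpleNamespace(x=[[1.0], [2.0], [3.0]], y=frontier)
x, y = extract_xy(wrapped)

print(message)
print(x, y)
```

Any object exposing `x` and `y` attributes would satisfy these two lines; whether the rest of plot2d needs further attributes is not visible in the traceback.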

    opened by ds2010 8
  • StoNED: can not get unconditional expected inefficiency

    Hi @JulianATA, it seems that there is a bug in StoNED.py when calculating the unconditional expected inefficiency. Please check the following error and fix it. Thanks in advance!

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-4-8d44572d25fb> in <module>
          1 # retrive the unconditional expected inefficiency \mu
          2 rd = StoNED.StoNED(model)
    ----> 3 print(model.get_unconditional_expected_inefficiency('KDE'))
    
    AttributeError: 'CNLS' object has no attribute 'get_unconditional_expected_inefficiency'
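In the traceback, the method is called on `model` (the CNLS object) even though `rd` was created on the line before. Whether the method was defined on `rd` in that version is not confirmed here. The sketch below uses hypothetical stand-in classes (`FittedModel`, `Decomposer`) only to illustrate why a method that lives on a wrapper cannot be reached through the wrapped object.

```python
class FittedModel:
    """Hypothetical stand-in for an estimated regression model."""

    def __init__(self, residuals):
        self.residuals = residuals


class Decomposer:
    """Hypothetical stand-in for a wrapper that post-processes a fitted model."""

    def __init__(self, model):
        self.model = model

    def mean_residual(self):
        # The post-processing method is defined on the wrapper only.
        return sum(self.model.residuals) / len(self.model.residuals)


model = FittedModel([0.5, -0.5, 1.5])
rd = Decomposer(model)

# The wrapped object does not gain the wrapper's methods.
print(hasattr(model, "mean_residual"))
# Calling through the wrapper works.
print(rd.mean_residual())
```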
    
    opened by ds2010 8
  • Refactor basic DEA and FDH by class

    • This PR provides the refactored basic DEA and FDH classes. You can test them with the following code:

    DEA

    import pandas as pd
    import numpy as np
    
    # import the package pystoned
    from pystoned import DEA
    
    # import Finnish electricity distribution firms data
    url = 'https://raw.githubusercontent.com/ds2010/pyStoNED-Tutorials/master/Data/firms.csv'
    df = pd.read_csv(url, on_bad_lines='skip')  # pandas >= 1.3; error_bad_lines was removed in pandas 2.0
    
    # output
    y = df['Energy']
    
    # inputs
    x1 = df['OPEX']
    x1 = np.asmatrix(x1).T
    x2 = df['CAPEX']
    x2 = np.asmatrix(x2).T
    x = np.concatenate((x1, x2), axis=1)
    
    model = DEA.DEA(y,x,"oo","vrs")
    model.optimize(False)
    model.display_theta()
    

    FDH

    import pandas as pd
    import numpy as np
    
    # import the package pystoned
    from pystoned import FDH
    
    # import Finnish electricity distribution firms data
    url = 'https://raw.githubusercontent.com/ds2010/pyStoNED-Tutorials/master/Data/firms.csv'
    df = pd.read_csv(url, on_bad_lines='skip')  # pandas >= 1.3; error_bad_lines was removed in pandas 2.0
    
    # output
    y = df['Energy']
    
    # inputs
    x1 = df['OPEX']
    x1 = np.asmatrix(x1).T
    x2 = df['CAPEX']
    x2 = np.asmatrix(x2).T
    x = np.concatenate((x1, x2), axis=1)
    
    model = FDH.FDH(y,x,"oo")
    model.optimize(False)
    model.display_theta()
    
    • The results are identical to those of the original code.
    • StoNED + CNLSDDF has been implemented; however, it is a little complicated.
      • To reduce the complexity of StoNED + CNLSDDF, I am currently working on the get_frontier function.
      • The get_frontier function can make the implementation of StoNED/StoNED+DDF more consistent.
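For intuition about what an output-oriented FDH score measures, here is a minimal sketch that is independent of pystoned. It assumes a single positive output and uses the textbook definition: for each unit, look at all units that use no more of every input and take the largest observed output relative to the unit's own. The function name and the data are made up, and the scaling convention may differ from what `display_theta()` prints.

```python
def fdh_output_efficiency(x, y):
    """Output-oriented FDH scores for a single positive output.

    x: list of input vectors, y: list of outputs. A score of 1.0 means no
    observed unit with weakly smaller inputs produces more output.
    """
    scores = []
    for xo, yo in zip(x, y):
        # Units that use no more of every input than the evaluated unit.
        peers = [
            yj
            for xj, yj in zip(x, y)
            if all(a <= b for a, b in zip(xj, xo))
        ]
        # The evaluated unit is always its own peer, so peers is never empty.
        scores.append(max(peers) / yo)
    return scores


# Made-up data: three units, two inputs, one output.
x = [[2, 3], [4, 5], [3, 3]]
y = [10, 8, 12]
scores = fdh_output_efficiency(x, y)
print(scores)
```

Unlike DEA, no convex combinations of units are formed: each unit is compared only with observed units.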
    opened by Fangop 7
  • feat(StoNEZD): Implement StoNEZD by classes

    This PR provides a small refactor of CNLSZ and an implementation of StoNEZD. First, thanks for the major refactoring of the CNLSZ class; this PR uses inheritance to reduce duplicated code and keep the package consistent. Second, the StoNEZD model has been implemented. It is now simple to implement this kind of advanced model, since we have so many basic models to build on.

    The test code is provided below:

    CNLSZ

    from pystoned import CNLSZ
    
    import pandas as pd
    import numpy as np
    
    # import Finnish electricity distribution firms data
    url = 'https://raw.githubusercontent.com/ds2010/pyStoNED-Tutorials/master/Data/firms.csv'
    df = pd.read_csv(url, on_bad_lines='skip')  # pandas >= 1.3; error_bad_lines was removed in pandas 2.0
    df.head(5)
    
    # output (total cost)
    y  = df['TOTEX']
    
    # inputs 
    x1  = df['Energy']
    x1  = np.asmatrix(x1).T
    x2  = df['Length']
    x2  = np.asmatrix(x2).T
    x3  = df['Customers']
    x3  = np.asmatrix(x3).T
    x   = np.concatenate((x1, x2, x3), axis=1)
    
    # Z variables
    z = df['PerUndGr']
    
    # define the model parameters
    cet = "mult"
    fun = "cost"
    rts = "crs"
    
    model = CNLSZ.CNLSZ(y, x, z, cet, fun, rts)
    model.optimize()
    
    model.display_residual()
    

    StoNEDZ

    # import package pystoned
    from pystoned import StoNEDZ
    
    import pandas as pd
    import numpy as np
    
    # import Finnish electricity distribution firms data
    url = 'https://raw.githubusercontent.com/ds2010/pyStoNED-Tutorials/master/Data/firms.csv'
    df = pd.read_csv(url, on_bad_lines='skip')  # pandas >= 1.3; error_bad_lines was removed in pandas 2.0
    df.head(5)
    
    # output (total cost)
    y  = df['TOTEX']
    
    # inputs 
    x1  = df['Energy']
    x1  = np.asmatrix(x1).T
    x2  = df['Length']
    x2  = np.asmatrix(x2).T
    x3  = df['Customers']
    x3  = np.asmatrix(x3).T
    x   = np.concatenate((x1, x2, x3), axis=1)
    
    # Z variables
    z = df['PerUndGr']
    
    # define the model parameters
    cet = "mult"
    fun = "cost"
    rts = "crs"
    
    model = StoNEDZ.StoNEDZ(y, x, z, cet, fun, rts)
    model.optimize()
    
    model.display_residual()
    
    print(model.get_technical_inefficiency("MOM"))
    

    The models based on CNLS/StoNED are now available!

    • The implementation of get_frontier will come in the next PR.
    • CNLSG and CNLSZG look nice; perhaps only a small modification is needed to structure the files.
    • We may implement some well-known models for users, such as StoNED+DDF.
    • Some other user functions can be provided, such as marginal productivity.

    The models that have not yet been refactored are listed below:

    • Free disposal hull
    • DEA/DEADDF
    opened by Fangop 7
  • feat(CNLS/CNLSDDF): Implementation of get_frontier.

    The get_frontier function returns the values of the estimated frontier (y values) from CNLS/CNLSDDF. Here are some thoughts on a better implementation of get_frontier. Please help me check whether my reasoning has any logical errors.

    Since true y value = estimated y value + residual for additive models, we may implement the frontier as below:

    CNLS

    In the following, y refers to the true y value and frontier refers to the estimated y value.

    Additive

    frontier = y - residual
    

    Multiplicative

    frontier = y/(exp(residual)) -1
    

    CNLSDDF

    In the following, y refers to the true y value and frontier refers to the estimated y value.

    frontier list = y list - residual list
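The formulas proposed in this comment can be sketched directly in plain Python. The function names are hypothetical, the inputs are made-up numbers, and the multiplicative case follows the formula exactly as written in the comment (including the `- 1`). This is a sketch of the proposal under discussion, not the code that was merged.

```python
import math


def frontier_additive(y, residual):
    """Additive case as proposed: frontier = y - residual."""
    return [yi - ei for yi, ei in zip(y, residual)]


def frontier_multiplicative(y, residual):
    """Multiplicative case as written in the comment: y / exp(residual) - 1."""
    return [yi / math.exp(ei) - 1 for yi, ei in zip(y, residual)]


# Made-up observed values and residuals.
additive = frontier_additive([5.0, 7.0], [1.0, -1.0])
multiplicative = frontier_multiplicative([2.0], [0.0])
print(additive)
print(multiplicative)
```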
    
    opened by Fangop 4
  • feat(pyStoned): Implement data checking

    This draft PR provides data checking for pyStoNED. Both the basic models and the directional distance function based models are covered. However, it is tricky to test every possible kind of input, so this is only a draft PR.

    Please help me check whether the messages give clear information.

    I am still testing all the models and trying to adapt the DEA and FDH models to these checks. So, please do not merge this PR yet.

    opened by Fangop 3
  • feat(CNLS/tools): Implement basic error/exception system

    Here is a draft of the exception system.

    Situations in which the program should stop and inform the user are defined as exceptions in Python (see Built-in Exceptions).

    Errors are one kind of exception. pyStoNED introduces at least two types of exceptions here.

    Basic exception:

    Additive CNLS with CRS

    The additive CNLS model with CRS does not exist (or is not needed). Hence, creating an additive CNLS model with CRS should raise an exception, since it is not an error but an exception to the existing models.

    The following code should halt and raise an exception:

    # import packages
    from pystoned import CNLS
    from pystoned.constant import CET_ADDI, FUN_PROD, OPT_LOCAL, RTS_CRS
    from pystoned.dataset import load_Finnish_electricity_firm
    
    # import Finnish electricity distribution firms data
    data = load_Finnish_electricity_firm(x_select=['Energy', 'Length', 'Customers'],
                                          y_select=['TOTEX'])
    
    # define and solve the CNLS model
    model = CNLS.CNLS(y=data.y, x=data.x, z=None,
                        cet = CET_ADDI, fun = FUN_PROD, rts = RTS_CRS)
    

    Please help me check the reasoning above and polish the exception message.

    Retrieving variables without optimization

    Users should optimize the model before retrieving or printing any variables. If they do not, the program will halt and tell them to optimize the model first.

    The following code should halt and raise an exception:

    # import packages
    from pystoned import CNLS
    from pystoned.constant import CET_ADDI, FUN_PROD, OPT_LOCAL, RTS_VRS
    from pystoned.dataset import load_Finnish_electricity_firm
    
    # import Finnish electricity distribution firms data
    data = load_Finnish_electricity_firm(x_select=['Energy', 'Length', 'Customers'],
                                          y_select=['TOTEX'])
    
    # define and solve the CNLS model
    model = CNLS.CNLS(y=data.y, x=data.x, z=None,
                        cet = CET_ADDI, fun = FUN_PROD, rts = RTS_VRS)
    
    
    model.display_alpha()
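A guard of this kind could look roughly like the sketch below. `NotOptimizedError` and `ToyModel` are hypothetical names chosen for illustration and are not taken from pystoned; the point is only that every accessor checks a flag set by `optimize()` and raises a clear message otherwise.

```python
class NotOptimizedError(Exception):
    """Hypothetical exception raised when results are requested too early."""


class ToyModel:
    """Hypothetical model that tracks whether optimize() has been called."""

    def __init__(self):
        self.optimized = False
        self.alpha = None

    def optimize(self):
        # A real model would call a solver here; the value is a placeholder.
        self.alpha = [0.0]
        self.optimized = True

    def get_alpha(self):
        if not self.optimized:
            raise NotOptimizedError(
                "The model has not been optimized yet; call optimize() first."
            )
        return self.alpha


model = ToyModel()
try:
    model.get_alpha()
    early_message = ""
except NotOptimizedError as err:
    early_message = str(err)

model.optimize()
alpha = model.get_alpha()
print(early_message)
print(alpha)
```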
    

    Value error:

    Construct model with unknown parameters

    Users should construct a model with the constant labels defined in pystoned.constant. If an arbitrary string is given, the program will halt and tell the user that the model parameter is not defined.

    This example constructs a model with an arbitrary string as cet, causing the value error.

    from pystoned import CNLS
    from pystoned.constant import CET_ADDI, FUN_PROD, OPT_LOCAL, RTS_VRS
    from pystoned.dataset import load_Finnish_electricity_firm
    
    # import Finnish electricity distribution firms data
    data = load_Finnish_electricity_firm(x_select=['Energy', 'Length', 'Customers'],
                                          y_select=['TOTEX'])
    
    # define and solve the CNLS model
    model = CNLS.CNLS(y=data.y, x=data.x, z=None,
                        cet = "Not an CET label", fun = FUN_PROD, rts = RTS_VRS)
    

    Note: This does not affect the default setting.
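The label check described here can be sketched as a small validation function. The label value "mult" appears in the earlier examples on this page; "addi" is an assumed value for the additive label, and `check_cet` is a hypothetical helper, not a pystoned function.

```python
CET_ADDI = "addi"  # assumed value for the additive label
CET_MULT = "mult"  # value used in the earlier examples on this page
VALID_CET = (CET_ADDI, CET_MULT)


def check_cet(cet):
    """Return cet unchanged if it is a known label, otherwise raise ValueError."""
    if cet not in VALID_CET:
        raise ValueError(
            f"Undefined model parameter cet={cet!r}; expected one of {VALID_CET}."
        )
    return cet


accepted = check_cet(CET_MULT)
try:
    check_cet("Not an CET label")
    rejected_message = ""
except ValueError as err:
    rejected_message = str(err)

print(accepted)
print(rejected_message)
```

Running the check in the constructor, before any solver work starts, means a mistyped label is reported immediately.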

    Invalid email address

    When using remote optimization, the user may pass an incorrect string (neither an email address nor the OPT_LOCAL label).

    This should lead to a halt and inform the user.

    # import packages
    from pystoned import CNLS
    from pystoned.constant import CET_ADDI, FUN_PROD, OPT_LOCAL, RTS_VRS
    from pystoned.dataset import load_Finnish_electricity_firm
    
    # import Finnish electricity distribution firms data
    data = load_Finnish_electricity_firm(x_select=['Energy', 'Length', 'Customers'],
                                          y_select=['TOTEX'])
    
    # define and solve the CNLS model
    model = CNLS.CNLS(y=data.y, x=data.x, z=None,
                        cet = CET_ADDI, fun = FUN_PROD, rts = RTS_VRS)
    
    model.optimize(email="NotAnEmailAddress")
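One way to separate the three cases (local label, plausible email address, anything else) is sketched below. The value assigned to `OPT_LOCAL` is a placeholder assumption, the regular expression is a deliberately loose shape check rather than full email validation, and `check_email` is a hypothetical helper.

```python
import re

OPT_LOCAL = "local"  # placeholder value; the real label lives in pystoned.constant

# Loose shape check: something@something.something, with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_email(email):
    """Classify the argument as 'local' or 'remote', or raise ValueError."""
    if email == OPT_LOCAL:
        return "local"
    if EMAIL_PATTERN.match(email):
        return "remote"
    raise ValueError(
        f"{email!r} is neither the local-solver label nor an email address."
    )


local_mode = check_email(OPT_LOCAL)
remote_mode = check_email("user@example.com")
try:
    check_email("NotAnEmailAddress")
    invalid_message = ""
except ValueError as err:
    invalid_message = str(err)

print(local_mode, remote_mode)
print(invalid_message)
```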
    

    Optimizing a multiplicative model without specifying a solver

    When using local optimization, the user should specify the solver for optimization.

    This should lead to a halt and tell the user to choose an installed solver.

    # import packages
    from pystoned import CNLS
    from pystoned.constant import CET_MULT, FUN_PROD, OPT_LOCAL, RTS_VRS
    from pystoned.dataset import load_Finnish_electricity_firm
    
    # import Finnish electricity distribution firms data
    data = load_Finnish_electricity_firm(x_select=['Energy', 'Length', 'Customers'],
                                          y_select=['TOTEX'])
    
    # define and solve the CNLS model
    model = CNLS.CNLS(y=data.y, x=data.x, z=None,
                        cet = CET_MULT, fun = FUN_PROD, rts = RTS_VRS)
    
    model.optimize(email=OPT_LOCAL)
    

    These modifications are a draft for discussing error/exception types, situations, and messages. Hence, only CNLS and the utils/tools module are modified as examples. Any other exceptions and errors can be included and discussed!

    Thanks!

    Note: This part may not be included in the documentation, since the documentation should show the right way to use the program, whereas this is about preventing wrong usage.

    opened by Fangop 3
  • Solver Binding Error

    Hello. Great work. I have been looking for something like this for a while.

    I am trying to run some examples, but I am facing some issues with the bindings to the solver. Error message:

    "No Python bindings available for <class 'pyomo.solvers.plugins.solvers.mosek_direct.MOSEKDirect'> solver plugin"

    Any hints on how to solve this?
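A possible first diagnostic, offered as an assumption rather than something confirmed in this thread, is to check whether the solver's Python package can be imported in the active environment. The sketch uses only the standard library; `has_module` is a hypothetical helper, and `"mosek"` is assumed to be the import name of the MOSEK Python package.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


# Sanity checks on names whose answer is known in advance.
stdlib_found = has_module("json")
bogus_found = has_module("surely_not_a_real_module_xyz")

# The check of interest; the result depends on the local environment.
print("mosek importable:", has_module("mosek"))
```

If the module is not found, installing the solver's Python package into the same environment that runs pystoned would be the next thing to try.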

    opened by fmobrj 3
  • CNLSG: returns an error when using the local solver

    Hi @JulianATA, it seems that there is another bug at line 122 of CNLSG. I used CNLSG to estimate the multiplicative cost function with the local solver MINOS, but it returns the following error:

    File "/home/dais2/anaconda3/lib/python3.8/site-packages/pystoned/CNLSG.py", line 122, in __convergence_test self.Active2[i, j] = - alpha[i] - np.sum(beta[i, :] * x[i, :]) + \ TypeError: bad operand type for unary -: 'NoneType'.

    Interestingly, when I use 'NEOS' to solve the same model, there is no error, and I receive the final estimation results. Furthermore, there is no problem when we estimate the additive production function using the local solver MOSEK.

    Could you please check and fix it? Many thanks! For your convenience, please see the following example:

    Example

    import numpy as np
    import pandas as pd
    from pystoned import CNLSG
    from pystoned.constant import CET_MULT, FUN_COST, OPT_LOCAL, RTS_VRS
    
    
    url='https://raw.githubusercontent.com/ds2010/pyStoNED/master/pystoned/data/electricityFirms.csv'
    df = pd.read_csv(url, on_bad_lines='skip')  # pandas >= 1.3; error_bad_lines was removed in pandas 2.0
    
    # output
    y = df['TOTEX']
    
    # inputs
    x1 = df['Energy']
    x1 = np.asmatrix(x1).T
    x2 = df['Length']
    x2 = np.asmatrix(x2).T
    x3 = df['Customers']
    x3 = np.asmatrix(x3).T
    x = np.concatenate((x1, x2, x3), axis=1)
    
    model = CNLSG.CNLSG(y, x, z=None, cet=CET_MULT, fun=FUN_COST, rts=RTS_VRS)
    model.optimize(OPT_LOCAL)
    
    model.display_beta()
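The reported TypeError means that `alpha[i]` was `None` when the unary minus was applied, so at least one variable had no value at that point. Why the local solver left it unset is not established in this thread. The sketch below only shows how such a value could be detected before the arithmetic; `first_missing` is a hypothetical helper and the numbers are made up.

```python
def first_missing(values):
    """Return the index of the first None in values, or None if all are set."""
    for index, value in enumerate(values):
        if value is None:
            return index
    return None


# Made-up solution vector with one value the solver did not fill in.
alpha = [1.2, None, 0.7]

try:
    -alpha[1]
    message = ""
except TypeError as err:
    # Same error text as in the report above.
    message = str(err)

missing_index = first_missing(alpha)
print(message)
print(missing_index)
```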
    
    opened by ds2010 3
  • feat(dataset): Implement dataset support

    Hi, I recently thought that the example data we use for testing pystoned could become a feature.

    This is inspired by sklearn, which provides toy datasets that help users understand the usage and features of its models. The toy datasets helped make sklearn widely used all over the world, since they make it easy for beginners to use and understand.

    This PR reduces the complexity of using the datasets. Original:

    import pandas as pd
    import numpy as np
    
    url = 'https://raw.githubusercontent.com/ds2010/pyStoNED-Tutorials/master/Data/firms.csv'
    df = pd.read_csv(url, on_bad_lines='skip')  # pandas >= 1.3; error_bad_lines was removed in pandas 2.0
    df.head(5)
    
    # output
    y = df['Energy']
    
    # inputs
    x1 = df['OPEX']
    x1 = np.asmatrix(x1).T
    x2 = df['CAPEX']
    x2 = np.asmatrix(x2).T
    x = np.concatenate((x1, x2), axis=1)
    

    This pr:

    from pystoned import dataset
    
    x, y = dataset.firm(['OPEX', 'CAPEX'], 'Energy')
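The idea behind such a helper can be sketched with the standard library alone: read the table once, then return the selected input columns and the output column. `select_xy` is a hypothetical function, not the implementation in this PR, and the CSV text holds placeholder numbers rather than the real firm data.

```python
import csv
import io

# Placeholder table using the column names from the example above.
SAMPLE_CSV = "OPEX,CAPEX,Energy\n1,2,5\n3,4,6\n"


def select_xy(csv_text, x_select, y_select):
    """Return (x, y): rows of selected input columns and the output column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    x = [[float(row[name]) for name in x_select] for row in rows]
    y = [float(row[y_select]) for row in rows]
    return x, y


x, y = select_xy(SAMPLE_CSV, ["OPEX", "CAPEX"], "Energy")
print(x)
print(y)
```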
    

    This PR is not yet finished.

    Please give me information about the datasets, in order to:

    • make sure the datasets are used in a sensible way
    • give users a brief introduction to each dataset
    • etc.

    Thanks for your review. Do not merge yet!

    opened by Fangop 3
  • API documentations

    The new PR #23 (Autodoc) works well locally but not on ReadTheDocs. You can check the CNLS API on the website generated by ReadTheDocs: it is empty. However, if we build the Sphinx documentation locally using make html, the docstrings appear in the HTML files. See the following screenshot.

    Screenshot from 2020-12-06 22-10-46

    I failed to fix it. Since the website is generated automatically by ReadTheDocs, @JulianATA, could you please help me fix it? Thanks in advance!

    opened by ds2010 2
Owner
Sheng Dai
Ph.D student in Management Science at Aalto University School of Business. My research area is productivity and efficiency analysis.