Statsmodels: statistical modeling and econometrics in Python

statsmodels

Last update: Dec 29, 2022

Related tags

Data Analysis python statistics econometrics data-analysis regression-models generalized-linear-models timeseries-analysis

Overview

About statsmodels

statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics and estimation and inference for statistical models.

Documentation

The documentation for the latest release is at

https://www.statsmodels.org/stable/

The documentation for the development version is at

https://www.statsmodels.org/dev/

Recent improvements are highlighted in the release notes

https://www.statsmodels.org/stable/release/version0.9.html

Backups of documentation are available at https://statsmodels.github.io/stable/ and https://statsmodels.github.io/dev/.

Main Features

Linear regression models:
- Ordinary least squares
- Generalized least squares
- Weighted least squares
- Least squares with autoregressive errors
- Quantile regression
- Recursive least squares
Mixed Linear Model with mixed effects and variance components
GLM: Generalized linear models with support for all of the one-parameter exponential family distributions
Bayesian Mixed GLM for Binomial and Poisson
GEE: Generalized Estimating Equations for one-way clustered or longitudinal data
Discrete models:
- Logit and Probit
- Multinomial logit (MNLogit)
- Poisson and Generalized Poisson regression
- Negative Binomial regression
- Zero-Inflated Count models
RLM: Robust linear models with support for several M-estimators.
Time Series Analysis: models for time series analysis
- Complete StateSpace modeling framework
  - Seasonal ARIMA and ARIMAX models
  - VARMA and VARMAX models
  - Dynamic Factor models
  - Unobserved Component models
- Markov switching models (MSAR), also known as Hidden Markov Models (HMM)
- Univariate time series analysis: AR, ARIMA
- Vector autoregressive models, VAR and structural VAR
- Vector error correction model, VECM
- Exponential smoothing, Holt-Winters
- Hypothesis tests for time series: unit root, cointegration and others
- Descriptive statistics and process models for time series analysis
Survival analysis:
- Proportional hazards regression (Cox models)
- Survivor function estimation (Kaplan-Meier)
- Cumulative incidence function estimation
Multivariate:
- Principal Component Analysis with missing data
- Factor Analysis with rotation
- MANOVA
- Canonical Correlation
Nonparametric statistics: Univariate and multivariate kernel density estimators
Datasets: Datasets used for examples and in testing
Statistics: a wide range of statistical tests
- diagnostics and specification tests
- goodness-of-fit and normality tests
- functions for multiple testing
- various additional statistical tests
Imputation with MICE, regression on order statistic and Gaussian imputation
Mediation analysis
Graphics includes plot functions for visual analysis of data and model results
I/O
- Tools for reading Stata .dta files, but pandas has a more recent version
- Table output to ascii, latex, and html
Miscellaneous models
Sandbox: statsmodels contains a sandbox folder with code in various stages of development and testing which is not considered "production ready". This covers among others
- Generalized method of moments (GMM) estimators
- Kernel regression
- Various extensions to scipy.stats.distributions
- Panel data models
- Information theoretic measures

How to get it

The master branch on GitHub is the most up to date code

https://www.github.com/statsmodels/statsmodels

Source downloads of release tags are available on GitHub

https://github.com/statsmodels/statsmodels/tags

Binaries and source distributions are available from PyPI

https://pypi.org/project/statsmodels/

Binaries can be installed in Anaconda

conda install statsmodels

Installing from sources

See INSTALL.txt for requirements or see the documentation

https://statsmodels.github.io/dev/install.html

Contributing

Contributions in any form are welcome, including:

Documentation improvements
Additional tests
New features to existing models
New models

See

https://www.statsmodels.org/stable/dev/test_notes

for instructions on installing statsmodels in editable mode.

License

Modified BSD (3-clause)

Discussion and Development

Discussions take place on the mailing list

https://groups.google.com/group/pystatsmodels

and in the issue tracker. We are very interested in feedback about usability and suggestions for improvements.

Bug Reports

Bug reports can be submitted to the issue tracker at

https://github.com/statsmodels/statsmodels/issues

Comments

Gam gsoc2015
@josef-pkt I am starting a PR. At the moment there is the gam file, which contains the gam penalty class. Smooth_basis contains some functions to get B-spline and polynomial bases. This file will be removed once we are able to use patsy directly.

There are also 2 files that contain examples or small scripts. We will remove them later.

Let me know what you think about that.

Todo

[ ] predict errors (stateful transform, patsy ?), Note fittedvalues are available only a problem in pirls example, predict after fit works, requires spline basis values

[ ] get_prediction also errors, maybe consequence of predict error currently errors because weights is None

[ ] check test coverage for offset and exposure

Interface

[ ] partial_values plot_partial has inconvenient arguments (smoother and mask) instead of column index or term name or similar

[ ] formula-like interface for predict (create spline basis values internally)

[ ] adjust inherited methods like plot_partial_residuals (after GSOC?)

[ ] default param names for splines (should be more informative than "xi" (i is range(k)), but less (?) verbose than patsy's), need them for test_terms

type-enh comp-base comp-genmod
opened by DonBeo 232
Multivariate Kalman Filter
Here's a simple branch with the code added into statsmodels.tsa.statespace. A couple of thoughts

At least in the dev process, I thought it might be nicer to keep it in its own module, rather than putting it with the kalmanf. I don't know what makes the most sense in the long run.

I have unit tests that rely on the statespace model, but I'm rewriting them so the KF pull request can be done on its own without other dependencies, especially since the statespace model is likely to change.

Edit: Original line comments.

L72

Question: do we want to keep the single-precision version (and the complex single precision, below)? I don't really see a use case, and it appears from preliminary tests that the results tend to overflow. Maybe I'll post a unit test to demonstrate and we can go from there.

L332

Question: there are a bunch of ways to initialize the KF, depending on the theory underlying the data. This one is only valid for stationary processes. Probably best to move out to the Python wrapper?
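
To make the stationary case concrete: for a stationary state equation the initial state covariance P solves P = T P T' + Q. A minimal sketch of a Python-side helper follows; the function name `stationary_init` and its arguments are hypothetical and not taken from the branch, while `solve_discrete_lyapunov` is the SciPy routine.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov


def stationary_init(transition, state_cov):
    """Unconditional mean and covariance of a stationary state vector.

    Hypothetical helper: solves P = T P T' + Q, which is only valid when
    all eigenvalues of T lie strictly inside the unit circle.
    """
    T = np.atleast_2d(np.asarray(transition, dtype=float))
    Q = np.atleast_2d(np.asarray(state_cov, dtype=float))
    if np.max(np.abs(np.linalg.eigvals(T))) >= 1:
        raise ValueError("transition matrix is not stationary")
    mean = np.zeros(T.shape[0])
    cov = solve_discrete_lyapunov(T, Q)
    return mean, cov


# AR(1) with phi = 0.5 and unit innovation variance: P = 1 / (1 - phi**2)
mean, cov = stationary_init([[0.5]], [[1.0]])
```

Non-stationary models would need a different initialization (for example a diffuse or user-supplied one), which is one argument for keeping this choice in the wrapper.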

L444 Question: I think we'll want to add the ability to specify whether or not to check for convergence and alter the tolerance.

L414

This inversion is using an LU decomposition, but I think in this case I can rely on f to be positive definite since it's the covariance matrix of the forecast error, in which case I could use the much faster Cholesky decomposition approach. This is something I'm looking into, but if you happen to know one way or the other, that would be great too.

Related to this is the idea that you should "never take inverses", and I guess I need to look into replacing this with a linear solver routine if possible, in the updating stage below.
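
A small sketch of the idea in plain SciPy rather than the Cython code under discussion. The matrix `F` and vector `v` are made-up stand-ins for the forecast error covariance and the forecast error; the sketch assumes `F` is symmetric positive definite.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
F = A @ A.T + 3.0 * np.eye(3)  # symmetric positive definite by construction
v = rng.normal(size=3)         # stand-in for the forecast error

# Explicit inverse, the approach the comment wants to avoid
x_inv = np.linalg.inv(F) @ v

# Cholesky factorization followed by two triangular solves
factor = cho_factor(F)
x_chol = cho_solve(factor, v)

# log|F|, needed for the Gaussian log-likelihood, comes from the same factor
logdet = 2.0 * np.sum(np.log(np.diag(factor[0])))
```

The factor is computed once per time step and can be reused for every right-hand side in the updating stage, so no explicit inverse is needed.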

comp-tsa type-enh
opened by ChadFulton 213
New kernel_methods module for new KDE implementation

This is a new version of the kernel density estimation. The purpose is to provide an implementation that is faster in the case of grid evaluation, and also works on bounded domains.
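
To illustrate what grid evaluation on a bounded domain involves, here is a sketch of a Gaussian KDE with reflection at the boundaries. This is one common boundary correction and is not necessarily what the PR implements; the function name and signature are hypothetical, and the direct O(n * grid) evaluation is used for clarity, not speed.

```python
import numpy as np


def kde_reflect(data, grid, bandwidth, lower=None, upper=None):
    """Gaussian KDE on a grid, reflecting the data at the given bounds."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    points = [data]
    if lower is not None:
        points.append(2.0 * lower - data)  # mirror image below the bound
    if upper is not None:
        points.append(2.0 * upper - data)  # mirror image above the bound
    points = np.concatenate(points)
    z = (grid[:, None] - points[None, :]) / bandwidth
    dens = np.exp(-0.5 * z**2).sum(axis=1)
    dens /= data.size * bandwidth * np.sqrt(2.0 * np.pi)
    # The density is zero outside the domain
    if lower is not None:
        dens[grid < lower] = 0.0
    if upper is not None:
        dens[grid > upper] = 0.0
    return dens


rng = np.random.default_rng(0)
data = rng.exponential(size=500)  # support is [0, inf)
grid = np.linspace(0.0, 15.0, 3001)

dens_plain = kde_reflect(data, grid, bandwidth=0.3)
dens_bounded = kde_reflect(data, grid, bandwidth=0.3, lower=0.0)

# Trapezoidal rule: mass that each estimate places on the domain
step = np.diff(grid)
area_plain = np.sum((dens_plain[1:] + dens_plain[:-1]) * step) / 2.0
area_bounded = np.sum((dens_bounded[1:] + dens_bounded[:-1]) * step) / 2.0
```

Without the correction, part of the probability mass leaks below zero; with reflection, the estimate integrates to one over the domain.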

There is still some work to do, in particular, I need to add tests for the multi-dimensional and discrete densities.
type-enh comp-nonparametric type-refactor

opened by PierreBdR 183
GSOC2017 Zero-Inflated models
This PR includes the following models:

Generic Zero Inflated model

Zero-Inflated Poisson model

Zero-Inflated Generalized Poisson model (ZIGP-P)

Zero-Inflated Generalized Negative Binomial model (ZINB-P)

Each model includes these parts:

[x] LLF

[x] Score

[x] Hessian

[x] Predict

[x] Fit

[x] Docs

[x] Tests
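
For reference, the LLF item for the zero-inflated Poisson case can be sketched as follows. This is a textbook formulation with a constant inflation probability `pi`, written with NumPy and SciPy; it is not the code from the PR, and the function name is hypothetical.

```python
import numpy as np
from scipy.special import gammaln


def zip_loglike_obs(y, mu, pi):
    """Per-observation log-likelihood of a zero-inflated Poisson model.

    P(y = 0) = pi + (1 - pi) * exp(-mu)
    P(y = k) = (1 - pi) * Poisson(k; mu)   for k > 0
    """
    y = np.asarray(y, dtype=float)
    mu = np.broadcast_to(np.asarray(mu, dtype=float), y.shape)
    log_pois = -mu + y * np.log(mu) - gammaln(y + 1.0)
    ll_zero = np.log(pi + (1.0 - pi) * np.exp(-mu))
    ll_pos = np.log1p(-pi) + log_pois
    return np.where(y == 0, ll_zero, ll_pos)


# The implied probabilities over the support should sum to one
support = np.arange(0, 101)
probs = np.exp(zip_loglike_obs(support, mu=2.0, pi=0.3))
```

In a full model `mu` and `pi` would each depend on covariates through a link function, which is where score and Hessian become more involved.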

Status: reviewing; need to implement a better way to generate start_params.

Last commit for end of GSoC17: Changed way to find start params
type-enh comp-discrete
opened by evgenyzhurko 170
GSOC2017 Generalized Poisson (GP-P) model
This PR includes an implementation of the Generalized Poisson model. The model includes these parts:

[x] Log-likelihood function

[x] Score

[x] Hessian

[x] Fit

[x] Result

[x] Tests

[x] Docs

Status - merged #3795
rejected
opened by evgenyzhurko 149
GSoC 2016: State-Space Models with Markov Switching

Hi, I have started implementing the Kim filter and outlined the basic functionality, as described in the Kim-Nelson book (see diagram on p. 105). I haven't even run the code yet to check for errors. Coding style and class interface concern me more at the moment, as well as possible ways to test it without implementing models.
comp-tsa type-enh

opened by ValeryTyumen 129
ENH: Revise loglike/deviance to correctly handle scale

xref #3773

I apologize for this taking longer than I had hoped, but my busy period was exacerbated by some unforeseen events. But I'm somewhat back now.

This is the first step to solving the above issue.

Still a lot is missing... I've only really gotten loglike to work for all the families except Gamma, Binomial, Gaussian, and Tweedie (which has never had loglike). I also need to rework the docstrings. I'm thinking it's more logical to have a loglike_obs function defined in each family and then have a loglike function in the Family class so that loglike will simply be inherited. I think the docstrings could be elegantly written to handle this too.

I'm thinking I will work on deviance next and then circle back to loglike... I feel like R takes some computational shortcuts...
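
The loglike_obs / loglike split described above could look roughly like this. The class names mirror the discussion but the code is a hypothetical sketch, not the statsmodels implementation; the Gaussian family is used because its log-likelihood depends on scale.

```python
import numpy as np


class Family:
    """Hypothetical base class: subclasses only define loglike_obs."""

    def loglike_obs(self, endog, mu, scale=1.0):
        raise NotImplementedError

    def loglike(self, endog, mu, scale=1.0):
        # Inherited by every family: sum of the per-observation terms
        return np.sum(self.loglike_obs(endog, mu, scale=scale))


class Gaussian(Family):
    def loglike_obs(self, endog, mu, scale=1.0):
        endog = np.asarray(endog, dtype=float)
        mu = np.asarray(mu, dtype=float)
        return -0.5 * ((endog - mu) ** 2 / scale + np.log(2.0 * np.pi * scale))


family = Gaussian()
ll = family.loglike([0.0, 1.0], [0.0, 0.0], scale=1.0)
```

With this layout the shared docstring for `loglike` lives in one place, and each family documents only its own per-observation formula.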
type-enh comp-genmod type-refactor

opened by thequackdaddy 125
WIP/ENH Gam 2744 rebased2

rebased version of #4575 which was rebased version of #2744 The original GSOC PR with most of the development discussion is #2435

rebase conflict in compat.python: This now has an unneeded itertools.combinations import, but I dropped the py 2.6 compat code.
type-enh comp-genmod

opened by josef-pkt 104
ENH: Add var_weights to GLM

Hello,

xref #3371

This (should) get var_weights to work for GLM. I added a Tweedie use case against R.

I can't seem to get the Poisson with var_weights to HC0 test that was added (but disabled because of the lack of functionality) to work. The test is here:

https://github.com/thequackdaddy/statsmodels/blob/var_weight/statsmodels/genmod/tests/test_glm_weights.py#L142

It fails on the following assert

assert_allclose(res1.bse, corr_fact * res2.bse, atol= 1e-6, rtol=2e-6)

It's relatively close... consistently off by a factor of 0.98574... corr_fact brings it much closer... I'm wondering if another adjustment is necessary?

Honestly, I don't really understand much about sandwiches (except PBJ, meatball, and bánh mì).

The rest of the tests seem to work pretty well.

I'd be happy to have this merged relatively soon, so thanks for the feedback and review!
type-enh comp-genmod topic-weights

opened by thequackdaddy 101

[MRG] Add MANOVA class

PR as discussed in #3274. Tested with a SAS example and two R examples, produced the same results.

To-do:

[x] Core stats computation
[x] API
[x] Automatically create dummy variables and hypothesis tests for categorical independent variables.
[x] Input validation
[x] More examples to be tested

References:

[1] https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_introreg_sect012.htm
[2] GLM Algorithms ftp://public.dhe.ibm.com/software/analytics/spss/documentation/statistics/20.0/en/client/Manuals/IBM_SPSS_Statistics_Algorithms.pdf
[3] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.278.6976&rep=rep1&type=pdf

Unit test in test_MultivariateOLS.py

compare_r_output_dogs_data()

It reproduces results from the following R code:

library(car)
Drug = c('Morphine', 'Morphine', 'Morphine', 'Morphine', 'Morphine', 'placebo', 'placebo', 'placebo', 'placebo', 'placebo', 'Trimethaphan', 'Trimethaphan', 'Trimethaphan', 'Trimethaphan', 'Trimethaphan')
Depleted = c('N', 'N', 'N', 'N', 'Y', 'Y', 'Y', 'N', 'N', 'N', 'N', 'Y', 'Y', 'Y', 'Y')
subject  = c(1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
Histamine0 = c(-3.218876, -3.912023, -2.65926, -1.771957, -2.302585, -2.65926, -2.995732, -3.506558, -3.506558, -2.65926, -2.407946, -2.302585, -2.525729, -2.040221, -2.813411)
Histamine1 = c(-1.609438, -2.813411, 0.336472, -0.562119, -2.407946, -2.65926, -2.65926, -0.478036, 0.04879, -0.18633, 1.141033, -2.407946, -2.407946, -2.302585, -2.995732)
Histamine3 = c(-2.302585, -3.912023, -0.733969, -1.049822, -2.040221, -2.813411, -2.813411, -1.171183, -0.314711, 0.067659, 0.722706, -2.407946, -2.407946, -2.120264, -2.995732)
Histamine5 = c(-2.525729, -3.912023, -1.427116, -1.427116, -1.966113, -2.65926, -2.65926, -1.514128, -0.510826, -0.223144, 0.207014, -2.525729, -2.302585, -2.120264, -2.995732)
data = data.frame(Histamine0, Histamine1, Histamine3, Histamine5)
hismat = as.matrix(data[,1:4])
result = lm(hismat ~ Drug * Depleted)
linearHypothesis(result, c(1, 0, 0, 0, 0, 0)) 
linearHypothesis(result, t(cbind(c(0, 1, 0, 0, 0, 0), c(0, 0, 1, 0, 0, 0))))
linearHypothesis(result, c(0, 0, 0, 1, 0, 0)) 
linearHypothesis(result, t(cbind(c(0, 0, 0, 0, 1, 0), c(0, 0, 0, 0, 0, 1))))
# Or ManRes <- Manova(result, type="III")

test_affine_hypothesis()

It reproduces results from the following R code:

result = lm(hismat ~ Drug*Depleted)
fml = t(cbind(c(0, 1.1, 1.2, 1.3, 1.4, 1.5), c(0, 2.1, 3.2, 3.3, 4.4, 5.5)))
linearHypothesis(result, fml, t(cbind(c(1, 2, 3, 4), c(5, 6, 7, 8))), verbose=TRUE)

type-enh comp-multivariate

opened by yl565 101

Distributed estimation
Ok, I wanted to make a PR for this, there are still a couple of things that I need to fix but things are pretty close to done and I'm happy with the current approach.

The key part of the current approach is a function distributed_estimation. This function works by taking a generator for endog and exog, endog_generator and exog_generator, as well as a series of functions and key word arguments to be run on each machine and then used to recombine the results. The generator approach allows for a variety of use cases and can handle a lot of the ideas discussed in the initial proposal. For each data set yielded by the generators a model is initialized using model_class and init_kwds and then the function estimation_method is applied to the model along with the key words fit_kwds and estimation_kwds. Finally, the results are recombined from each data set using join_method.
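
The control flow described above can be sketched as follows. Everything here is hypothetical and simplified: the function signature is not the one in the PR, the stand-in model class is plain NumPy least squares, and the join step is a naive average instead of the debiased regularized procedure.

```python
import numpy as np


class LeastSquares:
    """Hypothetical stand-in for a statsmodels model class."""

    def __init__(self, endog, exog):
        self.endog = np.asarray(endog, dtype=float)
        self.exog = np.asarray(exog, dtype=float)


def fit_partition(model, **fit_kwds):
    # Estimation step that would run on each machine
    params, *_ = np.linalg.lstsq(model.exog, model.endog, rcond=None)
    return params


def join_by_average(results_list):
    # Recombination step: naive average of the per-partition estimates
    return np.mean(results_list, axis=0)


def distributed_estimation_sketch(endog_generator, exog_generator,
                                  model_class, estimation_method,
                                  join_method, init_kwds=None, fit_kwds=None):
    init_kwds = init_kwds or {}
    fit_kwds = fit_kwds or {}
    results = []
    for endog, exog in zip(endog_generator, exog_generator):
        model = model_class(endog, exog, **init_kwds)
        results.append(estimation_method(model, **fit_kwds))
    return join_method(results)


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(scale=0.1, size=1000)

params = distributed_estimation_sketch(
    iter(np.array_split(y, 4)), iter(np.array_split(X, 4)),
    LeastSquares, fit_partition, join_by_average)
```

Because the data arrive through generators, the partitions never need to be in memory at the same time, and the loop body is the part that a joblib or dask backend would parallelize.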

Currently, this defaults to the distributed regularized approach discussed here:

http://arxiv.org/pdf/1503.04337v3.pdf

but the way I've set things up means that the user should be able to apply any number of procedures here.

The current todo list:

[x] Fix hess_obs

[x] Fix joblib fit

[x] Add dask fit

[x] Add WLS/GLS for debiased regularized fit

[x] Add likelihood result

[x] Add data tests

Let me know if there are any comments/questions/criticisms, as I've said, it certainly isn't complete but I wanted to get this out there so I can start integrating any changes as I finish it up.
type-enh comp-base comp-genmod comp-regression
opened by lbybee 85
Feature Request: Addition of non-parametric test based on Chebyshev's Inequality

Is your feature request related to a problem? Please describe

Whenever I need to perform a non-parametric test using Chebyshev's Inequality, I have to write the equation myself. It would be good to have it as a part of statsmodels.

Describe the solution you'd like

A function which takes in a distribution (array), a scalar value to be used for the test, the type of test (one-tailed/two-tailed), and a confidence level, and returns whether the scalar value is within the limits.
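
A minimal sketch of what such a function might look like. The name, signature, and return value are hypothetical and nothing like this exists in statsmodels; the one-sided case uses Cantelli's inequality, and plugging in the sample mean and standard deviation for the population values is an approximation.

```python
import numpy as np


def chebyshev_test(sample, value, alternative="two-sided", confidence=0.95):
    """Check whether `value` lies within distribution-free Chebyshev limits."""
    sample = np.asarray(sample, dtype=float)
    mean = sample.mean()
    std = sample.std(ddof=1)
    alpha = 1.0 - confidence
    if alternative == "two-sided":
        # Chebyshev: P(|X - mean| >= k * std) <= 1 / k**2
        k = 1.0 / np.sqrt(alpha)
        lower, upper = mean - k * std, mean + k * std
    elif alternative in ("larger", "smaller"):
        # Cantelli: P(X - mean >= k * std) <= 1 / (1 + k**2)
        k = np.sqrt(1.0 / alpha - 1.0)
        if alternative == "larger":
            lower, upper = -np.inf, mean + k * std
        else:
            lower, upper = mean - k * std, np.inf
    else:
        raise ValueError("alternative must be 'two-sided', 'larger' or 'smaller'")
    return bool(lower <= value <= upper), (lower, upper)


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
within, limits = chebyshev_test(sample, 6.0, confidence=0.75)
```

The limits are very wide compared with those of a parametric test, which is the price of making no distributional assumption.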

Describe alternatives you have considered

Currently I have to write the code myself, so I don't have any other alternative. It would be better to have it as a part of statsmodels.

Additional context

If anyone thinks this will be helpful, do let me know, I can raise a pull request.

opened by oshin94 2
Question: robust poisson regression implementation status
I am currently looking for a way to implement robust Poisson regression in Python. I found an article explaining how to implement robust Poisson regression in R using a method similar to the one implemented in statsmodels.

POISSON REGRESSION | R DATA ANALYSIS EXAMPLES

The following code creates the output in R,

library(sandwich)
p <- read.csv("https://stats.idre.ucla.edu/stat/data/poisson_sim.csv")
p <- within(p, {
  prog <- factor(prog, levels=1:3, labels=c("General", "Academic", "Vocational"))
  id <- factor(id)
})
summary(p)
summary(m1 <- glm(num_awards ~ prog + math, family="poisson", data=p))
cov.m1 <- vcovHC(m1, type="HC0")
std.err <- sqrt(diag(cov.m1))
r.est <- cbind(Estimate= coef(m1), "Robust SE" = std.err,
  "Pr(>|z|)" = 2 * pnorm(abs(coef(m1)/std.err), lower.tail=FALSE),
  LL = coef(m1) - 1.96 * std.err,
  UL = coef(m1) + 1.96 * std.err)
r.est

and here is the corresponding code in statsmodels.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

data = pd.read_csv("https://stats.idre.ucla.edu/stat/data/poisson_sim.csv")
res = smf.poisson("num_awards ~ C(prog) + math", data=data).fit(cov_type="HC0")
res.summary()

The results from the two code snippets seem to match quite well. However, after exploring past threads about heteroscedasticity for Poisson regression, for example

Standard Error differences Binomial Regression vs Poisson Regression

ENH: handling cluster_weights, freq_weights in models, cov_type, moments, descriptive statistics

I am not sure whether even cov_type HC0 is correctly implemented. If you have time, could you explain the current state of the implementation around heteroscedasticity in Poisson regression? What potential problems are not covered by the current package, and what is firmly implemented?
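
For context, the HC0 covariance for a Poisson model is the usual sandwich: the inverse information matrix as bread around the outer product of the per-observation scores. The sketch below computes it by hand with NumPy on synthetic data; it shows the textbook formula and makes no claim about how statsmodels implements it internally. The function name is hypothetical.

```python
import numpy as np


def poisson_fit_hc0(y, X, n_iter=50, tol=1e-10):
    """Poisson MLE via Newton's method, with classical and HC0 standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)
        info = X.T @ (X * mu[:, None])  # negative Hessian of the log-likelihood
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    bread = np.linalg.inv(X.T @ (X * mu[:, None]))
    meat = X.T @ (X * ((y - mu) ** 2)[:, None])  # sum of score outer products
    cov_hc0 = bread @ meat @ bread
    return beta, np.sqrt(np.diag(bread)), np.sqrt(np.diag(cov_hc0))


rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

beta_hat, se_mle, se_hc0 = poisson_fit_hc0(y, X)
```

When the Poisson variance assumption holds, as in this simulated data, the two sets of standard errors should be close; they diverge under over- or underdispersion, which is the situation robust standard errors are meant for.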
opened by toshiakiasakura 1
MAINT/REF: remove extradoc from distribution, scipy deprecation

see https://github.com/statsmodels/statsmodels/issues/8538#issuecomment-1366169937

Errors in pre-release testing of scipy: extradoc has been removed from the keyword arguments.

This comments out the extradoc keyword in the sandbox distributions; there is no alternative. How to modify the docs to add extra info is open.
prio-high comp-distributions maintenance

opened by josef-pkt 0
[ENH] Add CRV3 Inference via the Cluster Jackknife to OLS and WLS #8461
Hi all!

This PR closes #8461.

It adds support for CRV3 cluster robust inference for OLS and WLS via a cluster jackknife.

In short, the PR adds a new cov_type, cluster-jackknife, which can be run in the following way:

crv3 = model.fit(cov_type='cluster-jackknife', cov_kwds={'groups': id})

I have updated the docstring in linear_model.py.

I have not yet added tests, as I'd first like to get feedback on where and how to test the new cov_type (and I also have to get up to speed with pytest). There are multiple strategies for testing the new feature:

test against implementations in other languages (e.g. Stata's summclust, R's summclust, or R's sandwich::vcovJK).

internal tests: under appropriate small sample corrections and when clusters are singletons, the cluster jackknife is equivalent to the HC3 estimator.

Below is a small example of the latter test, using the new cluster-jackknife cov_type and HC3 inference:

from statsmodels.regression.linear_model import OLS
from statsmodels.datasets import grunfeld
from statsmodels.tools.tools import add_constant
import pandas as pd
import numpy as np

dtapa = grunfeld.data.load_pandas()
dtapa_endog = dtapa.endog[:200]
dtapa_exog = dtapa.exog[:200]
exog = add_constant(dtapa_exog[["value", "capital"]], prepend=False)
N = exog.shape[0]
id = pd.Series(range(0, N))

res_hc3 = OLS(dtapa_endog, exog).fit(cov_type = "HC3")
res_crv3 = OLS(dtapa_endog, exog).fit(cov_type = "cluster-jackknife", cov_kwds={'groups': id})

res = pd.DataFrame({'tstat hc3:': res_hc3.tvalues * np.sqrt((N-1) / (N)), 'tstat crv3:' : res_crv3.tvalues})
res
#          tstat hc3:  tstat crv3:
# value     16.093573    16.093573
# capital    3.932688     3.932688
# const     -3.040458    -3.040458
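
The singleton-cluster equivalence can also be checked independently of the new cov_type with plain NumPy. In the sketch below the function names are hypothetical, and the jackknife is centered on the full-sample estimate and scaled by (G - 1) / G; that is one common convention and an assumption here, so the scaling may differ from the one used in the PR.

```python
import numpy as np


def ols_params(y, X):
    return np.linalg.lstsq(X, y, rcond=None)[0]


def cluster_jackknife_cov(y, X, groups):
    """Leave-one-cluster-out jackknife covariance of the OLS estimates."""
    beta = ols_params(y, X)
    labels = np.unique(groups)
    G = labels.size
    cov = np.zeros((X.shape[1], X.shape[1]))
    for g in labels:
        keep = groups != g
        diff = ols_params(y[keep], X[keep]) - beta
        cov += np.outer(diff, diff)
    return (G - 1) / G * cov


def hc3_cov(y, X):
    """HC3 sandwich covariance of the OLS estimates."""
    xtx_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ ols_params(y, X)
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    meat = X.T @ (X * ((resid / (1.0 - leverage)) ** 2)[:, None])
    return xtx_inv @ meat @ xtx_inv


rng = np.random.default_rng(0)
N = 60
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=N)
groups = np.arange(N)  # every observation is its own cluster

V_jack = cluster_jackknife_cov(y, X, groups)
V_hc3 = hc3_cov(y, X)
```

Under this convention the two matrices differ only by the factor (N - 1) / N, because the leave-one-out change in the OLS estimate equals the HC3 score contribution of the dropped observation.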

Let me know which types of tests you'd like to see =)
opened by s3alfisc 3
statsmodels.tsa.x13.x13_arima_analysis & Change Specifications

Is your feature request related to a problem? Please describe

I want to use "X-13ARIMA-SEATS Seasonal Adjustment Program" via Python. I know statsmodels.tsa.x13.x13_arima_analysis is the Python wrapper.

Using statsmodels, is it possible to provide input (specification) files to the seasonal adjustment procedure? E.g., I want to add/drop certain outliers or regressors. How do I go about doing it in statsmodels?

Thank you

comp-tsa

opened by PL450 2
ENH: penalized mixin: custom penalization with model properties

(semi-random idea while browsing SAS docs for logistic and firth penalization)

related to Firth penalization #3561

reference Heinze, G., and Schemper, M. (2002). “A Solution to the Problem of Separation in Logistic Regression.” Statistics in Medicine 21:2409–2419.

The penalized log-likelihood is log L*(beta) = log L(beta) + 1/2 log |I(beta)|, where I(beta) is the information matrix evaluated at beta.

That means we need the penalization class to have access to the model so we can reuse the hessian (OIM or EIM). This might currently be tricky or impossible because we define the penalization instance before creating a penalized model instance.

One possibility may be to assign a _model attribute to the penalty instance after creating the model. This makes it a bit circular: the penalized model instance holds the penalty instance, which holds the penalized model instance as an attribute. I guess the circularity cannot be avoided unless we use extra args in each method of the penalty class.

Note we need the hessian/EIM of the original model and not the penalized hessian of the penalized model.
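
To make the penalty concrete, here is a sketch of the Firth-penalized log-likelihood for a logit model in plain NumPy. It is independent of the statsmodels penalization classes, and all function names are hypothetical. For the logit model the observed and expected information matrices coincide, so the OIM/EIM distinction does not matter in this example.

```python
import numpy as np


def logit_loglike(params, y, X):
    eta = X @ params
    return np.sum(y * eta - np.logaddexp(0.0, eta))


def logit_information(params, X):
    # Information matrix of the unpenalized model: X' W X with W = p * (1 - p)
    p = 1.0 / (1.0 + np.exp(-(X @ params)))
    return X.T @ (X * (p * (1.0 - p))[:, None])


def firth_penalized_loglike(params, y, X):
    # log L*(beta) = log L(beta) + 1/2 log |I(beta)|
    _, logdet = np.linalg.slogdet(logit_information(params, X))
    return logit_loglike(params, y, X) + 0.5 * logdet


X = np.ones((4, 1))
y = np.array([0.0, 1.0, 1.0, 0.0])
value = firth_penalized_loglike(np.zeros(1), y, X)
```

The penalty term needs the design matrix and the current parameters, which is exactly the model access that the penalty instance would require.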
type-enh comp-genmod comp-discrete topic-penalization

opened by josef-pkt 0

Releases(v0.13.5)

v0.13.5(Nov 2, 2022)

The statsmodels developers are happy to announce the Python 3.11 compatibility release for the 0.13 branch.

This release contains no bug fixes other than any needed to ensure statsmodels is compatible with Python 3.11. It also resolves an issue with PyPI that affects 0.13.4.
v0.13.4(Nov 1, 2022)

The statsmodels developers are happy to announce the Python 3.11 compatibility release for the 0.13 branch. This release contains no bug fixes other than any needed to ensure statsmodels is compatible with Python 3.11. It also resolves an issue with the source code generation in 0.13.3 that affects installs on Python 3.11 that use the source tarball.
v0.13.3(Nov 1, 2022)

The statsmodels developers are happy to announce the Python 3.11 compatibility release for the 0.13 branch. This release contains no bug fixes other than any needed to ensure statsmodels is compatible with Python 3.11.
v0.13.2(Feb 8, 2022)

The statsmodels developers are happy to announce the bugfix release for the 0.13 branch. This release fixes 10 bugs and provides protection against changes in recent versions of upstream packages.
v0.13.1(Nov 12, 2021)

The statsmodels developers are happy to announce the bug fix release for the 0.13 branch. This release fixes 8 bugs and brings initial support for Python 3.10.
v0.13.0(Oct 1, 2021)
The statsmodels developers are happy to announce the first release candidate for 0.13.0. 227 issues were closed in this release and 143 pull requests were merged. Major new features include:

Autoregressive Distributed Lag models

Copulas

Ordered Models (Ordinal Regression)

Beta Regression

Improvements to ARIMA estimation options

v0.13.0rc0(Sep 17, 2021)
The statsmodels developers are happy to announce the first release candidate for 0.13.0. 227 issues were closed in this release and 143 pull requests were merged. Major new features include:

Autoregressive Distributed Lag models

Copulas

Ordered Models (Ordinal Regression)

Beta Regression

Improvements to ARIMA estimation options

v0.12.2(Feb 2, 2021)

This is a bug-fix release from the 0.12.x branch. Users are encouraged to upgrade.

Notable changes include fixes for a bug that could lead to incorrect results in forecasts with the new ARIMA model (when d > 0 and trend='t') and a bug in the LM test for autocorrelation.
v0.12.1(Oct 29, 2020)

This is a minor release from the 0.12.x branch with bug fixes and essential maintenance only.
v0.12.0(Aug 27, 2020)
The statsmodels developers are happy to announce release 0.12.0. 239 issues were closed in this release and 221 pull requests were merged.

Major new features include:

New exponential smoothing model: ETS (Error, Trend, Seasonal)

New dynamic factor model for large datasets and monthly/quarterly mixed frequency models

Decomposition of forecast updates based on the "news"

Sparse Cholesky Simulation Smoother

Option to use Chandrasekhar recursions

Two popular methods for forecasting time series, forecasting after STL decomposition and the Theta model

Functions for constructing complex Deterministic Terms in time series models

New statistics function: one-way ANOVA-type tests, hypothesis tests for 2-samples and meta-analysis.

v0.12.0rc0(Aug 11, 2020)
The statsmodels developers are happy to announce the first release candidate for 0.12.0. 223 issues were closed in this release and 208 pull requests were merged. Major new features include:

New exponential smoothing model: ETS (Error, Trend, Seasonal)

New dynamic factor model for large datasets and monthly/quarterly mixed frequency models

Decomposition of forecast updates based on the "news"

Sparse Cholesky Simulation Smoother

Option to use Chandrasekhar recursions

Two popular methods for forecasting time series, forecasting after STL decomposition and the Theta model

Functions for constructing complex Deterministic Terms in time series models

v0.11.1(Feb 21, 2020)

This is a bug fix release. It fixes a small number of bugs including two that affect the installation of statsmodels on Python 2.7 and 3.8.

See the full release notes (or in rst format) for the full set of backported pull requests.
v0.11.0(Jan 22, 2020)
statsmodels developers are happy to announce a new release.

Major new features include:

Regression

Rolling OLS and WLS

Statistics

Oaxaca-Blinder decomposition

Distance covariance measures (new in RC2)

New regression diagnostic tools (new in RC2)

Statespace Models

Statespace-based Linear exponential smoothing models

Methods to apply parameters fitted on one dataset to another dataset

Method to hold some parameters fixed at known values

Option for low memory operations

Improved access to state estimates

Improved simulation and impulse responses for time-varying models

Time-Series Analysis

STL Decomposition

New AR model

New ARIMA model

Zivot-Andrews Test

More robust regime-switching models

See release notes for full details.
v0.11.0rc2(Jan 15, 2020)
The second and final release candidate for statsmodels 0.11.

Major new features include:

Regression

Rolling OLS and WLS

Statistics

Oaxaca-Blinder decomposition

Distance covariance measures (new in RC2)

New regression diagnostic tools (new in RC2)

Statespace Models

Statespace-based Linear exponential smoothing models

Methods to apply parameters fitted on one dataset to another dataset

Method to hold some parameters fixed at known values

Option for low memory operations

Improved access to state estimates

Improved simulation and impulse responses for time-varying models

Time-Series Analysis

STL Decomposition

New AR model

New ARIMA model

Zivot-Andrews Test

More robust regime-switching models

See release notes for full details.
v0.11.0rc1(Dec 18, 2019)
Release candidate for statsmodels 0.11.

Major new features include:

Regression

Rolling OLS and WLS

Statistics

Oaxaca-Blinder decomposition

Statespace Models

Statespace-based Linear exponential smoothing models

Methods to apply parameters fitted on one dataset to another dataset

Method to hold some parameters fixed at known values

Option for low memory operations

Improved access to state estimates

Improved simulation and impulse responses for time-varying models

Time-Series Analysis

STL Decomposition

New AR model

New ARIMA model

Zivot-Andrews Test

More robust regime switching models

See release notes for full details.
v0.10.2(Nov 23, 2019)
This is a minor release from the 0.10.x branch with bug fixes and essential maintenance only. The key new feature is:

Compatibility with Python 3.8

v0.10.1(Jul 19, 2019)
This is a minor release from the 0.10.x branch with bug fixes and essential maintenance only. The key features are:

Compatibility with pandas 0.25

Compatibility with Numpy 1.17

v0.10.0(Jun 24, 2019)
This is a major release from 0.9.0 and includes a number of new statistical models and many bug fixes.

Highlights include:

Generalized Additive Models. This major feature is experimental and may change.

Conditional Models such as ConditionalLogit, which are known as fixed effect models in Econometrics.

Dimension Reduction Methods include Sliced Inverse Regression, Principal Hessian Directions and Sliced Avg. Variance Estimation

Regression using Quadratic Inference Functions (QIF)

Gaussian Process Regression

See the release notes for a full list of all the changes from 0.9.0.

python -m pip install --upgrade statsmodels

Note that 0.10.x will likely be the last series of releases to support Python 2, so please consider upgrading to Python 3 if feasible.

Please report any issues with the release candidate on the statsmodels issue tracker.
v0.10.0rc2(Jun 7, 2019)

Release candidate for 0.10.0.

See the release notes in the documentation for details on the changes.
v0.9.0(May 15, 2018)

0.9.0rc1(Apr 30, 2018)

v0.8.0(Feb 8, 2017)

v0.8.0rc1(Jun 21, 2016)


Owner

statsmodels

GitHub http://www.statsmodels.org/devel/

PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io

PyStan PyStan is a Python interface to Stan, a package for Bayesian inference. Stan® is a state-of-the-art platform for statistical modeling and high-

229 Dec 29, 2022

Python Library for learning (Structure and Parameter) and inference (Statistical and Causal) in Bayesian Networks.

pgmpy pgmpy is a python library for working with Probabilistic Graphical Models. Documentation and list of algorithms supported is at our official sit

2.2k Dec 25, 2022

Describing statistical models in Python using symbolic formulas

Patsy is a Python library for describing statistical models (especially linear models, or models that have a linear component) and building design mat

866 Dec 16, 2022

Statistical package in Python based on Pandas

Pingouin is an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy. Some of its main features are listed below. F

1.2k Dec 31, 2022

statDistros is a Python library for dealing with various statistical distributions

StatisticalDistributions statDistros statDistros is a Python library for dealing with various statistical distributions. Now it provides various stati

1 Oct 3, 2021

Probabilistic reasoning and statistical analysis in TensorFlow

TensorFlow Probability TensorFlow Probability is a library for probabilistic reasoning and statistical analysis in TensorFlow. As part of the TensorFl

3.8k Jan 5, 2023

Statistical Rethinking: A Bayesian Course Using CmdStanPy and Plotnine

Statistical Rethinking: A Bayesian Course Using CmdStanPy and Plotnine Intro This repo contains the python/stan version of the Statistical Rethinking

3 Nov 8, 2022

Creating a statistical model to predict 10 year treasury yields

Predicting 10-Year Treasury Yields Initially, I wanted to see if the volatility in the stock market, represented by the VIX index (data source), had

10 Oct 27, 2021

Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano

PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning focusing on advanced Markov chain Monte Carlo (MCMC) an

7.2k Dec 30, 2022

BioMASS - A Python Framework for Modeling and Analysis of Signaling Systems

Mathematical modeling is a powerful method for the analysis of complex biological systems. Although there are many researches devoted on produ

22 Dec 27, 2022

A Python package for the mathematical modeling of infectious diseases via compartmental models

A Python package for the mathematical modeling of infectious diseases via compartmental models. Originally designed for epidemiologists, epispot can be adapted for almost any type of modeling scenario.

12 Dec 28, 2022

OpenDrift is a software for modeling the trajectories and fate of objects or substances drifting in the ocean, or even in the atmosphere.

opendrift OpenDrift is a software for modeling the trajectories and fate of objects or substances drifting in the ocean, or even in the atmosphere. Do

167 Dec 13, 2022

We're Team Arson and we're using the power of predictive modeling to combat wildfires.

We're Team Arson and we're using the power of predictive modeling to combat wildfires. Arson Map Inspiration There’s been a lot of wildfires in Califo

3 Oct 17, 2021

A real data analysis and modeling project - restaurant inspections

A real data analysis and modeling project - restaurant inspections Jafar Pourbemany 9/27/2021 This project represents data analysis and modeling of re

2 Aug 21, 2022

Flood modeling by 2D shallow water equation

hydraulicmodel Flood modeling by 2D shallow water equation. Refer to Hunter et al (2005), Bates et al. (2010). Diffusive wave approximation Local iner

6 Nov 30, 2022

Python Kalman filtering and optimal estimation library. Implements Kalman filter, particle filter, Extended Kalman filter, Unscented Kalman filter, g-h (alpha-beta), least squares, H Infinity, smoothers, and more. Has companion book 'Kalman and Bayesian Filters in Python'.

FilterPy - Kalman filters and other optimal and non-optimal estimation filters in Python. NOTE: Imminent drop of support of Python 2.7, 3.4. See secti

2.5k Dec 30, 2022

A Pythonic introduction to methods for scaling your data science and machine learning work to larger datasets and larger models, using the tools and APIs you know and love from the PyData stack (such as numpy, pandas, and scikit-learn).

This tutorial's purpose is to introduce Pythonistas to methods for scaling their data science and machine learning work to larger datasets and larger models, using the tools and APIs they know and love from the PyData stack (such as numpy, pandas, and scikit-learn).

102 Nov 10, 2022

🧪 Panel-Chemistry - exploratory data analysis and build powerful data and viz tools within the domain of Chemistry using Python and HoloViz Panel.

The purpose of the panel-chemistry project is to make it really easy for you to do DATA ANALYSIS and build powerful DATA AND VIZ APPLICATIONS within the domain of Chemistry using Python and HoloViz Panel.

97 Dec 8, 2022

Example Of Splunk Search Query With Python And Splunk Python SDK

SSQAuto (Splunk Search Query Automation) Example Of Splunk Search Query With Python And Splunk Python SDK installation: ➜ ~ git clone https://github.c

1 Nov 14, 2021
