# Statsmodels: statistical modeling and econometrics in Python

## Overview

statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics and estimation and inference for statistical models.

## Documentation

The documentation for the latest release is at

https://www.statsmodels.org/stable/

The documentation for the development version is at

https://www.statsmodels.org/dev/

Recent improvements are highlighted in the release notes

https://www.statsmodels.org/stable/release/

Backups of documentation are available at https://statsmodels.github.io/stable/ and https://statsmodels.github.io/dev/.

## Main Features

• Linear regression models:
  • Ordinary least squares
  • Generalized least squares
  • Weighted least squares
  • Least squares with autoregressive errors
  • Quantile regression
  • Recursive least squares
• Mixed Linear Model with mixed effects and variance components
• GLM: Generalized linear models with support for all of the one-parameter exponential family distributions
• Bayesian Mixed GLM for Binomial and Poisson
• GEE: Generalized Estimating Equations for one-way clustered or longitudinal data
• Discrete models:
  • Logit and Probit
  • Multinomial logit (MNLogit)
  • Poisson and Generalized Poisson regression
  • Negative Binomial regression
  • Zero-Inflated Count models
• RLM: Robust linear models with support for several M-estimators
• Time Series Analysis: models for time series analysis
  • Complete StateSpace modeling framework
    • Seasonal ARIMA and ARIMAX models
    • VARMA and VARMAX models
    • Dynamic Factor models
    • Unobserved Component models
  • Markov switching models (MSAR), also known as Hidden Markov Models (HMM)
  • Univariate time series analysis: AR, ARIMA
  • Vector autoregressive models, VAR and structural VAR
  • Vector error correction model, VECM
  • Exponential smoothing, Holt-Winters
  • Hypothesis tests for time series: unit root, cointegration and others
  • Descriptive statistics and process models for time series analysis
• Survival analysis:
  • Proportional hazards regression (Cox models)
  • Survivor function estimation (Kaplan-Meier)
  • Cumulative incidence function estimation
• Multivariate:
  • Principal Component Analysis with missing data
  • Factor Analysis with rotation
  • MANOVA
  • Canonical Correlation
• Nonparametric statistics: Univariate and multivariate kernel density estimators
• Datasets: Datasets used for examples and in testing
• Statistics: a wide range of statistical tests
  • Diagnostics and specification tests
  • Goodness-of-fit and normality tests
  • Functions for multiple testing
• Imputation with MICE, regression on order statistic and Gaussian imputation
• Mediation analysis
• Graphics includes plot functions for visual analysis of data and model results
• I/O
  • Table output to ascii, latex, and html
• Miscellaneous models
• Sandbox: statsmodels contains a sandbox folder with code in various stages of development and testing which is not considered "production ready". This covers, among others:
  • Generalized method of moments (GMM) estimators
  • Kernel regression
  • Various extensions to scipy.stats.distributions
  • Panel data models
  • Information theoretic measures

## How to get it

The main branch on GitHub has the most up-to-date code

https://www.github.com/statsmodels/statsmodels

Source downloads of release tags are available on GitHub

https://github.com/statsmodels/statsmodels/tags

Binaries and source distributions are available from PyPI

https://pypi.org/project/statsmodels/

Binaries can be installed in Anaconda

conda install statsmodels

## Installing from sources

See INSTALL.txt for requirements or see the documentation

https://statsmodels.github.io/dev/install.html

## Contributing

Contributions in any form are welcome, including:

• Documentation improvements
• New features to existing models
• New models

See

https://www.statsmodels.org/stable/dev/test_notes

for instructions on installing statsmodels in editable mode.

## License

Modified BSD (3-clause)

## Discussion and Development

Discussions take place on the mailing list and in the issue tracker. We are very interested in feedback about usability and suggestions for improvements.

## Bug Reports

Bug reports can be submitted to the issue tracker at

https://github.com/statsmodels/statsmodels/issues

• #### Gam gsoc2015

@josef-pkt I am starting a PR. At the moment there is the gam file, which contains the gam penalty class. Smooth_basis contains some functions to get B-spline and polynomial bases. This file will be removed once we are able to use patsy directly.

There are also 2 files that contain examples or small scripts. We will remove them later.

Let me know what you think about that.

Todo

• [ ] `predict` errors (stateful transform, patsy ?), Note fittedvalues are available only a problem in pirls example, predict after fit works, requires spline basis values
• [ ] `get_prediction` also errors, maybe consequence of predict error currently errors because weights is None
• [ ] check test coverage for offset and exposure

Interface

• [ ] `partial_values` `plot_partial` has inconvenient arguments (smoother and mask) instead of column index or term name or similar
• [ ] formula-like interface for predict (create spline basis values internally)
• [ ] adjust inherited methods like `plot_partial_residuals` (after GSOC?)
• [ ] default param names for splines (should be more informative than "xi" (i is range(k)), but less (?) verbose than patsy's), need them for test_terms
type-enh comp-base comp-genmod
opened by DonBeo 232
• #### Multivariate Kalman Filter

Here's a simple branch with the code added into statsmodels.tsa.statespace. A couple of thoughts

• At least in the dev process, I thought it might be nicer to keep it in its own module, rather than putting it with the kalmanf. I don't know what makes the most sense in the long run.
• I have unit tests that rely on the statespace model, but I'm rewriting them so the KF pull request can be done on its own without other dependencies, especially since the statespace model is likely to change.

L72

Question: do we want to keep the single-precision version (and the complex single precision, below)? I don't really see a use case, and it appears from preliminary tests that the results tend to overflow. Maybe I'll post a unit test to demonstrate and we can go from there.

L332

Question: there are a bunch of ways to initialize the KF, depending on the theory underlying the data. This one is only valid for stationary processes. Probably best to move out to the Python wrapper?

L444

Question: I think we'll want to add the ability to specify whether or not to check for convergence and alter the tolerance.

L414

This inversion is using an LU decomposition, but I think in this case I can rely on f to be positive definite since it's the covariance matrix of the forecast error, in which case I could use the much faster Cholesky decomposition approach. This is something I'm looking into, but if you happen to know one way or the other, that would be great too.

Related to this is the idea that you should "never take inverses", and I guess I need to look into replacing this with a linear solver routine if possible, in the updating stage below.
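The idea of solving a linear system through a Cholesky factorization instead of forming an inverse can be sketched in plain Python. This is only an illustration, not the code in the branch: the helper names `cholesky` and `cholesky_solve`, the 2x2 matrix `F`, and the vector `v` are all assumptions made for the example, and it assumes the matrix really is symmetric positive definite.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(d)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def cholesky_solve(a, b):
    """Solve a x = b by forward and back substitution, without forming inv(a)."""
    n = len(a)
    low = cholesky(a)
    # Forward substitution: L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    # Back substitution: L^T x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


# Hypothetical forecast error covariance F and forecast error v
F = [[4.0, 2.0], [2.0, 3.0]]
v = [2.0, 1.0]
x = cholesky_solve(F, v)  # plays the role of inv(F) @ v
```

If the factorization hits a non-positive pivot, the sketch raises an error, which is one way to detect a covariance matrix that is not positive definite in practice.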

comp-tsa type-enh
• #### New kernel_methods module for new KDE implementation

This is a new version of the kernel density estimation. The purpose is to provide an implementation that is faster in the case of grid evaluation, and also works on bounded domains.

There is still some work to do, in particular, I need to add tests for the multi-dimensional and discrete densities.

type-enh comp-nonparametric type-refactor
opened by PierreBdR 183
• #### GSOC2017 Zero-Inflated models

This PR includes the following models:

1. Generic Zero Inflated model
2. Zero-Inflated Poisson model
3. Zero-Inflated Generalized Poisson model (ZIGP-P)
4. Zero-Inflated Generalized Negative Binomial model (ZINB-P)

Each model includes these parts:

• [x] LLF
• [x] Score
• [x] Hessian
• [x] Predict
• [x] Fit
• [x] Docs
• [x] Tests

Status: reviewing; need to implement a better way to generate `start_params`. Last commit for the end of GSoC17: Changed way to find start params

type-enh comp-discrete
opened by evgenyzhurko 170
• #### GSOC2017 Generalized Poisson (GP-P) model

This PR includes an implementation of the Generalized Poisson model. The model includes these parts:

• [x] Log-likelihood function
• [x] Score
• [x] Hessian
• [x] Fit
• [x] Result
• [x] Tests
• [x] Docs

Status - merged #3795

rejected
opened by evgenyzhurko 149
• #### GSoC 2016: State-Space Models with Markov Switching

Hi, I have started implementing the Kim filter and outlined the basic functionality as described in the Kim-Nelson book (see the diagram on p. 105). I haven't even run the code yet to check for errors. Coding style and the class interface bother me more for the moment, as well as the possible ways to test it without implementing models.

comp-tsa type-enh
opened by ValeryTyumen 129
• #### ENH: Revise loglike/deviance to correctly handle scale

xref #3773

I apologize for this taking longer than I had hoped, but my busy period was exacerbated by some unforeseen events. But I'm somewhat back now.

This is the first step to solving the above issue.

Still a lot is missing... I've only really gotten `loglike` to work for all the families except `Gamma`, `Binomial`, `Gaussian`, and `Tweedie` (which has never had `loglike`). I also need to re-work the docstrings. I'm thinking it's more logical to have a `loglike_obs` function called in each family and then have a `loglike` function in the `Family` class so that `loglike` will simply be inherited. I think the docstrings could be elegantly written to handle this too.
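The proposed split between a per-observation `loglike_obs` and an inherited `loglike` might look roughly like the sketch below. It is a simplified illustration rather than the PR's code: the signatures are assumptions, the functions take scalars instead of arrays, and weights are left out entirely.

```python
import math


class Family:
    """Hypothetical base class: subclasses only supply loglike_obs."""

    def loglike_obs(self, endog, mu, scale=1.0):
        raise NotImplementedError

    def loglike(self, endog, mu, scale=1.0):
        # Inherited by every family: sum of the per-observation contributions.
        return sum(self.loglike_obs(y, m, scale) for y, m in zip(endog, mu))


class Poisson(Family):
    def loglike_obs(self, endog, mu, scale=1.0):
        # Poisson log pmf: y*log(mu) - mu - log(y!); scale is unused in this sketch.
        return endog * math.log(mu) - mu - math.lgamma(endog + 1.0)


class Gaussian(Family):
    def loglike_obs(self, endog, mu, scale=1.0):
        # Normal log density with variance equal to scale.
        return -0.5 * (math.log(2.0 * math.pi * scale) + (endog - mu) ** 2 / scale)


ll_poisson = Poisson().loglike([0, 1, 2], [1.0, 1.0, 1.0])
ll_gaussian = Gaussian().loglike([1.0], [0.0], scale=2.0)
```

With this layout only `loglike_obs` needs a family-specific docstring, while the summation logic and its documentation live once in the base class.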

I'm thinking I will work on deviance next and then circle back to `loglike`... I feel like R takes some computational shortcuts...

type-enh comp-genmod type-refactor
• #### WIP/ENH Gam 2744 rebased2

Rebased version of #4575, which was a rebased version of #2744. The original GSOC PR with most of the development discussion is #2435.

Rebase conflict in compat.python: this now has an unneeded `itertools.combinations` import, but I dropped the py 2.6 compat code.

type-enh comp-genmod
opened by josef-pkt 104
• #### ENH: Add var_weights to GLM

Hello,

xref #3371

This (should) get `var_weights` to work for GLM. I added a Tweedie use case against R.

I can't seem to get the Poisson with `var_weights` to `HC0` test that was added (but disabled because of the lack of functionality) to work. The test is here:

It fails on the following `assert`

`assert_allclose(res1.bse, corr_fact * res2.bse, atol= 1e-6, rtol=2e-6)`

It's relatively close... consistently off by a factor of `0.98574`... `corr_fact` brings it much closer... I'm wondering if another adjustment is necessary?
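For a sense of scale, `numpy.testing.assert_allclose` accepts a value when `abs(actual - desired) <= atol + rtol * abs(desired)`. The sketch below applies that rule in plain Python to a discrepancy of the size quoted above. The reference value of `1.0` is a made-up placeholder, not a number from the test, and `within_tolerance` is a helper written only for this illustration.

```python
def within_tolerance(actual, desired, rtol, atol):
    """Elementwise acceptance rule documented for numpy.testing.assert_allclose."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


desired = 1.0                # hypothetical reference standard error
actual = 0.98574 * desired   # off by the factor reported above

relative_error = abs(actual - desired) / abs(desired)
passes = within_tolerance(actual, desired, rtol=2e-6, atol=1e-6)
```

A relative error of roughly 1.4 percent is several orders of magnitude larger than `rtol=2e-6`, so a mismatch of this size points at a missing correction factor rather than numerical noise.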

Honestly, I don't really understand much about sandwiches (except PBJ, meatball, and bánh mì).

The rest of the tests seem to work pretty well.

I'd be happy to have this merged relatively soon, so thanks for the feedback and review!

type-enh comp-genmod topic-weights
• #### [MRG] Add MANOVA class

PR as discussed in #3274. Tested with a SAS example and two R examples, produced the same results.

To-do:

• [x] Core stats computation
• [x] API
• [x] Automatically create dummy variables and hypothesis tests for categorical independent variables
• [x] Input validation
• [x] More examples to be tested

## Unit test in test_MultivariateOLS.py

### compare_r_output_dogs_data()

It reproduces results from the following R code:

``````
library(car)
Drug = c('Morphine', 'Morphine', 'Morphine', 'Morphine', 'Morphine', 'placebo', 'placebo', 'placebo', 'placebo', 'placebo', 'Trimethaphan', 'Trimethaphan', 'Trimethaphan', 'Trimethaphan', 'Trimethaphan')
Depleted = c('N', 'N', 'N', 'N', 'Y', 'Y', 'Y', 'N', 'N', 'N', 'N', 'Y', 'Y', 'Y', 'Y')
subject  = c(1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
Histamine0 = c(-3.218876, -3.912023, -2.65926, -1.771957, -2.302585, -2.65926, -2.995732, -3.506558, -3.506558, -2.65926, -2.407946, -2.302585, -2.525729, -2.040221, -2.813411)
Histamine1 = c(-1.609438, -2.813411, 0.336472, -0.562119, -2.407946, -2.65926, -2.65926, -0.478036, 0.04879, -0.18633, 1.141033, -2.407946, -2.407946, -2.302585, -2.995732)
Histamine3 = c(-2.302585, -3.912023, -0.733969, -1.049822, -2.040221, -2.813411, -2.813411, -1.171183, -0.314711, 0.067659, 0.722706, -2.407946, -2.407946, -2.120264, -2.995732)
Histamine5 = c(-2.525729, -3.912023, -1.427116, -1.427116, -1.966113, -2.65926, -2.65926, -1.514128, -0.510826, -0.223144, 0.207014, -2.525729, -2.302585, -2.120264, -2.995732)
data = data.frame(Histamine0, Histamine1, Histamine3, Histamine5)
hismat = as.matrix(data[,1:4])
result = lm(hismat ~ Drug * Depleted)
linearHypothesis(result, c(1, 0, 0, 0, 0, 0))
linearHypothesis(result, t(cbind(c(0, 1, 0, 0, 0, 0), c(0, 0, 1, 0, 0, 0))))
linearHypothesis(result, c(0, 0, 0, 1, 0, 0))
linearHypothesis(result, t(cbind(c(0, 0, 0, 0, 1, 0), c(0, 0, 0, 0, 0, 1))))
# Or ManRes <- Manova(result, type="III")
``````

### test_affine_hypothesis()

It reproduces results from the following R code:

``````
result = lm(hismat ~ Drug*Depleted)
fml = t(cbind(c(0, 1.1, 1.2, 1.3, 1.4, 1.5), c(0, 2.1, 3.2, 3.3, 4.4, 5.5)))
linearHypothesis(result, fml, t(cbind(c(1, 2, 3, 4), c(5, 6, 7, 8))), verbose=TRUE)
``````
type-enh comp-multivariate
opened by yl565 101
• #### Distributed estimation

OK, I wanted to make a PR for this. There are still a couple of things that I need to fix, but things are pretty close to done and I'm happy with the current approach.

The key part of the current approach is a function `distributed_estimation`. This function works by taking a generator for endog and exog, `endog_generator` and `exog_generator`, as well as a series of functions and keyword arguments to be run on each machine and then used to recombine the results. The generator approach allows for a variety of use cases and can handle a lot of the ideas discussed in the initial proposal. For each data set yielded by the generators, a model is initialized using `model_class` and `init_kwds`, and then the function `estimation_method` is applied to the model along with the keywords `fit_kwds` and `estimation_kwds`. Finally, the results are recombined from each data set using `join_method`.

Currently, this defaults to the distributed regularized approach discussed here:

http://arxiv.org/pdf/1503.04337v3.pdf

but the way I've set things up means that the user should be able to apply any number of procedures here.
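The flow described above (generators yield partitions, a model is built per partition, an estimation function runs on each, and a join function recombines the results) can be sketched as follows. This is a toy stand-in and not the PR's implementation: the signature is an assumption, `SlopeModel`, `chunks`, `fit_partition`, and `average` are hypothetical helpers, the loop runs sequentially rather than on separate machines, and the join is a naive average rather than the debiased regularized procedure from the paper.

```python
def chunks(values, size):
    """Yield consecutive slices; stands in for endog_generator / exog_generator."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


class SlopeModel:
    """Hypothetical toy model: y = beta * x with no intercept."""

    def __init__(self, endog, exog):
        self.endog = endog
        self.exog = exog

    def fit(self):
        sxy = sum(x * y for x, y in zip(self.exog, self.endog))
        sxx = sum(x * x for x in self.exog)
        return sxy / sxx


def fit_partition(model, fit_kwds=None):
    """estimation_method: what would run on each machine or partition."""
    return model.fit(**(fit_kwds or {}))


def average(results):
    """join_method: naive averaging of the per-partition estimates."""
    return sum(results) / len(results)


def distributed_estimation(endog_generator, exog_generator, model_class,
                           init_kwds=None, estimation_method=fit_partition,
                           fit_kwds=None, estimation_kwds=None,
                           join_method=average):
    init_kwds = init_kwds or {}
    estimation_kwds = estimation_kwds or {}
    results = []
    # Sequential stand-in for work that could be farmed out in parallel.
    for endog, exog in zip(endog_generator, exog_generator):
        model = model_class(endog, exog, **init_kwds)
        results.append(estimation_method(model, fit_kwds=fit_kwds, **estimation_kwds))
    return join_method(results)


exog = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
endog = [2.0 * x for x in exog]
beta = distributed_estimation(chunks(endog, 2), chunks(exog, 2), SlopeModel)
```

Because the estimation and join steps are plain callables, swapping in a different procedure only requires passing other functions for `estimation_method` and `join_method`.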

The current todo list:

• [x] Fix hess_obs
• [x] Fix joblib fit
• [x] Add WLS/GLS for debiased regularized fit

Let me know if there are any comments/questions/criticisms. As I've said, it certainly isn't complete, but I wanted to get this out there so I can start integrating any changes as I finish it up.

type-enh comp-base comp-genmod comp-regression
opened by lbybee 85
• #### ENH: GEE add use_t option for cov_params

GEE does not have a `use_t` option: `use_t = False` is hardcoded in summary, and there is no fit option.

Monte Carlo comparison: they use both t and z/normal based tests and p-values. The t-test can be conservative with some of the bias corrections and a small number of clusters.

Also, we might add more options for small-sample adjusted, bias-reduced cov_types related to variants of cluster correlation robust sandwiches, CR3, ...

Li, Peng, and David T. Redden. “Small Sample Performance of Bias-Corrected Sandwich Estimators for Cluster-Randomized Trials with Binary Outcomes.” Statistics in Medicine 34, no. 2 (2015): 281–96. https://doi.org/10.1002/sim.6344

reference found in Huang, Shuang, Mallorie H Fiero, and Melanie L Bell. “Generalized Estimating Equations in Cluster Randomized Trials with a Small Number of Clusters: Review of Practice and Simulation Study.” Clinical Trials 13, no. 4 (August 1, 2016): 445–49. https://doi.org/10.1177/1740774516643498.

also compares several small sample corrections

type-enh comp-genmod topic-covtype
opened by josef-pkt 0
• #### ENH: hc3 cov_type for GMM, GMMIVNonlinear

this article has pseudo-hatmatrix for small sample corrections for HC2, HC3, ...

moment condition `z (y - g(x, beta))` i.e. nonlinear IV, but not for generic nonlinear GMM

Lin, Eric S., and Ta-Sheng Chou. “Finite-Sample Refinement of GMM Approach to Nonlinear Models under Heteroskedasticity of Unknown Form.” Econometric Reviews 37, no. 1 (January 2, 2018): 1–28. https://doi.org/10.1080/07474938.2014.999499.

I did not find any HC3 equivalent for generic GMM, and nothing for more general non-gaussian MLE. (Poisson, GLM use the IRLS, WLS hatmatrix)

type-enh comp-regression topic-covtype
opened by josef-pkt 0
• #### ENH: cluster bootstrap using weights

parking a reference for a useful approach to cluster bootstrap, "resampling" clusters

Instead of actually resampling clusters which would result in different nobs in replication samples, we can use weights by cluster.

Without a computational shortcut, we can just use freq_weights in GLM or cluster weights in GEE and compute the full model for each replication sample. I don't know how much we gain in computational efficiency if we just adjust weights in a single model instance. We might only save on model `__init__` setup cost.
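A minimal sketch of the weighting idea, using only the standard library: clusters are drawn with replacement, and each observation receives a weight equal to the number of times its cluster was drawn, so the data arrays keep the same length in every replication. The helper names are hypothetical, and a weighted mean stands in for refitting a GLM with freq_weights or a GEE with cluster weights.

```python
import random
from collections import Counter


def cluster_bootstrap_weights(groups, rng):
    """Return one weight per observation from resampling clusters with replacement."""
    clusters = sorted(set(groups))
    # Draw as many clusters as there are distinct clusters.
    draws = Counter(rng.choice(clusters) for _ in clusters)
    # An observation's weight is how often its cluster was drawn (0 if never).
    return [draws[g] for g in groups]


def weighted_mean(values, weights):
    """Toy estimator standing in for a weighted model fit."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


rng = random.Random(0)
groups = ["a", "a", "b", "b", "b", "c"]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

replicates = []
for _ in range(200):
    w = cluster_bootstrap_weights(groups, rng)
    replicates.append(weighted_mean(y, w))
```

The spread of `replicates` then serves as the bootstrap distribution of the estimator. Observations in clusters that were not drawn simply get weight zero instead of being dropped from the sample.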

Cheng, Guang, Zhuqing Yu, and Jianhua Z. Huang. “The Cluster Bootstrap Consistency in Generalized Estimating Equations.” Journal of Multivariate Analysis 115 (March 1, 2013): 33–47. https://doi.org/10.1016/j.jmva.2012.09.003.

type-enh comp-base comp-genmod topic-covtype
opened by josef-pkt 1
• #### ENH: GEE for distributions that are not LEF/GLM-families

parking references

we already have several issues for variation on GEE/GLM including variance modeling #2898, #5674

Here the extension is to use "standard" GEE but with underlying score functions that are not coming from a LEF family, e.g. zero-inflated count models.

How do we reuse the pattern for GEE estimation with working correlation for score functions or moment conditions that do not come from GLM (mean, variance functions)? I don't know what the theoretical properties are. Non-LEF, non-GLM do not have the consistency properties robust to misspecification.

Hall, Daniel B, and Zhengang Zhang. “Marginal Models for Zero Inflated Clustered Data.” Statistical Modelling 4, no. 3 (October 1, 2004): 161–80. https://doi.org/10.1191/1471082X04st076oa.

Kong, Maiying, Sheng Xu, Steven M. Levy, and Somnath Datta. “GEE Type Inference for Clustered Zero-Inflated Negative Binomial Regression with Application to Dental Caries.” Computational Statistics & Data Analysis 85 (May 1, 2015): 54–66. https://doi.org/10.1016/j.csda.2014.11.014.

type-enh comp-genmod comp-othermod
opened by josef-pkt 0
• #### Improvements to seasonal_decompose: specifying frequency and filling NaN

The `statsmodels.tsa.seasonal.seasonal_decompose` method throws errors for not specifying a frequency and having NaN values, which can be conveniently addressed within the method by inferring frequencies and `dropna()` respectively.

#### Describe the solution you'd like

Quality of life improvements: include the relevant pandas transformations within the `seasonal_decompose` method. Include detailed error messages when the DatetimeIndex is broken, etc.

#### Describe alternatives you have considered

NIL

I can work on this, but I would like to understand reasons behind the current implementation first.

opened by statistactics 0
• #### v0.13.5(Nov 2, 2022)

The statsmodels developers are happy to announce the Python 3.11 compatibility release for the 0.13 branch.

This release contains no bug fixes other than any needed to ensure statsmodels is compatible with Python 3.11. It also resolves an issue with PyPI that affects 0.13.4.

• #### v0.13.4(Nov 1, 2022)

The statsmodels developers are happy to announce the Python 3.11 compatibility release for the 0.13 branch. This release contains no bug fixes other than any needed to ensure statsmodels is compatible with Python 3.11. It also resolves an issue with the source code generation in 0.13.3 that affects installs on Python 3.11 that use the source tarball.

• #### v0.13.3(Nov 1, 2022)

The statsmodels developers are happy to announce the Python 3.11 compatibility release for the 0.13 branch. This release contains no bug fixes other than any needed to ensure statsmodels is compatible with Python 3.11.

• #### v0.13.2(Feb 8, 2022)

The statsmodels developers are happy to announce the bugfix release for the 0.13 branch. This release fixes 10 bugs and provides protection against changes in recent versions of upstream packages.

• #### v0.13.1(Nov 12, 2021)

The statsmodels developers are happy to announce the bug fix release for the 0.13 branch. This release fixes 8 bugs and brings initial support for Python 3.10.

• #### v0.13.0(Oct 1, 2021)

The statsmodels developers are happy to announce release 0.13.0. 227 issues were closed in this release and 143 pull requests were merged. Major new features include:

• Autoregressive Distributed Lag models
• Copulas
• Ordered Models (Ordinal Regression)
• Beta Regression
• Improvements to ARIMA estimation options
• #### v0.13.0rc0(Sep 17, 2021)

The statsmodels developers are happy to announce the first release candidate for 0.13.0. 227 issues were closed in this release and 143 pull requests were merged. Major new features include:

• Autoregressive Distributed Lag models
• Copulas
• Ordered Models (Ordinal Regression)
• Beta Regression
• Improvements to ARIMA estimation options
• #### v0.12.2(Feb 2, 2021)

This is a bug-fix release from the 0.12.x branch. Users are encouraged to upgrade.

Notable changes include fixes for a bug that could lead to incorrect results in forecasts with the new ARIMA model (when `d > 0` and `trend='t'`) and a bug in the LM test for autocorrelation.


• #### v0.12.0(Aug 27, 2020)

The statsmodels developers are happy to announce release 0.12.0. 239 issues were closed in this release and 221 pull requests were merged.

Major new features include:

• New exponential smoothing model: ETS (Error, Trend, Seasonal)
• New dynamic factor model for large datasets and monthly/quarterly mixed frequency models
• Decomposition of forecast updates based on the "news"
• Sparse Cholesky Simulation Smoother
• Option to use Chandrasekhar recursions
• Two popular methods for forecasting time series, forecasting after STL decomposition and the Theta model
• Functions for constructing complex Deterministic Terms in time series models
• New statistics function: one-way ANOVA-type tests, hypothesis tests for 2-samples and meta-analysis.
• #### v0.12.0rc0(Aug 11, 2020)

The statsmodels developers are happy to announce the first release candidate for 0.12.0. 223 issues were closed in this release and 208 pull requests were merged. Major new features include:

• New exponential smoothing model: ETS (Error, Trend, Seasonal)
• New dynamic factor model for large datasets and monthly/quarterly mixed frequency models
• Decomposition of forecast updates based on the "news"
• Sparse Cholesky Simulation Smoother
• Option to use Chandrasekhar recursions
• Two popular methods for forecasting time series, forecasting after STL decomposition and the Theta model
• Functions for constructing complex Deterministic Terms in time series models

• #### v0.11.0(Jan 22, 2020)

statsmodels developers are happy to announce a new release.

Major new features include:

• Regression
  • Rolling OLS and WLS
• Statistics
  • Oaxaca-Blinder decomposition
  • Distance covariance measures (new in RC2)
  • New regression diagnostic tools (new in RC2)
• Statespace Models
  • Statespace-based Linear exponential smoothing models
  • Methods to apply parameters fitted on one dataset to another dataset
  • Method to hold some parameters fixed at known values
  • Option for low memory operations
  • Improved simulation and impulse responses for time-varying models
• Time-Series Analysis
  • STL Decomposition
  • New AR model
  • New ARIMA model
  • Zivot-Andrews Test
  • More robust regime-switching models
• #### v0.11.0rc2(Jan 15, 2020)

The second and final release candidate for statsmodels 0.11.

Major new features include:

• Regression
  • Rolling OLS and WLS
• Statistics
  • Oaxaca-Blinder decomposition
  • Distance covariance measures (new in RC2)
  • New regression diagnostic tools (new in RC2)
• Statespace Models
  • Statespace-based Linear exponential smoothing models
  • Methods to apply parameters fitted on one dataset to another dataset
  • Method to hold some parameters fixed at known values
  • Option for low memory operations
  • Improved simulation and impulse responses for time-varying models
• Time-Series Analysis
  • STL Decomposition
  • New AR model
  • New ARIMA model
  • Zivot-Andrews Test
  • More robust regime-switching models
• #### v0.11.0rc1(Dec 18, 2019)

Release candidate for statsmodels 0.11.

Major new features include:

• Regression
  • Rolling OLS and WLS
• Statistics
  • Oaxaca-Blinder decomposition
• Statespace Models
  • Statespace-based Linear exponential smoothing models
  • Methods to apply parameters fitted on one dataset to another dataset
  • Method to hold some parameters fixed at known values
  • Option for low memory operations
  • Improved simulation and impulse responses for time-varying models
• Time-Series Analysis
  • STL Decomposition
  • New AR model
  • New ARIMA model
  • Zivot-Andrews Test
  • More robust regime switching models
• #### v0.10.2(Nov 23, 2019)

This is a minor release from the 0.10.x branch with bug fixes and essential maintenance only. The key new feature is:

• Compatibility with Python 3.8
• #### v0.10.1(Jul 19, 2019)

This is a minor release from the 0.10.x branch with bug fixes and essential maintenance only. The key features are:

• Compatibility with pandas 0.25
• Compatibility with Numpy 1.17
• #### v0.10.0(Jun 24, 2019)

This is a major release from 0.9.0 and includes a number of new statistical models and many bug fixes.

Highlights include:

• Generalized Additive Models. This major feature is experimental and may change.
• Conditional Models such as ConditionalLogit, which are known as fixed effect models in Econometrics.
• Dimension Reduction Methods include Sliced Inverse Regression, Principal Hessian Directions and Sliced Avg. Variance Estimation
• Regression using Quadratic Inference Functions (QIF)
• Gaussian Process Regression

See the release notes for a full list of all the changes from 0.9.0.

python -m pip install --upgrade statsmodels

Note that 0.10.x will likely be the last series of releases to support Python 2, so please consider upgrading to Python 3 if feasible.

Please report any issues with the release candidate on the statsmodels issue tracker.


• #### v0.8.0rc1(Jun 21, 2016)
