SparseLasso: Sparse Solutions for the Lasso

Overview

SparseLasso: Sparse Solutions for the Lasso

Introduction

SparseLasso provides a Scikit-Learn based estimation of the Lasso with cross-validation tuning for the penalty choice using the 'one standard error' rule to yield sparse solutions. The 'one standard error' rule recognizes the fact that the cross-validation path is estimated with error and selects the more parsimonious model (see Hastie, Tibshirani and Friedman, 2009). This rule thus chooses the largest possible penalty which is still within the one standard error of the cross-validation optimal value. Given that the Lasso often selects too many variables in practice, the one standard error rule provides a practical solution to yield sparser models. The software implementation of this rule is readily available in the R-package 'glmnet' (Friedman, Hastie and Tibshirani, 2010), however, it is absent from the Scikit-Learn module (Pedregosa et al., 2011). SparseLasso provides estimation of the penalized linear and logistic model based on Scikit-Learn's LassoCV and LogisticRegressionCV, respectively and thus accepts the standard Scikit-Learn arguments.

Installation

SparseLasso module relies on Python 3 and is based on the scikit-learn module. The required modules can be installed by navigating to the root of this project and executing the following command: pip install -r requirements.txt.

Example

The example below demonstrates the basic usage of the SparseLasso module.

# import modules
import pandas as pd
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# import SparseLasso
from sparse_lasso import SparseLassoCV

# simulate some example data for the linear model
X, y, coef = make_regression(n_samples=1000,
                             n_features=100, 
                             n_informative=10,
                             noise=10,
                             coef=True,
                             random_state=0)

# estimate standard LassoCV with optimal lambda minimizing error
lasso_min = LassoCV(n_alphas=100, cv=10).fit(X=X, y=y)

# estimate SparseLassoCV with lambda using 1 standard error rule
lasso_1se = SparseLassoCV(n_alphas=100, cv=10).fit(X=X, y=y)

# compare the penalty values
print('Lasso Min Penalty: ', round(lasso_min.alpha_, 2), '\n',
      'Lasso 1se Penalty: ', round(lasso_1se.alpha, 2), '\n')

# compare the number of selected features
print('Lasso Min Number of Selected Variables:     ',
      np.sum((lasso_min.coef_ != 0) * 1), '\n',
      'Lasso 1se Number of Selected Variables:     ',
      np.sum((lasso_1se.coef_ != 0) * 1), '\n')

For a more detailed example see the sparse_lasso_example.py as well as the sparse_lasso_simulation.py for a simulation exercise comparing the optimal cross-validation penalty choice with the one standard error rule for variable selection.

References

  • Hastie, Trevor, Robert Tibshirani, and J H. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. , 2009. Print.
  • Friedman, Jerome, Trevor Hastie, and Rob Tibshirani. "Regularization paths for generalized linear models via coordinate descent." Journal of statistical software 33.1 (2010): 1.
  • Pedregosa, Fabian, et al. "Scikit-learn: Machine learning in Python." the Journal of machine Learning research 12 (2011): 2825-2830.
You might also like...
Chameleon is yet another PowerShell obfuscation tool designed to bypass AMSI and commercial antivirus solutions.

Chameleon is yet another PowerShell obfuscation tool designed to bypass AMSI and commercial antivirus solutions. The tool has been developed as a Python port of the Chimera project, by tokioneon_.

Rename and categorize your DMOJ solutions

DMOJ Downloader What is this for? DMOJ lets you download the code for all your solutions, however the files are just named as numbers

Google Foobar challenge solutions from my experience and other's on the web.
Google Foobar challenge solutions from my experience and other's on the web.

Google Foobar challenge Google Foobar challenge solutions from my experience and other's on the web. Note: Problems indicated with "Mine" are tested a

LeetCode Solutions https://t.me/tenvlad

leetcode LeetCode Solutions groupped by common patterns YouTube: https://www.youtube.com/c/vladten Telegram: https://t.me/nilinterface Problems source

Automatically find solutions when your Python code encounters an issue.
Automatically find solutions when your Python code encounters an issue.

What The Python?! Helping you find answers to the errors Python spits out. Installation You can find the source code on GitHub at: https://github.com/

MegFlow - Efficient ML solutions for long-tailed demands.

Efficient ML solutions for long-tailed demands.

Python solutions to Codeforces problems
Python solutions to Codeforces problems

CodeForces This repository is dedicated to my Python solutions for CodeForces problems. Feel free to copy, contribute and/or comment. If you find any

🏃 Python Solutions of All Problems in FHC 2021 (In Progress)

FacebookHackerCup-2021 Python solutions of Facebook Hacker Cup 2021. Solution begins with * means it will get TLE in the largest data set (total compu

This repository is for adding codes of data structures and algorithms, leetCode, hackerrank etc solutions in different languages

DSA-Code-Snippet This repository is for adding codes of data structures and algorithms, leetCode, hackerrank etc solutions in different languages Cont

Leetcode solutions - All algorithms implemented in Python 3 (for education)

Leetcode solutions - All algorithms implemented in Python 3 (for education)

Solutions of Reinforcement Learning 2nd Edition

Solutions of Reinforcement Learning, An Introduction

Google Landmark Recogntion and Retrieval 2021 Solutions

Google Landmark Recogntion and Retrieval 2021 Solutions In this repository you can find solution and code for Google Landmark Recognition 2021 and Goo

Solutions to all 6 programming assignments in Dan Boneh's course Cryptography I, in statically typed Python.

Solutions to Cryptography I programming exercises Dan Boneh from Stanford University has an excellent online course on cryptography, hosted on Courser

PyTorch framework A simple and complete framework for PyTorch, providing a variety of data loading and simple task solutions that are easy to extend and migrate

PyTorch framework A simple and complete framework for PyTorch, providing a variety of data loading and simple task solutions that are easy to extend and migrate

This Repository consists of my solutions in Python 3 to various problems in Data Structures and Algorithms
This Repository consists of my solutions in Python 3 to various problems in Data Structures and Algorithms

Problems and it's solutions. Problem solving, a great Speed comes with a good Accuracy. The more Accurate you can write code, the more Speed you will

Here You will Find CodeChef Challenge Solutions
Here You will Find CodeChef Challenge Solutions

Here You will Find CodeChef Challenge Solutions

These are my solutions to Advent of Code problems.

Advent of Code These are my solutions to Advent of Code problems. If you want to join my leaderboard, the code is 540750-9589f56d. When I solve for sp

Simple AoC helper program you can use to develop your own solutions in python.

AoC-Compabion Simple AoC helper program you can use to develop your own solutions in python. Simply install it in your python environment using pip fr

"Cambio de monedas" Change-making problem with Python, dynamic programming best solutions,

Change-making-problem / Cambio de monedas Entendiendo el problema Dada una cantidad de dinero y una lista de denominaciones de monedas, encontrar el n

Owner
Gabriel Okasa
PhD Candidate in Econometrics at the University of St.Gallen, Switzerland
Gabriel Okasa
SparseLasso: Sparse Solutions for the Lasso

SparseLasso: Sparse Solutions for the Lasso Introduction SparseLasso provides a Scikit-Learn based estimation of the Lasso with cross-validation tunin

Gabriel Okasa 1 Nov 8, 2021
Differentiable Neural Computers, Sparse Access Memory and Sparse Differentiable Neural Computers, for Pytorch

Differentiable Neural Computers and family, for Pytorch Includes: Differentiable Neural Computers (DNC) Sparse Access Memory (SAM) Sparse Differentiab

ixaxaar 302 Dec 14, 2022
My Advent of Code solutions. I also upload videos of my solves: https://www.youtube.com/channel/UCuWLIm0l4sDpEe28t41WITA

My solutions to adventofcode.com puzzles. I post videos of me solving the puzzles in real-time at https://www.youtube.com/channel/UCuWLIm0l4sDpEe28t41

null 195 Jan 4, 2023
OpenShot Video Editor is an award-winning free and open-source video editor for Linux, Mac, and Windows, and is dedicated to delivering high quality video editing and animation solutions to the world.

OpenShot Video Editor is an award-winning free and open-source video editor for Linux, Mac, and Windows, and is dedicated to delivering high quality v

OpenShot Studios, LLC 3.1k Jan 1, 2023
Experimental solutions to selected exercises from the book [Advances in Financial Machine Learning by Marcos Lopez De Prado]

Advances in Financial Machine Learning Exercises Experimental solutions to selected exercises from the book Advances in Financial Machine Learning by

Brian 1.4k Jan 4, 2023
Providing the solutions for high-frequency trading (HFT) strategies using data science approaches (Machine Learning) on Full Orderbook Tick Data.

Modeling High-Frequency Limit Order Book Dynamics Using Machine Learning Framework to capture the dynamics of high-frequency limit order books. Overvi

Chang-Shu Chung 1.3k Jan 7, 2023
Exact Pareto Optimal solutions for preference based Multi-Objective Optimization

Exact Pareto Optimal solutions for preference based Multi-Objective Optimization

Debabrata Mahapatra 40 Dec 24, 2022
A collection of write-ups and solutions for Cyber FastTrack Spring 2021.

IMPORTANT: Please contact us before you use any styling or content shown here! Cyber FastTrack Spring 2021 / National Cyber Scholarship Competition -

Alice 48 Aug 28, 2022
BMW TechOffice MUNICH 148 Dec 21, 2022
🏅 The Most Comprehensive List of Kaggle Solutions and Ideas 🏅

?? Collection of Kaggle Solutions and Ideas ??

Farid Rashidi 2.3k Jan 8, 2023