LOFO (Leave One Feature Out) Importance

Ahmet Erdem

Last update: Dec 23, 2022

Overview

LOFO (Leave One Feature Out) Importance calculates the importances of a set of features, for a model and a metric of choice, by iteratively removing each feature from the set and evaluating the performance of the model under a validation scheme of choice.

LOFO first evaluates the performance of the model with all the input features included, then iteratively removes one feature at a time, retrains the model, and evaluates its performance on a validation set. The mean and standard deviation (across the folds) of the importance of each feature are then reported.

If a model is not passed as an argument to LOFO Importance, it will run LightGBM as a default model.
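The procedure described above can be sketched with scikit-learn alone. This is a minimal illustration, not the package's implementation: the function name lofo_sketch and the synthetic data are hypothetical, and it assumes that a feature's importance is the per-fold drop in score when that feature is left out.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def lofo_sketch(model, X, y, feature_names, cv, scoring):
    """Return {feature: (mean, std)} of the per-fold score drop when it is removed."""
    # Baseline: cross-validated score with all features included.
    base_scores = cross_val_score(clone(model), X, y, cv=cv, scoring=scoring)
    result = {}
    for i, name in enumerate(feature_names):
        # Remove one feature, retrain and re-evaluate on the same folds.
        X_without = np.delete(X, i, axis=1)
        scores = cross_val_score(clone(model), X_without, y, cv=cv, scoring=scoring)
        drop = base_scores - scores  # assumed definition of importance
        result[name] = (float(drop.mean()), float(drop.std()))
    return result


# Hypothetical data: only the first column carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

# A fixed random_state makes every cross_val_score call use the same folds.
cv = KFold(n_splits=4, shuffle=True, random_state=0)
importances = lofo_sketch(LinearRegression(), X, y,
                          ["signal", "noise_a", "noise_b"], cv, "r2")
for name, (mean, std) in importances.items():
    print(f"{name}: {mean:.4f} +/- {std:.4f}")
```

In this sketch a feature the model can do without gets an importance near zero, and a feature whose presence hurts the validation score would get a negative value.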

Install

LOFO Importance can be installed using

pip install lofo-importance

Advantages of LOFO Importance

LOFO has several advantages compared to other importance types:

It does not favor granular features
It generalises well to unseen test sets
It is model agnostic
It gives negative importance to features that hurt performance upon inclusion
It can group features, which is especially useful for high-dimensional features such as TF-IDF or one-hot encoded (OHE) features.
It can automatically group highly correlated features to avoid underestimating their importance.
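To illustrate the last point, here is a small sketch of why leaving out correlated features one at a time can underestimate their importance, and how leaving them out as a group avoids that. The data, the helper score_without and the choice of grouping are hypothetical; the package's own grouping rule is not shown here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def score_without(X, y, dropped, cv):
    """Mean cross-validated R^2 after removing the given column indices."""
    kept = [j for j in range(X.shape[1]) if j not in dropped]
    return cross_val_score(LinearRegression(), X[:, kept], y, cv=cv, scoring="r2").mean()


# Hypothetical data: column 1 is a near copy of column 0, column 2 is noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=300)
near_copy = signal + 0.01 * rng.normal(size=300)
noise = rng.normal(size=300)
X = np.column_stack([signal, near_copy, noise])
y = 2.0 * signal + 0.1 * rng.normal(size=300)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
baseline = score_without(X, y, [], cv)

# Leaving out only one of the two correlated columns barely changes the score,
# because the remaining near copy carries the same information.
single_importance = baseline - score_without(X, y, [0], cv)

# Leaving out both columns together reveals how much the model relies on them.
group_importance = baseline - score_without(X, y, [0, 1], cv)

print(f"single: {single_importance:.4f}, group: {group_importance:.4f}")
```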

Example on Kaggle's Microsoft Malware Prediction Competition

In this Kaggle competition, Microsoft provides a malware dataset to predict whether or not a machine will soon be hit with malware. One of the features, Census_OSVersion, is very predictive on the training set, since some OS versions are probably more prone to bugs and failures than others. However, when the data is split out of time, the validation sets contain OS versions that do not occur in the training set, so the model cannot have learned the relationship between the target and this time-dependent feature. Other importance types, which evaluate importance using only the training set, assign Census_OSVersion a high importance. LOFO Importance depends on a validation scheme, so it gives this feature not just a low importance but a negative one.

import pandas as pd
from sklearn.model_selection import KFold
from lofo import LOFOImportance, Dataset, plot_importance
%matplotlib inline

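# import data (dtypes is assumed to be a user-defined dict mapping column names to pandas dtypes)
train_df = pd.read_csv("../input/train.csv", dtype=dtypes)

# extract a sample of the data and sort it by version for the time-split validation
sample_df = train_df.sample(frac=0.01, random_state=0)
sample_df.sort_values("AvSigVersion", inplace=True)

# define the validation scheme; no shuffling, to keep the time ordering
# (scikit-learn raises an error if random_state is set while shuffle=False)
cv = KFold(n_splits=4, shuffle=False)

# define the binary target and the features
target = "HasDetections"
dataset = Dataset(df=sample_df, target=target, features=[col for col in train_df.columns if col != target])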

# define the validation scheme and scorer. The default model is LightGBM
lofo_imp = LOFOImportance(dataset, cv=cv, scoring="roc_auc")

# get the mean and standard deviation of the importances in pandas format
importance_df = lofo_imp.get_importance()

# plot the means and standard deviations of the importances
plot_importance(importance_df, figsize=(12, 20))
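The effect of the unshuffled split can be seen on a toy example. The sketch below uses made-up, already sorted version labels (not the competition data) and shows that, without shuffling, each validation fold is a contiguous block that can contain values never seen in the corresponding training folds.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical, already time-sorted version labels: newer versions appear later.
versions = np.array(["v1", "v1", "v2", "v2", "v3", "v3", "v4", "v4"])

# Without shuffling, every validation fold is a contiguous block of rows.
cv = KFold(n_splits=4, shuffle=False)

unseen_per_fold = []
for train_idx, valid_idx in cv.split(versions):
    # Versions present in the validation fold but absent from the training folds.
    unseen = set(versions[valid_idx]) - set(versions[train_idx])
    unseen_per_fold.append(sorted(str(v) for v in unseen))
    print("validation rows:", valid_idx, "unseen versions:", unseen_per_fold[-1])
```

A model cannot have learned anything about such unseen values, which is why a validation-based importance can rate the feature low or even negative.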

Another Example: Kaggle's TReNDS Competition

In this Kaggle competition, participants are asked to predict some cognitive properties of patients. Independent component features (IC) from sMRI and very high-dimensional correlation features (FNC) from 3D fMRIs are provided. LOFO can group the fMRI correlation features into one.

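from sklearn.linear_model import Ridge

# df, loading_features and fnc_features are assumed to be defined beforehand
def get_lofo_importance(target):
    cv = KFold(n_splits=7, shuffle=True, random_state=17)

    # keep only the rows where the target is known
    labeled_df = df[df[target].notnull()]
    dataset = Dataset(df=labeled_df, target=target, features=loading_features,
                      feature_groups={"fnc": labeled_df[fnc_features].values})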

    model = Ridge(alpha=0.01)
    lofo_imp = LOFOImportance(dataset, cv=cv, scoring="neg_mean_absolute_error", model=model)

    return lofo_imp.get_importance()

plot_importance(get_lofo_importance(target="domain1_var1"), figsize=(8, 8), kind="box")

FLOFO Importance

If running the LOFO Importance package is too time-consuming for you, you can use Fast LOFO. Fast LOFO, or FLOFO, takes an already trained model and a validation set as inputs. It pseudo-randomly permutes the values of each feature, one at a time, and uses the trained model to make predictions on the permuted validation set. The FLOFO importance of a feature is the resulting difference in the model's performance on the validation set, averaged over several randomised permutations.

FLOFO differs from permutation importance in that the permutations of a feature's values are done within groups, where the groups are obtained by grouping the validation set by k=2 features. These k features are chosen at random n=10 times, and the mean and standard deviation of the FLOFO importance are calculated over these n runs. The grouping improves the importance measure because permuting a feature's values is no longer completely random: the permutations happen within groups of similar samples, so they are equivalent to adding noise to the samples. This ensures that:

The permuted feature values are very unlikely to be replaced by unrealistic values.
A feature that is predictable by features among the chosen n*k features will be replaced by very similar values during permutation. Therefore, it will only slightly affect the model performance (and will yield a small FLOFO importance). This solves the correlated feature overestimation problem.
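The within-group permutation can be sketched as follows. This is an illustration under stated assumptions, not the package's FLOFO code: the function flofo_sketch, its parameters and the synthetic data are hypothetical, and numeric features are cut into quantile bins (n_bins) so that rows sharing the same bins count as similar samples.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score


def flofo_sketch(model, valid_df, target, features, score_fn, k=2, n=10, n_bins=5, seed=0):
    """Within-group permutation importance on a validation set (illustrative only)."""
    rng = np.random.default_rng(seed)
    base_score = score_fn(valid_df[target], model.predict(valid_df[features]))
    # Assumption: groups are defined on quantile-binned copies of the features.
    binned = valid_df[features].apply(
        lambda col: pd.qcut(col, n_bins, labels=False, duplicates="drop")
    )
    rows = []
    for feature in features:
        others = [f for f in features if f != feature]
        drops = []
        for _ in range(n):
            # Pick k grouping features at random for this run.
            picked = rng.choice(len(others), size=k, replace=False)
            keys = [binned[others[j]] for j in picked]
            # Permute the feature only among rows that fall in the same group.
            values = valid_df[feature].to_numpy().copy()
            for positions in valid_df.groupby(keys).indices.values():
                values[positions] = rng.permutation(values[positions])
            permuted_df = valid_df.assign(**{feature: values})
            permuted_score = score_fn(permuted_df[target],
                                      model.predict(permuted_df[features]))
            drops.append(base_score - permuted_score)
        rows.append({"feature": feature,
                     "importance_mean": float(np.mean(drops)),
                     "importance_std": float(np.std(drops))})
    return pd.DataFrame(rows)


# Hypothetical data: y depends on x1; x2 is correlated with x1; x3 and x4 are noise.
rng = np.random.default_rng(1)
x1 = rng.normal(size=600)
data = pd.DataFrame({
    "x1": x1,
    "x2": x1 + 0.3 * rng.normal(size=600),
    "x3": rng.normal(size=600),
    "x4": rng.normal(size=600),
})
data["y"] = 2.0 * data["x1"] + 0.1 * rng.normal(size=600)

features = ["x1", "x2", "x3", "x4"]
train_df, valid_df = data.iloc[:400], data.iloc[400:]
model = LinearRegression().fit(train_df[features], train_df["y"])  # trained once
flofo_df = flofo_sketch(model, valid_df, "y", features, r2_score)
print(flofo_df)
```

The model is trained only once, which is where the speed-up over LOFO comes from; the cost is that the result depends on how the groups are formed.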

Comments

add categorical_feature like lightgbm

I don't know how to feature request XD.

It would be great if you can add the categorical_feature parameter in your Dataset just like in the lightgbm docs. Thanks!!

opened by rafmacalaba 9

Add the choice between Mean/Std and Median/IQR

Median and IQR could be more robust and useful if distribution of importances is not normal.

Something like this

importance_df["importance_md"] = lofo_cv_scores_normalized.median(axis=1)
importance_df["importance_iqr"] = stats.iqr(lofo_cv_scores_normalized, axis=1)

Also for plot_importance there could be a choice between error and 95%CI;

For std it would be

importance_df.plot(x="feature",
                   y="importance_mean",
                   xerr=1.96 * importance_df.importance_std,
                   kind='barh',
                   color=importance_df["color"],
                   figsize=figsize)

and for iqr

importance_df.plot(x="feature",
                   y="importance_md",
                   xerr=1.57 * importance_df.importance_iqr / np.sqrt(n),  # num_sampling for flofo and num of folds for lofo
                   kind='barh',
                   color=importance_df["color"],
                   figsize=figsize)

opened by glevv 5

I am not able to import Dataset

from lofo import Dataset doesn't work and gives error of

cannot import name 'Dataset' from 'lofo'

But if I write from lofo.dataset import Dataset then it works fine.

I think it has something to do with __init__.py

opened by ashutosh1919 5

Understanding LOFO Importance

Hi, Consider the following code:

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from lofo import LOFOImportance

brca = load_breast_cancer()
df = pd.DataFrame(brca["data"], columns=brca["feature_names"]).assign(target=brca["target"])

lofo = LOFOImportance(df, brca["feature_names"], "target", 
                      cv=KFold(n_splits=5, random_state=4), 
                      scoring="roc_auc")
importance_df = lofo.get_importance()

image1

This produces an importance ranking of features. There are some positive means and some negative means in the dataset.

I will now select all features, whose mean is above zero and calculate the importance again.

new_features = importance_df.query("importance_mean > 0")["feature"].tolist()
lofo = LOFOImportance(df, new_features, "target", 
                      cv=KFold(n_splits=5, random_state=4), 
                      scoring="roc_auc")
importance_df2 = lofo.get_importance()

image2

Again, there are some positive means and some negative means in the dataset. This is not what I was expecting. I removed the supposedly unnecessary features after the first step, why are there new unnecessary features after the second step?

opened by r0f1 4

Having a lot of features + Using LOFO?

Hi,

I have 1673 features. When I tried using LOFO importance, the result is the following:

Are the features showing up one on top of the other because the plot isn't long enough? What would you suggest to fix this problem?

Thank you

opened by Mymoza 3

Support multiclass classification ?

The code below is okay to get importance_df.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer, load_iris
from sklearn.model_selection import KFold
from lofo import LOFOImportance, Dataset, plot_importance

data = load_breast_cancer(as_frame=True)# load as dataframe
df = data.data
df['target']=data.target.values

# model
model = RandomForestClassifier()
# dataset
dataset = Dataset(df=df, target="target", features=[col for col in df.columns if col != 'target'])
# get feature importance
cv = KFold(n_splits=5, shuffle=True, random_state=666)
lofo_imp = LOFOImportance(dataset, cv=cv, scoring="f1",model=model)
importance_df = lofo_imp.get_importance()
print(importance_df)

But if we change load_breast_cancer to load_iris, the importance_df values are all NaN.

Does lofo-importance only support binary classification?

opened by ybdesire 2

usage question

This is not an issue, but rather a quick question for clarification.

From the brief definition of the method, it is a little hard to tell how LOFO and RFE/Backward Selection differ from each other. Could you please compare & contrast?

Thank you again for sharing this lib with the community!

Serdar

opened by skadio 2

Sample_weight?

Is there any way to pass a sample weight column into a Dataset object, or is there any plan to add this functionality? It should fit pretty smoothly into the sklearn framework I would expect, but maybe I'm missing something.

Thanks for the wonderful package!

opened by kmedved 2

Add logging or restart mechanism

When there are many features, the task takes a long time and is prone to crashing. Therefore, I think it is necessary to add logging and a way to resume from a checkpoint in lofo.

opened by RainFung 2

Could you add a reference?

Thanks for putting this repo together. Who developed this method? Is it the same as LOCO? https://www.tandfonline.com/doi/full/10.1080/01621459.2017.1307116?casa_token=HAl_ErrKi18AAAAA:YyDJybfbzaLMDU1Zlzq8D4OQnZmUeEwukWJFagcsFB7_JA-W-7ifcINhc8N0FTbtbImLjezWESo

Thanks, Daniel

opened by danielmlow 1

Groupkfold or Groupshufflesplit Cross Validation

Thanks for the very cool package. I was wondering if it is possible to pass a group_id somehow in order to do Groupkfold or Groupshufflesplit cross validation? I don't see anywhere obvious, but wanted to check in case I'm missing anything.

Thanks.

opened by kmedved 1

Compatibility with neural network: replacing with constant value instead of dropping the feature

Hi,

For neural network if you change the number of features, you need to change the input dimension and therefore the number of neurons. So, we could have an option like:

leaving='drop' for current behaviour

leaving='replace' for NN

What do you think?

opened by stephanecollot 2

Running the example in the readme throws errors

Indeed, in the first example here, the target attribute does not exist and should be instead "HasDetections" or defined before creating the model

I have created pull request here

opened by KameniAlexNea 0

Multiclass models

The algorithm doesn't support multiclass classification. In infer_model function the classification task is defined only for two unique values of the target.

opened by AndreaPesce 4
