Code for "High-Precision Model-Agnostic Explanations" paper

Marco Tulio Correia Ribeiro

Last update: Jan 5, 2023

Related tags

Deep Learning Model Explanation anchor

Overview

Anchor

This repository has code for the paper High-Precision Model-Agnostic Explanations.

An anchor explanation is a rule that sufficiently “anchors” the prediction locally – such that changes to the rest of the feature values of the instance do not matter. In other words, for instances on which the anchor holds, the prediction is (almost) always the same.
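
To make "(almost) always the same" concrete, here is a minimal sketch of how the precision and coverage of a rule could be estimated on a fixed sample. It is an illustration only: the paper estimates precision on perturbed samples, and the helper names (rule_holds, coverage_and_precision) and the toy data are assumptions, not part of the package.

```python
import numpy as np

def rule_holds(X, rule):
    """Boolean mask of the rows of X on which every (feature index -> value) pair holds."""
    mask = np.ones(len(X), dtype=bool)
    for feature, value in rule.items():
        mask &= X[:, feature] == value
    return mask

def coverage_and_precision(X, predictions, rule, target):
    """Coverage: share of rows the rule applies to.
    Precision: share of those rows whose prediction equals the target class."""
    mask = rule_holds(X, rule)
    coverage = mask.mean()
    precision = (predictions[mask] == target).mean() if mask.any() else float("nan")
    return coverage, precision

# Toy data: two categorical features and predictions from a hypothetical black box.
X = np.array([[0, 1], [0, 0], [1, 1], [0, 1], [1, 0], [0, 1]])
predictions = np.array([1, 0, 1, 1, 0, 1])

# Candidate rule: feature 0 == 0 AND feature 1 == 1, explaining class 1.
coverage, precision = coverage_and_precision(X, predictions, {0: 0, 1: 1}, target=1)
print(coverage, precision)
```

A rule is a useful anchor when its precision is close to 1; among such rules, higher coverage means the explanation applies to more instances.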

At the moment, we support explaining individual predictions for text classifiers or classifiers that act on tables (numpy arrays of numerical or categorical data). If there is enough interest, I can include code and examples for images.

The anchor method can explain any black-box classifier with two or more classes. All we require is that the classifier implements a function that takes in raw text or a numpy array and outputs a prediction (integer).
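
As a rough illustration of that requirement, the sketch below wraps a probability-returning model so that it yields a 1-D array of integer class labels. The names make_predict_fn and toy_predict_proba are hypothetical stand-ins, not functions from the package.

```python
import numpy as np

def make_predict_fn(predict_proba):
    """Wrap a function returning (n_samples, n_classes) scores so that it
    returns a 1-D array of integer class labels instead."""
    def predict_fn(X):
        probs = np.asarray(predict_proba(X))
        return np.argmax(probs, axis=1).astype(int)
    return predict_fn

# Hypothetical stand-in for a trained model: class 1 when the first feature exceeds 0.5.
def toy_predict_proba(X):
    X = np.asarray(X, dtype=float)
    p1 = (X[:, 0] > 0.5).astype(float)
    return np.column_stack([1.0 - p1, p1])

predict_fn = make_predict_fn(toy_predict_proba)
labels = predict_fn(np.array([[0.2, 3.0], [0.9, 1.0]]))
print(labels)
```

The same wrapping idea applies to text classifiers: the wrapped function would take a list of raw strings, vectorize them, and return integer labels.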

Installation

The Anchor package is on pypi. Simply run:

pip install anchor-exp

Or clone the repository and run:

python setup.py install

If you want to use AnchorTextExplainer, you have to run the following:

python -m spacy download en_core_web_lg

And if you want to use BERT to perturb inputs (recommended), also install transformers:

pip install torch transformers spacy && python -m spacy download en_core_web_sm

Examples

See the notebooks folder for tutorials. Note that from version 0.0.1.0 onward, the package only works on Python 3.

Citation

Here is the bibtex if you want to cite this work.

Comments

AnchorTabularExplainer without categorical features

Hi @marcotcr ,

Firstly, the paper is great and I'm really looking forward to using the package.

I tried to use it on my own data where the AnchorTabularExplainer() object does not have any categorical_names (i.e. categorical features). I see that the code when calling the explain_instance() method goes to https://github.com/marcotcr/anchor/blob/master/anchor/anchor_tabular.py#L215 and since there are no categorical features, the mapping dict remains empty and so the method is not working.

Am I missing something? Or, is there something I can do to overcome this?

opened by asstergi 10
Precision & Coverage Issue, Possible effects on anchor?

I was playing around with Anchors and was getting really weird coverage and precision values. Then I saw the TODO (line 301 of anchor_tabular.py) saying that the precision and coverage measures are incorrect.

Does this mean that the anchors that I computed were also wrong? Or does the issue only affect those metrics?

Thanks!

opened by GDPlumb 6
issue with the predict function, classifier problem, categorical dataset

Hi, I found the paper on anchor extremely interesting. The dataset I have only has categorical features with values 0 and 1. I tested it with different models, but the code throws an error at the line classifier_fn(self.encoder.transform(x)). Although the feature vectors in the dataset are already discretized, anchor discretizes them further regardless of the input, which triggers an error in the predict function. Could you please help me with this issue? Thanks.

opened by skavula 5

Issue with Spacy and the en_core_web_lg.

Hi,

I'm trying to run this simple snippet of code, after having successfully (i.e., with no error/warning) installed anchor, spacy and all the requirements (including running the command 'python -m spacy download en_core_web_lg'):

import spacy
from anchor import anchor_text

nlp = spacy.load('en_core_web_lg')
explainer = anchor_text.AnchorText(nlp, ['negative', 'positive'], use_unk_distribution=False, use_bert=False)

But I obtain the following error:

Exception                                 Traceback (most recent call last)
<ipython-input-5-7f4e7f3d6066> in <module>
----> 1 explainer = anchor_text.AnchorText(nlp, ['negative', 'positive'], use_unk_distribution=False, use_bert=False)

~/.local/lib/python3.7/site-packages/anchor/anchor_text.py in __init__(self, nlp, class_names, use_unk_distribution, use_bert, mask_string)
    117         self.tg = None
    118         self.use_bert = use_bert
--> 119         self.neighbors = utils.Neighbors(self.nlp)
    120         self.mask_string = mask_string
    121         if not self.use_unk_distribution and self.use_bert:

~/.local/lib/python3.7/site-packages/anchor/utils.py in __init__(self, nlp_obj)
    319         self.to_check = [w for w in self.nlp.vocab if w.prob >= -15 and w.has_vector]
    320         if not self.to_check:
--> 321             raise Exception('No vectors. Are you using en_core_web_sm? It should be en_core_web_lg')
    322         self.n = {}
    323 

Exception: No vectors. Are you using en_core_web_sm? It should be en_core_web_lg

I'm using this setting:

Fedora 30 (but I can replicate it on Ubuntu 18.04)
python 3.7.4
spacy 2.3.2 (but I've also tried with 2.2.3)

Thank you, Emanuele

opened by EmanueleLM 3

AnchorTabularExplainer explain_instance

Hi! I am trying to use AnchorTabularExplainer to explain the predictions of a random forest classifier, but when calling the explain_instance function I get this error: "ValueError: X has different shape than during fitting. Expected 4, got 11." I have checked that the train dataset and test dataset have the same shape.

Any suggestions on what I can do?

opened by ri3017 3
Using anchor on pre-processed dataset

Hi,

First I want to thank you for the wonderful paper and creating this library in the first place. I would also like to ask a question.

I want to use anchor on a dataset which is already processed (i.e. everything is binned and properly one-hot encoded), so now I have an |'n_cases' x 'm_one-hot-encoded_features'| sparse matrix as X and an |n_cases x 2| matrix as y. Unfortunately, a lot of 'automated' dataset processing is done during initialization of an anchor_tabular.AnchorTabularExplainer(), which produces various errors when I want to use it on an arbitrary dataset.

Can you please advise what one needs to provide for 'class_names', 'feature_names' and 'categorical_names' in cases when the dataset is already one-hot encoded?

Thank you!

Best regards, Kiril

opened by kirilgeorgiev82 3
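
One possible workaround, sketched here under assumptions and not confirmed by the maintainer, is to collapse each one-hot group back into a single integer-coded column before handing the data to the explainer, and to describe the values of each column in a dict keyed by feature index. The helper collapse_one_hot, the column groups, and the value names are all hypothetical; a sparse matrix would first need to be densified (for example with its toarray() method).

```python
import numpy as np

def collapse_one_hot(X_onehot, groups):
    """Turn one-hot column groups back into one integer-coded column each.
    groups: list of (start, stop) column ranges, one per original feature."""
    columns = [np.argmax(X_onehot[:, start:stop], axis=1) for start, stop in groups]
    return np.column_stack(columns)

# Toy one-hot matrix: feature A has 3 values (columns 0-2), feature B has 2 (columns 3-4).
X_onehot = np.array([
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
])
groups = [(0, 3), (3, 5)]
X_cat = collapse_one_hot(X_onehot, groups)

# Assumed layout: feature index -> list of value names (names are made up here).
feature_names = ['A', 'B']
categorical_names = {0: ['low', 'mid', 'high'], 1: ['no', 'yes']}
print(X_cat)
```

With this layout, the model's prediction function would have to re-apply the one-hot encoding internally, since the explainer would then perturb the integer-coded columns.
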
Is it possible to use AnchorText with Tokenizer instead of CountVectorizer?
Good afternoon. Thank you for such a great package!

Is it possible to implement AnchorsText explainer with a model which takes in Tokenizer.texts_to_sequences data?

My current implementation:

# Creating a reverse dictionary
reverse_word_map = dict(map(reversed, word_index.items()))

# Function takes a tokenized sentence and returns the words
def sequence_to_text(list_of_indices):
    # Looking up words in dictionary
    words = [reverse_word_map.get(letter) for letter in list_of_indices]
    return words

my_texts = np.array(list(map(sequence_to_text, X_test_encoded)))
test_text = ' '.join(my_texts[4])

def wrapped_predict(strings):
    print(strings)
    cnn_rep = tokenizer.texts_to_sequences(strings)
    text_data = pad_sequences(cnn_rep, maxlen=30)
    print(text_data)
    prediction = model.predict(text_data)
    print(prediction)
    return model.predict(text_data)

nlp = spacy.load('en_core_web_sm')
explainer = AnchorText(nlp, ['negative', 'positive'], use_unk_distribution=True)
exp = explainer.explain_instance(test_text, wrapped_predict, threshold=0.95)

And the current output is:

['war clan versus clan touch zenith final showdown bridge body count countless demons magic swords priests versus buddhist monks beautiful visions provided maestro rest good japanese flick rainy summer night']
[[ 181 6818 3962 6818 1039 19084 332 4277 2956 519 1415 3404 2136 1193 8736 8834 3962 14769 8249 197 5440 1925 15445 245 5 766 356 6073 1320 195]]
[[0.50682825]]
['UNK UNK UNK clan touch UNK final showdown bridge UNK UNK countless UNK UNK UNK priests UNK UNK monks beautiful UNK provided UNK rest UNK japanese UNK rainy UNK UNK']
[[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6818 1039 332 4277 2956 3404 8834 8249 197 1925 245 766 6073]]
[[0.50716233]]

Error being thrown:

ValueError: all the input arrays must have the same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

It appears to be working, sort of. I am not sure how, if at all, I can work around this error; any help would be greatly appreciated.
opened by Enantiodromis 2
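
One plausible cause, not confirmed in this thread: the README asks for a classifier function that outputs integer predictions, while the wrapped function in this report returns the model's raw scores with shape (n, 1). The sketch below shows how such scores could be turned into a 1-D integer label array. The helper to_label_vector and the 0.5 threshold are assumptions for illustration.

```python
import numpy as np

def to_label_vector(raw_output, threshold=0.5):
    """Convert (n, 1) sigmoid scores or (n, k) softmax scores
    into a 1-D array of integer class labels."""
    raw = np.asarray(raw_output)
    if raw.ndim == 2 and raw.shape[1] > 1:
        # Multi-class scores: pick the highest-scoring class per row.
        return raw.argmax(axis=1).astype(int)
    # Single-column scores: flatten and threshold.
    return (raw.reshape(-1) > threshold).astype(int)

# Scores shaped like the ones printed in the report, plus one below the threshold.
scores = np.array([[0.50682825], [0.50716233], [0.31]])
labels = to_label_vector(scores)
print(labels, labels.shape)
```

In the reported code this would amount to returning to_label_vector(model.predict(text_data)) from the wrapped prediction function.
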
About the 'lending club' dataset

Hello, I want to use the 'lending club' dataset for experiments in my paper. However, the website has terms of use, and I can't confirm whether I can use it for free. When I tried 'anchor' I noticed that you used this dataset, so I wonder whether you paid for the usage or just used it for free. I hope you can answer my question. Thank you so much. Shucen Ma

opened by der1997 2
dependencies

I'm trying to use anchor on my dataset, but I run into dependency issues. My environment currently uses sklearn 0.22, but when looking into the anchor code I see that an older version of scikit-learn is used. The setup.py file shows that the package depends on numpy, scipy, spacy and scikit-learn but does not specify the versions. Could you give the versions of the packages that you used?

opened by LHuysmans 2
Filtering candidate anchors by coverage

I noticed in anchor_base.py line 305, newly generated candidate anchors are filtered out if they don't have a larger coverage than the previous best candidate whose precision is large enough. But these new candidates are larger, and will always have coverage at most as large as the previous valid candidate. Would it be equivalent if we check if best_coverage == -1 as the first line in the while loop on line 302, and break if best_coverage is not -1 (meaning we select the best coverage anchor with high enough precision from the previous round, and not try making new candidates as their coverage can't possibly be larger)?

Let me know if I'm missing something when you get the chance. Thanks.

opened by DaveDe 2
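
The monotonicity this question relies on can be checked with a small sketch: adding a predicate to an anchor can only shrink the set of rows it matches. This covers only an extension of the same anchor; whether a new candidate built from a different beam member can exceed the previous best coverage is the part the question leaves open. The coverage helper and the random toy data are assumptions.

```python
import numpy as np

def coverage(X, anchor):
    """Fraction of rows of X matching every (feature, value) predicate in the anchor."""
    mask = np.ones(len(X), dtype=bool)
    for feature, value in anchor:
        mask &= X[:, feature] == value
    return mask.mean()

# Random categorical toy data: 1000 rows, 4 features with values in {0, 1, 2}.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(1000, 4))

shorter = ((0, 1),)              # feature 0 == 1
longer = shorter + ((2, 0),)     # feature 0 == 1 AND feature 2 == 0

cov_short = coverage(X, shorter)
cov_long = coverage(X, longer)
print(cov_short, cov_long)
```
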
Can we use Anchor to explain already built models in our own way?

Hi,

First of all, I should say that this is really a great paper. Good luck with you.

My question is this: let's assume we have a tabular dataset. As pre-processing we only encode the categorical variables, then build a model using a random forest and save that model in .pickle format.

After that, can we use Anchor to explain that saved model?

For example, I don't need to use these lines of code when making the model.

explainer = anchor_tabular.AnchorTabularExplainer(dataset.class_names, dataset.feature_names, dataset.data, dataset.categorical_names)
explainer.fit(dataset.train, dataset.labels_train, dataset.validation, dataset.labels_validation)
c = sklearn.ensemble.RandomForestClassifier(n_estimators=50, n_jobs=5)
c.fit(explainer.encoder.transform(dataset.train), dataset.labels_train)

I just need to use these two lines when making the model:

c = sklearn.ensemble.RandomForestClassifier(n_estimators=50, n_jobs=5)
c.fit(dataset.train, dataset.labels_train)

Thank you.

opened by DiliSR 2
Get anchors from tuple method

Hi, thanks for the great contribution. I noticed a comment at the beginning of the method get_anchor_from_tuple() in anchor_base.py, which says: # TODO: This is wrong, some of the intermediate anchors may not exist. Is it still the case that this method is incorrect? What is this method supposed to be doing?

opened by ishcha 0
anchor rules for time series classification
Thanks for the excellent job.

I have created a dataset where each target variable has a 2D array. The dataset format is the one used by TSAI:

Considerations about our 3D input data or X vs a 2D target vars Y (my dataset are just numpy arrays)

To be able to use timeseriesAI your data needs to have 3 dimensions:

number of samples

number of features (aka variables, dimensions, channels)

number of steps (or length, time steps, sequence steps)
(from this notebook)

here a diagram of the dataset format

The problem is not only related to time series classification, but to any tabular data where each target variable contains multiple values per input feature variable.

The question is:

could I use this package to explain anchor rules of a model trained with this format of data?

Here is a code example, 01_Intro_to_Time_Series_Classification.ipynb, which I am using to generate my own model.

PS: This format would be quite similar to a B/W image classification dataset, where we have one image per sample and the labels are just numbers (MNIST example).
opened by Jalagarto 2
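
One way this might be approached, offered as an untested idea and not as a supported feature: flatten each (features x steps) sample into one tabular row for the explainer, and wrap the model so that it reshapes rows back to 3-D before predicting. The names make_flat_predict_fn and toy_model are hypothetical.

```python
import numpy as np

def make_flat_predict_fn(predict_3d, n_features, n_steps):
    """Wrap a model expecting (samples, features, steps) input so that it
    accepts flattened (samples, features * steps) rows instead."""
    def predict_flat(X_flat):
        X_3d = np.asarray(X_flat).reshape(-1, n_features, n_steps)
        return np.asarray(predict_3d(X_3d)).astype(int)
    return predict_flat

# Hypothetical stand-in model: class 1 when the mean of channel 0 is positive.
def toy_model(X_3d):
    return (X_3d[:, 0, :].mean(axis=1) > 0).astype(int)

# Two samples, three features, four time steps.
X = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4) - 10.0
X_flat = X.reshape(len(X), -1)   # shape (2, 12), one column per (feature, step)

predict_flat = make_flat_predict_fn(toy_model, n_features=3, n_steps=4)
labels = predict_flat(X_flat)
print(labels)
```

Each flattened column then corresponds to one feature at one time step, so any resulting rule would refer to individual time steps and not to whole series.
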
# TODO: precision recall is all wrong, coverage functions wont work

In master for this repo, in anchor_tabular.py, this comment sits at the top of a function near the bottom of the module. It's not a function that obviously has to DO with coverage, so at first read it was a bit cryptic to me.

Is this TODO still describing a present issue? Should I be worried that the coverage values I am seeing from anchors_tabular are inaccurate?

opened by CJMenart 0
Justification for removing bisection for computing KL-confidence regions

Hi @marcotcr, whilst browsing the repo I noticed that you've removed the bisection part for computing the upper and lower confidence bounds: https://github.com/marcotcr/anchor/commit/ff0924e6bcaaa7149e2940303cd0b22994112157.

The bisection is required to compute the KL-bounds (4) and (5) defined in the bandit paper so I'm a bit puzzled as to why you've removed it. The new behaviour is also not Hoeffding-bound based (3) but rather is equivalent to running bisection just once and then returning whatever is found (note - there is also no guarantee that the bound returned will satisfy the inequalities in (4) and (5) - in practice I think this will result in looser bounds).

opened by jklaise 1
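
For reference, a full bisection for the Bernoulli KL confidence bounds could look like the sketch below. It is an independent illustration of the technique discussed here, not the code from the repository or the linked commit; the function names, the iteration count, and the example level are assumptions.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    eps = 1e-12
    p = min(1 - eps, max(eps, p))
    q = min(1 - eps, max(eps, q))
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_upper_bound(p, level, iterations=50):
    """Largest q in [p, 1] with KL(p || q) <= level, found by bisection."""
    lo, hi = p, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if kl_bernoulli(p, mid) > level:
            hi = mid
        else:
            lo = mid
    return lo

def kl_lower_bound(p, level, iterations=50):
    """Smallest q in [0, p] with KL(p || q) <= level, found by bisection."""
    lo, hi = 0.0, p
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if kl_bernoulli(p, mid) > level:
            lo = mid
        else:
            hi = mid
    return hi

p_hat, level = 0.8, 0.05   # empirical precision and an illustrative confidence level
ub = kl_upper_bound(p_hat, level)
lb = kl_lower_bound(p_hat, level)
print(lb, ub)
```

Because KL(p || q) grows monotonically as q moves away from p on either side, each bisection converges to the point where the divergence equals the level, which is what satisfying the inequalities requires.
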
Possible anchor coverage issue
Hello Marco,

In the 'Anchor on tabular data.ipynb' example, and also in the SP Anchors example from the anchor-experiments repository, I am not sure if I am misunderstanding this, but I think you may be computing the coverage of an Anchor in a wrong way. You seem to only check for equality in feature values (of course, only for the features included in the Anchor) between the instance that generated the Anchor and the other instances from the dataset. The specific files and lines I am referring to are the following:

'Anchor on tabular data.ipynb': Line fit_anchor = np.where(np.all(dataset.test[:, exp.features()] == dataset.test[idx][exp.features()], axis=1))[0]

'utils.py' from the 'anchor-experiments' repository: Line 347:

covered[i] = set( np.all(data[:, fs] == d[fs], axis=1).nonzero()[0])

Shouldn't you compute the coverage based on the operator used in the Anchor, similar to what you do in file 'anchor_tabular.py' on Lines 254-261?

for i in mapping:
    f, op, v = mapping[i]
    if op == 'eq':
        data[:, i] = (d_raw_data[:, f] == data_row[f]).astype(int)
    if op == 'leq':
        data[:, i] = (d_raw_data[:, f] <= v).astype(int)
    if op == 'geq':
        data[:, i] = (d_raw_data[:, f] > v).astype(int)

Thank you! -Alexandru
opened by danyveve 0
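
The difference between the two ways of computing coverage can be illustrated with a small sketch on raw (undiscretized) values. Whether the two coincide in the notebook depends on whether its data is already discretized, which this thread does not settle. The helper operator_coverage and the toy data are assumptions; the 'geq' operator is strict here only to mirror the quoted snippet.

```python
import numpy as np

# Operator table mirroring the quoted snippet ('geq' uses a strict comparison there).
OPS = {
    'eq': lambda column, value: column == value,
    'leq': lambda column, value: column <= value,
    'geq': lambda column, value: column > value,
}

def operator_coverage(data, predicates):
    """Fraction of rows satisfying every (feature, op, value) predicate."""
    mask = np.ones(len(data), dtype=bool)
    for feature, op, value in predicates:
        mask &= OPS[op](data[:, feature], value)
    return mask.mean()

# Toy raw data: column 0 is numeric (e.g. an age), column 1 is a binary category.
data = np.array([[25.0, 1], [40.0, 1], [31.0, 0], [58.0, 1]])
instance = data[1]

# Anchor "column 0 > 30 AND column 1 == 1", evaluated with its operators.
op_cov = operator_coverage(data, [(0, 'geq', 30.0), (1, 'eq', 1)])

# Equality-only check against the explained instance, as in the quoted notebook line.
eq_cov = np.all(data[:, [0, 1]] == instance[[0, 1]], axis=1).mean()
print(op_cov, eq_cov)
```

On this raw data the equality check counts only rows identical to the instance on the anchor's features, so it reports a smaller coverage than the operator-based check.
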
Anchor for regression?

Dear Marco,

Thank you for publishing the code!

I have tried to use Anchor to explain a RandomForestRegressor. It runs without errors; however, it took longer than with a RandomForestClassifier, and the rules returned are longer, especially when the precision is low. I would like to know if Anchor is indeed expected to work for regression tasks, and if so, what values are expected for the parameter 'class_names'?

Thank you in advance!

opened by LishengSun 2
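
Since the README describes the method in terms of classifiers with integer predictions, one possible workaround (a sketch, not something the author has confirmed) is to bin the regressor's output into a few ordered classes and explain the binned prediction. The bin edges, the class names, and the helper make_binned_predict_fn are hypothetical.

```python
import numpy as np

def make_binned_predict_fn(regress, bin_edges):
    """Wrap a regressor so that it returns the integer index of the bin
    its prediction falls into, given sorted bin edges."""
    def predict_fn(X):
        return np.digitize(np.asarray(regress(X)), bin_edges).astype(int)
    return predict_fn

# Hypothetical stand-in regressor: predicts the sum of the features.
def toy_regressor(X):
    return np.asarray(X, dtype=float).sum(axis=1)

bin_edges = [1.0, 2.0]                    # two edges -> three bins
class_names = ['low', 'medium', 'high']   # one name per bin

predict_fn = make_binned_predict_fn(toy_regressor, bin_edges)
labels = predict_fn(np.array([[0.2, 0.3], [1.0, 0.5], [2.0, 1.0]]))
print([class_names[i] for i in labels])
```

With continuous outputs treated as classes directly, nearly every prediction would be its own class, which could explain long rules and low precision; binning keeps the number of classes small.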

