Code for "High-Precision Model-Agnostic Explanations" paper

Overview

Anchor

This repository contains the code for the paper "High-Precision Model-Agnostic Explanations".

An anchor explanation is a rule that sufficiently “anchors” the prediction locally – such that changes to the rest of the feature values of the instance do not matter. In other words, for instances on which the anchor holds, the prediction is (almost) always the same.

At the moment, we support explaining individual predictions for text classifiers or classifiers that act on tables (numpy arrays of numerical or categorical data). If there is enough interest, I can include code and examples for images.

The anchor method can explain any black-box classifier with two or more classes. All we require is that the classifier implements a function that takes in raw text or a numpy array and outputs an integer prediction.
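
For example, a scikit-learn text pipeline already satisfies this requirement, since its predict method maps a list of raw strings to integer labels. Here is a minimal sketch (the vectorizer, the model, and the train_texts/train_labels variables are illustrative placeholders, not part of this package):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# any black-box text classifier works; anchor only needs its predict function
pipeline = make_pipeline(CountVectorizer(), LogisticRegression())
pipeline.fit(train_texts, train_labels)  # raw strings in, integer labels as targets

# the only interface anchor requires: raw inputs in, integer predictions out
def predict_fn(texts):
    return pipeline.predict(texts)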

Installation

The anchor package is on PyPI. Simply run:

pip install anchor-exp

Or clone the repository and run:

python setup.py install

If you want to use AnchorTextExplainer, you have to run the following:

python -m spacy download en_core_web_lg

And if you want to use BERT to perturb inputs (recommended), also install transformers:

pip install torch transformers spacy && python -m spacy download en_core_web_sm

Examples

See the notebooks folder for tutorials. Note that from version 0.0.1.0 onwards, the package only works on Python 3.
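
As a quick preview of what the notebooks cover, the tabular workflow looks roughly like this. This is only a sketch based on the 'Anchor on tabular data' tutorial; the dataset object and its attributes are placeholders from that tutorial, not something the package requires:

import sklearn.ensemble
from anchor import anchor_tabular

explainer = anchor_tabular.AnchorTabularExplainer(
    dataset.class_names, dataset.feature_names, dataset.data, dataset.categorical_names)
explainer.fit(dataset.train, dataset.labels_train,
              dataset.validation, dataset.labels_validation)

# train the classifier on the explainer's encoded representation
c = sklearn.ensemble.RandomForestClassifier(n_estimators=50, n_jobs=5)
c.fit(explainer.encoder.transform(dataset.train), dataset.labels_train)

# explain a single test instance
predict_fn = lambda x: c.predict(explainer.encoder.transform(x))
exp = explainer.explain_instance(dataset.test[0], predict_fn, threshold=0.95)
print(' AND '.join(exp.names()))  # the anchor, as a readable rule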

Citation

Here is the BibTeX entry if you want to cite this work.

Comments
  • AnchorTabularExplainer without categorical features

    Hi @marcotcr ,

    Firstly, the paper is great and I'm really looking forward to using the package.

    I tried to use it on my own data, where the AnchorTabularExplainer() object does not have any categorical_names (i.e. no categorical features). I see that, when calling the explain_instance() method, the code goes to https://github.com/marcotcr/anchor/blob/master/anchor/anchor_tabular.py#L215 and, since there are no categorical features, the mapping dict remains empty, so the method does not work.

    Am I missing something? Or, is there something I can do to overcome this?
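
    For reference, this is roughly what I am doing; the class names, feature names and arrays below are made-up stand-ins for my own purely numerical data:

    from anchor import anchor_tabular

    explainer = anchor_tabular.AnchorTabularExplainer(
        ['class 0', 'class 1'],        # class_names
        ['f0', 'f1', 'f2', 'f3'],      # feature_names
        X_train,                       # purely numerical numpy array
        {})                            # no categorical features at all
    explainer.fit(X_train, y_train, X_val, y_val)
    exp = explainer.explain_instance(X_test[0], classifier.predict, threshold=0.95)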

    opened by asstergi 10
  • Precision & Coverage Issue, Possible effects on anchor?

    I was playing around with Anchors and was getting really weird coverage and precision values. Then I saw the TODO (line 301 of anchor_tabular.py) saying that the precision and coverage measures are incorrect.

    Does this mean that the anchors that I computed were also wrong? Or does the issue only affect those metrics?

    Thanks!

    opened by GDPlumb 6
  • issue with the predict function, classifier problem, categorical dataset

    Hi, I found the paper on anchor extremely interesting. The dataset I have only has categorical features, with values 0 and 1. I tested it with different models, but the code throws an error at the line classifier_fn(self.encoder.transform(x)). As the dataset's feature vectors are already discretized, anchor discretizes them further regardless of the input, which causes the error from the predict function. Could you please help me with this issue? Thanks.

    opened by skavula 5
  • Issue with Spacy and the en_core_web_lg.

    Hi,

    I'm trying to run this simple snippet of code, after having successfully (i.e., no error/warning) installed anchor, spacy and all the requirements (including the command 'python -m spacy download en_core_web_lg'):

    import spacy
    from anchor import anchor_text
    
    nlp = spacy.load('en_core_web_lg')
    explainer = anchor_text.AnchorText(nlp, ['negative', 'positive'], use_unk_distribution=False, use_bert=False)
    

    But I obtain the following error:

    Exception                                 Traceback (most recent call last)
    <ipython-input-5-7f4e7f3d6066> in <module>
    ----> 1 explainer = anchor_text.AnchorText(nlp, ['negative', 'positive'], use_unk_distribution=False, use_bert=False)
    
    ~/.local/lib/python3.7/site-packages/anchor/anchor_text.py in __init__(self, nlp, class_names, use_unk_distribution, use_bert, mask_string)
        117         self.tg = None
        118         self.use_bert = use_bert
    --> 119         self.neighbors = utils.Neighbors(self.nlp)
        120         self.mask_string = mask_string
        121         if not self.use_unk_distribution and self.use_bert:
    
    ~/.local/lib/python3.7/site-packages/anchor/utils.py in __init__(self, nlp_obj)
        319         self.to_check = [w for w in self.nlp.vocab if w.prob >= -15 and w.has_vector]
        320         if not self.to_check:
    --> 321             raise Exception('No vectors. Are you using en_core_web_sm? It should be en_core_web_lg')
        322         self.n = {}
        323 
    
    Exception: No vectors. Are you using en_core_web_sm? It should be en_core_web_lg
    

    I'm using this setting:

    Fedora 30 (but I can replicate it on Ubuntu 18.04)
    python 3.7.4
    spacy 2.3.2 (but I've also tried with 2.2.3)
    

    Thank you, Emanuele

    opened by EmanueleLM 3
  • AnchorTabularExplainer explain_instance

    Hi! I am trying to use AnchorTabularExplainer to explain the predictions of a random forest classifier, but when calling explain_instance function I get this error: "ValueError: X has different shape than during fitting. Expected 4, got 11." I have checked that the train dataset and test dataset have the same shape.

    Any suggestions on what I can do?

    opened by ri3017 3
  • Using anchor on pre-processed dataset

    Hi,

    First, I want to thank you for the wonderful paper and for creating this library in the first place. I would also like to ask a question.

    I want to use anchor on a dataset which is already processed (i.e. everything is binned and properly one-hot encoded), so I now have an n_cases x m_one-hot-encoded_features sparse matrix as X and an n_cases x 2 matrix as y. Unfortunately, there is a lot of 'automated' dataset processing done during instantiation of an anchor_tabular.AnchorTabularExplainer(), which produces various errors when I try to use it on an arbitrary dataset.

    Can you please advise what one needs to provide for 'class_names', 'feature_names' and 'categorical_names' in cases when the dataset is already one-hot encoded?
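
    For concreteness, my current guess at what the arguments should look like is below. Everything here is made up just to illustrate the shapes, and I am not sure whether the explainer accepts sparse input, so I densify first:

    from anchor import anchor_tabular

    X_dense = X.toarray()  # the one-hot encoded matrix, densified
    class_names = ['class 0', 'class 1']
    feature_names = ['feat%d' % i for i in range(X_dense.shape[1])]
    # each one-hot column treated as a categorical feature with values '0'/'1'
    categorical_names = {i: ['0', '1'] for i in range(X_dense.shape[1])}

    explainer = anchor_tabular.AnchorTabularExplainer(
        class_names, feature_names, X_dense, categorical_names)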

    Thank you!

    Best regards, Kiril

    opened by kirilgeorgiev82 3
  • Is it possible to use AnchorText with Tokenizer instead of CountVectorizer?

    Good afternoon. Thank you for such a great package!

    Is it possible to implement the AnchorText explainer with a model which takes in Tokenizer.texts_to_sequences data?

    My current implementation:

    Creating a reverse dictionary

    reverse_word_map = dict(map(reversed, word_index.items()))
    
    # Function takes a tokenized sentence and returns the words
    def sequence_to_text(list_of_indices):
        # Looking up words in dictionary
        words = [reverse_word_map.get(letter) for letter in list_of_indices]
        return words
    my_texts = np.array(list(map(sequence_to_text, X_test_encoded)))
    test_text = ' '.join(my_texts[4])
    
    def wrapped_predict(strings):
        print(strings)
        cnn_rep = tokenizer.texts_to_sequences(strings)
        text_data = pad_sequences(cnn_rep, maxlen=30)
        print(text_data)
        prediction = model.predict(text_data)
        print(prediction)
        return model.predict(text_data)
    
    nlp = spacy.load('en_core_web_sm')
    explainer = AnchorText(nlp, ['negative', 'positive'], use_unk_distribution=True)
    exp = explainer.explain_instance(test_text, wrapped_predict, threshold=0.95)
    

    And the current output is:

    ['war clan versus clan touch zenith final showdown bridge body count countless demons magic swords priests versus buddhist monks beautiful visions provided maestro rest good japanese flick rainy summer night'] [[ 181 6818 3962 6818 1039 19084 332 4277 2956 519 1415 3404 2136 1193 8736 8834 3962 14769 8249 197 5440 1925 15445 245 5 766 356 6073 1320 195]] [[0.50682825]] ['UNK UNK UNK clan touch UNK final showdown bridge UNK UNK countless UNK UNK UNK priests UNK UNK monks beautiful UNK provided UNK rest UNK japanese UNK rainy UNK UNK'] [[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6818 1039 332 4277 2956 3404 8834 8249 197 1925 245 766 6073]] [[0.50716233]]

    Error being thrown:

    ValueError: all the input arrays must have the same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

    It appears like it is working... sort of. I am not really sure how, or if, I can work around this error; any help would be greatly appreciated.
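
    One thing I have been wondering: since the README says the prediction function should output an integer, maybe wrapped_predict should return class labels instead of the raw sigmoid outputs? Something like this (the 0.5 threshold is just my guess for this binary model):

    def wrapped_predict(strings):
        cnn_rep = tokenizer.texts_to_sequences(strings)
        text_data = pad_sequences(cnn_rep, maxlen=30)
        # model.predict returns an (n, 1) array of probabilities;
        # convert it to a 1-D array of integer class labels
        probabilities = model.predict(text_data)
        return (probabilities[:, 0] > 0.5).astype(int)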

    opened by Enantiodromis 2
  • About the 'lending club' dataset

    Hello, I want to use the 'lending club' dataset in my paper to run experiments. However, the website has terms of use, and I can't confirm whether I can use it for free. When I tried out 'anchor' I noticed that you used it. So I wonder whether you paid for the usage or just used it for free? I hope you can answer my question. Thank you so much. Shucen Ma

    opened by der1997 2
  • dependencies

    I'm trying to use anchor on my dataset, but I run into dependency issues. My environment currently uses sklearn 0.22, but when looking into the anchor code I see that an older version of sklearn is used. The setup.py file shows that the package has dependencies on numpy, scipy, spacy and scikit-learn but does not specify the versions. Could you give the versions of the packages that you used?

    opened by LHuysmans 2
  • Filtering candidate anchors by coverage

    I noticed that in anchor_base.py, line 305, newly generated candidate anchors are filtered out if they don't have a larger coverage than the previous best candidate whose precision is large enough. But these new candidates are larger, and so will always have coverage at most as large as the previous valid candidate. Would it be equivalent to check whether best_coverage == -1 as the first line of the while loop on line 302, and break if best_coverage is not -1 (meaning we select the best-coverage anchor with high enough precision from the previous round, and do not try making new candidates, since their coverage can't possibly be larger)?

    Let me know if I'm missing something when you get the chance. Thanks.

    opened by DaveDe 2
  • Can we use Anchor to explain already built models in our own way?

    Hi,

    First of all, I should say that this is really a great paper. Good luck to you.

    My question is: let's assume we have a tabular dataset. We just encode the categorical variables as pre-processing and build a model using a random forest, and we save that model in .pickle format.

    After that, can we use Anchor to explain that saved model?

    For example, I don't need to use these lines of code when making the model.

    explainer = anchor_tabular.AnchorTabularExplainer(
        dataset.class_names, dataset.feature_names, dataset.data, dataset.categorical_names)
    explainer.fit(dataset.train, dataset.labels_train,
                  dataset.validation, dataset.labels_validation)
    c = sklearn.ensemble.RandomForestClassifier(n_estimators=50, n_jobs=5)
    c.fit(explainer.encoder.transform(dataset.train), dataset.labels_train)

    I just need to use these two lines when making the model:

    c = sklearn.ensemble.RandomForestClassifier(n_estimators=50, n_jobs=5)
    c.fit(dataset.train, dataset.labels_train)

    Thank you.

    opened by DiliSR 2
  • Get anchors from tuple method

    Hi, thanks for the great contribution. I noticed a comment at the beginning of the method get_anchor_from_tuple() in anchor_base.py, which says: # TODO: This is wrong, some of the intermediate anchors may not exist. Is it still the case that this method is incorrect? What is this method supposed to do?

    opened by ishcha 0
  • anchor rules for time series classification

    Thanks for the excellent job.

    I have created a dataset where each target variable has a 2D array. The dataset format is the one used by TSAI:

    Considerations about our 3D input data X vs. a 2D target variable y (my dataset is just numpy arrays):

    To be able to use timeseriesAI your data needs to have 3 dimensions:

    • number of samples
    • number of features (aka variables, dimensions, channels)
    • number of steps (or length, time steps, sequence steps)
      (from this notebook)

    Here is a diagram of the dataset format.

    The problem is not only related to time series classification, but to any tabular data where each target variable contains multiple values per input feature.

    The question is:

    Could I use this package to explain anchor rules of a model trained with this format of data?

    Here is a code example - 01_Intro_to_Time_Series_Classification.ipynb - that I am using to generate my own model.

    PS: This format would be quite similar to a B/W image classification dataset, where we have one image per sample and the labels are just numbers (MNIST example).
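
    In case it helps clarify what I mean, here is a minimal sketch of how I imagine flattening my 3D array into the 2D table that the tabular explainer seems to expect (all the names below are mine, for illustration only):

    import numpy as np

    # X has the TSAI shape: (n_samples, n_features, n_steps)
    n_samples, n_features, n_steps = X.shape

    # one row per sample, one column per (feature, step) pair
    X_flat = X.reshape(n_samples, n_features * n_steps)
    feature_names = ['feat%d_t%d' % (f, t)
                     for f in range(n_features) for t in range(n_steps)]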

    opened by Jalagarto 2
  • # TODO: precision recall is all wrong, coverage functions wont work

    In master for this repo, in anchor_tabular.py, this comment sits at the top of a function near the bottom of the module. It's not a function that obviously has anything to do with coverage, so on first read it was a bit cryptic to me.

    Is this TODO still describing a present issue? Should I be worried that the coverage values I am seeing from anchor_tabular are inaccurate?

    opened by CJMenart 0
  • Justification for removing bisection for computing KL-confidence regions

    Hi @marcotcr, whilst browsing the repo I noticed that you've removed the bisection part for computing the upper and lower confidence bounds: https://github.com/marcotcr/anchor/commit/ff0924e6bcaaa7149e2940303cd0b22994112157.

    The bisection is required to compute the KL-bounds (4) and (5) defined in the bandit paper, so I'm a bit puzzled as to why you've removed it. The new behaviour is also not Hoeffding-bound based (3), but rather is equivalent to running the bisection just once and returning whatever is found (note: there is also no guarantee that the returned bound satisfies the inequalities in (4) and (5); in practice I think this will result in looser bounds).
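
    For reference, this is the kind of bisection I mean; a small self-contained sketch of KL upper/lower confidence bounds for a Bernoulli mean (my own re-implementation of the standard KL-LUCB bounds, not code taken from this repo):

    import numpy as np

    def kl_bernoulli(p, q):
        # KL divergence between Bernoulli(p) and Bernoulli(q)
        p = min(max(p, 1e-10), 1 - 1e-10)
        q = min(max(q, 1e-10), 1 - 1e-10)
        return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

    def kl_upper_bound(p_hat, level, tol=1e-6):
        # largest q >= p_hat such that KL(p_hat, q) <= level, via bisection
        lo, hi = p_hat, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2.0
            if kl_bernoulli(p_hat, mid) > level:
                hi = mid
            else:
                lo = mid
        return lo

    def kl_lower_bound(p_hat, level, tol=1e-6):
        # smallest q <= p_hat such that KL(p_hat, q) <= level, via bisection
        lo, hi = 0.0, p_hat
        while hi - lo > tol:
            mid = (lo + hi) / 2.0
            if kl_bernoulli(p_hat, mid) > level:
                lo = mid
            else:
                hi = mid
        return hi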

    opened by jklaise 1
  • Possible anchor coverage issue

    Hello Marco,

    In the 'Anchor on tabular data.ipynb' example, and also in the SP Anchors example from the anchor-experiments repository, I am not sure if I am misunderstanding this, but I think you may be computing the coverage of an Anchor in the wrong way. You seem to check only for equality in feature values (of course, only for the features included in the Anchor) between the instance that generated the Anchor and the other instances from the dataset. The specific files and lines I am referring to are the following:

    • 'Anchor on tabular data.ipynb': Line fit_anchor = np.where(np.all(dataset.test[:, exp.features()] == dataset.test[idx][exp.features()], axis=1))[0]
    • 'utils.py' from the 'anchor-experiments' repository: Line 347, 347:
    covered[i] = set(
                np.all(data[:, fs] == d[fs], axis=1).nonzero()[0])
    

    Shouldn't you compute the coverage based on the operators used in the Anchor, similar to what you do in 'anchor_tabular.py' on lines 254-261?

    for i in mapping:
                    f, op, v = mapping[i]
                    if op == 'eq':
                        data[:, i] = (d_raw_data[:, f] == data_row[f]).astype(int)
                    if op == 'leq':
                        data[:, i] = (d_raw_data[:, f] <= v).astype(int)
                    if op == 'geq':
                        data[:, i] = (d_raw_data[:, f] > v).astype(int)
    

    Thank you! -Alexandru

    opened by danyveve 0
  • Anchor for regression?

    Dear Marco,

    Thank you for publishing the code!

    I have tried to use Anchor to explain a RandomForestRegressor. It works with no bugs; however, it took longer than a RandomForestClassifier, and the rules returned are longer, especially when the precision is low. I would like to know whether Anchor is indeed expected to work for regression tasks, and if so, what values are expected for the parameter 'class_names'?

    Thank you in advance!

    opened by LishengSun 2
Owner
Marco Tulio Correia Ribeiro