An Explainable Leaderboard for NLP

Overview

ExplainaBoard: An Explainable Leaderboard for NLP

Introduction | Website | Download | Backend | Paper | Video | Bib

Introduction

ExplainaBoard is an interpretable, interactive, and reliable leaderboard with seven (so far) new features (F) compared with generic leaderboards.

  • F1: Single-system Analysis: What is a system good or bad at?
  • F2: Pairwise Analysis: Where is one system better (worse) than another?
  • F3: Data Bias Analysis: What are the characteristics of different evaluated datasets?
  • F5: Common Errors: What common mistakes do the top-5 systems make?
  • F6: Fine-grained Errors: Where will errors occur?
  • F7: System Combination: Is there potential complementarity between different systems?

Website

We deploy ExplainaBoard as a web toolkit, which includes 9 NLP tasks, 40 datasets, and 300 systems. Detailed information is as follows.

Task

Task                       Sub-task           Dataset  Model  Attribute
Text Classification        Sentiment                8     40          2
Text Classification        Topics                   4     18          2
Text Classification        Intention                1      3          2
Text-Span Classification   Aspect Sentiment         4     20          4
Text-pair Classification   NLI                      2      6          7
Sequence Labeling          NER                      3     74          9
Sequence Labeling          POS                      3     14          4
Sequence Labeling          Chunking                 3     14          9
Sequence Labeling          CWS                      7     64          7
Structure Prediction       Semantic Parsing         4     12          4
Text Generation            Summarization            2     36          7

Download System Outputs

We haven't released the datasets or corresponding system outputs that require licenses. If you hold the relevant licenses, please fill in this form and we will send them to you privately. (A description of the output format can be found here.) If these system outputs are useful for you, you can cite our work.

Test Your Results

pip install -r requirements.txt

Description of Each Directory

  • task-[task_name]: fine-grained analysis for each task, generating fine-grained analysis results in JSON format. For example, task-mlqa calculates the fine-grained F1 scores for different systems and writes the corresponding JSON files to task-mlqa/output/.

  • meta-eval acts as a controller: it can be used to start the fine-grained analysis of all tasks and to analyze the output JSON files.

    • calculate fine-grained results for all tasks: ./meta-eval/run-allTasks.sh
        cd ./meta-eval/
        ./run-allTasks.sh
    • merge the JSON files of all tasks into a CSV file, which is useful for further SQL import (a minimal sketch of this merge step follows this list): ./meta-eval/genCSV/json2csv.py
        cd ./meta-eval/genCSV/
        python json2csv.py > explainaboard.csv
  • src stores some auxiliary code.
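
For reference, here is a minimal sketch of what the JSON-to-CSV merge step does, assuming each task directory writes per-system JSON files whose top-level keys map to scalar results (the glob pattern, field handling, and output layout are illustrative, not the exact json2csv.py implementation):

    import csv
    import glob
    import json
    import sys

    writer = None
    for path in sorted(glob.glob("../task-*/output/*.json")):
        with open(path) as f:
            record = json.load(f)         # one flat dict of fine-grained results per system
        record["source_file"] = path      # keep track of where each row came from
        if writer is None:
            # The column set is fixed by the first file; unknown keys in later files are dropped.
            writer = csv.DictWriter(sys.stdout, fieldnames=sorted(record), extrasaction="ignore")
            writer.writeheader()
        writer.writerow(record)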

Submit Your Results

You can submit your system's output via this form, following the format description.

Acknowledgement

We thank all authors who shared their system outputs with us: Ikuya Yamada, Stefan Schweter, Colin Raffel, Yang Liu, and Li Dong. We also thank Vijay Viswanathan, Yiran Chen, and Hiroaki Hayashi for useful discussions and feedback about ExplainaBoard.

Comments
  • Is the current applicable condition of t-test correct?

    opened by tetsuok 22
  • Allowed specification of the metric #dimensions

    This PR loosens the restriction that sufficient statistics must be a vector, and allows them to be a tensor whose number of dimensions is given by Metric.stats_ndim().

    It also demonstrates how this works on the NLGMetaEvaluation metric.

    @pfliu-nlp and @odashi: could you please check this PR as a potential solution to the discussion in https://github.com/neulab/ExplainaBoard/pull/527?

    (sorry, after sending the review request I made a change of naming from dim->ndim, which I think is more in line with the naming in numpy)

    opened by neubig 12
  • test_generate_system_analysis in integration_tests.summarization_test.SummarizationTest is too slow

    Commit 8c514c3d81a079d967d208f8bc330c2f202620bb (#437) increases the execution time of integration_tests.summarization_test.SummarizationTest. When I measured on my GCP VM, the test's runtime increased by 430 seconds (from 6 seconds to 436 seconds), which is too slow to run as an automated test in pull requests. Slow tests need to be removed or replaced with more focused, fast tests. In general, slow tests drain productivity: updating pull requests takes longer, developers tend to cram large commits into pull requests to work around slow CI, and pull requests become expensive to review, which makes it harder to identify bugs or design flaws in code review.

    Repro steps

    rm -rf ~/.cache/explainaboard
    time python -m unittest -v integration_tests.summarization_test.SummarizationTest
    

    Output

    test_datalab_loader (integration_tests.summarization_test.SummarizationTest) ... skipped 'time consuming'
    test_default_features_dont_modify_condgen (integration_tests.summarization_test.SummarizationTest) ... ok
    test_generate_system_analysis (integration_tests.summarization_test.SummarizationTest) ... WARNING:datalabs.load:Couldn't find a directory or a dataset named 'cnn_dailymail' in this version. It was picked from the master branch on github instead.
    WARNING:datalabs.builder:No config specified, defaulting to: cnn_dailymail/3.0.0
    WARNING:datalabs.builder:Reusing dataset cnn_dailymail (/home/t/.cache/expressai/datalab/cnn_dailymail/3.0.0/3.0.0/6e2f5d689f0225c4f22eb78d11ba7a21399810c5cb853edafe39b1d006a1ff95)
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 287113/287113 [06:20<00:00, 755.03it/s]
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 287113/287113 [00:29<00:00, 9616.19it/s]
    INFO:explainaboard:caching stats for cnn_dailymail None
    calculating example-level features: 3it [00:00, 51.88it/s]
    calculating token-level features: 3it [00:00, 139.83it/s]
    /home/t/explainaboard-fork/explainaboard/metrics/metric.py:336: DeprecationWarning: Use of keyword argument `alpha` for method `interval` is deprecated. Use first positional argument or keyword argument `confidence` instead.
      return stats_t.interval(
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 349.50it/s]
    ok
    test_generate_system_human_eval (integration_tests.summarization_test.SummarizationTest) ... skipped 'Not yet fixed in v0.11'
    test_load_tsv (integration_tests.summarization_test.SummarizationTest) ... ok
    
    ----------------------------------------------------------------------
    Ran 5 tests in 438.659s
    
    OK (skipped=2)
    python -m unittest -v integration_tests.summarization_test.SummarizationTest  434.35s user 2.58s system 98% cpu 7:22.46 total
    
    opened by tetsuok 12
  • Use 'confidence' instead of deprecated 'alpha' for scipy.stats.t.interval

    Reducing heavy logging uncovered buried DeprecationWarnings in the tests. We get the following DeprecationWarning in tests that invoke the scipy.stats.t.interval method:

    test_hits (explainaboard.tests.test_metric.TestMetric) ... /home/runner/work/ExplainaBoard/ExplainaBoard/explainaboard/metrics/metric.py:338: DeprecationWarning: Use of keyword argument `alpha` for method `interval` is deprecated. Use first positional argument or keyword argument `confidence` instead.
    

    This PR fixes the warning as the warning suggests.
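
    For reference, the change amounts to switching the keyword argument. The values below are illustrative rather than the exact call site in metric.py, and the confidence keyword assumes SciPy >= 1.9:

    import numpy as np
    from scipy.stats import t as stats_t

    samples = np.array([0.71, 0.68, 0.74, 0.70, 0.69])   # illustrative per-bootstrap metric values
    n = len(samples)
    mean = samples.mean()
    se = samples.std(ddof=1) / np.sqrt(n)

    # Deprecated keyword (triggers the DeprecationWarning):
    #   stats_t.interval(alpha=0.95, df=n - 1, loc=mean, scale=se)
    # Replacement:
    low, high = stats_t.interval(confidence=0.95, df=n - 1, loc=mean, scale=se)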

    opened by tetsuok 12
  • Cache pip dependencies to speed up CI

    This PR attempts to speed up both the unit-tests and integration-tests CI jobs. Every CI job spends about 2 minutes installing pip packages. This step accounts for about 90% of the total time of unit-tests and about 30% of the total time of integration-tests. The step can be skipped by creating virtual environments and caching the installed packages in them using actions/cache. Note that actions/setup-python@v4 doesn't support caching installed packages; it only avoids re-downloading by caching packages downloaded from PyPI under ~/.cache/pip.

    Dependencies listed in setup.py are moved to requirements.txt. This is to generate lock files for every Python version from requirements.txt. The generated lock files are used as cache keys so that caches are properly invalidated when dependencies are updated. Unless dependencies change, every CI job should be reproducible (with respect to installing pip dependencies). Making the CI jobs reproducible and faster comes at the expense of periodically updating these lock files. Maintaining lock files for dependencies is pretty common in other programming languages such as JS and Rust. This update can be done by running cicd/gen_requirements_lock.sh.

    opened by tetsuok 12
  • Refactor/loaders

    1. Commit 1: refactored Loader.__init__()
    • made data a required argument
    • all loaders now call the __init__ method of the base loader
    2. Commit 2: implemented file-specific loaders to simplify the task-specific loaders
    • implements TSVFileLoader, JSONFileLoader, DatalabFileLoader, and CoNLLFileLoader, which know how to load a certain type of file given the fields
    • refactored all the existing loaders to use these file-specific loaders instead
    • QAMultipleChoiceLoader and KgLinkTailPredictionLoader still use custom load() methods because they support user-defined features. The way they load these extra features is different, so I decided to leave them for now. It'll be easy to incorporate user-defined features into the file loaders (we just need to update the fields based on self.user_defined_features_configs)
    • hellaswag is removed in https://github.com/neulab/ExplainaBoard/commit/4b93b9542b714754eb91d718cd82b98ab706d11c
    • This refactor makes it easier to do #141 in the future. We just need two sets of file loaders for each task-specific loader: one for the (input, reference_output) file and the other for the predictions file.

    Please let me know what you think! Thanks!

    opened by lyuyangh 12
  • Potential issue with spearman R bootstrapping

    We observed the following test failure when integrating another PR:

    ======================================================================
    FAIL: test_sample_level_spearmanr_bootstrap (integration_tests.meta_eval_wmt_da_test.MetaEvalNLGCITest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/runner/work/ExplainaBoard/ExplainaBoard/integration_tests/meta_eval_wmt_da_test.py", line 191, in test_sample_level_spearmanr_bootstrap
        self.assertAlmostEqual(ci[0], 0.6488, 2)
    AssertionError: 0.7325904563487001 != 0.6488 within 2 places (0.08379045634870008 difference)
    
    ----------------------------------------------------------------------
    

    We are not sure whether this is an issue with the test or the underlying code, but as a temporary measure we reduced the sensitivity of the test. We should go back and check whether this is just due to bootstrapping variance or due to a bug in the test itself.
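
    For context, a minimal sketch of sample-level bootstrapping of Spearman's r (illustrative only, not ExplainaBoard's implementation): unless the resampling is seeded, the interval bounds vary from run to run, which is one way a fixed assertAlmostEqual threshold can become flaky.

    import numpy as np
    from scipy.stats import spearmanr

    def bootstrap_spearman_ci(x, y, n_samples=1000, alpha=0.05, seed=None):
        # Percentile-bootstrap confidence interval for Spearman's r over paired samples.
        rng = np.random.default_rng(seed)
        x, y = np.asarray(x), np.asarray(y)
        stats = []
        for _ in range(n_samples):
            idx = rng.integers(0, len(x), size=len(x))   # resample pairs with replacement
            stats.append(spearmanr(x[idx], y[idx]).correlation)
        return np.quantile(stats, alpha / 2), np.quantile(stats, 1 - alpha / 2)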

    opened by neubig 10
  • Implement CalibrationAnalysis

    Calibration refers to whether a system's confidence is well correlated with whether the system actually got the answer right. It would be nice if we could do analyses related to calibration, such as calculating expected calibration error: https://arxiv.org/abs/1706.04599

    I think this should probably be implemented as an additional variety of analysis, which would be simple and self-contained: https://github.com/neulab/ExplainaBoard/blob/main/explainaboard/analysis/analyses.py#L45
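
    As a rough illustration of the proposed analysis, here is a minimal sketch of expected calibration error over equal-width confidence bins (a standard formulation from the paper above, not ExplainaBoard's CalibrationAnalysis; the function and argument names are illustrative):

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=10):
        # ECE: bucket predictions by confidence, then average |avg confidence - accuracy|
        # over buckets, weighted by the fraction of samples in each bucket.
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        bins = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for i in range(n_bins):
            lo, hi = bins[i], bins[i + 1]
            mask = (confidences > lo) & (confidences <= hi)
            if i == 0:
                mask |= (confidences == 0.0)   # put exactly-zero confidences in the first bin
            if not mask.any():
                continue
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
        return ece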

    good first issue new-analysis 
    opened by neubig 10
  • Correct training set feature field names

    Previously, calculation of training set features would fail if the datalab dataset used unconventional column names.

    This does the following things:

    1. Adds an option to use Loader to load only datasets, without system outputs, when output_data is set to None
    2. Changes _statistics_func to simply take in the samples and system info and return the statistics (in contrast to previously using the datalab aggregating() functionality); a hypothetical sketch follows this list
    3. Loads data used in calculating training features through Loader so that the appropriate field mapping is performed
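
    A hypothetical sketch of the new-style statistics function described in "2." (the field name, the returned statistics, and the function name are illustrative, not the actual ExplainaBoard interface):

    from collections import Counter

    def statistics_func(samples, sys_info):
        # Compute training-set statistics directly from an iterable of samples,
        # with no dependency on datalab's aggregating() decorator.
        vocab = Counter()
        for sample in samples:
            # "text" is an assumed field name after the Loader's field mapping.
            vocab.update(sample["text"].split())
        return {"vocab": dict(vocab), "vocab_size": len(vocab)}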

    Fixes https://github.com/neulab/ExplainaBoard/issues/416

    Notably, @pfliu-nlp, "2." may require some discussion; here are the pros and cons of doing it this new way:

    Pros

    • it makes the statistics code self-contained and not reliant on an external library. Honestly, even though I'm very familiar with explainaboard, I was always a bit confused about what was actually going on here because the aggregating() decorator was a bit mysterious to me
    • statistics_func can now be called on any set of samples, so it could be called on a non-datalab dataset. This may be useful if we want to, for example, calculate training set features with custom datasets

    Cons

    • the datalab aggregating operator may have implemented parallelism, so this aggregation of statistics might be faster there, but I'm not sure that's actually the case in practice
    • something else I'm missing?
    opened by neubig 9
  • Unsafe en_core_web_sm downloading in setup.py

    Currently setup.py executes an external command, python -m spacy download en_core_web_sm, to install a spaCy model during setup. This approach has several issues regarding system consistency:

    • spaCy models are intentionally not registered on PyPI, and PyPI does not allow libraries to depend on external requirements.
    • The command is just a system command, which may break the environment or not work correctly.

    Since there is no recommended way to add spaCy models to install_requires, we need to take one of the following approaches:

    • Download the model programmatically when spacy.load() fails (a minimal sketch of this option follows this list).
    • Bundle the model file into this repository.
    • Ask users to download appropriate models additionally.
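
    A minimal sketch of the download-on-failure option (assuming the en_core_web_sm model; this is just one of the options above, not a decided approach):

    import spacy
    from spacy.cli import download

    def load_spacy_model(name: str = "en_core_web_sm"):
        # Load a spaCy model, fetching it on first use instead of at setup time.
        try:
            return spacy.load(name)
        except OSError:
            download(name)
            return spacy.load(name)
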
    opened by odashi 9
  • How to name metrics when registering them

    There are two ways to name metrics

    (1)

    
    @dataclass
    @metric_config_registry.register("AccuracyConfig")
    class AccuracyConfig(MetricConfig):
        def to_metric(self):
            return Accuracy(self)
    
    

    (2)

    @dataclass
    @metric_config_registry.register("Accuracy")
    class AccuracyConfig(MetricConfig):
        def to_metric(self):
            return Accuracy(self)
    
    

    Currently, we are using (1), which, however, is inconsistent with how the Processor names them. For example:

    https://github.com/neulab/ExplainaBoard/blob/cd54c1b61e490295db8c1cfee8460aff4cce1880/explainaboard/processors/text_classification.py#L132

    Which one do you prefer?

    If we go with (2), this code should be modified to avoid a naming bug: https://github.com/neulab/ExplainaBoard/blob/cd54c1b61e490295db8c1cfee8460aff4cce1880/explainaboard/metrics/registry.py#L11

    config_cls = metric_config_registry.get_type(dikt["name"]) # instead of type
    

    I could send a PR of this.

    opened by pfliu-nlp 8
  • add tests for meval to replicate paper results

    Overview

    This PR adds tests to verify whether our implemented meta-evaluation processor is able to replicate reported results from existing published papers.

    Relevant issue: https://github.com/inspired-co/taskboard/issues/180

    Details

    • Collect system outputs for two metrics (rouge1 and bartscore) from this repo
    • Use ExplainaBoard to process these outputs and compare the results with the ones reported in the above repo

    References

    • Paper: BARTScore: Evaluating Generated Text as Text Generation
    • Code: https://github.com/neulab/BARTScore
    opened by pfliu-nlp 0
  • `TypeError: 'type' object is not subscriptable` when attempt to import or use CLI

    How did I install it?

    pip install explainaboard
    or
    pip install -U --force-reinstall explainaboard
    

    Both cause the same problem.

    Version: 0.12.3

    When I try to import explainaboard or run explainaboard from the CLI, I get the same error:

    Python 3.8.15 (default, Nov 24 2022, 15:19:38) 
    [GCC 11.2.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import explainaboard
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/__init__.py", line 6, in <module>
        from explainaboard.loaders import DatalabLoaderOption, get_loader_class
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/loaders/__init__.py", line 5, in <module>
        from explainaboard.loaders import file_loader, loader_factory
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/loaders/file_loader.py", line 18, in <module>
        from explainaboard.analysis.analyses import Analysis
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/analysis/analyses.py", line 14, in <module>
        from explainaboard.analysis.bucketing import get_bucketing_method
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/analysis/bucketing.py", line 13, in <module>
        from explainaboard.serialization.types import SerializableData
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/serialization/__init__.py", line 8, in <module>
        from explainaboard.serialization.types import Serializable
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/serialization/types.py", line 21, in <module>
        list["PrimitiveData"],  # type: ignore
    TypeError: 'type' object is not subscriptable
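
    For context, the failing line subscripts the built-in list type at runtime (list["PrimitiveData"]), which is only supported on Python 3.9+; since this is a runtime type-alias expression rather than an annotation, from __future__ import annotations would not help. A Python 3.8-compatible sketch (not necessarily the project's actual fix) spells the alias with typing generics:

    from typing import Dict, List, Tuple, Union

    # Recursive alias using typing generics, which are subscriptable on Python 3.8.
    PrimitiveData = Union[
        None,
        bool,
        int,
        float,
        str,
        List["PrimitiveData"],         # list["PrimitiveData"] needs Python >= 3.9
        Tuple["PrimitiveData", ...],
        Dict[str, "PrimitiveData"],
    ]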
    
    
    opened by ttpro1995 0
  • Bump mypy version to 0.990

    Since mypy 0.990 was released yesterday (blog post), it would be better to bump the mypy version to 0.990 to take advantage of the new features and bug fixes. It seems some effort will be needed to adopt this version: running mypy 0.990 on the explainaboard codebase produces the errors below. This is the output of pre-commit run mypy --color=never --all-files:

    mypy.....................................................................Failed
    - hook id: mypy
    - exit code: 1
    
    explainaboard/utils/spacy_loader.py:5: error: Cannot find implementation or library stub for module named "spacy"  [import]
    explainaboard/utils/spacy_loader.py:6: error: Cannot find implementation or library stub for module named "spacy.language"  [import]
    explainaboard/utils/agreement.py:5: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/sum_attribute.py:8: error: Cannot find implementation or library stub for module named "nltk"  [import]
    explainaboard/analysis/sum_attribute.py:10: error: Cannot find implementation or library stub for module named "nltk.util"  [import]
    explainaboard/utils/async_eaas.py:10: error: Cannot find implementation or library stub for module named "eaas"  [import]
    explainaboard/third_party/text_to_sql_test_suit_eval/parse.py:7: error: Cannot find implementation or library stub for module named "sqlparse"  [import]
    explainaboard/third_party/text_to_sql_test_suit_eval/parse.py:8: error: Cannot find implementation or library stub for module named "sqlparse.sql"  [import]
    explainaboard/third_party/text_to_sql_test_suit_eval/parse.py:9: error: Cannot find implementation or library stub for module named "sqlparse.tokens"  [import]
    setup.py:3: error: Skipping analyzing "setuptools": module is installed, but missing library stubs or py.typed marker  [import]
    explainaboard/metrics/auxiliary/qa_table_text_hybrid_auxiliary.py:16: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/auxiliary/qa_table_text_hybrid_auxiliary.py:17: error: Cannot find implementation or library stub for module named "scipy.optimize"  [import]
    explainaboard/utils/logging.py:9: error: Library stubs not installed for "tqdm"  [import]
    explainaboard/utils/logging.py:9: note: Hint: "python3 -m pip install types-tqdm"
    explainaboard/utils/logging.py:9: note: (or run "mypy --install-types" to install all missing stub packages)
    explainaboard/utils/logging.py:16: error: Incompatible default for argument "desc" (default has type "None", argument has type "str")  [assignment]
    explainaboard/utils/logging.py:16: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
    explainaboard/utils/logging.py:16: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
    explainaboard/visualizers/bar_chart.py:8: error: Cannot find implementation or library stub for module named "matplotlib"  [import]
    explainaboard/visualizers/bar_chart.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/bucketing.py:10: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/feature.py:239: error: Incompatible types in assignment (expression has type "Dict[str, FeatureType]", target has type "SerializableData")  [assignment]
    explainaboard/utils/agreement_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/utils/typing_utils_test.py:10: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/serialization/serializers.py:53: error: Incompatible return value type (got "Union[List[Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]], Tuple[Union[None, bool, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable], ...]]", expected "Union[None, bool, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]")  [return-value]
    explainaboard/serialization/serializers.py:53: error: Generator has incompatible item type "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"; expected "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [misc]
    explainaboard/serialization/serializers.py:89: error: Incompatible return value type (got "Union[List[Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]], Tuple[Union[None, bool, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]], ...]]", expected "Union[None, bool, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]")  [return-value]
    explainaboard/serialization/serializers.py:89: error: Generator has incompatible item type "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [misc]
    explainaboard/utils/tensor_analysis.py:12: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/metric.py:10: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/metric.py:11: error: Cannot find implementation or library stub for module named "scipy.stats"  [import]
    explainaboard/metrics/metric.py:178: error: Dict entry 0 has incompatible type "str": "Dict[str, MetricValue]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/metrics/metric.py:196: error: Argument 1 to "MetricResult" has incompatible type "Dict[str, Union[None, bool, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]]"; expected "Dict[str, MetricValue]"  [arg-type]
    explainaboard/third_party/text_to_sql_test_suit_eval/process_sql.py:30: error: Cannot find implementation or library stub for module named "nltk"  [import]
    explainaboard/utils/tokenizer.py:15: error: Cannot find implementation or library stub for module named "sacrebleu.tokenizers"  [import]
    explainaboard/utils/tokenizer.py:16: error: Cannot find implementation or library stub for module named "sacrebleu.tokenizers.tokenizer_intl"  [import]
    explainaboard/utils/tokenizer.py:17: error: Cannot find implementation or library stub for module named "sacrebleu.tokenizers.tokenizer_ja_mecab"  [import]
    explainaboard/utils/tokenizer.py:18: error: Cannot find implementation or library stub for module named "sacrebleu.tokenizers.tokenizer_zh"  [import]
    explainaboard/metrics/continuous.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/metric_test.py:10: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/external_eval.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/meta_evaluation.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/meta_evaluation.py:9: error: Cannot find implementation or library stub for module named "scipy"  [import]
    explainaboard/analysis/feature_test.py:69: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/feature_test.py:134: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/feature_test.py:205: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:230: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:231: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:232: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:233: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Collection[str]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:234: error: List item 0 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:234: error: List item 1 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:234: error: List item 2 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:235: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Tuple[Dict[str, object], Dict[str, object], Dict[str, object]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:237: error: Dict entry 0 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:237: error: Dict entry 1 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:237: error: Dict entry 2 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:240: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:241: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:242: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:243: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Collection[str]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:244: error: List item 0 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:244: error: List item 1 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:244: error: List item 2 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:245: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Tuple[Dict[str, object], Dict[str, object], Dict[str, object]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:247: error: Dict entry 0 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:247: error: Dict entry 1 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:247: error: Dict entry 2 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/metrics/eaas.py:9: error: Cannot find implementation or library stub for module named "eaas.async_client"  [import]
    explainaboard/metrics/eaas.py:10: error: Cannot find implementation or library stub for module named "eaas.config"  [import]
    explainaboard/metrics/eaas.py:11: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/eaas.py:12: error: Cannot find implementation or library stub for module named "sacrebleu"  [import]
    explainaboard/metrics/eaas.py:13: error: Cannot find implementation or library stub for module named "sacrebleu.metrics.base"  [import]
    explainaboard/metrics/eaas.py:13: error: Cannot find implementation or library stub for module named "sacrebleu.metrics"  [import]
    explainaboard/metrics/ranking.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/performance.py:51: error: Dict entry 1 has incompatible type "str": "List[int]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/performance.py:52: error: Dict entry 2 has incompatible type "str": "Dict[str, MetricResult]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/performance.py:72: error: Argument 1 to "float" has incompatible type "Union[str, None, int, float, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"; expected "Union[SupportsFloat, SupportsIndex, str, bytes, bytearray, memoryview, array[Any], mmap, _CData, PickleBuffer]"  [arg-type]
    explainaboard/analysis/performance.py:73: error: Argument 1 to "float" has incompatible type "Union[str, None, int, float, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"; expected "Union[SupportsFloat, SupportsIndex, str, bytes, bytearray, memoryview, array[Any], mmap, _CData, PickleBuffer]"  [arg-type]
    explainaboard/metrics/log_prob.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/accuracy.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/external_eval_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/performance_test.py:219: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/performance_test.py:241: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/metrics/qa_table_text_hybrid.py:10: error: Cannot find implementation or library stub for module named "numpy"  [import]
    integration_tests/meta_eval_nlg_test.py:5: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/accuracy_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/analyses.py:12: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/analyses.py:245: error: Dict entry 0 has incompatible type "str": "List[BucketPerformance]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:446: error: Dict entry 0 has incompatible type "str": "List[BucketPerformance]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:563: error: Argument "bucket_setting" to "__call__" of "BucketingFn" has incompatible type "List[Tuple[float, float]]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses.py:563: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses.py:563: note: Consider using "Sequence" instead, which is covariant
    explainaboard/analysis/analyses.py:658: error: Dict entry 2 has incompatible type "str": "List[int]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:722: error: Dict entry 1 has incompatible type "str": "List[ComboOccurence]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:841: error: Dict entry 1 has incompatible type "str": "Dict[str, FeatureType]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:842: error: Dict entry 2 has incompatible type "str": "Dict[str, MetricConfig]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/metrics/extractive_qa.py:11: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/analyses_test.py:90: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Collection[str]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:237: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[BucketPerformance]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:237: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:237: note: Consider using "Sequence" instead, which is covariant
    explainaboard/analysis/analyses_test.py:266: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:280: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:321: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[ComboOccurence]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:321: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:321: note: Consider using "Sequence" instead, which is covariant
    explainaboard/analysis/analyses_test.py:328: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Union[Sequence[str], None, int, float, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:350: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:477: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[BucketPerformance]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:477: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:477: note: Consider using "Sequence" instead, which is covariant
    explainaboard/analysis/analyses_test.py:507: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:518: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "Dict[str, FeatureType]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:518: note: "Dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:518: note: Consider using "Mapping" instead, which is covariant in the value type
    explainaboard/analysis/analyses_test.py:519: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "Dict[str, MetricConfig]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:519: note: "Dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:519: note: Consider using "Mapping" instead, which is covariant in the value type
    explainaboard/analysis/result.py:33: error: Dict entry 0 has incompatible type "str": "Dict[str, Dict[str, MetricResult]]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/result.py:34: error: Dict entry 1 has incompatible type "str": "List[AnalysisResult]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/loaders/file_loader.py:15: error: Cannot find implementation or library stub for module named "datalabs"  [import]
    explainaboard/loaders/file_loader.py:16: error: Cannot find implementation or library stub for module named "datalabs.features.features"  [import]
    explainaboard/loaders/file_loader.py:212: error: Incompatible default for argument "fields" (default has type "None", argument has type "List[FileLoaderField]")  [assignment]
    explainaboard/loaders/file_loader.py:212: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
    explainaboard/loaders/file_loader.py:212: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
    explainaboard/loaders/file_loader.py:475: error: Incompatible default for argument "fields" (default has type "None", argument has type "List[FileLoaderField]")  [assignment]
    explainaboard/loaders/file_loader.py:475: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
    explainaboard/loaders/file_loader.py:475: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
    explainaboard/loaders/file_loader.py:522: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/analysis/result_test.py:35: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Dict[str, MetricResult]]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/result_test.py:36: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[AnalysisResult]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/result_test.py:36: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/result_test.py:36: note: Consider using "Sequence" instead, which is covariant
    explainaboard/third_party/text_to_sql_test_suit_eval/exec_eval.py:11: error: Library stubs not installed for "tqdm"  [import]
    explainaboard/info.py:186: error: Dict entry 11 has incompatible type "str": "List[AnalysisLevel]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/info.py:187: error: Dict entry 12 has incompatible type "str": "List[Analysis]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/info.py:260: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/feature_funcs.py:8: error: Cannot find implementation or library stub for module named "lexicalrichness"  [import]
    explainaboard/analysis/feature_funcs.py:8: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
    explainaboard/analysis/feature_funcs.py:9: error: Cannot find implementation or library stub for module named "sacrebleu"  [import]
    explainaboard/meta_analyses/ranking.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/meta_analyses/ranking.py:9: error: Cannot find implementation or library stub for module named "pandas"  [import]
    explainaboard/metrics/f1_score.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/processors/processor.py:9: error: Cannot find implementation or library stub for module named "eaas.async_client"  [import]
    explainaboard/processors/processor.py:10: error: Cannot find implementation or library stub for module named "eaas.config"  [import]
    explainaboard/processors/sequence_labeling.py:43: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/processors/argument_pair_extraction.py:34: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/processors/qa_tat.py:7: error: Cannot find implementation or library stub for module named "datalabs"  [import]
    explainaboard/processors/language_modeling.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/processors/conditional_generation.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/processors/cloze_generative.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/processors/summarization.py:8: error: Cannot find implementation or library stub for module named "datalabs.operations.featurize.plugins.summarization.sum_attribute"  [import]
    integration_tests/summarization_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    integration_tests/meta_eval_wmt_da_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/text_to_sql.py:11: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/f1_score_test.py:7: error: Cannot find implementation or library stub for module named "sklearn.metrics"  [import]
    explainaboard/visualizers/draw_charts.py:24: error: Cannot find implementation or library stub for module named "matplotlib"  [import]
    explainaboard/visualizers/draw_charts.py:25: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/info_test.py:116: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[AnalysisLevel]"; expected "SerializableData"  [arg-type]
    explainaboard/info_test.py:116: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/info_test.py:116: note: Consider using "Sequence" instead, which is covariant
    explainaboard/info_test.py:117: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[Analysis]"; expected "SerializableData"  [arg-type]
    explainaboard/info_test.py:117: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/info_test.py:117: note: Consider using "Sequence" instead, which is covariant
    explainaboard/info_test.py:160: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Union[Collection[str], None, int, float, List[PrimitiveData], Tuple[PrimitiveData, ...]]]"; expected "PrimitiveData"  [arg-type]
    integration_tests/metric_test.py:6: error: Cannot find implementation or library stub for module named "eaas"  [import]
    integration_tests/metric_test.py:7: error: Cannot find implementation or library stub for module named "eaas.async_client"  [import]
    integration_tests/metric_test.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/explainaboard_main.py:10: error: Cannot find implementation or library stub for module named "eaas.endpoint"  [import]
    explainaboard/explainaboard_main.py:10: error: Cannot find implementation or library stub for module named "eaas"  [import]
    explainaboard/explainaboard_main.py:89: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:90: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:91: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:92: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:93: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:94: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:364: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:365: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:367: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:368: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:369: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:370: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:371: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:390: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:401: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:402: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:403: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:404: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:405: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:406: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:407: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:408: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:499: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    integration_tests/cli_test.py:10: error: Cannot find implementation or library stub for module named "datalabs"  [import]
    Found 141 errors in 59 files (checked 231 source files)
    
    opened by tetsuok 0
  • add_tasks.md is out of date

    It seems add_tasks.md is out of date. It mentions tasks.py in the three places below:

    • https://github.com/neulab/ExplainaBoard/blame/fcedd5d7aab172b943c6b0025685b09744f149fd/docs/add_new_tasks.md#L6
    • https://github.com/neulab/ExplainaBoard/blame/fcedd5d7aab172b943c6b0025685b09744f149fd/docs/add_new_tasks.md#L12
    • https://github.com/neulab/ExplainaBoard/blame/fcedd5d7aab172b943c6b0025685b09744f149fd/docs/add_new_tasks.md#L133

    but that Python script was removed in #373, so add_tasks.md needs to be updated accordingly.

    opened by tetsuok 0
  • Add system metadata class

    Processor.process() takes metadata, which is used to directly initialize SysOutputInfo. However, these are essentially different kinds of data (specifically, "metadata" ⊂ SysOutputInfo, but they are not equal), and the current implementation causes some confusion around this:

    The most significant abuse of this behavior is that FileLoaderMetadata is implicitly converted into SysOutputInfo. This shouldn't work without an explicit conversion: https://github.com/neulab/ExplainaBoard/blob/4cec0a01cbe2617e9a67a440be25ee4252f792b2/integration_tests/ner_test.py#L148-L154

    To this end, we need:

    • A struct defining the system metadata (a hypothetical sketch follows this list).
    • A change to Processor so that it takes the system metadata, not a dict.
    • Either:
      • a conversion method between system metadata and FileLoaderReturn/SysOutputInfo, or
      • inclusion of system metadata as a direct member of FileLoaderReturn/SysOutputInfo.
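
    A hypothetical sketch of such a struct and an explicit conversion step (class, field, and function names are illustrative, not an agreed-upon API):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class SystemMetadata:
        # Typed container for user-supplied metadata about a system output.
        system_name: str = ""
        dataset_name: Optional[str] = None
        sub_dataset_name: Optional[str] = None
        split_name: Optional[str] = None

    def sys_output_info_fields(metadata: SystemMetadata) -> dict:
        # Explicit conversion used to initialize SysOutputInfo, replacing the implicit one.
        return {
            "system_name": metadata.system_name,
            "dataset_name": metadata.dataset_name,
            "sub_dataset_name": metadata.sub_dataset_name,
            "split_name": metadata.split_name,
        }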
    opened by odashi 3
  • Reconsider default number of buckets

    Currently the default number of buckets is 4: https://github.com/neulab/ExplainaBoard/blob/38db95801cbd15e2e9b2db7b60c40bd7173e1deb/explainaboard/analysis/analyses.py#L117

    But this is probably too few when we're doing discrete bucketing. It'd probably be better to have the default be 4 for continuous and more (maybe 10) for discrete bucketing.

    opened by neubig 0
Releases (v0.8.5)
  • v0.8.5 (Apr 2, 2022)

    This release:

    • Refactors the metrics class and the report structure.
    • Adds significance tests to all metrics.
    • Does major code style improvements and adds type checking.
    • Fixes several bugs.
Owner
NeuLab
Graham Neubig's Lab at LTI/CMU