An Analysis Toolkit for Natural Language Generation (Translation, Captioning, Summarization, etc.)

Overview



VizSeq is a Python toolkit for visual analysis of text generation tasks such as machine translation, summarization, image captioning, speech translation, and video description. It takes multimodal sources and text references, as well as text predictions, as input and analyzes them visually in a Jupyter Notebook or a built-in web app (the former has Fairseq integration). VizSeq also provides a collection of multi-process scorers usable as a standard Python package.

[Paper] [Documentation] [Blog]


Task Coverage

Source | Example tasks
Text | Machine translation, text summarization, dialog generation, grammatical error correction, open-domain question answering
Image | Image captioning, image question answering, optical character recognition
Audio | Speech recognition, speech translation
Video | Video description
Multimodal | Multimodal machine translation

Metric Coverage

Accelerated with multi-processing/multi-threading.

Type | Metrics
N-gram-based | BLEU (Papineni et al., 2002), NIST (Doddington, 2002), METEOR (Banerjee and Lavie, 2005), TER (Snover et al., 2006), RIBES (Isozaki et al., 2010), chrF (Popović, 2015), GLEU (Wu et al., 2016), ROUGE (Lin, 2004), CIDEr (Vedantam et al., 2015), WER
Embedding-based | LASER (Artetxe and Schwenk, 2018), BERTScore (Zhang et al., 2019)
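
For programmatic use, below is a minimal sketch of the scorer API, adapted from the official scorer example (https://facebookresearch.github.io/vizseq/docs/getting_started/scorer_example/); the constructor flags shown are assumptions based on that example.

from vizseq.scorers.bleu import BLEUScorer

# corpus_level/sent_level flags follow the documented example (assumed; defaults may differ)
scorer = BLEUScorer(corpus_level=True, sent_level=True)

hypo = ['This is a sample #1 prediction.', 'This is a sample #2 model prediction.']
ref = [['This is a sample #1 reference.', 'This is a sample #2 reference.']]
tags = [['Test Group 1', 'Test Group 2']]

scores = scorer.score(hypo, ref, tags=tags)
print(scores.corpus_score)  # corpus-level BLEU
print(scores.sent_scores)   # sentence-level BLEU, one score per example
print(scores.group_scores)  # BLEU aggregated per tag group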

Getting Started

Installation

VizSeq requires Python 3.6+ and currently runs on Unix/Linux and macOS/OS X. Windows support is planned.

You can install VizSeq from the PyPI repository:

$ pip install vizseq

Or install it from source:

$ git clone https://github.com/facebookresearch/vizseq
$ cd vizseq
$ pip install -e .
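
As a quick sanity check that the package is importable:

$ python -c "import vizseq"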

Documentation

Jupyter Notebook Examples

Fairseq integration
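
Below is a minimal sketch of the notebook workflow; the function names appear in vizseq.ipynb, while the exact signatures and the example-data directory name are assumptions based on the project examples.

import vizseq
from glob import glob

# Paths assume the example data fetched by get_example_data.sh (see the Web App Example below);
# the directory name is an assumption based on the project examples.
root = 'examples/data/translation_wmt14_en_de_test'
src = glob(f'{root}/src_*.txt')
ref = glob(f'{root}/ref_*.txt')
hypo = glob(f'{root}/pred_*.txt')

vizseq.view_stats(src, ref)                     # dataset statistics
vizseq.view_n_grams(src)                        # most frequent n-grams
vizseq.view_examples(src, ref, hypo, ['bleu'])  # per-example visualization
vizseq.view_scores(ref, hypo, ['bleu'])         # corpus-/sentence-/group-level scores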

Web App Example

Download example data:

$ git clone https://github.com/facebookresearch/vizseq
$ cd vizseq
$ bash get_example_data.sh

Launch the web server:

$ python -m vizseq.server --port 9001 --data-root ./examples/data

Then navigate to the following URL in your web browser:

http://localhost:9001

License

VizSeq is licensed under MIT. See the LICENSE file for details.

Citation

Please cite as:

@inproceedings{wang2019vizseq,
  title = {VizSeq: A Visual Analysis Toolkit for Text Generation Tasks},
  author = {Changhan Wang and Anirudh Jain and Danlu Chen and Jiatao Gu},
  booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  year = {2019},
}

Contact

Changhan Wang ([email protected]), Jiatao Gu ([email protected])

Comments
  • [Bug] - cannot import name 'tokenize_13a' from 'sacrebleu'


    🐛 Bug

    I just followed the installation steps and got this error.

    To reproduce


    Stack trace/error message

    (base) diegomoussallem@Diegos-MBP examples % python -m vizseq.server --port 9001 --data-root examples/data
    Traceback (most recent call last):
      File "/opt/anaconda3/lib/python3.7/runpy.py", line 183, in _run_module_as_main
        mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
      File "/opt/anaconda3/lib/python3.7/runpy.py", line 109, in _get_module_details
        __import__(pkg_name)
      File "/Users/diegomoussallem/Desktop/vizseq/vizseq/__init__.py", line 15, in <module>
        from vizseq.ipynb import *
      File "/Users/diegomoussallem/Desktop/vizseq/vizseq/ipynb/__init__.py", line 8, in <module>
        from .core import (view_examples, view_n_grams, view_stats, view_scores,
      File "/Users/diegomoussallem/Desktop/vizseq/vizseq/ipynb/core.py", line 15, in <module>
        from vizseq._data import (VizSeqDataSources, PathOrPathsOrDictOfStrList,
      File "/Users/diegomoussallem/Desktop/vizseq/vizseq/_data/__init__.py", line 14, in <module>
        from .config_manager import VizSeqTaskConfigManager, VizSeqGlobalConfigManager
      File "/Users/diegomoussallem/Desktop/vizseq/vizseq/_data/config_manager.py", line 13, in <module>
        from .tokenizers import VizSeqTokenization
      File "/Users/diegomoussallem/Desktop/vizseq/vizseq/_data/tokenizers.py", line 10, in <module>
        from sacrebleu import tokenize_13a, tokenize_v14_international, tokenize_zh
    ImportError: cannot import name 'tokenize_13a' from 'sacrebleu' (/opt/anaconda3/lib/python3.7/site-packages/sacrebleu/__init__.py)
    


    bug 
    opened by DiegoMoussallem 4
  • [Question] Calculated BLEU score


    Hi :)

    The tool returns a BLEU score for the machine translation and runs great in general, but I am not sure whether the BLEU score is sentence-level or corpus-level. I haven't been able to gather anything conclusive from the sacreBLEU implementation, so I am hoping you can help me with this :)

    Best regards, Tobias

    opened by Tojens 3
  • [Bug] Multiple references


    With multiple references (attached muti_refs.zip), I cannot configure the BLEU metric (though I can configure GLEU); I get a 500 Internal Server Error.

    This does not happen with a single reference (see attached single_ref.zip).

    opened by nadjet 3
  • Question about `tag` and `group` in official example


    In the official scorer example from https://facebookresearch.github.io/vizseq/docs/getting_started/scorer_example/, the second block confuses me.

    Corpus-level BLEU: 67.945
    Sentence-level BLEU: [75.984, 61.479]
    Group BLEU: {'Test Group 2': 75.984, 'Test Group 1': 75.984}
    

    I can see two generated sentences with corresponding reference sentences in the first block.

    ref = [['This is a sample #1 reference.', 'This is a sample #2 reference.']]
    hypo = ['This is a sample #1 prediction.', 'This is a sample #2 model prediction.']
    tags = [['Test Group 1', 'Test Group 2']]
    scores = scorer.score(hypo, ref, tags=tags)
    print(f'Corpus-level BLEU: {scores.corpus_score}')
    print(f'Sentence-level BLEU: {scores.sent_scores}')
    print(f'Group BLEU: {scores.group_scores}')
    

    The first sample belongs to Test Group 1 and the second sample belongs to Test Group 2. If I'm not misunderstanding the use of the tags, then according to the sentence-level BLEU, the Group BLEU should be {'Test Group 2': 61.479, 'Test Group 1': 75.984}.

    But the execution result is Group BLEU: {'Test Group 2': 75.984, 'Test Group 1': 75.984}.

    opened by YKX-A 2
  • Support for non-ascii chars


    Support for non-ascii chars

    Motivation

    The current version cannot open non-ASCII files.

    Have you read the Contributing Guidelines on pull requests?

    Yes

    Test Plan

    Opening some non-ASCII files may help to test.


    CLA Signed 
    opened by fuzihaofzh 2
  • 🐛 Uncaught exception GET


    🐛 Bug

    When trying to run the webapp with the example data, I get this error:

    Uncaught exception GET

    To reproduce

    Follow the README instructions: download the example data and run:

    python -m vizseq.server --port 9001 --data-root ./examples/data

    The server starts fine, but when I access the webapp at localhost:9001, I only see 500: Internal Server Error.

    Stack trace/error message

    INFO - 11/04/19 10:36:39 - 0:00:00 - Application Started
    You can navigate to http://localhost:9001
    ERROR - 11/04/19 10:36:42 - 0:00:03 - Uncaught exception GET / (192.168.0.30)
                                          HTTPServerRequest(protocol='http', host='192.168.0.231:9001', method='GET', uri='/', version='HTTP/1.1', remote_ip='192.168.0.30')
    ERROR - 11/04/19 10:36:42 - 0:00:03 - 500 GET / (192.168.0.30) 1.18ms
    

    Expected Behavior

    The webapp runs normally.

    System information

    • VizSeq version: 0.1.2
    • Python version: 3.6.8
    • Operating system: Ubuntu 16.04
    bug 
    opened by astariul 2
  • View Scores NoneType not subscriptable


    When running the following for metrics other than BLEU, such as ROUGE:

    vizseq.view_scores(ref, hypo, ["metric that's not bleu"], tags=tag)

    I am getting:

    ~/.local/lib/python3.6/site-packages/vizseq/scorers/__init__.py in _score_multiprocess_averaged(self, hypothesis, references, tags, sent_score_func)
        170             for t in tag_set:
        171                 indices = [i for i, cur in enumerate(tags) if t in cur]
    --> 172                 group_scores[t] = np.mean([sent_scores[i] for i in indices])
        173 
        174         return VizSeqScore.make(
    
    ~/.local/lib/python3.6/site-packages/vizseq/scorers/__init__.py in <listcomp>(.0)
        170             for t in tag_set:
        171                 indices = [i for i, cur in enumerate(tags) if t in cur]
    --> 172                 group_scores[t] = np.mean([sent_scores[i] for i in indices])
        173 
        174         return VizSeqScore.make(
    
    TypeError: 'NoneType' object is not subscriptable
    

    Any idea why this is going on? The text data used is at least 3 tokens or more per example.

    This works fine when running view_examples.

    opened by smart-patrol 1
  • 🐛 TypeError: score() got an unexpected keyword argument 'bert'


    🐛 Bug

    I tried to apply BertScore to my data, but received this error:

    TypeError: score() got an unexpected keyword argument 'bert'

    To reproduce

    In the configuration, select BertScore as a metric.
    Refresh the page.

    Stack trace/error message

    Traceback (most recent call last):
      File "/home/me/.venv/presum/lib/python3.6/site-packages/tornado/web.py", line 1590, in _execute
        result = method(*self.path_args, **self.path_kwargs)
      File "/home/me/workspace/vizseq/vizseq/server.py", line 103, in get
        pd = wv.get_page_data()
      File "/home/me/workspace/vizseq/vizseq/_view/web_view.py", line 158, in get_page_data
        sorting_metric=self.sorting_metric, need_lang_tags=True
      File "/home/me/workspace/vizseq/vizseq/_view/data_view.py", line 132, in get
        for s in metrics
      File "/home/me/workspace/vizseq/vizseq/_view/data_view.py", line 132, in <dictcomp>
        for s in metrics
      File "/home/me/workspace/vizseq/vizseq/_view/data_view.py", line 130, in <dictcomp>
        for m, hh in cur_hypo.items()
      File "/home/me/workspace/vizseq/vizseq/scorers/bert_score.py", line 28, in score
        no_idf=True, verbose=self.verbose
    TypeError: score() got an unexpected keyword argument 'bert'
    

    Expected Behavior

    Able to see BertScore.

    System information

    • VizSeq version: 0.1.2
    • Python version: 3.6.8
    • Operating system: Ubuntu 16.04
    bug 
    opened by astariul 1
  • 🐛 AttributeError: 'VizSeqLogger' object has no attribute 'set_console_mode'


    🐛 Bug

    When trying to run the webapp with the example data, I get this error:

    AttributeError: 'VizSeqLogger' object has no attribute 'set_console_mode'

    To reproduce

    Follow the README instructions: download the example data and run:

    python -m vizseq.server --port 9001 --data-root ./examples/data

    Stack trace/error message

    Traceback (most recent call last):
      File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/me/workspace/vizseq/vizseq/server.py", line 14, in <module>
        logger.set_console_mode(enable=True)
    AttributeError: 'VizSeqLogger' object has no attribute 'set_console_mode'
    

    Expected Behavior

    The code runs normally.

    System information

    • VizSeq version: 0.1.2
    • Python version: 3.6.8
    • Operating system: Ubuntu 16.04
    bug 
    opened by astariul 1
  • Bump qs from 6.5.2 to 6.5.3 in /website


    Bumps qs from 6.5.2 to 6.5.3.

    Changelog

    Sourced from qs's changelog.

    6.5.3

    • [Fix] parse: ignore __proto__ keys (#428)
    • [Fix] utils.merge: avoid a crash with a null target and a truthy non-array source
    • [Fix] correctly parse nested arrays
    • [Fix] stringify: fix a crash with strictNullHandling and a custom filter/serializeDate (#279)
    • [Fix] utils: merge: fix crash when source is a truthy primitive & no options are provided
    • [Fix] when parseArrays is false, properly handle keys ending in []
    • [Fix] fix for an impossible situation: when the formatter is called with a non-string value
    • [Fix] utils.merge: avoid a crash with a null target and an array source
    • [Refactor] utils: reduce observable [[Get]]s
    • [Refactor] use cached Array.isArray
    • [Refactor] stringify: Avoid arr = arr.concat(...), push to the existing instance (#269)
    • [Refactor] parse: only need to reassign the var once
    • [Robustness] stringify: avoid relying on a global undefined (#427)
    • [readme] remove travis badge; add github actions/codecov badges; update URLs
    • [Docs] Clean up license text so it’s properly detected as BSD-3-Clause
    • [Docs] Clarify the need for "arrayLimit" option
    • [meta] fix README.md (#399)
    • [meta] add FUNDING.yml
    • [actions] backport actions from main
    • [Tests] always use String(x) over x.toString()
    • [Tests] remove nonexistent tape option
    • [Dev Deps] backport from main
    Commits
    • 298bfa5 v6.5.3
    • ed0f5dc [Fix] parse: ignore __proto__ keys (#428)
    • 691e739 [Robustness] stringify: avoid relying on a global undefined (#427)
    • 1072d57 [readme] remove travis badge; add github actions/codecov badges; update URLs
    • 12ac1c4 [meta] fix README.md (#399)
    • 0338716 [actions] backport actions from main
    • 5639c20 Clean up license text so it’s properly detected as BSD-3-Clause
    • 51b8a0b add FUNDING.yml
    • 45f6759 [Fix] fix for an impossible situation: when the formatter is called with a no...
    • f814a7f [Dev Deps] backport from main
    • Additional commits viewable in compare view


    CLA Signed dependencies 
    opened by dependabot[bot] 0
  • Bump decode-uri-component from 0.2.0 to 0.2.2 in /website


    Bumps decode-uri-component from 0.2.0 to 0.2.2.

    Release notes

    Sourced from decode-uri-component's releases.

    v0.2.2

    • Prevent overwriting previously decoded tokens 980e0bf

    https://github.com/SamVerschueren/decode-uri-component/compare/v0.2.1...v0.2.2

    v0.2.1

    • Switch to GitHub workflows 76abc93
    • Fix issue where decode throws - fixes #6 746ca5d
    • Update license (#1) 486d7e2
    • Tidelift tasks a650457
    • Meta tweaks 66e1c28

    https://github.com/SamVerschueren/decode-uri-component/compare/v0.2.0...v0.2.1

    Commits


    CLA Signed dependencies 
    opened by dependabot[bot] 0
  • Bump express from 4.17.1 to 4.18.2 in /website


    Bumps express from 4.17.1 to 4.18.2.

    Release notes

    Sourced from express's releases.

    4.18.2

    4.18.1

    • Fix hanging on large stack of sync routes

    4.18.0

    ... (truncated)

    Changelog

    Sourced from express's changelog.

    4.18.2 / 2022-10-08

    4.18.1 / 2022-04-29

    • Fix hanging on large stack of sync routes

    4.18.0 / 2022-04-25

    ... (truncated)

    Commits


    CLA Signed dependencies 
    opened by dependabot[bot] 0
  • [Bug] BLEUScorer uses wrong default tokenizer.


    🐛 Bug

    vizseq.scorers.bleu.BLEUScorer does not use Tokenizer13a by default, although when I look at the code it appears that it should. The sacrebleu library uses Tokenizer13a by default as well.

    To reproduce

    Minimal Code/Config snippet to reproduce

    import vizseq
    
    scorer = vizseq.scorers.bleu.BLEUScorer()
    print(scorer.score(["This is really nice."], [["That's really nice."]]))
    # corpus_score = 31.947
    
    scorer = vizseq.scorers.bleu.BLEUScorer(extra_args={'tokenizer': '13a'})
    print(scorer.score(["This is really nice."], [["That's really nice."]]))
    # corpus_score = 39.764
    

    Stack trace/error message

    The problem is here: the variable tokenizer is set to the string none. When the method get_default_args is called (here), the default value 13a for the parameter tokenize is not used, because the string none is passed instead.

    Expected Behavior

    vizseq.scorers.bleu.BLEUScorer should use Tokenizer13a by default.

    System information

    • vizseq==0.1.15
    • python==3.7.3
    • macOS
    bug 
    opened by landert 0
  • pip3 install vizseq failed on AArch64, Fedora 33


    [jw@cn05 ~]$ pip3 install vizseq
    Defaulting to user installation because normal site-packages is not writeable
    Collecting vizseq
      Using cached vizseq-0.1.15-py3-none-any.whl (81 kB)
    Collecting nltk>=3.5
      Using cached nltk-3.5-py3-none-any.whl
    Collecting sacrebleu>=1.4.13
      Using cached sacrebleu-1.5.0-py3-none-any.whl (65 kB)
    Collecting langid
      Using cached langid-1.1.6.tar.gz (1.9 MB)
    Requirement already satisfied: tqdm in ./.local/lib/python3.9/site-packages (from vizseq) (4.31.1)
    Collecting google-cloud-translate
      Using cached google_cloud_translate-3.0.2-py2.py3-none-any.whl (93 kB)
    Collecting torch
      Using cached torch-0.1.2.post2.tar.gz (128 kB)
    Requirement already satisfied: numpy in ./.local/lib/python3.9/site-packages (from vizseq) (1.19.5)
    Requirement already satisfied: jinja2 in ./.local/lib/python3.9/site-packages (from vizseq) (2.10.3)
    Collecting soundfile
      Using cached SoundFile-0.10.3.post1-py2.py3-none-any.whl (21 kB)
    Requirement already satisfied: py-rouge in ./.local/lib/python3.9/site-packages (from vizseq) (1.1)
    Requirement already satisfied: matplotlib in ./.local/lib/python3.9/site-packages (from vizseq) (3.3.2)
    Requirement already satisfied: tornado in ./.local/lib/python3.9/site-packages (from vizseq) (6.1)
    Requirement already satisfied: IPython in ./.local/lib/python3.9/site-packages (from vizseq) (7.18.1)
    Collecting bert-score
      Using cached bert_score-0.3.7-py3-none-any.whl (53 kB)
    Requirement already satisfied: pandas in ./.local/lib/python3.9/site-packages (from vizseq) (1.1.4)
    Collecting laserembeddings
      Using cached laserembeddings-1.1.1-py3-none-any.whl (13 kB)
    Requirement already satisfied: click in ./.local/lib/python3.9/site-packages (from nltk>=3.5->vizseq) (7.1.2)
    Requirement already satisfied: regex in ./.local/lib/python3.9/site-packages (from nltk>=3.5->vizseq) (2020.11.13)
    Requirement already satisfied: joblib in ./.local/lib/python3.9/site-packages (from nltk>=3.5->vizseq) (0.17.0)
    Collecting portalocker
      Using cached portalocker-2.2.1-py2.py3-none-any.whl (15 kB)
    Collecting transformers>=3.0.0
      Using cached transformers-4.3.3-py3-none-any.whl (1.9 MB)
    Collecting bert-score
      Using cached bert_score-0.3.6-py3-none-any.whl (53 kB)
      Using cached bert_score-0.3.5-py3-none-any.whl (52 kB)
      Using cached bert_score-0.3.4-py3-none-any.whl (52 kB)
      Using cached bert_score-0.3.3-py3-none-any.whl (52 kB)
      Using cached bert_score-0.3.2-py3-none-any.whl (52 kB)
      Using cached bert_score-0.3.1-py3-none-any.whl (51 kB)
      Using cached bert_score-0.3.0-py3-none-any.whl (48 kB)
      Using cached bert_score-0.2.3-py3-none-any.whl (15 kB)
      Using cached bert_score-0.2.2-py3-none-any.whl (14 kB)
      Using cached bert_score-0.1.2-py3-none-any.whl (9.4 kB)
      Using cached bert_score-0.1.1-py3-none-any.whl (9.4 kB)
      Using cached bert_score-0.1.0-py3-none-any.whl (7.3 kB)
    INFO: pip is looking at multiple versions of to determine which version is compatible with other requirements. This could take a while.
    INFO: pip is looking at multiple versions of sacrebleu to determine which version is compatible with other requirements. This could take a while.
    Collecting sacrebleu>=1.4.13
      Using cached sacrebleu-1.4.14-py3-none-any.whl (64 kB)
      Using cached sacrebleu-1.4.13-py3-none-any.whl (43 kB)
    INFO: pip is looking at multiple versions of nltk to determine which version is compatible with other requirements. This could take a while.
    INFO: pip is looking at multiple versions of vizseq to determine which version is compatible with other requirements. This could take a while.
    Collecting vizseq
      Using cached vizseq-0.1.14-py3-none-any.whl (81 kB)
      Using cached vizseq-0.1.13-py3-none-any.whl (81 kB)
      Using cached vizseq-0.1.12-py3-none-any.whl (81 kB)
      Using cached vizseq-0.1.11-py3-none-any.whl (81 kB)
      Using cached vizseq-0.1.10-py3-none-any.whl (80 kB)
      Using cached vizseq-0.1.9-py3-none-any.whl (78 kB)
    Requirement already satisfied: nltk in ./.local/lib/python3.9/site-packages (from vizseq) (3.4.5)
    Collecting sacrebleu==1.4.7
      Using cached sacrebleu-1.4.7-py3-none-any.whl (59 kB)
    Requirement already satisfied: typing in ./.local/lib/python3.9/site-packages (from sacrebleu==1.4.7->vizseq) (3.7.4.3)
    Collecting mecab-python3
      Using cached mecab-python3-1.0.3.tar.gz (77 kB)
    INFO: pip is looking at multiple versions of to determine which version is compatible with other requirements. This could take a while.
    INFO: pip is looking at multiple versions of sacrebleu to determine which version is compatible with other requirements. This could take a while.
    ERROR: Cannot install vizseq and vizseq==0.1.9 because these package versions have conflicting dependencies.

    The conflict is caused by:
        vizseq 0.1.9 depends on torch
        bert-score 0.3.7 depends on torch>=1.0.0
        vizseq 0.1.9 depends on torch
        bert-score 0.3.6 depends on torch>=1.0.0
        vizseq 0.1.9 depends on torch
        bert-score 0.3.5 depends on torch>=1.0.0
        vizseq 0.1.9 depends on torch
        bert-score 0.3.4 depends on torch>=1.0.0
        vizseq 0.1.9 depends on torch
        bert-score 0.3.3 depends on torch>=1.0.0
        vizseq 0.1.9 depends on torch
        bert-score 0.3.2 depends on torch>=1.0.0
        vizseq 0.1.9 depends on torch
        bert-score 0.3.1 depends on torch>=1.0.0
        vizseq 0.1.9 depends on torch
        bert-score 0.3.0 depends on torch>=1.0.0
        vizseq 0.1.9 depends on torch
        bert-score 0.2.3 depends on torch>=1.0.0
        vizseq 0.1.9 depends on torch
        bert-score 0.2.2 depends on torch>=1.0.0
        vizseq 0.1.9 depends on torch
        bert-score 0.1.2 depends on torch>=0.4.1
        vizseq 0.1.9 depends on torch
        bert-score 0.1.1 depends on torch>=0.4.1
        vizseq 0.1.9 depends on torch
        bert-score 0.1.0 depends on torch>=0.4.1

    To fix this you could try to:

    1. loosen the range of package versions you've specified
    2. remove package versions to allow pip attempt to solve the dependency conflict

    ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
    [jw@cn05 ~]$

    bug 
    opened by LutzWeischerFujitsu 0
  • Example speech task (IWSLT17 dev) not pairing correct audio source with reference [Bug]


    🐛 Bug

    Audio segments from the speech data in the example speech translation task (IWSLT17 dev) are not correctly associated with the reference data.

    Only the first TED talk's audio segments are correctly aligned with the references. Playing the audio segments of any other talk (from # 3 / 10 / 887 ( 153 / 887 ) onwards, on page 16 of the task using the defaults) plays segments of the first TED talk's audio rather than the segments specified in the task directory speech_translation_iwslt17_dev/src_0.zip/source.txt.

    To reproduce

    Get the example speech task data (IWSLT17 dev):

    $ bash get_example_data.sh speech_translation_iwslt17_dev
    

    Start the server and navigate to: http://127.0.0.1:5000/view?t=speech_translation_iwslt17_dev&m=&q=&p_sz=10&p_no=16&s=0&s_metric=

    Play the audio segments: the first two on this page are correctly associated with the reference text; from # 3 / 10 / 887 ( 153 / 887 ) onwards they are not.

    bug 
    opened by jb101 0
  • [Feature Request] Update BertScorer with oop implementation


    🚀 Feature Request

    The new version (0.3.1) of BERTScore (https://github.com/Tiiiger/bert_score) supports an OOP interface. VizSeq currently uses the functional interface, which could be updated to the OOP one.

    Motivation

    Currently, using BERTScore in a validation loop reloads the model again and again. This can be avoided with the OOP interface.

    Pitch

    I see two solutions: (i) create a separate scorer named bert_score_oop, or (ii) add an argument to the current bert_score implementation that selects the OOP interface.

    Are you willing to open a pull request? Yes, I can send a pull request
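
    For reference, a minimal sketch of what the OOP interface looks like (assuming bert_score>=0.3.1; argument names per its README):

    from bert_score import BERTScorer

    # The model is loaded once at construction instead of on every score() call.
    scorer = BERTScorer(lang='en')

    cands = ['This is a sample prediction.']
    refs = ['This is a sample reference.']
    P, R, F1 = scorer.score(cands, refs)  # per-sentence precision/recall/F1 tensors
    print(F1.mean().item())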

    enhancement 
    opened by TheShadow29 1
  • [Bug] Vizseq CSS breaks Jupyter layout


    🐛 Bug

    Executing vizseq.view_stats breaks the Jupyter layout: the menu at the top obscures a majority of the screen, and a blank area of ~60px appears at the top of the page.

    Screenshot

    To reproduce

    ** Minimal Code/Config snippet to reproduce **

    1. start Jupyter jupyter notebook
    2. view any of the example notebooks e.g. speech_translation
    3. Execute the cells one-by-one.

    When the first cell containing vizseq.view_stats finishes, the layout changes and appears broken.

    Expected Behavior

    The display of tables and graphs by vizseq does not affect the layout of the Jupyter notebook.

    System information

    • VizSeq Version: '0.1.11' (clone from master yesterday)
    • Python version: Python 3.8.1 (default, Jan 8 2020, 23:09:20) [GCC 9.2.0] on linux
    • Operating system: Manjaro Linux

    Additional context

    Cause: The bootstrap.min.css and an inline stylesheet loaded by vizseq break the layout. The inline stylesheet is:

    body {
       padding-top: 60px; /* 60px to make the container go all the way to the bottom of the topbar */
    }
    

    The inline stylesheet is responsible for the blank bar at the top, while Bootstrap breaks the menu's formatting.

    To test this, disable both stylesheets in the stylesheet editor included in a browser's developer tools.

    bug 
    opened by pyfisch 0