BLEURT is a metric for Natural Language Generation based on transfer learning.

Overview

BLEURT: a Transfer Learning-Based Metric for Natural Language Generation

BLEURT is an evaluation metric for Natural Language Generation. It takes a pair of sentences as input, a reference and a candidate, and it returns a score that indicates to what extent the candidate is fluent and conveys the meaning of the reference. It is comparable to sentence-BLEU, BERTScore, and COMET.

BLEURT is a trained metric, that is, it is a regression model trained on ratings data. The model is based on BERT and RemBERT. This repository contains all the code necessary to use it and/or fine-tune it for your own applications. BLEURT uses Tensorflow, and it benefits greatly from modern GPUs (it runs on CPU too).

An overview of BLEURT can be found in our blog post. Further details are provided in the ACL paper BLEURT: Learning Robust Metrics for Text Generation and our EMNLP paper.

Installation

BLEURT runs in Python 3. It relies heavily on Tensorflow (>=1.15) and the library tf-slim (>=1.1). You may install it as follows:

pip install --upgrade pip  # ensures that pip is current
git clone https://github.com/google-research/bleurt.git
cd bleurt
pip install .

You may check your install with unit tests:

python -m unittest bleurt.score_test
python -m unittest bleurt.score_not_eager_test
python -m unittest bleurt.finetune_test
python -m unittest bleurt.score_files_test

Using BLEURT - TL;DR Version

The following commands download the recommended checkpoint and run BLEURT:

# Downloads the BLEURT-20 checkpoint.
wget https://storage.googleapis.com/bleurt-oss-21/BLEURT-20.zip
unzip BLEURT-20.zip

# Runs the scoring.
python -m bleurt.score_files \
  -candidate_file=bleurt/test_data/candidates \
  -reference_file=bleurt/test_data/references \
  -bleurt_checkpoint=BLEURT-20

The files bleurt/test_data/candidates and references contain test sentences, included by default in the BLEURT distribution. The input format is one sentence per line. You may replace them with your own files. The command outputs one score per sentence pair.
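Because the two input files are paired line by line, a quick check before scoring can catch misaligned inputs. The following is a minimal sketch, not part of BLEURT: load_sentence_pairs is a hypothetical helper, and it only assumes the one-sentence-per-line format described above.

```python
from pathlib import Path


def load_sentence_pairs(candidate_path, reference_path):
    """Reads two one-sentence-per-line files and pairs them line by line.

    Hypothetical helper for illustration; not part of the BLEURT package.
    Returns a list of (reference, candidate) tuples.
    """
    candidates = Path(candidate_path).read_text(encoding="utf-8").splitlines()
    references = Path(reference_path).read_text(encoding="utf-8").splitlines()
    if len(candidates) != len(references):
        # A mismatch means at least one sentence pair would be misaligned.
        raise ValueError(
            f"Line count mismatch: {len(candidates)} candidates "
            f"vs. {len(references)} references"
        )
    return list(zip(references, candidates))
```

Running this on your own files before calling score_files makes it easier to trace a score back to the sentence pair that produced it.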

Oct 8th 2021 Update: we upgraded the recommended checkpoint to BLEURT-20, a more accurate, multilingual model 🎉.

Using BLEURT - the Long Version

Command-line tools and APIs

Currently, there are three methods to invoke BLEURT: the command-line interface, the Python API, and the Tensorflow API.

Command-line interface

The simplest way to use BLEURT is through the command line, as shown below.

python -m bleurt.score_files \
  -candidate_file=bleurt/test_data/candidates \
  -reference_file=bleurt/test_data/references \
  -bleurt_checkpoint=bleurt/test_checkpoint \
  -scores_file=scores

The files candidates and references contain one sentence per line (see the folder test_data for the exact format). Invoking the command should produce a file scores which contains one BLEURT score per sentence pair. Alternatively you may use a JSONL file, as follows:

python -m bleurt.score_files \
  -sentence_pairs_file=bleurt/test_data/sentence_pairs.jsonl \
  -bleurt_checkpoint=bleurt/test_checkpoint

The flags bleurt_checkpoint and scores_file are optional. If bleurt_checkpoint is not specified, BLEURT will default to a test checkpoint, based on BERT-Tiny, which is very light but also very inaccurate (we recommend against using it). If scores_file is not specified, BLEURT will use the standard output.
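A JSONL input file can be produced with a few lines of Python. The sketch below assumes that each line is a JSON object with the keys reference and candidate; this key naming is an assumption that should be checked against bleurt/test_data/sentence_pairs.jsonl. The helper write_sentence_pairs is hypothetical and not part of BLEURT.

```python
import json


def write_sentence_pairs(path, references, candidates):
    """Writes one JSON object per line, one object per sentence pair.

    Hypothetical helper; the field names below are assumed, not confirmed.
    """
    if len(references) != len(candidates):
        raise ValueError("references and candidates must have the same length")
    with open(path, "w", encoding="utf-8") as f:
        for ref, cand in zip(references, candidates):
            # Assumed schema: verify against bleurt/test_data/sentence_pairs.jsonl.
            record = {"reference": ref, "candidate": cand}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The resulting file would then be passed to score_files through the -sentence_pairs_file flag.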

The following command lists all the other command-line options:

python -m bleurt.score_files -helpshort

Python API

BLEURT may be used as a Python library as follows:

from bleurt import score

checkpoint = "bleurt/test_checkpoint"
references = ["This is a test."]
candidates = ["This is the test."]

scorer = score.BleurtScorer(checkpoint)
scores = scorer.score(references=references, candidates=candidates)
assert type(scores) == list and len(scores) == 1
print(scores)

Here again, BLEURT will default to BERT-Tiny if no checkpoint is specified.

BLEURT works both in eager_mode (default in TF 2.0) and in a tf.Session (TF 1.0), but the latter mode is slower and may be deprecated in the near future.

Tensorflow API

BLEURT may be embedded in a TF computation graph, e.g., to visualize it on the Tensorboard while training a model.

The following piece of code shows an example:

import tensorflow as tf
# Set tf.enable_eager_execution() if using TF 1.x.

from bleurt import score

references = tf.constant(["This is a test."])
candidates = tf.constant(["This is the test."])

bleurt_ops = score.create_bleurt_ops()
bleurt_out = bleurt_ops(references=references, candidates=candidates)

assert bleurt_out["predictions"].shape == (1,)
print(bleurt_out["predictions"])

The crucial part is the call to score.create_bleurt_ops, which creates the TF ops.

Checkpoints

A BLEURT checkpoint is a self-contained folder that contains a regression model and some information that BLEURT needs to run. BLEURT checkpoints can be downloaded, copy-pasted, and stored anywhere. Furthermore, checkpoints are tunable, which means that they can be fine-tuned on custom ratings data.
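Since a checkpoint is just a folder, a script can check that it looks complete before trying to load it. The sketch below assumes that a ready-to-use checkpoint contains at least bleurt_config.json and saved_model.pb. That file list is an assumption drawn from directory listings of released checkpoints, not an official specification, and missing_checkpoint_files is a hypothetical helper.

```python
import os

# Assumed minimal contents of a ready-to-use checkpoint folder.
# This list is illustrative and not an official BLEURT specification.
EXPECTED_FILES = ("bleurt_config.json", "saved_model.pb")


def missing_checkpoint_files(checkpoint_dir):
    """Returns the expected files that are absent from checkpoint_dir."""
    return [
        name
        for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(checkpoint_dir, name))
    ]
```

An empty result suggests the folder was fully unzipped or copied; a non-empty one points to an incomplete download or to a checkpoint that has not been exported for scoring.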

BLEURT defaults to the test checkpoint, which is very inaccurate. We recommend using BLEURT-20 for results reporting. You may use it as follows:

wget https://storage.googleapis.com/bleurt-oss-21/BLEURT-20.zip
unzip BLEURT-20.zip
python -m bleurt.score_files \
  -candidate_file=bleurt/test_data/candidates \
  -reference_file=bleurt/test_data/references \
  -bleurt_checkpoint=BLEURT-20

The checkpoints page provides more information about how these checkpoints were trained, as well as pointers to smaller models. Additionally, you can fine-tune BERT or existing BLEURT checkpoints on your own ratings data. The checkpoints page describes how to do so.

Interpreting BLEURT Scores

Different BLEURT checkpoints yield different scores. The currently recommended checkpoint BLEURT-20 generates scores which are roughly between 0 and 1 (sometimes less than 0, sometimes more than 1), where 0 indicates a random output and 1 a perfect one. As with all automatic metrics, BLEURT scores are noisy. For a robust evaluation of a system's quality, we recommend averaging BLEURT scores across the sentences in a corpus. See the WMT Metrics Shared Task for a comparison of metrics on this aspect.
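The averaging recommended above can be sketched as follows. This is an illustrative helper, not part of BLEURT: corpus_score is a hypothetical name, and the example values are made up rather than real BLEURT outputs.

```python
from statistics import mean


def corpus_score(sentence_scores):
    """Averages sentence-level scores into a single system-level score.

    Accepts floats or strings, such as the lines of a scores file.
    """
    scores = [float(s) for s in sentence_scores]
    if not scores:
        raise ValueError("no scores to average")
    return mean(scores)


# Illustrative values only, not real BLEURT outputs.
print(corpus_score([0.75, 0.5, 0.25]))
```

Because individual scores may fall below 0 or above 1, the average should not be clipped to that range either.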

In principle, BLEURT should measure adequacy: most of its training data was collected by the WMT organizers, who asked annotators "How much do you agree that the system output adequately expresses the meaning of the reference?" (WMT Metrics'18, Graham et al., 2015). In practice, however, the answers tend to be strongly correlated with fluency ("Is the text fluent English?"), and we added synthetic noise to the training set, which makes the distinction between adequacy and fluency somewhat fuzzy.

Language Coverage

BLEURT-20 has been tested on 13 languages: Chinese, Czech, English, French, German, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Tamil, Vietnamese (these are the languages for which we have held-out ratings data). In theory, it should work for the 100+ languages of multilingual C4, on which RemBERT was trained.

If you tried any other language and would like to share your experience, either positive or negative, please send us feedback!

Speeding Up BLEURT

We describe three methods to speed up BLEURT, and how to combine them.

Batch size tuning

You may specify the flag -bleurt_batch_size, which determines the number of sentence pairs processed at once by BLEURT. The default value is 16; you may want to increase or decrease it based on the available memory and the presence of a GPU (we typically use 16 on a laptop without a GPU and 100 on a workstation with a GPU).

Length-based batching

Length-based batching is an optimization that consists of batching examples of similar length and cropping the resulting tensor, to avoid wasting computation on padding tokens. This technique often results in substantial speed-ups (typically ~2-10X). It is described here, and it was successfully used by BERTScore in the field of learned metrics.

You can enable length-based batching by specifying -batch_same_length=True when calling score_files from the command line, or by instantiating a LengthBatchingBleurtScorer instead of a BleurtScorer when using the Python API.
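To see why grouping by length helps, the sketch below counts how many token slots (real tokens plus padding) a model would process with and without it. This is a simplified illustration of the idea, not BLEURT's implementation: the function names, the example lengths, and the maximum sequence length of 128 are all assumptions made for the example.

```python
def fixed_padding_cost(lengths, max_seq_length):
    """Token slots processed when every example is padded to max_seq_length."""
    return len(lengths) * max_seq_length


def length_batched_cost(lengths, batch_size):
    """Token slots processed when examples are sorted by length and each
    batch is cropped to the length of its longest example."""
    ordered = sorted(lengths)
    total = 0
    for start in range(0, len(ordered), batch_size):
        batch = ordered[start:start + batch_size]
        total += len(batch) * max(batch)
    return total


# Hypothetical token counts for six sentence pairs.
lengths = [5, 120, 7, 110, 6, 115]
print(fixed_padding_cost(lengths, max_seq_length=128))
print(length_batched_cost(lengths, batch_size=3))
```

In this toy example the short pairs end up in one batch and the long pairs in another, so the short batch is cropped to 7 tokens instead of being padded to 128. A real implementation also has to restore the original order of the scores after sorting, which this sketch omits.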

Distilled models

We provide pointers to several compressed checkpoints on the checkpoints page. These models were obtained by distillation, a lossy process, and therefore the outputs cannot be directly compared to those of the original BLEURT model (though they should be strongly correlated).

Putting everything together

The following command illustrates how to combine these three techniques, speeding up BLEURT by an order of magnitude (up to 20X with our configuration) on larger files:

# Downloads the 12-layer distilled model, which is ~3.5X smaller.
wget https://storage.googleapis.com/bleurt-oss-21/BLEURT-20-D12.zip
unzip BLEURT-20-D12.zip

# Optimization 1: larger batch size (-bleurt_batch_size).
# Optimization 2: length-based batching (-batch_same_length).
# Optimization 3: distilled checkpoint (-bleurt_checkpoint).
python -m bleurt.score_files \
  -candidate_file=bleurt/test_data/candidates \
  -reference_file=bleurt/test_data/references \
  -bleurt_batch_size=100 \
  -batch_same_length=True \
  -bleurt_checkpoint=BLEURT-20-D12

Reproducibility

Information about how to work with ratings from the WMT Metrics Shared Task, how to reproduce the results from our ACL paper, and a selection of models from our EMNLP paper can be found here.

How to Cite

Please cite our ACL paper:

@inproceedings{sellam2020bleurt,
  title = {BLEURT: Learning Robust Metrics for Text Generation},
  author = {Thibault Sellam and Dipanjan Das and Ankur P Parikh},
  year = {2020},
  booktitle = {Proceedings of ACL}
}
Comments
  • Installation check error: Expected to be a int64 tensor but is a int32.

    Hi ,

    I have installed BLEURT and am running the test script to check the installation, but I get the error below. The paths to the directories seem to be correct.

    python -m bleurt.score \
      -candidate_file=bleurt/test_data/candidates \
      -reference_file=bleurt/test_data/references \
      -bleurt_checkpoint=bleurt/test_checkpoint \
      -scores_file=scores


    INFO:tensorflow:BLEURT initialized.
    I0630 08:28:46.424506 24396 score.py:151] BLEURT initialized.
    INFO:tensorflow:Computing BLEURT scores...
    I0630 08:28:46.424506 24396 score.py:305] Computing BLEURT scores...
    Traceback (most recent call last):
      File "C:\Users\amit.prakash\Anaconda3\envs\tf\lib\runpy.py", line 193, in _run_module_as_main
        "main", mod_spec)
      File "C:\Users\amit.prakash\Anaconda3\envs\tf\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "C:\Users\amit.prakash\Anaconda3\envs\tf\lib\site-packages\bleurt\score.py", line 344, in <module>
        tf.app.run()
      File "C:\Users\amit.prakash\Anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\platform\app.py", line 40, in run
        _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
      File "C:\Users\amit.prakash\Anaconda3\envs\tf\lib\site-packages\absl\app.py", line 299, in run
        _run_main(main, args)
      File "C:\Users\amit.prakash\Anaconda3\envs\tf\lib\site-packages\absl\app.py", line 250, in _run_main
        sys.exit(main(argv))
      File "C:\Users\amit.prakash\Anaconda3\envs\tf\lib\site-packages\bleurt\score.py", line 339, in main
        FLAGS.bleurt_checkpoint)
      File "C:\Users\amit.prakash\Anaconda3\envs\tf\lib\site-packages\bleurt\score.py", line 321, in score_files
        _consume_buffer()
      File "C:\Users\amit.prakash\Anaconda3\envs\tf\lib\site-packages\bleurt\score.py", line 300, in _consume_buffer
        scores = scorer.score(ref_buffer, cand_buffer, FLAGS.bleurt_batch_size)
      File "C:\Users\amit.prakash\Anaconda3\envs\tf\lib\site-packages\bleurt\score.py", line 186, in score
        predict_out = self.predict_fn(tf_input)
      File "C:\Users\amit.prakash\Anaconda3\envs\tf\lib\site-packages\bleurt\score.py", line 70, in _predict_fn
        segment_ids=tf.constant(input_dict["segment_ids"])
      File "C:\Users\amit.prakash\Anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\eager\function.py", line 1551, in call
        return self._call_impl(args, kwargs)
      File "C:\Users\amit.prakash\Anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\eager\function.py", line 1591, in _call_impl
        return self._call_flat(args, self.captured_inputs, cancellation_manager)
      File "C:\Users\amit.prakash\Anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\eager\function.py", line 1692, in _call_flat
        ctx, args, cancellation_manager=cancellation_manager))
      File "C:\Users\amit.prakash\Anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\eager\function.py", line 545, in call
        ctx=ctx)
      File "C:\Users\amit.prakash\Anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
        six.raise_from(core._status_to_exception(e.code, message), None)
      File "<string>", line 3, in raise_from
    tensorflow.python.framework.errors_impl.InvalidArgumentError: cannot compute __inference_pruned_1485 as input #0(zero-based) was expected to be a int64 tensor but is a int32 tensor [Op:__inference_pruned_1485]

    Thanks, Amit

    opened by prkasamit02 9
  • BLEURT returning scores less than zero

    I'm not sure if this is supposed to happen or not, but when testing BLEURT on the test_data ( "Bud Powell was a legendary pianist.", etc...) I'm getting scores that are way below zero:

    1. Here is my output for bleurt/test_checkpoint
    0.9129246473312378
    0.2755325436592102
    -0.34470897912979126
    -0.737292468547821
    
    2. And my output for bleurt-base-128
    1.003721833229065
    0.5313903093338013
    -1.489485502243042
    -1.6975871324539185
    

    Though I may be wrong, my understanding is that scores should be between 0 and 1. Thanks!

    opened by machelreid 8
  • Python API test_checkpoint not found

    Hi,

    Thank you for releasing the code. Interesting work!

    I am trying to use the Python API in my code as:

    from bleurt import score
    checkpoint = "bleurt/test_checkpoint"
    references = ["This is a test."]
    candidates = ["This is the test."]
    scorer = score.BleurtScorer(checkpoint)
    

    however, the bleurt/test_checkpoint is not found.

    >>> scorer = score.BleurtScorer(checkpoint)
    INFO:tensorflow:Reading checkpoint bleurt/test_checkpoint.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/conda/envs/metrics/lib/python3.7/site-packages/bleurt/score.py", line 133, in __init__
        config = checkpoint_lib.read_bleurt_config(checkpoint)
      File "/opt/conda/envs/metrics/lib/python3.7/site-packages/bleurt/checkpoint.py", line 78, in read_bleurt_config
        "Could not find BLEURT checkpoint {}".format(path)
    AssertionError: Could not find BLEURT checkpoint bleurt/test_checkpoint
    

    Is there a missing download link here?

    If I don't provide any checkpoint

    scorer = score.BleurtScorer()
    

    Expected: If bleurt_checkpoint is not specified, BLEURT will default to the test checkpoint, based on BERT-Tiny, however, I am getting the assertion error

    >>> scorer = score.BleurtScorer()
    INFO:tensorflow:No checkpoint specified, defaulting to BLEURT-tiny.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/conda/envs/metrics/lib/python3.7/site-packages/bleurt/score.py", line 130, in __init__
        checkpoint = _get_default_checkpoint()
      File "/opt/conda/envs/metrics/lib/python3.7/site-packages/bleurt/score.py", line 56, in _get_default_checkpoint
        "Default checkpoint not found! Are you sure the install is complete?"
    AssertionError: Default checkpoint not found! Are you sure the install is complete?
    

    Could you please suggest a way around. Thanks!

    opened by shubhamagarwal92 5
  • UnrecognizedFlagError: Unknown command line flag 'f'

    Hello!

    I'm trying to run the code below, following the instructions in the README, and I'm getting an error. Can you help me? The code used and the output follow. The TensorFlow version used is 2.2.0.

    import os
    !git clone https://github.com/google-research/bleurt.git
    os.chdir('bleurt')
    !pip install .
    from bleurt import score
    import tensorflow as tf

    checkpoint = "bleurt/test_checkpoint"
    references = ["This is a test."]
    candidates = ["This is the test."]

    scorer = score.BleurtScorer(checkpoint)
    scores = scorer.score(references, candidates)
    assert type(scores) == list and len(scores) == 1
    print(scores)


    UnrecognizedFlagError                     Traceback (most recent call last)
    in ()
          9
         10 scorer = score.BleurtScorer(checkpoint)
    ---> 11 scores = scorer.score(references, candidates)
         12 assert type(scores) == list and len(scores) == 1
         13 print(scores)

    2 frames
    /usr/local/lib/python3.6/dist-packages/absl/flags/_flagvalues.py in call(self, argv, known_only)
        631       suggestions = _helpers.get_flag_suggestions(name, list(self))
        632       raise _exceptions.UnrecognizedFlagError(
    --> 633           name, value, suggestions=suggestions)
        634
        635     self.mark_as_parsed()

    UnrecognizedFlagError: Unknown command line flag 'f'

    opened by CinthiaS 5
  • How to use the checkpoints of BERT

    How to use the checkpoints of BERT "warmed up" with synthetic ratings?

    After I downloaded BLEURT and successfully used the test_checkpoint and the fine-tuned checkpoint, I wanted to try the "warmed up" version. However, if I download the "warmed up" checkpoint and use it directly, I get an error:

    OSError: SavedModel file does not exist at: bleurt/bert-large-midtrained/bert-large//{saved_model.pbtxt|saved_model.pb}
    

    After looking into the details, I found that the file types under the "warmed up" checkpoint are different from those under test_checkpoint or the fine-tuned checkpoint. Under the fine-tuned checkpoint:

    bert_config.json  bleurt_config.json  saved_model.pb  variables  vocab.txt
    

    Under warmed up checkpoint:

    bert_config.json  bert-large.data-00000-of-00001  bert-large.index  bert-large.meta  bleurt_config.json  vocab.txt
    

    So how can we directly use the warmed up checkpoint for evaluation?

    opened by g-jing 4
  • Does bleurt support Chinese?

    I tried to use your fine-tuned model on Chinese, but the result is awful, with a Pearson correlation of 0.5 with sacrebleu. Is it because your model does not support Chinese? If it does not, how can I use your code on Chinese?

    opened by HuihuiChyan 4
  • Error in finetuning BLEURT

    Thank you for the great work and for open-sourcing it!

    I am trying to follow the instructions in https://github.com/google-research/bleurt/blob/master/checkpoints.md#from-an-existing-bleurt-checkpoint to fine-tune the BLEURT-20 model on a customized set of ratings.

    However, when I run the suggested command,

    python -m bleurt.finetune \
      -train_set=../data/ratings_train.jsonl \
      -dev_set=../data/ratings_dev.jsonl \
      -num_train_steps=500 \
      -model_dir=../models/bleurt-20-fine1 \
      -init_bleurt_checkpoint=../models/BLEURT-20/
    

    I get the following issue:

    ValueError: Shape of variable bert/embeddings/LayerNorm/beta:0 ((1152,)) doesn't match with shape of tensor bert/embeddings/LayerNorm/beta ([256]) from checkpoint reader.
    

    I have checked this with both tensorflow 2.7 and 1.15

    Any help related to this would be appreciated!

    opened by SaiKeshav 3
  • Incompatible dependencies with installing through pip

    When installing this repo through pip, it raises the following errors.

    ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.
    
    We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.
    
    tensorflow 2.3.1 requires numpy<1.19.0,>=1.16.0, but you'll have numpy 1.19.2 which is incompatible.
    

    My workaround is to install an older numpy version beforehand by running pip install numpy==1.18.5.

    However, when I run the test, bash reports the following:

    $ python -m unittest bleurt.score_test
    Illegal instruction (core dumped)

    opened by znculee 3
  • Can't use predict_fn with TF2

    Hi !

    When I try to instantiate a PythonPredictor with a predict_fn function, I get this error:

        def __init__(self, predict_fn):
    >     tf.logging.info("Creating Python-based predictor.")
    E     AttributeError: module 'tensorflow' has no attribute 'logging'
    

    I'm using TF2 so tensorflow.logging isn't available.

    cc @tsellam I think this comes from the recent change to 0.0.2

    opened by lhoestq 2
  • Getting BLEURT to Work in Jupyter Notebook on Windows: Unknown command line flag 'f', 32int error

    BLEURT raises an unknown-flag error. I installed it exactly as described on the front page.

    • tensorflow 2.3 or 2.0
    • python 3.7.5
    • Windows

    Arises from either:

    references = ['This is a test.', 'This is surely a test']
    candidates = ['This is also a text', 'This could be a test']
    checkpoint = 'C:/bleurt/bleurt/checkpoints/bleurt-tiny-512'
    scorer = score.BleurtScorer(checkpoint)
    scorer.score(references, candidates, batch_size = 2)
    

    or

    references = tf.constant(["This is a test."])
    candidates = tf.constant(["This is the test."])
    checkpoint = 'C:/bleurt/bleurt/checkpoints/bleurt-tiny-512'
    scorer = score.BleurtScorer(checkpoint)
    scorer.score(references, candidates, batch_size = 2)
    

    Error:

    UnrecognizedFlagError                     Traceback (most recent call last)
    in
         13
         14 scorer = score.BleurtScorer(checkpoint)
    ---> 15 scorer.score(references, candidates, batch_size = 2)
         16 # bleurt_out = scorer(references, candidates)
         17 # # bleurt_ops = score.create_bleurt_ops()

    c:\programdata\anaconda3\envs\context2\lib\site-packages\bleurt\score.py in score(self, references, candidates, batch_size)
        178       batch_cand = candidates[i:i + batch_size]
        179       input_ids, input_mask, segment_ids = encoding.encode_batch(
    --> 180           batch_ref, batch_cand, self.tokenizer, self.max_seq_length)
        181       tf_input = {
        182           "input_ids": input_ids,

    c:\programdata\anaconda3\envs\context2\lib\site-packages\bleurt\encoding.py in encode_batch(references, candidates, tokenizer, max_seq_length)
        150   encoded_examples = []
        151   for ref, cand in zip(references, candidates):
    --> 152     triplet = encode_example(ref, cand, tokenizer, max_seq_length)
        153     example = np.stack(triplet)
        154     encoded_examples.append(example)

    c:\programdata\anaconda3\envs\context2\lib\site-packages\bleurt\encoding.py in encode_example(reference, candidate, tokenizer, max_seq_length)
         56   # Tokenizes, truncates and concatenates the sentences, as in:
         57   # bert/run_classifier.py
    ---> 58   tokens_ref = tokenizer.tokenize(reference)
         59   tokens_cand = tokenizer.tokenize(candidate)
         60

    c:\programdata\anaconda3\envs\context2\lib\site-packages\bleurt\lib\tokenization.py in tokenize(self, text)
        144   def tokenize(self, text):
        145     split_tokens = []
    --> 146     for token in self.basic_tokenizer.tokenize(text):
        147       if preserve_token(token, self.vocab):
        148         split_tokens.append(token)

    c:\programdata\anaconda3\envs\context2\lib\site-packages\bleurt\lib\tokenization.py in tokenize(self, text)
        189     split_tokens = []
        190     for token in orig_tokens:
    --> 191       if preserve_token(token, self.vocab):
        192         split_tokens.append(token)
        193         continue

    c:\programdata\anaconda3\envs\context2\lib\site-packages\bleurt\lib\tokenization.py in preserve_token(token, vocab)
         43 def preserve_token(token, vocab):
         44   """Returns True if the token should forgo tokenization and be preserved."""
    ---> 45   if not FLAGS.preserve_unused_tokens:
         46     return False
         47   if token not in vocab:

    c:\programdata\anaconda3\envs\context2\lib\site-packages\tensorflow\python\platform\flags.py in getattr(self, name)
         83     # a flag.
         84     if not wrapped.is_parsed():
    ---> 85       wrapped(_sys.argv)
         86     return wrapped.getattr(name)
         87

    c:\programdata\anaconda3\envs\context2\lib\site-packages\absl\flags_flagvalues.py in call(self, argv, known_only)
        631       suggestions = _helpers.get_flag_suggestions(name, list(self))
        632       raise _exceptions.UnrecognizedFlagError(
    --> 633           name, value, suggestions=suggestions)
        634
        635     self.mark_as_parsed()

    UnrecognizedFlagError: Unknown command line flag 'f'

    opened by alexorona 2
  • Python API Error (UnrecognizedFlagError: Unknown command line flag 'f')

    Hello,

    I have installed BLEURT according to the steps in the README. I have also run all the installation tests, and everything works fine. I am now trying to use the Python API with the following script:

    from bleurt import score
    
    checkpoint = "bleurt/test_checkpoint"
    references = ["This is a test."]
    candidates = ["This is the test."]
    
    scorer = score.BleurtScorer(checkpoint)
    scores = scorer.score(references, candidates)
    assert type(scores) == list and len(scores) == 1
    print(scores)
    

    I have checked that my Tensorflow is >=1.15, and tf-slim is >=1.1. I am using Python3.6 in Google Colab, in Tesla K80 GPU. However I got this error:

    UnrecognizedFlagError                     Traceback (most recent call last)
    
    <ipython-input-2-8427261da7b2> in <module>()
          6 
          7 scorer = score.BleurtScorer(checkpoint)
    ----> 8 scores = scorer.score(references, candidates)
          9 assert type(scores) == list and len(scores) == 1
         10 print(scores)
    
    2 frames
    
    /usr/local/lib/python3.6/dist-packages/bleurt/score.py in score(self, references, candidates, batch_size)
        164     """
        165     if not batch_size:
    --> 166       batch_size = FLAGS.bleurt_batch_size
        167 
        168     candidates, references = list(candidates), list(references)
    
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/flags.py in __getattr__(self, name)
         83     # a flag.
         84     if not wrapped.is_parsed():
    ---> 85       wrapped(_sys.argv)
         86     return wrapped.__getattr__(name)
         87 
    
    /usr/local/lib/python3.6/dist-packages/absl/flags/_flagvalues.py in __call__(self, argv, known_only)
        631       suggestions = _helpers.get_flag_suggestions(name, list(self))
        632       raise _exceptions.UnrecognizedFlagError(
    --> 633           name, value, suggestions=suggestions)
        634 
        635     self.mark_as_parsed()
    
    UnrecognizedFlagError: Unknown command line flag 'f'
    

    Could you suggest any way around? Thank you!
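    The traceback shows the flags being parsed from sys.argv on first access, and notebook kernels are typically launched with an extra -f argument. A commonly suggested workaround, assuming this is indeed the cause, is to trim sys.argv before the first call to the scorer. The sketch below is an illustration only; strip_notebook_args is a hypothetical helper and not part of BLEURT.

```python
import sys


def strip_notebook_args(argv):
    """Keeps only the program name, dropping kernel arguments such as -f."""
    return argv[:1]


# Run this in the notebook before the first call to scorer.score(...),
# so that flag parsing no longer sees the kernel's own arguments.
sys.argv = strip_notebook_args(sys.argv)
```

    This discards every command-line argument, so it is only appropriate in an interactive session where no real flags are passed.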

    opened by FerdiantJoshua 2
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 1
  • BLEURT consumes all available memory on checkpoint load?

    Not quite sure what's happening here - running CUDA 11.6 and TensorFlow 2.10.0. No matter what checkpoint I use, all available GPU memory is consumed. Minimum reproducible example here:

    bleurtcheckpoints = os.path.join(os.getcwd(), "bleurtcktpts")
    from bleurt import score
    checkpoint = os.path.join(bleurtcheckpoints, "bleurt-tiny-128/")
    scorer = score.BleurtScorer(checkpoint)
    
    INFO:tensorflow:Reading checkpoint /data/visualization/vis-text/datasets/vis-text/bleurtcktpts/bleurt-tiny-128/.
    INFO:tensorflow:Config file found, reading.
    INFO:tensorflow:Will load checkpoint bert_custom
    INFO:tensorflow:Loads full paths and checks that files exists.
    INFO:tensorflow:... name:bert_custom
    INFO:tensorflow:... vocab_file:vocab.txt
    INFO:tensorflow:... bert_config_file:bert_config.json
    INFO:tensorflow:... do_lower_case:True
    INFO:tensorflow:... max_seq_length:128
    INFO:tensorflow:Creating BLEURT scorer.
    INFO:tensorflow:Creating WordPiece tokenizer.
    INFO:tensorflow:WordPiece tokenizer instantiated.
    INFO:tensorflow:Creating Eager Mode predictor.
    INFO:tensorflow:Loading model.
    
    2022-10-25 21:53:54.567812: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2022-10-25 21:53:54.690050: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2022-10-25 21:53:54.691074: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2022-10-25 21:53:54.692770: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2022-10-25 21:53:54.693762: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2022-10-25 21:53:54.694475: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2022-10-25 21:53:54.695136: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2022-10-25 21:53:56.420924: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2022-10-25 21:53:56.422112: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2022-10-25 21:53:56.423287: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2022-10-25 21:53:56.424176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 11413 MB memory:  -> device: 0, name: NVIDIA TITAN Xp, pci bus id: 0000:00:05.0, compute capability: 6.1
    
    INFO:tensorflow:BLEURT initialized.
    

    nvidia-smi output after initialization (right before, it showed only 4 MiB in use):

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA TITAN Xp     On   | 00000000:00:05.0 Off |                  N/A |
    | 23%   30C    P2    58W / 250W |  11697MiB / 12288MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A     16853      C   /usr/bin/python3.8              11693MiB |
    +-----------------------------------------------------------------------------+
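
The jump from 4 MiB to roughly 11.7 GiB shown above is consistent with TensorFlow's default behavior of reserving nearly all GPU memory when it creates a device, regardless of model size. Below is a minimal sketch of one way to request on-demand allocation instead, assuming the goal is to keep the rest of the GPU available: set the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable before TensorFlow (and therefore BLEURT) is imported. This is not taken from the BLEURT documentation, and usage may still grow while scoring.

```python
import os

# Assumption: this runs before TensorFlow is imported anywhere in the process;
# the variable is read when TensorFlow initializes its GPU devices.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# Import TensorFlow / BLEURT only after this point, for example:
# from bleurt import score
# scorer = score.BleurtScorer("BLEURT-20")  # checkpoint directory from the README

# Kept only so the setting can be inspected or logged.
allow_growth = os.environ["TF_FORCE_GPU_ALLOW_GROWTH"]
```

The same variable can also be exported in the shell before launching `python -m bleurt.score_files`.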
    
    opened by versipellis 0
  • BLEURT for Spanish

    I want to use BLEURT for image captioning in Spanish. I searched for a parameter that sets the language but did not find one. Does BLEURT detect the language of the captions automatically, or is there a parameter I need to change? How does BLEURT handle this, and how can I do it?

    opened by YairCCastillo 0
  • How to load rembert distilled models?

    Hi, I am trying to load the RemBERT distilled models for some of my downstream tasks, but I am not able to do so.

    AutoTokenizer.from_pretrained(model, **kwargs)
    

    Can you help?

    opened by kaliaanup 1
  • Install with pip fails

    When I try to install the repo as a package with pip, it works on Linux, but on Windows with Python 3.9 it fails while reading README.md. On Windows, Python opens the file with the locale's default encoding (cp1254 on my machine), and the installation fails with the following error message:

    (base) C:\Users\devri>pip install git+https://github.com/google-research/bleurt.git --force-reinstall --no-cache-dir     
    Collecting git+https://github.com/google-research/bleurt.git
      Cloning https://github.com/google-research/bleurt.git to c:\users\devri\appdata\local\temp\pip-req-build-l293_p23                    
      Running command git clone -q https://github.com/google-research/bleurt.git 'C:\Users\devri\AppData\Local\Temp\pip-req-build-l293_p23'
      Resolved https://github.com/google-research/bleurt.git to commit c6f2375c7c178e1480840cf27cb9e2af851394f9
        ERROR: Command errored out with exit status 1:
         command: 'C:\tools\Anaconda3\envs\jury\python.exe' -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\devri\\AppData\\Local\\Temp\\pip-req-build-l293_p23\\setup.py'"'"'; __file__='"'"'C:\\Users\\devri\\AppData\\Local\\Temp\\pip-req-build-l293_p23\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\devri\AppData\Local\Temp\pip-pip-egg-info-s1jaf9gz'
             cwd: C:\Users\devri\AppData\Local\Temp\pip-req-build-l293_p23\
        Complete output (7 lines):
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "C:\Users\devri\AppData\Local\Temp\pip-req-build-l293_p23\setup.py", line 23, in <module>
            long_description = fh.read()
          File "C:\tools\Anaconda3\envs\base\lib\encodings\cp1254.py", line 23, in decode
            return codecs.charmap_decode(input,self.errors,decoding_table)[0]
        UnicodeDecodeError: 'charmap' codec can't decode byte 0x8e in position 2560: character maps to <undefined>
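
The traceback above points at a plain `fh.read()` in `setup.py`, which decodes README.md with the platform's locale encoding. The following is a minimal sketch of the usual remedy, passing `encoding="utf-8"` explicitly. The helper name `read_long_description` and the sample text are hypothetical and not part of the BLEURT repository; the sample only reproduces the failing byte `0x8e` on any platform.

```python
import os
import tempfile


def read_long_description(path):
    """Read a text file as UTF-8, independent of the locale's default encoding."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


# Hypothetical README content: "\u00ce" encodes to the UTF-8 bytes C3 8E,
# and byte 0x8E has no mapping in cp1254.
sample = "BLEURT \u00ce README"

with tempfile.TemporaryDirectory() as tmp:
    readme_path = os.path.join(tmp, "README.md")
    with open(readme_path, "w", encoding="utf-8") as fh:
        fh.write(sample)

    # Reproduce the failure by forcing the encoding from the traceback.
    try:
        with open(readme_path, encoding="cp1254") as fh:
            fh.read()
        cp1254_failed = False
    except UnicodeDecodeError:
        cp1254_failed = True

    # Reading with an explicit UTF-8 encoding succeeds on every platform.
    long_description = read_long_description(readme_path)
```

Until such a change is made in `setup.py`, enabling Python's UTF-8 mode (for example by setting the `PYTHONUTF8=1` environment variable before running pip) is a possible workaround on Windows.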
    
    opened by devrimcavusoglu 0
UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation

UNION Automatic Evaluation Metric described in the paper UNION: An UNreferenced MetrIc for Evaluating Open-eNded Story Generation (EMNLP 2020). Please

null 50 Dec 30, 2022
The source code for the Cutoff data augmentation approach proposed in this paper: "A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation".

Cutoff: A Simple Data Augmentation Approach for Natural Language This repository contains source code necessary to reproduce the results presented in

Dinghan Shen 49 Dec 22, 2022
Transfer-Learn is an open-source and well-documented library for Transfer Learning.

Transfer-Learn is an open-source and well-documented library for Transfer Learning. It is based on pure PyTorch with high performance and friendly API. Our code is pythonic, and the design is consistent with torchvision. You can easily develop new algorithms, or readily apply existing algorithms.

THUML @ Tsinghua University 2.2k Jan 3, 2023
Official PyTorch implementation of "Proxy Synthesis: Learning with Synthetic Classes for Deep Metric Learning" (AAAI 2021)

Proxy Synthesis: Learning with Synthetic Classes for Deep Metric Learning Official PyTorch implementation of "Proxy Synthesis: Learning with Synthetic

NAVER/LINE Vision 30 Dec 6, 2022
code for our paper "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer"

SHOT++ Code for our TPAMI submission "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer" that is ext

null 75 Dec 16, 2022
Transfer style api - An API to use with Tranfer Style App, where you can use two image and transfer the style

Transfer Style API It's an API to use with Tranfer Style App, where you can use

Brian Alejandro 1 Feb 13, 2022
Harmonious Textual Layout Generation over Natural Images via Deep Aesthetics Learning

Harmonious Textual Layout Generation over Natural Images via Deep Aesthetics Learning Code for the paper Harmonious Textual Layout Generation over Nat

null 7 Aug 9, 2022
Metric learning algorithms in Python

metric-learn: Metric Learning in Python metric-learn contains efficient Python implementations of several popular supervised and weakly-supervised met

null 1.3k Dec 28, 2022
Dogs classification with Deep Metric Learning using some popular losses

Tsinghua Dogs classification with Deep Metric Learning 1. Introduction Tsinghua Dogs dataset Tsinghua Dogs is a fine-grained classification dataset fo

QuocThangNguyen 45 Nov 9, 2022
Code reproduce for paper "Vehicle Re-identification with Viewpoint-aware Metric Learning"

VANET Code reproduce for paper "Vehicle Re-identification with Viewpoint-aware Metric Learning" Introduction This is the implementation of article VAN

EMDATA-AILAB 23 Dec 26, 2022
Towards Interpretable Deep Metric Learning with Structural Matching

DIML Created by Wenliang Zhao*, Yongming Rao*, Ziyi Wang, Jiwen Lu, Jie Zhou This repository contains PyTorch implementation for paper Towards Interpr

Wenliang Zhao 75 Nov 11, 2022
[ICCV 2021] Official PyTorch implementation for Deep Relational Metric Learning.

Deep Relational Metric Learning This repository is the official PyTorch implementation of Deep Relational Metric Learning. Framework Datasets CUB-200-

Borui Zhang 39 Dec 10, 2022
GeDML is an easy-to-use generalized deep metric learning library

GeDML is an easy-to-use generalized deep metric learning library

Borui Zhang 32 Dec 5, 2022
A Pytorch implementation of "Manifold Matching via Deep Metric Learning for Generative Modeling" (ICCV 2021)

Manifold Matching via Deep Metric Learning for Generative Modeling A Pytorch implementation of "Manifold Matching via Deep Metric Learning for Generat

null 69 Dec 10, 2022
Unofficial implementation of Proxy Anchor Loss for Deep Metric Learning

Proxy Anchor Loss for Deep Metric Learning Unofficial pytorch, tensorflow and mxnet implementations of Proxy Anchor Loss for Deep Metric Learning. Not

Geonmo Gu 3 Jun 9, 2021
Paper: Cross-View Kernel Similarity Metric Learning Using Pairwise Constraints for Person Re-identification

Cross-View Kernel Similarity Metric Learning Using Pairwise Constraints for Person Re-identification T M Feroz Ali, Subhasis Chaudhuri, ICVGIP-20-21

T M Feroz Ali 3 Jun 17, 2022
Near-Duplicate Video Retrieval with Deep Metric Learning

Near-Duplicate Video Retrieval with Deep Metric Learning This repository contains the Tensorflow implementation of the paper Near-Duplicate Video Retr

null 2 Jan 24, 2022
Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding

2D-TAN (Optimized) Introduction This is an optimized re-implementation repository for AAAI'2020 paper: Learning 2D Temporal Localization Networks for

Joya Chen 112 Dec 31, 2022
🏆 The 1st Place Submission to AICity Challenge 2021 Natural Language-Based Vehicle Retrieval Track (Alibaba-UTS submission)

AI City 2021: Connecting Language and Vision for Natural Language-Based Vehicle Retrieval 🏆 The 1st Place Submission to AICity Challenge 2021 Natural

null 82 Dec 29, 2022