Simple embedding based text classifier inspired by fastText, implemented in tensorflow

Overview

FastText in Tensorflow

This project is based on the ideas in Facebook's FastText but implemented in Tensorflow. However, it is not an exact replica of fastText.

Classification is done by embedding each word, taking the mean embedding over the full text, and classifying that mean with a linear classifier. The embedding is trained jointly with the classifier. You can also choose to use character ngrams of 2 or more characters. These ngrams are hashed and then embedded in the same manner as the original words. Note that ngrams make training much slower but yield only marginal improvements in performance, at least in English.
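The forward pass described above can be sketched in plain Python. This is only an illustration of the idea (mean of word embeddings fed to a linear softmax layer), not the project's Tensorflow code; the toy vocabulary, the dimensions, and the function names mean_embedding and linear_softmax are all assumptions made for the example.

```python
import math
import random

def mean_embedding(token_ids, table):
    """Average the embedding rows for the given token ids."""
    dim = len(table[0])
    total = [0.0] * dim
    for tid in token_ids:
        for j, value in enumerate(table[tid]):
            total[j] += value
    n = max(len(token_ids), 1)  # avoid division by zero for empty text
    return [value / n for value in total]

def linear_softmax(features, weights, biases):
    """Apply a linear layer followed by softmax; returns class probabilities."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - top) for logit in logits]
    norm = sum(exps)
    return [e / norm for e in exps]

# Hypothetical toy setup: 5-word vocabulary, 4-dim embeddings, 3 classes.
rng = random.Random(0)
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
embeddings = [[rng.uniform(-1, 1) for _ in range(4)] for _ in vocab]
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
biases = [0.0, 0.0, 0.0]

ids = [vocab[w] for w in "the cat sat".split()]
probs = linear_softmax(mean_embedding(ids, embeddings), weights, biases)
```

In the real model both the embedding table and the linear layer are learned together by gradient descent; here they are random, so probs only demonstrates the shape of the computation.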

I may implement skipgram and cbow training later, or preloading of embedding tables.

<< Still WIP >>

You can use Horovod to distribute training across multiple GPUs, on one or multiple servers. See usage section below.

FastText Language Identification

I have added utilities to train a classifier to detect languages, as described in Fast and Accurate Language Identification using FastText.

See usage below. It basically works in the same way as default usage.

Implemented:

  • classification of text using word embeddings
  • char ngrams, hashed to n bins
  • training and prediction program
  • serve models on tensorflow serving
  • preprocess facebook format, or text input into tensorflow records
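
The "char ngrams, hashed to n bins" item can be illustrated with a short sketch. It assumes a stable hash (zlib.crc32 is used here purely for illustration; the project may use a different hash function) and hypothetical helper names char_ngrams and hash_to_bins.

```python
import zlib

def char_ngrams(word, sizes):
    """Return all character n-grams of the given sizes found in word."""
    grams = []
    for n in sizes:
        # range() is empty when the word is shorter than n
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams

def hash_to_bins(grams, num_bins):
    """Map each n-gram to a bucket id in [0, num_bins) with a stable hash."""
    return [zlib.crc32(g.encode("utf-8")) % num_bins for g in grams]

grams = char_ngrams("hello", [2, 3])
bucket_ids = hash_to_bins(grams, num_bins=1000)
```

Hashing keeps the ngram embedding table at a fixed size regardless of how many distinct ngrams occur, at the cost of occasional collisions between unrelated ngrams.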

Not Implemented:

  • separate word vector training (though can export embeddings)
  • hierarchical softmax
  • quantize models (supported by tensorflow, but I haven't tried it yet)

Usage

The following are examples of how to use the applications. Run any of the programs with the --help option for full help.

To transform input data into tensorflow Example format:

process_input.py --facebook_input=queries.txt --output_dir=. --ngrams=2,3,4

Or, using a text file with one example per line with an extra file for labels:

process_input.py --text_input=queries.txt --labels=labels.txt --output_dir=.
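
To make the two input formats concrete, here is a minimal sketch of splitting a fastText-style line into labels and text. It assumes that "facebook format" means fastText's supervised format, where each label is a token prefixed with __label__; the function name parse_fasttext_line is hypothetical and this is not the code in process_input.py.

```python
def parse_fasttext_line(line, label_prefix="__label__"):
    """Split one line into (labels, text); labels are tokens starting with the prefix."""
    labels, words = [], []
    for token in line.split():
        if token.startswith(label_prefix):
            labels.append(token[len(label_prefix):])
        else:
            words.append(token)
    return labels, " ".join(words)

labels, text = parse_fasttext_line("__label__sports the match ended in a draw")
```

Writing text to one file and labels to another, line by line, would give the equivalent of the --text_input / --labels pair shown above.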

To train a text classifier:

classifier.py \
  --train_records=queries.tfrecords \
  --eval_records=queries.tfrecords \
  --label_file=labels.txt \
  --vocab_file=vocab.txt \
  --model_dir=model \
  --export_dir=model

To predict classifications for text, use a saved model produced by classifier.py. The --export_dir option of classifier.py stores a saved model in a numbered directory below export_dir. Pass this numbered directory to predictor.py to use that model for predictions:

predictor.py \
  --saved_model=model/12345678 \
  --text="some text to classify" \
  --signature_def=proba
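
Since each export lands in a new numbered directory, a small helper can pick the most recent one to pass as --saved_model. This is a sketch under the assumption that a larger number means a newer export (as with timestamp-named directories); latest_export_dir is a hypothetical helper, not part of the project.

```python
import os
import tempfile

def latest_export_dir(export_base):
    """Return the path of the highest-numbered subdirectory, or None if there is none."""
    numbered = [d for d in os.listdir(export_base)
                if d.isdigit() and os.path.isdir(os.path.join(export_base, d))]
    if not numbered:
        return None
    return os.path.join(export_base, max(numbered, key=int))

# Demonstration with a throwaway directory tree standing in for export_dir.
with tempfile.TemporaryDirectory() as base:
    for name in ("12345678", "12345999", "notes"):
        os.makedirs(os.path.join(base, name))
    latest = os.path.basename(latest_export_dir(base))
```

Non-numeric entries such as "notes" are ignored, so checkpoints or other files stored next to the exports do not interfere.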

To export the embedding layer, run predictor.py with the embedding signature. Note that this exports only the text embedding, not the ngram embeddings.

predictor.py \
  --saved_model=model/12345678 \
  --text="some text to classify" \
  --signature_def=embedding

Use the provided script to train easily:

train_classifier.sh path-to-data-directory

Language Identification

To implement something similar to the method described in Fast and Accurate Language Identification using FastText, first download the data:

lang_dataset.sh [datadir]

You can then process the training and validation data using process_input.py and classifier.py as described above.

There is a utility script to do this for you:

train_langdetect.sh datadir

It reaches about 96% accuracy using word embeddings alone, and this increases to nearly 99% when adding --ngrams=2,3,4.

Distributed Training

You can run training across multiple GPUs, on one or multiple servers. To do so, install MPI and Horovod, then add the --horovod option. Training speed scales almost linearly with the number of GPUs, i.e. with 2 GPUs on your server it should run at close to 2x the speed.

NUM_GPUS=2
mpirun -np $NUM_GPUS python classifier.py \
  --horovod \
  --train_records=queries.tfrecords \
  --eval_records=queries.tfrecords \
  --label_file=labels.txt \
  --vocab_file=vocab.txt \
  --model_dir=model \
  --export_dir=model

The training script train_classifier.sh also has this option.

Tensorflow Serving

In addition to running a saved model with predictor.py, you can serve a saved model with Tensorflow Serving in a client-server setup. A simple RPC client (predictor_client.py) is supplied that requests predictions from the Tensorflow server.

First, make sure the Tensorflow Serving binaries are installed; see the Tensorflow Serving installation instructions.

You then serve the latest saved model by supplying the base export directory where you exported saved models to. This directory will contain the numbered model directories:

tensorflow_model_server --port=9000 --model_base_path=model

Now you can make requests to the server using gRPC calls. An example simple client is provided in predictor_client.py:

predictor_client.py --text="Some text to classify"

Facebook Examples

<< NOT IMPLEMENTED YET >>

You can compare with Facebook's fastText by running similar examples to what's provided in their repository.

./classification_example.sh
./classification_results.sh
Comments
  • Error during training

    Hi,

    Thank you for making this code available. I am attempting to run against a simple dataset of mine but I am running into the following error (any pointers):

    `Processing training dataset file Processing test dataset file data/mydataset.train.vocab 87898 Testing INFO:tensorflow:Using config: {'_log_step_count_steps': 100, '_master': '', '_evaluation_master': '', '_model_dir': 'data/models/mydataset', '_save_checkpoints_secs': None, '_task_id': 0, '_keep_checkpoint_max': 5, '_task_type': None, '_tf_random_seed': None, '_num_worker_replicas': 0, '_save_checkpoints_steps': 1000, '_session_config': , '_save_summary_steps': 100, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x117c32e80>, '_num_ps_replicas': 0, '_tf_config': gpu_options { per_process_gpu_memory_fraction: 1 } , '_environment': 'local', '_keep_checkpoint_every_n_hours': 10000} STARTING TRAIN ParseSpec {'label': FixedLenFeature(shape=(1,), dtype=tf.int64, default_value=None), 'text': VarLenFeature(dtype=tf.string)} INFO:tensorflow:Create CheckpointSaverHook. INFO:tensorflow:Saving checkpoints for 1 into data/models/mydataset/model.ckpt. INFO:tensorflow:loss = 3.87008, step = 1 2017-08-23 14:27:17.797448: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Received a label value of -1 which is outside the valid range of [0, 47). 
Label values: 19 41 22 19 31 1 39 8 22 4 43 12 27 39 19 43 22 44 21 19 4 42 19 21 27 9 41 6 41 44 1 14 5 6 37 6 41 1 6 16 42 39 4 0 25 14 4 30 6 31 9 19 41 41 41 6 23 1 19 19 9 17 26 41 43 19 41 23 22 14 14 9 6 41 1 1 -1 6 23 31 16 14 20 6 41 19 4 1 21 31 23 34 4 6 11 6 1 4 30 32 44 17 43 4 44 32 13 9 44 4 41 41 4 6 9 19 22 40 9 23 4 21 41 0 6 5 20 37 Traceback (most recent call last): File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call return fn(*args) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1306, in _run_fn status, run_metadata) File "~/anaconda3/lib/python3.5/contextlib.py", line 66, in exit next(self.gen) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status pywrap_tensorflow.TF_GetCode(status)) tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of -1 which is outside the valid range of [0, 47). Label values: 19 41 22 19 31 1 39 8 22 4 43 12 27 39 19 43 22 44 21 19 4 42 19 21 27 9 41 6 41 44 1 14 5 6 37 6 41 1 6 16 42 39 4 0 25 14 4 30 6 31 9 19 41 41 41 6 23 1 19 19 9 17 26 41 43 19 41 23 22 14 14 9 6 41 1 1 -1 6 23 31 16 14 20 6 41 19 4 1 21 31 23 34 4 6 11 6 1 4 30 32 44 17 43 4 44 32 13 9 44 4 41 41 4 6 9 19 22 40 9 23 4 21 41 0 6 5 20 37 [[Node: SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](SparseSoftmaxCrossEntropyWithLogits/Reshape, SparseSoftmaxCrossEntropyWithLogits/Reshape_1)]]

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "classifier.py", line 218, in tf.app.run() File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run _sys.exit(main(_sys.argv[:1] + flags_passthrough)) File "classifier.py", line 208, in main FastTrain() File "classifier.py", line 181, in FastTrain estimator.train(input_fn=train_input, steps=FLAGS.train_steps, hooks=None) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 241, in train loss = self._train_model(input_fn=input_fn, hooks=hooks) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 686, in _train_model _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss]) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 518, in run run_metadata=run_metadata) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 862, in run run_metadata=run_metadata) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 818, in run return self._sess.run(*args, **kwargs) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 972, in run run_metadata=run_metadata) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 818, in run return self._sess.run(*args, **kwargs) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 895, in run run_metadata_ptr) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1124, in _run feed_dict_tensor, options, run_metadata) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run options, run_metadata) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call raise 
type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of -1 which is outside the valid range of [0, 47). Label values: 19 41 22 19 31 1 39 8 22 4 43 12 27 39 19 43 22 44 21 19 4 42 19 21 27 9 41 6 41 44 1 14 5 6 37 6 41 1 6 16 42 39 4 0 25 14 4 30 6 31 9 19 41 41 41 6 23 1 19 19 9 17 26 41 43 19 41 23 22 14 14 9 6 41 1 1 -1 6 23 31 16 14 20 6 41 19 4 1 21 31 23 34 4 6 11 6 1 4 30 32 44 17 43 4 44 32 13 9 44 4 41 41 4 6 9 19 22 40 9 23 4 21 41 0 6 5 20 37 [[Node: SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](SparseSoftmaxCrossEntropyWithLogits/Reshape, SparseSoftmaxCrossEntropyWithLogits/Reshape_1)]]

    Caused by op 'SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits', defined at: File "classifier.py", line 218, in tf.app.run() File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run _sys.exit(main(_sys.argv[:1] + flags_passthrough)) File "classifier.py", line 208, in main FastTrain() File "classifier.py", line 181, in FastTrain estimator.train(input_fn=train_input, steps=FLAGS.train_steps, hooks=None) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 241, in train loss = self._train_model(input_fn=input_fn, hooks=hooks) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 630, in _train_model model_fn_lib.ModeKeys.TRAIN) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 615, in _call_model_fn model_fn_results = self._model_fn(features=features, **kwargs) File "classifier.py", line 121, in model_fn labels=labels, logits=logits)) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 1706, in sparse_softmax_cross_entropy_with_logits precise_logits, labels, name=name) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 2491, in _sparse_softmax_cross_entropy_with_logits features=features, labels=labels, name=name) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op op_def=op_def) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op original_op=self._default_original_op, op_def=op_def) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1204, in init self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

    InvalidArgumentError (see above for traceback): Received a label value of -1 which is outside the valid range of [0, 47). Label values: 19 41 22 19 31 1 39 8 22 4 43 12 27 39 19 43 22 44 21 19 4 42 19 21 27 9 41 6 41 44 1 14 5 6 37 6 41 1 6 16 42 39 4 0 25 14 4 30 6 31 9 19 41 41 41 6 23 1 19 19 9 17 26 41 43 19 41 23 22 14 14 9 6 41 1 1 -1 6 23 31 16 14 20 6 41 19 4 1 21 31 23 34 4 6 11 6 1 4 30 32 44 17 43 4 44 32 13 9 44 4 41 41 4 6 9 19 22 40 9 23 4 21 41 0 6 5 20 37 [[Node: SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](SparseSoftmaxCrossEntropyWithLogits/Reshape, SparseSoftmaxCrossEntropyWithLogits/Reshape_1)]] `

    The same mydataset.train / mydataset.test is processed fine by fasttext (C++ version). Thank you in advance!

    opened by arnaudsj 5
  • Python3 compat and multiple datasets

    Makes train_classifier.sh a bit more standalone and configurable. Only minor.

    One thing to note is that the model directory is changed from ${DATADIR}/model to ${DATADIR}/models/${DATASET}, so after pulling you'd probably want to run mv ${DATADIR}/model ${DATADIR}/models/ag_news to reuse your model.

    opened by darrengarvey 0
  • ValueError: Shape must be rank 2 but is rank 3 for 'concat' (op: 'ConcatV2') with input shapes: [?,16], [?,1,16], []

    Hi,

    I'm trying to train a text classifier using train_langdetect.sh without horovod installed, and I'm getting this error:

      File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1607, in _create_c_op
        c_op = c_api.TF_FinishOperation(op_desc)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 2 but is rank 3 for 'concat' (op: 'ConcatV2') with input shapes: [?,16], [?,1,16], [].
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "classifier.py", line 199, in <module>
        tf.app.run()
      File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
        _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
      File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/absl/app.py", line 299, in run
        _run_main(main, args)
      File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
        sys.exit(main(argv))
      File "classifier.py", line 193, in main
        FastTrain()
      File "classifier.py", line 167, in FastTrain
        estimator.train(input_fn=train_input, steps=FLAGS.train_steps, hooks=hooks)
      File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
        loss = self._train_model(input_fn, hooks, saving_listeners)
      File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
        return self._train_model_default(input_fn, hooks, saving_listeners)
      File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1191, in _train_model_default
        features, labels, ModeKeys.TRAIN, self.config)
      File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
        model_fn_results = self._model_fn(features=features, **kwargs)
      File "classifier.py", line 118, in model_fn
        input_layer = tf.concat([text_embedding, ngram_embedding], -1)
      File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
        return target(*args, **kwargs)
      File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py", line 1420, in concat
        return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
      File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_array_ops.py", line 1257, in concat_v2
        "ConcatV2", values=values, axis=axis, name=name)
      File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
        op_def=op_def)
      File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
        return func(*args, **kwargs)
      File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
        attrs, op_def, compute_device)
      File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
        op_def=op_def)
      File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1770, in __init__
        control_input_ops)
      File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1610, in _create_c_op
        raise ValueError(str(e))
    ValueError: Shape must be rank 2 but is rank 3 for 'concat' (op: 'ConcatV2') with input shapes: [?,16], [?,1,16], [].
    
    
    Do you have any suggestions what could be causing this?
    Thank you.
    opened by kodlan 0
  • `DataLossError: corrupted record` during training

    Hello, during training the following error occurs:

    tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 0 [[{{node read_batch_features/read/ReaderReadUpToV2}} = ReaderReadUpToV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](read_batch_features/read/TFRecordReaderV2, read_batch_features/file_name_queue, read_batch_features/read/ReaderReadUpToV2/num_records)]]

    My environment is: Python 3.6, tensorflow 1.12.0, macOS 10.13

    Here is detailed information:

    python3 classifier.py
    --train_records=data/train-label-text.txt
    --eval_records=data/train-label-text.txt
    --label_file=data/train-label-text.txt.labels
    --vocab_file=data/train-label-text.txt.vocab
    --model_dir=model
    --export_dir=model FastTrain 1000 WARNING:tensorflow:From classifier.py:153: RunConfig.init (from tensorflow.contrib.learn.python.learn.estimators.run_config) is deprecated and will be removed in a future version. Instructions for updating: When switching to tf.estimator.Estimator, use tf.estimator.RunConfig instead. TESTdata/train-label-text.txt STARTING TRAIN ParseSpec {'text': VarLenFeature(dtype=tf.string), 'label': FixedLenFeature(shape=(), dtype=tf.string, default_value=None)} Input file: data/train-label-text.txt WARNING:tensorflow:From /Users/hans/repos/simple-tests/classify/workspace/inputs.py:54: read_batch_features (from tensorflow.contrib.learn.python.learn.learn_io.graph_io) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data. WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/graph_io.py:833: read_keyed_batch_features (from tensorflow.contrib.learn.python.learn.learn_io.graph_io) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data. WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/graph_io.py:542: read_keyed_batch_examples (from tensorflow.contrib.learn.python.learn.learn_io.graph_io) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data. WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/graph_io.py:423: string_input_producer (from tensorflow.python.training.input) is deprecated and will be removed in a future version. Instructions for updating: Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.from_tensor_slices(string_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs). 
If shuffle=False, omit the .shuffle(...). WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/input.py:276: input_producer (from tensorflow.python.training.input) is deprecated and will be removed in a future version. Instructions for updating: Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.from_tensor_slices(input_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs). If shuffle=False, omit the .shuffle(...). WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/input.py:188: limit_epochs (from tensorflow.python.training.input) is deprecated and will be removed in a future version. Instructions for updating: Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.from_tensors(tensor).repeat(num_epochs). WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/input.py:197: QueueRunner.init (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version. Instructions for updating: To construct input pipelines, use the tf.data module. WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/input.py:197: add_queue_runner (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version. Instructions for updating: To construct input pipelines, use the tf.data module. WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/graph_io.py:317: TFRecordReader.init (from tensorflow.python.ops.io_ops) is deprecated and will be removed in a future version. 
Instructions for updating: Queue-based input pipelines have been replaced by tf.data. Use tf.data.TFRecordDataset. WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/graph_io.py:449: shuffle_batch_join (from tensorflow.python.training.input) is deprecated and will be removed in a future version. Instructions for updating: Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.interleave(...).shuffle(min_after_dequeue).batch(batch_size). WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/graph_io.py:550: queue_parsed_features (from tensorflow.contrib.learn.python.learn.learn_io.graph_io) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data. WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version. Instructions for updating: Create a tf.sparse.SparseTensor and use tf.sparse.to_dense instead. WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py:804: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version. Instructions for updating: To construct input pipelines, use the tf.data module. 
Traceback (most recent call last): File "classifier.py", line 199, in tf.app.run() File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run _sys.exit(main(argv)) File "classifier.py", line 193, in main FastTrain() File "classifier.py", line 167, in FastTrain estimator.train(input_fn=train_input, steps=FLAGS.train_steps, hooks=hooks) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train loss = self._train_model(input_fn, hooks, saving_listeners) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model return self._train_model_default(input_fn, hooks, saving_listeners) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1241, in _train_model_default saving_listeners) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1471, in _train_with_estimator_spec _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss]) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 783, in exit self._close_internal(exception_type) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 821, in _close_internal self._sess.close() File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1069, in close self._sess.close() File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1229, in close ignore_live_threads=True) File 
"/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 389, in join six.reraise(*self._exc_info_to_raise) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/six.py", line 692, in reraise raise value.with_traceback(tb) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/queue_runner_impl.py", line 257, in _run enqueue_callable() File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1257, in _single_operation_run self._call_tf_sessionrun(None, {}, [], target_list, None) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 0 [[{{node read_batch_features/read/ReaderReadUpToV2}} = ReaderReadUpToV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](read_batch_features/read/TFRecordReaderV2, read_batch_features/file_name_queue, read_batch_features/read/ReaderReadUpToV2/num_records)]]

    opened by hyangyt 0
  • ngrams maybe contains bug

    def GenerateNgrams(words, ngrams):
        nglist = []
        for ng in ngrams:
            for word in words:
                nglist.extend([word[n:n+ng] for n in range(len(word)-ng+1)])
        return nglist
    

    Maybe it should look like the following:

    def GenerateNgrams(words, ngrams):
        nglist = []
        for ng in ngrams:
            nglist.extend(''.join(words[n:n+ng]) for n in range(len(words)-ng+1))
        return nglist
    
    opened by qqhard 0
Owner
Alan Patterson