Simple embedding-based text classifier inspired by fastText, implemented in TensorFlow

Alan Patterson

Last update: Dec 2, 2022

Overview

FastText in Tensorflow

This project is based on the ideas in Facebook's FastText but implemented in Tensorflow. However, it is not an exact replica of fastText.

Classification is done by embedding each word, taking the mean embedding over the full text, and classifying that with a linear classifier. The embedding is trained jointly with the classifier. You can also enable character ngrams of length 2 or more. These ngrams are hashed and then embedded in the same way as the original words. Note that ngrams make training much slower but yield only marginal improvements in performance, at least in English.
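As a rough illustration of the mean-embedding idea, here is a minimal pure-Python sketch. It is not the project's TensorFlow code: the toy vocabulary, the random untrained parameters, and the helper names (`mean_embedding`, `classify`) are all assumptions made for the example.

```python
import math
import random


def mean_embedding(tokens, vocab, table):
    """Average the embedding rows of the tokens that are in the vocabulary."""
    dim = len(table[0])
    rows = [table[vocab[t]] for t in tokens if t in vocab]
    if not rows:
        return [0.0] * dim  # no known tokens: fall back to a zero vector
    return [sum(col) / len(rows) for col in zip(*rows)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(tokens, vocab, table, weights, bias):
    """Apply a linear layer to the mean embedding and return class probabilities."""
    x = mean_embedding(tokens, vocab, table)
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


# Toy, untrained parameters (hypothetical values, for illustration only).
rng = random.Random(0)
vocab = {"good": 0, "bad": 1, "movie": 2}
EMBED_DIM, NUM_CLASSES = 4, 2
table = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in vocab]
weights = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
           for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES

probs = classify("good movie".split(), vocab, table, weights, bias)
print(probs)
```

In training, the embedding table and the linear weights would be updated together from the classification loss, which is what "trained with the classifier" refers to.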

I may implement skipgram and cbow training later, or preloading of embedding tables.

<< Still WIP >>

You can use Horovod to distribute training across multiple GPUs, on one or multiple servers. See usage section below.

FastText Language Identification

I have added utilities to train a classifier to detect languages, as described in Fast and Accurate Language Identification using FastText.

See the usage section below. It works in essentially the same way as the default usage.

Implemented:

classification of text using word embeddings
char ngrams, hashed to n bins
training and prediction program
serve models on tensorflow serving
preprocess facebook format, or text input into tensorflow records
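The "char ngrams, hashed to n bins" item can be sketched as follows. This is only an illustration of the technique: the use of `zlib.crc32` as the hash function and the names `char_ngrams` and `hash_to_bin` are assumptions, and the project may hash differently.

```python
import zlib


def char_ngrams(words, sizes):
    """Return every character n-gram of each word, for each size in `sizes`."""
    grams = []
    for n in sizes:
        for word in words:
            # Words shorter than n produce an empty range and contribute nothing.
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def hash_to_bin(gram, num_bins):
    """Map an n-gram to a stable bin id in [0, num_bins)."""
    return zlib.crc32(gram.encode("utf-8")) % num_bins


grams = char_ngrams(["text"], sizes=(2, 3))
bins = [hash_to_bin(g, num_bins=1000) for g in grams]
print(grams)  # ['te', 'ex', 'xt', 'tex', 'ext']
print(bins)
```

Hashing keeps the ngram embedding table at a fixed size regardless of how many distinct ngrams occur, at the cost of occasional collisions between unrelated ngrams.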

Not Implemented:

separate word vector training (though can export embeddings)
hierarchical softmax
quantize models (supported by tensorflow, but I haven't tried it yet)

Usage

The following examples show how to use the programs. Run any program with --help for the full list of options.

To transform input data into tensorflow Example format:

process_input.py --facebook_input=queries.txt --output_dir=. --ngrams=2,3,4

Or, using a text file with one example per line with an extra file for labels:

process_input.py --text_input=queries.txt --labels=labels.txt --output_dir=.
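For reference, the Facebook (fastText) input format puts one example per line, with each label written as a token prefixed by `__label__`. The sketch below shows one way such a line could be split into labels and words. The function name `parse_fasttext_line` is hypothetical, and process_input.py may handle details such as tokenization differently.

```python
LABEL_PREFIX = "__label__"  # default label prefix used by fastText


def parse_fasttext_line(line):
    """Split one fastText-format line into (labels, words)."""
    labels, words = [], []
    for token in line.split():
        if token.startswith(LABEL_PREFIX):
            labels.append(token[len(LABEL_PREFIX):])
        else:
            words.append(token)
    return labels, words


labels, words = parse_fasttext_line("__label__sports the match ended in a draw")
print(labels)  # ['sports']
print(words)   # ['the', 'match', 'ended', 'in', 'a', 'draw']
```

With the --text_input and --labels variant, the same information is presumably split across two files, with line i of the labels file describing line i of the text file.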

To train a text classifier:

classifier.py \
  --train_records=queries.tfrecords \
  --eval_records=queries.tfrecords \
  --label_file=labels.txt \
  --vocab_file=vocab.txt \
  --model_dir=model \
  --export_dir=model

To predict classifications for text, use a saved model produced by classifier.py. The --export_dir option of classifier.py stores a saved model in a numbered directory below export_dir. Pass that numbered directory to predictor.py to use the model for predictions:

predictor.py \
  --saved_model=model/12345678 \
  --text="some text to classify" \
  --signature_def=proba
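If you do not want to look up the numbered directory by hand, a small helper can pick it for you. The sketch below assumes that the export directories have all-digit names and that the numerically largest one is the most recent. The function name `latest_export` is hypothetical and not part of this project.

```python
import os
import tempfile


def latest_export(export_dir):
    """Return the path of the numerically largest all-digit subdirectory, or None."""
    versions = [
        d for d in os.listdir(export_dir)
        if d.isdigit() and os.path.isdir(os.path.join(export_dir, d))
    ]
    if not versions:
        return None
    return os.path.join(export_dir, max(versions, key=int))


# Demo on a throwaway directory tree (the directory names are made up).
with tempfile.TemporaryDirectory() as root:
    for name in ("12345678", "12345999", "notes"):
        os.mkdir(os.path.join(root, name))
    chosen = os.path.basename(latest_export(root))

print(chosen)  # 12345999
```

The returned path could then be passed as the --saved_model argument.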

To export the embedding layer, run predictor.py with the embedding signature. Note that this exports only the text embedding, not the ngram embeddings.

predictor.py \
  --saved_model=model/12345678 \
  --text="some text to classify" \
  --signature_def=embedding

Use the provided script to train easily:

train_classifier.sh path-to-data-directory

Language Identification

To implement something similar to the method described in Fast and Accurate Language Identification using FastText, first download the data:

lang_dataset.sh [datadir]

You can then process the training and validation data using process_input.py and classifier.py as described above.

There is a utility script to do this for you:

train_langdetect.sh datadir

It reaches about 96% accuracy using word embeddings alone, and this increases to nearly 99% when adding --ngrams=2,3,4.

Distributed Training

You can run training across multiple GPUs, on one or multiple servers. To do so, install MPI and Horovod, then add the --horovod option. Training speed scales almost linearly with the number of GPUs: for example, with 2 GPUs on your server, it should run at close to 2x the speed.

NUM_GPUS=2
mpirun -np $NUM_GPUS python classifier.py \
  --horovod \
  --train_records=queries.tfrecords \
  --eval_records=queries.tfrecords \
  --label_file=labels.txt \
  --vocab_file=vocab.txt \
  --model_dir=model \
  --export_dir=model

The training script train_classifier.sh also supports this option.

Tensorflow Serving

As well as using predictor.py to get predictions from a saved model, you can serve a saved model with Tensorflow Serving in a client-server setup. A simple RPC client (predictor_client.py) is supplied, which gets its predictions from the Tensorflow server.

First, make sure you install the Tensorflow Serving binaries. Instructions are in the Tensorflow Serving documentation.

Then serve the latest saved model by supplying the base directory to which you exported saved models. This directory contains the numbered model directories:

tensorflow_model_server --port=9000 --model_base_path=model

Now you can make requests to the server using gRPC calls. An example simple client is provided in predictor_client.py:

predictor_client.py --text="Some text to classify"

Facebook Examples

<< NOT IMPLEMENTED YET >>

You can compare with Facebook's fastText by running similar examples to what's provided in their repository.

./classification_example.sh
./classification_results.sh

Comments

Error during training

Hi,

Thank you for making this code available. I am attempting to run against a simple dataset of mine but I am running into the following error (any pointers):

`Processing training dataset file Processing test dataset file data/mydataset.train.vocab 87898 Testing INFO:tensorflow:Using config: {'_log_step_count_steps': 100, '_master': '', '_evaluation_master': '', '_model_dir': 'data/models/mydataset', '_save_checkpoints_secs': None, '_task_id': 0, '_keep_checkpoint_max': 5, '_task_type': None, '_tf_random_seed': None, '_num_worker_replicas': 0, '_save_checkpoints_steps': 1000, '_session_config': , '_save_summary_steps': 100, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x117c32e80>, '_num_ps_replicas': 0, '_tf_config': gpu_options { per_process_gpu_memory_fraction: 1 } , '_environment': 'local', '_keep_checkpoint_every_n_hours': 10000} STARTING TRAIN ParseSpec {'label': FixedLenFeature(shape=(1,), dtype=tf.int64, default_value=None), 'text': VarLenFeature(dtype=tf.string)} INFO:tensorflow:Create CheckpointSaverHook. INFO:tensorflow:Saving checkpoints for 1 into data/models/mydataset/model.ckpt. INFO:tensorflow:loss = 3.87008, step = 1 2017-08-23 14:27:17.797448: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Received a label value of -1 which is outside the valid range of [0, 47). 
Label values: 19 41 22 19 31 1 39 8 22 4 43 12 27 39 19 43 22 44 21 19 4 42 19 21 27 9 41 6 41 44 1 14 5 6 37 6 41 1 6 16 42 39 4 0 25 14 4 30 6 31 9 19 41 41 41 6 23 1 19 19 9 17 26 41 43 19 41 23 22 14 14 9 6 41 1 1 -1 6 23 31 16 14 20 6 41 19 4 1 21 31 23 34 4 6 11 6 1 4 30 32 44 17 43 4 44 32 13 9 44 4 41 41 4 6 9 19 22 40 9 23 4 21 41 0 6 5 20 37 Traceback (most recent call last): File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call return fn(*args) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1306, in _run_fn status, run_metadata) File "~/anaconda3/lib/python3.5/contextlib.py", line 66, in exit next(self.gen) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status pywrap_tensorflow.TF_GetCode(status)) tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of -1 which is outside the valid range of [0, 47). Label values: 19 41 22 19 31 1 39 8 22 4 43 12 27 39 19 43 22 44 21 19 4 42 19 21 27 9 41 6 41 44 1 14 5 6 37 6 41 1 6 16 42 39 4 0 25 14 4 30 6 31 9 19 41 41 41 6 23 1 19 19 9 17 26 41 43 19 41 23 22 14 14 9 6 41 1 1 -1 6 23 31 16 14 20 6 41 19 4 1 21 31 23 34 4 6 11 6 1 4 30 32 44 17 43 4 44 32 13 9 44 4 41 41 4 6 9 19 22 40 9 23 4 21 41 0 6 5 20 37 [[Node: SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](SparseSoftmaxCrossEntropyWithLogits/Reshape, SparseSoftmaxCrossEntropyWithLogits/Reshape_1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "classifier.py", line 218, in tf.app.run() File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run _sys.exit(main(_sys.argv[:1] + flags_passthrough)) File "classifier.py", line 208, in main FastTrain() File "classifier.py", line 181, in FastTrain estimator.train(input_fn=train_input, steps=FLAGS.train_steps, hooks=None) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 241, in train loss = self._train_model(input_fn=input_fn, hooks=hooks) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 686, in _train_model _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss]) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 518, in run run_metadata=run_metadata) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 862, in run run_metadata=run_metadata) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 818, in run return self._sess.run(*args, **kwargs) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 972, in run run_metadata=run_metadata) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 818, in run return self._sess.run(*args, **kwargs) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 895, in run run_metadata_ptr) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1124, in _run feed_dict_tensor, options, run_metadata) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run options, run_metadata) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call raise 
type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of -1 which is outside the valid range of [0, 47). Label values: 19 41 22 19 31 1 39 8 22 4 43 12 27 39 19 43 22 44 21 19 4 42 19 21 27 9 41 6 41 44 1 14 5 6 37 6 41 1 6 16 42 39 4 0 25 14 4 30 6 31 9 19 41 41 41 6 23 1 19 19 9 17 26 41 43 19 41 23 22 14 14 9 6 41 1 1 -1 6 23 31 16 14 20 6 41 19 4 1 21 31 23 34 4 6 11 6 1 4 30 32 44 17 43 4 44 32 13 9 44 4 41 41 4 6 9 19 22 40 9 23 4 21 41 0 6 5 20 37 [[Node: SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](SparseSoftmaxCrossEntropyWithLogits/Reshape, SparseSoftmaxCrossEntropyWithLogits/Reshape_1)]]

Caused by op 'SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits', defined at: File "classifier.py", line 218, in tf.app.run() File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run _sys.exit(main(_sys.argv[:1] + flags_passthrough)) File "classifier.py", line 208, in main FastTrain() File "classifier.py", line 181, in FastTrain estimator.train(input_fn=train_input, steps=FLAGS.train_steps, hooks=None) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 241, in train loss = self._train_model(input_fn=input_fn, hooks=hooks) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 630, in _train_model model_fn_lib.ModeKeys.TRAIN) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 615, in _call_model_fn model_fn_results = self._model_fn(features=features, **kwargs) File "classifier.py", line 121, in model_fn labels=labels, logits=logits)) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 1706, in sparse_softmax_cross_entropy_with_logits precise_logits, labels, name=name) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 2491, in _sparse_softmax_cross_entropy_with_logits features=features, labels=labels, name=name) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op op_def=op_def) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op original_op=self._default_original_op, op_def=op_def) File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1204, in init self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Received a label value of -1 which is outside the valid range of [0, 47). Label values: 19 41 22 19 31 1 39 8 22 4 43 12 27 39 19 43 22 44 21 19 4 42 19 21 27 9 41 6 41 44 1 14 5 6 37 6 41 1 6 16 42 39 4 0 25 14 4 30 6 31 9 19 41 41 41 6 23 1 19 19 9 17 26 41 43 19 41 23 22 14 14 9 6 41 1 1 -1 6 23 31 16 14 20 6 41 19 4 1 21 31 23 34 4 6 11 6 1 4 30 32 44 17 43 4 44 32 13 9 44 4 41 41 4 6 9 19 22 40 9 23 4 21 41 0 6 5 20 37 [[Node: SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](SparseSoftmaxCrossEntropyWithLogits/Reshape, SparseSoftmaxCrossEntropyWithLogits/Reshape_1)]] `

The same mydataset.train / mydataset.test is processed fine by fasttext (C++ version). Thank you in advance!

opened by arnaudsj 5

Python3 compat and multiple datasets

Makes train_classifier.sh a bit more standalone and configurable. Only minor.

One thing to note is that the model directory has changed from ${DATADIR}/model to ${DATADIR}/models/${DATASET}, so after pulling you'd probably want to mv ${DATADIR}/model ${DATADIR}/models/ag_news (creating ${DATADIR}/models first) to reuse your model.

opened by darrengarvey 0

ValueError: Shape must be rank 2 but is rank 3 for 'concat' (op: 'ConcatV2') with input shapes: [?,16], [?,1,16], []

Hi,

I'm trying to train a text classifier using train_langdetect.sh without horovod installed, and I'm getting this error:

  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1607, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 2 but is rank 3 for 'concat' (op: 'ConcatV2') with input shapes: [?,16], [?,1,16], [].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "classifier.py", line 199, in <module>
    tf.app.run()
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "classifier.py", line 193, in main
    FastTrain()
  File "classifier.py", line 167, in FastTrain
    estimator.train(input_fn=train_input, steps=FLAGS.train_steps, hooks=hooks)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1191, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "classifier.py", line 118, in model_fn
    input_layer = tf.concat([text_embedding, ngram_embedding], -1)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py", line 1420, in concat
    return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_array_ops.py", line 1257, in concat_v2
    "ConcatV2", values=values, axis=axis, name=name)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1770, in __init__
    control_input_ops)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1610, in _create_c_op
    raise ValueError(str(e))
ValueError: Shape must be rank 2 but is rank 3 for 'concat' (op: 'ConcatV2') with input shapes: [?,16], [?,1,16], [].


Do you have any suggestions what could be causing this?
Thank you.

opened by kodlan 0

`DataLossError: corrupted record` during training

Hello, during training the following error occurs:

tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 0 [[{{node read_batch_features/read/ReaderReadUpToV2}} = ReaderReadUpToV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](read_batch_features/read/TFRecordReaderV2, read_batch_features/file_name_queue, read_batch_features/read/ReaderReadUpToV2/num_records)]]

My environment is: Python 3.6, tensorflow 1.12.0, macOS 10.13

Here is detailed information:

python3 classifier.py
--train_records=data/train-label-text.txt
--eval_records=data/train-label-text.txt
--label_file=data/train-label-text.txt.labels
--vocab_file=data/train-label-text.txt.vocab
--model_dir=model
--export_dir=model FastTrain 1000 WARNING:tensorflow:From classifier.py:153: RunConfig.init (from tensorflow.contrib.learn.python.learn.estimators.run_config) is deprecated and will be removed in a future version. Instructions for updating: When switching to tf.estimator.Estimator, use tf.estimator.RunConfig instead. TESTdata/train-label-text.txt STARTING TRAIN ParseSpec {'text': VarLenFeature(dtype=tf.string), 'label': FixedLenFeature(shape=(), dtype=tf.string, default_value=None)} Input file: data/train-label-text.txt WARNING:tensorflow:From /Users/hans/repos/simple-tests/classify/workspace/inputs.py:54: read_batch_features (from tensorflow.contrib.learn.python.learn.learn_io.graph_io) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data. WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/graph_io.py:833: read_keyed_batch_features (from tensorflow.contrib.learn.python.learn.learn_io.graph_io) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data. WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/graph_io.py:542: read_keyed_batch_examples (from tensorflow.contrib.learn.python.learn.learn_io.graph_io) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data. WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/graph_io.py:423: string_input_producer (from tensorflow.python.training.input) is deprecated and will be removed in a future version. Instructions for updating: Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.from_tensor_slices(string_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs). 
If shuffle=False, omit the .shuffle(...). WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/input.py:276: input_producer (from tensorflow.python.training.input) is deprecated and will be removed in a future version. Instructions for updating: Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.from_tensor_slices(input_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs). If shuffle=False, omit the .shuffle(...). WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/input.py:188: limit_epochs (from tensorflow.python.training.input) is deprecated and will be removed in a future version. Instructions for updating: Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.from_tensors(tensor).repeat(num_epochs). WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/input.py:197: QueueRunner.init (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version. Instructions for updating: To construct input pipelines, use the tf.data module. WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/input.py:197: add_queue_runner (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version. Instructions for updating: To construct input pipelines, use the tf.data module. WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/graph_io.py:317: TFRecordReader.init (from tensorflow.python.ops.io_ops) is deprecated and will be removed in a future version. 
Instructions for updating: Queue-based input pipelines have been replaced by tf.data. Use tf.data.TFRecordDataset. WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/graph_io.py:449: shuffle_batch_join (from tensorflow.python.training.input) is deprecated and will be removed in a future version. Instructions for updating: Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.interleave(...).shuffle(min_after_dequeue).batch(batch_size). WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/graph_io.py:550: queue_parsed_features (from tensorflow.contrib.learn.python.learn.learn_io.graph_io) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data. WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version. Instructions for updating: Create a tf.sparse.SparseTensor and use tf.sparse.to_dense instead. WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py:804: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version. Instructions for updating: To construct input pipelines, use the tf.data module. 
Traceback (most recent call last): File "classifier.py", line 199, in tf.app.run() File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run _sys.exit(main(argv)) File "classifier.py", line 193, in main FastTrain() File "classifier.py", line 167, in FastTrain estimator.train(input_fn=train_input, steps=FLAGS.train_steps, hooks=hooks) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train loss = self._train_model(input_fn, hooks, saving_listeners) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model return self._train_model_default(input_fn, hooks, saving_listeners) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1241, in _train_model_default saving_listeners) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1471, in _train_with_estimator_spec _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss]) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 783, in exit self._close_internal(exception_type) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 821, in _close_internal self._sess.close() File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1069, in close self._sess.close() File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1229, in close ignore_live_threads=True) File 
"/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 389, in join six.reraise(*self._exc_info_to_raise) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/six.py", line 692, in reraise raise value.with_traceback(tb) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/queue_runner_impl.py", line 257, in _run enqueue_callable() File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1257, in _single_operation_run self._call_tf_sessionrun(None, {}, [], target_list, None) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 0 [[{{node read_batch_features/read/ReaderReadUpToV2}} = ReaderReadUpToV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](read_batch_features/read/TFRecordReaderV2, read_batch_features/file_name_queue, read_batch_features/read/ReaderReadUpToV2/num_records)]]

opened by hyangyt 0

ngrams may contain a bug

def GenerateNgrams(words, ngrams):
    nglist = []
    for ng in ngrams:
        for word in words:
            nglist.extend([word[n:n+ng] for n in range(len(word)-ng+1)])
    return nglist

Maybe it should look like the following:

def GenerateNgrams(words, ngrams):
    nglist = []
    for ng in ngrams:
        # Join each window of ng consecutive words, instead of slicing characters.
        nglist.extend([''.join(words[n:n+ng]) for n in range(len(words)-ng+1)])
    return nglist

opened by qqhard 0

🌳 A Python-inspired implementation of the Optimum-Path Forest classifier.

OPFython: A Python-Inspired Optimum-Path Forest Classifier Welcome to OPFython. Note that this implementation relies purely on the standard LibOPF. Th

30 Jan 4, 2023

Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).

Deep Text Search - AI Based Text Search & Recommendation System Deep Text Search is an AI-powered multilingual text search and recommendation engine w

19 Sep 29, 2022

Line-level Handwritten Text Recognition (HTR) system implemented with TensorFlow.

Line-level Handwritten Text Recognition with TensorFlow This model is an extended version of the Simple HTR system implemented by @Harald Scheidl and

72 May 7, 2022

A repository that shares tuning results of trained models generated by TensorFlow / Keras. Post-training quantization (Weight Quantization, Integer Quantization, Full Integer Quantization, Float16 Quantization), Quantization-aware training. TensorFlow Lite. OpenVINO. CoreML. TensorFlow.js. TF-TRT. MediaPipe. ONNX. [.tflite,.h5,.pb,saved_model,tfjs,tftrt,mlmodel,.xml/.bin, .onnx]

PINTO_model_zoo Please read the contents of the LICENSE file located directly under each folder before using the model. My model conversion scripts ar

2.4k Jan 5, 2023

Object tracking implemented with YOLOv4, DeepSort, and TensorFlow.

Object tracking implemented with YOLOv4, DeepSort, and TensorFlow. YOLOv4 is a state of the art algorithm that uses deep convolutional neural networks to perform object detections. We can take the output of YOLOv4 feed these object detections into Deep SORT (Simple Online and Realtime Tracking with a Deep Association Metric) in order to create a highly accurate object tracker.

1.1k Dec 29, 2022

A state of the art of new lightweight YOLO model implemented by TensorFlow 2.

CSL-YOLO: A New Lightweight Object Detection System for Edge Computing This project provides a SOTA level lightweight YOLO called "Cross-Stage Lightwe

54 Dec 21, 2022

This is a yolo3 implemented via tensorflow 2.7

YoloV3 - an object detection algorithm implemented via TF 2.x source code In this article I assume you've already familiar with basic computer vision

2 Jan 17, 2022

DIT is a DTLS MitM proxy implemented in Python 3. It can intercept, manipulate and suppress datagrams between two DTLS endpoints and supports psk-based and certificate-based authentication schemes (RSA + ECC).

DIT - DTLS Interception Tool DIT is a MitM proxy tool to intercept DTLS traffic. It can intercept, manipulate and/or suppress DTLS datagrams between t

52 Nov 30, 2022

This project demonstrates the use of neural networks and computer vision to create a classifier that interprets the Brazilian Sign Language.

LIBRAS-Image-Classifier This project demonstrates the use of neural networks and computer vision to create a classifier that interprets the Brazilian

26 Oct 14, 2022

Keras-1D-NN-Classifier

Keras-1D-NN-Classifier This code is based on the reference codes linked below. reference 1, reference 2 This code is for 1-D array data classification

6 May 18, 2021

People movement type classifier with YOLOv4 detection and SORT tracking.

Movement classification The goal of this project would be movement classification of people, in other words, walking (normal and fast) and running. Yo

4 Sep 21, 2021

Official code for "Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer. ICCV2021".

Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer. ICCV2021. Introduction We proposed a novel model training paradi

103 Dec 14, 2022

Minimal But Practical Image Classifier Pipeline Using PyTorch, Fine-tuned on ResNet18, Got 99% Accuracy on Own Small Datasets.

PyTorch Image Classifier Updates At the request of many users, I released a new version of the standard PyTorch image classification example here: http:

106 Nov 6, 2022

CondNet: Conditional Classifier for Scene Segmentation

CondNet: Conditional Classifier for Scene Segmentation Introduction The fully convolutional network (FCN) has achieved tremendous success in dense vis

31 Jul 22, 2022

Training Confidence-Calibrated Classifier for Detecting Out-of-Distribution Samples / ICLR 2018

Training Confidence-Calibrated Classifier for Detecting Out-of-Distribution Samples This project is for the paper "Training Confidence-Calibrated Clas

168 Nov 29, 2022

Open-Set Recognition: A Good Closed-Set Classifier is All You Need

Open-Set Recognition: A Good Closed-Set Classifier is All You Need Code for our paper: "Open-Set Recognition: A Good Closed-Set Classifier is All You

194 Jan 3, 2023

The Self-Supervised Learner can be used to train a classifier with fewer labeled examples needed using self-supervised learning.

Published by SpaceML • About SpaceML • Quick Colab Example Self-Supervised Learner The Self-Supervised Learner can be used to train a classifier with

92 Nov 30, 2022

This codebase is the official implementation of Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization (NeurIPS2021, Spotlight)

Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization This codebase is the official implementation of Test-Time Classifier A

47 Dec 28, 2022

A method that uses a Generative Adversarial Network (GAN) to interpret black-box deep image classifier models, implemented in PyTorch.

3 Dec 29, 2022

