Library for fast text representation and classification.

Overview

fastText

fastText is a library for efficient learning of word representations and sentence classification.

Resources

Models

Supplementary data

FAQ

You can find answers to frequently asked questions on our website.

Cheatsheet

We also provide a cheatsheet full of useful one-liners.

Requirements

We are continuously building and testing our library, CLI and Python bindings under various docker images using circleci.

Generally, fastText builds on modern Mac OS and Linux distributions. Since it uses some C++11 features, it requires a compiler with good C++11 support, such as:

  • (g++-4.7.2 or newer) or (clang-3.3 or newer)

Compilation is carried out using a Makefile, so you will need to have a working make. If you want to use cmake you need at least version 2.8.9.

One of the oldest distributions we successfully built and tested the CLI under is Debian jessie.

For the word-similarity evaluation script you will need:

  • Python 2.6 or newer
  • NumPy & SciPy

For the python bindings (see the subdirectory python) you will need:

  • Python version 2.7 or >=3.4
  • NumPy & SciPy
  • pybind11

One of the oldest distributions we successfully built and tested the Python bindings under is Debian jessie.

If these requirements make it impossible for you to use fastText, please open an issue and we will try to accommodate you.

Building fastText

We discuss building the latest stable version of fastText.

Getting the source code

You can find our latest stable release in the usual place.

There is also the master branch that contains all of our most recent work, but comes along with all the usual caveats of an unstable branch. You might want to use this if you are a developer or power-user.

Building fastText using make (preferred)

$ wget https://github.com/facebookresearch/fastText/archive/v0.9.2.zip
$ unzip v0.9.2.zip
$ cd fastText-0.9.2
$ make

This will produce object files for all the classes as well as the main binary fasttext. If you do not plan on using the default system-wide compiler, update the two macros defined at the beginning of the Makefile (CC and INCLUDES).

Building fastText using cmake

For now this is not part of a release, so you will need to clone the master branch.

$ git clone https://github.com/facebookresearch/fastText.git
$ cd fastText
$ mkdir build && cd build && cmake ..
$ make && make install

This will create the fasttext binary and also all relevant libraries (shared, static, PIC).

Building fastText for Python

For now this is not part of a release, so you will need to clone the master branch.

$ git clone https://github.com/facebookresearch/fastText.git
$ cd fastText
$ pip install .

For further information and an introduction, see python/README.md.

Example use cases

This library has two main use cases: word representation learning and text classification. These are described in the two papers [1] and [2].

Word representation learning

In order to learn word vectors, as described in [1], do:

$ ./fasttext skipgram -input data.txt -output model

where data.txt is a training file containing UTF-8 encoded text. By default the word vectors will take into account character n-grams from 3 to 6 characters. At the end of optimization the program will save two files: model.bin and model.vec. model.vec is a text file containing the word vectors, one per line. model.bin is a binary file containing the parameters of the model along with the dictionary and all hyperparameters. The binary file can be used later to compute word vectors or to restart the optimization.
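As an illustration of the text layout of model.vec, here is a minimal Python sketch that reads such a file into a dictionary. It assumes the first line holds the number of words and the vector dimension, followed by one word and its values per line; the function name read_vec and the in-memory sample are hypothetical and only stand in for a real file.

```python
import io

def read_vec(stream):
    """Parse word vectors in the textual .vec layout into a dict.

    Assumption: the first line is "<word count> <dimension>" and each
    following line is a word followed by its space-separated values.
    """
    header = stream.readline().split()
    count, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return count, dim, vectors

# Tiny in-memory stand-in for a real model.vec file.
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld -0.4 0.5 0.6\n")
count, dim, vectors = read_vec(sample)
print(count, dim, vectors["world"])
```

For a real file, pass an object opened with open("model.vec", encoding="utf-8") instead of the sample stream.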

Obtaining word vectors for out-of-vocabulary words

The previously trained model can be used to compute word vectors for out-of-vocabulary words. Provided you have a text file queries.txt containing words for which you want to compute vectors, use the following command:

$ ./fasttext print-word-vectors model.bin < queries.txt

This will output word vectors to the standard output, one vector per line. This can also be used with pipes:

$ cat queries.txt | ./fasttext print-word-vectors model.bin

See the provided scripts for an example. For instance, running:

$ ./word-vector-example.sh

will compile the code, download data, compute word vectors and evaluate them on the rare words similarity dataset RW [Thang et al. 2013].
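Vectors for out-of-vocabulary words are possible because, as described in [1], a word is represented through its character n-grams. The following Python sketch shows roughly how such n-grams can be enumerated, assuming the word is wrapped in the boundary markers < and > as in the paper. The function char_ngrams is hypothetical and works on Python characters, so it is not the library's own implementation.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return the character n-grams of a word wrapped in boundary markers.

    Assumption: '<' and '>' mark the start and end of the word, and
    n-gram lengths run from minn to maxn inclusive.
    """
    wrapped = "<" + word + ">"
    ngrams = []
    for n in range(minn, maxn + 1):
        for start in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[start:start + n])
    return ngrams

grams = char_ngrams("where", minn=3, maxn=3)
print(grams)  # ['<wh', 'whe', 'her', 'ere', 're>']
```

An unseen word still shares many of these n-grams with words from the training data, which is what lets the model assemble a vector for it.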

Text classification

This library can also be used to train supervised text classifiers, for instance for sentiment analysis. In order to train a text classifier using the method described in [2], use:

$ ./fasttext supervised -input train.txt -output model

where train.txt is a text file containing a training sentence per line along with the labels. By default, we assume that labels are words that are prefixed by the string __label__. This will output two files: model.bin and model.vec. Once the model is trained, you can evaluate it by computing the precision and recall at k (P@k and R@k) on a test set using:

$ ./fasttext test model.bin test.txt k

The argument k is optional, and is equal to 1 by default.
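To make the input format and the two metrics concrete, here is a small Python sketch. It assumes that P@k is the number of correct labels among the top k predictions divided by the number of predictions made, and that R@k divides the same count by the number of gold labels, both summed over all lines. The helper names and the toy data are hypothetical; this is not the code the test command runs.

```python
LABEL_PREFIX = "__label__"

def split_line(line, prefix=LABEL_PREFIX):
    """Split one training/test line into (set of labels, list of words)."""
    labels, words = set(), []
    for token in line.split():
        if token.startswith(prefix):
            labels.add(token)
        else:
            words.append(token)
    return labels, words

def precision_recall_at_k(gold, predicted, k=1):
    """Compute P@k and R@k from gold label sets and ranked predictions."""
    hits = n_predicted = n_gold = 0
    for gold_labels, ranked in zip(gold, predicted):
        top_k = ranked[:k]
        hits += sum(1 for label in top_k if label in gold_labels)
        n_predicted += len(top_k)
        n_gold += len(gold_labels)
    precision = hits / n_predicted if n_predicted else 0.0
    recall = hits / n_gold if n_gold else 0.0
    return precision, recall

# Toy data: two labeled lines and made-up ranked predictions for each.
lines = ["__label__pos __label__food great pizza", "__label__neg slow service"]
gold = [split_line(line)[0] for line in lines]
predicted = [["__label__pos", "__label__neg"], ["__label__pos", "__label__neg"]]

p1, r1 = precision_recall_at_k(gold, predicted, k=1)
p2, r2 = precision_recall_at_k(gold, predicted, k=2)
print(p1, r1, p2, r2)
```

With k=1 only one of the two top predictions is correct, so P@1 is 0.5, while R@1 is 1/3 because the two lines carry three gold labels in total.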

In order to obtain the k most likely labels for a piece of text, use:

$ ./fasttext predict model.bin test.txt k

or use predict-prob to also get the probability for each label:

$ ./fasttext predict-prob model.bin test.txt k

where test.txt contains a piece of text to classify per line. Doing so will print to the standard output the k most likely labels for each line. The argument k is optional, and equal to 1 by default. See classification-example.sh for an example use case. To reproduce the results from the paper [2], run classification-results.sh; this will download all the datasets and reproduce the results from Table 1.

If you want to compute vector representations of sentences or paragraphs, please use:

$ ./fasttext print-sentence-vectors model.bin < text.txt

This assumes that the text.txt file contains the paragraphs that you want to get vectors for. The program will output one vector representation per line in the file.
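One simple way to think about a sentence vector is as an average of word vectors. The Python sketch below illustrates that idea under an assumption: each known word vector is scaled to unit length and the results are averaged. The actual behavior of print-sentence-vectors may differ, for example by model type, and unlike this sketch the library can also build vectors for unseen words. The function and the toy vectors are hypothetical.

```python
import math

def sentence_vector(words, word_vectors):
    """Average the L2-normalized vectors of the known words in a sentence."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    used = 0
    for word in words:
        vec = word_vectors.get(word)
        if vec is None:
            continue  # this sketch simply skips unknown words
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            continue  # a zero vector carries no direction
        for i, x in enumerate(vec):
            total[i] += x / norm
        used += 1
    return [x / used for x in total] if used else total

# Toy two-dimensional vectors, not taken from any trained model.
toy_vectors = {"good": [3.0, 4.0], "pizza": [0.0, 2.0]}
vec = sentence_vector("good pizza here".split(), toy_vectors)
print(vec)
```

Normalizing first keeps words with large vector norms from dominating the average.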

You can also quantize a supervised model to reduce its memory usage with the following command:

$ ./fasttext quantize -output model

This will create a .ftz file with a smaller memory footprint. All the standard functionality, like test or predict, works the same way on the quantized models:

$ ./fasttext test model.ftz test.txt

The quantization procedure follows the steps described in [3]. You can run the script quantization-example.sh for an example.
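The core idea behind this kind of compression is product quantization: each vector is cut into short sub-vectors, and each sub-vector is replaced by the index of a nearby centroid. The Python sketch below is a deliberately simplified illustration of that idea, assuming plain k-means with a fixed seed choice and a tiny codebook. All function names and the toy data are hypothetical, and it omits everything else the real procedure does.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))

def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means, seeded with the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def train_codebooks(vectors, dsub, k):
    """Learn one codebook per block of dsub consecutive dimensions."""
    dim = len(vectors[0])
    return [kmeans([v[s:s + dsub] for v in vectors], k) for s in range(0, dim, dsub)]

def encode(vector, codebooks, dsub):
    """Replace each sub-vector by the index of its nearest centroid."""
    return [nearest(vector[b * dsub:(b + 1) * dsub], centroids)
            for b, centroids in enumerate(codebooks)]

def decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating the chosen centroids."""
    return [x for code, centroids in zip(codes, codebooks) for x in centroids[code]]

# Toy data: two tight groups of 4-dimensional vectors.
vectors = [[0.0, 0.0, 1.0, 1.0], [0.1, 0.0, 1.0, 0.9],
           [5.0, 5.0, -3.0, -3.0], [5.1, 4.9, -3.0, -3.1]]
codebooks = train_codebooks(vectors, dsub=2, k=2)
codes = [encode(v, codebooks, dsub=2) for v in vectors]
approx = decode(codes[2], codebooks)
print(codes, approx)
```

Each vector is now stored as two small integers instead of four floats, at the cost of a small reconstruction error.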

Full documentation

Invoke a command without arguments to list available arguments and their default values:

$ ./fasttext supervised
Empty input or output path.

The following arguments are mandatory:
  -input              training file path
  -output             output file path

The following arguments are optional:
  -verbose            verbosity level [2]

The following arguments for the dictionary are optional:
  -minCount           minimal number of word occurrences [1]
  -minCountLabel      minimal number of label occurrences [0]
  -wordNgrams         max length of word ngram [1]
  -bucket             number of buckets [2000000]
  -minn               min length of char ngram [0]
  -maxn               max length of char ngram [0]
  -t                  sampling threshold [0.0001]
  -label              labels prefix [__label__]

The following arguments for training are optional:
  -lr                 learning rate [0.1]
  -lrUpdateRate       change the rate of updates for the learning rate [100]
  -dim                size of word vectors [100]
  -ws                 size of the context window [5]
  -epoch              number of epochs [5]
  -neg                number of negatives sampled [5]
  -loss               loss function {ns, hs, softmax} [softmax]
  -thread             number of threads [12]
  -pretrainedVectors  pretrained word vectors for supervised learning []
  -saveOutput         whether output params should be saved [0]

The following arguments for quantization are optional:
  -cutoff             number of words and ngrams to retain [0]
  -retrain            finetune embeddings if a cutoff is applied [0]
  -qnorm              quantizing the norm separately [0]
  -qout               quantizing the classifier [0]
  -dsub               size of each sub-vector [2]

Defaults may vary by mode. (Word-representation modes skipgram and cbow use a default -minCount of 5.)
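When driving the command-line tool from a script, it can help to assemble the argument list programmatically. The Python sketch below builds such a list from keyword arguments, using option names from the listing above. The helper build_command is hypothetical and does not validate names or values.

```python
def build_command(mode, binary="./fasttext", **options):
    """Build an argv list such as ['./fasttext', 'supervised', '-input', ...]."""
    argv = [binary, mode]
    for name, value in options.items():
        argv += ["-" + name, str(value)]
    return argv

cmd = build_command("supervised", input="train.txt", output="model",
                    lr=0.5, epoch=25, wordNgrams=2)
print(" ".join(cmd))
# prints: ./fasttext supervised -input train.txt -output model -lr 0.5 -epoch 25 -wordNgrams 2

# To actually run it (requires a built ./fasttext binary and a train.txt file):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file paths.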

References

Please cite [1] if using this code for learning word representations or [2] if using it for text classification.

Enriching Word Vectors with Subword Information

[1] P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information

@article{bojanowski2017enriching,
  title={Enriching Word Vectors with Subword Information},
  author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
  journal={Transactions of the Association for Computational Linguistics},
  volume={5},
  year={2017},
  issn={2307-387X},
  pages={135--146}
}

Bag of Tricks for Efficient Text Classification

[2] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification

@InProceedings{joulin2017bag,
  title={Bag of Tricks for Efficient Text Classification},
  author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas},
  booktitle={Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
  month={April},
  year={2017},
  publisher={Association for Computational Linguistics},
  pages={427--431},
}

FastText.zip: Compressing text classification models

[3] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, FastText.zip: Compressing text classification models

@article{joulin2016fasttext,
  title={FastText.zip: Compressing text classification models},
  author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Douze, Matthijs and J{\'e}gou, Herv{\'e} and Mikolov, Tomas},
  journal={arXiv preprint arXiv:1612.03651},
  year={2016}
}

(* These authors contributed equally.)

Join the fastText community

See the CONTRIBUTING file for information about how to help out.

License

fastText is MIT-licensed.

Comments
  • fasttext installed but import fails

    Hi, I have successfully installed fasttext on python3.5. However, when I try to import it I get the following error:

    Using /usr/local/lib/python3.5/dist-packages
    Finished processing dependencies for fasttext==0.8.22
    user@server:~/GitHub/fastText$ python3.5
    Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
    [GCC 5.4.0 20160609] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import fasttext
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named 'fasttext'
    >>> 
    

    I have tried installing with both pip install . and python setup.py install, with no luck.

    opened by ahmedahmedov 25
  • Assertion failed on ./fasttext predict

    predict command failed!

    ./fasttext predict model.bin test.txt

    Assertion failed: (counts.size() == osz_), function setTargetCounts, file src/model.cc, line 188.
    Abort trap: 6
    

    model train command was:

    ./fasttext supervised -input train.txt -output model -wordNgrams 4 -bucket 1000000 -thread 16

    Read 4223M words
    Number of words:  16577869
    Number of labels: 25
    Progress: 100.0%  words/sec/thread: 375706  lr: 0.000000  loss: 0.169518  eta: 0h0m 
    
    opened by spate141 25
  • How can we get the vector of a paragraph?

    I have previously tried doc2vec (from gensim, based on word2vec), with which I can extract fixed-length vectors for variable-length paragraphs. Can I do the same with fastText?

    Thank you!

    opened by xchangcheng 22
  • OS X install problem

    When I install fasttext using "pip install .", I get errors like the following:

    Failed to build fasttext
    Installing collected packages: fasttext
      Running setup.py install for fasttext ... error
        Complete output from command /miniconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/tz/msp_r50s03s59q40s_gmhx600000gn/T/pip-req-build-i2z3pyel/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/tz/msp_r50s03s59q40s_gmhx600000gn/T/pip-record-yg0h6noh/install-record.txt --single-version-externally-managed --compile:
        running install
        running build
        running build_py
        creating build
        creating build/lib.macosx-10.7-x86_64-3.6
        creating build/lib.macosx-10.7-x86_64-3.6/fastText
        copying python/fastText/__init__.py -> build/lib.macosx-10.7-x86_64-3.6/fastText
        copying python/fastText/FastText.py -> build/lib.macosx-10.7-x86_64-3.6/fastText
        creating build/lib.macosx-10.7-x86_64-3.6/fastText/util
        copying python/fastText/util/util.py -> build/lib.macosx-10.7-x86_64-3.6/fastText/util
        copying python/fastText/util/__init__.py -> build/lib.macosx-10.7-x86_64-3.6/fastText/util
        creating build/lib.macosx-10.7-x86_64-3.6/fastText/tests
        copying python/fastText/tests/test_script.py -> build/lib.macosx-10.7-x86_64-3.6/fastText/tests
        copying python/fastText/tests/__init__.py -> build/lib.macosx-10.7-x86_64-3.6/fastText/tests
        copying python/fastText/tests/test_configurations.py -> build/lib.macosx-10.7-x86_64-3.6/fastText/tests
        running build_ext
        gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/miniconda3/include -arch x86_64 -I/miniconda3/include -arch x86_64 -I/miniconda3/include/python3.6m -c /var/folders/tz/msp_r50s03s59q40s_gmhx600000gn/T/tmp1upvarhx.cpp -o var/folders/tz/msp_r50s03s59q40s_gmhx600000gn/T/tmp1upvarhx.o -stdlib=libc++
        gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/miniconda3/include -arch x86_64 -I/miniconda3/include -arch x86_64 -I/miniconda3/include/python3.6m -c /var/folders/tz/msp_r50s03s59q40s_gmhx600000gn/T/tmp9dzh7j94.cpp -o var/folders/tz/msp_r50s03s59q40s_gmhx600000gn/T/tmp9dzh7j94.o -std=c++14
        warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
        1 warning generated.
        gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/miniconda3/include -arch x86_64 -I/miniconda3/include -arch x86_64 -I/miniconda3/include/python3.6m -c /var/folders/tz/msp_r50s03s59q40s_gmhx600000gn/T/tmpw5pz6xr0.cpp -o var/folders/tz/msp_r50s03s59q40s_gmhx600000gn/T/tmpw5pz6xr0.o -fvisibility=hidden
        warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
        1 warning generated.
        building 'fasttext_pybind' extension
        creating build/temp.macosx-10.7-x86_64-3.6
        creating build/temp.macosx-10.7-x86_64-3.6/python
        creating build/temp.macosx-10.7-x86_64-3.6/python/fastText
        creating build/temp.macosx-10.7-x86_64-3.6/python/fastText/pybind
        creating build/temp.macosx-10.7-x86_64-3.6/src
        gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/miniconda3/include -arch x86_64 -I/miniconda3/include -arch x86_64 -I/miniconda3/include/python3.6m -I/Users/ruanxiaoyi/.local/include/python3.6m -Isrc -I/miniconda3/include/python3.6m -c python/fastText/pybind/fasttext_pybind.cc -o build/temp.macosx-10.7-x86_64-3.6/python/fastText/pybind/fasttext_pybind.o -stdlib=libc++ -DVERSION_INFO="0.8.22" -std=c++14 -fvisibility=hidden
        python/fastText/pybind/fasttext_pybind.cc:219:35: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'std::__1::vector<long long, std::__1::allocator<long long> >::size_type' (aka 'unsigned long') [-Wsign-compare]
                    for (int32_t i = 0; i < vocab_freq.size(); i++) {
                                        ~ ^ ~~~~~~~~~~~~~~~~~
        python/fastText/pybind/fasttext_pybind.cc:233:35: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'std::__1::vector<long long, std::__1::allocator<long long> >::size_type' (aka 'unsigned long') [-Wsign-compare]
                    for (int32_t i = 0; i < labels_freq.size(); i++) {
                                        ~ ^ ~~~~~~~~~~~~~~~~~~
        2 warnings generated.
        gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/miniconda3/include -arch x86_64 -I/miniconda3/include -arch x86_64 -I/miniconda3/include/python3.6m -I/Users/ruanxiaoyi/.local/include/python3.6m -Isrc -I/miniconda3/include/python3.6m -c src/dictionary.cc -o build/temp.macosx-10.7-x86_64-3.6/src/dictionary.o -stdlib=libc++ -DVERSION_INFO="0.8.22" -std=c++14 -fvisibility=hidden
        src/dictionary.cc:181:52: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Wsign-compare]
            for (size_t j = i, n = 1; j < word.size() && n <= args_->maxn; n++) {
                                                         ~ ^  ~~~~~~~~~~~
        src/dictionary.cc:186:13: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Wsign-compare]
              if (n >= args_->minn && !(n == 1 && (i == 0 || j == word.size()))) {
                  ~ ^  ~~~~~~~~~~~
        src/dictionary.cc:198:24: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int32_t' (aka 'int') [-Wsign-compare]
          for (size_t i = 0; i < size_; i++) {
                             ~ ^ ~~~~~
        src/dictionary.cc:296:24: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int32_t' (aka 'int') [-Wsign-compare]
          for (size_t i = 0; i < size_; i++) {
                             ~ ^ ~~~~~
        src/dictionary.cc:316:25: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'std::__1::vector<int, std::__1::allocator<int> >::size_type' (aka 'unsigned long') [-Wsign-compare]
          for (int32_t i = 0; i < hashes.size(); i++) {
                              ~ ^ ~~~~~~~~~~~~~
        src/dictionary.cc:318:31: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'std::__1::vector<int, std::__1::allocator<int> >::size_type' (aka 'unsigned long') [-Wsign-compare]
            for (int32_t j = i + 1; j < hashes.size() && j < i + n; j++) {
                                    ~ ^ ~~~~~~~~~~~~~
        src/dictionary.cc:515:25: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'std::__1::vector<fasttext::entry, std::__1::allocator<fasttext::entry> >::size_type' (aka 'unsigned long') [-Wsign-compare]
          for (int32_t i = 0; i < words_.size(); i++) {
                              ~ ^ ~~~~~~~~~~~~~
        src/dictionary.cc:517:12: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'std::__1::vector<int, std::__1::allocator<int> >::size_type' (aka 'unsigned long') [-Wsign-compare]
                (j < words.size() && words[j] == i)) {
                 ~ ^ ~~~~~~~~~~~~
        8 warnings generated.
        gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/miniconda3/include -arch x86_64 -I/miniconda3/include -arch x86_64 -I/miniconda3/include/python3.6m -I/Users/ruanxiaoyi/.local/include/python3.6m -Isrc -I/miniconda3/include/python3.6m -c src/main.cc -o build/temp.macosx-10.7-x86_64-3.6/src/main.o -stdlib=libc++ -DVERSION_INFO="0.8.22" -std=c++14 -fvisibility=hidden
        src/main.cc:348:3: warning: code will never be executed [-Wunreachable-code]
          exit(0);
          ^~~~
        1 warning generated.
        gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/miniconda3/include -arch x86_64 -I/miniconda3/include -arch x86_64 -I/miniconda3/include/python3.6m -I/Users/ruanxiaoyi/.local/include/python3.6m -Isrc -I/miniconda3/include/python3.6m -c src/fasttext.cc -o build/temp.macosx-10.7-x86_64-3.6/src/fasttext.o -stdlib=libc++ -DVERSION_INFO="0.8.22" -std=c++14 -fvisibility=hidden
        src/fasttext.cc:92:21: warning: comparison of integers of different signs: 'int' and 'std::__1::vector<int, std::__1::allocator<int> >::size_type' (aka 'unsigned long') [-Wsign-compare]
          for (int i = 0; i < ngrams.size(); i++) {
                          ~ ^ ~~~~~~~~~~~~~
        src/fasttext.cc:302:18: warning: comparison of integers of different signs: 'const int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
            return eosid == i1 || (eosid != i2 && norms[i1] > norms[i2]);
                   ~~~~~ ^  ~~
        src/fasttext.cc:302:34: warning: comparison of integers of different signs: 'const int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
            return eosid == i1 || (eosid != i2 && norms[i1] > norms[i2]);
                                   ~~~~~ ^  ~~
        src/fasttext.cc:323:16: warning: 'selectEmbeddings' is deprecated: selectEmbeddings is being deprecated. [-Wdeprecated-declarations]
            auto idx = selectEmbeddings(qargs.cutoff);
                       ^
        src/fasttext.h:165:3: note: 'selectEmbeddings' has been explicitly marked deprecated here
          FASTTEXT_DEPRECATED("selectEmbeddings is being deprecated.")
          ^
        src/utils.h:18:49: note: expanded from macro 'FASTTEXT_DEPRECATED'
        #define FASTTEXT_DEPRECATED(msg) __attribute__((__deprecated__(msg)))
                                                        ^
        src/fasttext.cc:322:40: warning: comparison of integers of different signs: 'const size_t' (aka 'const unsigned long') and 'int64_t' (aka 'long long') [-Wsign-compare]
          if (qargs.cutoff > 0 && qargs.cutoff < input->size(0)) {
                                  ~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
        src/fasttext.cc:327:24: warning: comparison of integers of different signs: 'int' and 'std::__1::vector<int, std::__1::allocator<int> >::size_type' (aka 'unsigned long') [-Wsign-compare]
            for (auto i = 0; i < idx.size(); i++) {
                             ~ ^ ~~~~~~~~~~
        src/fasttext.cc:380:25: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'std::__1::vector<int, std::__1::allocator<int> >::size_type' (aka 'unsigned long') [-Wsign-compare]
          for (int32_t w = 0; w < line.size(); w++) {
                              ~ ^ ~~~~~~~~~~~
        src/fasttext.cc:384:41: warning: comparison of integers of different signs: 'int' and 'std::__1::vector<int, std::__1::allocator<int> >::size_type' (aka 'unsigned long') [-Wsign-compare]
              if (c != 0 && w + c >= 0 && w + c < line.size()) {
                                          ~~~~~ ^ ~~~~~~~~~~~
        src/fasttext.cc:398:25: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'std::__1::vector<int, std::__1::allocator<int> >::size_type' (aka 'unsigned long') [-Wsign-compare]
          for (int32_t w = 0; w < line.size(); w++) {
                              ~ ^ ~~~~~~~~~~~
        src/fasttext.cc:402:41: warning: comparison of integers of different signs: 'int' and 'std::__1::vector<int, std::__1::allocator<int> >::size_type' (aka 'unsigned long') [-Wsign-compare]
              if (c != 0 && w + c >= 0 && w + c < line.size()) {
                                          ~~~~~ ^ ~~~~~~~~~~~
        src/fasttext.cc:479:27: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'std::__1::vector<int, std::__1::allocator<int> >::size_type' (aka 'unsigned long') [-Wsign-compare]
            for (int32_t i = 0; i < line.size(); i++) {
                                ~ ^ ~~~~~~~~~~~
        src/fasttext.cc:514:25: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'std::__1::vector<int, std::__1::allocator<int> >::size_type' (aka 'unsigned long') [-Wsign-compare]
          for (int32_t i = 0; i < ngrams.size(); i++) {
                              ~ ^ ~~~~~~~~~~~~~
        src/fasttext.cc:551:5: warning: 'precomputeWordVectors' is deprecated: precomputeWordVectors is being deprecated. [-Wdeprecated-declarations]
            precomputeWordVectors(*wordVectors_);
            ^
        src/fasttext.h:180:3: note: 'precomputeWordVectors' has been explicitly marked deprecated here
          FASTTEXT_DEPRECATED("precomputeWordVectors is being deprecated.")
          ^
        src/utils.h:18:49: note: expanded from macro 'FASTTEXT_DEPRECATED'
        #define FASTTEXT_DEPRECATED(msg) __attribute__((__deprecated__(msg)))
                                                        ^
        src/fasttext.cc:585:23: warning: comparison of integers of different signs: 'std::__1::vector<std::__1::pair<float, std::__1::basic_string<char> >, std::__1::allocator<std::__1::pair<float, std::__1::basic_string<char> > > >::size_type' (aka 'unsigned long') and 'int32_t' (aka 'int') [-Wsign-compare]
              if (heap.size() == k && similarity < heap.front().first) {
                  ~~~~~~~~~~~ ^  ~
        src/fasttext.cc:590:23: warning: comparison of integers of different signs: 'std::__1::vector<std::__1::pair<float, std::__1::basic_string<char> >, std::__1::allocator<std::__1::pair<float, std::__1::basic_string<char> > > >::size_type' (aka 'unsigned long') and 'int32_t' (aka 'int') [-Wsign-compare]
              if (heap.size() > k) {
                  ~~~~~~~~~~~ ^ ~
        src/fasttext.cc:701:24: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int64_t' (aka 'long long') [-Wsign-compare]
          for (size_t i = 0; i < n; i++) {
                             ~ ^ ~
        src/fasttext.cc:706:26: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int64_t' (aka 'long long') [-Wsign-compare]
            for (size_t j = 0; j < dim; j++) {
                               ~ ^ ~~~
        src/fasttext.cc:718:24: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int64_t' (aka 'long long') [-Wsign-compare]
          for (size_t i = 0; i < n; i++) {
                             ~ ^ ~
        src/fasttext.cc:723:26: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int64_t' (aka 'long long') [-Wsign-compare]
            for (size_t j = 0; j < dim; j++) {
                               ~ ^ ~~~
        19 warnings generated.
        gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/miniconda3/include -arch x86_64 -I/miniconda3/include -arch x86_64 -I/miniconda3/include/python3.6m -I/Users/ruanxiaoyi/.local/include/python3.6m -Isrc -I/miniconda3/include/python3.6m -c src/utils.cc -o build/temp.macosx-10.7-x86_64-3.6/src/utils.o -stdlib=libc++ -DVERSION_INFO="0.8.22" -std=c++14 -fvisibility=hidden
        gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/miniconda3/include -arch x86_64 -I/miniconda3/include -arch x86_64 -I/miniconda3/include/python3.6m -I/Users/ruanxiaoyi/.local/include/python3.6m -Isrc -I/miniconda3/include/python3.6m -c src/model.cc -o build/temp.macosx-10.7-x86_64-3.6/src/model.o -stdlib=libc++ -DVERSION_INFO="0.8.22" -std=c++14 -fvisibility=hidden
        gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/miniconda3/include -arch x86_64 -I/miniconda3/include -arch x86_64 -I/miniconda3/include/python3.6m -I/Users/ruanxiaoyi/.local/include/python3.6m -Isrc -I/miniconda3/include/python3.6m -c src/loss.cc -o build/temp.macosx-10.7-x86_64-3.6/src/loss.o -stdlib=libc++ -DVERSION_INFO="0.8.22" -std=c++14 -fvisibility=hidden
        src/loss.cc:83:21: warning: comparison of integers of different signs: 'std::__1::vector<std::__1::pair<float, int>, std::__1::allocator<std::__1::pair<float, int> > >::size_type' (aka 'unsigned long') and 'int32_t' (aka 'int') [-Wsign-compare]
            if (heap.size() == k && std_log(output[i]) < heap.front().first) {
                ~~~~~~~~~~~ ^  ~
        src/loss.cc:88:21: warning: comparison of integers of different signs: 'std::__1::vector<std::__1::pair<float, int>, std::__1::allocator<std::__1::pair<float, int> > >::size_type' (aka 'unsigned long') and 'int32_t' (aka 'int') [-Wsign-compare]
            if (heap.size() > k) {
                ~~~~~~~~~~~ ^ ~
        src/loss.cc:257:25: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'std::__1::vector<int, std::__1::allocator<int> >::size_type' (aka 'unsigned long') [-Wsign-compare]
          for (int32_t i = 0; i < pathToRoot.size(); i++) {
                              ~ ^ ~~~~~~~~~~~~~~~~~
        src/loss.cc:282:19: warning: comparison of integers of different signs: 'std::__1::vector<std::__1::pair<float, int>, std::__1::allocator<std::__1::pair<float, int> > >::size_type' (aka 'unsigned long') and 'int32_t' (aka 'int') [-Wsign-compare]
          if (heap.size() == k && score < heap.front().first) {
              ~~~~~~~~~~~ ^  ~
        src/loss.cc:289:21: warning: comparison of integers of different signs: 'std::__1::vector<std::__1::pair<float, int>, std::__1::allocator<std::__1::pair<float, int> > >::size_type' (aka 'unsigned long') and 'int32_t' (aka 'int') [-Wsign-compare]
            if (heap.size() > k) {
                ~~~~~~~~~~~ ^ ~
        5 warnings generated.
        gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/miniconda3/include -arch x86_64 -I/miniconda3/include -arch x86_64 -I/miniconda3/include/python3.6m -I/Users/ruanxiaoyi/.local/include/python3.6m -Isrc -I/miniconda3/include/python3.6m -c src/productquantizer.cc -o build/temp.macosx-10.7-x86_64-3.6/src/productquantizer.o -stdlib=libc++ -DVERSION_INFO="0.8.22" -std=c++14 -fvisibility=hidden
        src/productquantizer.cc:246:22: warning: comparison of integers of different signs: 'int' and 'std::__1::vector<float, std::__1::allocator<float> >::size_type' (aka 'unsigned long') [-Wsign-compare]
          for (auto i = 0; i < centroids_.size(); i++) {
                           ~ ^ ~~~~~~~~~~~~~~~~~
        1 warning generated.
        gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/miniconda3/include -arch x86_64 -I/miniconda3/include -arch x86_64 -I/miniconda3/include/python3.6m -I/Users/ruanxiaoyi/.local/include/python3.6m -Isrc -I/miniconda3/include/python3.6m -c src/args.cc -o build/temp.macosx-10.7-x86_64-3.6/src/args.o -stdlib=libc++ -DVERSION_INFO="0.8.22" -std=c++14 -fvisibility=hidden
        src/args.cc:93:23: warning: comparison of integers of different signs: 'int' and 'std::__1::vector<std::__1::basic_string<char>, std::__1::allocator<std::__1::basic_string<char> > >::size_type' (aka 'unsigned long') [-Wsign-compare]
          for (int ai = 2; ai < args.size(); ai += 2) {
                           ~~ ^ ~~~~~~~~~~~
        1 warning generated.
        gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/miniconda3/include -arch x86_64 -I/miniconda3/include -arch x86_64 -I/miniconda3/include/python3.6m -I/Users/ruanxiaoyi/.local/include/python3.6m -Isrc -I/miniconda3/include/python3.6m -c src/quantmatrix.cc -o build/temp.macosx-10.7-x86_64-3.6/src/quantmatrix.o -stdlib=libc++ -DVERSION_INFO="0.8.22" -std=c++14 -fvisibility=hidden
        gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/miniconda3/include -arch x86_64 -I/miniconda3/include -arch x86_64 -I/miniconda3/include/python3.6m -I/Users/ruanxiaoyi/.local/include/python3.6m -Isrc -I/miniconda3/include/python3.6m -c src/matrix.cc -o build/temp.macosx-10.7-x86_64-3.6/src/matrix.o -stdlib=libc++ -DVERSION_INFO="0.8.22" -std=c++14 -fvisibility=hidden
        gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/miniconda3/include -arch x86_64 -I/miniconda3/include -arch x86_64 -I/miniconda3/include/python3.6m -I/Users/ruanxiaoyi/.local/include/python3.6m -Isrc -I/miniconda3/include/python3.6m -c src/meter.cc -o build/temp.macosx-10.7-x86_64-3.6/src/meter.o -stdlib=libc++ -DVERSION_INFO="0.8.22" -std=c++14 -fvisibility=hidden
        gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/miniconda3/include -arch x86_64 -I/miniconda3/include -arch x86_64 -I/miniconda3/include/python3.6m -I/Users/ruanxiaoyi/.local/include/python3.6m -Isrc -I/miniconda3/include/python3.6m -c src/vector.cc -o build/temp.macosx-10.7-x86_64-3.6/src/vector.o -stdlib=libc++ -DVERSION_INFO="0.8.22" -std=c++14 -fvisibility=hidden
        gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/miniconda3/include -arch x86_64 -I/miniconda3/include -arch x86_64 -I/miniconda3/include/python3.6m -I/Users/ruanxiaoyi/.local/include/python3.6m -Isrc -I/miniconda3/include/python3.6m -c src/densematrix.cc -o build/temp.macosx-10.7-x86_64-3.6/src/densematrix.o -stdlib=libc++ -DVERSION_INFO="0.8.22" -std=c++14 -fvisibility=hidden
        g++ -bundle -undefined dynamic_lookup -L/miniconda3/lib -arch x86_64 -L/miniconda3/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.7-x86_64-3.6/python/fastText/pybind/fasttext_pybind.o build/temp.macosx-10.7-x86_64-3.6/src/dictionary.o build/temp.macosx-10.7-x86_64-3.6/src/main.o build/temp.macosx-10.7-x86_64-3.6/src/fasttext.o build/temp.macosx-10.7-x86_64-3.6/src/utils.o build/temp.macosx-10.7-x86_64-3.6/src/model.o build/temp.macosx-10.7-x86_64-3.6/src/loss.o build/temp.macosx-10.7-x86_64-3.6/src/productquantizer.o build/temp.macosx-10.7-x86_64-3.6/src/args.o build/temp.macosx-10.7-x86_64-3.6/src/quantmatrix.o build/temp.macosx-10.7-x86_64-3.6/src/matrix.o build/temp.macosx-10.7-x86_64-3.6/src/meter.o build/temp.macosx-10.7-x86_64-3.6/src/vector.o build/temp.macosx-10.7-x86_64-3.6/src/densematrix.o -o build/lib.macosx-10.7-x86_64-3.6/fasttext_pybind.cpython-36m-darwin.so
        clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
        ld: library not found for -lstdc++
        clang: error: linker command failed with exit code 1 (use -v to see invocation)
        error: command 'g++' failed with exit status 1
    
        ----------------------------------------
    Command "/miniconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/tz/msp_r50s03s59q40s_gmhx600000gn/T/pip-req-build-i2z3pyel/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/tz/msp_r50s03s59q40s_gmhx600000gn/T/pip-record-yg0h6noh/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/tz/msp_r50s03s59q40s_gmhx600000gn/T/pip-req-build-i2z3pyel/
    

    And my environment is

    Apple LLVM version 10.0.0 (clang-1000.10.44.4)
    Target: x86_64-apple-darwin18.2.0
    Thread model: posix
    InstalledDir: /Library/Developer/CommandLineTools/usr/bin
     "/Library/Developer/CommandLineTools/usr/bin/clang" -cc1 -triple x86_64-apple-macosx10.14.0 -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -E -disable-free -disable-llvm-verifier -discard-value-names -main-file-name - -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -fno-strict-return -masm-verbose -munwind-tables -target-cpu penryn -dwarf-column-info -debugger-tuning=lldb -target-linker-version 409.12 -v -resource-dir /Library/Developer/CommandLineTools/usr/lib/clang/10.0.0 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk -I/usr/local/include -stdlib=libc++ -fdeprecated-macro -fdebug-compilation-dir /Users/ruanxiaoyi/Downloads/fastText-master -ferror-limit 19 -fmessage-length 204 -stack-protector 1 -fblocks -fencode-extended-block-signature -fobjc-runtime=macosx-10.14.0 -fcxx-exceptions -fexceptions -fmax-type-align=16 -fdiagnostics-show-option -fcolor-diagnostics -o - -x c++ -
    clang -cc1 version 10.0.0 (clang-1000.10.44.4) default target x86_64-apple-darwin18.2.0
    ignoring nonexistent directory "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/c++/v1"
    ignoring nonexistent directory "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/local/include"
    ignoring nonexistent directory "/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/Library/Frameworks"
    #include "..." search starts here:
    #include <...> search starts here:
     /usr/local/include
     /Library/Developer/CommandLineTools/usr/include/c++/v1
     /Library/Developer/CommandLineTools/usr/lib/clang/10.0.0/include
     /Library/Developer/CommandLineTools/usr/include
     /Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include
     /Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/System/Library/Frameworks (framework directory)
    

    Any suggestion for this problem?

    Python Build 
    opened by rxy1212 21
  • Any plan to support different weight for each class in loss function?

    Looking at the current code, it seems that the loss function is evaluated with the same weight for every class, which is fine for balanced data. For highly imbalanced data, is there any plan to support a different weight for each class in the loss function? I am thinking of something like this on the command line:

    fasttext -input XXX -output XXX -weight_class1 10 -weight_class2 1 -weight_class3 3 
    

    or simply

    fasttext -weight_balanced 
    

    if the weight is inversely proportional to the number of instances in that class?
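
The "balanced" weighting proposed above can be sketched in a few lines of Python. This is only an illustration of the inverse-frequency idea: the `-weight_class1` and `-weight_balanced` flags in this issue are the poster's proposal, not existing fastText options, and the helper name `balanced_class_weights` is hypothetical. The sketch assumes training lines that use fastText's default `__label__` prefix.

```python
from collections import Counter


def balanced_class_weights(lines, label_prefix="__label__"):
    """Return a weight per label that is inversely proportional to its frequency.

    Hypothetical helper: weight = total_labels / (n_classes * label_count),
    so a perfectly balanced dataset gives every class a weight of 1.0.
    """
    counts = Counter(
        token
        for line in lines
        for token in line.split()
        if token.startswith(label_prefix)
    )
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


# Tiny made-up dataset: one "spam" example and three "ham" examples.
example = [
    "__label__spam buy now",
    "__label__ham see you at lunch",
    "__label__ham meeting moved to 3pm",
    "__label__ham thanks",
]
weights = balanced_class_weights(example)
print(weights)
```

In this toy dataset the rare class ends up with the larger weight, which is the behaviour the question asks for.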

    opened by kuangchen 18
  • Interpreting Multilabel output

    So I loaded multilabel values for my targets, but when I use the predict_prob function, the output looks more like a conditional probability than a multilabel output.

    I was assuming that each label would get its own value between 0 and 1, but instead I am seeing that the values for all labels add up to 1.

    Can someone help me understand this output?
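
The difference described above can be reproduced with plain Python. The sketch below is not fastText code; it only contrasts a softmax over raw scores, whose outputs always sum to 1, with independent sigmoids, where each label gets its own value between 0 and 1. The scores are made up for illustration.

```python
import math


def softmax(scores):
    """Normalize scores jointly: the outputs compete and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(scores):
    """Squash each score independently into (0, 1); no constraint on the sum."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


scores = [2.0, 1.0, -1.0]  # hypothetical raw scores for three labels
softmax_probs = softmax(scores)
sigmoid_probs = sigmoid(scores)

print("softmax:", softmax_probs, "sum =", sum(softmax_probs))
print("sigmoid:", sigmoid_probs, "sum =", sum(sigmoid_probs))
```

Output that adds up to 1 across labels is what a softmax-style loss produces; per-label values that do not need to sum to 1 require a loss that treats each label independently.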

    opened by iymitchell 17
  • The memory error when loading the pre-trained model

    There is a memory error when I try to load the pre-trained model, e.g., model = fasttext.load_model('D:/download/wiki.en/wiki.en.bin').

    The bin file is almost 9 GB and my machine has only 4 GB of memory, so I am looking for a memory-friendly way to load the model. Can anyone give me a clue?
    Thanks a lot!
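
One possible workaround, sketched here under stated assumptions, is to skip the binary model and read only the first few vectors from the text-format `.vec` file, whose first line holds the vocabulary size and dimension and whose remaining lines each hold a word followed by its values. This assumes a `.vec` file is available next to the `.bin` file; the helper name `load_top_vectors` is hypothetical, and vectors loaded this way carry no subword information, so out-of-vocabulary words cannot be handled.

```python
import io


def load_top_vectors(lines, limit):
    """Read at most `limit` word vectors from an iterable of .vec lines."""
    it = iter(lines)
    header = next(it).split()  # "<vocab_size> <dim>"
    dim = int(header[1])
    vectors = {}
    for line in it:
        if len(vectors) >= limit:
            break  # stop early so the rest of the file is never held in memory
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[-dim:]]
    return vectors


# In-memory stand-in for a real file such as open("wiki.en.vec", encoding="utf-8").
sample = io.StringIO("3 2\nthe 0.1 0.2\nof 0.3 0.4\ncat 0.5 0.6\n")
top = load_top_vectors(sample, limit=2)
print(top)
```

Because the function iterates line by line and stops at `limit`, memory use depends on the number of vectors kept rather than on the size of the file.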

    opened by zhouchichun 16
  • Quantize error

    I have already trained model_1.bin with the supervised option, and when I try to quantize that model, I get the following error:

    /opt/fastText/fasttext quantize -input data.txt -output models/model_1 -verbose 3 -wordNgrams 3 -bucket 1000000 -minn 3 -maxn 6 -lr 0.010 -dim 100 -loss ns -thread 8 -epoch 10 -qnorm -retrain -cutoff 100000
    
    fasttext: src/vector.cc:71: void fasttext::Vector::addRow(const fasttext::Matrix&, int64_t): Assertion `i < A.m_' failed.
    Aborted (core dumped)
    

    Edit: If I don't use -cutoff, the command runs without any error.

    opened by spate141 16
  • Loss - OVA model - Not predicting sigmoid output in Ubuntu 16.04

    Install Log:

    c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/args.cc
    c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/matrix.cc
    c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/dictionary.cc
    c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/loss.cc
    c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/productquantizer.cc
    c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/densematrix.cc
    c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/quantmatrix.cc
    c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/vector.cc
    c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/model.cc
    c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/utils.cc
    c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/meter.cc
    c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/fasttext.cc
    src/fasttext.cc: In member function ‘void fasttext::FastText::quantize(const fasttext::Args&)’:
    src/fasttext.cc:323:16: warning: ‘std::vector fasttext::FastText::selectEmbeddings(int32_t) const’ is deprecated: selectEmbeddings is being deprecated. [-Wdeprecated-declarations]
    auto idx = selectEmbeddings(qargs.cutoff);
    ^
    src/fasttext.cc:293:22: note: declared here
    std::vector<int32_t> FastText::selectEmbeddings(int32_t cutoff) const {
    ^
    src/fasttext.cc:323:45: warning: ‘std::vector fasttext::FastText::selectEmbeddings(int32_t) const’ is deprecated: selectEmbeddings is being deprecated. [-Wdeprecated-declarations]
    auto idx = selectEmbeddings(qargs.cutoff);
    ^
    src/fasttext.cc:293:22: note: declared here
    std::vector<int32_t> FastText::selectEmbeddings(int32_t cutoff) const {
    ^
    src/fasttext.cc: In member function ‘void fasttext::FastText::lazyComputeWordVectors()’:
    src/fasttext.cc:551:5: warning: ‘void fasttext::FastText::precomputeWordVectors(fasttext::DenseMatrix&)’ is deprecated: precomputeWordVectors is being deprecated. [-Wdeprecated-declarations]
    precomputeWordVectors(*wordVectors_);
    ^
    src/fasttext.cc:534:6: note: declared here
    void FastText::precomputeWordVectors(DenseMatrix& wordVectors) {
    ^
    src/fasttext.cc:551:40: warning: ‘void fasttext::FastText::precomputeWordVectors(fasttext::DenseMatrix&)’ is deprecated: precomputeWordVectors is being deprecated. [-Wdeprecated-declarations]
    precomputeWordVectors(*wordVectors_);
    ^
    src/fasttext.cc:534:6: note: declared here
    void FastText::precomputeWordVectors(DenseMatrix& wordVectors) {
    ^
    c++ -pthread -std=c++0x -march=native -O3 -funroll-loops args.o matrix.o dictionary.o loss.o productquantizer.o densematrix.o quantmatrix.o vector.o model.o utils.o meter.o fasttext.o src/main.cc -o fasttext

    The output is not sigmoid. It is still the same as with softmax. Args: dim 100 ws 5 epoch 1 minCount 1 neg 5 wordNgrams 3 loss one-vs-all model sup bucket 1000000 minn 3 maxn 3 lrUpdateRate 100 t 0.0001

    bug 
    opened by giriannamalai 15
  • Binary model that was trained on Common crawl

    Hello! I enjoy using your library and pretrained vectors. I see that for the vectors trained on Wikipedia you provide both a binary model and pretrained vectors. However, for the vectors trained on Common Crawl, you only provide pretrained vectors. Could you publish a binary model for them as well?

    Thanks, Alexander.

    opened by MrBoor 15
  • Running on PowerPC64LE (ppc64le)

    I am able to compile the stable (0.1.0) version of the code on a powerpc64le machine (IBM Minsky) without any errors or warnings. However, when I run it on any dataset (e.g. stackexchange cooking) using just the defaults, ./fasttext supervised -input ... -output ..., the program hangs after displaying Reading ... words. I tried make debug as well, with the same result. Details: make 4.1, Ubuntu 16.04.3 LTS. Any ideas?

    opened by ironv 15
  • pip install imutils not working

    WARNING: Ignoring invalid distribution - (c:\python310\lib\site-packages)
    Installing collected packages: imutils
    DEPRECATION: imutils is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559
    Running setup.py install for imutils ... error
    error: subprocess-exited-with-error

    × Running setup.py install for imutils did not run successfully. │ exit code: 1 ╰─> [118 lines of output] C:\Python310\lib\site-packages\setuptools\dist.py:717: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead warnings.warn( running install running build running build_py creating build creating build\lib creating build\lib\imutils copying imutils\contours.py -> build\lib\imutils copying imutils\convenience.py -> build\lib\imutils copying imutils\encodings.py -> build\lib\imutils copying imutils\meta.py -> build\lib\imutils copying imutils\object_detection.py -> build\lib\imutils copying imutils\paths.py -> build\lib\imutils copying imutils\perspective.py -> build\lib\imutils copying imutils\text.py -> build\lib\imutils copying imutils_init_.py -> build\lib\imutils creating build\lib\imutils\video copying imutils\video\count_frames.py -> build\lib\imutils\video copying imutils\video\filevideostream.py -> build\lib\imutils\video copying imutils\video\fps.py -> build\lib\imutils\video copying imutils\video\pivideostream.py -> build\lib\imutils\video copying imutils\video\videostream.py -> build\lib\imutils\video copying imutils\video\webcamvideostream.py -> build\lib\imutils\video copying imutils\video_init_.py -> build\lib\imutils\video creating build\lib\imutils\io copying imutils\io\tempfile.py -> build\lib\imutils\io copying imutils\io_init_.py -> build\lib\imutils\io creating build\lib\imutils\feature copying imutils\feature\dense.py -> build\lib\imutils\feature copying imutils\feature\factories.py -> build\lib\imutils\feature copying imutils\feature\gftt.py -> build\lib\imutils\feature copying imutils\feature\harris.py -> build\lib\imutils\feature copying imutils\feature\helpers.py -> build\lib\imutils\feature copying imutils\feature\rootsift.py -> build\lib\imutils\feature copying imutils\feature_init_.py -> build\lib\imutils\feature creating 
build\lib\imutils\face_utils copying imutils\face_utils\facealigner.py -> build\lib\imutils\face_utils copying imutils\face_utils\helpers.py -> build\lib\imutils\face_utils copying imutils\face_utils_init_.py -> build\lib\imutils\face_utils running build_scripts creating build\scripts-3.10 copying and adjusting bin\range-detector -> build\scripts-3.10 running install_lib creating C:\Python310\Lib\site-packages\imutils copying build\lib\imutils\contours.py -> C:\Python310\Lib\site-packages\imutils copying build\lib\imutils\convenience.py -> C:\Python310\Lib\site-packages\imutils copying build\lib\imutils\encodings.py -> C:\Python310\Lib\site-packages\imutils creating C:\Python310\Lib\site-packages\imutils\face_utils copying build\lib\imutils\face_utils\facealigner.py -> C:\Python310\Lib\site-packages\imutils\face_utils copying build\lib\imutils\face_utils\helpers.py -> C:\Python310\Lib\site-packages\imutils\face_utils copying build\lib\imutils\face_utils_init_.py -> C:\Python310\Lib\site-packages\imutils\face_utils creating C:\Python310\Lib\site-packages\imutils\feature copying build\lib\imutils\feature\dense.py -> C:\Python310\Lib\site-packages\imutils\feature copying build\lib\imutils\feature\factories.py -> C:\Python310\Lib\site-packages\imutils\feature copying build\lib\imutils\feature\gftt.py -> C:\Python310\Lib\site-packages\imutils\feature copying build\lib\imutils\feature\harris.py -> C:\Python310\Lib\site-packages\imutils\feature copying build\lib\imutils\feature\helpers.py -> C:\Python310\Lib\site-packages\imutils\feature copying build\lib\imutils\feature\rootsift.py -> C:\Python310\Lib\site-packages\imutils\feature copying build\lib\imutils\feature_init_.py -> C:\Python310\Lib\site-packages\imutils\feature creating C:\Python310\Lib\site-packages\imutils\io copying build\lib\imutils\io\tempfile.py -> C:\Python310\Lib\site-packages\imutils\io copying build\lib\imutils\io_init_.py -> C:\Python310\Lib\site-packages\imutils\io copying build\lib\imutils\meta.py 
-> C:\Python310\Lib\site-packages\imutils copying build\lib\imutils\object_detection.py -> C:\Python310\Lib\site-packages\imutils copying build\lib\imutils\paths.py -> C:\Python310\Lib\site-packages\imutils copying build\lib\imutils\perspective.py -> C:\Python310\Lib\site-packages\imutils copying build\lib\imutils\text.py -> C:\Python310\Lib\site-packages\imutils creating C:\Python310\Lib\site-packages\imutils\video copying build\lib\imutils\video\count_frames.py -> C:\Python310\Lib\site-packages\imutils\video copying build\lib\imutils\video\filevideostream.py -> C:\Python310\Lib\site-packages\imutils\video copying build\lib\imutils\video\fps.py -> C:\Python310\Lib\site-packages\imutils\video copying build\lib\imutils\video\pivideostream.py -> C:\Python310\Lib\site-packages\imutils\video copying build\lib\imutils\video\videostream.py -> C:\Python310\Lib\site-packages\imutils\video copying build\lib\imutils\video\webcamvideostream.py -> C:\Python310\Lib\site-packages\imutils\video copying build\lib\imutils\video_init_.py -> C:\Python310\Lib\site-packages\imutils\video copying build\lib\imutils_init_.py -> C:\Python310\Lib\site-packages\imutils byte-compiling C:\Python310\Lib\site-packages\imutils\contours.py to contours.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\convenience.py to convenience.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\encodings.py to encodings.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\face_utils\facealigner.py to facealigner.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\face_utils\helpers.py to helpers.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\face_utils_init_.py to init.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\feature\dense.py to dense.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\feature\factories.py to factories.cpython-310.pyc byte-compiling 
C:\Python310\Lib\site-packages\imutils\feature\gftt.py to gftt.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\feature\harris.py to harris.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\feature\helpers.py to helpers.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\feature\rootsift.py to rootsift.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\feature_init_.py to init.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\io\tempfile.py to tempfile.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\io_init_.py to init.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\meta.py to meta.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\object_detection.py to object_detection.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\paths.py to paths.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\perspective.py to perspective.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\text.py to text.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\video\count_frames.py to count_frames.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\video\filevideostream.py to filevideostream.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\video\fps.py to fps.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\video\pivideostream.py to pivideostream.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\video\videostream.py to videostream.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\video\webcamvideostream.py to webcamvideostream.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils\video_init_.py to init.cpython-310.pyc byte-compiling C:\Python310\Lib\site-packages\imutils_init_.py to init.cpython-310.pyc running install_egg_info running egg_info 
creating imutils.egg-info writing imutils.egg-info\PKG-INFO writing dependency_links to imutils.egg-info\dependency_links.txt writing top-level names to imutils.egg-info\top_level.txt writing manifest file 'imutils.egg-info\SOURCES.txt' reading manifest file 'imutils.egg-info\SOURCES.txt' writing manifest file 'imutils.egg-info\SOURCES.txt' Copying imutils.egg-info to C:\Python310\Lib\site-packages\imutils-0.5.4-py3.10.egg-info running install_scripts copying build\scripts-3.10\range-detector -> C:\Python310\Scripts error: could not create 'C:\Python310\Scripts\range-detector': Permission denied [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
    error: legacy-install-failure

    × Encountered error while trying to install package.
    ╰─> imutils

    note: This is an issue with the package mentioned above, not pip.
    hint: See above for output from the failure.
    WARNING: Ignoring invalid distribution -p (c:\python310\lib\site-packages)
    WARNING: Ignoring invalid distribution -1p (c:\python310\lib\site-packages)
    WARNING: Ignoring invalid distribution -p (c:\python310\lib\site-packages)
    WARNING: Ignoring invalid distribution -0p (c:\python310\lib\site-packages)
    WARNING: Ignoring invalid distribution -ip (c:\python310\lib\site-packages)
    WARNING: Ignoring invalid distribution - (c:\python310\lib\site-packages)

    opened by Ritika-github 0
  • What's the status of this project?

    The last release was in 2020-04. I see a lot of unsolved installation issues, and I miss pre-built wheels on https://pypi.org/project/fasttext/#files

    What is the future of this project, or is it just dead?

    opened by return42 1
  • Dependency errors

    Hi,

    We recently conducted a study to detect build dependency errors, focusing on missing dependencies and redundant dependencies. A missing dependency (MS) is a dependency that is not declared in the build script, and a redundant dependency (RD) is a dependency that is declared in the build script but not actually used. We have detected the following dependency errors in your public projects. Could you please help us check them? The data format is dependency --- target.

    MS
    0['/home/lv/WorkSpace/vmake_experiment/fastText-master/src/densematrix.h---fasttext']
    1['/home/lv/WorkSpace/vmake_experiment/fastText-master/src/vector.h---fasttext']
    2['/home/lv/WorkSpace/vmake_experiment/fastText-master/src/model.h---fasttext']
    3['/home/lv/WorkSpace/vmake_experiment/fastText-master/src/args.h---fasttext']
    4['/home/lv/WorkSpace/vmake_experiment/fastText-master/src/meter.h---fasttext']
    5['/home/lv/WorkSpace/vmake_experiment/fastText-master/src/fasttext.h---fasttext']
    6['/home/lv/WorkSpace/vmake_experiment/fastText-master/src/real.h---fasttext']
    7['/home/lv/WorkSpace/vmake_experiment/fastText-master/src/main.cc---fasttext']
    8['/home/lv/WorkSpace/vmake_experiment/fastText-master/src/matrix.h---fasttext']
    9['/home/lv/WorkSpace/vmake_experiment/fastText-master/src/utils.h---fasttext']
    10['/home/lv/WorkSpace/vmake_experiment/fastText-master/src/dictionary.h---fasttext']

    RD
    0['src/utils.h---productquantizer.o']
    1['src/utils.h---quantmatrix.o']
    2['src/fasttext.cc---fasttext']
    3['src/utils.h---vector.o']
    4['src/args.h---model.o']

    opened by Meiye-lj 0
  • Program running results are abnormal

    anaconda3/bin/python3.8

    import fasttext.util
    ft = fasttext.load_model('cc.zh.300.bin')
    sentence_w1 = ft.get_sentence_vector('色诫')
    print(sentence_w1)

    [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

    opened by Chu-J 0
  • Language names of Languages supported by Fasttext

    I am trying to find out the names of languages supported by Fasttext's LID tool, given these language codes listed here:

    af als am an ar arz as ast av az azb ba bar bcl be bg bh bn bo bpy br bs bxr ca cbk ce ceb ckb co cs cv cy da de diq dsb dty dv el eml en eo es et eu fa fi fr frr fy ga gd gl gn gom gu gv he hi hif hr hsb ht hu hy ia id ie ilo io is it ja jbo jv ka kk km kn ko krc ku kv kw ky la lb lez li lmo lo lrc lt lv mai mg mhr min mk ml mn mr mrj ms mt mwl my myv mzn nah nap nds ne new nl nn no oc or os pa pam pfl pl pms pnb ps pt qu rm ro ru rue sa sah sc scn sco sd sh si sk sl so sq sr su sv sw ta te tg th tk tl tr tt tyv ug uk ur uz vec vep vi vls vo wa war wuu xal xmf yi yo yue zh
    

    I tried to map each code to a language, but the codes seem non-standard: they appear to mix ISO 639-1 and ISO 639-3. Does anyone have a list of language names for these codes, or know how to find them?
    Wikipedia's list does not cover all of them either, so manual mapping did not help.
    Thanks!
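
As a starting point for such a mapping, the codes can be sorted into those already resolved and those that still need a lookup. The sketch below uses a handful of codes copied from the list above and a small, hand-written, deliberately partial name table; `KNOWN_NAMES` and `describe` are hypothetical names, and the table would have to be extended from an authoritative registry.

```python
# A few codes copied from the list in this issue.
codes = "af als am an ar arz en yue zh".split()

# Hypothetical, partial lookup table (well-known two-letter codes only).
KNOWN_NAMES = {
    "af": "Afrikaans",
    "am": "Amharic",
    "ar": "Arabic",
    "en": "English",
    "zh": "Chinese",
}


def describe(code):
    """Return (code, kind, name); kind is based on code length only."""
    kind = "two-letter" if len(code) == 2 else "three-letter"
    return code, kind, KNOWN_NAMES.get(code, "unknown")


rows = [describe(code) for code in codes]
unresolved = [code for code, _, name in rows if name == "unknown"]

for row in rows:
    print(row)
print("still to map:", unresolved)
```

Keeping the unresolved codes in a separate list makes it easy to see which entries need a manual decision, for example codes whose meaning differs between registries.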

    opened by AetherPrior 1
Releases(v0.9.2)
  • v0.9.2(Apr 28, 2020)

    We are happy to announce the release of version 0.9.2.

    WebAssembly

    We are excited to release fastText bindings for WebAssembly. Classification tasks are widely used in web applications and we believe giving access to the complete fastText API from the browser will notably help our community to build nice tools. See our documentation to learn more.

    Autotune: automatic hyperparameter optimization

    Finding the best hyperparameters is crucial for building efficient models, but searching for them manually is difficult. This release includes the autotune feature, which automatically finds the best hyperparameters for your dataset. You can find more information on how to use it here.
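
A minimal sketch of how autotune can be invoked from the Python module, assuming the `fasttext` package is installed and that training and validation files exist in fastText's supervised format. The helper names `autotune_kwargs` and `train_with_autotune` and the file paths are hypothetical; `autotuneValidationFile` and `autotuneDuration` are keyword arguments of `fasttext.train_supervised`.

```python
def autotune_kwargs(train_path, valid_path, duration_s=300):
    """Build the keyword arguments for an autotuned supervised training run."""
    return {
        "input": train_path,                   # labelled training data
        "autotuneValidationFile": valid_path,  # data used to score each trial
        "autotuneDuration": duration_s,        # search budget in seconds
    }


def train_with_autotune(train_path, valid_path, duration_s=300):
    """Train a classifier while letting fastText search the hyperparameters."""
    import fasttext  # imported lazily; requires the fasttext package

    return fasttext.train_supervised(
        **autotune_kwargs(train_path, valid_path, duration_s)
    )


# Hypothetical file names; replace them with your own data before training.
kwargs = autotune_kwargs("train.txt", "valid.txt")
print(kwargs)
```

The search budget is the main knob: a longer duration lets the search try more configurations at the cost of training time.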

    Python

    fastText loves Python. In this release, we have:

    • several bug fixes for prediction functions
    • nearest neighbors and analogies for Python
    • a memory leak fix
    • website tutorials with Python examples

    The autotune feature is fully integrated with our Python API. This allows us to have a more stable autotune optimization loop from Python and to synchronize the best hyper-parameters with the _FastText model object.

    Pre-trained models tool

    We release two helper scripts:

    They can also be used directly from our Python API.

    More metrics

    When you test a trained model, you can now have more detailed results for the precision/recall metrics of a specific label or all labels.
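
To make the per-label metrics concrete, here is a small sketch of how precision and recall can be computed for each label from gold and predicted label sets. It illustrates the definitions only and is not fastText's implementation; the function name and the toy data are made up.

```python
from collections import Counter


def per_label_precision_recall(gold, predicted):
    """Return {label: (precision, recall)} from parallel lists of label sets."""
    true_pos, n_pred, n_gold = Counter(), Counter(), Counter()
    for gold_labels, pred_labels in zip(gold, predicted):
        for label in pred_labels:
            n_pred[label] += 1
            if label in gold_labels:
                true_pos[label] += 1
        for label in gold_labels:
            n_gold[label] += 1
    labels = set(n_pred) | set(n_gold)
    return {
        label: (
            # precision: correct predictions / all predictions of this label
            true_pos[label] / n_pred[label] if n_pred[label] else float("nan"),
            # recall: correct predictions / all gold occurrences of this label
            true_pos[label] / n_gold[label] if n_gold[label] else float("nan"),
        )
        for label in labels
    }


gold = [{"a"}, {"a", "b"}, {"b"}]
predicted = [{"a"}, {"a"}, {"a"}]
metrics = per_label_precision_recall(gold, predicted)
print(metrics)
```

A label that is never predicted has an undefined precision, which the sketch reports as NaN rather than as zero.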

    Paper source code

    This release contains the source code of the unsupervised multilingual alignment paper.

    Community feedback and contributions

    We want to thank our community for giving us feedback on Facebook and on GitHub.

    Source code(tar.gz)
    Source code(zip)
  • v0.9.1(Jul 4, 2019)

    We are happy to announce the release of version 0.9.1.

    New release of python module

    The main goal of this release is to merge two existing python modules: the official fastText module which was available on our github repository and the unofficial fasttext module which was available on pypi.org.

    You can find an overview of the new API here, and more insight in our blog post.

    Refactoring

    This version includes a massive rewrite of internal classes. Training and testing are now split into three different classes: Model, which takes care of the computational aspect; Loss, which handles the loss and applies gradients to the output matrix; and State, which is responsible for holding the model's state inside each thread.

    This makes the code more straightforward to read and also gives a smaller memory footprint, because the data needed for loss computation is now held only once, whereas before there was one copy for each thread.

    Misc

    • Compilation issues fix for recent versions of Mac OS X.
    • Better unicode handling :
      • on_unicode_error argument that helps to handle unicode issues one can face with some datasets
      • bug fix related to different behaviour of pybind11's py::str class between python2 and python3
    • script for unsupervised alignment
    • public file hosting changed from aws to fbaipublicfiles
    • we added a Code of Conduct file.

    Thank you !

    As always, we want to thank you for your help and your valuable feedback, which help make this project better.

    Source code(tar.gz)
    Source code(zip)
  • v0.2.0(Dec 19, 2018)

    We are happy to announce the change of the license from BSD+patents to MIT and the release of fastText 0.2.0.

    The main purpose of this release is to set a beta C++ API of the FastText class. The class now behaves as a computational library: we moved the display and some usage error handling outside of it (mainly to main.cc and fasttext_pybind.cc). It is still compatible with older versions of the class, but some methods are now marked as deprecated and will probably be removed in the next release.

    In this respect, we also introduce the official support for python. The python binding of fastText is a client of the FastText class.

    Here is a short summary of the 104 commits since 0.1.0 :

    New :

    • Introduction of the “OneVsAll” loss function for multi-label classification, which corresponds to the sum of binary cross-entropy computed independently for each label. This new loss can be used with the -loss ova or -loss one-vs-all command line option ( 8850c51b972ed68642a15c17fbcd4dd58766291d ).
    • Computation of the precision and recall metrics for each label ( be1e597cb67c069ba9940ff241d9aad38ccd37da ).
    • Removed printing functions from FastText class ( 256032b87522cdebc4850c99b204b81b3255cb2a ).
    • Better default for number of threads ( 501b9b1e4543fd2de55e4a621a9924ce7d2b5b17 ).
    • Python support ( f10ec1faea1605d40fdb79fe472cc2204f3d584c ).
    • More tests for circleci/python ( eb9703a4a7ed0f7559d6f341cc8e5d166d5e4d88, 97fcde80ea107ca52d3d778a083564619175039c, 1de0624bfaff02d91fd265f331c07a4a0a7bb857 ).
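
The OneVsAll loss described in the list above, the sum of binary cross-entropy terms computed independently for each label, can be written out in a few lines. This is a sketch of the formula only, not the C++ implementation; the function name and the example scores are made up.

```python
import math


def one_vs_all_loss(scores, targets):
    """Sum of binary cross-entropy terms, one per label.

    scores:  raw (pre-sigmoid) score for each label
    targets: 1 if the label applies to the example, else 0
    """
    loss = 0.0
    for score, target in zip(scores, targets):
        p = 1.0 / (1.0 + math.exp(-score))  # independent sigmoid per label
        loss += -math.log(p) if target else -math.log(1.0 - p)
    return loss


# Two labels with a score of 0.0 each: every term contributes log(2).
example_loss = one_vs_all_loss([0.0, 0.0], [1, 0])
print(example_loss)
```

Because each label contributes its own term, several labels can be predicted with high probability at once, which is what multi-label classification needs.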

    Bug fixes :

    • Normalize buffer vector in analogy queries.
    • Typo fixes and clarifications on website.
    • Improvements on python install issues : setup.py OS X compiler flags, pybind11 include.
    • Fix: getSubwords for EOS.
    • Fix: ETA time.
    • Fix: division by 0 in word analogy evaluation.
    • Fix for the infinite loop on ARM cpu.

    Operations :

    • We released more pre-trained vectors (92bc7d230959e2a94125fbe7d3b05257effb1111, 5bf8b4c615b6308d76ad39a5a50fa6c4174113ea ).

    Worth noting :

    • We added CircleCI build badges to README.md.
    • We updated the code style to comply with the Facebook C++ style.
    • We added a coverage option to the Makefile and setup.py so that builds can be instrumented for measuring code coverage.

    Thank you fastText community!

    We want to thank you all for being a part of this community and sharing your passion with us. Some of these improvements would not have been possible without your help.

Owner
Facebook Research