PECOS - Predictions for Enormous and Correlated Output Spaces

Overview

PECOS - Predictions for Enormous and Correlated Output Spaces

PECOS is a versatile and modular machine learning (ML) framework for fast learning and inference on problems with large output spaces, such as extreme multi-label ranking (XMR) and large-scale retrieval. PECOS' design is intentionally agnostic to the specific nature of the inputs and outputs as it is envisioned to be a general-purpose framework for multiple distinct applications.

Given an input, PECOS identifies a small set (10-100) of relevant outputs from amongst an extremely large (~100MM) candidate set and ranks these outputs in terms of relevance.

Features

Extreme Multi-label Ranking and Classification

  • X-Linear (pecos.xmc.xlinear): recursive linear models learning to traverse an input from the root of a hierarchical label tree to a few leaf node clusters, and return top-k relevant labels within the clusters as predictions. See more details in the PECOS paper (Yu et al., 2020).

    • fast real-time inference in C++
    • can handle 100MM output space
  • XR-Transformer (pecos.xmc.xtransformer): Transformer-based XMC framework that fine-tunes pre-trained transformers recursively on multi-resolution objectives. It can be used to generate top-k relevant labels for a given instance or simply as a fine-tuning engine for task-aware embeddings. See technical details in the XR-Transformer paper (Zhang et al., 2021).

    • easy to extend with many pre-trained Transformer models from huggingface transformers.
    • establishes the state of the art on public XMC benchmarks.
  • ANN Search with HNSW (pecos.ann.hnsw): a PECOS Approximate Nearest Neighbor (ANN) search module that implements the Hierarchical Navigable Small World Graphs (HNSW) algorithm (Malkov et al., TPAMI 2018).

    • Supports both sparse and dense input features
    • SIMD optimization for both dense/sparse distance computation
    • Supports thread-safe graph construction in parallel on multi-core shared memory machines
    • Supports thread-safe Searchers to do inference in parallel, which reduces inference overhead
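
To make the ANN idea concrete, here is a minimal sketch of the exact (brute-force) inner-product search that an HNSW index approximates. It uses only NumPy and does not call pecos.ann.hnsw; the function name exact_top_k and the toy vectors are assumptions made for illustration.

```python
import numpy as np


def exact_top_k(X_index, X_query, k):
    """Return the ids and scores of the k highest inner-product matches per query.

    This is the exhaustive baseline: every query is scored against every
    indexed vector, which an ANN index such as HNSW avoids doing.
    """
    scores = X_query @ X_index.T
    top_ids = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    top_scores = np.take_along_axis(scores, top_ids, axis=1)
    return top_ids, top_scores


# Hypothetical toy data: three indexed vectors and one query.
X_index = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_query = np.array([[1.0, 0.5]])
ids, scores = exact_top_k(X_index, X_query, k=2)
print(ids, scores)
```

The cost of this baseline grows linearly with the number of indexed vectors, which is the motivation for a graph-based index when the candidate set is very large.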

Requirements and Installation

  • Python (>=3.6)
  • Pip (>=19.3)

See other dependencies in setup.py. You should install PECOS in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide.

Supported Platforms

  • Ubuntu 18.04 and 20.04
  • Amazon Linux 2

Installation from Wheel

PECOS can be installed using pip as follows:

python3 -m pip install libpecos

Installation from Source

Prerequisite builder tools

  • For Ubuntu (18.04, 20.04):
sudo apt-get update && sudo apt-get install -y build-essential git python3 python3-distutils python3-venv
  • For Amazon Linux 2 Image:
sudo yum -y install python3 python3-devel python3-distutils python3-venv && sudo yum -y groupinstall 'Development Tools'

One needs to install at least one BLAS library to compile PECOS, e.g. OpenBLAS:

  • For Ubuntu (18.04, 20.04):
sudo apt-get install -y libopenblas-dev
  • For Amazon Linux 2 Image and AMI:
sudo amazon-linux-extras install epel -y
sudo yum install openblas-devel -y

Install and develop locally

git clone https://github.com/amzn/pecos
cd pecos
python3 -m pip install --editable ./

Quick Tour

For a glimpse of how PECOS works, here is a quick tour of the PECOS API for the XMR problem.

Toy Example

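The eXtreme Multi-label Ranking (XMR) problem is defined by two matrices:

  • instance-to-feature matrix X, of shape N by D in SciPy CSR format
  • instance-to-label matrix Y, of shape N by L in SciPy CSR format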

Some toy data matrices are available in the tst-data folder.
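
As a minimal sketch of the expected input format, the snippet below builds tiny X and Y matrices with SciPy. The shapes and values are made up for illustration and are not the contents of the tst-data folder.

```python
import numpy as np
import scipy.sparse as smat

# Hypothetical toy data: N=4 instances, D=5 features, L=3 labels.
# X holds real-valued features, one row per instance.
X = smat.csr_matrix(
    np.array(
        [
            [0.5, 0.0, 0.0, 1.2, 0.0],
            [0.0, 0.3, 0.0, 0.0, 0.0],
            [0.0, 0.0, 2.0, 0.0, 0.7],
            [1.0, 0.0, 0.0, 0.0, 0.0],
        ],
        dtype=np.float32,
    )
)

# Y marks the relevant labels of each instance; instance 0 has two labels.
rows = np.array([0, 0, 1, 2, 3])
cols = np.array([0, 2, 1, 2, 0])
vals = np.ones(len(rows), dtype=np.float32)
Y = smat.csr_matrix((vals, (rows, cols)), shape=(4, 3))

print(X.shape, Y.shape)
```

Both matrices share the same number of rows, since row i of X and row i of Y describe the same instance.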

PECOS constructs a hierarchical label tree and learns linear models recursively (e.g., XR-Linear):

>>> from pecos.xmc.xlinear.model import XLinearModel
>>> from pecos.xmc import Indexer, LabelEmbeddingFactory

# Build a hierarchical label tree and train an XR-Linear model
>>> label_feat = LabelEmbeddingFactory.create(Y, X)
>>> cluster_chain = Indexer.gen(label_feat)
>>> model = XLinearModel.train(X, Y, C=cluster_chain)
>>> model.save("./save-models")

After training the model, run prediction and evaluation:

>>> from pecos.utils import smat_util
>>> Yt_pred = model.predict(Xt)
# print precision and recall at k=10
>>> print(smat_util.Metrics.generate(Yt, Yt_pred))
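
To clarify what precision and recall at k measure, here is a small self-contained sketch that computes them from two sparse matrices. It is not the implementation behind smat_util.Metrics; the function name precision_recall_at_k and the toy matrices are assumptions for illustration.

```python
import numpy as np
import scipy.sparse as smat


def precision_recall_at_k(Y_true, Y_pred, k):
    """Average precision@k and recall@k over the rows of two sparse matrices.

    Y_true holds binary relevance and Y_pred holds prediction scores.
    Rows without any true label are skipped when averaging recall.
    """
    Y_true = smat.csr_matrix(Y_true)
    Y_pred = smat.csr_matrix(Y_pred)
    precisions, recalls = [], []
    for i in range(Y_true.shape[0]):
        true_labels = set(Y_true[i].indices.tolist())
        row = Y_pred[i]
        # Pick the k highest-scoring predicted labels of this row.
        order = np.argsort(-row.data, kind="stable")[:k]
        top_k = row.indices[order].tolist()
        hits = sum(1 for label in top_k if label in true_labels)
        precisions.append(hits / k)
        if true_labels:
            recalls.append(hits / len(true_labels))
    return float(np.mean(precisions)), float(np.mean(recalls))


# Hypothetical ground truth and scores for 2 instances and 4 labels.
Y_true = np.array([[1, 0, 1, 0], [0, 1, 0, 0]])
Y_pred = np.array([[0.9, 0.8, 0.1, 0.0], [0.2, 0.7, 0.0, 0.6]])
prec, rec = precision_recall_at_k(Y_true, Y_pred, k=2)
print(prec, rec)
```

In this toy case each row has one hit among its top two predictions, so precision@2 is 0.5, while recall@2 averages 0.5 and 1.0.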

PECOS also offers an optimized C++ implementation for fast real-time inference:

>>> model = XLinearModel.load("./save-models", is_predict_only=True)
>>> for i in range(Xt.shape[0]):
...     y_pred = model.predict(Xt[i], threads=1)

Citation

If you find PECOS useful, please consider citing the following paper:

Some papers from our group using PECOS:

License

Copyright (2021) Amazon.com, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • text2text model evaluation not working

    Description

    Model evaluation fails before it outputs the precision and recall.

    How to Reproduce?

    I ran the following command:

    python3 -m pecos.apps.text2text.evaluate --pred-path ./test-prediction.txt --truth-path ./test.txt --text-item-path ./output-labels.txt
    

    where --pred-path is the path of the file produced during model prediction and --truth-path is the path of the test file, whose lines look like: Out1, Out2, Out3 \t cheap door. Out1, Out2 and Out3 are line numbers in the output-label file passed as --text-item-path ./output-labels.txt.

    What have you tried to solve it?

    Error message or code output

    Traceback (most recent call last):
      File "/home/khalid/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/khalid/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/pecos/apps/text2text/evaluate.py", line 130, in <module>
        do_evaluation(args)
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/pecos/apps/text2text/evaluate.py", line 119, in do_evaluation
        Y_true = smat.csr_matrix((val_t, (row_id_t, col_id_t)), shape=(num_samples_t, len(item_dict)))
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/compressed.py", line 55, in __init__
        dtype=dtype))
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/coo.py", line 196, in __init__
        self._check()
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/coo.py", line 285, in _check
        raise ValueError('column index exceeds matrix dimensions')
    ValueError: column index exceeds matrix dimensions
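
One plausible cause, which is an assumption and not confirmed in this thread, is that a label id in the truth file is larger than the number of lines in the item file. The sketch below reproduces that situation with SciPy alone and reports the offending ids before the matrix is built; build_truth_matrix is a hypothetical helper, not part of PECOS.

```python
import scipy.sparse as smat


def build_truth_matrix(row_ids, col_ids, num_rows, num_items):
    """Build a binary CSR matrix, reporting out-of-range label ids first."""
    bad = sorted({c for c in col_ids if c < 0 or c >= num_items})
    if bad:
        raise ValueError(
            f"label ids {bad} fall outside the {num_items} lines of the item file"
        )
    vals = [1.0] * len(row_ids)
    return smat.csr_matrix((vals, (row_ids, col_ids)), shape=(num_rows, num_items))


# Valid case: all label ids are below the number of items (3).
Y_ok = build_truth_matrix([0, 1], [0, 2], num_rows=2, num_items=3)

# Invalid case: label id 3 does not exist when there are only 3 items (ids 0..2).
try:
    build_truth_matrix([0, 1], [0, 3], num_rows=2, num_items=3)
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```

Checking whether label ids are 0-based or 1-based line numbers is a natural first step when this error appears.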
    

    Environment

    • Operating system:
    • Python version:
    • PECOS version:

    (Add as much information about your environment as possible, e.g. dependencies versions.)

    bug 
    opened by Khalid-Usman 13
  • Format of yt label

    Hello,

    I hope you are fine. I have 2 questions about the format.

    Question 1: What is the optimal format for the label matrix Yt? Is it preferable to have Yt as:

    (A) one-hot encoded, with only one 1 per row, or (B) multi-hot encoded, with multiple 1s per row (as is the case for Xt)?

    When prediction is done, the output seems to contain only one 1 per row.

    Question 2:

    Is there any constraint on Xt being a mix of dense and sparse input instead of sparse input only?
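
One way to combine dense and sparse features, sketched here with SciPy only, is to stack them column-wise into a single CSR matrix. Whether this is the recommended approach for PECOS is not stated in this thread; the arrays below are made up for illustration.

```python
import numpy as np
import scipy.sparse as smat

# Hypothetical features for 2 instances: 3 sparse columns and 2 dense columns.
X_sparse = smat.csr_matrix(
    np.array([[0.0, 1.5, 0.0], [2.0, 0.0, 0.0]], dtype=np.float32)
)
X_dense = np.array([[0.1, 0.2], [0.3, 0.4]], dtype=np.float32)

# Convert the dense block to CSR and concatenate along the feature axis.
X_mixed = smat.hstack(
    [X_sparse, smat.csr_matrix(X_dense)], format="csr", dtype=np.float32
)
print(X_mixed.shape, X_mixed.nnz)
```

The result is still one sparse matrix with one row per instance, so it has the same layout as a purely sparse input.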

    enhancement 
    opened by arita37 7
  • some formatting

    Hi, thanks for this.

    I would just like to confirm the format of the input.

    X: CSR format, x(i,k) = valx. Can valx be a float? Does it need to be binary or a value in [0,1]?

    Y: CSR format, y(i,k) = valy. Does it need to be binary (0 or 1)?

    Thx

    opened by arita37 7
  • Online Inference Latency for XR-TRANSFORMER

    Hi!

    When I use XR-Transformer to predict one input at a time, the online inference latency reaches 400 ms. Why is this?

    The system I use is Ubuntu 18.04, and XR-Transformer is evaluated on an Nvidia Tesla V100 GPU.

    Thanks!

    opened by xiaokening 6
  • Issue with --label-embed-type pifa_lf_concat::Z=${Z_pifa_file}

    Description

    I am trying to use the --label-embed-type parameter in training and it produces this error, coming from the np.load() function: ValueError: Object arrays cannot be loaded when allow_pickle=False

    I have tested loading the NPZ file for z_labels (both compressed and uncompressed); it produces this error if allow_pickle=False. I can load the data by setting allow_pickle=True in the np.load() function.

    Can you please add a description of this file format, or can we send this parameter as an input?

    This is the data I have after loading npz file with allow_pickle = True

    [array(['Trump', 'Bus', 'Trolly '], dtype='<U23')
     array(['Show', 'Disp'], dtype='<U20')
     array(['Recap rew'], dtype='<U24')
     array(['Core, '], dtype='<U32')
     array(['Hoe'], dtype='<U10')
     array(['Plan'], dtype='<U21')]
    

    How to Reproduce?

    Execute model training with numpy version 1.21.2

    python -m pecos.apps.text2text.train \
      --label-embed-type pifa_lf_concat::Z=${Z_pifa_file} \
      -i ${train_file} \
      -m ${model_folder}
    

    Error message or code output

    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/jupyter/pecos_git/pecos/pecos/apps/text2text/train.py", line 311, in <module>
        train(args)
      File "/home/jupyter/pecos_git/pecos/pecos/apps/text2text/train.py", line 302, in train
        workspace_folder=args.workspace_folder,
      File "/home/jupyter/pecos_git/pecos/pecos/apps/text2text/model.py", line 325, in train
        Z = smat_util.load_matrix(val)
      File "/home/jupyter/pecos_git/pecos/pecos/utils/smat_util.py", line 117, in load_matrix
        mat = np.load(src)
      File "/opt/conda/lib/python3.7/site-packages/numpy/lib/npyio.py", line 441, in load
        pickle_kwargs=pickle_kwargs)
      File "/opt/conda/lib/python3.7/site-packages/numpy/lib/format.py", line 743, in read_array
        raise ValueError("Object arrays cannot be loaded when "
    ValueError: Object arrays cannot be loaded when allow_pickle=False
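
The error can be reproduced with NumPy alone. The sketch below assumes, based on the arrays printed above, that the file holds variable-length string arrays (dtype=object), and contrasts that with a numeric sparse matrix saved through SciPy. The file names are hypothetical, and the format that PECOS expects for Z is not confirmed here.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as smat

with tempfile.TemporaryDirectory() as tmp_dir:
    # An array of variable-length string arrays has dtype=object and needs pickle.
    text_labels = np.empty(2, dtype=object)
    text_labels[0] = np.array(["Trump", "Bus"])
    text_labels[1] = np.array(["Show"])
    text_path = os.path.join(tmp_dir, "z_text.npz")
    np.savez(text_path, Z=text_labels)
    try:
        with np.load(text_path) as data:  # allow_pickle defaults to False
            _ = data["Z"]
        loaded_without_pickle = True
    except ValueError:
        loaded_without_pickle = False

    # A numeric sparse matrix round-trips without enabling pickle.
    Z = smat.csr_matrix(np.array([[0.0, 1.0], [0.5, 0.0]], dtype=np.float32))
    num_path = os.path.join(tmp_dir, "z_numeric.npz")
    smat.save_npz(num_path, Z)
    Z_loaded = smat.load_npz(num_path)

print(loaded_without_pickle, Z_loaded.shape)
```

If Z is meant to be a label-feature matrix, converting the label text into numeric features before saving would avoid the object-array path entirely.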
    

    Environment

    • Operating system: Unix Ubuntu (on GCP)
    • Python version: 3.8
    • PECOS version: 0.1.0
    • numpy version: 1.21.2
    bug 
    opened by zusmani 6
  • Pecos killed on ranker training step

    Description

    The training has been killed on this training step:

    Data: Amazon-670k Model: X-Transformer

    [2022-12-01 21:38:23,019][pecos.xmc.xtransformer.model][INFO] - Start training ranker...
    [2022-12-01 21:38:24,001][pecos.xmc.base][INFO] - Training Layer 0 of 4 Layers in HierarchicalMLModel, neg_mining=tfn..
    [2022-12-01 21:39:05,191][pecos.xmc.base][INFO] - Training Layer 1 of 4 Layers in HierarchicalMLModel, neg_mining=tfn..
    [2022-12-01 21:40:24,829][pecos.xmc.base][INFO] - Training Layer 2 of 4 Layers in HierarchicalMLModel, neg_mining=tfn..
    [2022-12-01 21:43:25,293][pecos.xmc.base][INFO] - Training Layer 3 of 4 Layers in HierarchicalMLModel, neg_mining=tfn+man..
    

    Environment

    Distributor ID:	Ubuntu
    Description:	Ubuntu 18.04.6 LTS
    Release:	18.04
    Codename:	bionic
    Python 3.8.15
    libpecos~=0.4.0
    1 RTX A4500, 32 vCPU, and 250 GB RAM
    

    What could it be? Is it possible to resume training from that stage?

    bug 
    opened by celsofranssa 4
  • How to Use XR-Transformer in Text2Text App

    Description

    I want to use XR-Transformer in the text2text app, following the parameters given here. But setting --params-path to this .json file raises the error:

    Traceback (most recent call last):
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/train.py", line 345, in <module>
        train(args)
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/train.py", line 328, in train
        t2t_model = Text2Text.train(
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/model.py", line 317, in train
        pred_params = pred_params.override_with_kwargs(kwargs)
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/model.py", line 126, in override_with_kwargs
        self.xlinear_params.override_with_kwargs(pred_kwargs)
    AttributeError: 'NoneType' object has no attribute 'override_with_kwargs'
    

    References

    enhancement 
    opened by lyy1994 4
  • Examples with text

    Description

    The current example of X and Y only has numeric values. Could you please provide one example where X and Y are both text? I think the paper/method is targeted at solving such problems.

    enhancement 
    opened by xyan326 4
  • Add memory-mapped utility module

    Issue #, if available: N/A

    Description of changes: Add memory-mapped utility module.

    Users can test with the code below: copy it into a file test_mmap_util.cpp placed at pecos/core/util/, and run:

    gcc -lm -ldl -lstdc++ -fopenmp -std=c++14 -lgcc -lgomp -O3  -I ./ test_mmap_util.cpp
    ./a.out
    

    Output:

    Generate a Bar with data:
    ---Bar---
    ---Foo---
    foo_1: 0 1 2 3 4 5 6 7 8 9 
    foo_2: 1
    ---------
    bar: 5 5 5 5 5 
    ---------
    Save Bar into mmap file: ./bar_test_mmap.txt
    Load a new Bar from saved mmap file...
    Loaded Bar data:
    ---Bar---
    ---Foo---
    foo_1: 0 1 2 3 4 5 6 7 8 9 
    foo_2: 1
    ---------
    bar: 5 5 5 5 5 
    ---------
    
    #include <iostream>
    #include "mmap_util.hpp"
    
    using namespace pecos::mmap_util;
    
    // Nested class mmap example
    // Bar contains a Foo instance
    class Foo {
        public:
            Foo() {}
            ~Foo() { foo_1.clear(); }
    
            void init_data() {
                foo_1.resize(10, 0);
                for (int i=0; i<foo_1.size(); ++i) { foo_1[i] = i; }
                foo_2 = 1.0;
            }
    
            void print() {
                std::cout << "---Foo---" << std::endl;
                std::cout << "foo_1: ";
                for (int i=0; i<foo_1.size(); ++i) { std::cout << foo_1[i] << " "; }
                std::cout << std::endl;
                std::cout << "foo_2: " << foo_2 << std::endl;
                std::cout << "---------" << std::endl;
            }
    
            void save_to_mmap_store(MmapStore& mmap_s) const {
                foo_1.save_to_mmap_store(mmap_s);
                mmap_s.fput_one<double>(foo_2);
            }
    
            void load_from_mmap_store(MmapStore& mmap_s) {
                foo_1.load_from_mmap_store(mmap_s);
                foo_2 = mmap_s.fget_one<double>();
            }
    
        private:
            MmapableVector<int> foo_1;
            double foo_2;
    };
    
    class Bar {
        public:
            Bar() { }
            ~Bar() { bar.clear(); mmap_store.close(); }
    
            void init_data() {
                foo.init_data();
                bar.resize(5, 0);
                for (int i=0; i<bar.size(); ++i) { bar[i] = 5.0; }
            }
    
            void print() {
                std::cout << "---Bar---" << std::endl;
                foo.print();
                std::cout << "bar: ";
                for (int i=0; i<bar.size(); ++i) { std::cout << bar[i] << " "; }
                std::cout << std::endl;
                std::cout << "---------" << std::endl;
            }
    
            void save(const std::string & file_name) const {
                // Create a mmapfile for dump at the most outer layer class
                // You cannot reuse (i.e, close and reopen) mmap_store, since it may hold the data storage
                MmapStore mmap_s = MmapStore();
                mmap_s.open(file_name, "w");
    
                save_to_mmap_store(mmap_s);
    
                // Metadata dump and fp closure is automatically done at MmapStore destructor when this function ends
                // You can make it happen earlier with explicitly calling close()
                mmap_s.close();
            }
            void load(const std::string & file_name, const bool pre_load=true) {
                mmap_store.open(file_name, pre_load?"r":"r_lazy");
                load_from_mmap_store(mmap_store);
            }
    
            void save_to_mmap_store(MmapStore& mmap_s) const {
                foo.save_to_mmap_store(mmap_s);
                bar.save_to_mmap_store(mmap_s);
            }
    
            void load_from_mmap_store(MmapStore& mmap_s) {
                foo.load_from_mmap_store(mmap_s);
                bar.load_from_mmap_store(mmap_s);
            }
    
        private:
            Foo foo;
            MmapableVector<double> bar;
            // Mmap Data storage at the most outer layer class
            MmapStore mmap_store;
    };
    
    
    int main() {
        std::string f_name = "./bar_test_mmap.txt";
    
        std::cout << "Generate a Bar with data:" << std::endl;
        Bar bar;
        bar.init_data();
        bar.print();
    
        std::cout << "Save Bar into mmap file: " << f_name << std::endl;
        bar.save(f_name);
    
        std::cout << "Load a new Bar from saved mmap file..." << std::endl;
        Bar new_bar;
        // The second argument is the bool pre_load flag (true opens the store in "r" mode)
        new_bar.load(f_name, true);
    
        std::cout << "Loaded Bar data:" << std::endl;
        new_bar.print();
    
        return 0;
    }
    

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by weiliw-amz 3
  • Is there at least one example showing how to use Pecos from a plain text dataset?

    It has been difficult to work out how to use PECOS properly. The usage instructions are split across several README.md files and the issues.

    Could you provide a toy example of an end-to-end approach (using XR-Transformer, for instance)?

    Consider the following scenario: We have the training and testing samples in plain text

    #train samples:
        text: raw_text_1, labels: [L1, L7, ..., L3]
        text: raw_text_2, labels: [L8, L9]
        ...
        text: raw_text_N, labels: [L1, L7, ..., L4]
    
    #test samples:
        text: test_raw_text_1
        text: test_raw_text_2
        ...
        text: test_raw_text_M
    

    and someone has to:

    1. prepare the data in the accepted format;
    2. train the model;
    3. predict the top k labels.
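
As a rough sketch of step 1 only, the snippet below turns plain-text samples into a sparse feature matrix X and a sparse label matrix Y using SciPy. It uses simple bag-of-words token counts as a stand-in feature encoding, which is an assumption made here for illustration; the sample texts and label names are made up.

```python
import scipy.sparse as smat

# Hypothetical training samples in the layout described above.
train_samples = [
    ("cheap wooden door", ["L1", "L7"]),
    ("steel door handle", ["L7"]),
    ("garden chair", ["L3"]),
]

# Map each distinct token and each distinct label to a column index.
vocab, label_ids = {}, {}
x_rows, x_cols, y_rows, y_cols = [], [], [], []
for i, (text, labels) in enumerate(train_samples):
    for token in text.lower().split():
        x_rows.append(i)
        x_cols.append(vocab.setdefault(token, len(vocab)))
    for label in labels:
        y_rows.append(i)
        y_cols.append(label_ids.setdefault(label, len(label_ids)))

n = len(train_samples)
# Duplicate (row, col) pairs are summed, so X holds token counts.
X = smat.csr_matrix(
    ([1.0] * len(x_rows), (x_rows, x_cols)), shape=(n, len(vocab)), dtype="float32"
)
Y = smat.csr_matrix(
    ([1.0] * len(y_rows), (y_rows, y_cols)), shape=(n, len(label_ids)), dtype="float32"
)
print(X.shape, Y.shape)
```

The vocab and label_ids dictionaries have to be kept so that test text is encoded with the same columns and predicted column indices can be mapped back to label names.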
    opened by celsofranssa 3
  • bug of installing from source

    Description

    There are some problems when installing PECOS from source according to README.md.

    How to Reproduce?

    python3 -m pip install --editable ./
    Obtaining file:///home/workspace/lishengchao/pecos
    Requirement already satisfied: scipy>=1.4.1 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (1.6.1)
    Requirement already satisfied: scikit-learn>=0.24.1 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (0.24.1)
    Requirement already satisfied: torch>=1.8.0 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (1.8.0)
    Collecting sentencepiece!=0.1.92,>=0.1.86
    Using cached https://repo.huaweicloud.com/repository/pypi/packages/68/91/ded0f64f90abfc5413c620fc345a0aef1e7ff5addda8704cc6b3bf589c64/sentencepiece-0.1.96-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
    Requirement already satisfied: transformers>=4.1.1 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (4.8.2)
    Collecting numpy>=1.19.5
    Using cached https://repo.huaweicloud.com/repository/pypi/packages/38/c0/c45c5eb0e25247d5fbb333fd0b56e570ba21cf0e3dca3abad174fb780e8c/numpy-1.22.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB)
    Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.8/site-packages (from scikit-learn>=0.24.1->libpecos==0.3.0) (2.1.0)
    Requirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.8/site-packages (from scikit-learn>=0.24.1->libpecos==0.3.0) (1.0.1)
    Requirement already satisfied: typing_extensions in /opt/conda/lib/python3.8/site-packages (from torch>=1.8.0->libpecos==0.3.0) (3.7.4.3)
    Collecting huggingface-hub==0.0.12
    Downloading https://repo.huaweicloud.com/repository/pypi/packages/2f/ee/97e253668fda9b17e968b3f97b2f8e53aa0127e8807d24a547687423fe0b/huggingface_hub-0.0.12-py3-none-any.whl (37 kB)
    Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (2021.4.4)
    Requirement already satisfied: sacremoses in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (0.0.45)
    Requirement already satisfied: requests in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (2.24.0)
    Requirement already satisfied: packaging in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (21.3)
    Requirement already satisfied: tokenizers<0.11,>=0.10.1 in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (0.10.3)
    Requirement already satisfied: tqdm>=4.27 in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (4.62.3)
    Requirement already satisfied: filelock in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (3.0.12)
    Requirement already satisfied: pyyaml in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (5.4.1)
    Requirement already satisfied: six in /opt/conda/lib/python3.8/site-packages (from sacremoses->transformers>=4.1.1->libpecos==0.3.0) (1.15.0)
    Requirement already satisfied: click in /opt/conda/lib/python3.8/site-packages (from sacremoses->transformers>=4.1.1->libpecos==0.3.0) (7.1.2)
    Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (3.0.4)
    Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (2.10)
    Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (1.25.11)
    Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (2020.12.5)
    Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.8/site-packages (from packaging->transformers>=4.1.1->libpecos==0.3.0) (3.0.6)
    Installing collected packages: sentencepiece, numpy, libpecos, huggingface-hub
    Attempting uninstall: numpy
    Found existing installation: numpy 1.19.2
    Uninstalling numpy-1.19.2:
    Successfully uninstalled numpy-1.19.2
    Running setup.py develop for libpecos
    ERROR: Command errored out with exit status 1:
    command: /opt/conda/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/workspace/lishengchao/pecos/setup.py'"'"'; file='"'"'/home/workspace/lishengchao/pecos/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' develop --no-deps
    cwd: /home/workspace/lishengchao/pecos/
    Complete output (28 lines):
    Set version to 0.3.0
    running develop
    running egg_info
    creating libpecos.egg-info
    writing libpecos.egg-info/PKG-INFO
    writing dependency_links to libpecos.egg-info/dependency_links.txt
    writing requirements to libpecos.egg-info/requires.txt
    writing top-level names to libpecos.egg-info/top_level.txt
    writing manifest file 'libpecos.egg-info/SOURCES.txt'
    reading manifest file 'libpecos.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    warning: no files found matching '*.c' under directory 'pecos/core'
    writing manifest file 'libpecos.egg-info/SOURCES.txt'
    running build_ext
    building 'pecos.core.libpecos_float32' extension
    INFO: C compiler: gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC

    creating build
    creating build/temp.linux-x86_64-3.8
    creating build/temp.linux-x86_64-3.8/pecos
    creating build/temp.linux-x86_64-3.8/pecos/core
    INFO: compile options: '-Ipecos/core -I/usr/include/ -I/usr/local/include -I/opt/conda/include/python3.8 -c'
    extra options: '-fopenmp -O3 -std=c++14'
    INFO: gcc: pecos/core/libpecos.cpp
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    /tmp/ccNJQf5g.s: Assembler messages:
    /tmp/ccNJQf5g.s: Fatal error: can't close build/temp.linux-x86_64-3.8/pecos/core/libpecos.o: Input/output error
    error: Command "gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ipecos/core -I/usr/include/ -I/usr/local/include -I/opt/conda/include/python3.8 -c pecos/core/libpecos.cpp -o build/temp.linux-x86_64-3.8/pecos/core/libpecos.o -fopenmp -O3 -std=c++14" failed with exit status 1
    ----------------------------------------
    

    ERROR: Command errored out with exit status 1: /opt/conda/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/workspace/lishengchao/pecos/setup.py'"'"'; file='"'"'/home/workspace/lishengchao/pecos/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.

    Environment

    • Ubuntu 18.04
    • Python 3.8
    • PECOS 0.3.0

    (Add as much information about your environment as possible, e.g. dependencies versions.)

    bug 
    opened by xiaokening 3
  • Memory-mapped XLinear Model

    Issue #, if available: N/A

    Description of changes:

    • Memory-mapped PECOS XLinear model
      • Greatly reduces loading time.
      • Ideal for large models when users want to quickly try a few inferences without waiting for the full model to load into memory.
      • Also enables inference with large models that cannot fit in memory.
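
The general idea behind memory mapping can be sketched with NumPy's memmap. This is not the PECOS implementation (which lives in the C++ core); the file name and array size are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "weights.bin")

    # Write a hypothetical weight vector to disk as raw float32 values.
    weights = np.arange(1_000_000, dtype=np.float32)
    weights.tofile(path)

    # Opening the map is cheap: pages are read from disk only when touched.
    mapped = np.memmap(path, dtype=np.float32, mode="r", shape=(1_000_000,))
    first_rows = np.array(mapped[:5])  # copies just the accessed slice
    del mapped  # release the mapping before the directory is removed

print(first_rows)
```

Only the pages backing the accessed slice need to be read, which is why a memory-mapped model can start serving a few predictions without a full load.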

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by weiliw-amz 0
Releases(v0.4.0)
  • v0.4.0(Aug 9, 2022)

    Highlights

    • Enable distributed XR-Transformer fine-tuning
    • Enable the capability of large-batch prediction for ANN HNSW
    • Release interactive hands-on tutorial materials

    Enhancements

    • Unit test for sorted_csc, sorted_csr by @chepingt in https://github.com/amzn/pecos/pull/139
    • Unit test for csr_row_softmax by @houyuhan98 in https://github.com/amzn/pecos/pull/141
    • Bump numpy from 1.21.0 to 1.22.0 by @dependabot in https://github.com/amzn/pecos/pull/145 https://github.com/amzn/pecos/pull/146
    • Release the materials for the PECOS hands-on tutorial in KDD 2022 by @hallogameboy in https://github.com/amzn/pecos/pull/153 https://github.com/amzn/pecos/pull/154 https://github.com/amzn/pecos/pull/161
    • Enable the capability of large-batch prediction for HNSW by @OctoberChang in https://github.com/amzn/pecos/pull/156
    • Distributed XR-Transformer fine-tuning by @jiong-zhang in https://github.com/amzn/pecos/pull/144 https://github.com/amzn/pecos/pull/162

    Bug Fixes

    • Fix argument-passing issue in smat_util.sorted_csc by @jiong-zhang in https://github.com/amzn/pecos/pull/134
    • Fix indptr overflow issue in block_diag_csr() by @OctoberChang in https://github.com/amzn/pecos/pull/136
    • Fix the yum group install command in README by @hallogameboy in https://github.com/amzn/pecos/pull/138
    • Change file names for windows compatibility by @YangyiLi001 in https://github.com/amzn/pecos/pull/143
    • Avoid triggering CodeQL on push for Dependabot branches by @weiliw-amz in https://github.com/amzn/pecos/pull/148
    • Fix Pypi release version error by @weiliw-amz in https://github.com/amzn/pecos/pull/163

    Deprecation

    • Deprecate imbalanced hierarchical K-means from clustering and semantic indexing by @hallogameboy in https://github.com/amzn/pecos/pull/151

    New Contributors

    • @chepingt made their first contribution in https://github.com/amzn/pecos/pull/139
    • @houyuhan98 made their first contribution in https://github.com/amzn/pecos/pull/141
    • @YangyiLi001 made their first contribution in https://github.com/amzn/pecos/pull/143
    • @xiusic made their first contribution in https://github.com/amzn/pecos/pull/147

    Full Changelog: https://github.com/amzn/pecos/compare/v0.3.0...v0.4.0

  • v0.3.0(Apr 1, 2022)

    Highlights

    • Enable distributed training for XLinear
    • Enable PECOS for aarch64(arm64) CPU Architecture
    • Enhance pecos.ann.hnsw with Function Multi-Versioning (FMV) technique to automatically select the best supported SIMD instructions (SSE, AVX2, AVX512) at runtime
    • Reduce CPU memory usage in pecos.xmc.xtransformer training

    Enhancements

    • Add distilbert model. by @mo-fu in https://github.com/amzn/pecos/pull/97
    • add CNAME by @jiong-zhang in https://github.com/amzn/pecos/pull/104
    • Bump numpy from 1.20.3 to 1.21.0 in /examples/qp2q by @dependabot in https://github.com/amzn/pecos/pull/110
    • enable Function Multi-Versioning (FMV) to support AVX512 by @rofuyu in https://github.com/amzn/pecos/pull/111
    • Modify supported Python version by @weiliw-amz in https://github.com/amzn/pecos/pull/113
    • Enabling PECOS for aarch64(arm64) CPU Architecture by @weiliw-amz in https://github.com/amzn/pecos/pull/114
    • Update OpenBLAS Version for x86 Wheel Build by @weiliw-amz in https://github.com/amzn/pecos/pull/117
    • SIMD Functions for aarch64(ARM64) by @weiliw-amz in https://github.com/amzn/pecos/pull/115
    • Add profile_util module by @weiliw-amz in https://github.com/amzn/pecos/pull/121
    • Fix FMV setup link flag and add test wheel CI by @weiliw-amz in https://github.com/amzn/pecos/pull/119
    • Fix xlinear.reconstruct_model; Add PII embedding by @weiliw-amz in https://github.com/amzn/pecos/pull/120
    • Add Distributed PECOS XLinear Modules by @weiliw-amz in https://github.com/amzn/pecos/pull/123
    • Add distributed PECOS README by @weiliw-amz in https://github.com/amzn/pecos/pull/127
    • update HNSW README and save/load in Python API by @OctoberChang in https://github.com/amzn/pecos/pull/129
    • Improve XR-Transformer memory efficiency by @jiong-zhang in https://github.com/amzn/pecos/pull/128

    Bug Fixes

    • properly set Text2Text prediction argument by @OctoberChang in https://github.com/amzn/pecos/pull/101
    • Fix HierarchicalMLModel pred-params initialization and add bugs by @weiliw-amz in https://github.com/amzn/pecos/pull/103
    • minor bug fix in XR-Transformer exp script by @jiong-zhang in https://github.com/amzn/pecos/pull/106
    • fixed multithreading bugs in py hierarchical kmeans by @OctoberChang in https://github.com/amzn/pecos/pull/108
    • set pytest of hierarchical kmeans with single thread by @OctoberChang in https://github.com/amzn/pecos/pull/109
    • Fix relative path in distributed README by @weiliw-amz in https://github.com/amzn/pecos/pull/130

    Experiment Codes for Publications

    • add overlap-clustering (Liu et al.) in NeurIPS21 by @xuanqing94 in https://github.com/amzn/pecos/pull/98
    • add MACLR codes by @xyh97 in https://github.com/amzn/pecos/pull/100
    • update experiment code for pecos jmlr paper by @OctoberChang in https://github.com/amzn/pecos/pull/107
    • update Philip's experiment code into example folder by @OctoberChang in https://github.com/amzn/pecos/pull/118

    New Contributors

    • @mo-fu made their first contribution in https://github.com/amzn/pecos/pull/97
    • @xuanqing94 made their first contribution in https://github.com/amzn/pecos/pull/98
    • @xyh97 made their first contribution in https://github.com/amzn/pecos/pull/100
    • @dependabot made their first contribution in https://github.com/amzn/pecos/pull/110

    Full Changelog: https://github.com/amzn/pecos/compare/v0.2.3...v0.3.0

  • v0.2.3 (Nov 15, 2021)

  • v0.2.2 (Nov 4, 2021)

  • v0.2.1 (Oct 27, 2021)

    Highlights

    • Removed support for Ubuntu 16.04
    • Implemented XR-Transformer
    • Enabled HNSW functionality
    • Enabled cost-sensitive learning in PECOS

    Enhancements

    ANN HNSW

    • Initial single-threaded implementation of HNSW in C++ [#44] (@OctoberChang)
    • Refactor HNSW in C++ to support sparse/dense features and multi-threading [#49] (@rofuyu)
    • Initial implementation of HNSW Python interface [#53] (@OctoberChang)
    • Refactor HNSW python API and readme markdown [#63] (@OctoberChang)
    • Refactor HNSW C++ to reuse priority queue for different inference calls within the same Searcher [#65] (@rofuyu)
    • Enable HNSW save/load functionality [#71] (@OctoberChang)
    • Add serialization version in HNSW save/load [#77] (@rofuyu)
    • Enable HNSW python command line interface [#79] (@OctoberChang)

    Cost-sensitive Learning

    • Enable Cost-Sensitive Learning via XLinear API/CLI [#64] (@jiong-zhang)
    • Enable cost sensitive for text2text CLI [#75] (@jiong-zhang)
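Cost-sensitive learning is commonly realized by scaling each training instance's loss by a non-negative cost, so that mistakes on high-cost instances are penalized more. The sketch below illustrates that idea with a squared hinge loss in plain Python. It is only an illustration under that assumption: the function name `weighted_squared_hinge_loss` and its arguments are hypothetical and are not part of the PECOS API, and the release notes do not state how PECOS weights its loss internally.

```python
from typing import Sequence


def weighted_squared_hinge_loss(
    scores: Sequence[float],
    labels: Sequence[int],
    costs: Sequence[float],
) -> float:
    """Return the sum of cost-weighted squared hinge losses.

    scores: raw model outputs, one per instance
    labels: +1 or -1, one per instance
    costs:  non-negative per-instance weights (hypothetical cost values)
    """
    if not (len(scores) == len(labels) == len(costs)):
        raise ValueError("scores, labels and costs must have the same length")
    total = 0.0
    for score, label, cost in zip(scores, labels, costs):
        # Squared hinge: zero once the margin label * score reaches 1.
        margin = max(0.0, 1.0 - label * score)
        total += cost * margin * margin
    return total


scores = [0.5, -2.0, 0.0]
labels = [1, -1, 1]

# Uniform costs reduce to the ordinary (unweighted) loss.
uniform = weighted_squared_hinge_loss(scores, labels, [1.0, 1.0, 1.0])

# Raising the cost of the first instance increases its share of the loss.
weighted = weighted_squared_hinge_loss(scores, labels, [4.0, 1.0, 1.0])
```

With uniform costs the function matches the standard squared hinge loss, so the weighting only changes behavior when costs differ between instances.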

    XR-Transformer [#27, #64] (@jiong-zhang)

    • Refactor pecos.xmc.xtransformer and enable end2end XR-Transformer training
    • CLI tool for generating embeddings: pecos.xmc.xtransformer.encode
    • Faster transformer text tokenizers using huggingface's C implementation
    • Allow training XR-Transformer without numerical features

    Better control over parameters for XLinear, XTransformer and Text2Text [#64, #78, #80] (@jiong-zhang)

    • Enable advanced control of parameters via JSON input file
    • Add utility tool to generate parameter skeleton for further modification
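The skeleton-then-edit workflow described above can be sketched with the Python standard library: write out every parameter with its default as JSON, edit the file, then load it back. This is a generic illustration, not the PECOS implementation. The class names, field names, and default values below (`TrainParams`, `PredParams`, `Params`, `generate_skeleton`, `load_params`) are all hypothetical and do not reflect the actual PECOS parameter schema or CLI.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TrainParams:
    # Hypothetical training parameters with illustrative defaults.
    threshold: float = 0.1
    max_iter: int = 100
    solver: str = "L2R_L2LOSS_SVC_DUAL"


@dataclass
class PredParams:
    # Hypothetical prediction parameters with illustrative defaults.
    beam_size: int = 10
    only_topk: int = 20


@dataclass
class Params:
    train_params: TrainParams = field(default_factory=TrainParams)
    pred_params: PredParams = field(default_factory=PredParams)


def generate_skeleton() -> str:
    """Serialize all default parameters as an editable JSON skeleton."""
    return json.dumps(asdict(Params()), indent=2)


def load_params(text: str) -> Params:
    """Build Params from JSON text; missing sections fall back to defaults."""
    raw = json.loads(text)
    return Params(
        train_params=TrainParams(**raw.get("train_params", {})),
        pred_params=PredParams(**raw.get("pred_params", {})),
    )


# Generate the skeleton, change one value, and load the result.
skeleton = generate_skeleton()
edited = json.loads(skeleton)
edited["pred_params"]["beam_size"] = 30
params = load_params(json.dumps(edited))
```

In this sketch an unknown key in the JSON raises a `TypeError` from the dataclass constructor, which surfaces typos in the parameter file early instead of silently ignoring them.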

    Other new functionalities

    • Added support for predicting on select outputs [#37, #43, #47] (@bhl00)
    • Added new primal solver L2R_L2LOSS_SVC_PRIMAL for XLinear [#67] (@yuhchenlin)
    • Added a Makefile for easy formatting, installation, cleaning, and unit testing [#12] (@weiliw-amz)

    Bug Fixes

    • (#17) Fixed issues obtaining GitHub information when installing from .zip [#21, #29] (@weiliw-amz)
    • (#42) Fixed transformer training issue on single GPU [#14] (@jiong-zhang)
    • Removed PECOS source-installation dependency on NumPy BLAS library [#81] (@weiliw-amz)
  • v0.1.0 (Apr 26, 2021)
