TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models.

Overview

TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models. The library is a collection of Keras models and supports classification, regression and ranking.

TF-DF is a TensorFlow wrapper around the Yggdrasil Decision Forests C++ libraries. Models trained with TF-DF are compatible with Yggdrasil Decision Forests' models, and vice versa.

Usage example

A minimal end-to-end run looks as follows:

import tensorflow_decision_forests as tfdf
import pandas as pd

# Load the dataset in a Pandas dataframe.
train_df = pd.read_csv("project/train.csv")
test_df = pd.read_csv("project/test.csv")

# Convert the dataset into a TensorFlow dataset.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="my_label")
test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_df, label="my_label")

# Train the model
model = tfdf.keras.RandomForestModel()
model.fit(train_ds)

# Look at the model.
model.summary()

# Evaluate the model.
model.evaluate(test_ds)

# Export to a TensorFlow SavedModel.
# Note: the model is compatible with Yggdrasil Decision Forests.
model.save("project/model")

Documentation & Resources

The following resources are available:

Installation

To install TensorFlow Decision Forests, run:

pip3 install tensorflow_decision_forests --upgrade

See the installation page for more details, troubleshooting and alternative installation solutions.

Contributing

Contributions to TensorFlow Decision Forests and Yggdrasil Decision Forests are welcome. If you want to contribute, make sure to review the developer manual and contribution guidelines.

Credits

TensorFlow Decision Forests was developed by:

  • Mathieu Guillame-Bert (gbm AT google DOT com)
  • Jan Pfeifer (janpf AT google DOT com)
  • Sebastian Bruch (sebastian AT bruch DOT io)
  • Arvind Srinivasan (arvnd AT google DOT com)

License

Apache License 2.0

Comments
  • pip install does not work on Mac

    Hey there,

    First of all, congratulations on your effort, this is a great initiative!

    I am raising this issue because I have faced a problem with installation. I have created a Python 3.8.6 virtual environment on my Mac and installed tensorflow 2.5.0 successfully. When I ran the installation command for the "Tensorflow Decision Forests" package, pip3 install tensorflow_decision_forests --upgrade

    I got:

    ERROR: Could not find a version that satisfies the requirement tensorflow_decision_forests (from versions: none)
    ERROR: No matching distribution found for tensorflow_decision_forests

    It's a bit confusing because the installation command on PyPI (I guess this is the right one) contains dashes, instead of underscores, in the package name.
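On the naming question: package indexes normalize project names before lookup, so the dashed and underscored spellings should resolve to the same project. The sketch below reproduces the normalization rule described in PEP 503 as a standalone helper; `normalize_project_name` is a name chosen here for illustration, not a pip function.

```python
import re


def normalize_project_name(name: str) -> str:
    """Normalize a project name following the rule described in PEP 503."""
    # Runs of "-", "_" and "." collapse into a single "-"; case is ignored.
    return re.sub(r"[-_.]+", "-", name).lower()


underscored = normalize_project_name("tensorflow_decision_forests")
dashed = normalize_project_name("tensorflow-decision-forests")

# Both spellings map to the same normalized name.
print(underscored == dashed)
```

If both spellings normalize to the same name, a "from versions: none" message is more likely to mean that no published file matched the platform or Python version than that the name was wrong.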

    Any ideas?

    Thanks a lot

    opened by erwtokritos 37
  • Getting error at end of training: AbstractFeatureResourceE does not exist. [Op:SimpleMLModelTrainer]

    I am getting the following error when I try a simple model.

    csv_feature_columns =  ['weekday_weekend'] + weather_columns + building_columns + schedules_columns + encoded_time_columns + ["total_site_electricity_kwh"] 
    
    train_df = pd.read_csv(timeseries_file_path,usecols=csv_feature_columns,nrows=10000)
    
    train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="total_site_electricity_kwh")
    
    model = tfdf.keras.RandomForestModel()
    model.fit(train_ds)
    
    
    157/157 [==============================] - 6s 18ms/step
    ---------------------------------------------------------------------------
    NotFoundError                             Traceback (most recent call last)
    <ipython-input-6-ce1e05e4d2c8> in <module>
          1 # Train a Random Forest model.
          2 model = tfdf.keras.RandomForestModel()
    ----> 3 model.fit(train_ds)
          4 
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py in fit(self, x, y, callbacks, **kwargs)
        743 
        744     history = super(CoreModel, self).fit(
    --> 745         x=x, y=y, epochs=1, callbacks=callbacks, **kwargs)
        746 
        747     self._build(x)
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
       1227           epoch_logs.update(val_logs)
       1228 
    -> 1229         callbacks.on_epoch_end(epoch, epoch_logs)
       1230         training_logs = epoch_logs
       1231         if self.stop_training:
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py in on_epoch_end(self, epoch, logs)
        433     logs = self._process_logs(logs)
        434     for callback in self.callbacks:
    --> 435       callback.on_epoch_end(epoch, logs)
        436 
        437   def on_train_batch_begin(self, batch, logs=None):
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py in on_epoch_end(***failed resolving arguments***)
        930     del logs
        931     if epoch == 0:
    --> 932       self._model._train_model()  # pylint:disable=protected-access
        933 
        934 
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py in _train_model(self)
        864         guide=guide,
        865         training_config=self._advanced_arguments.yggdrasil_training_config,
    --> 866         deployment_config=self._advanced_arguments.yggdrasil_deployment_config,
        867     )
        868 
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow_decision_forests/tensorflow/core.py in train(input_ids, label_id, model_id, learner, task, generic_hparms, ranking_group, training_config, deployment_config, guide, model_dir, keep_model_in_resource)
        503       training_config=training_config.SerializeToString(),
        504       deployment_config=deployment_config.SerializeToString(),
    --> 505       guide=guide.SerializeToString())
        506 
        507 
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow/python/util/tf_export.py in wrapper(*args, **kwargs)
        402           'Please pass these args as kwargs instead.'
        403           .format(f=f.__name__, kwargs=f_argspec.args))
    --> 404     return f(**kwargs)
        405 
        406   return tf_decorator.make_decorator(f, wrapper, decorator_argspec=f_argspec)
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow_decision_forests/tensorflow/ops/training/op.py in simple_ml_model_trainer(feature_ids, label_id, weight_id, model_id, model_dir, learner, hparams, task, training_config, deployment_config, guide, name)
        510       return _result
        511     except _core._NotOkStatusException as e:
    --> 512       _ops.raise_from_not_ok_status(e, name)
        513     except _core._FallbackException:
        514       pass
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
       6895   message = e.message + (" name: " + name if name is not None else "")
       6896   # pylint: disable=protected-access
    -> 6897   six.raise_from(core._status_to_exception(e.code, message), None)
       6898   # pylint: enable=protected-access
       6899 
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/six.py in raise_from(value, from_value)
    
    NotFoundError: Resource decision_forests/ 12-in/N27tensorflow_decision_forests3ops23AbstractFeatureResourceE does not exist. [Op:SimpleMLModelTrainer]
    
    opened by sibyjackgrove 16
  • AssertionError: Exception encountered when calling layer "gradient_boosted_trees_model" (type GradientBoostedTreesModel).

    When trying to get a prediction I am getting the error in the title, it also gives the following:

    in user code:
    
        File "/home/laner107/.local/lib/python3.8/site-packages/tensorflow_decision_forests/keras/core.py", line 791, in call  *
            normalized_inputs = self._build_normalized_inputs(inputs)
        File "/home/laner107/.local/lib/python3.8/site-packages/tensorflow_decision_forests/keras/core.py", line 747, in _build_normalized_inputs  *
            assert len(self._semantics) == 1
    
        AssertionError: 
    
    
    Call arguments received:
      • inputs=tf.Tensor(shape=(14,), dtype=float32)
      • training=False
    
    

    The following is where I call the prediction:

    def predict_to_data(self):
        testing_data = test_preprocess()
        testing_data = np.array(testing_data)
        predictions = self.model(testing_data[0])
    

    Here is what the data I'm passing in looks like:

    [0.484375   0.83007665 0.56508876 0.46099291 0.52793453 0.75438596
     0.52066116 0.7826087  0.         0.65852121 0.40425532 0.58974359
     0.69047619 0.37058824]
    

    And here is the architecture for the model:

    def create_single_model(self):
            input_features = tf.keras.Input(shape=(self.num_features,))
    
            # bootstrap_size_ratio: Number of examples used to train each trees; expressed as a ratio of the training dataset size. Default: 1.0.
            rf_model_1 = tfdf.keras.GradientBoostedTreesModel(
                verbose=0,
                task=tfdf.keras.Task.CLASSIFICATION,
                hyperparameter_template="benchmark_rank1@v1",
                num_trees=self.num_of_trees,
            )
    
            model = tf.keras.models.Model(input_features, rf_model_1(input_features))
    
            return model
    

    For now I just want to generate a single prediction to see what the output looks like; eventually I plan to predict in batches, once I figure that out. Any idea why this error is occurring?

    All of this is done after the GradientBoostedTreesModel is trained(fit) and evaluated using validation data.

    opened by laneciar 15
  • Cannot serve using precompiled tf-serving. Error: `GLIBC_2.33' not found

    Hi!

    I am trying to serve the test tf-df example decision-forests/examples/minimal.py (aptly named tf-df-example below) using the precompiled tf-serving. Unfortunately, I am running into numerous GLIBC_X.XX not found errors like this:

    /usr/bin/tensorflow_model_server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /usr/bin/tensorflow_model_server)
    

    My best guess is that the issue comes from a difference between the system used to precompile tensorflow_model_server_linux.zip and the base image I am pulling: tensorflow/serving:2.9.1. I have tried other base images, but the same GLIBC_X.XX not found error always pops up, and I'm not sure where to go from here (aside from trying to compile it myself).
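One way to test that guess is to compare the glibc version available in the image with the one named in the error message. The sketch below uses Python's standard `platform.libc_ver()`; the helper names are illustrative, and it assumes Python is available wherever the check is run (on images without Python, `ldd --version` reports the same information).

```python
import platform


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as "2.33" into a comparable tuple (2, 33)."""
    return tuple(int(part) for part in text.split("."))


def glibc_is_sufficient(available: str, required: str) -> bool:
    """Return True when the available glibc is at least the required version."""
    # Compare as integer tuples: as plain strings, "2.9" would sort after "2.33".
    return parse_version(available) >= parse_version(required)


# "2.33" is taken from the error message quoted above.
REQUIRED_GLIBC = "2.33"

lib, version = platform.libc_ver()
if lib == "glibc" and version:
    ok = glibc_is_sufficient(version, REQUIRED_GLIBC)
    print(f"glibc {version} detected; binary needs {REQUIRED_GLIBC}; sufficient: {ok}")
else:
    print("Could not detect glibc on this system.")
```

If the detected version is lower than the required one, the binary was built against a newer glibc than the image provides, which would be consistent with the error above.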

    Any thoughts or suggestions would be greatly appreciated. Thanks!

    Here is the set up.

    1. Dockerfile:
    FROM tensorflow/serving:2.9.1 as base
    
    # Install curl
    RUN apt-get update && apt-get install -y --no-install-recommends \
        curl unzip \
        && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists/*
    
    RUN curl -LJO "https://github.com/tensorflow/decision-forests/releases/download/serving-0.2.6/tensorflow_model_server_linux.zip"
    RUN unzip -o tensorflow_model_server_linux.zip -d /usr/bin/
    
    COPY tf-df_serving_entrypoint.sh /usr/bin/tf-df_serving_entrypoint.sh
    COPY /models/tf-df-example/ /tensorflow/models/tf-df-example/
    
    WORKDIR /tensorflow/
    ENTRYPOINT ["/usr/bin/tf-df_serving_entrypoint.sh"]
    
    2. tf-df_serving_entrypoint.sh:
    # Use the prebuilt binary installed by the Dockerfile
    TFSERVING="/usr/bin/tensorflow_model_server"
    
    # Configure the model path and name.
    MODEL_PATH=/tensorflow/models/tf-df-example/
    MODEL_NAME=tf-df-example
    
    # Start a TF Serving server
    ${TFSERVING} \
        --rest_api_port=8501 \
        --model_name=${MODEL_NAME} \
        --model_base_path=${MODEL_PATH}
    
    3. To build and run:
    docker build . -t tfdf/serving
    docker run -t --rm -p 8501:8501 tfdf/serving 
    
    bug 
    opened by SpenceLunderman 13
  • INVALID_ARGUMENT: No defined default loss for this combination of label type and task

    I'm trying to use GradientBoostedTreesModel in a TFX pipeline, the code is roughly as follows:

    model = tfdf.keras.GradientBoostedTreesModel(
            task=tfdf.keras.Task.CLASSIFICATION,
            num_trees=200,
            max_depth=6,
            verbose=True,
            hyperparameter_template="better_default",
            name="classifier",
        )
    model.compile(metrics=[tf.keras.metrics.AUC(), "accuracy"])
    model.fit(_input_fn(fn_args.train_files, fn_args.schema_path))
    

    This unfortunately gives me an INVALID_ARGUMENT: No defined default loss for this combination of label type and task exception and fails the model training.

    Definition of _input_fn is as follows:

    def _input_fn(...):
        return (
            tf.data.TFRecordDataset(
                tf.data.Dataset.list_files(files), compression_type="GZIP"
            )
            .batch(1024)
            .map(
                lambda batch: tf.io.parse_example(batch, specs),
                num_parallel_calls=tf.data.AUTOTUNE,
            )
            .map(lambda batch: (batch, batch.pop(FeatureManager.LABEL_KEY)))
            .cache()
            .prefetch(tf.data.AUTOTUNE)
        )
    

    This parses the schema into feature specs, parses each batch of TF examples, and finally maps them to a tuple of (Dict[feature_name, Tensor], Tensor). The result looks like this:

    <PrefetchDataset 
     element_spec=(
       {'feature1': TensorSpec(shape=(None, 1), dtype=tf.float32, name=None), 'feature2': ...}, 
       TensorSpec(shape=(None, 1), dtype=tf.int64, name=None)
      )
    >
    

    Labels can be 0 or 1 and the task is a binary classification task.

    Any idea what I might be doing wrong here?

    Mac OS Monterey, tfdv 0.2.4, python 3.8, tfx 1.7

    opened by AlirezaSadeghi 12
  • decision-forests 1.0.1

    Hi, thanks for releasing TF-DF v1.0.1. Are there plans to re-add support for macOS as well? I only see Linux files here: https://pypi.org/project/tensorflow-decision-forests/#files

    opened by Arnold1 10
  • Can I load and use trained tfdf model in Java?

    Hi, I trained my TF-DF model in Python and want to use it in Java for production. For a conventional NN model, we can load the model with SavedModelBundle and get predictions:

    try (SavedModelBundle b = SavedModelBundle.load("/tmp/model", "serve")) {
    
            // create the session from the Bundle
            Session sess = b.session();
            // create an input Tensor, value = 2.0f
            Tensor x = Tensor.create(
                new long[] {NUM_PREDICTIONS}, 
                FloatBuffer.wrap( new float[] {2.0f} ) 
            );
            
            // run the model
            float[] y = sess.runner()
                .feed("x", x)
                .fetch("y")
                .run()
                .get(0)
                .copyTo(new float[NUM_PREDICTIONS]);
    
            // print out the result.
            System.out.println(y[0]);
        }                
    

    I'm currently trying to use my TF-DF model and wondering whether TF-DF supports loading and inference in Java. Will the model's graph and other useful information be loaded? I'm still trying to load it; does anyone have a clue? Thank you so much!

    question 
    opened by AudreyW0201 9
  • I ran the example, but got an error

    Traceback (most recent call last):
      File "mydf.py", line 29, in <module>
        model.fit(x=train_ds)
      File "/home/wanghaikuan/anaconda3/envs/python37/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py", line 1535, in fit
        class_weight=class_weight)
      File "/home/wanghaikuan/anaconda3/envs/python37/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py", line 1668, in _fit_implementation
        iterator)
      File "/home/wanghaikuan/anaconda3/envs/python37/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py", line 587, in _method_wrapper
        result = method(self, *args, **kwargs)
      File "/home/wanghaikuan/anaconda3/envs/python37/lib/python3.7/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "/home/wanghaikuan/anaconda3/envs/python37/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py", line 1554, in _consumes_training_examples_until_eof
        num_examples += self.train_step(data)
      File "/home/wanghaikuan/anaconda3/envs/python37/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py", line 1027, in train_step
        return self.collect_data_step(data, is_training_example=True)
      File "/home/wanghaikuan/anaconda3/envs/python37/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py", line 1236, in collect_data_step
        if not self._is_trained:
    tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: Using a symbolic tf.Tensor as a Python bool is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.

    My code is:

    import numpy as np
    import pandas as pd
    import tensorflow as tf
    import tensorflow_decision_forests as tfdf

    print("Found TF-DF v" + tfdf.__version__)

    dataset_path = tf.keras.utils.get_file(
        "adult.csv",
        "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/"
        "main/yggdrasil_decision_forests/test_data/dataset/adult.csv")

    dataset_df = pd.read_csv(dataset_path)  # "df" for Pandas's DataFrame.

    print("First 3 examples:")
    print(dataset_df.head(3))

    test_indices = np.random.rand(len(dataset_df)) < 0.30
    test_ds_pd = dataset_df[test_indices]
    train_ds_pd = dataset_df[~test_indices]
    print(f"{len(train_ds_pd)} examples in training"
          f", {len(test_ds_pd)} examples for testing.")

    train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_ds_pd, label="income")
    test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_ds_pd, label="income")

    model = tfdf.keras.RandomForestModel(verbose=2)
    model.fit(x=train_ds)

    I need your help, thanks.

    opened by whk6688 7
  • Checkpointing models during training

    It seems the Keras ModelCheckpoint callback doesn't work with TFDF. Is there an alternative way to create checkpoints during training? I am training on a dataset with tens of millions of samples, and it takes several hours to train. I want to save the progress so that training doesn't need to restart from scratch if it crashes.

    enhancement 
    opened by sibyjackgrove 7
  • Tensorflow decision forests after  update to tf 2.6.0

    There is a problem with tensorflow_decision_forests after updating TensorFlow to version 2.6.0.

    Here is the gist: https://colab.research.google.com/gist/lukebor/70f7abd84d547bf39c4a8b47394e7017/beginner_colab.ipynb

    I used the TensorFlow beginner tutorial and upgraded TF. If there is another way to import tfdf, please let me know.

    bug 
    opened by lukebor 6
  • Shape error when using model.evaluate and model.fit(validation_data=validation_ds)

    Dear authors,

    I used tfdf.keras.pd_dataframe_to_tf_dataset for the train and test sets respectively, after making sure that both contained all 4 classes (a single label for each data point).

    I found that labels in two sets were integer encoded ([0 1 2 3]). I defined:

    train = tfdf.keras.pd_dataframe_to_tf_dataset(df_train, label=label_column_name)
    test = tfdf.keras.pd_dataframe_to_tf_dataset(df_test, label=label_column_name)
    model = RandomForestModel(num_trees=5)
    model.fit(train, validation_data=test)
    

    It raised the error: ValueError: Shapes (None, 4) and (None, 1) are incompatible. Then I moved to this code:

    model.fit(train)
    model.evaluate(test)
    

    It raised the same error: ValueError: Shapes (None, 4) and (None, 1) are incompatible. Then I checked the predictions:

    pred = model.predict(test)
    print(pred[0])
    print(np.unique(pred))
    

    Output:

    [0. 1. 0. 0.]
    [0.  0.2 0.4 0.6 0.8 1. ]
    

    Please help me to fix this error. Thank you so much.
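The printed values are consistent with a five-tree Random Forest in which every tree votes for a single class, so each class probability is a multiple of 1/5. The sketch below is plain Python, not TF-DF code; the vote lists and labels are made up for illustration. It also shows how a (num_examples, 4) probability output can be reduced to class indices with an argmax before being compared with integer labels of shape (num_examples, 1).

```python
from collections import Counter

NUM_CLASSES = 4


def votes_to_probabilities(votes, num_classes=NUM_CLASSES):
    """Convert one class vote per tree into per-class vote fractions."""
    counts = Counter(votes)
    return [counts[c] / len(votes) for c in range(num_classes)]


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical votes of 5 trees for 3 examples.
tree_votes = [
    [1, 1, 1, 1, 1],
    [0, 0, 2, 2, 2],
    [3, 3, 3, 3, 0],
]
# Hypothetical integer labels, one per example, shaped like (num_examples, 1).
labels = [[1], [2], [0]]

probabilities = [votes_to_probabilities(v) for v in tree_votes]
predicted = [argmax(p) for p in probabilities]
accuracy = sum(p == y[0] for p, y in zip(predicted, labels)) / len(labels)

print(probabilities)
print(predicted, accuracy)
```

The point of the comparison step is that the probabilities have one column per class while the labels have a single integer column, so the two can only be compared after the argmax (or after one-hot encoding the labels).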

    opened by mainguyenanhvu 6
  • Shell classes

    Initial folder structure for operator definitions. Defined:

    • base Operator and WindowOperator classes
    • example AssignOperator and SimpleMovingAverage shell classes

    For simplicity at this point, and given we are going to be using pandas for the MVP, we decided to create these aliases:

    • Interval as an alias of pd.Timedelta
    • Sampling as an alias of pd.MultiIndex (with the restriction of the last level of it being a DatetimeIndex)
    • EventSequence as an alias of pd.DataFrame
    opened by ianspektor 1
  • changes to protos

    This PR updates the core.proto definition with latest changes discussed in today's sync.

    Main changes:

    • List of Features in main Processor object.
    • Renamed some messages (EventSequence => Event, FeatureSequence => Feature, Timestamps => Sampling).
    • Unified Input and Output into a single EventArgument used for both inputs and outputs.
    • FeatureSequence and Feature unified into a single Feature, and can now be part of several events.

    Final updated proto diagram: image

    opened by ianspektor 1
  • Contributing Tutorial

    Hi, I'm a Kaggler, and I find tf-decision-forests very useful, so I want to contribute tutorials to the library! Are tutorial contributions being accepted? If yes, which topics are currently on the wishlist?

    opened by shivance 1
  • Plot very large decision tree

    Hi,

    I have a decision tree with 20k nodes. How can I plot it?

    I checked the d3.js code, but with SVG it's pretty slow to render 20k nodes and zoom around.

    Is there a way to generate a Graphviz file too and convert it to a huge PNG so I can view it with https://leafletjs.com/? Or is there a way to draw the decision tree with d3 and canvas instead of SVG?
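As a rough illustration of the Graphviz route, the sketch below writes a tree out as DOT text. It does not use any TF-DF API: the nested-dict tree format (`label`, `children`) and the function name `tree_to_dot` are assumptions made for this example, so a real model would first have to be converted into such a structure. The traversal is iterative so that a tree with tens of thousands of nodes does not run into Python's recursion limit.

```python
def tree_to_dot(root):
    """Serialize a nested-dict tree into Graphviz DOT text."""
    lines = ["digraph tree {", "  node [shape=box];"]
    stack = [(root, 0)]  # pairs of (node, numeric id)
    next_id = 1
    while stack:
        node, node_id = stack.pop()
        # Escape double quotes so the label stays a valid DOT string.
        label = str(node["label"]).replace('"', '\\"')
        lines.append(f'  n{node_id} [label="{label}"];')
        for child in node.get("children", []):
            child_id = next_id
            next_id += 1
            lines.append(f"  n{node_id} -> n{child_id};")
            stack.append((child, child_id))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical three-node tree.
example_tree = {
    "label": "age >= 30",
    "children": [{"label": "yes"}, {"label": "no"}],
}

dot_text = tree_to_dot(example_tree)
print(dot_text)
```

The resulting text can be saved to a file and rendered with the Graphviz command-line tool, for example `dot -Tpng tree.dot -o tree.png`; how well that scales to 20k nodes would need to be tested.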

    opened by Arnold1 6
  • Predictions do not function as documented.

    Prediction with TFDF is extremely under documented.

    According to https://www.tensorflow.org/decision_forests/api_docs/python/tfdf/keras/RandomForestModel#predict you should be able to predict on numpy arrays, tensors, or datasets. Yet any attempt to do so has failed. It seems PrefetchDatasets are the only option.

    On top of this, prediction is dreadfully slow. My current use case is to do ensemble predictions of images. The images are 144k pixels which requires ~20 seconds for one model to make a prediction. Pixelwise predicts with normal TF can be near instantaneous with predict_on_batch which TFDF models are supposed to support. But PrefetchDatasets aren't compatible with it. So the answer is to use Numpy arrays. But that again is incompatible. All of this is said to be supported in the documentation but they appear unimplementable.

    I would like to stick with the TFDF method for my work, but it is unreasonably slow.

    How can I implement faster prediction when it seems it's an under-documented area?

    opened by TheJeran 2
Releases(1.1.0)
  • 1.1.0(Nov 18, 2022)

    1.1.0 - 2022-11-18

    Features

    • Native support for TensorFlow Decision Forests in TensorFlow Serving.
    • Add support for zipped Yggdrasil Decision Forests model for yggdrasil_model_to_keras_model.
    • Added model prediction tutorial.
    • Prevent premature stopping of GBT training through new parameter early_stopping_initial_iteration.

    Fix

    • Using loaded datasets with TF-DF no longer fails (Github #131).
    • Automatically infer the semantic of int8 values as numerical (was categorical before).
    • Build script fixed
    • Model saving no longer fails when using invalid feature names.
    • Added keyword to pandas dataset drop (Github #135).
    Source code(tar.gz)
    Source code(zip)
  • 1.1.0rc2(Nov 10, 2022)

    Features

    • Support for Tensorflow Serving APIs.
    • Add support for zipped Yggdrasil Decision Forests model for yggdrasil_model_to_keras_model.
    • Added model prediction tutorial.
    • Prevent premature stopping of GBT training through new parameter early_stopping_initial_iteration.

    Fix

    • Using loaded datasets with TF-DF no longer fails (Github #131).
    • Automatically infer the semantic of int8 values as numerical (was categorical before).
    • Build script fixed
    • Model saving no longer fails when using invalid feature names.
    • Added keyword to pandas dataset drop (Github #135).
    Source code(tar.gz)
    Source code(zip)
  • serving-1.0.1(Sep 20, 2022)

    Nightly build of TensorFlow Serving 2.11. TensorFlow Serving >=2.11 natively supports TensorFlow Decision Forests models.

    Build instructions:

    git clone https://github.com/tensorflow/serving.git
    docker run -it -v ${PWD}/..:/working_dir -w /working_dir/serving tensorflow/serving:nightly-devel bash
    bazel build //tensorflow_serving/model_servers:tensorflow_model_server
    
    Source code(tar.gz)
    Source code(zip)
    tensorflow_model_server_linux.zip(89.36 MB)
  • 1.0.1(Sep 7, 2022)

    TensorFlow Decision Forests 1.0.1

    With this release, TensorFlow Decision Forests finally reaches its first major release 🥳

    With this milestone we want to communicate more broadly that TensorFlow Decision Forests has become a more stable and mature library. In particular, we established more comprehensive testing to make sure that TF-DF is ready for professional environments.

    Features

    • Add customization of the number of IO threads when using fit_on_dataset_path.

    Fix

    • Improved documentation
    • Improved testing and stability
    • Issue in the application of auditwheel
    Source code(tar.gz)
    Source code(zip)
  • 1.0.0rc0(Aug 26, 2022)

  • 0.2.7(Jul 17, 2022)

    Features

    • Multithreading of the oblique splitter for gradient boosted tree models.
    • Support for pure serving model i.e. model containing only serving data.
    • Add "edit_model" cli tool.

    Fix

    • Remove bias toward low outcome in uplift modeling.
    Source code(tar.gz)
    Source code(zip)
  • serving-0.2.6(Jun 1, 2022)

  • 0.2.5(May 19, 2022)

    Features

    • Adds the contrib module for contributed, non-core functionality.
    • Adds contrib.scikit_learn_model_converter, which facilitates converting Scikit-Learn tree-based models into TF-DF models.
    • Discard hessian splits with score lower than the parents. This change has little effect on the model quality, but it can reduce its size.
    • Add internal flag hessian_split_score_subtract_parent to subtract the parent score in the computation of an hessian split score.
    • Add support for hyper-parameter optimizers (also called tuner).
    • Add text pretty print of trees with tree.pretty() or str(tree).
    • Add support for loading YDF models with file prefixes. Newly created models have a random prefix attached to them. This allows combining multiple models in Keras.
    • Add support for discretized numerical features.
    Source code(tar.gz)
    Source code(zip)
  • 0.2.3(Jan 27, 2022)

    Features

    • Honest Random Forests (also work with Gradient Boosted Tree and CART).
    • Can train Random Forests with example sampling without replacement.
    • Add support for Focal Loss with Gradient Boosted Trees.
    • Add support for MacOS.

    Fixes

    • Incorrect default evaluation of categorical split with uplift tasks. This was making uplift models with missing categorical values perform worse, and made the inference of uplift models possibly slower.
    • Fix pd_dataframe_to_tf_dataset on Pandas dataframe not containing arrays.
    Source code(tar.gz)
    Source code(zip)
    tf_serving_linux.zip(83.06 MB)
  • 0.2.2(Dec 15, 2021)

    Features

    • Surface the validation_interval_in_trees, keep_non_leaf_label_distribution and 'random_seed' hyper-parameters.
    • Add the batch_size argument in the pd_dataframe_to_tf_dataset utility.
    • Automatically determine the number of threads if num_threads=None.
    • Add constructor argument try_resume_training to facilitate resuming training.
    • Check that the training dataset is well configured for TF-DF e.g. no repeat operation, has a large enough batch size, etc. The check can be disabled with check_dataset=False.
    • When a model is created manually with the model builder, and if the dataspec is not provided, tries to adapt the dataspec so that the model looks as if it was trained with the global imputation strategy for missing values (i.e. missing_value_policy: GLOBAL_IMPUTATION). This makes manually created models more likely to be compatible with the fast inference engines.
    • TF-DF models fit method now passes the validation_data to the Yggdrasil learners. This is used for example for early stopping in the case of GBT model.
    • Add the "loss" parameter of the GBT model directly in the model constructor.
    • Control the amount of training logs displayed in the notebook (if using notebook) or in the console with the verbose constructor argument and fit parameter of the model.

    Fixes

    • num_candidate_attributes is not ignored anymore when num_candidate_attributes_ratio=-1.
    • Use the median bucket split value strategy in the discretized numerical splitters (local and distributed).
    • Surface the max_num_scanned_rows_to_accumulate_statistics parameter to control how many examples are scanned to determine the feature statistics when training from a file dataset with fit_on_dataset_path.
    Source code(tar.gz)
    Source code(zip)
  • 0.2.1(Nov 8, 2021)

  • 0.2.0(Nov 1, 2021)

    Features

    • Add advanced option predict_single_probability_for_binary_classification to generate prediction tensors of shape [batch_size, 2] for binary classification model.
    • Add support for weighted training.
    • Add support for permutation variable importance in the GBT learner with the compute_permutation_variable_importance parameter.
    • Support for tf.int8 and tf.int16 values.
    • Support for distributed gradient boosted trees learning. Currently, the TF ParameterServerStrategy distribution strategy is only available in monolithic TF-DF builds. The Yggdrasil Decision Forest GRPC distribute strategy can be used instead.
    • Support for training from dataset stored on disk in CSV and RecordIO format (instead of creating a tensorflow dataset). This option is currently more efficient for distributed training (until the ParameterServerStrategy support per-worker datasets).
    • Add max_vocab_count argument to the model constructor. The existing max_vocab_count argument in FeatureUsage objects take precedence.

    Fixes

    • Fix missing filtering of unique values in the categorical-set training feature accumulator. This was responsible for a small drop in accuracy (e.g. ~0.5% on the SST2 dataset) compared to the C++ API.
    • Fix broken support for max_vocab_count in a FeatureUsage with type CATEGORICAL_SET.
  • 0.1.9(Aug 31, 2021)

    Features

    • Disable tree pruning in the CART algorithm if the validation dataset is empty (i.e. validation_ratio=0).
    • Migration to Tensorflow 2.6. You will see an undefined symbol error if you install this version with a TensorFlow version different than 2.6. Previous versions were compiled for TF 2.5.

    Fixes

    • Fix failure from Github Issue #45 where the wrong field was accessed for leaf node distributions.
    • Fix saving of categorical features specification in the Builder.
  • 0.1.9rc1(Aug 25, 2021)

    Pre-release of 0.1.9

    Major change: TensorFlow 2.6 compatibility

    This release is currently being tested and will soon become the latest version on PyPI. In the meantime, users who need the fixes below can install this version directly from the wheels below, e.g. for Python 3.9: pip install tensorflow_decision_forests-0.1.9-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl

    Fixes

    • Fix failure from Github Issue #45 where the wrong field was accessed for leaf node distributions.

    • Fix incorrect handling of CART pruning when validation set is empty. Previously, the whole tree would be erroneously pruned. Now, pruning is disabled if the validation set is not specified.

    • Fix saving of categorical features specification in the Builder.

    • Migration to Tensorflow 2.6. You will see an undefined symbol error if you install this version with a TensorFlow version different than 2.6. Previous versions were compiled for TF 2.5.

    tensorflow_decision_forests-0.1.9-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl(6.01 MB)
    tensorflow_decision_forests-0.1.9-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.1.whl(6.01 MB)
    tensorflow_decision_forests-0.1.9-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl(6.01 MB)
    tensorflow_decision_forests-0.1.9-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl(6.01 MB)
  • 0.1.8(Jul 29, 2021)

    Features

    • Models can be composed with the functional Keras API before being trained.
    • Makes all the Yggdrasil structural variable importances available.
    • Makes getting the variable importance instantaneous.
    • Surface the name argument in the model classes constructors.
    • Add a postprocessing model constructor argument to easily apply post-processing on the model predictions without relying on the Keras Functional API.
    • Add extract_all_trees method in the model inspector to efficiently extract all the trees.
    • Add num_threads constructor argument to control the number of training threads without using the advanced configuration.
    • By default, remove the temporary directory used to train the model when the model python object is garbage collected.
    • Add the import_dataspec constructor argument to the model builder to import the feature definition and dictionaries (instead of relying on automatic discovery).

    Changes

    • When saving a model in a directory already containing a model, only the assets directory is entirely removed before the export (instead of the entire model directory).

    Fixes

    • Fix wrong label shape in the model inspector's objective field for pre-integerized labels.
  • 0.1.7(Jun 24, 2021)

    Features

    • Add more characters to the non-recommended list of feature name characters.
    • Make the inference op multi-thread compatible.
    • Print an explicit error and some instructions when training a model with a Pandas dataframe.
    • pd_dataframe_to_tf_dataset can automatically rename features to make them compatible with SavedModel export signatures.
    • model.save(...) can overwrite an existing model.
    • The link function of GBT model can be removed. For example, a binary classification GBT model trained with apply_link_function=False will output logits.
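The relation between logits and probabilities can be sketched in plain Python. The example assumes a binary classification GBT model trained with the binomial log-likelihood loss, whose link function is the logistic sigmoid. The logit values are made up and stand in for what such a model would output with apply_link_function=False.

```python
import math


def sigmoid(logit):
    """Logistic link function: maps a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


# Made-up logits, standing in for the raw output of a binary classification
# GBT model trained with apply_link_function=False.
logits = [-2.0, 0.0, 1.5]

# Applying the link function by hand recovers the probabilities.
probabilities = [sigmoid(z) for z in logits]
print(probabilities)
```

Keeping the raw logits can be convenient when the predictions feed into another layer or a custom loss that expects unnormalized scores.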
  • 0.1.6(Jun 8, 2021)

    Features

    • Add hyper-parameter sorting_strategy to disable the computation of the pre-sorted index (slower to train, but consumes less memory).
    • Format wrapper code for colab help display.
    • Raises an error when a feature name is not compatible (e.g. contains a space).
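A check of this kind can be sketched as follows. The helper `check_feature_name` and its `forbidden` parameter are hypothetical. Only the space character is taken from the note above, and the library's actual list of rejected characters is not reproduced here.

```python
def check_feature_name(name, forbidden=(" ",)):
    """Raise ValueError if `name` contains a forbidden character.

    Hypothetical helper: only the space comes from the release note; the
    real list of rejected characters is not reproduced here.
    """
    bad = [char for char in forbidden if char in name]
    if bad:
        raise ValueError(
            f"Feature name {name!r} contains forbidden characters: {bad}"
        )
    return name


print(check_feature_name("petal_length"))

try:
    check_feature_name("petal length")
except ValueError as error:
    print(error)
```

Failing early with an explicit message is easier to debug than a later failure during training or export.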
  • 0.1.5(May 26, 2021)

    Features

    • Raise an error if the number of classes is greater than 100 (can be disabled).
    • Raise an error if the model's task does not match the pd_dataframe_to_tf_dataset's task.

    Bug fix

    • Fix failure when input feature contains commas.
  • 0.1.4(May 21, 2021)

    Features

    • Stop the training when interrupting a colab cell / typing ctrl-c.
    • model.fit supports training callbacks and a validation dataset.

    Bug fix

    • Fix failure when there are no input features.
  • 0.1.2(May 18, 2021)

  • 0.1.0(May 17, 2021)

    Release 0.1.0 (2021-05-11)

    Initial Release of TensorFlow Decision Forests.

    Features

    • Random Forest learner.
    • Gradient Boosted Tree learner.
    • CART learner.
    • Model inspector: Inspect the internal model structure.
    • Model plotter: Plot decision trees.
    • Model builder: Create model "by hand".