A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Overview

Website | Documentation | Tutorials | Installation | Release Notes

GitHub license PyPI version Conda Version GitHub issues Telegram

CatBoost is a machine learning method based on gradient boosting over decision trees.

Main advantages of CatBoost:

Get Started and Documentation

All CatBoost documentation is available here.

Install CatBoost by following the guide for the

Next you may want to investigate:

If you cannot open documentation in your browser try adding yastatic.net and yastat.net to the list of allowed domains in your privacy badger.

Catboost models in production

If you want to evaluate Catboost model in your application read model api documentation.

Questions and bug reports

Help to Make CatBoost Better

  • Check out open problems and help wanted issues to see what can be improved, or open an issue if you want something.
  • Add your stories and experience to Awesome CatBoost.
  • To contribute to CatBoost you need to first read CLA text and add to your pull request, that you agree to the terms of the CLA. More information can be found in CONTRIBUTING.md
  • Instructions for contributors can be found here.

News

Latest news are published on twitter.

Reference Paper

Anna Veronika Dorogush, Andrey Gulin, Gleb Gusev, Nikita Kazeev, Liudmila Ostroumova Prokhorenkova, Aleksandr Vorobev "Fighting biases with dynamic boosting". arXiv:1706.09516, 2017.

Anna Veronika Dorogush, Vasily Ershov, Andrey Gulin "CatBoost: gradient boosting with categorical features support". Workshop on ML Systems at NIPS 2017.

License

© YANDEX LLC, 2017-2019. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.

Comments
  • UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128)

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128)

    Problem:UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128) catboost version: catboost 0.25 Operating System:win10

    When I use setup.py to install Catboost, this error occurs, and if I look closely it is divided into two parts: 1. Using CUDA to create _catboost.pyd will cause an error like 'UnicodeDecodeError:' ASCII 'codec can't decode byte 0xCD in position 9: Ordinal not in range(128). 2. Do not use the CUDA to create _catboost. pyd, there will be "subprocess. CalledProcessError:Command '['D:\anaconda3\python.exe', 'D:\learn\catboost-master\ya', 'make', 'D:\learn\catboost-master\catboost\python-package\..\..\catboost\python-package\catboost', '--no-src-links', '--output', 'D:\ learn\ catboost-master\catboost\python-package\build\temp.win-amd64-3.8\Release', '-dpython_config =python3-config',' -duse_arcadia_python =no', '-dos_sdk =local', '-r','-DNO_DEBUGINFO', '-DHAVE_CUDA= NO '] returned non-zero exit status 1." I also tried converting _catboost.pyx from GitHub to _catboost.pyd using 'python setup.py build_ext --inplace' directly, but I got the same error as when installing CatBoost.

    C:\Users\王普聪>pip install -e D:\learn\catboost-master\catboost\python-package
    Obtaining file:///D:/learn/catboost-master/catboost/python-package
    Requirement already satisfied: graphviz in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (0.16)
    Requirement already satisfied: plotly in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (4.14.3)
    Requirement already satisfied: six in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (1.15.0)
    Requirement already satisfied: matplotlib in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (3.2.2)
    Requirement already satisfied: numpy>=1.16.0 in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (1.18.5)
    Requirement already satisfied: pandas>=0.24 in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (1.0.5)
    Requirement already satisfied: scipy in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (1.5.0)
    Requirement already satisfied: retrying>=1.3.3 in d:\anaconda3\lib\site-packages (from plotly->catboost==0.24.4) (1.3.3)
    Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in d:\anaconda3\lib\site-packages (from matplotlib->catboost==0.24.4) (2.4.7)
    Requirement already satisfied: cycler>=0.10 in d:\anaconda3\lib\site-packages (from matplotlib->catboost==0.24.4) (0.10.0)
    Requirement already satisfied: kiwisolver>=1.0.1 in d:\anaconda3\lib\site-packages (from matplotlib->catboost==0.24.4) (1.2.0)
    Requirement already satisfied: python-dateutil>=2.1 in d:\anaconda3\lib\site-packages (from matplotlib->catboost==0.24.4) (2.8.1)
    Requirement already satisfied: pytz>=2017.2 in d:\anaconda3\lib\site-packages (from pandas>=0.24->catboost==0.24.4) (2020.1)
    Installing collected packages: catboost
      Running setup.py develop for catboost
        ERROR: Command errored out with exit status 1:
         command: 'D:\anaconda3\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'D:\\learn\\catboost-master\\catboost\\python-package\\setup.py'"'"'; __file__='"'"'D:\\learn\\catboost-master\\catboost\\python-package\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
             cwd: D:\learn\catboost-master\catboost\python-package\
        Complete output (159 lines):
        running develop
        15:30:22 I Targeting for CUDA support with C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
        running egg_info
        writing catboost.egg-info\PKG-INFO
        writing dependency_links to catboost.egg-info\dependency_links.txt
        writing requirements to catboost.egg-info\requires.txt
        writing top-level names to catboost.egg-info\top_level.txt
        15:30:24 I Targeting for CUDA support with C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
        reading manifest file 'catboost.egg-info\SOURCES.txt'
        writing manifest file 'catboost.egg-info\SOURCES.txt'
        running build_ext
        15:30:24 I Targeting for CUDA support with C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
        15:30:24 I Buildling _catboost.pyd with ymake
        15:30:24 I EXECUTE: D:\anaconda3\python.exe D:\learn\catboost-master\ya make D:\learn\catboost-master\catboost\python-package\..\..\catboost\python-package\catboost --no-src-links --output D:\learn\catboost-master\catboost\python-package\build\temp.win-amd64-3.8\Release -DPYTHON_CONFIG=python3-config -DUSE_ARCADIA_PYTHON=no -DOS_SDK=local -r -DNO_DEBUGINFO -DHAVE_CUDA=yes "-DCUDA_ROOT=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1"
        Output root is subdirectory of Arcadia root, this may cause non-idempotent build
        Traceback (most recent call last):
          File "devtools/ya/app.py", line 422, in configure_exit_interceptor
            yield
          File "devtools/ya/app.py", line 65, in helper
            return action(args)
          File "devtools/ya/entry/entry.py", line 55, in do_main
            res = handler.handle(handler, args, prefix=['ya'])
          File "devtools/ya/core/handler.py", line 159, in handle
            return handler.handle(self, args[1:], prefix + [name])
          File "devtools/ya/core/dispatch.py", line 37, in handle
            return self.command().handle(root_handler, args, prefix)
          File "devtools/ya/core/handler.py", line 341, in handle
            return self._action(params)
          File "devtools/ya/app.py", line 92, in helper
            return action(ctx.params)
          File "devtools/ya/build/build_handler.py", line 85, in do_ya_make
            builder = ya_make.YaMake(params, app_ctx)
          File "devtools/ya/build/ya_make.py", line 895, in __init__
            self.ctx = Context(self.opts, app_ctx=app_ctx, graph=graph, tests=tests, stripped_tests=stripped_tests, configure_errors=configure_errors, make_files=make_files, lite_graph=lite_graph)
          File "devtools/ya/build/ya_make.py", line 574, in __init__
            self.graph, self.tests, self.stripped_tests, self.configure_errors, self.make_files = _build_graph_and_tests(self.opts, app_ctx)
          File "devtools/ya/build/ya_make.py", line 258, in _build_graph_and_tests
            graph, tests, stripped_tests, gh, make_files = lg.build_graph_and_tests(opts, check=True, ev_listener=ev_listener, display=display)
          File "devtools/ya/build/graph.py", line 1688, in build_graph_and_tests
            return _build_graph_and_tests(opts, check, ev_listener, exit_stack, display)
          File "devtools/ya/build/graph.py", line 1992, in _build_graph_and_tests
            real_ymake_bin = tools.tool('ymake')
          File "devtools/ya/yalibrary/tools/__init__.py", line 220, in tool
            return toolchain.find(name, with_params, for_platform, cache=cache)
          File "devtools/ya/yalibrary/tools/__init__.py", line 158, in find
            executable = cur_bottle[executable_name]  # if executable_name is None it's Ok
          File "devtools/ya/yalibrary/tools/__init__.py", line 64, in __getitem__
            path = self.resolve()
          File "devtools/ya/yalibrary/tools/__init__.py", line 46, in resolve
            return self.__fetcher.fetch_if_need(self.__formula["match"], tared, binname, cache=cache).where
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 385, in fetch_if_need
            self.__c[key] = self._fetch_if_need(*args, **kwargs)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 452, in _fetch_if_need
            if self._fetch(name, tared, lambda x: name.lower() in x.lower(), binname):
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 368, in _fetch
            _install(res_path, do_install)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 104, in _install
            fs_handler(install_guard)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 95, in fs_handler
            func(install_guard)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 350, in do_install
            deploy_params=(UNTAR, resource_info if resource_info else {"file_name": "FILE"}, ""))
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 137, in _deploy_tool
            exts.archive.extract_from_tar(archive, extract_to)
          File "devtools/ya/exts/archive.py", line 16, in extract_from_tar
            archive.extract_tar(tar_file_path, output_dir)
          File "library/python/archive/__init__.py", line 62, in extract_tar
            output_dir = encode(output_dir, ENCODING)
          File "library/python/archive/__init__.py", line 58, in encode
            return value.encode(encoding)
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128)
        15:30:37 E Cannot build _catboost.pyd with CUDA support, will build without CUDA
        15:30:37 I EXECUTE: D:\anaconda3\python.exe D:\learn\catboost-master\ya make D:\learn\catboost-master\catboost\python-package\..\..\catboost\python-package\catboost --no-src-links --output D:\learn\catboost-master\catboost\python-package\build\temp.win-amd64-3.8\Release -DPYTHON_CONFIG=python3-config -DUSE_ARCADIA_PYTHON=no -DOS_SDK=local -r -DNO_DEBUGINFO -DHAVE_CUDA=no
        Output root is subdirectory of Arcadia root, this may cause non-idempotent build
        Traceback (most recent call last):
          File "devtools/ya/app.py", line 422, in configure_exit_interceptor
            yield
          File "devtools/ya/app.py", line 65, in helper
            return action(args)
          File "devtools/ya/entry/entry.py", line 55, in do_main
            res = handler.handle(handler, args, prefix=['ya'])
          File "devtools/ya/core/handler.py", line 159, in handle
            return handler.handle(self, args[1:], prefix + [name])
          File "devtools/ya/core/dispatch.py", line 37, in handle
            return self.command().handle(root_handler, args, prefix)
          File "devtools/ya/core/handler.py", line 341, in handle
            return self._action(params)
          File "devtools/ya/app.py", line 92, in helper
            return action(ctx.params)
          File "devtools/ya/build/build_handler.py", line 85, in do_ya_make
            builder = ya_make.YaMake(params, app_ctx)
          File "devtools/ya/build/ya_make.py", line 895, in __init__
            self.ctx = Context(self.opts, app_ctx=app_ctx, graph=graph, tests=tests, stripped_tests=stripped_tests, configure_errors=configure_errors, make_files=make_files, lite_graph=lite_graph)
          File "devtools/ya/build/ya_make.py", line 574, in __init__
            self.graph, self.tests, self.stripped_tests, self.configure_errors, self.make_files = _build_graph_and_tests(self.opts, app_ctx)
          File "devtools/ya/build/ya_make.py", line 258, in _build_graph_and_tests
            graph, tests, stripped_tests, gh, make_files = lg.build_graph_and_tests(opts, check=True, ev_listener=ev_listener, display=display)
          File "devtools/ya/build/graph.py", line 1688, in build_graph_and_tests
            return _build_graph_and_tests(opts, check, ev_listener, exit_stack, display)
          File "devtools/ya/build/graph.py", line 1992, in _build_graph_and_tests
            real_ymake_bin = tools.tool('ymake')
          File "devtools/ya/yalibrary/tools/__init__.py", line 220, in tool
            return toolchain.find(name, with_params, for_platform, cache=cache)
          File "devtools/ya/yalibrary/tools/__init__.py", line 158, in find
            executable = cur_bottle[executable_name]  # if executable_name is None it's Ok
          File "devtools/ya/yalibrary/tools/__init__.py", line 64, in __getitem__
            path = self.resolve()
          File "devtools/ya/yalibrary/tools/__init__.py", line 46, in resolve
            return self.__fetcher.fetch_if_need(self.__formula["match"], tared, binname, cache=cache).where
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 385, in fetch_if_need
            self.__c[key] = self._fetch_if_need(*args, **kwargs)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 452, in _fetch_if_need
            if self._fetch(name, tared, lambda x: name.lower() in x.lower(), binname):
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 368, in _fetch
            _install(res_path, do_install)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 104, in _install
            fs_handler(install_guard)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 95, in fs_handler
            func(install_guard)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 350, in do_install
            deploy_params=(UNTAR, resource_info if resource_info else {"file_name": "FILE"}, ""))
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 137, in _deploy_tool
            exts.archive.extract_from_tar(archive, extract_to)
          File "devtools/ya/exts/archive.py", line 16, in extract_from_tar
            archive.extract_tar(tar_file_path, output_dir)
          File "library/python/archive/__init__.py", line 62, in extract_tar
            output_dir = encode(output_dir, ENCODING)
          File "library/python/archive/__init__.py", line 58, in encode
            return value.encode(encoding)
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128)
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "D:\learn\catboost-master\catboost\python-package\setup.py", line 259, in <module>
            setup(
          File "D:\anaconda3\lib\site-packages\setuptools\__init__.py", line 153, in setup
            return distutils.core.setup(**attrs)
          File "D:\anaconda3\lib\distutils\core.py", line 148, in setup
            dist.run_commands()
          File "D:\anaconda3\lib\distutils\dist.py", line 966, in run_commands
            self.run_command(cmd)
          File "D:\anaconda3\lib\distutils\dist.py", line 985, in run_command
            cmd_obj.run()
          File "D:\anaconda3\lib\site-packages\setuptools\command\develop.py", line 34, in run
            self.install_for_development()
          File "D:\anaconda3\lib\site-packages\setuptools\command\develop.py", line 136, in install_for_development
            self.run_command('build_ext')
          File "D:\anaconda3\lib\distutils\cmd.py", line 313, in run_command
            self.distribution.run_command(command)
          File "D:\anaconda3\lib\distutils\dist.py", line 985, in run_command
            cmd_obj.run()
          File "D:\learn\catboost-master\catboost\python-package\setup.py", line 186, in run
            self.build_with_ymake(topsrc_dir, build_dir, catboost_ext, put_dir, verbose, dry_run)
          File "D:\learn\catboost-master\catboost\python-package\setup.py", line 219, in build_with_ymake
            logging_execute(ymake_cmd + ['-DHAVE_CUDA=no'], verbose, dry_run)
          File "D:\learn\catboost-master\catboost\python-package\setup.py", line 62, in logging_execute
            subprocess.check_call(cmd, universal_newlines=True)
          File "D:\anaconda3\lib\subprocess.py", line 364, in check_call
            raise CalledProcessError(retcode, cmd)
        subprocess.CalledProcessError: Command '['D:\\anaconda3\\python.exe', 'D:\\learn\\catboost-master\\ya', 'make', 'D:\\learn\\catboost-master\\catboost\\python-package\\..\\..\\catboost\\python-package\\catboost', '--no-src-links', '--output', 'D:\\learn\\catboost-master\\catboost\\python-package\\build\\temp.win-amd64-3.8\\Release', '-DPYTHON_CONFIG=python3-config', '-DUSE_ARCADIA_PYTHON=no', '-DOS_SDK=local', '-r', '-DNO_DEBUGINFO', '-DHAVE_CUDA=no']' returned non-zero exit status 1.
        ----------------------------------------
    ERROR: Command errored out with exit status 1: 'D:\anaconda3\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'D:\\learn\\catboost-master\\catboost\\python-package\\setup.py'"'"'; __file__='"'"'D:\\learn\\catboost-master\\catboost\\python-package\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.
    
    opened by Wangpc-972 67
  • User description is used by default. Move metric creation metric to corresponding class factories.

    User description is used by default. Move metric creation metric to corresponding class factories.

    Each metric now uses user-specified parameters in their descriptions by default.

    Design

    TMetric now stores a TMap<TString, TString> of user parameters, which are used to construct a metric description (e.g. MetricName:key1=value1;key2=value2). This implementation is defined in the base class and is now the default behaviour for building metric descriptions.

    Some of specifiv GetDescription method implementations are kept in order to be consistent with the existing behaviour.

    Note

    UserQuerywiseMetric now uses the options in its representation as well.

    opened by ivanychev 38
  • Sum of shap values does not equal to the prediction

    Sum of shap values does not equal to the prediction

    Problem: Sum of shap values does not equal to the prediction catboost version: 0.18.1 Operating System: Ubuntu 19.10 CPU: i7-8565U

    It only happens sometimes but we find that the of shap values does not equal to the prediction. Please let us know how we can provide further information

    in progress bug 
    opened by hopoluicha 27
  • How catboost handle with big data?

    How catboost handle with big data?

    Hi! I try to use catboost in kaggle competition. https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection The size of my train set about 40m rows with 14 features. When i try to train model, kernel always dies without any errors...

    need info 
    opened by Mechanix12 27
  • Unknown class labels

    Unknown class labels

    I'm beginner using boosting models ,I'm trying to implement catboost . My input data has 6 categorical features and 2 numerical feature . My target variable is numerical data. I'm running on GPU . I'm facing the problem below please help me. Cannot chare data due privacy issue.

    Traceback (most recent call last): File "/work/ilt/css8222/cat_boost/cat_boost.py", line 127, in save_snapshot = True File "/fibus/fs2/15/css8222/.local/lib/python3.6/site-packages/catboost/core.py", line 4718, in fit silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr) File "/fibus/fs2/15/css8222/.local/lib/python3.6/site-packages/catboost/core.py", line 2042, in _fit train_params["init_model"] File "/fibus/fs2/15/css8222/.local/lib/python3.6/site-packages/catboost/core.py", line 1464, in _train self._object._train(train_pool, test_pool, params, allow_clear_pool, init_model._object if init_model else None) File "_catboost.pyx", line 4393, in _catboost._CatBoost._train File "_catboost.pyx", line 4442, in _catboost._CatBoost._train _catboost.CatBoostError: catboost/private/libs/target/target_converter.cpp:226: Unknown class label: "14289"

    opened by sujay003 25
  • Faster SHAP values for small batches

    Faster SHAP values for small batches

    For small batches use direct SHAP values calculation. Direct algorithm (without precalculation) is faster when (where DocumentsNumber < MeanLeafCount), because for preprocessing we find SHAP values for MeanLeafCount documents.

    (algorithm from https://arxiv.org/abs/1802.03888)

    With preprocessing final complexity was O(NT(D+F))+O(TL^2 D^2) where N is the number of documents(objects), T - number of trees, D - average tree depth, F - average number of features in tree, L - average number of leaves in tree. But if the batch is small we can use default algorithm with complexity O(NTLD^2), which is better when N < L.

    Example: On dataset gisette (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html) with 100 first features train CatBoostRegressor(iterations=500, depth=6, random_seed=42) and then use get_feature_importance to find SHAP values for the first object in test.

    Old:

    • 0.32 s

    New:

    • shap_mode="Auto" or "NoPreCalc"- 0.015 s
    • shap_mode="UsePreCalc" - 0.32 s (this is like it was before)

    I hereby agree to the terms of the CLA available at: link

    opened by Lokutrus 25
  • Tutorial for ranking modes in CatBoost

    Tutorial for ranking modes in CatBoost

    Hello.

    Looks like the current version of CatBoost supports learning to rank. There are some clues about it in the documentation, but I couldn't find any minimal working examples. I wonder which methods should be considered as a baseline approach and what are the prerequisites?

    Should we use YetiRank as the training metric and just provide a query id as the Pool group_id parameter? What other CatBoost parameters should be taken into account specifically for a ranking problem?

    Thank you!

    planned documentation 
    opened by hanky 24
  • GPU yields worse metric than CPU

    GPU yields worse metric than CPU

    Problem:various measurements become worse when I switch from CPU to GPU catboost version:0.22 Operating System:Linux 4.4.0-1100-aws x86_64 CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz

    GPU: Tesla M60

    I wanted to reduce the training time and so I specified 'task_type' as 'GPU'. I immediately noticed that its metrics got worse. The only change I made was setting task_type as GPU. The rest are the same.

    The training dataset has 1.2M rows and 218 columns. Among these 218 columns, 42 are categorical features. The rest are floats or integers, no text features. The validation dataset has 120K rows.

    The following are the parameters for the CPU version: {'nan_mode': 'Min', 'eval_metric': 'Logloss', 'combinations_ctr': ['Borders:CtrBorderCount=15:CtrBorderType=Uniform:TargetBorderCount=1:TargetBorderType=MinEntropy:Prior=0/1:Prior=0.5/1:Prior=1/1', 'Counter:CtrBorderCount=15:CtrBorderType=Uniform:Prior=0/1'], 'iterations': 1000, 'sampling_frequency': 'PerTree', 'fold_permutation_block': 0, 'leaf_estimation_method': 'Newton', 'od_pval': 0, 'counter_calc_method': 'SkipTest', 'grow_policy': 'SymmetricTree', 'boosting_type': 'Plain', 'model_shrink_mode': 'Constant', 'feature_border_type': 'GreedyLogSum', 'ctr_leaf_count_limit': 18446744073709551615, 'bayesian_matrix_reg': 0.10000000149011612, 'one_hot_max_size': 2, 'l2_leaf_reg': 3, 'random_strength': 1, 'od_type': 'Iter', 'rsm': 1, 'boost_from_average': False, 'max_ctr_complexity': 4, 'model_size_reg': 0.5, 'simple_ctr': ['Borders:CtrBorderCount=15:CtrBorderType=Uniform:TargetBorderCount=1:TargetBorderType=MinEntropy:Prior=0/1:Prior=0.5/1:Prior=1/1', 'Counter:CtrBorderCount=15:CtrBorderType=Uniform:Prior=0/1'], 'subsample': 0.800000011920929, 'use_best_model': True, 'od_wait': 35, 'class_names': [0, 1], 'random_seed': 42, 'depth': 6, 'ctr_target_border_count': 1, 'has_time': False, 'store_all_simple_ctr': False, 'border_count': 254, 'classes_count': 0, 'sparse_features_conflict_fraction': 0, 'leaf_estimation_backtracking': 'AnyImprovement', 'best_model_min_trees': 1, 'model_shrink_rate': 0, 'min_data_in_leaf': 1, 'loss_function': 'Logloss', 'learning_rate': 0.30000001192092896, 'score_function': 'Cosine', 'task_type': 'CPU', 'leaf_estimation_iterations': 10, 'bootstrap_type': 'MVS', 'max_leaves': 64, 'permutation_count': 4}

    The following are the parameters for the GPU version: {'nan_mode': 'Min', 'gpu_ram_part': 0.95, 'eval_metric': 'Logloss', 'combinations_ctr': ['Borders:CtrBorderCount=15:CtrBorderType=Uniform:TargetBorderCount=1:TargetBorderType=MinEntropy:Prior=0/1:Prior=0.5/1:Prior=1/1', 'FeatureFreq:CtrBorderCount=15:CtrBorderType=Median:Prior=0/1'], 'iterations': 1000, 'fold_permutation_block': 64, 'leaf_estimation_method': 'Newton', 'observations_to_bootstrap': 'TestOnly', 'od_pval': 0, 'counter_calc_method': 'SkipTest', 'grow_policy': 'SymmetricTree', 'boosting_type': 'Plain', 'ctr_history_unit': 'Sample', 'feature_border_type': 'GreedyLogSum', 'bayesian_matrix_reg': 0.10000000149011612, 'one_hot_max_size': 2, 'devices': '-1', 'pinned_memory_bytes': '104857600', 'l2_leaf_reg': 3, 'random_strength': 1, 'od_type': 'Iter', 'rsm': 1, 'boost_from_average': False, 'fold_size_loss_normalization': False, 'max_ctr_complexity': 4, 'gpu_cat_features_storage': 'GpuRam', 'simple_ctr': ['Borders:CtrBorderCount=15:CtrBorderType=Uniform:TargetBorderCount=1:TargetBorderType=MinEntropy:Prior=0/1:Prior=0.5/1:Prior=1/1', 'FeatureFreq:CtrBorderCount=15:CtrBorderType=MinEntropy:Prior=0/1'], 'use_best_model': True, 'od_wait': 35, 'class_names': [0, 1], 'random_seed': 42, 'depth': 6, 'ctr_target_border_count': 1, 'has_time': False, 'border_count': 128, 'min_fold_size': 100, 'data_partition': 'FeatureParallel', 'bagging_temperature': 1, 'classes_count': 0, 'leaf_estimation_backtracking': 'AnyImprovement', 'best_model_min_trees': 1, 'min_data_in_leaf': 1, 'add_ridge_penalty_to_loss_function': False, 'loss_function': 'Logloss', 'learning_rate': 0.30000001192092896, 'score_function': 'Cosine', 'task_type': 'GPU', 'leaf_estimation_iterations': 10, 'bootstrap_type': 'Bayesian', 'max_leaves': 64, 'permutation_count': 4}

    opened by kdlin 23
  • Using parameters from saved model for cross-validation leads to 'exclusive parameters' error.

    Using parameters from saved model for cross-validation leads to 'exclusive parameters' error.

    Problem: "Only one of parameters ['verbose', 'logging_level', 'verbose_eval', 'silent'] should be set" printed by cv function after loading from file previously saved model. catboost version: 0.12.2 Operating System: CentOS Linux release 7.4.1708 CPU: Intel(R) Xeon(R) CPU E5-2450 v2 @ 2.50GHz

    model = CatBoostClassifier(loss_function='MultiClass')
    model.fit(train_pool, 
      verbose=False, 
      plot=True,
      eval_set=validation_pool)
    model.save_model(str(model_path.absolute()))
    model = CatBoostClassifier()
    model.load_model(str(model_path.absolute()))
    cv_data = cv(
        whole_pool,
        params=model.get_params()
    )
    
    ---------------------------------------------------------------------------
    CatboostError                             Traceback (most recent call last)
    <ipython-input-40-f150897615b8> in <module>
          1 cv_data = cv(
          2     whole_pool,
    ----> 3     params=model.get_params()
          4 )
    
    ~/.conda/envs/catboost/lib/python3.6/site-packages/catboost/core.py in cv(pool, params, dtrain, iterations, num_boost_round, fold_count, nfold, inverted, partition_random_seed, seed, shuffle, logging_level, stratified, as_pandas, metric_period, verbose, verbose_eval, plot, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, max_time_spent_on_fixed_cost_ratio, dev_max_iterations_batch_size)
       2876 
       2877     params = deepcopy(params)
    -> 2878     _process_synonyms(params)
       2879 
       2880     metric_period, verbose, logging_level = _process_verbose(metric_period, verbose, logging_level, verbose_eval)
    
    ~/.conda/envs/catboost/lib/python3.6/site-packages/catboost/core.py in _process_synonyms(params)
        754         del params['silent']
        755 
    --> 756     metric_period, verbose, logging_level = _process_verbose(metric_period, verbose, logging_level, verbose_eval, silent)
        757 
        758     if metric_period is not None:
    
    ~/.conda/envs/catboost/lib/python3.6/site-packages/catboost/core.py in _process_verbose(metric_period, verbose, logging_level, verbose_eval, silent)
        133     at_most_one = sum(params.get(exclusive) is not None for exclusive in exclusive_params)
        134     if at_most_one > 1:
    --> 135         raise CatboostError('Only one of parameters {} should be set'.format(exclusive_params))
        136 
        137     if verbose is None:
    
    CatboostError: Only one of parameters ['verbose', 'logging_level', 'verbose_eval', 'silent'] should be set
    
    bug 
    opened by protsenkovi 23
  • Flag not copied unnecessarily with blank and whitespace

    Flag not copied unnecessarily with blank and whitespace

    Before submitting a pull request, please do the following steps:

    1. Read instructions for contributors here.
    2. Run ya make in catboost folder to make sure the code builds.
    3. Add tests that test your change.
    4. Run tests using ya make -t -A command.
    5. If you haven't already, complete the CLA. I hereby agree to the terms of the CLA available at https://yandex.ru/legal/cla/?lang=en.
    opened by sharaalfa 23
  • Issue trying to compile with specified gcc version

    Issue trying to compile with specified gcc version

    I'm trying to compile the catboost python wheel on my system. The default gcc version I have is 8, but I also have 7 installed so I'm trying to use that by setting the CC and CXX environment variables. However, when running:

    python mk_wheel.py -DCUDA_ROOT="/opt/cuda"
    

    I get the message:

    Info: Attention! Using system user-defined compiler: g++-7 (check CC and CXX env vars).
    Cross compilation with system CXX is not supported
    

    catboost version: git master Operating System: Linux CPU: i7 GPU: GTX 1080

    Thanks!

    build issues 
    opened by ctlaltdefeat 23
  • Prediction probability result mismatch - C API and Python

    Prediction probability result mismatch - C API and Python

    Problem: We used the Python API of catboost to train our multiclass classification model and the resultant .cbm model was used in python / C to do the prediction.

    I noticed that when making inferences using the same model and the same input data (the model expects 3 float features and 4 categorical features.), the prediction probability in Python is slightly different compared to the prediction probability using the C API.

    We use CatboostClassifier.predict_proba in Python with all default parameters, and we set SetPredictionType(modelHandle, APT_PROBABILITY); in C API.

    We found that the sum total of the probabilities returned in Python are always different from 1 (sometimes it is greater or less than 1), and in the case of the probabilities returned in C the sum of them is always equal to 1.

    We do not know if both ways to get the probability are the same (CatboostClassifier.predict_proba and SetPredictionType(modelHandle, APT_PROBABILITY);), but if they are the same, why is the result different?

    catboost version: 1.0.3

    Operating System: MacOS Ventura 13.1

    CPU: Apple M1

    opened by eli3xm 0
  • Spark Feature Importance issue

    Spark Feature Importance issue

    Problem: ai.catboost.CatBoostError: Unsupported data type for Label at ai.catboost.spark.DatasetLoadingContext$.getLabelCallback(DataHelpers.scala:465) catboost version: 1.1.1 Operating System: Linux, Spark 3.3.1

    The following method call fails with the error described above:

    ((CatBoostClassificationModel) model).getFeatureImportance(EFstrType.LossFunctionChange, evalPool, ECalcTypeShapValues.Regular)
    
    opened by eugene-kamenev 0
  •  Saved model's params are different from current model's params

    Saved model's params are different from current model's params

    Problem: Can't fit models on GPU, Saved model's params are different from current model's params catboost version: '1.1.1' Operating System: Windows 10 CPU: 0 GPU: 1

    model_cat_tm_1 = CatBoostClassifier( iterations=5000, loss_function ='Logloss', #eval_metric = 'AUC', learning_rate = 0.05, random_seed = 1, od_type = "Iter", od_wait = 200, depth = 5, task_type = "GPU", devices = '0:1', save_snapshot= False, )

    cv_params_tm_1 = model_cat_tm_1.get_params() cv_data_tm_1 = cv( Pool(train_tm_treatment_one_features, train_tm_treatment_one_target), cv_params_tm_1, plot=True, verbose=100, )

    Gettting this error (tried, rebooting the system, open another script - doesn't help)

    Training on fold [0/3]

    CatBoostError Traceback (most recent call last) ~\AppData\Local\Temp\ipykernel_3516\715857703.py in 1 cv_params_tm_1 = model_cat_tm_1.get_params() ----> 2 cv_data_tm_1 = cv( 3 Pool(train_tm_treatment_one_features, train_tm_treatment_one_target), 4 cv_params_tm_1, 5 plot=True,

    ~\AppData\Roaming\Python\Python39\site-packages\catboost\core.py in cv(pool, params, dtrain, iterations, num_boost_round, fold_count, nfold, inverted, partition_random_seed, seed, shuffle, logging_level, stratified, as_pandas, metric_period, verbose, verbose_eval, plot, plot_file, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, metric_update_interval, folds, type, return_models, log_cout, log_cerr) 6648 with log_fixup(log_cout, log_cerr), plot_wrapper(plot, plot_file=plot_file, plot_title='Cross-validation plot', train_dirs=plot_dirs): 6649 if not return_models: -> 6650 return _cv(params, pool, fold_count, inverted, partition_random_seed, shuffle, stratified, 6651 metric_update_interval, as_pandas, folds, type, return_models) 6652 else:

    _catboost.pyx in _catboost._cv()

    _catboost.pyx in _catboost._cv()

    CatBoostError: C:/Program Files (x86)/Go Agent/pipelines/BuildMaster/catboost.git/catboost/cuda/methods/boosting_progress_tracker.cpp:171: Saved model's params are different from current model's params

    opened by MiMakh 0
  • Catboost spark fit error java.lang.ClassCastException

    Catboost spark fit error java.lang.ClassCastException

    Problem: net.razorvine.pickle.objects.TimeDelta cannot be cast to java.time.Duration catboost version: 1.0.6 Operating System: 10.4 LTS ML (includes Apache Spark 3.2.1, Scala 2.12)

    Hi, I'm trying to test catboost_spark in a Databricks notebook using the example from the official documentation: https://catboost.ai/en/docs/concepts/spark-quickstart-python#binary-classification

    When I run this command:

    classifier.fit(dataset=trainPool, evalDatasets=[evalPool])
    

    The following error is raised:

    java.lang.ClassCastException: net.razorvine.pickle.objects.TimeDelta cannot be cast to java.time.Duration
    
    ...
    
    Py4JJavaError: An error occurred while calling o18779.w.
    : java.lang.ClassCastException: net.razorvine.pickle.objects.TimeDelta cannot be cast to java.time.Duration
    	at ai.catboost.spark.params.DurationParam.w(Helpers.scala:61)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    	at py4j.Gateway.invoke(Gateway.java:295)
    	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    	at py4j.commands.CallCommand.execute(CallCommand.java:79)
    	at py4j.GatewayConnection.run(GatewayConnection.java:251)
    	at java.lang.Thread.run(Thread.java:748)
    

    I believe there is a similar issue to this but it is now closed. Thank you in advance for the help.

    opened by vitormanita 0
  • parameter missing for non_linear regression

    parameter missing for non_linear regression

    Problem: Non Linear Regression "Poly" Kernal parameter missing catboost version: 0.26.1 Operating System: Linux CPU:True GPU:False

    Hi there, I am training a model for linear regression problem but my data has non-linear in nature. So I have decided to change kernel like Poly or something for non_linear that we have Support Vector Regressor. I have tried searching for same in Catboost parameters but i couldn't get. Do you have plans for adding it? Thanks

    opened by hamza1424 0
Releases(v1.1.1)
Owner
CatBoost
CatBoost is a fast, scalable, high performance gradient boosting on decision trees library. Used for ranking, classification, regression and other ML tasks.
CatBoost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

eXtreme Gradient Boosting Community | Documentation | Resources | Contributors | Release Notes XGBoost is an optimized distributed gradient boosting l

Distributed (Deep) Machine Learning Community 23.6k Jan 3, 2023
Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

Python Extreme Learning Machine (ELM) Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

Augusto Almeida 84 Nov 25, 2022
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.

What is xLearn? xLearn is a high performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machin

Chao Ma 3k Jan 8, 2023
MooGBT is a library for Multi-objective optimization in Gradient Boosted Trees.

MooGBT is a library for Multi-objective optimization in Gradient Boosted Trees. MooGBT optimizes for multiple objectives by defining constraints on sub-objective(s) along with a primary objective. The constraints are defined as upper bounds on sub-objective loss function. MooGBT uses a Augmented Lagrangian(AL) based constrained optimization framework with Gradient Boosted Trees, to optimize for multiple objectives.

Swiggy 66 Dec 6, 2022
An open-source library of algorithms to analyse time series in GPU and CPU.

An open-source library of algorithms to analyse time series in GPU and CPU.

Shapelets 216 Dec 30, 2022
LibRerank is a toolkit for re-ranking algorithms. There are a number of re-ranking algorithms, such as PRM, DLCM, GSF, miDNN, SetRank, EGRerank, Seq2Slate.

LibRerank LibRerank is a toolkit for re-ranking algorithms. There are a number of re-ranking algorithms, such as PRM, DLCM, GSF, miDNN, SetRank, EGRer

null 126 Dec 28, 2022
STUMPY is a powerful and scalable Python library for computing a Matrix Profile, which can be used for a variety of time series data mining tasks

STUMPY STUMPY is a powerful and scalable library that efficiently computes something called the matrix profile, which can be used for a variety of tim

TD Ameritrade 2.5k Jan 6, 2023
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

Ray provides a simple, universal API for building distributed applications. Ray is packaged with the following libraries for accelerating machine lear

null 23.3k Dec 31, 2022
Uber Open Source 1.6k Dec 31, 2022
TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models.

TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models. The library is a collection of Keras models and supports classification, regression and ranking. TF-DF is a TensorFlow wrapper around the Yggdrasil Decision Forests C++ libraries. Models trained with TF-DF are compatible with Yggdrasil Decision Forests' models, and vice versa.

null 538 Jan 1, 2023
Houseprices - Predict sales prices and practice feature engineering, RFs, and gradient boosting

House Prices - Advanced Regression Techniques Predicting House Prices with Machine Learning This project is build to enhance my knowledge about machin

null 1 Jan 1, 2022
Bonsai: Gradient Boosted Trees + Bayesian Optimization

Bonsai is a wrapper for the XGBoost and Catboost model training pipelines that leverages Bayesian optimization for computationally efficient hyperparameter tuning.

null 24 Oct 27, 2022
Decision Tree Regression algorithm implemented on Python from scratch.

Decision_Tree_Regression I implemented the decision tree regression algorithm on Python. Unlike regular linear regression, this algorithm is used when

null 1 Dec 22, 2021
AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications.

AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy machine learning and deep learning models tabular data.

Robin 55 Dec 27, 2022
AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications.

AutoTabular AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just

wenqi 2 Jun 26, 2022
High performance implementation of Extreme Learning Machines (fast randomized neural networks).

High Performance toolbox for Extreme Learning Machines. Extreme learning machines (ELM) are a particular kind of Artificial Neural Networks, which sol

Anton Akusok 174 Dec 7, 2022
mlpack: a scalable C++ machine learning library --

a fast, flexible machine learning library Home | Documentation | Doxygen | Community | Help | IRC Chat Download: current stable version (3.4.2) mlpack

mlpack 4.2k Jan 1, 2023
Management of exclusive GPU access for distributed machine learning workloads

TensorHive is an open source tool for managing computing resources used by multiple users across distributed hosts. It focuses on granting

Paweł Rościszewski 131 Dec 12, 2022
Simple, fast, and parallelized symbolic regression in Python/Julia via regularized evolution and simulated annealing

Parallelized symbolic regression built on Julia, and interfaced by Python. Uses regularized evolution, simulated annealing, and gradient-free optimization.

Miles Cranmer 924 Jan 3, 2023