mlpack: a scalable C++ machine learning library

Overview

mlpack: a fast, flexible machine learning library

Download: current stable version (3.4.2)

mlpack is an intuitive, fast, and flexible C++ machine learning library with bindings to other languages. It is meant to be a machine learning analog to LAPACK, and aims to implement a wide array of machine learning methods and functions as a "swiss army knife" for machine learning researchers. In addition to its powerful C++ interface, mlpack also provides command-line programs, Python bindings, Julia bindings, Go bindings and R bindings.

mlpack uses an open governance model and is fiscally sponsored by NumFOCUS. Consider making a tax-deductible donation to help the project pay for developer time, professional services, travel, workshops, and a variety of other needs.


0. Contents

  1. Introduction
  2. Citation details
  3. Dependencies
  4. Building mlpack from source
  5. Running mlpack programs
  6. Using mlpack from Python
  7. Further documentation
  8. Bug reporting

1. Introduction

The mlpack website, https://www.mlpack.org, contains numerous tutorials and extensive documentation. This README describes what mlpack is, how to install it, how to run it, and where to find more documentation. Consult the website for further information.

2. Citation details

If you use mlpack in your research or software, please cite mlpack using the citation below (given in BibTeX format):

@article{mlpack2018,
    title     = {mlpack 3: a fast, flexible machine learning library},
    author    = {Curtin, Ryan R. and Edel, Marcus and Lozhnikov, Mikhail and
                 Mentekidis, Yannis and Ghaisas, Sumedh and Zhang,
                 Shangtong},
    journal   = {Journal of Open Source Software},
    volume    = {3},
    issue     = {26},
    pages     = {726},
    year      = {2018},
    doi       = {10.21105/joss.00726},
    url       = {https://doi.org/10.21105/joss.00726}
}

Citations are beneficial for the growth and improvement of mlpack.

3. Dependencies

mlpack has the following dependencies:

  Armadillo      >= 8.400.0
  Boost (math_c99, spirit) >= 1.58.0
  CMake          >= 3.2.2
  ensmallen      >= 2.10.0
  cereal         >= 1.1.2

All of those should be available in your distribution's package manager. If not, you will have to compile each of them by hand. See the documentation for each of those packages for more information.

If you would like to use or build the mlpack Python bindings, make sure that the following Python packages are installed:

  setuptools
  cython >= 0.24
  numpy
  pandas >= 0.15.0

If you would like to build the Julia bindings, make sure that Julia >= 1.3.0 is installed.

If you would like to build the Go bindings, make sure that Go >= 1.11.0 is installed with this package:

 Gonum

If you would like to build the R bindings, make sure that R >= 4.0 is installed with these R packages:

 Rcpp >= 0.12.12
 RcppArmadillo >= 0.8.400.0
 RcppEnsmallen >= 0.2.10.0
 BH >= 1.58
 roxygen2

If the STB library headers are available, image loading support will be compiled.

If you are compiling Armadillo by hand, ensure that LAPACK and BLAS are enabled.

4. Building mlpack from source

This section discusses how to build mlpack from source. These build directions will work for any Linux-like shell environment (for example Ubuntu, macOS, FreeBSD, etc.). However, mlpack is in the repositories of many Linux distributions, so it may be easier to use the package manager for your system. For example, on Ubuntu, you can install the mlpack library and command-line executables (e.g. mlpack_pca, mlpack_kmeans, etc.) with the following command:

$ sudo apt-get install libmlpack-dev mlpack-bin

On Fedora or Red Hat (EPEL):

$ sudo dnf install mlpack-devel mlpack-bin

Note: Older Ubuntu versions may not have the most recent version of mlpack available; for instance, at the time of this writing, Ubuntu 16.04 only has mlpack 2.0.1 available. Options include upgrading your Ubuntu version, finding a PPA or other non-official source, or installing with a manual build.

There are some useful pages to consult in addition to this section:

mlpack uses CMake as a build system and allows several flexible build configuration options. You can consult any of the CMake tutorials for further documentation, but this tutorial should be enough to get mlpack built and installed.

First, unpack the mlpack source and change into the unpacked directory. Here we use mlpack-x.y.z where x.y.z is the version.

$ tar -xzf mlpack-x.y.z.tar.gz
$ cd mlpack-x.y.z

Then, make a build directory. The directory can have any name, but 'build' is sufficient.

$ mkdir build
$ cd build

The next step is to run CMake to configure the project. Running CMake is the equivalent to running ./configure with autotools. If you run CMake with no options, it will configure the project to build with no debugging symbols and no profiling information:

$ cmake ../

Options can be specified to compile with debugging information and profiling information:

$ cmake -D DEBUG=ON -D PROFILE=ON ../

Options are specified with the -D flag. The allowed options include:

DEBUG=(ON/OFF): compile with debugging symbols
PROFILE=(ON/OFF): compile with profiling symbols
ARMA_EXTRA_DEBUG=(ON/OFF): compile with extra Armadillo debugging symbols
BOOST_ROOT=(/path/to/boost/): path to root of boost installation
ARMADILLO_INCLUDE_DIR=(/path/to/armadillo/include/): path to Armadillo headers
ARMADILLO_LIBRARY=(/path/to/armadillo/libarmadillo.so): Armadillo library
BUILD_CLI_EXECUTABLES=(ON/OFF): whether or not to build command-line programs
BUILD_PYTHON_BINDINGS=(ON/OFF): whether or not to build Python bindings
PYTHON_EXECUTABLE=(/path/to/python_version): Path to specific Python executable
PYTHON_INSTALL_PREFIX=(/path/to/python/): Path to root of Python installation
BUILD_JULIA_BINDINGS=(ON/OFF): whether or not to build Julia bindings
JULIA_EXECUTABLE=(/path/to/julia): Path to specific Julia executable
BUILD_GO_BINDINGS=(ON/OFF): whether or not to build Go bindings
GO_EXECUTABLE=(/path/to/go): Path to specific Go executable
BUILD_GO_SHLIB=(ON/OFF): whether or not to build shared libraries required by Go bindings
BUILD_R_BINDINGS=(ON/OFF): whether or not to build R bindings
R_EXECUTABLE=(/path/to/R): Path to specific R executable
BUILD_TESTS=(ON/OFF): whether or not to build tests
BUILD_SHARED_LIBS=(ON/OFF): compile shared libraries as opposed to
   static libraries
DISABLE_DOWNLOADS=(ON/OFF): whether to disable all downloads during build
DOWNLOAD_ENSMALLEN=(ON/OFF): If ensmallen is not found, download it
ENSMALLEN_INCLUDE_DIR=(/path/to/ensmallen/include): path to include directory
   for ensmallen
DOWNLOAD_STB_IMAGE=(ON/OFF): If STB is not found, download it
STB_IMAGE_INCLUDE_DIR=(/path/to/stb/include): path to include directory for
   STB image library
USE_OPENMP=(ON/OFF): whether or not to use OpenMP if available
BUILD_DOCS=(ON/OFF): build Doxygen documentation, if Doxygen is available
   (default ON)

Other tools can also be used to configure CMake, but those are not documented here. See this section of the build guide for more details, including a full list of options, and their default values.

By default, command-line programs will be built, and if the Python dependencies (Cython, setuptools, numpy, pandas) are available, then Python bindings will also be built. OpenMP will be used for parallelization when possible by default.

Once CMake is configured, building the library is as simple as typing 'make'. This will build all library components as well as 'mlpack_test'.

$ make

If you do not want to build everything in the library, individual components of the build can be specified:

$ make mlpack_pca mlpack_knn mlpack_kfn

If the build fails and you cannot figure out why, register an account on GitHub and submit an issue. The mlpack developers will quickly help you figure it out:

mlpack on GitHub

Alternately, mlpack help can be found in IRC at #mlpack on chat.freenode.net.

If you wish to install mlpack to /usr/local/include/mlpack/, /usr/local/lib/, and /usr/local/bin/, make sure you have root privileges (or write permissions to those three directories), and simply type

$ make install

You can now run the executables by name and link against mlpack with -lmlpack. The mlpack headers are found in /usr/local/include/mlpack/, and if the Python bindings were built, you can access them through the mlpack package in Python.

If running the programs (e.g. $ mlpack_knn -h) gives an error of the form

error while loading shared libraries: libmlpack.so.2: cannot open shared object file: No such file or directory

then be sure that the runtime linker is searching the directory where libmlpack.so was installed (probably /usr/local/lib/ unless you set it manually). One way to do this, on Linux, is to ensure that the LD_LIBRARY_PATH environment variable has the directory that contains libmlpack.so. Using bash, this can be set easily:

export LD_LIBRARY_PATH="/usr/local/lib/:$LD_LIBRARY_PATH"

(or whatever directory libmlpack.so is installed in.)

5. Running mlpack programs

After building mlpack, the executables will reside in build/bin/. You can call them from there, or you can install the library and (depending on system settings) they should be added to your PATH and you can call them directly. The documentation below assumes the executables are in your PATH.

Consider the 'mlpack_knn' program, which, for every point in a query set, finds the k nearest neighbors in a reference dataset. That is, we have a query dataset and a reference dataset. For each point in the query dataset, we wish to know the k points in the reference dataset which are closest to the given query point.

Alternately, if the query and reference datasets are the same, the problem can be stated more simply: for each point in the dataset, we wish to know the k nearest points to that point.
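
To make the problem statement concrete, here is a minimal brute-force sketch in plain Python. It only illustrates what is being computed; it is not how mlpack_knn is implemented, and mlpack's own search is considerably more efficient. The function name brute_force_knn and the toy points are hypothetical.

```python
import math

def brute_force_knn(reference, query, k):
    """For each query point, return the indices and distances of its
    k nearest reference points (Euclidean distance)."""
    neighbors, distances = [], []
    for q in query:
        # Distance from this query point to every reference point,
        # sorted from nearest to farthest.
        scored = sorted((math.dist(q, r), i) for i, r in enumerate(reference))
        nearest = scored[:k]
        neighbors.append([i for _, i in nearest])
        distances.append([d for d, _ in nearest])
    return neighbors, distances

reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
query = [(0.0, 0.1), (4.0, 4.0)]
neighbors, distances = brute_force_knn(reference, query, k=2)
print(neighbors)  # [[0, 1], [3, 2]]
```

Note that if the same list is passed as both reference and query, this naive version reports each point as its own nearest neighbor at distance 0; handling that case is left out for brevity.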

Each mlpack program has extensive help documentation which details what the method does, what each of the parameters is, and how to use them:

$ mlpack_knn --help

Running mlpack_knn on one dataset (that is, the query and reference datasets are the same) and finding the 5 nearest neighbors is very simple:

$ mlpack_knn -r dataset.csv -n neighbors_out.csv -d distances_out.csv -k 5 -v

The -v (--verbose) flag is optional; it gives informational output. It is not unique to mlpack_knn but is available in all mlpack programs. Verbose output also gives timing output at the end of the program, which can be very useful.

6. Using mlpack from Python

If mlpack is installed to the system, then the mlpack Python bindings should be automatically in your PYTHONPATH, and importing mlpack functionality into Python should be very simple:

>>> from mlpack import knn

Accessing help is easy:

>>> help(knn)

The API is similar to the command-line programs. So, running knn() (k-nearest-neighbor search) on a numpy matrix named dataset and finding the 5 nearest neighbors is very simple:

>>> output = knn(reference=dataset, k=5, verbose=True)

This will store the output neighbors in output['neighbors'] and the output distances in output['distances']. Other mlpack bindings function similarly, and the input/output parameters exactly match those of the command-line programs.

7. Further documentation

The documentation given here is only a fraction of the available documentation for mlpack. If doxygen is installed, you can type make doc to build the documentation locally. Alternately, up-to-date documentation is available for older versions of mlpack:

8. Bug reporting

(see also mlpack help)

If you find a bug in mlpack or have any problems, numerous routes are available for help.

GitHub is used for bug tracking, and the repository can be found at https://github.com/mlpack/mlpack/. It is easy to register an account and file a bug there, and the mlpack development team will try to resolve your issue quickly.

In addition, mailing lists are available. The mlpack discussion list is available at

mlpack discussion list

and the git commit list is available at

commit list

Lastly, the IRC channel #mlpack on Freenode can be used to get help.

Comments
  • [GSoC] Augmented RNN models - benchmarking framework

    This PR is part of my GSoC project "Augmented RNNs". Implemented:

    • class CopyTask for evaluating models on the sequence copy problem, showcasing benchmarking framework;
    • unit test for it (a simple non-ML model that is hardcoded to copy the sequence required number of times is expected to ace the CopyTask).
    opened by sidorov-ks 102
  • Swap boost::variant with vtable.

    I updated the abstract class and also updated the Linear layer as an example. There are various layers we have to update, so if anybody would like to work on some of the layers listed below, comment on the PR. Unfortunately, I can't grant commit permission to a specific branch. To get your changes in, fork the repository as usual, create a new feature branch, and make the changes; then, instead of opening another PR, post the link to the branch here and I will cherry-pick the commit.

    Steps:

    1. Inherit from the Layer class. Each layer should implement the functions relevant to its layer-specific computations and inherit the rest from the base class.
    2. Rename InputDataType to InputType and OutputDataType to OutputType, to make the interface more consistent with the rest of the codebase.
    3. Use InputType and OutputType instead of arma::mat or arma::Mat<eT>; to make the layer work with the abstract class, we have to follow its interface.
    4. Provide a default layer type to hide some of the template functionality that could be confusing for users who aren’t familiar with templates. Instead of writing Linear<> all the time, a user can just write Linear. This is a result of https://github.com/mlpack/mlpack/issues/2524#issuecomment-664776530.
    5. Update the tests to use the updated interface.

    Example: For an example, check out the Linear layer.

    Here is a list of layers we have to update:

    • [x] adaptive_max_pooling.hpp - @Aakash-kaushik
    • [x] adaptive_mean_pooling.hpp - @Aakash-kaushik
    • [x] add.hpp - @Aakash-kaushik
    • [x] add_merge.hpp - @Aakash-kaushik
    • [x] alpha_dropout.hpp - @Aakash-kaushik
    • [x] atrous_convolution.hpp - @Aakash-kaushik
    • [x] batch_norm.hpp - @Aakash-kaushik
    • [x] base_layer.hpp - @mrityunjay-tripathi
    • [x] bilinear_interpolation.hpp - @mrityunjay-tripathi
    • [x] c_relu.hpp - @zoq
    • [x] celu.hpp - @zoq
    • [x] concat.hpp - @mrityunjay-tripathi
    • [ ] concat_performance.hpp - @hello-fri-end
    • [x] concatenate.hpp - @mrityunjay-tripathi
    • [x] constant.hpp - @zoq
    • [x] convolution.hpp - @mrityunjay-tripathi
    • [x] dropconnect.hpp - @zoq
    • [x] dropout.hpp - @zoq
    • [x] elu.hpp - @zoq
    • [x] fast_lstm.hpp - @mrityunjay-tripathi
    • [x] flexible_relu.hpp - @zoq
    • [x] glimpse.hpp - @mrityunjay-tripathi
    • [ ] gru.hpp - @zoq
    • [x] hard_tanh.hpp - @zoq
    • [x] hardshrink.hpp - @zoq
    • [x] highway.hpp - @mrityunjay-tripathi
    • [x] join.hpp - @mrityunjay-tripathi
    • [x] layer_norm.hpp - @mrityunjay-tripathi
    • [x] leaky_relu.hpp - @zoq
    • [x] linear.hpp - @zoq
    • [x] linear3d.hpp - @mrityunjay-tripathi
    • [x] linear_no_bias.hpp - @zoq
    • [x] log_softmax.hpp - @zoq
    • [x] lookup.hpp - @mrityunjay-tripathi
    • [ ] lstm.hpp - @zoq
    • [x] max_pooling.hpp - @mrityunjay-tripathi
    • [x] mean_pooling.hpp - @mrityunjay-tripathi
    • [ ] minibatch_discrimination.hpp - @hello-fri-end
    • [x] multihead_attention.hpp - @mrityunjay-tripathi
    • [x] multiply_constant.hpp - @zoq
    • [x] multiply_merge.hpp - @mrityunjay-tripathi
    • [x] noisylinear.hpp - @zoq
    • [x] padding.hpp - @mrityunjay-tripathi
    • [x] parametric_relu.hpp - @zoq
    • [x] positional_encoding.hpp - @mrityunjay-tripathi
    • [x] radial_basis_function.hpp - @hello-fri-end
    • [ ] recurrent.hpp - @kaushal07wick
    • [ ] recurrent_attention.hpp - @kaushal07wick
    • [x] reinforce_normal.hpp - @mrityunjay-tripathi
    • [x] reparametrization.hpp - @mrityunjay-tripathi
    • [x] select.hpp - @mrityunjay-tripathi
    • [x] sequential.hpp - @mrityunjay-tripathi
    • [x] softmax.hpp - @zoq
    • [x] softmin.hpp - @zoq
    • [x] softshrink.hpp - @zoq
    • [x] spatial_dropout.hpp - @zoq
    • [x] subview.hpp - @mrityunjay-tripathi
    • [x] transposed_convolution.hpp - @mrityunjay-tripathi
    • [x] virtual_batch_norm.hpp - @mrityunjay-tripathi
    • [x] vr_class_reward.hpp - @mrityunjay-tripathi
    • [x] weight_norm.hpp - @mrityunjay-tripathi

    I left the base layer since I'm not sure yet if it makes sense to implement them as an independent class.


    Building upon the work from @Aakash-kaushik we can get a first impression of the advantage of using boost::visitor compared with a virtual inheritance approach (#2647)

    Note that we stripped out basically everything except the FFN class, the Linear layer, the FlexibleReLU layer, and the LogSoftMax layer, which lets us get a first impression of the timings we can expect from a virtual inheritance approach.

    I tested two scenarios, but used the same network for each:

    FFN<> model;
    model.Add<Linear<> >(trainData.n_rows, 128);
    model.Add<FlexibleReLU<> >();
    model.Add<Linear<> >(128, 256);
    model.Add<Linear<> >(256, 256);
    model.Add<Linear<> >(256, 256);
    model.Add<Linear<> >(256, 256);
    model.Add<Linear<> >(256, 512);
    model.Add<Linear<> >(512, 2048);
    model.Add<Linear<> >(2048, 512);
    model.Add<Linear<> >(512, 8);
    model.Add<Linear<> >(8, 3);
    model.Add<LogSoftMax<> >();
    

    Scenario - 1

    batch-size: 1 iterations: 10000 trials: 10

    vtable - DEBUG=ON

    mlpack version: mlpack git-aa6d2b1aa
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 494.485s
    elapsed time: 503.777s
    elapsed time: 496.802s
    elapsed time: 499.928s
    elapsed time: 502.504s
    elapsed time: 495.735s
    elapsed time: 495.745s
    elapsed time: 505.284s
    elapsed time: 495.32s
    elapsed time: 495.209s
    --------------------------------------
    elapsed time averaged(10): 498.479s
    

    boost::variant - DEBUG=ON

    mlpack version: mlpack git-4d01fe5e9
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 496.419s
    elapsed time: 495.27s
    elapsed time: 494.769s
    elapsed time: 494.922s
    elapsed time: 497.729s
    elapsed time: 497.464s
    elapsed time: 498.024s
    elapsed time: 501.722s
    elapsed time: 500.59s
    elapsed time: 497.925s
    --------------------------------------
    elapsed time averaged (10): 497.483s
    

    vtable - DEBUG=OFF

    mlpack version: mlpack git-aa6d2b1aa
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 199.713s
    elapsed time: 205.177s
    elapsed time: 200.135s
    elapsed time: 200.179s
    elapsed time: 205.792s
    elapsed time: 198.293s
    elapsed time: 198.535s
    elapsed time: 206.635s
    elapsed time: 198.263s
    elapsed time: 198.521s
    --------------------------------------
    elapsed time averaged(10): 201.124s
    

    boost::variant - DEBUG=OFF

    mlpack version: mlpack git-4d01fe5e9
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 198.645s
    elapsed time: 194.854s
    elapsed time: 194.748s
    elapsed time: 194.983s
    elapsed time: 197.42s
    elapsed time: 196.864s
    elapsed time: 197.454s
    elapsed time: 204.318s
    elapsed time: 201.076s
    elapsed time: 200.549s
    --------------------------------------
    elapsed time averaged (10): 198.091s
    

    Scenario - 2

    batch-size: 32 iterations: 10000 trials: 10

    vtable - DEBUG=ON

    mlpack version: mlpack git-aa6d2b1aa
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 70.4116s
    elapsed time: 70.5631s
    elapsed time: 70.682s
    elapsed time: 70.5635s
    elapsed time: 71.2245s
    elapsed time: 71.1649s
    elapsed time: 71.4714s
    elapsed time: 71.2688s
    elapsed time: 71.3348s
    elapsed time: 71.3406s
    --------------------------------------
    elapsed time averaged(10): 71.0025s
    

    boost::variant - DEBUG=ON

    mlpack version: mlpack git-4d01fe5e9
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 70.3247s
    elapsed time: 70.5059s
    elapsed time: 70.5368s
    elapsed time: 70.5208s
    elapsed time: 70.4539s
    elapsed time: 70.788s
    elapsed time: 70.7692s
    elapsed time: 70.9473s
    elapsed time: 70.9146s
    elapsed time: 70.7278s
    --------------------------------------
    elapsed time averaged (10): 70.6489s
    

    vtable - DEBUG=OFF

    mlpack version: mlpack git-aa6d2b1aa
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 59.7968s
    elapsed time: 59.4626s
    elapsed time: 59.9147s
    elapsed time: 59.9682s
    elapsed time: 60.5511s
    elapsed time: 60.2109s
    elapsed time: 60.7782s
    elapsed time: 60.4981s
    elapsed time: 60.719s
    elapsed time: 60.7632s
    --------------------------------------
    elapsed time averaged(10): 60.2663s
    

    boost::variant - DEBUG=OFF

    mlpack version: mlpack git-4d01fe5e9
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 60.8466s
    elapsed time: 61.0629s
    elapsed time: 61.1269s
    elapsed time: 60.7426s
    elapsed time: 60.8178s
    elapsed time: 60.7287s
    elapsed time: 60.864s
    elapsed time: 60.8982s
    elapsed time: 60.9232s
    elapsed time: 60.8519s
    --------------------------------------
    elapsed time averaged (10): 60.8863s
    

    Looking at the timings, boost::variant doesn't provide the speedup I thought it would. On top of that, the little speedup we would gain with boost::variant is marginal in comparison to the actual calculation.
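
As a rough check on how small these gaps are, the averaged timings reported above can be compared directly. The snippet below is only a sketch that hard-codes those averages; the dictionary layout and the helper name relative_gap are illustrative and not part of mlpack or of the benchmark itself.

```python
# Averaged timings in seconds, copied from the runs above:
# (scenario, DEBUG setting) -> (vtable, boost::variant)
averages = {
    ("batch 1", "DEBUG=ON"): (498.479, 497.483),
    ("batch 1", "DEBUG=OFF"): (201.124, 198.091),
    ("batch 32", "DEBUG=ON"): (71.0025, 70.6489),
    ("batch 32", "DEBUG=OFF"): (60.2663, 60.8863),
}

def relative_gap(vtable, variant):
    """Percentage by which the vtable build is slower (+) or faster (-)
    than the boost::variant build."""
    return 100.0 * (vtable - variant) / variant

gaps = {key: relative_gap(*pair) for key, pair in averages.items()}
for (scenario, debug), gap in gaps.items():
    print(f"{scenario:8s} {debug:9s} vtable vs variant: {gap:+.2f}%")
```

With these numbers every gap stays below 2%, and for batch size 32 with DEBUG=OFF the vtable build is actually slightly faster.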

    help wanted c: methods update dependencies 
    opened by zoq 98
  • Adding All Loss Functions

    Hello, I was going through the loss functions and compiled a list of those that aren't implemented yet. I found these by comparing against PyTorch and TensorFlow; kindly refer to them for more information. The list is:

    1. HingeEmbedding Loss (taken by me)
    2. CosineEmbedding Loss (taken up by @kartikdutt18)
    3. MultiLabelMargin Loss
    4. TripletMargin Loss
    5. L1 Loss
    6. BCE Loss

    This might not be a complete list. I will update it as I find more. I hope this is OK with the community. Kindly feel free to take up any of the idle loss functions here. Thank you. :)

    help wanted good first issue s: stale c: methods 
    opened by ojhalakshya 93
  • Addition of all Activation Functions.

    Hi everyone, I have compiled a list of all activation functions that are currently not implemented in mlpack but can be found in either TensorFlow or PyTorch.

    1. ~~SELU~~
    2. CELU
    3. GELU (currently taken up by @himanshupathak21061998)
    4. Hard shrink
    5. LiSHT (I have currently taken up this one)
    6. Soft shrink (currently taken up by @ojhalakshya)
    7. ISRU (Inverse Square Root Unit)
    8. Inverse Square Root Linear
    9. Square Non-Linearity

    I might have missed some functions; feel free to add them to the list. If anyone would like to take up any of the above functions, please feel free to do so. I hope this is okay with the members of the organisation. This was done to reduce the effort of finding unimplemented functions, as well as to bring state-of-the-art activation functions to mlpack. In case I missed something or added an activation that has already been implemented, please forgive me. Thanks.

    help wanted good first issue s: stale c: methods 
    opened by kartikdutt18 89
  • ARMADILLO_INCLUDE_DIR-NOTFOUND/armadillo_bits/config.hpp not found

    -- The C compiler identification is GNU 4.8.1
    -- The CXX compiler identification is GNU 4.8.1
    -- Check for working C compiler: /usr/usc/gnu/gcc/4.8.1/bin/gcc
    -- Check for working C compiler: /usr/usc/gnu/gcc/4.8.1/bin/gcc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working CXX compiler: /usr/usc/gnu/gcc/4.8.1/bin/g++
    -- Check for working CXX compiler: /usr/usc/gnu/gcc/4.8.1/bin/g++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Checking for C++11 compiler
    -- Checking for C++11 compiler - available
    -- Looking for backtrace
    -- Looking for backtrace - found
    -- backtrace facility detected in default set of libraries
    -- Found Backtrace: /usr/include
    CMake Error at CMake/FindArmadillo.cmake:327 (message):
      ARMADILLO_INCLUDE_DIR-NOTFOUND/armadillo_bits/config.hpp not found! Cannot
      determine what to link against.
    Call Stack (most recent call first):
      CMakeLists.txt:113 (find_package)

    How can I solve this problem? Thanks a lot.

    t: question s: answered 
    opened by acgtun 83
  • Implementation of SPSA optimizer

    As of now, I have just created the basic files necessary to implement the optimizer for the sake of creating the PR... I'll push the code in the subsequent commits :v:

    opened by Rajiv2605 79
  • Adapting armadillo's parser for mlpack(Removing Boost Dependencies)

    For background knowledge, look at these

    Sample code to use the feature

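    #include <fstream>
    #include <iostream>
    #include <mlpack/core.hpp>

    int main()
    {
      arma::Mat<double> data;
      std::fstream file;

      // Open the CSV file; bail out if it cannot be read.
      file.open("data.csv");
      if (!file.is_open())
      {
        std::cerr << "Could not open data.csv" << std::endl;
        return 1;
      }

      // load_data() is the loader this PR's sample demonstrates.
      mlpack::data::load_data<double>(data, arma::csv_ascii, file);
      data.raw_print();

      return 0;
    }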
    
    c: core update dependencies 
    opened by heisenbuug 74
  • Addition of Essential Metrics Only.

    This is a good first issue and will help new contributors get familiar with the codebase. This issue doesn't aim to add all metrics to mlpack, since each metric would have to be maintained; it aims to add metrics that I either find essential (or have used a couple of times) or that are very common. Metrics that could be added include:

    1. IoU and meanIoU (Taken up by me)
    2. SSIM (useful when you augment data and need to ensure that the augmentation doesn't go so far that the data becomes irrelevant. I used this on medical scans with heavy bias, as a metric for finding the right augmentation parameters for oversampling [set the augmentation parameters s.t. (average SSIM) > threshold], to automate the process a bit). I think @ojhalakshya is working on it.

    Other interesting metrics would be:

    1. r metric
    2. Top K Accuracy metric
    3. ~RMSE (Already implemented)~
    4. [Maybe, Not really sure about this.] Sparse Top K Accuracy

    In case some of these are already implemented, please forgive my ignorance. Anyone who starts working on one of them, please check the following first:

    1. Has it been implemented?
    2. Is there a PR open for it?
    3. Has it been taken up by someone?

    This is especially necessary for functions like RMSE and the r metric. Sorry for increasing the workload of members and contributors; I think at least some of these will be nice additions. Thanks.

    t: feature request help wanted good first issue s: stale c: methods 
    opened by kartikdutt18 70
  • Resolve Comments in Go Bindings(#1492) and Add Markdown Documentation

    Hi @rcurtin, I have tried to resolve some of the comments in PR #1492 and have also added Markdown documentation for the Go bindings.

    DONE:

    • [x] Build a fully working Go binding using make go.
    • [x] Configure CMake with cmake ../, which would find Go using FindGo.cmake.
    • [x] Add Markdown Documentation for Go Bindings.
    • [x] Resolve underscores to camelcase
    • [x] Tried to avoid unnecessary copies.
    • [x] Resolve output in arma_util.cpp , that was going out of scope.
    • [x] Removing unnecessary inputOptions and outputOptions.
    • [x] Resolve documentation for multiple outputs.
    • [x] Add Some getter and setter method for Umat,Urow and Ucol
    • [x] Add test for Umat ,Urow and Ucol
    • [x] Resolve Style issues(lines less than 80 characters) in go_binding_test.go
    • [x] Add vector of strings and int parameter type and added their tests.
    • [x] Add matrix with dataset info parameter type.
    s: keep open c: automatic bindings t: added feature 
    opened by Yashwants19 68
  • Algorithm yet to be implemented

    Hi there, I am interested in implementing an algorithm or a feature in mlpack which hasn't been implemented yet. It would be great if you could suggest any :smile:

    opened by Rajiv2605 61
  • Build scripts for Python bindings are not correct [Windows]

    Issue description

    Attempting to build python bindings on Windows using Visual Studio 2017 fails due to several issues:

    1. When using the flag BUILD_PYTHON_BINDINGS, CMake still shows a warning about not building python bindings, even though the bindings will be generated (not a roadblock).
    2. When the flag BUILD_PYTHON_BINDINGS is ON, the library will be built statically by default. I presume the python bindings require mlpack as a DLL? In that case, -DBUILD_SHARED_LIBS=ON must be enforced.
    3. Line 106 of setup.py refers to an invalid path. E.g. package_dir={ '': 'C:/mlpack/build/src/mlpack/bindings/python/' } This path ends in a slash which is not valid in a python package. What is more, I believe this path should be relative. If so, it should be replaced by: package_dir={ '': '.' },
    4. The linker expects to find mlpack and boost libraries in C:\mlpack\build\lib but this directory doesn't exist as a result of an mlpack build. Therefore, the directory needs to be manually created and populated with the following libraries: boost_serialization.lib, libboost_program_options-vc141-mt-1_65_1.lib, libboost_serialization-vc141-mt-1_65_1.lib, mlpack.dll, mlpack.lib
    5. After fixing issues 1 to 4, build will be successful. However, the resulting python package will fail to import mlpack with the following error: ImportError: cannot import name 'test_python_binding' from 'mlpack.test_python_binding' (C:\mlpack\build\src\mlpack\bindings\python\mlpack\test_python_binding.cp37-win_amd64.pyd)

    Your environment

    • version of mlpack: master branch April 19 (3.0.5)
    • operating system: windows 10 64 bits
    • compiler: MSVC 14.1
    • version of dependencies (Boost/Armadillo): boost 1.65.1, armadillo-9.300.2, OpenBLAS.0.2.14.1
    • any other environment information you think is relevant: miniconda3 (python 3.7.1)

    Steps to reproduce

    1. Clone master branch
    2. Run cmake including the flags: -DBUILD_PYTHON_BINDINGS=ON -DBUILD_SHARED_LIBS=ON
    3. Open solution with Visual Studio 2017 and build

    Expected behavior

    The Python bindings build successfully AND the resulting egg package works (mlpack can be imported in Python).

    Actual behavior

    Build failures (when workarounds are applied, the resulting package doesn't work)

    s: fixed t: bug report c: build system 
    opened by gmanlan 60
  • Use qualified std::move() to fix clang warnings for CRAN

    I received an email from Brian Ripley:

    clang 15 is warning:
    
    In file included from test_r_binding.cpp:9:
    ./mlpack/bindings/R/tests//test_r_binding_main.cpp:99:21: warning: unqualified call to 'std::move' [-Wunqualified-std-cast-call]
        arma::mat out = move(params.Get<arma::mat>("matrix_in"));
                        ^
                        std::
    
    ...
    
    Please correct before 2023-01-15 to safely retain your package on CRAN.
    

    I can't understand the motivation of such a warning, even when there is a using namespace std, but anyway, whatever, here's a change that makes all uses of std::move() qualified, so that CRAN can be happy.
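
    A likely rationale (an assumption; the email above does not state it) is that an unqualified move(x) is resolved through using-directives or argument-dependent lookup, so it could bind to an unrelated function that happens to be named move. A minimal sketch of the qualified form follows; TakeOwnership() is a hypothetical helper, and a std::vector stands in for the arma::mat parameter.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not part of mlpack): takes over the contents of
// `source`, much like a binding takes over a matrix parameter.
std::vector<double> TakeOwnership(std::vector<double>& source)
{
  // Qualified call: this always names std::move from <utility>, independent
  // of any using-directive or argument-dependent lookup.
  std::vector<double> out = std::move(source);

  // std::vector's move constructor leaves the source empty.
  assert(source.empty());
  return out;
}
```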

    t: bugfix 
    opened by rcurtin 0
  • Fix DBSCAN handling of non-core points

    This handles #3339. @iad-ABDUL-RAOUF, thanks for reporting the issue! If you are willing to review the changes here and see if they make sense (at least the comments for the approach), I would appreciate it. I think I have done it correctly but I may have dropped a small detail.

    The problem is that the existing DBSCAN implementation grows clusters "through" noise/non-core points (defined as points that have fewer than the minimum number of neighbors, minPoints). This is demonstrated by the nice test case that @iad-ABDUL-RAOUF supplied. The fix essentially boils down to allowing clusters to add non-core points, but not connect two disparate clusters through a noise point.

    Our DBSCAN implementation strategy differs a good deal from the original algorithm's pseudocode and uses a union find structure to process points serially. I spent a while considering it, and to the best of my understanding our implementation will give the same result as the original algorithm, although it does look quite different.
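
    The idea can be sketched with a small, self-contained union-find example. This is not the code from the PR: it works on 1-D points with brute-force neighbor search, it assumes that a point counts as its own neighbor, and UnionFind and DbscanSketch are illustrative names only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Minimal disjoint-set structure with path halving.
struct UnionFind
{
  std::vector<std::size_t> parent;

  explicit UnionFind(const std::size_t n) : parent(n)
  {
    std::iota(parent.begin(), parent.end(), std::size_t(0));
  }

  std::size_t Find(std::size_t x)
  {
    while (parent[x] != x)
    {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
    return x;
  }

  void Union(const std::size_t a, const std::size_t b)
  {
    const std::size_t rootA = Find(a);
    const std::size_t rootB = Find(b);
    parent[rootA] = rootB;
  }
};

// Illustrative 1-D DBSCAN; returns one label per point, -1 meaning noise.
// Assumption: a point counts as its own neighbor.
std::vector<int> DbscanSketch(const std::vector<double>& points,
                              const double epsilon,
                              const std::size_t minPoints)
{
  assert(epsilon >= 0.0);
  const std::size_t n = points.size();

  // Brute-force neighbor lists.
  std::vector<std::vector<std::size_t>> neighbors(n);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      if (std::abs(points[i] - points[j]) <= epsilon)
        neighbors[i].push_back(j);

  std::vector<bool> isCore(n);
  for (std::size_t i = 0; i < n; ++i)
    isCore[i] = (neighbors[i].size() >= minPoints);

  // Only core-to-core edges merge clusters.
  UnionFind uf(n);
  for (std::size_t i = 0; i < n; ++i)
    if (isCore[i])
      for (const std::size_t j : neighbors[i])
        if (isCore[j])
          uf.Union(i, j);

  // Non-core points join the cluster of their first core neighbor, but are
  // never passed to Union(), so they cannot bridge two clusters.
  std::vector<std::size_t> root(n, n); // The value n means "noise".
  for (std::size_t i = 0; i < n; ++i)
  {
    if (isCore[i]) { root[i] = uf.Find(i); continue; }
    for (const std::size_t j : neighbors[i])
      if (isCore[j]) { root[i] = uf.Find(j); break; }
  }

  // Map roots to labels 0, 1, 2, ... in order of first appearance.
  std::vector<int> rootLabel(n, -1), labels(n, -1);
  int next = 0;
  for (std::size_t i = 0; i < n; ++i)
  {
    if (root[i] == n) continue;
    if (rootLabel[root[i]] < 0) rootLabel[root[i]] = next++;
    labels[i] = rootLabel[root[i]];
  }
  return labels;
}
```

    With points at 0, 0.25, 0.5, 0.75, 1.5, 2.25, 2.5, 2.75, 3 and 10, epsilon = 0.9 and minPoints = 4, the point at 1.5 has only three neighbors. It is attached to the first cluster, the two dense groups stay separate, and the point at 10 is labeled as noise.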

    c: methods t: bugfix 
    opened by rcurtin 0
  • Fix R build Github action

    I don't think we need to merge this before #3343, but this PR aims to address the issues found in the R build of that PR:

    • The Linux R CMD check build fails because rapidjson is not available. This can be addressed simply by installing libcereal-dev for that job.

    • The URL generated for documentation is invalid, if we are using git. Here we change it to https://www.mlpack.org/doc/mlpack-git/, instead of https://www.mlpack.org/doc/mlpack-<next version>/.

    opened by rcurtin 0
  • Check shape and size with respect to issue #2820

    This is with respect to issue #2820. It adds shape and size checks for the following methods and their related methods:

    1. Decision Tree
    2. GMM
    3. NCA
    4. Random Forest

    Please review and provide suggestions.
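
    As a rough illustration of what such a check might look like (not the code in this PR), the sketch below validates that the number of labels matches the number of points and that all points share one dimensionality. CheckTrainingShapes() and the Dataset alias are hypothetical, and plain std::vector is used in place of Armadillo types.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One inner vector per data point (this mirrors one column per point).
using Dataset = std::vector<std::vector<double>>;

// Hypothetical helper: throws std::invalid_argument if the labels do not
// match the data, or if the points differ in dimensionality.
void CheckTrainingShapes(const Dataset& data,
                         const std::vector<std::size_t>& labels,
                         const std::string& caller)
{
  assert(!caller.empty()); // The caller name is used in the error message.

  if (data.size() != labels.size())
  {
    std::ostringstream oss;
    oss << caller << ": number of points (" << data.size()
        << ") does not match number of labels (" << labels.size() << ")!";
    throw std::invalid_argument(oss.str());
  }

  for (std::size_t i = 1; i < data.size(); ++i)
  {
    if (data[i].size() != data[0].size())
    {
      std::ostringstream oss;
      oss << caller << ": point " << i << " has dimensionality "
          << data[i].size() << ", but " << data[0].size()
          << " was expected!";
      throw std::invalid_argument(oss.str());
    }
  }
}
```

    Calling such a helper at the top of a Train() method would turn a silent mismatch into an error that names the offending call.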

    s: needs review s: unanswered s: unlabeled 
    opened by AdarshSantoria 0
  • Fixing DBSCAN Algorithm with issue #3339

    Implementing the concept of border points in order to fix the issue #3339. Steps performed -

    1. Forming the clusters with core points.
    2. Adding all border points to clusters.
    opened by AdarshSantoria 1
  • DBSCAN behaviour is different from what is described in the original article.

    Not sure if I should open a bug issue or a question issue. Before using the DBSCAN implementation provided by mlpack, I inspected the code (on the master branch) to check that it clusters as described in the 1996 article [1]. It seems it does not cluster like DBSCAN should.

    DBSCAN in mlpack: see the file mlpack/methods/dbscan/dbscan_impl.hpp. Using the UnionFind class, it forms clusters of points that can be reached step by step through steps of size epsilon. THEN it checks whether each cluster contains more than minpts elements.

    DBSCAN in the original article: see "Algorithm 1" in this 2017 article [2] published by the same authors. It describes the original DBSCAN in clearer terms. For each point, it looks for its neighbors within an epsilon radius. BEFORE processing the next point, it checks whether this neighborhood contains at least minpts elements. IF NOT, the cluster is not propagated from the current point.
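
    The expansion rule described above can be sketched as follows. This is an illustration written for this issue, not a transcription of the article's pseudocode: it uses 1-D points, a brute-force range query that includes the query point itself, and the hypothetical names RangeQuery and ClassicDbscan.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <deque>
#include <vector>

const int kUndefined = -2; // Point not visited yet.
const int kNoise = -1;     // Point visited, but not (yet) in any cluster.

// Brute-force range query; the result includes point p itself.
std::vector<std::size_t> RangeQuery(const std::vector<double>& points,
                                    const std::size_t p,
                                    const double epsilon)
{
  std::vector<std::size_t> result;
  for (std::size_t j = 0; j < points.size(); ++j)
    if (std::abs(points[p] - points[j]) <= epsilon)
      result.push_back(j);
  return result;
}

// Returns one label per point: 0, 1, 2, ... for clusters, kNoise for noise.
std::vector<int> ClassicDbscan(const std::vector<double>& points,
                               const double epsilon,
                               const std::size_t minPts)
{
  assert(epsilon >= 0.0);
  std::vector<int> label(points.size(), kUndefined);
  int cluster = -1;

  for (std::size_t p = 0; p < points.size(); ++p)
  {
    if (label[p] != kUndefined) continue;

    const std::vector<std::size_t> nb = RangeQuery(points, p, epsilon);
    if (nb.size() < minPts) { label[p] = kNoise; continue; }

    // p is a core point: start a new cluster and expand it.
    label[p] = ++cluster;
    std::deque<std::size_t> seeds(nb.begin(), nb.end());
    while (!seeds.empty())
    {
      const std::size_t q = seeds.front();
      seeds.pop_front();

      if (label[q] == kNoise) label[q] = cluster; // Border point.
      if (label[q] != kUndefined) continue;       // Already processed.
      label[q] = cluster;

      // The density check happens BEFORE the cluster is propagated: a
      // neighborhood that is too small does not extend the seed set.
      const std::vector<std::size_t> qn = RangeQuery(points, q, epsilon);
      if (qn.size() < minPts) continue;
      seeds.insert(seeds.end(), qn.begin(), qn.end());
    }
  }
  return label;
}
```

    With points at 0, 0.25, 0.5, 0.75, 1.5, 2.25, 2.5, 2.75, 3 and 10, epsilon = 0.9 and minPts = 4, the sparse point at 1.5 becomes a border point of the first cluster and the second dense group forms its own cluster, whereas chaining steps of size epsilon would join both groups into one.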

    As a consequence, the original algorithm is more robust to noisy datasets. Calling the current mlpack implementation "DBSCAN" is also misleading for users who expect it to run the official DBSCAN algorithm.

    To fix this issue, I would suggest looking at the scikit-learn implementation (files _dbscan.py and _dbscan_inner.pyx) : https://github.com/scikit-learn/scikit-learn/blob/dc580a8ef5ee2a8aea80498388690e2213118efd/sklearn/cluster/_dbscan.py https://github.com/scikit-learn/scikit-learn/blob/dc580a8ef5ee2a8aea80498388690e2213118efd/sklearn/cluster/_dbscan_inner.pyx

    [1] Ester, Martin, et al. "A density-based algorithm for discovering clusters in large spatial databases with noise." KDD. Vol. 96. No. 34. 1996.
    [2] Schubert, Erich, et al. "DBSCAN revisited, revisited: why and how you should (still) use DBSCAN." ACM Transactions on Database Systems (TODS) 42.3 (2017): 1-21.

    t: bug report s: unanswered c: methods 
    opened by iad-ABDUL-RAOUF 4
Releases(4.0.1)
  • 4.0.1(Dec 29, 2022)

    Released Dec. 29, 2022.

    • Fix mapping of categorical data for Julia bindings (#3305).
    • Bugfix: catch all exceptions when running bindings from Julia, instead of crashing (#3304).
    • Various Python configuration fixes for Windows and OS X (#3312, #3313, #3311, #3309, #3308, #3297, #3302).
    • Optimize and strip compiled Python bindings when possible, resulting in significant size minimization (#3310).
    • The /std:c++17 and /Zc:__cplusplus options are now required when using Visual Studio (#3318). Documentation and compile-time checks added.
    • Set BUILD_TESTS to OFF by default. If you want to build tests, like mlpack_test, manually set BUILD_TESTS to ON in your CMake configuration step (#3316).
    • Fix handling of transposed matrix parameters in Python, Julia, R, and Go bindings (#3327).
    • Comment out definition of ARMA_NO_DEBUG. This enables various Armadillo run-time checks, such as checks for non-conforming matrices and out-of-bounds element access, which in turn helps with tracking down bugs and incorrect usage (#3322).
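
    The Visual Studio requirement listed above exists because MSVC keeps reporting __cplusplus as 199711L unless /Zc:__cplusplus is passed. A compile-time check in that spirit could look like the sketch below; it is not mlpack's actual check, and ReportedCppStandard() is a hypothetical helper.

```cpp
#include <cassert>

// Without /Zc:__cplusplus, MSVC reports 199711L here even under /std:c++17,
// so this assertion fails unless both options are passed.
#if defined(_MSC_VER)
static_assert(__cplusplus >= 201703L,
    "Pass both /std:c++17 and /Zc:__cplusplus to Visual Studio.");
#endif

// Hypothetical helper: returns the standard level the compiler reports,
// e.g. for logging in a configuration test.
inline long ReportedCppStandard()
{
  // Every conforming C++ compiler reports at least the C++98 value.
  assert(__cplusplus >= 199711L);
  return __cplusplus;
}
```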
  • 4.0.0(Oct 24, 2022)

    Released Oct. 24, 2022.

    This is a huge overhaul of mlpack so that the C++ portion of the library is header-only. The library no longer depends on Boost, and only requires cereal, Armadillo, and ensmallen. Compilation time has been significantly reduced due to these changes, and complicated linking processes are no longer necessary. Since this refactoring took quite a while, there have also been numerous other improvements, listed individually below:

    • Bump C++ standard requirement to C++14 (#3233).
    • Fix Perceptron to work with cross-validation framework (#3190).
    • Migrate from boost tests to Catch2 framework (#2523), (#2584).
    • Bump minimum armadillo version from 8.400 to 9.800 (#3043), (#3048).
    • Adding a copy constructor in the Convolution layer (#3067).
    • Replace boost::spirit parser by a local efficient implementation (#2942).
    • Correctly disable the autodownloader and fix test stability (#3076).
    • Replace boost::any with core::v2::any or std::any if available (#3006).
    • Remove old unused Boost headers (#3005).
    • Replace boost::enable_if with std::enable_if (#2998).
    • Replace boost::is_same with std::is_same (#2993).
    • Remove invalid option for ensmallen and STB (#2960).
    • Check for armadillo dependencies before downloading armadillo (#2954).
    • Disable the usage of autodownloader by default (#2953).
    • Install dependencies downloaded with the autodownloader (#2952).
    • Download older Boost if the compiler is old (#2940).
    • Add support for embedded systems (#2531).
    • Build mlpack executable statically if the library is statically linked (#2931).
    • Fix cover tree loop bug on embedded arm systems (#2869).
    • Fix a LAPACK bug in FindArmadillo.cmake (#2929).
    • Add an autodownloader to get mlpack dependencies (#2927).
    • Remove Coverage files and configurations from CMakeLists (#2866).
    • Added Multi Label Soft Margin Loss loss function for neural networks (#2345).
    • Added Decision Tree Regressor (#2905). It can be used via the class mlpack::tree::DecisionTreeRegressor. It is accessible only through C++.
    • Added dict-style inspection of mlpack models in python bindings (#2868).
    • Added Extra Trees Algorithm (#2883). Currently, it can be used using the class mlpack::tree::ExtraTrees, but only through C++.
    • Add Flatten T Swish activation function (flatten-t-swish.hpp)
    • Added warm start feature to Random Forest (#2881); this feature is accessible from mlpack's bindings to different languages.
    • Added Pixel Shuffle layer (#2563).
    • Add "check_input_matrices" option to python bindings that checks for NaN and inf values in all the input matrices (#2787).
    • Add Adjusted R squared functionality to R2Score::Evaluate (#2624).
    • Disabled all the bindings by default in CMake (#2782).
    • Added an implementation to Stratify Data (#2671).
    • Add BUILD_DOCS CMake option to control whether Doxygen documentation is built (default ON) (#2730).
    • Add Triplet Margin Loss function (#2762).
    • Add finalizers to Julia binding model types to fix memory handling (#2756).
    • HMM: add functions to calculate likelihood for data stream with/without pre-calculated emission probability (#2142).
    • Replace Boost serialization library with Cereal (#2458).
    • Add PYTHON_INSTALL_PREFIX CMake option to specify installation root for Python bindings (#2797).
    • Removed boost::visitor from model classes for knn, kfn, cf, range_search, krann, and kde bindings (#2803).
    • Add k-means++ initialization strategy (#2813).
    • NegativeLogLikelihood<> now expects classes in the range 0 to numClasses - 1 (#2534).
    • Add Lambda1(), Lambda2(), UseCholesky(), and Tolerance() members to LARS so parameters for training can be modified (#2861).
    • Remove unused ElemType template parameter from DecisionTree and RandomForest (#2874).
    • Fix Python binding build when the CMake variable USE_OPENMP is set to OFF (#2884).
    • The mlpack_test target is no longer built as part of make all. Use make mlpack_test to build the tests.
    • Fixes to HoeffdingTree: ensure that training still works when empty constructor is used (#2964).
    • Fix Julia model serialization bug (#2970).
    • Fix LoadCSV() to use pre-populated DatasetInfo objects (#2980).
    • Add probabilities option to softmax regression binding, to get class probabilities for test points (#3001).
    • Fix thread safety issues in mlpack bindings to other languages (#2995).
    • Fix double-free of model pointers in R bindings (#3034).
    • Fix Julia, Python, R, and Go handling of categorical data for decision_tree() and hoeffding_tree() (#2971).
    • Depend on pkgbuild for R bindings (#3081).
    • Replaced Numpy deprecated code in Python bindings (#3126).

    Refer to the documentation on the website or in doc/ for updated instructions on how to use this new version of mlpack.

  • 3.4.2(Oct 28, 2020)

    Released Oct. 28, 2020.

    • Added Mean Absolute Percentage Error.
    • Added Softmin activation function as layer in ann/layer.
    • Fix spurious ARMA_64BIT_WORD compilation warnings on 32-bit systems (#2665).
  • 3.4.1(Sep 7, 2020)

    Released Sep. 7, 2020.

    • Fix incorrect parsing of required matrix/model parameters for command-line bindings (#2600).

    • Add manual type specification support to data::Load() and data::Save() (#2084, #2135, #2602).

    • Remove use of internal Armadillo functionality (#2596, #2601, #2602).

  • 3.4.0(Sep 1, 2020)

    Released Sept. 1st, 2020.

    • Issue warnings when metrics produce NaNs in KFoldCV (#2595).

    • Added bindings for R during Google Summer of Code (#2556).

    • Added common striptype function for all bindings (#2556).

    • Refactored common utility function of bindings to bindings/util (#2556).

    • Renamed InformationGain to HoeffdingInformationGain in methods/hoeffding_trees/information_gain.hpp (#2556).

    • Added macro for changing stream of printing and warnings/errors (#2556).

    • Added Spatial Dropout layer (#2564).

    • Force CMake to show error when it didn't find Python/modules (#2568).

    • Refactor ProgramInfo() to separate out all the different information (#2558).

    • Add bindings for one-hot encoding (#2325).

    • Added Soft Actor-Critic to RL methods (#2487).

    • Added Categorical DQN to q_networks (#2454).

    • Added N-step DQN to q_networks (#2461).

    • Add Silhouette Score metric and Pairwise Distances (#2406).

    • Add Go bindings for some missed models (#2460).

    • Replace boost program_options dependency with CLI11 (#2459).

    • Additional functionality for the ARFF loader (#2486); use case sensitive categories (#2516).

    • Add bayesian_linear_regression binding for the command-line, Python, Julia, and Go. Also called "Bayesian Ridge", this is equivalent to a version of linear regression where the regularization parameter is automatically tuned (#2030).

    • Fix defeatist search for spill tree traversals (#2566, #1269).

    • Fix incremental training of logistic regression models (#2560).

    • Change default configuration of BUILD_PYTHON_BINDINGS to OFF (#2575).

  • 3.3.2(Jun 18, 2020)

    Released June 18, 2020.

    • Added Noisy DQN to q_networks (#2446).

    • Add [preview release of] Go bindings (#1884).

    • Added Dueling DQN to q_networks, Noisy linear layer to ann/layer and Empty loss to ann/loss_functions (#2414).

    • Storing and adding accessor method for action in q_learning (#2413).

    • Added accessor methods for ANN layers (#2321).

    • Addition of Elliot activation function (#2268).

    • Add adaptive max pooling and adaptive mean pooling layers (#2195).

    • Add parameter to avoid shuffling of data in preprocess_split (#2293).

    • Add MatType parameter to LSHSearch, allowing sparse matrices to be used for search (#2395).

    • Documentation fixes to resolve Doxygen warnings and issues (#2400).

    • Add Load and Save of Sparse Matrix (#2344).

    • Add Intersection over Union (IoU) metric for bounding boxes (#2402).

    • Add Non Maximal Suppression (NMS) metric for bounding boxes (#2410).

    • Fix no_intercept and probability computation for linear SVM bindings (#2419).

    • Fix incorrect neighbors for k > 1 searches in approx_kfn binding, for the QDAFN algorithm (#2448).

    • Add RBF layer in ann module to make RBFN architecture (#2261).
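
    For readers unfamiliar with the Intersection over Union (IoU) metric listed above, a minimal sketch follows. It is not mlpack's implementation: the Box struct and the corner-coordinate representation are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Axis-aligned box given by two corners (an assumed representation).
struct Box
{
  double x1, y1; // Lower-left corner.
  double x2, y2; // Upper-right corner.
};

// IoU = area of overlap / area of union; 0 when the union is empty.
double IoU(const Box& a, const Box& b)
{
  assert(a.x1 <= a.x2 && a.y1 <= a.y2);
  assert(b.x1 <= b.x2 && b.y1 <= b.y2);

  // Overlap extents are clamped at zero for disjoint boxes.
  const double w = std::max(0.0, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
  const double h = std::max(0.0, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
  const double intersection = w * h;

  const double areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
  const double areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
  const double unionArea = areaA + areaB - intersection;

  return (unionArea > 0.0) ? intersection / unionArea : 0.0;
}
```

    For two 2x2 boxes offset by one unit in each direction, the overlap is 1 and the union is 7, so the IoU is 1/7.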

  • 3.3.1(Apr 30, 2020)

    Released April 29th, 2020.

    • Minor Julia and Python documentation fixes (#2373).

    • Updated terminal state and fixed bugs for Pendulum environment (#2354, #2369).

    • Added EliSH activation function (#2323).

    • Add L1 Loss function (#2203).

    • Pass CMAKE_CXX_FLAGS (compilation options) correctly to Python build (#2367).

    • Expose ensmallen Callbacks for sparseautoencoder (#2198).

    • Bugfix for LARS class causing invalid read (#2374).

    • Add serialization support from Julia; use mlpack.serialize() and mlpack.deserialize() to save and load from IOBuffers.

  • 3.3.0(Apr 7, 2020)

    Released April 7th, 2020.

    • Templated return type of Forward function of loss functions (#2339).

    • Added R2 Score regression metric (#2323).

    • Added mean squared logarithmic error loss function for neural networks (#2210).

    • Added mean bias loss function for neural networks (#2210).

    • The DecisionStump class has been marked deprecated; use the DecisionTree class with NoRecursion=true or use ID3DecisionStump instead (#2099).

    • Added probabilities_file parameter to get the probabilities matrix of AdaBoost classifier (#2050).

    • Fix STB header search paths (#2104).

    • Add DISABLE_DOWNLOADS CMake configuration option (#2104).

    • Add padding layer in TransposedConvolutionLayer (#2082).

    • Fix pkgconfig generation on non-Linux systems (#2101).

    • Use log-space to represent HMM initial state and transition probabilities (#2081).

    • Add functions to access parameters of Convolution and AtrousConvolution layers (#1985).

    • Add Compute Error function to LARS regression and change the Train function to return the computed error (#2139).

    • Add Julia bindings (#1949). Build settings can be controlled with the BUILD_JULIA_BINDINGS=(ON/OFF) and JULIA_EXECUTABLE=/path/to/julia CMake parameters.

    • CMake fix for finding STB include directory (#2145).

    • Add bindings for loading and saving images (#2019); mlpack_image_converter from the command-line, mlpack.image_converter() from Python.

    • Add normalization support for CF binding (#2136).

    • Add Mish activation function (#2158).

    • Update init_rules in AMF to allow users to merge two initialization rules (#2151).

    • Add GELU activation function (#2183).

    • Better error handling of eigendecompositions and Cholesky decompositions (#2088, #1840).

    • Add LiSHT activation function (#2182).

    • Add Valid and Same Padding for Transposed Convolution layer (#2163).

    • Add CELU activation function (#2191)

    • Add Log-Hyperbolic-Cosine Loss function (#2207)

    • Change neural network types to avoid unnecessary use of rvalue references (#2259).

    • Bump minimum Boost version to 1.58 (#2305).

    • Refactor STB support so HAS_STB macro is not needed when compiling against mlpack (#2312).

    • Add Hard Shrink Activation Function (#2186).

    • Add Soft Shrink Activation Function (#2174).

    • Add Hinge Embedding Loss Function (#2229).

    • Add Cosine Embedding Loss Function (#2209).

    • Add Margin Ranking Loss Function (#2264).

    • Bugfix for incorrect parameter vector sizes in logistic regression and softmax regression (#2359).
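
    The log-space representation of HMM probabilities mentioned in this release (#2081) avoids underflow when many small probabilities are multiplied. The sketch below shows the usual log-sum-exp building block and one forward-style update. It is not mlpack's code: LogSumExp() and LogNextStateProb() are hypothetical names, and inputs are assumed to be log-probabilities (values of at most 0, possibly negative infinity).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Computes log(sum_i exp(logValues[i])) without leaving log space.
double LogSumExp(const std::vector<double>& logValues)
{
  const double negInf = -std::numeric_limits<double>::infinity();
  if (logValues.empty())
    return negInf;

  const double maxValue =
      *std::max_element(logValues.begin(), logValues.end());
  if (maxValue == negInf)
    return negInf; // All probabilities are zero.

  double sum = 0.0;
  for (const double v : logValues)
    sum += std::exp(v - maxValue);

  assert(sum >= 1.0); // The largest term contributes exp(0) = 1.
  return maxValue + std::log(sum);
}

// log(sum_i alpha_i * T_ij) for one target state j, where logTransitionInto
// holds log(T_ij) for every source state i.
double LogNextStateProb(const std::vector<double>& logAlpha,
                        const std::vector<double>& logTransitionInto)
{
  assert(logAlpha.size() == logTransitionInto.size());
  std::vector<double> terms(logAlpha.size());
  for (std::size_t i = 0; i < logAlpha.size(); ++i)
    terms[i] = logAlpha[i] + logTransitionInto[i]; // Product becomes a sum.
  return LogSumExp(terms);
}
```

    In linear space, exp(-1000) underflows to zero in double precision, while the log-space sum of two such terms is simply -1000 + log(2).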

  • 3.2.1(Nov 26, 2019)

    Released Oct. 1, 2019. (But I forgot to release it on Github; sorry about that.)

    • Enforce CMake version check for ensmallen (#2032).
    • Fix CMake check for Armadillo version (#2029).
    • Better handling of when STB is not installed (#2033).
    • Fix Naive Bayes classifier computations in high dimensions (#2022).
  • 3.2.0(Sep 26, 2019)

    Released Sept. 25, 2019.

    • Fix occasionally-failing RADICAL test (#1924).

    • Fix gcc 9 OpenMP compilation issue (#1970).

    • Added support for loading and saving of images (#1903).

    • Add Multiple Pole Balancing Environment (#1901, #1951).

    • Added functionality for scaling of data (#1876); see the command-line binding mlpack_preprocess_scale or Python binding preprocess_scale().

    • Add new parameter maximum_depth to decision tree and random forest bindings (#1916).

    • Fix prediction output of softmax regression when test set accuracy is calculated (#1922).

    • Pendulum environment now checks for termination. All RL environments now have an option to terminate after a set number of time steps (no limit by default) (#1941).

    • Add support for probabilistic KDE (kernel density estimation) error bounds when using the Gaussian kernel (#1934).

    • Fix negative distances for cover tree computation (#1979).

    • Fix cover tree building when all pairwise distances are 0 (#1986).

    • Improve KDE pruning by reclaiming unused error tolerance (#1954, #1984).

    • Optimizations for sparse matrix accesses in z-score normalization for CF (#1989).

    • Add kmeans_max_iterations option to GMM training binding gmm_train_main.

    • Bump minimum Armadillo version to 8.400.0 due to ensmallen dependency requirement (#2015).

  • mlpack-3.1.1(May 27, 2019)

    Released May 26, 2019.

    • Fix random forest bug for numerical-only data (#1887).
    • Significant speedups for random forest (#1887).
    • Random forest now has minimum_gain_split and subspace_dim parameters (#1887).
    • Decision tree parameter print_training_error deprecated in favor of print_training_accuracy.
    • output option changed to predictions for adaboost and perceptron binding. Old options are now deprecated and will be preserved until mlpack 4.0.0 (#1882).
    • Concatenated ReLU layer (#1843).
    • Accelerate NormalizeLabels function using hashing instead of linear search (see src/mlpack/core/data/normalize_labels_impl.hpp) (#1780).
    • Add ConfusionMatrix() function for checking performance of classifiers (#1798).
    • Install ensmallen headers when it is downloaded during build (#1900).
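
    The NormalizeLabels speedup listed above (#1780) replaces a linear search over the labels seen so far with a hash lookup. The general technique can be sketched as follows; this is not the code in normalize_labels_impl.hpp, and NormalizeLabelsSketch() and NormalizedLabels are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct NormalizedLabels
{
  std::vector<std::size_t> labels; // Labels remapped to 0, 1, 2, ...
  std::vector<int> mapping;        // mapping[k] is the original value of k.
};

// Maps arbitrary integer labels to consecutive indices in order of first
// appearance, using a hash map for expected O(1) lookups per label.
NormalizedLabels NormalizeLabelsSketch(const std::vector<int>& raw)
{
  NormalizedLabels result;
  result.labels.reserve(raw.size());

  std::unordered_map<int, std::size_t> seen;
  for (const int value : raw)
  {
    // emplace() inserts only if the value is new; either way it returns
    // an iterator to the entry holding the normalized index.
    const auto inserted = seen.emplace(value, result.mapping.size());
    if (inserted.second)
      result.mapping.push_back(value);
    result.labels.push_back(inserted.first->second);
  }

  assert(result.labels.size() == raw.size());
  return result;
}
```

    For the input 7, 3, 7, -1, 3 this yields the labels 0, 1, 0, 2, 1 and the mapping 7, 3, -1.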
  • mlpack-3.1.0(Apr 26, 2019)

    Released April 25, 2019.

    • Add DiagonalGaussianDistribution and DiagonalGMM classes to speed up the diagonal covariance computation and deprecate DiagonalConstraint (#1666).

    • Add kernel density estimation (KDE) implementation with bindings to other languages (#1301).

    • Where relevant, all models with a Train() method now return a double value representing the goodness of fit (i.e. final objective value, error, etc.) (#1678).

    • Add implementation for linear support vector machine (see src/mlpack/methods/linear_svm).

    • Change DBSCAN to use PointSelectionPolicy and add OrderedPointSelection (#1625).

    • Residual block support (#1594).

    • Bidirectional RNN (#1626).

    • Dice loss layer (#1674, #1714) and hard sigmoid layer (#1776).

    • output option changed to predictions and output_probabilities to probabilities for Naive Bayes binding (mlpack_nbc/nbc()). Old options are now deprecated and will be preserved until mlpack 4.0.0 (#1616).

    • Add support for Diagonal GMMs to HMM code (#1658, #1666). This can provide large speedup when a diagonal GMM is acceptable as an emission probability distribution.

    • Python binding improvements: check parameter type (#1717), avoid copying Pandas dataframes (#1711), handle Pandas Series objects (#1700).

  • mlpack-3.0.4(Nov 13, 2018)

    Released November 13, 2018.

    • Bump minimum CMake version to 3.3.2.
    • CMake fixes for Ninja generator by Marc Espie (#1550, #1537, #1523).
    • More efficient linear regression implementation (#1500).
    • Serialization fixes for neural networks (#1508, #1535).
    • Mean shift now allows single-point clusters (#1536).
  • mlpack-3.0.3(Jul 29, 2018)

    Released July 27th, 2018.

    • Fix Visual Studio compilation issue (#1443).
    • Allow running local_coordinate_coding binding with no initial_dictionary parameter when input_model is not specified (#1457).
    • Make use of OpenMP optional via the CMake USE_OPENMP configuration variable (#1474).
    • Accelerate FNN training by 20-30% by avoiding redundant calculations (#1467).
    • Fix math::RandomSeed() usage in tests (#1462, #1440).
    • Generate better Python setup.py with documentation (#1460).
  • mlpack-3.0.2(Jun 9, 2018)

    Released June 8th, 2018.

    • Documentation generation fixes for Python bindings (#1421).
    • Fix build error for man pages if command-line bindings are not being built (#1424).
    • Add shuffle parameter and Shuffle() method to KFoldCV (#1412). This will shuffle the data when the object is constructed, or when Shuffle() is called.
    • Added neural network layers: AtrousConvolution (#1390), Embedding (#1401), and LayerNorm (layer normalization) (#1389).
    • Add Pendulum environment for reinforcement learning (#1388) and update Mountain Car environment (#1394).
  • mlpack-3.0.1(May 11, 2018)

    Released May 10th, 2018.

    • Fix intermittently failing tests (#1387).
    • Add Big-Batch SGD (BBSGD) optimizer in src/mlpack/core/optimizers/bigbatch_sgd (#1131).
    • Fix simple compiler warnings (#1380, #1373).
    • Simplify NeighborSearch constructor and Train() overloads (#1378).
    • Add warning for OpenMP setting differences (#1358/#1382). When mlpack is compiled with OpenMP but another application linking against mlpack is not (or vice versa), a compilation warning will now be issued.
    • Restructured loss functions in src/mlpack/methods/ann/ (#1365).
    • Add environments for reinforcement learning tests (#1368, #1370, #1329).
    • Allow single outputs for multiple timestep inputs for recurrent neural networks (#1348).
    • Neural networks: add He and LeCun normal initializations (#1342), add FReLU and SELU activation functions (#1346, #1341), add alpha-dropout (#1349).
  • mlpack-3.0.0(Mar 31, 2018)

    Released March 30th, 2018.

    • Speed and memory improvements for DBSCAN. --single_mode can now be used for situations where previously RAM usage was too high.
    • Bump minimum required version of Armadillo to 6.500.0.
    • Add automatically generated Python bindings. These have the same interface as the command-line programs.
    • Add deep learning infrastructure in src/mlpack/methods/ann/.
    • Add reinforcement learning infrastructure in src/mlpack/methods/reinforcement_learning/.
    • Add optimizers: AdaGrad, CMAES, CNE, FrankWolfe, GradientDescent, GridSearch, IQN, Katyusha, LineSearch, ParallelSGD, SARAH, SCD, SGDR, SMORMS3, SPALeRA, SVRG.
    • Add hyperparameter tuning infrastructure and cross-validation infrastructure in src/mlpack/core/cv/ and src/mlpack/core/hpt/.
    • Fix bug in mean shift.
    • Add random forests (see src/mlpack/methods/random_forest).
    • Numerous other bugfixes and testing improvements.
    • Add randomized Krylov SVD and Block Krylov SVD.
  • mlpack-2.2.5(Aug 26, 2017)

  • mlpack-2.2.4(Jul 19, 2017)

    Released July 18th, 2017.

    • Speed and memory improvements for DBSCAN. --single_mode can now be used for situations where previously RAM usage was too high.
    • Fix bug in CF causing incorrect recommendations.
  • mlpack-2.2.3(May 24, 2017)

  • mlpack-2.2.2(May 5, 2017)

    Released May 4th, 2017.

    • Install backwards-compatibility mlpack_allknn and mlpack_allkfn programs; note they are deprecated and will be removed in mlpack 3.0.0 (#992).
    • Fix RStarTree bug that surfaced on OS X only (#964).
    • Small fixes for MiniBatchSGD and SGD and tests.
  • mlpack-2.2.1(Apr 13, 2017)

  • mlpack-2.2.0(Mar 21, 2017)

    Released Mar. 21st, 2017.

    • Bugfix for mlpack_knn program (#816).
    • Add decision tree implementation in methods/decision_tree/. This is very similar to a C4.5 tree learner.
    • Add DBSCAN implementation in methods/dbscan/.
    • Add support for multidimensional discrete distributions (#810, #830).
    • Better output for Log::Debug/Log::Info/Log::Warn/Log::Fatal for Armadillo objects (#895, #928).
    • Refactor categorical CSV loading with boost::spirit for faster loading (#681).
  • mlpack-2.1.1(Dec 22, 2016)

    Released Dec. 22nd, 2016.

    • HMMs now use random initialization; this should fix some convergence issues (#828).
    • HMMs now initialize emissions according to the distribution of observations (#833).
    • Minor fix for formatted output (#814).
    • Fix DecisionStump to properly work with any input type.
  • mlpack-2.1.0(Oct 31, 2016)

    Released Oct. 31st, 2016.

    • Fixed CoverTree to properly handle single-point datasets.
    • Fixed a bug in CosineTree (and thus QUIC-SVD) that caused split failures for some datasets (#717).
    • Added mlpack_preprocess_describe program, which can be used to print statistics on a given dataset (#742).
    • Fix prioritized recursion for k-furthest-neighbor search (mlpack_kfn and the KFN class), leading to orders-of-magnitude speedups in some cases.
    • Bump minimum required version of Armadillo to 4.200.0.
    • Added simple Gradient Descent optimizer, found in src/mlpack/core/optimizers/gradient_descent/ (#792).
    • Added approximate furthest neighbor search algorithms QDAFN and DrusillaSelect in src/mlpack/methods/approx_kfn/, with command-line program mlpack_approx_kfn.
  • mlpack-2.0.3(Jul 21, 2016)

    Released July 21st, 2016.

    • Standardize some parameter names for programs (old names are kept for reverse compatibility, but warnings will now be issued).
    • RectangleTree optimizations (#721).
    • Fix memory leak in NeighborSearch (#731).
    • Documentation fix for k-means tutorial (#730).
    • Fix TreeTraits for BallTree (#727).
    • Fix incorrect parameter checks for some command-line programs.
    • Fix error in HMM training with probabilities for each point (#636).
  • mlpack-2.0.2(Jun 20, 2016)

    Released June 20th, 2016.

    • Added the function LSHSearch::Projections(), which returns an arma::cube with each projection table in a slice (#663). Instead of Projection(i), you should now use Projections().slice(i).
    • A new constructor has been added to LSHSearch that creates objects using projection tables provided in an arma::cube (#663).
    • LSHSearch projection tables refactored for speed (#675).
    • Handle zero-variance dimensions in DET (#515).
    • Add MiniBatchSGD optimizer (src/mlpack/core/optimizers/minibatch_sgd/) and allow its use in mlpack_logistic_regression and mlpack_nca programs.
    • Add better backtrace support from Grzegorz Krajewski for Log::Fatal messages when compiled with debugging and profiling symbols. This requires libbfd and libdl to be present during compilation.
    • CosineTree test fix from Mikhail Lozhnikov (#358).
    • Fixed HMM initial state estimation (#600).
    • Changed versioning macros __MLPACK_VERSION_MAJOR, __MLPACK_VERSION_MINOR, and __MLPACK_VERSION_PATCH to MLPACK_VERSION_MAJOR, MLPACK_VERSION_MINOR, and MLPACK_VERSION_PATCH. The old names will remain in place until mlpack 3.0.0.
    • Renamed mlpack_allknn, mlpack_allkfn, and mlpack_allkrann to mlpack_knn, mlpack_kfn, and mlpack_krann. The mlpack_allknn, mlpack_allkfn, and mlpack_allkrann programs will remain as copies until mlpack 3.0.0.
    • Add --random_initialization option to mlpack_hmm_train, for use when no labels are provided.
    • Add --kill_empty_clusters option to mlpack_kmeans and KillEmptyClusters policy for the KMeans class (#595, #596).
  • mlpack-2.0.1(Mar 3, 2016)

    Released Feb. 4th, 2016.

    • Fix CMake to properly detect when MKL is being used with Armadillo.
    • Minor parameter handling fixes to mlpack_logistic_regression (#504, #505).
    • Properly install arma_config.hpp.
    • Memory handling fixes for Hoeffding tree code.
    • Add functions that allow changing training-time parameters to HoeffdingTree class.
    • Fix infinite loop in sparse coding test.
    • Documentation spelling fixes (#501).
    • Properly handle covariances for Gaussians with large condition number (#496), preventing GMMs from filling with NaNs during training (and also HMMs that use GMMs).
    • CMake fixes for finding LAPACK and BLAS as Armadillo dependencies when ATLAS is used.
    • CMake fix for projects using mlpack's CMake configuration from elsewhere (#512).
  • mlpack-2.0.0(Dec 24, 2015)

    Released Dec. 23rd, 2015.

    • Removed overclustering support from k-means because it is not well-tested, may be buggy, and is (I think) unused. If you were relying on this support, open a bug or get in touch with us; it would not be hard for us to reimplement it.
    • Refactored KMeans to allow different types of Lloyd iterations.
    • Added implementations of k-means: Elkan's algorithm, Hamerly's algorithm, Pelleg-Moore's algorithm, and the DTNN (dual-tree nearest neighbor) algorithm.
    • Significant acceleration of LRSDP via the use of accu(a % b) instead of trace(a * b).
    • Added MatrixCompletion class (matrix_completion), which performs nuclear norm minimization to fill unknown values of an input matrix.
    • No more dependence on Boost.Random; now we use C++11 STL random support.
    • Add softmax regression, contributed by Siddharth Agrawal and QiaoAn Chen.
    • Changed NeighborSearch, RangeSearch, FastMKS, LSH, and RASearch API; these classes now take the query sets in the Search() method, instead of in the constructor.
    • Use OpenMP, if available. For now OpenMP support is only available in the DET training code.
    • Add support for predicting new test point values to LARS and the command-line 'lars' program.
    • Add serialization support for Perceptron and LogisticRegression.
    • Refactor SoftmaxRegression to predict into an arma::Row<size_t> object, and add a softmax_regression program.
    • Refactor LSH to allow loading and saving of models.
    • ToString() is removed entirely (#487).
    • Add --input_model_file and --output_model_file options to appropriate machine learning algorithms.
    • Rename all executables to start with an "mlpack" prefix (#229).
  • mlpack-1.0.12(Jan 7, 2015)
