A non-linear, non-parametric Machine Learning method capable of modeling complex datasets

VAMSHI CHOWDARY

Last update: Jun 22, 2022

Related tags

Deep Learning Fast-Symbolic-Regression

Overview

Fast Symbolic Regression

Symbolic Regression is a non-linear, non-parametric Machine Learning method capable of modeling complex data sets. fastsr aims at providing the most simple, powerful models possible by optimizing not only for error but also for model complexity. fastsr is built on top of fastgp, a numpy implementation of genetic programming built on top of deap. All estimators adhere to the sklearn estimator interface and can thus be used in pipelines.

fastsr was designed and developed by the Morphology, Evolution & Cognition Laboratory at the University of Vermont. It extends research code which can be found here.

Installation

fastsr is compatible with Python 2.7+.

pip install fastsr

Example Usage

Symbolic Regression is really good at fitting nonlinear functions. Let's try to fit the third order polynomial x^3 + x^2 + x. This is the "regression" example from the examples folder.

import matplotlib.pyplot as plt

import numpy as np

from fastsr.estimators.symbolic_regression import SymbolicRegression

from fastgp.algorithms.fast_evaluate import fast_numpy_evaluate
from fastgp.parametrized.simple_parametrized_terminals import get_node_semantics

def target(x):
    return x**3 + x**2 + x

Now we'll generate some data on the domain [-10, 10].

X = np.linspace(-10, 10, 100, endpoint=True)
y = target(X)

Finally we'll create and fit the Symbolic Regression estimator and check the score.

sr = SymbolicRegression(seed=72066)
sr.fit(X, y)
score = sr.score(X, y)

Score: 0.0

Whoa! That's not much error. Don't get too used to scores like that though, real data sets aren't usually as simple as a third order polynomial.

fastsr uses Genetic Programming to fit the data. That means equations are evolving to fit the data better and better each generation. Let's have a look at the best individuals and their respective scores.

print('Best Individuals:')
sr.print_best_individuals()

Best Individuals:
0.0 : add(add(square(X0), cube(X0)), X0)
34.006734006733936 : add(square(X0), cube(X0))
2081.346746380927 : add(cube(X0), X0)
2115.3534803876605 : cube(X0)
137605.24466869785 : add(add(X0, add(X0, X0)), add(X0, X0))
141529.89102341252 : add(add(X0, X0), add(X0, X0))
145522.55084614072 : add(add(X0, X0), X0)
149583.22413688237 : add(X0, X0)
151203.96034032793 : numpy_protected_sqrt(cube(numpy_protected_log_abs(exp(X0))))
151203.96034032793 : cube(numpy_protected_sqrt(X0))
153711.91089563753 : numpy_protected_log_abs(exp(X0))
153711.91089563753 : X0
155827.26437602515 : square(X0)
156037.81673350732 : add(numpy_protected_sqrt(X0), cbrt(X0))
157192.02956807753 : numpy_protected_sqrt(exp(cbrt(X0)))

At the top we find our best individual, which is exactly the third order polynomial we defined our target function to be. You might be confused as to why we consider all these other individuals, some with very large errors be be "best". We can look through the history object to see some of the equations that led up to our winning model by ordering by error.

history = sr.history_
population = list(filter(lambda x: hasattr(x, 'error'), list(sr.history_.genealogy_history.values())))
population.sort(key=lambda x: x.error, reverse=True)

Let's get a sample of the unique solutions. There are quite a few so the print statements have been omitted.

X = X.reshape((len(X), 1))
i = 1
previous_errror = population[0]
unique_individuals = []
while i < len(population):
    ind = population[i]
    if ind.error != previous_errror:
        print(str(i) + ' | ' + str(ind.error) + ' | ' + str(ind))
        unique_individuals.append(ind)
    previous_errror = ind.error
    i += 1

Now we can plot the equations over the target functions.

def plot(index):
    plt.plot(X, y, 'r')
    plt.axis([-10, 10, -1000, 1000])
    y_hat = fast_numpy_evaluate(unique_individuals[index], sr.pset_.context, X, get_node_semantics)
    plt.plot(X, y_hat, 'g')
    plt.savefig(str(i) + 'ind.png')
    plt.gcf().clear()

i = 0
while i < len(unique_individuals):
    plot(i)
    i += 10
i = len(unique_individuals) - 1
plot(i)

Stitched together into a gif we get a view into the evolutionary process.

Fitness Age Size Complexity Pareto Optimization

In addition to minimizing the error when creating an interpretable model it's often useful to minimize the size of the equations and their complexity (as defined by the order of an approximating polynomial[1]). In Multi-Objective optimization we keep all individuals that are not dominated by any other individuals and call this group the Pareto Front. These are the individuals printed in the Example Usage above. The age component helps prevent the population of equations from falling into a local optimum and was introduced in AFPO [2] but is out of the scope of this readme.

The result of this optimization technique is that a range of solutions are considered "best" individuals. Although in practice you will probably be interested in the top or several top individuals, be aware that the population as a whole was pressured into keeping individual equations as simple as possible in addition to keeping error as low as possible.

Literature Cited

Ekaterina J Vladislavleva, Guido F Smits, and Dick Den Hertog. 2009. Order of nonlinearity as a complexity measure for models generated by symbolic regression via pareto genetic programming. IEEE Transactions on Evolutionary Computation 13, 2 (2009), 333–349.
Michael Schmidt and Hod Lipson. 2011. Age-fitness pareto optimization. In Genetic Programming Theory and Practice VIII. Springer, 129–146.

With this package, you can generate mixed-integer linear programming (MIP) models of trained artificial neural networks (ANNs) using the rectified linear unit (ReLU) activation function

With this package, you can generate mixed-integer linear programming (MIP) models of trained artificial neural networks (ANNs) using the rectified linear unit (ReLU) activation function. At the moment, only TensorFlow sequential models are supported. Interfaces to either the Pyomo or Gurobi modeling environments are offered.

40 Dec 27, 2022

Hitters Linear Regression - Hitters Linear Regression With Python

Hitters_Linear_Regression Kullanacağımız veri seti Carnegie Mellon Üniversitesi'

2 Jan 26, 2022

Datasets accompanying the paper ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers.

ConditionalQA Datasets accompanying the paper ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers. Disclaimer This dataset

2 Oct 14, 2021

Parametric Contrastive Learning (ICCV2021)

Parametric-Contrastive-Learning This repository contains the implementation code for ICCV2021 paper: Parametric Contrastive Learning (https://arxiv.or

156 Dec 21, 2022

Creating a Linear Program Solver by Implementing the Simplex Method in Python with NumPy

Creating a Linear Program Solver by Implementing the Simplex Method in Python with NumPy Simplex Algorithm is a popular algorithm for linear programmi

2 Oct 12, 2022

PyTorch implementation of the WarpedGANSpace: Finding non-linear RBF paths in GAN latent space (ICCV 2021)

Authors official PyTorch implementation of the "WarpedGANSpace: Finding non-linear RBF paths in GAN latent space" [ICCV 2021].

100 Dec 6, 2022

The official repo for OC-SORT: Observation-Centric SORT on video Multi-Object Tracking. OC-SORT is simple, online and robust to occlusion/non-linear motion.

OC-SORT Observation-Centric SORT (OC-SORT) is a pure motion-model-based multi-object tracker. It aims to improve tracking robustness in crowded scenes

325 Jan 5, 2023

A parametric soroban written with CADQuery.

A parametric soroban written in CADQuery The purpose of this project is to demonstrate how "code CAD" can be intuitive to learn. See soroban.py for a

4 Aug 13, 2022

The personal repository of the work: DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer.

DanceNet3D The personal repository of the work: DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer. Dataset and Results Pleas

36 Dec 21, 2022

A non-linear, non-parametric Machine Learning method capable of modeling complex datasets

Related tags

Overview

Fast Symbolic Regression

Installation

Example Usage

Fitness Age Size Complexity Pareto Optimization

Literature Cited

You might also like...

With this package, you can generate mixed-integer linear programming (MIP) models of trained artificial neural networks (ANNs) using the rectified linear unit (ReLU) activation function

Hitters Linear Regression - Hitters Linear Regression With Python

Datasets accompanying the paper ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers.

Parametric Contrastive Learning (ICCV2021)

Creating a Linear Program Solver by Implementing the Simplex Method in Python with NumPy

PyTorch implementation of the WarpedGANSpace: Finding non-linear RBF paths in GAN latent space (ICCV 2021)

The official repo for OC-SORT: Observation-Centric SORT on video Multi-Object Tracking. OC-SORT is simple, online and robust to occlusion/non-linear motion.

A parametric soroban written with CADQuery.

The personal repository of the work: DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer.

Owner

VAMSHI CHOWDARY

Linear algebra python - Number of operations and problems in Linear Algebra and Numerical Linear Algebra

Simple Linear 2nd ODE Solver GUI - A 2nd constant coefficient linear ODE solver with simple GUI using euler's method

pytorch implementation of "Contrastive Multiview Coding", "Momentum Contrast for Unsupervised Visual Representation Learning", and "Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination"

PyTorch implementation of the Deep SLDA method from our CVPRW-2020 paper "Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis"

Adversarial Framework for (non-) Parametric Image Stylisation Mosaics

JumpDiff: Non-parametric estimator for Jump-diffusion processes for Python

Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

Complex-Valued Neural Networks (CVNN)Complex-Valued Neural Networks (CVNN)

FLAVR is a fast, flow-free frame interpolation method capable of single shot multi-frame prediction

A non-linear, non-parametric Machine Learning method capable of modeling complex datasets

Related tags

Overview

Fast Symbolic Regression

Installation

Example Usage

Fitness Age Size Complexity Pareto Optimization

Literature Cited

You might also like...

With this package, you can generate mixed-integer linear programming (MIP) models of trained artificial neural networks (ANNs) using the rectified linear unit (ReLU) activation function

Hitters Linear Regression - Hitters Linear Regression With Python

Datasets accompanying the paper ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers.

Parametric Contrastive Learning (ICCV2021)

Creating a Linear Program Solver by Implementing the Simplex Method in Python with NumPy

PyTorch implementation of the WarpedGANSpace: Finding non-linear RBF paths in GAN latent space (ICCV 2021)

The official repo for OC-SORT: Observation-Centric SORT on video Multi-Object Tracking. OC-SORT is simple, online and robust to occlusion/non-linear motion.

A parametric soroban written with CADQuery.

The personal repository of the work: *DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer*.

Owner

VAMSHI CHOWDARY

Linear algebra python - Number of operations and problems in Linear Algebra and Numerical Linear Algebra

Simple Linear 2nd ODE Solver GUI - A 2nd constant coefficient linear ODE solver with simple GUI using euler's method

pytorch implementation of "Contrastive Multiview Coding", "Momentum Contrast for Unsupervised Visual Representation Learning", and "Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination"

PyTorch implementation of the Deep SLDA method from our CVPRW-2020 paper "Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis"

Adversarial Framework for (non-) Parametric Image Stylisation Mosaics

JumpDiff: Non-parametric estimator for Jump-diffusion processes for Python

Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

Complex-Valued Neural Networks (CVNN)Complex-Valued Neural Networks (CVNN)

FLAVR is a fast, flow-free frame interpolation method capable of single shot multi-frame prediction

The personal repository of the work: DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer.