Implementation of K-Nearest Neighbors Algorithm Using PySpark

Zachary Petroff

Last update: Dec 30, 2022

Related tags

Machine Learning KNN-With-Spark

Overview

KNN With Spark

Implementation of KNN using PySpark. The KNN was used on two separate datasets (https://archive.ics.uci.edu/ml/datasets/iris and https://archive.ics.uci.edu/ml/datasets/Fertility). The data was first normalized, also using PySpark. Euclidean Distance was used as the similarity measure. The optimal k found for both datasets was 5. The iris dataset had a test accuracy of 97% and the fertility dataset had a test accuracy of 88%.

Module is created to build a spam filter using Python and the multinomial Naive Bayes algorithm.

Naive-Bayes Spam Classificator Module is created to build a spam filter using Python and the multinomial Naive Bayes algorithm. Main goal is to code a

1 Jun 27, 2022

The project's goal is to show a real world application of image segmentation using k means algorithm

2 Jan 22, 2022

Python implementation of the rulefit algorithm

RuleFit Implementation of a rule based prediction algorithm based on the rulefit algorithm from Friedman and Popescu (PDF) The algorithm can be used f

326 Jan 2, 2023

This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch

This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment to test the algorithm

59 Dec 9, 2022

Contains an implementation (sklearn API) of the algorithm proposed in "GENDIS: GEnetic DIscovery of Shapelets" and code to reproduce all experiments.

GENDIS GENetic DIscovery of Shapelets In the time series classification domain, shapelets are small subseries that are discriminative for a certain cl

90 Oct 28, 2022

This machine-learning algorithm takes in data from the last 60 days and tries to predict tomorrow's price of any crypto you ask it.

Crypto-Currency-Predictor This machine-learning algorithm takes in data from the last 60 days and tries to predict tomorrow's price of any crypto you

6 Dec 4, 2022

BASTA: The BAyesian STellar Algorithm

BASTA: BAyesian STellar Algorithm Current stable version: v1.0 Important note: BASTA is developed for Python 3.8, but Python 3.7 should work as well.

16 Nov 15, 2022

Send rockets to Mars with artificial intelligence(Genetic algorithm) in python.

Send Rockets To Mars With AI Send rockets to Mars with artificial intelligence(Genetic algorithm) in python. Tools Python 3 EasyDraw How to Play Insta

3 Jul 15, 2022

Decision Tree Regression algorithm implemented on Python from scratch.

Decision_Tree_Regression I implemented the decision tree regression algorithm on Python. Unlike regular linear regression, this algorithm is used when

1 Dec 22, 2021

Owner

Zachary Petroff

Cognitive Science Major @ Indiana University.

GitHub

Python library which makes it possible to dynamically mask/anonymize data using JSON string or python dict rules in a PySpark environment.

pyspark-anonymizer Python library which makes it possible to dynamically mask/anonymize data using JSON string or python dict rules in a PySpark envir

6 Jun 30, 2022

PySpark + Scikit-learn = Sparkit-learn

Sparkit-learn PySpark + Scikit-learn = Sparkit-learn GitHub: https://github.com/lensacom/sparkit-learn About Sparkit-learn aims to provide scikit-lear

1.1k Jan 4, 2023

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.

Petastorm Contents Petastorm Installation Generating a dataset Plain Python API Tensorflow API Pytorch API Spark Dataset Converter API Analyzing petas

1.6k Dec 31, 2022

Distributed scikit-learn meta-estimators in PySpark

sk-dist: Distributed scikit-learn meta-estimators in PySpark What is it? sk-dist is a Python package for machine learning built on top of scikit-learn

282 Dec 9, 2022

Estudos e projetos feitos com PySpark.

PySpark (Spark com Python) PySpark é uma biblioteca Spark escrita em Python, e seu objetivo é permitir a análise interativa dos dados em um ambiente d

54 Nov 6, 2022

PySpark ML Bank Churn Prediction

PySpark-Bank-Churn Surname: corresponds to the record (row) number and has no effect on the output. CreditScore: contains random values and has no eff

2 Nov 11, 2021

ANNchor is a python library which constructs approximate k-nearest neighbour graphs for slow metrics.

Fast k-NN graph construction for slow metrics

28 Dec 29, 2022

A framework for building (and incrementally growing) graph-based data structures used in hierarchical or DAG-structured clustering and nearest neighbor search

31 Nov 3, 2022

Neighbourhood Retrieval (Nearest Neighbours) with Distance Correlation.

Neighbourhood Retrieval with Distance Correlation Assign Pseudo class labels to datapoints in the latent space. NNDC is a slim wrapper around FAISS. N

1 Jan 16, 2022

using Machine Learning Algorithm to classification AppleStore application

AppleStore-classification-with-Machine-learning-Algo- using Machine Learning Algorithm to classification AppleStore application. the first step : 1: p

2 May 2, 2022

Implementation of K-Nearest Neighbors Algorithm Using PySpark

Related tags

Overview

KNN With Spark

You might also like...

Module is created to build a spam filter using Python and the multinomial Naive Bayes algorithm.

The project's goal is to show a real world application of image segmentation using k means algorithm

Python implementation of the rulefit algorithm

This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch

Contains an implementation (sklearn API) of the algorithm proposed in "GENDIS: GEnetic DIscovery of Shapelets" and code to reproduce all experiments.

This machine-learning algorithm takes in data from the last 60 days and tries to predict tomorrow's price of any crypto you ask it.

BASTA: The BAyesian STellar Algorithm

Send rockets to Mars with artificial intelligence(Genetic algorithm) in python.

Decision Tree Regression algorithm implemented on Python from scratch.

Owner

Zachary Petroff

Python library which makes it possible to dynamically mask/anonymize data using JSON string or python dict rules in a PySpark environment.

PySpark + Scikit-learn = Sparkit-learn

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.

Distributed scikit-learn meta-estimators in PySpark

Estudos e projetos feitos com PySpark.

PySpark ML Bank Churn Prediction

ANNchor is a python library which constructs approximate k-nearest neighbour graphs for slow metrics.

A framework for building (and incrementally growing) graph-based data structures used in hierarchical or DAG-structured clustering and nearest neighbor search

Neighbourhood Retrieval (Nearest Neighbours) with Distance Correlation.

using Machine Learning Algorithm to classification AppleStore application