A naive Bayes model for cancer classification using a set of documents

Alex W King

Last update: Nov 24, 2021

Related tags

Machine Learning naivebayes

Overview

Naive Bayes text classification model for cancer and non-cancer documents

Author: Alex King

Purpose
Requirements/files used
How to use

1. Purpose

The purpose of this program is to read CSV files containing two columns:

                    Document | classification
                    xxxxxx   | cancer/nocancer
                    xxxxxx   | cancer/nocancer
                    xxxxxx   | cancer/nocancer

The program reads the data into classes that hold the documents. One file is used as the training set and the other as the testing set. Both sets go through the same tokenization. The model is then trained on the training set and evaluated on the testing set.
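As a rough illustration of the train-then-test flow described above, here is a minimal multinomial naive Bayes sketch. It is not the code in naivesbayes.py: the function names `train` and `classify`, the add-one smoothing, and the toy documents are all assumptions made for the example, and it uses `math.log` from the standard library (the project itself lists numpy for the log calculation) so that it runs without extra dependencies.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Fit log priors and log likelihoods from (tokens, label) pairs (hypothetical helper)."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)

    log_priors = {c: math.log(n / len(docs)) for c, n in label_counts.items()}
    log_likelihoods = {}
    for c in label_counts:
        total = sum(word_counts[c].values())
        # Add-one (Laplace) smoothing, an assumption of this sketch, so that a
        # word never seen with a class does not give that class zero probability.
        log_likelihoods[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return log_priors, log_likelihoods, vocab


def classify(tokens, log_priors, log_likelihoods, vocab):
    """Return the label with the highest summed log score (hypothetical helper)."""
    scores = {}
    for c, prior in log_priors.items():
        # Words outside the training vocabulary are skipped.
        scores[c] = prior + sum(log_likelihoods[c][w] for w in tokens if w in vocab)
    return max(scores, key=scores.get)


# Toy data standing in for the two-column CSV rows (made up for illustration).
training = [
    ("tumor biopsy malignant".split(), "cancer"),
    ("chemotherapy tumor treatment".split(), "cancer"),
    ("football match score".split(), "nocancer"),
    ("weather forecast rain".split(), "nocancer"),
]
model = train(training)
prediction = classify("malignant tumor found".split(), *model)
print(prediction)
```

Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow on long documents, which is presumably why the project needs a log function at all.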

2. Requirements/files used

* python3
* numpy library - for calculating log
* pandas library - for reading in csv files
* main.py and naivesbayes.py
* stopwords.txt - list of stop words
* Scoring.docx - list of scoring for precision, recall, F-score
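For readers unfamiliar with the three scores reported in Scoring.docx, the sketch below shows how precision, recall, and F-score are conventionally computed from gold and predicted labels. The function name `score`, the choice of "cancer" as the positive class, and the sample labels are assumptions for illustration; the project may compute its scores differently.

```python
def score(gold, predicted, positive="cancer"):
    """Return (precision, recall, f_score) for the given positive label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return precision, recall, f_score


# Made-up labels: one true positive, one false positive, one false negative.
gold = ["cancer", "cancer", "nocancer", "nocancer"]
predicted = ["cancer", "nocancer", "cancer", "nocancer"]
precision, recall, f_score = score(gold, predicted)
print(precision, recall, f_score)
```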

3. How to use

This program has 3 modes of operation for tokenizing your sets:

                $python3 main.py -train 1 -test 1

This first command runs standard tokenization on training set 1 and test set 1. To use the other training set, change the 1 after -train to a 2:

                $python3 main.py -train 2 -test 1

NOTE: Do not change the testing set number; leave it as 1. The option exists because multiple testing sets were originally intended.

For binary (replace # with the training set number):

                $python3 main.py -train # -test 1 -b

For stopwords:

                $python3 main.py -train # -test 1 -s

For both stopwords and binary:

                $python3 main.py -train # -test 1 -b -s
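The sketch below shows one way the flags above could map onto tokenization options. It is an assumption-laden illustration, not the contents of main.py: `build_parser` and `tokenize` are hypothetical names, "binary" is interpreted here as counting each word at most once per document, whitespace splitting stands in for the real tokenizer, and the small stop-word set stands in for stopwords.txt.

```python
import argparse


def build_parser():
    """Parser accepting the same flags as the commands above (hypothetical)."""
    parser = argparse.ArgumentParser(description="Naive Bayes cancer/nocancer classifier")
    parser.add_argument("-train", type=int, required=True, help="training set number")
    parser.add_argument("-test", type=int, default=1, help="testing set number")
    parser.add_argument("-b", action="store_true", help="binary tokenization")
    parser.add_argument("-s", action="store_true", help="remove stop words")
    return parser


def tokenize(text, binary=False, stopwords=None):
    """Lowercase and split on whitespace, then apply the optional modes."""
    tokens = text.lower().split()
    if stopwords:
        tokens = [t for t in tokens if t not in stopwords]
    if binary:
        # Assumed meaning of "binary": keep only the first occurrence of each
        # word, so every word counts at most once per document.
        tokens = list(dict.fromkeys(tokens))
    return tokens


args = build_parser().parse_args(["-train", "2", "-test", "1", "-b", "-s"])
stop = {"the", "a", "of"}  # stand-in for the contents of stopwords.txt
tokens = tokenize(
    "The tumor of the patient is a malignant tumor",
    binary=args.b,
    stopwords=stop if args.s else None,
)
print(tokens)
```

Applying the same `tokenize` call to both the training and testing documents keeps the two sets consistent, which matches the requirement that each set goes through the same tokenization.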
