A machine learning project that predicts the price of used cars in the UK

Victor Umunna

Last update: Oct 13, 2022

Car Price Prediction

Image Credit: AA Cars

Project Overview

Scraped data on 3,000 used cars from the AA Cars website using Python and BeautifulSoup.
Cleaned the data and built a model to help determine the price of cars at auction.
Built a Flask web app to deploy to the cloud.

Packages/Tools Used

Python Version: 3.9
BeautifulSoup
Requests
NumPy
Matplotlib
Seaborn
scikit-learn

Data

The data was scraped from multiple pages of the AA Cars site and stored as a CSV file. The scraped data contains the following fields:

Name
Price
Year
Mileage
Engine
Transmission
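As a rough illustration of the scraping step, the sketch below parses listing markup with BeautifulSoup and writes the six fields to CSV. The HTML structure and class names (`listing`, `name`, `price` and so on) are invented for this example and do not reflect the real AA Cars pages. The sample markup is inlined so the snippet runs without a network request.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup: the real site's structure and class names will differ.
SAMPLE_HTML = """
<div class="listing">
  <h3 class="name">Ford Fiesta 1.0 Zetec</h3>
  <span class="price">£8,495</span>
  <span class="year">2017</span>
  <span class="mileage">32,100 miles</span>
  <span class="engine">Petrol</span>
  <span class="transmission">Manual</span>
</div>
<div class="listing">
  <h3 class="name">BMW 3 Series 2.0 320d</h3>
  <span class="price">£14,250</span>
  <span class="year">2016</span>
  <span class="mileage">58,400 miles</span>
  <span class="engine">Diesel</span>
</div>
"""

FIELDS = ["name", "price", "year", "mileage", "engine", "transmission"]


def parse_listings(html):
    """Return one dict per listing; a field missing from the markup becomes None."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.listing"):
        row = {}
        for field in FIELDS:
            node = card.select_one(f".{field}")
            row[field] = node.get_text(strip=True) if node else None
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_listings(SAMPLE_HTML)
csv_text = to_csv(rows)
print(csv_text)
```

In a real scraper, the HTML for each results page would be fetched with Requests and passed to a function like `parse_listings`.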

Data Cleaning

The features (columns) contained messy entries, which were tidied with custom functions. The following steps were taken:

Removed duplicate rows, which would otherwise skew the analysis.
Deleted the rows with missing values, since they made up less than 1% of the data.
Extracted the manufacturer of each car from the name column.
Corrected values in the manufacturer column by merging similar values and fixing those that were wrongly extracted.
Removed the pound symbol and commas from the values in the price column.
Created an age column by subtracting the values in the year column from the current year, 2021. Age is easier to work with than the raw year.
Removed the commas, spaces and the word "miles" from the values in the mileage column.
Corrected values in the engine and transmission columns by merging similar values and fixing those that were wrongly extracted.
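The cleaning steps above could be sketched roughly as follows. This is not the project's actual code: the function names, the raw value formats (`"£8,495"`, `"32,100 miles"`) and the `MANUFACTURER_ALIASES` table are assumptions made for illustration, and only the standard library is used.

```python
import re

REFERENCE_YEAR = 2021  # the "current year" stated in the write-up

# Hypothetical corrections for manufacturers that a first-word split gets wrong.
MANUFACTURER_ALIASES = {"Land": "Land Rover", "Mercedes": "Mercedes-Benz"}


def clean_price(raw):
    """Strip the pound symbol, commas and whitespace: '£8,495' -> 8495."""
    return int(re.sub(r"[£,\s]", "", raw))


def clean_mileage(raw):
    """Keep only the digits: '32,100 miles' -> 32100."""
    return int(re.sub(r"\D", "", raw))


def extract_manufacturer(name):
    """Take the first word of the listing name, then apply known corrections."""
    first = name.split()[0]
    return MANUFACTURER_ALIASES.get(first, first)


def car_age(year, reference_year=REFERENCE_YEAR):
    """Age in years relative to the reference year."""
    return reference_year - int(year)


def clean_rows(rows):
    """Drop duplicates and rows with missing values, then tidy each field."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        if any(value in (None, "") for value in row.values()):
            continue  # row has a missing value
        cleaned.append({
            "manufacturer": extract_manufacturer(row["name"]),
            "price": clean_price(row["price"]),
            "age": car_age(row["year"]),
            "mileage": clean_mileage(row["mileage"]),
        })
    return cleaned


raw_rows = [
    {"name": "Ford Fiesta 1.0 Zetec", "price": "£8,495", "year": "2017", "mileage": "32,100 miles"},
    {"name": "Ford Fiesta 1.0 Zetec", "price": "£8,495", "year": "2017", "mileage": "32,100 miles"},
    {"name": "Land Rover Discovery", "price": "£21,000", "year": "2015", "mileage": None},
]
print(clean_rows(raw_rows))
```

Merging similar engine and transmission labels would follow the same alias-table pattern as the manufacturer corrections.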

Exploratory Data Analysis

The number of cars per manufacturer
The number of cars per year
The number of cars per engine type
The number of cars per transmission type
A word cloud of all car manufacturers
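The counts listed above are simple frequency tables. A minimal sketch with `collections.Counter` follows; the rows are made-up examples, not the scraped data, and `count_by` is a hypothetical helper name.

```python
from collections import Counter

# Made-up example rows standing in for the cleaned data set.
cars = [
    {"manufacturer": "Ford", "year": 2017, "engine": "Petrol", "transmission": "Manual"},
    {"manufacturer": "Ford", "year": 2018, "engine": "Diesel", "transmission": "Manual"},
    {"manufacturer": "BMW", "year": 2017, "engine": "Diesel", "transmission": "Automatic"},
]


def count_by(rows, column):
    """Count how many rows share each value of the given column."""
    return Counter(row[column] for row in rows)


for column in ("manufacturer", "year", "engine", "transmission"):
    print(column, count_by(cars, column).most_common())
```

Each of these tables can be passed to a Matplotlib or Seaborn bar chart, and the manufacturer frequencies are the kind of input a word cloud is drawn from.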

Model Building

The 'name' and 'year' columns were dropped because they were not needed for modelling.
The categorical features (name, colour and transmission) were converted to numerical data, and all feature values were scaled to the same range.
Linear Regression, Ridge Regression, Random Forest Regressor, AdaBoost Regressor and Support Vector Regressor models were built.
Root mean squared error (RMSE), the square root of the mean of the squared differences between the true and predicted values, was the metric used to evaluate model performance.
The CatBoost Regressor had the best performance, and its hyperparameters were tuned with GridSearchCV to improve it further.
The model was tested on new data and gave good results.
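The workflow above (encode, scale, compare models by RMSE, then tune with GridSearchCV) might look something like the sketch below. Several things here are assumptions rather than the project's code: the data is synthetic, the column names are illustrative, pandas is used to hold the table, only two of the listed models are compared, and Ridge stands in for CatBoost in the grid search so that the example depends only on scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def rmse(y_true, y_pred):
    """Square root of the mean of the squared prediction errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in data (NOT the scraped AA Cars data).
rng = np.random.default_rng(0)
n = 200
cars = pd.DataFrame({
    "manufacturer": rng.choice(["Ford", "BMW", "Audi"], size=n),
    "transmission": rng.choice(["Manual", "Automatic"], size=n),
    "age": rng.integers(1, 15, size=n),
    "mileage": rng.integers(5_000, 120_000, size=n),
})
price = 20_000 - 900 * cars["age"] - 0.05 * cars["mileage"] + rng.normal(0, 500, size=n)

# One-hot encode the categorical columns and scale the numeric ones.
preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["manufacturer", "transmission"]),
    ("scale", StandardScaler(), ["age", "mileage"]),
])

X_train, X_test, y_train, y_test = train_test_split(
    cars, price, test_size=0.25, random_state=0
)

# Fit each candidate model and record its RMSE on the held-out split.
models = {
    "ridge": Ridge(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
scores = {}
for name, model in models.items():
    pipe = Pipeline([("prep", preprocess), ("model", model)])
    pipe.fit(X_train, y_train)
    scores[name] = rmse(y_test, pipe.predict(X_test))

# Tune one model with a cross-validated grid search (Ridge as a stand-in).
search = GridSearchCV(
    Pipeline([("prep", preprocess), ("model", Ridge())]),
    param_grid={"model__alpha": [0.1, 1.0, 10.0]},
    scoring="neg_root_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)
tuned_rmse = rmse(y_test, search.predict(X_test))

print(scores, search.best_params_, tuned_rmse)
```

Putting the preprocessing inside the pipeline means the encoder and scaler are fitted only on the training folds during cross-validation, which avoids leaking information from the validation data.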

A Flask web app is currently under construction.

NB: I am open to constructive criticism of this project.
