Predicting the price of books from the provided dataset of book features, using various proposed methods and models.

Nikolas Petrou

Last update: Jan 13, 2022

Related tags

Deep Learning visualization python machine-learning books sklearn lightgbm tfidf topic-extraction text-translation catboost price-prediction lda-topic-modeling

Overview

Predict-The-Price-Of-Books

For this task, a large dataset of books from different genres and authors was used. The provided dataset includes various book features, such as Author, Edition, and Reviews. These features were used as regressors to predict the price of books, using various proposed methods and models.

Author: Nikolas Petrou, MSc in Data Science

Technical-Report and Code Availability

A complete file-folder guide is located in the folder-file guide folder.
The technical report and analysis of the work is available in the report.pdf file.
The implementation and code of the project are located in the code files folder.

Dataset Overview

The data for this work comes from an online competition, which has been running since 27/09/2019 and currently has 3579 participants in total. The data was downloaded directly from MachineHack as two files, one for the training set and one for the test set, containing 6237 and 1560 records respectively. The values of the target variable (Price) are not included in the test set, because evaluation on the test set is performed through the MachineHack website.

Methodology

Some of the key methods which were used throughout the work are:

Visualization
TF-IDF and LDA Topic Extraction
Text translation using the Google Translate Ajax API
Cyclical feature encoding for time-based feature extraction
Price Prediction using different conventional and advanced algorithms (e.g. GBM, RF, SVM, CatBoost, LightGBM)
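
To make the cyclical feature encoding item above more concrete, here is a minimal sketch that uses only the Python standard library. It is not taken from the project code: the function name `encode_cyclical` and the choice of month (period 12) as the example feature are assumptions made for illustration. The idea is to map a periodic value onto the unit circle with a sine/cosine pair, so that values at the two ends of the cycle (e.g. December and January) end up close to each other.

```python
import math


def encode_cyclical(value, period):
    """Map a periodic value onto the unit circle as a (sin, cos) pair.

    `value` and `period` are illustrative names (assumption), e.g. a month
    number with period 12.
    """
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


# Hypothetical example: month numbers 1..12 treated as a cycle of length 12.
december = encode_cyclical(12, 12)
january = encode_cyclical(1, 12)
june = encode_cyclical(6, 12)

# After encoding, December is much closer to January than to June,
# which a plain integer encoding (12 vs 1 vs 6) would not reflect.
dist_dec_jan = math.dist(december, january)
dist_dec_jun = math.dist(december, june)
print(dist_dec_jan < dist_dec_jun)
```

Both the sine and the cosine component are needed: with only one of them, two different points in the cycle would share the same encoded value.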

An abstract methodology scheme of the work is illustrated in the following Figure.

In summary, the work began with exploratory data understanding. Each feature was assessed to obtain a better understanding of what it represents and how it could affect book pricing. Next, each feature was brought into a format appropriate for model development. Then, through visualization, it was examined how the different features correlate with the dependent (target) variable. The processed data were then used to fit the employed models. The prediction-modelling phase was conducted with two different approaches. Finally, the whole methodology was cyclical and was repeated until the final prediction model was implemented.
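
As a rough illustration of the step that relates a feature to the target variable, the sketch below computes a Pearson correlation coefficient in plain Python. It is an assumption that a linear correlation measure is a reasonable stand-in here; the report may rely on other measures or purely on plots. The function name `pearson` and the toy numbers are hypothetical and are not taken from the MachineHack data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant sequence.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Toy, made-up values (assumption): a numeric feature and a price column.
toy_feature = [3.5, 4.0, 4.5, 5.0]
toy_price = [200.0, 250.0, 300.0, 350.0]

r = pearson(toy_feature, toy_price)
print(round(r, 6))
```

A value near 1 or -1 suggests a strong linear relationship with the target, while a value near 0 only rules out a linear one, which is why visual inspection remains useful alongside it.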


