A custom IMDB dataset is extracted for 2020-2021 and a custom DistilBERT model is trained on it to predict movie success probability

Gautam Diwan

Last update: Jan 18, 2022

Related tags

Deep Learning IMDB-Success-Predictor

Overview

IMDB Success Predictor

The project involves three steps: web scraping custom IMDB data from 2020 to 2021 for 10,000 movies and shows, sorted by number of votes; fine-tuning a pre-trained DistilBERT Transformer using transfer learning; and saving the trained model so it can be reloaded for later use.

Stack

DistilBERT Transformer
TensorFlow
NumPy and pandas
Selenium, BeautifulSoup4 and requests

Metrics

Accuracy achieved: 81.3492%
ROC_AUC_Score achieved: 0.7217
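
As a minimal sketch of how the two metrics above are defined (not the project's own evaluation code), the following computes accuracy and ROC AUC for a binary classifier in plain Python. The labels and scores are made-up illustration values and the function names are hypothetical; in practice a library routine such as scikit-learn's `accuracy_score` and `roc_auc_score` would normally be used.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the label."""
    hits = sum(int(p >= threshold) == t for t, p in zip(y_true, y_prob))
    return hits / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win (the Mann-Whitney U formulation of ROC AUC).
    """
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = successful) and predicted success probabilities.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]

print(f"accuracy = {accuracy(labels, scores):.4f}")
print(f"roc_auc  = {roc_auc(labels, scores):.4f}")
```

Accuracy depends on the decision threshold, while ROC AUC measures ranking quality across all thresholds, which is why the two numbers can differ noticeably for the same model.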

Installation

1) Ensure Python and Jupyter Notebook are installed. Optionally, a Conda environment can be used.

2) Install the required modules using

pip install -r requirements.txt

or conda install --file requirements.txt

or !pip install -r requirements.txt for Google Colab.

3) Selenium requires browser-specific drivers. Guides for Chrome and Firefox are linked below. This step is optional if the notebook is run on Google Colab.
Chrome: https://chromedriver.chromium.org/getting-started
Firefox: https://www.lambdatest.com/blog/selenium-firefox-driver-tutorial/
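
Before running the scraper locally, it can help to confirm that a driver executable is discoverable. The sketch below only checks the system PATH for the standard executable names `chromedriver` and `geckodriver`; the helper name `find_driver` is hypothetical, and recent Selenium releases may also locate a driver on their own, so treat a negative result as a hint, not an error.

```python
import shutil
from typing import Optional

# Standard executable names for the Chrome and Firefox WebDriver binaries.
DRIVER_EXECUTABLES = {"chrome": "chromedriver", "firefox": "geckodriver"}


def find_driver(browser: str) -> Optional[str]:
    """Return the full path of the driver for `browser`, or None if not on PATH."""
    try:
        executable = DRIVER_EXECUTABLES[browser.lower()]
    except KeyError:
        raise ValueError(f"unsupported browser: {browser!r}") from None
    return shutil.which(executable)


for name in DRIVER_EXECUTABLES:
    path = find_driver(name)
    print(f"{name}: {path if path else 'driver not found on PATH'}")
```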

Training

1) (Optional) Run the IMDB Web scraper. This regenerates the CSV file and the imdb_movies pickle file, both of which are already provided.

2) Run the model training notebook in an environment that has GPU acceleration. Here it was run on Google Colab, where an Nvidia Tesla T4 or Nvidia Tesla K80 is allocated.
```
Training Time: Roughly 20-25 mins
Epochs: 10
Training Batch Size: 8
Max length of each Sentence: 512 
```
A Movie_prediction_model directory is created containing a config.json file (provided) and a tf_model.h5 file (not provided due to space constraints).
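
The training step can be sketched roughly as follows, assuming the Hugging Face `transformers` library (a release that still ships its TensorFlow classes) on top of TensorFlow. This is not the project's notebook: the `distilbert-base-uncased` checkpoint, the learning rate, the shuffle buffer size, and the `TrainConfig` and `fine_tune` names are assumptions for illustration. Only the epochs, batch size, maximum length, and output directory come from the settings above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainConfig:
    epochs: int = 10                             # from the settings above
    batch_size: int = 8                          # from the settings above
    max_length: int = 512                        # from the settings above
    output_dir: str = "Movie_prediction_model"   # from the README
    checkpoint: str = "distilbert-base-uncased"  # assumption
    learning_rate: float = 5e-5                  # assumption


def fine_tune(texts: List[str], labels: List[int], cfg: TrainConfig = TrainConfig()) -> None:
    """Fine-tune DistilBERT for binary classification and save it to cfg.output_dir."""
    # Heavy dependencies are imported here so the config can be inspected without them.
    import tensorflow as tf
    from transformers import DistilBertTokenizerFast, TFDistilBertForSequenceClassification

    # Tokenize descriptions, truncating anything longer than max_length tokens.
    tokenizer = DistilBertTokenizerFast.from_pretrained(cfg.checkpoint)
    encodings = tokenizer(texts, truncation=True, padding=True, max_length=cfg.max_length)

    dataset = tf.data.Dataset.from_tensor_slices((dict(encodings), labels))
    dataset = dataset.shuffle(1000).batch(cfg.batch_size)

    # Pre-trained encoder plus a fresh two-class classification head.
    model = TFDistilBertForSequenceClassification.from_pretrained(cfg.checkpoint, num_labels=2)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=cfg.learning_rate),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    model.fit(dataset, epochs=cfg.epochs)

    # Writes config.json and tf_model.h5 into the output directory.
    model.save_pretrained(cfg.output_dir)
```

Calling `fine_tune(descriptions, success_labels)` with a list of description strings and a matching list of 0/1 labels would run the whole loop; a held-out validation split is omitted here for brevity.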

Usage

1) Ensure the model has been created inside the Movie_prediction_model directory.

2) Run the Python file using python DistilBERT_Movie_Classifier.py
3) Enter the description of the movie or TV show you want a prediction for. The output is a binary prediction of success based on IMDB ratings.
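
A hedged sketch of what the prediction step might look like, assuming Hugging Face `transformers` with TensorFlow. The script's real logic is not shown in this README, so the `distilbert-base-uncased` tokenizer, the convention that class index 1 means "successful", the 0.5 cut-off, and the helper names are all assumptions.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_success(description: str, model_dir: str = "Movie_prediction_model") -> Tuple[int, float]:
    """Return (binary prediction, success probability) for one description."""
    # Imported lazily so the helper above works without TensorFlow installed.
    from transformers import DistilBertTokenizerFast, TFDistilBertForSequenceClassification

    tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")  # assumption
    model = TFDistilBertForSequenceClassification.from_pretrained(model_dir)

    inputs = tokenizer(description, truncation=True, max_length=512, return_tensors="tf")
    logits = model(dict(inputs)).logits[0].numpy().tolist()

    probability = softmax(logits)[1]  # assumption: index 1 is the "successful" class
    return int(probability >= 0.5), probability


# Example call (requires the trained Movie_prediction_model directory):
# label, prob = predict_success("A retired detective is pulled back for one last case.")
```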
