The Kaggle Book
Data analysis and machine learning for competitive data science
Code Repository for The Kaggle Book, Published by Packt Publishing
"Luca and Konradˈs book helps make Kaggle even more accessible. They are both top-ranked users and well-respected members of the Kaggle community. Those who complete this book should expect to be able to engage confidently on Kaggle – and engaging confidently on Kaggle has many rewards." — Anthony Goldbloom, Kaggle Founder & CEO
Get a step ahead of your competitors with a concise collection of smart data handling and modeling techniques
Getting started
You can run these notebooks on cloud platforms like Kaggle or Colab, or on your local machine. Note that most chapters require a GPU, and sometimes even a TPU, to run in a reasonable amount of time, so we recommend one of the cloud platforms, as they come with CUDA pre-installed.
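If you do run locally, it can help to confirm that a GPU is visible before launching a notebook. The following is a minimal sketch, assuming an NVIDIA GPU and the `nvidia-smi` command-line tool that ships with the NVIDIA driver. The function name `nvidia_gpu_visible` is hypothetical and is not part of this repository.

```python
import shutil
import subprocess


def nvidia_gpu_visible() -> bool:
    """Return True if `nvidia-smi` is on PATH and lists at least one GPU.

    Assumption: an NVIDIA driver install that provides `nvidia-smi`.
    Other accelerators (TPUs, non-NVIDIA GPUs) are not detected here.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # Tool not installed, so no NVIDIA driver is available on PATH.
        return False
    try:
        # `nvidia-smi -L` prints one line per detected GPU.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("NVIDIA GPU visible:", nvidia_gpu_visible())
```

If this prints `False`, the notebooks may still run on CPU, but likely far more slowly.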
Running on a cloud platform
To run these notebooks on a cloud platform, just click on one of the badges (Colab or Kaggle) in the table below. The code will be loaded from GitHub directly onto the chosen platform (you may have to add the necessary data before running it). Alternatively, we also provide links to the fully working original notebooks on Kaggle, which you can copy and run immediately.
| no | Chapter | Notebook | Colab | Kaggle |
|---|---|---|---|---|
| 05 | Competition Tasks and Metrics | meta_kaggle | | |
| 06 | Designing Good Validation | adversarial-validation-example | | |
| 07 | Modeling for Tabular Competitions | interesting-eda-tsne-umap | | |
| | | meta-features-and-target-encoding | | |
| | | really-not-missing-at-random | | |
| | | tutorial-feature-selection-with-boruta-shap | | |
| 08 | Hyperparameter Optimization | basic-optimization-practices | | |
| | | hacking-bayesian-optimization-for-dnns | | |
| | | hacking-bayesian-optimization | | |
| | | kerastuner-for-imdb | | |
| | | optuna-bayesian-optimization | | |
| | | scikit-optimize-for-lightgbm | | |
| | | tutorial-bayesian-optimization-with-lightgbm | | |
| 09 | Ensembling with Blending and Stacking Solutions | ensembling | | |
| 10 | Modeling for Computer Vision | augmentations-examples | | |
| | | images-classification | | |
| | | prepare-annotations | | |
| | | segmentation-inference | | |
| | | segmentation | | |
| | | object-detection-yolov5 | | |
| 11 | Modeling for NLP | nlp-augmentations4 | | |
| | | nlp-augmentation1 | | |
| | | qanswering | | |
| | | sentiment-extraction | | |
| 12 | Simulation and Optimization Competitions | connectx | | |
| | | mab-santa | | |
| | | rps-notebook1 | | |
Book Description
Millions of data enthusiasts from around the world compete on Kaggle, the most famous data science competition platform of them all. Participating in Kaggle competitions is a surefire way to improve your data analysis skills, network with the rest of the community, and gain valuable experience to help grow your career.
The first book of its kind, Data Analysis and Machine Learning with Kaggle assembles the techniques and skills you’ll need for success in competitions, data science projects, and beyond. Two masters of Kaggle walk you through modeling strategies you won’t easily find elsewhere, and the tacit knowledge they’ve accumulated along the way. As well as Kaggle-specific tips, you’ll learn more general techniques for approaching tasks based on image data, tabular data, textual data, and reinforcement learning. You’ll design better validation schemes and work more comfortably with different evaluation metrics.
Whether you want to climb the ranks of Kaggle, build some more data science skills, or improve the accuracy of your existing models, this book is for you.
What you will learn
- Get acquainted with Kaggle and other competition platforms
- Make the most of Kaggle Notebooks, Datasets, and Discussion forums
- Understand different modeling tasks including binary and multi-class classification, object detection, NLP (Natural Language Processing), and time series
- Design good validation schemes, learning about k-fold, probabilistic, and adversarial validation
- Get to grips with evaluation metrics including MSE and its variants, precision and recall, IoU, mean average precision at k, as well as never-before-seen metrics
- Handle simulation and optimization competitions on Kaggle
- Create a portfolio of projects and ideas to get further in your career
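
As a rough illustration of a few of the evaluation metrics named in the list above (MSE, precision and recall, IoU), here is a minimal plain-Python sketch. It is not taken from the book's notebooks. The function names and the `(x1, y1, x2, y2)` box format are assumptions made for this example, and in practice you would likely rely on a library such as scikit-learn.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences of numbers."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Convention chosen here: return 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


print(mse([1, 2, 3], [1, 2, 5]))
print(precision_recall([1, 0, 1, 1], [1, 1, 0, 1]))
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```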
Who This Book Is For
This book is suitable for Kaggle users and data analysts/scientists with at least a basic proficiency in data science topics and Python who are trying to do better in Kaggle competitions and secure jobs with tech giants. At the time of completion of this book, there are 96,190 Kaggle novices (users who have just registered on the website) and 67,666 Kaggle contributors (users who have just filled in their profile) enlisted in Kaggle competitions. This book has been written with all of them in mind and with anyone else wanting to break the ice and start taking part in competitions on Kaggle and learning from them.
Table of Contents
Part 1
- Introducing Kaggle and Other Data Science Competitions
- Organizing Data with Datasets
- Working and Learning with Kaggle Notebooks
- Leveraging Discussion Forums
Part 2
- Competition Tasks and Metrics
- Designing Good Validation
- Modeling for Tabular Competitions
- Hyperparameter Optimization
- Ensembling with Blending and Stacking Solutions
- Modeling for Computer Vision
- Modeling for NLP
- Simulation and Optimization Competitions
Part 3
- Creating Your Portfolio of Projects and Ideas
- Finding New Professional Opportunities