Meli Data Challenge 2021 - First Place Solution

Overview

My solution for the Meli Data Challenge 2021, first place in both public and private leaderboards.

The Model

My final model is an ensemble combining recurrent neural networks and XGBoost regressors. The neural networks are trained to predict the probability distribution of stock days, using the ranked probability score (RPS) as the loss function. The XGBoost regressors are trained to predict stock days using different objectives. The intuition behind each objective is:

  • MSE loss: a regressor trained with this loss outputs values close to the expected mean.
  • Pseudo-Huber loss: a smooth alternative to the MAE loss; this regressor outputs values close to the expected median.
  • Quantile loss: 11 regressors are trained using a quantile loss with alpha = 0, 0.1, 0.2, ..., 1. Their outputs help to build the final probability distribution.
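As a rough illustration of the quantile objective, here is a minimal NumPy sketch of the standard pinball (quantile) loss. The function name `pinball_loss` and the `ALPHAS` array are hypothetical names chosen for this example; the actual training scripts may define the objective differently.

```python
import numpy as np


def pinball_loss(y_true, y_pred, alpha):
    """Mean pinball (quantile) loss for quantile level `alpha` in [0, 1].

    Under-predictions are weighted by alpha and over-predictions by
    (1 - alpha), so minimizing it pushes y_pred toward the alpha-quantile.
    """
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(np.maximum(alpha * diff, (alpha - 1.0) * diff))


# The 11 quantile levels mentioned above: 0, 0.1, ..., 1 (hypothetical name).
ALPHAS = np.linspace(0.0, 1.0, 11)
```

With alpha = 0.9, under-predicting by 2 days costs 1.8 while over-predicting by 2 days costs only 0.2, which is why that regressor settles on high estimates.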

The outputs of all these level-0 models are concatenated and used as the input to a level-1 feedforward neural network, which is trained with the RPS as loss function.
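For reference, the RPS compares the predicted cumulative distribution against the step-function CDF of the observed value. The NumPy sketch below uses the unnormalized form (sum of squared CDF differences, averaged over samples); the function name and the 1-based day convention are assumptions made for this example, and the loss in the training code may differ in details such as normalization.

```python
import numpy as np


def ranked_probability_score(probs, true_day):
    """Mean unnormalized RPS.

    probs:    (n_samples, n_days) array; probs[i, k] is the predicted
              probability that sample i runs out of stock on day k + 1.
    true_day: (n_samples,) array of observed days, 1-based (assumption).
    """
    probs = np.asarray(probs, dtype=float)
    true_day = np.asarray(true_day)
    days = np.arange(1, probs.shape[1] + 1)

    predicted_cdf = np.cumsum(probs, axis=1)
    # Observed CDF is 0 before the true day and 1 from the true day onward.
    observed_cdf = (days[None, :] >= true_day[:, None]).astype(float)

    return np.mean(np.sum((predicted_cdf - observed_cdf) ** 2, axis=1))
```

A perfect one-hot prediction scores 0, and the penalty grows with the distance between the predicted mass and the true day, which is what makes the RPS suitable for ordinal targets such as stock days.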

[Figure: diagram of the ensemble]

The last 30 days of the train dataset are used to generate the labels and the target stock input. The remaining 29 days are used to generate the time series input.
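The window split can be pictured with the following sketch, which assumes the sales history is available as a dense array with one row per SKU and one column per day. Both function names are hypothetical, and `days_to_stockout` is only one plausible way to derive a label from a target stock value; the preprocessing scripts in this repository may compute it differently.

```python
import numpy as np


def split_windows(daily_sales, n_label_days=30):
    """Split a (n_skus, n_days) array into history and label windows."""
    history = daily_sales[:, :-n_label_days]  # time series input
    future = daily_sales[:, -n_label_days:]   # used to build the labels
    return history, future


def days_to_stockout(future_sales, target_stock):
    """Hypothetical label: first day on which cumulative sales reach the
    target stock, capped at the window length if stock never runs out."""
    cumulative = np.cumsum(future_sales, axis=1)
    sold_out = cumulative >= np.asarray(target_stock)[:, None]
    first_day = sold_out.argmax(axis=1) + 1  # 1-based day index
    return np.where(sold_out.any(axis=1), first_day, future_sales.shape[1])
```

With 59 days of data per SKU, `split_windows` returns a 29-day history and a 30-day label window, matching the description above.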

The train/validation split is done at the SKU level:

  • For level-0 models: 450,000 SKUs are used for training and the rest for validation.
  • For the level-1 model: the SKUs used for training the level-0 models are removed from the dataset, and the remaining SKUs are split again into train/validation.
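A minimal sketch of this two-stage split is shown below. It assumes a random shuffle of the SKU ids; the function name, the `level1_train_fraction` value, and the seed are all assumptions made for illustration, since the text above does not state how the second split is proportioned.

```python
import numpy as np


def split_skus(sku_ids, n_level0_train=450_000, level1_train_fraction=0.8, seed=0):
    """Split SKU ids for level-0 and level-1 training (illustrative only)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(sku_ids))

    level0_train = shuffled[:n_level0_train]
    held_out = shuffled[n_level0_train:]  # never seen by level-0 training

    # The level-1 model only uses SKUs outside the level-0 training set.
    n_level1_train = int(len(held_out) * level1_train_fraction)
    return {
        "level0_train": level0_train,
        "level0_valid": held_out,
        "level1_train": held_out[:n_level1_train],
        "level1_valid": held_out[n_level1_train:],
    }
```

Keeping the level-1 data disjoint from the level-0 training SKUs means the ensemble is fitted on level-0 predictions that were made out of sample, which avoids rewarding level-0 models that merely overfit.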

Once all models are trained, the last 29 days of the train dataset and the provided target stock values are used as input to generate the submission.

Disclaimer: the entire solution lacks some fine tuning since I came up with this little ensemble monster towards the end of the competition. I didn't have the time to fine-tune each model (there are technically 16 models to tune if we consider each quantile regressor as an independent model).

How to run the solution

Requirements

  • TensorFlow v2.
  • Pandas.
  • Numpy.
  • Scikit-learn.

CUDA drivers and a CUDA-compatible GPU are required (I didn't have the time to test this on a CPU).

Some scripts require up to 30GB of RAM (again, I didn't have the time to implement a more memory-efficient solution).

The solution was tested on Ubuntu 20.04 with Python 3.8.10.

Downloading the dataset

Download the dataset files from https://ml-challenge.mercadolibre.com/downloads and put them into the dataset/ directory.

On Linux, you can do that by running:

cd dataset && wget \
https://meli-data-challenge.s3.amazonaws.com/2021/test_data.csv \
https://meli-data-challenge.s3.amazonaws.com/2021/train_data.parquet \
https://meli-data-challenge.s3.amazonaws.com/2021/items_static_metadata_full.jl

Running the scripts

All-in-one script

A convenient script to run the entire solution is provided:

cd src
./run-solution.sh

Note: the entire process may take more than 3 hours to run.

Step by step

If you have trouble running the all-in-one script, you can run the solution step by step by following the instructions below:

cd into the src directory:

cd src

Extract time series from the dataset:

python3 ./preprocessing/extract-time-series.py

Generate a supervised learning dataset:

python3 ./preprocessing/generate-sl-dataset.py

Train all level-0 models:

python3 ./train-all.py

Train the level-1 ensemble:

python3 ./train-ensemble.py

Generate the submission file and gzip it:

python3 ./generate-submission.py && gzip ./submission.csv

Utility scripts

The training_scripts directory contains scripts to train each model separately. Example usage:

python3 ./training_scripts/train-lstm.py

Owner
Matias Moreyra
Electronics Engineer, Software Developer.