Team nan solution repository for FPT data-centric competition. Data augmentation, Albumentation, Mosaic, Visualization, KNN application

FPT data centric competition

Introduction

Deep learning models have advanced rapidly and become widely used in recent years. Data processing techniques, by contrast, have not developed at the same pace.

In this competition, participants are given a dataset. The goal is to apply data processing techniques to that dataset so that the model achieves the best possible performance after training.

Following the success of the Reinforcement Learning Competition 2021, DataComp is a brand-new competition with a new approach for researchers. DataComp was also created to contribute to the prevention of the Covid-19 pandemic through a face mask recognition model.

Competition link: https://datacomp.io/gioi-thieu

Our performance

Ranked in the top 20 of 400 teams (top 5%) by score on the private test dataset
Our mAP@0.5 score on private test: 0.545
Team name: "nan"
Leaderboard link: https://datacomp.io/bang-xep-hang-cuoi-cung

Methods

We tried many data augmentation techniques, from basic ones such as rotation and shearing to more advanced ones such as mosaic and random safe crop. The library we used is Albumentations. In the end, the combination of techniques below gave the highest final score in our case:

Train dataset -> 934 images, relabeled so that label correctness is above 99%
Validation dataset -> 154 images (designed to be as general as possible by using KNN, as explained below)
toGray augmentation -> 100 images
CutOut + HorizontalFlip (p=0.5) -> 400 images
Images with the incorrect-mask label only + HorizontalFlip (p=0.7) -> 200 images
Mosaic augmentation -> 451 images (Note: after mosaic augmentation, it is crucial to check the set again and exclude all images with poor-quality bboxes at the image edges)
Rotation + Shear (prob 50/50) -> 600 images
- Rotation + Shear (prob 50/50) on no-mask and mask images only -> 200 images
- Remaining images augmented normally -> 400 images
Because the model performs poorly on images where people appear behind a door, we filtered and augmented those training images specifically -> 100 images

--> TOTAL: 2939 images to submit (training + validation, including augmented images)
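
The post-mosaic check mentioned in the list can be partly automated. The sketch below is not our exact procedure, only a minimal illustration under assumptions: labels are taken to be YOLO-style normalized boxes, and the function names and thresholds (`margin`, `min_side`, `max_aspect`) are hypothetical values chosen for the example. It flags an image when a box touches the border and is either very thin or strongly elongated, which is what a face cut off by a mosaic tile boundary tends to look like.

```python
from typing import List, Tuple

# Assumed label format: (class_id, x_center, y_center, width, height),
# with all coordinates normalized to [0, 1] (YOLO style).
Box = Tuple[int, float, float, float, float]


def touches_edge(box: Box, margin: float = 0.01) -> bool:
    """True if any side of the box lies within `margin` of the image border."""
    _, xc, yc, w, h = box
    x0, y0 = xc - w / 2, yc - h / 2
    x1, y1 = xc + w / 2, yc + h / 2
    return x0 <= margin or y0 <= margin or x1 >= 1 - margin or y1 >= 1 - margin


def is_poor_edge_box(box: Box, margin: float = 0.01,
                     min_side: float = 0.02, max_aspect: float = 4.0) -> bool:
    """Flag boxes at the border that are slivers or heavily elongated."""
    _, _, _, w, h = box
    if not touches_edge(box, margin):
        return False
    short, long_ = min(w, h), max(w, h)
    if short <= 0 or short < min_side:
        return True
    return long_ / short > max_aspect


def keep_image(boxes: List[Box], **thresholds: float) -> bool:
    """Keep a mosaic image only if none of its boxes is a poor edge box."""
    return not any(is_poor_edge_box(b, **thresholds) for b in boxes)


# Hypothetical labels for two mosaic images.
clean = [(0, 0.5, 0.5, 0.2, 0.3), (1, 0.1, 0.5, 0.2, 0.3)]
clipped = [(0, 0.5, 0.5, 0.2, 0.3), (1, 0.005, 0.5, 0.01, 0.3)]
print(keep_image(clean), keep_image(clipped))
```

Images rejected by a filter like this are still worth a quick visual check, since the thresholds are a rough heuristic and not a replacement for manual review.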

KNN utilization

Brief introduction to KNN
The application of KNN in our solution

Used to construct a validation dataset that is as general as possible
Used to categorize the images in the training set so that images with a specific feature or characteristic can be filtered faster (e.g. images with people behind doors, images with people wearing different types of masks)
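
As a rough illustration of the categorization use case, the sketch below labels an image with the majority category among its k nearest neighbours in feature space. It is a minimal sketch, not our actual code: the feature vectors, category names, and the functions `k_nearest` and `knn_label` are all hypothetical, and how features are extracted from the images is left open.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# A labeled example: (feature_vector, category). The features here are
# made-up 2-D vectors; in practice they could come from any image descriptor.
Labeled = Tuple[Sequence[float], str]


def k_nearest(query: Sequence[float], labeled: List[Labeled], k: int = 3) -> List[Labeled]:
    """Return the k labeled examples closest to `query` (Euclidean distance)."""
    return sorted(labeled, key=lambda item: math.dist(query, item[0]))[:k]


def knn_label(query: Sequence[float], labeled: List[Labeled], k: int = 3) -> str:
    """Assign the majority category among the k nearest labeled examples."""
    votes = Counter(category for _, category in k_nearest(query, labeled, k))
    return votes.most_common(1)[0][0]


# A few hand-labeled seed images (hypothetical values).
seeds: List[Labeled] = [
    ([0.90, 0.10], "behind_door"),
    ([0.85, 0.15], "behind_door"),
    ([0.80, 0.20], "behind_door"),
    ([0.10, 0.90], "open_street"),
    ([0.20, 0.80], "open_street"),
]

# Categorize an unlabeled training image from its feature vector.
print(knn_label([0.70, 0.30], seeds, k=3))
```

With the training images grouped this way, a category such as the behind-door images can be pulled out for targeted augmentation, and sampling across categories is one possible way to keep a validation set general.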

Owner

Pham Viet Hoang (Harry)
