Data Competition: automated detection of people not wearing masks or wearing masks incorrectly

Overview

Table of contents

  1. Introduction
  2. Dataset
  3. Model & Metrics
  4. How to Run

DATA COMPETITION

The COVID-19 pandemic, caused by the SARS-CoV-2 virus, is still going strong, having infected hundreds of millions of people and killed millions. Face masks reduce transmission by limiting how far aerosols and droplets spread. As a result, there is growing demand for automated systems that can detect whether people are not wearing masks or are wearing them incorrectly. This competition was designed to address that problem, and it differs from previous competitions in one key respect: the model is fixed. Participants receive the model code and the configuration code that the organizers use for training. The candidate's task is to improve the model's performance through data processing and generation alone, then submit the resulting dataset to the organizing team, which trains the model and evaluates it on a private test set. The team with the highest score on the private test set wins.

Dataset

  • You will receive a dataset of 1,100 images of employees at the office, annotated for object detection. We have assigned three labels: no mask, mask, and incorrect mask, encoded as 0, 1, and 2 respectively.

  • The dataset has been divided into three parts for you: train, valid, and public test. We have also prepared a private test set to evaluate each candidate's model; it will be made public after the contest ends. The public test set gives you a basic idea of what the private test looks like. Download the dataset here

  • To improve the model's performance, you may re-label the data and employ data augmentation to generate more images (up to 3,000 in total); a minimal augmentation sketch follows the label table below.

The number of each label in each part is shown below:

             No mask   Mask   Incorrect mask
Train            308    882               51
Val               97    190                9
Public test       47     95               13
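As an illustration of the augmentation step, the sketch below horizontally flips an image and its boxes to produce one extra training sample. It assumes the labels follow the standard YOLOv5 text format (one "class x_center y_center width height" line per box, normalized to [0, 1]); the file paths and the albumentations library are used here only as an example and are not part of the competition code.

# Hedged sketch: generate one augmented copy of a training image.
# Assumes YOLOv5-style labels: "class x_center y_center width height" (normalized).
import albumentations as A
import cv2

transform = A.Compose(
    [A.HorizontalFlip(p=1.0), A.RandomBrightnessContrast(p=0.5)],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

image = cv2.imread("images/train/train_img1.jpg")            # example path
with open("labels/train/train_img1.txt") as f:
    rows = [line.split() for line in f if line.strip()]
classes = [int(r[0]) for r in rows]
boxes = [[float(v) for v in r[1:]] for r in rows]            # x_c, y_c, w, h

out = transform(image=image, bboxes=boxes, class_labels=classes)

cv2.imwrite("images/train/train_img1_aug.jpg", out["image"])
with open("labels/train/train_img1_aug.txt", "w") as f:
    for cls, (xc, yc, w, h) in zip(out["class_labels"], out["bboxes"]):
        f.write(f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}\n")

Any augmentation that keeps the boxes consistent with the images works; the main constraint is staying within the 3,000-image limit.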

Model & Metrics

  • The challenge is defined as an object detection challenge. In the competition, we use YOLOv5s together with a model pre-trained on an easy mask dataset to greatly reduce training time.

  • We fix all of the model's hyperparameters and do not apply any augmentation tricks in the source code. Each participant therefore needs to build the best possible dataset: relabeling incorrect boxes, re-splitting train/val, applying augmentation, adding new data, and so on.

  • During training, early stopping with a patience of 100 iterations is used, tracking the validation set's wAP@0.5. Details of the wAP@0.5 metric:

wAP@0.5 = 0.2 * AP50_w + 0.3 * AP50_nw + 0.5 * AP50_wi

where:
AP50_w: AP@0.5 on mask (correctly worn) boxes
AP50_nw: AP@0.5 on no-mask boxes
AP50_wi: AP@0.5 on incorrect-mask boxes

  • The wAP@0.5 metric is also used as the main metric for evaluating participants' submissions on the private test set.
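To make the weighting concrete, the small sketch below computes the score from the three per-class AP@0.5 values; the function name and the example numbers are illustrative only and not part of the competition code.

# Hedged sketch: weighted AP@0.5 as defined above.
def weighted_ap50(ap50_w: float, ap50_nw: float, ap50_wi: float) -> float:
    """Weights: 0.2 mask, 0.3 no mask, 0.5 incorrect mask."""
    return 0.2 * ap50_w + 0.3 * ap50_nw + 0.5 * ap50_wi

# Example with made-up per-class values:
# 0.2*0.90 + 0.3*0.85 + 0.5*0.60 = 0.735
print(weighted_ap50(0.90, 0.85, 0.60))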

How to Run

QuickStart

Open the QuickStart notebook by clicking the "Open In Colab" badge in the repository.

Install requirements

  • All requirements are included in requirements.txt

  • Run the commands below to clone the repository and install all requirements:

git clone https://github.com/fsoft-ailab/Data-Competition
cd Data-Competition
pip3 install -r requirements.txt

Training

  • Put your dataset into the Data-Competition folder, following the folder structure below (a quick image/label pairing check is sketched after the tree):
folder-name
├── images
│   ├── train
│   │   ├── train_img1.jpg
│   │   ├── train_img2.jpg
│   │   └── ...
│   │   
│   └── val
│       ├── val_img1.jpg
│       ├── val_img2.jpg
│       └── ...
│   
└── labels
    ├── train
    │   ├── train_img1.txt
    │   ├── train_img2.txt
    │   └── ...
    │   
    └── val
        ├── val_img1.txt
        ├── val_img2.txt
        └── ...
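As a sanity check that the tree above is complete, the sketch below verifies that every image has a matching label file. It assumes image and label file stems must match (as in YOLOv5-style datasets) and that "folder-name" stands in for your actual dataset folder.

# Hedged sketch: verify every image under images/<split> has a matching label file.
from pathlib import Path

root = Path("folder-name")          # replace with your dataset folder name
for split in ("train", "val"):
    for img in (root / "images" / split).glob("*.jpg"):
        label = root / "labels" / split / (img.stem + ".txt")
        if not label.exists():
            print(f"missing label for {img.name}")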
  • Change the relative paths to the train and val image folders in the config/data_cfg.yaml file (a short sketch of editing it programmatically appears after the training note below).

  • train_cfg.yaml is where the model is configured for training. You should not change these hyperparameters, as doing so will lead to invalid results. Training results are saved in results/train/<exp_name>/.

  • Run the command below to train the model. Specify a particular name to identify your experiment:

python3 train.py --batch-size 64 --device 0 --name <exp_name>
Note: If you get an out-of-memory error, decrease the batch size to a smaller power of 2, such as 32 or 16.
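As referenced above, the train/val image paths live in config/data_cfg.yaml. The sketch below updates them with PyYAML; the key names train and val are assumed here from the usual YOLOv5-style data config and should be checked against the actual file.

# Hedged sketch: point config/data_cfg.yaml at your dataset folders.
# The 'train'/'val' keys are assumptions; check the real file before running.
import yaml

cfg_path = "config/data_cfg.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

cfg["train"] = "folder-name/images/train"   # relative path to training images
cfg["val"] = "folder-name/images/val"       # relative path to validation images

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)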

Evaluation

  • Run the command below to evaluate on a particular dataset split.
  • The --task value must be one of train, val, or test, which evaluate on the training set, validation set, or public test set respectively.
  • Note: Specify the relative path to the image folder you are evaluating in the config/data_cfg.yaml file.
python3 val.py --weights <path_to_weights> --task test --name <exp_name> --batch-size 64 --device 0
  • Results are saved under results/evaluate/<task>/<exp_name>/.

Detection

  • You can use the command below to run inference on a particular folder of images.

  • Results are saved in the folder you pass via --dir.

python3 detect.py --weights <path_to_weights> --source <path_to_image_folder> --dir <path_to_save_results> --device 0
  • You can find more default arguments in detect.py.
