BankNote-Net: Open dataset and encoder model for assistive currency recognition

Microsoft

Last update: Oct 28, 2022

Overview

BankNote-Net: Open Dataset for Assistive Currency Recognition

Millions of people around the world have low or no vision. Assistive software applications have been developed for a variety of day-to-day tasks, including currency recognition. To aid with this task, we present BankNote-Net, an open dataset for assistive currency recognition. The dataset consists of a total of 24,816 embeddings of banknote images captured in a variety of assistive scenarios, spanning 17 currencies and 112 denominations. These compliant embeddings were learned using supervised contrastive learning and a MobileNetV2 architecture, and they can be used to train and test specialized downstream models for any currency, including those not covered by our dataset or for which only a few real images per denomination are available (few-shot learning). We deploy a variation of this model for public use in the latest version of the Seeing AI app developed by Microsoft, which has over 100 thousand monthly active users.

If you make use of this dataset or pre-trained model in your own project, please consider referencing this GitHub repository and citing our paper:

@article{oviedoBankNote-Net2022,
  title   = {BankNote-Net: Open Dataset for Assistive Currency Recognition},
  author  = {Felipe Oviedo and Srinivas Vinnakota and Eugene Seleznev and Hemant Malhotra and Saqib Shaikh and Juan Lavista Ferres},
  journal = {https://arxiv.org/pdf/2204.03738.pdf},
  year    = {2022},
}

Data Structure

The dataset consists of 256-dimensional vector embeddings with additional columns for currency, denomination and face labels, as explained in the data exploration notebook. The dataset is saved as a 24,826 x 258 flat table in feather and csv file formats. Figure 1 presents some of these learned embeddings.

Figure 1: t-SNE representations of the BankNote-Net embeddings for a few selected currencies.
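
As a minimal sketch of the flat-table layout described above, the following reads a csv table with Python's standard library and separates the embedding columns from the label columns. It assumes that the first 256 columns hold the embedding and that the remaining columns hold the labels; the function name split_rows and the sample column names are hypothetical and should be checked against the data exploration notebook.

```python
import csv
import io

EMBEDDING_DIM = 256  # embedding size stated in the Data Structure section


def split_rows(csv_file, embedding_dim=EMBEDDING_DIM):
    """Return (embeddings, labels) from a csv file object with a header row.

    Assumption: the first `embedding_dim` columns are the embedding and the
    remaining columns are labels. Verify this against the real file.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    label_names = header[embedding_dim:]
    embeddings, labels = [], []
    for row in reader:
        embeddings.append([float(x) for x in row[:embedding_dim]])
        labels.append(dict(zip(label_names, row[embedding_dim:])))
    return embeddings, labels


# Tiny synthetic table with 4-dimensional "embeddings" so the example runs
# without the real dataset. The column names here are made up.
sample = io.StringIO(
    "v_0,v_1,v_2,v_3,Currency,Denomination\n"
    "0.1,0.2,0.3,0.4,AUD,5_1\n"
    "0.5,0.6,0.7,0.8,AUD,10_2\n"
)
embeddings, labels = split_rows(sample, embedding_dim=4)
print(len(embeddings), len(embeddings[0]), labels[0])
```

For the real dataset, the same function could be called on the csv file opened with open(path, newline=""), keeping the default embedding_dim of 256.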

Setup and Dataset Usage

Install requirements.

Please use the conda environment file env.yaml to install the right dependencies.

# Create conda environment
conda env create -f env.yaml

# Activate environment to run examples
conda activate banknote_net

Example 1: Train a shallow classifier directly from the dataset embeddings for a currency available in the dataset. For inference, images should first be encoded using the Keras MobileNetV2 pre-trained encoder model.

Run the following file from root: train_from_embedding.py

python src/train_from_embedding.py --currency AUD --bsize 128 --epochs 25 --dpath ./data/banknote_net.feather

  usage: train_from_embedding.py [-h] --currency
                              {AUD,BRL,CAD,EUR,GBP,INR,JPY,MXN,PKR,SGD,TRY,USD,NZD,NNR,MYR,IDR,PHP}
                              [--bsize BSIZE] [--epochs EPOCHS]
                              [--dpath DPATH]

  Train model from embeddings.

  optional arguments:
  -h, --help            show this help message and exit
  --currency {AUD,BRL,CAD,EUR,GBP,INR,JPY,MXN,PKR,SGD,TRY,USD,NZD,NNR,MYR,IDR,PHP}, --c {AUD,BRL,CAD,EUR,GBP,INR,JPY,MXN,PKR,SGD,TRY,USD,NZD,NNR,MYR,IDR,PHP}
                          String of currency for which to train shallow
                          classifier
  --bsize BSIZE, --b BSIZE
                          Batch size for shallow classifier
  --epochs EPOCHS, --e EPOCHS
                          Number of epochs for training shallow top classifier
  --dpath DPATH, --d DPATH
                          Path to .feather BankNote Net embeddings
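
The contents of train_from_embedding.py are not reproduced here. To illustrate why a small model can work directly on the embeddings, the sketch below uses a nearest-centroid classifier with cosine similarity. This is a deliberately simpler stand-in, not the shallow classifier trained by the script, and the function names and the synthetic data are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fit_centroids(embeddings, labels):
    """Average the normalized embeddings of each class into one centroid."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        vec = l2_normalize(vec)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: l2_normalize([s / counts[label] for s in sums[label]])
        for label in sums
    }


def predict(centroids, vec):
    """Return the label whose centroid has the highest cosine similarity."""
    vec = l2_normalize(vec)
    return max(
        centroids,
        key=lambda label: sum(c * x for c, x in zip(centroids[label], vec)),
    )


# Synthetic 3-dimensional "embeddings" for two made-up denominations.
train_x = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
train_y = ["5", "5", "10", "10"]
centroids = fit_centroids(train_x, train_y)
print(predict(centroids, [0.8, 0.2, 0.0]))  # closest to the "5" examples
```

A centroid-based model has no batch size or epochs to tune, which makes it a convenient baseline before training the classifier from Example 1.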

Example 2: Train a classifier on top of the BankNote-Net pre-trained encoder model using images in a custom directory. Input images must be 224 x 224 pixels in size, and therefore have a square aspect ratio. For this example, we use a couple dozen images spanning 8 classes for Swedish Krona, structured as in the example_images/SEK directory, which contains both training and validation images.

Run the following file from root: train_custom.py

python src/train_custom.py --bsize 4 --epochs 25 --data_path ./data/example_images/SEK/ --enc_path ./models/banknote_net_encoder.h5

usage: train_custom.py [-h] [--bsize BSIZE] [--epochs EPOCHS]
                  [--data_path DATA_PATH] [--enc_path ENC_PATH]

Train model from custom image folder using pre-trained BankNote-Net encoder.

optional arguments:
-h, --help            show this help message and exit
--bsize BSIZE, --b BSIZE
                      Batch size
--epochs EPOCHS, --e EPOCHS
                      Number of epochs for training shallow top classifier.
--data_path DATA_PATH, --data DATA_PATH
                      Path to folder with images.
--enc_path ENC_PATH, --enc ENC_PATH
                      Path to .h5 file of pre-trained encoder model.
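
Because Example 2 requires square 224 x 224 inputs, photos of other sizes need to be cropped or padded before resizing. The following sketch computes a centered square crop box in pure Python. The helper names are hypothetical, and the (left, upper, right, lower) tuple follows the convention of Pillow's Image.crop, which is an assumption about the image library used. Center cropping may cut off part of a banknote, so padding to a square may be preferable for some images.

```python
TARGET_SIZE = 224  # input side length required by Example 2


def center_square_box(width, height):
    """Return (left, upper, right, lower) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


def needs_preprocessing(width, height, target=TARGET_SIZE):
    """True if the image is not already target x target pixels."""
    return (width, height) != (target, target)


print(center_square_box(640, 480))    # (80, 0, 560, 480)
print(needs_preprocessing(224, 224))  # False
```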

Example 3: Perform inference using the SEK few-shot classifier of Example 2 and the validation images in example_images/SEK/val.

Run the following file from root: predict_custom.py. It returns encoded predictions.

  python src/predict_custom.py --bsize 1 --data_path ./data/example_images/SEK/val/ --model_path ./src/trained_models/custom_classifier.h5

  usage: predict_custom.py [-h] [--bsize BSIZE] [--data_path DATA_PATH]
                          [--model_path MODEL_PATH]

  Perform inference using trained custom classifier.

  optional arguments:
  -h, --help            show this help message and exit
  --bsize BSIZE, --b BSIZE
                          Batch size
  --data_path DATA_PATH, --data DATA_PATH
                          Path to custom folder with validation images.
  --model_path MODEL_PATH, --enc MODEL_PATH
                          Path to .h5 file of trained classification model.
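
The output format of predict_custom.py is not shown here. If "encoded predictions" means per-class scores or integer class indices (an assumption), they can be mapped back to class names as sketched below. The class names are hypothetical; sorting them mirrors how Keras directory loaders assign class indices in alphanumeric order of subfolder names, which may or may not be how the script loads its data.

```python
def decode_predictions(scores, class_names):
    """Map each row of class scores to (class name, score) via argmax."""
    decoded = []
    for row in scores:
        best = max(range(len(row)), key=row.__getitem__)
        decoded.append((class_names[best], row[best]))
    return decoded


# Hypothetical class names and scores for two validation images.
class_names = sorted(["100_front", "20_front", "50_front"])
scores = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]
print(decode_predictions(scores, class_names))
```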

License for Dataset and Model

The dataset is open for anyone to use under the CDLA-Permissive-2.0 license. The embeddings should not be used to reconstruct high resolution banknote images.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
