Publication describing 3 ML examples at NSLS-II and interfacing into Bluesky

BNL

Last update: Sep 24, 2022

Machine learning enabling high-throughput and remote operations at large-scale user facilities.

Overview

This repository contains the source code and examples for reproducing the publication arXiv:2201.03550.

Abstract

Imaging, scattering, and spectroscopy are fundamental in understanding and discovering new functional materials. Contemporary innovations in automation and experimental techniques have led to these measurements being performed much faster and with higher resolution, thus producing vast amounts of data for analysis. These innovations are particularly pronounced at user facilities and synchrotron light sources. Machine learning (ML) methods are regularly developed to process and interpret large datasets in real time alongside measurements. However, there remain conceptual barriers to entry for the facility general user community, who often lack expertise in ML, and technical barriers for deploying ML models. Herein, we demonstrate a variety of archetypal ML models for on-the-fly analysis at multiple beamlines at the National Synchrotron Light Source II (NSLS-II). We describe these examples instructively, with a focus on integrating the models into existing experimental workflows, such that the reader can easily include their own ML techniques into experiments at NSLS-II or facilities with a common infrastructure. The framework presented here shows how, with little effort, diverse ML models operate in conjunction with feedback loops via integration into the existing Bluesky Suite for experimental orchestration and data management.

Explanation of Examples

As with all things at a user facility, each model is trained or set up according to the needs of the user and their science. What is consistent across all AI agents is their final communication paradigm: each agent loads and stores the model and/or necessary data, and has, at minimum, the following methods:

tell : tell the agent about some new data
report : construct a report (message, visualization, etc.) about the data
ask : ask the agent what to do next (for more see bluesky-adaptive)
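To make the shared paradigm concrete, here is a minimal sketch of a tell/report/ask interface. It is an illustration only, not the bluesky-adaptive API: the names Agent and RunningMeanAgent, the (x, y) signature of tell, and the dictionary returned by report are all assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Agent(ABC):
    """Hypothetical minimal agent interface; not the bluesky-adaptive base class."""

    @abstractmethod
    def tell(self, x: Any, y: Any) -> None:
        """Record one new observation: independent variable x, dependent variable y."""

    @abstractmethod
    def report(self) -> Dict[str, Any]:
        """Summarize what the agent currently knows about the data."""

    @abstractmethod
    def ask(self, n: int = 1) -> List[Any]:
        """Suggest the next n independent-variable values to measure."""


class RunningMeanAgent(Agent):
    """Toy agent: reports the running mean of y and suggests re-measuring the latest x."""

    def __init__(self) -> None:
        self.xs: List[Any] = []
        self.ys: List[float] = []

    def tell(self, x: Any, y: Any) -> None:
        self.xs.append(x)
        self.ys.append(float(y))

    def report(self) -> Dict[str, Any]:
        mean = sum(self.ys) / len(self.ys) if self.ys else None
        return {"n_observations": len(self.ys), "mean": mean}

    def ask(self, n: int = 1) -> List[Any]:
        # With no data yet there is nothing to suggest.
        return [self.xs[-1]] * n if self.xs else []


agent = RunningMeanAgent()
agent.tell(300.0, 1.0)
agent.tell(310.0, 3.0)
print(agent.report())  # {'n_observations': 2, 'mean': 2.0}
print(agent.ask(2))    # [310.0, 310.0]
```

Keeping the interface this small is what lets a wrapper (a directory watcher, a document-stream callback) drive any model without knowing what is inside it.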

Unsupervised learning (Non-negative matrix factorization)

The NMF companion agent maintains a persistent cache of data on which to perform the reduction. We treat these data as dependent variables, with independent variables coming from the experiment. In the case study presented, the independent variables are temperature measurements and the dependent variables are 1-D spectra. Each call to report recomputes the decomposition using the full dataset and updates the plots in the visualization.
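A minimal sketch of this cache-and-refit pattern, assuming scikit-learn's NMF (scikit-learn is among the listed dependencies). The class name NMFCompanion, the number of components, and clipping negative values to zero are illustrative assumptions rather than the repository's code; a real companion would also redraw its plots from the returned dictionary.

```python
import numpy as np
from sklearn.decomposition import NMF


class NMFCompanion:
    """Hypothetical sketch: cache (x, spectrum) pairs and refit NMF on every report()."""

    def __init__(self, n_components=3):
        self.n_components = n_components
        self.independent = []  # e.g. temperatures
        self.dependent = []    # 1-D spectra, all sampled on the same grid

    def tell(self, x, y):
        self.independent.append(float(x))
        self.dependent.append(np.asarray(y, dtype=float))

    def report(self):
        X = np.vstack(self.dependent)  # shape (n_observations, n_points)
        X = np.clip(X, 0.0, None)      # NMF needs non-negative input (assumption: clip noise)
        model = NMF(n_components=self.n_components, init="nndsvda", max_iter=1000)
        weights = model.fit_transform(X)      # shape (n_observations, n_components)
        order = np.argsort(self.independent)  # sort by x so weights plot cleanly vs. temperature
        return {
            "x": np.asarray(self.independent)[order],
            "weights": weights[order],
            "components": model.components_,  # shape (n_components, n_points)
        }


# Usage with synthetic data: two Gaussian peaks whose mixture shifts with temperature.
grid = np.linspace(0.0, 10.0, 200)
peak_a = np.exp(-((grid - 3.0) ** 2))
peak_b = np.exp(-((grid - 7.0) ** 2))
companion = NMFCompanion(n_components=2)
for t in np.linspace(300.0, 400.0, 10):
    frac = (t - 300.0) / 100.0
    companion.tell(t, (1.0 - frac) * peak_a + frac * peak_b)
result = companion.report()
```

Refitting on the full cache at every report is simple and keeps components consistent across the whole dataset; it becomes expensive only when the cache grows large.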

The NMF companion agent is wrapped in a filesystem watcher, DirectoryAgent, which periodically polls a directory. When new data appear in the target directory, the DirectoryAgent tells the NMF companion about the new data and triggers a new report.
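The watcher pattern can be sketched with the standard library alone. DirectoryWatcher, its glob pattern, and the loader callable (which turns one file into an (x, y) pair) are hypothetical and are not the repository's DirectoryAgent. The sketch works with any agent exposing tell and report. It tracks already-processed files by path, so a file rewritten in place would not be picked up again.

```python
import time
from pathlib import Path
from typing import Callable, Optional, Set


class DirectoryWatcher:
    """Hypothetical polling watcher that feeds new files to an agent's tell/report."""

    def __init__(self, agent, directory, loader: Callable, pattern: str = "*.txt"):
        self.agent = agent
        self.directory = Path(directory)
        self.loader = loader  # hypothetical: maps a Path to an (x, y) pair
        self.pattern = pattern
        self.seen: Set[Path] = set()

    def poll_once(self):
        """Tell the agent about unseen files; return a fresh report, or None if nothing new."""
        new_files = sorted(p for p in self.directory.glob(self.pattern) if p not in self.seen)
        for path in new_files:
            x, y = self.loader(path)
            self.agent.tell(x, y)
            self.seen.add(path)
        return self.agent.report() if new_files else None

    def run(self, interval: float = 5.0, max_polls: Optional[int] = None):
        """Poll every `interval` seconds, forever unless max_polls is given."""
        polls = 0
        while max_polls is None or polls < max_polls:
            self.poll_once()
            polls += 1
            time.sleep(interval)
```

Polling is chosen here over OS-level file notifications because it needs no extra dependencies and behaves the same on network filesystems.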

The construction of these objects, training, and visualization are all contained in the run_unsupervised file and mirrored in the corresponding notebook.

Anomaly detection

The model classifies a new observation as either a normal or an anomalous time series by comparing it to a large corpus of data collected at the beamline over an extended period of time. The model is developed and updated offline. Due to the nature of experimental measurements, anomalous observations may constitute a sizable portion of the data within a single collection period. Thus, the data must be labeled prior to model training. Once trained, the model is saved as a binary file and loaded each time an AnomalyAgent is initialized.

The model operates on a set of features derived from the original raw data, which allows it to process time series of arbitrary length.
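The offline-train, save, and load-at-initialization cycle can be sketched as follows. The six summary statistics, the random forest, and pickle as the binary format are all stand-in assumptions, not the features or model used in the publication. The point is that a fixed-length feature vector lets one classifier handle series of any length. Labels are assumed to be 0 for normal and 1 for anomalous.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def summary_features(series):
    """Hypothetical fixed-length feature vector for a 1-D time series of any length."""
    s = np.asarray(series, dtype=float)
    steps = np.abs(np.diff(s)) if s.size > 1 else np.zeros(1)
    return np.array([s.mean(), s.std(), s.min(), s.max(), s[-1] - s[0], steps.mean()])


def train_and_save(series_list, labels, path):
    """Offline step: fit on labeled series (0 = normal, 1 = anomalous) and pickle the model."""
    X = np.vstack([summary_features(s) for s in series_list])
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
    with open(path, "wb") as f:
        pickle.dump(model, f)


class AnomalySketch:
    """Hypothetical agent that loads the saved binary model once, at initialization."""

    def __init__(self, path):
        with open(path, "rb") as f:
            self.model = pickle.load(f)
        self.last_label = None

    def tell(self, series):
        features = summary_features(series).reshape(1, -1)  # one sample, six features
        self.last_label = int(self.model.predict(features)[0])

    def report(self):
        return {"anomalous": self.last_label == 1}
```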

The training code is in run_anomaly.py, with example deployment infrastructure in deploy_anomaly.py.

Supervised learning (Failure Classification)

Failure classification involves training the models entirely offline. This allows for robust model selection and targeted deployment. A suite of scikit-learn models is trained and tested, and the most promising model is chosen for deployment. Since these models are lightweight, we re-train them on the most current dataset at each instantiation during deployment. For deep learning models, it would be appropriate to save and version the model weights, then construct the model at instantiation and load those weights.
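A sketch of offline model selection followed by retraining at instantiation, using scikit-learn's cross_val_score and clone. The three candidate models, five-fold cross-validation, and the names CANDIDATES, select_model, and ClassificationSketch are illustrative assumptions; the repository's actual candidate suite and scoring may differ.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical candidate suite; any scikit-learn classifiers could be listed here.
CANDIDATES = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}


def select_model(X, y, cv=5):
    """Offline step: score every candidate by mean cross-validated accuracy, pick the best."""
    scores = {name: cross_val_score(est, X, y, cv=cv).mean() for name, est in CANDIDATES.items()}
    best = max(scores, key=scores.get)
    return best, scores


class ClassificationSketch:
    """Hypothetical deployed agent: re-trains the chosen lightweight model on construction."""

    def __init__(self, model_name, X_train, y_train):
        # clone() gives an unfitted copy, so the shared CANDIDATES entry is never mutated.
        self.model = clone(CANDIDATES[model_name]).fit(X_train, y_train)
        self.last = None

    def tell(self, x):
        self.last = self.model.predict(np.asarray(x, dtype=float).reshape(1, -1))[0]

    def report(self):
        return {"prediction": self.last}
```

Re-training at construction keeps deployment free of saved artifacts; for a deep learning model one would instead load versioned weights in `__init__`.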

The training code is in run_supervised.py, with example deployment infrastructure in deploy_supervised.py. A concise example of how this is implemented at the BMM beamline can be found here: a wrapper agent performs pointwise evaluation on the UIDs of a document stream, using the ClassificationAgent's tell-report interface.
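A pointwise wrapper can be sketched as a callable that follows the bluesky callback convention, callback(name, doc). When a run's stop document arrives, the wrapper takes the start document's uid (stored in the stop document's run_start field), fetches that run's features, and passes them through the agent's tell and report. PointwiseEvaluator and fetch_features are hypothetical; in practice fetching features for a uid would mean reading the run from a data source such as databroker or Tiled.

```python
from typing import Any, Callable, Dict


class PointwiseEvaluator:
    """Hypothetical wrapper: evaluate each successfully completed run by its uid."""

    def __init__(self, agent, fetch_features: Callable[[str], Any]):
        self.agent = agent
        self.fetch_features = fetch_features  # hypothetical: run uid -> feature vector
        self.reports: Dict[str, Dict[str, Any]] = {}

    def __call__(self, name: str, doc: Dict[str, Any]) -> None:
        # Only stop documents mark a finished run; skip runs that failed or were aborted.
        if name != "stop" or doc.get("exit_status") != "success":
            return
        uid = doc["run_start"]  # a stop document points back to its start document's uid
        self.agent.tell(self.fetch_features(uid))
        self.reports[uid] = self.agent.report()
```

In a live session such a callable could be registered with `RE.subscribe(evaluator)`, where `RE` is a bluesky RunEngine.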

System Requirements

Hardware Requirements

Software Requirements

OS Requirements

This package has been tested exclusively on Linux operating systems.

RHEL 8.3
Ubuntu 18.04
PopOS 20.04

Python dependencies

numpy
matplotlib
scikit-learn
ipython

Getting Started

Installation guide

Install from GitHub:

$ python3 -m venv pub_env
$ source pub_env/bin/activate

Comments

General repository and package updates
There are a few small tasks that need cleaning up, including:

[x] Make the Python project name consistently bnl_ml_examples

[x] Remove unused files generated by the cookie cutter.
opened by stuartcampbell
General code quality updates
Just some general minor updates for things such as:

Fix some typos

Fix some code quality warnings

Formatted files with black

Added codeql code security scanning

If you don't like the black code formatting, please let me know and I will undo it in the PR.
opened by stuartcampbell
Generate .py files from Jupyter notebooks automatically

To ensure that the raw Python files and the notebooks do not diverge, we should treat the notebooks as the source of truth and generate the .py files automatically.

We should add a comment at the top of each generated file stating that it should not be edited.

opened by stuartcampbell
Make deploy_supervised.py work standalone

My current plan is to remove the dependencies on the BMM profile. For reading data, I am going to use our Tiled demo server (https://tiled-demo.nsls2.bnl.gov).

opened by stuartcampbell
