Publication describing 3 ML examples at NSLS-II and interfacing into Bluesky

Overview

Machine learning enabling high-throughput and remote operations at large-scale user facilities.

Overview

This repository contains the source code and examples for recreating the publication at arXiv:2201.03550.

Abstract

Imaging, scattering, and spectroscopy are fundamental in understanding and discovering new functional materials. Contemporary innovations in automation and experimental techniques have led to these measurements being performed much faster and with higher resolution, thus producing vast amounts of data for analysis. These innovations are particularly pronounced at user facilities and synchrotron light sources. Machine learning (ML) methods are regularly developed to process and interpret large datasets in real-time with measurements. However, there remain conceptual barriers to entry for the facility general user community, whom often lack expertise in ML, and technical barriers for deploying ML models. Herein, we demonstrate a variety of archetypal ML models for on-the-fly analysis at multiple beamlines at the National Synchrotron Light Source II (NSLS-II). We describe these examples instructively, with a focus on integrating the models into existing experimental workflows, such that the reader can easily include their own ML techniques into experiments at NSLS-II or facilities with a common infrastructure. The framework presented here shows how with little effort, diverse ML models operate in conjunction with feedback loops via integration into the existing Bluesky Suite for experimental orchestration and data management.

Explanation of Examples

As with all things at a user facility, each model is trained or set-up according to the needs of the user and their science. What is consistent across all AI agents, is their final communication paradigm. The agent loads and stores the model and/or necessary data, and has at minimum the following methods.

  • tell : tell the agent about some new data
  • report : construct a report (message, visualization, etc.) about the data
  • ask : ask the agent what to do next (for more see bluesky-adaptive)

Unsupervised learning (Non-negative matrix factorization)

The NMF companion agent keeps a constant cache of data to perform the reduction on. We treat these data as dependent variables, with independent variables coming fom the experiment. In the case study presented, the independent variables are temperature measurements, and the dependent variables are the 1-d spectra. Each call to report updates the decomposition using the full dataset, and updates the plots in the visualization.

The NMF companion agent is wrapped in a filesystem watcher, DirectoryAgent, which monitors a directory periodically. If there is new data in the target directory, the DirectoryAgent tells the NMF companion about the new data, and triggers a new report.

The construction of these objects, training, and visualization are all contained in the run_unsupervised file and mirrored in the corresponding notebook.

Anomaly detection

The model attributes a new observation to either normal or anomalous time series by comparing it to a large courpus of data collected at the beamline over an extended period of time. The development and updating of the model is done offline. Due to the nature of exparimental measurements, anomalous observatons may constitute a sizable portion of data withing a single collection period. Thus, a labeling of the data is required prior to model training. Once the model is trained it is saved as a binary file and loaded each time when AnomalyAgent is initialized.

A set of features devired from the original raw data, allowing the model to process time series of arbitary length.

The training can be found at run_anomaly.py with example deployment infrastructure at deploy_anomaly.py.

Supervised learning (Failure Classification)

The classifications of failures involves training the models entirely offline. This allows for robust model selection and specific deployment. A suite of models from scikit-learn are trained and tested, with the most promising model chosen to deploy. Since the models are lightweight, we re-train them at each instantiation during deployment with the most current dataset. For deep learning models, it would be appropriate to save and version the weights of a model, can construct the model at instantiation and load the weights.

The training can be found at run_supervised.py with example deployment infrastructure at deploy_supervised.py. How this is implemented at the BMM beamline can be found concisely here, where a wrapper agent does pointwise evaluation on UIDs of a document stream, using the ClassificationAgent's tell--report interface.

System Requirements

Hardware Requirements

Software Requirements

OS Requirements

This package has been tested exclusively on Linux operating systems.

  • RHEL 8.3
  • Ubuntu 18.04
  • PopOS 20.04

Python dependencies

  • numpy
  • matplotlib
  • scikit-learn
  • ipython

Getting Started

Installation guide

Install from github:

$ python3 -m venv pub_env
$ source pub_env/bin/activate
You might also like...
Experiments and examples converting Transformers to ONNX

Experiments and examples converting Transformers to ONNX This repository containes experiments and examples on converting different Transformers to ON

Pythonic particle-based (super-droplet) warm-rain/aqueous-chemistry cloud microphysics package with box, parcel & 1D/2D prescribed-flow examples in Python, Julia and Matlab
Pythonic particle-based (super-droplet) warm-rain/aqueous-chemistry cloud microphysics package with box, parcel & 1D/2D prescribed-flow examples in Python, Julia and Matlab

PySDM PySDM is a package for simulating the dynamics of population of particles. It is intended to serve as a building block for simulation systems mo

Adversarial-Information-Bottleneck - Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck (NeurIPS21) Universal Adversarial Examples in Remote Sensing: Methodology and Benchmark
Universal Adversarial Examples in Remote Sensing: Methodology and Benchmark

Universal Adversarial Examples in Remote Sensing: Methodology and Benchmark Yong

Unadversarial Examples: Designing Objects for Robust Vision
Unadversarial Examples: Designing Objects for Robust Vision

Unadversarial Examples: Designing Objects for Robust Vision This repository contains the code necessary to replicate the major results of our paper: U

Several simple examples for popular neural network toolkits calling custom CUDA operators.
Several simple examples for popular neural network toolkits calling custom CUDA operators.

Neural Network CUDA Example Several simple examples for neural network toolkits (PyTorch, TensorFlow, etc.) calling custom CUDA operators. We provide

A repo with study material, exercises, examples, etc for Devnet SPAUTO
A repo with study material, exercises, examples, etc for Devnet SPAUTO

MPLS in the SDN Era -- DevNet SPAUTO Get right to the study material: Checkout the Wiki! A lab topology based on MPLS in the SDN era book used for 30

🐸STT integration examples

🐸 STT 0.9.x Examples These are various examples on how to use or integrate 🐸 STT using our packages. It is a good way to just try out 🐸 STT before

transfer attack; adversarial examples; black-box attack; unrestricted Adversarial Attacks on ImageNet; CVPR2021 天池黑盒竞赛
transfer attack; adversarial examples; black-box attack; unrestricted Adversarial Attacks on ImageNet; CVPR2021 天池黑盒竞赛

transfer_adv CVPR-2021 AIC-VI: unrestricted Adversarial Attacks on ImageNet CVPR2021 安全AI挑战者计划第六期赛道2:ImageNet无限制对抗攻击 介绍 : 深度神经网络已经在各种视觉识别问题上取得了最先进的性能。

Comments
  • General repository and package updates

    General repository and package updates

    There are a few smallish tasks that need cleaning up, this includes:

    • [x] make the python project name consistent to be bnl_ml_examples
    • [x] Remove unused files generated by the cookie cutter.
    opened by stuartcampbell 0
  • General code quality updates

    General code quality updates

    Just some general minor updates for things such as:

    • Fix some typos
    • Fix some code quality warnings
    • Formatted files with black
    • Added codeql code security scanning

    If you don't like the black code formatting then please let me know and I will undo it in the PR.

    opened by stuartcampbell 0
  • Generate .py files from Jupyter notebooks automatically

    Generate .py files from Jupyter notebooks automatically

    To ensure that the raw python and notebooks do not diverge, we should treat the notebook as the source of truth and generate the .py files automatically.

    Should add a comment to the top of the generated files to say that they should not be edited.

    opened by stuartcampbell 0
  • Make deploy_supervised.py work standalone

    Make deploy_supervised.py work standalone

    My current plan is to remove the dependencies on the BMM profile. For reading data, I am going to read from our tiled demo server (https://tiled-demo.nsls2.bnl.gov)

    opened by stuartcampbell 1
Owner
BNL
Brookhaven National Laboratory
BNL
Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode

?? Transformers Wav2Vec2 + PyCTCDecode Introduction This repo shows how ?? Transformers can be used in combination with kensho-technologies's PyCTCDec

Patrick von Platen 102 Oct 22, 2022
ColossalAI-Examples - Examples of training models with hybrid parallelism using ColossalAI

ColossalAI-Examples This repository contains examples of training models with Co

HPC-AI Tech 185 Jan 9, 2023
Code in conjunction with the publication 'Contrastive Representation Learning for Hand Shape Estimation'

HanCo Dataset & Contrastive Representation Learning for Hand Shape Estimation Code in conjunction with the publication: Contrastive Representation Lea

Computer Vision Group, Albert-Ludwigs-Universität Freiburg 38 Dec 13, 2022
constructing maps of intellectual influence from publication data

Influencemap Project @ ANU Influence in the academic communities has been an area of interest for researchers. This can be seen in the popularity of a

CS Metrics 13 Jun 18, 2022
Source code, datasets and trained models for the paper Learning Advanced Mathematical Computations from Examples (ICLR 2021), by François Charton, Amaury Hayat (ENPC-Rutgers) and Guillaume Lample

Maths from examples - Learning advanced mathematical computations from examples This is the source code and data sets relevant to the paper Learning a

Facebook Research 171 Nov 23, 2022
Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.

PyTorch Implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers 1 Using Colab Please notic

Hila Chefer 489 Jan 7, 2023
Pre-trained model, code, and materials from the paper "Impact of Adversarial Examples on Deep Learning Models for Biomedical Image Segmentation" (MICCAI 2019).

Adaptive Segmentation Mask Attack This repository contains the implementation of the Adaptive Segmentation Mask Attack (ASMA), a targeted adversarial

Utku Ozbulak 53 Jul 4, 2022
A set of examples around hub for creating and processing datasets

Examples for Hub - Dataset Format for AI A repository showcasing examples of using Hub Uploading Dataset Places365 Colab Tutorials Notebook Link Getti

Activeloop 11 Dec 14, 2022
Autonomous Ground Vehicle Navigation and Control Simulation Examples in Python

Autonomous Ground Vehicle Navigation and Control Simulation Examples in Python THIS PROJECT IS CURRENTLY A WORK IN PROGRESS AND THUS THIS REPOSITORY I

Joshua Marshall 14 Dec 31, 2022
Numba-accelerated Pythonic implementation of MPDATA with examples in Python, Julia and Matlab

PyMPDATA PyMPDATA is a high-performance Numba-accelerated Pythonic implementation of the MPDATA algorithm of Smolarkiewicz et al. used in geophysical

Atmospheric Cloud Simulation Group @ Jagiellonian University 15 Nov 23, 2022