The hippynn python package - a modular library for atomistic machine learning with pytorch.

Los Alamos National Laboratory

Last update: Dec 29, 2022


Overview


We aim to provide a powerful library for training atomistic (or physical point-cloud) machine learning models. We want entry-level users to be able to train models efficiently on millions of data points, and we want a modular structure that makes extension and contribution easy.

While hippynn's development so far has centered around the HIP-NN architecture, don't let that discourage you if you are doing research with another model. Get in touch, and let's work together to provide a high-quality implementation of your work, either as a contribution or as an interface extension to your own package.

Features:

Modular set of pytorch layers for atomistic operations

Atomistic operations can be tricky to write in native pytorch. Most operations provided here support linear-scaling models.
Model energies, forces, charges & charge moments, bond orders, and more!
nn.Modules are written with minimal reference to the rest of the library; if you want to use them in your scripts without using the rest of the features provided here -- no problem!
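To illustrate the kind of atomistic operation meant here, the sketch below finds all atom pairs within a cutoff using a cell list. For a fixed density, this keeps the cost roughly linear in the number of atoms. It is a plain-Python stand-in written for this explanation, not hippynn's pair-finding code (which works on pytorch tensors). The function name `pairs_within_cutoff` is hypothetical, and periodic boundaries are ignored.

```python
import itertools
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float, float]


def pairs_within_cutoff(positions: Sequence[Vec], cutoff: float) -> List[Tuple[int, int, float]]:
    """Return sorted (i, j, distance) tuples with i < j and distance < cutoff.

    Atoms are binned into cubic cells with edge length `cutoff`, so each atom is only
    compared with atoms in its own cell and the 26 neighbouring cells.
    """
    cells: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for idx, pos in enumerate(positions):
        cells[tuple(math.floor(c / cutoff) for c in pos)].append(idx)

    pairs = []
    for cell, members in cells.items():
        for offset in itertools.product((-1, 0, 1), repeat=3):
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            for i in members:
                # cells.get does not insert empty cells, so iteration stays safe
                for j in cells.get(neighbor, ()):
                    if i < j:  # count each unordered pair once
                        d = math.dist(positions[i], positions[j])
                        if d < cutoff:
                            pairs.append((i, j, d))
    return sorted(pairs)


print(pairs_within_cutoff([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)], 1.5))  # [(0, 1, 1.0)]
```

Because any two atoms closer than `cutoff` always fall in the same or adjacent cells, the result matches a brute-force search over all pairs.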

Graph level API for simple and flexible construction of models from pytorch components.

Build models based on the abstract physics/mathematics of the problem, without having to think about implementation details.
Graph nodes support native Python syntax; for example, different forms of loss can be added directly.
Link predicted values in the model with a database entry to compare predicted and true values
IndexType logic records metadata about tensor structure, and provides automatic conversion to compatible structures when possible.
Graph API is independent of module implementation.
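The statement that graph nodes support native Python syntax can be illustrated with a toy class that overloads `+` and `*` to build an expression graph lazily. The `Node` class and its `evaluate` method are invented for this sketch and are not hippynn's graph classes. They only show the operator-overloading mechanism that lets loss terms be added directly.

```python
from typing import Callable, Dict, Optional, Tuple


class Node:
    """Toy graph node: stores an operation and its parents; nothing runs until evaluate()."""

    def __init__(self, name: str, op: Optional[Callable[..., float]] = None,
                 parents: Tuple["Node", ...] = ()):
        self.name = name
        self.op = op
        self.parents = parents

    def __add__(self, other: "Node") -> "Node":
        # Adding two nodes builds a new node instead of computing a number.
        return Node(f"({self.name} + {other.name})", lambda a, b: a + b, (self, other))

    def __rmul__(self, scale: float) -> "Node":
        # Supports `weight * node`, e.g. to weight one loss term against another.
        return Node(f"{scale} * {self.name}", lambda a: scale * a, (self,))

    def evaluate(self, values: Dict[str, float]) -> float:
        if self.op is None:  # leaf node: look up its value by name
            return values[self.name]
        return self.op(*(parent.evaluate(values) for parent in self.parents))


# Two hypothetical loss terms combined with ordinary Python operators.
energy_rmse = Node("energy_rmse")
force_mae = Node("force_mae")
total_loss = energy_rmse + 10.0 * force_mae
print(total_loss.name)  # (energy_rmse + 10.0 * force_mae)
print(total_loss.evaluate({"energy_rmse": 0.5, "force_mae": 0.02}))
```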

Plot level API for tracking your training.

Using the graph API, define quantities to evaluate before, during, or after training as figures using matplotlib.

Training & Experiment API

Integrated with graph level API
Pretty-printing loss metrics, generating plots periodically
Callbacks and checkpointing

Custom Kernels for fast execution

Certain operations cannot be written efficiently in pure pytorch; we provide alternative implementations with numba.
These are directly linked in with pytorch Autograd -- use them like native pytorch functions.
These provide advantages in memory footprint and speed
Includes CPU and GPU execution for custom kernels

Interfaces

ASE: Define ASE calculators based on the graph-level API.
PYSEQM: Use PYSEQM calculations as nodes in a graph.

Installation

Clone this repository and navigate into it.
Run pip install .

If you feel like tinkering, do an editable install: pip install -e .

You can install all optional dependencies from pip with: pip install -e .[full]

Notes

Install dependencies with pip from requirements.txt .
Install dependencies with conda from conda_requirements.txt .
If you don't want pip to install them, conda install from file before installing hippynn. You may want to use -c pytorch for the pytorch channel. For ase, you may want to use -c conda-forge.
Optional dependencies are in optional_dependencies.txt

We are currently under development. At the moment you should be prepared for breaking changes -- keep track of what version you are using if you need to maintain consistency.

As we clean up the rough edges, we are preparing a manuscript. In the meantime, if you use hippynn in your work, please cite this repository and the HIP-NN paper:

Lubbers, N., Smith, J. S., & Barros, K. (2018). Hierarchical modeling of molecular energies using a deep neural network. The Journal of Chemical Physics, 148(24), 241715.

See AUTHORS.txt for information on authors.

See LICENSE.txt for licensing information. hippynn is licensed under the BSD-3 license.

Triad National Security, LLC (Triad) owns the copyright to hippynn, which it identifies as project number LA-CC-19-093.

Copyright 2019. Triad National Security, LLC. All rights reserved. This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S. Department of Energy/National Nuclear Security Administration. All rights in the program are reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear Security Administration. The Government is granted for itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so.

Comments

Fix a problem with torch RNG when restarting
When a model is reloaded to a different CUDA device, the error TypeError: RNG state must be a torch.ByteTensor may be raised.

For example, suppose the model was originally trained on GPU 1 and you now want to load it onto GPU 0. Per the torch docs, you can call load_checkpoint_from_cwd(map_location={'cuda:1':'cuda:0'}), which works fine. Unfortunately, this requires users to know which GPU was used, which is a problem for automation.

However, load_checkpoint_from_cwd(map_location=torch.device(0)) and load_checkpoint_from_cwd(map_location=lambda storage, loc: storage.cuda(0)) both raise TypeError: RNG state must be a torch.ByteTensor.

The related code is in hippynn.experiment.serialization.restore_checkpoint:

torch.random.set_rng_state(state["torch_rng_state"])

and in torch.random.set_rng_state:

def set_rng_state(new_state: torch.Tensor) -> None:
    r"""Sets the random number generator state.

    .. note: This function only works for CPU. For CUDA, please use
        torch.manual_seed(seed), which works for both CPU and CUDA.

    Args:
        new_state (torch.ByteTensor): The desired state
    """
    default_generator.set_state(new_state)

With the latter two forms of map_location, state["torch_rng_state"] ends up as a tensor on GPU 0:

tensor([13, 78, 72, ..., 0, 0, 0], device='cuda:0', dtype=torch.uint8)

but with {'cuda:1':'cuda:0'} it stays on the CPU:

tensor([13, 78, 72, ..., 0, 0, 0], dtype=torch.uint8)

Forcing the tensor back onto the CPU solves the problem, and the RNG state originally lives on the CPU anyway.
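Here is a minimal sketch of that fix. It assumes the checkpoint has already been loaded into a dict `state` with the key `"torch_rng_state"`, as in `restore_checkpoint`. The helper name `restore_torch_rng_state` is hypothetical. The important part is the `.cpu()` call, which is harmless when the tensor is already on the CPU.

```python
import torch


def restore_torch_rng_state(state: dict) -> None:
    """Restore the CPU RNG from a checkpoint dict (hypothetical helper)."""
    rng_state = state["torch_rng_state"]
    # map_location may have moved this ByteTensor to a CUDA device, but
    # torch.random.set_rng_state only accepts a CPU tensor, so move it back first.
    torch.random.set_rng_state(rng_state.cpu())


# Round trip: save the state, draw numbers, restore, and draw the same numbers again.
state = {"torch_rng_state": torch.random.get_rng_state()}
first = torch.rand(3)
restore_torch_rng_state(state)
second = torch.rand(3)
print(torch.equal(first, second))  # True
```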
opened by tautomer
Proper device handling for restart
@lubbersnick I think it's still too early to merge, for two reasons:

More tests are needed. I haven't found any issues, but I may have missed something.

Doc update is still missing.

However, I decided to open this PR so that at least you can review the code, and someone might be able to test it.

The core part is basically the same as what we discussed, but I split the code into multiple functions.

Two hidden functions

__check_mapping_devices(map_location, model_device), which checks the options and assigns values if necessary.

__load_saved_tensors(structure_fname, state_fname, **kwargs) which loads the tensors from disk

load_checkpoint and load_model_from_cwd call these two functions. In fact, the first few lines of loading a checkpoint and loading a model are exactly the same. Is it worth wrapping these lines again?

Additionally, I looked at experiment.routines.set_devices and mimicked its behavior:

structure["training_modules"].model.to(model_device)
structure["training_modules"].loss.to(model_device)
structure["training_modules"].evaluator.model_device = model_device
structure["training_modules"].evaluator.model = structure["training_modules"].model

Not sure if the last two lines are important or not.

Tests were run with a model trained on GPU 1. Here are the results:

| Options | Expected behavior | Actual behavior |
|---|---|---|
| load_checkpoint_from_cwd(map_location={"cuda:1": "cuda:0"}) | model on GPU 0 | model on GPU 0 |
| load_checkpoint_from_cwd(map_location=torch.device(0)) | failure because of rng_state | failure because of rng_state |
| load_checkpoint_from_cwd(model_device=2) | model on GPU 2 | model on GPU 2 |
| load_checkpoint_from_cwd(model_device="auto") | model on GPU 0 | model on GPU 0 |
| load_checkpoint_from_cwd(model_device="cpu") | model on CPU | model on CPU |
| load_checkpoint_from_cwd() | model on GPU 1 | model on GPU 1 |

Similar tests were done for load_model_from_cwd as well.
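The model_device options in the table can be thought of as a small normalization step. The following is an illustration, not the PR's __check_mapping_devices. The function name and the `cuda_available` parameter are hypothetical. Mapping "auto" to GPU 0 follows the table, while falling back to the CPU when no GPU exists is an assumption.

```python
from typing import Optional, Union


def normalize_model_device(model_device: Union[int, str, None],
                           cuda_available: bool) -> Optional[str]:
    """Map a model_device option to a torch-style device string (illustrative only)."""
    if model_device is None:
        return None  # leave tensors where the checkpoint put them
    if model_device == "auto":
        # "auto" picked GPU 0 in the tests; the CPU fallback is an assumption.
        return "cuda:0" if cuda_available else "cpu"
    if isinstance(model_device, int):
        return f"cuda:{model_device}"  # e.g. model_device=2 -> "cuda:2"
    return model_device  # already a string such as "cpu" or "cuda:1"


print(normalize_model_device(2, cuda_available=True))       # cuda:2
print(normalize_model_device("auto", cuda_available=True))  # cuda:0
```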

One thing to note: restoring the database always keeps the database on the CPU, so my concern was totally unnecessary. Even doing map_location={"cuda:1": "cuda:0"} will keep the database originally on GPU 1 to GPU 0.
opened by tautomer
Auto build docs via Actions
There are currently two ways to trigger this action:

a push to main

manual trigger from the Actions tab

The new target in the Makefile is named "html_all", and it runs make apidoc && make html. The original make html should work as before. "all" alone would be ambiguous, since it may refer to all file types.

I think it makes sense to add this directly to main, or to an orphan branch that only hosts the actions.

A manual trigger only works on the default branch. If we add this to another branch, there is no way to trigger the action manually unless you switch the default to that branch. And since there is no push to main at the moment, we wouldn't have any docs built for a long time.

All actions in different branches will (probably) be triggered at the same time, so we should either keep a single copy of the actions on main, or keep one copy on a separate branch that is used only to host all kinds of actions. Let me know if you want to go with the second option. We can create an orphan branch for this to minimize "pollution" of the main codebase.

To make this work, you need to create an orphan branch first. Let's say it's called gh-pages, as in the action. Since you need something to commit, you can create a dummy index.html:

git switch --orphan gh-pages
touch index.html
git add .
git commit -m 'initial commit'
git push --set-upstream origin gh-pages

Then, on GitHub, point Pages to this branch.

Once this PR is merged to main, https://lanl.github.io/hippynn/ should show our docs. The action will take about 3 minutes to finish.

Future CI should follow almost the same workflow, but we will want to check the main branch, the development branch, and PRs.
opened by tautomer
'Glue-on' Method for damping Coulomb Interactions Locally.

Work in Progress; not ready to merge.

Uses the 'complement' of the cosine cutoff in HIP-NN to cross over smoothly from short-range HIP-NN energy predictions to long-range Coulomb energy predictions.

No long-range regularization (Wolf, shifted-force, Ewald) is implemented in this class. Perhaps it would be best to have separate classes for short-range 'damping' and long-range 'screening', which could then be combined by the ScreenedCoulombEnergy?
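The crossover idea can be sketched for a single pair of charges. This sketch assumes a cosine cutoff of the form f(r) = 0.5 * (cos(pi * r / r_c) + 1) for r < r_c and 0 beyond. hippynn's own cutoff function may use a different form, and the function names here are hypothetical. The Coulomb term is multiplied by 1 - f(r), so it vanishes at short range (where the HIP-NN energy applies) and reaches full strength for r >= r_c. As in the PR, no long-range regularization is included.

```python
import math


def cosine_cutoff(r: float, r_cut: float) -> float:
    """Assumed cutoff form: 1 at r = 0, decaying smoothly to 0 at r = r_cut."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def local_damping(r: float, r_cut: float) -> float:
    """Complement of the cutoff: 0 at r = 0, rising to 1 for r >= r_cut."""
    return 1.0 - cosine_cutoff(r, r_cut)


def damped_coulomb_pair(q_i: float, q_j: float, r: float, r_cut: float, k: float = 1.0) -> float:
    """Coulomb pair energy switched off smoothly at short range (k is the Coulomb constant)."""
    return k * q_i * q_j * local_damping(r, r_cut) / r


for r in (0.5, 2.5, 4.5, 6.0):
    print(r, damped_coulomb_pair(1.0, -1.0, r, r_cut=5.0))
```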

opened by sakibmatin
LAMMPS ML-IAP Interface
Add LAMMPS ML-IAP interface based on the LAMMPS ML-IAP Unified Interface.

hippynn/interfaces/lammps_interface/mliap_interface.py

Inputs: pair_i, pair_j, rij, and nlocal

Outputs: local atom energies, total energy, and fij as the gradient of the local energy w.r.t. rij
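The following plain-Python sketch shows what a compute step with these inputs and outputs could look like. It uses a toy harmonic pair energy in place of HIP-NN and an analytic derivative in place of autograd. The function names, the half-and-half split of each pair's energy, and the assumption of a full neighbor list (each pair listed once from each side) are illustrative choices. They are not the interface's actual code or the LAMMPS API.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def toy_pair_energy(r: float) -> float:
    """Hypothetical stand-in for a learned pair energy: harmonic around r = 1."""
    return (r - 1.0) ** 2


def toy_pair_energy_grad(r: float) -> float:
    """Analytic derivative d(toy_pair_energy)/dr."""
    return 2.0 * (r - 1.0)


def compute(pair_i: Sequence[int], pair_j: Sequence[int], rij: Sequence[Vec],
            nlocal: int) -> Tuple[List[float], float, List[List[float]]]:
    """Return per-atom energies, total energy, and fij = dE_local/drij for each pair."""
    eatoms = [0.0] * nlocal
    fij = []
    # pair_j is unused in this toy: only atom i's energy is tallied per entry.
    for i, _j, vec in zip(pair_i, pair_j, rij):
        r = math.sqrt(sum(c * c for c in vec))
        # Full neighbor list: each pair appears twice, so each entry carries half the energy.
        eatoms[i] += 0.5 * toy_pair_energy(r)
        dedr = 0.5 * toy_pair_energy_grad(r)
        # Chain rule: dE/drij = dE/dr * rij / r
        fij.append([dedr * c / r for c in vec])
    return eatoms, sum(eatoms), fij


# Two atoms 2 units apart, each listing the other as a neighbor.
eatoms, energy, fij = compute([0, 1], [1, 0], [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0]], nlocal=2)
print(eatoms, energy, fij)
```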

Add scripts for training and pickling models on the ANI Aluminum data set and SNAP Indium Phosphide data set.

examples/ani_aluminum_example_multilayer.py trains a two-interaction layer model on ANI Al data

examples/lammps_train_model_InP.py trains a model on SNAP InP data

examples/pickle_mliap_unified_hippynn_Al_multilayer.py pickles an ML-IAP Unified HIP-NN two-interaction layer model trained on ANI Al data

examples/pickle_mliap_unified_hippynn_Al.py pickles an ML-IAP Unified HIP-NN model trained on ANI Al data

examples/pickle_mliap_unifeid_hippynn_InP.py pickles an ML-IAP Unified HIP-NN model trained on SNAP InP data

(Note: the ML-IAP Unified Interface is not yet merged into the LAMMPS codebase.)
opened by Boogie3D
[WIP] Fix restart & multi-target dipole
Both should be working fine, but I think I should do more tests at this point, hence the "WIP" tag.

Restarting

It turns out that there are some bugs in the new implementation of restarting:

1. When neither map_location nor model_device is set, the code tries to move tensors around, which should not happen.

2. When map_location is used, None is assigned to the evaluator's model_device.

Reloading a model suffers from issue 1 as well.

After fixing restart (again), the logic looks like this:

| # | Scenario | Behavior |
| --- | --- | --- |
| 1 | map_location=None, model_device=None | Don't move tensors or set evaluator.model_device |
| 2 | map_location set, model_device=None | Don't move tensors, but set evaluator.model_device |
| 3 | map_location=None, model_device="cpu" | Don't move tensors, but set evaluator.model_device |
| 4 | map_location=None, model_device set | Move tensors and set evaluator.model_device |
| 5 | map_location set, model_device set | Error |

The old code treats the first case as the fourth one, which throws an error. For scenario 2, the model_device variable is unset, so the device is now determined from one of the tensors in the checkpoint. Scenarios 2 and 3 are handled in the same if branch, so they get the same treatment. We could probably add one more if so that model_device is only checked again in scenario 2.
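The scenario table can be read as a small decision function. Encoding it directly also gives unit tests an obvious target. The name `restart_actions`, its return convention `(move_tensors, set_evaluator_device)`, and the use of `ValueError` for the conflicting case are assumptions for this sketch, not the code in the PR.

```python
from typing import Any, Optional, Tuple


def restart_actions(map_location: Optional[Any],
                    model_device: Optional[Any]) -> Tuple[bool, bool]:
    """Return (move_tensors, set_evaluator_device) for the restart scenarios in the table."""
    if map_location is not None and model_device is not None:
        # Scenario 5: the two options conflict.
        raise ValueError("Specify at most one of map_location and model_device.")
    if map_location is None and model_device is None:
        return False, False  # Scenario 1: leave everything as saved.
    if map_location is not None or model_device == "cpu":
        return False, True   # Scenarios 2 and 3: no move, but record the device.
    return True, True        # Scenario 4: move tensors to model_device.


print(restart_actions(None, None))  # (False, False)
print(restart_actions(None, 0))     # (True, True)
```

Note that `model_device=0` is checked with `is not None`, so GPU 0 is not mistaken for "unset".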

Time to create the unit tests? When testing manually, I forgot to include scenario 1.

Dipole

The multi-target implementation looks fine. Training only the dipole, I get different histograms when comparing state by state between 5 single-target nodes and one 5-target node, but the histograms do look similar. I believe it's working, but let me collect more evidence to be sure.
opened by tautomer
Fix an optimizer device problem

If the optimizer is initialized first and then the model is transferred to GPU, Adagrad will crash because some of its tensors are on the CPU. This might affect some other optimizers as well.

See the official documentation for explanations.

To fix this bug, simply reload the optimizer's state dictionary after the model transfer.
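Here is a minimal pytorch sketch of the problem and the fix. Adagrad allocates its per-parameter state when it is constructed, so moving the model afterwards leaves that state on the old device. `Optimizer.load_state_dict` casts state tensors to the device of the matching parameters, which is why reloading the optimizer's own state dict repairs the mismatch. The tiny `Linear` model is a placeholder. The alternative recommended in the pytorch docs is to move the model before constructing the optimizer.

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)  # state is created here, on the CPU

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)  # parameters move in place, but Adagrad's "sum" buffers do not

# Fix: reloading the optimizer's own state dict casts each state tensor
# to the device of the parameter it belongs to.
optimizer.load_state_dict(optimizer.state_dict())

# One training step now runs with parameters and optimizer state on the same device.
loss = model(torch.randn(8, 4, device=device)).pow(2).mean()
loss.backward()
optimizer.step()
```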

opened by tautomer
Fix settings and low distance warning
Some of the diffs are due to reformatting by black...

Apart from that, here are the changes.

Local rc file was incorrectly parsed.

Documentation on the rc files is now consistent with the code.

Incorrect error message in array padding.

Disable the low distance warning from sensitivity plotting.

Update the change log (including the changes in the restart part).
opened by tautomer
Transparent plot, doc update, and some misc updates
Implemented the transparent plot option as a library setting, defaulting to False. One note, though: transparent PDFs do work, but most PDF readers automatically add a white background, which makes it look as if the keyword isn't working. The "library settings" section of the docs is updated as well.

Styled the docs with a CSS file. The default RTD theme has a fixed width of 800 px, which is too narrow for the tables to display properly. Even worse, there is no way to narrow the sidebar except truncating it or removing it completely. So I decided to lift the limit a little. The width of the whole page is now 80% of the window or 1000 px, whichever is smaller, so any window wider than 1250 px (1000 / 0.8) gets 1000 px. (It will certainly look bad on phones or vertically oriented 1080p monitors, but it does even without this PR...) The RTD theme is far too restrictive. They even refuse to merge this: https://github.com/readthedocs/sphinx_rtd_theme/pull/337. There is probably nothing more we can do.

A comparison

[Screenshots: Before | After]

A bonus is that we don't have to hard-code line breaks anymore; the text is wrapped automatically.
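The page-width rule is simple enough to check numerically. This Python snippet only restates the arithmetic of the CSS rule (it is not the stylesheet itself). Since 0.8 × 1250 = 1000, the 1000 px cap applies to any window wider than 1250 px.

```python
def page_width_px(window_px: float) -> float:
    """Page width rule: 80% of the window, capped at 1000 px."""
    return min(0.8 * window_px, 1000.0)


# The cap takes over once 0.8 * width exceeds 1000, i.e. above 1000 / 0.8 = 1250 px.
for width in (1080, 1250, 1280, 1920):
    print(width, page_width_px(width))
```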

Misc updates. See if you like them.

Updated .gitignore to exclude files generated by pip install -e. I guess it won't be useful for normal users, but for the development branch it should be very useful.

Added tools.py to __init__.py, so you get auto-completion for things like hippynn.tools.log_terminal. I know it's trivial 😂
opened by tautomer
Environment variable HIPPYNN_PROGRESS not handled correctly
>>> 'HIPPYNN_PROGRESS'.lstrip("HIPPYNN_")
'ROGRESS'

Similarly, for other variables, if the name after the prefix begins with 'H', 'I', 'N', 'P', 'Y', or '_', those leading characters are stripped as well.
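The root cause is that `str.lstrip` treats its argument as a set of characters to remove, not as a prefix. A minimal sketch of a prefix-safe replacement follows. The helper name `strip_prefix` is hypothetical. On Python 3.9+, the built-in `str.removeprefix` does the same job.

```python
PREFIX = "HIPPYNN_"


def strip_prefix(name: str, prefix: str = PREFIX) -> str:
    """Remove `prefix` from the start of `name` exactly once, if present."""
    if name.startswith(prefix):
        return name[len(prefix):]
    return name


print("HIPPYNN_PROGRESS".lstrip(PREFIX))  # ROGRESS: lstrip removes any of H, I, P, Y, N, _
print(strip_prefix("HIPPYNN_PROGRESS"))   # PROGRESS
```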
opened by tautomer
Allow HIPNN to work with ASE Mixing Calculators

Four additions:

1. Imported Calculator from ase.calculators.calculator (the base Calculator class for ASE).
2. Made HippynnCalculator inherit from Calculator. Strangely, although HippynnCalculator already inherits interface.Calculator, that appears to be the wrong class: when running a LinearCombinationCalculator, for example, ASE complains that the calculator must inherit Calculator as defined in ase.calculators.calculator.
3. In the __init__ function of HippynnCalculator, manually added the property "energy" to self.implemented_properties.
4. In the calculate function of HippynnCalculator, added the key "energy" to self.results and set it equal to self.results["potential_energy"].

Added a comment on each added line stating that the change is required for using ASE mixing calculators.

opened by MChigaev
CombineScreenings Module
Computes the product of multiple screenings for screened Coulomb interactions. For example, glue-on damping for short range and Wolf screening for long range can be combined as follows:

combined_screening = CombineScreenings( ( LocalDampingCosine(alpha=10.0), WolfScreening(alpha=0.1) ) )

CombineScreenings operates similarly to the other Screening classes already implemented in hippynn.
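The idea of combining screenings by multiplication can be sketched with plain functions of the pair distance r. The stand-ins below are hypothetical. The first is a cosine-complement damping parameterized by a cutoff radius; the meaning of `alpha` in `LocalDampingCosine` is not stated here, so it is not reproduced. The second uses `erfc(alpha * r)` as a Wolf-style long-range screening. hippynn's actual classes operate on pytorch tensors and may use different formulas.

```python
import math
from typing import Callable, Sequence

Screening = Callable[[float], float]


def cosine_damping(r_cut: float) -> Screening:
    """Short-range damping: 0 at r = 0, rising smoothly to 1 for r >= r_cut."""
    def screen(r: float) -> float:
        if r >= r_cut:
            return 1.0
        return 1.0 - 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)
    return screen


def erfc_screening(alpha: float) -> Screening:
    """Long-range Wolf-style screening factor erfc(alpha * r)."""
    return lambda r: math.erfc(alpha * r)


def combine_screenings(screenings: Sequence[Screening]) -> Screening:
    """Product of several screening factors, all evaluated at the same distance."""
    def combined(r: float) -> float:
        value = 1.0
        for screen in screenings:
            value *= screen(r)
        return value
    return combined


combined = combine_screenings((cosine_damping(r_cut=5.0), erfc_screening(alpha=0.1)))
print(combined(2.0), combined(8.0))
```

Because the factors simply multiply, any number of screenings can be stacked, and each one can be tested on its own.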
opened by sakibmatin
