The hippynn python package - a modular library for atomistic machine learning with pytorch.

Overview

We aim to provide a powerful library for training atomistic (or physical point-cloud) machine learning models. We want entry-level users to be able to efficiently train models on millions of data points, and we want to provide a modular structure for extension or contribution.

While hippynn's development so far has centered around the HIP-NN architecture, don't let that discourage you if you are performing research with another model. Get in touch, and let's work together to provide a high-quality implementation of your work, either as a contribution or an interface extension to your own package.

Features:

Modular set of pytorch layers for atomistic operations

  • Atomistic operations can be tricky to write in native pytorch. Most operations provided here support linear-scaling models.
  • Model energy, forces, charge & charge moments, bond orders, and more!
  • nn.Modules are written with minimal reference to the rest of the library; if you want to use them in your scripts without using the rest of the features provided here -- no problem!

Graph level API for simple and flexible construction of models from pytorch components.

  • Build models based on the abstract physics/mathematics of the problem, without having to think about implementation details.
  • Graph nodes support native python syntax; for example, different forms of loss can be added together directly.
  • Link predicted values in the model with a database entry to compare predicted and true values
  • IndexType logic records metadata about tensor structure, and provides automatic conversion to compatible structures when possible.
  • Graph API is independent of module implementation.

Plot level API for tracking your training.

  • Using the graph API, define quantities to evaluate before, during, or after training as figures using matplotlib.

Training & Experiment API

  • Integrated with graph level API
  • Pretty-printing loss metrics, generating plots periodically
  • Callbacks and checkpointing

Custom Kernels for fast execution

  • Certain operations cannot be written efficiently in pure pytorch; for these, we provide alternative implementations with numba.
  • These are directly linked in with pytorch Autograd -- use them like native pytorch functions.
  • These provide advantages in memory footprint and speed
  • Includes CPU and GPU execution for custom kernels
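The way a custom kernel "links into Autograd" can be illustrated with torch.autograd.Function. The sketch below is a toy stand-in, not one of hippynn's numba kernels: ScatterSum is a hypothetical op that sums pair values into per-atom bins with a hand-written backward pass. A real custom kernel would replace the body of forward().

```python
import torch


class ScatterSum(torch.autograd.Function):
    """Hypothetical custom op: sum rows of `values` into `n_bins` bins by `index`."""

    @staticmethod
    def forward(ctx, values, index, n_bins):
        # A numba/CUDA kernel could compute this instead; autograd only sees the result.
        ctx.save_for_backward(index)
        out = values.new_zeros((n_bins,) + tuple(values.shape[1:]))
        out.index_add_(0, index, values)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        (index,) = ctx.saved_tensors
        # Each input row receives the gradient of the bin it was summed into.
        # index and n_bins are not differentiable, so they get None.
        return grad_out[index], None, None


values = torch.randn(5, 3, dtype=torch.double, requires_grad=True)
index = torch.tensor([0, 0, 1, 2, 2])
out = ScatterSum.apply(values, index, 3)  # used like any native pytorch function
out.sum().backward()
```

Because the op is wrapped in autograd.Function, torch.autograd.gradcheck can verify the hand-written backward against finite differences.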

Interfaces

  • ASE: Define ASE calculators based on the graph-level API.
  • PYSEQM: Use PYSEQM calculations as nodes in a graph.

Installation

  • Clone this repository and navigate into it.
  • Run pip install .

If you feel like tinkering, do an editable install: pip install -e .

To install with all optional dependencies from pip, use: pip install -e .[full]

Notes

  • Install dependencies with pip from requirements.txt .
  • Install dependencies with conda from conda_requirements.txt .
  • If you don't want pip to install them, conda install from file before installing hippynn. You may want to use -c pytorch for the pytorch channel. For ase, you may want to use -c conda-forge.
  • Optional dependencies are in optional_dependencies.txt

We are currently under development. At the moment you should be prepared for breaking changes -- keep track of what version you are using if you need to maintain consistency.

As we clean up the rough edges, we are preparing a manuscript. If, in the meantime, you are using hippynn in your work, please cite this repository and the HIP-NN paper:

Lubbers, N., Smith, J. S., & Barros, K. (2018). Hierarchical modeling of molecular energies using a deep neural network. The Journal of Chemical Physics, 148(24), 241715.

See AUTHORS.txt for information on authors.

See LICENSE.txt for licensing information. hippynn is licensed under the BSD-3 license.

Triad National Security, LLC (Triad) owns the copyright to hippynn, which it identifies as project number LA-CC-19-093.

Copyright 2019. Triad National Security, LLC. All rights reserved. This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S. Department of Energy/National Nuclear Security Administration. All rights in the program are reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear Security Administration. The Government is granted for itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so.

Comments
  • Fix a problem with torch RNG when restarting

    When a model is reloaded onto a different CUDA device, the error TypeError: RNG state must be a torch.ByteTensor may be thrown.

    For example, suppose the model was originally trained on GPU 1 and you now want to load it onto GPU 0. Per the torch docs, you can do load_checkpoint_from_cwd(map_location={'cuda:1':'cuda:0'}), which works fine. Unfortunately, this requires the user to know which GPU was used, which is a problem for automation.

    However, using load_checkpoint_from_cwd(map_location=torch.device(0)) or load_checkpoint_from_cwd(map_location=lambda storage, loc: storage.cuda(0)) throws TypeError: RNG state must be a torch.ByteTensor.

    The relevant code is in hippynn.experiment.serialization.restore_checkpoint:

    torch.random.set_rng_state(state["torch_rng_state"])
    

    and in torch.random.set_rng_state:

    def set_rng_state(new_state: torch.Tensor) -> None:
        r"""Sets the random number generator state.
    
        .. note: This function only works for CPU. For CUDA, please use
                 torch.manual_seed(seed), which works for both CPU and CUDA.
    
        Args:
            new_state (torch.ByteTensor): The desired state
        """
        default_generator.set_state(new_state)
    

    With the latter two forms of map_location, state["torch_rng_state"] ends up as a tensor on GPU 0:

    tensor([13, 78, 72,  ...,  0,  0,  0], device='cuda:0', dtype=torch.uint8)
    

    but it stays on the CPU with {'cuda:1':'cuda:0'}:

    tensor([13, 78, 72,  ...,  0,  0,  0], dtype=torch.uint8)
    

    Forcing the tensor to be transferred to the CPU solves the problem, and the RNG state originates on the CPU anyway.
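A minimal sketch of the proposed fix, assuming the checkpoint layout shown above (state["torch_rng_state"]). restore_torch_rng_state is a hypothetical helper name, not necessarily what hippynn uses; without a GPU the example only demonstrates the round trip on the CPU.

```python
import torch


def restore_torch_rng_state(rng_state: torch.Tensor) -> None:
    # torch.random.set_rng_state only accepts a CPU ByteTensor. A checkpoint
    # loaded with map_location=torch.device(0) may have moved it to a GPU, so
    # bring it back first; .cpu() returns the tensor unchanged if already on CPU.
    torch.random.set_rng_state(rng_state.cpu())


# Simulate a checkpoint round trip on the CPU.
state = {"torch_rng_state": torch.random.get_rng_state()}
expected = torch.rand(3)
restore_torch_rng_state(state["torch_rng_state"])
replayed = torch.rand(3)  # same draws as `expected`, since the state was restored
```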

    opened by tautomer 11
  • Proper device handling for restart

    @lubbersnick I guess it's still too early to merge, for two reasons:

    1. More tests are needed. I haven't found any issue, but I can always miss something.
    2. The documentation update is still missing.

    However, I decided to open this PR, so at least you can review the code, and someone might be able to test it.

    The core part is basically the same as we discussed, but I split the code into multiple functions.

    • Two hidden functions:
      1. __check_mapping_devices(map_location, model_device), which checks the options and assigns values if necessary.
      2. __load_saved_tensors(structure_fname, state_fname, **kwargs), which loads the tensors from disk.
    • load_checkpoint and load_model_from_cwd call these two functions. The first few lines of loading a checkpoint and loading a model are exactly the same; is it worth wrapping these lines in a function as well?

    Additionally, I took a look at experiment.routines.set_devices, and mimicked its behavior.

            structure["training_modules"].model.to(model_device)
            structure["training_modules"].loss.to(model_device)
            structure["training_modules"].evaluator.model_device = model_device
            structure["training_modules"].evaluator.model = structure["training_modules"].model
    

    Not sure if the last two lines are important or not.

    I tested with a model trained on GPU 1. Here are the results:

    | Options | Expected behavior | Actual behavior |
    |---|---|---|
    | load_checkpoint_from_cwd(map_location={"cuda:1": "cuda:0"}) | model on GPU 0 | model on GPU 0 |
    | load_checkpoint_from_cwd(map_location=torch.device(0)) | failure because of rng_state | failure because of rng_state |
    | load_checkpoint_from_cwd(model_device=2) | model on GPU 2 | model on GPU 2 |
    | load_checkpoint_from_cwd(model_device="auto") | model on GPU 0 | model on GPU 0 |
    | load_checkpoint_from_cwd(model_device="cpu") | model on CPU | model on CPU |
    | load_checkpoint_from_cwd() | model on GPU 1 | model on GPU 1 |

    Similar tests were done for load_model_from_cwd as well.

    One thing to note: restoring the database always keeps it on the CPU, so my concern was unnecessary. Even with map_location={"cuda:1": "cuda:0"}, a database that was originally on GPU 1 is kept on the CPU rather than moved to GPU 0.
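The model_device options in the table can be summarized as a small mapping from the argument to a device string. resolve_model_device below is a hypothetical helper, not hippynn's code; it hard-codes the table's observation that "auto" picks GPU 0, and falling back to "cpu" when CUDA is unavailable is an assumption.

```python
from typing import Optional, Union


def resolve_model_device(
    model_device: Union[int, str, None], cuda_available: bool
) -> Optional[str]:
    """Map a model_device argument to a device string (hypothetical helper)."""
    if model_device is None:
        # Leave tensors wherever the checkpoint (and map_location) put them.
        return None
    if model_device == "auto":
        # Per the table, "auto" lands on GPU 0; the CPU fallback is assumed.
        return "cuda:0" if cuda_available else "cpu"
    if isinstance(model_device, int):
        # An integer is treated as a CUDA device index.
        return f"cuda:{model_device}"
    # Anything else ("cpu", "cuda:2", ...) is passed through unchanged.
    return model_device
```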

    opened by tautomer 9
  • Auto build docs via Actions

    Currently there are two ways to trigger this action:

    • a push to main
    • manual trigger from the Actions tab

    The new target in the Makefile has been renamed to "html_all"; it runs make apidoc && make html. The original make html should work as before. "all" alone would be ambiguous, since it might refer to all file types.

    I think it makes sense to directly add this to main or an orphan branch which only hosts the actions.

    • Manual triggers only work on the default branch. If we add this to another branch, there is no way to manually trigger the action unless you switch the default to that branch. And since there is no push to main at the moment, we won't have any docs built for a long time.
    • Actions in different branches will (probably) all be triggered at the same time, so we should either keep a single copy of the actions on main, or keep one copy on a separate branch that is only used to host actions. Let me know if you want to go with the second option. We can create an orphan branch for this to minimize "pollution" of the main codebase.

    To make this work, you first need to create an orphan branch. Let's say it's called gh-pages, as in the action. Since you need something to commit, you can create a dummy index.html.

    git switch --orphan gh-pages
    touch index.html
    git add .
    git commit -m 'initial commit'
    git push --set-upstream origin gh-pages
    

    Then, on GitHub, point Pages to this branch.

    Once this PR is merged to main, https://lanl.github.io/hippynn/ should show our docs. The action will take about 3 minutes to finish.

    Future CI should be almost the same workflow, but we will want to check the main branch, development branch, and PRs.

    opened by tautomer 4
  • 'Glue-on' Method for damping Coulomb Interactions Locally.

    Work in Progress; not ready to merge.

    Uses the 'complement' of the cosine cutoff in hipnn for a smooth cross-over from short-range hipnn energy predictions to long-range Coulomb energy predictions.

    No long-range regularization (Wolf, shifted-force, Ewald) is implemented in this class. Perhaps it would be best to have separate classes for short-range 'damping' and long-range 'screening', which could be combined by the ScreenedCoulombEnergy?
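A scalar sketch of the cross-over idea, assuming the common cosine cutoff f(r) = (1 + cos(pi r / r_c)) / 2 for r < r_c and 0 beyond. Its complement 1 - f(r) is 0 at r = 0 and rises smoothly to 1 at r_c, so multiplying the Coulomb term by it turns Coulomb off at short range, where the hipnn energy takes over. The function names, the cutoff form, and the units (k_e = 1) are assumptions, not the PR's code.

```python
import math


def cosine_cutoff(r: float, r_cut: float) -> float:
    """Smooth cutoff: 1 at r = 0, falling to 0 at r = r_cut and beyond."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * r / r_cut))


def local_damping(r: float, r_cut: float) -> float:
    """Complement of the cosine cutoff: 0 at r = 0, 1 for r >= r_cut."""
    return 1.0 - cosine_cutoff(r, r_cut)


def damped_coulomb(q_i: float, q_j: float, r: float, r_cut: float) -> float:
    """Bare Coulomb pair energy (k_e = 1, r > 0), switched on smoothly by the damping."""
    return local_damping(r, r_cut) * q_i * q_j / r
```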

    opened by sakibmatin 4
  • LAMMPS ML-IAP Interface

    • Add LAMMPS ML-IAP interface based on the LAMMPS ML-IAP Unified Interface.

      • hippynn/interfaces/lammps_interface/mliap_interface.py
      • Inputs: pair_i, pair_j, rij, and nlocal
      • Outputs: local atom energies, total energy, and fij as gradient of local energy w.r.t rij
    • Add scripts for training and pickling models on the ANI Aluminum data set and SNAP Indium Phosphide data set.

      • examples/ani_aluminum_example_multilayer.py trains a two-interaction layer model on ANI Al data
      • examples/lammps_train_model_InP.py trains a model on SNAP InP data
      • examples/pickle_mliap_unified_hippynn_Al_multilayer.py pickles an ML-IAP Unified HIP-NN two-interaction layer model trained on ANI Al data
      • examples/pickle_mliap_unified_hippynn_Al.py pickles an ML-IAP Unified HIP-NN model trained on ANI Al data
      • examples/pickle_mliap_unifeid_hippynn_InP.py pickles an ML-IAP Unified HIP-NN model trained on SNAP InP data

    (Note: the ML-IAP Unified Interface is not yet merged into the LAMMPS codebase.)
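To make the listed inputs and outputs concrete, here is a toy sketch using torch autograd. A harmonic pair energy stands in for the HIP-NN model, and toy_mliap_compute is a hypothetical function, not the ML-IAP Unified API. Following the description above, fij is returned as the gradient of the summed local energies with respect to rij; whether LAMMPS expects this or its negative is not asserted here.

```python
import torch


def toy_mliap_compute(pair_i, pair_j, rij, nlocal):
    """Hypothetical stand-in for the interface's energy/gradient computation.

    pair_i, pair_j: (n_pairs,) long tensors of atom indices.
    rij: (n_pairs, 3) displacement vectors. nlocal: number of local atoms.
    """
    rij = rij.detach().requires_grad_(True)
    r = rij.norm(dim=1)
    # Toy harmonic pair energy; a real model would also use pair_j (e.g. species).
    pair_energy = 0.5 * (r - 1.0) ** 2
    # Assign each pair's energy to atom i to obtain local atom energies.
    atom_energy = torch.zeros(nlocal, dtype=rij.dtype).index_add(0, pair_i, pair_energy)
    total_energy = atom_energy.sum()
    # fij: gradient of the summed local energies with respect to each rij.
    (fij,) = torch.autograd.grad(total_energy, rij)
    return atom_energy.detach(), total_energy.detach(), fij


atom_e, total_e, fij = toy_mliap_compute(
    torch.tensor([0]), torch.tensor([1]), torch.tensor([[2.0, 0.0, 0.0]]), 2
)
```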

    opened by Boogie3D 4
  • [WIP] Fix restart & multi-target dipole

    Both should be working fine, but I want to do more testing first, hence the "WIP" tag.

    Restarting

    It turns out that there are some bugs in the new implementation of restarting.

    1. When neither map_location nor model_device is set, the code will try to move tensors around, which should not happen
    2. When map_location is used, None will be assigned to evaluator's model_device.

    Reloading a model suffers from issue 1 as well.

    After fixing restart (again), the logic is as follows:

    | Scenario | Behavior |
    |---|---|
    | map_location=None, model_device=None | Don't move tensors or set evaluator.model_device |
    | map_location set, model_device=None | Don't move tensors, but set evaluator.model_device |
    | map_location=None, model_device="cpu" | Don't move tensors, but set evaluator.model_device |
    | map_location=None, model_device set | Move tensors and set evaluator.model_device |
    | map_location set, model_device set | Error |

    The old code treated the first scenario as the fourth, which throws an error. In scenario 2, the model_device variable is unset, so the device is now determined from one of the tensors in the checkpoint. Scenarios 2 and 3 are handled in the same if branch, so they get the same treatment. We could add one more if so that model_device is only re-checked in scenario 2.
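The scenario table can be read as a small decision function. plan_restart below is a hypothetical sketch of that logic, not the PR's implementation; the "Error" row becomes a ValueError here, and the exception type is an assumption.

```python
from typing import Tuple


def plan_restart(map_location, model_device) -> Tuple[bool, bool]:
    """Return (move_tensors, set_evaluator_device) following the scenario table."""
    if map_location is not None and model_device is not None:
        # Scenario 5: the two options conflict.
        raise ValueError("Pass map_location or model_device, not both.")
    if map_location is None and model_device is None:
        return False, False  # scenario 1: leave everything as saved
    if map_location is not None:
        return False, True   # scenario 2: torch.load has already placed the tensors
    if model_device == "cpu":
        return False, True   # scenario 3: no move needed, per the table
    return True, True        # scenario 4: move tensors to model_device
```

Scenario 1 returning (False, False) is exactly the case the old code got wrong by treating it like scenario 4.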

    Time to create the unit tests? When testing manually, I forgot to include scenario 1.

    Dipole

    The multi-target implementation looks fine. Training only the dipole, I get different histograms when comparing state by state between five single-target nodes and one five-target node, but the histograms do look similar. I believe it's working, but let me collect more evidence to be sure.

    opened by tautomer 3
  • Fix an optimizer device problem

    If the optimizer is initialized first and then the model is transferred to GPU, Adagrad will crash because some of its tensors are on the CPU. This might affect some other optimizers as well.

    See the official documentation for explanations.

    To fix this bug, simply reload the optimizer's state dictionary after the model has been transferred.
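A minimal sketch of this fix with a toy model, not hippynn's code. It relies on standard PyTorch behavior: Adagrad allocates its per-parameter "sum" state at construction time, and Optimizer.load_state_dict casts state tensors to the device of the corresponding parameters. On a CPU-only machine the transfer is a no-op, so the example only shows the pattern.

```python
import torch

model = torch.nn.Linear(4, 1)
# Adagrad creates its "sum" state eagerly, on the parameters' current device (CPU).
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)  # parameters move in place; the optimizer state does not

# Reloading the optimizer's own state dict casts its state tensors to the
# device of the (now moved) parameters.
optimizer.load_state_dict(optimizer.state_dict())
```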

    opened by tautomer 2
  • Fix settings and low distance warning

    Some diffs are just reformatting by black...

    Apart from that, here are the changes.

    1. Local rc file was incorrectly parsed.
    2. Documentation on the rc files is now consistent with the code.
    3. Incorrect error message in array padding.
    4. Disable the low distance warning from sensitivity plotting.
    5. Update the change log (including the changes in the restart part).
    opened by tautomer 1
  • Transparent plot, doc update, and some misc updates

    1. Implemented the transparent plot option as a library setting, which defaults to False. One note, though: transparent PDFs do work, but most PDF readers automatically add a white background, which makes it look like the keyword isn't working. The "library settings" section of the docs is updated as well.

    2. Styled the docs with a CSS file. The default RTD theme has a fixed width of 800 px, which is too narrow for the tables to show properly. Even worse, there is no way to narrow the sidebar except truncating it or removing it completely. So I decided to lift the limit a little. Now the width of the whole page is 80% of the window or 1000 px, whichever is smaller; for anything wider than 1280 pixels, 1000 px is used. (It will definitely suck on phones or vertically mounted 1080p monitors, but it sucks even without this PR...) The RTD theme is way too restrictive; they even refused to merge this: https://github.com/readthedocs/sphinx_rtd_theme/pull/337. There is probably nothing more we can do.

      A comparison: [screenshots: Before | After]

      A bonus is that we don't have to hard-code line breaks anymore; the text is wrapped automatically.

    3. Misc updates. See if you like it.

      1. Updated .gitignore to exclude files from pip install -e. I guess it won't be useful for normal users, but for the development branch, it should be super useful.
      2. Added tools.py to __init__.py, so you get auto-completion for things like hippynn.tools.log_terminal. I know it's trivial 😂
    opened by tautomer 1
  • Environment variable HIPPYNN_PROGRESS not handled correctly

    >>> 'HIPPYNN_PROGRESS'.lstrip("HIPPYNN_")
    'ROGRESS'
    

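    The same happens for other variables: if the part of the name after the prefix starts with any of 'H', 'I', 'N', 'P', 'Y', or '_', those leading characters are stripped as well, because str.lstrip removes a set of characters rather than a prefix.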
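A minimal sketch of a prefix-safe alternative. strip_prefix and hippynn_env_settings are hypothetical helpers, not hippynn's settings code; on Python 3.9+, str.removeprefix does the same job as strip_prefix.

```python
import os
from typing import Dict, Mapping

PREFIX = "HIPPYNN_"


def strip_prefix(name: str, prefix: str = PREFIX) -> str:
    """Remove `prefix` exactly once, unlike str.lstrip, which strips a character set."""
    return name[len(prefix):] if name.startswith(prefix) else name


def hippynn_env_settings(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect HIPPYNN_* variables keyed by their unprefixed names."""
    return {strip_prefix(k): v for k, v in environ.items() if k.startswith(PREFIX)}
```

For example, strip_prefix("HIPPYNN_PROGRESS") gives "PROGRESS", whereas "HIPPYNN_PROGRESS".lstrip("HIPPYNN_") gives "ROGRESS".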

    opened by tautomer 0
  • Allow HIPNN to work with ASE Mixing Calculators

    Four additions:

    1. Imported Calculator from ase.calculators.calculator (the base Calculator class for ASE).
    2. Made HippynnCalculator inherit from Calculator. Strangely, although HippynnCalculator already inherits interface.Calculator, this appears to be the wrong class: when running a LinearCombinationCalculator, for example, ASE reports that it must inherit Calculator as defined in ase.calculators.calculator.
    3. In the __init__ function of HippynnCalculator, manually added the property "energy" to self.implemented_properties.
    4. In the calculate function of HippynnCalculator, added the key "energy" to self.results and set it equal to self.results["potential_energy"].

    Added a comment on each added line stating that the change is required for using ASE mixing calculators.

    opened by MChigaev 0
  • CombineScreenings Module

    Products of multiple screenings for screened Coulomb interactions. For example, glue-on screening for short range and Wolf screening for long range can be combined as follows:

    combined_screening = CombineScreenings(
            ( LocalDampingCosine(alpha=10.0),  WolfScreening(alpha=0.1) )
        )
    

    CombineScreenings operates similarly to the other Screening classes already implemented in hippynn.
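The "product of screenings" idea can be sketched with plain scalar functions. combine_screenings is a hypothetical helper, and the two toy screenings are illustrative stand-ins (a damping that vanishes at r = 0 and an erfc-style long-range factor); they are not hippynn's LocalDampingCosine or WolfScreening, which operate on torch tensors.

```python
import math
from typing import Callable, Sequence

Screening = Callable[[float], float]


def combine_screenings(screenings: Sequence[Screening]) -> Screening:
    """Return a screening whose value at r is the product of the given screenings."""
    def combined(r: float) -> float:
        value = 1.0
        for screening in screenings:
            value *= screening(r)
        return value
    return combined


def toy_short_range(r: float) -> float:
    # Toy damping: 0 at r = 0, approaching 1 at larger r.
    return 1.0 - math.exp(-r * r)


def toy_long_range(r: float) -> float:
    # Toy erfc-style screening that decays at long range.
    return math.erfc(0.1 * r)


screening = combine_screenings((toy_short_range, toy_long_range))
```

Because the factors multiply, the combined screening inherits the short-range damping near r = 0 and the long-range decay at large r.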

    opened by sakibmatin 0
Owner
Los Alamos National Laboratory