GT4SD, an open-source library to accelerate hypothesis generation in the scientific discovery process.

Generative Toolkit 4 Scientific Discovery

Last update: Dec 24, 2022

Related tags

Deep Learning gt4sd-core

Overview

GT4SD (Generative Toolkit for Scientific Discovery)

The GT4SD (Generative Toolkit for Scientific Discovery) is an open-source platform to accelerate hypothesis generation in the scientific discovery process. It provides a library for making state-of-the-art generative AI models easier to use.

Installation

pip

You can install gt4sd directly from GitHub:

pip install git+https://github.com/GT4SD/gt4sd-core

Development setup & installation

If you would like to contribute to the package, we recommend the following development setup: Clone the gt4sd-core repository:

git clone git@github.com:GT4SD/gt4sd-core.git
cd gt4sd-core
conda env create -f conda.yml
conda activate gt4sd
pip install -e .

Learn more in CONTRIBUTING.md

Supported packages

Beyond implementing various generative modeling inference and training pipelines, GT4SD is designed to provide a high-level API that implements a harmonized interface for several existing packages:

GuacaMol: inference pipelines for the baseline models.
MOSES: inference pipelines for the baseline models.
TAPE: encoder modules compatible with the protein language models.
PaccMann: inference pipelines for all algorithms of the PaccMann family as well as training pipelines for the generative VAEs.
transformers: training and inference pipelines for generative models from the HuggingFace Models.

Using GT4SD

Running inference pipelines

Running an algorithm is as easy as typing:

from gt4sd.algorithms.conditional_generation.paccmann_rl.core import (
    PaccMannRLProteinBasedGenerator, PaccMannRL
)
target = 'MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTT'
# algorithm configuration with default parameters
configuration = PaccMannRLProteinBasedGenerator()
# instantiate the algorithm for sampling
algorithm = PaccMannRL(configuration=configuration, target=target)
items = list(algorithm.sample(10))
print(items)

Or you can use the ApplicationsRegistry to run an algorithm instance using a serialized representation of the algorithm:

from gt4sd.algorithms.registry import ApplicationsRegistry
target = 'MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTT'
algorithm = ApplicationsRegistry.get_application_instance(
    target=target,
    algorithm_type='conditional_generation',
    domain='materials',
    algorithm_name='PaccMannRL',
    algorithm_application='PaccMannRLProteinBasedGenerator',
    generated_length=32,
    # include additional configuration parameters as **kwargs
)
items = list(algorithm.sample(10))
print(items)

Running training pipelines via the CLI command

GT4SD provides a trainer client based on the gt4sd-trainer CLI command. The trainer currently supports training pipelines for language modeling (language-modeling-trainer), PaccMann (paccmann-vae-trainer) and Granular (granular-trainer, multimodal compositional autoencoders).

$ gt4sd-trainer --help
usage: gt4sd-trainer [-h] --training_pipeline_name TRAINING_PIPELINE_NAME
                     [--configuration_file CONFIGURATION_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --training_pipeline_name TRAINING_PIPELINE_NAME
                        Training type of the converted model, supported types:
                        granular-trainer, language-modeling-trainer, paccmann-
                        vae-trainer. (default: None)
  --configuration_file CONFIGURATION_FILE
                        Configuration file for the trainining. It can be used
                        to completely by-pass pipeline specific arguments.
                        (default: None)

To launch a training you have two options.

You can either specify the training pipeline and the path of a configuration file that contains the needed training parameters:

gt4sd-trainer  --training_pipeline_name ${TRAINING_PIPELINE_NAME} --configuration_file ${CONFIGURATION_FILE}

Or you can provide the needed parameters directly as arguments:

gt4sd-trainer --training_pipeline_name language-modeling-trainer --type mlm --model_name_or_path mlm --training_file /path/to/train_file.jsonl --validation_file /path/to/valid_file.jsonl

To get more information on the arguments of a specific training pipeline, simply type:

gt4sd-trainer --training_pipeline_name ${TRAINING_PIPELINE_NAME} --help
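When trainings are launched from a script, the two invocation styles above can be assembled programmatically. The following is a minimal sketch that only builds the argument list. `build_trainer_command` is a hypothetical helper, not part of GT4SD, and it assumes that pipeline-specific options are passed as `--name value` pairs, as in the language-modeling example above.

```python
import shlex
from typing import List, Optional


def build_trainer_command(
    training_pipeline_name: str,
    configuration_file: Optional[str] = None,
    **pipeline_arguments: str,
) -> List[str]:
    """Assemble a gt4sd-trainer argument list (hypothetical helper)."""
    command = ["gt4sd-trainer", "--training_pipeline_name", training_pipeline_name]
    if configuration_file is not None:
        # Option 1: the training parameters come from a configuration file.
        command += ["--configuration_file", configuration_file]
    # Option 2: pipeline-specific parameters passed as --name value pairs.
    for name, value in pipeline_arguments.items():
        command += [f"--{name}", str(value)]
    return command


command = build_trainer_command(
    "language-modeling-trainer",
    type="mlm",
    model_name_or_path="mlm",
    training_file="/path/to/train_file.jsonl",
    validation_file="/path/to/valid_file.jsonl",
)
# Print the command as a single shell-escaped line for inspection.
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` once gt4sd is installed.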

References

If you use gt4sd in your projects, please consider citing the following:

@software{GT4SD,
author = {GT4SD Team},
month = {2},
title = {{GT4SD (Generative Toolkit for Scientific Discovery)}},
url = {https://github.com/GT4SD/gt4sd-core},
version = {main},
year = {2022}
}

License

The gt4sd codebase is under MIT license. For individual model usage, please refer to the model licenses found in the original packages.

Comments

cli-upload

Add upload functionality to the command line. It gives the user the possibility to upload specific artifacts on a server.

Given a specific version for an algorithm:

- Check if that version is already on the server, i.e., check if the folder bucket/algorithm_type/algorithm_name/algorithm_application/version/ exists.
- If yes, tell the user and stop the upload.
- If not, upload all the files in that version.

cli-upload relies on minio and has been tested locally using docker-compose. cli-upload can be used to upload on a cloud or local server.
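The check-then-upload logic described above can be sketched without a server. In this sketch a plain Python set stands in for the object keys of a bucket. The function names and the file names are hypothetical, and the actual cli-upload implementation talks to minio instead.

```python
from typing import Iterable, Set


def version_prefix(
    algorithm_type: str,
    algorithm_name: str,
    algorithm_application: str,
    version: str,
) -> str:
    """Build the folder-like prefix that identifies one algorithm version."""
    return f"{algorithm_type}/{algorithm_name}/{algorithm_application}/{version}/"


def upload_version(bucket_keys: Set[str], prefix: str, filenames: Iterable[str]) -> bool:
    """Upload all files of a version unless the prefix already exists.

    `bucket_keys` is an in-memory stand-in for the object keys in a bucket.
    Returns True if files were uploaded, False if the version was already there.
    """
    if any(key.startswith(prefix) for key in bucket_keys):
        print(f"Version already on the server, upload stopped: {prefix}")
        return False
    for filename in filenames:
        bucket_keys.add(prefix + filename)
    return True


bucket: Set[str] = set()
prefix = version_prefix(
    "conditional_generation",
    "PaccMannRL",
    "PaccMannRLProteinBasedGenerator",
    "fast-example-v0",
)
first = upload_version(bucket, prefix, ["weights.pt", "params.json"])  # uploads
second = upload_version(bucket, prefix, ["weights.pt"])  # version exists, stops
```

Checking for any key under the prefix mirrors the "folder exists" test, since object stores typically have no real folders, only keys that share a prefix.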

How to use cli-upload

Following the example in the README (in the Saving a trained algorithm for inference via the CLI command section) and assuming a trained model in /tmp/test_cli_upload, run:

gt4sd-upload --training_pipeline_name paccmann-vae-trainer --model_path /tmp/test_cli_upload --training_name fast-example --target_version fast-example-v0 --algorithm_application PaccMannGPGenerator
opened by georgosgeorgos 15
MOSES VAE from Guacamol training reconstruction is "incorrect"
Describe the bug

The VAE in GT4SD uses the wrapper of the Moses VAE from Guacamol. Unfortunately, the decoding training step from the Moses VAE is bugged.

More detail

The problem arises from the definition of the forward_decoder method:

def forward_decoder(self, x, z):
    lengths = [len(i_x) for i_x in x]
    x = nn.utils.rnn.pad_sequence(x, batch_first=True, padding_value=self.pad)
    x_emb = self.x_emb(x)
    z_0 = z.unsqueeze(1).repeat(1, x_emb.size(1), 1)
    x_input = torch.cat([x_emb, z_0], dim=-1)  # <--- PROBLEM 1
    x_input = nn.utils.rnn.pack_padded_sequence(x_input, lengths, batch_first=True)
    h_0 = self.decoder_lat(z)
    h_0 = h_0.unsqueeze(0).repeat(self.decoder_rnn.num_layers, 1, 1)
    output, _ = self.decoder_rnn(x_input, h_0)
    output, _ = nn.utils.rnn.pad_packed_sequence(output, batch_first=True)
    y = self.decoder_fc(output)
    recon_loss = F.cross_entropy(  # <--- PROBLEM 2
        y[:, :-1].contiguous().view(-1, y.size(-1)),
        x[:, 1:].contiguous().view(-1),
        ignore_index=self.pad
    )
    return recon_loss

Namely, the reconstruction step is wrong in two spots:

1. Construction of the true input: x_input = torch.cat([x_emb, z_0], dim=-1). In the visual representation of a typical RNN, the true token feeds in from the "bottom" of the cell and the previous hidden state from the "left". In this implementation, the reparameterized latent vector z is fed in both from the "left" (normal) and the "bottom" (atypical). Fix: this line should be removed.

2. Calculation of the reconstruction loss: recon_loss = F.cross_entropy(...). This reconstruction loss is calculated as the per-token loss of the input batch (i.e., the mean over all tokens in the batch) because the default reduction in F.cross_entropy is "mean". In turn, this results in reconstruction losses that are very low for the VAE, causing the optimizer to ignore the decoder and focus on the encoder. When a VAE focuses too hard on the encoder, you get mode collapse, and that's what happens with the Moses VAE. Fix: this line should be: F.cross_entropy(..., reduction="sum") / len(x)
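The difference between the two reductions is plain arithmetic and can be illustrated without PyTorch. The sketch below uses made-up per-token cross-entropy values. Padding tokens are assumed to be already excluded, which is what ignore_index does in the original code, and the helper names are hypothetical.

```python
from typing import List


def per_token_mean(token_losses: List[List[float]]) -> float:
    """Mean over all tokens in the batch, as with reduction="mean"."""
    flat = [loss for sequence in token_losses for loss in sequence]
    return sum(flat) / len(flat)


def per_sequence_sum(token_losses: List[List[float]]) -> float:
    """Sum over all tokens divided by the batch size, as with reduction="sum" / len(x)."""
    return sum(sum(sequence) for sequence in token_losses) / len(token_losses)


# Made-up losses for a batch of two sequences with 40 tokens each.
batch = [[0.5] * 40, [0.5] * 40]
print(per_token_mean(batch))    # 0.5
print(per_sequence_sum(batch))  # 20.0
```

With the per-token mean, the reconstruction term stays at the scale of a single token no matter how long the sequences are, whereas the proposed fix grows with the number of tokens per sequence.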

To reproduce

Problem 1 is not a "problem" so much as it is highly atypical to structure a VAE like this. I can't say if it results in any actual problems, but it simply shouldn't be there

Problem 2 can be observed with two experiments:

Using PCA with two dimensions, plot the embeddings of a random batch z ~ q(z|x) and a sample from the standard normal distribution z ~ N(0, I). The embeddings from the encoder will look like a point at (0, 0) compared to the samples from the standard normal

Measure the reconstruction accuracy x_r ~ p(x | z ~ q(z | x_0)). In a well-trained VAE, sum(x_r == x_0 for x_0 in xs) / len(xs) should be above 50%. For this VAE, the accuracy is generally fairly low (in my experience).
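The reconstruction accuracy in the second experiment can be written as a small function. This is a sketch: `reconstruct` is a hypothetical callable standing in for encoding x_0 and decoding the sampled latent, and `toy_reconstruct` is a toy substitute for a trained model.

```python
from typing import Callable, Sequence


def reconstruction_accuracy(
    xs: Sequence[str], reconstruct: Callable[[str], str]
) -> float:
    """Fraction of inputs reproduced exactly after an encode/decode round trip."""
    return sum(reconstruct(x_0) == x_0 for x_0 in xs) / len(xs)


def toy_reconstruct(smiles: str) -> str:
    """Toy stand-in for a VAE: reproduces short strings, truncates long ones."""
    return smiles if len(smiles) <= 5 else smiles[:-1]


xs = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]
accuracy = reconstruction_accuracy(xs, toy_reconstruct)
print(accuracy)  # 0.5
```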

bug
opened by davidegraff 12
Improve CLA workflow

Using actions to commit to other people's forks was not easy to do, so I'm settling for a bit more verbosity and automation.

The issue will be closed with a comment linking to the commit that added the contributor. There is a notice to merge this into a PR.

Therefore there is no assignment of the issue any more.

Looks like this: https://github.com/C-nit/gt4sd-core/issues/9 and can also be triggered in a different way: https://github.com/C-nit/gt4sd-core/issues/11

opened by C-nit 11
feat: Support in RT Trainer for multiple entities.

Solving #143 by expanding the Regression Transformer trainer to support multi-entity discriminations, i.e., support the multientity_cg collator from the RT repo.

Signed-off-by: Nicolai Ree Ree@sunray

opened by NicolaiRee 9
feat: property_predictors in scorer
Implement PropertyPredictorScorer in domains.materials.property_scorer. - circular import using domains.materials.scorer for the implementation

We are simply using the PropertyPredictorRegistry to select a property and parameters by name and PropertyPredictionScorer to compute a score on a sample wrt a target value.

Tests mimic the logic in properties.

cla-signed
opened by georgosgeorgos 8
Training pipeline Regression Transformer
Adding new training pipeline for RT

allows to finetune existing models available in the toolkit

allows to train models from scratch

patching LRSchedulers in torchdrug --> they are needed for RT training and threw errors

cla-signed
opened by jannisborn 6
Added toxicity and affinity to visum notebook

Signed-off-by: Eduardo [email protected] Added toxicity (Tox21 model from https://github.com/PaccMann/paccmann_sarscov2) and affinity (Paccmann predictor) to the notebook.

@drugilsberg , I am not sure about one specific step in the notebook and I would really appreciate it if you could help: When calling the sample in PaccMannGP for the first time the first line of the output is

configuring optimization for target: {'qed': {'weight': 1.0}, 'sa': {'weight': 1.0}}

However, on the second call to the same object (no reinitialization), in section "Sampling and Plotting Molecules with GT4SD", the first line reads:

configuring optimization for target: {'qed': {}, 'sa': {}}

Do you know if this has any influence on the molecules being generated? I attached a PDF file with the output for convenience.

visum-2022-handson-generative-models.pdf

@helenaMontenegro , the notebook now requires users to download a small model, but I don't think this is a problem.
cla-signed

opened by edux300 5
Problem multiprocess in requirements

The new multiprocess library version (0.70.13) gives problems when installing gt4sd-core using the development mode. I had to set multiprocess==0.70.12.2 to install the library.

opened by georgosgeorgos 5
Torchdrug trainer pipeline
Implemented torchdrug trainer pipeline. Models can be used via:

gt4sd-trainer --training_pipeline_name torchdrug-gcpn-trainer -h
gt4sd-trainer --training_pipeline_name torchdrug-graphaf-trainer -h

Features:

[x] Support for the same two models that are available via inference TorchDrugGCPN and TorchDrugGraphAF.

[x] Both models can be trained on all MoleculeDatasets from torchdrug.Datasets. Those are around 20 predefined datasets.

[x] Implemented a custom dataset where users can pass their own data

[x] In addition to the unittests I verified functionalities from the CLI via gt4sd-trainer

Problems:

[ ] Property optimization does not work, due to instabilities in TorchDrug. I opened issue and PR but we have to wait until they merge, release a new version and then bump our dependency. The code I wrote here already supports the property optimization but I disabled the unittest for the moment because it would fail due to the TorchDrug issue. See details: https://github.com/DeepGraphLearning/torchdrug/issues/83

[x] gt4sd-saving: I ran a test via CLI but the saving failed. Not sure how problematic this is, here's the error:

INFO:gt4sd.cli.saving:Selected configuration: ConfigurationTuple(algorithm_type='generation', domain='materials', algorithm_name='TorchDrugGenerator', algorithm_application='TorchDrugGCPN')
INFO:gt4sd.cli.saving:Saving model version "fast" with the following configuration: <class 'gt4sd.algorithms.generation.torchdrug.core.TorchDrugGCPN'>
INFO:gt4sd.algorithms.core:TorchDrugGCPN can not save a version based on TorchDrugSavingArguments(model_path='/Users/jab/.gt4sd/runs/', training_name='gcpn_test')
enhancement cla-signed
opened by jannisborn 5
RT sampling_wrapper to specify a substructure or series of tokens to keep unmasked
I would like to propose an upgrade on the feature demonstrated in this notebook: https://github.com/GT4SD/gt4sd-core/blob/main/notebooks/regression-transformer-demo.ipynb (see cells 12-14)

In addition to explicitly specifying tokens_to_mask, one can imagine that a chemist might want to specify a substructure to mask or to "freeze" (keep unchanged, i.e., unmasked). It might be easier to specify tokens to freeze, as that would just mean selecting a part of the string to be kept unmasked. A prototype example is given below.

sampling_wrapper = {
    'property_goal': {'<logp>': 6.123, '<scs>': 1.5},
    'fraction_to_mask': 0.6,
    # keep morpholino tail unchanged
    'tokens_to_freeze': ['N4CCOCC4']
}

If one could specify a substructure to freeze or to mask, that would potentially be even more advantageous, as it would remove ambiguities when a substructure can be expressed in more than one sequence.

sampling_wrapper = {
    'property_goal': {'<logp>': 6.123, '<scs>': 1.5},
    'fraction_to_mask': 0.6,
    # keep morpholino tail unchanged
    'substructure_to_freeze': ['N1CCOCC1'],
    # explicitly mask benzene ring moiety
    'substructure_to_mask': ['C1=CC=CC=C1'],
}

One could use RDKit functionality to identify substructure tokens, as given here: https://www.rdkit.org/docs/Cookbook.html#substructure-matching

Regarding the interpretation of 'fraction_to_mask', I would imagine that it would best be applied to the remaining set of tokens (after tokens_to_freeze and explicit tokens_to_mask are excluded). I hope this makes sense, happy to clarify and exemplify further.
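To exemplify the proposed interpretation, here is a minimal sketch in plain Python. It works on token positions only; mapping a substructure to positions (e.g., via RDKit) is left out. `choose_masked_positions` and its parameters are hypothetical and not part of the RT sampling wrapper.

```python
import random
from typing import List, Set


def choose_masked_positions(
    num_tokens: int,
    frozen: Set[int],
    explicitly_masked: Set[int],
    fraction_to_mask: float,
    seed: int = 0,
) -> List[int]:
    """Return the sorted token positions to mask.

    Frozen positions are never masked, explicitly masked positions always are,
    and `fraction_to_mask` is applied only to the remaining positions.
    """
    if frozen & explicitly_masked:
        raise ValueError("A position cannot be both frozen and masked.")
    remaining = [
        i for i in range(num_tokens) if i not in frozen and i not in explicitly_masked
    ]
    num_random = round(fraction_to_mask * len(remaining))
    rng = random.Random(seed)  # seeded for reproducible sampling
    randomly_masked = rng.sample(remaining, num_random)
    return sorted(explicitly_masked | set(randomly_masked))


masked = choose_masked_positions(
    num_tokens=20,
    frozen={14, 15, 16, 17, 18, 19},  # e.g., positions of the tail to keep
    explicitly_masked={0, 1, 2, 3},   # e.g., positions of a ring to regenerate
    fraction_to_mask=0.6,
)
print(len(masked))  # 10: 4 explicit positions plus 6 of the 10 remaining ones
```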
enhancement
opened by OleinikovasV 4
Artifact storage for property predictors
Closes #116

Now we can store artifacts also for property predictors

New property predictors are tested

One thing that remains to do is to have functions under gt4sd.properties.molecules.functions. Atm this is not yet supported since it would yield circular imports.

cla-signed
opened by jannisborn 4
RT saving pipeline
Closes #169

gt4sd-saving now also supports the RT training pipeline. I implemented the get_filepath_mappings_for_training_pipeline_arguments method. The inference.json is now created inside the RT trainer and also saved in the model folder such that it can later be copied by gt4sd-saving. The Property class was needed as a helper for this, to track some attributes of each property.

Expanded the RT example. Describes now a full process of training/finetuning a model, saving it with gt4sd-saving, running inference on it and finally uploading it to the model hub.

I tested everything with the example from the README

Minors:

adding a method filter_stubbed to the molecular RT that removes stub-like molecules("invalid SELFIES")

Bumping paccmann_gp dependency

enhancement cla-signed
opened by jannisborn 0
RegressionTransformer saving pipeline
Is your feature request related to a problem? Please describe. gt4sd-saving is not fully supportive of RT

ToDo:

Implement get_filepath_mappings_for_training_pipeline_arguments

Save inference.json to model dir

enhancement
opened by jannisborn 0
Disentangle properties from algorithms

Is your feature request related to a problem? Please describe. Currently, the properties submodule imports stuff from algorithms.core and thus also from that __init__. In the init, we register all the training pipelines and thus, one needs to have all those dependencies installed, including torchdrug, guacamol_baselines and other vcs-requirements.

Describe the solution you'd like Creating a submodule gt4sd.core that specifies base classes used by multiple submodules like gt4sd.algorithms or gt4sd.properties

Describe alternatives you've considered Do the imports only when someone calls list_available_algorithms

NOTE: When creating gt4sd.core we have to make sure that all the rest remains functional, including relative imports, jupyter notebooks (should be fine since we barely import from algorithms.core directly) and in particular also documentation
enhancement

opened by jannisborn 0
Add methods for artifact-based property predictors

Is your feature request related to a problem? Please describe. Currently the artifact-based property predictors (like gt4sd.properties.molecules.core.Tox21) are not usable as functions via gt4sd.properties.molecules.tox_21 (unlike all the non-artifact-based properties). Moving the functions there would yield circular import issues.

Describe the solution you'd like A small refactor that goes around the circular imports
enhancement

opened by jannisborn 0
Refactor AlgorithmConfiguration baseclass
The types are inconsistent between the AlgorithmConfiguration base class and its child ConfigurablePropertyAlgorithmConfiguration, concerning attributes like domain but also methods like ensure_artifacts_for_version (class methods in the base class but instance methods in the child class).

A simple refactor into 3 instead of 2 classes should fix this.

Originally posted by @jannisborn in https://github.com/GT4SD/gt4sd-core/pull/121#discussion_r943649339

So the ones in the constructor for lines like self.domain=domain say: error: Cannot assign to class variable "domain" via instance. That's because in the parent class (AlgorithmConfiguration) we set it as domain: ClassVar[str].

The ones in the signatures like get_application_prefix, which returns a str, are because in the parent class those are class methods, not instance methods. The error is: Signature of "get_application_prefix" incompatible with supertype "AlgorithmConfiguration".

It might be fixable by a refactor but I'm not sure it's worth it
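One possible reading of the three-class refactor is sketched below: a shared base that does not fix how domain is stored, plus one child with a class-level domain and one with an instance-level domain. All class names here are hypothetical and simplified, and the actual GT4SD classes carry many more attributes and methods.

```python
from abc import ABC, abstractmethod
from typing import ClassVar


class BaseConfiguration(ABC):
    """Shared interface that leaves the storage of the domain to the children."""

    @abstractmethod
    def get_domain(self) -> str:
        """Return the domain of this configuration."""

    def get_application_prefix(self) -> str:
        # An instance method in the shared base keeps both children compatible.
        return f"{self.get_domain()}/{type(self).__name__}"


class StaticConfiguration(BaseConfiguration):
    """The domain is fixed per class."""

    domain: ClassVar[str] = "materials"

    def get_domain(self) -> str:
        return self.domain


class ConfigurableConfiguration(BaseConfiguration):
    """The domain is chosen per instance, so no ClassVar is assigned via self."""

    def __init__(self, domain: str) -> None:
        self._domain = domain

    def get_domain(self) -> str:
        return self._domain


print(StaticConfiguration().get_application_prefix())
print(ConfigurableConfiguration("proteins").get_application_prefix())
```

Because neither child overrides a class method with an instance method or assigns to a ClassVar through self, the two mypy errors quoted above should not arise in this layout.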

refactoring
opened by jannisborn 0

Releases(v1.0.4)

v1.0.4(Dec 20, 2022)

Moses/Guacamol VAE updates.
v1.0.3(Dec 9, 2022)

Release updating KeyBERT version.
v1.0.2(Dec 9, 2022)

Release with PaccMann VAE fine-tuning option in trainer.
v1.0.1(Dec 5, 2022)
Release including:

adding save dataset functionality to RT trainer

update unit tests for SCScore.

v1.0.0(Nov 17, 2022)
Notes:

Miscellaneous fixes in notebooks for submission

improved error handling and documentation in RT

fixed a bug related to GPU usage in paccmann_gp inference

improved error handling in algorithms/core.py

refactor: remove .core from inits

fix: torchdrug openMP mismatch

refactor: avoid signal to run core outside main thread

v0.58.0(Nov 8, 2022)

Release with support in RT Trainer for multiple entities.
v0.57.1(Nov 5, 2022)

Release containing minor patches for version compatibility, configuration fixes and pydantic upper bound removal.
v0.57.0(Nov 3, 2022)

Release including a sampling wrapper extension for RT that allows to handle substructures.
v0.56.0(Oct 28, 2022)

Release including MolGX community edition.
v0.55.0(Oct 26, 2022)

Release including GeoDiff.
v0.54.1(Oct 24, 2022)

Release including an extension of RegressionTransformer to handle property-specific tolerances.
v0.54.0(Oct 10, 2022)

Release extending python 3.8 support by updating MoLeR dependency.
v0.53.0(Oct 10, 2022)

Release moving to importlib_metadata.
v0.52.0(Oct 10, 2022)

Release updating pytorch_lightning and allow finer configuration of GPU usage.
v0.51.0(Oct 6, 2022)

Including additional TDC oracles.
v0.50.0(Sep 13, 2022)

Initial implementation of an interface for Diffusers.
v0.49.1(Sep 9, 2022)

Updating validation for RegressionTransformer to discard samples identical to the seed.
v0.49.0(Sep 6, 2022)

Releasing initial implementation of GFlowNets as a framework. It also includes a dedicated training pipeline.
v0.48.2(Aug 25, 2022)

Uniforming logging and updating deps.
v0.48.1(Aug 25, 2022)

Configurable cache using environment.
v0.48.0(Aug 22, 2022)

Enabling configuration of the SSL default contexts verification via environment variables.
v0.47.0(Aug 12, 2022)

Extended properties submodule supporting cache.
v0.46.0(Jul 21, 2022)

Initial version of the properties module.
v0.45.0(Jul 14, 2022)
Improvements on dynamic GPU handling and TorchDrug unpatching:

dynamic inference for RegressionTransformer

finer control on torch versions in the unpatching module

v0.44.2(Jul 12, 2022)

Supporting flexible device allocation for RegressionTransformer inference.
v0.44.1(Jul 12, 2022)

Updating MoLeR dependency to version v0.2.0.
v0.44.0(Jul 5, 2022)

Updating target handling in PaccMannGP to ensure deepcopy in-between repeated runs.
v0.43.0(Jul 5, 2022)

Parametrization support for Granular CLI.
v0.42.0(Jul 1, 2022)

Adding training pipeline for RegressionTransformer.
v0.41.0(Jun 22, 2022)

Support for a public model hub. Initial version with no possibility to replace/overwrite existing models.

Owner

Generative Toolkit 4 Scientific Discovery
