torchbearer: A model fitting library for PyTorch

Overview

Note: We're moving to PyTorch Lightning! Read about the move here. From the end of February, torchbearer will no longer be actively maintained. We'll continue to fix bugs when they are found and ensure that torchbearer runs on new versions of pytorch. However, we won't plan or implement any new functionality (if there's something you'd like to see in a training library, consider creating an issue on PyTorch Lightning).

logo

PyPI version Python 2.7 | 3.5 | 3.6 | 3.7 PyTorch 1.0.0 | 1.1.0 | 1.2.0 | 1.3.0 | 1.4.0 Build Status codecov Documentation Status Downloads

WebsiteDocsExamplesInstallCitingRelated

A PyTorch model fitting library designed for use by researchers (or anyone really) working in deep learning or differentiable programming. Specifically, we aim to dramatically reduce the amount of boilerplate code you need to write without limiting the functionality and openness of PyTorch.

Examples

General

Quickstart: Get up and running with torchbearer, training a simple CNN on CIFAR-10.
Callbacks: A detailed exploration of callbacks in torchbearer, with some useful visualisations.
Imaging: A detailed exploration of the imaging sub-package in torchbearer, useful for showing visualisations during training.
Serialization: This guide gives an introduction to serializing and restarting training in torchbearer.
History and Replay: This guide gives an introduction to the history returned by a trial and the ability to replay training.
Custom Data Loaders: This guide gives an introduction on how to run custom data loaders in torchbearer.
Data Parallel: This guide gives an introduction to using torchbearer with DataParrallel.
LiveLossPlot: A demonstration of the LiveLossPlot callback included in torchbearer.
PyCM: A demonstration of the PyCM callback included in torchbearer for generating confusion matrices.
NVIDIA Apex: A guide showing how to perform half and mixed precision training in torchbearer with NVIDIA Apex.

Deep Learning

Training a VAE: A demonstration of how to train (add do a simple visualisation of) a Variational Auto-Encoder (VAE) on MNIST with torchbearer.
Training a GAN: A demonstration of how to train (add do a simple visualisation of) a Generative Adversarial Network (GAN) on MNIST with torchbearer.
Generating Adversarial Examples: A demonstration of how to perform a simple adversarial attack with torchbearer.
Transfer Learning with Torchbearer: A demonstration of how to perform transfer learning on STL10 with torchbearer.
Regularisers in Torchbearer: A demonstration of how to use all of the built-in regularisers in torchbearer (Mixup, CutOut, CutMix, Random Erase, Label Smoothing and Sample Pairing).
Manifold Mixup: A demonstration of how to use the Manifold Mixup callback in Torchbearer.
Class Appearance Model: A demonstration of the Class Appearance Model (CAM) callback in torchbearer.

Differentiable Programming

Optimising Functions: An example (and some fun visualisations) showing how torchbearer can be used for the purpose of optimising functions with respect to their parameters using gradient descent.
Linear SVM: Train a linear support vector machine (SVM) using torchbearer, with an interactive visualisation!
Breaking Adam: The Adam optimiser doesn't always converge, in this example we reimplement some of the function optimisations from the AMSGrad paper showing this empirically.

Install

The easiest way to install torchbearer is with pip:

pip install torchbearer

Alternatively, build from source with:

pip install git+https://github.com/pytorchbearer/torchbearer

Citing Torchbearer

If you find that torchbearer is useful to your research then please consider citing our preprint: Torchbearer: A Model Fitting Library for PyTorch, with the following BibTeX entry:

@article{torchbearer2018,
  author = {Ethan Harris and Matthew Painter and Jonathon Hare},
  title = {Torchbearer: A Model Fitting Library for PyTorch},
  journal  = {arXiv preprint arXiv:1809.03363},
  year = {2018}
}

Related

Torchbearer isn't the only library for training PyTorch models. Here are a few others that might better suit your needs (this is by no means a complete list, see the awesome pytorch list or the incredible pytorch for more):

  • skorch, model wrapper that enables use with scikit-learn - crossval etc. can be very useful
  • PyToune, simple Keras style API
  • ignite, advanced model training from the makers of PyTorch, can need a lot of code for advanced functions (e.g. Tensorboard)
  • TorchNetTwo (TNT), can be complex to use but well established, somewhat replaced by ignite
  • Inferno, training utilities and convenience classes for PyTorch
  • Pytorch Lightning, lightweight wrapper on top of PyTorch with advanced multi-gpu and cluster support
  • Pywick, high-level training framework, based on torchsample, support for various segmentation models
Comments
  • RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle

    RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle

    Dear all,

    It seems that torchbearer does not want to work for me. I am trying to simply classify images using resnet. You can find my code here (https://github.com/FrancescoSaverioZuppichini/PyTorch-Deep-Learning-Template/tree/feature/cuda-error), the main training logic is:

    import time
    from comet_ml import Experiment
    import torchbearer
    import torch.optim as optim
    import torch.nn as nn
    from torchsummary import summary
    from Project import Project
    from data import get_dataloaders
    from data.transformation import train_transform, val_transform
    from models import MyCNN, resnet18
    from utils import device, show_dl
    from torchbearer import Trial
    from torchbearer.callbacks import CSVLogger, ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
    from callbacks import CometCallback
    from logger import logging
    
    if __name__ == '__main__':
        project = Project()
        # our hyperparameters
        params = {
            'lr': 0.001,
            'batch_size': 64,
            'epochs': 1,
            'model': 'resnet18-finetune',
            'id': time.time()
        }
    
        logging.info(f'Using device={device} 🚀')
        # everything starts with the data
        train_dl, val_dl, test_dl = get_dataloaders(
            project.data_dir,
            val_transform=val_transform,
            train_transform=train_transform,
            batch_size=params['batch_size'],
            num_workers=4,
        )
        # is always good practice to visualise some of the train and val images to be sure data-aug
        # is applied properly
        # show_dl(train_dl)
        # show_dl(test_dl)
        # define our comet experiment
        experiment = Experiment(api_key='8THqoAxomFyzBgzkStlY95MOf',
                                project_name="dl-pytorch-template", workspace="francescosaveriozuppichini")
        experiment.log_parameters(params)
        # create our special resnet18
        cnn = resnet18(n_classes=2).to(device)
        loss = nn.CrossEntropyLoss()
        # print the model summary to show useful information
        logging.info(summary(cnn, (3, 224, 244)))
        # define custom optimizer and instantiace the trainer `Model`
        optimizer = optim.Adam(cnn.parameters(), lr=params['lr'])
        # create our Trial object to train and evaluate the model
        trial = Trial(cnn, optimizer, loss, metrics=['acc', 'loss'],
                      callbacks=[
                          CometCallback(experiment),
                          ReduceLROnPlateau(monitor='val_loss',
                                            factor=0.1, patience=5),
                          EarlyStopping(monitor='val_acc', patience=5, mode='max'),
                          CSVLogger(str(project.checkpoint_dir / 'history.csv')),
                          ModelCheckpoint(str(project.checkpoint_dir / f'{params["id"]}-best.pt'), monitor='val_acc', mode='max')
        ]).to(device)
        trial.with_generators(train_generator=train_dl,
                              val_generator=val_dl, test_generator=test_dl)
        history = trial.run(epochs=params['epochs'], verbose=1)
        logging.info(history)
        preds = trial.evaluate(data_key=torchbearer.TEST_DATA)
        logging.info(f'test preds=({preds})')
        # experiment.log_metric('test_acc', test_acc)
    
    

    I am running the same logic (same model) with poutyne and I have no problems. I really would like to switch to torchbearer

    Error is:

    2020-02-03 13:32:03,386 - [INFO] - None
      0%|                                                                                                                                                             | 0/1 [00:00<?, ?it/s]C:/w/1/s/tmp_conda_3.7_100118/conda/conda-bld/pytorch_1579082551706/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: block: [0,0,0], thread: [2,0,0] Assertion `t >= 0 && t < n_classes` failed.
    C:/w/1/s/tmp_conda_3.7_100118/conda/conda-bld/pytorch_1579082551706/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: block: [0,0,0], thread: [13,0,0] Assertion `t >= 0 && t < n_classes` failed.
    C:/w/1/s/tmp_conda_3.7_100118/conda/conda-bld/pytorch_1579082551706/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: block: [0,0,0], thread: [17,0,0] Assertion `t >= 0 && t < n_classes` failed.
    C:/w/1/s/tmp_conda_3.7_100118/conda/conda-bld/pytorch_1579082551706/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: block: [0,0,0], thread: [20,0,0] Assertion `t >= 0 && t < n_classes` failed.
    C:/w/1/s/tmp_conda_3.7_100118/conda/conda-bld/pytorch_1579082551706/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: block: [0,0,0], thread: [21,0,0] Assertion `t >= 0 && t < n_classes` failed.
    C:/w/1/s/tmp_conda_3.7_100118/conda/conda-bld/pytorch_1579082551706/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: block: [0,0,0], thread: [22,0,0] Assertion `t >= 0 && t < n_classes` failed.
    C:/w/1/s/tmp_conda_3.7_100118/conda/conda-bld/pytorch_1579082551706/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: block: [0,0,0], thread: [23,0,0] Assertion `t >= 0 && t < n_classes` failed.
    C:/w/1/s/tmp_conda_3.7_100118/conda/conda-bld/pytorch_1579082551706/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: block: [0,0,0], thread: [25,0,0] Assertion `t >= 0 && t < n_classes` failed.
    C:/w/1/s/tmp_conda_3.7_100118/conda/conda-bld/pytorch_1579082551706/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: block: [0,0,0], thread: [29,0,0] Assertion `t >= 0 && t < n_classes` failed.
    C:/w/1/s/tmp_conda_3.7_100118/conda/conda-bld/pytorch_1579082551706/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: block: [0,0,0], thread: [30,0,0] Assertion `t >= 0 && t < n_classes` failed.
    Traceback (most recent call last):
      File "c:/Users/Francesco/Documents/PyTorch-Deep-Learning-Template/main.py", line 64, in <module>
        history = trial.run(epochs=params['epochs'], verbose=1)
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\trial.py", line 133, in wrapper
        res = func(self, *args, **kwargs)
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\trial.py", line 988, in run
        final_metrics = self._fit_pass(state)[torchbearer.METRICS]
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\trial.py", line 298, in wrapper
        res = func(self, *args, **kwargs)
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\trial.py", line 1033, in _fit_pass
        state[torchbearer.OPTIMIZER].step(lambda: self.closure(state))
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torch\optim\adam.py", line 58, in step
        loss = closure()
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\trial.py", line 1033, in <lambda>
        state[torchbearer.OPTIMIZER].step(lambda: self.closure(state))
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\bases.py", line 382, in closure
        state[loss].backward(**state[torchbearer.BACKWARD_ARGS])
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\comet_ml\monkey_patching.py", line 246, in wrapper
        return_value = original(*args, **kwargs)
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torch\tensor.py", line 195, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torch\autograd\__init__.py", line 99, in backward
        allow_unreachable=True)  # allow_unreachable flag
    RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
    

    Do your library work for you? Do you use it in your daily workflow?

    Thank you.

    Cheers,

    Francesco Saverio

    opened by FrancescoSaverioZuppichini 7
  • Accuracy computation for seq2seq model

    Accuracy computation for seq2seq model

    0/10(t): 100%|██████████| 1000/1000 [02:30<00:00,  6.75it/s, running_loss=0.0338, running_acc=0.326, loss=0.689, loss_std=1.17, acc=35.9, acc_std=0]
    0/10(v): 100%|██████████| 20/20 [00:01<00:00, 19.90it/s, val_loss=0.0341, val_loss_std=0.0122, val_acc=42, val_acc_std=0]
    1/10(t): 100%|██████████| 1000/1000 [02:30<00:00,  6.76it/s, running_loss=0.00997, running_acc=0.327, loss=0.019, loss_std=0.0166, acc=41.8, acc_std=0]
    1/10(v): 100%|██████████| 20/20 [00:01<00:00, 19.98it/s, val_loss=0.0126, val_loss_std=0.00798, val_acc=42.1, val_acc_std=0]
    2/10(t): 100%|██████████| 1000/1000 [02:30<00:00,  6.75it/s, running_loss=0.00493, running_acc=0.328, loss=0.00837, loss_std=0.00938, acc=41.8, acc_std=0]
    2/10(v): 100%|██████████| 20/20 [00:01<00:00, 19.89it/s, val_loss=0.00783, val_loss_std=0.00716, val_acc=42.2, val_acc_std=0]
    3/10(t):  45%|████▌     | 454/1000 [01:08<01:21,  6.73it/s, running_loss=0.00458, running_acc=0.316]
    

    Are the accuracies correct? (running_acc=.326, acc=35.9?)

    I may be misunderstanding something, but shouldn't running_acc and acc be the same at the end of each epoch?

    bug 
    opened by kl2792 6
  • Tqdm for Jupyter Notebook

    Tqdm for Jupyter Notebook

    Each iteration of TQDM starts a new line in Jupyter Notebook -- is there any way to integrate one of the suggested fixes into torchbearer?

    (ref: https://github.com/tqdm/tqdm/issues/375, https://stackoverflow.com/a/47200571)

    bug 
    opened by kl2792 6
  • ReduceLROnPlateau

    ReduceLROnPlateau

    Dear all,

    first of all, I love this library.

    The ReduceLROnPlateau is not working when I call trail.evaluate.

    ...
        trial = Trial(cnn, optimizer, loss, metrics=['acc', 'loss'],
                      callbacks=[
                        #   CometCallback(experiment),
                          ReduceLROnPlateau(monitor='val_loss',
                                            factor=0.1, patience=5),
                        #   EarlyStopping(monitor='val_acc', patience=5, mode='max'),
                        #   CSVLogger('history.csv'),
                        #   ModelCheckpoint('best.pt', monitor='val_acc', mode='max')
        ]).to(device)
        trial.with_generators(train_generator=train_dl,
                              val_generator=val_dl, test_generator=test_dl)
        # history = trial.run(params['epochs'], verbose=1)
        preds = trial.evaluate(data_key=torchbearer.TEST_DATA)
    

    error:

    0/1(e): 100%|███████████████████████████████████| 1/1 [00:00<00:00,  2.18it/s, test_acc=0.4667, test_loss=0.6516]
    Traceback (most recent call last):
      File "c:/Users/Francesco/Documents/PyTorch-Deep-Learning-Template/main.py", line 62, in <module>
        preds = trial.evaluate(data_key=torchbearer.TEST_DATA)
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\trial.py", line 298, in wrapper
        res = func(self, *args, **kwargs)
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\trial.py", line 133, in wrapper
        res = func(self, *args, **kwargs)
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\trial.py", line 1131, in evaluate
        state[torchbearer.CALLBACK_LIST].on_end_epoch(state)
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\callbacks\callbacks.py", line 221, in on_end_epoch
        self._for_list(lambda callback: callback.on_end_epoch(state))
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\trial.py", line 105, in _for_list
        function(self.callback_list)
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\callbacks\callbacks.py", line 221, in <lambda>
        self._for_list(lambda callback: callback.on_end_epoch(state))
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\callbacks\callbacks.py", line 221, in on_end_epoch
        self._for_list(lambda callback: callback.on_end_epoch(state))
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\callbacks\callbacks.py", line 66, in _for_list
        function(callback)
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\callbacks\callbacks.py", line 221, in <lambda>
        self._for_list(lambda callback: callback.on_end_epoch(state))
      File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\callbacks\torch_scheduler.py", line 32, in on_end_epoch
        self._scheduler.step(state[torchbearer.METRICS][self._monitor], epoch=state[torchbearer.EPOCH])
    KeyError: 'val_loss'
    

    Probably you need to disable the callback when evaluating (or just checking if the monitored metrics is in state['metrics'].

    Thank you!

    Best Regards,

    Francesco Saverio

    opened by FrancescoSaverioZuppichini 5
  • Trial predict fails with the given example

    Trial predict fails with the given example

    After training the model, I want to get the prediction on the test set. Not the accuracy. I know that should be Trial.evaluate(). It works well. Therefore I used Trial.predict(). Is that right?

    But the error says that AttributeError: 'dict' object has no attribute 'data'.

    I read the instruction of Trial that provides an example,

    
    # Simple trial to predict on some validation and test data
    >>> from torchbearer import Trial
    >>> val_data = torch.rand(5, 5)
    >>> test_data = torch.rand(5, 5)
    >>> t = Trial(None).with_test_data(test_data)
    >>> test_predictions = t.predict(data_key=torchbearer.TEST_DATA)
    
    

    I ran it but got an error AttributeError: 'NoneType' object has no attribute 'eval'

    So, is there any problem in this method?

    bug 
    opened by danielhuoo 5
  • Lean Model Checkpointing

    Lean Model Checkpointing

    Hi,

    I ran a Trial and have my model saves my model using torchbearer.callbacks.checkpointers.Best to a file model.pt.

    When I load the file with torch.load and run try to make a forward pass with it, I get the following error:

    model = MyModule()
    state_dict = torch.load('vae.pt')
    model.load_state_dict(state_dict) # <== I get the error here
    

    AttributeError: 'StateKey' object has no attribute 'startswith'

    I get that model is being saved so that I can be recovered to be ready for torchbearer, but how can we save the model lean?

    It seems like here, the model is only saved for reusability by torchbearer.

    Thanks a lot!

    docs 
    opened by dorukhansergin 5
  • loss_std resulting in complex number and breaking Tensorboard

    loss_std resulting in complex number and breaking Tensorboard

    I'm using torchbearer with PyTorch 0.4 and TensorboardX 1.2. Previously, I was using PyTorch 0.4.1, but I had to downgrade to use the TensorboardX because of a incompatibility with them. After adding the Tensorboard callback, the following error is raised after training for some time:

    {TypeError}can't convert complex to float

    When debugging, I noticed that the add_scalar() of TensorboardX tried to convert the scalar to float and, somehow, the val_loss_std was a complex number. Is there and error in how the std is calculated in order to result in a complex number?

    bug 
    opened by fernandocamargoai 5
  • Support multi input and output

    Support multi input and output

    Right now, it's not possible to:

    • Have a Module multiple inputs (eg. forward(x1, x2)).
    • Have a Module with multiple outputs (returning a tuple).

    I worked around the first problem by creating a Module with a single input and indexing each individual input. But the second problem makes it impossible to use the TripletMarginLoss, for example, since it expects 3 outputs from the Module.

    opened by fernandocamargoai 3
  • Model checkpointers save_weights_only

    Model checkpointers save_weights_only

    As per the discussion in #504 it would be good if the checkpointers had an option to just save the model state dict, rather than the trial one. Not sure what the argument should be, something like save_model_only / save_weights_only? @dorukhansergin @MattPainter01 any thoughts on this?

    enhancement 
    opened by ethanwharris 3
  • Running indefinitely?

    Running indefinitely?

    Currently there is no way to ask torchbearer to run until stopped. This would be useful for reinforcement learning where we don't know how long an episode will be.

    enhancement 
    opened by ethanwharris 3
  • lr_scheduler order changed since PyTorch 1.1.0

    lr_scheduler order changed since PyTorch 1.1.0

    Many thanks for the wonderful library.

    A warning message emerged when a scheduler was used in the callbacks: scheduler = torchbearer.callbacks.torch_scheduler.StepLR(step_size=5, gamma=0.1)

    Hope it could be considered in later update if not yet included.

    Python37\lib\site-packages\torch\optim\lr_scheduler.py:100: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
      "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
    
    bug 
    opened by yongduek 2
  • Queston about training loop

    Queston about training loop

    Hi! I'm trying to fork the repo and add some functionality for an experiment. But that requires an addition in the training loop. I've read the documentation and the code but I can't seem to understand where the training loop itself is defined. Can somebody point me in the right direction?

    Thanks in advance!

    opened by AnabetsyR 4
  • GradientNormClipping callback error

    GradientNormClipping callback error

    When I insert this callback in the trial I get the following error. Is this some kind of bug? It seems like the gradients are not passed in the callback.

    """ File "/home/dimitris/.local/lib/python3.6/site-packages/torch/nn/utils/clip_grad.py", line 30, in clip_grad_norm_ total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type) RuntimeError: stack expects a non-empty TensorList """

    opened by dimimal 0
  • Native automatic mixed precision for torchbearer

    Native automatic mixed precision for torchbearer

    Native automatic mixed precision support (torch.cuda.amp) is now in master: https://pytorch.org/docs/master/amp.html https://pytorch.org/docs/master/notes/amp_examples.html

    Not sure if you ever tried Nvidia's (our) experimental Apex Amp, but I know it has many pain points (extension builds, forward/backward compatibilty, DataParallel support, flaky checkpointing, i don’t even know if it can be hacked to handle double backward/gradient penalty, others…). torch.cuda.amp fixes all these, the interface is more flexible and intuitive, and the tighter integration with pytorch brings more future optimizations into scope.

    I think the torch.cuda.amp API is a good fit for a higher-level library because its style is more functional (as in, it doesn't statefully alter anything outside itself). The necessary torch.cuda.amp calls don't have silent/weird effects elsewhere.

    If you want to talk about adding torch.cuda.amp to torchbearer, with an eye towards it becoming the future-proof source of mixed precision, message me on Pytorch slack anytime (or ask me for invites if you're not signed up). I'll check this issue periodically but I'm on Pytorch slack a greater fraction of the time than I care to admit.

    opened by mcarilli 0
  • Y_pred tuple behaviour changed

    Y_pred tuple behaviour changed

    When y_pred is a tuple (i.e. model returns multiple outputs) the criterion now recieves the tuple unpacked. This should either be reverted or more clearly documented.

    bug 
    opened by ethanwharris 0
Releases(0.5.3)
  • 0.5.3(Jan 31, 2020)

  • 0.5.2(Jan 28, 2020)

    [0.5.2] - 2020-28-01

    Added

    • Added option to use mixup loss with cutmix
    • Support for PyTorch 1.4.0

    Changed

    • Changed PyCM save methods to use *args and **kwargs

    Deprecated

    Removed

    Fixed

    • Fixed a bug where the PyCM callback would fail when saving
    Source code(tar.gz)
    Source code(zip)
  • 0.5.1(Nov 6, 2019)

    [0.5.1] - 2019-11-06

    Added

    • Added BCPlus callback for between-class learning
    • Added support for PyTorch 1.3
    • Added a show flag to the ImagingCallback.to_pyplot method, set to false to stop it from calling plt.show
    • Added manifold mixup

    Changed

    • Changed the default behaviour of ImagingCallback.to_pyplot to turn off the axis

    Deprecated

    Removed

    Fixed

    • Fixed a bug when resuming an old state dict with tqdm enabled
    • Fixed a bug in imaging where passing a title to to_pyplot was not possible
    Source code(tar.gz)
    Source code(zip)
  • 0.5.0(Sep 17, 2019)

    [0.5.0] - 2019-09-17

    Added

    • Added PyTorch CyclicLR scheduler

    Changed

    • Torchbearer now supports Modules with multiple inputs and multiple outputs

    Deprecated

    Removed

    • Cyclic LR callback in favour of torch cyclic lr scheduler
    • Removed support for PyTorch 0.4.x

    Fixed

    • Fixed bug where aggregate predictions couldn't handle empty list
    • Fixed a bug where Runtime Errors on forward weren't handled properly
    • Fixed a bug where exceptions on forward wouldn't print the traceback properly
    • Fixed a documentation mistake whereby ReduceLROnPlateau was said to increase learning rate
    Source code(tar.gz)
    Source code(zip)
  • 0.4.0(Sep 17, 2019)

    [0.4.0] - 2019-07-05

    Added

    • Added with_loader trial method that allows running of custom batch loaders
    • Added a Mock Model which is set when None is passed as the model to a Trial. Mock Model always returns None.
    • Added __call__(state) to StateKey so that they can now be used as losses
    • Added a callback to do cutout regularisation
    • Added a with_data trial method that allows passing of train, val and test data in one call
    • Added the missing on_init callback decorator
    • Added a step_on_batch flag to the early stopping callback
    • Added multi image support to imaging
    • Added a callback to unpack state into torchbearer.X at sample time for specified keys and update state after the forward pass based on model outputs. This is useful for using DataParallel which pass the main state dict directly.
    • Added callback for generating confusion matrices with PyCM
    • Added a mixup callback with associated loss
    • Added Label Smoothing Regularisation (LSR) callback
    • Added CutMix regularisation
    • Added default metric from paper for when Mixup loss is used

    Changed

    • Changed history to now just be a list of records
    • Categorical Accuracy metric now also accepts tensors of size (B, C) and gets the max over C for the taget class

    Deprecated

    Removed

    • Removed the variational sub-package, this will now be packaged separately
    • Removed verbose argument from the early stopping callback

    Fixed

    • Fixed a bug where list or dictionary metrics would cause the tensorboard callback to error
    • Fixed a bug where running a trial without training steps would error
    • Fixed a bug where the caching imaging callback didn't reset data so couldn't be run in multiple trials
    • Fixed a bug in the ClassAppearanceModel callback
    • Fixed a bug where the state given to predict was not a State object
    • Fixed a bug with Cutout on gpu
    • Fixed a bug where MakeGrid callback wasn't passing all arguments correctly
    • Fixed a bug in ImagingCallback that would sometimes cause make_grid to throw an error
    • Fixed a bug where the verbose argument would not work unless given as a keyword argument
    • Fixed a bug where the data_key argument would sometimes not work as expected
    • Fixed a bug where cutout required a seed
    • Fixed a bug where cutmix wasn't sendign the beta distribution sample to the device
    Source code(tar.gz)
    Source code(zip)
  • 0.3.2(May 28, 2019)

    [0.3.2] - 2019-05-28

    Added

    Changed

    Deprecated

    Removed

    Fixed

    • Fixed a bug where for_steps would sometimes not work as expected if called in the wrong order
    • Fixed a bug where torchbearer installed via pip would crash on import
    Source code(tar.gz)
    Source code(zip)
  • 0.3.1(May 24, 2019)

    [0.3.1] - 2019-05-24

    Added

    • Added cyclic learning rate finder
    • Added on_init callback hook to run at the end of trial init
    • Added callbacks for weight initialisation in torchbearer.callbacks.init
    • Added with_closure trial method that allows running of custom closures
    • Added base_closure function to bases that allows creation of standard training loop closures
    • Added ImagingCallback class for callbacks which produce images that can be sent to tensorboard, visdom or a file
    • Added CachingImagingCallback and MakeGrid callback to make a grid of images
    • Added the option to give the only_if callback decorator a function of self and state rather than just state
    • Added Layer-sequential unit-variance (LSUV) initialization
    • Added ClassAppearanceModel callback and example page for visualising CNNs
    • Added on_checkpoint callback decorator
    • Added support for PyTorch 1.1.0

    Changed

    • No_grad and enable_grad decorators are now also context managers

    Deprecated

    Removed

    • Removed the fluent decorator, just use return self
    • Removed install dependency on torchvision, still required for some functionality

    Fixed

    • Fixed bug where replay errored when train or val steps were None
    • Fixed a bug where mock optimser wouldn't call it's closure
    • Fixed a bug where the notebook check raised ModuleNotFoundError when IPython not installed
    • Fixed a memory leak with metrics that causes issues with very long epochs
    • Fixed a bug with the once and once_per_epoch decorators
    • Fixed a bug where the test criterion wouldn't accept a function of state
    • Fixed a bug where type inference would not work correctly when chaining Trial methods
    • Fixed a bug where checkpointers would error when they couldn't find the old checkpoint to overwrite
    • Fixed a bug where the 'test' label would sometimes not populate correctly in the default accuracy metric
    Source code(tar.gz)
    Source code(zip)
  • 0.3.0(Feb 28, 2019)

    [0.3.0] - 2019-02-28

    Added

    • Added torchbearer.variational, a sub-package for implementations of state of the art variational auto-encoders
    • Added SimpleUniform and SimpleExponential distributions
    • Added a decorator which can be used to cite a research article as part of a doc string
    • Added an optional dimension argument to the mean, std and running_mean metric aggregators
    • Added a var metric and decorator which can be used to calculate the variance of a metric
    • Added an unbiased flag to the std and var metrics to optionally not apply Bessel's correction (consistent with torch.std / torch.var)
    • Added support for rounding 1D lists to the Tqdm callback
    • Added SimpleWeibull distribution
    • Added support for Python 2.7
    • Added SimpleWeibullSimpleWeibullKL
    • Added SimpleExponentialSimpleExponentialKL
    • Added the option for model parameters only saving to Checkpointers.
    • Added documentation about serialization.
    • Added support for indefinite data loading. Iterators can now be run until complete independent of epochs or iterators can be refreshed during an epoch if complete.
    • Added support for batch intervals in interval checkpointer
    • Added line magic %torchbearer notebook
    • Added 'accuracy' variants of 'acc' default metrics

    Changed

    • Changed the default behaviour of the std metric to compute the sample std, in line with torch.std
    • Tqdm precision argument now rounds to decimal places rather than significant figures
    • Trial will now simply infer if the model has an argument called 'state'
    • Torchbearer now infers if inside a notebook and will use the appropriate tqdm module if not set

    Deprecated

    Removed

    • Removed the old Model API (deprecated since version 0.2.0)
    • Removed the 'pass_state' argument from Trial, this will now be inferred
    • Removed the 'std' decorator from the default metrics

    Fixed

    • Fixed a bug in the weight decay callback which would result in potentially negative decay (now just uses torch.norm)
    • Fixed a bug in the cite decorator causing the citation to not show up correctly
    • Fixed a memory leak in the mse primitive metric
    Source code(tar.gz)
    Source code(zip)
  • 0.2.6.1(Feb 25, 2019)

  • 0.2.6(Dec 19, 2018)

    [0.2.6] - 2018-12-19

    Added

    Changed

    • Y_PRED, Y_TRUE and X can now equivalently be accessed as PREDICTION, TARGET and INPUT respectively

    Deprecated

    Removed

    Fixed

    • Fixed a bug where the LiveLossPlot callback would trigger an error if run and evaluate were called separately
    • Fixed a bug where state key errors would report to the wrong stack level
    • Fixed a bug where the user would wrongly get a state key error in some cases
    Source code(tar.gz)
    Source code(zip)
  • 0.2.5(Dec 19, 2018)

    [0.2.5] - 2018-12-19

    Added

    • Added flag to replay to replay only a single batch per epoch
    • Added support for PyTorch 1.0.0 and Python 3.7
    • MetricTree can now unpack dictionaries from root, this is useful if you want to get a mean of a metric. However, this should be used with caution as it extracts only the first value in the dict and ignores the rest.
    • Added a callback for the livelossplot visualisation tool for notebooks

    Changed

    • All error / accuracy metrics can now optionally take state keys for predictions and targets as arguments

    Deprecated

    Removed

    Fixed

    • Fixed a bug with the EpochLambda metric which required y_true / y_pred to have specific forms
    Source code(tar.gz)
    Source code(zip)
  • 0.2.4(Nov 16, 2018)

    [0.2.4] - 2018-11-16

    Added

    • Added metric functionality to state keys so that they can be used as metrics if desired
    • Added customizable precision to the printer callbacks
    • Added threshold to binary accuracy. Now it will appropriately handle any values in [0, 1]

    Changed

    • Changed the default printer precision to 4s.f.
    • Tqdm on_epoch now shows metrics immediately when resuming

    Deprecated

    Removed

    Fixed

    • Fixed a bug which would incorrectly trigger version warnings when loading in models
    • Fixed bugs where the Trial would not fail gracefully if required objects were not in state
    • Fixed a bug where none criterion didn't work with the add_to_loss callback
    • Fixed a bug where tqdm on_epoch always started at 0
    Source code(tar.gz)
    Source code(zip)
  • 0.2.3(Oct 12, 2018)

    [0.2.3] - 2018-10-12

    Added

    • Added string representation of Trial to give summary
    • Added option to log Trial summary to TensorboardText
    • Added a callback point ('on_checkpoint') which can be used for model checkpointing after the history ios updated

    Changed

    • When resuming training checkpointers no longer delete the state file the trial was loaded from
    • Changed the metric eval to include a data_key which tells us what data we are evaluating on

    Deprecated

    Removed

    Fixed

    • Fixed a bug where callbacks weren't handled correctly in the predict and evaluate methods of Trial
    • Fixed a bug where the history wasn't updated when new metrics were calculated with the evaluate method of Trial
    • Fixed a bug where tensorboard writers couldn't be reused
    • Fixed a bug where the none criterion didn't require gradient
    • Fix bug where tqdm wouldn't get correct iterator length when evaluating on test generator
    • Fixed a bug where evaluating before training tried to update history before it existed
    • Fixed a bug where the metrics would output 'val_acc' even if evaluating on test or train data
    • Fixed a bug where roc metric didn't detach y_pred before sending to numpy
    • Fixed a bug where resuming from a checkpoint saved with one of the callbacks didn't populate the epoch number correctly
    Source code(tar.gz)
    Source code(zip)
  • 0.2.2(Sep 18, 2018)

    [0.2.2] - 2018-09-18

    Added

    • The default_for_key metric decorator can now be used to pass arguments to the init of the inner metric
    • The default metric for the key 'top_10_acc' is now the TopKCategoricalAccuracy metric with k set to 10
    • Added global verbose flag for trial that can be overridden by run, evaluate, predict
    • Added an LR metric which retrieves the current learning rate from the optimizer, default for key 'lr'

    Fixed

    • Fixed a bug where the DefaultAccuracy metric would not put the inner metric in eval mode if the first call to reset was after the call to eval
    • Fixed a bug where trying to load a state dict in a different session to where it was saved didn't work properly
    • Fixed a bug where the empty criterion would trigger an error if no Y_TRUE was put in state
    Source code(tar.gz)
    Source code(zip)
  • 0.2.1(Sep 11, 2018)

    [0.2.1] - 2018-09-11

    Added

    • Evaluation and prediction can now be done on any data using data_key keywork arg
    • Text tensorboard/visdom logger that writes epoch/batch metrics to text

    Changed

    • TensorboardX, Numpy, Scikit-learn and Scipy are no longer dependancies and only required if using the tensorboard callbacks or roc metric

    Deprecated

    Removed

    Fixed

    • Model class setting generator incorrectly leading to stop iterations.
    • Argument ordering is consistent in Trial.with_generators and Trial.__init__
    • Added a state dict for the early stopping callback
    • Fixed visdom parameters not getting set in some cases
    Source code(tar.gz)
    Source code(zip)
  • 0.2.0(Aug 21, 2018)

    See [NEW!] in README.md for new key features

    [0.2.0] - 2018-08-21

    Added

    • Added the ability to pass custom arguments to the tqdm callback
    • Added an ignore_index flag to the categorical accuracy metric, similar to nn.CrossEntropyLoss. Usage: metrics=[CategoricalAccuracyFactory(ignore_index=0)]
    • Added TopKCategoricalAccuracy metric (default for key: top_5_acc)
    • Added BinaryAccuracy metric (default for key: binary_acc)
    • Added MeanSquaredError metric (default for key: mse)
    • Added DefaultAccuracy metric (use with 'acc' or 'accuracy') - infers accuracy from the criterion
    • New Trial api torchbearer.Trial to replace the Model api. Trial api is more atomic and uses the fluent pattern to allow chaining of methods.
    • torchbearer.Trial has with_x_generator and with_x_data methods to add training/validation/testing generators to the trial. There is a with_generators method to allow passing of all generators in one call.
    • torchbearer.Trial has for_x_steps and for_steps to allow running of trails without explicit generators or data tensors
    • torchbearer.Trial keeps a history of run calls which tracks number of epochs ran and the final metrics at each epoch. This allows seamless resuming of trial running.
    • torchbearer.Trial.state_dict now returns the trial history and callback list state allowing for full resuming of trials
    • torchbearer.Trial has a replay method that can replay training (with callbacks and display) from the history. This is useful when loading trials from state.
    • The backward call can now be passed args by setting state[torchbearer.BACKWARD_ARGS]
    • torchbearer.Trial implements the forward pass, loss calculation and backward call as a optimizer closure
    • Metrics are now explicitly calculated with no gradient

    Changed

    • Callback decorators can now be chained to allow construction with multiple methods filled
    • Callbacks can now implement state_dict and ``load_state_dict` to allow callbacks to resume with state
    • State dictionary is now accepts StateKey objects which are unique and generated through torchbearer.state.get_state
    • State dictionary now warns when accessed with strings as this allows for collisions
    • Checkpointer callbacks will now resume from a state dict when resume=True in Trial

    Deprecated

    • torchbearer.Model has been deprecated in favour of the new torchbearer.Trial api

    Removed

    • Removed the MetricFactory class. Decorators still work in the same way but the Factory is no longer needed.

    Fixed

    Source code(tar.gz)
    Source code(zip)
  • 0.1.7(Aug 14, 2018)

    [0.1.7] - 2018-08-14

    Added

    • Added visdom logging support to tensorbard callbacks
    • Added option to choose tqdm module (tqdm, tqdm_notebook, ...) to Tqdm callback
    • Added some new decorators to simplify custom callbacks that must only run under certain conditions (or even just once).

    Changed

    • Instantiation of Model will now trigger a warning pending the new Trial API in the next version
    • TensorboardX dependancy now version 1.4

    Deprecated

    Removed

    Fixed

    • Mean and standard deviation calculations now work correctly for network outputs with many dimensions
    • Callback list no longer shared between fit calls, now a new copy is made each fit
    Source code(tar.gz)
    Source code(zip)
  • 0.1.6(Aug 10, 2018)

    [0.1.6] - 2018-08-10

    Added

    • Added a verbose level (options are now 0,1,2) which will print progress for the entire fit call, updating every epoch. Useful when doing dynamic programming with little data.
    • Added support for dictionary outputs of dataloader
    • Added abstract superclass for building TensorBoardX based callbacks

    Changed

    • Timer callback can now also be used as a metric which allows display of specified timings to printers and has been moved to metrics.
    • The loss_criterion is renamed to criterion in torchbearer.Model arguments.
    • The criterion in torchbearer.Model is now optional and will provide a zero loss tensor if it is not given.
    • TensorBoard callbacks refactored to be based on a common super class
    • TensorBoard callbacks refactored to use a common SummaryWriter for each log directory

    Deprecated

    Removed

    Fixed

    • Standard deviation calculation now returns 0 instead of complex value when given very close samples
    Source code(tar.gz)
    Source code(zip)
  • 0.1.5(Jul 30, 2018)

    [0.1.5] - 2018-07-30

    Added

    • Added a on_validation_criterion callback hook
    • Added a DatasetValidationSplitter which can be used to create a validation split if required for datasets like Cifar10 or MNIST
    • Added simple timer callback

    Changed

    Deprecated

    Removed

    Fixed

    • Fixed a bug where checkpointers would not save the model in some cases
    • Fixed a bug with the ROC metric causing it to not work
    Source code(tar.gz)
    Source code(zip)
  • 0.1.4(Jul 23, 2018)

    [0.1.4] - 2018-07-23

    Added

    • Added a decorator API for metrics which allows decorators to be used for metric construction
    • Added a default_for_key decorator which can be used to associate a string with a given metric in metric lists
    • Added a decorator API for callbacks which allows decorators to be used for simple callback construction
    • Added a add_to_loss callback decorator which allows quicker constructions of callbacks that add values to the loss

    Changed

    • Changed the API for running metrics and aggregators to no longer wrap a metric but instead receive input

    Deprecated

    Removed

    Fixed

    Source code(tar.gz)
    Source code(zip)
  • 0.1.3(Jul 18, 2018)

    [0.1.3] - 2018-07-17

    Added

    • Added a flag (step_on_batch) to the LR Scheduler callbacks which allows for step() to be called on each iteration instead of each epoch
    • Added on_sample_validation and on_forward_validation calls for validation callbacks
    • Added GradientClipping callback which simply clips the absolute gradient of the model parameters

    Changed

    • Changed the order of the arguments to the lambda function in the EpochLambda metric for consistency with pytorch and other metrics
    • Checkpointers now create directory to savepath if it doesn't exist
    • Changed the 'on_forward_criterion' callback method to 'on_criterion'
    • Changed epoch number in printer callbacks to be consistent with the rest of torchbearer

    Deprecated

    Removed

    Fixed

    • Fixed tests which were failing as of version 0.1.2
    • Fixed validation_steps not being added to state
    • Fixed checkpointer bug when path contained only filename and no directory path
    • Fixed console printer bug not printing validation statistics
    • Fixed console printer bug calling final_metrics before they existed in state
    Source code(tar.gz)
    Source code(zip)
  • v0.1.2(Jun 8, 2018)

    [0.1.2] - 2018-06-08

    Added

    • Added support for tuple outputs from generators, bink expects output to be length 2. Specifically, x, y = next() is possible, where x and y can be tuples of arbitrary size or depth
    • Added support for torch dtypes in bink Model.to(...)
    • Added pickle_module and pickle_protocol to checkpointers for consistency with torch.save

    Changed

    • Changed the learning rate scheduler callbacks to no longer require an optimizer and to have the proper arguments

    Deprecated

    Removed

    Fixed

    • Fixed an issue in GradientNormClipping which raised a warning in PyTorch >= 0.4
    Source code(tar.gz)
    Source code(zip)
  • v0.1.1(May 30, 2018)

Owner
The torchbearer project, by @ecs-vlc
null
PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop.

VoiceLoop PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. VoiceLoop is a n

Meta Archive 873 Dec 15, 2022
Joint parameterization and fitting of stroke clusters

StrokeStrip: Joint Parameterization and Fitting of Stroke Clusters Dave Pagurek van Mossel1, Chenxi Liu1, Nicholas Vining1,2, Mikhail Bessmeltsev3, Al

Dave Pagurek 44 Dec 1, 2022
In this project we investigate the performance of the SetCon model on realistic video footage. Therefore, we implemented the model in PyTorch and tested the model on two example videos.

Contrastive Learning of Object Representations Supervisor: Prof. Dr. Gemma Roig Institutions: Goethe University CVAI - Computational Vision & Artifici

Dirk Neuhäuser 6 Dec 8, 2022
Step by Step on how to create an vision recognition model using LOBE.ai, export the model and run the model in an Azure Function

Step by Step on how to create an vision recognition model using LOBE.ai, export the model and run the model in an Azure Function

El Bruno 3 Mar 30, 2022
A multi-functional library for full-stack Deep Learning. Simplifies Model Building, API development, and Model Deployment.

chitra What is chitra? chitra (चित्र) is a multi-functional library for full-stack Deep Learning. It simplifies Model Building, API development, and M

Aniket Maurya 210 Dec 21, 2022
😇A pyTorch implementation of the DeepMoji model: state-of-the-art deep learning model for analyzing sentiment, emotion, sarcasm etc

------ Update September 2018 ------ It's been a year since TorchMoji and DeepMoji were released. We're trying to understand how it's being used such t

Hugging Face 865 Dec 24, 2022
Convert Pytorch model to onnx or tflite, and the converted model can be visualized by Netron

Convert Pytorch model to onnx or tflite, and the converted model can be visualized by Netron

Roxbili 5 Nov 19, 2022
Model search is a framework that implements AutoML algorithms for model architecture search at scale

Model search (MS) is a framework that implements AutoML algorithms for model architecture search at scale. It aims to help researchers speed up their exploration process for finding the right model architecture for their classification problems (i.e., DNNs with different types of layers).

Google 3.2k Dec 31, 2022
Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!

Rubicon Purpose Rubicon is a data science tool that captures and stores model training and execution information, like parameters and outcomes, in a r

Capital One 97 Jan 3, 2023
Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification

STAM - Pytorch Implementation of STAM (Space Time Attention Model), yet another pure and simple SOTA attention model that bests all previous models in

Phil Wang 109 Dec 28, 2022
ReConsider is a re-ranking model that re-ranks the top-K (passage, answer-span) predictions of an Open-Domain QA Model like DPR (Karpukhin et al., 2020).

ReConsider ReConsider is a re-ranking model that re-ranks the top-K (passage, answer-span) predictions of an Open-Domain QA Model like DPR (Karpukhin

Facebook Research 47 Jul 26, 2022
Model Zoo for AI Model Efficiency Toolkit

We provide a collection of popular neural network models and compare their floating point and quantized performance.

Qualcomm Innovation Center 137 Jan 3, 2023
LIAO Shuiying 6 Dec 1, 2022
Demonstrates how to divide a DL model into multiple IR model files (division) and introduce a simplest way to implement a custom layer works with OpenVINO IR models.

Demonstration of OpenVINO techniques - Model-division and a simplest-way to support custom layers Description: Model Optimizer in Intel(r) OpenVINO(tm

Yasunori Shimura 12 Nov 9, 2022
This project deploys a yolo fastest model in the form of tflite on raspberry 3b+. The model is from another repository of mine called -Trash-Classification-Car

Deploy-yolo-fastest-tflite-on-raspberry 觉得有用的话可以顺手点个star嗷 这个项目将垃圾分类小车中的tflite模型移植到了树莓派3b+上面。 该项目主要是为了记录在树莓派部署yolo fastest tflite的流程 (之后有时间会尝试用C++部署来提升

null 7 Aug 16, 2022
MBPO (paper: When to trust your model: Model-based policy optimization) in offline RL settings

offline-MBPO This repository contains the code of a version of model-based RL algorithm MBPO, which is modified to perform in offline RL settings Pape

LxzGordon 1 Oct 24, 2021
RoMA: Robust Model Adaptation for Offline Model-based Optimization

RoMA: Robust Model Adaptation for Offline Model-based Optimization Implementation of RoMA: Robust Model Adaptation for Offline Model-based Optimizatio

null 9 Oct 31, 2022
An atmospheric growth and evolution model based on the EVo degassing model and FastChem 2.0

EVolve Linking planetary mantles to atmospheric chemistry through volcanism using EVo and FastChem. Overview EVolve is a linked mantle degassing and a

Pip Liggins 2 Jan 17, 2022