Overview

PyTorch Template Project

PyTorch deep learning project made easy.

Requirements

  • Python >= 3.5 (3.6 recommended)
  • PyTorch >= 0.4 (1.2 recommended)
  • tqdm (Optional for test.py)
  • tensorboard >= 1.14 (see Tensorboard Visualization)

Features

  • Clear folder structure which is suitable for many deep learning projects.
  • .json config file support for convenient parameter tuning.
  • Customizable command line options for more convenient parameter tuning.
  • Checkpoint saving and resuming.
  • Abstract base classes for faster development:
    • BaseTrainer handles checkpoint saving/resuming, training process logging, and more.
    • BaseDataLoader handles batch generation, data shuffling, and validation data splitting.
    • BaseModel provides basic model summary.

Folder Structure

pytorch-template/
│
├── train.py - main script to start training
├── test.py - evaluation of trained model
│
├── config.json - holds configuration for training
├── parse_config.py - class to handle config file and cli options
│
├── new_project.py - initialize new project with template files
│
├── base/ - abstract base classes
│   ├── base_data_loader.py
│   ├── base_model.py
│   └── base_trainer.py
│
├── data_loader/ - anything about data loading goes here
│   └── data_loaders.py
│
├── data/ - default directory for storing input data
│
├── model/ - models, losses, and metrics
│   ├── model.py
│   ├── metric.py
│   └── loss.py
│
├── saved/
│   ├── models/ - trained models are saved here
│   └── log/ - default logdir for tensorboard and logging output
│
├── trainer/ - trainers
│   └── trainer.py
│
├── logger/ - module for tensorboard visualization and logging
│   ├── visualization.py
│   ├── logger.py
│   └── logger_config.json
│  
└── utils/ - small utility functions
    ├── util.py
    └── ...

Usage

The code in this repository is an MNIST example of the template. Try python train.py -c config.json to start training.

Config file format

Config files are in .json format:

{
  "name": "Mnist_LeNet",        // training session name
  "n_gpu": 1,                   // number of GPUs to use for training.
  
  "arch": {
    "type": "MnistModel",       // name of model architecture to train
    "args": {

    }                
  },
  "data_loader": {
    "type": "MnistDataLoader",         // selecting data loader
    "args":{
      "data_dir": "data/",             // dataset path
      "batch_size": 64,                // batch size
      "shuffle": true,                 // shuffle training data before splitting
      "validation_split": 0.1          // size of validation dataset. float(portion) or int(number of samples)
      "num_workers": 2,                // number of cpu processes to be used for data loading
    }
  },
  "optimizer": {
    "type": "Adam",
    "args":{
      "lr": 0.001,                     // learning rate
      "weight_decay": 0,               // (optional) weight decay
      "amsgrad": true
    }
  },
  "loss": "nll_loss",                  // loss
  "metrics": [
    "accuracy", "top_k_acc"            // list of metrics to evaluate
  ],                         
  "lr_scheduler": {
    "type": "StepLR",                  // learning rate scheduler
    "args":{
      "step_size": 50,          
      "gamma": 0.1
    }
  },
  "trainer": {
    "epochs": 100,                     // number of training epochs
    "save_dir": "saved/",              // checkpoints are saved in save_dir/models/name
    "save_freq": 1,                    // save checkpoints every save_freq epochs
    "verbosity": 2,                    // 0: quiet, 1: per epoch, 2: full
  
    "monitor": "min val_loss"          // mode and metric for model performance monitoring. set 'off' to disable.
    "early_stop": 10	                 // number of epochs to wait before early stop. set 0 to disable.
  
    "tensorboard": true,               // enable tensorboard visualization
  }
}

Add additional configurations if needed.
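Each "type"/"args" pair in the config is resolved by looking the type name up in a module and instantiating it with the given arguments. Below is a minimal sketch of that lookup logic; the helper name is illustrative, and the template's own ConfigParser provides the equivalent (e.g. config.init_obj('arch', module_arch)):

import torch
import torch.optim as optim

def init_obj(config, name, module, *extra_args):
    """Instantiate config[name]['type'] from `module` with config[name]['args']."""
    entry = config[name]
    cls = getattr(module, entry['type'])                 # e.g. optim.Adam
    return cls(*extra_args, **entry.get('args', {}))

# Example: build the optimizer from the "optimizer" block above.
config = {"optimizer": {"type": "Adam",
                        "args": {"lr": 0.001, "weight_decay": 0, "amsgrad": True}}}
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = init_obj(config, 'optimizer', optim, params)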

Using config files

Modify the configurations in .json config files, then run:

python train.py --config config.json

Resuming from checkpoints

You can resume from a previously saved checkpoint by:

python train.py --resume path/to/checkpoint

Using Multiple GPU

You can enable multi-GPU training by setting the n_gpu argument in the config file to a number larger than one. If a smaller number of GPUs than available is configured, the first n devices will be used by default. Specify the indices of available GPUs with the CUDA environment variable:

python train.py --device 2,3 -c config.json

This is equivalent to

CUDA_VISIBLE_DEVICES=2,3 python train.py -c config.json

Customization

Project initialization

Use the new_project.py script to create a new project directory with template files. Running python new_project.py ../NewProject creates a new project folder named 'NewProject'. The script filters out unnecessary files such as caches, git files, and the readme.

Custom CLI options

Changing values in the config file is a clean, safe, and easy way of tuning hyperparameters. However, it is sometimes better to have command-line options for values that need to be changed often or quickly.

This template uses the configurations stored in the json file by default, but by registering custom options as follows you can change some of them using CLI flags.

import collections

# simple class-like object having 3 attributes: `flags`, `type`, `target`.
CustomArgs = collections.namedtuple('CustomArgs', 'flags type target')
options = [
    CustomArgs(['--lr', '--learning_rate'], type=float, target=('optimizer', 'args', 'lr')),
    CustomArgs(['--bs', '--batch_size'], type=int, target=('data_loader', 'args', 'batch_size'))
    # options added here can be modified by command line flags.
]

The target argument should be a sequence of keys used to access that option in the config dict. In this example, the target for the learning rate option is ('optimizer', 'args', 'lr') because config['optimizer']['args']['lr'] points to the learning rate. Running python train.py -c config.json --bs 256 trains with the options given in config.json, except for the batch size, which is increased to 256 by the command-line option.
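Illustratively, this is how a parsed command-line value can be written back into the nested config dict via its target key sequence (the template's ConfigParser does something equivalent internally; the helper below is only a sketch):

from functools import reduce
from operator import getitem

def set_by_path(config, target, value):
    """Set config[k1][k2]...[kn] = value for target = (k1, ..., kn)."""
    reduce(getitem, target[:-1], config)[target[-1]] = value

config = {'optimizer': {'args': {'lr': 0.001}}}
set_by_path(config, ('optimizer', 'args', 'lr'), 0.01)
print(config['optimizer']['args']['lr'])   # 0.01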

Data Loader

  • Writing your own data loader
  1. Inherit BaseDataLoader

    BaseDataLoader is a subclass of torch.utils.data.DataLoader, so you can use either of them (a sketch of a custom loader follows this list).

    BaseDataLoader handles:

    • Generating next batch
    • Data shuffling
    • Generating validation data loader by calling BaseDataLoader.split_validation()
  • DataLoader Usage

    BaseDataLoader is an iterator; to iterate through batches:

    for batch_idx, (x_batch, y_batch) in enumerate(data_loader):
        pass
  • Example

    Please refer to data_loader/data_loaders.py for an MNIST data loading example.
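    A condensed sketch of such a loader, mirroring the MNIST example (swap in your own dataset and transforms; the exact BaseDataLoader constructor signature is assumed to match the current template):

    from torchvision import datasets, transforms
    from base import BaseDataLoader

    class MnistDataLoader(BaseDataLoader):
        """MNIST data loading demo using BaseDataLoader."""
        def __init__(self, data_dir, batch_size, shuffle=True, validation_split=0.0,
                     num_workers=1, training=True):
            trsfm = transforms.Compose([
                transforms.ToTensor(),
                transforms.Normalize((0.1307,), (0.3081,))
            ])
            self.data_dir = data_dir
            self.dataset = datasets.MNIST(self.data_dir, train=training,
                                          download=True, transform=trsfm)
            super().__init__(self.dataset, batch_size, shuffle, validation_split, num_workers)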

Trainer

  • Writing your own trainer
  1. Inherit BaseTrainer

    BaseTrainer handles:

    • Training process logging
    • Checkpoint saving
    • Checkpoint resuming
    • Reconfigurable performance monitoring for saving the current best model and early-stopping training:
      • If config monitor is set to max val_accuracy, the trainer will save a checkpoint model_best.pth whenever an epoch's validation accuracy exceeds the current maximum.
      • If config early_stop is set, training will be terminated automatically when model performance does not improve for the given number of epochs. This feature can be turned off by passing 0 to the early_stop option, or by deleting that line from the config.
  2. Implementing abstract methods

    You need to implement _train_epoch() for your training process; if you need validation, you can implement _valid_epoch() as in trainer/trainer.py (a condensed sketch follows this list).

  • Example

    Please refer to trainer/trainer.py for MNIST training.

  • Iteration-based training

    Trainer.__init__ takes an optional argument, len_epoch, which controls the number of batches (steps) in each epoch.
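    A hedged sketch of a custom trainer, assuming the BaseTrainer constructor takes (model, criterion, metric_ftns, optimizer, config) and exposes self.device; the real trainer/trainer.py accepts additional arguments (data loaders, lr_scheduler, len_epoch, ...):

    from base import BaseTrainer

    class MyTrainer(BaseTrainer):
        def __init__(self, model, criterion, metric_ftns, optimizer, config, data_loader):
            super().__init__(model, criterion, metric_ftns, optimizer, config)
            self.data_loader = data_loader

        def _train_epoch(self, epoch):
            """Run one training epoch and return a dict of values to log."""
            self.model.train()
            total_loss = 0.0
            for batch_idx, (data, target) in enumerate(self.data_loader):
                data, target = data.to(self.device), target.to(self.device)
                self.optimizer.zero_grad()
                output = self.model(data)
                loss = self.criterion(output, target)
                loss.backward()
                self.optimizer.step()
                total_loss += loss.item()
            return {'loss': total_loss / len(self.data_loader)}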

Model

  • Writing your own model
  1. Inherit BaseModel

    BaseModel handles:

    • Inherits from torch.nn.Module
    • __str__: modifies the native printout to include the number of trainable parameters.
  2. Implementing abstract methods

    Implement the forward pass method forward() (see the sketch after this list).

  • Example

    Please refer to model/model.py for a LeNet example.
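    A condensed sketch of such a model (a small LeNet-style network; layer sizes are illustrative):

    import torch.nn as nn
    import torch.nn.functional as F
    from base import BaseModel

    class MnistModel(BaseModel):
        def __init__(self, num_classes=10):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
            self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
            self.fc1 = nn.Linear(320, 50)
            self.fc2 = nn.Linear(50, num_classes)

        def forward(self, x):
            x = F.relu(F.max_pool2d(self.conv1(x), 2))   # 28x28 -> 12x12
            x = F.relu(F.max_pool2d(self.conv2(x), 2))   # 12x12 -> 4x4
            x = x.view(-1, 320)                          # flatten 20 * 4 * 4
            x = F.relu(self.fc1(x))
            return F.log_softmax(self.fc2(x), dim=1)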

Loss

Custom loss functions can be implemented in 'model/loss.py'. Use them by changing the name given in "loss" in the config file to the corresponding function name.
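For example, a loss like the nll_loss referenced in the config above can be written as a plain function in model/loss.py:

import torch.nn.functional as F

def nll_loss(output, target):
    return F.nll_loss(output, target)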

Metrics

Metric functions are located in 'model/metric.py'.

You can monitor multiple metrics by providing a list in the configuration file, e.g.:

"metrics": ["accuracy", "top_k_acc"],

Additional logging

If you have additional information to log, merge it into log in _train_epoch() of your trainer class before returning, as shown below:

additional_log = {"gradient_norm": g, "sensitivity": s}
log.update(additional_log)
return log

Testing

You can test a trained model by running test.py and passing the path to the trained checkpoint with the --resume argument.
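For example:

python test.py --resume path/to/checkpoint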

Validation data

To split validation data from a data loader, call BaseDataLoader.split_validation(); it will return a validation data loader whose size is specified in your config file. The validation_split can be a ratio of the total data (0.0 <= float < 1.0) or a number of samples (0 <= int < n_total_samples).

Note: the split_validation() method will modify the original data loader.
Note: split_validation() will return None if "validation_split" is set to 0.
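For example, assuming the MNIST loader configured above:

from data_loader.data_loaders import MnistDataLoader

data_loader = MnistDataLoader('data/', batch_size=64, shuffle=True,
                              validation_split=0.1, num_workers=2)
valid_data_loader = data_loader.split_validation()   # None if validation_split == 0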

Checkpoints

You can specify the name of the training session in config files:

"name": "MNIST_LeNet",

The checkpoints will be saved in save_dir/name/timestamp/checkpoint_epoch_n, with timestamp in mmdd_HHMMSS format.

A copy of config file will be saved in the same folder.

Note: checkpoints contain:

{
  'arch': arch,
  'epoch': epoch,
  'state_dict': self.model.state_dict(),
  'optimizer': self.optimizer.state_dict(),
  'monitor_best': self.mnt_best,
  'config': self.config
}
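A hedged sketch of restoring such a checkpoint manually (train.py --resume does this for you); model and optimizer are assumed to be constructed beforehand to match the checkpointed architecture:

import torch

checkpoint = torch.load('path/to/checkpoint.pth', map_location='cpu')
print(checkpoint['epoch'], checkpoint['arch'], checkpoint['monitor_best'])
model.load_state_dict(checkpoint['state_dict'])        # model built to match checkpoint['arch']
optimizer.load_state_dict(checkpoint['optimizer'])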

Tensorboard Visualization

This template supports Tensorboard visualization by using either torch.utils.tensorboard or TensorboardX.

  1. Install

    If you are using PyTorch 1.1 or higher, install tensorboard with 'pip install tensorboard>=1.14.0'.

    Otherwise, install TensorboardX by following the installation guide in the TensorboardX repository.

  2. Run training

    Make sure that tensorboard option in the config file is turned on.

     "tensorboard" : true
    
  3. Open Tensorboard server

    Run tensorboard --logdir saved/log/ at the project root; the server will then be available at http://localhost:6006

By default, the values of the loss and the metrics specified in the config file, input images, and histograms of model parameters will be logged. If you need more visualizations, use add_scalar('tag', data), add_image('tag', image), etc. in the trainer._train_epoch method. The add_something() methods in this template are thin wrappers around those of tensorboardX.SummaryWriter and torch.utils.tensorboard.SummaryWriter.

Note: You don't have to specify the current step, since the WriterTensorboard class defined in logger/visualization.py tracks it for you.
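For example, extra values can be logged from within _train_epoch(), assuming the writer is exposed as self.writer as in the default trainer (the tags below are illustrative):

import torchvision

# inside trainer._train_epoch(), within the batch loop
self.writer.add_scalar('loss', loss.item())
self.writer.add_image('input', torchvision.utils.make_grid(data.cpu(), normalize=True))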

Contribution

Feel free to contribute any kind of function or enhancement; the coding style follows PEP8.

Code should pass the Flake8 check before committing.

TODOs

  • Multiple optimizers
  • Support more tensorboard functions
  • Using fixed random seed
  • Support pytorch native tensorboard
  • tensorboardX logger support
  • Configurable logging layout, checkpoint naming
  • Iteration-based training (instead of epoch-based)
  • Adding command line option for fine-tuning

License

This project is licensed under the MIT License. See LICENSE for more details.

Acknowledgements

This project is inspired by the Tensorflow-Project-Template project by Mahmoud Gemy.

Comments
  • New handling of 'data_utils', adding: PyTorch data loader support, restructuring config file ++

    Hi,

    This pull request contains restructuring of the data handling. This includes splitting dataset and data loader handling into separate files. It also includes support for using the native data loader of PyTorch.

    I look forward to your comments.

    Changes:

    • 'config.json':

      • (Some) parameters are now collected in larger groups
        • I find this grouping easier to read, as things are logically sorted
        • E.g., "data_loader" contains both "train" and "validation", since data loaders for both are created
      • **kwargs for class instance initialization can be included in the config file
      • Dataset transformations are parsed from ["dataset"]["transforms"] (which is a list)
    • 'data_utils/data_loaders.py':

      • Now supports PyTorch data loader directly from config file (set ["data_loader"]["type"] = "PyTorch" to use it instead of the repository data loader)
      • Function "get_data_loaders" handles creation of both training and validation data loaders
    • 'data_utils/datasets.py'

      • Function "get_dataset" returns dataset instance, if found (with transforms, see below)
      • Included template for writing custom datasets
    • 'data_utils/transforms'

      • Reads transforms from the configuration ["dataset"]["transforms"], returns a composed transform object (used by 'datasets.py')
    • 'train.py'

      • Minor changes to align with the above
    • 'README.md': Updated parts as needed (although more could be written/rewritten in future for clarity)

    Regarding BaseDataLoader

    With reference to issue #6, 'BaseDataLoader' could be deprecated. This pull request brings us one step closer to achieving this.

    The class can still be used after this pull request (although I had to change references to the configuration file, due to the new grouping structure of parameters).

    enhancement 
    opened by borgesa 9
  • Latest checkpoint

    I want to create a latest checkpoint after every epoch in addition to the save period in the config file. For example, I already set the save period to every 10 epochs, but I still want to create a latest checkpoint for every epoch. Can you guide me on how to do that? Thank you.

    opened by tqdungctuct 8
  • Adding Configurable Arguments to Custom Loss and Metric(s) Functions

    Hey guys,

    Thank you for the great template. One of things I noticed when implementing custom loss and metric functions is that if I ever wanted to pass additional arguments (that are not the output and target tensors) to the function, there are a lot of locations within the code that would need to be modified.

    This PR allows users to set additional arguments for the loss and metric functions directly in the config JSON file.

    • The implementation for loss is identical to the optimizer and lr_scheduler sections.
    • For metrics, the corresponding value for the metrics section has been changed.
      • Before: a list of the custom metric function names
      • After: a dictionary with each metric as the key and a dictionary of its arguments (argument keyword as the key, argument value as the value) as the corresponding values.

    This is my first time submitting a PR, hopefully it's in the correct format :). Let me know what you think.

    Best, Michael

    opened by minghao4 8
  • Configurable logging

    Hi guys,

    This PR implements configurable logging via YAML.

    In particular, it allows for:

    • configurable log format
    • configurable handlers (eg. rotating file handlers)

    I've removed the train_logger, as we can use different log levels (info/debug) and the verbosity config to control what is output, and by logging to a file we can avoid having to save the log state out as part of a checkpoint.

    Let me know what you think.

    Cheers, Karl

    opened by khornlund 7
  • Optimizer for fine tuning, Optional support of TensorboardX

    Hi, this PR is 3 small fixes on your work.

    First, I changed the optimizer call in the base_trainer to make it work with fine-tuning, where models have non-trainable parameters.

    The second is a minor fix to the total number of training samples displayed, changing Train Epoch: 1 [5120/53984 (9%)] Loss: 0.912355 to Train Epoch: 1 [5120/54000 (9%)] Loss: 0.912355.

    The last one adds items to the .gitignore file, most of which come from GitHub's default Python .gitignore.

    opened by SunQpark 7
  • Added fn to select data loader (instead of Mnist hardcoded in train.py)

    Hi,

    Created initial proposal for selecting data loader in configuration file.

    Overview:

    • User can select data loader in configuration file
    • New data loaders are (for now*) written in 'data_loader.data_loaders.py'
    • Currently, the function 'get_data_loader' needs to be updated manually when adding new loaders (in addition to writing the loader class)

    I propose this as a first step that we can build further on**.

    Comment: *: I guess we over time can implement functionality to load loaders from new files inside the 'data_loader' module (if someone prefers to create 'data_loader/my_new_loader.py') **: I plan to implement another data loader that potentially can work as another example in the template. Will post when this is done.

    opened by borgesa 6
  • Any support for multiple loss functions?

    I'm using a simple aggregated loss function for now, but it'd be great if it was supported in the config, logging, etc.

    Happy to contribute if you have any plans for this.

    opened by itsnamgyu 5
  • Add pytorch 1.1 utils.tensorboard support.

    This pull request adds the ability to choose PyTorch 1.1's torch.utils.tensorboard.SummaryWriter. The config is changed so that the user passes in an array of module names. If tensorboardX is not available, it tries to use torch.utils.tensorboard. The user can switch the ordering to change the priority.

    Tested with the default example using PyTorch 1.1 and Tensorboard 1.14. Also updated the Readme to reflect the requirements.

    opened by christopherbate 5
  • Multi GPU training (data parallel)

    This PR handles two kinds of multiprocessing. The first is multi-CPU data loading, and the second is multi-GPU training (data parallel).

    Multi-CPU is simply done by adding an n_cpu argument to the config file and passing it as the num_workers option of the PyTorch native DataLoader.

    Multi-GPU can be controlled with the n_gpu option in the config file, which replaces the previous gpu index and cuda options. Training without a GPU is still possible with "n_gpu": 0.

    Specifying the GPU indices to use is possible externally with the environmental variable CUDA_VISIBLE_DEVICES=0,1. I considered adding GPU indices into config file instead of num_GPU option and setting that on the train.py, but that would save GPU indices to the checkpoint, which can be problematic when resuming.

    Tested on 3 machines: my laptop (pytorch 0.4.1, no GPU), server1 (pytorch 0.4.1, 8 * Tesla K80, cuda 9.1), server2 (pytorch 0.4.0, 4 * GTX 1080, cuda 9.0).

    It worked fine under all conditions I tested, but one of my friends said that giving a non-zero value to the num_workers option raised an exception on her machine. So, please tell me if anything goes wrong.

    I'll update the README file later

    enhancement 
    opened by SunQpark 5
  • Proposal in "model/loss.py" - Use loss classes instead of functional API

    Hi,

    Currently, 'get_loss_function' queries for local functions in the same file and returns the object, if it finds a function with the matching name. I propose that we instead use the interface of 'torch.nn.modules.loss' classes and return an instantiated class object, instead of a function reference.

    These classes either way call the functional API, and are better documented.

    As an example:

    • Current functionality:
      • 'my_loss' returns 'F.nll_loss(y_in, y_target)'
      • Function 'get_loss_function' returns the function reference to 'my_loss'
    • Proposed functionality:
      • 'get_loss_function' can instead use a combination of 'getattr(torch.nn.modules.loss, loss_fn_name)' (finding all built in loss classes in PyTorch) and searching for custom loss classes inside the file ('model/loss.py')
      • An instantiated class object is returned

    What do you think?

    opened by borgesa 5
  • Model is moved to GPU after the optimizer is instantiated, resulting in a performance hit.

    I noticed that the optimizer is instantiated before the model is moved to the GPU.

    This is contrary to the PyTorch docs:

    If you need to move a model to GPU via .cuda(), please do so before constructing optimizers for it. Parameters of a model after .cuda() will be different objects with those before the call.

    In general, you should make sure that optimized parameters live in consistent locations when optimizers are constructed and used. -- https://pytorch.org/docs/stable/optim.html#how-to-use-an-optimizer

    I noticed the problem on my machine because I had fluctuating GPU utilization (checked with nvtop). The utilization jumped every couple of seconds from 10-20% to 80% and back. Moving the model to cuda beforehand (in train.py) fixed the issue for me. (Afterwards the utilization never dropped under 70%.)

    model = config.init_obj('arch', module_arch)
    model.cuda()
    ...
    optimizer = config.init_obj('optimizer', torch.optim, trainable_params)
    

    Can you reproduce the behavior?

    opened by schlabrendorff 4
  • Support `in` operation in ConfigParser

    Sometimes it is necessary to check whether a ConfigParser object contains a specific key during development. Following the existing overload of the __getitem__ operation, I added an overload of __contains__ to fix the fallback error when using the in operation.

    opened by Tackoil 0
  • TODO: also configure logging for sub-processes (not master)

    Hi victoresque, thanks for your great repo! I used the hydra_DDP branch to build my application but ran into problems with get_logger. Specifically, util.py loads the '.hydra/hydra.yaml' file from the working directory, but hydra.yaml only exists in the output directory, such as 'outputs/2022-09-25/15-16-17', so Python can't find it. I'm a little puzzled about the path of hydra.yaml. Maybe get_logger should load hydra.yaml from the output directory? Could anyone help me? Thanks in advance!

    (base) python train.py                     
    Traceback (most recent call last):
      File "/mnt/petrelfs/qudelin/PJLAB/RS/VRS-Transformer/train.py", line 19, in <module>
        logger = get_logger("train")
      File "/mnt/petrelfs/qudelin/PJLAB/RS/VRS-Transformer/src/utils/util.py", line 19, in get_logger
        hydra_conf = OmegaConf.load('.hydra/hydra.yaml')
      File "/mnt/petrelfs/qudelin/miniconda3/lib/python3.9/site-packages/omegaconf/omegaconf.py", line 187, in load
        with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
    FileNotFoundError: [Errno 2] No such file or directory: '/mnt/petrelfs/qudelin/PJLAB/RS/VRS-Transformer/.hydra/hydra.yaml'
    
    opened by DelinQu 4
  • Strange bugs occur when the number of GPUs trained and tested is inconsistent

    If the number of GPUs is inconsistent between training and testing, the number of files loaded by the loader does not correspond to the batch size, and the prediction accuracy is always 0.166666.

    opened by sherleyjj 0
  • Any plans to support Wandb Hyperparameter Searching?

    opened by DelinQu 0
  • Something wrong with the add_histogram function

    I had a strange problem when I tried to implement an EfficientNet model. Last week this code worked fine, but now some errors occur when I retrain. I faced this error in the validation step.

    This is my code: https://github.com/ngocgiang99/Paper-Implementation. Please checkout to branch feat_efficient_net.

    This is error log:

    Traceback (most recent call last):
      File "train.py", line 73, in <module>
        main(config)
      File "train.py", line 54, in main
        trainer.train()
      File "D:\Work\Me\Paper-Implementation\efficient_net\base\base_trainer.py", l
        result = self._train_epoch(epoch)
      File "D:\Work\Me\Paper-Implementation\efficient_net\trainer\trainer.py", lin
        val_log = self._valid_epoch(epoch)
      File "D:\Work\Me\Paper-Implementation\efficient_net\trainer\trainer.py", lin
        self.writer.add_histogram(name, p, bins='auto')
      File "D:\Work\Me\Paper-Implementation\efficient_net\logger\visualization.py"
        add_data(tag, data, self.step, *args, **kwargs)
      File "C:\Users\PC\anaconda3\envs\general\lib\site-packages\torch\utils\tenso
    in add_histogram
        histogram(tag, values, bins, max_bins=max_bins), global_step, walltime)
      File "C:\Users\PC\anaconda3\envs\general\lib\site-packages\torch\utils\tenso in histogram
        hist = make_histogram(values.astype(float), bins, max_bins)
      File "C:\Users\PC\anaconda3\envs\general\lib\site-packages\torch\utils\tenso in make_histogram
        counts, limits = np.histogram(values, bins=bins)
      File "<__array_function__ internals>", line 6, in histogram
      File "C:\Users\PC\anaconda3\envs\general\lib\site-packages\numpy\lib\histogram
        bin_edges, uniform_bins = _get_bin_edges(a, bins, range, weights)
      File "C:\Users\PC\anaconda3\envs\general\lib\site-packages\numpy\lib\histogrn_edges
        endpoint=True, dtype=bin_type)
      File "<__array_function__ internals>", line 6, in linspace
      File "C:\Users\PC\anaconda3\envs\general\lib\site-packages\numpy\core\function_base.py", line 135, in linspace
        y = _nx.arange(0, num, dtype=dt).reshape((-1,) + (1,) * ndim(delta))
    numpy.core._exceptions.MemoryError: Unable to allocate 10.3 TiB for an array with shape (1418558411252,) and data type float64
    

    Conda environment:

    # Name                    Version                   Build  Channel
    absl-py                   0.12.0                   pypi_0    pypi
    ca-certificates           2021.4.13            haa95532_1
    cachetools                4.2.2                    pypi_0    pypi
    certifi                   2020.12.5        py37haa95532_0
    chardet                   4.0.0                    pypi_0    pypi
    google-auth               1.30.1                   pypi_0    pypi
    google-auth-oauthlib      0.4.4                    pypi_0    pypi
    grpcio                    1.38.0                   pypi_0    pypi
    idna                      2.10                     pypi_0    pypi
    importlib-metadata        4.3.1                    pypi_0    pypi
    markdown                  3.3.4                    pypi_0    pypi
    numpy                     1.20.3                   pypi_0    pypi
    oauthlib                  3.1.0                    pypi_0    pypi
    openssl                   1.1.1k               h2bbff1b_0
    pandas                    1.2.4                    pypi_0    pypi
    pillow                    8.2.0                    pypi_0    pypi
    pip                       21.1.1           py37haa95532_0
    protobuf                  3.17.1                   pypi_0    pypi
    pyasn1                    0.4.8                    pypi_0    pypi
    pyasn1-modules            0.2.8                    pypi_0    pypi
    python                    3.7.10               h6244533_0
    python-dateutil           2.8.1                    pypi_0    pypi
    pytz                      2021.1                   pypi_0    pypi
    requests                  2.25.1                   pypi_0    pypi
    requests-oauthlib         1.3.0                    pypi_0    pypi
    rsa                       4.7.2                    pypi_0    pypi
    setuptools                52.0.0           py37haa95532_0
    six                       1.16.0                   pypi_0    pypi
    sqlite                    3.35.4               h2bbff1b_0
    tensorboard               2.5.0                    pypi_0    pypi
    tensorboard-data-server   0.6.1                    pypi_0    pypi
    tensorboard-plugin-wit    1.8.0                    pypi_0    pypi
    torch                     1.8.1+cu111              pypi_0    pypi
    torchaudio                0.8.1                    pypi_0    pypi
    torchvision               0.9.1+cu111              pypi_0    pypi
    tqdm                      4.61.0                   pypi_0    pypi
    typing-extensions         3.10.0.0                 pypi_0    pypi
    urllib3                   1.26.5                   pypi_0    pypi
    vc                        14.2                 h21ff451_1
    vs2015_runtime            14.27.29016          h5e58377_2
    werkzeug                  2.0.1                    pypi_0    pypi
    wheel                     0.36.2             pyhd3eb1b0_0
    wincertstore              0.2                      py37_0
    zipp                      3.4.1                    pypi_0    
    

    Thanks.

    opened by ngocgiang99 2