PyTorch implementations of neural network models for keyword spotting

Honk: CNNs for Keyword Spotting

Honk is a PyTorch reimplementation of Google's TensorFlow convolutional neural networks for keyword spotting, which accompanies the recent release of their Speech Commands Dataset. For more details, please consult our writeup:

Honk is useful for building on-device speech recognition capabilities for interactive intelligent agents. Our code can be used to identify simple commands (e.g., "stop" and "go") and be adapted to detect custom "command triggers" (e.g., "Hey Siri!").

Check out this video for a demo of Honk in action!

Demo Application

Use the instructions below to run the demo application (shown in the above video) yourself!

Currently, PyTorch has official support for only Linux and OS X. Thus, Windows users will not be able to run this demo easily.

To deploy the demo, run the following commands:

  • If you do not have PyTorch, please see the website.
  • Install Python dependencies: pip install -r requirements.txt
  • Install GLUT (OpenGL Utility Toolkit) through your package manager (e.g. apt-get install freeglut3-dev)
  • Fetch the data and models: ./fetch_data.sh
  • Start the PyTorch server: python .
  • Run the demo: python utils/speech_demo.py

If you need to adjust options, like turning off CUDA, please edit config.json.

Additional notes for Mac OS X:

  • GLUT is already installed on Mac OS X, so that step isn't needed.
  • If you have issues installing pyaudio, this may be the issue.

Server

Setup and deployment

python . deploys the web service that identifies whether audio contains the command word. By default, config.json is used for configuration, but that can be changed with --config=<file_name>. If the server is behind a firewall, one workflow is to create an SSH tunnel and use port forwarding with the port specified in the config (default 16888).
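
Once a tunnel or port forward is in place, it can help to confirm that the forwarded port actually accepts connections before starting the demo. The following is a minimal sketch using only the Python standard library; the function name port_is_open is hypothetical, and the host and port in the usage comment assume the tunnel maps the default port 16888 to the same port on the local machine.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing is listening on the target port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (assumes the tunnel forwards to localhost:16888):
#     port_is_open("localhost", 16888)
```

This only checks that something is listening on the port; it does not verify that the listener is the Honk server.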

In our honk-models repository, there are several pre-trained models for Caffe2 (ONNX) and PyTorch. The fetch_data.sh script fetches these models and extracts them to the model directory. You may specify which model and backend to use in the config file's model_path and backend, respectively. Specifically, backend can be either caffe2 or pytorch, depending on what format model_path is in. Note that, in order to run our ONNX models, the packages onnx and onnx_caffe2 must be present on your system; these are absent in requirements.txt.
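
As an illustration, the backend and model path could be switched programmatically rather than by hand. This is a sketch under the assumption that config.json is a flat JSON object with top-level backend and model_path keys; the helper names set_backend and update_config_file are hypothetical and not part of Honk.

```python
import json


def set_backend(config: dict, backend: str, model_path: str) -> dict:
    """Return a copy of config with backend and model_path replaced."""
    if backend not in ("caffe2", "pytorch"):
        raise ValueError("backend must be 'caffe2' or 'pytorch'")
    updated = dict(config)  # leave the caller's dict untouched
    updated["backend"] = backend
    updated["model_path"] = model_path
    return updated


def update_config_file(path: str, backend: str, model_path: str) -> None:
    """Rewrite the JSON config at path with the new backend settings."""
    with open(path) as f:
        config = json.load(f)
    config = set_backend(config, backend, model_path)
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
```

All other keys in the file are preserved as they are.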

Raspberry Pi (RPi) Infrastructure Setup

Unfortunately, getting the libraries to work on the RPi, especially librosa, isn't as straightforward as running a few commands. We outline our process, which may or may not work for you.

  1. Obtain an RPi, preferably an RPi 3 Model B running Raspbian. Specifically, we used this version of Raspbian Stretch.
  2. Install dependencies: sudo apt-get install -y protobuf-compiler libprotoc-dev python-numpy python-pyaudio python-scipy python-sklearn
  3. Install Protobuf: pip install protobuf
  4. Install ONNX without dependencies: pip install --no-deps onnx
  5. Follow the official instructions for installing Caffe2 on Raspbian. This process takes about two hours. You may need to add the caffe2 module path to the PYTHONPATH environment variable. For us, this was accomplished by export PYTHONPATH=$PYTHONPATH:/home/pi/caffe2/build
  6. Install the ONNX extension for Caffe2: pip install onnx-caffe2
  7. Install further requirements: pip install -r requirements_rpi.txt
  8. Install librosa: pip install --no-deps resampy librosa
  9. Try importing librosa: python -c "import librosa". It should throw an error regarding numba, since we haven't installed it.
  10. We haven't found a way to easily install numba on the RPi, so we need to remove it from resampy. For our setup, we needed to remove numba and @numba.jit from /home/pi/.local/lib/python2.7/site-packages/resampy/interpn.py
  11. All dependencies should now be installed. We should try deploying an ONNX model.
  12. Fetch the models and data: ./fetch_data.sh
  13. In config.json, change backend to caffe2 and model_path to model/google-speech-dataset-full.onnx.
  14. Deploy the server: python . If there are no errors, you have successfully deployed the model, accessible via port 16888 by default.
  15. Run the speech commands demo: python utils/speech_demo.py. You'll need a working microphone and speakers. If you're interacting with your RPi remotely, you can run the speech demo locally and specify the remote endpoint --server-endpoint=http://[RPi IP address]:16888.
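
Step 10 above is a manual edit. A rough sketch of automating it is shown below, assuming the numba usage in the file is limited to single-line import statements and single-line @numba.jit or @jit decorators; the function name strip_numba is hypothetical, and multi-line decorators or other numba calls would still need to be handled by hand.

```python
import re


def strip_numba(source: str) -> str:
    """Drop numba imports and jit decorator lines from Python source text."""
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        # Remove "import numba" and "from numba import ..." lines.
        if re.match(r"(import|from)\s+numba\b", stripped):
            continue
        # Remove single-line "@numba.jit(...)" or "@jit(...)" decorators.
        if re.match(r"@(numba\.)?jit\b", stripped):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

It is worth keeping a backup of the original file before overwriting it with the stripped version.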

Utilities

QA client

Unfortunately, the QA client has no support for the general public yet, since it requires a custom QA service. However, it can still be used to retarget the command keyword.

python client.py runs the QA client. You may retarget a keyword with python client.py --mode=retarget. Please note that text-to-speech may not work well on Linux distros; in this case, please supply IBM Watson credentials via --watson-username and --watson-password. You can view all the options with python client.py -h.

Training and evaluating the model

CNN models. python -m utils.train --type [train|eval] trains or evaluates the model. It expects all training examples to follow the same format as the Speech Commands Dataset. The recommended workflow is to download the dataset and add custom keywords, since the dataset already contains many useful audio samples and background noise.

Residual models. We recommend the following hyperparameters for training any of our res{8,15,26}[-narrow] models on the Speech Commands Dataset:

python -m utils.train --wanted_words yes no up down left right on off stop go --dev_every 1 --n_labels 12 --n_epochs 26 --weight_decay 0.00001 --lr 0.1 0.01 0.001 --schedule 3000 6000 --model res{8,15,26}[-narrow]
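
One way to read the --lr and --schedule values in this command is as a piecewise-constant learning rate. The sketch below assumes that the schedule values are training-step boundaries and that the rate switches to the next value once the step count reaches a boundary; whether Honk switches exactly at or just after the boundary is not confirmed here, and the function name lr_at_step is hypothetical.

```python
from bisect import bisect_right


def lr_at_step(step, lrs, schedule):
    """Return the learning rate in effect at a given training step.

    lrs      -- rates in order, e.g. [0.1, 0.01, 0.001]
    schedule -- ascending step boundaries, e.g. [3000, 6000]
    """
    if len(lrs) != len(schedule) + 1:
        raise ValueError("need exactly one more rate than boundaries")
    # bisect_right counts how many boundaries are <= step, which is
    # the index of the rate to use under the assumption stated above.
    return lrs[bisect_right(schedule, step)]
```

Under these assumptions, the recommended settings train at 0.1 for the first 3000 steps, at 0.01 until step 6000, and at 0.001 afterwards.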

For more information about our deep residual models, please see our paper:

The following command-line options are available:

| option | input format | default | description |
| --- | --- | --- | --- |
| --audio_preprocess_type | {MFCCs, PCEN} | MFCCs | type of audio preprocess to use |
| --batch_size | [1, n) | 100 | the mini-batch size to use |
| --cache_size | [0, inf) | 32768 | number of items in audio cache, consumes around 32 KB * n |
| --conv1_pool | [1, inf) [1, inf) | 2 2 | the width and height of the pool filter |
| --conv1_size | [1, inf) [1, inf) | 10 4 | the width and height of the conv filter |
| --conv1_stride | [1, inf) [1, inf) | 1 1 | the width and length of the stride |
| --conv2_pool | [1, inf) [1, inf) | 1 1 | the width and height of the pool filter |
| --conv2_size | [1, inf) [1, inf) | 10 4 | the width and height of the conv filter |
| --conv2_stride | [1, inf) [1, inf) | 1 1 | the width and length of the stride |
| --data_folder | string | /data/speech_dataset | path to data |
| --dev_every | [1, inf) | 10 | dev interval in terms of epochs |
| --dev_pct | [0, 100] | 10 | percentage of total set to use for dev |
| --dropout_prob | [0.0, 1.0) | 0.5 | the dropout rate to use |
| --gpu_no | [-1, n] | 1 | the gpu to use |
| --group_speakers_by_id | {true, false} | true | whether to group speakers across train/dev/test |
| --input_file | string | | the path to the model to load |
| --input_length | [1, inf) | 16000 | the length of the audio |
| --lr | (0.0, inf) | {0.1, 0.001} | the learning rate to use |
| --type | {train, eval} | train | the mode to use |
| --model | string | cnn-trad-pool2 | one of cnn-trad-pool2, cnn-tstride-{2,4,8}, cnn-tpool{2,3}, cnn-one-fpool3, cnn-one-fstride{4,8}, res{8,15,26}[-narrow], cnn-trad-fpool3, cnn-one-stride1 |
| --momentum | [0.0, 1.0) | 0.9 | the momentum to use for SGD |
| --n_dct_filters | [1, inf) | 40 | the number of DCT bases to use |
| --n_epochs | [0, inf) | 500 | number of epochs |
| --n_feature_maps | [1, inf) | {19, 45} | the number of feature maps to use for the residual architecture |
| --n_feature_maps1 | [1, inf) | 64 | the number of feature maps for conv net 1 |
| --n_feature_maps2 | [1, inf) | 64 | the number of feature maps for conv net 2 |
| --n_labels | [1, n) | 4 | the number of labels to use |
| --n_layers | [1, inf) | {6, 13, 24} | the number of convolution layers for the residual architecture |
| --n_mels | [1, inf) | 40 | the number of Mel filters to use |
| --no_cuda | switch | false | whether to use CUDA |
| --noise_prob | [0.0, 1.0] | 0.8 | the probability of mixing with noise |
| --output_file | string | model/google-speech-dataset.pt | the file to save the model to |
| --seed | (-inf, inf) | 0 | the seed to use |
| --silence_prob | [0.0, 1.0] | 0.1 | the probability of picking silence |
| --test_pct | [0, 100] | 10 | percentage of total set to use for testing |
| --timeshift_ms | [0, inf) | 100 | time in milliseconds to shift the audio randomly |
| --train_pct | [0, 100] | 80 | percentage of total set to use for training |
| --unknown_prob | [0.0, 1.0] | 0.1 | the probability of picking an unknown word |
| --wanted_words | string1 string2 ... stringn | command random | the desired target words |
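
To make the interaction between --group_speakers_by_id, --dev_pct, and --test_pct more concrete, here is a minimal sketch of one way to keep all recordings from one speaker in the same split. It is not necessarily how Honk implements it. It assumes file names follow the Speech Commands pattern <speaker>_nohash_<n>.wav, and the function name assign_split is hypothetical.

```python
import hashlib
import os
import re


def assign_split(path: str, dev_pct: int = 10, test_pct: int = 10) -> str:
    """Map a recording to 'train', 'dev', or 'test' by hashing its speaker ID."""
    name = os.path.basename(path)
    # Everything before "_nohash_" is treated as the speaker ID, so repeated
    # recordings by the same speaker always hash to the same bucket.
    speaker = re.sub(r"_nohash_.*$", "", name)
    digest = hashlib.sha1(speaker.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable value in [0, 100)
    if bucket < dev_pct:
        return "dev"
    if bucket < dev_pct + test_pct:
        return "test"
    return "train"
```

Because the assignment depends only on the speaker ID, adding new recordings later does not move existing files between splits. The resulting proportions are only approximately equal to the requested percentages.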

JavaScript-based Keyword Spotting

Honkling is a JavaScript implementation of Honk. With Honkling, it is possible to implement various web applications with in-browser keyword spotting functionality.

Keyword Spotting Data Generator

To improve the flexibility of Honk and Honkling, we provide a program that constructs a dataset from YouTube videos. Details can be found in the keyword_spotting_data_generator folder.

Recording audio

You may do the following to record sequential audio and save it in the same format as the Speech Commands Dataset:

python -m utils.record

Press Return to record, the up arrow to undo, and "q" to finish. Recording halts automatically after one second of silence.
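
The automatic halt can be pictured as a counter over consecutive quiet frames. The sketch below is an illustration only and not the recorder's actual code: it assumes the audio arrives as a sequence of per-frame loudness values, that a frame counts as silent when its level is below a threshold (such as the value passed via --min-sound-lvl), and that frames_per_second frames make up one second. The function name halt_index is hypothetical.

```python
def halt_index(levels, min_sound_lvl, frames_per_second):
    """Return the index of the frame at which recording would stop.

    Recording stops once frames_per_second consecutive frames fall below
    min_sound_lvl. Returns None if that never happens.
    """
    quiet = 0
    for i, level in enumerate(levels):
        # Any loud frame resets the run of silence.
        quiet = quiet + 1 if level < min_sound_lvl else 0
        if quiet >= frames_per_second:
            return i
    return None
```

A threshold that is too low means background noise never counts as silence and recording never halts, which is why measuring the ambient level first is useful.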

Several options are available:

--output-begin-index: Starting sequence number
--output-prefix: Prefix of the output audio sequence
--post-process: How the audio samples should be post-processed. One or more of "trim" and "discard_true".

Post-processing consists of trimming or discarding "useless" audio. Trimming is self-explanatory: the audio recordings are trimmed to the loudest window of x milliseconds, specified by --cutoff-ms. Discarding "useless" audio (discard_true) uses a pre-trained model to determine which samples are confusing, discarding correctly labeled ones. The pre-trained model and correct label are defined by --config and --correct-label, respectively.

For example, consider python -m utils.record --post-process trim discard_true --correct-label no --config config.json. In this case, the utility records a sequence of speech snippets, trims them to one second, and finally discards those not labeled "no" by the model in config.json.
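
Trimming to the loudest window can be sketched with a sliding sum of squared samples. This is a minimal illustration and not the utility's actual implementation. It assumes "loudest" means highest total energy, and the default 16 kHz sample rate is an assumption based on the default --input_length of 16000 for one second of audio. The function names are hypothetical.

```python
def ms_to_samples(cutoff_ms: int, sample_rate: int = 16000) -> int:
    """Convert a window length in milliseconds to a number of samples."""
    return sample_rate * cutoff_ms // 1000


def loudest_window(samples, window):
    """Return (start, end) of the contiguous window with the highest energy."""
    if window >= len(samples):
        return 0, len(samples)
    # Energy of the first window, then slide one sample at a time.
    energy = sum(s * s for s in samples[:window])
    best_energy, best_start = energy, 0
    for start in range(1, len(samples) - window + 1):
        energy += samples[start + window - 1] ** 2 - samples[start - 1] ** 2
        if energy > best_energy:
            best_energy, best_start = energy, start
    return best_start, best_start + window
```

The trimmed clip is then samples[start:end]. When several windows tie, this sketch keeps the earliest one.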

Listening to sound level

python manage_audio.py listen

This assists in setting sane values for --min-sound-lvl for recording.

Generating contrastive examples

python manage_audio.py generate-contrastive --directory [directory] generates contrastive examples from all .wav files in [directory] using phonetic segmentation.

Trimming audio

The Speech Commands Dataset contains one-second-long snippets of audio.

python manage_audio.py trim --directory [directory] trims all .wav files in [directory] to their loudest one second. The careful user should manually check all audio samples using an audio editor like Audacity.

Comments
  • Technical Explanation of Desktop Application

    Hi, I would like to know what is the technical process happening "behind the scenes" when launching the Demo Application with command python.py and then python utils/speech_demo.py.

    How is the audio (now streaming, not a recorded .wav of 1 second length) treated as input?

    Is it segmented into (overlapping) pieces of 1 second, as it seems from the log after launching python .?

    How is the posterior handling managed? Is the reasoning similar to what is proposed by Parada?

    The reason why I am interested in the functioning of the Demo Application is that I tried to submit an own-recorded audio to the train.py by using --mode eval and I got a single prediction. So, the wav file was basically converted into a single image, which led eventually to a single prediction.

    Thanks in advance.

    opened by waltergenchi 7
  • Cannot build model from audio files with a length of 3 seconds

    I'm trying to create my own model. Google's Command Speech Set serves as the basis. Additionally I have six keywords (alexa / jarvis / computer are three of them), which are longer than 1 second. Therefore I brought all WAVs to a length of 3 seconds (many have silence at the end). Then I call:

    python -m utils.train --wanted_words alexa jarvis computer down left right learn dog sheila marvin --dev_every 1 --n_labels 12 --n_epochs 26 --weight_decay 0.00001 --lr 0.1 0.01 0.001 --schedule 3000 6000 --input_length 48000 --model res8 --no_cuda true --pos_key_size 1000 --data_folder ./speech_commands_v0.02/ --output_file ./speech_commands_v0.02/model.pt (input_length is set to 48000 because of the audio lengths)

    However, this leads to the following error:

    File "workspace/voice/honk/utils/model.py", line 258, in collate_fn
      audio_tensor = torch.from_numpy(self.audio_processor.compute_mfccs(audio_data).reshape(1, 101, 40))
    ValueError: cannot reshape array of size 12040 into shape (1,101,40)

    I don't know what to do with the message or how to fix it. When adding param "--audio_preprocess_type PCEN" I am able to create the model. From this I can also create the file with the weights and use it in Honkling. But the recognition doesn't work at all. It constantly recognizes "computer" and nothing else, even if this keyword is not spoken at all or something is spoken at all.

    What can I do to make it work?
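
The numbers in the error above are consistent with a simple frame count. The sketch below is a worked check and not a fix. It assumes 16 kHz audio, a 10 ms hop (160 samples), centered framing that yields 1 + length // hop frames, and 40 coefficients per frame (the last dimension of the reshape). The function name n_frames is hypothetical.

```python
def n_frames(input_length: int, hop_length: int = 160) -> int:
    """Number of feature frames for a clip, assuming centered framing."""
    return 1 + input_length // hop_length


# One second of audio gives the 101 frames hard-coded in the reshape.
one_second = n_frames(16000)

# Three seconds give 301 frames, and 301 * 40 coefficients is exactly
# the array size reported in the ValueError.
three_seconds = n_frames(48000)
total_values = three_seconds * 40
```

Under these assumptions, the reshape target (1, 101, 40) only fits one-second input, which would explain why the MFCC path fails for --input_length 48000.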

    opened by m-haecker 6
  • fix: bg audio on silence labels at eval/test set

    The model is never trained on pure zeros input, so depending on the convolution biases, the model consistently gets the silence label wrong during clean eval/test set evaluation. So i recommend either occasionally training with all-zeros for the silence label or (what I did in the PR) only evaluate on at least some background noise for the silence label.

    This has also been changed recently in the original repo with tensorflow/tensorflow@024b037

    opened by stocyr 6
  • Input Features Size

    Thanks for your great work. I see from the model config in https://github.com/castorini/honk/blob/master/utils/model.py#L367-L400 that the model input height is 101, which means the input wav is over 1 s when the frame is 10 ms. Why not choose the standard left 23 and right 8?

    opened by zhanglaplace 5
  • nice work,but i got

    I1125 22:21:25.948103 4569679296 train.py:223] Step #7598: rate 0.001000, accuracy 77.0%, cross entropy 0.573181
    INFO:tensorflow:Step #7599: rate 0.001000, accuracy 81.0%, cross entropy 0.542154
    I1125 22:21:29.126586 4569679296 train.py:223] Step #7599: rate 0.001000, accuracy 81.0%, cross entropy 0.542154
    INFO:tensorflow:Step #7600: rate 0.001000, accuracy 81.0%, cross entropy 0.574537
    I1125 22:21:32.292307 4569679296 train.py:223] Step #7600: rate 0.001000, accuracy 81.0%, cross entropy 0.574537
    Traceback (most recent call last):
      File "/Users/apple/Desktop/honk-master/utils/speech_commands_example/train.py", line 430, in
        tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
      File "/Users/apple/Desktop/honk-v/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
        _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
      File "/Users/apple/Desktop/honk-v/lib/python3.7/site-packages/absl/app.py", line 300, in run
        _run_main(main, args)
      File "/Users/apple/Desktop/honk-v/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
        sys.exit(main(argv))
      File "/Users/apple/Desktop/honk-master/utils/speech_commands_example/train.py", line 248, in main
        total_conf_matrix += conf_matrix
    ValueError: operands could not be broadcast together with shapes (12,12) (11,11) (12,12)

    Thanks

    speech_commands_v0.02

    opened by 43998213 3
  • Where is the google-speech-dataset.pt ?

    The model.zip archive does not contain the google-speech-dataset.pt file that server.py needs. I tried to produce it with "python -m utils.train --data_folder ./training_data --type train", but I got a "model.pt" instead. Even after I rename it and run "python .", the error below is raised.

    RuntimeError: Error(s) in loading state_dict for SpeechModel: size mismatch for output.weight: copying a param of torch.Size([12, 26624]) from checkpoint, where the shape is torch.Size([4, 26624]) in current model. size mismatch for output.bias: copying a param of torch.Size([12]) from checkpoint, where the shape is torch.Size([4]) in current model.

    opened by azuredream 3
  • Support a Mac?

    Thanks for the great work. Any chance this could be made to work on a Mac? I don't have a linux machine and initially don't want to faff around with a Pi.

    opened by torntrousers 3
  • ONNX models export

    Hi,

    I am trying to reproduce the raspberry-pi demo results, however I cannot produce my own onnx models.

    I tried to train a model of my own, and then convert it to onnx using torch.onnx.export. (as the end of first code cell here). However, this does not work because onnx does not have the unsqueeze operator which is used in both the SpeechModel and SpeechResModel forward functions. I get this warning :

    ONNX export failed on unsqueeze because torch.onnx.symbolic.unsqueeze does not exist

    It appears the unsqueeze operator is not yet implemented in onnx, though they seem to be working on it (cf https://github.com/onnx/onnx/pull/497). Did you use a specific branch of onnx or torch.onnx ?

    Could you please provide the code you used to generate your onnx models, or indicate how to export a pytorch model to onnx ?

    Thank you

    opened by 4p0pt0Z 3
  • onnx models won't pass the checker (Raspberry Pi)

    Hi, I have followed the instructions for running on Raspberry Pi infrastructure (using RPI 3 model B with latest raspbian) and have encountered the following error when loading any .onnx models:

    File "/home/pi/honk/main.py", line 18, in main
      server.start(config)
    File "./server.py", line 144, in start
      lbl_service = load_service(config)
    File "./server.py", line 126, in load_service
      lbl_service = Caffe2LabelService(model_path, commands)
    File "./service.py", line 62, in __init__
      self.model = onnx_caffe2.backend.prepare(self._graph)
    File "/home/pi/.local/lib/python2.7/site-packages/onnx_caffe2/backend.py", line 513, in prepare
      super(Caffe2Backend, cls).prepare(model, device, **kwargs)
    File "/home/pi/.local/lib/python2.7/site-packages/onnx/backend/base.py", line 53, in prepare
      onnx.checker.check_model(model)
    File "/home/pi/.local/lib/python2.7/site-packages/onnx/checker.py", line 32, in checker
      proto.SerializeToString(), ir_version)
    onnx.onnx_cpp2py_export.checker.ValidationError: Unrecognized attribute: dilations
    ==> Context: Bad node spec: input: "13" output: "15" op_type: "MaxPool" attribute { name: "kernel_shape" ints: 2 ints: 2 } attribute { name: "pads" ints: 0 ints: 0 } attribute { name: "dilations" ints: 1 ints: 1 } attribute { name: "strides" ints: 2 ints: 2 }

    It seems that my pooling layers do not support dilation? (which coincides with operator descriptions [https://github.com/onnx/onnx/blob/master/docs/Operators.md#MaxPool] )

    May I ask what versions/branches did you use for successful loading of the model? Or if I can fix this in any other way?

    Note: In step 9, I've had a few more files that needed numba and jit commenting under the librosa package folder

    opened by BSifringer 3
  • Port TF audio ops to PyTorch

    Secondary objective is to port audio processing ops to PyTorch. Specifically, the following need to be implemented without third-party library support:

    • FFT/STFT (inverse FFT isn't required)
    • log-Mel filterbank
    • fast DCT

    torchaudio provides a wrapper of librosa's Mel spectrogram function (first two points). However, this implementation is claimed as relatively slow and non-GPU accelerated. It would be nice to implement the entire audio processing pipeline in a fast PyTorch-friendly manner.

    opened by daemon 3
  • Any chance Honk 2 might be in the wings?

    Pytorch now has TORCHAUDIO so LibRosa and certain architecture install problems are not needed.

    Also newer models such as CRNN & DS-CNN look interesting, but it would be great if some of these incremental additions could be integrated into an example of Honk2.

    Stuart

    opened by StuartIanNaylor 2
  • how to save model in js format ?

    Hi, I want to save the trained model in JSON format. I tried adding the option --type json, but the parser says --type must be one of {train, eval}. Could you tell me how to save the model in JS format? @ruebot @lintool @hellcoderz @yodakohl @stocyr

    opened by 08s011003 0
  • Cannot get the training data; the file is missing at the download path

    Hi, I have tried your Demo Application with ./fetch_data.sh Would you give me some help for downloading data? If convenient,please send me data to my email ( [email protected])

    opened by elenazy 0
  • Decrease test dataloader batch size

    For me, decreasing the batch size of test dataloader was very helpful for GPU speed and memory. So in train.py, changing

    test_loader = data.DataLoader( test_set, batch_size=len(test_set), shuffle=False, collate_fn=test_set.collate_fn)

    to

    test_loader = data.DataLoader( test_set, batch_size=min(len(test_set), config["batch_size"] // 2), shuffle=False, collate_fn=test_set.collate_fn)

    There doesn't seem to be an advantage to loading the entire test set on GPU at the same time. I tried to pull request this change but I don't think I'm allowed to. Hope this is helpful!

    Thanks, Bryan

    opened by BryanWBear 0
Owner
Castorini
Deep learning for natural language processing and information retrieval at the University of Waterloo