ICML 21 - Voice2Series: Reprogramming Acoustic Models for Time Series Classification



  • International Conference on Machine Learning (ICML), 2021 | Paper | Colab Demo


TensorFlow 2.2 (CUDA 10.0) and Kapre 0.2.0.

  • Note: In response to strong interest from the community, we will also provide PyTorch V2S layers and frameworks around this September, incorporating the new torchaudio layers. Feel free to email the authors about further collaboration.

  • option 1 (from yml)

conda env create -f V2S.yml
  • option 2 (from a clean Python 3.6 environment)
pip install tensorflow-gpu==2.1.0
pip install kapre==0.2.0
pip install h5py==2.10.0


  • This is a tangible (runnable) version. Please also check the paper for the actual validation details. Many thanks!
python v2s_main.py --dataset 0 --eps 100 --mapping 3
  • Result
seg idx: 0 --> start: 0, end: 500
seg idx: 1 --> start: 5000, end: 5500
seg idx: 2 --> start: 10000, end: 10500
Tensor("AddV2_2:0", shape=(None, 16000, 1), dtype=float32)
--- Preparing Masking Matrix
Model: "model_1"
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 500, 1)]     0                                            
zero_padding1d (ZeroPadding1D)  (None, 16000, 1)     0           input_1[0][0]                    
tf_op_layer_AddV2 (TensorFlowOp [(None, 16000, 1)]   0           zero_padding1d[0][0]             
zero_padding1d_1 (ZeroPadding1D (None, 16000, 1)     0           input_1[0][0]                    
tf_op_layer_AddV2_1 (TensorFlow [(None, 16000, 1)]   0           tf_op_layer_AddV2[0][0]          
zero_padding1d_2 (ZeroPadding1D (None, 16000, 1)     0           input_1[0][0]                    
tf_op_layer_AddV2_2 (TensorFlow [(None, 16000, 1)]   0           tf_op_layer_AddV2_1[0][0]        
art_layer (ARTLayer)            (None, 16000, 1)     16000       tf_op_layer_AddV2_2[0][0]        
reshape_1 (Reshape)             (None, 16000)        0           art_layer[0][0]                  
model (Model)                   (None, 36)           1292911     reshape_1[0][0]                  
tf_op_layer_MatMul (TensorFlowO [(None, 6)]          0           model[1][0]                      
tf_op_layer_Shape (TensorFlowOp [(2,)]               0           tf_op_layer_MatMul[0][0]         
tf_op_layer_strided_slice (Tens [()]                 0           tf_op_layer_Shape[0][0]          
tf_op_layer_Reshape_2/shape (Te [(3,)]               0           tf_op_layer_strided_slice[0][0]  
tf_op_layer_Reshape_2 (TensorFl [(None, 2, 3)]       0           tf_op_layer_MatMul[0][0]         
tf_op_layer_Mean (TensorFlowOpL [(None, 2)]          0           tf_op_layer_Reshape_2[0][0]      
Total params: 1,308,911
Trainable params: 217,225
Non-trainable params: 1,091,686
Epoch 1/100
2021-07-19 01:43:32.690913: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-07-19 01:43:32.919343: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
113/113 [==============================] - 6s 50ms/step - loss: 0.0811 - accuracy: 1.0000 - val_loss: 1.5589e-04 - val_accuracy: 1.0000
Epoch 2/100
113/113 [==============================] - 5s 41ms/step - loss: 5.0098e-05 - accuracy: 1.0000 - val_loss: 1.0906e-05 - val_accuracy: 1.0000
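
The model summary above can be read as a pipeline: the length-500 input is zero-padded to 16000 samples at three offsets (0, 5000, 10000), a trainable 16000-parameter layer adds the reprogramming signal, the acoustic model emits 36 source-class scores, and six of those are averaged in groups of three to give two target classes. The NumPy sketch below only mirrors that data flow and its shapes. It is not the repository's code: the function names, the mask convention (trainable signal only where no input copy sits), and the choice of the first six source classes are assumptions made for illustration.

```python
import numpy as np

TARGET_LEN = 500             # input length from the summary: (None, 500, 1)
SOURCE_LEN = 16000           # length expected by the acoustic model
OFFSETS = (0, 5000, 10000)   # segment starts printed in the log
N_SOURCE = 36                # output size of the pretrained model
N_TARGET = 2                 # target classes (final Mean layer)
MAP_NUM = 3                  # source classes per target class (--mapping 3)


def build_mask():
    """Binary mask: 0 where a copy of the input sits, 1 elsewhere (assumed)."""
    mask = np.ones(SOURCE_LEN, dtype=np.float32)
    for start in OFFSETS:
        mask[start:start + TARGET_LEN] = 0.0
    return mask


def reprogram_input(x, delta, mask):
    """Place a copy of x at each offset, then add the masked trainable signal."""
    padded = np.zeros(SOURCE_LEN, dtype=np.float32)
    for start in OFFSETS:
        padded[start:start + TARGET_LEN] += x
    return padded + mask * delta


def map_labels(source_scores, select):
    """Many-to-one mapping: pick 6 source scores, average each group of 3."""
    picked = source_scores @ select                   # (batch, 6)
    grouped = picked.reshape(-1, N_TARGET, MAP_NUM)   # (batch, 2, 3)
    return grouped.mean(axis=-1)                      # (batch, 2)


# Hypothetical selection matrix: route source classes 0..5 to the two targets.
select = np.eye(N_SOURCE, dtype=np.float32)[:, :N_TARGET * MAP_NUM]

x = np.ones(TARGET_LEN, dtype=np.float32)
delta = np.ones(SOURCE_LEN, dtype=np.float32)   # stands in for trained weights
reprogrammed = reprogram_input(x, delta, build_mask())

scores = np.full((4, N_SOURCE), 1.0 / N_SOURCE, dtype=np.float32)
target_scores = map_labels(scores, select)
print(reprogrammed.shape, target_scores.shape)  # (16000,) (4, 2)
```

In the real model, `delta` corresponds to the 16000 weights of `art_layer`, and `scores` would come from the pretrained acoustic model rather than a constant array.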

Class Activation Mapping

python cam_v2s.py --dataset 5 --weight wNo5_map6-88-0.7662.h5 --mapping 6 --layer conv2d_1


  • Voice2Series: Reprogramming Acoustic Models for Time Series Classification
  title = {Voice2Series: Reprogramming Acoustic Models for Time Series Classification},
  author = {Yang, Chao-Han Huck and Tsai, Yun-Yun and Chen, Pin-Yu},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages = {11808--11819},
  year = {2021},
  volume = {139},
  series = {Proceedings of Machine Learning Research},
  month = {18--24 Jul},
  publisher = {PMLR},

  • Problem when shifting labels to zero in binary classification problems with negative labels

    Hello! I would like to report a possible bug.

    Code: v2s_main.py Lines: 43-45

    When we have multi-class problems, such as ECG 5000 where the original labels are [1,2,3,4,5], the mod function applied in lines 43-45 shifts the labels to zero correctly, so we obtain the values [0,1,2,3,4] as labels.

    However, when we have binary classification with negative labels, such as ECG 200, where the original labels are [-1,1], the mod function used yields 1 as remainder for both cases, and the label vector becomes an array of ones.

    Do you agree?

    opened by DominguesPH 3
  • ECG 200 Data loader error and fixed

    Dear Users,

    A user kindly reported that the ECG 200 dataset was affected by a data loading error.

    We have corrected that error and updated our code; the revised result is 85.5% accuracy.

    One current state-of-the-art result is Nestor Cabello et al., ICDM 2020, with 90.9% accuracy.

    The other time series datasets are not affected by that error, and we have reported the related issues.

    Thank you again for all your feedback,

    V2S team

    opened by huckiyang 0
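
The label-shifting problem reported above for v2s_main.py can be reproduced in a few lines. This is a sketch only: the exact code at lines 43-45 is not shown here, so `shift_with_mod` assumes the shift has the form `labels % n_classes`, and `remap_labels` is a hypothetical alternative, not the fix the authors applied.

```python
import numpy as np


def shift_with_mod(labels, n_classes):
    """Modulo-based shift as described in the issue (assumed form)."""
    return np.asarray(labels) % n_classes


def remap_labels(labels):
    """Map the sorted unique labels to 0..K-1, whatever their original values."""
    classes, indices = np.unique(np.asarray(labels), return_inverse=True)
    return indices.reshape(-1), classes


ecg5000_labels = [1, 2, 3, 4, 5]   # multi-class labels quoted in the issue
ecg200_labels = [-1, 1, 1, -1]     # binary labels with a negative class

print(shift_with_mod(ecg5000_labels, 5))  # [1 2 3 4 0]
print(shift_with_mod(ecg200_labels, 2))   # [1 1 1 1]
print(remap_labels(ecg200_labels)[0])     # [0 1 1 0]
```

With the modulo shift the five ECG 5000 labels still land on five distinct values in 0..4, but both ECG 200 labels collapse to 1 because -1 % 2 is 1 in Python and NumPy. Remapping through the sorted unique values avoids depending on the sign or range of the raw labels.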