Code for the paper titled "Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages"

Ayush Daksh

Last update: Dec 1, 2022


Overview

Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages


File organization

Preprocessing: contains all files used to preprocess the data (Python 3.6)
Data: contains the data required to run this code
Statistics: contains all files with statistics of the dataset

Dataset

File name	Description
train/test/dev.csv	The code-mixed speech translation dataset (train, test, and dev splits).
chopped_audios	All the audio clips, with their transcriptions and translations.
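The README does not document the column layout of the split files, so this minimal sketch assumes no column names: it reads each split with Python's standard csv module and reports the header and row count. The file names train.csv, dev.csv and test.csv and their location in the Data directory are assumptions based on the entries above; adjust the paths to your checkout.

```python
import csv
from pathlib import Path


def load_split(path):
    """Read one split CSV into (column_names, rows).

    Column names come from the file's own header row; none are assumed.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return reader.fieldnames or [], rows


def load_all_splits(data_dir="Data"):
    """Load whichever of train/dev/test.csv exist under data_dir (assumed layout)."""
    splits = {}
    for name in ("train", "dev", "test"):
        path = Path(data_dir) / f"{name}.csv"
        if path.exists():
            splits[name] = load_split(path)
    return splits


if __name__ == "__main__":
    for name, (columns, rows) in load_all_splits().items():
        print(f"{name}: {len(rows)} rows, columns = {columns}")
```

Reading the header from the file itself keeps the sketch correct whatever columns the released CSVs actually use.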

Statistics of the corpora

Language	#Types	#Tokens	Types per line	Tokens per line	Avg. token length
English[100%]	40324	601889	10.58	11.27	4.92
French (France)	50510	645651	11.38	12.09	5.08
German[100%]	50748	584575	10.44	10.95	5.57
Gujarati[100%]	41959	584989	10.37	10.95	4.46
Hindi[100%]	29744	716800	12.36	13.42	3.74
Hungarian[100%]	84872	506608	9.13	9.49	5.89
Indonesian[100%]	39365	653374	11.54	12.23	6.14
Italian[100%]	52372	512061	9.23	9.59	5.37
Latvian[100%]	70040	477106	8.69	8.93	5.72
Lithuanian[100%]	75222	491558	8.92	9.2	6.04
Nepali[100%]	52630	570268	10.03	10.68	4.88
Persian (Farsi)[100%]	51722	598096	10.61	11.2	4.1
Polish[100%]	71662	494263	8.99	9.25	5.86
Portuguese (Brazil)[100%]	50087	608432	10.8	11.39	5.12
Russian[100%]	72162	490908	8.96	9.19	5.79
Slovak[100%]	73789	520465	9.39	9.75	5.37
Slovenian[100%]	68619	516649	9.35	9.67	5.3
Spanish[100%]	49806	608868	10.75	11.4	5.07
Swedish[100%]	48233	581751	10.31	10.89	5
Tamil[100%]	84183	460678	8.37	8.63	7.65
Telugu[100%]	72006	464665	8.34	8.7	6.56
Turkish[100%]	78957	453521	8.27	8.49	6.35
Bulgarian[100%]	60712	564150	10.1	10.56	5.24
Croatian[100%]	73075	531326	9.58	9.95	5.28
Danish[100%]	50170	587253	10.4	11	4.98
Dutch[100%]	42716	595464	10.52	11.15	5.05
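The README does not define how these columns were computed. A minimal sketch, under these assumptions: whitespace tokenization; one sentence per line; types per line is the average number of distinct tokens in a line; tokens per line is the average number of tokens in a line; and average token length is the mean character count over all tokens. The scripts in the Statistics directory may compute these differently.

```python
def corpus_statistics(lines):
    """Compute corpus statistics for a list of text lines.

    Assumptions (not confirmed by the dataset authors):
      - tokens are whitespace-separated strings;
      - types are distinct tokens (case-sensitive);
      - per-line figures are averages over non-empty lines;
      - average token length is the mean character count over all tokens.
    """
    tokenized = [line.split() for line in lines if line.strip()]
    if not tokenized:
        raise ValueError("no non-empty lines")
    all_tokens = [tok for toks in tokenized for tok in toks]
    n_lines = len(tokenized)
    return {
        "types": len(set(all_tokens)),
        "tokens": len(all_tokens),
        "types_per_line": sum(len(set(toks)) for toks in tokenized) / n_lines,
        "tokens_per_line": len(all_tokens) / n_lines,
        "avg_token_length": sum(len(tok) for tok in all_tokens) / len(all_tokens),
    }


if __name__ == "__main__":
    stats = corpus_statistics(["the cat saw the dog", "a dog"])
    # types = 5, tokens = 7, types_per_line = 3.0, tokens_per_line = 3.5
    print(stats)
```

Under these definitions types per line can never exceed tokens per line, which matches every row of the table.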

Code-mixing

Languages in the code-mixed data

Language	Total Words	Unique Words	Percentage of total words
English	500136	6312	83.6
Bengali	46933	3907	7.84
Sanskrit	51246	7202	8.56
Total	598315	17421	100
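The Percentage column appears to be each language's share of the total word count. This quick sketch recomputes the shares and the Total row from the counts copied out of the table above. Reading Percentage as a share of total words is an assumption.

```python
# Counts copied from the table of languages in the code-mixed data.
TOTAL_WORDS = {"English": 500136, "Bengali": 46933, "Sanskrit": 51246}
UNIQUE_WORDS = {"English": 6312, "Bengali": 3907, "Sanskrit": 7202}


def word_shares(counts):
    """Return each language's percentage of the total word count."""
    total = sum(counts.values())
    return {lang: 100 * n / total for lang, n in counts.items()}


if __name__ == "__main__":
    print(sum(TOTAL_WORDS.values()))   # 598315, the Total row
    print(sum(UNIQUE_WORDS.values()))  # 17421, the Total row
    for lang, pct in word_shares(TOTAL_WORDS).items():
        print(f"{lang}: {pct:.2f}%")
```

Printed to two decimals the shares are 83.59, 7.84 and 8.57 percent. The unrounded values agree with the Percentage column to within 0.01 percentage points.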

Types of Code-mixing

Type	English-Sanskrit	Sanskrit-English	English-Bengali	Bengali-English
Inter-Sentential	2356	2366	339	339
Intra-Sentential	2338	851	124	0
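The README does not say how switches were counted. The sketch below is only an illustration and rests on hypothetical assumptions. Every token carries a language tag. A column such as English-Sanskrit counts switches from English into Sanskrit. A change of language between the last token of one sentence and the first token of the next counts as inter-sentential, and a change between adjacent tokens inside one sentence counts as intra-sentential. The input format and tag strings are hypothetical.

```python
from collections import Counter


def count_switches(sentences):
    """Count language switches in a sequence of tagged sentences.

    Hypothetical input: each sentence is a list of per-token language tags,
    e.g. ["English", "English", "Sanskrit"]. Empty sentences are skipped.
    Returns a Counter keyed by (switch_type, "From-To").
    """
    counts = Counter()
    prev_last = None  # language of the last token of the previous sentence
    for tags in sentences:
        if not tags:
            continue
        # Language changes across the sentence boundary: inter-sentential.
        if prev_last is not None and prev_last != tags[0]:
            counts[("Inter-Sentential", f"{prev_last}-{tags[0]}")] += 1
        # Language changes between adjacent tokens in a sentence: intra-sentential.
        for a, b in zip(tags, tags[1:]):
            if a != b:
                counts[("Intra-Sentential", f"{a}-{b}")] += 1
        prev_last = tags[-1]
    return counts


if __name__ == "__main__":
    example = [
        ["English", "English", "Sanskrit"],  # one intra-sentential English-Sanskrit switch
        ["Sanskrit", "Sanskrit"],            # no switch: continues in Sanskrit
        ["English", "Bengali"],              # inter Sanskrit-English, intra English-Bengali
    ]
    for key, n in sorted(count_switches(example).items()):
        print(key, n)
```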
