M-BERT-Study
CROSS-LINGUAL ABILITY OF MULTILINGUAL BERT: AN EMPIRICAL STUDY
Motivation
Multilingual BERT (M-BERT) has shown surprising cross-lingual abilities, even though it is trained without any cross-lingual objective. In this work, we analyze what contributes to this multilinguality along three dimensions: linguistic properties of the languages, the architecture of the model, and the learning objectives.
Results
Linguistic properties:
- Word-piece overlap between languages (i.e., shared sub-word tokens, as in code-switched text) is not the main cause of multilinguality.
- Word order is crucial: when the words in each sentence are randomly permuted, cross-lingual performance is low, though still significantly better than random.
- (Unigram) word frequency alone is not enough: when we resample words so that only unigram frequencies are preserved, performance is close to random. Combining the last two observations implies that the two languages share structure beyond word order that unigram frequencies do not capture; we hypothesize that it is the similarity of n-gram occurrences. (A sketch of these two manipulations is given after this list.)
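The two manipulations above can be pictured with the following minimal sketch (illustrative only, not the repo's actual preprocessing code), assuming whitespace-tokenized sentences: one function destroys word order but keeps each sentence's bag of words, the other keeps only corpus-level unigram frequencies.

```python
import random
from collections import Counter

def permute_word_order(sentences, seed=0):
    """Randomly permute the words within each sentence:
    word order is destroyed, the bag of words per sentence is kept."""
    rng = random.Random(seed)
    permuted = []
    for sent in sentences:
        words = sent.split()
        rng.shuffle(words)
        permuted.append(" ".join(words))
    return permuted

def resample_by_unigram_frequency(sentences, seed=0):
    """Replace each sentence with words drawn i.i.d. from the corpus unigram
    distribution: word frequencies are kept, all co-occurrence structure is lost."""
    rng = random.Random(seed)
    counts = Counter(word for sent in sentences for word in sent.split())
    vocab, weights = zip(*counts.items())
    return [
        " ".join(rng.choices(vocab, weights=weights, k=len(sent.split())))
        for sent in sentences
    ]
```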
Architecture:
- The depth of the transformer is the most important architectural factor.
- The number of attention heads affects the absolute performance on individual languages, but the gap between in-language supervision and cross-lingual zero-shot learning changes little.
- The total number of parameters, like depth, affects multilinguality. (The sketch after this list shows the BERT config fields these factors correspond to.)
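These factors map onto standard fields of a BERT configuration file. The snippet below is a minimal sketch with illustrative values (not the exact configurations from the paper) showing which fields the architecture study varies:

```python
import json

# Illustrative values only; depth, heads, and hidden size are the knobs
# varied in the architecture experiments.
bert_config = {
    "num_hidden_layers": 12,        # depth: the most important factor for multilinguality
    "num_attention_heads": 12,      # affects per-language accuracy more than the cross-lingual gap
    "hidden_size": 768,             # together with depth, drives the total parameter count
    "intermediate_size": 3072,
    "vocab_size": 30000,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.1,
    "attention_probs_dropout_prob": 0.1,
    "initializer_range": 0.02,
}

with open("bert_config.json", "w") as f:
    json.dump(bert_config, f, indent=2)
```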
Learning Objectives:
- Removing the next-sentence-prediction (NSP) objective leads to a slight increase in performance.
- Even marking each sentence with a language id, so that BERT knows exactly which language it is learning from, does not hurt performance (see the sketch after this list).
- Compared to word-based or character-based models, using word-pieces leads to strong improvements on both the source and target languages (likely task-dependent) and a slight improvement cross-lingually.
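As an illustration of the language-id experiment (a sketch of one simple way to expose language identity, not necessarily the exact mechanism used in the paper), one can prepend a language-specific marker token to every sentence; the marker tokens would also have to be added to the word-piece vocabulary.

```python
def add_language_id(sentences, lang):
    """Prepend a language-id marker token (e.g. "[EN]") to each sentence,
    so the model always knows which language it is reading."""
    marker = f"[{lang.upper()}]"
    return [f"{marker} {sent}" for sent in sentences]

english = add_language_id(["the cat sat on the mat"], "en")
spanish = add_language_id(["el gato se sentó en la alfombra"], "es")
# english[0] == "[EN] the cat sat on the mat"
```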
Please refer to our paper for more details.
Scripts
Creating pre-training data
If you would like to pre-train BERT on a fake language or on permuted sentences, see preprocessing-scripts for how to create the tfrecords for BERT training.
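The actual pipeline lives in preprocessing-scripts; for orientation, here is a minimal sketch (assuming TensorFlow and already word-piece-tokenized id sequences) of how examples end up in a tfrecord file. The real pre-training data additionally contains the masked-LM and next-sentence fields BERT expects.

```python
import tensorflow as tf

def int64_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))

def write_tfrecord(examples, path):
    """Write word-piece id sequences to a tfrecord file.
    `examples` is a list of lists of vocabulary ids (one list per example)."""
    with tf.io.TFRecordWriter(path) as writer:
        for input_ids in examples:
            features = tf.train.Features(feature={
                "input_ids": int64_feature(input_ids),
                "input_mask": int64_feature([1] * len(input_ids)),
                "segment_ids": int64_feature([0] * len(input_ids)),
            })
            writer.write(tf.train.Example(features=features).SerializeToString())
```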
Pre-training BERT
Once you have uploaded the tfrecords to Google Cloud, you can set up an instance and start BERT pre-training via bert-running-scripts.
Evaluating
With the models we provide (or models you have just trained), we provide code for evaluating on two tasks: NER and textual entailment. See evaluating-scripts.
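The evaluation logic itself is in evaluating-scripts; the sketch below only illustrates the zero-shot protocol used throughout the paper, with hypothetical `finetune` and `evaluate` callables standing in for the task-specific (NER or entailment) code: fine-tune on English supervision, then score the same model on the target language without any target-language labels.

```python
def zero_shot_eval(model, english_train, english_dev, target_dev, finetune, evaluate):
    """Zero-shot cross-lingual protocol: fine-tune on source-language (English)
    data only, then score on both the English and the target-language dev sets.
    The difference between the two scores is the cross-lingual gap discussed above."""
    finetuned = finetune(model, english_train)
    in_language = evaluate(finetuned, english_dev)
    cross_lingual = evaluate(finetuned, target_dev)
    return in_language, cross_lingual, in_language - cross_lingual
```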
BERT Models
We release the following BERT models (coming in a few days):
- Word-piece Experiments
- Word Order Experiments
- Word Frequency Experiments
- Model Structure Experiments
See data for detailed download paths (coming in a few days).
Requirements
- allennlp: 0.9.0
- ccg_nlpy
Citation
Please cite the following paper if you find our work useful. Thanks!
Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth. "Cross-Lingual Ability of Multilingual BERT: An Empirical Study." arXiv preprint arXiv:1912.07840 (2019).
@article{wang2019cross,
title={Cross-Lingual Ability of Multilingual BERT: An Empirical Study},
author={K, Karthikeyan and Wang, Zihan and Mayhew, Stephen and Roth, Dan},
journal={arXiv preprint arXiv:1912.07840},
year={2019}
}