Language Models are Few-shot Multilingual Learners
Paper
This is the source code of the paper [Arxiv] [ACL Anthology].
The code is written in PyTorch. If you use the source code or datasets included in this toolkit in your work, please cite the following paper:
@inproceedings{winata-etal-2021-language,
    title = "Language Models are Few-shot Multilingual Learners",
    author = "Winata, Genta Indra  and
      Madotto, Andrea  and
      Lin, Zhaojiang  and
      Liu, Rosanne  and
      Yosinski, Jason  and
      Fung, Pascale",
    booktitle = "Proceedings of the 1st Workshop on Multilingual Representation Learning",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.mrl-1.1",
    pages = "1--15",
}
Setup Environment
GPU Machine
pip install -r requirements.txt
GPU Machine for Running GPT-J 6B Model
apt install zstd
# the "slim" version contain only bf16 weights and no optimizer parameters, which minimizes bandwidth and memory
wget -c https://the-eye.eu/public/AI/GPT-J-6B/step_383500_slim.tar.zstd
tar -I zstd -xf step_383500_slim.tar.zstd
# clone the mesh-transformer-jax repository if it is not already present
git clone https://github.com/kingoflolz/mesh-transformer-jax.git
pip install -r mesh-transformer-jax/requirements.txt
# jax 0.2.12 is required due to a regression with xmap in 0.2.13
pip install mesh-transformer-jax/ jax==0.2.12
# pick the jaxlib wheel that matches your CUDA version (e.g. cuda101 for CUDA 10.1)
pip install jaxlib==0.1.67+cuda101 -f https://storage.googleapis.com/jax-releases/jax_releases.html
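Before loading the 6B checkpoint, a quick sanity check (not part of the original instructions) can confirm that the CUDA-enabled jax/jaxlib build is actually active:

# sanity-check sketch: verify the installed JAX build sees the GPU
import jax

print(jax.__version__)   # expected: 0.2.12
print(jax.devices())     # should list GPU devices, not only the CPU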
How to run
Zero-shot Cross-task
❱❱❱ CUDA_VISIBLE_DEVICES=0 python evaluate.py --dataset snips --model_checkpoint facebook/bart-large-mnli --cuda --length 5 --label_type value --src_lang en --tgt_lang en --seed 42 --use_log_prob --use_confidence --is_cross_task
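In the zero-shot cross-task setting, an off-the-shelf NLI model scores each candidate intent label against the input utterance. The sketch below is not the repo's evaluate.py; it is a minimal illustration of the same idea using the Hugging Face zero-shot-classification pipeline, with a hypothetical SNIPS-style utterance and label set:

# Minimal sketch (illustrative only, not the repo's evaluate.py):
# facebook/bart-large-mnli treats each candidate intent label as an NLI
# hypothesis and the utterance as the premise; the label with the highest
# entailment score is taken as the prediction.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

utterance = "play some jazz music in the living room"             # hypothetical example
intent_labels = ["play music", "book restaurant", "get weather"]  # hypothetical SNIPS-style labels

result = classifier(utterance, candidate_labels=intent_labels)
print(result["labels"][0], result["scores"][0])  # top-scoring intent and its score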
Finetune
❱❱❱ CUDA_VISIBLE_DEVICES=0 python finetune.py --dataset snips --model_checkpoint bert-base-multilingual-uncased --cuda --label_type value --src_lang en --tgt_lang en --seed 42
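For reference, the sketch below shows roughly what a fine-tuning run over an intent-classification dataset looks like in PyTorch. It is not the repo's finetune.py (which handles dataset loading, language selection, and evaluation); the toy data and label ids here are hypothetical:

# Minimal fine-tuning sketch (illustrative only, not the repo's finetune.py)
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
checkpoint = "bert-base-multilingual-uncased"

# hypothetical toy intent-classification data standing in for SNIPS
texts = ["play some jazz music", "what is the weather tomorrow"]
labels = [0, 1]  # 0 = PlayMusic, 1 = GetWeather (hypothetical label ids)

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2).to(device)

encodings = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
dataset = TensorDataset(encodings["input_ids"], encodings["attention_mask"], torch.tensor(labels))
loader = DataLoader(dataset, batch_size=2, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):
    for input_ids, attention_mask, batch_labels in loader:
        optimizer.zero_grad()
        outputs = model(input_ids=input_ids.to(device),
                        attention_mask=attention_mask.to(device),
                        labels=batch_labels.to(device))
        outputs.loss.backward()  # cross-entropy loss over the intent labels
        optimizer.step()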