
KLUE Baseline


KLUE-baseline contains the baseline code for the Korean Language Understanding Evaluation (KLUE) benchmark. See our paper for more details about KLUE and the baselines.


Make sure you have installed the packages listed in requirements.txt.

pip install -r requirements.txt

All experiments were tested in a Python 3.7 environment.

KLUE Benchmark Datasets

All train/dev sets of KLUE tasks are publicly available in this repo. You can access them by using git submodules. To clone the repo with datasets:

git clone --recursive https://github.com/KLUE-benchmark/KLUE-Baseline.git

or, if you have already cloned this repo, download the datasets with:

git submodule update --init --recursive

The test sets are not publicly available. To measure the performance of your model on a test set, first train the model on the train set and then submit it to our submission system. Alternatively, you can compare dev set performance with our baseline models, whose dev set results are also reported in our paper.


To reproduce our baselines, run run_all.sh.

NOTE: klue/roberta models accept inputs of at most 510 tokens. Details are explained here.


If you use this code or KLUE, please cite:

      title={KLUE: Korean Language Understanding Evaluation}, 
      author={Sungjoon Park and Jihyung Moon and Sungdong Kim and Won Ik Cho and Jiyoon Han and Jangwon Park and Chisung Song and Junseong Kim and Yongsook Song and Taehwan Oh and Joohong Lee and Juhyun Oh and Sungwon Lyu and Younghoon Jeong and Inkwon Lee and Sangwoo Seo and Dongjun Lee and Hyunwoo Kim and Myeonghwa Lee and Seongbo Jang and Seungwon Do and Sunkyoung Kim and Kyungtae Lim and Jongwon Lee and Kyumin Park and Jamin Shin and Seonghyun Kim and Lucy Park and Alice Oh and Jung-Woo Ha and Kyunghyun Cho},


Feel free to open an issue if you have any questions or comments. To contribute, please run make style before creating a pull request.

  • Submission format per task

    What's your idea? 🤔

    Hello, and thank you for releasing this pipeline for the KLUE benchmark. I would like to evaluate my fine-tuning results on the test set as well. What format should a submission to the official KLUE page take? For now I am compressing the whole output_dir with tar -czvf submission.tar.gz [output_dir], but the submission keeps failing, so I am going through the logs. I would like to know whether the checkpoint has to be bundled separately, or whether there is another way.


    opened by Jihyun22 2
  • Bug in validation_epoch_step in klue_baseline/models/named_entity_recognition.py

    Abstract 🔥

    When an [UNK] token is present, the character_preds list is not built correctly.

    How to Reproduce 🤔

    For example, tokenizing "전문성운줄알았어여~ᄏ" (sample id: klue-ner-v1_dev_00236-nsmc) gives the following result:

    ['전문', '##성', '##운', '##줄', '##알', '##았', '##어', '##여', '~', '[UNK]']

    When such an input goes through the if branch at https://github.com/KLUE-benchmark/KLUE-baseline/blob/main/klue_baseline/models/named_entity_recognition.py#L98-L129, character_preds is not built in the intended form.

    The cause seems to be that the code was written on the assumption that, in a whitespace-delimited word containing an [UNK], every subword other than the [UNK] is a symbol.

    As a result, for a subword such as '전문', which has two characters, only a single character's worth of subword_pred is appended. (https://github.com/KLUE-benchmark/KLUE-baseline/blob/main/klue_baseline/models/named_entity_recognition.py#L125)
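
The misalignment described above comes down to how many characters each subword covers. The following is a minimal sketch of that idea only, not the repository's code: expand_to_char_preds and its parameters are hypothetical names, the label ids are placeholders, and the sketch assumes at most one [UNK] per word.

```python
from typing import List


def expand_to_char_preds(
    word: str,
    subwords: List[str],
    subword_preds: List[int],
    unk_token: str = "[UNK]",
    continuation_prefix: str = "##",
) -> List[int]:
    """Repeat each subword-level prediction once per character the subword covers.

    Assumption: the word contains at most one unk_token, which is taken to cover
    every character that the other subwords do not account for.
    """
    if subwords.count(unk_token) > 1:
        raise ValueError("this sketch handles at most one unknown token per word")

    def n_chars(subword: str) -> int:
        # Strip the WordPiece continuation marker before counting characters.
        if subword.startswith(continuation_prefix):
            return len(subword) - len(continuation_prefix)
        return len(subword)

    known_chars = sum(n_chars(s) for s in subwords if s != unk_token)
    char_preds: List[int] = []
    for subword, pred in zip(subwords, subword_preds):
        count = len(word) - known_chars if subword == unk_token else n_chars(subword)
        char_preds.extend([pred] * count)
    return char_preds


word = "전문성운줄알았어여~ᄏ"
subwords = ["전문", "##성", "##운", "##줄", "##알", "##았", "##어", "##여", "~", "[UNK]"]
subword_preds = list(range(len(subwords)))  # placeholder label ids, one per subword
char_preds = expand_to_char_preds(word, subwords, subword_preds)
```

With this expansion, '전문' contributes two entries to char_preds, so the result holds one prediction per character of the word.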

    How to solve 🙋‍♀

                    if self.tokenizer.unk_token in subwords:  # case that needs expansion!
                        unk_aligned_subwords = self.tokenizer_out_aligner(
                            word, subwords, strip_char
                        )  # [UNK] -> [UNK, +UNK]
                        add_char_preds_idx = 0  # added
                        unk_flag = False
                        for subword in unk_aligned_subwords:
                            if character_preds_idx >= self.hparams.max_seq_length - 1:
                                break
                            subword_pred = subword_preds[character_preds_idx].tolist()
                            subword_pred_label = label_list[subword_pred]
                            if subword == self.tokenizer.unk_token:
                                unk_flag = True
                                add_char_preds_idx += 1  # added
                            elif subword == self.in_unk_token:
                                if subword_pred_label == "O":
                                    character_preds.append(subword_pred)
                                else:
                                    _, entity_category = subword_pred_label.split("-")
                                    character_pred_label = "I-" + entity_category
                                    character_pred = label_list.index(character_pred_label)
                                    character_preds.append(character_pred)
                                add_char_preds_idx += 1  # added
                            else:
                                if unk_flag:
                                    character_preds_idx += 1
                                    subword_pred = subword_preds[character_preds_idx].tolist()
                                    subword_pred = [subword_pred] * len(subword.lstrip(strip_char))  # added
                                    character_preds.extend(subword_pred)  # added
                                    unk_flag = False
                                else:
                                    subword_pred = [subword_pred] * len(subword.lstrip(strip_char))  # added
                                    character_preds.extend(subword_pred)  # added
                                    character_preds_idx += 1  # also increment when `+UNK` ends, so that we move on to the next label
                        character_preds_idx += add_char_preds_idx  # added

    I wrote this code rather roughly for now. It would be great if you could review this part and update it with better code!

    Thank you for building such a good fine-tuning system 🙇‍♂️

    opened by KhelKim 0
  • NER bug fix

    The original code includes the 'O' class when calculating the NER F1 score. According to the KLUE paper, the 'O' class should be excluded from this calculation. This PR fixes the issue.
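
To illustrate why this matters, here is a minimal sketch of a macro F1 restricted to the non-'O' labels. It is not the metric code in this repository; macro_f1_excluding is a hypothetical helper and the label sequences are made-up toy data.

```python
from typing import Iterable, Sequence


def macro_f1_excluding(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    exclude: Iterable[str] = ("O",),
) -> float:
    """Average per-label F1 over every label that is not in `exclude`."""
    excluded = set(exclude)
    labels = sorted((set(y_true) | set(y_pred)) - excluded)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: one I-PS tag is mispredicted as O.
y_true = ["B-PS", "I-PS", "O", "O", "B-LC"]
y_pred = ["B-PS", "O", "O", "O", "B-LC"]

f1_without_o = macro_f1_excluding(y_true, y_pred)          # averages B-LC, B-PS, I-PS
f1_with_o = macro_f1_excluding(y_true, y_pred, exclude=())  # also averages O
```

On this toy data the score that includes 'O' is higher than the one that excludes it, because the frequent 'O' label is mostly predicted correctly and pulls the average up.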

    opened by Joon-June 0
  • Training error on klue-dp task

    Abstract 🔥

    When run-all.sh is run on multiple GPUs, some tasks (dependency parsing) do not work properly.


    RuntimeError: The size of tensor a (23) must match the size of tensor b (25) at non-singleton dimension 2

    How to Reproduce 🤔


    git clone --recursive https://github.com/KLUE-benchmark/KLUE-Baseline.git
    pip install -r requirements.txt
    pip install torch==1.7.0+cu110 -f https://download.pytorch.org/whl/torch_stable.html (CUDA version matching torch)

    Edit run-all.sh: KLUE-DP task="klue-dp"

    python run_klue.py train --task ${task} --output_dir ${OUTPUT_DIR} --data_dir ${DATA_DIR}/${task}-${VERSION} --model_name_or_path klue/roberta-large --learning_rate 5e-5 --num_train_epochs 15 --gradient_accumulation_steps 1 --warmup_ratio 0.2 --train_batch_size 32 --patience 10000 --max_seq_length 256 --metric_key uas_macro_f1 --gpus 0 --num_workers 4


    python run_klue.py train --task ${task} --output_dir ${OUTPUT_DIR} --data_dir ${DATA_DIR}/${task}-${VERSION} --model_name_or_path klue/roberta-large --learning_rate 3e-5 --num_train_epochs 10 --train_batch_size 16 --eval_batch_size 16 --max_seq_length 510 --gradient_accumulation_steps 2 --warmup_ratio 0.2 --weight_decay 0.01 --max_grad_norm 1.0 --patience 100000 --metric_key slot_micro_f1 --gpus 1 2 3 --num_workers 8

    bash run-all.sh

    RuntimeError: The size of tensor a (23) must match the size of tensor b (25) at non-singleton dimension 2

    How to solve 🙋‍♀

    I cannot train the RoBERTa-large model on a single GPU because of insufficient memory, so I am asking here in the hope of getting some help!


    opened by pion0926 0
  • About klue_baseline/data/klue_dp.py

    Abstract 🔥

    Hello! This is not a direct bug, but I am raising it as an issue after noticing it while fine-tuning. The guid of each example is stored incorrectly, although klue_dp.py does not actually use this information.

    How to Reproduce 🤔

    In the convert_examples_to_features function, at the feature.append step, example.guid is the guid of the newly received example, so every guid is stored shifted by one position.

    How to solve 🙋‍♀

    This can be fixed by storing the guid of the previous example instead.
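
The off-by-one can be illustrated with a minimal sketch. It assumes token-level examples that are buffered until the next sentence begins; Token, group_by_sentence, and the field names are hypothetical and are not taken from klue_dp.py.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    guid: str      # id of the sentence this token belongs to
    token_id: int  # position inside the sentence, starting at 1
    text: str


def group_by_sentence(tokens: List[Token]) -> List[Tuple[Optional[str], List[str]]]:
    """Collect token-level rows into (guid, words) pairs, one per sentence."""
    features: List[Tuple[Optional[str], List[str]]] = []
    buffer: List[str] = []
    prev_guid: Optional[str] = None
    for tok in tokens:
        if tok.token_id == 1 and buffer:
            # A new sentence starts here. The buffered words belong to the
            # previous sentence, so they are stored with prev_guid, not tok.guid.
            features.append((prev_guid, buffer))
            buffer = []
        buffer.append(tok.text)
        prev_guid = tok.guid
    if buffer:
        # Flush the last sentence, which has no following token to trigger it.
        features.append((prev_guid, buffer))
    return features


tokens = [Token("s0", 1, "a"), Token("s0", 2, "b"), Token("s1", 1, "c")]
features = group_by_sentence(tokens)
```

Using tok.guid instead of prev_guid in the first append would label the words of "s0" with "s1", which is the shift the issue describes.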

    opened by joonkeekim 0
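  • Update requirements.txt

    First of all, thank you for creating the KLUE baseline. It has been a great help in understanding KLUE.

    PR Point

    • Pin library versions in requirements.txt so that the code can run in a Colab environment


    I tried installing the libraries in Colab, but the installation failed because of dependency problems. I changed the file so that the installation succeeds and am opening this PR. Please check the logs in the AS-IS and TO-BE sections at the link below. https://colab.research.google.com/drive/1KOy8VzKQT4Sk2J53NKjy5zzbs_RIk5zM?usp=sharing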

    opened by tucan9389 1