APEACH - Korean Hate Speech Evaluation Datasets
APEACH is the first crowd-generated Korean evaluation dataset for hate speech detection. The sentences in the dataset were written by anonymous participants on the online crowdsourcing platform DeepNatural AI.
Download
You can download the APEACH benchmark set from APEACH/test.csv in this repository.
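As a quick check after downloading, the file can be inspected with Python's standard `csv` module. This is a minimal sketch, not part of the official experiment code: the column names are not documented in this section, so the helper takes the label column as a parameter instead of assuming one, and `load_rows` and `summarize` are hypothetical names introduced here for illustration.

```python
import csv
import os
from collections import Counter


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def summarize(rows, label_column):
    """Count how often each value appears in the given column."""
    return Counter(row[label_column] for row in rows)


# Path as given in this README, relative to the repository root.
path = "APEACH/test.csv"
if os.path.exists(path):
    rows = load_rows(path)
    columns = list(rows[0].keys()) if rows else []
    # Print the real header first, then pick the label column from it.
    print(f"{len(rows)} rows; columns: {columns}")
```

Once the header is known, `summarize(rows, "<label column>")` gives the class balance of the evaluation set.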
Dataset Description
- APEACH: A hate speech evaluation dataset generated in 2021 using the generation method described in the APEACH paper.
Guidelines
Topics
Lengths
Paper
Experiment Code
Experiment Results
Model | Beep! Dev Dataset | APEACH (Ours) |
---|---|---|
SoongsilBERT-Base | 0.8261 | 0.8424 |
SoongsilBERT-Small | 0.8149 | 0.8228 |
KcBERT-base | 0.8088 | 0.8086 |
KcBERT-large | 0.8295 | 0.8116 |
DistillKoBERT | 0.7570 | 0.7715 |
KoELECTRA-V3 | 0.7920 | 0.8101 |
KoBERT | 0.8030 | 0.7885 |
We also share the best model trained on our dataset in this experiment as a checkpoint, a demo website, and an API.
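To make the results table easier to read, the per-model gap between the two evaluation sets can be computed directly from the reported numbers. The sketch below only copies the scores from the table above and treats them as opaque values, since this section does not name the metric; `score_gaps` is a hypothetical helper introduced for illustration.

```python
# Scores copied from the results table: (Beep! dev set, APEACH).
SCORES = {
    "SoongsilBERT-Base": (0.8261, 0.8424),
    "SoongsilBERT-Small": (0.8149, 0.8228),
    "KcBERT-base": (0.8088, 0.8086),
    "KcBERT-large": (0.8295, 0.8116),
    "DistillKoBERT": (0.7570, 0.7715),
    "KoELECTRA-V3": (0.7920, 0.8101),
    "KoBERT": (0.8030, 0.7885),
}


def score_gaps(scores):
    """Return APEACH score minus Beep! dev score for each model."""
    return {name: round(apeach - beep, 4) for name, (beep, apeach) in scores.items()}


gaps = score_gaps(SCORES)
higher_on_apeach = sorted(name for name, gap in gaps.items() if gap > 0)
best_on_apeach = max(SCORES, key=lambda name: SCORES[name][1])

# List models from the largest positive gap to the largest negative gap.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {gap:+.4f}")
```

Four of the seven models score higher on APEACH than on the Beep! dev set, and SoongsilBERT-Base has the highest APEACH score.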
Citation
@article{yang2022apeach,
title={APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets},
author={Yang, Kichang and Jang, Wonjun and Cho, Won Ik},
journal={arXiv preprint arXiv:2202.12459},
year={2022}
}
Contributors
The main contributors to this work (*: equal contribution):
- Kichang Yang* (Kakao Corp., Kakao Enterprise Corp., Soongsil University)
- Wonjun Jang* (Kakao Corp., Soongsil University)
- Won Ik Cho* (Seoul National University)
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.