Information Gain Filtration
Information Gain Filtration (IGF) is a method for filtering domain-specific data during language model fine-tuning. IGF shows significant improvements over baseline fine-tuning without data filtration. The provided Jupyter notebook gives a simple demonstration of how to use IGF during language model fine-tuning. Data for this demonstration is available on Figshare here.
If you use this method in published work, please cite the ACL paper that describes it, available here.