Kaggle Feedback Prize - Evaluating Student Writing 15th solution

Overview

Kaggle Feedback Prize - Evaluating Student Writing 15th solution


First of all, I would like to thank the excellent notebooks and discussions from https://www.kaggle.com/abhishek/two-longformers-are-better-than-1 @abhishek https://www.kaggle.com/c/feedback-prize-2021/discussion/308992 @hengck23 https://www.kaggle.com/librauee/infer-fast-ensemble-models @librauee I learned a lot from their work. This is the second kaggle competition we have participated in, and although we are one short of gold, we are already very satisfied. In our work, I am mainly responsible for the training of the model, and @yscho1 is mainly responsible for the post-processing.

Highlight

  • In the final commit, we ensemble 6 debreta_xlarge, 6 longformer-large-4096, 2 funnel-large, 2 deberta-v3-large and 2 deberta-large. We set the max_length to 1600. We use Fast Gradient Method(FGM) to improve robustness and use Exponential Moving Average(EMA) to smooth training.

  • Use optuna to learn all the hyperparameters in the post processing stage.

  • CV results show that deberta-xlarge(0.7092) > deberta-large(0.7025) > deberta-large-v3(0.6842) > funnel-large(0.6798) = longformer-large-4096(0.6748)

  • Merge consecutive predictions with same label, for example we merge [B-Lead, I-Lead, I-Lead], [B-Lead, I-Lead] into one single prediction. We only do this operation when the label is in ['Lead', 'Position', 'Concluding', 'Rebuttal'], since there are not consecutive predictions for these labels in the training data.

  • Filter "Lead" and "Concluding". There are only one Lead label and Concluding Label in almost all the trainging data, so we only keep the predictions that has higher score than threshold. Besides, we found that merge two Lead can increase cv further.

concluding_df = sorted(concluding_df, key=lambda x: np.mean(x[4]), reverse=True)
new_begin = min(concluding_df[0][3][0], concluding_df[1][3][0])
new_end = max(concluding_df[0][3][-1], concluding_df[1][3][-1])
  • Since the score is based on the overlap between prediction and ground truth, so we extend the predictions from word_list[begin:end] to word_list[begin - 1: end + 1]. Hoping the extended predictions can better hit ground truth and accross the 50% threshold.

  • Scaling. The probabilities of each token are multiplied by a factor. The factors are obtained through genetic algorithm search.

  • There are some other attempts but didn't work well. These attempts are included in the inference notebook.

Code

# Model Training
bash script/run_Base_train_gpu.sh
# Model Predict
bash script/run_predict.sh
# Params Learning
bash script/run_params_test.sh
You might also like...
Solution of Kaggle competition: Sartorius - Cell Instance Segmentation

Sartorius - Cell Instance Segmentation https://www.kaggle.com/c/sartorius-cell-instance-segmentation Environment setup Build docker image bash .dev_sc

Xview3 solution - XView3 challenge, 2nd place solution
Xview3 solution - XView3 challenge, 2nd place solution

Xview3, 2nd place solution https://iuu.xview.us/ test split aggregate score publ

This is an unofficial implementation of the paper “Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection”.
This is an unofficial implementation of the paper “Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection”.

This is an unofficial implementation of the paper “Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection”.

Experiments on Flood Segmentation on Sentinel-1 SAR Imagery with Cyclical Pseudo Labeling and Noisy Student Training
Experiments on Flood Segmentation on Sentinel-1 SAR Imagery with Cyclical Pseudo Labeling and Noisy Student Training

Flood Detection Challenge This repository contains code for our submission to the ETCI 2021 Competition on Flood Detection (Winning Solution #2). Acco

This is the official pytorch implementation of Student Helping Teacher: Teacher Evolution via Self-Knowledge Distillation(TESKD)
This is the official pytorch implementation of Student Helping Teacher: Teacher Evolution via Self-Knowledge Distillation(TESKD)

Student Helping Teacher: Teacher Evolution via Self-Knowledge Distillation (TESKD) By Zheng Li[1,4], Xiang Li[2], Lingfeng Yang[2,4], Jian Yang[2], Zh

 Predicting Student Attentiveness using OpenCV
Predicting Student Attentiveness using OpenCV

Predicting-Student-Attentiveness-using-OpenCV The model will predict if a student is attentive or not through facial parameter received through the st

Implementation of Feedback Transformer in Pytorch

Feedback Transformer - Pytorch Simple implementation of Feedback Transformer in Pytorch. They improve on Transformer-XL by having each token have acce

Release of SPLASH: Dataset for semantic parse correction with natural language feedback in the context of text-to-SQL parsing
Release of SPLASH: Dataset for semantic parse correction with natural language feedback in the context of text-to-SQL parsing

SPLASH: Semantic Parsing with Language Assistance from Humans SPLASH is dataset for the task of semantic parse correction with natural language feedba

Code for
Code for "3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop"

PyMAF This repository contains the code for the following paper: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop Hongwe

Owner
Lingyuan Zhang
Lingyuan Zhang
Real-Time-Student-Attendence-System - Real Time Student Attendence System

Real-Time-Student-Attendence-System The Student Attendance Management System Pro

Rounak Das 1 Feb 15, 2022
Feedback is important: response-aware feedback mechanism for background based conversation

RFM The code for the paper: "Feedback is important: response-aware feedback mechanism for background based conversation." Requirements python 3.7 pyto

Jiatao Chen 2 Sep 29, 2022
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th place solution

Lyft Motion Prediction for Autonomous Vehicles Code for the 4th place solution of Lyft Motion Prediction for Autonomous Vehicles on Kaggle. Discussion

null 44 Jun 27, 2022
Winning solution of the Indoor Location & Navigation Kaggle competition

This repository contains the code to generate the winning solution of the Kaggle competition on indoor location and navigation organized by Microsoft

Tom Van de Wiele 62 Dec 28, 2022
7th place solution of Human Protein Atlas - Single Cell Classification on Kaggle

kaggle-hpa-2021-7th-place-solution Code for 7th place solution of Human Protein Atlas - Single Cell Classification on Kaggle. A description of the met

null 8 Jul 9, 2021
Kaggle | 9th place (part of) solution for the Bristol-Myers Squibb – Molecular Translation challenge

Part of the 9th place solution for the Bristol-Myers Squibb – Molecular Translation challenge translating images containing chemical structures into I

Erdene-Ochir Tuguldur 22 Nov 30, 2022
My 1st place solution at Kaggle Hotel-ID 2021

1st place solution at Kaggle Hotel-ID My 1st place solution at Kaggle Hotel-ID to Combat Human Trafficking 2021. https://www.kaggle.com/c/hotel-id-202

Kohei Ozaki 18 Aug 19, 2022
Kaggle | 9th place single model solution for TGS Salt Identification Challenge

UNet for segmenting salt deposits from seismic images with PyTorch. General We, tugstugi and xuyuan, have participated in the Kaggle competition TGS S

Erdene-Ochir Tuguldur 276 Dec 20, 2022
10th place solution for Google Smartphone Decimeter Challenge at kaggle.

Under refactoring 10th place solution for Google Smartphone Decimeter Challenge at kaggle. Google Smartphone Decimeter Challenge Global Navigation Sat

null 12 Oct 25, 2022
Kaggle G2Net Gravitational Wave Detection : 2nd place solution

Kaggle G2Net Gravitational Wave Detection : 2nd place solution

Hiroshechka Y 33 Dec 26, 2022