Text mining project: using DistilBERT to predict authors in an authorship attribution classification task.

Last update: Jan 13, 2022

Overview

DistilBERT-Text-mining-authorship-attribution

Dataset used: https://www.kaggle.com/azimulh/tweets-data-for-authorship-attribution-modelling/version/2

DistilBERT: https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation

dataset - Contains helper functions for loading and handling the datasets.

feature_extraction_selection - Plots all models using the best dataset and parameters, to compare feature extraction methods. (Code inspired by: https://github.com/jalammar/jalammar.github.io/blob/master/notebooks/bert/A_Visual_Notebook_to_Using_BERT_for_the_First_Time.ipynb)

dataset_selection - Prints the average accuracy for every dataset.

ds_exploration - Prints the shape of each dataset and plots the class distributions.

baseline_process - Runs grid search cross-validation on all machine learning models except the BERT variants.

bert - Runs grid search cross-validation on the BERT-variant models. (Code inspired by: https://github.com/jalammar/jalammar.github.io/blob/master/notebooks/bert/A_Visual_Notebook_to_Using_BERT_for_the_First_Time.ipynb)

ml - Contains helper functions for the machine learning models.
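The grid search cross-validation step described for the baseline models can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the repository's actual code: the toy texts, the author labels, the TF-IDF plus logistic regression pipeline, and the parameter grid are all assumptions chosen for the example.

```python
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical stand-in for the tweet dataset: short texts and author labels.
texts = [
    "loving the sunny weather today", "sunny days make me happy",
    "what a sunny and happy morning", "happy sunny weekend everyone",
    "markets fell sharply on rate fears", "rate hike fears hit markets",
    "stocks and markets slide again", "markets brace for another rate hike",
]
authors = ["alice"] * 4 + ["bob"] * 4

# Class distribution check, in the spirit of ds_exploration.
print(Counter(authors))

# Assumed baseline: TF-IDF features fed into a logistic regression classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Illustrative grid: 2 n-gram settings x 3 regularization strengths.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# 2-fold stratified cross-validation, scored by accuracy.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, authors)

print(search.best_params_)          # best parameter combination found
print(round(search.best_score_, 3)) # mean cross-validated accuracy of that combination
```

Putting the vectorizer inside the pipeline means it is refit on each training fold, so the held-out fold does not leak vocabulary statistics into the features.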

Project done for the course TDDE16 - Text Mining at Linköping University.

Weakly Supervised Posture Mining with Reverse Cross-entropy for Fine-grained Classification

Fine-grainedImageClassification Weakly Supervised Posture Mining with Reverse Cross-entropy for Fine-grained Classification We trained model here: lin

14 Oct 21, 2022

A complete end-to-end demonstration in which we collect training data in Unity and use that data to train a deep neural network to predict the pose of a cube. This model is then deployed in a simulated robotic pick-and-place task.

Object Pose Estimation Demo This tutorial will go through the steps necessary to perform pose estimation with a UR3 robotic arm in Unity. You’ll gain

187 Dec 24, 2022

Group project for MFIN7036. Our goal is to predict firm profitability with text-based competition measures.

NLP_0-project Group project for MFIN7036. Our goal is to predict firm profitability with text-based competition measures. We are a "democratic" and c

3 Mar 16, 2022

Using STGCN for an EEG classification task

EEG Classification The task requires us to classify electroencephalography (EEG) signals into six categories, including human body, human face, animal body,

4 Jun 13, 2022

Siamese-nn-semantic-text-similarity - A repository containing comprehensive Neural Networks based PyTorch implementations for the semantic text similarity task

Siamese Deep Neural Networks for Semantic Text Similarity PyTorch A repository c

32 Dec 15, 2022

Implementation of association rule mining algorithms (Apriori|FPGrowth) in Python.

Association Rules Mining Using Python Implementation of association rules mining algorithms (Apriori|FPGrowth) using python. As a part of hw1 code in

2 Nov 10, 2021

Code implementing algorithms from Data Mining, Machine Learning, and Deep Learning without using existing Python packages.

Codes-for-Algorithms Codes for realizing theories learned from Data Mining, Machine Learning, Deep Learning without using the present Python packages.

1 Apr 12, 2022

Hl classification bc - A Network-Based High-Level Data Classification Algorithm Using Betweenness Centrality

A Network-Based High-Level Data Classification Algorithm Using Betweenness Centr

3 Dec 1, 2022

Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"

Text-AutoAugment (TAA) This repository contains the code for our paper Text AutoAugment: Learning Compositional Augmentation Policy for Text Classific

105 Jan 3, 2023

PyTorch implementation of VAGAN: Visual Feature Attribution Using Wasserstein GANs

Authors' implementation of LieTransformer: Equivariant Self-Attention for Lie Groups

Classic Papers for Beginners and Impact Scope for Authors.

PyTorch Implementation of the SuRP algorithm by the authors of the AISTATS 2022 paper "An Information-Theoretic Justification for Model Pruning"

Code for "Self-Attention Attribution: Interpreting Information Interactions Inside Transformer" (AAAI 2021)

Keep CALM and Improve Visual Feature Attribution

This is a PyTorch implementation of the paper "Axiomatic Attribution for Deep Networks".

This is the PyTorch implementation of the paper "Generalizable Mixed-Precision Quantization via Attribution Rank Preservation", accepted to ICCV 2021.

This project provides an unsupervised framework for mining and tagging quality phrases on text corpora with pretrained language models (KDD'21).

Patient-Survival - Using Python, I developed a machine learning model with classification techniques such as Random Forest and SVM classifiers to predict the survival status of patients who have undergone breast cancer surgery.