An Unsupervised Detection Framework for Chinese Jargons in the Darknet

Last update: Nov 8, 2022

This repo is the Python 3 implementation of "An Unsupervised Detection Framework for Chinese Jargons in the Darknet" (Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, WSDM '22).

Introduction

This project proposes a Chinese jargon detection framework based on unsupervised learning.

Requirements

pip install -r requirements.txt

Data

Due to the sensitivity of darknet information, we do not distribute the dataset directly. Some samples of the dataset are shown in /dataset/sample.csv, and readers can request the raw corpus using the contact information below.

Please contact Liang Ke ([email protected]) for the Darknet corpus dataset.
The Modern Chinese Dictionary (the 7th edition) that we used for cross-corpus comparison is from here.
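
To get a first look at the sample file, a minimal sketch like the following may help. The column layout of sample.csv is not documented here, so the helper only returns raw rows. The function name preview_csv, the relative path in the usage comment, and the UTF-8 encoding are assumptions, not part of the repo.

```python
import csv
from itertools import islice


def preview_csv(path, n=5, encoding="utf-8"):
    """Return the first n rows of a CSV file as lists of strings.

    The encoding is an assumption; change it if the file fails to decode.
    """
    with open(path, newline="", encoding=encoding) as f:
        return list(islice(csv.reader(f), n))


# Hypothetical usage, run from the repository root:
#   for row in preview_csv("dataset/sample.csv"):
#       print(row)
```

If the file turns out to have a header row, csv.DictReader can be swapped in to access fields by name.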

Code

Preprocess the raw corpus using preprocess.py to get the clean corpus.
Find out-of-vocabulary words using newWordsDiscovey.py, and add them to the tokenizer dictionary.
Pretrain the word-based DC-BERT model on the clean corpus using pretrain.py.
Generate word embeddings with the pretrained DC-BERT using genEmbedding.py.
Construct seed criminal keywords with findSeedKeywords.py. We provide an example list of seed criminal keywords for reference; you can delete or add words to suit your task.
Find jargon candidates (words that are related to the relevant cybercrimes and are very likely to be jargon) with findCandidate.py.
Finally, obtain the real darknet Chinese jargon detected by our framework using findJargon.py.
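
The idea behind the candidate step (words whose embeddings lie close to the seed keywords) can be illustrated with a toy sketch. This is not the code in findCandidate.py: the function names, the cosine-similarity criterion, the 0.8 threshold, and the three-dimensional toy vectors are all assumptions made for illustration. In the real pipeline the vectors would come from the pretrained DC-BERT.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def find_candidates(embeddings, seeds, threshold=0.8):
    """Map each non-seed word to its best similarity with any seed keyword,
    keeping only words at or above the (assumed) threshold."""
    seed_vectors = [embeddings[s] for s in seeds if s in embeddings]
    candidates = {}
    for word, vec in embeddings.items():
        if word in seeds:
            continue
        best = max((cosine(vec, sv) for sv in seed_vectors), default=0.0)
        if best >= threshold:
            candidates[word] = best
    return candidates


# Toy embeddings (hypothetical): "candy" points almost the same way as the seed.
toy_embeddings = {
    "drug": [1.0, 0.0, 0.0],     # seed keyword
    "candy": [0.9, 0.1, 0.0],    # close to the seed, so it becomes a candidate
    "weather": [0.0, 1.0, 0.0],  # orthogonal to the seed, so it is rejected
}
toy_candidates = find_candidates(toy_embeddings, seeds={"drug"})
print(sorted(toy_candidates))
```

A threshold trades recall for precision: lowering it surfaces more candidates for the final findJargon.py step to filter, at the cost of more noise.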

Citation

waiting for camera-ready
