A collection of GNN-based fake news detection models.

Overview

GNN-based Fake News Detection

Installation | Datasets | User Guide | Benchmark | How to Contribute

This repo includes the PyTorch-Geometric implementation of a series of Graph Neural Network (GNN) based fake news detection models. All GNN models are implemented and evaluated under the User Preference-aware Fake News Detection (UPFD) framework, which instantiates fake news detection as a graph classification task.

You can make a reproducible run on CodeOcean without manual configuration.

We welcome contributions of results for existing models and SOTA results for new models based on our dataset. You can check the benchmark hosted by Papers with Code for SOTA models and their performance.

If you use the code in your project, please cite the following paper:

SIGIR'21 (PDF)

@inproceedings{dou2021user,
  title={User Preference-aware Fake News Detection},
  author={Dou, Yingtong and Shu, Kai and Xia, Congying and Yu, Philip S. and Sun, Lichao},
  booktitle={Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2021}
}

Installation

To run the code in this repo, you need to have Python>=3.6, PyTorch>=1.6, and PyTorch-Geometric>=1.6.1. Please follow the installation instructions of PyTorch-Geometric to install PyG.

Other dependencies can be installed using the following commands:

git clone https://github.com/safe-graph/GNN-FakeNews.git
cd GNN-FakeNews
pip install -r requirements.txt
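
To confirm that an environment meets the minimum versions listed above, a check along the following lines may help. It is only a sketch: the function names are illustrative, the distribution names `torch` and `torch-geometric` are assumed to match what pip installed, and `importlib.metadata` itself needs Python 3.8 or newer.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the requirements stated above.
MIN_PYTHON = (3, 6)
# Assumption: these are the pip distribution names for PyTorch and PyG.
MIN_PACKAGES = {"torch": "1.6", "torch-geometric": "1.6.1"}


def version_tuple(text):
    """Turn '1.13.1+cu117' into (1, 13, 1); local build suffixes are ignored."""
    public = text.split("+")[0]
    return tuple(int(part) for part in re.findall(r"\d+", public)[:3])


def meets_minimum(installed, minimum):
    """Compare two version strings by their leading numeric components."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment():
    """Map each requirement to True, False, or None (package not installed)."""
    report = {"python": sys.version_info[:2] >= MIN_PYTHON}
    for package, minimum in MIN_PACKAGES.items():
        try:
            report[package] = meets_minimum(version(package), minimum)
        except PackageNotFoundError:
            report[package] = None
    return report


print(check_environment())
```

A `None` entry in the printed report means the package was not found, which usually indicates that the PyG installation step has not been completed yet.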

Datasets

The dataset can be loaded using the PyG API. You can download the dataset (2.66GB) via the link below and unzip the data under the \data directory.

https://mega.nz/file/j5ZFEK7Z#KDnX2sjg65cqXsIRi0cVh6cvp7CDJZh1Zlm9-Xt28d4
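
As a rough sketch of loading the data through the PyG API: recent PyG releases ship a `torch_geometric.datasets.UPFD` class that takes `root`, `name`, `feature`, and `split` arguments. The helper name `load_upfd_splits` and the default `data` root are illustrative assumptions rather than part of this repo, and PyG may download its own copy of the files instead of reusing the archive linked above.

```python
VALID_NAMES = ("politifact", "gossipcop")
VALID_FEATURES = ("profile", "spacy", "bert", "content")
SPLITS = ("train", "val", "test")


def load_upfd_splits(root="data", name="politifact", feature="bert"):
    """Return a dict holding the train/val/test splits of one UPFD dataset.

    Hypothetical helper: it only wraps the UPFD class shipped with PyG.
    """
    if name not in VALID_NAMES:
        raise ValueError(f"name must be one of {VALID_NAMES}, got {name!r}")
    if feature not in VALID_FEATURES:
        raise ValueError(f"feature must be one of {VALID_FEATURES}, got {feature!r}")

    # Imported lazily so the argument checks above work without PyG installed.
    from torch_geometric.datasets import UPFD

    return {
        split: UPFD(root=root, name=name, feature=feature, split=split)
        for split in SPLITS
    }
```

Calling `load_upfd_splits(name="gossipcop", feature="spacy")` would return three dataset objects that can be passed to a PyG `DataLoader` for graph classification.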

The dataset includes fake and real news propagation networks on Twitter, built according to fact-check information from Politifact and Gossipcop. The news retweet graphs were originally extracted by FakeNewsNet. We crawled nearly 20 million historical tweets from users who participated in fake news propagation in FakeNewsNet to generate the node features in the dataset.

The statistics of the dataset are shown below:

| Data | #Graphs | #Fake News | #Total Nodes | #Total Edges | #Avg. Nodes per Graph |
|---|---|---|---|---|---|
| Politifact | 314 | 157 | 41,054 | 40,740 | 131 |
| Gossipcop | 5464 | 2732 | 314,262 | 308,798 | 58 |
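
The derived columns in the table can be cross-checked with a few lines of arithmetic. The sketch below copies the four raw counts per dataset from the table; the `summarize` helper is illustrative. It relies on the fact that a tree with n nodes has n - 1 edges, so the total edge count should equal the total node count minus the number of graphs.

```python
# (graphs, fake news, total nodes, total edges), copied from the table above.
STATS = {
    "Politifact": (314, 157, 41_054, 40_740),
    "Gossipcop": (5464, 2732, 314_262, 308_798),
}


def summarize(graphs, fake, nodes, edges):
    """Derive the average graph size and two sanity checks from raw counts."""
    return {
        "avg_nodes": round(nodes / graphs),
        "fake_ratio": fake / graphs,
        # Each graph is a tree, so it contributes (its node count - 1) edges.
        "edges_match_trees": edges == nodes - graphs,
    }


for name, row in STATS.items():
    print(name, summarize(*row))
```

Both datasets are balanced (half of the graphs are fake news), and the edge totals are consistent with every graph being a tree.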

Due to Twitter policy, we cannot release the crawled historical user tweets publicly. To get the corresponding Twitter user information, you can refer to the news lists under \data and map the news IDs to FakeNewsNet. Then, you can crawl the user information by following the instructions on FakeNewsNet. In the UPFD project, we use Tweepy and the Twitter Developer API to get the user information.

We incorporate four node feature types in the dataset. The 768-dimensional bert and 300-dimensional spacy features are encoded using pretrained BERT and spaCy word2vec, respectively. The 10-dimensional profile feature is obtained from a Twitter account's profile; you can refer to profile_feature.py for profile feature extraction. The 310-dimensional content feature is composed of a 300-dimensional user comment word2vec (spaCy) embedding plus a 10-dimensional profile feature.
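
The dimensionality of the content feature follows directly from its two parts. The sketch below uses plain Python lists to show the bookkeeping; the function name and the ordering (comment embedding first, profile feature last) are assumptions for illustration, not taken from the dataset files. Note that the 300-dimensional part is the user comment embedding, which is a different vector from the spacy feature.

```python
# Dimensions of the four node feature types described above.
FEATURE_DIMS = {"bert": 768, "spacy": 300, "profile": 10, "content": 310}
COMMENT_DIM = 300  # user comment word2vec (spaCy) embedding


def build_content_feature(comment_embedding, profile_feature):
    """Concatenate a comment embedding and a profile feature into one vector."""
    if len(comment_embedding) != COMMENT_DIM:
        raise ValueError(f"expected {COMMENT_DIM} comment dimensions")
    if len(profile_feature) != FEATURE_DIMS["profile"]:
        raise ValueError(f"expected {FEATURE_DIMS['profile']} profile dimensions")
    # Assumed ordering: comment embedding first, then the profile feature.
    return list(comment_embedding) + list(profile_feature)


content = build_content_feature([0.0] * 300, [1.0] * 10)
print(len(content) == FEATURE_DIMS["content"])
```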

Each graph is a hierarchical tree-structured graph where the root node represents the news and the other nodes are Twitter users who retweeted the root news. A user node has an edge to the news node if the user retweeted the news tweet. Two user nodes have an edge if one user retweeted the news tweet from the other user. The following figure shows the UPFD framework, including the dataset construction details. You can refer to the paper for more details about the dataset.
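
The tree structure described above can be sketched in plain Python. The input format (a list of `(user, retweeted_from)` pairs), the function name, and the parent-to-child edge direction are all assumptions made for illustration; they are not the format used by the released files.

```python
NEWS_NODE = 0  # the root node always represents the news item


def build_propagation_graph(retweets):
    """Build node indices and edges from (user, retweeted_from) pairs.

    `retweeted_from` is None when the user retweeted the news tweet itself,
    otherwise it names the user whose retweet was retweeted.
    """
    node_index = {}
    for user, _ in retweets:
        # User nodes are numbered from 1 in order of first appearance.
        node_index.setdefault(user, len(node_index) + 1)

    edges = []
    for user, source in retweets:
        if source is not None and source not in node_index:
            raise ValueError(f"unknown source user: {source!r}")
        parent = NEWS_NODE if source is None else node_index[source]
        # Assumed direction: from the retweeted node to the retweeting user.
        edges.append((parent, node_index[user]))
    return node_index, edges


retweets = [("alice", None), ("bob", None), ("carol", "alice")]
node_index, edges = build_propagation_graph(retweets)
print(edges)
```

In this toy example the graph has four nodes (the news plus three users) and three edges, which matches the tree property that the edge count is one less than the node count.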



User Guide

All GNN-based fake news detection models are under the \gnn_model directory. You can fine-tune each model according to arguments specified in the argparser of each model. The implemented models are as follows:

  • GNN-CL: Han, Yi, Shanika Karunasekera, and Christopher Leckie. "Graph neural networks with continual learning for fake news detection from social media." arXiv preprint arXiv:2007.03316 (2020).
  • GCNFN: Monti, Federico, Fabrizio Frasca, Davide Eynard, Damon Mannion, and Michael M. Bronstein. "Fake news detection on social media using geometric deep learning." arXiv preprint arXiv:1902.06673 (2019).
  • BiGCN: Bian, Tian, Xi Xiao, Tingyang Xu, Peilin Zhao, Wenbing Huang, Yu Rong, and Junzhou Huang. "Rumor detection on social media with bi-directional graph convolutional networks." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 01, pp. 549-556. 2020.
  • UPFD-GCN: Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).
  • UPFD-GAT: Veličković, Petar, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. "Graph attention networks." arXiv preprint arXiv:1710.10903 (2017).
  • UPFD-SAGE: Hamilton, William L., Rex Ying, and Jure Leskovec. "Inductive representation learning on large graphs." arXiv preprint arXiv:1706.02216 (2017).

Since the UPFD framework is built upon PyG, you can easily try other graph classification models such as GIN and HGP-SL on our dataset.

How to Contribute

You are welcome to submit your model code, hyper-parameters, and results to this repo by creating a pull request. After the results are verified, your model will be added to the repo and the results will be added to the benchmark. Please email [email protected] for other inquiries.

Comments
  • About eval_helper.py

    After changing data_size = len(loader.dataset.indices()) to data_size = len(loader.dataset.indices) on line 15, I can run the program normally. Otherwise, a TypeError is reported: 'list' object is not callable.

    opened by Chenzhuomin529 6
  • Bug occurred in utils/eval_helper.py

    Hello! A bug occurred on line 15 of utils/eval_helper.py: data_size = len(loader.dataset.indices()). The error is 'list' object is not callable. I tried to solve it but failed. Could you please tell me why this happens and how to solve it? Thanks!

    opened by shizia 5
  • Use models after training to classify unknown data

    Hi, first of all: thanks for your work! I'm new to GNNs in the domain of fake news detection, and I was able to reproduce the training. My question now is what to do next with those models. How can I check whether a news article is fake or not?

    opened by padmalcom 3
  • raw data requirement

    Hi,

    Could you kindly release the raw data (original tweets)? The crawler cannot retrieve some tweets because they are out of date or have been deleted. It would be good to release the raw data.

    opened by NonvolatileMemory 2
  • If the news and news publisher feature in dataset

    The content feature is composed of a 300-dimensional user comment feature and a 10-dimensional user profile feature. But it seems that in the code, the first feature of each graph is the news (the root). So does that mean the first content feature of each graph is the feature of the news plus the profile of the publisher, not a user comment feature plus a user profile? Also, is the first spacy feature of each graph built from the recent 200 news items of the publisher?

    opened by shizia 1
  • About the order of feature data

    Hello, sorry to bother you again! I wonder whether the four node feature types in the dataset follow the same node order from the first node to the last. For example, if I want to concatenate the 'spacy' feature and the 'profile' feature, does the same row of these two matrices describe the same node, so that I just need to concatenate the two feature vectors in the same row? Also, even though there is a 'content' feature, it seems to differ from the concatenation of the 'spacy' and 'profile' features when I print them. Is that a problem? Thanks!

    opened by shizia 1
  • Error when downloading the dataset: No such file or directory

    Hello, when I use pytorch_geometric in Google Colab and reach the line train_data = UPFD(root=".", name="gossipcop", feature="spacy", split="train"), it raises "FileNotFoundError: [Errno 2] No such file or directory: './gossipcop/raw/new_spacy_feature.npz'"

    The gossipcop/raw directory exists in the folder, but it contains no files. What went wrong?

    opened by Leon-377 1
  • 'tupleBatch' object has no attribute 'stores_as'

    When I run gnn.py, the error 'tupleBatch' object has no attribute 'stores_as' occurs on line 156. It looks strange, and I don't know why this happens. Could you please tell me how to solve it?

    opened by shizia 1
  • SSLError occurred

    When I run the PyG code example, an SSLError occurs: HTTPSConnectionPool(host='docs.google.com', port=443): Max retries exceeded with url: /uc?export=download&id=1KOmSrlGcC50PjkvRVbyb_WoWHVql06J- (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1125)'))) I have looked for a solution online but still haven't solved it. Do you know how to solve this problem? Thanks!

    opened by shizia 1
  • Bug Report in utils/eval_helper.py

    There is a bug on line 15 of utils/eval_helper.py. The problematic line is:

    data_size = len(loader.dataset.indices)

    This raises an error because .indices is a function.

    To fix this issue, the line should be updated as follows:

    data_size = len(loader.dataset.indices())

    opened by Hazerank 1
  • How to import the models and predict on a fresh dataset?

    Discussed in https://github.com/safe-graph/GNN-FakeNews/discussions/5

    Originally posted by thepraveen19 on December 24, 2021:

    I am working on a Twitter dataset where I want to classify the tweets as real or fake. Is there a way to import the model and use it for predictions?

    Please help!

    opened by thepraveen19 0
  • Profile Features for news do not match earliest tweet

    The new_profile_features include features for the news article the trees are based on. A closed issue states that "the first feature in each graph represents the news encoding plus the profile feature of the Twitter account who first tweets the news". So logically, the profile feature of the first graph should be exactly the same as the profile feature of the account that tweeted the news first, but it is equal to that of a later occurrence in the graph.

    What is the correct way to create the profile_features for the first features?

    opened by rknntns 1
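
Several of the comments above disagree on whether line 15 of utils/eval_helper.py should read `loader.dataset.indices` or `loader.dataset.indices()`. The reports suggest that `indices` is a plain list in some environments and a method in others, presumably depending on the installed PyG version (an assumption; the comments do not state versions). One possible workaround is a small helper that accepts both forms. The helper name and the two stand-in classes below are hypothetical and exist only to demonstrate the behaviour.

```python
def dataset_size(dataset):
    """Return the number of samples whether `indices` is a method or a sequence."""
    indices = dataset.indices
    if callable(indices):
        # Some environments expose indices() as a method.
        indices = indices()
    return len(indices)


class ListIndicesDataset:
    """Stand-in for a dataset where `indices` is a plain list."""
    indices = [0, 1, 2]


class MethodIndicesDataset:
    """Stand-in for a dataset where `indices` is a method."""

    def indices(self):
        return range(5)


print(dataset_size(ListIndicesDataset()), dataset_size(MethodIndicesDataset()))
```

With such a helper, line 15 could be written as `data_size = dataset_size(loader.dataset)` and would not depend on which form the installed library provides.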