Data and code accompanying the paper Politics and Virality in the Time of Twitter

Cardiff NLP

Last update: Jul 2, 2022

Related tags

Data Analysis politics-and-virality-twitter

Overview

Politics and Virality in the Time of Twitter

Data and code accompanying the paper Politics and Virality in the Time of Twitter.

In specific:

the code used for the training of our models (./code/finetune_models.py and ./code/finetune_multi_cv.py)
a Jupyter Notebook containing the major parts of our analysis (./code/analysis.ipynb)
the model that was selected and used for the sentiment analysis.
the manually annotated data used for training are shared (./data/annotation/).
the ids of tweets that were used in our analyis and control experiments (./data/main/ & ./data/control)
names, parties and handles of the MPs that were tracked (./data/mps_list.csv).

Annotated Data (./data/annotation/)

One folder for each language (English, Spanish, Greek).
In each directory there are three files:
1. *_900.csv contains the 900 tweets that annotators labelled individually (300 tweets each annotator).
2. *_tiebreak_100.csv contains the initial 100 tweets all annotators labelled. 'annotator_3' indicates the annotator that was used as a tiebreaker.
3. *_combined.csv contains all tweets labelled for the language.

Model

While we plan to upload all the models trained for our experiments to huggingface.co, currently only the main model used in our analysis can be currently be find at: https://drive.google.com/file/d/1_Ngmh-uHGWEbKHFpKmQ1DhVf6LtDTglx/view?usp=sharing

The model, 'xlm-roberta-sentiment-multilingual', is based on the implementation of 'cardiffnlp/twitter-xlm-roberta-base-sentiment' while being further finetuned on the annotated dataset.

Example usage

from transformers import AutoModelForSequenceClassification, pipeline
model = AutoModelForSequenceClassification.from_pretrained('./xlm-roberta-sentiment-multilingual/')
sentiment_analysis_task = pipeline("sentiment-analysis", model=model, tokenizer="cardiffnlp/twitter-xlm-roberta-base-sentiment")

sentiment_analysis_task('Today is a good day')
Out: [{'label': 'Positive', 'score': 0.978614866733551}]

Reference paper

For more details, please check the reference paper. If you use the data contained in this repository for your research, please cite the paper using the following bib entry:

@inproceedings{antypas2022politics,
  title={{Politics and Virality in the Time of Twitter: A Large-Scale Cross-Party Sentiment Analysis in Greece, Spain and United Kingdom}},
  author={Antypas, Dimosthenis and Preece, Alun and Camacho-Collados, Jose},
  booktitle={arXiv preprint arXiv:2202.00396},
  year={2022}
}

You might also like...

fds is a tool for Data Scientists made by DAGsHub to version control data and code at once.

Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc

359 Dec 22, 2022

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.

791 Jan 4, 2023

🧪 Panel-Chemistry - exploratory data analysis and build powerful data and viz tools within the domain of Chemistry using Python and HoloViz Panel.

🧪📈 🐍. The purpose of the panel-chemistry project is to make it really easy for you to do DATA ANALYSIS and build powerful DATA AND VIZ APPLICATIONS within the domain of Chemistry using using Python and HoloViz Panel.

97 Dec 8, 2022

Created covid data pipeline using PySpark and MySQL that collected data stream from API and do some processing and store it into MYSQL database.

2 Nov 20, 2021

PrimaryBid - Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift

Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift This project is composed of two parts: Part1 and Part2

1 Jan 19, 2022

Python data processing, analysis, visualization, and data operations

Python This is a Python data processing, analysis, visualization and data operations of the source code warehouse, book ISBN: 9787115527592 Descriptio

1 Jan 16, 2022

Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Data Scientist Learning Plan Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

27 Nov 1, 2022

A data parser for the internal syncing data format used by Fog of World.

A data parser for the internal syncing data format used by Fog of World. The parser is not designed to be a well-coded library with good performance, it is more like a demo for showing the data structure.

40 Dec 12, 2022

Functional Data Analysis, or FDA, is the field of Statistics that analyses data that depend on a continuous parameter.

Functional Data Analysis Python package

Grupo de Aprendizaje Automático - Universidad Autónoma de Madrid

184 Dec 27, 2022

Data and code accompanying the paper Politics and Virality in the Time of Twitter

Related tags

Overview

Politics and Virality in the Time of Twitter

Annotated Data (./data/annotation/)

Model

Example usage

Reference paper

You might also like...

fds is a tool for Data Scientists made by DAGsHub to version control data and code at once.

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code

🧪 Panel-Chemistry - exploratory data analysis and build powerful data and viz tools within the domain of Chemistry using Python and HoloViz Panel.

Created covid data pipeline using PySpark and MySQL that collected data stream from API and do some processing and store it into MYSQL database.

PrimaryBid - Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift

Python data processing, analysis, visualization, and data operations

Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

A data parser for the internal syncing data format used by Fog of World.

Functional Data Analysis, or FDA, is the field of Statistics that analyses data that depend on a continuous parameter.

Owner

Cardiff NLP

Sentiment analysis on streaming twitter data using Spark Structured Streaming & Python

First and foremost, we want dbt documentation to retain a DRY principle. Every time we repeat ourselves, we waste our time. Second, we want to understand column level lineage and automate impact analysis.

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

Elementary is an open-source data reliability framework for modern data teams. The first module of the framework is data lineage.

A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh.

Integrate bus data from a variety of sources (batch processing and real time processing).

ForecastGA is a Python tool to forecast Google Analytics data using several popular time series models.

PyEmits, a python package for easy manipulation in time-series data.

A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms

An Integrated Experimental Platform for time series data anomaly detection.