For the paper entitled "A Case Study and Qualitative Analysis of Simple Cross-Lingual Opinion Mining"

Last update: Nov 10, 2021

Summary

This is the source code for the paper "A Case Study and Qualitative Analysis of Simple Cross-Lingual Opinion Mining", which was accepted as a full paper with an oral presentation at the 13th International Conference on Knowledge Discovery and Information Retrieval (KDIR).

Workflow

Fig. 1. Plain text is first tokenized into sentences, which are passed to both topic modeling and sentiment analysis. Topic modeling involves (1) converting the sentences of both languages into embeddings with XLING, (2) clustering all embeddings with K-means, and (3) deriving a topic label for each cluster. Sentiment analysis is performed with TextBlob. Topic assignments and sentiment scores are then aggregated for the analysis.
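The clustering and aggregation steps in Fig. 1 can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the repository's code: the two-dimensional vectors stand in for XLING sentence embeddings, the polarity values stand in for TextBlob scores, and the helper names (`farthest_first_init`, `kmeans`, `aggregate`) are hypothetical. It runs plain Lloyd's K-means with a deterministic farthest-first initialisation (chosen only so the toy run is reproducible) and leaves out the topic-labelling step.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption, not from the paper): start at the
    first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid updates until the
    labels stop changing. Returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each embedding joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no sentence changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member embeddings.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


def aggregate(labels, polarities):
    """Per-topic sentence count and mean polarity, keyed by cluster index."""
    buckets = defaultdict(list)
    for lab, pol in zip(labels, polarities):
        buckets[lab].append(pol)
    return {lab: (len(vals), sum(vals) / len(vals)) for lab, vals in sorted(buckets.items())}


# Hypothetical toy data: 2-D vectors standing in for XLING embeddings of six
# English/German sentences, and made-up TextBlob-style polarities in [-1, 1].
embeddings = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
polarities = [0.5, 0.3, 0.1, -0.4, -0.2, -0.6]

labels, _ = kmeans(embeddings, k=2)
summary = aggregate(labels, polarities)
print(labels)  # → [0, 0, 0, 1, 1, 1]
print(summary)
```

In practice a library implementation such as scikit-learn's `KMeans` could replace the hand-written loop; the sketch only makes the assign-then-update structure and the per-topic aggregation of sentiment explicit.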

Documentation

XLING-simple-example.ipynb: shows simple examples of converting English and German sentences into sentence embeddings and computing the cosine similarity between them.
article_based_topic_modeling_review.ipynb: demonstrates how the sentences of a single article are clustered into different topics and summarises the corresponding sentiment distribution.
cosine_similarity.ipynb: analyses the distribution of cosine similarity of sentences per topic.
create_article_wise_csv.ipynb: creates a data file containing the sentiment and topic assignment of every sentence in a single article for further analysis.
radar_plot_final.ipynb: displays the topic distribution per data source and document type.
radarfactory.py: contains a Python class for generating radar plots.
regenerate_sentences_metadata_json.ipynb: gathers all related measurements and creates an overall data file for analysis.
sankey_plot_final.ipynb: generates the final Sankey plot, which shows how the topic distribution flows as the number of topics increases for a fixed number of input sentences.
sankey_plot_k_clusters.ipynb: creates the first version of the Sankey plots.
sentence_posting_time.ipynb: handles minor issues with sentence posting times.
senti_util.py: contains a utility class for sentiment analysis.
sentiment.py: contains a Python class for assigning SentiWordNet sentiment.
sentiwordnet_vs_textblob.ipynb: shows the distributions of TextBlob sentiment and SentiWordNet sentiment.
simple_distribution_of_sentiment.ipynb: contains a detailed version of the sentiment distribution.
time_related_distribution_of_sentiment.ipynb: shows how the sentiment of news articles and that of reader responses change over time.
top_sentences_per_clusters.ipynb: lists the top sentences per topic.
top_words.ipynb: records the top words per topic.
util.py: contains a Python utility class with simple functions for analysis.
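Several of the notebooks above (XLING-simple-example.ipynb, cosine_similarity.ipynb, top_sentences_per_clusters.ipynb) revolve around cosine similarity between sentence embeddings. The sketch below shows the formula and one plausible way to rank the sentences of a topic, namely by similarity to the cluster centroid. That ranking criterion, the toy three-dimensional vectors and the helper names (`cosine_similarity`, `top_sentences`) are assumptions for illustration, not taken from the notebooks.

```python
import math


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| |v|); undefined for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def top_sentences(sentences, embeddings, centroid, n=2):
    """Hypothetical ranking: the n sentences most similar to the topic centroid."""
    scored = [(cosine_similarity(e, centroid), s) for s, e in zip(sentences, embeddings)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


# Toy vectors standing in for XLING embeddings; in a shared cross-lingual space,
# an English sentence and its German paraphrase should lie close together.
sentences = ["The service was great.", "Der Service war großartig.", "Parking was expensive."]
embeddings = [[0.9, 0.1, 0.0], [0.85, 0.15, 0.05], [0.1, 0.2, 0.95]]
centroid = [1.0, 0.1, 0.0]

ranked = top_sentences(sentences, embeddings, centroid, n=2)
for score, sentence in ranked:
    print(f"{score:.3f}  {sentence}")
```

Because cosine similarity ignores vector length, it compares only the direction of two embeddings, which is why it is a common choice for judging whether sentences in different languages express similar content.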
