LMSOC: An Approach for Socially Sensitive Pretraining
Code for reproducing the paper "LMSOC: An Approach for Socially Sensitive Pretraining", published in the Findings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021).
Abstract
While large-scale pretrained language models have been shown to learn effective linguistic representations for many NLP tasks, there remain many real-world contextual aspects of language that current approaches do not capture. For instance, consider a cloze-test "I enjoyed the ____ game this weekend": the correct answer depends heavily on where the speaker is from, when the utterance occurred, and the speaker's broader social milieu and preferences. Although language depends heavily on the geographical, temporal, and other social contexts of the speaker, these elements have not been incorporated into modern transformer-based language models. We propose a simple but effective approach to incorporate speaker social context into the learned representations of large-scale language models. Our method first learns dense representations of social contexts using graph representation learning algorithms and then primes language model pretraining with these social context representations. We evaluate our approach on geographically-sensitive language-modeling tasks and show a substantial improvement (more than 100% relative lift on MRR) compared to baselines.
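To make the priming step concrete, here is a minimal, hypothetical sketch (not the paper's or this repository's actual implementation): project a pretrained social-context vector into the language model's hidden space and prepend it as a pseudo-token ahead of the token embeddings. The names SociallyPrimedLM, project, and context_vec are illustrative.

```python
import torch
import torch.nn as nn

class SociallyPrimedLM(nn.Module):
    """Toy sketch: prime a transformer encoder with a social-context pseudo-token."""

    def __init__(self, encoder, hidden_size, context_dim):
        super().__init__()
        self.encoder = encoder  # any encoder taking (batch, seq, hidden)
        self.project = nn.Linear(context_dim, hidden_size)  # map context into model space

    def forward(self, token_embeds, context_vec):
        # token_embeds: (batch, seq, hidden); context_vec: (batch, context_dim)
        ctx = self.project(context_vec).unsqueeze(1)    # (batch, 1, hidden)
        primed = torch.cat([ctx, token_embeds], dim=1)  # social context as the first "token"
        return self.encoder(primed)

# Tiny usage example (batch_first so shapes match the comments above).
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
model = SociallyPrimedLM(nn.TransformerEncoder(layer, num_layers=2),
                         hidden_size=64, context_dim=16)
out = model(torch.randn(2, 10, 64), torch.randn(2, 16))  # -> (2, 11, 64)
```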
Citation
Please cite as:
Kulkarni, V., Mishra, S., & Haghighi, A. (2021). LMSOC: An Approach for Socially Sensitive Pretraining. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: Findings. arXiv:2110.10319.
@inproceedings{kulkarni2021lmsoc,
  title={LMSOC: An Approach for Socially Sensitive Pretraining},
  author={Kulkarni, Vivek and Mishra, Shubhanshu and Haghighi, Aria},
  booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: Findings},
  year={2021},
  address={Online},
  publisher={Association for Computational Linguistics},
  pages={1--9},
  eprint={2110.10319},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Reproducibility
NOTE: Dependencies are specified in the notebooks, but we have also included requirements.txt and environment.yml files so you can install dependencies with pip (`pip install -r requirements.txt`) or conda (`conda env create -f environment.yml`).
- Create social context embeddings via the example notebook embed_time_toy_task.ipynb, which implements how to embed time for Task 1 in the paper (see the sketch after this list for the general idea).
- Upload the files in data/ to the location where you will run the next notebook.
- The notebook lmsoc_train_and_eval_toy_task.ipynb contains the LMSOC training code.
  - NOTE: This notebook assumes you have already trained social context embeddings for your data (in this example, the social context is time).
  - It is a runnable Colab notebook that demonstrates the entire process of training and evaluating LMSOC as described in the paper.
  - If run, it will reproduce the experimental setup for Task 1 and ultimately yield Figure 2.
  - To run this notebook in Colab, open it in Google Colab and upload the files in the data/ directory to your Colab workspace.
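For intuition before opening the embedding notebook, here is a minimal, hypothetical sketch of one way to learn dense time embeddings with graph representation learning: connect adjacent years in a chain graph, run short random walks over it, and fit skip-gram embeddings on the walks so that nearby years receive similar vectors. The graph construction, walk parameters, and dimensions below are illustrative, not the notebook's actual settings.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

# Chain graph over years: adjacent years are neighbors, so random walks
# keep nearby years in shared contexts and their embeddings end up close.
years = [str(y) for y in range(1990, 2021)]
graph = nx.path_graph(years)

def random_walks(g, num_walks=50, walk_len=10):
    walks = []
    for _ in range(num_walks):
        for node in g.nodes:
            walk = [node]
            for _ in range(walk_len - 1):
                walk.append(random.choice(list(g.neighbors(walk[-1]))))
            walks.append(walk)
    return walks

# Skip-gram (sg=1) over the walks yields a dense vector per year.
model = Word2Vec(random_walks(graph), vector_size=64, window=5, min_count=1, sg=1)
context_vector_2005 = model.wv["2005"]  # social-context embedding for the year 2005
```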
Security Issues?
Please report sensitive security issues via Twitter's bug-bounty program (https://hackerone.com/twitter) rather than on GitHub.