Data for "Driving the Herd: Search Engines as Content Influencers" paper

Overview

herding_data

Data for "Driving the Herd: Search Engines as Content Influencers" paper

Dataset description

The collection contains 2250 documents, 30 initial relevant documents (round 0) - located in initial_documents.trectext file. 2100 documents (rounds 1-5) created by competitors. 120 documents are the example documents that were manually promoted in the herding method experiments.

This dataset is divided w.r.t. the different experiments for content effect, described in the paper.

Format: trectext. DOCNO Format: ROUND- - -

Relevance Judgments (qrels):

All documents in the collection were judged for relevance. Annotators were presented with both the title and the description of each TREC topic and were asked to classify a document as relevant if it satisfies the information need stated in the description.

A document judged relevant by less than three annotators was labeled as non-relevant (0). Documents judged relevant by at least three, four or five annotators were labeled as marginally relevant (1), fairly relevant (2) and highly relevant (3), respectively. For each experiment the relevance judgment file has ".rel" suffix.

Quality judgements:

All documents in the collection where judged for quality by five annotators. Annotators were presented with the text of the document and were asked to classify the docuemnt as: (1) Valid, (2) Keyword-stuffed, (3) Spam.

A document is deemed as keyword-stuffed if it contained excessive repetition of words which seemed unnatural or artificially introduced.

A document is considered as spam if its content could not possibly satisfy any information need.

If a document is not spam or keywordstuffed, it is considered as valid. Documents judged valid by at least three, four or five annotators were labeled as marginally high-quality (1), fairly high-quality (2) and highly high-quality (3), respectively. For each experiment the quality judgment file has ".ks" suffix.

Queries

We used 30 of ClueWeb09 queries which can be downloded here: http://trec.nist.gov/data/webmain.html.

Example documents

In the herding method experiment for each query and effect an exapmle document, manifesting the desired content effect, was manually promoted to 1'st place. For each effect the example documents are located at "herding__example_documents.trectext" file. The format of document names is: DOCNO Format: ROUND-00- -EXAMPLEDOC

Subtopic effect experiment

This content effect was tested both in terms of herding and biasing approaches. For each query 2 different subtopics were tested. The subtopics were taken from ClueWeb09 subtopics list. The mapping between qid and the subtopic number which was promoted (and the actual information need manifested by the subtopic) is located at _subtopics_map.txt files (in each relevant directory separetly).

We include relevance judgemnts for each document (competing for a rankings w.r.t a query) w.r.t. to both subtopics promoted for the query. Please note that each document was tested w.r.t. a single subtopic (can be induced by the mapping file) during the experiment. The judgments are for both subtopics for analysis porpuses only. Relevance judgments w.r.t. subtopics name is " _relevance_to_subptopic.rel".

The qrels format is: " ".

Directories

Herding

Document_length_effect

The data contained in this directory is related to the documents created in the document length effect experiment (herding method).

Non_relevance_effect

The data contained in this directory is related to the documents created in the non-relevance effect experiment (herding method).

Query_terms_effect

The data contained in this directory is related to the documents created in the query terms effect experiment (herding method).

Subtopic_effect

The data contained in this directory is related to the documents created in the subtopic effect experiment (herding method).

Biasing

Subtopic_effect

The data contained in this directory is related to the documents created in the subtopic effect experiment (biasing method).

Control

The data contained in this directory is related to the documents created in the control group. That is, no expore of any kind of manipulation for this group.

Dummies

The data contained in this directory is related to the documents taken from Raifer et al '17 dataset. Dummies with docnos "DUMMY_{0,1}" where shared over all groups.

Control group and biasing groups where filled with DUMMY_2 dummies (in the docno) as well.

You might also like...
Code and data of the Fine-Grained R2R Dataset proposed in paper Sub-Instruction Aware Vision-and-Language Navigation

Fine-Grained R2R Code and data of the Fine-Grained R2R Dataset proposed in the EMNLP2020 paper Sub-Instruction Aware Vision-and-Language Navigation. C

Source codes for the paper "Local Additivity Based Data Augmentation for Semi-supervised NER"

LADA This repo contains codes for the following paper: Jiaao Chen*, Zhenghui Wang*, Ran Tian, Zichao Yang, Diyi Yang: Local Additivity Based Data Augm

Code and data for ACL2021 paper Cross-Lingual Abstractive Summarization with Limited Parallel Resources.

Multi-Task Framework for Cross-Lingual Abstractive Summarization (MCLAS) The code for ACL2021 paper Cross-Lingual Abstractive Summarization with Limit

Codes for our IJCAI21 paper: Dialogue Discourse-Aware Graph Model and Data Augmentation for Meeting Summarization
Codes for our IJCAI21 paper: Dialogue Discourse-Aware Graph Model and Data Augmentation for Meeting Summarization

DDAMS This is the pytorch code for our IJCAI 2021 paper Dialogue Discourse-Aware Graph Model and Data Augmentation for Meeting Summarization [Arxiv Pr

Code for the paper "Adapting Monolingual Models: Data can be Scarce when Language Similarity is High"

Wietse de Vries • Martijn Bartelds • Malvina Nissim • Martijn Wieling Adapting Monolingual Models: Data can be Scarce when Language Similarity is High

Code for CVPR 2021 oral paper
Code for CVPR 2021 oral paper "Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts"

Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts The rapid progress in 3D scene understanding has come with growing dem

The implementation of the algorithm in the paper "Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data" published in ICML 2020.

DS3L This is the code for paper "Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data" published in ICML 2020. Setups The code is implem

This repo contains the code and data used in the paper
This repo contains the code and data used in the paper "Wizard of Search Engine: Access to Information Through Conversations with Search Engines"

Wizard of Search Engine: Access to Information Through Conversations with Search Engines by Pengjie Ren, Zhongkun Liu, Xiaomeng Song, Hongtao Tian, Zh

Repository for the
Repository for the "Gotta Go Fast When Generating Data with Score-Based Models" paper

Gotta Go Fast When Generating Data with Score-Based Models This repo contains the official implementation for the paper Gotta Go Fast When Generating

Owner
null
[IJCAI-2021] A benchmark of data-free knowledge distillation from paper "Contrastive Model Inversion for Data-Free Knowledge Distillation"

DataFree A benchmark of data-free knowledge distillation from paper "Contrastive Model Inversion for Data-Free Knowledge Distillation" Authors: Gongfa

ZJU-VIPA 47 Jan 9, 2023
code for paper "Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning" by Zhongzheng Ren*, Raymond A. Yeh*, Alexander G. Schwing.

Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning Overview This code is for paper: Not All Unlabeled Data are Equa

Jason Ren 22 Nov 23, 2022
Automatically download the cwru data set, and then divide it into training data set and test data set

Automatically download the cwru data set, and then divide it into training data set and test data set.自动下载cwru数据集,然后分训练数据集和测试数据集

null 6 Jun 27, 2022
Inference code for "StylePeople: A Generative Model of Fullbody Human Avatars" paper. This code is for the part of the paper describing video-based avatars.

NeuralTextures This is repository with inference code for paper "StylePeople: A Generative Model of Fullbody Human Avatars" (CVPR21). This code is for

Visual Understanding Lab @ Samsung AI Center Moscow 18 Oct 6, 2022
Code for paper ECCV 2020 paper: Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop.

Who Left the Dogs Out? Evaluation and demo code for our ECCV 2020 paper: Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization

Benjamin Biggs 29 Dec 28, 2022
This is a Pytorch implementation of the paper: Self-Supervised Graph Transformer on Large-Scale Molecular Data.

This is a Pytorch implementation of the paper: Self-Supervised Graph Transformer on Large-Scale Molecular Data.

null 212 Dec 25, 2022
Code for the ECCV2020 paper "A Differentiable Recurrent Surface for Asynchronous Event-Based Data"

A Differentiable Recurrent Surface for Asynchronous Event-Based Data Code for the ECCV2020 paper "A Differentiable Recurrent Surface for Asynchronous

Marco Cannici 21 Oct 5, 2022
An attempt at the implementation of GLOM, Geoffrey Hinton's paper for emergent part-whole hierarchies from data

GLOM TensorFlow This Python package attempts to implement GLOM in TensorFlow, which allows advances made by several different groups transformers, neu

Rishit Dagli 32 Feb 21, 2022
Code, Data and Demo for Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting

InversePrompting Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting Code: The code is provided in the "chinese_ip"

THUDM 101 Dec 16, 2022
The repo contains the code of the ACL2020 paper `Dice Loss for Data-imbalanced NLP Tasks`

Dice Loss for NLP Tasks This repository contains code for Dice Loss for Data-imbalanced NLP Tasks at ACL2020. Setup Install Package Dependencies The c

null 223 Dec 17, 2022