Natural Language Processing - Summer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken in the Bachelor Program Data and Information Science (DIS25a) or the Master Program Digital Sciences (NLP).

After Easter, all sessions are held at TH Köln, Claudiusstraße 1. The sessions will be held live. Slides will usually be available the night before the lecture. We try to record all lectures and tutorials for later reference (we are not yet sure how this will work out for the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date | Slot 13:30h | Slot 15:15h | DIS25a (DIS B.Sc.) | NLP (DS M.Sc.)
1.4.2022 | Introduction and Overview (L) | Basic Text Processing (L) | x | x
8.4.2022 | Basic NLP Pipeline: NLTK (T) (solution) | Common Toolkit: spaCy (T) (solution) | x | x
15.4.2022 | no lecture | | |
22.4.2022 | WordNet (L) | Vector Semantics (L) | x | x
29.4.2022 | WordNet, GermaNet (T) (solution) | Vector Semantics (T) (solution) | x | x
6.5.2022 | Information Extraction (L) | Sentiment Analysis (L) | x | x
13.5.2022 | no lecture | | |
20.5.2022 | Language Models and Ethics in NLP (L) | Group assignment (P) | x | x
27.5.2022 | Group work (P) | Group work (P) | x |
3.6.2022 | Data Programming for IE (L) | Group work (P) / Oral Exam Master | x | x
10.6.2022 | Guest Lecture: Dimitar Dimitrov (L) | Group work (P) | x |
17.6.2022 | Group work (P) | Group work (P) | x |
24.6.2022 | Student talks - Project presentation (P) | Student talks - Project presentation (P) | x |
31.8.2022 | Submission of term papers | | x |

Bachelor: Group Assignments

In the group assignments, each group of four students has to work on a bias-related topic with a specific focus, using one of three datasets. During the group work phases starting on 20.5.2022, we will be available during lecture time to help and advise.

In the presentations on 24.6.2022, you are expected to present a concept for your specific topic and dataset. Please describe the motivation, the dataset, your methods and NLP pipeline, a working prototype, and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topics:

Gender Bias

Gender bias is a group bias in which different genders are represented differently than expected with respect to some aspect of a given document or set of documents. The aspects in which such a bias can occur range from quantitative measures (e.g., how many documents have male or female authors) to more complex NLP measures (e.g., different sentiment in texts about male and female politicians, or topical bias, i.e., different distributions of topics in texts aimed at male and female readers).
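The simplest quantitative measure named above, the share of documents by male and female authors, can be compared with an expected share. The following Python sketch assumes a hypothetical list of author labels (one per document) and an expected 50/50 split; the `representation_ratio` helper and the data are illustrative only and are not taken from any of the course datasets.

```python
from collections import Counter


def representation_ratio(genders, expected):
    """Return observed share / expected share for each group.

    genders: iterable of group labels, one per document (hypothetical data).
    expected: dict mapping each group label to its expected share (sums to 1).
    A ratio of 1.0 means the group appears exactly as often as expected;
    values below 1.0 indicate under-representation, above 1.0 over-representation.
    """
    counts = Counter(genders)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no documents given")
    return {
        group: (counts.get(group, 0) / total) / share
        for group, share in expected.items()
    }


# Hypothetical author labels for eight documents.
authors = ["male", "male", "female", "male", "male", "female", "male", "male"]
ratios = representation_ratio(authors, {"male": 0.5, "female": 0.5})
print(ratios)  # {'male': 1.5, 'female': 0.5}
```

A ratio far from 1.0 only hints at a bias; with real data you would also check whether the deviation is statistically significant, for example with a chi-squared test.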

Examples of papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnic (or religious) group. Ethnic bias includes harmful stereotypes as well as less blatant but still dangerous aspects such as topical bias. Detecting ethnic bias is important not only because it may lead to even more severe instances of racism, but also because it infringes the constitutional right to equal treatment.

Examples of papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language covers many aspects of language that are subjective, opinionated, or otherwise imply a value judgment. This ranges from toxicity, including forms of hate speech such as racism, incivility, and profane, offensive, or aggressive language, to over-positive praise. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as Wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.
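Because framing bias is defined through subjective words, a first rough signal is the share of tokens that occur in a subjectivity lexicon. The sketch below assumes a tiny hand-picked word list (`SUBJECTIVE`, hypothetical) in place of an established resource and uses a naive regex tokenizer; it only illustrates the idea and is not a validated measure.

```python
import re

# Hypothetical mini-lexicon of subjective words; a real project would load an
# established subjectivity lexicon instead of this hand-picked set.
SUBJECTIVE = {"terrible", "brilliant", "disgraceful", "heroic", "so-called", "outrageous"}


def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping hyphenated words."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def subjectivity_score(text):
    """Return the fraction of tokens that appear in the subjectivity lexicon."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in SUBJECTIVE)
    return hits / len(tokens)


neutral = "The council approved the budget on Monday."
loaded = "The so-called council approved a disgraceful budget."
print(subjectivity_score(neutral))            # 0.0
print(round(subjectivity_score(loaded), 3))   # 0.286 (2 of 7 tokens)
```

Comparing such scores between document types that claim to be neutral (e.g., news) and clearly opinionated ones gives a first sanity check before moving on to trained classifiers.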

Examples of papers that investigate non-neutral language:

Stance Detection

Stance describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stance of users or authors towards these subjects. Often, the subjects are known from the context (for example, abortion, gun laws, and gay marriage in political texts); otherwise they have to be determined using approaches such as entity recognition. A related concept is target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.
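Target-dependent sentiment can be approximated, very roughly, by scoring only the opinion words near a mention of the target. The sketch below is a toy window-based heuristic with a hypothetical polarity lexicon (`POLARITY`) and an assumed window size; it is not a stance detection model, but it shows how one sentence can express different opinions towards different targets.

```python
import re

# Hypothetical polarity lexicon: +1 for positive, -1 for negative words.
POLARITY = {"support": 1, "good": 1, "protect": 1, "oppose": -1, "bad": -1, "dangerous": -1}


def target_sentiment(text, target, window=2):
    """Sum the polarity of lexicon words within `window` tokens of each target mention.

    Returns a positive number for a favourable context, a negative number for an
    unfavourable one, and 0 if the target is absent or no opinion word is nearby.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            score += sum(POLARITY.get(t, 0) for t in tokens[lo:hi] if t != target)
    return score


sentence = "We support renewables but nuclear is dangerous"
print(target_sentiment(sentence, "renewables"))  # 1
print(target_sentiment(sentence, "nuclear"))     # -1
```

Trained stance models replace the lexicon and the fixed window with a classifier conditioned on the target, but the input is the same pair of text and target.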

Examples of papers that investigate stance detection:

Comments
  • Error in "Schedule for 24 June 2022"

    Unfortunately, while analyzing the schedule, our group just noticed that our group name was mistakenly written with whitespace. This can, of course, lead to problems when reading it in.

    14:15 Information Revival Group

    We would recommend Information_Revival_Group.

    Many thanks and best regards, the Information_Revival_Group

    opened by PapaDuck
  • The perks of using Windows - The Germanet Problem

    If you are using a Windows computer, there are two things you have to consider (and maybe change in your code):

    1. When importing GermaNet through pathlib, you might get exceptions just because Path cannot read a Windows path. Instead, use WindowsPath from pathlib, copy the path from your PC (you might have to change the \ into / as well) and voilà!
    from pathlib import WindowsPath
    from germanetpy.germanet import Germanet
    # Paths copied from the local GermaNet download, with / instead of \
    data_path = str(WindowsPath('c:/Users/svcsc/Downloads/GN_V170/GN_V170/GN_V170_XML'))
    frequencylist_nouns = str(WindowsPath('c:/Users/svcsc/Downloads/GN_V170/GN_V170/FreqLists/noun_freqs_decow14_16.txt'))
    gn = Germanet(data_path)

    2. If you have installed and imported pandarallel, it should normally work. Unfortunately, that is not always the case. So I changed it to apply:
    single["synonyms"] = single.apply(lambda row: gn.get_synsets_by_orthform(row["suggestion_ger"], ignorecase=True), axis=1)
    After that, your code should work just fine.

    Have a nice weekend :)

    opened by SvenjaCSch
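One possible cause of the exceptions described in the Windows comment, assuming the path was pasted from Explorer with backslashes into a normal string literal, is that sequences such as `\U` in `c:\Users` are read as escape codes. The sketch below uses `pathlib.PureWindowsPath`, which does no file system access and therefore runs on any operating system, to show that a raw string and the forward-slash form describe the same path; the directory is simply the example path from that comment.

```python
from pathlib import PureWindowsPath

# Raw string (r"...") so the backslashes are kept literally; in a normal string,
# "\U" in "c:\Users" would be parsed as a Unicode escape and fail to compile.
pasted = r"c:\Users\svcsc\Downloads\GN_V170\GN_V170\GN_V170_XML"
forward = "c:/Users/svcsc/Downloads/GN_V170/GN_V170/GN_V170_XML"

# PureWindowsPath only manipulates the path string; it never touches the disk.
p = PureWindowsPath(pasted)
print(p == PureWindowsPath(forward))  # True: both spellings name the same path
print(p.as_posix())                   # c:/Users/svcsc/Downloads/GN_V170/GN_V170/GN_V170_XML
print(p.name)                         # GN_V170_XML
```

Passing `str(p)` or the raw string directly should then work like the `WindowsPath` workaround, assuming germanetpy accepts a plain string path as in the snippet above.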
Owner
Classrooms of IR Group at Technische Hochschule Köln