Quantifiers and Negations in RE Documents

Nicolas Ruscher

Last update: Feb 1, 2022

Overview

This project was part of my work for a seminar at the Technical University of Munich (TUM) during my bachelor's studies in 2019. The Python project finds quantifiers and negations in documents and searches for problematic findings, that is, sentences that combine quantifiers and negations in a way that is ambiguous, so that the sentence has multiple valid interpretations. It extracts these sentences and reports them.

Motivation:

You want to avoid ambiguous sentences because they can cause problems that are hard to find and possibly hard to fix. This is especially true for technical specifications and similar documents. This project compares two approaches to finding ambiguous sentences:

String-based search
NLP-based search

We want to find out whether NLP-based search gives better results than standard string-based search methods, and thus whether its computational overhead is justified.
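
To make the string-based approach concrete, here is a minimal sketch in Python. It is not the project's actual implementation: the word lists, the naive sentence splitter, and the function names split_sentences and find_candidates are assumptions made for illustration. It flags every sentence that contains at least one quantifier and at least one negation, such as "All users cannot access the system", which can mean either that no user has access or that not every user has access.

```python
import re

# Illustrative word lists (assumptions, not the project's actual lists).
QUANTIFIERS = {"all", "every", "each", "any"}
NEGATIONS = {"not", "no", "never", "cannot"}


def split_sentences(text):
    """Naively split text at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_candidates(text):
    """Return sentences that contain both a quantifier and a negation."""
    candidates = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        words = set(re.findall(r"[a-z]+", lowered))
        has_quantifier = bool(words & QUANTIFIERS)
        # Contractions such as "doesn't" are caught by the "n't" check.
        has_negation = bool(words & NEGATIONS) or "n't" in lowered
        if has_quantifier and has_negation:
            candidates.append(sentence)
    return candidates


if __name__ == "__main__":
    sample = (
        "All users cannot access the system. "
        "The system shall log every request."
    )
    for hit in find_candidates(sample):
        print(hit)
```

A pure keyword match like this cannot tell whether the negation actually interacts with the quantifier, which is the kind of question an NLP-based search may be able to answer at a higher computational cost.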

Features:

Detect quantifiers and negations in .xml or .txt documents
Search with either a string-based or an NLP-based approach (using Stanford's CoreNLP library [1])
Extract possibly ambiguous sentences
Compare string search results with NLP search results
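
The comparison of the two result sets could look roughly like the following sketch. The function name compare_findings and the dictionary keys are hypothetical, and the sketch assumes that both searches return their findings as lists of sentence strings; the project may represent and compare its results differently.

```python
def compare_findings(string_hits, nlp_hits):
    """Group findings by which search method reported them.

    Both arguments are assumed to be iterables of sentence strings.
    """
    string_set, nlp_set = set(string_hits), set(nlp_hits)
    return {
        "both": sorted(string_set & nlp_set),
        "string_only": sorted(string_set - nlp_set),
        "nlp_only": sorted(nlp_set - string_set),
    }


if __name__ == "__main__":
    report = compare_findings(
        ["All users cannot log in.", "No user sees every page."],
        ["All users cannot log in."],
    )
    for group, sentences in report.items():
        print(group, len(sentences))
```

Sentences found by only one of the methods are the interesting cases when judging whether the NLP overhead pays off.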

Prerequisites:

Java 8 or higher
Python 3.6 or higher as project interpreter
Stanford CoreNLP library: https://stanfordnlp.github.io/CoreNLP/download.html
Environment variable "CORENLP_HOME" set to where the CoreNLP library is stored
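
Before running the NLP-based search, it can help to verify the prerequisites above. The following is a small sketch of such a check, not part of the project itself; the function name check_prerequisites and the wording of the messages are assumptions. It only checks that a java executable is on the PATH, not that it is version 8 or higher.

```python
import os
import shutil
import sys


def check_prerequisites(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []

    if sys.version_info < (3, 6):
        problems.append("Python 3.6 or higher is required")

    # Only checks that a 'java' executable exists, not its version.
    if shutil.which("java") is None:
        problems.append("no 'java' executable found on PATH")

    home = env.get("CORENLP_HOME")
    if not home:
        problems.append("CORENLP_HOME is not set")
    elif not os.path.isdir(home):
        problems.append("CORENLP_HOME does not point to a directory: " + home)

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Problem:", problem)
```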

References:

[1] Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60, 2014.
