A production-ready pipeline for text mining and subject indexing

Overview

DocuMiner

Want to Contribute?

More code and documentation coming soon.

Authors

Open Source Club

Comments
  • Add sentiment analysis functionality

    Several functions take text of various lengths and return a subjectivity score as a decimal. Example strings are included to demonstrate the functions.

    • sentenceSubjectivity: given a string, returns a subjectivity score for that string using the TextBlob library.
    • averageSujectivity: given a list of strings, calls sentenceSubjectivity on each one and returns the average score.
    opened by MesaJonathan 3
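
The two functions described above could look roughly like the sketch below. It is not the code from the issue: the snake_case names and the `scorer` parameter are hypothetical, added here so the averaging logic can be exercised without TextBlob installed. The only TextBlob call assumed is `TextBlob(text).sentiment.subjectivity`, which ranges from 0.0 (objective) to 1.0 (subjective).

```python
from statistics import mean
from typing import Callable, Iterable


def sentence_subjectivity(text: str) -> float:
    """Return TextBlob's subjectivity score for one string (0.0 to 1.0)."""
    # Imported lazily so the module loads even without `pip install textblob`.
    from textblob import TextBlob

    return TextBlob(text).sentiment.subjectivity


def average_subjectivity(
    texts: Iterable[str],
    scorer: Callable[[str], float] = sentence_subjectivity,  # hypothetical hook
) -> float:
    """Average the per-string subjectivity scores of several strings."""
    scores = [scorer(text) for text in texts]
    if not scores:
        raise ValueError("average_subjectivity needs at least one string")
    return mean(scores)
```

Calling `average_subjectivity(["I love this.", "Water boils at 100 C."])` would score each string with TextBlob and average the results.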
  • Notifications

    Description

    Notify users by text message or email when their uploaded stash of documents changes. This is intended for group contributions, where more than one person can add documents to, or delete documents from, an initial stash.

    Objectives

    1. Find a suitable Python API (highly recommend Twilio).
    2. Test a single SMS message.
    3. Test a single email.
    enhancement 
    opened by Fennec2000GH 0
  • Optical Character Recognition

    Description

    Perform OCR on images of text to recognize and transform the text into digital format.

    Objectives

    1. Familiarize yourself with the functions of an OCR library, e.g. pytesseract.
    2. Write a wrapper function that converts the image to grayscale and then calls the appropriate OCR function.
    3. Optional: add further image preprocessing steps, such as denoising, if they improve OCR accuracy.
    enhancement 
    opened by Fennec2000GH 0
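
The wrapper from objective 2 might be sketched as follows, assuming Pillow for image handling and pytesseract (which also needs the Tesseract binary installed) for recognition. The function names and the `recognize` parameter are hypothetical; the parameter exists only so the wrapper can be tried with a stand-in recognizer.

```python
from typing import Any, Callable, Optional


def load_image(path: str) -> Any:
    """Open an image file with Pillow (requires `pip install pillow`)."""
    from PIL import Image

    return Image.open(path)


def image_to_text(image: Any, recognize: Optional[Callable[[Any], str]] = None) -> str:
    """Grayscale a Pillow-style image, then run OCR on it."""
    # "L" is Pillow's 8-bit grayscale mode.
    gray = image.convert("L")
    if recognize is None:
        # Default recognizer; imported lazily so pytesseract stays optional.
        import pytesseract

        recognize = pytesseract.image_to_string
    return recognize(gray).strip()
```

Typical use would be `image_to_text(load_image("scan.png"))`, where `scan.png` is a placeholder path. Extra preprocessing such as denoising could be added between the grayscale step and the recognizer call.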
  • Frontend UI

    Description

    Create basic frontend UI with widgets for file upload.

    Objectives

    • [x] pip install streamlit
    • [ ] Add title for project with a logo.
    • [x] Add file upload widget with label. Allowed extensions are currently .txt, .doc, .docx, and .pdf.
    • [x] Add button for wiki creation.
    good first issue 
    opened by Fennec2000GH 0
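
The checklist above roughly corresponds to a Streamlit script like the sketch below. The widget labels, the `render` and `is_allowed` names, and the placeholder behavior of the wiki button are assumptions; the logo item is left out because the project's logo file is not known.

```python
from pathlib import Path

# Extensions listed in the issue, without the leading dot.
ALLOWED_EXTENSIONS = ["txt", "doc", "docx", "pdf"]


def is_allowed(filename: str) -> bool:
    """Check a file name against the allowed extensions, ignoring case."""
    return Path(filename).suffix.lower().lstrip(".") in ALLOWED_EXTENSIONS


def render() -> None:
    """Draw the title, the upload widget, and the wiki-creation button."""
    # Imported lazily; requires `pip install streamlit`.
    import streamlit as st

    st.title("DocuMiner")
    uploaded = st.file_uploader(
        "Upload documents",
        type=ALLOWED_EXTENSIONS,
        accept_multiple_files=True,
    )
    if st.button("Create wiki"):
        # Placeholder action: the real wiki-creation step is not shown here.
        names = [f.name for f in (uploaded or []) if is_allowed(f.name)]
        st.write(f"Received {len(names)} file(s): {', '.join(names)}")
```

To try it, save the block as a script (for example `app.py`, a placeholder name), add a final line that calls `render()`, and start it with `streamlit run app.py`.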
  • Keyword Mining

    Description

    Extract keywords from a document.

    Objectives

    Use a stable library such as KeyBERT or an established (and sometimes simpler) algorithm like TF-IDF.

    opened by Fennec2000GH 0
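
As an illustration of the TF-IDF option, here is a small standard-library sketch. It uses one common variant (term frequency normalized by document length, times the natural log of N over document frequency); the tokenizer and function names are hypothetical, and a library such as scikit-learn or KeyBERT would likely be preferable in production.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_scores(docs: list[str], index: int) -> dict[str, float]:
    """Score every word of docs[index] by TF-IDF against the whole corpus."""
    tokenized = [tokenize(doc) for doc in docs]
    # Document frequency: in how many documents each word appears.
    doc_freq: Counter[str] = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    target = tokenized[index]
    counts = Counter(target)
    n_docs = len(docs)
    return {
        word: (count / len(target)) * math.log(n_docs / doc_freq[word])
        for word, count in counts.items()
    }


def top_keywords(docs: list[str], index: int, k: int = 5) -> list[str]:
    """Return the k highest-scoring words, breaking ties alphabetically."""
    scores = tfidf_scores(docs, index)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
print(top_keywords(docs, 0, k=2))  # ['mat', 'cat']
```

Words that occur in every document, such as "the" here, get a score of zero, which is why they never surface as keywords.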
Owner
UF Open Source Club
Open Source Club at the University of Florida
Text to ASCII and ASCII to text

Text2ASCII Description This python script (converter.py) contains two functions: encode() is used to return a list of Integer, one item per character

null 4 Jan 22, 2022
A Python app which can convert normal text to Handwritten text.

Text to HandWritten Text ✍️ Converter Watch Tutorial for this project Usage:- Clone my repository. Open CMD in working directory. Run following comman

Kushal Bhavsar 5 Dec 11, 2022
strbind - lapidary text converter for translating a text file into a C-style string

strbind strbind - lapidary text converter for translating a text file into a C-style string. My motivation is fast adding large text chunks to the C co

Mihail Zaytsev 1 Oct 22, 2021
A python tool to convert Bangla Bijoy text to Unicode text.

Unicode Converter A python tool to convert Bangla Bijoy text to Unicode text. Installation Unicode Converter can be installed via PyPi. Make sure pip

Shahad Mahmud 10 Sep 29, 2022
TextStatistics - Get a text file which contains English text

TextStatistics This program gets a text file which contains English text. The program analyses the text and prints some information. For this program I

null 2 Nov 15, 2021
Redlines produces a Markdown text showing the differences between two strings/text

Redlines Redlines produces a Markdown text showing the differences between two strings/text. The changes are represented with strike-throughs and unde

Houfu Ang 2 Apr 8, 2022
Markup is an online annotation tool that can be used to transform unstructured documents into structured formats for NLP and ML tasks, such as named-entity recognition. Markup learns as you annotate in order to predict and suggest complex annotations. Markup also provides integrated access to existing and custom ontologies, enabling the prediction and suggestion of ontology mappings based on the text you're annotating.

Samuel Dobbie 146 Dec 18, 2022
🐸 Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is! 🧙‍♀️

Brandon 5.6k Jan 3, 2023
Fixes mojibake and other glitches in Unicode text, after the fact.

ftfy: fixes text for you >>> print(fix_encoding("(ง'⌣')ง")) (ง'⌣')ง Full documentation: https://ftfy.readthedocs.org Testimonials “My life is li

Luminoso Technologies, Inc. 3.4k Jan 8, 2023
a python package that lets you add custom colors and text formatting to your scripts in a very easy way!

colormate Python script text formatting package What is colormate? colormate is a python library that lets you add text formatting to your scripts, it

Rodrigo 2 Dec 14, 2022
Code Jam for creating a text-based adventure game engine and custom worlds

Text Based Adventure Jam Author: Devin McIntyre Our goal is two-fold: Create a text based adventure game engine that can parse a standard file format

HTTPChat 4 Dec 26, 2021
This is REST-API for Indonesian Text Summarization using Non-Negative Matrix Factorization for the algorithm to summarize documents and FastAPI for the framework.

Indonesian Text Summarization Using FastAPI This is REST-API for Indonesian Text Summarization using Non-Negative Matrix Factorization for the algorit

Viqi Nurhaqiqi 2 Nov 3, 2022
A program that looks through entered text and replaces certain commands with mathematical symbols

TextToSymbolConverter A program that looks through entered text and replaces certain commands with mathematical symbols Example: Syntax: Enter text in

null 1 Jan 2, 2022
Add your new words to a text file and get them randomly.

Memorize-New-Words In this very very very little project, I wrote code to help memorize new English words. Therefore you can add the words and their m

Mostafa 2 Jul 4, 2022
Convert text to Morse code and play the Morse code sound.

Convert text (English) to Morse code and play the Morse sound!

Mohammad Dori 5 Jul 15, 2022
A neat little program to read the text from the "All Ten Fingers" program, and write them back.

ATFTyper A neat little program to read the text from the "All Ten Fingers" program, and write them back. How does it work? This program uses the Pillo

null 1 Nov 26, 2021
Deasciify-highlighted - A Python script for deasciifying text to Turkish and copying it to the clipboard

deasciify-highlighted is a Python script for deasciifying text to Turkish and copying it to the clipboard.

Ümit Altıntaş 3 Mar 18, 2022
A python Tk GUI that creates, writes text and attaches images into a custom spreadsheet file

Mirko Simunovic 13 Dec 9, 2022
Paranoid text spacing in Python

pangu.py Paranoid text spacing for good readability, to automatically insert whitespace between CJK (Chinese, Japanese, Korean) and half-width charact

Vinta Chen 194 Nov 19, 2022