kawadi is a versatile tool that is used as a weapon and to cut, shape and split wood.

Jay Vala

Last update: Jan 10, 2022


Overview

kawadi

kawadi (કવાડિ in Gujarati, axe in English) is a versatile tool that is used as a weapon and to cut, shape and split wood.

kawadi is a collection of small tools that I often find useful. Currently it contains text search, which searches for a string inside another string.

Text Search

Text search in kawadi uses a sliding window technique to search for a word or phrase in another text. The window moves with a step size of 1, and the window size is the number of words in the word/phrase we are interested in.

For example, suppose the text we want to search is "The big brown fox jumped over the lazy dog" and the phrase we want to find is "brown fox". The search proceeds as follows:


text = "The big brown fox jumped over the lazy dog"
interested_word = "brown fox"
window_size = len(interested_word.split()) -> len(["brown", "fox"]) -> 2

slides = sliding_window(text, window_size) -> ['The', 'big'], ['big', 'brown'], ['brown', 'fox'], ['fox', 'jumped'], ['jumped', 'over'], ['over', 'the'], ['the', 'lazy'], ['lazy', 'dog']

for each slide in slides
  slide_score = score(" ".join(slide), interested_word)
  if slide_score >= threshold then
    select slide
  else
    continue
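The pseudocode above can be turned into a small runnable sketch. kawadi's own scoring is not reproduced here: as a stand-in, `score` uses the standard library's `difflib.SequenceMatcher.ratio()` on lowercased text, and the names `sliding_window`, `score` and `search` are illustrative, not kawadi's API.

```python
from difflib import SequenceMatcher


def sliding_window(text: str, window_size: int) -> list[list[str]]:
    # Step size is 1: each window starts one word after the previous one.
    words = text.split()
    return [words[i:i + window_size] for i in range(len(words) - window_size + 1)]


def score(candidate: str, target: str) -> float:
    # Stand-in similarity (assumption); kawadi averages several metrics instead.
    return SequenceMatcher(None, candidate.lower(), target.lower()).ratio()


def search(text: str, interested_word: str, threshold: float = 0.9) -> list[tuple[str, float]]:
    # Window size = number of words in the phrase we are looking for.
    window_size = len(interested_word.split())
    selected = []
    for slide in sliding_window(text, window_size):
        candidate = " ".join(slide)
        slide_score = score(candidate, interested_word)
        if slide_score >= threshold:
            selected.append((candidate, slide_score))
    return selected


print(search("The big brown fox jumped over the lazy dog", "brown fox"))
```

Only the `['brown', 'fox']` window clears the 0.9 threshold here; windows that merely share one word, such as "big brown", score well below it.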

Currently, three similarity scores are calculated and averaged to produce the final score: cosine, Jaro-Winkler and normalized Levenshtein similarity.
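To illustrate how three similarities can be combined into one score, here is a pure-Python sketch of each metric and their plain average. It is not kawadi's code: its implementations and parameters may differ, and computing cosine similarity over character bigram counts is an assumption made for this example.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance (insert, delete, substitute each cost 1).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def normalized_levenshtein_similarity(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaro_winkler_similarity(a: str, b: str, p: float = 0.1) -> float:
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_flags, b_flags = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        # A character matches if it appears in b within the match window.
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_flags[j] and b[j] == ca:
                a_flags[i] = b_flags[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    a_matched = [c for c, f in zip(a, a_flags) if f]
    b_matched = [c for c, f in zip(b, b_flags) if f]
    transpositions = sum(x != y for x, y in zip(a_matched, b_matched)) / 2
    jaro = (matches / len(a) + matches / len(b) + (matches - transpositions) / matches) / 3
    # Winkler boost for a shared prefix of up to 4 characters.
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)


def cosine_similarity(a: str, b: str, k: int = 2) -> float:
    # Cosine of character k-gram count vectors (k=2 is an assumption).
    va = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    vb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    dot = sum(va[g] * vb[g] for g in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def combined_score(a: str, b: str) -> float:
    # Plain average of the three similarities, each in [0, 1].
    scores = [
        cosine_similarity(a, b),
        jaro_winkler_similarity(a, b),
        normalized_levenshtein_similarity(a, b),
    ]
    return sum(scores) / len(scores)


print(round(combined_score("brown fox", "brown fox"), 6))
```

Averaging lets the metrics compensate for each other: edit distance punishes small typos less than exact matching would, while the bigram cosine is insensitive to word order.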

In development

[x] Add functionality to accept custom user similarity metrics.
[ ] Generate documentation.
[ ] Write the custom counter.

You can follow the project development in the Projects tab.

Quick Start

from kawadi.text_search import SearchInText

search = SearchInText()

text_to_find = "String distance algorithm"
text_to_search = """SIFT4 is a general purpose string distance algorithm inspired by JaroWinkler and Longest Common Subsequence. It was developed to produce a distance measure that matches as close as possible to the human perception of string distance. Hence it takes into account elements like character substitution, character distance, longest common subsequence etc. It was developed using experimental testing, and without theoretical background."""

result = search.find_in_text(text_to_find, text_to_search)

print(result)
[
    {
        "sim_score": 1.0,
        "searched_text": "string distance algorithm",
        "to_find": "string distance algorithm",
        "start": 27,
        "end": 52,
    }
]
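The `start` and `end` fields in the example result look like character offsets into `text_to_search`, with `end` exclusive, so slicing the text with them recovers the matched phrase. This is an inference from the example output above, not documented behaviour; the sketch below just checks it against the first sentence of the Quick Start text.

```python
# First sentence of text_to_search from the Quick Start; the offsets in the
# example result fall inside it, so the rest of the text is omitted here.
text_to_search = (
    "SIFT4 is a general purpose string distance algorithm inspired by "
    "JaroWinkler and Longest Common Subsequence."
)

# Values copied from the example result above.
result = {"start": 27, "end": 52, "searched_text": "string distance algorithm"}

# Assumption: start/end are Python-style character offsets (end exclusive).
span = text_to_search[result["start"]:result["end"]]
print(span)
print(span.lower() == result["searched_text"])
```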

If the text to be searched is large, SearchInText can use multiprocessing to speed up the search.

from kawadi.text_search import SearchInText

search = SearchInText(multiprocessing=True, max_workers=8)
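One way to parallelise a sliding-window search is to build the list of windows, split it into contiguous chunks, and score each chunk in a separate process. The sketch below does this with `concurrent.futures.ProcessPoolExecutor`; it is a hypothetical illustration of the idea, not kawadi's internals. `chunk`, `score_chunk` and `parallel_search` are illustrative names, and `difflib`'s ratio again stands in for kawadi's score.

```python
from concurrent.futures import ProcessPoolExecutor
from difflib import SequenceMatcher
from functools import partial


def chunk(items: list, n: int) -> list[list]:
    # Split items into at most n contiguous, nearly equal parts (n >= 1).
    size, extra = divmod(len(items), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        if end > start:
            parts.append(items[start:end])
        start = end
    return parts


def score_chunk(windows: list[str], target: str, threshold: float) -> list[tuple[str, float]]:
    # Runs inside a worker process: score every window in this chunk.
    hits = []
    for window in windows:
        s = SequenceMatcher(None, window.lower(), target.lower()).ratio()
        if s >= threshold:
            hits.append((window, s))
    return hits


def parallel_search(text: str, target: str, threshold: float = 0.9,
                    max_workers: int = 8) -> list[tuple[str, float]]:
    words = text.split()
    k = len(target.split())
    windows = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    if not windows:
        return []
    # partial of a top-level function is picklable, so it can go to workers.
    worker = partial(score_chunk, target=target, threshold=threshold)
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps chunk order, so hits come back in text order.
        return [hit for hits in pool.map(worker, chunk(windows, max_workers)) for hit in hits]
```

Because worker processes may re-import the calling module, call `parallel_search` from under an `if __name__ == "__main__":` guard. Splitting the list of windows rather than the raw text means no phrase is lost at a chunk boundary.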

Custom user-defined score calculation

It's often the case that the provided string similarity scores are not enough for your use case. For this, you can add your own score calculation.

from kawadi.text_search import SearchInText


def my_custom_fun(**kwargs) -> float:
    slide_of_text: str = kwargs["slide_of_text"]
    text_to_find: str = kwargs["text_to_find"]

    # Here you can do any preprocessing you like,
    # or use them to compute, e.g., character n-gram matching scores.
    # A trivial case-insensitive exact match is used as a placeholder.
    score = 1.0 if slide_of_text.lower() == text_to_find.lower() else 0.0

    return score


search = SearchInText(search_threshold=0.9, custom_score_func=my_custom_fun)

The custom score function receives two keyword arguments: slide_of_text, the text of each slide (from the example above, "The big", "big brown" and so on), and text_to_find.

Note: The custom function should return a value of the same type as search_threshold (a float in the example above).
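As one possible custom score of the character n-gram kind mentioned in the comment above, here is a character-bigram Dice coefficient written against the keyword-argument interface just described. It does not import kawadi, so it runs on its own; passing it as `custom_score_func` is shown as a comment and assumes the interface behaves as described. The function name `bigram_dice_score` is hypothetical.

```python
from collections import Counter


def bigram_dice_score(**kwargs) -> float:
    # The two keyword arguments a custom score function receives.
    slide_of_text: str = kwargs["slide_of_text"].lower()
    text_to_find: str = kwargs["text_to_find"].lower()

    def bigrams(s: str) -> Counter:
        return Counter(s[i:i + 2] for i in range(len(s) - 1))

    a, b = bigrams(slide_of_text), bigrams(text_to_find)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        # Both strings shorter than 2 characters: fall back to exact match.
        return float(slide_of_text == text_to_find)
    overlap = sum((a & b).values())  # shared bigrams, counted with multiplicity
    return 2.0 * overlap / total  # a float, matching a float search_threshold


# search = SearchInText(search_threshold=0.9, custom_score_func=bigram_dice_score)
print(bigram_dice_score(slide_of_text="Brown Fox", text_to_find="brown fox"))
```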

Installation

Stable Release: pip install kawadi
Development Head: pip install git+https://github.com/jdvala/kawadi.git

Development

See CONTRIBUTING.md for information related to developing the code.

Free software: MIT license

Comments

:sparkles: Accept custom user defined score functions
This pull request adds a feature to accept a user-defined custom score function for calculating string similarity.

Pull request recommendations:

[x] Name your pull request your-development-type/short-description. Ex: feature/read-tiff-files

[ ] Link to any relevant issue in the PR description. Ex: Resolves [gh-12], adds tiff file format support

[x] Provide context of changes.

[x] Provide relevant tests for your feature or bug fix.

[x] Provide or update documentation for any feature added by your pull request.

Thanks for contributing!
opened by jdvala

Releases(0.0.2)

0.0.2(Oct 31, 2021)
What's Changed

:memo: Update README.md by @jdvala in https://github.com/jdvala/kawadi/pull/1

:memo: Fix README.md by @jdvala in https://github.com/jdvala/kawadi/pull/2

:memo: Rename repo to Kawadi by @jdvala in https://github.com/jdvala/kawadi/pull/3

:sparkles: Accept custom user defined score functions by @jdvala in https://github.com/jdvala/kawadi/pull/4

:memo: update setup.py to read README for long description. by @jdvala in https://github.com/jdvala/kawadi/pull/5

New Contributors

@jdvala made their first contribution in https://github.com/jdvala/kawadi/pull/1

Full Changelog: https://github.com/jdvala/kawadi/commits/0.0.2
0.0.1(Oct 31, 2021)
What's Changed

:memo: Update README.md by @jdvala in https://github.com/jdvala/kawadi/pull/1

:memo: Fix README.md by @jdvala in https://github.com/jdvala/kawadi/pull/2

:memo: Rename repo to Kawadi by @jdvala in https://github.com/jdvala/kawadi/pull/3

:sparkles: Accept custom user defined score functions by @jdvala in https://github.com/jdvala/kawadi/pull/4

:memo: update setup.py to read README for long description. by @jdvala in https://github.com/jdvala/kawadi/pull/5

New Contributors

@jdvala made their first contribution in https://github.com/jdvala/kawadi/pull/1

Full Changelog: https://github.com/jdvala/kawadi/commits/0.0.1
