My own Unicode compression algorithm

Overview

Zee Code

ZCode is a custom compression algorithm I originally developed for a competition held for the Spring 2019 Datastructures and Algorithms course of Dr. Mahdi Safarnejad-Boroujeni at Sharif University of Technology, at which I became first-place. The code is pretty slow and has a lot of room for optimization, but it is pretty readable. It can be an excellent educational resource for whoever is starting on compression algorithms.

The algorithm is a cocktail of classical compression algorithms mixed and served for Unicode documents. It hinges around the LZW algorithm to create a finite size symbol dictionary; the results are then byte-coded into variable-length custom symbols, which I call zee codes! Finally, the symbol table is truncated accordingly, and the compressed document is encoded into a byte stream.

Huffman trees highly inspire zee codes, but because in normal texts, symbols are usually much more uniformly distributed than the original geometrical (or exponential) distribution assumption for effective Huffman coding, the gains of using variable-sized byte-codes both from an implementation and performance perspective outweighed bit Huffman encodings. Results may vary, but my tests showed a steady ~4-5x compression ratio on Farsi texts, which is pretty nice!

Installation

ZCode is available on pip, and only requires a 3.6 or higher python installation beforehand.

pip install -U zcode

Usage

You can run the algorithm for any utf-8 encoded file using the zcode command. It will automatically decompress files ending with a .zee extensions and compress others into .zee files, but you can always override the default behavior by providing optional arguments like:

zcode INPUTFILE [--output OUTPUT_FILE --action compress/decompress --symbol-size SYMBOL_SIZE --code-size CODE_SIZE]

The symbol-size argument controls the algorithms' buffer size for processing symbols (in bytes). It is automatically set depending on your input file size but you can change it as you wish. code-size controls the maximum length for coded bytes while encoding symbols (this equals to 2 by default and needs to be provided to the algorithm upon decompression).

LICENSE

MIT LICENSE, see vahidzee/zcode/LICENSE

You might also like...
Search algorithm implementations meant for teaching

Search-py A collection of search algorithms for teaching and experimenting. Non-adversarial Search There’s a heavy separation of concerns which leads

8-puzzle-solver with UCS, ILS, IDA* algorithm
8-puzzle-solver with UCS, ILS, IDA* algorithm

Eight Puzzle 8-puzzle-solver with UCS, ILS, IDA* algorithm pre-usage requirements python3 python3-pip virtualenv prepare enviroment virtualenv -p pyth

A Python Package for Portfolio Optimization using the Critical Line Algorithm

A Python Package for Portfolio Optimization using the Critical Line Algorithm

🧬 Training the car to do self-parking using a genetic algorithm
🧬 Training the car to do self-parking using a genetic algorithm

🧬 Training the car to do self-parking using a genetic algorithm

A raw implementation of the nearest insertion algorithm to resolve TSP problems in a TXT format.

TSP-Nearest-Insertion A raw implementation of the nearest insertion algorithm to resolve TSP problems in a TXT format. Instructions Load a txt file wi

A GUI visualization of QuickSort algorithm
A GUI visualization of QuickSort algorithm

QQuickSort A simple GUI visualization of QuickSort algorithm. It only uses PySide6, it does not have any other external dependency. How to run Install

A python implementation of the Basic Photometric Stereo Algorithm
A python implementation of the Basic Photometric Stereo Algorithm

Photometric-Stereo A python implementation of the Basic Photometric Stereo Algorithm Result Usage run Photometric_Stereo.py Code Tree |data #原始数据,tga格

Python package to monitor the power consumption of any algorithm

CarbonAI This project aims at creating a python package that allows you to monitor the power consumption of any python function. Documentation The com

marching rectangles algorithm in python with clean code.

Marching Rectangles marching rectangles algorithm in python with clean code. Tools Python 3 EasyDraw Creators Mohammad Dori Run the Code Installation

Releases(v0.0.1)
Owner
Vahid Zehtab
Applied & Theoretical Deep Learning Researcher currently studying my last semester of Compute Engineering Bachelors program at Sharif University of Technology
Vahid Zehtab
A fast python implementation of the SimHash algorithm.

This Python package provides hashing algorithms for computing cohort ids of users based on their browsing history. As such, it may be used to compute cohort ids of users following Google's Federated Learning of Cohorts (FLoC) proposal.

Hybrid Theory 19 Dec 15, 2022
A genetic algorithm written in Python for educational purposes.

Genea: A Genetic Algorithm in Python Genea is a Genetic Algorithm written in Python, for educational purposes. I started writing it for fun, while lea

Dom De Felice 20 Jul 6, 2022
TikTok X-Gorgon & X-Khronos Generation Algorithm

TikTok X-Gorgon & X-Khronos Generation Algorithm X-Gorgon and X-Khronos headers are required to call tiktok api. I will provide you API as rental or s

TikTokMate 31 Dec 1, 2022
A minimal implementation of the IQRM interference flagging algorithm for radio pulsar and transient searches

A minimal implementation of the IQRM interference flagging algorithm for radio pulsar and transient searches. This module only provides the algorithm that infers a channel mask from some spectral statistic that measures the level of RFI contamination in a time-frequency data block. It should be useful as a reference implementation to developers who wish to integrate IQRM into an existing pipeline / search code.

Vincent Morello 6 Nov 29, 2022
Greedy Algorithm-Problem Solving

MAX-MIN-Hackrrank-Python-Solution Greedy Algorithm-Problem Solving You will be given a list of integers, , and a single integer . You must create an a

Mahesh Nagargoje 3 Jul 13, 2021
An implementation of ordered dithering algorithm in python as multimedia course project

One way of minimizing the size of an image is to simply reduce the number of bits you use to represent each pixel.

null 7 Dec 2, 2022
Fedlearn algorithm toolkit for researchers

Fedlearn algorithm toolkit for researchers

null 89 Nov 14, 2022
A custom prime algorithm, implementation, and performance code & review

Colander A custom prime algorithm, implementation, and performance code & review Pseudocode Algorithm 1. given a number of primes to find, the followi

Finn Lancaster 3 Dec 17, 2021
Sorting Algorithm Visualiser using pygame

SortingVisualiser Sorting Algorithm Visualiser using pygame Features Visualisation of some traditional sorting algorithms like quicksort and bubblesor

null 4 Sep 5, 2021
This project is an implementation of a simple K-means algorithm

Simple-Kmeans-Clustering-Algorithm Abstract K-means is a centroid-based algorithm, or a distance-based algorithm, where we calculate the distances to

Saman Khamesian 7 Aug 9, 2022