A way to analyse how malware and/or goodware samples vary from each other using Shannon Entropy, Hausdorff Distance and Jaro-Winkler Distance

Overview

A way to analyse how malware and/or goodware samples vary from each other using Shannon Entropy, Hausdorff Distance and Jaro-Winkler Distance

Python version Project version Codacy Grade


UsageDownload

Introduction

ByteCog is a python script that aims to help security researchers and others a like to classify malicious software compared to other samples, depending on what the unknown file(s) is/are being tested against. This script can be extended to use a machine learning model to classify malware if you wanted to do so. ByteCog uses multiple methods of analyzing and classifying samples given to it, such as using Shannon Entropy to give a visual aspect for the researchers to look at while analyzing the code and finding possible readable code/text in a sample. ByteCog also uses Hausdorff Distance to calculate a 'raw similarity' value based on the difference in the entropy graphs of both samples, and finally ByteCog uses Jaro-Winkler Distance to calculate the 'true similarity' since the Hausdorff Distance will in most cases return a very high value if the sample is mostly the same entropy wise, so the Jaro-Winkler Distance is used to 'adjust' the simliarity value for this case of a sample.

Requirements

  • A python installation above 3.5+, which you can download from the official python website here.

Installation

Clone this repository to your local machine by following these instructions layed out here

Then proceed to download the dependencies file by running the following line in your console window

pip install -r requirements.txt

Usage

======================================================
|      ____          __         ______               |
|     / __ ) __  __ / /_ ___   / ____/____   ____    |
|    / __  |/ / / // __// _ \ / /    / __ \ / __ \   |
|   / /_/ // /_/ // /_ /  __// /___ / /_/ // /_/ /   |
|  /_____/ \__, / \__/ \___/ \____/ \____/ \__, /    |
|         /____/                          /____/     |
|                                                    |
|                    Version: 0.4                    |
|               Author: IlluminatiFish               |
======================================================

usage: bytecog.py [-h] -k KNOWN -u UNKNOWN -i IDENTIFIER -v VISUAL

Determine whether an unknown provided sample is similar to a known sample

optional arguments:
  -h, --help            show this help message and exit
  -k KNOWN, --known KNOWN
                        The file path to the known sample
  -u UNKNOWN, --unknown UNKNOWN
                        The file path to the unknown sample
  -i IDENTIFIER, --identifier IDENTIFIER
                        The antivirus identifier of the known file
  -v VISUAL, --visual VISUAL
                        If you want to show a visual representation of the file entropy

Features & Use Cases

  • Calculates sample similarity
  • Generates chunked entropy graph
  • Able to possibly detect malicious and benign software samples

Screenshots

Chunked Entropy Graph
chunk_entropy_graph

Output of ByteCog
bytecog_output

ByteCog Log File
bytecog log file

License

ByteCog - A way to analyse how malware and/or goodware samples vary from each other using Shannon Entropy, Hausdorff Distance and Jaro-Winkler Distance Copyright (c) 2021 IlluminatiFish

This program is free software; you can redistribute it and/or modify the code base under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but without ANY warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/

Acknowledgements

  • Using a modified version of @venkat-abhi's Shannon Entropy calculator to work with my project script, you can find the original one here.

  • Using the fastest method to get maximum key from a dictionary using this snippet here.

References

Entropy Wiki
Jaro-Winkler Distance Wiki
Hausdorff Distance Wiki
Shannon Calculator
Referenced Article #1
Referenced Paper #1
Referenced Paper #2
Referenced Paper #3

You might also like...
This repository is one of a few malware collections on the GitHub.
This repository is one of a few malware collections on the GitHub.

This repository is one of a few malware collections on the GitHub.

An IDA pro python script to decrypt Qbot malware string
An IDA pro python script to decrypt Qbot malware string

Qbot-Strings-Decrypter An IDA pro python script to decrypt Qbot malware strings.

Discord Token Stealer Malware Protection
Discord Token Stealer Malware Protection

TokenGuard TokenGuard, protect your account, prevent token steal. Totally free and open source Discord Server: https://discord.gg/EmwfaGuBE8 Source Co

Android Malware (Analysis | Scoring) System
Android Malware (Analysis | Scoring) System

An Obfuscation-Neglect Android Malware Scoring System Quark-Engine is also bundled with Kali Linux, BlackArch. A trust-worthy, practical tool that's r

A guide to building basic malware in Python by implementing a keylogger application
A guide to building basic malware in Python by implementing a keylogger application

Keylogger-Malware-Project A guide to building basic malware in Python by implementing a keylogger application. If you want even more detail on the Pro

Detection tool of malware(s) by checksum (useful for forensic)

🐍 malware_checker.py Detection tool of malware(s) by checksum (useful for forensic) 📦 Dependencies installation $ pip3 install -r requirements.txt

Huskee: Malware made in Python for Educational purposes
Huskee: Malware made in Python for Educational purposes

𝐇𝐔𝐒𝐊𝐄𝐄 Caracteristicas: Discord Token Grabber Wifi Passwords Grabber Googl

Fast and easy way to rollout on multiple GitLab project file a particular content.

Volatile Fast and easy way to rollout on multiple GitLab project file a particular content. Why ? After looking for a tool to simply enforce a develop

Password Manager is a simple Python project which helps users in managing their passwords in a easier way

Password Manager is a simple Python project which helps users in managing their passwords in a easier way

Owner
I am a developer that has a passion for programming, mathematics and cyber security. Currently Developer @South-Hollow
null
Malware-analysis-writeups - Some of my Malware Analysis writeups

About This repo contains some malware analysis writeups i've created over time m

Itay Migdal 14 Jun 22, 2022
DirBruter is a Python based CLI tool. It looks for hidden or existing directories/files using brute force method. It basically works by launching a dictionary based attack against a webserver and analyse its response.

DirBruter DirBruter is a Python based CLI tool. It looks for hidden or existing directories/files using brute force method. It basically works by laun

vijay sahu 12 Dec 17, 2022
Analyse a forensic target (such as a directory) to find and report files found and not found from CIRCL hashlookup public service

Analyse a forensic target (such as a directory) to find and report files found and not found from CIRCL hashlookup public service. This tool can help a digital forensic investigator to know the context, origin of specific files during a digital forensic investigation.

hashlookup 96 Dec 20, 2022
A malware to encrypt all the .txt and .jpg files in target computer using RSA algorithms

A malware to encrypt all the .txt and .jpg files in target computer using RSA algorithms. Change the Blackgound image of targets' computer. and decrypt the targets' encrypted files in our own computer

Li Ka Lok 2 Dec 2, 2022
Enhancing Twin Delayed Deep Deterministic Policy Gradient with Cross-Entropy Method

Enhancing Twin Delayed Deep Deterministic Policy Gradient with Cross-Entropy Method Hieu Trung Nguyen, Khang Tran and Ngoc Hoang Luong Setup Clone thi

Evolutionary Learning & Optimization (ELO) Lab 6 Jun 29, 2022
Malware Configuration And Payload Extraction

CAPEv2 (Python3) has now been released CAPEv2 With the imminent end-of-life for Python 2 (January 1 2020), CAPEv1 will be phased out. Please upgrade t

Context Information Security 701 Dec 27, 2022
Malware Configuration And Payload Extraction

CAPE: Malware Configuration And Payload Extraction CAPE is a malware sandbox. It is derived from Cuckoo and is designed to automate the process of mal

Kevin O'Reilly 1k Dec 30, 2022
A small utility to deal with malware embedded hashes.

Uchihash is a small utility that can save malware analysts the time of dealing with embedded hash values used for various things such as: Dyn

Abdallah Elshinbary 48 Dec 19, 2022
Android Malware Behavior Deleter

Android Malware Behavior Deleter UDcide UDcide is a tool that provides alternative way to deal with Android malware. We help you to detect and remove

null 27 Sep 23, 2022
A Feature Rich Modular Malware Configuration Extraction Utility for MalDuck

Malware Configuration Extractor A Malware Configuration Extraction Tool and Modules for MalDuck This project is FREE as in FREE ?? , use it commercial

c3rb3ru5 103 Dec 18, 2022