HashDB

HashDB is a community-sourced library of hashing algorithms used in malware.

How To Use HashDB

HashDB can be used as a standalone hashing library, but it also feeds the HashDB Lookup Service run by OALabs. This service allows analysts to reverse hashes and retrieve hashed API names and string values.

Stand Alone Module

HashDB can be cloned and used in your reverse engineering scripts like any standard Python module. Some example code follows.

>>> import hashdb
>>> hashdb.list_algorithms()
['crc32']
>>> hashdb.algorithms.crc32.hash(b'test')
3632233996
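The value above can be cross-checked with the standard library alone. This is a minimal sketch that assumes the crc32 module computes the standard CRC-32 (the variant `zlib.crc32` implements); the README does not say how the module is written, and `crc32_hash` is a name made up for this example.

```python
import zlib


def crc32_hash(data: bytes) -> int:
    """Standard CRC-32 of data, masked to an unsigned 32-bit value."""
    # Assumption: this mirrors what hashdb.algorithms.crc32.hash computes.
    return zlib.crc32(data) & 0xFFFFFFFF


print(crc32_hash(b"test"))  # 3632233996
```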

HashDB Lookup Service

OALabs runs a free HashDB Lookup Service that can be used to query a hash table for any hash listed in the HashDB library. Included in the hash tables are the complete set of Windows APIs as well as many common strings used in malware. You can even add your own strings!
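A lookup can be scripted with the standard library. The sketch below assumes the `/hash/:algorithm/:value` route and the response layout that appear in the issue reports quoted in the Comments section of this page; the helper names are hypothetical and no official client API is implied.

```python
import json
import urllib.request

# Base URL as it appears in the example request quoted in the Comments section.
HASHDB_URL = "https://hashdb.openanalysis.net"


def build_lookup_url(algorithm: str, value: int) -> str:
    """Build the lookup URL for one hash value under one algorithm."""
    return f"{HASHDB_URL}/hash/{algorithm}/{value}"


def extract_strings(response: dict) -> list:
    """Return the original-case strings from a /hash response body."""
    return [entry["string"]["string"] for entry in response.get("hashes", [])]


def lookup(algorithm: str, value: int, timeout: float = 10.0) -> list:
    """Query the service (requires network access) and return matching strings."""
    url = build_lookup_url(algorithm, value)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_strings(json.load(resp))
```

Reading the nested `string` field rather than `api` keeps the original casing of the resolved name.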

HashDB IDA Plugin

The HashDB lookup service has an IDA Pro plugin that can be used to automate hash lookups directly from IDA! The client can be downloaded from GitHub here.

How To Add New Hashes

HashDB relies on community support to keep our hash library current! Our goal is to have contributors spend no more than five minutes adding a new hash, from first commit to PR. To achieve this goal we offer the following streamlined process.

  1. Make sure the hash algorithm doesn’t already exist… we know that seems silly but just double check.

  2. Create a branch with a descriptive name.

  3. Add a new Python file to the /algorithms directory with the name of your hash algorithm. Try to use the official name of the algorithm, or if it is unique, use the name of the malware that it is unique to.

  4. Use the following template to set up your new hash algorithm. All fields are mandatory and case sensitive.

    #!/usr/bin/env python
    
    DESCRIPTION = "your hash description here"
    # Type can be either 'unsigned_int' (32bit) or 'unsigned_long' (64bit)
    TYPE = 'unsigned_int'
    # Test must match the exact hash of the string 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
    TEST_1 = hash_of_string_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
    
    
    def hash(data):
        # your hash code here

  5. Double check your Python style. We use Flake8 on Python 3.9. You can try the following lint commands locally from the root of the git repository.

    pip install flake8
    flake8 ./algorithms --count --exit-zero --max-complexity=15 --max-line-length=127 --statistics --show-source
    
  6. Test your code locally using our test suite. Run the following commands locally from the root of the git repository. Note that you must run pytest as a module rather than directly or it won't pick up our test directory.

    pip install pytest
    python -m pytest
    
  7. Issue a pull request. Your new algorithm will be automatically queued for testing and, if the tests pass, it will be merged.

That’s it! Not only will your new hash be available in the HashDB library but a new hash table will be generated for the HashDB Lookup Service and you can start reversing hashes immediately!
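As an illustration of step 4, here is a minimal sketch of a filled-in template. The algorithm (rotate right by 13, then add each byte) is chosen only as an example and is not claimed to match any file in the repository. `TEST_STRING`, `ror32`, and the `__main__` block are helpers added for this sketch; they print the value to paste into `TEST_1` as a literal.

```python
#!/usr/bin/env python

DESCRIPTION = "Illustrative ROR13 additive hash (example only)"
TYPE = 'unsigned_int'
# Hypothetical helper constant: the standard test string from the template.
TEST_STRING = b'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'


def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def hash(data):
    """Rotate the running value right by 13, then add the next byte."""
    result = 0
    for byte in data:
        result = (ror32(result, 13) + byte) & 0xFFFFFFFF
    return result


if __name__ == "__main__":
    # Print the literal to paste into the TEST_1 field of the real module.
    print("TEST_1 =", hash(TEST_STRING))
```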

Rules For New Hashes

PRs with changes outside of the /algorithms directory are not part of our automated CI and will be subjected to extra scrutiny.

All hashes must have a valid description in the DESCRIPTION field.

All hashes must have a type of either unsigned_int or unsigned_long in the TYPE field. HashDB currently only accepts unsigned 32bit or 64bit hashes.

All hashes must have the hash of the string ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 in the TEST_1 field.

All hashes must include a function hash(data) that accepts a byte string and returns a hash of the string.
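The rules above can be checked mechanically before opening a PR. The following is a sketch of such a check, not the project's actual test suite; `check_algorithm`, `TYPE_BITS`, and `TEST_STRING` are hypothetical names.

```python
import types

TEST_STRING = b'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
TYPE_BITS = {'unsigned_int': 32, 'unsigned_long': 64}


def check_algorithm(module):
    """Return a list of rule violations for an algorithm module (empty if none)."""
    problems = []
    description = getattr(module, 'DESCRIPTION', None)
    if not isinstance(description, str) or not description.strip():
        problems.append('DESCRIPTION missing or empty')
    bits = TYPE_BITS.get(getattr(module, 'TYPE', None))
    if bits is None:
        problems.append('TYPE must be unsigned_int or unsigned_long')
    hash_func = getattr(module, 'hash', None)
    if not callable(hash_func):
        problems.append('hash(data) function missing')
    elif not hasattr(module, 'TEST_1'):
        problems.append('TEST_1 missing')
    else:
        value = hash_func(TEST_STRING)
        if value != module.TEST_1:
            problems.append('hash of the test string does not match TEST_1')
        if bits is not None and not 0 <= value < (1 << bits):
            problems.append('hash value does not fit in TYPE')
    return problems


# Usage: a stand-in "module" whose hash is the sum of its bytes.
example = types.SimpleNamespace(
    DESCRIPTION='sum of bytes',
    TYPE='unsigned_int',
    TEST_1=sum(TEST_STRING),
    hash=lambda data: sum(data) & 0xFFFFFFFF,
)
print(check_algorithm(example))  # []
```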

Adding Custom API Hashes

Some hash algorithms hash the module name and API separately and combine the hashes to create a single module+API hash. An example of this is the standard Metasploit ROR13 hash. These algorithms will not work with the standard wordlist and require a custom wordlist that includes both the module name and API. To handle these we allow custom algorithms that will only return a valid hash for some words.

Adding a custom API hash requires the following additional components.

  1. The TEST_1 field must be set to 4294967294 (0xFFFFFFFE).

  2. The hash algorithm must return the value 4294967294 for all invalid hashes.

  3. An additional TEST_API_DATA_1 field must be added with an example word that is valid for the algorithm.

  4. An additional TEST_API_1 field must be added with the hash of the TEST_API_DATA_1 field.
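The four requirements above could look like this in a module. This is a sketch under stated assumptions: the combined hash is a simplified stand-in and not the Metasploit algorithm, the comma-separated `module,api` word format is assumed for illustration, and `INVALID` and `_ror13_add` are hypothetical names. In a real module `TEST_API_1` would presumably be a literal; it is computed here only to keep the example consistent.

```python
DESCRIPTION = "Illustrative module+API hash (not the Metasploit algorithm)"
TYPE = 'unsigned_int'
INVALID = 4294967294  # sentinel value required for invalid words
TEST_1 = INVALID      # the standard test string is not a valid word here
TEST_API_DATA_1 = 'kernel32.dll,LoadLibraryA'  # assumed "module,api" format


def _ror13_add(data):
    """Rotate a 32-bit accumulator right by 13, then add each byte."""
    result = 0
    for byte in data:
        rotated = ((result >> 13) | (result << 19)) & 0xFFFFFFFF
        result = (rotated + byte) & 0xFFFFFFFF
    return result


def hash(data):
    """Hash 'module,api' words; return INVALID for anything else."""
    parts = data.split(b',')
    if len(parts) != 2 or not parts[0] or not parts[1]:
        return INVALID
    module_name, api_name = parts
    # Hash each half separately, then combine into a single value.
    return (_ror13_add(module_name.upper()) + _ror13_add(api_name)) & 0xFFFFFFFF


# Computed for the sketch; a real module would store the resulting literal.
TEST_API_1 = hash(TEST_API_DATA_1.encode())
```

A word without a separator, such as the standard test string, yields the sentinel, which is why `TEST_1` holds that value.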

Standing On The Shoulders of Giants

A big shout out to the FLARE team for their efforts with shellcode_hashes. Many years ago this project set the bar for quick and easy malware hash reversing and it’s still an extremely useful tool. So why duplicate it?

Frankly, it’s all about the wordlist and accessibility. We have seen a dramatic shift towards using hashes for all sorts of strings in malware now, and the old method of hashing all the Windows DLL exports just isn’t good enough. We wanted a solution that could continuously process millions of registry keys and values, filenames, and process names. And we wanted that data available via a REST API so that we could use it in our automation workflows, not just our static analysis tools. That being said, we wouldn’t exist without shellcode_hashes, so credit where credit is due 🙌


Comments
  • Hunt endpoint should return the intersection (if not empty) not the union

    The /hunt endpoint currently accepts multiple hashes. My assumption was that passing more hashes to that endpoint would decrease the number of results, because the DB has more hashes with which to decide automatically whether a given algorithm is better suited for resolution. Instead, it seems that the API returns all possible hashing functions, as if I had hunted for the hashes separately and merged the responses.

    Minimal Example:

    The body {"hashes": [1676620]} leads to

    {
        "hits": [
            {
                "algorithm": "revil_010F",
                "count": 8,
                "hitrate": 8.0
            }
        ]
    }
    

    and {"hashes": [219106]} leads to

    {
        "hits": [
            {
                "algorithm": "shl1_add",
                "count": 8,
                "hitrate": 8.0
            },
            {
                "algorithm": "revil_010F",
                "count": 6,
                "hitrate": 6.0
            }
        ]
    }
    

    But {"hashes": [219106, 1676620]} leads to

    {
        "hits": [
            {
                "algorithm": "shl1_add",
                "count": 8,
                "hitrate": 4.0
            },
            {
                "algorithm": "revil_010F",
                "count": 14,
                "hitrate": 7.0
            }
        ]
    }
    

    as well. Ideally the last response would be the same as the first.

    There might be a good reason /hunt behaves that way. In this case, could we get another endpoint that accepts multiple hashes and behaves as I suggested? If not, we would need to make N requests to the /hunt endpoint, one for each hash, and intersect the results on the client side.

    Context: this came up while developing the HashDB client for Ghidra and, in particular, implementing automatic REvil API resolution.
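    The client-side workaround described in this issue can be sketched as follows. The response layout is taken from the examples in the issue; the function names are hypothetical, and the fallback to the union mirrors the "if not empty" wording of the title.

```python
def algorithms_in(response):
    """Set of algorithm names in one /hunt response body."""
    return {hit["algorithm"] for hit in response.get("hits", [])}


def intersect_hunt_results(responses):
    """Algorithms present in every per-hash response, else the union."""
    sets = [algorithms_in(response) for response in responses]
    if not sets:
        return set()
    common = set.intersection(*sets)
    # Fall back to the union when no algorithm matches every hash.
    return common if common else set.union(*sets)


# The two single-hash responses quoted in the issue.
first = {"hits": [{"algorithm": "revil_010F", "count": 8, "hitrate": 8.0}]}
second = {"hits": [
    {"algorithm": "shl1_add", "count": 8, "hitrate": 8.0},
    {"algorithm": "revil_010F", "count": 6, "hitrate": 6.0},
]}
print(intersect_hunt_results([first, second]))  # {'revil_010F'}
```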

    opened by larsborn 4
  • Function names are lower cased in API responses

    When using the /hash/:algorithm/:value endpoint to resolve a hash for example, the value in the field api is returned in lowercase. This imho makes reading the resulting resolved API hashes unnecessarily hard.

    https://hashdb.openanalysis.net/hash/crc32/2937175076

    {
      "hashes": [
        {
          "hash": 2937175076,
          "string": {
            "string": "RtlFreeHeap",
            "is_api": true,
            "permutation": "api",
            "api": "rtlfreeheap",
            "modules": [
              "ntdll"
            ]
          }
        }
      ]
    }
    

    The field string contains the original name including casing, so the data seems to be available in principle. This also affects other endpoints that return structures like the one displayed above, /module/:module/:algorithm/:permutation for example.

    I haven't looked at the backend code at all, hence I'm asking: Is there a technical reason, the api is lower-cased or can we just change this?

    enhancement API 
    opened by larsborn 3
  • Supporting macOS APIs

    Hello. hashdb.openanalysis.net is a great service, but I was wondering whether it supports macOS APIs. Unfortunately, it does not seem to support them at the moment. Do you have any plans to support them in the future?

    opened by mnrkbys 2
  • API support for hash algorithm type/size

    When querying an algorithm using the API, the only attributes provided are algorithm and description (see here).

    In order to add proper support for OALabs/hashdb-ida#5, the type (unsigned_int, unsigned_long) is required by the client so that we know which types to use in IDA (ida_bytes.create_dword vs. ida_bytes.create_qword, ...) when fetching the hash_value.
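    To show why the client needs the type, here is a sketch of how it could be used once available. It assumes the type arrives as one of the two TYPE strings from the algorithm template; `TYPE_WIDTHS`, `hash_width`, and `format_hash` are hypothetical names, and no IDA calls are made.

```python
# Byte widths for the two TYPE values HashDB accepts.
TYPE_WIDTHS = {'unsigned_int': 4, 'unsigned_long': 8}


def hash_width(type_name):
    """Byte width for a HashDB TYPE value; raises KeyError if unknown."""
    return TYPE_WIDTHS[type_name]


def format_hash(value, type_name):
    """Zero-padded hex rendering sized to the hash type."""
    width = hash_width(type_name)
    digits = 2 * width
    mask = (1 << (8 * width)) - 1
    return "0x" + format(value & mask, "0{}X".format(digits))


print(format_hash(255, 'unsigned_int'))   # 0x000000FF
print(format_hash(255, 'unsigned_long'))  # 0x00000000000000FF
```

    A client would branch on the same width to choose between a 4-byte and an 8-byte data item.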

    opened by anthonyprintup 2
  • hashdb - code to generate the database?

    As usual, great stuff! I was wondering if the code to generate the database is available? I would like to plug it with our process for the CIRCL hashlookup database.

    opened by adulau 2
  • Hunt hitrate over 100% due to collisions

    Because we have collisions in hash tables a single hash lookup can return multiple hits... this is no good when trying to hunt for a matching algorithm.

    Example. The body {"hashes": [1676620]} leads to

    {
        "hits": [
            {
                "algorithm": "revil_010F",
                "count": 8,
                "hitrate": 8.0
            }
        ]
    }
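    The figures quoted in the issues on this page (8 hits for one hash giving 8.0, 14 hits for two hashes giving 7.0) are consistent with hitrate being total hits divided by the number of hashes queried. That formula is inferred from the numbers, not confirmed from the backend. Under that assumption, counting each hash at most once keeps the rate at or below 1.0:

```python
def raw_hitrate(hits_per_hash):
    """Total hits divided by hashes queried (can exceed 1.0 with collisions)."""
    return sum(hits_per_hash) / len(hits_per_hash)


def capped_hitrate(hits_per_hash):
    """Fraction of queried hashes with at least one hit (never above 1.0)."""
    matched = sum(1 for hits in hits_per_hash if hits > 0)
    return matched / len(hits_per_hash)


print(raw_hitrate([8]))     # 8.0
print(capped_hitrate([8]))  # 1.0
```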
    
    bug 
    opened by herrcore 1
  • Added new hash algorithm for cryptobot

    Added new algorithm found in the cryptobot malware. It is ror13 on the characters and then at the end adds 10. Didn't show up initially on hash db so I figured it wasn't added.

    opened by ByridianBlack 0
  • Adding modified shr2_shl5_xor algorithm from Stealbit malware

    Implementing a slight modification of the existing shr2_shl5_xor algorithm with the initialization constant 0xc4d5a97a observed in Stealbit version 1.1 malware samples.

    opened by everybody-lies 0
Owner
OALabs