InsTrim: Lightweight Instrumentation for Coverage-guided Fuzzing

Overview

InsTrim

The paper: InsTrim: Lightweight Instrumentation for Coverage-guided Fuzzing

Build

Prerequisites

  • llvm-8.0-dev
  • clang-8.0
  • cmake >= 3.2

Make

git clone https://github.com/csienslab/instrim.git
cd instrim
cmake .
make

Patch and build AFL Fuzzer

Run build_afl.sh, or run the following commands manually:

wget http://lcamtuf.coredump.cx/afl/releases/afl-2.52b.tgz
tar -xvf afl-2.52b.tgz
cd afl-2.52b
patch -p1 < ../instrim/afl-fuzzer.patch
make
cd llvm_mode
make

Usage

Set up the environment

export INSTRIM_LIB=[absolute path of instrim/libLLVMInsTrim.so]

Instrument the target program

With InsTrim

MARKSET=1 afl-2.52b/afl-clang-fast [compilation options, your target ...]

With InsTrim-Approx

MARKSET=1 LOOPHEAD=1 afl-2.52b/afl-clang-fast [compilation options, your target ...]

Skip single block functions

The following is recommended for C/C++ targets that do not use vtables or similar techniques:

MARKSET=1 SKIPSINGLEBLOCK=1 afl-2.52b/afl-clang-fast [compilation options, your target ...]

Finally

Then you can use AFL with LLVM mode to fuzz those instrumented binaries.

Comments
  • small performance patch

    A simple patch: do not instrument functions that have only one basic block, since instrumenting them just pollutes the map. Any decision to call such a function is made in the caller and is therefore already covered.

    opened by vanhauser-thc 10
  • Loss of coverage

    Hi,

    I tried your code and read the paper, and I think the optimization is far too aggressive and loses coverage.

    I compiled some arbitrary code and then used a disassembler to check which basic blocks were instrumented and which were not.

    Instrumented basic blocks with afl-2.52b llvm_mode: 279
    Instrumented basic blocks with InsTrim: 76

    That is only about 27% of the blocks being instrumented. That already looks bad, so let's look at a specific example:

    [Image: Capture2]

    In that picture we see 7 basic blocks: the two top blocks come from yes/no decision branches, and the bottom 3 blocks all lead to the same final basic block.

    Of the 7 visible basic blocks, only one is instrumented. The final basic block that the bottom three blocks lead to, which is not visible, is also instrumented.

    If we now look at the non-instrumented blocks, they are all potentially interesting and should have been instrumented.

    Did you only compare speed, or did you also compare coverage? (For a coverage comparison to be meaningful, you have to remove all srandom() calls in afl-fuzz and replace them with a single fixed srandom(123) at the start of main() so that the PRNG results are the same, run the fuzzer for 1-24 hours, and then compare the covered lines of code, e.g. with afl-cov.)
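
The reproducibility requirement above can be illustrated with a small C sketch. It is not taken from afl-fuzz: it assumes a POSIX system that provides srandom() and random(), and the helper names seeded_sequence and same_seed_is_reproducible are hypothetical. It only shows that a fixed seed makes two runs draw the same pseudo-random values, which is the property the suggested srandom(123) change relies on.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes srandom()/random() on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Seed the PRNG with `seed` and store the first n values of random() in out. */
void seeded_sequence(unsigned int seed, long *out, size_t n) {
  assert(out != NULL);
  srandom(seed);
  for (size_t i = 0; i < n; i++)
    out[i] = random();
}

/* Return 1 if two runs seeded with the same value yield the same sequence. */
int same_seed_is_reproducible(unsigned int seed) {
  long first[8], second[8];
  seeded_sequence(seed, first, 8);
  seeded_sequence(seed, second, 8);
  for (size_t i = 0; i < 8; i++)
    if (first[i] != second[i])
      return 0;
  return 1;
}
```

Calling same_seed_is_reproducible(123) from a test program should return 1. With time-based seeding, two fuzzing runs would instead mutate inputs differently, so a coverage difference could not be attributed to the instrumentation alone.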

    opened by vanhauser-thc 9
  • error: unable to load plugin

    Please excuse my poor English.

    I tried to compile some programs using the patched afl-clang-fast. Unfortunately, the following error occurred and compilation failed.

    error: unable to load plugin '/path/to/instrim/libLLVMInsTrim.so':
    '/path/to/instrim/libLLVMInsTrim.so: undefined symbol:
    _ZN4llvm24DisableABIBreakingChecksE'
    

    The OS is Ubuntu 16.04.2 on x86_64. All prerequisite packages have already been installed (via apt-get), and I have followed all the steps in the README.

    What should I do to resolve this error?

    opened by vhertz 4
  • Instrument exit blocks when there is an empty path.

    The original algorithm treats the empty path as a distinguishable, unique path.

    For example, in the following CFG, the algorithm initializes the entry with mark (0). It then finds two mark (0) values coming from the predecessors of block D, so it creates a new mark (1). At the exit block, the predecessors carry two different marks, so the exit block is not marked (instrumented).

    [Image: Untitled Diagram]

    However, the path of mark (0) is actually an empty path without any mark. Although an empty path is distinguishable from any other path that has marks, the fuzzer receives no signal if there is no mark (instrumentation) on a path.

    This PR changes the marking algorithm so that it instruments function exit blocks when there is an empty path, which should address issue #4.
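
The effect of an empty path can be sketched in C. This is a toy model, not InsTrim's marking algorithm: the three-block CFG (ENTRY branching either through D or straight to EXIT) and the names signals_on_path and empty_path_signals are assumptions made for illustration, since the original diagram is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CFG: ENTRY branches either through D or straight to EXIT. */
enum block { ENTRY, D, EXIT, NUM_BLOCKS };

/* Count the instrumented blocks a path touches, i.e. the signals a fuzzer sees. */
size_t signals_on_path(const enum block *path, size_t len,
                       const unsigned char instrumented[NUM_BLOCKS]) {
  size_t hits = 0;
  for (size_t i = 0; i < len; i++) {
    assert(path[i] < NUM_BLOCKS);
    hits += instrumented[path[i]];
  }
  return hits;
}

/* Signals on the empty path ENTRY -> EXIT, with or without exit instrumentation. */
size_t empty_path_signals(int instrument_exit) {
  /* Only D carries a mark by default; EXIT is marked only when requested. */
  const unsigned char instrumented[NUM_BLOCKS] = {
    [ENTRY] = 0, [D] = 1, [EXIT] = instrument_exit ? 1 : 0
  };
  const enum block path[] = { ENTRY, EXIT };
  return signals_on_path(path, 2, instrumented);
}
```

In this model, empty_path_signals(0) is 0: executing the function along ENTRY -> EXIT leaves no trace, so it looks the same to the fuzzer as not executing it at all. With the exit block instrumented, empty_path_signals(1) is 1.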

    Here is the number of instrumented blocks on libxml2-v2.9.10:

    | Method                            | # of instrumented blocks |
    |-----------------------------------|--------------------------|
    | Full instrumentation              | 68932                    |
    | Original marking algorithm        | 16370                    |
    | Marking algorithm with this patch | 17264                    |
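
To put these counts in proportion, the share of instrumented blocks relative to full instrumentation can be computed with a small helper. The function name instrumented_percent is hypothetical; only the three counts come from the figures above.

```c
#include <assert.h>

/* Percentage of blocks instrumented relative to full instrumentation. */
double instrumented_percent(unsigned long instrumented, unsigned long total) {
  assert(total > 0 && instrumented <= total);
  return 100.0 * (double)instrumented / (double)total;
}
```

With the libxml2 counts, instrumented_percent(16370, 68932) is about 23.7 and instrumented_percent(17264, 68932) is about 25.0, so the patch adds 894 instrumented blocks, a little over one percentage point.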

    opened by pzread 1
  • How to get the number of marked vertices?

    The paper mentions that InsTrim instruments far fewer blocks than AFL. I tried to count the number of instrumented blocks, but I could not find a good tool for gathering these statistics. I checked the InsTrim source code and did not see a corresponding counting routine there either. Could you please tell me how to count the number of marked vertices?

    opened by kimiwanano 0
  • The label generation should use the map size defined in AFL's config.h

    Currently we use a fixed map size of 65536, which is the default value in AFL's config.h. We should include config.h to make sure both sides use a consistent map size.
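
A minimal C sketch of the idea follows. It is not InsTrim's code: the function label_from_random is hypothetical, and the fallback definitions only stand in for AFL's config.h, which defines MAP_SIZE_POW2 as 16 and MAP_SIZE as (1 << MAP_SIZE_POW2) by default. In a real build the header would be included instead, so that a changed map size propagates to label generation.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for `#include "config.h"` from AFL; used only if the header is absent. */
#ifndef MAP_SIZE_POW2
#define MAP_SIZE_POW2 16
#endif
#ifndef MAP_SIZE
#define MAP_SIZE (1 << MAP_SIZE_POW2)
#endif

/* Reduce an arbitrary random value to a valid coverage-map index. */
uint32_t label_from_random(uint32_t r) {
  uint32_t label = r % (uint32_t)MAP_SIZE;
  assert(label < (uint32_t)MAP_SIZE);
  return label;
}
```

Because the modulus is taken from MAP_SIZE rather than a literal 65536, a label can never index past the end of a map that the fuzzer was built with a different size for.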

    Thanks to vanhauser-thc for the report.

    opened by pzread 0
  • PrevLocation incorrect

    The basic block which writes to the coverage map also has to set the previous location.

    There are two issues with how this is implemented in InsTrim:

    1. The algorithm for this in AFL is:
         index = curr_location ^ (prev_location >> 1);
         map[index]++;

    In the InsTrim code, the right shift of prev_location is never performed.

    2. The prev_location that is written is always one specific value and not the one from the path that was actually taken, e.g.:

              EntryBlock
               /  |  \
              A   B   C
               \  |  /
              ExitBlock

       The write to the map happens in ExitBlock, and the prev_location written is always the ID of block A; it does not depend on which path was actually taken.

    In the code this is visible in the following line:

    IRB.CreateStore(ConstantInt::get(Int32Ty, genLabel()), OldPrev);
    

    To fix both issues, in afl++ we removed that CreateStore() and added the following after IRB.CreateStore(Incr, MapPtrIdx);:

    Value *Shr = IRB.CreateLShr(L, One32);
    IRB.CreateStore(Shr, OldPrev)->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(C, None));
    

    Note that this also needs a ConstantInt *One32 = ConstantInt::get(Int32Ty, 1); definition after IntegerType *Int32Ty ...
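
Both points can be illustrated outside LLVM with a plain C model of the edge index. This is a sketch under assumptions, not the InsTrim or afl++ implementation: the function names and block IDs are hypothetical, and every block on the path is treated as instrumented, as in plain AFL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AFL-style edge index: current block ID XOR the previous ID shifted right by one. */
uint32_t edge_index(uint32_t prev_id, uint32_t cur_id) {
  return cur_id ^ (prev_id >> 1);
}

/* The same computation without the shift, as the report describes for InsTrim. */
uint32_t edge_index_noshift(uint32_t prev_id, uint32_t cur_id) {
  return cur_id ^ prev_id;
}

/* Walk a path of block IDs, carrying prev_id along, and return the last edge index. */
uint32_t last_edge_on_path(const uint32_t *ids, size_t n) {
  assert(ids != NULL && n >= 2);
  uint32_t prev_id = ids[0];
  uint32_t index = 0;
  for (size_t i = 1; i < n; i++) {
    index = edge_index(prev_id, ids[i]);
    prev_id = ids[i]; /* taken from the block actually executed, not a constant */
  }
  return index;
}
```

Without the shift, the edges A->B and B->A map to the same index and a self-loop A->A always maps to index 0; with the shift they are kept apart. Likewise, because prev_id is updated from the executed block, reaching the exit via A and via B yields different indices in last_edge_on_path, which is what a constant prev_location cannot provide.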

    opened by vanhauser-thc 3
  • Loss of coverage (#2)

    This is an extension to issue #2 - I found out where the loss of coverage is. There seems to be a logic bug that sometimes results in a basic block not being instrumented when it should be.

    In the following example, the if (argc < 2) { BASIC_BLOCK } basic block is not instrumented. Note that it has to be compiled with AFL_DONT_OPTIMIZE:

    AFL_DONT_OPTIMIZE=1 MARKSET=1 afl-clang-fast -o bug bug.c

    #include <stdio.h>
    #include <stdlib.h>
    
    void foo(int a) {
      printf("foo\n");
      if (a == 1)
        printf("1\n");
      else
        printf("2\n");
      printf("done\n");
    }
    
    void bar() {
      printf("bar\n");
    }
    
    int main(int argc, char *argv[]) {
      printf("main\n");
      if (argc < 2) {
        printf("argc<2\n"); // *This is not instrumented*
        return -1;
      } else if (argc > 2)
        printf("argc>2\n");
      else
        printf("argc=2\n");
      printf("ok\n");
      if (argv[1][0] == 'a')
        foo(atoi(argv[1]));
      else
        bar();
      printf("end\n");
      return 0;
    }
    

    The disassembly looks like this:

    Dump of assembler code for function main:
       0x0000000000401320 <+0>:	push   rbp
       0x0000000000401321 <+1>:	mov    rbp,rsp
       0x0000000000401324 <+4>:	sub    rsp,0x20
       0x0000000000401328 <+8>:	mov    DWORD PTR [rbp-0x4],0x0
       0x000000000040132f <+15>:	mov    DWORD PTR [rbp-0x8],edi
       0x0000000000401332 <+18>:	mov    QWORD PTR [rbp-0x10],rsi
       0x0000000000401336 <+22>:	movabs rdi,0x402017
       0x0000000000401340 <+32>:	mov    al,0x0
       0x0000000000401342 <+34>:	call   0x401080 <printf@plt>
       0x0000000000401347 <+39>:	cmp    DWORD PTR [rbp-0x8],0x2
       0x000000000040134b <+43>:	jge    0x40136e <main+78>
    -> HERE begins the non-instrumented basic block
       0x0000000000401351 <+49>:	movabs rdi,0x40201d
       0x000000000040135b <+59>:	mov    al,0x0
       0x000000000040135d <+61>:	call   0x401080 <printf@plt>
       0x0000000000401362 <+66>:	mov    DWORD PTR [rbp-0x4],0xffffffff
       0x0000000000401369 <+73>:	jmp    0x401482 <main+354>
    -> HERE it jumps to the end
    

    Decompiled in Ghidra it looks like this:

    int main(uint param_1,char** param_2) {
      // variable declarations removed
      printf("main\n");
      if ((int)param_1 < 2) {
        printf("argc<2\n");
        local_c = 0xffffffff;
      } else {
        ...
      }
      return (ulong)local_c;
    }
    
    opened by vanhauser-thc 3