MEDS: Enhancing Memory Error Detection for Large-Scale Applications


Prerequisites

  • cmake and clang

Build MEDS supporting compiler

$ make

Build Using Docker

# build docker image
$ docker build -t meds .

# run docker image
$ docker run --cap-add=SYS_PTRACE -it meds /bin/bash

Testing MEDS

  • MEDS's test suite runs the original ASan test cases as well as MEDS-specific ones.

    • ASan's test cases are copied into llvm/projects/compiler-rt/test/meds/TestCases/ASan
    • MEDS-specific test cases live in llvm/projects/compiler-rt/test/meds/TestCases/Meds
  • To run the tests:

$ make test

Testing Time: 30.70s
 Expected Passes    : 183
 Expected Failures  : 1
 Unsupported Tests  : 50
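
To iterate on a single suite instead of the whole test target, LLVM's lit runner can be pointed at one of the MEDS test directories. This is a sketch rather than a documented workflow: the llvm-lit location and the test-tree path below assume an in-tree build under build/ and may differ in your checkout.

$ build/bin/llvm-lit -sv build/projects/compiler-rt/test/meds/TestCases/Meds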

Build applications with MEDS heap protection and ASan stack/global protection

  • Given a test program test.cc,
$ cat > test.cc

int main(int argc, char **argv) {
  int *a = new int[10];
  a[argc * 10] = 1;
  return 0;
}
  • test.cc can be built with the -fsanitize=meds option. Since argc is at least 1 at runtime, a[argc * 10] writes past the end of the 10-element array, which MEDS reports as a heap-buffer-overflow:
$ build/bin/clang++ -fsanitize=meds test.cc -o test
$ ./test

==90589==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x43fff67eb078 at pc 0x0000004f926d bp 0x7fffffffe440 sp 0x7fffffffe438
WRITE of size 4 at 0x43fff67eb078 thread T0
    #0 0x4f926c in main (/home/wookhyun/release/meds-release/a.out+0x4f926c)
    #1 0x7ffff6b5c82f in __libc_start_main /build/glibc-bfm8X4/glibc-2.23/csu/../csu/libc-start.c:291
    #2 0x419cb8 in _start (/home/wookhyun/release/meds-release/a.out+0x419cb8)

Address 0x43fff67eb078 is a wild pointer.
SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/wookhyun/release/meds-release/a.out+0x4f926c) in main
Shadow bytes around the buggy address:
  0x08807ecf55b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x08807ecf55c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x08807ecf55d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x08807ecf55e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x08807ecf55f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x08807ecf5600: fa fa fa fa fa fa fa fa fa fa 00 00 00 00 00[fa]
  0x08807ecf5610: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x08807ecf5620: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x08807ecf5630: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x08807ecf5640: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x08807ecf5650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==90589==ABORTING

Options

  • -fsanitize=meds: Enable heap protection using MEDS (the stack and globals are protected using ASan)

  • -mllvm -meds-stack=1: Enable stack protection using MEDS

  • -mllvm -meds-global=1 -mcmodel=large: Enable global protection using MEDS

    • This also requires --emit-relocs in LDFLAGS (see the build-variable sketch after these examples)
  • Example: to protect the heap/stack using MEDS and globals using ASan

$ clang -fsanitize=meds -mllvm -meds-stack=1 test.c -o test
  • Example: to protect the heap/globals using MEDS and the stack using ASan
$ clang -fsanitize=meds -mllvm -meds-global=1 -mcmodel=large -Wl,--emit-relocs test.c -o test
  • Example: to protect the heap/stack/globals using MEDS
$ clang -fsanitize=meds -mllvm -meds-stack=1 -mllvm -meds-global=1 -mcmodel=large -Wl,--emit-relocs test.c -o test
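
For larger projects, the same options can be fed through the standard build variables rather than a one-off command line. A minimal sketch, assuming an autoconf-style project and the MEDS-built clang at build/bin/clang (both the path and the build system are illustrative, not prescribed by the repo):

$ export CC="$PWD/build/bin/clang"
$ export CFLAGS="-fsanitize=meds -mllvm -meds-global=1 -mcmodel=large"
$ export LDFLAGS="-Wl,--emit-relocs"
$ ./configure && make

As a quick sanity check of the stack mode, a stack variant of the earlier heap example (a hypothetical test-stack.c, not shipped with the repo) should abort with an overflow report when built with -mllvm -meds-stack=1:

$ cat > test-stack.c
int main(int argc, char **argv) {
  int buf[10];
  buf[argc * 10] = 1; /* argc >= 1, so this writes past the end of buf */
  return 0;
}
$ build/bin/clang -fsanitize=meds -mllvm -meds-stack=1 test-stack.c -o test-stack
$ ./test-stack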

Comments
  • Failed to detect use-after-free when malloc more than 2MB buffer

    MEDS fails to detect the use-after-free (UAF) error in the following malloc-free-malloc-use code when the buffer is 2MB or larger:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    int main(){
        char *a = malloc(0x200000); // 2MB
        // char *a = malloc(0x10000); // 64KB
        printf("a = %p\n", a);
        free(a);
        
        char *b;
        for(int i = 0; ; i++){
            b = malloc(0x200000);
            printf("i = %d, a = %p, b = %p\n", i, a, b);
            if(a == b)
                break;
            free(b);
        }
        
        int offset = 0x10;
        printf("writing in %p\n", (a+offset));
        a[offset] = 'a';
    
        free(b);
        return 0;
    }
    

    I compile and run this code in the Docker container with MEDS:

    $ docker run -v ~/MEDS/sample:/data --cap-add=SYS_PTRACE -it --rm meds /bin/bash
    $ clang -fsanitize=meds -g -mllvm -meds-stack=1 -mllvm -meds-global=1 -mcmodel=large -Wl,--emit-relocs uaf-3.c -o uaf-3-meds
    $ ./uaf-3-meds
    

    In theory the for loop should never break, since the quarantine zone is 80TB: with 2MB chunks, a freed address should not be reused until roughly 80TB / 2MB ≈ 40 million subsequent allocations. Yet the address of b equals the address of a on the very first malloc after the free.

    If the buffer is small (e.g., 64KB), the 80TB quarantine mechanism works fine.

    opened by Marsman1996 · 2 comments
  • Question about the redzone size.

    Hi,

    If I am not misunderstanding the paper, the MEDS redzone should be 4MB, but in the example below it seems much smaller than that:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    int main(){
        char *a = malloc(0x20);
        char *b = malloc(0x20);
        // char *a = malloc(0x1000000);
        // char *b = malloc(0x1000000);
        int offset = 0x4030;
        printf("a = %p, b = %p\n", a, b);
        printf("writing in %p\n", &a[offset]);
        a[offset] = 'a';
        free(a);
        free(b);
        return 0;
    }
    

    And the output is:

    a = 0x42e90657d010, b = 0x42e906581040
    

    So the redzone size is about 16KB (0x42e906581040 - 0x42e90657d010 - 0x20 = 0x4010).

    Thanks for any reply!

    opened by Marsman1996 · 3 comments
Owner

Secomp Lab at Purdue University