Parsing ELF and DWARF in Python

Overview

pyelftools

pyelftools is a pure-Python library for parsing and analyzing ELF files and DWARF debugging information. See the User's guide for more details.

Pre-requisites

pyelftools needs only Python to run. It works with Python 2.7 and 3.x (x >= 5). The requirements for hacking on pyelftools are a bit stricter; please see the hacking guide.

Installing

pyelftools can be installed from PyPI (Python package index):

> pip install pyelftools

Alternatively, you can download the source distribution for the most recent and historic versions from the Downloads tab on the pyelftools project page (by going to Tags). Then, you can install from source, as usual:

> python setup.py install

Since pyelftools is a work in progress, it's recommended to have the most recent version of the code. This can be done by downloading the master zip file or just cloning the Git repository.

Since pyelftools has no external dependencies, it's also easy to use it without installing, by locally adjusting PYTHONPATH.

How to use it?

pyelftools is a regular Python library: you import and invoke it from your own code. For a detailed usage guide and links to examples, please consult the user's guide.

License

pyelftools is open source software. Its code is in the public domain. See the LICENSE file for more details.

Comments
  • Supplementary object files

    Since DWARF 5, there is the concept of "supplementary object files", which are meant to reduce the size of the debug information by letting several files refer to a common one for their shared DIEs.

    It works by adding a .debug_sup section, containing a header and the filename of the supplementary object file. Some forms are designed to reference the supplementary object file instead of the current one:

    • DW_FORM_ref_sup4
    • DW_FORM_ref_sup8
    • DW_FORM_strp_sup

    The references are either made directly at the attribute level, or used in a DW_TAG_imported_unit to import a whole compilation unit from the supplementary object file.
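
As an illustration of the section described above, here is a minimal sketch of decoding the raw bytes of a .debug_sup section. It assumes the DWARF 5 layout (2-byte version, 1-byte is_supplementary flag, NUL-terminated filename, ULEB128 checksum length, checksum bytes) and little-endian data. DebugSup, read_uleb128 and parse_debug_sup are hypothetical names, not the API added by this PR.

```python
import struct
from typing import NamedTuple, Tuple


class DebugSup(NamedTuple):
    version: int
    is_supplementary: bool
    filename: bytes
    checksum: bytes


def read_uleb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # high bit clear: last byte
            return result, pos
        shift += 7


def parse_debug_sup(data: bytes) -> DebugSup:
    """Parse the raw contents of a .debug_sup section (little-endian assumed)."""
    version, is_sup = struct.unpack_from("<HB", data, 0)
    end = data.index(b"\x00", 3)     # the filename is NUL-terminated
    filename = data[3:end]
    length, pos = read_uleb128(data, end + 1)
    return DebugSup(version, bool(is_sup), filename, data[pos:pos + length])


# Hand-built sample: version 5, flag 0, a filename, and a 4-byte checksum.
raw = b"\x05\x00" + b"\x00" + b"common.debug\x00" + b"\x04" + b"\xde\xad\xbe\xef"
print(parse_debug_sup(raw))
```

The filename is returned as bytes because resolving it to a path depends on the base directory, which is the problem discussed further down in this description.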

    A similar mechanism also exists as a GNU extension, using a .gnu_debugaltlink section with a slightly different format and different form names:

    • DW_FORM_GNU_strp_alt
    • DW_FORM_GNU_ref_alt

    In this implementation, I worked around the fact that we need to know the base directory by making it an optional argument to ELFFile. Ideally, an ELFFile would know its location by itself.

    Simple tests have been added. I will expand them to make sure that DW_FORM_ref_sup4 and DW_FORM_ref_sup8 are properly supported. The test_.*2 files are not necessary by themselves, but they are needed to generate the common files with dwz.

    This PR embeds the commit which is proposed separately in #423: one of dwz's optimizations is to switch to ULEB128 encoding when it can.
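
To show why a switch to ULEB128 can shrink the data, here is a small sketch of the encoding itself: each byte carries 7 bits of the value, and the high bit marks whether more bytes follow, so small values need fewer bytes than a fixed 4-byte field. encode_uleb128 and decode_uleb128 are illustrative helpers, not code taken from #423 or from dwz.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F           # low 7 bits of what is left
        value >>= 7
        if value:
            out.append(byte | 0x80)   # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data: bytes) -> int:
    """Decode an unsigned LEB128 value from the start of data."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            break
    return result


for offset in (0x2A, 0x1234, 0x123456):
    encoded = encode_uleb128(offset)
    print(hex(offset), "->", len(encoded), "byte(s) instead of a fixed 4")
```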

    opened by rdunklau 23
  • Support DWARF5 standardization of GNU extension expression opcodes

    In DWARF v5, a number of GNU extensions have been standardized. To be honest, I don't know how to test for them, but it would be nice to add synonyms for the GNU extension opcodes we already parse.

    It seems to me that the following should be added:

    DW_OP_GNU_entry_value => DW_OP_entry_value
    DW_OP_GNU_const_type => DW_OP_const_type
    DW_OP_GNU_regval_type => DW_OP_regval_type
    DW_OP_GNU_deref_type => DW_OP_deref_type
    DW_OP_GNU_implicit_pointer => DW_OP_implicit_pointer
    DW_OP_GNU_convert => DW_OP_convert
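
One way to express the proposed synonyms is a plain name-to-name table that is consulted before an opcode name is reported. This is only a sketch: GNU_TO_DWARF5 and canonical_op_name are hypothetical names, and the table holds exactly the pairs listed above, with no opcode numbers assumed.

```python
# GNU extension opcode name -> DWARF v5 standard name (pairs from the list above).
GNU_TO_DWARF5 = {
    "DW_OP_GNU_entry_value": "DW_OP_entry_value",
    "DW_OP_GNU_const_type": "DW_OP_const_type",
    "DW_OP_GNU_regval_type": "DW_OP_regval_type",
    "DW_OP_GNU_deref_type": "DW_OP_deref_type",
    "DW_OP_GNU_implicit_pointer": "DW_OP_implicit_pointer",
    "DW_OP_GNU_convert": "DW_OP_convert",
}


def canonical_op_name(name: str) -> str:
    """Map a GNU opcode name to its standard synonym; leave other names alone."""
    return GNU_TO_DWARF5.get(name, name)


print(canonical_op_name("DW_OP_GNU_convert"))
print(canonical_op_name("DW_OP_addr"))
```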

    patches-welcome depends-on-another-PR blocking-next-release 
    opened by rdunklau 22
  • for sibling of form ref_addr, only sibling value should be used

    Hi,

    The cur_offset calculation might be wrong if the sibling is DW_FORM_ref_addr, because that form holds a reference relative to the beginning of .debug_info.

    This type of reference (DW_FORM_ref_addr) is an offset from the
    beginning of the .debug_info section of the target executable or
    shared object file, or, for references within a supplementary object file,
    an offset from the beginning of the local .debug_info section;
    

    This fix is related to: https://github.com/eliben/pyelftools/commit/670079afe5a472aab74cec74acc15ae7c50cdb83#r36974522
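
The distinction can be sketched as a small helper: for DW_FORM_ref_addr the attribute value is already a .debug_info offset, while the CU-relative reference forms need the CU's own offset added. sibling_offset and CU_RELATIVE_FORMS are hypothetical names used for illustration, not the patched pyelftools code.

```python
# Reference forms whose value is an offset from the start of the containing CU.
CU_RELATIVE_FORMS = {
    "DW_FORM_ref1",
    "DW_FORM_ref2",
    "DW_FORM_ref4",
    "DW_FORM_ref8",
    "DW_FORM_ref_udata",
}


def sibling_offset(form: str, value: int, cu_offset: int) -> int:
    """Return the .debug_info offset that a DW_AT_sibling attribute points to."""
    if form == "DW_FORM_ref_addr":
        return value                  # already relative to the section start
    if form in CU_RELATIVE_FORMS:
        return cu_offset + value      # relative to the start of the CU
    raise ValueError("unexpected form for DW_AT_sibling: %s" % form)


print(hex(sibling_offset("DW_FORM_ref4", 0x40, cu_offset=0x1000)))
print(hex(sibling_offset("DW_FORM_ref_addr", 0x1040, cu_offset=0x1000)))
```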

    opened by sagiben 19
  • GNU expressions, take 3

    No tests so far.

    I've produced a body of ELF files with GNU opcodes by compiling gdb on Debian Buster x86_64. It generates DWARF by default. I was going to set up a test against readelf, but then ran into #302 - multiple instances of the same attribute. Dear maintainers, please decide what to do about that.

    Alternatively, I can publish a standalone unit test - go over a file or two, parse all expressions, make sure all the opcodes parse.

    Alternatively, I can make a custom test where I would match the parsed expressions against the output of readelf, ignoring everything else.

    EDIT: I've rebuilt GDB with the latest everything, the duplicate attribute issue is no longer there. HOWEVER, now there's an unsupported relocation type.

    opened by sevaa 15
  • aarch64 and ppc64 support?

    Hi @eliben! I wrote a utility to help debug -Wframe-larger-than= warnings, which are pretty unhelpful in telling you which particular variables may be causing them, especially when many child calls have been inlined. Apparently all of that info about local variables is captured in the debug info.

    We used it for debugging a warning for a ppc64 Linux kernel build, but it seemed the object file failed to parse. We observed the thrown ElfError: Unsupported relocation type: 1.

    I haven't dug into reproducing it yet, but I wanted to check first whether pyelftools supports ppc64. I'm not sure if ppc64 uses a different target triple for endianness or not. I also haven't dumped the relocation kinds, which may be helpful for this report.

    patches-welcome 
    opened by nickdesaulniers 14
  • Cached random access to CUs and DIEs

    Here is a series that refactors the DIE cache to cache entries without walking from the top of a CU, and also one that adds a similar cache to the CU lookup in the DWARFInfo.
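
A generic way to get cached random access by offset is to keep the known start offsets sorted and use binary search to find the nearest cached entry at or before a requested offset. The sketch below shows that idea with the standard bisect module; OffsetCache is a hypothetical class, and this is not the code from the series.

```python
import bisect


class OffsetCache:
    """Entries keyed by start offset, kept sorted for binary search."""

    def __init__(self):
        self._offsets = []   # sorted start offsets
        self._entries = []   # entries, parallel to _offsets

    def add(self, offset, entry):
        i = bisect.bisect_left(self._offsets, offset)
        if i < len(self._offsets) and self._offsets[i] == offset:
            return           # already cached
        self._offsets.insert(i, offset)
        self._entries.insert(i, entry)

    def find_at_or_before(self, offset):
        """Return the cached entry with the largest start <= offset, or None."""
        i = bisect.bisect_right(self._offsets, offset)
        return self._entries[i - 1] if i else None


cache = OffsetCache()
for start, name in [(300, "CU at 300"), (0, "CU at 0"), (120, "CU at 120")]:
    cache.add(start, name)   # insertion order does not matter
print(cache.find_at_or_before(150))
```

The result is only a starting point: if units between the cached entry and the requested offset have not been seen yet, the caller still has to walk forward from it, but no longer from the very top.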

    opened by mdmillerii 14
  • examples/dwarf_lineprogram_filenames.py: TypeError: tuple indices must be integers or slices, not str

    I cloned a fresh copy of pyelftools to check DWARF5 line program support and I get this exception:

    pyelftools$ python3 examples/dwarf_lineprogram_filenames.py --test /tmp/tmp31cx7ko3/usr/lib/debug/lib/modules/5.17.0-3-amd64/kernel/drivers/atm/ambassador.ko
    Processing file: /tmp/tmp31cx7ko3/usr/lib/debug/lib/modules/5.17.0-3-amd64/kernel/drivers/atm/ambassador.ko
      Found a compile unit at offset 0, length 147144
    Traceback (most recent call last):
      File "examples/dwarf_lineprogram_filenames.py", line 96, in <module>
        process_file(filename)
      File "examples/dwarf_lineprogram_filenames.py", line 46, in process_file
        line_entry_mapping(line_program)
      File "examples/dwarf_lineprogram_filenames.py", line 62, in line_entry_mapping
        filename = lpe_filename(line_program, lpe.state.file)
      File "examples/dwarf_lineprogram_filenames.py", line 82, in lpe_filename
        dir_index = file_entry["dir_index"]
    TypeError: tuple indices must be integers or slices, not str
    

    This is the unmodified example program that comes with pyelftools.

    opened by vegard 13
  • GNU expressions

    Support for some GNU opcodes for DWARF expressions.

    One of them, DW_OP_GNU_entry_value, threw a monkey wrench into the whole expression parsing scheme. As an argument, it contains a variable-length binary blob with another expression, so the expression parsing logic has to recurse. Now, expression parsing in pyelftools was originally designed with progressive parsing in mind, thus the visitor logic. There is no out-of-the-box logic for "parse this expression into a single data structure".

    So I came up with one. In order to keep things compatible, I've split the ExprDumper class from descriptions.py into two parts - the "unite operations into a list" bit and the "format an operation into a string" bit. The former resides under dwarf_expr now, as GenericExprDumper, and the parsing logic of DW_OP_GNU_entry_value relies on that to combine the nested expression into a list.

    There's a unit test, too.
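
The recursion described above can be illustrated with a toy parser that understands only three kinds of opcode: DW_OP_reg0 to DW_OP_reg31 (0x50 to 0x6f), DW_OP_stack_value (0x9f) and DW_OP_GNU_entry_value (0xf3), whose operand is a ULEB128 length followed by that many bytes of nested expression. This is a sketch under those assumptions, with hypothetical function names, and not the GenericExprDumper from this PR.

```python
DW_OP_STACK_VALUE = 0x9F
DW_OP_GNU_ENTRY_VALUE = 0xF3


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_expr(data):
    """Parse a (toy) DWARF expression into a list of (name, args) tuples."""
    ops = []
    pos = 0
    while pos < len(data):
        opcode = data[pos]
        pos += 1
        if 0x50 <= opcode <= 0x6F:
            ops.append(("DW_OP_reg%d" % (opcode - 0x50), []))
        elif opcode == DW_OP_STACK_VALUE:
            ops.append(("DW_OP_stack_value", []))
        elif opcode == DW_OP_GNU_ENTRY_VALUE:
            length, pos = read_uleb128(data, pos)
            nested = parse_expr(data[pos:pos + length])   # recurse into the blob
            pos += length
            ops.append(("DW_OP_GNU_entry_value", [nested]))
        else:
            raise ValueError("opcode 0x%02x not handled by this sketch" % opcode)
    return ops


# DW_OP_GNU_entry_value(DW_OP_reg5); DW_OP_stack_value
print(parse_expr(bytes([0xF3, 0x01, 0x55, 0x9F])))
```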

    opened by sevaa 13
  • Lazy parsing

    Recently, I used pyelftools in a debugger and ran into a problem: pyelftools parses much of the debug info even when the debugger is looking for just a few things. For example, when the debugger looks for a DIE describing a variable, the entire CU (all its DIEs) is parsed. When working with a big binary (like the QEMU emulator), loading it in the debugger takes about 5 minutes, which is very annoying for a user.

    The solution I suggest is lazy parsing. This patch series reworks some internals of the library so it only parses requested things.

    Also, because an ELF file is considered constant, there is no reason to parse things twice. So, I also added some caches for parsed entities.
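
The general pattern, parse on first access and reuse the result afterwards, can be sketched with a property that fills a private attribute once. LazyUnit and its 4-byte "entries" are stand-ins invented for this example; the actual patch series reworks pyelftools internals and is not reproduced here.

```python
class LazyUnit:
    """Holds raw bytes and parses them only when entries are first requested."""

    def __init__(self, raw):
        self._raw = raw
        self._entries = None     # nothing parsed yet
        self.parse_count = 0     # for demonstration only

    @property
    def entries(self):
        if self._entries is None:            # first access: parse and remember
            self.parse_count += 1
            self._entries = self._parse()
        return self._entries

    def _parse(self):
        # Stand-in for real parsing: split into 4-byte little-endian integers.
        return [
            int.from_bytes(self._raw[i:i + 4], "little")
            for i in range(0, len(self._raw), 4)
        ]


unit = LazyUnit(bytes([1, 0, 0, 0, 2, 0, 0, 0]))
print(unit.parse_count)              # construction alone does no parsing
print(unit.entries, unit.entries)    # second access reuses the cached list
print(unit.parse_count)
```

Caching like this is only safe because the underlying file is treated as constant, as noted above.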

    pending-user-input 
    opened by laerreal 11
  • Enable parsing of relocations pointed to by DYNAMIC.

    In combination with DynamicSegment, this allows relocations to be processed without a functional section table.

    Example usage:

    import sys

    from elftools.elf.elffile import ELFFile
    from elftools.elf.dynamic import DynamicSegment
    # get_dynamic_reloc_tables is the helper proposed in this PR
    from elftools.elf.relocation import get_dynamic_reloc_tables

    # Keep the file open for as long as the ELFFile is in use.
    with open(sys.argv[1], 'rb') as f:
        elff = ELFFile(f)

        for seg in elff.iter_segments():
            if isinstance(seg, DynamicSegment):
                relos = get_dynamic_reloc_tables(elff.stream, elff, seg)
                for relname in relos:
                    print(relname)
                    for rel in relos[relname].iter_relocations():
                        print(rel)
    
    pending-user-input 
    opened by nneonneo 11
  • Cache instantiation of DWARF structs.

    Hello,

    I've taken an interest in profiling some of my code that uses pyelftools. It turns out that a lot of time is spent instantiating DWARFStructs, even though they are realistically constructed with the same arguments over and over again.

    I decided to run a quick benchmark on the postgres debug symbols I'm working with.

    from elftools.elf.elffile import ELFFile
    import sys
    
    with ELFFile.load_from_path(sys.argv[1].encode('utf8')) as f:
        dwarf_info = f.get_dwarf_info()
        cus = list(dwarf_info.iter_CUs())
        print(f"Number of compile units: {len(cus)}")
    

    On unpatched master:

    # time python /tmp/test.py /var/lib/machines/fedora/usr/lib/debug/usr/bin/postgres-14.3-2.fc36.x86_64.debug  
    Number of compile units: 6229
    
    real    0m16,524s
    user    0m16,309s
    sys     0m0,200s
    

    With this small instantiation cache:

    # time python /tmp/test.py /var/lib/machines/fedora/usr/lib/debug/usr/bin/postgres-14.3-2.fc36.x86_64.debug  
    Number of compile units: 6229
    
    real    0m0,142s
    user    0m0,122s
    sys     0m0,020s
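
The kind of instantiation cache described here can be sketched with functools.lru_cache: a factory keyed on the constructor arguments returns the same object whenever the arguments repeat. Structs and get_structs are hypothetical stand-ins (the real class is DWARFStructs, and its constructor arguments may differ); this is not the patch itself.

```python
from functools import lru_cache


class Structs:
    """Stand-in for an object that is expensive to construct."""

    instances_built = 0

    def __init__(self, little_endian, dwarf_format, address_size):
        Structs.instances_built += 1
        self.little_endian = little_endian
        self.dwarf_format = dwarf_format
        self.address_size = address_size


@lru_cache(maxsize=None)
def get_structs(little_endian, dwarf_format, address_size):
    """Build a Structs once per distinct argument tuple, then reuse it."""
    return Structs(little_endian, dwarf_format, address_size)


# Thousands of CUs typically share a single configuration.
for _ in range(6229):
    get_structs(True, 32, 8)
print("instances built:", Structs.instances_built)
```

This relies on the cached objects being treated as immutable, since every caller with the same arguments receives the same instance.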
    
    opened by rdunklau 10
  • ImportError: cannot import name 'bytes2str' from 'elftools.common.utils'

    I tried to install pyelftools like this:

    pip3 install pyelftools 
    

    I then downloaded the example code dwarf_pubnames_types.py and when I try to run it I get:

    ImportError: cannot import name 'bytes2str' from 'elftools.common.utils' (/usr/local/lib/python3.10/site-packages/elftools/common/utils.py)
    

    When I examine /usr/local/lib/python3.10/site-packages/elftools/common/utils.py, I find that there is no bytes2str.

    But when I look at the utils.py code on GitHub, I see that the function is there.

    elftools/common/utils.py

    So I checked whether my pip3 install of pyelftools is a different version:

    % pip3 show  pyelftools                                                
    Name: pyelftools
    Version: 0.29
    Summary: Library for analyzing ELF files and DWARF debugging information
    Home-page: https://github.com/eliben/pyelftools
    Author: Eli Bendersky
    Author-email: [email protected]
    License: Public domain
    Location: /usr/local/lib/python3.10/site-packages
    Requires: 
    Required-by: 
    

    So this is the latest version, and the same as what I see on GitHub, right?

    How do I solve this?

    opened by nyholku 7
  • I can't get function name with dwarf_decode_address.py

    I built C++ code with clang on Linux. The only DIE tag in my binary is DW_TAG_compile_unit, so decode_funcname() only returns None. Am I missing something?

    opened by LeeGoodThing 3
  • How to get DW_OP_addr from DW_FORM_sec_offset

    I have a script that I have been using for a long time, but now it's getting DW_FORM_sec_offset attributes. It was only made for DW_FORM_exprloc.

    Before, I got the DW_OP_addr by calling DWDESC.describe_DWARF_expr(die.attributes['DW_AT_location'].value, die.cu.structs)

    This doesn't work with DW_FORM_sec_offset, though. I get this error:

    return b''.join(bytes((b,)) for b in bytelist)
    TypeError: 'int' object is not iterable

    This is expected, since the format is different.

    opened by Samuel-Fipps 2
  • Modified file pointer is not reset when _identify function in elffile.py raises exception

    There is a case where the _identify function in elffile.py raises an exception, leaving the file pointer at 4.

    https://github.com/eliben/pyelftools/blob/8a74c8f9ca466de0738b1e94394aac494ff3db39/elftools/elf/elffile.py#L567-L569

    When this code is used with a shared file pointer by code that assumes the pointer is at 0 (for example, when trying to create an ELFFile object), a pointer left at a nonzero position can cause issues. On an exception, the file pointer should be reset to 0 and the exception re-raised.
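
The requested behaviour can be sketched as follows. The example restores whatever position the stream had on entry, a slight generalization of "reset to 0", and then re-raises. identify and NotAnElfError are hypothetical names for illustration, not the pyelftools implementation.

```python
import io


class NotAnElfError(Exception):
    """Raised when the stream does not start with the ELF magic."""


def identify(stream):
    """Check the ELF magic; on failure, restore the stream position."""
    start = stream.tell()            # remember where the caller left the stream
    try:
        magic = stream.read(4)
        if magic != b"\x7fELF":
            raise NotAnElfError("Magic number does not match")
    except Exception:
        stream.seek(start)           # undo our reads before propagating
        raise


shared = io.BytesIO(b"MZ\x90\x00 not an ELF file")
try:
    identify(shared)
except NotAnElfError as exc:
    print("rejected:", exc, "| position:", shared.tell())
```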

    opened by notgriffin 0
  • Is it possible to have the size of an ELF file on disk

    Hi, I was wondering if it's possible to get the size of an ELF file on disk, based on the information in its header. If I understood correctly, e_shoff + (e_shentsize * e_shnum) is not always equal to the size on disk. Thanks!
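
For reference, one way to estimate this is to take the furthest file offset that any header claims: the end of the ELF header, of the program header table, of the section header table, and of every section that occupies file space. The sketch below uses only the struct module, handles little-endian ELF64 only, ignores extended section numbering, and yields a lower bound, since a file may carry trailing data that no header describes. estimate_elf_size is a hypothetical helper, not a pyelftools function.

```python
import struct

EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")   # ELF64 file header, little-endian
SHDR64 = struct.Struct("<IIQQQQIIQQ")         # ELF64 section header
SHT_NOBITS = 8                                # e.g. .bss: takes no space in the file


def estimate_elf_size(data):
    """Lower bound on the file size implied by the headers (ELF64 LE only)."""
    (ident, _type, _machine, _version, _entry, phoff, shoff, _flags,
     ehsize, phentsize, phnum, shentsize, shnum, _shstrndx) = EHDR64.unpack_from(data, 0)
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    end = max(ehsize, phoff + phentsize * phnum, shoff + shentsize * shnum)
    for i in range(shnum):
        fields = SHDR64.unpack_from(data, shoff + i * shentsize)
        sh_type, sh_offset, sh_size = fields[1], fields[4], fields[5]
        if sh_type != SHT_NOBITS:
            end = max(end, sh_offset + sh_size)
    return end


# Synthetic headers: section header table at offset 64, and one section
# whose contents would end at offset 300, past the end of the table.
ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
header = EHDR64.pack(ident, 1, 62, 1, 0, 0, 64, 0, 64, 0, 0, 64, 2, 0)
null_sh = SHDR64.pack(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
text_sh = SHDR64.pack(0, 1, 0, 0, 200, 100, 0, 0, 1, 0)
print(estimate_elf_size(header + null_sh + text_sh))
```

The synthetic example also shows why the section header table alone is not enough: the table ends at offset 192, while a section placed after it extends the file to 300.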

    opened by LafLaurine 1
  • some problems about Section.data()

    Hi, I'm trying to extract assembly info from an object file, so I used ELFFile.get_section_by_name('.text') and Section.data(), like:

    import sys
    sys.path.insert(0, '.')
    from elftools.elf.elffile import ELFFile

    with open("/root/linux-master/drivers/tty/serial/earlycon.o", "rb") as f:
        elf = ELFFile(f, sys.stdout)
        text = elf.get_section_by_name('.text')
        textInfo = text.stream.read()
        print(len(textInfo))
        addr = text['sh_addr']
        code = text.data()
        print(code)

    However, the output content is:

    3219
    b""

    I don't understand why text.data() can't return the assembly info stream. The related file is attached.

    earlycon.zip

    opened by Absoler 1
Owner
Eli Bendersky