x86-64 assembler embedded in Python



Portable Efficient Assembly Code-generator in Higher-level Python (PeachPy)

License: Simplified BSD

PeachPy is a Python framework for writing high-performance assembly kernels.

PeachPy aims to simplify writing optimized assembly kernels while preserving all optimization opportunities of traditional assembly. Some PeachPy features:

  • Universal assembly syntax for Windows, Unix, and Golang assembly.
    • PeachPy can directly generate ELF, MS COFF and Mach-O object files and assembly listings for the Golang toolchain
  • Automatic adaptation of functions to different calling conventions and ABIs.
    • Functions for different platforms can be generated from the same assembly source
    • Supports Microsoft x64 ABI, System V x86-64 ABI (Linux, OS X, and FreeBSD), Linux x32 ABI, Native Client x86-64 SFI ABI, Golang AMD64 ABI, Golang AMD64p32 ABI
  • Automatic register allocation.
    • PeachPy is flexible and lets you mix auto-allocated and hardcoded registers in the same code.
  • Automation of routine tasks in assembly programming:
    • Function prolog and epilog are generated by PeachPy
    • De-duplication of data constants (e.g. Constant.float32x4(1.0))
    • Analysis of ISA extensions used in a function
  • Supports x86-64 instructions up to AVX-512 and SHA
    • Including 3dnow!+, XOP, FMA3, FMA4, TBM and BMI2.
    • Excluding x87 FPU and most system instructions.
    • Rigorously tested with auto-generated tests to produce the same opcodes as binutils.
  • Auto-generation of metadata files
    • Makefile with module dependencies (-MMD and -MF options)
    • C header for the generated functions
    • Function metadata in JSON format
  • Python-based metaprogramming and code-generation.
  • Multiplexing of multiple instruction streams (helpful for software pipelining).
  • Compatible with Python 2 and Python 3, CPython and PyPy.
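
The calling-convention adaptation listed above comes down to knowing, for each ABI, where every argument arrives. The following toy sketch is not PeachPy's implementation: `INT_ARG_REGISTERS`, `SHADOW_SPACE` and `lower_load_argument` are hypothetical names, and only integer and pointer arguments are modelled. It shows how one source-level `LOAD.ARGUMENT` could turn into different instructions under the System V and Microsoft x64 ABIs.

```python
# Integer/pointer argument registers, in order, for two x86-64 ABIs.
INT_ARG_REGISTERS = {
    "sysv": ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
    "ms": ["rcx", "rdx", "r8", "r9"],
}
# The Microsoft x64 ABI reserves 32 bytes of shadow space above the return address.
SHADOW_SPACE = {"sysv": 0, "ms": 32}


def lower_load_argument(abi, index, dest):
    """Lower a hypothetical LOAD.ARGUMENT(dest, argument #index) for `abi`."""
    registers = INT_ARG_REGISTERS[abi]
    if index < len(registers):
        source = registers[index]
        # Argument arrives in a register: one MOV, or nothing if it is already there.
        return [] if source == dest else [f"MOV {dest}, {source}"]
    # Stack-passed argument, addressed relative to rsp at function entry
    # (before any prologue): skip the 8-byte return address and shadow space.
    offset = 8 + SHADOW_SPACE[abi] + 8 * (index - len(registers))
    return [f"MOV {dest}, [rsp + {offset}]"]


for abi in ("sysv", "ms"):
    print(abi, lower_load_argument(abi, 0, "rax"), lower_load_argument(abi, 4, "rax"))
# sysv ['MOV rax, rdi'] ['MOV rax, r8']
# ms ['MOV rax, rcx'] ['MOV rax, [rsp + 40]']
```

Because the lowering only happens once the function is finalized for a concrete ABI, the same source can target both platforms. A real implementation also has to handle floating-point arguments, which the two ABIs assign to XMM registers differently.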

Online Demo

You can try an online demo on PeachPy.IO.


Installation

PeachPy is actively developed, and thus there are presently no stable releases of the 0.2 branch. We recommend that you use the master version:

pip install --upgrade git+https://github.com/Maratyszcza/PeachPy

Installation for development

If you plan to modify PeachPy, we recommend the following installation procedure:

git clone https://github.com/Maratyszcza/PeachPy.git
cd PeachPy
python setup.py develop

Using PeachPy as a command-line tool

# These two lines are not needed for PeachPy, but will help you get autocompletion in good code editors
from peachpy import *
from peachpy.x86_64 import *

# Let's write a function float DotProduct(const float* x, const float* y)

# If you want maximum cross-platform compatibility, arguments must have names
x = Argument(ptr(const_float_), name="x")
# If name is not specified, it is auto-detected
y = Argument(ptr(const_float_))

# Everything inside the `with` statement is the function body
with Function("DotProduct", (x, y), float_,
  # Enable instructions up to SSE4.2
  # PeachPy will report an error if you accidentally use a newer instruction
  target=uarch.default + isa.sse4_2):

  # Request two 64-bit general-purpose registers. No need to specify exact names.
  reg_x, reg_y = GeneralPurposeRegister64(), GeneralPurposeRegister64()

  # This is a cross-platform way to load arguments. PeachPy will map it to something proper later.
  LOAD.ARGUMENT(reg_x, x)
  LOAD.ARGUMENT(reg_y, y)

  # Also request a virtual 128-bit SIMD register...
  xmm_x = XMMRegister()
  # ...and fill it with data
  MOVAPS(xmm_x, [reg_x])
  # It is fine to mix virtual and physical (xmm0-xmm15) registers in the same code
  MOVAPS(xmm2, [reg_y])

  # Execute dot product instruction, put result into xmm_x
  DPPS(xmm_x, xmm2, 0xF1)

  # This is a cross-platform way to return results. PeachPy will take care of ABI specifics.
  RETURN(xmm_x)
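
In the kernel above, the immediate `0xF1` tells `DPPS` to multiply all four single-precision lanes (high nibble `0xF`) and to store the sum only in lane 0 (low nibble `0x1`), zeroing the other lanes. The kernel therefore computes the dot product of exactly four floats, and `MOVAPS` additionally requires both pointers to be 16-byte aligned. A minimal pure-Python model of the instruction follows; `dpps` is a hypothetical helper, and because Python floats are doubles its rounding can differ slightly from the hardware's single-precision result.

```python
def dpps(a, b, imm8):
    """Model of SSE4.1 DPPS on two 4-lane vectors (lists of 4 floats)."""
    # High nibble (bits 4-7): which lanes take part in the multiplication.
    products = [a[i] * b[i] if imm8 & (0x10 << i) else 0.0 for i in range(4)]
    # The products are summed pairwise: (p0 + p1) + (p2 + p3).
    total = (products[0] + products[1]) + (products[2] + products[3])
    # Low nibble (bits 0-3): which destination lanes receive the sum; others become 0.
    return [total if imm8 & (1 << i) else 0.0 for i in range(4)]


x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 6.0, 7.0, 8.0]
print(dpps(x, y, 0xF1))  # [70.0, 0.0, 0.0, 0.0]
```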

Save the DotProduct code as example.py; then you can compile it into a binary object file that you can link into a program:

# Use MS-COFF format with Microsoft ABI for Windows
python -m peachpy.x86_64 -mabi=ms -mimage-format=ms-coff -o example.obj example.py
# Use Mach-O format with SysV ABI for OS X
python -m peachpy.x86_64 -mabi=sysv -mimage-format=mach-o -o example.o example.py
# Use ELF format with SysV ABI for Linux x86-64
python -m peachpy.x86_64 -mabi=sysv -mimage-format=elf -o example.o example.py
# Use ELF format with x32 ABI for Linux x32 (x86-64 with 32-bit pointers)
python -m peachpy.x86_64 -mabi=x32 -mimage-format=elf -o example.o example.py
# Use ELF format with Native Client x86-64 ABI for Chromium x86-64
python -m peachpy.x86_64 -mabi=nacl -mimage-format=elf -o example.o example.py

What else? You can convert the program to Plan 9 assembly for use with the Go programming language:

# Use Go ABI (asm version) with -S flag to generate assembly for Go x86-64 targets
python -m peachpy.x86_64 -mabi=goasm -S -o example_amd64.s example.py
# Use Go-p32 ABI (asm version) with -S flag to generate assembly for Go x86-64 targets with 32-bit pointers
python -m peachpy.x86_64 -mabi=goasm-p32 -S -o example_amd64p32.s example.py

If Plan 9 assembly is too restrictive for your use case, generate .syso objects, which can be linked into Go programs:

# Use Go ABI (syso version) to generate .syso objects for Go x86-64 targets
# Image format can be any (ELF/Mach-O/MS-COFF)
python -m peachpy.x86_64 -mabi=gosyso -mimage-format=elf -o example_amd64.syso example.py
# Use Go-p32 ABI (syso version) to generate .syso objects for Go x86-64 targets with 32-bit pointers
# Image format can be any (ELF/Mach-O/MS-COFF)
python -m peachpy.x86_64 -mabi=gosyso-p32 -mimage-format=elf -o example_amd64p32.syso example.py

See examples for real-world scenarios of using PeachPy with make, nmake and go generate tools.

Using PeachPy as a Python module

When the command-line tool does not provide sufficient flexibility, Python scripts can import PeachPy objects from the peachpy and peachpy.x86_64 modules and do arbitrary manipulations on output images, program structure, instructions, and bytecodes.

PeachPy as Inline Assembler for Python

PeachPy links assembly and Python: it represents assembly instructions and syntax as Python classes, functions, and objects. But it also works the other way around: PeachPy can represent your assembly functions as callable Python functions!

from peachpy import *
from peachpy.x86_64 import *

x = Argument(int32_t)
y = Argument(int32_t)

with Function("Add", (x, y), int32_t) as asm_function:
    reg_x = GeneralPurposeRegister32()
    reg_y = GeneralPurposeRegister32()

    LOAD.ARGUMENT(reg_x, x)
    LOAD.ARGUMENT(reg_y, y)

    ADD(reg_x, reg_y)

    RETURN(reg_x)

python_function = asm_function.finalize(abi.detect()).encode().load()

print(python_function(2, 2)) # -> prints "4"
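
`abi.detect()` chooses the ABI from the host. The check quoted in one of the issues below (`osname == "Windows" and machine == "AMD64" and pointer_size == 8`) implies that the pointer size of the running interpreter matters, so a 32-bit Python on 64-bit Windows gets no x86-64 ABI. The following is a minimal standard-library sketch of that kind of check; `detect_abi_name` and its return strings are hypothetical and not PeachPy's API, and the Linux x32 and Native Client cases are not modelled.

```python
import platform
import struct


def detect_abi_name(osname=None, machine=None, pointer_size=None):
    """Guess an x86-64 ABI name from the host, in the spirit of abi.detect()."""
    osname = osname or platform.system()
    machine = (machine or platform.machine()).lower()
    # Size of a C pointer in the *running interpreter*, not in the OS.
    pointer_size = pointer_size or struct.calcsize("P")
    if machine not in ("x86_64", "amd64") or pointer_size != 8:
        return None  # e.g. 32-bit Python on 64-bit Windows
    if osname == "Windows":
        return "ms"
    if osname in ("Linux", "Darwin", "FreeBSD"):
        return "sysv"
    return None


print(detect_abi_name())                        # depends on the host
print(detect_abi_name("Windows", "AMD64", 4))   # None
```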

PeachPy as Instruction Encoder

PeachPy can be used to explore instruction length, opcodes, and alternative encodings:

from peachpy.x86_64 import *

ADD(eax, 5).encode() # -> bytearray(b'\x83\xc0\x05')

MOVAPS(xmm0, xmm1).encode_options() # -> [bytearray(b'\x0f(\xc1'), bytearray(b'\x0f)\xc8')]

VPSLLVD(ymm0, ymm1, [rsi + 8]).encode_length_options() # -> {6: bytearray(b'\xc4\xe2uGF\x08'),
                                                       #     7: bytearray(b'\xc4\xe2uGD&\x08'),
                                                       #     9: bytearray(b'\xc4\xe2uG\x86\x08\x00\x00\x00')}
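
The first two results above follow from the ModR/M byte: for register-to-register forms, `mod` is `0b11`, the `reg` field holds a register number or an opcode extension, and `rm` holds the other register. The sketch below rebuilds those bytes by hand. It is a simplification, not PeachPy's encoder: the helper names are hypothetical, and only the eight legacy registers are handled, so no REX or VEX prefixes are needed. `ADD r/m32, imm8` is opcode `0x83` with extension `/0`, and `MOVAPS` has both a load form (`0F 28 /r`) and a store form (`0F 29 /r`), which is why two encoding options exist for a register-to-register move.

```python
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
XMM = {f"xmm{i}": i for i in range(8)}


def modrm(mod, reg, rm):
    """Pack the ModR/M byte: mod (2 bits), reg (3 bits), rm (3 bits)."""
    return (mod << 6) | (reg << 3) | rm


def encode_add_r32_imm8(dst, imm):
    """ADD r32, imm8 as 83 /0 ib (register destination only)."""
    if not -128 <= imm <= 127:
        raise ValueError("imm8 must fit in a signed byte")
    # The reg field carries the opcode extension /0, which selects ADD.
    return bytearray([0x83, modrm(0b11, 0, REG32[dst]), imm & 0xFF])


def encode_movaps_options(dst, src):
    """Both valid encodings of MOVAPS xmm, xmm."""
    d, s = XMM[dst], XMM[src]
    load_form = bytearray([0x0F, 0x28, modrm(0b11, d, s)])   # MOVAPS xmm, xmm/m128
    store_form = bytearray([0x0F, 0x29, modrm(0b11, s, d)])  # MOVAPS xmm/m128, xmm
    return [load_form, store_form]


print(encode_add_r32_imm8("eax", 5))          # bytearray(b'\x83\xc0\x05')
print(encode_movaps_options("xmm0", "xmm1"))  # [bytearray(b'\x0f(\xc1'), bytearray(b'\x0f)\xc8')]
```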



Users

  • NNPACK -- an acceleration layer for convolutional networks on multi-core CPUs.
  • ChaCha20 -- Go implementation of ChaCha20 cryptographic cipher.
  • AEZ -- Go implementation of AEZ authenticated-encryption scheme.
  • bp128 -- Go implementation of SIMD-BP128 integer encoding and decoding.
  • go-marvin32 -- Go implementation of Microsoft's Marvin32 hash function.
  • go-highway -- Go implementation of Google's Highway hash function.
  • go-metro -- Go implementation of MetroHash function.
  • go-stadtx -- Go implementation of Stadtx hash function.
  • go-sip13 -- Go implementation of SipHash 1-3 function.
  • go-chaskey -- Go implementation of Chaskey MAC.
  • go-speck -- Go implementation of SPECK cipher.
  • go-bloomindex -- Go implementation of Bloom-filter based search index.
  • go-groupvariant -- SSE-optimized group varint integer encoding in Go.
  • Yeppp! -- a performance library. All optimized kernels in Yeppp! are implemented in PeachPy (using an old version of PeachPy with deprecated syntax).

Peer-Reviewed Publications

  • Marat Dukhan "PeachPy: A Python Framework for Developing High-Performance Assembly Kernels", Python for High-Performance Computing (PyHPC) 2013 (slides, paper, code uses deprecated syntax)
  • Marat Dukhan "PeachPy meets Opcodes: Direct Machine Code Generation from Python", Python for High-Performance Computing (PyHPC) 2015 (slides, paper on ACM Digital Library).

Dependencies


  • Nearly all instruction classes in PeachPy are generated from Opcodes Database
  • Instruction encodings in PeachPy are validated against binutils using auto-generated tests
  • PeachPy uses six and enum34 packages as a compatibility layer between Python 2 and Python 3


Acknowledgements

This work is a research project at the HPC Garage lab in the Georgia Institute of Technology, College of Computing, School of Computational Science and Engineering.

The work was supported in part by grants to Prof. Richard Vuduc's research lab, The HPC Garage, from the National Science Foundation (NSF) under NSF CAREER award number 0953100; and a grant from the Defense Advanced Research Projects Agency (DARPA) Computer Science Study Group program.

Any opinions, conclusions or recommendations expressed in this software and documentation are those of the authors and do not necessarily reflect those of NSF or DARPA.

Issues and pull requests

  • How do we debug C/C++ Application

    I have a question: say I am developing an application in C/C++ and write part of the code in assembly. If I use PeachPy to write this assembly code, how am I going to debug it?

    In Visual Studio this was possible with the yasm assembler. Can something similar be done in PeachPy?

    How is PeachPy different from using intrinsics? Thank you.

    opened by kvaragan 13
  • Go: `uintptr` support?

    It seems to me that this can map directly to uintptr_t?

    diff --git a/peachpy/x86_64/function.py b/peachpy/x86_64/function.py
    index f4614d6..00c81b9 100644
    --- a/peachpy/x86_64/function.py
    +++ b/peachpy/x86_64/function.py
    @@ -130,6 +130,8 @@ class Function:
                     return "boolean"
                 elif c_type.is_size_integer:
                     return "int" if c_type.is_signed_integer else "uint"
    +            elif c_type.is_pointer_integer:
    +                return "uintptr"
                 elif c_type.is_signed_integer:
                     return {
                         1: "int8",

    My rationale for wanting this is to make it easier to have slices and structs as arguments.

    A slice header ( https://golang.org/pkg/reflect/#SliceHeader ) can be simulated by having three arguments,

    s = Argument(ptr(uintptr_t))
    s_len = Argument(int64_t)
    s_cap = Argument(int64_t)

    (I know the documentation says to use ptrdiff_t to get Go's int, but that really makes the code confusing... In my code I'm using int64 explicitly because I know I'm on a 64-bit platform.)

    However, due to the lack of mapping for uintptr, this fails when trying to generate the commented function header -- I have to put an explicit integer type for s.

    Similarly if I want to pass a pointer to a struct, using uintptr_t seems the only sane option.

    opened by dgryski 12
  • rsp is not exposed through x86_64/__init__.py

    I noticed that x86_64/__init__.py exposes registers from registers.py for easier access, but not rsp. Is there a specific reason for this, or is it just an oversight? I'll be happy to submit a fix PR if applicable.

    opened by eliben 12
  • How do I use the IDIV instruction? (also: Reserving the AX register)

    The IDIV instruction puts its result in the AX register. The problem I have is that I can't figure out how to prevent GeneralPurposeRegister64 from allocating AX, so my pseudoregisters end up overwriting the input and result.

    How do I prevent pseudoregisters from allocating AX?

    opened by pwaller 12
  • Make PeachPy pip installable

    This patch makes setuptools a dependency. This is needed to make setup.py automatically make the Opcodes module available at setup time.

    Opcodes is made into a setup-time dependency, and if the module is installed with python setup.py build or python setup.py develop or pip install PeachPy or pip install --editable PeachPy, then the python setup.py generate command is run automatically.

    This patch depends on https://github.com/Maratyszcza/Opcodes/pull/5 because of the way setuptools handles setup-time dependencies. They are not installed properly, but made available as an .egg file. Opcodes, before Maratyszcza/Opcodes#5, tries to read the XML descriptions by opening a file relative to __file__, which cannot be read with open() inside an .egg. Instead, it has to use pkg_resources to do so.

    With these two PRs merged, it should be possible to pip install git+https://github.com/Maratyszcza/PeachPy. We could also then discuss making it easy to package up and upload to PyPI.

    opened by pwaller 10
  • 32 bit Python 3.5 in a 64 bit Windows 8.1 machine doesn't detect() abi correctly.

    x86_64\abi.py Line 134:

        elif osname == "Windows" and machine == "AMD64" and pointer_size == 8:
            return microsoft_x64_abi

    But when I get here on my Windows 8.1 64 bit, running Python35 (32 bit Python35 BTW.. probably should install 64 bit) it skips this abi, because my pointer_size = 4, not 8.

    ..not sure if running a 32 bit Python on a 64 bit machine is even something you want to deal with.

    I didn't mean to install 32 bit Python, and will correct in a bit - let me know if you want me to flog PeachPy in this environment a bit more before I correct and I will.

    opened by cforger 10
  • Liveness analysis does not appear to work correctly

    Following on from #60, consider the following simple program. It moves a value into a general purpose register and subsequently returns that value. In the meantime, it explicitly uses rax. My expectation is that the liveness analysis should determine that rax is in use and therefore the GeneralPurposeRegister64 register r should not choose rax for the allocation.

    input_ptr = Argument(ptr())
    with Function("foo", (input_ptr,), int64_t)  as function:
        name = LOAD.ARGUMENT(r15, input_ptr)
        r = GeneralPurposeRegister64()
        MOV(r, 1234)
        MOV(rax, 4567)
        MOV([r15], rax)
        RETURN(r)

    The resulting code I get:

    // Generated by PeachPy 0.2.0 from f.py
    // func foo(input_ptr uintptr) int64
    TEXT ·foo(SB),4,$0-16
    	MOVQ input_ptr+0(FP), R15
    	MOVQ $1234, AX
    	MOVQ $4567, AX
    	MOVQ AX, 0(R15)
    	MOVQ AX, ret+8(FP)

    I have instrumented the liveness analysis code over at my instrument branch. The output of that instrumentation is as follows.

    'LOAD.ARGUMENT r15, void* input_ptr' in_mask {}; out_mask {15: 15}
    'MOV gp64-vreg<1>, 1234' in_mask {}; out_mask {-1: 15}
    'MOV rax, 4567' in_mask {}; out_mask {0: 15}
    'MOV [r15], rax' in_mask {0: 15, 15: 15}; out_mask {}
    'RETURN gp64-vreg<1>' in_mask {-1: 15}; out_mask {}
    analyze_availability -> defaultdict(<class 'int'>, {0: 15, -1: 15, 15: 15})
    Consider block [0, 5)
    analyze_liveness [0, 5) {}
    live_registers_list() masks: defaultdict(<class 'int'>, {})
    live_registers_list [0, 5) -> [defaultdict(<class 'int'>, {-1: 15}), defaultdict(<class 'int'>, {0: 15, -1: 15, 15: 15}), defaultdict(<class 'int'>, {-1: 15, 15: 15}), defaultdict(<class 'int'>, {15: 15}), defaultdict(<class 'int'>, {})]
    _analize availability: LOAD.ARGUMENT r15, void* input_ptr -> live_registers: defaultdict(<class 'int'>, {})
    _analize availability: MOV gp64-vreg<1>, 1234 -> live_registers: defaultdict(<class 'int'>, {15: 15})
    _analize availability: MOV rax, 4567 -> live_registers: defaultdict(<class 'int'>, {-1: 15, 15: 15})
    _analize availability: MOV [r15], rax -> live_registers: defaultdict(<class 'int'>, {0: 15, -1: 15, 15: 15})
    _analize availability: RETURN gp64-vreg<1> -> live_registers: defaultdict(<class 'int'>, {-1: 15})
    _analize (conflicts)  'LOAD.ARGUMENT r15, void* input_ptr' inp: set(); out: {r15}; unalloc: []; live: defaultdict(<class 'int'>, {})
    _analize (conflicts)  'MOV gp64-vreg<1>, 1234' inp: set(); out: {gp64-vreg<1>}; unalloc: []; live: defaultdict(<class 'int'>, {15: 15})
    _analize (conflicts)  'MOV rax, 4567' inp: set(); out: {rax}; unalloc: [gp64-vreg<1>]; live: defaultdict(<class 'int'>, {-1: 15, 15: 15})
        gp64-vreg<1>::conflict_internal_ids: [-1, 15]
          Conflicts for virtual 1 are now {1: {-1, 15}}
    _analize (conflicts)  'MOV [r15], rax' inp: {r15, rax}; out: set(); unalloc: []; live: defaultdict(<class 'int'>, {0: 15, -1: 15, 15: 15})
    _analize (conflicts)  'RETURN gp64-vreg<1>' inp: {gp64-vreg<1>}; out: set(); unalloc: [gp64-vreg<1>]; live: defaultdict(<class 'int'>, {-1: 15})
        gp64-vreg<1>::conflict_internal_ids: [-1]
          Conflicts for virtual 1 are now {1: {-1, 15}}
    Function._bind_registers gp64-vreg<1> to None
    Function._bind_registers gp64-vreg<1> to None
    RegisterAllocator._bind_register: Allocate virtual 1 to physical 0
    RegisterAllocator._bind_register: Considered conflicts {-1, 15}
    	LOAD.ARGUMENT r15, void* input_ptr
    	MOV rax, 1234
    	MOV rax, 4567
    	MOV [r15], rax

    I note that the list of conflicts does not change if I change the register I move 4567 into. If I do MOV(rbx, 4567) instead, I still get Conflicts for virtual 1 are now {1: {-1, 15}}. I don't understand the representations and intent of the register allocation code well enough to diagnose why, but I would expect the conflict list to depend on which register is named, and that does not seem to be the case.

    opened by pwaller 10
  • Getting "ImportError: No module named x86_64" when running simple code

    I have this simple code:

    import peachpy
    import peachpy.x86_64
    x = peachpy.Argument(peachpy.int64_t)
    with peachpy.x86_64.Function("Add4", (x,), peachpy.int64_t) as asm_function:
        peachpy.x86_64.LOAD.ARGUMENT(peachpy.x86_64.rax, x)
        peachpy.x86_64.ADD(peachpy.x86_64.rax, 4)
    abi = peachpy.x86_64.abi.detect()
    encoded_function = asm_function.finalize(abi).encode()
    python_function = encoded_function.load()
    # The JIT call

    When I run this code, I get this error:

    Traceback (most recent call last):
      File "C:/Users/Oround/Desktop/Tests.py", line 2, in <module>
        import peachpy.x86_64
    ImportError: No module named x86_64

    Why am I getting this error? Is there something I could do to fix it?

    opened by bifunctor 9
  • Mixing virtual and physical registers may produce bugs

    If a virtual register is allocated before physical registers are used, PeachPy can bind it to the same physical register that the code also uses explicitly.

    opened by Maratyszcza 9
  • Why is PUSH(r32/m32) not supported?

    Error: "Invalid operand types: PUSH r32"

    From the source code:

    class PUSH(Instruction):
        """Push Value Onto the Stack"""
        def __init__(self, *args, **kwargs):
            """Supported forms:
                * PUSH(imm32)
                * PUSH(r16/m16)
                * PUSH(r64/m64)

    Is it intended that PUSH(r32/m32) is missing? Same for POP

    opened by Ou7law007 0
  • PyPI page is outdated

    It has the version 0.0.1 dated 2013, and there is no 0.0.1 here on GitHub while it was last updated recently.

    opened by yurivict 0
  • ROL: doesn't require 2 operands

    In the current implementation, ROL r (with no count operand) is not supported; in x86 assembly this form rotates by 1. The same probably applies to RCL, RCR and ROR.

    opened by kriskwiatkowski 0
  • Make the output of codecode/x86_64.py reproducible

    Whilst working on the Reproducible Builds effort I noticed that PeachPy was generating code/documentation in a non-deterministic manner. This is because it was iterating over data structures in a non-deterministic order, which resulted in (non-normative) differences such as:

      -<li><p>VEXPANDPD(ymm{k}{z}, ymm/m256)    [AVX512F and AVX512VL]</p></li>
       <li><p>VEXPANDPD(xmm{k}{z}, xmm/m128)    [AVX512VL]</p></li>
      +<li><p>VEXPANDPD(ymm{k}{z}, ymm/m256)    [AVX512F and AVX512VL]</p></li>


      -        self._implicit_out_regs = {0: 7, 2: 7}
      +        self._implicit_out_regs = {2: 7, 0: 7}

    This PR just sorts a bunch of stuff prior to output.

    (This was originally filed in Debian as #964186.)

    opened by lamby 0
  • RIP call

    Hi, I'm trying to assemble a position independent piece of code. I'm trying to create a CALL with a relative pointer using PC/IP:

    from peachpy.x86_64 import *
    from peachpy.x86_64.registers import rip
    call = CALL([rip+8])

    and I'm getting:

    ~/.local/lib/python3.6/site-packages/peachpy/x86_64/generic.py in __init__(self, *args, **kwargs)
       9875             origin = inspect.stack()
       9876         super(CALL, self).__init__("CALL", origin=origin, prototype=prototype)
    -> 9877         self.operands = tuple(map(check_operand, args))
       9878         if len(self.operands) != 1:
       9879             raise SyntaxError("Instruction \"CALL\" requires 1 operands")
    ~/.local/lib/python3.6/site-packages/peachpy/x86_64/operand.py in check_operand(operand)
         26         if len(operand) != 1:
         27             raise ValueError("Memory operands must be represented by a list with only one element")
    ---> 28         return MemoryOperand(operand[0])
         29     elif isinstance(operand, Constant):
         30         from copy import copy, deepcopy
    ~/.local/lib/python3.6/site-packages/peachpy/x86_64/operand.py in __init__(self, address, size, mask, broadcast)
        249             isinstance(address.register, (XMMRegister, YMMRegister, ZMMRegister)) and \
        250             not address.mask.is_zeroing, \
    --> 251             "Only MemoryAddress, 64-bit general-purpose registers, XMM/YMM/ZMM registers, " \
        252             "and merge-masked XMM/YMM/ZMM registers may be specified as an address"
        253         from peachpy.util import is_int
    AssertionError: Only MemoryAddress, 64-bit general-purpose registers, XMM/YMM/ZMM registers, and merge-masked XMM/YMM/ZMM registers may be specified as an address

    Am I doing something wrong?

    opened by tsarpaul 2
  • python2 to python3 fixes nose tests failed due python2 specific code …

    nose tests failed due to Python 2-specific code; the conversion was done with 2to3.

    opened by mslacken 0
  • How do you run the go-generate example code? (newbie)

    Apologies if this is the wrong forum for a newbie question. My problem is more about how to use go assembler than python. I have gotten stuck trying to compile and run the output of PeachPy in the go-generate example:

    [email protected]:~/Documents/PeachPy/examples/go-generate$ go generate dot_product.go 
    [email protected]:~/Documents/PeachPy/examples/go-generate$ ls
    dot_product_amd64.s  dot_product.go  dot_product.py  dot_product_test.go  main.go

    I tried many different go build and go run commands with no success yet. To get the main.go to run I had to change "package blas" to "package main" in all files and rename the folder from "go-generate" to "main". Finally I could run the main function with "go build -x .". Sadly the results seem to be incorrect, so I guess I am doing something wrong still:

    [email protected]:~/Documents/PeachPy/examples/main$ git diff main.go
    diff --git a/examples/main/main.go b/examples/main/main.go
    index 18f6e32..80af7e1 100644
    --- a/examples/main/main.go
    +++ b/examples/main/main.go
    @@ -1,17 +1,21 @@
    -package blas
    +package main
     import "fmt"
     func main() {
    +       var d float32
            x := make([]float32, 2048)
            y := make([]float32, len(x))
    +       d = 0
            for i := 0; i < len(x); i++ {
                    x[i] = 2.0
                    y[i] = 3.0
    +               d += x[i] * y[i]
            z := DotProduct(&x[0], &y[0], uint(len(x)))
            fmt.Println("hello world")
    -       fmt.Println("z =", z)
    +       fmt.Println("z =", z, "d =", d)
    [email protected]:~/Documents/PeachPy/examples/main$ go build -x .
    mkdir -p $WORK/b001/
    cat >$WORK/b001/importcfg.link << 'EOF' # internal
    packagefile _/home/jon/Documents/PeachPy/examples/main=/home/jon/.cache/go-build/ff/ff2e6ec9f76a30991be7715da37e3f2bb5e4bada2dfb76a5c6f6c282c58359c5-d
    packagefile fmt=/usr/lib/go-1.10/pkg/linux_amd64/fmt.a
    packagefile runtime=/usr/lib/go-1.10/pkg/linux_amd64/runtime.a
    packagefile errors=/usr/lib/go-1.10/pkg/linux_amd64/errors.a
    packagefile io=/usr/lib/go-1.10/pkg/linux_amd64/io.a
    packagefile math=/usr/lib/go-1.10/pkg/linux_amd64/math.a
    packagefile os=/usr/lib/go-1.10/pkg/linux_amd64/os.a
    packagefile reflect=/usr/lib/go-1.10/pkg/linux_amd64/reflect.a
    packagefile strconv=/usr/lib/go-1.10/pkg/linux_amd64/strconv.a
    packagefile sync=/usr/lib/go-1.10/pkg/linux_amd64/sync.a
    packagefile unicode/utf8=/usr/lib/go-1.10/pkg/linux_amd64/unicode/utf8.a
    packagefile runtime/internal/atomic=/usr/lib/go-1.10/pkg/linux_amd64/runtime/internal/atomic.a
    packagefile runtime/internal/sys=/usr/lib/go-1.10/pkg/linux_amd64/runtime/internal/sys.a
    packagefile sync/atomic=/usr/lib/go-1.10/pkg/linux_amd64/sync/atomic.a
    packagefile internal/cpu=/usr/lib/go-1.10/pkg/linux_amd64/internal/cpu.a
    packagefile internal/poll=/usr/lib/go-1.10/pkg/linux_amd64/internal/poll.a
    packagefile internal/testlog=/usr/lib/go-1.10/pkg/linux_amd64/internal/testlog.a
    packagefile syscall=/usr/lib/go-1.10/pkg/linux_amd64/syscall.a
    packagefile time=/usr/lib/go-1.10/pkg/linux_amd64/time.a
    packagefile unicode=/usr/lib/go-1.10/pkg/linux_amd64/unicode.a
    packagefile internal/race=/usr/lib/go-1.10/pkg/linux_amd64/internal/race.a
    mkdir -p $WORK/b001/exe/
    cd .
    BUILD_PATH_PREFIX_MAP='/tmp/go-build=$WORK:' /usr/lib/go-1.10/pkg/tool/linux_amd64/link -o $WORK/b001/exe/a.out -importcfg $WORK/b001/importcfg.link -buildmode=exe -buildid=QZKile1wdIBBkCwbsWwa/66eWng91NslShieKxUh9/4qjNxkXklLxRsN2b-BAg/QZKile1wdIBBkCwbsWwa -extld=gcc /home/jon/.cache/go-build/ff/ff2e6ec9f76a30991be7715da37e3f2bb5e4bada2dfb76a5c6f6c282c58359c5-d
    /usr/lib/go-1.10/pkg/tool/linux_amd64/buildid -w $WORK/b001/exe/a.out # internal
    mv $WORK/b001/exe/a.out main
    rm -r $WORK/b001/
    [email protected]:~/Documents/PeachPy/examples/main$ ./main
    hello world
    z = 5.252114e+21 d = 12288
    [email protected]:~/Documents/PeachPy/examples/main$ go test  -bench .
    goos: linux
    goarch: amd64
    BenchmarkDotProduct_L1_PeachPy-4   	10000000	       170 ns/op
    BenchmarkDotProduct_L2_PeachPy-4   	  500000	      3043 ns/op
    BenchmarkDotProduct_L3_PeachPy-4   	   50000	     35891 ns/op
    BenchmarkDotProduct_L1_Go-4        	 1000000	      1647 ns/op
    BenchmarkDotProduct_L2_Go-4        	  100000	     13089 ns/op
    BenchmarkDotProduct_L3_Go-4        	   10000	    104663 ns/op

    Could you help me? I am missing the exact series of "go" commands (or other) that need to be run in a shell in order to run the code, run the tests and verify that the output is correct.

    Adding the main function code into the test code and running "go test" lets me use the original folder, but the answer is still incorrect:

    [email protected]:~/Documents/PeachPy/examples/go-generate$ go test  .
    --- FAIL: TestDotProduct (0.00s)
    	dot_product_test.go:20: Dot incorrect, got: 5252114210988121653248.000000 want: 12288.000000
    FAIL	_/home/jon/Documents/PeachPy/examples/go-generate	0.001s
    [email protected]:~/Documents/PeachPy/examples/go-generate$ git diff dot_product_test.go
    diff --git a/examples/go-generate/dot_product_test.go b/examples/go-generate/dot_product_test.go
    index d7f24b4..e86a9e2 100644
    --- a/examples/go-generate/dot_product_test.go
    +++ b/examples/go-generate/dot_product_test.go
    @@ -4,6 +4,22 @@ import (
    +func TestDotProduct(t *testing.T ){
    +        var N int
    +       N = 2048
    +       x := make([]float32, N)
    +       y := make([]float32, N)
    +       for i:=0 ; i < N; i++ {
    +           x[i] = 2.
    +           y[i] = 3.
    +       }
    +       dasm := DotProduct( &x[0], &y[0], uint(N) )
    +       dtrue := float32( N ) * 6
    +       if dasm != dtrue {
    +          t.Errorf("Dot incorrect, got: %f want: %f",dasm,dtrue)
    +       }

    It could also be an error in the generated asm code perhaps, but I was sort of assuming this should be OK...

    Many thanks in advance for your help!

    opened by jonwright 0
  • Deterministic output

    Every time I regenerate the assembly for https://github.com/dgryski/go-highway with python3, some of the labels change. This makes it difficult to isolate a small change in the source file with a small change in the output.

    opened by dgryski 1
  • Slow function finalize/encode

    I’m generating about 1000 test cases, a few hundred instructions each. This process takes 195 seconds on my PC. According to cProfile 64% of the time is spent in deepcopy, which is quite a bit. Below is a full call graph generated with gprof2dot:


    Apparently each of the two steps (ABIFunction, EncodedFunction) deep-copies all instruction objects, which are quite heavy. If deepcopy() could somehow be avoided, the whole process would be a lot faster.

    opened by PromyLOPh 1
  • Add missing XLATB group/import

    Add missing XLATB group/import

    Have you thought about dropping x86_64.json (i.e. dump everything into a single file) and using import *? It took me a while to understand why newly added instructions (opcodes) did not generate classes until I closely looked at the codegen.

    opened by PromyLOPh 2
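
The conflict expected in the liveness-analysis report above can be checked with a textbook backward liveness pass over the five straight-line instructions from that issue. This is a minimal sketch, not PeachPy's register allocator: registers are plain strings, `r` stands for the virtual register `gp64-vreg<1>`, and each instruction is reduced to the registers it defines and uses.

```python
# Straight-line code from the issue: (text, registers defined, registers used).
PROGRAM = [
    ("LOAD.ARGUMENT r15, input_ptr", {"r15"}, set()),
    ("MOV r, 1234", {"r"}, set()),
    ("MOV rax, 4567", {"rax"}, set()),
    ("MOV [r15], rax", set(), {"r15", "rax"}),  # a store: reads r15 and rax, writes memory
    ("RETURN r", set(), {"r"}),
]


def live_out_sets(program):
    """Backward pass: registers live immediately after each instruction."""
    live = set()
    result = [set() for _ in program]
    for i in reversed(range(len(program))):
        _, defs, uses = program[i]
        result[i] = set(live)
        live = (live - defs) | uses  # live-in = (live-out minus defs) plus uses
    return result


def interference(program):
    """A register defined by an instruction conflicts with everything live after it."""
    edges = {}
    for (_, defs, _), live_out in zip(program, live_out_sets(program)):
        for d in defs:
            for other in live_out - {d}:
                edges.setdefault(d, set()).add(other)
                edges.setdefault(other, set()).add(d)
    return edges


conflicts = interference(PROGRAM)
print(sorted(conflicts["r"]))  # ['r15', 'rax']
```

Because `r` is still live when `MOV rax, 4567` writes `rax`, the two interfere, so an allocator honoring this graph could not assign `rax` to `r`, which is the behavior the reporter expected.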
Marat Dukhan