A compiler for ARM, X86, MSP430, xtensa and more implemented in pure Python

Overview

Introduction

The PPCI (Pure Python Compiler Infrastructure) project is a compiler written entirely in the Python programming language. It contains front-ends for various programming languages as well as machine code generation functionality. With this library you can generate (working!) machine code using Python, which makes the compiler easy to explore and extend.

The project contains:

  • Language frontends for C, Python, Pascal, Basic and Brainfuck
  • Code generation for several architectures: 6500, arm, avr, m68k, microblaze, msp430, openrisc, risc-v, stm8, x86_64, xtensa
  • Command line utilities, such as ppci-cc, ppci-ld and ppci-opt
  • WebAssembly, JVM, OCaml support
  • Support for ELF, EXE, S-record and hexfile formats
  • An intermediate representation (IR) which can be serialized to JSON
  • The project can be used as a library so you can script the compilation process

Installation

Since the compiler is a Python package, you can install it with pip:

$ pip install ppci

Usage

An example of command-line usage:

$ cd examples/linux64/hello-make
$ ppci-cc -c -O1 -o hello.o hello.c
...
$ ppci-ld --entry main --layout linux64.ld hello.o -o hello
...
$ ./hello
Hello, World!

API example to compile C code:

>>> import io
>>> from ppci.api import cc, link
>>> source_file = io.StringIO("""
...  int printf(char* fmt) { }
...
...  void main() {
...     printf("Hello world!\n");
...  }
... """)
>>> obj = cc(source_file, 'arm')
>>> obj = link([obj])

An example of how to assemble some assembly code:

>>> import io
>>> from ppci.api import asm
>>> source_file = io.StringIO("""section code
... pop rbx
... push r10
... mov rdi, 42""")
>>> obj = asm(source_file, 'x86_64')
>>> obj.get_section('code').data
bytearray(b'[ARH\xbf*\x00\x00\x00\x00\x00\x00\x00')
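The byte string above is easier to read once it is split per instruction. The sketch below uses only the standard library and does not call PPCI. It assumes the standard x86-64 encodings: `pop rbx` is the single opcode 0x5B, `push r10` is a REX.B prefix (0x41) followed by 0x52, and `mov rdi, 42` is a REX.W prefix (0x48), opcode 0xBF and a 64-bit little-endian immediate.

```python
# Expected encodings, written out by hand (assumption: standard x86-64
# opcode tables; this block does not import or call PPCI).
encodings = [
    ("pop rbx", bytes([0x5B])),
    ("push r10", bytes([0x41, 0x52])),
    ("mov rdi, 42", bytes([0x48, 0xBF]) + (42).to_bytes(8, "little")),
]

# Concatenating the chunks should reproduce the contents of the code section.
code = b"".join(chunk for _, chunk in encodings)

for text, chunk in encodings:
    hex_bytes = " ".join(f"{byte:02x}" for byte in chunk)
    print(f"{text:<12} {hex_bytes}")
```

Several of these bytes happen to be printable ASCII (0x5B is `[`, 0x41 is `A`, 0x52 is `R`, 0x48 is `H`, 0x2A is `*`), which is why the `bytearray` repr looks like text.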

An example of low-level API usage:

>>> from ppci.arch.x86_64 import instructions, registers
>>> i = instructions.Pop(registers.rbx)
>>> i.encode()
b'['

Functionality

  • Command line utilities:
    • Can be used with tools like make or other build tools.
  • Language support:
    • C
    • Pascal
    • Python
    • Basic
    • Brainfuck
    • C3 (PPCI's own systems language, intended to address some pitfalls of C)
  • CPU support:
    • 6500, arm, avr, m68k, microblaze, msp430, openrisc, risc-v, stm8, x86_64, xtensa
  • Support for:
    • WebAssembly
    • JVM
    • OCaml bytecode
    • LLVM IR
    • DWARF debugging format
  • File formats:
    • ELF files
    • COFF PE (EXE) files
    • hex files
    • S-record files
  • Uses well known human-readable and machine-processable formats like JSON and XML as its tools' formats.

Documentation

Documentation can be found here:

Warning

This project is in alpha state and not ready for production use!

You can try out PPCI at godbolt.org, a site which offers Web access to various compilers: https://godbolt.org/g/eooaPP

Comments
  • New release (0.5.8)?

    It looks like enough functionality has landed in PPCI to warrant a new release (which is always a chance to announce it and spread the word about the project).

    And IMHO, relocatable ELF is a big enough feature to warrant a 0.6 version number.

    opened by pfalcon 10
  • Implementation of sbrk

    In order to implement dynamically allocated str (#91) and more, today I worked on bringing in malloc support. I am almost there (I altered the necessary Python code for that); the only thing remaining is the sbrk function. Following this implementation, I would need to link in the information about the end of the free memory on the host. What would be the best way of doing that?

    opened by darleybarreto 9
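To make the question concrete, here is a toy model of classic `sbrk` semantics in plain Python. It is only a sketch: the class name `SimulatedHeap` and its `start`/`limit` fields are hypothetical, and how a real implementation learns the end of free memory on the host is exactly the open question in this issue.

```python
class SimulatedHeap:
    """Toy model of a heap grown through sbrk (illustration, not PPCI code)."""

    def __init__(self, start, limit):
        self.start = start  # first address that belongs to the heap
        self.limit = limit  # end of the memory available to the heap
        self.brk = start    # current program break

    def sbrk(self, increment):
        """Move the break by `increment` bytes and return the previous break.

        Returns -1 when the new break would leave [start, limit], mirroring
        the (void *)-1 failure value of the C function.
        """
        old = self.brk
        new = old + increment
        if new < self.start or new > self.limit:
            return -1
        self.brk = new
        return old


heap = SimulatedHeap(start=0x1000, limit=0x2000)
first = heap.sbrk(16)        # first block starts at the initial break
second = heap.sbrk(32)       # next block starts right after the first
failed = heap.sbrk(0x10000)  # too large: the break stays unchanged
```

A simple malloc can then be layered on top by calling `sbrk` whenever it runs out of previously obtained memory.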
  • Implement a simple malloc

    This PR (related to #113) contains some work on the C compiler side to enable heap allocations on *nix systems (it depends on the brk syscall). It also makes some additions to the librt/libc parts.

    opened by darleybarreto 8
  • C frontend: Support (basic) inline asm

    My understanding is that the C frontend currently doesn't have inline asm support. This means that any asm must be in a separate source file, which in turn means that it is currently not possible to compile a single C source and get an executable with one command. Rounded up to "integer digits", that means "PPCI doesn't really work".

    I'd humbly suggest that a race to get the following scenario working (ppci-cc hello.c -o hello; ./hello) should be the first task for the project.

    And for that, apparently inline asm support is needed. Per https://github.com/windelbouwman/ppci-mirror/issues/23, it should be implemented in a GCC-compatible manner, which of course will likely take time and effort.

    As a stop-gap measure, we might just introduce an ad hoc __ppci_asm("mov rax, rdi") to get that syscall() func up and running (with a social contract that __ppci_asm will be removed once normal asm is implemented, to keep the codebase clean).

    opened by pfalcon 8
  • Translating to ppci assembly

    Hi, I am not a systems programmer, so I have really limited skills on this subject, especially when it comes to assembly. I am trying to make the x86_64 musl longjmp work on ppci. This is the implementation:

    .global longjmp
    longjmp:
    	xor %eax,%eax
    	cmp $1,%esi             /* CF = val ? 0 : 1 */
    	adc %esi,%eax           /* eax = val + !val */
    	mov (%rdi),%rbx         /* rdi is the jmp_buf, restore regs from it */
    	mov 8(%rdi),%rbp
    	mov 16(%rdi),%r12
    	mov 24(%rdi),%r13
    	mov 32(%rdi),%r14
    	mov 40(%rdi),%r15
    	mov 48(%rdi),%rsp
    	jmp *56(%rdi)           /* goto saved address without altering rsp */
    

    Is there a guide to the asm that ppci reads? I know some things, but from simple examples. The above snippet should be something like the following:

    global longjmp
    longjmp:
    	xor eax,eax
    	cmp #1,esi             ; CF = val ? 0 : 1
    	adc esi,eax           ; eax = val + !val
    	mov (rdi),rbx         ; rdi is the jmp_buf, restore regs from it
    	mov 8(rdi),rbp
    	mov 16(rdi),r12
    	mov 24(rdi),r13
    	mov 32(rdi),r14
    	mov 40(rdi),r15
    	mov 48(rdi),rsp
    	jmp *56(rdi)           ; goto saved address without altering rsp
    
    opened by darleybarreto 7
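The first three instructions of the musl snippet compute the value that `setjmp` appears to return: `val` if it is non-zero, otherwise 1. A small Python model of that arithmetic (a sketch with a hypothetical function name, following the AT&T operand order of the original listing) may help when checking a translation:

```python
MASK32 = 0xFFFFFFFF  # eax and esi are 32-bit registers


def longjmp_return_value(val):
    """Model of: xor %eax,%eax / cmp $1,%esi / adc %esi,%eax."""
    esi = val & MASK32
    eax = 0                             # xor %eax,%eax
    carry = 1 if esi < 1 else 0         # cmp $1,%esi sets CF only when esi == 0
    eax = (eax + esi + carry) & MASK32  # adc %esi,%eax adds esi plus CF
    return eax


results = [longjmp_return_value(v) for v in (0, 1, 7)]
print(results)
```

The carry flag is set by the unsigned comparison only when `esi` is 0, so `adc` adds the extra 1 exactly in the case where `longjmp(env, 0)` must be turned into a return value of 1.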
  • examples/linux64/hello-make: Example using single .c file and Makefile

    And a linker script. Made possible by the recently added inline assembly support.

    The sample still has to work around various PPCI warts, which should rather be fixed before merging this sample.

    opened by pfalcon 7
  • FIX: uniformly handle string/numeric constants in multiple Python versions

    Referring to greentreesnakes, constants have changed in handling from 3.8 and on. This PR attempts to unify the two and keep compatibility - with the current feature set.

    The short of it is that ast.Constant exists in 3.6 but is not emitted until 3.8. This means that in 3.6 and 3.7 strings/numbers generate Num/Str nodes, whereas from 3.8 on ast.Constant is used for both, and the two are differentiated by the type of the Python object in .value.

    For me, Python 3.7 was previously hanging on ppci.common.CompilerError: ('Cannot do <_ast.Str object at 0x1029e8a10>', (mandelbrot.py, 31, 10)) before this PR, and works successfully after. Looks like it's not covered in the test suite currently.

    opened by klauer 6
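One way to handle both node shapes is sketched below. This is not the code from the PR; `constant_value` is a hypothetical helper. It checks for `ast.Constant` first and falls back to the legacy `Num`/`Str` nodes by class name, so it never has to reference node classes that may be missing on newer Python versions.

```python
import ast


def constant_value(node):
    """Return the Python value of a string/numeric constant AST node."""
    if isinstance(node, ast.Constant):
        # Python 3.8+: one node type, the value's type tells them apart.
        return node.value
    kind = type(node).__name__
    if kind == "Num":
        # Python 3.6/3.7 numeric literal.
        return node.n
    if kind == "Str":
        # Python 3.6/3.7 string literal.
        return node.s
    raise TypeError(f"not a string/numeric constant: {kind}")


number = constant_value(ast.parse("42", mode="eval").body)
text = constant_value(ast.parse("'hello'", mode="eval").body)
print(number, text)
```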
  • ppci-cc: wrong initialization of nested structs

    With the following declarations:

    struct S1 { int a; int b; };

    struct S2 { int a; int b; union { int c; int d; }; struct S1 s; };

    struct S2 v = {1, 2, 3, {4, 5}};

    v.a is 1, v.b is 2, v.c and v.d are 3, but v.s.a and v.s.b contain garbage.

    bug 
    opened by tstreiff 6
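For reference, the expected result can be modelled with `ctypes`, which follows the host C ABI. This is only an illustration of what the initializer should produce (the union name `CD` and the field name `u` are made up here, since the C union is anonymous); it does not exercise ppci-cc.

```python
import ctypes


class S1(ctypes.Structure):
    _fields_ = [("a", ctypes.c_int), ("b", ctypes.c_int)]


class CD(ctypes.Union):
    _fields_ = [("c", ctypes.c_int), ("d", ctypes.c_int)]


class S2(ctypes.Structure):
    # The union is anonymous in the C source, so its members are
    # reachable directly as v.c and v.d.
    _anonymous_ = ("u",)
    _fields_ = [
        ("a", ctypes.c_int),
        ("b", ctypes.c_int),
        ("u", CD),
        ("s", S1),
    ]


# Equivalent of: struct S2 v = {1, 2, 3, {4, 5}};
# The third initializer goes to the first union member (c).
v = S2(1, 2, CD(c=3), S1(4, 5))
values = (v.a, v.b, v.c, v.d, v.s.a, v.s.b)
print(values)
```

The nested braces `{4, 5}` initialize `s`, so `v.s.a` and `v.s.b` are expected to be 4 and 5 rather than garbage.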
  • ppci-cc : incorrect IR module generated with -O 2

    When compiling the following C code with "ppci-cc -O 2 xxx.c", an unexpected error is reported:

    void g(int n);

    void f() { int i; g(i); }

    Here is the progress report:

    2020-01-15 22:59:12,590 | INFO | root | ppci 0.5.7 on CPython 3.6.9
    2020-01-15 22:59:12,601 | INFO | cbuilder | Starting C compilation (c99)
    2020-01-15 22:59:12,607 | INFO | cparser | Parsing finished
    2020-01-15 22:59:12,610 | INFO | ccodegen | Finished IR-code generation
    2020-01-15 22:59:12,612 | INFO | optimize | Optimizing module main level 2
    2020-01-15 22:59:12,614 | ERROR | root | und_i_alloc = undefined is used
    2020-01-15 22:59:12,614 | ERROR | root | None und_i_alloc = undefined is used

    I dug a bit and saw that the problem is reported by the IR verifier in irutils.py. It reports that the instruction "undefined" has been found in the module. The instruction is apparently inserted by the "mem2reg promotor" during the optimizer pass. This is why the problem does not occur when compiling with -O 0. Note that if variable "i" is assigned a value before calling "g", the problem does not happen. If the error comes from this, it should be reported in a clearer way.

    bug 
    opened by tstreiff 6
  • docs: Clarify "python" vs "python3"

    It looks like PPCI requires Python3, which is great. It looks like "python -m ppci" syntax is used somewhere in the docs (as suggested in https://github.com/windelbouwman/ppci-mirror/issues/11), which is also great: https://ppci.readthedocs.io/en/latest/reference/lang/java.html#compile-java-ahead-of-time

    However, that uses exactly python, and for most users nowadays that will lead to stack trace ending with:

    AssertionError: Needs to be run in python version 3.x
    

    Not exactly user friendly (especially for people not familiar with Python, and #11 suggests to keep those in mind).

    So, suggested to use python3 consistently in the commands everywhere.

    opened by pfalcon 6
  • Improve/make consistent "python3 -m ppci" behavior

    I personally would prefer to not rely on entrypoints created by the installer script (and ideally/eventually, not rely on the need for ppci to be "installed" at all).

    So, the first thing I'm greeted with is:

    $ python3 -m ppci
    Traceback (most recent call last):
      File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/mnt/hdd/projects-3rdparty/Python/Python-compilers-for-non-python/ppci-mirror/ppci/__main__.py", line 18, in <module>
        main()
      File "/mnt/hdd/projects-3rdparty/Python/Python-compilers-for-non-python/ppci-mirror/ppci/__main__.py", line 10, in main
        subcommand = sys.argv[1]
    IndexError: list index out of range
    

    Now, there's ppci-ld, but:

    $ python3 -m ppci ld
    ...
    ModuleNotFoundError: No module named 'ppci.cli.ld'
    

    Suddenly, the linker is under python3 -m ppci link. Let's fix it up and make it consistent, which is of course "ld", per https://github.com/windelbouwman/ppci-mirror/issues/23 . If there's a strong desire, "link" can stay as an alias, but I'd recommend against it. Let's just clean up such small cases: it's not worth maintaining "diversification" in such small things, as it will only lead to confusion and maintenance overhead.

    opened by pfalcon 5
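The two problems described here (a traceback when no subcommand is given, and "ld" versus "link") can both be handled by a small dispatcher. The following is a sketch built on the standard `argparse` module, not PPCI's actual `__main__.py`; the parser layout and the `canonical` attribute are assumptions made for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="python3 -m ppci")
    subparsers = parser.add_subparsers(dest="command")
    # "ld" is the canonical name; "link" is accepted only as an alias.
    ld = subparsers.add_parser("ld", aliases=["link"], help="linker")
    ld.set_defaults(canonical="ld")
    cc = subparsers.add_parser("cc", help="C compiler")
    cc.set_defaults(canonical="cc")
    return parser


def main(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.command is None:
        # No subcommand given: print usage instead of raising IndexError.
        parser.print_usage()
        return None
    return args.canonical


print(main(["link"]))
```

Because both spellings resolve to the same `canonical` value, the rest of the program only has to know about one name.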
  • docs: Fix a few typos

    There are small typos in:

    • ppci/arch/data_instructions.py
    • ppci/arch/runtime.py
    • ppci/lang/c/semantics.py
    • ppci/wasm/components.py

    Fixes:

    • Should read support rather than supprt.
    • Should read referred rather than refered.
    • Should read necessarily rather than nessecarily.

    Semi-automated pull request generated by https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

    opened by timgates42 0
  • Evaluation of constant expression in preprocessing directives may produce incorrect results

    #if (1 ? -2 : 1u) < 0
    #error
    #endif
    
    File : "<source>"
        1:#if (1 ? -2 : 0 + 1u) < 0
        2:#error
           ^ 
    2022-04-24 21:27:27,454 |     INFO |       root | ppci 0.5.5 compiler on CPython 3.8.10 on Linux-5.13.0-1021-aws-x86_64-with-glibc2.29
    2022-04-24 21:27:27,457 |     INFO |   cbuilder | Starting C compilation (c99)
    2022-04-24 21:27:27,458 |    ERROR |       root | 
    2022-04-24 21:27:27,458 |    ERROR |       root | (<source>, 2, 2)
    Compiler returned: 1
    
    ppci 0.5.5
    
    opened by pmor13 0
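Why `#error` should not fire here: in `#if` expressions C evaluates signed operands as `intmax_t` and unsigned operands as `uintmax_t`, and the second and third operands of `?:` undergo the usual arithmetic conversions. Since `1u` is unsigned, `-2` is converted to unsigned before the comparison. A minimal Python model of that arithmetic, assuming a 64-bit `uintmax_t`:

```python
UINTMAX_BITS = 64  # assumption: uintmax_t is 64 bits wide


def to_uintmax(value):
    """Convert an integer to uintmax_t the way C does (modulo 2**N)."""
    return value % (1 << UINTMAX_BITS)


# (1 ? -2 : 1u): the third operand is unsigned, so the result type is
# unsigned and the selected value -2 is converted before it is used.
conditional = to_uintmax(-2)

# "< 0" compares two unsigned values after converting 0 as well.
is_negative = conditional < to_uintmax(0)

print(conditional, is_negative)
```

The comparison is false, so the condition evaluates to 0 and the `#error` line should be skipped.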
  • Reading section failed

    Hi, I tried to load an existing WASM module (maybe a bit unusual module, I don't know).

    <ipython-input-5-a1926f0a6fca> in <module>
    ----> 1 wasm.Module(open(r'/home/ekarni/pyodide/addlibs/__loader.wasm','rb'))
    
    ~/.pyenv/versions/3.9.6/lib/python3.9/site-packages/ppci/wasm/components.py in __init__(self, *input)
        109             elif hasattr(arg, "read"):
        110                 assert not hasattr(arg, "read_module")
    --> 111                 return self._from_file(arg)
        112
        113         # Else, more direct instantiation
    
    ~/.pyenv/versions/3.9.6/lib/python3.9/site-packages/ppci/wasm/components.py in _from_file(self, f)
        248
        249         reader = BinaryFileReader(f)
    --> 250         reader.read_module(self)
        251
        252     def to_string(self):
    
    ~/.pyenv/versions/3.9.6/lib/python3.9/site-packages/ppci/wasm/binary/reader.py in read_module(self, module)
         58             section_data = self.read_length_prefixed_bytes()
         59             with self.push_data(section_data):
    ---> 60                 self.read_section(section_id)
         61
         62         logger.info(
    
    ~/.pyenv/versions/3.9.6/lib/python3.9/site-packages/ppci/wasm/binary/reader.py in read_section(self, section_id)
         69     def read_section(self, section_id):
         70         """ Process a single section. """
    ---> 71         section_name = self._section_id_to_name[section_id]
         72         logger.debug("Loading %s section", section_name)
         73
    
    KeyError: 12
    
    

    A very simple module does work. This module was compiled by emcc. I can upload it.

    opened by eyalk11 1
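The `KeyError: 12` means the reader met a section ID it has no name for. In the WebAssembly binary format, ID 12 is the data count section, so that is presumably what this module contains. The standalone sketch below (standard library only, independent of PPCI's reader) lists the section IDs in a binary and can be used to confirm this for a given file:

```python
# Section IDs from the WebAssembly binary format.
SECTION_NAMES = {
    0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
    5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
    10: "code", 11: "data", 12: "datacount",
}


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def list_sections(data):
    """Return (id, name, payload size) for every section in a module."""
    if data[:4] != b"\x00asm":
        raise ValueError("not a WebAssembly binary")
    pos = 8  # skip the 4-byte magic and the 4-byte version
    sections = []
    while pos < len(data):
        section_id = data[pos]
        size, pos = read_uleb128(data, pos + 1)
        sections.append((section_id, SECTION_NAMES.get(section_id, "unknown"), size))
        pos += size  # skip the payload
    return sections


# Tiny hand-built module: header plus a data count section holding 0.
module = b"\x00asm" + (1).to_bytes(4, "little") + bytes([12, 1, 0])
sections = list_sections(module)
print(sections)
```

For a real file, pass `open(path, "rb").read()` to `list_sections`.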
  • lang/llvmir/parser: Factor out get_implicit_name(), use for implicit labels.

    LLVM IR has a habit of using implicit names in various places, from function params to basic block labels. In such cases, an autogenerated "numbered name" is used.

    Signed-off-by: Paul Sokolovsky

    opened by pfalcon 2
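The numbering scheme can be sketched as a per-function counter. The class below is a hypothetical illustration, not the code from this PR: only the method name `get_implicit_name` is taken from the title, and its body is an assumption about what such a helper might do.

```python
class ImplicitNamer:
    """Hands out LLVM-style numbered names (%0, %1, ...) within one function."""

    def __init__(self):
        self._next = 0

    def get_implicit_name(self):
        name = f"%{self._next}"
        self._next += 1
        return name

    def name_for(self, explicit=None):
        # Explicitly named values do not consume a number.
        if explicit is not None:
            return explicit
        return self.get_implicit_name()


# A function with two unnamed parameters: they take %0 and %1, the
# unlabeled entry block takes %2, and the first unnamed result takes %3.
namer = ImplicitNamer()
params = [namer.name_for(), namer.name_for()]
entry_block = namer.name_for()
total = namer.name_for("%total")
result = namer.name_for()
print(params, entry_block, total, result)
```

Sharing one counter between parameters, labels and instruction results is what keeps the generated names consistent with the numbers used elsewhere in the same function.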
  • Compiling and running a simple GC

    A while ago I've found a simple gc to use on a simple LISP implementation to be compiled with ppci-cc. What is lacking on master is __builtin_frame_address(0) and set/longjmp. Do you think this gc would be an interesting example to place inside tools? I have fixes for these two (x86_64). Instead of using a builtin __builtin_frame_address, I've made changes in the inline asm to allow something like this:

    long ret;
    asm("mov %0, rbp": "=r"(ret) ::);
    void *tos = (void*) ret;
    
    opened by darleybarreto 0
Owner
Windel Bouwman
Python fanatic
The most widely used Python to C compiler

Welcome to Cython! Cython is a language that makes writing C extensions for Python as easy as Python itself. Cython is based on Pyrex, but supports mo

null 7.6k Jan 3, 2023
Reactjs web app written entirely in python, using transcrypt compiler.

Reactjs web app written entirely in python, using transcrypt compiler.

Dan Shai 22 Nov 27, 2022
A small C compiler written in Python for learning purposes

A small C compiler written in Python. Generates x64 Intel-format assembly, which is then assembled and linked by nasm and ld.

Scattered Thoughts 3 Oct 22, 2021
a simple functional programming language compiler written in python

Functional Programming Language A compiler for my small functional language. Written in python with SLY lexer/parser generator library. Requirements p

Ashkan Laei 3 Nov 5, 2021
A simple BrainF**k compiler written in Python

bf-comp A simple BrainF**k compiler written in Python. What else were you looking for?

null 1 Jan 9, 2022
🎉 🎉 PyComp - Java Code compiler written in python.

🎉 🎉 PyComp Java Code compiler written in python. This is yet another compiler meant for babcock students project which was created using pure python

Alumona Benaiah 5 Nov 30, 2022
Pulse sequence builder and compiler for q1asm

q1pulse Pulse sequence builder and compiler for q1asm. q1pulse is a simple library to compile pulse sequence to q1asm, the assembly language of Qblox

Sander de Snoo 3 Dec 14, 2022
MatroSka Mod Compiler for ts4scripts

MMC Current Version: 0.2 MatroSka Mod Compiler for .ts4script files Requirements Have Python 3.7 installed and set as default. Running from Source pip

MatroSka 1 Dec 13, 2021
This is a small compiler to demonstrate how compilers work.

This is a small compiler to demonstrate how compilers work. It compiles our own dialect to C, while being written in Python.

Md. Tonoy Akando 2 Jul 19, 2022
A C-like hardware description language (HDL) adding high level synthesis(HLS)-like automatic pipelining as a language construct/compiler feature.

Julian Kemmerer 391 Jan 1, 2023
Hotpile: High Order Turing Machine Language Compiler

Hotpile: High Order Turing Machine Language Compiler Build and Run Requirements: Python 3.6+, bison, flex, and GCC installed. Needs to be run under UN

Jiang Weihao 4 Dec 29, 2021
Compiler Final Project - Lisp Interpreter

Compiler Final Project - Lisp Interpreter

null 2 Jan 23, 2022
Ikaros is a free financial library built in pure python that can be used to get information for single stocks, generate signals and build portfolios

Ikaros is a free financial library built in pure python that can be used to get information for single stocks, generate signals and build portfolios

Salma Saidane 64 Sep 28, 2022
This repository contains a lot of short scripting programs implemented both in Python (Flask) and TypeScript (NodeJS).

fast-scripts This repository contains a lot of short scripting programs implemented both in Python (Flask) and TypeScript (NodeJS). In python These wi

Nahum Maurice 3 Dec 10, 2022
:snake: Complete C99 parser in pure Python

pycparser v2.20 Contents 1 Introduction 1.1 What is pycparser? 1.2 What is it good for? 1.3 Which version of C does pycparser support? 1.4 What gramma

Eli Bendersky 2.8k Dec 29, 2022
An ultra fast cross-platform multiple screenshots module in pure Python using ctypes.

Python MSS from mss import mss # The simplest use, save a screen shot of the 1st monitor with mss() as sct: sct.shot() An ultra fast cross-platfo

Mickaël Schoentgen 799 Dec 30, 2022
Project based on pure python with OOP

Object oriented programming review Object oriented programming (OOP) is among the most used programming paradigms (if not the most common) in the indu

Facundo Abrahan Cerimeli 1 May 9, 2022
A kind of operating system portal to a variety of apps with pure python

pyos A kind of operating system portal to a variety of apps. Installation Run this on your terminal: git clone https://github.com/arjunj132/pyos.git

null 1 Jan 22, 2022
A(Sync) Interface for internal Audible API written in pure Python.

Audible Audible is a Python low-level interface to communicate with the non-public Audible API. It enables Python developers to create their own Aud

mkb79 192 Jan 3, 2023