It's like Forth but in Python

Porth

WARNING! This language is a work in progress!

It's like Forth but written in Python. But I don't actually know for sure since I never programmed in Forth; I only heard that it's some sort of stack-based programming language. Porth is also a stack-based programming language. Which makes it just like Forth, am I rite?

Porth is planned to be

  • Compiled
  • Native
  • Stack-based (just like Forth)
  • Turing-complete
  • Self-hosted (Python is used only as an initial bootstrap; once the language is mature enough we're gonna rewrite it in itself)
  • Statically typed (the type checking is probably gonna be similar to the WASM validation)

(these are not the selling points, but rather milestones of the development)

Examples

Hello, World:

include "std.porth"

"Hello, World\n" stdout write

Simple program that prints numbers from 0 to 99 in an ascending order:

include "std.porth"

100 0 while 2dup > do
    dup print 1 +
end

Quick Start

Simulation

Simulation simply interprets the program.

$ cat program.porth
34 35 + print
$ ./porth.py sim program.porth
69

It is strongly recommended to use PyPy for the Simulation Mode since CPython is too slow for that. Try to simulate ./euler/problem04.porth using CPython and compare it with PyPy and Compilation Mode.
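To make the idea of simulation concrete, here is a minimal sketch in Python of how a stack-based interpreter could evaluate the program.porth example above. It is not the actual porth.py code: the function name simulate and the whitespace tokenizer are assumptions made for illustration, and only integers, + and print are handled.

```python
def simulate(source):
    """Interpret a tiny subset of Porth: integers, + and print.

    Hypothetical helper for illustration; not the real porth.py implementation.
    Returns the printed lines instead of writing them to stdout.
    """
    stack = []
    output = []
    for token in source.split():
        if token == "+":
            a = stack.pop()
            b = stack.pop()
            stack.append(a + b)
        elif token == "print":
            output.append(str(stack.pop()))
        else:
            # Anything else is assumed to be an integer literal.
            stack.append(int(token))
    return output


print("\n".join(simulate("34 35 + print")))
```

Running the sketch on the same program prints 69, matching the session above.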

Compilation

Compilation generates assembly code, compiles it with nasm, and then links it with GNU ld. So make sure you have both available in your $PATH.

$ cat program.porth
34 35 + print
$ ./porth.py com program.porth
[INFO] Generating ./program.asm
[CMD] nasm -felf64 ./program.asm
[CMD] ld -o ./program ./program.o
$ ./program
69

Testing

Test cases are located in the ./tests/ folder. The *.txt files contain inputs (command line arguments, stdin) and expected outputs (exit code, stdout, stderr) of the corresponding programs.

Run ./test.py script to execute the programs and assert their outputs:

$ ./test.py run

To update the expected outputs of the programs, run the update subcommand:

$ ./test.py update

To update expected command line arguments and stdin of a specific program run the update input subcommand:

$ ./test.py update input ./tests/argv.porth new cmd args
[INFO] Provide the stdin for the test case. Press ^D when you are done...
Hello, World
^D
[INFO] Saving input to ./tests/argv.txt

The ./examples/ folder contains programs that are meant for showcasing the language rather than testing it, but we can still use them for testing just like the stuff in the ./tests/ folder:

$ ./test.py run ./examples/
$ ./test.py update input ./examples/name.porth
$ ./test.py update output ./examples/

For more info see ./test.py help

Usage

If you wanna use the Porth compiler separately from its codebase you only need two things:

By default the compiler searches files to include in ./ and ./std/. You can add more search paths via the -I flag before the subcommand: ./porth.py -I com ... . See ./porth.py help for more info.
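The search order described above could be sketched roughly as follows. This is an illustration only: the function name find_include, its parameters, and the injectable exists check are assumptions, not the compiler's actual code.

```python
import os


def find_include(name, search_paths=("./", "./std/"), exists=os.path.exists):
    """Return the first path under search_paths where `name` exists, else None.

    Hypothetical helper: the defaults mirror the ./ and ./std/ search paths
    mentioned above; extra paths would be appended by the caller.
    """
    for base in search_paths:
        candidate = os.path.join(base, name)
        if exists(candidate):
            return candidate
    return None


# Prints the resolved path if std.porth is found, otherwise None.
print(find_include("std.porth"))
```

Passing exists as a parameter is only there to make the sketch easy to try without touching the file system.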

Language Reference

This is what the language supports so far. Since the language is a work in progress, everything in this section is subject to change.

Data Types

Integer

Currently an integer is anything that is parsable by Python's int function. When the compiler encounters an integer it pushes it onto the data stack for processing by the relevant operations.

Example:

10 20 +

The code above pushes 10 and 20 onto the data stack and sums them up with the + operation.

String

Currently a string is any sequence of bytes sandwiched between two ". No newlines are allowed inside of strings. Escaping is done by the unicode_escape codec of Python. There is no way to escape " itself for now. No special support for Unicode is provided right now either.

When the compiler encounters a string:

  1. the size of the string in bytes is pushed onto the data stack,
  2. the bytes of the string are copied somewhere into the memory (the exact location is implementation specific),
  3. the pointer to the beginning of the string is pushed onto the data stack.

Thus, a single string pushes two values onto the data stack: the size and the pointer.
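The steps above can be sketched in Python as follows. The bytearray standing in for memory and the choice to append the string at its end are assumptions made for illustration; the real location is implementation specific, as noted above.

```python
def push_string(data, stack, memory):
    """Copy the string bytes into memory, then push the size and the pointer.

    Hypothetical helper: `memory` is a bytearray standing in for the area
    where string data lives; `data` is the already-unescaped bytes.
    """
    addr = len(memory)       # assumed placement: right after existing data
    memory.extend(data)      # copy the bytes into memory
    stack.append(len(data))  # the size in bytes is pushed first
    stack.append(addr)       # the pointer to the first byte ends up on top


stack, memory = [], bytearray()
push_string(b"Hello, World", stack, memory)
print(stack)  # [12, 0]
```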

Example:

include "std.porth"
"Hello, World" stdout write

The write macro from the std.porth module expects three values on the data stack:

  1. the size of the buffer it needs to print,
  2. the pointer to the beginning of the buffer,
  3. and the output file descriptor where it needs to print to.

The size and the pointer are provided by the string "Hello, World". The file descriptor is provided by the stdout macro from std.porth.

Character

Currently a character is a single byte sandwiched between two '. Escaping is done by the unicode_escape codec of Python. There is no way to escape ' itself for now. No special support for Unicode is provided right now either.

When the compiler encounters a character it pushes its value as an integer onto the stack.

Example:

'E' print

This program pushes the integer 69 onto the stack (since the ASCII code of the letter E is 69) and prints it with the print operation.
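As a rough illustration of how a character literal could be turned into an integer with Python's unicode_escape codec, consider the sketch below. The function name char_value is hypothetical, and the real compiler may decode the literal differently.

```python
def char_value(body):
    """Return the integer value of the text found between the two ' marks.

    Hypothetical helper: decodes escapes with the unicode_escape codec and
    expects exactly one character as the result.
    """
    decoded = body.encode("utf-8").decode("unicode_escape")
    if len(decoded) != 1:
        raise ValueError("a character literal must contain a single character")
    return ord(decoded)


print(char_value("E"))    # 69
print(char_value("\\n"))  # 10, the escape sequence for a newline
```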

Built-in Words

Stack Manipulation

  • dup - duplicate an element on top of the stack.
a = pop()
push(a)
push(a)
  • swap - swap 2 elements on the top of the stack.
a = pop()
b = pop()
push(a)
push(b)
  • drop - drops the top element of the stack.
pop()
  • print - print the element on top of the stack in a free form to stdout and remove it from the stack.
a = pop()
print(a)
  • over - copies the element below the top of the stack onto the top.
a = pop()
b = pop()
push(b)
push(a)
push(b)
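The pseudocode above can be tried out directly in Python. The sketch below models the data stack as a plain list (an assumption made for illustration) and records the stack after each word.

```python
def dup(stack):
    a = stack.pop()
    stack.append(a)
    stack.append(a)


def swap(stack):
    a = stack.pop()
    b = stack.pop()
    stack.append(a)
    stack.append(b)


def drop(stack):
    stack.pop()


def over(stack):
    a = stack.pop()
    b = stack.pop()
    stack.append(b)
    stack.append(a)
    stack.append(b)


def run_demo():
    """Apply over, swap, drop, dup to [1, 2] and return each resulting stack."""
    stack = [1, 2]  # the top of the stack is the end of the list
    trace = []
    for word in (over, swap, drop, dup):
        word(stack)
        trace.append(list(stack))
    return trace


for state in run_demo():
    print(state)
```

Starting from 1 2, the stack becomes 1 2 1 after over, 1 1 2 after swap, 1 1 after drop, and 1 1 1 after dup.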

Comparison

  • = - checks if two elements on top of the stack are equal. Removes the elements from the stack and pushes 1 if they are equal and 0 if they are not.
a = pop()
b = pop()
push(int(a == b))
  • != - checks if two elements on top of the stack are not equal.
a = pop()
b = pop()
push(int(a != b))
  • > - checks if the element below the top is greater than the top.
b = pop()
a = pop()
push(int(a > b))
  • < - checks if the element below the top is less than the top.
b = pop()
a = pop()
push(int(a < b))
  • >=
b = pop()
a = pop()
push(int(a >= b))
  • <=
b = pop()
a = pop()
push(int(a <= b))

Arithmetic

  • + - sums up two elements on the top of the stack.
a = pop()
b = pop()
push(a + b)
  • - - subtracts the top of the stack from the element below.
a = pop()
b = pop()
push(b - a)
  • * - multiplies the top of the stack with the element below the top of the stack.
a = pop()
b = pop()
push(b * a)
  • divmod - divides the element below the top by the top and pushes the quotient followed by the remainder.
a = pop()
b = pop()
push(b // a)
push(b % a)
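Operand order matters for - and divmod, so here is a small Python sketch of both, again using a plain list as the stack. It only considers non-negative operands; how negative values behave in compiled programs is not covered here.

```python
def minus(stack):
    a = stack.pop()
    b = stack.pop()
    stack.append(b - a)


def divmod_op(stack):
    a = stack.pop()
    b = stack.pop()
    stack.append(b // a)  # quotient goes below
    stack.append(b % a)   # remainder ends up on top


stack = [10, 3]   # corresponds to the program: 10 3 -
minus(stack)
print(stack)      # [7]

stack = [17, 5]   # corresponds to the program: 17 5 divmod
divmod_op(stack)
print(stack)      # [3, 2]
```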

Bitwise

  • shr
a = pop()
b = pop()
push(b >> a)
  • shl
a = pop()
b = pop()
push(b << a)
  • bor
a = pop()
b = pop()
push(b | a)
  • band
a = pop()
b = pop()
push(b & a)

Control Flow

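  • if <then-branch> else <else-branch> end - pops the element on top of the stack and, if the element is not 0, executes the <then-branch>, otherwise the <else-branch>.
  • while <condition> do <body> end - keeps executing both the <condition> and the <body> until the <condition> produces 0 at the top of the stack. Checking the result of the <condition> removes it from the stack.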
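To show how the while loop consumes the result of its condition, the sketch below hand-translates the counting loop from the Examples section (limit 0 while 2dup > do dup print 1 + end) into Python. It assumes that 2dup from std.porth duplicates the top two elements of the stack; the function name count_up is hypothetical.

```python
def count_up(limit):
    """Trace `limit 0 while 2dup > do dup print 1 + end` on a list-based stack.

    Returns the printed values and the final stack.
    """
    stack = [limit, 0]
    printed = []
    while True:
        # condition: 2dup >   (2dup is assumed to copy the top two elements)
        stack.extend(stack[-2:])
        b = stack.pop()
        a = stack.pop()
        stack.append(int(a > b))
        # `do` pops the result of the condition and leaves the loop on 0
        if stack.pop() == 0:
            break
        # body: dup print 1 +
        stack.append(stack[-1])
        printed.append(stack.pop())
        stack.append(1)
        a = stack.pop()
        b = stack.pop()
        stack.append(a + b)
    return printed, stack


print(count_up(3))  # ([0, 1, 2], [3, 3])
```

Note that the limit and the counter are still on the stack when the loop ends, since only the result of the comparison is removed.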

Memory

  • mem - pushes the address of the beginning of the memory where you can read and write onto the stack.
push(mem_addr)
  • . - store a given byte at the address on the stack.
byte = pop()
addr = pop()
store(addr, byte)
  • , - load a byte from the address on the stack.
addr = pop()
byte = load(addr)
push(byte)
  • .64 - store an 8-byte word at the address on the stack.
word = pop()
addr = pop()
store(addr, word)
  • ,64 - load an 8-byte word from the address on the stack.
addr = pop()
word = load(addr)
push(word)
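A simulator could model these memory operations with a bytearray. The sketch below is one way to do that, under stated assumptions: a 64-byte memory region, little-endian byte order for 8-byte words (as on x86-64), and values masked to their width. None of the names come from porth.py.

```python
MEM_SIZE = 64  # assumed size, for illustration only
memory = bytearray(MEM_SIZE)


def store8(addr, byte):
    memory[addr] = byte & 0xFF  # keep only the lowest byte


def load8(addr):
    return memory[addr]


def store64(addr, word):
    # Assumption: 8-byte words are stored in little-endian order.
    memory[addr:addr + 8] = (word & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little")


def load64(addr):
    return int.from_bytes(memory[addr:addr + 8], "little")


store64(0, 0x0102030405060708)
print(hex(load64(0)))  # 0x102030405060708
print(hex(load8(0)))   # 0x8, the lowest byte comes first in little-endian
```

The last line shows why mixing byte and 8-byte accesses on overlapping addresses needs care: a single byte load only sees part of the stored word.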

System

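  • syscall<n> - perform a syscall with n arguments where n is in range [0..6]. (syscall1, syscall2, etc)
syscall_number = pop()
# move syscall_number to the register that holds the syscall number
for i in range(n):
    arg = pop()
    # move arg to the i-th argument register of the calling convention
# perform the syscall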

Macros

Define a new word write that expands into a sequence of tokens 1 1 syscall3 during the compilation.

macro write
    1 1 syscall3
end
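Macro expansion can be pictured as token substitution. The following Python sketch expands macro names in a token list until only plain tokens remain; the function name expand and the dictionary layout are assumptions for illustration, not the compiler's data structures, and a macro that refers to itself would loop forever here.

```python
def expand(tokens, macros):
    """Replace every macro name in `tokens` with the macro's body.

    `macros` maps a macro name to its list of body tokens (hypothetical layout).
    """
    out = []
    pending = list(reversed(tokens))  # process tokens from the end of the list
    while pending:
        token = pending.pop()
        if token in macros:
            # Put the macro body back so nested macros get expanded too.
            pending.extend(reversed(macros[token]))
        else:
            out.append(token)
    return out


macros = {"write": ["1", "1", "syscall3"]}
print(expand(["write"], macros))  # ['1', '1', 'syscall3']
```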

Include

Include the tokens of the file file.porth

include "file.porth"
Comments
  • `porth sim` and `porth com -r` diverge in code execution

    The code below seems to generate the expected output in simulation mode but doesn't output anything in compilation mode.

    include "std.porth"
    
    "abcdefghijklmnopqrstuvxyz\0"
    
    mem     swap .64
    mem + 1 swap .
    
    mem     ,64 print // string address in memory 
    mem + 1 ,   print // 26 string length
    mem ,64 ,   print // 97 (a) 
    

    Simulation output: ./porth.py sim

    31
    26
    97
    

    Compilation output is empty. ./porth.py com -r

    opened by drocha87 6
  • Supporting Intrinsic `LOAD` and `STORE` 64 bit values

    I was interested in contributing to this, so I figured I should ask how you're thinking this should be implemented. Mainly, should the size of the value to be read be an input (like 8, 16, 32, 64 bit) OR should it just have intrinsic support for 64 bit values and then have standard library macros provide support for other sizes?

    If it's only 64 bit then it can probably just be done with

    pop rax
    xor rbx, rbx
    mov rbx, [rax] ;; Instead of mov bl, [rax]
    push rbx
    
    opened by zrthxn 5
  • Stack pointer operation (enables putting putd in `std.porth`)

    A simple operation that pushes a pointer to the element that is currently on the top of the stack.

    With this functionality it's possible to allocate a temporary buffer on the stack by repeatedly pushing 0, which makes it possible to implement putd without a static buffer in memory, so putd can now be in the standard library.

    As a sidenote, building gol.porth will now fail due to the macro name collision.

    There is one drawback, and that is that this will create a discrepancy between simulation and compilation, as there is no easy way to implement this in python I think. Assuming that the stack is contiguous in memory in the bootstrapped version, there should really be no problem after that.

    opened by ap29600 4
  • Unable to use control flow in macros

    It's impossible to use an if or while block inside of a macro, as it results in a compiler error. Simulation or compile doesn't matter, it's broken in both.

    Example

    macro test
        2 1 < if
            12 print
        end
    end
    

    Compile result

    $ python porth.py sim borked.porth
    Traceback (most recent call last):
      File "porth.py", line 924, in <module>
        program = compile_file_to_program(program_path, include_paths);
      File "porth.py", line 862, in compile_file_to_program
        return compile_tokens_to_program(lex_file(file_path), include_paths)
      File "porth.py", line 709, in compile_tokens_to_program
        block_ip = stack.pop()
    IndexError: pop from empty list
    

    Version d7ecaa221a791616e2c76989feaf65a9cdf42e0c

    opened by Mm2PL 3
  • Handle intrinsics with a lookup table

    This makes it easier to have exhaustive handling of operations by comparing the length of the lookup table with the length of the dataclass, and should also be a bit faster to compile (I assume the dictionary indexed with enums should have O(1) lookup.)

    opened by ap29600 2
  • `compile_tokens_to_program`  there's no need to reverse the tokens

    I just realized that there's no need to reverse the tokens since you can just tokens.pop(0) and tokens = macros[token.value].tokens + tokens when you need to append the list of tokens to tokens (in case of expanding a macro for example).

    I don't know if this can bring any performance improvements but I think it simplifies the code a little bit.

    If you want I can change the code and send a PR.

    opened by drocha87 2
  • Produce less jump points to enable optimization

    It would be nice to only produce jump points in the assembly when that jump point is actually used.

    why?

    This is useful because it makes it clear where an instance like:

    addr_79:
        ;; -- plus --
        pop rax
        pop rbx
        add rax, rbx
        push rax
    addr_80:
        ;; -- load --
        pop rax
        xor rbx, rbx
        mov bl, [rax]
        push rbx
    

    can be safely reduced to:

    addr_79:
        ;; -- plus --
        pop rax
        pop rbx
        add rax, rbx
        ;; push rax
    addr_80:
        ;; -- load --
        ;; pop rax
        xor rbx, rbx
        mov bl, [rax]
        push rbx
    

    while this one can't:

    addr_11:
        ;; -- push int 0 --
        mov rax, 0
        push rax
    addr_12:
        ;; -- while --
    addr_13:
        ;; -- dup -- 
        pop rax
        push rax
        push rax
    

    because a jump might occur to addr_12.

    Doing this would make it possible to do optimizations with a separate tool instead of making the compiler more complex.

    is this worth it?

    A quick and dirty test removing some of the redundant pushes and pops manually from the generated assembly takes the runtime of rule110 on a board of size 1000 from 0.069s to 0.065s (consistently over a batch of 10 runs), and I definitely missed at least half the pops.

    So, probably not worth much but it's still something.

    EDIT I removed a few more and now it's 0.069s to 0.046s, by also replacing instances of:

        push rax
        pop rbx
    

    by:

    mov rbx, rax
    
    opened by ap29600 2
  • Discrepancy between com and sim modes for handling utf-8 characters.

    In compile mode, we are using len on the token value directly, which means we are not considering the size difference introduced by utf-8 characters, which was handled in simulation mode, but you missed that in compilation mode.

    opened by vipulbhj 2
  • intrinsic map, f-strings, redundant asserts in loops

    This PR moves the intrinsics out of the massive if-else into separate functions, with a dict mapping intrinsic enums to the functions themselves. I also moved some asserts out of loops where it doesn't make sense to have them (e.g. new items aren't going to appear in enums between loop iterations). Some asserts were also made a bit more dynamic. Plus f-strings instead of % formatting. The changes here also provided a significant speedup on rule110, from 4.3s to 2.3s or so on CPython 3.10.

    opened by monoidic 1
  • [FEATURE REQUEST] make std.porth accessible anywhere when you move it to `/usr/include/porth/*.porth` or `/usr/include/*.porth`

    I'd like to not need to include std.porth in every project I write so I have an idea - if a file (std.porth) is not found in the specified directory in include - make it look in /usr/lib/include/porth or /usr/lib/include/std.porth

    opened by TruncatedDinosour 1
  • move assert out of loops

    This commit moves some asserts out of loops, giving us some speed improvements in euler/problem04.porth.

    Running the actual code (with asserts inside loops)

    time ~/pypy3.7-v7.3.5-linux64/bin/pypy porth.py sim euler/problem04.porth
    906609
    ~/pypy3.7-v7.3.5-linux64/bin/pypy porth.py sim euler/problem04.porth  2,54s user 0,03s system 99% cpu 2,593 total
    

    Moving out the assert from loop

    time ~/pypy3.7-v7.3.5-linux64/bin/pypy porth.py sim euler/problem04.porth
    906609
    ~/pypy3.7-v7.3.5-linux64/bin/pypy porth.py sim euler/problem04.porth  2,33s user 0,02s system 99% cpu 2,372 total
    
    opened by drocha87 1
  • Change the openat macro to use the syscall intrinsic for 4 argument syscalls

    While I was programming porth.porth to eventually write the compiled asm to a file, I encountered a requirement to set the permission mask when using the openat syscall to open an output file for writing.

    When I could only pass 3 of the 4 arguments, I ended up with a file I didn't have write permission to open again. The next time I attempted to run the compiled porth compiler, I was denied permission to open the output file.

    opened by mjz19910 0
  • [DISC] Take into account PRs with porth code that's unrelated to the main compiler

    I understand that the language is currently changing rapidly, but @rexim can you please consider PRs that add porth code, at least ones that do not affect your main activity of writing the compiler, i.e. adding examples or maybe additions/fixes to std?

    It would also be very nice to reflect this in the CONTRIBUTING.md

    opened by 0dminnimda 3
  • Added a dynamic memory implementation using mmap

    Hope I'm not stepping on toes here - if this is planned for a future stream, please feel free to reject the PR. I'm just very excited about the language, and happy that there's useful stuff I can do with it :)

    Still todo:

    • [ ] Python emulation of mmap/munmap syscalls (should be easy to self-host!)
    • [ ] Better memory allocation - only whole pages right now, would be very expensive for a lot of small allocations - I seem to recall there's a certain YouTube video about writing your own malloc....

    640K? Pah!

    opened by Nabushika 0
  • Parity error between numbers in simulation and compilation

    Python uses bigints by default, which means you can have arbitrarily large numbers. I noticed this when writing an abs macro. The following code produces two different outputs on my machine:

    include "std.porth"
    macro abs
     if dup 0 < do
       0 over - swap drop
     end
    end
    
    -18446744073709551615 abs
    // becomes 1 on bare metal, 18446744073709551615 in Python
    1 - 
    if 0 = do
     "We're running on bare metal!\n" puts
    else
     "We're in a simulation!\n" puts
    end
    
    

    Not sure if it's worth fixing in Python as when Porth becomes self-hosted this issue will go away. At least it's amusing :)

    opened by Nabushika 1
  • Invalid syntax with MEM_CAPACITY = 640_000

    When downloading the code and then doing the Quickstart, instead of working, it prints out:

    File "porth.py", line 18
        MEM_CAPACITY = 640_000 # should be enough for everyone
        ^
    SyntaxError: invalid syntax

    This also happens for:

    SIM_STR_CAPACITY = 640_000
    SIM_ARGV_CAPACITY = 640_000

    Note: Because I'm using Windows, I am using an Ubuntu emulator found in the Windows Store

    opened by ghost 3
Owner
Tsoding
Recreational Programming