Taking the fight to the establishment.

Trevor van Hoof

Last update: Feb 1, 2022

Throwdown

Wat?

I wanted a simple Markdown interpreter in Python and/or JavaScript that outputs HTML for my website. Python has no bug-free official Markdown distribution, and JavaScript only has packages you install through npm. I don't want anything to do with Node and the 100 MB of dependencies you end up uploading to your FTP server just to do the most basic tasks.

So writing my own parser it is, eh? I tried to trudge through the CommonMark Markdown spec and had a heart attack at the complexity: 24722 words and 181 pages of complicated language explaining features I absolutely don't need.

I just want minimal, well-defined syntactic elements with maximum payoff, so here is Throwdown: taking the fight to the establishment with a markup language that is stupidly minimal in both definition and capability.

Goals

Keep it a subset of Markdown so we can use existing IDEs & plugins. Support HTML tags inline with text. Write a well-defined language spec in the manual to make it easy to create new interpreters for it.

The spec

Tokenization

Given a piece of text, we tokenize the following concepts (these are regular expressions using the DOTALL and MULTILINE modifiers):

blank_line(s): (\r\n|\r(?!\n)|\n){2,}
html tag: <.*?>
code: ```.*?```
unescaped italic: ^_|(?

Any characters in between matching tokens are flagged as content.
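A minimal Python sketch of this tokenization, assuming one (kind, text) tuple per token. Only the blank-line, HTML and code patterns come from the list above (the blank-line alternation is written so that a lone \r\n does not count as two line breaks). Because the rest of the list is cut off, the italic (unescaped _), bold (unescaped *) and heading (1 to 6 # at the start of a line) patterns, and the names TOKEN_PATTERN and tokenize, are hypothetical.

```python
import re

# blank_line, html and code follow the spec's list. italic, bold and heading
# are ASSUMED patterns: an unescaped "_" is italic, an unescaped "*" is bold,
# and 1-6 "#" at the start of a line is a heading.
TOKEN_PATTERN = re.compile(
    r"(?P<blank_line>(?:\r\n|\r(?!\n)|\n){2,})"
    r"|(?P<html><.*?>)"
    r"|(?P<code>```.*?```)"
    r"|(?P<italic>(?<!\\)_)"
    r"|(?P<bold>(?<!\\)\*)"
    r"|(?P<heading>^#{1,6})",
    re.DOTALL | re.MULTILINE,
)


def tokenize(text):
    """Split text into (kind, text) tuples; gaps between matches are content."""
    tokens = []
    pos = 0
    for match in TOKEN_PATTERN.finditer(text):
        if match.start() > pos:
            # Everything between two matched tokens is flagged as content.
            tokens.append(("content", text[pos:match.start()]))
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    if pos < len(text):
        tokens.append(("content", text[pos:]))
    return tokens
```

Running tokenize on This *word* is bold but this* is wrong. yields content, bold, content, bold, content, bold, content. Escaped delimiters such as \_ stay inside content, where the backslash can be removed later.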
Content itself gets an additional treatment for escaped characters: we replace matches of this regex

\\(?)

with whatever was in the matched group. I currently do this in the generation stage, but it could move to any stage.
I am not sure whether any of these elements may match inside 2-, 3- or 4-byte UTF-8 characters, so make sure to perform these single-character checks per Unicode character, not per byte.
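The worry can be checked quickly in Python. In UTF-8, every byte of a 2-, 3- or 4-byte character is 0x80 or above, so ASCII delimiters such as *, _, # or < can never match in the middle of one, and iterating a Python str already walks code points rather than bytes. The sketch below (the function name is made up for illustration) verifies the byte property for a few sample characters.

```python
def multibyte_bytes_are_non_ascii(text):
    """Return True if no multi-byte UTF-8 character in text has a byte below 0x80."""
    for char in text:  # iterating a str yields Unicode code points, not bytes
        encoded = char.encode("utf-8")
        if len(encoded) > 1 and any(byte < 0x80 for byte in encoded):
            return False
    return True


# U+00E9, U+2713 and U+1D11E encode to 2, 3 and 4 bytes respectively.
sample = "\u00e9\u2713\U0001d11e"
print([len(char.encode("utf-8")) for char in sample])  # [2, 3, 4]
print(multibyte_bytes_are_non_ascii(sample))  # True
```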
  
Parsing

We then have a parsing pass that tries to group matching tokens. In this example:
  
  This *word* is bold but this* is wrong.
 
 
 We have the following tokens: 
  
  content, bold, content, bold, content, bold, content
 
 
The parser simply finds any content block surrounded by matching code, italic or bold neighbours, and then 'consumes' those neighbours so they cannot be picked up more than once. Note that we search outwards from the content recursively to support *_content_* notation, instead of holding on to the boundary tokens as soon as we encounter them. Reading from left to right, this gives:
  
  content group content bold content
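The grouping pass can be sketched in Python as follows. The tuple shapes, (kind, text) for tokens and ("group", delimiter, inner) for groups, and the function name group_tokens are assumptions made for illustration. After wrapping a content token, the index steps back so the new group is checked against its own neighbours, which is one way to implement the outward search that turns *_content_* into a bold group containing an italic group.

```python
# Delimiter kinds that can wrap content, per the spec: code, italic, bold.
DELIMITERS = {"code", "italic", "bold"}


def group_tokens(tokens):
    """Wrap content (or groups) whose two neighbours are the same delimiter.

    Tokens are (kind, text) tuples; a group is ("group", delimiter, inner).
    """
    tokens = list(tokens)
    i = 0
    while i < len(tokens):
        if tokens[i][0] in ("content", "group"):
            left = tokens[i - 1][0] if i > 0 else None
            right = tokens[i + 1][0] if i + 1 < len(tokens) else None
            if left == right and left in DELIMITERS:
                # Consume both neighbours so they cannot be matched again.
                tokens[i - 1:i + 2] = [("group", left, tokens[i])]
                # Step back onto the new group to search outwards (*_word_*).
                i -= 1
                continue
        i += 1
    return tokens
```

On the example tokens this returns content, group, content, bold, content, with the group holding the word and the last bold left unmatched.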
 
 
Then any token outside a group gets merged into the adjacent content, and any consecutive content tokens get merged into one. The first step merges the stray bold into its left neighbour:
  
  content group content content
 
 
 The next step reduces the two content blocks into one: 
  
  content group content
 
 
HTML tag tokens should be merged into content in this step as well.
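A sketch of the merging step in Python, assuming that ungrouped bold, italic, code and HTML tokens simply become literal content text, while headings and blank lines are left alone for the later steps. The name merge_strays and the tuple layout are hypothetical. Converting strays to content and then joining runs of content yields the content, group, content sequence shown above.

```python
# Ungrouped tokens of these kinds are treated as literal text (HTML included).
STRAY_KINDS = {"code", "italic", "bold", "html"}


def merge_strays(tokens):
    """Fold stray tokens into content, then join adjacent content tokens."""
    # Step 1: any stray token outside a group becomes plain content.
    literal = [
        ("content", token[1]) if token[0] in STRAY_KINDS else token
        for token in tokens
    ]
    # Step 2: consecutive content tokens are merged into one.
    merged = []
    for token in literal:
        if token[0] == "content" and merged and merged[-1][0] == "content":
            merged[-1] = ("content", merged[-1][1] + token[1])
        else:
            merged.append(token)
    return merged
```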
The final step removes the blank-line tokens. Before doing so, we must merge consecutive group and content tokens into a single block; after that, any two adjacent content and/or group blocks are known to be separate paragraphs (or headers), so the blank lines are no longer needed to mark the separation.
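The blank-line step might look like this in Python: runs of adjacent content and group tokens are collected into one hypothetical ("block", [...]) node per paragraph, after which blank_line tokens carry no information and are dropped. Heading tokens are kept so the generation step can see them; the block tuple and the function name split_blocks are assumptions.

```python
def split_blocks(tokens):
    """Merge runs of content/group tokens into blocks, then drop blank lines."""
    blocks = []
    for token in tokens:
        if token[0] in ("content", "group"):
            if blocks and blocks[-1][0] == "block":
                # Same paragraph: extend the block that is still open.
                blocks[-1][1].append(token)
            else:
                blocks.append(("block", [token]))
        else:
            # Headings and blank lines close the current block.
            blocks.append(token)
    # Blank lines only separated blocks; after merging they are redundant.
    return [token for token in blocks if token[0] != "blank_line"]
```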
  
Generation

Then there is the generation step. We simply walk the resulting tokens and output an HTML document.
  
If a content group is preceded by a heading, the node gets wrapped in an h1 to h6 heading tag (hn, where n is the number of # characters).
Every other content node gets wrapped in paragraph tags.
Every group gets wrapped based on its first and last tokens (which are identical):
  italic becomes an italic tag. The wrapping is recursive, so a bold group may exist inside an italic group.
  bold becomes a bold tag. The wrapping is recursive, so an italic group may exist inside a bold group.
  code becomes a code tag.
 
Write an HTML line-break tag inline to insert single line breaks manually.
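A minimal generator sketch in Python, assuming the parser has produced heading tokens and hypothetical ("block", [...]) nodes holding content and ("group", delimiter, inner) tokens. The original tag names for italic, bold and code did not survive, so the em/strong/code mapping is an assumption, as is the escape pattern (a backslash followed by any single character) applied to content here. Content is deliberately not HTML-escaped, so inline HTML tags pass through untouched.

```python
import re

# ASSUMED tag choices: the spec's original tag names were lost.
GROUP_TAGS = {"italic": "em", "bold": "strong", "code": "code"}
# ASSUMED escape pattern: a backslash followed by any single character.
ESCAPE = re.compile(r"\\(.)", re.DOTALL)


def render_inline(token):
    """Render a content token or a (possibly nested) group token to HTML."""
    if token[0] == "content":
        # Replace each escape with the escaped character itself.
        return ESCAPE.sub(r"\1", token[1])
    _, delimiter, inner = token
    tag = GROUP_TAGS[delimiter]
    return f"<{tag}>{render_inline(inner)}</{tag}>"  # recursive wrapping


def generate(blocks):
    """Turn heading and block tokens into an HTML string, one element per line."""
    html = []
    heading_level = 0
    for token in blocks:
        if token[0] == "heading":
            heading_level = len(token[1])  # number of '#' characters
            continue
        body = "".join(render_inline(t) for t in token[1]).strip()
        tag = f"h{heading_level}" if heading_level else "p"
        html.append(f"<{tag}>{body}</{tag}>")
        heading_level = 0
    return "\n".join(html)
```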

TODO:
Consider bullet points and numbered lists, although writing them as inline HTML is not super invasive.
