A markdown lexer and parser which gives the programmer atomic control over markdown parsing to html.

stonepresto

Last update: Aug 13, 2022

Related tags

Markdown/YAML atomd

Overview

atomd

Version 0.2.0

A markdown lexer and parser which gives the developer atomic control over markdown parsing to html.

Atomd first tokenizes the markdown with the lexer.

Using the tokenized markdown, parsers for various different markup languages (HTML, XML, etc) are easier to implement, since it more or less becomes a game of find and replace.

Install

python3 -m pip install atomd

Usage

Functions in atomd.py called md2 () combine the lexer and parser to provide an easier calling convention.

In the case of the HTML parser, this is md2html().

If you want to parse a markdown file, do the following:

from atomd import md2html

html = md2html(filename="path/to/myfile.md").decode('utf-8')

If you want to parse a string of markdown, do the following:

from atomd import md2html

html = md2html(md=b"# My markdown ByteString!").decode('utf-8')

Lexer

This portion of the program tokenizes the markdown based on the lexicon, which is a mapping of markdown syntax items to tokens that will represent the specific syntax.

Tokenization is done with a series of regular expressions.

If there are bugs in the program, it is likely they exist in the regular expressions and can be fixed without significant design changes to the program, which allows for quick turnaround on bugfixes.