A fast yet powerful Python Markdown parser with renderers and plugins.

Overview

Mistune v2

NOTE: This is the redesigned v2 of mistune. Check the v1 branch for earlier code.

Using old Mistune? Check out the docs: https://mistune.readthedocs.io/en/v0.8.4/

Sponsors

Mistune is sponsored by Typlog, a blogging and podcast hosting platform, simple yet powerful. Write in Markdown.

Support Me via GitHub Sponsors.

Install

To install v2 of mistune:

$ pip install mistune==2.0.0a6

Overview

Convert Markdown to HTML with ease:

import mistune

mistune.html(your_markdown_text)

Security Reporting

If you find a security bug, please do not open a public issue or send a public patch. Instead, email me at [email protected]; an attached patch is welcome. My PGP key fingerprint is:

72F8 E895 A70C EBDF 4F2A DFE0 7E55 E3E0 118B 2B4C

Or, you can use the Tidelift security contact. Tidelift will coordinate the fix and disclosure.

License

Mistune is licensed under BSD. Please see LICENSE for licensing details.

Comments
  • Line under quote is not considered part of quote

    Apologies if this is just an issue on my side, but I found that the following two lines are not joined into the same quote when passed to my custom renderer:

    > first quote line
    second line underneath
    

    I assumed they would both be passed together to the block_quote function.

    bug 
    opened by makeworld-the-better-one 21
  • default_features -> default_rules

    Hi there!

    Are you on a strict 'no backward compat' policy (as the version number is < 1.x)?

    I guess we might try to reintroduce default_features as a class property that prints a warning and maps to default_rules?
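
The deprecation alias proposed above could look roughly like the sketch below. It is only an illustration: `InlineLexer` here is a hypothetical stand-in class with made-up rule names, not mistune's actual code, and the alias is an instance-level `property` (a true class-level property would need a metaclass or a descriptor).

```python
import warnings


class InlineLexer:
    """Hypothetical stand-in for a lexer class; not mistune's real code."""

    # New attribute name (the rule names are illustrative assumptions).
    default_rules = ["escape", "link", "emphasis", "text"]

    @property
    def default_features(self):
        # Deprecated alias: emit a warning, then forward to the new name.
        warnings.warn(
            "default_features is deprecated; use default_rules instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.default_rules


lexer = InlineLexer()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    rules = lexer.default_features   # old name still works, but warns
```

Code that still reads the old name keeps working while the warning points callers at the new one, which would let downstream CI keep passing during a transition period.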

    I mention this because all our CI tests have been failing since the 0.5 release, but it's not a big deal: we can update and/or pin the mistune version.

    Thanks !

    opened by Carreau 18
  • Nested HTML inside block_html is escaped when escape=False, parse_block_html=True

    Normally, with escape=False, a nested HTML block is correctly left unescaped:

    >>> print markdown('<div id="special-part"><div class="subsection">text</div></div>', escape=False)
    <div id="special-part"><div class="subsection">text</div></div>
    

    But when I add parse_block_html=True, only the outermost element is left unescaped and the rest is escaped:

    >>> print markdown('<div id="special-part"><div class="subsection">text</div></div>', escape=False, parse_block_html=True)
    <div id="special-part">&lt;div class="subsection"&gt;text&lt;/div&gt;</div>
    
    opened by tdivis 17
  • Implement text renderer.

    For rendering of markdown documents in text format.

    The concrete use case is the ability to render templated markdown documents directly within a terminal environment.

    opened by csadorf 12
  • Various Extension System Problems

    I just looked at how the grammar is extended, and in its current form it's not really usable. There are a few problems with the way the inline lexer alone is supposed to be modified:

    • Right now the example modifies default_rules, which is a class attribute and as such is shared across all other lexers as well.
    • Because the text rule is greedy, it needs to be modified whenever another rule is inserted, but a plugin cannot do so because it cannot know which sequences indicate the start of a rule. The current example only works by chance.
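
The first problem above, a mutable class attribute shared by every lexer, can be reproduced in isolation. The sketch below uses hypothetical classes and rule names (`SharedLexer`, `IsolatedLexer`, `wiki_link`) and does not reflect mistune's actual implementation; it only contrasts mutating the shared list with giving each instance its own copy.

```python
class SharedLexer:
    """Hypothetical lexer whose rule list lives on the class."""

    default_rules = ["link", "emphasis", "text"]  # one list for all instances


shared_a = SharedLexer()
shared_b = SharedLexer()
# An extension "for shared_a" actually mutates the class-level list...
shared_a.default_rules.insert(0, "wiki_link")
# ...so shared_b (and every other lexer) now sees the extra rule too.


class IsolatedLexer:
    """Hypothetical alternative: each instance copies the defaults."""

    default_rules = ("link", "emphasis", "text")  # immutable defaults

    def __init__(self):
        self.rules = list(self.default_rules)  # per-instance copy


iso_a = IsolatedLexer()
iso_b = IsolatedLexer()
iso_a.rules.insert(0, "wiki_link")  # affects iso_a only
```

Copying the defaults per instance addresses only the sharing problem; it does nothing about the greedy text rule described in the second point.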

    I think the current design does not really lend itself to extension, but I'm not sure how to fix it without breaking what's already there.

    opened by mitsuhiko 12
  • 0.8.4 vs 2.0.0a4 performance

    Congrats @lepture for this project! (I'm doing a benchmark of many Markdown to HTML libraries, and mistune seems to be the best!)

    I used timeit with the same Markdown document, and:

    • version 0.8.4: mistune.markdown(s) took 4ms on average
    • version 2.0.0a4: mistune.markdown(s) took 13ms on average (for the same document)

    What could be the reason for 2.0.0a4 being 3x slower than version 0.8.4?

    Can I use 0.8.4 for my current project? Is it still stable?

    s = 'my_sample_document'
    import timeit
    print(timeit.timeit("markdown(s)", setup="from mistune import markdown;from __main__ import s", number=100)/100)
    

    PS: I also found this article, it's linked to this topic: https://getnikola.com/blog/markdown-can-affect-performance.html

    opened by josephernest 11
  • Parse line numbers

    Is it possible to retrieve information about the line number of block-level elements in the AST? We are looking into using Mistune for a project and this would be quite helpful.

    E.g., running:

    markdown = mistune.create_markdown(renderer=mistune.AstRenderer())
    markdown("# Heading 1\n\n## Heading2")
    

    to return something like

    [{'type': 'heading',
      'children': [{'type': 'text', 'text': 'Heading 1'}],
      'level': 1,
      'lineno': 0},
     {'type': 'heading',
      'children': [{'type': 'text', 'text': 'Heading2'}],
      'level': 2,
      'lineno': 2}]
    

    If it's not possible currently, is it something that would be doable w/ a PR?
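
To make the requested output shape concrete, here is a minimal standalone sketch that attaches a zero-based `lineno` to ATX headings. It deliberately does not use mistune: `headings_with_lineno` and `ATX_HEADING` are hypothetical names, and the scan is naive (it ignores fenced code blocks, setext headings, and every other block type).

```python
import re

# Naive ATX heading pattern: 1-6 '#' characters, whitespace, then the title.
ATX_HEADING = re.compile(r"^(#{1,6})[ \t]+(.*?)[ \t]*$")


def headings_with_lineno(text):
    """Return heading nodes shaped like the example above, with 'lineno'."""
    nodes = []
    for lineno, line in enumerate(text.split("\n")):  # zero-based line index
        match = ATX_HEADING.match(line)
        if match:
            nodes.append({
                "type": "heading",
                "children": [{"type": "text", "text": match.group(2)}],
                "level": len(match.group(1)),
                "lineno": lineno,
            })
    return nodes


result = headings_with_lineno("# Heading 1\n\n## Heading2")
```

A real implementation would have to track positions inside the block parser itself, since a line-by-line scan like this cannot tell a heading from a `#` comment inside a code block.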

    feature request 
    opened by choldgraf 11
  • Provide context for renderer

    Great work on this; it is indeed fast!

    Would you consider a change something like the prototype in this pull request? It provides the inline_lexer to the renderer method and sets a couple extra values that can be used for more advanced rendering.

    Of course, a full implementation would change all the renderer methods to accept the lexer as an argument and would be breaking backwards compatibility, but it is a very new project so maybe it is in time? If so, I'm willing to make the change and submit it.

    In the exact example usage I'm thinking of, I detect if an image is all alone or surrounded by other elements. Depending on its surroundings or lack thereof, I change the rendered CSS class. So for example, an image followed by text all in one paragraph might float the image left and flow the text around it.

    Maybe an image would explain it better:

    [image: sample]

    opened by gholt 11
  • Mistune 2.0.0 release?

    Hi,

    I have been testing Markdown rendering in HyperKitty, the archiver for the GNU Mailman project, for some time now using the alpha/rc versions of mistune 2.0. While it works okay in our test environments, I would prefer that our stable releases rely only on non-pre-release dependencies.

    I was wondering if you had any plans for the 2.0 release of mistune? There seem to be no issues or PRs marked with the 2.0 milestone at the moment.

    opened by maxking 10
  • Support for Front Matter?

    I have a blog with Markdown files generated by Jekyll. These files have a metadata block at the top that contains YAML. Is this something that mistune would be willing to support? Jekyll is a very common blog engine, so I think it would be useful to many!

    ---
    id: 29
    title: How this website was built
    date: 2005-12-20T00:00:36
    categories:
      - CSS
    ---
    Hi there early readers!
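
Until something like this is supported, one workaround is to strip the metadata block before handing the rest to the Markdown parser. The sketch below assumes the front matter is delimited by `---` lines as in the sample above; `split_front_matter` is a hypothetical helper, not part of mistune, and it returns the YAML as raw text without parsing it.

```python
def split_front_matter(text):
    """Split '---'-delimited front matter from the Markdown body.

    Returns (raw_metadata, body). If the text does not start with a
    '---' line, or the block is never closed, metadata is empty and
    the text is returned unchanged.
    """
    lines = text.split("\n")
    if lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":  # closing delimiter
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text  # opening delimiter without a closing one


doc = """---
id: 29
title: How this website was built
date: 2005-12-20T00:00:36
categories:
  - CSS
---
Hi there early readers!
"""

meta, body = split_front_matter(doc)
```

The `body` string can then be passed to the Markdown converter, and `meta` to whichever YAML parser the project already uses.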
    
    opened by EmilStenstrom 10
  • Traceback: 'DocPageRenderer' object has no attribute 'rstrip'

    I am getting a traceback with 0.3.1 (which works with 0.3.0):

    Traceback (most recent call last):
      File "website/crossbario/__init__.py", line 191, in <module>
        pages = DocPages('../crossbar.wiki')
      File "website/crossbario/__init__.py", line 153, in __init__
        self._renderer = mistune.Markdown(renderer = rend, inline = inline)
      File "mistune.py", line 899, in mistune.Markdown.__init__ (mistune.c:16104)
        inline = inline(renderer, **kwargs)
      File "mistune.py", line 502, in mistune.InlineLexer.__call__ (mistune.c:8497)
        return self.output(src)
      File "mistune.py", line 510, in mistune.InlineLexer.output (mistune.c:9059)
        src = src.rstrip('\n')
    AttributeError: 'DocPageRenderer' object has no attribute 'rstrip'
    make: *** [test] Error 1
    
    opened by oberstet 9
  • External plugins guidance and ecosystem

    Mistune users are able to write their own plugins, and when they do this locally, it is easy enough for them to import and use their plugin code so that it's available when they use Mistune. Is there a preferred way or best practice for how users should publish their plugins? What is the best way to find community-made plugins? If there are such things, I propose adding them to the docs.

    Taking some inspiration from Lektor, projects could be published to PyPI with a mistune- prefix so they are discoverable via PyPI's search bar, or you could maintain a docs page linking to them. Perhaps PyPA would also allow a new trove classifier to list Mistune as a framework (that makes sense to me).

    I thought of this coming from the idea that Lektor could more easily pull in arbitrary Mistune plugins. https://github.com/lektor/lektor/issues/1076

    opened by nixjdm 1
  • abbr plugin error?

    image

    https://github.com/lepture/mistune/blob/1264a1c954396fa31304bfac39588381f015a5d8/mistune/plugins/abbr.py#L56-L63

    Maybe def_abbr here should be the same as abbr? My code runs locally without error, but the error occurs on GitHub Actions.

    Full log here: https://github.com/teedoc/teedoc/actions/runs/3069280480/jobs/4957741255

    opened by Neutree 0
  • Rendering code comment as heading in html inside markdown

    We're embedding some PyScript inside our Markdown, and having some trouble with code comments being rendered as headings.

    It appears to be an issue within a custom tag where there is a blank line before the line prefaced with the #.

    To reproduce:

    $ python3
    Python 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import mistune
    >>> mistune.__version__
    '2.0.4'
    >>> mistune.html(r'''<some-tag>
    ... 
    ... # a comment
    ... something with # a trailing comment
    ... </some-tag>
    ... 
    ... ''')
    '<some-tag>\n<h1>a comment</h1>\n<p>something with # a trailing comment</p>\n</some-tag>\n'
    

    What is expected:

    The code/comments within the custom tags (in our case they're <py-script></py-script>) should not be interpreted as Markdown (they weren't in a pre-2.x version of mistune), and should produce: <some-tag>\n<p># a comment</p>\n<p>something with # a trailing comment</p>\n</some-tag>\n

    When there is no blank line preceding the code comment, this behavior is not observed:

    >>> mistune.html(r'''<some-tag>
    ... # a comment
    ... something with # a trailing comment
    ... </some-tag>''')
    '<some-tag>\n# a comment\nsomething with # a trailing comment\n</some-tag>\n'
    
    opened by toonarmycaptain 0
  • What's the scope of CVE-2022-34749?

    The advisory for https://github.com/advisories/GHSA-fw3v-x4f2-v673 says that all versions of mistune before 2.0.3 are vulnerable. Given the fix was to modify a single regex, which isn't present in versions before 2.0.0a1, I think this claim is unlikely to be true. Is every version of mistune ever released (starting with 0.1.0) actually vulnerable to a ReDoS, or should there be a version bound of e.g. >1.8.4 on the advisory?

    opened by sersorrel 2
  • 2.0.2: sphinx warnings

    It looks like Sphinx now shows some warnings:

    + SETUPTOOLS_SCM_PRETEND_VERSION=2.0.2
    + /usr/bin/sphinx-build -n -T -b man docs build/sphinx/man
    Running Sphinx v4.5.0
    making output directory... done
    building [mo]: targets for 0 po files that are out of date
    building [man]: all manpages
    updating environment: [new config] 8 added, 0 changed, 0 removed
    reading sources... [100%] plugins
    /home/tkloczko/rpmbuild/BUILD/mistune-2.0.2/docs/changes.rst:21: WARNING: Title underline too short.
    
    Version 2.0.0rc1
    ~~~~~~~~~~~~~~~
    /home/tkloczko/rpmbuild/BUILD/mistune-2.0.2/docs/changes.rst:21: WARNING: Title underline too short.
    
    Version 2.0.0rc1
    ~~~~~~~~~~~~~~~
    /home/tkloczko/rpmbuild/BUILD/mistune-2.0.2/docs/plugins.rst:4: WARNING: duplicate label plugins, other instance in /home/tkloczko/rpmbuild/BUILD/mistune-2.0.2/docs/advanced.rst
    looking for now-outdated files... none found
    pickling environment... done
    checking consistency... done
    writing... python-mistune.3 { intro guide plugins directives advanced api changes } done
    build succeeded, 3 warnings.
    
    opened by kloczek 0
  • Unexpected regex matching behaviour

    I'm trying to port an app (Moment) from Mistune 0.8.4 to Mistune 2.0, but I'm struggling to figure out what's going on here:

    import mistune
    
    def parse_colour(inline, m, state):
        colour = m.group(1)
        text = m.group(2)
        return "colour", colour, text
    
    
    def render_html_colour(colour, text):
        return f'<span data-mx-color="{colour}">{text}</span>'
    
    
    def plugin_matrix(md):
        colour = (
            r"^<(.+?)>"          # capture the colour in `<colour>`
            r"\((.+?)"           # capture text in `(text`
            r"(?<!\\)(?:\\\\)*"  # ignore the next `)` if it's \escaped
            r"\)"                # finish on a `)`
        )
    
        md.inline.register_rule("colour", colour, parse_colour)
        md.inline.rules.append("colour")
    
        if md.renderer.NAME == "html":
            md.renderer.register("colour", render_html_colour)
    
    
    markdown_to_html = mistune.create_markdown(
        hard_wrap = True,
        escape = True,
        renderer = "html",
        plugins = [plugin_matrix],
    )
    
    markdown = """
    <#ff0000>(red text!)
    """
    
    print(markdown_to_html(markdown))
    

    This works fine for turning <#ff0000>(red text!) into <span data-mx-color="#ff0000">red text!</span>. However, <red>(red text!) turns into &lt;red&gt;(red text!), which is not what I would have expected given that my regular expression should match this too and turn it into <span data-mx-color="red">red text!</span>. This regex matches both patterns just fine when I test it in a Python shell by importing re, and this pattern also worked fine in Mistune 0.8.4 as far as I can tell: https://gitlab.com/mx-moment/moment/-/blob/2a1a96e762e7e7c6dce13a7caa8b34f7222230d7/src/backend/html_markdown.py#L35-40

    What gives? Am I missing something? I've tried looking into Mistune's source code to see if I can figure out what's going on, but I don't think I understand Python well enough.
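
The standalone check mentioned above can be reproduced with only the `re` module. The sketch below recompiles the same pattern outside Mistune (the name `COLOUR` is an assumption for illustration) and applies it to both inputs. It exercises the pattern in isolation and says nothing about how Mistune 2.0 orders or combines registered inline rules.

```python
import re

# Same pattern as in plugin_matrix above, compiled on its own.
COLOUR = re.compile(
    r"^<(.+?)>"          # capture the colour in `<colour>`
    r"\((.+?)"           # capture text in `(text`
    r"(?<!\\)(?:\\\\)*"  # ignore the next `)` if it's \escaped
    r"\)"                # finish on a `)`
)

for sample in ("<#ff0000>(red text!)", "<red>(red text!)"):
    match = COLOUR.match(sample)
    # Print the captured (colour, text) pair, or None if nothing matched.
    print(sample, "->", match.groups() if match else None)
```

Since both inputs match here, the difference presumably comes from how the rule interacts with the parser's other inline rules rather than from the pattern itself, though that is an inference and not something this check confirms.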

    opened by Newbytee 3
Releases(v2.0.4)
Owner
Hsiaoming Yang
A markdown lexer and parser which gives the programmer atomic control over markdown parsing to html.

stonepresto 4 Aug 13, 2022
A fast, extensible and spec-compliant Markdown parser in pure Python.

mistletoe mistletoe is a Markdown parser in pure Python, designed to be fast, spec-compliant and fully customizable. Apart from being the fastest Comm

Mi Yu 514 Sep 26, 2022
Provides syntax for Python-Markdown which allows for the inclusion of the contents of other Markdown documents.

Markdown-Include This is an extension to Python-Markdown which provides an "include" function, similar to that found in LaTeX (and also the C pre-proc

Chris MacMackin 83 Sep 28, 2022
Mdformat is an opinionated Markdown formatter that can be used to enforce a consistent style in Markdown files

Mdformat is an opinionated Markdown formatter that can be used to enforce a consistent style in Markdown files. Mdformat is a Unix-style command-line tool as well as a Python library.

Executable Books 151 Sep 22, 2022
markdown2: A fast and complete implementation of Markdown in Python

Markdown is a light text markup format and a processor to convert that to HTML. The originator describes it as follows: Markdown is a text-to-HTML con

Trent Mick 2.4k Sep 20, 2022
A lightweight and fast-to-use Markdown document generator based on Python

快乐的老鼠宝宝 1 Jan 10, 2022
Static site generator that supports Markdown and reST syntax. Powered by Python.

Pelican Pelican is a static site generator, written in Python. Write content in reStructuredText or Markdown using your editor of choice Includes a si

Pelican dev team 11.2k Sep 30, 2022
A Python implementation of John Gruber’s Markdown with Extension support.

Python-Markdown This is a Python implementation of John Gruber's Markdown. It is almost completely compliant with the reference implementation, though

Python-Markdown 3k Sep 27, 2022
Extensions for Python Markdown

PyMdown Extensions Extensions for Python Markdown. Documentation Extension documentation is found here: https://facelessuser.github.io/pymdown-extensi

Isaac Muse 641 Sep 27, 2022
Lightweight Markdown dialect for Python desktop apps

Litemark is a lightweight Markdown dialect originally created to be the markup language for the Codegame Platform project. When you run litemark from the command line interface without any arguments, the Litemark Viewer opens and displays the rendered demo.

null 10 Apr 23, 2022
A markdown template manager for writing API docs in python.

DocsGen-py A markdown template manager for writing API docs in python. Contents Usage API Reference Usage You can install the latest commit of this re

Ethan Evans 1 May 10, 2022
Livemark is a static page generator that extends Markdown with interactive charts, tables, and more.

Livemark This software is in the early stages and is not well-tested Livemark is a static site generator that extends Markdown with interactive chart

Frictionless Data 81 Aug 31, 2022
Read a list in markdown and do something with it!

Markdown List Reader A simple tool for reading lists in markdown. Usage Begin by running the mdr.py file and input either a markdown string with the -

Esteban Garcia 3 Sep 13, 2021
Yuque2md - Offline download the markdown file and image from yuque

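yuque2md Exports all Markdown documents from a Yuque knowledge base, following the knowledge base's directory structure, and saves the images locally for offline use. Usage Install Python 3.x Clone the project Download the depen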

JiaJianHuang 3 Apr 17, 2022
Convert HTML to Markdown-formatted text.

html2text html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to

Alireza Savand 1.3k Sep 22, 2022
Comprehensive Markdown plugin built for Django

Django MarkdownX Django MarkdownX is a comprehensive Markdown plugin built for Django, the renowned high-level Python web framework, with flexibility,

neutronX 735 Sep 23, 2022
Awesome Django Markdown Editor, supported for Bootstrap & Semantic-UI

martor Martor is a Markdown Editor plugin for Django, supported for Bootstrap & Semantic-UI. Features Live Preview Integrated with Ace Editor Supporte

null 625 Sep 26, 2022
A super simple script which uses the GitHub API to convert your markdown files to GitHub styled HTML site.

Çalgan Aygün 210 Sep 25, 2022