A Python implementation of John Gruber’s Markdown with Extension support.

Overview

Python-Markdown

This is a Python implementation of John Gruber's Markdown. It is almost completely compliant with the reference implementation, though there are a few known issues. See Features for information on what exactly is supported and what is not. Additional features are supported by the Available Extensions.

Documentation

pip install markdown
import markdown
html = markdown.markdown(your_text_string)

For more advanced installation and usage documentation, see the docs/ directory of the distribution or the project website at https://Python-Markdown.github.io/.

See the change log at https://Python-Markdown.github.io/change_log.

Support

You may report bugs, ask for help, and discuss various other issues on the bug tracker.

Code of Conduct

Everyone interacting in the Python-Markdown project's codebases, issue trackers, and mailing lists is expected to follow the Code of Conduct.

Issues
  • Refactor HTML Parser

    This is experimental. More of the HTMLParser methods need to be fleshed out. So far the basic stuff works as long as there is no invalid HTML in the document (which is untested at this point).

    Input:

    Some *Markdown* text.
    
    <p>Some *raw* HTML</p>
    
    <span>*inline*</span>
    
        <p>code block</p>
    
    `<em>code span</em>`
    
    <div>
    
    foo *bar*
    
    * baz bar
    
    blah blah
    
    </div>
    
    More *Markdown*.
    

    Output:

    <p>Some <em>Markdown</em> text.</p>
    <p>Some *raw* HTML</p>
    
    <p><span><em>inline</em></span></p>
    <pre><code>&lt;p&gt;code block&lt;/p&gt;
    </code></pre>
    <p><code>&lt;em&gt;code span&lt;/em&gt;</code></p>
    <div>
    
    foo *bar*
    
    * baz bar
    
    blah blah
    
    </div>
    
    <p>More <em>Markdown</em>.</p>
    

    ... which exactly matches the existing behavior.

    I haven't actually run the tests on this yet, so I'm curious to see what Travis says...

    approved 
    opened by waylan 51
  • Infinite execution on some input

    With some input, the markdown function runs forever and no exception is raised.

    Steps to reproduce: https://gist.github.com/anonymous/ffab9ad433127893f04b9d009cd21444

    bug 
    opened by dattaz 47
  • AttributeError: module 'importlib' has no attribute 'util' with python-markdown 3.4 on macOS/Windows

    With python3.9 on macOS:

    $ python3.9 -m venv venv
    $ source venv/bin/activate
    $ pip install markdown
    Collecting markdown
      Using cached Markdown-3.4-py3-none-any.whl (93 kB)
    Collecting importlib-metadata>=4.4; python_version < "3.10"
      Using cached importlib_metadata-4.12.0-py3-none-any.whl (21 kB)
    Collecting zipp>=0.5
      Using cached zipp-3.8.1-py3-none-any.whl (5.6 kB)
    Installing collected packages: zipp, importlib-metadata, markdown
    Successfully installed importlib-metadata-4.12.0 markdown-3.4 zipp-3.8.1
    WARNING: You are using pip version 20.2.3; however, version 22.1.2 is available.
    You should consider upgrading via the '/Users/mike/tmp/resume.md/venv/bin/python3.9 -m pip install --upgrade pip' command.
    $ python
    Python 3.9.4 (default, Apr 16 2021, 21:18:07)
    [Clang 12.0.0 (clang-1200.0.32.29)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import markdown
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/mike/tmp/resume.md/venv/lib/python3.9/site-packages/markdown/__init__.py", line 22, in <module>
        from .core import Markdown, markdown, markdownFromFile
      File "/Users/mike/tmp/resume.md/venv/lib/python3.9/site-packages/markdown/core.py", line 27, in <module>
        from .preprocessors import build_preprocessors
      File "/Users/mike/tmp/resume.md/venv/lib/python3.9/site-packages/markdown/preprocessors.py", line 29, in <module>
        from .htmlparser import HTMLExtractor
      File "/Users/mike/tmp/resume.md/venv/lib/python3.9/site-packages/markdown/htmlparser.py", line 29, in <module>
        spec = importlib.util.find_spec('html.parser')
    AttributeError: module 'importlib' has no attribute 'util'
    >>>
    

    With python3.10 on macOS:

    $ python3.10 -m venv 3.10
    $ source 3.10/bin/activate
    $ pip install markdown
    Collecting markdown
      Using cached Markdown-3.4-py3-none-any.whl (93 kB)
    Installing collected packages: markdown
    Successfully installed markdown-3.4
    WARNING: You are using pip version 22.0.4; however, version 22.1.2 is available.
    You should consider upgrading via the '/Users/mike/tmp/resume.md/3.10/bin/python3.10 -m pip install --upgrade pip' command.
    $ python
    Python 3.10.3 (main, Mar 25 2022, 22:16:41) [Clang 12.0.5 (clang-1205.0.22.9)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import markdown
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/mike/tmp/resume.md/3.10/lib/python3.10/site-packages/markdown/__init__.py", line 22, in <module>
        from .core import Markdown, markdown, markdownFromFile
      File "/Users/mike/tmp/resume.md/3.10/lib/python3.10/site-packages/markdown/core.py", line 27, in <module>
        from .preprocessors import build_preprocessors
      File "/Users/mike/tmp/resume.md/3.10/lib/python3.10/site-packages/markdown/preprocessors.py", line 29, in <module>
        from .htmlparser import HTMLExtractor
      File "/Users/mike/tmp/resume.md/3.10/lib/python3.10/site-packages/markdown/htmlparser.py", line 29, in <module>
        spec = importlib.util.find_spec('html.parser')
    AttributeError: module 'importlib' has no attribute 'util'
    

    pip install "markdown<3.4" works, so this is perhaps a regression in the 3.4 release?
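
    One plausible explanation, offered as an assumption because the report does not show the import line in htmlparser.py: a bare `import importlib` does not guarantee that the `importlib.util` submodule has been loaded, so the attribute lookup only works if some other module imported it first. Importing the submodule explicitly removes that dependency on side effects. A minimal sketch:

```python
# Import the submodule explicitly instead of relying on another module
# having loaded it as a side effect of its own imports.
import importlib.util

# Same call as in the traceback above; with the explicit import it no
# longer depends on what else happened to be imported earlier.
spec = importlib.util.find_spec("html.parser")
print(spec.name)
```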

    opened by mikepqr 39
  • Abandon or Modify ElementTree?

    The short version:

    As part of version 3.0 (see #391) should Python-Markdown perhaps abandon ElementTree for a different document object like Docutils' node tree or use a modified ElementTree for internally representing the Parsed HTML document?

    Any and all feedback is welcome.

    The long version:

    Starting in Python-Markdown version 2.0, internally parsed documents have been represented as ElementTree objects. While this mostly works, there are a few irritations. ElementTree (hereinafter ET) was designed for XML, not HTML and therefore a few of its design choices are less than ideal when working with HTML.

    For example, by design, XML does not generally have text and child nodes interspersed like HTML does. While ET provides text and tail attributes on each element, it is not as easy to work with as it would be if the text was contained in child "TextNodes" (much like JavaScript's DOM). Additionally, ET nodes have no knowledge of their parent(s), which can be a problem in certain HTML specific situations (some elements cannot contain other elements as children or grandchildren or great-grandchildren...).
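
    The text/tail split and the missing parent links can be seen with the standard library's ElementTree alone. This is only an illustration of the behavior described above; the `parent_map` name is an ad hoc workaround, not part of any API:

```python
import xml.etree.ElementTree as ET

# Mixed content: text both before and after a child element.
p = ET.fromstring("<p>Some <em>inline</em> text.</p>")
em = p[0]

# Text before the first child lives on the parent's .text;
# text after a child lives on that child's .tail, not on the parent.
leading = p.text      # "Some "
inner = em.text       # "inline"
trailing = em.tail    # " text."

# Elements hold no reference to their parent, so a lookup table has to
# be built by walking the whole tree.
parent_map = {child: parent for parent in p.iter() for child in parent}
print(parent_map[em].tag)
```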

    I see two possible workarounds to this: Modify ET or use a different type of object.

    Modifying ElementTree

    We already have a modified serializer which gives us better HTML output (it is actually a modified HTML serializer from ET) and we already import ET and document that all extensions should import ET from Markdown. Therefore, if we were to change anything (via subclasses, etc) those changes would propagate throughout all extensions without too much change.

    In fact, some time ago, I played around with the idea of making ET nodes aware of their parents. While it worked, I quickly abandoned it as I realized that it would not work for cElementTree. However, on further consideration, we don't really need cElementTree (most of the benefits are in a faster XML parser which we don't use).

    Interestingly, in Python 3.3 cElementTree is deprecated. What actually happens is that ET defines the Python implementation and then at the bottom of the module, it tries to import the C implementation, which upon success, overrides the Python objects of the same name. What is interesting about this is that the Python implementation of the Element class (ET's node object) is preserved as _Element_Py for external code which needs access to it (as explained in the comments).

    I envision a modified ET lib to basically subclass the Python Element object to enforce knowledge of parents for all nodes. Then a TextNode would be created which works essentially like Comments work now:

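    from xml.etree.ElementTree import Element

    def TextElement(text=None):
        # Like ET's Comment factory: the function itself serves as the tag.
        element = Element(TextElement)
        element.text = text
        return element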
    

    The serializer would then be updated to properly output TextElements. In fact, at some point, the serializer might even be able to lose knowledge of the text and tail attributes on regular nodes. However, that last bit could wait for all extensions to adopt the new stuff.
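
    To make the idea concrete, here is a rough sketch of how a serializer could treat such text nodes. It is not the project's serializer: `to_html` is a hypothetical helper that ignores attributes, text, and tail on regular nodes and only handles child TextElements.

```python
import html
import xml.etree.ElementTree as ET


def TextElement(text=None):
    # Text node factory modeled on ET.Comment: the function is the tag.
    element = ET.Element(TextElement)
    element.text = text
    return element


def to_html(element):
    # Hypothetical serializer: text nodes are escaped and emitted as-is,
    # regular nodes are wrapped in their tag around serialized children.
    if element.tag is TextElement:
        return html.escape(element.text or "", quote=False)
    inner = "".join(to_html(child) for child in element)
    return f"<{element.tag}>{inner}</{element.tag}>"


# All text lives in child text nodes; .text and .tail are never used.
p = ET.Element("p")
p.append(TextElement("Some "))
em = ET.SubElement(p, "em")
em.append(TextElement("text"))
p.append(TextElement(" & more."))

result = to_html(p)
print(result)
```

    Because each run of text is its own child, a paragraph can hold several independent text nodes, which is the property the AtomicTextElement idea relies on.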

    In addition to TextElement we could also have RawTextElement and AtomicTextElement. Both would be ignored by the parser (no additional parsing would take place). However, a RawTextElement would be given special treatment by the serializer in that no escaping would take place (raw HTML could be stored inline in the document rather than in a separate store with placeholders in the document), whereas an AtomicTextElement would be serialized like a regular TextElement.

    The advantage of an AtomicTextElement (over the existing AtomicString) is that a single node could have multiple child text nodes. Today, each node only gets one text attribute. Therefore, when an AtomicString is concatenated with an existing text string, we lose the 'atomic' quality of the sub-string. However, with this change each sub-string can reside in its own separate text node and maintain the 'atomic' quality when necessary.

    Using Docutils

    Rather than creating our own one-off hacked version of ET, we could instead use an already existing library which gives us all of the same features (and more). Today, the only widely supported and stable library I'm aware of is Docutils' Document Tree. While the Document Tree is described as an XML representation of a document, Docutils provides a Python API to work with the Document Tree which is very similar to the modified ET API I described above (known parents, TextElement, FixedTextElement...). Unfortunately that API is not documented, although the source code is easy enough to follow.

    Until recently, I was of the assumption that to implement something that used Docutils, one would need to define a bunch of directives (etc) which more-or-less modify the ReST parser. However, take a look at the Overview of the Docutils Architecture. A parser simply needs to create a node tree. In fact, the base Parser class is only a few simple methods. The entire directives thing is in a separate directory under the ReST Parser only. Theoretically, one could subclass the base Parser class, and build a node tree using whatever parsing method desired and Docutils wouldn't care.

    For that matter, Python-Markdown would not have to replicate Docutils' "Parser" API. We could just use the node tree internally. As a plus, this would give us access to all of the built-in and third party Docutils writers (serializers). In other words, we would get all of Docutils' output formats for free.

    Additionally, Docutils' node tree also provides for various meta-data to be stored within the node tree. For example, each node can contain the line and column at which its contents were found in the original source document. This provides an easy way for a spellchecker to run against the parser and report misspelled words in the document without first converting it to HTML, among other uses which do not require serialized output.

    No, this would not make Python-Markdown suddenly able to be supported by Sphinx. Sphinx is mostly a collection of custom directives built on top of the ReST parser. ReST directives do not make sense in Markdown. However, we could convert Markdown to ReST as many other third party parsers convert various formats to ReST via a ReST writer. There is also at least one third party writer which outputs Markdown from a node tree. By adopting Docutils node tree, Python-Markdown could become part of an ecosystem for converting between all sorts of various document formats (an expandable competitor to Pandoc?).

    The downsides to using Docutils are that we are then relying on a third party library (up till now, Python-Markdown has not) and all extensions would absolutely be forced to change to support the new version. It is also possible that we wouldn't be able to use the available HTML writer as the default because of some inherent differences with Markdown and ReST (ReST is much more verbose and we might need to hack the node tree or the writer to get the writer to output correct HTML from a Markdown perspective -- I have not investigated this).

    As it stands now, there are various small changes required of extensions between version 2 and 3, but I expect that most extensions would be able to support both without much effort. If we went with Docutils, that would no longer be the case.

    Or, maybe this whole thing is a bad idea and we should just continue to use ET as-is.

    Any and all feedback is welcome.

    feature core needs-decision 
    opened by waylan 39
  • Bold/Italic bug

    I think I'm running up against another bold/italics bug. I did some quick searches and it looks like the other issues were considered resolved; sorry if I'm re-reporting something already fixed that hasn't made it upstream yet.

    Installed via pip; the current Python-Markdown version is 3.0.1.

    The raw markdown line that breaks is:

    This is text **bold *italic bold*** with more text
    

    The output I'm getting is as follows:

    <p>This is text <strong>bold *italic bold</strong>* with more text</p>
    

    However, the following format does seem to work correctly.

    This is text ***bold italic** italic* more text
    

    The output is

    <p>This is text <em><strong>bold italic</strong> italic</em> more text</p>
    
    bug confirmed 
    opened by Dave-ts 27
  • Add SmartyPants extension as part of Python-Markdown

    This is a feature request. It'd be nice if there was a built-in (batteries included) extension to implement SmartyPants quoting by turning on a simple extension.

    I notice that someone is already using SmartyPants with Markdown for Python, though not as an extension: http://byrneswoder.com/blog/one-secret-to-generating-clean-html-from-text/

    feature someday-maybe 
    opened by david-a-wheeler 27
  • Markdown in raw HTML stops working after first raw HTML tag

    Hello,

    I'm using Markdown and the "extra" extension to support Markdown in raw HTML div elements with the attribute markdown='1' as explained on the example page: https://pythonhosted.org/Markdown/extensions/extra.html

    However, as soon as a raw HTML tag without the "markdown" attribute occurs inside the element, Markdown is no longer processed after the end of that tag. If you put any Markdown code after the "Raw html blocks may also be nested." text of the example page, it will not be processed even though it is still in the "markdown='1'" div.

    Short example with 3 Markdown uses where the second one is not processed:

    <div markdown="1">
    
    Markdown is *active* here.
    
    <div name="RawHtml">
    Raw html blocks may also be nested.
    </div>
    
    Markdown is *not* active anymore here.
    
    </div>
    
    Markdown is *active again* here.
    
    opened by fpw 26
  • Fix InlineProcessor and add better italic and bold support

    Changes

    • Fixes issues with tails in InlineProcessor
    • Adds better italic and bold support
    • New tests for changes and improved code coverage

    Tests

    I cannot install Python 3.1 on my OSX Mavericks

      py27: commands succeeded
    ERROR:   py31: InterpreterNotFound: python3.1
      py32: commands succeeded
      py33: commands succeeded
      py34: commands succeeded
    
    Name                               Stmts   Miss  Cover   Missing
    ----------------------------------------------------------------
    markdown/__init__                    193     42    78%   141, 330-333, 391-434, 479-493
    markdown/__main__                     40      0   100%
    markdown/blockparser                  30      0   100%
    markdown/blockprocessors             273      7    97%   189, 194-195, 204, 253, 546, 552
    markdown/extensions/__init__          34      4    88%   28-29, 36-37
    markdown/extensions/abbr              38      0   100%
    markdown/extensions/admonition        45      0   100%
    markdown/extensions/attr_list         96      0   100%
    markdown/extensions/codehilite        99     19    81%   25-27, 43-44, 105-120, 177-178, 181
    markdown/extensions/def_list          59      2    97%   95-96
    markdown/extensions/extra             54      0   100%
    markdown/extensions/fenced_code       48      0   100%
    markdown/extensions/footnotes        176      8    95%   91-92, 105, 111, 118, 243, 288-289
    markdown/extensions/headerid          84      4    95%   72-73, 75, 103
    markdown/extensions/meta              35      0   100%
    markdown/extensions/nl2br             11      0   100%
    markdown/extensions/sane_lists        17      0   100%
    markdown/extensions/smart_strong      14      0   100%
    markdown/extensions/smarty            87      0   100%
    markdown/extensions/tables            56      0   100%
    markdown/extensions/toc              134     18    87%   50-52, 96-104, 118-120, 151, 182, 191
    markdown/extensions/wikilinks         49      0   100%
    markdown/inlinepatterns              247      1    99%   225
    markdown/odict                       113     37    67%   25-32, 35, 42, 54, 57, 60-66, 69-71, 104-105, 108-110, 119-122, 129, 136, 139-140, 160, 185-189
    markdown/postprocessors               49      0   100%
    markdown/preprocessors               207     17    92%   87, 92, 116, 135, 171, 199, 273, 292-304
    markdown/serializers                 153     48    69%   82-83, 106-117, 143, 147-150, 158, 160, 165, 169-174, 181, 203, 218, 224-235, 238, 254, 259, 262, 266, 269
    markdown/treeprocessors              187      4    98%   80, 203-205
    markdown/util                         59      0   100%
    ----------------------------------------------------------------
    TOTAL                               2687    211    92%
    

    The only modifications that had to be made to existing tests were for these two issues (which I view as improvements):

    --- /Users/facelessuser/Desktop/Python-Markdown/tests/misc/para-with-hr.html
    +++ actual_output.html
    @@ -2,5 +2,5 @@
     <hr />
     <p>Followed by another paragraph.</p>
     <p>Here is another paragraph, followed by:
    -*** not an HR.
    +<em>*</em> not an HR.
     Followed by more of the same paragraph.</p>
    
    --- /Users/facelessuser/Desktop/Python-Markdown/tests/misc/em_strong.html
    +++ actual_output.html
    @@ -4,7 +4,7 @@
     <p>With spaces: * *</p>
     <p>Two underscores __</p>
     <p>with spaces: _ _</p>
    -<p>three asterisks: ***</p>
    +<p>three asterisks: <em>*</em></p>
     <p>with spaces: * * *</p>
     <p>three underscores: ___</p>
     <p>with spaces: _ _ _</p>
    

    Let me know what you think.

    opened by facelessuser 25
  • Deadlock: never ending match() in treeprocessors.py!

    Hi, match = pattern.getCompiledRegExp().match(data[startIndex:]) never returns and hangs the Python process. This happens in v2.6.11 on Python 2 and 3, and I guess later versions are affected as well. It happens only with certain input data and with patternIndex = 2. Please see the attached Python file with the sample code, pattern #2, and data: reg.py.txt

    invalid 
    opened by vladsf 24
  • Replace homegrown OrderedDict with purpose-built Registry.

    All processors and patterns now get "registered" to a Registry. Each item is given a name (string) and a priority. The name is for later reference and the priority can be either an integer or float and is used to sort. A Registry instance is a list-like iterable with the items auto-sorted by priority. If two items have the same priority, then they are listed in the order they were "registered". Registering a new item with the same name as an already registered item replaces the old item with the new item (however, the new item is sorted by its priority). To remove an item, "deregister" it by name or index.
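
    The rules above can be sketched in a few lines. This is a standalone illustration, not the code in this pull request: `SimpleRegistry` is a hypothetical name, and sorting highest priority first is an assumption, since the description only says items are auto-sorted by priority.

```python
class SimpleRegistry:
    """Hypothetical sketch of a name + priority registry."""

    def __init__(self):
        self._entries = []  # list of (name, priority, item) tuples

    def register(self, item, name, priority):
        # Registering an existing name replaces the old entry.
        self._entries = [e for e in self._entries if e[0] != name]
        self._entries.append((name, priority, item))
        # Assumption: highest priority first. The sort is stable, so
        # equal priorities keep their registration order.
        self._entries.sort(key=lambda e: e[1], reverse=True)

    def deregister(self, name_or_index):
        if isinstance(name_or_index, int):
            del self._entries[name_or_index]
            return
        remaining = [e for e in self._entries if e[0] != name_or_index]
        if len(remaining) == len(self._entries):
            raise KeyError(name_or_index)
        self._entries = remaining

    def __iter__(self):
        return iter(item for _, _, item in self._entries)

    def __len__(self):
        return len(self._entries)


registry = SimpleRegistry()
registry.register("item-a", "a", 10)
registry.register("item-b", "b", 20)
registry.register("item-c", "c", 10)   # same priority as "a", stays after it
order_initial = list(registry)

registry.register("item-a2", "a", 30)  # replaces "a" and re-sorts it
order_replaced = list(registry)

registry.deregister("b")               # remove by name
registry.deregister(0)                 # remove by index
order_final = list(registry)
```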

    Fixes #418.

    Note that this is an adaptation of #510 which has been rebased onto master.

    opened by waylan 24
  • Replace OrderedDict with prioritized List?

    I'm looking for feedback on a possible Extension API change which might be introduced in version 3.0.

    Currently (version 2.x), all extensions register where they are run within the parser with our homegrown OrderedDict. Each piece of code is assigned a name in the dict and an extension inserts itself before or after a given name.

    patterns.add(SomePattern(), '<emphasis')  # insert before "emphasis" pattern
    del patterns['emphasis']                  # remove "emphasis" pattern
    

    What I am proposing is that instead of an OrderedDict, we use a list (as we did in version 1.x). However, each item in the list is assigned a "priority" attribute which is used to sort the list in order. For example, each inlinepattern class would have a priority set (10, 15, 20, 25, 30, ...). Higher numbers get run first. An extension could set a priority of 22 to get placed between items with priorities of 20 & 25. If a second extension needed to also be between 20 & 25 but before the extension with priority 22, it could use priority 23 or 24, and we don't have the possible conflicts that exist now.

    The tricky part would be in removing existing patterns. It is easy to do with the named keys. It might be a little more tricky without. And we can't hardcode index position as that can be changed by other extensions. The entire list would need to be searched for the given class instance. Do we set a "name" property on each class for this reason, rely only on the "priority" property, or something else? Perhaps the built-ins could all be assigned to constants. That way, the constant (a text-based name) could be used for reference purposes, but the value would be the integer (much like the logging module's level constants). Or maybe the constants could point to the class instances themselves.

    Therefore, where patterns is the list of patterns, one might alter the list like this:

    patterns.register(SomePattern(), priority=23)
    patterns.deregister(inlinepatterns.EMPHASIS)
    

    I'm using register/deregister as opposed to register/unregister for the reasons stated here (although it could change by popular demand). However, what I'm not sure about is the best way to define the priority:

    This is odd as the priority is a function of the registration process, not creation of the class instance:

    myinstance = MyPattern(priority=23)
    patterns.register(myinstance)
    

    This is easy to understand but then requires the parser to monkeypatch the class instance to attach the priority to the class:

    myinstance = MyPattern()
    patterns.register(myinstance, priority=23)
    

    In the first example, register is simply an alias to list.append. However, the second example would require something like this:

    class PriorityList(list):
        def register(self, item, priority):
            item.priority = priority  # the monkeypatch
            self.append(item)
    

    Of course, regardless of implementation details, we would only sort once after all extensions are loaded.
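
    As a rough sketch of where this leads, the class below adds the deregister and sort-once steps to the PriorityList above. The `deregister` and `finalize` methods, the `Pattern` stand-in, and the `build` helper are all hypothetical names used only for illustration. The point is that the result does not depend on the order in which extensions are loaded.

```python
class PriorityList(list):
    def register(self, item, priority):
        item.priority = priority  # the monkeypatch
        self.append(item)

    def deregister(self, item):
        # Search the list for the given instance and drop it.
        self.remove(item)

    def finalize(self):
        # Sort once, after all extensions are loaded; higher runs first.
        self.sort(key=lambda item: item.priority, reverse=True)


class Pattern:
    """Stand-in for an inline pattern class."""

    def __init__(self, name):
        self.name = name


def build(extension_order):
    patterns = PriorityList()
    emphasis = Pattern("emphasis")
    patterns.register(Pattern("backtick"), priority=30)
    patterns.register(Pattern("link"), priority=25)
    patterns.register(emphasis, priority=20)

    def extension_a():
        patterns.deregister(emphasis)  # removes "emphasis"

    def extension_b():
        # Wants to sit between priorities 20 and 25.
        patterns.register(Pattern("some_pattern"), priority=22)

    extensions = {"A": extension_a, "B": extension_b}
    for key in extension_order:
        extensions[key]()
    patterns.finalize()
    return [p.name for p in patterns]


result_ab = build(["A", "B"])
result_ba = build(["B", "A"])
```

    Extension B never refers to "emphasis" by name, so there is nothing to fail when extension A has already removed it.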

    WHY?

    1. Provides more flexibility to extension authors. Multiple third-party-extensions would be less likely to conflict with each other.

      For example, in the current situation, Extension A removes "emphasis" and extension B tries to insert before emphasis (<emphasis). If the user lists the extensions [B, A], everything works fine, but if she lists them as [A, B], then a KeyError will be raised when setting up extension B.

      Or two extensions might use the same name. For example multiple "math" extensions currently exist but each does something slightly different. A user could conceivably try to use two of them together, but one might replace the other.

    2. Gets rid of the awkward <emphasis syntax. Ugh.

    3. Removes the homegrown (and mostly untested) OrderedDict (the implementation that ships with Python only allows adding to the end so it is useless for this purpose). The fact that the new implementation is just a sorted list is an implementation detail that does not even need to be mentioned in the docs. Extension authors only need to know about and use the two methods register and deregister on a registry.

    Any and all feedback is welcome.

    opened by waylan 24
  • Proposal: Move `hr` extension (or part of it) after list handling

    While prototyping #1175, I ran into a particular issue. In an attempt to use a YAML-ish frontmatter header for general purpose block extensions, I noticed that hr is handled before lists.

    It seems the motivation was to ensure that - - - would be processed as an hr tag and not a list, but this also excludes --- from list handling.

    Before I lay out some potential proposals, I should state a few behavioral things about lists and hr.

    1. Even with hr disabled, lists do not allow --- list by default, as a list requires a space after the first -. This means that you can't create a list with a lone - or --- unless you add a trailing space after the first -.
    2. Even if other parsers allow a - - - list, this is difficult to use in the real world, and I suspect it is simply an oversight, as I can't see a real-world use for it. But if it is required just to be consistent with how the rules are laid out, I guess I can see why it behaves the way it does.

    Proposals:

    1. Split HR into 2 rules: HR1 which specifically catches - - - cases before lists, and HR2 which catches --- cases after list processing. This would keep behavior identical to what it is now.

    2. Put an exception case in UListProcessor.test that at least rejects the case if the block starts with ^[ ]{0,%d}([+*-]{3,})[ ]*(?:\n|$). Then just move HR completely.

      I'd like to not limit all cases where a single unordered list indicator can have only a single space after it as that is how we make SuperFences work:

      - 
          ```
          code
          ```
      

    Of course, a suitable priority would have to be determined to minimize any potential breakages, maybe attaching the new hr immediately after UListProcessor? Maybe 29 or 29.9?
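
    The exception case from proposal 2 can be tried out in isolation. In this sketch the %d placeholder is filled with tab_length - 1 and a tab length of 4 is used; both are assumptions, and `looks_like_hr` is a hypothetical helper, not an existing method.

```python
import re

TAB_LENGTH = 4  # assumed tab length

# Pattern from proposal 2, with %d assumed to be tab_length - 1.
HR_LIKE = re.compile(
    r"^[ ]{0,%d}([+*-]{3,})[ ]*(?:\n|$)" % (TAB_LENGTH - 1)
)


def looks_like_hr(block):
    # True when the block starts with a run of three or more list
    # markers alone on the line, which the list test would then reject.
    return bool(HR_LIKE.match(block))


checks = {
    "---": looks_like_hr("---"),
    "- - -": looks_like_hr("- - -"),
    "- item": looks_like_hr("- item"),
    "- \\n    code": looks_like_hr("- \n    code"),
}
```

    Only the unbroken run is caught, so a lone marker followed by a space, as in the SuperFences case above, is still left to the list processor.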

    I realize the other potential option is that I can just override the HR rule myself when I register said general purpose block. I also realize this issue could be moot if the aforementioned general purpose block doesn't use YAML-ish fences for its block frontmatter.

    Figured I would open up a separate discussion here to discuss this particular issue.

    feature core 
    opened by facelessuser 2
  • Getting issue while executing markdown in lambda

    Please find the error below. We get it with Markdown 3.4.1; everything works fine with Markdown version 3.3.7.

    [ERROR] Runtime.ImportModuleError: Unable to import module 'handler': cannot import name 'etree' from 'markdown.util' (/var/task/site-packages/markdown/util.py)
    Traceback (most recent call last):

    more-info-needed 
    opened by mhrjan 3
  • chore: Set permissions for GitHub actions

    Restrict the GitHub token permissions to only those that are required; this way, even if attackers succeed in compromising your workflow, they won't be able to do much.

    • Included permissions for the action. https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions

    https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#permissions

    https://docs.github.com/en/actions/using-jobs/assigning-permissions-to-jobs

    Keeping your GitHub Actions and workflows secure Part 1: Preventing pwn requests

    Signed-off-by: naveen [email protected]

    needs-review 
    opened by naveensrinivasan 2
  • Extension md_in_html does not recognize tags with hyphens

    Web components are custom HTML components that are required to have - in their names. This breaks current HTML handling since these elements are not considered. IMHO they should be treated the same as <div> ("block" elements, if I'm not mistaken).

    The following was tested in current main with the extension md_in_html active.

    input

    <a-b>
    
    asdf
    
    </a-b>
    

    output:

    <p><a-b></p>
    <p>asdf</p>
    <p></a-b></p>
    

    expected:

    <a-b>
    <p>asdf</p>
    </a-b>
    

    I went through the code and might know how to add this, but I would like the maintainers' input before proceeding.

    feature someday-maybe extension confirmed 
    opened by igordsm 4
  • autolink for non-HTTP URIs, and other non-tag content, produces invalid XML

    Consider the following:

    import markdown
    
    md = '''
    Here are some elements:
    
      * url <http://example.org>
      * repo url <ssh://example.org>, which is a non-HTTP URL
      * and <urn:foo> is something else
      * ssh url2 <ssh:me@example.org>, handled as an email address
      * misc element <em>boo!</em>
    '''
    
    converter = markdown.Markdown()
    print(converter.convert(md))
    

    This renders as

    <p>Here are some elements:</p>
    <ul>
    <li>url <a href="http://example.org">http://example.org</a></li>
    <li>repo url <ssh://example.org>, which is a non-HTTP URL</li>
    <li>and <urn:foo> is something else</li>
    <li>ssh url2 <a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#115;&#115;&#104;&#58;&#109;&#101;&#64;&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#111;&#114;&#103;">&#115;&#115;&#104;&#58;&#109;&#101;&#64;&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#111;&#114;&#103;</a>, handled as an email address</li>
    <li>misc element <em>boo!</em></li>
    </ul>
    

    I think items number 2 and 3 are incorrect, (a) because the behaviour doesn't match two significant Markdown specs, and (b) because they are both invalid XML (yes, <urn:foo> looks like an XML element with a namespace prefix; let's not go there...).

    The autolink feature in the Daring Fireball spec is ‘for URLs and email addresses’ (though the only URL in that example is an HTTP URL). The corresponding section in the CommonMark spec says that the autolink should happen for an absolute URI. So the second case should be turned into <a href='ssh://example.org'>ssh://example.org</a>.

    What appears to be happening, instead, is that this is being interpreted as literal HTML. The relevant section of Gruber's spec is rather vague, but the corresponding part of the CommonMark spec says that this should happen only to ‘[t]ext between < and > that looks like an HTML tag’, which of course <ssh://example.org> doesn't (CommonMark: ‘A tag name consists of an ASCII letter followed by zero or more ASCII letters, digits, or hyphens (-)’).

    Independently of any spec, however, having <ssh://example.org> appear in the output means that that output is syntactically invalid, and I feel this shouldn't happen for any input, however insane.
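The well-formedness claim can be checked with the standard library alone. The sketch below is only an illustration: `is_well_formed` is a hypothetical helper name, and the list items are copied from the rendered output shown earlier rather than produced by running the converter.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# List items copied from the rendered output in this report.
items = [
    '<li>url <a href="http://example.org">http://example.org</a></li>',
    '<li>repo url <ssh://example.org>, which is a non-HTTP URL</li>',
    '<li>and <urn:foo> is something else</li>',
    '<li>misc element <em>boo!</em></li>',
]

for item in items:
    # Print the verdict next to each item so the failing ones stand out.
    print(is_well_formed(item), item)
```

Only the two items containing `<ssh://example.org>` and `<urn:foo>` are rejected by the parser, which matches the complaint about items 2 and 3.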

    Suggestion:

    • When <starttag> consists of something other than [a-zA-Z][a-zA-Z0-9-]*, then it is either a URI, in which case it should be turned into an <a> element, or it is not, in which case it should be included literally in the output, as if the content were instead enclosed in backticks.

    This would imply that item 3 should render as <code>urn:foo</code>.
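A rough sketch of that rule, written as a standalone function rather than against Python-Markdown's internals. The name `render_angle_content` and both regular expressions are assumptions made for illustration. In particular, only `scheme://...` is treated as a URI here so that item 3 falls into the literal case as described above; the CommonMark definition of an absolute URI is broader. Email autolinks are left out of scope.

```python
import re
from html import escape

# A tag name as quoted from CommonMark, optionally preceded by "/" for a
# closing tag, and followed by whitespace, "/", or the end of the content.
TAG_LIKE = re.compile(r'/?[a-zA-Z][a-zA-Z0-9-]*(?=[\s/]|$)')

# Assumption: a URI is a scheme followed by "://" and non-space characters.
URI_LIKE = re.compile(r'[a-zA-Z][a-zA-Z0-9+.-]*://[^\s<>]+')


def render_angle_content(content):
    """Hypothetical helper: render the text found between '<' and '>'."""
    if TAG_LIKE.match(content):
        # Looks like an HTML tag: pass it through as raw HTML.
        return '<' + content + '>'
    if URI_LIKE.fullmatch(content):
        # Looks like a URI: turn it into a link.
        href = escape(content, quote=True)
        return '<a href="{0}">{0}</a>'.format(href)
    # Anything else is emitted literally, as if enclosed in backticks.
    return '<code>' + escape(content) + '</code>'


for sample in ('ssh://example.org', 'urn:foo', 'em'):
    print(render_angle_content(sample))
```

Whatever branch is taken, the output stays well-formed because non-tag content is always escaped before being emitted.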

    feature someday-maybe core confirmed 
    opened by nxg 3
  • kwargs are not checked for unexpected parameters

    • Repro case: return markdown.markdown(post, output='html5')
    • Expected behavior: Error or warning
    • Actual behavior: Typo is ignored

    For a while I had the code return markdown.markdown(post, output='html5'), which seemed to be working OK. However, it turns out that was a typo -- I should have been using output_format. Normally, the runtime would catch this, but instead **kwargs are collected and passed to the Markdown class, where keys are retrieved as needed.

    It's not a security issue in this library, as far as I can tell, but this pattern has led to security issues elsewhere. (Imagine if there were a safe_output kwarg that someone typo'd.)

    I think this could be as simple as having a known-keys set that the kwargs dict's keys are checked against before processing. I'd be happy to contribute a PR if this would be an acceptable approach.
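A minimal sketch of the known-keys idea, not a patch against the library. `check_kwargs` and `KNOWN_KEYS` are hypothetical names, and the key set is an illustrative subset rather than the full list of options the Markdown class accepts.

```python
# Illustrative subset of accepted option names (assumption, not exhaustive).
KNOWN_KEYS = frozenset({
    'extensions',
    'extension_configs',
    'output_format',
    'tab_length',
})


def check_kwargs(kwargs, known_keys=KNOWN_KEYS):
    """Raise TypeError if kwargs contains a key outside known_keys."""
    unexpected = sorted(set(kwargs) - known_keys)
    if unexpected:
        raise TypeError(
            'unexpected keyword argument(s): ' + ', '.join(unexpected)
        )


# The correctly spelled option passes silently.
check_kwargs({'output_format': 'html5'})

# The typo from the repro case is reported instead of being ignored.
try:
    check_kwargs({'output': 'html5'})
except TypeError as exc:
    print(exc)
```

Raising `TypeError` mirrors what Python itself does for an unknown keyword argument on a function without `**kwargs`; a warning would be a softer alternative if backward compatibility is a concern.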

    feature core needs-decision 
    opened by timmc 2
Owner
Python-Markdown
A Python implementation of John Gruber’s Markdown with extensions.
Python-Markdown
Provides syntax for Python-Markdown which allows for the inclusion of the contents of other Markdown documents.

Markdown-Include This is an extension to Python-Markdown which provides an "include" function, similar to that found in LaTeX (and also the C pre-proc

Chris MacMackin 80 Jul 27, 2022
Mdformat is an opinionated Markdown formatter that can be used to enforce a consistent style in Markdown files

Mdformat is an opinionated Markdown formatter that can be used to enforce a consistent style in Markdown files. Mdformat is a Unix-style command-line tool as well as a Python library.

Executable Books 134 Aug 11, 2022
A markdown lexer and parser which gives the programmer atomic control over markdown parsing to html.

stonepresto 3 May 8, 2022
Markdown parser, done right. 100% CommonMark support, extensions, syntax plugins & high speed. Now in Python!

markdown-it-py Markdown parser done right. Follows the CommonMark spec for baseline parsing Configurable syntax: you can add new rules and even replac

Executable Books 346 Aug 11, 2022
markdown2: A fast and complete implementation of Markdown in Python

Markdown is a light text markup format and a processor to convert that to HTML. The originator describes it as follows: Markdown is a text-to-HTML con

Trent Mick 2.4k Aug 5, 2022
A fast yet powerful Python Markdown parser with renderers and plugins.

Mistune v2 A fast yet powerful Python Markdown parser with renderers and plugins. NOTE: This is the re-designed v2 of mistune. Check v1 branch for ear

Hsiaoming Yang 2.1k Aug 2, 2022
Static site generator that supports Markdown and reST syntax. Powered by Python.

Pelican Pelican is a static site generator, written in Python. Write content in reStructuredText or Markdown using your editor of choice Includes a si

Pelican dev team 11.1k Aug 8, 2022
Extensions for Python Markdown

PyMdown Extensions Extensions for Python Markdown. Documentation Extension documentation is found here: https://facelessuser.github.io/pymdown-extensi

Isaac Muse 617 Aug 9, 2022
A fast, extensible and spec-compliant Markdown parser in pure Python.

mistletoe mistletoe is a Markdown parser in pure Python, designed to be fast, spec-compliant and fully customizable. Apart from being the fastest Comm

Mi Yu 503 Aug 13, 2022
Lightweight Markdown dialect for Python desktop apps

Litemark is a lightweight Markdown dialect originally created to be the markup language for the Codegame Platform project. When you run litemark from the command line interface without any arguments, the Litemark Viewer opens and displays the rendered demo.

null 10 Apr 23, 2022
A lightweight and fast-to-use Markdown document generator based on Python

快乐的老鼠宝宝 1 Jan 10, 2022
A markdown template manager for writing API docs in python.

DocsGen-py A markdown template manager for writing API docs in python. Contents Usage API Reference Usage You can install the latest commit of this re

Ethan Evans 1 May 10, 2022
Convert HTML to Markdown-formatted text.

html2text html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to

Alireza Savand 1.3k Aug 10, 2022
Comprehensive Markdown plugin built for Django

Django MarkdownX Django MarkdownX is a comprehensive Markdown plugin built for Django, the renowned high-level Python web framework, with flexibility,

neutronX 725 Aug 5, 2022
Awesome Django Markdown Editor, supported for Bootstrap & Semantic-UI

martor Martor is a Markdown Editor plugin for Django, supported for Bootstrap & Semantic-UI. Features Live Preview Integrated with Ace Editor Supporte

null 600 Aug 13, 2022
Livemark is a static page generator that extends Markdown with interactive charts, tables, and more.

Livemark This software is in the early stages and is not well-tested Livemark is a static site generator that extends Markdown with interactive chart

Frictionless Data 82 Aug 7, 2022
A super simple script which uses the GitHub API to convert your markdown files to GitHub styled HTML site.

Çalgan Aygün 210 Jul 27, 2022
Remarkable Markdown Debian Package Fix

Remarkable debian package fix For some reason the Debian package for remarkable markdown editor has not been made to install properly on Ubuntu 20.04

Eric Seifert 30 Jul 20, 2022
Read a list in markdown and do something with it!

Markdown List Reader A simple tool for reading lists in markdown. Usage Begin by running the mdr.py file and input either a markdown string with the -

Esteban Garcia 3 Sep 13, 2021