
Python module that makes working with XML feel like you are working with JSON




xmltodict is a Python module that makes working with XML feel like you are working with JSON, as in this "spec":


>>> print(json.dumps(xmltodict.parse("""
...  <mydocument has="an attribute">
...    <and>
...      <many>elements</many>
...      <many>more elements</many>
...    </and>
...    <plus a="complex">
...      element as well
...    </plus>
...  </mydocument>
...  """), indent=4))
{
    "mydocument": {
        "@has": "an attribute",
        "and": {
            "many": [
                "elements",
                "more elements"
            ]
        },
        "plus": {
            "@a": "complex",
            "#text": "element as well"
        }
    }
}

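To make the convention in the "spec" above concrete, here is a minimal standard-library sketch of the same mapping: attributes become keys prefixed with @, text that sits next to attributes goes under #text, and repeated child tags collapse into a list. It is an illustration only, not xmltodict's implementation (which is Expat-based), and the helper name element_to_dict is made up for this example.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert one element to the value stored under its tag name."""
    node = {"@" + name: value for name, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # A repeated tag turns the existing entry into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        # No attributes and no children: the element is just its text.
        return text or None
    if text:
        node["#text"] = text
    return node


doc = ET.fromstring(
    '<mydocument has="an attribute">'
    "<and><many>elements</many><many>more elements</many></and>"
    '<plus a="complex">element as well</plus>'
    "</mydocument>"
)
result = {doc.tag: element_to_dict(doc)}
print(json.dumps(result, indent=4))
```

The sketch shows why a single child yields a plain value while two or more children with the same tag yield a list.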
Namespace support

By default, xmltodict does no XML namespace processing (it just treats namespace declarations as regular node attributes), but passing process_namespaces=True will make it expand namespaces for you:

>>> xml = """
... <root xmlns="http://defaultns.com/"
...       xmlns:a="http://a.com/"
...       xmlns:b="http://b.com/">
...   <x>1</x>
...   <a:y>2</a:y>
...   <b:z>3</b:z>
... </root>
... """
>>> xmltodict.parse(xml, process_namespaces=True) == {
...     'http://defaultns.com/:root': {
...         'http://defaultns.com/:x': '1',
...         'http://a.com/:y': '2',
...         'http://b.com/:z': '3',
...     }
... }
True

It also lets you collapse certain namespaces to shorthand prefixes, or skip them altogether:

>>> namespaces = {
...     'http://defaultns.com/': None, # skip this namespace
...     'http://a.com/': 'ns_a', # collapse "http://a.com/" -> "ns_a"
... }
>>> xmltodict.parse(xml, process_namespaces=True, namespaces=namespaces) == {
...     'root': {
...         'x': '1',
...         'ns_a:y': '2',
...         'http://b.com/:z': '3',
...     },
... }
True
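The key names in the two comparisons above follow a simple rule, which can be sketched with the standard library alone. ElementTree reports namespaced tags as '{uri}local'; the hypothetical helper qualify() below rewrites them to the 'uri:local' form and applies an optional mapping in which None means "skip this namespace". This is a sketch of the naming rule under those assumptions, not xmltodict's own code.

```python
import xml.etree.ElementTree as ET


def qualify(tag, namespaces=None):
    """Turn ElementTree's '{uri}local' tag into a 'uri:local' style key."""
    if not tag.startswith("{"):
        return tag  # the element is in no namespace
    uri, local = tag[1:].split("}", 1)
    if namespaces is not None and uri in namespaces:
        prefix = namespaces[uri]
        # None means "skip this namespace"; otherwise use the short prefix.
        return local if prefix is None else f"{prefix}:{local}"
    return f"{uri}:{local}"


xml = """
<root xmlns="http://defaultns.com/"
      xmlns:a="http://a.com/"
      xmlns:b="http://b.com/">
  <x>1</x>
  <a:y>2</a:y>
  <b:z>3</b:z>
</root>
"""
root = ET.fromstring(xml)
namespaces = {"http://defaultns.com/": None, "http://a.com/": "ns_a"}

expanded = {qualify(child.tag): child.text for child in root}
collapsed = {qualify(child.tag, namespaces): child.text for child in root}
print(expanded)
print(collapsed)
```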

Streaming mode

xmltodict is very fast (Expat-based) and has a streaming mode with a small memory footprint, suitable for big XML dumps like Discogs or Wikipedia:

>>> def handle_artist(_, artist):
...     print(artist['name'])
...     return True
>>> xmltodict.parse(GzipFile('discogs_artists.xml.gz'),
...     item_depth=2, item_callback=handle_artist)
A Perfect Circle
King Crimson
Chris Potter

It can also be used from the command line to pipe objects to a script like this:

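import sys, marshal
while True:
    # Each record is a (path, item) pair; marshal needs a binary stream,
    # hence sys.stdin.buffer on Python 3. The loop ends with EOFError.
    _, article = marshal.load(sys.stdin.buffer)
    print(article['title'])

$ bunzip2 -c enwiki-pages-articles.xml.bz2 | xmltodict.py 2 | myscript.py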
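A consuming script usually wants to stop cleanly when the pipe closes. The sketch below shows one way to read a sequence of marshalled records until end of input. It uses an in-memory buffer in place of standard input so it can run on its own, and the record contents (the path tuples and the titles) are invented for illustration.

```python
import io
import marshal


def write_records(stream, records):
    """Append each (path, item) record to a binary stream with marshal."""
    for record in records:
        marshal.dump(record, stream)


def read_records(stream):
    """Yield records until the stream is exhausted."""
    while True:
        try:
            yield marshal.load(stream)
        except EOFError:
            return


# In a real pipeline the stream would be sys.stdin.buffer (Python 3).
buffer = io.BytesIO()
write_records(buffer, [
    ([("mediawiki", None), ("page", None)], {"title": "First page"}),
    ([("mediawiki", None), ("page", None)], {"title": "Second page"}),
])
buffer.seek(0)
titles = [article["title"] for _, article in read_records(buffer)]
print(titles)
```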

Or just cache the dicts so you don't have to parse that big XML file again. You do this only once:

$ bunzip2 -c enwiki-pages-articles.xml.bz2 | xmltodict.py 2 | gzip > enwiki.dicts.gz

And you reuse the dicts with every script that needs them:

$ gunzip -c enwiki.dicts.gz | script1.py
$ gunzip -c enwiki.dicts.gz | script2.py


Roundtripping

You can also convert in the other direction, using the unparse() function:

>>> mydict = {
...     'response': {
...             'status': 'good',
...             'last_updated': '2014-02-16T23:10:12Z',
...     }
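... }
>>> print(xmltodict.unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<response>
        <status>good</status>
        <last_updated>2014-02-16T23:10:12Z</last_updated>
</response>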

Text values for nodes can be specified with the cdata_key key in the Python dict, while node attributes can be specified by prefixing the key name in the Python dict with attr_prefix. The default value for attr_prefix is @ and the default value for cdata_key is #text.

>>> import xmltodict
>>> mydict = {
...     'text': {
...         '@color':'red',
...         '@stroke':'2',
...         '#text':'This is a test'
...     }
... }
>>> print(xmltodict.unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<text color="red" stroke="2">This is a test</text>

Lists that are specified under a key in a dictionary use the key as a tag for each item. A list that has no key of its own, for example a list nested inside another list, has no tag to use, so its items are converted to a string, as the first call below shows. To give tags to nested lists, pass a tag name through the expand_iter keyword argument, as in the second call. Note that using expand_iter will break roundtripping.

>>> mydict = {
...     "line": {
...         "points": [
...             [1, 5],
...             [2, 6],
...         ]
...     }
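... }
>>> print(xmltodict.unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<line>
        <points>[1, 5]</points>
        <points>[2, 6]</points>
</line>
>>> print(xmltodict.unparse(mydict, pretty=True, expand_iter="coord"))
<?xml version="1.0" encoding="utf-8"?>
<line>
        <points>
                <coord>1</coord>
                <coord>5</coord>
        </points>
        <points>
                <coord>2</coord>
                <coord>6</coord>
        </points>
</line>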

Ok, how do I get it?

Using pypi

You just need to

$ pip install xmltodict

RPM-based distro (Fedora, RHEL, …)

There is an official Fedora package for xmltodict.

$ sudo yum install python-xmltodict

Arch Linux

There is an official Arch Linux package for xmltodict.

$ sudo pacman -S python-xmltodict

Debian-based distro (Debian, Ubuntu, …)

There is an official Debian package for xmltodict.

$ sudo apt install python-xmltodict


FreeBSD

There is an official FreeBSD port for xmltodict.

$ pkg install py36-xmltodict

openSUSE/SLE (SLE 15, Leap 15, Tumbleweed)

There is an official openSUSE package for xmltodict.

# Python2
$ zypper in python2-xmltodict

# Python3
$ zypper in python3-xmltodict
  • xml containing 1 child

    Consider the following code

    xml = """<?xml version="1.0" encoding="utf-8" ?>

    Wouldn't you expect to have an iterable object even when there is only 1 child?

    opened by ocZio 31
  • Several Enhancements

    I implemented several enhancements:

    • The ability to create dictionaries of elements based on known index-key elements.
    • The ability to force lists for certain tags.
    • The ability to strip namespaces from results.
    • The ability to receive data in new data structures that separate the XML attributes from the data.
    • The ability to parse ElementTree data into a dictionary.
    • The ability to use new parsing classes that keep track of default options.
    • The ability to use an iterator in streaming mode.
    • Whitespace stripping now applies to data in streaming mode.

    While implementing the enhancements, I took care not to disturb the default behavior, so none of the changes should affect existing users.

    Also, all the changes have unit tests that cover them. (The code coverage is > 90%. Most of the misses are in lines of code that are meant to handle variations on the Element/ElementTree objects, depending on which library created them.)

    Index Keys

    Imagine you have this input:


    In this case, it might be helpful to have the 'servers' dictionary keyed off of the server name. You can now do this using the index_keys option. With this option, the named tags will be "promoted" to be the key for their subtree.

    So, for example, the index_keys=('name',) option will produce this data structure:

    {u'servers': {u'server1': {u'name': u'server1',
                               u'os': u'Linux'},
                  u'server2': {u'name': u'server2',
                               u'os': u'Windows'}}}
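The promotion step described above can be sketched on plain dictionaries. The helper index_by() is hypothetical (it is not part of the pull request), and the input assumes the default parser returned a list of server dicts under the 'server' key.

```python
def index_by(items, key):
    """Key a list of dicts by the value each one holds under `key`."""
    indexed = {}
    for item in items:
        indexed[item[key]] = item
    return indexed


# Assumed shape of the default parse result for two <server> elements.
parsed = {
    "servers": {
        "server": [
            {"name": "server1", "os": "Linux"},
            {"name": "server2", "os": "Windows"},
        ]
    }
}
servers = {"servers": index_by(parsed["servers"]["server"], "name")}
print(servers)
```

Note that the 'server' level disappears in this form, which is the "compressed" layout shown above.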

    But, what if you need the "server" tag because it is intermixed with other tags? In that case, you can turn off the "index_keys_compress" option.

    For example:

    >>> xmltodict.parse("""
    ... <devices>
    ...     <server>
    ...         <name>server1</name>
    ...         <os>Linux</os>
    ...     </server>
    ...     <server>
    ...         <name>server2</name>
    ...         <os>Windows</os>
    ...     </server>
    ...     <workstation>
    ...         <name>host1</name>
    ...         <os>Linux</os>
    ...     </workstation>
    ...     <workstation>
    ...         <name>host2</name>
    ...         <os>Windows</os>
    ...     </workstation>
    ... </devices>
    ... """, new_style=True, index_keys=('name',), index_keys_compress=False).prettyprint(width=2)
    {u'devices': {u'server': {u'server1': {u'name': u'server1',
                                           u'os': u'Linux'},
                              u'server2': {u'name': u'server2',
                                           u'os': u'Windows'}},
                  u'workstation': {u'host1': {u'name': u'host1',
                                              u'os': u'Linux'},
                                   u'host2': {u'name': u'host2',
                                              u'os': u'Windows'}}}}

    Force Lists

    Sometimes, you have a node that may have one or more items. Rather than testing for both a list and single item, you can simplify your code by having the xmltodict parser always create a list for you.

    For example, compare these outputs:

    >>> xmltodict.parse("""
    ... <servers>
    ...     <server>
    ...         <name>server1</name>
    ...         <os>Linux</os>
    ...     </server>
    ... </servers>
    ... """, new_style=True).prettyprint(width=2) 
    {u'servers': {u'server': {u'name': u'server1',
                              u'os': u'Linux'}}}
    >>> xmltodict.parse("""
    ... <servers>
    ...     <server>
    ...         <name>server1</name>
    ...         <os>Linux</os>
    ...     </server>
    ...     <server>
    ...         <name>server2</name>
    ...         <os>Windows</os>
    ...     </server>
    ... </servers>
    ... """, new_style=True).prettyprint(width=2) 
    {u'servers': {u'server': [{u'name': u'server1',
                               u'os': u'Linux'},
                              {u'name': u'server2',
                               u'os': u'Windows'}]}}

    In the first case rv['servers']['server'] points to a single item. In the second case, rv['servers']['server'] points to a list.

    You can force this to always be a list by setting the "force_list" parameter:

    >>> xmltodict.parse("""
    ... <servers>
    ...     <server>
    ...         <name>server1</name>
    ...         <os>Linux</os>
    ...     </server>
    ... </servers>
    ... """, new_style=True, force_list=('server',)).prettyprint(width=2)
    {u'servers': {u'server': [{u'name': u'server1',
                               u'os': u'Linux'}]}}
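The same normalization can be sketched as a post-processing pass over an already parsed dictionary. The function force_lists() below is a hypothetical helper written for this illustration, not the code from the pull request; it wraps the value of every listed tag in a list unless it already is one.

```python
def force_lists(node, tags):
    """Return a copy of `node` where every key in `tags` maps to a list."""
    if isinstance(node, list):
        return [force_lists(item, tags) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        value = force_lists(value, tags)
        if key in tags and not isinstance(value, list):
            value = [value]
        result[key] = value
    return result


single = {"servers": {"server": {"name": "server1", "os": "Linux"}}}
forced = force_lists(single, {"server"})
print(forced)
```

Code that consumes the result can then always loop over forced['servers']['server'] without a type check.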

    Strip Namespaces

    Let me start by granting the truth that XML namespaces are an essential part of node and attribute names. Now, having said that, there are times when the namespaces are already well-known and are merely extra information that a user can (and will try to) safely ignore. In these cases, you can set the "strip_namespace" option to strip namespaces.

    For example:

    >>> xmltodict.parse("""
    ... <servers xmlns="http://a.com/" xmlns:b="http://b.com/">
    ...     <b:server>
    ...         <name>test</name>
    ...     </b:server>
    ... </servers>
    ... """, new_style=True, strip_namespace=True).prettyprint(width=2)
    {u'servers': {u'server': {u'name': u'test'}}}

    New Classes

    One of the difficulties in dealing with XML data in Python is representing the richness of the XML data (including, especially, the dual layers of attributes and data) while creating the simplest data structure possible. I'm sure many people have tried. I tried again.

    The fundamental premise here is this: if a user cares about an XML attribute, he/she knows to go looking for it. So, it is most important to present the main data in a simple format, and it is sufficient to provide one or more methods for users to find XML attributes.

    The three data structures are:

    • XMLCDATANode: This "quacks" like a string/unicode. (For example, XMLCDATANode("a") == "a" will evaluate to true.)
    • XMLDictNode: This "quacks" like a dict or OrderedDict.
    • XMLListNode: This "quacks" like a list.

    These data structures have some extra methods to deal with attributes:

    • has_xml_attrs(): Returns True if there are XML attributes; False otherwise.
    • get_xml_attr(name[, default]): Returns the value of the XML attribute if it exists. Otherwise, it will return the default, if given, or raise a KeyError.
    • set_xml_attr(name, value): Sets the value of the XML attribute.
    • delete_xml_attr(name): Deletes the XML attribute. Raises a KeyError if the XML attribute does not exist.
    • get_xml_attrs(): Returns the dictionary of XML attributes.
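The "quacks like a string" idea can be sketched with a small str subclass that carries its attributes on the side. The class name AttrString is invented for this illustration and only mirrors a few of the methods listed above; it is not the XMLCDATANode class from the pull request.

```python
class AttrString(str):
    """A string that also carries XML attributes (hypothetical stand-in)."""

    def __new__(cls, value, xml_attrs=None):
        obj = super().__new__(cls, value)
        obj.xml_attrs = dict(xml_attrs or {})
        return obj

    def has_xml_attrs(self):
        return bool(self.xml_attrs)

    def get_xml_attr(self, name, *default):
        if name in self.xml_attrs:
            return self.xml_attrs[name]
        if default:
            return default[0]
        raise KeyError(name)

    def set_xml_attr(self, name, value):
        self.xml_attrs[name] = value


name = AttrString("server1", {"coolness": "high"})
print(name == "server1")
print(name.get_xml_attr("coolness"))
print(name.get_xml_attr("darkness", "unknown"))
```

Because the object is a real str, comparisons, slicing and formatting keep working, and the attributes only show up when a caller asks for them.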

    These data structures also implement a prettyprint() method which takes the same options as pprint() (except, of course, for the object to be printed). The prettyprint() method prints out the data only and does not show the XML attributes. This decision was made for readability purposes. The repr() method shows both.

    I've already shown some examples of the new classes above. Here's another example:

    >>> rv = xmltodict.parse("""
    ... <servers>
    ...     <server coolness="high">
    ...         <name>server1</name>
    ...     </server>
    ... </servers>
    ... """, new_style=True)
    >>> repr(rv)
    "XMLDictNode(xml_attrs=OrderedDict(), value=OrderedDict([(u'servers', XMLDictNode(xml_attrs=OrderedDict(), value=OrderedDict([(u'server', XMLDictNode(xml_attrs=OrderedDict([(u'coolness', u'high')]), value=OrderedDict([(u'name', XMLCDATANode(xml_attrs=OrderedDict(), value=u'server1'))])))])))]))"
    >>> rv.prettyprint(width=2)
    {u'servers': {u'server': {u'name': u'server1'}}}
    >>> rv.has_xml_attrs()
    False
    >>> rv['servers'].has_xml_attrs()
    False
    >>> rv['servers']['server'].has_xml_attrs()
    True
    >>> rv['servers']['server'].get_xml_attrs()
    OrderedDict([(u'coolness', u'high')])
    >>> rv['servers']['server'].get_xml_attr('coolness')
    u'high'
    >>> rv['servers']['server'].get_xml_attr('darkness', '@@[email protected]@')
    '@@[email protected]@'
    >>> rv['servers']['server']['name']
    XMLCDATANode(xml_attrs=OrderedDict(), value=u'server1')
    >>> rv['servers']['server']['name'] == 'server1'
    True
    >>> rv['servers']['server'].set_xml_attr('coolness', 'low')
    >>> rv['servers'].set_xml_attr('length', '1')
    >>> rv['servers'].set_xml_attr('delete_me', True)
    >>> rv['servers'].delete_xml_attr('delete_me')
    >>> xmltodict.unparse(rv)
    u'<?xml version="1.0" encoding="utf-8"?>\n<servers length="1"><server coolness="low"><name>server1</name></server></servers>'

    Parsing ElementTree Data

    Sometimes, a user may use a library that returns an Element or ElementTree. In those cases, it would be useful to be able to convert it into an easy to use dictionary without having to first convert it to text. In those cases, the user can use the parse_lxml() method. (This was originally intended for lxml; hence, the name. However, it should work with ElementTree, as well. Indeed, I did much of my testing with cElementTree.)

    The parse_lxml() method should take the same options as the parse() method.

    For example:

    >>> from lxml import etree
    >>> xml = etree.XML("<a><b>data</b></a>")
    >>> xmltodict.parse_lxml(xml, new_style=True).prettyprint()
    {'a': {'b': u'data'}}

    Parsing Classes

    Two new classes hold parsing defaults. They can be overridden on each invocation.

    • The Parser() class is used for parsing XML text.
    • The LXMLParser() class is used for parsing ElementTree objects.


    >>> parser = xmltodict.Parser(new_style=True, index_keys=('name',))
    >>> parser("<a><b><name>item1</name></b></a>").prettyprint()
    {u'a': {u'item1': {u'name': u'item1'}}}
    >>> parser("<a><b><name>item1</name></b></a>", index_keys=()).prettyprint()
    {u'a': {u'b': {u'name': u'item1'}}}
    >>> parser("<a><b><name>item1</name></b></a>", index_keys_compress=False).prettyprint()
    {u'a': {u'b': {u'item1': {u'name': u'item1'}}}}


    In streaming mode, you can now use an iterator/generator to loop through the list of matching items. This will be done with incremental parsing of the input file (however, see note below about Jython). The input file is processed 1KB at a time and then each matching node is returned on a subsequent iteration. Once all the matching nodes from the first 1KB are returned, the next 1KB is read (and so on).

    (Note: I'm not sure why, but the Travis CI test shows that Jython is failing the unit test that checks to make sure that the parsing really is done incrementally. I need to do more examination to determine whether this is a true failure, or a false failure due to a flaw in the test.)

    If the generator argument evaluates to True and item_depth is non-zero, the parser will return an iterator. On each iteration, the code will return the next (path, item) tuple at the item_depth level. These are the same items (in the same format) that would be passed to the callback function; however, they are returned at each iteration.

    Two corner cases: If generator is True and item_depth is zero, the code will return a single-item list with an empty path and the full document. If generator is True and item_callback is also set, the item_callback will be executed for each iteration prior to the iterator's return.
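The shape of such an iterator can be sketched with the standard library's incremental parser. The generator iter_items() below is a hypothetical stand-in for the feature described above: it tracks the path of open elements and yields a (path, text) pair each time an element closes at the requested depth. It assumes the items at that depth are text-only elements and does not reproduce the 1KB chunking or the callback handling.

```python
import io
import xml.etree.ElementTree as ET


def iter_items(source, item_depth):
    """Yield (path, text) for each element closed at `item_depth`."""
    path = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append((elem.tag, dict(elem.attrib) or None))
            continue
        if len(path) == item_depth:
            yield list(path), (elem.text or "").strip()
            elem.clear()  # drop the finished element's contents to save memory
        path.pop()


xml = b'<a prop="x"><b>1</b><b>2</b></a>'
items = list(iter_items(io.BytesIO(xml), item_depth=2))
for path, item in items:
    print(path, item)
```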


    >>> xml = """\
    ... <a prop="x">
    ...   <b>1</b>
    ...   <b>2</b>
    ... </a>"""
    >>> for (path, item) in xmltodict.parse(xml, generator=True, item_depth=2):
    ...     print 'path:%s item:%s' % (path, item)
    path:[(u'a', {u'prop': u'x'}), (u'b', None)] item:1
    path:[(u'a', {u'prop': u'x'}), (u'b', None)] item:2

    Whitespace Stripping Enhancement

    Whitespace stripping now applies in streaming mode. Previously, if the item at the item_depth was a CDATA node, whitespace stripping was not applied prior to the item being sent to the callback function. Now, whitespace stripping takes effect prior to the call to the callback function (or return of a value from the iterator).

    (Whitespace stripping is still controlled by the strip_whitespace argument.)

    Example of the previous behavior:

    >>> print xml2
    >>> for (stack, value) in xmltodict.parse(xml2, generator=True, item_depth=3, item_callback=cb):
    ...     print "\tValue: %r" % value
        Value: u'\n  \n    data1'
        Value: u'\n    data2'
        Value: u'\n    data3'
        Value: u'\n  \n    data4'
        Value: u'\n    data5'
        Value: u'\n    data6'

    Example of the new behavior:

    >>> for (stack, value) in xmltodict.parse(xml2, generator=True, item_depth=3):
    ...     print "\tValue: %r" % value
        Value: u'data1'
        Value: u'data2'
        Value: u'data3'
        Value: u'data4'
        Value: u'data5'
        Value: u'data6'
    opened by jonlooney 13
  • Optional attributes and unknown children count

    What's wrong?

    1. Optional attributes

    If I have a tag which can look like this:

    <sometag>blablabla</sometag>

    or like this:

    <sometag attr="123">blablabla</sometag>

    In the first case the parse result will look like this:

    {
        'sometag': 'blablabla'
    }

    and in the second case it will look like:

    {
        'sometag': {
            '@attr': '123',
            '#text': 'blablabla'
        }
    }

    So if I want to get its text content I can't just do something like:

    something = parse_result['sometag']['#text']

    I have to write such ugly things:

    something = parse_result['sometag']
    something = something['#text'] if type(something) is OrderedDict else something
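One way to keep that check in a single place is a small accessor. The helper text_of() below is hypothetical and works on the two result shapes shown above. xmltodict's parse() also accepts a force_cdata option that is meant to make text always appear under the cdata key, which may remove the need for a helper like this.

```python
def text_of(node, cdata_key="#text"):
    """Return an element's text whether or not it carried attributes."""
    if isinstance(node, dict):
        return node.get(cdata_key)
    return node


without_attr = {"sometag": "blablabla"}
with_attr = {"sometag": {"@attr": "123", "#text": "blablabla"}}
print(text_of(without_attr["sometag"]))
print(text_of(with_attr["sometag"]))
```

Using isinstance() with dict also covers OrderedDict, since OrderedDict is a dict subclass.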

    2. Unknown children count

    If I have a tag which can look like this:

    <parent>
        <child>some text</child>
        <child>other text</child>
    </parent>

    And I don't know the exact number of children (there could be only one child!), so I can't iterate over the children like this:

    for child in parse_result['parent']['child']:
        # some code

    because a list is used only if there is more than one child. So in this case I also have to perform some ugly type checking, like this:

    children = parse_result['parent']['child']
    children = children if type(children) is list else [children]
    for child in children:
        # some code


    I suggest adding a special mode (triggered by an optional argument passed to the parse function). In this mode it would always use a dictionary to describe a tag and always use a list to describe child tags.

    opened by vovanz 10
  • Only OrderedDicts are returned

    This may just be a documentation issue, but when I run: (Python 2.7 OS X)

    foo = xmltodict.parse("""<?xml version="1.0" ?>
        print foo

    I get:

    Output: OrderedDict([(u'person', OrderedDict([(u'name', u'john'), (u'age', u'20')]))])

    In a nested XML document, this makes it hard for me to turn the result into JSON.

    opened by pjakobsen 9
  • json to xml with "self-closing tags"

    Hi experts, is there a way to convert from JSON to XML with self-closing tags? For example, my JSON is defined as below:

    arr = [{"@name": "transactionId", "@value" : "1234", "@type": "u32"}, {"@name": "numTransactions", "@value" : "1", "@type":"u32"}]

    With xmltodict.unparse(), the generated XML has these lines.

    But I need self-closing tags, like this

    Looking forward to the experts' suggestions.


    opened by Kamakshilk 8
  • unparse handles lists incorrectly?

    My Python object looks like so: {'Response': {'@ErrorCode': '00', 'Versions': [{'Version': {'@Updated': u'2013-10-23T18:29:11', 'Basic': {'@MD5': u'a7674c694607b169e57593a4952ea26f'}}}, {'Version': {'@Updated': u'2013-10-23T18:55:53', 'Basic': {'@MD5': u'b50001ee638f7df058d2c5f9157c6e8a'}}}]}}

    The resulting XML from 'unparse' puts an endtag for Versions after the first Version, then starts it again before the second list item.

    Seems that "Versions" shouldn't be ended after the first "Version" object?

    opened by jillh510 8
  • Moved the data from string to list

    Moving the data from a string to a list gives a significant speed and memory improvement, noticeable mostly with large XML files. Appending to a list is more efficient than reconstructing a string.

    I saw speedups of up to 1000x on XML files weighing 150 MB or more.
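The difference between the two accumulation strategies can be sketched as follows. Both functions are illustrative stand-ins rather than the code from this pull request: the first appends chunks to a list and joins once at the end, the second rebuilds the string on every step, which may copy all the data collected so far each time.

```python
def collect_with_list(chunks):
    """Accumulate character data in a list and join once at the end."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return "".join(parts)


def collect_with_string(chunks):
    """Accumulate by repeated concatenation, which may copy on every step."""
    data = ""
    for chunk in chunks:
        data = data + chunk
    return data


chunks = ["some ", "character ", "data"] * 1000
joined = collect_with_list(chunks)
print(len(joined))
```

Both functions return the same text; only the amount of copying differs, and the gap grows with the size of the input.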

    opened by bharel 7
  • Map attributes to unicode

    Fixes #86

    opened by piotrkilczuk 7
  • Latest version is not x.x.x :-)

    Hi, sorry for this issue, but I'm looking to package xmltodict for Debian and I have a problem with the latest tag. Could you change v0.9 to v0.9.0? Thanks!

    opened by sbadia 7
  • Child Order Not Maintained with Different Tags

    If there is a generic doc that has 4 elements, but one in the middle has a different tag name, that order is not persisted in the round trip

    >>> xml = """<doc><el>1</el><el>2</el><el1>3</el1><el>4</el></doc>"""
    >>> d = xmltodict.parse(xml)
    >>> round_trip = xmltodict.unparse(d)
    >>> print(round_trip)
    <?xml version="1.0" encoding="utf-8"?>
    <doc><el>1</el><el>2</el><el>4</el><el1>3</el1></doc>

    As you can see the el1 got moved to the end.
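The reordering follows from the data model rather than from the serializer. A dictionary keyed by tag name has to group all children with the same tag under one key, so the position of el1 between the el elements is not recorded anywhere. The sketch below reproduces that grouping with the standard library; group_children() is a hypothetical helper, not xmltodict code.

```python
import xml.etree.ElementTree as ET


def group_children(xml_text):
    """Group child text by tag name, the way a dict keyed by tag must."""
    grouped = {}
    for child in ET.fromstring(xml_text):
        grouped.setdefault(child.tag, []).append(child.text)
    return grouped


xml = "<doc><el>1</el><el>2</el><el1>3</el1><el>4</el></doc>"
grouped = group_children(xml)
print(grouped)
```

Once the children are grouped like this, any writer that walks the dictionary key by key will emit all el items before el1.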

    opened by mcrowson 7
  • Found a possible security concern

    Hey there!

    I belong to an open source security research community, and a member (@yetingli) has found an issue, but doesn’t know the best way to disclose it.

    If not a hassle, might you kindly add a SECURITY.md file with an email, or another contact method? GitHub recommends this best practice to ensure security issues are responsibly disclosed, and it would serve as a simple instruction for security researchers in the future.

    Thank you for your consideration, and I look forward to hearing from you!

    (cc @huntr-helper)

    opened by zidingz 0
  • `xml_attribs` argument ignored when streaming.

    My expected behavior would be for either (1) the xml_attribs argument to raise an error or warning if used with the item_callback argument, or (2) the attributes to be included in the returned dictionary inside the callback, without needing to reconstruct it myself from the path argument.

    Instead it silently leaves them out.

    opened by elibixby 0
  • Escape `&` character while unparsing


    I am trying to unparse the below xml:

    <docLink> <docLinkId>11763263</docLinkId> <webUrl>https://some.domain.com?a=10&b=20</webUrl> </docLink>

    getting this error: xml.parsers.expat.ExpatError: not well-formed (invalid token)

    If I remove the & it works fine.

    Is there any solution for the above issue or any workaround?
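A bare & is not allowed in XML character data, so any Expat-based parser will reject the document as written. One possible workaround, assuming you control how the XML is produced, is to escape the value before it is embedded. The sketch below uses only the standard library and parses with ElementTree to show that the escaped document is well formed and that the original URL comes back unchanged.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

url = "https://some.domain.com?a=10&b=20"

# escape() replaces &, < and > with their XML entities.
document = (
    "<docLink><docLinkId>11763263</docLinkId>"
    f"<webUrl>{escape(url)}</webUrl></docLink>"
)
root = ET.fromstring(document)
parsed_url = root.findtext("webUrl")
print(document)
print(parsed_url)
```

If the XML comes from a third party, the fix belongs on the producing side, since the unescaped form is not valid XML.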

    opened by aashutoshbane 1
  • What does full document do? Assistance removing <?xml version= from unparse

    Using xmltodict.unparse with default parameters inserts a meta-string at the start.

    Is there any way to remove this? I noticed setting full_document=False would remove it, but I'm not sure what it does.

    opened by DOH-Manada 0
  • How to make streaming work with 7z compressed files?

    Is it possible? I would tend to think it depends on another 7z-related library that allows line-by-line reading. Still, I wanted to check here whether anyone has tried this or made it work with 7z files.

    opened by karrtikiyerkcm 0
  • update documentation with conda installation option

    The README file has all other installation options but conda (Anaconda/Miniconda). Please update it with the conda installation option.

    conda install -c conda-forge xmltodict
    opened by sugatoray 0
  • added conda installation command

    The README file did not mention anything about conda installation of xmltodict. This pull-request adds that information to the README file.

    • [x] issue #274
    opened by sugatoray 2
  • Support empty string and make the difference from null

    We currently represent empty tags as null/None and it seems there's no way to represent empty string, refs https://github.com/martinblech/xmltodict/blob/v0.12.0/tests/test_xmltodict.py#L80-L98 I feel like supporting this feature would be a plus. Interesting related reading https://stackoverflow.com/questions/774192/what-is-the-correct-way-to-represent-null-xml-elements

    opened by AndreMiras 0
  • List type object with single parameter is being converted into wrong format

    Format is wrong after conversion of List object with single value.


    { "name": "ParameterName", "p": "10638" }


    { "name": "ParameterName", "p": ["10638"] }

    Please look into this issue. Attaching a sample XML snippet for reference.

    Thank you in advance.

    opened by mdakilansari 0
  • Add more namespace attribute tests

    Added tests with namespace attributes like this:

    <root xmlns="http://defaultns.com/" 
       <x a:attr="val">1</x>


    <root xmlns="http://www.url" version="1.00">
    	<proc xmlns="http://www.url">
    opened by leogregianin 0
Martín Blech