A library for converting HTML into PDFs using ReportLab

XHTML2PDF

The current release is xhtml2pdf 0.2.5; see the Release Notes for details. As with all open-source software, its suitability for production depends on many factors, so be aware that you may run into issues in some cases.

Big thanks to everyone who has worked on this project so far and to those who help maintain it.

About

xhtml2pdf is an HTML to PDF converter using Python, the ReportLab Toolkit, html5lib and PyPDF2. It supports HTML5 and CSS 2.1 (and some of CSS 3). It is written entirely in pure Python, so it is platform independent.

The main benefit of this tool is that a user with web skills like HTML and CSS is able to generate PDF templates very quickly without learning new technologies.

Documentation

The documentation of xhtml2pdf is available at Read the Docs.

And we could use your help improving it! A good place to start is doc/source/usage.rst.

Installation

This is a typical Python library and can be installed using pip:

pip install xhtml2pdf

Requirements

Python 2.7+. Only Python 3.4+ is tested and guaranteed to work.

All additional requirements are listed in the requirements.txt file and are installed automatically using the pip install xhtml2pdf method.

Alternatives

You can try WeasyPrint. Its codebase is clean, it offers a different feature set, and it covers much of what xhtml2pdf does.

Call for testing

This project is heavily dependent on getting its test coverage up! Furthermore, parts of the codebase could do well with cleanups and refactoring.

If you benefit from xhtml2pdf, consider looking at the test coverage and identifying parts that are still untested.

Development environment

  1. If you don't have it, install pip, the Python package installer:

    sudo easy_install pip
    

    For more information about pip refer to http://www.pip-installer.org

  2. We recommend using virtualenv for development. A separate environment for each project keeps the dependencies of multiple projects apart:

    sudo pip install virtualenv
    

    For more information about virtualenv refer to http://www.virtualenv.org

  3. Create a virtualenv for the project. It can live inside the project directory, but must not be placed under version control:

    virtualenv --distribute xhtml2pdfenv --python=python2
    
  4. Activate your virtualenv:

    source xhtml2pdfenv/bin/activate
    

    Later to deactivate it use:

    deactivate
    
  5. The next step will be to install/upgrade dependencies from the requirements.txt file:

    pip install -r requirements.txt
    
  6. Run tests to check your configuration:

    nosetests --with-coverage
    

    You should have a log with the following success status:

    Ran 36 tests in 0.322s
    
    OK
    

Python integration

Some simple demos of how to integrate xhtml2pdf into a Python program may be found here: test/simple.py
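
As a rough illustration of such an integration (a sketch, not the contents of test/simple.py), the snippet below renders an HTML string to a PDF file with pisa.CreatePDF, the same call that appears in the issue reports further down. The helper names build_html and html_to_pdf are hypothetical, and the import is deferred into the function only so that the module still loads where xhtml2pdf is not installed.

```python
from html import escape


def build_html(title, paragraphs):
    """Return a small, self-contained HTML document as a string."""
    body = "\n".join("<p>{}</p>".format(escape(p)) for p in paragraphs)
    return (
        "<html><head><meta charset=\"utf-8\">"
        "<title>{}</title></head>"
        "<body><h1>{}</h1>\n{}</body></html>"
    ).format(escape(title), escape(title), body)


def html_to_pdf(html, output_path):
    """Write `html` to `output_path` as a PDF; return True on success."""
    from xhtml2pdf import pisa  # deferred import, see the note above

    # PDF data is binary, so the destination is opened in "wb" mode.
    with open(output_path, "wb") as result_file:
        status = pisa.CreatePDF(html, dest=result_file)
    # status.err is falsy when the conversion reported no errors.
    return not status.err
```

A call such as html_to_pdf(build_html("Report", ["Hello"]), "report.pdf") would then write report.pdf to the current directory.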

Running tests

Two different test suites are available to assert that xhtml2pdf works reliably:

  1. Unit tests. The unit testing framework is currently minimal, but is being improved on a regular basis (contributions welcome). The tests should run in the usual way for Python's unittest module, i.e.:

    nosetests --with-coverage (or your personal favorite)
    
  2. Functional tests. Thanks to mawe42's super cool work, a full functional test suite is available at testrender/.

Contact

This project is community-led! Feel free to open up issues on GitHub about new ideas to improve xhtml2pdf.

History

These are the major milestones and the maintainers of the project:

  • 2000-2007, commercial project, spirito.de, written by Dirk Holtwick
  • 2007-2010 Dirk Holtwick (project named "Pisa", project released as GPL)
  • 2010-2012 Dirk Holtwick (project named "xhtml2pdf", changed license to Apache)
  • 2012-2015 Chris Glass (@chrisglass)
  • 2015-2016 Benjamin Bach (@benjaoming)
  • 2016-2018 Sam Spencer (@LegoStormtroopr)
  • 2018-Current Luis Zarate (@luisza)

For more history, see the CHANGELOG.txt file.

License

Copyright 2010 Dirk Holtwick, holtwick.it

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Issues
  • Problems with some Unicode characters

    Hi, I'm using the latest xhtml2pdf (0.2b1) & reportlab (3.4.0) through django-easy-pdf (0.1.0) on Python 3.6.0 and it's working great for the most part! One problem I am still experiencing, though, is that some Unicode characters are not rendering properly (šŠčČćĆđĐžŽ):

    [screenshot]

    I'm using the default django-easy-pdf base template and I found that I can somewhat repair things if I override it to declare the html encoding:

    {% block extra_style %}
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    {% endblock %}
    

    Which results in some characters being rendered correctly like Š and Ž, but not all of them (Č, Ć, Đ are still blacked out).

    [screenshot]

    I tried experimenting with different font declarations (sans-serif, serif, external fonts), but I can't seem to fix this. The characters are never rendered correctly. I don't know if I'm missing some xhtml2pdf / Reportlab setting here. Do you maybe have an idea of a possible solution?

    Fonts 
    opened by metakermit 48
  • black square box while generating pdf (unicode error)

    A weird problem. While generating a PDF, black square boxes appear in place of Unicode characters. I don't know whether it is a Unicode or a font-face error. I don't even know whether I should use "font-face and font-family" to get the Unicode text into the PDF. Am I missing anything? Many thanks.

    Code snippet:

    # -*- coding: utf-8 -*-

    from xhtml2pdf import pisa
    from StringIO import StringIO
    
    source = """<html>
                <style>
                    @font-face {
                    font-family: Mangal;
                    src: url("mangal.ttf");
                    }
    
                    body {
                    font-family: Mangal;
                    }
                </style>
                <body>
                    This is a test <br/>
                           सरल
                </body>
            </html>"""
    
    # Utility function
    def convertHtmlToPdf(source):       
        pdf = StringIO()
        pisaStatus = pisa.CreatePDF(StringIO(source.encode('utf-8')), pdf)
    
        # return True on success and False on errors
        print "Success: ", pisaStatus.err
        return pdf
    
    # Main program
    if __name__=="__main__":
        print pisa.showLogging()
        pdf = convertHtmlToPdf(source)
        fd = open("test.pdf", "w+b")
        fd.write(pdf.getvalue())
        fd.close()
    
    opened by beebek 31
  • Twitter-Bootstrap Causes Selector CSSParseError

    Twitter Bootstrap has some pretty gnarly CSS selectors that xhtml2pdf doesn't like.

    Result is:

    Selector Pseudo Function closing ')' not found:: (u':not(', u'[controls]) {\n disp')

    pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("UTF-8")), dest=result, link_callback=fetch_resources )

    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/document.py" in pisaDocument
      encoding, context=context, xml_output=xml_output)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/document.py" in pisaStory
      pisaParser(src, context, default_css, xhtml, encoding, xml_output)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/parser.py" in pisaParser
      context.parseCSS()
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/context.py" in parseCSS
      self.css = self.cssParser.parse(self.cssText)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in parse
      src, stylesheet = self._parseStylesheet(src)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseStylesheet
      src, atResults = self._parseAtKeyword(src)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseAtKeyword
      src, result = self._parseAtImports(src)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseAtImports
      stylesheet = self.cssBuilder.atImport(import_, mediums, self)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/css.py" in atImport
      return cssParser.parseExternal(import_)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/context.py" in parseExternal
      result = self.parse(cssFile.getData())
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in parse
      src, stylesheet = self._parseStylesheet(src)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseStylesheet
      src, ruleset = self._parseRuleset(src)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseRuleset
      src, selectors = self._parseSelectorGroup(src)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSelectorGroup
      src, selector = self._parseSelector(src)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSelector
      src, selector = self._parseSimpleSelector(src)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSimpleSelector
      src, selector = self._parseSelectorPseudo(src, selector)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSelectorPseudo
      raise self.ParseError('Selector Pseudo Function closing \')\' not found', src, ctxsrc)

    Exception Type: CSSParseError at /p/pdf/gd8lx6xbl
    Exception Value: Selector Pseudo Function closing ')' not found:: (u':not(', u'[controls]) {\n disp')

    opened by Miserlou 22
  • 0.1b1

    This PR lists all the changes that are released in the alphas.

    Once merge conflicts are resolved and dist released, this PR closes.

    opened by benjaoming 21
  • Python 3

    I made some changes so that the tests now run in both Python 2 and Python 3. Most of the changes I made were the same as those made by @wylee in #205.

    I also added a file to do Travis CI testing #202, and updated some of the dependencies.

    opened by JimInCO 18
  • Now broken with html5lib

    From https://pypi.python.org/pypi/html5lib/0.99999999:

    Move a whole load of stuff (inputstream, ihatexml, trie, tokenizer, utils) to be underscore prefixed to clarify their status as private

    Except https://github.com/xhtml2pdf/xhtml2pdf/blob/master/xhtml2pdf/parser.py#L17:

    from html5lib import treebuilders, inputstream
    

    Current fix:

    • Use `pip install html5lib==1.0b8`
      
    opened by LegoStormtroopr 18
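
For the html5lib report above, pinning is one option; code that has to run against both older and newer html5lib releases could instead try each module name in turn. The following is a minimal standard-library sketch of that idea. import_first_available is a hypothetical helper, and the underscore-prefixed module name mentioned in the note after the code is inferred from the changelog entry quoted in the issue.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in `module_names` that exists."""
    last_error = None
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # Remember the failure and fall through to the next candidate.
            last_error = exc
    raise ImportError(
        "none of these modules could be imported: {}".format(
            ", ".join(module_names)
        )
    ) from last_error
```

Under that assumption, the import in parser.py could be expressed as inputstream = import_first_available(["html5lib.inputstream", "html5lib._inputstream"]).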
  • Add optional pisaDocument argument to set metadata

    Without this the functionality of pisaDocument would need to be recreated in order to set metadata such as the document author.

    Usage is like so:

    pisaDocument(src=io.StringIO(html), dest=open(output_file, "wb"), context_meta={
                "author": "MyCorp Ltd.",
                "title": "My Document Title",
                "subject": "My Document Subject",
                "keywords": "pdf,documents",
            })
    
    opened by alistair-broomhead 16
  • Change the requirements to use PyPDF2

    Newer version of PyPDF...

    opened by zkanda 16
  • Python2/Python3 compatibility

    So, I'm close, but for some reason on my install the images in the docs don't show up in Python 2 and are a little smaller in Python 3.

    I'm gonna fix this even if it kills me.

    Todo:

    • [x] Figure out how to render a transparent PDF as white (-flatten doesn't work for multipage PDFs)
    • [ ] Make the images the right size
    • [x] Clean up the string.join issues in reportlab_paragraph
    • [ ] Fix background for tr's
    opened by LegoStormtroopr 13
  • Unwanted Helvetica font

    No matter what font I use, Helvetica is always present and it's not embedded, so most printing companies cannot print the document because a font is missing.

    opened by maguayo 13
  • Python3.9 support

    Hi guys, are you planning to support Python 3.9 anytime soon?

    opened by SteveTr97 1
  • Support for A7 and A8 sizes

    Is there any way I can add support for thermal printer sizes (A7 or A8 paper size)? If not, are there any plans to add this in the future?

    opened by basimkhalid 0
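
For the A7/A8 question above, one possible approach is to spell the page dimensions out explicitly in an @page rule instead of relying on a named size. The sketch below derives the ISO 216 dimensions by repeated halving and formats such a rule. It assumes, without having verified it here, that the renderer accepts explicit lengths in the size property; iso_a_size_mm and page_rule are hypothetical helper names.

```python
def iso_a_size_mm(n):
    """Return (width, height) in millimetres for ISO 216 size A<n>."""
    if n < 0:
        raise ValueError("n must be non-negative")
    width, height = 841, 1189  # A0
    for _ in range(n):
        # Halve the longer side (rounded down); the previous shorter
        # side becomes the new longer side.
        width, height = height // 2, width
    return width, height


def page_rule(n, margin_mm=3):
    """Format an @page rule with explicit dimensions for size A<n>.

    Assumption: the renderer honors explicit lengths in `size`.
    """
    width, height = iso_a_size_mm(n)
    return "@page {{ size: {}mm {}mm; margin: {}mm; }}".format(
        width, height, margin_mm
    )
```

The resulting rule would go inside a style element of the HTML passed to the converter.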
  • '>' not supported between instances of 'NoneType' and 'float'

    '>' not supported between instances of 'NoneType' and 'float'

    • The error above occurs on Windows 10 64-bit / Python 3.7.7.
    • The HTML is the output of a pandas pivot table.
    • The error occurs only in some cases; however, it never occurs on Debian 10 / Python 3.7.3.

    Requirement already satisfied: xhtml2pdf==0.2.5 in c:\python37\lib\site-packages (0.2.5)
    Requirement already satisfied: html5lib>=1.0 in c:\python37\lib\site-packages (from xhtml2pdf==0.2.5) (1.1)
    Requirement already satisfied: pyPdf2 in c:\python37\lib\site-packages (from xhtml2pdf==0.2.5) (1.26.0)
    Requirement already satisfied: Pillow in c:\python37\lib\site-packages (from xhtml2pdf==0.2.5) (8.3.2)
    Requirement already satisfied: reportlab>=3.3.0 in c:\python37\lib\site-packages (from xhtml2pdf==0.2.5) (3.6.1)
    Requirement already satisfied: six in c:\python37\lib\site-packages (from xhtml2pdf==0.2.5) (1.14.0)
    Requirement already satisfied: python-bidi>=0.4.2 in c:\python37\lib\site-packages (from xhtml2pdf==0.2.5) (0.4.2)
    Requirement already satisfied: arabic-reshaper>=2.1.0 in c:\python37\lib\site-packages (from xhtml2pdf==0.2.5) (2.1.3)
    Requirement already satisfied: future in c:\python37\lib\site-packages (from arabic-reshaper>=2.1.0->xhtml2pdf==0.2.5) (0.18.2)

    Exception Type: TypeError
    '>' not supported between instances of 'NoneType' and 'float'
    C:\Python37\lib\site-packages\reportlab\platypus\tables.py, line 421, in identity

    opened by essadek 0
  • tr:nth-child(even) background-color not working

    I'm trying to give a table alternating row colors, but xhtml2pdf does not seem to apply the rule. Is there a workaround for this issue?

    # Set table styles
    styles = [
        dict(selector="th", props=th_props),
        dict(selector="td", props=td_props),
        dict(selector="tr", props=tr_props),
        dict(selector="tr:nth-child(even)", props=[('background-color', 'lightblue')]),
    ]
    
    source_html = df.set_table_styles(styles).render()
    

    Jupyter notebook shows df.set_table_styles(styles) like this: [image]

    But pisa_status = pisa.CreatePDF(source_html, dest=result_file) creates the PDF like this: [image]

    opened by mvinoba 0
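
For the striped-table report above, a possible workaround is to avoid the :nth-child selector entirely and write the color onto the even rows while generating the HTML. The sketch below does this with inline styles on the cells. It assumes the renderer honors an inline background-color on td elements, which is not verified here; striped_table is a hypothetical helper and does not use the pandas styling API.

```python
from html import escape


def striped_table(headers, rows, even_color="lightblue"):
    """Build an HTML table whose even data rows carry an inline color."""
    parts = ["<table>", "<tr>"]
    parts += ["<th>{}</th>".format(escape(str(h))) for h in headers]
    parts.append("</tr>")
    # Data rows are numbered from 1; rows 2, 4, 6, ... get the color.
    for index, row in enumerate(rows, start=1):
        style = ""
        if index % 2 == 0:
            style = ' style="background-color: {}"'.format(even_color)
        cells = "".join(
            "<td{}>{}</td>".format(style, escape(str(value))) for value in row
        )
        parts.append("<tr>{}</tr>".format(cells))
    parts.append("</table>")
    return "\n".join(parts)
```

The returned string can be passed to the converter in place of the styled DataFrame output.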
  • Support for Indian Unicode fonts

    Some Indian languages are not displayed properly, even when a custom font is used. Whenever there is a combined letter (a combination of multiple letters treated as one single letter), the PDF displays it as separate letters, whereas it is displayed properly in HTML pages. I am sure it is only a display problem, since when I copy the "broken" content from the PDF and paste it somewhere else, it is displayed correctly. E.g. language: Malayalam, font: https://fonts.google.com/specimen/Baloo+Chettan+2

    Here is the font-face: @font-face { font-family: 'BalooChettan2'; src: url("{{ internal_settings.app_url }}/static/sales/fonts/BalooChettan2-Regular.ttf"); }

    Here is the CSS: font-family: 'BalooChettan2', cursive;

    The words I used: (The below display is proper) ടൈപ്പ്റൈറ്റഡ് വാക്കുകൾ

    Screenshot of the PDF rendering of the above words: [image]

    opened by basimkhalid 0
  • ResourceWarning: unclosed file

    I am using xhtml2pdf 0.2.5 with Python 3.7 and I get the following warnings:

    /usr/local/lib/python3.7/site-packages/xhtml2pdf/w3c/css.py:891: ResourceWarning: unclosed file <_io.TextIOWrapper name='<here_path_to_file>' mode='r' encoding='UTF-8'>
      return cssParser.parseExternal(import_)

    /usr/local/lib/python3.7/site-packages/xhtml2pdf/parser.py:643: ResourceWarning: unclosed file <_io.BufferedReader name='<here_path_to_file>'>
      pisaLoop(nnode, context, path, **kw)

    opened by maradragan 1
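
Warnings like the ones above are emitted when a file object is discarded without being closed. The usual remedy is to read the data inside a with block, which closes the file on exit. The snippet is a generic sketch of that pattern, not a patch to xhtml2pdf; read_and_report is a hypothetical helper that also returns the closed flag so the effect is visible.

```python
def read_and_report(path):
    """Read a file inside a `with` block and report whether it was closed."""
    with open(path, "rb") as handle:
        data = handle.read()
    # Leaving the `with` block closes the file, so no ResourceWarning
    # is raised when `handle` is later garbage-collected.
    return data, handle.closed
```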
  • DeprecationWarning: invalid escape sequence

    I am using xhtml2pdf 0.2.5 with Python 3.7 and I get the following deprecation warnings:

    /usr/local/lib/python3.7/site-packages/xhtml2pdf/util.py:62: DeprecationWarning: invalid escape sequence \.
      "^.?rgb[a]?(.?([0-9]+).?([0-9]+)(?:.?(?:[01].(?:[0-9]+)))?[)].?[ ]$")
    /usr/local/lib/python3.7/site-packages/xhtml2pdf/util.py:615: DeprecationWarning: invalid escape sequence \+
      b64 = re.sub(b'[^A-Za-z0-9+/]+', b'', b64)
    /usr/local/lib/python3.7/site-packages/xhtml2pdf/tags.py:103: DeprecationWarning: invalid escape sequence \:
      rxLink = re.compile("^(#|[a-z]+:).")
    /usr/local/lib/python3.7/site-packages/xhtml2pdf/w3c/cssParser.py:347: DeprecationWarning: invalid escape sequence \s
      i_unicode = '\\(?:%s){1,6}\s?' % i_hex
    /usr/local/lib/python3.7/site-packages/xhtml2pdf/w3c/cssParser.py:350: DeprecationWarning: invalid escape sequence \-
      i_nmstart = _orRule('-[^0-9]|[A-Za-z_]', i_nonascii,
    /usr/local/lib/python3.7/site-packages/xhtml2pdf/w3c/cssParser.py:362: DeprecationWarning: invalid escape sequence \*
      i_element_name = '((?:%s)|*)' % (i_ident[1:-1],)
    /usr/local/lib/python3.7/site-packages/xhtml2pdf/w3c/cssParser.py:364: DeprecationWarning: invalid escape sequence \*
      i_namespace_selector = '((?:%s)|*|)|(?!=)' % (i_ident[1:-1],)
    /usr/local/lib/python3.7/site-packages/xhtml2pdf/w3c/cssParser.py:379: DeprecationWarning: invalid escape sequence \s
      i_uri = ('url\(\s(?:(?:%s)|((?:%s)+))\s*\)'
    /usr/local/lib/python3.7/site-packages/xhtml2pdf/w3c/cssParser.py:393: DeprecationWarning: invalid escape sequence \?
      i_unicoderange2 = "(?:U\+?{1,6}|{h}(?{0,5}|{h}(?{0,4}|{h}(?{0,3}|{h}(?{0,2}|{h}(??|{h}))))))"
    /usr/local/lib/python3.7/site-packages/xhtml2pdf/w3c/cssParser.py:399: DeprecationWarning: invalid escape sequence \/
      i_comment = '(?:/*[^*]*+([^/*][^*]*+)/)'
    /usr/local/lib/python3.7/site-packages/xhtml2pdf/w3c/cssParser.py:401: DeprecationWarning: invalid escape sequence \s
      i_important = '!\s(important)'

    opened by maradragan 0
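
Warnings of this kind appear when a regular-expression pattern is written as an ordinary string literal, so that a sequence such as \s is first interpreted (and rejected) as a string escape. Writing the pattern as a raw string is the conventional fix. The pattern below is illustrative only and is not copied from cssParser.py; is_important is a hypothetical helper.

```python
import re

# Raw string: the backslash reaches the regex engine unchanged, and Python
# emits no "invalid escape sequence" warning when compiling this module.
IMPORTANT_RE = re.compile(r"!\s*(important)")


def is_important(declaration):
    """Return True if a CSS declaration text contains an !important flag."""
    return IMPORTANT_RE.search(declaration) is not None
```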
  • can't render ePUB fixed layout format

    When rendering the ePUB fixed-layout format, the images are missing. Sorry, I don't have a public ePUB fixed-layout file for you to test with.

    opened by retsyo 0
  • css z index and top won't work

    Hello,

    I have this in my CSS:

    #wrap { position:relative; width: 200px; height: 145px; }

    #text { z-index:100; position:absolute; color:black; font-size:20px; font-weight:bold; left:10px; top:90px; }

    HTML:

    Brand

    When I change top in #text, nothing changes in the output. I have no idea how I can overlay text on images.

    opened by Bastilla123 0
  • table of contents

    @vidar @nduthoit @overshard @orestis @boromil please how do i make a table of contents? thanks!

    opened by yishairasowsky 0
Releases(0.2.4)
  • 0.2.4(Jan 21, 2020)

    • Update link_callback documentation.
    • Stylize code lines in documentation.
    • Fixed cgi escape util on setup version.
    • Add test to python 3.7 and 3.8.
    • Fixed width assignation on fragments.
    • Support urllib in python 3 and python 2.
    • Add em unit support.
    • Repair base64 unescaped string.
    • Fixed urlparse when urls has parameters.
    • Fixed i_rgbcolor support.

    Source code(tar.gz)
    Source code(zip)
  • 0.2.2(Apr 17, 2018)

  • 0.2.1(Feb 15, 2018)

    This new release has a lot of improvements in python 3 and demos.

    Version 0.2.1

    • Improve python3 support - thanks luisza, andreyfedoseev and flupzor
    • Include new Httplibs options - thanks luisza
    • Support to background image - thanks flupzor
    • Remove python23 support - thanks flupzor
    • Transparent images work again in Python 3 - thanks flupzor
    • Readthedocs integration - thanks luisza
    • Update Django demo site - thanks luisza
    • PEP8 and cleanup code - thanks luisza
    • Drop the turbogears module - thanks browniebroke
    Source code(tar.gz)
    Source code(zip)
  • 0.1b2(Aug 1, 2016)

  • 0.1b1(Jun 5, 2016)

    This release is possibly the final release ever of xhtml2pdf, unless someone takes over maintainership. It has Python 3 support, but there are also certain bugs, which you can read about in the ~37 unclosed issues.

    Source code(tar.gz)
    Source code(zip)
  • 0.1a4(May 18, 2016)

    Version 0.1alpha4

    • Removed PyPy support
    • Avoid exceptions likely to occur systematic to how narrow a text column is #309 - thanks jkDesignDE
    • Improved tests for tables #305 - thanks taddeimania
    • Fix broken empty PDFs in Python2 #301 - thanks citizen-stig
    • Unknown page sizes now raise an exception #71 - thanks benjaoming
    • Unorderable types caused by duplicate CSS selectors / rules #69 - thanks benjaoming
    • Allow empty page definition with no space after @page - #88 - thanks benjaoming
    • Error when in addFromFile using file-like object #245 - thanks benjaoming
    • Python 3: Bad table formatting with empty columns #279 - thanks citizen-stig and benjaoming
    • Removed paragraph2.py, unused ghost file since the beginning of the project #289 - thanks citizen-stig
    • Catch-all exceptions removed in a lot of places, not quite done #290 - thanks benjaoming
    Source code(tar.gz)
    Source code(zip)
Python module that makes working with XML feel like you are working with JSON

xmltodict xmltodict is a Python module that makes working with XML feel like you are working with JSON, as in this "spec": >>> print(json.dumps(xmltod

Martín Blech 4.6k Oct 23, 2021
A jquery-like library for python

pyquery: a jquery-like library for python pyquery allows you to make jquery queries on xml documents. The API is as much as possible the similar to jq

Gael Pasgrimaud 2k Oct 22, 2021
Standards-compliant library for parsing and serializing HTML documents and fragments in Python

html5lib html5lib is a pure-python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as is implemented by all majo

null 935 Oct 15, 2021
Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes

Bleach Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes. Bleach can also linkify text safely, appl

Mozilla 2.2k Oct 20, 2021
Python binding to Modest engine (fast HTML5 parser with CSS selectors).

A fast HTML5 parser with CSS selectors using Modest engine. Installation From PyPI using pip: pip install selectolax Development version from github:

Artem Golubin 463 Oct 16, 2021
The lxml XML toolkit for Python

What is lxml? lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language. It's also very fast and memory

null 2k Oct 26, 2021
Safely add untrusted strings to HTML/XML markup.

MarkupSafe MarkupSafe implements a text object that escapes characters so it is safe to use in HTML and XML. Characters that have special meanings are

The Pallets Projects 427 Oct 11, 2021
The awesome document factory

The Awesome Document Factory WeasyPrint is a smart solution helping web developers to create PDF documents. It turns simple HTML pages into gorgeous s

Kozea 4.6k Oct 22, 2021
Converts XML to Python objects

untangle Documentation Converts XML to a Python object. Siblings with similar names are grouped into a list. Children can be accessed with parent.chil

Christian Stefanescu 527 Oct 19, 2021