A library for converting HTML into PDFs using ReportLab

Overview

XHTML2PDF


The current release of xhtml2pdf is 0.2.5 (see the Release Notes). As with all open-source software, whether it is suitable for production depends on many factors, so be aware that you may run into issues in some cases.

Big thanks to everyone who has worked on this project so far and to those who help maintain it.

About

xhtml2pdf is an HTML-to-PDF converter using Python, the ReportLab Toolkit, html5lib and PyPDF2. It supports HTML5 and CSS 2.1 (and some of CSS 3). It is written in pure Python, so it is platform independent.

The main benefit of this tool is that a user with web skills like HTML and CSS is able to generate PDF templates very quickly without learning new technologies.

Documentation

The documentation of xhtml2pdf is available at Read the Docs.

And we could use your help improving it! A good place to start is doc/source/usage.rst.

Installation

This is a typical Python library and can be installed using pip:

pip install xhtml2pdf

Requirements

Python 2.7+. Only Python 3.4+ is tested and guaranteed to work.

All additional requirements are listed in the requirements.txt file and are installed automatically using the pip install xhtml2pdf method.

Alternatives

You can also try WeasyPrint. It has a nicely written codebase and a different feature set, and it does a lot of what xhtml2pdf does.

Call for testing

This project depends heavily on improving its test coverage! Furthermore, parts of the codebase could use some cleanup and refactoring.

If you benefit from xhtml2pdf, perhaps look at the test coverage and identify parts that are not yet covered.

Development environment

  1. If you don't have it, install pip, the python package installer:

    sudo easy_install pip
    

    For more information about pip refer to http://www.pip-installer.org

  2. We recommend using virtualenv for development. A separate environment for each project keeps the dependencies of different projects isolated:

    sudo pip install virtualenv
    

    For more information about virtualenv refer to http://www.virtualenv.org

  3. Create a virtualenv for the project. It can live inside the project directory, but should not be under version control:

    virtualenv xhtml2pdfenv --python=python3
    
  4. Activate your virtualenv:

    source xhtml2pdfenv/bin/activate
    

    Later to deactivate it use:

    deactivate
    
  5. Next, install or upgrade the dependencies from the requirements.txt file:

    pip install -r requirements.txt
    
  6. Run tests to check your configuration:

    nosetests --with-coverage
    

    You should have a log with the following success status:

    Ran 36 tests in 0.322s
    
    OK
    

Python integration

Some simple demos of how to integrate xhtml2pdf into a Python program may be found here: test/simple.py

Running tests

Two different test suites are available to assert that xhtml2pdf works reliably:

  1. Unit tests. The unit testing framework is currently minimal, but is being improved on a regular basis (contributions welcome). They should run in the expected way for Python's unittest module, i.e.:

    nosetests --with-coverage (or your personal favorite)
    
  2. Functional tests. Thanks to mawe42's super cool work, a full functional test suite is available at testrender/.

Contact

This project is community-led! Feel free to open up issues on GitHub about new ideas to improve xhtml2pdf.

History

These are the major milestones and the maintainers of the project:

  • 2000-2007, commercial project, spirito.de, written by Dirk Holtwick
  • 2007-2010 Dirk Holtwick (project named "Pisa", released under the GPL)
  • 2010-2012 Dirk Holtwick (project named "xhtml2pdf", changed license to Apache)
  • 2012-2015 Chris Glass (@chrisglass)
  • 2015-2016 Benjamin Bach (@benjaoming)
  • 2016-2018 Sam Spencer (@LegoStormtroopr)
  • 2018-Current Luis Zarate (@luisza)

For more history, see the CHANGELOG.txt file.

License

Copyright 2010 Dirk Holtwick, holtwick.it

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • Problems with some Unicode characters

    Hi, I'm using the latest xhtml2pdf (0.2b1) & reportlab (3.4.0) through django-easy-pdf (0.1.0) on Python 3.6.0 and it's working great for the most part! One problem I am still experiencing, though, is that some Unicode characters are not rendering properly (šŠčČćĆđĐžŽ):

    [Screenshot, 2017-03-29 16:38:36]

    I'm using the default django-easy-pdf base template and I found that I can somewhat repair things if I override it to declare the html encoding:

    {% block extra_style %}
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    {% endblock %}
    

    This results in some characters being rendered correctly, like Š and Ž, but not all of them (Č, Ć, Đ are still blacked out).

    [Screenshot, 2017-03-29 16:38:19]

    I tried experimenting with different font declarations (sans-serif, serif, external fonts), but I can't seem to fix this. The characters are never rendered correctly. I don't know if I'm missing some xhtml2pdf / Reportlab setting here. Do you maybe have an idea of a possible solution?

    Fonts 
    opened by metakermit 48
  • black square box while generating pdf (unicode error)

    A weird problem: while generating a PDF, black square boxes appear in place of Unicode characters. I don't know whether it is a Unicode error or a font-face error, or whether I should use "font-face and font-family" to render the Unicode text into the PDF. Am I missing anything? Many thanks.

    Code snippet:

    # -*- coding: utf-8 -*-
    from io import BytesIO

    from xhtml2pdf import pisa
    
    source = """<html>
                <style>
                    @font-face {
                    font-family: Mangal;
                    src: url("mangal.ttf");
                    }
    
                    body {
                    font-family: Mangal;
                    }
                </style>
                <body>
                    This is a test <br/>
                           सरल
                </body>
            </html>"""
    
    # Utility function: render the HTML source into an in-memory PDF
    def convertHtmlToPdf(source):
        pdf = BytesIO()
        pisaStatus = pisa.CreatePDF(BytesIO(source.encode('utf-8')), pdf)
    
        # pisaStatus.err is the number of errors (0 on success)
        print("Errors:", pisaStatus.err)
        return pdf
    
    # Main program
    if __name__ == "__main__":
        pisa.showLogging()
        pdf = convertHtmlToPdf(source)
        with open("test.pdf", "wb") as fd:
            fd.write(pdf.getvalue())
    
    opened by beebek 31
  • Twitter-Bootstrap Causes Selector CSSParseError

    Twitter Bootstrap has some pretty gnarly CSS selectors that xhtml2pdf doesn't like.

    Result is:

    Selector Pseudo Function closing ')' not found:: (u':not(', u'[controls]) {\n disp')

        pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("UTF-8")), dest=result, link_callback=fetch_resources )
      File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/document.py" in pisaDocument
        encoding, context=context, xml_output=xml_output)
      File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/document.py" in pisaStory
        pisaParser(src, context, default_css, xhtml, encoding, xml_output)
      File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/parser.py" in pisaParser
        context.parseCSS()
      File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/context.py" in parseCSS
        self.css = self.cssParser.parse(self.cssText)
      File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in parse
        src, stylesheet = self._parseStylesheet(src)
      File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseStylesheet
        src, atResults = self._parseAtKeyword(src)
      File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseAtKeyword
        src, result = self._parseAtImports(src)
      File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseAtImports
        stylesheet = self.cssBuilder.atImport(import_, mediums, self)
      File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/css.py" in atImport
        return cssParser.parseExternal(import_)
      File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/context.py" in parseExternal
        result = self.parse(cssFile.getData())
      File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in parse
        src, stylesheet = self._parseStylesheet(src)
      File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseStylesheet
        src, ruleset = self._parseRuleset(src)
      File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseRuleset
        src, selectors = self._parseSelectorGroup(src)
      File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSelectorGroup
        src, selector = self._parseSelector(src)
      File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSelector
        src, selector = self._parseSimpleSelector(src)
      File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSimpleSelector
        src, selector = self._parseSelectorPseudo(src, selector)
      File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSelectorPseudo
        raise self.ParseError('Selector Pseudo Function closing \')\' not found', src, ctxsrc)

    Exception Type: CSSParseError at /p/pdf/gd8lx6xbl
    Exception Value: Selector Pseudo Function closing ')' not found:: (u':not(', u'[controls]) {\n disp')

    opened by Miserlou 22
  • Now broken with html5lib

    From https://pypi.python.org/pypi/html5lib/0.99999999:

    Move a whole load of stuff (inputstream, ihatexml, trie, tokenizer, utils) to be underscore prefixed to clarify their status as private

    However, https://github.com/xhtml2pdf/xhtml2pdf/blob/master/xhtml2pdf/parser.py#L17 still imports inputstream:

    from html5lib import treebuilders, inputstream
    

    Current fix:

    • Use `pip install html5lib==1.0b8`
      
    opened by LegoStormtroopr 18
  • Python 3

    I made some changes so that the tests now run in both Python 2 and Python 3. Most of the changes I made were the same as those made by @wylee in #205.

    I also added a file to do Travis CI testing #202, and updated some of the dependencies.

    opened by JimInCO 18
  • Add optional pisaDocument argument to set metadata

    Without this the functionality of pisaDocument would need to be recreated in order to set metadata such as the document author.

    Usage is like so:

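    import io
    from xhtml2pdf.document import pisaDocument

    html = "<p>Hello</p>"       # placeholder HTML source
    output_file = "output.pdf"  # placeholder output path
    # The PDF is binary data, so open the destination in "wb" mode
    with open(output_file, "wb") as dest:
        pisaDocument(src=io.StringIO(html), dest=dest, context_meta={
            "author": "MyCorp Ltd.",
            "title": "My Document Title",
            "subject": "My Document Subject",
            "keywords": "pdf,documents",
        })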
    
    opened by alistair-broomhead 16
  • I need to step back as a maintainer

    I've been neglecting my responsibilities to this project.

    While I was happy to take it on and help curate other contributors' Python 3 support into the codebase, I have to step back.

    Apologies to anyone who is still using this. :(

    opened by LegoStormtroopr 14
  • Python2/Python3 compatibility

    So, I'm close, but for some reason, on my install, images in the docs don't show up in Python 2 and are a little smaller in Python 3.

    I'm gonna fix this even if it kills me.

    Todo:

    • [x] Figure out how to render a transparent PDF as white (-flatten doesn't work for multipage PDFs)
    • [ ] Make the images the right size
    • [x] Clean up the string.join issues in reportlab_paragraph
    • [ ] Fix background for tr's
    opened by LegoStormtroopr 13
  • Unwanted Helvetica font

    No matter which font I use, Helvetica is always included and it is not embedded, so most printing companies cannot print the document because of the missing font.

    opened by maguayo 13
  • ZeroDivisionError: float division by zero

    Hi, I get this error while trying to parse an HTML containing the following piece of code. I'm using the latest versions of all packages needed:

    • html5lib-0.90
    • pyPdf-1.13
    • reportlab-2.5
    • xhtml2pdf-0.0.3

    and Python 2.7 (2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)])

    Python code:

    import cStringIO as StringIO
    from xhtml2pdf import pisa
    ....

    html = ''' <TABLE BORDER="0" CELLPADDING="2" CELLSPACING="2"> <TR> <TD></TD> </TR> </TABLE> '''
    dest = file('test.pdf', "wb")
    pdf = pisa.CreatePDF(
        StringIO.StringIO(html),
        dest,
        log_warn = 1,
        log_err = 1
    )

    Note: If I put something inside the TD (example: ".... <TD>... some stuff..... </TD>........") or I change the value of the attr cellpadding, it works!!!

    Traceback:

    Traceback (most recent call last):
      File "C:\tmp\test.py", line 95, in
        log_err = 1
      File "C:\Python27\lib\site-packages\xhtml2pdf\document.py", line 131, in pisaDocument
        doc.build(context.story)
      File "C:\Python27\lib\site-packages\reportlab\platypus\doctemplate.py", line 880, in build
        self.handle_flowable(flowables)
      File "C:\Python27\lib\site-packages\reportlab\platypus\doctemplate.py", line 763, in handle_flowable
        if frame.add(f, canv, trySplit=self.allowSplitting):
      File "C:\Python27\lib\site-packages\reportlab\platypus\frames.py", line 174, in _add
        flowable.drawOn(canv, self._x + self._leftExtraIndent, y, _sW=aW-w)
      File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 108, in drawOn
        self._drawOn(canvas)
      File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 89, in _drawOn
        self.draw()#this is the bit you overload
      File "C:\Python27\lib\site-packages\reportlab\platypus\tables.py", line 1302, in draw
        self._drawCell(cellval, cellstyle, (colpos, rowpos), (colwidth, rowheight))
      File "C:\Python27\lib\site-packages\reportlab\platypus\tables.py", line 1393, in _drawCell
        w, h = self._listCellGeom(cellval,colwidth,cellstyle,W=W, H=H,aH=rowheight)
      File "C:\Python27\lib\site-packages\xhtml2pdf\xhtml2pdf_reportlab.py", line 710, in _listCellGeom
        return Table._listCellGeom(self, V, w, s, W=W, H=H, aH=aH)
      File "C:\Python27\lib\site-packages\reportlab\platypus\tables.py", line 377, in _listCellGeom
        vw, vh = v.wrapOn(canv, aW, aH)
      File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 119, in wrapOn
        w, h = self.wrap(aW,aH)
      File "C:\Python27\lib\site-packages\xhtml2pdf\xhtml2pdf_reportlab.py", line 693, in wrap
        return KeepInFrame.wrap(self, availWidth, availHeight)
      File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 970, in wrap
        W, H = func(s1)
      File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 951, in func
        W /= x
    ZeroDivisionError: float division by zero

    Thanks for your great job, Shen139

    opened by shen139 12
  • Release a new version

    I just upgraded my version with the master branch from github and it fixes a ton of issues in the current 0.0.5 release. Could you release a new version so we can just use pypi?
    Thanks for all the work on this :)

    opened by lzantal 11
  • Security Vulnerability in xhtml2pdf Dependency "future"

    There is a security vulnerability in the future package that does not seem likely to be resolved. Can a version of xhtml2pdf be released that does not require this package?

    CVE: https://github.com/advisories/GHSA-v3c5-jqr6-7qm8
    Discussions: https://github.com/PythonCharmers/python-future/issues/612 and https://github.com/PythonCharmers/python-future/pull/610

    opened by jacobgqc 0
  • Replace PyPDF3 by pypdf

    PyPDF2 has recently moved to pypdf. I'm the maintainer of PyPDF2 and pypdf.

    PyPDF3 has a way smaller community than PyPDF2. I try to get the community to move to pypdf: https://github.com/sfneal/PyPDF3/issues/18

    Swiftly going over some issues of xhtml2pdf:

    • #624 : https://pypdf.readthedocs.io/en/latest/modules/PdfWriter.html#pypdf.PdfWriter.pdf_header
    • #454 : We added modern encryption / decryption support - https://pypdf.readthedocs.io/en/latest/user/encryption-decryption.html
    opened by MartinThoma 1
  • TOC with points

    Good Morning,

    I want to create a table of contents with dots between the entry text and the page number, but only blank space is available, as in the picture below.

    [Image]

    Is this possible?

    Best Regards

    opened by eyesonly13 0
  • Footer not displayed in 0.2.8 and 0.2.7

    The footer frame is not displayed in pdf file when following the documentation.

    How to reproduce:

    • create a virtualenv with xhtml2pdf library:
    python3 -m venv venv
    ./venv/bin/pip install xhtml2pdf
    
    • add an index.html file containing the example at https://xhtml2pdf.readthedocs.io/en/latest/format_html.html#example-with-2-static-frames-and-1-content-frame
    • ./venv/bin/xhtml2pdf index.html

    The generated index.pdf shows only 'Lyrics-R-Us' and 'To PDF or not to PDF'.

    It may also occur in other releases; I only checked these two.

    opened by sblondon 2
  • AttributeError: 'PmlBaseDoc' object has no attribute '_page_count'

    When I add the <pdf:pagecount> html tag within the source_html attribute of pisa.CreatePDF, copied directly from the example, it gives me the error AttributeError: 'PmlBaseDoc' object has no attribute '_page_count'

    FYI, I am on a M1 Mac

    opened by phillipshaong 0
Releases(v0.2.8)
  • v0.2.8(Jun 16, 2022)

    🐛 Bug-Fixes

    • Fix background-image issues (#614, pull request #619)
    • Fix CSSParseError for minified @font-face definitions #609
    • Fixed a few typos and grammar mistakes in usage.rst documentation. #610
  • v0.2.7(Mar 31, 2022)

    🎉 New

    • Add encryption and password protection
    • New WaterMark management system with new options
    • Add Graphic builder
    • Add signing pdfs (simple and pades)

    🐛 Bug-Fixes

    • Remove import cycle between utils and default
    • Fixed link_callback construction of path
    • Fixed path resolution when the path is relative to the current directory
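The link_callback fix concerns how resource paths are built. As a hedged illustration: xhtml2pdf calls a user-supplied link_callback(uri, rel) for resources such as images and stylesheets and uses the path or URI it returns. The factory below is a hypothetical sketch, not the library's own code; it assumes assets referenced as /static/... live in a local directory.

```python
import os
from urllib.parse import urlparse


def make_link_callback(static_url, static_root):
    """Hypothetical factory for a link_callback(uri, rel) function.

    URIs whose path starts with static_url are mapped to files under static_root;
    anything else (http://, data:, ...) is returned unchanged.
    """
    static_root = os.path.abspath(static_root)

    def link_callback(uri, rel):
        # rel is the base the URI is relative to; this sketch does not use it
        parsed = urlparse(uri)
        if parsed.scheme or parsed.netloc or not parsed.path.startswith(static_url):
            return uri
        relative = parsed.path[len(static_url):]
        local = os.path.normpath(os.path.join(static_root, relative))
        # Reject paths such as "/static/../secret" that escape static_root
        if os.path.commonpath([static_root, local]) != static_root:
            raise ValueError(f"{uri!r} resolves outside {static_root}")
        return local

    return link_callback
```

The callback would then be passed as pisa.CreatePDF(html, dest=result_file, link_callback=make_link_callback("/static/", "static")). Building an absolute path from a fixed root, rather than from the current working directory, is what keeps the result independent of where the script is run.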

    ⚠️ Deprecation

    • Support for xhtml in pisa.CreatePDF will be removed in the next release
    • XML2PDF and XHTML2PDF will be removed in the next release; use HTML2PDF instead

    📘 Documentation

    • Add rendered PDFs to the documentation and add some HTML examples.
    • Include graphics examples

    Thanks to the following people on GitHub for contributing to this release: @marcelagz for graphics support :)

  • v0.2.6(Mar 11, 2022)

    • Drop python 2 support.
    • Remove most of python 2 code and cleanup
    • Update packages dependencies
    • Remove six dependency and update Readme
    • Set timeout in https options
    • Add a new file manager approach using a factory method; new classes now handle the different types of data: B64InlineURI, LocalProtocolURI, NetworkFileUri, LocalFileURI, BytesFileUri
    • getColor now returns None when None is passed (ignoring the default value), but returns the default when bool(data) is False
    • rtl languages reversed lines added as a ParaFrag (note: not fully supported yet)
    • Check if Paragraph has 'rtl' attribute (note: not fully supported yet)
    • Fix UnboundLocalError in reportlab_paragraph (#585) (#586)
    • Remove usage of getStringIO (#590), which was removed from ReportLab
    • Change the GitHub workflow tests to run only on Linux
    • Add Python 3.9, 3.10
    • Switch from PyPDF2 to PyPDF3
    • Add SVG support
    • Update package information.
    • Allow running tests using make.
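The getColor rule in these notes distinguishes None from other falsy values. The function below is a hypothetical stand-in, not the library's getColor: real color parsing is replaced by simple normalization, so it only illustrates the None/default behavior as the notes describe it.

```python
def get_color_sketch(value, default=None):
    """Hypothetical illustration of the getColor rule described in the 0.2.6 notes."""
    if value is None:
        # None is returned as-is; the default is ignored
        return None
    if not value:
        # Any other falsy value (for example "" or 0) falls back to the default
        return default
    # Stand-in for real color parsing: normalize the color name
    return str(value).strip().lower()


print(get_color_sketch(None, "black"))     # None
print(get_color_sketch("", "black"))       # black
print(get_color_sketch(" Red ", "black"))  # red
```

Keeping None distinct lets a caller say "no color at all", while an empty attribute value still falls back to the default.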
  • 0.2.4(Jan 21, 2020)

    • Update link_callback documentation
    • Stylize code lines in documentation
    • Fixed cgi escape util on setup version
    • Add tests for Python 3.7 and 3.8
    • Fixed width assignment on fragments
    • Support urllib in Python 3 and Python 2
    • Add em unit support
    • Repair base64 unescaped string
    • Fixed urlparse when URLs have parameters
    • Fixed i_rgbcolor support

  • 0.2.2(Apr 17, 2018)

  • 0.2.1(Feb 15, 2018)

    This release includes many improvements to Python 3 support and the demos.

    Version 0.2.1

    • Improve Python 3 support - thanks luisza, andreyfedoseev and flupzor
    • Include new Httplibs options - thanks luisza
    • Support for background images - thanks flupzor
    • Remove python23 support - thanks flupzor
    • Transparent images work again in Python 3 - thanks flupzor
    • Readthedocs integration - thanks luisza
    • Update Django demo site - thanks luisza
    • PEP8 and cleanup code - thanks luisza
    • Drop the turbogears module - thanks browniebroke
  • 0.1b2(Aug 1, 2016)

  • 0.1b1(Jun 5, 2016)

    This may be the final release of xhtml2pdf, unless someone takes over maintainership. It has Python 3 support, but there are also some bugs, which you can read about in the ~37 open issues.

  • 0.1a4(May 18, 2016)

    Version 0.1alpha4

    • Removed PyPy support
    • Avoid exceptions likely to occur depending on how narrow a text column is #309 - thanks jkDesignDE
    • Improved tests for tables #305 - thanks taddeimania
    • Fix broken empty PDFs in Python 2 #301 - thanks citizen-stig
    • Unknown page sizes now raise an exception #71 - thanks benjaoming
    • Unorderable types caused by duplicate CSS selectors / rules #69 - thanks benjaoming
    • Allow empty page definition with no space after @page - #88 - thanks benjaoming
    • Error in addFromFile when using a file-like object #245 - thanks benjaoming
    • Python 3: Bad table formatting with empty columns #279 - thanks citizen-stig and benjaoming
    • Removed paragraph2.py, an unused ghost file since the beginning of the project #289 - thanks citizen-stig
    • Catch-all exceptions removed in a lot of places, not quite done #290 - thanks benjaoming