A library for converting HTML into PDFs using ReportLab

Last update: Dec 27, 2022


Overview

XHTML2PDF

The current release of xhtml2pdf is 0.2.5; see the Release Notes for details. As with all open-source software, its use in production depends on many factors, so be aware that you may find issues in some cases.

Big thanks to everyone who has worked on this project so far and to those who help maintain it.

About

xhtml2pdf is an HTML-to-PDF converter using Python, the ReportLab Toolkit, html5lib and PyPDF2. It supports HTML5 and CSS 2.1 (and some of CSS 3). It is written entirely in pure Python, so it is platform independent.

The main benefit of this tool is that a user with web skills like HTML and CSS is able to generate PDF templates very quickly without learning new technologies.

Documentation

The documentation of xhtml2pdf is available at Read the Docs.

And we could use your help improving it! A good place to start is doc/source/usage.rst.

Installation

This is a typical Python library and can be installed using pip:

pip install xhtml2pdf

Requirements

Python 2.7+. Only Python 3.4+ is tested and guaranteed to work.

All additional requirements are listed in the requirements.txt file and are installed automatically using the pip install xhtml2pdf method.

Alternatives

You can try WeasyPrint. The codebase is pretty, it has different features and it does a lot of what xhtml2pdf does.

Call for testing

This project is heavily dependent on getting its test coverage up! Furthermore, parts of the codebase could do well with cleanups and refactoring.

If you benefit from xhtml2pdf, perhaps look at the test coverage and identify parts that are yet untouched.

Development environment

If you don't have it, install pip, the Python package installer:
```
sudo easy_install pip
```
For more information about pip refer to http://www.pip-installer.org
We recommend using virtualenv for development. It's great to have a separate environment for each project, keeping the dependencies of multiple projects separated:
```
sudo pip install virtualenv
```
For more information about virtualenv refer to http://www.virtualenv.org
Create a virtualenv for the project. This can be inside the project directory, but it must not be placed under version control:
```
virtualenv --distribute xhtml2pdfenv --python=python2
```
Activate your virtualenv:
```
source xhtml2pdfenv/bin/activate
```
Later, to deactivate it, use:
```
deactivate
```
The next step will be to install/upgrade dependencies from the requirements.txt file:
```
pip install -r requirements.txt
```
Run tests to check your configuration:
```
nosetests --with-coverage
```
You should have a log with the following success status:
```
Ran 36 tests in 0.322s

OK
```
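On Python 3, the virtualenv creation step above can also be done with the standard library's venv module instead of the virtualenv package. The following is a minimal sketch under that assumption; the helper name create_env and the directory name are illustrative and not part of the project's instructions.

```python
import venv
from pathlib import Path


def create_env(env_dir):
    """Create a virtual environment at env_dir and return its path.

    with_pip=False keeps the sketch fast and offline; pass with_pip=True
    to get pip inside the environment before installing requirements.txt.
    """
    env_path = Path(env_dir)
    venv.EnvBuilder(with_pip=False).create(env_path)
    return env_path
```

Calling create_env("xhtml2pdfenv") would create the environment next to the checkout, after which it is activated exactly as shown above.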

Python integration

Some simple demos of how to integrate xhtml2pdf into a Python program may be found here: test/simple.py
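
As a rough illustration of such an integration, the sketch below renders an HTML string to PDF bytes with pisa.CreatePDF, the entry point that also appears in the issue reports on this page. The function name html_to_pdf_bytes is made up for this example, and it assumes a Python 3 installation of xhtml2pdf.

```python
from io import BytesIO


def html_to_pdf_bytes(html):
    """Render an HTML string to PDF and return the raw bytes."""
    # Imported here so this module can be loaded even where xhtml2pdf
    # is not installed; a top-level import works just as well.
    from xhtml2pdf import pisa

    buffer = BytesIO()
    status = pisa.CreatePDF(html, dest=buffer)
    # status.err holds the number of errors reported during conversion.
    if status.err:
        raise RuntimeError("xhtml2pdf reported %d error(s)" % status.err)
    return buffer.getvalue()
```

The returned bytes can be written to a file opened in binary mode or sent as an HTTP response body.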

Running tests

Two different test suites are available to assert that xhtml2pdf works reliably:

Unit tests. The unit testing framework is currently minimal, but is being improved on a regular basis (contributions welcome). The tests run in the expected way for Python's unittest module, for example with nosetests (or your personal favorite test runner):
```
nosetests --with-coverage
```
Functional tests. Thanks to mawe42's super cool work, a full functional test suite is available at testrender/.

Contact

This project is community-led! Feel free to open up issues on GitHub about new ideas to improve xhtml2pdf.

History

These are the major milestones and the maintainers of the project:

2000-2007, commercial project, spirito.de, written by Dirk Holtwick
2007-2010 Dirk Holtwick (project named "Pisa", project released as GPL)
2010-2012 Dirk Holtwick (project named "xhtml2pdf", changed license to Apache)
2012-2015 Chris Glass (@chrisglass)
2015-2016 Benjamin Bach (@benjaoming)
2016-2018 Sam Spencer (@LegoStormtroopr)
2018-Current Luis Zarate (@luisza)

For more history, see the CHANGELOG.txt file.

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments

Problems with some Unicode characters
Hi, I'm using the latest xhtml2pdf (0.2b1) & reportlab (3.4.0) through django-easy-pdf (0.1.0) on Python 3.6.0 and it's working great for the most part! One problem I am still experiencing, though, is that some Unicode characters are not rendering properly (šŠčČćĆđĐžŽ):

I'm using the default django-easy-pdf base template and I found that I can somewhat repair things if I override it to declare the html encoding:

{% block extra_style %} <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> {% endblock %}

Which results in some characters being rendered correctly like Š and Ž, but not all of them (Č, Ć, Đ are still blacked out).

I tried experimenting with different font declarations (sans-serif, serif, external fonts), but I can't seem to fix this. The characters are never rendered correctly. I don't know if I'm missing some xhtml2pdf / Reportlab setting here. Do you maybe have an idea of a possible solution?
Fonts
opened by metakermit 48
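
One possible explanation, offered here as an assumption rather than a confirmed diagnosis: PDF's built-in fonts such as Helvetica are commonly used with a Windows-1252-style encoding, which includes Š and Ž but not Č, Ć or Đ. That would match the symptoms described above. The short check below lists the characters of a string that fall outside cp1252, i.e. the ones that would probably need an embedded TrueType font; the helper name is illustrative.

```python
def missing_from_cp1252(text):
    """Return the characters of text that cannot be encoded as cp1252."""
    missing = []
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


print(missing_from_cp1252("šŠčČćĆđĐžŽ"))
```

For the string from the report, this lists č, Č, ć, Ć, đ and Đ, while š, Š, ž and Ž pass.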

black square box while generating pdf (unicode error)

A weird problem: while generating a PDF, black square boxes appear in place of Unicode characters. I don't know if it's a Unicode or a font-face error. I don't even know whether to use "font-face" and "font-family" to get the Unicode text into the PDF. Am I missing anything? Many thanks.

Code snippet

# -*- coding: utf-8 -*-
from io import BytesIO

from xhtml2pdf import pisa

source = """<html>
            <style>
                @font-face {
                font-family: Mangal;
                src: url("mangal.ttf");
                }

                body {
                font-family: Mangal;
                }
            </style>
            <body>
                This is a test <br/>
                       सरल
            </body>
        </html>"""

# Utility function
def convertHtmlToPdf(source):
    pdf = BytesIO()
    # CreatePDF reads the encoded HTML and writes the PDF bytes into pdf
    pisaStatus = pisa.CreatePDF(BytesIO(source.encode('utf-8')), pdf)

    # pisaStatus.err counts errors, so zero means success
    print("Success:", not pisaStatus.err)
    return pdf

# Main program
if __name__ == "__main__":
    pisa.showLogging()
    pdf = convertHtmlToPdf(source)
    with open("test.pdf", "w+b") as fd:
        fd.write(pdf.getvalue())

opened by beebek 31

Twitter-Bootstrap Causes Selector CSSParseError
Twitter Bootstrap has some pretty gnarly CSS selectors that xhtml2pdf doesn't like.

Result is:

Selector Pseudo Function closing ')' not found:: (u':not(', u'[controls]) {\n disp')

pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("UTF-8")), dest=result, link_callback=fetch_resources )
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/document.py" in pisaDocument

encoding, context=context, xml_output=xml_output)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/document.py" in pisaStory

pisaParser(src, context, default_css, xhtml, encoding, xml_output)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/parser.py" in pisaParser

context.parseCSS()
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/context.py" in parseCSS

self.css = self.cssParser.parse(self.cssText)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in parse

src, stylesheet = self._parseStylesheet(src)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseStylesheet

src, atResults = self._parseAtKeyword(src)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseAtKeyword

src, result = self._parseAtImports(src)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseAtImports

stylesheet = self.cssBuilder.atImport(import_, mediums, self)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/css.py" in atImport

return cssParser.parseExternal(import_)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/context.py" in parseExternal

result = self.parse(cssFile.getData())
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in parse

src, stylesheet = self._parseStylesheet(src)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseStylesheet

src, ruleset = self._parseRuleset(src)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseRuleset

src, selectors = self._parseSelectorGroup(src)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSelectorGroup

src, selector = self._parseSelector(src)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSelector

src, selector = self._parseSimpleSelector(src)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSimpleSelector

src, selector = self._parseSelectorPseudo(src, selector)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSelectorPseudo

raise self.ParseError('Selector Pseudo Function closing \')\' not found', src, ctxsrc)

Exception Type: CSSParseError at /p/pdf/gd8lx6xbl Exception Value: Selector Pseudo Function closing ')' not found:: (u':not(', u'[controls]) {\n disp')
opened by Miserlou 22
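
A possible workaround, sketched here under the assumption that the offending rules are not needed in the PDF, is to drop every ruleset whose selector uses :not( before the stylesheet reaches xhtml2pdf. The function below is a crude regular-expression filter, not a CSS parser and not part of xhtml2pdf; its name is hypothetical.

```python
import re

# Matches "selector { declarations }" where neither part contains braces.
_RULESET = re.compile(r"[^{}]+\{[^{}]*\}")


def drop_not_rules(css):
    """Remove rulesets whose selector contains the :not( pseudo-function."""

    def keep_or_drop(match):
        selector = match.group(0).split("{", 1)[0]
        return "" if ":not(" in selector else match.group(0)

    return _RULESET.sub(keep_or_drop, css)


css = "a { color: red } audio:not([controls]) { display: none } b { margin: 0 }"
print(drop_not_rules(css))
```

Because the filter works on text, braces inside CSS comments or strings would confuse it; it is only meant for pre-cleaning a known stylesheet.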
Now broken with html5lib
From https://pypi.python.org/pypi/html5lib/0.99999999:

Move a whole load of stuff (inputstream, ihatexml, trie, tokenizer, utils) to be underscore prefixed to clarify their status as private

Except https://github.com/xhtml2pdf/xhtml2pdf/blob/master/xhtml2pdf/parser.py#L17:

from html5lib import treebuilders, inputstream

Current fix:

Use `pip install html5lib==1.0b8`
opened by LegoStormtroopr 18
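
Besides pinning html5lib, code that needs the module under both layouts could try the new underscore-prefixed name first and fall back to the old one. The helper below is a generic sketch of that pattern using importlib; import_first is a made-up name, and this is not necessarily how xhtml2pdf resolved the issue.

```python
import importlib


def import_first(*module_names):
    """Return the first module that imports successfully, else None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Newer html5lib releases expose _inputstream; older ones expose inputstream.
inputstream = import_first("html5lib._inputstream", "html5lib.inputstream")
```

If neither name can be imported, inputstream is None and the caller has to handle that case explicitly.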
Python 3

I made some changes so that the tests now run in both Python 2 and Python 3. Most of the changes I made were the same as those made by @wylee in #205.

I also added a file to do Travis CI testing #202, and updated some of the dependencies.

opened by JimInCO 18

Add optional pisaDocument argument to set metadata

Without this the functionality of pisaDocument would need to be recreated in order to set metadata such as the document author.

Usage is like so:

pisaDocument(
    src=io.StringIO(html),
    dest=open(output_file, "wb"),  # PDF output is binary, so open in "wb" mode
    context_meta={
        "author": "MyCorp Ltd.",
        "title": "My Document Title",
        "subject": "My Document Subject",
        "keywords": "pdf,documents",
    },
)

opened by alistair-broomhead 16

I need to step back as a maintainer

I've been neglecting my responsibilities to this project.

While I was happy to take it on, and to help curate other contributors' Python 3 support into the codebase, I have to step back.

Apologies to anyone who is still using this. :(

opened by LegoStormtroopr 14
Python2/Python3 compatibility
So, I'm close, but for some reason, on my install, images in the docs don't show up in Python 2 and are a little smaller in Python 3.

I'm gonna fix this even if it kills me.

Todo:

[x] Figure out how to render a transparent PDF as white (-flatten doesn't work for multipage PDFs)

[ ] Make the images the right size

[x] Clean up the string.join issues in reportlab_paragraph

[ ] Fix background for tr's
opened by LegoStormtroopr 13
Unwanted Helvetica font

No matter what font I use, Helvetica is always present and it's not embedded, so most printing companies cannot print the document because a font is missing.

opened by maguayo 13
ZeroDivisionError: float division by zero
Hi, I get this error while trying to parse an HTML containing the following piece of code. I'm using the latest versions of all packages needed:

html5lib-0.90

pyPdf-1.13

reportlab-2.5

xhtml2pdf-0.0.3

and Python 2.7 (2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)])

Python Code:
-[
import cStringIO as StringIO
from xhtml2pdf import pisa
....

html = '''
<TABLE BORDER="0" CELLPADDING="2" CELLSPACING="2">
<TR>
<TD></TD>
</TR>
</TABLE>
'''
dest = file('test.pdf', "wb")
pdf = pisa.CreatePDF(
    StringIO.StringIO(html),
    dest,
    log_warn = 1,
    log_err = 1
    )
]-

Note: If I put something inside the TD (example: ".... <TD>... some stuff..... </TD>........") or I change the value of the attr cellpadding, it works!!!

Traceback:
-[
Traceback (most recent call last):
  File "C:\tmp\test.py", line 95, in
    log_err = 1
  File "C:\Python27\lib\site-packages\xhtml2pdf\document.py", line 131, in pisaDocument
    doc.build(context.story)
  File "C:\Python27\lib\site-packages\reportlab\platypus\doctemplate.py", line 880, in build
    self.handle_flowable(flowables)
  File "C:\Python27\lib\site-packages\reportlab\platypus\doctemplate.py", line 763, in handle_flowable
    if frame.add(f, canv, trySplit=self.allowSplitting):
  File "C:\Python27\lib\site-packages\reportlab\platypus\frames.py", line 174, in _add
    flowable.drawOn(canv, self._x + self._leftExtraIndent, y, _sW=aW-w)
  File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 108, in drawOn
    self._drawOn(canvas)
  File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 89, in _drawOn
    self.draw()#this is the bit you overload
  File "C:\Python27\lib\site-packages\reportlab\platypus\tables.py", line 1302, in draw
    self._drawCell(cellval, cellstyle, (colpos, rowpos), (colwidth, rowheight))
  File "C:\Python27\lib\site-packages\reportlab\platypus\tables.py", line 1393, in _drawCell
    w, h = self._listCellGeom(cellval,colwidth,cellstyle,W=W, H=H,aH=rowheight)
  File "C:\Python27\lib\site-packages\xhtml2pdf\xhtml2pdf_reportlab.py", line 710, in _listCellGeom
    return Table._listCellGeom(self, V, w, s, W=W, H=H, aH=aH)
  File "C:\Python27\lib\site-packages\reportlab\platypus\tables.py", line 377, in _listCellGeom
    vw, vh = v.wrapOn(canv, aW, aH)
  File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 119, in wrapOn
    w, h = self.wrap(aW,aH)
  File "C:\Python27\lib\site-packages\xhtml2pdf\xhtml2pdf_reportlab.py", line 693, in wrap
    return KeepInFrame.wrap(self, availWidth, availHeight)
  File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 970, in wrap
    W, H = func(s1)
  File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 951, in func
    W /= x
ZeroDivisionError: float division by zero
]-

Thanks for your great job, Shen139
opened by shen139 12
Release a new version

I just upgraded my version with the master branch from github and it fixes a ton of issues in the current 0.0.5 release. Could you release a new version so we can just use pypi?
Thanks for all the work on this :)

opened by lzantal 11
Security Vulnerability in xhtml2pdf Dependency "future"

There is a security vulnerability in the future package that does not seem likely to be resolved. Can xhtml2pdf be released to not require this package?

CVE: https://github.com/advisories/GHSA-v3c5-jqr6-7qm8

Discussions:
https://github.com/PythonCharmers/python-future/issues/612
https://github.com/PythonCharmers/python-future/pull/610

opened by jacobgqc 0
Replace PyPDF3 by pypdf
PyPDF2 has recently moved to pypdf. I'm the maintainer of PyPDF2 and pypdf.

PyPDF3 has a much smaller community than PyPDF2. I'm trying to get the community to move to pypdf: https://github.com/sfneal/PyPDF3/issues/18

Swiftly going over some issues of xhtml2pdf:

#624 : https://pypdf.readthedocs.io/en/latest/modules/PdfWriter.html#pypdf.PdfWriter.pdf_header

#454 : We added modern encryption / decryption support - https://pypdf.readthedocs.io/en/latest/user/encryption-decryption.html
opened by MartinThoma 1
TOC with points

Good Morning,

I want to create a table of contents with points (dot leaders) between the text and the page number, but only blank space is available, as in the picture below.

Does this possibility exist?

Best Regards

opened by eyesonly13 0
Footer not displayed in 0.2.8 and 0.2.7
The footer frame is not displayed in pdf file when following the documentation.

How to reproduce:

create a virtualenv with xhtml2pdf library:

python3 -m venv venv
./venv/bin/pip install xhtml2pdf

add an index.html file containing the example at https://xhtml2pdf.readthedocs.io/en/latest/format_html.html#example-with-2-static-frames-and-1-content-frame

./venv/bin/xhtml2pdf index.html

The generated index.pdf shows only 'Lyrics-R-Us' and 'To PDF or not to PDF'.

It could occur in other releases; I only checked these two.
opened by sblondon 2
AttributeError: 'PmlBaseDoc' object has no attribute '_page_count'

When I add the <pdf:pagecount> html tag within the source_html attribute of pisa.CreatePDF, copied directly from the example, it gives me the error AttributeError: 'PmlBaseDoc' object has no attribute '_page_count'

FYI, I am on a M1 Mac

opened by phillipshaong 0

Releases(v0.2.8)

v0.2.8(Jun 16, 2022)
🐛 Bug-Fixes

Fix background-image issues with #614 and pull requests with #619

Fix CSSParseError for minified @font-face definitions #609

Fixed a few typos and grammar mistakes in usage.rst documentation. #610

v0.2.7(Mar 31, 2022)
🎉 New

Add encryption and password protection

New WaterMark management system with new options

Add Graphic builder

Add signing pdfs (simple and pades)

🐛 Bug-Fixes

Remove import cycle between utils and default

Fixed link_callback construction of path

Fixed path when is relative to current path

⚠️ Deprecation

xhtml support in pisa.CreatePDF will be removed in the next release

XML2PDF and XHTML2PDF will be removed in the next release; use HTML2PDF instead

📘 Documentation

Add render pdf on documentation and add some html example.

Include graphics examples

Thanks to the following people on GitHub for contributing to this release: @marcelagz for graphics support :)
v0.2.6(Mar 11, 2022)
Drop python 2 support.

Remove most of python 2 code and cleanup

Update packages dependencies

Remove six dependency and update Readme

Set timeout in https options

Add a new file manager approach using a factory method; new classes now deal with the different types of data: B64InlineURI, LocalProtocolURI, NetworkFileUri, LocalFileURI, BytesFileUri

getColor now returns None when None is passed, ignoring the default value, but returns the default if bool(data) == False

rtl languages reversed lines added as a ParaFrag (note: not fully supported yet)

Check if Paragraph has 'rtl' attribute (note: not fully supported yet)

Fix UnboundLocalError in reportlab_paragraph (#585) (#586)

Remove usage of getStringIO (#590), which was removed from reportlab

Change test for github workflow using only Linux

Add Python 3.9, 3.10

Switch from PyPDF2 to PyPDF3

Add SVG support

Update package information.

Allow running the tests using make.

0.2.4(Jan 21, 2020)

Update link_callback documentation.

Stylize code lines in documentation.

Fixed cgi escape util on setup version.

Add test to python 3.7 and 3.8.

Fixed width assignment on fragments.

Support urllib in python 3 and python 2.

Add em unit support.

Repair base64 unescaped string.

Fixed urlparse when urls have parameters.

Fixed i_rgbcolor support.
0.2.2(Apr 17, 2018)

Include new python version in test and change requirements to force html5lib to 1.0.1.
0.2.1(Feb 15, 2018)
This new release has a lot of improvements in python 3 and demos.

Version 0.2.1

Improve python3 support - thanks luisza, andreyfedoseev and flupzor

Include new Httplibs options - thanks luisza

Support to background image - thanks flupzor

Remove python23 support - thanks flupzor

Transparent images work again in Python 3 - thanks flupzor

Readthedocs integration - thanks luisza

Update Django demo site - thanks luisza

PEP8 and cleanup code - thanks luisza

Drop the turbogears module - thanks browniebroke

0.2b(Feb 9, 2017)

0.1b2(Aug 1, 2016)

Fixes #318
0.1b1(Jun 5, 2016)

This release is possibly the final release ever of xhtml2pdf, except if someone takes over maintainership. It has Python 3 support, but there are certain bugs also that you can read about in the ~37 unclosed issues.
0.1a4(May 18, 2016)
Version 0.1alpha4

Removed PyPy support

Avoid exceptions that occur systematically depending on how narrow a text column is #309 - thanks _jkDesignDE_

Improved tests for tables #305 - thanks _taddeimania_

Fix broken empty PDFs in Python2 #301 - thanks _citizen-stig_

Unknown page sizes now raise an exception #71 - thanks _benjaoming_

Unorderable types caused by duplicate CSS selectors / rules #69 - thanks _benjaoming_

Allow empty page definition with no space after @page - #88 - thanks _benjaoming_

Error when in addFromFile using file-like object #245 - thanks _benjaoming_

Python 3: Bad table formatting with empty columns #279 - thanks _citizen-stig and benjaoming_

Removed paragraph2.py, unused ghost file since the beginning of the project #289 - thanks _citizen-stig_

Catch-all exceptions removed in a lot of places, not quite done #290 - thanks _benjaoming_

0.1a2(Apr 14, 2016)



Lektor-html-pretify - Lektor plugin to pretify the HTML DOM using Beautiful Soup

html-pretify Lektor plugin to pretify the HTML DOM using Beautiful Soup. How doe

2 Nov 8, 2022

A HTML-code compiler-thing that lets you reuse HTML code.

RHTML RHTML stands for Reusable-Hyper-Text-Markup-Language, and is pronounced "Rech-tee-em-el" despite how its abbreviation is. As the name stands, RH

4 Nov 15, 2021

Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API

Dominate Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API. It allows you to write HTML pages in pure

1.5k Jan 9, 2023

Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes

Bleach Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes. Bleach can also linkify text safely, appl

2.5k Dec 29, 2022

Standards-compliant library for parsing and serializing HTML documents and fragments in Python

html5lib html5lib is a pure-python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as is implemented by all majo

1k Dec 27, 2022

A python HTML builder library.

PyML A python HTML builder library. Goals Fully functional html builder similar to the javascript node manipulation. Implement an html parser that ret

8 Jul 4, 2022

Generate HTML using python 3 with an API that follows the DOM standard specfication.

Generate HTML using python 3 with an API that follows the DOM standard specfication. A JavaScript API and tons of cool features. Can be used as a fast prototyping tool.

114 Dec 14, 2022

Safely add untrusted strings to HTML/XML markup.

MarkupSafe MarkupSafe implements a text object that escapes characters so it is safe to use in HTML and XML. Characters that have special meanings are

514 Dec 31, 2022

Pythonic HTML Parsing for Humans™

Requests-HTML: HTML Parsing for Humans™ This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible. When us

12.9k Jan 1, 2023

Modded MD conversion to HTML

MDPortal A module to convert a md-eqsue lang to html Basically I ruined md in an attempt to convert it to html Overview Here is a demo file from parse

1 Nov 27, 2021

A jquery-like library for python

pyquery: a jquery-like library for python pyquery allows you to make jquery queries on xml documents. The API is as much as possible the similar to jq

2.2k Dec 29, 2022

This is a GUI for scrapping PDFs with the help of optical character recognition making easier than ever to scrape PDFs.

pdf-scraper-with-ocr With this tool I am aiming to facilitate the work of those who need to scrape PDFs either by hand or using tools that doesn't imp

75 Oct 21, 2022

Python utility library for compositing PDF documents with reportlab.

pdfdoc-py Python utility library for compositing PDF documents with reportlab. Installation The pdfdoc-py package can be installed directly from the s

1 Jan 6, 2022

A spider for Universal Online Judge(UOJ) system, converting problem pages to PDFs.

Universal Online Judge Spider Introduction This is a spider for Universal Online Judge (UOJ) system (https://uoj.ac/). It also works for all other Onl

1 Dec 7, 2021

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

14.8k Jan 5, 2023

That project takes as input special TXT File, divides its content into lsit of HTML objects and then creates HTML file from them.

1 Jan 10, 2022
