Standards-compliant library for parsing and serializing HTML documents and fragments in Python

Overview

html5lib

html5lib is a pure-python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as is implemented by all major web browsers.

Usage

Simple usage follows this pattern:

import html5lib
with open("mydocument.html", "rb") as f:
    document = html5lib.parse(f)

or:

import html5lib
document = html5lib.parse("<p>Hello World!")

By default, the document will be an xml.etree element instance. Whenever possible, html5lib chooses the accelerated ElementTree implementation (i.e. xml.etree.cElementTree on Python 2.x).
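Because the default result is an ordinary xml.etree element, the standard ElementTree API applies to it. One detail is worth knowing: by default html5lib puts HTML elements in the XHTML namespace (the namespaceHTMLElements option of html5lib.parse), so un-prefixed paths such as ".//p" match nothing. The sketch below does not assume html5lib is installed. It builds a stand-in tree of the same shape with ET.fromstring, and the helper paragraph_texts is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace that html5lib applies to HTML elements by default.
XHTML = "http://www.w3.org/1999/xhtml"
NS = {"h": XHTML}

# Stand-in for html5lib.parse("<p>Hello World!"): an <html> root whose
# elements live in the XHTML namespace (html5lib also adds <head> and <body>).
document = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head></head><body><p>Hello World!</p></body></html>"
)


def paragraph_texts(root):
    """Return the text of every <p> element using a namespace-aware path."""
    return [p.text for p in root.iterfind(".//h:p", NS)]


print(document.tag)               # {http://www.w3.org/1999/xhtml}html
print(document.find(".//p"))      # None: the bare tag name does not match
print(paragraph_texts(document))  # ['Hello World!']
```

With html5lib installed, document = html5lib.parse("<p>Hello World!") should be usable in place of the stand-in. Passing namespaceHTMLElements=False instead should yield un-namespaced tags.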

Two other tree types are supported: xml.dom.minidom and lxml.etree. To use an alternative format, specify the name of a treebuilder:

import html5lib
with open("mydocument.html", "rb") as f:
    lxml_etree_document = html5lib.parse(f, treebuilder="lxml")

When using urllib2 (Python 2), the charset from the HTTP headers should be passed into html5lib as follows:

from contextlib import closing
from urllib2 import urlopen
import html5lib

with closing(urlopen("http://example.com/")) as f:
    document = html5lib.parse(f, transport_encoding=f.info().getparam("charset"))

When using urllib.request (Python 3), the charset from the HTTP headers should be passed into html5lib as follows:

from urllib.request import urlopen
import html5lib

with urlopen("http://example.com/") as f:
    document = html5lib.parse(f, transport_encoding=f.info().get_content_charset())
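Both snippets rely on the response's header object to extract the charset parameter of Content-Type. In Python 3, f.info() returns an http.client.HTTPMessage, which is a subclass of email.message.Message, so get_content_charset() can be tried offline without any network access. The helper below is a hypothetical illustration. It returns None when the header carries no charset, which is assumed here to behave like not passing transport_encoding at all.

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    headers = Message()  # same base class as urllib's f.info() in Python 3
    headers["Content-Type"] = content_type
    return headers.get_content_charset()


print(charset_from_content_type("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(charset_from_content_type("text/html"))                      # None
```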

To have more control over the parser, create a parser object explicitly. For instance, to make the parser raise exceptions on parse errors, use:

import html5lib
with open("mydocument.html", "rb") as f:
    parser = html5lib.HTMLParser(strict=True)
    document = parser.parse(f)

When you're instantiating parser objects explicitly, pass a treebuilder class as the tree keyword argument to use an alternative document format:

import html5lib
parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
minidom_document = parser.parse("<p>Hello World!")

More documentation is available at https://html5lib.readthedocs.io/.

Installation

html5lib works on CPython 2.7+, CPython 3.5+ and PyPy. To install:

$ pip install html5lib

The goal is to support a (non-strict) superset of the versions that pip supports.

Optional Dependencies

The following third-party libraries may be used for additional functionality:

  • lxml is supported as a tree format (for both building and walking) under CPython (but not PyPy where it is known to cause segfaults);
  • genshi has a treewalker (but not builder); and
  • chardet can be used as a fallback when character encoding cannot be determined.
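A program can check which of these optional libraries are importable before it chooses a tree format. The sketch below is hypothetical, and its function names are not part of html5lib. It uses importlib.util.find_spec, so nothing is actually imported. It prefers the "lxml" treebuilder only on CPython, following the PyPy caveat above.

```python
import importlib.util
import platform

# Optional third-party libraries named in the list above.
OPTIONAL_LIBRARIES = ("lxml", "genshi", "chardet")


def available_optional_libraries():
    """Map each optional library to whether it can be found (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None
            for name in OPTIONAL_LIBRARIES}


def preferred_treebuilder(available=None):
    """Pick "lxml" only on CPython with lxml installed, else the default "etree"."""
    if available is None:
        available = available_optional_libraries()
    if available.get("lxml") and platform.python_implementation() == "CPython":
        return "lxml"
    return "etree"


print(available_optional_libraries())
print(preferred_treebuilder())
```

The returned name could then be passed as the treebuilder argument of html5lib.parse.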

Bugs

Please report any bugs on the issue tracker.

Tests

Unit tests require the pytest and mock libraries and can be run using the py.test command in the root directory.

Test data are contained in a separate html5lib-tests repository and included as a submodule, thus for git checkouts they must be initialized:

$ git submodule init
$ git submodule update

If you have all compatible Python implementations available on your system, you can run tests on all of them using the tox utility, which can be found on PyPI.

Questions?

There's a mailing list available for support on Google Groups, html5lib-discuss, though you may get a quicker response asking on IRC in #whatwg on irc.freenode.net.

Comments
  • fix version numbering

    I just spent 4 hours with another engineer trying to figure out why our builds weren't producing the same results.

    Turns out I had html5lib "0.9999999" and he had "0.999999999" which was nearly impossible to spot on the pip list readout. >.<

    http://semver.org/#why-use-semantic-versioning

    I'm going to guess that you'll say html5lib isn't ready for production so your versioning scheme is appropriate, but I'm still filing this bug because I'm steamed about losing so much development time. :/
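    Comparing the versions programmatically, rather than by eye, makes the difference hard to miss. Here is a small sketch. The helper names are hypothetical and not part of html5lib or pip:

```python
def version_key(version):
    """Split a dotted version such as '0.9999999' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def digits_after_dot(version):
    """Count the digits after the first dot, i.e. the number of nines here."""
    return len(version.split(".", 1)[1])


mine, theirs = "0.9999999", "0.999999999"
print(digits_after_dot(mine), digits_after_dot(theirs))  # 7 9
print(version_key(mine) < version_key(theirs))           # True
```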

    opened by phillyc 52
  • Do not directly use isolated surrogates in unicode literals

    Jython does not support isolated surrogates in unicode, including in unicode literals. This has been reported in https://github.com/html5lib/html5lib-python/issues/2. This bug is critical for Jython because html5lib is vendored by pip, and this is blocking pip from running on Jython.

    For platforms besides Jython, this pull request allows for these surrogates to be constructed in literals, but through an additional step of indirection. For Jython itself, Jython's normal decode of literals will ensure that such invalid unicode strings cannot be constructed from any source.
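    The "additional step of indirection" can be illustrated in Python 3 by building a lone surrogate at runtime with chr() from its code point, instead of spelling it as a literal in the source. This is a sketch of the idea, not the code from the pull request. On Python 2 (and Jython 2.7), the equivalent builtin is unichr().

```python
def lone_surrogate(codepoint):
    """Build an isolated surrogate at runtime instead of writing it as a literal."""
    if not 0xD800 <= codepoint <= 0xDFFF:
        raise ValueError("not a surrogate code point: %#x" % codepoint)
    return chr(codepoint)


high = lone_surrogate(0xD800)
print(hex(ord(high)))  # 0xd800

# Such strings can exist in CPython but are not valid Unicode text,
# so a strict UTF-8 encode rejects them.
try:
    high.encode("utf-8")
except UnicodeEncodeError:
    print("lone surrogates cannot be encoded as UTF-8")
```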

    To run this on Jython:

    1. Install https://bitbucket.org/jimbaker/jython-socket-reboot, following these instructions: https://wiki.python.org/jython/JythonDeveloperGuide
    2. Use this branch of pip to install nosetests, etc.: https://github.com/jimbaker/pip. Note that tox is not yet supported, because we need to get pip running first! :)

    Note that in the dev build, you will find executables in dist/bin, such as dist/bin/jython or dist/bin/pip

    The jython-socket-reboot branch is nearly complete for merging into Jython; it is a major component of Jython 2.7.0 beta 3. (I'm a core dev of Jython.)

    opened by jimbaker 28
  • New release on PyPI?

    Since this library is vendored by pip, which is itself vendored in CPython's ensurepip module, the fact that there's no release that includes PR #403 is blocking the removal of the deprecated abstract base classes from the collections module, long advertised as happening in Python 3.8, see https://github.com/python/cpython/pull/10596

    Normally I am the last person to ask for releases (I assume maintainers have good reasons for not doing so, and I usually don't like being on the receiving end of such requests), but the deadline for the 3.8 beta release is coming up very soon, so if there's any way to expedite an html5lib release it would really help that effort, and it would avoid stickier solutions like patching html5lib directly in pip.
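    The deprecated names referred to here are the abstract base classes, such as Mapping, that used to be importable directly from collections. The following is a common compatibility pattern, shown as an assumption rather than as the contents of PR #403. It imports the ABCs from collections.abc and falls back only on Python 2:

```python
try:
    # Python 3.3+: the ABCs live in collections.abc.
    from collections.abc import Mapping, MutableMapping
except ImportError:
    # Python 2.7: collections.abc does not exist yet.
    from collections import Mapping, MutableMapping


def is_mapping(obj):
    """True for dict-like objects, regardless of Python version."""
    return isinstance(obj, Mapping)


print(is_mapping({}), is_mapping([]))    # True False
print(issubclass(dict, MutableMapping))  # True
```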

    opened by pganssle 27
  • Update Travis to use tox, and add Appveyor CI

    Update tox.ini to utilise requirements-test.txt, and run pylint. Require lxml 3.6.0 on Windows as it has wheels available for 2.6-3.4.

    Also enables coverage for PyPy on Travis.

    opened by jayvdb 17
  • WIP More general fix for #127 with addinfourl

    Do not merge yet, this is incomplete.

    As discussed in #134, I changed this to avoid .read(0) entirely and pass a first chunk to HTMLUnicodeInputStream and HTMLBinaryInputStream, but then I get lost in their implementation and I don’t know what to do with that first chunk.

    Surprisingly, most tests still passed because .seek(0) is used in some places. I added a failing test with a non-seekable input.
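    The approach described here, reading a first chunk once and handing it on instead of calling .read(0) or .seek(0), can be sketched as a wrapper. The wrapper replays the consumed chunk before delegating to the underlying stream. Both classes below are hypothetical illustrations, not html5lib's input stream classes:

```python
import io


class NonSeekable(io.RawIOBase):
    """Stand-in for a non-seekable source such as a socket-backed response."""

    def __init__(self, data):
        super().__init__()
        self._buf = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, b):
        chunk = self._buf.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)


class PrefixedStream:
    """Serve an already-consumed first chunk, then read from the stream."""

    def __init__(self, first_chunk, stream):
        self._prefix = first_chunk
        self._stream = stream

    def read(self, size=-1):
        if size is None or size < 0:
            data = self._prefix + self._stream.read()
            self._prefix = b""
            return data
        data, self._prefix = self._prefix[:size], self._prefix[size:]
        if len(data) < size:
            data += self._stream.read(size - len(data)) or b""
        return data


source = NonSeekable(b"<p>Hello World!")
first = source.read(4)  # e.g. the bytes inspected while sniffing the encoding
stream = PrefixedStream(first, source)
print(stream.read())    # b'<p>Hello World!'
```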

    opened by SimonSapin 14
  • How could html5lib be made faster?

    html5lib is nice, but it's pretty slow. On a fairly large test file, lxml took 50ms and html5lib took 5 seconds, which is 100 times slower.

    Are there any particularly slow parts of html5lib that could be optimized? Would compiling it with Cython help?
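    To reproduce a comparison like this, each parser can be timed on the same input and the results divided: 5 s against 50 ms is 5 / 0.05 = 100 times slower. The harness below is a sketch. stdlib_parse is a hypothetical stand-in built on the standard library's html.parser, to be swapped for html5lib.parse or lxml.html.fromstring when those are installed. Taking the minimum over several repeats reduces timing noise.

```python
import timeit
from html.parser import HTMLParser


def best_seconds(parse, document, number=3, repeat=3):
    """Best average time of one parse(document) call, in seconds."""
    timings = timeit.repeat(lambda: parse(document), number=number, repeat=repeat)
    return min(timings) / number


def slowdown(slow_seconds, fast_seconds):
    """How many times slower the first measurement is than the second."""
    return slow_seconds / fast_seconds


def stdlib_parse(document):
    """Hypothetical stand-in parser; replace with the parsers being compared."""
    parser = HTMLParser()
    parser.feed(document)
    parser.close()


sample = "<p>Hello World!</p>" * 1000
print("%.6f s per parse" % best_seconds(stdlib_parse, sample))
print(slowdown(5.0, 0.050))  # the ratio reported above
```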

    opened by tbodt 13
  • test not running when building on opensuse build server

    Hi

    trying to package html5lib for opensuse, but I'm running into the following error when trying to run the tests:

    [ 113s] File "/home/abuild/rpmbuild/BUILD/html5lib-0.999999999/html5lib/tests/tokenizertotree.py", line 10, in
    [ 113s] from . import test_tokenizer
    [ 113s] ImportError: cannot import name 'test_tokenizer'

    I can't see that file in the tar-ball or in git over here. Building from the tar-ball on pypi using python 3.5.1.

    opened by arunpersaud 11
  • Declare implicit dependency on Six 1.9 or higher

    This project uses a syntax for importing urllib.parse from six which was introduced in Six version 1.4.

    The import can be found here

    The relevant change in Six can be found here

    This can cause an issue if a project has a previous version of Six (say version 1.3) installed when installing html5lib.

    EDIT: importing viewkeys in html5parser.py requires six 1.9
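    You can check at runtime whether an environment meets such a minimum with importlib.metadata (Python 3.8+). The helpers below are a hypothetical sketch. Comparing only the first two numeric components is a simplifying assumption. The durable fix is still to declare the minimum, such as six>=1.9, in the package's install requirements.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def major_minor(version):
    """'1.16.0' -> (1, 16); only the first two components are compared."""
    return tuple(int(part) for part in version.split(".")[:2])


def six_is_new_enough(minimum=(1, 9)):
    version = installed_version("six")
    return version is not None and major_minor(version) >= minimum


print(major_minor("1.3.0") >= (1, 9))   # False: older than the 1.9 minimum
print(major_minor("1.16.0") >= (1, 9))  # True
print(six_is_new_enough())
```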

    opened by amorde 10
  • Selected patches from Calibre

    See #119. CC @kovidgoyal.

    This cherry-picks a few things from https://github.com/gsnedders/html5lib-python/commits/calibre-patches, which was a complete set of Calibre's patches from November 2013. https://github.com/kovidgoyal/calibre/commits/master/src/html5lib has very little changed in it since then, primarily a move to 0.999999-dev and a separate downstream fix for https://github.com/html5lib/html5lib-python/commit/0c551c9519e47f76f8f185089ed71cb9539b6e00.

    So, of those on that branch…

    • https://github.com/gsnedders/html5lib-python/commit/49f37d2724b117b1eaa1ba7a23d27aef14db6932, https://github.com/gsnedders/html5lib-python/commit/64e8b0bd8e9bc4dbb8339ab0e094ef933035339c, and https://github.com/gsnedders/html5lib-python/commit/cc9f28af4859663738c67971f9893e2558f86138 are all things I'm against landing upstream (they should just be special-cased in the downstream tree builder rather than requiring invoking separate handling for all tree builders).
    • https://github.com/gsnedders/html5lib-python/commit/7702d80e52c1d1c90ea41af6808623c78b518f01 is something I find very inelegant, and goes against the recommendations laid out in PEP 8. Also, with CPython, in the True/False cases it's likely slower, therefore failing at its stated goal, as it results in more byte code and POP_JUMP_IF_FALSE and POP_JUMP_IF_TRUE special-case the condition being True or False (oddly, they don't specialise None, though it is in PyObject_IsTrue; if that makes any notable performance difference then I'd suggest fixing that in CPython).
    • https://github.com/gsnedders/html5lib-python/commit/a2d2e05fb667683db9e020e7a2629e62fdc95832 is unlikely to land because the input stream isn't public API (not that we document what is anywhere…).
    • https://github.com/gsnedders/html5lib-python/commit/1576515fcd62af2969e387c92846790ec177a6db in principle is something we want, but I think we want for all tokens and not just elements. (See also #87 and #97.) That or something similar will likely land in future.
    • https://github.com/gsnedders/html5lib-python/commit/124e97565affa33dd36fd29c874020c72cce29fb is included here, after https://github.com/html5lib/html5lib-python/commit/9dca7d8e8e2eb95f8e0f9acba0a29386129b272e which fixes the DOM tree builder to follow the tree builder API correctly. (It exposed bugs in the DOM tree builder, yay!)
    • https://github.com/gsnedders/html5lib-python/commit/db0b5a02a2cb56d0410b878005ca5fdfa91961b2 is what most of the commits on this PR deal with, fixing up small issues with the Calibre code.
    needs-tests 
    opened by gsnedders 10
  • Add GitHub Actions workflow for tox

    Provides a migration path for #525.

    • ~~The python debug.py post-build step from .travis.yml has not been migrated; GitHub Actions workflows support a continue-on-error configuration flag at the job and step level, but I don't yet see whether this can be used to provide an equivalent 'post-job' command.~~
    • ~~As of the initial version of this pull request, builds take place within a single GitHub Actions workflow that runs tox in parallel mode. Build output for each Python environment is present in the workflow console, but it's not as human-readable as the grouping provided by Travis CI.~~
    opened by jayaddison 9
  • Add some missing MathML elements and attributes.

    (in particular this adds the semantics/annotation/annotation-xml elements required to fix a very old planet bug http://www.maths-informatique-jeux.com/blog/frederic/?post/2013/01/22/Analysis-of-Lithium-s-algorithm#c142)

    opened by fred-wang 9
  • On html5lib maintenance

    Hi there!

    https://github.com/html5lib/html5lib-python/issues/361 discusses funding for maintenance, but that conversation languished a year and a half ago.

    It looks like there's a bunch of PRs that would bring this repo a bit into the present (and into the future!), but there's... maybe no one around to merge them? Sorry for the ping, but @gsnedders, @jgraham, you seem to have been the last folks to merge PRs around here; would you need a hand maintaining things and getting things rolling again?

    For clarity, my angle here is that I inadvertently caused a bit of havoc over at nbconvert by switching their HTML sanitization away from lxml.clean_html to bleach, which, as we know, uses html5lib under the hood, and there are some performance issues now; see https://github.com/jupyter/nbconvert/issues/1892. Also, for the record, I'm no stranger to maintaining a popular PyPI package; I'm the current de facto maintainer of babel.

    opened by akx 0
  • Bump Flake8 to fix CI for Python 3.7

    The Python 3.7 CI job has started failing:

        self._load_entrypoint_plugins()
      File "/home/runner/work/html5lib-python/html5lib-python/.tox/py/lib/python3.7/site-packages/flake8/plugins/manager.py", line 254, in _load_entrypoint_plugins
        eps = importlib_metadata.entry_points().get(self.namespace, ())
    AttributeError: 'EntryPoints' object has no attribute 'get'
    ERROR: InvocationError for command /home/runner/work/html5lib-python/html5lib-python/.tox/py/bin/flake8 . (exited with code 1)
    

    This is because importlib_metadata 5.0 removes some deprecations, and it's a dependency of Flake8. But new Flake8 works, so let's update:

    • https://github.com/python/importlib_metadata/issues/406
    • https://github.com/PyCQA/flake8/issues/1701
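    The breaking change is that entry_points() no longer returns a dict-like object with .get(). Code that has to run against both the old and new APIs can prefer .select() when it exists. The sketch below uses the standard-library importlib.metadata. The function name is hypothetical, and this is not Flake8's own fix, which shipped in its newer releases:

```python
from importlib import metadata


def entry_points_for(group):
    """Return the entry points registered under `group` as a list."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Selectable API (stdlib Python 3.10+).
        return list(eps.select(group=group))
    # Older dict-based API (stdlib Python 3.8 and 3.9).
    return list(eps.get(group, ()))


for ep in entry_points_for("console_scripts")[:5]:
    print(ep.name, "->", ep.value)
```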

    Also fix the new Flake8 findings:

    ./html5lib/_inputstream.py:327:27: E275 missing whitespace after keyword
    ./html5lib/serializer.py:225:15: E275 missing whitespace after keyword
    ./html5lib/serializer.py:232:15: E275 missing whitespace after keyword
    ./html5lib/treewalkers/etree.py:40:19: E275 missing whitespace after keyword
    ./html5lib/treebuilders/etree.py:111:19: E275 missing whitespace after keyword
    ./html5lib/treebuilders/etree.py:204:19: E275 missing whitespace after keyword
    ./html5lib/tests/test_serializer.py:77:19: E275 missing whitespace after keyword
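    E275 flags a keyword written directly against an opening parenthesis, for example not(x) instead of not x. A simplified detector built on the standard tokenize module illustrates the rule. It is a hypothetical sketch, not pycodestyle's implementation, and it may differ from pycodestyle in edge cases.

```python
import io
import keyword
import tokenize


def e275_candidates(source):
    """Yield (line, col) of keywords immediately followed by '('."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    for tok, nxt in zip(tokens, tokens[1:]):
        if (tok.type == tokenize.NAME and keyword.iskeyword(tok.string)
                and nxt.type == tokenize.OP and nxt.string == "("
                and nxt.start == tok.end):
            yield tok.start


print(list(e275_candidates("if not(ready):\n    pass\n")))   # [(1, 3)]
print(list(e275_candidates("if not (ready):\n    pass\n")))  # []
```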
    
    opened by hugovk 0
  • 1.1: sphinx warnings about `reference target not found`

    When building my packages, I use the sphinx-build command with the -n switch, which shows warnings about missing references. These are not critical issues. Here is the output with warnings:

    + /usr/bin/sphinx-build -n -T -b man doc build/sphinx/man
    Running Sphinx v4.5.0
    making output directory... done
    building [mo]: targets for 0 po files that are out of date
    building [man]: all manpages
    updating environment: [new config] 10 added, 0 changed, 0 removed
    reading sources... [100%] movingparts
    looking for now-outdated files... none found
    pickling environment... done
    checking consistency... done
    writing... python-html5lib.3 { movingparts modules html5lib html5lib.filters html5lib.treebuilders html5lib.treewalkers html5lib.treeadapters changes license }
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/movingparts.rst:7: WARNING: py:class reference target not found: html5lib.serializer.HTMLSerializer
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/movingparts.rst:18: WARNING: py:mod reference target not found: xml.etree
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/movingparts.rst:23: WARNING: py:mod reference target not found: xml.dom.minidom
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/movingparts.rst:25: WARNING: py:mod reference target not found: lxml.etree
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/movingparts.rst:61: WARNING: py:class reference target not found: html5lib.serializer.HTMLSerializer
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/movingparts.rst:86: WARNING: py:class reference target not found: html5lib.serializer.HTMLSerializer
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/movingparts.rst:160: WARNING: py:mod reference target not found: chardet
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.rst:7: WARNING: py:mod reference target not found: constants
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.rst:14: WARNING: py:mod reference target not found: html5parser
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.rst:22: WARNING: py:mod reference target not found: serializer
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/html5lib/serializer.py:docstring of html5lib.serializer.serialize:9: WARNING: py:class reference target not found: html5lib.serializer.HTMLSerializer
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.filters.rst:4: WARNING: py:mod reference target not found: base
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.filters.rst:12: WARNING: py:mod reference target not found: alphabeticalattributes
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/html5lib/filters/alphabeticalattributes.py:docstring of html5lib.filters.alphabeticalattributes.Filter:1: WARNING: py:class reference target not found: html5lib.filters.base.Filter
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.filters.rst:20: WARNING: py:mod reference target not found: inject_meta_charset
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/html5lib/filters/inject_meta_charset.py:docstring of html5lib.filters.inject_meta_charset.Filter:1: WARNING: py:class reference target not found: html5lib.filters.base.Filter
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.filters.rst:28: WARNING: py:mod reference target not found: lint
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/html5lib/filters/lint.py:docstring of html5lib.filters.lint.Filter:1: WARNING: py:class reference target not found: html5lib.filters.base.Filter
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.filters.rst:36: WARNING: py:mod reference target not found: optionaltags
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/html5lib/filters/optionaltags.py:docstring of html5lib.filters.optionaltags.Filter:1: WARNING: py:class reference target not found: html5lib.filters.base.Filter
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.filters.rst:44: WARNING: py:mod reference target not found: sanitizer
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/html5lib/filters/sanitizer.py:docstring of html5lib.filters.sanitizer.Filter:1: WARNING: py:class reference target not found: html5lib.filters.base.Filter
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.filters.rst:52: WARNING: py:mod reference target not found: whitespace
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/html5lib/filters/whitespace.py:docstring of html5lib.filters.whitespace.Filter:1: WARNING: py:class reference target not found: html5lib.filters.base.Filter
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.treebuilders.rst:4: WARNING: py:mod reference target not found: treebuilders
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.treebuilders.rst:12: WARNING: py:mod reference target not found: base
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.treebuilders.rst:20: WARNING: py:mod reference target not found: dom
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.treebuilders.rst:28: WARNING: py:mod reference target not found: etree
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.treebuilders.rst:36: WARNING: py:mod reference target not found: etree_lxml
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.treewalkers.rst:4: WARNING: py:mod reference target not found: treewalkers
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.treewalkers.rst:12: WARNING: py:mod reference target not found: base
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.treewalkers.rst:20: WARNING: py:mod reference target not found: dom
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.treewalkers.rst:28: WARNING: py:mod reference target not found: etree
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.treewalkers.rst:36: WARNING: py:mod reference target not found: etree_lxml
    /home/tkloczko/rpmbuild/BUILD/html5lib-python-1.1/doc/html5lib.treewalkers.rst:44: WARNING: py:mod reference target not found: genshi
    done
    build succeeded, 35 warnings.
    
    opened by kloczek 0
  • Enforce https://reuse.software compliance

    The https://reuse.software specification is the standard for providing machine-readable licensing and copyright information.

    I believe:

    • machine-readable metadata is a good thing.
    • licensing and copyright matter.
    • automation is a good thing.

    Therefore I propose we:

    1. Make this repository compliant with the https://reuse.software specification
    2. Enforce compliance via a GitHub Actions workflow
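    The sketch below gives a rough idea of what such a check verifies: it lists files whose header lacks an SPDX-License-Identifier tag. It is a hypothetical and much simpler stand-in for the REUSE project's own "reuse lint" tool, which also checks copyright notices and license texts.

```python
import os
import tempfile

SPDX_TAG = "SPDX-License-Identifier:"


def files_missing_spdx(paths, header_lines=10):
    """Return the paths whose first `header_lines` lines contain no SPDX tag."""
    missing = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            head = [next(handle, "") for _ in range(header_lines)]
        if not any(SPDX_TAG in line for line in head):
            missing.append(path)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    tagged = os.path.join(tmp, "tagged.py")
    untagged = os.path.join(tmp, "untagged.py")
    with open(tagged, "w", encoding="utf-8") as handle:
        handle.write("# SPDX-License-Identifier: MIT\nimport html5lib\n")
    with open(untagged, "w", encoding="utf-8") as handle:
        handle.write("import html5lib\n")
    print([os.path.basename(p) for p in files_missing_spdx([tagged, untagged])])
    # ['untagged.py']
```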
    opened by hexagonrecursion 0
  • Remove Travis CI in favour of GitHub Actions

    Fixes https://github.com/html5lib/html5lib-python/issues/525.

    GitHub Actions was added in https://github.com/html5lib/html5lib-python/pull/527, and tests are also run on AppVeyor for Windows (this could also be moved to GHA, but that's another issue).

    Also Travis CI hasn't been running for 10 months:

    https://travis-ci.org/github/html5lib/html5lib-python/builds

    opened by hugovk 3