119 Python PAGE-XML Libraries

Deep learning based page layout analysis

Deep Learning Based Page Layout Analysis: This is a Python implementation of a page layout analysis tool. The goal of page layout analysis is to segment page

186 Dec 29, 2022

Text page dewarping using a "cubic sheet" model

page_dewarp: Page dewarping and thresholding using a "cubic sheet" model. See the full writeup at https://mzucker.github.io/2016/08/15/page-dewarping.html

1.2k Dec 29, 2022

PubMed Mapper: A Python library that maps PubMed XML to Python objects

pubmed-mapper: A Python library that maps PubMed XML to Python objects. 1. Philosophy: Programmatically accessing PubMed articles is a common ta

33 Dec 8, 2022

A Python toolkit for processing tabular data

401 Dec 19, 2022

Generate a cool README/About me page for your GitHub Profile

GitHub Profile README/About Me Generator 💯: This webapp lets you build a cool README for your profile. A few inputs + ~15 mins = Your GitHub Profile

179 Jan 7, 2023

changedetection.io - The best and simplest self-hosted website change detection monitoring service

changedetection.io - The best and simplest self-hosted website change detection monitoring service. An alternative to Visualping, Watchtower etc. Designed for simplicity - the main goal is to simply monitor which websites had a text change. Open source web page change detection.

7.3k Jan 1, 2023

Module for automatic summarization of text documents and HTML pages.

Automatic text summarizer: Simple library and command line utility for extracting summaries from HTML pages or plain texts. The package also contains sim

2.5k Feb 17, 2021

Module for automatic summarization of text documents and HTML pages.

Automatic text summarizer: Simple library and command line utility for extracting summaries from HTML pages or plain texts. The package also contains sim

3k Jan 8, 2023

An MkDocs plugin that simplifies configuring page titles and their order

MkDocs Awesome Pages Plugin: An MkDocs plugin that simplifies configuring page titles and their order. The awesome-pages plugin allows you to customize

282 Dec 28, 2022

Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)

trafilatura: Web scraping tool for text discovery and retrieval. Description: Trafilatura is a Python package and command-line tool which seamlessly dow

704 Jan 6, 2023

A Python module to bypass Cloudflare's anti-bot page.

cloudscraper: A simple Python module to bypass Cloudflare's anti-bot page (also known as "I'm Under Attack Mode", or IUAM), implemented with Requests.

2.6k Dec 31, 2022

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

Parsel: Parsel is a BSD-licensed Python library to extract and remove data from HTML and XML using XPath and CSS selectors, optionally combined with re

859 Dec 29, 2022

A modern CSS selector implementation for BeautifulSoup

Soup Sieve. Overview: Soup Sieve is a CSS selector library designed to be used with Beautiful Soup 4. It aims to provide selecting, matching, and filter

151 Dec 23, 2022

The lxml XML toolkit for Python

What is lxml? lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language. It's also very fast and memory

2.3k Jan 2, 2023

Module for automatic summarization of text documents and HTML pages.

Automatic text summarizer: Simple library and command line utility for extracting summaries from HTML pages or plain texts. The package also contains sim

3k Jan 3, 2023

Create Open XML PowerPoint documents in Python

python-pptx is a Python library for creating and updating PowerPoint (.pptx) files. A typical use would be generating a customized PowerPoint presenta

1.7k Jan 5, 2023

Python module that makes working with XML feel like you are working with JSON

xmltodict: xmltodict is a Python module that makes working with XML feel like you are working with JSON, as in this "spec": print(json.dumps(xmltod

5k Jan 4, 2023

Converts XML to Python objects

untangle: Converts XML to a Python object. Siblings with similar names are grouped into a list. Children can be accessed with parent.chil

567 Nov 30, 2022

Safely add untrusted strings to HTML/XML markup.

MarkupSafe: MarkupSafe implements a text object that escapes characters so it is safe to use in HTML and XML. Characters that have special meanings are

514 Dec 31, 2022
