Python Web Content Extracting Libraries

Filter By

Python Web Content Extracting

Every web site provides APIs.

Toapi Overview Toapi give you the ability to make every web site provides APIs. Version v2.0.0, Completely rewrote. More elegant. More pythonic v1.0.0

3.3k Jan 5, 2023

Module for automatic summarization of text documents and HTML pages.

Automatic text summarizer Simple library and command line utility for extracting summary from HTML pages or plain texts. The package also contains sim

3k Jan 3, 2023

Github Actions采集RSS, 打造无广告内容优质的头版头条超赞宝藏页

Github Actions Rss (garss, 嘎RSS! 已收集69个RSS源, 生成时间: 2021-02-26 11:23:45) 信息茧房是指人们关注的信息领域会习惯性地被自己的兴趣所引导，从而将自己的生活桎梏于像蚕茧一般的“茧房”中的现象。

721 Jan 2, 2023

RSS feed generator website with user friendly interface

331 Jan 2, 2023

News, full-text, and article metadata extraction in Python 3. Advanced docs:

Newspaper3k: Article scraping & curation Inspired by requests for its simplicity and powered by lxml for its speed: "Newspaper is an amazing python li

12.3k Jan 1, 2023

Pythonic HTML Parsing for Humans™

Requests-HTML: HTML Parsing for Humans™ This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible. When us

12.9k Jan 1, 2023

Convert HTML to Markdown-formatted text.

html2text html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to

1.3k Dec 31, 2022

Open clone of OpenAI's unreleased WebText dataset scraper.

Open clone of OpenAI's unreleased WebText dataset scraper. This version uses pushshift.io files instead of the API for speed.

471 Dec 30, 2022

Web Content Retrieval for Humans™

Lassie Lassie is a Python library for retrieving basic content from websites. Usage import lassie lassie.fetch('http://www.youtube.com/watch?v

571 Dec 29, 2022

fast python port of arc90's readability tool, updated to match latest readability.js!

python-readability Given a html document, it pulls out the main body text and cleans it up. This is a python port of a ruby port of arc90's readabilit

2.2k Dec 28, 2022

a small library for extracting rich content from urls

A small library for extracting rich content from urls. what does it do? micawber supplies a few methods for retrieving rich metadata about a variety o