Overview
Robox is a simple library with a clean interface for exploring or scraping the web, or for testing a website you're developing. Robox can fetch a page, click on links and buttons, and fill out and submit forms. It is built on top of two excellent libraries: httpx and beautifulsoup4.
Robox has all the standard features of httpx, including async, plus:
- clean API
- caching
- downloading files
- history
- understands robots.txt
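How Robox applies robots.txt internally isn't described here. As a rough illustration of what "understanding robots.txt" involves, Python's standard-library `urllib.robotparser` can evaluate the rules a crawler is expected to honour. This is not Robox's implementation; the robots.txt body and the `my-bot` user agent below are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt body; a crawler normally downloads this
# from https://<host>/robots.txt before fetching other pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = RobotFileParser()
# parse() accepts the file as a list of lines, so no network access is needed here.
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch(user_agent, url) answers whether that agent may request the URL.
print(parser.can_fetch("my-bot", "https://example.com/index.html"))         # True
print(parser.can_fetch("my-bot", "https://example.com/private/data.html"))  # False
# crawl_delay() returns the delay in seconds that applies to the agent, or None.
print(parser.crawl_delay("my-bot"))                                          # 2
```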
Examples
```python
from robox import Robox

with Robox() as robox:
    page = robox.open("https://httpbin.org/forms/post")
    form = page.get_form()
    form.fill_in("custname", value="foo")
    form.check("topping", values=["Onion"])
    form.choose("size", option="Medium")
    form.fill_in("comments", value="all good in the hood")
    form.fill_in("delivery", value="13:37")
    page = page.submit_form(form)
    assert page.url == "https://httpbin.org/post"
```
Or use the async version:
```python
import asyncio
from pprint import pprint

from robox import AsyncRobox


async def main():
    async with AsyncRobox(follow_redirects=True) as robox:
        page = await robox.open("https://www.google.com")
        form = page.get_form()
        form.fill_in("q", value="python")
        consent_page = await page.submit_form(form)
        form = consent_page.get_form()
        page = await consent_page.submit_form(form)
        pprint(list(page.get_links_by_text("python")))


asyncio.run(main())
```
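Robox delegates HTML parsing to beautifulsoup4, and the exact matching rules of `get_links_by_text` aren't spelled out here. As a dependency-free illustration of the idea (find `<a>` elements whose visible text contains a string), here is a sketch using the standard library's `html.parser`. The helper names `LinkCollector` and `links_by_text` are hypothetical, and the case-insensitive substring match is an assumption, not documented Robox behaviour.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (text, href) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (text, href) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> is also collected.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def links_by_text(html, needle):
    """Yield hrefs of links whose text contains `needle` (case-insensitive, assumed)."""
    collector = LinkCollector()
    collector.feed(html)
    for text, href in collector.links:
        if needle.lower() in text.lower():
            yield href


SAMPLE = """
<ul>
  <li><a href="https://www.python.org/">Welcome to <b>Python</b>.org</a></li>
  <li><a href="https://docs.python.org/3/">Python 3 documentation</a></li>
  <li><a href="https://example.com/">Something else</a></li>
</ul>
"""

print(list(links_by_text(SAMPLE, "python")))
# ['https://www.python.org/', 'https://docs.python.org/3/']
```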
Caching can be easily configured via httpx-cache:
```python
from robox import Robox, DictCache, FileCache

with Robox(cache=DictCache()) as robox:
    p1 = robox.open("https://httpbin.org/get")
    assert not p1.from_cache
    p2 = robox.open("https://httpbin.org/get")
    assert p2.from_cache
```
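Conceptually, the `DictCache` example boils down to an in-memory mapping from request to stored response: the first request goes to the network and a repeat is served from the dictionary, which is why `p2.from_cache` is true. The sketch below is a toy model of that behaviour only, not how Robox or httpx-cache work internally; real HTTP caches also take the request method, headers and freshness rules into account. The names `Response`, `CachingClient` and `fake_fetch` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Response:
    url: str
    body: str
    from_cache: bool = False


class CachingClient:
    """Toy client that memoizes GET responses in a dict keyed by URL."""

    def __init__(self, fetch):
        self._fetch = fetch  # function url -> body (stands in for the network)
        self._cache = {}     # url -> body

    def get(self, url):
        if url in self._cache:
            return Response(url, self._cache[url], from_cache=True)
        body = self._fetch(url)
        self._cache[url] = body
        return Response(url, body)


calls = []


def fake_fetch(url):
    calls.append(url)  # record every real "network" hit
    return f"<html>{url}</html>"


client = CachingClient(fake_fetch)
p1 = client.get("https://httpbin.org/get")
p2 = client.get("https://httpbin.org/get")
print(p1.from_cache, p2.from_cache, len(calls))  # False True 1
```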
See the examples folder for more detailed examples.
Installation
Using pip:
```shell
pip install robox
```
Robox requires Python 3.8+. See the Changelog for changes.