Headless Chrome/Chromium automation library (unofficial port of puppeteer)

Overview

Pyppeteer

Pyppeteer has moved to pyppeteer/pyppeteer



Unofficial Python port of puppeteer, the JavaScript headless Chrome/Chromium browser automation library.

Installation

Pyppeteer requires Python 3.6+ (Python 3.5 is supported experimentally).

Install with pip from PyPI:

python3 -m pip install pyppeteer

Or install the latest version from GitHub:

python3 -m pip install -U git+https://github.com/miyakogi/pyppeteer.git@dev

Usage

Note: The first time you run pyppeteer, it downloads a recent version of Chromium (~100MB). If you want to avoid this behavior, run the pyppeteer-install command before running scripts that use pyppeteer.

Example: open a web page and take a screenshot.

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('http://example.com')
    await page.screenshot({'path': 'example.png'})
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

Example: evaluate a script on the page.

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('http://example.com')
    await page.screenshot({'path': 'example.png'})

    dimensions = await page.evaluate('''() => {
        return {
            width: document.documentElement.clientWidth,
            height: document.documentElement.clientHeight,
            deviceScaleFactor: window.devicePixelRatio,
        }
    }''')

    print(dimensions)
    # >>> {'width': 800, 'height': 600, 'deviceScaleFactor': 1}
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

Pyppeteer has almost the same API as puppeteer. More APIs are listed in the documentation.

Puppeteer's documentation and troubleshooting guide are also useful for pyppeteer users.

Differences between puppeteer and pyppeteer

Pyppeteer aims to stay as close to puppeteer as possible, but some differences between Python and JavaScript make that difficult.

The main differences between puppeteer and pyppeteer are described below.

Keyword arguments for options

Puppeteer uses an object (a dictionary in Python) to pass options to functions/methods. Pyppeteer accepts both dictionaries and keyword arguments for options.

Dictionary style option (similar to puppeteer):

browser = await launch({'headless': True})

Keyword argument style option (more pythonic, isn't it?):

browser = await launch(headless=True)

Element selector method name ($ -> querySelector)

In Python, $ cannot be used in a method name, so pyppeteer uses Page.querySelector()/Page.querySelectorAll()/Page.xpath() instead of Page.$()/Page.$$()/Page.$x(). Pyppeteer also provides shorthands for these methods: Page.J(), Page.JJ(), and Page.Jx().

Arguments of Page.evaluate() and Page.querySelectorEval()

Puppeteer's version of evaluate() takes a raw JavaScript function or a string containing a JavaScript expression, but pyppeteer only takes a string of JavaScript. The string can be either a function or an expression. Pyppeteer tries to detect automatically whether the string is a function or an expression, but it sometimes fails. If an expression string is mistakenly treated as a function and an error is raised, add the force_expr=True option, which forces pyppeteer to treat the string as an expression.

Example to get page content:

content = await page.evaluate('document.body.textContent', force_expr=True)

Example to get element's inner text:

element = await page.querySelector('h1')
title = await page.evaluate('(element) => element.textContent', element)

Future Plan

  1. Catch up with the development of puppeteer
    • No plans to add original APIs that puppeteer does not have

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Comments
  • waitForNavigation doesn't work after clicking a link

    waitForNavigation doesn't work after clicking a link

    # coding:utf-8
    import asyncio
    from pyppeteer import launch
    
    async def main():
        browser = await launch(headless=False)
        page = await browser.newPage()
        await page.setViewport(dict(width=1200, height=1000))
        await page.goto("https://github.com")
        await page.click('.HeaderMenu [href="/features"]')
        await page.waitForNavigation()
    
    asyncio.get_event_loop().run_until_complete(main())
    
    

    The page https://github.com/features has loaded, but the navigation timed out:

    Traceback (most recent call last):
      File "D:/code/py3/pyppeteer/hello.py", line 42, in <module>
        asyncio.get_event_loop().run_until_complete(main())
      File "F:\python3\Lib\asyncio\base_events.py", line 467, in run_until_complete
        return future.result()
      File "D:/code/py3/pyppeteer/hello.py", line 14, in main
        await page.waitForNavigation()
      File "D:\code\py3\.venv\lib\site-packages\pyppeteer\page.py", line 698, in waitForNavigation
        raise error
    pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 30000 ms exceeded.
    

    puppeteer solution

    https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pageclickselector-options

    const [response] = await Promise.all([
      page.waitForNavigation(waitOptions),
      page.click(selector, clickOptions),
    ]);
    

    But how is this solved in Python?

    opened by zhanghaofei 12
  • Errors while using latest version of websocket

    Errors while using latest version of websocket

    Hello. I was playing with sample code like this:

    import asyncio
    from pyppeteer import launch
    import logging
    logging.basicConfig(level=logging.DEBUG)
    
    async def main():
        browser = await launch(options={
            'headless': True,
            'timeout': 10000,  # Maximum time in milliseconds to wait for the browser instance to start
        })
        url = 'http://httpbin.org/anything'
        page = await browser.newPage()
    
        response = await page.goto(url, options={
            'timeout': 3000,
            'waitUntil': 'load'})
        print('response status: {}'.format(response.status))
        await browser.close()
    
    loop = asyncio.get_event_loop()
    loop.set_debug(enabled=True)
    loop.run_until_complete(main())
    

    Using Python 3.6 and websockets==3.3, it works without any trouble. But when I upgraded websockets to 4.0.1, it started failing with this message:

    response status: 200
    Task exception was never retrieved
    future: <Task finished coro=<Connection._recv_loop() done, defined at /usr/local/lib/python3.6/site-packages/pyppeteer/connection.py:47> exception=InvalidState('Cannot write to a WebSocket in the CLOSING state',)>
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/pyppeteer/connection.py", line 53, in _recv_loop
        resp = await self.connection.recv()
      File "/usr/local/lib/python3.6/site-packages/websockets/protocol.py", line 309, in recv
        loop=self.loop, return_when=asyncio.FIRST_COMPLETED)
      File "/usr/local/lib/python3.6/asyncio/tasks.py", line 307, in wait
        return (yield from _wait(fs, timeout, return_when, loop))
      File "/usr/local/lib/python3.6/asyncio/tasks.py", line 390, in _wait
        yield from waiter
    concurrent.futures._base.CancelledError

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/pyppeteer/connection.py", line 58, in _recv_loop
        break
      File "/usr/local/lib/python3.6/site-packages/websockets/client.py", line 390, in aexit
        yield from self.ws_client.close()
      File "/usr/local/lib/python3.6/site-packages/websockets/protocol.py", line 370, in close
        self.timeout, loop=self.loop)
      File "/usr/local/lib/python3.6/asyncio/tasks.py", line 352, in wait_for
        return fut.result()
      File "/usr/local/lib/python3.6/site-packages/websockets/protocol.py", line 642, in write_frame
        "in the {} state".format(self.state.name))
    websockets.exceptions.InvalidState: Cannot write to a WebSocket in the CLOSING state

    Any suggestions, how to handle this?
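One workaround (an assumption based on the report above, not an official fix) is to pin websockets to the version the reporter says works, until the installed pyppeteer release supports newer websockets versions:

```shell
# Check which websockets version is installed ...
pip show websockets
# ... and pin the version reported to work with this pyppeteer release.
pip install "websockets==3.3"
```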

    opened by kapkirl 12
  • add the isNavigationRequest attribute for request

    add the isNavigationRequest attribute for request

    Sorry to disturb you @miyakogi , I add the isNavigationRequest attribute for request, and I just changed the network_manager.py file. Unfortunately, all my requests are aborted! Looking forward to your reply.

    opened by tigerfsh 10
  • How to kill process friendly

    How to kill process friendly

    Hi, I've used pyppeteer in Scrapy as a Download Middleware, but I find it difficult to stop the running Scrapy project.

    When I press Ctrl + C, the console prints the message below; each press prints it once more, and the process never stops.

    Like this:

    2018-07-13 02:58:09 [asyncio] ERROR: Task exception was never retrieved
    future: <Task finished coro=<Connection._async_send() done, defined at /usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyppeteer/connection.py:61> exception=ConnectionClosed('WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), no reason',)>
    Traceback (most recent call last):
      File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyppeteer/connection.py", line 64, in _async_send
        await self.connection.send(msg)
      File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/websockets/protocol.py", line 334, in send
        yield from self.ensure_open()
      File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/websockets/protocol.py", line 470, in ensure_open
        raise ConnectionClosed(self.close_code, self.close_reason)
    websockets.exceptions.ConnectionClosed: WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), no reason
    

    I tried to except this exception in Download Middleware but it doesn't work.

    and the __del__ function doesn't work either:

        def __del__(self):
            self.loop.close()
    

    Here is my code:

    https://github.com/Python3WebSpider/ScrapyCrawlSpider

    and this is my Download Middleware:

    https://github.com/Python3WebSpider/ScrapyCrawlSpider/blob/master/scrapycrawlspider/middlewares.py

    you can run it with scrapy crawl china after installing the requirements.

    Now I can't stop the Scrapy Project as usual, I wonder how to make it more friendly to stop the Project.

    Hope you can help me, thanks~

    opened by Germey 10
  • Chrome Hangs

    Chrome Hangs

    When I run the screenshot sample code it attempts to download Chrome. But after downloading it hangs.

    [W:pyppeteer.chromium_downloader] start chromium download. Download may take a few minutes.
    [W:pyppeteer.chromium_downloader] chromium download done.
    [W:pyppeteer.chromium_downloader] chromium extracted to: /Users/<username>/.pyppeteer/local-chromium/497674

    It does download the Chrome app. If I kill the script and retry, it won't download the app again but it doesn't do anything either.

    This is the trace when I ^C

    ^CTraceback (most recent call last):
      File "test.py", line 13, in <module>
        browser = launch()
      File "/usr/local/lib/python3.6/site-packages/pyppeteer/launcher.py", line 117, in launch
        return Launcher(options, **kwargs).launch()
      File "/usr/local/lib/python3.6/site-packages/pyppeteer/launcher.py", line 84, in launch
        time.sleep(0.1)
    KeyboardInterrupt

    Any ideas?

    bug 
    opened by NoahCardoza 10
  • Failed to connect to browser port

    Failed to connect to browser port

    I tried to use pyppeteer on CentOS 7.2 with google-chrome-stable, but the error below was raised:

    Traceback (most recent call last):
      File "test.py", line 12, in <module>
        asyncio.get_event_loop().run_until_complete(main())
      File "/usr/local/python3.6/lib/python3.6/asyncio/base_events.py", line 468, in run_until_complete
        return future.result()
      File "test.py", line 6, in main
        browser = await launch({"executablePath": "/usr/bin/google-chrome-stable"})
      File "/usr/local/python3.6/lib/python3.6/site-packages/pyppeteer/launcher.py", line 243, in launch
        return await Launcher(options, **kwargs).launch()
      File "/usr/local/python3.6/lib/python3.6/site-packages/pyppeteer/launcher.py", line 160, in launch
        self.browserWSEndpoint = self._get_ws_endpoint()
      File "/usr/local/python3.6/lib/python3.6/site-packages/pyppeteer/launcher.py", line 178, in _get_ws_endpoint
        raise BrowserError(f'Failed to connect to browser port: {url}')
    pyppeteer.errors.BrowserError: Failed to connect to browser port: http://127.0.0.1:32776/json/version

    python version: 3.6.5
    pyppeteer: 0.0.17

    Running google-chrome-stable --headless --screenshot --no-sandbox http://example.com directly works.

    opened by Elaoed 8
  • Don't add more things to ~/.

    Don't add more things to ~/.

    https://github.com/miyakogi/pyppeteer/blob/19db6ba043613143c958bae9324b392f2a06055b/pyppeteer/launcher.py#L34

    Can you please use appdirs or an equivalent so pyppeteer doesn't create more dot-directories in my home? As an added bonus, this will do the right thing on Windows too.

    enhancement 
    opened by habnabit 7
  • Chromium downloaded automatically but not found (Docker, alpine 3.6//3.7)

    Chromium downloaded automatically but not found (Docker, alpine 3.6//3.7)

    pyppeteer: 0.0.17

    Dockerfile (https://github.com/imWildCat/scylla/blob/289c4626eaa4e00604d95d429d2f08b7487706d1/Dockerfile):

    
    FROM python:3.6-alpine3.7 as build
    
    RUN apk add --update --no-cache g++ gcc libxslt-dev make build-base curl-dev openssl-dev
    
    RUN mkdir -p /var/www/scylla
    WORKDIR /var/www/scylla
    
    RUN pip install scylla
    
    FROM python:3.6-alpine3.7
    
    LABEL maintainer="WildCat <[email protected]>"
    
    RUN apk add --update --no-cache libxslt-dev curl-dev openssl-dev
    
    COPY --from=build /usr/local/lib/python3.6/site-packages/ /usr/local/lib/python3.6/site-packages/
    
    WORKDIR /var/www/scylla
    VOLUME /var/www/scylla
    
    EXPOSE 8899
    EXPOSE 8081
    
    CMD python -m scylla
    
    

    Error logs:

    [W:pyppeteer.chromium_downloader] chromium extracted to: /root/.pyppeteer/local-chromium/543305
    Process Process-1:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/usr/local/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/local/lib/python3.6/site-packages/scylla/scheduler.py", line 34, in fetch_ips
        html = worker.get_html(url, render_js=provider.should_render_js())
      File "/usr/local/lib/python3.6/site-packages/scylla/worker.py", line 47, in get_html
        response.html.render(wait=1.5, timeout=10.0)
      File "/usr/local/lib/python3.6/site-packages/requests_html.py", line 572, in render
        self.session.browser  # Automatycally create a event loop and browser
      File "/usr/local/lib/python3.6/site-packages/requests_html.py", line 680, in browser
        self._browser = self.loop.run_until_complete(pyppeteer.launch(headless=True, args=['--no-sandbox']))
      File "/usr/local/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
        return future.result()
      File "/usr/local/lib/python3.6/site-packages/pyppeteer/launcher.py", line 243, in launch
        return await Launcher(options, **kwargs).launch()
      File "/usr/local/lib/python3.6/site-packages/pyppeteer/launcher.py", line 141, in launch
        env=env,
      File "/usr/local/lib/python3.6/subprocess.py", line 709, in __init__
        restore_signals, start_new_session)
      File "/usr/local/lib/python3.6/subprocess.py", line 1344, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: '/root/.pyppeteer/local-chromium/543305/chrome-linux/chrome': '/root/.pyppeteer/local-chromium/543305/chrome-linux/chrome'
    

    Actually, the file is there:

    ~/.pyppeteer/local-chromium/543305/chrome-linux # ls -lh
    total 239556
    drwxr-xr-x    2 root     root        4.0K May 31 15:49 MEIPreload
    -rwxr-xr-x    1 root     root      189.3M May 31 15:49 chrome
    -rw-r--r--    1 root     root        5.0K May 31 15:49 chrome-wrapper
    -rw-r--r--    1 root     root      802.2K May 31 15:49 chrome_100_percent.pak
    -rw-r--r--    1 root     root        1.0M May 31 15:49 chrome_200_percent.pak
    -rw-r--r--    1 root     root       22.7K May 31 15:49 chrome_sandbox
    -rw-r--r--    1 root     root        9.7M May 31 15:49 icudtl.dat
    -rw-r--r--    1 root     root      359.0K May 31 15:49 libEGL.so
    -rw-r--r--    1 root     root        6.1M May 31 15:49 libGLESv2.so
    -rw-r--r--    1 root     root        2.6M May 31 15:49 libclearkeycdm.so
    drwxr-xr-x    2 root     root        4.0K May 31 15:49 locales
    -rw-r--r--    1 root     root        3.2M May 31 15:49 nacl_helper
    -rw-r--r--    1 root     root        9.5K May 31 15:49 nacl_helper_bootstrap
    -rw-r--r--    1 root     root        3.5M May 31 15:49 nacl_helper_nonsfi
    -rw-r--r--    1 root     root        3.5M May 31 15:49 nacl_irt_x86_64.nexe
    -rw-r--r--    1 root     root      171.0K May 31 15:49 natives_blob.bin
    -rw-r--r--    1 root     root        2.5K May 31 15:49 product_logo_48.png
    drwxr-xr-x    3 root     root        4.0K May 31 15:49 resources
    -rw-r--r--    1 root     root       12.0M May 31 15:49 resources.pak
    drwxr-xr-x    2 root     root        4.0K May 31 15:49 swiftshader
    -rw-r--r--    1 root     root        1.7M May 31 15:49 v8_context_snapshot.bin
    -rw-r--r--    1 root     root       36.5K May 31 15:49 xdg-mime
    -rw-r--r--    1 root     root       32.5K May 31 15:49 xdg-settings
    ~/.pyppeteer/local-chromium/543305/chrome-linux #
    
    opened by imWildCat 7
  • Browser does not start in Docker

    Browser does not start in Docker

    Hello, and thank you for your awesome package! I tried it locally on my Ubuntu machine and liked it a lot. Now I'm trying to add it to my project's integration-testing facilities and am experiencing essentially the minified issue shown in the attached screenshot.

    I guess it might require installing some additional Debian packages (the python:3.6 docker image is based on Debian 8) or environment variables, but I don't know which ones exactly.

    Please help me:)

    opened by scythargon 7
  • cannot import name 'launch'

    cannot import name 'launch'

    Hi,

    I'm trying to run the example but got this error:

    Traceback (most recent call last):
      File "pyppeteer.py", line 2, in <module>
        from pyppeteer import launch
      File "test\pyppeteer.py", line 2, in <module>
        from pyppeteer import launch
    ImportError: cannot import name 'launch'

    Tried and failed on both Mac and Windows.

    opened by deathemperor 7
  • pyppeteer.errors.NetworkError: Protocol Error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.

    pyppeteer.errors.NetworkError: Protocol Error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.

    possibly related to #175

    pyppeteer is losing its connection to the chromium browser (maybe because of something that happens inside chromium?). However, when we run the same logic on node.js, we get no errors.

    I have attached a script to replicate the issue (apologies, it's not cleaned up; it's exploratory code).

    Here is the output of the script:

    [I:pyppeteer.launcher] Browser listening on: ws://127.0.0.1:51578/devtools/browser/a2c27856-5b78-490b-a2a4-cce6c03f14fb
    ws://127.0.0.1:51578/devtools/browser/a2c27856-5b78-490b-a2a4-cce6c03f14fb
    1 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4726N/abstract 55657
    2 https://ui.adsabs.harvard.edu/#abs/2019NewA...66...20P/abstract 56716
    3 https://ui.adsabs.harvard.edu/#abs/2019NewA...66...40N/abstract 53318
    4 https://ui.adsabs.harvard.edu/#abs/2019JMoSt1177..418K/abstract 57338
    5 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4372C/abstract 56445
    6 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4726N/abstract 56319
    7 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5167S/abstract 58452
    8 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5459M/abstract 58754
    9 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5567M/abstract 59967
    10 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483..458B/abstract 69867
    11 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483..529C/abstract 67922
    12 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483L..47C/abstract 54674
    13 https://ui.adsabs.harvard.edu/#abs/2019NewA...67....1N/abstract 62676
    14 https://ui.adsabs.harvard.edu/#abs/2019NewA...68...51M/abstract 55099
    15 https://ui.adsabs.harvard.edu/#abs/2019PhyE..107....5B/abstract 54913
    [I:pyppeteer.connection] connection closed
    DISCONNECTED
    Protocol Error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.
    Traceback (most recent call last):
      File "client.py", line 53, in main
        }''')
      File "/dvt/workspace2/ADSTurboBee/python3.7/lib/python3.7/site-packages/pyppeteer/page.py", line 1158, in evaluate
        return await frame.evaluate(pageFunction, *args, force_expr=force_expr)
      File "/dvt/workspace2/ADSTurboBee/python3.7/lib/python3.7/site-packages/pyppeteer/frame_manager.py", line 295, in evaluate
        pageFunction, *args, force_expr=force_expr)
      File "/dvt/workspace2/ADSTurboBee/python3.7/lib/python3.7/site-packages/pyppeteer/execution_context.py", line 55, in evaluate
        pageFunction, *args, force_expr=force_expr)
      File "/dvt/workspace2/ADSTurboBee/python3.7/lib/python3.7/site-packages/pyppeteer/execution_context.py", line 109, in evaluateHandle
        _rewriteError(e)
      File "/dvt/workspace2/ADSTurboBee/python3.7/lib/python3.7/site-packages/pyppeteer/execution_context.py", line 239, in _rewriteError
        raise error
      File "/dvt/workspace2/ADSTurboBee/python3.7/lib/python3.7/site-packages/pyppeteer/execution_context.py", line 106, in evaluateHandle
        'userGesture': True,
      File "/dvt/workspace2/ADSTurboBee/python3.7/lib/python3.7/site-packages/pyppeteer/connection.py", line 218, in send
        f'Protocol Error ({method}): Session closed. Most likely the '
    pyppeteer.errors.NetworkError: Protocol Error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.
    
    Fatal error https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5553J/abstract
    Reconnected ws://127.0.0.1:51578/devtools/browser/a2c27856-5b78-490b-a2a4-cce6c03f14fb
    [<pyppeteer.page.Page object at 0x7fe4a9784390>, <pyppeteer.page.Page object at 0x7fe4a9784588>]
    

    9 out of 10 times, I get the error after 15 urls have loaded. This seems to suggest some sort of timeout (in the underlying websocket?).

    I tried running with verbose logs, but I can't see anything out of the ordinary around the time the DISCONNECTED event happens.

    
    import time 
    import asyncio
    import signal
    from pyppeteer import launch, connect
    from functools import partial
    import traceback
    import logging
    import sys
    pyppeteer_level = logging.INFO
    logging.getLogger('pyppeteer').setLevel(pyppeteer_level)
    logging.getLogger('websockets.protocol').setLevel(pyppeteer_level)
    
    urls = ['https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4726N/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4726N/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019NewA...66...20P/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019NewA...66...31M/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019NewA...66...40N/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019AcSpA.209..264M/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019JMoSt1177..418K/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4364C/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4372C/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4422C/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4726N/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4985A/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5167S/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5349R/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5459M/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5553J/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5567M/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483..392D/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483..458B/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483..711A/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483..529C/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483..840B/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483L..47C/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483L..64D/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019NewA...67....1N/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019NewA...67...45V/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019NewA...68...51M/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019IJAEO..75...15B/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019PhyE..107....5B/abstract', 
    'https://ui.adsabs.harvard.edu/#abs/2019SurSc.681...32E/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019CNSNS..70...89J/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019CNSNS..70..223O/abstract', 'https://ui.adsabs.harvard.edu/#abs/2015AIPC.1672m0003H/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.1858Y/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019Icar..319....1A/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.1786L/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.1733G/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5018E/abstract']
    
    
    def signal_handler(signal, frame):  
        while len(urls):
            urls.pop()
    
    signal.signal(signal.SIGINT, signal_handler)
    
    loop = asyncio.get_event_loop()
    
    async def get_browser():
        # two errs, 16 batch: '--disable-dev-shm-usage'
        # identical to previous: '--shm-size=1gb'
        # identical: no args
        # '--no-sandbox', '--disable-setuid-sandbox' : no diff
        b = await launch(options={'headless': True, 'waitUntil': ['load', 'domcontentloaded', 'networkidle0'], 'args': ['--enable-popup-blocking']})
        b.on('disconnected', lambda: sys.stderr.write("DISCONNECTED\n"))
        return b
    
    async def main():
        browser = await get_browser()
        print (browser.wsEndpoint)
        wsEndpoint = browser.wsEndpoint
        errc = 0
    
        while len(urls):
            try:
                page = await browser.newPage()
                page.once('error', lambda: print('ERROR'))
                page.once('pageerror', lambda: print('PAGEERROR'))
                page.once('requestfailed', lambda: print('REQUESTFAILED'))
                i = 0
                for u in urls:
                    await page.goto(u, options={'waitUntil':['load', 'domcontentloaded', 'networkidle0'], 'timeout': 30000})
                    await asyncio.sleep(1)
                    content = await page.evaluate('''() => {
                            return {
                                html: document.documentElement.outerHTML
                            }    
                        }''')
                    i+=1
                    print(i, u, len(content['html']))
                    bibc = u.split('/')[-2]
                    fo = open(bibc + '.html', 'w')
                    html = content['html']
                    fo.write(html.replace('<meta charset="utf-8">', '<meta charset="utf-8"><base href="https://ui.adsabs.harvard.edu/" />'))
                    fo.close()
    
                    # results in: NetworkError: Protocol Error (Page.navigate): Session closed. Most likely the page has been closed.
                    # on next page.goto
                    # await page.close()
                    urls.pop(0)
    
    
            except Exception as e:
                await asyncio.sleep(1)
                errc += 1
                print (e)
                print(traceback.format_exc())
                print ('Fatal error', urls.pop(0))
                browser = await connect(options={'browserWSEndpoint': wsEndpoint})
                print('Reconnected ' + browser.wsEndpoint)
                #await browser.close()
                print(await browser.pages())
                await asyncio.sleep(1)
                #browser = await launch(headless = False, waitUntil='networkidle2', args=['--disable-dev-shm-usage'])
                browser = await get_browser()
                
        await browser.close()
        print ('num_errors={}'.format(errc))
    
    loop.run_until_complete(main())
    
    opened by romanchyla 6
  • >> REPOSITORY ABANDONED >> use pyppeteer2 instead

    >> REPOSITORY ABANDONED >> use pyppeteer2 instead

    A community is maintaining a clone of the repo under the pyppeteer organization.

    New repo: https://github.com/pyppeteer/pyppeteer2
    Checkout issue: https://github.com/pyppeteer/pyppeteer2/issues/1

    Quick migration:

    pip uninstall pyppeteer
    pip install pyppeteer2
    

    pyppeteer2 uses the same module name, so migration is easy.

    opened by cgarciae 1
  •  Execution context was destroyed, most likely because of a navigation.

    Execution context was destroyed, most likely because of a navigation.

    Hi, I have a webpage with a button that goes to the next page, and I want to click it. I followed the documentation at https://miyakogi.github.io/pyppeteer/reference.html and used the following pattern:

    await asyncio.gather(
        page.waitForNavigation(),
        page.click("#foobar"),
    )
    

    Most of the time it works fine, but occasionally I get the following error:

      File "/usr/local/lib/python3.7/dist-packages/pyppeteer/page.py", line 1548, in click
        await frame.click(selector, options, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/pyppeteer/frame_manager.py", line 581, in click
        handle = await self.J(selector)
      File "/usr/local/lib/python3.7/dist-packages/pyppeteer/frame_manager.py", line 317, in querySelector
        value = await document.querySelector(selector)
      File "/usr/local/lib/python3.7/dist-packages/pyppeteer/element_handle.py", line 360, in querySelector
        self, selector,
      File "/usr/local/lib/python3.7/dist-packages/pyppeteer/execution_context.py", line 108, in evaluateHandle
        _rewriteError(e)
      File "/usr/local/lib/python3.7/dist-packages/pyppeteer/execution_context.py", line 237, in _rewriteError
        raise type(error)(msg)
    pyppeteer.errors.NetworkError: Execution context was destroyed, most likely because of a navigation.
    

    Any idea or workaround?

    I am using the following version of pyppeteer:

    $ python3.7 -m pip list | grep pyppeteer
    pyppeteer (0.0.25)
    

    I think it is a related issue: https://github.com/puppeteer/puppeteer/issues/5056

    opened by wonghang 2
  • Page goto returns None

    Page goto returns None

    Hi, for some URLs the goto method does not fail but still returns None.

    For example:

    • http://www.swisscamps.ch/de/index.php
    • http://www.whisky-club-oberwallis.ch

    Here is a minimal example (using pyppeteer 0.0.25 and Python 3.7.4):

    import asyncio
    from pyppeteer import launch
    
    urls = [
        'http://www.swisscamps.ch/de/index.php',
        'http://www.whisky-club-oberwallis.ch/brennereien']
    
    async def main():
        browser = await launch(headless=True)
        page = await browser.newPage()
        for url in urls:
            response = await page.goto(url, waitUntil='networkidle0')
            print(url, response)
        await page.close()
        await browser.close()
    
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    

    Output:

    http://www.swisscamps.ch/de/index.php None
    http://www.whisky-club-oberwallis.ch None
    

    Using curl or the Chrome browser, I can still see the response headers and status codes. Any idea where this comes from / how to fix it?
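    One possible workaround, sketched by the editor under the assumption that the responses still arrive even though `goto()` returns None, is to record them via the page's `response` event and look up the main document afterwards. `make_response_recorder` and `find_response` are hypothetical helpers, not pyppeteer API:

    ```python
    def make_response_recorder():
        # Collect every response the page emits so the status and
        # headers survive even when goto() itself returns None.
        responses = []

        def on_response(response):
            responses.append(response)

        return responses, on_response

    def find_response(responses, url):
        # Match the requested URL, tolerating a trailing-slash
        # difference between the request and the recorded response.
        target = url.rstrip('/')
        for response in responses:
            if response.url.rstrip('/') == target:
                return response
        return None

    # Intended usage (requires a live browser):
    #   responses, on_response = make_response_recorder()
    #   page.on('response', on_response)
    #   await page.goto(url, waitUntil='networkidle0')
    #   main = find_response(responses, url)
    ```

    `main.status` and `main.headers` should then be available even in the cases where `goto()` returned None.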

    opened by derlin 2
  • can we use the cookies in python request from puppeteer cookies?

    can we use the cookies in python request from puppeteer cookies?

    I am trying to save cookies from a page rendered through puppeteer and use them in python-requests, but I am not able to. Does anyone have a solution?
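    This is doable: `page.cookies()` returns a list of dicts with `name` and `value` keys (among others), and requests accepts a plain name-to-value mapping via its `cookies=` argument. A minimal conversion sketch (`cookies_for_requests` is a hypothetical helper written for this answer):

    ```python
    def cookies_for_requests(pyppeteer_cookies):
        # page.cookies() returns dicts like
        # {'name': ..., 'value': ..., 'domain': ..., 'path': ..., ...};
        # requests only needs the name -> value mapping.
        return {c['name']: c['value'] for c in pyppeteer_cookies}

    # Intended usage (requires a live browser and the requests package):
    #   cookies = await page.cookies()
    #   resp = requests.get('http://example.com',
    #                       cookies=cookies_for_requests(cookies))
    ```

    If you need the domain and path scoping preserved, set each cookie individually on a `requests.Session` via `session.cookies.set(...)` instead of passing a flat dict.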

    opened by abhay-kum 1
Owner: miyakogi (python, vim)
Related projects

  • Cleverbot Scraper: a simple, free cleverbot library that chats with the bot without running a heavy, RAM-wasting headless web browser (Matheus Fillipe)
  • Python 3 wrapper of Microsoft UIAutomation: supports UIAutomation for MFC, WindowsForm, WPF, Modern UI (Metro UI), Qt, IE, Firefox, Chrome, and more (yin kaisheng)
  • A simple Python script that uses selenium (chrome web driver), pyautogui, time and schedule modules to join Google Meet automatically
  • Helium: selenium-python, but lighter; makes selenium-python easier to use for web automation (Michael Herrmann)
  • Playwright for Python: a Python library to automate Chromium, Firefox and WebKit browsers with a single API (Microsoft)
  • Python 3 Asynchronous TCP/IP Connect Port Scanner: a simple pure-Python TCP connect port scanner
  • DivideAndScan: divide full port scan results and use them for targeted Nmap runs (snovvcrash)
  • Robot Framework: a generic automation framework for acceptance testing and RPA
  • PyAutoGUI: a cross-platform GUI automation Python module for human beings; programmatically control the mouse & keyboard (Al Sweigart)
  • folder-automation: a folder automation made using Watch-dog; Linux-only for now (Parag Jyoti Paul)
  • Selenium: a browser automation framework and ecosystem
  • SeleniumBase: a Python framework for web automation and end-to-end testing; fast, easy, reliable
  • Golem: a test framework and complete tool for browser automation
  • Requestium: an integration layer between Requests and Selenium for automation of web actions (Tryolabs)
  • tox: a command-line driven CI frontend and development task automation tool (tox development team)
  • Nox: flexible test automation for Python, similar to tox (Stargirl Flowers)
  • nornir-scrapli-eos-lab: a network automation lab using nornir, scrapli, and containerlab with Arista EOS (Vireak Ouk)
  • Nokia SR OS automation (Karneliuk.com)
  • PyAutoEasy: an extension/wrapper around PyAutoGUI, a cross-platform GUI automation tool, to replace boring repetitive tasks (Dingu Sagar)