Headless Chrome/Chromium automation library (unofficial port of puppeteer)

Overview

Pyppeteer

Pyppeteer has moved to pyppeteer/pyppeteer



Unofficial Python port of puppeteer, the JavaScript headless Chrome/Chromium browser automation library.

Installation

Pyppeteer requires Python 3.6+ (Python 3.5 is supported experimentally).

Install with pip from PyPI:

python3 -m pip install pyppeteer

Or install the latest version from GitHub:

python3 -m pip install -U git+https://github.com/miyakogi/pyppeteer.git@dev

Usage

Note: The first time you run pyppeteer, it downloads a recent version of Chromium (~100MB). If you want to avoid this behavior, run the pyppeteer-install command before running scripts that use pyppeteer.

Example: open a web page and take a screenshot.

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('http://example.com')
    await page.screenshot({'path': 'example.png'})
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

Example: evaluate a script on the page.

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('http://example.com')
    await page.screenshot({'path': 'example.png'})

    dimensions = await page.evaluate('''() => {
        return {
            width: document.documentElement.clientWidth,
            height: document.documentElement.clientHeight,
            deviceScaleFactor: window.devicePixelRatio,
        }
    }''')

    print(dimensions)
    # >>> {'width': 800, 'height': 600, 'deviceScaleFactor': 1}
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

Pyppeteer has almost the same API as puppeteer. More APIs are listed in the documentation.

Puppeteer's documentation and troubleshooting guide are also useful for pyppeteer users.

Differences between puppeteer and pyppeteer

Pyppeteer aims to stay as close to puppeteer as possible, but some differences between Python and JavaScript make that difficult.

The main differences between puppeteer and pyppeteer are described below.

Keyword arguments for options

Puppeteer uses an object (a dictionary in Python) to pass options to functions/methods. Pyppeteer accepts both dictionaries and keyword arguments for options.

Dictionary style option (similar to puppeteer):

browser = await launch({'headless': True})

Keyword argument style option (more pythonic, isn't it?):

browser = await launch(headless=True)

Element selector method name ($ -> querySelector)

In Python, $ cannot be used in a method name, so pyppeteer uses Page.querySelector()/Page.querySelectorAll()/Page.xpath() instead of Page.$()/Page.$$()/Page.$x(). Pyppeteer also provides shorthands for these methods: Page.J(), Page.JJ(), and Page.Jx().

Arguments of Page.evaluate() and Page.querySelectorEval()

Puppeteer's version of evaluate() takes a raw JavaScript function or a string containing a JavaScript expression, but pyppeteer only takes a string of JavaScript. The string can be either a function or an expression. Pyppeteer tries to detect automatically whether the string is a function or an expression, but it sometimes fails. If an expression string is mistakenly treated as a function and an error is raised, add the force_expr=True option, which forces pyppeteer to treat the string as an expression.

Example to get page content:

content = await page.evaluate('document.body.textContent', force_expr=True)

Example to get element's inner text:

element = await page.querySelector('h1')
title = await page.evaluate('(element) => element.textContent', element)

Future Plan

  1. Catch up with the development of puppeteer
    • No plans to add original APIs that puppeteer does not have

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Comments
  • waitForNavigation doesn't work after clicking a link

    waitForNavigation doesn't work after clicking a link

    # coding:utf-8
    import asyncio
    from pyppeteer import launch
    
    async def main():
        browser = await launch(headless=False)
        page = await browser.newPage()
        await page.setViewport(dict(width=1200, height=1000))
        await page.goto("https://github.com")
        await page.click('.HeaderMenu [href="/features"]')
        await page.waitForNavigation()
    
    asyncio.get_event_loop().run_until_complete(main())
    
    

    The page https://github.com/features has loaded, but the navigation timed out:

    Traceback (most recent call last):
      File "D:/code/py3/pyppeteer/hello.py", line 42, in <module>
        asyncio.get_event_loop().run_until_complete(main())
      File "F:\python3\Lib\asyncio\base_events.py", line 467, in run_until_complete
        return future.result()
      File "D:/code/py3/pyppeteer/hello.py", line 14, in main
        await page.waitForNavigation()
      File "D:\code\py3\.venv\lib\site-packages\pyppeteer\page.py", line 698, in waitForNavigation
        raise error
    pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 30000 ms exceeded.
    

    puppeteer solution

    https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pageclickselector-options

    const [response] = await Promise.all([
      page.waitForNavigation(waitOptions),
      page.click(selector, clickOptions),
    ]);
    

    But how is this solved in Python?

    opened by zhanghaofei 12
  • Errors while using latest version of websocket

    Errors while using latest version of websocket

    Hello. I was playing with sample code like this:

    import asyncio
    from pyppeteer import launch
    import logging
    logging.basicConfig(level=logging.DEBUG)
    
    async def main():
        browser = await launch(options={
            'headless': True,
            'timeout': 10000,  # Maximum time in milliseconds to wait for the browser instance to start
        })
        url = 'http://httpbin.org/anything'
        page = await browser.newPage()
    
        response = await page.goto(url, options={
            'timeout': 3000,
            'waitUntil': 'load'})
        print('response status: {}'.format(response.status))
        await browser.close()
    
    loop = asyncio.get_event_loop()
    loop.set_debug(enabled=True)
    loop.run_until_complete(main())
    

    Using Python 3.6 and websockets==3.3, it works without any trouble. But when I upgraded websockets to 4.0.1, it started failing with this message:

    response status: 200
    Task exception was never retrieved
    future: <Task finished coro=<Connection._recv_loop() done, defined at /usr/local/lib/python3.6/site-packages/pyppeteer/connection.py:47> exception=InvalidState('Cannot write to a WebSocket in the CLOSING state',)>
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/pyppeteer/connection.py", line 53, in _recv_loop
        resp = await self.connection.recv()
      File "/usr/local/lib/python3.6/site-packages/websockets/protocol.py", line 309, in recv
        loop=self.loop, return_when=asyncio.FIRST_COMPLETED)
      File "/usr/local/lib/python3.6/asyncio/tasks.py", line 307, in wait
        return (yield from _wait(fs, timeout, return_when, loop))
      File "/usr/local/lib/python3.6/asyncio/tasks.py", line 390, in _wait
        yield from waiter
    concurrent.futures._base.CancelledError

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/pyppeteer/connection.py", line 58, in _recv_loop
        break
      File "/usr/local/lib/python3.6/site-packages/websockets/client.py", line 390, in aexit
        yield from self.ws_client.close()
      File "/usr/local/lib/python3.6/site-packages/websockets/protocol.py", line 370, in close
        self.timeout, loop=self.loop)
      File "/usr/local/lib/python3.6/asyncio/tasks.py", line 352, in wait_for
        return fut.result()
      File "/usr/local/lib/python3.6/site-packages/websockets/protocol.py", line 642, in write_frame
        "in the {} state".format(self.state.name))
    websockets.exceptions.InvalidState: Cannot write to a WebSocket in the CLOSING state

    Any suggestions, how to handle this?
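One workaround (an assumption based on the report above, not an official fix) is to pin websockets to the version the reporter says works, until the installed pyppeteer release supports newer websockets versions:

```shell
# Check which websockets version is installed ...
pip show websockets
# ... and pin the version reported to work with this pyppeteer release.
pip install "websockets==3.3"
```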

    opened by kapkirl 12
  • add the isNavigationRequest attribute for request

    add the isNavigationRequest attribute for request

    Sorry to disturb you @miyakogi , I add the isNavigationRequest attribute for request, and I just changed the network_manager.py file. Unfortunately, all my requests are aborted! Looking forward to your reply.

    opened by tigerfsh 10
  • How to kill process friendly

    How to kill process friendly

    Hi, I've used pyppeteer in Scrapy as a Download Middleware, but I find it difficult to stop the running Scrapy project.

    When I press Ctrl + C, the console prints the message below; each press prints it once more, and the process never stops.

    Like this:

    2018-07-13 02:58:09 [asyncio] ERROR: Task exception was never retrieved
    future: <Task finished coro=<Connection._async_send() done, defined at /usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyppeteer/connection.py:61> exception=ConnectionClosed('WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), no reason',)>
    Traceback (most recent call last):
      File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyppeteer/connection.py", line 64, in _async_send
        await self.connection.send(msg)
      File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/websockets/protocol.py", line 334, in send
        yield from self.ensure_open()
      File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/websockets/protocol.py", line 470, in ensure_open
        raise ConnectionClosed(self.close_code, self.close_reason)
    websockets.exceptions.ConnectionClosed: WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), no reason
    

    I tried to except this exception in Download Middleware but it doesn't work.

    and the __del__ function doesn't work either:

        def __del__(self):
            self.loop.close()
    

    Here is my code:

    https://github.com/Python3WebSpider/ScrapyCrawlSpider

    and this is my Download Middleware:

    https://github.com/Python3WebSpider/ScrapyCrawlSpider/blob/master/scrapycrawlspider/middlewares.py

    you can run it with scrapy crawl china after installing the requirements.

    Now I can't stop the Scrapy Project as usual, I wonder how to make it more friendly to stop the Project.

    Hope you can help me, thanks~

    opened by Germey 10
  • Chrome Hangs

    Chrome Hangs

    When I run the screenshot sample code it attempts to download Chrome. But after downloading it hangs.

    [W:pyppeteer.chromium_downloader] start chromium download. Download may take a few minutes.
    [W:pyppeteer.chromium_downloader] chromium download done.
    [W:pyppeteer.chromium_downloader] chromium extracted to: /Users/<username>/.pyppeteer/local-chromium/497674

    It does download the Chrome app. If I kill the script and retry, it won't download the app again but it doesn't do anything either.

    This is the trace when I ^C

    ^CTraceback (most recent call last):
      File "test.py", line 13, in <module>
        browser = launch()
      File "/usr/local/lib/python3.6/site-packages/pyppeteer/launcher.py", line 117, in launch
        return Launcher(options, **kwargs).launch()
      File "/usr/local/lib/python3.6/site-packages/pyppeteer/launcher.py", line 84, in launch
        time.sleep(0.1)
    KeyboardInterrupt

    Any ideas?

    bug 
    opened by NoahCardoza 10
  • Failed to connect to browser port

    Failed to connect to browser port

    I tried to use pyppeteer on CentOS 7.2 with google-chrome-stable, but the error below was raised:

    Traceback (most recent call last):
      File "test.py", line 12, in <module>
        asyncio.get_event_loop().run_until_complete(main())
      File "/usr/local/python3.6/lib/python3.6/asyncio/base_events.py", line 468, in run_until_complete
        return future.result()
      File "test.py", line 6, in main
        browser = await launch({"executablePath": "/usr/bin/google-chrome-stable"})
      File "/usr/local/python3.6/lib/python3.6/site-packages/pyppeteer/launcher.py", line 243, in launch
        return await Launcher(options, **kwargs).launch()
      File "/usr/local/python3.6/lib/python3.6/site-packages/pyppeteer/launcher.py", line 160, in launch
        self.browserWSEndpoint = self._get_ws_endpoint()
      File "/usr/local/python3.6/lib/python3.6/site-packages/pyppeteer/launcher.py", line 178, in _get_ws_endpoint
        raise BrowserError(f'Failed to connect to browser port: {url}')
    pyppeteer.errors.BrowserError: Failed to connect to browser port: http://127.0.0.1:32776/json/version

    python version: 3.6.5
    pyppeteer: 0.0.17

    Running google-chrome-stable --headless --screenshot --no-sandbox http://example.com directly works.

    opened by Elaoed 8
  • Don't add more things to ~/.

    Don't add more things to ~/.

    https://github.com/miyakogi/pyppeteer/blob/19db6ba043613143c958bae9324b392f2a06055b/pyppeteer/launcher.py#L34

    Can you please use appdirs or an equivalent so pyppeteer doesn't create more dot-directories in my home? As an added bonus, this will do the right thing on Windows too.

    enhancement 
    opened by habnabit 7
  • Chromium downloaded automatically but not found (Docker, alpine 3.6//3.7)

    Chromium downloaded automatically but not found (Docker, alpine 3.6//3.7)

    pyppeteer: 0.0.17

    Dockerfile (https://github.com/imWildCat/scylla/blob/289c4626eaa4e00604d95d429d2f08b7487706d1/Dockerfile):

    
    FROM python:3.6-alpine3.7 as build
    
    RUN apk add --update --no-cache g++ gcc libxslt-dev make build-base curl-dev openssl-dev
    
    RUN mkdir -p /var/www/scylla
    WORKDIR /var/www/scylla
    
    RUN pip install scylla
    
    FROM python:3.6-alpine3.7
    
    LABEL maintainer="WildCat <[email protected]>"
    
    RUN apk add --update --no-cache libxslt-dev curl-dev openssl-dev
    
    COPY --from=build /usr/local/lib/python3.6/site-packages/ /usr/local/lib/python3.6/site-packages/
    
    WORKDIR /var/www/scylla
    VOLUME /var/www/scylla
    
    EXPOSE 8899
    EXPOSE 8081
    
    CMD python -m scylla
    
    

    Error logs:

    [W:pyppeteer.chromium_downloader] chromium extracted to: /root/.pyppeteer/local-chromium/543305
    Process Process-1:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/usr/local/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/local/lib/python3.6/site-packages/scylla/scheduler.py", line 34, in fetch_ips
        html = worker.get_html(url, render_js=provider.should_render_js())
      File "/usr/local/lib/python3.6/site-packages/scylla/worker.py", line 47, in get_html
        response.html.render(wait=1.5, timeout=10.0)
      File "/usr/local/lib/python3.6/site-packages/requests_html.py", line 572, in render
        self.session.browser  # Automatycally create a event loop and browser
      File "/usr/local/lib/python3.6/site-packages/requests_html.py", line 680, in browser
        self._browser = self.loop.run_until_complete(pyppeteer.launch(headless=True, args=['--no-sandbox']))
      File "/usr/local/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
        return future.result()
      File "/usr/local/lib/python3.6/site-packages/pyppeteer/launcher.py", line 243, in launch
        return await Launcher(options, **kwargs).launch()
      File "/usr/local/lib/python3.6/site-packages/pyppeteer/launcher.py", line 141, in launch
        env=env,
      File "/usr/local/lib/python3.6/subprocess.py", line 709, in __init__
        restore_signals, start_new_session)
      File "/usr/local/lib/python3.6/subprocess.py", line 1344, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: '/root/.pyppeteer/local-chromium/543305/chrome-linux/chrome': '/root/.pyppeteer/local-chromium/543305/chrome-linux/chrome'
    

    Actually, the file is there:

    ~/.pyppeteer/local-chromium/543305/chrome-linux # ls -lh
    total 239556
    drwxr-xr-x    2 root     root        4.0K May 31 15:49 MEIPreload
    -rwxr-xr-x    1 root     root      189.3M May 31 15:49 chrome
    -rw-r--r--    1 root     root        5.0K May 31 15:49 chrome-wrapper
    -rw-r--r--    1 root     root      802.2K May 31 15:49 chrome_100_percent.pak
    -rw-r--r--    1 root     root        1.0M May 31 15:49 chrome_200_percent.pak
    -rw-r--r--    1 root     root       22.7K May 31 15:49 chrome_sandbox
    -rw-r--r--    1 root     root        9.7M May 31 15:49 icudtl.dat
    -rw-r--r--    1 root     root      359.0K May 31 15:49 libEGL.so
    -rw-r--r--    1 root     root        6.1M May 31 15:49 libGLESv2.so
    -rw-r--r--    1 root     root        2.6M May 31 15:49 libclearkeycdm.so
    drwxr-xr-x    2 root     root        4.0K May 31 15:49 locales
    -rw-r--r--    1 root     root        3.2M May 31 15:49 nacl_helper
    -rw-r--r--    1 root     root        9.5K May 31 15:49 nacl_helper_bootstrap
    -rw-r--r--    1 root     root        3.5M May 31 15:49 nacl_helper_nonsfi
    -rw-r--r--    1 root     root        3.5M May 31 15:49 nacl_irt_x86_64.nexe
    -rw-r--r--    1 root     root      171.0K May 31 15:49 natives_blob.bin
    -rw-r--r--    1 root     root        2.5K May 31 15:49 product_logo_48.png
    drwxr-xr-x    3 root     root        4.0K May 31 15:49 resources
    -rw-r--r--    1 root     root       12.0M May 31 15:49 resources.pak
    drwxr-xr-x    2 root     root        4.0K May 31 15:49 swiftshader
    -rw-r--r--    1 root     root        1.7M May 31 15:49 v8_context_snapshot.bin
    -rw-r--r--    1 root     root       36.5K May 31 15:49 xdg-mime
    -rw-r--r--    1 root     root       32.5K May 31 15:49 xdg-settings
    ~/.pyppeteer/local-chromium/543305/chrome-linux #
    
    opened by imWildCat 7
  • Browser does not start in Docker

    Browser does not start in Docker

    Hello, and thank you for your awesome package! I tried it locally on my Ubuntu machine and liked it a lot. Now I'm trying to add it to my project's integration-testing facilities and am experiencing essentially the minified issue shown in the attached screenshot.

    I guess it might require installing some additional Debian packages (the python:3.6 docker image is based on Debian 8) or environment variables, but I don't know which ones exactly.

    Please help me:)

    opened by scythargon 7
  • cannot import name 'launch'

    cannot import name 'launch'

    Hi,

    I'm trying to run the example but got this error:

    Traceback (most recent call last):
      File "pyppeteer.py", line 2, in <module>
        from pyppeteer import launch
      File "test\pyppeteer.py", line 2, in <module>
        from pyppeteer import launch
    ImportError: cannot import name 'launch'

    Tried and failed on both Mac and Windows.

    opened by deathemperor 7
  • pyppeteer.errors.NetworkError: Protocol Error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.

    pyppeteer.errors.NetworkError: Protocol Error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.

    possibly related to #175

    pyppeteer is losing its connection to the chromium browser (maybe because of something that happens inside chromium?). However, when we run the same logic on node.js, we get no errors.

    I have attached a script to replicate the issue (apologies, it's not cleaned up; it's exploratory code).

    Here is the output of the script:

    [I:pyppeteer.launcher] Browser listening on: ws://127.0.0.1:51578/devtools/browser/a2c27856-5b78-490b-a2a4-cce6c03f14fb
    ws://127.0.0.1:51578/devtools/browser/a2c27856-5b78-490b-a2a4-cce6c03f14fb
    1 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4726N/abstract 55657
    2 https://ui.adsabs.harvard.edu/#abs/2019NewA...66...20P/abstract 56716
    3 https://ui.adsabs.harvard.edu/#abs/2019NewA...66...40N/abstract 53318
    4 https://ui.adsabs.harvard.edu/#abs/2019JMoSt1177..418K/abstract 57338
    5 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4372C/abstract 56445
    6 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4726N/abstract 56319
    7 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5167S/abstract 58452
    8 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5459M/abstract 58754
    9 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5567M/abstract 59967
    10 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483..458B/abstract 69867
    11 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483..529C/abstract 67922
    12 https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483L..47C/abstract 54674
    13 https://ui.adsabs.harvard.edu/#abs/2019NewA...67....1N/abstract 62676
    14 https://ui.adsabs.harvard.edu/#abs/2019NewA...68...51M/abstract 55099
    15 https://ui.adsabs.harvard.edu/#abs/2019PhyE..107....5B/abstract 54913
    [I:pyppeteer.connection] connection closed
    DISCONNECTED
    Protocol Error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.
    Traceback (most recent call last):
      File "client.py", line 53, in main
        }''')
      File "/dvt/workspace2/ADSTurboBee/python3.7/lib/python3.7/site-packages/pyppeteer/page.py", line 1158, in evaluate
        return await frame.evaluate(pageFunction, *args, force_expr=force_expr)
      File "/dvt/workspace2/ADSTurboBee/python3.7/lib/python3.7/site-packages/pyppeteer/frame_manager.py", line 295, in evaluate
        pageFunction, *args, force_expr=force_expr)
      File "/dvt/workspace2/ADSTurboBee/python3.7/lib/python3.7/site-packages/pyppeteer/execution_context.py", line 55, in evaluate
        pageFunction, *args, force_expr=force_expr)
      File "/dvt/workspace2/ADSTurboBee/python3.7/lib/python3.7/site-packages/pyppeteer/execution_context.py", line 109, in evaluateHandle
        _rewriteError(e)
      File "/dvt/workspace2/ADSTurboBee/python3.7/lib/python3.7/site-packages/pyppeteer/execution_context.py", line 239, in _rewriteError
        raise error
      File "/dvt/workspace2/ADSTurboBee/python3.7/lib/python3.7/site-packages/pyppeteer/execution_context.py", line 106, in evaluateHandle
        'userGesture': True,
      File "/dvt/workspace2/ADSTurboBee/python3.7/lib/python3.7/site-packages/pyppeteer/connection.py", line 218, in send
        f'Protocol Error ({method}): Session closed. Most likely the '
    pyppeteer.errors.NetworkError: Protocol Error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.
    
    Fatal error https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5553J/abstract
    Reconnected ws://127.0.0.1:51578/devtools/browser/a2c27856-5b78-490b-a2a4-cce6c03f14fb
    [<pyppeteer.page.Page object at 0x7fe4a9784390>, <pyppeteer.page.Page object at 0x7fe4a9784588>]
    

    9 out of 10 times, I get the error after 15 urls have loaded. This seems to suggest some sort of timeout (in the underlying websocket?).

    I tried running with verbose logs, but I can't see anything out of the ordinary around the time the DISCONNECTED event happens.

    
    import time 
    import asyncio
    import signal
    from pyppeteer import launch, connect
    from functools import partial
    import traceback
    import logging
    import sys
    pyppeteer_level = logging.INFO
    logging.getLogger('pyppeteer').setLevel(pyppeteer_level)
    logging.getLogger('websockets.protocol').setLevel(pyppeteer_level)
    
    urls = ['https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4726N/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4726N/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019NewA...66...20P/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019NewA...66...31M/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019NewA...66...40N/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019AcSpA.209..264M/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019JMoSt1177..418K/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4364C/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4372C/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4422C/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4726N/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.4985A/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5167S/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5349R/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5459M/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5553J/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5567M/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483..392D/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483..458B/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483..711A/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483..529C/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483..840B/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483L..47C/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.483L..64D/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019NewA...67....1N/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019NewA...67...45V/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019NewA...68...51M/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019IJAEO..75...15B/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019PhyE..107....5B/abstract', 
    'https://ui.adsabs.harvard.edu/#abs/2019SurSc.681...32E/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019CNSNS..70...89J/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019CNSNS..70..223O/abstract', 'https://ui.adsabs.harvard.edu/#abs/2015AIPC.1672m0003H/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.1858Y/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019Icar..319....1A/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.1786L/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.1733G/abstract', 'https://ui.adsabs.harvard.edu/#abs/2019MNRAS.482.5018E/abstract']
    
    
    def signal_handler(signal, frame):  
        while len(urls):
            urls.pop()
    
    signal.signal(signal.SIGINT, signal_handler)
    
    loop = asyncio.get_event_loop()
    
    async def get_browser():
        # two errs, 16 batch: '--disable-dev-shm-usage'
        # identical to previous: '--shm-size=1gb'
        # identical: no args
        # '--no-sandbox', '--disable-setuid-sandbox' : no diff
        b = await launch(options={'headless': True, 'waitUntil': ['load', 'domcontentloaded', 'networkidle0'], 'args': ['--enable-popup-blocking']})
        b.on('disconnected', lambda: sys.stderr.write("DISCONNECTED\n"))
        return b
    
    async def main():
        browser = await get_browser()
        print (browser.wsEndpoint)
        wsEndpoint = browser.wsEndpoint
        errc = 0
    
        while len(urls):
            try:
                page = await browser.newPage()
                page.once('error', lambda: print('ERROR'))
                page.once('pageerror', lambda: print('PAGEERROR'))
                page.once('requestfailed', lambda: print('REQUESTFAILED'))
                i = 0
                for u in urls:
                    await page.goto(u, options={'waitUntil':['load', 'domcontentloaded', 'networkidle0'], 'timeout': 30000})
                    await asyncio.sleep(1)
                    content = await page.evaluate('''() => {
                            return {
                                html: document.documentElement.outerHTML
                            }    
                        }''')
                    i+=1
                    print(i, u, len(content['html']))
                    bibc = u.split('/')[-2]
                    fo = open(bibc + '.html', 'w')
                    html = content['html']
                    fo.write(html.replace('<meta charset="utf-8">', '<meta charset="utf-8"><base href="https://ui.adsabs.harvard.edu/" />'))
                    fo.close()
    
                    # results in: NetworkError: Protocol Error (Page.navigate): Session closed. Most likely the page has been closed.
                    # on next page.goto
                    # await page.close()
                    urls.pop(0)
    
    
            except Exception as e:
                await asyncio.sleep(1)
                errc += 1
                print (e)
                print(traceback.format_exc())
                print ('Fatal error', urls.pop(0))
                browser = await connect(options={'browserWSEndpoint': wsEndpoint})
                print('Reconnected ' + browser.wsEndpoint)
                #await browser.close()
                print(await browser.pages())
                await asyncio.sleep(1)
                #browser = await launch(headless = False, waitUntil='networkidle2', args=['--disable-dev-shm-usage'])
                browser = await get_browser()
                
        await browser.close()
        print ('num_errors={}'.format(errc))
    
    loop.run_until_complete(main())
    
    opened by romanchyla 6
  • >> REPOSITORY ABANDONED >> use pyppeteer2 instead

    >> REPOSITORY ABANDONED >> use pyppeteer2 instead

    A community is maintaining a clone of the repo under the pyppeteer organization.

    New repo: https://github.com/pyppeteer/pyppeteer2
    Checkout issue: https://github.com/pyppeteer/pyppeteer2/issues/1

    Quick migration:

    pip uninstall pyppeteer
    pip install pyppeteer2
    

    pyppeteer2 uses the same module name, so migration is easy.

    opened by cgarciae 1
  •  Execution context was destroyed, most likely because of a navigation.

    Execution context was destroyed, most likely because of a navigation.

    Hi, I have a webpage with a button that goes to the next page, and I want to click it. I followed the documentation at https://miyakogi.github.io/pyppeteer/reference.html and used the following pattern:

    await asyncio.gather(
        page.waitForNavigation(),
        page.click("#foobar"),
    )
    

    Most of the time it works fine, but occasionally I get the following error:

      File "/usr/local/lib/python3.7/dist-packages/pyppeteer/page.py", line 1548, in click
        await frame.click(selector, options, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/pyppeteer/frame_manager.py", line 581, in click
        handle = await self.J(selector)
      File "/usr/local/lib/python3.7/dist-packages/pyppeteer/frame_manager.py", line 317, in querySelector
        value = await document.querySelector(selector)
      File "/usr/local/lib/python3.7/dist-packages/pyppeteer/element_handle.py", line 360, in querySelector
        self, selector,
      File "/usr/local/lib/python3.7/dist-packages/pyppeteer/execution_context.py", line 108, in evaluateHandle
        _rewriteError(e)
      File "/usr/local/lib/python3.7/dist-packages/pyppeteer/execution_context.py", line 237, in _rewriteError
        raise type(error)(msg)
    pyppeteer.errors.NetworkError: Execution context was destroyed, most likely because of a navigation.
    

    Any idea or workaround?

    I am using the following version of pyppeteer:

    $ python3.7 -m pip list | grep pyppeteer
    pyppeteer (0.0.25)
    

    I think it is a related issue: https://github.com/puppeteer/puppeteer/issues/5056

    opened by wonghang 2
  • Page goto returns None

    Page goto returns None

    Hi, for some URLs the goto method does not fail but still returns None.

    For example:

    • http://www.swisscamps.ch/de/index.php
    • http://www.whisky-club-oberwallis.ch

    Here is a minimal example (using pyppeteer 0.0.25 and Python 3.7.4):

    import asyncio
    from pyppeteer import launch
    
    urls = [
        'http://www.swisscamps.ch/de/index.php',
        'http://www.whisky-club-oberwallis.ch/brennereien']
    
    async def main():
        browser = await launch(headless=True)
        page = await browser.newPage()
        for url in urls:
            response = await page.goto(url, waitUntil='networkidle0')
            print(url, response)
        await page.close()
        await browser.close()
    
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    

    Output:

    http://www.swisscamps.ch/de/index.php None
    http://www.whisky-club-oberwallis.ch None
    

    Using curl or the Chrome browser, I can still see the response headers and status codes. Any idea where this comes from / how to fix it?
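    One possible workaround, sketched by the editor under the assumption that the responses still arrive even though `goto()` returns None, is to record them via the page's `response` event and look up the main document afterwards. `make_response_recorder` and `find_response` are hypothetical helpers, not pyppeteer API:

    ```python
    def make_response_recorder():
        # Collect every response the page emits so the status and
        # headers survive even when goto() itself returns None.
        responses = []

        def on_response(response):
            responses.append(response)

        return responses, on_response

    def find_response(responses, url):
        # Match the requested URL, tolerating a trailing-slash
        # difference between the request and the recorded response.
        target = url.rstrip('/')
        for response in responses:
            if response.url.rstrip('/') == target:
                return response
        return None

    # Intended usage (requires a live browser):
    #   responses, on_response = make_response_recorder()
    #   page.on('response', on_response)
    #   await page.goto(url, waitUntil='networkidle0')
    #   main = find_response(responses, url)
    ```

    `main.status` and `main.headers` should then be available even in the cases where `goto()` returned None.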

    opened by derlin 2
  • can we use the cookies in python request from puppeteer cookies?

    can we use the cookies in python request from puppeteer cookies?

    I am trying to save cookies from a page rendered through puppeteer and use them in python-requests, but I am not able to. Does anyone have a solution?
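    This is doable: `page.cookies()` returns a list of dicts with `name` and `value` keys (among others), and requests accepts a plain name-to-value mapping via its `cookies=` argument. A minimal conversion sketch (`cookies_for_requests` is a hypothetical helper written for this answer):

    ```python
    def cookies_for_requests(pyppeteer_cookies):
        # page.cookies() returns dicts like
        # {'name': ..., 'value': ..., 'domain': ..., 'path': ..., ...};
        # requests only needs the name -> value mapping.
        return {c['name']: c['value'] for c in pyppeteer_cookies}

    # Intended usage (requires a live browser and the requests package):
    #   cookies = await page.cookies()
    #   resp = requests.get('http://example.com',
    #                       cookies=cookies_for_requests(cookies))
    ```

    If you need the domain and path scoping preserved, set each cookie individually on a `requests.Session` via `session.cookies.set(...)` instead of passing a flat dict.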

    opened by abhay-kum 1
Owner: miyakogi (python, vim)
Related projects

  • Cleverbot Scraper: a simple, free cleverbot library that chats with the bot without running a heavy, RAM-wasting headless web browser (Matheus Fillipe)
  • Python 3 wrapper of Microsoft UIAutomation: supports UIAutomation for MFC, WindowsForm, WPF, Modern UI (Metro UI), Qt, IE, Firefox, Chrome, and more (yin kaisheng)
  • A simple Python script that uses selenium (chrome web driver), pyautogui, time and schedule modules to join Google Meet automatically
  • Helium: selenium-python, but lighter; makes selenium-python easier to use for web automation (Michael Herrmann)
  • Playwright for Python: a Python library to automate Chromium, Firefox and WebKit browsers with a single API (Microsoft)
  • Python 3 Asynchronous TCP/IP Connect Port Scanner: a simple pure-Python TCP connect port scanner
  • DivideAndScan: divide full port scan results and use them for targeted Nmap runs (snovvcrash)
  • Robot Framework: a generic automation framework for acceptance testing and RPA
  • PyAutoGUI: a cross-platform GUI automation Python module for human beings; programmatically control the mouse & keyboard (Al Sweigart)
  • folder-automation: a folder automation made using Watch-dog; Linux-only for now (Parag Jyoti Paul)
  • Selenium: a browser automation framework and ecosystem
  • SeleniumBase: a Python framework for web automation and end-to-end testing; fast, easy, reliable
  • Golem: a test framework and complete tool for browser automation
  • Requestium: an integration layer between Requests and Selenium for automation of web actions (Tryolabs)
  • tox: a command-line driven CI frontend and development task automation tool (tox development team)
  • Nox: flexible test automation for Python, similar to tox (Stargirl Flowers)
  • nornir-scrapli-eos-lab: a network automation lab using nornir, scrapli, and containerlab with Arista EOS (Vireak Ouk)
  • Nokia SR OS automation (Karneliuk.com)
  • PyAutoEasy: an extension/wrapper around PyAutoGUI, a cross-platform GUI automation tool, to replace boring repetitive tasks (Dingu Sagar)