Downloader Middleware to support Playwright in Scrapy & Gerapy

Overview

Gerapy Playwright

This package adds Playwright support to Scrapy. It is also a module of Gerapy.

Installation

pip3 install gerapy-playwright

Usage

Use PlaywrightRequest to mark a request that should be rendered with Playwright.

For example:

yield PlaywrightRequest(detail_url, callback=self.parse_detail)

You also need to enable PlaywrightMiddleware in DOWNLOADER_MIDDLEWARES:

DOWNLOADER_MIDDLEWARES = {
    'gerapy_playwright.downloadermiddlewares.PlaywrightMiddleware': 543,
}

Congratulations, you've finished all of the required configuration.

If you run the spider again, Playwright will start and render every page whose request is a PlaywrightRequest.

Settings

GerapyPlaywright provides some optional settings.

Concurrency

You can use Scrapy's own CONCURRENT_REQUESTS setting to control the concurrency of Playwright, for example:

CONCURRENT_REQUESTS = 3

Pretend to Be a Real Browser

Some websites detect WebDriver or headless browsers. GerapyPlaywright can make Chromium look like a regular browser by injecting scripts. This is enabled by default.

If the website does not detect WebDriver, you can turn this off to speed up rendering:

GERAPY_PLAYWRIGHT_PRETEND = False

You can also use the pretend argument of PlaywrightRequest to override this setting for a single request.

Logging Level

By default, Playwright logs all debug messages, so GerapyPlaywright sets Playwright's logging level to WARNING.

If you want to see more logs from Playwright, you can change this setting:

import logging
GERAPY_PLAYWRIGHT_LOGGING_LEVEL = logging.DEBUG

Download Timeout

Playwright may take some time to render a page. You can change the timeout with this setting; the default is 30 seconds:

# playwright timeout
GERAPY_PLAYWRIGHT_DOWNLOAD_TIMEOUT = 30

Headless

By default, Playwright runs in headless mode (the default value is True). Set this to False to show the browser window:

GERAPY_PLAYWRIGHT_HEADLESS = False

Window Size

You can also set the width and height of Playwright window:

GERAPY_PLAYWRIGHT_WINDOW_WIDTH = 1400
GERAPY_PLAYWRIGHT_WINDOW_HEIGHT = 700

The defaults are 1400 (width) and 700 (height).

Proxy

You can set a proxy with the settings below:

GERAPY_PLAYWRIGHT_PROXY = 'http://tps254.kdlapi.com:15818'
GERAPY_PLAYWRIGHT_PROXY_CREDENTIAL = {
  'username': 'xxx',
  'password': 'xxxx'
}
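
The optional settings described so far can be gathered in one place. The fragment below is a sketch of a settings.py that uses the default mentioned in the text wherever one is given; CONCURRENT_REQUESTS repeats the example value, and the proxy address and credentials are placeholders, not working values.

```python
# settings.py (sketch): optional GerapyPlaywright settings in one place
import logging

# Scrapy's own concurrency setting also bounds Playwright rendering
CONCURRENT_REQUESTS = 3

DOWNLOADER_MIDDLEWARES = {
    'gerapy_playwright.downloadermiddlewares.PlaywrightMiddleware': 543,
}

GERAPY_PLAYWRIGHT_PRETEND = True                   # default: inject the pretend scripts
GERAPY_PLAYWRIGHT_LOGGING_LEVEL = logging.WARNING  # default level described above
GERAPY_PLAYWRIGHT_DOWNLOAD_TIMEOUT = 30            # seconds
GERAPY_PLAYWRIGHT_HEADLESS = True                  # set False to watch the browser
GERAPY_PLAYWRIGHT_WINDOW_WIDTH = 1400
GERAPY_PLAYWRIGHT_WINDOW_HEIGHT = 700

# Placeholder proxy values (assumption): replace with your own proxy
GERAPY_PLAYWRIGHT_PROXY = 'http://127.0.0.1:8080'
GERAPY_PLAYWRIGHT_PROXY_CREDENTIAL = {
    'username': 'xxx',
    'password': 'xxxx',
}
```

Leaving a setting out should have the same effect as writing its default, so in practice only the values you want to change need to appear.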

Screenshot

You can take a screenshot of the loaded page by passing screenshot args to PlaywrightRequest as a dict.

The supported args are:

  • type (str): Specify screenshot type, can be either jpeg or png. Defaults to png.
  • quality (int): The quality of the image, between 0-100. Not applicable to png image.
  • full_page (bool): When true, take a screenshot of the full scrollable page. Defaults to False.
  • clip (dict): An object which specifies clipping region of the page. This option should have the following fields:
    • x (int): x-coordinate of top-left corner of clip area.
    • y (int): y-coordinate of top-left corner of clip area.
    • width (int): width of clipping area.
    • height (int): height of clipping area.
  • omit_background (bool): Hide default white background and allow capturing screenshot with transparency.
  • timeout (float): Maximum time in milliseconds. Defaults to 30000 (30 seconds); pass 0 to disable the timeout.

For more details, see https://playwright.dev/python/docs/api/class-page#page-screenshot
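
The constraints in the list above are easy to get wrong, so it can help to check a screenshot dict before passing it on. The helper below is a hypothetical sketch, not part of GerapyPlaywright; it only encodes the rules stated in the list (allowed types, the quality range, quality not applying to png, and the four clip fields).

```python
def check_screenshot_args(args):
    """Return a list of problems found in a screenshot args dict (hypothetical helper)."""
    problems = []
    image_type = args.get('type', 'png')  # png is the documented default
    if image_type not in ('jpeg', 'png'):
        problems.append("type must be 'jpeg' or 'png'")
    quality = args.get('quality')
    if quality is not None:
        if image_type == 'png':
            problems.append('quality is not applicable to png images')
        elif not 0 <= quality <= 100:
            problems.append('quality must be between 0 and 100')
    clip = args.get('clip')
    if clip is not None:
        # A clip region needs all four fields from the list above
        missing = [key for key in ('x', 'y', 'width', 'height') if key not in clip]
        if missing:
            problems.append('clip is missing: ' + ', '.join(missing))
    return problems


print(check_screenshot_args({'type': 'png', 'full_page': True}))
print(check_screenshot_args({'type': 'png', 'quality': 80}))
```

An empty list means the dict passed these checks; Playwright itself remains the final authority on what it accepts.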

For example:

yield PlaywrightRequest(start_url, callback=self.parse_index, wait_for='.item .name', screenshot={
    'type': 'png',
    'full_page': True
})

You can then get the screenshot from response.meta['screenshot'].

The simplest approach is to save it to a file:

def parse_index(self, response):
    with open('screenshot.png', 'wb') as f:
        f.write(response.meta['screenshot'].getbuffer())
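
A request without screenshot args may not carry a screenshot at all, so a slightly more defensive variant can be useful. This is a minimal sketch assuming, as the getbuffer() call above suggests, that the value in response.meta['screenshot'] behaves like io.BytesIO; save_screenshot is a hypothetical helper name.

```python
import io
from pathlib import Path


def save_screenshot(meta, path):
    """Write the screenshot buffer in meta to path; return bytes written, 0 if absent."""
    screenshot = meta.get('screenshot')
    if screenshot is None:  # no screenshot was taken for this response
        return 0
    data = screenshot.getbuffer()
    Path(path).write_bytes(data)
    return len(data)


# Inside a spider callback this would be called as:
#     save_screenshot(response.meta, 'screenshot.png')
```

Returning the byte count makes it easy to log or skip empty results in the callback.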

If you want to enable screenshots for all requests, set GERAPY_PLAYWRIGHT_SCREENSHOT.

For example:

GERAPY_PLAYWRIGHT_SCREENSHOT = {
    'type': 'png',
    'full_page': True
}

PlaywrightRequest

PlaywrightRequest provides arguments that can override the global settings above.

  • url: request URL
  • callback: callback function
  • wait_until: one of "load", "domcontentloaded", "networkidle" (see https://playwright.dev/python/docs/api/class-page#page-wait-for-load-state); default is domcontentloaded
  • wait_for: selector of an element to wait for; also supports a dict
  • script: JavaScript to execute
  • actions: function that runs actions on the Page object
  • proxy: proxy for this request, like http://x.x.x.x:x
  • proxy_credential: the proxy credential, like {'username': 'xxxx', 'password': 'xxxx'}
  • sleep: time to sleep after loading, overrides GERAPY_PLAYWRIGHT_SLEEP
  • timeout: load timeout, overrides GERAPY_PLAYWRIGHT_DOWNLOAD_TIMEOUT
  • ignore_resource_types: resource types to ignore, overrides GERAPY_PLAYWRIGHT_IGNORE_RESOURCE_TYPES
  • pretend: pretend to be a normal browser, overrides GERAPY_PLAYWRIGHT_PRETEND
  • screenshot: screenshot args, see https://playwright.dev/python/docs/api/class-page#page-screenshot, overrides GERAPY_PLAYWRIGHT_SCREENSHOT
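
One way to picture the override behaviour is as a per-argument fallback: a value given on the request wins, and an argument left unset falls back to the project setting. The function below is an illustration only, not the middleware's code; effective_options and its plain-dict inputs are assumptions made for the example.

```python
def effective_options(request_args, settings):
    """Pick the request-level value when given, else the project setting (illustrative)."""
    mapping = {
        'sleep': 'GERAPY_PLAYWRIGHT_SLEEP',
        'timeout': 'GERAPY_PLAYWRIGHT_DOWNLOAD_TIMEOUT',
        'pretend': 'GERAPY_PLAYWRIGHT_PRETEND',
        'screenshot': 'GERAPY_PLAYWRIGHT_SCREENSHOT',
    }
    resolved = {}
    for arg, setting_name in mapping.items():
        value = request_args.get(arg)
        # None means "not set on the request", so fall back to the setting.
        # Comparing with None keeps an explicit False or 0 from being overridden.
        resolved[arg] = settings.get(setting_name) if value is None else value
    return resolved


settings = {
    'GERAPY_PLAYWRIGHT_SLEEP': 0,
    'GERAPY_PLAYWRIGHT_DOWNLOAD_TIMEOUT': 30,
    'GERAPY_PLAYWRIGHT_PRETEND': True,
}
print(effective_options({'sleep': 2, 'pretend': False}, settings))
```

The comparison against None matters for arguments such as pretend, where False is a meaningful request-level choice rather than a missing value.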

For example, you can configure PlaywrightRequest as:

from gerapy_playwright import PlaywrightRequest

def parse(self, response):
    yield PlaywrightRequest(url,
        callback=self.parse_detail,
        wait_until='domcontentloaded',
        wait_for='title',
        script='() => { return {name: "Germey"} }',
        sleep=2)

Then Playwright will:

  • wait for the document to load
  • wait for the title element to load
  • execute the given script
  • sleep for 2s
  • return the rendered web page content as the response
  • return the result of the script in response.meta['script_result']

If the waiting logic is controlled by JavaScript, you can use await in the script, for example:

js = '''async () => {
    await new Promise(resolve => setTimeout(resolve, 10000));
    return {
        'name': 'Germey'
    }
}
'''
yield PlaywrightRequest(url, callback=self.parse, script=js)

Then you can get the script result from response.meta['script_result']; the result is {'name': 'Germey'}.

If you find JavaScript awkward to write, you can use the actions argument to pass a Python function that operates on the page, for example:

async def execute_actions(page):
    await page.evaluate('() => { document.title = "Hello World"; }')
    return 1
yield PlaywrightRequest(url, callback=self.parse, actions=execute_actions)

Then you can get the actions result from response.meta['actions_result']; the result is 1.
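
An actions function is an ordinary coroutine, so it can be tried out without launching a browser. The sketch below runs the same execute_actions against FakePage, a hypothetical stand-in that only records the scripts passed to evaluate; it is not a Playwright class and covers nothing beyond that one method.

```python
import asyncio


class FakePage:
    """Hypothetical stand-in for a Playwright Page; records evaluated scripts."""

    def __init__(self):
        self.evaluated = []

    async def evaluate(self, expression):
        self.evaluated.append(expression)


async def execute_actions(page):
    await page.evaluate('() => { document.title = "Hello World"; }')
    return 1


page = FakePage()
result = asyncio.run(execute_actions(page))
print(result, len(page.evaluated))
```

Whatever the coroutine returns here is the kind of value that would end up in response.meta['actions_result'] in a real crawl.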

You can also define proxy and proxy_credential for each request, for example:

yield PlaywrightRequest(
  self.base_url,
  callback=self.parse_index,
  priority=10,
  proxy='http://tps254.kdlapi.com:15818',
  proxy_credential={
      'username': 'xxxx',
      'password': 'xxxx'
})

proxy and proxy_credential will override the settings GERAPY_PLAYWRIGHT_PROXY and GERAPY_PLAYWRIGHT_PROXY_CREDENTIAL.
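
Playwright's own proxy option takes a single dict with server, username and password keys. The helper below sketches how a proxy URL and a credential dict could be combined into that shape, with request-level values taking precedence over the settings; build_proxy_option is a hypothetical name and this is not necessarily how the middleware does it.

```python
def build_proxy_option(proxy, credential=None):
    """Combine a proxy URL and optional credentials into one dict (hypothetical helper)."""
    if not proxy:
        return None  # no proxy configured
    option = {'server': proxy}
    if credential:
        option['username'] = credential.get('username')
        option['password'] = credential.get('password')
    return option


# Placeholder values (assumptions) standing in for settings and request arguments
settings_proxy = 'http://127.0.0.1:8080'
request_proxy = 'http://127.0.0.1:9090'
request_credential = {'username': 'xxxx', 'password': 'xxxx'}

# The request-level proxy wins when it is present
print(build_proxy_option(request_proxy or settings_proxy, request_credential))
```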

Example

For more details, please see the example.

You can also run it directly with Docker:

docker run germey/gerapy-playwright-example

Outputs:

2021-12-27 16:54:14 [scrapy.utils.log] INFO: Scrapy 2.2.0 started (bot: example)
2021-12-27 16:54:14 [scrapy.utils.log] INFO: Versions: lxml 4.7.1.0, libxml2 2.9.12, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.7.0, Python 3.7.9 (default, Aug 31 2020, 07:22:35) - [Clang 10.0.0 ], pyOpenSSL 21.0.0 (OpenSSL 1.1.1l  24 Aug 2021), cryptography 35.0.0, Platform Darwin-21.1.0-x86_64-i386-64bit
2021-12-27 16:54:14 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2021-12-27 16:54:14 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'example',
 'CONCURRENT_REQUESTS': 1,
 'NEWSPIDER_MODULE': 'example.spiders',
 'RETRY_HTTP_CODES': [403, 500, 502, 503, 504],
 'SPIDER_MODULES': ['example.spiders']}
2021-12-27 16:54:14 [scrapy.extensions.telnet] INFO: Telnet Password: e931b241390ad06a
2021-12-27 16:54:14 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2021-12-27 16:54:14 [gerapy.playwright] INFO: playwright libraries already installed
2021-12-27 16:54:14 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'gerapy_playwright.downloadermiddlewares.PlaywrightMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2021-12-27 16:54:14 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2021-12-27 16:54:14 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2021-12-27 16:54:14 [scrapy.core.engine] INFO: Spider opened
2021-12-27 16:54:14 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-12-27 16:54:14 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-12-27 16:54:14 [example.spiders.movie] DEBUG: start url https://antispider1.scrape.center/page/1
2021-12-27 16:54:14 [gerapy.playwright] DEBUG: processing request <GET https://antispider1.scrape.center/page/1>
2021-12-27 16:54:14 [gerapy.playwright] DEBUG: playwright_meta {'wait_until': 'domcontentloaded', 'wait_for': '.item', 'script': None, 'actions': None, 'sleep': None, 'proxy': None, 'proxy_credential': None, 'pretend': None, 'timeout': None, 'screenshot': None}
2021-12-27 16:54:14 [gerapy.playwright] DEBUG: set options {'headless': False}
cookies []
2021-12-27 16:54:16 [gerapy.playwright] DEBUG: PRETEND_SCRIPTS is run
2021-12-27 16:54:16 [gerapy.playwright] DEBUG: timeout 10
2021-12-27 16:54:16 [gerapy.playwright] DEBUG: crawling https://antispider1.scrape.center/page/1
2021-12-27 16:54:16 [gerapy.playwright] DEBUG: request https://antispider1.scrape.center/page/1 with options {'url': 'https://antispider1.scrape.center/page/1', 'wait_until': 'domcontentloaded'}
2021-12-27 16:54:18 [gerapy.playwright] DEBUG: waiting for .item
2021-12-27 16:54:18 [gerapy.playwright] DEBUG: sleep for 1s
2021-12-27 16:54:19 [gerapy.playwright] DEBUG: taking screenshot using args {'type': 'png', 'full_page': True}
2021-12-27 16:54:19 [gerapy.playwright] DEBUG: close playwright
2021-12-27 16:54:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://antispider1.scrape.center/page/1> (referer: None)
2021-12-27 16:54:20 [example.spiders.movie] DEBUG: start url https://antispider1.scrape.center/page/2
2021-12-27 16:54:20 [gerapy.playwright] DEBUG: processing request <GET https://antispider1.scrape.center/page/2>
2021-12-27 16:54:20 [gerapy.playwright] DEBUG: playwright_meta {'wait_until': 'domcontentloaded', 'wait_for': '.item', 'script': None, 'actions': None, 'sleep': None, 'proxy': None, 'proxy_credential': None, 'pretend': None, 'timeout': None, 'screenshot': None}
2021-12-27 16:54:20 [gerapy.playwright] DEBUG: set options {'headless': False}
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/1
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/2
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/3
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/4
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/5
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/6
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/7
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/8
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/9
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/10
cookies []
2021-12-27 16:54:21 [gerapy.playwright] DEBUG: PRETEND_SCRIPTS is run
2021-12-27 16:54:21 [gerapy.playwright] DEBUG: timeout 10
2021-12-27 16:54:21 [gerapy.playwright] DEBUG: crawling https://antispider1.scrape.center/page/2
2021-12-27 16:54:21 [gerapy.playwright] DEBUG: request https://antispider1.scrape.center/page/2 with options {'url': 'https://antispider1.scrape.center/page/2', 'wait_until': 'domcontentloaded'}
2021-12-27 16:54:23 [gerapy.playwright] DEBUG: waiting for .item
2021-12-27 16:54:24 [gerapy.playwright] DEBUG: sleep for 1s
2021-12-27 16:54:25 [gerapy.playwright] DEBUG: taking screenshot using args {'type': 'png', 'full_page': True}
2021-12-27 16:54:25 [gerapy.playwright] DEBUG: close playwright
2021-12-27 16:54:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://antispider1.scrape.center/page/2> (referer: None)
2021-12-27 16:54:25 [gerapy.playwright] DEBUG: processing request <GET https://antispider1.scrape.center/detail/10>
2021-12-27 16:54:25 [gerapy.playwright] DEBUG: playwright_meta {'wait_until': 'domcontentloaded', 'wait_for': '.item', 'script': None, 'actions': None, 'sleep': None, 'proxy': None, 'proxy_credential': None, 'pretend': None, 'timeout': None, 'screenshot': None}
2021-12-27 16:54:25 [gerapy.playwright] DEBUG: set options {'headless': False}
...
Comments
  • twisted.internet.error.ReactorAlreadyInstalledError: reactor already installed

    python: 3.9 GerapyPlaywright: 0.2.4 os: mac 11.6

    Running scrapy crawl spider fails immediately with this error:

    Traceback (most recent call last):
      File "/Users/zz/.virtualenvs/crawler-apk/bin/scrapy", line 8, in <module>
        sys.exit(execute())
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/cmdline.py", line 145, in execute
        _run_print_help(parser, _run_command, cmd, args, opts)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/cmdline.py", line 100, in _run_print_help
        func(*a, **kw)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/cmdline.py", line 153, in _run_command
        cmd.run(args, opts)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/commands/crawl.py", line 22, in run
        crawl_defer = self.crawler_process.crawl(spname, **opts.spargs)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/crawler.py", line 205, in crawl
        crawler = self.create_crawler(crawler_or_spidercls)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/crawler.py", line 238, in create_crawler
        return self._create_crawler(crawler_or_spidercls)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/crawler.py", line 313, in _create_crawler
        return Crawler(spidercls, self.settings, init_reactor=True)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/crawler.py", line 82, in __init__
        default.install()
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/twisted/internet/selectreactor.py", line 194, in install
        installReactor(reactor)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/twisted/internet/main.py", line 32, in installReactor
        raise error.ReactorAlreadyInstalledError("reactor already installed"
    twisted.internet.error.ReactorAlreadyInstalledError: reactor already installed
    

    What should I change to fix this?

    opened by yyyy777 3
  • Fix leaking file descriptors by using the context manager

    I was running into OSError: [Errno 24] Too many open files while using this with scrapy for scraping a domain.

    By using async_playwright() as a context manager, we ensure it's closed once finished. This fixes the issue.

    opened by xolan 2
  •     raise BadGzipFile('Not a gzipped file (%r)' % magic) gzip.BadGzipFile: Not a gzipped file (b'<!')

    Cui, it doesn't work for me either. As soon as I start Scrapy, it reports this error. Environment: Python 3.9.4, macOS 11.5.2

    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/twisted/internet/defer.py", line 1445, in _inlineCallbacks
        result = current_context.run(g.send, result)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/scrapy/core/downloader/middleware.py", line 54, in process_response
        response = yield deferred_from_coro(method(request=request, response=response, spider=spider))
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/scrapy/downloadermiddlewares/httpcompression.py", line 62, in process_response
        decoded_body = self._decode(response.body, encoding.lower())
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/scrapy/downloadermiddlewares/httpcompression.py", line 82, in _decode
        body = gunzip(body)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/scrapy/utils/gz.py", line 27, in gunzip
        chunk = f.read1(8196)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/gzip.py", line 313, in read1
        return self._buffer.read1(size)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/_compression.py", line 68, in readinto
        data = self.read(len(byte_view))
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/gzip.py", line 487, in read
        if not self._read_gzip_header():
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/gzip.py", line 435, in _read_gzip_header
        raise BadGzipFile('Not a gzipped file (%r)' % magic)
    gzip.BadGzipFile: Not a gzipped file (b'<!')

    opened by wf4867612 2
  • Method is not Json serializable for actions

    Hello,

    I am running into an issue with 'yield' for gerapy-playwright when I need to access a login page. I try to run: yield PlaywrightRequest(login_page_url, self.parse_login, actions = self.login_action) in order to first use playwright to login and then access data that can only be accessed when logging in with self.parse_login. I am getting a: builtins.TypeError: is not JSON serializable.

    I am using scrapy cluster along with gerapy-playwright in order to run a scheduler for all the spiders that I have: https://github.com/istresearch/scrapy-cluster

    It seems that the action is saved in the metadata as a method and cannot be passed to the scheduler. Is it possible to cast the action to a string and then, when the action is called later, do a method call on the string? If I understand correctly, the action is produced on line 339 of downloadermiddlewares.py inside gerapy-playwright. Would it be possible to evaluate the string as a method so that the scrapy-cluster scheduler can pass the string but gerapy-playwright still calls the self.login_action method prior to self.parse_login?

    opened by BenzTivianne 0
  • playwright._impl._api_types.Error: Browser closed.

    What could be causing this error?

    2022-03-23 06:21:06 [scrapy.core.scraper] ERROR: Error downloading <GET https://apkpure.com/bikers-men-women-bike-photo-editor-future-trends/com.dsrtech.bikers>
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
        result = current_context.run(
      File "/usr/local/lib/python3.8/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
        return g.throw(self.type, self.value, self.tb)
      File "/usr/local/lib/python3.8/dist-packages/scrapy/core/downloader/middleware.py", line 41, in process_request
        response = yield deferred_from_coro(method(request=request, spider=spider))
      File "/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py", line 1030, in adapt
        extracted = result.result()
      File "/usr/local/lib/python3.8/dist-packages/gerapy_playwright/downloadermiddlewares.py", line 243, in _process_request
        context = await browser.new_context(
      File "/usr/local/lib/python3.8/dist-packages/playwright/async_api/_generated.py", line 11254, in new_context
        await self._async(
      File "/usr/local/lib/python3.8/dist-packages/playwright/_impl/_browser.py", line 117, in new_context
        channel = await self._channel.send("newContext", params)
      File "/usr/local/lib/python3.8/dist-packages/playwright/_impl/_connection.py", line 39, in send
        return await self.inner_send(method, params, False)
      File "/usr/local/lib/python3.8/dist-packages/playwright/_impl/_connection.py", line 63, in inner_send
        result = next(iter(done)).result()
    playwright._impl._api_types.Error: Browser closed.
==================== Browser output: ==================== /ms-playwright/chromium-978106/chrome-linux/chrome --disable-background-networking --enable-features=NetworkService,NetworkServiceInProcess --disable-background-timer-throttling --disable-backgrounding-occluded-windows --disable-breakpad --disable-client-side-phishing-detection --disable-component-extensions-with-background-pages --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=ImprovedCookieControls,LazyFrameLoading,GlobalMediaControls,DestroyProfileOnBrowserClose,MediaRouter,AcceptCHFrame,AutoExpandDetailsElement --allow-pre-commit-input --disable-hang-monitor --disable-ipc-flooding-protection --disable-popup-blocking --disable-prompt-on-repost --disable-renderer-backgrounding --disable-sync --force-color-profile=srgb --metrics-recording-only --no-first-run --enable-automation --password-store=basic --use-mock-keychain --no-service-autorun --export-tagged-pdf --headless --hide-scrollbars --mute-audio --blink-settings=primaryHoverType=2,availableHoverTypes=2,primaryPointerType=4,availablePointerTypes=4 --no-sandbox --disable-extensions --hide-scrollbars --mute-audio --no-sandbox --disable-setuid-sandbox --disable-gpu --user-data-dir=/tmp/playwright_chromiumdev_profile-LGppgb --remote-debugging-pipe --no-startup-window pid=1185 [pid=1185][err] [0323/062041.773971:ERROR:platform_thread_posix.cc(151)] pthread_create: Resource temporarily unavailable (11) [pid=1185][err] [0323/062041.774268:ERROR:platform_thread_posix.cc(151)] pthread_create: Resource temporarily unavailable (11) [pid=1185][err] [0323/062041.778128:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] [0323/062041.778090:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.778548:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] 
[0323/062041.778568:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 1 time(s) [pid=1185][err] [0323/062041.778540:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.780496:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] [0323/062041.785120:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.785947:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] [0323/062041.785963:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 2 time(s) [pid=1185][err] [0323/062041.786835:ERROR:bus.cc(397)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory [pid=1185][err] [0323/062041.786892:ERROR:bus.cc(397)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory [pid=1185][err] [0323/062041.787157:WARNING:bluez_dbus_manager.cc(248)] Floss manager not present, cannot set Floss enable/disable. 
[pid=1185][err] [0323/062041.787335:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.816287:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] [0323/062041.815965:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.816903:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] [0323/062041.816914:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 3 time(s) [pid=1185][err] [0323/062041.816721:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.821091:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] [0323/062041.821123:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.821310:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] [0323/062041.821321:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 4 time(s) [pid=1185][err] [0323/062041.822089:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.823058:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] [0323/062041.823172:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.823358:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] [0323/062041.823369:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 5 time(s) [pid=1185][err] [0323/062041.824213:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.825010:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] 
[0323/062041.825129:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.825312:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] [0323/062041.825323:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 6 time(s) [pid=1185][err] [0323/062041.825608:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.825332:FATAL:gpu_data_manager_impl_private.cc(447)] GPU process isn't usable. Goodbye. [pid=1185][err] #0 0x55f9da32a369 base::debug::CollectStackTrace() [pid=1185][err] #1 0x55f9da2908c3 base::debug::StackTrace::StackTrace() [pid=1185][err] #2 0x55f9da2a3650 logging::LogMessage::~LogMessage() [pid=1185][err] #3 0x55f9d7e92bf7 content::(anonymous namespace)::IntentionallyCrashBrowserForUnusableGpuProcess() [pid=1185][err] #4 0x55f9d7e903fe content::GpuDataManagerImplPrivate::FallBackToNextGpuMode() [pid=1185][err] #5 0x55f9d7e8f303 content::GpuDataManagerImpl::FallBackToNextGpuMode() [pid=1185][err] #6 0x55f9d7e99d13 content::GpuProcessHost::RecordProcessCrash() [pid=1185][err] #7 0x55f9d7e9af44 content::GpuProcessHost::OnProcessLaunchFailed() [pid=1185][err] #8 0x55f9d7d15421 content::BrowserChildProcessHostImpl::OnProcessLaunchFailed() [pid=1185][err] #9 0x55f9d7d6faf5 content::internal::ChildProcessLauncherHelper::PostLaunchOnClientThread() [pid=1185][err] #10 0x55f9d7d6fd15 base::internal::Invoker<>::RunOnce() [pid=1185][err] #11 0x55f9da2e8bb0 base::TaskAnnotator::RunTaskImpl() [pid=1185][err] #12 0x55f9da2fca99 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl() [pid=1185][err] #13 0x55f9da2fc7bc base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork() [pid=1185][err] #14 0x55f9da2fcf92 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork() [pid=1185][err] #15 0x55f9da2ac06b base::(anonymous 
namespace)::WorkSourceDispatch() [pid=1185][err] #16 0x7fe589c2b17d g_main_context_dispatch [pid=1185][err] #17 0x7fe589c2b400 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.6+0x523ff) [pid=1185][err] #18 0x7fe589c2b4a3 g_main_context_iteration [pid=1185][err] #19 0x55f9da2abeb3 base::MessagePumpGlib::Run() [pid=1185][err] #20 0x55f9da2fd1fe base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run() [pid=1185][err] #21 0x55f9da2ca3ed base::RunLoop::Run() [pid=1185][err] #22 0x55f9d7d2d2ad content::BrowserMainLoop::RunMainMessageLoop() [pid=1185][err] #23 0x55f9d7d2eb62 content::BrowserMainRunnerImpl::Run() [pid=1185][err] #24 0x55f9df75683e headless::HeadlessContentMainDelegate::RunProcess() [pid=1185][err] #25 0x55f9d9e42862 content::RunBrowserProcessMain() [pid=1185][err] #26 0x55f9d9e43d0f content::ContentMainRunnerImpl::RunBrowser() [pid=1185][err] #27 0x55f9d9e4389f content::ContentMainRunnerImpl::Run() [pid=1185][err] #28 0x55f9d9e40cb4 content::RunContentProcess() [pid=1185][err] #29 0x55f9d9e415ce content::ContentMain() [pid=1185][err] #30 0x55f9d9e9cc5a headless::(anonymous namespace)::RunContentMain() [pid=1185][err] #31 0x55f9d9e9c965 headless::HeadlessShellMain() [pid=1185][err] #32 0x55f9d6961fa8 ChromeMain [pid=1185][err] #33 0x7fe588ea60b3 __libc_start_main [pid=1185][err] #34 0x55f9d6961dea _start [pid=1185][err] Task trace: [pid=1185][err] #0 0x55f9d7d6f9ac content::internal::ChildProcessLauncherHelper::PostLaunchOnLauncherThread() [pid=1185][err] #1 0x55f9d7d6f3aa content::internal::ChildProcessLauncherHelper::StartLaunchOnClientThread() [pid=1185][err] #2 0x55f9da682456 mojo::SimpleWatcher::Context::Notify() [pid=1185][err] #3 0x55f9d7d6f3aa content::internal::ChildProcessLauncherHelper::StartLaunchOnClientThread() [pid=1185][err] #4 0x55f9da682456 mojo::SimpleWatcher::Context::Notify() [pid=1185][err] Task trace buffer limit hit, update PendingTask::kTaskBacktraceLength to increase. [pid=1185][err]

    opened by yyyy777 0
Releases(v0.2.3)
  • v0.2.3(Jan 11, 2022)

  • v0.2.0(Dec 28, 2021)

    • New Feature: Add support for:
      • Specifying channel for launching
      • Specifying executablePath for launching
      • Specifying slowMo for launching
      • Specifying devtools for launching
      • Specifying --disable-extensions in args for launching
      • Specifying --hide-scrollbars in args for launching
      • Specifying --no-sandbox in args for launching
      • Specifying --disable-setuid-sandbox in args for launching
      • Specifying --disable-gpu in args for launching
    • Update: change GERAPY_PLAYWRIGHT_SLEEP default to 0
  • v0.1.2(Dec 28, 2021)

  • v0.1.1(Dec 27, 2021)

    First version of GerapyPlaywright, with basic support for:

    • Auto Installation
    • Render with Playwright
    • Setting Concurrency
    • Setting Proxy
    • Setting Cookies
    • Screenshot
    • Evaluating Script
    • Wait for Elements
    • Wait loading control
    • Setting Timeout
    • Pretending Webdriver
Owner
Gerapy
Distributed Crawler Management Framework Based on Scrapy, Scrapyd.