A Python module to bypass Cloudflare's anti-bot page.

VeNoMouS

Last update: Dec 31, 2022

Overview

cloudscraper

A simple Python module to bypass Cloudflare's anti-bot page (also known as "I'm Under Attack Mode", or IUAM), implemented with Requests. Cloudflare changes their techniques periodically, so I will update this repo frequently.

This can be useful if you wish to scrape or crawl a website protected with Cloudflare. Cloudflare's anti-bot page currently just checks if the client supports Javascript, though they may add additional techniques in the future.

Due to Cloudflare continually changing and hardening their protection page, cloudscraper requires a JavaScript Engine/interpreter to solve Javascript challenges. This allows the script to easily impersonate a regular web browser without explicitly deobfuscating and parsing Cloudflare's Javascript.

For reference, this is the default message Cloudflare uses for these sorts of pages:

Checking your browser before accessing website.com.

This process is automatic. Your browser will redirect to your requested content shortly.

Please allow up to 5 seconds...

Any script using cloudscraper will sleep for ~5 seconds for the first visit to any site with Cloudflare anti-bots enabled, though no delay will occur after the first request.

Donations

If you feel like showing your love and/or appreciation for this project, then how about shouting me a coffee or beer :)

Installation

Simply run pip install cloudscraper. The PyPI package is at https://pypi.python.org/pypi/cloudscraper/

Alternatively, clone this repository and run python setup.py install.

Dependencies

Python 3.x
Requests >= 2.9.2
requests_toolbelt >= 0.9.1

python setup.py install will install the Python dependencies automatically. The only things you need to install yourself are the JavaScript interpreters and/or engines you decide to use, excluding js2py, which is part of the requirements as the default.

Javascript Interpreters and Engines

We support the following Javascript interpreters/engines.

ChakraCore: Library binaries can also be located here.
js2py: >=0.67
native: Self made native python solver (Default)
Node.js
V8: We use Sony's v8eval() python module.

Updates

Cloudflare modifies their anti-bot protection page occasionally. So far it has changed about once per year on average.

If you notice that the anti-bot page has changed, or if this module suddenly stops working, please create a GitHub issue so that I can update the code accordingly.

Many issues are a result of users not updating to the latest release of this project. Before filing an issue, please run the following command:

pip show cloudscraper

If the value of the version field is not the latest release, please run the following to update your package:

pip install cloudscraper -U

If you are still encountering a problem, open an issue and please include:

The full exception and stack trace.
The URL of the Cloudflare-protected page which the script does not work on.
A Pastebin or Gist containing the HTML source of the protected page.
The version number from pip show cloudscraper.
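
The version check above can also be scripted, which is handy when collecting details for an issue. This is a minimal sketch using only the standard library (importlib.metadata, available from Python 3.8); the helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed cloudscraper version, or None if the package is missing.
print(installed_version("cloudscraper"))
```

Compare the printed value with the latest release on PyPI before opening an issue.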

Usage

The simplest way to use cloudscraper is by calling create_scraper().

import cloudscraper

scraper = cloudscraper.create_scraper()  # returns a CloudScraper instance
# Or: scraper = cloudscraper.CloudScraper()  # CloudScraper inherits from requests.Session
print(scraper.get("http://somesite.com").text)  # => "<!DOCTYPE html><html><head>..."

That's it...

Any requests made from this session object to websites protected by Cloudflare anti-bot will be handled automatically. Websites not using Cloudflare will be treated normally. You don't need to configure or call anything further, and you can effectively treat all websites as if they're not protected with anything.

You use cloudscraper exactly the same way you use Requests. cloudscraper works identically to a Requests Session object; instead of calling requests.get() or requests.post(), you call scraper.get() or scraper.post().

Consult Requests' documentation for more information.
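
Because the scraper behaves like a Requests session, code written against the session interface should work with either object. The sketch below assumes only the standard Requests calls get(), raise_for_status() and the text attribute; the helper name fetch_text and the 30 second timeout are illustrative choices, not part of cloudscraper.

```python
def fetch_text(session, url, timeout=30):
    """Fetch a URL with any Requests-style session and return the body as text."""
    response = session.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of returning an error page.
    response.raise_for_status()
    return response.text
```

With cloudscraper this would be called as fetch_text(cloudscraper.create_scraper(), "http://somesite.com"); passing a plain requests.Session() works the same way for sites that are not behind Cloudflare.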

Options

Brotli

Description

Brotli decompression support has been added, and it is enabled by default.

Parameters

Parameter	Value	Default
allow_brotli	(boolean)	True

Example

scraper = cloudscraper.create_scraper(allow_brotli=False)

Browser / User-Agent Filtering

Description

Control how and which User-Agent is "randomly" selected.

Parameters

Can be passed as an argument to create_scraper(), get_tokens(), get_cookie_string().

Parameter	Value	Default
browser	(string) `chrome` or `firefox`	None

Parameter	Value	Default
browser	(dict)

`browser` dict Parameters

Parameter	Value	Default
browser	(string) `chrome` or `firefox`	None
mobile	(boolean)	True
desktop	(boolean)	True
platform	(string) `'linux', 'windows', 'darwin', 'android', 'ios'`	None
custom	(string)	None

Example

scraper = cloudscraper.create_scraper(browser='chrome')

# will give you only mobile chrome User-Agents on Android
scraper = cloudscraper.create_scraper(
    browser={
        'browser': 'chrome',
        'platform': 'android',
        'desktop': False
    }
)

# will give you only desktop firefox User-Agents on Windows
scraper = cloudscraper.create_scraper(
    browser={
        'browser': 'firefox',
        'platform': 'windows',
        'mobile': False
    }
)

# Custom will also try find the user-agent string in the browsers.json,
# If a match is found, it will use the headers and cipherSuite from that "browser",
# Otherwise a generic set of headers and cipherSuite will be used.
scraper = cloudscraper.create_scraper(
    browser={
        'custom': 'ScraperBot/1.0',
    }
)
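
When the filter values come from configuration or user input, it can help to validate them before handing them to create_scraper(). The following is a rough sketch, not part of cloudscraper: the helper browser_filter is hypothetical, the allowed values are copied from the table above, and rejecting mobile=False together with desktop=False is an assumption made here because that combination would leave no User-Agents to choose from.

```python
ALLOWED_BROWSERS = {"chrome", "firefox"}
ALLOWED_PLATFORMS = {"linux", "windows", "darwin", "android", "ios"}


def browser_filter(browser=None, platform=None, mobile=True, desktop=True):
    """Build a dict for the `browser` argument, rejecting values not listed in the table."""
    if browser is not None and browser not in ALLOWED_BROWSERS:
        raise ValueError("unsupported browser: {!r}".format(browser))
    if platform is not None and platform not in ALLOWED_PLATFORMS:
        raise ValueError("unsupported platform: {!r}".format(platform))
    if not (mobile or desktop):
        raise ValueError("mobile and desktop cannot both be False")
    options = {"mobile": mobile, "desktop": desktop}
    # Only include the optional keys that were actually requested.
    if browser is not None:
        options["browser"] = browser
    if platform is not None:
        options["platform"] = platform
    return options


print(browser_filter(browser="chrome", platform="android", desktop=False))
```

The returned dict would then be passed as cloudscraper.create_scraper(browser=browser_filter(...)).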

Debug

Description

Prints out header and content information of the request for debugging.

Parameters

Can be set as an attribute via your cloudscraper object or passed as an argument to create_scraper(), get_tokens(), get_cookie_string().

Parameter	Value	Default
debug	(boolean)	False

Example

scraper = cloudscraper.create_scraper(debug=True)

Delays

Description

The Cloudflare IUAM challenge requires the browser to wait ~5 seconds before submitting the challenge answer. Use the delay parameter if you would like to override this delay.

Parameters

Can be set as an attribute via your cloudscraper object or passed as an argument to create_scraper(), get_tokens(), get_cookie_string().

Parameter	Value	Default
delay	(float)	extracted from IUAM page

Example

scraper = cloudscraper.create_scraper(delay=10)

Existing session

Description

If you already have an existing Requests session, you can pass it to the function create_scraper() to continue using that session.

Parameters

Parameter	Value	Default
sess	(requests.session)	None

Example

import requests
import cloudscraper
session = requests.session()
scraper = cloudscraper.create_scraper(sess=session)

Note

Unfortunately, not all Requests session attributes are easily transferable, so if you run into problems with this, replace your initial session initialization call.

From:

sess = requests.session()

To:

sess = cloudscraper.create_scraper()

JavaScript Engines and Interpreters

Description

cloudscraper currently supports the following JavaScript Engines/Interpreters

ChakraCore
js2py
native: Self made native python solver (Default)
Node.js
V8

Parameters

Can be set as an attribute via your cloudscraper object or passed as an argument to create_scraper(), get_tokens(), get_cookie_string().

Parameter	Value	Default
interpreter	(string)	`native`

Example

scraper = cloudscraper.create_scraper(interpreter='nodejs')

3rd Party Captcha Solvers

Description

cloudscraper currently supports the following 3rd party Captcha solvers, should you require them.

2captcha
anticaptcha
CapMonster Cloud
deathbycaptcha
9kw
return_response

Note

I am working on adding more 3rd party solvers. If you wish to have a service added that is not currently supported, please raise a support ticket on GitHub.

Required Parameters

Can be set as an attribute via your cloudscraper object or passed as an argument to create_scraper(), get_tokens(), get_cookie_string().

Parameter	Value	Default
captcha	(dict)	None

2captcha

Required `captcha` Parameters

Parameter	Value	Required	Default
provider	(string) `2captcha`	yes
api_key	(string)	yes
no_proxy	(boolean)	no	False

Note

If proxies are set, you can disable sending the proxies to 2captcha by setting no_proxy to True.

Example

scraper = cloudscraper.create_scraper(
  interpreter='nodejs',
  captcha={
    'provider': '2captcha',
    'api_key': 'your_2captcha_api_key'
  }
)

anticaptcha

Required `captcha` Parameters

Parameter	Value	Required	Default
provider	(string) `anticaptcha`	yes
api_key	(string)	yes
no_proxy	(boolean)	no	False

Note

If proxies are set, you can disable sending the proxies to anticaptcha by setting no_proxy to True.

Example

scraper = cloudscraper.create_scraper(
  interpreter='nodejs',
  captcha={
    'provider': 'anticaptcha',
    'api_key': 'your_anticaptcha_api_key'
  }
)

CapMonster Cloud

Required `captcha` Parameters

Parameter	Value	Required	Default
provider	(string) `capmonster`	yes
clientKey	(string)	yes
no_proxy	(boolean)	no	False

Note

If proxies are set, you can disable sending the proxies to CapMonster by setting no_proxy to True.

Example

scraper = cloudscraper.create_scraper(
  interpreter='nodejs',
  captcha={
    'provider': 'capmonster',
    'clientKey': 'your_capmonster_clientKey'
  }
)

deathbycaptcha

Required `captcha` Parameters

Parameter	Value	Required
provider	(string) `deathbycaptcha`	yes
username	(string)	yes
password	(string)	yes

Example

scraper = cloudscraper.create_scraper(
  interpreter='nodejs',
  captcha={
    'provider': 'deathbycaptcha',
    'username': 'your_deathbycaptcha_username',
    'password': 'your_deathbycaptcha_password',
  }
)

9kw

Required `captcha` Parameters

Parameter	Value	Required	Default
provider	(string) `9kw`	yes
api_key	(string)	yes
maxtimeout	(int)	no	180

Example

scraper = cloudscraper.create_scraper(
  interpreter='nodejs',
  captcha={
    'provider': '9kw',
    'api_key': 'your_9kw_api_key',
    'maxtimeout': 300
  }
)

return_response

Use this if you want the requests response payload without solving the Captcha.

Required `captcha` Parameters

Parameter	Value	Required	Default
provider	(string) `return_response`	yes

Example

scraper = cloudscraper.create_scraper(
  interpreter='nodejs',
  captcha={'provider': 'return_response'}
)
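
The examples above hard-code the provider credentials. One possible way to keep keys out of source files is to read them from the environment. This is only a sketch: the helper captcha_from_env and the variable name CLOUDSCRAPER_CAPTCHA_KEY are invented for this example, and it covers only the providers whose tables above list an api_key parameter (2captcha, anticaptcha, 9kw).

```python
import os


def captcha_from_env(provider, env=None):
    """Build a `captcha` dict for an api_key-based provider, reading the key from the environment."""
    env = os.environ if env is None else env
    # CLOUDSCRAPER_CAPTCHA_KEY is a made-up variable name for this example.
    api_key = env.get("CLOUDSCRAPER_CAPTCHA_KEY")
    if not api_key:
        raise RuntimeError("CLOUDSCRAPER_CAPTCHA_KEY is not set")
    return {"provider": provider, "api_key": api_key}
```

The result would be passed as cloudscraper.create_scraper(captcha=captcha_from_env('2captcha')).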

Integration

It's easy to integrate cloudscraper with other applications and tools. Cloudflare uses two cookies as tokens: one to verify you made it past their challenge page and one to track your session. To bypass the challenge page, simply include both of these cookies (with the appropriate user-agent) in all HTTP requests you make.

To retrieve just the cookies (as a dictionary), use cloudscraper.get_tokens(). To retrieve them as a full Cookie HTTP header, use cloudscraper.get_cookie_string().

get_tokens and get_cookie_string both accept Requests' usual keyword arguments (like get_tokens(url, proxies={"http": "socks5://localhost:9050"})).

Please read Requests' documentation on request arguments for more information.

User-Agent Handling

The two integration functions return a tuple of (cookie, user_agent_string).

You must use the same user-agent string for obtaining tokens and for making requests with those tokens, otherwise Cloudflare will flag you as a bot.

That means you have to pass the returned user_agent_string to whatever script, tool, or service you are passing the tokens to (e.g. curl, or a specialized scraping tool), and it must use that passed user-agent when it makes HTTP requests.
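
As an illustration of keeping the cookies and the user-agent together, the sketch below builds a standard-library urllib request that carries both. The function name build_request is hypothetical and the cookie values are placeholders; in practice the tokens and user-agent would come from cloudscraper.get_tokens(). The request is only constructed here, not sent.

```python
import urllib.request


def build_request(url, tokens, user_agent):
    """Create a urllib Request carrying the clearance cookies and the matching User-Agent."""
    cookie_header = "; ".join(
        "{}={}".format(name, value) for name, value in tokens.items()
    )
    return urllib.request.Request(
        url,
        headers={"Cookie": cookie_header, "User-Agent": user_agent},
    )


# Placeholder values; real ones would come from cloudscraper.get_tokens(url).
request = build_request(
    "http://somesite.com",
    {"cf_clearance": "example-clearance", "__cfduid": "example-id"},
    "Some/User-Agent String",
)
print(request.get_header("Cookie"))
# => cf_clearance=example-clearance; __cfduid=example-id
```

Sending it with urllib.request.urlopen(request) would then present the same user-agent that was used to obtain the tokens.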

Integration examples

Remember, you must always use the same user-agent when retrieving or using these cookies. These functions all return a tuple of (cookie_dict, user_agent_string).

Retrieving a cookie dict through a proxy

get_tokens is a convenience function for returning a Python dict containing Cloudflare's session cookies. For demonstration, we will configure this request to use a proxy. (Please note that if you request Cloudflare clearance tokens through a proxy, you must always use the same proxy when those tokens are passed to the server. Cloudflare requires that the challenge-solving IP and the visitor IP stay the same.)

If you do not wish to use a proxy, just don't pass the proxies keyword argument. These convenience functions support all of Requests' normal keyword arguments, like params, data, and headers.

import cloudscraper

proxies = {"http": "http://localhost:8080", "https": "http://localhost:8080"}
tokens, user_agent = cloudscraper.get_tokens("http://somesite.com", proxies=proxies)
print(tokens)
# => {
#     'cf_clearance': 'c8f913c707b818b47aa328d81cab57c349b1eee5-1426733163-3600',
#     '__cfduid': 'dd8ec03dfdbcb8c2ea63e920f1335c1001426733158'
# }

Retrieving a cookie string

get_cookie_string is a convenience function for returning the tokens as a string for use as a Cookie HTTP header value.

This is useful when crafting an HTTP request manually, or working with an external application or library that passes on raw cookie headers.

import cloudscraper

cookie_value, user_agent = cloudscraper.get_cookie_string('http://somesite.com')

print('GET / HTTP/1.1\nCookie: {}\nUser-Agent: {}\n'.format(cookie_value, user_agent))

# GET / HTTP/1.1
# Cookie: cf_clearance=c8f913c707b818b47aa328d81cab57c349b1eee5-1426733163-3600; __cfduid=dd8ec03dfdbcb8c2ea63e920f1335c1001426733158
# User-Agent: Some/User-Agent String

curl example

Here is an example of integrating cloudscraper with curl. As you can see, all you have to do is pass the cookies and user-agent to curl.

import subprocess
import cloudscraper

# With get_tokens() cookie dict:

# tokens, user_agent = cloudscraper.get_tokens("http://somesite.com")
# cookie_arg = 'cf_clearance={}; __cfduid={}'.format(tokens['cf_clearance'], tokens['__cfduid'])

# With get_cookie_string() cookie header; recommended for curl and similar external applications:

cookie_arg, user_agent = cloudscraper.get_cookie_string('http://somesite.com')

# With a custom user-agent string you can optionally provide:

# ua = "Scraping Bot"
# cookie_arg, user_agent = cloudscraper.get_cookie_string("http://somesite.com", user_agent=ua)

result = subprocess.check_output(
    [
        'curl',
        '--cookie',
        cookie_arg,
        '-A',
        user_agent,
        'http://somesite.com'
    ]
)

Trimmed down version. Prints page contents of any site protected with Cloudflare, via curl.

Warning: shell=True can be dangerous to use with subprocess in real code.

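import shlex

url = "http://somesite.com"
cookie_arg, user_agent = cloudscraper.get_cookie_string(url)
# Quote each value: the cookie and user-agent strings contain spaces and semicolons.
cmd = "curl --cookie {cookie_arg} -A {user_agent} {url}"
print(
    subprocess.check_output(
        cmd.format(
            cookie_arg=shlex.quote(cookie_arg),
            user_agent=shlex.quote(user_agent),
            url=shlex.quote(url)
        ),
        shell=True
    )
)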

Comments

Ugly fix for fake jschl_vc, pass params

They started to include a second fake form with bad params that we have to ignore. Challenge html code: https://gist.github.com/oczkers/b4f7408e81c70b9b32643690d2caf19e website: https://takefile.link

OrderedDict uses only the last value when there are duplicate keys, so we ended up with jschl_vc=1, pass="". I've fixed it by reversing the list before converting list->OrderedDict, so now it uses the first-seen values instead of the last-seen ones. It worked for this site but can probably be changed easily in the future, so this is an ugly fix and you probably don't want to merge this. We should use something more bulletproof, like a loop checking params one by one, or cutting part of the HTML code before the regex, etc.

opened by oczkers 10
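
The duplicate-key behaviour described in the pull request above can be reproduced with a few lines. This is a small sketch with invented form values, not the code from the pull request; it also shows a first-seen loop as one example of the more robust approach the author mentions.

```python
from collections import OrderedDict

# Form fields in page order: the real form first, then the decoy with bad values.
# The values are invented for illustration.
pairs = [
    ("jschl_vc", "real_vc"),
    ("pass", "real_pass"),
    ("jschl_vc", "1"),
    ("pass", ""),
]

last_wins = OrderedDict(pairs)             # duplicate keys keep the last value
first_wins = OrderedDict(reversed(pairs))  # reversed, so the first-seen value is assigned last

# A loop that only records a key the first time it appears keeps the page order too.
first_seen = OrderedDict()
for key, value in pairs:
    first_seen.setdefault(key, value)

print(last_wins["jschl_vc"], first_wins["jschl_vc"], first_seen["jschl_vc"])
# => 1 real_vc real_vc
```

Note that reversing also reverses the key order of the resulting dict, whereas the setdefault loop preserves it.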
Allow replacing actual call to perform HTTP request via subclassing

Basically, I have my own urllib wrapper that I'm maintaining almost entirely out of spite. It has its own cookie/UA/header/etc. management, and I'd like to be able to just wrap that instead of having to move things back and forth between it and the requests session continuously.

This change basically moves the actual calls to the parent super().request() call into a stub function, so I can subclass CloudScraper(), and then just replace the body of perform_request() with my own HTTP fetching machinery.

I'm not sure this is something of interest to really anyone other than myself, but it's also a really simple change (and could potentially be useful for testing purposes/mocking as well). Being able to plug in any arbitrary HTTP(S) transport seems like a nice feature too.

opened by fake-name 6
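
The pattern described in the pull request above, a request method that delegates the actual transport to an overridable stub, might look roughly like this. The classes are stand-ins written for illustration and are not cloudscraper's implementation; only the method name perform_request is taken from the description.

```python
class BaseScraper:
    """Stand-in for a scraper class; not cloudscraper's real implementation."""

    def request(self, method, url, **kwargs):
        # Challenge handling would wrap this call; the transport itself is delegated.
        return self.perform_request(method, url, **kwargs)

    def perform_request(self, method, url, **kwargs):
        raise NotImplementedError("the default transport would go here")


class RecordingScraper(BaseScraper):
    """Replaces the transport with a stub that records calls, e.g. for tests."""

    def __init__(self):
        self.calls = []

    def perform_request(self, method, url, **kwargs):
        self.calls.append((method, url))
        return "stubbed response for {} {}".format(method, url)


scraper = RecordingScraper()
print(scraper.request("GET", "http://somesite.com"))
# => stubbed response for GET http://somesite.com
```

The same hook is what makes mocking straightforward: a test can assert on the recorded calls without any network access.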
Added parameter to change the type of encryption used

I was having problems performing the handshake with some servers because they use 384-bit encryption, so I found a type that solves my problem: "secp384r1". I added the possibility for the user to choose the best algorithm for each use.

The main problem I had was handshake errors like: (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE]) sslv3 alert handshake failure (_ssl.c:1108)'))

opened by felipehertzer 4
fix: Add missing comma in `get_tokens`
This comma has most probably been left out unintentionally, leading to string concatenation between the two consecutive lines. This issue has been found automatically using a regular expression.
opened by mrshu 2
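
For readers unfamiliar with the bug class fixed above: Python joins adjacent string literals, so a missing comma silently merges two items. The strings below are illustrative and are not the actual lines from get_tokens.

```python
# Adjacent string literals are joined at compile time, so a missing comma
# silently merges two list items into one.
with_missing_comma = [
    "cf_clearance"
    "__cfduid",
]
with_comma = [
    "cf_clearance",
    "__cfduid",
]

print(with_missing_comma)  # => ['cf_clearance__cfduid']
print(with_comma)          # => ['cf_clearance', '__cfduid']
```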
Async integration

As it is not possible to create issues I'll ask here.

I am coming from aiocfscrape which was an async approach/reimplementation of cfscrape. cfscrape seems to be dead nowadays. Thus aiocfscrape would now do the bypassing by itself or rebasing on a new project. Either way, it would need to be rewritten.

As you seem to be fond of supporting various environments (e.g. multiple different JS engines and captcha services), I propose to add async support with aiohttp directly to this repo instead of leeching off this one.

Architecturally I'd put the different implementations (requests, aiohttp) similarly as the JS engine and captcha service into one place, where then the user can say he wants either one of them. The difference would be however that the user can tell the session async=True and it'll then get the async implementation instead of the requests one. Another way would be to just create a new module and tell the user to import from ...async.CloudScraper instead. This would also need second implementations of eg. the node js engine as we'd have to use async subprocesses instead of the usual one.

This would also mean the python version compatibility wouldn't be 3.x but rather at least 3.5.x or rather even 3.6 as 3.5 actually reached its end of life.

I'd be glad to create/maintain the async implementation.

opened by Nachtalb 2
Use requests.utils.extract_zipped_paths() to read browsers.json

I was packaging cloudscraper and requests in a zip file and had kludged a way to read browsers.json, when I found that requests already had a better solution that it uses to read certifi.cacert.pem.

I applied it to cloudscraper and thought I'd at least offer it to you. It's up to you, of course, whether you find this useful or not.

Thanks for making cloudscraper available.

opened by JimmXinu 1
Some fix in regex to pass a cloudflare

I had issues with a Cloudflare page (I added it to the tests folder), because a class in the form markup, or spaces, made the parsing go wrong.

For information, I successfully passed this Cloudflare page only with js2py; there were errors with native (I got a loop, so I think the challenge result is wrong).

opened by aroquemaurel 1
setup: exclude tests for default install

We probably don't need install tests for "normal" users and this is required to get gentoo ebuild working (package manager).

ps. You forgot to push new release/archive on github - latest is 1.2.9

opened by oczkers 1
Add tests

I made a couple of necessary fixes to pass some tests, and a couple are being skipped for the time being. If running tox and you have a .tox cache, you'll need to remove it to refresh dependencies. Unable to use make ci on travis-ci atm, related to https://github.com/pytest-dev/pytest-xdist/issues/187

Coverage from the CI build: https://coveralls.io/github/pro-src/cloudscraper.py

opened by ghost 1
Dev environment + CI configuration
.coveragerc in preference of not passing certain CLI arguments around

Update .gitignore to exclude generated tests/coverage related files

Update .gitignore to exclude .idea/ and .env (Prevent accidental inclusion)

.travis.yml for testing on CI (Coverage reporting isn't configured for it yet)

Makefile for common tasks

Pipfile for dev-dependency management and CI builds

Pipfile.lock to lock dev-dependencies for CI builds

requirements.txt for some integrations and convenience

Fixed some typos

Updated dependencies

tests/test_cloudscraper.py is just a placeholder

tox.ini - tox configuration file for pre-release testing, etc.

Other dev-dependencies as required by future tests

Prefers coveralls for coverage reporting
opened by ghost 1
CloudflareChallengeError: Detected a Cloudflare version 2 challenge

Hello, I got this error: CloudflareChallengeError: Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version. Can you help me?

opened by bL34ch-exe 0

Releases(1.2.65)

1.2.65(Nov 9, 2022)
Add new captcha provider Captcha AI

1.2.62(Aug 27, 2022)
Whats the point in even trying to detect v1 challenges anymore...

1.2.61(Aug 27, 2022)
Updated v2 detection

1.2.60(Mar 15, 2022)
Old code sitting in dev, pushed to prod..

1.2.58(Apr 10, 2021)
Cloudflare changing minor things that broke the regex.

Changed debug to support non printable UTF-8

1.2.56(Jan 28, 2021)
The arms race continues, updated Cloudflare's changes ... agaaaaaaaaiiiin

1.2.54(Jan 27, 2021)
Refactoring code and move away from supporting python 2

Bug fix Cloudflare v2 detection

1.2.52(Jan 7, 2021)
Added in "Bot Fight Mode" detection

1.2.50(Dec 25, 2020)
hCaptcha fix for deathbycaptcha

Added new captcha provider CapMonster Cloud

Replaced package polling with polling2 in captcha modules

1.2.46(Jul 27, 2020)
Removed debug from 2captcha (ooops my bad).

Added in no_proxy to captcha parameters if you don't want to send proxy to 2captcha / anticaptcha.

Added in platform filtering to browser (User-Agent) via platform parameter.

added doubleDown parameter to control if re-request is to be performed when Captcha is detected.

1.2.44(Jul 24, 2020)
Initial update to Captcha providers to support proxies (anti-captcha, 2captcha).

Some re-wording, comments and general house cleaning.

1.2.40(May 27, 2020)
~12 days have passed and Cloudflare updated again... they keeping to the schedule 👍

Fixed: Cloudflare V1 challenge change (broke regex by introducing blank a.value).

Fixed: string -> float -> string causes issues in py2 str() rounding precision

Enhancement: Added Pre/Posting Hooking into request function.

1.2.38(May 16, 2020)
Update regex for new Cloudflare changes in numerous places.

Updated JSFuck challenge for new dynamic k variable.

Updated interpreters to account for new dynamic k allocation from subset list.

1.2.36(May 4, 2020)
Update regex for Cloudflare form challenge

Overwrite auto_set_ecdh by manually setting elliptic curve

Rewrote native interpreter for JSFuck due to nested calculations

Added exception if new Cloudflare challenge detected.

Added support for hCaptcha in 9KW

1.2.34(Apr 22, 2020)
Add ability for custom ssl context to be passed

Added new timer to anticaptcha module

Fixed Cloudflare's challenge form change

Removed DNT from headers causing reCaptcha on some sites

Updated cipher suite for browsers

1.2.32(Apr 2, 2020)
hCaptcha support added (anticaptcha, 2captcha)

Fix for Cloudflare dual challenge form

cipherSuite update

1.2.30(Mar 20, 2020)
Refactored Exceptions classes.

Updated cipherSuites for Cloudflare.

1.2.28(Mar 11, 2020)

The good folks over at Cloudflare have changed something... yet again... and explicitly setting ALPN now causes challenge issues on Ubuntu and Windows.

1.2.26(Mar 4, 2020)
Cloudflare made a change, my change should circumvent theirs.

Force SSL Context ALPN to HTTP/1.1

1.2.24(Feb 18, 2020)
Just some refactoring / bug fixes

Refactored 302 Redirect on localized path with no schema.

@dipu-bd submitted PR for User_Agent.loadUserAgent() to close browser.json.

Thanks to @Fran008 , @TheYoke @paulitap88 , @vrayv and anyone else I missed for raising the tickets and testing the dev branches for me ❤️

cheers guys.
1.2.23(Feb 15, 2020)
Hotfix 1.2.23

Fix reCaptcha html entities

Fix 302 Redirect after challenge solve, fix redirecting to existing path via `./

1.2.22(Feb 14, 2020)
Custom Exceptions

Unescape HTML Entities Cloudflare introduced on the challenge request.

1.2.20(Jan 15, 2020)
Changed openSSL warning to a print instead of a raised exception.

Removed Brotli as a required dependency.

Updated cipher suites for User-Agents

Fixed a bug in matching custom User-Agents

1.2.18(Dec 25, 2019)
Hohoho Merry Christmas. …

Improve / re-implement redirection support

Also support http -> https protocol scheme switch on challenge solve

Re-word Cloudflare 1020 block message

Add cookie test

Updated README.md

1.2.16(Dec 12, 2019)
Has a number of fixes

New native python Cloudflare challenge solver (now default interpreter).

Cloudflare sometimes redirects instead of passthrough after challenge solve, re-request if is redirect.

Alert/Raise Error if Cloudflare 1020 firewall block detected.

Removed the requirement for pyopenssl

Split out pytests into dev requirements

Added custom parameter to browser param for custom User-Agent

Added a tryMatch code in conjunction with the custom parameter, which will try to find and load ciphers && headers if custom is found in browsers.json; otherwise a custom default will be used.

1.2.9(Nov 27, 2019)
IUAM

Endpoints have changed to detect parameter __cf_chl_jschl_tk__ with UUID, for the challenge solve

Method is now a POST, no longer a GET

Parameters have been removed, and are now instead data in the POST form

reCaptcha

Changes in IUAM apply here as well as the additional listed below

Endpoints have changed to detect parameter __cf_chl_captcha_tk__ with UUID, for the challenge solve

New id param in payload added, id derived from CF-RAY header, which is also in the variable data-ray

Testing

testing is disabled till I write some new tests.

1.2.8(Nov 12, 2019)

Fixed an issue with reCaptcha where if urllib3 < 1.25.1 and content was brotli compressed, it was not decompressing the brotli content.

1.2.7(Nov 6, 2019)

Removed cipher ECDHE-RSA-CHACHA20-POLY1305 to mitigate reCaptcha generation from Cloudflare

Removed Nothing -> RuntimeError ReCaptcha
Removed Nothing -> RuntimeError ReCaptcha
Removed TLS_CHACHA20_POLY1305_SHA256 -> RuntimeError ReCaptcha
Removed TLS_AES_128_GCM_SHA256 -> RuntimeError ReCaptcha
Removed TLS_AES_256_GCM_SHA384 -> RuntimeError ReCaptcha
Removed ECDHE-RSA-AES128-GCM-SHA256 -> 200
Removed AES128-GCM-SHA256 -> 200
Removed AES256-GCM-SHA384 -> 200
Removed AES256-SHA -> 200
Removed ECDHE-ECDSA-AES256-GCM-SHA384 -> 200
Removed ECDHE-ECDSA-CHACHA20-POLY1305 -> 200
Removed ECDHE-RSA-CHACHA20-POLY1305 -> 200
Removed ECDHE-ECDSA-AES128-GCM-SHA256 -> 200
Removed TLS_AES_128_GCM_SHA256 -> RuntimeError ReCaptcha
Removed TLS_AES_256_GCM_SHA384 -> RuntimeError ReCaptcha
Removed TLS_CHACHA20_POLY1305_SHA256 -> RuntimeError ReCaptcha
Removed ECDHE-ECDSA-AES128-GCM-SHA256 -> 200
Removed ECDHE-ECDSA-AES256-SHA -> 200
Removed ECDHE-RSA-CHACHA20-POLY1305 -> 200
Removed ECDHE-ECDSA-AES128-SHA -> 200
Removed ECDHE-RSA-AES128-GCM-SHA256 -> 200
Removed ECDHE-ECDSA-CHACHA20-POLY1305 -> 200
Removed DHE-RSA-AES256-SHA -> 200
Removed ECDHE-ECDSA-AES256-GCM-SHA384 -> 200
Removed AES256-SHA -> 200
Removed DHE-RSA-AES128-SHA -> 200

* Working list, by removing one of these ciphers in both browsers:
ECDHE-RSA-CHACHA20-POLY1305
ECDHE-RSA-AES128-GCM-SHA256
ECDHE-ECDSA-CHACHA20-POLY1305
ECDHE-ECDSA-AES256-GCM-SHA384
ECDHE-ECDSA-AES128-GCM-SHA256
AES256-SHA

+-------------------------------+--------+---------+------------+
|             Cipher            | Chrome | Firefox | Removable? |
+-------------------------------+--------+---------+------------+
|     TLS_AES_128_GCM_SHA256    |   X    |    X    |            |
|     TLS_AES_256_GCM_SHA384    |   X    |    X    |            |
|  TLS_CHACHA20_POLY1305_SHA256 |   X    |    X    |            |
| ECDHE-ECDSA-AES128-GCM-SHA256 |   X    |    X    |    Yes     |
|  ECDHE-RSA-AES128-GCM-SHA256  |   X    |    X    |    Yes     |
| ECDHE-ECDSA-AES256-GCM-SHA384 |   X    |    X    |    Yes     |
| ECDHE-ECDSA-CHACHA20-POLY1305 |   X    |    X    |    Yes     |
|  ECDHE-RSA-CHACHA20-POLY1305  |   X    |    X    |    Yes     |
|       AES128-GCM-SHA256       |   X    |         |            |
|       AES256-GCM-SHA384       |   X    |         |            |
|           AES256-SHA          |   X    |    X    |    Yes     |
|     ECDHE-ECDSA-AES256-SHA    |        |    X    |            |
|     ECDHE-ECDSA-AES128-SHA    |        |    X    |            |
|       DHE-RSA-AES128-SHA      |        |    X    |            |
|       DHE-RSA-AES256-SHA      |        |    X    |            |
+-------------------------------+--------+---------+------------+


1.2.5(Oct 23, 2019)
Removed cipher ECDHE-RSA-AES256-GCM-SHA384 to mitigate reCaptcha generation from Cloudflare

1.2.2(Oct 9, 2019)

Fix reCaptcha class solveCaptcha params.

Owner

VeNoMouS

Discord: VeNoMouSNZ#5979

GitHub


