Scrape scientific-name information for plants from Agroforestry Species Switchboard 2.0.

Overview

Agroforestry Species Switchboard 2.0 Scraper

Scrape scientific-name information for plants from Species Switchboard 2.0.

Requirements

How to run

  1. Copy the sample environment file and install dependencies:
    cp env.sample .env
    pipenv --python 3
    pipenv install
  2. Run the scraper:
    pipenv run python main.py
  3. The results are written to a file matching result.*.csv.
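To inspect the output from Python, a minimal sketch is below. It assumes the * in result.*.csv differs between runs and that the most recently modified file is the one you want; load_latest_result is a hypothetical helper (not part of main.py), and the column names are taken from the CSV header rather than assumed.

```python
import csv
from pathlib import Path


def load_latest_result(directory="."):
    """Return the rows of the most recently modified result.*.csv file.

    Hypothetical helper: it relies only on the result.*.csv naming
    pattern and reads column names from the file's own header row.
    """
    candidates = list(Path(directory).glob("result.*.csv"))
    if not candidates:
        raise FileNotFoundError("no result.*.csv file found in " + str(directory))
    # Pick the newest file by modification time.
    latest = max(candidates, key=lambda path: path.stat().st_mtime)
    with latest.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, len(load_latest_result()) gives the number of rows written by the latest run.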

Test Shell

pipenv run scrapy shell 'http://apps.worldagroforestry.org/products/switchboard/index.php/species_search/Acacia%20abyssinica'
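The URL above is the search endpoint followed by the percent-encoded scientific name. A minimal sketch for building the same URL for another species, assuming every search uses this endpoint; species_search_url is a hypothetical helper, not necessarily how main.py builds its requests:

```python
from urllib.parse import quote

# Search endpoint copied from the scrapy shell example above.
SEARCH_BASE = (
    "http://apps.worldagroforestry.org/products/switchboard/index.php/species_search/"
)


def species_search_url(scientific_name):
    """Return the search URL for one scientific name (hypothetical helper).

    quote() percent-encodes the name, so "Acacia abyssinica" becomes
    "Acacia%20abyssinica"; safe="" also encodes any "/" in the input.
    """
    return SEARCH_BASE + quote(scientific_name.strip(), safe="")
```

The returned string can be passed to the scrapy shell command above in place of the hard-coded URL.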

Cleanup All Outputs

rm result.* && rm log.*

Special Cases

Case                       | Link                 | Note
ICRAF Databases Not Found  | Engelhardia spicata  |
Genus Found                | Forficula            | What to do next?
Multiple Species Found     | Alstonia spectabilis | Get the matched species right?
Species Variant Found      | Engelhardtia spicata | Needs manual check
Similar Species Found      | Costus speciosus     | Needs manual check
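The table suggests a rough decision procedure. A hypothetical sketch of it, assuming the scraper already has the list of names returned for a query; the labels, the 0.9 similarity cutoff and the use of difflib are illustrative choices rather than the scraper's actual logic, and the Genus Found case is not handled:

```python
from difflib import get_close_matches


def classify_match(query, found_names, cutoff=0.9):
    """Roughly sort a search result into the special cases listed above.

    Hypothetical sketch: labels, cutoff and rules are assumptions.
    Returns a (label, names) tuple.
    """
    query_norm = query.strip().lower()
    names = [name.strip() for name in found_names if name.strip()]
    if not names:
        return "not_found", []
    exact = [name for name in names if name.lower() == query_norm]
    if len(exact) == 1 and len(names) == 1:
        return "exact", exact
    if exact:
        # The queried name is present, but other species came back too.
        return "multiple_species", names
    lowered = {name.lower(): name for name in names}
    close = get_close_matches(query_norm, list(lowered), n=3, cutoff=cutoff)
    if close:
        # e.g. a one-letter spelling variant: a human should confirm it.
        return "needs_review", [lowered[match] for match in close]
    return "no_match", names
```

A result labelled needs_review corresponds to the rows marked as needing a manual check.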

Contributing

  1. Fork this repo
  2. Develop
  3. Create pull request
  4. Tag @rizqirizqi for review
  5. Merge

License

GPL-3.0

Comments
  • Bump twisted from 21.7.0 to 22.4.0

    Bumps twisted from 21.7.0 to 22.4.0.

    Release notes

    Sourced from twisted's releases.

    Twisted 22.4.0 (2022-04-11)

    Features

    • twisted.python.failure.Failure tracebacks now capture module information, improving compatibility with the Raven Sentry client. (#7796)
    • twisted.python.failure.Failure objects are now compatible with dis.distb, improving compatibility with post-mortem debuggers. (#9599)

    Bugfixes

    • twisted.internet.interfaces.IReactorSSL.listenSSL now has correct type annotations. (#10274)
    • twisted.internet.test.test_glibbase.GlibReactorBaseTests now passes. (#10317)

    Conch

    Features

    • twisted.conch.ssh now supports using RSA keys with SHA-2 signatures (RFC 8332) when acting as a server. The rsa-sha2-512 and rsa-sha2-256 public key signature algorithms are automatically preferred over ssh-rsa if the client advertises support for them; the actual public keys do not need to change. (#9765)
    • twisted.conch.ssh now has an alternative Ed25519 implementation using PyNaCl, in order to support platforms that lack OpenSSL >= 1.1.1b. The new "conch_nacl" extra has the necessary dependency. (#10208)

    Misc

    • #10313

    Web

    Features

    • Twisted is now compatible with h2 4.x.x. (#10182)

    Bugfixes

    • twisted.web.http had several defects in HTTP request parsing that could permit HTTP request smuggling. It now disallows signed Content-Length headers, forbids illegal characters in chunked extensions, forbids a 0x prefix to chunk lengths, and only strips spaces and horizontal tab characters from header values. These changes address CVE-2022-24801 and GHSA-c2jg-hw38-jrqq. (#10323)
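    The stricter rules are easy to illustrate in plain Python, whose own int() accepts values such as "+5" and even int("0x1a", 16). A minimal stdlib sketch, assuming a hypothetical parser that already has the raw Content-Length value and chunk-size line as strings; the function names and the exact set of characters rejected in chunk extensions are assumptions, and this is not Twisted's code:

```python
import re

# Content-Length: ASCII digits only, so no sign, whitespace or "0x" prefix.
_CONTENT_LENGTH = re.compile(r"[0-9]+")
# Chunk size: hex digits only, so a "0x" prefix is rejected.
_CHUNK_SIZE = re.compile(r"[0-9A-Fa-f]+")


def parse_content_length(value):
    """Return Content-Length as an int, rejecting values such as '+5' or ' 5'."""
    if not _CONTENT_LENGTH.fullmatch(value):
        raise ValueError("invalid Content-Length: %r" % value)
    return int(value)


def parse_chunk_size(line):
    """Parse a chunk-size line, rejecting '0x' prefixes and control characters."""
    size, _, extension = line.partition(";")
    if not _CHUNK_SIZE.fullmatch(size):
        raise ValueError("invalid chunk size: %r" % size)
    # Assumption: control characters other than horizontal tab are illegal.
    if any((ord(ch) < 0x20 and ch != "\t") or ord(ch) == 0x7F for ch in extension):
        raise ValueError("illegal character in chunk extension")
    return int(size, 16)


def clean_header_value(value):
    """Strip only spaces and horizontal tabs, leaving other whitespace intact."""
    return value.strip(" \t")
```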

    Mail

    ... (truncated)

    Changelog

    Sourced from twisted's changelog.

    Twisted 22.4.0 (2022-04-11)

    Features

    • twisted.python.failure.Failure tracebacks now capture module information, improving compatibility with the Raven Sentry client. (#7796)
    • twisted.python.failure.Failure objects are now compatible with dis.distb, improving compatibility with post-mortem debuggers. (#9599)

    Bugfixes

    • twisted.internet.interfaces.IReactorSSL.listenSSL now has correct type annotations. (#10274)
    • twisted.internet.test.test_glibbase.GlibReactorBaseTests now passes. (#10317)

    Conch

    Features

    • twisted.conch.ssh now supports using RSA keys with SHA-2 signatures (RFC 8332) when acting as a server. The rsa-sha2-512 and rsa-sha2-256 public key signature algorithms are automatically preferred over ssh-rsa if the client advertises support for them; the actual public keys do not need to change. (#9765)
    • twisted.conch.ssh now has an alternative Ed25519 implementation using PyNaCl, in order to support platforms that lack OpenSSL >= 1.1.1b. The new "conch_nacl" extra has the necessary dependency. (#10208)

    Misc

    Web

    Features

    • Twisted is now compatible with h2 4.x.x. (#10182)

    Bugfixes

    • twisted.web.http had several defects in HTTP request parsing that could permit HTTP request smuggling. It now disallows signed Content-Length headers, forbids illegal characters in chunked extensions, forbids a 0x prefix to chunk lengths, and only strips spaces and horizontal tab characters from header values. These changes address CVE-2022-24801 and GHSA-c2jg-hw38-jrqq. (#10323)

    Mail

    ... (truncated)

    Commits
    • ed86633 Mark as misc.
    • c894617 Update format for release notes item.
    • 5c5c046 Revert coverage reporting changes.
    • 682f2c3 Manual fix the news.
    • dd98e9c python -m incremental.update Twisted --newversion 22.4.0
    • 3eabae5 Fix coverage reporting as codecov v1 was removed.
    • a265267 Update after review.
    • efac92c tox -e towncrier
    • 5ece2d4 python -m incremental.update Twisted --rc
    • 592217e Merge pull request from GHSA-c2jg-hw38-jrqq
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

    You can disable automated security fix PRs for this repo from the Security Alerts page.
    dependencies 
    opened by dependabot[bot] 1
  • Bump scrapy from 2.5.1 to 2.7.0

    Bumps scrapy from 2.5.1 to 2.7.0.

    Release notes

    Sourced from scrapy's releases.

    2.7.0

    See the full changelog

    2.6.3

    Makes pip install Scrapy work again.

    It required making changes to support pyOpenSSL 22.1.0. We had to drop support for SSLv3 as a result.

    We also upgraded the minimum versions of some dependencies.

    See the changelog.

    2.6.2

    Fixes a security issue around HTTP proxy usage, and addresses a few regressions introduced in Scrapy 2.6.0.

    See the changelog.

    2.6.1

    Fixes a regression introduced in 2.6.0 that would unset the request method when following redirects.

    2.6.0

    • Security fixes for cookie handling (see details below)
    • Python 3.10 support
    • asyncio support is no longer considered experimental, and works out-of-the-box on Windows regardless of your Python version
    • Feed exports now support pathlib.Path output paths and per-feed item filtering and post-processing

    See the full changelog

    Security bug fixes

    • When a Request object with cookies defined gets a redirect response causing a new Request object to be scheduled, the cookies defined in the original Request object are no longer copied into the new Request object.

      If you manually set the Cookie header on a Request object and the domain name of the redirect URL is not an exact match for the domain of the URL of the original Request object, your Cookie header is now dropped from the new Request object.

      The old behavior could be exploited by an attacker to gain access to your cookies. Please see the cjvr-mfj7-j4j8 security advisory for more information.

      Note: It is still possible to enable the sharing of cookies between different domains with a shared domain suffix (e.g. example.com and any subdomain) by defining the shared domain suffix (e.g. example.com) as the cookie domain when defining your cookies. See the documentation of the Request class for more information.

    • When the domain of a cookie, either received in the Set-Cookie header of a response or defined in a Request object, is set to a public suffix (https://publicsuffix.org/), the cookie is now ignored unless the cookie domain is the same as the request domain.

      The old behavior could be exploited by an attacker to inject cookies from a controlled domain into your cookiejar that could be sent to other domains not controlled by the attacker. Please see the mfjm-vh54-3f96 security advisory for more information.
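    Both cookie rules come down to host comparisons. A stdlib sketch, assuming the helper names below (hypothetical, not Scrapy's API) and a tiny hard-coded stand-in for the Public Suffix List at https://publicsuffix.org/:

```python
from urllib.parse import urlsplit

# Hypothetical, tiny stand-in for the real Public Suffix List.
PUBLIC_SUFFIXES = {"com", "org", "co.uk", "github.io"}


def keep_manual_cookie_header(original_url, redirect_url):
    """Keep a manually set Cookie header only if the redirect host matches exactly."""
    return urlsplit(original_url).hostname == urlsplit(redirect_url).hostname


def accept_cookie_domain(cookie_domain, request_url):
    """Ignore cookies scoped to a public suffix unless it is the request host itself."""
    domain = cookie_domain.lstrip(".").lower()
    host = (urlsplit(request_url).hostname or "").lower()
    if domain in PUBLIC_SUFFIXES:
        return domain == host
    # Usual domain-match rule: the host itself or any subdomain of it.
    return host == domain or host.endswith("." + domain)
```

    The second rule still allows the sharing described in the note: a cookie with domain example.com is accepted for www.example.com.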

    Changelog

    Sourced from scrapy's changelog.

    Scrapy 2.7.0 (2022-10-17)

    Highlights:

    • Added Python 3.11 support, dropped Python 3.6 support
    • Improved support for asynchronous callbacks
    • Asyncio support is enabled by default on new projects
    • Output names of item fields can now be arbitrary strings
    • Centralized request fingerprinting configuration is now possible
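    For an existing project such as this one, the asyncio reactor remains an opt-in setting. A sketch of the settings.py line, assuming the project wants the same reactor that Scrapy 2.7 writes into newly generated projects (this repository's settings module is not shown here):

```python
# settings.py (sketch): opt in to the asyncio-based Twisted reactor,
# the value that Scrapy 2.7 uses in newly generated project templates.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```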

    Modified requirements

    Python 3.7 or greater is now required; support for Python 3.6 has been dropped.
    Support for the upcoming Python 3.11 has been added.

    The minimum required version of some dependencies has changed as well:

    • lxml: 3.5.0 → 4.3.0

    • Pillow (images pipeline): 4.0.0 → 7.1.0

    • zope.interface: 5.0.0 → 5.1.0

    (#5512, #5514, #5524, #5563, #5664, #5670, #5678)

    Deprecations

    • ImagesPipeline.thumb_path must now accept an item parameter (#5504, #5508).

    • The scrapy.downloadermiddlewares.decompression module is now deprecated (#5546, #5547).

    New features

    • The process_spider_output method of spider middlewares can now be defined as an asynchronous generator (#4978).

    ... (truncated)

    Commits
    • 20b79a0 Bump version: 2.6.2 → 2.7.0
    • 06c8f67 2.7 release notes (#5680)
    • ea6315b Merge pull request #5679 from wRAR/template-asyncio-reactor
    • 960a7f6 Verify that the installed asyncio event loop matches ASYNCIO_EVENT_LOOP (#5529)
    • 75bb516 Adapt tests to the new value of TWISTED_REACTOR for new projects
    • 0435751 Add async callback support to the parse command (#5577)
    • 22a59d0 CI: use the latest version of Ubuntu (#5675)
    • 62cc26e Change TWISTED_REACTOR in the default template.
    • 715c05d transport.producer.loseConnection() → transport.loseConnection() (#4995)
    • da9a2f8 Remove mention of minimum PyPy versions from the documentation (#5678)
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 1
  • Bump scrapy from 2.5.1 to 2.6.3

    Bumps scrapy from 2.5.1 to 2.6.3.

    Release notes

    Sourced from scrapy's releases.

    2.6.3

    Makes pip install Scrapy work again.

    It required making changes to support pyOpenSSL 22.1.0. We had to drop support for SSLv3 as a result.

    We also upgraded the minimum versions of some dependencies.

    See the changelog.

    2.6.2

    Fixes a security issue around HTTP proxy usage, and addresses a few regressions introduced in Scrapy 2.6.0.

    See the changelog.

    2.6.1

    Fixes a regression introduced in 2.6.0 that would unset the request method when following redirects.

    2.6.0

    • Security fixes for cookie handling (see details below)
    • Python 3.10 support
    • asyncio support is no longer considered experimental, and works out-of-the-box on Windows regardless of your Python version
    • Feed exports now support pathlib.Path output paths and per-feed item filtering and post-processing

    See the full changelog

    Security bug fixes

    • When a Request object with cookies defined gets a redirect response causing a new Request object to be scheduled, the cookies defined in the original Request object are no longer copied into the new Request object.

      If you manually set the Cookie header on a Request object and the domain name of the redirect URL is not an exact match for the domain of the URL of the original Request object, your Cookie header is now dropped from the new Request object.

      The old behavior could be exploited by an attacker to gain access to your cookies. Please see the cjvr-mfj7-j4j8 security advisory for more information.

      Note: It is still possible to enable the sharing of cookies between different domains with a shared domain suffix (e.g. example.com and any subdomain) by defining the shared domain suffix (e.g. example.com) as the cookie domain when defining your cookies. See the documentation of the Request class for more information.

    • When the domain of a cookie, either received in the Set-Cookie header of a response or defined in a Request object, is set to a public suffix (https://publicsuffix.org/), the cookie is now ignored unless the cookie domain is the same as the request domain.

      The old behavior could be exploited by an attacker to inject cookies from a controlled domain into your cookiejar that could be sent to other domains not controlled by the attacker. Please see the mfjm-vh54-3f96 security advisory for more information.

    Changelog

    Sourced from scrapy's changelog.

    Scrapy 2.6.3 (2022-09-27)

    • Added support for pyOpenSSL 22.1.0, removing support for SSLv3 (#5634, #5635, #5636).

    • Upgraded the minimum versions of the following dependencies:

      • cryptography: 2.0 → 3.3

      • pyOpenSSL: 16.2.0 → 21.0.0

      • service_identity: 16.0.0 → 18.1.0

      • Twisted: 17.9.0 → 18.9.0

      • zope.interface: 4.1.3 → 5.0.0

      (#5621, #5632)

    • Fixes test and documentation issues (#5612, #5617, #5631).
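    To check a locked environment against the new minimums, a numeric comparison is enough. A sketch, assuming plain dotted release versions (no pre-release or local tags) and a hypothetical locked mapping filled in by hand from Pipfile.lock or pip freeze:

```python
# Minimum versions required by Scrapy 2.6.3, copied from the list above.
MINIMUMS_2_6_3 = {
    "cryptography": "3.3",
    "pyOpenSSL": "21.0.0",
    "service_identity": "18.1.0",
    "Twisted": "18.9.0",
    "zope.interface": "5.0.0",
}


def version_key(version):
    """Turn '21.7.0' into (21, 7) so that '3.3' and '3.3.0' compare equal."""
    parts = [int(part) for part in version.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()  # trailing zeros do not change the version
    return tuple(parts)


def below_minimum(locked, minimums=MINIMUMS_2_6_3):
    """Return {package: (locked_version, required_version)} for outdated packages."""
    too_old = {}
    for package, required in minimums.items():
        current = locked.get(package)
        if current is not None and version_key(current) < version_key(required):
            too_old[package] = (current, required)
    return too_old
```

    For example, below_minimum({"Twisted": "21.7.0", "zope.interface": "4.7.2"}) returns {"zope.interface": ("4.7.2", "5.0.0")}.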

    Scrapy 2.6.2 (2022-07-25)

    Security bug fix:

    • When HttpProxyMiddleware processes a request with proxy metadata, and that proxy metadata includes proxy credentials, HttpProxyMiddleware sets the Proxy-Authorization header, but only if that header is not already set.

      There are third-party proxy-rotation downloader middlewares that set different proxy metadata every time they process a request.

      Because of request retries and redirects, the same request can be processed by downloader middlewares more than once, including both HttpProxyMiddleware and any third-party proxy-rotation downloader middleware.

      These third-party proxy-rotation downloader middlewares could change the proxy metadata of a request to a new value, but fail to remove the Proxy-Authorization header from the previous value of the proxy metadata, causing the credentials of one proxy to be sent

    ... (truncated)

    Commits
    • 4dc8e77 Bump version: 2.6.2 → 2.6.3
    • fa5945b 2.6.3: set a release date
    • e5ed046 Merge pull request #5637 from Gallaecio/support-latest-openssl
    • aec2d3a 2.6.3: update the release notes
    • fcc224f tox.ini cleanup
    • b00f312 Limit minium versions of mitmproxy
    • d3f82aa Merge pull request #5617 from Laerte/fix/tests-w3lib
    • efc11b3 zope.interface: 4.4.2 → 5.0.0 (setuptools #2017)
    • edd7cfe Update test-standard link in contributing docs (#5631)
    • 9f443e8 zope.interface: 4.1.3 → 4.4.2
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 1
  • Bump scrapy from 2.5.1 to 2.6.2

    Bumps scrapy from 2.5.1 to 2.6.2.

    Release notes

    Sourced from scrapy's releases.

    2.6.2

    Fixes a security issue around HTTP proxy usage, and addresses a few regressions introduced in Scrapy 2.6.0.

    See the changelog.

    2.6.1

    Fixes a regression introduced in 2.6.0 that would unset the request method when following redirects.

    2.6.0

    • Security fixes for cookie handling (see details below)
    • Python 3.10 support
    • asyncio support is no longer considered experimental, and works out-of-the-box on Windows regardless of your Python version
    • Feed exports now support pathlib.Path output paths and per-feed item filtering and post-processing

    See the full changelog

    Security bug fixes

    • When a Request object with cookies defined gets a redirect response causing a new Request object to be scheduled, the cookies defined in the original Request object are no longer copied into the new Request object.

      If you manually set the Cookie header on a Request object and the domain name of the redirect URL is not an exact match for the domain of the URL of the original Request object, your Cookie header is now dropped from the new Request object.

      The old behavior could be exploited by an attacker to gain access to your cookies. Please see the cjvr-mfj7-j4j8 security advisory for more information.

      Note: It is still possible to enable the sharing of cookies between different domains with a shared domain suffix (e.g. example.com and any subdomain) by defining the shared domain suffix (e.g. example.com) as the cookie domain when defining your cookies. See the documentation of the Request class for more information.

    • When the domain of a cookie, either received in the Set-Cookie header of a response or defined in a Request object, is set to a public suffix (https://publicsuffix.org/), the cookie is now ignored unless the cookie domain is the same as the request domain.

      The old behavior could be exploited by an attacker to inject cookies from a controlled domain into your cookiejar that could be sent to other domains not controlled by the attacker. Please see the mfjm-vh54-3f96 security advisory for more information.

    Changelog

    Sourced from scrapy's changelog.

    Scrapy 2.6.2 (2022-07-25)

    Security bug fix:

    • When HttpProxyMiddleware processes a request with proxy metadata, and that proxy metadata includes proxy credentials, HttpProxyMiddleware sets the Proxy-Authorization header, but only if that header is not already set.

      There are third-party proxy-rotation downloader middlewares that set different proxy metadata every time they process a request.

      Because of request retries and redirects, the same request can be processed by downloader middlewares more than once, including both HttpProxyMiddleware and any third-party proxy-rotation downloader middleware.

      These third-party proxy-rotation downloader middlewares could change the proxy metadata of a request to a new value, but fail to remove the Proxy-Authorization header from the previous value of the proxy metadata, causing the credentials of one proxy to be sent to a different proxy.

      To prevent the unintended leaking of proxy credentials, the behavior of HttpProxyMiddleware is now as follows when processing a request:

      • If the request being processed defines proxy metadata that includes credentials, the Proxy-Authorization header is always updated to feature those credentials.

      • If the request being processed defines proxy metadata without credentials, the Proxy-Authorization header is removed unless it was originally defined for the same proxy URL.

        To remove proxy credentials while keeping the same proxy URL, remove the Proxy-Authorization header.

      • If the request has no proxy metadata, or that metadata is a falsy value (e.g. None), the Proxy-Authorization header is removed.

        It is no longer possible to set a proxy URL through the proxy metadata but set the credentials through the Proxy-Authorization header. Set proxy credentials through the proxy metadata instead.
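    The three rules read naturally as one decision function. A stdlib sketch, assuming headers held in a plain dict, credentials sent as HTTP Basic auth, and a hypothetical previous_proxy argument naming the proxy URL the existing header was set for; Scrapy's HttpProxyMiddleware keeps this state in its own way:

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_proxy_credentials(proxy_url):
    """Return (proxy URL without user:password@, (user, password) or None)."""
    parts = urlsplit(proxy_url)
    if parts.username is None:
        return proxy_url, None
    netloc = parts.netloc.rpartition("@")[2]  # drop the userinfo part
    bare = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return bare, (unquote(parts.username), unquote(parts.password or ""))


def update_proxy_auth(headers, proxy, previous_proxy=None):
    """Apply the 2.6.2 Proxy-Authorization rules to a dict of headers (sketch)."""
    if not proxy:
        # No proxy metadata, or a falsy value such as None: drop the header.
        headers.pop("Proxy-Authorization", None)
        return headers
    bare_proxy, credentials = split_proxy_credentials(proxy)
    if credentials is not None:
        # Credentials in the proxy URL always replace any existing header.
        token = base64.b64encode("{}:{}".format(*credentials).encode("utf-8"))
        headers["Proxy-Authorization"] = "Basic " + token.decode("ascii")
    elif previous_proxy is None or split_proxy_credentials(previous_proxy)[0] != bare_proxy:
        # No credentials: keep the header only if it was set for this same proxy.
        headers.pop("Proxy-Authorization", None)
    return headers
```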

    ... (truncated)

    Commits


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • Bump twisted from 21.7.0 to 22.2.0

    Bumps twisted from 21.7.0 to 22.2.0.

    Release notes

    Sourced from twisted's releases.

    Twisted 22.2.0 (2022-03-01)

    Bugfixes

    • twisted.internet.gireactor.PortableGIReactor.simulate and twisted.internet.gtk2reactor.PortableGtkReactor.simulate no longer raise TypeError when there are no delayed calls. This was a regression introduced with the migration to Python 3 in which the builtin min function no longer accepts None as an argument. (#9660)
    • twisted.conch.ssh.transport.SSHTransportBase now disconnects the remote peer if the SSH version string is not sent in the first 4096 bytes. (#10284, CVE-2022-21716, GHSA-rv6r-3f5q-9rgx)
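    A simplified sketch of the 4096-byte rule from #10284, assuming the bytes received so far are available as one buffer; find_version_line is a hypothetical name and this is not Twisted's implementation:

```python
MAX_PRE_VERSION_BYTES = 4096


def find_version_line(buffer: bytes):
    """Return the SSH version line, None if more data is needed, or raise.

    Lines before the "SSH-" line are skipped, but if no complete version
    line appears within the first 4096 bytes the peer is rejected instead
    of buffering without limit.
    """
    window = buffer[:MAX_PRE_VERSION_BYTES]
    # Only complete lines count; the last split element may be a partial line.
    for line in window.split(b"\n")[:-1]:
        line = line.rstrip(b"\r")
        if line.startswith(b"SSH-"):
            return line
    if len(buffer) >= MAX_PRE_VERSION_BYTES:
        raise ValueError("no SSH version string in the first 4096 bytes")
    return None  # keep reading
```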

    Improved Documentation

    • Add type annotations for twisted.web.http.Request.getHeader. (#10270)

    Deprecations and Removals

    • Support for Python 3.6, which is EoL as of 2021-09-04, has been deprecated. (#10303)

    Misc

    Conch

    Misc

    • #10298

    Web

    No significant changes.

    Mail

    No significant changes.

    ... (truncated)

    Changelog

    Sourced from twisted's changelog.

    Twisted 22.2.0 (2022-03-01)

    Bugfixes

    • twisted.internet.gireactor.PortableGIReactor.simulate and twisted.internet.gtk2reactor.PortableGtkReactor.simulate no longer raise TypeError when there are no delayed calls. This was a regression introduced with the migration to Python 3 in which the builtin min function no longer accepts None as an argument. (#9660)
    • twisted.conch.ssh.transport.SSHTransportBase now disconnects the remote peer if the SSH version string is not sent in the first 4096 bytes. (#10284, CVE-2022-21716, GHSA-rv6r-3f5q-9rgx)

    Improved Documentation

    • Add type annotations for twisted.web.http.Request.getHeader. (#10270)

    Deprecations and Removals

    • Support for Python 3.6, which is EoL as of 2021-09-04, has been deprecated. (#10303)

    Misc

    Conch

    Misc

    • #10298

    Web

    No significant changes.

    Mail

    No significant changes.

    ... (truncated)

    Commits
    • 89c395e Update the release date.
    • 2b2af4d python -m incremental.update Twisted --newversion 22.2.0
    • 5246003 Apply suggestions from code review from twm.
    • 766bcd3 tox -e towncrier
    • 7cbf195 python -m incremental.update Twisted --rc
    • 98387b3 Merge pull request from GHSA-rv6r-3f5q-9rgx
    • a4523b4 Fix typo.
    • a6849d4 Merge pull request #1693 from twisted/10303-deprecate-py36
    • bce8e81 Merge branch 'trunk' into 10303-deprecate-py36
    • 9045ef7 Merge pull request #1679 from doadin/patch-3
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 1
  • Bump scrapy from 2.5.1 to 2.6.0

    Bump scrapy from 2.5.1 to 2.6.0

    Bumps scrapy from 2.5.1 to 2.6.0.

    Release notes

    Sourced from scrapy's releases.

    2.6.0

    • Security fixes for cookie handling (see details below)
    • Python 3.10 support
    • asyncio support is no longer considered experimental, and works out-of-the-box on Windows regardless of your Python version
    • Feed exports now support pathlib.Path output paths and per-feed item filtering and post-processing

    See the full changelog
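The feed-export additions are easiest to see in a project's settings. Below is a minimal `settings.py` sketch, assuming items are plain dicts with a hypothetical `scientific_name` field. The filter class name and the output file names are illustrative only. The option keys (`format`, `item_filter`, `postprocessing`) and the `GzipPlugin` path should be checked against the Scrapy feed-export documentation for the installed version.

```python
# settings.py (sketch): feed export options introduced in Scrapy 2.6.
# The "scientific_name" field, the filter class and the file names are
# hypothetical; adapt them to the project's real items.
from pathlib import Path


class ScientificNameFilter:
    """Hypothetical per-feed item filter: export only items that carry a
    non-empty scientific name."""

    def __init__(self, feed_options):
        # Scrapy passes the options dict of the feed being written.
        self.feed_options = feed_options

    def accepts(self, item):
        # Items are assumed to be plain dicts in this sketch.
        return bool(item.get("scientific_name"))


FEEDS = {
    # pathlib.Path objects are accepted as output paths since Scrapy 2.6.
    Path("result.csv"): {
        "format": "csv",
        "item_filter": ScientificNameFilter,
    },
    Path("result.jsonl.gz"): {
        "format": "jsonlines",
        # Post-processing plugins run on the exported data; here, gzip.
        "postprocessing": ["scrapy.extensions.postprocessing.GzipPlugin"],
    },
}
```

Keeping the filter per feed means the CSV can stay limited to matched species while other feeds still receive every item.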

    Security bug fixes

    • When a Request object with cookies defined gets a redirect response causing a new Request object to be scheduled, the cookies defined in the original Request object are no longer copied into the new Request object.

      If you manually set the Cookie header on a Request object and the domain name of the redirect URL is not an exact match for the domain of the URL of the original Request object, your Cookie header is now dropped from the new Request object.

      The old behavior could be exploited by an attacker to gain access to your cookies. Please, see the cjvr-mfj7-j4j8 security advisory for more information.

      Note: It is still possible to enable the sharing of cookies between different domains with a shared domain suffix (e.g. example.com and any subdomain) by defining the shared domain suffix (e.g. example.com) as the cookie domain when defining your cookies. See the documentation of the Request class for more information.

    • When the domain of a cookie, either received in the Set-Cookie header of a response or defined in a Request object, is set to a public suffix (https://publicsuffix.org/), the cookie is now ignored unless the cookie domain is the same as the request domain.

      The old behavior could be exploited by an attacker to inject cookies from a controlled domain into your cookiejar that could be sent to other domains not controlled by the attacker. Please, see the mfjm-vh54-3f96 security advisory for more information.
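The redirect rule for a manually set Cookie header reduces to an exact host comparison. The stdlib sketch below illustrates that rule only. It is not Scrapy's own redirect middleware code, and the helper names `keep_manual_cookie_header` and `headers_for_redirect` are hypothetical.

```python
from urllib.parse import urlsplit


def keep_manual_cookie_header(original_url: str, redirect_url: str) -> bool:
    """Hypothetical helper: a manually set Cookie header survives a redirect
    only when the redirect host exactly matches the original host."""
    original_host = urlsplit(original_url).hostname  # lowercased, or None
    redirect_host = urlsplit(redirect_url).hostname
    return original_host is not None and original_host == redirect_host


def headers_for_redirect(headers: dict, original_url: str, redirect_url: str) -> dict:
    """Return a copy of the headers to send with the redirected request,
    dropping the Cookie header when the hosts differ."""
    new_headers = dict(headers)
    if not keep_manual_cookie_header(original_url, redirect_url):
        new_headers.pop("Cookie", None)
    return new_headers
```

Under this rule, a subdomain such as sub.example.com counts as a different host. The note above about defining a shared cookie domain covers that case.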

    Changelog

    Sourced from scrapy's changelog.

    Scrapy 2.6.0 (2022-03-01)

    Highlights:

    • Security fixes for cookie handling

    • Python 3.10 support

    • asyncio support is no longer considered experimental, and works out-of-the-box on Windows regardless of your Python version

    • Feed exports now support pathlib.Path output paths and per-feed item filtering and post-processing

    Security bug fixes

    
    • When a Request object with cookies defined gets a redirect response causing a new Request object to be scheduled, the cookies defined in the original Request object are no longer copied into the new Request object.

      If you manually set the Cookie header on a Request object and the domain name of the redirect URL is not an exact match for the domain of the URL of the original Request object, your Cookie header is now dropped from the new Request object.
    

      The old behavior could be exploited by an attacker to gain access to your cookies. Please, see the cjvr-mfj7-j4j8 security advisory (https://github.com/scrapy/scrapy/security/advisories/GHSA-cjvr-mfj7-j4j8) for more information.

      Note: It is still possible to enable the sharing of cookies between different domains with a shared domain suffix (e.g. example.com and any subdomain) by defining the shared domain suffix (e.g. example.com) as the cookie domain when defining your cookies. See the documentation of the Request class for more information.

    • When the domain of a cookie, either received in the Set-Cookie header of a response or defined in a Request object, is set to a public suffix (https://publicsuffix.org/), the cookie is now

... (truncated)
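The public-suffix rule quoted in the release notes can be illustrated with a small stdlib sketch. The `PUBLIC_SUFFIXES` set below is a deliberately tiny, hypothetical stand-in for the real Public Suffix List, and `accept_cookie_domain` is a hypothetical name. This is not Scrapy's implementation, and it models only this one check, not the full cookie domain-matching rules.

```python
# Hypothetical, deliberately tiny stand-in for the Public Suffix List;
# a real check should use the full list from https://publicsuffix.org/.
PUBLIC_SUFFIXES = {"com", "org", "co.uk", "github.io"}


def accept_cookie_domain(cookie_domain: str, request_host: str) -> bool:
    """Sketch of the rule: a cookie whose domain is a public suffix is
    ignored unless that domain is the request domain itself.
    Other cookie domain-matching rules are intentionally not modeled."""
    domain = cookie_domain.lower().lstrip(".")
    host = request_host.lower()
    if domain in PUBLIC_SUFFIXES:
        return domain == host
    return True
```

For example, a cookie scoped to `.co.uk` is rejected for `shop.example.co.uk`, while one scoped to `example.co.uk` passes this particular check.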

Commits


dependencies 
opened by dependabot[bot] 1
  • Bump twisted from 21.7.0 to 22.1.0

    Bump twisted from 21.7.0 to 22.1.0

    Bumps twisted from 21.7.0 to 22.1.0.

    Release notes

    Sourced from twisted's releases.

    Twisted 22.1.0 (2022-02-03)

    Features

    • Python 3.10 is now a supported platform (#10224)
    • Type annotations have been added to the twisted.python.fakepwd module. (#10287)

    Bugfixes

    • twisted.internet.defer.inlineCallbacks has an improved type annotation, to avoid typing errors when it is used on a function which returns a non-None result. (#10231)
    • twisted.internet.base.DelayedCall.__repr__ and twisted.internet.task.LoopingCall.__repr__ had the changes from #10155 reverted to accept non-function callables. (#10235)
    • Revert the removal of .whl building that was done as part of #10177. (#10236)
    • The type annotation of the host parameter to twisted.internet.interfaces.IReactorTCP.connectTCP has been corrected from bytes to str. (#10251)
    • Deprecated twisted.python.threading.ThreadPool.currentThread() in favor of threading.current_thread(). Switched twisted.python.threading.ThreadPool.currentThread() and twisted.python.threadable.getThreadID() to use threading.current_thread() to avoid the deprecation warnings introduced for threading.currentThread() in Python 3.10. (#10273)

    Improved Documentation

    • twisted.internet.utils.runWithWarningsSupressed behavior of waiting on deferreds has been documented. (#10238)
    • Sync API docs templates with pydoctor 21.9.0 release, using new theming capabilities. (#10267)
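The ThreadPool.currentThread() deprecation listed under Bugfixes above comes from Python 3.10 deprecating the camelCase aliases in the threading module. Code in this project that needs the current thread can use the snake_case function directly. A minimal stdlib sketch; the thread name `scraper-worker` is illustrative only:

```python
import threading


def worker(names: list) -> None:
    # threading.current_thread() is the supported spelling; the camelCase
    # alias threading.currentThread() emits a DeprecationWarning on Python 3.10+.
    names.append(threading.current_thread().name)


names = []
t = threading.Thread(target=worker, args=(names,), name="scraper-worker")
t.start()
t.join()
print(names)  # ['scraper-worker']
```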

    Misc

    Conch

    Features

    
    • twisted.conch.ssh now supports SSH extension negotiation (RFC 8308). (#10266)
    

    Bugfixes

    • twisted.conch now uses constant-time comparisons for MACs. (#8199)
    • twisted.conch.ssh.filetransfer.FileTransferServer will now return an ENOENT error status if an SFTP client tries to close an unrecognized file handle. (#10293)
    • SSHTransportBase.ssh_KEXINIT now uses the remote peer preferred MAC list for negotiation. In previous versions it was only using the local preferred MAC list. (#10241)

    ... (truncated)

    Changelog

    Sourced from twisted's changelog.

    Twisted 22.1.0 (2022-02-03)

    Features

    • Python 3.10 is now a supported platform (#10224)
    • Type annotations have been added to the twisted.python.fakepwd module. (#10287)

    Bugfixes

    • twisted.internet.defer.inlineCallbacks has an improved type annotation, to avoid typing errors when it is used on a function which returns a non-None result. (#10231)
    • twisted.internet.base.DelayedCall.__repr__ and twisted.internet.task.LoopingCall.__repr__ had the changes from #10155 reverted to accept non-function callables. (#10235)
    • Revert the removal of .whl building that was done as part of #10177. (#10236)
    • The type annotation of the host parameter to twisted.internet.interfaces.IReactorTCP.connectTCP has been corrected from bytes to str. (#10251)
    • Deprecated twisted.python.threading.ThreadPool.currentThread() in favor of threading.current_thread(). Switched twisted.python.threading.ThreadPool.currentThread() and twisted.python.threadable.getThreadID() to use threading.current_thread() to avoid the deprecation warnings introduced for threading.currentThread() in Python 3.10. (#10273)

    Improved Documentation

    • twisted.internet.utils.runWithWarningsSupressed behavior of waiting on deferreds has been documented. (#10238)
    • Sync API docs templates with pydoctor 21.9.0 release, using new theming capabilities. (#10267)

    Misc

    Conch

    Bugfixes

    • SSHTransportBase.ssh_KEXINIT now uses the remote peer preferred MAC list for negotiation. In previous versions it was only using the local preferred MAC list. (#10241)

    Features

    
    • twisted.conch.ssh now supports SSH extension negotiation (RFC 8308). (#10266)
    

    Bugfixes

    ... (truncated)

    Commits
    • 45d463c move conch bugfix.
    • d48e4d3 Manually update the release version and date inside the NEWS file. T
    • 9ce5061 Update final release version.
    • 9d9322b Update the release notes.
    • 7e65fbe Bump copyright.
    • ddf72a9 tox -e towncrier
    • b33589f Update to 21.1.0.rc1
    • a033c84 Merge pull request #1685 from twisted/10293-conch-sftp-close-invalid-handle
    • 9e9cce2 Copy the skip logic from FileTransferCloseTests
    • 385e9f2 mention the name of the draft doc
    • Additional commits viewable in compare view


    dependencies 
    opened by dependabot[bot] 1
  • Bump lxml from 4.6.4 to 4.6.5

    Bump lxml from 4.6.4 to 4.6.5

    Bumps lxml from 4.6.4 to 4.6.5.

    Changelog

    Sourced from lxml's changelog.

    4.6.5 (2021-12-12)

    Bugs fixed

    • A vulnerability (GHSL-2021-1038) in the HTML cleaner allowed sneaking script content through SVG images.

    • A vulnerability (GHSL-2021-1037) in the HTML cleaner allowed sneaking script content through CSS imports and other crafted constructs.

    Commits
    • a9611ba Fix a test in Py2.
    • a3eacbc Prepare release of 4.6.5.
    • b7ea687 Update changelog.
    • 69a7473 Cleaner: cover some more cases where scripts could sneak through in specially...
    • 54d2985 Fix condition in test decorator.
    • 4b220b5 Use the non-depcrecated TextTestResult instead of _TextTestResult (GH-333)
    • d85c6de Exclude a test when using the macOS system libraries because it fails with li...
    • cd4bec9 Add macOS-M1 as wheel build platform.
    • fd0d471 Install automake and libtool in macOS build to be able to install the latest ...
    • f233023 Cleaner: Remove SVG image data URLs since they can embed script content.
    • Additional commits viewable in compare view


    dependencies 
    opened by dependabot[bot] 1
  • Bump scrapy from 2.5.1 to 2.7.1

    Bump scrapy from 2.5.1 to 2.7.1

    Bumps scrapy from 2.5.1 to 2.7.1.

    Release notes

    Sourced from scrapy's releases.

    2.7.1

    • Relaxed the restriction introduced in 2.6.2 so that the Proxy-Authentication header can again be set explicitly in certain cases, restoring compatibility with scrapy-zyte-smartproxy 2.1.0 and older
    • Bug fixes

    See the full changelog

    2.7.0

    See the full changelog

    2.6.3

    Makes pip install Scrapy work again.

    It required making changes to support pyOpenSSL 22.1.0. We had to drop support for SSLv3 as a result.

    We also upgraded the minimum versions of some dependencies.

    See the changelog.

    2.6.2

    Fixes a security issue around HTTP proxy usage, and addresses a few regressions introduced in Scrapy 2.6.0.

    See the changelog.

    2.6.1

    Fixes a regression introduced in 2.6.0 that would unset the request method when following redirects.

    2.6.0

    • Security fixes for cookie handling (see details below)
    • Python 3.10 support
    • asyncio support is no longer considered experimental, and works out-of-the-box on Windows regardless of your Python version
    • Feed exports now support pathlib.Path output paths and per-feed item filtering and post-processing

    See the full changelog

    Security bug fixes

    • When a Request object with cookies defined gets a redirect response causing a new Request object to be scheduled, the cookies defined in the original Request object are no longer copied into the new Request object.

      If you manually set the Cookie header on a Request object and the domain name of the redirect URL is not an exact match for the domain of the URL of the original Request object, your Cookie header is now dropped from the new Request object.

      The old behavior could be exploited by an attacker to gain access to your cookies. Please, see the cjvr-mfj7-j4j8 security advisory for more information.

    ... (truncated)

    Changelog

    Sourced from scrapy's changelog.

    Scrapy 2.7.1 (2022-11-02)

    New features

    
    • Relaxed the restriction introduced in 2.6.2 so that the Proxy-Authentication header can again be set explicitly, as long as the proxy URL in the proxy request metadata has no other credentials, and for as long as that proxy URL remains the same; this restores compatibility with scrapy-zyte-smartproxy 2.1.0 and older (#5626).
    

    Bug fixes

    
    • Using -O/--overwrite-output and -t/--output-format options together now produces an error instead of ignoring the former option (#5516, #5605).
    
    • Replaced deprecated asyncio APIs that implicitly use the current event loop with code that explicitly requests a loop from the event loop policy (#5685, #5689).

    • Fixed uses of deprecated Scrapy APIs in Scrapy itself (#5588, #5589).

    • Fixed uses of a deprecated Pillow API (#5684, #5692).

    • Improved code that checks if generators return values, so that it no longer fails on decorated methods and partial methods (#5323, #5592, #5599, #5691).

    Documentation

    • Upgraded the Code of Conduct to Contributor Covenant v2.1 (#5698).

    • Fixed typos (#5681, #5694).

    Quality assurance

    • Re-enabled some erroneously disabled flake8 checks (#5688).

    • Ignored harmless deprecation warnings from typing in tests (#5686, #5697).

    • Modernized our CI configuration (#5695, #5696).

    ... (truncated)

    Commits
    • 6ded3cf Bump version: 2.7.0 → 2.7.1
    • 95880c5 Merge pull request #5701 from scrapy/relnotes-2.7.1
    • 5ec175b Small relnotes fixes.
    • 940a738 Release notes for 2.7.1.
    • a95a338 Merge pull request #5599 from tonal/patch-1
    • 9077d0f Merge pull request #5698 from pankali/patch-1
    • 76c2cb0 Merge pull request #5697 from iamkaushal/#5686_fix
    • 9f45be4 Update Code of Conduct to Contributor Covenant v2.1
    • bd9e482 added typing.io and typing.re in pytest warning filter to ignore
    • fd692f3 Prevent running the -O and -t command-line options together (#5605)
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Bump twisted from 21.7.0 to 22.10.0

    Bump twisted from 21.7.0 to 22.10.0

    Bumps twisted from 21.7.0 to 22.10.0.

    Release notes

    Sourced from twisted's releases.

    Twisted 22.10.0 (2022-10-30)

    This release contains a security fix for CVE-2022-39348. This is a low-severity security bug.

    Twisted 22.10.0rc1 release candidate was released on 2022-10-26 and there are no changes between the release candidate and the final release.

    Features

    • The systemd: endpoint parser now supports "named" file descriptors. This is a more reliable mechanism for choosing among several inherited descriptors. (#8147)

    Improved Documentation

    • The systemd endpoint parser's index parameter is now documented as leading to non-deterministic results in which descriptor is selected. The new name parameter is now documented as preferred. (#8146)
    • The implementers of Zope interfaces are once more displayed in the documentations. (#11690)

    Deprecations and Removals

    • twisted.protocols.dict, which was deprecated in 17.9, has been removed. (#11725)

    Misc

    Conch

    Bugfixes

    
    • twisted.conch.manhole.ManholeInterpreter now captures tracebacks even if sys.excepthook has been modified. (#11638)
    

    Web

    Features

    ... (truncated)

    Changelog

    Sourced from twisted's changelog.

    Twisted 22.10.0 (2022-10-30)

    This release contains a security fix for CVE-2022-39348. This is a low-severity security bug.

    Twisted 22.10.0rc1 release candidate was released on 2022-10-26 and there are no changes between the release candidate and the final release.

    Features

    • The systemd: endpoint parser now supports "named" file descriptors. This is a more reliable mechanism for choosing among several inherited descriptors. (#8147)

    Improved Documentation

    • The systemd endpoint parser's index parameter is now documented as leading to non-deterministic results in which descriptor is selected. The new name parameter is now documented as preferred. (#8146)
    • The implementers of Zope interfaces are once more displayed in the documentations. (#11690)

    Deprecations and Removals

    • twisted.protocols.dict, which was deprecated in 17.9, has been removed. (#11725)

    Misc

    Conch

    Bugfixes

    
    • twisted.conch.manhole.ManholeInterpreter now captures tracebacks even if sys.excepthook has been modified. (#11638)
    

    Web

    Features

    ... (truncated)

    Commits
    • 39ee213 Update news for final version.
    • 7e76513 python -m incremental.update Twisted --newversion 22.10.0
    • 3f1f502 Apply suggestions from twm.
    • 3185b01 Add info about CVE at the start of the release notes.
    • 15aa477 tox -e towncrier
    • 0a29d34 python -m incremental.update Twisted --rc
    • f2f5e81 Merge pull request from GHSA-vg46-2rrj-3647
    • b0545bc Merge branch 'trunk' into advisory-fix-1
    • 50761f4 #11715: Use NEXT in deprecation examples (#11720)
    • 927a5dc Add newsfragment
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Bump openpyxl from 3.0.9 to 3.0.10

    Bump openpyxl from 3.0.9 to 3.0.10

    Bumps openpyxl from 3.0.9 to 3.0.10.

    dependencies 
    opened by dependabot[bot] 0
  • Bump lxml from 4.6.4 to 4.9.1

    Bump lxml from 4.6.4 to 4.9.1

    Bumps lxml from 4.6.4 to 4.9.1.

    Changelog

    Sourced from lxml's changelog.

    4.9.1 (2022-07-01)

    Bugs fixed

    • A crash was resolved when using iterwalk() (or canonicalize()) after parsing certain incorrect input. Note that iterwalk() can crash on valid input parsed with the same parser after failing to parse the incorrect input.

    4.9.0 (2022-06-01)

    Bugs fixed

    • GH#341: The mixin inheritance order in lxml.html was corrected. Patch by xmo-odoo.

    Other changes

    • Built with Cython 0.29.30 to adapt to changes in Python 3.11 and 3.12.

    • Wheels include zlib 1.2.12, libxml2 2.9.14 and libxslt 1.1.35 (libxml2 2.9.12+ and libxslt 1.1.34 on Windows).

    • GH#343: Windows-AArch64 build support in Visual Studio. Patch by Steve Dower.

    4.8.0 (2022-02-17)

    Features added

    • GH#337: Path-like objects are now supported throughout the API instead of just strings. Patch by Henning Janssen.

    • The ElementMaker now supports QName values as tags, which always override the default namespace of the factory.

    Bugs fixed

    • GH#338: In lxml.objectify, the XSI float annotation "nan" and "inf" were spelled in lower case, whereas XML Schema datatypes define them as "NaN" and "INF" respectively.

    ... (truncated)

    Commits
    • d01872c Prevent parse failure in new test from leaking into later test runs.
    • d65e632 Prepare release of lxml 4.9.1.
    • 86368e9 Fix a crash when incorrect parser input occurs together with usages of iterwa...
    • 50c2764 Delete unused Travis CI config and reference in docs (GH-345)
    • 8f0bf2d Try to speed up the musllinux AArch64 build by splitting the different CPytho...
    • b9f7074 Remove debug print from test.
    • b224e0f Try to install 'xz' in wheel builds, if available, since it's now needed to e...
    • 897ebfa Update macOS deployment target version from 10.14 to 10.15 since 10.14 starts...
    • 853c9e9 Prepare release of 4.9.0.
    • d3f77e6 Add a test for https://bugs.launchpad.net/lxml/+bug/1965070 leaving out the a...
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot]
  • Bump numpy from 1.21.3 to 1.22.0

    Bumps numpy from 1.21.3 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements; highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.
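    Callers choose one of the new quantile methods through the method keyword of numpy.quantile (numpy.percentile accepts the same keyword). The sketch below assumes NumPy 1.22 or later. The ten-value sample array is invented, and the example covers only three of the available methods.

```python
import numpy as np

# Invented sample data; any 1-D array works.
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])

# "linear" is the default (type 7 in the Hyndman and Fan numbering used by the docs).
linear = np.quantile(data, 0.25, method="linear")

# "inverted_cdf" (type 1) returns an observed sample value instead of interpolating.
inverted = np.quantile(data, 0.25, method="inverted_cdf")

# "median_unbiased" (type 8) is one of the continuous methods added in 1.22.
median_unbiased = np.quantile(data, 0.25, method="median_unbiased")
```

    For q=0.25 on this data, linear interpolates to 3.25, inverted_cdf returns the observed value 3.0, and median_unbiased gives 35/12 (about 2.917).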

    The Python versions supported in this release are 3.8-3.10; Python 3.7 has been dropped. Note that 32-bit wheels are only provided for Python 3.8 and 3.9 on Windows; all other wheels are 64-bit because Ubuntu, Fedora, and other Linux distributions have dropped 32-bit support. All 64-bit wheels are also linked with 64-bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)
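    The fix on the caller's side is to switch to the lower-case dtype names. A minimal sketch, assuming NumPy 1.22 or later: the supported spelling works, and one of the removed strings raises TypeError.

```python
import numpy as np

# The lower-case spelling is the supported name.
ok = np.dtype("uint32")

# One of the removed numeric-style strings; it is expected to raise TypeError.
try:
    np.dtype("Uint32")
    raised = False
except TypeError:
    raised = True
```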

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17; users should use numpy.genfromtxt instead, with the appropriate value for the usemask parameter.

    (gh-19615)
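    Moving off the removed functions is a direct substitution. The sketch below assumes NumPy 1.22 or later and uses invented CSV strings as input. pickle.loads takes the place of numpy.loads, and numpy.genfromtxt takes the place of ndfromtxt and mafromtxt (with usemask=True where mafromtxt was used).

```python
import io
import pickle

import numpy as np

arr = np.arange(6).reshape(2, 3)

# Formerly numpy.loads(payload); only unpickle data you trust.
payload = pickle.dumps(arr)
restored = pickle.loads(payload)

# Formerly numpy.ndfromtxt(source): returns a plain ndarray.
plain = np.genfromtxt(io.StringIO("1,2,3\n4,5,6"), delimiter=",")

# Formerly numpy.mafromtxt(source): usemask=True returns a MaskedArray
# in which the missing field is masked.
masked = np.genfromtxt(io.StringIO("1,2,3\n4,,6"), delimiter=",", usemask=True)
```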

    ... (truncated)

    Commits

    dependencies 
    opened by dependabot[bot]
  • Owner
    Mgs. M. Rizqi Fadhlurrahman
    Frontend Engineer
    Library to scrape and clean web pages to create massive datasets.

    lazynlp A straightforward library that allows you to crawl, clean up, and deduplicate webpages to create massive monolingual datasets. Using this libr

    Chip Huyen 2.1k Jan 6, 2023
    Scrape Twitter for Tweets

    Backers Thank you to all our backers! [Become a backer] Sponsors Support this project by becoming a sponsor. Your logo will show up here with a lin

    Ahmet Taspinar 2.2k Jan 5, 2023
    mlscraper: Scrape data from HTML pages automatically with Machine Learning

    Scrape data from HTML websites automatically with Machine Learning

    Karl Lorey 798 Dec 29, 2022
    Scrape all the media from an OnlyFans account - Updated regularly

    Scrape all the media from an OnlyFans account - Updated regularly

    CRIMINAL 3.2k Dec 29, 2022
    Docker-containerized Python Flask API that uses Selenium to scrape and interact with websites

    Docker-containerized Python Flask API that uses Selenium to scrape and interact with websites

    Christian Gracia 0 Jan 22, 2022
    A utility library to scrape data from TikTok, Instagram, Twitch, YouTube, Twitter or Reddit in one line!

    Social Media Scraper A utility library to scrape data from TikTok, Instagram, Twitch, YouTube, Twitter or Reddit in one line! Go to the website » Vie

    null 2 Aug 3, 2022
    A tool to easily scrape YouTube data using the Google API

    YouTube data scraper To easily scrape any data from the YouTube homepage, a YouTube channel/user, search results, playlists, and a single video itself

    null 7 Dec 3, 2022
    Scrape data on SpaceX: Capsules, Rockets, Cores, Roadsters, SpaceX Info

    SpaceX Software I developed software to scrape data on SpaceX: Capsules, Rockets, Cores, Roadsters, SpaceX Info. To use the software you need Python a

    Maxence Rémy 16 Aug 2, 2022
    Script to scrape user data such as "id,username,fullname,followers,tweets .. etc" via Twitter's search engine.

    TwitterScraper Script to scrape user data such as "id,username,fullname,followers,tweets .. etc" via Twitter's search engine. Screenshot Data Users Only

    Remax Alghamdi 19 Nov 17, 2022
    A helper library to scrape data from TikTok in one line, using the Influencer Hunters APIs.

    TikTok Scraper A utility library to scrape data from TikTok hassle-free Go to the website » View Demo · Report Bug · Request Feature About The Projec

    null 6 Jan 8, 2023
    GitHub scraper app used to scrape data for a specific user profile, created using the Streamlit and BeautifulSoup Python packages

    Github Scraper The GitHub scraper app is used to scrape data for a specific user profile. The app takes a GitHub profile name and checks whether

    Siva Prakash 6 Apr 5, 2022
    Scrape grades and display them on the console

    WebScrapeGrades About The Project This is a personal project where I learned how to web scrape using Python requests. Being able to get request

    Cyrus Baybay 1 Oct 23, 2021
    Scrape puzzle scrambles from csTimer.net

    Scroodle Selenium script to scrape scrambles from csTimer.net csTimer runs locally in your browser, so this doesn't strain the servers any more than i

    Jason Nguyen 1 Oct 29, 2021
    A simple Flask application to scrape the gogoanime website.

    gogoanime-api-flask A simple Flask application to scrape the gogoanime website. Used for demo and learning purposes only. How to use the API The base api

    null 1 Oct 29, 2021
    A way to scrape a database of all of the ISEF projects

    ISEF Database This is a simple web scraper which gets all of the projects and abstract information from here. My goal for this is for someone to get i

    William Kaiser 1 Mar 18, 2022
    CRI Scrape is a tool to get general info about the Italian Red Cross on the GAIA Platform

    CRI Scrape CRI Scrape is a tool to get general info about the Italian Red Cross on the GAIA Platform Disclaimer This code is only for educational purposes. So

    Vincenzo Cardone 0 Jul 23, 2022
    API that uses Discord to scrape NameMC searches/droptime/dropping status of Minecraft names

    NameMC Scrape API This is an API to scrape NameMC using message previews generated by Discord. NameMC makes it a pain to scrape their website, but som

    Twilak 2 Dec 22, 2021
    This is a Python script to scrape overviews and reviews of companies from Glassdoor.

    Data Scraping for Glassdoor This is a Python script to scrape overviews and reviews of companies from Glassdoor. Please use it carefully and follow the Terms of

    Houping 5 Jun 23, 2022
    A Python web scraper to scrape the latest posts from Coinbase's official blog.

    Coinbase Blog Scraper A Python web scraper to scrape the latest posts from Coinbase's official blog. IDEA It scrapes the latest blog posts from https://blo

    Lucas Villela 3 Feb 18, 2022