Modern robots.txt Parser for Python

Robots Exclusion Protocol Parser for Python

Robots.txt parsing in Python.

Goals

  • Fetching -- helper utilities for fetching and parsing robots.txts, including checking cache-control and expires headers
  • Support for newer features -- like Crawl-Delay and Sitemaps
  • Wildcard matching -- without using regexes, no less
  • Performance -- with >100k parses per second, >1M URL checks per second once parsed
  • Caching -- utilities to help with the caching of robots.txt responses

Installation

reppy is available on pypi:

pip install reppy

When installing from source, there are submodule dependencies that must also be fetched:

git submodule update --init --recursive
make install

Usage

Checking when pages are allowed

Two classes answer questions about whether a URL is allowed: Robots and Agent:

from reppy.robots import Robots

# This utility uses `requests` to fetch the content
robots = Robots.fetch('http://example.com/robots.txt')
robots.allowed('http://example.com/some/path/', 'my-user-agent')

# Get the rules for a specific agent
agent = robots.agent('my-user-agent')
agent.allowed('http://example.com/some/path/')

The Robots class also exposes properties expired and ttl to describe how long the response should be considered valid. A reppy.ttl policy is used to determine what that should be:

from reppy.ttl import HeaderWithDefaultPolicy

# Use the `cache-control` or `expires` headers, defaulting to 30 minutes and
# ensuring it's at least 10 minutes
policy = HeaderWithDefaultPolicy(default=1800, minimum=600)

robots = Robots.fetch('http://example.com/robots.txt', ttl_policy=policy)
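How a header-based policy with a default and a minimum could arrive at a TTL is sketched below using only the standard library. This is an illustration, not reppy's implementation: the helper name `ttl_from_headers` and the choice to prefer `max-age` over `Expires` are assumptions.

```python
import time
from email.utils import parsedate_to_datetime


def ttl_from_headers(headers, default=1800, minimum=600, now=None):
    """Return a TTL in seconds from response headers (hypothetical helper)."""
    now = time.time() if now is None else now
    # Normalize header names so lookups are case-insensitive.
    lowered = {key.lower(): value for key, value in headers.items()}

    ttl = None
    # Look for a `max-age=N` directive in Cache-Control first.
    for directive in lowered.get('cache-control', '').split(','):
        name, _, value = directive.strip().partition('=')
        if name.strip().lower() == 'max-age' and value.strip().isdigit():
            ttl = int(value.strip())

    # Otherwise fall back to the Expires date, if it parses.
    if ttl is None and 'expires' in lowered:
        try:
            ttl = parsedate_to_datetime(lowered['expires']).timestamp() - now
        except (TypeError, ValueError):
            ttl = None  # Unparseable date: use the default below.

    if ttl is None:
        ttl = default
    # Never report less than the configured minimum.
    return max(ttl, minimum)
```

With the values used above, a response carrying `Cache-Control: max-age=60` would still be kept for 600 seconds, and a response without either header would be kept for 1800 seconds.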

Customizing fetch

The fetch method accepts *args and **kwargs that are passed on to requests.get, allowing you to customize the way the fetch is executed:

robots = Robots.fetch('http://example.com/robots.txt', headers={...})

Matching Rules and Wildcards

Both * and $ are supported for wildcard matching.

This library follows the matching that the 1996 RFC describes. When multiple rules match a query, the longest rule wins, as it is presumed to be the most specific.
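The two ideas in this section, regex-free wildcard matching and longest-rule precedence, can be sketched in plain Python. This is not the code reppy runs (its matching lives in the C++ rep-cpp library); the function names `matches` and `allowed`, the `(is_allow, pattern)` rule representation, and the first-rule-wins tie-break are assumptions made for the example.

```python
def matches(pattern, path):
    """Return True if a robots.txt pattern matches the start of a path.

    `*` matches any run of characters; a trailing `$` anchors the end.
    """
    anchored = pattern.endswith('$')
    if anchored:
        pattern = pattern[:-1]
    parts = pattern.split('*')

    # The first literal chunk must match at the very start of the path.
    if not path.startswith(parts[0]):
        return False
    position = len(parts[0])
    if len(parts) == 1:
        return not anchored or position == len(path)

    # Middle chunks are located left to right, as early as possible.
    for chunk in parts[1:-1]:
        found = path.find(chunk, position)
        if found < 0:
            return False
        position = found + len(chunk)

    last = parts[-1]
    if anchored:
        # The final chunk must end the path without overlapping earlier ones.
        return len(path) - len(last) >= position and path.endswith(last)
    return path.find(last, position) >= 0


def allowed(rules, path):
    """Apply (is_allow, pattern) rules; the longest matching pattern wins."""
    best = None
    for is_allow, pattern in rules:
        # An empty pattern carries no restriction, so it is skipped.
        if pattern and matches(pattern, path):
            if best is None or len(pattern) > len(best[1]):
                best = (is_allow, pattern)
    # A path matched by no rule is allowed.
    return True if best is None else best[0]
```

For example, with the rules `Disallow: /private/` and `Allow: /private/public/`, the path `/private/public/page` matches both, and the longer `Allow` rule decides the outcome.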

Checking sitemaps

The Robots class also exposes the sitemaps listed in a robots.txt:

# This property holds a list of URL strings of all the sitemaps listed
robots.sitemaps

Delay

The Crawl-Delay directive is per agent and can be accessed through that class. If none was specified, it's None:

# What's the delay my-user-agent should use
robots.agent('my-user-agent').delay

Determining the robots.txt URL

Given a URL, there's a utility to determine the URL of the corresponding robots.txt. It preserves the scheme and hostname and the port (if it's not the default port for the scheme).

# Get robots.txt URL for http://userinfo@example.com:8080/path;params?query#fragment
# It's http://example.com:8080/robots.txt
Robots.robots_url('http://userinfo@example.com:8080/path;params?query#fragment')
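The rule described above (keep the scheme and hostname, keep the port only when it is not the scheme's default, drop everything else) can be sketched with `urllib.parse`. This is an illustrative stand-in rather than reppy's own code; the module-level function `robots_url` and the `DEFAULT_PORTS` table are names chosen for the example.

```python
from urllib.parse import urlsplit

# Default ports for the schemes handled in this sketch (an assumption).
DEFAULT_PORTS = {'http': 80, 'https': 443}


def robots_url(url):
    """Return the robots.txt URL for the host that serves `url`."""
    parts = urlsplit(url)
    host = parts.hostname or ''
    if ':' in host:
        # IPv6 literals need their brackets restored.
        host = '[' + host + ']'
    port = parts.port
    # Keep the port only when it differs from the scheme's default.
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        host = '{}:{}'.format(host, port)
    # Userinfo, path, params, query, and fragment are all discarded.
    return '{}://{}/robots.txt'.format(parts.scheme, host)
```

Under these assumptions, `robots_url('https://example.com:443/a?b=c')` yields `https://example.com/robots.txt`, while a URL on port 8080 keeps its port.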

Caching

There are two cache classes provided -- RobotsCache, which caches entire reppy.Robots objects, and AgentCache, which only caches the reppy.Agent relevant to a client. These caches duck-type the class that they cache for the purposes of checking if a URL is allowed:

from reppy.cache import RobotsCache
cache = RobotsCache(capacity=100)
cache.allowed('http://example.com/foo/bar', 'my-user-agent')

from reppy.cache import AgentCache
cache = AgentCache(agent='my-user-agent', capacity=100)
cache.allowed('http://example.com/foo/bar')

Like reppy.Robots.fetch, the cache constructors accept a ttl_policy to inform the expiration of the fetched Robots objects, as well as *args and **kwargs to be passed to reppy.Robots.fetch.

Caching Failures

There's a piece of classic caching advice: "don't cache failures." However, this is not always appropriate. For example, if the failure is a timeout, clients may want to cache the result so that every check doesn't take a very long time.

To this end, the cache module provides a notion of a cache policy. It determines what to do in the case of an exception. The default is to cache a form of a disallowed response for 10 minutes, but you can configure it as you see fit:

from reppy.cache import AgentCache
from reppy.cache.policy import DefaultObjectPolicy, ReraiseExceptionPolicy
from reppy.robots import Agent

# Do not cache failures (note the `ttl=0`):
cache = AgentCache('my-user-agent', cache_policy=ReraiseExceptionPolicy(ttl=0))

# Cache and reraise failures for 10 minutes (note the `ttl=600`):
cache = AgentCache('my-user-agent', cache_policy=ReraiseExceptionPolicy(ttl=600))

# Treat failures as being disallowed; the second argument builds the
# default Agent that is cached for 600 seconds
cache = AgentCache(
    'my-user-agent',
    cache_policy=DefaultObjectPolicy(600, lambda _: Agent().disallow('/')))
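The idea of caching a failure for a limited time can be shown independently of reppy. The class below is a minimal sketch, not the library's cache: `FailureAwareCache`, its `fetch` callable, and the injectable `clock` are all hypothetical names introduced for the example.

```python
import time


class FailureAwareCache:
    """Tiny TTL cache that also remembers failures (illustrative only)."""

    def __init__(self, fetch, ttl=3600, failure_ttl=600, clock=time.time):
        self.fetch = fetch              # callable: key -> value, may raise
        self.ttl = ttl                  # lifetime of successful results
        self.failure_ttl = failure_ttl  # lifetime of cached failures
        self.clock = clock              # injectable for testing
        self._entries = {}              # key -> (expires_at, value, error)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None or entry[0] <= self.clock():
            try:
                entry = (self.clock() + self.ttl, self.fetch(key), None)
            except Exception as exc:
                # Remember the failure so repeated checks do not refetch.
                entry = (self.clock() + self.failure_ttl, None, exc)
            self._entries[key] = entry
        _, value, error = entry
        if error is not None:
            # Reraise the cached failure until it expires.
            raise error
        return value
```

Setting `failure_ttl=0` makes every failed lookup expire immediately, which corresponds to not caching failures at all; a positive value trades freshness for not waiting on a slow host at every check.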

Development

A Vagrantfile is provided to bootstrap a development environment:

vagrant up

Alternatively, development can be conducted using a virtualenv:

virtualenv venv
source venv/bin/activate
pip install -r requirements.txt

Tests

Tests may be run in vagrant:

make test

Development

Environment

To launch the vagrant image, run vagrant up (you may have to provide a --provider flag):

vagrant up

With a running vagrant instance, you can log in and run tests:

vagrant ssh
make test

Running Tests

Tests are run with the top-level Makefile:

make test

PRs

These are not all hard-and-fast rules, but in general PRs have the following expectations:

  • pass Travis -- or more generally, whatever CI is used for the particular project
  • be a complete unit -- whether a bug fix or feature, it should appear as a complete unit before consideration.
  • maintain code coverage -- some projects may include code coverage requirements as part of the build as well
  • maintain the established style -- this means the existing style of established projects, the established conventions of the team for a given language on new projects, and the guidelines of the community of the relevant languages and frameworks.
  • include failing tests -- in the case of bugs, failing tests demonstrating the bug should be included as one commit, followed by a commit making the test succeed. This allows us to jump to a world with a bug included, and prove that our test in fact exercises the bug.
  • be reviewed by one or more developers -- not all feedback has to be accepted, but it should all be considered.
  • avoid 'addressed PR feedback' commits -- in general, PR feedback should be rebased back into the appropriate commits that introduced the change. In cases where this is burdensome, PR feedback commits may be used but should still describe the changes contained therein.

PR reviews consider the design, organization, and functionality of the submitted code.

Commits

Certain types of changes should be made in their own commits to improve readability. When too many different types of changes happen simultaneously in a single commit, the purpose of each change is muddled. By giving each commit a single logical purpose, it is implicitly clear why changes in that commit took place.

  • updating / upgrading dependencies -- this is especially true for invocations like bundle update or berks update.
  • introducing a new dependency -- often preceded by a commit updating existing dependencies, this should only include the changes for the new dependency.
  • refactoring -- these commits should preserve all the existing functionality and merely update how it's done.
  • utility components to be used by a new feature -- if introducing an auxiliary class in support of a subsequent commit, add this new class (and its tests) in its own commit.
  • config changes -- when adjusting configuration in isolation
  • formatting / whitespace commits -- when adjusting code only for stylistic purposes.

New Features

Small new features (where small refers to the size and complexity of the change, not the impact) are often introduced in a single commit. Larger features or components might be built up piecewise, with each commit containing a single part of it (and its corresponding tests).

Bug Fixes

In general, bug fixes should come in two-commit pairs: a commit adding a failing test demonstrating the bug, and a commit making that failing test pass.

Tagging and Versioning

Whenever the version included in setup.py is changed (and it should be changed when appropriate using http://semver.org/), a corresponding tag should be created with the same version number (formatted v<version>).

git tag -a v0.1.0 -m 'Version 0.1.0

This release contains an initial working version of the `crawl` and `parse`
utilities.'

git push --tags origin
Comments
  • wrong implementation?

    Given:

    # manually created file at /robots.txt
    User-agent: *
    Disallow: /wp-admin/
    

    Nothing is matched on the domain for *. How could that be possible?

    According to the google spec, for a url:

    If allowed, allow it
    If disallowed, disallow it
    else, allow it.
    
    opened by kootenpv 11
  • Failed installation via pip on Mac OS X Yosemite

    In file included from reppy/rep-cpp/src/agent.cpp:3:
    reppy/rep-cpp/deps/url-cpp/include/url.h:8:10: fatal error: 'unordered_map' file not found
    #include <unordered_map>
             ^
    1 error generated.
    error: command '/usr/bin/clang' failed with exit status 1

    opened by kwikiel 11
  • BIG-4192 - Use `rep-cpp`

    This uses rep-cpp to replace the core functionality here.

    On master:

    (venv) vagrant@reppy:/vagrant$ ./bench.py 
    Parse
    ==========
    Total: 3.57568597794
      Avg: 3.57568597794e-05
     Rate: 27966.6616747
    
    Evaluate
    ==========
    Total: 0.284116029739
      Avg: 2.84116029739e-06
     Rate: 351968.877264
    

    On this branch:

    (venv) vagrant@reppy:/vagrant$ ./bench.py 
    Parse
    ==========
    Total: 0.708585977554
      Avg: 7.08585977554e-06
     Rate: 141126.134538
    
    Evaluate
    ==========
    Total: 0.0823559761047
      Avg: 8.23559761047e-07
     Rate: 1214240.96623
    

    That's roughly a 5x parse improvement and about 3.5x evaluation improvement.

    @neilmb @b4hand @lindseyreno

    opened by dlecocq 11
  • Bump version and update maintainer

    Bump version for recent change to add ReadTimeout error. Also update maintainer. @b4hand , I assume you wish to be removed as maintainer. @lindseyreno , as you have experience with reppy, you have been volunteered.

    opened by evanbattaglia 9
  • Incorrect result for mentioned URLs

    Hi again,

    Please find attached a few URLs for which an incorrect result is returned (False here):

    https://thesparkgroup.com/available-positions/
    http://planate.net/careers/
    http://www.halagroup.com/career.php?id=13

    Most of the cases I found were:

    • WordPress sites
    • sites whose robots.txt page returns a 404 or 403 status, or some other error
    • The last one mentioned in the file seems to fail for no reason that I can see.

    From what I understand, all of these URLs should return True. Correct me if I am wrong.

    reppyFail.xlsx

    opened by rock321987 9
  • Add new py3 version to CI

    I noticed when submitting a PR for another project to add support for respecting robots.txt files that reppy was not building under python 3.7-dev and nightly builds.

    To ensure proper visibility for coming versions, I am proposing additional versions for the testing matrix.

    opened by stevelle 8
  • ValueError : Need User Agent

    Hello,

    With some robots.txt files I get this exception:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "reppy/robots.pyx", line 78, in reppy.robots.FetchMethod (reppy/robots.cpp:3235)
      File "reppy/robots.pyx", line 89, in reppy.robots.FetchMethod (reppy/robots.cpp:2962)
      File "reppy/robots.pyx", line 71, in reppy.robots.ParseMethod (reppy/robots.cpp:2375)
      File "reppy/robots.pyx", line 129, in reppy.robots.Robots.__init__ (reppy/robots.cpp:3947)
    ValueError: Need User-Agent
    

    Example:

    from reppy.robots import Robots
    r = Robots.fetch('https://tools.pingdom.com/robots.txt')
    r = Robots.fetch('http://www.lidd.fr/robots.txt')
    
    opened by azotlikid 8
  • Unscoped Crawl-delay is treated as parse error, and thus disallows all URLs

    reppy gives the wrong answer about whether a URL is allowed with robots rules like these:

    Crawl-delay: 1
    
    User-agent: *
    Disallow: /*%
    Disallow: /*.aspx
    Disallow: /*.aspx*
    

    Many robots.txt files have Crawl-delay: at the head of the file.

    Source url: https://learningnetwork.cisco.com/robots.txt

    Steps for reproducing:

    In [1]: from reppy.cache import RobotsCache
       ...: cache = RobotsCache(capacity=100)
    
    In [16]: cache.allowed('http://learningnetwork.cisco.com', 'my')
    Out[16]: False
    

    Full robots file:

    # Puppet managed file
    
    Crawl-delay: 1
    
    Sitemap: https://learningnetwork.cisco.com/sitemap.xml
    
    User-agent: *
    Disallow: /*%
    Disallow: /*.aspx
    Disallow: /*.aspx*
    Disallow: /*.jspa
    Disallow: /*.jspa*
    Disallow: /*/authors/
    Disallow: /*/bookmarks
    Disallow: /*/customize-container.jspa
    Disallow: /*secondVersionNumber*
    Disallow: /*/profile-status-list.jspa
    Disallow: /*/render-widget!execute.jspa
    Disallow: /*/restore
    Disallow: /*/restore?version=
    Disallow: /*/spotlight-search.jspa
    Disallow: /*/status
    Disallow: /*/version
    Disallow: /*?author
    Disallow: /*?containerType=
    Disallow: /*@
    Disallow: /*=tags
    Disallow: /*answerCategory=
    Disallow: /*author=
    Disallow: /*bookmarks*
    Disallow: /*filterID=
    Disallow: /*itemView=
    Disallow: /*jsessionid=
    Disallow: /*-picker.jspa?
    Disallow: /*searchQuestions=
    Disallow: /*start=
    Disallow: /*tab=
    Disallow: /*tags=
    Disallow: /*tagSet=
    Disallow: /*targetUser=
    Disallow: /*tstart=
    Disallow: /*version=
    Disallow: /*view=
    Disallow: /bookmarks/
    Disallow: /c-
    Disallow: /*c-*
    Disallow: /docs/*/c-
    Disallow: /docs/c-
    Disallow: /ITDIT/
    Disallow: /mailto*
    Disallow: /message/
    Disallow: /people/*
    Disallow: /poll-post!input.jspa?
    Disallow: /post!
    Disallow: /post!imagePicker.jspa
    Disallow: /post!reply.jspa?messageID=
    Disallow: /Products/*.aspx
    Disallow: /search.jspa?
    Disallow: /servlet/JiveServlet
    Disallow: /*servlet*
    Disallow: /tags
    Disallow: /thread/*/c-
    Disallow: /thread/c-
    Disallow: /thread-to-doc.jspa
    Disallow: /user-picker.jspa
    
    opened by ivan-is 8
  • www.bestbuy.com/robots.txt false negative

    For some reason this URL

    /site/Global/Free-Shipping/pcmcat276800050002.c?id=pcmcat276800050002

    is disallowed by the library against this robots.txt from www.bestbuy.com

    User-agent: *
    Disallow: /*id=pcmcat140800050004
    Disallow: /*id=pcmcat143800050032
    Disallow: /nex/
    Disallow: /shop/
    Disallow: /*~~*
    Disallow: /*jsessionid=
    Disallow: /*dnmId=*
    Disallow: /*ld=*lg=*rd=*
    Disallow: /m/e/*
    Disallow: /site/builder/*
    Disallow: /site/promo/black-friday-*
    Disallow: /site/promo/Black-Friday-*
    Disallow: /*template=_gameDetailsTab
    Disallow: /*template=_movieDetailsTab
    Disallow: /*template=_musicDetailsTab
    Disallow: /*template=_softwareDetailsTab
    Disallow: /*template=_accessoriesTab
    Disallow: /*template=_castAndCrewTab
    Disallow: /*template=_editorialTab
    Disallow: /*template=_episodesTab
    Disallow: /*template=_protectionAndServicesTab
    Disallow: /*template=_specificationsTab
    
    opened by enewhuis 8
  • python 3.7 breakage

    The pypi package doesn't install on Python 3.7 because of changes in the CPython API. I had several other packages that had the same issue, and the fix was to regenerate the .pyx with a modern cython.

    reppy/robots.cpp:7835:69: error: too many arguments to function
           return (*((__Pyx_PyCFunctionFast)meth)) (self, args, nargs, NULL);
    
    opened by wumpus 6
  • Cache is not working correctly

    The cache seems to work incorrectly (or not at all :grin:). To simulate the process, I used the following piece of code:

    from reppy.robots import Robots
    from reppy.ttl import HeaderWithDefaultPolicy
    from expiringdict import ExpiringDict
    
    policy = HeaderWithDefaultPolicy(default=1800, minimum=600)
    robotsCache = ExpiringDict(max_len=62000, max_age_seconds=4800)
    
    urlRobot = 'http://example.com/robots.txt'
    url = 'http://example.com/some/path/'
    USER_AGENT = 'my-user-agent'
    
    
    for i in range(1, 10):
        robots = Robots.fetch(urlRobot, ttl_policy=policy)
        print(robots.allowed(url, USER_AGENT))
    
    print('HeaderWithDefaultPolicy ends here')
    
    for i in range(1, 10):
        if urlRobot in robotsCache:
            robots = robotsCache[urlRobot]
        else:
            robots = Robots.fetch(urlRobot)
            robotsCache[urlRobot] = robots
        
        print(robots.allowed(url, USER_AGENT))
    

    The speed difference between the two cases is clearly visible. The cache used in the library makes a URL request on every iteration. I disconnected my internet connection while it was running through HeaderWithDefaultPolicy, and it froze. When I killed the process, it exited with a urllib error. So it was making calls, and the cache was not working.

    Correct me if I am wrong

    And when will this be available in python3.6?

    opened by rock321987 6
  • GCC fails when trying to install reppy through pip

    I cannot install reppy in either python 3.8 or 3.10.

    # pip3.8 install reppy                                                                                                                                                                  
    Collecting reppy
      Using cached reppy-0.4.14.tar.gz (93 kB)
      Preparing metadata (setup.py) ... done
    Requirement already satisfied: cachetools in /home/shi/.virtualenvs/ir-1/lib/python3.8/site-packages (from reppy) (5.0.0)
    Requirement already satisfied: python-dateutil!=2.0,>=1.5 in /home/shi/.virtualenvs/ir-1/lib/python3.8/site-packages (from reppy) (2.8.2)
    Requirement already satisfied: requests in /home/shi/.virtualenvs/ir-1/lib/python3.8/site-packages (from reppy) (2.27.1)
    Requirement already satisfied: six in /home/shi/.virtualenvs/ir-1/lib/python3.8/site-packages (from reppy) (1.16.0)
    Requirement already satisfied: certifi>=2017.4.17 in /home/shi/.virtualenvs/ir-1/lib/python3.8/site-packages (from requests->reppy) (2021.10.8)
    Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/shi/.virtualenvs/ir-1/lib/python3.8/site-packages (from requests->reppy) (1.26.9)
    Requirement already satisfied: idna<4,>=2.5 in /home/shi/.virtualenvs/ir-1/lib/python3.8/site-packages (from requests->reppy) (3.3)
    Requirement already satisfied: charset-normalizer~=2.0.0 in /home/shi/.virtualenvs/ir-1/lib/python3.8/site-packages (from requests->reppy) (2.0.12)
    Building wheels for collected packages: reppy
      Building wheel for reppy (setup.py) ... error
      ERROR: Command errored out with exit status 1:
       command: /home/shi/.virtualenvs/ir-1/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-b0oy_bsz/reppy_b67f12bbf68043dab8f5014f214e8614/setup.py'"'"'; __file__='"'"'/tmp/pip-install-b0oy_bsz/reppy_b67f12bbf68043dab8f5014f214e8614/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-40ydxajn
           cwd: /tmp/pip-install-b0oy_bsz/reppy_b67f12bbf68043dab8f5014f214e8614/
      Complete output (42 lines):
      Building from C++
      /home/shi/.virtualenvs/ir-1/lib/python3.8/site-packages/setuptools/dist.py:723: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
        warnings.warn(
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.8
      creating build/lib.linux-x86_64-3.8/reppy
      copying reppy/ttl.py -> build/lib.linux-x86_64-3.8/reppy
      copying reppy/exceptions.py -> build/lib.linux-x86_64-3.8/reppy
      copying reppy/util.py -> build/lib.linux-x86_64-3.8/reppy
      copying reppy/__init__.py -> build/lib.linux-x86_64-3.8/reppy
      creating build/lib.linux-x86_64-3.8/reppy/cache
      copying reppy/cache/policy.py -> build/lib.linux-x86_64-3.8/reppy/cache
      copying reppy/cache/__init__.py -> build/lib.linux-x86_64-3.8/reppy/cache
      running build_ext
      creating build/temp.linux-x86_64-3.8
      creating build/temp.linux-x86_64-3.8/reppy
      creating build/temp.linux-x86_64-3.8/reppy/rep-cpp
      creating build/temp.linux-x86_64-3.8/reppy/rep-cpp/deps
      creating build/temp.linux-x86_64-3.8/reppy/rep-cpp/deps/url-cpp
      creating build/temp.linux-x86_64-3.8/reppy/rep-cpp/deps/url-cpp/src
      creating build/temp.linux-x86_64-3.8/reppy/rep-cpp/src
      gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -DOPENSSL_NO_SSL2 -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fPIC -Ireppy/rep-cpp/include -Ireppy/rep-cpp/deps/url-cpp/include -I/home/shi/.virtualenvs/ir-1/include -I/usr/include/python3.8 -c reppy/rep-cpp/deps/url-cpp/src/psl.cpp -o build/temp.linux-x86_64-3.8/reppy/rep-cpp/deps/url-cpp/src/psl.o -std=c++11
      In file included from reppy/rep-cpp/deps/url-cpp/src/psl.cpp:7:
      reppy/rep-cpp/deps/url-cpp/include/punycode.h:56:54: error: ‘numeric_limits’ is not a member of ‘std’
         56 |         const punycode_uint MAX_PUNYCODE_UINT = std::numeric_limits<punycode_uint>::max();
            |                                                      ^~~~~~~~~~~~~~
      reppy/rep-cpp/deps/url-cpp/include/punycode.h:56:82: error: expected primary-expression before ‘>’ token
         56 |         const punycode_uint MAX_PUNYCODE_UINT = std::numeric_limits<punycode_uint>::max();
            |                                                                                  ^
      reppy/rep-cpp/deps/url-cpp/include/punycode.h:56:85: error: ‘::max’ has not been declared; did you mean ‘std::max’?
         56 |         const punycode_uint MAX_PUNYCODE_UINT = std::numeric_limits<punycode_uint>::max();
            |                                                                                     ^~~
            |                                                                                     std::max
      In file included from /usr/include/c++/11.2.0/algorithm:62,
                       from reppy/rep-cpp/deps/url-cpp/src/psl.cpp:1:
      /usr/include/c++/11.2.0/bits/stl_algo.h:3467:5: note: ‘std::max’ declared here
       3467 |     max(initializer_list<_Tp> __l, _Compare __comp)
            |     ^~~
      error: command '/usr/bin/gcc' failed with exit code 1
      ----------------------------------------
      ERROR: Failed building wheel for reppy
      Running setup.py clean for reppy
    Failed to build reppy
    Installing collected packages: reppy
        Running setup.py install for reppy ... error
        ERROR: Command errored out with exit status 1:
         command: /home/shi/.virtualenvs/ir-1/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-b0oy_bsz/reppy_b67f12bbf68043dab8f5014f214e8614/setup.py'"'"'; __file__='"'"'/tmp/pip-install-b0oy_bsz/reppy_b67f12bbf68043dab8f5014f214e8614/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-rsopie2y/install-record.txt --single-version-externally-managed --compile --install-headers /home/shi/.virtualenvs/ir-1/include/site/python3.8/reppy
             cwd: /tmp/pip-install-b0oy_bsz/reppy_b67f12bbf68043dab8f5014f214e8614/
        Complete output (44 lines):
        Building from C++
        /home/shi/.virtualenvs/ir-1/lib/python3.8/site-packages/setuptools/dist.py:723: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
          warnings.warn(
        running install
        /home/shi/.virtualenvs/ir-1/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
          warnings.warn(
        running build
        running build_py
        creating build
        creating build/lib.linux-x86_64-3.8
        creating build/lib.linux-x86_64-3.8/reppy
        copying reppy/ttl.py -> build/lib.linux-x86_64-3.8/reppy
        copying reppy/exceptions.py -> build/lib.linux-x86_64-3.8/reppy
        copying reppy/util.py -> build/lib.linux-x86_64-3.8/reppy
        copying reppy/__init__.py -> build/lib.linux-x86_64-3.8/reppy
        creating build/lib.linux-x86_64-3.8/reppy/cache
        copying reppy/cache/policy.py -> build/lib.linux-x86_64-3.8/reppy/cache
        copying reppy/cache/__init__.py -> build/lib.linux-x86_64-3.8/reppy/cache
        running build_ext
        creating build/temp.linux-x86_64-3.8
        creating build/temp.linux-x86_64-3.8/reppy
        creating build/temp.linux-x86_64-3.8/reppy/rep-cpp
        creating build/temp.linux-x86_64-3.8/reppy/rep-cpp/deps
        creating build/temp.linux-x86_64-3.8/reppy/rep-cpp/deps/url-cpp
        creating build/temp.linux-x86_64-3.8/reppy/rep-cpp/deps/url-cpp/src
        creating build/temp.linux-x86_64-3.8/reppy/rep-cpp/src
        gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -DOPENSSL_NO_SSL2 -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fPIC -Ireppy/rep-cpp/include -Ireppy/rep-cpp/deps/url-cpp/include -I/home/shi/.virtualenvs/ir-1/include -I/usr/include/python3.8 -c reppy/rep-cpp/deps/url-cpp/src/psl.cpp -o build/temp.linux-x86_64-3.8/reppy/rep-cpp/deps/url-cpp/src/psl.o -std=c++11
        In file included from reppy/rep-cpp/deps/url-cpp/src/psl.cpp:7:
        reppy/rep-cpp/deps/url-cpp/include/punycode.h:56:54: error: ‘numeric_limits’ is not a member of ‘std’
           56 |         const punycode_uint MAX_PUNYCODE_UINT = std::numeric_limits<punycode_uint>::max();
              |                                                      ^~~~~~~~~~~~~~
        reppy/rep-cpp/deps/url-cpp/include/punycode.h:56:82: error: expected primary-expression before ‘>’ token
           56 |         const punycode_uint MAX_PUNYCODE_UINT = std::numeric_limits<punycode_uint>::max();
              |                                                                                  ^
        reppy/rep-cpp/deps/url-cpp/include/punycode.h:56:85: error: ‘::max’ has not been declared; did you mean ‘std::max’?
           56 |         const punycode_uint MAX_PUNYCODE_UINT = std::numeric_limits<punycode_uint>::max();
              |                                                                                     ^~~
              |                                                                                     std::max
        In file included from /usr/include/c++/11.2.0/algorithm:62,
                         from reppy/rep-cpp/deps/url-cpp/src/psl.cpp:1:
        /usr/include/c++/11.2.0/bits/stl_algo.h:3467:5: note: ‘std::max’ declared here
         3467 |     max(initializer_list<_Tp> __l, _Compare __comp)
              |     ^~~
        error: command '/usr/bin/gcc' failed with exit code 1
        ----------------------------------------
    ERROR: Command errored out with exit status 1: /home/shi/.virtualenvs/ir-1/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-b0oy_bsz/reppy_b67f12bbf68043dab8f5014f214e8614/setup.py'"'"'; __file__='"'"'/tmp/pip-install-b0oy_bsz/reppy_b67f12bbf68043dab8f5014f214e8614/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-rsopie2y/install-record.txt --single-version-externally-managed --compile --install-headers /home/shi/.virtualenvs/ir-1/include/site/python3.8/reppy Check the logs for full command output.
    
    opened by dcfreire 9
  • Fix broken Cython compilation due to baked C++ files

    Having C++ files for users without Cython is nice, but when people actually use Cython it introduces multiple things:

    • incompatibilities on Cython version change
    • out-dated generated code which is by default skipped in the cythonization step as there already is a generated file (thus requires --force without a relevant reason in this case)
    • if anything changes in an upstream (dependency, CPython, etc), people can't just tell pip to utilize source code because the source code is already broken by the generated code even if the repo itself already contains a fix

    What's happening is basically this:

    • on the first hash the repo is without change, thus you'll get just C++ error due to missing symbol
    • on the second hash, the state is after the unsuccessful compilation and it can be seen that the C++ code was untouched
    • on the third hash the generated file is removed, thus cythonization + C++ compilation proceeds as it should
    • on the fourth hash a dumbed-down patch was utilized, cythonization overwrote the C++ file and C++ compilation succeeded

    docker run -it python:alpine /bin/sh -c '
        apk add git g++ && \
        git clone https://github.com/seomoz/reppy && \
        cd reppy && \
        git submodule update --init --recursive && \
        git status && \
        pip install cython && \
        sha256sum reppy/robots.cpp >> /hashes.txt && \
        pip install -v -e . || \
        sha256sum reppy/robots.cpp >> /hashes.txt && \
        git clean -dxf && \
        git status && \
        rm reppy/robots.cpp && \
        pip install -v -e . && \
        sha256sum reppy/robots.cpp >> /hashes.txt && \
        pip uninstall -y reppy && \
        git clean -dxf && \
        git status && \
        sed -i -e "43i\ \ \ \ from Cython.Build import cythonize as c;c(ext_files[-1], force=True,language=\"c++\")" setup.py && \
        git diff > diff.txt && cat diff.txt && rm diff.txt && \
        pip install -v -e . && \
        sha256sum reppy/robots.cpp >> /hashes.txt && \
        cat /hashes.txt' |tee reppy-output.txt
    

    This might have implications on caching of the result when developing the package when having multiple files, however since there is pretty much a single .pyx file I believe this implication is redundant and the overall dev-experience improves by having a straightforward solution.

    Attached is the tee-d output: reppy-output.txt

    Closes #122

    opened by KeyWeeUsr 0
  • User-agent checking in robots.txt issue

    I need to validate https://facebook.com/robots.txt (I have seen the issue below on most websites, which I cannot disclose).

    I'm using reppy==0.4.14

    When I try to validate the above with the user agent Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html), it gives me a False response.

    Small reproducible sample given below

    from reppy.robots import Robots
    
    useragent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    # useragent = "Googlebot"
    robot_url = "https://facebook.com/robots.txt"
    url = "https://facebook.com/"
    
    robots = Robots.fetch(robot_url)
    res = robots.allowed(url, useragent)
    print("RESPONSE | ", res)
    

    According to https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers, Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) is Googlebot. This should give a True response; instead I'm getting False.

    When I tried user agent as Googlebot in above code, it gave me a True response.

    I have also tried downloading the robots.txt content and using the parse method. It gives the same result as above.

    Reproducible sample

    import requests
    from reppy.robots import Robots
    
    useragent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    # useragent = "Googlebot"
    robot_url = "https://facebook.com/robots.txt"
    url = "https://facebook.com/"
    
    # Download the robots.txt content manually instead of using Robots.fetch
    response = requests.get(robot_url)
    robots_content = response.text
    
    print("ROBOTS.TXT CONTENT | ", robots_content)
    
    # Parse the downloaded content and check the URL for the given user agent
    robots = Robots.parse(robot_url, robots_content)
    res = robots.allowed(url, useragent)
    
    print("RESPONSE | ", res)
    

    Why does giving the full user agent string instead of the token not work?

    Giving Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) is the same as giving Googlebot and should return True response. Ref: https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers
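
A possible caller-side workaround, given the behavior reported above (the short token is matched but the full header string is not), is to reduce the full user agent string to a crawler token before asking for the rules. The helper name `crawler_token` and the list of known tokens are assumptions made for illustration; they are not part of reppy.

```python
def crawler_token(user_agent, known_tokens=("Googlebot", "Bingbot")):
    """Return the first known crawler token found in a full user agent string.

    `known_tokens` is a hypothetical, caller-maintained list. If none of the
    tokens occurs in the string, the input is returned unchanged.
    """
    lowered = user_agent.lower()
    for token in known_tokens:
        # Case-insensitive substring search for the token.
        if token.lower() in lowered:
            return token
    return user_agent


full = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(crawler_token(full))
```

The result could then be passed on in place of the full string, for example robots.allowed(url, crawler_token(useragent)).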

    Thank you!

    opened by jishnug007 0
  • Please update the version on PyPI

    The latest version on PyPI is reppy 0.4.14, while the source code is already at reppy 0.4.16:

    https://pypi.org/project/reppy/#history

    Could you please update the version on PyPI? Thanks!

    opened by mfhepp 0
  • ValueError on robots.txt file

    Hi,

    I have found a broken robots.txt file which raises a ValueError.

    The broken line is: Disallow: //xtbcallback.phpDisallow: //login.php

    My question is: should parsing fail because the robots file is invalid, or should the line be skipped and the rest of the file be read?
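
Until that question is settled, a caller can guard against the error on its own side. This is a minimal sketch, assuming the ValueError surfaces from the parse call itself; `parse_or_none` is a hypothetical helper, not something reppy provides.

```python
def parse_or_none(parse, url, content):
    """Call parse(url, content) and return its result.

    Returns None when the parser raises ValueError, so the caller can decide
    how to treat a robots.txt file that could not be parsed.
    """
    try:
        return parse(url, content)
    except ValueError:
        return None
```

With reppy this would be called as parse_or_none(Robots.parse, robot_url, robots_content); what to do when None comes back (allow everything, deny everything, retry later) is left to the crawler's policy.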

    opened by Tijs-2 0
  • Missing files in sdist

    It appears that the manifest is missing at least one file necessary to build from the sdist for version 0.3.5. You're in good company: about 5% of other projects updated in the last year are also missing files.

    + /tmp/venv/bin/pip3 wheel --no-binary reppy2 -w /tmp/ext reppy2==0.3.5
    Looking in indexes: http://10.10.0.139:9191/root/pypi/+simple/
    Collecting reppy2==0.3.5
      Downloading http://10.10.0.139:9191/root/pypi/%2Bf/6f2/4d760a9389276/reppy2-0.3.5.tar.gz (10 kB)
    Collecting python-dateutil
      Downloading http://10.10.0.139:9191/root/pypi/%2Bf/75b/b3f31ea686f11/python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
      Saved /tmp/ext/python_dateutil-2.8.1-py2.py3-none-any.whl
    Collecting requests
      Downloading http://10.10.0.139:9191/root/pypi/%2Bf/439/99036bfa82904/requests-2.23.0-py2.py3-none-any.whl (58 kB)
      Saved /tmp/ext/requests-2.23.0-py2.py3-none-any.whl
    Collecting url
      Downloading http://10.10.0.139:9191/root/pypi/%2Bf/8c6/e4a117bfc1566/url-0.4.2.tar.gz (140 kB)
    Collecting six>=1.5
      Downloading http://10.10.0.139:9191/root/pypi/%2Bf/8f3/cd2e254d8f793/six-1.14.0-py2.py3-none-any.whl (10 kB)
      Saved /tmp/ext/six-1.14.0-py2.py3-none-any.whl
    Collecting chardet<4,>=3.0.2
      Downloading http://10.10.0.139:9191/root/pypi/%2Bf/fc3/23ffcaeaed0e0/chardet-3.0.4-py2.py3-none-any.whl (133 kB)
      Saved /tmp/ext/chardet-3.0.4-py2.py3-none-any.whl
    Collecting idna<3,>=2.5
      Downloading http://10.10.0.139:9191/root/pypi/%2Bf/a06/8a21ceac8a4d6/idna-2.9-py2.py3-none-any.whl (58 kB)
      Saved /tmp/ext/idna-2.9-py2.py3-none-any.whl
    Collecting certifi>=2017.4.17
      Downloading http://10.10.0.139:9191/root/pypi/%2Bf/1d9/87a998c75633c/certifi-2020.4.5.1-py2.py3-none-any.whl (157 kB)
      Saved /tmp/ext/certifi-2020.4.5.1-py2.py3-none-any.whl
    Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
      Downloading http://10.10.0.139:9191/root/pypi/%2Bf/882/06b0eb87e6d67/urllib3-1.25.9-py2.py3-none-any.whl (126 kB)
      Saved /tmp/ext/urllib3-1.25.9-py2.py3-none-any.whl
    Skipping python-dateutil, due to already being wheel.
    Skipping requests, due to already being wheel.
    Skipping six, due to already being wheel.
    Skipping chardet, due to already being wheel.
    Skipping idna, due to already being wheel.
    Skipping certifi, due to already being wheel.
    Skipping urllib3, due to already being wheel.
    Building wheels for collected packages: reppy2, url
      Building wheel for reppy2 (setup.py): started
      Building wheel for reppy2 (setup.py): finished with status 'error'
      ERROR: Command errored out with exit status 1:
       command: /tmp/venv/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-wheel-k26q4ofp/reppy2/setup.py'"'"'; __file__='"'"'/tmp/pip-wheel-k26q4ofp/reppy2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-r1w3b7xn
           cwd: /tmp/pip-wheel-k26q4ofp/reppy2/
      Complete output (32 lines):
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib
      creating build/lib/reppy
      copying reppy/exceptions.py -> build/lib/reppy
      copying reppy/cache.py -> build/lib/reppy
      copying reppy/__init__.py -> build/lib/reppy
      copying reppy/parser.py -> build/lib/reppy
      installing to build/bdist.linux-x86_64/wheel
      running install
      running install_lib
      creating build/bdist.linux-x86_64
      creating build/bdist.linux-x86_64/wheel
      creating build/bdist.linux-x86_64/wheel/reppy
      copying build/lib/reppy/parser.py -> build/bdist.linux-x86_64/wheel/reppy
      copying build/lib/reppy/__init__.py -> build/bdist.linux-x86_64/wheel/reppy
      copying build/lib/reppy/cache.py -> build/bdist.linux-x86_64/wheel/reppy
      copying build/lib/reppy/exceptions.py -> build/bdist.linux-x86_64/wheel/reppy
      running install_egg_info
      running egg_info
      writing reppy2.egg-info/PKG-INFO
      writing dependency_links to reppy2.egg-info/dependency_links.txt
      writing requirements to reppy2.egg-info/requires.txt
      writing top-level names to reppy2.egg-info/top_level.txt
      warning: Failed to find the configured license file 'LICENSE'
      reading manifest file 'reppy2.egg-info/SOURCES.txt'
      writing manifest file 'reppy2.egg-info/SOURCES.txt'
      Copying reppy2.egg-info to build/bdist.linux-x86_64/wheel/reppy2-0.3.5.egg-info
      running install_scripts
      error: [Errno 2] No such file or directory: 'LICENSE'
      ----------------------------------------
      ERROR: Failed building wheel for reppy2
      Running setup.py clean for reppy2
      Building wheel for url (setup.py): started
      Building wheel for url (setup.py): finished with status 'error'
      ERROR: Command errored out with exit status 1:
       command: /tmp/venv/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-wheel-k26q4ofp/url/setup.py'"'"'; __file__='"'"'/tmp/pip-wheel-k26q4ofp/url/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-657k0jvw
           cwd: /tmp/pip-wheel-k26q4ofp/url/
      Complete output (258 lines):
      Building from C++
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.8
      creating build/lib.linux-x86_64-3.8/url
      copying url/__init__.py -> build/lib.linux-x86_64-3.8/url
      creating build/lib.linux-x86_64-3.8/url/psl
      copying url/psl/2016-08-16.psl -> build/lib.linux-x86_64-3.8/url/psl
      running build_ext
      building 'url.url' extension
      creating build/temp.linux-x86_64-3.8
      creating build/temp.linux-x86_64-3.8/url
      creating build/temp.linux-x86_64-3.8/url/url-cpp
      creating build/temp.linux-x86_64-3.8/url/url-cpp/src
      x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -Iurl/url-cpp/include -I/tmp/venv/include -I/usr/include/python3.8 -c url/url-cpp/src/url.cpp -o build/temp.linux-x86_64-3.8/url/url-cpp/src/url.o -std=c++11
      x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -Iurl/url-cpp/include -I/tmp/venv/include -I/usr/include/python3.8 -c url/url-cpp/src/utf8.cpp -o build/temp.linux-x86_64-3.8/url/url-cpp/src/utf8.o -std=c++11
      x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -Iurl/url-cpp/include -I/tmp/venv/include -I/usr/include/python3.8 -c url/url-cpp/src/punycode.cpp -o build/temp.linux-x86_64-3.8/url/url-cpp/src/punycode.o -std=c++11
      x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -Iurl/url-cpp/include -I/tmp/venv/include -I/usr/include/python3.8 -c url/url-cpp/src/psl.cpp -o build/temp.linux-x86_64-3.8/url/url-cpp/src/psl.o -std=c++11
      x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -Iurl/url-cpp/include -I/tmp/venv/include -I/usr/include/python3.8 -c url/url.cpp -o build/temp.linux-x86_64-3.8/url/url.o -std=c++11
      url/url.cpp: In function ‘PyObject* PyInit_url()’:
      url/url.cpp:10054:34: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10054 |   __pyx_type_3url_3url_StringURL.tp_print = 0;
            |                                  ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10054:34: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10054 |   __pyx_type_3url_3url_StringURL.tp_print = 0;
            |                                  ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10054:34: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10054 |   __pyx_type_3url_3url_StringURL.tp_print = 0;
            |                                  ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10059:35: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10059 |   __pyx_type_3url_3url_UnicodeURL.tp_print = 0;
            |                                   ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10059:35: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10059 |   __pyx_type_3url_3url_UnicodeURL.tp_print = 0;
            |                                   ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10059:35: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10059 |   __pyx_type_3url_3url_UnicodeURL.tp_print = 0;
            |                                   ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10063:52: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10063 |   __pyx_type_3url_3url___pyx_scope_struct__deparam.tp_print = 0;
            |                                                    ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10063:52: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10063 |   __pyx_type_3url_3url___pyx_scope_struct__deparam.tp_print = 0;
            |                                                    ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10063:52: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10063 |   __pyx_type_3url_3url___pyx_scope_struct__deparam.tp_print = 0;
            |                                                    ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10066:53: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10066 |   __pyx_type_3url_3url___pyx_scope_struct_1_genexpr.tp_print = 0;
            |                                                     ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10066:53: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10066 |   __pyx_type_3url_3url___pyx_scope_struct_1_genexpr.tp_print = 0;
            |                                                     ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10066:53: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10066 |   __pyx_type_3url_3url___pyx_scope_struct_1_genexpr.tp_print = 0;
            |                                                     ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10069:59: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10069 |   __pyx_type_3url_3url___pyx_scope_struct_2_filter_params.tp_print = 0;
            |                                                           ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10069:59: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10069 |   __pyx_type_3url_3url___pyx_scope_struct_2_filter_params.tp_print = 0;
            |                                                           ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10069:59: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10069 |   __pyx_type_3url_3url___pyx_scope_struct_2_filter_params.tp_print = 0;
            |                                                           ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10072:53: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10072 |   __pyx_type_3url_3url___pyx_scope_struct_3_genexpr.tp_print = 0;
            |                                                     ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10072:53: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10072 |   __pyx_type_3url_3url___pyx_scope_struct_3_genexpr.tp_print = 0;
            |                                                     ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10072:53: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10072 |   __pyx_type_3url_3url___pyx_scope_struct_3_genexpr.tp_print = 0;
            |                                                     ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10075:53: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10075 |   __pyx_type_3url_3url___pyx_scope_struct_4_genexpr.tp_print = 0;
            |                                                     ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10075:53: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10075 |   __pyx_type_3url_3url___pyx_scope_struct_4_genexpr.tp_print = 0;
            |                                                     ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp:10075:53: warning: ‘_typeobject::tp_print’ is deprecated [-Wdeprecated-declarations]
      10075 |   __pyx_type_3url_3url___pyx_scope_struct_4_genexpr.tp_print = 0;
            |                                                     ^~~~~~~~
      In file included from /usr/include/python3.8/object.h:746,
                       from /usr/include/python3.8/pytime.h:6,
                       from /usr/include/python3.8/Python.h:85,
                       from url/url.cpp:4:
      /usr/include/python3.8/cpython/object.h:260:30: note: declared here
        260 |     Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
            |                              ^~~~~~~~
      url/url.cpp: In function ‘void __Pyx__ExceptionSwap(PyThreadState*, PyObject**, PyObject**, PyObject**)’:
      url/url.cpp:12497:24: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘exc_type’; did you mean ‘curexc_type’?
      12497 |     tmp_type = tstate->exc_type;
            |                        ^~~~~~~~
            |                        curexc_type
      url/url.cpp:12498:25: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘exc_value’; did you mean ‘curexc_value’?
      12498 |     tmp_value = tstate->exc_value;
            |                         ^~~~~~~~~
            |                         curexc_value
      url/url.cpp:12499:22: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
      12499 |     tmp_tb = tstate->exc_traceback;
            |                      ^~~~~~~~~~~~~
            |                      curexc_traceback
      url/url.cpp:12500:13: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘exc_type’; did you mean ‘curexc_type’?
      12500 |     tstate->exc_type = *type;
            |             ^~~~~~~~
            |             curexc_type
      url/url.cpp:12501:13: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘exc_value’; did you mean ‘curexc_value’?
      12501 |     tstate->exc_value = *value;
            |             ^~~~~~~~~
            |             curexc_value
      url/url.cpp:12502:13: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
      12502 |     tstate->exc_traceback = *tb;
            |             ^~~~~~~~~~~~~
            |             curexc_traceback
      error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
      ----------------------------------------
      ERROR: Failed building wheel for url
      Running setup.py clean for url
    Failed to build reppy2 url
    ERROR: Failed to build one or more wheels
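
The first failure in the log above comes from a LICENSE file that is absent from the sdist. A quick way to check an archive before uploading it is to list its members with the standard-library tarfile module. This is a sketch only: the function name `missing_from_sdist` and the default list of required files are assumptions for illustration.

```python
import tarfile


def missing_from_sdist(sdist_path, required=("LICENSE",)):
    """Return the names in `required` that are absent from the sdist archive."""
    with tarfile.open(sdist_path) as archive:
        # sdist members sit under a top-level "<name>-<version>/" directory,
        # so compare on the path relative to that directory.
        present = {
            name.split("/", 1)[1]
            for name in archive.getnames()
            if "/" in name
        }
    return [name for name in required if name not in present]
```

An empty list means every required file made it into the archive; any name returned points at an entry that is probably missing from the manifest.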
    
    opened by thatch 3
Owner
Moz