declutters url lists for crawling/pentesting

Overview

uro

Using a URL list for security testing can be painful as there are a lot of URLs that have uninteresting/duplicate content; uro aims to solve that.

It doesn't make any HTTP requests to the URLs and removes:

  • human-written content, e.g. blog posts
  • URLs with the same path that differ only in parameter values
  • incremental URLs, e.g. /cat/1/ and /cat/2/
  • image, js, css and other static files
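
The filtering rules above can be illustrated with a minimal sketch. This is not uro's actual implementation: the function name `declutter`, the `STATIC_EXTENSIONS` set and the de-duplication key are assumptions made for illustration, and the detection of human-written content is left out entirely.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Hypothetical extension list for illustration; uro's real blacklist may differ.
STATIC_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "svg", "js", "css", "ico", "woff", "woff2"}


def declutter(urls):
    """Return URLs with static files and near-duplicates dropped (no HTTP requests)."""
    seen = set()
    kept = []
    for raw in urls:
        url = raw.strip()
        if not url:
            continue
        parts = urlsplit(url)
        path = parts.path or "/"
        # Skip static assets such as images, scripts and stylesheets.
        last_segment = path.rsplit("/", 1)[-1]
        if "." in last_segment and last_segment.rsplit(".", 1)[-1].lower() in STATIC_EXTENSIONS:
            continue
        # Treat purely numeric path segments as interchangeable: /cat/1/ == /cat/2/.
        shape = re.sub(r"/\d+(?=/|$)", "/{n}", path)
        # Compare parameter names only, so URLs differing in values collapse together.
        names = tuple(sorted({name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}))
        key = (parts.netloc.lower(), shape, names)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept
```

Only the first URL of each group is kept, so the result depends on input order.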

Usage

First, install uro with pip:

pip3 install uro

Now, there's just one way to use it, no args, no bullshit.

cat urls.txt | uro

Comments
  • ImportError: cannot import name 'SIGPIPE' from 'signal'

    D:\uro>uro
    Traceback (most recent call last):
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\Scripts\uro-script.py", line 33, in <module>
        sys.exit(load_entry_point('uro==0.0.2', 'console_scripts', 'uro')())
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\Scripts\uro-script.py", line 25, in importlib_load_entry_point
        return next(matches).load()
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\importlib\metadata.py", line 77, in load
        module = import_module(match.group('module'))
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\importlib\__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "", line 1014, in _gcd_import
      File "", line 991, in _find_and_load
      File "", line 975, in _find_and_load_unlocked
      File "", line 655, in _load_unlocked
      File "", line 618, in _load_backward_compatible
      File "", line 259, in load_module
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\site-packages\uro-0.0.2-py3.8.egg\uro\uro.py", line 4, in <module>
    ImportError: cannot import name 'SIGPIPE' from 'signal' (C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\signal.py)

    opened by umar98 3
  • Error installing uro

    I get this error: WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behavior with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

    I've followed the steps above but haven't found a solution :(

    Can anyone help me?

    invalid 
    opened by mjulda 2
  • When using uro on subdomains it leaves :// in front?

    When using uro on subdomains, it leaves :// in front.

    Example:

    cat subs.txt | uro

    Contents of subs.txt:

    site.com
    sub.site.com
    sub123.site.com

    For any entry without http:// or https:// in front, uro leaves :// in front of it.

    opened by gprime31 2
  • ERROR

    I just can't get this to work. I have cloned the repo and run the install command, but when I try "cat file.txt | uro" it doesn't work. Do I have to run any additional commands? Is there an installation video? :)

    invalid 
    opened by spector012 2
  • Please solve this

    └─# cat params.csv | uro | wc -l
    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.9/dist-packages/uro/uro.py", line 155, in main
        if re.search(pattern, path):
      File "/usr/lib/python3.9/re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "/usr/lib/python3.9/re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.9/sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.9/sre_parse.py", line 962, in parse
        raise source.error("unbalanced parenthesis")
    re.error: unbalanced parenthesis at position 68
    6547

    opened by r3dpars3c 2
  • It doesn't delete paths

    The two paths below differ only in their numeric segment (43935976 and 43935989).

    root@localhost:~# cat urls.txt
    https://news.mail.ru/politics/43935976/?social=tw
    https://news.mail.ru/politics/43935989/?social=tw
    

    It should remove one of them, but it doesn't.

    root@localhost:~# cat urls.txt | uro
    https://news.mail.ru/politics/43935976/?social=tw
    https://news.mail.ru/politics/43935989/?social=tw
    
    bug 
    opened by Phoenix1112 1
  • error handling

    So I added uro to my workflow and after a while I got this error:

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.8/dist-packages/uro/uro.py", line 139, in main
        if matches_patterns(path):
      File "/usr/local/lib/python3.8/dist-packages/uro/uro.py", line 107, in matches_patterns
        if re.search(pattern, path):
      File "/usr/lib/python3.8/re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "/usr/lib/python3.8/re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
        p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
      File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
        itemsappend(_parse(source, state, verbose, nested + 1,
      File "/usr/lib/python3.8/sre_parse.py", line 836, in _parse
        raise source.error("missing ), unterminated subpattern",
    re.error: missing ), unterminated subpattern at position 369
    

    It happens to me with different inputs, so it seems to occur often.

    invalid 
    opened by marcelo321 1
  • Uro error

    λ cat newfile222.txt | uro
    Traceback (most recent call last):
      File "C:\Users\Yaseen\AppData\Local\Programs\Python\Python39\Scripts\uro-script.py", line 33, in <module>
        sys.exit(load_entry_point('uro==0.0.1', 'console_scripts', 'uro')())
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\site-packages\uro\uro.py", line 139, in main
        if matches_patterns(path):
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\site-packages\uro\uro.py", line 107, in matches_patterns
        if re.search(pattern, path):
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_parse.py", line 948, in parse
        p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_parse.py", line 443, in _parse_sub
        itemsappend(_parse(source, state, verbose, nested + 1,
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_parse.py", line 836, in _parse
        raise source.error("missing ), unterminated subpattern",
    re.error: missing ), unterminated subpattern at position 379
    cat: write error: No space left on device

    Can you help? It reports a space issue, but I have a lot of space.

    bug invalid 
    opened by hellofresh01 1
  • Improvement Request

    Hi Somdev,

    1. I'd like to suggest adding the following extensions to the blacklist. I gathered them manually, and I think it would be nice to omit them:
    'svg','img','gif','mp4','flv','ogv','webm','webp','mov','mp3','m4a','m4p','ppt','pptx','pdf','scss','tif','tiff','ttf','otf','woff','woff2','eot','htc','swf','rtf','image'

    2. I would also like to ask for the js extension to be whitelisted, as there are lots of interesting features/endpoints to be found in JS files, and I don't think they should be considered "useless".

    Thanks!

    Kind Regards, HolyBugx

    enhancement 
    opened by HolyBugx 1
  • More extensions to declutter

    It may be useful to add these extensions to the ones that get decluttered; at least, that's what I usually do:

    .doc
    .docx
    .mp3
    .mp4
    .exe
    .tif
    .ttf
    .woff
    .woff2
    .ico
    .zip
    
    duplicate 
    opened by leorac 0
  • Bad character range P-C at position 31

    cat urls.txt | uro

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.8/dist-packages/uro/uro.py", line 155, in main
        if re.search(pattern, path):
      File "/usr/lib/python3.8/re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "/usr/lib/python3.8/re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
        p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
      File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
        itemsappend(_parse(source, state, verbose, nested + 1,
      File "/usr/lib/python3.8/sre_parse.py", line 598, in _parse
        raise source.error(msg, len(this) + 1 + len(that))
    re.error: bad character range P-C at position 31
    
    bug 
    opened by remonsec 0
  • uro error

    cat urls.txt | uro > test

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.10/dist-packages/uro/uro.py", line 123, in main
        for line in sys.stdin:
      File "/usr/lib/python3.10/codecs.py", line 322, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

    @s0md3v

    bug 
    opened by Iamsajidkhan 0
  • Error

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 33, in <module>
        sys.exit(load_entry_point('uro==0.0.4', 'console_scripts', 'uro')())
      File "/usr/local/bin/uro", line 25, in importlib_load_entry_point
        return next(matches).load()
    StopIteration

    opened by umarahmad125 0
  • broken pipe

    I have been encountering this issue:

      File "/usr/local/bin/uro", line 10, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.7/dist-packages/uro/uro.py", line 151, in main
        print(host + path + dict_to_params(param))
    BrokenPipeError: [Errno 32] Broken pipe
    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 10, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.7/dist-packages/uro/uro.py", line 161, in main
        print(host + path)
    BrokenPipeError: [Errno 32] Broken pipe
    

    Any idea why this happens?

    opened by marcelo321 0
  • enhanced filtration

    I would like to filter URLs such as "/A/embed?url=" and "/B/embed?url=", which return similar data, as well as "/A.php" and "/A.php/", which also return similar data.

    enhancement 
    opened by LztCode 1
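
Several reports above share two failure modes: `re.error` raised while compiling a pattern, and `UnicodeDecodeError` raised while reading stdin. The sketch below shows one defensive way to handle both. It assumes, without confirmation from the source, that the failing patterns are built from URL text; `pattern_from_path` and `iter_lines` are hypothetical helpers, not part of uro.

```python
import io
import re


def pattern_from_path(path):
    """Build a regex that matches `path` with any digits in place of its digit runs."""
    # Escape first so characters such as "(" or "[P-C]" lose their regex meaning.
    escaped = re.escape(path)
    # Then generalise each run of digits; a function replacement avoids
    # backslash processing in the replacement string.
    return re.compile(re.sub(r"\d+", lambda match: r"\d+", escaped) + "$")


def iter_lines(binary_stream):
    """Yield non-empty text lines, replacing undecodable bytes instead of raising."""
    text = io.TextIOWrapper(binary_stream, encoding="utf-8", errors="replace")
    for line in text:
        line = line.strip()
        if line:
            yield line
```

In a command-line tool, `iter_lines(sys.stdin.buffer)` would take the place of iterating over `sys.stdin` directly. A leading 0xff byte, as in the report above, often indicates a UTF-16 byte order mark, so re-encoding the input file as UTF-8 may be the better fix.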
Releases (0.0.4)
  • 0.0.4 (Mar 19, 2022)

  • 0.0.3 (Feb 27, 2022)

    • removed redundant imports and code
    • added more extensions to blacklist
    • less memory and time consumption
    • fixed 'broken pipe' error when piping the output to utilities like head
    • fixed an error where similar urls were not getting filtered when they had any parameters
  • 0.0.2 (Sep 1, 2021)
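
The 'broken pipe' fix in 0.0.3 and the Windows `SIGPIPE` import error reported earlier concern the same signal. The sketch below shows one common way to handle both; it is not necessarily how uro does it, and `restore_default_sigpipe` is a hypothetical name.

```python
import signal


def restore_default_sigpipe():
    """Let the OS end the process quietly when the reader (e.g. head) closes the pipe.

    Returns True if the default handler was installed, or False on platforms
    where the signal module has no SIGPIPE (such as Windows).
    """
    sigpipe = getattr(signal, "SIGPIPE", None)
    if sigpipe is None:
        return False
    # signal.signal() must be called from the main thread.
    signal.signal(sigpipe, signal.SIG_DFL)
    return True
```

Looking the attribute up with `getattr` instead of writing `from signal import SIGPIPE` keeps the module importable on Windows.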

Owner
Somdev Sangwan
I make things, I break things and I make things that break things.