UDdup - URLs Deduplication Tool
The tool takes a list of URLs and removes "duplicate" pages, meaning URLs whose patterns are probably repetitive and point to the same web template.
https://www.example.com/product/123
https://www.example.com/product/456
https://www.example.com/product/123?is_prod=false
https://www.example.com/product/222?is_debug=true
All of the above probably point to the same product "template". Therefore, it should be enough to scan only some of these URLs with our various scanners.
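To make the idea of a URL "template" concrete, here is a minimal sketch in Python. It is not UDdup's actual algorithm: the helper names `template_key` and `dedupe` are hypothetical, and the sketch simply assumes that numeric path segments are IDs and that query strings do not change the template.

```python
from urllib.parse import urlsplit


def template_key(url):
    """Hypothetical key: host plus path, with numeric segments masked."""
    parts = urlsplit(url)
    segments = [
        "{id}" if seg.isdigit() else seg  # assumption: digits-only means an ID
        for seg in parts.path.strip("/").split("/")
    ]
    # The query string is ignored on purpose (assumption of this sketch).
    return (parts.netloc.lower(), "/".join(segments))


def dedupe(urls):
    """Keep the first URL seen for each template key, preserving order."""
    seen = {}
    for url in urls:
        seen.setdefault(template_key(url), url)
    return list(seen.values())


urls = [
    "https://www.example.com/product/123",
    "https://www.example.com/product/456",
    "https://www.example.com/product/123?is_prod=false",
    "https://www.example.com/product/222?is_debug=true",
]
print(dedupe(urls))
```

All four URLs share the key `("www.example.com", "product/{id}")`, so this sketch keeps only the first one. The real tool may choose a different representative URL.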
The result of the above after UDdup should be:
Why do I need it?
Mostly for a better (automated) reconnaissance process, with less noise (for both the tester and the target).
Take a look at demo.txt which is the raw URLs file which results in
With pip (Recommended)
pip install uddup
Manual (from code)
# Clone the repository.
git clone https://github.com/rotemreiss/uddup.git
# Install the Python requirements.
cd uddup
pip install -r requirements.txt
uddup -u demo.txt -o ./demo-result.txt
More Usage Options
| Short Form | Long Form | Description |
|------------|-----------|-------------|
| -h | --help | Show this help message and exit |
| -u | --urls | File with a list of urls |
| -o | --output | Save results to a file |
| -s | --silent | Print only the result URLs |
| -fp | --filter-path | Filter paths by a given Regex |
Filter Paths by Regex
Allows filtering by a custom path pattern. For example, to filter all paths that start with /product, run:
# Single Regex
uddup -u demo.txt -fp "^product"
https://www.example.com/
https://www.example.com/privacy-policy
https://www.example.com/product/1
https://www.example2.com/product/2
https://www.example3.com/product/4
Advanced Regex with multiple path filters
uddup -u demo.txt -fp "(^product)|(^category)"
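As a rough illustration of how such a path filter could behave, here is a small Python sketch. It is not taken from UDdup's source: `filter_paths` is a hypothetical helper, and the sketch assumes that the regex is matched against the path without its leading slash (which is why `^product` targets `/product`) and that matching URLs are dropped from the result.

```python
import re
from urllib.parse import urlsplit


def filter_paths(urls, pattern):
    """Drop every URL whose path matches the given regex (assumed semantics)."""
    regex = re.compile(pattern)
    kept = []
    for url in urls:
        # Assumption: the leading slash is stripped before matching.
        path = urlsplit(url).path.lstrip("/")
        if not regex.search(path):
            kept.append(url)
    return kept


urls = [
    "https://www.example.com/",
    "https://www.example.com/privacy-policy",
    "https://www.example.com/product/1",
    "https://www.example2.com/product/2",
    "https://www.example3.com/product/4",
]
print(filter_paths(urls, "^product"))
print(filter_paths(urls, "(^product)|(^category)"))
```

Under these assumptions, both calls keep only the home page and the privacy policy page, since the sample list has no `category` paths.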
Feel free to fork the repository and submit pull requests.
Want to say thanks? :) Message me on LinkedIn