A Python module for retrieving and parsing WHOIS data

Overview

pythonwhois

A WHOIS retrieval and parsing library for Python.

Dependencies

None! All you need is the Python standard library.

Instructions

The manual (including install instructions) can be found in the doc/ directory. An HTML version is also viewable here.

Goals

  • 100% coverage of WHOIS formats.
  • Accurate and complete data.
  • Consistently functional parsing; constant tests to ensure the parser isn't accidentally broken.

Features

  • WHOIS data retrieval
    • Able to follow WHOIS server redirects
    • Won't get stuck on multiple-result responses from verisign-grs
  • WHOIS data parsing
    • Base information (registrar, etc.)
    • Dates/times (registration, expiry, ...)
    • Full registrant information (!)
    • Nameservers
  • Optional WHOIS data normalization
    • Attempts to intelligently reformat WHOIS data for better (human) readability
    • Converts various abbreviation types to full locality names
      • Airport codes
      • Country names (2- and 3-letter ISO codes)
      • US states and territories
      • Canadian provinces and territories
      • Australian states
  • pwhois, a simple WHOIS tool using pythonwhois
    • Easily readable output format
    • Can also output raw WHOIS data
    • ... and JSON.
  • Automated testing suite
    • Will detect and warn about any changes in parsed data compared to previous runs
    • Guarantees that previously working WHOIS parsing doesn't unintentionally break when changing code
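As a rough illustration of how the parsed output might be consumed, the sketch below condenses a result dictionary into a handful of fields. The key names (`registrar`, `creation_date`, `expiration_date`, `nameservers`, `contacts`) and the assumption that most fields are lists are not confirmed by this README; check the manual in doc/ for the authoritative structure. `first`, `summarize` and `lookup` are hypothetical helper names, not part of pythonwhois.

```python
def first(values):
    """Return the first item of a list-like field, or None if it is empty or missing."""
    if not values:
        return None
    if isinstance(values, (list, tuple)):
        return values[0]
    return values


def summarize(parsed):
    """Reduce a parsed WHOIS dictionary (assumed shape) to a few common fields."""
    contacts = parsed.get("contacts") or {}
    registrant = contacts.get("registrant") or {}
    return {
        "registrar": first(parsed.get("registrar")),
        "created": first(parsed.get("creation_date")),
        "expires": first(parsed.get("expiration_date")),
        "nameservers": list(parsed.get("nameservers") or []),
        "registrant_name": registrant.get("name"),
    }


def lookup(domain):
    """Fetch and summarize WHOIS data; needs pythonwhois and network access."""
    import pythonwhois  # imported lazily so the helpers above work without it
    return summarize(pythonwhois.get_whois(domain))
```

Calling `lookup("example.com")` performs a live WHOIS query, so `summarize` is kept separate to make it usable on stored or test data.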

IP range WHOIS

pythonwhois does not yet support WHOIS lookups on IP ranges (including single IPs), although this will be added at some point in the future. In the meantime, consider using ipwhois - it offers functionality and an API similar to pythonwhois, but for IPs. It also supports delegated RWhois.

Do note that ipwhois does not offer a normalization feature, and does not (yet) come with a command-line tool. Additionally, ipwhois is maintained by Philip Hane and not by me; please make sure to file bugs relating to it in the ipwhois repository, not in that of pythonwhois.

Important update notes

2.4.0 and up: A lot of changes were made to the normalization, and the performance under Python 2.x was significantly improved. The average parsing time under Python 2.7 has dropped by 94% (!), and on my system averages out at 18ms. Performance under Python 3.x is unchanged. pythonwhois will now expand a lot of abbreviations in normalized mode, such as airport codes, ISO country codes, and US/CA/AU state abbreviations. The consequence of this is that the library is now bigger (as it ships a list of these abbreviations). Also note that there may be licensing consequences, in particular regarding the airport code database. More information about that can be found below.

2.3.0 and up: Python 3 support was fixed. Creation date parsing for contacts was fixed; correct timestamps will now be returned, rather than unformatted ones - if your application relies on the broken variant, you'll need to change your code. Some additional parameters were added to the net and parse methods to facilitate NIC handle lookups; the defaults are backwards-compatible, and these changes should not have any consequences for your code. Thai WHOIS parsing was implemented, but is a little spotty - data may occasionally be incorrectly split up. Please submit a bug report if you run across any issues.

2.2.0 and up: The internal workings of get_whois_raw have been changed, to better facilitate parsing of WHOIS data from registries that may return multiple partial matches for a query, such as whois.verisign-grs.com. This change means that, by default, get_whois_raw will now strip out the part of such a response that does not pertain directly to the requested domain. If your application requires an unmodified raw WHOIS response and is calling get_whois_raw directly, you should use the new never_cut parameter to keep pythonwhois from doing this post-processing. As this is a potentially breaking behaviour change, the minor version has been bumped.
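If your code calls get_whois_raw directly, a thin wrapper along these lines may make the choice explicit. It relies only on the never_cut parameter described above; `fetch_raw` and its `unmodified` and `fetch` arguments are hypothetical names introduced for this sketch, and the return value is passed through untouched.

```python
def fetch_raw(domain, unmodified=False, fetch=None):
    """Fetch raw WHOIS data, optionally skipping the multi-match trimming.

    `fetch` can be injected for testing. When omitted, it is assumed to be
    pythonwhois.net.get_whois_raw, which accepts never_cut as of 2.2.0.
    """
    if fetch is None:
        from pythonwhois import net  # lazy import; requires pythonwhois
        fetch = net.get_whois_raw
    # never_cut=True asks for the response exactly as the server sent it.
    return fetch(domain, never_cut=unmodified)
```

With `unmodified=False` the default post-processing applies; pass `unmodified=True` when the full, unmodified response is needed.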

It doesn't work!

  • It doesn't work at all?
  • It doesn't parse the data for a particular domain?
  • There's an inaccuracy in parsing the data for a domain, even just a small one?

If any of those apply, don't hesitate to file an issue! The goal is 100% coverage, and we need your feedback to reach that goal.

License

This library may be used under the WTFPL - or, if you take issue with that, consider it to be under the CC0.

Data sources

This library uses a number of third-party datasets for normalization:

Be aware that the OpenFlights database in particular has potential licensing consequences; if you do not wish to be bound by these potential consequences, you may simply delete the airports.dat file from your distribution. pythonwhois will assume there is no database available, and will not perform airport code conversion (but still function correctly otherwise). This also applies to other included datasets.
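The fallback behaviour described above can be pictured roughly as follows. This is a sketch of the general pattern (load a lookup table if the file exists, otherwise carry on without it), not the code pythonwhois actually uses; the two-column CSV layout and the names `load_optional_table` and `expand_code` are assumptions made for illustration.

```python
import csv
import os


def load_optional_table(path, key_column=0, value_column=1):
    """Load a code -> name table from a CSV-like file, or return {} if it is absent."""
    if not os.path.isfile(path):
        return {}  # dataset was deleted or never shipped: no conversion
    table = {}
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            # Skip rows that are too short to hold both columns.
            if len(row) > max(key_column, value_column):
                table[row[key_column].strip().upper()] = row[value_column].strip()
    return table


def expand_code(code, table):
    """Return the full name for a code, or the code unchanged if it is unknown."""
    return table.get(code.strip().upper(), code)
```

Because a missing file simply yields an empty table, every lookup falls through to the original value and the rest of the normalization is unaffected.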

Contributing

Feel free to fork and submit pull requests (to the develop branch)! If you change any parsing or normalization logic, make sure to run the full test suite before opening a pull request. Instructions for that are below.

Please note that this project uses tabs for indentation.

All commands are relative to the root directory of the repository.

Pull requests that do not include output from test.py will be rejected!

Adding new WHOIS data to the testing set

pwhois --raw thedomain.com > test/data/thedomain.com

Checking the currently parsed data (while editing the parser)

./pwhois -f test/data/thedomain.com/ .

(don't forget the dot at the end!)

Marking the current parsed data as correct for a domain

Make sure to verify (using pwhois or otherwise) that the WHOIS data for the domain is being parsed correctly, before marking it as correct!

./test.py update thedomain.com

Running all tests

./test.py run all

Testing a specific domain

./test.py run thedomain.com

Running the full test suite, including support for multiple Python versions

tox

Generating documentation

You need ZippyDoc (which can be installed through pip install zippydoc).

zpy2html doc/*.zpy
Comments
  • Nominet parsing

    Hi,

    When running against .co.uk domains this seems to be excluding contact information. (when searching 'google.co.uk' address information is included in raw output but not parsed by the code). I've also noticed it seems to be missing the "Registration status:" field too.

    Thanks

    bug 
    opened by meamens 13
  • Unable to get whois record for .ac.uk domains

    Attempted to pull whois record for UK HE institutions as follows (for example):

    pythonwhois.get_whois("imperial.ac.uk")
    

    This responds with the whois record for "ac.uk" not "imperial.ac.uk". From the response data:

    Domain name:\r\n        ac.uk
    

    I guess you require explicit lists of supported extensions?

    bug 
    opened by richard-jones 9
  • Do you need a new maintainer?

    Noticing a lot of open pull requests. Happy to join on maintenance, or to take over. I've never done that before, so I'm not sure exactly what the best process is. This is a great library, and I'd be happy to help. Let me know!

    question 
    opened by floer32 8
  • Fail to import pythonwhois in my python application

    Sorry, I'm new to Python. I ran 'pip install pythonwhois'; my current version is pythonwhois-2.4.3-py2.7.egg.

    Then I wrote this Python script to do a WHOIS lookup on "www.google.com".

    I get this error when running the script: AttributeError: 'module' object has no attribute 'get_whois'

    It's not finding the get_whois function from the library. What am I missing?

    Code:

    import pythonwhois

    domain = "www.google.com"
    normalized = True

    pythonwhois.get_whois(domain, True)

    opened by mdagcilar 7
  • Unicode parsing problems

    Looking up some IP addresses seems to lead to an UnicodeDecodeError:

    >>> pythonwhois.get_whois('179.175.242.131')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/fsufitch/python-whois/pythonwhois/__init__.py", line 4, in get_whois
        raw_data, server_list = net.get_whois_raw(domain, with_server_list=True)
      File "/home/fsufitch/python-whois/pythonwhois/net.py", line 42, in get_whois_raw
        response = whois_request(request_domain, target_server)
      File "/home/fsufitch/python-whois/pythonwhois/net.py", line 92, in whois_request
        return buff.decode("utf-8")
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 565: invalid continuation byte
    

    Also:

    >>> pythonwhois.get_whois('80.148.135.28')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/fsufitch/python-whois/pythonwhois/__init__.py", line 8, in get_whois
        return parse.parse_raw_whois(raw_data, normalized=normalized, never_query_handles=False, handle_server=server_list[-1])
      File "/home/fsufitch/python-whois/pythonwhois/parse.py", line 553, in parse_raw_whois
        data["contacts"] = parse_registrants(raw_data, never_query_handles, handle_server)
      File "/home/fsufitch/python-whois/pythonwhois/parse.py", line 894, in parse_registrants
        contact = fetch_nic_contact(data_reference["handle"], handle_server)
      File "/home/fsufitch/python-whois/pythonwhois/parse.py", line 981, in fetch_nic_contact
        response = net.get_whois_raw(handle, lookup_server)
      File "/home/fsufitch/python-whois/pythonwhois/net.py", line 42, in get_whois_raw
        response = whois_request(request_domain, target_server)
      File "/home/fsufitch/python-whois/pythonwhois/net.py", line 92, in whois_request
        return buff.decode("utf-8")
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdf in position 706: invalid continuation byte
    

    More IPs that break in this way:

    • 177.169.129.223
    • 80.148.135.28
    • 181.1.55.141
    • 91.6.97.224
    • 179.246.105.144
    • 94.92.58.77

    I ran into this while testing a separate program on completely random IPs. To reproduce:

    import ipaddress, random, traceback
    import pythonwhois
    
    randip = lambda: ipaddress.ip_address(random.randint(0,256**4-1)).compressed
    
    for i in range(50):
       ip = randip()
       try:
           pythonwhois.get_whois(ip)
       except UnicodeDecodeError as e:
           print('========', ip, '========')
           traceback.print_exc()
       except Exception as e:
           continue
    

    Thanks!

    bug help-wanted patch-pending encoding 
    opened by fsufitch 7
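The tracebacks in the issue above both end in `buff.decode("utf-8")`, which fails on responses in a legacy encoding. One common way to make such decoding tolerant is sketched below; it is an illustration of the technique, not the patch adopted by the project, and `decode_whois_response` is a hypothetical name.

```python
def decode_whois_response(buff, encodings=("utf-8", "latin-1")):
    """Decode raw response bytes, trying each encoding in turn.

    latin-1 maps every byte to a code point, so it never fails; it is a
    last resort and may mis-render text sent in other legacy encodings.
    """
    for encoding in encodings:
        try:
            return buff.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached when the caller supplied no catch-all encoding.
    return buff.decode("utf-8", errors="replace")
```

The trade-off is that a wrongly guessed encoding yields garbled characters instead of an exception, which is usually preferable for WHOIS data where most fields are plain ASCII.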
  • whois lookup for ASN

    I can use this module to look up ASNs, but I sometimes encounter errors: (1) AS4515 - No root WHOIS server found for domain (2) AS2706 - No root WHOIS server found for domain

    Please have a look, thanks.

    question 
    opened by wallyw1 6
  • added .de region specific support

    You seem to query whois.denic.de for the .de TLD. Unfortunately no data is received by your module, as it lacks the -T dn,ace -C US-ASCII string before the actual domain to query.

    You can easily check the behaviour through echo "-T dn,ace -C US-ASCII test.de" | nc whois.denic.de 43 and echo "test.de" | nc whois.denic.de 43

    Added domain test.de to test-cases as this very domain is unlikely to change over time due to the state-sponsored organisation behind.

    Test-cases are green and I guess there is no need to print the whole output there considered the case that this change is a quite trivial one. I do hope you agree :smile:

    opened by ckoepp 6
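The two `nc` commands in the pull request above translate to Python roughly as follows. This is a standalone sketch using only the standard library, not the code from the pull request; `build_query` and `whois_query` are hypothetical names, and the prefix is applied only to whois.denic.de as described there.

```python
import socket

# Option string that whois.denic.de expects before the domain, per the PR above.
DENIC_PREFIX = "-T dn,ace -C US-ASCII "


def build_query(domain, server):
    """Build the bytes sent to a WHOIS server for a domain lookup."""
    prefix = DENIC_PREFIX if server == "whois.denic.de" else ""
    # WHOIS queries are a single line terminated by CRLF.
    return (prefix + domain + "\r\n").encode("ascii")


def whois_query(domain, server, port=43, timeout=10):
    """Send the query and return the raw response bytes (needs network access)."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_query(domain, server))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

Keeping the query construction in its own function makes the server-specific quirk easy to test without opening a socket.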
  • default kwargs are evaluated at definition time

    After seeing some strange behavior while trying to write unittests for my script which uses python-whois, I thought I would go ahead and submit this patch. This PR addresses a comment made here:

    https://github.com/joepie91/python-whois/commit/d86e4ba9166e80b8c66b093b355f3c32e94c0229#commitcomment-6967148

    This stackoverflow question covers the topic pretty well: “Least Astonishment” in Python: The Mutable Default Argument

    From docs.python.org:

    Default parameter values are evaluated when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that the same “pre-computed” value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. A way around this is to use None as the default, and explicitly test for it in the body of the function

    https://docs.python.org/2/reference/compound_stmts.html#function-definitions

    pending-response 
    opened by stavxyz 5
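The behaviour quoted from the Python documentation above is easy to reproduce. The two functions below are generic illustrations with made-up names (`append_bad`, `append_good`); they are not taken from pythonwhois.

```python
def append_bad(item, bucket=[]):
    """The default list is created once, at definition time, and shared by all calls."""
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    """Use None as the default and create a fresh list inside the function."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Two successive calls to `append_bad` without a `bucket` argument return the same list object, so items from the first call leak into the second; `append_good` returns a new list each time.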
  • Python 3.x support?

    It does not appear python-whois has Python 3.x support. As 3.x grows more commonplace, this will become a serious obstacle to adoption.

    Some problems I ran into:

    ~/python-whois$ python3 pwhois
      File "pwhois", line 6
        except ImportError, e:
                          ^
    SyntaxError: invalid syntax
    
    
    ~/python-whois$ python3
    Python 3.4.0 (default, Apr 11 2014, 13:05:11) 
    [GCC 4.8.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from pythonwhois import get_whois
    >>> get_whois('google.com')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/fsufitch/python-whois/pythonwhois/__init__.py", line 4, in get_whois
        raw_data = net.get_whois_raw(domain)
      File "/home/fsufitch/python-whois/pythonwhois/net.py", line 15, in get_whois_raw
        domain = encode( domain if type(domain) is unicode else decode(domain, "utf8"),    "idna" )
    NameError: name 'unicode' is not defined
    

    There may be other problems too, but I do not have the time to conduct a code review of it right now.

    Thanks for looking into this! I really wanted to use pythonwhois but my project is unfortunately python3-only.

    bug 
    opened by fsufitch 5
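Both errors in the issue above come from Python 2-only constructs: the `except ImportError, e` syntax and the `unicode` builtin. A version-neutral way to write the same logic is sketched below. It illustrates the kind of change involved and is not the fix that shipped in 2.3.0; `text_type` and `to_idna` are hypothetical names.

```python
try:
    import json
except ImportError as e:  # "except ImportError, e" is a SyntaxError on Python 3
    json = None

try:
    text_type = unicode  # Python 2: the text type is called unicode
except NameError:
    text_type = str  # Python 3: str is already the text type


def to_idna(domain):
    """Return the IDNA (ASCII-compatible) encoding of a domain name as bytes."""
    if not isinstance(domain, text_type):
        # Byte strings are assumed to be UTF-8, as in the original line.
        domain = domain.decode("utf-8")
    return domain.encode("idna")
```

The `as` form of `except` works on Python 2.6 and later as well as on Python 3, so a single code base can support both.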
  • several domains not parsable

    The parser cannot parse the following whois domains:

    • .co.th like starbucks.co.th
    • .ir like nic.ir
    • .com like servequake.com
    • .com.mx like expopack.com.mx (already referred to issue #22 )

    Edit: I'll add non-working tlds with example domains to this post.

    bug 
    opened by ckoepp 5
  • another regex explosion

    This is similar to issue #2. The call takes ~400s to complete; if interrupted, the traceback ends somewhere in the re module.

    >>> import pythonwhois
    >>> pythonwhois.get_whois('drpciv.biz')
    ^CTraceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "[...]/lib/python2.7/site-packages/pythonwhois/__init__.py", line 5, in get_whois
        return parse.parse_raw_whois(raw_data, normalized=normalized)
      File "[...]/lib/python2.7/site-packages/pythonwhois/parse.py", line 255, in parse_raw_whois
        data["contacts"] = parse_registrants(raw_data)
      File "[...]/lib/python2.7/site-packages/pythonwhois/parse.py", line 639, in parse_registrants
        match = re.search(regex, segment)
      File "[...]/lib/python2.7/re.py", line 148, in search
        return _compile(pattern, flags).search(string)
    KeyboardInterrupt
    
    bug 
    opened by obormot 5
Owner
Sven Slootweg