Python HTTP Agent Parser

Overview

Features

  • Fast
  • Detects OS and browser. Does not aim to be a full-featured agent parser
  • Will not turn into django-httpagentparser ;)

Usage

>>> import httpagentparser
>>> s = "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/532.9 (KHTML, like Gecko) \
        Chrome/5.0.307.11 Safari/532.9"
>>> print(httpagentparser.simple_detect(s))
('Linux', 'Chrome 5.0.307.11')
>>> print(httpagentparser.detect(s))
{'os': {'name': 'Linux'},
 'browser': {'version': '5.0.307.11', 'name': 'Chrome'}}

>>> s = "Mozilla/5.0 (Linux; U; Android 2.3.5; en-in; HTC_DesireS_S510e Build/GRJ90) \
        AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1"
>>> print(httpagentparser.simple_detect(s))
('Android Linux 2.3.5', 'Safari 4.0')
>>> print(httpagentparser.detect(s))
{'dist': {'version': '2.3.5', 'name': 'Android'},
'os': {'name': 'Linux'},
'browser': {'version': '4.0', 'name': 'Safari'}}
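
The dictionaries printed by detect() in the session above nest a name and an optional version under each key. A minimal sketch of flattening that shape into short strings, assuming only the structure shown above; the helper name `flatten_detection` is hypothetical and not part of httpagentparser:

```python
def flatten_detection(result):
    """Turn a detect()-style dict into {'os': 'Linux', 'browser': 'Safari 4.0', ...}."""
    flat = {}
    for key, info in result.items():
        # Skip anything that is not a {'name': ..., 'version': ...} mapping.
        if not isinstance(info, dict):
            continue
        name = info.get("name")
        if not name:
            continue
        version = info.get("version")
        flat[key] = "{} {}".format(name, version) if version else name
    return flat


# Sample copied from the Android output shown above.
sample = {
    'dist': {'version': '2.3.5', 'name': 'Android'},
    'os': {'name': 'Linux'},
    'browser': {'version': '4.0', 'name': 'Safari'},
}
flat = flatten_detection(sample)
print(flat)
```

The helper works on plain dictionaries, so it can be tried without installing the library.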

History

http://stackoverflow.com/questions/927552/parsing-http-user-agent-string/1151956#1151956

Comments
  • Support bots and phones

    I spent a good deal of time adding bots and phones until even I could not tell what that agent string meant. What is left is an unintelligible mess.

    With that said, this patch supports every bot and feature phone that provided some useful info and occurred more than a few times in a day's worth of logs on a busy website.

    One possible improvement would be to assign types to browsers, so I can easily tell the bots from the phones from the desktops.

    opened by pepijndevos 14
  • Support Internet Explorer Compatibility mode versions?

    Is there interest in updating support for IE Compatibility modes? Currently all compatibility mode user agent strings are reported as Internet Explorer 7.0.

    Sample user agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET4.0C; .NET4.0E; InfoPath.3; MS-RTC LM 8)

    Current detection: {'os': {'version': '7', 'name': 'Windows'}, 'browser': {'version': '7.0', 'name': 'Microsoft Internet Explorer'}}

    Proposed detection: {'os': {'version': '7', 'name': 'Windows'}, 'browser': {'version': '8.0 Compatibility View', 'name': 'Microsoft Internet Explorer'}}
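
One way the proposed detection could work is to read the Trident token alongside the MSIE token. A minimal sketch, assuming the usual pairing of Trident/4.0 with IE 8, 5.0 with IE 9, 6.0 with IE 10 and 7.0 with IE 11; the function name `ie_version` and the lookup table are illustrative and not part of httpagentparser:

```python
import re

# Assumed mapping: Trident engine version -> IE release that shipped it.
TRIDENT_TO_IE = {"4.0": "8.0", "5.0": "9.0", "6.0": "10.0", "7.0": "11.0"}


def ie_version(agent):
    msie = re.search(r"MSIE (\d+\.\d+)", agent)
    if not msie:
        return None
    reported = msie.group(1)
    trident = re.search(r"Trident/(\d+\.\d+)", agent)
    real = TRIDENT_TO_IE.get(trident.group(1)) if trident else None
    # Compatibility View reports MSIE 7.0 while the Trident token
    # still reveals the engine of the browser actually in use.
    if real and reported == "7.0" and real != reported:
        return real + " Compatibility View"
    return reported


sample = ("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; SLCC2; "
          ".NET CLR 2.0.50727; .NET4.0C; .NET4.0E; InfoPath.3; MS-RTC LM 8)")
print(ie_version(sample))
```

A genuine IE 7 sends no Trident token, so it would still be reported as 7.0 under this sketch.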

    opened by newportt 8
  • Fix RuntimeError: dictionary changed size during iteration

    Description

    This attempts to address situations where globals() changed during iteration.

        import httpagentparser
      File "/.virtualenv/lib/python3.6/site-packages/httpagentparser/__init__.py", line 647, in <module>
        detectorshub = DetectorsHub()
      File "/.virtualenv/lib/python3.6/site-packages/httpagentparser/__init__.py", line 21, in __init__
        self.registerDetectors()
      File "/.virtualenv/lib/python3.6/site-packages/httpagentparser/__init__.py", line 34, in registerDetectors
        detectors = [v() for v in globals().values() if DetectorBase in getattr(v, '__mro__', [])]
      File "/.virtualenv/lib/python3.6/site-packages/httpagentparser/__init__.py", line 34, in <listcomp>
        detectors = [v() for v in globals().values() if DetectorBase in getattr(v, '__mro__', [])]
    RuntimeError: dictionary changed size during iteration
    

    It occurred during start up of a multi-threaded Flask app.

    Approach

    Iterate a list copy of globals().values() instead.
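
The approach can be illustrated on a plain dictionary. This is a sketch, not the patch itself: `DetectorBase` here is a minimal stand-in, and `ExampleDetector` and `find_detectors` are hypothetical names used only for the demonstration.

```python
class DetectorBase(object):
    """Minimal stand-in for the library's base class."""


class ExampleDetector(DetectorBase):
    """Hypothetical detector used only for this illustration."""


def find_detectors(namespace):
    # list(...) snapshots the values first, so keys added to the dict
    # while we loop cannot invalidate the iteration.
    snapshot = list(namespace.values())
    return [v() for v in snapshot if DetectorBase in getattr(v, "__mro__", [])]


# Reproduce the failure: growing a dict while iterating its live view raises.
namespace = {"ExampleDetector": ExampleDetector}
try:
    for value in namespace.values():
        namespace["added_later"] = object()
    raised = False
except RuntimeError:
    raised = True

# Iterating a snapshot tolerates the same kind of mutation.
for value in list(namespace.values()):
    namespace["added_even_later"] = object()

detectors = find_detectors(namespace)
print(raised, detectors)
```

In the real module the dictionary is globals(), which another thread may still be populating during import; the snapshot only narrows that window and is not a substitute for proper synchronisation.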

    opened by joemeister 7
  • Include spiders/bots

    It would be nice to add a new type of "browser" for bots/spiders like this:

    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

    opened by reiven 7
  • Opera Mobile support

    Added support for Opera Mobile on Android devices. Had to fix the incorrect assumption that if the browser is not "Mobile Safari", the device is a tablet. With alternative browsers the name will be Opera, Dolphin, Firefox, ... but nothing can be concluded about whether the device is a phone or a tablet.

    opened by patrickdessalle 7
  • pip install returns an error

    Both pip install and easy_install fail when called in a virtualenv created with -p python3.

    I checked http://guide.python-distribute.org/creation.html and your setup is OK. I checked your MANIFEST.in: OK. I checked for the presence of README.rst: it is OK. I checked the docs: you play by the rules.

    I am clueless. However, removing line 7 in setup.py and running python /home/jul/src/parse3/build/httpagentparser/setup.py install works.

        Downloading/unpacking httpagentparser
          Real name of requirement httpagentparser is httpagentparser
          Downloading httpagentparser-1.1.0.tar.gz
          Running setup.py egg_info for package httpagentparser
            Traceback (most recent call last):
              File "", line 14, in
              File "/home/jul/src/parse3/build/httpagentparser/setup.py", line 7, in
                long_description=file("README.rst").read(),
            NameError: name 'file' is not defined
            Complete output from command python setup.py egg_info:
            Traceback (most recent call last):
              File "", line 14, in
              File "/home/jul/src/parse3/build/httpagentparser/setup.py", line 7, in
                long_description=file("README.rst").read(),
            NameError: name 'file' is not defined
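
The NameError arises because the file() builtin exists only on Python 2. A minimal sketch of reading the long description in a way that runs on both Python 2 and 3; the helper name `read_long_description` and the throwaway README are assumptions made for the demonstration, not the project's actual setup.py:

```python
import io
import os
import tempfile


def read_long_description(path):
    # io.open() behaves the same on Python 2 and 3; file() is Python 2 only.
    with io.open(path, encoding="utf-8") as handle:
        return handle.read()


# Demonstrate with a temporary README so the snippet runs anywhere.
tmp_dir = tempfile.mkdtemp()
readme = os.path.join(tmp_dir, "README.rst")
with io.open(readme, "w", encoding="utf-8") as handle:
    handle.write(u"Python HTTP Agent Parser\n")

long_description = read_long_description(readme)
print(long_description)
```

In a setup.py the returned string would be passed as long_description=... in place of the file("README.rst").read() call.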

    opened by jul 6
  • MSIE detection incorrect when multiple "MSIE X.Y;" sections present in UA

    For example, this UA spotted in the wild:

    Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; SLCC1; .NET CLR 2.0.50727; InfoPath.1; .NET CLR 3.5.30729; .NET CLR 3.0.30618; .NET4.0C)
    

    Detects as:

    {'os': {'version': 'XP', 'name': 'Windows'}, 'browser': {'version': '6.0', 'name': 'Microsoft Internet Explorer'}}
    

    When it's actually IE8.

    I traced the problem to the default getVersion implementation, which favours the last occurrence of MSIE in the version string.

    I wrote a new one on the MSIE class, which favours the first one instead:

    class MSIE(Browser):
        look_for = "MSIE" 
        skip_if_found = ["Opera"]
        name = "Microsoft Internet Explorer"
        version_splitters = [" ", ";"]
        def getVersion(self, agent):
            parts = agent.split(self.look_for + self.version_splitters[0])
            return parts[min(len(parts) - 1, 1)].split(self.version_splitters[1])[0].strip()
    

    Mine grabs the FIRST version detected rather than the last one.

    I haven't provided this as a pull request, because the tests for MSIE are non-existent currently, so I have no idea if my change breaks a bunch of UAs in the wild.

    I added a test case for this UA at least:

        ('Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; SLCC1; .NET CLR 2.0.50727; InfoPath.1; .NET CLR 3.5.30729; .NET CLR 3.0.30618; .NET4.0C)',
        ('Windows XP', 'Microsoft Internet Explorer 8.0'),
        {'os': {'version': 'XP', 'name': 'Windows'}, 'browser': {'version': '8.0', 'name': 'Microsoft Internet Explorer'}},),
    

    I'd suggest adding a bunch of MSIE test cases first.
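
The first-versus-last difference described in this report can be checked in isolation, without the library. In the sketch below `first_version` mirrors the getVersion shown above, while `last_version` is only an assumed stand-in for a default that favours the last occurrence, not the library's actual code:

```python
def last_version(agent, look_for="MSIE", splitters=(" ", ";")):
    # Assumed stand-in for the default behaviour: take the text after
    # the LAST occurrence of "MSIE ".
    parts = agent.split(look_for + splitters[0])
    return parts[-1].split(splitters[1])[0].strip()


def first_version(agent, look_for="MSIE", splitters=(" ", ";")):
    # Same logic as the proposed getVersion: take the text after
    # the FIRST occurrence of "MSIE ".
    parts = agent.split(look_for + splitters[0])
    return parts[min(len(parts) - 1, 1)].split(splitters[1])[0].strip()


ua = ("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; "
      "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; SLCC1; "
      ".NET CLR 2.0.50727; InfoPath.1; .NET CLR 3.5.30729; "
      ".NET CLR 3.0.30618; .NET4.0C)")
print(last_version(ua), first_version(ua))
```

For a user agent with a single MSIE token both functions return the same value, which is why the difference only shows up on nested strings like this one.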

    opened by nigelmcnie 5
  • support for Opera version

    looks like for the Opera useragent Opera/9.80 (Windows NT 6.1; U; Edition Next; en) Presto/2.10.229 Version/12.00 we're getting Opera v9.80 reported, not Version 12... not sure if it's an Opera or httpagentparser issue...
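
Newer Opera releases keep "Opera/9.80" at the front of the string and carry the real release in a trailing Version/ token, so one possible fix is to prefer that token when it is present. A minimal sketch under that assumption; `opera_version` is a hypothetical helper, not httpagentparser's implementation:

```python
import re


def opera_version(agent):
    if "Opera" not in agent:
        return None
    # Prefer the trailing Version/x.y token when the agent provides one.
    match = re.search(r"Version/(\d+(?:\.\d+)*)", agent)
    if match:
        return match.group(1)
    # Otherwise fall back to the number directly after "Opera/" or "Opera ".
    match = re.search(r"Opera[/ ](\d+(?:\.\d+)*)", agent)
    return match.group(1) if match else None


new_style = ("Opera/9.80 (Windows NT 6.1; U; Edition Next; en) "
             "Presto/2.10.229 Version/12.00")
print(opera_version(new_style))
```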

    opened by Offbeatmammal 5
  • Failing user agents (Opera 7, 8 and Netscape 8)

    These user agents parse fine:

        Opera/9.20 (Windows NT 6.0; U; en), Opera 9, Windows Vista
        Opera/9.00 (Windows NT 5.1; U; en), Opera 9, Windows XP

    These fail to parse the version correctly, returning something like "Opera Mozilla/4.0" for browser:

        Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50
        Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.0
        Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.1) Opera 7.02 [en]

    This one fails to detect the browser altogether:

        Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20060127 Netscape/8.1

    opened by jaraco 5
  • Improper parsing for Windows NT

    Hei Shon,

    First of all, thanks for sharing this! I was looking for an agent parser and found your lib on python.org.

    I don't know if you are aware of this or not, but I was running some tests and found some inconsistent parsing on the following user-agent:

        import httpagentparser
        s = "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.60 Safari/534.24"
        print(httpagentparser.detect(s))
        {'os': {'version': 'NT 5.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.60 Safari/534.24', 'name': 'Windows'}, 'browser': {'version': '11.0.696.60', 'name': 'Chrome'}}

    My computer is running Windows XP SP3, using Chrome 11.0.696.60. All other tests (Linux and Mac) went ok, just so you know.

    I took the liberty of writing a patch for this, so I will send you a pull request later on.
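
For illustration only (this is not the patch mentioned above), the runaway version string can be avoided by matching just the numeric part after "Windows NT". The sketch assumes the NT-to-name pairs that appear elsewhere on this page (5.1 is XP, 6.0 is Vista, 6.1 is 7); `windows_version` is a hypothetical helper:

```python
import re

# Pairs taken from outputs quoted on this page; anything else stays "NT x.y".
NT_TO_NAME = {"5.1": "XP", "6.0": "Vista", "6.1": "7"}


def windows_version(agent):
    # Stop at the end of the number instead of running to the end of the string.
    match = re.search(r"Windows NT (\d+\.\d+)", agent)
    if not match:
        return None
    nt = match.group(1)
    return NT_TO_NAME.get(nt, "NT " + nt)


s = ("Mozilla/5.0 (Windows NT 5.1) AppleWebKit/534.24 (KHTML, like Gecko) "
     "Chrome/11.0.696.60 Safari/534.24")
print(windows_version(s))
```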

    Thanks,

    opened by brunobraga 5
  • iPad and Android phone/tablet detection

    Added detection for iPad and Android phone/tablet.

    • iPad: Simply copy-pasted the iPhone detection and changed the string. Looks like it works.
    • Android: Added "Mobile Safari" string check, as described here: http://stackoverflow.com/questions/5341637/how-do-detect-android-tablets-in-general-useragent
    opened by srs81 4
  • MSOffice user agent not detected

    Hi,

    When I try it with this user agent:

        Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; Microsoft Outlook 16.0.5257; ms-office; MSOffice 16)

    instead of the correct output, MS Office 16 (the user agent sent when Outlook loads an external image), I get the result Microsoft Internet Explorer 7.0.

    Do you know why?

    opened by thib-d 0
  • Tag the source

    It would be very helpful if you could tag releases as well. This would enable distributions that want to run your tests to fetch the package from GitHub instead of PyPI, where the tests are excluded.

    Thanks

    opened by fabaff 0
  • several bots not recognized

    I manually looked through some logs, fed the entries into httpagentparser, and found several wrongly identified user agents, including high-profile ones such as Googlebot:

    >>> httpagentparser.detect("Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)")
    {'platform': {'name': None, 'version': None}}
    
    >>> httpagentparser.detect("Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
    {'platform': {'name': 'Android', 'version': '6.0.1'}, 'os': {'name': 'Linux'}, 'bot': False, 'dist': {'name': 'Android', 'version': '6.0.1'}, 'browser': {'name': 'Chrome', 'version': '41.0.2272.96'}}
    
    >>> httpagentparser.detect("Mozilla/5.0 (X11; U; Linux Core i7-4980HQ; de; rv:32.0; compatible; JobboerseBot; http://www.jobboerse.com/bot.htm) Gecko/20100101 Firefox/38.0")
    {'platform': {'name': 'Linux', 'version': None}, 'os': {'name': 'Linux'}, 'bot': False, 'browser': {'name': 'Firefox', 'version': '38.0'}}
    
    >>> httpagentparser.detect("crawler_eb_germany")
    {'platform': {'name': None, 'version': None}}
    
    >>> httpagentparser.detect("Mozilla/5.0 (compatible; Qwantify/Bleriot/1.1; +https://help.qwant.com/bot)")
    {'platform': {'name': None, 'version': None}}
    
    >>> httpagentparser.detect("Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html)")
    {'platform': {'name': None, 'version': None}}
    
    >>> httpagentparser.detect("Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)")
    {'platform': {'name': None, 'version': None}}
    
    >>> httpagentparser.detect("ZoominfoBot (zoominfobot at zoominfo dot com)")
    {'platform': {'name': None, 'version': None}}
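
A rough heuristic would already flag every agent listed above. This is a sketch under the assumption that a case-insensitive substring test is acceptable; `looks_like_bot` is a hypothetical helper, not part of httpagentparser, and substring matching can produce false positives on other agents:

```python
import re

# Assumed hint words; real bots do not always announce themselves this way.
BOT_HINT = re.compile(r"bot|spider|crawler", re.IGNORECASE)


def looks_like_bot(agent):
    return bool(BOT_HINT.search(agent))


reported = [
    "Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)",
    "crawler_eb_germany",
    "Mozilla/5.0 (compatible; Qwantify/Bleriot/1.1; +https://help.qwant.com/bot)",
    "Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html)",
    "ZoominfoBot (zoominfobot at zoominfo dot com)",
]
flags = [looks_like_bot(agent) for agent in reported]
print(flags)
```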
    
    
    opened by FlorianLudwig 0
  • using regex to match detector classes directly as opposed to iterating

    I am trying to build a regex by joining all the "look_for" strings. Example:

    If we join every "look_for" of the "os" known_type:

    "Windows Phone OS|Windows Phone|Symbian|SymbianOS|CrOS|PlayStation|PLAYSTATION|Windows|Series40|iPhone|iPad|BlackBerry|Macintosh|Linux"

    Then with this regex we can jump straight to the respective "detector" class on which we have to run detect, as opposed to iterating.
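
The dispatch idea can be sketched as follows. This is an illustration under assumptions, not the code from this proposal: the detector classes are minimal stand-ins that carry only a `look_for` attribute, and `pick_detector` is a hypothetical name.

```python
import re


class Detector(object):
    look_for = None


class WindowsPhone(Detector):
    look_for = "Windows Phone"


class Windows(Detector):
    look_for = "Windows"


class Android(Detector):
    look_for = "Android"


class Linux(Detector):
    look_for = "Linux"


# Map each look_for string to its detector class.
DETECTORS = dict((cls.look_for, cls) for cls in (WindowsPhone, Windows, Android, Linux))

# Longest alternatives first, so "Windows Phone" wins over "Windows"
# when both could match at the same position.
PATTERN = re.compile("|".join(
    re.escape(key) for key in sorted(DETECTORS, key=len, reverse=True)))


def pick_detector(agent):
    # One regex search replaces the loop over every detector.
    match = PATTERN.search(agent)
    return DETECTORS[match.group(0)] if match else None


ua = ("Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; NOKIA; Lumia 920) "
      "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 "
      "Mobile Safari/537.36 Edge/13.10586")
print(pick_detector(ua).__name__)
```

A single search returns the leftmost match in the string, so the order of tokens inside the user agent decides which detector is picked; an agent that starts with "Linux; U; Android" would select Linux here.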

    Some profiling. With changes:

    0.00596690177917 // import httpagentparser 
    
    0.000109910964966 // httpagentparser.detect(ua)
             13214 function calls (13037 primitive calls) in 0.006 seconds
    

    Without changes:

    0.00134301185608  // import httpagentparser 
    
    0.000181913375854  // httpagentparser.detect(ua)
             693 function calls in 0.002 seconds
    

    So for a single call I am actually making it slower overall, because the import takes longer.

    But if you run detect in a loop of 100000 iterations, with changes:

    0.00691199302673 // import 
    2.00802302361 // loop on httpagentparser.detect(ua)
             3513180 function calls (3513003 primitive calls) in 2.015 seconds
    

    Without changes:

    0.00165581703186 // import 
    6.87254285812  //loop on httpagentparser.detect(ua)
             22500469 function calls in 6.874 seconds
    

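    There is a clear difference in this case: about 2.0 seconds with the changes versus about 6.9 seconds without.

    This was an idea I wanted to try out. The code is messy, but I can clean it up a bit if you see potential in it. I agree, though, that regex is always a little tricky and error-prone.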

    opened by rajatsingla 2
  • Nokia Lumia 920's OS is recognised as Android

    Here is the user agent:

    Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; NOKIA; Lumia 920) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Mobile Safari/537.36 Edge/13.10586
    
    opened by lipis 1
Owner
Shekhar