A Pythonic wrapper for the Wikipedia API

Overview

Wikipedia

Wikipedia is a Python library that makes it easy to access and parse data from Wikipedia.

Search Wikipedia, get article summaries, get data like links and images from a page, and more. Wikipedia wraps the MediaWiki API so you can focus on using Wikipedia data, not getting it.

>>> import wikipedia
>>> print(wikipedia.summary("Wikipedia"))
# Wikipedia (/ˌwɪkɨˈpiːdiə/ or /ˌwɪkiˈpiːdiə/ WIK-i-PEE-dee-ə) is a collaboratively edited, multilingual, free Internet encyclopedia supported by the non-profit Wikimedia Foundation...

>>> wikipedia.search("Barack")
# [u'Barak (given name)', u'Barack Obama', u'Barack (brandy)', u'Presidency of Barack Obama', u'Family of Barack Obama', u'First inauguration of Barack Obama', u'Barack Obama presidential campaign, 2008', u'Barack Obama, Sr.', u'Barack Obama citizenship conspiracy theories', u'Presidential transition of Barack Obama']

>>> ny = wikipedia.page("New York")
>>> ny.title
# u'New York'
>>> ny.url
# u'http://en.wikipedia.org/wiki/New_York'
>>> ny.content
# u'New York is a state in the Northeastern region of the United States. New York is the 27th-most exten'...
>>> ny.links[0]
# u'1790 United States Census'

>>> wikipedia.set_lang("fr")
>>> wikipedia.summary("Facebook", sentences=1)
# Facebook est un service de réseautage social en ligne sur Internet permettant d'y publier des informations (photographies, liens, textes, etc.) en contrôlant leur visibilité par différentes catégories de personnes.

Note: this library was designed for ease of use and simplicity, not for advanced use. If you plan on doing serious scraping or automated requests, please use Pywikipediabot (or one of the other more advanced Python MediaWiki API wrappers), which has a larger API, rate limiting, and other features so we can be considerate of the MediaWiki infrastructure.

Installation

To install Wikipedia, simply run:

$ pip install wikipedia

Wikipedia is compatible with Python 2.6+ (2.7+ to run unittest discover) and Python 3.3+.

Documentation

Read the docs at https://wikipedia.readthedocs.org/en/latest/.

To run tests, clone the repository on GitHub, then run:

$ pip install -r requirements.txt
$ bash runtests  # will run tests for python and python3
$ python -m unittest discover tests/ '*test.py'  # manual style

in the root project directory.

To build the documentation yourself, after installing requirements.txt, run:

$ pip install sphinx
$ cd docs/
$ make html

License

MIT licensed. See the LICENSE file for full details.

Credits

  • wiki-api by @richardasaurus for inspiration
  • @nmoroze and @themichaelyang for feedback and suggestions
  • The Wikimedia Foundation for giving the world free access to data

Comments
  • When importing, PyCharm says "Unused import statement"

    Hello sir. I was trying to build a project where I need to import the wikipedia module. I installed it with "pip install wikipedia", then added the module to the existing list of modules in PyCharm's settings. But whenever I try to import it with "import wikipedia", the line gets faded with the message "unused import statement", and so I am unable to use the module in PyCharm. The statement works in Python IDLE, though. I tried troubleshooting via Google and Stack Overflow, but nothing solved my problem. Now looking for your help.

    Looking forward to hearing from you.

    opened by SM-Fahim 7
  • Add lang_links property to WikipediaPage

    This change adds a lang_links property method to WikipediaPage, which returns a list of links to versions of the Wikipedia page in different languages. The result is a list of (language_prefix, title) tuples.
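
    For context, a minimal sketch of what such a property could do against the MediaWiki langlinks endpoint (the standalone function here is illustrative, not the PR's actual code):

        import requests

        def lang_links(title, lang='en'):
            # Ask the MediaWiki API for the interlanguage links of a page.
            resp = requests.get(
                'https://%s.wikipedia.org/w/api.php' % lang,
                params={
                    'action': 'query',
                    'prop': 'langlinks',
                    'titles': title,
                    'lllimit': 'max',
                    'format': 'json',
                })
            page = next(iter(resp.json()['query']['pages'].values()))
            # Each entry pairs a language prefix with the page title under '*'.
            return [(ll['lang'], ll['*']) for ll in page.get('langlinks', [])]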

    opened by jichu4n 6
  • DisambiguationError: perhaps a better response?

    For example:

        wikipedia.summary('recommendation')
    

    gives the following error and terminates the program:

        290       may_refer_to = [li.a.get_text() for li in filtered_lis if li.a]
        291 
    --> 292       raise DisambiguationError(self.title, may_refer_to)
        293 
        294     else:
    
    DisambiguationError: "Recommendation" may refer to: 
    norm (philosophy)
    Recommender systems
    European Union recommendation
    W3C recommendation
    letter of recommendation
    

    Perhaps a better response, maybe in JSON format, would help? One that doesn't terminate the program, and instead asks for further clarification?
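
    In the meantime, a minimal client-side sketch: catch the exception and use its options attribute to prompt for clarification instead of crashing:

        import wikipedia
        from wikipedia.exceptions import DisambiguationError

        try:
            print(wikipedia.summary('recommendation'))
        except DisambiguationError as e:
            # e.options holds the candidate titles from the disambiguation page.
            print('Did you mean one of: %s?' % ', '.join(e.options))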

    opened by arcolife 6
  • add - catch page errors when 'fullurl' is missing.

    When requesting some pages via WikipediaPage():

      File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 224, in __init__
        self.load(redirect=redirect, preload=preload)
      File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 276, in load
        self.__init__(title, redirect=redirect, preload=preload)
      File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 224, in __init__
        self.load(redirect=redirect, preload=preload)
      File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 296, in load
        self.url = data['fullurl']
    KeyError: 'fullurl'
    

    I think it may be due to case and whitespace in the article title string? In any case, raising a PageError here may be more appropriate to pass back to the client.
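
    Until the library does that, a stopgap sketch on the client side that converts the KeyError into the PageError the report asks for:

        import wikipedia
        from wikipedia.exceptions import PageError

        def safe_page(title):
            try:
                return wikipedia.page(title)
            except KeyError:
                # Surface the missing-'fullurl' response as a page lookup failure.
                raise PageError(title)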

    opened by rfaulkner 5
  • Empty 'extract' in Wikipedia response causes 'TypeError: list indices must be integers, not str'.

    >>> import wikipedia
    >>> wikipedia.page('Fully connected network', auto_suggest=False, redirect=True)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 211, in page
        return WikipediaPage(title, redirect=redirect, preload=preload)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 224, in __init__
        self.load(redirect=redirect, preload=preload)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 276, in load
        self.__init__(title, redirect=redirect, preload=preload)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 224, in __init__
        self.load(redirect=redirect, preload=preload)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 250, in load
        pages = request['query']['pages']
    TypeError: list indices must be integers, not str
    
    opened by dmirylenka 5
  • Adding revid and parentid to the request.

    Hi. This is my first pull request, so please bear with me. I like your library, but I need revids. This breaks the tests, so I don't expect you to apply it. I don't see how to fix the tests automatically. Do you have something you run that regenerates the request_mock_data file, or do I just grub around in it?

    Thanks for your time,

    opened by fusiongyro 5
  • KeyError thrown when HTTP request times out (wikipedia.search)

    Calling a series of searches like wikipedia.search(item, results=1) occasionally results in the error:

    File "/Library/Python/2.7/site-packages/wikipedia/wikipedia.py", line 47, in search
        search_results = (d['title'] for d in raw_results['query']['search'])
    KeyError: 'query'
    

    This is because the raw_results dict looks like this: {u'servedby': u'mw1118', u'error': {u'info': u'HTTP request timed out.', u'code': u'srsearch-error'}}

    Maybe it would be better if some sort of HTTP exception was thrown instead of the key error? That is, if I'm understanding what's happening correctly.
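
    Until then, a minimal retry sketch that treats the KeyError described above as a transient timeout:

        import time
        import wikipedia

        def search_with_retry(query, results=1, attempts=3):
            for attempt in range(attempts):
                try:
                    return wikipedia.search(query, results=results)
                except KeyError:
                    # The API answered with an error payload instead of
                    # search results; back off and try again.
                    time.sleep(2 ** attempt)
            raise RuntimeError('search kept timing out for %r' % query)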

    opened by mobeets 5
  • beautifulsoup4 import error in requirements.txt

    Awesome work on this @goldsmith ! Using it at a hackathon atm.

    Seems to be a typo in requirements.txt of version 0.9.4 on PyPI; this is what I get at the moment:

    beautifulsoup4==4.3.11
    nose==1.3.0
    requests==1.2.3
    wsgiref==0.1.2
    

    The latest version of https://github.com/goldsmith/Wikipedia/blob/master/requirements.txt has beautifulsoup4, perhaps just need a new upload to PyPi?

    bug 
    opened by CalumJEadie 5
  • ModuleNotFoundError: No module named 'wiki'

    Hello guys, I'm trying to get some SNPedia data. The SNPedia API's documentation page says:

    [screenshot of the SNPedia API documentation]

    So, I started a Google Colab and a Jupyter Notebook on my local machine, and I got this error in both cases:

    [screenshots]

    What did I do wrong? What's the problem with the wiki module?

    opened by edgui-appolonicorreia 4
  • Added validity check to wikipedia.set_lang(). Also ran project through autopep8 to fix the indentation

    The indentation was against PEP 8, so I ran it through autopep8 so my editor wouldn't spout errors for all the 2-space indents. The full command was autopep8 ./ --recursive --in-place --pep8-passes 2000 --verbose

    opened by c-ameron 4
  • ImportError on installing from pip in virtualenv

    If I create a new virtualenv and do a pip install wikipedia, it gives:

    ImportError: No module named requests
    

    This is due to 'import wikipedia' in the setup.py file. I think this shouldn't be present in setup.py (since the requirements haven't finished installing at that point). Also, could the requests version be updated to 2.3.0?
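
    A common pattern that avoids the import, sketched here on the assumption that wikipedia/__init__.py defines a __version__ string:

        # in setup.py, instead of 'import wikipedia':
        import re

        with open('wikipedia/__init__.py') as f:
            version = re.search(
                r"__version__\s*=\s*['\"]([^'\"]+)['\"]", f.read()).group(1)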

    opened by arcolife 4
  • Title resolution leads to internal error

    Whenever I run the line wikipedia.page("New York City"), the API returns an error indicating there is no page for 'new york sity'. The error occurs on line 270 of wikipedia.py, results, suggestion = search(title, results=1, suggestion=True), where title='new york city' resolves to 'new york sity' for whatever reason.
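
    A workaround while the suggestion logic misbehaves, assuming the exact title is known, is to disable auto-suggestion so the given title is used verbatim (this raises PageError instead if no such page exists):

        >>> import wikipedia
        >>> ny = wikipedia.page("New York City", auto_suggest=False)
        >>> ny.title
        # u'New York City'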

    opened by grossmanm 0
  • get side panel info

    I am looking for a way to get information from the side panel of a Wikipedia page, for example: https://en.wikipedia.org/wiki/Netflix (the panel on the right).

    I was trying using this:

    import wikipedia
    xx = wikipedia.page("Netflix")
    print(xx.title)
    print(xx.content)
    

    It does provide the whole page content, but not the side panel.

    I was trying to avoid the BeautifulSoup package, but I am not sure if that's possible. Any ideas?
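
    Avoiding BeautifulSoup entirely is hard, because the infobox only exists in the rendered HTML. A sketch using the HTML the library already exposes (the 'infobox' table class is a Wikipedia markup convention and may change):

        import wikipedia
        from bs4 import BeautifulSoup

        page = wikipedia.page("Netflix")
        soup = BeautifulSoup(page.html(), 'html.parser')
        infobox = soup.find('table', class_='infobox')
        for row in infobox.find_all('tr'):
            th, td = row.find('th'), row.find('td')
            if th and td:
                # Each infobox row pairs a label cell with a value cell.
                print('%s: %s' % (th.get_text(' ', strip=True),
                                  td.get_text(' ', strip=True)))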

    opened by kasiahewelt 0
  • wikipedia api search

    This is my code:

    import wikipedia
    wikipedia.summary("theresa may")
    

    but for some reason it ran into an error, saying that I am giving "teresa may" as input???

    PageError: Page id "teresa may" does not match any pages. Try another id!
    

    What is going on here?

    opened by Felikesw 1
  • Return exact page title match instead of suggestion if it exists.

    Fixes #279 and related.

    There are already at least four proposed fixes to this bug ( #305 #297 #287 #253 ), but none of them quite succeeds, so here's a fifth attempt. Of course, considering the maintainer hasn't been seen since 2018, I doubt there's going to be any movement on any of them.

    opened by wlerin 0
  • Search terms redirecting to unrelated pages

    Searching for "bass" searches for "band."

    >>> import wikipedia as w
    >>> w.page("bass")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py", line 276, in page
        return WikipediaPage(title, redirect=redirect, preload=preload)
      File "C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py", line 299, in __init__
        self.__load(redirect=redirect, preload=preload)
      File "C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py", line 367, in __load
        self.__init__(redirects['to'], redirect=redirect, preload=preload)
      File "C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py", line 299, in __init__
        self.__load(redirect=redirect, preload=preload)
      File "C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py", line 393, in __load
        raise DisambiguationError(getattr(self, 'title', page['title']), may_refer_to)
    wikipedia.exceptions.DisambiguationError: "Band" may refer to:
    Bánd
    Band, Iran
    Band, Mureș
    Band-e Majid Khan
    Band (surname)
    Musical ensemble
    Band (rock and pop)
    Concert band
    Dansband
    Jazz band
    Marching band
    ...
    

    Searching for "band" appears to search for "and".

    >>> import wikipedia as w
    >>> w.page("band")
    C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py:389: GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
    
    The code that caused this warning is on line 389 of the file C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.
    
      lis = BeautifulSoup(html).find_all('li')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py", line 276, in page
        return WikipediaPage(title, redirect=redirect, preload=preload)
      File "C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py", line 299, in __init__
        self.__load(redirect=redirect, preload=preload)
      File "C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py", line 393, in __load
        raise DisambiguationError(getattr(self, 'title', page['title']), may_refer_to)
    wikipedia.exceptions.DisambiguationError: "and" may refer to:
    Conjunction (grammar)
    Logical conjunction
    Bitwise AND
    short-circuit operator
    Ampersand
    AND gate
    And (John Martyn album)
    And (Koda Kumi album)
    A N D (Tricot album)
    Jonah Matranga
    ...
    
    opened by DickieTheProgrammer 1