A Pythonic wrapper for the Wikipedia API

Jonathan Goldsmith

Last update: Dec 28, 2022

Related tags

Third-party APIs Wrappers Wikipedia

Overview

Wikipedia

Wikipedia is a Python library that makes it easy to access and parse data from Wikipedia.

Search Wikipedia, get article summaries, get data like links and images from a page, and more. Wikipedia wraps the MediaWiki API so you can focus on using Wikipedia data, not getting it.

>>> import wikipedia
>>> print(wikipedia.summary("Wikipedia"))
# Wikipedia (/ˌwɪkɨˈpiːdiə/ or /ˌwɪkiˈpiːdiə/ WIK-i-PEE-dee-ə) is a collaboratively edited, multilingual, free Internet encyclopedia supported by the non-profit Wikimedia Foundation...

>>> wikipedia.search("Barack")
# [u'Barak (given name)', u'Barack Obama', u'Barack (brandy)', u'Presidency of Barack Obama', u'Family of Barack Obama', u'First inauguration of Barack Obama', u'Barack Obama presidential campaign, 2008', u'Barack Obama, Sr.', u'Barack Obama citizenship conspiracy theories', u'Presidential transition of Barack Obama']

>>> ny = wikipedia.page("New York")
>>> ny.title
# u'New York'
>>> ny.url
# u'http://en.wikipedia.org/wiki/New_York'
>>> ny.content
# u'New York is a state in the Northeastern region of the United States. New York is the 27th-most exten'...
>>> ny.links[0]
# u'1790 United States Census'

>>> wikipedia.set_lang("fr")
>>> wikipedia.summary("Facebook", sentences=1)
# Facebook est un service de réseautage social en ligne sur Internet permettant d'y publier des informations (photographies, liens, textes, etc.) en contrôlant leur visibilité par différentes catégories de personnes.

Note: this library was designed for ease of use and simplicity, not for advanced use. If you plan on doing serious scraping or automated requests, please use Pywikipediabot (or one of the other more advanced Python MediaWiki API wrappers), which has a larger API, rate limiting, and other features so we can be considerate of the MediaWiki infrastructure.
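
For light scripted use, one way to stay considerate is to space out calls. The sketch below is a generic throttle wrapper written for illustration only; `throttled`, `min_wait`, `clock`, and `sleep` are hypothetical names and are not part of this library.

```python
import time


def throttled(fn, min_wait=1.0, clock=time.monotonic, sleep=time.sleep):
    """Wrap `fn` so consecutive calls are at least `min_wait` seconds apart."""
    last_call = [None]  # boxed so the inner function can update it

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            # Time still owed before the next call is allowed.
            remaining = min_wait - (clock() - last_call[0])
            if remaining > 0:
                sleep(remaining)
        last_call[0] = clock()
        return fn(*args, **kwargs)

    return wrapper
```

Assuming the package is installed, `polite_summary = throttled(wikipedia.summary, min_wait=1.0)` would wait at least one second between consecutive calls. The library's own documentation also lists a `set_rate_limiting` function, which is probably the better choice where it is available.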

Installation

To install Wikipedia, simply run:

$ pip install wikipedia

Wikipedia is compatible with Python 2.6+ (2.7+ to run unittest discover) and Python 3.3+.

Documentation

Read the docs at https://wikipedia.readthedocs.org/en/latest/.

To run tests, clone the repository on GitHub, then run:

$ pip install -r requirements.txt
$ bash runtests  # will run tests for python and python3
$ python -m unittest discover tests/ '*test.py'  # manual style

in the root project directory.

To build the documentation yourself, after installing requirements.txt, run:

$ pip install sphinx
$ cd docs/
$ make html

License

MIT licensed. See the LICENSE file for full details.

Credits

wiki-api by @richardasaurus for inspiration
@nmoroze and @themichaelyang for feedback and suggestions
The Wikimedia Foundation for giving the world free access to data

Comments

When importing, PyCharm says "Unused import statement"

Hello sir. I am trying to build a project where I need to import the wikipedia module. I installed it with "pip install wikipedia". Then I checked the settings in PyCharm and added the module to the existing list of modules. But whenever I try to import it with "import wikipedia", the line is greyed out with the message "unused import statement", and so I am unable to import the module in PyCharm. The statement works in Python's IDLE. I tried troubleshooting steps from Google and Stack Overflow, but nothing solved my problem, so I am now looking for your help.

Looking forward to hearing from you.

opened by SM-Fahim 7

Add lang_links property to WikipediaPage

This change adds a lang_links property method to WikipediaPage, which returns a list of links to versions of the Wikipedia page in different languages. The result is a list of (language_prefix, title) tuples.

opened by jichu4n 6

DisambiguationError: perhaps a better response?

For example:

    wikipedia.summary('recommendation')

gives the following error and terminates the program:

    290       may_refer_to = [li.a.get_text() for li in filtered_lis if li.a]
    291 
--> 292       raise DisambiguationError(self.title, may_refer_to)
    293 
    294     else:

DisambiguationError: "Recommendation" may refer to: 
norm (philosophy)
Recommender systems
European Union recommendation
W3C recommendation
letter of recommendation

Perhaps a better response, maybe in JSON format, would help: one that does not terminate the program and instead asks for further clarification?
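
Until the library behaves differently, a caller can get this kind of non-terminating response by catching the exception. The following is a minimal sketch, not the library's behavior: `summary_or_options` is a hypothetical helper, the summary function and exception class are passed in so the snippet stands alone, and it assumes the exception exposes its candidate titles as an `options` attribute.

```python
def summary_or_options(title, summary_fn, disambiguation_error):
    """Return a JSON-friendly dict instead of letting the error propagate."""
    try:
        return {"status": "ok", "title": title, "summary": summary_fn(title)}
    except disambiguation_error as exc:
        # Assumption: the candidate titles are available as `exc.options`.
        # If the attribute is missing, fall back to an empty list.
        options = list(getattr(exc, "options", []))
        return {"status": "ambiguous", "title": title, "options": options}
```

With the package installed, the intended call would be along the lines of `summary_or_options("recommendation", wikipedia.summary, wikipedia.exceptions.DisambiguationError)`, after which the caller can ask the user to pick one of the returned options.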

opened by arcolife 6

add - catch page errors when 'fullurl' is missing.

When requesting some pages via WikipediaPage():

  File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 224, in __init__
    self.load(redirect=redirect, preload=preload)
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 276, in load
    self.__init__(title, redirect=redirect, preload=preload)
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 224, in __init__
    self.load(redirect=redirect, preload=preload)
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 296, in load
    self.url = data['fullurl']
KeyError: 'fullurl'

I think it may be due to case & whitespace in the article title string? In any case, raising a PageError in this case may be more appropriate to pass back to the client.

opened by rfaulkner 5

Empty 'extract' in Wikipedia response causes 'TypeError: list indices must be integers, not str'.

>>> import wikipedia
>>> wikipedia.page('Fully connected network', auto_suggest=False, redirect=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 211, in page
    return WikipediaPage(title, redirect=redirect, preload=preload)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 224, in __init__
    self.load(redirect=redirect, preload=preload)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 276, in load
    self.__init__(title, redirect=redirect, preload=preload)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 224, in __init__
    self.load(redirect=redirect, preload=preload)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 250, in load
    pages = request['query']['pages']
TypeError: list indices must be integers, not str

opened by dmirylenka 5

Adding revid and parentid to the request.

Hi. This is my first pull request, so please bear with me. I like your library, but I need revids. This breaks the tests, so I don't expect you to apply. I don't see how to fix the tests automatically. Do you have something you run that regenerates the request_mock_data file or do I just grub around in it?

Thanks for your time,

opened by fusiongyro 5

KeyError thrown when HTTP request times out (wikipedia.search)

Calling a series of searches like wikipedia.search(item, results=1) occasionally results in the error:

  File "/Library/Python/2.7/site-packages/wikipedia/wikipedia.py", line 47, in search
    search_results = (d['title'] for d in raw_results['query']['search'])
KeyError: 'query'

This is because the raw_results dict looks like this: {u'servedby': u'mw1118', u'error': {u'info': u'HTTP request timed out.', u'code': u'srsearch-error'}}

Maybe it would be better if some sort of HTTP exception was thrown instead of the key error? That is, if I'm understanding what's happening correctly.
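
The check suggested here can be sketched as follows: inspect the decoded response for an `error` entry before indexing into `query`. `SearchRequestError` and `titles_from_search` are hypothetical names used for illustration and do not describe how the library handles this case.

```python
class SearchRequestError(Exception):
    """Hypothetical exception for an error payload returned by the API."""


def titles_from_search(raw_results):
    """Return result titles, or raise if the response carries an error."""
    if "error" in raw_results:
        error = raw_results["error"]
        # Surface the API's own code and message instead of a KeyError.
        raise SearchRequestError(
            "{}: {}".format(error.get("code", "unknown"), error.get("info", ""))
        )
    return [d["title"] for d in raw_results["query"]["search"]]
```

Fed the dict quoted above, this raises `SearchRequestError` with the message `srsearch-error: HTTP request timed out.`, which a caller can catch and retry.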

opened by mobeets 5

beautifulsoup4 import error in requirements.txt

Awesome work on this @goldsmith! Using it at a hackathon atm.

Seems to be a typo in requirements.txt in version 0.9.4 on PyPI; this is what I get at the moment:

    beautifulsoup4==4.3.11
    nose==1.3.0
    requests==1.2.3
    wsgiref==0.1.2

The latest version of https://github.com/goldsmith/Wikipedia/blob/master/requirements.txt has beautifulsoup4, so perhaps it just needs a new upload to PyPI?

bug

opened by CalumJEadie 5

ModuleNotFoundError: No module named 'wiki'

Hello guys, I'm trying to get some SNPedia data. The SNPedia API's documentation page says:

So, I started a Google Colab and a Jupyter Notebook on my local machine and I got this error in both cases:

What did I do wrong? What's the problem with the Wiki file?

opened by edgui-appolonicorreia 4

Added validity check to wikipedia.set_lang(). Also ran project through autopep8 to fix the indentation

The indentation was against PEP8, so I ran it through autopep8 so my editor wouldn't spout errors for all the 2space indents. The full command was autopep8 ./ --recursive --in-place --pep8-passes 2000 --verbose

opened by c-ameron 4

ImportError on installing from pip in virtualenv

If I create a new virtualenv and do a pip install wikipedia, it gives:

ImportError: No module named requests

This is due to 'import wikipedia' in the setup.py file. I think this shouldn't be present in setup.py file (since it hasn't finished installing requirements yet). Also, could the requests version be updated to 2.3.0 ?
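
One common way to avoid importing the package from setup.py is to read the version out of the source text instead. The sketch below assumes the version is stored as a literal assigned to a module-level `__version__`; `read_version` is a hypothetical helper, not code from this project.

```python
import ast


def read_version(source):
    """Return the literal assigned to __version__ in `source`, or None."""
    # ast.parse only builds a syntax tree; it does not run the module's
    # imports, so missing dependencies such as requests cannot fail here.
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__version__":
                    return ast.literal_eval(node.value)
    return None
```

In setup.py this would be called on the contents of the package's `__init__.py` (the exact path is an assumption), and the result passed to `setup(version=...)` after converting it to a string if needed.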

opened by arcolife 4

Title resolution leads to internal error

Whenever I run the line wikipedia.page("New York City"), the API returns an error indicating there is no page for 'new york sity'. The error occurs on line 270 of wikipedia.py, results, suggestion = search(title, results=1, suggestion=True), where title='new york city' resolves to 'new york sity' for whatever reason.
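
Since the bad title comes from the search suggestion, a possible workaround is to try the literal title first and only fall back to suggestions on a miss. This is a sketch under assumptions: `page_exact_first` is a hypothetical helper, the page function and error class are passed in so the snippet stands alone, and it relies on the `auto_suggest` keyword that appears in a `wikipedia.page(...)` call elsewhere on this page.

```python
def page_exact_first(title, page_fn, page_error):
    """Try the literal title first; fall back to suggestions only on a miss."""
    try:
        return page_fn(title, auto_suggest=False)
    except page_error:
        # The exact title was not found, so allow the suggested title.
        return page_fn(title, auto_suggest=True)
```

With the package installed, the intended call would be something like `page_exact_first("New York City", wikipedia.page, wikipedia.exceptions.PageError)`.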

opened by grossmanm 0

get side panel info

I am looking for a way to get information from the side panel of a Wikipedia page, for example https://en.wikipedia.org/wiki/Netflix (the panel on the right).

I was trying using this:

    import wikipedia
    xx = wikipedia.page("Netlix")
    xx.title
    print(xx.content)

It does provide the whole page content, except for the side panel.

I was trying to avoid BeautifulSoup package, but I am not sure if it's possible. Any ideas?
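
Since `content` leaves out the side panel, its rows would have to come from the page HTML. As a rough sketch that avoids BeautifulSoup, the standard library's `html.parser` can collect header/value pairs from a table whose class includes `infobox`. The class name, `InfoboxParser`, and `parse_infobox` are assumptions for illustration, and real infobox markup may need more handling than this.

```python
from html.parser import HTMLParser


class InfoboxParser(HTMLParser):
    """Collect (header, value) text pairs from rows of an infobox table."""

    def __init__(self):
        super().__init__()
        self.depth = 0                    # table nesting depth inside the infobox
        self.cell = None                  # "th" or "td" while inside a cell
        self.row = {"th": [], "td": []}   # text collected for the current row
        self.rows = []                    # finished (header, value) pairs

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and (self.depth or "infobox" in classes):
            self.depth += 1
        elif self.depth == 1 and tag == "tr":
            self.row = {"th": [], "td": []}
        elif self.depth == 1 and tag in ("th", "td"):
            self.cell = tag

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif self.depth == 1 and tag in ("th", "td"):
            self.cell = None
        elif self.depth == 1 and tag == "tr":
            header = " ".join(self.row["th"]).strip()
            value = " ".join(self.row["td"]).strip()
            if header and value:  # skip image-only or heading-only rows
                self.rows.append((header, value))

    def handle_data(self, data):
        # Only keep text from cells of the infobox itself, not nested tables.
        if self.depth == 1 and self.cell and data.strip():
            self.row[self.cell].append(data.strip())


def parse_infobox(html):
    """Return the infobox rows found in `html` as a list of pairs."""
    parser = InfoboxParser()
    parser.feed(html)
    return parser.rows
```

If the page object exposes its HTML (the library documents an `html()` method on pages), `parse_infobox(xx.html())` would be the intended call.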

opened by kasiahewelt 0

wikipedia api search

This is my code:

    import wikipedia
    wikipedia.summary("theresa may")

but for some reason it raised an error saying that I gave "teresa may" as input:

PageError: Page id "teresa may" does not match any pages. Try another id!

What is going on here?

opened by Felikesw 1

Return exact page title match instead of suggestion if it exists.

Fixes #279 and related.

There are already at least four proposed fixes to this bug ( #305 #297 #287 #253 ), but none of them quite succeed so here's a fourth attempt. Of course, considering the maintainer hasn't been seen since 2018 I doubt there's going to be any movement on any of them.

opened by wlerin 0

Search terms redirecting to unrelated pages

Searching for "bass" searches for "band."

>>> import wikipedia as w
>>> w.page("bass")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py", line 276, in page
    return WikipediaPage(title, redirect=redirect, preload=preload)
  File "C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py", line 299, in __init__
    self.__load(redirect=redirect, preload=preload)
  File "C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py", line 367, in __load
    self.__init__(redirects['to'], redirect=redirect, preload=preload)
  File "C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py", line 299, in __init__
    self.__load(redirect=redirect, preload=preload)
  File "C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py", line 393, in __load
    raise DisambiguationError(getattr(self, 'title', page['title']), may_refer_to)
wikipedia.exceptions.DisambiguationError: "Band" may refer to:
Bánd
Band, Iran
Band, Mureș
Band-e Majid Khan
Band (surname)
Musical ensemble
Band (rock and pop)
Concert band
Dansband
Jazz band
Marching band
...

Searching for "band" appears to search for "and".

>>> import wikipedia as w
>>> w.page("band")
C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py:389: GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 389 of the file C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.

  lis = BeautifulSoup(html).find_all('li')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py", line 276, in page
    return WikipediaPage(title, redirect=redirect, preload=preload)
  File "C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py", line 299, in __init__
    self.__load(redirect=redirect, preload=preload)
  File "C:\Users\-\AppData\Local\Programs\Python\Python39\lib\site-packages\wikipedia\wikipedia.py", line 393, in __load
    raise DisambiguationError(getattr(self, 'title', page['title']), may_refer_to)
wikipedia.exceptions.DisambiguationError: "and" may refer to:
Conjunction (grammar)
Logical conjunction
Bitwise AND
short-circuit operator
Ampersand
AND gate
And (John Martyn album)
And (Koda Kumi album)
A N D (Tricot album)
Jonah Matranga
...

opened by DickieTheProgrammer 1

Owner

Jonathan Goldsmith

cofounder Tetra, building an automatic notetaker with speech recognition. we're hiring for web and iOS!

GitHub https://wikipedia.readthedocs.org/

Python wrapper for Wikipedia

Wikipedia-API is an easy-to-use Python wrapper for Wikipedia's API. It supports extracting texts, sections, links, categories, and translations.

369 Dec 30, 2022

🚀 An asynchronous python API wrapper meant to replace discord.py - Snappy discord api wrapper written with aiohttp & websockets

Pincer: an asynchronous Python API wrapper meant to replace discord.py. The package is currently within the planning phase.

125 Dec 26, 2022

A Pythonic client for the official https://data.gov.gr API.

A tool for extracting plain text from Wikipedia dumps

This script searches Wikipedia for any data you want! Soon we will release one that covers data from the whole internet.

Ross Virtual Assistant is a programme which can play Music, search Wikipedia, open Websites and much more.

Cities bot - A simple example of using aiogram and the wikipedia package

Aws-lambda-requests-wrapper - Request/Response wrapper for AWS Lambda with API Gateway

The successor of GeoSnipe, a pythonic Minecraft username sniper based on AsyncIO.

Pythonic and easy iCalendar library (rfc5545)

The Main Pythonic Version Of Twig Using Nextcord

A python package that fetches tweets and user information in a very pythonic manner.

PRAW, an acronym for "Python Reddit API Wrapper", is a python package that allows for simple access to Reddit's API.

Python API wrapper around Trello's API

An API wrapper around the pythonanywhere's API.

Async ready API wrapper for Revolt API written in Python.

An API Wrapper for Gofile API

A simple API wrapper for the Tenor API