Extract countries, regions and cities from a URL or text

Overview

This project is no longer being maintained and has been archived. Please check the Forks list for newer versions.

Forks

We are aware of two 3rd party forks for this library:

Geograpy

Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city.

Install & Setup

Grab the package using pip (this will take a few minutes)

pip install geograpy

Geograpy uses NLTK for entity recognition, so you'll also need to download the models we're using. Fortunately there's a command that'll take care of this for you.

geograpy-nltk

Basic Usage

Import the module, give some text or a URL, and presto.

import geograpy
url = 'http://www.bbc.com/news/world-europe-26919928'
places = geograpy.get_place_context(url=url)

Now you have access to information about all the places mentioned in the linked article.

  • places.countries contains a list of country names
  • places.regions contains a list of region names
  • places.cities contains a list of city names
  • places.other lists everything that wasn't clearly a country, region or city

Note that the other list can be useful for shorter texts, where it picks up things like street names and points of interest. For longer texts it is currently a bit messy, because it also collects adjectival forms of proper nouns (like "Russian" instead of "Russia").

But Wait, There's More

In addition to listing the names of discovered places, you'll also get some information about the relationships between places.

  • places.country_regions: regions broken down by country
  • places.country_cities: cities broken down by country
  • places.address_strings: city, region, country strings useful for geocoding
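
Judging by the output quoted in one of the issue reports further down, places.country_cities appears to be a dictionary mapping each country name to a list of city names. Assuming that shape (it is not a documented contract), the sketch below turns such a dictionary into "City, Country" strings of the kind a geocoder accepts. The data is a stand-in and city_country_strings is a hypothetical helper, not part of Geograpy.

```python
# Stand-in data shaped like places.country_cities (country -> list of cities).
# The shape is an assumption based on reported output, not a documented contract.
country_cities = {
    'Italy': ['Rome', 'Naples', 'Codogno'],
    'Belgium': ['Brussels'],
}


def city_country_strings(mapping):
    """Build 'City, Country' strings, with countries in alphabetical order.

    Hypothetical helper for illustration; not part of Geograpy.
    """
    return [
        '{}, {}'.format(city, country)
        for country, cities in sorted(mapping.items())
        for city in cities
    ]


queries = city_country_strings(country_cities)
print(queries)
```

Sorting the countries keeps the output stable between runs, which helps when the strings are cached or compared.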

Last But Not Least

While a text might mention many places, it's probably focused on one or two, so Geograpy also breaks down countries, regions and cities by number of mentions.

  • places.country_mentions
  • places.region_mentions
  • places.city_mentions

Each of these returns a list of tuples. The first item in the tuple is the place name and the second item is the number of mentions. For example:

[('Russian Federation', 14), ('Ukraine', 11), ('Lithuania', 1)]
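
Because each of these attributes is a plain list of (name, count) tuples, ordinary list tools are enough to find the places a text focuses on. The following is a minimal sketch that uses the sample output above as stand-in data. top_places and its min_count parameter are hypothetical and not part of Geograpy.

```python
# Stand-in for places.country_mentions, copied from the sample output above.
country_mentions = [('Russian Federation', 14), ('Ukraine', 11), ('Lithuania', 1)]


def top_places(mentions, min_count=2):
    """Return place names mentioned at least `min_count` times, most-mentioned first.

    Hypothetical helper for illustration; not part of Geograpy.
    """
    ranked = sorted(mentions, key=lambda pair: pair[1], reverse=True)
    return [name for name, count in ranked if count >= min_count]


focus = top_places(country_mentions)
print(focus)
```

Raising min_count narrows the result to the places a text is mainly about, while min_count=1 returns every place in ranked order.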

If You're Really Serious

You can of course use each of Geograpy's modules on their own. For example:

from geograpy import extraction

e = extraction.Extractor(url='http://www.bbc.com/news/world-europe-26919928')
e.find_entities()

# You can now access all of the places found by the Extractor
print(e.places)

Place context is handled in the places module. For example:

from geograpy import places

pc = places.PlaceContext(['Cleveland', 'Ohio', 'United States'])

pc.set_countries()
print(pc.countries)  # ['United States']

pc.set_regions()
print(pc.regions)  # ['Ohio']

pc.set_cities()
print(pc.cities)  # ['Cleveland']

print(pc.address_strings)  # ['Cleveland, Ohio, United States']

And of course all of the other information shown above (country_regions etc) is available after the corresponding set_ method is called.

Credits

Geograpy uses the following excellent libraries:

Geograpy uses the following data sources:

Hat tip to Chris Albon for the name.

Comments
  • Error processing data (from demo)

    NLTK seems to have changed this: http://www.nltk.org/_modules/nltk/tree.html

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/geograpy/__init__.py", line 6, in get_place_context
        e.find_entities()
      File "/usr/local/lib/python2.7/dist-packages/geograpy/extraction.py", line 31, in find_entities
        if (ne.node == 'GPE' or ne.node == 'PERSON') and ne[0][1] == 'NNP':
      File "/usr/local/lib/python2.7/dist-packages/nltk/tree.py", line 198, in _get_node
        raise NotImplementedError("Use label() to access a node label.")
    NotImplementedError: Use label() to access a node label.
    
    
    opened by brunobg 6
  • geograpy-nltk error

    This is an issue fork from #4 by @shun-liang. I have the same problem.

    When trying to run geograpy-nltk, I get the following error:

    Traceback (most recent call last):
      File "/Users/shun/.virtualenvs/hn_hiring_trend/bin/geograpy-nltk", line 5, in <module>
        nltk.downloader('maxent_ne_chunker')
    TypeError: 'module' object is not callable

    opened by benmaier 3
  • Error in installation

    I am getting the following error during installation:

    Could not find a version that satisfies the requirement geograpy (from versions: )
    No matching distribution found for geograpy.

    opened by Hima-Mehta 1
  • Let people know about newer versions and Stack Overflow questions

    https://stackoverflow.com/questions/tagged/geograpy now has a list of questions about geograpy and its different versions. See https://stackoverflow.com/tags/geograpy/info

    https://github.com/somnathrakshit/geograpy3 has been revived today and has a Python 3 compatible version that has been tested with Python 3.6, 3.7 and 3.8. Please add newer issues there so that they can be properly fixed.

    opened by WolfgangFahl 0
  • AttributeError: 'NoneType' object has no attribute 'name'

    p = geograpy.get_place_context(text='Pristina')

    Traceback (most recent call last):
      File "", line 1, in <module>
      File "/home/cusco/VirtualEnvs/data_parser/lib/python3.7/site-packages/geograpy/__init__.py", line 12, in get_place_context
        pc.set_cities()
      File "/home/cusco/VirtualEnvs/data_parser/lib/python3.7/site-packages/geograpy/places.py", line 160, in set_cities
        country_name = country.name
    AttributeError: 'NoneType' object has no attribute 'name'

    'NoneType' object has no attribute 'name'

    opened by cusco 9
  • Unable to run it due to label() exception on extraction.py

    Traceback (most recent call last):
      File "ale.py", line 7, in <module>
        places = geograpy.get_place_context(url="https://www.cntraveler.com/hotels/hong-kong-s-a-r-/jordan/mandarin-oriental-hong-kong")
      File "/usr/local/lib/python2.7/site-packages/geograpy/__init__.py", line 6, in get_place_context
        e.find_entities()
      File "/usr/local/lib/python2.7/site-packages/geograpy/extraction.py", line 31, in find_entities
        if (ne.node == 'GPE' or ne.node == 'PERSON') and ne[0][1] == 'NNP':
      File "/usr/local/lib/python2.7/site-packages/nltk/tree.py", line 217, in _get_node
        raise NotImplementedError("Use label() to access a node label.")
    
    opened by AlejandroFernandesAntunes 1
  • OperationalError: unable to open database file

    Please help, I am getting a big error here:

    Traceback (most recent call last):
      File "sm.py", line 36, in <module>
        pc = places.PlaceContext(['Cleveland', 'Ohio', 'United States'])
      File "/usr/local/lib/python3.6/dist-packages/geograpy3/places.py", line 34, in __init__
        self.conn = sqlite3.connect(db_file)
    sqlite3.OperationalError: unable to open database file

    opened by naspuka 2
  • regions/countries returning all proper nouns

    import geograpy as gp
    url = "https://www.politico.eu/article/italy-incurable-economy/"
    places = gp.get_place_context(url=url)
    places.regions

    Returns a list of proper nouns from the article, the same goes for places.countries.

    places.country_cities seems to do better but still gives a funky return:

    {'Italy': ['Rome', 'Naples', 'Codogno'], 'United States': ['Rome', 'Naples', 'Pierre', 'Brussels', 'Italy'], 'Belgium': ['Brussels'], 'France': ['Pierre']}

    opened by saldutgr 1
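
Two of the reports above end in NotImplementedError: Use label() to access a node label. As the traceback message says, newer NLTK releases expose a chunk's label through label() where older ones used the node attribute. One possible workaround, sketched here under that assumption, is a small accessor that works with either style. entity_label and the two stand-in classes are hypothetical, written so the example runs without NLTK installed; they are not Geograpy or NLTK code.

```python
def entity_label(tree):
    """Return a chunk's label whether the tree offers label() or only .node.

    Hypothetical helper for illustration; not part of Geograpy or NLTK.
    """
    label = getattr(tree, 'label', None)
    if callable(label):
        return label()
    return tree.node


class NewStyleTree(object):
    """Stand-in for a newer NLTK tree: label() works, .node raises."""

    def label(self):
        return 'GPE'

    @property
    def node(self):
        raise NotImplementedError("Use label() to access a node label.")


class OldStyleTree(object):
    """Stand-in for an older NLTK tree that only had .node."""
    node = 'PERSON'


new_label = entity_label(NewStyleTree())
old_label = entity_label(OldStyleTree())
print(new_label, old_label)
```

With a helper like this, the failing comparison in extraction.py could test entity_label(ne) instead of ne.node, although the maintained forks may already handle this differently.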
Owner
Ushahidi
Building open sourced software to change the flow of information in the world.