A fast and expressive Craigslist API wrapper

Ira Horecka

Last update: Dec 28, 2022

pycraigslist

A fast and expressive Craigslist API wrapper.

⚠	As of September 2021, Craigslist appears to have added a rate limiter. Throttle requests to Craigslist to avoid a 403 HTTP response status code. See the Exceptions section below for how to handle this error if you encounter it.
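
One way to throttle is to pause between separate queries. The helper below is a minimal sketch, not part of pycraigslist: run_throttled, its parameters, and the 2-second default are all assumptions, since Craigslist's actual limits are not stated here.

```python
import time


def run_throttled(tasks, delay_seconds=2.0, sleep=time.sleep):
    """Call each zero-argument task in order, pausing between consecutive calls.

    Hypothetical helper: the name and the default delay are illustrative only.
    """
    results = []
    for index, task in enumerate(tasks):
        if index > 0:
            sleep(delay_seconds)  # pause before every call except the first
        results.append(task())
    return results


# Stand-in tasks; a recording function replaces time.sleep so nothing blocks.
recorded_pauses = []
throttled_results = run_throttled(
    [lambda: "first", lambda: "second", lambda: "third"],
    delay_seconds=0.5,
    sleep=recorded_pauses.append,
)
print(throttled_results)
print(recorded_pauses)
```

In real use, each task could wrap one query, for example lambda: list(pycraigslist.forsale.cta(site="sfbay").search(limit=10)).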

Disclaimer

I do not work for Craigslist and am not affiliated with it.
This library is intended for educational purposes. It is not advised to crawl and download data from Craigslist.

Installation

pip install pycraigslist

Quick Start

Find cars & trucks for sale with keyword "Mazda Miata" in the East Bay Area, California:

import pycraigslist

miatas = pycraigslist.forsale.cta(site="sfbay", area="eby", query="Mazda Miata")
for miata in miatas.search():
    print(miata)

>>> {'country': 'US',
    'region': 'CA',
    'site': 'sfbay',
    'area': 'eby',
    'category': 'cto',
    'id': '7291715564',
    'repost_of': '',
    'last_updated': '2021-03-15 09:06',
    'title': '1990 Mazda Miata',
    'neighborhood': 'oakland lake merritt / grand',
    'price': '$5,000',
    'url': 'https://sfbay.craigslist.org/eby/cto/d/oakland-1990-mazda-miata/7291715564.html'}
    # ...
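
Every value in a post is a string, including the price. If a number is needed, a small conversion like the one below might help. This is a sketch: parse_price is a hypothetical helper, and it assumes whole-number prices formatted like the ones shown in this document.

```python
import re


def parse_price(price):
    """Return the digits of a price string as an int, or None if there are none.

    Hypothetical helper; assumes whole-number prices such as '$5,000'.
    """
    digits = re.sub(r"[^\d]", "", price)
    return int(digits) if digits else None


print(parse_price("$5,000"))    # 5000
print(parse_price("¥650,000"))  # 650000
print(parse_price(""))          # None
```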

Background

This library is intended to be expressive and easy to use.

pycraigslist classes

pycraigslist.community (craigslist.org > community)
pycraigslist.events (craigslist.org > event calendar)
pycraigslist.forsale (craigslist.org > for sale)
pycraigslist.gigs (craigslist.org > gigs)
pycraigslist.housing (craigslist.org > housing)
pycraigslist.jobs (craigslist.org > jobs)
pycraigslist.resumes (craigslist.org > resumes)
pycraigslist.services (craigslist.org > services)

We can search for posts in parent classes. For example, finding paid gigs in Portland, Oregon:

import pycraigslist

paid_gigs = pycraigslist.gigs(site="portland", is_paid=True)
for gig in paid_gigs.search():
    print(gig)

>>> {'country': 'US',
    'region': 'OR',
    'site': 'portland',
    'area': 'mlt',
    'category': 'lbg',
    'id': '7295392821',
    'repost_of': '7292985211',
    'last_updated': '2021-03-22 13:00',
    'title': 'Packing and moving',
    'neighborhood': 'SE Portland',
    'price': '',
    'url': 'https://portland.craigslist.org/mlt/lbg/d/portland-packing-and-moving/7295392821.html'}
    # ...

pycraigslist subclasses

Most pycraigslist classes have subclasses to allow for categorical searches. For example:

pycraigslist.forsale.bia (craigslist.org > for sale > bikes)
pycraigslist.forsale.cta (craigslist.org > for sale > cars & trucks)
pycraigslist.housing.apa (craigslist.org > housing > apartments / housing for rent)
pycraigslist.housing.roo (craigslist.org > housing > rooms & shares)

Finding pycraigslist subclasses

Use the class method .get_categories() to list a class's subclasses. The keys of the resulting dictionary are the subclass names.

import pycraigslist

print(pycraigslist.housing.get_categories())

>>> {'apa': 'apartments / housing for rent',
    'swp': 'housing swap',
    'off': 'office & commercial',
    'prk': 'parking & storage',
    'rea': 'real estate',
    'reb': 'real estate - by dealer',
    'reo': 'real estate - by owner',
    'roo': 'rooms & shares',
    'sub': 'sublets & temporary',
    'vac': 'vacation rentals',
    'hou': 'wanted: apts',
    'rew': 'wanted: real estate',
    'sha': 'wanted: room/share',
    'sbw': 'wanted: sublet/temp'}

We'd choose pycraigslist.housing.vac if we're interested in searching for vacation rentals.
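
When the category list is long, a keyword lookup over the dictionary can narrow it down. The sketch below works on a hand-copied subset of the output above, so it runs without a network call. find_category_keys is a hypothetical helper, not a pycraigslist function.

```python
def find_category_keys(categories, term):
    """Return the keys whose category name contains the term (case-insensitive)."""
    term = term.lower()
    return [key for key, name in categories.items() if term in name.lower()]


# Subset of the pycraigslist.housing.get_categories() output shown above.
housing_categories = {
    "apa": "apartments / housing for rent",
    "roo": "rooms & shares",
    "sub": "sublets & temporary",
    "vac": "vacation rentals",
    "sha": "wanted: room/share",
}

print(find_category_keys(housing_categories, "vacation"))  # ['vac']
print(find_category_keys(housing_categories, "room"))      # ['roo', 'sha']
```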

Finding and using filters

We can apply filters to our search. Use .get_filters() to find valid filters for a class or subclass instance.

Search filters are sensitive to the context of the pycraigslist instance and the language of the region. For example, here are applicable filters for cars & trucks for sale in Tokyo, Japan:

import pycraigslist

tokyo_autos = pycraigslist.forsale.cta(site="tokyo")
print(tokyo_autos.get_filters())

>>> {'query': '...', 'search_titles': 'True/False', 'has_image': 'True/False',
    'posted_today': 'True/False', 'bundle_duplicates': 'True/False',
    'search_distance': '...', 'zip_code': '...', 'min_price': '...', 'max_price': '...',
    'make_model': '...', 'min_year': '...', 'max_year': '...', 'min_miles': '...',
    'max_miles': '...', 'min_engine_displacement': '...', 'max_engine_displacement': '...',
    'condition': ['新品', 'ほぼ新品', '美品', '良品', '使用に問題なし', 'サルベージ'],
    'auto_cylinders': ['3気筒', '4気筒', '5気筒', '6気筒', '8気筒', '10気筒', '12気筒', 'その他'],
    'auto_drivetrain': ['前輪', '後輪', '4WD'],
    'auto_fuel_type': ['ガソリン', 'ディーゼル', 'ハイブリッド', '電気', 'その他'],
    'auto_paint': ['ブラック', 'ブルー', 'ブラウン', 'グリーン', 'グレー', 'オレンジ', 'パープル',
                   'レッド', 'シルバー', 'ホワイト', 'イエロー', 'カスタム'],
    'auto_size': ['コンパクト', 'フルサイズ', '中型', 'サブコンパクト'],
    'auto_title_status': ['クリーン', 'サルベージ', '再生', '部品のみ', '先取特権', '不明'],
    'auto_transmission': ['MT', 'AT', 'その他'],
    'auto_bodytype': ['バス', 'コンバーチブル', 'クーペ', 'ハッチバック', 'ミニバン', 'オフロード',
                      'ピックアップ', 'セダン', 'トラック', 'SUV', 'ワゴン', 'バン', 'その他'],
    'language': ['afrikaans', 'català', 'dansk', 'deutsch', 'english', 'español', 'suomi',
                 'français', 'italiano', 'nederlands', 'norsk', 'português', 'svenska',
                 'filipino', 'türkçe', '中文', 'العربية', '日本語', '한국말', 'русский',
                 'tiếng việt']}
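
Because valid values depend on the site and its language, it can be useful to check a value against this mapping before building a query. The following is a rough sketch that assumes the mapping has the shape shown above (a list of allowed values, 'True/False', or '...'). is_valid_filter is a hypothetical helper and only handles single values, not lists of values.

```python
def is_valid_filter(filters, name, value):
    """Check one value against a mapping shaped like the get_filters() output."""
    if name not in filters:
        return False
    allowed = filters[name]
    if isinstance(allowed, list):
        return value in allowed  # value must be one of the listed choices
    if allowed == "True/False":
        return isinstance(value, bool)
    return True  # free-form filters ('...') accept any value in this sketch


# Hand-copied subset of the Tokyo cars & trucks filters shown above.
tokyo_filters = {
    "query": "...",
    "has_image": "True/False",
    "auto_transmission": ["MT", "AT", "その他"],
}

print(is_valid_filter(tokyo_filters, "auto_transmission", "AT"))         # True
print(is_valid_filter(tokyo_filters, "auto_transmission", "automatic"))  # False
print(is_valid_filter(tokyo_filters, "has_image", True))                 # True
```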

To find cars & trucks with clean titles, we'd set the auto_title_status filter to "クリーン" (pronounced 'koo-lean'):

import pycraigslist

tokyo_autos = pycraigslist.forsale.cta(site="tokyo", auto_title_status="クリーン")
for auto in tokyo_autos.search():
    print(auto)

>>> {'country': 'JP',
    'region': '',
    'site': 'tokyo',
    'area': '',
    'category': 'cto',
    'id': '7301105503',
    'repost_of': '',
    'last_updated': '2021-04-03 14:04',
    'title': 'Suzuki Jimny 660 XG 4WD Keyless Entry Aluminum Wheel Non-Smoking Car',
    'neighborhood': 'Chiba Ken, Noda shi, Funakata 1630-1',
    'price': '¥650,000',
    'url': 'https://tokyo.craigslist.org/cto/d/suzuki-jimny-660-xg-4wd-keyless-entry/7301105503.html'}
    # ...

When applying many filters, pass a dictionary of filters to the filters keyword parameter. Note: when the same key appears both as a keyword argument and in the filters dictionary, the keyword argument takes precedence. For example:

import pycraigslist

bike_filters = {
    "bicycle_frame_material": "steel",
    # a list of filter values is accepted
    "bicycle_wheel_size": ["650C", "700C"],
    "bicycle_type": "road",
}
# we'd still get titanium road bikes with size 650C or 700C wheels
titanium_bikes = pycraigslist.forsale.bia(
    site="sfbay", area="sfc", bicycle_frame_material="titanium", filters=bike_filters
)
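
The precedence rule can be pictured as a plain dictionary merge in which keyword arguments are applied last. This is only an illustration of the documented behavior, not pycraigslist's actual implementation, and merge_filters is a hypothetical name.

```python
def merge_filters(filters=None, **keyword_filters):
    """Combine a filters dict with keyword filters; keyword filters win on conflict."""
    merged = dict(filters or {})    # copy so the caller's dict is left untouched
    merged.update(keyword_filters)  # later values replace earlier ones
    return merged


bike_filters = {
    "bicycle_frame_material": "steel",
    "bicycle_wheel_size": ["650C", "700C"],
    "bicycle_type": "road",
}
merged_filters = merge_filters(bike_filters, bicycle_frame_material="titanium")
print(merged_filters["bicycle_frame_material"])  # titanium
print(bike_filters["bicycle_frame_material"])    # steel
```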

Searching for posts

General search

To search for Craigslist posts, use .search(). Iterating over .search() yields one dictionary of post attributes per post (all values are of type str). By default, it retrieves every matching post. Use the limit keyword parameter to cap the number of posts returned. For example, use limit=50 to get 50 posts. There is a maximum of 3000 posts per query.

Find the first 20 posts for farming and gardening services in Denver, Colorado:

import pycraigslist

gardening_services = pycraigslist.services.fgs(site="denver")
for service in gardening_services.search(limit=20):
    print(service)

>>> {'country': 'US',
    'region': 'CO',
    'site': 'denver',
    'area': '',
    'category': 'fgs',
    'id': '7301324564',
    'repost_of': '6974119634',
    'last_updated': '2021-04-03 11:47',
    'title': '🌲 Tree Removal/Trimming, Stump Grind: LICENSED/INSURED! 720-605-1584',
    'neighborhood': 'All Areas',
    'price': '',
    'url': 'https://denver.craigslist.org/fgs/d/littleton-tree-removal-trimming-stump/7301324564.html'}
    # ...

Detailed search

Use .search_detail() to get detailed Craigslist posts. The limit keyword parameter of .search() also applies to .search_detail(). Set include_body=True to include the post's body in the output; by default, include_body=False. Note: .search_detail() is more time-consuming than .search().

Get detailed posts with the post body for all cars & trucks for sale in Abilene, Texas:

import pycraigslist

all_autos = pycraigslist.forsale.cta(site="abilene")
for auto in all_autos.search_detail(include_body=True):
    print(auto)

>>> {'country': 'US',
    'region': 'TX',
    'site': 'abilene',
    'area': '',
    'category': 'cto',
    'id': '7309894792',
    'repost_of': '',
    'last_updated': '2021-04-20 12:17',
    'title': '2009 Mercedes GL-320',
    'neighborhood': 'Brownwood',
    'price': '$12,000',
    'url': 'https://abilene.craigslist.org/cto/d/brownwood-2009-mercedes-gl-320/7309894792.html',
    'lat': '31.729000',
    'lon': '-99.019000',
    'address': '',
    'misc': ['2009 mercedes-benz gl-class'],
    'condition': 'excellent',
    'drive': 'fwd',
    'fuel': 'diesel',
    'odometer': '100700',
    'paint_color': 'black',
    'title_status': 'clean',
    'transmission': 'automatic',
    'body': 'BEAUTIFUL car inside and out!! Diesel with only 100k, mechanic says its in great condition.'}
    # ...
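
Detailed posts also carry their values as strings, so fields such as lat, lon, and odometer need converting before any arithmetic. Below is a small sketch using the field names from the output above. to_number and coerce_detail are hypothetical helpers, and which fields to convert is an assumption.

```python
def to_number(text, kind=float):
    """Convert text with the given constructor, returning None when it fails."""
    try:
        return kind(text)
    except (TypeError, ValueError):
        return None


def coerce_detail(post):
    """Return a copy of a detailed post with lat, lon and odometer as numbers."""
    converted = dict(post)
    converted["lat"] = to_number(post.get("lat", ""))
    converted["lon"] = to_number(post.get("lon", ""))
    converted["odometer"] = to_number(post.get("odometer", ""), int)
    return converted


# Fields copied from the Abilene example above.
sample_post = {"title": "2009 Mercedes GL-320", "lat": "31.729000",
               "lon": "-99.019000", "odometer": "100700"}
converted_post = coerce_detail(sample_post)
print(converted_post["lat"], converted_post["lon"], converted_post["odometer"])
```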

Additional attributes

__doc__: Gets category name.
url: Gets full URL.
count: Gets number of posts.

import pycraigslist

east_bay_apa = pycraigslist.housing.apa(site="sfbay", area="eby", max_price=800)

# 1
print(east_bay_apa.__doc__)
>>> 'apartments / housing for rent'

# 2
print(east_bay_apa.url)
>>> 'https://sfbay.craigslist.org/search/eby/apa?searchNearby=1&s=0&max_price=800'

# 3
print(east_bay_apa.count)
>>> 56
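
The count attribute can serve as a cheap guard before looping over results. The sketch below uses a stand-in object so that it runs offline. search_if_any and FakeQuery are hypothetical, and the sketch assumes only that the query object exposes count and .search(limit=...) as described in this document.

```python
def search_if_any(query, limit=None):
    """Return the posts as a list, skipping the search when count is 0."""
    if query.count == 0:
        return []
    return list(query.search(limit=limit))


class FakeQuery:
    """Offline stand-in exposing count and search() like a pycraigslist instance."""

    def __init__(self, posts):
        self._posts = posts
        self.count = len(posts)

    def search(self, limit=None):
        # Yield posts one at a time, stopping at the limit if one is given.
        selected = self._posts if limit is None else self._posts[:limit]
        yield from selected


empty_result = search_if_any(FakeQuery([]))
some_result = search_if_any(FakeQuery([{"id": "1"}, {"id": "2"}]), limit=1)
print(empty_result)
print(some_result)
```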

Exceptions

pycraigslist has the following exceptions:

ConnectionError : exceeded maximum retries for a query
HTTPError : encountered a client or server error
InvalidFilterValue : filter is not recognized or has an invalid value

To use pycraigslist exceptions, import them from pycraigslist.exceptions. For example:

import pycraigslist
from pycraigslist.exceptions import ConnectionError, HTTPError, InvalidFilterValue

try:
    sf_bikes = pycraigslist.forsale.bia(site="sfbay", area="sfc", min_price=50)
    for bike in sf_bikes.search():
        print(bike)
except ConnectionError:
    print("Yikes! Something's up with the network.")
except HTTPError as e:
    print(f"Bad HTTP response encountered: {e.status_code} {e.detail}")
except InvalidFilterValue as e:
    print(f"Craigslist filter validation failed. Filter: '{e.name}', Value: '{e.value}'")
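
For rate-limit errors such as the 403 response mentioned at the top, one common approach is to retry with an increasing delay. The following is a generic sketch, not pycraigslist behavior: retry_with_backoff and its defaults are assumptions, and RuntimeError stands in for the library's HTTPError so the example runs offline.

```python
import time


def retry_with_backoff(action, retry_on, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action(), retrying on the given exception type with doubling delays."""
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x ... the base delay


call_log = {"calls": 0}


def flaky_search():
    """Stand-in for a search that fails twice and then succeeds."""
    call_log["calls"] += 1
    if call_log["calls"] < 3:
        raise RuntimeError("stand-in for a 403 response")
    return "ok"


backoff_pauses = []
backoff_result = retry_with_backoff(flaky_search, RuntimeError, sleep=backoff_pauses.append)
print(backoff_result, backoff_pauses)
```

In real use, retry_on would be the HTTPError imported from pycraigslist.exceptions, and action would wrap the search call.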

Contribute

Support

If you are having issues or would like to propose a new feature, please use the issues tracker.

License

The project is licensed under the MIT license.

Comments

Filter error?

Using the test code below, it seems that the filter options aren't working properly. If I run the code, all types of bikes show up in the results.

import pycraigslist

bike_filters = {
    'posted_today': True,
    'bundle_duplicates': True,
    'make': 'Harley',
    'model': 'Softail'
}

bikes = pycraigslist.forsale.mca(site='miami', area='brw', filters=bike_filters)

for bike in bikes.search(limit=5):
    print(bike)

opened by usctzen 7

Error with get_filters()

From the beginning, I was using the following to get the filters.

import pycraigslist

print(pycraigslist.forsale.mca.get_filters())

I needed to verify my filters again and when I issued the command, it replied:

Traceback (most recent call last):
  File "C:\Users\mgpd\PycharmProjects\MOlivo\py_clist.py", line 3, in <module>
    print(pycraigslist.forsale.mca.get_filters())
TypeError: get_filters() missing 1 required positional argument: 'self'

opened by usctzen 5

Missing categories (subcategories?)

Hi,

Your API does not allow me to use some categories (subcategories?). For example, I specifically want to use the "mcy" category, which is a subset of the "mca" category. The API returns an error if I use this:

bikes = pycraigslist.forsale.mcy(site='miami', area='brw', filters=bike_filters)

This is the error returned:

Traceback (most recent call last):
  File "C:/Users/mgpd/PycharmProjects/molivo/py_clist.py", line 11, in <module>
    bikes = pycraigslist.forsale.mcy(site='miami', area='brw', filters=bike_filters)
AttributeError: type object 'forsale' has no attribute 'mcy'

The "mcy" category is valid. It is the motorcycles for sale by owner category.

This category is returned in the data set. Here is the full test script:

import pycraigslist

bike_filters = {
    'posted_today': True,
    'bundle_duplicates': True,
    'make': 'Harley',
    'model': 'Softail'
}

bikes = pycraigslist.forsale.mca(site='miami', area='brw', filters=bike_filters)

for bike in bikes.search(limit=5):
    print(bike)

opened by usctzen 5

Limit function has problems.
The script below yields different results depending on the environment. When run in Google Colab (Python 3.8.8), the limit parameter works fine. When run in my PyCharm (also Python 3.8.8), it returns exactly 3 times the limit requested. In this case, 15 results are returned in PyCharm.

import pycraigslist

bike_filters = {
    'posted_today': True,
    'bundle_duplicates': True,
    'make': 'Harley',
    'model': 'Softail'
}

bikes = pycraigslist.forsale.mca(site='miami', area='brw', filters=bike_filters)

for bike in bikes.search(limit=5):
    print(bike)

Note that my PyCharm is on Windows 10.
opened by usctzen 4
Validate sites on object instantiation
Is there an easy way to verify if a site (e.g. sfbay, raleigh, etc.) is valid prior to searching via pycraigslist?

Consider the example below:

import pycraigslist

def search_miatas(site: str) -> bool:
    miatas = pycraigslist.forsale.cta(site=site, query="mx5|miata")
    for miata in miatas.search(limit=1):
        print(miata)

If I call this function using a known good site, then pycraigslist works as expected.

search_miatas("raleigh")

>>> {'country': 'US', 'region': 'NC', 'site': 'raleigh', 'area': '', 'category': 'ctd', 'id': '7358600557', 'repost_of': '7342453821', 'last_updated': '2021-07-30 15:51', 'title': '2016 Mazda MX-5 Miata Grand Touring 2dr Convertible 6M 8000 Miles', 'neighborhood': 'Durham', 'price': '$27,495', 'url': 'https://raleigh.craigslist.org/ctd/d/durham-2016-mazda-mx-miata-grand/7358600557.html'}

However, if I call this function with a known incorrect site, then pycraigslist eventually raises MaximumRequestsError.

search_miatas("test")

>>> MaximumRequestsError: Maximum requests attempted - check network connection.

This makes logical sense, because in the background, pycraigslist is trying to hit https://test.craigslist.org, which doesn't exist, so httpx can't connect, so tenacity raises TryError, which pycraigslist re-raises as MaximumRequestsError.

The problem is that the root cause of MaximumRequestsError could be unclear if I'm accepting site input from an unsanitized source (meaning, raw user input). From what I can tell, there's not an easy way to verify whether MaximumRequestsError is raised because there's a legitimate network connectivity issue, or if the user-defined site is invalid.

I can think of two ways to solve this, assuming it's not solved already:

Whenever a pycraigslist.api.BaseAPI object is instantiated, prior to fetching additional filters, parse the list of Craigslist sites, validate that the site parameter is in the list of sites, and raise an exception if it's not. Downside of this approach is increased execution time (meaning, decreased performance) whenever instantiating a new object.

Keep an internal mapping of valid site subdomains and perform the same validation described above. Downside of this approach is, whenever Craigslist adds a new site, you have to throw together a new release to add support for it.
opened by ChristopherJHart 3
Results count

Ira,

Is there a way we could get a result count before running the loop? It doesn't need to be an exact number; it could just be a True/False flag. I don't know where you would put this, but if has_results is True, you'd run the loop.

Not a deal-breaker, but it would be nice.

Thanks

opened by usctzen 3
Add InvalidFilterValue exception for more graceful error handling

Fix #8 by adding a new custom exception, InvalidFilterValue, which will allow for more graceful error handling in scenarios where a Craigslist search filter does not have a valid value.

Also added a few basic unit tests for some underlying filter argument/value parsing functions.

opened by ChristopherJHart 0
pycraigslist.query.filters.parse_filters() should throw custom exception
Right now, the pycraigslist.query.filters.parse_filters() function throws ValueError with a string indicating a filter has an incorrect value.

It would be nice if we can create a custom exception inherited from ValueError that takes in the filter name and incorrect value as parameters. This promotes more graceful error-handling in applications using pycraigslist.

I'm thinking something like this in pycraigslist.exceptions:

class InvalidFilterValue(ValueError):
    def __init__(self, name, value):
        self.name = name
        self.value = value
opened by ChristopherJHart 0
Upgrade to GitHub-native Dependabot

Dependabot Preview will be shut down on August 3rd, 2021. In order to keep getting Dependabot updates, please merge this PR and migrate to GitHub-native Dependabot before then.

Dependabot has been fully integrated into GitHub, so you no longer have to install and manage a separate app. This pull request migrates your configuration from Dependabot.com to a config file, using the new syntax. When merged, we'll swap out dependabot-preview (me) for a new dependabot app, and you'll be all set!

With this change, you'll now use the Dependabot page in GitHub, rather than the Dependabot dashboard, to monitor your version updates, and you'll configure Dependabot through the new config file rather than a UI.

If you've got any questions or feedback for us, please let us know by creating an issue in the dependabot/dependabot-core repository.

Learn more about migrating to GitHub-native Dependabot

Please note that regular @dependabot commands do not work on this pull request.
dependencies

opened by dependabot-preview[bot] 0
No results from query
The following Python code was returning accurate results for several months. In tests with results showing up properly on craigslist, the library is returning 0 results using the same query.

city = "newyork"
result = pycraigslist.gigs.cpg(site=city, posted_today=True, search_distance=100)

It appears Craigslist has added a redirect to the search resulting in adding something like this to the URL:

#search=1~list~0~0

Might this be affecting the library? I also noticed that when modifying searchNearby to 0 in the URL, the page returns results, but this is not able to be set using this library.
opened by paradise-point 4
