:cloud: Python API for ThePirateBay.

Overview

TPB

Unofficial Python API for ThePirateBay.

Installation

$ pip install ThePirateBay

Note that ThePirateBay depends on lxml. If pip fails while compiling lxml, install the libxml2-dev and libxslt-dev packages on your system and try again.
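On apt-based systems (a Debian/Ubuntu assumption; package names differ on other distributions), that looks roughly like:

```shell
$ sudo apt-get install libxml2-dev libxslt-dev   # lxml's C build dependencies
$ pip install ThePirateBay
```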

Usage

from tpb import TPB
from tpb import CATEGORIES, ORDERS

t = TPB('https://thepiratebay.org') # create a TPB object with default domain

# search for 'public domain' in 'movies' category
search = t.search('public domain', category=CATEGORIES.VIDEO.MOVIES)

# return listings from page 2 of this search
search.page(2)

# sort this search by count of seeders, and return a multipage result
search.order(ORDERS.SEEDERS.ASC).multipage()

# search, order by seeders and return page 3 results
t.search('python').order(ORDERS.SEEDERS.ASC).page(3)

# multipage beginning on page 4
t.search('recipe book').page(4).multipage()

# search, in a category and return multipage results
t.search('something').category(CATEGORIES.OTHER.OTHER).multipage()

# get page 3 of recent torrents
t.recent().page(3)

# get top torrents in Movies category
t.top().category(CATEGORIES.VIDEO.MOVIES)

# print all torrent descriptions
for torrent in t.search('public domain'):
    print(torrent.info)

# print all torrent files and their sizes
for torrent in t.search('public domain'):
    print(torrent.files)
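The chained search()/order()/page() calls above follow a fluent-builder style: each call tweaks the query and returns the query object itself. A minimal standalone sketch of that idea (a toy class, not the library's actual implementation):

```python
class Query:
    """Toy fluent query builder mimicking TPB's chained-call style."""

    def __init__(self, terms):
        self.terms = terms
        self.page_num = 0
        self.order_by = None

    def page(self, n):
        self.page_num = n
        return self  # returning self is what enables chaining

    def order(self, field):
        self.order_by = field
        return self

    def url(self):
        # A hypothetical URL layout, purely for illustration.
        return '/search/{0}/{1}/{2}'.format(self.terms, self.page_num, self.order_by)

q = Query('public domain').order('seeders').page(3)
print(q.url())  # /search/public domain/3/seeders
```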

Torrent details available

Attributes

  • title # the title of the torrent
  • url # TPB url for the torrent
  • category # the main category
  • sub_category # the sub category
  • magnet_link # magnet download link
  • torrent_link # .torrent download link
  • created # uploaded date time
  • size # size of torrent
  • user # username of uploader
  • seeders # number of seeders
  • leechers # number of leechers

Properties

  • created # creation date -- parsed when accessed
  • info # detailed torrent description -- needs separate request
  • files # dictionary of files and their size -- needs separate request
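The properties above are evaluated lazily: info and files only trigger their extra HTTP request the first time they are accessed. A minimal sketch of that lazy-loading pattern (a hypothetical stand-in class with no real network call):

```python
class Torrent:
    """Illustrates lazy, cached properties; not the library's real class."""

    def __init__(self, url):
        self.url = url
        self._info = None

    @property
    def info(self):
        # Expensive work (in the real library, a separate HTTP request)
        # runs only on first access; the result is cached afterwards.
        if self._info is None:
            self._info = 'details fetched from %s' % self.url
        return self._info

t = Torrent('https://example.org/torrent/1')
print(t.info)  # details fetched from https://example.org/torrent/1
```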

Tests

Tests can be run using tox.

$ pip install tox
$ tox

Alternatively, install the test dependencies manually:

$ pip install -r tests/requirements.txt

Then, to execute the tests simply run:

$ python -m unittest discover

By default, the tests run against a local test server that serves pre-downloaded original responses. To run them against the remote site instead, set REMOTE=true:

$ REMOTE=true python -m unittest discover
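A test suite typically reads such a flag from the environment; a sketch of that pattern (not the project's exact test code):

```python
import os

def remote_enabled(environ=os.environ):
    # REMOTE=true (any case) switches the suite to the live site;
    # unset or any other value means the local test server.
    return environ.get('REMOTE', 'false').lower() == 'true'

print(remote_enabled({'REMOTE': 'true'}))  # True
print(remote_enabled({}))                  # False
```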

Donations

If TPB has helped you in any way, and you'd like to help the developer, please consider donating.

- BTC: 19dLDL4ax7xRmMiGDAbkizh6WA6Yei2zP5

- Gratipay: https://www.gratipay.com/karan/

- Flattr: https://flattr.com/profile/thekarangoel

Contribute

If you want to add any new features, or improve existing ones, feel free to send a pull request!

Comments
  • Total rewrite allowing request modification, advanced pagination and chained methods.

    from tpb import TPB
    from tpb import CATEGORIES, ORDERING
    
    t = TPB('http://thepiratebay.sx/')
    search = t.search('breaking bad', category=CATEGORIES.MOVIES)
    search.page(2)
    search.ordering(ORDERING.SEEDERS).multipage()
    t.search('breaking bad').ordering(ORDERING.SEEDERS).page(3)
    t.search('babylon 5').page(4).multipage() # multipage beginning on page 4
    t.search('something').category(CATEGORIES.OTHERS).multipage()
    t.recent().page(3)
    t.top().category(CATEGORIES.MOVIES)
    
    opened by umazalakain 9
  • urllib.error.HTTPError: HTTP Error 403: Forbidden

    When I run it with any request, I get this HTTP error:

    Traceback (most recent call last):
      File "main.py", line 7, in <module>
        for torrent in t.search('neighbors'):
      File "/usr/lib/python3.5/site-packages/tpb/tpb.py", line 149, in items
        for item in super(Paginated, self).items():
      File "/usr/lib/python3.5/site-packages/tpb/tpb.py", line 59, in items
        request = urlopen(str(self.url))
      File "/usr/lib/python3.5/urllib/request.py", line 163, in urlopen
        return opener.open(url, data, timeout)
      File "/usr/lib/python3.5/urllib/request.py", line 472, in open
        response = meth(req, response)
      File "/usr/lib/python3.5/urllib/request.py", line 582, in http_response
        'http', request, response, code, msg, hdrs)
      File "/usr/lib/python3.5/urllib/request.py", line 510, in error
        return self._call_chain(*args)
      File "/usr/lib/python3.5/urllib/request.py", line 444, in _call_chain
        result = func(*args)
      File "/usr/lib/python3.5/urllib/request.py", line 590, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 403: Forbidden
    

    This is the code I tried:

    from tpb import TPB
    from tpb import CATEGORIES, ORDERS
    
    t = TPB('https://thepiratebay.org') # create a TPB object with default domain
    
    # search for 'public domain' in 'movies' category
    search = t.search('public domain', category=CATEGORIES.VIDEO.MOVIES)
    
    # return listings from page 2 of this search
    search.page(2)
    
    # sort this search by count of seeders, and return a multipage result
    search.order(ORDERS.SEEDERS.ASC).multipage()
    
    # search, order by seeders and return page 3 results
    t.search('python').order(ORDERS.SEEDERS.ASC).page(3)
    
    # multipage beginning on page 4
    t.search('recipe book').page(4).multipage()
    
    # search, in a category and return multipage results
    t.search('something').category(CATEGORIES.OTHER.OTHER).multipage()
    
    # get page 3 of recent torrents
    t.recent().page(3)
    
    # get top torrents in Movies category
    t.top().category(CATEGORIES.VIDEO.MOVIES)
    
    # print all torrent descriptions
    for torrent in t.search('public domain'):
        print(torrent.info)
    
    # print all torrent files and their sizes
    for torrent in t.search('public domain'):
        print(torrent.files)
    
    opened by Almoullim 6
  • Missing Category

    Please consider adding 'PORN' and its sub-categories to the code ;) Currently, specifying category(CATEGORIES.PORN) raises an AttributeError.

    Cheers.

    opened by sbjaved 6
  • cli utility

    A command line interface would be a great tool to make use of TPB. What do you think? It would also be great to automatically detect which BitTorrent client is installed on the system and launch the selected torrent downloads with it!

    An ncurses interface would also be great, though that is a bit more complex. Any thoughts?

    wontfix 
    opened by umazalakain 6
  • Adding the classmethod from_string in the Torrent class

    Adds the classmethod from_string to the Torrent class, which allows building a Torrent object from its url. Fixes issue #65.

    I added beautifulsoup as a new dependency. I'm sorry, I just can't stand lxml. It might also not be the smartest way to do the job, but the job is done.

    opened by JPFrancoia 5
  • Element not found when parsing dates

    Hi, it's my first time using Python so my debugging skills are very rough, but upon iterating over some search results I'm getting the following error: TypeError: 'NoneType' object is not iterable

    Stack trace ends with the following:

      File "/Users/tomasz/Server/couch/venv/lib/python2.7/site-packages/tpb/tpb.py", line 144, in items
        for item in super(Paginated, self).items():
      File "/Users/tomasz/Server/couch/venv/lib/python2.7/site-packages/tpb/tpb.py", line 60, in items
        yield self._build_torrent(row)
      File "/Users/tomasz/Server/couch/venv/lib/python2.7/site-packages/tpb/tpb.py", line 102, in _build_torrent
        created = dateutil.parser.parse(match.groups()[0].replace('\xa0', ' '))
      File "/Users/tomasz/Server/couch/venv/lib/python2.7/site-packages/dateutil/parser.py", line 748, in parse
        return DEFAULTPARSER.parse(timestr, **kwargs)
      File "/Users/tomasz/Server/couch/venv/lib/python2.7/site-packages/dateutil/parser.py", line 310, in parse
        res, skipped_tokens = self._parse(timestr, **kwargs)
    

    After digging into the code, line 102 in tpb.py seems to be the one causing issues: created = dateutil.parser.parse(match.groups()[0].replace('\xa0', ' '))

    When replaced with datetime.now() it works. That said, it might be TPB itself causing problems, as I can't reliably reproduce the issue 100% of the time... sometimes it just works. Any thoughts?

    opened by tomasz-tomczyk 5
  • Test server and base case

    Base test case that executes every test on remote and local URLs:

    • executes on remote only if remote is available.
    • executes always on a local bottle server that returns predownloaded original responses.
    opened by umazalakain 5
  • The multipage date does not work

    Hi,

    To begin, TPB changed their domain again, just to let you know. Second, it seems the torrent.created gives a wrong date when used with multipage.

    Example:

    search = t.search('*', category=CATEGORIES.VIDEO.MOVIES)
    search.order(ORDERS.UPLOADED.DES).multipage()

    for torrent in search:
        print(torrent, torrent.created)

    OUTPUT:

    ...
    Gravity (2013) DVDScr x264 AC3 by GFGTORRZ 2013-12-14 01:47:00
    Resident Evil 2007 Special Edition by GFGTORRZ 2013-12-14 01:43:00
    Romeo.And.Juliet.A.Love.Song.2013.DVDRip.XviD.AC3-EVO by UltraTorrents 2013-12-14 01:37:00
    Vikingdom 2013 BRRip x264 AC3-MiLLENiUM by TvTeam 2013-12-14 01:29:00
    Open Grave [2013] BRRip XViD-ETRG by UltraTorrents 2013-12-14 01:10:00
    Gilda (1946) Español Latino by Rob.Merc. 2013-12-14 01:04:00
    Turbo.2013.DVDRip.H264.AAC by Xanthippus 2013-12-14 00:11:00
    A Fronteira.DVDRip.XviD-DualAudio by Sapop 2013-12-14 14:42:10.814431
    Lisa.1990.DVDRip.XviD-EBX by TvTeam 2013-12-14 14:42:10.815012
    ...

    After midnight, when the dates are supposed to roll over to the previous day, the reported date is the current date and time, i.e. if I start the script at 14:42, the torrent date will be 14:42 on the same day.

    BUT, if I instead loop page by page with search.order(ORDERS.UPLOADED.DES).page(i), it's ok: the date is right.

    opened by JPFrancoia 4
  • Obtain the nfo infos

    Hi,

    I wonder if you plan to implement a way to get the info contained in the nfo element on the full page of a torrent.

    e.g. http://thepiratebay.sx/torrent/9236862/Ubuntu_Gnome_13.10_64-bit

    On this page, the element is: div class="nfo"

    It contains the comments of the uploader, and sometimes some links.

    Thanks.

    enhancement 
    opened by JPFrancoia 4
  • Index out of range for CSS Select

    I am getting the following error when running the example script from the project site (here: https://github.com/karan/TPB#usage):

    File "example.py", line 34, in <module>
      print(torrent.info)
    File "/usr/local/lib/python2.7/dist-packages/ThePirateBay-v1.3.5-py2.7.egg/tpb/tpb.py", line 347, in info
      info = root.cssselect('#details > .nfo > pre')[0].text_content()
    IndexError: list index out of range

    The only change I made was replacing the domain "https://thepiratebay.org" with "https://thepiratebay.la".

    opened by sebastian9er 3
  • Handling the decompression of the data from tpb

    Now using the requests module instead of urllib. requests.get() handles the decompression of the data returned by the website. Fixes issue #68, but introduces a dependency on the requests module.

    opened by JPFrancoia 3
  • Fixed issue relating to row parsing on other mirrors than piratebay.org

    The previous version didn't work on anything other than piratebay.org. I made a fix so that it works on all functioning mirrors of ThePirateBay, by changing which rows are allowed to be parsed: the previous version included the last row, which has a column count of 1 instead of the typical 4.

    opened by brandongallagher1999 0
  • fixed URL lib calls, fixed HTTP error, working on python 3.8

    This update allows the API to function in the latest version of python 3.8 / 3.9.

    I was getting an HTTP error with urllib, so I replaced it with requests.

    opened by brandongallagher1999 0
  • I've forked the project

    It's here: https://github.com/pawamoy/TPB

    Due to @karan not responding to issues and PRs, I've taken the liberty to fork the project and merge some PRs in it.

    Still not pushed to PyPI. What distribution name should I use since ThePirateBay is taken? tpb maybe?

    opened by pawamoy 1
  • Typo dependency on "dateutils" library

    It seems that you have dateutils in your setup.py. That refers to a different PyPI package, which is almost certainly not what you wanted. I'm guessing you wanted python-dateutil and made a mistake about the package name. Since dateutils depends on python-dateutil, you probably didn't notice the mistake.

    You seem to have correctly listed python-dateutil in the requirements.txt file (though pinning to a quite old version).

    opened by pganssle 0
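    The fix the issue describes is a one-line rename in setup.py (a sketch; the actual file may list other dependencies):

    # setup.py (sketch): the dependency name should be python-dateutil,
    # not dateutils, which is a different PyPI package.
    setup(
        ...
        install_requires=[
            'lxml',
            'python-dateutil',  # was: 'dateutils'
        ],
    )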
  • Switched broken pypip.in badges to shields.io

    Hello, this is an auto-generated Pull Request. (Feedback?)

    Some time ago, pypip.in shut down. This broke the badges for a bunch of repositories, including thepiratebay. Thankfully, an equivalent service is run by shields.io. This pull request changes the badges to use shields.io instead.

    Unfortunately, PyPI has removed download statistics from their API, which means that even the shields.io "download count" badges are broken (they display "no longer available"). So those badges should really be removed entirely. Since this is an automated process (and trying to automatically remove the badges from READMEs can be tricky), this pull request just replaces the URL with the shields.io syntax.

    opened by movermeyer 0
Owner

Karan Goel
Little brown guy with big dreams. https://goel.io