pypinfo is a simple CLI to access PyPI download statistics via Google's BigQuery.

Ofek Lev

Last update: Dec 26, 2022

Overview

pypinfo: View PyPI download statistics with ease.

Installation

pypinfo is distributed on PyPI as a universal wheel. It is available on Linux, macOS, and Windows and supports Python 3.6+.

This is relatively painless, I swear.

Create project

Go to https://bigquery.cloud.google.com.
Sign up if you haven't already. The first TB of queried data each month is free. Each additional TB is $5.
Go to https://console.developers.google.com/cloud-resource-manager and click CREATE PROJECT if you don't already have one.
This takes you to https://console.developers.google.com/projectcreate. Fill out the form and click CREATE. Any name is fine, but I recommend you choose something to do with PyPI like pypinfo. This way you know what the project is designated for.
The next page should show your new project. If not, reload the page and select it from the top menu.

Enable BigQuery API

Go to https://console.cloud.google.com/apis/api/bigquery-json.googleapis.com/overview and make sure the correct project is chosen using the drop-down on top. Click the ENABLE button.
After enabling, click CREATE CREDENTIALS.
Choose the "BigQuery API" and "No, I'm not using them".
Fill in a name, select the role "BigQuery User" (if "BigQuery" is not an option in the list, wait 15-20 minutes and try creating the credentials again), and select a JSON key.
Click continue and the JSON will download to your computer. Note the download location. Move the file wherever you want.

pip install pypinfo
pypinfo --auth path/to/your_credentials.json, or set an environment variable GOOGLE_APPLICATION_CREDENTIALS that points to the file.
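
Before running the first query, it can help to confirm that the key file is where you think it is. The following is a minimal sketch and not part of pypinfo: the helper name `find_credentials` is hypothetical, and the assumption that an explicit path takes precedence over `GOOGLE_APPLICATION_CREDENTIALS` is an illustration rather than documented behaviour. It only checks that the file parses as JSON and is marked as a service account key.

```python
import json
import os
from typing import Mapping, Optional


def find_credentials(auth_path: Optional[str] = None,
                     environ: Mapping[str, str] = os.environ) -> str:
    """Return the path of a usable service account key file.

    Hypothetical helper: assumes an explicit path wins over the
    GOOGLE_APPLICATION_CREDENTIALS environment variable.
    """
    path = auth_path or environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise FileNotFoundError("no credentials path given or set in the environment")
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    # Service account key files downloaded from the console carry this marker.
    if key.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service account key")
    return path
```

If this raises, fix the path or re-download the key before spending time on query errors.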

Usage

$ pypinfo
Usage: pypinfo [OPTIONS] [PROJECT] [FIELDS]... COMMAND [ARGS]...

  Valid fields are:

  project | version | file | pyversion | percent3 | percent2 | impl | impl-version |
  openssl | date | month | year | country | installer | installer-version |
  setuptools-version | system | system-release | distro | distro-version | cpu |
  libc | libc-version

Options:
  -a, --auth TEXT         Path to Google credentials JSON file.
  --run / --test          --test simply prints the query.
  -j, --json              Print data as JSON, with keys `rows` and `query`.
  -i, --indent INTEGER    JSON indentation level.
  -t, --timeout INTEGER   Milliseconds. Default: 120000 (2 minutes)
  -l, --limit TEXT        Maximum number of query results. Default: 10
  -d, --days TEXT         Number of days in the past to include. Default: 30
  -sd, --start-date TEXT  Must be negative or YYYY-MM[-DD]. Default: -31
  -ed, --end-date TEXT    Must be negative or YYYY-MM[-DD]. Default: -1
  -m, --month TEXT        Shortcut for -sd & -ed for a single YYYY-MM month.
  -w, --where TEXT        WHERE conditional. Default: file.project = "project"
  -o, --order TEXT        Field to order by. Default: download_count
  --all                   Show downloads by all installers, not only pip.
  -pc, --percent          Print percentages.
  -md, --markdown         Output as Markdown.
  -v, --verbose           Print debug messages to stderr.
  --version               Show the version and exit.
  -h, --help              Show this message and exit.

pypinfo accepts 0 or more options, followed by exactly 1 project, followed by 0 or more fields. By default only the last 30 days are queried. Let's take a look at some examples!

Tip: If queries are resulting in NoneType errors, increase timeout.
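
To make the default window concrete: the query printed by `--test --days 365` later in this README runs from `INTERVAL -366 DAY` to `INTERVAL -1 DAY`, and the documented defaults for `--start-date` and `--end-date` are -31 and -1. Below is a minimal sketch of that mapping, assuming `--days N` always expands to `-(N + 1)` and `-1`. This is inferred from those two data points, not taken from pypinfo's source, and `date_window` is a hypothetical name.

```python
def date_window(days: int = 30) -> str:
    """Build the timestamp filter for the last `days` days, ending yesterday."""
    # Assumption: the window starts one extra day back and ends one day ago.
    start, end = -(days + 1), -1
    return (
        f"timestamp BETWEEN TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {start} DAY) "
        f"AND TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {end} DAY)"
    )
```

A wider window scans more data, so it also raises the amount billed per query.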

Downloads for a project

$ pypinfo requests
Served from cache: False
Data processed: 2.83 GiB
Data billed: 2.83 GiB
Estimated cost: $0.02

| download_count |
| -------------- |
|    116,353,535 |
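
The "Estimated cost" line can be related to the pricing mentioned under Installation ($5 per TB beyond the free first TB). The sketch below is a reconstruction under stated assumptions, not pypinfo's actual code: it treats a TB as a TiB and rounds up to the next cent, which is what the sample outputs in this README appear to do. The free tier is ignored, and `estimate_cost` is a hypothetical name.

```python
from decimal import ROUND_CEILING, Decimal

PRICE_PER_TIB = Decimal("5")  # USD; the README says $5 per TB, TiB is assumed here
BYTES = {"MiB": Decimal(2) ** 20, "GiB": Decimal(2) ** 30, "TiB": Decimal(2) ** 40}


def estimate_cost(amount: str, unit: str) -> Decimal:
    """Estimate the cost in USD of a query billed for `amount` `unit`."""
    billed_bytes = Decimal(amount) * BYTES[unit]
    cost = billed_bytes / BYTES["TiB"] * PRICE_PER_TIB
    # Round up to the next cent (assumption inferred from the sample outputs).
    return cost.quantize(Decimal("0.01"), rounding=ROUND_CEILING)
```

With these assumptions, `estimate_cost("2.83", "GiB")` gives 0.02 and `estimate_cost("116.15", "GiB")` gives 0.57, matching the first two examples in this section.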

All downloads

$ pypinfo ""
Served from cache: False
Data processed: 116.15 GiB
Data billed: 116.15 GiB
Estimated cost: $0.57

| download_count |
| -------------- |
|  8,642,447,168 |

Downloads for a project by Python version

$ pypinfo django pyversion
Served from cache: False
Data processed: 967.33 MiB
Data billed: 968.00 MiB
Estimated cost: $0.01

| python_version | download_count |
| -------------- | -------------- |
| 3.8            |      1,735,967 |
| 3.6            |      1,654,871 |
| 3.7            |      1,326,423 |
| 2.7            |        876,621 |
| 3.9            |        524,570 |
| 3.5            |        258,609 |
| 3.4            |         12,769 |
| 3.10           |          3,050 |
| 3.3            |            225 |
| 2.6            |            158 |
| Total          |      6,393,263 |

All downloads by country code

$ pypinfo "" country
Served from cache: False
Data processed: 150.40 GiB
Data billed: 150.40 GiB
Estimated cost: $0.74

| country | download_count |
| ------- | -------------- |
| US      |  6,614,473,568 |
| IE      |    336,037,059 |
| IN      |    192,914,402 |
| DE      |    186,968,946 |
| NL      |    182,691,755 |
| None    |    141,753,357 |
| BE      |    111,234,463 |
| GB      |    109,539,219 |
| SG      |    106,375,274 |
| FR      |     86,036,896 |
| Total   |  8,068,024,939 |

Downloads for a project by system and distribution

$ pypinfo cryptography system distro
Served from cache: False
Data processed: 2.52 GiB
Data billed: 2.52 GiB
Estimated cost: $0.02

| system_name | distro_name                     | download_count |
| ----------- | ------------------------------- | -------------- |
| Linux       | Ubuntu                          |     19,524,538 |
| Linux       | Debian GNU/Linux                |     11,662,104 |
| Linux       | Alpine Linux                    |      3,105,553 |
| Linux       | Amazon Linux AMI                |      2,427,975 |
| Linux       | Amazon Linux                    |      2,374,869 |
| Linux       | CentOS Linux                    |      1,955,181 |
| Windows     | None                            |      1,522,069 |
| Linux       | CentOS                          |        568,370 |
| Darwin      | macOS                           |        489,859 |
| Linux       | Red Hat Enterprise Linux Server |        296,858 |
| Total       |                                 |     43,927,376 |

Most popular projects in the past year

$ pypinfo --days 365 "" project
Served from cache: False
Data processed: 1.69 TiB
Data billed: 1.69 TiB
Estimated cost: $8.45

| project         | download_count |
| --------------- | -------------- |
| urllib3         |  1,382,528,406 |
| six             |  1,172,798,441 |
| botocore        |  1,053,169,690 |
| requests        |    995,387,353 |
| setuptools      |    992,794,567 |
| certifi         |    948,518,394 |
| python-dateutil |    934,709,454 |
| idna            |    929,781,443 |
| s3transfer      |    877,565,186 |
| chardet         |    854,744,674 |
| Total           | 10,141,997,608 |

Downloads between two YYYY-MM-DD dates

$ pypinfo --start-date 2018-04-01 --end-date 2018-04-30 setuptools
Served from cache: False
Data processed: 571.37 MiB
Data billed: 572.00 MiB
Estimated cost: $0.01

| download_count |
| -------------- |
|      8,972,826 |

Downloads between two YYYY-MM dates

A yyyy-mm --start-date defaults to the first day of the month
A yyyy-mm --end-date defaults to the last day of the month

$ pypinfo --start-date 2018-04 --end-date 2018-04 setuptools
Served from cache: False
Data processed: 571.37 MiB
Data billed: 572.00 MiB
Estimated cost: $0.01

| download_count |
| -------------- |
|      8,972,826 |
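
The month-expansion rule described above can be sketched as follows. This is an illustration only: `expand_date` is a hypothetical helper, it handles just the yyyy-mm and yyyy-mm-dd forms (not the negative day offsets the options also accept), and it is not taken from pypinfo's source.

```python
import calendar
from datetime import date


def expand_date(value: str, *, is_end: bool) -> str:
    """Expand yyyy-mm to a full date; pass yyyy-mm-dd through unchanged."""
    parts = [int(part) for part in value.split("-")]
    if len(parts) == 3:
        return date(*parts).isoformat()
    year, month = parts
    # monthrange returns (weekday of first day, number of days in the month).
    day = calendar.monthrange(year, month)[1] if is_end else 1
    return date(year, month, day).isoformat()
```

For example, `expand_date("2018-04", is_end=False)` yields 2018-04-01 and `expand_date("2018-04", is_end=True)` yields 2018-04-30, which is why the yyyy-mm query above returns the same count as the explicit yyyy-mm-dd one.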

Downloads for a single YYYY-MM month

$ pypinfo --month 2018-04 setuptools
Served from cache: False
Data processed: 571.37 MiB
Data billed: 572.00 MiB
Estimated cost: $0.01

| download_count |
| -------------- |
|      8,972,826 |

Percentage of Python 3 downloads of the top 100 projects in the past year

Let's use --test to only see the query instead of sending it.

$ pypinfo --test --days 365 --limit 100 "" project percent3
SELECT
  file.project as project,
  ROUND(100 * SUM(CASE WHEN REGEXP_EXTRACT(details.python, r"^([^\.]+)") = "3" THEN 1 ELSE 0 END) / COUNT(*), 1) as percent_3,
  COUNT(*) as download_count,
FROM `bigquery-public-data.pypi.file_downloads`
WHERE timestamp BETWEEN TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -366 DAY) AND TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY)
  AND details.installer.name = "pip"
GROUP BY
  project
ORDER BY
  download_count DESC
LIMIT 100
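
For readers less familiar with SQL, the `percent_3` expression in the query above can be mirrored in Python. This is a rough equivalent for illustration, with the hypothetical function name `percent_3`. One difference worth noting: Python's `round` uses banker's rounding on exact ties, so results may differ from BigQuery's `ROUND` in rare edge cases.

```python
import re
from typing import Iterable, Optional

MAJOR = re.compile(r"^([^\.]+)")  # same pattern as the REGEXP_EXTRACT in the query


def percent_3(python_versions: Iterable[Optional[str]]) -> float:
    """Share of downloads whose Python major version is 3, to one decimal."""
    versions = list(python_versions)
    if not versions:
        return 0.0
    py3 = 0
    for version in versions:
        # Rows with no recorded Python version still count in the denominator.
        match = MAJOR.match(version) if version else None
        if match and match.group(1) == "3":
            py3 += 1
    return round(100 * py3 / len(versions), 1)
```

Each element of the input stands for one download row, so `percent_3(["3.8.1", "2.7.15", "3.6.9", "3.10.0"])` is 75.0.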

Credits

Donald Stufft for maintaining PyPI all these years.
Google for donating BigQuery capacity to PyPI.
Paul Kehrer for his awesome blog post.

Changelog

Important changes are emphasized.

Unreleased

19.0.0

Update dataset to the new Google-hosted location

18.0.1

Fix usage of date ranges

18.0.0

Use the clustered data table and standard SQL for lower query costs

17.0.0

Add support for libc & libc-version fields

16.0.2

Update TinyDB and Tinyrecord dependencies for compatibility

16.0.1

Pin TinyDB<4, Tinyrecord does not yet support TinyDB v4

16.0.0

Allow yyyy-mm[-dd] --start-date and --end-date:
- A yyyy-mm --start-date defaults to the first day of the month
- A yyyy-mm --end-date defaults to the last day of the month
Add --month as a shortcut to --start-date and --end-date for a single yyyy-mm month
Add --verbose option to print credentials location
Update installation instructions
Enforce black code style

15.0.0

Allow yyyy-mm-dd dates
Add --all option, default to only showing downloads via pip
Add download total row

14.0.0

Added new file field!

13.0.0

Added last_update JSON key, which is a UTC timestamp.

12.0.0

Breaking: JSON output is now a mapping with keys rows, which is all the data that was previously outputted, and query, which is relevant metadata.
Increased the resolution of percentages.

11.0.0

Fixed JSON output.

10.0.0

Fixed custom field ordering.

9.0.0

Added new BigQuery usage stats.
Lowered the default number of results to 10 from 20.
Updated examples.
Fixed table formatting regression.

8.0.0

Updated google-cloud-bigquery dependency.

7.0.0

Output table is now in Markdown format for easy copying to GitHub issues and PRs.

6.0.0

Updated google-cloud-bigquery dependency.

5.0.0

Numeric output (non-json) is now prettier (thanks hugovk)
You can now filter results for only pip installs with the --pip flag (thanks hugovk)

4.0.0

--order now works with all fields (thanks Brian Skinn)
Updated installation docs (thanks Brian Skinn)

3.0.1

Fix: project names are now normalized to adhere to PEP 503.

3.0.0

Breaking: --json option is now just a flag and prints output as prettified JSON.

2.0.0

Added --json path option.

1.0.0

Initial release

Comments

Exact definition for the download count

Hi There,

Thanks for this package, very helpful. It's unclear to me exactly what this tool outputs. Is it a sum of download counts from various sources? How does that differ from the "pip only" option -p? Is there a way to get a unique download count, etc.? Some more details would be helpful.

The default output for my packages is almost 6-10 times higher than if I use the --pip option. Any idea why that is? I only ever recommended people to use my package via pip install. Although there are some devs who clone and install locally outside pip, that number is likely very small. So the --pip option should closely match the default count (unless there is a lot more going on which I don't understand).

Also, if I would like to estimate "usage" (which is a higher bar than download) from this, would it make sense?

Thanks for your help.

opened by raamana 15

'Client' object has no attribute 'run_sync_query'

First of all, thank you so much for building this package. I'm excited to start using it, but it has been crashing for me on every query run:

coding@Aarons-MBP pypinfo (master)$ pypinfo django date
Traceback (most recent call last):
  File "/usr/local/bin/pypinfo", line 11, in <module>
    load_entry_point('pypinfo==5.0.0', 'console_scripts', 'pypinfo')()
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1043, in invoke
    return Command.invoke(self, ctx)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pypinfo/cli.py", line 89, in pypinfo
    query = client.run_sync_query(built_query)
AttributeError: 'Client' object has no attribute 'run_sync_query'

I tried python2, updating gcloud, etc. to no avail.

opened by aaronjanse 14

Try to filter out mirror and bot downloads by default

Hi,

I tried your project and think it's really cool. I maintain a small project with few downloads. When I queried the dataset for the last year it said 20k downloads. This most likely includes spiders and PyPI mirrors. For this to be useful for smaller projects it would be great to try and filter these things out by default.

I for instance have downloads from Python version 1.17, which does not seem right. The majority of my downloads come from the system None (which I assume happens when downloaded over HTTP by clicking the link on the website). Maybe a baseline could be computed based on the minimum number of downloads for any project during the same time interval?

opened by runfalk 14

google.api_core.exceptions.BadRequest: 400 FROM clause with table wildcards matches no table

Hi Everyone, I am getting this error despite being on the latest version:

Installing collected packages: pypinfo
Successfully installed pypinfo-19.0.0

WARNING: You are using pip version 21.1.2; however, version 21.2.4 is available.
You should consider upgrading via the '/Users/Reddy/anaconda3/envs/py36/bin/python -m pip install --upgrade pip' command.

(base) $ 16:37:33 Quark doc >>  pypinfo confounds
Traceback (most recent call last):
  File "/Users/Reddy/anaconda3/bin/pypinfo", line 11, in <module>
    sys.exit(pypinfo())
  File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/click/core.py", line 1114, in invoke
    return Command.invoke(self, ctx)
  File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/pypinfo/cli.py", line 150, in pypinfo
    query_rows = query_job.result(timeout=timeout // 1000)
  File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/google/cloud/bigquery/job.py", line 2762, in result
    super(QueryJob, self).result(timeout=timeout)
  File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/google/cloud/bigquery/job.py", line 703, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/google/api_core/future/polling.py", line 127, in result
    raise self._exception
google.api_core.exceptions.BadRequest: 400 FROM clause with table wildcards matches no table
(base) $ 16:37:43 Quark doc >>

appreciate your help or fix!

Originally posted by @raamana in https://github.com/ofek/pypinfo/issues/114#issuecomment-905857544

opened by hugovk 13

Target for "Generating a Private Key" is missing

Hi there. The installation instructions in your README includes the following link in step 5:

https://cloud.google.com/storage/docs/authentication#generating-a-private-key

This target does not appear to exist. As such, I am unsure how to "create credentials in JSON format."

Thanks, Tom

opened by tomduck 11

wrong installation hints

This is relatively painless, I swear.

Well...

During creation, choose BigQuery User as role.

First off, which of

API key

OAuth client ID

Service account key

do I choose? I suppose Service account key, since this is the only one that takes me to another screen with a role in it. There's no such role as BigQuery User, however.

opened by nschloe 8

No matching signature for function DATE_ADD for argument types: TIMESTAMP, INTERVAL INT64 DATE_TIME_PART

Hi,

After installation following instructions given in the README.md, I tried a simplest call:

pypinfo requests

Unfortunately getting following exception:

...
  File "/Users/christianr/.venv/sett/lib/python3.9/site-packages/pypinfo/cli.py", line 166, in pypinfo
    query_rows = query_job.result(timeout=timeout // 1000)
  File "/Users/christianr/.venv/sett/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py", line 1160, in result
    super(QueryJob, self).result(retry=retry, timeout=timeout)
  File "/Users/christianr/.venv/sett/lib/python3.9/site-packages/google/cloud/bigquery/job/base.py", line 631, in result
    return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
  File "/Users/christianr/.venv/sett/lib/python3.9/site-packages/google/api_core/future/polling.py", line 134, in result
    raise self._exception
google.api_core.exceptions.BadRequest: 400 No matching signature for function DATE_ADD for argument types: TIMESTAMP, INTERVAL INT64 DATE_TIME_PART. Supported signature: DATE_ADD(DATE, INTERVAL INT64 DATE_TIME_PART) at [5:25]

(job ID: 947e6084-e5e6-4bb5-ae12-bb8faad8ec0b)

                                                 -----Query Job SQL Follows-----

    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |
   1:SELECT
   2:  FORMAT_TIMESTAMP("%Y", timestamp) as download_year,
   3:  COUNT(*) as download_count,
   4:FROM `the-psf.pypi.file_downloads`
   5:WHERE timestamp BETWEEN DATE_ADD(CURRENT_TIMESTAMP(), INTERVAL -1826 DAY) AND DATE_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY)
   6:  AND file.project = "blist"
   7:  AND details.installer.name = "pip"
   8:GROUP BY
   9:  download_year
  10:ORDER BY
  11:  download_count DESC
  12:LIMIT 10
    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |

Using the SQL query in the BigQuery SQL workspace pops up the same error message if I use the public PyPI download statistics dataset.

My environment:

pypinfo, version 18.0.0
Python 3.9.1

opened by ribeaud 7

perf: use clustered table and standard SQL for lower query costs

By using the clustered data table, performance is improved because BigQuery can skip data that doesn't match the desired project.

Tested locally:

$ pypinfo google-cloud-bigquery
Served from cache: False
Data processed: 740.43 MiB
Data billed: 741.00 MiB
Estimated cost: $0.01

| download_count |
| -------------- |
|     10,149,146 |

Tested locally with all supported fields:

$ pypinfo google-cloud-bigquery project version file pyversion percent3 percent2 impl impl-version openssl date month year country installer installer-version setuptools-version system system-release distro distro-version cpu libc libc-version
Served from cache: False
Data processed: 4.62 GiB
Data billed: 4.62 GiB
Estimated cost: $0.03

| project               | version | file                                              | python_version | percent_3 | percent_2 | implementation | impl_version | openssl_version | download_date | download_month | download_year | country | installer_name | installer_version | setuptools_version | system_name | system_release    | distro_name      | distro_version | cpu    | libc_name | libc_version | download_count |
| --------------------- | ------- | ------------------------------------------------- | -------------- | --------- | --------- | -------------- | ------------ | --------------- | ------------- | -------------- | ------------- | ------- | -------------- | ----------------- | ------------------ | ----------- | ----------------- | ---------------- | -------------- | ------ | --------- | ------------ | -------------- |
| google-cloud-bigquery | 1.24.0  | google_cloud_bigquery-1.24.0-py2.py3-none-any.whl | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.0l          | 2020-12-15    | 2020-12        |         2,020 | US      | pip            | 20.0.2            | 45.1.0             | Linux       | 5.4.49+           | Debian GNU/Linux |              9 | x86_64 | glibc     | 2.24         |        171,636 |
| google-cloud-bigquery | 1.24.0  | google_cloud_bigquery-1.24.0-py2.py3-none-any.whl | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.0l          | 2020-12-13    | 2020-12        |         2,020 | US      | pip            | 20.0.2            | 45.1.0             | Linux       | 5.4.49+           | Debian GNU/Linux |              9 | x86_64 | glibc     | 2.24         |        160,152 |
| google-cloud-bigquery | 2.6.1   | google_cloud_bigquery-2.6.1-py2.py3-none-any.whl  | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.1           | 2021-01-07    | 2021-01        |         2,021 | IN      | pip            | 20.0.2            | 45.2.0             | Linux       | 4.15.0-1092-azure | Ubuntu           |          18.04 | x86_64 | glibc     | 2.27         |        136,529 |
| google-cloud-bigquery | 2.6.1   | google_cloud_bigquery-2.6.1-py2.py3-none-any.whl  | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.1           | 2021-01-06    | 2021-01        |         2,021 | IN      | pip            | 20.0.2            | 45.2.0             | Linux       | 4.15.0-1092-azure | Ubuntu           |          18.04 | x86_64 | glibc     | 2.27         |        133,927 |
| google-cloud-bigquery | 1.24.0  | google_cloud_bigquery-1.24.0-py2.py3-none-any.whl | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.0l          | 2020-12-14    | 2020-12        |         2,020 | US      | pip            | 20.0.2            | 45.1.0             | Linux       | 5.4.49+           | Debian GNU/Linux |              9 | x86_64 | glibc     | 2.24         |        130,400 |
| google-cloud-bigquery | 2.6.1   | google_cloud_bigquery-2.6.1-py2.py3-none-any.whl  | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.1           | 2021-01-05    | 2021-01        |         2,021 | IN      | pip            | 20.0.2            | 45.2.0             | Linux       | 4.15.0-1092-azure | Ubuntu           |          18.04 | x86_64 | glibc     | 2.27         |        129,844 |
| google-cloud-bigquery | 2.6.1   | google_cloud_bigquery-2.6.1-py2.py3-none-any.whl  | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.1           | 2021-01-09    | 2021-01        |         2,021 | IN      | pip            | 20.0.2            | 45.2.0             | Linux       | 4.15.0-1050-azure | Ubuntu           |          18.04 | x86_64 | glibc     | 2.27         |        126,544 |
| google-cloud-bigquery | 2.6.1   | google_cloud_bigquery-2.6.1-py2.py3-none-any.whl  | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.1           | 2021-01-04    | 2021-01        |         2,021 | IN      | pip            | 20.0.2            | 45.2.0             | Linux       | 4.15.0-1092-azure | Ubuntu           |          18.04 | x86_64 | glibc     | 2.27         |        124,986 |
| google-cloud-bigquery | 2.6.1   | google_cloud_bigquery-2.6.1-py2.py3-none-any.whl  | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.1           | 2020-12-30    | 2020-12        |         2,020 | IN      | pip            | 20.0.2            | 45.2.0             | Linux       | 4.15.0-1092-azure | Ubuntu           |          18.04 | x86_64 | glibc     | 2.27         |        121,202 |
| google-cloud-bigquery | 2.6.1   | google_cloud_bigquery-2.6.1-py2.py3-none-any.whl  | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.1           | 2021-01-10    | 2021-01        |         2,021 | IN      | pip            | 20.0.2            | 45.2.0             | Linux       | 4.15.0-1050-azure | Ubuntu           |          18.04 | x86_64 | glibc     | 2.27         |        121,104 |
| Total                 |         |                                                   |                |           |           |                |              |                 |               |                |               |         |                |                   |                    |             |                   |                  |                |        |           |              |      1,356,324 |

Closes #64

opened by tswast 7

Systematic error when I use --auth

When I run pypinfo --auth key.json with the BigQuery key, I get the following output:

$ pypinfo --auth key.json
Traceback (most recent call last):
  File "~/.local/bin/pypinfo", line 8, in <module>
    sys.exit(pypinfo())
  File "~/.local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "~/.local/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "~/.local/lib/python3.8/site-packages/click/core.py", line 1236, in invoke
    return Command.invoke(self, ctx)
  File "~/.local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "~/.local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "~/.local/lib/python3.8/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "~/.local/lib/python3.8/site-packages/pypinfo/cli.py", line 119, in pypinfo
    set_credentials(auth)
  File "~/.local/lib/python3.8/site-packages/pypinfo/db.py", line 25, in set_credentials
    tr.insert({'path': creds_file})
  File "~/.local/lib/python3.8/site-packages/tinyrecord/transaction.py", line 86, in __exit__
    self.record.execute()
  File "~/.local/lib/python3.8/site-packages/tinyrecord/changeset.py", line 21, in execute
    data = self.db._read()
AttributeError: 'Table' object has no attribute '_read'

This is probably related to the tinydb package.

bug dependencies

opened by Pacidus 7

Right-align Markdown

Simply adding a colon to the separator row will right-align columns nicely in Markdown.

See https://help.github.com/articles/organizing-information-with-tables/#formatting-content-within-your-table

Builds on #23 by including a unit test for this change. This makes things much easier to develop without going over your quota!

Before

As plaintext:

| python_version | percent | download_count |
| -------------- | ------- | -------------- |
| 2.7 | 59.3% | 3,048 |
| 3.5 | 20.9% | 1,074 |
| 3.6 | 14.3% | 734 |
| 3.4 | 5.4% | 279 |
| 2.6 | 0.1% | 4 |

As rendered Markdown:

| python_version | percent | download_count |
| -------------- | ------- | -------------- |
| 2.7            | 59.3%   | 3,048          |
| 3.5            | 20.9%   | 1,074          |
| 3.6            | 14.3%   | 734            |
| 3.4            | 5.4%    | 279            |
| 2.6            | 0.1%    | 4              |

After

As plaintext:

| python_version | percent | download_count |
| -------------- | ------: | -------------: |
| 2.7 | 59.3% | 3,048 |
| 3.5 | 20.9% | 1,074 |
| 3.6 | 14.3% | 734 |
| 3.4 | 5.4% | 279 |
| 2.6 | 0.1% | 4 |

As rendered Markdown:

| python_version | percent | download_count |
| -------------- | ------: | -------------: |
| 2.7            |   59.3% |          3,048 |
| 3.5            |   20.9% |          1,074 |
| 3.6            |   14.3% |            734 |
| 3.4            |    5.4% |            279 |
| 2.6            |    0.1% |              4 |
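
The colon trick can be sketched in a few lines of Python. This is an illustrative helper with the hypothetical name `markdown_table`, not the code from this pull request: columns listed in `right_align` get a separator ending in a colon and have their cells padded on the left.

```python
from typing import Collection, Sequence


def markdown_table(headers: Sequence[str], rows: Sequence[Sequence[str]],
                   right_align: Collection[int] = ()) -> str:
    """Render a Markdown table; column indexes in right_align are right-aligned."""
    columns = list(zip(headers, *rows))
    # Keep every column at least three characters wide for a valid separator.
    widths = [max(3, *(len(cell) for cell in column)) for column in columns]

    def line(cells: Sequence[str]) -> str:
        padded = [
            cell.rjust(width) if i in right_align else cell.ljust(width)
            for i, (cell, width) in enumerate(zip(cells, widths))
        ]
        return "| " + " | ".join(padded) + " |"

    # A trailing colon in the separator row is what tells Markdown to right-align.
    rule = [
        "-" * (width - 1) + ":" if i in right_align else "-" * width
        for i, width in enumerate(widths)
    ]
    return "\n".join([line(headers), line(rule)] + [line(row) for row in rows])
```

Calling it with the three headers above and `right_align={1, 2}` produces the separator row `| -------------- | ------: | -------------: |` shown in the "After" example.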

opened by hugovk 7

Disclose BigTable Costs in Readme

After spending a bunch of time setting up the GCP account, and looking at usage, it seems to be very expensive.

After the free tier is consumed and the 90-day $300 credit expires, BigQuery costs $5 per TB of data queried. Querying the simple_requests table, which is 49.77 TB, would cost about $250 per query.

A query of the distribution_metadata table (18.8 GB) would cost $0.094 per query. Again, very expensive.

Placing the data in AWS S3 and using S3 Select costs about the same, which is still very expensive.

I think this should be fully disclosed in the readme file.

opened by garymazz 6

Non-normalised package name

I would like to use the data to correlate with openSUSE package names, which use the 'real' name supplied in setup.py, i.e. not-normalised.

I've been doing a bit of research at hugovk/top-pypi-packages#4 and https://github.com/psincraian/pepy/issues/128, and the raw data from bigquery can include this, with a very small perf hit.

The query only needs to change from selecting file.project to substr(max(file.filename),1,LENGTH(file.project)) , or more likely including both.

Note this does depend on using standard SQL ( https://github.com/ofek/pypinfo/issues/28 ).

Do we know the cost implications of those changes?
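
To illustrate the idea in plain Python: the sketch below pairs the normalization rule from PEP 503 with the proposed prefix extraction. The function names are hypothetical, and the sample filename is taken from the google-cloud-bigquery output elsewhere on this page. The prefix trick assumes the un-normalized name has the same length as the normalized one, which fails when a name contains runs of separators such as `a__b`.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name as specified in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def name_from_filename(filename: str, project: str) -> str:
    """Mirror SUBSTR(filename, 1, LENGTH(project)) from the proposed query."""
    # SQL SUBSTR is 1-indexed, so position 1 corresponds to Python index 0.
    return filename[: len(project)]
```

For instance, `name_from_filename("google_cloud_bigquery-1.24.0-py2.py3-none-any.whl", "google-cloud-bigquery")` returns `google_cloud_bigquery`, the spelling used in the file name rather than the normalized one.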

opened by jayvdb 3

feature request - trend support

Hello,

It would be great to be able to tell project tendencies, e.g., is my monthly usage going up or down? This is already possible by playing around with the start and end dates, but what about providing a built-in way to do this that does not require shell scripting?
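
Until something like this is built in, a workaround is to generate one `--month` query per month and compare the counts. The sketch below only builds the command lines and is an assumption-level illustration: `month_commands` is a hypothetical helper, and each generated command is a separate billed query when actually run.

```python
from typing import List


def month_commands(project: str, last_month: str, count: int) -> List[List[str]]:
    """Build pypinfo invocations for `count` months ending at `last_month` (yyyy-mm)."""
    year, month = (int(part) for part in last_month.split("-"))
    commands = []
    for _ in range(count):
        commands.append(["pypinfo", "--month", f"{year:04d}-{month:02d}", project])
        # Step back one month, wrapping from January to the previous December.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return list(reversed(commands))  # oldest month first
```

Each inner list can be passed to `subprocess.run`, optionally with the documented `--json` flag added so the counts are easier to collect.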

opened by gaborbernat 1

How to check current BigQuery usage

Whilst making https://github.com/ofek/pypinfo/pull/29 I got this:

raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://www.googleapis.com/bigquery/v2/projects/pypinfo-hugovk/queries/<snip>?maxResults=0&timeoutMs=10000: Quota exceeded: Your project exceeded quota for free query bytes scanned. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors

I can't find how much of my free monthly TB I've used up on the Google console.

After some digging in the console I found this:

https://console.cloud.google.com/home/activity?project=pypinfo-hugovk&authuser=1

Linking to this:

https://console.cloud.google.com/iam-admin/quotas?project=pypinfo-hugovk&authuser=1

Which isn't very informative -- all zeroes and dashes!

I get the error after doing several pypinfo --percent --pip pypinfo pyversion, but pypinfo --percent --pip -d 1 pypinfo is still fine.

Any idea what "QUERY-MBYTES-FOR-UNBILLED-PROJECTS-per-project" really means?

And where to check the monthly 1TB quota?

Thanks!

opened by hugovk 6

Command-line interface to PyPI Stats API to get download stats for Python packages

pypistats Python 3.6+ interface to PyPI Stats API to get aggregate download statistics on Python packages on the Python Package Index without having t

140 Jan 3, 2023

Python CLI vm manager for remote access of docker images via noVNC

vmman is a tool to quickly boot and view docker-based VMs running on a linux server through noVNC without ssh tunneling on another network.

1 Nov 29, 2021

A simple weather tool. I made this as a way for me to learn Python, API, and PyPi packaging.

105 Dec 31, 2022

Install python modules from pypi from a previous date in history

pip-rewind is a command-line tool that can rewind pypi module versions (given as command-line arguments or read from a requirements.txt file) to a previous date in time.

4 Jul 3, 2021

CLI utility to search and download torrents from major torrent sites

CLI Torrent Downloader About CLI Torrent Downloader provides convenient and quick way to search torrent magnet links (and to run associated torrent cl

86 Dec 19, 2022

Tarstats - A simple Python commandline application that collects statistics about tarfiles

A simple Python commandline application that collects statistics about tarfiles.

13 Feb 20, 2022

Sink is a CLI tool that allows users to synchronize their local folders to their Google Drives. It is similar to the Git CLI and allows fast and reliable syncs with the drive.

Sink is a CLI synchronisation tool that enables a user to synchronise local system files and folders with their Google Drives. It follows a git C

16 May 29, 2022

flora-dev-cli (fd-cli) is command line interface software to interact with flora blockchain.

Install git clone https://github.com/Flora-Network/fd-cli.git cd fd-cli python3 -m venv venv source venv/bin/activate pip install -e . --extra-index-u

14 Sep 11, 2022

AWS Interactive CLI - Allows you to execute complex AWS commands by chaining one or more other AWS CLI dependencies

2 Dec 10, 2021

Python-Stock-Info-CLI: Get stock info through CLI by passing stock ticker.

Python-Stock-Info-CLI Get stock info through CLI by passing stock ticker. Installation Use the following command to install the required modules at on

1 Nov 5, 2021

Yts-cli-streamer - A CLI movie streaming client which works on yts.mx API written in python

YTSP It is a CLI movie streaming client which works on yts.mx API written in pyt

1 Feb 5, 2022

Keybase-cli - Keybase docker container that exposes the keybase CLI and some common commands such as getting files or loading github action secrets

keybase-cli Keybase docker container that exposes the keybase CLI and some commo

4 Aug 4, 2022

[WIP] An ani-cli-like CLI tool for movies and web series

mov-cli A cli to browse and watch movies. Installation This project is a work in progress. However, you can try it out python git clone https://github

166 Dec 30, 2022

Customisable pharmacokinetic model accessible via bash CLI allowing for variable dose calculations as well as intravenous and subcutaneous administration calculations

Pharmacokinetic Modelling Group Project A PharmacoKinetic (PK) modelling function for analysis of injected solute dynamics over time, developed by Gro

1 Oct 24, 2021


Standalone Tailwind CSS CLI, installable via pip

topalias - Linux alias generator from bash/zsh command history with statistics, written on Python.

eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.

Salesforce object access auditor

Access hacksec.in from your command-line