Overview

pypinfo: View PyPI download statistics with ease.


pypinfo is a simple CLI to access PyPI download statistics via Google's BigQuery.

Installation

pypinfo is distributed on PyPI as a universal wheel, is available on Linux, macOS, and Windows, and supports Python 3.6+.

This is relatively painless, I swear.

Create project

  1. Go to https://bigquery.cloud.google.com.

  2. Sign up if you haven't already. The first TB of queried data each month is free. Each additional TB is $5.

  3. Go to https://console.developers.google.com/cloud-resource-manager and click CREATE PROJECT if you don't already have one:

    https://user-images.githubusercontent.com/1324225/47172949-6f4ea880-d315-11e8-8587-8b8117efeae9.png
  4. This takes you to https://console.developers.google.com/projectcreate. Fill out the form and click CREATE. Any name is fine, but I recommend you choose something to do with PyPI like pypinfo. This way you know what the project is designated for:

    https://user-images.githubusercontent.com/1324225/47173020-986f3900-d315-11e8-90ab-4b2ecd85b88e.png
  5. The next page should show your new project. If not, reload the page and select from the top menu:

    https://user-images.githubusercontent.com/1324225/47173170-0b78af80-d316-11e8-879e-01f34e139b80.png

Enable BigQuery API

  1. Go to https://console.cloud.google.com/apis/api/bigquery-json.googleapis.com/overview and make sure the correct project is chosen using the drop-down on top. Click the ENABLE button:

    https://user-images.githubusercontent.com/1324225/47173408-a6718980-d316-11e8-94c2-a17ff54fc389.png
  2. After enabling, click CREATE CREDENTIALS:

    https://user-images.githubusercontent.com/1324225/47173432-bc7f4a00-d316-11e8-8152-6a0e6cfab70f.png
  3. Choose the "BigQuery API" and "No, I'm not using them":

    https://user-images.githubusercontent.com/1324225/47173510-ec2e5200-d316-11e8-8508-2bfbb8f6b02f.png
  4. Fill in a name and select the role "BigQuery User" (if "BigQuery User" is not an option in the list, wait 15-20 minutes and try creating the credentials again), then select a JSON key:

    https://user-images.githubusercontent.com/1324225/47173576-18e26980-d317-11e8-8bfe-e4775d965e32.png
  5. Click continue and the JSON will download to your computer. Note the download location. Move the file wherever you want:

    https://user-images.githubusercontent.com/1324225/47173614-331c4780-d317-11e8-9ed2-fc76557a2bf6.png
  1. pip install pypinfo
  2. pypinfo --auth path/to/your_credentials.json, or set an environment variable GOOGLE_APPLICATION_CREDENTIALS that points to the file.
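The two authentication paths above can be sketched as a simple resolution order (illustrative only; resolve_credentials is a hypothetical helper, not part of pypinfo's API):

```python
import os

def resolve_credentials(explicit_path=None):
    """Return the credentials file to use: an explicit --auth path wins,
    otherwise fall back to the GOOGLE_APPLICATION_CREDENTIALS variable."""
    if explicit_path:
        return explicit_path
    return os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/tmp/default.json"
print(resolve_credentials("my_credentials.json"))  # explicit path wins
print(resolve_credentials())                       # falls back to the env var
```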

Usage

$ pypinfo
Usage: pypinfo [OPTIONS] [PROJECT] [FIELDS]... COMMAND [ARGS]...

  Valid fields are:

  project | version | file | pyversion | percent3 | percent2 | impl | impl-version |

  openssl | date | month | year | country | installer | installer-version |

  setuptools-version | system | system-release | distro | distro-version | cpu |

  libc | libc-version

Options:
  -a, --auth TEXT         Path to Google credentials JSON file.
  --run / --test          --test simply prints the query.
  -j, --json              Print data as JSON, with keys `rows` and `query`.
  -i, --indent INTEGER    JSON indentation level.
  -t, --timeout INTEGER   Milliseconds. Default: 120000 (2 minutes)
  -l, --limit TEXT        Maximum number of query results. Default: 10
  -d, --days TEXT         Number of days in the past to include. Default: 30
  -sd, --start-date TEXT  Must be negative or YYYY-MM[-DD]. Default: -31
  -ed, --end-date TEXT    Must be negative or YYYY-MM[-DD]. Default: -1
  -m, --month TEXT        Shortcut for -sd & -ed for a single YYYY-MM month.
  -w, --where TEXT        WHERE conditional. Default: file.project = "project"
  -o, --order TEXT        Field to order by. Default: download_count
  --all                   Show downloads by all installers, not only pip.
  -pc, --percent          Print percentages.
  -md, --markdown         Output as Markdown.
  -v, --verbose           Print debug messages to stderr.
  --version               Show the version and exit.
  -h, --help              Show this message and exit.

pypinfo accepts 0 or more options, followed by exactly 1 project, followed by 0 or more fields. By default only the last 30 days are queried. Let's take a look at some examples!

Tip: If queries result in NoneType errors, increase the --timeout value.

Downloads for a project

$ pypinfo requests
Served from cache: False
Data processed: 2.83 GiB
Data billed: 2.83 GiB
Estimated cost: $0.02

| download_count |
| -------------- |
|    116,353,535 |
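The Estimated cost line is easy to sanity-check: BigQuery on-demand pricing is $5 per TiB billed, and rounding up to the next cent reproduces the figures in these examples. A back-of-envelope check, not pypinfo's actual code:

```python
import math

TIB = 2 ** 40
PRICE_PER_TIB = 5.0  # USD, BigQuery on-demand pricing

def estimated_cost(bytes_billed):
    """Dollar cost of a query, rounded up to the next cent."""
    return math.ceil(bytes_billed / TIB * PRICE_PER_TIB * 100) / 100

print(estimated_cost(2.83 * 2 ** 30))  # 0.02, matching the 2.83 GiB query above
```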

All downloads

$ pypinfo ""
Served from cache: False
Data processed: 116.15 GiB
Data billed: 116.15 GiB
Estimated cost: $0.57

| download_count |
| -------------- |
|  8,642,447,168 |

Downloads for a project by Python version

$ pypinfo django pyversion
Served from cache: False
Data processed: 967.33 MiB
Data billed: 968.00 MiB
Estimated cost: $0.01

| python_version | download_count |
| -------------- | -------------- |
| 3.8            |      1,735,967 |
| 3.6            |      1,654,871 |
| 3.7            |      1,326,423 |
| 2.7            |        876,621 |
| 3.9            |        524,570 |
| 3.5            |        258,609 |
| 3.4            |         12,769 |
| 3.10           |          3,050 |
| 3.3            |            225 |
| 2.6            |            158 |
| Total          |      6,393,263 |
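As a sanity check, the rows above can be totaled and split by major version by hand (the percent3 field computes the same share server-side):

```python
# Download counts from the table above, keyed by Python version.
downloads = {
    "3.8": 1_735_967, "3.6": 1_654_871, "3.7": 1_326_423,
    "2.7": 876_621, "3.9": 524_570, "3.5": 258_609,
    "3.4": 12_769, "3.10": 3_050, "3.3": 225, "2.6": 158,
}

total = sum(downloads.values())
python3 = sum(n for v, n in downloads.items() if v.startswith("3"))
print(total)                            # 6393263, matching the Total row
print(round(100 * python3 / total, 1))  # 86.3 -- the Python 3 share
```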

All downloads by country code

$ pypinfo "" country
Served from cache: False
Data processed: 150.40 GiB
Data billed: 150.40 GiB
Estimated cost: $0.74

| country | download_count |
| ------- | -------------- |
| US      |  6,614,473,568 |
| IE      |    336,037,059 |
| IN      |    192,914,402 |
| DE      |    186,968,946 |
| NL      |    182,691,755 |
| None    |    141,753,357 |
| BE      |    111,234,463 |
| GB      |    109,539,219 |
| SG      |    106,375,274 |
| FR      |     86,036,896 |
| Total   |  8,068,024,939 |

Downloads for a project by system and distribution

$ pypinfo cryptography system distro
Served from cache: False
Data processed: 2.52 GiB
Data billed: 2.52 GiB
Estimated cost: $0.02

| system_name | distro_name                     | download_count |
| ----------- | ------------------------------- | -------------- |
| Linux       | Ubuntu                          |     19,524,538 |
| Linux       | Debian GNU/Linux                |     11,662,104 |
| Linux       | Alpine Linux                    |      3,105,553 |
| Linux       | Amazon Linux AMI                |      2,427,975 |
| Linux       | Amazon Linux                    |      2,374,869 |
| Linux       | CentOS Linux                    |      1,955,181 |
| Windows     | None                            |      1,522,069 |
| Linux       | CentOS                          |        568,370 |
| Darwin      | macOS                           |        489,859 |
| Linux       | Red Hat Enterprise Linux Server |        296,858 |
| Total       |                                 |     43,927,376 |

Most popular projects in the past year

$ pypinfo --days 365 "" project
Served from cache: False
Data processed: 1.69 TiB
Data billed: 1.69 TiB
Estimated cost: $8.45

| project         | download_count |
| --------------- | -------------- |
| urllib3         |  1,382,528,406 |
| six             |  1,172,798,441 |
| botocore        |  1,053,169,690 |
| requests        |    995,387,353 |
| setuptools      |    992,794,567 |
| certifi         |    948,518,394 |
| python-dateutil |    934,709,454 |
| idna            |    929,781,443 |
| s3transfer      |    877,565,186 |
| chardet         |    854,744,674 |
| Total           | 10,141,997,608 |

Downloads between two YYYY-MM-DD dates

$ pypinfo --start-date 2018-04-01 --end-date 2018-04-30 setuptools
Served from cache: False
Data processed: 571.37 MiB
Data billed: 572.00 MiB
Estimated cost: $0.01

| download_count |
| -------------- |
|      8,972,826 |

Downloads between two YYYY-MM dates

  • A yyyy-mm --start-date defaults to the first day of the month
  • A yyyy-mm --end-date defaults to the last day of the month
$ pypinfo --start-date 2018-04 --end-date 2018-04 setuptools
Served from cache: False
Data processed: 571.37 MiB
Data billed: 572.00 MiB
Estimated cost: $0.01

| download_count |
| -------------- |
|      8,972,826 |
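The yyyy-mm expansion rule above (first day for --start-date, last day for --end-date) can be sketched with the standard library; month_bounds is a hypothetical helper, not part of pypinfo:

```python
import calendar

def month_bounds(yyyy_mm):
    """Expand 'YYYY-MM' to its first and last day as 'YYYY-MM-DD' strings."""
    year, month = map(int, yyyy_mm.split("-"))
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    return f"{year:04d}-{month:02d}-01", f"{year:04d}-{month:02d}-{last_day:02d}"

print(month_bounds("2018-04"))  # ('2018-04-01', '2018-04-30')
```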

Downloads for a single YYYY-MM month

$ pypinfo --month 2018-04 setuptools
Served from cache: False
Data processed: 571.37 MiB
Data billed: 572.00 MiB
Estimated cost: $0.01

| download_count |
| -------------- |
|      8,972,826 |

Percentage of Python 3 downloads of the top 100 projects in the past year

Let's use --test to only see the query instead of sending it.

$ pypinfo --test --days 365 --limit 100 "" project percent3
SELECT
  file.project as project,
  ROUND(100 * SUM(CASE WHEN REGEXP_EXTRACT(details.python, r"^([^\.]+)") = "3" THEN 1 ELSE 0 END) / COUNT(*), 1) as percent_3,
  COUNT(*) as download_count,
FROM `bigquery-public-data.pypi.file_downloads`
WHERE timestamp BETWEEN TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -366 DAY) AND TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY)
  AND details.installer.name = "pip"
GROUP BY
  project
ORDER BY
  download_count DESC
LIMIT 100
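The percent_3 column relies on REGEXP_EXTRACT(details.python, r"^([^\.]+)") to pull out the major Python version; the equivalent extraction in Python looks like this:

```python
import re

def major_version(python_version):
    """Everything before the first dot, mirroring the BigQuery regex above."""
    match = re.match(r"^([^.]+)", python_version or "")
    return match.group(1) if match else None

versions = ["3.8.2", "2.7.18", "3.10.0"]
print([major_version(v) for v in versions])  # ['3', '2', '3']
share_py3 = sum(major_version(v) == "3" for v in versions) / len(versions)
print(round(100 * share_py3, 1))  # 66.7
```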

Credits

Changelog

Important changes are emphasized.

Unreleased

19.0.0

  • Update dataset to the new Google-hosted location

18.0.1

  • Fix usage of date ranges

18.0.0

  • Use the clustered data table and standard SQL for lower query costs

17.0.0

  • Add support for libc & libc-version fields

16.0.2

  • Update TinyDB and Tinyrecord dependencies for compatibility

16.0.1

  • Pin TinyDB<4; Tinyrecord does not yet support TinyDB v4

16.0.0

  • Allow yyyy-mm[-dd] --start-date and --end-date:
    • A yyyy-mm --start-date defaults to the first day of the month
    • A yyyy-mm --end-date defaults to the last day of the month
  • Add --month as a shortcut to --start-date and --end-date for a single yyyy-mm month
  • Add --verbose option to print credentials location
  • Update installation instructions
  • Enforce black code style

15.0.0

  • Allow yyyy-mm-dd dates
  • Add --all option, default to only showing downloads via pip
  • Add download total row

14.0.0

  • Added new file field!

13.0.0

  • Added last_update JSON key, which is a UTC timestamp.

12.0.0

  • Breaking: JSON output is now a mapping with keys rows, which holds all the data that was previously output, and query, which holds relevant metadata.
  • Increased the resolution of percentages.

11.0.0

  • Fixed JSON output.

10.0.0

  • Fixed custom field ordering.

9.0.0

  • Added new BigQuery usage stats.
  • Lowered the default number of results to 10 from 20.
  • Updated examples.
  • Fixed table formatting regression.

8.0.0

  • Updated google-cloud-bigquery dependency.

7.0.0

  • Output table is now in Markdown format for easy copying to GitHub issues and PRs.

6.0.0

  • Updated google-cloud-bigquery dependency.

5.0.0

  • Numeric output (non-json) is now prettier (thanks hugovk)
  • You can now filter results for only pip installs with the --pip flag (thanks hugovk)

4.0.0

3.0.1

  • Fix: project names are now normalized to adhere to PEP 503.

3.0.0

  • Breaking: --json option is now just a flag and prints output as prettified JSON.

2.0.0

  • Added --json path option.

1.0.0

  • Initial release
Comments
  • Exact definition for the download count

    Exact definition for the download count

    Hi There,

    thanks for this package, very helpful. It's unclear to me exactly what is being output by this tool. Is it a sum of download counts from various sources? How does it differ from the "pip only" option -p? Is there a way to get unique download counts, etc.? Some more details would be helpful.

    The default output for my packages is almost 6-10 times higher than if I use the --pip option - any idea why that is? I have only ever recommended installing my package via pip install. Although there are some devs who clone and install locally outside pip, that number is likely very small. So the --pip count should closely match the default count (unless there is a lot more going on that I don't understand).

    Also, if I would like to estimate "usage" (which is a higher bar than downloads), would it make sense to use this data?

    Thanks for your help.

    opened by raamana 15
  • 'Client' object has no attribute 'run_sync_query'

    'Client' object has no attribute 'run_sync_query'

    First of all, thank you so much for building this package. I'm excited to start using it, but it has been crashing for me on every query run:

    coding@Aarons-MBP pypinfo (master)$ pypinfo django date
    Traceback (most recent call last):
      File "/usr/local/bin/pypinfo", line 11, in <module>
        load_entry_point('pypinfo==5.0.0', 'console_scripts', 'pypinfo')()
      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 722, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 697, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1043, in invoke
        return Command.invoke(self, ctx)
      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 895, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 535, in invoke
        return callback(*args, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/pypinfo/cli.py", line 89, in pypinfo
        query = client.run_sync_query(built_query)
    AttributeError: 'Client' object has no attribute 'run_sync_query'
    

    I tried python2, updating gcloud, etc. to no avail.

    opened by aaronjanse 14
  • Try to filter out mirror and bot downloads by default

    Try to filter out mirror and bot downloads by default

    Hi,

    I tried your project and think it's really cool. I maintain a small project with few downloads. When I queried the dataset for the last year it said 20k downloads. This most likely includes spiders and PyPI mirrors. For this to be useful for smaller projects, it would be great to try and filter these things out by default.

    I, for instance, have downloads from Python version 1.17, which does not seem right. The majority of my downloads come from the system None (which I assume happens when the package is downloaded over HTTP by clicking the link on the website). Maybe a baseline could be computed based on the minimum number of downloads for any project during the same time interval?

    opened by runfalk 14
  • google.api_core.exceptions.BadRequest: 400 FROM clause with table wildcards matches no table

    google.api_core.exceptions.BadRequest: 400 FROM clause with table wildcards matches no table

    Hi Everyone, I am getting this error despite being on the latest version:

    Installing collected packages: pypinfo
    Successfully installed pypinfo-19.0.0
    
    WARNING: You are using pip version 21.1.2; however, version 21.2.4 is available.
    You should consider upgrading via the '/Users/Reddy/anaconda3/envs/py36/bin/python -m pip install --upgrade pip' command.
    
    (base) $ 16:37:33 Quark doc >>  pypinfo confounds
    Traceback (most recent call last):
      File "/Users/Reddy/anaconda3/bin/pypinfo", line 11, in <module>
        sys.exit(pypinfo())
      File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/click/core.py", line 764, in __call__
        return self.main(*args, **kwargs)
      File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/click/core.py", line 717, in main
        rv = self.invoke(ctx)
      File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/click/core.py", line 1114, in invoke
        return Command.invoke(self, ctx)
      File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/click/core.py", line 956, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/click/core.py", line 555, in invoke
        return callback(*args, **kwargs)
      File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/pypinfo/cli.py", line 150, in pypinfo
        query_rows = query_job.result(timeout=timeout // 1000)
      File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/google/cloud/bigquery/job.py", line 2762, in result
        super(QueryJob, self).result(timeout=timeout)
      File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/google/cloud/bigquery/job.py", line 703, in result
        return super(_AsyncJob, self).result(timeout=timeout)
      File "/Users/Reddy/anaconda3/lib/python3.6/site-packages/google/api_core/future/polling.py", line 127, in result
        raise self._exception
    google.api_core.exceptions.BadRequest: 400 FROM clause with table wildcards matches no table
    (base) $ 16:37:43 Quark doc >>
    

    appreciate your help or fix!

    Originally posted by @raamana in https://github.com/ofek/pypinfo/issues/114#issuecomment-905857544

    opened by hugovk 13
  • Target for "Generating a Private Key" is missing

    Target for "Generating a Private Key" is missing

    Hi there. The installation instructions in your README includes the following link in step 5:

    https://cloud.google.com/storage/docs/authentication#generating-a-private-key

    This target does not appear to exist. As such, I am unsure how to "create credentials in JSON format."

    Thanks, Tom

    opened by tomduck 11
  • wrong installation hints

    wrong installation hints

    This is relatively painless, I swear.

    Well...

    During creation, choose BigQuery User as role.

    First off, which of

    • API key
    • OAuth client ID
    • Service account key

    do I choose? I suppose Service account key, since this is the only one that takes me to another screen with role in it. There's no such role as BigQuery User, however.

    opened by nschloe 8
  • No matching signature for function DATE_ADD for argument types: TIMESTAMP, INTERVAL INT64 DATE_TIME_PART

    No matching signature for function DATE_ADD for argument types: TIMESTAMP, INTERVAL INT64 DATE_TIME_PART

    Hi,

    After installation following instructions given in the README.md, I tried a simplest call:

    pypinfo requests
    

    Unfortunately getting following exception:

    ...
      File "/Users/christianr/.venv/sett/lib/python3.9/site-packages/pypinfo/cli.py", line 166, in pypinfo
        query_rows = query_job.result(timeout=timeout // 1000)
      File "/Users/christianr/.venv/sett/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py", line 1160, in result
        super(QueryJob, self).result(retry=retry, timeout=timeout)
      File "/Users/christianr/.venv/sett/lib/python3.9/site-packages/google/cloud/bigquery/job/base.py", line 631, in result
        return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
      File "/Users/christianr/.venv/sett/lib/python3.9/site-packages/google/api_core/future/polling.py", line 134, in result
        raise self._exception
    google.api_core.exceptions.BadRequest: 400 No matching signature for function DATE_ADD for argument types: TIMESTAMP, INTERVAL INT64 DATE_TIME_PART. Supported signature: DATE_ADD(DATE, INTERVAL INT64 DATE_TIME_PART) at [5:25]
    
    (job ID: 947e6084-e5e6-4bb5-ae12-bb8faad8ec0b)
    
                                                     -----Query Job SQL Follows-----
    
        |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |
       1:SELECT
       2:  FORMAT_TIMESTAMP("%Y", timestamp) as download_year,
       3:  COUNT(*) as download_count,
       4:FROM `the-psf.pypi.file_downloads`
       5:WHERE timestamp BETWEEN DATE_ADD(CURRENT_TIMESTAMP(), INTERVAL -1826 DAY) AND DATE_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY)
       6:  AND file.project = "blist"
       7:  AND details.installer.name = "pip"
       8:GROUP BY
       9:  download_year
      10:ORDER BY
      11:  download_count DESC
      12:LIMIT 10
        |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |
    
    

    Using the SQL query in the BigQuery SQL workspace pops up the same error message if I use the public PyPI download statistics dataset.

    My environment:

    pypinfo, version 18.0.0
    Python 3.9.1
    
    opened by ribeaud 7
  • perf: use clustered table and standard SQL for lower query costs

    perf: use clustered table and standard SQL for lower query costs

    By using the clustered data table, performance is improved because BigQuery can skip data that doesn't match the desired project.

    Tested locally:

    $ pypinfo google-cloud-bigquery
    Served from cache: False
    Data processed: 740.43 MiB
    Data billed: 741.00 MiB
    Estimated cost: $0.01
    
    | download_count |
    | -------------- |
    |     10,149,146 |
    

    Tested locally with all supported fields:

    $ pypinfo google-cloud-bigquery project version file pyversion percent3 percent2 impl impl-version openssl date month year country installer installer-version setuptools-version system system-release distro distro-version cpu libc libc-version
    Served from cache: False
    Data processed: 4.62 GiB
    Data billed: 4.62 GiB
    Estimated cost: $0.03
    
    | project               | version | file                                              | python_version | percent_3 | percent_2 | implementation | impl_version | openssl_version | download_date | download_month | download_year | country | installer_name | installer_version | setuptools_version | system_name | system_release    | distro_name      | distro_version | cpu    | libc_name | libc_version | download_count |
    | --------------------- | ------- | ------------------------------------------------- | -------------- | --------- | --------- | -------------- | ------------ | --------------- | ------------- | -------------- | ------------- | ------- | -------------- | ----------------- | ------------------ | ----------- | ----------------- | ---------------- | -------------- | ------ | --------- | ------------ | -------------- |
    | google-cloud-bigquery | 1.24.0  | google_cloud_bigquery-1.24.0-py2.py3-none-any.whl | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.0l          | 2020-12-15    | 2020-12        |         2,020 | US      | pip            | 20.0.2            | 45.1.0             | Linux       | 5.4.49+           | Debian GNU/Linux |              9 | x86_64 | glibc     | 2.24         |        171,636 |
    | google-cloud-bigquery | 1.24.0  | google_cloud_bigquery-1.24.0-py2.py3-none-any.whl | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.0l          | 2020-12-13    | 2020-12        |         2,020 | US      | pip            | 20.0.2            | 45.1.0             | Linux       | 5.4.49+           | Debian GNU/Linux |              9 | x86_64 | glibc     | 2.24         |        160,152 |
    | google-cloud-bigquery | 2.6.1   | google_cloud_bigquery-2.6.1-py2.py3-none-any.whl  | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.1           | 2021-01-07    | 2021-01        |         2,021 | IN      | pip            | 20.0.2            | 45.2.0             | Linux       | 4.15.0-1092-azure | Ubuntu           |          18.04 | x86_64 | glibc     | 2.27         |        136,529 |
    | google-cloud-bigquery | 2.6.1   | google_cloud_bigquery-2.6.1-py2.py3-none-any.whl  | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.1           | 2021-01-06    | 2021-01        |         2,021 | IN      | pip            | 20.0.2            | 45.2.0             | Linux       | 4.15.0-1092-azure | Ubuntu           |          18.04 | x86_64 | glibc     | 2.27         |        133,927 |
    | google-cloud-bigquery | 1.24.0  | google_cloud_bigquery-1.24.0-py2.py3-none-any.whl | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.0l          | 2020-12-14    | 2020-12        |         2,020 | US      | pip            | 20.0.2            | 45.1.0             | Linux       | 5.4.49+           | Debian GNU/Linux |              9 | x86_64 | glibc     | 2.24         |        130,400 |
    | google-cloud-bigquery | 2.6.1   | google_cloud_bigquery-2.6.1-py2.py3-none-any.whl  | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.1           | 2021-01-05    | 2021-01        |         2,021 | IN      | pip            | 20.0.2            | 45.2.0             | Linux       | 4.15.0-1092-azure | Ubuntu           |          18.04 | x86_64 | glibc     | 2.27         |        129,844 |
    | google-cloud-bigquery | 2.6.1   | google_cloud_bigquery-2.6.1-py2.py3-none-any.whl  | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.1           | 2021-01-09    | 2021-01        |         2,021 | IN      | pip            | 20.0.2            | 45.2.0             | Linux       | 4.15.0-1050-azure | Ubuntu           |          18.04 | x86_64 | glibc     | 2.27         |        126,544 |
    | google-cloud-bigquery | 2.6.1   | google_cloud_bigquery-2.6.1-py2.py3-none-any.whl  | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.1           | 2021-01-04    | 2021-01        |         2,021 | IN      | pip            | 20.0.2            | 45.2.0             | Linux       | 4.15.0-1092-azure | Ubuntu           |          18.04 | x86_64 | glibc     | 2.27         |        124,986 |
    | google-cloud-bigquery | 2.6.1   | google_cloud_bigquery-2.6.1-py2.py3-none-any.whl  | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.1           | 2020-12-30    | 2020-12        |         2,020 | IN      | pip            | 20.0.2            | 45.2.0             | Linux       | 4.15.0-1092-azure | Ubuntu           |          18.04 | x86_64 | glibc     | 2.27         |        121,202 |
    | google-cloud-bigquery | 2.6.1   | google_cloud_bigquery-2.6.1-py2.py3-none-any.whl  | 3.7            | 100.0     | 0.0       | CPython        | 3.7          | 1.1.1           | 2021-01-10    | 2021-01        |         2,021 | IN      | pip            | 20.0.2            | 45.2.0             | Linux       | 4.15.0-1050-azure | Ubuntu           |          18.04 | x86_64 | glibc     | 2.27         |        121,104 |
    | Total                 |         |                                                   |                |           |           |                |              |                 |               |                |               |         |                |                   |                    |             |                   |                  |                |        |           |              |      1,356,324 |
    

    Closes #64

    opened by tswast 7
  • systematic error when I use --auth

    systematic error when I use --auth

    When I run pypinfo --auth key.json with the BigQuery key, I get the following:

    $ pypinfo --auth key.json
    Traceback (most recent call last):
      File "~/.local/bin/pypinfo", line 8, in <module>
        sys.exit(pypinfo())
      File "~/.local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "~/.local/lib/python3.8/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "~/.local/lib/python3.8/site-packages/click/core.py", line 1236, in invoke
        return Command.invoke(self, ctx)
      File "~/.local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "~/.local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "~/.local/lib/python3.8/site-packages/click/decorators.py", line 21, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "~/.local/lib/python3.8/site-packages/pypinfo/cli.py", line 119, in pypinfo
        set_credentials(auth)
      File "~/.local/lib/python3.8/site-packages/pypinfo/db.py", line 25, in set_credentials
        tr.insert({'path': creds_file})
      File "~/.local/lib/python3.8/site-packages/tinyrecord/transaction.py", line 86, in __exit__
        self.record.execute()
      File "~/.local/lib/python3.8/site-packages/tinyrecord/changeset.py", line 21, in execute
        data = self.db._read()
    AttributeError: 'Table' object has no attribute '_read'
    

    There is probably an issue with the tinydb package.

    bug dependencies 
    opened by Pacidus 7
  • Right-align Markdown

    Right-align Markdown

    Simply adding a colon will align columns nicely in Markdown.

    See https://help.github.com/articles/organizing-information-with-tables/#formatting-content-within-your-table

    Builds on #23 by including a unit test for this change (eg.). This makes things much easier to develop without going over your quota!

    Before

    As plaintext:

    | python_version | percent | download_count |
    | -------------- | ------- | -------------- |
    | 2.7            |   59.3% |          3,048 |
    | 3.5            |   20.9% |          1,074 |
    | 3.6            |   14.3% |            734 |
    | 3.4            |    5.4% |            279 |
    | 2.6            |    0.1% |              4 |
    

    As rendered Markdown:

    | python_version | percent | download_count | | -------------- | ------- | -------------- | | 2.7 | 59.3% | 3,048 | | 3.5 | 20.9% | 1,074 | | 3.6 | 14.3% | 734 | | 3.4 | 5.4% | 279 | | 2.6 | 0.1% | 4 |

    After

    As plaintext:

    | python_version | percent | download_count |
    | -------------- | ------: | -------------: |
    | 2.7            |   59.3% |          3,048 |
    | 3.5            |   20.9% |          1,074 |
    | 3.6            |   14.3% |            734 |
    | 3.4            |    5.4% |            279 |
    | 2.6            |    0.1% |              4 |
    

    As rendered Markdown:

    | python_version | percent | download_count | | -------------- | ------: | -------------: | | 2.7 | 59.3% | 3,048 | | 3.5 | 20.9% | 1,074 | | 3.6 | 14.3% | 734 | | 3.4 | 5.4% | 279 | | 2.6 | 0.1% | 4 |

    opened by hugovk 7
  • Disclose BigTable Costs in Readme

    Disclose BigTable Costs in Readme

    After spending a bunch of time setting up the GCP account, and looking at usage, it seems to be very expensive.

    After the free tier is consumed and the 90day $300 credit expires, BigQuery costs $5 per TB of data queried. Querying the simple_requests table which is 49.77 TB would cost $250 per query.

    A query of the distribution_metadata table (18.8 GB) would cost $0.094 per query. Again, very expensive.

    Placing the data in AWS S3 and using S3 Select would cost about the same - still very expensive.

    I think this should be fully disclosed in the readme file

    opened by garymazz 6
  • Non-normalised package name

    Non-normalised package name

    I would like to use the data to correlate with openSUSE package names, which use the 'real' name supplied in setup.py, i.e. not-normalised.

    I've been doing a bit of research at hugovk/top-pypi-packages#4 and https://github.com/psincraian/pepy/issues/128, and the raw data from bigquery can include this, with a very small perf hit.

    The query only needs to change from selecting file.project to substr(max(file.filename),1,LENGTH(file.project)), or more likely to include both.

    Note this does depend on using standard SQL ( https://github.com/ofek/pypinfo/issues/28 ).

    Do we know the cost implications of those changes?

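    For background, the names in file.project are normalised per PEP 503, which is why they can differ from the setup.py spelling that (for sdists, at least) leads the filename. The normalisation rule itself is tiny:

    ```python
    import re

    def normalize(name):
        """PEP 503 project-name normalisation: runs of '-', '_' and '.'
        collapse to a single hyphen, and everything is lowercased."""
        return re.sub(r"[-_.]+", "-", name).lower()
    ```

    Since e.g. both "Zope.Interface" and "zope_interface" normalise to "zope-interface", the normalised column alone cannot recover the original spelling, hence the filename-based trick above.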
    opened by jayvdb 3
  • feature request - trend support

    Hello,

    It would be great to be able to tell project tendencies, e.g. is my monthly usage going up or down? This is already possible by playing with the start and end dates, but what about providing a built-in way that does not require shell scripting?

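    Until something is built in, the workaround amounts to running pypinfo twice with shifted date ranges and comparing the totals. A hypothetical helper (not part of pypinfo) for the comparison step:

    ```python
    def trend(previous, current):
        """Classify month-over-month movement of a download count.

        `previous` and `current` would come from two pypinfo runs over
        adjacent date ranges (e.g. via --start-date/--end-date).
        """
        if previous == 0:
            return "new"
        change = (current - previous) / previous * 100
        if change > 0:
            return f"up {change:.1f}%"
        if change < 0:
            return f"down {abs(change):.1f}%"
        return "flat"
    ```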
    opened by gaborbernat 1
  • How to check current BigQuery usage

    Whilst making https://github.com/ofek/pypinfo/pull/29 I got this:

        raise exceptions.from_http_response(response)
    google.api_core.exceptions.Forbidden: 403 GET https://www.googleapis.com/bigquery/v2/projects/pypinfo-hugovk/queries/<snip>?maxResults=0&timeoutMs=10000: Quota exceeded: Your project exceeded quota for free query bytes scanned. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
    

    I can't find how much of my free monthly TB I've used up in the Google console.

    After some digging in the console I found this:

    [screenshot]

    https://console.cloud.google.com/home/activity?project=pypinfo-hugovk&authuser=1

    Linking to this:

    [screenshot]

    https://console.cloud.google.com/iam-admin/quotas?project=pypinfo-hugovk&authuser=1

    Which isn't very informative -- all zeroes and dashes!

    I get the error after doing several pypinfo --percent --pip pypinfo pyversion, but pypinfo --percent --pip -d 1 pypinfo is still fine.

    Any idea what "QUERY-MBYTES-FOR-UNBILLED-PROJECTS-per-project" really means?

    And where to check the monthly 1TB quota?

    Thanks!

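    One way to track this client-side rather than hunting through the console: BigQuery reports each job's total_bytes_processed, and a dry run (QueryJobConfig(dry_run=True) in the google-cloud-bigquery client) reports it without consuming quota. Tallying those numbers against the free tier is then simple bookkeeping (constants and names below are mine, not pypinfo's):

    ```python
    FREE_TIER_BYTES = 10 ** 12  # the free tier: 1 TB of scanned bytes per month

    def free_quota_left(bytes_per_query):
        """Free-tier bytes remaining after the given list of query scan sizes."""
        return max(0, FREE_TIER_BYTES - sum(bytes_per_query))
    ```

    Once this hits zero, further queries are billed at $5/TB, which matches the 403 "Quota exceeded ... free query bytes scanned" error above.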
    opened by hugovk 6
Owner
Ofek Lev
I like developing beautiful APIs.
Command-line interface to PyPI Stats API to get download stats for Python packages

pypistats Python 3.6+ interface to PyPI Stats API to get aggregate download statistics on Python packages on the Python Package Index without having t

Hugo van Kemenade 140 Jan 3, 2023
Python CLI vm manager for remote access of docker images via noVNC

vmman is a tool to quickly boot and view docker-based VMs running on a linux server through noVNC without ssh tunneling on another network.

UCSD Engineers for Exploration 1 Nov 29, 2021
A simple weather tool. I made this as a way for me to learn Python, API, and PyPi packaging.

A simple weather tool. I made this as a way for me to learn Python, API, and PyPi packaging.

Clint E. 105 Dec 31, 2022
Install python modules from pypi from a previous date in history

pip-rewind is a command-line tool that can rewind pypi module versions (given as command-line arguments or read from a requirements.txt file) to a previous date in time.

Amar Paul 4 Jul 3, 2021
CLI utility to search and download torrents from major torrent sites

CLI Torrent Downloader About CLI Torrent Downloader provides convenient and quick way to search torrent magnet links (and to run associated torrent cl

x0r0x 86 Dec 19, 2022
Tarstats - A simple Python commandline application that collects statistics about tarfiles

A simple Python commandline application that collects statistics about tarfiles.

Kristian Koehntopp 13 Feb 20, 2022
Sink is a CLI tool that allows users to synchronize their local folders to their Google Drives. It is similar to the Git CLI and allows fast and reliable syncs with the drive.

Sink is a CLI synchronisation tool that enables a user to synchronise local system files and folders with their Google Drives. It follows a git C

Yash Thakre 16 May 29, 2022
flora-dev-cli (fd-cli) is command line interface software to interact with flora blockchain.

Install git clone https://github.com/Flora-Network/fd-cli.git cd fd-cli python3 -m venv venv source venv/bin/activate pip install -e . --extra-index-u

null 14 Sep 11, 2022
AWS Interactive CLI - Allows you to execute a complex AWS commands by chaining one or more other AWS CLI dependency

AWS Interactive CLI - Allows you to execute a complex AWS commands by chaining one or more other AWS CLI dependency

Rafael Torres 2 Dec 10, 2021
Python-Stock-Info-CLI: Get stock info through CLI by passing stock ticker.

Python-Stock-Info-CLI Get stock info through CLI by passing stock ticker. Installation Use the following command to install the required modules at on

Ayush Soni 1 Nov 5, 2021
Yts-cli-streamer - A CLI movie streaming client which works on yts.mx API written in python

YTSP It is a CLI movie streaming client which works on yts.mx API written in pyt

null 1 Feb 5, 2022
[WIP]An ani-cli like cli tool for movies and webseries

mov-cli A cli to browse and watch movies. Installation This project is a work in progress. However, you can try it out python git clone https://github

null 166 Dec 30, 2022
Customisable pharmacokinetic model accessible via bash CLI allowing for variable dose calculations as well as intravenous and subcutaneous administration calculations

Pharmacokinetic Modelling Group Project A PharmacoKinetic (PK) modelling function for analysis of injected solute dynamics over time, developed by Gro

null 1 Oct 24, 2021
Standalone Tailwind CSS CLI, installable via pip

Standalone Tailwind CSS CLI, installable via pip Use Tailwind CSS without Node.j

Tim Kamanin 144 Dec 22, 2022
topalias - Linux alias generator from bash/zsh command history with statistics, written on Python.

topalias topalias - Linux alias generator from bash/zsh command history with statistics, written on Python. Features Generate short alias for popular

Sergey Chudakov 38 May 26, 2022
eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.

Command line utilities for tabular data files This is a set of command line utilities for manipulating large tabular data files. Files of numeric and

eBay 1.4k Jan 9, 2023
Salesforce object access auditor

Salesforce object access auditor Released as open source by NCC Group Plc - https://www.nccgroup.com/ Developed by Jerome Smith @exploresecurity (with

NCC Group Plc 90 Sep 19, 2022
Access hacksec.in from your command-line

Access hacksec.in from your command-line

hacksec.in 3 Oct 26, 2022