A generator library for concise, unambiguous and URL-safe UUIDs.

Stavros Korokithakis

Last update: Dec 31, 2022

Overview

Description

shortuuid is a simple Python library that generates concise, unambiguous, URL-safe UUIDs.

Often, one needs non-sequential IDs in places where users will see them, and those IDs should be as concise and easy to use as possible. shortuuid solves this problem by generating UUIDs with Python's built-in uuid module and then translating them to base57, using lowercase and uppercase letters and digits while leaving out similar-looking characters such as l, 1, I, O and 0.
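As a rough illustration of the translation step described above, the sketch below encodes a UUID's 128-bit integer in base 57 using only the standard library. The helper names int_to_base57 and base57_to_int are hypothetical, the digit order is assumed to be most-significant-first (as the compatibility note describes for 1.0.0 and later), and this is not the library's own code.

```python
import math
import uuid

# Default alphabet as reported by get_alphabet(): 57 characters,
# with the look-alikes l, 1, I, O and 0 left out.
ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Characters needed to hold any 128-bit value in this base (22 for base 57).
LENGTH = math.ceil(128 / math.log2(len(ALPHABET)))


def int_to_base57(number: int) -> str:
    """Encode a non-negative integer, most significant digit first."""
    digits = []
    while number:
        number, remainder = divmod(number, len(ALPHABET))
        digits.append(ALPHABET[remainder])
    # Pad with the "zero" character so every ID has the same length.
    digits.extend(ALPHABET[0] * (LENGTH - len(digits)))
    return "".join(reversed(digits))


def base57_to_int(text: str) -> int:
    """Inverse of int_to_base57."""
    number = 0
    for char in text:
        number = number * len(ALPHABET) + ALPHABET.index(char)
    return number


u = uuid.uuid4()
short = int_to_base57(u.int)
# The short form round-trips back to the same UUID.
assert uuid.UUID(int=base57_to_int(short)) == u
```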

Installation

To install shortuuid you need:

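Python 3.5 or later for shortuuid 1.0.0 and newer (that release dropped support for earlier versions). Older releases ran on Python 2.5 or later in the 2.x line (earlier than 2.6 not tested) and on 3.x.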

Once that requirement is met, you have several installation options:

With pip (preferred), do pip install shortuuid.
With setuptools, do easy_install shortuuid.
To install the source, download it from https://github.com/stochastic-technologies/shortuuid and do python setup.py install.

Usage

To use shortuuid, just import it in your project like so:

>>> import shortuuid

You can then generate a short UUID:

>>> shortuuid.uuid()
'vytxeTZskVKR7C7WgdSP3d'

If you prefer a version 5 UUID, you can pass a name (a DNS name or a URL) to the call. The matching namespace (uuid.NAMESPACE_DNS or uuid.NAMESPACE_URL) will then be used for the resulting UUID:

>>> shortuuid.uuid(name="example.com")
'wpsWLdLt9nscn2jbTD3uxe'
>>> shortuuid.uuid(name="http://example.com")
'c8sh5y9hdSMS6zVnrvf53T'
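The namespace choice can be pictured with the standard uuid module alone. The sketch below assumes the rule is "names that start with http:// or https:// are URLs, everything else is a DNS name". That rule and the helper names pick_namespace and named_uuid are illustrative assumptions, not a statement of how the library decides.

```python
import uuid


def pick_namespace(name: str) -> uuid.UUID:
    """Choose a UUID v5 namespace from the shape of the name (assumed rule)."""
    if name.lower().startswith(("http://", "https://")):
        return uuid.NAMESPACE_URL
    return uuid.NAMESPACE_DNS


def named_uuid(name: str) -> uuid.UUID:
    """Build a deterministic version 5 UUID for the given name."""
    return uuid.uuid5(pick_namespace(name), name)


# The same name always yields the same UUID.
assert named_uuid("example.com") == named_uuid("example.com")
# A URL and a bare domain hash to different UUIDs.
assert named_uuid("http://example.com") != named_uuid("example.com")
```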

You can also generate a cryptographically secure random string (using os.urandom(), internally) with:

>>> shortuuid.ShortUUID().random(length=22)
'RaF56o2r58hTKT7AYS9doj'
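For comparison, a similar random string can be produced with the standard library alone. This is a minimal sketch that uses secrets.choice (which draws from the operating system's secure random source) over the default alphabet. The function name random_id is hypothetical and the code is not the library's implementation.

```python
import secrets

# Default shortuuid alphabet, as shown by get_alphabet().
ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def random_id(length: int = 22) -> str:
    """Return a random string of the given length drawn from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(random_id())    # 22 random characters
print(random_id(10))  # a shorter string
```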

To see the alphabet that is being used to generate new UUIDs:

>>> shortuuid.get_alphabet()
'23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

If you want to use your own alphabet to generate UUIDs, use set_alphabet():

>>> shortuuid.set_alphabet("aaaaabcdefgh1230123")
>>> shortuuid.uuid()
'0agee20aa1hehebcagddhedddc0d2chhab3b'

shortuuid will automatically sort and remove duplicates from your alphabet to ensure consistency:

>>> shortuuid.get_alphabet()
'0123abcdefgh'
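The normalization shown above amounts to "deduplicate, then sort". A small sketch follows, with hypothetical helper names, a guard against one-symbol alphabets that is an assumption of this example, and a length formula that assumes IDs are sized to hold 128 bits:

```python
import math


def normalize_alphabet(alphabet: str) -> str:
    """Remove duplicate symbols and sort the rest."""
    unique = sorted(set(alphabet))
    if len(unique) < 2:
        # Assumed guard: a base needs at least two distinct symbols.
        raise ValueError("alphabet needs at least two distinct symbols")
    return "".join(unique)


def encoded_length(alphabet: str, bits: int = 128) -> int:
    """Characters needed to represent `bits` bits in the alphabet's base."""
    return math.ceil(bits / math.log2(len(alphabet)))


alphabet = normalize_alphabet("aaaaabcdefgh1230123")
print(alphabet)                  # 0123abcdefgh
print(encoded_length(alphabet))  # 36, the length of the ID shown above
```

A smaller alphabet carries fewer bits per character, which is why the custom-alphabet ID above is longer than the default 22 characters.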

If the default 22 characters are too long for you, you can get shorter IDs by just truncating the string to the desired length. The IDs won't be universally unique any longer, but the probability of a collision will still be very low.
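How low the collision probability is depends on the truncated length and on how many IDs you generate. The sketch below uses the standard birthday approximation and assumes every character is uniformly random over the 57-symbol alphabet, which is only approximately true for truncated UUID encodings. The function name is hypothetical.

```python
import math


def collision_probability(num_ids: int, length: int, alphabet_size: int = 57) -> float:
    """Approximate chance that at least two of num_ids random IDs collide."""
    space = alphabet_size ** length
    # Birthday approximation: p = 1 - exp(-n(n-1) / (2N)).
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * space))


for length in (6, 8, 12, 22):
    p = collision_probability(1_000_000, length)
    print(f"{length:2d} characters, 1,000,000 IDs: p = {p:.3g}")
```

Under these assumptions, one million 8-character IDs collide with a probability somewhat below 1%, while at 6 characters a collision is almost certain.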

To serialize existing UUIDs, use encode() and decode():

>>> import uuid ; u = uuid.uuid4() ; u
UUID('6ca4f0f8-2508-4bac-b8f1-5d1e3da2247a')
>>> s = shortuuid.encode(u) ; s
'cu8Eo9RyrUsV4MXEiDZpLM'
>>> shortuuid.decode(s) == u
True
>>> short = s[:7] ; short
'cu8Eo9R'
>>> h = shortuuid.decode(short) ; h
UUID('00000000-0000-0000-0000-00b8c0b9f952')
>>> shortuuid.decode(shortuuid.encode(h)) == h
True

Class-based usage

If you need to have various alphabets per-thread, you can use the ShortUUID class, like so:

>>> su = shortuuid.ShortUUID(alphabet="01345678")
>>> su.uuid()
'034636353306816784480643806546503818874456'
>>> su.get_alphabet()
'01345678'
>>> su.set_alphabet("21345687654123456")
>>> su.get_alphabet()
'12345678'

Command-line usage

shortuuid provides a simple way to generate a short UUID in a terminal:

$ python3 -m shortuuid
fZpeF6gcskHbSpTgpQCkcJ

(Replace python3 with py if you are using Windows)

Compatibility note

Versions of ShortUUID prior to 1.0.0 generated UUIDs with their MSB last, i.e. reversed. This was later fixed, but if you have some UUIDs stored as a string with the old method, you need to pass legacy=True to decode() when converting your strings back to UUIDs.

That option will go away in the future, so you will want to convert your UUIDs to strings using the new method. This can be done like so:

>>> new_uuid_str = shortuuid.encode(shortuuid.decode(old_uuid_str, legacy=True))
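To make "reversed" concrete, the sketch below reads a string most-significant-digit first (the current order) and treats a legacy string as the same thing read backwards. It uses only the standard library; the helper names are hypothetical and this is a reading of the note above rather than the library's code. The truncated example in the Usage section ('cu8Eo9R' decoding to a UUID ending in 00b8c0b9f952) is consistent with the legacy, least-significant-first reading.

```python
# Default shortuuid alphabet, as shown by get_alphabet().
ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def string_to_int(text: str) -> int:
    """Decode a string whose first character is the most significant digit."""
    number = 0
    for char in text:
        number = number * len(ALPHABET) + ALPHABET.index(char)
    return number


def legacy_string_to_int(text: str) -> int:
    """Decode a pre-1.0.0 style string, whose digits are in reverse order."""
    return string_to_int(text[::-1])


print(hex(legacy_string_to_int("cu8Eo9R")))  # 0xb8c0b9f952
print(hex(string_to_int("cu8Eo9R")))         # a different value
```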

License

shortuuid is distributed under the BSD license.

Comments

LSB-first encoding makes lexicographic ordering hard

Let's say I want to group a number of shortuuids into several non-overlapping partitions and use a string-based comparison that defines each partition, e.g. "aaa" <= shortuuid < "bbb". Since shortuuids are LSB-first, this isn't possible, since 0 corresponds to '2222222222222222222222' and 1 corresponds to '3222222222222222222222'. With UUIDs generated with str(uuid.uuid4()) this would be '00000000-0000-0000-0000-000000000000' for 0 and '00000000-0000-0000-0000-000000000001' for 1, which is useable for lexicographic ordering based on their integer representation. Is there a specific reason why shortuuids are LSB-first? In my case it complicates things, but are there advantages I'm ignoring?

opened by letmaik 22
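To illustrate the ordering point raised in this comment: with most-significant-first digits, fixed-width padding, and an alphabet whose characters are in ascending code point order, string comparison matches integer comparison. The sketch below is standard-library only, with a hypothetical function name.

```python
import random

# Default alphabet; its characters are already in ascending code point order.
ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
LENGTH = 22  # enough base-57 digits for any 128-bit value


def encode_msb_first(number: int) -> str:
    """Fixed-width base-57 encoding, most significant digit first."""
    digits = []
    for _ in range(LENGTH):
        number, remainder = divmod(number, len(ALPHABET))
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


values = [random.getrandbits(128) for _ in range(1000)]
by_string = sorted(encode_msb_first(v) for v in values)
by_integer = [encode_msb_first(v) for v in sorted(values)]
# Lexicographic order of the strings equals numeric order of the integers.
assert by_string == by_integer
```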

ShortUUID should not sort the alphabet

I am attempting to convert a shortuuid generated using node.js and the Flickr base 58 alphabet:

shortid = 'apECAPA6RdHB6HB51FwsdN'
FLICKR_BASE58 = '123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ'
uuid = shortuuid.ShortUUID(alphabet=FLICKR_BASE58).decode(shortid)

Unfortunately this doesn't work:

Traceback (most recent call last):
  File "<ipython-input-54-88d3ac3dd885>", line 3, in <module>
    uuid = shortuuid.ShortUUID(alphabet=FLICKR_BASE58).decode(shortid)
  File "/Users/scotts/.pyenv/versions/python38/lib/python3.8/site-packages/shortuuid/main.py", line 79, in decode
    return _uu.UUID(int=string_to_int(string, self._alphabet))
  File "/Users/scotts/.pyenv/versions/3.8.7/lib/python3.8/uuid.py", line 205, in __init__
    raise ValueError('int is out of range (need a 128-bit value)')
ValueError: int is out of range (need a 128-bit value)

It turns out that ShortUUID is sorting the alphabet before using it. This is incorrect and leads to this failure. It is possible to correctly decode the id like this:

import uuid as _uu
uuid = _uu.UUID(int=string_to_int(shortid, FLICKR_BASE58))

This is a backwards compatibility issue, so switching to an unsorted implementation will probably require a compatibility flag similar to legacy.

opened by snstanton 17
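The effect reported here can be reproduced without the library. The sketch below decodes the same ID most-significant-digit first, once with the Flickr alphabet in its given order and once with the sorted alphabet. The standalone string_to_int is a hypothetical stand-in for the library helper of the same name.

```python
import uuid

FLICKR_BASE58 = "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
SORTED_BASE58 = "".join(sorted(FLICKR_BASE58))  # digits, uppercase, lowercase


def string_to_int(text: str, alphabet: str) -> int:
    """Decode text, most significant digit first, using the given alphabet."""
    number = 0
    for char in text:
        number = number * len(alphabet) + alphabet.index(char)
    return number


shortid = "apECAPA6RdHB6HB51FwsdN"
as_given = string_to_int(shortid, FLICKR_BASE58)
as_sorted = string_to_int(shortid, SORTED_BASE58)

print(as_given < 2**128)       # True: fits in a UUID
print(as_sorted < 2**128)      # False: out of range, matching the traceback
print(uuid.UUID(int=as_given))
```

Sorting moves "a" from position 9 to position 33, so the leading digit alone pushes the value past 128 bits.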

Generating invalid ShortUUIDs

I'm calling shortuuid.uuid() in our unit tests, which run regularly, and I'm seeing it generate some invalid UUIDs, such as 5vHqLc9etTWW3AVrEJvBez.

The library seems to think this is a valid UUID:

>>> shortuuid.decode("5vHqLc9etTWW3AVrEJvBez")
UUID('160427e7-f28e-40ee-9f7a-d473721488c8')

But your website does not: https://shortuuid.com/

I'm also validating this with some Elixir libraries and they are reporting that it's invalid as well.

Any help would be greatly appreciated.

opened by kevinkirkup 14

Added PaddedShortUUID for constant length short uuid strings

Small enough UUID integers can create shorter uuid strings (21 character strings are generated ~1-2% of the time with the default alphabet). When you want a fixed length ID generated, use this PaddedShortUUID class instead.

opened by kevinastone 13
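The share of short strings mentioned above can be estimated directly: an unpadded base-57 encoding needs fewer than 22 characters exactly when the integer is below 57**21. The arithmetic below assumes the UUID's integer value is roughly uniform over the 128-bit range, which is an approximation for real UUIDs.

```python
# Fraction of 128-bit integers whose unpadded base-57 form is under 22 characters.
short_fraction = 57**21 / 2**128
print(f"{short_fraction:.2%}")

# Fraction needing fewer than 21 characters (much rarer).
very_short_fraction = 57**20 / 2**128
print(f"{very_short_fraction:.4%}")
```

Under that assumption the first figure comes out at a little over 2%, in line with the rough estimate given above.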

Add MSB-first to https://shortuuid.com/

I'm not sure who the owner of https://shortuuid.com/ is, but I think this is the right place to post the request.

I'm heavily using shortuuid in my system, so I use https://shortuuid.com/ to convert UUID<->shortuuid. However, I'm using shortuuid v1.0.9, which has MSB-first decoding, while the site uses legacy decoding, so I'm using a string-reverse tool to get the MSB shortuuid.

It would be great if you could add a new field for MSB, or a radio button to switch to MSB decoding, to https://shortuuid.com/

Thanks!

opened by MotasemAghbar 12

shortuuid performance and sorting

I was playing with the Java UUID shortener I wrote several years ago, decided to see what other folks are doing, and came across this library.

Anyway, I have a few critiques.

The algorithm, as I understand it, turns the UUID into one giant 128-bit integer and divmods it, appending the remainder to a string. The most significant part of the UUID is appended first, but within it the most significant digit ends up rightmost. For example, in base 10 "100" would be 001, or in base57 m3.

There are two problems with this approach:

Sorting is no longer the same. That is, if you have a list of UUIDs and a list of corresponding shortuuids, the sort order will not be the same. This is because it follows the bit order, which makes sense for a generic encoding algorithm, but I don't think it's good for a UUID shortener.

The algorithm is very slow on Java. Or at least this implementation is, which appears to be a clone of the golang one, https://github.com/hsingh/java-shortuuid, and mostly follows the Python version.

An easy replacement for the Java one is to use Bitcoin's Base58 and just change the alphabet, which is what I originally used for UUID shortening. However, Base58 divmods byte by byte and does not construct a full 128-bit integer.

Still, Base58 is not that fast compared to, say, Base64, and that is obviously because of the power-of-2 division.

My algorithm is almost as fast as Base64. It works by divmoding longs (64-bit numbers in Java) one at a time. This is fine because a UUID is two 64-bit numbers. Second, there are tricks for divmoding 64-bit numbers very fast even when the divisor is not a power of 2. Third, it appends the most significant digit first, left to right, to preserve lexicographical sort order. Padding is required, but based on my tests only about 2% of UUIDs have fewer than 22 characters (I was too lazy to do the math).

Anyway, before I go set up an open-source repository and try to port it to Python, I wanted to get some feedback. Is this a bad idea?

opened by agentgt 12

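One possible reading of the approach described in this comment, sketched in Python rather than Java: split the UUID into two 64-bit halves and encode each half as 11 fixed-width base-57 digits, most significant first. All names are hypothetical, this is not the commenter's code, and it makes no claim about speed.

```python
import uuid

ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
BASE = len(ALPHABET)
HALF_LENGTH = 11  # 57**11 > 2**64, so 11 digits hold any 64-bit value
MASK64 = (1 << 64) - 1


def encode_half(value: int) -> str:
    """Encode one 64-bit half as exactly HALF_LENGTH digits, MSB first."""
    digits = []
    for _ in range(HALF_LENGTH):
        value, remainder = divmod(value, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_half(text: str) -> int:
    value = 0
    for char in text:
        value = value * BASE + ALPHABET.index(char)
    return value


def encode_uuid(u: uuid.UUID) -> str:
    """High half first, so string order follows UUID integer order."""
    return encode_half(u.int >> 64) + encode_half(u.int & MASK64)


def decode_uuid(text: str) -> uuid.UUID:
    high = decode_half(text[:HALF_LENGTH])
    low = decode_half(text[HALF_LENGTH:])
    return uuid.UUID(int=(high << 64) | low)


u = uuid.uuid4()
assert decode_uuid(encode_uuid(u)) == u
```

The result is still 22 characters, but it is a different encoding from shortuuid's, so the two are not interchangeable.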
Using pkg_resources makes package load really slow

Heya, this change really hurts package loading times 😬 https://github.com/skorokithakis/shortuuid/blob/bcd01ceaf2419d874b8543e6d7c28e2c56da7a91/shortuuid/__init__.py#L13

For example, with our CLI this change has meant an increase in load time from ~0.3 seconds to ~0.8 seconds!

pkg_resources is essentially deprecated, in part, for this very reason; replaced by https://docs.python.org/3/library/importlib.metadata.html and the https://github.com/python/importlib_metadata backport

opened by chrisjsewell 8
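For reference, a version lookup with importlib.metadata (in the standard library since Python 3.8) might look like the sketch below. The helper name get_version and its fallback value are assumptions for illustration, not what the library does.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(distribution: str, default: str = "unknown") -> str:
    """Return the installed version of a distribution, or a default."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed (for example, a source checkout).
        return default


print(get_version("shortuuid"))
```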

Duplicated short uuid

I'm using shortuuid.ShortUUID().random(length=22) for generating short-uuids in a Django application on Heroku.

I have already found a duplicated uuid in 2 cases. I haven't yet been able to reproduce the error on demand.

opened by leomrocha 8

add py.typed to MANIFEST.in, re-export API

Thanks for shortuuid!

This PR:

[x] updates MANIFEST.in so that the py.typed file is included in the sdist on PyPI, useful for downstream re-builders.

[x] populates __all__ in __init__.py to avoid some mypy warnings

[x] allows removing the noqa comments

opened by bollwyvl 7

Unclear how to use UUID5, namespace

My understanding of UUID v5 is you can provide a namespace and a value to get a deterministic hash for that namespace + value.

I see you can provide shortuuid.uuid a name argument, and I see in the code how that maps to uuid5... but it's unclear to me what to do from there.

I see that providing the same name argument produces the same output (as expected!). How do you now combine a namespace and value to get a deterministic output, such that func(namespace, value) produces the same output every time (where namespace remains the same and value changes)?

Perhaps I'm misunderstanding something. Thanks.

opened by connerxyz 7
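With the standard uuid module alone, a func(namespace, value) of the kind asked about here can be sketched as follows. The application namespace, the domain "myapp.example.com" and the function name are all hypothetical, and this does not go through shortuuid's name argument.

```python
import uuid

# Hypothetical application namespace, itself derived deterministically.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "myapp.example.com")


def deterministic_uuid(namespace: uuid.UUID, value: str) -> uuid.UUID:
    """Same namespace and value always give the same version 5 UUID."""
    return uuid.uuid5(namespace, value)


first = deterministic_uuid(APP_NAMESPACE, "user-42")
again = deterministic_uuid(APP_NAMESPACE, "user-42")
other = deterministic_uuid(APP_NAMESPACE, "user-43")

assert first == again  # stable for the same inputs
assert first != other  # changes when the value changes
```

The resulting uuid.UUID can then be shortened with shortuuid.encode(), as shown in the Usage section.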

New release has broken existing URLs

I store UUIDs in my database and render to short UUIDs in the UI, especially in URLs. Last week's new releases have broken all my existing URLs because of the MSB issue.

While the legacy=True flag is available, the promise that it will go away in future is not comforting. There is absolutely nothing I can do to withdraw the thousands of existing URLs I have in circulation (as will be the case for anyone who has used ShortUUID in production).

I'd like the option to continue generating legacy short UUIDs forever. The MSB order is irrelevant anyway in a rendered string.

opened by jace 7

Releases(v1.0.0)

v1.0.0 (Mar 6, 2020)

NOTE: THIS IS A BREAKING RELEASE. See the compatibility note in the README before upgrading.

v1.0.0 (2020-03-06)

Features

Drop support for Python before 3.5. [Stavros Korokithakis]

Add simple command-line interface (#43) [Éric Araujo]

Fixes

Make encode and decode MSB-first (#36) [Keane Nguyen]

Make the URL check more robust (fixes #32) [Stavros Korokithakis]

v0.5.0 (2017-02-19)

Features

Make int_to_string and string_to_int available globally. [Stavros Korokithakis]

v0.5.0 (Feb 20, 2017)

Feature

Make int_to_string and string_to_int available globally

v0.4.3 (Jan 13, 2016)

Make length dynamic based on alphabet size.

Other stuff I don't remember because I wasn't making releases.


Owner

Stavros Korokithakis

I love writing code, making stupid devices and writing about writing code and making stupid devices.

GitHub http://www.stavros.io/

Word-Generator - Generates meaningful words from dictionary with given no. of letters and words.

Meaningful Word Generator Generates meaningful words from dictionary with given no. of letters and words. This might be useful for generating short li

1 Jan 1, 2022

🚩 A simple and clean python banner generator - Banners

12 Oct 9, 2022

A username generator made from French Canadian most common names.

This script is used to generate a username list using the most common first and last names in Quebec in different formats. It can generate some passwords using specific patterns such as Tremblay2020.

5 Nov 26, 2022

Goblin-sim - Procedural fantasy world generator

goblin-sim This project is an attempt to create a procedural goblin fantasy worl

3 May 18, 2022

A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.

Python User Agents user_agents is a Python library that provides an easy way to identify/detect devices like mobile phones, tablets and their capabili

1.3k Dec 22, 2022

Etranslate is a free and unlimited python library for transiting your texts

16 Sep 13, 2022

Build a translation program similar to Google Translate with Python programming language and QT library

google-translate Build a translation program similar to Google Translate with Python programming language and QT library Different parts of the progra

3 Oct 9, 2021

Python library for creating PEG parsers

PyParsing -- A Python Parsing Module Introduction The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the t

1.7k Dec 27, 2022

py-trans is a Free Python library for translate text into different languages.

Free Python library to translate text into different languages.

13 Aug 27, 2022

LazyText is inspired by the idea of lazypredict, a library which helps build a lot of basic models without much code.

LazyText is for text what lazypredict is for numeric data.

13 Nov 4, 2022

Markup is an online annotation tool that can be used to transform unstructured documents into structured formats for NLP and ML tasks, such as named-entity recognition. Markup learns as you annotate in order to predict and suggest complex annotations. Markup also provides integrated access to existing and custom ontologies, enabling the prediction and suggestion of ontology mappings based on the text you're annotating.

146 Dec 18, 2022

🐸 Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is! 🧙‍♀️

5.6k Jan 3, 2023

You can encode and decode base85, ascii85, base64, base32, and base16 with this tool.

8 Dec 20, 2022

StealBit1.1 and earlier strings and config extraction scripts

StealBit1.1 and earlier scripts Use strings_decryptor.py to extract RC4 encrypted strings from a StealBit1.1 sample(s). Use config_extractor.py to ext

5 Dec 29, 2022

Fixes mojibake and other glitches in Unicode text, after the fact.

ftfy: fixes text for you >>> print(fix_encoding("(à¸‡'âŒ£')à¸‡")) (ง'⌣')ง Full documentation: https://ftfy.readthedocs.org Testimonials “My life is li

3.4k Jan 8, 2023

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

Contents Maintainer wanted Introduction Installation Documentation License History Source code Authors Maintainer wanted I am looking for a new mainta

1.2k Dec 16, 2022

Implementation of hashids (http://hashids.org) in Python. Compatible with Python 2 and Python 3

Format Covid values to ASCII-Table (Only for Germany and Austria)

Text to ASCII and ASCII to text