Introduction
This library is mainly for CLI programs that carefully produce output for terminals, or that pretend to be a terminal emulator.
Problem Statement: The printable length of most strings is equal to the number of cells they occupy on the screen, 1 character : 1 cell. However, there are categories of characters that occupy 2 cells (full-wide), and others that occupy 0 cells (zero-width).
Solution: POSIX.1-2001 and POSIX.1-2008 conforming systems provide the wcwidth(3) and wcswidth(3) C functions, whose behavior this Python module's functions precisely copy. These functions return the number of cells a unicode string is expected to occupy.
Installation
The stable version of this package is maintained on PyPI. Install it using pip:
pip install wcwidth
Example
Problem: given the following phrase (Japanese),
>>> text = u'コンニチハ'
Python incorrectly uses the string length of 5 codepoints rather than the printable length of 10 cells, so that when using the rjust function, the output length is wrong:
>>> print(len('コンニチハ'))
5
>>> print('コンニチハ'.rjust(20, '_'))
_______________コンニチハ
By defining our own "rjust" function that uses wcwidth, we can correct this:
>>> def wc_rjust(text, length, padding=' '):
...     from wcwidth import wcswidth
...     return padding * max(0, (length - wcswidth(text))) + text
...
Our Solution uses wcswidth to determine the string length correctly:
>>> from wcwidth import wcswidth
>>> print(wcswidth('コンニチハ'))
10
>>> print(wc_rjust('コンニチハ', 20, '_'))
__________コンニチハ
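The same idea extends to other alignments. The following is a minimal sketch of a centering helper, not part of wcwidth itself. The name wc_center and its measure parameter are hypothetical: measure is a hook for swapping in another width function, and it defaults to wcswidth. The sketch assumes the fill character occupies exactly one cell, and it raises an error when the width function reports -1 rather than padding by a wrong amount.

```python
def wc_center(text, width, fillchar=" ", measure=None):
    """Center text within `width` terminal cells, padding with fillchar.

    `measure` is a hypothetical hook (not a wcwidth API): any callable that
    maps a string to its width in cells. It defaults to wcwidth.wcswidth.
    """
    if measure is None:
        # Imported lazily, as in wc_rjust above.
        from wcwidth import wcswidth as measure
    cells = measure(text)
    if cells < 0:
        # A width of -1 means the text holds a non-printable character.
        raise ValueError("text contains a non-printable character")
    total = max(0, width - cells)
    left = total // 2
    # Any odd cell of padding goes on the right-hand side.
    return fillchar * left + text + fillchar * (total - left)
```

With wcwidth installed, wc_center('コンニチハ', 20, '_') should place five underscores on each side of the phrase, since the phrase measures 10 cells.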
Choosing a Version
Export an environment variable, UNICODE_VERSION. This should be done by terminal emulators, or by developers experimenting with authoring one of their own, from the shell:
$ export UNICODE_VERSION=13.0
If unspecified, the latest version is used. If your Terminal Emulator does not export this variable, you can use the jquast/ucs-detect utility to automatically detect and export it to your shell.
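A program that launches other terminal programs can also set the variable for a single child process instead of exporting it in the shell. The following is a minimal sketch using only the standard library, assuming Python 3.7 or later. The helper name run_with_unicode_version is hypothetical, and the child command here only echoes the variable back to show that it was inherited.

```python
import os
import subprocess
import sys


def run_with_unicode_version(argv, version):
    """Run argv as a child process with UNICODE_VERSION set to `version`."""
    # Copy the current environment and override only this one variable.
    env = dict(os.environ, UNICODE_VERSION=version)
    return subprocess.run(
        argv, env=env, capture_output=True, text=True, check=True
    )


# Stand-in child: a real caller would launch its own terminal program here.
child = [sys.executable, "-c", "import os; print(os.environ['UNICODE_VERSION'])"]
result = run_with_unicode_version(child, "13.0")
print(result.stdout.strip())  # 13.0
```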
wcwidth, wcswidth
Use function wcwidth() to determine the length of a single unicode character, and wcswidth() to determine the length of a string of unicode characters.
Briefly, return values of function wcwidth() are:
- -1: Indeterminate (not printable).
- 0: Does not advance the cursor, such as NULL or Combining.
- 2: Characters of category East Asian Wide (W) or East Asian Full-width (F) which are displayed using two terminal cells.
- 1: All others.
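To make the four cases concrete, here is a rough approximation built only on the standard library's unicodedata module. It is not the library's implementation: wcwidth uses its own generated tables, and the name approx_wcwidth is hypothetical. Treating only general categories Mn and Me as zero-width, and only category Cc as non-printable, are simplifying assumptions, so results can differ from wcwidth() for some characters.

```python
import unicodedata


def approx_wcwidth(char):
    """Rough, stdlib-only stand-in for wcwidth(); see the caveats above."""
    if char == "\x00":
        return 0  # NULL does not advance the cursor
    category = unicodedata.category(char)
    if category == "Cc":
        return -1  # control characters: indeterminate (not printable)
    if category in ("Mn", "Me"):
        return 0  # combining marks (simplifying assumption)
    if unicodedata.east_asian_width(char) in ("W", "F"):
        return 2  # East Asian Wide or Full-width
    return 1  # all others


# One sample per case: narrow, wide, full-width, combining, NULL, control.
for char in ("a", "コ", "Ａ", "\u0301", "\x00", "\x07"):
    print(f"U+{ord(char):04X}", approx_wcwidth(char))
```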
Function wcswidth() simply returns the sum of the wcwidth() values for each character along a string, or -1 if wcwidth() returns -1 for any character in the string.
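The summing rule can be sketched as follows. This is an illustration under stated assumptions, not the library's source: sum_cell_widths is a hypothetical name, the per-character width function is passed in as an argument, and the toy width table exists only for the demonstration. The optional n limits the sum to the first n characters, mirroring the second parameter of wcswidth mentioned in the History section.

```python
def sum_cell_widths(text, char_width, n=None):
    """Sum per-character widths over text[:n]; -1 if any width is negative."""
    total = 0
    for char in text[:n]:
        width = char_width(char)
        if width < 0:
            # One non-printable character makes the whole result -1.
            return -1
        total += width
    return total


# Toy width table for illustration only; real values come from wcwidth().
TOY_WIDTHS = {"コ": 2, "ン": 2, "\u0301": 0, "\x07": -1}


def toy_width(char):
    """Look up a toy width, treating every unlisted character as 1 cell."""
    return TOY_WIDTHS.get(char, 1)


print(sum_cell_widths("コン", toy_width))          # 4
print(sum_cell_widths("e\u0301", toy_width))       # 1
print(sum_cell_widths("ab\x07", toy_width))        # -1
print(sum_cell_widths("ab\x07", toy_width, n=2))   # 2
```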
Full API Documentation at http://wcwidth.readthedocs.org
Developing
Install wcwidth in editable mode:
pip install -e .
Execute unit tests using tox:
tox
Regenerate python code tables from latest Unicode Specification data files:
tox -eupdate
Supplementary tools for browsing and testing terminals for wide unicode characters are found in the bin/ folder of this project's source code. Just be sure to first pip install -r requirements-develop.txt from this project's main folder. For example, an interactive browser for testing:
./bin/wcwidth-browser.py
Uses
This library is used in:
- jquast/blessed: a thin, practical wrapper around terminal capabilities in Python.
- jonathanslenders/python-prompt-toolkit: a library for building powerful interactive command lines in Python.
- dbcli/pgcli: Postgres CLI with autocompletion and syntax highlighting.
- thomasballinger/curtsies: a Curses-like terminal wrapper with a display based on compositing 2d arrays of text.
- selectel/pyte: Simple VTXXX-compatible linux terminal emulator.
- astanin/python-tabulate: Pretty-print tabular data in Python, a library and a command-line utility.
- LuminosoInsight/python-ftfy: Fixes mojibake and other glitches in Unicode text.
- nbedos/termtosvg: Terminal recorder that renders sessions as SVG animations.
- peterbrittain/asciimatics: Package to help people create full-screen text UIs.
Other Languages
- timoxley/wcwidth: JavaScript
- janlelis/unicode-display_width: Ruby
- alecrabbit/php-wcwidth: PHP
- Text::CharWidth: Perl
- bluebear94/Terminal-WCWidth: Perl 6
- mattn/go-runewidth: Go
- emugel/wcwidth: Haxe
- aperezdc/lua-wcwidth: Lua
- joachimschmidt557/zig-wcwidth: Zig
- fumiyas/wcwidth-cjk: LD_PRELOAD override
- joshuarubin/wcwidth9: Unicode version 9 in C
History
- 0.2.0 2020-06-01
  - Enhancement: Unicode version may be selected by exporting the environment variable UNICODE_VERSION, such as 13.0, or 6.3.0. See the jquast/ucs-detect CLI utility for automatic detection.
  - Enhancement: API Documentation is published to readthedocs.org.
  - Updated tables for all Unicode Specifications with files published in a programmatically consumable format, versions 4.1.0 through 13.0.
- 0.1.9 2020-03-22
  - Performance optimization by Avram Lubkin, PR #35.
  - Updated tables to Unicode Specification 13.0.0.
- 0.1.8 2020-01-01
  - Updated tables to Unicode Specification 12.0.0. (PR #30).
- 0.1.7 2016-07-01
  - Updated tables to Unicode Specification 9.0.0. (PR #18).
- 0.1.6 2016-01-08 Production/Stable
  - LICENSE file now included with distribution.
- 0.1.5 2015-09-13 Alpha
  - Bugfix: Resolution of "combining character width" issue: most especially, characters that previously returned -1 now often (correctly) return 0. Resolved by Philip Craig via PR #11.
  - Deprecated: The module path wcwidth.table_comb is no longer available; it has been superseded by module path wcwidth.table_zero.
- 0.1.4 2014-11-20 Pre-Alpha
  - Feature: wcswidth() now determines printable length for (most) combining characters. The developer's tool bin/wcwidth-browser.py is improved to display combining characters when provided the --combining option (Thomas Ballinger and Leta Montopoli PR #5).
  - Feature: added static analysis (prospector) to testing framework.
- 0.1.3 2014-10-29 Pre-Alpha
  - Bugfix: 2nd parameter of wcswidth was not honored. (Thomas Ballinger, PR #4).
- 0.1.2 2014-10-28 Pre-Alpha
  - Updated tables to Unicode Specification 7.0.0. (Thomas Ballinger, PR #3).
- 0.1.1 2014-05-14 Pre-Alpha
  - Initial release to pypi, based on Unicode Specification 6.3.0.
This code was originally derived directly from C code of the same name, whose latest version is available at http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c:
 * Markus Kuhn -- 2007-05-26 (Unicode 5.0)
 *
 * Permission to use, copy, modify, and distribute this software
 * for any purpose and without fee is hereby granted. The author
 * disclaims all warranties with regard to this software.