Python library that measures the width of unicode strings rendered to a terminal

wcwidth
Overview

Introduction

This library is mainly for CLI programs that carefully produce output for terminals, or that pretend to be a terminal emulator.

Problem Statement: The printable length of most strings is equal to the number of cells they occupy on the screen: 1 character, 1 cell. However, some categories of characters occupy 2 cells (full-width), and others occupy 0 cells (zero-width).

Solution: POSIX.1-2001 and POSIX.1-2008 conforming systems provide the wcwidth(3) and wcswidth(3) C functions, whose behaviour this Python module's functions precisely copy. These functions return the number of cells a Unicode string is expected to occupy.

Installation

The stable version of this package is maintained on pypi, install using pip:

pip install wcwidth

Example

Problem: given the following phrase (Japanese),

>>> text = u'コンニチハ'

Python incorrectly uses the string length of 5 codepoints rather than the printable length of 10 cells, so when using the rjust function, the output length is wrong:

>>> print(len('コンニチハ'))
5

>>> print('コンニチハ'.rjust(20, '_'))
_______________コンニチハ

By defining our own "rjust" function that uses wcwidth, we can correct this:

>>> def wc_rjust(text, length, padding=' '):
...    from wcwidth import wcswidth
...    return padding * max(0, (length - wcswidth(text))) + text
...

Our Solution uses wcswidth to determine the string length correctly:

>>> from wcwidth import wcswidth
>>> print(wcswidth('コンニチハ'))
10

>>> print(wc_rjust('コンニチハ', 20, '_'))
__________コンニチハ

Choosing a Version

Export the environment variable UNICODE_VERSION. This should be done by terminal emulators, or by developers experimenting with authoring one of their own, from the shell:

$ export UNICODE_VERSION=13.0

If unspecified, the latest version is used. If your Terminal Emulator does not export this variable, you can use the jquast/ucs-detect utility to automatically detect and export it to your shell.

wcwidth, wcswidth

Use the function wcwidth() to determine the width of a single Unicode character, and wcswidth() to determine the width of a string of Unicode characters.

Briefly, return values of function wcwidth() are:

-1
Indeterminate (not printable).
0
Does not advance the cursor, such as NULL or combining characters.
2
Characters of category East Asian Wide (W) or East Asian Full-width (F) which are displayed using two terminal cells.
1
All others.

Function wcswidth() simply returns the sum of the values for each character in a string, or -1 if any character in the string has a width of -1.
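The return-value rules above can be approximated in plain Python. This is a rough sketch that assumes only the standard library's unicodedata module; the helper names approx_wcwidth and approx_wcswidth are hypothetical, treating format characters (category Cf) as zero width is an added assumption, and the real wcwidth package uses its own generated Unicode tables rather than these category checks.

```python
import unicodedata


def approx_wcwidth(ch: str) -> int:
    """Hypothetical, rough approximation of wcwidth() for one character."""
    if ch == "\0":
        return 0                      # NULL does not advance the cursor
    category = unicodedata.category(ch)
    if category == "Cc":
        return -1                     # other control characters: not printable
    if category in ("Mn", "Me", "Cf"):
        return 0                      # combining marks; Cf is an assumption
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2                      # East Asian Wide (W) or Full-width (F)
    return 1                          # all others


def approx_wcswidth(text: str) -> int:
    """Sum per-character widths, or return -1 if any character is -1."""
    total = 0
    for ch in text:
        width = approx_wcwidth(ch)
        if width < 0:
            return -1
        total += width
    return total


print(approx_wcswidth("コンニチハ"))  # 10
print(approx_wcswidth("abc\n"))       # -1
```

The early return mirrors the "-1 anywhere means -1 overall" rule, which is also why callers must check for a negative result before doing arithmetic with it.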

Full API Documentation at http://wcwidth.readthedocs.org

Developing

Install wcwidth in editable mode:

pip install -e .

Execute unit tests using tox:

tox

Regenerate python code tables from latest Unicode Specification data files:

tox -eupdate

Supplementary tools for browsing and testing terminals for wide Unicode characters are found in the bin/ folder of this project's source code. First run pip install -r requirements-develop.txt from this project's main folder. For example, an interactive browser for testing:

./bin/wcwidth-browser.py

History

0.2.0 2020-06-01
  • Enhancement: The Unicode version may be selected by exporting the environment variable UNICODE_VERSION, such as 13.0 or 6.3.0. See the jquast/ucs-detect CLI utility for automatic detection.
  • Enhancement: API Documentation is published to readthedocs.org.
  • Updated tables for all Unicode Specifications with files published in a programmatically consumable format, versions 4.1.0 through 13.0.
0.1.9 2020-03-22
  • Performance optimization by Avram Lubkin, PR #35.
  • Updated tables to Unicode Specification 13.0.0.
0.1.8 2020-01-01
  • Updated tables to Unicode Specification 12.0.0. (PR #30).
0.1.7 2016-07-01
  • Updated tables to Unicode Specification 9.0.0. (PR #18).
0.1.6 2016-01-08 Production/Stable
  • LICENSE file now included with distribution.
0.1.5 2015-09-13 Alpha
  • Bugfix: Resolution of the "combining character width" issue; most especially, characters that previously returned -1 now often (correctly) return 0. Resolved by Philip Craig via PR #11.
  • Deprecated: The module path wcwidth.table_comb is no longer available, it has been superseded by module path wcwidth.table_zero.
0.1.4 2014-11-20 Pre-Alpha
0.1.3 2014-10-29 Pre-Alpha
0.1.2 2014-10-28 Pre-Alpha
0.1.1 2014-05-14 Pre-Alpha
  • Initial release to pypi, Based on Unicode Specification 6.3.0

This code was originally derived directly from C code of the same name, whose latest version is available at http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c:

* Markus Kuhn -- 2007-05-26 (Unicode 5.0)
*
* Permission to use, copy, modify, and distribute this software
* for any purpose and without fee is hereby granted. The author
* disclaims all warranties with regard to this software.
Issues
  • Bump pyyaml from 3.11 to 5.1

    Bumps pyyaml from 3.11 to 5.1.

    Changelog

    Sourced from pyyaml's changelog.

    5.1 (2019-03-13)

    3.13 (2018-07-05)

    • Resolved issues around PyYAML working in Python 3.7.

    3.12 (2016-08-28)

    • Wheel packages for Windows binaries.
    • Adding an implicit resolver to a derived loader should not affect the base loader.
    • Uniform representation for OrderedDict across different versions of Python.
    • Fixed comparison to None warning.
    Commits
    • e471e86 Updates for 5.1 release
    • 9141e90 Windows Appveyor build
    • d6cbff6 Skip certain unicode tests when maxunicode not > 0xffff
    • 69103ba Update .travis.yml to use libyaml 0.2.2
    • 91c9435 Squash/merge pull request #105 from nnadeau/patch-1
    • 507a464 Make default_flow_style=False
    • 07c88c6 Allow to turn off sorting keys in Dumper
    • 611ba39 Include license file in the generated wheel package
    • 857dff1 Apply FullLoader/UnsafeLoader changes to lib3
    • 0cedb2a Deprecate/warn usage of yaml.load(input)
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 14
  • Wrong width for emoji chars

    As reported in xonsh/xonsh#1569, some (maybe all) emojis are reported as being 2 cells wide, while most (all?) terminals treat them as 1 cell wide:

    The difference in position arises because, while editing a command, xonsh uses wcwidth, but to print the line it just prints it and lets the terminal position things.

    bug enhancement 
    opened by santagada 10
  • Style and about 10x speed improvement

    Hi, first off: Awesome project. This world is missing unicode-aware software.

    I had a look at the code, and I found that it could easily be optimized by using more of the python standard library. The code as attached is about 10x faster on large strings (I tested it on 10MB of english text).

    With the caching hack it's about 10x as fast (3.3s vs. 37s on my machine), without it's about 3x as fast (12.5s vs. 37s on my machine).

    Tell me what you think of it and I'll format a more descriptive commit.

    Regards, jaseg

    wontfix needs-feedback 
    opened by jaseg 8
  • Are Combining characters handled correctly?

    I'd also be excited about this functionality - is anyone else already working on it? I'm going to start work on a first pass implementation.

    needs-feedback 
    opened by thomasballinger 7
  • special character width problem

    import pandas as pd
    from tabulate import tabulate
    from io import StringIO
    import wcwidth
    
    
    csv_str = ''',0,1,2
    0,a)  将净利润调节为经营活动现金流量,,
    1,,2019 年度,2018 年度
    2,净利润,"63,205,243","55,350,200"
    3,加:资产减值损失,"(73,370)","10,465,899"
    4,信用减值损失,"3,611,595",—
    5,固定资产折旧,"6,543,253","6,530,713"
    6,投资性房地产折旧,"1,718,108","1,514,560"
    7,无形资产摊销,"447,821","440,444"
    8,长期待摊费用摊销,"338,210","189,875"
    9,处置固定资产、无形资产和其他长,,
    10,期资产的收益,"(568,141)","(175,112)"
    11,财务费用,"10,179,757","12,568,535"
    12,公允价值变动损失,"484,752","368,343"
    13,投资收益,"(4,212,538)","(5,646,311)"
    14,递延所得税资产的增加,"(3,083,170)","(2,511,075)"
    15,递延所得税负债的增加/(减少),"107,125","(644,386)"
    16,存货的增加,"(70,420,830)","(96,125,732)"
    17,受限资金的(增加)/减少,"(1,387,681)","280,759"
    18,经营性应收项目的增加,"(125,133,902)","(108,365,569)"
    19,经营性应付项目的增加,"83,217,173","135,881,219"
    '''
    
    csv_io = StringIO(csv_str)
    
    df = pd.read_csv(csv_io, index_col=0)
    
    print(df)
    
    df_tabulated = tabulate(df, headers='keys', tablefmt='psql', showindex=False)
    
    print(df_tabulated)
    
    opened by playgithub 6
  • Bad release wheel

    ERROR: wcwidth has an invalid wheel, multiple .dist-info directories found: wcwidth-0.2.0.dist-info, wcwidth-0.2.1.dist-info
    
    opened by FFY00 5
  • Optimize wcwidth()

    Some minor optimizations for wcwidth(). These should result in a ~30% performance gain.

    >>> from timeit import timeit
    >>> timeit('wcwidth("a")', setup='from wcwidth import wcwidth', number=10000000)
    

    Before optimizations: 7.065002638002625

    After optimizations: 4.69557189499028

    The main speedups come from using a set instead of a chain of boolean comparisons and passing the upper bound of the table into the binary search instead of calling length on the table for each run.

    opened by avylove 5
  • Using DerivedCombiningClass.txt to determine width is inappropriate

    DerivedCombiningClass.txt contains the Canonical_Combining_Class field from UnicodeData.txt (see http://www.unicode.org/reports/tr44/#Canonical_Combining_Class_Values). This field is intended to be used for the collation algorithm.

    wcwidth.py is currently assuming that characters are zero width combining characters if and only if they have a non-zero combining class. I think this is an invalid assumption. For example, characters that are enclosing marks (General Category = Me) all have a zero combining class, but they are also zero width combining characters.

    I'm not sure what the standard way to determine zero width combining characters is. One possibility is to check for a General Category of Mn or Me, but I don't know if there are any exceptions to this. Also note that there are combining characters that do have a width (category Mc).

    bug 
    opened by philipc 5
  • Set targets on badges

    so that clicking them goes somewhere useful rather than opening the image in the browser.

    Try it out at https://github.com/msabramo/wcwidth/tree/set_targets_on_badges

    opened by msabramo 5
  • Add tests to manifest

    I need tests to build the RPM. Looks like they got dropped from the dist tarball when the tests directory was moved.

    opened by avylove 4
  • Source tarball on PyPI missing bin/ and tox.ini

    Hi,

    please update MANIFEST.in to make the source tarball include all configs and tools required to build the sources.

    Thanks, Nik

    opened by Natureshadow 0
  • wc_rjust() doesn't work for non-printables

    Firstly, this isn't a problem for me, I just wanted to let you know about it.

    Using wc_rjust() from the readme, if text contains any non-printable characters, the result is longer than length, which should never happen.

    For example, '\n' is non-printable:

    >>> wc_rjust('\n', 2, '.')
    '...\n'
    

    For reference, the width-naive version:

    >>> '\n'.rjust(2, '.')
    '.\n'
    

    The problem is because of the math here:

    max(0, (length - wcswidth(text)))
    

    If wcswidth(text) is negative, the max is length + 1.

    The simple solution is to just add a note in the readme warning about this situation, but if you wanted, you could expand the function to raise an error:

    >>> def wc_rjust(text, length, padding=' '):
    ...    from wcwidth import wcswidth
    ...    width = wcswidth(text)
    ...    if width < 0:
    ...        raise ValueError('text contains non-printable characters')
    ...    return padding * max(0, (length - width)) + text
    ...
    >>> wc_rjust('\n', 2, '.')
    Traceback (most recent call last):
      ...
    ValueError: text contains non-printable characters
    

    Ultimately, it seems like the problem is using -1 as a sentinel return value instead of raising an error, but it looks like that's inherited from the C function, so fixing that would be a lot of work.

    opened by wjandrea 1
  • Fix typos discovered by codespell

    https://github.com/codespell-project/codespell codespell --ignore-words-list="abov,doub,oclock,rever,ue" https://travis-ci.org/github/jquast/wcwidth

    opened by cclauss 1
  • GitHub Action to lint Python code

    Because Travis CI runs are no longer connected to pull requests.

    Output: https://github.com/cclauss/wcwidth/actions

    opened by cclauss 3
  • Devanagari's zero-width characters are not accounted for properly

    I am trying to tabulate entries containing Devanagari characters using python-tabulate. The library uses wcwidth to calculate the visible length of a string, apparently on line 768 here.

    I had opened an issue in astanin/python-tabulate#68 a while ago. The dev directed me to also open an issue here, so here I am. I will quote myself directly from the issue:


    This is how it renders

    Name            Score
    ------------  -------
    राष्ट्र परीक्षण    19.25
    Test             0
    

    versus

    Name               Score
    ---------------  -------
    Devanagari here    19.25
    Test                0
    

    How it should render:

    Name            Score
    ------------  -------
    राष्ट्र परीक्षण         19.25
    Test             0
    
    bug 
    opened by siddhpant 0
  • Variation selectors are not correctly handled

    Variation selectors (U+FE0E, U+FE0F) can change column widths of some preceding characters. For example, U+270F (✏) is a single-column glyph by itself, but with a succeeding U+FE0F it occupies 2 columns as shown in the snapshot below.

    bug 
    opened by Frederick888 4
  • Multi-codepoint emojis

    Hi,

    Can wcwidth help me with multi-codepoint emojis?

    For instance, here I want to get the cell width for a "woman_mechanic_dark_skin_tone" emoji, which renders in the terminal as 2 cells, but wcswidth reports a width of 6 because it is adding up all the modifiers.

    >>> s = "👩\U0001F3FF\u200d🔧"
    >>> print(repr(s))
    '👩🏿\u200d🔧'
    >>> from wcwidth import wcswidth
    >>> wcswidth(s)
    6
    >>> print(s+"\n--")
    👩🏿‍🔧
    --
    

    I've found support for these kind of emojis to be inconsistent across terminals, so maybe this is a lost cause, but is there some kind of standard for these emoji modifiers?

    bug 
    opened by willmcgugan 9
  • U+2064 to U+206f are 0 width, but wcwidth returns 1 width

    base = ord('\u2064')
    for i in range(12):
        print('A' + chr(base+i) + 'B')
    
    needs-research 
    opened by bspammer 2
  • Wrong width for Hindi on macOS, but correct width on Linux

    I tried using wcwidth to calculate the length of the name for the city of Mumbai in Hindi (बॉम्बे हिंदी)

    from wcwidth import wcswidth
    wcswidth('बॉम्बे हिंदी')
    9
    

    On macOS 10.13.5 using Python 3.6.5, I see a visual width of 5 characters and a calculated width of 9 characters.

    On Ubuntu 18.04 using Python 3.6.5, I see a visual width of 9 characters and a calculated width of 9 characters.

    Thank you by the way for creating a very useful module!

    needs-research 
    opened by tleonhardt 6
  • wc might be an empty string

    https://github.com/jquast/wcwidth/blob/c71459ea91af86f3bbcdac2c8ed5e7773da2d848/wcwidth/wcwidth.py#L104-L182

    If wc is an empty string, a TypeError: ord() expected a character... exception will be raised.

    It would be better, I think, if there were a statement if len(wc) == 0: return 0 before ord(wc).

    bug 
    opened by choldrim 3
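The optimization described in the "Optimize wcwidth()" issue, passing the table's upper bound into the binary search instead of calling len() on every lookup, and using a set instead of a chain of comparisons, can be sketched as follows. The table contents, the special-case set, and the names ZERO_WIDTH_TABLE, ZERO_WIDTH_UBOUND, in_table and is_zero_width are hypothetical illustrations, not the library's actual code.

```python
from typing import Sequence, Tuple

# Hypothetical table of inclusive (start, end) codepoint ranges, sorted and
# non-overlapping, in the style of a generated Unicode width table.
ZERO_WIDTH_TABLE: Sequence[Tuple[int, int]] = (
    (0x0300, 0x036F),  # Combining Diacritical Marks
    (0x0483, 0x0489),  # Cyrillic combining marks
    (0x20D0, 0x20F0),  # Combining Diacritical Marks for Symbols
)
# Computed once, instead of calling len() on the table for each lookup.
ZERO_WIDTH_UBOUND = len(ZERO_WIDTH_TABLE) - 1

# Hypothetical special cases: one set lookup replaces `a == x or a == y ...`.
ZERO_WIDTH_SPECIALS = {0x0000, 0x200B, 0x200C, 0x200D}


def in_table(ucs: int, table: Sequence[Tuple[int, int]], ubound: int) -> bool:
    """Binary search for ucs within the ranges table[0..ubound]."""
    lbound = 0
    if ucs < table[0][0] or ucs > table[ubound][1]:
        return False
    while ubound >= lbound:
        mid = (lbound + ubound) // 2
        if ucs > table[mid][1]:
            lbound = mid + 1
        elif ucs < table[mid][0]:
            ubound = mid - 1
        else:
            return True
    return False


def is_zero_width(ucs: int) -> bool:
    if ucs in ZERO_WIDTH_SPECIALS:  # O(1) average-case set membership
        return True
    return in_table(ucs, ZERO_WIDTH_TABLE, ZERO_WIDTH_UBOUND)


print(is_zero_width(0x0301))  # True
print(is_zero_width(0x0041))  # False
```

Whether this reproduces the ~30% gain reported in the issue depends on the interpreter and the real tables; the sketch only shows the shape of the two changes.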
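The point raised in "Using DerivedCombiningClass.txt to determine width is inappropriate" can be checked with the standard library's unicodedata module: an enclosing mark such as U+20DD has a canonical combining class of 0, yet is still a zero-width combining character, while a spacing mark (category Mc) also has class 0 but does take up width. The helper name zero_width_by_category is hypothetical, and testing for Mn or Me is only the possibility the issue raises, not a confirmed rule.

```python
import unicodedata


def zero_width_by_category(ch: str) -> bool:
    """Hypothetical test: nonspacing (Mn) or enclosing (Me) marks only."""
    return unicodedata.category(ch) in ("Mn", "Me")


# Acute accent (Mn), enclosing circle (Me), Devanagari visarga (Mc).
for ch in "\u0301\u20dd\u0903":
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"category={unicodedata.category(ch)} "
        f"combining_class={unicodedata.combining(ch)} "
        f"zero_width={zero_width_by_category(ch)}"
    )
```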
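The behaviour described in "Variation selectors are not correctly handled" could be approximated by widening a narrow base character when VARIATION SELECTOR-16 (U+FE0F) follows it. In this sketch the stand-in per-character width function, the helper names, and the one-entry EMOJI_VS_BASES set are all hypothetical; a fuller version would load the base characters from the emoji variation sequences published in the Unicode emoji data files.

```python
import unicodedata

VS16 = "\ufe0f"  # VARIATION SELECTOR-16: requests emoji presentation

# Hypothetical: base characters that have an emoji variation sequence.
# Only U+270F (PENCIL), the example from the issue, is listed here.
EMOJI_VS_BASES = {"\u270f"}


def char_width(ch: str) -> int:
    """Stand-in for wcwidth(): 0 for marks/format chars, 2 if wide, else 1."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # variation selectors themselves are category Mn
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def width_with_vs16(text: str) -> int:
    total = 0
    for i, ch in enumerate(text):
        width = char_width(ch)
        followed_by_vs16 = text[i + 1:i + 2] == VS16
        if width == 1 and followed_by_vs16 and ch in EMOJI_VS_BASES:
            width = 2  # emoji presentation occupies two cells
        total += width
    return total


print(width_with_vs16("\u270f"))        # 1
print(width_with_vs16("\u270f\ufe0f"))  # 2
```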
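For the "Multi-codepoint emojis" question, one possible heuristic, assumed here rather than taken from wcwidth, is to let a skin-tone modifier (U+1F3FB to U+1F3FF) and any character that follows a ZERO WIDTH JOINER (U+200D) merge into the preceding emoji, so only the first emoji contributes its width. The function name zwj_sequence_width is hypothetical, and since terminal support for these sequences is inconsistent, as the issue notes, the result is only an estimate.

```python
import unicodedata

ZWJ = "\u200d"                        # ZERO WIDTH JOINER
SKIN_TONES = range(0x1F3FB, 0x1F400)  # EMOJI MODIFIER FITZPATRICK types


def zwj_sequence_width(text: str) -> int:
    total = 0
    after_zwj = False
    for ch in text:
        if ch == ZWJ:
            after_zwj = True          # next character joins the current glyph
            continue
        if ord(ch) in SKIN_TONES or after_zwj:
            after_zwj = False         # modifier or joined emoji: no extra width
            continue
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue                  # other marks and format chars: zero width
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


s = "\U0001F469\U0001F3FF\u200d\U0001F527"  # woman mechanic, dark skin tone
print(zwj_sequence_width(s))  # 2
```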
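As a first research step for "U+2064 to U+206f are 0 width", the codepoints in that range can be listed with their name and General Category using only the standard library. Most of the range is category Cf (format characters), and U+2065 is unassigned (Cn). The helper describe_range is hypothetical, and this sketch only reports categories; it does not say what wcwidth itself returns.

```python
import unicodedata


def describe_range(start: int, count: int):
    """Return (codepoint, category, name) for count codepoints from start."""
    rows = []
    for cp in range(start, start + count):
        ch = chr(cp)
        rows.append((cp, unicodedata.category(ch),
                     unicodedata.name(ch, "<unassigned>")))
    return rows


for cp, category, name in describe_range(0x2064, 12):
    print(f"U+{cp:04X} {category} {name}")
```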
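The guard proposed in "wc might be an empty string" can be sketched as a small wrapper: since ord('') raises TypeError, checking the length first lets an empty string return 0 instead. The name guarded_width and its char_width parameter are hypothetical; this does not reproduce wcwidth's own code.

```python
from typing import Callable


def guarded_width(wc: str, char_width: Callable[[int], int]) -> int:
    """Return 0 for an empty string, otherwise measure its codepoint."""
    if len(wc) == 0:
        return 0  # avoids ord(''), which would raise TypeError
    return char_width(ord(wc))


# Example with a trivial stand-in that treats every codepoint as 1 cell.
print(guarded_width("", lambda ucs: 1))   # 0
print(guarded_width("a", lambda ucs: 1))  # 1
```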
Releases(0.2.5)
  • 0.2.5(Jun 23, 2020)

  • 0.2.4(Jun 11, 2020)

    • minor "bugfix" to avoid using pkg_resources module on import, 7918f581feedeaa4246dc0fc03ec6fb49cff15cb
    • may help xonsh https://github.com/xonsh/xonsh/issues/3607
  • 0.2.3(Jun 2, 2020)

  • 0.2.2(Jun 1, 2020)

    PR #23: Support all versions of Unicode, using the UNICODE_VERSION environment variable, when defined, or, for non-shells, explicitly by passing argument unicode_version to the wcwidth family of functions.

    A demonstration utility that determines the Terminal's Unicode Version is made available as a separate package, https://github.com/jquast/ucs-detect/ which contains a Problem and Solution statement.

  • 0.1.9(Mar 23, 2020)

  • 0.1.7(Jul 2, 2016)

  • 0.1.6(Jan 8, 2016)

  • 0.1.5(Sep 14, 2015)

    • Bugfix: Resolution of "combining character width" issue, most especially those that previously returned -1 now often (correctly) return 0. resolved by Philip Craig via PR #11.
    • Deprecated: The module path wcwidth.table_comb is no longer available, it has been superseded by module path wcwidth.table_zero.
  • 0.1.4(Nov 20, 2014)

    • Feature: wcswidth() now determines printable length for (most) combining characters. The developer's tool bin/wcwidth-browser.py is improved to display combining characters when provided the --combining option (@thomasballinger and @lmontopo PR #5).
    • Added static analysis (prospector) to the testing framework.
  • 0.1.3(Oct 29, 2014)

  • 0.1.2(Oct 28, 2014)

  • 0.1(May 5, 2014)

Owner
Jeff Quast