emoji terminal output for Python

Overview

Emoji

Emoji for Python. This project was inspired by kyokomi.

Example

The entire set of Emoji codes as defined by the Unicode Consortium is supported, in addition to a bunch of aliases. By default, only the official list is enabled, but passing use_aliases=True to emoji.emojize() enables both the full list and the aliases.

>>> import emoji
>>> print(emoji.emojize('Python is :thumbs_up:'))
Python is 👍
>>> print(emoji.emojize('Python is :thumbsup:', use_aliases=True))
Python is 👍
>>> print(emoji.demojize('Python is 👍'))
Python is :thumbs_up:
>>> print(emoji.emojize("Python is fun :red_heart:"))
Python is fun ❤
>>> print(emoji.emojize("Python is fun :red_heart:", variant="emoji_type"))
Python is fun ❤️ #red heart, not black heart

By default, the language is English (language='en') but Spanish ('es'), Portuguese ('pt') and Italian ('it') are also supported.

>>> print(emoji.emojize('Python es :pulgar_hacia_arriba:', language='es'))
Python es 👍
>>> print(emoji.demojize('Python es 👍', language='es'))
Python es :pulgar_hacia_arriba:
>>> print(emoji.emojize("Python é :polegar_para_cima:", language='pt'))
Python é 👍
>>> print(emoji.demojize("Python é 👍", language='pt'))
Python é :polegar_para_cima:

Installation

Via pip:

$ pip install emoji --upgrade

From master branch:

$ git clone https://github.com/carpedm20/emoji.git
$ cd emoji
$ python setup.py install

Developing

$ git clone https://github.com/carpedm20/emoji.git
$ cd emoji
$ pip install -e .\[dev\]
$ nosetests

The script utils/get-codes-from-unicode-consortium.py may help when updating unicode_codes.py, but it is not guaranteed to work. It scrapes a table on the Unicode Consortium's website with BeautifulSoup and prints the contents to stdout in a more useful format.

Links

For English:

Emoji Cheat Sheet

Official unicode list

For Spanish:

Unicode list

For Portuguese:

Unicode list

For Italian:

Unicode list

Authors

Taehoon Kim / @carpedm20

Kevin Wurster / @geowurster

Comments
  • Upgrade fails on mac, python 3.4.2

    Executed command: pip install -U emoji

    Error occurred: UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 738: ordinal not in range(128)

    Updated in the PyCharm IDE. Maybe an invisible character, like this issue had? But it could just be PyCharm-related instead.

    Collecting emoji
      Using cached emoji-0.2.tar.gz
        Complete output from command python setup.py egg_info:
        Traceback (most recent call last):
          File "<string>", line 20, in <module>
          File "/private/var/folders/zh/ntn59b4954l6tldmt4hn8bgc0000gn/T/pycharm-packaging1.tmp/emoji/setup.py", line 17, in <module>
            readme_content = f.read().strip()
          File "/Users/luckydonald/virtualenv3.4.3/bin/../lib/python3.4/encodings/ascii.py", line 26, in decode
            return codecs.ascii_decode(input, self.errors)[0]
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 738: ordinal not in range(128)
    
        ----------------------------------------
    
        Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/zh/ntn59b4954l6tldmt4hn8bgc0000gn/T/pycharm-packaging1.tmp/emoji
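
The traceback above ends in f.read() inside setup.py, where the README is decoded with the platform's default codec (ASCII in this report). A minimal sketch of the usual remedy, assuming the README is UTF-8 encoded, is to open the file with an explicit encoding. The helper name read_readme and the temporary file are illustrative only and are not the project's actual setup.py.

```python
import io
import os
import tempfile


def read_readme(path):
    """Read a text file as UTF-8 regardless of the locale's default codec."""
    # Passing `encoding` explicitly avoids falling back to ASCII.
    with io.open(path, encoding="utf-8") as f:
        return f.read().strip()


# Demo: write a README containing an emoji (its UTF-8 form starts with
# byte 0xf0, as in the error message), then read it back.
with tempfile.TemporaryDirectory() as tmp:
    readme_path = os.path.join(tmp, "README.rst")
    with io.open(readme_path, "w", encoding="utf-8") as f:
        f.write(u"Python is \U0001F44D\n")
    readme_content = read_readme(readme_path)
```

Whether this matches the fix that was eventually applied upstream is not stated in the report.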
    
    bug 
    opened by luckydonald 16
  • Identify unicode release version or date for each emoji

    The problem I'm trying to solve:

    I have a group of users with particularly old phones. I want to make sure that I'm sending emojis that will likely render on their phones, so I'd like the ability to scan all emojis my code plans to send and report on any that are newer than N years old.

    Possible solution

    Update this project so that the big list of emojis also includes the unicode release version and/or year. For example, this emojipedia page for 🥰 includes this text:

    Smiling Face with Hearts was approved as part of Unicode 11.0 in 2018 under the name “Smiling Face with Smiling Eyes and Three Hearts” and added to Emoji 11.0 in 2018.

    This would require finding these release dates and versions, adding them to the data, and adding new call(s) to access them. The emoji data files from unicode.org might be a good source.
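
A rough sketch of what such a check could look like, assuming a table that maps each emoji to the Emoji version that introduced it. The table SAMPLE_EMOJI_VERSION and the function newer_than are hypothetical. The 11.0 entry comes from the text quoted above; the other version numbers are sample values that should be regenerated from the Unicode data files rather than trusted.

```python
# Hand-written sample table: emoji -> Emoji version that introduced it.
# A real table would be generated from the Unicode emoji data files.
SAMPLE_EMOJI_VERSION = {
    "\U0001F44D": 0.6,   # thumbs up
    "\U0001F600": 1.0,   # grinning face
    "\U0001F970": 11.0,  # smiling face with hearts
    "\U0001FAB9": 14.0,  # empty nest
}


def newer_than(text, max_version, table=SAMPLE_EMOJI_VERSION):
    """Return the emoji in `text` introduced after `max_version`.

    Only single-code-point emoji present in `table` are recognised.
    """
    return [ch for ch in text if table.get(ch, 0.0) > max_version]


message = "Great work \U0001F44D \U0001F970"
too_new = newer_than(message, max_version=5.0)
```

Here too_new would hold only the smiling face with hearts, which could then be reported or replaced before sending.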

    Other solutions?

    If others are aware of projects or databases that already do this, please post as a comment here.

    opened by truthdoug 12
  • package name conflict on PyPI

    This package installs into a top-level emoji. This conflicts with the top-level of an older project named django-emoji.

    It is impossible to use both packages within a single project.

    To import from this package, you need to import from emoji.

    This conflicts with an older project called django-emoji. To import from django-emoji you also need to import from emoji.

    pip does not have the tooling to rename packages on install.

    PyPI has a policy of unique package names, which this project violates: http://legacy.python.org/dev/peps/pep-0423/ "make sure your project name is unique, i.e. avoid duplicates:"

    opened by koliber 10
  • Tests don't pass on 1.4.0

    Attempting to package 1.4.0 for openSUSE I noticed that the tests won't pass. Specifically, it seems the regex doesn't match all characters used by the French emoji names.

    ================================================================================= test session starts ==================================================================================
    platform linux -- Python 3.6.13, pytest-3.10.1, py-1.8.1, pluggy-0.13.1
    rootdir: /home/marix/emoji, inifile:
    plugins: flake8-1.0.4, instafail-0.4.1.post0, mock-2.0.0, cov-2.8.1
    collected 13 items                                                                                                                                                                     
    
    tests/test_core.py F..........                                                                                                                                                   [ 84%]
    tests/test_unicode_codes.py ..                                                                                                                                                   [100%]
    
    ======================================================================================= FAILURES =======================================================================================
    ________________________________________________________________________________ test_emojize_name_only ________________________________________________________________________________
    
        def test_emojize_name_only():
            for lang_code, emoji_pack in emoji.EMOJI_UNICODE.items():
                for name in emoji_pack.keys():
                    actual = emoji.emojize(name, False, language=lang_code)
                    expected = emoji_pack[name]
    >               assert expected == actual, '%s != %s' % (expected, actual)
    E               AssertionError: 😉 != :visage_faisant_un_clin_d’œil:
    E               assert '😉' == ':visage_faisant_un_clin_d’œil:'
    E                 - 😉
    E                 + :visage_faisant_un_clin_d’œil:
    
    tests/test_core.py:17: AssertionError
    ========================================================================= 1 failed, 12 passed in 1.20 seconds ==========================================================================
    

    On top of that, some of the German emoji names contain the colon character, which obviously cannot work.
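
The failure can be illustrated with two hypothetical patterns for ":name:" tokens. The library's real regular expression is not shown in this report, so ASCII_NAME and ANY_NAME below are assumptions chosen only to show the effect of a character class that is too narrow, and why a colon inside a name cannot work.

```python
import re

# Hypothetical patterns; not the library's actual regular expression.
ASCII_NAME = re.compile(r":[A-Za-z0-9_\-]+:")  # ASCII-only character class
ANY_NAME = re.compile(r":[^:\s]+:")            # anything but colon/whitespace

# The failing French name; U+2019 is the apostrophe, U+0153 is the oe ligature.
french = ":visage_faisant_un_clin_d\u2019\u0153il:"

ascii_match = ASCII_NAME.fullmatch(french)  # None: both characters are outside the class
broad_match = ANY_NAME.fullmatch(french)    # matches the whole token

# A name that itself contains a colon can never be one token,
# because the colon is also the delimiter.
name_with_colon = ":a:b:"
colon_match = ANY_NAME.fullmatch(name_with_colon)  # None
```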

    opened by theMarix 8
  • Missing emojis.

    Hello, I'm printing on a webpage some texts taken from GitHub repositories that contain some emojis.

    The emojis used are all supported by GitHub and present in this collection, https://github.com/ikatyang/emoji-cheat-sheet, but when calling emoji.emojize(text, language="alias") on the texts I noticed that some emojis are missing, e.g.:

    • :it:
    • :zombie_man:

    I know that these aliases are not official, but it would be nice to support them since they are supported by GitHub.
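
Until such aliases are supported, one possible workaround is a small user-side table applied alongside emoji.emojize(). The sketch below uses only the standard library; EXTRA_ALIASES and apply_extra_aliases are hypothetical names and not part of the package.

```python
import re

# Hypothetical user-maintained table of extra aliases (not part of the library).
EXTRA_ALIASES = {
    ":it:": "\U0001F1EE\U0001F1F9",                  # flag: Italy
    ":zombie_man:": "\U0001F9DF\u200D\u2642\uFE0F",  # man zombie
}

_ALIAS_TOKEN = re.compile(r":[A-Za-z0-9_+\-]+:")


def apply_extra_aliases(text, table=EXTRA_ALIASES):
    """Replace tokens found in `table`; leave every other token untouched."""
    return _ALIAS_TOKEN.sub(lambda m: table.get(m.group(0), m.group(0)), text)


converted = apply_extra_aliases("Made in :it: by a :zombie_man: and :unknown_alias:")
```

Unknown tokens are left as they are, so the result can still be passed to emoji.emojize(text, language="alias") afterwards.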

    opened by fabiocaccamo 7
  • Duplication of str using PySimpleGUI

    Problem: I'm creating a GUI using PySimpleGUI. When I create a string containing an emoji and display it with PySimpleGUI's Text element, the entire string is duplicated.

    Packages installed: PySimpleGUI version 4.49.0, emoji version 1.5.0

    Example:

    app_title = emoji.emojize('App Title :football:')
    [sg.Text(app_title, font=('Any', 32))]
    

    Screenshots (images not preserved): doubling of the string when the emoji is in the code; no doubling of the string when there is no emoji in the code.

    Notes: I tried several emojis

    opened by wildernessfamily 7
  • emoji.decode() is fundamentally broken, not needed, and should be removed.

    @carpedm20 having emoji.EMOJI_UNICODE gives us an easy way to look up unicode codes by emoji and emoji.UNICODE_EMOJI gives us an easy way to look up emoji names by unicode codes, but multiple aliases point to the same unicode code so we can't do reverse alias lookups. See the sample code below. About 400 aliases are dropped.

    emoji.decode() really isn't that useful and I vote we just remove it. Any objections?

    >>> import emoji
    >>> len(emoji.EMOJI_UNICODE)
    1282
    >>> len(emoji.UNICODE_EMOJI)
    1282
    >>> len(emoji.EMOJI_ALIAS_UNICODE)
    1694
    >>> len(emoji.UNICODE_EMOJI_ALIAS)
    1279
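
The loss of aliases follows directly from inverting a many-to-one dict. A tiny illustration with a made-up three-entry table (the names are taken from the README examples, and the table itself is not the package's data):

```python
# Illustrative alias table: two names map to the same emoji.
alias_to_emoji = {
    ":thumbs_up:": "\U0001F44D",
    ":thumbsup:": "\U0001F44D",
    ":red_heart:": "\u2764\uFE0F",
}

# Inverting keeps only the last name seen for each emoji.
emoji_to_alias = {char: name for name, char in alias_to_emoji.items()}

dropped = len(alias_to_emoji) - len(emoji_to_alias)

# A lossless reverse mapping has to keep a list of names per emoji.
emoji_to_all_aliases = {}
for name, char in alias_to_emoji.items():
    emoji_to_all_aliases.setdefault(char, []).append(name)
```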
    
    bug 
    opened by geowurster 7
  • Taking a crack at issue #14

    I'm pretty new to Python, but this seems to work. Having trouble safely ensuring that a given string sent to demojize is interpreted as Unicode, so right now I've just added that as a requirement going in.

    opened by BrendanJercich 6
  • Turkish Language

    Hi @cvzi,

    I am trying to demojize the emojis for the Turkish language and based on your doc I added the Turkish language and I am using it; however, some emojis are missing. For example,
    u'\U0001F3F4\U0000200D\U00002620\U0000FE0F' -> 🏴‍☠️ there is no Turkish equivalent text for that. But there is tts for u'\U0001F3F4\U0000200D\U00002620' -> 🏴‍☠
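
The two sequences differ only by the trailing variation selector U+FE0F. One possible user-side workaround, sketched under the assumption that the name table is keyed by the sequence without U+FE0F, is to retry the lookup with the selector removed. SAMPLE_NAMES and lookup_name are hypothetical, and ":pirate_flag:" merely stands in for whatever translated name the table holds.

```python
VS16 = "\uFE0F"  # variation selector-16, requests emoji presentation

# Hypothetical name table keyed by the sequence *without* U+FE0F.
SAMPLE_NAMES = {
    "\U0001F3F4\u200D\u2620": ":pirate_flag:",
}


def lookup_name(sequence, table=SAMPLE_NAMES):
    """Look up `sequence` exactly, then retry with every U+FE0F removed."""
    if sequence in table:
        return table[sequence]
    return table.get(sequence.replace(VS16, ""))


with_vs16 = "\U0001F3F4\u200D\u2620\uFE0F"
without_vs16 = "\U0001F3F4\u200D\u2620"
```

Both forms then resolve to the same entry; a sequence that is missing in either form still returns None.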

    I am actually working with Twitter data so I need to demojize the tweets. I want to know what I can do regarding this problem. Need to mention that, I also scraped the emojis from Emojiterra website to merge the DB, and it resolved some issues but still some of them are missing.

    All I want is demojizing the Twitter-supported emojis.

    Best!

    opened by AliNajafi1998 5
  • Proposal for several improvements

    Since the EMOJI_DATA "dict of dicts", which now contains all the data, was added in #189, I would like to propose some further (breaking) changes to the package:

    • Remove the old dicts
    • Only generate the dict that is needed for the language
    • Update the language data
    • Specify what happens when an emoji is not translated

    If we make any of these changes, I would suggest bumping the version number to 2.0.0 to indicate that it is not backwards compatible. I would implement these changes myself. Please reply with your thoughts and any other suggestions or objections.

    Remove the old dicts

    The old dicts like EMOJI_UNICODE_ENGLISH, UNICODE_EMOJI_ENGLISH, EMOJI_UNICODE_GERMAN, etc are now generated from the dict of dicts EMOJI_DATA. We don't need any of the old UNICODE_EMOJI_* dicts anymore, because we can just do EMOJI_DATA[unicode_point]. Currently they are public in the package, so removing them would be a breaking change.

    Only generate the dict that is needed for the language

    We still need the EMOJI_UNICODE_* dicts to reverse the dict of dicts: EMOJI_UNICODE_ENGLISH[":apple:"] = unicode_point_apple. Currently we generate all of EMOJI_UNICODE_ENGLISH, EMOJI_UNICODE_GERMAN, etc. when the package is imported. I would like to generate them only if the language is actually used, because it is unlikely that someone uses different languages at the same time.

    I suggest to use the following to store the dicts and only generate the data if needed:

    _EMOJI_UNICODE = {
        'en': None,
        'es': None,
        'pt': None,
        'it': None,
        'fr': None,
        'de': None,
    }
    _ALIASES_UNICODE = {}
    

    I would change to these variable names and prefix them with _ to indicate that they are not public. Because they are only generated when needed and are otherwise empty, they should not be used directly and therefore should not be public.

    ~~Improve demojize performance~~

    Implemented in version 1.6.2. The speed of demojize() has become slower with the Python 3 releases. The problem is that we use a huge regular expression that consists of all unicode code points: len(emoji.get_emoji_regexp().pattern) = 19559. This was fast in Python 2, but in Python 3 it is really slow (about 15x slower on Python 3.10 than on Python 2.7). I am already working on this: I replaced the regular expression with a search in a tree (https://en.wikipedia.org/wiki/Tree_(data_structure)) to improve the speed by about 5-10x. I will test this further to see if it works.
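
The idea of replacing the big regular expression with a tree search can be sketched as a prefix tree with longest-match lookup. This is an illustration of the general technique under sample data, not the code that shipped in 1.6.2; build_tree, demojize_with_tree and SAMPLE_NAMES are hypothetical.

```python
# Sample data only; the real package holds thousands of sequences.
SAMPLE_NAMES = {
    "\U0001F44D": ":thumbs_up:",
    "\U0001F44D\U0001F3FB": ":thumbs_up_light_skin_tone:",
    "\u2764\uFE0F": ":red_heart:",
}

_END = object()  # marks "a complete emoji ends at this node"


def build_tree(names):
    """Build a nested-dict prefix tree over the emoji code point sequences."""
    root = {}
    for sequence, name in names.items():
        node = root
        for ch in sequence:
            node = node.setdefault(ch, {})
        node[_END] = name
    return root


def demojize_with_tree(text, root):
    """Replace each emoji by its name, preferring the longest match."""
    out, i = [], 0
    while i < len(text):
        node, j, best = root, i, None
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if _END in node:
                best = (j, node[_END])  # remember the longest match so far
        if best:
            out.append(best[1])
            i = best[0]
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


tree = build_tree(SAMPLE_NAMES)
result = demojize_with_tree("ok \U0001F44D\U0001F3FB and \U0001F44D!", tree)
```

Each input character is looked up in a small dict instead of being tested against one very long alternation, which is the intuition behind the reported speed-up.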

    There was a suggestion in #30 to use ranges, e.g. [\U0001F1E6-\U0001F1FF], for consecutive codepoints in the regular expression. I initially tested this a little bit by replacing a few emojis with ranges by hand. It would also be faster than the current regular expression, but I think it would be harder to implement, and it is unclear to me how much faster it would be in the end.
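
For comparison, the range idea from #30 could be sketched as follows for single-code-point emoji. Multi-code-point sequences would still need separate handling, which is part of why this route is harder. The helpers to_ranges and char_class are hypothetical names.

```python
import re


def to_ranges(codepoints):
    """Collapse integers into inclusive (start, end) runs of consecutive values."""
    ranges = []
    for cp in sorted(set(codepoints)):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)  # extend the current run
        else:
            ranges.append((cp, cp))           # start a new run
    return ranges


def char_class(codepoints):
    """Build a regex character class from runs of consecutive code points."""
    parts = []
    for start, end in to_ranges(codepoints):
        if start == end:
            parts.append(re.escape(chr(start)))
        else:
            parts.append(re.escape(chr(start)) + "-" + re.escape(chr(end)))
    return "[" + "".join(parts) + "]"


# The 26 regional indicator symbols plus one isolated code point.
sample = list(range(0x1F1E6, 0x1F1FF + 1)) + [0x1F44D]
ranges = to_ranges(sample)
pattern = re.compile(char_class(sample))
```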

    Update the language data

    The emojiterra.com website uses the tag release-38 for the translations (e.g. https://github.com/unicode-org/cldr/raw/release-38/common/annotations/de.xml), which is from October 2020.

    There are three newer releases for the translations: release-38-1 from December 2020, release-39 from April 2021 and release-40 from the end of October 2021. I don't know if we want to use the newer ones or wait until emojiterra.com updates.

    At the moment our translation data is a mixture of different (older) versions. When we update to either 38, 38.1, 39 or 40 it will change some existing translations. That is a problem if someone has stored a text with the old translation in their database, so it is a breaking change.

    I would suggest updating at least to release-38, so that we have the same names as emojiterra.com.

    Also we should think about how we deal with future updates: The translations can change with every release. A translation from release-38 might be different in release-39, so with the next update, we might have a breaking change again. We should make a note in the README how we are going to deal with this in the future.

    How to handle the case when a string contains an emoji that is supported in 'en' but not in the language selected?

    The demojize() method currently skips the emoji when it is not available in the selected language. I think it is not obvious that this can happen. One would assume that after demojize(string, language='xy') the string does not contain any emoji anymore. Example with :empty_nest: which is in Unicode 14 and has not been translated into Spanish 'es' yet:

        >>> text_with_emoji_from_version_14 = emoji.emojize(':knife: :empty_nest:')
        >>> emoji.demojize(text_with_emoji_from_version_14, language='en')
        ':knife: :empty_nest:'
        >>> emoji.demojize(text_with_emoji_from_version_14, language='es')
        ':knife: \U0001fab9'
    

    I don't think we need to change this, but we should add a note to the README that explains what can happen. If we want to change it, it would be a breaking change as well, so now would be the time to do it.

    opened by cvzi 5
  • Fixed implementation error of `emoji.distinct_emoji_lis`

    This library had a distinct_emoji_lis function to extract a distinct list of emojis from a string, but it didn't work correctly due to an implementation error. It has been fixed; please review it.

    opened by daima3629 5
  • Add `regional_indicator_*` / big letter emojis

    emoji currently supports these emojis combined together to form flag codes, such as:

    ':Afghanistan:' = '🇦🇫'
    

    The 🇦🇫 emoji is actually a combination of two emojis: 🇦 and 🇫. There is currently no way to emojize/demojize to these single character "regional indicator emojis."

    In Discord and the GNOME Characters app, these are known as "Regional Indicators," but they are not an official part of the unicode spec (besides as pieces of flags).

    If they were to be included, an example of their mapping could be:

    ':regional_indicator_f:' = '🇫'
    

    This is the naming convention Discord chooses for these symbols, but it may not be the best choice.
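
The relationship between letters, regional indicators and flags is simple arithmetic on code points, which the following sketch shows. The function names and the REGIONAL_NAMES table are hypothetical; the ":regional_indicator_x:" naming is the Discord-style convention quoted above.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def regional_indicator(letter):
    """Return the regional indicator symbol for an ASCII letter A-Z."""
    letter = letter.upper()
    if len(letter) != 1 or not ("A" <= letter <= "Z"):
        raise ValueError("expected a single ASCII letter")
    return chr(REGIONAL_INDICATOR_A + ord(letter) - ord("A"))


def flag(country_code):
    """Join two regional indicators into a flag sequence, e.g. 'AF'."""
    return "".join(regional_indicator(c) for c in country_code)


# Name -> symbol mapping following the Discord-style naming.
REGIONAL_NAMES = {
    ":regional_indicator_%s:" % chr(ord("a") + i): chr(REGIONAL_INDICATOR_A + i)
    for i in range(26)
}
```

With this, flag("AF") yields the same two code points as the existing ':Afghanistan:' entry, and each single letter gets its own name.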

    opened by super-cooper 0
  • Multi-person skin tones

    Multi-Person Skin Tones on unicode.org

    Edit: here's a tool to create these: https://codepen.io/cvzi/full/RwQNJBK

    These are currently not RGI by unicode (Recommended for General Interchange), which means they should not be generated with emojize(). However they work in some phones and browsers. For example a family of 4 persons with 4 different skin tones: 👨🏽‍👩🏿‍👧🏻‍👦🏾 This emoji consists of:

    • 👨🏽 :man_medium_skin_tone:
    • \u200d ZWJ
    • 👩🏿 :woman_dark_skin_tone:
    • \u200d ZWJ
    • 👧🏻 :girl_light_skin_tone:
    • \u200d ZWJ
    • 👦🏾 :boy_medium-dark_skin_tone:

    demojize() currently converts that emoji to: '👨:medium_skin_tone:\u200d👩:dark_skin_tone:\u200d:girl_light_skin_tone:\u200d:boy_medium-dark_skin_tone:'

    Possible solutions:

    Convert the man and woman as well to (minimal solution):

    • :man::medium_skin_tone:\u200d:woman::dark_skin_tone:\u200d:girl_light_skin_tone:\u200d:boy_medium-dark_skin_tone:

    or combine the skin tones into man and woman as well:

    • :man_medium_skin_tone:\u200d:woman_dark_skin_tone:\u200d:girl_light_skin_tone:\u200d:boy_medium-dark_skin_tone:

    or remove the skin tones:

    • :family_man_woman_girl_boy:

    Or with the skin tones:

    • :family_man_woman_girl_boy_medium_dark_light_skin_medium-dark_tone:

    Edit:

    Probably the easiest one is this: :man_medium_skin_tone:\u200d:woman_dark_skin_tone:\u200d:girl_light_skin_tone:\u200d:boy_medium-dark_skin_tone:

    Have to decide if we want to remove the \u200d or not. If we keep the \u200d, emojize() can revert the string correctly i.e. emojize(demojize(str)) == str. I don't know what's the effect of having them though, :\u200d: might be displayed strangely.
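
The "keep the \u200d" option can be sketched with plain string operations to show that it round-trips. This is only an illustration of the proposal using the four parts listed above; demojize_zwj and emojize_zwj are hypothetical helpers, not the package's emojize() and demojize().

```python
ZWJ = "\u200d"

# Names taken from the breakdown of the family emoji.
PART_NAMES = {
    "\U0001F468\U0001F3FD": ":man_medium_skin_tone:",
    "\U0001F469\U0001F3FF": ":woman_dark_skin_tone:",
    "\U0001F467\U0001F3FB": ":girl_light_skin_tone:",
    "\U0001F466\U0001F3FE": ":boy_medium-dark_skin_tone:",
}
NAME_PARTS = {name: seq for seq, name in PART_NAMES.items()}


def demojize_zwj(sequence):
    """Name each ZWJ-separated part, keeping the joiners in place."""
    return ZWJ.join(PART_NAMES.get(p, p) for p in sequence.split(ZWJ))


def emojize_zwj(text):
    """Inverse of demojize_zwj for names present in NAME_PARTS."""
    return ZWJ.join(NAME_PARTS.get(p, p) for p in text.split(ZWJ))


family = ZWJ.join(PART_NAMES)  # dicts preserve insertion order (Python 3.7+)
named = demojize_zwj(family)
```

Because the joiners survive, emojize_zwj(named) gives back the original sequence; dropping them would make the reverse step ambiguous.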

    opened by cvzi 1
  • Split chars of a string

    Hey man, I'm the author of alive-progress. I'm struggling to correctly support emojis in there (https://github.com/rsalmei/alive-progress/issues/19), and I think this project could help me.

    Please, how could I split the chars of a string, including emojis of all kinds?

    For example:

    In [17]: [(x, hex(ord(x)), unicodedata.east_asian_width(x)) for x in 'a👩‍❤️‍💋‍👩a']
    Out[17]:
    [('a', '0x61', 'Na'),
     ('👩', '0x1f469', 'W'),
     ('\u200d', '0x200d', 'N'),
     ('❤', '0x2764', 'N'),
     ('️', '0xfe0f', 'A'),
     ('\u200d', '0x200d', 'N'),
     ('💋', '0x1f48b', 'W'),
     ('\u200d', '0x200d', 'N'),
     ('👩', '0x1f469', 'W'),
     ('a', '0x61', 'Na')]
    

    How could I correctly detect the three chars in this string using your code?

    Update: I've seen there's a regexp function, but that does not help yet:

    In [4]: emoji.get_emoji_regexp().split('asd😀►✧☘️❤️ok')
    Out[4]: ['asd', '😀', '►✧', '☘️', '', '❤️', 'ok']
    

    It still doesn't split the 'asd', '►✧' and 'ok' strings, and I cannot seem to differentiate them from the other grapheme clusters. (GitHub doesn't correctly show ☘️ and ❤️ inside code blocks; these are the glyph variants, not the text ones.)
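
One way to approach the splitting question without any library is a simplified grouping pass that keeps ZWJ sequences, U+FE0F and skin tone modifiers attached to the preceding character. This is a rough sketch, not a full implementation of Unicode grapheme cluster rules: flags, combining marks and keycaps are not handled, and split_clusters is a hypothetical helper.

```python
ZWJ = "\u200d"
VS16 = "\ufe0f"


def is_skin_tone(ch):
    """True for the five emoji modifiers U+1F3FB..U+1F3FF."""
    return 0x1F3FB <= ord(ch) <= 0x1F3FF


def split_clusters(text):
    """Group code points so ZWJ sequences, VS16 and skin tones stay attached."""
    clusters = []
    for ch in text:
        attach = bool(clusters) and (
            ch in (ZWJ, VS16)
            or is_skin_tone(ch)
            or clusters[-1].endswith(ZWJ)
        )
        if attach:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# The example from the question: woman, ZWJ, heart + VS16, ZWJ, kiss mark, ZWJ, woman.
kiss = "\U0001F469\u200d\u2764\ufe0f\u200d\U0001F48B\u200d\U0001F469"
parts = split_clusters("a" + kiss + "a")
```

For the example string this yields three items: 'a', the whole kiss sequence, and 'a'.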

    help wanted 
    opened by rsalmei 0
  • how to filter all the emojis?

    The raw text is: Pour l’amour de dieux me laisser plus jamais sortir avec un roux boutonneux 🙏🏻

    and the result is: Pour l’amour de dieux me laisser plus jamais sortir avec un roux boutonneux :folded_hands_light_skin_tone:

    Do I need a regex step to get the following final result? Pour l’amour de dieux me laisser plus jamais sortir avec un roux boutonneux
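
A regex step over the demojized text is one possible answer. The sketch below assumes the names contain only ASCII letters, digits, "_" and "-", which holds for the English example above; SHORTCODE and strip_shortcodes are hypothetical names.

```python
import re

# Matches ":name:" tokens as produced by demojize(); assumes names use only
# ASCII letters, digits, "_" and "-".
SHORTCODE = re.compile(r":[A-Za-z0-9_\-]+:")


def strip_shortcodes(demojized_text):
    """Remove ":name:" tokens and tidy the leftover whitespace."""
    without = SHORTCODE.sub("", demojized_text)
    return re.sub(r"\s{2,}", " ", without).strip()


demojized = (
    "Pour l\u2019amour de dieux me laisser plus jamais sortir avec un roux "
    "boutonneux :folded_hands_light_skin_tone:"
)
cleaned = strip_shortcodes(demojized)
```

Note that this also removes ordinary text that happens to look like a token, such as the ":30:" in "10:30:45", so it is only safe when the input is known not to contain such patterns. Recent releases of the package also provide a replace_emoji() function that works on the original text and avoids this problem.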

    help wanted 
    opened by liutianling 0
Releases(v2.2.0)
Owner: Taehoon Kim