Python Lex-Yacc


PLY (Python Lex-Yacc)

Copyright (C) 2001-2020 David M. Beazley (Dabeaz LLC) All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of David Beazley nor Dabeaz LLC may be used to endorse or promote products derived from this software without specific prior written permission.



PLY is a 100% Python implementation of the common parsing tools lex and yacc. Here are a few highlights:

  • PLY is very closely modeled after traditional lex/yacc. If you know how to use these tools in C, you will find PLY to be similar.

  • PLY provides very extensive error reporting and diagnostic information to assist in parser construction. The original implementation was developed for instructional purposes. As a result, the system tries to identify the most common types of errors made by novice users.

  • PLY provides full support for empty productions, error recovery, precedence specifiers, and moderately ambiguous grammars.

  • Parsing is based on LR-parsing which is fast, memory efficient, better suited to large grammars, and which has a number of nice properties when dealing with syntax errors and other parsing problems. Currently, PLY builds its parsing tables using the LALR(1) algorithm used in yacc.

  • PLY uses Python introspection features to build lexers and parsers.
    This greatly simplifies the task of parser construction since it reduces the number of files and eliminates the need to run a separate lex/yacc tool before running your program.

  • PLY can be used to build parsers for "real" programming languages. Although it is not ultra-fast due to its Python implementation, PLY can be used to parse grammars consisting of several hundred rules (as might be found for a language like C). The lexer and LR parser are also reasonably efficient when parsing typically sized programs. People have used PLY to build parsers for C, C++, ADA, and other real programming languages.

How to Use

PLY consists of two files: lex.py and yacc.py. These are contained within the ply directory, which may also be used as a Python package. To use PLY, simply copy the ply directory to your project and import lex and yacc from the associated ply package. For example:

from .ply import lex
from .ply import yacc

Alternatively, you can copy just the files lex.py and yacc.py individually and use them as modules however you see fit. For example:

import lex
import yacc

If you wish, you can use the provided setup script to install PLY into a virtual environment.

PLY has no third-party dependencies.

The docs/ directory contains complete documentation on how to use the system.

The example directory contains several different examples including a PLY specification for ANSI C as given in K&R 2nd Ed.

A simple example is found at the end of this document.


PLY requires the use of Python 3.6 or greater. However, you should use the latest Python release if possible. It should work on just about any platform.

Note: PLY does not support execution under python -OO. It can be made to work in that mode, but you'll need to change the programming interface with a decorator. See the documentation for details.
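For instance, the workaround looks roughly like this: a sketch using the documented ply.lex.TOKEN decorator, with t_NUMBER as an illustrative rule. Because the pattern is stored as a function attribute rather than a docstring, it survives python -OO:

```python
# Under python -OO, docstrings are stripped, so regex-in-docstring token
# rules lose their patterns. The TOKEN decorator attaches the pattern as a
# function attribute instead.
from ply.lex import TOKEN

number_pattern = r'\d+'

@TOKEN(number_pattern)
def t_NUMBER(t):
    t.value = int(t.value)
    return t

# The pattern now lives on the function itself:
print(t_NUMBER.regex)  # -> \d+
```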


Official Documentation is available at:

More information about PLY can be obtained on the PLY webpage at:

For a detailed overview of parsing theory, consult the excellent book "Compilers : Principles, Techniques, and Tools" by Aho, Sethi, and Ullman. The topics found in "Lex & Yacc" by Levine, Mason, and Brown may also be useful.

The GitHub page for PLY can be found at:


A special thanks is in order for all of the students in CS326 who suffered through about 25 different versions of these tools :-).

The CHANGES file acknowledges those who have contributed patches.

Elias Ioup did the first implementation of LALR(1) parsing in PLY-1.x. Andrew Waters and Markus Schoepflin were instrumental in reporting bugs and testing a revised LALR(1) implementation for PLY-2.0.


Here is a simple example showing a PLY implementation of a calculator with variables.

# -----------------------------------------------------------------------------
# A simple calculator with variables.
# -----------------------------------------------------------------------------

tokens = (
    'NAME', 'NUMBER',
    'PLUS', 'MINUS', 'TIMES', 'DIVIDE', 'EQUALS',
    'LPAREN', 'RPAREN',
)

# Tokens

t_PLUS    = r'\+'
t_MINUS   = r'-'
t_TIMES   = r'\*'
t_DIVIDE  = r'/'
t_EQUALS  = r'='
t_LPAREN  = r'\('
t_RPAREN  = r'\)'
t_NAME    = r'[a-zA-Z_][a-zA-Z0-9_]*'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

# Ignored characters
t_ignore = " \t"

def t_newline(t):
    r'\n+'
    t.lexer.lineno += t.value.count("\n")

def t_error(t):
    print(f"Illegal character {t.value[0]!r}")
    t.lexer.skip(1)

# Build the lexer
import ply.lex as lex
lexer = lex.lex()

# Precedence rules for the arithmetic operators
precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
    ('right', 'UMINUS'),
)

# dictionary of names (for storing variables)
names = { }

def p_statement_assign(p):
    'statement : NAME EQUALS expression'
    names[p[1]] = p[3]

def p_statement_expr(p):
    'statement : expression'
    print(p[1])

def p_expression_binop(p):
    '''expression : expression PLUS expression
                  | expression MINUS expression
                  | expression TIMES expression
                  | expression DIVIDE expression'''
    if p[2] == '+'  : p[0] = p[1] + p[3]
    elif p[2] == '-': p[0] = p[1] - p[3]
    elif p[2] == '*': p[0] = p[1] * p[3]
    elif p[2] == '/': p[0] = p[1] / p[3]

def p_expression_uminus(p):
    'expression : MINUS expression %prec UMINUS'
    p[0] = -p[2]

def p_expression_group(p):
    'expression : LPAREN expression RPAREN'
    p[0] = p[2]

def p_expression_number(p):
    'expression : NUMBER'
    p[0] = p[1]

def p_expression_name(p):
    'expression : NAME'
    try:
        p[0] = names[p[1]]
    except LookupError:
        print(f"Undefined name {p[1]!r}")
        p[0] = 0

def p_error(p):
    print(f"Syntax error at {p.value!r}")

import ply.yacc as yacc
parser = yacc.yacc()

while True:
    try:
        s = input('calc > ')
    except EOFError:
        break
    parser.parse(s)
Bug Reports and Patches

My goal with PLY is to simply have a decent lex/yacc implementation for Python. As a general rule, I don't spend huge amounts of time working on it unless I receive very specific bug reports and/or patches to fix problems. At this time, PLY is mature software and new features are no longer being added. If you think you have found a bug, please visit the PLY GitHub page to report an issue.

Take a Class!

If you'd like to learn more about compiler principles and have a go at implementing a compiler, come take a course.

-- Dave

  • How do I parse expressions containing set operators (UNION/INTERSECT/MINUS) and function calls in PLY?

    I have to do lexing and parsing using PLY. My expressions can look something like the following:

    1.) (func1(b) INTERSECT func1(c)) UNION func2(a)
    2.) func3(func1(b)) MINUS func1(d)

    a, b, c are names on the basis of which lists will be returned from the functions func1, func2, func3.

    I have figured out the grammar rules, they will be something like below:

    expression -> expression OP expression
    expression -> func1(variable) OP expression | func2(variable) OP expression | func3(variable) OP expression
    expression -> func1(expression) | func2(expression) | func3(expression)
    expression -> (expression)
    variable -> String
    OP -> UNION | INTERSECT | MINUS (or we can use the Python symbols instead: | & -)

    Below is how basic code look like :

    import ply.lex as lex

    1.) What should I declare in tokens so that the functions (func1, func2, func3) in the expression are identified correctly? Below is how the tokens are declared:

    tokens = (
        'LPAREN',
        'RPAREN',
        'INTERSECT',
        'UNION',
        'MINUS',
    )

    2.) What should I declare in the regular-expression rules so that INTERSECT, UNION, MINUS and the functions are identified correctly? Below is how the token rules are declared:

    t_LPAREN = r'\('
    t_RPAREN = r'\)'

    def t_newline(t):
        r'\n+'
        t.lexer.lineno += len(t.value)

    t_ignore = ' \t'

    def t_error(t):
        print("Illegal character '%s'" % t.value[0])
        t.lexer.skip(1)

    lexer = lex.lex()

    The role of the functions in my expressions is that, they will return a "list" and then the intersection/union/minus etc will be done on returned lists.

    How do I write my rules so that the functions, parentheses and UNION/INTERSECT/MINUS operations are identified correctly?

    I've been trying to find PLY examples that do this sort of thing, but I only end up finding the usual +/- arithmetic examples.

    Please help me to find the solution for this type of case.
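A sketch of one way to structure this (my own illustration, not from the thread; the function names and semantics are faked stand-ins): lex everything alphabetic as an identifier and use a reserved-word table to turn UNION/INTERSECT/MINUS into their own token types, then give function calls and grouping their own grammar rules:

```python
# Hedged sketch: keywords and function names are all lexed as identifiers;
# a reserved-word table decides the final token type. call_function is a
# hypothetical stand-in for func1/func2/func3.
import ply.lex as lex
import ply.yacc as yacc

reserved = {'UNION': 'UNION', 'INTERSECT': 'INTERSECT', 'MINUS': 'MINUS'}
tokens = ['NAME', 'LPAREN', 'RPAREN'] + list(reserved.values())

t_LPAREN = r'\('
t_RPAREN = r'\)'
t_ignore = ' \t'

def t_NAME(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*'
    t.type = reserved.get(t.value, 'NAME')  # keyword or plain identifier
    return t

def t_error(t):
    t.lexer.skip(1)

def call_function(name, arg):
    # Stand-in: pretend each function returns its argument set unchanged.
    return arg

precedence = (
    ('left', 'UNION', 'INTERSECT', 'MINUS'),
)

def p_expr_binop(p):
    '''expr : expr UNION expr
            | expr INTERSECT expr
            | expr MINUS expr'''
    if p[2] == 'UNION':
        p[0] = p[1] | p[3]
    elif p[2] == 'INTERSECT':
        p[0] = p[1] & p[3]
    else:
        p[0] = p[1] - p[3]

def p_expr_call(p):
    'expr : NAME LPAREN expr RPAREN'
    p[0] = call_function(p[1], p[3])

def p_expr_group(p):
    'expr : LPAREN expr RPAREN'
    p[0] = p[2]

def p_expr_name(p):
    'expr : NAME'
    p[0] = {p[1]}  # a bare variable stands for a one-element set

lexer = lex.lex()
parser = yacc.yacc(debug=False, write_tables=False)
result = parser.parse('(func1(b) INTERSECT func1(b)) UNION func2(a)')
print(sorted(result))  # -> ['a', 'b']
```

The key point is the t_NAME rule: because UNION/INTERSECT/MINUS match the same identifier pattern as function names, they must be reclassified via the reserved table rather than given their own t_ rules.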

    opened by nikitagupta55 20
  • TypeError in lex


    With the 3.6 update, I see the following TypeError where slimit makes a call to ply:

    line 893, in lex
        if '.' not in lextab:
    TypeError: argument of type 'module' is not iterable
    opened by rmfitzpatrick 18
  • Cannot install ply 3.5 via pip


    Error message is:

    $ pip install ply==3.5
    Collecting ply==3.5
      Could not find a version that satisfies the requirement ply==3.5 (from versions: 3.4)
      Some externally hosted files were ignored as access to them may be unreliable (use --allow-external ply to allow).
      No matching distribution found for ply==3.5
    opened by virtuald 15
  • Ply keeps regenerating even though nothing has changed


    Create a file with the following contents:

    #!/usr/bin/python3 -tt
    import ply.lex as lex
    import ply.yacc as yacc
    tokens = ['NAME']
    t_NAME = r'[a-z]+'   # (lexer rule restored so the snippet runs)
    lexer = lex.lex()
    def p_word(t):
        'word : NAME'
        t[0] = t[1]
    parser = yacc.yacc()
    result = parser.parse('abcd')

    Then do this:

    mkdir subdir; cd subdir ../

    Ply will generate parser.tab.py in the subdir. But if you run it again, it will regenerate and overwrite the old one. The produced files are always different. A sample diff between two runs looks like this:

    < _lr_action_items = {'NAME':([0,],[1,]),'$end':([1,2,],[-1,0,]),}
    > _lr_action_items = {'$end':([1,2,],[-1,0,]),'NAME':([0,],[1,]),}

    The same happens when you try to write output in a different directory with yacc.yacc(outputdir='some_dir').

    If you run the script with ./ it will work.
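If the regenerated table file is itself the problem, PLY's documented write_tables flag turns table caching off entirely, so nothing is written or overwritten. A minimal self-contained sketch:

```python
# With write_tables=False, yacc never writes (or regenerates) parser.tab.py;
# the tables are simply rebuilt in memory on each run. debug=False likewise
# suppresses parser.out.
import ply.lex as lex
import ply.yacc as yacc

tokens = ['NAME']
t_NAME = r'[a-zA-Z]+'
t_ignore = ' '

def t_error(t):
    t.lexer.skip(1)

def p_word(p):
    'word : NAME'
    p[0] = p[1]

def p_error(p):
    pass

lexer = lex.lex()
parser = yacc.yacc(write_tables=False, debug=False)
print(parser.parse('abcd'))  # -> abcd
```

This trades the startup-time saving of cached tables for a working directory that stays clean.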

    opened by jpakkane 12
  • Please make the files reproducible


    Whilst working on the Debian reproducible builds effort, I noticed that python-ply generates files with non-deterministic contents.

    I first had a quick go at fixing this by adding a bunch of sorts inside write_table, but looking deeper into the data structures it appears that "more" determinism is needed to ensure that the states are consistently numbered across builds. There are a whole bunch of iterations over dicts' items() throughout the table generation which, as you are no doubt aware, are non-deterministic. I'm sure some of these are harmless from a reproducibility point of view, so simply adding sorted() everywhere would be a total mess.

    Of course, one solution would be to wontfix this and simply decree that these files are non-deterministic... but then Debian etc. would not be able to ship these useful optimisations, as they would render the package unreproducible.

    opened by lamby 11
  • Question: How to add rules to parser program on the fly?


    I have a parser program with a set of rules. When a certain statement is encountered, I want to add new rules to my parser, much like an import statement. Is there a way I can add rules to the parser program on the fly (while the parser is running)?
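One workaround sketch (hypothetical usage, not an official PLY feature): since yacc.yacc() discovers p_* rule functions by introspecting the calling module, defining a new rule function and calling yacc.yacc() again rebuilds the parser with the extra rule, at the cost of regenerating the tables:

```python
# Sketch: "adding" a rule by defining a new p_* function and rebuilding.
import ply.lex as lex
import ply.yacc as yacc

tokens = ['A', 'B']
t_A = r'a'
t_B = r'b'
t_ignore = ' '

def t_error(t):
    t.lexer.skip(1)

def p_start_a(p):
    'start : A'
    p[0] = 'a-rule'

def p_error(p):
    pass

lexer = lex.lex()
parser = yacc.yacc(debug=False, write_tables=False)
print(parser.parse('a'))  # -> a-rule

# Later, a new rule function appears in the module namespace; rebuilding
# the parser picks it up, because yacc.yacc() re-introspects the module.
def p_start_b(p):
    'start : B'
    p[0] = 'b-rule'

parser = yacc.yacc(debug=False, write_tables=False)
print(parser.parse('b'))  # -> b-rule
```

Rebuilding mid-parse is not possible this way; the rebuild has to happen between parses.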

    opened by bifunctor 10
  • Unable to parse seemingly good input


    Hi Everyone,

    Please consider the following self contained program:

    import ply.yacc as yacc
    import ply.lex as lex
    tokens = ('N','D','I')
    t_N = r'N'
    t_D = r'D'
    t_I = r'I'
    t_ignore = r' '
    def t_error(e):
        pass
    start = 'C'
    def p_C(p):
        """ C : N Cbody D I """
        print "C"
    def p_Cbody(p):
        """ Cbody : I Cbody
                  | D Cbody
                  |
        """
        print "Cbody"
    def p_error(p):
        print "ERROR:",p
    print "Version:",yacc.__version__
    lexer = lex.lex()
    parser = yacc.yacc(debug=False,write_tables=False)
    parser.parse('N I D I')

    Unless I'm not understanding it correctly, the input string "N I D I" should be easily parsed by the above grammar. However ply gives me EOF error:

    Version: 3.7
    ERROR: None

    Does anybody know what is going on here? I'm using python 2.7.12. I've tried ply 3.10 with the same result.

    Thanks. -Mike

    opened by mikeyupol 9
  • Does ignoring comments work in Python 3.2?



    Discarding comments by returning None does not seem to work. If I add:

    def t_COMMENT(t):
        r'\#.*'
        # No return value. Token discarded

    and run the example from the documentation with Python 3.2, we get:

    Traceback (most recent call last):
      File "./parsing/", line 115, in
        yacc.parse(s)
      File "./parsing/ply/", line 303, in parse
        return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
      File "./parsing/ply/", line 1095, in parseopt_notrack
        tok = call_errorfunc(self.errorfunc, errtoken, self)
      File "./parsing/ply/", line 196, in call_errorfunc
        r = errorfunc(token)
      File "./parsing/", line 106, in p_error
        print("Syntax error at '%s'" % t.value)
    AttributeError: 'NoneType' object has no attribute 'value'

    The same is true for simple '\n'. Ideas?
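Note that the traceback above actually ends inside p_error itself: PLY calls the error handler with None when the error is reported at end of input, and this p_error dereferences t.value unconditionally. A guarded version (my suggestion, not from the thread) avoids the AttributeError:

```python
def p_error(t):
    # PLY passes t=None when the syntax error is reported at end of input,
    # so guard before touching t.value.
    if t is None:
        msg = "Syntax error at EOF"
    else:
        msg = "Syntax error at '%s'" % t.value
    print(msg)
    return msg

p_error(None)  # prints the EOF message instead of raising AttributeError
```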


    opened by 5nizza 9
  • Incorrect shift-reduce conflict resolution with precedence specifier



    While learning to use PLY and experimenting with it, I've run into a strange issue. My grammar has simple shift-reduce conflicts that I try to resolve via precedence specification. However, the resolution path taken by PLY seems to be incorrect: PLY chooses to reduce a rule instead of shifting the next token, even though the token has higher priority than the rule used for the reduction.

    The issue is illustrated in the attached files with a slightly modified version of the classical 'dangling else' problem.

    Archive contains

    1. and (bare minimum)
    2. parser.out (generated by PLY 3.9 from PyPI)
    3. bisontest.y (Yacc grammar - the same as in
    4. y.output generated by Yacc, bisontest.output generated by Bison

    What seems wrong

    The grammar part in question is (it's a pointless grammar only for illustrative purposes):

    if_stmt      :   IF stmt ELSE stmt FI
                 |   IF stmt                 %prec IFX

    Token ELSE has higher precedence than IFX:

    precedence = (
        ('nonassoc',    'IFX'),
        ('nonassoc',    'ELSE'),
    )

    As such, I expect the rule if_stmt -> IF stmt to have lower priority than the token ELSE, so the shift-reduce conflict should be resolved by shifting (per the documentation: if the current token has higher precedence than the rule on the stack, it is shifted). However, when the shift-reduce conflict arises (state 11 in parser.out), PLY chooses to reduce using the production if_stmt -> IF stmt instead of shifting the ELSE token:

    state 11
        (6) if_stmt -> IF stmt . ELSE stmt FI
        (7) if_stmt -> IF stmt .
        ELSE            reduce using rule 7 (if_stmt -> IF stmt .)
      ! ELSE            [ shift and go to state 14 ]

    Yacc and Bison, on the other hand, decide to shift token (see state 9 both in y.output and bisontest.output).

    What's interesting is that if the order of the rules is reversed, PLY makes the decision to shift. I.e., rewriting the grammar the following way

    if_stmt      :   IF stmt                 %prec IFX
                 |   IF stmt ELSE stmt FI


    state 9
        (6) if_stmt -> IF stmt .
        (7) if_stmt -> IF stmt . ELSE stmt FI
        ELSE            shift and go to state 13
      ! ELSE            [ reduce using rule 6 (if_stmt -> IF stmt .) ]

    Is it a bug?

    Thanks, Roman

    opened by RomaVis 8
  • Undesirable creation of file


    Hi there!

    I work on Mozilla Firefox, and I'm finding that while running certain automated tests, a lexer table file is created in Firefox's source directory.

    I have traced the creation of this file to the PLY package, with the following backtrace:

    -> main(sys.argv[1:])
    -> sys.exit(
    -> return self._run(argv)
    -> debug_command=args.debug_command, **vars(args.command_args))
    -> result = fn(**kwargs)
    -> return self._run_reftest(**kwargs)
    -> return reftest.run_desktop_test(**kwargs)
    -> import runreftest
    -> module = self._original_import(name, globals, locals, fromlist, level)
    -> from marionette import Marionette
    -> module = self._original_import(name, globals, locals, fromlist, level)
    -> from .runner import (
    -> module = self._original_import(name, globals, locals, fromlist, level)
    -> from .mixins import (
    -> module = self._original_import(name, globals, locals, fromlist, level)
    -> from .browsermob import (
    -> module = self._original_import(name, globals, locals, fromlist, level)
    -> from browsermobproxy import Server
    -> module = self._original_import(name, globals, locals, fromlist, level)
    -> from .server import Server
    -> module = self._original_import(name, globals, locals, fromlist, level)
    -> from .client import Client
    -> module = self._original_import(name, globals, locals, fromlist, level)
    -> import requests
    -> module = self._original_import(name, globals, locals, fromlist, level)
    -> from .packages.urllib3.contrib import pyopenssl
    -> module = self._original_import(name, globals, locals, fromlist, level)
    -> import OpenSSL.SSL
    -> module = self._original_import(name, globals, locals, fromlist, level)
    -> from OpenSSL import rand, crypto, SSL
    -> module = self._original_import(name, globals, locals, fromlist, level)
    -> from OpenSSL._util import (
    -> module = self._original_import(name, globals, locals, fromlist, level)
    -> binding = Binding()
    -> self._ensure_ffi_initialized()
    -> libraries=libraries,
    -> ffi.cdef("\n".join(cdef_sources))
    -> self._parser.parse(csource, override=override, packed=packed)
    -> self._internal_parse(csource)
    -> ast, macros, csource = self._parse(csource)
    -> ast = _get_parser().parse(csource)
    -> _parser_cache = pycparser.CParser()
    -> lextab=lextab)
    -> self.lexer = lex.lex(object=self, **kwargs)
    -> lexobj.writetab(lextab,outputdir)
    > /home/botond/dev/mozilla/central/other-licenses/ply/ply/
    -> tf.write("# This file automatically created by PLY (version %s). Don't edit!\n" % (tabfile,__version__))

    I realize the creation of this file is probably not a bug in PLY; I was just hoping that I could get some guidance from the developers of PLY that would help me determine which component of the stack to blame for its creation in the source directory.

    Specifically, I would be interested to know:

    • What is the purpose of this file?
    • What options does PLY provide for controlling whether and where this file is created?

    Thanks in advance!
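Answering in general terms from PLY's public API (a sketch, not the Mozilla-specific fix): the file is a cached, pre-computed lexer table; lex.lex() only writes it when optimization is enabled, and the lextab and outputdir keyword arguments control its module name and location:

```python
# Hedged sketch of PLY's documented lex() keyword arguments; the token
# definitions here are placeholders, not from the Firefox stack above.
import os
import tempfile
import ply.lex as lex

tokens = ('NAME',)
t_NAME = r'[a-zA-Z_]+'
t_ignore = ' \t'

def t_error(t):
    t.lexer.skip(1)

outdir = tempfile.mkdtemp()

# optimize=True makes lex write the cached table so later runs can skip
# rebuilding it; lextab names the generated module and outputdir says where
# the .py file goes (instead of the current working directory).
lexer = lex.lex(optimize=True, lextab='demo_lextab', outputdir=outdir)
print(os.path.exists(os.path.join(outdir, 'demo_lextab.py')))  # -> True
```

With optimize left at its default (False), no table file is written at all; in the backtrace above it is the caller (pycparser via cffi) that enables optimization.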

    opened by theres-waldo 8
  • Making the parser and lexer class-based


    I think that, in the interests of making ply more pythonic, we should provide parser and lexer classes that users extend with their parsing/lexing functions. The parser/lexer base class would then use getattr/hasattr to call these functions.

    The advantage of this is that, firstly, we'd be removing the 'magic' from ply (where it just grabs all functions from the current module starting with t_ or p_), which should make it more intuitive to use. Secondly, it allows for more logical namespacing of functions. At the moment, there can only be one parser/lexer per module, which is a very strange convention.

    I'm happy to do a PR with these changes but I was thinking of running it past the author first. If moving towards classes isn't in your plan then I might consider maintaining a fork of ply.

    opened by multimeric 8
  • Warn user about tuple flattening in precedence table


    When creating a precedence table, the example in the documentation suggests doing something like this:

    precedence = (
         ('left', 'PLUS', 'MINUS'),
         ('left', 'TIMES', 'DIVIDE'),
    )

    There's a very easy mistake to make here: if you have a single entry in the table and you miss the trailing comma, e.g.

    precedence = (
        ('left', 'ARROW')
    )

    Python flattens the tuple, so the precedence would simply be ('left', 'ARROW') instead of (('left', 'ARROW'),). The way to fix this is to simply add a , to the end of the entry, but it's not clear what the issue is once you make the mistake. This flattening trips the following line:

    My suggestion is to make this line a bit more descriptive, including what it expected, what it got, and a note about tuple flattening:

    I intend on making a pull request with a solution later.
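The pitfall is plain-Python tuple syntax, reproducible without PLY at all:

```python
# Parentheses alone do not create a nested tuple; only the trailing comma does.
flattened = (('left', 'ARROW'))    # just ('left', 'ARROW')
correct   = (('left', 'ARROW'),)   # a 1-tuple containing ('left', 'ARROW')

print(flattened == ('left', 'ARROW'))   # -> True
print(correct[0] == ('left', 'ARROW'))  # -> True
```

So the "table" PLY receives in the flattened case is a sequence of two strings, not a sequence of entries, which is what the error message would ideally explain.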

    opened by meetowl 3
  • Remove broken method and update readme


    This is a redo of #267. When #267 was closed, I saw the notice in 818ab0684e33f5f513fc839673ff56ea330b6380 and thought the PR had been closed just as part of closing everything out and retiring the project, which was understandable. Coming back to reference the notice now, though, I saw that it had been changed, which implies that the project is maintained again. So I am submitting the suggestion again.

    Looking through the git history, it looks like bc4321d25db0b37f5e6264c58827d69264aa0260 made some major changes that converted a global Prodnames variable into an attribute of the Grammar class. This change missed one usage of Prodnames in the Production class, which was then changed several years later to self.Prodnames in a813d9d76a4b67b0f6666c9933a59254a163d19b; that satisfied flake8, at least at the time, even though Production has no Prodnames attribute. Since, per the git history, lr_item has not worked for over ten years and is not referenced elsewhere in the project, this PR just removes it rather than trying to fix it.

    opened by wshanks 0
David Beazley
Author of the Python Essential Reference (Addison-Wesley), Python Cookbook (O'Reilly), and former computer science professor. Come take a class!