Overview

Lark - a parsing toolkit for Python

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

Lark can parse all context-free languages. Put simply, that means it can parse almost any programming language out there, and to some degree most natural languages too.

Who is it for?

  • Beginners: Lark is very friendly for experimentation. It can parse any grammar you throw at it, no matter how complicated or ambiguous, and do so efficiently. It also constructs an annotated parse-tree for you, using only the grammar and an input, and it gives you convenient and flexible tools to process that parse-tree.

  • Experts: Lark implements both Earley (SPPF) and LALR(1), and several different lexers, so you can trade off power and speed according to your requirements. It also provides a variety of sophisticated features and utilities.

What can it do?

  • Parse all context-free grammars, and handle any ambiguity gracefully
  • Build an annotated parse-tree automagically, no construction code required.
  • Provide first-rate performance in terms of both Big-O complexity and measured run-time (considering that this is Python ;)
  • Run on every Python interpreter (it's pure-python)
  • Generate a stand-alone parser (for LALR(1) grammars)

And many more features. Read ahead and find out!

Most importantly, Lark will save you time and prevent you from getting parsing headaches.

Install Lark

$ pip install lark --upgrade

Lark has no dependencies.

Syntax Highlighting

Lark provides syntax highlighting for its grammar files (*.lark).

Clones

These are implementations of Lark in other languages. They accept Lark grammars, and provide similar utilities.

Hello World

Here is a little program to parse "Hello, World!" (Or any other similar phrase):

from lark import Lark

l = Lark('''start: WORD "," WORD "!"

            %import common.WORD   // imports from terminal library
            %ignore " "           // Disregard spaces in text
         ''')

print( l.parse("Hello, World!") )

And the output is:

Tree(start, [Token(WORD, 'Hello'), Token(WORD, 'World')])

Notice punctuation doesn't appear in the resulting tree. It's automatically filtered away by Lark.

Fruit flies like bananas

Lark is great at handling ambiguity. Here is the result of parsing the phrase "fruit flies like bananas":

(figure: the ambiguous parse of "fruit flies like bananas")

Read the code here, and see more examples here.

List of main features

  • Builds a parse-tree (AST) automagically, based on the structure of the grammar
  • Earley parser
    • Can parse all context-free grammars
    • Full support for ambiguous grammars
  • LALR(1) parser
    • Fast and light, competitive with PLY
    • Can generate a stand-alone parser (read more)
  • CYK parser, for highly ambiguous grammars
  • EBNF grammar
  • Unicode fully supported
  • Python 2 & 3 compatible
  • Automatic line & column tracking
  • Standard library of terminals (strings, numbers, names, etc.)
  • Import grammars from Nearley.js (read more)
  • Extensive test suite
  • MyPy support using type stubs
  • And much more!

See the full list of features here

Comparison to other libraries

Performance comparison

Lark is the fastest and lightest (lower is better)

Run-time Comparison

Memory Usage Comparison

Check out the JSON tutorial for more details on how the comparison was made.

Note: I really wanted to add PLY to the benchmark, but I couldn't find a working JSON parser anywhere written in PLY. If anyone can point me to one that actually works, I would be happy to add it!

Note 2: The parsimonious code has been optimized for this specific test, unlike the other benchmarks (Lark included). Its "real-world" performance may not be as good.

Feature comparison

Library       Algorithm        Grammar      Builds tree?  Supports ambiguity?  Can handle every CFG?  Line/Column tracking  Generates Stand-alone
Lark          Earley/LALR(1)   EBNF         Yes!          Yes!                 Yes!                   Yes!                  Yes! (LALR only)
PLY           LALR(1)          BNF          No            No                   No                     No                    No
PyParsing     PEG              Combinators  No            No                   No*                    No                    No
Parsley       PEG              EBNF         No            No                   No*                    No                    No
Parsimonious  PEG              EBNF         Yes           No                   No*                    No                    No
ANTLR         LL(*)            EBNF         Yes           No                   Yes?                   Yes                   No

(* PEGs cannot handle non-deterministic grammars. Also, according to Wikipedia, it remains unanswered whether PEGs can really parse all deterministic CFGs)

Projects using Lark

  • Poetry - A utility for dependency management and packaging
  • tartiflette - a GraphQL server by Dailymotion
  • PyQuil - Python library for quantum programming using Quil
  • Preql - An interpreted relational query language that compiles to SQL
  • Hypothesis - Library for property-based testing
  • mappyfile - a MapFile parser for working with MapServer configuration
  • synapse - an intelligence analysis platform
  • Datacube-core - Open Data Cube analyses continental scale Earth Observation data through time
  • SPFlow - Library for Sum-Product Networks
  • Torchani - Accurate Neural Network Potential on PyTorch
  • Command-Block-Assembly - An assembly language, and C compiler, for Minecraft commands
  • EQL - Event Query Language
  • Fabric-SDK-Py - Hyperledger fabric SDK with Python 3.x
  • required - multi-field validation using docstrings
  • miniwdl - A static analysis toolkit for the Workflow Description Language
  • pytreeview - a lightweight tree-based grammar explorer
  • harmalysis - A language for harmonic analysis and music theory
  • gersemi - A CMake code formatter

Using Lark? Send me a message and I'll add your project!

License

Lark uses the MIT license.

(The standalone tool is under MPL2)

Contribute

Lark is currently accepting pull requests. See How to develop Lark.

Sponsor

If you like Lark, and want to see it grow, please consider sponsoring us!

Contact the author

Questions about code are best asked on gitter or in the issues.

For anything else, I can be reached by email at erezshin at gmail com.

-- Erez

Comments
  • Bug in handling ambiguity?

    When running this code:

    grammar = """
    expression: "c" | "d" | "c" "d"
    unit: expression "a"
        | "a" expression
        | "b" unit
        | "b" expression
    start: unit*
    
    %import common.WS
    %ignore WS
    """
    
    l = Lark(grammar, parser='earley', ambiguity='explicit')
    print(l.parse('b c d a a c').pretty())
    

    It is expected to have an ambiguous parse, but there is no '_ambig' node.

    At least these options are valid:

    unit(
        b
        unit(
            expression(
                c
                d
            )
            a
        )
    )
    unit(
        a
        expression(
            c
        )
    )
    

    and this parse:

    unit(
        b
        expression(
            c
        )
    )
    unit(
        expression(
            d
        )
        a
    )
    unit(
        a
        expression(
            c
        )
    )
    

    The only parse that comes back is the second one. When one removes the "b" expression option, you get the first one.

    bug 
    opened by uriva 67
  • Lark runs on Pyodide! (Online IDE)

    Lark runs out-of-the-box inside the browser using Pyodide:

    (screenshot: Lark running in Pyodide in the browser)

    Pyodide is a CPython 3.7 interpreter compiled to web-assembly (wasm). Here's the Python console from above: https://pyodide.cdn.iodide.io/console.html

    Maybe this can be helpful as a quick start for anyone who wants to get into Lark quickly?

    discussion 
    opened by phorward 36
  • 0.11.2: pytest is failing

    I'm trying to package your module as an rpm package, so I'm using the typical build, install, and test cycle for building a package from a non-root account:

    • "setup.py build"
    • "setup.py install --root </install/prefix>"
    • "pytest" with PYTHONPATH pointing to sitearch and sitelib inside </install/prefix>

    May I ask for help, because a few test units are failing:

    + PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-lark-parser-0.11.3-2.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-lark-parser-0.11.3-2.fc35.x86_64/usr/lib/python3.8/site-packages
    + /usr/bin/pytest -ra
    =========================================================================== test session starts ============================================================================
    platform linux -- Python 3.8.11, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
    benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
    Using --randomly-seed=2126451817
    rootdir: /home/tkloczko/rpmbuild/BUILD/lark-0.11.3
    plugins: forked-1.3.0, shutil-1.7.0, virtualenv-1.7.0, expect-1.1.0, flake8-1.0.7, timeout-1.4.2, betamax-0.8.1, freezegun-0.4.2, aspectlib-1.5.2, toolbox-0.5, rerunfailures-9.1.1, requests-mock-1.9.3, cov-2.12.1, pyfakefs-4.5.0, flaky-3.7.0, benchmark-3.4.1, xdist-2.3.0, pylama-7.7.1, datadir-1.3.1, regressions-2.2.0, cases-3.6.3, xprocess-0.18.1, black-0.3.12, checkdocs-2.7.1, anyio-3.3.0, Faker-8.11.0, asyncio-0.15.1, trio-0.7.0, httpbin-1.0.0, subtests-0.5.0, isort-2.0.0, hypothesis-6.14.6, mock-3.6.1, profiling-1.7.0, randomly-3.8.0
    collected 998 items
    
    tests/test_tools.py ....                                                                                                                                             [  0%]
    tests/test_logger.py ...                                                                                                                                             [  0%]
    tests/test_reconstructor.py .......                                                                                                                                  [  1%]
    tests/test_trees.py ..............                                                                                                                                   [  2%]
    tests/test_parser.py ...............s.....ss.s.s......ss.....ss..s..s.......s......s.s.s...s.....s...s.s......s...s.....s......s................s...s...s........... [ 17%]
    ..s...................s.........s.....s...................s...s..s...s........s...................s.........s....................................................... [ 33%]
    ...........s....s.......s........................s.................s.............s..s...................s....s.s...ss......................s...............s..s.s... [ 50%]
    .........s..s...s....................s..................s..........s...s................s.........s..s..s.....s........s.....s.s.......s......s......s....s......... [ 66%]
    ...........s............s.....s....................s.s............................s.......s....ss..ss..s...........s.ss......s...............s.s........s.s.s...s.s. [ 82%]
    ....ss...............s.......s.........................s....s............s..........s..........................................                                      [ 95%]
    tests/test_lexer.py .                                                                                                                                                [ 95%]
    tests/test_nearley/test_nearley.py ..FF...F                                                                                                                          [ 96%]
    tests/test_cache.py ....                                                                                                                                             [ 96%]
    . .                                                                                                                                                                  [ 97%]
    tests/test_cache.py F.                                                                                                                                               [ 97%]
    tests/test_grammar.py .......F.......                                                                                                                                [ 98%]
    tests/test_tree_forest_transformer.py ............                                                                                                                   [100%]
    
    ================================================================================= FAILURES =================================================================================
    _________________________________________________________________________ TestNearley.test_include _________________________________________________________________________
    
    self = <tests.test_nearley.test_nearley.TestNearley testMethod=test_include>
    
        def test_include(self):
            fn = os.path.join(NEARLEY_PATH, 'test/grammars/folder-test.ne')
    >       with open(fn) as f:
    E       FileNotFoundError: [Errno 2] No such file or directory: '/home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_nearley/nearley/test/grammars/folder-test.ne'
    
    tests/test_nearley/test_nearley.py:48: FileNotFoundError
    ______________________________________________________________________ TestNearley.test_multi_include ______________________________________________________________________
    
    self = <tests.test_nearley.test_nearley.TestNearley testMethod=test_multi_include>
    
        def test_multi_include(self):
            fn = os.path.join(NEARLEY_PATH, 'test/grammars/multi-include-test.ne')
    >       with open(fn) as f:
    E       FileNotFoundError: [Errno 2] No such file or directory: '/home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_nearley/nearley/test/grammars/multi-include-test.ne'
    
    tests/test_nearley/test_nearley.py:61: FileNotFoundError
    ___________________________________________________________________________ TestNearley.test_css ___________________________________________________________________________
    
    self = <tests.test_nearley.test_nearley.TestNearley testMethod=test_css>
    
        def test_css(self):
            fn = os.path.join(NEARLEY_PATH, 'examples/csscolor.ne')
    >       with open(fn) as f:
    E       FileNotFoundError: [Errno 2] No such file or directory: '/home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_nearley/nearley/examples/csscolor.ne'
    
    tests/test_nearley/test_nearley.py:28: FileNotFoundError
    __________________________________________________________________________ TestCache.test_imports __________________________________________________________________________
    
    self = <tests.test_cache.TestCache testMethod=test_imports>
    
        def test_imports(self):
            g = """
            %import .grammars.ab (startab, expr)
            """
    >       parser = Lark(g, parser='lalr', start='startab', cache=True)
    
    tests/test_cache.py:131:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    lark/lark.py:299: in __init__
        self.grammar, used_files = load_grammar(grammar, self.source_path, self.options.import_paths, self.options.keep_all_tokens)
    lark/load_grammar.py:1229: in load_grammar
        builder.load_grammar(grammar, source)
    lark/load_grammar.py:1082: in load_grammar
        self.do_import(dotted_path, base_path, aliases, mangle)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    
    self = <lark.load_grammar.GrammarBuilder object at 0x7f9944b73640>, dotted_path = (Token('RULE', 'grammars'), Token('RULE', 'ab')), base_path = '/usr/bin'
    aliases = {Token('RULE', 'expr'): Token('RULE', 'expr'), Token('RULE', 'startab'): Token('RULE', 'startab')}, base_mangle = None
    
        def do_import(self, dotted_path, base_path, aliases, base_mangle=None):
            assert dotted_path
            mangle = _get_mangle('__'.join(dotted_path), aliases, base_mangle)
            grammar_path = os.path.join(*dotted_path) + EXT
            to_try = self.import_paths + ([base_path] if base_path is not None else []) + [stdlib_loader]
            for source in to_try:
                try:
                    if callable(source):
                        joined_path, text = source(base_path, grammar_path)
                    else:
                        joined_path = os.path.join(source, grammar_path)
                        with open(joined_path, encoding='utf8') as f:
                            text = f.read()
                except IOError:
                    continue
                else:
                    h = hashlib.md5(text.encode('utf8')).hexdigest()
                    if self.used_files.get(joined_path, h) != h:
                        raise RuntimeError("Grammar file was changed during importing")
                    self.used_files[joined_path] = h
    
                    gb = GrammarBuilder(self.global_keep_all_tokens, self.import_paths, self.used_files)
                    gb.load_grammar(text, joined_path, mangle)
                    gb._remove_unused(map(mangle, aliases))
                    for name in gb._definitions:
                        if name in self._definitions:
                            raise GrammarError("Cannot import '%s' from '%s': Symbol already defined." % (name, grammar_path))
    
                    self._definitions.update(**gb._definitions)
                    break
            else:
                # Search failed. Make Python throw a nice error.
    >           open(grammar_path, encoding='utf8')
    E           FileNotFoundError: [Errno 2] No such file or directory: 'grammars/ab.lark'
    
    lark/load_grammar.py:1162: FileNotFoundError
    ______________________________________________________________________ TestGrammar.test_override_rule ______________________________________________________________________
    
    self = <tests.test_grammar.TestGrammar testMethod=test_override_rule>
    
        def test_override_rule(self):
            # Overrides the 'sep' template in existing grammar to add an optional terminating delimiter
            # Thus extending it beyond its original capacity
            p = Lark("""
                %import .test_templates_import (start, sep)
    
                %override sep{item, delim}: item (delim item)* delim?
                %ignore " "
            """, source_path=__file__)
    
            a = p.parse('[1, 2, 3]')
            b = p.parse('[1, 2, 3, ]')
            assert a == b
    
    >       self.assertRaises(GrammarError, Lark, """
                %import .test_templates_import (start, sep)
    
                %override sep{item}: item (delim item)* delim?
            """)
    
    tests/test_grammar.py:39:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    lark/lark.py:299: in __init__
        self.grammar, used_files = load_grammar(grammar, self.source_path, self.options.import_paths, self.options.keep_all_tokens)
    lark/load_grammar.py:1229: in load_grammar
        builder.load_grammar(grammar, source)
    lark/load_grammar.py:1082: in load_grammar
        self.do_import(dotted_path, base_path, aliases, mangle)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    
        def do_import(self, dotted_path, base_path, aliases, base_mangle=None):
            assert dotted_path
            mangle = _get_mangle('__'.join(dotted_path), aliases, base_mangle)
            grammar_path = os.path.join(*dotted_path) + EXT
            to_try = self.import_paths + ([base_path] if base_path is not None else []) + [stdlib_loader]
            for source in to_try:
                try:
                    if callable(source):
                        joined_path, text = source(base_path, grammar_path)
                    else:
                        joined_path = os.path.join(source, grammar_path)
                        with open(joined_path, encoding='utf8') as f:
                            text = f.read()
                except IOError:
                    continue
                else:
                    h = hashlib.md5(text.encode('utf8')).hexdigest()
                    if self.used_files.get(joined_path, h) != h:
                        raise RuntimeError("Grammar file was changed during importing")
                    self.used_files[joined_path] = h
    
                    gb = GrammarBuilder(self.global_keep_all_tokens, self.import_paths, self.used_files)
                    gb.load_grammar(text, joined_path, mangle)
                    gb._remove_unused(map(mangle, aliases))
                    for name in gb._definitions:
                        if name in self._definitions:
                            raise GrammarError("Cannot import '%s' from '%s': Symbol already defined." % (name, grammar_path))
    
                    self._definitions.update(**gb._definitions)
                    break
            else:
                # Search failed. Make Python throw a nice error.
    >           open(grammar_path, encoding='utf8')
    E           FileNotFoundError: [Errno 2] No such file or directory: 'test_templates_import.lark'
    
    lark/load_grammar.py:1162: FileNotFoundError
    ============================================================================= warnings summary =============================================================================
    tests/test_cache.py:110
      /home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_cache.py:110: DeprecationWarning: invalid escape sequence \d
        g = """
    
    tests/test_cache.py:48
      /home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_cache.py:48: PytestCollectionWarning: cannot collect test class 'TestT' because it has a __init__ constructor (from: tests/test_cache.py)
        class TestT(Transformer):
    
    tests/test_parser.py:166
      /home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_parser.py:166: DeprecationWarning: invalid escape sequence \d
        g = """
    
    tests/test_reconstructor.py:75
      /home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_reconstructor.py:75: DeprecationWarning: invalid escape sequence \s
        g = """
    
    tests/test_reconstructor.py:90
      /home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_reconstructor.py:90: DeprecationWarning: invalid escape sequence \s
        g = """
    
    tests/test_reconstructor.py:154
      /home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_reconstructor.py:154: DeprecationWarning: invalid escape sequence \s
        g1 = """
    
    tests/test_reconstructor.py:162
      /home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_reconstructor.py:162: DeprecationWarning: invalid escape sequence \s
        g2 = """
    
    -- Docs: https://docs.pytest.org/en/stable/warnings.html
    ========================================================================= short test summary info ==========================================================================
    SKIPPED [7] tests/test_parser.py:2005: Currently only Earley supports priority sum in rules
    SKIPPED [2] tests/test_parser.py:2077: No empty rules
    SKIPPED [7] tests/test_parser.py:2309: Serialize currently only works for LALR parsers without custom lexers (though it should be easy to extend)
    SKIPPED [9] tests/test_parser.py:1045: cStringIO not available
    SKIPPED [3] tests/test_parser.py:2355: match_examples() not supported for CYK/old custom lexer
    SKIPPED [9] tests/test_parser.py:1249: Flattening list isn't implemented (and may never be)
    SKIPPED [2] tests/test_parser.py:1961: Doesn't work for CYK
    SKIPPED [2] tests/test_parser.py:2231: Empty rules
    SKIPPED [2] tests/test_parser.py:2220: Empty rules
    SKIPPED [2] tests/test_parser.py:1120: Takes forever
    SKIPPED [9] tests/test_parser.py:1265: Flattening list isn't implemented (and may never be)
    SKIPPED [6] tests/test_parser.py:1705: Only standard lexers care about token priority
    SKIPPED [2] tests/test_parser.py:1512: No empty rules
    SKIPPED [2] tests/test_parser.py:1194: No empty rules
    SKIPPED [2] tests/test_parser.py:1650: No empty rules
    SKIPPED [6] tests/test_parser.py:2435: interactive_parser error handling only works with LALR for now
    SKIPPED [6] tests/test_parser.py:2398: interactive_parser is only implemented for LALR at the moment
    SKIPPED [2] tests/test_parser.py:1451: No empty rules
    SKIPPED [9] tests/test_parser.py:1281: Flattening list isn't implemented (and may never be)
    SKIPPED [2] tests/test_parser.py:1213: No empty rules
    SKIPPED [2] tests/test_parser.py:1233: No empty rules
    SKIPPED [4] tests/test_parser.py:2194: Priority not handled correctly right now
    SKIPPED [2] tests/test_parser.py:1915: %declare/postlex doesn't work with dynamic
    SKIPPED [2] tests/test_parser.py:1938: %declare/postlex doesn't work with dynamic
    SKIPPED [1] tests/test_parser.py:754: Only relevant for the dynamic_complete parser
    SKIPPED [1] tests/test_parser.py:402: Only relevant for the dynamic_complete parser
    FAILED tests/test_nearley/test_nearley.py::TestNearley::test_include - FileNotFoundError: [Errno 2] No such file or directory: '/home/tkloczko/rpmbuild/BUILD/lark-0.11.3...
    FAILED tests/test_nearley/test_nearley.py::TestNearley::test_multi_include - FileNotFoundError: [Errno 2] No such file or directory: '/home/tkloczko/rpmbuild/BUILD/lark-...
    FAILED tests/test_nearley/test_nearley.py::TestNearley::test_css - FileNotFoundError: [Errno 2] No such file or directory: '/home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tes...
    FAILED tests/test_cache.py::TestCache::test_imports - FileNotFoundError: [Errno 2] No such file or directory: 'grammars/ab.lark'
    FAILED tests/test_grammar.py::TestGrammar::test_override_rule - FileNotFoundError: [Errno 2] No such file or directory: 'test_templates_import.lark'
    ========================================================= 5 failed, 889 passed, 103 skipped, 7 warnings in 44.97s ==========================================================
    pytest-xprocess reminder::Be sure to terminate the started process by running 'pytest --xkill' if you have not explicitly done so in your fixture with 'xprocess.getinfo(<process_name>).terminate()'.
    
    opened by kloczek 35
  • Fix #696 now providing the correct amount of placeholders

    p = Lark("""!start: ["a" "b" "c"] """, maybe_placeholders=True)
    p.parse("").children
    

    now returns [None, None, None] instead of [None]

    same for !start: ["a" ["b" "c"]].

    opened by ornariece 32
  • Changing file-extension for standalone grammar definitions from .g?

    Currently standalone files like common and the example json use the file extension .g. However, it looks like .g is already associated with the ANTLR parser generator. While I suppose it's possible to make Lark compatible with ANTLR, in the meantime it's probably best to use a different file extension. I would propose the extension .lrk, as it doesn't seem to be used by anything currently.

    I'm about to submit a pull request to change the file extensions to .lrk in the relevant file names and in the code referencing the .g extension. It's on a separate branch so if you want to use a different extension that should be easy enough to change.

    discussion 
    opened by RobRoseKnows 31
  • Fix `python.number` pattern

    Python doesn't accept numbers with a _ at the beginning or end, or with more than one _ between digits:

    >>> 69420
    69420
    >>> 69_420
    69420
    >>> 69__420
      File "<stdin>", line 1
        69__420
          ^
    SyntaxError: invalid decimal literal
    >>> 69_420_
      File "<stdin>", line 1
        69_420_
              ^
    SyntaxError: invalid decimal literal
    >>> _69_420
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name '_69_420' is not defined
    
    >>> 03.1415
    3.1415
    >>> 0_3.14_15
    3.1415
    >>> 0__3.14_15
      File "<stdin>", line 1
        0__3.14_15
         ^
    SyntaxError: invalid decimal literal
    >>> 0_3.14__15
      File "<stdin>", line 1
        0_3.14__15
              ^
    SyntaxError: invalid decimal literal
    >>> 0_3.14_15_
      File "<stdin>", line 1
        0_3.14_15_
                 ^
    SyntaxError: invalid decimal literal
    >>> 0_3._14_15
      File "<stdin>", line 1
        0_3._14_15
           ^
    SyntaxError: invalid decimal literal
    >>> 0_3_.14_15
      File "<stdin>", line 1
        0_3_.14_15
           ^
    SyntaxError: invalid decimal literal
    >>> _0_3.14_15
      File "<stdin>", line 1
        _0_3.14_15
        ^^^^^^^^^^
    SyntaxError: invalid syntax. Perhaps you forgot a comma?
    

    The same goes for complex numbers. And yes, Python recognizes _xxx as a name even when x is a digit, but it's still not a number, so this doesn't affect us.

    The current implementation only filters numbers with _ in the beginning, so here's the fix for the other cases.


    Hopefully we can still make backward-incompatible changes, so it's fine to change IMAG_NUMBER to COMPLEX_NUMBER.

    I also tested \d(?:_?\d+)* for DEC_NUMBER but haven't seen any significant performance changes (everything is within the normal range, considering that I was not using a stable environment).
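
The candidate pattern mentioned above can be sanity-checked directly with Python's re module; a quick sketch of the rule it encodes:

```python
import re

# \d(?:_?\d+)* : digit groups separated by at most one underscore,
# with no leading or trailing underscore.
DEC = re.compile(r'\d(?:_?\d+)*')

for ok in ("69420", "69_420", "0_3"):
    assert DEC.fullmatch(ok)
for bad in ("69__420", "69_420_", "_69_420"):
    assert not DEC.fullmatch(bad)
print("pattern behaves as described")
```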

    opened by 0dminnimda 30
  • Newbie questions

    Consider the below snippet:

    from lark import Lark, inline_args, Transformer
    
    grammars = [
        """
            ?start: sum | NAME "=" sum
            ?sum: product | sum "+" product | sum "-" product
            ?product: atom | product "*" atom | product "/" atom
            ?atom: NUMBER | "-" atom | NAME | "(" sum ")"
            %import common.CNAME -> NAME
            %import common.NUMBER
            %import common.WS_INLINE
            %ignore WS_INLINE
        """,
        """
            ?start: sum | NAME "=" sum
            ?sum: product | sum "+" product | sum "-" product
            ?product: atom | product "*" atom | product "/" atom
            ?atom: NUMBER | "-" atom | NAME | "(" sum ")"
            EQUAL: "="
            LPAR: "("
            RPAR: ")"
            SLASH: "/"
            STAR: "*"
            MINUS: "-"
            PLUS: "+"
            %import common.CNAME -> NAME
            %import common.NUMBER
            %import common.WS_INLINE
            %ignore WS_INLINE
        """,
        """
            ?start: sum | NAME "=" sum
            ?sum: product | sum "+" product | sum "-" product
            ?product: atom | product "*" atom | product "/" atom
            ?atom: NUMBER | "-" atom | NAME | "(" sum ")"
            OPERATOR : "=" | "(" | ")" | "/" | "*" | "-" | "+"
            %import common.CNAME -> NAME
            %import common.NUMBER
            %import common.WS_INLINE
            %ignore WS_INLINE
        """
    ]
    
    
    def test(grammar, text):
        parser = Lark(grammar, start='start')
        # print(parser.parse(text).pretty())
        print(sorted(list(set([t.type for t in parser.lex(text)]))))
        # print([t.name for t in parser.lexer.tokens])
    
    
    text = "x = 1+2 - 3-4 - 5*6 - 7/8 - (9+10-11*12/13)"
    for i, grammar in enumerate(grammars):
        print('grammar {}'.format(i).center(80, '*'))
        test(grammar, text)
    

    whose output is:

    ***********************************grammar 0************************************
    ['NAME', 'NUMBER', '__EQUAL', '__LPAR', '__MINUS', '__PLUS', '__RPAR', '__SLASH', '__STAR']
    ***********************************grammar 1************************************
    ['EQUAL', 'LPAR', 'MINUS', 'NAME', 'NUMBER', 'PLUS', 'RPAR', 'SLASH', 'STAR']
    ***********************************grammar 2************************************
    ['NAME', 'NUMBER', '__EQUAL', '__LPAR', '__MINUS', '__PLUS', '__RPAR', '__SLASH', '__STAR']
    

    got some questions:

    1. About grammar 0: the token types '__EQUAL', '__LPAR', '__MINUS', '__PLUS', '__RPAR', '__SLASH', '__STAR' are generated automagically. How does this work internally?

    2. About grammar 1: following this method I'll be able to easily identify the token types, so I can use them for syntax highlighting with QScintilla. Is there any problem with this approach?

    3. About grammar 2: if I want to syntax-highlight a group of similar tokens, how can I do that? In this case the token types are still generated automatically, instead of becoming OPERATOR. I'd like to be able to apply one QScintilla style to a bunch of related tokens (i.e. OPERATOR: "=" | "(" | ")" | "/" | "*" | "-" | "+")

    opened by brupelo 30
  • Fails to create parser using a big grammar (memory increases indefinitely)

    Fails to create parser using a big grammar (memory increases indefinitely)

    Hello,

    I'm trying to parse a text in Python 3.5, using the 0.5.6 release of Lark. I have a very long grammar in this format:

    start: title  field+
    
    field: rule1 -> alias1
    	| rule2 -> alias2
    	[…]
    	| rule386 -> alias386
    
    //AUXILIARY TERMS
    title: ...
    term1: ...
    [...]
    term90:
    
    //RULES GROUP 1
    rule1: ...
    [...]
    rule276: ...
    
    //RULES GROUP2
    rule277: ...
    [...]
    rule386: ...
    
    //TERMINALS
    [...]
    

    Here are some examples of the syntax of rules and terms:

    //AUXILIARY TERMS
    adexpmsg: CHARACTER*
    aidequipment: (("N"|"S") [equipmentcode])|equipmentcode
    aircraftid: ALPHANUM~2..7
    
    //PRIMARY FIELDS
    aatot: _HYPHEN _sep "AATOT" _sep timehhmm
    ad: _HYPHEN _sep "AD" _sep adid [_sep (fl|flblock)] [_sep eto] [_sep to] [_sep cto] [_sep sto] [_sep ptstay] [_sep ptrfl] [_sep ptrulchg] [_sep (ptspeed|ptmach)]
    ada: _HYPHEN _sep "ADA" _sep date
    
    //SUBFIELDS
    addrinfo: _HYPHEN _sep "ADDRINFO" _sep networktype _sep fac
    adid: _HYPHEN _sep "ADID" _sep (icaoaerodrome | "ZZZZ")
    adname: _HYPHEN _sep "ADNAME" _sep (LIM_CHAR)~1..50
    
    //TERMINALS
    _sep: SEP*
    ALPHA: /[A-Z]{1}/
    DIGIT: /[0-9]{1}/
    ALPHANUM: ALPHA|DIGIT
    SPACE: " "
    _HYPHEN: "-"
    FEF: "\n"|"\r"
    SEP: (SPACE|FEF)
    SPECIAL: SPACE
    	|"("
    	|")"
    	|"?"
    	|":"
    	|"."
    	|","
    	|"'"
    	|"="
    	|"+"
    	|"/"
    CHARACTER: ALPHA|DIGIT|SPECIAL|FEF|_HYPHEN
    LIM_CHAR: ALPHA|DIGIT|SPECIAL|FEF
    START_OF_FIELD: _HYPHEN
    %import common.WS
    

    The text format I'm trying to parse is the ADEXP format, which is a succession of fields, all of them beginning with a "-", followed by a field name and one or more values, which can themselves be a new field. The first field is "-TITLE". Here is an example of an ADEXP message:

    -TITLE BFD -REFDATA -SENDER -FAC BORD -RECVR -FAC A -SEQNUM 001 -ARCID RYR743D -SSRCODE A1122 -NBARC 1 
    -ARCTYP B738 -ADEP EDDH -ROUTE N0441F370 OLRAK DEGOL LAPRO PPG ALBER  -BEGIN RTEPTS -PT -PTID OLRAK -PT 
    -PTID DEGOL -PT -PTID LAPRO -PT -PTID PPG -PT -PTID ALBER -END RTEPTS -ADES LEBL -BEGIN EQCST -EQPT Y/EQ 
    -EQPT W/EQ -EQPT R/EQ -END EQCST -RFL F370 -SPEED N0441 -EOBT 2359 -WKTRC M
    

    This format allows separators between almost all fields, but some fields have to come directly one after the other, without any separator, so I had to make all separators explicit in all rules. You can find the complete syntax of all fields in detail here.

    I tested some rules individually and I'm able to parse a text with these rules.

    But when I use the entire grammar above (~900 lines), Lark can't create the parser:

    When I execute the code :

    [...]
    print("Creating parser")
    parser = Lark(grammar, parser='earley')
    print("Parser created")
    tree_res = parser.parse(text)
    print("Text parsed")
    

    The program displays "Creating parser" and proceeds to create it. RAM usage increases progressively, until it reaches 12-13 GB and the process gets killed by the system (after 15-20 minutes).

    I also tried different parsers with different lexers, but it doesn't change anything. I obtained the same result when I used a different Python interpreter (PyPy).

    I would like to know if you have any idea why it takes this long before finally getting killed. Is my grammar too big? Or maybe too ambiguous? Tell me if you need any more details about the grammar or anything else.

    Thank you in advance for your answer.

    opened by dryslope 29
  • Improvement: Use Cython for Speed

    Improvement: Use Cython for Speed

    Having written a LALR parser for my language (https://github.com/eddieschoute/quippy), it still takes many seconds to parse a file of 100k LOC. In one benchmark, it takes roughly 2m20s to parse a 600 kLOC input file that I have, which is slow in my opinion. One straightforward improvement that I can think of is to use Cython to generate a C implementation of the LALR parser. Most of the time seems to be spent in the main LALR parser loop, which could be significantly sped up by Cython. I would also be open to other suggestions to improve the parsing speed.

    Since specifically the LALR parser is meant to compete in speed, I think it would be worth exploring the possibility of pushing this parser to its limit. Hopefully, converting the code to Cython code will be fairly painless and from there it just remains to optimize the functionality.

    I do not know how the standalone parser will be affected by this, but I can imagine that instead of generating .py files it should instead generate a .pyx file that can be cythonized.

    enhancement discussion 
    opened by eddieschoute 29
  • Make the Earley parser closer to the spec and add a complete SPPF forest implementation.

    Make the Earley parser closer to the spec and add a complete SPPF forest implementation.

    Key changes: Add Items to the current Column and ensure uniqueness before adding derivations

    • Ensures all derivations get added to the same unique items.

    Add a rudimentary SPPF type implementation to derivations, indexed on start and end, as per:

    • https://www.sciencedirect.com/science/article/pii/S1571066108001497
    • This was required together with the fixed _ambig detection.

    Remove earley__predict_all property.

    • No longer needed after the above two changes.
    opened by night199uk 28
  • Bytes support

    Bytes support

    This is a start of implementing support for byte strings, as suggested in #626. This is still WIP.

    My idea is that the grammar is still a string, but passing the use_bytes=True flag makes the patterns compile as bytes. If you need to match bytes that are not compatible with whatever encoding is used, you can just escape them. They will be unescaped later.

    TODO:

    • [X] Add use_bytes to make regex compile as bytes
    • [x] Add tests (essentially, everything needs to be tested again, but with bytes)
    • [x] Find and check edge cases

    @ctrlcctrlv, does this fix your use case?

    opened by MegaIng 27
  • Can I display progress status of Lark().parse()?

    Can I display progress status of Lark().parse()?

    I have implemented a JSON converter for a unique format of text with a CLI using Lark. When I run Lark().parse() on a large file, I have to wait for several tens of seconds.

    Is there a way to get a progress status -- for example, by returning a generator that can be passed to tqdm?

    I am not having any issues with the speed of Lark. I just want to be able to inform the user that the program is running👍.

    enhancement 
    opened by quag-cactus 2
  • How to keep track of tree while transforming it?

    How to keep track of tree while transforming it?

    I have a lark.visitors.Transformer which converts an AST into some other AST. While doing so, I exit if there is an error, and I use the tokens to show where the error occurred. But now it is not possible to get the token, because it has already been transformed.

    question 
    opened by aspizu 0
  • Generate Type-annotated Visitor definition from lark grammar

    Generate Type-annotated Visitor definition from lark grammar

    This feature will generate a Python file containing a Visitor class definition, with a method for every rule defined in the Lark grammar file, each carrying the correct type annotations.

    Example: grammar.lark

    start: "FOO" bar biz
    bar: (/[a-z]/)*
    biz: [/no/]
    

    Result: lark --generate-visitor grammar.lark

    from typing import Optional, Literal
    from lark import Visitor, Token
    
    class MyVisitor(Visitor[Token]):
        def start(self, args: tuple[tuple[Token, ...], Optional[Literal["no"]]]):
            ...
        
        def bar(self, args: tuple[Token, ...]):
            ...
        
        def biz(self, args: tuple[Optional[Literal["no"]]]):
            ...
    
    enhancement 
    opened by aspizu 4
  • Macro support. Dynamic grammar

    Macro support. Dynamic grammar

    I want to implement macro support for my MASM parser. Is it possible to have a kind of dynamic grammar, so that I could add tokens at runtime? Or to check whether a token matches with my handler (if the macro was defined several lines before)? This might also be called a custom matcher.

    opened by xor2003 5
  • Support for Python-style comments in Lark grammar

    Support for Python-style comments in Lark grammar

    Given that

    • most (all?) editors are unaware of Lark's syntax
    • most lark grammars live in Python strings
    • most editors will use # when asked to comment lines or blocks within Lark strings (eg Pycharm's CTRL+D)
    • commenting lines and blocks is frequently done while developing a grammar (debugging...)
    • adding this style of comments should not break existing grammars

    I propose in this small PR to enable Python-style comments in Lark grammars. If accepted, I'll do another PR to reflect that in documentation.

    opened by vincent-hugot 12
Releases(1.1.5)
  • 1.1.5(Dec 6, 2022)

    What's Changed

    • setup.cfg: Replace deprecated license_file with license_files by @mgorny in https://github.com/lark-parser/lark/pull/1209
    • Fix Github shenanigans by @erezsh in https://github.com/lark-parser/lark/pull/1220
    • Fix AmbiguousExpander (Issue #1214) by @chanicpanic in https://github.com/lark-parser/lark/pull/1216
    • Fix EOF line information in InteractiveParser.resume_parse() by @erezsh in https://github.com/lark-parser/lark/pull/1224
    • Use generator instead of list expand or add method by @jmishra01 in https://github.com/lark-parser/lark/pull/1225

    New Contributors

    • @mgorny made their first contribution in https://github.com/lark-parser/lark/pull/1209
    • @jmishra01 made their first contribution in https://github.com/lark-parser/lark/pull/1225

    Full Changelog: https://github.com/lark-parser/lark/compare/1.1.4...1.1.5

    Source code(tar.gz)
    Source code(zip)
  • 1.1.4(Nov 2, 2022)

    What's Changed

    • ci: Python 3.11 final by @henryiii in https://github.com/lark-parser/lark/pull/1204
    • Add __all__ to __init__ by @aspizu in https://github.com/lark-parser/lark/pull/1200
    • PropagatePositions: Allow any object to carry the metadata, by returning it in __lark_meta__() by @erezsh in https://github.com/lark-parser/lark/pull/1203
    • fix: Token now pattern matches correctly by @marcinplatek in https://github.com/lark-parser/lark/pull/1181
    • Updates to merge PR #1151 by @erezsh in https://github.com/lark-parser/lark/pull/1205
    • style: pre-commit basic config by @henryiii in https://github.com/lark-parser/lark/pull/1151
    • PR for v1.1.4 by @erezsh in https://github.com/lark-parser/lark/pull/1208

    New Contributors

    • @aspizu made their first contribution in https://github.com/lark-parser/lark/pull/1200
    • @marcinplatek made their first contribution in https://github.com/lark-parser/lark/pull/1181

    Full Changelog: https://github.com/lark-parser/lark/compare/1.1.3...1.1.4

    Source code(tar.gz)
    Source code(zip)
  • 1.1.3(Oct 11, 2022)

    What's Changed

    • Add user to cache filename; better handle cache load/save failures by @klauer in https://github.com/lark-parser/lark/pull/1179

    • refactor: add 'usedforsecurity=False' arg to hashlib.md5 usage by @cquick01 in https://github.com/lark-parser/lark/pull/1190

    • Create lark/grammars/init.py by @chanicpanic in https://github.com/lark-parser/lark/pull/1171

    • Adjust imports for Python 3.11 by @The-Compiler in https://github.com/lark-parser/lark/pull/1140

    • Fix for issue #1173 by @erezsh in https://github.com/lark-parser/lark/pull/1198

    • Add match stmt support to python.lark by @joseph-e-k in https://github.com/lark-parser/lark/pull/1123

    • Added match stmt support to python.lark by @MegaIng in https://github.com/lark-parser/lark/pull/1016

    • Linting to fix minor issues by @Erotemic in https://github.com/lark-parser/lark/pull/1128

    • Simplify lexer: Use Match.lastgroup instead of lastindex by @erezsh in https://github.com/lark-parser/lark/pull/1129

    • Fix confusing import in examples by @JonasLoos in https://github.com/lark-parser/lark/pull/1138

    • Move iter_subtrees_topdown into standalone by @camgunz in https://github.com/lark-parser/lark/pull/1137

    • Fix 1146: use the class's get instead of the instance's get by @MegaIng in https://github.com/lark-parser/lark/pull/1147

    • fix: remove Python 2 legacy packaging code by @henryiii in https://github.com/lark-parser/lark/pull/1148

    • Fix for PR #1149 by @erezsh in https://github.com/lark-parser/lark/pull/1150

    • Old link for sppf is no longer valid. Point to web archive instead. by @patrickhuber in https://github.com/lark-parser/lark/pull/1159

    • Fix ForestToPyDotVisitor by @chanicpanic in https://github.com/lark-parser/lark/pull/1167

    • Close file-like objects to address ResourceWarning. by @shawnbrown in https://github.com/lark-parser/lark/pull/1183

    • Minor adjustments to PR #1179 by @erezsh in https://github.com/lark-parser/lark/pull/1189

    • Adjustments for PR #1152 by @erezsh in https://github.com/lark-parser/lark/pull/1191

    • Remove trailing whitespace by @bcr in https://github.com/lark-parser/lark/pull/1196

    New Contributors

    • @joseph-e-k made their first contribution in https://github.com/lark-parser/lark/pull/1123
    • @Erotemic made their first contribution in https://github.com/lark-parser/lark/pull/1128
    • @JonasLoos made their first contribution in https://github.com/lark-parser/lark/pull/1138
    • @camgunz made their first contribution in https://github.com/lark-parser/lark/pull/1137
    • @The-Compiler made their first contribution in https://github.com/lark-parser/lark/pull/1140
    • @henryiii made their first contribution in https://github.com/lark-parser/lark/pull/1148
    • @patrickhuber made their first contribution in https://github.com/lark-parser/lark/pull/1159
    • @shawnbrown made their first contribution in https://github.com/lark-parser/lark/pull/1183
    • @klauer made their first contribution in https://github.com/lark-parser/lark/pull/1179
    • @cquick01 made their first contribution in https://github.com/lark-parser/lark/pull/1190
    • @bcr made their first contribution in https://github.com/lark-parser/lark/pull/1196

    Full Changelog: https://github.com/lark-parser/lark/compare/1.1.2...1.1.3

    Source code(tar.gz)
    Source code(zip)
  • 1.1.2(Mar 1, 2022)

    Highlights

    • Tree instances are now pretty-printed with the "rich" library when doing rich.print(tree)
    • Bugfix for recursive regexes (with the "regex" library)
    • Refactors, cleanups, and better mypy support

    What's Changed

    • Clean up tree templates implementation to reduce mypy errors by @plannigan in https://github.com/lark-parser/lark/pull/1091
    • Remove redefinitions related to standalone parser by @plannigan in https://github.com/lark-parser/lark/pull/1115
    • Added Tree.rich() method to make Tree a Rich renderable by @erezsh in https://github.com/lark-parser/lark/pull/1117
    • Rename lexer_state->lexer_thread, and make a few adjustments for the benefit of Lark-Cython by @erezsh in https://github.com/lark-parser/lark/pull/1118
    • Use isinstance() checks in expcetions match_examples() by @plannigan in https://github.com/lark-parser/lark/pull/1065
    • change MAXREPEAT to int by @gruebel in https://github.com/lark-parser/lark/pull/1120
    • Tests: Small fixes by @erezsh in https://github.com/lark-parser/lark/pull/1122

    New Contributors

    • @gruebel made their first contribution in https://github.com/lark-parser/lark/pull/1120

    Full Changelog: https://github.com/lark-parser/lark/compare/1.1.1...1.1.2

    Source code(tar.gz)
    Source code(zip)
  • 1.1.1(Feb 8, 2022)

    What's Changed

    • Add test cases for tree templates by @plannigan in https://github.com/lark-parser/lark/pull/1096
    • 🖊 Fix Typo: plural "options" instead of singular "option" by @hf-kklein in https://github.com/lark-parser/lark/pull/1101
    • PEP 8: Minor Code Style Improvements by @hf-kklein in https://github.com/lark-parser/lark/pull/1102
    • Add Code Style Section to Contribution Guide by @hf-kklein in https://github.com/lark-parser/lark/pull/1107
    • Fix MyPy Warnings in lark/tools/init.py by @hf-kklein in https://github.com/lark-parser/lark/pull/1100
    • rename n to child when iterating over children by @hf-kklein in https://github.com/lark-parser/lark/pull/1110
    • specify ignored mypy error by using type: ignore[error] in lark/tree.py and lark/utils.py by @hf-kklein in https://github.com/lark-parser/lark/pull/1099
    • Add py.typed to package_data of lark package by @hf-kklein in https://github.com/lark-parser/lark/pull/1109
    • InteractiveParser: Added iter_parse() method, for easier instrumentation by @erezsh in https://github.com/lark-parser/lark/pull/1111

    New Contributors

    • @hf-kklein made their first contribution in https://github.com/lark-parser/lark/pull/1101

    Full Changelog: https://github.com/lark-parser/lark/compare/1.1.0...1.1.1

    Source code(tar.gz)
    Source code(zip)
  • 1.1.0(Jan 31, 2022)

    • Better support for typing and mypy. Includes generic tree typing (Thanks @plannigan!)

    • Improvements to python.lark (walrus operator, slashes in function params, and more). Now parses the entire Python 3.10 lib successfully

    • Bugfixes:

      • Transformer.__default__ not called in tree-less LALR mode (Issue #1029)
      • v_args failed to apply to class under standalone parser (Issue #1059)
      • maybe_placeholders incorrectly accumulated params when it encountered the | operator (Issue #1078)
    Source code(tar.gz)
    Source code(zip)
  • 1.0.0(Nov 15, 2021)

    Over the last few years, Lark has grown to become a comprehensive toolkit for parsing structured text.

    Today, I'm happy to announce the long anticipated version 1.0 of Lark, marking the API as stable.

    We've made quite a few breaking changes, in order to achieve a consistent API with as few "gotchas" as possible. Upgrading to version 1.0 might require a few changes to your project.

    Breaking changes

    • Dropped Python 2 support! Lark now only supports Python 3.6 and up.

    • Install lark using pip install lark (instead of lark-parser ).

    • maybe_placeholders is now True by default.

    • Renamed TraditionalLexer to BasicLexer, and 'standard' lexer option to 'basic'.

    • Default priority is now 0, for both terminals and rules (used to be 1 for terminals).

    • Discard mechanism is now done by returning Discard, instead of raising it as an exception.

    • use_accepts in UnexpectedInput.match_examples() is now True by default.

    • v_args(meta=True) now gives meta as the first argument. i.e. (meta, children).

    Improvements

    • Better type annotations
    • Support for terminal priorities for dynamic Earley
    • Python3 grammar is now officially supported, and can be used via %import python (...)
    • New experimental feature: Tree Templates
    • Various bugfixes

    Acknowledgements

    Many thanks to all our contributors and donors, who made this release possible. Special thanks goes to -

    • @MegaIng, for innumerous features, bugfixes, and code-reviews.
    • @chanicpanic, for his immense and continual contributions to the Earley parser, and for helping with the v1.0 effort.
    • @erezsh, for being myself.
    Source code(tar.gz)
    Source code(zip)
  • 0.12.0(Aug 30, 2021)

    Announcements

    • This is likely to be the last major release that supports Python 2 !

    We are now working on a Python3.6+ only v1.0 branch, which will soon become the default. See the work in progress: https://github.com/lark-parser/lark/pull/925

    • We also have a new online IDE! Check it out here: https://lark-parser.github.io/ide

    • Lark can now generate standalone Javascript parsers! Check it out here: https://github.com/lark-parser/Lark.js (still in beta)

    Changes

    • Using rule repeat (~ syntax) is now much much faster for large numbers, thanks to @MegaIng

    • Bugfix for the propagate_positions option. Added option value propagate_positions='ignore_ws'.

    • Fixed reconstructor for when keep_all_tokens=True

    • Added merge_transformers (Thanks Robin!)

    • Many minor bugfixes, and improvements to code and docs

    Source code(tar.gz)
    Source code(zip)
  • 0.11.3(May 3, 2021)

    Cache

    • Lark now tracks changes in imported grammars (%import), and updates the cache if necessary
    • Added support for atomicwrites, for multiprocess caching and crash recovery

    InteractiveParser

    • Now an official interface (renamed from Puppet)
    • Added Lark.parse_interactive() for starting the parser in interactive mode

    Other

    • Added ast_utils, to assist in transforming lark.Tree into a customized AST.

    • Better docs

    • Bugfixes

    Notification: Support for Python 2 is ending

    In the near future, Lark will drop support for Python 2. We will continue to develop for Python 3.6+ only, which will simplify the code and ease development.

    Old releases (including this one) will still work, of course, and should be stable enough to accompany the remaining Python 2 users into the sunset.

    If you have any objections, feel free to voice them here: https://github.com/lark-parser/lark/discussions/874

    Thanks for everyone who helped make Lark better!

    Source code(tar.gz)
    Source code(zip)
  • 0.11.2(Feb 16, 2021)

    New Features:

    • Better grammar re-use with the %override and %extend statements, which allow you to rewrite and extend imported rules and tokens, similarly to class inheritance. (See this example: https://github.com/lark-parser/lark/blob/master/examples/advanced/extend_python.py)

    Improvements

    • Indenter now throws DedentError instead of AssertionError

    • Improved the Python3 grammar, now works with reconstructor. (See this example: https://github.com/lark-parser/lark/blob/master/examples/advanced/reconstruct_python.py)

    • Lots of refactoring for a better tomorrow.

    • Rule/terminal names can now be in Unicode. (Thanks @julienmalard)

    • Better errors.

    • Better type hints.

    • lark.lark is now part of the standard library.

    • Earley:

      • Now works with match_examples()
      • Now supports a custom lexer
      • Better handling of ignored terminals
      • Faster forest visiting, and a few edge-case bugfixes (thanks @chanicpanic)

    Other

    • Lark now accepts funding as a member of Github Sponsors! See here: https://github.com/sponsors/lark-parser
    Source code(tar.gz)
    Source code(zip)
  • 0.11.0(Nov 16, 2020)

    • LALR parser

      • The LALR parser now supports priority in rules, as a way to resolve collision errors

      • Improvements to the standalone tool, including more command-line options, like optional compression for the json data.

      • Improvements to the puppet error handling interface

      • Better error reporting on LALR collisions

    • Bugfixes in Earley

    Misc

    • Added support for syntax highlighting in Atom

    • Fixes and improvements for the cache option. cache=True now uses a temporary directory instead of the working directory.

    • Lark can now be imported directly from a zip (See: ed5c8ec51c4c6e8bd0ac80caff6afcb90a97d218)

    • Added more terminals to the grammar library (available for %import).

    • The Nearley conversion tool now supports case-insensitive strings

    • Deprecated some interfaces

    • Improvements to docs, stubs, and various bugfixes

    Thanks to @MegaIng for helping with Lark's maintenance, and to @ldbo, @chanicpanic, @michael-k, @ThatXliner and everyone else for their help and contributions.

    Source code(tar.gz)
    Source code(zip)
  • 0.10.0(Sep 21, 2020)

    • Complete overhaul of documentation. Now using sphinx to generate API docs from docstrings. (commit 0664cbd3d3c19e321cae8df044839e7baf7135af. Thank you @chsasank !)

      • Many improvements and additions to documentation
    • New and friendlier Earley SPPF interface! (commit 555b268eb26bcbfce64991ea7517338dee85a840. Thank you @chanicpanic !)

      • Added the ambiguity='forest' option. Added ForestTransformer and TreeForestTransformer.

      • Various Bugfixes to improve the handling of ambiguous results.

      • Read the docs here: https://lark-parser.readthedocs.io/en/latest/forest.html

    • New Vim syntax highlighting for Lark (https://github.com/lark-parser/vim-lark-syntax Thank you @omega16 !)

    • Lark now loads faster from cache (commit 7dc00179e63efa6e98d688bfba3265d382db79c4)

    • Terminals can now be composed of regexps and strings with different flags, if using Python 3.6+ (commit e6fc3c9b00306e3a8661210fcc93bf50479ee229)

    • Added support for parsing byte-strings, with the use_bytes flag (commit 9ee8428f3f6ad285ad93e2b62ec47d33fff54768).

    • UnexpectedToken exception now has the accepts attribute, which contains a list of terminals that would be accepted by the parser instead (in addition to the expects attribute, which is guided by the lexer and may include terminals that won't be accepted by the parser) (commit a7bcd0bc2d3cb96030d9e77523c0007e8034ce49)

    • Allow multiline regexes with the x flag (commit 9923987e94547ded8a17d7a03840c4cebce39188)

    • Lark no longer uses the default logger. Instead uses lark.LOGGER. (commit 7010f96825b5fbac79522d1b30689065df53dc8c)

    • Lark now notifies on unused terminals/rules through logging.debug.

    • Standalone generator now creates smaller files (without comments and docstrings). Also undergone various fixes. (commit bf2d9bf7b16cddb39f2e0ea3cefecc8de5269e2c)

    • Wheel distribution due to (somewhat) popular demand.

    • Lots of small bugfixes and improvements!

    Many thanks to @MegaIng for his continued work on many of these new features and fixes, and to everyone else who contributed to Lark and helped make it even better.

    Source code(tar.gz)
    Source code(zip)
  • 0.9.0(Jul 1, 2020)

    • Added error handling to LALR!

      • on_error option to Lark.parse(). Read here: https://lark-parser.readthedocs.io/en/latest/classes/#larkparse
      • Parser now comes with a puppet for advanced error handling. Read here: https://lark-parser.readthedocs.io/en/latest/classes/#parserpuppet
    • Support for better regexps with the regex module, when using Lark(..., regex=True). Read here: https://lark-parser.readthedocs.io/en/latest/classes/#using-unicode-character-classes-with-regex

    Source code(tar.gz)
    Source code(zip)
  • 0.8.9(Jun 16, 2020)

    The last two releases were wrong. I apologize.

    Hopefully that's the last of it, and we'll be back on track with periodic and accurate releases.

    Source code(tar.gz)
    Source code(zip)
  • 0.8.6(Jun 10, 2020)

    The main features for this release:

    • Grammar caching: It's now possible to cache the results of the LALR grammar analysis, for 2x to 3x faster loading. Use Lark(..., cache=True) or specify a file name. See here: https://lark-parser.readthedocs.io/en/latest/classes/

    • Grammar templates: Added support for grammar "functions" that expand in preprocessing. No docs yet, but see here for examples: https://github.com/lark-parser/lark/blob/master/tests/test_parser.py#L845

    • Lark online IDE: Technically not a feature, but it's possible to run Lark in the browser. Now we also have a simple IDE on github pages: https://lark-parser.github.io/lark/ide/app.html

    • Other changes:

      • Improved performance for large grammars

      • More debug prints when in debug mode

      • Better support for PyInstaller

      • Lots of bugfixes: mypy stubs, v_args, docs, and more.

    Source code(tar.gz)
    Source code(zip)
  • 0.8.3(Mar 28, 2020)

    • Added the g_regex_flags option, to allow applying flags to all terminals.
    • Fixed end_pos for Earley, when using propagate_positions
    • Fixes for mypy
    • Better docs
    Source code(tar.gz)
    Source code(zip)
  • 0.8.2(Mar 7, 2020)

    Changes in this version are:

    • Added type stubs for all public APIs, in order to support type checking and completion using MyPy (or others)

    • Added two new methods to the Lark class: Lark.save() and Lark.load(). Both methods pickle and unpickle (respectively) the class instance into/from file objects. These can be used to allow faster loading times. (future versions will implement an automatic caching feature)

    • The standalone parser is now MPL2, instead of GPL. The Mozilla Public License is much less restrictive, so this shouldn't affect anyone who's already using the standalone parser. But it should make it easier for other users to adopt it.

    Source code(tar.gz)
    Source code(zip)
  • 0.8.1(Jan 22, 2020)

  • 0.8.0(Jan 22, 2020)

    - Better LALR

    The biggest change in this release is a new LALR engine, capable of dealing with a few edge cases that the previous parser couldn't handle.

    This parser is supposed to be fully backwards-compatible with the previous one, but that is hard to verify!

    Thank you, @Raekye, for this great contribution to Lark!

    For more details, see issue #418

    - Transformers now visit tokens, as well as rules (an alternative to lexer_callbacks)

    Transformers now visit tokens, in addition to rules.

    Simply define a method with the correct name (uppercase, of course), and the transformer will visit your tokens before the rules that contain them.

    It's possible to disable this, for backwards compatibility, or for the slight performance gain.

    - Other Changes

    • Added visit_topdown methods to Visitor classes

    • Lark now allows line comments in its rule definitions

    • Better error messages

    • Improvements to documentation

    • Bugfixes

    • maybe_placeholders is now the default (backwards-incompatible) (REVERTED in 0.8.1)

    Source code(tar.gz)
    Source code(zip)
  • 0.7.8(Nov 1, 2019)

    • Improved error messages for EOF in Earley, recursive terminals, UnexpectedToken

    • Bugfixes for declared terminals, UnexpectedToken, and Unicode support in Python 2

    Source code(tar.gz)
    Source code(zip)
  • 0.7.7(Oct 3, 2019)

    • Fixed a bug in Earley where running it from different threads produced bad results

    • Improved error reporting when using LALR

    • Added the 'edit_terminals' option, to allow programmatic manipulation of terminals, for example to support keywords in different languages.

    Note: This release skips 0.7.6, due to a simple oversight on my part. Hopefully that won't be a problem.

  • 0.7.5(Sep 6, 2019)

    Lark transformers can now visit tokens as well. Use like this:

    from lark import Transformer

    class MyTransformer(Transformer):
        def TOKEN1(self, tok):
            return tok.upper()

        def rule_as_usual(self, children):
            return children

    MyTransformer(visit_tokens=True).transform(tree)

    Fixed a few regressions that I accidentally introduced in 0.7.4

  • 0.7.4(Aug 29, 2019)

    • Fixed long-standing non-determinism and prioritization bugs in Earley.

    • Serialize tool now supports multiple start symbols

    • iter_subtrees, find_data and find_pred methods are now included in standalone parser

    • Bugfixes for the transformer interface, for the custom lexer, for grammar imports, and many more

  • 0.7.3(Aug 14, 2019)

    • Added a new tool called Serialize, which stores Lark's internal state as JSON. This will allow for integration with other languages. I have already started such a project for Julia: https://github.com/erezsh/Lark_Julia (it's working, but still in early stages)

    • Minor bugfix regarding line-counting and the \s regex

  • 0.7.2(Jul 30, 2019)

    New features:

    • Lark now allows you to specify the start symbol when calling Lark.parse() (requires pre-declaration of all possible start states, see the start option)

    • Negative priority is now allowed in rules and terminals (the default value is still 1; this may change in 0.8)

    Also includes many minor bugfixes, optimizations, and improvements to documentation
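    An illustrative sketch of selecting the start symbol per call (the rule names here are made up): all possible entry points are pre-declared via the start option, then one is chosen when parsing.

    ```python
    from lark import Lark

    # Declare every possible entry point up front
    parser = Lark('''
        greeting: "hello" NAME
        farewell: "bye" NAME
        NAME: /[a-z]+/
        %ignore " "
    ''', start=['greeting', 'farewell'])

    # Choose the start symbol at parse time
    tree = parser.parse("hello world", start='greeting')
    ```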

  • 0.7.1(May 4, 2019)

    • Lark can now serialize its parsers, resulting in simplified stand-alone code.

    • Bugfix for v_args (Issue #350)

    • Improvements and bugfixes for importing rules from grammar files

    • Performance improvement for the reconstructor feature
