In this post we will review usage of the ANTLR parser in a python application.

The code is base on the ANTLR Python documentation, and on some ANTLR samples.

For this post example, we will create a simple calculator that can parse simple expressions using the following operations:

plus
minus
divide
multiple

Installation

Install ANTLR:

sudo apt install antlr4

And then install the ANTLR python library:

pip3 install antlr4-python3-runtime

Create ANTLR Grammar

The ANTLR grammar file defines our language.

We create a calc.g4 file for it:

grammar calc;

expression
   : plusExpression
   | minusExpression
   ;

plusExpression
   : priorityExpression (PLUS  priorityExpression)*
   ;

minusExpression
   : priorityExpression (MINUS  priorityExpression)*
   ;

priorityExpression
    : mulExpression
    | divExpression
    ;

mulExpression
   : atom (TIMES atom)*
   ;

divExpression
   : atom (DIV atom)*
   ;

atom
   : NUMBER
   | LPAREN expression RPAREN
   ;

NUMBER
   : ('0' .. '9') +
   ;

LPAREN
   : '('
   ;


RPAREN
   : ')'
   ;


PLUS
   : '+'
   ;


MINUS
   : '-'
   ;


TIMES
   : '*'
   ;

DIV
   : '/'
   ;

The grammar file is based on a recursive definition of expression. Notice that this will cause the ANTLR to build a tree, that we will later use to analyze the expression, and calculate the result.

In our case we must ensure to build the parsed tree according to the order of the mathematics operations, hence the expression is first spit to plus/minus expression, and only then to multiply/divide expression.

Example for tree (from http://csci201.artifice.cc/notes/antlr.html)

The Python Code

First generate the python code for parsing based on the ANTLR grammar file:

antlr4 -Dlanguage=Python3 calc.g4

This create several files that we should import in our application:

from antlr4 import *

from calcLexer import calcLexer
from calcListener import calcListener
from calcParser import calcParser

Next we create a listener. The listener inherits the generated listener and can override its methods to run some logic. In our case the logic will be run upon exit of each rule, which means it will be run after the child nodes of the current expression were already scanned.

class MyListener(calcListener):
    def __init__(self):
        self.verbose = False
        self.stack = []

    def debug(self, *argv):
        if self.verbose:
            print(*argv)

    def debug_items(self, operation, items):
        if len(items) == 1:
            self.debug(operation, items[0])
        else:
            self.debug(operation.join(map(str, items)), "=", self.stack[-1])
        self.debug("stack is {}".format(self.stack))

    def exitAtom(self, ctx: calcParser.AtomContext):
        number = ctx.NUMBER()
        if number is not None:
            value = int(str(ctx.NUMBER()))
            self.stack.append(value)
            self.debug("atom {}".format(value))

    def exitPlusExpression(self, ctx: calcParser.PlusExpressionContext):
        elements = len(ctx.PLUS()) + 1
        items = self.stack[-elements:]
        items_result = sum(items)
        self.stack = self.stack[:-elements]
        self.stack.append(items_result)
        self.debug_items("+", items)

    def exitMinusExpression(self, ctx: calcParser.MinusExpressionContext):
        elements = len(ctx.MINUS()) + 1
        items = self.stack[-elements:]
        items_result = items[0] - sum(items[1:])
        self.stack = self.stack[:-elements]
        self.stack.append(items_result)
        self.debug_items("-", items)

    def exitMulExpression(self, ctx: calcParser.MulExpressionContext):
        elements = len(ctx.TIMES()) + 1
        items = self.stack[-elements:]
        items_result = math.prod(items)
        self.stack = self.stack[:-elements]
        self.stack.append(items_result)
        self.debug_items("*", items)

    def exitDivExpression(self, ctx: calcParser.DivExpressionContext):
        elements = len(ctx.DIV()) + 1
        items = self.stack[-elements:]
        if len(items) > 1:
            items_result = items[0] / math.prod(items[1:])
        else:
            items_result = items[0]
        self.stack = self.stack[:-elements]
        self.stack.append(items_result)
        self.debug_items("/", items)

Now we can parse an expression, and run our listener:

def parse(text, verbose=False):
    input_stream = InputStream(text)
    lexer = calcLexer(input_stream)
    stream = CommonTokenStream(lexer)
    parser = calcParser(stream)
    tree = parser.expression()
    listener = MyListener()
    listener.verbose = verbose
    walker = ParseTreeWalker()
    walker.walk(listener, tree)
    return listener.stack[0]

And finally, lets run some tests:

def test(text, expected):
    print(text)
    actual = parse(text)
    print(text, "=", actual)

    if actual != expected:
        print("=== rerun in verbose ===")
        parse(text, True)
        raise Exception("expected {} but actual is {}".format(expected, actual))


test("1", 1)
test("1+2", 3)
test("3*4", 12)
test("10-8", 2)
test("10/2", 5)
test("4+2+3", 9)
test("90-10-20", 60)
test("(1)", 1)
test("(1)+(2)", 3)
test("(1+2)*(3+4)", 21)
test("(10-8)*(1+2+3)*4", 48)
test("(11-1)-(10-5)", 5)
test("(11-1)/(10-5)", 2)

Final Note

ANTLR is a great tool, that can be used in python, and in more lanuguages.

The listener API of ANTLR enables us to run any interpretation of the grammar result, and run our own logic, hence getting the max out of the parsed text.

Full Blog TOC

Full Blog Table Of Content with Keywords Available HERE

Wednesday, December 9, 2020

Example for ANTLR usage in Python

Installation

Create ANTLR Grammar

The Python Code

Final Note

No comments:

Post a Comment