Wednesday, December 9, 2020

Example for ANTLR usage in Python



 

In this post we will review usage of the ANTLR parser in a python application.

The code is base on the ANTLR Python documentation, and on some ANTLR samples.


For this post example, we will create a simple calculator that can parse simple expressions using the following operations:

  • plus
  • minus
  • divide
  • multiple



Installation


Install ANTLR:

sudo apt install antlr4



And then install the ANTLR python library:


pip3 install antlr4-python3-runtime



Create ANTLR Grammar


The ANTLR grammar file defines our language.

We create a calc.g4 file for it:


grammar calc;

expression
: plusExpression
| minusExpression
;

plusExpression
: priorityExpression (PLUS priorityExpression)*
;

minusExpression
: priorityExpression (MINUS priorityExpression)*
;

priorityExpression
: mulExpression
| divExpression
;

mulExpression
: atom (TIMES atom)*
;

divExpression
: atom (DIV atom)*
;

atom
: NUMBER
| LPAREN expression RPAREN
;

NUMBER
: ('0' .. '9') +
;

LPAREN
: '('
;


RPAREN
: ')'
;


PLUS
: '+'
;


MINUS
: '-'
;


TIMES
: '*'
;

DIV
: '/'
;



The grammar file is based on a recursive definition of expression. Notice that this will cause the ANTLR to build a tree, that we will later use to analyze the expression, and calculate the result.

In our case we must ensure to build the parsed tree according to the order of the mathematics operations, hence the expression is first spit to plus/minus expression, and only then to multiply/divide expression.


Example for tree (from http://csci201.artifice.cc/notes/antlr.html)





The Python Code


First generate the python code for parsing based on the ANTLR grammar file:

antlr4 -Dlanguage=Python3 calc.g4



This create several files that we should import in our application:


from antlr4 import *

from calcLexer import calcLexer
from calcListener import calcListener
from calcParser import calcParser



Next we create a listener. The listener inherits the generated listener and can override its methods to run some logic. In our case the logic will be run upon exit of each rule, which means it will be run after the child nodes of the current expression were already scanned.


class MyListener(calcListener):
def __init__(self):
self.verbose = False
self.stack = []

def debug(self, *argv):
if self.verbose:
print(*argv)

def debug_items(self, operation, items):
if len(items) == 1:
self.debug(operation, items[0])
else:
self.debug(operation.join(map(str, items)), "=", self.stack[-1])
self.debug("stack is {}".format(self.stack))

def exitAtom(self, ctx: calcParser.AtomContext):
number = ctx.NUMBER()
if number is not None:
value = int(str(ctx.NUMBER()))
self.stack.append(value)
self.debug("atom {}".format(value))

def exitPlusExpression(self, ctx: calcParser.PlusExpressionContext):
elements = len(ctx.PLUS()) + 1
items = self.stack[-elements:]
items_result = sum(items)
self.stack = self.stack[:-elements]
self.stack.append(items_result)
self.debug_items("+", items)

def exitMinusExpression(self, ctx: calcParser.MinusExpressionContext):
elements = len(ctx.MINUS()) + 1
items = self.stack[-elements:]
items_result = items[0] - sum(items[1:])
self.stack = self.stack[:-elements]
self.stack.append(items_result)
self.debug_items("-", items)

def exitMulExpression(self, ctx: calcParser.MulExpressionContext):
elements = len(ctx.TIMES()) + 1
items = self.stack[-elements:]
items_result = math.prod(items)
self.stack = self.stack[:-elements]
self.stack.append(items_result)
self.debug_items("*", items)

def exitDivExpression(self, ctx: calcParser.DivExpressionContext):
elements = len(ctx.DIV()) + 1
items = self.stack[-elements:]
if len(items) > 1:
items_result = items[0] / math.prod(items[1:])
else:
items_result = items[0]
self.stack = self.stack[:-elements]
self.stack.append(items_result)
self.debug_items("/", items)



Now we can parse an expression, and run our listener:


def parse(text, verbose=False):
input_stream = InputStream(text)
lexer = calcLexer(input_stream)
stream = CommonTokenStream(lexer)
parser = calcParser(stream)
tree = parser.expression()
listener = MyListener()
listener.verbose = verbose
walker = ParseTreeWalker()
walker.walk(listener, tree)
return listener.stack[0]



And finally, lets run some tests:



def test(text, expected):
print(text)
actual = parse(text)
print(text, "=", actual)

if actual != expected:
print("=== rerun in verbose ===")
parse(text, True)
raise Exception("expected {} but actual is {}".format(expected, actual))


test("1", 1)
test("1+2", 3)
test("3*4", 12)
test("10-8", 2)
test("10/2", 5)
test("4+2+3", 9)
test("90-10-20", 60)
test("(1)", 1)
test("(1)+(2)", 3)
test("(1+2)*(3+4)", 21)
test("(10-8)*(1+2+3)*4", 48)
test("(11-1)-(10-5)", 5)
test("(11-1)/(10-5)", 2)



Final Note


ANTLR is a great tool, that can be used in python, and in more lanuguages.
The listener API of ANTLR enables us to run any interpretation of the grammar result, and run our own logic, hence getting the max out of the parsed text.

 

No comments:

Post a Comment