G=<N,T,P,Prog> where N={Prog,Expr,Term,Storable,Factor}, T={number,+,-,*,/,S,R,(,),EOF} the start symbol is Prog, and the set of productions is given by these rules
Prog -> Expr EOF
Expr -> Expr + Term | Expr - Term | Term
Term -> Term * Storable | Term / Storable | Storable
Storable -> Factor S | Factor
Factor -> number | R | ( Expr )
Using this grammar we can check to see that the example expression given above is valid by constructing a derivation of the sentence. A derivation starts with the start symbol and replace one non-terminal at each step to generate the sentence. For instance, a derivation that generates the example above is:
Prog => Expr EOF =>Term EOF =>Term * Storable EOF => Storable * Storable EOF =>Factor * Storable EOF => ( Expr ) * Storable =>EOF ( Expr + Term ) * Storable EOF => ( Term + Term ) * Storable EOF => ( Storable + Term ) * Storable EOF => ( Factor S + Term ) * Storable EOF => ( 5 S + Term ) * Storable EOF => ( 5 S + Storable ) * Storable EOF => ( 5 S + Factor ) * Storable EOF => ( 5 S + Factor ) * Storable EOF => ( 5 S + 4 ) * Storable EOF => ( 5 S + 4 ) * Factor EOF => ( 5 S + 4 ) * R EOF
This derivation proves that (5S+4)*R EOF is a syntactically valid expression in the language of the grammar. The grammar presented above is what is called an LALR(1) grammar. This kind of grammar can be used by a program called a parser generator that given the grammar, will automatically generate a program called a parser. A parser is a program that given a sentence in its language, will construct a derivation of that sentence to check it for syntactic correctness. The derivation can also be expressed in tree form, called a parse tree. There may be many different derivations for a sentence in a language, but only one parse tree (if the grammar is unambiguous). Parse trees are outside the scope of this document. Take CS67 if you want to learn about parse trees.
Although parsers can be generated by parser generators, it is still sometimes convenient to write a parser by hand. However, LALR(1) grammars are not easy to use to manually construct parsers. Instead, we want an LL(1) grammar if we are going to manually construct a parser. An LL(1) grammar can be used to construct a top-down or recursive descent parser where an LALR(1) grammar is typically used to construct a bottom-up parser. A top-down parser constructs (or at least traverses) the parse tree starting at the root of the tree and proceeding downward. A bottom-up parser constructs or traverses the parse tree in a bottom-up fashion. Again, for more details, take CS67.
In a recursive descent parser, each non-terminal in the grammar becomes a function in a program. The right hand side of the productions become the bodies of the functions. An LALR(1) grammar is not appropriate for constructing a recursive descent parser. To create a recursive-descent parser (the topic of this page) we must convert the LALR(1) grammar above to an LL(1) grammar. Typically, there are two steps involved.
Expr -> Term RestExpr
RestExpr -> + Term RestExpr | - Term RestExpr | <null>
We must also eliminate left recursion in the Term Term * Factor | Term / Factor productions in the same way. We end up with an LL(1) grammar that looks like this:
Prog -> Expr EOF
Expr -> Term RestExpr
RestExpr -> + Term RestExpr | - Term RestExpr | <null>
Term -> Storable RestTerm
RestTerm -> * Storable RestTerm | / Storable RestTerm | <null>
Storable -> Factor S | Factor
Factor -> number | R | ( Expr )
Once you have an LL(1) grammar you use it to build a parser as follows. The following construction causes the parser to return an abstract syntax tree or expression tree for the sentence being parsed.
Assume that you have a binary tree class in Java called BTNode that can hold an Object and that operators in the expression are Characters and numbers in the expression are Doubles. Also, assume there are two functions that get the next Token in the input (getToken) and push the last token back on the input (pushBackToken). Then, we can write the first part of the parser as follows:
private BTNode Prog() { BTNode tmp = Expr(); if (!getNextToken().toString().equals("$")) throw new IllegalStateException("Error in expression"); return tmp; } private BTNode Expr() { return RestExpr(Term()); } private BTNode RestExpr(BTNode E1) { Object on = getNextToken(); if (on.toString().equals("+") || on.toString().equals("-")) return RestExpr(new BTNode(on,E1,Term())); pushBackToken(); return E1; }