Building a Recursive Descent Parser

This is a very basic overview and will not suffice to fully implement a recursive descent parser.

Productions

All production heads are non-terminals, so a function must exist for each non-terminal.

Production Body

For all terminals in the production body, call match(t), for all non-terminals call the associated function.

Example 1

AaBB  cAcBA \rightarrow aBB ~|~cAcB
Bbca  cB \rightarrow bca~|~c

Non-Terminals = {A, B}

Terminals = {a,c,b}

A() {
    if (lookahead == 'a') {
        match(a);
        B();
        B();
    }
    else if (lookahead == 'c') {
        match(c);
        A();
        match(c);
        B();
    }
    else {
        error();
    }
}

B() {
    if (lookahead == 'b') {
        match(b);
        match(c);
        match(a);
    }
    else if (lookahead == 'c') {
        match(c);
    }
    else {
        error();
    }
}

match(token t) {
    if (lookahead == t) {
        lookahead = nexttoken();
    }
    else {
        error();
    }
}

Example 2

expri cond t expr {print"true"} e expr {print"false"}expr \rightarrow i ~cond ~t ~expr ~\{print "true"\} ~e ~expr ~\{print "false"\}
exprϵexpr \rightarrow \epsilon
expr() {
    if (lookahead == 'i') {
        match(i);
        cond();
        match(t);
        expr();
        print("true");
        match(e);
        expr();
        print("false");
    }
    // if epsilon, then do nothing!
}

cond() {
    // some code here
}

match(token t) {
    if (lookahead == t) {
        lookahead = nexttoken();
    }
    else {
        error();
    }
}

Derivations

A syntax-tree can be leftmost or rightmost derived; left and right refer to the direction in which non-terminals are expanded.

Notation

Given Aυ, ifwe have αAβ, then aAbaυBGiven ~A \rightarrow \upsilon, ~if we ~have ~\alpha A \beta , ~then ~aAb \Rightarrow a \upsilon \Beta

  • \Rightarrow reads "derives"

  • \Rightarrow^* reads "derives in zero or more steps."

  • +\Rightarrow^+ reads "derives in one or more steps."

  • IfaB and Bυ, thenaυIf a \Rightarrow^*B ~ and ~B \Rightarrow \upsilon, ~then a \Rightarrow^* \upsilon

if Sa, where S is the start symbol, a, is a sentential form of Gif ~S \Rightarrow^* a, ~where ~S ~is ~the ~start ~symbol, ~a, ~is ~a ~sentential ~form ~of ~G

Can contain terminals or non-terminals

If the sentential form contains no non-terminals, it is a Sentence.

This is because no non-terminals means all non-terminals, that, in zero or more steps, were derived to reach the symbol, have been expanded.

All that remains is the yield of the syntax-tree, or a string of terminals

The language of some Grammar G is the set of Sentences.

A language generated by a grammar is a context-free Grammar.

LL(1) Grammar

The first 'L' denotes the order in which input is scanned; in this case, from left to right. The second 'L' denotes the derivation direction and '1' represents the number of input characters viewed at a time.

Thus, LL(1) means we scan from left to right, building a leftmost derivation tree, while observing one character at a time.

LL Grammar is suitable for top-down parsers.

Last updated