Building a Recursive Descent Parser

This is a very basic overview and will not suffice to fully implement a recursive descent parser.

Productions

All production heads are non-terminals, so a function must exist for each non-terminal.

Production Body

For all terminals in the production body, call match(t), for all non-terminals call the associated function.

Example 1

A \rightarrow aBB ~|~cAcB

B \rightarrow bca~|~c

Non-Terminals = {A, B}

Terminals = {a,c,b}

A() {
    if (lookahead == 'a') {
        match(a);
        B();
        B();
    }
    else if (lookahead == 'c') {
        match(c);
        A();
        match(c);
        B();
    }
    else {
        error();
    }
}

B() {
    if (lookahead == 'b') {
        match(b);
        match(c);
        match(a);
    }
    else if (lookahead == 'c') {
        match(c);
    }
    else {
        error();
    }
}

match(token t) {
    if (lookahead == t) {
        lookahead = nexttoken();
    }
    else {
        error();
    }
}

Example 2

expr \rightarrow i ~cond ~t ~expr ~\{print "true"\} ~e ~expr ~\{print "false"\}

expr \rightarrow \epsilon

expr() {
    if (lookahead == 'i') {
        match(i);
        cond();
        match(t);
        expr();
        print("true");
        match(e);
        expr();
        print("false");
    }
    // if epsilon, then do nothing!
}

cond() {
    // some code here
}

match(token t) {
    if (lookahead == t) {
        lookahead = nexttoken();
    }
    else {
        error();
    }
}

Derivations

A syntax-tree can be leftmost or rightmost derived; left and right refer to the direction in which non-terminals are expanded.

Notation

Given ~A \rightarrow \upsilon, ~if we ~have ~\alpha A \beta , ~then ~aAb \Rightarrow a \upsilon \Beta

$\Rightarrow$ reads "derives"
$\Rightarrow^*$ reads "derives in zero or more steps."
$\Rightarrow^+$ reads "derives in one or more steps."
$If a \Rightarrow^*B ~ and ~B \Rightarrow \upsilon, ~then a \Rightarrow^* \upsilon$

$if ~S \Rightarrow^* a, ~where ~S ~is ~the ~start ~symbol, ~a, ~is ~a ~sentential ~form ~of ~G$

Can contain terminals or non-terminals

If the sentential form contains no non-terminals, it is a Sentence.

This is because no non-terminals means all non-terminals, that, in zero or more steps, were derived to reach the symbol, have been expanded.

All that remains is the yield of the syntax-tree, or a string of terminals

The language of some Grammar G is the set of Sentences.

A language generated by a grammar is a context-free Grammar.

See this visual of an AST for the C++ programming language on Stack Overflow.

LL(1) Grammar

The first 'L' denotes the order in which input is scanned; in this case, from left to right. The second 'L' denotes the derivation direction and '1' represents the number of input characters viewed at a time.

Thus, LL(1) means we scan from left to right, building a leftmost derivation tree, while observing one character at a time.

LL Grammar is suitable for top-down parsers.

PreviousFIRST FOLLOW (Top-Down) Parsing NextConstruction: Regular Expression -> NFA

Last updated 10 months ago

Was this helpful?