Maurice's Notes
Blog
Low Level Computing
Low Level Computing
  • Operating Systems
    • General Operating Systems
      • OS Structure
      • Main Memory
        • Basic Hardware
        • Address Binding
        • Memory Address Register
      • Booting
        • MBR (Master Boot Record)
        • Global Descriptor Table
      • Direct Memory Access (DMA)
        • DMA
      • Processes
        • Basics
        • Process Scheduling
    • Linux Operating System
      • Linker Scripts
      • Position Independent Code/Executable
      • Relocation
      • Understanding PLT and GOT
    • Windows Operating System
      • Page 1
    • Real-Time Embedded Systems
      • Real-Time Scheduling
        • Cyclic Executive
  • Computer Architecture
    • Architecture Fundamentals
      • Introduction
      • Cache Basics
      • Cache Memory
      • A Few CPU Formulas
    • RISC Architectures
      • ARM
        • ARM Design Philosophy
        • RISC Review
        • Exceptions, Interrupts, & Vector Table
        • ARM Pipelines
        • ARM Registers
        • ARM Branch Instructions
        • ARM CSPR (Instructions)
        • ARM Data Processing Instructions
        • Load/Store Instructions
        • Profiling Cycle Counter
        • Compiler Optimizations
      • RISCV
    • CISC Architectures
    • Cache Coherency
      • Basic Introduction
      • Memory Sequential Consistency
  • Exploits
    • Walkthrough: Return-to-Libc
    • Access Physical Memory
  • Compilers & Runtime Systems
    • Introduction
      • Programming Language Basics
      • Static Scope
    • Syntax Translation
      • Syntax Defined
      • Parsing
    • Algorithms
      • FIRST FOLLOW (Top-Down) Parsing
      • Building a Recursive Descent Parser
      • Construction: Regular Expression -> NFA
Powered by GitBook
On this page
  • Productions
  • Production Body
  • Example 1
  • Example 2
  • Derivations
  • Notation
  • LL(1) Grammar

Was this helpful?

  1. Compilers & Runtime Systems
  2. Algorithms

Building a Recursive Descent Parser

This is a very basic overview and will not suffice to fully implement a recursive descent parser.

Productions

All production heads are non-terminals, so a function must exist for each non-terminal.

Production Body

For all terminals in the production body, call match(t), for all non-terminals call the associated function.

Example 1

A→aBB ∣ cAcBA \rightarrow aBB ~|~cAcBA→aBB ∣ cAcB
B→bca ∣ cB \rightarrow bca~|~cB→bca ∣ c

Non-Terminals = {A, B}

Terminals = {a,c,b}

A() {
    if (lookahead == 'a') {
        match(a);
        B();
        B();
    }
    else if (lookahead == 'c') {
        match(c);
        A();
        match(c);
        B();
    }
    else {
        error();
    }
}

B() {
    if (lookahead == 'b') {
        match(b);
        match(c);
        match(a);
    }
    else if (lookahead == 'c') {
        match(c);
    }
    else {
        error();
    }
}

match(token t) {
    if (lookahead == t) {
        lookahead = nexttoken();
    }
    else {
        error();
    }
}

Example 2

expr→i cond t expr {print"true"} e expr {print"false"}expr \rightarrow i ~cond ~t ~expr ~\{print "true"\} ~e ~expr ~\{print "false"\}expr→i cond t expr {print"true"} e expr {print"false"}
expr→ϵexpr \rightarrow \epsilonexpr→ϵ
expr() {
    if (lookahead == 'i') {
        match(i);
        cond();
        match(t);
        expr();
        print("true");
        match(e);
        expr();
        print("false");
    }
    // if epsilon, then do nothing!
}

cond() {
    // some code here
}

match(token t) {
    if (lookahead == t) {
        lookahead = nexttoken();
    }
    else {
        error();
    }
}

Derivations

A syntax-tree can be leftmost or rightmost derived; left and right refer to the direction in which non-terminals are expanded.

Notation

Given A→υ, ifwe have αAβ, then aAb⇒aυBGiven ~A \rightarrow \upsilon, ~if we ~have ~\alpha A \beta , ~then ~aAb \Rightarrow a \upsilon \BetaGiven A→υ, ifwe have αAβ, then aAb⇒aυB

  • ⇒\Rightarrow⇒ reads "derives"

  • ⇒∗\Rightarrow^*⇒∗ reads "derives in zero or more steps."

  • ⇒+\Rightarrow^+⇒+ reads "derives in one or more steps."

  • Ifa⇒∗B and B⇒υ, thena⇒∗υIf a \Rightarrow^*B ~ and ~B \Rightarrow \upsilon, ~then a \Rightarrow^* \upsilonIfa⇒∗B and B⇒υ, thena⇒∗υ

if S⇒∗a, where S is the start symbol, a, is a sentential form of Gif ~S \Rightarrow^* a, ~where ~S ~is ~the ~start ~symbol, ~a, ~is ~a ~sentential ~form ~of ~Gif S⇒∗a, where S is the start symbol, a, is a sentential form of G

Can contain terminals or non-terminals

If the sentential form contains no non-terminals, it is a Sentence.

This is because no non-terminals means all non-terminals, that, in zero or more steps, were derived to reach the symbol, have been expanded.

All that remains is the yield of the syntax-tree, or a string of terminals

The language of some Grammar G is the set of Sentences.

A language generated by a grammar is a context-free Grammar.

LL(1) Grammar

The first 'L' denotes the order in which input is scanned; in this case, from left to right. The second 'L' denotes the derivation direction and '1' represents the number of input characters viewed at a time.

Thus, LL(1) means we scan from left to right, building a leftmost derivation tree, while observing one character at a time.

LL Grammar is suitable for top-down parsers.

PreviousFIRST FOLLOW (Top-Down) ParsingNextConstruction: Regular Expression -> NFA

Last updated 7 months ago

Was this helpful?

See this on Stack Overflow.

visual of an AST for the C++ programming language