Maurice's Notes
Blog
Low Level Computing
Low Level Computing
  • Operating Systems
    • General Operating Systems
      • OS Structure
      • Main Memory
        • Basic Hardware
        • Address Binding
        • Memory Address Register
      • Booting
        • MBR (Master Boot Record)
        • Global Descriptor Table
      • Direct Memory Access (DMA)
        • DMA
      • Processes
        • Basics
        • Process Scheduling
    • Linux Operating System
      • Linker Scripts
      • Position Independent Code/Executable
      • Relocation
      • Understanding PLT and GOT
    • Windows Operating System
      • Page 1
    • Real-Time Embedded Systems
      • Real-Time Scheduling
        • Cyclic Executive
  • Computer Architecture
    • Architecture Fundamentals
      • Introduction
      • Cache Basics
      • Cache Memory
      • A Few CPU Formulas
    • RISC Architectures
      • ARM
        • ARM Design Philosophy
        • RISC Review
        • Exceptions, Interrupts, & Vector Table
        • ARM Pipelines
        • ARM Registers
        • ARM Branch Instructions
        • ARM CSPR (Instructions)
        • ARM Data Processing Instructions
        • Load/Store Instructions
        • Profiling Cycle Counter
        • Compiler Optimizations
      • RISCV
    • CISC Architectures
    • Cache Coherency
      • Basic Introduction
      • Memory Sequential Consistency
  • Exploits
    • Walkthrough: Return-to-Libc
    • Access Physical Memory
  • Compilers & Runtime Systems
    • Introduction
      • Programming Language Basics
      • Static Scope
    • Syntax Translation
      • Syntax Defined
      • Parsing
    • Algorithms
      • FIRST FOLLOW (Top-Down) Parsing
      • Building a Recursive Descent Parser
      • Construction: Regular Expression -> NFA
Powered by GitBook
On this page
  • Context-Free Grammar Components
  • Derivations

Was this helpful?

  1. Compilers & Runtime Systems
  2. Syntax Translation

Syntax Defined

Context-free grammar, or simply, grammar, is how syntax of a language is defined. Consider the typical if-else statement:

if (expression) statement else statement

The if-else structure of this statement illustrates that it is nothing more than an if keyword, followed by an opening parenthesis, an expression, a closing parenthesis, a statement, the else keyword, and another statement.

Using the variable expr to denote an expression and stmt to denote a statement, the structure rule can be defined as:

stmt -> if ( expr ) stmt else stmt

The arrow represents the words "can have the form." In other words, we've defined a grammar rule for if-else conditionals, that read: statement can have the form if (expr) stmt else stmt. This is known as a production rule. Lexical elements of the keyword if, else, and parenthesis are referred to as terminals.

Variables such as expr and stmt represent sequences of terminals and are called nonterminals.

Context-Free Grammar Components

  1. A set of terminal symbols, sometimes called tokens, which are elementary symbols of the language defined by the grammar.

  2. A set of nonterminals, sometimes referred to as syntactic variables, each representing a set of strings of terminals.

  3. A set of productions, where each production consists of a nonterminal, called the head or left side of the production, an arrow, and a sequence of terminals and/or nonterminals, called the body or right side of the production. The purpose of a production is to specify one of the written forms of construct.

  4. A designation of one of the nonterminals as the start symbol.

For this guide, grammars are specified by listing the productions.

  • Productions for the start symbol are listed first.

  • Digits, signs, and boldface strings are assumed to be terminals.

  • Italicized name is a nonterminal.

  • Productions with the same nonterminal head can have their bodies groups using the pipe symbol "|" as the delimiter.

Example

If an example uses an expression consisting of digits and plus and minus signs (e.g., 9-5+2, 3-1, 7), we will refer to that expression as a list of digits separated by plus or minus signs.

list -> list + digit
list -> list - digit
list -> digit 
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

// Alternativley
list -> list + digit | list - digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The above shows production for lists, which include addition and subtraction operations and for single digits, followed by a production rule for digits.

Derivations

A grammar derives strings by beginning with the start symbol and repeatedly replacing a nonterminal by the body of a production for that nonterminal. In the example above, list-> list + digit has two nonterminals: digit and list. The body for the production of the nonterminal for list is digit (list -> digit) and the body of the production for the nonterminal digit is 0, 1, 2, ... (digit -> 0 | 1 | 2 | ...).

So, list -> list + digit becomes list -> digit + digit for which an example may be list -> 2 + 4.

Terminal strings that can be derived from the start symbol, form the language of the grammar. Consider: 9 - 5 + 2.

  • 9 is a list by production list -> digit

  • 9 - 5 is a list by production list -> list - digit

  • 9 - 5 + 2 is a list by production list -> list + digit

PreviousSyntax TranslationNextParsing

Last updated 8 months ago

Was this helpful?