Syntax Defined

Context-free grammar, or simply, grammar, is how syntax of a language is defined. Consider the typical if-else statement:

if (expression) statement else statement

The if-else structure of this statement illustrates that it is nothing more than an if keyword, followed by an opening parenthesis, an expression, a closing parenthesis, a statement, the else keyword, and another statement.

Using the variable expr to denote an expression and stmt to denote a statement, the structure rule can be defined as:

stmt -> if ( expr ) stmt else stmt

The arrow represents the words "can have the form." In other words, we've defined a grammar rule for if-else conditionals, that read: statement can have the form if (expr) stmt else stmt. This is known as a production rule. Lexical elements of the keyword if, else, and parenthesis are referred to as terminals.

Variables such as expr and stmt represent sequences of terminals and are called nonterminals.

Context-Free Grammar Components

  1. A set of terminal symbols, sometimes called tokens, which are elementary symbols of the language defined by the grammar.

  2. A set of nonterminals, sometimes referred to as syntactic variables, each representing a set of strings of terminals.

  3. A set of productions, where each production consists of a nonterminal, called the head or left side of the production, an arrow, and a sequence of terminals and/or nonterminals, called the body or right side of the production. The purpose of a production is to specify one of the written forms of construct.

  4. A designation of one of the nonterminals as the start symbol.

For this guide, grammars are specified by listing the productions.

  • Productions for the start symbol are listed first.

  • Digits, signs, and boldface strings are assumed to be terminals.

  • Italicized name is a nonterminal.

  • Productions with the same nonterminal head can have their bodies groups using the pipe symbol "|" as the delimiter.

Example

If an example uses an expression consisting of digits and plus and minus signs (e.g., 9-5+2, 3-1, 7), we will refer to that expression as a list of digits separated by plus or minus signs.

list -> list + digit
list -> list - digit
list -> digit 
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

// Alternativley
list -> list + digit | list - digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The above shows production for lists, which include addition and subtraction operations and for single digits, followed by a production rule for digits.

Derivations

A grammar derives strings by beginning with the start symbol and repeatedly replacing a nonterminal by the body of a production for that nonterminal. In the example above, list-> list + digit has two nonterminals: digit and list. The body for the production of the nonterminal for list is digit (list -> digit) and the body of the production for the nonterminal digit is 0, 1, 2, ... (digit -> 0 | 1 | 2 | ...).

So, list -> list + digit becomes list -> digit + digit for which an example may be list -> 2 + 4.

Terminal strings that can be derived from the start symbol, form the language of the grammar. Consider: 9 - 5 + 2.

  • 9 is a list by production list -> digit

  • 9 - 5 is a list by production list -> list - digit

  • 9 - 5 + 2 is a list by production list -> list + digit

Last updated