Parsing

Parsing is taking a string of terminals and figuring out how to derive it from the start symbol of the grammar. If it cannot be derived from the start symbol of the grammar, the parser must report syntax errors within the string.

Parse Trees

Parse trees pictorially represent how the start symbol of a grammar derives a string in a language. Consider the following A -> XYZ; a parse would look like this:

Parse Tree Rules:

  1. The root is labeled by the start symbol

  2. Each leaf is labeled by a terminal or by ϵ\epsilon

  3. Each interior node is labeled by a nonterminal

  4. If A is the nonterminal labeling some interior node and X1,X2,X3,...,XnX_1, X_2, X_3, ..., X_n are the labels of the children of that node from left to right, then there must exist a production A -> X1X2...XnX_1X_2 ... X_n , where X1X2...XnX_1X_2 ... X_n are terminal or nonterminal symbols.

    1. A -> ϵ\epsilon is a special case requiring the interior node A to have a single child labeled ϵ\epsilon.

Consider the example statement from the previous page: 9 - 5 + 2. We created the following grammar productions:

list -> list + digit
list -> list - digit
list -> digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Since list -> list + digit is the first production the nonterminal list (on the left side) is the start symbol, and also the root of the parse tree. The root will have three children list, +, and digit. Recall that interior nodes are labeled by a nonterminal; you should immediately recognize the the plus operator is a terminal label, and cannot be an interior node; it is a leaf. (leaf nodes are nonterminal or ϵ\epsilon ).

From left to right, the leaves of a parse tree form the yield of the tree, which is the string generated or derived from the root of the parse tree.

The process of finding a parse tree for a given string of terminals is called parsing.

Last updated