Exercises on Lexing and Parsing

Programming Language Technology 2012
Exercise 2

1. Write a regular expression for comments of the form /* ... */. Construct an NFA and DFA for this kind of comments.

2. Write a compiler from regular expressions to NFA's, covering the minimal set (symbol, sequence, union, closure, empty) and the notation used in the lecture notes. You can use BNFC's Ident category for symbols. Closure should bind stronger than sequence, sequence stronger than union. The automata and the compiler can be expressed in a mathematical notation and pseudocode. For instance, the definition of automata in Section 3.3 suggests

    compile (Symbol a) = <Sigma,{0,1},{1},0,{((0,a),1)}>

But you can also write an actual program as a back-end to the BNFC-generated parser. And, if you are really ambitious, a generation of Graphviz code to show the bubbles!

3. Consider the simple NFA for the expression (a|b)* a (a|b) in Section 3.4 of the lecture notes. Make it into a DFA by using the subset construction. Is the result different from the DFA constructed by informal reasoning in 3.4?

4. Trace the LR parsing of the statement

    if (x + y > 4) if (true) {x = y = y - 1 ;} else 8 ;

in the grammar

    Stm   ::= "if" "(" Exp ")" Stm "else" Stm ;
    Stm   ::= "if" "(" Exp ")" Stm ;
    Stm   ::= "{" [Stm] "}"
    Stm   ::= Exp ";"
    Exp2  ::= Integer
    Exp2  ::= Ident
    Exp1  ::= Exp1 "+" Exp2
    Exp1  ::= Exp1 "-" Exp2
    Exp   ::= Exp1 ">" Exp1
    Exp   ::= Ident "=" Exp
    [Stm] ::= 
    [Stm] ::= Stm [Stm]
    Exp2  ::= "(" Exp ")"
    Exp1  ::= Exp2
    Exp   ::= Exp1

Do this by showing the evolution of the stack and the input, and whether shift or reduce actions are taken.

5. Consider the language 'X'*, i.e. sequences of symbol X. Write two context-free grammars for it: one left-recursive and one right-recursive. With both grammars, trace the LR parsing of the string XXXX. What can you say about the memory consumption of the two processes?