1. Write a regular expression for comments of the form /*
... */
.
Construct an NFA and DFA for this kind of comments.
2. Write a compiler from regular expressions to NFA's, covering
the minimal set (symbol, sequence, union, closure, empty) and the notation
used in the lecture notes. You can
use BNFC's Ident
category for symbols.
Closure should bind stronger than sequence, sequence stronger than union.
The automata and the compiler can be expressed in a mathematical notation and
pseudocode. For instance, the definition of automata in Section 3.3 suggests
compile (Symbol a) = <Sigma,{0,1},{1},0,{((0,a),1)}>
But you can also write an actual program as a back-end to the BNFC-generated parser. And, if you are really ambitious, a generation of Graphviz code to show the bubbles!
3. Consider the simple NFA for the expression (a|b)* a (a|b)
in Section 3.4
of the lecture notes. Make it into a DFA by using the subset construction.
Is the result different from the DFA constructed by informal reasoning
in 3.4?
4. Trace the LR parsing of the statement
if (x + y > 4) if (true) {x = y = y - 1 ;} else 8 ;
in the grammar
Stm ::= "if" "(" Exp ")" Stm "else" Stm ; Stm ::= "if" "(" Exp ")" Stm ; Stm ::= "{" [Stm] "}" Stm ::= Exp ";" Exp2 ::= Integer Exp2 ::= Ident Exp1 ::= Exp1 "+" Exp2 Exp1 ::= Exp1 "-" Exp2 Exp ::= Exp1 ">" Exp1 Exp ::= Ident "=" Exp [Stm] ::= [Stm] ::= Stm [Stm] Exp2 ::= "(" Exp ")" Exp1 ::= Exp2 Exp ::= Exp1
Do this by showing the evolution of the stack and the input, and whether shift or reduce actions are taken.
5. Consider the language 'X'*
, i.e. sequences of
symbol X
. Write two context-free grammars for it:
one left-recursive and one right-recursive. With both
grammars, trace the LR parsing of the string XXXX
.
What can you say about the memory consumption of the
two processes?