2018-11-29 12:15
Page 1

# Parsing

## "(1+2)*3" ⟹

Page 2
##### Parsing
• Remember `showExpr` from last time?
• `showExpr :: Expr -> String`
• What if we need the opposite?
• `readExpr :: String -> Expr`
Page 3

# Application

## Imagine a simple calculator

• Welcome to the simple calculator!
  Expression? `1+2*3`
  Value: 7
  Expression? `(1+2)*3`
  Value: 9
• We can reuse `eval :: Expr -> Integer` from last time.
• We also need `readExpr :: String -> Expr`.
• This is a parsing problem.
Page 4

• Haskell can give us functions with the right type automatically
• ```data Expr = Num Integer
          | Add Expr Expr
          | Mul Expr Expr
          deriving (Show, Read)

ex1 = Mul (Add (Num 1) (Num 2)) (Num 3)
```
• `show ex1` ⟹ `"Mul (Add (Num 1) (Num 2)) (Num 3)"`
• `read "Mul (Add (Num 1) (Num 2)) (Num 3)" :: Expr` ⟹ `Mul (Add (Num 1) (Num 2)) (Num 3)`
• But that doesn't give us the syntax we want. That's why we wrote `showExpr`.
Page 5

• Data constructors can be infix operators in Haskell.
• ```infixl 6 :+
infixl 7 :*

data Expr = C Integer
          | Expr :+ Expr
          | Expr :* Expr
          deriving (Show, Read)

ex1 = (C 1 :+ C 2) :* C 3```
• `read "C 1 :+ C 2 :* C 3" :: Expr` ⟹ `C 1 :+ C 2 :* C 3`
• `read "(C 1 :+ C 2) :* C 3" :: Expr` ⟹ `(C 1 :+ C 2) :* C 3`
• This is much closer, but still not exactly the syntax we want.
Page 6

# Grammars

• A grammar defines a language.
• Formally, a language is a set of strings.
• We usually think of a grammar as defining the syntax of a language
• or the "acceptable" input.
• Example: an EBNF grammar for the addition of two numbers:
• `digit ::= "0".."9".`
• `number ::= digit {digit}.`
• `addition ::= number "+" number.`
Page 7

# The use of grammars

• Grammars can be used as documentation of languages.
• Grammars can also be used as specifications of
• Parsers (`readExpr :: String -> Expr`)
• Printers (`showExpr :: Expr -> String`)
• Test data generators (`rExpr :: Gen Expr`)
• If the grammar is written in a suitable form, these types of functions can be derived from the grammar in a fairly direct and systematic way.
• Haskell's `deriving (Read,Show)` is an example of parsers and printers generated automatically from a particular kind of grammar (a `data` declaration).
Page 8

# The purpose of a parser

## A parser usually does two things

• It checks that the input is correct (according to the grammar).
• It converts the input `String` to some other form, to simplify further processing.
• So a parsing function needs to have a way to
• indicate success or failure,
• return a result when the parsing is successful.
• `parseX :: String -> Maybe X`
Page 9

# Parser correctness

• If the output from a parser is a syntax tree, we can convert back and forth without loss of information.
• We can state the correctness as a property that can be tested with QuickCheck:
• ```prop_correct_parseX :: X -> Bool
prop_correct_parseX x = parseX (showX x) == Just x```
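
The property can be tried out before any parser has been written by hand. The sketch below is only an illustration: the type `X`, the list `samples` and the name `allCorrect` are made up for this example, the derived `Show` and `Read` instances stand in for `showX` and `parseX`, and a fixed list of test cases replaces the random data that QuickCheck would generate.

```haskell
import Text.Read (readMaybe)

-- Hypothetical stand-in for the type X on the slide.
data X = Leaf Integer | Node X X
  deriving (Eq, Show, Read)

-- The derived instances play the roles of the printer and the parser.
showX :: X -> String
showX = show

parseX :: String -> Maybe X
parseX = readMaybe

prop_correct_parseX :: X -> Bool
prop_correct_parseX x = parseX (showX x) == Just x

-- A few hand-picked test cases; QuickCheck would generate these instead.
samples :: [X]
samples =
  [ Leaf 0
  , Leaf (-7)
  , Node (Leaf 1) (Leaf 2)
  , Node (Node (Leaf 1) (Leaf 2)) (Leaf 3)
  ]

-- True when every sample survives the round trip through a String.
allCorrect :: Bool
allCorrect = all prop_correct_parseX samples
```

With QuickCheck available and an `Arbitrary X` instance, `quickCheck prop_correct_parseX` would run the same check on generated values.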
Page 10

# The parsing problem

• While writing functions to show a data structure (`Expr->String`) is fairly straightforward, the opposite (`String->Expr`) can be much harder.
• This type of problem has been studied since the early days of computer science.
• The influential parser generator Yacc appeared in the 1970s.
• It uses the LALR(1) parsing method, which supports a large class of grammars and produces efficient parsers.
• There are similar parser generators for Haskell, e.g. Happy and BNFC.
Page 11

• We will look at how to write parsers directly in Haskell.
• It's a good example of
• Higher order functions
• Abstract data types
• With a good parsing library, creating a parser directly in Haskell is in some ways easier than using a parser generator.
• Parsing libraries typically create backtracking recursive descent parsers.
Page 12

# A first example

• Before we look at the parsing library, let's write a parser from scratch.
• We will use functions of the following type as building blocks:
• `String -> Maybe (a,String)`
• If the parser fails, we return `Nothing`.
• If the parser succeeds, we return `Just (x,r)`, where `x` is the result and `r` is the remaining, unused input.
• Reminder: the predefined `Maybe` type:
• `data Maybe a = Nothing | Just a`
• Live demo: ParsingExamples.hs
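
The file from the live demo is not reproduced here, so the following is only a sketch of what a from-scratch parser with this building-block type might look like. The names `ParserFun`, `numberP`, `charP` and `additionP` are invented for this example.

```haskell
import Data.Char (isDigit)

-- The building-block type from the slide, given a name for readability.
type ParserFun a = String -> Maybe (a, String)

-- Parse one or more digits and convert them to a number.
numberP :: ParserFun Integer
numberP s = case span isDigit s of
  ("", _)    -> Nothing
  (ds, rest) -> Just (read ds, rest)

-- Parse exactly the given character.
charP :: Char -> ParserFun Char
charP c (x:rest) | x == c = Just (c, rest)
charP _ _                 = Nothing

-- Parse a number, a '+' and another number, and add the two numbers.
-- Each step hands its remaining input to the next step.
additionP :: ParserFun Integer
additionP s =
  case numberP s of
    Nothing      -> Nothing
    Just (a, r1) ->
      case charP '+' r1 of
        Nothing      -> Nothing
        Just (_, r2) ->
          case numberP r2 of
            Nothing      -> Nothing
            Just (b, r3) -> Just (a + b, r3)
```

For example, `additionP "12+30xyz"` gives `Just (42,"xyz")` and `additionP "1+"` gives `Nothing`. The nested `case` expressions are the repetitive part that a parsing library can hide.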
Page 13

## A parsing library provides

• A type for parsers: `Parser a`
• Functions for creating parsers
• A function for running a parser,
• i.e. applying a parser to some input and getting the result
Page 14

# Running a parser

• The library contains the following function for running parsers:
• ``` -- Running a parser
parse :: Parser a -> String -> Maybe (a,String)
```
• Assuming we have a parser for numbers, `number ::= digit {digit}`,
• `number :: Parser Integer`
• here are some examples of how it behaves on different input:
• `parse number "42" == Just (42,"")`
• `parse number "xyz42" == Nothing`
• `parse number "42xyz" == Just (42,"xyz")`
Page 15

# Exercise

• Given
• ```parse :: Parser a -> String -> Maybe (a,String)
```
• write a function
• `completeParse :: Parser a -> String -> Maybe a`
• that only succeeds if the parser consumes the entire input string (the remaining input is empty).
• This is typically what you want when you use a parser in an application.
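
One possible solution is sketched below; other formulations work equally well. Since the library is abstract at this point, the sketch assumes a simple representation of `Parser` and adds a hypothetical parser `anyChar` purely to have something to run.

```haskell
-- Assumed representation; the exercise itself only relies on the type of parse.
newtype Parser a = P (String -> Maybe (a, String))

parse :: Parser a -> String -> Maybe (a, String)
parse (P f) = f

-- Succeed only when the parser leaves no input behind.
completeParse :: Parser a -> String -> Maybe a
completeParse p s = case parse p s of
  Just (x, "") -> Just x
  _            -> Nothing

-- Hypothetical helper for trying it out: accepts any single character.
anyChar :: Parser Char
anyChar = P f
  where
    f (c:cs) = Just (c, cs)
    f []     = Nothing
```

Here `completeParse anyChar "a"` gives `Just 'a'`, while `completeParse anyChar "ab"` gives `Nothing` because `"b"` is left over.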
Page 16

# Building parsers

• Our parsing library provides functions for building parsers:
• ```data Parser a -- abstract type of parsers

-- Parsing a single character
char :: Char -> Parser Char
sat  :: (Char->Bool) -> Parser Char

-- Choice
(<|>) :: Parser a -> Parser a -> Parser a

-- Repetition
zeroOrMore, oneOrMore :: Parser a -> Parser [a]
```
• This is a small but powerful set of parsing combinators from which parsers for arbitrarily complex structures can be built.
• Actually, we need one more thing...
Page 17

# Parsing sequences

• There is no operator in the library to parse one thing followed by another
• ``` -- Not included in the library
pair :: Parser a -> Parser b -> Parser (a,b)
```
• But since `Parser` is a monad, we can use the `do` notation.
• This is a more convenient and flexible solution.
Page 18

# Rewriting our first parser (live demo)

• Example: parse two numbers, separated by `+`, and add them.
• `digit ::= "0".."9".`
• `number ::= digit {digit}.`
• `addition ::= number "+" number.`
• Live demo: ParsingExamples.hs
Page 19

# Rewriting our first parser

• ```import Data.Char (isDigit)

digit :: Parser Char
digit = sat isDigit

number :: Parser Integer
number = do s <- oneOrMore digit
            return (read s)

addition :: Parser Integer
addition = do a <- number
              char '+'
              b <- number
              return (a+b)
```
• `digit ::= "0".."9".`
• `number ::= digit {digit}.`
• `addition ::= number "+" number.`
Page 20

# A grammar for expressions

• ```data Expr = Num Integer
          | Add Expr Expr
          | Mul Expr Expr```
• expr ::= number | expr "+" expr | expr "*" expr | "(" expr ")"
• Using this grammar as a specification for a parser is problematic:
• It is ambiguous (doesn't specify operator precedences / associativity).
• It is left recursive, which recursive descent parsers can't handle.
• ...
Page 21

# A grammar for expressions, version 2

• A BNF grammar for expressions
• `expr ::= term "+" expr | term.`
• `term ::= factor "*" term | factor.`
• `factor ::= number | "(" expr ")".`
• This grammar "accepts" the same input as the previous one, and solves some problems.
• It makes it clear that "*" has higher precedence than "+"
• There is no left recursion
• But are the operators left or right associative?
Page 22

## Problems

• 1. Both alternatives in `expr` start with `term`.
• If the first alternative fails after it has recognized a term, the second alternative will parse the same string again. Inefficient!
• Solution: left factorisation.
• 2. The operators have become right associative
• The choice between left & right associativity is a small change in the grammar:
• `expr ::= term "+" expr | term`
• `expr ::= expr "+" term | term`
• But the left associative grammar is also left recursive, which leads to problems in the parser...
Page 23

# A grammar for expressions, final version

• An EBNF grammar for expressions
• `expr ::= term {"+" term}.`
• `term ::= factor {"*" factor}.`
• `factor ::= number | "(" expr ")".`
• The grammar is left factored. Good for efficiency.
• The grammar is easy to read, but less similar to the `Expr` type.
• An `expr` is one or more `term`s separated by "+".
• We can choose how to convert a `[Expr]` to an `Expr`.
• Live demo: ParsingExamples.hs
Page 24

# An expression parser

• ```expr = do t <- term
          ts <- zeroOrMore (do char '+'; term)
          return (foldl Add t ts)

term = do f <- factor
          fs <- zeroOrMore (do char '*'; factor)
          return (foldl Mul f fs)

factor = (do n <- number; return (Num n))
         <|>
         (do char '('; e <- expr; char ')'; return e)
```
• Using `foldl Add t ts` makes "+" left associative.
• ```foldl :: (b -> a -> b) -> b -> [a] -> b
--  foldl is like foldr, but it works from the left...
```
• But the code is a bit long and repetitive. Let's improve it!
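
To see why `foldl` gives left associativity, it helps to look at the tree it builds. The sketch below uses hand-written values for what the parser would have collected for the input `1+2+3`; the names `leftNested` and `rightNested` are made up for this comparison.

```haskell
data Expr = Num Integer
          | Add Expr Expr
          | Mul Expr Expr
          deriving (Eq, Show)

-- What the parser has collected for the input "1+2+3":
-- the first term, and the list of terms that followed a '+'.
t :: Expr
t = Num 1

ts :: [Expr]
ts = [Num 2, Num 3]

-- Folding from the left nests to the left: (1+2)+3
leftNested :: Expr
leftNested = foldl Add t ts

-- For comparison, folding from the right nests to the right: 1+(2+3)
rightNested :: Expr
rightNested = foldr1 Add (t : ts)
```

Here `leftNested` is `Add (Add (Num 1) (Num 2)) (Num 3)` and `rightNested` is `Add (Num 1) (Add (Num 2) (Num 3))`. For addition both evaluate to the same number, but for an operator like subtraction the choice would change the result.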
Page 25

# Two useful parsing combinators

## Factoring out a pattern

• From the parsing library
• ```chain :: Parser item -> Parser sep -> Parser [item]
chain item sep = do i <- item
                    is <- zeroOrMore (do sep;item)
                    return (i:is)
```
• For our expression parser
• ```leftAssoc :: (t->t->t) -> Parser t -> Parser sep -> Parser t
leftAssoc op item sep = do is <- chain item sep
                           return (foldl1 op is)
```
Page 26

# A few more combinators from the library

• Parsing two things and keeping only one of them
• ```(<*) :: Parser b -> Parser a -> Parser b
(*>) :: Parser a -> Parser b -> Parser b```
• Applying a function to the result of a parser
• `(<$>) :: (a->b) -> Parser a -> Parser b`
• Note: these combinators actually have more general types...
Page 27

# A more elegant expression parser

• ```expr, term, factor :: Parser Expr

expr = leftAssoc Add term (char '+')

term = leftAssoc Mul factor (char '*')

factor = Num <$> number
         <|>
         char '(' *> expr <* char ')'```
• `expr ::= term {"+" term}.`
• `term ::= factor {"*" factor}.`
• `factor ::= number | "(" expr ")".`
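
To try the expression parser without the course library, the pieces can be put together in one file. The sketch below is a stand-in, not the actual Parsing module: the definitions of `zeroOrMore` and `oneOrMore`, the fixity of `<|>`, the use of `newtype`, and a `readExpr` that returns `Maybe Expr` are all assumptions made for this example.

```haskell
import Control.Monad (ap, liftM)
import Data.Char (isDigit)

data Expr = Num Integer | Add Expr Expr | Mul Expr Expr
  deriving (Eq, Show)

eval :: Expr -> Integer
eval (Num n)   = n
eval (Add a b) = eval a + eval b
eval (Mul a b) = eval a * eval b

-- Minimal stand-in for the parsing library (assumed, not the original code).
newtype Parser a = P (String -> Maybe (a, String))

parse :: Parser a -> String -> Maybe (a, String)
parse (P f) = f

instance Functor Parser where fmap = liftM
instance Applicative Parser where
  pure x = P (\s -> Just (x, s))
  (<*>)  = ap
instance Monad Parser where
  P p >>= f = P (\s -> case p s of
                         Nothing     -> Nothing
                         Just (a, r) -> parse (f a) r)

sat :: (Char -> Bool) -> Parser Char
sat p = P f
  where
    f (c:cs) | p c = Just (c, cs)
    f _            = Nothing

char :: Char -> Parser Char
char c = sat (== c)

-- Binds less tightly than <$>, <* and *> (assumed fixity).
infixl 3 <|>
(<|>) :: Parser a -> Parser a -> Parser a
P f <|> P g = P (\s -> case f s of
                         Nothing -> g s
                         r       -> r)

zeroOrMore, oneOrMore :: Parser a -> Parser [a]
zeroOrMore p = oneOrMore p <|> return []
oneOrMore p  = do x <- p
                  xs <- zeroOrMore p
                  return (x : xs)

chain :: Parser item -> Parser sep -> Parser [item]
chain item sep = do i <- item
                    is <- zeroOrMore (sep *> item)
                    return (i : is)

leftAssoc :: (t -> t -> t) -> Parser t -> Parser sep -> Parser t
leftAssoc op item sep = foldl1 op <$> chain item sep

number :: Parser Integer
number = read <$> oneOrMore (sat isDigit)

expr, term, factor :: Parser Expr
expr   = leftAssoc Add term (char '+')
term   = leftAssoc Mul factor (char '*')
factor = Num <$> number <|> char '(' *> expr <* char ')'

-- Accept the input only if the whole string is a valid expression.
readExpr :: String -> Maybe Expr
readExpr s = case parse expr s of
  Just (e, "") -> Just e
  _            -> Nothing
```

With these definitions, `fmap eval (readExpr "1+2*3")` gives `Just 7` and `fmap eval (readExpr "(1+2)*3")` gives `Just 9`, matching the calculator session at the start of the lecture.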
Page 28

# Looking inside the Parsing module

• The `Parser a` type:
• ```data Parser a = P (String -> Maybe (a,String))

parse :: Parser a -> String -> Maybe(a,String)
parse (P f) s = f s```
• Parsers are represented as functions! (Functions as data)
• To apply a parser to an input string, we just extract the function and apply it!
Page 29
##### Looking inside the Parsing module
• Parsing a single character
• ```sat :: (Char->Bool) -> Parser Char
sat p = P sat_p
  where
    sat_p (c:s) | p c = Just (c,s)
    sat_p _           = Nothing

char :: Char -> Parser Char
char c = sat (==c)
```
Page 30
##### Looking inside the Parsing module
• Choice
• ```(<|>) :: Parser a -> Parser a -> Parser a
P pf1 <|> P pf2 = P pf
  where
    pf s = case pf1 s of
             Nothing -> pf2 s
             r       -> r```
• Try the first parser, if it fails try the second parser. If the first parser succeeds, use that result.
Page 31
##### Looking inside the Parsing module
• Parsing a pair
• ```pair :: Parser a -> Parser b -> Parser (a,b)
pair (P pa) (P pb) =
  P (\ s ->
       case pa s of
         Nothing -> Nothing
         Just (a,r) ->
           case pb r of
             Nothing -> Nothing
             Just (b,r) -> Just ((a,b),r))
```
• This is very similar to the `addition` parser we saw earlier.
Page 32
##### Looking inside the Parsing module
• A simple variation on parsing a pair
• ```apply :: Parser (a->b) -> Parser a -> Parser b
apply (P pf) (P pa) =
  P (\ s ->
       case pf s of
         Nothing -> Nothing
         Just (f,r) ->
           case pa r of
             Nothing -> Nothing
             Just (a,r) -> Just (f a,r))
```
• More useful than always returning a pair!
• ``pair pa pb = return (,) `apply` pa `apply` pb``
Page 33

# What about supporting the do notation?

• To support the `do` notation, we need to create an instance of the `Monad` class.
• ```class Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b```
• How does this relate to the `do` notation?
• `do x <- m; more ; stuff` ⟹ `m >>= (\x -> do more ; stuff)`
• Compare how the enumeration notation relates to the `Enum` class:
• `[x..y]` ⟹ `enumFromTo x y`
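
The translation can be checked on a monad that is already available, such as `Maybe`. In the sketch below, `safeDiv`, `withDo` and `withBind` are names invented for this example; the second function is the first one with the `do` block translated by hand.

```haskell
-- Hypothetical helper: division that fails on a zero divisor.
safeDiv :: Integer -> Integer -> Maybe Integer
safeDiv _ 0 = Nothing
safeDiv a b = Just (a `div` b)

-- Written with do notation.
withDo :: Integer -> Integer -> Integer -> Maybe Integer
withDo a b c = do x <- safeDiv a b
                  y <- safeDiv x c
                  return (x + y)

-- The same computation after translating each "x <- m" into (>>=).
withBind :: Integer -> Integer -> Integer -> Maybe Integer
withBind a b c =
  safeDiv a b >>= (\x ->
    safeDiv x c >>= (\y ->
      return (x + y)))
```

Both `withDo 100 5 2` and `withBind 100 5 2` give `Just 30`, and both give `Nothing` as soon as one of the divisions fails.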
Page 34

• We need to implement
• ```return :: a -> Parser a
(>>=)  :: Parser a -> (a -> Parser b) -> Parser b```
• Defining `return` is simple enough.
• Defining `(>>=)` is a twist on `apply`...
Page 35

```instance Monad Parser where
  --return :: a -> Parser a
  return x = P (\ s -> Just (x,s))

  -- (>>=)  :: Parser a -> (a -> Parser b) -> Parser b
  P p >>= f = P (\ s -> case p s of
                          Nothing    -> Nothing
                          Just (a,r) -> parse (f a) r)
```
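
Since GHC 7.10, `Applicative` is a superclass of `Monad` (and `Functor` of `Applicative`), so a `Monad` instance on its own is rejected by current compilers. The sketch below shows one way to supply the missing instances by reusing `apply` from the earlier slide. The parser `item` is a hypothetical primitive added only for the demonstration, and `newtype` is used where the slides use `data`.

```haskell
newtype Parser a = P (String -> Maybe (a, String))

parse :: Parser a -> String -> Maybe (a, String)
parse (P f) = f

-- apply, as on the earlier slide
apply :: Parser (a -> b) -> Parser a -> Parser b
apply (P pf) (P pa) =
  P (\s -> case pf s of
             Nothing     -> Nothing
             Just (f, r) -> case pa r of
                              Nothing      -> Nothing
                              Just (a, r') -> Just (f a, r'))

instance Functor Parser where
  fmap f p = pure f `apply` p

instance Applicative Parser where
  pure x = P (\s -> Just (x, s))
  (<*>)  = apply

instance Monad Parser where
  -- return defaults to pure
  P p >>= f = P (\s -> case p s of
                         Nothing     -> Nothing
                         Just (a, r) -> parse (f a) r)

-- Hypothetical primitive for the demonstration: accept any one character.
item :: Parser Char
item = P f
  where
    f (c:cs) = Just (c, cs)
    f []     = Nothing

-- pair, written with do notation instead of nested case expressions
pair :: Parser a -> Parser b -> Parser (a, b)
pair pa pb = do a <- pa
                b <- pb
                return (a, b)
```

With these instances, `parse (pair item item) "abc"` gives `Just (('a','b'),"c")`, and `parse (pair item item) "a"` gives `Nothing` because the second parser runs out of input.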
Page 36

`return` and `(>>=)` operators.
• Building things
• `Parser`: `sat`, `char`, `(<|>)`, ...
• `Gen`: `elements`, `oneof`, `frequency`, ...
• `IO`: `putStr`, `getLine`, `readFile`, ...
• Getting things out
• `Parser`: `parse`
• `Gen`: `sample`, `generate`
• You can't get out of the IO monad!
Page 37

# Summary

• We have seen how monadic parser combinators work and how to use them.
• The structure of a parser follows the grammar closely.
• Need to avoid left recursion.
• We saw the `Maybe` type and the `Monad` class.
• The Parsing module: