Page 1

# Parsing

## "(1+2)*3" ⟹

Page 2
##### Parsing
• Remember `showExpr` from last time?
• `showExpr :: Expr -> String`
• What if we need the opposite?
• `readExpr :: String -> Expr`
Page 3

• Haskell can give us functions with the right type automatically, using `deriving (Show, Read)`:
```haskell
data Expr = Num Integer
          | Add Expr Expr
          | Mul Expr Expr
  deriving (Show, Read)

ex1 :: Expr
ex1 = Mul (Add (Num 1) (Num 2)) (Num 3)
```
• `show ex1` ⟹ `"Mul (Add (Num 1) (Num 2)) (Num 3)"`
• `read "Mul (Add (Num 1) (Num 2)) (Num 3)" :: Expr` ⟹ `Mul (Add (Num 1) (Num 2)) (Num 3)`
• But that doesn't give us the syntax we want. That's why we wrote `showExpr`.
Page 4

• Data constructors can be infix operators in Haskell.
```haskell
infixl 6 :+
infixl 7 :*

data Expr = C Integer
          | Expr :+ Expr
          | Expr :* Expr
  deriving (Show, Read)

ex1 :: Expr
ex1 = (C 1 :+ C 2) :* C 3
```
• `read "C 1 :+ C 2 :* C 3" :: Expr` ⟹ `C 1 :+ C 2 :* C 3`
• `read "(C 1 :+ C 2) :* C 3" :: Expr` ⟹ `(C 1 :+ C 2) :* C 3`
• This is much closer, but still not exactly the syntax we want.
Page 5

# Grammars

• A grammar defines a language.
• Formally, a language is a set of strings.
• We usually think of a grammar as defining the syntax of a language, or the "acceptable" input.
• Example: an EBNF grammar for the addition of two numbers:
  • `digit ::= "0".."9".`
  • `number ::= digit {digit}.`
  • `addNumbers ::= number "+" number.`
Page 6

# The use of grammars

• Grammars can be used as documentation of languages.
• Grammars can also be used as specifications of
  • Parsers (`readExpr :: String -> Expr`)
  • Printers (`showExpr :: Expr -> String`)
  • Test data generators (`rExpr :: Gen Expr`)
• If the grammar is written in a suitable form, these types of functions can be derived from the grammar in a fairly direct and systematic way.
• Haskell's `deriving (Read,Show)` is an example of parsers and printers generated automatically from a particular kind of grammar (a `data` declaration).
Page 7

# The purpose of a parser

## A parser usually does two things

• It checks that the input is correct (according to the grammar).
• It converts the input `String` to some other form, to simplify further processing.
• This could be a syntax tree, in which case we can convert back and forth without loss of information:
• `prop_correct_parseX x = parseX (showX x) == x`
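As a sketch of this round-trip property, the block below uses the derived `Show` and `Read` instances of the earlier `Expr` type as the printer and the parser. The property name `prop_roundTrip` and the `examples` list are hypothetical; in the course setting one would use `showExpr`/`readExpr` and let QuickCheck generate the test trees.

```haskell
-- The Expr type from earlier, with Eq added so results can be compared
data Expr = Num Integer
          | Add Expr Expr
          | Mul Expr Expr
  deriving (Show, Read, Eq)

-- Round trip: parsing the printed form gives back the original value.
-- Here `read` plays the role of parseX and `show` the role of showX.
prop_roundTrip :: Expr -> Bool
prop_roundTrip x = read (show x) == x

-- Hypothetical hand-written test values (QuickCheck would generate these)
examples :: [Expr]
examples = [ Num 42
           , Add (Num 1) (Num 2)
           , Mul (Add (Num 1) (Num 2)) (Num 3)
           ]
```

In GHCi, `all prop_roundTrip examples` evaluates to `True`.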
Page 8

# The parsing problem

• While writing functions to show a data structure (`Expr->String`) is fairly straightforward, the opposite (`String->Expr`) can be much harder.
• This type of problem has been studied since the early days of computer science.
• The influential parser generator Yacc appeared in the 1970s.
• It uses the LALR(1) parsing method, which produces efficient parsers for a large class of grammars.
• There are similar parser generators for Haskell, e.g. Happy and BNFC.
Page 9

• We will look at how to write parsers directly in Haskell.
• It's a good example of
• Higher order functions
• Abstract data types
• With a good parsing library, creating a parser directly in Haskell is in some ways easier than using a parser generator.
• Parsing libraries typically create backtracking recursive descent parsers.
Page 10

## There is some similarity between parsing and generating random test data

• QuickCheck provides test data generators of type `Gen a`.
  • Keeps track of a supply of random numbers under the hood.
  • Lets you build functions that return random values by using random numbers from the supply.
• Parsing libraries provide parsers of type `Parser a`.
  • Keeps track of the remaining input under the hood.
  • Lets you build parsing functions that return values by using some of the remaining input.
  • Can fail.
• Both are monads, so we can use the `do` notation in both cases.
Page 11

# A simple Parsing Library in Haskell

```haskell
data Parser a -- abstract type of parsers

-- Parsing a single character
sat  :: (Char->Bool) -> Parser Char
char :: Char -> Parser Char

-- Choice
(<|>) :: Parser a -> Parser a -> Parser a

-- Repetition
zeroOrMore, oneOrMore :: Parser a -> Parser [a]
```
• This is a small but powerful set of parsing combinators from which parsers for arbitrarily complex structures can be built.
• Actually, we need a few more things...
Page 12

# Parsing sequences

• There is no operator in the library to parse one thing followed by another:
```haskell
-- Not included in the library
pair :: Parser a -> Parser b -> Parser (a,b)
```
• But since `Parser` is a monad, we can use the `do` notation instead.
• This is a more convenient and flexible solution.
Page 13

# Running a parser and the Maybe type

• The library also contains the following important function:
```haskell
-- Running a parser
parse :: Parser a -> String -> Maybe (a,String)
```
• It uses the predefined `Maybe` type:
  • `data Maybe a = Nothing | Just a`
  • It is useful for functions like `parse` that need to indicate failure or success.
• The result of `parse p s` is:
  • If the parser fails: `Nothing`.
  • If the parser succeeds: `Just (x,r)`, where `x` is the result and `r` is the remaining, unused input.
Page 14

# Examples

```haskell
parse :: Parser a -> String -> Maybe (a,String)
```
• Assuming we have a parser for numbers
• `number :: Parser Integer`
• here are some examples of how it behaves on different input:
• `parse number "42" == Just (42,"")`
• `parse number "xyz42" == Nothing`
• `parse number "42xyz" == Just (42,"xyz")`
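To make these examples concrete, here is a minimal sketch of a parsing module with a `number` parser. It follows the function-based representation shown later under "Looking inside the Parsing module", but uses a `newtype` with `Functor`, `Applicative` and `Monad` instances so that it compiles with current GHC. Defining `number` via `read` is an assumption, not necessarily how the course library does it.

```haskell
import Data.Char (isDigit)

-- A parser is a function from the input to Nothing (failure)
-- or Just (result, remaining input).
newtype Parser a = P (String -> Maybe (a, String))

parse :: Parser a -> String -> Maybe (a, String)
parse (P f) s = f s

instance Functor Parser where
  fmap f p = p >>= \a -> return (f a)

instance Applicative Parser where
  pure x    = P (\s -> Just (x, s))   -- succeed without consuming input
  pf <*> pa = do f <- pf; a <- pa; return (f a)

instance Monad Parser where
  p >>= f = P (\s -> case parse p s of
                       Nothing     -> Nothing
                       Just (a, r) -> parse (f a) r)

-- Parse one character satisfying a predicate
sat :: (Char -> Bool) -> Parser Char
sat p = P f
  where f (c:s) | p c = Just (c, s)
        f _           = Nothing

-- Choice: try the first parser, fall back to the second on failure
infixl 3 <|>
(<|>) :: Parser a -> Parser a -> Parser a
p <|> q = P (\s -> case parse p s of
                     Nothing -> parse q s
                     r       -> r)

zeroOrMore, oneOrMore :: Parser a -> Parser [a]
zeroOrMore p = oneOrMore p <|> pure []
oneOrMore p  = do x <- p; xs <- zeroOrMore p; return (x:xs)

-- One or more digits, converted to an Integer
number :: Parser Integer
number = read <$> oneOrMore (sat isDigit)
```

Loaded into GHCi, `parse number "42"`, `parse number "xyz42"` and `parse number "42xyz"` return exactly the three results listed above.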
Page 15

# Exercise

• Write a function
• `completeParse :: Parser a -> String -> Maybe a`
• that only succeeds if the parser consumes the entire input string (the remaining input is empty).
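One possible solution, as a sketch. It assumes the function-based representation of `Parser` shown later under "Looking inside the Parsing module"; the `number` parser below is a hypothetical stand-in written directly on strings, only so that the block runs on its own.

```haskell
import Data.Char (isDigit)

-- Parser representation as on the "Looking inside the Parsing module" slide
newtype Parser a = P (String -> Maybe (a, String))

parse :: Parser a -> String -> Maybe (a, String)
parse (P f) s = f s

-- Succeed only if the parser consumed the whole input
completeParse :: Parser a -> String -> Maybe a
completeParse p s = case parse p s of
                      Just (x, "") -> Just x   -- nothing left over
                      _            -> Nothing  -- failed, or input left over

-- Hypothetical stand-in for the library's number parser
number :: Parser Integer
number = P (\s -> case span isDigit s of
                    ("", _) -> Nothing
                    (ds, r) -> Just (read ds, r))
```

`completeParse number "42"` gives `Just 42`, while `completeParse number "42xyz"` gives `Nothing` because `"xyz"` is left over.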
Page 16

# Our first parser (live demo)

• Example: parse two numbers, separated by `+`, and add them.
• `digit ::= "0".."9".`
• `number ::= digit {digit}.`
• `addNumbers ::= number "+" number.`
Page 17

# Our first parser

• Example: parse two numbers, separated by `+`, and add them.
```haskell
import Data.Char (isDigit)

digit :: Parser Char
digit = sat isDigit

number :: Parser Integer
number = do s <- oneOrMore digit
            return (read s)

addNumbers :: Parser Integer
addNumbers = do a <- number
                char '+'
                b <- number
                return (a+b)
```
• `digit ::= "0".."9".`
• `number ::= digit {digit}.`
• `addNumbers ::= number "+" number.`
Page 18

# Writing the same parser directly

• Example: parse two numbers, separated by `+`, and add them.
```haskell
import Data.Char (isDigit)

number :: String -> Maybe (Integer,String)
number s = case span isDigit s of
             ("",_) -> Nothing
             (ds,r) -> Just (read ds,r)

addNumbers :: String -> Maybe (Integer,String)
addNumbers s =
  case number s of
    Nothing -> Nothing
    Just (a,r) ->
      case r of
        '+':r' -> case number r' of
                    Nothing       -> Nothing
                    Just (b,r'')  -> Just (a+b,r'')
        _ -> Nothing
```
Page 19

# A grammar for expressions

```haskell
data Expr = Num Integer
          | Add Expr Expr
          | Mul Expr Expr
```
• expr ::= number | expr "+" expr | expr "*" expr | "(" expr ")"
• But using this grammar as a specification for a parser is problematic:
• It is ambiguous (doesn't specify operator precedences).
• It is left recursive, and recursive descent parsers can't handle left recursion.
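To see why the ambiguity matters, here are the two syntax trees this grammar allows for the input "1+2*3". The evaluator `eval` and the names `parse1`/`parse2` are hypothetical, added only for this illustration.

```haskell
data Expr = Num Integer
          | Add Expr Expr
          | Mul Expr Expr
  deriving Show

-- Hypothetical evaluator, used only to show that the two parses differ
eval :: Expr -> Integer
eval (Num n)   = n
eval (Add a b) = eval a + eval b
eval (Mul a b) = eval a * eval b

-- "1+2*3" read with "*" binding tighter (the usual convention)
parse1 :: Expr
parse1 = Add (Num 1) (Mul (Num 2) (Num 3))

-- "1+2*3" read with "+" binding tighter, equally allowed by the grammar
parse2 :: Expr
parse2 = Mul (Add (Num 1) (Num 2)) (Num 3)
```

`eval parse1` is 7 and `eval parse2` is 9, so the grammar alone does not determine the meaning of the input.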
Page 20

# A new grammar for expressions

• A BNF grammar for expressions
  • `expr ::= term "+" expr | term.`
  • `term ::= factor "*" term | factor.`
  • `factor ::= number | "(" expr ")".`
• This grammar "accepts" the same input as the previous one, and solves the problems.
• It makes it clear that "*" has higher precedence than "+"
• There is no left recursion
• But are the operators left or right associative?
Page 21

# A parser for expressions (version 1)

```haskell
expr, expr', term, term', factor :: Parser Expr
expr  = expr' <|> term
expr' = do t <- term
           char '+'
           e <- expr
           return (Add t e)

term  = term' <|> factor
term' = do f <- factor
           char '*'
           t <- term
           return (Mul f t)

factor = (do n <- number; return (Num n))
     <|> (do char '('
             e <- expr
             char ')'
             return e)
```
Page 22

# Testing the parser

• `parse expr "1+2*3"` ⟹ `Just (Add (Num 1) (Mul (Num 2) (Num 3)),"")`
• It works!
• But the code is rather long and repetitive.
• The parser is 17 lines long, but the grammar is only 3 lines long.
• We can do better...
Page 23

# Problems

• 1. Both alternatives in `expr` start with `term`.
• If the `expr'` alternative fails after parsing a `term`, the second alternative parses the same `term` again. Inefficient!
• Solution: left factorisation.
• 2. The operators have become right associative
• Making them left associative is a simple change in the grammar:
• `expr ::= term "+" expr | term`
• `expr ::= expr "+" term | term`
• But the left associative grammar is also left recursive, which leads to problems in the parser...
Page 24

• An EBNF grammar for expressions
  • `expr ::= term {"+" term}.`
  • `term ::= factor {"*" factor}.`
  • `factor ::= number | "(" expr ")".`
• The grammar is left factored. Good for efficiency.
• An `expr` is one or more `term`s separated by "+".
• We can choose how to convert a `[Expr]` to an `Expr`.
Page 25

# The new parser

```haskell
expr, term :: Parser Expr
expr = do t  <- term
          ts <- zeroOrMore (do char '+'; term)
          return (foldl Add t ts)

term = do f  <- factor
          fs <- zeroOrMore (do char '*'; factor)
          return (foldl Mul f fs)
```
• Using `foldl Add t ts` makes "+" left associative.
```haskell
foldl :: (b -> a -> b) -> b -> [a] -> b
-- foldl is like foldr, but it works from the left...
```
• But there is still a repeating pattern!
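The effect of `foldl` can be seen without running a parser at all. For the input "1+2+3", `expr` collects the first term `Num 1` and the list `[Num 2, Num 3]`; folding from the left nests the tree to the left. Folding from the right (the hypothetical `rightTree`, shown only for contrast) nests it the other way, which is what the right-recursive version 1 parser produced. The names `firstTerm` and `restTerms` are illustrative.

```haskell
data Expr = Num Integer
          | Add Expr Expr
  deriving (Show, Eq)

-- What `expr` collects for the input "1+2+3":
-- the first term, and the terms following each "+"
firstTerm :: Expr
firstTerm = Num 1

restTerms :: [Expr]
restTerms = [Num 2, Num 3]

-- foldl nests to the left: (1+2)+3
leftTree :: Expr
leftTree = foldl Add firstTerm restTerms

-- For contrast, foldr1 over all terms nests to the right: 1+(2+3)
rightTree :: Expr
rightTree = foldr1 Add (firstTerm : restTerms)
```

`leftTree` is `Add (Add (Num 1) (Num 2)) (Num 3)` and `rightTree` is `Add (Num 1) (Add (Num 2) (Num 3))`.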
Page 26

# Two useful parsing combinators

## Factoring out the common pattern

• From the parsing library
```haskell
chain :: Parser item -> Parser sep -> Parser [item]
chain item sep = do i  <- item
                    is <- zeroOrMore (do sep; item)
                    return (i:is)
```
• For our expression parser
```haskell
leftAssoc :: (t->t->t) -> Parser t -> Parser sep -> Parser t
leftAssoc op item sep = do is <- chain item sep
                           return (foldl1 op is)
```
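The difference between left and right associativity only shows up in the result when the operator is not associative. As a sketch, here `leftAssoc` is used with integer subtraction, so "10-2-3" is read as (10-2)-3. The parser core at the top is a minimal, assumed re-implementation of the library so the block runs on its own; `difference` is a hypothetical name.

```haskell
import Data.Char (isDigit)

-- Minimal parser core (an assumption modelled on the slides)
newtype Parser a = P (String -> Maybe (a, String))

parse :: Parser a -> String -> Maybe (a, String)
parse (P f) s = f s

instance Functor Parser where
  fmap f p = p >>= \a -> return (f a)

instance Applicative Parser where
  pure x    = P (\s -> Just (x, s))
  pf <*> pa = do f <- pf; a <- pa; return (f a)

instance Monad Parser where
  p >>= f = P (\s -> case parse p s of
                       Nothing     -> Nothing
                       Just (a, r) -> parse (f a) r)

sat :: (Char -> Bool) -> Parser Char
sat p = P f
  where f (c:s) | p c = Just (c, s)
        f _           = Nothing

char :: Char -> Parser Char
char c = sat (== c)

infixl 3 <|>
(<|>) :: Parser a -> Parser a -> Parser a
p <|> q = P (\s -> case parse p s of
                     Nothing -> parse q s
                     r       -> r)

zeroOrMore, oneOrMore :: Parser a -> Parser [a]
zeroOrMore p = oneOrMore p <|> pure []
oneOrMore p  = do x <- p; xs <- zeroOrMore p; return (x:xs)

number :: Parser Integer
number = read <$> oneOrMore (sat isDigit)

-- The two combinators from the slide
chain :: Parser item -> Parser sep -> Parser [item]
chain item sep = do i  <- item
                    is <- zeroOrMore (do sep; item)
                    return (i:is)

leftAssoc :: (t -> t -> t) -> Parser t -> Parser sep -> Parser t
leftAssoc op item sep = do is <- chain item sep
                           return (foldl1 op is)

-- Subtraction is not associative, so the grouping is visible in the result
difference :: Parser Integer
difference = leftAssoc (-) number (char '-')
```

`parse difference "10-2-3"` gives `Just (5,"")`; folding with `foldr1` instead would have given 10-(2-3) = 11.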
Page 27

# A few more combinators from the library

• Parsing two things and keeping only one of them (*)
```haskell
(<*) :: Parser b -> Parser a -> Parser b
(*>) :: Parser a -> Parser b -> Parser b
```
• Applying a function to the result of a parser (*)
  • `(<$>) :: (a->b) -> Parser a -> Parser b`
• (*) These combinators actually have more general types...
Page 28

# A more elegant expression parser

```haskell
expr, term, factor :: Parser Expr

expr = leftAssoc Add term (char '+')

term = leftAssoc Mul factor (char '*')

factor = Num <$> number
     <|> char '(' *> expr <* char ')'
```
• `expr ::= term {"+" term}.`
• `term ::= factor {"*" factor}.`
• `factor ::= number | "(" expr ")".`
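Putting the pieces together, here is a self-contained sketch of the whole expression parser. The parser core at the top re-implements the library functions from the slides as a `newtype` with `Functor`, `Applicative` and `Monad` instances, so that `<$>`, `*>` and `<*` from the Prelude work; it is an assumption about how such a module could look, not the course's Parsing module. The fixity `infixl 3 <|>` (the same as in `Control.Applicative`) is what lets `factor` be written without parentheses.

```haskell
import Data.Char (isDigit)

-- Minimal parser core (an assumed re-implementation modelled on the slides)
newtype Parser a = P (String -> Maybe (a, String))

parse :: Parser a -> String -> Maybe (a, String)
parse (P f) s = f s

instance Functor Parser where
  fmap f p = p >>= \a -> return (f a)

instance Applicative Parser where
  pure x    = P (\s -> Just (x, s))
  pf <*> pa = do f <- pf; a <- pa; return (f a)

instance Monad Parser where
  p >>= f = P (\s -> case parse p s of
                       Nothing     -> Nothing
                       Just (a, r) -> parse (f a) r)

sat :: (Char -> Bool) -> Parser Char
sat p = P f
  where f (c:s) | p c = Just (c, s)
        f _           = Nothing

char :: Char -> Parser Char
char c = sat (== c)

-- Binds looser than <$>, *> and <* (all infixl 4)
infixl 3 <|>
(<|>) :: Parser a -> Parser a -> Parser a
p <|> q = P (\s -> case parse p s of
                     Nothing -> parse q s
                     r       -> r)

zeroOrMore, oneOrMore :: Parser a -> Parser [a]
zeroOrMore p = oneOrMore p <|> pure []
oneOrMore p  = do x <- p; xs <- zeroOrMore p; return (x:xs)

number :: Parser Integer
number = read <$> oneOrMore (sat isDigit)

chain :: Parser item -> Parser sep -> Parser [item]
chain item sep = do i  <- item
                    is <- zeroOrMore (do sep; item)
                    return (i:is)

leftAssoc :: (t -> t -> t) -> Parser t -> Parser sep -> Parser t
leftAssoc op item sep = do is <- chain item sep
                           return (foldl1 op is)

-- The expression parser from the slide
data Expr = Num Integer
          | Add Expr Expr
          | Mul Expr Expr
  deriving (Show, Eq)

expr, term, factor :: Parser Expr
expr   = leftAssoc Add term (char '+')
term   = leftAssoc Mul factor (char '*')
factor = Num <$> number
     <|> char '(' *> expr <* char ')'
```

For example, `parse expr "(1+2)*3"` gives `Just (Mul (Add (Num 1) (Num 2)) (Num 3),"")`, and `parse expr "1+2+3"` gives `Just (Add (Add (Num 1) (Num 2)) (Num 3),"")`, so "+" is left associative.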
Page 29

# Looking inside the Parsing module

• The `Parser a` type:
```haskell
data Parser a = P (String -> Maybe (a,String))

parse :: Parser a -> String -> Maybe (a,String)
parse (P f) s = f s
```
• Parsers are represented as functions!
• To apply a parser to an input string, we just extract the function and apply it!
Page 30
• Parsing a single character
```haskell
sat :: (Char->Bool) -> Parser Char
sat p = P sat_p
  where
    sat_p (c:s) | p c = Just (c,s)
    sat_p _           = Nothing

char :: Char -> Parser Char
char c = sat (==c)
```
Page 31
• Choice
```haskell
(<|>) :: Parser a -> Parser a -> Parser a
P pf1 <|> P pf2 = P pf
  where
    pf s = case pf1 s of
             Nothing -> pf2 s
             r       -> r
```
• Try the first parser; if it fails, try the second parser. If the first parser succeeds, use its result.
Page 32
• Parsing a pair
```haskell
pair :: Parser a -> Parser b -> Parser (a,b)
pair (P pa) (P pb) =
  P (\ s ->
       case pa s of
         Nothing    -> Nothing
         Just (a,r) ->
           case pb r of
             Nothing     -> Nothing
             Just (b,r') -> Just ((a,b),r'))
```
• This is very similar to the `addNumbers` parser we saw earlier.
Page 33
• A simple variation on parsing a pair
```haskell
apply :: Parser (a->b) -> Parser a -> Parser b
apply (P pf) (P pa) =
  P (\ s ->
       case pf s of
         Nothing    -> Nothing
         Just (f,r) ->
           case pa r of
             Nothing     -> Nothing
             Just (a,r') -> Just (f a,r'))
```
• More useful than always returning a pair!
• ``pair pa pb = return (,) `apply` pa `apply` pb``
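A runnable sketch of `apply`, and of `pair` defined through it. Since no `Monad` instance has been given at this point, the hypothetical helper `success` stands in for `return`: it always succeeds without consuming input. `triple` is also hypothetical and shows that any function, not just `(,)`, can be applied.

```haskell
newtype Parser a = P (String -> Maybe (a, String))

parse :: Parser a -> String -> Maybe (a, String)
parse (P f) s = f s

sat :: (Char -> Bool) -> Parser Char
sat p = P sat_p
  where sat_p (c:s) | p c = Just (c, s)
        sat_p _           = Nothing

char :: Char -> Parser Char
char c = sat (== c)

-- Hypothetical stand-in for `return`: succeed without consuming input
success :: a -> Parser a
success x = P (\s -> Just (x, s))

-- Run the function parser, then the argument parser on the rest of the input
apply :: Parser (a -> b) -> Parser a -> Parser b
apply (P pf) (P pa) =
  P (\s -> case pf s of
             Nothing     -> Nothing
             Just (f, r) -> case pa r of
                              Nothing      -> Nothing
                              Just (a, r') -> Just (f a, r'))

pair :: Parser a -> Parser b -> Parser (a, b)
pair pa pb = success (,) `apply` pa `apply` pb

-- Any function can be applied, not just (,)
triple :: Parser a -> Parser b -> Parser c -> Parser (a, b, c)
triple pa pb pc = success (,,) `apply` pa `apply` pb `apply` pc
```

`parse (pair (char 'a') (char 'b')) "abc"` gives `Just (('a','b'),"c")`.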
Page 34

# What about supporting the do notation?

• To support the `do` notation, we need to create an instance in the `Monad` class.
```haskell
-- Simplified: in current GHC, Applicative is a superclass of Monad
class Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b
```
• How does this relate to the `do` notation?
  • `do x <- m; more ; stuff` ⟹ `m >>= (\x -> do more ; stuff)`
• Compare how the enumeration notation relates to the `Enum` class:
  • `[x..y]` ⟹ `enumFromTo x y`
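The translation can be tried out with any monad, not only `Parser`. As an illustration (not from the slides), here the direct-style `addNumbers` from earlier is rewritten in the `Maybe` monad, once with `do` notation and once with the `do` removed by hand using the rule above. The helper `char` here is hypothetical and works on plain strings.

```haskell
import Data.Char (isDigit)

-- One or more digits at the start of the input (as on the earlier slide)
number :: String -> Maybe (Integer, String)
number s = case span isDigit s of
             ("", _) -> Nothing
             (ds, r) -> Just (read ds, r)

-- Hypothetical helper: consume one expected character
char :: Char -> String -> Maybe ((), String)
char c (x:r) | x == c = Just ((), r)
char _ _              = Nothing

-- With do notation in the Maybe monad: any Nothing aborts the whole computation
addNumbers :: String -> Maybe (Integer, String)
addNumbers s = do (a, r1) <- number s
                  (_, r2) <- char '+' r1
                  (b, r3) <- number r2
                  return (a + b, r3)

-- The same function with the do notation translated away by hand,
-- using  do x <- m; rest  ==>  m >>= (\x -> do rest)
addNumbers' :: String -> Maybe (Integer, String)
addNumbers' s =
  number s    >>= \(a, r1) ->
  char '+' r1 >>= \(_, r2) ->
  number r2   >>= \(b, r3) ->
  return (a + b, r3)
```

Both `addNumbers "12+30rest"` and `addNumbers' "12+30rest"` give `Just (42,"rest")`, and both give `Nothing` for `"12-30"`.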
Page 35

• We need to implement
```haskell
return :: a -> Parser a
(>>=)  :: Parser a -> (a -> Parser b) -> Parser b
```
• Defining `return` is simple enough.
• Defining `(>>=)` is a twist on `apply`...
Page 36

```haskell
instance Monad Parser where
  return x = P (\ s -> Just (x,s))

  P p >>= f = P (\ s -> case p s of
                          Nothing    -> Nothing
                          Just (a,r) -> parse (f a) r)
```
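The instance above is written in the style of older Haskell. Since GHC 7.10 (the Applicative-Monad proposal), `Applicative`, and therefore `Functor`, is a superclass of `Monad`, so the instance only compiles together with `Functor` and `Applicative` instances, and `return` defaults to `pure`. A minimal sketch of the missing pieces, keeping the `data Parser a = P (...)` representation from the slides; `item` and `twoItems` are hypothetical example parsers.

```haskell
data Parser a = P (String -> Maybe (a, String))

parse :: Parser a -> String -> Maybe (a, String)
parse (P f) s = f s

-- fmap: run the parser, then transform its result
instance Functor Parser where
  fmap f p = p >>= \a -> return (f a)

-- pure: succeed without consuming input; <*>: run pf, then pa, and apply
instance Applicative Parser where
  pure x    = P (\s -> Just (x, s))
  pf <*> pa = do f <- pf
                 a <- pa
                 return (f a)

-- (>>=) as on the slide; return is now inherited from pure
instance Monad Parser where
  P p >>= f = P (\s -> case p s of
                         Nothing     -> Nothing
                         Just (a, r) -> parse (f a) r)

-- Hypothetical example: consume any single character
item :: Parser Char
item = P (\s -> case s of
                  (c:r) -> Just (c, r)
                  []    -> Nothing)

twoItems :: Parser (Char, Char)
twoItems = do a <- item
              b <- item
              return (a, b)
```

`parse twoItems "abc"` gives `Just (('a','b'),"c")`, and the `Applicative` style `parse ((,) <$> item <*> item) "xy"` gives `Just (('x','y'),"")`.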
Page 37

• Monads are types with `return` and `(>>=)` operators, for example:

| | `Parser` | `Gen` | `IO` |
|---|---|---|---|
| Building things | `sat`, `char`, `(<\|>)`, ... | `elements`, `oneof`, `frequency`, ... | `putStr`, `getLine`, `readFile`, ... |
| Getting things out | `parse` | `sample`, `generate` | (none) |

• You can't get out of the IO monad!
Page 38

# Summary

• We have seen how monadic combinator parsers work and how to use them.
• The structure of a parser follows the grammar closely.
• Need to avoid left recursion.
• We saw the `Maybe` type and the `Monad` class.
• The Parsing module: