"(1+2)*3" ⟹

- Remember `showExpr` from last time?
  `showExpr :: Expr -> String`
- What if we need the opposite?
  `readExpr :: String -> Expr`

- A session with the calculator:
  `Welcome to the simple calculator!`
  `Expression? 1+2*3`
  `Value: 7`
  `Expression? (1+2)*3`
  `Value: 9`
- We can reuse `eval :: Expr -> Integer` from last time.
- We also need `readExpr :: String -> Expr`.
- This is a parsing problem.

- Haskell can give us functions with the right type automatically:
  `data Expr = Num Integer | Add Expr Expr | Mul Expr Expr deriving (Show, Read)`
  `ex1 = Mul (Add (Num 1) (Num 2)) (Num 3)`
- `show ex1` gives `"Mul (Add (Num 1) (Num 2)) (Num 3)"`
- `read "Mul (Add (Num 1) (Num 2)) (Num 3)" :: Expr` gives `Mul (Add (Num 1) (Num 2)) (Num 3)`
- But that doesn't give us the syntax we want. That's why we wrote `showExpr`.

- Data constructors can be infix operators in Haskell.
  `infixl 6 :+`
  `infixl 7 :*`
  `data Expr = C Integer | Expr :+ Expr | Expr :* Expr deriving (Show, Read)`
  `ex1 = (C 1 :+ C 2) :* C 3`
- `read "C 1 :+ C 2 :* C 3" :: Expr` gives `C 1 :+ C 2 :* C 3`
- `read "(C 1 :+ C 2) :* C 3" :: Expr` gives `(C 1 :+ C 2) :* C 3`
- This is much closer, but still not exactly the syntax we want.

- A *grammar* defines a *language*.
  - Formally, a language is a set of strings.
  - We usually think of a grammar as defining the *syntax* of a language, or the "acceptable" input.
- Example: an EBNF grammar for the addition of two numbers:
  `digit ::= "0".."9".`
  `number ::= digit {digit}.`
  `addition ::= number "+" number.`

- Grammars can be used as *documentation* of languages.
- Grammars can also be used as *specifications* of
  - Parsers (`readExpr :: String -> Expr`)
  - Printers (`showExpr :: Expr -> String`)
  - Test data generators (`rExpr :: Gen Expr`)
- If the grammar is written in a suitable form, these types of functions can be derived from the grammar in a fairly direct and systematic way.
  - Haskell's `deriving (Read, Show)` is an example of parsers and printers generated automatically from a particular kind of grammar (a `data` declaration).

- A parser checks that the input (a `String`) is correct according to the grammar.
- It converts the input to some other form, to simplify further processing.
  - This could be a *syntax tree*, in which case we could convert back and forth without loss of information:
    `prop_correct_parseX x = parseX (showX x) == x`

- While writing functions to show a data structure (`Expr -> String`) is fairly straightforward, the opposite (`String -> Expr`) can be much harder.
- This type of problem has been studied since the early days of computer science.
- The influential *parser generator* Yacc appeared in the 1970s.
  - It uses the LALR(1) parsing method, which produces efficient parsers for a large class of grammars.
- There are similar parser generators for Haskell, e.g. Happy and BNFC.

- We will look at how to write parsers directly in Haskell.
  - It's a good example of
    - Higher-order functions
    - Abstract data types
    - Monads
- With a good *parsing library*, creating a parser directly in Haskell is in some ways easier than using a parser generator.
- Parsing libraries typically create backtracking recursive descent parsers.

- Before we look at the parsing library, let's write a parser from scratch.
- We will use the following type for parsing functions that return a value of type `a`:
  `String -> Maybe (a, String)`
- We use the predefined `Maybe` type:
  `data Maybe a = Nothing | Just a`
  - If the parser fails, we return `Nothing`.
  - If the parser succeeds, we return `Just (x, r)`, where `x` is the result and `r` is the remaining, unused input.

- Live demo: ParsingExamples.hs
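
As a rough illustration of what a parser "from scratch" with this type might look like, here is a minimal sketch. The names `ParserFun`, `digitP`, `numberP` and `additionP` are made up for this example and are not taken from ParsingExamples.hs.

```haskell
import Data.Char (isDigit)

-- Hypothetical alias for the parsing-function type described above.
type ParserFun a = String -> Maybe (a, String)

-- Parse a single digit character.
digitP :: ParserFun Char
digitP (c:r) | isDigit c = Just (c, r)
digitP _                 = Nothing

-- number ::= digit {digit}
numberP :: ParserFun Integer
numberP s = case span isDigit s of
  ([], _) -> Nothing            -- no leading digit: the parser fails
  (ds, r) -> Just (read ds, r)  -- return the number and the unused input

-- addition ::= number "+" number
additionP :: ParserFun Integer
additionP s = case numberP s of
  Just (a, '+':r) -> case numberP r of
    Just (b, r') -> Just (a + b, r')
    Nothing      -> Nothing
  _ -> Nothing
```

Loaded into GHCi, `numberP "42xyz"` should give `Just (42,"xyz")` and `additionP "1+2"` should give `Just (3,"")`. The nested `case` expressions get unwieldy quickly, which is one reason to hide the plumbing behind a library.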

- QuickCheck provides a type for test data generators: `Gen a`.
  - Keeps track of a supply of random numbers under the hood.
  - Lets you build functions that return random values by using random numbers from the supply.
- A *parsing library* provides a type for parsers: `Parser a`.
  - Keeps track of the remaining input under the hood.
  - Lets you build parsing functions that return values by using some of the remaining input.
  - *Can fail* (if the input is not correct according to the grammar).
- Both are monads, so we can use the `do` notation in both cases.

- The library contains the following function for running parsers:
  `-- Running a parser`
  `parse :: Parser a -> String -> Maybe (a, String)`
- Assuming we have a parser for numbers (`number ::= digit {digit}`),
  `number :: Parser Integer`
- here are some examples of how it behaves on different input:
  `parse number "42" == Just (42, "")`
  `parse number "xyz42" == Nothing`
  `parse number "42xyz" == Just (42, "xyz")`

- Given
  `parse :: Parser a -> String -> Maybe (a, String)`
- write a function
  `completeParse :: Parser a -> String -> Maybe a`
  that only succeeds if the parser consumes the entire input string (the remaining input is empty).
- This is typically what you want when you use a parser in an application.
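
One possible solution sketch for this exercise follows. Since the library's `Parser` type is abstract, the `Parser`, `parse` and `number` definitions here are stand-ins assumed for illustration only; just `completeParse` is the point.

```haskell
import Data.Char (isDigit)

-- Stand-in for the library's abstract Parser type (an assumption).
newtype Parser a = P (String -> Maybe (a, String))

parse :: Parser a -> String -> Maybe (a, String)
parse (P f) = f

-- Succeed only when the parser leaves no unused input.
completeParse :: Parser a -> String -> Maybe a
completeParse p s = case parse p s of
  Just (x, "") -> Just x
  _            -> Nothing

-- A toy number parser, present only so completeParse can be tried out.
number :: Parser Integer
number = P $ \s -> case span isDigit s of
  ([], _) -> Nothing
  (ds, r) -> Just (read ds, r)
```

With these definitions, `completeParse number "42"` should give `Just 42`, while `completeParse number "42xyz"` should give `Nothing` because `"xyz"` is left over.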

- Our parsing library provides functions for building parsers:
  `data Parser a -- abstract type of parsers`
  `-- Parsing a single character`
  `char :: Char -> Parser Char`
  `sat :: (Char -> Bool) -> Parser Char`
  `-- Choice`
  `(<|>) :: Parser a -> Parser a -> Parser a`
  `-- Repetition`
  `zeroOrMore, oneOrMore :: Parser a -> Parser [a]`
- This is a small but powerful set of *parsing combinators* from which parsers for arbitrarily complex structures can be built.
- Actually, we need one more thing...

- There is no operator in the library to parse one thing followed by another:
  `-- Not included in the library`
  `pair :: Parser a -> Parser b -> Parser (a, b)`
- But since `Parser` is a monad, we can use the `do` notation instead.
  - This is a more convenient and flexible solution.
- Example: parse two numbers, separated by `+`, and add them.
  `digit ::= "0".."9".`
  `number ::= digit {digit}.`
  `addition ::= number "+" number.`
- Live demo: ParsingExamples.hs
  `digit :: Parser Char`
  `digit = sat isDigit`
  `number :: Parser Integer`
  `number = do s <- oneOrMore digit; return (read s)`
  `addition :: Parser Integer`
  `addition = do a <- number; char '+'; b <- number; return (a + b)`

- `data Expr = Num Integer | Add Expr Expr | Mul Expr Expr`
- `expr ::= number | expr "+" expr | expr "*" expr | "(" expr ")"`
- Using this grammar as a specification for a parser is problematic:
  - It is ambiguous (it doesn't specify operator precedence or associativity).
  - It is *left recursive*; recursive descent parsers can't handle this.
  - ...

- A BNF grammar for expressions:
  `expr ::= term "+" expr | term.`
  `term ::= factor "*" term | factor.`
  `factor ::= number | "(" expr ")".`
- This grammar "accepts" the same input as the previous one, and solves some problems.
  - It makes it clear that "*" has higher precedence than "+".
  - There is no left recursion.
  - But are the operators left or right associative?

- 1. Both alternatives in `expr` start with `term`.
  - If the first alternative fails after it has recognized a term, the second alternative will parse the same string again. Inefficient!
  - Solution: *left factorisation*.
- 2. The operators have become right associative.
  - The choice between left and right associativity is a small change in the grammar:
    `expr ::= term "+" expr | term` (right associative)
    `expr ::= expr "+" term | term` (left associative)
  - But the left associative grammar is also left recursive, which leads to problems in the parser...

- An EBNF grammar for expressions:
  `expr ::= term {"+" term}.`
  `term ::= factor {"*" factor}.`
  `factor ::= number | "(" expr ")".`
- The grammar is left factored. Good for efficiency.
- The grammar is easy to read, but less similar to the `Expr` type.
  - An `expr` is one or more `term`s separated by "+".
  - We can choose how to convert a `[Expr]` to an `Expr`.
- Live demo: ParsingExamples.hs

  `expr = do t <- term; ts <- zeroOrMore (do char '+'; term); return (foldl Add t ts)`
  `term = do f <- factor; fs <- zeroOrMore (do char '*'; factor); return (foldl Mul f fs)`
  `factor = (do n <- number; return (Num n)) <|> (do char '('; e <- expr; char ')'; return e)`
- Using `foldl Add t ts` makes "+" left associative.
  `foldl :: (b -> a -> b) -> b -> [a] -> b`
  `-- foldl is like foldr, but it works from the left...`

- But the code is a bit long and repetitive. Let's improve it!
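
To make the associativity point concrete, the following small sketch compares the tree built by `foldl Add` with a right-nested one. The names `leftSum` and `rightSum` are invented for this illustration, and `foldr1` appears only for comparison; the parser above does not use it.

```haskell
data Expr = Num Integer | Add Expr Expr | Mul Expr Expr
  deriving (Eq, Show)

-- What the parser builds for "1+2+3": t = Num 1, ts = [Num 2, Num 3].
-- foldl nests to the left, i.e. (1+2)+3.
leftSum :: Expr
leftSum = foldl Add (Num 1) [Num 2, Num 3]

-- For comparison only: nesting to the right, i.e. 1+(2+3).
rightSum :: Expr
rightSum = foldr1 Add [Num 1, Num 2, Num 3]
```

Here `leftSum` is `Add (Add (Num 1) (Num 2)) (Num 3)` and `rightSum` is `Add (Num 1) (Add (Num 2) (Num 3))`. For addition both evaluate to the same number, but for an operator like subtraction the difference would matter.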

- From the parsing library:
  `chain :: Parser item -> Parser sep -> Parser [item]`
  `chain item sep = do i <- item; is <- zeroOrMore (do sep; item); return (i : is)`
- For our expression parser:
  `leftAssoc :: (t -> t -> t) -> Parser t -> Parser sep -> Parser t`
  `leftAssoc op item sep = do is <- chain item sep; return (foldl1 op is)`

- Parsing two things and keeping only one of them:
  `(<*) :: Parser b -> Parser a -> Parser b`
  `(*>) :: Parser a -> Parser b -> Parser b`
- Applying a function to the result of a parser:
  `(<$>) :: (a -> b) -> Parser a -> Parser b`
- Note: these combinators actually have more general types...

`expr, term, factor :: Parser Expr`
`expr = leftAssoc Add term (char '+')`
`term = leftAssoc Mul factor (char '*')`
`factor = Num <$> number <|> char '(' *> expr <* char ')'`
- Compare with the grammar:
  `expr ::= term {"+" term}.`
  `term ::= factor {"*" factor}.`
  `factor ::= number | "(" expr ")".`
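
The course's `Parsing` module is not reproduced here, so as a way to try the same grammar in a standalone file, the sketch below swaps in `Text.ParserCombinators.ReadP` from `base`. This is a different library with a different (nondeterministic) parsing strategy; its `chainl1` plays roughly the role of `leftAssoc`, and `readExpr` is written here as an assumption about how the pieces might fit together.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP
  (ReadP, chainl1, char, eof, munch1, readP_to_S, (<++))

data Expr = Num Integer | Add Expr Expr | Mul Expr Expr
  deriving (Eq, Show)

-- expr ::= term {"+" term}.
expr :: ReadP Expr
expr = term `chainl1` (Add <$ char '+')

-- term ::= factor {"*" factor}.
term :: ReadP Expr
term = factor `chainl1` (Mul <$ char '*')

-- factor ::= number | "(" expr ")".
factor :: ReadP Expr
factor = (Num . read <$> munch1 isDigit)
     <++ (char '(' *> expr <* char ')')

-- Accept only parses that consume the whole input.
readExpr :: String -> Maybe Expr
readExpr s = case readP_to_S (expr <* eof) s of
  [(e, "")] -> Just e
  _         -> Nothing
```

In GHCi, `readExpr "(1+2)*3"` should give `Just (Mul (Add (Num 1) (Num 2)) (Num 3))`, and `readExpr "1+"` should give `Nothing`.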

- The type `Parser a`:
  `data Parser a = P (String -> Maybe (a, String))`
  `parse :: Parser a -> String -> Maybe (a, String)`
  `parse (P f) s = f s`
- Parsers are represented as functions! (*Functions as data*)
- To apply a parser to an input string, we just extract the function and apply it!

- Parsing a single character:
  `sat :: (Char -> Bool) -> Parser Char`
  `sat p = P sat_p`
  `  where sat_p (c:s) | p c = Just (c, s)`
  `        sat_p _           = Nothing`
  `char :: Char -> Parser Char`
  `char c = sat (== c)`

- Choice:
  `(<|>) :: Parser a -> Parser a -> Parser a`
  `P pf1 <|> P pf2 = P pf`
  `  where pf s = case pf1 s of`
  `                 Nothing -> pf2 s`
  `                 r       -> r`
- Try the first parser; if it fails, try the second parser. If the first parser succeeds, use that result.

- Parsing a pair:
  `pair :: Parser a -> Parser b -> Parser (a, b)`
  `pair (P pa) (P pb) = P (\s ->`
  `  case pa s of`
  `    Nothing     -> Nothing`
  `    Just (a, r) -> case pb r of`
  `      Nothing      -> Nothing`
  `      Just (b, r') -> Just ((a, b), r'))`
- This is very similar to the `addition` parser we saw earlier.

- A simple variation on parsing a pair:
  `apply :: Parser (a -> b) -> Parser a -> Parser b`
  `apply (P pf) (P pa) = P (\s ->`
  `  case pf s of`
  `    Nothing     -> Nothing`
  `    Just (f, r) -> case pa r of`
  `      Nothing      -> Nothing`
  `      Just (a, r') -> Just (f a, r'))`
- More useful than always returning a pair!
  ``pair pa pb = return (,) `apply` pa `apply` pb``

- To support the `do` notation, we need to create an instance in the `Monad` class.
  `class Monad m where`
  `  return :: a -> m a`
  `  (>>=) :: m a -> (a -> m b) -> m b`
- How does this relate to the `do` notation?
  - `do x <- m; more; stuff` ⟹ `m >>= (\x -> do more; stuff)`
- Compare how the enumeration notation relates to the `Enum` class:
  - `[x .. y]` ⟹ `enumFromTo x y`
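
The translation can be tried out with any monad. The sketch below uses `Maybe` because it needs no extra definitions; `safeDiv`, `calcDo` and `calcBind` are names invented for this illustration.

```haskell
-- A small Maybe-returning operation: division that fails on zero.
safeDiv :: Integer -> Integer -> Maybe Integer
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Written with do notation.
calcDo :: Integer -> Integer -> Integer -> Maybe Integer
calcDo a b c = do
  x <- safeDiv a b
  y <- safeDiv x c
  return (x + y)

-- The same computation with each "x <- m" step rewritten as m >>= (\x -> ...).
calcBind :: Integer -> Integer -> Integer -> Maybe Integer
calcBind a b c =
  safeDiv a b >>= (\x ->
  safeDiv x c >>= (\y ->
  return (x + y)))
```

Both versions should agree on every input, for example giving `Just 30` for the arguments `100 5 2` and `Nothing` when a divisor is zero.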

- We need to implement
  `return :: a -> Parser a`
  `(>>=) :: Parser a -> (a -> Parser b) -> Parser b`
  - Defining `return` is simple enough.
  - Defining `(>>=)` is a twist on `apply`...
- `instance Monad Parser where`
  `  return x = P (\s -> Just (x, s))`
  `  P p >>= f = P (\s -> case p s of`
  `                         Nothing     -> Nothing`
  `                         Just (a, r) -> parse (f a) r)`
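
Since GHC 7.10, `Applicative` (and with it `Functor`) is a superclass of `Monad`, so the instance needs two companion instances before it compiles. The sketch below shows one common way to supply them using `liftM` and `ap` from `Control.Monad`. It uses `newtype` where the slides use `data`, and it is an assumption about how the library could be completed, not a copy of Parsing.hs.

```haskell
import Control.Monad (ap, liftM)
import Data.Char (isDigit)

newtype Parser a = P (String -> Maybe (a, String))

parse :: Parser a -> String -> Maybe (a, String)
parse (P f) = f

-- Functor and Applicative are derived from the Monad operations.
instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure x = P (\s -> Just (x, s))   -- succeed without consuming input
  (<*>)  = ap

instance Monad Parser where
  P p >>= f = P (\s -> case p s of
                         Nothing     -> Nothing
                         Just (a, r) -> parse (f a) r)

sat :: (Char -> Bool) -> Parser Char
sat p = P sat_p
  where sat_p (c:s) | p c = Just (c, s)
        sat_p _           = Nothing

digit :: Parser Char
digit = sat isDigit

-- With the Monad instance in place, pair can be written with do notation.
pair :: Parser a -> Parser b -> Parser (a, b)
pair pa pb = do
  a <- pa
  b <- pb
  return (a, b)
```

For example, `parse (pair digit (sat (== '+'))) "1+2"` should give `Just (('1','+'),"2")`. Here `return` is left at its default definition, which is `pure`.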

- Every monad supports the `return` and `(>>=)` operators.
- Different monads support different additional operations:
  - Building things
    - `Parser`: `sat`, `char`, `(<|>)`, ...
    - `Gen`: `elements`, `oneof`, `frequency`, ...
    - `IO`: `putStr`, `getLine`, `readFile`, ...
  - Getting things out
    - `Parser`: `parse`
    - `Gen`: `sample`, `generate`
    - `IO`: You can't get out of the IO monad!

- We have seen how monadic combinator parsers work and how to use them.
  - The structure of a parser follows the grammar closely.
  - Need to avoid left recursion.
- We saw the `Maybe` type and the `Monad` class.
- The `Parsing` module:
  - Source code: Parsing.hs.
  - Documentation: Parsing.

**Next week**: symbolic expressions, more about monads!