"(1+2)*3" ⟹

- Remember `showExpr` from last time?

      showExpr :: Expr -> String

- What if we need the opposite?

      readExpr :: String -> Expr

- Haskell can give us functions with the right type automatically

      data Expr = Num Integer | Add Expr Expr | Mul Expr Expr
                  deriving (Show, Read)

      ex1 = Mul (Add (Num 1) (Num 2)) (Num 3)

  - `show ex1` gives `"Mul (Add (Num 1) (Num 2)) (Num 3)"`
  - `read "Mul (Add (Num 1) (Num 2)) (Num 3)" :: Expr` gives `Mul (Add (Num 1) (Num 2)) (Num 3)`
- But that doesn't give us the syntax we want. That's why we wrote `showExpr`.
- Data constructors can be infix operators in Haskell.

      infixl 6 :+
      infixl 7 :*

      data Expr = C Integer | Expr :+ Expr | Expr :* Expr
                  deriving (Show, Read)

      ex1 = (C 1 :+ C 2) :* C 3

  - `read "C 1 :+ C 2 :* C 3" :: Expr` gives `C 1 :+ C 2 :* C 3`
  - `read "(C 1 :+ C 2) :* C 3" :: Expr` gives `(C 1 :+ C 2) :* C 3`
- This is much closer, but still not exactly the syntax we want.

- A *grammar* defines a *language*.
  - Formally, a language is a set of strings.
  - We usually think of a grammar as defining the *syntax* of a language, or the "acceptable" input.
- Example: an EBNF grammar for the addition of two numbers:

      digit ::= "0".."9".
      number ::= digit {digit}.
      addNumbers ::= number "+" number.

- Grammars can be used as *documentation* of languages.
- Grammars can also be used as *specifications* of
  - Parsers (`readExpr :: String -> Expr`)
  - Printers (`showExpr :: Expr -> String`)
  - Test data generators (`rExpr :: Gen Expr`)
- If the grammar is written in a suitable form, these types of functions can be derived from the grammar in a fairly direct and systematic way.
  - Haskell's `deriving (Read, Show)` is an example of parsers and printers generated automatically from a particular kind of grammar (a `data` declaration).

- A parser checks that the input is correct (according to the grammar).
- It converts the input `String` to some other form, to simplify further processing.
  - This could be a *syntax tree*, in which case we could convert back and forth without loss of information.

        prop_correct_parseX x = parseX (showX x) == x
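The round-trip property can be tried out before any parser has been written by hand, by letting the derived `Show` and `Read` instances play the roles of `showX` and `parseX`. This is only a sketch: the names `prop_roundTrip`, `samples` and `allRoundTrip` are made up for illustration, and a real test would let QuickCheck generate the expressions instead of listing them.

```haskell
-- The expression type, with a derived printer (show) and parser (read).
data Expr = Num Integer | Add Expr Expr | Mul Expr Expr
  deriving (Eq, Show, Read)

-- Round-trip property: parsing what we printed gives back the original.
-- Here read and show stand in for parseX and showX.
prop_roundTrip :: Expr -> Bool
prop_roundTrip x = read (show x) == x

-- A few hand-picked test cases (hypothetical; QuickCheck would generate these).
samples :: [Expr]
samples =
  [ Num 0
  , Add (Num 1) (Num 2)
  , Mul (Add (Num 1) (Num 2)) (Num 3)
  ]

-- True if the property holds for every sample.
allRoundTrip :: Bool
allRoundTrip = all prop_roundTrip samples
```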

- While writing functions to show a data structure (`Expr -> String`) is fairly straightforward, the opposite (`String -> Expr`) can be much harder.
- This type of problem has been studied since the early days of computer science.
- The influential *parser generator* Yacc appeared in the 1970s.
  - It uses the LALR(1) parsing method, which produces efficient parsers for a large class of grammars.
- There are similar parser generators for Haskell, e.g. Happy and BNFC.

- We will look at how to write parsers directly in Haskell.
  - It's a good example of
    - Higher-order functions
    - Abstract data types
    - Monads
- With a good parsing library, creating a parser directly in Haskell is in some ways easier than using a parser generator.
  - Parsing libraries typically create backtracking recursive descent parsers.

- QuickCheck provides test data generators of type `Gen a`.
  - Keeps track of a supply of random numbers under the hood.
  - Lets you build functions that return random values by using random numbers from the supply.
- Parsing libraries provide parsers of type `Parser a`.
  - Keeps track of the remaining input under the hood.
  - Lets you build parsing functions that return values by using some of the remaining input.
  - Can fail.
- Both are monads, so we can use the `do`-notation in both cases.

      data Parser a -- abstract type of parsers

      -- Parsing a single character
      sat  :: (Char -> Bool) -> Parser Char
      char :: Char -> Parser Char

      -- Choice
      (<|>) :: Parser a -> Parser a -> Parser a

      -- Repetition
      zeroOrMore, oneOrMore :: Parser a -> Parser [a]

- This is a small but powerful set of *parsing combinators* from which parsers for arbitrarily complex structures can be built.
- Actually, we need a few more things...

- There is no operator in the library to parse one thing followed by another

      -- Not included in the library
      pair :: Parser a -> Parser b -> Parser (a, b)

- But since `Parser` is a monad, we can use the `do` notation instead.
  - This is a more convenient and flexible solution.
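To make the point about `do` notation concrete, here is a minimal sketch of `pair` written with `do`, plus a second parser that `pair` alone could not express as neatly. The library's `Parser` type is abstract, so the `Parser` definition and its instances below are an assumed stand-in that only serves to make the example self-contained; `parenLetter` is a made-up name.

```haskell
import Control.Monad (ap, liftM)

-- Assumed stand-in for the library's abstract Parser type.
data Parser a = P (String -> Maybe (a, String))

parse :: Parser a -> String -> Maybe (a, String)
parse (P f) s = f s

-- Boilerplate instances so that do notation is available.
instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure x = P (\s -> Just (x, s))
  (<*>)  = ap

instance Monad Parser where
  P p >>= f = P (\s -> case p s of
                         Nothing     -> Nothing
                         Just (a, r) -> parse (f a) r)

sat :: (Char -> Bool) -> Parser Char
sat p = P sat_p
  where sat_p (c:s) | p c = Just (c, s)
        sat_p _           = Nothing

char :: Char -> Parser Char
char c = sat (== c)

-- One thing followed by another, written with do notation.
pair :: Parser a -> Parser b -> Parser (a, b)
pair pa pb = do
  a <- pa
  b <- pb
  return (a, b)

-- do notation is more flexible than pair: results can be ignored or combined.
-- A lowercase letter in parentheses; the parentheses are parsed but discarded.
parenLetter :: Parser Char
parenLetter = do
  _ <- char '('
  c <- sat (`elem` ['a' .. 'z'])
  _ <- char ')'
  return c
```

With these definitions, `parse (pair (char 'a') (char 'b')) "abc"` yields the pair of characters together with the remaining input `"c"`, while `parenLetter` returns only the letter.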

- The library also contains the following important function:

      -- Running a parser
      parse :: Parser a -> String -> Maybe (a, String)

- It uses the predefined `Maybe` type:

      data Maybe a = Nothing | Just a

  - It is useful for functions like `parse` that need to indicate failure or success.
- The result of `parse p s` is:
  - If the parser fails: `Nothing`,
  - If the parser succeeds: `Just (x, r)`
    - where `x` is the result and `r` is the remaining, unused input.

- Assuming we have a parser for numbers

      number :: Parser Integer

- here are some examples of how `parse` behaves on different input:

      parse number "42"    == Just (42, "")
      parse number "xyz42" == Nothing
      parse number "42xyz" == Just (42, "xyz")

- Write a function

      completeParse :: Parser a -> String -> Maybe a

  that only succeeds if the parser consumes the entire input string (the remaining input is empty).
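One possible solution to the exercise is sketched below; it is not necessarily the intended one. Because the library's `Parser` type is abstract, the sketch assumes a function-based stand-in for `Parser` and `parse`, and builds `number` by hand just to have something to run.

```haskell
import Data.Char (isDigit)

-- Assumed stand-in for the library's Parser type and parse function.
data Parser a = P (String -> Maybe (a, String))

parse :: Parser a -> String -> Maybe (a, String)
parse (P f) s = f s

-- Succeeds only if the parser succeeds and leaves no unused input.
completeParse :: Parser a -> String -> Maybe a
completeParse p s = case parse p s of
  Just (x, "") -> Just x
  _            -> Nothing

-- Demo parser written by hand (hypothetical; the library version
-- would be built from combinators instead).
number :: Parser Integer
number = P (\s -> case span isDigit s of
                    ("", _) -> Nothing
                    (ds, r) -> Just (read ds, r))
```

Here `completeParse number "42"` succeeds, while `completeParse number "42xyz"` fails because `"xyz"` is left over.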

- Example: parse two numbers, separated by `+`, and add them.

      digit ::= "0".."9".
      number ::= digit {digit}.
      addNumbers ::= number "+" number.

- A parser for this grammar:

      digit :: Parser Char
      digit = sat isDigit

      number :: Parser Integer
      number = do s <- oneOrMore digit
                  return (read s)

      addNumbers :: Parser Integer
      addNumbers = do a <- number
                      char '+'
                      b <- number
                      return (a + b)

- The same example written directly on strings, without the `Parser` type:

      number :: String -> Maybe (Integer, String)
      number s = case span isDigit s of
                   ("", _) -> Nothing
                   (s, r)  -> Just (read s, r)

      addNumbers :: String -> Maybe (Integer, String)
      addNumbers s =
        case number s of
          Nothing -> Nothing
          Just (a, r) ->
            case r of
              '+':r ->
                case number r of
                  Nothing -> Nothing
                  Just (b, r) -> Just (a + b, r)
              _ -> Nothing

      data Expr = Num Integer | Add Expr Expr | Mul Expr Expr

- expr ::= number | expr "+" expr | expr "*" expr | "(" expr ")"
- But using this grammar as a specification for a parser is problematic:
  - It is ambiguous (doesn't specify operator precedences).
  - It is *left recursive*; recursive descent parsers can't handle this.

- A BNF grammar for expressions

      expr ::= term "+" expr | term.
      term ::= factor "*" term | factor.
      factor ::= number | "(" expr ")".

- This grammar "accepts" the same input as the previous one, and
solves the problems.
- It makes it clear that "*" has higher precedence than "+"
- There is no left recursion
- But are the operators left or right associative?

      expr, expr', term, term', factor :: Parser Expr

      expr = expr' <|> term

      expr' = do t <- term
                 char '+'
                 e <- expr
                 return (Add t e)

      term = term' <|> factor

      term' = do f <- factor
                 char '*'
                 t <- term
                 return (Mul f t)

      factor = do n <- number; return (Num n)
               <|>
               do char '('
                  e <- expr
                  char ')'
                  return e

- `parse expr "1+2*3"` gives `Just (Add (Num 1) (Mul (Num 2) (Num 3)),"")`
- It works!
- But the code is rather long and repetitive.
  - The parser is 17 lines long, but the grammar is only 3 lines long.

- We can do better...

- 1. Both alternatives in `expr` start with `term`.
  - If the `expr'` alternative fails, `term` will parse the same string twice. Inefficient!
  - Solution: left factorisation.
- 2. The operators have become right associative.
  - Making them left associative is a simple change in the grammar: replace

        expr ::= term "+" expr | term

    with

        expr ::= expr "+" term | term

  - But the left associative grammar is also left recursive, which leads to problems in the parser...

- An EBNF grammar for expressions

      expr ::= term {"+" term}.
      term ::= factor {"*" factor}.
      factor ::= number | "(" expr ")".

- The grammar is left factored. Good for efficiency.
- An `expr` is one or more `term`s separated by "+"
  - We can choose how to convert an `[Expr]` to an `Expr`

      expr = do t <- term
                ts <- zeroOrMore (do char '+'; term)
                return (foldl Add t ts)

      term = do f <- factor
                fs <- zeroOrMore (do char '*'; factor)
                return (foldl Mul f fs)

- Using `foldl Add t ts` makes "+" left associative.

      foldl :: (b -> a -> b) -> b -> [a] -> b
      -- foldl is like foldr, but it works from the left...

- But there is still a repeating pattern!
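The effect of choosing `foldl` can be checked on its own, without any parser. The sketch below takes the pieces that parsing `"1+2+3"` would produce (a first term `t` and the remaining terms `ts`) as given values and compares the left-nested tree with the right-nested one. The names `leftNested` and `rightNested` are made up for the illustration.

```haskell
data Expr = Num Integer | Add Expr Expr | Mul Expr Expr
  deriving (Eq, Show)

-- What the parser would have collected for the input "1+2+3":
-- the first term, and the list of terms following each "+".
t :: Expr
t = Num 1

ts :: [Expr]
ts = [Num 2, Num 3]

-- foldl nests to the left: (1+2)+3
leftNested :: Expr
leftNested = foldl Add t ts

-- For comparison, foldr1 over all terms nests to the right: 1+(2+3)
rightNested :: Expr
rightNested = foldr1 Add (t : ts)
```

For addition both trees evaluate to the same number, but the distinction matters as soon as a non-associative operator such as subtraction is added to the language.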

- From the parsing library

      chain :: Parser item -> Parser sep -> Parser [item]
      chain item sep = do i <- item
                          is <- zeroOrMore (do sep; item)
                          return (i:is)

- For our expression parser

      leftAssoc :: (t -> t -> t) -> Parser t -> Parser sep -> Parser t
      leftAssoc op item sep = do is <- chain item sep
                                 return (foldl1 op is)

- Parsing two things and keeping only one of them (*)

      (<*) :: Parser b -> Parser a -> Parser b
      (*>) :: Parser a -> Parser b -> Parser b

- Applying a function to the result of a parser (*)

      (<$>) :: (a -> b) -> Parser a -> Parser b

- (*) These combinators actually have more general types...

      expr, term, factor :: Parser Expr
      expr = leftAssoc Add term (char '+')
      term = leftAssoc Mul factor (char '*')
      factor = Num <$> number <|> char '(' *> expr <* char ')'

- The grammar, for comparison:

      expr ::= term {"+" term}.
      term ::= factor {"*" factor}.
      factor ::= number | "(" expr ")".
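To see the compact expression parser run, the following sketch puts it together with a small stand-in for the parsing library. Everything above the final three definitions is an assumption made for self-containment: the real `Parsing` module may define `Parser`, `zeroOrMore`, `oneOrMore` and the fixity of `<|>` differently.

```haskell
import Control.Monad (ap, liftM)
import Data.Char (isDigit)

data Expr = Num Integer | Add Expr Expr | Mul Expr Expr
  deriving (Eq, Show)

-- Assumed stand-in for the parsing library.
data Parser a = P (String -> Maybe (a, String))

parse :: Parser a -> String -> Maybe (a, String)
parse (P f) s = f s

instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure x = P (\s -> Just (x, s))
  (<*>)  = ap

instance Monad Parser where
  P p >>= f = P (\s -> case p s of
                         Nothing     -> Nothing
                         Just (a, r) -> parse (f a) r)

-- Choice binds more loosely than <$>, <* and *> (assumed fixity).
infixr 3 <|>
(<|>) :: Parser a -> Parser a -> Parser a
P p <|> P q = P (\s -> case p s of
                         Nothing -> q s
                         r       -> r)

sat :: (Char -> Bool) -> Parser Char
sat p = P sat_p
  where sat_p (c:s) | p c = Just (c, s)
        sat_p _           = Nothing

char :: Char -> Parser Char
char c = sat (== c)

-- Repetition (one plausible definition, not taken from the library).
zeroOrMore, oneOrMore :: Parser a -> Parser [a]
zeroOrMore p = oneOrMore p <|> return []
oneOrMore p  = do x <- p; xs <- zeroOrMore p; return (x:xs)

number :: Parser Integer
number = read <$> oneOrMore (sat isDigit)

chain :: Parser item -> Parser sep -> Parser [item]
chain item sep = do i  <- item
                    is <- zeroOrMore (do _ <- sep; item)
                    return (i:is)

leftAssoc :: (t -> t -> t) -> Parser t -> Parser sep -> Parser t
leftAssoc op item sep = foldl1 op <$> chain item sep

-- The expression parser, one line per grammar rule.
expr, term, factor :: Parser Expr
expr   = leftAssoc Add term (char '+')
term   = leftAssoc Mul factor (char '*')
factor = Num <$> number <|> char '(' *> expr <* char ')'
```

Under these assumptions, `parse expr "1+2+3"` produces a left-nested tree, and `parse expr "1+"` stops after the `1` and returns `"+"` as unused input, because the failed attempt to parse `+` followed by a term is undone by `<|>`.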

- The type `Parser a`:

      data Parser a = P (String -> Maybe (a, String))

      parse :: Parser a -> String -> Maybe (a, String)
      parse (P f) s = f s

- Parsers are represented as functions!
- To apply a parser to an input string, we just extract the function and apply it!

- Parsing a single character

      sat :: (Char -> Bool) -> Parser Char
      sat p = P sat_p
        where sat_p (c:s) | p c = Just (c, s)
              sat_p _           = Nothing

      char :: Char -> Parser Char
      char c = sat (== c)

- Choice

      (<|>) :: Parser a -> Parser a -> Parser a
      P pf1 <|> P pf2 = P pf
        where pf s = case pf1 s of
                       Nothing -> pf2 s
                       r       -> r

  - Try the first parser; if it fails, try the second parser. If the first parser succeeds, use that result.
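A small experiment shows what "try the second parser" means in practice: the second alternative starts again from the original input, not from where the first one gave up. The sketch repeats the definitions of `Parser`, `sat`, `char` and `<|>` from the text so that it runs on its own. `twoChars`, `aOrB` and `abOrAc` are hypothetical helpers written by hand, to avoid depending on the `Monad` instance.

```haskell
data Parser a = P (String -> Maybe (a, String))

parse :: Parser a -> String -> Maybe (a, String)
parse (P f) s = f s

sat :: (Char -> Bool) -> Parser Char
sat p = P sat_p
  where sat_p (c:s) | p c = Just (c, s)
        sat_p _           = Nothing

char :: Char -> Parser Char
char c = sat (== c)

(<|>) :: Parser a -> Parser a -> Parser a
P pf1 <|> P pf2 = P pf
  where pf s = case pf1 s of
                 Nothing -> pf2 s
                 r       -> r

-- Hypothetical helper: accept exactly the two given characters in sequence.
twoChars :: Char -> Char -> Parser String
twoChars x y = P pf
  where pf (c:d:r) | c == x && d == y = Just ([c, d], r)
        pf _                          = Nothing

-- Simple choice between two single characters.
aOrB :: Parser Char
aOrB = char 'a' <|> char 'b'

-- Both alternatives start with 'a'. On "ac..." the first one fails after
-- looking at two characters, and the second one restarts from the beginning.
abOrAc :: Parser String
abOrAc = twoChars 'a' 'b' <|> twoChars 'a' 'c'
```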

- Parsing a pair

      pair :: Parser a -> Parser b -> Parser (a, b)
      pair (P pa) (P pb) = P (\s -> case pa s of
                                      Nothing -> Nothing
                                      Just (a, r) -> case pb r of
                                                       Nothing -> Nothing
                                                       Just (b, r) -> Just ((a, b), r))

- This is very similar to the `addNumbers` parser we saw earlier.

- A simple variation on parsing a pair

      apply :: Parser (a -> b) -> Parser a -> Parser b
      apply (P pf) (P pa) = P (\s -> case pf s of
                                       Nothing -> Nothing
                                       Just (f, r) -> case pa r of
                                                        Nothing -> Nothing
                                                        Just (a, r) -> Just (f a, r))

- More useful than always returning a pair!

      pair pa pb = return (,) `apply` pa `apply` pb

- To support the `do` notation, we need to create an instance in the `Monad` class.

      class Monad m where
        return :: a -> m a
        (>>=)  :: m a -> (a -> m b) -> m b

- How does this relate to the `do` notation?
  - `do x <- m; more; stuff` ⟹ `m >>= (\x -> do more; stuff)`
- Compare how the enumeration notation relates to the `Enum` class:
  - `[x .. y]` ⟹ `enumFromTo x y`
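Both translations can be observed with types that are already in the Prelude. The sketch below uses the `Maybe` monad, since its `>>=` is predefined, and writes the same computation once with `do` and once with explicit `>>=`. `safeDiv`, `withDo`, `withBind` and `enumExample` are made-up names for the illustration.

```haskell
-- Hypothetical helper: division that fails on a zero divisor.
safeDiv :: Integer -> Integer -> Maybe Integer
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Written with do notation...
withDo :: Integer -> Integer -> Integer -> Maybe Integer
withDo a b c = do
  x <- safeDiv a b
  y <- safeDiv x c
  return (x + y)

-- ...and the same computation after translating each "x <- m" into
-- "m >>= (\x -> ...)".
withBind :: Integer -> Integer -> Integer -> Maybe Integer
withBind a b c =
  safeDiv a b >>= (\x ->
  safeDiv x c >>= (\y ->
  return (x + y)))

-- The enumeration notation is sugar in the same way.
enumExample :: Bool
enumExample = [1 .. 5] == enumFromTo (1 :: Integer) 5
```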

- We need to implement

      return :: a -> Parser a
      (>>=)  :: Parser a -> (a -> Parser b) -> Parser b

- Defining `return` is simple enough.
- Defining `(>>=)` is a twist on `apply`...

      instance Monad Parser where
        return x = P (\s -> Just (x, s))
        P p >>= f = P (\s -> case p s of
                               Nothing -> Nothing
                               Just (a, r) -> parse (f a) r)
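A practical note for anyone compiling this with a current GHC: since GHC 7.10, `Functor` and `Applicative` are superclasses of `Monad`, so a `Monad` instance on its own is rejected. The sketch below shows one way to supply the missing instances, with `<*>` playing the role of `apply`. The `item` and `twoItems` parsers are hypothetical additions used only to exercise the instance.

```haskell
data Parser a = P (String -> Maybe (a, String))

parse :: Parser a -> String -> Maybe (a, String)
parse (P f) s = f s

-- Required superclass instances, both expressed in terms of (>>=).
instance Functor Parser where
  fmap f p = p >>= (\a -> return (f a))

instance Applicative Parser where
  pure x = P (\s -> Just (x, s))
  -- This is apply: run the function parser, then the argument parser.
  pf <*> pa = pf >>= (\f -> pa >>= (\a -> return (f a)))

-- return defaults to pure, so only (>>=) is needed here.
instance Monad Parser where
  P p >>= f = P (\s -> case p s of
                         Nothing     -> Nothing
                         Just (a, r) -> parse (f a) r)

-- Hypothetical primitive: read any single character.
item :: Parser Char
item = P (\s -> case s of
                  (c:r) -> Just (c, r)
                  []    -> Nothing)

-- do notation now works for Parser.
twoItems :: Parser (Char, Char)
twoItems = do
  a <- item
  b <- item
  return (a, b)
```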

- Every monad supports the `return` and `(>>=)` operators.
- Different monads support different additional operations:

| | `Parser` | `Gen` | `IO` |
|---|---|---|---|
| Building things | `sat`, `char`, `(<\|>)`, ... | `elements`, `oneof`, `frequency`, ... | `putStr`, `getLine`, `readFile`, ... |
| Getting things out | `parse` | `sample`, `generate` | You can't get out of the IO monad! |

- We have seen how monadic combinator parsers work and how to use them.
  - The structure of a parser follows the grammar closely.
  - Need to avoid left recursion.
- We saw the `Maybe` type and the `Monad` class.
- The `Parsing` module:
  - Source code: Parsing.hs.
  - Documentation: Parsing.

**Next time**: more about monads!