Josef Svenningsson
The ideas I'm going to present in this tutorial have been used extensively in the development of Feldspar.
Feldspar is a collaboration between:
Ericsson | ![]() |
Chalmers University of Technology | ![]() |
ELTE University | ![]() |
Feldspar is a domain specific language for programming low-level, performance sensitive embedded systems with focus on digital signal processing.
What do I mean with Deep and Shallow Embeddings?
Simple language for 2D regions
type Region
inRegion :: Point -> Region -> Bool
circle :: Radius -> Region
outside :: Region -> Region
(/\) :: Region -> Region -> Region
(\/) :: Region -> Region -> Region
annulus :: Radius -> Radius -> Region
annulus r1 r2 = outside (circle r1) /\ (circle r2)
Credit to Carlsson, Hudak and Jones
type Region = Point -> Bool
p `inRegion` r = r p
circle r = \p -> magnitude p <= r
outside r = \p -> not (r p)
r1 /\ r2 = \p -> r1 p && r2 p
r1 \/ r2 = \p -> r1 p || r2 p
The type Point -> Bool
is the semantic domain of the shallow embedding
Easy to add new language constructs - as long as they can be interpreted in the semantic domain
Efficient, tagless evaluation
data Region = Circle Radius
| Outside Region
| Intersect Region Region
| Union Region Region
circle r = Circle r
outside r = Outside r
r1 /\ r2 = Intersect r1 r2
r1 \/ r2 = Union r1 r2
p `inRegion` (Circle r) = magnitude p <= r
p `inRegion` (Outside r) = not (p `inRegion` r)
p `inRegion` (Intersect r1 r2) = p `inRegion` r1 && p `inRegion` r2
p `inRegion` (Union r1 r2) = p `inRegion` r1 || p `inRegion` r2
Easy to add new forms of interpretations
E.g. compile regions to C functions
Possible to optimize the representation, e.g. using smart constructors
outside (Outside r) = r
Laborious to add new language constructs
Requires a new data type for the AST - more code
Potentially slow, tagful evaluation
Deep and Shallow embeddings are two complementary ways to embed a target language in a host language
Use a shallow embedding:
Use a deep embedding:
The Finally Tagless approach to embedding languages can be used to achieve effects similar to what I present here.
Why I not going to say anything about the Finally Tagless approach, and why we don't use it in the Feldspar project:
Goals:
The essence of this technique is to have:
Our example language for demonstrating our main point is a small functional flavored C.
data FunC a where
LitI :: Int -> FunC Int
LitB :: Bool -> FunC Bool
While :: (FunC s -> FunC Bool) -> (FunC s -> FunC s) -> FunC s -> FunC s
If :: FunC Bool -> FunC a -> FunC a -> FunC a
Pair :: FunC a -> FunC b -> FunC (a,b)
Fst :: FunC (a,b) -> FunC a
Snd :: FunC (a,b) -> FunC b
Prim1 :: String -> (a -> b) -> FunC a -> FunC b
Prim2 :: String -> (a -> b -> c) -> FunC a -> FunC b -> FunC c
Value :: a -> FunC a
Variable :: String -> FunC a
It is inspired by the core language used in Feldspar, which has served us well for writing embedded software.
The problem of having several constructor for primitive functions is solvable, but that's a topic for another tutorial
FunC
The semantics of FunC
eval :: FunC a -> a
eval (LitI i) = i
eval (LitB b) = b
eval (While c b i) = head $
dropWhile (eval . c . value) $
iterate (eval . b . value) $
eval i
eval (If c t e) = if eval c then eval t else eval e
eval (Pair a b) = (eval a,eval b)
eval (Fst p) = fst (eval p)
eval (Snd p) = snd (eval p)
eval (Prim1 _ f a) = f (eval a)
eval (Prim2 _ f a b) = f (eval a) (eval b)
eval (Value a) = a
In order to integrate additional types to the language in a smooth way we introduce the following type class.
class Syntactic a where
type Internal a
fromFunC :: FunC (Internal a) -> a
toFunC :: a -> FunC (Internal a)
This type class summarises all types which can be used in our domain specific language. The functions fromFunC
and toFunC
shows how to convert to and from our AST representation.
The Syntactic instance for FunC
instance Syntactic (FunC a) where
type Internal (FunC a) = a
toFunC ast = ast
fromFunC ast = ast
Given the Syntactic
class we now export all our language constructs as overloaded
true :: FunC Bool
true = LitB True
false :: FunC Bool
false = LitB False
ifC :: Syntactic a => FunC Bool -> a -> a -> a
ifC c t e = fromFunC (If c (toFunC t) (toFunC e))
(?) = ifC
while :: Syntactic s => (s -> FunC Bool) -> (s -> s) -> s -> s
while c b i = fromFunC (While (c . fromFunC) (toFunC . b . fromFunC) (toFunC i))
The types of these overloaded functions are very Haskell-like
Allows for easily extending the language, by instantiating Syntactic
Makes programming in the embedded language quite a bit more pleasant.
Base types are still on the form FunC a
A small program computing the modulus by means of iterated subtraction
modulus :: FunC Int -> FunC Int -> FunC Int
modulus a b = while (>=b) (subtract b) a
I assume the existence of primitive functions >=
and subtract
Under our new regime, new language features are represented using a Shallow embedding.
The Shallow embedding is a new type
If the Deep embedding cannot represent the new language feature we will have to extend the AST
As a warmup we will add conditional values to FunC
.
The idea is to mimic provide functionality similar to Haskell's Maybe
-type.
data Maybe a = Just a | Nothing
data Option a = Option { isSome :: FunC Bool
, fromSome :: a
}
instance Syntactic a => Syntactic (Option a) where
type Internal (Option a) = (Bool,Internal a)
fromFunC m = Option (Fst m) (Snd m)
toFunC (Option b a) = Pair b a
The name Option
comes from O`Caml and is chosen so as to not collide with any Haskell type.
some :: a -> Option a
some a = Option true a
none :: Option a
none = Option false ?
How can we define none
?
We need a notion of undefined values!
data FunC
...
Undef :: FunC a
...
eval (Undef) = undefined
undef :: Syntactic a => a
undef = fromFunC Undef
some :: a -> Option a
some a = Option true a
none :: Syntactic a => Option a
none = Option false undef
option :: (Syntactic a, Syntactic b) =>
b -> (a -> b) -> Option a -> b
option noneCase someCase opt = ifC (isSome opt)
(someCase (fromSome opt))
noneCase
instance Functor Option where
fmap f (Option b a) = Option b (f a)
instance Monad Option where
return a = some a
opt >>= k = b { isSome = isSome opt ? (isSome b, false) }
where b = k (fromSome opt)
Assuming the following operation
divO :: FunC Int -> FunC Int -> Option (FunC Int)
We can use Haskell's overloaded do-notation
divTest :: FunC Int -> FunC Int -> FunC Int -> Option (FunC Int)
divTest a b c = do r1 <- divO a b
r2 <- divO a c
return (a+b)
The resulting AST is a sequence of if-expressions.
Our FunC
language is rather impoverished at the moment and at the very minimum we will require arrays.
In our addition we will distinguish between arrays and vectors
The idea is that programmers should not worry about Arrays but only program with Vectors.
As our Deep representation we have arrays.
data FunC a where
...
Arr :: FunC Int -> (FunC Int -> FunC a) -> FunC (Array Int a)
...
eval (Arr l ixf) = listArray (0,lm1) $
[ (eval $ ixf $ value i) | i <- [0..lm1]]
where lm1 = eval l - 1
len :: FunC (Array Int a) -> FunC Int
len (Arr l _) = l
(+!+) :: FunC (Array Int a) -> FunC Int -> a
Arr _ f +!+ ix = f ix
This is a popular representation for arrays. Can be traced back to Pan
Arrays will be stored in memory
Each element is computed independently; the whole array can be computed in parallel
As our Shallow embedding we have the type Vector
data Vector a where
Indexed :: FunC Int -> (FunC Int -> a) -> Vector a
instance Syntactic a => Syntactic (Vector a) where
type Internal (Vector a) = Array Int (Internal a)
toFunC (Indexed l ixf) = Arr l (toFunC . ixf)
fromFunC arr = Indexed (len arr) (\ix -> arr +!+ ix)
Note the similarity with the Array constructor
The Syntactic
instance makes it part of our language
Examples of how to write functions for Vector
takeVec :: FunC Int -> Vector a -> Vector a
takeVec i (Indexed l ixf) = Indexed (minF i l) ixf
enumVec :: FunC Int -> FunC Int -> Vector (FunC Int)
enumVec f t = Indexed (t - f + 1) (\ix -> ix + f)
instance Functor Vector where
fmap f (Indexed l ixf) = Indexed l (f . ixf)
Note that they only involve the Shallow embedding, no manipulation of syntax trees
Compute the moving average of a vector
movingAvg :: FunC Int -> Vector (FunC Int) -> Vector (FunC Int)
movingAvg n = map ((`divP` n) . sum . take n) . tails
A consequence of this design is that the Vector
type only exists at compile time, i.e. during Haskell evaluation.
Once the syntax tree is built there are no Vector
values left
This is usually referred to as Fusion.
A small test program
squares :: FunC Int -> Vector (FunC Int)
squares n = fmap square $ enumVec 1 n
where square x = x * x
The intermediate vector will disappear
\a -> { (i+1) * (i+1) | i <- 0 .. (a + 1) - 1 }
This made-up syntax is an array comprehension. The variable i
takes on all the indices of the array, and the expression (i+1) * (i+1)
computes the value of the array for each index.
With our design of Vector
we can give strong guarantees about fusion
Whenever two vector functions are composed, the intermediate vector is removed.
Much stronger guarantee than conventional compiler optimizations
Helps the programmer understand whether his program will be efficient or not.
One caveat that is easy to forget
When Vectors are passed around in loops, they will be stored written to memory on each iteration.
If we are defining a pure language, it is often that we still want side-effects,
Monads is the standard solution to this problem and it is perfectly possible to add monads to an embedded language using the techniques presented here. There is a paper at IFL this year describing the details.
One problem with our representation of vectors is that some functions are very inefficient to compute. For example scans.
The core of this problem is that each element is computed independently, we cannot pass any values between these computations.
A better representation would be to base it on the type of array unfold
unfold :: (b -> (a,b)) -> b -> Int -> Array Int a
Excercise:
Vector
typeSyntactic
instancemap
and scan
.I can take very little credit for the presented material
Thanks!