Combining Deep and Shallow Embeddings: A Tutorial

Josef Svenningsson

Background

The ideas I'm going to present in this tutorial have been used extensively in the development of Feldspar.

Feldspar is a collaboration between:

Ericsson Ericsson logo
Chalmers University of Technology Chalmers logo
ELTE University ELTE logo

Feldspar

Feldspar is a domain specific language for programming low-level, performance sensitive embedded systems with focus on digital signal processing.

Contrast Deep and Shallow Embeddings

What do I mean with Deep and Shallow Embeddings?

Running example contrasting Deep and Shallow Embeddings

Simple language for 2D regions

type Region

inRegion :: Point -> Region -> Bool

circle  :: Radius -> Region
outside :: Region -> Region
(/\)    :: Region -> Region -> Region
(\/)    :: Region -> Region -> Region

annulus :: Radius -> Radius -> Region
annulus r1 r2 = outside (circle r1) /\ (circle r2)

Credit to Carlsson, Hudak and Jones

Shallow Embedding

type Region = Point -> Bool

p `inRegion` r = r p

circle  r = \p -> magnitude p <= r
outside r = \p -> not (r p)
r1 /\ r2  = \p -> r1 p && r2 p
r1 \/ r2  = \p -> r1 p || r2 p

The type Point -> Bool is the semantic domain of the shallow embedding

Shallow Embedding - Pros and Cons

Pros

Cons

Deep Embedding

data Region = Circle    Radius
            | Outside   Region
            | Intersect Region Region
            | Union     Region Region

circle  r = Circle    r
outside r = Outside   r
r1 /\ r2  = Intersect r1 r2
r1 \/ r2  = Union     r1 r2

p `inRegion` (Circle  r)       = magnitude p <= r
p `inRegion` (Outside r)       = not (p `inRegion` r)
p `inRegion` (Intersect r1 r2) = p `inRegion` r1 && p `inRegion` r2
p `inRegion` (Union     r1 r2) = p `inRegion` r1 || p `inRegion` r2

Deep Embedding - Pros and Cons

Pros

Cons

Conclusion: Deep and Shallow embeddings

Deep and Shallow embeddings are two complementary ways to embed a target language in a host language

Use a shallow embedding:

Use a deep embedding:

Finally Tagless

The Finally Tagless approach to embedding languages can be used to achieve effects similar to what I present here.

Why I not going to say anything about the Finally Tagless approach, and why we don't use it in the Feldspar project:

Combining Deep and Shallow Embedding

Goals:

Combining Deep and Shallow Embedding

The essence of this technique is to have:

Functional C - Deep Embedding

Our example language for demonstrating our main point is a small functional flavored C.

data FunC a where
  LitI     :: Int  -> FunC Int
  LitB     :: Bool -> FunC Bool
  While    :: (FunC s -> FunC Bool) -> (FunC s -> FunC s) -> FunC s -> FunC s
  If       :: FunC Bool -> FunC a -> FunC a -> FunC a
  Pair     :: FunC a -> FunC b -> FunC (a,b)
  Fst      :: FunC (a,b) -> FunC a
  Snd      :: FunC (a,b) -> FunC b

  Prim1    :: String -> (a -> b) -> FunC a -> FunC b
  Prim2    :: String -> (a -> b -> c) -> FunC a -> FunC b -> FunC c

  Value    :: a -> FunC a
  Variable :: String -> FunC a

It is inspired by the core language used in Feldspar, which has served us well for writing embedded software.

The problem of having several constructor for primitive functions is solvable, but that's a topic for another tutorial

The meaning of FunC

The semantics of FunC

eval :: FunC a -> a
eval (LitI i)        = i
eval (LitB b)        = b
eval (While c b i)   = head $
                       dropWhile (eval . c . value) $
                       iterate (eval . b . value) $
                       eval i
eval (If c t e)      = if eval c then eval t else eval e
eval (Pair a b)      = (eval a,eval b)
eval (Fst p)         = fst (eval p)
eval (Snd p)         = snd (eval p)
eval (Prim1 _ f a)   = f (eval a)
eval (Prim2 _ f a b) = f (eval a) (eval b)
eval (Value a)       = a

Syntactic class

In order to integrate additional types to the language in a smooth way we introduce the following type class.

class Syntactic a where
  type Internal a
  fromFunC :: FunC (Internal a) -> a
  toFunC   :: a -> FunC (Internal a)

This type class summarises all types which can be used in our domain specific language. The functions fromFunC and toFunC shows how to convert to and from our AST representation.

Syntactic instance

The Syntactic instance for FunC

instance Syntactic (FunC a) where
  type Internal (FunC a) = a
  toFunC   ast = ast
  fromFunC ast = ast

Rephrasing Deeply Embedded Features

Given the Syntactic class we now export all our language constructs as overloaded

true :: FunC Bool
true = LitB True

false :: FunC Bool
false = LitB False

ifC :: Syntactic a => FunC Bool -> a -> a -> a
ifC c t e = fromFunC (If c (toFunC t) (toFunC e))

(?) = ifC

while :: Syntactic s => (s -> FunC Bool) -> (s -> s) -> s -> s
while c b i = fromFunC (While (c . fromFunC) (toFunC . b . fromFunC) (toFunC i))

The types of these overloaded functions are very Haskell-like

Allows for easily extending the language, by instantiating Syntactic

Makes programming in the embedded language quite a bit more pleasant.

Base types are still on the form FunC a

Example Program

A small program computing the modulus by means of iterated subtraction

modulus :: FunC Int -> FunC Int -> FunC Int
modulus a b = while (>=b) (subtract b) a

I assume the existence of primitive functions >= and subtract

Adding a feature

Under our new regime, new language features are represented using a Shallow embedding.

The Shallow embedding is a new type

If the Deep embedding cannot represent the new language feature we will have to extend the AST

Options feature

As a warmup we will add conditional values to FunC.

The idea is to mimic provide functionality similar to Haskell's Maybe-type.

data Maybe a = Just a | Nothing

Option embedding

data Option a = Option { isSome   :: FunC Bool
                       , fromSome :: a
                       }

instance Syntactic a => Syntactic (Option a) where
  type Internal (Option a) = (Bool,Internal a)
  fromFunC m = Option (Fst m) (Snd m)
  toFunC (Option b a) = Pair b a

The name Option comes from O`Caml and is chosen so as to not collide with any Haskell type.

Operations on Option - 1st attempt

some :: a -> Option a
some a = Option true a

none :: Option a
none = Option false ?

How can we define none?

We need a notion of undefined values!

Extending the deep embedding with undefined values

data FunC
  ...
  Undef :: FunC a
  ...

eval (Undef) = undefined

undef :: Syntactic a => a
undef = fromFunC Undef

Operations on Option

some :: a -> Option a
some a = Option true a

none :: Syntactic a => Option a
none = Option false undef

option :: (Syntactic a, Syntactic b) => 
          b -> (a -> b) -> Option a -> b
option noneCase someCase opt = ifC (isSome opt)
                                   (someCase (fromSome opt))
                                   noneCase

instance Functor Option where
  fmap f (Option b a) = Option b (f a)

instance Monad Option where
  return a  = some a
  opt >>= k = b { isSome = isSome opt ? (isSome b, false) }
    where b = k (fromSome opt)

Example program

Assuming the following operation

divO :: FunC Int -> FunC Int -> Option (FunC Int)

We can use Haskell's overloaded do-notation

divTest :: FunC Int -> FunC Int -> FunC Int -> Option (FunC Int)
divTest a b c = do r1 <- divO a b
                   r2 <- divO a c
                   return (a+b)

The resulting AST is a sequence of if-expressions.

Vector feature

Our FunC language is rather impoverished at the moment and at the very minimum we will require arrays.

In our addition we will distinguish between arrays and vectors

The idea is that programmers should not worry about Arrays but only program with Vectors.

Vector feature - arrays

As our Deep representation we have arrays.

data FunC a where
  ...
  Arr   :: FunC Int -> (FunC Int -> FunC a) -> FunC (Array Int a)
  ...

eval (Arr l ixf) = listArray (0,lm1) $
                   [ (eval $ ixf $ value i) | i  <- [0..lm1]]
  where lm1  = eval l - 1

len :: FunC (Array Int a) -> FunC Int
len (Arr l _) = l

(+!+) :: FunC (Array Int a) -> FunC Int -> a
Arr _ f +!+ ix = f ix

This is a popular representation for arrays. Can be traced back to Pan

Arrays will be stored in memory

Each element is computed independently; the whole array can be computed in parallel

Vectors cont.

As our Shallow embedding we have the type Vector

data Vector a where
  Indexed :: FunC Int -> (FunC Int -> a) -> Vector a

instance Syntactic a => Syntactic (Vector a) where
  type Internal (Vector a) = Array Int (Internal a)
  toFunC (Indexed l ixf)   = Arr l (toFunC . ixf)
  fromFunC arr = Indexed (len arr) (\ix -> arr +!+ ix)

Note the similarity with the Array constructor

The Syntactic instance makes it part of our language

Vectors - example functions

Examples of how to write functions for Vector

takeVec :: FunC Int -> Vector a -> Vector a
takeVec i (Indexed l ixf) = Indexed (minF i l) ixf

enumVec :: FunC Int -> FunC Int -> Vector (FunC Int)
enumVec f t = Indexed (t - f + 1) (\ix -> ix + f)

instance Functor Vector where
  fmap f (Indexed l ixf) = Indexed l (f . ixf)

Note that they only involve the Shallow embedding, no manipulation of syntax trees

Vectors - Example program

Compute the moving average of a vector

movingAvg :: FunC Int -> Vector (FunC Int) -> Vector (FunC Int)
movingAvg n = map ((`divP` n) . sum . take n) . tails

Vectors cont.

A consequence of this design is that the Vector type only exists at compile time, i.e. during Haskell evaluation.

Once the syntax tree is built there are no Vector values left

This is usually referred to as Fusion.

Example of Fusion

A small test program

squares :: FunC Int -> Vector (FunC Int)
squares n = fmap square $ enumVec 1 n
  where square x = x * x

The intermediate vector will disappear

\a -> { (i+1) * (i+1) | i <- 0 .. (a + 1) - 1 }

This made-up syntax is an array comprehension. The variable i takes on all the indices of the array, and the expression (i+1) * (i+1) computes the value of the array for each index.

Fusion with Vectors

With our design of Vector we can give strong guarantees about fusion

Whenever two vector functions are composed, the intermediate vector is removed.

Much stronger guarantee than conventional compiler optimizations

Helps the programmer understand whether his program will be efficient or not.

Fusion caveat

One caveat that is easy to forget

When Vectors are passed around in loops, they will be stored written to memory on each iteration.

Monads

If we are defining a pure language, it is often that we still want side-effects,

Monads is the standard solution to this problem and it is perfectly possible to add monads to an embedded language using the techniques presented here. There is a paper at IFL this year describing the details.

Combining Deep and Shallow Embeddings - Pros and Cons

Pros

Cons

Remedy

Excercise

One problem with our representation of vectors is that some functions are very inefficient to compute. For example scans.

The core of this problem is that each element is computed independently, we cannot pass any values between these computations.

A better representation would be to base it on the type of array unfold

unfold :: (b -> (a,b)) -> b -> Int -> Array Int a

Excercise - cont.

Excercise:

Credits

I can take very little credit for the presented material

Thanks

Thanks!