Programming Language Technology DAT151/DIT231 Andreas Abel



Why an interpreter if we can have a compiler?

An interpreter should be compositional!

⟦op (e₁, e₂)⟧ = ⟦op⟧ (⟦e₁⟧, ⟦e₂⟧)

  eval (Op(e₁,e₂)):  
    v₁ ← eval e₁
    v₂ ← eval e₂
    v  ← combine v₁ and v₂ according to Op
    return v

An interpreter can be specified by:

  1. A mathematical function ⟦_⟧ : Expr → Value (domain theory).
  2. A relation _⇓_ ⊆ Expr × Value (big-step operational semantics).
  3. A reduction relation _→_ ⊆ Expr × Expr (small-step operational semantics).
  4. Pseudo code.
  5. A reference implementation.

We interpret type-checked programs only, since the meaning of overloaded operators depends on their type.

Evaluation of arithmetic expressions

In analogy to type-checking Γ ⊢ e : t we have judgement

γ ⊢ e ⇓ v

"In environment γ, expression e has value v."

Environments γ are similar to contexts Γ, but instead of mapping variables x to types t, they map variables to values v.

An environment γ is a stack of blocks δ, which are finite maps from variables x to values v.

Values are integer, floating-point, and boolean literals, or a special value null for being undefined.

This would be an LBNF grammar for our values:

    VInt.    Value ::= Integer;  
    VDouble. Value ::= Double;  
    VBool.   Value ::= Bool;  
    VNull.   Value ::= "null";  

    rules    Bool  ::= "true" | "false";  

(It is possible but not necessary to use BNFC to generate the value representation.)

Variable rule.

  ---------- γ(x) = v
  γ ⊢ x ⇓ v

Note that we take the value of the "topmost" x---it may appear in several blocks δ of γ.

    { double z = 3.14;       // (z=3.14)
      int    y = 1;          // (z=3.14,y=1)
      { int x = 0;           // (z=3.14,y=1).(x=0)
        int z = x + y;       // (z=3.14,y=1).(x=0,z=1)
        printInt(z);         // 1
      printDouble(z);        // 3.14

The meaning of an arithmetic operator depends on its static type.

  γ ⊢ e₁ ⇓ v₁    γ ⊢ e₂ ⇓ v₂
  --------------------------- v = divide(t,v₁,v₂)
  γ ⊢ e₁ /ₜ e₂ ⇓ v


Judgement γ ⊢ e ⇓ v should be read as a function with inputs γ and e and output v.

    eval(Env γ, Exp e): Value

    eval(γ, EId x)        = lookup(γ,x)
    eval(γ, EInt i)       = VInt i
    eval(γ, EDouble d)    = VDouble d
    eval(γ, EDiv t e₁ e₂) = divide(t, eval(γ,e₁), eval(γ,e₂))

In the last clause, eval(γ,e₁) and eval(γ,e₂) can be run in any order, even in parallel!

Correctness properties



Expression forms like increment (++x and x++) and decrement and assignment x = e in general change the values of variables.
This is called a (side) effect.
(In contrast, the type of variables never changes. Typing has no effects.)

We need to update the environment.
We return the updated environment along with the value:

γ ⊢ e ⇓ ⟨v; γ'⟩

"In environment γ, expression e has value v and updates environment to γ'."

We write γ[x=v] for environment γ' with γ'(x) = v and γ'(z) = γ(z) when z ≠ x.

Note that we update the "topmost" x only --- it may appear in several blocks δ of γ.


  -------------- γ(x) = v
  γ ⊢ x ⇓ ⟨v; γ⟩

  -------------------------------- γ(x) = i
  γ ⊢ ++(int)x ⇓ ⟨i+1; γ[x = i+1]⟩

  γ ⊢ e ⇓ ⟨v; γ'⟩
  γ ⊢ (x = e) ⇓ ⟨v; γ'[x=v]⟩

  γ ⊢ e₁ ⇓ ⟨v₁;γ₁⟩    γ₁ ⊢ e₂ ⇓ ⟨v₂;γ₂⟩
  ------------------------------------ v = divide(t,v₁,v₂)
  γ ⊢ e₁ /ₜ e₂ ⇓ ⟨v;γ₂⟩

The last rule show: we need to thread the environment through the judgements.

The evaluation order matters now!

    eval(Env γ, Exp e): Value × Env

    eval(γ, EId x)        =
      ⟨ lookup(γ,x), γ ⟩

    eval(γ, EAssign x e)  = let
        ⟨v,γ'⟩ = eval(γ,e)
      in ⟨ v, update(γ',x,v) ⟩

    eval(γ, EDiv t e₁ e₂) = let
        ⟨v₁,γ₁⟩ = eval(γ, e₁)
        ⟨v₂,γ₂⟩ = eval(γ₁,e₂)
      in ⟨ divide(t,v₁,v₂), γ₂ ⟩


    x=0 ⊢ x + x++


    x=0 ⊢ x++ + x.  

State threading behind the scenes

E.g. in Java, we can use a environment env global to the interpreter and mutate it with update.

    eval(Exp e): Value

    eval(EId x):  
        return env.lookup(x)

    eval(EAssign x e):  
        v ← eval(e)
        return v

    eval(γ, EDiv t e₁ e₂):  
        v₁ ← eval(e₁)
        v₂ ← eval(e₂)
        return divide(t,v₁,v₂)

This keeps the environment implicit.

In Haskell, we can use the state monad for the same purpose.

    import Control.Monad.State (State, get, modify, evalState)

    eval : Exp  State Env Value

    eval (EId x) = do
      γ  get
      return (lookupVar γ x)

    eval (EAssign x e) = do
      v  eval e
      modify (λ γ  updateVar γ x v)
      return v

    eval (EDiv TInt e₁ e₂) = do
      VInt i₁  eval e₁
      VInt i₂  eval e₂
      return (VInt (div i₁ i₂))

    interpret : Exp  Value
    interpret e = evalState (eval e) emptyEnv

The Haskell state monad is just sugar. It is implemented roughly by:

    type State s a = s  (a, s)

    get :: State s s                 -- s → (s, s)
    get s = (s, s)

    modify :: (s  s)  State s ()   -- s → ((), s)
    modify f s = ((), f s)

    return :: a  State s a          -- s → (a, s)
    return a s = (a, s)

    (do x  p; q(x)) s = let
         (a, s₁) = p s
         (b, s₂) = q(a) s₁
      in (b, s₂)

N.B.: For the Haskell hacker: (do x ← p; q x) = uncurry q . p.

Correctness revisited

We need to start with good environments γ : Γ.
This is defined pointwise: γ(x) : Γ(x) for all x in scope.


Input and output:

  ---------------------- read i from stdin
  γ ⊢ readInt() ⇓ ⟨i; γ⟩

  γ ⊢ e ⇓ ⟨i; γ'⟩
  ---------------------------- print i on stdout
  γ ⊢ printInt(e) ⇓ ⟨null; γ'⟩

We do not model input and output mathematically here.
Note that read i from stdin can also abort with an exception.

Exercise: Could we model evaluation with input and output as a mathematical function?


Statements do not have a value. They just have effects.

γ ⊢ s ⇓ γ'

γ ⊢ ss ⇓ γ'

"In environment γ, statement s (sequence ss) executes successfully, returning updated environment γ'."

Sequencing γ ⊢ ss ⇓ γ':

  γ ⊢ ε ⇓ γ

  γ ⊢ s ⇓ γ₁    γ₁ ⊢ ss ⇓ γ₂
  γ ⊢ s ss ⇓ γ₂

An empty sequence is interpreted as the identity function id : Env → Env,
sequence composition as function composition.


  γ. ⊢ ss ⇓ γ'.δ
  γ ⊢ {ss} ⇓ γ'

Variable declaration.

  γ.δ[x=null] ⊢ e ⇓ ⟨v, γ'.δ'⟩
  γ.δ ⊢ t x = e;  ⇓ γ'.δ'[x=v]


  γ ⊢ e ⇓ ⟨false; γ'⟩
  γ ⊢ while (e) s ⇓ γ'

  γ ⊢ e ⇓ ⟨true; γ₁⟩    γ₁. ⊢ s ⇓ γ₂.δ    γ₂ ⊢ while e (s) ⇓ γ₃
  γ ⊢ while (e) s ⇓ γ₃


  γ ⊢ e ⇓ ⟨true; γ₁⟩    γ₁ ⊢ { s } while e (s) ⇓ γ₃
  γ ⊢ while (e) s ⇓ γ₃


  γ ⊢ e ⇓ ⟨v, γ'⟩
  γ ⊢ return e; ⇓ ??  

Return alters the control flow:
We can exit from the middle of a loop, block etc.!

    bool prime (int p) {
      if (p <= 2) return p == 2;  
      else {
        int q = 3;  
        while (q * q <= p) {
          if (divides(q,p)) return false;  
          else q = q + 2;  
      return true;  

Similar to return: break, continue.


return can be modelled as an exception that carries the return value v as exception information.
When calling a function, we handle the exception, treating it as regular return from the function.

γ ⊢ s ⇓ r where r ::= v | γ'

"In environment γ statement s executes successfully,
either asking to return value v,
or to continue execution in updated environment γ'."

The disjoint union Value ⊎ Env could be modeled in LBNF as:

    Return.   Result ::= "return" Value;  
    Continue. Result ::= "continue" Env;  


  γ ⊢ e ⇓ ⟨v, γ'⟩
  γ ⊢ return e; ⇓ v

Statement as expression.

  γ ⊢ e ⇓ ⟨v, γ'⟩
  γ ⊢ e; ⇓ γ'

Sequence: Propagate exception.

  γ ⊢ s ⇓ v
  γ ⊢ s ss ⇓ v

  γ ⊢ s ⇓ γ₁   γ₁ ⊢ ss ⇓ r
  γ ⊢ s ss ⇓ r

Exercise: How to change while?
One solution:

  γ ⊢ e ⇓ ⟨true; γ₁⟩    γ₁ ⊢ { s } while e (s) ⇓ r
  γ ⊢ while (e) s ⇓ r

Function call.

  γ    ⊢ e₁ ⇓ ⟨v₁, γ₁⟩
  γ₁   ⊢ e₂ ⇓ ⟨v₂, γ₂⟩
  γₙ₋₁ ⊢ eₙ ⇓ ⟨vₙ, γₙ⟩
  (x₁=v₁,...,xₙ=vₙ) ⊢ ss ⇓ v
  -------------------------- σ(f) = t f (t₁ x₁,...,tₙ xₙ) { ss }
  γ ⊢ f(e₁,...,eₙ) ⇓ ⟨v, γₙ⟩

To implement the side condition, we need a global map σ from function names f to their definition t f (t₁ x₁,...,tₙ xₙ) { ss }.


In Java, we can use Java's exception mechanism.

   eval(ECall f es):  
     vs     ← eval(es)        // Evaluate list of arguments
     (Δ,ss) ← lookupFun(f)    // Get parameters and body of function
     γ      ← saveEnv()       // Save current environment
     setEnv(makeEnv(Δ,vs))    // New environment binding the parameters
     try {
        execStms(ss)          // Run function body
     } catch {
       ReturnException v:  
         setEnv(γ)            // Restore environment
         return v             // Evaluation result is returned value

   execStm(SReturn e):  
     v ← eval(e)
     throw new ReturnException(v)

In Haskell, we can use the exception monad.

    import Control.Monad.Except
    import Control.Monad.Reader
    import Control.Monad.State

    type M = ReaderT Sig (StateT Env (ExceptT Value IO))

    eval :: Exp  M Value
    eval (ECall f es) = do
      vs      mapM eval es
      (Δ,ss)  lookupFun f
      γ       get
      put (makeEnv Δ vs)
      execStms ss `catchError` λ v  do
        put γ
        return v

    execStm :: Stm  M ()
    execStm (SReturn e) = do
      v  eval e
      throwError v

More in the hands-on lecture...


A program is interpreted by executing the statements of the main function in an environment with just one empty block.


What is wrong with this rule?

  γ ⊢ e₁ ⇓ ⟨b₁;γ₁⟩    γ₁ ⊢ e₂ ⇓ ⟨b₂;γ₂⟩
  ------------------------------------ b = b₁ ∧ b₂
  γ ⊢ e₁ && e₂ ⇓ ⟨b;γ₂⟩

Here are some examples where we would like to shortcut computation:

    int b = 1;  
    if (b == 0 && the_goldbach_conjecture_holds_up_to_10E100) { ... }
    int x = 0 * number_of_atoms_on_the_moon;  

Short-cutting logical operators like && is essentially used in C, e.g.:

    if (p != NULL && p[0] > 10)

Without short-cutting, the program would crash in p[0] by accessing a NULL pointer.

Rules with short-cutting:

  γ ⊢ e₁ ⇓ ⟨false; γ₁⟩
  γ ⊢ e₁ && e₂ ⇓ ⟨false; γ₁⟩

  γ ⊢ e₁ ⇓ ⟨true; γ₁⟩    γ₁ ⊢ e₂ ⇓ ⟨b; γ₂⟩
  γ ⊢ e₁ && e₂ ⇓ ⟨b; γ₂⟩

The effects of e₂ are not executed if e₁ already evaluated to false.


  while (x < 10 && f(x++)) { ... }

You can circumvent this by defining your own and:

  bool and (bool x, bool y) { return x && y; }

  while (and (x < 10, f(x++))) { ... }

Digression on: call-by-name (call-by-need) call-by-value

Example 1:

    double (x) = x + x
    double (1+2)

Example 2:

    zero (x) = 0
    zero (1+2)

Languages with effects (such as C/C++) mostly use call-by-value.
Haskell is pure (no effects) and uses call-by-need, which refines call-by-name.