Programming Language Technology DAT151/DIT231 Andreas Abel
Why an interpreter if we can have a compiler?
An interpreter should be compositional!
We interpret type-checked programs only, since the meaning of overloaded operators depends on their type.
In analogy to type-checking Γ ⊢ e : t we have judgement
γ ⊢ e ⇓ v in environment γ, expression e has value v
Environments γ are similar to contexts Γ, but instead of mapping variables x to types t, they map variables to values v.
An environment γ is a stack of blocks δ, which are finite maps from variables x to values v.
Values are integer, floating-point, and boolean literals, or a special value null for being undefined.
This would be a LBNF grammar for our values:
VInt. Value ::= Integer;
VDouble. Value ::= Double;
VBool. Value ::= Bool;
VNull. Value ::= "null";
rules Bool ::= "true" | "false";
(It is possible but not necessary to use BNFC to generate the value representation.)
Variable rule.
---------- γ(x) = v
γ ⊢ x ⇓ v
Note that we take the value of the "topmost" x --- it may appear in several blocks δ of γ.
{ double z = 3.14; // (z=3.14)
int y = 1; // (z=3.14,y=1)
{ int z = 0; // (z=3.14,y=1).(z=0)
int x = y + z; // (z=3.14,y=1).(z=0,x=1)
printInt(x); // 1
}
}The meaning of the arithmetic operators depends on the static type.
γ ⊢ e₁ ⇓ v₁ γ ⊢ e₂ ⇓ v₂
--------------------------- v = divide(t,v₁,v₂)
γ ⊢ e₁ /ₜ e₂ ⇓ v
divide(int,v₁,v₂) is integer division of the integer literals v₁ by v₂.divide(double,v₁,v₂) is floating-point division of the floating-point literals v₁ by v₂. Integer literals will be converted to floating-point first.divide(bool,v₁,v₂) is undefined.Judgement γ ⊢ e ⇓ v should be read as a function with inputs γ and e and output v.
eval(Env γ, Exp e): Value
eval(γ, EId x) = lookup(γ,x)
eval(γ, EInt i) = VInt i
eval(γ, EDouble d) = VDouble d
eval(γ, EDiv t e₁ e₂) = divide(t, eval(γ,e₁), eval(γ,e₂))
In the last clause, eval(γ,e₁) and eval(γ,e₂) can be run in any order, even in parallel!
Expression forms like increment (++x and x++) and decrement and assignment x = e in general change the values of variables. This is called a (side) effect. (In contrast, the type of variables never changes. Typing has no effects.)
We need to update the environment. We return the updated environment along with the value:
γ ⊢ e ⇓ ⟨v; γ'⟩ "In environment γ, expression e has value v and updates environment to γ'."
We write γ[x=v] for environment γ' with γ'(x) = v and γ'(y) = γ(y) when y ≠ x.
Note that we update the "topmost" x only --- it may appear in several blocks δ of γ.
Rules.
-------------- γ(x) = v
γ ⊢ x ⇓ ⟨v; γ⟩
-------------------------------- γ(x) = i
γ ⊢ ++(int)x ⇓ ⟨i+1; γ[x = i+1]⟩
γ ⊢ e ⇓ ⟨v; γ'⟩
--------------------------
γ ⊢ (x = e) ⇓ ⟨v; γ'[x=v]⟩
γ ⊢ e₁ ⇓ ⟨v₁;γ₁⟩ γ₁ ⊢ e₂ ⇓ ⟨v₂;γ₂⟩
------------------------------------ v = divide(t,v₁,v₂)
γ ⊢ e₁ /ₜ e₂ ⇓ ⟨v;γ₂⟩
The last rule show: we need to thread the environment through the judgements.
The evaluation order matters now!
eval(Env γ, Exp e): Value × Env
eval(γ, EId x) = ⟨ lookup(γ,x), γ ⟩
eval(γ, EAssign x e) = let
⟨v,γ'⟩ = eval(γ,e)
in ⟨ v, update(γ',x,v) ⟩
eval(γ, EDiv t e₁ e₂) = let
⟨v₁,γ₁⟩ = eval(γ, e₁)
⟨v₂,γ₂⟩ = eval(γ₁,e₂)
in ⟨ divide(t,v₁,v₂), γ₂ ⟩
Consider x=0 ⊢ x + x++ vs x=0 ⊢ x++ + x.
E.g. in Java, we can use a environment env global to the interpreter and mutate it with update.
eval(Exp e): Value
eval(EId x): return env.lookup(x)
eval(EAssign x e):
v ← eval(e)
env.update(x,v)
return v
eval(γ, EDiv t e₁ e₂):
v₁ ← eval(e₁)
v₂ ← eval(e₂)
return divide(t,v₁,v₂)
This keeps the environment implicit.
In Haskell, we can use the state monad for the same purpose.
import Control.Monad.State (State, get, modify, evalState)
eval : Exp → State Env Value
eval (EId x) = do
γ ← get
lookupVar γ x
eval (EAssign x e) = do
v ← eval e
modify (λ γ → updateVar γ x v)
return v
eval (EDiv TInt e₁ e₂) = do
VInt i₁ ← eval e₁
VInt i₂ ← eval e₂
return (VInt (div i₁ i₂))
interpret : Exp → Value
interpret e = evalState (eval e) emptyEnvThe Haskell state monad is just sugar. It is implemented roughly by:
type State s a = s → (a, s)
get :: State s s -- s → (s, s)
get s = (s, s)
modify :: (s → s) → State s () -- s → ((), s)
modify f s = ((), f s)
return :: a → State s a -- s → (a, s)
return a s = (a, s)
(do x ← p; q(x)) s = let
(a, s₁) = p s
(b, s₂) = q(a) s₁
in (b, s₂)Input and output:
---------------------- read i from stdin
γ ⊢ readInt() ⇓ ⟨i, γ⟩
γ ⊢ e ⇓ ⟨i, γ'⟩
---------------------------- print i on stdout
γ ⊢ printInt(e) ⇓ ⟨null, γ'⟩
We do not model input and output mathematically here. Note that read i from stdin can also abort with an exception.
Exercise: Could we model evaluation with input and output as a mathematical function? Hint:
return statement below.)What is wrong with this rule?
γ ⊢ e₁ ⇓ ⟨b₁;γ₁⟩ γ₁ ⊢ e₂ ⇓ ⟨b₂;γ₂⟩
------------------------------------ b = b₁ ∧ b₂
γ ⊢ e₁ && e₂ ⇓ ⟨b;γ₂⟩
Here are some examples where we would like to shortcut computation:
int b = 1;
if (b == 0 && the_goldbach_conjecture_holds_up_to_10E100) { ... }
int x = 0 * number_of_atoms_on_the_moon;Short-cutting logical operators like && is essentially used in C, e.g.:
if (p != NULL && p[0] > 10)
Without short-cutting, the program would crash in p[0] by accessing a NULL pointer.
Rules with short-cutting:
γ ⊢ e₁ ⇓ ⟨false, γ₁⟩
-------------------------
γ ⊢ e₁ && e₂ ⇓ ⟨false, γ₂⟩
γ ⊢ e₁ ⇓ ⟨true, γ₁⟩ γ₁ ⊢ e₂ ⇓ ⟨b, γ₂⟩
----------------------------------------
γ ⊢ e₁ && e₂ ⇓ ⟨b;γ₂⟩
The effects of e₂ are not executed if e₁ already evaluated to false.
Example:
while (x < 10 && f(x++)) { ... }You can circumvent this by defining your own and:
bool and(bool x, bool y) { return x && y; }
while (and(x < 10, f(x++))) { ... }Digression on: call-by-name (call-by-need) call-by-value
Example 1:
double (x) = x + x
double (1+2)
... = double (3) = 3 + 3 = 6... = (1+2) + (1+2) = 3 + (1+2) = 3 + 3 = 6... = let x=1+2 in x + x = let x = 3 in x + x = 3 + 3 = 6Example 2:
zero (x) = 0
zero (1+2)
... = double (3) = 0... = 0... = let x=1+2 in 0 = 0Languages with effects (such as C/C++) mostly use call-by-value. Haskell is pure (no effects) and uses call-by-need, which refines call-by-name.
Statements do not have a value. They just have effects.
γ ⊢ s ⇓ γ'
γ ⊢ ss ⇓ γ'
Sequencing γ ⊢ ss ⇓ γ':
γ ⊢ ε ⇓ γ
γ ⊢ s ⇓ γ₁ γ₁ ⊢ ss ⇓ γ₂
---------------------------
γ ⊢ s ss ⇓ γ₂
An empty sequence is interpreted as the identity function id : Env → Env, sequence composition as function composition.
Blocks.
γ. ⊢ ss ⇓ γ'.δ
--------------
γ ⊢ {ss} ⇓ γ'
Variable declaration.
γ.δ[x=null] ⊢ e ⇓ ⟨v, γ'.δ'⟩
----------------------------
γ.δ ⊢ t x = e; ⇓ γ'.δ'[x=v]
While.
γ ⊢ e ⇓ ⟨false; γ'⟩
--------------------
γ ⊢ while (e) s ⇓ γ'
γ ⊢ e ⇓ ⟨true; γ₁⟩ γ₁. ⊢ s ⇓ γ₂.δ γ₂ ⊢ while e (s) ⇓ γ₃
-------------------------------------------------------------
γ ⊢ while (e) s ⇓ γ₃
Or:
γ ⊢ e ⇓ ⟨true; γ₁⟩ γ₁ ⊢ { s } while e (s) ⇓ γ₃
-------------------------------------------------------------
γ ⊢ while (e) s ⇓ γ₃
Return?
γ ⊢ e ⇓ ⟨v, γ'⟩
-------------------
γ ⊢ return e; ⇓ ??
Return alters the control flow: We can exit from the middle of a loop, block etc.!
bool prime (int p) {
if (p <= 2) return p == 2;
else {
int q = 3;
while (q * q <= p) {
if (divides(q,p)) return false;
else q = q + 2;
}
}
return true;
}Similar to return: break, continue.
return can be modelled as an exception that carries the return value v as exception information. When calling a function, we handle the exception, treating it as regular return from the function.
γ ⊢ s ⇓ r
r ::= v | γ'
The disjoint union Value ⊎ Env could be modeled in LBNF as:
Return. Result ::= "return" Value;
Continue. Result ::= "continue" Env;
Return.
γ ⊢ e ⇓ ⟨v, γ'⟩
------------------
γ ⊢ return e; ⇓ v
Statement as expression.
γ ⊢ e ⇓ ⟨v, γ'⟩
---------------
γ ⊢ e; ⇓ γ'
Sequence: Propagate exception.
γ ⊢ s ⇓ v
-------------
γ ⊢ s ss ⇓ v
γ ⊢ s ⇓ γ₁ γ₁ ⊢ ss ⇓ r
-------------------------
γ ⊢ s ss ⇓ r
Exercise: How to change while? One solution:
γ ⊢ e ⇓ ⟨true; γ₁⟩ γ₁ ⊢ { s } while e (s) ⇓ r
-------------------------------------------------------------
γ ⊢ while (e) s ⇓ r
Function call.
γ ⊢ e₁ ⇓ ⟨v₁, γ₁⟩
γ₁ ⊢ e₂ ⇓ ⟨v₂, γ₂⟩
...
γₙ₋₁ ⊢ eₙ ⇓ ⟨vₙ, γₙ⟩
(x₁=v₁,...,xₙ=vₙ) ⊢ ss ⇓ v
------------------------------- σ(f) = t f (t₁ x₁,...,tₙ xₙ) { ss }
γ ⊢ f(e₁,...,eₙ) ⇓ ⟨v, γₙ⟩
To implement the side condition, we need a global map σ from function names f to their definition t f (t₁ x₁,...,tₙ xₙ) { ss }.
In Java, we can use Java's exception mechanism.
eval(ECall f es):
vs ← eval(es)
(Δ,ss) ← lookupFun(f)
γ ← saveEnv()
setEnv(makeEnv(Δ,vs))
try {
execStms(ss)
} catch {
ReturnException v:
setEnv(γ)
return v
}
execStm(SReturn e):
v ← eval(e)
throw new ReturnException(v)In Haskell, we can use the exception monad.
import Control.Monad.Except
import Control.Monad.Reader
import Control.Monad.State
type M = ReaderT Sig (StateT Env (ExceptT Value IO))
eval :: Exp → M Value
eval (ECall f es) = do
vs ← mapM eval es
(Δ,ss) ← lookupFun f
γ ← get
put (makeEnv Δ vs)
execStms ss `catchError` λ v → do
put γ
return v
execStm :: Stm → M ()
execStm (SReturn e) = do
v ← eval e
throwError vMore in the hands-on lecture...
A program is interpreted by executing the statements of the main function in an environment with just one empty block.