Programming Language Technology DAT151/DIT231 Andreas Abel
Why an interpreter if we can have a compiler?
1. Write an interpreter I for new language X in existing language Y.
2. Write a compiler C for X in X.
3. Run (via I) C on I to get a compiled interpreter I'.
4. Run (via I') C on C to get a compiled compiler C'.
5. Write an optimizing compiler O for X in X.
6. Run C' on O to get a compiled optimizing compiler Cₒ.
7. Run Cₒ on Cₒ to get an optimized optimizing compiler Cₒ'.

An interpreter should be compositional!
⟦op (e₁, e₂)⟧ = ⟦op⟧ (⟦e₁⟧, ⟦e₂⟧)
eval (Op(e₁,e₂)):
    v₁ ← eval e₁
    v₂ ← eval e₂
    v  ← combine v₁ and v₂ according to Op
    return v
An interpreter can be specified by:
- a function ⟦_⟧ : Expr → Value (domain theory),
- a relation _⇓_ ⊆ Expr × Value (big-step operational semantics), or
- a relation _→_ ⊆ Expr × Expr (small-step operational semantics).

We interpret type-checked programs only, since the meaning of overloaded operators depends on their type.
In analogy to type-checking Γ ⊢ e : t we have the judgement

    γ ⊢ e ⇓ v

"In environment γ, expression e has value v."
Environments γ are similar to contexts Γ, but instead of mapping variables x to types t, they map variables to values v.
An environment γ is a stack of blocks δ, which are finite maps from variables x to values v.
Values are integer, floating-point, and boolean literals, or a special value null for being undefined.
This would be an LBNF grammar for our values:
VInt. Value ::= Integer;
VDouble. Value ::= Double;
VBool. Value ::= Bool;
VNull. Value ::= "null";
rules Bool ::= "true" | "false";
(It is possible but not necessary to use BNFC to generate the value representation.)
Variable rule.

    ----------  γ(x) = v
    γ ⊢ x ⇓ v

Note that we take the value of the "topmost" x; it may appear in several blocks δ of γ.
{ double z = 3.14;      // (z=3.14)
  int    y = 1;         // (z=3.14,y=1)
  { int x = 0;          // (z=3.14,y=1).(x=0)
    int z = x + y;      // (z=3.14,y=1).(x=0,z=1)
    printInt(z);        // 1
  }
  printDouble(z);       // 3.14
}
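The stack-of-blocks environment can be sketched in Java as follows. This is only an illustration under assumptions: the class name `BlockEnv` and its methods `enterBlock`, `exitBlock`, `declare`, and `lookup` are hypothetical, and values are represented as plain Java objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch only: BlockEnv and its method names are hypothetical.
class BlockEnv {
    // The environment γ: a stack of blocks δ, innermost block first.
    private final Deque<Map<String, Object>> blocks = new ArrayDeque<>();

    BlockEnv() {
        enterBlock(); // start with one empty block
    }

    void enterBlock() { blocks.push(new HashMap<>()); } // on "{"
    void exitBlock()  { blocks.pop(); }                 // on "}"

    // Declare x in the innermost block.
    void declare(String x, Object v) {
        blocks.peek().put(x, v);
    }

    // γ(x): search from the innermost block outwards, so the topmost x wins.
    Object lookup(String x) {
        for (Map<String, Object> block : blocks) {
            if (block.containsKey(x)) return block.get(x);
        }
        throw new IllegalStateException("unbound variable " + x);
    }
}
```

Replaying the example above: after declaring z=3.14 and y=1, entering a block, and declaring x=0 and z=1, `lookup("z")` finds the inner z; after `exitBlock()` it finds the outer z again.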
The meaning of an arithmetic operator depends on its static type.

    γ ⊢ e₁ ⇓ v₁    γ ⊢ e₂ ⇓ v₂
    ---------------------------  v = divide(t,v₁,v₂)
    γ ⊢ e₁ /ₜ e₂ ⇓ v

- divide(int,v₁,v₂) is integer division of the integer literals v₁ by v₂.
- divide(double,v₁,v₂) is floating-point division of the floating-point literals v₁ by v₂. Integer literals will be converted to floating-point first.
- divide(bool,v₁,v₂) is undefined.

Judgement γ ⊢ e ⇓ v should be read as a function with inputs γ and e and output v.
eval(Env γ, Exp e): Value
eval(γ, EId x) = lookup(γ,x)
eval(γ, EInt i) = VInt i
eval(γ, EDouble d) = VDouble d
eval(γ, EDiv t e₁ e₂) = divide(t, eval(γ,e₁), eval(γ,e₂))
In the last clause, eval(γ,e₁) and eval(γ,e₂) can be run in any order, even in parallel!
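A minimal Java sketch of this effect-free evaluator might look as follows. The class `PureEval`, the record names, and the representation of values as `Integer` or `Double` are assumptions made for illustration; the environment is simplified to a single flat map.

```java
import java.util.Map;

// Sketch only: all names here are hypothetical.
class PureEval {
    enum Type { INT, DOUBLE }

    interface Exp {}
    record EId(String x) implements Exp {}
    record EInt(int i) implements Exp {}
    record EDouble(double d) implements Exp {}
    record EDiv(Type t, Exp e1, Exp e2) implements Exp {}

    // eval(γ, e): values are Integer or Double objects.
    static Object eval(Map<String, Object> env, Exp e) {
        if (e instanceof EId id) {
            Object v = env.get(id.x());
            if (v == null) throw new IllegalStateException("unbound variable " + id.x());
            return v;
        }
        if (e instanceof EInt n) return n.i();
        if (e instanceof EDouble d) return d.d();
        if (e instanceof EDiv div) {
            // Without effects, the two recursive calls are independent of each other.
            return divide(div.t(), eval(env, div.e1()), eval(env, div.e2()));
        }
        throw new IllegalArgumentException("unknown expression " + e);
    }

    // divide(t, v₁, v₂): the static type t selects the operation.
    static Object divide(Type t, Object v1, Object v2) {
        if (t == Type.INT) return (Integer) v1 / (Integer) v2;
        // double: integer values are converted to floating-point first
        return ((Number) v1).doubleValue() / ((Number) v2).doubleValue();
    }
}
```

For instance, 7 divided by x at type int in an environment with x=2 yields the integer 3, while 7 divided by 2.0 at type double yields 3.5.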
Type soundness (weak correctness):
If e : t and e ⇓ v then v : t.
"If expression e has type t and e evaluates to value v then v also has type t."

Termination (strong correctness):
If e : t then e ⇓ v for some v : t.

Allowing non-termination:
If e : t then either evaluation of e diverges, or e ⇓ v with v : t.
Expression forms like increment (++x and x++), decrement, and assignment x = e in general change the values of variables.
This is called a (side) effect.
(In contrast, the type of variables never changes. Typing has no effects.)
We need to update the environment.
We return the updated environment along with the value:

    γ ⊢ e ⇓ ⟨v; γ'⟩

"In environment γ, expression e has value v and updates environment to γ'."

We write γ[x=v] for environment γ' with γ'(x) = v and γ'(z) = γ(z) when z ≠ x.
Note that we update the "topmost" x only; it may appear in several blocks δ of γ.
Rules.

    --------------  γ(x) = v
    γ ⊢ x ⇓ ⟨v; γ⟩

    --------------------------------  γ(x) = i
    γ ⊢ ++(int)x ⇓ ⟨i+1; γ[x = i+1]⟩

    γ ⊢ e ⇓ ⟨v; γ'⟩
    --------------------------
    γ ⊢ (x = e) ⇓ ⟨v; γ'[x=v]⟩

    γ ⊢ e₁ ⇓ ⟨v₁;γ₁⟩    γ₁ ⊢ e₂ ⇓ ⟨v₂;γ₂⟩
    ------------------------------------  v = divide(t,v₁,v₂)
    γ ⊢ e₁ /ₜ e₂ ⇓ ⟨v;γ₂⟩

The last rule shows that we need to thread the environment through the judgements.
The evaluation order matters now!
eval(Env γ, Exp e): Value × Env
eval(γ, EId x) =
⟨ lookup(γ,x), γ ⟩
eval(γ, EAssign x e) = let
⟨v,γ'⟩ = eval(γ,e)
in ⟨ v, update(γ',x,v) ⟩
eval(γ, EDiv t e₁ e₂) = let
⟨v₁,γ₁⟩ = eval(γ, e₁)
⟨v₂,γ₂⟩ = eval(γ₁,e₂)
in ⟨ divide(t,v₁,v₂), γ₂ ⟩
Consider
x=0 ⊢ x + x++
vs.
x=0 ⊢ x++ + x.
E.g. in Java, we can use an environment env global to the interpreter and mutate it with update.
eval(Exp e): Value

eval(EId x):
    return env.lookup(x)

eval(EAssign x e):
    v ← eval(e)
    env.update(x,v)
    return v

eval(EDiv t e₁ e₂):
    v₁ ← eval(e₁)
    v₂ ← eval(e₂)
    return divide(t,v₁,v₂)
This keeps the environment implicit.
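As a runnable illustration of the mutable-environment style, here is a small Java sketch. All names (`EffectEval`, `EPostIncr`, `EAdd`, ...) are hypothetical, values are restricted to integers, and the environment is a single block for brevity. It evaluates the left operand before the right one, which is what makes x + x++ and x++ + x differ.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: all names here are hypothetical.
class EffectEval {
    interface Exp {}
    record EId(String x) implements Exp {}
    record EInt(int i) implements Exp {}
    record EPostIncr(String x) implements Exp {}   // x++
    record EAssign(String x, Exp e) implements Exp {}
    record EAdd(Exp e1, Exp e2) implements Exp {}

    // Environment global to the interpreter (one block only, for brevity).
    // Looking up an unbound variable fails with a NullPointerException here.
    final Map<String, Integer> env = new HashMap<>();

    int eval(Exp e) {
        if (e instanceof EId id) return env.get(id.x());
        if (e instanceof EInt n) return n.i();
        if (e instanceof EPostIncr p) {
            int old = env.get(p.x());
            env.put(p.x(), old + 1);   // effect: update x
            return old;                // value: the old content of x
        }
        if (e instanceof EAssign a) {
            int v = eval(a.e());
            env.put(a.x(), v);
            return v;
        }
        if (e instanceof EAdd add) {
            int v1 = eval(add.e1());   // left operand first ...
            int v2 = eval(add.e2());   // ... then right, in the updated environment
            return v1 + v2;
        }
        throw new IllegalArgumentException("unknown expression " + e);
    }
}
```

Starting from x=0, this sketch evaluates x + x++ to 0 and x++ + x to 1; in both cases x is 1 afterwards.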
In Haskell, we can use the state monad for the same purpose.
import Control.Monad.State (State, get, modify, evalState)

eval :: Exp → State Env Value
eval (EId x) = do
  γ ← get
  return (lookupVar γ x)
eval (EAssign x e) = do
  v ← eval e
  modify (λ γ → updateVar γ x v)
  return v
eval (EDiv TInt e₁ e₂) = do
  VInt i₁ ← eval e₁
  VInt i₂ ← eval e₂
  return (VInt (div i₁ i₂))

interpret :: Exp → Value
interpret e = evalState (eval e) emptyEnv
The Haskell state monad is just sugar. It is implemented roughly by:
type State s a = s → (a, s)

get :: State s s                    -- s → (s, s)
get s = (s, s)

modify :: (s → s) → State s ()      -- s → ((), s)
modify f s = ((), f s)

return :: a → State s a             -- s → (a, s)
return a s = (a, s)

(do x ← p; q(x)) s = let
    (a, s₁) = p s
    (b, s₂) = q(a) s₁
  in (b, s₂)

N.B.: For the Haskell hacker: (do x ← p; q x) = uncurry q . p.
We need to start with good environments γ : Γ.
This is defined pointwise: γ(x) : Γ(x) for all x in scope.

Type soundness (weak correctness):
If Γ ⊢ e : t and γ : Γ and γ ⊢ e ⇓ ⟨v; γ'⟩ then v : t and γ' : Γ.
"If expression e has type t in context Γ and γ is an environment matching Γ and e evaluates to value v and γ' then v also has type t and γ' also matches Γ."

Termination (strong correctness):
If Γ ⊢ e : t and γ : Γ then γ ⊢ e ⇓ ⟨v; γ'⟩ for some v : t and γ' : Γ.
Allowing non-termination:
... (Exercise)
Input and output:

    ----------------------  read i from stdin
    γ ⊢ readInt() ⇓ ⟨i; γ⟩

    γ ⊢ e ⇓ ⟨i; γ'⟩
    ----------------------------  print i on stdout
    γ ⊢ printInt(e) ⇓ ⟨null; γ'⟩

We do not model input and output mathematically here.
Note that read i from stdin can also abort with an exception.
Exercise: Could we model evaluation with input and output as a mathematical function?
Hint:
return
statement below.)

Statements do not have a value. They just have effects.
    γ ⊢ s ⇓ γ'
    γ ⊢ ss ⇓ γ'

"In environment γ, statement s (sequence ss) executes successfully, returning updated environment γ'."

Sequencing γ ⊢ ss ⇓ γ':

    γ ⊢ ε ⇓ γ

    γ ⊢ s ⇓ γ₁    γ₁ ⊢ ss ⇓ γ₂
    ---------------------------
    γ ⊢ s ss ⇓ γ₂

An empty sequence is interpreted as the identity function id : Env → Env, sequence composition as function composition.
Blocks.

    γ. ⊢ ss ⇓ γ'.δ
    --------------
    γ ⊢ {ss} ⇓ γ'

Variable declaration.

    γ.δ[x=null] ⊢ e ⇓ ⟨v, γ'.δ'⟩
    ----------------------------
    γ.δ ⊢ t x = e; ⇓ γ'.δ'[x=v]

While.

    γ ⊢ e ⇓ ⟨false; γ'⟩
    --------------------
    γ ⊢ while (e) s ⇓ γ'

    γ ⊢ e ⇓ ⟨true; γ₁⟩    γ₁. ⊢ s ⇓ γ₂.δ    γ₂ ⊢ while (e) s ⇓ γ₃
    -------------------------------------------------------------
    γ ⊢ while (e) s ⇓ γ₃

Or:

    γ ⊢ e ⇓ ⟨true; γ₁⟩    γ₁ ⊢ { s } while (e) s ⇓ γ₃
    -------------------------------------------------
    γ ⊢ while (e) s ⇓ γ₃
Return?

    γ ⊢ e ⇓ ⟨v, γ'⟩
    -------------------
    γ ⊢ return e; ⇓ ??
Return alters the control flow:
We can exit from the middle of a loop, block etc.!
bool prime (int p) {
  if (p <= 2 || divides(2,p)) return p == 2;
  else {
    int q = 3;
    while (q * q <= p) {
      if (divides(q,p)) return false;
      else q = q + 2;
    }
  }
  return true;
}
Similar to return: break, continue.

return can be modelled as an exception that carries the return value v as exception information.
When calling a function, we handle the exception, treating it as regular return from the function.

    γ ⊢ s ⇓ r    where r ::= v | γ'

"In environment γ, statement s executes successfully, either asking to return value v, or to continue execution in updated environment γ'."
The disjoint union Value ⊎ Env
could be modeled in LBNF as:
Return. Result ::= "return" Value;
Continue. Result ::= "continue" Env;
Return.

    γ ⊢ e ⇓ ⟨v, γ'⟩
    ------------------
    γ ⊢ return e; ⇓ v

Statement as expression.

    γ ⊢ e ⇓ ⟨v, γ'⟩
    ---------------
    γ ⊢ e; ⇓ γ'

Sequence: Propagate exception.

    γ ⊢ s ⇓ v
    -------------
    γ ⊢ s ss ⇓ v

    γ ⊢ s ⇓ γ₁    γ₁ ⊢ ss ⇓ r
    -------------------------
    γ ⊢ s ss ⇓ r

Exercise: How to change while?
One solution:

    γ ⊢ e ⇓ ⟨true; γ₁⟩    γ₁ ⊢ { s } while (e) s ⇓ r
    ------------------------------------------------
    γ ⊢ while (e) s ⇓ r
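The result r ::= v | γ' can also be sketched without exceptions. In the hypothetical Java fragment below, the environment is mutated in place, so a result is just an `Optional<Integer>`: a present value means "return v", an empty one means "continue". Expressions are stood in for by Java lambdas, and block scoping is ignored; all names are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.ToIntFunction;

// Sketch only: all names here are hypothetical.
class ResultExec {
    interface Stm {}
    record SExp(Consumer<Map<String, Integer>> effect) implements Stm {}       // e;
    record SReturn(ToIntFunction<Map<String, Integer>> exp) implements Stm {}  // return e;
    record SWhile(Predicate<Map<String, Integer>> cond, List<Stm> body) implements Stm {}

    static Optional<Integer> execStm(Map<String, Integer> env, Stm s) {
        if (s instanceof SExp e) {
            e.effect().accept(env);
            return Optional.empty();                   // continue
        }
        if (s instanceof SReturn r) {
            return Optional.of(r.exp().applyAsInt(env)); // ask to return v
        }
        if (s instanceof SWhile w) {
            while (w.cond().test(env)) {
                Optional<Integer> r = execStms(env, w.body());
                if (r.isPresent()) return r;           // return exits the loop
            }
            return Optional.empty();
        }
        throw new IllegalArgumentException("unknown statement " + s);
    }

    // Sequence: stop at the first statement that asks to return.
    static Optional<Integer> execStms(Map<String, Integer> env, List<Stm> ss) {
        for (Stm s : ss) {
            Optional<Integer> r = execStm(env, s);
            if (r.isPresent()) return r;
        }
        return Optional.empty();
    }
}
```

A return inside a loop body thus propagates through both the enclosing sequence and the loop, and the statements following the loop are never executed.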
Function call.

    γ ⊢ e₁ ⇓ ⟨v₁, γ₁⟩
    γ₁ ⊢ e₂ ⇓ ⟨v₂, γ₂⟩
    ...
    γₙ₋₁ ⊢ eₙ ⇓ ⟨vₙ, γₙ⟩
    (x₁=v₁,...,xₙ=vₙ) ⊢ ss ⇓ v
    --------------------------  σ(f) = t f (t₁ x₁,...,tₙ xₙ) { ss }
    γ ⊢ f(e₁,...,eₙ) ⇓ ⟨v, γₙ⟩

To implement the side condition, we need a global map σ from function names f to their definition t f (t₁ x₁,...,tₙ xₙ) { ss }.
In Java, we can use Java's exception mechanism.
eval(ECall f es):
    vs     ← eval(es)          // Evaluate list of arguments
    (Δ,ss) ← lookupFun(f)      // Get parameters and body of function
    γ      ← saveEnv()         // Save current environment
    setEnv(makeEnv(Δ,vs))      // New environment binding the parameters
    try {
        execStms(ss)           // Run function body
    } catch {
        ReturnException v:
            setEnv(γ)          // Restore environment
            return v           // Evaluation result is returned value
    }

execStm(SReturn e):
    v ← eval(e)
    throw new ReturnException(v)
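The exception mechanism itself can be tried out in isolation. The following Java sketch is an assumption-laden miniature: it leaves out environments and argument passing, returns only integer literals, and uses hypothetical names (`ReturnDemo`, `SPrint`, `call`). It shows only how a thrown `ReturnException` cuts the function body short and is turned into a regular result at the call site.

```java
import java.util.List;

// Sketch only: all names here are hypothetical.
class ReturnDemo {
    // Exception carrying the return value v as exception information.
    static class ReturnException extends RuntimeException {
        final int value;
        ReturnException(int value) {
            super(null, null, false, false); // control flow only: no stack trace
            this.value = value;
        }
    }

    interface Stm {}
    record SReturn(int value) implements Stm {}   // "return v;" with a literal v
    record SPrint(String msg) implements Stm {}   // stand-in for any other statement

    // Collects the output of SPrint so that the effect is observable.
    static final StringBuilder out = new StringBuilder();

    static void execStm(Stm s) {
        if (s instanceof SReturn r) throw new ReturnException(r.value());
        if (s instanceof SPrint p) { out.append(p.msg()); return; }
        throw new IllegalArgumentException("unknown statement " + s);
    }

    static void execStms(List<Stm> ss) {
        for (Stm s : ss) execStm(s);
    }

    // Function call: run the body and treat the exception as regular return.
    static int call(List<Stm> body) {
        try {
            execStms(body);
        } catch (ReturnException r) {
            return r.value;
        }
        throw new IllegalStateException("function body ended without return");
    }
}
```

Calling a body that prints "a", returns 42, and then would print "b" yields 42 with only "a" in the output. A full interpreter would additionally save and restore the environment around the call, as in the pseudocode.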
In Haskell, we can use the exception monad.
import Control.Monad.Except
import Control.Monad.Reader
import Control.Monad.State
type M = ReaderT Sig (StateT Env (ExceptT Value IO))
eval :: Exp → M Value
eval (ECall f es) = do
  vs     ← mapM eval es
  (Δ,ss) ← lookupFun f
  γ      ← get
  put (makeEnv Δ vs)
  execStms ss `catchError` λ v → do
    put γ
    return v

execStm :: Stm → M ()
execStm (SReturn e) = do
  v ← eval e
  throwError v
More in the hands-on lecture...
A program is interpreted by executing the statements of the main function in an environment with just one empty block.
What is wrong with this rule?

    γ ⊢ e₁ ⇓ ⟨b₁;γ₁⟩    γ₁ ⊢ e₂ ⇓ ⟨b₂;γ₂⟩
    ------------------------------------  b = b₁ ∧ b₂
    γ ⊢ e₁ && e₂ ⇓ ⟨b;γ₂⟩
Here are some examples where we would like to shortcut computation:
int b = 1;
if (b == 0 && the_goldbach_conjecture_holds_up_to_10E100) { ... }
int x = 0 * number_of_atoms_on_the_moon;
Short-cutting of logical operators like && is relied upon in C, e.g.:

    if (p != NULL && p[0] > 10)

Without short-cutting, the program would crash in p[0] by accessing a NULL pointer.
Rules with short-cutting:

    γ ⊢ e₁ ⇓ ⟨false; γ₁⟩
    -------------------------
    γ ⊢ e₁ && e₂ ⇓ ⟨false; γ₁⟩

    γ ⊢ e₁ ⇓ ⟨true; γ₁⟩    γ₁ ⊢ e₂ ⇓ ⟨b; γ₂⟩
    ----------------------------------------
    γ ⊢ e₁ && e₂ ⇓ ⟨b; γ₂⟩

The effects of e₂ are not executed if e₁ already evaluated to false.
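The two short-cutting rules translate directly into an evaluator clause. In this hypothetical Java sketch, `ETick` is an invented expression form whose only effect is to bump a counter, so that one can observe whether e₂ was evaluated; none of these names come from the course material.

```java
// Sketch only: all names here are hypothetical.
class ShortCircuit {
    interface Exp {}
    record EBool(boolean b) implements Exp {}
    record ETick(boolean b) implements Exp {}      // effect: ticks++, value: b
    record EAnd(Exp e1, Exp e2) implements Exp {}  // e₁ && e₂

    int ticks = 0; // stands in for the mutable environment

    boolean eval(Exp e) {
        if (e instanceof EBool b) return b.b();
        if (e instanceof ETick t) {
            ticks++;
            return t.b();
        }
        if (e instanceof EAnd and) {
            if (!eval(and.e1())) return false; // first rule: e₂ is not evaluated
            return eval(and.e2());             // second rule: result is that of e₂
        }
        throw new IllegalArgumentException("unknown expression " + e);
    }
}
```

Evaluating `false && tick` leaves the counter at 0, whereas `true && tick` increments it once.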
Example:
while (x < 10 && f(x++)) { ... }
You can circumvent this by defining your own and:
bool and (bool x, bool y) { return x && y; }
while (and (x < 10, f(x++))) { ... }
Digression on: call-by-name (call-by-need) call-by-value
Example 1:
double (x) = x + x
double (1+2)
... = double (3) = 3 + 3 = 6
... = (1+2) + (1+2) = 3 + (1+2) = 3 + 3 = 6
... = let x=1+2 in x + x = let x = 3 in x + x = 3 + 3 = 6
Example 2:
zero (x) = 0
zero (1+2)
... = zero (3) = 0
... = 0
... = let x=1+2 in 0 = 0
Languages with effects (such as C/C++) mostly use call-by-value.
Haskell is pure (no effects) and uses call-by-need, which refines call-by-name.
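Java itself is call-by-value, but the other two strategies can be imitated with `java.util.function.Supplier`. The sketch below is an illustration under assumptions: the class `CallStrategies`, the counter `evaluations`, and the helper `lazy` are hypothetical. The argument expression 1+2 is instrumented to count how often it is evaluated.

```java
import java.util.function.Supplier;

// Sketch only: all names here are hypothetical.
class CallStrategies {
    static int evaluations = 0;

    // The argument expression 1+2, counting each evaluation.
    static int argument() {
        evaluations++;
        return 1 + 2;
    }

    // Call-by-value: the argument arrives already evaluated.
    static int doubleByValue(int x) { return x + x; }

    // Call-by-name: the argument is re-evaluated at every use.
    static int doubleByName(Supplier<Integer> x) { return x.get() + x.get(); }

    // zero ignores its argument, so call-by-name never evaluates it.
    static int zeroByName(Supplier<Integer> x) { return 0; }

    // Call-by-need: wrap a supplier so that it is evaluated at most once.
    static <T> Supplier<T> lazy(Supplier<T> s) {
        return new Supplier<T>() {
            private boolean forced = false;
            private T value;

            @Override
            public T get() {
                if (!forced) {
                    value = s.get();
                    forced = true;
                }
                return value;
            }
        };
    }
}
```

With these definitions, `doubleByValue(argument())` evaluates the argument once, `doubleByName(CallStrategies::argument)` twice, and `doubleByName(lazy(CallStrategies::argument))` once again; `zeroByName(CallStrategies::argument)` does not evaluate it at all.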