Book: Chapter 1
Implementation:
Theory:
Theory of implementation:
To write programming language implementations
To design your own languages
Moreover, prerequisite to some other courses
1940's: connecting wires to represent 0's and 1's
1950's: assemblers, macro assemblers, FORTRAN, COBOL, LISP
1960's: ALGOL, BCPL (-> B -> C), SIMULA
1970's: Prolog, ML
1980's: C++, Perl, Python
1990's: Haskell, Java
Evolution: from lower to higher levels
"Generations": see book, p. 13
    ---------------------------------------------- human
    natural language
    Haskell
    Lisp
    Prolog
    Java
    C
    assembler
    machine language
    ---------------------------------------------- machine
In a way, compilation reverses the history of programming languages: it goes from higher levels back to lower ones.
Source language:
5 * (9 + 12)
Assembly language (Intel x86):
    mov eax,9
    mov ebx,12
    add eax,ebx
    mov ebx,5
    mul ebx
Machine language (in Hex, approximately...):
B8 09 00 BB 0C 00 03 C3 BB 05 00 F7 03
A compiler translates source language to assembly language.
An assembler translates assembly language to machine language.
But often the whole chain from source to machine is called compilation.
Source language expression:
5 * (9 + 12)
Value:
105
An interpreter computes the value without translating the source code into machine code.
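A minimal sketch of such an interpreter, written in Python for illustration. The class names `Num`, `Add`, `Mul` and the function `eval_expr` are assumptions made for this example and do not come from the book. The expression is represented as a syntax tree, and the value is computed by walking the tree, with no machine code produced at any point.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


@dataclass
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add, Mul]


def eval_expr(e: Expr) -> int:
    """Compute the value of an expression tree directly."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Add):
        return eval_expr(e.left) + eval_expr(e.right)
    if isinstance(e, Mul):
        return eval_expr(e.left) * eval_expr(e.right)
    raise TypeError(f"unknown expression: {e!r}")


# Syntax tree for 5 * (9 + 12)
example = Mul(Num(5), Add(Num(9), Num(12)))
print(eval_expr(example))  # 105
```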
It is common to compile into something other than machine language, and then to interpret that language.
Example: Java to JVM (Java Virtual Machine).
Java source code:
5 * (9 + 12)
JVM assembly code (Jasmin), and corresponding byte code (Hex, binary):
    bipush 9     10 09   0001 0000 0000 1001
    bipush 12    10 0C   0001 0000 0000 1100
    iadd         60      0110 0000
    bipush 5     10 05   0001 0000 0000 0101
    imul         68      0110 1000
The value is obtained by an interpreter of the bytecode.
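How such stack code can be produced from a syntax tree is sketched below in Python. The tree classes and `compile_expr` are hypothetical names chosen for this illustration. The scheme is a postorder traversal: emit code for both operands, then the operator. For `5 * (9 + 12)` this sketch pushes 5 first, so the instruction order differs from the listing above, but both sequences leave the same value on the stack.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


@dataclass
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add, Mul]


def compile_expr(e: Expr) -> List[str]:
    """Translate an expression tree to JVM assembly: operands first, operator last."""
    if isinstance(e, Num):
        # bipush takes a signed byte; larger constants need other instructions,
        # which this sketch does not cover.
        if not -128 <= e.value <= 127:
            raise ValueError(f"constant {e.value} does not fit in a byte")
        return [f"bipush {e.value}"]
    if isinstance(e, Add):
        return compile_expr(e.left) + compile_expr(e.right) + ["iadd"]
    if isinstance(e, Mul):
        return compile_expr(e.left) + compile_expr(e.right) + ["imul"]
    raise TypeError(f"unknown expression: {e!r}")


# Syntax tree for 5 * (9 + 12)
example = Mul(Num(5), Add(Num(9), Num(12)))
for instruction in compile_expr(example):
    print(instruction)
```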
Key to JVM code:
operation | hex code | arguments | semantics
---|---|---|---
bipush | 0x10 | byte | push constant
iadd | 0x60 | - | add
imul | 0x68 | - | multiply
isub | 0x64 | - | subtract
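With this key, a bytecode interpreter for the four instructions fits in a few lines. The following Python sketch is an illustration only: `run_bytecode` is a hypothetical name, and the sketch ignores the 32-bit overflow behaviour of the real JVM. It keeps an operand stack and a program counter, and dispatches on the opcode byte.

```python
def run_bytecode(code: bytes) -> int:
    """Interpret bipush, iadd, isub and imul; return the top of the stack."""
    stack = []
    pc = 0  # index of the next instruction byte
    while pc < len(code):
        op = code[pc]
        if op == 0x10:  # bipush: the next byte is a signed 8-bit constant
            byte = code[pc + 1]
            stack.append(byte - 256 if byte >= 128 else byte)
            pc += 2
        elif op in (0x60, 0x64, 0x68):
            right = stack.pop()
            left = stack.pop()
            if op == 0x60:  # iadd
                stack.append(left + right)
            elif op == 0x64:  # isub
                stack.append(left - right)
            else:  # imul
                stack.append(left * right)
            pc += 1
        else:
            raise ValueError(f"unsupported opcode 0x{op:02X} at offset {pc}")
    return stack[-1]


# Byte code for 5 * (9 + 12): bipush 9, bipush 12, iadd, bipush 5, imul
program = bytes.fromhex("10 09 10 0C 60 10 05 68")
print(run_bytecode(program))  # 105
```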
Compiled vs. interpreted languages?
This is a misnomer: any language can have both an interpreter and a compiler.
The division refers to the usual tools:
+ Interpretation
+ Compilation
The advent of virtual machines such as VMWare is blurring the distinction.
    2 * (31+result)              character stream
           |
           | lexer
           v
    2 * ( 31 + result )          token list
           |
           | parser
           v
    * 2 (+ 31 result)            syntax tree
           |
           | type checker
           v
    i* 2 (i+ 31 i-result)        annotated syntax tree
           |
           | code generator
           v
    bipush 31                    assembly code
    iload_1
    iadd
    iconst_2
    imul
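The first phase can be sketched in Python as follows. This is a minimal illustration, not the lexer used in the course: the token classes `int`, `ident` and `sym` and the function `lex` are assumptions made for this example. The lexer turns the character stream into a token list and rejects any character that cannot start a token.

```python
import re

# One alternative per token class; the group name records which class matched.
TOKEN_RE = re.compile(r"(?P<int>\d+)|(?P<ident>[A-Za-z_]\w*)|(?P<sym>[-+*/()])")


def lex(source):
    """Split a character stream into (token class, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():  # whitespace separates tokens and is dropped
            pos += 1
            continue
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"lexer error at position {pos}: unexpected {source[pos]!r}"
            )
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print([text for _, text in lex("2 * (31+result)")])
# ['2', '*', '(', '31', '+', 'result', ')']
```

The later phases consume this list: the parser builds the tree, the type checker annotates it, and the code generator emits instructions.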
Each compiler phase can fail with a characteristic error.
Lexer errors
"hello
Parse errors
(4 * (y + 5) - 12))
Type errors
sort(45)
Errors in later phases are not commonly supported.
A good compiler finds an error at the earliest occasion.
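As an illustration of how a type error such as `sort(45)` can be caught before any code is generated, here is a small Python sketch of a type checker. Everything in it is an assumption made for the example: the signature table (in particular that `sort` expects a list), the helper function `negate`, and the names `infer` and `CheckError`.

```python
from dataclasses import dataclass


@dataclass
class IntLit:
    value: int


@dataclass
class Call:
    func: str
    arg: object


# Hypothetical signatures: function name -> (parameter type, result type)
SIGNATURES = {
    "sort": ("list", "list"),
    "negate": ("int", "int"),
}


class CheckError(Exception):
    """Raised when an expression is not well typed."""


def infer(e) -> str:
    """Return the type of an expression, or raise CheckError."""
    if isinstance(e, IntLit):
        return "int"
    if isinstance(e, Call):
        if e.func not in SIGNATURES:
            raise CheckError(f"unknown function {e.func}")
        param, result = SIGNATURES[e.func]
        actual = infer(e.arg)
        if actual != param:
            raise CheckError(f"{e.func} expects {param}, got {actual}")
        return result
    raise CheckError(f"unknown expression: {e!r}")


print(infer(Call("negate", IntLit(45))))  # int
try:
    infer(Call("sort", IntLit(45)))
except CheckError as err:
    print("type error:", err)  # type error: sort expects list, got int
```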
Usually, some errors are left to run time:
The ones above are the main phases. There can be many more, for instance,
Desugaring/normalization: remove syntactic sugar
int i, j ; ---> int i ; int j ;
This is normally done at the syntax tree level.
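A sketch of this desugaring step on a syntax tree, in Python. The `Decl` class and the `desugar` function are hypothetical names for this illustration. A declaration with several variables is replaced by one declaration per variable, and all other statements pass through unchanged.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Decl:
    type_name: str
    names: Tuple[str, ...]


def desugar(statements):
    """Split multi-variable declarations into single-variable ones."""
    result = []
    for stmt in statements:
        if isinstance(stmt, Decl) and len(stmt.names) > 1:
            result.extend(Decl(stmt.type_name, (name,)) for name in stmt.names)
        else:
            result.append(stmt)
    return result


# int i, j ;  --->  int i ; int j ;
print(desugar([Decl("int", ("i", "j"))]))
```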
Optimizations:
    i = 2 + 2 ;              --->  i = 4 ;
    bipush 31 ; bipush 31    --->  bipush 31 ; dup
This can happen on many different levels.
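Both example optimizations can be sketched in Python, one at the syntax tree level and one at the instruction level. The names `fold` and `peephole` and the tree classes are assumptions for this illustration. `fold` performs constant folding on additions, and `peephole` replaces a repeated `bipush` of the same constant with `dup`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def fold(e):
    """Constant folding: evaluate additions whose operands are both constants."""
    if isinstance(e, Add):
        left, right = fold(e.left), fold(e.right)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(left.value + right.value)
        return Add(left, right)
    return e


def peephole(code):
    """Replace a bipush that repeats the previous instruction with dup."""
    result = []
    for instruction in code:
        if result and instruction.startswith("bipush ") and result[-1] == instruction:
            result.append("dup")
        else:
            result.append(instruction)
    return result


print(fold(Add(Num(2), Num(2))))            # Num(value=4)
print(peephole(["bipush 31", "bipush 31"]))  # ['bipush 31', 'dup']
```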
Write regular expressions and implement lexers
Write grammars and implement parsers
Write typing rules and implement type checkers
Write semantic rules and implement interpreters
Write compilation schemes and implement code generators
Syntax-directed translation is a tool used in type checkers, interpreters, and compilers alike.
We learn the basic ideas of this for both imperative and functional languages.
Special-purpose language/domain-specific language - the ultimate abstraction for a domain.
Often a language grows out of a "notation" in an uncontrolled fashion (e.g. make).
It can be better to think in terms of a language from the beginning.
The BNF Converter is a tool that helps to implement a language. It is available for C, C++, C#, Haskell, Java, and OCaml.
You will find that language implementation is easy and productive.
We go through the course web page, including