Lecture 1: Introduction
Programming Languages Course
Aarne Ranta (aarne@chalmers.se)

%!target:html
%!postproc(html): #NEW

Book: Chapter 1

#NEW

==This course is about programming languages==

Implementation:
- compilers
- interpreters
- tools

Theory:
- language structures
- semantics

Theory of implementation:
- regular expressions, grammars, and parsing
- type systems
- syntax-directed translation

#NEW

==What you will learn==

To write programming language implementations
- parsers
- interpreters
- (some quite simple) compilers

To design your own languages
- a powerful programming technique: domain-specific languages
- useful to know the "design space"
- confidence: this is not so difficult!

Moreover, a prerequisite for some other courses
- Compiler Construction
- (useful for) Frontiers of Programming Language Technology
- (useful for) Models of Computation
- (useful for) Programming Paradigms
- ...

#NEW

==History of programming languages==

1940s: connecting wires to represent 0's and 1's

1950s: assemblers, macro assemblers, FORTRAN, COBOL, LISP

1960s: ALGOL, BCPL (-> B -> C), SIMULA

1970s: Prolog, ML

1980s: C++, Perl, Python

1990s: Haskell, Java

Evolution: from lower to higher levels

"Generations": see book, p. 13

#NEW

==Language levels==

```
---------------------------------------------- human
natural language
Haskell  Lisp  Prolog
Java
C
assembler
machine language
---------------------------------------------- machine
```

#NEW

==What is compilation==

In a way, compilation reverses the history of programming languages.

Source language:
```
5 * (9 + 12)
```

Assembly language (Intel x86):
```
mov eax,9
mov ebx,12
add eax,ebx
mov ebx,5
mul ebx
```

Machine language (in hex, approximately...):
```
B8 09 00 BB 0C 00 03 C3 BB 05 00 F7 03
```

A **compiler** translates source language to assembly language.

An **assembler** translates assembly language to machine language.

But often the whole chain from source to machine is called compilation.
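To make the translation step concrete, here is a minimal sketch in Python (not necessarily the course's lab language) that compiles an expression tree for ``5 * (9 + 12)`` into code for a made-up stack machine; the ``Num``/``BinOp`` classes and the instruction names are assumptions for illustration, not a real instruction set.

```
# Sketch: syntax-directed compilation of arithmetic into stack-machine code.
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str          # '+' or '*'
    left: object
    right: object


def compile_expr(e):
    """Emit postfix stack-machine code for an expression tree."""
    if isinstance(e, Num):
        return [("push", e.value)]
    # Compile both operands first; their results end up on the stack,
    # then the operator instruction consumes them.
    code = compile_expr(e.left) + compile_expr(e.right)
    return code + ([("add",)] if e.op == "+" else [("mul",)])


# 5 * (9 + 12)
tree = BinOp("*", Num(5), BinOp("+", Num(9), Num(12)))
print(compile_expr(tree))
# [('push', 5), ('push', 9), ('push', 12), ('add',), ('mul',)]
```

Note how the operand order in the output mirrors the x86 example above: the operands are pushed before the operation that combines them.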
#NEW

==What is interpretation==

Source language expression:
```
5 * (9 + 12)
```

Value:
```
105
```

An **interpreter** computes the value without translating the source code into machine code.

#NEW

==Compilation + interpretation==

It is common to compile into something other than machine language, and then interpret that language.

Example: Java to JVM (Java Virtual Machine).

Java source code:
```
5 * (9 + 12)
```

JVM assembly code (Jasmin), and corresponding byte code (hex, binary):
```
bipush 9     10 09   0001 0000 0000 1001
bipush 12    10 0C   0001 0000 0000 1100
iadd         60      0110 0000
bipush 5     10 05   0001 0000 0000 0101
imul         68      0110 1000
```

The value is obtained by an interpreter of the bytecode.

Key to JVM code:

|| operation | hex code | arguments | semantics ||
| ``bipush`` | 0x10 | byte | push constant
| ``iadd``   | 0x60 | -    | add
| ``imul``   | 0x68 | -    | multiply
| ``isub``   | 0x64 | -    | subtract

#NEW

==Compiled and interpreted languages==

Compiled vs. interpreted languages? This is a misnomer: any language can have both an interpreter and a compiler.

The division refers to the usual tools:
- C is usually compiled to machine code by GCC
- Java is usually compiled to JVM bytecode by Javac, and this bytecode is usually interpreted using JVM
- JavaScript is interpreted in web browsers
- Unix shell scripts are interpreted by the shell
- Haskell programs are either compiled using GHC, or interpreted (via bytecode) using Hugs or GHCi

#NEW

==Trade-offs==

**+** Interpretation:
- faster to get going
- easier to implement
- portable

**+** Compilation:
- resulting code executes faster
- machine-independent target code is easier to interpret/compile on new machines

The advent of virtual machines such as VMware is blurring the distinction.
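The bytecode interpreter mentioned above can be sketched in a few lines of Python; encoding JVM-like instructions as tuples mimicking ``bipush``/``iadd``/``imul``/``isub`` is an assumption for illustration, not how a real JVM represents code.

```
# Sketch: interpreting stack-machine code directly, without further translation.
def run(code):
    """Execute a list of stack-machine instructions and return the result."""
    stack = []
    for instr in code:
        if instr[0] == "bipush":
            stack.append(instr[1])      # push a constant
        elif instr[0] == "iadd":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif instr[0] == "imul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif instr[0] == "isub":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
    return stack.pop()


# The Jasmin code for 5 * (9 + 12):
print(run([("bipush", 9), ("bipush", 12), ("iadd",),
           ("bipush", 5), ("imul",)]))   # 105
```

Combining this with the compiler sketch from the compilation section gives the full compile-then-interpret pipeline of the Java/JVM example.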
#NEW

==Compiler phases and data structures==

```
  2 * (31+result)        character stream
        |
      lexer
        v
  2 * ( 31 + result )    token list
        |
      parser
        v
  * 2 (+ 31 result)      syntax tree
        |
   type checker
        v
  i* 2 (i+ 31 i-result)  annotated syntax tree
        |
  code generator
        v
  bipush 31              assembly code
  iload_1
  iadd
  iconst_2
  imul
```

#NEW

==Compiler errors==

Each compiler phase can fail with a characteristic error.

Lexer errors:
```
"hello
```

Parse errors:
```
(4 * (y + 5) - 12))
```

Type errors:
```
sort(45)
```

Errors in later phases are not commonly supported.

A good compiler finds an error at the earliest possible phase.

Usually, some errors are left to run time:
- array index out of bounds
- bugs

#NEW

==More compiler phases==

The ones above are the main phases. There can be many more, for instance:

Desugaring/normalization: remove syntactic sugar
```
int i, j ;   --->   int i ; int j ;
```
This is normally done at the syntax tree level.

Optimizations:
```
i = 2 + 2 ;            --->   i = 4 ;
bipush 31 ; bipush 31  --->   bipush 31 ; dup
```
These can happen at many different levels.

#NEW

==What we learn in this course (in more detail)==

Write regular expressions and implement lexers.

Write grammars and implement parsers.

Write typing rules and implement type checkers.

Write semantic rules and implement interpreters.

Write compilation schemes and implement code generators.

**Syntax-directed translation** is a technique used in type checkers, interpreters, and compilers alike.

We learn the basic ideas of all this for both imperative and functional languages.

#NEW

==Language as a programming technique==

A special-purpose language/domain-specific language is the ultimate abstraction for a domain.

Often a language grows out of a "notation" in an uncontrolled fashion (e.g. ``make``).

It can be better to think in terms of a language from the beginning.

The **BNF Converter** is a tool that helps to implement a language. It is available for C, C++, C#, Haskell, Java, and OCaml.

You will experience that language implementation is easy and productive.
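An optimization like ``i = 2 + 2 ---> i = 4`` can be sketched as a constant-folding pass over the syntax tree; the tuple-based tree with ``("num", n)``, ``("var", x)`` and ``("add"/"mul", l, r)`` nodes is an assumption for illustration.

```
# Sketch: constant folding as a syntax-tree-level optimization pass.
def fold(e):
    """Recursively replace operations on known constants by their values."""
    if e[0] in ("num", "var"):
        return e                       # leaves are already folded
    op, l, r = e[0], fold(e[1]), fold(e[2])
    if l[0] == "num" and r[0] == "num":
        # Both operands are constants: compute the result at compile time.
        return ("num", l[1] + r[1] if op == "add" else l[1] * r[1])
    return (op, l, r)                  # otherwise keep the node, operands folded


# 2 + 2 is folded to 4 ...
print(fold(("add", ("num", 2), ("num", 2))))   # ('num', 4)
# ... while non-constant parts are preserved: x * (3 + 4) becomes x * 7
print(fold(("mul", ("var", "x"), ("add", ("num", 3), ("num", 4)))))
```

Running such passes on the tree, before code generation, is one instance of the "many different levels" at which optimization can happen.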
#NEW

==The structure of the course==

We go through the [course web page ../index.html], including
- the lecture schedule
- the lab assignments and their deadlines
- lab supervision
- literature
- exercises and extra credits
- the exam