Lecture 1: Introduction
Programming Languages Course
Aarne Ranta (aarne@chalmers.se)

%!target:html
%!postproc(html): #NEW

Book: Chapter 1

#NEW

==This course is about programming languages==

Implementation:
- compilers
- interpreters
- tools

Theory:
- language structures
- semantics

Theory of implementation:
- regular expressions, grammars, and parsing
- type systems
- syntax-directed translation

#NEW

==What you will learn==

To write programming language implementations
- parsers
- interpreters
- (some quite simple) compilers

To design your own languages
- a powerful programming technique: domain-specific languages
- useful to know the "design space"
- confidence: this is not so difficult!

Moreover, a prerequisite for some other courses
- Compiler Construction
- (useful for) Frontiers of Programming Language Technology
- (useful for) Models of Computation
- (useful for) Programming Paradigms
- ...

#NEW

==History of programming languages==

1940s: connecting wires to represent 0's and 1's

1950s: assemblers, macro assemblers, FORTRAN, COBOL, LISP

1960s: ALGOL, BCPL (-> B -> C), SIMULA

1970s: Prolog, ML

1980s: C++, Perl, Python

1990s: Haskell, Java

Evolution: from lower to higher levels

"Generations": see book, p. 13

#NEW

==Language levels==

```
---------------------------------------------- human
natural language
Haskell  Lisp  Prolog
Java
C
assembler
machine language
---------------------------------------------- machine
```

#NEW

==What is compilation==

In a way, compilation reverses the history of programming languages.

Source language:
```
5 * (9 + 12)
```

Assembly language (Intel x86):
```
mov eax,9
mov ebx,12
add eax,ebx
mov ebx,5
mul ebx
```

Machine language (in hex, approximately...):
```
B8 09 00 BB 0C 00 03 C3 BB 05 00 F7 03
```

A **compiler** translates source language to assembly language.

An **assembler** translates assembly language to machine language.

But often the whole chain from source to machine is called compilation.
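To make the translation step concrete, here is a minimal sketch in Python (not necessarily the course's lab language) that compiles an expression tree for ``5 * (9 + 12)`` into code for a made-up stack machine; the ``Num``/``BinOp`` classes and the instruction names are assumptions for illustration, not a real instruction set.

```
# Sketch: syntax-directed compilation of arithmetic into stack-machine code.
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str          # '+' or '*'
    left: object
    right: object


def compile_expr(e):
    """Emit postfix stack-machine code for an expression tree."""
    if isinstance(e, Num):
        return [("push", e.value)]
    # Compile both operands first; their results end up on the stack,
    # then the operator instruction consumes them.
    code = compile_expr(e.left) + compile_expr(e.right)
    return code + ([("add",)] if e.op == "+" else [("mul",)])


# 5 * (9 + 12)
tree = BinOp("*", Num(5), BinOp("+", Num(9), Num(12)))
print(compile_expr(tree))
# [('push', 5), ('push', 9), ('push', 12), ('add',), ('mul',)]
```

Note how the operand order in the output mirrors the x86 example above: the operands are pushed before the operation that combines them.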
#NEW

==What is interpretation==

Source language expression:
```
5 * (9 + 12)
```

Value:
```
105
```

An **interpreter** computes the value without translating the source code into machine code.

#NEW

==Compilation + interpretation==

It is common to compile into something other than machine language, and then interpret that language.

Example: Java to JVM (Java Virtual Machine).

Java source code:
```
5 * (9 + 12)
```

JVM assembly code (Jasmin), and corresponding byte code (hex, binary):
```
bipush 9     10 09   0001 0000 0000 1001
bipush 12    10 0C   0001 0000 0000 1100
iadd         60      0110 0000
bipush 5     10 05   0001 0000 0000 0101
imul         68      0110 1000
```

The value is obtained by an interpreter of the bytecode.

Key to JVM code:

|| operation | hex code | arguments | semantics ||
| ``bipush`` | 0x10 | byte | push constant
| ``iadd``   | 0x60 | -    | add
| ``imul``   | 0x68 | -    | multiply
| ``isub``   | 0x64 | -    | subtract

#NEW

==Compiled and interpreted languages==

Compiled vs. interpreted languages? This is a misnomer: any language can have both an interpreter and a compiler.

The division refers to the usual tools:
- C is usually compiled to machine code by GCC
- Java is usually compiled to JVM bytecode by Javac, and this bytecode is usually interpreted using JVM
- JavaScript is interpreted in web browsers
- Unix shell scripts are interpreted by the shell
- Haskell programs are either compiled using GHC, or interpreted (via bytecode) using Hugs or GHCi

#NEW

==Trade-offs==

**+** Interpretation:
- faster to get going
- easier to implement
- portable

**+** Compilation:
- resulting code executes faster
- machine-independent target code is easier to interpret/compile on new machines

The advent of virtual machines such as VMware is blurring the distinction.
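The bytecode interpreter mentioned above can be sketched in a few lines of Python; encoding JVM-like instructions as tuples mimicking ``bipush``/``iadd``/``imul``/``isub`` is an assumption for illustration, not how a real JVM represents code.

```
# Sketch: interpreting stack-machine code directly, without further translation.
def run(code):
    """Execute a list of stack-machine instructions and return the result."""
    stack = []
    for instr in code:
        if instr[0] == "bipush":
            stack.append(instr[1])      # push a constant
        elif instr[0] == "iadd":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif instr[0] == "imul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif instr[0] == "isub":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
    return stack.pop()


# The Jasmin code for 5 * (9 + 12):
print(run([("bipush", 9), ("bipush", 12), ("iadd",),
           ("bipush", 5), ("imul",)]))   # 105
```

Combining this with the compiler sketch from the compilation section gives the full compile-then-interpret pipeline of the Java/JVM example.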
#NEW

==Compiler phases and data structures==

```
  2 * (31+result)        character stream
        |
      lexer
        v
  2 * ( 31 + result )    token list
        |
      parser
        v
  * 2 (+ 31 result)      syntax tree
        |
   type checker
        v
  i* 2 (i+ 31 i-result)  annotated syntax tree
        |
  code generator
        v
  bipush 31              assembly code
  iload_1
  iadd
  iconst_2
  imul
```

#NEW

==Compiler errors==

Each compiler phase can fail with a characteristic error.

Lexer errors:
```
"hello
```

Parse errors:
```
(4 * (y + 5) - 12))
```

Type errors:
```
sort(45)
```

Errors in later phases are not commonly supported.

A good compiler finds an error at the earliest possible phase.

Usually, some errors are left to run time:
- array index out of bounds
- bugs

#NEW

==More compiler phases==

The ones above are the main phases. There can be many more, for instance:

Desugaring/normalization: remove syntactic sugar
```
int i, j ;   --->   int i ; int j ;
```
This is normally done at the syntax tree level.

Optimizations:
```
i = 2 + 2 ;            --->   i = 4 ;
bipush 31 ; bipush 31  --->   bipush 31 ; dup
```
These can happen at many different levels.

#NEW

==What we learn in this course (in more detail)==

Write regular expressions and implement lexers.

Write grammars and implement parsers.

Write typing rules and implement type checkers.

Write semantic rules and implement interpreters.

Write compilation schemes and implement code generators.

**Syntax-directed translation** is a technique used in type checkers, interpreters, and compilers alike.

We learn the basic ideas of all this for both imperative and functional languages.

#NEW

==Language as a programming technique==

A special-purpose language/domain-specific language is the ultimate abstraction for a domain.

Often a language grows out of a "notation" in an uncontrolled fashion (e.g. ``make``).

It can be better to think in terms of a language from the beginning.

The **BNF Converter** is a tool that helps to implement a language. It is available for C, C++, C#, Haskell, Java, and OCaml.

You will experience that language implementation is easy and productive.
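An optimization like ``i = 2 + 2 ---> i = 4`` can be sketched as a constant-folding pass over the syntax tree; the tuple-based tree with ``("num", n)``, ``("var", x)`` and ``("add"/"mul", l, r)`` nodes is an assumption for illustration.

```
# Sketch: constant folding as a syntax-tree-level optimization pass.
def fold(e):
    """Recursively replace operations on known constants by their values."""
    if e[0] in ("num", "var"):
        return e                       # leaves are already folded
    op, l, r = e[0], fold(e[1]), fold(e[2])
    if l[0] == "num" and r[0] == "num":
        # Both operands are constants: compute the result at compile time.
        return ("num", l[1] + r[1] if op == "add" else l[1] * r[1])
    return (op, l, r)                  # otherwise keep the node, operands folded


# 2 + 2 is folded to 4 ...
print(fold(("add", ("num", 2), ("num", 2))))   # ('num', 4)
# ... while non-constant parts are preserved: x * (3 + 4) becomes x * 7
print(fold(("mul", ("var", "x"), ("add", ("num", 3), ("num", 4)))))
```

Running such passes on the tree, before code generation, is one instance of the "many different levels" at which optimization can happen.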
#NEW

==The structure of the course==

We go through the [course web page ../index.html], including
- the lecture schedule
- the lab assignments and their deadlines
- lab supervision
- literature
- exercises and extra credits
- the exam