Lecture 1: Introduction

Programming Languages Course
Aarne Ranta (aarne@chalmers.se)

Book: Chapter 1

This course is about programming languages

Implementation:

compilers
interpreters
tools

Theory:

language structures
semantics

Theory of implementation:

regular expressions, grammars, and parsing
type systems
syntax-directed translation

What you will learn

To write programming language implementations

parsers
interpreters
(some quite simple) compilers

To design your own languages

a powerful programming technique: domain-specific languages
useful to know the "design space"
confidence: this is not so difficult!

Moreover, a prerequisite for some other courses

Compiler Construction
(useful for) Frontiers of Programming Language Technology
(useful for) Models of Computation
(useful for) Programming Paradigms
...

History of programming languages

1940's: connecting wires to represent 0's and 1's

1950's: assemblers, macro assemblers, FORTRAN, COBOL, LISP

1960's: ALGOL, BCPL (-> B -> C), SIMULA

1970's: Prolog, ML

1980's: C++, Perl, Python

1990's: Haskell, Java

Evolution: from lower to higher levels

"Generations": see book, p. 13

Language levels

  ---------------------------------------------- human
    natural language
  
  
                  Haskell
  
                            Lisp     Prolog
  
    Java
  
    C
  
  
    assembler
  
    machine language
  ---------------------------------------------- machine

What is compilation

In a way, compilation reverses the history of programming languages: it translates from higher levels back to lower levels.

Source language:

    5 * (9 + 12)

Assembly language (Intel x86):

    mov eax,9
    mov ebx,12
    add eax,ebx
    mov ebx,5
    mul ebx

Machine language (in Hex, approximately...):

A compiler translates source language to assembly language.

An assembler translates assembly language to machine language.

But often the whole chain from source to machine is called compilation.

What is interpretation

Source language expression:

    5 * (9 + 12)

Value:

    105

An interpreter computes the value without translating the source code into machine code.
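As a minimal sketch of this idea, the following Python code evaluates the example expression by recursion over its syntax tree, without producing any machine code. The class names `Num`, `Add`, `Mul` and the function `eval_exp` are hypothetical names chosen for illustration; they do not come from the course material.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: "Exp"
    right: "Exp"


@dataclass
class Mul:
    left: "Exp"
    right: "Exp"


Exp = Union[Num, Add, Mul]


def eval_exp(e: Exp) -> int:
    """Compute the value of an expression by recursion on its tree."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Add):
        return eval_exp(e.left) + eval_exp(e.right)
    if isinstance(e, Mul):
        return eval_exp(e.left) * eval_exp(e.right)
    raise TypeError(f"unknown expression: {e!r}")


# The syntax tree of 5 * (9 + 12), built by hand (no parser here).
example = Mul(Num(5), Add(Num(9), Num(12)))
print(eval_exp(example))  # prints 105
```

The tree is constructed by hand here; in a full interpreter it would be the output of a parser.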

Compilation + interpretation

It is common to compile into something other than machine language, and then interpret that language.

Example: Java to JVM (Java Virtual Machine).

Java source code:

    5 * (9 + 12)

JVM assembly code (Jasmin), and the corresponding bytecode (hex, binary):

    bipush 9      10 09       0001 0000 0000 1001
    bipush 12     10 0C       0001 0000 0000 1100
    iadd          60          0110 0000
    bipush 5      10 05       0001 0000 0000 0101
    imul          68          0110 1000

The value is obtained by an interpreter of the bytecode.

Key to JVM code:

operation	hex code	arguments	semantics
`bipush`	0x10	byte	push constant
`iadd`	0x60	-	add
`imul`	0x68	-	multiply
`isub`	0x64	-	subtract
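To make the table concrete, here is a small sketch of a bytecode interpreter in Python that understands only the four instructions listed above. The function name `run` is hypothetical, and the sketch simplifies the real JVM: it uses unbounded Python integers instead of 32-bit arithmetic and does no verification of the code.

```python
# Opcodes taken from the table above.
BIPUSH, IADD, ISUB, IMUL = 0x10, 0x60, 0x64, 0x68


def run(code: bytes) -> int:
    """Interpret a byte sequence on an operand stack; return the top value."""
    stack = []
    pc = 0  # index of the next instruction
    while pc < len(code):
        op = code[pc]
        if op == BIPUSH:
            # bipush has one argument: a signed byte.
            arg = code[pc + 1]
            stack.append(arg - 256 if arg > 127 else arg)
            pc += 2
        elif op in (IADD, ISUB, IMUL):
            # Binary operations pop two operands and push the result.
            y = stack.pop()
            x = stack.pop()
            if op == IADD:
                stack.append(x + y)
            elif op == ISUB:
                stack.append(x - y)
            else:
                stack.append(x * y)
            pc += 1
        else:
            raise ValueError(f"unsupported opcode 0x{op:02X} at offset {pc}")
    return stack.pop()


# The bytecode of the example: bipush 9, bipush 12, iadd, bipush 5, imul.
program = bytes.fromhex("10 09 10 0C 60 10 05 68")
print(run(program))  # prints 105
```

Note that the operands of a subtraction are popped in reverse order: the second value pushed is subtracted from the first.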

Compiled and interpreted languages

Compiled vs. interpreted languages?

This is a misnomer: any language can have both an interpreter and a compiler.

The division refers to the usual tools:

C is usually compiled to machine code by GCC
Java is usually compiled to JVM bytecode by javac, and this bytecode is usually interpreted by the JVM
JavaScript is interpreted in web browsers
Unix shell scripts are interpreted by the shell
Haskell programs are either compiled using GHC, or interpreted (via bytecode) using Hugs or GHCi

Trade-offs

+ Interpretation

faster to get going
easier to implement
portable

+ Compilation

faster to execute resulting code
machine-independent target code easier to interpret/compile on new machines

The advent of virtual machines such as the JVM, which interpret compiled bytecode, is blurring the distinction.

Compiler phases and data structures

    2 * (31+result)                     character stream
       
       |                  lexer
       v
  
    2 * ( 31 + result )                 token list
       
       |                  parser
       v
  
    * 2 (+ 31 result)                   syntax tree
       
       |                  type checker
       v
  
    i* 2 (i+ 31 i-result)               annotated syntax tree
       
       |                  code generator
       v
  
    bipush 31                           assembly code
    iload_1
    iadd 
    iconst_2 
    imul
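The pipeline above can be sketched in Python as three functions, one per phase. This is an illustration under simplifying assumptions, not the course's implementation: the names `lex`, `parse`, and `compile_exp` are hypothetical, the language has only `+`, `*`, parentheses, integers, and variables, the type checker is skipped (everything is assumed to be an integer), and variable addresses are supplied by hand.

```python
import re

# One token: an integer, an identifier, or one of ( ) + *
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[()+*])")


def lex(source: str) -> list:
    """Character stream -> token list."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise SyntaxError(f"lexer error at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(tokens: list):
    """Token list -> syntax tree: an int, a variable name, or (op, left, right)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def atom():
        nonlocal pos
        tok = peek()
        if tok == "(":
            pos += 1
            tree = exp()
            if peek() != ")":
                raise SyntaxError(f"parse error: expected ')', found {peek()!r}")
            pos += 1
            return tree
        if tok is not None and tok.isdigit():
            pos += 1
            return int(tok)
        if tok is not None and (tok[0].isalpha() or tok[0] == "_"):
            pos += 1
            return tok
        raise SyntaxError(f"parse error: unexpected {tok!r}")

    def term():
        nonlocal pos
        tree = atom()
        while peek() == "*":
            pos += 1
            tree = ("*", tree, atom())
        return tree

    def exp():
        nonlocal pos
        tree = term()
        while peek() == "+":
            pos += 1
            tree = ("+", tree, term())
        return tree

    tree = exp()
    if peek() is not None:
        raise SyntaxError(f"parse error: unexpected {peek()!r}")
    return tree


def compile_exp(tree, env: dict) -> list:
    """Syntax tree -> JVM-style assembly; env maps variables to addresses."""
    if isinstance(tree, int):
        return [f"bipush {tree}"]  # only valid for values that fit in a byte
    if isinstance(tree, str):
        return [f"iload {env[tree]}"]
    op, left, right = tree
    instr = {"+": "iadd", "*": "imul"}[op]
    return compile_exp(left, env) + compile_exp(right, env) + [instr]


tokens = lex("2 * (31+result)")
tree = parse(tokens)
code = compile_exp(tree, {"result": 1})
print(tokens)  # ['2', '*', '(', '31', '+', 'result', ')']
print(tree)    # ('*', 2, ('+', 31, 'result'))
print(code)    # ['bipush 2', 'bipush 31', 'iload 1', 'iadd', 'imul']
```

The generated code differs slightly from the diagram: this sketch always emits the left operand first and uses `bipush` for every constant, whereas the diagram pushes the constant 2 last and uses the shorter `iconst_2`. Both sequences compute the same value.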

Compiler errors

Each compiler phase can fail with a characteristic error.

Lexer errors

   "hello

Parse errors

    (4 * (y + 5) - 12))

Type errors

    sort(45)

Errors in later phases, such as code generation, are not commonly supported.

A good compiler finds an error at the earliest occasion.

Usually, some errors are left to run time:

array index out of bounds
bugs

More compiler phases

The ones above are the main phases. There can be many more, for instance:

Desugaring/normalization: remove syntactic sugar

    int i, j ;   --->  int i ; int j ;

This is normally done at the syntax tree level.
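A minimal sketch of this desugaring step in Python, assuming a hypothetical tree representation in which a declaration is the tuple `("decl", type, [names])` and a statement list is a Python list. Neither the representation nor the function name `desugar` comes from the course material.

```python
def desugar(stms: list) -> list:
    """Split every multi-variable declaration into single-variable ones."""
    out = []
    for stm in stms:
        if stm[0] == "decl":
            _, typ, names = stm
            # int i, j ;  --->  int i ; int j ;
            out.extend(("decl", typ, [name]) for name in names)
        else:
            # All other statements are left unchanged.
            out.append(stm)
    return out


print(desugar([("decl", "int", ["i", "j"])]))
# prints [('decl', 'int', ['i']), ('decl', 'int', ['j'])]
```

After this step, later phases only have to handle declarations of a single variable.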

Optimizations:

    i = 2 + 2 ;            --->  i = 4 ;
  
    bipush 31 ; bipush 31  --->  bipush 31 ; dup

This can happen on many different levels.
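The two optimizations above work on different levels, which the following Python sketch illustrates: `fold` performs constant folding on a syntax tree, and `peephole` rewrites a list of assembly instructions. Both function names are hypothetical, the tree representation (an int, a variable name, or a tuple `(op, left, right)`) is an assumption made for this example, and the folding ignores the 32-bit overflow behavior that a real compiler for Java would have to respect.

```python
import operator

# The operators this sketch knows how to evaluate at compile time.
OPS = {"+": operator.add, "*": operator.mul}


def fold(tree):
    """Constant folding: replace operations on two constants by their value."""
    if not isinstance(tree, tuple):
        return tree  # a constant or a variable name
    op, left, right = tree
    left, right = fold(left), fold(right)
    if op in OPS and isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)
    return (op, left, right)


def peephole(code: list) -> list:
    """Replace a bipush that repeats the previous instruction by dup."""
    out = []
    for instr in code:
        if out and instr == out[-1] and instr.startswith("bipush "):
            out.append("dup")
        else:
            out.append(instr)
    return out


print(fold(("+", 2, 2)))                    # prints 4
print(fold(("*", ("+", 1, 2), "x")))        # prints ('*', 3, 'x')
print(peephole(["bipush 31", "bipush 31"])) # prints ['bipush 31', 'dup']
```

Subtrees that contain variables are left in place, since their values are not known until run time.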

What we learn in this course (in more detail)

Write regular expressions and implement lexers

Write grammars and implement parsers

Write typing rules and implement type checkers

Write semantic rules and implement interpreters

Write compilation schemes and implement code generators

Syntax-directed translation is a tool used in type checkers, interpreters, and compilers alike.

We learn the basic ideas of this for both imperative and functional languages.

Language as a programming technique

Special-purpose language/domain-specific language - the ultimate abstraction for a domain.

Often a language grows out of a "notation" in an uncontrolled fashion (e.g. make).

It can be better to think in terms of a language from the beginning.

The BNF Converter is a tool that helps to implement a language. It is available for C, C++, C#, Haskell, Java, and OCaml.

You will find that language implementation is easy and productive.

The structure of the course

We go through the course web page, including

the lecture schedule
the lab assignments and their deadlines
lab supervision
literature
exercises and extra credits
exam