Lecture 1: Introduction

Programming Languages Course
Aarne Ranta (aarne@chalmers.se)

Book: Chapter 1

This course is about programming languages

Implementation:

Theory:

Theory of implementation:

What you will learn

To write programming language implementations

To design your own languages

Moreover, the course is a prerequisite for some other courses

History of programming languages

1940's: connecting wires to represent 0's and 1's

1950's: assemblers, macro assemblers, FORTRAN, COBOL, LISP

1960's: ALGOL, BCPL (-> B -> C), SIMULA

1970's: Prolog, ML

1980's: C++, Perl, Python

1990's: Haskell, Java

Evolution: from lower to higher levels

"Generations": see book, p. 13

Language levels

  ---------------------------------------------- human
    natural language
  
  
                  Haskell
  
                            Lisp     Prolog
  
    Java
  
    C
  
  
    assembler
  
    machine language
  ---------------------------------------------- machine

What is compilation

In a way, compilation reverses the history of programming languages: it translates from higher levels back to lower ones.

Source language:

    5 * (9 + 12)

Assembly language (Intel x86):

    mov eax,9
    mov ebx,12
    add eax,ebx
    mov ebx,5
    mul ebx

Machine language (in Hex, approximately...):

    B8 09 00
    BB 0C 00
    03 C3 
    BB 05 00
    F7 E3

A compiler translates source language to assembly language.

An assembler translates assembly language to machine language.

But often the whole chain from source to machine is called compilation.

What is interpretation

Source language expression:

    5 * (9 + 12)

Value:

    105

An interpreter computes the value without translating the source code into machine code.
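As an illustration, here is a minimal sketch of such an interpreter in Python. It assumes the expression has already been parsed into a tree; the class names `Num`, `Add`, `Mul` and the function `evaluate` are made up for this example and do not come from the book.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


@dataclass
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add, Mul]


def evaluate(expr: Expr) -> int:
    """Compute the value of a tree by recursion on its structure."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Add):
        return evaluate(expr.left) + evaluate(expr.right)
    if isinstance(expr, Mul):
        return evaluate(expr.left) * evaluate(expr.right)
    raise TypeError(f"unknown expression: {expr!r}")


# The tree for 5 * (9 + 12)
example = Mul(Num(5), Add(Num(9), Num(12)))
print(evaluate(example))  # 105
```

No machine code is produced at any point: the value comes directly from walking the tree.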

Compilation + interpretation

It is common to compile into something other than machine language, and then interpret this language.

Example: Java to JVM (Java Virtual Machine).

Java source code:

    5 * (9 + 12)

JVM assembly code (Jasmin), and corresponding byte code (Hex, binary):

    bipush 9      10 09       0001 0000 0000 1001
    bipush 12     10 0C       0001 0000 0000 1100
    iadd          60          0110 0000
    bipush 5      10 05       0001 0000 0000 0101
    imul          68          0110 1000

The value is obtained by an interpreter of the bytecode.

Key to JVM code:

    operation   hex code   arguments   semantics
    bipush      0x10       byte        push constant
    iadd        0x60       -           add
    imul        0x68       -           multiply
    isub        0x64       -           subtract
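A bytecode interpreter for these four instructions can be sketched in a few lines of Python. This is only an illustration of the stack discipline, not the real JVM: the function name `run` is hypothetical, and 32-bit overflow, local variables, and all other opcodes are ignored.

```python
import operator

# Opcodes for the binary operations listed in the key above
BINARY_OPS = {0x60: operator.add, 0x64: operator.sub, 0x68: operator.mul}
BIPUSH = 0x10


def run(code: bytes) -> int:
    """Interpret a byte sequence with an operand stack; return the top value."""
    stack = []
    pc = 0  # program counter: index of the next byte to read
    while pc < len(code):
        op = code[pc]
        if op == BIPUSH:
            # The byte after the opcode is a signed 8-bit constant
            value = code[pc + 1]
            stack.append(value - 256 if value > 127 else value)
            pc += 2
        elif op in BINARY_OPS:
            # Pop the right operand first, then the left one
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[op](left, right))
            pc += 1
        else:
            raise ValueError(f"unsupported opcode 0x{op:02X}")
    return stack.pop()


# bipush 9, bipush 12, iadd, bipush 5, imul
program = bytes([0x10, 0x09, 0x10, 0x0C, 0x60, 0x10, 0x05, 0x68])
print(run(program))  # 105
```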

Compiled and interpreted languages

Compiled vs. interpreted languages?

This is a misnomer: any language can have both an interpreter and a compiler.

The division refers to the usual tools:

Trade-offs

+ Interpretation

+ Compilation

The advent of virtual machines such as the JVM is blurring the distinction.

Compiler phases and data structures

    2 * (31+result)                     character stream
       
       |                  lexer
       v
  
    2 * ( 31 + result )                 token list
       
       |                  parser
       v
  
    * 2 (+ 31 result)                   syntax tree
       
       |                  type checker
       v
  
    i* 2 (i+ 31 i-result)               annotated syntax tree
       
       |                  code generator
       v
  
    bipush 31                           assembly code
    iload_1
    iadd 
    iconst_2 
    imul
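The pipeline above can be sketched in Python for expressions with `+`, `*`, integers, and variables. This is a simplified illustration under several assumptions: the type checker phase is skipped (everything is treated as an integer), syntax trees are plain tuples, and the names `lex`, `parse`, and `compile_exp` are invented for this example.

```python
import re

# One token: an integer, an identifier, or one of ( ) + *
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[()+*])")


def lex(source):
    """Character stream -> token list."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"lexer error at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Token list -> syntax tree, by recursive descent."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        if token is None:
            raise SyntaxError("parse error: unexpected end of input")
        pos += 1
        return token

    def factor():  # Factor ::= integer | identifier | "(" Exp ")"
        token = advance()
        if token == "(":
            tree = exp()
            if advance() != ")":
                raise SyntaxError("parse error: expected ')'")
            return tree
        if token.isdigit():
            return ("int", int(token))
        if token[0].isalpha() or token[0] == "_":
            return ("var", token)
        raise SyntaxError(f"parse error: unexpected {token!r}")

    def term():  # Term ::= Factor ("*" Factor)*
        tree = factor()
        while peek() == "*":
            advance()
            tree = ("*", tree, factor())
        return tree

    def exp():  # Exp ::= Term ("+" Term)*
        tree = term()
        while peek() == "+":
            advance()
            tree = ("+", tree, term())
        return tree

    tree = exp()
    if peek() is not None:
        raise SyntaxError(f"parse error: unexpected {peek()!r}")
    return tree


def compile_exp(tree, slots):
    """Syntax tree -> JVM-style assembly, operands first, operator last."""
    kind = tree[0]
    if kind == "int":
        return [f"bipush {tree[1]}"]  # only valid for constants -128..127
    if kind == "var":
        return [f"iload {slots[tree[1]]}"]
    opcode = {"+": "iadd", "*": "imul"}[kind]
    return compile_exp(tree[1], slots) + compile_exp(tree[2], slots) + [opcode]


tokens = lex("2 * (31+result)")
tree = parse(tokens)
code = compile_exp(tree, {"result": 1})  # assume result lives in local slot 1
print(code)  # ['bipush 2', 'bipush 31', 'iload 1', 'iadd', 'imul']
```

The generated code differs slightly from the diagram: it pushes the operands in source order and uses the general forms `bipush 2` and `iload 1` rather than the short forms `iconst_2` and `iload_1`. Both versions compute the same value.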

Compiler errors

Each compiler phase can fail with a characteristic error.

Lexer errors

    "hello

Parse errors

    (4 * (y + 5) - 12))

Type errors

    sort(45)

Errors in later phases are usually not supported.

A good compiler finds an error at the earliest occasion.
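To illustrate how a type error such as `sort(45)` is caught before the program runs, here is a minimal type checker sketch in Python. Everything in it is an assumption made for the example: syntax trees are tuples, there are only two types (`int` and `list`), and the signature table with `sort` and `length` is hypothetical.

```python
class TypeCheckError(Exception):
    """Raised when an expression is ill-typed."""


# Hypothetical signatures: function name -> (argument types, result type)
SIGNATURES = {
    "sort": (["list"], "list"),
    "length": (["list"], "int"),
}


def infer(tree):
    """Return the type of a tree, or raise TypeCheckError."""
    kind = tree[0]
    if kind == "int":
        return "int"
    if kind == "list":
        for element in tree[1]:
            if infer(element) != "int":
                raise TypeCheckError("list elements must be int")
        return "list"
    if kind == "call":
        _, name, args = tree
        if name not in SIGNATURES:
            raise TypeCheckError(f"unknown function {name}")
        expected, result = SIGNATURES[name]
        actual = [infer(arg) for arg in args]
        if actual != expected:
            raise TypeCheckError(f"{name} expects {expected}, got {actual}")
        return result
    raise TypeCheckError(f"unknown node {kind}")


# sort([3, 1]) is well-typed
print(infer(("call", "sort", [("list", [("int", 3), ("int", 1)])])))  # list

# sort(45) is rejected without ever being executed
try:
    infer(("call", "sort", [("int", 45)]))
except TypeCheckError as error:
    print("type error:", error)
```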

Usually, some errors are left to run time:

More compiler phases

The ones above are the main phases. There can be many more, for instance,

Desugaring/normalization: remove syntactic sugar

    int i, j ;   --->  int i ; int j ;

This is normally done at the syntax tree level.
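A minimal sketch of this desugaring step in Python, assuming a hypothetical tuple representation in which a declaration is `("decl", type, [names])` and all other statements are passed through unchanged:

```python
def desugar(statements):
    """Split every multi-variable declaration into single declarations."""
    result = []
    for statement in statements:
        if statement[0] == "decl":
            _, type_name, names = statement
            # int i, j ;  --->  int i ; int j ;
            result.extend(("decl", type_name, [name]) for name in names)
        else:
            result.append(statement)
    return result


program = [("decl", "int", ["i", "j"]), ("assign", "i", ("int", 4))]
print(desugar(program))
# [('decl', 'int', ['i']), ('decl', 'int', ['j']), ('assign', 'i', ('int', 4))]
```

After this pass, later phases only need to handle declarations of a single variable.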

Optimizations:

    i = 2 + 2 ;            --->  i = 4 ;
  
    bipush 31 ; bipush 31  --->  bipush 31 ; dup

This can happen on many different levels.
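Both optimizations can be sketched in Python, one working on syntax trees and one on assembly code. The tuple trees and the function names `fold` and `peephole` are assumptions for this example. The peephole rule is only safe here because `bipush` has no side effects.

```python
def fold(tree):
    """Constant folding on a tree: 2 + 2 becomes 4."""
    kind = tree[0]
    if kind in ("+", "*"):
        left, right = fold(tree[1]), fold(tree[2])
        if left[0] == "int" and right[0] == "int":
            if kind == "+":
                return ("int", left[1] + right[1])
            return ("int", left[1] * right[1])
        return (kind, left, right)
    return tree  # constants and variables are left unchanged


def peephole(code):
    """Replace a repeated bipush of the same constant with dup."""
    result = []
    previous = None  # the previous instruction of the original code
    for instruction in code:
        if instruction.startswith("bipush ") and instruction == previous:
            result.append("dup")
        else:
            result.append(instruction)
        previous = instruction
    return result


print(fold(("+", ("int", 2), ("int", 2))))  # ('int', 4)
print(peephole(["bipush 31", "bipush 31"]))  # ['bipush 31', 'dup']
```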

What we learn in this course (in more detail)

Write regular expressions and implement lexers

Write grammars and implement parsers

Write typing rules and implement type checkers

Write semantic rules and implement interpreters

Write compilation schemes and implement code generators

Syntax-directed translation is a technique used in type checkers, interpreters, and compilers alike

We learn the basic ideas of this for both imperative and functional languages.

Language as a programming technique

Special-purpose language/domain-specific language - the ultimate abstraction for a domain.

Often a language grows out of a "notation" in an uncontrolled fashion (e.g. make)

It can be better to think in terms of a language from the beginning.

The BNF Converter is a tool that helps to implement a language. It is available for C, C++, C#, Haskell, Java, and OCaml.

You will find that language implementation is easy and productive.

The structure of the course

We go through the course web page, including