Book: 1.3, 1.5, 1.6
Minilanguages, from Eric S. Raymond, The Art of Unix Programming
The Hundred-Year Language from Paul Graham, Hackers & Painters
How simple can a programming language be?
Turing-completeness
Some programming language history
General and special purpose languages
Case study: the evolution of BNFC
Data formats, XML
Before electronic computers were built, mathematical models of computation were developed, among them Turing machines, lambda calculus, and recursive functions. All of these were proven equivalent.
Any programming language equivalent to one of these is said to be Turing-complete.
All usual general-purpose languages are Turing-complete.
(Material from Wikipedia)
Recall how simple lambda calculus is:
Exp ::= Ident | "(" Exp Exp ")" | "\" Ident "->" Exp
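The three-alternative grammar maps directly onto an algebraic datatype. The following is a minimal sketch, not part of the original notes; the names `Exp`, `Var`, `App`, `Abs` and `render` are chosen here for illustration only.

```haskell
module Main where

type Ident = String

-- One constructor per alternative of the grammar.
data Exp
  = Var Ident      -- Ident
  | App Exp Exp    -- "(" Exp Exp ")"
  | Abs Ident Exp  -- "\" Ident "->" Exp

-- Print an expression in the concrete syntax of the grammar.
render :: Exp -> String
render (Var x)   = x
render (App f a) = "(" ++ render f ++ " " ++ render a ++ ")"
render (Abs x e) = "\\" ++ x ++ " -> " ++ render e

-- An example term: apply f twice to x.
twice :: Exp
twice = Abs "f" (Abs "x" (App (Var "f") (App (Var "f") (Var "x"))))

main :: IO ()
main = putStrLn (render twice)
```

Note that the printer follows the grammar literally, so every application is parenthesized.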
We don't even need integers, because they can be defined as follows:
    0 = \f -> \x -> x
    1 = \f -> \x -> f x
    2 = \f -> \x -> f (f x)
    3 = \f -> \x -> f (f (f x))
    ...
In other words: the number n is a higher-order function that applies any function n times to any given argument.
These functions are known as Church numerals.
Addition of Church numerals is defined as follows:
PLUS = \m -> \n -> \f -> \x -> n f (m f x)
Example:
    PLUS 2 3
      = (\m -> \n -> \f -> \x -> n f (m f x)) (\f -> \x -> f (f x)) (\f -> \x -> f (f (f x)))
      = \f -> \x -> (\f -> \x -> f (f (f x))) f ((\f -> \x -> f (f x)) f x)
      = \f -> \x -> (\f -> \x -> f (f (f x))) f (f (f x))
      = \f -> \x -> f (f (f (f (f x))))
      = 5
Multiplication:
MULT = \m -> \n -> m (PLUS n) 0
Idea: add n to 0 m times.
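Church numerals, PLUS and MULT can be tried out in Haskell. This is a sketch under some assumptions: the `Church` newtype wrapper and the `RankNTypes` extension are only there to satisfy Haskell's type checker (untyped lambda calculus needs neither), and `toInt` is a helper added here to read the result back as an ordinary integer.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

-- A Church numeral applies a function n times to an argument.
newtype Church = Church { runChurch :: forall a. (a -> a) -> a -> a }

zero, one, two, three :: Church
zero  = Church (\_ x -> x)
one   = Church (\f x -> f x)
two   = Church (\f x -> f (f x))
three = Church (\f x -> f (f (f x)))

-- PLUS = \m -> \n -> \f -> \x -> n f (m f x)
plus :: Church -> Church -> Church
plus m n = Church (\f x -> runChurch n f (runChurch m f x))

-- MULT = \m -> \n -> m (PLUS n) 0
mult :: Church -> Church -> Church
mult m n = runChurch m (plus n) zero

-- Helper (not in the notes): count how often the numeral applies (+ 1) to 0.
toInt :: Church -> Int
toInt n = runChurch n (+ 1) 0

main :: IO ()
main = do
  print (toInt (plus two three))  -- 2 + 3
  print (toInt (mult two three))  -- 2 * 3
```

Running it prints 5 and 6, in agreement with the hand reduction of PLUS 2 3.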
Booleans (Church booleans) and conditions
    TRUE = \x -> \y -> x
    FALSE = \x -> \y -> y
    IFTHENELSE = \b -> \x -> \y -> b x y
    AND = \a -> \b -> IFTHENELSE a b FALSE
    OR = \a -> \b -> IFTHENELSE a TRUE b
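The Church booleans can be transcribed into Haskell in the same spirit. In this sketch the `CBool` newtype and the `RankNTypes` extension are typing devices only, and `toBool` is a helper introduced here to convert back to Haskell's `Bool`.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

-- A Church boolean selects one of two arguments.
newtype CBool = CBool { runBool :: forall a. a -> a -> a }

true, false :: CBool
true  = CBool (\x _ -> x)
false = CBool (\_ y -> y)

-- IFTHENELSE = \b -> \x -> \y -> b x y
ifThenElse :: CBool -> a -> a -> a
ifThenElse b x y = runBool b x y

-- AND = \a -> \b -> IFTHENELSE a b FALSE
and' :: CBool -> CBool -> CBool
and' a b = ifThenElse a b false

-- OR = \a -> \b -> IFTHENELSE a TRUE b
or' :: CBool -> CBool -> CBool
or' a b = ifThenElse a true b

-- Helper (not in the notes): read a Church boolean back as a Bool.
toBool :: CBool -> Bool
toBool b = ifThenElse b True False

main :: IO ()
main = do
  let bs = [true, false]
  print [toBool (and' a b) | a <- bs, b <- bs]
  print [toBool (or' a b) | a <- bs, b <- bs]
```

The two printed lists are the truth tables of conjunction and disjunction, with rows ordered TT, TF, FT, FF.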
Recursion, via the fix-point combinator
Y = \g -> (\x -> g (x x)) (\x -> g (x x))
This has the property
Y g = g (Y g)
which means iterating g infinitely many times.
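The fix-point property can be illustrated in Haskell. The self-application `x x` inside Y is not typable in plain Haskell, so this sketch instead defines a function `fix'` (a name chosen here) directly by the equation Y g = g (Y g); lazy evaluation ensures that g is unfolded only as often as needed.

```haskell
module Main where

-- Defined by the fix-point equation itself: fix' g = g (fix' g).
fix' :: (a -> a) -> a
fix' g = g (fix' g)

-- Factorial without explicit self-reference:
-- the recursive call is the extra argument `self`.
factorial :: Integer -> Integer
factorial = fix' (\self n -> if n <= 0 then 1 else n * self (n - 1))

main :: IO ()
main = print (map factorial [0 .. 5])
```

The standard library offers the same operator as `fix` in `Data.Function`.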
Writing programs in lambda calculus is possible!
But it is inconvenient and inefficient.
However, it is a good starting point for a language to have a very small core language.
The implementation (compiler, interpreter) is then written for the core language, with syntactic sugar and possibly optimizations added on top.
Lisp is built from lambda calculus with a few additions, such as a primitive notion of lists.
Haskell has a small core language based on lambda calculus with algebraic datatypes and pattern matching, and primitive number types.
The following material is from this website.
Brainfuck was created by Urban Müller with the goal of designing a Turing-complete language for which one could write the smallest compiler ever. His compiler was 240 bytes in size.
A Brainfuck program has an implicit byte pointer, called "the pointer", which is free to move around within an array of 30000 bytes, initially all set to zero. The pointer itself is initialized to point to the beginning of this array.
The Brainfuck programming language consists of eight commands, each of which is represented as a single character.
    >  Increment the pointer.
    <  Decrement the pointer.
    +  Increment the byte at the pointer.
    -  Decrement the byte at the pointer.
    .  Output the byte at the pointer.
    ,  Input a byte and store it in the byte at the pointer.
    [  Jump forward past the matching ] if the byte at the pointer is zero.
    ]  Jump backward to the matching [ unless the byte at the pointer is zero.
Each command corresponds directly to a C statement, assuming p is a pointer into the array:

    >  becomes  ++p;
    <  becomes  --p;
    +  becomes  ++*p;
    -  becomes  --*p;
    .  becomes  putchar(*p);
    ,  becomes  *p = getchar();
    [  becomes  while (*p) {
    ]  becomes  }
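The eight commands are simple enough that an interpreter fits in a few dozen lines. The following Haskell sketch is not Müller's implementation; all names are chosen here, and it makes several assumptions that the description above leaves open: the tape is unbounded to the right rather than 30000 cells, moving left at the first cell does nothing, bytes wrap around modulo 256, and reading at end of input stores 0.

```haskell
module Main where

import Data.Char (chr, ord)
import Data.List (foldl')
import Data.Word (Word8)

-- Abstract syntax: a loop holds its body, so brackets are matched at parse time.
data Cmd = MoveR | MoveL | Inc | Dec | Out | In | Loop [Cmd]

-- Parse a program; every character that is not a command is a comment.
-- Returns the commands up to the first unmatched ']' and the unread rest.
parse :: String -> ([Cmd], String)
parse [] = ([], [])
parse (c : cs) = case c of
    '>' -> emit MoveR
    '<' -> emit MoveL
    '+' -> emit Inc
    '-' -> emit Dec
    '.' -> emit Out
    ',' -> emit In
    '[' -> let (body, afterLoop) = parse cs
               (cmds, rest) = parse afterLoop
           in (Loop body : cmds, rest)
    ']' -> ([], cs)
    _   -> parse cs
  where
    emit cmd = let (cmds, rest) = parse cs in (cmd : cmds, rest)

-- The tape as a zipper: cells to the left (nearest first), current cell, cells to the right.
data Tape = Tape [Word8] Word8 [Word8]

-- Machine state: tape, unread input, output so far (in reverse order).
data State = State Tape String String

exec :: [Cmd] -> State -> State
exec cmds st = foldl' step st cmds

step :: State -> Cmd -> State
step st@(State tape@(Tape ls x rs) inp out) cmd = case cmd of
    MoveR -> case rs of
      r : rs' -> State (Tape (x : ls) r rs') inp out
      []      -> st  -- not reached: the right side is infinite
    MoveL -> case ls of
      l : ls' -> State (Tape ls' l (x : rs)) inp out
      []      -> st  -- assumption: stay put at the left edge
    Inc -> State (Tape ls (x + 1) rs) inp out  -- Word8 wraps from 255 to 0
    Dec -> State (Tape ls (x - 1) rs) inp out
    Out -> State tape inp (chr (fromIntegral x) : out)
    In  -> case inp of
      c : inp' -> State (Tape ls (fromIntegral (ord c)) rs) inp' out
      []       -> State (Tape ls 0 rs) inp out  -- assumption: end of input stores 0
    Loop body
      | x == 0    -> st
      | otherwise -> step (exec body st) cmd

-- Run a program on the given input and return its output.
runBF :: String -> String -> String
runBF prog input = reverse out
  where
    State _ _ out = exec (fst (parse prog)) (State (Tape [] 0 (repeat 0)) input [])

main :: IO ()
main = do
  putStrLn (runBF "++++++++[>++++++++<-]>+." "")  -- 8 * 8 + 1 = 65
  putStrLn (runBF ",[.,]" "echo")                 -- copy input to output
```

Parsing loops into a nested `Loop` node avoids searching for matching brackets at run time, at the cost of a separate parsing pass.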
Display the ASCII character set (Jeffry Johnston 2001)
.+[.+]
Print "Hello World!" (from Wikipedia)
    ++++++++++
    [>+++++++>++++++++++>+++>+<<<<-]   To set up useful values in the array
    >++.                               Print 'H'
    >+.                                Print 'e'
    +++++++.                           Print 'l'
    .                                  Print 'l'
    +++.                               Print 'o'
    >++.                               Print ' '
    <<+++++++++++++++.                 Print 'W'
    >.                                 Print 'o'
    +++.                               Print 'r'
    ------.                            Print 'l'
    --------.                          Print 'd'
    >+.                                Print '!'
    >.                                 Print newline
Certainly Turing completeness is not enough! (lambda calculus, Brainf*ck, the corresponding fragment of C...)
More or less obvious criteria:
These criteria are not always compatible: there are trade-offs.
In practice, different languages are good for different applications.
(And there are languages which are no good for any applications.)
What is Brainf*ck good for? Reasoning about computability!
We spend some time looking at this poster, which shows a time chart of languages.
Toward more structure in programs (from GOTOs to while loops to recursion)
Toward more static typing (from bit strings to numeric types to structures to algebraic data types to dependent types)
Toward more abstraction (from character arrays to strings, from arrays to vectors and lists, from unlimited access to abstract data types)
Toward more genericity (from cut-and-paste to functions to polymorphic functions to first-class modules)
Toward more streamlined syntax (from positions and line numbers, keywords used as identifiers, begin and end markers, limited-size identifiers, etc., to a "C-like" syntax that can be processed with standard tools)
In general, toward higher-level languages, hence farther away from the machine. This creates more work for machines (and compiler writers!) but lightens the burden on language users.
Special-purpose languages are also called minilanguages or domain-specific languages.
An "ultimate solution" to a certain class of problems
Examples:
make, for predefining compilations
bash, for working on files and directories
The latter two are actually Turing-complete!
Imperative or declarative?
Interpreted or compiled?
Portable or platform-dependent?
Statically or dynamically checked?
Turing-complete or limited?
Language or library?
Minilanguage that is a fragment of a larger host language.
Really the same as a library in the host language.
Advantages
Disadvantages
The starting point was the course Kompilatorkonstruktion in 2002. The web page gives a link:
The first version was written by Aarne Ranta and Markus Forsberg for Haskell/Happy/Alex.
In 2004, ported to Java, C, and C++ by Michael Pellauer
In 2005, ported to Java 1.5 by Björn Bringert
In 2006, ported to OCaml by Kristofer Johannisson and C# by Johan Broberg
To implement exactly the idea that a parser returns an abstract syntax tree.
"The number of bugs per line is independent of programming language" (Eric S. Raymond)
Code size for language implementation in Lab 2.
format | CPP.cf | Haskell | Java 1.5 | C++ | raw C++
---|---|---|---|---|---
files | 1 | 9 | 55 | 12 | 12
lines | 63 | 999 | 3353 | 5382 | 9424
chars | 1548 | 28516 | 92947 | 96587 | 203659
lines src/tgt | 100% | 6% | 2% | 1% | 0.5%
Imperative or declarative? Declarative.
Interpreted or compiled? Compiled.
Portable or platform-dependent? Portable.
Statically checked? Yes, but should be more.
Turing-complete or limited? Limited.
Language or library? Language.
WSCC - World's Smallest Compiler Compiler. 114 lines of Haskell.
Functionality:
    -- grammar parser: one rule/line, format F. C ::= (C | "s")* ";"
    getCF :: String -> CF
    getCF = concat . map (getcf . init . words) . filter isRule . lines
      where
        getcf (fun : cat : "::=" : its) = return (init fun, (cat, map mkIt its))
        getcf ww = []
        mkIt ('"':w@(_:_)) = Right (init w)
        mkIt w = Left w
        isRule line = not (all isSpace line || take 2 line == "--")

    -- the type of context-free grammars
    type CF = [Rule]
    type Rule = (Fun, (Cat, [Either Cat Tok]))
    type Cat = String
    type Tok = String
    type Fun = String
    type Str = [Tok]
    -- a complete set of parser combinators à la Wadler and Hutton
    type Parser a b = [a] -> [(b,[a])]

    parseResults :: Parser a b -> [a] -> [b]
    parseResults p s = [x | (x,r) <- p s, null r]

    (...) :: Parser a b -> Parser a c -> Parser a (b,c)
    (p ... q) s = [((x,y),r) | (x,t) <- p s, (y,r) <- q t]

    (|||) :: Parser a b -> Parser a b -> Parser a b
    (p ||| q) s = p s ++ q s

    lit :: (Eq a) => a -> Parser a a
    lit x (c:cs) = [(x,cs) | x == c]
    lit _ _ = []

    (***) :: Parser a b -> (b -> c) -> Parser a c
    (p *** f) s = [(f x,r) | (x,r) <- p s]

    succeed :: b -> Parser a b
    succeed v s = [(v,s)]

    fails :: Parser a b
    fails s = []
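To see how these combinators are used, here is a small self-contained sketch. It repeats the combinator definitions from the excerpt and adds one hypothetical parser, `parens`, for the grammar S ::= "(" S ")" S | (empty), returning the number of bracket pairs. The grammar is not left-recursive, which this style of parser requires.

```haskell
module Main where

-- Combinators as in the WSCC excerpt: a parser returns all ways of
-- parsing a prefix of the input, each paired with the remaining input.
type Parser a b = [a] -> [(b, [a])]

parseResults :: Parser a b -> [a] -> [b]
parseResults p s = [x | (x, r) <- p s, null r]

(...) :: Parser a b -> Parser a c -> Parser a (b, c)
(p ... q) s = [((x, y), r) | (x, t) <- p s, (y, r) <- q t]

(|||) :: Parser a b -> Parser a b -> Parser a b
(p ||| q) s = p s ++ q s

lit :: Eq a => a -> Parser a a
lit x (c : cs) = [(x, cs) | x == c]
lit _ _ = []

(***) :: Parser a b -> (b -> c) -> Parser a c
(p *** f) s = [(f x, r) | (x, r) <- p s]

succeed :: b -> Parser a b
succeed v s = [(v, s)]

-- Hypothetical example parser: S ::= "(" S ")" S | (empty).
-- The semantic value is the number of bracket pairs.
parens :: Parser Char Int
parens =
  ((lit '(' ... parens ... lit ')' ... parens) *** count)
    ||| succeed 0
  where
    count (((_, inner), _), after) = 1 + inner + after

main :: IO ()
main = do
  print (parseResults parens "(()())")  -- one complete parse
  print (parseResults parens "(()")     -- no complete parse
```

Because `|||` keeps the results of both alternatives, the parser explores every prefix parse and `parseResults` filters out those that leave input unread; this is the backtracking behaviour whose cost is hard to predict.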
    -- parser that works for non-left-recursive grammars
    -- generalization of LL(1) to ambiguous grammars
    pTree :: CF -> Cat -> Parser Tok Tree
    pTree cf cat = foldr (|||) fails (map pRule (rulesForCat cf cat))
      where
        pRule (fun, (_,its)) = pIts its *** (\trees -> Tree (fun,trees))
        pIts (Left c : ts) = (pTree cf c ... pIts ts) *** (uncurry (:))
        pIts (Right s : ts) = (lit s ... pIts ts) *** snd
        pIts [] = succeed []

    -- the type of syntax trees
    newtype Tree = Tree (Fun,[Tree])
The whole implementation: file WSCC.hs
Example grammar: file Mini.cf
Example run (interactive; indentation of parse result added):
    $ runghc WSCC.hs Mini.cf
    15 rules
    Program>
    Prog NilStm
    Program> int i ; { i = 1 ; int i ; }
    Prog
      (ConsStm (SDecl TInt Id_i)
      (ConsStm (SBlock
        (ConsStm (SAss Id_i (EInt Int_1))
        (ConsStm (SDecl TInt Id_i)
        NilStm)))
      NilStm))
No built-in literals or identifiers
No built-in precedence
No documentation, pretty-printer, skeleton
No connection to standard compiler tools (Happy, Alex)
Only Haskell
No treatment of left recursive grammars
Unpredictable complexity due to backtracking
Undocumented grammar syntax
Built-in literals added
Built-in precedence via indexed categories
Documentation in LaTeX, pretty-printer and skeleton in Haskell
Code generated for standard compiler tools (Happy, Alex)
Only Haskell, still
Left recursion is a virtue in LALR parsing
Predictably linear complexity
BNF grammar syntax implemented in BNFC, therefore documented
Compilation to standard tools made it really useful.
The language is declarative and therefore portable and predictable.
Static checking as close to source as possible
token definitions do not get overshadowed
The source code of BNFC has become a terrible mess and is hard to maintain.
Easy to get started: a prototype is ready to run in 10 minutes.
Implementation language can be changed, e.g. fast prototype in Haskell and production system in C++.
Many implementation languages can be combined, because they can communicate via parser and pretty printer.
The language document can be handed to users.
Restrictions (we call them conditions for "well-behaved languages"):
XML = Extensible Markup Language
Algebraic datatypes can be encoded in XML
DTD = Document Type Definition, tells what combinations are valid
BNFC can generate a DTD and encode syntax trees in XML with the option -xml (Haskell only). Try this for the grammar Mini.cf:
bnfc -xmlt -m Mini.cf
    <?xml version="1.0" standalone="yes"?>
    <!DOCTYPE Mini [
    <!ELEMENT Integer EMPTY>
    <!ATTLIST Integer value CDATA #REQUIRED>
    <!ELEMENT Double EMPTY>
    <!ATTLIST Double value CDATA #REQUIRED>
    <!ELEMENT String EMPTY>
    <!ATTLIST String value CDATA #REQUIRED>
    <!ELEMENT Ident EMPTY>
    <!ATTLIST Ident value CDATA #REQUIRED>
    <!ELEMENT Program ((Prog, Stm*))>
    <!ELEMENT Prog EMPTY>
    <!ELEMENT Stm ((SDecl, Type, Ident) | (SAss, Ident, Exp) | (SBlock, Stm*) | (SPrint, Exp))>
    <!ELEMENT SDecl EMPTY>
    <!ELEMENT SAss EMPTY>
    <!ELEMENT SBlock EMPTY>
    <!ELEMENT SPrint EMPTY>
    <!ELEMENT Exp ((EVar, Ident) | (EInt, Integer) | (EDouble, Double) | (EAdd, Exp, Exp))>
    <!ELEMENT EVar EMPTY>
    <!ELEMENT EInt EMPTY>
    <!ELEMENT EDouble EMPTY>
    <!ELEMENT EAdd EMPTY>
    <!ELEMENT Type ((TInt) | (TDouble))>
    <!ELEMENT TInt EMPTY>
    <!ELEMENT TDouble EMPTY>
    ]>
    ./TestMini ex.mini
    ex.mini

    Parse Successful!

    [Linearized tree]

    int x ;
    x = 6 ;
    int y ;
    y = x + 7 ;
    print y ;
    {
      int y ;
      y = 4 ;
      print y ;
      x = y ;
      print x ;
    }
    print x ;
    print y ;

    [XML]

    <Program> <Prog/>
      <Stm> <SDecl/> <Type> <TInt/> </Type> <Ident value = "x" /> </Stm>
      <Stm> <SAss/> <Ident value = "x" /> <Exp> <EInt/> <Integer value = "6" /> </Exp> </Stm>
      <Stm> <SDecl/> <Type> <TInt/> </Type> <Ident value = "y" /> </Stm>
      <Stm> <SAss/> <Ident value = "y" /> <Exp> <EAdd/> <Exp> <EVar/> <Ident value = "x" /> </Exp> <Exp> <EInt/> <Integer value = "7" /> </Exp> </Exp> </Stm>
      <Stm> <SPrint/> <Exp> <EVar/> <Ident value = "y" /> </Exp> </Stm>
      <Stm> <SBlock/>
        <Stm> <SDecl/> <Type> <TInt/> </Type> <Ident value = "y" /> </Stm>
        <Stm> <SAss/> <Ident value = "y" /> <Exp> <EInt/> <Integer value = "4" /> </Exp> </Stm>
        <Stm> <SPrint/> <Exp> <EVar/> <Ident value = "y" /> </Exp> </Stm>
        <Stm> <SAss/> <Ident value = "x" /> <Exp> <EVar/> <Ident value = "y" /> </Exp> </Stm>
        <Stm> <SPrint/> <Exp> <EVar/> <Ident value = "x" /> </Exp> </Stm>
      </Stm>
      <Stm> <SPrint/> <Exp> <EVar/> <Ident value = "x" /> </Exp> </Stm>
      <Stm> <SPrint/> <Exp> <EVar/> <Ident value = "y" /> </Exp> </Stm>
    </Program>
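The encoding of algebraic datatypes in XML seen in this output can be sketched by hand for the expression type. This is not the code BNFC generates; the function names are hypothetical, only three of the four `Exp` constructors from the DTD are covered, and attribute values are not escaped, which is harmless for identifiers and integers.

```haskell
module Main where

-- A fragment of the abstract syntax of Mini (EDouble omitted for brevity).
data Exp = EVar String | EInt Integer | EAdd Exp Exp

-- An element named after the type, wrapping its children.
element :: String -> [String] -> String
element tag children =
  unwords (("<" ++ tag ++ ">") : children ++ ["</" ++ tag ++ ">"])

-- An empty element naming the constructor.
constructor :: String -> String
constructor name = "<" ++ name ++ "/>"

-- A built-in literal or identifier, stored in a "value" attribute.
literal :: String -> String -> String
literal tag v = "<" ++ tag ++ " value = \"" ++ v ++ "\" />"

-- Type element, then constructor element, then the encoded arguments.
expToXML :: Exp -> String
expToXML e = element "Exp" $ case e of
  EVar x   -> [constructor "EVar", literal "Ident" x]
  EInt n   -> [constructor "EInt", literal "Integer" (show n)]
  EAdd a b -> [constructor "EAdd", expToXML a, expToXML b]

main :: IO ()
main = putStrLn (expToXML (EAdd (EVar "x") (EInt 7)))
```

For the expression x + 7 this produces the same element sequence as the corresponding part of the output above.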
The question is not an exclusive "or": you can get both!
BNFC can be used for defining a datafile format, which is more compact than XML but portable between many languages, since many languages can print and parse these objects.
If this is not enough, the format can be converted automatically to an XML representation.