Parallel Functional Programming

Work in progress

Updates to this page will happen throughout the course.

Lecture 1 - Course Intro and Why Parallel Functional Programming?

Mon study week 1, 13.15 - 15.00 in EB

This is the lecture that sets the scene for the course. It introduces what we mean by parallelism, and distinguishes that from concurrency. It explains why functional languages are well suited for parallel programming, and this despite the fact that they really failed to deliver on the promise of solving parallel programming in the past. So why are they interesting now?

Slides

[pdf]

Reading

Note that we distinguish between parallel and concurrent programming. Our undergraduate curriculum at Chalmers has arguably overemphasized the latter (locks, semaphores, synchronisation mechanisms and the like) and underemphasized the business of making programs run faster by using many cores (see this discussion by Simon Marlow (previously Microsoft Research, now Facebook), who makes distinctions that exactly mirror our view). Similar opinions were later expressed by Bob Harper, who has radically reshaped the introductory computing curriculum at CMU to place parallelism at the centre — a major source of inspiration for this course.
Simon Marlow's book on Parallel and Concurrent Programming in Haskell gives a good explanation of why the topics of this course are interesting. It also makes the same distinction between concurrency and parallelism as that made in this course. We consider only Part I on parallelism. We will simply call the book PCPH. In the second lecture of the course (on thursday March 20 2014), Simon will present his work on the Par Monad as well as more about how to use Threadscope, the tool that John uses in this lecture
Be inspired by this video of Simon Peyton Jones lecturing on parallel programming in Haskell
The three papers listed on the second last slide of the first lecture are
Haskell on a Shared-Memory Multiprocessr, Harris, Marlow and Peyton Jones, Haskell'05
Feedback Directed Implicit Parallelism, Harris and Singh, ICFP'07
Runtime Support for Multicore Haskell, Marlow, Peyton Jones and Singh, IFFP'09
Chapter 24 of Real World Haskell covers concurrent and multicore programming.

Lecture 2 - The Par Monad (Simon Marlow)

Thurs study week 1, 10.00 - 11.45 in EC

This guest lecture is about a programming model for deterministic parallelism, introduced by Simon Marlow. It introduces the Par Monad, a monad for deterministic parallelism, and shows how I-structures are used to exchange information between parallel tasks (or "blobs"), see Marlow's Haskell'11 paper with Ryan Newton and Simon PJ. Take a look at the I-Structures paper referred to in the lecture. See PCPH chapter 4.

Slides

[pdf]

Lecture 3 - GHC heap internals (Nick Frolov)

Fri study week 1, 15.15 - 17.00 in EC

Nick is the course assistant (TA). In this lecture, he will tell you things that you need to know to make a good job of the labs in the course, based on experience from previous years.

There is only so much parallelism the memory can handle (the effect known as "memory wall"). While both functional and imperative languages use the concept of heap for managing memory, the behavior of programs written in pure languages like Haskell is radically different from that of programs written with aggressive use of side effects — there is no mutation of data but much more allocation of it. We will review the major design decisions behind GHC implementation of heap, including garbage collection, multithreading and I/O management. We will also take a look at how tweaking heap runtime parameters can impact performance of a program, with help of Threadscope.

Slides

[pdf]

Lecture 4 - from par and pseq to Strategies

Mon study week 2, 13.15- 15.00 in EB

This lecture considers par and pseq more critically, and concludes that it might be a good idea to separate the control of behaviour relating to parallelisation from the description of the algorithm itself. The idea of Strategies is described in a well-known paper called Algorithms + Strategies = Parallelism by Trinder, Hammond, Loidl and Peyton Jones. More recently, Marlow and some of the original authors have updated the idea, in Seq no more: Better Strategies for Parallel Haskell. We expect you to read both of these papers. The lecture is based on the newer paper. See also PCPH chapters 2 and 3.

Slides

[pdf]

Reading

If you have forgotten what a monad is, looking at Real World Haskell is probably a good option.

See above for papers. Read PCPH chapters 2 and 3.

The documentation of the Strategies Library is very helpful.

Lecture 5 - Data Parallel Programming in Repa

Thu study week 2, 10.00 - 11.45 in EC

This lecture covers Data parallel programming using the Repa library, which gives flat data parallelism (more about that in Lecture 13). A main source is the Repa paper from ICFP 2010. And then there are two more Repa papers, one from Haskell'11 and one (on Repa 3) from Haskell'12. See also PCPH chapter 5.

Slides

[pdf] ¨

Code

RepaExLec514.hs

Lecture 6 - Parallel Programming in Erlang

Fri study week 2, 15.15 - 17.00 in EC

This lecture introduced Erlang for Haskell programmers, taking parallelising quicksort as an example, both within one Erlang VM and distributed across a network. The latest version of the Erlang system can be downloaded from here. There is a Windows installer. Many linux versions have an Erlang packagage available, but not necessarily a package suitable for development of Erlang code, and not necessarily the latest version. On Ubuntu, try

sudo apt-get install erlang-dev

If that doesn't work or you can't find an appropriate package, build the VM from source.

Slides

[pdf]

Lecture 7 - Pull Arrays and Push Arrays, or The Art of Controlling Fusion (Jean-Philippe Bernardy)

Mon study week 3, 13.15 - 15.00 in EB

Fusion is an optimization that improves the performance of function composition, by removing the need for a data structure mediating the communication between the functions. Fusion is critical for good performance of functional programs, yet, predicting whether fusion will occur requires careful analysis of the functions being composed. Hence, if one cares about performance, referential transparency is broken.

I will show that this issue can be solved by refining the types of the data structures one wants to fuse. In particular, in the case of arrays, one needs two array types (push and pull). Besides, these types express parallelism opportunities, so they are an essential building block of parallel functional programming languages.

Slides

[pdf]

Lecture 8 - Robust Erlang

Thu study week 3, 10.00 - 11.45 in EC

In 2012, his lecture focussed on the fault tolerance constructs in Erlang--links and system processes--and the motivation for the "Let It Crash" philosophy. It introduced supervision trees and the Open Telecoms Platform, and developed a simple generic server.

Slides

[pdf](2012)

Lecture 9 - Skeletons (Jost Berthold, DIKU, Copenhagen Univ.)

Mon study week 4, 13.15 - 15.00 in EB

Jost discusses skeletons as a means to structure parallel computations -- viewing skeletons as higher order functions. He distinguishes three types of skeletons: small scale skeletons (like parMap), process communication topology skeletons, and proper algorithmic skeletons (like divide and conquer). He introduces the Eden dialect as a way to both implement and use skeletons in Haskell.

Information about Eden

All about Eden

Eden Trace Viewer on Hackage:

http://hackage.haskell.org/package/edentv
(but: pending a fix to make it work correctly with ghc-7.8)

Slides

[pdf]

Code

[.zip]

Lecture 10 - Map-Reduce

Thu study week 4, 10.00 - 11.45 in EC

Google's Map-Reduce framework has become a popular approach for processing very large datasets in distributed clusters. Although originally implemented in C++, it's connections with functional programming are close: the original inspiration came from the map and reduce functions in LISP; MapReduce is a higher-order function for distributed computing; purely functional behaviour of mappers and reducers is exploited for fault tolerance; it is ideal for implementation in Erlang. This lecture explains what Map-Reduce is, shows a sequential and a simple parallel implementation in Erlang, and discusses the refinements needed to make it work in reality.

Slides

[pdf]

Lecture 11 - Cache complexity and parallelism (Nikita Frolov)

Fri study week 4, 15:15 - 17:00 in EC

Over the last few decades, performance of processors has grown at a much faster pace than performance of memories. The issue becomes even more severe with the advent of the (massive) multicore era. This gap is addressed by clever design and use of caches. One wouldn't be wrong to say that design of parallel computers is, above all, caches. The quantitative study of algorithms in a parallel setting has already extended the time and space complexity analyses with notions of work and depth. In this lecture, we take one more step and show how to reason about the cache behavior of algorithms.

Slides

[pdf]

Reading

Cache-Oblivious Algorithms, Harald Prokop, MSc Thesis, MIT, 1999.

Lecture 12 - Data Parallel Programming

Fri study week 5, 15.15 - 17.00 in EC

This lecture is all about Guy Blelloch's seminal work on the NESL programming language and on parallel functional algorithms and associated cost models. The best introduction is to watch the video of his marvellous invited talk at ICFP 2010, which John and Mary had the pleasure to attend. There are pages about NESL and about his publications in general. For the notions of work and depth, see this part of the 1996 CACM paper, and also this page, which considers work and depth for three algorithms.

Slides

[pdf]
[pdf, with a few brief notes]

Lecture 13 - GPU Programming I

Mon study week 6, 13.15 - 15.00 in EB

To quote PCPH: "Accelerate is an embedded domain-specific language (EDSL) for programming the GPU. It allows us to write Haskell code in a somewhat stylized form and have it run directly on the GPU. For certain tasks, we can obtain orders of magnitude speedup by using Accelerate.". This lecture gives some information about Graphics Processing Units (GPUs) and GPU programming, and then introduces Accelerate.

Slides

[pdf]

Lecture 14 - GPU Programming II

Mon study week 6, 10.00 - 11.45 in EC

This lecture introduces Obsidian, an embedded domain specific language for GPU programming developed here at Chalmers by Joel Svensson.

Slides

[pdf]

Draft Paper

[pdf]

Lecture 15 - The Erlang Virtual Machine (Erik Stenman)

Mon study week 7, 10.00 - 11.45 in EB

Slides

[pdf]

Lecture 16 - Parallel Functional Programming in Erlang at Klarna (Richard Carlsson)

Fri study week 7, 15.15 - 17.00 in EC

Slides

[pdf]

Lecture 17 - Databases in the New World

Mon study week 8, 13.15 - 15.00 in EB

Slides

[pdf]

Lecture 18 - Taking Research into Production at Basho (Russell Brown)

Thu study week 8, 10.00 - 11.45 in EC

Slides

[pdf]

Lectures

Work in progress

Lecture 1 - Course Intro and Why Parallel Functional Programming?

Slides

Reading

Lecture 2 - The Par Monad (Simon Marlow)

Slides

Lecture 3 - GHC heap internals (Nick Frolov)

Slides

Lecture 4 - from par and pseq to Strategies

Slides

Reading

Lecture 5 - Data Parallel Programming in Repa

Slides

Code

Lecture 6 - Parallel Programming in Erlang

Slides

Lecture 7 - Pull Arrays and Push Arrays, or The Art of Controlling Fusion (Jean-Philippe Bernardy)

Slides

Lecture 8 - Robust Erlang

Slides

Lecture 9 - Skeletons (Jost Berthold, DIKU, Copenhagen Univ.)

Information about Eden

Eden Trace Viewer on Hackage:

Slides

Code

Lecture 10 - Map-Reduce

Slides

Lecture 11 - Cache complexity and parallelism (Nikita Frolov)

Slides

Reading

Lecture 12 - Data Parallel Programming

Slides

Lecture 13 - GPU Programming I

Slides

Lecture 14 - GPU Programming II

Slides

Draft Paper

Lecture 15 - The Erlang Virtual Machine (Erik Stenman)

Slides

Lecture 16 - Parallel Functional Programming in Erlang at Klarna (Richard Carlsson)

Slides

Lecture 17 - Databases in the New World

Slides

Lecture 18 - Taking Research into Production at Basho (Russell Brown)

Slides