Lectures

Work in progress

Info. about the remaining lectures will be added in due course.

Lecture 1 - Course Intro and Why Parallel Functional Programming?

Mon study week 1, 13.15 - 15.00 in EB

This is the lecture that sets the scene for the course. It introduces what we mean by parallelism, and distinguishes that from concurrency. It explains why functional languages are well suited for parallel programming, and this despite the fact that they really failed to deliver on the promise of solving parallel programming in the past. So why are they interesting now?

Slides

[pdf]

Competition

Note that John's last slide announces a competition to make a fast parallel Haskell program to do matrix multiplication by Monday April 8 at midnight.

You should submit your code using the Fire system. This is for fun, so you may form any group you like. We suggest that a single person submits the code (with comments about who has worked on it).

Last year's competition was to make a parallel sort implementation in Haskell. The submitted entries and John's slides giving the results are available in this zipped file.

Reading

Note that we distinguish between parallel and concurrent programming. Our undergraduate curriculum at Chalmers has arguably overemphasized the latter (locks, semaphores, synchronisation mechanisms and the like) and underemphasized the business of making programs run faster by using many cores (see this discussion by Simon Marlow (previously Microsoft Research, now Facebook), who makes distinctions that exactly mirror our view). Similar opinions were later expressed by Bob Harper, who has radically reshaped the introductory computing curriculum at CMU to place parallelism at the centre — a major source of inspiration for this course.
On this page of Simon Marlow's papers, you can find notes from his course on Parallel and Concurrent Programming in Haskell. The notes give a good explanation of why the topics of this course are interesting. They also make the same distinction between concurrency and parallelism as that made in this course. (We consider only the parallelism part.) Later in the course, we will discuss Simon's work on the Par Monad.
Be inspired by this video of Simon Peyton Jones lecturing on parallel programming in Haskell
The three papers listed on the second last slide of the first lecture are
Haskell on a Shared-Memory Multiprocessor, Harris, Marlow and Peyton Jones, Haskell'05
Feedback Directed Implicit Parallelism, Harris and Singh, ICFP'07
Runtime Support for Multicore Haskell, Marlow, Peyton Jones and Singh, ICFP'09
Chapter 24 of Real World Haskell covers concurrent and multicore programming.

Lecture 2 - from par and pseq to Strategies

Thurs study week 1, 10.00 - 11.45 in EC

This lecture considers par and pseq more critically, and concludes that it might be a good idea to separate the control of behaviour relating to parallelisation from the description of the algorithm itself. The idea of Strategies is described in a well-known paper called Algorithms + Strategies = Parallelism by Trinder, Hammond, Loidl and Peyton Jones. More recently, Marlow and some of the original authors have updated the idea, in Seq no more: Better Strategies for Parallel Haskell. We expect you to read both of these papers. The lecture is based on the newer paper. (Note that in week 2 of the course Kevin Hammond will come and give a lecture.)
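To make the contrast concrete, here is a minimal sketch (not taken from the lecture slides) of the same kind of computation written in both styles. It assumes the parallel package is installed; the names nfib, parFib and fibs, and the threshold 20, are chosen purely for illustration.

```haskell
import Control.Parallel (par, pseq)
import Control.Parallel.Strategies (parList, rseq, using)

-- Naive Fibonacci, used only as a source of work.
nfib :: Int -> Integer
nfib n
  | n < 2     = 1
  | otherwise = nfib (n - 1) + nfib (n - 2)

-- par/pseq style: the parallel control is woven into the algorithm.
-- Below the (arbitrary) threshold we fall back to the sequential version.
parFib :: Int -> Integer
parFib n
  | n < 20    = nfib n
  | otherwise = x `par` (y `pseq` (x + y))
  where
    x = parFib (n - 1)
    y = parFib (n - 2)

-- Strategies style: the algorithm is an ordinary map, and `using`
-- attaches the parallel evaluation strategy separately.
fibs :: [Integer]
fibs = map nfib [20 .. 27] `using` parList rseq

main :: IO ()
main = do
  print (parFib 25)
  print (sum fibs)
```

To see any speedup, compile with -threaded and run with +RTS -N. Note that removing the `using` clause leaves a correct sequential program, which is the separation of concerns that the Strategies papers argue for.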

Slides

[pdf]

Reading

If you have forgotten what a monad is, looking at Real World Haskell is probably a good option.

See above for papers

The documentation of the Strategies Library is very helpful.

Lecture from 2012 (slides) - Threadscope and GHC events (Andres Löh, Well-Typed LLP)

Andres gave a lecture in 2012, but was, unfortunately, unable to make it this year. We are leaving the link to his slides here as additional information for the course. Andres expanded on some of the topics from the first two lectures, emphasising the need to think about lazy evaluation (and degrees of evaluation) when parallelising programs. He showed some pitfalls of parallel programming through small examples, but based on his experience of parallelising larger programs, gained through his work on the Parallel GHC project. He went more deeply into how to use Threadscope to debug parallel Haskell programs, and into the associated GHC event system. (Well-Typed have made significant improvements to Threadscope and the event system recently.) He ran his examples on an 8 core machine in Leipzig and we looked on enviously.

One thing we hope to do at the end of the course is to compile a list of suggestions for useful improvements to Threadscope. So keep a note of your ideas along these lines.

Slides etc.

Slides (pdf), code and some associated exercises

You would be well advised to study the code and to try some of the exercises.

Lecture 3 - The Par Monad

Mon study week 2, 13.15 - 15.00 in EB

This lecture is about a new programming model for deterministic parallelism, introduced by Simon Marlow. It introduces the Par Monad, a monad for deterministic parallelism, and shows how I-structures are used to exchange information between parallel tasks (or "blobs"). See Marlow's Haskell'11 paper with Ryan Newton and Simon PJ, and take a look at the I-Structures paper referred to in the lecture.
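As a rough illustration of the programming model, the sketch below uses the monad-par package (assumed to be installed). The function names and the threshold 15 are made up for this example. sumTwo shows IVars (the I-structures) used explicitly, and parFib shows the spawn/get shorthand.

```haskell
import Control.Monad.Par (Par, runPar, spawn, get, new, put, fork)

-- Naive Fibonacci, used only as a source of work.
nfib :: Int -> Integer
nfib n
  | n < 2     = 1
  | otherwise = nfib (n - 1) + nfib (n - 2)

-- Explicit style: each forked task writes its result into a
-- write-once IVar; `get` blocks until the value is available.
sumTwo :: Int -> Int -> Integer
sumTwo a b = runPar $ do
  iv <- new
  jv <- new
  fork (put iv (nfib a))
  fork (put jv (nfib b))
  x <- get iv
  y <- get jv
  return (x + y)

-- Shorthand: `spawn` forks a computation and returns the IVar
-- that will hold its result.
parFib :: Int -> Par Integer
parFib n
  | n < 15    = return (nfib n)
  | otherwise = do
      xv <- spawn (parFib (n - 1))
      y  <- parFib (n - 2)
      x  <- get xv
      return (x + y)

main :: IO ()
main = do
  print (sumTwo 24 25)
  print (runPar (parFib 25))
```

Because runPar is a pure function, the result is the same however the tasks are scheduled.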

Slides

[pdf]

Slides by Simon Marlow (2012)

[pdf]

Lecture 4 - Structured Parallel Programming (Kevin Hammond, St. Andrews)

Fri study week 2, 15.15 - 17.00 in EC

These lectures will introduce high-level structured parallel programming in Parallel Haskell using high-level patterns of parallelism. The parallel patterns that are introduced can be implemented using standard par/pseq constructs in Parallel Haskell building on algorithmic skeletons and evaluation strategy approaches. The lectures will introduce a range of data parallel and task parallel patterns, including bulk synchronous parallelism, map-reduce and parallel folds, and show how these can be implemented in Parallel Haskell.
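One of the simplest such patterns, a chunked parallel fold, might be sketched as follows. This is not Kevin's code: the names chunks and parFoldMap are hypothetical, and the sketch assumes that op is associative with identity z and that results are fully evaluated once in weak head normal form (as numbers are).

```haskell
import Control.Parallel.Strategies (parMap, rseq)
import Data.List (foldl')

-- Split a list into pieces of at most k elements (k is clamped to >= 1).
chunks :: Int -> [a] -> [[a]]
chunks k xs
  | null xs   = []
  | otherwise = let (h, t) = splitAt (max 1 k) xs in h : chunks k t

-- Map f over the input and combine the results with op.
-- Each chunk is reduced sequentially; the chunks are reduced in parallel.
parFoldMap :: Int -> (a -> b) -> (b -> b -> b) -> b -> [a] -> b
parFoldMap k f op z xs = foldl' op z (parMap rseq reduceChunk (chunks k xs))
  where
    reduceChunk = foldl' (\acc x -> acc `op` f x) z

main :: IO ()
main = print (parFoldMap 1000 (\x -> x * x) (+) 0 [1 .. 100000 :: Integer])
```

The chunk size controls granularity: chunks that are too small drown the useful work in spark overhead, while chunks that are too large leave cores idle.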

The paper about solving problems with parallel quicksort (and about execution replay) is available here. A slightly revised version will appear in the proceedings of TFP 2012 (copyright Springer).

Slides

[pdf]

Biography

Kevin Hammond has worked extensively in the field of advanced programming language design and implementation, with a focus on cost and performance issues. His work concentrates on functional language designs, including that of the standard non-strict functional language Haskell, where he served on the international design committee, and worked on the dominant compiler, GHC. Since receiving his PhD in 1989, he has published widely in the general area of parallel programming, producing over 80 books, book chapters, journal papers and other refereed publications focusing on parallel computing, domain-specific programming languages, real-time systems, cost issues, adaptive run-time environments, lightweight concurrency, high-level programming language design and performance monitoring/visualisation. He has run over 20 successful national and international research projects, and is a founder member of IFIP WG 2.11 (Generative Programming).

Lecture 5 - GPU Programming I (Joel Svensson)

Mon study week 3, 13.15 - 15.00 in EB

This lecture covers imperative and functional approaches to GPU programming. There are many references at the end, and the assignments page also points to references (for Lab B). The lecture will cover Obsidian, which is the topic of Joel's PhD thesis work. Other approaches to GPU programming in Haskell include Accelerate and Nikola, and the following papers are very interesting:

Manuel M.T. Chakravarty, Gabriele Keller, Sean Lee, Trevor L. McDonell, and Vinod Grover. Accelerating Haskell array codes with multicore GPUs.
In Proceedings of the sixth workshop on Declarative aspects of multicore programming, DAMP ’11, ACM, 2011.
[pdf]

Trevor L. McDonell, Manuel M. T. Chakravarty, Gabriele Keller, and Ben Lippmeier. Optimising Purely Functional GPU Programs.
Submitted to ICFP 2013.
[pdf]

Geoffrey Mainland and Greg Morrisett. Nikola: Embedding Compiled GPU Functions in Haskell.
Proceedings of the 2010 ACM SIGPLAN Symposium on Haskell (Haskell '10), 2010.
[pdf]

Slides

[pdf]

Lecture 6 - GPU Programming II (Joel Svensson)

Thu study week 3, 10.00 - 11.45 in EC

This lecture is about how Obsidian is implemented as a compiled embedded language in Haskell.

Slides

[pdf]

Joel would like to thank the students for the constructive feedback that they provided after his second lecture.

Lecture 7 - Skeletons (Jost Berthold, DIKU, Copenhagen Univ.)

Fri study week 3, 15.15 - 17.00 in EC

Jost discusses skeletons as a means to structure parallel computations -- viewing skeletons as higher order functions. He distinguishes three types of skeletons: small scale skeletons (like parMap), process communication topology skeletons, and proper algorithmic skeletons (like divide and conquer). He introduces the Eden dialect as a way to both implement and use skeletons in Haskell.
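The view of a skeleton as a higher-order function can be sketched in plain GHC Haskell. Note that the sketch below uses par and pseq rather than Eden's process abstractions, so it is an analogy and not Eden code. The names divConq and sumSquares are invented for the example.

```haskell
import Control.Parallel (par, pseq)

-- A divide-and-conquer skeleton: the caller supplies the problem-specific
-- parts, and the skeleton supplies the parallel structure.
divConq :: (p -> Bool)      -- is the problem small enough to solve directly?
        -> (p -> (p, p))    -- split a problem into two subproblems
        -> (s -> s -> s)    -- combine two sub-solutions
        -> (p -> s)         -- solve a small problem directly
        -> p -> s
divConq trivial split combine solve = go
  where
    go p
      | trivial p = solve p
      | otherwise =
          let (l, r) = split p
              sl     = go l
              sr     = go r
          in sl `par` (sr `pseq` combine sl sr)

-- An instance: the sum of squares over the interval [lo, hi].
sumSquares :: Int -> Int -> Integer
sumSquares lo hi = divConq small halve (+) direct (lo, hi)
  where
    small (a, b)  = b - a < 1000
    halve (a, b)  = let m = a + (b - a) `div` 2 in ((a, m), (m + 1, b))
    direct (a, b) = sum [fromIntegral i * fromIntegral i | i <- [a .. b]]

main :: IO ()
main = print (sumSquares 1 1000000)
```

Sparks created with par evaluate only to weak head normal form, so this instance returns a number; a skeleton instance that returns a lazy structure such as a list would need explicit forcing to do real work in parallel.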

Lecture 8 - Data Parallel Programming I

Mon study week 4, 13.15 - 15.00 in EB

This lecture is all about Guy Blelloch's seminal work on the NESL programming language and on parallel functional algorithms and associated cost models. The best introduction is to watch the video of his marvellous invited talk at ICFP 2010, which John and Mary had the pleasure to attend. There are pages about NESL and about his publications in general. For the notions of work and depth, see this part of the 1996 CACM paper, and also this page, which considers work and depth for three algorithms.
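As a small worked example of work and depth (a standard one, not necessarily the one used in the lecture), consider summing n numbers by splitting the input in half, summing both halves in parallel and adding the two results. Writing W for work and D for depth:

```latex
\begin{align*}
W(n) &= 2\,W(n/2) + O(1) = O(n) \\
D(n) &= D(n/2) + O(1) = O(\log n) \\
T_p  &\le \frac{W(n)}{p} + D(n)
      \quad\text{(greedy scheduling on $p$ processors)}
\end{align*}
```

The ratio W(n)/D(n) = O(n / log n) indicates how many processors the algorithm can keep usefully busy.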

Slides

[pdf]

Lecture 9 - Data Parallel Programming II

Thu study week 4, 10.00 - 11.45 in EC

This lecture gave a brief intro to Data Parallel Haskell by showing the Barnes-Hut algorithm in DPH. The slides contain a link to the original Barnes-Hut paper from Nature. It is great!

Next, the lecture covered Data parallel programming using the Repa library, which gives flat data parallelism. A main source is the Repa paper from ICFP 2010. And then there are two more Repa papers, one from Haskell'11 and one (on Repa 3) from Haskell'12.

Slides

[pdf]

Code

repaExLec9.hs

Lecture 10 - Parallel Programming in Erlang

Fri study week 4, 15.15 - 17.00 in EC

This lecture introduced Erlang for Haskell programmers, taking parallelising quicksort as an example, both within one Erlang VM and distributed across a network. The latest version of the Erlang system can be downloaded from here. There is a Windows installer. Many Linux versions have an Erlang package available, but not necessarily a package suitable for development of Erlang code, and not necessarily the latest version. On Ubuntu, try

sudo apt-get install erlang-dev

If that doesn't work or you can't find an appropriate package, build the VM from source.

Slides

[pdf]

Lecture 11 - Robust Erlang

Mon study week 5, 13.15 - 15.00 in EB

In 2012, this lecture focussed on the fault tolerance constructs in Erlang--links and system processes--and the motivation for the "Let It Crash" philosophy. It introduced supervision trees and the Open Telecom Platform, and developed a simple generic server.

Slides (2012)

[pdf]

Lecture 12 - A Report from the Real World (Lennart Augustsson, Standard Chartered)

Thu study week 5, 10:00 - 11:45 in EC

Lennart told us about how functional programming is used in the investment banking part of Standard Chartered. He explained how many of the pricing and risk analysis problems that demand heavy computation are embarrassingly parallel, so that a form of pmap is just about the only way that is used to express parallelism. A strategy parameter determines whether the resulting computation is run on multiple threads or processes on a local machine, or is sent off to a grid. The grid computations must be pure and Lennart stressed the usefulness of the type system of either Mu (a strict version of Haskell) or Haskell in enforcing this. He emphasised that putting the Quant library, Context (based on the financial contracts paper by Peyton Jones, Eber and Seward), into practical use at many sites around the world involved a lot of hard engineering work, related both to serialising data and functions and to coping with the fact that different sites may be running different versions of the library, on different architectures. Along the way, Lennart mentioned that it is well known that some programmers are ten times more productive than others, and pointed out that such programmers can, in fact, get paid ten times as much if they choose the right employer :)

Slides

[pdf]

Lecture 13 - Parallelism in Erlang (Patrik Nyblom, Ericsson)

Fri study week 5, 15:15 - 17:00 in EC

Scalable parallel programming in Erlang demands dividing tasks into sufficiently many processes, which can run in parallel, and which avoid heavy sequential parts, such as the last step of divide-and-conquer algorithms which is often the most expensive, and runs in parallel with nothing. But even then, congestion for shared resources can spoil performance. The lecture discussed ways of reducing congestion, both at the Erlang source level and in the virtual machine, for example by replacing one resource shared by n processes with n^2 resources shared by just two. "Invisible" shared resources, such as the scheduler queue(s) and the L3 cache can hit performance badly, so even Erlang programmers do need to be aware of architectural limitations such as cache sizes.

Slides

[pdf]

Lecture 14 - Efficient Parallel and Incremental Parsing of Practical Context-Free Languages (Jean-Philippe Bernardy)

Mon study week 6, 13:15 - 15:00 in EB

We present a divide-and-conquer algorithm for parsing that enables both parallel and incremental parsing of context-free languages in polylogarithmic time, under certain conditions that seem to hold in practice. These conditions occur for example when parsing program texts written by humans. Our algorithm is a refinement of Valiant's (1975), who reduced the problem of parsing to that of doing matrix multiplications, yielding sub-cubic complexity for the context-free recognition problem. We are able to obtain a much improved complexity result in practice, because the multiplications performed by Valiant's algorithm involve an overwhelming majority of empty matrices. Under our assumptions, our implementation of Valiant's algorithm takes O(n log^3 n) time when run sequentially; and O(log^4 n) when run using O(n) processors in parallel, or when making an incremental update.

Slides

[pdf]

Lecture 15 - Map-Reduce

Wed study week 6, 13:15 - 15:00 in EF

Google's Map-Reduce framework has become a popular approach for processing very large datasets in distributed clusters. Although originally implemented in C++, its connections with functional programming are close: the original inspiration came from the map and reduce functions in LISP; MapReduce is a higher-order function for distributed computing; purely functional behaviour of mappers and reducers is exploited for fault tolerance; it is ideal for implementation in Erlang. This lecture explains what Map-Reduce is, shows a sequential and a simple parallel implementation in Erlang, and discusses the refinements needed to make it work in reality.
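The lecture's implementations are in Erlang. Purely to illustrate the "higher-order function" view, here is a sequential sketch in Haskell, the course's other language. It assumes only the containers package that ships with GHC; the names mapReduce and wordCount, and the exact shape of the mapper and reducer types, are choices made for this example.

```haskell
import qualified Data.Map.Strict as Map

-- Sequential map-reduce: run the mapper over every input pair, group the
-- emitted values by key, then run the reducer once per key.
mapReduce :: Ord k2
          => (k1 -> v1 -> [(k2, v2)])    -- mapper
          -> (k2 -> [v2] -> [(k2, v3)])  -- reducer
          -> [(k1, v1)]                  -- input key/value pairs
          -> [(k2, v3)]
mapReduce mapper reducer input =
  concatMap (uncurry reducer) (Map.toList grouped)
  where
    mapped  = concatMap (uncurry mapper) input
    -- `flip (++)` keeps the values for each key in emission order.
    grouped = Map.fromListWith (flip (++)) [(k, [v]) | (k, v) <- mapped]

-- The classic instance: count word occurrences across named documents.
wordCount :: [(FilePath, String)] -> [(String, Int)]
wordCount = mapReduce mapper reducer
  where
    mapper _ contents = [(w, 1) | w <- words contents]
    reducer w counts  = [(w, sum counts)]

main :: IO ()
main = print (wordCount [("a.txt", "to be or not to be"), ("b.txt", "be quick")])
```

Because the mapper and reducer are pure, any call can be re-run on another worker if a machine fails, which is the fault-tolerance property mentioned above.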

Slides (2012)

[pdf]

Lecture 16 - Concurrency in the Real World (Richard Carlsson, Klarna)

Thu study week 7, 10:00 - 11:45 in EC

In 2012, Richard told us why Erlang is a good fit for Klarna, emphasizing that though Erlang's performance can, of course, be beaten, it lets you get close enough to the best possible performance in a very short time. He talked about designing for parallelism, for example splitting shared resources to reduce contention. Databases in parallel distributed systems bring consistency problems, and Richard explained the famous CAP-theorem. Finally he mentioned that Klarna are always hiring!

Slides

[pdf]

Lecture 17 - Cache complexity and parallelism (Nikita Frolov)

Mon study week 8, 13:15 - 15:00 in EB

Over the last few decades, performance of processors has grown at a much faster pace than performance of memories. The issue becomes even more severe with the advent of the (massive) multicore era. This gap is addressed by clever design and use of caches. One wouldn't be wrong to say that design of parallel computers is, above all, about caches. The quantitative study of algorithms in a parallel setting has already extended the time and space complexity analyses with notions of work and depth. In this lecture, we take one more step and show how to reason about the cache behavior of algorithms.

Slides

[pdf]

Reading

Cache-Oblivious Algorithms, Harald Prokop, MSc Thesis, MIT, 1999.
