Parallel Functional Programming

This page describes each lecture and contains links to related materials.

Course Introduction

This is the lecture that sets the scene for the course. It explains why functional languages are well suited for parallel programming, and this despite the fact that they really failed to deliver on the promise of solving parallel programming in the past. So why are they interesting now?

Slides:

The slides

Reading:

Note that we distinguish between parallel and concurrent programming. Our undergraduate curriculum at Chalmers has arguably overemphasized the latter (locks, semaphores, synchronisation mechanisms and the like) and underemphasized the business of making programs run faster by using many cores (see this discussion by Simon Marlow (previously Microsoft Research, now Facebook)), who makes distinctions that exactly mirror our view). Similar opinions were later expressed by Bob Harper, who has radically reshaped the introductory computing curriculum at CMU to place parallelism at the centre — a major source of inspiration for this course.

Simon Marlow’s book on Parallel and Concurrent Programming in Haskell gives a good explanation of why the topics of this course are interesting. It also makes the same distinction between concurrency and parallelism as that made in this course. We consider only Part I on parallelism. We will simply call the book PCPH.

Be inspired by this video of Simon Peyton Jones lecturing on parallel programming in Haskell
The three papers listed on the second last slide of the first lecture are
- Haskell on a Shared-Memory Multiprocessr, Harris, Marlow and Peyton Jones, Haskell’05
- Feedback Directed Implicit Parallelism, Harris and Singh, ICFP’07
- Runtime Support for Multicore Haskell, Marlow, Peyton Jones and Singh, ICFP’09
Make sure to read the last of these.

from par and pseq to Strategies

This lecture considers par and pseq more critically, and concludes that it might be a good idea to separate the control of behaviour relating to parallelisation from the description of the algorithm itself. The idea of Strategies is described in a well-known paper called Algorithms + Strategies = Parallelism by Trinder, Hammond, Loidl and Peyton Jones. More recently, Marlow and some of the original authors have updated the idea, in Seq no more: Better Strategies for Parallel Haskell. We expect you to read both of these papers. The lecture is based on the newer paper. See also PCPH chapters 2 and 3.

Slides: * The slides

Other Material:

The documentation of the Strategies Library is very helpful.

exercise session on parallelising Haskell

Code:

Haskell file

The Par Monad

This lecture is about a programming model for deterministic parallelism, introduced by Simon Marlow and colleagues. It introduces the Par Monad, a monad for deterministic parallelism, and shows how I-structures are used to exchange information between parallel tasks (or “blobs”), see Marlow’s Haskell’11 paper with Ryan Newton and Simon PJ. You should read this paper.

Take a look at the I-Structures paper referred to in the lecture (not obligatory but interesting). See PCPH chapter 4.

Also, Phil Wadler’s “Essence of Functional Programming” is a very interesting read, and it covers monads and continuation passing style.

The lecture starts with a presentation by Koen Claessen on his Poor Man’s Concurrency Monad (see his JFP Pearl).

Slides:

The version of the Par monad that Max described that allows you to draw pictures of your programs is at

ParMGraph

Parallel Programming in Erlang

This lecture introduces Erlang for Haskell programmers, taking parallelising quicksort as an example, both within one Erlang VM and distributed across a network. The latest version of the Erlang system can be downloaded from here. There is a Windows installer. Many linux versions have an Erlang packagage available, but not necessarily a package suitable for development of Erlang code, and not necessarily the latest version. On Ubuntu, try

sudo apt-get install erlang-dev

If that doesn’t work or you can’t find an appropriate package, build the VM from source.

Slides:

The slides

exercise session on parallel programming in Erlang

The slides and code are given as a tarball, as the slides are in a html file.

The sources are in the sat-src subdirectory. The files sat{1..4}.erl contain the solver broken into parts; sat.erl contains the whole thing, so you can run it.

Anton has also included the source for a heavily commented Haskell version of the Sudoku solver that he wrote when getting into the code. This may help you to understand the SAT solver (though not how to parallelize it).

Code:

Files, tarball

Robust Erlang

This lecture focusses on the fault tolerance constructs in Erlang–links and system processes–and the motivation for the “Let It Crash” philosophy. It introduces supervision trees and the Open Telecoms Platform, and develops a simple generic server.

Slides :

The slides

Parallel Functional Programming in Erlang at Klarna (Richard Carlsson)

Slides:

The slides

The Erlang Virtual Machine (Erik Stenman)

Slides:

The slides

Erlang Parallel Search

Slides:

The slides

Map Reduce

Google’s Map-Reduce framework has become a popular approach for processing very large datasets in distributed clusters. Although originally implemented in C++, it’s connections with functional programming are close: the original inspiration came from the map and reduce functions in LISP; MapReduce is a higher-order function for distributed computing; purely functional behaviour of mappers and reducers is exploited for fault tolerance; it is ideal for implementation in Erlang. This lecture explains what Map-Reduce is, shows a sequential and a simple parallel implementation in Erlang, and discusses the refinements needed to make it work in reality.

Reading:

One of
MapReduce: Simplified Data Processing on Large Clusters, the original paper from 2004.
MapReduce: Simplified Data Processing on Large Clusters, a retrospective published in CACM in 2008.

Yes, both papers have the same title (and the same authors). What can you do?

Slides:

The slides

Parallel Functional Programming in Java (Peter Sestoft)

It has long been assumed in academic circles that functional programming, and declarative processing of streams of immutable data, are convenient and effective tools for parallel programming. Evidence for this is now provided, paradoxically, by the object-imperative Java language, whose version 8 (from 2014) supports functional programming, parallelizable stream processing, and parallel array prefix operations. We illustrate some of these features and use them to solve computational problems that are usually handled by (hard to parallelize) for-loops, and also combinatorial problems such as the n-queens problem, using only streams, higher-order functions and recursion. We show that this declarative approach leads to very good performance on shared-memory multicore machines with a near-trivial parallelization effort on this widely used programming platform. We also highlight a few of the warts caused by the embedding in Java. Some of the examples presented are from Sestoft: Java Precisely, 3rd edition, MIT Press 2016.

Slides:

The slides

Data Parallel Programming I

This lecture is all about Guy Blelloch’s seminal work on the NESL programming language, and on parallel functional algorithms and associated cost models.

Material:

The best introduction is to watch the video of his marvellous invited talk at ICFP 2010
A page about NESL (including interactive tutorial, papers and information about applications)

Reading:

To read about Work and Depth, start with this page. Note that there will be an exam question about calculating work and depth of a NESL program. So study examples!
We advise reading the whole of Blelloch’s Programming Parallel Algorithms, CACM 39(3).

Slides:

The slides

Single Assignment C — Functional Programming for HP^3 (Sven-Bodo Scholz)

SaC is designed to combine High-Productivity with High-Performance and High-Portability. The key to achieving this goal is a purely functional core of the language combined with several advanced compilation and runtime techniques. This lecture gives an overview of the key design choices that SaC is based upon and it sketches how these can be leveraged to producing codes for various heterogeneous many-core systems that often outperform hand-written low-level counterparts.

Slides:

The slides

Data Parallel Programming II

We briefly present some details of (and some non-idiomatic programming in) Repa (a library for data parallel programming in Haskell).

Material:

Chapter 5 of Marlow’s book is about Repa
The Repa home page
The third Repa paper: Guiding Parallel Array Fusion with Indexed Types

Slides:

The slides

General Purpose Computations on GPUs

Graphics Processing Units (GPUs) are massively parallel computers that offer great performance benefits over traditional CPUs for highly data parallel problems. The cost of this performance benefit is a more complicated software development procedure. When programming GPUs the programmer needs to manage, layout and store intermediate or often used values manually in a scratchpad memory. This can be compared to the transparent service provided caches in a CPU. GPUs thrive under workloads consisting of (tens of) thousands of independent threads, all doing the same work and exhibiting highly regular memory access patterns.

For achieving optimal performance on a GPU, programmers often specialize code for a particular problem size and decomposition of work over the available resources. This is because the cost of dynamic choice within the threads running on the GPU is high. The choices that leads to the best performance may also differ between different GPUs. The cost of experimentation with program decomposition is high in a language such as CUDA, where changing a decision of work-to-thread mapping made early may mean a complete rewrite of the application code.

Obsidian is an embedded language for design space exploration. The idea is to raise the level of abstraction enough to enable a faster turn around time when experimenting with decompositions of work onto the GPU resources. Using Obsidian it is possible to write parameterized, higher level, descriptions of algorithms and to generate specialized CUDA code for each parameter setting. Thus once the high level description is written, the variants are generated by the a small tweak of a parameter.

In paper [1], we outline the general ideas and goal behind Obsidian without going in-depth on any details. In Paper [2], we combine parameterized Obsidian programs with an auto-tuning system to do the parameter exploration automatically. For a long and very in-depth description of Obsidian you can refer to our JFP paper [3]. This is not required reading.

References:

[1] Bo Joel Svensson, Mary Sheeran, Ryan R. Newton Design Exploration through Code-generating DSLs

[2] Michael Vollmer, Bo Joel Svensson, Eric Holk, Ryan R. Newton Meta-programming and auto-tuning in the search for high performance GPU code

[3] Bo Joel Svensson, Ryan R. Newton, Mary Sheeran A language for hierarchical data parallel design-space exploration on GPUs

Slides:

The slides

To learn more:

A Udacity course about GPU programming that also gives access to GPUs for running CUDA code (for doing assignments). We have not tried this. Let us know what you think of it!

Skeletons for Parallel Scientific Computing (David Duke, Leeds Univ.)

This lecture, will unfortunately, not be held this year because David Duke has been taken ill. We wish him a speedy recovery and hope to see him back next year! We are leaving the abstract and slides up because this is really impressive and interesting work, and a good example of parallel programming in Haskell. Take a look.

Scientific datasets arising from observation (e.g. satellites, microscopes) or simulation (supercomputing) are simply files of numbers. While small data (kilobytes and megabytes) are sometimes valuable, more common examples are at the gigabyte/terabyte scale, with peta-byte datasets now routine. And work is underway to develop machines capable of processing exabyte-scale data. But numbers alone are useless to scientists - value comes from converting them into forms that answer open questions of shed new insight into the phenomena being studied. In computational science this step is achieved through a combination of mathematical analysis to extract structural features of the data, and visualization to present those features.

Given the size of datasets in computational science, parallel computing is routine. But while some algorithms are “embarrassingly parallel”, others have proven more difficult. A particularly awkward class are those for extracting topological abstractions, which provide highly compact and useful descriptors of the “shape” of phenomena. In this lecture I will introduce two of these abstractions, and then explain how we exploited Haskell to carry out analysis of data from nuclear simulations, first using a sequential implementation, and then moving to parallel for scalability. This work, carried out in collaboration with Nicholas Schunck, a physicist at the Lawrence Livermore National Laboratory in the US, has resulted in multiple papers including two publications [1],[2] in Physical Review C, and new insight into the process of nuclear fission.

The lecture will highlight an benefit of parallel programming in Haskell or indeed other functional languages: the opportunity to develop new higher-order abstractions, in the form of “skeletons” that capture patterns of computation. In principle these afford concise and elegant construction of parallel computations by applying computational patterns to problem-specific code. The reality is somewhat messier: effective use of parallel resources sometimes requires deep domain-specific knowledge and the creation of specialised skeletons. In the lecture I will illustrate these challenges using examples from computational topology, along with the steps needed to tune performance of our parallel applications on both shared and distributed memory architectures. Finally, I will also touch on some of the non-technical challenges in working with Haskell.

To read more about algorithmic skeletons, the recommended reference is

Parallel Functional Programming in Eden, R. Loogen, Y. Ortega-Mallen, and R. Pena-Mari, Journal of Functional Programming, 15(3), 431-475, 2005

Slides :

The slides (pdf)
The slides (pptx)

Databases in the New World

No-SQL databases have become very popular for the kind of scalable applications that Erlang is used for. In this lecture, we introduce the mother of them all, Amazon’s Dynamo, and one of its descendants – Riak, implemented in Erlang by Basho Technologies. We discuss scalability, the CAP theorem, eventual consistency, consistent hashing and the ring, and the mechanisms used to detect, tolerate, and repair inconsistency.

Reading:

The key reference is the Dynamo paper.
For amusing and informative background reading, check out The Network is Reliable (yeah right).

Slides:

The slides

Simple Tips & Tricks for Map-Reduce and Big Data Databases (Jyrki Nummenmaa)

These lectures are a follow-up attempt to the lectures presented thus far on these topics. In the Map-Reduce part we will take a set of example problems and present some approaches to solve them. In the Big Data Databases part we will consider the end-user data needs and some issues related to how to organise the data for particular applications.

The first two chapters of this book provide an introduction to MapReduce, and they would be good to look through. The rest contains extensive examples of MapReduce.

Slides:

The slides

The speaker is professor Jyrki Nummenmaa, expert in databases from the University of Tampere, currently on a two-month research visit at the department of Computer Science and Engineering at Chalmers/GU.

Parallel Functional Programming – Lecture content	DAT280 / DIT261, LP4 2017
Home \| Schedule \| Labs \| Lectures \| Exam \| About	Fire \| Forum \| TimeEdit \| Links

Parallel Functional Programming – Lecture content	DAT280 / DIT261, LP4 2017
Home \| Schedule \| Labs \| Lectures \| Exam \| About	Fire \| Forum \| TimeEdit \| Links

Course Introduction

from par and pseq to Strategies

exercise session on parallelising Haskell

The Par Monad

Parallel Programming in Erlang

exercise session on parallel programming in Erlang

Robust Erlang

Parallel Functional Programming in Erlang at Klarna (Richard Carlsson)

The Erlang Virtual Machine (Erik Stenman)

Erlang Parallel Search

Map Reduce

Parallel Functional Programming in Java (Peter Sestoft)

Data Parallel Programming I

Single Assignment C — Functional Programming for HP^3 (Sven-Bodo Scholz)

Data Parallel Programming II

General Purpose Computations on GPUs

Skeletons for Parallel Scientific Computing (David Duke, Leeds Univ.)

Databases in the New World

Simple Tips & Tricks for Map-Reduce and Big Data Databases (Jyrki Nummenmaa)

A Brief History of Time (in Riak) (Russell Brown)