Parallel Functional Programming – Lecture content
DAT280 / DIT261, LP4 2017

This page describes each lecture and contains links to related materials.

Course Introduction

This lecture sets the scene for the course. It explains why functional languages are well suited to parallel programming, even though in the past they failed to deliver on the promise of solving parallel programming. So why are they interesting now?

Slides:

Reading:

Simon Marlow’s book on Parallel and Concurrent Programming in Haskell gives a good explanation of why the topics of this course are interesting. It also makes the same distinction between concurrency and parallelism as that made in this course. We consider only Part I on parallelism. We will simply call the book PCPH.


From par and pseq to Strategies

This lecture considers par and pseq more critically, and concludes that it might be a good idea to separate the control of behaviour relating to parallelisation from the description of the algorithm itself. The idea of Strategies is described in a well-known paper called Algorithms + Strategies = Parallelism by Trinder, Hammond, Loidl and Peyton Jones. More recently, Marlow and some of the original authors have updated the idea, in Seq no more: Better Strategies for Parallel Haskell. We expect you to read both of these papers. The lecture is based on the newer paper. See also PCPH chapters 2 and 3.

Slides: The slides

Other Material:

exercise session on parallelising Haskell

Code:


The Par Monad

This lecture is about a programming model for deterministic parallelism, introduced by Simon Marlow and colleagues. It introduces the Par Monad, a monad for deterministic parallelism, and shows how I-structures are used to exchange information between parallel tasks (or “blobs”); see Marlow’s Haskell’11 paper with Ryan Newton and Simon PJ. You should read this paper.

Take a look at the I-Structures paper referred to in the lecture (not obligatory but interesting). See PCPH chapter 4.

Also, Phil Wadler’s “Essence of Functional Programming” is a very interesting read, and it covers monads and continuation passing style.

The lecture starts with a presentation by Koen Claessen on his Poor Man’s Concurrency Monad (see his JFP Pearl).

Slides:

The version of the Par monad that Max described that allows you to draw pictures of your programs is at

Parallel Programming in Erlang

This lecture introduces Erlang for Haskell programmers, taking parallelising quicksort as an example, both within one Erlang VM and distributed across a network. The latest version of the Erlang system can be downloaded from here. There is a Windows installer. Many Linux versions have an Erlang package available, but not necessarily a package suitable for development of Erlang code, and not necessarily the latest version. On Ubuntu, try

If that doesn’t work or you can’t find an appropriate package, build the VM from source.

Slides:

exercise session on parallel programming in Erlang

The slides and code are given as a tarball, as the slides are in an HTML file.

The sources are in the sat-src subdirectory. The files sat{1..4}.erl contain the solver broken into parts; sat.erl contains the whole thing, so you can run it.

Anton has also included the source for a heavily commented Haskell version of the Sudoku solver that he wrote when getting into the code. This may help you to understand the SAT solver (though not how to parallelize it).

Code:

Robust Erlang

This lecture focusses on the fault tolerance constructs in Erlang (links and system processes) and the motivation for the “Let It Crash” philosophy. It introduces supervision trees and the Open Telecom Platform (OTP), and develops a simple generic server.

Slides:

Parallel Functional Programming in Erlang at Klarna (Richard Carlsson)

Slides:

The Erlang Virtual Machine (Erik Stenman)

Slides:


Map Reduce

Google’s Map-Reduce framework has become a popular approach for processing very large datasets in distributed clusters. Although originally implemented in C++, its connections with functional programming are close: the original inspiration came from the map and reduce functions in LISP; MapReduce is a higher-order function for distributed computing; purely functional behaviour of mappers and reducers is exploited for fault tolerance; it is ideal for implementation in Erlang. This lecture explains what Map-Reduce is, shows a sequential and a simple parallel implementation in Erlang, and discusses the refinements needed to make it work in reality.
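To make the "higher-order function" view concrete, here is a minimal sequential sketch in Python (not the Erlang code from the lecture). The names map_reduce_seq, word_count_mapper and word_count_reducer are hypothetical, chosen only for this illustration, and the sketch assumes that a mapper turns one input pair into intermediate key/value pairs and that a reducer combines all values collected for one key.

```python
from collections import defaultdict


def map_reduce_seq(mapper, reducer, inputs):
    """Sequential map-reduce over a list of (key, value) input pairs.

    mapper(key, value) yields intermediate (k2, v2) pairs;
    reducer(k2, values) yields output pairs for one intermediate key.
    Both names and signatures are assumptions made for this sketch.
    """
    # Map phase: apply the mapper to every input and group results by key.
    intermediate = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in mapper(key, value):
            intermediate[k2].append(v2)

    # Reduce phase: apply the reducer to each group, in sorted key order.
    output = []
    for k2 in sorted(intermediate):
        output.extend(reducer(k2, intermediate[k2]))
    return output


def word_count_mapper(_name, text):
    # Emit (word, 1) for every word in the document body.
    for word in text.split():
        yield (word.lower(), 1)


def word_count_reducer(word, counts):
    # Sum the occurrences collected for one word.
    yield (word, sum(counts))


docs = [("a.txt", "the quick fox"), ("b.txt", "the lazy dog the end")]
result = map_reduce_seq(word_count_mapper, word_count_reducer, docs)
```

Because the mapper and reducer here are pure functions of their arguments, any single call can be rerun without changing the result, which is the property the paragraph above says is exploited for fault tolerance.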

Reading:

Yes, both papers have the same title (and the same authors). What can you do?

Slides:

Parallel Functional Programming in Java (Peter Sestoft)

It has long been assumed in academic circles that functional programming, and declarative processing of streams of immutable data, are convenient and effective tools for parallel programming. Evidence for this is now provided, paradoxically, by the object-imperative Java language, whose version 8 (from 2014) supports functional programming, parallelizable stream processing, and parallel array prefix operations. We illustrate some of these features and use them to solve computational problems that are usually handled by (hard to parallelize) for-loops, and also combinatorial problems such as the n-queens problem, using only streams, higher-order functions and recursion. We show that this declarative approach leads to very good performance on shared-memory multicore machines with a near-trivial parallelization effort on this widely used programming platform. We also highlight a few of the warts caused by the embedding in Java. Some of the examples presented are from Sestoft: Java Precisely, 3rd edition, MIT Press 2016.

Slides:

Data Parallel Programming I

This lecture is all about Guy Blelloch’s seminal work on the NESL programming language, and on parallel functional algorithms and associated cost models.
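As a small illustration of what such a cost model measures, the Python sketch below assumes the work/depth style of accounting: work is the total number of operations, and depth is the longest chain of operations that must run one after another. The function par_sum is hypothetical and runs sequentially; it only pretends that the two recursive calls execute in parallel, counts one unit per addition, and ignores the cost of slicing the list.

```python
def par_sum(xs):
    """Return (total, work, depth) for a balanced divide-and-conquer sum.

    The two recursive calls are treated as if they ran in parallel:
    their work adds up, while their depth is the maximum of the two.
    Only additions are counted; list slicing is assumed to be free.
    """
    n = len(xs)
    if n == 0:
        return (0, 0, 0)
    if n == 1:
        return (xs[0], 0, 0)
    mid = n // 2
    left, work_l, depth_l = par_sum(xs[:mid])
    right, work_r, depth_r = par_sum(xs[mid:])
    # One extra addition combines the halves and adds one level of depth.
    return (left + right, work_l + work_r + 1, max(depth_l, depth_r) + 1)


total, work, depth = par_sum(list(range(1, 9)))
```

For a list of n elements, the work in this sketch is n - 1 additions while the depth grows only logarithmically in n, which is the kind of gap that indicates available parallelism.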

Material:

Reading:

Slides:

Single Assignment C — Functional Programming for HP^3 (Sven-Bodo Scholz)

SaC is designed to combine High-Productivity with High-Performance and High-Portability. The key to achieving this goal is a purely functional core of the language combined with several advanced compilation and runtime techniques. This lecture gives an overview of the key design choices that SaC is based upon, and it sketches how these can be leveraged to produce code for various heterogeneous many-core systems that often outperforms hand-written low-level counterparts.

Slides:

Data Parallel Programming II

We briefly present some details of (and some non-idiomatic programming in) Repa (a library for data parallel programming in Haskell).

Material:

Slides:

General Purpose Computations on GPUs

Graphics Processing Units (GPUs) are massively parallel computers that offer great performance benefits over traditional CPUs for highly data parallel problems. The cost of this performance benefit is a more complicated software development procedure. When programming GPUs, the programmer needs to manage, lay out and store intermediate or often-used values manually in a scratchpad memory. This can be compared to the transparent service provided by caches in a CPU. GPUs thrive under workloads consisting of (tens of) thousands of independent threads, all doing the same work and exhibiting highly regular memory access patterns.

To achieve optimal performance on a GPU, programmers often specialize code for a particular problem size and decomposition of work over the available resources. This is because the cost of dynamic choice within the threads running on the GPU is high. The choices that lead to the best performance may also differ between different GPUs. The cost of experimentation with program decomposition is high in a language such as CUDA, where changing an early decision about work-to-thread mapping may mean a complete rewrite of the application code.

Obsidian is an embedded language for design space exploration. The idea is to raise the level of abstraction enough to enable a faster turnaround time when experimenting with decompositions of work onto the GPU resources. Using Obsidian it is possible to write parameterized, higher level descriptions of algorithms and to generate specialized CUDA code for each parameter setting. Thus, once the high level description is written, the variants are generated by a small tweak of a parameter.

In paper [1], we outline the general ideas and goals behind Obsidian without going in depth on any details. In paper [2], we combine parameterized Obsidian programs with an auto-tuning system to do the parameter exploration automatically. For a long and very in-depth description of Obsidian you can refer to our JFP paper [3]. This is not required reading.

References:

[1] Bo Joel Svensson, Mary Sheeran, Ryan R. Newton. Design Exploration through Code-generating DSLs.

[2] Michael Vollmer, Bo Joel Svensson, Eric Holk, Ryan R. Newton. Meta-programming and auto-tuning in the search for high performance GPU code.

[3] Bo Joel Svensson, Ryan R. Newton, Mary Sheeran. A language for hierarchical data parallel design-space exploration on GPUs.

Slides:

To learn more:

Skeletons for Parallel Scientific Computing (David Duke, Leeds Univ.)

This lecture will, unfortunately, not be held this year because David Duke has been taken ill. We wish him a speedy recovery and hope to see him back next year! We are leaving the abstract and slides up because this is really impressive and interesting work, and a good example of parallel programming in Haskell. Take a look.

Scientific datasets arising from observation (e.g. satellites, microscopes) or simulation (supercomputing) are simply files of numbers. While small data (kilobytes and megabytes) are sometimes valuable, more common examples are at the gigabyte/terabyte scale, with petabyte datasets now routine. And work is underway to develop machines capable of processing exabyte-scale data. But numbers alone are useless to scientists: value comes from converting them into forms that answer open questions or shed new insight into the phenomena being studied. In computational science this step is achieved through a combination of mathematical analysis to extract structural features of the data, and visualization to present those features.

Given the size of datasets in computational science, parallel computing is routine. But while some algorithms are “embarrassingly parallel”, others have proven more difficult. A particularly awkward class are those for extracting topological abstractions, which provide highly compact and useful descriptors of the “shape” of phenomena. In this lecture I will introduce two of these abstractions, and then explain how we exploited Haskell to carry out analysis of data from nuclear simulations, first using a sequential implementation, and then moving to parallel for scalability. This work, carried out in collaboration with Nicholas Schunck, a physicist at the Lawrence Livermore National Laboratory in the US, has resulted in multiple papers including two publications [1],[2] in Physical Review C, and new insight into the process of nuclear fission.

The lecture will highlight a benefit of parallel programming in Haskell or indeed other functional languages: the opportunity to develop new higher-order abstractions, in the form of “skeletons” that capture patterns of computation. In principle these afford concise and elegant construction of parallel computations by applying computational patterns to problem-specific code. The reality is somewhat messier: effective use of parallel resources sometimes requires deep domain-specific knowledge and the creation of specialised skeletons. In the lecture I will illustrate these challenges using examples from computational topology, along with the steps needed to tune performance of our parallel applications on both shared and distributed memory architectures. Finally, I will also touch on some of the non-technical challenges in working with Haskell.

To read more about algorithmic skeletons, the recommended reference is

Parallel Functional Programming in Eden, R. Loogen, Y. Ortega-Mallen, and R. Pena-Mari, Journal of Functional Programming, 15(3), 431-475, 2005

Slides:

Databases in the New World

No-SQL databases have become very popular for the kind of scalable applications that Erlang is used for. In this lecture, we introduce the mother of them all, Amazon’s Dynamo, and one of its descendants – Riak, implemented in Erlang by Basho Technologies. We discuss scalability, the CAP theorem, eventual consistency, consistent hashing and the ring, and the mechanisms used to detect, tolerate, and repair inconsistency.
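The idea behind consistent hashing and the ring can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not a reproduction of Dynamo's or Riak's actual partitioning scheme: the class HashRing, the helper ring_position, the use of SHA-1, and the default of 8 virtual nodes per physical node are all choices made for this example.

```python
import bisect
import hashlib


def ring_position(label):
    """Map a string to an integer point on the ring (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(label.encode("utf-8")).digest(), "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=8):
        self.vnodes = vnodes
        self.points = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place several virtual nodes per physical node to spread the load.
        for i in range(self.vnodes):
            bisect.insort(self.points, (ring_position(f"{node}#{i}"), node))

    def remove_node(self, node):
        # Drop all virtual nodes that belong to the departing node.
        self.points = [(pos, n) for (pos, n) in self.points if n != node]

    def owner(self, key):
        """Return the node of the first point at or after the key, wrapping around."""
        i = bisect.bisect_left(self.points, (ring_position(key),))
        return self.points[i % len(self.points)][1]


ring = HashRing(["node_a", "node_b", "node_c"])
keys = [f"key{i}" for i in range(200)]
before = {k: ring.owner(k) for k in keys}
ring.remove_node("node_b")
after = {k: ring.owner(k) for k in keys}
```

Comparing before and after shows the property that makes the scheme attractive for scalable stores: when a node leaves, only the keys it owned are reassigned, and every other key stays where it was.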

Reading:

Slides:

A Brief History of Time (in Riak) (Russell Brown, Basho)

Slides:

Reading:

Note that the slides contain links to the papers mentioned in the lecture.