This is the lecture that sets the scene for the course. It introduces what we mean by parallelism, and distinguishes it from concurrency. It explains why functional languages are well suited to parallel programming, despite the fact that they largely failed to deliver on that promise in the past. So why are they interesting now?
Simon Marlow's book on Parallel and Concurrent Programming in Haskell gives a good explanation of why the topics of this course are interesting. It also makes the same distinction between concurrency and parallelism as that made in this course. We consider only Part I on parallelism. We will simply call the book PCPH. Simon will kindly give us a guest lecture in Week 5 of the course.
Haskell on a Shared-Memory Multiprocessor, Harris, Marlow and Peyton Jones, Haskell'05
Feedback Directed Implicit Parallelism, Harris and Singh, ICFP'07
Runtime Support for Multicore Haskell, Marlow, Peyton Jones and Singh, ICFP'09
This lecture considers par and pseq more critically, and concludes that it might be a good idea to separate the control of behaviour relating to parallelisation from the description of the algorithm itself. The idea of Strategies is described in a well-known paper called Algorithms + Strategies = Parallelism by Trinder, Hammond, Loidl and Peyton Jones. More recently, Marlow and some of the original authors have updated the idea, in Seq no more: Better Strategies for Parallel Haskell. We expect you to read both of these papers. The lecture is based on the newer paper. See also PCPH chapters 2 and 3.
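To give a taste of the separation the Strategies papers advocate, here is a minimal sketch (it assumes the `parallel` package; `expensive` is a made-up stand-in for real work): the algorithm is an ordinary `map`, and the parallel behaviour is added afterwards with `using`.

```haskell
import Control.Parallel.Strategies (parList, rseq, using)

-- The algorithm: a plain map.
-- The strategy: evaluate the list's elements in parallel.
process :: [Integer] -> [Integer]
process xs = map expensive xs `using` parList rseq
  where
    expensive n = sum [1 .. n]  -- made-up stand-in for real work
```

The point is that deleting the `` `using` `` clause gives back exactly the sequential program, which is the separation of concerns the papers argue for.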
See above for papers. Read PCPH chapters 2 and 3.
The documentation of the Strategies Library is very helpful.
Nick is the course assistant (TA). In this lecture, he will tell you things that you need to know to do a good job on the labs in the course, based on experience from previous years.
There is only so much parallelism the memory system can handle (an effect known as the "memory wall"). While both functional and imperative languages use a heap to manage memory, the behaviour of programs written in pure languages like Haskell is radically different from that of programs written with aggressive use of side effects: there is no mutation of data, but much more allocation of it. We will review the major design decisions behind GHC's heap implementation, including garbage collection, multithreading and I/O management. We will also take a look at how tweaking the heap's runtime parameters can affect a program's performance, with the help of ThreadScope.
This lecture is about a programming model for deterministic parallelism, introduced by Simon Marlow and colleagues. It introduces the Par Monad, a monad for deterministic parallelism, and shows how I-structures are used to exchange information between parallel tasks (or "blobs"), see Marlow's Haskell'11 paper with Ryan Newton and Simon PJ. Take a look at the I-Structures paper referred to in the lecture. See PCPH chapter 4.
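A minimal sketch of the Par-monad style (assuming the `monad-par` package; the `fib` workload is just an illustration): two forked tasks pass their results back through IVars.

```haskell
import Control.Monad.Par (runPar, new, put, get, fork)

-- Two forked tasks communicate their results through IVars.
twoFibs :: (Int, Int)
twoFibs = runPar $ do
    i <- new                  -- create an empty IVar
    j <- new
    fork (put i (fib 20))     -- each task writes its IVar exactly once
    fork (put j (fib 21))
    a <- get i                -- get blocks until the IVar is full
    b <- get j
    return (a, b)
  where
    fib :: Int -> Int
    fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)
```

Because each IVar is written at most once, the result is the same whatever the scheduling, which is what makes the model deterministic.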
This lecture is all about Guy Blelloch's seminal work on the NESL programming language and on parallel functional algorithms and associated cost models. The best introduction is to watch the video of his marvellous invited talk at ICFP 2010, which John and Mary had the pleasure to attend. There are pages about NESL and about his publications in general. For the notions of work and depth, see this part of the 1996 CACM paper, and also this page, which considers work and depth for three algorithms.
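To make the work/depth notions concrete, here is a base-only Haskell sketch (my own illustration, not Blelloch's code) that instruments a balanced tree reduction: combining n values pairwise costs n − 1 units of work but only about log₂ n depth, since the two halves could run in parallel.

```haskell
-- Instrumented balanced reduction over a non-empty list:
-- returns (result, work, depth), counting one unit per combine.
reduceTree :: (a -> a -> a) -> [a] -> (a, Int, Int)
reduceTree _ [x] = (x, 0, 0)
reduceTree f xs  = (f l r, wl + wr + 1, 1 + max dl dr)
  where
    (ls, rs)    = splitAt (length xs `div` 2) xs
    (l, wl, dl) = reduceTree f ls   -- the two halves are independent,
    (r, wr, dr) = reduceTree f rs   -- so depth takes their max, work their sum
```

For 8 elements this gives work 7 and depth 3: linear work, logarithmic depth.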
This part of the course covers data-parallel programming using the Repa library, which provides flat data parallelism in Haskell. I decided that your best chance of learning about it is by self study (using my pointers to material) rather than in a lecture. So I have cancelled the lecture and would like you to spend the time saved on studying the materials mentioned below and on programming using Repa (which you need to do in Lab B). My suggestion is that you first read chapter 5 of PCPH. Then read the third Repa paper: the paper on Repa 3 from Haskell'12. After that, you probably want to at least look at the first Repa paper from ICFP 2010. For completeness, I also mention the paper in between these two: the second Repa paper from Haskell'11. Through the Repa home page, you can find the source repos on GitHub and much else.
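As a first flavour of the library before you dive into the papers, here is a tiny sketch (assuming the `repa` package; `doubleAll` is my own example name) of a flat data-parallel map over an unboxed array:

```haskell
import qualified Data.Array.Repa as R
import Data.Array.Repa (Z (..), (:.) (..), U, DIM1)

-- A flat data-parallel map over an unboxed one-dimensional array.
-- computeP evaluates the delayed array in parallel (run with +RTS -N).
doubleAll :: R.Array U DIM1 Int -> IO (R.Array U DIM1 Int)
doubleAll arr = R.computeP (R.map (* 2) arr)

example :: IO [Int]
example = do
    let xs = R.fromListUnboxed (Z :. 5) [1 .. 5 :: Int]
    ys <- doubleAll xs
    return (R.toList ys)
```

Note the division of labour: `R.map` builds a delayed array, and only `computeP` forces it, in parallel, into an unboxed manifest array.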
Also, I found this video about Haskell at the NYT and it seems to me to be a very interesting Repa success story.
Below I include the slides from the lecture that I gave in 2014 (which made me think that a lecture was not the ideal way to present this material). Importantly, the slides contain a link to an informative video by Ben Lippmeier, so don't miss that. The sorting network example that I covered in the lecture was chosen as a test of whether Repa is good for rather simple data parallel programming that does not make use of the shape polymorphism that is an important part of Repa. I wanted to see if I could still get decent performance from short, pretty naive, simple code. And the answer was yes! Indeed, later some colleagues from Copenhagen tried to redo the tree shaped sorter using more idiomatic Repa (with shape polymorphism). The resulting performance was at least a factor of two worse. So there is more research to do ... I note that Repa is still under development. We could probably find interesting masters thesis projects in this area! Note that you should devote most of your time to studying more standard Repa examples :)
Note that the lecture starts with a very brief overview of Data Parallel Haskell, using the famous Barnes-Hut algorithm as an example. (Take a look at the Barnes-Hut paper (to which there is a link) if only to enjoy a CS paper appearing in Nature (the journal)!) We will not be presenting Data Parallel Haskell further (though it is a really interesting project).
A life insurance or pensions company typically has hundreds of thousands of customers (the future retirees) and multiple very precise and well-formalized contracts with each customer. These contracts are long-running: a contract entered with a 25 year old woman today may still be in force and have to be managed in year 2090. Computing the company's obligations, in terms of reserves (the expected net present value of future payments) and cashflows (the expected distribution of these payments over time), is crucial to estimating the company's financial health. There are strong regulatory regimes mandating such estimates, for instance the forthcoming EU Solvency 2 rules.
In the lecture, we first outline this application area.
Second, we present the development of a domain-specific language, the Actulus Modeling Language, for describing a broad range of life insurance and pension contracts based on state models of the insured lives.
Third, we show that General-Purpose Graphics Processing Units (GPGPUs) are well suited for computing reserves and cashflows. These quantities are described by Thiele's differential equations (1875), which can be solved numerically with high precision and efficiency, even for the otherwise rather challenging so-called collective (e.g. spouse or child) pension products.
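To give a flavour of the computation (this is my own toy sketch with made-up parameters, not the Actulus code): for a single-state term insurance, Thiele's equation reads V′(t) = r·V(t) + π − μ(t)·(b − V(t)), and the reserve is obtained by integrating backwards in time from the terminal condition V(T) = 0.

```haskell
-- Toy backward Euler-style integration of Thiele's ODE for a
-- single-state term insurance (all parameters are made up):
--   V'(t) = r*V(t) + premium - mu(t)*(benefit - V(t)),  V(T) = 0
reserve :: Double -> Double
reserve t0 = go 35 0                         -- contract runs until T = 35 years
  where
    r       = 0.03                           -- interest rate
    premium = 0.01                           -- continuous premium rate
    benefit = 1.0                            -- sum insured, paid on death
    mu t    = 0.0005 * exp (0.08 * t)        -- toy Gompertz-style mortality
    h       = 0.001                          -- step size (years)
    go t v
      | t <= t0   = v
      | otherwise = let v' = v - h * (r * v + premium - mu t * (benefit - v))
                    in v' `seq` go (t - h) v'
```

The real products involve multi-state Markov models, so the GPU gets one such ODE system per contract, for hundreds of thousands of contracts.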
This is joint work with David R Christiansen, other students, and several colleagues at Edlund A/S. It was conducted within the Actulus project, a collaboration between the company Edlund A/S, the section for financial mathematics at Copenhagen University, and the IT University of Copenhagen, financed in part by the Danish Advanced Technology Foundation.
Bio: Peter Sestoft is professor at the IT University of Copenhagen and works mainly with programming language technology. He is co-developer of various open source software, including the Moscow ML implementation of Standard ML, the C5 Generic Collection Library for C# and CLI, and the Funcalc spreadsheet research prototype. He is a co-author, with Jones and Gomard, of the standard reference on partial evaluation (1993), and author of several other books, most recently Spreadsheet Implementation Technology (MIT Press, October 2014).
To quote PCPH: "Accelerate is an embedded domain-specific language (EDSL) for programming the GPU. It allows us to write Haskell code in a somewhat stylized form and have it run directly on the GPU. For certain tasks, we can obtain orders of magnitude speedup by using Accelerate." This lecture gives some information about Graphics Processing Units (GPUs) and GPU programming, and then introduces Accelerate, Obsidian and related topics. You can find more information on the Accelerate page on GitHub, including the two papers that we expect you to read, a video, tutorial examples etc. Further information about Obsidian is available in a draft paper, and in various papers on the home page of Joel Svensson, the author of Obsidian.
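Here is what that "somewhat stylized form" looks like for a dot product (a sketch assuming the `accelerate` package; I use the reference interpreter backend, whereas the GPU backends are separate packages):

```haskell
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate.Interpreter (run)  -- reference backend

-- Dot product expressed as an Accelerate computation.
dotp :: A.Vector Float -> A.Vector Float -> Float
dotp xs ys =
    A.indexArray (run computation) A.Z
  where
    -- use embeds the host arrays; zipWith and fold build the GPU program
    computation = A.fold (+) 0 (A.zipWith (*) (A.use xs) (A.use ys))
```

Swapping `run` for one from a GPU backend is, in principle, the only change needed to move the same code onto the GPU.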
This lecture introduced Erlang for Haskell programmers, taking parallelising quicksort as an example, both within one Erlang VM and distributed across a network. The latest version of the Erlang system can be downloaded from here. There is a Windows installer. Many Linux distributions have an Erlang package available, but not necessarily a package suitable for development of Erlang code, and not necessarily the latest version. On Ubuntu, try
sudo apt-get install erlang-dev
If that doesn't work or you can't find an appropriate package, build the VM from source.
Over the last few decades, the performance of processors has grown at a much faster pace than that of memories. The issue has become even more severe with the advent of the (massive) multicore era. This gap is addressed by clever design and use of caches. One would not be wrong to say that the design of parallel computers is, above all, about caches. The quantitative study of algorithms in a parallel setting has already extended the time and space complexity analyses with the notions of work and depth. In this lecture, we take one more step and show how to reason about the cache behaviour of algorithms.
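As a first taste of that kind of reasoning, consider the ideal-cache model with line size B: a sequential scan of n consecutive elements incurs about ⌈n/B⌉ misses, while a stride-n walk over the same data (e.g. a column-major traversal of a row-major n×n matrix) misses on essentially every access. A tiny sketch of the two estimates, with made-up sizes:

```haskell
-- Cache-miss estimates in the ideal-cache model (illustrative sketch).
-- b = cache-line size in elements; the matrix is assumed to be much
-- larger than the cache.
rowMajorMisses, colMajorMisses :: Int -> Int -> Int
rowMajorMisses n b = n * ((n + b - 1) `div` b)  -- consecutive accesses share lines
colMajorMisses n _ = n * n                      -- stride n: each access hits a new line
```

For n = 1024 and b = 16 this predicts 65536 versus 1048576 misses, a 16-fold gap for the same amount of work, which is exactly the kind of effect the lecture teaches you to predict.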
This lecture focusses on the fault tolerance constructs in Erlang--links and system processes--and the motivation for the "Let It Crash" philosophy. It introduces supervision trees and the Open Telecoms Platform, and develops a simple generic server.
Google's Map-Reduce framework has become a popular approach for processing very large datasets in distributed clusters. Although originally implemented in C++, its connections with functional programming are close: the original inspiration came from the map and reduce functions in LISP; MapReduce is a higher-order function for distributed computing; the purely functional behaviour of mappers and reducers is exploited for fault tolerance; and it is ideal for implementation in Erlang. This lecture explains what Map-Reduce is, shows a sequential and a simple parallel implementation in Erlang, and discusses the refinements needed to make it work in reality.
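The sequential skeleton is small enough to sketch here, in Haskell rather than Erlang for continuity with the rest of the course (my own names and types, not the lecture's code): a mapper turns each input into key/value pairs, the pairs are grouped by key, and a reducer folds each group.

```haskell
import qualified Data.Map as M  -- containers ships with GHC

-- Sequential MapReduce skeleton (sketch).
mapReduce :: Ord k2
          => (k1 -> v1 -> [(k2, v2)])   -- mapper
          -> (k2 -> [v2] -> v3)         -- reducer
          -> [(k1, v1)] -> [(k2, v3)]
mapReduce mapper reducer input =
    [ (k, reducer k vs) | (k, vs) <- M.toList grouped ]
  where
    pairs   = concat [ mapper k v | (k, v) <- input ]
    grouped = M.fromListWith (flip (++)) [ (k, [v]) | (k, v) <- pairs ]

-- The classic example: word count over a set of named documents.
wordCount :: [(FilePath, String)] -> [(String, Int)]
wordCount = mapReduce (\_ txt -> [ (w, 1) | w <- words txt ])
                      (\_ ns  -> sum ns)
```

Because the mapper and reducer are pure, any of the grouped reductions can be rerun on another node after a failure, which is the fault-tolerance point made in the lecture.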
Facebook has a large existing system that identifies and remediates abuse: primarily spam, but also other types of abuse, using a combination of techniques including manually written rules and machine learning classifiers. This system actively and automatically prevents vast amounts of undesirable content from reaching users of Facebook.
The system provides a domain-specific language in which the detection logic is written, and we are in the process of migrating this language from an in-house functional language called FXL to Haskell. At the current time, the system is running nearly all of its requests on Haskell. We believe this is the largest Haskell deployment currently in existence.
In this talk I'll explain the problem domain, and why Haskell is uniquely suited to it. The key value proposition of the DSL implementation is implicit concurrency, and I'll outline our solution to this (the Haxl framework). I'll also cover many of the engineering problems we had to solve, including how we deploy new code, going from a source code change to running new code on all the machines in a few minutes. The migration path we are on is littered with the corpses of bugs found and problems solved; I'll share a few of the war stories and the lessons we have learned.
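The essence of that implicit concurrency can be sketched in a few lines of plain Haskell (my simplification, not the real Haxl API): a computation is either finished or blocked on a round of data fetches, and the Applicative instance advances both sides of `<*>` in the same round, so independent fetches are batched, while monadic sequencing forces extra rounds.

```haskell
-- A computation that is either finished, or blocked on one round of
-- (pretend) data fetches before it can continue.
data Fetch a = Done a | Blocked (Fetch a)

instance Functor Fetch where
  fmap f (Done a)    = Done (f a)
  fmap f (Blocked c) = Blocked (fmap f c)

instance Applicative Fetch where
  pure = Done
  Done f    <*> x         = fmap f x
  Blocked c <*> Done x    = Blocked (c <*> Done x)
  Blocked c <*> Blocked d = Blocked (c <*> d)  -- both sides advance in the same round

instance Monad Fetch where
  Done a    >>= k = k a
  Blocked c >>= k = Blocked (c >>= k)          -- sequencing cannot batch

aFetch :: Fetch ()
aFetch = Blocked (Done ())  -- one (pretend) data fetch

rounds :: Fetch a -> Int
rounds (Done _)    = 0
rounds (Blocked c) = 1 + rounds c
```

Here `rounds (aFetch *> aFetch)` is 1 but `rounds (aFetch >>= \_ -> aFetch)` is 2: writing the detection rules applicatively is what lets the framework batch and overlap the data fetches.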
Bio: Simon Marlow is a Software Engineer at Facebook in London. He is a co-author of the Glasgow Haskell Compiler, author of the book “Parallel and Concurrent Programming in Haskell”, and has a string of research publications in functional programming, language design, compilers, and language implementation.
NoSQL databases have become very popular for the kind of scalable applications that Erlang is used for. In this lecture we introduce the mother of them all, Amazon's Dynamo, and one of its descendants--Riak, implemented in Erlang by Basho Technologies. We discuss scalability, the CAP theorem, eventual consistency, consistent hashing and the ring, and the mechanisms used to detect, tolerate, and repair inconsistency. The key reference is the Dynamo paper. For amusing and informative background reading, check out The Network is Reliable (yeah right).
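The ring idea fits in a few lines (a toy sketch in Haskell with a made-up hash function; Riak's real ring uses the SHA-1 hash space, virtual nodes, and replication): hash every node onto a circle, and send each key to the first node at or after the key's own hash, wrapping around.

```haskell
import Data.List (foldl', sortOn)
import Data.Word (Word32)

-- A toy hash onto the ring's key space (FNV-style; not Riak's SHA-1).
hash :: String -> Word32
hash = foldl' (\h c -> h * 16777619 + fromIntegral (fromEnum c)) 2166136261

-- The ring: nodes sorted by their position on the circle.
ring :: [String] -> [(Word32, String)]
ring nodes = sortOn fst [ (hash n, n) | n <- nodes ]

-- A key belongs to the first node at or after its hash (assumes a
-- non-empty ring; wraps around past the top of the key space).
lookupNode :: [(Word32, String)] -> String -> String
lookupNode r key =
  case [ n | (h, n) <- r, h >= hash key ] of
    (n : _) -> n
    []      -> snd (head r)
```

The payoff is the one discussed in the lecture: adding or removing a node moves only the keys in its arc of the circle, not a rehash of the entire store.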
SAC (Single Assignment C) is in several aspects a functional programming language out of the ordinary. As the name suggests, SAC combines a C-like syntax (with lots of curly brackets) with a state-free, purely functional semantics. Originally motivated to ease adoption by programmers with an imperative background, the choice offers surprising insights into what constitutes a "typical" functional or a "typical" imperative language construct.
Again on the exotic side, SAC does not favour lists and trees, or more generally algebraic data types, but puts all its emphasis on multi-dimensional arrays as the primary data structure. Based on a formal array calculus, SAC supports declarative array processing in the spirit of interpreted languages such as APL. Array programming treats multidimensional arrays in a holistic way: functions map potentially huge argument array values into result array values following a call-by-value semantics, and new array operations are defined by composition of existing ones.
SAC is a high-productivity language for application domains that deal with large collections of data in a computationally intensive way. At the same time, SAC is also a high-performance language, competing with low-level imperative languages through compilation technology. The abstract view of arrays combined with the functional semantics supports far-reaching program transformations. A highly optimised runtime system takes care of automatic memory management with an emphasis on immediate reuse. Last but not least, the SAC compiler exploits the state-free semantics of SAC and the data-parallel nature of SAC programs for fully compiler-directed acceleration on contemporary multi- and many-core architectures.
Note that the slides contain links to the papers that Russell mentioned.