Assignments
General Information
There are two lab assignments, each with two parts, so we end up with labs A to D.
You should work in pairs.
If you have a good reason for doing the assignments by yourself,
please contact Nick, Mary or John. You need to pass all lab assignments
in order to pass the course.
The assignments should be handed in using the Fire system.
The final deadline for all labs is May 28 at midnight. That means that you should aim to have all labs finished (and preferably also approved!) by then. Each lab should, in addition, be submitted at least once before the initial deadline for that lab.
Rules - IMPORTANT
Please read these
early and carefully!
- Deadlines are hard. If for some reason you cannot
make a deadline, contact us before it passes,
tell us your reason, and include a realistic
proposal for a new personal deadline. You may then
get an extension of the deadline.
- Your last attempt has to be submitted before the final
deadline. If you fail to do this, your submission will
be rejected.
- Cheating is taken very seriously. Before you start working
on the assignments, please read the note on cheating.
Submission
Clean Code
Before you submit your code, Clean It Up! Submitting clean code is Really
Important, and simply the polite thing to do. After you feel you are done, spend
some time on cleaning your code; make it simpler, remove unnecessary things,
etc. We will reject your solution if it is not clean. Clean code:
- Has no lines longer than 80 characters
- Has a consistent layout
- Has type signatures for all top-level functions
- Has good comments
- Has no junk (junk is unused code, commented code, unnecessary comments)
- Has no overly complicated function definitions
- Does not contain any repetitive code (copy-and-paste programming)
What to include
Your submission needs to include the following information:
- Your Haskell file (.hs or .lhs), containing your
solution. A single Haskell file is preferred. Call it LabYxx
where xx is your group number and Y is the letter name of that lab sub-part.
Do not submit .o, .hi or .exe files!
- report.txt or report.pdf, a file containing brief documentation
of what you have done. Answer the questions asked in the assignment.
Before you submit,
please read the note on cheating.
You are supposed to submit your solution using the Fire system.
Note: You do NOT have to use zip/gzip/tar/bzip on
your files! Just upload each file that is part of your
solution. Don't forget to submit when you have uploaded all the files.
Benchmarking Requirements (Haskell part)
- Include Threadscope plots in the report and draw conclusions from them.
- Draw conclusions from the runtime statistics (+RTS -s).
- Play with different granularities (for example by introducing a threshold or depth parameter).
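As a hedged sketch of what a depth parameter might look like (the names `pfib` and `d` are illustrative, not part of any lab code), here is a parallel Fibonacci where sparks are only created down to a given depth, after which evaluation falls back to the sequential version:

```haskell
import Control.Parallel (par, pseq)

-- Plain sequential version, used below the threshold.
fib :: Int -> Integer
fib n | n < 2     = fromIntegral n
      | otherwise = fib (n - 1) + fib (n - 2)

-- Parallel version with a depth cut-off: once d reaches 0,
-- we stop sparking so that the remaining units of work stay
-- coarse enough to be worth converting.
pfib :: Int -> Int -> Integer
pfib d n
  | d <= 0 || n < 2 = fib n
  | otherwise       = x `par` (y `pseq` x + y)
  where
    x = pfib (d - 1) (n - 1)
    y = pfib (d - 1) (n - 2)
```

Varying the depth (and watching the spark statistics with +RTS -s) is a quick way to find a reasonable granularity experimentally.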
To the Fire System
Tools
The stuDAT (Linux) computers have a recent Haskell Platform installed. We will also be using
Amazon's Cloud (EC2) to provide you with access to parallel machines. More on that shortly.
The GHC user guide contains a chapter about Using GHC (command-line arguments
and the like).
Threadscope is also available on the (Linux) stuDAT computers. (Type threadscope
at the prompt.)
To get the Haskell Platform for your own laptop, go to
the Haskell Platform page on haskell.org (which is a great source of Haskell info).
The Threadscope page includes information about how to use the tool, as well as how to install it.
Installing on Linux laptops seems straightforward. There are binary releases for Windows and Mac, thank goodness.
You should plan to get used to using Threadscope during the first week of the course.
Advice on GHC flags, garbage collection, task size etc. from Nick Frolov (TA)
If you're not getting speedups, or if the performance of a parallel program is
worse than that of a sequential one, make sure that garbage collection is not
the bottleneck in your program. If a significant part of your Threadscope plots
is GC (orange colour), try running your program as if you had unlimited
memory, so that no GC happens: set a ridiculously large nursery size with the -A
flag (100 MB will do). This effectively turns the collector off, as the
nursery will never fill up, so no GC will be needed.
If your program actually requires more memory for intermediate values over its
runtime, you might want to consider recycling memory. A small nursery size will
cause more frequent GC but also more eager promotion. The former is an
overhead when allocated data does not tend to be short-lived, and the latter in
the opposite case (if short-lived data ends up being promoted to an older
generation, it still has to be collected). A good nursery size should usually
not be larger than the L2 cache, to exploit locality, but a better one can
only be found experimentally. The -H flag (setting the initial heap size) should
not be neglected either.
While the heap will expand if its initial size was not enough, doing so is
extra work for the collector. Set it generously (perhaps 1 GB); there is no
harm in doing so other than exhausting your RAM. Which, ideally, you shouldn't
do, as swapping is much more expensive than GC, just as serving cache misses
from main memory is if the nursery exceeds the cache size.
* * *
Don't forget to experiment with depth (or unit-of-work size, if you prefer
that). If you're getting many fizzled sparks (check with the -s flag or in
Threadscope), it is a clear sign that your units of work are too small and not
worth sparking. Remember that running even a Haskell green thread costs
tens of instructions; this is not justified by an addition of 32-bit integers
or a similarly cheap operation. Another cause of fizzled sparks is uneven
division of work into chunks, which can also lead to sparked computation results
being needed before the corresponding sparks get a chance to be
converted. You can see spark size statistics in Threadscope (the "Spark sizes" tab).
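One standard way to keep sparks from being too small is to group list elements into chunks, for example with `parListChunk` from Control.Parallel.Strategies. This is a sketch only; `expensive` is a made-up stand-in for real per-element work, and the chunk size of 1000 is an arbitrary starting point to tune:

```haskell
import Control.Parallel.Strategies (parListChunk, rdeepseq, using)

-- Hypothetical stand-in for some real per-element computation.
expensive :: Int -> Int
expensive x = sum [x `div` d | d <- [1 .. 2000]]

-- Evaluate the mapped list in chunks of 1000 elements, so each
-- spark covers enough work to be worth converting.
chunked :: [Int] -> [Int]
chunked xs = map expensive xs `using` parListChunk 1000 rdeepseq
```

Comparing spark counts for different chunk sizes in the +RTS -s output makes the trade-off visible.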
* * *
If you're wondering why your sequential scan with an "expensive" operator runs
much faster than the parallelised version, try switching off optimisation. Some
"expensive" operators are not that expensive once GHC takes a good look at them,
especially if it sees a chance for aggressive inlining (and there are many
such chances when no sparks are being created).
Exercise 1 - Getting Started
Study the use of
Threadscope.
Write a simple merge sort program in Haskell and parallelise it.
Details.
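A minimal sketch of the shape such a solution might take, assuming a depth cut-off `d` below which sorting proceeds sequentially (all names here are illustrative, not a model answer):

```haskell
import Control.Parallel (par, pseq)

-- Merge two sorted lists into one sorted list.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- Sequential merge sort.
msort :: Ord a => [a] -> [a]
msort []  = []
msort [x] = [x]
msort xs  = merge (msort as) (msort bs)
  where (as, bs) = splitAt (length xs `div` 2) xs

-- Parallel merge sort: spark one half, evaluate the other,
-- until the depth cut-off is reached.
pmsort :: Ord a => Int -> [a] -> [a]
pmsort d xs
  | d <= 0 || length xs < 2 = msort xs
  | otherwise = l `par` (r `pseq` merge l r)
  where
    (as, bs) = splitAt (length xs `div` 2) xs
    l = pmsort (d - 1) as
    r = pmsort (d - 1) bs
```

Running this under Threadscope at different depths is a good first exercise in reading the spark and activity graphs.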
Lab A
Parallelising scan and a Fast Fourier Transform algorithm in Haskell. The lab description also contains pointers to interesting papers and slides.
Details
Lab B
Parallelising a Sudoku solver in Erlang. The deadline is Friday April 25th at midnight.
Details, problems.txt, sudoku.erl
Lab C
NESL-style programming and tutorial writing.
Details
Lab D
Distributed Erlang and Map-Reduce.
Details
Exercise 2 : Using Accelerate (not obligatory, but advised)
It would be a good idea to get some experience of using Accelerate for GPU-acceleration.
Chapter 6 of Simon Marlow's book is a good introduction to using Accelerate.
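To give a flavour of the programming model, here is the classic dot-product example from the Accelerate documentation, run here on the reference interpreter backend (swap in a GPU backend such as accelerate-cuda for actual acceleration); the concrete array sizes are arbitrary:

```haskell
import Data.Array.Accelerate             as A
import Data.Array.Accelerate.Interpreter (run)
import Prelude                           as P

-- Dot product expressed as an Accelerate computation:
-- element-wise multiply, then reduce with (+).
dotp :: Acc (Vector Float) -> Acc (Vector Float) -> Acc (Scalar Float)
dotp xs ys = A.fold (+) 0 (A.zipWith (*) xs ys)

main :: IO ()
main = do
  let xs = fromList (Z :. 10) [0 ..]     :: Vector Float
      ys = fromList (Z :. 10) (repeat 1) :: Vector Float
  print (run (dotp (use xs) (use ys)))
```

Note how `Acc` computations are built as expressions and only executed by a backend's `run`; this is the part that feels quite different from Repa.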
My approach was to read that and to play with the array manipulation functions in the API by messing about with BMP pictures, which are easy to import, transform and write out to file. It's quite fun to program up your own pictures too. (Here is one of my productions (z1.bmp) so that you have one to start with.) You might get some inspiration from
last year's tutorial winner (which is about Repa, mind you). I also found this Haskell library for image manipulation
quite inspiring. It is interesting to experience the difference between Repa programming (where you are in control of the array representation) and Accelerate programming (where you are not). Note that you can get all the examples from Marlow's book in the package parconc-examples on Hackage.
Study the Accelerate examples at https://github.com/AccelerateHS/accelerate-examples, including how they are benchmarked.
For a non-graphics exercise, see page 14 and onwards in this set of exercises by Simon Marlow. (You will also be able to find the solution online if you look hard enough.)
I followed Simon Marlow's advice to install accelerate-cuda with -fdebug, so that -ddump-cc can be used when running examples, to see what kernels are generated. This didn't work for me at the same time as using Criterion, however. (I am running on MacOS and the installation worked without any problems. Unix should also be easy. Windows is harder, though it is rumoured that Geoff Mainland has made this work.) If you would like to use Amazon's EC2 GPU instances (in Linux), send your SSH keys to Nick (frolov at chalmers.se) and he will set you up. These instances no longer commit to providing a Tesla, but they should still be an easy way to get access to a decent GPU.
For the guide on how to create SSH keys, see the GitHub tips section
on generating SSH keys.