Inconvenient truth(s)

Trends in CPUs

Skylake die

source: http://www.techpowerup.com/215333/intel-skylake-die-layout-detailed.html

source: http://arstechnica.com/gadgets/2015/09/skylake-for-laptops-faster-core-m-and-ultrabook-gpus-with-edram/

source: http://arstechnica.com/information-technology/2015/08/the-many-tricks-intel-skylake-uses-to-go-faster-and-use-less-power/

source: http://techreport.com/review/28751/intel-core-i7-6700k-skylake-processor-reviewed/4


Back-of-the-envelope calculations

How to tune your application

Sawzall

Parallelizing computations

int c = 0;
for(int i = 0; i < n; ++i) {
  c += a[i];
}

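int c = 0;
int[] tmp = new int[4];
pfor (int j = 0; j < 4; ++j) { // Pseudocode: the 4 iterations run in parallel
  // Iteration j sums only its own quarter of a, not the whole array
  for (int i = j * n / 4; i < (j + 1) * n / 4; ++i) {
    tmp[j] += a[i];
  }
}
for (int j = 0; j < 4; ++j) { c += tmp[j]; } // Combine the partial sums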
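
A runnable Java sketch of the four-way split, assuming plain threads stand in for the pfor construct. The class name ParallelSum and the parts parameter are illustrative and not from the slides. Each thread sums one contiguous slice into its own slot, so no two threads write to the same location.

```java
import java.util.Arrays;

class ParallelSum {
    // Sums a[] using `parts` threads (parts > 0); each thread owns one slice.
    static int parallelSum(int[] a, int parts) {
        int n = a.length;
        int[] tmp = new int[parts];            // one partial sum per thread
        Thread[] threads = new Thread[parts];
        for (int j = 0; j < parts; ++j) {
            final int id = j;
            // Slice j covers [lo, hi); long arithmetic avoids overflow in id * n.
            final int lo = (int) ((long) id * n / parts);
            final int hi = (int) ((long) (id + 1) * n / parts);
            threads[j] = new Thread(() -> {
                int local = 0;                 // accumulate locally, publish once
                for (int i = lo; i < hi; ++i) {
                    local += a[i];
                }
                tmp[id] = local;
            });
            threads[j].start();
        }
        int c = 0;
        for (int j = 0; j < parts; ++j) {
            try {
                threads[j].join();             // after join, tmp[j] is visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
            c += tmp[j];
        }
        return c;
    }

    public static void main(String[] args) {
        int[] a = new int[1000];
        Arrays.fill(a, 1);
        System.out.println(parallelSum(a, 4)); // prints 1000
    }
}
```

For small arrays the cost of starting threads is likely to outweigh the gain, so a split like this only pays off when n is large.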

source: https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation


Multicore

Aspects

Speeding up programs on a multicore CPU

General guidelines

Making code run in parallel


map(_, []) -> [];
map(F, [H|T]) -> [F(H)|map(F, T)].

Parallel map

pmap(F, Xs) ->
   S = self(),
   Ref = make_ref(),
   Pids = map(fun(X) -> spawn( fun() -> 
                                  S ! {self(), Ref, F(X)} 
                               end )
              end, Xs),
   gather(Pids, Ref).

gather([Pid|T], Ref) -> 
   receive
       {Pid, Ref, Ret} -> [Ret|gather(T, Ref)]
   end ;
gather([], _) -> [].
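
For comparison, a rough Java analogue of the same idea, assuming a thread pool in place of one Erlang process per element. The class name ParallelMap is illustrative. Collecting the futures in submission order plays the role of gather, which receives results in the order of the Pids list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelMap {
    // Applies f to every element of xs in parallel; results keep the order of xs.
    static <A, B> List<B> pmap(Function<A, B> f, List<A> xs) {
        int threads = Math.max(1, Runtime.getRuntime().availableProcessors());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // "Spawn" phase: submit one task per element.
            List<Future<B>> futures = new ArrayList<>();
            for (A x : xs) {
                Callable<B> task = () -> f.apply(x);
                futures.add(pool.submit(task));
            }
            // "Gather" phase: wait for each result in submission order.
            List<B> results = new ArrayList<>();
            for (Future<B> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while gathering", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Unlike the Erlang version, this sketch bounds the number of concurrent tasks by the pool size rather than spawning one lightweight process per element.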

Workers


Workers example

Implementation of workers

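% Start a worker process that applies Compute to every batch it receives.
worker(Compute) ->
    spawn(fun() -> worker_body(Compute) end).

% Wait for {Pid, Tasks}, reply to Pid with the result, then loop.
worker_body(Compute) ->
    receive
        {Pid, Tasks} ->
            Result = Compute(Tasks),
            Pid ! {self(), Result},
            worker_body(Compute)
    end.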
  

Function pmap using workers

On lock based programming

Solution: atomic blocks

Software Transactional Memories (STM)

We have two processes with two different transactions.

Transaction 1

Now, both transactions read their corresponding variables. Each transaction records the version number of every variable it reads.

Transaction 2

The transaction on the left writes to variable x first. The transaction on the right follows, but it fails (why?) and must retry.

Transaction 3

The transaction on the right retries.

Transaction 4

This time, when it writes, it succeeds (why?).

Transaction 5
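
The walkthrough can be mimicked with a minimal Java sketch of version-checked writes. It is a simplification under stated assumptions: a single shared variable, a commit that is validated against one version number, and the hypothetical names VersionedVar, Snapshot, tryCommit, and atomically. A real STM tracks many variables per transaction and may validate differently.

```java
import java.util.function.IntUnaryOperator;

class VersionedVar {
    private int value;
    private long version;          // bumped on every successful commit

    VersionedVar(int initial) { this.value = initial; }

    // A value together with the version it was read at.
    static final class Snapshot {
        final int value;
        final long version;
        Snapshot(int value, long version) { this.value = value; this.version = version; }
    }

    synchronized Snapshot read() { return new Snapshot(value, version); }

    // The write succeeds only if nobody committed since seenVersion was read.
    synchronized boolean tryCommit(long seenVersion, int newValue) {
        if (version != seenVersion) return false;   // conflict: caller must retry
        value = newValue;
        version++;
        return true;
    }

    // Read, compute, validate-and-write; start over on conflict.
    int atomically(IntUnaryOperator update) {
        while (true) {
            Snapshot s = read();
            int next = update.applyAsInt(s.value);
            if (tryCommit(s.version, next)) return next;
        }
    }
}

class StmDemo {
    public static void main(String[] args) {
        VersionedVar x = new VersionedVar(10);
        // Both transactions read x and remember its version.
        VersionedVar.Snapshot left = x.read();
        VersionedVar.Snapshot right = x.read();
        System.out.println(x.tryCommit(left.version, left.value + 1));   // true
        // The right transaction still holds the old version, so it fails.
        System.out.println(x.tryCommit(right.version, right.value * 2)); // false
        // Retry: re-read, then write against the fresh version.
        right = x.read();
        System.out.println(x.tryCommit(right.version, right.value * 2)); // true
        System.out.println(x.read().value);                              // 22
    }
}
```

The demo interleaves the two transactions by hand on one thread so the outcome is deterministic. In concurrent code, the atomically method wraps the same read, compute, and retry loop.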



Concurrent Programming 2016 - Chalmers University of Technology & Gothenburg University