Lab 3: A-mazed
See labs page
To pass this lab you have to give a short demo of your submission to the TAs on 19 October 2017. To do that, you have to sign up here by 17 October 2017, indicating your group number in Fire, to select a time slot. Please try to split evenly among the available slots.
In this lab, you will have to develop the parallel version of a sequential search algorithm, using the fork/join model. The goal of the search is finding a goal inside a maze taking advantage of parallelism.
This lab assignment must be developed in Java.
See the computing resources page for further instructions; make sure you are using Java 8 or newer versions.
To get started with the lab, download and unpack the material for the lab. To get a better understanding of the problem, run the sequential version of the search, which is provided as part of the material.
You should see a maze in a graphical window with an animated player:
The goal of the search is to find a path that leads from the cell in the top-left corner to a goal, denoted by a heart. All cells colored in green are accessible, whereas cells in grey correspond to walls that cannot be traversed.
In the sequential version of the search, which is the one provided as a starting point, there is only one player (the little figure that moves around) who explores all cells until it eventually reaches the goal. Once a goal is found, the program displays the path that leads to it by coloring the cells in red:
This lab’s objective is to modify the sequential search algorithm using fork/join parallelism to perform the same search but with multiple players concurrently exploring different parts of the maze.
Maze exploration: the API of class Maze
First of all, we should become familiar with the class structure, and in particular with the API of class amazed.maze.Maze. In the material for the lab you can find Javadoc documentation (subdirectory doc) that complements this overview.
Search algorithms interact with an instance of class Maze. Every node (or cell) in the maze has a unique integer identifier. Starting from the initial node, whose identifier is returned by method start(), we can discover the identifiers of new nodes by calling method neighbors(id), which inputs the identifier id of a node we have previously visited, and returns a set of node identifiers corresponding to all nodes directly adjacent to id and accessible from it. Finally, method hasGoal(id) tests whether id is the identifier of a node containing a goal. If it returns true, we have found a goal!
Node identifiers are randomized at the beginning of every run (even on the same map), which means that they may change from run to run, and that we cannot rely on a fixed mapping between identifiers and positions on a maze’s grid. In fact, if you run the sequential solution multiple times you will probably notice a different sequence of nodes that are visited (it is effectively nondeterministic). Thus, the only option available to build a robust solution is to explore nodes neighborhood by neighborhood, keeping track of the nodes we have already visited.
Strictly speaking, these are the only methods of Maze’s API that are needed to implement a search algorithm. However, it’s nice to keep track of the progress of a search on the graphical animation that we have seen in the sequential version. To do this, we have to add some book-keeping. To explicitly create a new player on a given node id, we call newPlayer(id), which returns a unique identifier of the player. To move a player, we call method move(playerId, id), which moves the player with identifier playerId to node id, provided the latter is accessible and is a neighbor of the node the player is currently in.
Next, let’s have a look at the implementation of class amazed.solver.SequentialSolver, which is the sequential solver algorithm that we have to parallelize. SequentialSolver formally inherits from Java’s RecursiveTask&lt;List&lt;Integer&gt;&gt;, which represents tasks that can be executed in parallel and return a list of integers (see more details in the lecture slides on Parallelizing computations). In the sequential solver, this is just for convenience: SequentialSolver does not actually use any parallelism, but your solution in class ForkJoinSolver can just inherit from SequentialSolver and modify what has to be modified for parallelism, without having to change anything in the class structure.
SequentialSolver implements a standard depth-first search. To this end, it uses three data structures to keep track of the state:

- A set visited of identifiers of all nodes visited so far during a search.

- A stack frontier of nodes, which are all neighbors that have not been visited yet. Using a stack ensures that the most recently discovered neighbors are visited first (LIFO), thus implementing a depth-first search.

- A map predecessor that associates to every explored node n the node m from which n has been reached during the search; when this is the case, predecessor maps n to m. The mapping is only needed at the end of a search, in order to reconstruct the path from the initial node by following the chain of predecessors.
depthFirstSearch() implements the actual search:

1. Start by pushing the start node onto the frontier stack.

2. Repeat until the frontier becomes empty, meaning that all nodes have been explored:

2.1. Pop a node current from the frontier; this is the node to explore in this iteration.

2.2. Check whether current is a goal; if it is, terminate the search successfully.

2.3. Otherwise, add current to visited (if it’s not already visited), move the player there, push all its neighbors onto the frontier stack, and update predecessor with information about how to reach them.
Once a goal has been found, method pathFromTo(from, to) reconstructs a path, as a list of node identifiers, from node from to node to (typically, from is the start node and to is a goal node), based on the information previously stored in predecessor. If no goal is found and all nodes have been explored, the search returns null.
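To make the three data structures and the loop above concrete, here is a minimal self-contained sketch of such a depth-first search. The class name, the hard-coded adjacency map NEIGHBORS, and the constant GOAL are all hypothetical stand-ins for the real Maze API (the actual solver calls neighbors(id) and hasGoal(id) instead, and also moves a player for the animation):

```java
import java.util.*;

public class SequentialDfsSketch {
    // Hypothetical tiny maze: node -> directly accessible neighbors.
    // In the real lab this information comes from Maze.neighbors(id).
    static final Map<Integer, Set<Integer>> NEIGHBORS = Map.of(
        0, Set.of(1, 2),
        1, Set.of(0, 3),
        2, Set.of(0),
        3, Set.of(1, 4),
        4, Set.of(3));
    static final int GOAL = 4; // stands in for Maze.hasGoal(id)

    static List<Integer> depthFirstSearch(int start) {
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> frontier = new ArrayDeque<>(); // used as a LIFO stack
        Map<Integer, Integer> predecessor = new HashMap<>();
        frontier.push(start);                          // step 1
        while (!frontier.isEmpty()) {                  // step 2
            int current = frontier.pop();              // step 2.1
            if (current == GOAL)                       // step 2.2
                return pathFromTo(start, current, predecessor);
            if (visited.add(current)) {                // step 2.3
                for (int nb : NEIGHBORS.get(current)) {
                    if (!visited.contains(nb)) {
                        predecessor.put(nb, current);  // remember how we got to nb
                        frontier.push(nb);
                    }
                }
            }
        }
        return null; // frontier empty: every reachable node explored, no goal
    }

    // Walk the predecessor chain backwards from 'to' until 'from' is reached.
    static List<Integer> pathFromTo(int from, int to, Map<Integer, Integer> predecessor) {
        LinkedList<Integer> path = new LinkedList<>();
        Integer current = to;
        while (current != null) {
            path.addFirst(current);
            if (current == from) return path;
            current = predecessor.get(current);
        }
        return null; // 'to' was never reached from 'from'
    }

    public static void main(String[] args) {
        System.out.println(depthFirstSearch(0)); // prints [0, 1, 3, 4]
    }
}
```

Note that in this tiny example the path to the goal is unique, so the output is deterministic even though the set iteration order is not; in the real maze, different runs may explore nodes in different orders.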
How to parallelize the search
This lab’s objective is to implement method parallelDepthFirstSearch() of class ForkJoinSolver so that it implements a parallel version of depth-first search using Java’s fork/join parallelism.

The basic idea is that, at some point during the search, you will fork new threads, and each thread will continue the search in a different part of the maze in parallel to the others. Obviously, a good place to do that is whenever we discover new neighbors: different threads explore different neighbors of the current node. Since ForkJoinSolver is a descendant of Java’s RecursiveTask, to fork a new thread you just create a new instance of ForkJoinSolver, with suitable parameters, and call fork() on the instance.
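To illustrate the fork/join pattern in isolation, here is a toy RecursiveTask that forks one subtask per branch of a tiny hard-coded graph, roughly the way a ForkJoinSolver instance could fork one new solver per unexplored neighbor. The class name, the graph, and the way the visited set is passed around are illustrative assumptions, not the lab’s actual API:

```java
import java.util.*;
import java.util.concurrent.*;

public class ForkSketch extends RecursiveTask<List<Integer>> {
    // Hypothetical tree-shaped graph: node -> children.
    static final Map<Integer, List<Integer>> NEIGHBORS = Map.of(
        0, List.of(1, 2), 1, List.of(), 2, List.of(3), 3, List.of());
    static final int GOAL = 3;

    final int start;
    final Set<Integer> visited; // shared, thread-safe set of claimed nodes

    ForkSketch(int start, Set<Integer> visited) {
        this.start = start;
        this.visited = visited;
    }

    @Override
    protected List<Integer> compute() {
        if (!visited.add(start)) return null;       // another task got here first
        if (start == GOAL) return new ArrayList<>(List.of(start));
        List<ForkSketch> children = new ArrayList<>();
        for (int nb : NEIGHBORS.get(start)) {
            ForkSketch child = new ForkSketch(nb, visited);
            child.fork();                            // schedule subtask in the pool
            children.add(child);
        }
        for (ForkSketch child : children) {
            List<Integer> sub = child.join();        // wait for and collect result
            if (sub != null) {
                sub.add(0, start);                   // prepend this node to the path
                return sub;
            }
        }
        return null; // no child found a goal
    }

    public static void main(String[] args) {
        List<Integer> path = new ForkJoinPool()
            .invoke(new ForkSketch(0, ConcurrentHashMap.newKeySet()));
        System.out.println(path); // prints [0, 2, 3]
    }
}
```

The structure mirrors what the lab asks for: a parent task forks children, joins them, and extends a child’s partial path with its own position.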
To design the parallel version of the search, start by understanding what state has to be shared among parallel threads. Specifically, you have three data structures that may be shared: a set visited, a stack frontier, and a map predecessor. The threads will have to share some information to synchronize indirectly during the search, but sharing as little as possible helps reduce the possibilities of synchronization errors.
Once you have decided which data structures should be shared, use a thread-safe version of those data structures provided by Java’s standard libraries in java.util.concurrent. For example, for thread-safe sets, use ConcurrentSkipListSet, which implements interface Set. You do not have to implement your own thread-safe version of any data structure for this lab.
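A useful property of the thread-safe Set implementations is that add() atomically reports whether the element was actually inserted, so a shared visited set can double as an atomic “claim this node” operation: only one task ever sees true for a given node. A short demonstration (class and variable names are illustrative):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

public class VisitedSetDemo {
    public static void main(String[] args) {
        // Thread-safe sorted set; safe to share among fork/join tasks.
        Set<Integer> visited = new ConcurrentSkipListSet<>();
        System.out.println(visited.add(42)); // true: 42 was not yet visited
        System.out.println(visited.add(42)); // false: already claimed
    }
}
```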
Finally, revise the depth-first search algorithm so that it forks new threads during the search, and combines the results of their explorations.
Fine-tuning the degree of parallelism
The constructor of ForkJoinSolver takes a second argument forkAfter, an integer that allows us to control the degree of parallelism in our solution.

You are free to ignore this value. However, you may want to use it in your solution to decide when, during a thread’s search, it is time to fork new threads. A simple approach would be to fork at every newly explored cell, but this may be inefficient: forking comes with an overhead, which may grow until it cancels out the speedup advantage of having more threads. We can control the amount of forking using a positive integer forkAfter: the search continues sequentially for forkAfter consecutive steps, after which it forks new threads, each of which proceeds for forkAfter steps before recursively forking.
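One possible way to realize this is a simple step counter per solver, reset each time a fork happens. The sketch below is a hypothetical illustration of that book-keeping only (names and the helper method are assumptions, not part of the given code):

```java
public class ForkAfterDemo {
    // Fork when we have taken forkAfter sequential steps since the last fork.
    // A non-positive forkAfter disables counter-based forking.
    static boolean shouldFork(int stepsSinceLastFork, int forkAfter) {
        return forkAfter > 0 && stepsSinceLastFork >= forkAfter;
    }

    public static void main(String[] args) {
        int forkAfter = 3, steps = 0;
        for (int node = 1; node <= 7; node++) { // pretend we visit nodes 1..7
            steps++;
            if (shouldFork(steps, forkAfter)) {
                System.out.println("fork at node " + node); // nodes 3 and 6
                steps = 0; // restart counting after forking
            }
        }
    }
}
```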
The scripts provided with the material for the lab let you test your solution with different values of forkAfter to get an idea of what works better (see Testing your solution).
Your assignment is to provide an implementation of parallel depth-first search using fork/join parallelism. You only need to implement method parallelDepthFirstSearch() in class ForkJoinSolver. You are allowed to use auxiliary (private) methods and attributes, but you should not modify other parts of the given implementation.
Submission: Your submission consists of the following files:
ForkJoinSolver.java: the code of the class modified with your solution; make sure your code includes comments
documentation.txt: a short plain text file describing the rationale of your solution (up to 300 words of text)
Please do not submit compressed archives. Just upload the individual files.
A solution must work correctly, that is, it has to always return a path to the goal if a goal is reachable from the initial node, and it has to return null otherwise.
A correct solution has to terminate in finite time in any maze, and it has to work correctly regardless of the number of physical processors that are available.
A correct solution has to work with arbitrary mazes, including mazes with no goals, or with more than one goal (in the latter case, finding any goal is sufficient).
A correct solution has to explore a maze by following adjacent nodes. It cannot “guess” a node identifier and, if it corresponds to a valid node, jump to it; it also cannot enumerate identifiers until it finds valid ones.
A correct solution has to use depth-first search, which has to be implemented in method parallelDepthFirstSearch() of class ForkJoinSolver.
A correct solution has to employ some level of parallelism: at some point during the search, more than one thread has to be active searching on the maze.
A correct solution may ignore the parameter forkAfter. In any case, it has to behave correctly independently of the value forkAfter is set to in the constructor.
Tips and tricks
Reuse as much as possible of the existing implementation. In particular, you can reuse the implementation of pathFromTo to reconstruct paths between specific nodes that have been visited.
In order for the search to terminate, and to take advantage of parallelism, it is important that different threads do not visit again nodes that other threads have already visited.
After a thread forks new threads, it is also responsible for joining them, that is, waiting for their termination and combining the results of their work. In particular, a child thread may find a path from an intermediate node to a goal node; it is then the responsibility of the child thread’s parent to combine this partial path with additional information to extend the path all the way back to the start node.
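This combination step amounts to concatenating two partial paths that overlap at the fork node. The helper below is a hypothetical sketch of that operation, assuming the parent knows its path from the start node to the node where it forked, and the child returns a path from that node to a goal:

```java
import java.util.ArrayList;
import java.util.List;

public class PathJoinDemo {
    // Concatenate a parent path (start .. fork node) with a child path
    // (fork node .. goal), dropping the duplicated fork node.
    static List<Integer> joinPaths(List<Integer> parentToFork, List<Integer> forkToGoal) {
        List<Integer> full = new ArrayList<>(parentToFork);
        full.addAll(forkToGoal.subList(1, forkToGoal.size()));
        return full;
    }

    public static void main(String[] args) {
        List<Integer> parent = List.of(0, 5, 9);  // start node 0 .. fork node 9
        List<Integer> child = List.of(9, 12, 17); // fork node 9 .. goal node 17
        System.out.println(joinPaths(parent, child)); // prints [0, 5, 9, 12, 17]
    }
}
```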
It is strongly suggested that you keep track of the newly forked tasks as “players” in the graphical animation of the maze. This will help you understand what is going on in the maze. To do this, you have to call newPlayer(id) for every new thread forked at node id, and move(playerId, id) whenever player playerId moves from its current node to node id. Do not wait until you have a complete solution to add animation: the animation will help you debug partial solutions.
As we have seen in class, getting an actual measurable speedup with fork/join parallelism may be difficult, and hard to achieve on input that is not really big. The purpose of this lab is to experiment with the logic of fork/join parallelism, not to achieve a significant, measurable speedup on simple examples of mazes.
Testing your solution
Included in the material for the lab are two mazes, “small” and “medium”, on which you can test your solution. Compile your solution and run it with the tests using make.
Compile your solution:
Run your solution on map small, with forkAfter set to 3 and to 9:

make parallel_small_step3
make parallel_small_step9
Run your solution on map medium, with forkAfter set to 3 and to 9:

make parallel_medium_step3
make parallel_medium_step9
Compiling and running without make
You can install make following the instructions on the computing resources page. If you do not have make, you can run the examples from the command line after compiling all the project’s source files:

java -cp src/main amazed.Main maps/MAP.map parallel-N

where MAP is small or medium for each of the maps, and N is a positive integer corresponding to the chosen value of forkAfter.