Basic idea: Maintain a set which contains all words seen so far. For each word, check if its mirror image is a member of the set.
allocate a set, implemented as a hash table with separate chaining,
    where the chaining is implemented using linked lists
do the following for each word w in the file:
    if reverse(w) exists in the set
        print "yes"
        exit the program
    else
        insert w into the set
print "no"
The analysis below assumes a uniform cost model.
Let us assume that the hash function does constant work per character, i.e. that the time needed to compute a hash value for the word w is Θ(∣w∣) time units, where ∣w∣ is the length of w.
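Such a hash function could, for instance, be a simple polynomial hash along the following lines (a sketch; the base and modulus are arbitrary choices, not prescribed by the text):

    def poly_hash(w, base=31, mod=(1 << 61) - 1):
        # One multiplication, one addition and one remainder per
        # character, so computing the hash value takes Θ(∣w∣) time.
        h = 0
        for ch in w:
            h = (h * base + ord(ch)) % mod
        return h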
The number of characters in the file is denoted by N.
Let us now assume that we have a perfect hash function, and consider an iteration of the loop body:
* We find the next word w. This operation takes O(∣w∣ + s) time units, where s is the size of any surrounding white space that is traversed.
* We reverse the word.
* We look up the reversed word.
* We may also insert w into the set.
The last three operations take O(∣w∣) time units (amortised in the case of insertion, due to potential resizing/rehashing). We get that the loop body is amortised linear in the number of processed characters.
Each character in the file is processed at most a constant number of times, so the total time complexity is O(N). In the worst case the program must read all the input, so this bound is tight; we get the worst-case time complexity Θ(N).
Note that in this scenario most of the time is likely to be spent reading the program's input.
Now assume instead that all words hash to the same bucket. Then looking up the word w takes O(i∣w∣) time units, where i is the number of words currently stored in the set. (Note that comparing two words w₁ and w₂ for equality takes O(min(∣w₁∣, ∣w₂∣)) time units.) The other operations have the same time complexity as in the previous case, so the loop body's time complexity is O(i∣w∣ + s) (amortised).
Assume now that we have a worst-case scenario where the file does not contain any duplicated words and no two words are mirror images of each other. (It is not immediately obvious that such an input can be constructed for all alphabet sizes and all N, but this assumption will at least give us an upper bound on the time complexity.) In this case the worst-case time complexity is
O(∑_{i=0}^{n−1} (i∣wᵢ∣ + sᵢ)) = O(cn² + N),
where n is the number of words, wᵢ is the i-th word, sᵢ is the white space traversed around it, and every word has at most c characters. If we have both c = Θ(N) and n = Θ(N), then we get the time complexity O(N³).
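Spelled out, the bound follows from ∣wᵢ∣ ≤ c together with ∑ sᵢ ≤ N:

    ∑_{i=0}^{n−1} (i∣wᵢ∣ + sᵢ) ≤ c ∑_{i=0}^{n−1} i + ∑_{i=0}^{n−1} sᵢ ≤ cn(n−1)/2 + N = O(cn² + N).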
Is the scenario above reasonable? Can we find a constant alphabet size A such that, for all (large) file sizes N, there is a file without duplicated or mirrored words, satisfying c = Θ(N) and n = Θ(N), and requiring Θ(c) character comparisons in order to see that any two distinct words are not equal? No: if Θ(N) character comparisons are required, then every word has Θ(N) characters, and because the file only contains N characters it can then only hold O(1) such words, so we cannot have Θ(N) words.
Challenge: Improve the analysis above (either prove that the bound is tight, or come up with a better bound).
Let us represent graphs as follows:
* One hash table mapping node identifiers (strings) to node objects.
* Node objects, each with two fields:
    identifier: The node's identifier.
    adjacent: An adjacency list, containing (pointers to) node objects.

Note that this representation ensures that one can find the successors of a node without having to look up strings.
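As a sketch, in Python such node objects might look as follows (the class and field names follow the description above):

    class Node:
        # A graph node: an identifier plus an adjacency list holding
        # direct references to other Node objects, so that successors
        # can be found without any string lookups.
        def __init__(self, identifier):
            self.identifier = identifier
            self.adjacent = []      # list of Node objects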
Basic idea: Make two passes over the file: first create one node object per line, then fill in the adjacency lists using hash table lookups.
Program:

l ← parse the file into a list of lists of strings, where
      * the outer list contains one inner list per line in the file, and
      * every inner list contains one node identifier (string) per word
        on the line, in the order in which the words occurred
h ← an empty hash table mapping strings (node identifiers) to node objects
for each inner list l′ of l do
    id ← the first element of l′
    n ← new node object: {identifier = id, adjacent = empty list}
    h.insert(id, n)
for each inner list l′ of l do
    id ← the first element of l′
    adj ← h.lookup(id).adjacent
    for each element id′ of l′, except for the first, do
        adj.insert(h.lookup(id′))
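A runnable Python version of this program, reusing the Node class sketched above (the whitespace-based parsing, and the requirement that every identifier occurring in the file also starts some line, are assumptions of the sketch):

    def build_graph(text):
        # Two passes: first create one node object per line, then
        # fill in the adjacency lists via hash table lookups.
        l = [ws for ws in (line.split() for line in text.splitlines()) if ws]
        h = {}                           # hash table: identifier -> Node
        for inner in l:                  # first loop: create the nodes
            h[inner[0]] = Node(inner[0])
        for inner in l:                  # second loop: add the edges
            adj = h[inner[0]].adjacent
            for id2 in inner[1:]:
                adj.append(h[id2])       # pointer to a node, not a string
        return h

    g = build_graph("a b c\nb a\nc\n")
    print([n.identifier for n in g["a"].adjacent])   # prints ['b', 'c']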
Basic idea: Add a new node to the graph, connected to all the other nodes, and return the result of running the Hamiltonian cycle algorithm on the extended graph.
Denote the node added to the input graph by v.
The algorithm is correct if we can prove that the input graph has a Hamiltonian path iff the extended graph has a Hamiltonian cycle.
The only if direction:
Assume that the input graph has a Hamiltonian path v₁, v₂, …, vₙ. Note that all these nodes are distinct from each other and from v. We get that the extended graph has a Hamiltonian cycle v, v₁, v₂, …, vₙ, v.
The if direction:
Assume that the extended graph has a Hamiltonian cycle. We can assume, without loss of generality, that the cycle starts and ends in v: v, v₁, v₂, …, vₙ, v. The edges between consecutive nodes vᵢ and vᵢ₊₁ do not involve v, so they belong to the input graph, and hence the nodes v₁, v₂, …, vₙ form a Hamiltonian path in the input graph.
If the input graph is G = (V, E), then it takes O(∣V∣) time units to add the extra node (if a suitable adjacency-list representation is used). The linear-time Hamiltonian cycle algorithm is then applied to the extended graph, which has ∣V∣ + 1 nodes and ∣E∣ + ∣V∣ edges, so this step takes O((∣V∣ + 1) + (∣E∣ + ∣V∣)) = O(∣V∣ + ∣E∣) time units. Hence the total running time is also O(∣V∣ + ∣E∣), provided that our assumption about the graph's representation is correct (perhaps the Hamiltonian cycle algorithm uses some non-standard graph representation).
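As a sketch on top of the representation above (has_hamiltonian_cycle stands in for the assumed linear-time algorithm and is not defined here; the fresh identifier is assumed not to occur in the graph):

    def has_hamiltonian_path(h, has_hamiltonian_cycle):
        # Add a fresh node v, connect it to all other nodes in both
        # directions (O(∣V∣) work), and ask for a Hamiltonian cycle
        # in the extended graph. Modifies the graph h in place.
        v = Node("__v__")
        for node in h.values():
            v.adjacent.append(node)
            node.adjacent.append(v)
        h[v.identifier] = v
        return has_hamiltonian_cycle(h)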