The statement we are proving does not have the form
∀n ∈ ℕ. P(n)
for some predicate P, so the form of induction used in the argument is not applicable.
The Θ notation is possibly confusing. If we use a higher-order variant of Θ instead, then the statement to be proved can be reformulated as follows:
The equations have a solution T, which has the property T ∈ Θ(λn.n).
Here we can see that we cannot perform "induction on n", because n is not in scope.
As an aside we could ask ourselves if we can fix the proof. The answer is no, because T(n) = n(n+1)/2 ∈ Θ(n²), not Θ(n). Seeing a failed attempt could be instructive, though. Let us try to prove
∀n ∈ ℕ. T(n) = n
by induction on n:
Base case: T(0)=0.
Step case (n > 0):
T(n)
= { Definition of T. }
n + T(n − 1)
= { Inductive hypothesis. }
n + (n − 1)
=
Θ(n)
=
n.
The last couple of steps look suspicious, right?
pop takes constant time, so multi-pop(n, s) is linear in n (Θ(n)).
The goal is to prove that all operations, including multi-pop, take constant amortised time. The key observation is that we can only multi-pop n elements after we have already pushed n elements.
Assume that the stack ADT only includes the following operations:
Creation of an empty stack.
push.
pop.
size.
We can see every operation as being one of those or a multi-pop.
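To make the operations concrete, here is a minimal Haskell sketch of the ADT (the names and the Maybe-based error handling are my assumptions, not fixed by the exercise); the size is stored so that multi-pop's precondition can be checked in constant time:

-- A stack together with its size.
data Stack a = Stack { size :: Int, elements :: [a] }

empty :: Stack a
empty = Stack 0 []

push :: a -> Stack a -> Stack a
push x (Stack n xs) = Stack (n + 1) (x : xs)

pop :: Stack a -> Maybe (Stack a)
pop = multiPop 1

-- Pops n elements in Θ(1 + n) time; returns Nothing in constant time
-- if the precondition 0 ≤ n ≤ size is not satisfied.
multiPop :: Int -> Stack a -> Maybe (Stack a)
multiPop n (Stack m xs)
  | 0 <= n && n <= m = Just (Stack (m - n) (drop n xs))
  | otherwise        = Nothing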
Let us choose the time unit so that multi-pop(n, s) takes at most 1 + n time units under normal operation, and at most 1 time unit if the precondition is not satisfied, and the other operations take at most one time unit.
Let us further assign the following amortised time complexities to the operations:
push: 2.
All other operations, including multi-pop: 1.
We need to prove that the sum of the amortised time complexities for a sequence of operations (for a given stack, starting with its creation) is an upper bound of the sum of the worst-case time complexities:
∑_{i=1}^{n} t_i ≤ ∑_{i=1}^{n} a_i,
where the sequence contains n operations, t_i stands for the actual (idealised) time complexity of the i-th operation, and a_i is its amortised time complexity.
The inequality above follows from the following inequality:
∑_{i=1}^{n} t_i + s_n ≤ ∑_{i=1}^{n} a_i.
Here s_i is the size of the stack after i operations (or 0 if the stack has not been created yet). We can prove the inequality by induction on n:
Case 0: The left and the right hand sides both evaluate to 0.
Case 1+n: We proceed by case analysis on the (1+n)-th operation:
Creation of an empty stack. Here n must be 0:
∑_{i=1}^{1} t_i + s_1
≤
1 + 0
=
1
= { Amortised time complexity of creation: 1. }
∑_{i=1}^{1} a_i
push:
∑_{i=1}^{1+n} t_i + s_{1+n}
≤
∑_{i=1}^{n} t_i + 1 + (s_n + 1)
=
∑_{i=1}^{n} t_i + s_n + 2
≤ { Inductive hypothesis. }
∑_{i=1}^{n} a_i + 2
= { Amortised time complexity of push: 2. }
∑_{i=1}^{1+n} a_i
pop, assuming the precondition is satisfied (i.e. s_n > 0):
∑_{i=1}^{1+n} t_i + s_{1+n}
≤
∑_{i=1}^{n} t_i + 1 + (s_n − 1)
=
∑_{i=1}^{n} t_i + s_n
≤ { Inductive hypothesis. }
∑_{i=1}^{n} a_i
≤
∑_{i=1}^{n} a_i + 1
= { Amortised time complexity of pop: 1. }
∑_{i=1}^{1+n} a_i
multi-pop(m, s) when the precondition is satisfied:
∑_{i=1}^{1+n} t_i + s_{1+n}
≤
∑_{i=1}^{n} t_i + (1 + m) + (s_n − m)
=
∑_{i=1}^{n} t_i + s_n + 1
≤ { Inductive hypothesis. }
∑_{i=1}^{n} a_i + 1
= { Amortised time complexity of multi-pop: 1. }
∑_{i=1}^{1+n} a_i
size, or pop or multi-pop when their respective preconditions are not satisfied:
∑_{i=1}^{1+n} t_i + s_{1+n}
≤ { The size is preserved. }
∑_{i=1}^{n} t_i + 1 + s_n
=
∑_{i=1}^{n} t_i + s_n + 1
≤ { Inductive hypothesis. }
∑_{i=1}^{n} a_i + 1
= { Amortised time complexity: 1. }
∑_{i=1}^{1+n} a_i
We get that all operations take constant amortised time.
Is this the "best possible amortised time complexity of multi-pop" (up to constant factors)? A slightly more refined analysis shows that one can assign multi-pop the amortised time complexity zero whenever at least one element is popped. However, in the worst case we pop nothing. Consider the operation sequence consisting of the creation of an empty stack followed by n multi-pops of nothing. This sequence takes 1 + n time units. Furthermore creation can only be assigned a constant amortised time complexity (because it does not take any input). Hence we cannot, in general, assign multi-pop a "sub-constant" amortised time complexity.
Some notes:
This analysis is an example of a reusable method, the "potential method". The solution could be simplified by including a reference to this method and omitting some steps.
One could also simplify the analysis somewhat by only considering well-formed operations, for which the preconditions are satisfied.
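As an illustration of the first note: with the potential of a stack taken to be its current size (Φ_i = s_i, Φ_0 = 0), every amortised cost assigned above satisfies a_i ≥ t_i + Φ_i − Φ_{i−1}. For instance, for push we have 2 ≥ 1 + 1, and for a multi-pop of m elements 1 ≥ (1 + m) − m. Summing over i and telescoping gives
∑_{i=1}^{n} a_i ≥ ∑_{i=1}^{n} t_i + Φ_n − Φ_0 = ∑_{i=1}^{n} t_i + s_n,
which is exactly the strengthened inequality proved above.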
Basic idea: Maintain a set which contains all words seen so far. For each word, check if its mirror image is a member of the set.
allocate a set, implemented as a hash table with separate chaining,
  where the chaining is implemented using linked lists
do the following for each word w in the file:
  if reverse(w) exists in the set:
    print "yes"
    exit the program
  else:
    insert w into the set
print "no"
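For reference, a runnable Haskell version of this idea (my code, not part of the original solution). It uses Data.Set, a balanced search tree, instead of a hash table, so every operation costs an extra logarithmic factor compared to the analysis below, but the structure of the algorithm is the same:

import qualified Data.Set as Set

-- True iff some word's mirror image occurred earlier in the list.
hasMirrorPair :: [String] -> Bool
hasMirrorPair = go Set.empty
  where
    go _ [] = False
    go seen (w : ws)
      | reverse w `Set.member` seen = True
      | otherwise                   = go (Set.insert w seen) ws

main :: IO ()
main = do
  ws <- words <$> getContents
  putStrLn (if hasMirrorPair ws then "yes" else "no")

Thanks to laziness the traversal stops as soon as a mirror pair is found, matching the early exit in the pseudocode.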
Assuming a uniform cost model.
Let us assume that the hash function does constant work per character, i.e. that the time needed to compute a hash value for the word w is Θ(|w|) time units, where |w| is the length of w.
The number of characters in the file is denoted by N.
Let us now assume that we have a perfect hash function, and consider an iteration of the loop body:
We find the next word w. This operation takes O(|w|+s) time units, where s is the size of any surrounding white space that is traversed.
We reverse the word.
We look up the reversed word.
We may also insert w into the set.
The last three operations take O(|w|) time units (amortised in the case of insertion, due to potential resizing/rehashing). We get that the loop body is amortised linear in the number of processed characters.
Each character in the file is processed at most once, so the total time complexity is O(N). In the worst case the program must read all the input, so this bound is tight; we get the worst-case time complexity Θ(N).
Note that in this scenario most of the time is likely to be spent reading the program's input.
If all words hash to the same bucket, then looking up the word w takes O(i|w|) time units, where i is the number of words currently stored in the hash table. (Note that comparing two words w₁ and w₂ for equality takes O(min(|w₁|, |w₂|)) time units.) The other operations have the same time complexity as in the previous case, so the loop body's time complexity is O(i|w|+s) (amortised).
Assume now that we have a worst-case scenario where the file does not contain any duplicated words and no two words are mirror images of each other. (It is not immediately obvious that such an input can be constructed for all alphabet sizes and all N, but this assumption will at least give us an upper bound on the time complexity.) In this case the worst-case time complexity is
O(∑_{i=0}^{n−1} (i|w_i| + s_i)) = O(cn² + N),
where n is the number of words, and every word has at most c characters. If we have both c = Θ(N) and n = Θ(N), then we get the time complexity O(N³).
Is the scenario above reasonable? Can we find a constant alphabet size A such that, for all (large) file sizes N, there is a file without duplicated or mirrored words, satisfying c = Θ(N) and n = Θ(N), and requiring Θ(c) character comparisons in order to see that any two distinct words are not equal? No: if Θ(N) character comparisons are required, then every word has Θ(N) characters, in which case we cannot have Θ(N) words.
Challenge: Improve the analysis above (either prove that the bound is tight, or come up with a better bound).
Let us represent graphs as follows:
One hash table mapping node identifiers (strings) to node objects.
Each node object has two fields:
identifier: The node's identifier.
adjacent: An adjacency list, containing (pointers to) node objects.
Note that this representation ensures that one can find the successors of a node without having to look up strings.
Basic idea: Parse the file, create one node object per line, and then fill in the adjacency lists, so that the node objects point directly at each other.
Program:
l ← parse the file into a list of lists of strings, where
      * the outer list contains one inner list per line in the file, and
      * every inner list contains one node identifier (string) per
        word on the line, in the order in which the words occurred
h ← an empty hash table mapping strings (node identifiers) to node objects
for each inner list l′ of l do
  id ← the first element of l′
  n ← new node object: {identifier = id, adjacent = empty list}
  h.insert(id, n)
for each inner list l′ of l do
  id ← the first element of l′
  adj ← h.lookup(id).adjacent
  for each element id′ of l′, except for the first, do
    adj.insert(h.lookup(id′))
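For comparison, here is a Haskell sketch of the same construction (the names are mine, and a Map plays the role of the hash table). Thanks to laziness the two passes can be merged by "tying the knot": the map m is used in its own definition, so the adjacency lists end up containing the node objects themselves rather than strings:

import qualified Data.Map as Map

data Node = Node { identifier :: String, adjacent :: [Node] }

-- One inner list of words per line; empty lines are skipped.
-- Assumes that every referenced identifier has a line of its own.
buildGraph :: [[String]] -> Map.Map String Node
buildGraph lss = m
  where
    m = Map.fromList
          [ (i, Node i (map (m Map.!) is)) | (i : is) <- lss ]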
Add a new node to the graph, connected to all the other nodes. Return the result of running the Hamiltonian cycle algorithm on the extended graph.
Denote the node added to the input graph by v.
The algorithm is correct if we can prove that the input graph has a Hamiltonian path iff the extended graph has a Hamiltonian cycle.
The only if direction:
Assume that the input graph has a Hamiltonian path v₁, v₂, …, vₙ. Note that all these nodes are distinct from each other and from v. We get that the extended graph has a Hamiltonian cycle v, v₁, v₂, …, vₙ, v.
The if direction:
Assume that the extended graph has a Hamiltonian cycle. We can assume, without loss of generality, that the cycle starts and ends in v: v, v₁, v₂, …, vₙ, v. Here the nodes v₁, v₂, …, vₙ form a Hamiltonian path in the input graph.
If the input graph is G = (V, E), then it takes O(|V|) time units to add the extra node (if a suitable adjacency list representation is used). The linear Hamiltonian cycle algorithm is then applied to the larger graph; its running time is O((|V|+1)+(|E|+|V|)) = O(|V|+|E|). Hence the total running time is also O(|V|+|E|), provided that our assumption about the graph's representation is correct (perhaps the Hamiltonian cycle algorithm uses some non-standard graph representation).
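As an illustration, here is a hypothetical sketch of the node-adding step, for a representation where node i's neighbour list sits at index i (the exercise does not fix a representation). Only the outer spine and the head of each list are rebuilt, so it runs in O(|V|) time:

-- The new node gets index |V| and is made adjacent to every old node,
-- and every old node is made adjacent to it.
addUniversalNode :: [[Int]] -> [[Int]]
addUniversalNode adj = map (v :) adj ++ [[0 .. v - 1]]
  where
    v = length adj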
Assume that a graph G = (V, E) is represented using an array of length |V|, where position n contains the adjacency list for the node with node index n (n ∈ {0,…,|V|-1}), and the adjacency lists contain node indices.
We need to check if a given graph is reflexive, symmetric, and transitive. Checking for reflexivity and symmetry is easy:
reflexive(G):
  for n ∈ {0, …, number of nodes in G − 1} do
    if n does not occur in its own adjacency list then
      return false
  return true

symmetric(G):
  M ← adjacency matrix for G
  for m ∈ {0, …, number of nodes in G − 1} do
    for n ∈ {0, …, number of nodes in G − 1} do
      if M[m, n] ≠ M[n, m] then
        return false
  return true
(The code for symmetric can be optimised, for instance by only comparing M[m, n] with M[n, m] for m < n.)
A reflexive graph G = (V, E) is transitive iff all shortest paths between any two nodes contain at most one edge. Proof:
Only if: Assume that, for all u, v, w ∈ V, (u, v) ∈ E and (v, w) ∈ E implies (u, w) ∈ E. Take any path v₁, …, vₙ. If this path has length at least 2, then it can be shortened by replacing v₁ → v₂ → v₃ by v₁ → v₃. Hence all shortest paths have length 1 or less.
If: Assume that all shortest paths contain at most one edge, and assume that (u, v) ∈ E and (v, w) ∈ E. We know that there exists a path from u to w, so there must be a path from u to w of length at most one. If the length is one, then we have (u, w) ∈ E. If the length is zero, then u = w, and hence (u, w) ∈ E follows from reflexivity.
Hence we can check transitivity as follows:
// Precondition: G must be reflexive.
transitive(G):
  for every node v in G do
    p ← an array containing the lengths of the shortest paths from v
        to every node in G (using −1 to mark the absence of a path)
    if any element in p is ≥ 2 then
      return false
  return true
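Since the graph is unweighted, p can be computed using breadth-first search. A minimal Haskell sketch (my names; it returns a finite map with unreachable nodes absent instead of marked with −1, and the list-based queue is naive but adequate for a sketch):

import qualified Data.IntMap.Strict as IM

-- Shortest-path lengths (in edges) from s in an unweighted graph,
-- given as a map from node index to adjacency list.
bfs :: IM.IntMap [Int] -> Int -> IM.IntMap Int
bfs g s = go (IM.singleton s 0) [s]
  where
    go dist []          = dist
    go dist (u : queue) = go dist' (queue ++ new)
      where
        d     = dist IM.! u
        new   = [ v | v <- IM.findWithDefault [] u g
                    , not (v `IM.member` dist) ]
        dist' = foldr (\v -> IM.insert v (d + 1)) dist new

The check in the loop body then amounts to all (<= 1) (IM.elems (bfs g v)).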
Finally the pieces above can be combined as follows:
equivalence-relation(G):
  return (reflexive(G) and symmetric(G) and transitive(G))
Assuming a uniform cost model.
Assume that the input is G = (V, E). The different functions have the following time complexities:
reflexive: The for loop is iterated at most |V| times. The total time needed to scan the adjacency lists is O(|E|) time units. We get the time complexity O(|V|+|E|).
symmetric: Creating an adjacency matrix takes O(|V|²) time units (first allocating and initialising a |V|×|V| matrix, and then updating one matrix entry for every edge in the graph). The nested loops take O(|V|²) time units, for a total of O(|V|²).
transitive: The following is done at most |V| times: computing the shortest-path lengths p (using breadth-first search, which takes O(|V|+|E|) time units) and scanning p (O(|V|) time units). Hence the total time is O(|V|(|V|+|E|)).
We see that the time complexity is dominated by transitive. The total time required is O(|V|(|V|+|E|)).
Basic idea: Find a minimum spanning tree, return the weight of (one of) its most expensive edge(s).
This can be implemented by modifying Kruskal's algorithm so that the weight of the last edge added to the minimum spanning tree is returned.
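A Haskell sketch of the modified algorithm (my code, under the assumption that nodes are numbered 0 to n − 1 and n ≥ 2). For brevity the union-find structure is replaced by a naive list of components, which ruins the O(|E| log |V|) bound; a real implementation would use union by rank and path compression:

import Data.List (find, sortOn)

type Edge = (Int, Int, Int)  -- (weight, u, v)

-- The weight of the last edge added by Kruskal's algorithm, i.e. the
-- weight of (one of) the most expensive MST edge(s).
maxSpacing :: Int -> [Edge] -> Int
maxSpacing n edges = go [ [v] | v <- [0 .. n - 1] ] (sortOn weight edges) 0
  where
    weight (w, _, _) = w
    go [_]   _                lastW = lastW  -- spanning tree complete
    go _     []               lastW = lastW  -- only reached if G is disconnected
    go comps ((w, u, v) : es) lastW
      | cu == cv  = go comps es lastW            -- would form a cycle
      | otherwise = go ((cu ++ cv) : rest) es w  -- add edge, record its weight
      where
        Just cu = find (u `elem`) comps
        Just cv = find (v `elem`) comps
        rest    = filter (\c -> c /= cu && c /= cv) comps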
Assuming a uniform cost model.
This algorithm has the same time complexity as Kruskal's algorithm: O(|E|log|V|), if we assume that we can compare two weights in constant time.
The algorithm produces a result.
The graph is connected, so it has an MST. Furthermore the graph contains at least two nodes, so any MST contains at least one edge.
The returned value is the spacing of some partition.
Denote the computed MST by T, and (one of) its most expensive edge(s) by e. If we remove e from T, then we get two trees T₁ and T₂. Denote the nodes contained in these trees by V₁ and V₂, respectively. V₁ and V₂ partition the graph, and no edge connecting these sets is strictly cheaper than e (because T is an MST), so the weight of e is the spacing of (V₁, V₂).
The returned value is the largest possible spacing.
Assume that there is some other partition (V₁′, V₂′) which has a strictly larger spacing. This means that e is strictly cheaper than any of the (at least one) cheapest edges between V₁′ and V₂′. Let us denote one of these edges by e′, and the nodes that it connects by v₁′ and v₂′ (with v₁′ ∈ V₁′ and v₂′ ∈ V₂′).
We know that there is a simple path p in T which connects v₁′ and v₂′, because T is a spanning tree. Furthermore v₁′ ∈ V₁′ and v₂′ ∈ V₂′, so we know that at least one of the edges in p must go between V₁′ and V₂′. This edge is not more expensive than e, which in turn is strictly cheaper than e′. However, e′ was assumed to be a cheapest edge between V₁′ and V₂′, so we have found a contradiction.
You may want to try to generalise the algorithm to partitions with more than two components.
This exercise is related to cluster analysis; see for instance the book Algorithm Design by Kleinberg and Tardos. Some examples can be found online.
No solution given.
Perform a postorder traversal of the tree, swapping left and right subtrees.
We do a constant amount of work per tree node, so the algorithm is linear in the size of the tree.
We want to prove that the resulting tree is the reverse of the input. We do this by induction on the structure of the tree.
The empty tree case: The algorithm does nothing, which is fine, because an empty tree is its own reverse.
The node case: Assume that we start with a node ⟨l, x, r⟩, where l is the left subtree, x is the node element, and r is the right subtree. We know that all elements in l are smaller than x, and that x is smaller than all elements in r.
We recursively reverse l and r, ending up with l′ and r′, respectively. By the inductive hypothesis we know that l′ and r′ are reversals of l and r, respectively. The final tree is ⟨r′, x, l′⟩, which is sorted in reverse order because all elements in r′ are larger than x, all elements in l′ are smaller than x, and r′ and l′ are themselves sorted in reverse order.
Haskell code:
data BinaryTree a = Empty | Node (BinaryTree a) a (BinaryTree a)
reverseTree :: BinaryTree a -> BinaryTree a
reverseTree Empty = Empty
reverseTree (Node l x r) = Node (reverseTree r) x (reverseTree l)
The time complexity is as before (if we assume that strict evaluation is used).
We want to prove, for all (total, finite) trees t, that
toList (reverseTree t) = reverse (toList t),
where reverse reverses a list and toList converts a tree into a list:
toList :: BinaryTree a -> [a]
toList Empty = []
toList (Node l x r) = toList l ++ [x] ++ toList r
Proof by induction on t:
Case Empty:
toList (reverseTree Empty)
=
toList Empty
=
[]
=
reverse []
=
reverse (toList Empty)
Case Node l x r:
toList (reverseTree (Node l x r))
=
toList (Node (reverseTree r) x (reverseTree l))
=
toList (reverseTree r) ++ [x] ++ toList (reverseTree l)
= { Inductive hypothesis (twice). }
reverse (toList r) ++ [x] ++ reverse (toList l)
= { reverse (xs ++ ys) = reverse ys ++ reverse xs (twice), and reverse [x] = [x]. }
reverse (toList l ++ [x] ++ toList r)
=
reverse (toList (Node l x r))
We can store one boolean in the tree (not one boolean per tree node), which indicates whether the tree is reversed or not. Other operations (lookup, delete, etc.) need to consult this bit at most once per visited node, so their asymptotic worst-case time complexities do not change.
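A sketch of this idea in Haskell (hypothetical names, reusing the BinaryTree type defined earlier). Reversal itself becomes a constant-time operation, and lookup consults the flag once per visited node:

data RevTree a = RevTree { isReversed :: Bool, tree :: BinaryTree a }

-- O(1): just flip the flag.
reverseRT :: RevTree a -> RevTree a
reverseRT (RevTree b t) = RevTree (not b) t

member :: Ord a => a -> RevTree a -> Bool
member x (RevTree b t) = go t
  where
    go Empty = False
    go (Node l y r)
      | x == y       = True
      | (x < y) /= b = go l  -- the "smaller" side, given the flag
      | otherwise    = go r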
This solution may be hard to understand for those who are not familiar with functional programming.
Definition of binary trees:
data BinaryTree a = Empty | Node (BinaryTree a) a (BinaryTree a)
We need to check if a given binary tree is a search tree satisfying the AVL tree balancing invariant:
isAVL : BinaryTree a → Bool
isAVL t = searchTree t ∧ balanced t
Here I assume that the type a comes with a constant-time comparison operator
_<_ : a → a → Bool
which implements a strict total order.
We can check if a tree is a search tree by turning the tree into a list using an inorder traversal, and then checking if this list is strictly increasing:
searchTree : BinaryTree a → Bool
searchTree t = strictlyIncreasing (inorderList t)
inorderList : BinaryTree a → [a]
inorderList Empty = []
inorderList (Node l x r) = inorderList l ++ [x] ++ inorderList r
strictlyIncreasing : [a] → Bool
strictlyIncreasing [] = True
strictlyIncreasing [x] = True
strictlyIncreasing (x ∷ y ∷ xs) =
  x < y ∧ strictlyIncreasing (y ∷ xs)
The call inorderList t is linear in the size of t, assuming that an implementation of lists which supports constant-time concatenation (++) is used. Furthermore strictlyIncreasing xs is, in the worst case, linear in the length of xs (because x < y executes in constant time). Hence searchTree t is linear in the size of t (in the worst case).
When checking if the tree is balanced it seems reasonable to avoid recomputing heights. The following function accomplishes this by checking if a tree is balanced and computing its height, both in a single postorder traversal:
balancedAndHeight : BinaryTree a → Bool × ℤ
balancedAndHeight Empty = (True, -1)
balancedAndHeight (Node l x r) =
  (bl ∧ br ∧ |hl - hr| ≤ 1, 1 + max hl hr)
  where
    (bl, hl) = balancedAndHeight l
    (br, hr) = balancedAndHeight r
Because a single traversal is used and we do a constant amount of processing per tree node, we get that balancedAndHeight is linear in the size of the tree. We can wrap up by throwing away the computed height:
balanced : BinaryTree a → Bool
balanced t = b
  where (b, _) = balancedAndHeight t
The total time needed by isAVL t is linear in the size of t (in the worst case). Is this optimal? Yes: one cannot be sure that an arbitrary tree is an AVL tree without inspecting every node.
Let N(h) denote the smallest possible number of nodes in an AVL tree of height h. An AVL tree of a given height with as few nodes as possible must be maximally unbalanced, so we can define N recursively as follows (assuming that the height of an empty tree is -1):
N(-1) = 0
N(0) = 1
N(h+1) = 1 + N(h) + N(h-1), h ∈ ℕ
This recursive definition is well-founded, so it must have a unique solution.
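The recurrence is easy to transcribe into Haskell (my names), which also allows a quick numerical sanity check of the bound proved next:

-- Smallest number of nodes in an AVL tree of height h
-- (the height of an empty tree is -1).
minNodes :: Int -> Integer
minNodes (-1) = 0
minNodes 0    = 1
minNodes h    = 1 + minNodes (h - 1) + minNodes (h - 2)

-- Checks N(h) ≥ ϕ^h for small h.
checkBound :: Bool
checkBound = and [ fromIntegral (minNodes h) >= phi ^^ h | h <- [0 .. 25] ]
  where phi = (1 + sqrt 5) / 2 :: Double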
For all h ∈ ℕ we have N(h) ≥ ϕ^h, where ϕ is the golden ratio (1+√5)/2. Proof by induction on h:
Case 0: 1 ≥ 1 = ϕ^0.
Case 1: 2 = (1+√9)/2 ≥ (1+√5)/2 = ϕ^1.
Case h+2, h ∈ ℕ:
N(h+2)
=
1 + N(h+1) + N(h)
≥ { inductive hypothesis (twice) }
1 + ϕ^(h+1) + ϕ^h
=
1 + (1+ϕ)·ϕ^h
= { 1+ϕ = ϕ² }
1 + ϕ^(h+2)
≥
ϕ^(h+2).
Now let H(n) denote the largest possible height of an AVL tree with n nodes. For any AVL tree with n nodes and height h we have N(h) ≤ n (by the definition of N). In particular, this applies to a tree with n nodes and height H(n). If we assume that n ≥ 1, then we get
n
≥
N(H(n))
≥ { n ≥ 1 implies H(n) ≥ 0 }
ϕ^(H(n)).
Both sides of this inequality are positive, so we can apply the golden ratio logarithm (which is monotone) to both sides and get the following inequality:
∀n ≥ 1. H(n) ≤ log_ϕ n,
i.e. the height of an AVL tree with n nodes is O(log n) (note that log_ϕ n = log₂ n / log₂ ϕ ≈ 1.44 log₂ n).
I would expect you to find that the "efficient" implementation is much more efficient than the quadratic one for large input, and that the optimised library implementation is quite a bit faster than your own "efficient" implementation, but not wildly so.
As for testing, a good idea seems to be to test that the three implementations always give the same result (up to reordering of duplicates).
Two possible answers:
Radix sort. See the ordinary exam from 2010, exercise 5c. However, this answer is rather theoretical.
Quicksort or mergesort. LaMarca and Ladner (The Influence of Caches on the Performance of Sorting, Journal of Algorithms, 31(1):66-104, 1999) implement optimised variants of quicksort, heapsort, mergesort and radix sort, and run actual experiments. While radix sort is claimed to have the lowest instruction count, cache effects make it roughly 50% slower than quicksort or mergesort for sorting one million 64-bit integers on the given hardware. See the figures on page 23 (or page 98 of the journal version of the paper) for a quick summary.
If the input is likely to be partially sorted, then one can also consider using an adaptive sorting algorithm.
Assume that the input consists of n elements, out of which n/c have the same key k. Then quicksort using this partitioning strategy (and no optimisations, such as switching to other sorting algorithms) requires Ω(n²/c²) steps.
Proof: Whenever the input is partitioned, all elements with key k end up in the same subset (S₁ or S₂), except perhaps for one (the pivot). Quicksort does not terminate until every element has been selected as a pivot once, so the input must be partitioned at least n/c times. Since each partitioning step removes at most one element with key k (the pivot), the i-th such step, counting from the end, processes at least i elements. Hence computing these partitions takes at least
∑_{i=1}^{n/c} i = (n/c)(n/c + 1)/2 = Ω(n²/c²)
steps.