The statement we are proving does not have the form
∀n ∈ ℕ. P(n)
for some predicate P, so the form of induction used in the argument is not applicable.
The Θ notation is possibly confusing. If we use a higher-order variant of Θ instead, then the statement to be proved can be reformulated as follows:
The equations have a solution T, which has the property T ∈ Θ(λn.n).
Here we can see that we cannot perform "induction on n", because n is not in scope.
As an aside we could ask ourselves if we can fix the proof. The answer is no, because T(n) = n(n+1)/2 ∈ Θ(n²), not Θ(n). Seeing a failed attempt could be instructive, though. Let us try to prove
∀n ∈ ℕ. T(n) = n
by induction on n:
Base case: T(0)=0.
Step case (n > 0):
T(n)
= { Definition of T. }
n + T(n − 1)
= { Inductive hypothesis. }
n + (n − 1)
=
Θ(n)
=
n.
The last couple of steps look suspicious, right?
pop takes constant time, so multi-pop(n, s) is linear in n (Θ(n)).
The goal is to prove that all operations, including multi-pop, take constant amortised time. The key observation is that we can only multi-pop n elements after we have already pushed n elements.
Assume that the stack ADT only includes the following operations:
Creation of an empty stack.
push.
pop.
size.
We can see every operation as being one of those or a multi-pop.
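To make the operations concrete, here is a minimal Haskell sketch of the ADT (the names and the Maybe-based error handling are my assumptions, not fixed by the exercise); the size is stored so that multi-pop's precondition can be checked in constant time:

-- A stack together with its size.
data Stack a = Stack { size :: Int, elements :: [a] }

empty :: Stack a
empty = Stack 0 []

push :: a -> Stack a -> Stack a
push x (Stack n xs) = Stack (n + 1) (x : xs)

pop :: Stack a -> Maybe (Stack a)
pop = multiPop 1

-- Pops n elements in Θ(1 + n) time; returns Nothing in constant time
-- if the precondition 0 ≤ n ≤ size is not satisfied.
multiPop :: Int -> Stack a -> Maybe (Stack a)
multiPop n (Stack m xs)
  | 0 <= n && n <= m = Just (Stack (m - n) (drop n xs))
  | otherwise        = Nothing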
Let us choose the time unit so that multi-pop(n, s) takes at most 1 + n time units under normal operation, and at most 1 time unit if the precondition is not satisfied, and the other operations take at most one time unit.
Let us further assign the following amortised time complexities to the operations:
push: 2.
All other operations, including multi-pop: 1.
We need to prove that the sum of the amortised time complexities for a sequence of operations (for a given stack, starting with its creation) is an upper bound of the sum of the worst-case time complexities:
∑_{i=1}^{n} t_i ≤ ∑_{i=1}^{n} a_i,
where the sequence contains n operations, t_i stands for the actual (idealised) time complexity of the i-th operation, and a_i is its amortised time complexity.
The inequality above follows from the following inequality:
∑_{i=1}^{n} t_i + s_n ≤ ∑_{i=1}^{n} a_i.
Here s_i is the size of the stack after i operations (or 0 if the stack has not been created yet). We can prove the inequality by induction on n:
Case 0: The left and the right hand sides both evaluate to 0.
Case 1+n: We proceed by case analysis on the (1+n)-th operation:
Creation of an empty stack. Here n must be 0:
∑_{i=1}^{1} t_i + s_1
≤
1 + 0
=
1
= { Amortised time complexity of creation: 1. }
∑_{i=1}^{1} a_i
push:
∑_{i=1}^{1+n} t_i + s_{1+n}
≤
∑_{i=1}^{n} t_i + 1 + (s_n + 1)
=
∑_{i=1}^{n} t_i + s_n + 2
≤ { Inductive hypothesis. }
∑_{i=1}^{n} a_i + 2
= { Amortised time complexity of push: 2. }
∑_{i=1}^{1+n} a_i
pop, assuming the precondition is satisfied (i.e. s_n > 0):
∑_{i=1}^{1+n} t_i + s_{1+n}
≤
∑_{i=1}^{n} t_i + 1 + (s_n − 1)
=
∑_{i=1}^{n} t_i + s_n
≤ { Inductive hypothesis. }
∑_{i=1}^{n} a_i
≤
∑_{i=1}^{n} a_i + 1
= { Amortised time complexity of pop: 1. }
∑_{i=1}^{1+n} a_i
multi-pop(m, s) when the precondition is satisfied:
∑_{i=1}^{1+n} t_i + s_{1+n}
≤
∑_{i=1}^{n} t_i + (1 + m) + (s_n − m)
=
∑_{i=1}^{n} t_i + s_n + 1
≤ { Inductive hypothesis. }
∑_{i=1}^{n} a_i + 1
= { Amortised time complexity of multi-pop: 1. }
∑_{i=1}^{1+n} a_i
size, or pop or multi-pop when their respective preconditions are not satisfied:
∑_{i=1}^{1+n} t_i + s_{1+n}
≤ { The size is preserved. }
∑_{i=1}^{n} t_i + 1 + s_n
=
∑_{i=1}^{n} t_i + s_n + 1
≤ { Inductive hypothesis. }
∑_{i=1}^{n} a_i + 1
= { Amortised time complexity: 1. }
∑_{i=1}^{1+n} a_i
We get that all operations take constant amortised time.
Is this the "best possible amortised time complexity of multi-pop" (up to constant factors)? A slightly more refined analysis shows that one can assign multi-pop the amortised time complexity zero whenever at least one element is popped. However, in the worst case we pop nothing. Consider the operation sequence consisting of the creation of an empty stack followed by n multi-pops of nothing. This sequence takes 1 + n time units. Furthermore creation can only be assigned a constant amortised time complexity (because it does not take any input). Hence we cannot, in general, assign multi-pop a "sub-constant" amortised time complexity.
Some notes:
This analysis is an example of a reusable method, the "potential method". The solution could be simplified by including a reference to this method and omitting some steps.
One could also simplify the analysis somewhat by only considering well-formed operations, for which the preconditions are satisfied.
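As an illustration of the first note: with the potential of a stack taken to be its current size (Φ_i = s_i, Φ_0 = 0), every amortised cost assigned above satisfies a_i ≥ t_i + Φ_i − Φ_{i−1}. For instance, for push we have 2 ≥ 1 + 1, and for a multi-pop of m elements 1 ≥ (1 + m) − m. Summing over i and telescoping gives
∑_{i=1}^{n} a_i ≥ ∑_{i=1}^{n} t_i + Φ_n − Φ_0 = ∑_{i=1}^{n} t_i + s_n,
which is exactly the strengthened inequality proved above.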
Basic idea: Maintain a set which contains all words seen so far. For each word, check if its mirror image is a member of the set.
allocate a set, implemented as a hash table with separate chaining,
  where the chaining is implemented using linked lists
do the following for each word w in the file:
  if reverse(w) exists in the set:
    print "yes"
    exit the program
  else:
    insert w into the set
print "no"
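For reference, a runnable Haskell version of this idea (my code, not part of the original solution). It uses Data.Set, a balanced search tree, instead of a hash table, so every operation costs an extra logarithmic factor compared to the analysis below, but the structure of the algorithm is the same:

import qualified Data.Set as Set

-- True iff some word's mirror image occurred earlier in the list.
hasMirrorPair :: [String] -> Bool
hasMirrorPair = go Set.empty
  where
    go _ [] = False
    go seen (w : ws)
      | reverse w `Set.member` seen = True
      | otherwise                   = go (Set.insert w seen) ws

main :: IO ()
main = do
  ws <- words <$> getContents
  putStrLn (if hasMirrorPair ws then "yes" else "no")

Thanks to laziness the traversal stops as soon as a mirror pair is found, matching the early exit in the pseudocode.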
Assuming a uniform cost model.
Let us assume that the hash function does constant work per character, i.e. that the time needed to compute a hash value for the word w is Θ(|w|) time units, where |w| is the length of w.
The number of characters in the file is denoted by N.
Let us now assume that we have a perfect hash function, and consider an iteration of the loop body:
We find the next word w. This operation takes O(|w|+s) time units, where s is the size of any surrounding white space that is traversed.
We reverse the word.
We look up the reversed word.
We may also insert w into the set.
The last three operations take O(|w|) time units (amortised in the case of insertion, due to potential resizing/rehashing). We get that the loop body is amortised linear in the number of processed characters.
Each character in the file is processed at most once, so the total time complexity is O(N). In the worst case the program must read all the input, so this bound is tight; we get the worst-case time complexity Θ(N).
Note that in this scenario most of the time is likely to be spent reading the program's input.
If all words hash to the same bucket, then looking up the word w takes O(i|w|) time units, where i is the number of words currently stored in the hash table. (Note that comparing two words w₁ and w₂ for equality takes O(min(|w₁|, |w₂|)) time units.) The other operations have the same time complexity as in the previous case, so the loop body's time complexity is O(i|w|+s) (amortised).
Assume now that we have a worst-case scenario where the file does not contain any duplicated words and no two words are mirror images of each other. (It is not immediately obvious that such an input can be constructed for all alphabet sizes and all N, but this assumption will at least give us an upper bound on the time complexity.) In this case the worst-case time complexity is
O(∑_{i=0}^{n−1} (i|w_i| + s_i)) = O(cn² + N),
where n is the number of words, and every word has at most c characters. If we have both c = Θ(N) and n = Θ(N), then we get the time complexity O(N³).
Is the scenario above reasonable? Can we find a constant alphabet size A such that, for all (large) file sizes N, there is a file without duplicated or mirrored words, satisfying c = Θ(N) and n = Θ(N), and requiring Θ(c) character comparisons in order to see that any two distinct words are not equal? No: if Θ(N) character comparisons are required, then every word has Θ(N) characters, in which case we cannot have Θ(N) words.
Challenge: Improve the analysis above (either prove that the bound is tight, or come up with a better bound).
Let us represent graphs as follows:
One hash table mapping node identifiers (strings) to node objects.
Each node object has two fields:
identifier: The node's identifier.
adjacent: An adjacency list, containing (pointers to) node objects.
Note that this representation ensures that one can find the successors of a node without having to look up strings.
Basic idea: Parse the file, create one node object per line, and then fill in the adjacency lists, so that the node objects point directly at each other.
Program:
l ← parse the file into a list of lists of strings, where
      * the outer list contains one inner list per line in the file, and
      * every inner list contains one node identifier (string) per
        word on the line, in the order in which the words occurred
h ← an empty hash table mapping strings (node identifiers) to node objects
for each inner list l′ of l do
  id ← the first element of l′
  n ← new node object: {identifier = id, adjacent = empty list}
  h.insert(id, n)
for each inner list l′ of l do
  id ← the first element of l′
  adj ← h.lookup(id).adjacent
  for each element id′ of l′, except for the first, do
    adj.insert(h.lookup(id′))
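For comparison, here is a Haskell sketch of the same construction (the names are mine, and a Map plays the role of the hash table). Thanks to laziness the two passes can be merged by "tying the knot": the map m is used in its own definition, so the adjacency lists end up containing the node objects themselves rather than strings:

import qualified Data.Map as Map

data Node = Node { identifier :: String, adjacent :: [Node] }

-- One inner list of words per line; empty lines are skipped.
-- Assumes that every referenced identifier has a line of its own.
buildGraph :: [[String]] -> Map.Map String Node
buildGraph lss = m
  where
    m = Map.fromList
          [ (i, Node i (map (m Map.!) is)) | (i : is) <- lss ]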
Add a new node to the graph, connected to all the other nodes. Return the result of running the Hamiltonian cycle algorithm on the extended graph.
Denote the node added to the input graph by v.
The algorithm is correct if we can prove that the input graph has a Hamiltonian path iff the extended graph has a Hamiltonian cycle.
The only if direction:
Assume that the input graph has a Hamiltonian path v₁, v₂, …, vₙ. Note that all these nodes are distinct from each other and from v. We get that the extended graph has a Hamiltonian cycle v, v₁, v₂, …, vₙ, v.
The if direction:
Assume that the extended graph has a Hamiltonian cycle. We can assume, without loss of generality, that the cycle starts and ends in v: v, v₁, v₂, …, vₙ, v. Here the nodes v₁, v₂, …, vₙ form a Hamiltonian path in the input graph.
If the input graph is G = (V, E), then it takes O(|V|) time units to add the extra node (if a suitable adjacency list representation is used). The linear Hamiltonian cycle algorithm is then applied to the larger graph; its running time is O((|V|+1)+(|E|+|V|)) = O(|V|+|E|). Hence the total running time is also O(|V|+|E|), provided that our assumption about the graph's representation is correct (perhaps the Hamiltonian cycle algorithm uses some non-standard graph representation).
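As an illustration, here is a hypothetical sketch of the node-adding step, for a representation where node i's neighbour list sits at index i (the exercise does not fix a representation). Only the outer spine and the head of each list are rebuilt, so it runs in O(|V|) time:

-- The new node gets index |V| and is made adjacent to every old node,
-- and every old node is made adjacent to it.
addUniversalNode :: [[Int]] -> [[Int]]
addUniversalNode adj = map (v :) adj ++ [[0 .. v - 1]]
  where
    v = length adj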
Assume that a graph G = (V, E) is represented using an array of length |V|, where position n contains the adjacency list for the node with node index n (n ∈ {0,…,|V|-1}), and the adjacency lists contain node indices.
We need to check if a given graph is reflexive, symmetric, and transitive. Checking for reflexivity and symmetry is easy:
reflexive(G):
  for n ∈ {0, …, number of nodes in G − 1} do
    if n does not occur in its own adjacency list then
      return false
  return true

symmetric(G):
  M ← adjacency matrix for G
  for m ∈ {0, …, number of nodes in G − 1} do
    for n ∈ {0, …, number of nodes in G − 1} do
      if M[m, n] ≠ M[n, m] then
        return false
  return true
(The code for symmetric can be optimised, for instance by only comparing M[m, n] with M[n, m] for m < n.)
A reflexive graph G = (V, E) is transitive iff all shortest paths between any two nodes contain at most one edge. Proof:
Only if: Assume that, for all u, v, w ∈ V, (u, v) ∈ E and (v, w) ∈ E implies (u, w) ∈ E. Take any path v₁, …, vₙ. If this path has length at least 2, then it can be shortened by replacing v₁ → v₂ → v₃ by v₁ → v₃. Hence all shortest paths have length 1 or less.
If: Assume that all shortest paths contain at most one edge, and assume that (u, v) ∈ E and (v, w) ∈ E. We know that there exists a path from u to w, so there must be a path from u to w of length at most one. If the length is one, then we have (u, w) ∈ E. If the length is zero, then u = w, and hence (u, w) ∈ E follows from reflexivity.
Hence we can check transitivity as follows:
// Precondition: G must be reflexive.
transitive(G):
  for every node v in G do
    p ← an array containing the lengths of the shortest paths from v
        to every node in G (using −1 to mark the absence of a path)
    if any element in p is ≥ 2 then
      return false
  return true
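Since the graph is unweighted, p can be computed using breadth-first search. A minimal Haskell sketch (my names; it returns a finite map with unreachable nodes absent instead of marked with −1, and the list-based queue is naive but adequate for a sketch):

import qualified Data.IntMap.Strict as IM

-- Shortest-path lengths (in edges) from s in an unweighted graph,
-- given as a map from node index to adjacency list.
bfs :: IM.IntMap [Int] -> Int -> IM.IntMap Int
bfs g s = go (IM.singleton s 0) [s]
  where
    go dist []          = dist
    go dist (u : queue) = go dist' (queue ++ new)
      where
        d     = dist IM.! u
        new   = [ v | v <- IM.findWithDefault [] u g
                    , not (v `IM.member` dist) ]
        dist' = foldr (\v -> IM.insert v (d + 1)) dist new

The check in the loop body then amounts to all (<= 1) (IM.elems (bfs g v)).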
Finally the pieces above can be combined as follows:
equivalence-relation(G):
  return (reflexive(G) and symmetric(G) and transitive(G))
Assuming a uniform cost model.
Assume that the input is G = (V, E). The different functions have the following time complexities:
reflexive: The for loop is iterated at most |V| times. The total time needed to scan the adjacency lists is O(|E|) time units. We get the time complexity O(|V|+|E|).
symmetric: Creating an adjacency matrix takes O(|V|²) time units (first allocating and initialising a |V|×|V| matrix, and then updating one matrix entry for every edge in the graph). The nested loops take O(|V|²) time units, for a total of O(|V|²).
transitive: The following is done at most |V| times: computing the shortest-path lengths p (using breadth-first search, which takes O(|V|+|E|) time units) and scanning p (O(|V|) time units). Hence the total time is O(|V|(|V|+|E|)).
We see that the time complexity is dominated by transitive. The total time required is O(|V|(|V|+|E|)).
Basic idea: Find a minimum spanning tree, return the weight of (one of) its most expensive edge(s).
This can be implemented by modifying Kruskal's algorithm so that the weight of the last edge added to the minimum spanning tree is returned.
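A Haskell sketch of the modified algorithm (my code, under the assumption that nodes are numbered 0 to n − 1 and n ≥ 2). For brevity the union-find structure is replaced by a naive list of components, which ruins the O(|E| log |V|) bound; a real implementation would use union by rank and path compression:

import Data.List (find, sortOn)

type Edge = (Int, Int, Int)  -- (weight, u, v)

-- The weight of the last edge added by Kruskal's algorithm, i.e. the
-- weight of (one of) the most expensive MST edge(s).
maxSpacing :: Int -> [Edge] -> Int
maxSpacing n edges = go [ [v] | v <- [0 .. n - 1] ] (sortOn weight edges) 0
  where
    weight (w, _, _) = w
    go [_]   _                lastW = lastW  -- spanning tree complete
    go _     []               lastW = lastW  -- only reached if G is disconnected
    go comps ((w, u, v) : es) lastW
      | cu == cv  = go comps es lastW            -- would form a cycle
      | otherwise = go ((cu ++ cv) : rest) es w  -- add edge, record its weight
      where
        Just cu = find (u `elem`) comps
        Just cv = find (v `elem`) comps
        rest    = filter (\c -> c /= cu && c /= cv) comps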
Assuming a uniform cost model.
This algorithm has the same time complexity as Kruskal's algorithm: O(|E|log|V|), if we assume that we can compare two weights in constant time.
The algorithm produces a result.
The graph is connected, so it has an MST. Furthermore the graph contains at least two nodes, so any MST contains at least one edge.
The returned value is the spacing of some partition.
Denote the computed MST by T, and (one of) its most expensive edge(s) by e. If we remove e from T, then we get two trees T₁ and T₂. Denote the nodes contained in these trees by V₁ and V₂, respectively. V₁ and V₂ partition the graph, and no edge connecting these sets is strictly cheaper than e (because T is an MST), so the weight of e is the spacing of (V₁, V₂).
The returned value is the largest possible spacing.
Assume that there is some other partition (V₁′, V₂′) which has a strictly larger spacing. This means that e is strictly cheaper than any of the (at least one) cheapest edges between V₁′ and V₂′. Let us denote one of these edges by e′, and the nodes that it connects by v₁′ and v₂′ (with v₁′ ∈ V₁′ and v₂′ ∈ V₂′).
We know that there is a simple path p in T which connects v₁′ and v₂′, because T is a spanning tree. Furthermore v₁′ ∈ V₁′ and v₂′ ∈ V₂′, so we know that at least one of the edges in p must go between V₁′ and V₂′. This edge is not more expensive than e, which in turn is strictly cheaper than e′. However, e′ was assumed to be a cheapest edge between V₁′ and V₂′, so we have found a contradiction.
You may want to try to generalise the algorithm to partitions with more than two components.
This exercise is related to cluster analysis; see for instance the book Algorithm Design by Kleinberg and Tardos. Some examples can be found online.
No solution given.
Perform a postorder traversal of the tree, swapping left and right subtrees.
We do a constant amount of work per tree node, so the algorithm is linear in the size of the tree.
We want to prove that the resulting tree is the reverse of the input. We do this by induction on the structure of the tree.
The empty tree case: The algorithm does nothing, which is fine, because an empty tree is its own reverse.
The node case: Assume that we start with a node ⟨l, x, r⟩, where l is the left subtree, x is the node element, and r is the right subtree. We know that all elements in l are smaller than x, and that x is smaller than all elements in r.
We recursively reverse l and r, ending up with l′ and r′, respectively. By the inductive hypothesis we know that l′ and r′ are reversals of l and r, respectively. The final tree is ⟨r′, x, l′⟩, which is sorted in reverse order because all elements in r′ are larger than x, all elements in l′ are smaller than x, and r′ and l′ are themselves sorted in reverse order.
Haskell code:
data BinaryTree a = Empty | Node (BinaryTree a) a (BinaryTree a)
reverseTree :: BinaryTree a -> BinaryTree a
reverseTree Empty = Empty
reverseTree (Node l x r) = Node (reverseTree r) x (reverseTree l)
The time complexity is as before (if we assume that strict evaluation is used).
We want to prove, for all (total, finite) trees t, that
toList (reverseTree t) = reverse (toList t),
where reverse reverses a list and toList converts a tree into a list:
toList :: BinaryTree a -> [a]
toList Empty = []
toList (Node l x r) = toList l ++ [x] ++ toList r
Proof by induction on t:
Case Empty:
toList (reverseTree Empty)
=
toList Empty
=
[]
=
reverse []
=
reverse (toList Empty)
Case Node l x r:
toList (reverseTree (Node l x r))
=
toList (Node (reverseTree r) x (reverseTree l))
=
toList (reverseTree r) ++ [x] ++ toList (reverseTree l)
= { Inductive hypothesis (twice). }
reverse (toList r) ++ [x] ++ reverse (toList l)
= { reverse (xs ++ ys) = reverse ys ++ reverse xs (twice), and reverse [x] = [x]. }
reverse (toList l ++ [x] ++ toList r)
=
reverse (toList (Node l x r))
We can store one boolean in the tree (not one boolean per tree node), which indicates whether the tree is reversed or not. Other operations (lookup, delete, etc.) need to consult this bit at most once per visited node, so their asymptotic worst-case time complexities do not change.
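A sketch of this idea in Haskell (hypothetical names, reusing the BinaryTree type defined earlier). Reversal itself becomes a constant-time operation, and lookup consults the flag once per visited node:

data RevTree a = RevTree { isReversed :: Bool, tree :: BinaryTree a }

-- O(1): just flip the flag.
reverseRT :: RevTree a -> RevTree a
reverseRT (RevTree b t) = RevTree (not b) t

member :: Ord a => a -> RevTree a -> Bool
member x (RevTree b t) = go t
  where
    go Empty = False
    go (Node l y r)
      | x == y       = True
      | (x < y) /= b = go l  -- the "smaller" side, given the flag
      | otherwise    = go r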
This solution may be hard to understand for those who are not familiar with functional programming.
Definition of binary trees:
data BinaryTree a = Empty | Node (BinaryTree a) a (BinaryTree a)
We need to check if a given binary tree is a search tree satisfying the AVL tree balancing invariant:
isAVL : BinaryTree a → Bool
isAVL t = searchTree t ∧ balanced t
Here I assume that the type a comes with a constant-time comparison operator
_<_ : a → a → Bool
which implements a strict total order.
We can check if a tree is a search tree by turning the tree into a list using an inorder traversal, and then checking if this list is strictly increasing:
searchTree : BinaryTree a → Bool
searchTree t = strictlyIncreasing (inorderList t)
inorderList : BinaryTree a → [a]
inorderList Empty = []
inorderList (Node l x r) = inorderList l ++ [x] ++ inorderList r
strictlyIncreasing : [a] → Bool
strictlyIncreasing [] = True
strictlyIncreasing [x] = True
strictlyIncreasing (x ∷ y ∷ xs) =
  x < y ∧ strictlyIncreasing (y ∷ xs)
The call inorderList t is linear in the size of t, assuming that an implementation of lists which supports constant-time concatenation (++) is used. Furthermore strictlyIncreasing xs is, in the worst case, linear in the length of xs (because x < y executes in constant time). Hence searchTree t is linear in the size of t (in the worst case).
When checking if the tree is balanced it seems reasonable to avoid recomputing heights. The following function accomplishes this by checking if a tree is balanced and computing its height, both in a single postorder traversal:
balancedAndHeight : BinaryTree a → Bool × ℤ
balancedAndHeight Empty = (True, -1)
balancedAndHeight (Node l x r) =
  (bl ∧ br ∧ |hl - hr| ≤ 1, 1 + max hl hr)
  where
    (bl, hl) = balancedAndHeight l
    (br, hr) = balancedAndHeight r
Because a single traversal is used and we do a constant amount of processing per tree node, we get that balancedAndHeight is linear in the size of the tree. We can wrap up by throwing away the computed height:
balanced : BinaryTree a → Bool
balanced t = b
  where (b, _) = balancedAndHeight t
The total time needed by isAVL t is linear in the size of t (in the worst case). Is this optimal? Yes: one cannot be sure that an arbitrary tree is an AVL tree without inspecting every node.
Let N(h) denote the smallest possible number of nodes in an AVL tree of height h. An AVL tree of a given height with as few nodes as possible must be maximally unbalanced, so we can define N recursively as follows (assuming that the height of an empty tree is -1):
N(-1) = 0
N(0) = 1
N(h+1) = 1 + N(h) + N(h-1), h ∈ ℕ
This recursive definition is well-founded, so it must have a unique solution.
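The recurrence is easy to transcribe into Haskell (my names), which also allows a quick numerical sanity check of the bound proved next:

-- Smallest number of nodes in an AVL tree of height h
-- (the height of an empty tree is -1).
minNodes :: Int -> Integer
minNodes (-1) = 0
minNodes 0    = 1
minNodes h    = 1 + minNodes (h - 1) + minNodes (h - 2)

-- Checks N(h) ≥ ϕ^h for small h.
checkBound :: Bool
checkBound = and [ fromIntegral (minNodes h) >= phi ^^ h | h <- [0 .. 25] ]
  where phi = (1 + sqrt 5) / 2 :: Double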
For all h ∈ ℕ we have N(h) ≥ ϕ^h, where ϕ is the golden ratio (1+√5)/2. Proof by induction on h:
Case 0: 1 ≥ 1 = ϕ^0.
Case 1: 2 = (1+√9)/2 ≥ (1+√5)/2 = ϕ^1.
Case h+2, h ∈ ℕ:
N(h+2)
=
1 + N(h+1) + N(h)
≥ { inductive hypothesis (twice) }
1 + ϕ^(h+1) + ϕ^h
=
1 + (1+ϕ)·ϕ^h
= { 1+ϕ = ϕ² }
1 + ϕ^(h+2)
≥
ϕ^(h+2).
Now let H(n) denote the largest possible height of an AVL tree with n nodes. For any AVL tree with n nodes and height h we have N(h) ≤ n (by the definition of N). In particular, this applies to a tree with n nodes and height H(n). If we assume that n ≥ 1, then we get
n
≥
N(H(n))
≥ { n ≥ 1 implies H(n) ≥ 0 }
ϕ^(H(n)).
Both sides of this inequality are positive, so we can apply the golden ratio logarithm (which is monotone) to both sides and get the following inequality:
∀n ≥ 1. H(n) ≤ log_ϕ n,
i.e. the height of an AVL tree with n nodes is O(log n) (note that log_ϕ n = log₂ n / log₂ ϕ ≈ 1.44 log₂ n).
I would expect you to find that the "efficient" implementation is much more efficient than the quadratic one for large input, and that the optimised library implementation is quite a bit faster than your own "efficient" implementation, but not wildly so.
As for testing, a good idea seems to be to test that the three implementations always give the same result (up to reordering of duplicates).
Two possible answers:
Radix sort. See the ordinary exam from 2010, exercise 5c. However, this answer is rather theoretical.
Quicksort or mergesort. LaMarca and Ladner (The Influence of Caches on the Performance of Sorting, Journal of Algorithms, 31(1):66-104, 1999) implement optimised variants of quicksort, heapsort, mergesort and radix sort, and run actual experiments. While radix sort is claimed to have the lowest instruction count, cache effects make it roughly 50% slower than quicksort or mergesort for sorting one million 64-bit integers on the given hardware. See the figures on page 23 (or page 98 of the journal version of the paper) for a quick summary.
If the input is likely to be partially sorted, then one can also consider using an adaptive sorting algorithm.
Assume that the input consists of n elements, out of which n/c have the same key k. Then quicksort using this partitioning strategy (and no optimisations, such as switching to other sorting algorithms) requires Ω(n²/c²) steps.
Proof: Whenever the input is partitioned, all elements with key k end up in the same subset (S₁ or S₂), except perhaps for one (the pivot). Quicksort does not terminate until every element has been selected as a pivot once, so the input must be partitioned at least n/c times. Since each partitioning step removes at most one element with key k (the pivot), the i-th such step, counting from the end, processes at least i elements. Hence computing these partitions takes at least
∑_{i=1}^{n/c} i = (n/c)(n/c + 1)/2 = Ω(n²/c²)
steps.