I have chosen to give a functional solution. Given the explanatory text it should be easy to implement this solution in an imperative language.
Definition of binary trees:
data BinaryTree a = Empty | Node (BinaryTree a) a (BinaryTree a)
We need to check if a given binary tree is a search tree satisfying the AVL tree balancing invariant:
isAVL : BinaryTree a → Bool
isAVL t = searchTree t ∧ balanced t
Here I assume that the type a
comes with a constant-time comparison operator
_<_ : a → a → Bool
which implements a strict total order.
We can check if a tree is a search tree by turning the tree into a list using an inorder traversal, and then checking if this list is strictly increasing:
searchTree : BinaryTree a → Bool
searchTree t = strictlyIncreasing (inorderList t)
inorderList : BinaryTree a → [a]
inorderList Empty = []
inorderList (Node l x r) = inorderList l ++ [x] ++ inorderList r
strictlyIncreasing : [a] → Bool
strictlyIncreasing [] = True
strictlyIncreasing [x] = True
strictlyIncreasing (x ∷ y ∷ xs) =
x < y ∧ strictlyIncreasing (y ∷ xs)
The call inorderList t
is linear in the size of t
, assuming that an implementation of lists which supports constant-time concatenation (++
) is used.1 Furthermore strictlyIncreasing xs
is, in the worst case, linear in the length of xs
(because x < y
executes in constant time). Hence searchTree t
is linear in the size of t
(in the worst case).
When checking if the tree is balanced it seems reasonable to avoid recomputing heights. The following function accomplishes this by checking if a tree is balanced and computing its height, both in a single postorder traversal:
balancedAndHeight : BinaryTree a → Bool × ℤ
balancedAndHeight Empty = (True, -1)
balancedAndHeight (Node l x r) =
(bl ∧ br ∧ |hl - hr| ≤ 1, 1 + max hl hr)
where
(bl, hl) = balancedAndHeight l
(br, hr) = balancedAndHeight r
Because a single traversal is used and we do a constant amount of processing per tree node2 we get that balancedAndHeight
is linear in the size of the tree. We can wrap up by throwing away the computed height:
balanced : BinaryTree a → Bool
balanced t = b
where (b, _) = balancedAndHeight t
The total time needed by isAVL t
is linear in t
(in the worst case). Is this optimal? Yes, one cannot be sure that an arbitrary tree is an AVL tree without inspecting every node.
Let N(h) denote the smallest possible number of nodes in an AVL tree of height h. An AVL tree of a given height with as few nodes as possible must be maximally unbalanced, so we can define N recursively as follows (assuming that the height of an empty tree is -1):
N(-1) = 0
N(0) = 1
N(h+1) = 1 + N(h) + N(h-1), h ∈ ℕ
This recursive definition is well-founded, so it must have a unique solution.
For all h ∈ ℕ we have N(h) ≥ ϕh, where ϕ is the golden ratio (1+√5)/2. Proof by induction on h:
Case 0: 1 ≥ 1 = ϕ0.
Case 1: 2 = (1+√9)/2 ≥ ϕ = ϕ1.
Case h+2, h ∈ ℕ:
N(h+2)
=
1 + N(h+1) + N(h)
≥ { inductive hypothesis (twice) }
1 + ϕh + 1 + ϕh
=
1 + (1+ϕ)·ϕh
= { 1+ϕ = ϕ² }
1 + ϕh + 2
≥
ϕh + 2.
Now let H(n) denote the largest possible height of an AVL tree with n nodes. For any AVL tree with n nodes and height h we have N(h) ≤ n (by the definition of N). In particular, this applies to a tree with n nodes and height H(n). If we assume that n ≥ 1, then we get
n
≥
N(H(n))
≥ { n ≥ 1 implies H(n) ≥ 0 }
ϕH(n).
Both sides of this inequality are positive, so we can apply the golden ratio logarithm (which is monotone) to both sides and get the following inequality:
∀n ≥ 1. H(n) ≤ logϕ n,
i.e. the height of an AVL tree with n nodes is O(log n).