Lecture 5 - Notes

January 18, 2016

Today

  1. Finish comparison based sorting
  2. 2-3 and Red-Black Trees

Comparison Based Bounds

Problem Size Lower Bound Upper Bound Reason
max of $n$ $O(n-1)$ $O(n-1)$ See graph theory proof
merge $n+n$ $O(2n-1)$ $O(2n-1)$ See adversarial argument proof
search on ordered table $n$ $O(\lg n)$ $\lceil \lg n \rceil$ Information Theory
max & min of $n$ $\lceil \frac{3}{2}n -2 \rceil$ $\lceil \frac{3}{n} -2 \rceil$ Tricky adversarial argument
max & next of $n$ $n + \lceil \lg n \rceil - 1$ $n + \lceil \lg n \rceil - 1$ Tricky adversarial argument
sorting $n$ $O(n \ log n)$ $\lceil \lg n! \rceil$ Information theory

Note: I know that $O(2n-1) = O(n)$ but Ruskey insists on "more accurate numbers". So I've added the big-o's even though they aren't strictly necessary.

Max

Theorem: $n - 1$ is the best upper bound for finding the maximum in an unordered array.

Proof: Make a graph $G = (V,E)$,

Each time a comparison is made say $a_i \lt a_j$ add the edge $(i,j)$. If the algorithm is correct, then at the end $G$ must be connected. For $G$ to be connected it must have at least $\left|E\right| \ge n - 1$ edges.

Merge

Theorem: $2n - 1$ is the best upper bound for the merge operation.

Proof: Suppose the input is ordered so that,

If some comparison, either $a_i \lt b_i$ or $b_i \lt a_{i+1}$ is not made, then the algorithm will fail. This is called an adversarial argument.

Max & Min

If we use a tree to model the comparisons, we have $\frac{n}{2}$ comparisons on the first row. Since an extended binary tree has $N - 1$ internal nodes (where $N$ is the number of leaves), and our comparison tree has $\frac{n}{2}$ leaves

Max & Next Max

If we use a tree like before, the next maxes will all be those that lost to the first max. Since we have one on each level we have another "sub" comparison set.

Balanced Search Trees

Slides

2-3 Trees

definition: A 2-3 Tree is a search tree which allows,

  • 2-node: one key, two children or
  • 3-node: two keys, three children

and we maintain,

  • Symmetric order: Inorder traversal yields keys in ascending order
  • Perfect balance: Every path from root to null link has same length

Search work more-or-less as it would on a binary search tree working it's way down from the root.

Insertion

If we insert into a 2-node we can just make it a 3-node. If we insert into a 3-node,

  • Add new key to 3-node to create temporary 4-node
  • Move middle key in 4-node into parent
  • Repeat up the tree, as necessary
  • If you reach the root and it's a 4-node, split it into three 2-nodes

Because the rest of the tree isn't affected, splitting a 4-node is a local transformation and requires constant time.

Perfect Balance

Perfect balance means that every path from the root to a leaf is the same length. The height is therefore,

Tree Height Reason
Worst Case $\lg N$ All 2-nodes
Best Case $\log_3 N$ All 3-nodes

In practice this means between 12 and 20 for a million nodes and between 18 and 30 for a billion nodes. The bottom line is 2-3 trees provide guaranteed logarithmic performance.

Implementation

Directly implementing a 2-3 tree is complicated, because,

  • Maintaining multiple node types is cumbersome
  • Need multiple compares to move down tree
  • Need to move back up the tree to split 4-nodes
  • Large number of cases for splitting

for example,

public void put(Key key, Value val)
{
  Node x = root;
  while (x.getTheCorrectChild(key) != null)
  {
    x = x.getTheCorrectChildKey();
    if (x.is4Node()) x.split();
  }
  if (x.is2Node()) x.make3Node(key, val);
  else if (x.is3Node()) x.make4Node(key, val);
}