Analysis of Binary Search Tree
For analysis of time complexity for operations on BST we explore the following questions.
- What could be a worst-case scenario?, and
- What could be a best-case scenario?
Search is the critical first step any operation including both insertion and deletion on a binary search tree. A successful or unsuccessful search depends on the length of the tree path from the root to the farthest leaf node. A right-skewed or a left-skewed BSTs exhibit pathological cases for insertions and deletions. These cases occur as follows:
- Perform an ascending sequence of n insertions on an initially empty BST it generates a right-skewed tree.
- Perform a descending sequence of n insertions on an initially empty BST it generates a left-skewed tree.
A completely skewed tree requires a time of O(n) for any operation. On the other hand, an operation on a balanced binary search tree takes O(log n). Therefore,
- A right-skewed or a left-skewed BST provides the worst-case scenario, and
- A balanced BST gives a best-case scenario for BSTs.
When we have a random input source, all distributions of the values are equally likely. So, it is important to analyze the average case time complexity. The average case analysis of BST depends on the average path length traversed on a BST for performing an operation. We begin with an example. Consider the tree shown below. The total internal path length of the tree is 15.
The first problem we need to address is:
- How to compute the internal path length of a BST in a dynamic situation where an operation can choose to traverse any section of a BST?
We capture such a dynamic situation by considering the possibilities of insertions and deletions happening in any possible order. We use the following assumptions:
- P(n): the average path length of a node in a BST with n nodes
- Initial values are: P(0) = 0 and P(1)=1.
- x: is the root of the BST or the first node to be inserted
- x: may equally likely to be 1st, 2nd, 3rd or nth element in the input
Now consider ith insertion for a fixed i for i = 0, 1, 2, …, n-1. The configuration of BST at this point is illustrated in the figure below:
Consider how the subsequent insertion occurs.
- The root node will be probed in any case.
- If left subtree of the root is probed for i+1 insertion then path length is 1+P(i)
- If right subtree of the root is probed for i+1 insertion then path length is 1+P(n-1-i)
- The probability of searching any element is 1/n.
So the average number of probes for i+1the insertion would be:
The average value is obtained by letting i to be any of the n elements. So, the average value is given by the expression:
In the above equation, we use the expression found earlier to replace P(n,i). Since the pair of sums for P(i) and P(n,i) are symmetric, the simplified expression for P(n) becomes:
Now apply induction to prove . The induction
requires a base case, an induction hypothesis and a proof for the induction step.
Base case: We know P(1) = 1, and evaluation of the expression for n=1 gives us:
Induction hypothesis: Assume
for any i less than n.
Proof of induction step: With the induction hypothesis in mind we apply to terms of the sum in the expression for P(n) and simplify it further to obtain:
Consider the sum on the right-hand side of previous inequality and apply simplification:
Replacing the sum by the final expression, we finally get
Therefore, the average case complexity of an operation on BST is of the order of log n