[algorithm] The best way to calculate the height in a binary search tree? (balancing an AVL-tree)

I'm looking for the best way to calculate a nodes balance in an AVL-tree. I thought I had it working, but after some heavy inserting/updating I can see that it's not working correct (at all).

This is kind of a two-part question, the first part would be how to calculate the height of a sub-tree, I know the definition "The height of a node is the length of the longest downward path to a leaf from that node." and I understand it, but I fail at implementing it. And to confuse me further this quote can be found on wikipedia on tree-heights "Conventionally, the value -1 corresponds to a subtree with no nodes, whereas zero corresponds to a subtree with one node."

And the second part is getting the balance factor of a sub-tree in an AVL tree, I've got no problem understanding the concept, "get the height of your L and R sub-trees and subtract R from L". And this is defined as something like this: BALANCE = NODE[L][HEIGHT] - NODE[R][HEIGT]

Reading on wikipedia says this on the first few lines describing insertions into an AVL tree: "If the balance factor becomes -1, 0, or 1 then the tree is still in AVL form, and no rotations are necessary."

It then goes on, saying this "If the balance factor becomes 2 or -2 then the tree rooted at this node is unbalanced, and a tree rotation is needed. At most a single or double rotation will be needed to balance the tree." - which I have no trouble grasping.

But (yes, there's always a but).

Here's where it gets confusing, the text states "If the balance factor of R is 1, it means the insertion occurred on the (external) right side of that node and a left rotation is needed". But from m understanding the text said (as I quoted) that if the balance factor was within [-1, 1] then there was no need for balancing?

I feel I'm so close to grasping the concept, I've gotten the tree rotations down, implemented a normal binary search tree, and on the brink of grasping AVL-trees but just seem to be missing that essential epiphany.

Edit: Code examples are preferred over academic formulas as I've always had an easier time grasping something in code, but any help is greatly appreciated.

Edit: I wish I could mark all answers as "accepted", but for me NIck's answer was the first that made me go "aha".

The answer is


Give BinaryTree<T, Comparator>::Node a subtreeHeight data member, initialized to 0 in its constructor, and update automatically every time with:

template <typename T, typename Comparator>
inline void BinaryTree<T, Comparator>::Node::setLeft (std::shared_ptr<Node>& node) {
    const std::size_t formerLeftSubtreeSize = left ? left->subtreeSize : 0;
    left = node;
    if (node) {
        node->parent = this->shared_from_this();
        subtreeSize++;
        node->depthFromRoot = depthFromRoot + 1;
        const std::size_t h = node->subtreeHeight;
        if (right)
            subtreeHeight = std::max (right->subtreeHeight, h) + 1;
        else
            subtreeHeight = h + 1;
    }
    else {
        subtreeSize -= formerLeftSubtreeSize;
        subtreeHeight = right ? right->subtreeHeight + 1 : 0;
    }
}

template <typename T, typename Comparator>
inline void BinaryTree<T, Comparator>::Node::setRight (std::shared_ptr<Node>& node) {
    const std::size_t formerRightSubtreeSize = right ? right->subtreeSize : 0;
    right = node;
    if (node) {
        node->parent = this->shared_from_this();
        subtreeSize++;
        node->depthFromRoot = depthFromRoot + 1;
        const std::size_t h = node->subtreeHeight;
        if (left)
            subtreeHeight = std::max (left->subtreeHeight, h) + 1;
        else
            subtreeHeight = h + 1;
    }
    else {
        subtreeSize -= formerRightSubtreeSize;
        subtreeHeight = left ? left->subtreeHeight + 1 : 0;
    }
}

Note that data members subtreeSize and depthFromRoot are also updated. These functions are called when inserting a node (all tested), e.g.

template <typename T, typename Comparator>
inline std::shared_ptr<typename BinaryTree<T, Comparator>::Node>
BinaryTree<T, Comparator>::Node::insert (BinaryTree& tree, const T& t, std::shared_ptr<Node>& node) {
    if (!node) {
        std::shared_ptr<Node> newNode = std::make_shared<Node>(tree, t);
        node = newNode;
        return newNode;
    }
    if (getComparator()(t, node->value)) {
        std::shared_ptr<Node> newLeft = insert(tree, t, node->left);
        node->setLeft(newLeft);
    }
    else {
        std::shared_ptr<Node> newRight = insert(tree, t, node->right);
        node->setRight(newRight);
    }
    return node;
}

If removing a node, use a different version of removeLeft and removeRight by replacing subtreeSize++; with subtreeSize--;. Algorithms for rotateLeft and rotateRight can be adapted without much problem either. The following was tested and passed:

template <typename T, typename Comparator>
void BinaryTree<T, Comparator>::rotateLeft (std::shared_ptr<Node>& node) {  // The root of the rotation is 'node', and its right child is the pivot of the rotation.  The pivot will rotate counter-clockwise and become the new parent of 'node'.
    std::shared_ptr<Node> pivot = node->right;
    pivot->subtreeSize = node->subtreeSize;
    pivot->depthFromRoot--;
    node->subtreeSize--;  // Since 'pivot' will no longer be in the subtree rooted at 'node'.
    const std::size_t a = pivot->left ? pivot->left->subtreeHeight + 1 : 0;  // Need to establish node->heightOfSubtree before pivot->heightOfSubtree is established, since pivot->heightOfSubtree depends on it.
    node->subtreeHeight = node->left ? std::max(a, node->left->subtreeHeight + 1) : std::max<std::size_t>(a,1);
    if (pivot->right) {
        node->subtreeSize -= pivot->right->subtreeSize;  // The subtree rooted at 'node' loses the subtree rooted at pivot->right.
        pivot->subtreeHeight = std::max (pivot->right->subtreeHeight, node->subtreeHeight) + 1;
    }
    else
        pivot->subtreeHeight = node->subtreeHeight + 1;
    node->depthFromRoot++;
    decreaseDepthFromRoot(pivot->right);  // Recursive call for the entire subtree rooted at pivot->right.
    increaseDepthFromRoot(node->left);  // Recursive call for the entire subtree rooted at node->left.
    pivot->parent = node->parent;
    if (pivot->parent) {  // pivot's new parent will be its former grandparent, which is not nullptr, so the grandparent must be updated with a new left or right child (depending on whether 'node' was its left or right child).
        if (pivot->parent->left == node)
            pivot->parent->left = pivot;
        else
            pivot->parent->right = pivot;
    }
    node->setRightSimple(pivot->left);  // Since pivot->left->value is less than pivot->value but greater than node->value.  We use the NoSizeAdjustment version because the 'subtreeSize' values of 'node' and 'pivot' are correct already.
    pivot->setLeftSimple(node);
    if (node == root) {
        root = pivot;
        root->parent = nullptr; 
    }
}

where

inline void decreaseDepthFromRoot (std::shared_ptr<Node>& node) {adjustDepthFromRoot(node, -1);}
inline void increaseDepthFromRoot (std::shared_ptr<Node>& node) {adjustDepthFromRoot(node, 1);}

template <typename T, typename Comparator>
inline void BinaryTree<T, Comparator>::adjustDepthFromRoot (std::shared_ptr<Node>& node, int adjustment) {
    if (!node)
        return;
    node->depthFromRoot += adjustment;
    adjustDepthFromRoot (node->left, adjustment);
    adjustDepthFromRoot (node->right, adjustment);
}

Here is the entire code: http://ideone.com/d6arrv


Here's an alternate way of finding height. Add an additional attribute to your node called height:

class Node
{
data value; //data is a custom data type
node right;
node left;
int height;
}

Now, we'll do a simple breadth-first traversal of the tree, and keep updating the height value for each node:

int height (Node root)
{
Queue<Node> q = Queue<Node>();
Node lastnode;
//reset height
root.height = 0;

q.Enqueue(root);
while(q.Count > 0)
{
   lastnode = q.Dequeue();
   if (lastnode.left != null){
      lastnode.left.height = lastnode.height + 1; 
      q.Enqueue(lastnode.left);
   }

   if (lastnode.right != null){
      lastnode.right.height = lastnode.height + 1;
      q.Enqueue(lastnode.right);
   }
}
return lastnode.height; //this will return a 0-based height, so just a root has a height of 0
}

Cheers,


Here's where it gets confusing, the text states "If the balance factor of R is 1, it means the insertion occurred on the (external) right side of that node and a left rotation is needed". But from m understanding the text said (as I quoted) that if the balance factor was within [-1, 1] then there was no need for balancing?

R is the right-hand child of the current node N.

If balance(N) = +2, then you need a rotation of some sort. But which rotation to use? Well, it depends on balance(R): if balance(R) = +1 then you need a left-rotation on N; but if balance(R) = -1 then you will need a double-rotation of some sort.


Well, you can compute the height of a tree with the following recursive function:

int height(struct tree *t) {
    if (t == NULL)
        return 0;
    else
        return max(height(t->left), height(t->right)) + 1;
}

with an appropriate definition of max() and struct tree. You should take the time to figure out why this corresponds to the definition based on path-length that you quote. This function uses zero as the height of the empty tree.

However, for something like an AVL tree, I don't think you actually compute the height each time you need it. Instead, each tree node is augmented with a extra field that remembers the height of the subtree rooted at that node. This field has to be kept up-to-date as the tree is modified by insertions and deletions.

I suspect that, if you compute the height each time instead of caching it within the tree like suggested above, that the AVL tree shape will be correct, but it won't have the expected logarithmic performance.


Here's where it gets confusing, the text states "If the balance factor of R is 1, it means the insertion occurred on the (external) right side of that node and a left rotation is needed". But from m understanding the text said (as I quoted) that if the balance factor was within [-1, 1] then there was no need for balancing?

Okay, epiphany time.

Consider what a rotation does. Let's think about a left rotation.

 P = parent
 O = ourself (the element we're rotating)
 RC = right child
 LC = left child (of the right child, not of ourself)

 P              10
  \               \
   O               15
    \                \
     RC               20
    /                /
   LC               18

          ?
 P              10
   \               \
     RC              20
   /               /
  O              15
   \               \
   LC              18

 basically, what happens is;

 1. our right child moves into our position
 2. we become the left child of our right child
 3. our right child's left child becomes our right

Now, the big thing you have to notice here - this left rotation HAS NOT CHANGED THE DEPTH OF THE TREE. We're no more balanced for having done it.

But - and here's the magic in AVL - if we rotated the right child to the right FIRST, what we'd have is this...

 P
  \
   O
    \
     LC
      \
       RC

And NOW if we rotate O left, what we get is this...

 P
   \
     LC
    /  \
   O   RC

Magic! we've managed to get rid of a level of the tree - we've made the tree balanced.

Balancing the tree means getting rid of excess depth, and packing the upper levels more completely - which is exactly what we've just done.

That whole stuff about single/double rotations is simply that you have to have your subtree looking like this;

 P
  \
   O
    \
     LC
      \
       RC

before you rotate - and you may have to do a right rotate to get into that state. But if you're already in that state, you only need to do the left rotate.


  • Height is easily implemented by recursion, take the maximum of the height of the subtrees plus one.

  • The "balance factor of R" refers to the right subtree of the tree which is out of balance, I suppose.


You do not need to calculate tree depths on the fly.

You can maintain them as you perform operations.

Furthermore, you don't actually in fact have to maintain track of depths; you can simply keep track of the difference between the left and right tree depths.

http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_avl.aspx

Just keeping track of the balance factor (difference between left and right subtrees) is I found easier from a programming POV, except that sorting out the balance factor after a rotation is a PITA...


This BFS-like solution is pretty straightforward. Simply jumps levels one-by-one.

def getHeight(self,root, method='links'):
    c_node = root
    cur_lvl_nodes = [root]
    nxt_lvl_nodes = []
    height = {'links': -1, 'nodes': 0}[method]

    while(cur_lvl_nodes or nxt_lvl_nodes):
        for c_node in cur_lvl_nodes:
            for n_node in filter(lambda x: x is not None, [c_node.left, c_node.right]):
                nxt_lvl_nodes.append(n_node)

        cur_lvl_nodes = nxt_lvl_nodes
        nxt_lvl_nodes = []
        height += 1

    return height

Examples related to algorithm

How can I tell if an algorithm is efficient? Find the smallest positive integer that does not occur in a given sequence Efficiently getting all divisors of a given number Peak signal detection in realtime timeseries data What is the optimal algorithm for the game 2048? How can I sort a std::map first by value, then by key? Finding square root without using sqrt function? Fastest way to flatten / un-flatten nested JSON objects Mergesort with Python Find common substring between two strings

Examples related to data-structures

Program to find largest and second largest number in array golang why don't we have a set datastructure How to initialize a vector with fixed length in R C compiling - "undefined reference to"? List of all unique characters in a string? Binary Search Tree - Java Implementation How to clone object in C++ ? Or Is there another solution? How to check queue length in Python Difference between "Complete binary tree", "strict binary tree","full binary Tree"? Write code to convert given number into words (eg 1234 as input should output one thousand two hundred and thirty four)

Examples related to binary-tree

Difference between "Complete binary tree", "strict binary tree","full binary Tree"? Difference between binary tree and binary search tree Heap vs Binary Search Tree (BST) C linked list inserting node at the end How to print binary tree diagram? With ' N ' no of nodes, how many different Binary and Binary Search Trees possible? How to implement a binary tree? Find kth smallest element in a binary search tree in Optimum way What are the applications of binary trees? How to find the lowest common ancestor of two nodes in any binary tree?

Examples related to avl-tree

The best way to calculate the height in a binary search tree? (balancing an AVL-tree)

Examples related to tree-balancing

The best way to calculate the height in a binary search tree? (balancing an AVL-tree)