Are duplicate keys allowed in the definition of binary search trees

Question

I m trying to find the definition of a binary search tree and I keep finding different definitions everywhere    Some say that for any given subtree the left child key is less than or equal to the root   Some say that for any given subtree the right child key is greater than or equal to the root   And my old college data structures book says  every element has a key and no two elements have the same key    Is there a universal definition of a bst  Particularly in regards to what to do with trees with multiple instances of the same key   EDIT  Maybe I was unclear  the definitions I m seeing are  1  left  lt   root  lt  right  2  left  lt  root  lt   right  3  left  lt  root  lt  right  such that no duplicate keys exist

User · Answer

Those three things you said are all true.

Keys are unique
To the left are keys less than this one
To the right are keys greater than this one

I suppose you could reverse your tree and put the smaller keys on the right, but really the "left" and "right" concept is just that: a visual concept to help us think about a data structure which doesn't really have a left or right, so it doesn't really matter.

User · Answer

In a BST  all values descending on the left side of a node are less than  or equal to  see later  the node itself   Similarly  all values descending on the right side of a node are greater than  or equal to  that node value a   Some BSTs may choose to allow duplicate values  hence the  quot or equal to quot  qualifiers above  The following example may clarify       14            13    22             1    16    29                        28    29  This shows a BST that allows duplicates b  - you can see that to find a value  you start at the root node and go down the left or right subtree depending on whether your search value is less than or greater than the node value  This can be done recursively with something like  def hasVal  node  srchval       if node    NULL           return false     if node val    srchval          return true     if node val  gt  srchval          return hasVal  node left  srchval      return hasVal  node right  srchval   and calling it with  foundIt   hasVal  rootNode  valToLookFor   Duplicates add a little complexity since you may need to keep searching once you ve found your value  for other nodes of the same value  Obviously that doesn t matter for hasVal since it doesn t matter how many there are  just whether at least one exists  It will however matter for things like countVal  since it needs to know how many there are    a  You could actually sort them in the opposite direction should you so wish provided you adjust how you search for a specific key  A BST need only maintain some sorted order  whether that s ascending or descending  or even some weird multi-layer-sort method like all odd numbers ascending  then all even numbers descending  is not relevant    b  Interestingly  if your sorting key uses the entire value stored at a node  so that nodes containing the same key have no other extra information to distinguish them   there can be performance gains from adding a count to each node  rather than allowing duplicate nodes  The main benefit is that adding or removing a duplicate will simply modify the count rather than inserting or deleting a new node  an action that may require re-balancing the tree   So  to add an item  you first check if it already exists  If so  just increment the count and exit  If not  you need to insert a new node with a count of one then rebalance  To remove an item  you find it then decrement the count - only if the resultant count is zero do you then remove the actual node from the tree and rebalance  Searches are also quicker given there are fewer nodes but that may not be a large impact  For example  the following two trees  non-counting on the left  and counting on the right  would be equivalent  in the counting tree  i c means c copies of item i          14                         22 2                                                 14        22             7 1            29 1                                                 1    14  22    29      1 1     14 3  28 1      30 1                      7         28    30  Removing the leaf-node 22 from the left tree would involve rebalancing  since it now has a height differential of two  the resulting 22-29-28-30 subtree such as below  this is one option  there are others that also satisfy the  quot height differential must be zero or one quot  rule                             22                     29                                 29      -- gt       28    30                        28    30         22  Doing the same operation on the right tree is a simple modification of the root node from 22 2 to 22 1  with no rebalancing required

User · Answer

1   left  lt   root  lt  right 2   left  lt  root  lt   right 3   left  lt  root  lt  right  such that no duplicate keys exist   I might have to go and dig out my algorithm books  but off the top of my head  3  is the canonical form   1  or  2  only come about when you start to allow duplicates nodes and you put duplicate nodes in the tree itself  rather than the node containing a list

User · Answer

In the book  Introduction to algorithms   third edition  by Cormen  Leiserson  Rivest and Stein  a binary search tree  BST  is explicitly defined as allowing duplicates  This can be seen in figure 12 1 and the following  page 287        The keys in a binary search tree are always stored in such a way as to satisfy the binary-search-tree property  Let x be a node in a binary search tree  If y is a node in the left subtree of x  then y key  lt   x key  If y is a node in the right subtree of x  then y key  gt   x key     In addition  a red-black tree is then defined on page 308 as       A red-black tree is a binary search tree with one extra bit of storage per node  its color    Therefore  red-black trees defined in this book support duplicates

User · Answer

If your binary search tree is a red black tree, or you intend to any kind of "tree rotation" operations, duplicate nodes will cause problems. Imagine your tree rule is this:

left < root <= right

Now imagine a simple tree whose root is 5, left child is nil, and right child is 5. If you do a left rotation on the root you end up with a 5 in the left child and a 5 in the root with the right child being nil. Now something in the left tree is equal to the root, but your rule above assumed left < root.

I spent hours trying to figure out why my red/black trees would occasionally traverse out of order, the problem was what I described above. Hopefully somebody reads this and saves themselves hours of debugging in the future!

User · Answer

Working on a red-black tree implementation I was getting problems validating the tree with multiple keys until I realized that with the red-black insert rotation, you have to loosen the constraint to

left <= root <= right

Since none of the documentation I was looking at allowed for duplicate keys and I didn't want to rewrite the rotation methods to account for it, I just decided to modify my nodes to allow for multiple values within the node, and no duplicate keys in the tree.

User · Answer

Any definition is valid   As long as you are consistent in your implementation  always put equal nodes to the right  always put them to the left  or never allow them  then you re fine   I think it is most common to not allow them  but it is still a BST if they are allowed and place either left or right

User · Answer

If your binary search tree is a red black tree, or you intend to any kind of "tree rotation" operations, duplicate nodes will cause problems. Imagine your tree rule is this:

left < root <= right

Now imagine a simple tree whose root is 5, left child is nil, and right child is 5. If you do a left rotation on the root you end up with a 5 in the left child and a 5 in the root with the right child being nil. Now something in the left tree is equal to the root, but your rule above assumed left < root.

I spent hours trying to figure out why my red/black trees would occasionally traverse out of order, the problem was what I described above. Hopefully somebody reads this and saves themselves hours of debugging in the future!

User · Answer

The elements ordering relation  lt   is a total order so the relation must be reflexive but commonly a binary search tree  aka BST  is a tree without duplicates    Otherwise if there are duplicates you need run twice or more the same function of deletion

User · Answer

Those three things you said are all true.

Keys are unique
To the left are keys less than this one
To the right are keys greater than this one

I suppose you could reverse your tree and put the smaller keys on the right, but really the "left" and "right" concept is just that: a visual concept to help us think about a data structure which doesn't really have a left or right, so it doesn't really matter.

User · Answer

I just want to add some more information to what @Robert Paulson answered.

Let's assume that node contains key & data. So nodes with the same key might contain different data.
(So the search must find all nodes with the same key)

left <= cur < right

left < cur <= right

left <= cur <= right

left < cur < right && cur contain sibling nodes with the same key.

left < cur < right, such that no duplicate keys exist.

1 & 2. works fine if the tree does not have any rotation-related functions to prevent skewness.
But this form doesn't work with AVL tree or Red-Black tree, because rotation will break the principal.
And even if search() finds the node with the key, it must traverse down to the leaf node for the nodes with duplicate key.
Making time complexity for search = theta(logN)

3. will work well with any form of BST with rotation-related functions.
But the search will take O(n), ruining the purpose of using BST.
Say we have the tree as below, with 3) principal.

         12
       /    \
     10     20
    /  \    /
   9   11  12 
      /      \
    10       12

If we do search(12) on this tree, even tho we found 12 at the root, we must keep search both left & right child to seek for the duplicate key.
This takes O(n) time as I've told.

4. is my personal favorite. Let's say sibling means the node with the same key.
We can change above tree into below.

         12 - 12 - 12
       /    \
10 - 10     20
    /  \
   9   11

Now any search will take O(logN) because we don't have to traverse children for the duplicate key.
And this principal also works well with AVL or RB tree.

User · Answer

In a BST  all values descending on the left side of a node are less than  or equal to  see later  the node itself   Similarly  all values descending on the right side of a node are greater than  or equal to  that node value a   Some BSTs may choose to allow duplicate values  hence the  quot or equal to quot  qualifiers above  The following example may clarify       14            13    22             1    16    29                        28    29  This shows a BST that allows duplicates b  - you can see that to find a value  you start at the root node and go down the left or right subtree depending on whether your search value is less than or greater than the node value  This can be done recursively with something like  def hasVal  node  srchval       if node    NULL           return false     if node val    srchval          return true     if node val  gt  srchval          return hasVal  node left  srchval      return hasVal  node right  srchval   and calling it with  foundIt   hasVal  rootNode  valToLookFor   Duplicates add a little complexity since you may need to keep searching once you ve found your value  for other nodes of the same value  Obviously that doesn t matter for hasVal since it doesn t matter how many there are  just whether at least one exists  It will however matter for things like countVal  since it needs to know how many there are    a  You could actually sort them in the opposite direction should you so wish provided you adjust how you search for a specific key  A BST need only maintain some sorted order  whether that s ascending or descending  or even some weird multi-layer-sort method like all odd numbers ascending  then all even numbers descending  is not relevant    b  Interestingly  if your sorting key uses the entire value stored at a node  so that nodes containing the same key have no other extra information to distinguish them   there can be performance gains from adding a count to each node  rather than allowing duplicate nodes  The main benefit is that adding or removing a duplicate will simply modify the count rather than inserting or deleting a new node  an action that may require re-balancing the tree   So  to add an item  you first check if it already exists  If so  just increment the count and exit  If not  you need to insert a new node with a count of one then rebalance  To remove an item  you find it then decrement the count - only if the resultant count is zero do you then remove the actual node from the tree and rebalance  Searches are also quicker given there are fewer nodes but that may not be a large impact  For example  the following two trees  non-counting on the left  and counting on the right  would be equivalent  in the counting tree  i c means c copies of item i          14                         22 2                                                 14        22             7 1            29 1                                                 1    14  22    29      1 1     14 3  28 1      30 1                      7         28    30  Removing the leaf-node 22 from the left tree would involve rebalancing  since it now has a height differential of two  the resulting 22-29-28-30 subtree such as below  this is one option  there are others that also satisfy the  quot height differential must be zero or one quot  rule                             22                     29                                 29      -- gt       28    30                        28    30         22  Doing the same operation on the right tree is a simple modification of the root node from 22 2 to 22 1  with no rebalancing required

User · Answer

In a BST  all values descending on the left side of a node are less than  or equal to  see later  the node itself   Similarly  all values descending on the right side of a node are greater than  or equal to  that node value a   Some BSTs may choose to allow duplicate values  hence the  quot or equal to quot  qualifiers above  The following example may clarify       14            13    22             1    16    29                        28    29  This shows a BST that allows duplicates b  - you can see that to find a value  you start at the root node and go down the left or right subtree depending on whether your search value is less than or greater than the node value  This can be done recursively with something like  def hasVal  node  srchval       if node    NULL           return false     if node val    srchval          return true     if node val  gt  srchval          return hasVal  node left  srchval      return hasVal  node right  srchval   and calling it with  foundIt   hasVal  rootNode  valToLookFor   Duplicates add a little complexity since you may need to keep searching once you ve found your value  for other nodes of the same value  Obviously that doesn t matter for hasVal since it doesn t matter how many there are  just whether at least one exists  It will however matter for things like countVal  since it needs to know how many there are    a  You could actually sort them in the opposite direction should you so wish provided you adjust how you search for a specific key  A BST need only maintain some sorted order  whether that s ascending or descending  or even some weird multi-layer-sort method like all odd numbers ascending  then all even numbers descending  is not relevant    b  Interestingly  if your sorting key uses the entire value stored at a node  so that nodes containing the same key have no other extra information to distinguish them   there can be performance gains from adding a count to each node  rather than allowing duplicate nodes  The main benefit is that adding or removing a duplicate will simply modify the count rather than inserting or deleting a new node  an action that may require re-balancing the tree   So  to add an item  you first check if it already exists  If so  just increment the count and exit  If not  you need to insert a new node with a count of one then rebalance  To remove an item  you find it then decrement the count - only if the resultant count is zero do you then remove the actual node from the tree and rebalance  Searches are also quicker given there are fewer nodes but that may not be a large impact  For example  the following two trees  non-counting on the left  and counting on the right  would be equivalent  in the counting tree  i c means c copies of item i          14                         22 2                                                 14        22             7 1            29 1                                                 1    14  22    29      1 1     14 3  28 1      30 1                      7         28    30  Removing the leaf-node 22 from the left tree would involve rebalancing  since it now has a height differential of two  the resulting 22-29-28-30 subtree such as below  this is one option  there are others that also satisfy the  quot height differential must be zero or one quot  rule                             22                     29                                 29      -- gt       28    30                        28    30         22  Doing the same operation on the right tree is a simple modification of the root node from 22 2 to 22 1  with no rebalancing required

User · Answer

Those three things you said are all true.

Keys are unique
To the left are keys less than this one
To the right are keys greater than this one

I suppose you could reverse your tree and put the smaller keys on the right, but really the "left" and "right" concept is just that: a visual concept to help us think about a data structure which doesn't really have a left or right, so it doesn't really matter.

User · Answer

In the book  Introduction to algorithms   third edition  by Cormen  Leiserson  Rivest and Stein  a binary search tree  BST  is explicitly defined as allowing duplicates  This can be seen in figure 12 1 and the following  page 287        The keys in a binary search tree are always stored in such a way as to satisfy the binary-search-tree property  Let x be a node in a binary search tree  If y is a node in the left subtree of x  then y key  lt   x key  If y is a node in the right subtree of x  then y key  gt   x key     In addition  a red-black tree is then defined on page 308 as       A red-black tree is a binary search tree with one extra bit of storage per node  its color    Therefore  red-black trees defined in this book support duplicates

User · Answer

Any definition is valid   As long as you are consistent in your implementation  always put equal nodes to the right  always put them to the left  or never allow them  then you re fine   I think it is most common to not allow them  but it is still a BST if they are allowed and place either left or right

User · Answer

1   left  lt   root  lt  right 2   left  lt  root  lt   right 3   left  lt  root  lt  right  such that no duplicate keys exist   I might have to go and dig out my algorithm books  but off the top of my head  3  is the canonical form   1  or  2  only come about when you start to allow duplicates nodes and you put duplicate nodes in the tree itself  rather than the node containing a list

User · Answer

Many algorithms will specify that duplicates are excluded. For example, the example algorithms in the MIT Algorithms book usually present examples without duplicates. It is fairly trivial to implement duplicates (either as a list at the node, or in one particular direction.)

Most (that I've seen) specify left children as <= and right children as >. Practically speaking, a BST which allows either of the right or left children to be equal to the root node, will require extra computational steps to finish a search where duplicate nodes are allowed.

It is best to utilize a list at the node to store duplicates, as inserting an '=' value to one side of a node requires rewriting the tree on that side to place the node as the child, or the node is placed as a grand-child, at some point below, which eliminates some of the search efficiency.

You have to remember, most of the classroom examples are simplified to portray and deliver the concept. They aren't worth squat in many real-world situations. But the statement, "every element has a key and no two elements have the same key", is not violated by the use of a list at the element node.

So go with what your data structures book said!

Edit:

Universal Definition of a Binary Search Tree involves storing and search for a key based on traversing a data structure in one of two directions. In the pragmatic sense, that means if the value is <>, you traverse the data structure in one of two 'directions'. So, in that sense, duplicate values don't make any sense at all.

This is different from BSP, or binary search partition, but not all that different. The algorithm to search has one of two directions for 'travel', or it is done (successfully or not.) So I apologize that my original answer didn't address the concept of a 'universal definition', as duplicates are really a distinct topic (something you deal with after a successful search, not as part of the binary search.)

User · Answer

Working on a red-black tree implementation I was getting problems validating the tree with multiple keys until I realized that with the red-black insert rotation, you have to loosen the constraint to

left <= root <= right

Since none of the documentation I was looking at allowed for duplicate keys and I didn't want to rewrite the rotation methods to account for it, I just decided to modify my nodes to allow for multiple values within the node, and no duplicate keys in the tree.

User · Answer

Many algorithms will specify that duplicates are excluded. For example, the example algorithms in the MIT Algorithms book usually present examples without duplicates. It is fairly trivial to implement duplicates (either as a list at the node, or in one particular direction.)

Most (that I've seen) specify left children as <= and right children as >. Practically speaking, a BST which allows either of the right or left children to be equal to the root node, will require extra computational steps to finish a search where duplicate nodes are allowed.

It is best to utilize a list at the node to store duplicates, as inserting an '=' value to one side of a node requires rewriting the tree on that side to place the node as the child, or the node is placed as a grand-child, at some point below, which eliminates some of the search efficiency.

You have to remember, most of the classroom examples are simplified to portray and deliver the concept. They aren't worth squat in many real-world situations. But the statement, "every element has a key and no two elements have the same key", is not violated by the use of a list at the element node.

So go with what your data structures book said!

Edit:

Universal Definition of a Binary Search Tree involves storing and search for a key based on traversing a data structure in one of two directions. In the pragmatic sense, that means if the value is <>, you traverse the data structure in one of two 'directions'. So, in that sense, duplicate values don't make any sense at all.

This is different from BSP, or binary search partition, but not all that different. The algorithm to search has one of two directions for 'travel', or it is done (successfully or not.) So I apologize that my original answer didn't address the concept of a 'universal definition', as duplicates are really a distinct topic (something you deal with after a successful search, not as part of the binary search.)

User · Answer

1   left  lt   root  lt  right 2   left  lt  root  lt   right 3   left  lt  root  lt  right  such that no duplicate keys exist   I might have to go and dig out my algorithm books  but off the top of my head  3  is the canonical form   1  or  2  only come about when you start to allow duplicates nodes and you put duplicate nodes in the tree itself  rather than the node containing a list

User · Answer

Any definition is valid   As long as you are consistent in your implementation  always put equal nodes to the right  always put them to the left  or never allow them  then you re fine   I think it is most common to not allow them  but it is still a BST if they are allowed and place either left or right

User · Answer

Many algorithms will specify that duplicates are excluded. For example, the example algorithms in the MIT Algorithms book usually present examples without duplicates. It is fairly trivial to implement duplicates (either as a list at the node, or in one particular direction.)

Most (that I've seen) specify left children as <= and right children as >. Practically speaking, a BST which allows either of the right or left children to be equal to the root node, will require extra computational steps to finish a search where duplicate nodes are allowed.

It is best to utilize a list at the node to store duplicates, as inserting an '=' value to one side of a node requires rewriting the tree on that side to place the node as the child, or the node is placed as a grand-child, at some point below, which eliminates some of the search efficiency.

You have to remember, most of the classroom examples are simplified to portray and deliver the concept. They aren't worth squat in many real-world situations. But the statement, "every element has a key and no two elements have the same key", is not violated by the use of a list at the element node.

So go with what your data structures book said!

Edit:

Universal Definition of a Binary Search Tree involves storing and search for a key based on traversing a data structure in one of two directions. In the pragmatic sense, that means if the value is <>, you traverse the data structure in one of two 'directions'. So, in that sense, duplicate values don't make any sense at all.

This is different from BSP, or binary search partition, but not all that different. The algorithm to search has one of two directions for 'travel', or it is done (successfully or not.) So I apologize that my original answer didn't address the concept of a 'universal definition', as duplicates are really a distinct topic (something you deal with after a successful search, not as part of the binary search.)

User · Answer

If your binary search tree is a red black tree, or you intend to any kind of "tree rotation" operations, duplicate nodes will cause problems. Imagine your tree rule is this:

left < root <= right

Now imagine a simple tree whose root is 5, left child is nil, and right child is 5. If you do a left rotation on the root you end up with a 5 in the left child and a 5 in the root with the right child being nil. Now something in the left tree is equal to the root, but your rule above assumed left < root.

I spent hours trying to figure out why my red/black trees would occasionally traverse out of order, the problem was what I described above. Hopefully somebody reads this and saves themselves hours of debugging in the future!

User · Answer

In a BST  all values descending on the left side of a node are less than  or equal to  see later  the node itself   Similarly  all values descending on the right side of a node are greater than  or equal to  that node value a   Some BSTs may choose to allow duplicate values  hence the  quot or equal to quot  qualifiers above  The following example may clarify       14            13    22             1    16    29                        28    29  This shows a BST that allows duplicates b  - you can see that to find a value  you start at the root node and go down the left or right subtree depending on whether your search value is less than or greater than the node value  This can be done recursively with something like  def hasVal  node  srchval       if node    NULL           return false     if node val    srchval          return true     if node val  gt  srchval          return hasVal  node left  srchval      return hasVal  node right  srchval   and calling it with  foundIt   hasVal  rootNode  valToLookFor   Duplicates add a little complexity since you may need to keep searching once you ve found your value  for other nodes of the same value  Obviously that doesn t matter for hasVal since it doesn t matter how many there are  just whether at least one exists  It will however matter for things like countVal  since it needs to know how many there are    a  You could actually sort them in the opposite direction should you so wish provided you adjust how you search for a specific key  A BST need only maintain some sorted order  whether that s ascending or descending  or even some weird multi-layer-sort method like all odd numbers ascending  then all even numbers descending  is not relevant    b  Interestingly  if your sorting key uses the entire value stored at a node  so that nodes containing the same key have no other extra information to distinguish them   there can be performance gains from adding a count to each node  rather than allowing duplicate nodes  The main benefit is that adding or removing a duplicate will simply modify the count rather than inserting or deleting a new node  an action that may require re-balancing the tree   So  to add an item  you first check if it already exists  If so  just increment the count and exit  If not  you need to insert a new node with a count of one then rebalance  To remove an item  you find it then decrement the count - only if the resultant count is zero do you then remove the actual node from the tree and rebalance  Searches are also quicker given there are fewer nodes but that may not be a large impact  For example  the following two trees  non-counting on the left  and counting on the right  would be equivalent  in the counting tree  i c means c copies of item i          14                         22 2                                                 14        22             7 1            29 1                                                 1    14  22    29      1 1     14 3  28 1      30 1                      7         28    30  Removing the leaf-node 22 from the left tree would involve rebalancing  since it now has a height differential of two  the resulting 22-29-28-30 subtree such as below  this is one option  there are others that also satisfy the  quot height differential must be zero or one quot  rule                             22                     29                                 29      -- gt       28    30                        28    30         22  Doing the same operation on the right tree is a simple modification of the root node from 22 2 to 22 1  with no rebalancing required

User · Answer

1   left  lt   root  lt  right 2   left  lt  root  lt   right 3   left  lt  root  lt  right  such that no duplicate keys exist   I might have to go and dig out my algorithm books  but off the top of my head  3  is the canonical form   1  or  2  only come about when you start to allow duplicates nodes and you put duplicate nodes in the tree itself  rather than the node containing a list

User · Answer

The elements ordering relation  lt   is a total order so the relation must be reflexive but commonly a binary search tree  aka BST  is a tree without duplicates    Otherwise if there are duplicates you need run twice or more the same function of deletion

User · Answer

All three definitions are acceptable and correct. They define different variations of a BST.

Your college data structure's book failed to clarify that its definition was not the only possible.

Certainly, allowing duplicates adds complexity. If you use the definition "left <= root < right" and you have a tree like:

      3
    /   \
  2       4

then adding a "3" duplicate key to this tree will result in:

Note that the duplicates are not in contiguous levels.

This is a big issue when allowing duplicates in a BST representation as the one above: duplicates may be separated by any number of levels, so checking for duplicate's existence is not that simple as just checking for immediate childs of a node.

An option to avoid this issue is to not represent duplicates structurally (as separate nodes) but instead use a counter that counts the number of occurrences of the key. The previous example would then have a tree like:

      3(1)
    /     \
  2(1)     4(1)

and after insertion of the duplicate "3" key it will become:

      3(2)
    /     \
  2(1)     4(1)

This simplifies lookup, removal and insertion operations, at the expense of some extra bytes and counter operations.

User · Answer

I just want to add some more information to what @Robert Paulson answered.

Let's assume that node contains key & data. So nodes with the same key might contain different data.
(So the search must find all nodes with the same key)

left <= cur < right

left < cur <= right

left <= cur <= right

left < cur < right && cur contain sibling nodes with the same key.

left < cur < right, such that no duplicate keys exist.

1 & 2. works fine if the tree does not have any rotation-related functions to prevent skewness.
But this form doesn't work with AVL tree or Red-Black tree, because rotation will break the principal.
And even if search() finds the node with the key, it must traverse down to the leaf node for the nodes with duplicate key.
Making time complexity for search = theta(logN)

3. will work well with any form of BST with rotation-related functions.
But the search will take O(n), ruining the purpose of using BST.
Say we have the tree as below, with 3) principal.

         12
       /    \
     10     20
    /  \    /
   9   11  12 
      /      \
    10       12

If we do search(12) on this tree, even tho we found 12 at the root, we must keep search both left & right child to seek for the duplicate key.
This takes O(n) time as I've told.

4. is my personal favorite. Let's say sibling means the node with the same key.
We can change above tree into below.

         12 - 12 - 12
       /    \
10 - 10     20
    /  \
   9   11

Now any search will take O(logN) because we don't have to traverse children for the duplicate key.
And this principal also works well with AVL or RB tree.

User · Answer

Those three things you said are all true.

Keys are unique
To the left are keys less than this one
To the right are keys greater than this one

I suppose you could reverse your tree and put the smaller keys on the right, but really the "left" and "right" concept is just that: a visual concept to help us think about a data structure which doesn't really have a left or right, so it doesn't really matter.

User · Answer

Many algorithms will specify that duplicates are excluded. For example, the example algorithms in the MIT Algorithms book usually present examples without duplicates. It is fairly trivial to implement duplicates (either as a list at the node, or in one particular direction.)

Most (that I've seen) specify left children as <= and right children as >. Practically speaking, a BST which allows either of the right or left children to be equal to the root node, will require extra computational steps to finish a search where duplicate nodes are allowed.

It is best to utilize a list at the node to store duplicates, as inserting an '=' value to one side of a node requires rewriting the tree on that side to place the node as the child, or the node is placed as a grand-child, at some point below, which eliminates some of the search efficiency.

You have to remember, most of the classroom examples are simplified to portray and deliver the concept. They aren't worth squat in many real-world situations. But the statement, "every element has a key and no two elements have the same key", is not violated by the use of a list at the element node.

So go with what your data structures book said!

Edit:

Universal Definition of a Binary Search Tree involves storing and search for a key based on traversing a data structure in one of two directions. In the pragmatic sense, that means if the value is <>, you traverse the data structure in one of two 'directions'. So, in that sense, duplicate values don't make any sense at all.

This is different from BSP, or binary search partition, but not all that different. The algorithm to search has one of two directions for 'travel', or it is done (successfully or not.) So I apologize that my original answer didn't address the concept of a 'universal definition', as duplicates are really a distinct topic (something you deal with after a successful search, not as part of the binary search.)

User · Answer

Any definition is valid   As long as you are consistent in your implementation  always put equal nodes to the right  always put them to the left  or never allow them  then you re fine   I think it is most common to not allow them  but it is still a BST if they are allowed and place either left or right

User · Answer

Duplicate Keys     What happens if there s more than one data item with the same key      This presents a slight problem in red-black trees      It s important that nodes with the same key are distributed on both sides of other nodes with the same key      That is  if keys arrive in the order 50  50  50      you want the second 50 to go to the right of the first one  and the third 50 to go to the left of the first one      Otherwise  the tree becomes unbalanced      This could be handled by some kind of randomizing process in the insertion algorithm      However  the search process then becomes more complicated if all items with the same key must be found      It s simpler to outlaw items with the same key      In this discussion we ll assume duplicates aren t allowed  One can create a linked list for each node of the tree that contains duplicate keys and store data in the list

User · Answer

Duplicate Keys     What happens if there s more than one data item with the same key      This presents a slight problem in red-black trees      It s important that nodes with the same key are distributed on both sides of other nodes with the same key      That is  if keys arrive in the order 50  50  50      you want the second 50 to go to the right of the first one  and the third 50 to go to the left of the first one      Otherwise  the tree becomes unbalanced      This could be handled by some kind of randomizing process in the insertion algorithm      However  the search process then becomes more complicated if all items with the same key must be found      It s simpler to outlaw items with the same key      In this discussion we ll assume duplicates aren t allowed  One can create a linked list for each node of the tree that contains duplicate keys and store data in the list

User · Answer

If your binary search tree is a red black tree, or you intend to any kind of "tree rotation" operations, duplicate nodes will cause problems. Imagine your tree rule is this:

left < root <= right

Now imagine a simple tree whose root is 5, left child is nil, and right child is 5. If you do a left rotation on the root you end up with a 5 in the left child and a 5 in the root with the right child being nil. Now something in the left tree is equal to the root, but your rule above assumed left < root.

I spent hours trying to figure out why my red/black trees would occasionally traverse out of order, the problem was what I described above. Hopefully somebody reads this and saves themselves hours of debugging in the future!

User · Answer

All three definitions are acceptable and correct. They define different variations of a BST.

Your college data structure's book failed to clarify that its definition was not the only possible.

Certainly, allowing duplicates adds complexity. If you use the definition "left <= root < right" and you have a tree like:

      3
    /   \
  2       4

then adding a "3" duplicate key to this tree will result in:

Note that the duplicates are not in contiguous levels.

This is a big issue when allowing duplicates in a BST representation as the one above: duplicates may be separated by any number of levels, so checking for duplicate's existence is not that simple as just checking for immediate childs of a node.

An option to avoid this issue is to not represent duplicates structurally (as separate nodes) but instead use a counter that counts the number of occurrences of the key. The previous example would then have a tree like:

      3(1)
    /     \
  2(1)     4(1)

and after insertion of the duplicate "3" key it will become:

      3(2)
    /     \
  2(1)     4(1)

This simplifies lookup, removal and insertion operations, at the expense of some extra bytes and counter operations.

[data-structures] Are duplicate keys allowed in the definition of binary search trees?

The answer is

Examples related to data-structures

Examples related to computer-science

Examples related to binary-tree

Tags