Hashset vs Treeset

Question

I ve always loved trees  that nice O n log n   and the tidiness of them  However  every software engineer I ve ever known has asked me pointedly why I would use a TreeSet  From a CS background  I don t think it matters all that much which you use  and I don t care to mess around with hash functions and buckets  in the case of Java    In which cases should I use a HashSet over a TreeSet

User · Answer

The reason why most use HashSet is that the operations are  on average  O 1  instead of O log n    If the set contains standard items you will not be  messing around with hash functions  as that has been done for you   If the set contains custom classes  you have to implement hashCode to use HashSet  although Effective Java shows how   but if you use a TreeSet you have to make it Comparable or supply a Comparator  This can be a problem if the class does not have a particular order   I have sometimes used TreeSet  or actually TreeMap  for very small sets maps   lt  10 items  although I have not checked to see if there is any real gain in doing so   For large sets the difference can be considerable     Now if you need the sorted  then TreeSet is appropriate  although even then if updates are frequent and the need for a sorted result is infrequent  sometimes copying the contents to a list or an array and sorting them can be faster

User · Answer

Why have apples when you can have oranges   Seriously guys and gals - if your collection is large  read and written to gazillions of times  and you re paying for CPU cycles  then the choice of the collection is relevant ONLY if you NEED it to perform better    However  in most cases  this doesn t really matter - a few milliseconds here and there go unnoticed in human terms   If it really mattered that much  why aren t you writing code in assembler or C   cue another discussion     So the point is if you re happy using whatever collection you chose  and it solves your problem  even if it s not specifically the best type of collection for the task  knock yourself out    The software is malleable    Optimise your code where necessary     Uncle Bob says Premature Optimisation is the root of all evil  Uncle Bob says so

User · Answer

If you aren t inserting enough elements to result in frequent rehashings  or collisions  if your HashSet can t resize   a HashSet certainly gives you the benefit of constant time access  But on sets with lots of growth or shrinkage  you may actually get better performance with Treesets  depending on the implementation   Amortized time can be close to O 1  with a functional red-black tree  if memory serves me  Okasaki s book would have a better explanation than I can pull off   Or see his publication list

User · Answer

1 HashSet allows null object   2 TreeSet will not allow null object  If you try to add null value it will throw a NullPointerException   3 HashSet is much faster than TreeSet   e g    TreeSet lt String gt  ts   new TreeSet lt String gt      ts add null      throws NullPointerException   HashSet lt String gt  hs   new HashSet lt String gt      hs add null      runs fine

User · Answer

import java util HashSet  import java util Set  import java util TreeSet   public class HashTreeSetCompare          It is generally faster to add elements to the HashSet and then       convert the collection to a TreeSet for a duplicate-free sorted       Traversal         really       O Hash   tree set   gt  O tree set         Really     Why         public static void main String args               int size   80000          useHashThenTreeSet size           useTreeSetOnly size               private static void useTreeSetOnly int size             System out println  useTreeSetOnly              long start   System currentTimeMillis            Set lt String gt  sortedSet   new TreeSet lt String gt              for  int i   0  i  lt  size  i                  sortedSet add i                             System out println sortedSet           long end   System currentTimeMillis             System out println  useTreeSetOnly       end - start               private static void useHashThenTreeSet int size             System out println  useHashThenTreeSet              long start   System currentTimeMillis            Set lt String gt  set   new HashSet lt String gt              for  int i   0  i  lt  size  i                  set add i                           Set lt String gt  sortedSet   new TreeSet lt String gt  set             System out println sortedSet           long end   System currentTimeMillis             System out println  useHashThenTreeSet       end - start

User · Answer

Even after 11 years  nobody thought of mentioning a very important difference  Do you think that if HashSet equals TreeSet then the opposite is true as well   Take a look at this code  TreeSet lt String gt  treeSet   new TreeSet lt  gt  String CASE INSENSITIVE ORDER   HashSet lt String gt  hashSet   new HashSet lt  gt     treeSet add  quot a quot    hashSet add  quot A quot    System out println hashSet equals treeSet    System out println treeSet equals hashSet     Try to guess the output and then hover below snippet for seeing what the real output is  Ready  Here you go    false   true  That s right  they don t hold equivalence relation for a comparator that is inconsistent with equals  The reason for this is that a TreeSet uses a comparator to determine the equivalence while HashSet uses equals  Internally they use HashMap and TreeMap so you should expect this behavior with the mentioned Maps as well  Originally answered

User · Answer

HashSet implementations are  of course  much much faster -- less overhead because there s no ordering  A good analysis of the various Set implementations in Java is provided at http   java sun com docs books tutorial collections implementations set html    The discussion there also points out an interesting  middle ground  approach to the Tree vs Hash question  Java provides a LinkedHashSet  which is a HashSet with an  insertion-oriented  linked list running through it  that is  the last element in the linked list is also the most recently inserted into the Hash  This allows you to avoid the unruliness of an unordered hash without incurring the increased cost of a TreeSet

User · Answer

Message Edit   complete rewrite   When order does not matter  that s when  Both should give Log n  - it would be of utility to see if either is over five percent faster than the other  HashSet can give O 1  testing in a loop should reveal whether it is

User · Answer

The TreeSet is one of two sorted collections  the other being TreeMap   It uses a Red-Black tree structure  but you knew that   and guarantees that the elements will be in ascending order  according to natural order  Optionally  you can construct a TreeSet with a constructor that lets you give the collection your own rules for what the order should be  rather than relying on the ordering defined by the elements  class  by using a Comparable or Comparator  and A LinkedHashSet is an ordered version of HashSet that maintains a doubly-linked List across all elements  Use this class instead of HashSet when you care about the iteration order  When you iterate through a HashSet the order is unpredictable  while a LinkedHashSet lets you iterate through the elements in the order in which they were inserted

User · Answer

Basing on lovely visual answer on Maps by  shevchyk here is my take    ------------------------------------------------------------------------------       Property            HashSet               TreeSet             LinkedHashSet        -------------- --------------------- ------------------- ---------------------                       no guarantee order    sorted according                                 Order         will remain constant   to the natural          insertion-order                             over time            ordering                                   -------------- --------------------- ------------------- ---------------------      Add remove             O 1                 O log n                 O 1               -------------- --------------------- ------------------- ---------------------                                               NavigableSet                                  Interfaces             Set                  Set                    Set                                                         SortedSet                                  -------------- --------------------- ------------------- ---------------------                                                not allowed                                  Null values          allowed          1st element only          allowed                                                         in Java 7                                 -------------- ---------------------------------------------------------------                        Fail-fast behavior of an iterator cannot be guaranteed              Fail-fast     impossible to make any hard guarantees in the presence of             behavior                unsynchronized concurrent modification                   -------------- ---------------------------------------------------------------           Is                                                                             synchronized                 implementation is not synchronized                   ------------------------------------------------------------------------------

User · Answer

HashSet is O 1  to access elements  so it certainly does matter  But maintaining order of the objects in the set isn t possible   TreeSet is useful if maintaining an order In terms of values and not the insertion order  matters to you   But  as you ve noted  you re trading order for slower time to access an element  O log n  for basic operations   From the javadocs for TreeSet      This implementation provides guaranteed log n  time cost for the basic operations  add  remove and contains

User · Answer

HashSet is much faster than TreeSet  constant-time versus log-time for most operations like add  remove and contains  but offers no ordering guarantees like TreeSet   HashSet   the class offers constant time performance for the basic operations  add  remove  contains and size    it does not guarantee that the order of elements will remain constant over time iteration performance depends on the initial capacity and the load factor of the HashSet    It s quite safe to accept default load factor but you may want to specify an initial capacity that s about twice the size to which you expect the set to grow     TreeSet   guarantees log n  time cost for the basic operations  add  remove and contains  guarantees that elements of set will be sorted  ascending  natural  or the one specified by you via its constructor   implements SortedSet  doesn t offer any tuning parameters for iteration performance offers a few handy methods to deal with the ordered set like first    last    headSet    and tailSet   etc   Important points    Both guarantee duplicate-free collection of elements It is generally faster to add elements to the HashSet and then convert the collection to a TreeSet for a duplicate-free sorted traversal  None of these implementations are synchronized  That is if multiple threads access a set concurrently  and at least one of the threads modifies the set  it must be synchronized externally  LinkedHashSet is in some sense intermediate between HashSet and TreeSet  Implemented as a hash table with a linked list running through it  however it provides insertion-ordered iteration which is not same as sorted traversal guaranteed by TreeSet    So a choice of usage depends entirely on your needs but I feel that even if you need an ordered collection then you should still prefer HashSet to create the Set and then convert it into TreeSet    e g  SortedSet lt String gt  s   new TreeSet lt String gt  hashSet

User · Answer

One advantage not yet mentioned of a TreeSet is that its has greater  locality   which is shorthand for saying  1  if two entries are nearby in the order  a TreeSet places them near each other in the data structure  and hence in memory  and  2  this placement takes advantage of the principle of locality  which says that similar data is often accessed by an application with similar frequency   This is in contrast to a HashSet  which spreads the entries all over memory  no matter what their keys are   When the latency cost of reading from a hard drive is thousands of times the cost of reading from cache or RAM  and when the data really is accessed with locality  the TreeSet can be a much better choice

User · Answer

A lot of answers have been given  based on technical considerations  especially around performance  According to me  choice between TreeSet and HashSet matters   But I would rather say the choice should be driven by conceptual considerations first  If  for the objects your need to manipulate  a natural ordering does not make sense  then do not use TreeSet   It is a sorted set  since it implements SortedSet  So it means you need to override function compareTo  which should be consistent with what returns function equals  For example if you have a set of objects of a class called Student  then I do not think a TreeSet would make sense  since there is no natural ordering between students  You can order them by their average grade  okay  but this is not a  natural ordering   Function compareTo would return 0 not only when two objects represent the same student  but also when two different students have the same grade  For the second case  equals would return false  unless you decide to make the latter return true when two different students have the same grade  which would make equals function have a misleading meaning  not to say a wrong meaning   Please note this consistency between equals and compareTo is optional  but strongly recommended  Otherwise the contract of interface Set is broken  making your code misleading to other people  thus also possibly leading to unexpected behavior   This  link might be a good source of information regarding this question

[java] Hashset vs Treeset

Examples related to java

Examples related to hashset

Examples related to treeset