Consistency of hashCode on a Java string

Question

The hashCode value of a Java String is computed as  String hashCode      s 0  31  n-1    s 1  31  n-2          s n-1    Are there any circumstances  say JVM version  vendor  etc   under which the following expression will evaluate to false   boolean expression    This is a Java string  hashCode      586653468   Update  1  If you claim that the answer is  yes  there are such circumstances  - then please give a concrete example of when  This is a Java string  hashCode      586653468  Try to be as specific concrete as possible   Update  2  We all know that relying on the implementation details of hashCode   is bad in general  However  I m talking specifically about String hashCode   - so please keep the answer focused to String hashCode    Object hashCode   is totally irrelevant in the context of this question

User · Answer

As said above  in general you should not rely on the hash code of a class remaining the same  Note that even subsequent runs of the same application on the same VM may produce different hash values  AFAIK the Sun JVM s hash function calculates the same hash on every run  but that s not guaranteed   Note that this is not theoretical  The hash function for java lang String was changed in JDK1 2  the old hash had problems with hierarchical strings like URLs or file names  as it tended to produce the same hash for strings which only differed at the end    java lang String is a special case  as the algorithm of its hashCode   is  now  documented  so you can probably rely on that  I d still consider it bad practice  If you need a hash algorithm with special  documented properties  just write one  -

User · Answer

You should not rely on a hash code being equal to a specific value  Just that it will return consistent results within the same execution   The API docs say the following       The general contract of hashCode is          Whenever it is invoked on the same object more than once during an execution of a Java application  the hashCode method must consistently return the same integer  provided no information used in equals comparisons on the object is modified  This integer need not remain consistent from one execution of an application to another execution of the same application       EDIT Since the javadoc for String hashCode   specifies how a String s hash code is computed  any violation of this would violate the public API specification

User · Answer

I can see that documentation as far back as Java 1 2   While it s true that in general you shouldn t rely on a hash code implementation remaining the same  it s now documented behaviour for java lang String  so changing it would count as breaking existing contracts   Wherever possible  you shouldn t rely on hash codes staying the same across versions etc - but in my mind java lang String is a special case simply because the algorithm has been specified    so long as you re willing to abandon compatibility with releases before the algorithm was specified  of course

User · Answer

Another     issue to worry about is the possible change of implementation between early late versions of Java  I don t believe the implementation details are set in stone  and so potentially an upgrade to a future Java version could cause problems   Bottom line is  I wouldn t rely on the implementation of hashCode     Perhaps you can highlight what problem you re actually trying to solve by using this mechanism  and that will highlight a more suitable approach

User · Answer

Just to answer your question and not to continue any discussions  The Apache Harmony JDK implementation seems to use a different algorithm  at least it looks totally different   Sun JDK  public int hashCode         int h   hash      if  h    0            int off   offset          char val     value          int len   count           for  int i   0  i  lt  len  i                  h   31 h   val off                       hash   h            return h      Apache Harmony  public int hashCode         if  hashCode    0            int hash   0  multiplier   1          for  int i   offset   count - 1  i  gt   offset  i--                hash    value i    multiplier              int shifted   multiplier  lt  lt  5              multiplier   shifted - multiplier                    hashCode   hash            return hashCode      Feel free to check it yourself

User · Answer

I found something about JDK 1 0 and 1 1 and    1 2      In JDK 1 0 x and 1 1 x the hashCode   function for long Strings worked by   sampling every nth character  This   pretty well guaranteed you would have   many Strings hashing to the same   value  thus slowing down Hashtable   lookup  In JDK 1 2 the function has   been improved to multiply the result   so far by 31 then add the next   character in sequence  This is a   little slower  but is much better at   avoiding collisions    Source  http   mindprod com jgloss hashcode html   Something different  because you seem to need a number  How about using CRC32 or MD5 instead of hashcode and you are good to go - no discussions and no worries at all

User · Answer

If you re worried about changes and possibly incompatibly VMs  just copy the existing hashcode implementation into your own utility class  and use that to generate your hashcodes

User · Answer

The hashcode will be calculated based on the ASCII values of the characters in the String   This is the implementation in the String Class is as follows  public int hashCode         int h   hash      if  h    0  amp  amp  value length  gt  0            hash   h   isLatin1     StringLatin1 hashCode value                                  StringUTF16 hashCode value             return h      Collisions in hashcode are unavoidable  For example  the strings  Ea  and  FB  give the same hashcode as 2236

[java] Consistency of hashCode() on a Java string

Examples related to java

Examples related to string

Examples related to hashcode