[c#] Why is it important to override GetHashCode when Equals method is overridden?

Given the following class

public class Foo
{
    public int FooId { get; set; }
    public string FooName { get; set; }

    public override bool Equals(object obj)
    {
        Foo fooItem = obj as Foo;

        if (fooItem == null) 
        {
           return false;
        }

        return fooItem.FooId == this.FooId;
    }

    public override int GetHashCode()
    {
        // Which is preferred?

        return base.GetHashCode();

        //return this.FooId.GetHashCode();
    }
}

I have overridden the Equals method because Foo represent a row for the Foos table. Which is the preferred method for overriding the GetHashCode?

Why is it important to override GetHashCode?

This question is related to c# overriding hashcode

The answer is


It's actually very hard to implement GetHashCode() correctly because, in addition to the rules Marc already mentioned, the hash code should not change during the lifetime of an object. Therefore the fields which are used to calculate the hash code must be immutable.

I finally found a solution to this problem when I was working with NHibernate. My approach is to calculate the hash code from the ID of the object. The ID can only be set though the constructor so if you want to change the ID, which is very unlikely, you have to create a new object which has a new ID and therefore a new hash code. This approach works best with GUIDs because you can provide a parameterless constructor which randomly generates an ID.


You should always guarantee that if two objects are equal, as defined by Equals(), they should return the same hash code. As some of the other comments state, in theory this is not mandatory if the object will never be used in a hash based container like HashSet or Dictionary. I would advice you to always follow this rule though. The reason is simply because it is way too easy for someone to change a collection from one type to another with the good intention of actually improving the performance or just conveying the code semantics in a better way.

For example, suppose we keep some objects in a List. Sometime later someone actually realizes that a HashSet is a much better alternative because of the better search characteristics for example. This is when we can get into trouble. List would internally use the default equality comparer for the type which means Equals in your case while HashSet makes use of GetHashCode(). If the two behave differently, so will your program. And bear in mind that such issues are not the easiest to troubleshoot.

I've summarized this behavior with some other GetHashCode() pitfalls in a blog post where you can find further examples and explanations.


By overriding Equals you're basically stating that you are the one who knows better how to compare two instances of a given type, so you're likely to be the best candidate to provide the best hash code.

This is an example of how ReSharper writes a GetHashCode() function for you:

public override int GetHashCode()
{
    unchecked
    {
        var result = 0;
        result = (result * 397) ^ m_someVar1;
        result = (result * 397) ^ m_someVar2;
        result = (result * 397) ^ m_someVar3;
        result = (result * 397) ^ m_someVar4;
        return result;
    }
}

As you can see it just tries to guess a good hash code based on all the fields in the class, but since you know your object's domain or value ranges you could still provide a better one.


We have two problems to cope with.

  1. You cannot provide a sensible GetHashCode() if any field in the object can be changed. Also often a object will NEVER be used in a collection that depends on GetHashCode(). So the cost of implementing GetHashCode() is often not worth it, or it is not possible.

  2. If someone puts your object in a collection that calls GetHashCode() and you have overrided Equals() without also making GetHashCode() behave in a correct way, that person may spend days tracking down the problem.

Therefore by default I do.

public class Foo
{
    public int FooId { get; set; }
    public string FooName { get; set; }

    public override bool Equals(object obj)
    {
        Foo fooItem = obj as Foo;

        if (fooItem == null)
        {
           return false;
        }

        return fooItem.FooId == this.FooId;
    }

    public override int GetHashCode()
    {
        // Some comment to explain if there is a real problem with providing GetHashCode() 
        // or if I just don't see a need for it for the given class
        throw new Exception("Sorry I don't know what GetHashCode should do for this class");
    }
}

Below using reflection seems to me a better option considering public properties as with this you don't have have to worry about addition / removal of properties (although not so common scenario). This I found to be performing better also.(Compared time using Diagonistics stop watch).

    public int getHashCode()
    {
        PropertyInfo[] theProperties = this.GetType().GetProperties();
        int hash = 31;
        foreach (PropertyInfo info in theProperties)
        {
            if (info != null)
            {
                var value = info.GetValue(this,null);
                if(value != null)
                unchecked
                {
                    hash = 29 * hash ^ value.GetHashCode();
                }
            }
        }
        return hash;  
    }

It's not necessarily important; it depends on the size of your collections and your performance requirements and whether your class will be used in a library where you may not know the performance requirements. I frequently know my collection sizes are not very large and my time is more valuable than a few microseconds of performance gained by creating a perfect hash code; so (to get rid of the annoying warning by the compiler) I simply use:

   public override int GetHashCode()
   {
      return base.GetHashCode();
   }

(Of course I could use a #pragma to turn off the warning as well but I prefer this way.)

When you are in the position that you do need the performance than all of the issues mentioned by others here apply, of course. Most important - otherwise you will get wrong results when retrieving items from a hash set or dictionary: the hash code must not vary with the life time of an object (more accurately, during the time whenever the hash code is needed, such as while being a key in a dictionary): for example, the following is wrong as Value is public and so can be changed externally to the class during the life time of the instance, so you must not use it as the basis for the hash code:


   class A
   {
      public int Value;

      public override int GetHashCode()
      {
         return Value.GetHashCode(); //WRONG! Value is not constant during the instance's life time
      }
   }    

On the other hand, if Value can't be changed it's ok to use:


   class A
   {
      public readonly int Value;

      public override int GetHashCode()
      {
         return Value.GetHashCode(); //OK  Value is read-only and can't be changed during the instance's life time
      }
   }


Just to add on above answers:

If you don't override Equals then the default behavior is that references of the objects are compared. The same applies to hashcode - the default implmentation is typically based on a memory address of the reference. Because you did override Equals it means the correct behavior is to compare whatever you implemented on Equals and not the references, so you should do the same for the hashcode.

Clients of your class will expect the hashcode to have similar logic to the equals method, for example linq methods which use a IEqualityComparer first compare the hashcodes and only if they're equal they'll compare the Equals() method which might be more expensive to run, if we didn't implement hashcode, equal object will probably have different hashcodes (because they have different memory address) and will be determined wrongly as not equal (Equals() won't even hit).

In addition, except the problem that you might not be able to find your object if you used it in a dictionary (because it was inserted by one hashcode and when you look for it the default hashcode will probably be different and again the Equals() won't even be called, like Marc Gravell explains in his answer, you also introduce a violation of the dictionary or hashset concept which should not allow identical keys - you already declared that those objects are essentially the same when you overrode Equals so you don't want both of them as different keys on a data structure which suppose to have a unique key. But because they have a different hashcode the "same" key will be inserted as different one.


It's actually very hard to implement GetHashCode() correctly because, in addition to the rules Marc already mentioned, the hash code should not change during the lifetime of an object. Therefore the fields which are used to calculate the hash code must be immutable.

I finally found a solution to this problem when I was working with NHibernate. My approach is to calculate the hash code from the ID of the object. The ID can only be set though the constructor so if you want to change the ID, which is very unlikely, you have to create a new object which has a new ID and therefore a new hash code. This approach works best with GUIDs because you can provide a parameterless constructor which randomly generates an ID.


We have two problems to cope with.

  1. You cannot provide a sensible GetHashCode() if any field in the object can be changed. Also often a object will NEVER be used in a collection that depends on GetHashCode(). So the cost of implementing GetHashCode() is often not worth it, or it is not possible.

  2. If someone puts your object in a collection that calls GetHashCode() and you have overrided Equals() without also making GetHashCode() behave in a correct way, that person may spend days tracking down the problem.

Therefore by default I do.

public class Foo
{
    public int FooId { get; set; }
    public string FooName { get; set; }

    public override bool Equals(object obj)
    {
        Foo fooItem = obj as Foo;

        if (fooItem == null)
        {
           return false;
        }

        return fooItem.FooId == this.FooId;
    }

    public override int GetHashCode()
    {
        // Some comment to explain if there is a real problem with providing GetHashCode() 
        // or if I just don't see a need for it for the given class
        throw new Exception("Sorry I don't know what GetHashCode should do for this class");
    }
}

You should always guarantee that if two objects are equal, as defined by Equals(), they should return the same hash code. As some of the other comments state, in theory this is not mandatory if the object will never be used in a hash based container like HashSet or Dictionary. I would advice you to always follow this rule though. The reason is simply because it is way too easy for someone to change a collection from one type to another with the good intention of actually improving the performance or just conveying the code semantics in a better way.

For example, suppose we keep some objects in a List. Sometime later someone actually realizes that a HashSet is a much better alternative because of the better search characteristics for example. This is when we can get into trouble. List would internally use the default equality comparer for the type which means Equals in your case while HashSet makes use of GetHashCode(). If the two behave differently, so will your program. And bear in mind that such issues are not the easiest to troubleshoot.

I've summarized this behavior with some other GetHashCode() pitfalls in a blog post where you can find further examples and explanations.


How about:

public override int GetHashCode()
{
    return string.Format("{0}_{1}_{2}", prop1, prop2, prop3).GetHashCode();
}

Assuming performance is not an issue :)


By overriding Equals you're basically stating that you are the one who knows better how to compare two instances of a given type, so you're likely to be the best candidate to provide the best hash code.

This is an example of how ReSharper writes a GetHashCode() function for you:

public override int GetHashCode()
{
    unchecked
    {
        var result = 0;
        result = (result * 397) ^ m_someVar1;
        result = (result * 397) ^ m_someVar2;
        result = (result * 397) ^ m_someVar3;
        result = (result * 397) ^ m_someVar4;
        return result;
    }
}

As you can see it just tries to guess a good hash code based on all the fields in the class, but since you know your object's domain or value ranges you could still provide a better one.


It is because the framework requires that two objects that are the same must have the same hashcode. If you override the equals method to do a special comparison of two objects and the two objects are considered the same by the method, then the hash code of the two objects must also be the same. (Dictionaries and Hashtables rely on this principle).


As of C# 9(.net 5 or .net core 3.1), you may want to use records as it does Value Based Equality.


It is because the framework requires that two objects that are the same must have the same hashcode. If you override the equals method to do a special comparison of two objects and the two objects are considered the same by the method, then the hash code of the two objects must also be the same. (Dictionaries and Hashtables rely on this principle).


Please don´t forget to check the obj parameter against null when overriding Equals(). And also compare the type.

public override bool Equals(object obj)
{
    Foo fooItem = obj as Foo;

    if (fooItem == null)
    {
       return false;
    }

    return fooItem.FooId == this.FooId;
}

The reason for this is: Equals must return false on comparison to null. See also http://msdn.microsoft.com/en-us/library/bsc2ak47.aspx


It's my understanding that the original GetHashCode() returns the memory address of the object, so it's essential to override it if you wish to compare two different objects.

EDITED: That was incorrect, the original GetHashCode() method cannot assure the equality of 2 values. Though objects that are equal return the same hash code.


Hash code is used for hash-based collections like Dictionary, Hashtable, HashSet etc. The purpose of this code is to very quickly pre-sort specific object by putting it into specific group (bucket). This pre-sorting helps tremendously in finding this object when you need to retrieve it back from hash-collection because code has to search for your object in just one bucket instead of in all objects it contains. The better distribution of hash codes (better uniqueness) the faster retrieval. In ideal situation where each object has a unique hash code, finding it is an O(1) operation. In most cases it approaches O(1).


As of C# 9(.net 5 or .net core 3.1), you may want to use records as it does Value Based Equality.


Below using reflection seems to me a better option considering public properties as with this you don't have have to worry about addition / removal of properties (although not so common scenario). This I found to be performing better also.(Compared time using Diagonistics stop watch).

    public int getHashCode()
    {
        PropertyInfo[] theProperties = this.GetType().GetProperties();
        int hash = 31;
        foreach (PropertyInfo info in theProperties)
        {
            if (info != null)
            {
                var value = info.GetValue(this,null);
                if(value != null)
                unchecked
                {
                    hash = 29 * hash ^ value.GetHashCode();
                }
            }
        }
        return hash;  
    }

It's actually very hard to implement GetHashCode() correctly because, in addition to the rules Marc already mentioned, the hash code should not change during the lifetime of an object. Therefore the fields which are used to calculate the hash code must be immutable.

I finally found a solution to this problem when I was working with NHibernate. My approach is to calculate the hash code from the ID of the object. The ID can only be set though the constructor so if you want to change the ID, which is very unlikely, you have to create a new object which has a new ID and therefore a new hash code. This approach works best with GUIDs because you can provide a parameterless constructor which randomly generates an ID.


As of .NET 4.7 the preferred method of overriding GetHashCode() is shown below. If targeting older .NET versions, include the System.ValueTuple nuget package.

// C# 7.0+
public override int GetHashCode() => (FooId, FooName).GetHashCode();

In terms of performance, this method will outperform most composite hash code implementations. The ValueTuple is a struct so there won't be any garbage, and the underlying algorithm is as fast as it gets.


It's my understanding that the original GetHashCode() returns the memory address of the object, so it's essential to override it if you wish to compare two different objects.

EDITED: That was incorrect, the original GetHashCode() method cannot assure the equality of 2 values. Though objects that are equal return the same hash code.


As of .NET 4.7 the preferred method of overriding GetHashCode() is shown below. If targeting older .NET versions, include the System.ValueTuple nuget package.

// C# 7.0+
public override int GetHashCode() => (FooId, FooName).GetHashCode();

In terms of performance, this method will outperform most composite hash code implementations. The ValueTuple is a struct so there won't be any garbage, and the underlying algorithm is as fast as it gets.


Hash code is used for hash-based collections like Dictionary, Hashtable, HashSet etc. The purpose of this code is to very quickly pre-sort specific object by putting it into specific group (bucket). This pre-sorting helps tremendously in finding this object when you need to retrieve it back from hash-collection because code has to search for your object in just one bucket instead of in all objects it contains. The better distribution of hash codes (better uniqueness) the faster retrieval. In ideal situation where each object has a unique hash code, finding it is an O(1) operation. In most cases it approaches O(1).


Please don´t forget to check the obj parameter against null when overriding Equals(). And also compare the type.

public override bool Equals(object obj)
{
    Foo fooItem = obj as Foo;

    if (fooItem == null)
    {
       return false;
    }

    return fooItem.FooId == this.FooId;
}

The reason for this is: Equals must return false on comparison to null. See also http://msdn.microsoft.com/en-us/library/bsc2ak47.aspx


By overriding Equals you're basically stating that you are the one who knows better how to compare two instances of a given type, so you're likely to be the best candidate to provide the best hash code.

This is an example of how ReSharper writes a GetHashCode() function for you:

public override int GetHashCode()
{
    unchecked
    {
        var result = 0;
        result = (result * 397) ^ m_someVar1;
        result = (result * 397) ^ m_someVar2;
        result = (result * 397) ^ m_someVar3;
        result = (result * 397) ^ m_someVar4;
        return result;
    }
}

As you can see it just tries to guess a good hash code based on all the fields in the class, but since you know your object's domain or value ranges you could still provide a better one.


It is because the framework requires that two objects that are the same must have the same hashcode. If you override the equals method to do a special comparison of two objects and the two objects are considered the same by the method, then the hash code of the two objects must also be the same. (Dictionaries and Hashtables rely on this principle).


How about:

public override int GetHashCode()
{
    return string.Format("{0}_{1}_{2}", prop1, prop2, prop3).GetHashCode();
}

Assuming performance is not an issue :)


It is because the framework requires that two objects that are the same must have the same hashcode. If you override the equals method to do a special comparison of two objects and the two objects are considered the same by the method, then the hash code of the two objects must also be the same. (Dictionaries and Hashtables rely on this principle).


It's actually very hard to implement GetHashCode() correctly because, in addition to the rules Marc already mentioned, the hash code should not change during the lifetime of an object. Therefore the fields which are used to calculate the hash code must be immutable.

I finally found a solution to this problem when I was working with NHibernate. My approach is to calculate the hash code from the ID of the object. The ID can only be set though the constructor so if you want to change the ID, which is very unlikely, you have to create a new object which has a new ID and therefore a new hash code. This approach works best with GUIDs because you can provide a parameterless constructor which randomly generates an ID.


Just to add on above answers:

If you don't override Equals then the default behavior is that references of the objects are compared. The same applies to hashcode - the default implmentation is typically based on a memory address of the reference. Because you did override Equals it means the correct behavior is to compare whatever you implemented on Equals and not the references, so you should do the same for the hashcode.

Clients of your class will expect the hashcode to have similar logic to the equals method, for example linq methods which use a IEqualityComparer first compare the hashcodes and only if they're equal they'll compare the Equals() method which might be more expensive to run, if we didn't implement hashcode, equal object will probably have different hashcodes (because they have different memory address) and will be determined wrongly as not equal (Equals() won't even hit).

In addition, except the problem that you might not be able to find your object if you used it in a dictionary (because it was inserted by one hashcode and when you look for it the default hashcode will probably be different and again the Equals() won't even be called, like Marc Gravell explains in his answer, you also introduce a violation of the dictionary or hashset concept which should not allow identical keys - you already declared that those objects are essentially the same when you overrode Equals so you don't want both of them as different keys on a data structure which suppose to have a unique key. But because they have a different hashcode the "same" key will be inserted as different one.


Examples related to c#

How can I convert this one line of ActionScript to C#? Microsoft Advertising SDK doesn't deliverer ads How to use a global array in C#? How to correctly write async method? C# - insert values from file into two arrays Uploading into folder in FTP? Are these methods thread safe? dotnet ef not found in .NET Core 3 HTTP Error 500.30 - ANCM In-Process Start Failure Best way to "push" into C# array

Examples related to overriding

How to underline a UILabel in swift? How to 'update' or 'overwrite' a python list maven command line how to point to a specific settings.xml for a single command? How to override the properties of a CSS class using another CSS class What is the difference between dynamic and static polymorphism in Java? Overriding css style? Android Overriding onBackPressed() What is the 'override' keyword in C++ used for? Why do we have to override the equals() method in Java? Can overridden methods differ in return type?

Examples related to hashcode

Hashing with SHA1 Algorithm in C# HashMaps and Null values? How to create a HashMap with two keys (Key-Pair, Value)? What is hashCode used for? Is it unique? How does a Java HashMap handle different objects with the same hash code? Hashcode and Equals for Hashset What is the use of hashCode in Java? Good Hash Function for Strings Why do I need to override the equals and hashCode methods in Java? Memory address of variables in Java