[.net] Comparing two collections for equality irrespective of the order of items in them

I would like to compare two collections (in C#), but I'm not sure of the best way to implement this efficiently.

I've read the other thread about Enumerable.SequenceEqual, but it's not exactly what I'm looking for.

In my case, two collections would be equal if they both contain the same items (no matter the order).

Example:

collection1 = {1, 2, 3, 4};
collection2 = {2, 4, 1, 3};

collection1 == collection2; // true

What I usually do is to loop through each item of one collection and see if it exists in the other collection, then loop through each item of the other collection and see if it exists in the first collection. (I start by comparing the lengths).

if (collection1.Count != collection2.Count)
    return false; // the collections are not equal

foreach (Item item in collection1)
{
    if (!collection2.Contains(item))
        return false; // the collections are not equal
}

foreach (Item item in collection2)
{
    if (!collection1.Contains(item))
        return false; // the collections are not equal
}

return true; // the collections are equal

However, this is not entirely correct, and it's probably not the most efficient way to do compare two collections for equality.

An example I can think of that would be wrong is:

collection1 = {1, 2, 3, 3, 4}
collection2 = {1, 2, 2, 3, 4}

Which would be equal with my implementation. Should I just count the number of times each item is found and make sure the counts are equal in both collections?


The examples are in some sort of C# (let's call it pseudo-C#), but give your answer in whatever language you wish, it does not matter.

Note: I used integers in the examples for simplicity, but I want to be able to use reference-type objects too (they do not behave correctly as keys because only the reference of the object is compared, not the content).

This question is related to .net collections comparison equality

The answer is


Here's my extension method variant of ohadsc's answer, in case it's useful to someone

static public class EnumerableExtensions 
{
    static public bool IsEquivalentTo<T>(this IEnumerable<T> first, IEnumerable<T> second)
    {
        if ((first == null) != (second == null))
            return false;

        if (!object.ReferenceEquals(first, second) && (first != null))
        {
            if (first.Count() != second.Count())
                return false;

            if ((first.Count() != 0) && HaveMismatchedElement<T>(first, second))
                return false;
        }

        return true;
    }

    private static bool HaveMismatchedElement<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        int firstCount;
        int secondCount;

        var firstElementCounts = GetElementCounts<T>(first, out firstCount);
        var secondElementCounts = GetElementCounts<T>(second, out secondCount);

        if (firstCount != secondCount)
            return true;

        foreach (var kvp in firstElementCounts)
        {
            firstCount = kvp.Value;
            secondElementCounts.TryGetValue(kvp.Key, out secondCount);

            if (firstCount != secondCount)
                return true;
        }

        return false;
    }

    private static Dictionary<T, int> GetElementCounts<T>(IEnumerable<T> enumerable, out int nullCount)
    {
        var dictionary = new Dictionary<T, int>();
        nullCount = 0;

        foreach (T element in enumerable)
        {
            if (element == null)
            {
                nullCount++;
            }
            else
            {
                int num;
                dictionary.TryGetValue(element, out num);
                num++;
                dictionary[element] = num;
            }
        }

        return dictionary;
    }

    static private int GetHashCode<T>(IEnumerable<T> enumerable)
    {
        int hash = 17;

        foreach (T val in enumerable.OrderBy(x => x))
            hash = hash * 23 + val.GetHashCode();

        return hash;
    }
}

You could use a Hashset. Look at the SetEquals method.


If you use Shouldly, you can use ShouldAllBe with Contains.

collection1 = {1, 2, 3, 4};
collection2 = {2, 4, 1, 3};

collection1.ShouldAllBe(item=>collection2.Contains(item)); // true

And finally, you can write an extension.

public static class ShouldlyIEnumerableExtensions
{
    public static void ShouldEquivalentTo<T>(this IEnumerable<T> list, IEnumerable<T> equivalent)
    {
        list.ShouldAllBe(l => equivalent.Contains(l));
    }
}

UPDATE

A optional parameter exists on ShouldBe method.

collection1.ShouldBe(collection2, ignoreOrder: true); // true

A duplicate post of sorts, but check out my solution for comparing collections. It's pretty simple:

This will perform an equality comparison regardless of order:

var list1 = new[] { "Bill", "Bob", "Sally" };
var list2 = new[] { "Bob", "Bill", "Sally" };
bool isequal = list1.Compare(list2).IsSame;

This will check to see if items were added / removed:

var list1 = new[] { "Billy", "Bob" };
var list2 = new[] { "Bob", "Sally" };
var diff = list1.Compare(list2);
var onlyinlist1 = diff.Removed; //Billy
var onlyinlist2 = diff.Added;   //Sally
var inbothlists = diff.Equal;   //Bob

This will see what items in the dictionary changed:

var original = new Dictionary<int, string>() { { 1, "a" }, { 2, "b" } };
var changed = new Dictionary<int, string>() { { 1, "aaa" }, { 2, "b" } };
var diff = original.Compare(changed, (x, y) => x.Value == y.Value, (x, y) => x.Value == y.Value);
foreach (var item in diff.Different)
  Console.Write("{0} changed to {1}", item.Key.Value, item.Value.Value);
//Will output: a changed to aaa

Original post here.


Create a Dictionary "dict" and then for each member in the first collection, do dict[member]++;

Then, loop over the second collection in the same way, but for each member do dict[member]--.

At the end, loop over all of the members in the dictionary:

    private bool SetEqual (List<int> left, List<int> right) {

        if (left.Count != right.Count)
            return false;

        Dictionary<int, int> dict = new Dictionary<int, int>();

        foreach (int member in left) {
            if (dict.ContainsKey(member) == false)
                dict[member] = 1;
            else
                dict[member]++;
        }

        foreach (int member in right) {
            if (dict.ContainsKey(member) == false)
                return false;
            else
                dict[member]--;
        }

        foreach (KeyValuePair<int, int> kvp in dict) {
            if (kvp.Value != 0)
                return false;
        }

        return true;

    }

Edit: As far as I can tell this is on the same order as the most efficient algorithm. This algorithm is O(N), assuming that the Dictionary uses O(1) lookups.


This simple solution forces the IEnumerable's generic type to implement IComparable. Because of OrderBy's definition.

If you don't want to make such an assumption but still want use this solution, you can use the following piece of code :

bool equal = collection1.OrderBy(i => i?.GetHashCode())
   .SequenceEqual(collection2.OrderBy(i => i?.GetHashCode()));

In the case of no repeats and no order, the following EqualityComparer can be used to allow collections as dictionary keys:

public class SetComparer<T> : IEqualityComparer<IEnumerable<T>> 
where T:IComparable<T>
{
    public bool Equals(IEnumerable<T> first, IEnumerable<T> second)
    {
        if (first == second)
            return true;
        if ((first == null) || (second == null))
            return false;
        return first.ToHashSet().SetEquals(second);
    }

    public int GetHashCode(IEnumerable<T> enumerable)
    {
        int hash = 17;

        foreach (T val in enumerable.OrderBy(x => x))
            hash = hash * 23 + val.GetHashCode();

        return hash;
    }
}

Here is the ToHashSet() implementation I used. The hash code algorithm comes from Effective Java (by way of Jon Skeet).


Here is a solution which is an improvement over this one.

public static bool HasSameElementsAs<T>(
        this IEnumerable<T> first, 
        IEnumerable<T> second, 
        IEqualityComparer<T> comparer = null)
    {
        var firstMap = first
            .GroupBy(x => x, comparer)
            .ToDictionary(x => x.Key, x => x.Count(), comparer);

        var secondMap = second
            .GroupBy(x => x, comparer)
            .ToDictionary(x => x.Key, x => x.Count(), comparer);

        if (firstMap.Keys.Count != secondMap.Keys.Count)
            return false;

        if (firstMap.Keys.Any(k1 => !secondMap.ContainsKey(k1)))
            return false;

        return firstMap.Keys.All(x => firstMap[x] == secondMap[x]);
    }

EDIT: I realized as soon as I posed that this really only works for sets -- it will not properly deal with collections that have duplicate items. For example { 1, 1, 2 } and { 2, 2, 1 } will be considered equal from this algorithm's perspective. If your collections are sets (or their equality can be measured that way), however, I hope you find the below useful.

The solution I use is:

return c1.Count == c2.Count && c1.Intersect(c2).Count() == c1.Count;

Linq does the dictionary thing under the covers, so this is also O(N). (Note, it's O(1) if the collections aren't the same size).

I did a sanity check using the "SetEqual" method suggested by Daniel, the OrderBy/SequenceEquals method suggested by Igor, and my suggestion. The results are below, showing O(N*LogN) for Igor and O(N) for mine and Daniel's.

I think the simplicity of the Linq intersect code makes it the preferable solution.

__Test Latency(ms)__
N, SetEquals, OrderBy, Intersect    
1024, 0, 0, 0    
2048, 0, 0, 0    
4096, 31.2468, 0, 0    
8192, 62.4936, 0, 0    
16384, 156.234, 15.6234, 0    
32768, 312.468, 15.6234, 46.8702    
65536, 640.5594, 46.8702, 31.2468    
131072, 1312.3656, 93.7404, 203.1042    
262144, 3765.2394, 187.4808, 187.4808    
524288, 5718.1644, 374.9616, 406.2084    
1048576, 11420.7054, 734.2998, 718.6764    
2097152, 35090.1564, 1515.4698, 1484.223

In many cases the only suitable answer is the one of Igor Ostrovsky , other answers are based on objects hash code. But when you generate an hash code for an object you do so only based on his IMMUTABLE fields - such as object Id field (in case of a database entity) - Why is it important to override GetHashCode when Equals method is overridden?

This means , that if you compare two collections , the result might be true of the compare method even though the fields of the different items are non-equal . To deep compare collections , you need to use Igor's method and implement IEqualirity .

Please read the comments of me and mr.Schnider's on his most voted post.

James


If comparing for the purpose of Unit Testing Assertions, it may make sense to throw some efficiency out the window and simply convert each list to a string representation (csv) before doing the comparison. That way, the default test Assertion message will display the differences within the error message.

Usage:

using Microsoft.VisualStudio.TestTools.UnitTesting;

// define collection1, collection2, ...

Assert.Equal(collection1.OrderBy(c=>c).ToCsv(), collection2.OrderBy(c=>c).ToCsv());

Helper Extension Method:

public static string ToCsv<T>(
    this IEnumerable<T> values,
    Func<T, string> selector,
    string joinSeparator = ",")
{
    if (selector == null)
    {
        if (typeof(T) == typeof(Int16) ||
            typeof(T) == typeof(Int32) ||
            typeof(T) == typeof(Int64))
        {
            selector = (v) => Convert.ToInt64(v).ToStringInvariant();
        }
        else if (typeof(T) == typeof(decimal))
        {
            selector = (v) => Convert.ToDecimal(v).ToStringInvariant();
        }
        else if (typeof(T) == typeof(float) ||
                typeof(T) == typeof(double))
        {
            selector = (v) => Convert.ToDouble(v).ToString(CultureInfo.InvariantCulture);
        }
        else
        {
            selector = (v) => v.ToString();
        }
    }

    return String.Join(joinSeparator, values.Select(v => selector(v)));
}

static bool SetsContainSameElements<T>(IEnumerable<T> set1, IEnumerable<T> set2) {
    var setXOR = new HashSet<T>(set1);
    setXOR.SymmetricExceptWith(set2);
    return (setXOR.Count == 0);
}

Solution requires .NET 3.5 and the System.Collections.Generic namespace. According to Microsoft, SymmetricExceptWith is an O(n + m) operation, with n representing the number of elements in the first set and m representing the number of elements in the second. You could always add an equality comparer to this function if necessary.


A simple and fairly efficient solution is to sort both collections and then compare them for equality:

bool equal = collection1.OrderBy(i => i).SequenceEqual(
                 collection2.OrderBy(i => i));

This algorithm is O(N*logN), while your solution above is O(N^2).

If the collections have certain properties, you may be able to implement a faster solution. For example, if both of your collections are hash sets, they cannot contain duplicates. Also, checking whether a hash set contains some element is very fast. In that case an algorithm similar to yours would likely be fastest.


Allowing for duplicates in the IEnumerable<T> (if sets are not desirable\possible) and "ignoring order" you should be able to use a .GroupBy().

I'm not an expert on the complexity measurements, but my rudimentary understanding is that this should be O(n). I understand O(n^2) as coming from performing an O(n) operation inside another O(n) operation like ListA.Where(a => ListB.Contains(a)).ToList(). Every item in ListB is evaluated for equality against each item in ListA.

Like I said, my understanding on complexity is limited, so correct me on this if I'm wrong.

public static bool IsSameAs<T, TKey>(this IEnumerable<T> source, IEnumerable<T> target, Expression<Func<T, TKey>> keySelectorExpression)
    {
        // check the object
        if (source == null && target == null) return true;
        if (source == null || target == null) return false;

        var sourceList = source.ToList();
        var targetList = target.ToList();

        // check the list count :: { 1,1,1 } != { 1,1,1,1 }
        if (sourceList.Count != targetList.Count) return false;

        var keySelector = keySelectorExpression.Compile();
        var groupedSourceList = sourceList.GroupBy(keySelector).ToList();
        var groupedTargetList = targetList.GroupBy(keySelector).ToList();

        // check that the number of grouptings match :: { 1,1,2,3,4 } != { 1,1,2,3,4,5 }
        var groupCountIsSame = groupedSourceList.Count == groupedTargetList.Count;
        if (!groupCountIsSame) return false;

        // check that the count of each group in source has the same count in target :: for values { 1,1,2,3,4 } & { 1,1,1,2,3,4 }
        // key:count
        // { 1:2, 2:1, 3:1, 4:1 } != { 1:3, 2:1, 3:1, 4:1 }
        var countsMissmatch = groupedSourceList.Any(sourceGroup =>
                                                        {
                                                            var targetGroup = groupedTargetList.Single(y => y.Key.Equals(sourceGroup.Key));
                                                            return sourceGroup.Count() != targetGroup.Count();
                                                        });
        return !countsMissmatch;
    }

Why not use .Except()

// Create the IEnumerable data sources.
string[] names1 = System.IO.File.ReadAllLines(@"../../../names1.txt");
string[] names2 = System.IO.File.ReadAllLines(@"../../../names2.txt");
// Create the query. Note that method syntax must be used here.
IEnumerable<string> differenceQuery =   names1.Except(names2);
// Execute the query.
Console.WriteLine("The following lines are in names1.txt but not names2.txt");
foreach (string s in differenceQuery)
     Console.WriteLine(s);

http://msdn.microsoft.com/en-us/library/bb397894.aspx


This is my (heavily influenced by D.Jennings) generic implementation of the comparison method (in C#):

/// <summary>
/// Represents a service used to compare two collections for equality.
/// </summary>
/// <typeparam name="T">The type of the items in the collections.</typeparam>
public class CollectionComparer<T>
{
    /// <summary>
    /// Compares the content of two collections for equality.
    /// </summary>
    /// <param name="foo">The first collection.</param>
    /// <param name="bar">The second collection.</param>
    /// <returns>True if both collections have the same content, false otherwise.</returns>
    public bool Execute(ICollection<T> foo, ICollection<T> bar)
    {
        // Declare a dictionary to count the occurence of the items in the collection
        Dictionary<T, int> itemCounts = new Dictionary<T,int>();

        // Increase the count for each occurence of the item in the first collection
        foreach (T item in foo)
        {
            if (itemCounts.ContainsKey(item))
            {
                itemCounts[item]++;
            }
            else
            {
                itemCounts[item] = 1;
            }
        }

        // Wrap the keys in a searchable list
        List<T> keys = new List<T>(itemCounts.Keys);

        // Decrease the count for each occurence of the item in the second collection
        foreach (T item in bar)
        {
            // Try to find a key for the item
            // The keys of a dictionary are compared by reference, so we have to
            // find the original key that is equivalent to the "item"
            // You may want to override ".Equals" to define what it means for
            // two "T" objects to be equal
            T key = keys.Find(
                delegate(T listKey)
                {
                    return listKey.Equals(item);
                });

            // Check if a key was found
            if(key != null)
            {
                itemCounts[key]--;
            }
            else
            {
                // There was no occurence of this item in the first collection, thus the collections are not equal
                return false;
            }
        }

        // The count of each item should be 0 if the contents of the collections are equal
        foreach (int value in itemCounts.Values)
        {
            if (value != 0)
            {
                return false;
            }
        }

        // The collections are equal
        return true;
    }
}

erickson is almost right: since you want to match on counts of duplicates, you want a Bag. In Java, this looks something like:

(new HashBag(collection1)).equals(new HashBag(collection2))

I'm sure C# has a built-in Set implementation. I would use that first; if performance is a problem, you could always use a different Set implementation, but use the same Set interface.


There are many solutions to this problem. If you don't care about duplicates, you don't have to sort both. First make sure that they have the same number of items. After that sort one of the collections. Then binsearch each item from the second collection in the sorted collection. If you don't find a given item stop and return false. The complexity of this: - sorting the first collection: NLog(N) - searching each item from second into the first: NLOG(N) so you end up with 2*N*LOG(N) assuming that they match and you look up everything. This is similar to the complexity of sorting both. Also this gives you the benefit to stop earlier if there's a difference. However, keep in mind that if both are sorted before you step into this comparison and you try sorting by use something like a qsort, the sorting will be more expensive. There are optimizations for this. Another alternative, which is great for small collections where you know the range of the elements is to use a bitmask index. This will give you a O(n) performance. Another alternative is to use a hash and look it up. For small collections it is usually a lot better to do the sorting or the bitmask index. Hashtable have the disadvantage of worse locality so keep that in mind. Again, that's only if you don't care about duplicates. If you want to account for duplicates go with sorting both.


Examples related to .net

You must add a reference to assembly 'netstandard, Version=2.0.0.0 How to use Bootstrap 4 in ASP.NET Core No authenticationScheme was specified, and there was no DefaultChallengeScheme found with default authentification and custom authorization .net Core 2.0 - Package was restored using .NetFramework 4.6.1 instead of target framework .netCore 2.0. The package may not be fully compatible Update .NET web service to use TLS 1.2 EF Core add-migration Build Failed What is the difference between .NET Core and .NET Standard Class Library project types? Visual Studio 2017 - Could not load file or assembly 'System.Runtime, Version=4.1.0.0' or one of its dependencies Nuget connection attempt failed "Unable to load the service index for source" Token based authentication in Web API without any user interface

Examples related to collections

Kotlin's List missing "add", "remove", Map missing "put", etc? How to unset (remove) a collection element after fetching it? How can I get a List from some class properties with Java 8 Stream? Java 8 stream map to list of keys sorted by values How to convert String into Hashmap in java How can I turn a List of Lists into a List in Java 8? MongoDB Show all contents from all collections Get nth character of a string in Swift programming language Java 8 Distinct by property Is there a typescript List<> and/or Map<> class/library?

Examples related to comparison

Wildcard string comparison in Javascript How to compare two JSON objects with the same elements in a different order equal? Comparing strings, c++ Char Comparison in C bash string compare to multiple correct values Comparing two hashmaps for equal values and same key sets? Comparing boxed Long values 127 and 128 Compare two files report difference in python How do I fix this "TypeError: 'str' object is not callable" error? Compare cell contents against string in Excel

Examples related to equality

Python if not == vs if != Comparing arrays for equality in C++ Correct way to override Equals() and GetHashCode() Determine if 2 lists have the same elements, regardless of order? equals vs Arrays.equals in Java What is the difference between == and equals() in Java? What's the difference between equal?, eql?, ===, and ==? bash string equality jQuery object equality String comparison in Python: is vs. ==