[c#] Distinct by property of class with LINQ

I have a collection:

List<Car> cars = new List<Car>();

Cars are uniquely identified by their property CarCode.

I have three cars in the collection, and two with identical CarCodes.

How can I use LINQ to convert this collection to Cars with unique CarCodes?

This question is related to c# linq distinct

The answer is


Same approach as Guffa but as an extension method:

public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> items, Func<T, TKey> property)
{
    return items.GroupBy(property).Select(x => x.First());
}

Used as:

var uniqueCars = cars.DistinctBy(x => x.CarCode);

Another way to accomplish the same thing...

List<Car> distinticBy = cars
    .Select(car => car.CarCode)
    .Distinct()
    .Select(code => cars.First(car => car.CarCode == code))
    .ToList();

It's possible to create an extension method to do this in a more generic way. It would be interesting if someone could evalute performance of this 'DistinctBy' against the GroupBy approach.


You can implement an IEqualityComparer and use that in your Distinct extension.

class CarEqualityComparer : IEqualityComparer<Car>
{
    #region IEqualityComparer<Car> Members

    public bool Equals(Car x, Car y)
    {
        return x.CarCode.Equals(y.CarCode);
    }

    public int GetHashCode(Car obj)
    {
        return obj.CarCode.GetHashCode();
    }

    #endregion
}

And then

var uniqueCars = cars.Distinct(new CarEqualityComparer());

You can't effectively use Distinct on a collection of objects (without additional work). I will explain why.

The documentation says:

It uses the default equality comparer, Default, to compare values.

For objects that means it uses the default equation method to compare objects (source). That is on their hash code. And since your objects don't implement the GetHashCode() and Equals methods, it will check on the reference of the object, which are not distinct.


You can check out my PowerfulExtensions library. Currently it's in a very young stage, but already you can use methods like Distinct, Union, Intersect, Except on any number of properties;

This is how you use it:

using PowerfulExtensions.Linq;
...
var distinct = myArray.Distinct(x => x.A, x => x.B);

Use MoreLINQ, which has a DistinctBy method :)

IEnumerable<Car> distinctCars = cars.DistinctBy(car => car.CarCode);

(This is only for LINQ to Objects, mind you.)


Another extension method for Linq-to-Objects, without using GroupBy:

    /// <summary>
    /// Returns the set of items, made distinct by the selected value.
    /// </summary>
    /// <typeparam name="TSource">The type of the source.</typeparam>
    /// <typeparam name="TResult">The type of the result.</typeparam>
    /// <param name="source">The source collection.</param>
    /// <param name="selector">A function that selects a value to determine unique results.</param>
    /// <returns>IEnumerable&lt;TSource&gt;.</returns>
    public static IEnumerable<TSource> Distinct<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        HashSet<TResult> set = new HashSet<TResult>();

        foreach(var item in source)
        {
            var selectedValue = selector(item);

            if (set.Add(selectedValue))
                yield return item;
        }
    }

I think the best option in Terms of performance (or in any terms) is to Distinct using the The IEqualityComparer interface.

Although implementing each time a new comparer for each class is cumbersome and produces boilerplate code.

So here is an extension method which produces a new IEqualityComparer on the fly for any class using reflection.

Usage:

var filtered = taskList.DistinctBy(t => t.TaskExternalId).ToArray();

Extension Method Code

public static class LinqExtensions
{
    public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> items, Func<T, TKey> property)
    {
        GeneralPropertyComparer<T, TKey> comparer = new GeneralPropertyComparer<T,TKey>(property);
        return items.Distinct(comparer);
    }   
}
public class GeneralPropertyComparer<T,TKey> : IEqualityComparer<T>
{
    private Func<T, TKey> expr { get; set; }
    public GeneralPropertyComparer (Func<T, TKey> expr)
    {
        this.expr = expr;
    }
    public bool Equals(T left, T right)
    {
        var leftProp = expr.Invoke(left);
        var rightProp = expr.Invoke(right);
        if (leftProp == null && rightProp == null)
            return true;
        else if (leftProp == null ^ rightProp == null)
            return false;
        else
            return leftProp.Equals(rightProp);
    }
    public int GetHashCode(T obj)
    {
        var prop = expr.Invoke(obj);
        return (prop==null)? 0:prop.GetHashCode();
    }
}

Examples related to c#

How can I convert this one line of ActionScript to C#? Microsoft Advertising SDK doesn't deliverer ads How to use a global array in C#? How to correctly write async method? C# - insert values from file into two arrays Uploading into folder in FTP? Are these methods thread safe? dotnet ef not found in .NET Core 3 HTTP Error 500.30 - ANCM In-Process Start Failure Best way to "push" into C# array

Examples related to linq

Async await in linq select How to resolve Value cannot be null. Parameter name: source in linq? What does Include() do in LINQ? Selecting multiple columns with linq query and lambda expression System.Collections.Generic.List does not contain a definition for 'Select' lambda expression join multiple tables with select and where clause LINQ select one field from list of DTO objects to array The model backing the 'ApplicationDbContext' context has changed since the database was created Check if two lists are equal Why is this error, 'Sequence contains no elements', happening?

Examples related to distinct

Using DISTINCT along with GROUP BY in SQL Server How to "select distinct" across multiple data frame columns in pandas? Laravel Eloquent - distinct() and count() not working properly together SQL - select distinct only on one column SQL: Group by minimum value in one field while selecting distinct rows Count distinct value pairs in multiple columns in SQL sql query distinct with Row_Number Eliminating duplicate values based on only one column of the table MongoDB distinct aggregation Pandas count(distinct) equivalent