All mentioned solutions until now perform a GroupBy. Even if I only need the first Duplicate all elements of the collections are enumerated at least once.
The following extension function stops enumerating as soon as a duplicate has been found. It continues if a next duplicate is requested.
As always in LINQ there are two versions, one with IEqualityComparer and one without it.
public static IEnumerable<TSource> ExtractDuplicates(this IEnumerable<TSource> source)
{
return source.ExtractDuplicates(null);
}
public static IEnumerable<TSource> ExtractDuplicates(this IEnumerable<TSource source,
IEqualityComparer<TSource> comparer);
{
if (source == null) throw new ArgumentNullException(nameof(source));
if (comparer == null)
comparer = EqualityCompare<TSource>.Default;
HashSet<TSource> foundElements = new HashSet<TSource>(comparer);
foreach (TSource sourceItem in source)
{
if (!foundElements.Contains(sourceItem))
{ // we've not seen this sourceItem before. Add to the foundElements
foundElements.Add(sourceItem);
}
else
{ // we've seen this item before. It is a duplicate!
yield return sourceItem;
}
}
}
Usage:
IEnumerable<MyClass> myObjects = ...
// check if has duplicates:
bool hasDuplicates = myObjects.ExtractDuplicates().Any();
// or find the first three duplicates:
IEnumerable<MyClass> first3Duplicates = myObjects.ExtractDuplicates().Take(3)
// or find the first 5 duplicates that have a Name = "MyName"
IEnumerable<MyClass> myNameDuplicates = myObjects.ExtractDuplicates()
.Where(duplicate => duplicate.Name == "MyName")
.Take(5);
For all these linq statements the collection is only parsed until the requested items are found. The rest of the sequence is not interpreted.
IMHO that is an efficiency boost to consider.