How to get duplicate items from a list using LINQ

Question

I have a List<string> like this:

List<String> list = new List<String> { "6", "1", "2", "4", "6", "5", "1" };

I need to get the duplicate items in the list into a new list. Right now I'm using a nested for loop to do this.

The resulting list should contain { "6", "1" }.

Is there a way to do this using LINQ or lambda expressions?

User · Answer

All the solutions mentioned so far perform a GroupBy. Even if I only need the first duplicate, every element of the collection is enumerated at least once.

The following extension method stops enumerating as soon as a duplicate has been found. It resumes only when the next duplicate is requested.

As always in LINQ there are two versions, one with IEqualityComparer and one without it.

// Both methods must be declared inside a static class to work as extension methods.
public static IEnumerable<TSource> ExtractDuplicates<TSource>(this IEnumerable<TSource> source)
{
    return source.ExtractDuplicates(null);
}
public static IEnumerable<TSource> ExtractDuplicates<TSource>(this IEnumerable<TSource> source,
    IEqualityComparer<TSource> comparer)
{
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (comparer == null)
        comparer = EqualityComparer<TSource>.Default;

    HashSet<TSource> foundElements = new HashSet<TSource>(comparer);
    foreach (TSource sourceItem in source)
    {
        if (!foundElements.Contains(sourceItem))
        {   // we've not seen this sourceItem before. Add to the foundElements
            foundElements.Add(sourceItem);
        }
        else
        {   // we've seen this item before. It is a duplicate!
            yield return sourceItem;
        }
    }
}

Usage:

IEnumerable<MyClass> myObjects = ...

// check if has duplicates:
bool hasDuplicates = myObjects.ExtractDuplicates().Any();

// or find the first three duplicates:
IEnumerable<MyClass> first3Duplicates = myObjects.ExtractDuplicates().Take(3);

// or find the first 5 duplicates that have a Name = "MyName"
IEnumerable<MyClass> myNameDuplicates = myObjects.ExtractDuplicates()
    .Where(duplicate => duplicate.Name == "MyName")
    .Take(5);

For all these LINQ statements the collection is enumerated only until the requested items have been found. The rest of the sequence is never read.

IMHO that is an efficiency boost to consider.
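To make the early exit visible, here is a minimal sketch that counts how many source items are actually read before `Any()` gets its answer. `CountingSource`, `ItemsRead`, `LazyDuplicates` and `ItemsReadForAny` are hypothetical names introduced only for this illustration; `LazyDuplicates` follows the same idea as the extension method above, written as a plain static method so the sample compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDuplicatesDemo
{
    // Hypothetical counter: number of items pulled from the source so far.
    public static int ItemsRead;

    // Passes the items through one by one and counts each item that is pulled.
    public static IEnumerable<string> CountingSource(IEnumerable<string> items)
    {
        foreach (string item in items)
        {
            ItemsRead++;
            yield return item;
        }
    }

    // Lazy duplicate finder: yields an item each time it has been seen before.
    public static IEnumerable<T> LazyDuplicates<T>(IEnumerable<T> source)
    {
        var seen = new HashSet<T>();
        foreach (T item in source)
        {
            // HashSet<T>.Add returns false when the item is already in the set.
            if (!seen.Add(item))
                yield return item;
        }
    }

    // Returns how many source items were read to answer "is there any duplicate?".
    public static int ItemsReadForAny(IEnumerable<string> items)
    {
        ItemsRead = 0;
        LazyDuplicates(CountingSource(items)).Any();
        return ItemsRead;
    }

    public static void Main()
    {
        var list = new List<string> { "6", "1", "2", "4", "6", "5", "1" };
        // The second "6" is the fifth item, so enumeration stops there.
        Console.WriteLine(ItemsReadForAny(list)); // prints 5
    }
}
```

A GroupBy-based query would read all seven items before producing its first result, because the groups cannot be built without seeing the whole sequence.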

User · Answer

Here's another option:

var list = new List<string> { "6", "1", "2", "4", "6", "5", "1" };
var set = new HashSet<string>();
var duplicates = list.Where(x => !set.Add(x));
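Two details of the HashSet option are easy to miss: the query is deferred, so enumerating it a second time against the same, already filled set reports every item as a duplicate, and an item that occurs three times is reported twice. The following is a minimal sketch that avoids both by creating the set per call, applying `Distinct()` and materializing the result with `ToList()`. The method name `FindDuplicates` is an assumption made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashSetDuplicatesDemo
{
    // Returns each duplicated value once, in the order its first repeat occurs.
    public static List<string> FindDuplicates(IEnumerable<string> items)
    {
        // A fresh set per call, so repeated calls do not share state.
        var seen = new HashSet<string>();

        return items
            .Where(x => !seen.Add(x)) // Add returns false for items seen before
            .Distinct()               // report a value once even if it occurs 3+ times
            .ToList();                // run the query once, while the set is fresh
    }

    public static void Main()
    {
        var list = new List<string> { "6", "1", "2", "4", "6", "5", "1" };
        Console.WriteLine(string.Join(", ", FindDuplicates(list))); // prints 6, 1
    }
}
```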

User · Answer

Here is one way to do it:

List<String> duplicates = lst.GroupBy(x => x)
                             .Where(g => g.Count() > 1)
                             .Select(g => g.Key)
                             .ToList();

The GroupBy groups the elements that are the same together, and the Where filters out those that only appear once, leaving you with only the duplicates.

User · Answer

I wrote this extension method based off Lee's response to the OP. Note: a default parameter was used (requiring C# 4.0). However, an overloaded method call in C# 3.0 would suffice.

/// <summary>
/// Method that returns all the duplicates (distinct) in the collection.
/// </summary>
/// <typeparam name="T">The type of the collection.</typeparam>
/// <param name="source">The source collection to detect for duplicates</param>
/// <param name="distinct">Specify <b>true</b> to only return distinct elements.</param>
/// <returns>A distinct list of duplicates found in the source collection.</returns>
/// <remarks>This is an extension method to IEnumerable&lt;T&gt;</remarks>
public static IEnumerable<T> Duplicates<T>
         (this IEnumerable<T> source, bool distinct = true)
{
     if (source == null)
     {
        throw new ArgumentNullException("source");
     }

     // select the elements that are repeated
     IEnumerable<T> result = source.GroupBy(a => a).SelectMany(a => a.Skip(1));

     // distinct?
     if (distinct == true)
     {
        // deferred execution helps us here
        result = result.Distinct();
     }

     return result;
}
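The C# 3.0 variant mentioned in this answer could look roughly like the sketch below: the default parameter is replaced by a second overload that forwards `true`. The class name `DuplicatesOverloads` is an assumption, and the body is a condensed restatement of the method above rather than the author's exact code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicatesOverloads
{
    // Overload without the flag: behaves like distinct = true.
    public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> source)
    {
        return source.Duplicates(true);
    }

    // Full version: every repeated occurrence, optionally reduced to distinct values.
    public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> source, bool distinct)
    {
        if (source == null)
        {
            throw new ArgumentNullException("source");
        }

        // Skip the first member of each group, keeping only the repeats.
        IEnumerable<T> result = source.GroupBy(a => a).SelectMany(a => a.Skip(1));

        return distinct ? result.Distinct() : result;
    }
}
```

With `new[] { 1, 2, 1, 1, 2 }` as input, `Duplicates()` yields 1, 2 and `Duplicates(false)` yields 1, 1, 2.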

User · Answer

I was trying to solve the same problem with a list of objects and was having issues because I was trying to repack the list of groups into the original list. So I came up with looping through the groups to repack the original List with the items that have duplicates.

public List<MediaFileInfo> GetDuplicatePictures()
{
    List<MediaFileInfo> dupes = new List<MediaFileInfo>();
    var grpDupes = from f in _fileRepo
                   group f by f.Length into grps
                   where grps.Count() > 1
                   select grps;

    foreach (var item in grpDupes)
    {
        foreach (var thing in item)
        {
            dupes.Add(thing);
        }
    }
    return dupes;
}

User · Answer

I know it's not the answer to the original question, but you may find yourself here with this problem.

If you want all of the duplicate items in your results, the following works.

var duplicates = list
    .GroupBy(x => x)               // group matching items
    .Where(g => g.Skip(1).Any())   // where the group contains more than one item
    .SelectMany(g => g);           // re-expand the groups with more than one item

In my situation I need all duplicates so that I can mark them in the UI as being errors.

User · Answer

List<String> list = new List<String> { "6", "1", "2", "4", "6", "5", "1" };

var q = from s in list
        group s by s into g
        where g.Count() > 1
        select g.First();

foreach (var item in q)
{
    Console.WriteLine(item);
}

User · Answer

var duplicates = lst.GroupBy(s => s)
    .SelectMany(grp => grp.Skip(1));

Note that this will return all duplicates, so if you only want to know which items are duplicated in the source list, you could apply Distinct to the resulting sequence or use the solution given by Mark Byers.
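The difference between "all duplicates" and "which items are duplicated" shows up as soon as a value occurs more than twice. Here is a minimal sketch with a made-up input list; the method names `AllRepeats` and `DuplicatedValues` are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipOneDemo
{
    // Every occurrence after the first one, so "a" x3 contributes two entries.
    public static List<string> AllRepeats(IEnumerable<string> items)
    {
        return items.GroupBy(s => s)
                    .SelectMany(grp => grp.Skip(1))
                    .ToList();
    }

    // Each duplicated value exactly once.
    public static List<string> DuplicatedValues(IEnumerable<string> items)
    {
        return AllRepeats(items).Distinct().ToList();
    }

    public static void Main()
    {
        var lst = new List<string> { "a", "b", "a", "c", "a", "b" };
        Console.WriteLine(string.Join(", ", AllRepeats(lst)));       // prints a, a, b
        Console.WriteLine(string.Join(", ", DuplicatedValues(lst))); // prints a, b
    }
}
```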

User · Answer

Hope this will help:

int[] listOfItems = new[] { 4, 2, 3, 1, 6, 4, 3 };

var duplicates = listOfItems
    .GroupBy(i => i)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key);

foreach (var d in duplicates)
    Console.WriteLine(d);
