[c#] Remove duplicates from a List<T> in C#

Anyone have a quick method for de-duplicating a generic List in C#?

This question is related to c# list generics duplicates

The answer is


How about:

var noDupes = list.Distinct().ToList();

In .net 3.5?


Installing the MoreLINQ package via Nuget, you can easily distinct object list by a property

IEnumerable<Catalogue> distinctCatalogues = catalogues.DistinctBy(c => c.CatalogueCode); 

If you don't care about the order you can just shove the items into a HashSet, if you do want to maintain the order you can do something like this:

var unique = new List<T>();
var hs = new HashSet<T>();
foreach (T t in list)
    if (hs.Add(t))
        unique.Add(t);

Or the Linq way:

var hs = new HashSet<T>();
list.All( x =>  hs.Add(x) );

Edit: The HashSet method is O(N) time and O(N) space while sorting and then making unique (as suggested by @lassevk and others) is O(N*lgN) time and O(1) space so it's not so clear to me (as it was at first glance) that the sorting way is inferior (my apologies for the temporary down vote...)


Simply initialize a HashSet with a List of the same type:

var noDupes = new HashSet<T>(withDupes);

Or, if you want a List returned:

var noDupsList = new HashSet<T>(withDupes).ToList();

If you're using .Net 3+, you can use Linq.

List<T> withDupes = LoadSomeData();
List<T> noDupes = withDupes.Distinct().ToList();

A simple intuitive implementation:

public static List<PointF> RemoveDuplicates(List<PointF> listPoints)
{
    List<PointF> result = new List<PointF>();

    for (int i = 0; i < listPoints.Count; i++)
    {
        if (!result.Contains(listPoints[i]))
            result.Add(listPoints[i]);
        }

        return result;
    }

I think the simplest way is:

Create a new list and add unique item.

Example:

        class MyList{
    int id;
    string date;
    string email;
    }
    
    List<MyList> ml = new Mylist();

ml.Add(new MyList(){
id = 1;
date = "2020/09/06";
email = "zarezadeh@gmailcom"
});

ml.Add(new MyList(){
id = 2;
date = "2020/09/01";
email = "zarezadeh@gmailcom"
});

 List<MyList> New_ml = new Mylist();

foreach (var item in ml)
                {
                    if (New_ml.Where(w => w.email == item.email).SingleOrDefault() == null)
                    {
                        New_ml.Add(new MyList()
                        {
                          id = item.id,
     date = item.date,
               email = item.email
                        });
                    }
                }

An extension method might be a decent way to go... something like this:

public static List<T> Deduplicate<T>(this List<T> listToDeduplicate)
{
    return listToDeduplicate.Distinct().ToList();
}

And then call like this, for example:

List<int> myFilteredList = unfilteredList.Deduplicate();

As a helper method (without Linq):

public static List<T> Distinct<T>(this List<T> list)
{
    return (new HashSet<T>(list)).ToList();
}

I like to use this command:

List<Store> myStoreList = Service.GetStoreListbyProvince(provinceId)
                                                 .GroupBy(s => s.City)
                                                 .Select(grp => grp.FirstOrDefault())
                                                 .OrderBy(s => s.City)
                                                 .ToList();

I have these fields in my list: Id, StoreName, City, PostalCode I wanted to show list of cities in a dropdown which has duplicate values. solution: Group by city then pick the first one for the list.

I hope it helps :)


As kronoz said in .Net 3.5 you can use Distinct().

In .Net 2 you could mimic it:

public IEnumerable<T> DedupCollection<T> (IEnumerable<T> input) 
{
    var passedValues = new HashSet<T>();

    // Relatively simple dupe check alg used as example
    foreach(T item in input)
        if(passedValues.Add(item)) // True if item is new
            yield return item;
}

This could be used to dedupe any collection and will return the values in the original order.

It's normally much quicker to filter a collection (as both Distinct() and this sample does) than it would be to remove items from it.


There are many ways to solve - the duplicates issue in the List, below is one of them:

List<Container> containerList = LoadContainer();//Assume it has duplicates
List<Container> filteredList = new  List<Container>();
foreach (var container in containerList)
{ 
  Container duplicateContainer = containerList.Find(delegate(Container checkContainer)
  { return (checkContainer.UniqueId == container.UniqueId); });
   //Assume 'UniqueId' is the property of the Container class on which u r making a search

    if(!containerList.Contains(duplicateContainer) //Add object when not found in the new class object
      {
        filteredList.Add(container);
       }
  }

Cheers Ravi Ganesan


  public static void RemoveDuplicates<T>(IList<T> list )
  {
     if (list == null)
     {
        return;
     }
     int i = 1;
     while(i<list.Count)
     {
        int j = 0;
        bool remove = false;
        while (j < i && !remove)
        {
           if (list[i].Equals(list[j]))
           {
              remove = true;
           }
           j++;
        }
        if (remove)
        {
           list.RemoveAt(i);
        }
        else
        {
           i++;
        }
     }  
  }

You can use Union

obj2 = obj1.Union(obj1).ToList();

It worked for me. simply use

List<Type> liIDs = liIDs.Distinct().ToList<Type>();

Replace "Type" with your desired type e.g. int.


Might be easier to simply make sure that duplicates are not added to the list.

if(items.IndexOf(new_item) < 0) 
    items.add(new_item)

Use Linq's Union method.

Note: This solution requires no knowledge of Linq, aside from that it exists.

Code

Begin by adding the following to the top of your class file:

using System.Linq;

Now, you can use the following to remove duplicates from an object called, obj1:

obj1 = obj1.Union(obj1).ToList();

Note: Rename obj1 to the name of your object.

How it works

  1. The Union command lists one of each entry of two source objects. Since obj1 is both source objects, this reduces obj1 to one of each entry.

  2. The ToList() returns a new List. This is necessary, because Linq commands like Union returns the result as an IEnumerable result instead of modifying the original List or returning a new List.


Here's a simple solution that doesn't require any hard-to-read LINQ or any prior sorting of the list.

   private static void CheckForDuplicateItems(List<string> items)
    {
        if (items == null ||
            items.Count == 0)
            return;

        for (int outerIndex = 0; outerIndex < items.Count; outerIndex++)
        {
            for (int innerIndex = 0; innerIndex < items.Count; innerIndex++)
            {
                if (innerIndex == outerIndex) continue;
                if (items[outerIndex].Equals(items[innerIndex]))
                {
                    // Duplicate Found
                }
            }
        }
    }

Here's an extension method for removing adjacent duplicates in-situ. Call Sort() first and pass in the same IComparer. This should be more efficient than Lasse V. Karlsen's version which calls RemoveAt repeatedly (resulting in multiple block memory moves).

public static void RemoveAdjacentDuplicates<T>(this List<T> List, IComparer<T> Comparer)
{
    int NumUnique = 0;
    for (int i = 0; i < List.Count; i++)
        if ((i == 0) || (Comparer.Compare(List[NumUnique - 1], List[i]) != 0))
            List[NumUnique++] = List[i];
    List.RemoveRange(NumUnique, List.Count - NumUnique);
}

This takes distinct (the elements without duplicating elements) and convert it into a list again:

List<type> myNoneDuplicateValue = listValueWithDuplicate.Distinct().ToList();

In Java (I assume C# is more or less identical):

list = new ArrayList<T>(new HashSet<T>(list))

If you really wanted to mutate the original list:

List<T> noDupes = new ArrayList<T>(new HashSet<T>(list));
list.clear();
list.addAll(noDupes);

To preserve order, simply replace HashSet with LinkedHashSet.


All answers copy lists, or create a new list, or use slow functions, or are just painfully slow.

To my understanding, this is the fastest and cheapest method I know (also, backed by a very experienced programmer specialized on real-time physics optimization).

// Duplicates will be noticed after a sort O(nLogn)
list.Sort();

// Store the current and last items. Current item declaration is not really needed, and probably optimized by the compiler, but in case it's not...
int lastItem = -1;
int currItem = -1;

int size = list.Count;

// Store the index pointing to the last item we want to keep in the list
int last = size - 1;

// Travel the items from last to first O(n)
for (int i = last; i >= 0; --i)
{
    currItem = list[i];

    // If this item was the same as the previous one, we don't want it
    if (currItem == lastItem)
    {
        // Overwrite last in current place. It is a swap but we don't need the last
       list[i] = list[last];

        // Reduce the last index, we don't want that one anymore
        last--;
    }

    // A new item, we store it and continue
    else
        lastItem = currItem;
}

// We now have an unsorted list with the duplicates at the end.

// Remove the last items just once
list.RemoveRange(last + 1, size - last - 1);

// Sort again O(n logn)
list.Sort();

Final cost is:

nlogn + n + nlogn = n + 2nlogn = O(nlogn) which is pretty nice.

Note about RemoveRange: Since we cannot set the count of the list and avoid using the Remove funcions, I don't know exactly the speed of this operation but I guess it is the fastest way.


David J.'s answer is a good method, no need for extra objects, sorting, etc. It can be improved on however:

for (int innerIndex = items.Count - 1; innerIndex > outerIndex ; innerIndex--)

So the outer loop goes top bottom for the entire list, but the inner loop goes bottom "until the outer loop position is reached".

The outer loop makes sure the entire list is processed, the inner loop finds the actual duplicates, those can only happen in the part that the outer loop hasn't processed yet.

Or if you don't want to do bottom up for the inner loop you could have the inner loop start at outerIndex + 1.


If you have tow classes Product and Customer and we want to remove duplicate items from their list

public class Product
{
    public int Id { get; set; }
    public string ProductName { get; set; }
}

public class Customer
{
    public int Id { get; set; }
    public string CustomerName { get; set; }

}

You must define a generic class in the form below

public class ItemEqualityComparer<T> : IEqualityComparer<T> where T : class
{
    private readonly PropertyInfo _propertyInfo;

    public ItemEqualityComparer(string keyItem)
    {
        _propertyInfo = typeof(T).GetProperty(keyItem, BindingFlags.GetProperty | BindingFlags.Instance | BindingFlags.Public);
    }

    public bool Equals(T x, T y)
    {
        var xValue = _propertyInfo?.GetValue(x, null);
        var yValue = _propertyInfo?.GetValue(y, null);
        return xValue != null && yValue != null && xValue.Equals(yValue);
    }

    public int GetHashCode(T obj)
    {
        var propertyValue = _propertyInfo.GetValue(obj, null);
        return propertyValue == null ? 0 : propertyValue.GetHashCode();
    }
}

then, You can remove duplicate items in your list.

var products = new List<Product>
            {
                new Product{ProductName = "product 1" ,Id = 1,},
                new Product{ProductName = "product 2" ,Id = 2,},
                new Product{ProductName = "product 2" ,Id = 4,},
                new Product{ProductName = "product 2" ,Id = 4,},
            };
var productList = products.Distinct(new ItemEqualityComparer<Product>(nameof(Product.Id))).ToList();

var customers = new List<Customer>
            {
                new Customer{CustomerName = "Customer 1" ,Id = 5,},
                new Customer{CustomerName = "Customer 2" ,Id = 5,},
                new Customer{CustomerName = "Customer 2" ,Id = 5,},
                new Customer{CustomerName = "Customer 2" ,Id = 5,},
            };
var customerList = customers.Distinct(new ItemEqualityComparer<Customer>(nameof(Customer.Id))).ToList();

this code remove duplicate items by Id if you want remove duplicate items by other property, you can change nameof(YourClass.DuplicateProperty) same nameof(Customer.CustomerName) then remove duplicate items by CustomerName Property.


Sort it, then check two and two next to each others, as the duplicates will clump together.

Something like this:

list.Sort();
Int32 index = list.Count - 1;
while (index > 0)
{
    if (list[index] == list[index - 1])
    {
        if (index < list.Count - 1)
            (list[index], list[list.Count - 1]) = (list[list.Count - 1], list[index]);
        list.RemoveAt(list.Count - 1);
        index--;
    }
    else
        index--;
}

Notes:

  • Comparison is done from back to front, to avoid having to resort list after each removal
  • This example now uses C# Value Tuples to do the swapping, substitute with appropriate code if you can't use that
  • The end-result is no longer sorted

Another way in .Net 2.0

    static void Main(string[] args)
    {
        List<string> alpha = new List<string>();

        for(char a = 'a'; a <= 'd'; a++)
        {
            alpha.Add(a.ToString());
            alpha.Add(a.ToString());
        }

        Console.WriteLine("Data :");
        alpha.ForEach(delegate(string t) { Console.WriteLine(t); });

        alpha.ForEach(delegate (string v)
                          {
                              if (alpha.FindAll(delegate(string t) { return t == v; }).Count > 1)
                                  alpha.Remove(v);
                          });

        Console.WriteLine("Unique Result :");
        alpha.ForEach(delegate(string t) { Console.WriteLine(t);});
        Console.ReadKey();
    }

Examples related to c#

How can I convert this one line of ActionScript to C#? Microsoft Advertising SDK doesn't deliverer ads How to use a global array in C#? How to correctly write async method? C# - insert values from file into two arrays Uploading into folder in FTP? Are these methods thread safe? dotnet ef not found in .NET Core 3 HTTP Error 500.30 - ANCM In-Process Start Failure Best way to "push" into C# array

Examples related to list

Convert List to Pandas Dataframe Column Python find elements in one list that are not in the other Sorting a list with stream.sorted() in Java Python Loop: List Index Out of Range How to combine two lists in R How do I multiply each element in a list by a number? Save a list to a .txt file The most efficient way to remove first N elements in a list? TypeError: list indices must be integers or slices, not str Parse JSON String into List<string>

Examples related to generics

Instantiating a generic type Are these methods thread safe? The given key was not present in the dictionary. Which key? Using Java generics for JPA findAll() query with WHERE clause Using Spring RestTemplate in generic method with generic parameter How to create a generic array? Create a List of primitive int? How to have Java method return generic list of any type? Create a new object from type parameter in generic class What is the "proper" way to cast Hibernate Query.list() to List<Type>?

Examples related to duplicates

Remove duplicates from dataframe, based on two columns A,B, keeping row with max value in another column C Remove duplicates from a dataframe in PySpark How to "select distinct" across multiple data frame columns in pandas? How to find duplicate records in PostgreSQL Drop all duplicate rows across multiple columns in Python Pandas Left Join without duplicate rows from left table Finding duplicate integers in an array and display how many times they occurred How do I use SELECT GROUP BY in DataTable.Select(Expression)? How to delete duplicate rows in SQL Server? Python copy files to a new directory and rename if file name already exists