[c#] how to remove empty strings from list, then remove duplicate values from a list

Lets say I have a list of some column values coming from a table, how do I remove empty strings and duplicate values. Please see the following code:

List<string> dtList = dtReportsList.AsEnumerable().Select(dr => dr.Field<string>("column1")).ToList();

This is what I have coded just now but but Amiram's code is way more elegant, so I will choose that answer here is how I did it:

DataTable dtReportsList = someclass.GetReportsList();

        if (dtReportsList.Rows.Count > 0)
       { 
           List<string> dtList = dtReportsList.AsEnumerable().Select(dr => dr.Field<string>("column1")).ToList();
           dtList.RemoveAll(x=>x == "");
           dtList = dtList.Distinct().ToList();         

           rcboModule.DataSource = dtList;
           rcboModule.DataBind();               
           rcboModule.Items.Insert(0, new RadComboBoxItem("All", "All"));
       }

This question is related to c# linq

The answer is


Amiram Korach solution is indeed tidy. Here's an alternative for the sake of versatility.

var count = dtList.Count;
// Perform a reverse tracking.
for (var i = count - 1; i > -1; i--)
{
    if (dtList[i]==string.Empty) dtList.RemoveAt(i);
}
// Keep only the unique list items.
dtList = dtList.Distinct().ToList();

To simplify Amiram Korach's solution:

dtList.RemoveAll(s => string.IsNullOrWhiteSpace(s))

No need to use Distinct() or ToList()


Amiram's answer is correct, but Distinct() as implemented is an N2 operation; for each item in the list, the algorithm compares it to all the already processed elements, and returns it if it's unique or ignores it if not. We can do better.

A sorted list can be deduped in linear time; if the current element equals the previous element, ignore it, otherwise return it. Sorting is NlogN, so even having to sort the collection, we get some benefit:

public static IEnumerable<T> SortAndDedupe<T>(this IEnumerable<T> input)
{
   var toDedupe = input.OrderBy(x=>x);

   T prev;
   foreach(var element in toDedupe)
   {
      if(element == prev) continue;

      yield return element;
      prev = element;      
   }
}

//Usage
dtList  = dtList.Where(s => !string.IsNullOrWhitespace(s)).SortAndDedupe().ToList();

This returns the same elements; they're just sorted.