What is the best way to remove duplicate entries from a Data Table?
Completely distinct rows:
public static DataTable Distinct(this DataTable dt) => dt.DefaultView.ToTable(true);
Distinct by particular column(s) (note that only the columns named in "distinctColumnNames" are returned in the resulting DataTable):
public static DataTable Distinct(this DataTable dt, params string[] distinctColumnNames) =>
    dt.DefaultView.ToTable(true, distinctColumnNames);
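To illustrate the note about returned columns, here is a minimal sketch on a made-up table. The class name `DistinctColumnsDemo`, its methods, and the "Id"/"Name" columns are hypothetical and exist only for this example; `DataView.ToTable(bool, params string[])` is the same call used above.

```csharp
using System;
using System.Data;

public static class DistinctColumnsDemo
{
    // Hypothetical sample data: two rows share the same Name but differ in Id.
    public static DataTable BuildSample()
    {
        var dt = new DataTable();
        dt.Columns.Add("Id", typeof(int));
        dt.Columns.Add("Name", typeof(string));
        dt.Rows.Add(1, "Ann");
        dt.Rows.Add(2, "Ann");
        dt.Rows.Add(3, "Bob");
        return dt;
    }

    // Distinct over "Name" only: the result contains just the "Name" column,
    // so the "Id" values are not carried over.
    public static DataTable DistinctNames(DataTable dt) =>
        dt.DefaultView.ToTable(true, "Name");
}
```

With this sample, `DistinctNames` yields a one-column table with two rows. If the other columns are needed, a GroupBy-based approach that keeps whole rows is the better fit.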
Distinct by particular column (preserves all columns in given DataTable):
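// Requires: using System.Data; using System.Linq;
// (plus a reference to System.Data.DataSetExtensions on .NET Framework)
public static void Distinct(this DataTable dataTable, string distinctColumnName)
{
    // CopyToDataTable throws on an empty sequence, so skip empty tables.
    if (dataTable.Rows.Count == 0) return;

    var distinctResult = new DataTable();
    distinctResult.Merge(
        dataTable.AsEnumerable()
            .GroupBy(row => row.Field<object>(distinctColumnName))
            .Select(group => group.First()) // keep the first row of each group
            .CopyToDataTable()
    );

    if (distinctResult.DefaultView.Count < dataTable.DefaultView.Count)
    {
        dataTable.Clear();
        dataTable.Merge(distinctResult);
        dataTable.AcceptChanges();
    }
}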
Here's an easy and fast way using AsEnumerable().Distinct():
private DataTable RemoveDuplicatesRecords(DataTable dt)
{
    // DataRowComparer.Default compares rows by value across all columns
    var uniqueRows = dt.AsEnumerable().Distinct(DataRowComparer.Default);
    return uniqueRows.CopyToDataTable();
}
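One caveat with this approach: `CopyToDataTable()` throws an `InvalidOperationException` when the source sequence contains no rows. A minimal sketch of a guarded variant follows; the class name `SafeDistinct` and method name `DistinctRows` are hypothetical, and returning an empty clone for an empty input is an assumption about the desired behavior.

```csharp
using System;
using System.Data;
using System.Linq;

public static class SafeDistinct
{
    // Returns a new table holding the unique rows of dt (compared by value).
    public static DataTable DistinctRows(DataTable dt)
    {
        var uniqueRows = dt.AsEnumerable()
                           .Distinct(DataRowComparer.Default)
                           .ToList();

        // CopyToDataTable cannot build a table from zero rows, so fall back
        // to an empty table with the same schema.
        return uniqueRows.Count == 0 ? dt.Clone() : uniqueRows.CopyToDataTable();
    }
}
```

The original table is left untouched in both cases; the caller decides whether to replace it with the returned one.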
A simple way would be:
var newDt = dt.AsEnumerable()
    .GroupBy(x => x.Field<int>("ColumnName"))
    .Select(y => y.First())
    .CopyToDataTable();
/* To eliminate duplicate rows */
private void RemoveDuplicates(DataTable dt)
{
    if (dt.Rows.Count > 0)
    {
        // Walk backwards so deleting a row never shifts the rows still to be checked
        for (int i = dt.Rows.Count - 1; i > 0; i--)
        {
            for (int j = i - 1; j >= 0; j--)
            {
                if (Convert.ToInt32(dt.Rows[i]["ID"]) == Convert.ToInt32(dt.Rows[j]["ID"]) && dt.Rows[i]["Name"].ToString() == dt.Rows[j]["Name"].ToString())
                {
                    dt.Rows[i].Delete();
                    break;
                }
            }
        }
        dt.AcceptChanges();
    }
}
There is a simple way using the LINQ GroupBy method. Grouping on the first column yields each distinct value once:
var distinctValues = dt.AsEnumerable()
    .GroupBy(row => row[0])
    .Select(g => g.Key);
foreach (var d in distinctValues)
    Console.WriteLine(d);
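The same grouping can also report which values actually occur more than once, by keeping only groups with a count above one. This is a minimal sketch; the class name `DuplicateFinder` and method name `FindDuplicateKeys` are hypothetical, and on .NET Framework `AsEnumerable()` needs a reference to System.Data.DataSetExtensions.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DuplicateFinder
{
    // Returns the values of the first column that appear in more than one row.
    public static List<object> FindDuplicateKeys(DataTable dt) =>
        dt.AsEnumerable()
          .GroupBy(row => row[0])
          .Where(group => group.Count() > 1)
          .Select(group => group.Key)
          .ToList();
}
```

This only inspects the first column, like the snippet above; it does not modify the table.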
To apply distinct across all DataTable columns, you can collect the column names into a string array:
public static DataTable RemoveDuplicateRows(this DataTable dataTable)
{
    List<string> columnNames = new List<string>();
    foreach (DataColumn col in dataTable.Columns)
    {
        columnNames.Add(col.ColumnName);
    }
    return dataTable.DefaultView.ToTable(true, columnNames.ToArray());
}
As you can see, I wrote it as an extension method on the DataTable class.
Do this on your current working DataTable (dtEmp here):
DataTable distinctTable = dtEmp.DefaultView.ToTable( /*distinct*/ true);
It's nice.
You can use the DefaultView.ToTable method of a DataTable to do the filtering like this (adapt to C#):
Public Sub RemoveDuplicateRows(ByRef rDataTable As DataTable)
    Dim pNewDataTable As DataTable
    Dim pCurrentRowCopy As DataRow
    Dim pColumnList As New List(Of String)
    Dim pColumn As DataColumn

    'Build column list
    For Each pColumn In rDataTable.Columns
        pColumnList.Add(pColumn.ColumnName)
    Next

    'Filter by all columns
    pNewDataTable = rDataTable.DefaultView.ToTable(True, pColumnList.ToArray)
    rDataTable = rDataTable.Clone

    'Import rows into original table structure
    For Each pCurrentRowCopy In pNewDataTable.Rows
        rDataTable.ImportRow(pCurrentRowCopy)
    Next
End Sub
This approach fetches only the distinct rows from a DataTable based on multiple columns.
public DataTable removeDuplicatesRows(DataTable dt)
{
    DataTable uniqueCols = dt.DefaultView.ToTable(true, "RNORFQNo", "ManufacturerPartNo", "RNORFQId", "ItemId", "RNONo", "Quantity", "NSNNo", "UOMName", "MOQ", "ItemDescription");
    return uniqueCols;
}
Call this method and assign the returned value to your DataTable. In the code above, RNORFQNo, ManufacturerPartNo, RNORFQId, ItemId, RNONo, Quantity, NSNNo, UOMName, MOQ and ItemDescription are the columns on which we want distinct values.
Source: Stackoverflow.com