[java] How to remove duplicates from a list?

I want to remove duplicates from a list but what I am doing is not working:

List<Customer> listCustomer = new ArrayList<Customer>();    
for (Customer customer: tmpListCustomer)
{
  if (!listCustomer.contains(customer)) 
  {
    listCustomer.add(customer);
  }
 }

This question is related to java list collections duplicates

The answer is


The correct answer for Java is use a Set. If you already have a List<Customer> and want to de duplicate it

Set<Customer> s = new HashSet<Customer>(listCustomer);

Otherise just use a Set implemenation HashSet, TreeSet directly and skip the List construction phase.

You will need to override hashCode() and equals() on your domain classes that are put in the Set as well to make sure that the behavior you want actually what you get. equals() can be as simple as comparing unique ids of the objects to as complex as comparing every field. hashCode() can be as simple as returning the hashCode() of the unique id' String representation or the hashCode().


As others have mentioned, you are probably not implementing equals() correctly.

However, you should also note that this code is considered quite inefficient, since the runtime could be the number of elements squared.

You might want to consider using a Set structure instead of a List instead, or building a Set first and then turning it into a list.


Two suggestions:

  • Use a HashSet instead of an ArrayList. This will speed up the contains() checks considerably if you have a long list

  • Make sure Customer.equals() and Customer.hashCode() are implemented properly, i.e. they should be based on the combined values of the underlying fields in the customer object.


Assuming you want to keep the current order and don't want a Set, perhaps the easiest is:

List<Customer> depdupeCustomers =
    new ArrayList<>(new LinkedHashSet<>(customers));

If you want to change the original list:

Set<Customer> depdupeCustomers = new LinkedHashSet<>(customers);
customers.clear();
customers.addAll(dedupeCustomers);

I suspect you might not have Customer.equals() implemented properly (or at all).

List.contains() uses equals() to verify whether any of its elements is identical to the object passed as parameter. However, the default implementation of equals tests for physical identity, not value identity. So if you have not overwritten it in Customer, it will return false for two distinct Customer objects having identical state.

Here are the nitty-gritty details of how to implement equals (and hashCode, which is its pair - you must practically always implement both if you need to implement either of them). Since you haven't shown us the Customer class, it is difficult to give more concrete advice.

As others have noted, you are better off using a Set rather than doing the job by hand, but even for that, you still need to implement those methods.


List ? Set ? List (distinct)

Just add all your elements to a Set: it does not allow it's elements to be repeated. If you need a list afterwards, use new ArrayList(theSet) constructor afterwards (where theSet is your resulting set).


Using java 8 stream api.

    List<String> list = new ArrayList<>();
    list.add("one");
    list.add("one");
    list.add("two");
    System.out.println(list);
    Collection<String> c = list.stream().collect(Collectors.toSet());
    System.out.println(c);

Output:

Before values : [one, one, two]

After Values : [one, two]


java 8 update
you can use stream of array as below:

Arrays.stream(yourArray).distinct()
                    .collect(Collectors.toList());

The "contains" method searched for whether the list contains an entry that returns true from Customer.equals(Object o). If you have not overridden equals(Object) in Customer or one of its parents then it will only search for an existing occurrence of the same object. It may be this was what you wanted, in which case your code should work. But if you were looking for not having two objects both representing the same customer, then you need to override equals(Object) to return true when that is the case.

It is also true that using one of the implementations of Set instead of List would give you duplicate removal automatically, and faster (for anything other than very small Lists). You will still need to provide code for equals.

You should also override hashCode() when you override equals().


Nearly all of the above answers are right but what I suggest is to use a Map or Set while creating the related list, not after to gain performance. Because converting a list to a Set or Map and then reconverting it to a List again is a trivial work.

Sample Code:

Set<String> stringsSet = new LinkedHashSet<String>();//A Linked hash set 
//prevents the adding order of the elements
for (String string: stringsList) {
    stringsSet.add(string);
}
return new ArrayList<String>(stringsSet);

IMHO best way how to do it these days:

Suppose you have a Collection "dups" and you want to create another Collection containing the same elements but with all duplicates eliminated. The following one-liner does the trick.

Collection<collectionType> noDups = new HashSet<collectionType>(dups);

It works by creating a Set which, by definition, cannot contain duplicates.

Based on oracle doc.


private void removeTheDuplicates(List<Customer>myList) {
    for(ListIterator<Customer>iterator = myList.listIterator(); iterator.hasNext();) {
        Customer customer = iterator.next();
        if(Collections.frequency(myList, customer) > 1) {
            iterator.remove();
        }
    }
    System.out.println(myList.toString());

}

Does Customer implement the equals() contract?

If it doesn't implement equals() and hashCode(), then listCustomer.contains(customer) will check to see if the exact same instance already exists in the list (By instance I mean the exact same object--memory address, etc). If what you are looking for is to test whether or not the same Customer( perhaps it's the same customer if they have the same customer name, or customer number) is in the list already, then you would need to override equals() to ensure that it checks whether or not the relevant fields(e.g. customer names) match.

Note: Don't forget to override hashCode() if you are going to override equals()! Otherwise, you might get trouble with your HashMaps and other data structures. For a good coverage of why this is and what pitfalls to avoid, consider having a look at Josh Bloch's Effective Java chapters on equals() and hashCode() (The link only contains iformation about why you must implement hashCode() when you implement equals(), but there is good coverage about how to override equals() too).

By the way, is there an ordering restriction on your set? If there isn't, a slightly easier way to solve this problem is use a Set<Customer> like so:

Set<Customer> noDups = new HashSet<Customer>();
noDups.addAll(tmpListCustomer);
return new ArrayList<Customer>(noDups);

Which will nicely remove duplicates for you, since Sets don't allow duplicates. However, this will lose any ordering that was applied to tmpListCustomer, since HashSet has no explicit ordering (You can get around that by using a TreeSet, but that's not exactly related to your question). This can simplify your code a little bit.


The cleanest way is:

List<XXX> lstConsultada = dao.findByPropertyList(YYY);
List<XXX> lstFinal = new ArrayList<XXX>(new LinkedHashSet<GrupoOrigen>(XXX));

and override hascode and equals over the Id's properties of each entity


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to list

Convert List to Pandas Dataframe Column Python find elements in one list that are not in the other Sorting a list with stream.sorted() in Java Python Loop: List Index Out of Range How to combine two lists in R How do I multiply each element in a list by a number? Save a list to a .txt file The most efficient way to remove first N elements in a list? TypeError: list indices must be integers or slices, not str Parse JSON String into List<string>

Examples related to collections

Kotlin's List missing "add", "remove", Map missing "put", etc? How to unset (remove) a collection element after fetching it? How can I get a List from some class properties with Java 8 Stream? Java 8 stream map to list of keys sorted by values How to convert String into Hashmap in java How can I turn a List of Lists into a List in Java 8? MongoDB Show all contents from all collections Get nth character of a string in Swift programming language Java 8 Distinct by property Is there a typescript List<> and/or Map<> class/library?

Examples related to duplicates

Remove duplicates from dataframe, based on two columns A,B, keeping row with max value in another column C Remove duplicates from a dataframe in PySpark How to "select distinct" across multiple data frame columns in pandas? How to find duplicate records in PostgreSQL Drop all duplicate rows across multiple columns in Python Pandas Left Join without duplicate rows from left table Finding duplicate integers in an array and display how many times they occurred How do I use SELECT GROUP BY in DataTable.Select(Expression)? How to delete duplicate rows in SQL Server? Python copy files to a new directory and rename if file name already exists