[java] Getting the difference between two sets

So if I have two sets:

Set<Integer> test1 = new HashSet<Integer>();
test1.add(1);
test1.add(2);
test1.add(3);

Set<Integer> test2 = new HashSet<Integer>();
test2.add(1);
test2.add(2);
test2.add(3);
test2.add(4);
test2.add(5);

Is there a way to compare them and only have a set of 4 and 5 returned?

This question is related to java set

The answer is


If you are using Java 8, you could try something like this:

public Set<Number> difference(final Set<Number> set1, final Set<Number> set2){
    final Set<Number> larger = set1.size() > set2.size() ? set1 : set2;
    final Set<Number> smaller = larger.equals(set1) ? set2 : set1;
    return larger.stream().filter(n -> !smaller.contains(n)).collect(Collectors.toSet());
}

Just to put one example here (system is in existingState, and we want to find elements to remove (elements that are not in newState but are present in existingState) and elements to add (elements that are in newState but are not present in existingState) :

public class AddAndRemove {

  static Set<Integer> existingState = Set.of(1,2,3,4,5);
  static Set<Integer> newState = Set.of(0,5,2,11,3,99);

  public static void main(String[] args) {

    Set<Integer> add = new HashSet<>(newState);
    add.removeAll(existingState);

    System.out.println("Elements to add : " + add);

    Set<Integer> remove = new HashSet<>(existingState);
    remove.removeAll(newState);

    System.out.println("Elements to remove : " + remove);

  }
}

would output this as a result:

Elements to add : [0, 99, 11]
Elements to remove : [1, 4]

If you use Guava (former Google Collections) library there is a solution:

SetView<Number> difference = com.google.common.collect.Sets.difference(test2, test1);

The returned SetView is a Set, it is a live representation you can either make immutable or copy to another set. test1 and test2 are left intact.


Java 8

We can make use of removeIf which takes a predicate to write a utility method as:

// computes the difference without modifying the sets
public static <T> Set<T> differenceJava8(final Set<T> setOne, final Set<T> setTwo) {
     Set<T> result = new HashSet<T>(setOne);
     result.removeIf(setTwo::contains);
     return result;
}

And in case we are still at some prior version then we can use removeAll as:

public static <T> Set<T> difference(final Set<T> setOne, final Set<T> setTwo) {
     Set<T> result = new HashSet<T>(setOne);
     result.removeAll(setTwo);
     return result;
}

You can make a union using .addAll(), and an intersection using .retainAll(), of the two sets and use .removeIf(), to remove the intersection (or the duplicated element) from the union.

HashSet union = new HashSet(group1);
union.addAll(group2);
        
System.out.println("Union: " + union);
        
HashSet intersection = new HashSet(group1);
intersection.retainAll(group2);
        
System.out.println("Intersection: " + intersection);
        
HashSet difference = new HashSet(union);
difference.removeIf(n -> (difference.contains(intersection)));
        
System.out.println("Difference: " + difference);

Adding a solution which I've recently used myself and haven't seen mentioned here. If you have Apache Commons Collections available then you can use the SetUtils#difference method:

// Returns all the elements of test2 which are not in test1
SetUtils.difference(test2, test1) 

Note that according to the documentation the returned set is an unmodifiable view:

Returns a unmodifiable view containing the difference of the given Sets, denoted by a \ b (or a - b). The returned view contains all elements of a that are not a member of b.

Full documentation: https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/SetUtils.html#difference-java.util.Set-java.util.Set-


You can use CollectionUtils.disjunction to get all differences or CollectionUtils.subtract to get the difference in the first collection.

Here is an example of how to do that:

    var collection1 = List.of(1, 2, 3, 4, 5);
    var collection2 = List.of(2, 3, 5, 6);
    System.out.println(StringUtils.join(collection1, " , "));
    System.out.println(StringUtils.join(collection2, " , "));
    System.out.println(StringUtils.join(CollectionUtils.subtract(collection1, collection2), " , "));
    System.out.println(StringUtils.join(CollectionUtils.retainAll(collection1, collection2), " , "));
    System.out.println(StringUtils.join(CollectionUtils.collate(collection1, collection2), " , "));
    System.out.println(StringUtils.join(CollectionUtils.disjunction(collection1, collection2), " , "));
    System.out.println(StringUtils.join(CollectionUtils.intersection(collection1, collection2), " , "));
    System.out.println(StringUtils.join(CollectionUtils.union(collection1, collection2), " , "));

Yes:

test2.removeAll(test1)

Although this will mutate test2, so create a copy if you need to preserve it.

Also, you probably meant <Integer> instead of <int>.