[java] Should I always use a parallel stream when possible?

With Java 8 and lambdas it's easy to iterate over collections as streams, and just as easy to use a parallel stream. Two examples from the docs, the second one using parallelStream:

myShapesCollection.stream()
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));

myShapesCollection.parallelStream() // <-- This one uses parallel
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));

As long as I don't care about the order, would it always be beneficial to use the parallel? One would think it is faster dividing the work on more cores.

Are there other considerations? When should parallel stream be used and when should the non-parallel be used?

(This question is asked to trigger a discussion about how and when to use parallel streams, not because I think always using them is a good idea.)

This question is related to java parallel-processing java-8 java-stream

The answer is


Never parallelize an infinite stream with a limit. Here is what happens:

    public static void main(String[] args) {
        // let's count to 1 in parallel
        System.out.println(
            IntStream.iterate(0, i -> i + 1)
                .parallel()
                .skip(1)
                .findFirst()
                .getAsInt());
    }

Result

    Exception in thread "main" java.lang.OutOfMemoryError
        at ...
        at java.base/java.util.stream.IntPipeline.findFirst(IntPipeline.java:528)
        at InfiniteTest.main(InfiniteTest.java:24)
    Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.stream.SpinedBuffer$OfInt.newArray(SpinedBuffer.java:750)
        at ...

Same if you use .limit(...)

Explanation here: Java 8, using .parallel in a stream causes OOM error

Similarly, don't use parallel if the stream is ordered and has much more elements than you want to process, e.g.

public static void main(String[] args) {
    // let's count to 1 in parallel
    System.out.println(
            IntStream.range(1, 1000_000_000)
                    .parallel()
                    .skip(100)
                    .findFirst()
                    .getAsInt());
}

This may run much longer because the parallel threads may work on plenty of number ranges instead of the crucial one 0-100, causing this to take very long time.


Other answers have already covered profiling to avoid premature optimization and overhead cost in parallel processing. This answer explains the ideal choice of data structures for parallel streaming.

As a rule, performance gains from parallelism are best on streams over ArrayList , HashMap , HashSet , and ConcurrentHashMap instances; arrays; int ranges; and long ranges. What these data structures have in common is that they can all be accurately and cheaply split into subranges of any desired sizes, which makes it easy to divide work among parallel threads. The abstraction used by the streams library to perform this task is the spliterator , which is returned by the spliterator method on Stream and Iterable.

Another important factor that all of these data structures have in common is that they provide good-to-excellent locality of reference when processed sequentially: sequential element references are stored together in memory. The objects referred to by those references may not be close to one another in memory, which reduces locality-of-reference. Locality-of-reference turns out to be critically important for parallelizing bulk operations: without it, threads spend much of their time idle, waiting for data to be transferred from memory into the processor’s cache. The data structures with the best locality of reference are primitive arrays because the data itself is stored contiguously in memory.

Source: Item #48 Use Caution When Making Streams Parallel, Effective Java 3e by Joshua Bloch


The Stream API was designed to make it easy to write computations in a way that was abstracted away from how they would be executed, making switching between sequential and parallel easy.

However, just because its easy, doesn't mean its always a good idea, and in fact, it is a bad idea to just drop .parallel() all over the place simply because you can.

First, note that parallelism offers no benefits other than the possibility of faster execution when more cores are available. A parallel execution will always involve more work than a sequential one, because in addition to solving the problem, it also has to perform dispatching and coordinating of sub-tasks. The hope is that you'll be able to get to the answer faster by breaking up the work across multiple processors; whether this actually happens depends on a lot of things, including the size of your data set, how much computation you are doing on each element, the nature of the computation (specifically, does the processing of one element interact with processing of others?), the number of processors available, and the number of other tasks competing for those processors.

Further, note that parallelism also often exposes nondeterminism in the computation that is often hidden by sequential implementations; sometimes this doesn't matter, or can be mitigated by constraining the operations involved (i.e., reduction operators must be stateless and associative.)

In reality, sometimes parallelism will speed up your computation, sometimes it will not, and sometimes it will even slow it down. It is best to develop first using sequential execution and then apply parallelism where

(A) you know that there's actually benefit to increased performance and

(B) that it will actually deliver increased performance.

(A) is a business problem, not a technical one. If you are a performance expert, you'll usually be able to look at the code and determine (B), but the smart path is to measure. (And, don't even bother until you're convinced of (A); if the code is fast enough, better to apply your brain cycles elsewhere.)

The simplest performance model for parallelism is the "NQ" model, where N is the number of elements, and Q is the computation per element. In general, you need the product NQ to exceed some threshold before you start getting a performance benefit. For a low-Q problem like "add up numbers from 1 to N", you will generally see a breakeven between N=1000 and N=10000. With higher-Q problems, you'll see breakevens at lower thresholds.

But the reality is quite complicated. So until you achieve experthood, first identify when sequential processing is actually costing you something, and then measure if parallelism will help.


JB hit the nail on the head. The only thing I can add is that Java 8 doesn't do pure parallel processing, it does paraquential. Yes I wrote the article and I've been doing F/J for thirty years so I do understand the issue.


I watched one of the presentations of Brian Goetz (Java Language Architect & specification lead for Lambda Expressions). He explains in detail the following 4 points to consider before going for parallelization:

Splitting / decomposition costs
– Sometimes splitting is more expensive than just doing the work!
Task dispatch / management costs
– Can do a lot of work in the time it takes to hand work to another thread.
Result combination costs
– Sometimes combination involves copying lots of data. For example, adding numbers is cheap whereas merging sets is expensive.
Locality
– The elephant in the room. This is an important point which everyone may miss. You should consider cache misses, if a CPU waits for data because of cache misses then you wouldn't gain anything by parallelization. That's why array-based sources parallelize the best as the next indices (near the current index) are cached and there are fewer chances that CPU would experience a cache miss.

He also mentions a relatively simple formula to determine a chance of parallel speedup.

NQ Model:

N x Q > 10000

where,
N = number of data items
Q = amount of work per item


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to parallel-processing

Custom thread pool in Java 8 parallel stream How to do parallel programming in Python? Should I always use a parallel stream when possible? How to create threads in nodejs Shared-memory objects in multiprocessing How do I parallelize a simple Python loop? No ConcurrentList<T> in .Net 4.0? What are the differences between stateless and stateful systems, and how do they impact parallelism? How to abort a Task like aborting a Thread (Thread.Abort method)? Can Powershell Run Commands in Parallel?

Examples related to java-8

Default interface methods are only supported starting with Android N Class has been compiled by a more recent version of the Java Environment Why is ZoneOffset.UTC != ZoneId.of("UTC")? Modify property value of the objects in list using Java 8 streams How to use if-else logic in Java 8 stream forEach Android Studio Error: Error:CreateProcess error=216, This version of %1 is not compatible with the version of Windows you're running Error:could not create the Java Virtual Machine Error:A fatal exception has occured.Program will exit What are functional interfaces used for in Java 8? java.time.format.DateTimeParseException: Text could not be parsed at index 21 Java 8 lambda get and remove element from list

Examples related to java-stream

Sorting a list with stream.sorted() in Java Modify property value of the objects in list using Java 8 streams How to use if-else logic in Java 8 stream forEach Java 8 lambda get and remove element from list Create list of object from another using Java 8 Streams Java 8 Stream API to find Unique Object matching a property value Reverse a comparator in Java 8 Ignore duplicates when producing map using streams Modifying Objects within stream in Java8 while iterating How can I get a List from some class properties with Java 8 Stream?