I watched one of Brian Goetz's presentations (he is the Java Language Architect and specification lead for Lambda Expressions). He explains in detail the following four points to consider before going parallel:
Splitting / decomposition costs
– Sometimes splitting is more expensive than just doing the work!
Task dispatch / management costs
– You can do a lot of work in the time it takes to hand work off to another thread.
Result combination costs
– Sometimes combination involves copying lots of data. For example, adding numbers is cheap whereas merging sets is expensive.
Locality
– The elephant in the room, and the point that is easiest to miss: you have to think about cache misses. If a CPU spends its time waiting for data because of cache misses, parallelization gains you nothing. That's why array-based sources parallelize best: the elements next to the current index are already in the cache, so the CPU is much less likely to stall on a miss. (A small sketch of this follows the list.)
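As a rough illustration of the splitting and locality points above, here is a minimal sketch (the class name, element count, and timing style are mine, not from the talk) that sums the same data once from an array-backed stream and once from a LinkedList source. The array splits cheaply and keeps neighbouring elements on the same cache lines; the linked list splits poorly and its nodes are scattered across the heap.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class LocalityDemo {
    public static void main(String[] args) {
        int n = 10_000_000;

        // Array-backed source: splits in O(1), neighbouring elements share
        // cache lines, so parallelization has a real chance to pay off.
        int[] array = IntStream.range(0, n).toArray();
        long t0 = System.nanoTime();
        long arraySum = IntStream.of(array).parallel().asLongStream().sum();
        System.out.printf("array  sum=%d  %d ms%n",
                arraySum, (System.nanoTime() - t0) / 1_000_000);

        // LinkedList source: splitting has to walk the list node by node and
        // the nodes are scattered on the heap, so cache misses dominate.
        List<Integer> linked = IntStream.range(0, n).boxed()
                .collect(Collectors.toCollection(LinkedList::new));
        long t1 = System.nanoTime();
        long linkedSum = linked.parallelStream()
                .mapToLong(Integer::longValue).sum();
        System.out.printf("linked sum=%d  %d ms%n",
                linkedSum, (System.nanoTime() - t1) / 1_000_000);
    }
}
```

The crude nanoTime timing is only meant to make the difference visible; for real measurements you would use a benchmarking harness such as JMH, since JIT warm-up and GC can easily distort a single run.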
He also mentions a relatively simple formula for estimating whether parallelization has a chance of producing a speedup.
NQ Model:
N x Q > 10000
where,
N = number of data items
Q = amount of work per item
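As a back-of-the-envelope application of the N x Q heuristic (the 10,000 threshold comes from the talk; the two workloads below are made up purely for illustration), compare a tiny, cheap-per-item job with a small but expensive-per-item job:

```java
import java.math.BigInteger;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class NqModelDemo {
    public static void main(String[] args) {
        // Low N x Q: 1,000 items, trivial work per item (one addition each).
        // N * Q is far below 10,000, so stay sequential.
        int smallSum = IntStream.rangeClosed(1, 1_000).sum();
        System.out.println("sequential sum = " + smallSum);

        // High N x Q: still only 1,000 items, but each item costs thousands of
        // "units" of work (probabilistic primality testing of 512-bit numbers),
        // so N * Q is well above the threshold and a parallel stream is likely
        // to pay off.
        Random rnd = new Random(42);
        List<BigInteger> candidates = IntStream.range(0, 1_000)
                .mapToObj(i -> new BigInteger(512, rnd))
                .collect(Collectors.toList());
        long primes = candidates.parallelStream()
                .filter(b -> b.isProbablePrime(50))
                .count();
        System.out.println("probable primes found = " + primes);
    }
}
```

The model is only a rule of thumb: it tells you when parallelism is worth trying, not that it will win, so you should still measure on your own data and hardware.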