The Stream API was designed to make it easy to write computations in a way that was abstracted away from how they would be executed, making switching between sequential and parallel execution easy.
However, just because it's easy doesn't mean it's always a good idea; in fact, it is a bad idea to drop .parallel() all over the place simply because you can.
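For example, here's what that switch looks like in practice (a minimal sketch; the sum-of-squares pipeline is just an arbitrary illustration):

```java
import java.util.stream.IntStream;

public class SequentialVsParallel {
    public static void main(String[] args) {
        // Identical pipeline; only the execution mode differs.
        long sequential = IntStream.rangeClosed(1, 1_000_000)
                .mapToLong(i -> (long) i * i)
                .sum();

        long parallel = IntStream.rangeClosed(1, 1_000_000)
                .parallel()              // the only change: opt in to parallel execution
                .mapToLong(i -> (long) i * i)
                .sum();

        System.out.println(sequential == parallel); // true: same result, different execution
    }
}
```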
First, note that parallelism offers no benefits other than the possibility of faster execution when more cores are available. A parallel execution will always involve more work than a sequential one, because in addition to solving the problem, it also has to dispatch and coordinate sub-tasks. The hope is that you'll be able to get to the answer faster by breaking up the work across multiple processors; whether this actually happens depends on a lot of things, including the size of your data set, how much computation you are doing on each element, the nature of the computation (specifically, does the processing of one element interact with the processing of others?), the number of processors available, and the number of other tasks competing for those processors.
Further, note that parallelism often exposes nondeterminism in the computation that sequential implementations hide; sometimes this doesn't matter, or it can be mitigated by constraining the operations involved (e.g., reduction operators must be stateless and associative).
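To make that concrete, here's a small sketch of both effects (the specific pipelines are just illustrative assumptions):

```java
import java.util.stream.IntStream;

public class ParallelNondeterminism {
    public static void main(String[] args) {
        // Encounter order is not preserved by forEach on a parallel stream:
        // this may print 0..9 in a different order on every run.
        IntStream.range(0, 10).parallel().forEach(System.out::println);

        // Subtraction is not associative (and 0 is not a true identity for it),
        // so this reduction violates the contract of reduce(); sequentially it
        // yields -45, but in parallel the result can vary from run to run.
        int bad = IntStream.range(0, 10).parallel().reduce(0, (a, b) -> a - b);
        System.out.println(bad);
    }
}
```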
In reality, sometimes parallelism will speed up your computation, sometimes it will not, and sometimes it will even slow it down. It is best to develop first using sequential execution and then apply parallelism where (A) you know that there's actually a benefit to increased performance and (B) parallelism will actually deliver that increased performance.
(A) is a business problem, not a technical one. If you are a performance expert, you'll usually be able to look at the code and determine (B), but the smart path is to measure. (And don't even bother until you're convinced of (A); if the code is fast enough, better to apply your brain cycles elsewhere.)
The simplest performance model for parallelism is the "NQ" model, where N is the number of elements and Q is the amount of computation per element. In general, you need the product NQ to exceed some threshold before you start getting a performance benefit. For a low-Q problem like "add up the numbers from 1 to N", you will generally see a breakeven somewhere between N=1000 and N=10000. With higher-Q problems, you'll see breakevens at lower thresholds.
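If you want to probe that breakeven yourself, a crude sketch like the following will do for a first look (note: System.nanoTime around a single run is noisy and ignores JIT warm-up; for real numbers, use a proper harness such as JMH):

```java
import java.util.stream.LongStream;

public class NQBreakeven {
    public static void main(String[] args) {
        // Low-Q workload from the text: add up the numbers from 1 to N.
        for (long n : new long[]{1_000, 10_000, 100_000, 1_000_000, 10_000_000}) {
            long t0 = System.nanoTime();
            long seq = LongStream.rangeClosed(1, n).sum();
            long t1 = System.nanoTime();
            long par = LongStream.rangeClosed(1, n).parallel().sum();
            long t2 = System.nanoTime();
            System.out.printf("N=%,d  sequential=%,d ns  parallel=%,d ns  (results equal: %b)%n",
                    n, t1 - t0, t2 - t1, seq == par);
        }
    }
}
```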
But the reality is quite complicated. So until you achieve experthood, first identify when sequential processing is actually costing you something, and then measure if parallelism will help.