Java 8 - Best way to transform a list: map or foreach?

Question

I have a list, myListToParse, where I want to filter the elements, apply a method to each element, and add the result to another list, myFinalList.

With Java 8 I noticed that I can do it in two different ways. I would like to know which of them is more efficient and understand why one way is better than the other.

I'm open to any suggestion about a third way.

Method 1:

myFinalList = new ArrayList<>();
myListToParse.stream()
        .filter(elt -> elt != null)
        .forEach(elt -> myFinalList.add(doSomething(elt)));

Method 2:

myFinalList = myListToParse.stream()
        .filter(elt -> elt != null)
        .map(elt -> doSomething(elt))
        .collect(Collectors.toList());

User · Answer

I agree with the existing answers that the second form is better because it does not have any side effects and is easier to parallelise (just use a parallel stream).

Performance-wise, they appear equivalent until you start using parallel streams, at which point map performs much better. See the micro-benchmark results below:

Benchmark                         Mode  Samples    Score   Error  Units
SO28319064.forEach                avgt      100  187.310 ± 1.768  ms/op
SO28319064.map                    avgt      100  189.180 ± 1.692  ms/op
SO28319064.mapWithParallelStream  avgt      100   55.577 ± 0.782  ms/op

You can't boost the first example in the same manner because forEach is a terminal method - it returns void - so you are forced to use a stateful lambda. But that is really a bad idea if you are using parallel streams.
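To illustrate the point about stateful lambdas, here is a minimal sketch. The class name and the body of doSomething (upper-casing a string) are assumptions made only for illustration. It contrasts a forEach that needs a synchronized list to be safe on a parallel stream with a map/collect pipeline that needs no shared state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class ParallelSideEffects {

    // Hypothetical stand-in for the question's doSomething.
    static String doSomething(String s) {
        return s.toUpperCase();
    }

    // Stateful variant: the lambda mutates a list declared outside the pipeline.
    // A plain ArrayList is not thread-safe, so a synchronized wrapper is used here;
    // even then, the order of the results is not guaranteed on a parallel stream.
    static List<String> withForEach(List<String> input) {
        List<String> result = Collections.synchronizedList(new ArrayList<>());
        input.parallelStream()
             .filter(Objects::nonNull)
             .forEach(elt -> result.add(doSomething(elt)));
        return result;
    }

    // Stateless variant: no shared mutable state; the collector assembles the result.
    static List<String> withMap(List<String> input) {
        return input.parallelStream()
                    .filter(Objects::nonNull)
                    .map(ParallelSideEffects::doSomething)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", null, "b", "c", null, "d");
        System.out.println(withForEach(input)); // same elements, order may vary
        System.out.println(withMap(input));     // encounter order is kept
    }
}
```

The synchronized wrapper makes the first variant safe, but every add then contends for the same lock, which is one plausible reason the stateful form does not benefit from parallelism.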

Finally, note that your second snippet can be written slightly more concisely with method references and static imports:

myFinalList = myListToParse.stream()
    .filter(Objects::nonNull)
    .map(this::doSomething)
    .collect(toList());

User · Answer

If using third-party libraries is OK, cyclops-react defines lazy extended collections with this functionality built in. For example, we could simply write:

ListX myListToParse;

ListX myFinalList = myListToParse.filter(elt -> elt != null)
                                 .map(elt -> doSomething(elt));

myFinalList is not evaluated until first access (thereafter the materialized list is cached and reused).

[Disclosure: I am the lead developer of cyclops-react.]

User · Answer

I prefer the second way.

When you use the first way, if you decide to use a parallel stream to improve performance, you'll have no control over the order in which the elements will be added to the output list by forEach.

When you use toList, the Streams API will preserve the order even if you use a parallel stream.
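A minimal sketch of the ordering point, assuming a hypothetical doSomething that multiplies by ten. On a parallel stream, collect(Collectors.toList()) returns results in encounter order, whereas plain forEach makes no such promise. forEachOrdered restores the order for the side-effecting variant, typically at the cost of some parallelism.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelOrdering {

    // Hypothetical transform, standing in for doSomething.
    static int doSomething(int n) {
        return n * 10;
    }

    // collect(toList()) keeps the encounter order of the source, even in parallel.
    static List<Integer> collected(List<Integer> input) {
        return input.parallelStream()
                    .map(ParallelOrdering::doSomething)
                    .collect(Collectors.toList());
    }

    // forEachOrdered performs the action one element at a time, in encounter
    // order, so the target list is filled in the original order.
    static List<Integer> viaForEachOrdered(List<Integer> input) {
        List<Integer> result = new ArrayList<>();
        input.parallelStream()
             .map(ParallelOrdering::doSomething)
             .forEachOrdered(result::add);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> input = IntStream.rangeClosed(1, 8).boxed().collect(Collectors.toList());
        System.out.println(collected(input));         // [10, 20, 30, 40, 50, 60, 70, 80]
        System.out.println(viaForEachOrdered(input)); // [10, 20, 30, 40, 50, 60, 70, 80]
    }
}
```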

User · Answer

If you use Eclipse Collections you can use the collectIf() method.

MutableList<Integer> source =
    Lists.mutable.with(1, null, 2, null, 3, null, 4, null, 5);

MutableList<String> result = source.collectIf(Objects::nonNull, String::valueOf);

Assert.assertEquals(Lists.immutable.with("1", "2", "3", "4", "5"), result);

It evaluates eagerly and should be a bit faster than using a Stream.

Note: I am a committer for Eclipse Collections.

User · Answer

There is a third option, using stream().toArray(): see the comments under "why didn't stream have a toList method". It turns out to be slower than forEach() or collect(), and less expressive. It might be optimised in later JDK builds, so I am adding it here just in case.

Assuming List<String>:

myFinalList = Arrays.asList(
        myListToParse.stream()
                .filter(Objects::nonNull)
                .map(this::doSomething)
                .toArray(String[]::new)
);

With a micro-micro benchmark (1M entries, 20% nulls and a simple transform in doSomething()):

private LongSummaryStatistics benchmark(final String testName, final Runnable methodToTest, int samples) {
    long[] timing = new long[samples];
    for (int i = 0; i < samples; i++) {
        long start = System.currentTimeMillis();
        methodToTest.run();
        timing[i] = System.currentTimeMillis() - start;
    }
    final LongSummaryStatistics stats = Arrays.stream(timing).summaryStatistics();
    System.out.println(testName + ": " + stats);
    return stats;
}

the results are:

parallel:

toArray: LongSummaryStatistics{count=10, sum=3721, min=321, average=372.100000, max=535}
forEach: LongSummaryStatistics{count=10, sum=3502, min=249, average=350.200000, max=389}
collect: LongSummaryStatistics{count=10, sum=3325, min=265, average=332.500000, max=368}

sequential:

toArray: LongSummaryStatistics{count=10, sum=5493, min=517, average=549.300000, max=569}
forEach: LongSummaryStatistics{count=10, sum=5316, min=427, average=531.600000, max=571}
collect: LongSummaryStatistics{count=10, sum=5380, min=444, average=538.000000, max=557}

parallel without nulls and filter (so the stream is SIZED): toArray has the best performance in this case, and forEach() fails with "indexOutOfBounds" on the recipient ArrayList, so I had to replace it with forEachOrdered():

toArray: LongSummaryStatistics{count=100, sum=75566, min=707, average=755.660000, max=1107}
forEach: LongSummaryStatistics{count=100, sum=115802, min=992, average=1158.020000, max=1254}
collect: LongSummaryStatistics{count=100, sum=88415, min=732, average=884.150000, max=1014}

User · Answer

One of the main benefits of using streams is the ability to process data in a declarative way, that is, using a functional style of programming. Streams also give you multi-threading capability for free, meaning there is no need to write any extra multi-threaded code to make your stream run in parallel.

Assuming the reason you are exploring this style of programming is that you want to exploit these benefits, your first code sample is potentially not functional, since forEach is a terminal operation that returns nothing and therefore works only through side effects (here, adding to a list declared outside the pipeline).

The second way is preferred from a functional programming point of view, since the map function can accept stateless lambda functions. More explicitly, the lambda passed to the map function should be:

Non-interfering, meaning that the function should not alter the source of the stream if it is non-concurrent (e.g. ArrayList).
Stateless to avoid unexpected results when doing parallel processing (caused by thread scheduling differences).

Another benefit of the second approach is that, if the stream is parallel and the collector is concurrent and unordered, these characteristics give the reduction operation useful hints that allow it to do the collecting concurrently.
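As a hedged illustration of collector characteristics, the sketch below uses Collectors.groupingByConcurrent from the JDK, which reports both CONCURRENT and UNORDERED. The class name, the helper methods and the choice to group words by length are assumptions made for the example, not part of the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ConcurrentCollecting {

    // A JDK collector that accumulates into one shared ConcurrentMap.
    static Collector<String, ?, ConcurrentMap<Integer, List<String>>> byLengthCollector() {
        return Collectors.groupingByConcurrent(String::length);
    }

    // Groups the non-null words by length on a parallel stream.
    // Because the collector is unordered, the order inside each group may vary.
    static ConcurrentMap<Integer, List<String>> byLength(List<String> words) {
        return words.parallelStream()
                    .filter(Objects::nonNull)
                    .collect(byLengthCollector());
    }

    // True when a collector advertises both hints mentioned in the text.
    static boolean isConcurrentAndUnordered(Collector<?, ?, ?> collector) {
        return collector.characteristics().contains(Collector.Characteristics.CONCURRENT)
            && collector.characteristics().contains(Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        System.out.println(isConcurrentAndUnordered(byLengthCollector()));
        System.out.println(byLength(Arrays.asList("a", "bb", null, "cc", "d")));
    }
}
```

With such a collector, all threads add to the same map instead of building partial results that are merged afterwards.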

User · Answer

Maybe Method 3. I always prefer to keep logic separate.

Predicate<Long> greaterThan100 = new Predicate<Long>() {
    @Override
    public boolean test(Long currentParameter) {
        return currentParameter > 100;
    }
};

List<Long> sourceLongList = Arrays.asList(1L, 10L, 50L, 80L, 100L, 120L, 133L, 333L);
List<Long> resultList = sourceLongList.parallelStream().filter(greaterThan100).collect(Collectors.toList());

User · Answer

Don't worry about any performance differences; they're normally going to be minimal in this case.

Method 2 is preferable because:

It doesn't require mutating a collection that exists outside the lambda expression.
It's more readable, because the different steps that are performed in the collection pipeline are written sequentially: first a filter operation, then a map operation, then collecting the result (for more info on the benefits of collection pipelines, see Martin Fowler's excellent article).
You can easily change the way values are collected by replacing the Collector that is used. In some cases you may need to write your own Collector, but then the benefit is that you can easily reuse it.
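The point about replacing the Collector can be sketched as follows. The class name, the transform helper and the body of doSomething (trimming and lower-casing) are hypothetical; only the collectors themselves come from the JDK. The filter and map steps stay fixed, and the collector passed in decides the shape of the result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class SwappableCollectors {

    // Hypothetical stand-in for the question's doSomething.
    static String doSomething(String s) {
        return s.trim().toLowerCase();
    }

    // Same pipeline every time; the caller chooses how the values are collected.
    static <R> R transform(List<String> input, Collector<? super String, ?, R> collector) {
        return input.stream()
                    .filter(Objects::nonNull)
                    .map(SwappableCollectors::doSomething)
                    .collect(collector);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(" A", null, "b ", "a");

        List<String> asList = transform(input, Collectors.toList());
        Set<String> asSet = transform(input, Collectors.toSet());
        String joined = transform(input, Collectors.joining(","));

        System.out.println(asList); // [a, b, a]
        System.out.println(asSet);  // duplicates removed
        System.out.println(joined); // a,b,a
    }
}
```

Method 1 offers no comparable switch: changing the result type there means rewriting the lambda that mutates the target collection.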
